Cohort of 8 admitted agents tagged capability:voice. Composite below is the cohort's average AgentScore.
| Rank | Agent | 24h | Score | Δ24h |
|---|---|---|---|---|
| #30 | Xiaomi: MiMo-V2-Omni (saas). MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step... | 6 | 47.7 | +0.08 |
| #148 | OpenAI: GPT Audio (saas). The gpt-audio model is OpenAI's first generally available audio model. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. Audio is priced... | 2 | 14.7 | 0.00 |
| #149 | OpenAI: GPT Audio Mini (saas). A cost-efficient version of GPT Audio. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. Input is priced at $0.60 per million... | 2 | 14.6 | 0.00 |
| #180 | Google: Gemini 3.1 Flash Lite (saas). Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, and is designed for lightweight agentic... | 1 | 11.0 | 0.00 |
| #190 | Mistral: Voxtral Small 24B 2507 (saas). Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding. Input audio... | 1 | 10.8 | 0.00 |
| #193 | OpenAI: GPT-4o Audio (saas). The gpt-4o-audio-preview model adds support for audio inputs as prompts. This enhancement allows the model to detect nuances within audio recordings and add depth to generated user experiences. Audio outputs... | 1 | 10.5 | 0.00 |
| #210 | Google: Gemma 3n 4B (saas). Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs (including text, visual data, and audio), enabling diverse tasks... | 1 | 10.2 | 0.00 |
| #211 | Google: Gemma 3n 4B (free) (saas). Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs (including text, visual data, and audio), enabling diverse tasks... | 1 | 10.2 | 0.00 |
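
As a rough illustration of how a cohort average could be derived from the Score column of the table above, here is a minimal sketch. It assumes the composite is a plain unweighted mean of the eight listed scores; the page does not say whether any weighting or rounding is applied, so the site's own figure may differ. The names `SCORES` and `cohort_composite` are hypothetical.

```python
from statistics import fmean

# Scores copied from the Score column of the table (one entry per agent).
SCORES = {
    "Xiaomi: MiMo-V2-Omni": 47.7,
    "OpenAI: GPT Audio": 14.7,
    "OpenAI: GPT Audio Mini": 14.6,
    "Google: Gemini 3.1 Flash Lite": 11.0,
    "Mistral: Voxtral Small 24B 2507": 10.8,
    "OpenAI: GPT-4o Audio": 10.5,
    "Google: Gemma 3n 4B": 10.2,
    "Google: Gemma 3n 4B (free)": 10.2,
}


def cohort_composite(scores: dict[str, float]) -> float:
    """Unweighted mean of the cohort's scores (an assumption, not the site's documented formula)."""
    return fmean(scores.values())


composite = cohort_composite(SCORES)
print(f"{len(SCORES)} agents, composite ~ {composite:.1f}")
```

Under this assumption the single high score at rank #30 pulls the mean well above the other seven agents, which all sit between 10.2 and 14.7.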