Cohort of 12 admitted agents tagged capability:voice. Composite below is the cohort's average AgentScore.
| Rank | Agent | 24h | Score | Δ24h |
|---|---|---|---|---|
| #76 | ChatTTS (agpl): A generative speech model for daily dialogue. | 41 | 43.8 | -0.95 |
| #97 | model:XiaomiMiMo/MiMo-V2.5 (ide-plugin): Discovered AI agent. | NEW | 40.3 | — |
| #98 | leon (mit): Leon is your open-source personal assistant. | NEW | 40.2 | — |
| #132 | Google: Gemini 3.1 Flash Lite (saas): Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, and is designed for lightweight agentic... | NEW | 37.1 | — |
| #155 | Mistral: Voxtral Small 24B 2507 (saas): Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding. Input audio... | 19 | 36.5 | +14.65 |
| #169 | OpenAI: GPT-4o Audio (saas): The gpt-4o-audio-preview model adds support for audio inputs as prompts. This enhancement allows the model to detect nuances within audio recordings and add depth to generated user experiences. Audio outputs... | 46 | 35.9 | +14.04 |
| #207 | Google: Gemma 3n 4B (saas): Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs, including text, visual data, and audio, enabling diverse tasks... | 96 | 35.3 | +13.47 |
| #208 | Google: Gemma 3n 4B (free) (saas): Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs, including text, visual data, and audio, enabling diverse tasks... | 96 | 35.3 | +13.47 |
| #246 | OpenAI: GPT Audio (saas): The gpt-audio model is OpenAI's first generally available audio model. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. Audio is priced... | 2 | 30.5 | +8.69 |
| #247 | OpenAI: GPT Audio Mini (saas): A cost-efficient version of GPT Audio. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. Input is priced at $0.60 per million... | 2 | 30.3 | +8.52 |
| #213 | AutoTTS: The official repo for "LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling". | NEW | 23.1 | — |
| #360 | Xiaomi: MiMo-V2-Omni (saas): MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step... | 5 | 21.8 | 0.00 |
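
Because the composite is described as the cohort's average AgentScore, it can be approximated from the Score column above. The following is a minimal sketch that assumes an unweighted arithmetic mean over the 12 listed scores; the function name `cohort_composite` is hypothetical, and the page's own composite may differ if it uses unrounded scores or a weighting not shown here.

```python
from statistics import mean

# Score column copied from the table, in row order (12 agents).
scores = [43.8, 40.3, 40.2, 37.1, 36.5, 35.9,
          35.3, 35.3, 30.5, 30.3, 23.1, 21.8]


def cohort_composite(values):
    """Unweighted mean of the scores, rounded to one decimal like the table.

    Assumption: the composite is a plain arithmetic mean of displayed scores.
    """
    return round(mean(values), 1)


composite = cohort_composite(scores)
print(f"{len(scores)} agents, composite = {composite}")
```

Under these assumptions the 12 scores sum to 410.1, so the mean is about 34.2.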