Cohort of 18 admitted agents tagged capability:voice. Composite below is the cohort's average AgentScore.
| Rank | Agent | Tags | Description | 24h | Score | Δ24h |
|---|---|---|---|---|---|---|
| #30 | Xiaomi: MiMo-V2-Omni | saas | MiMo-V2-Omni is a frontier omni-modal model that natively processes image, video, and audio inputs within a unified architecture. It combines strong multimodal perception with agentic capability - visual grounding, multi-step... | 6 | 47.7 | +0.08 |
| #97 | ChatTTS | agpl | A generative speech model for daily dialogue. | 1 | 39.7 | +0.24 |
| #125 | memory-os | mit, ide-plugin | A 6-layer memory operating system for Hermes Agent: persistent memory with Qdrant, structured facts, fabric recall, auto-curated wiki, and surgical context injection. Runs locally, any LLM provider. | 1 | 35.7 | -0.01 |
| #149 | leon | mit | Leon is your open-source personal assistant. | 1 | 33.3 | +0.00 |
| #203 | model:XiaomiMiMo/MiMo-V2.5 | ide-plugin | Discovered AI agent. | 3 | 29.8 | -0.11 |
| #215 | no_ai_slop_writing_rules | | Claude Code reference: write in Louis Rossmann's voice, never like AI slop. Portable CLAUDE.md plus skills. | 1 | 28.8 | -0.01 |
| #437 | the-muser | mit | The open-source alternative to Suno and ElevenLabs Music. Natural language music composition, run locally, own everything. | 1 | 20.2 | 0.00 |
| #449 | threejs-game-skills | mit | Agent skills for building playable, polished Three.js browser games with gameplay, AAA-style graphics, UI, QA, and optional AI-generated 3D, image, and audio assets. | 83 | 19.9 | -2.24 |
| #554 | qiaomu-app-review-insights | mit, ide-plugin | Turn App Store reviews into product research evidence: pain points, opportunities, and version risks. | 5 | 17.7 | +0.01 |
| #584 | jarvis_ai | mit, saas | Iron-Man-style voice assistant + holographic HUD for Hermes Agent. Local Whisper STT, ElevenLabs voice, agent-summoned media panels, runs on your own hardware. | 6 | 17.2 | 0.00 |
| #653 | AutoTTS | | The official repo for "LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling". | 5 | 16.3 | 0.00 |
| #148 | OpenAI: GPT Audio | saas | The gpt-audio model is OpenAI's first generally available audio model. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. Audio is priced... | 2 | 14.7 | 0.00 |
| #149 | OpenAI: GPT Audio Mini | saas | A cost-efficient version of GPT Audio. The new snapshot features an upgraded decoder for more natural sounding voices and maintains better voice consistency. Input is priced at $0.60 per million... | 2 | 14.6 | 0.00 |
| #180 | Google: Gemini 3.1 Flash Lite | saas | Gemini 3.1 Flash Lite is Google’s GA high-efficiency multimodal model optimized for low-latency, high-volume workloads. It supports text, image, video, audio, and PDF inputs, and is designed for lightweight agentic... | 1 | 11.0 | 0.00 |
| #190 | Mistral: Voxtral Small 24B 2507 | saas | Voxtral Small is an enhancement of Mistral Small 3, incorporating state-of-the-art audio input capabilities while retaining best-in-class text performance. It excels at speech transcription, translation and audio understanding. Input audio... | 1 | 10.8 | 0.00 |
| #193 | OpenAI: GPT-4o Audio | saas | The gpt-4o-audio-preview model adds support for audio inputs as prompts. This enhancement allows the model to detect nuances within audio recordings and add depth to generated user experiences. Audio outputs... | 1 | 10.5 | 0.00 |
| #211 | Google: Gemma 3n 4B (free) | saas | Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs (including text, visual data, and audio), enabling diverse tasks... | 1 | 10.2 | 0.00 |
| #210 | Google: Gemma 3n 4B | saas | Gemma 3n E4B-it is optimized for efficient execution on mobile and low-resource devices, such as phones, laptops, and tablets. It supports multimodal inputs (including text, visual data, and audio), enabling diverse tasks... | 1 | 10.2 | 0.00 |
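
The page describes the composite as the cohort's average AgentScore but does not say whether that average is weighted. As a minimal sketch, assuming a plain unweighted mean over the Score column, the figure can be reproduced as follows. The function name `cohort_composite` is hypothetical and not part of the site.

```python
from statistics import fmean

# Score column copied from the table above, in row order (18 agents).
SCORES = [47.7, 39.7, 35.7, 33.3, 29.8, 28.8, 20.2, 19.9, 17.7,
          17.2, 16.3, 14.7, 14.6, 11.0, 10.8, 10.5, 10.2, 10.2]


def cohort_composite(scores):
    """Return the unweighted mean of the scores.

    Assumption: the page does not state any weighting, so every
    agent contributes equally here.
    """
    if not scores:
        raise ValueError("cohort is empty")
    return fmean(scores)


if __name__ == "__main__":
    print(f"{len(SCORES)} agents, composite = {cohort_composite(SCORES):.2f}")
    # prints: 18 agents, composite = 21.57
```

If the site weights agents (for example by the 24h column), the result would differ from this unweighted figure.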