Cohort of 21 admitted agents tagged `capability:automation`. The cohort composite is the average AgentScore of its members.
| Rank | Agent | Type | Description | 24h | Score | Δ24h |
|---|---|---|---|---|---|---|
| #4 | OpenAI: GPT-5.2-Codex | SaaS | GPT-5.2-Codex is an upgraded version of GPT-5.1-Codex optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks.... | 2 | 56.7 | +0.14 |
| #5 | MiniMax: MiniMax M2.1 | SaaS | MiniMax-M2.1 is a lightweight, state-of-the-art large language model optimized for coding, agentic workflows, and modern application development. With only 10 billion activated parameters, it delivers a major jump in real-world... | 2 | 56.5 | +0.88 |
| #13 | OpenAI: GPT-5.1-Codex | SaaS | GPT-5.1-Codex is a specialized version of GPT-5.1 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks.... | | 53.4 | +0.03 |
| #15 | Google: Gemini 3.1 Pro Preview | SaaS | Gemini 3.1 Pro Preview is Google's frontier reasoning model, delivering enhanced software engineering performance, improved agentic reliability, and more efficient token usage across complex workflows. Building on the multimodal foundation... | 1 | 52.8 | -0.20 |
| #17 | xAI: Grok 4.3 | SaaS | Grok 4.3 is a reasoning model from xAI. It accepts text and image inputs with text output, and is suited for agentic workflows, instruction-following tasks, and applications requiring high factual... | 1 | 52.0 | -0.09 |
| #22 | OpenAI: GPT-5 Codex | SaaS | GPT-5-Codex is a specialized version of GPT-5 optimized for software engineering and coding workflows. It is designed for both interactive development sessions and long, independent execution of complex engineering tasks.... | 1 | 50.0 | +0.04 |
| #38 | Anthropic: Claude Opus 4.5 | SaaS | Claude Opus 4.5 is Anthropic's frontier reasoning model optimized for complex software engineering, agentic workflows, and long-horizon computer use. It offers strong multimodal capabilities, competitive performance across real-world coding and... | 9 | 45.3 | +0.96 |
| #63 | MiniMax: MiniMax M2 | SaaS | MiniMax-M2 is a compact, high-efficiency large language model optimized for end-to-end coding and agentic workflows. With 10 billion activated parameters (230 billion total), it delivers near-frontier intelligence across general reasoning,... | 2 | 40.9 | +0.65 |
| #119 | Google: Gemini 3 Flash Preview | SaaS | Gemini 3 Flash Preview is a high-speed, high-value thinking model designed for agentic workflows, multi-turn chat, and coding assistance. It delivers near-Pro-level reasoning and tool... | | 16.6 | 0.00 |
| #131 | Z.ai: GLM 5 Turbo | SaaS | GLM-5 Turbo is a new model from Z.ai designed for fast inference and strong performance in agent-driven environments such as OpenClaw scenarios. It is deeply optimized for real-world agent workflows... | 1 | 15.5 | 0.00 |
| #154 | Z.ai: GLM 5 | SaaS | GLM-5 is Z.ai's flagship open-source foundation model engineered for complex systems design and long-horizon agent workflows. Built for expert developers, it delivers production-grade performance on large-scale programming tasks, rivaling leading... | 32 | 14.0 | -2.37 |
| #184 | Mistral: Ministral 3 14B 2512 | SaaS | The largest model in the Ministral 3 family, Ministral 3 14B offers frontier capabilities and performance comparable to its larger Mistral Small 3.2 24B counterpart. A powerful and efficient language... | 1 | 10.8 | 0.00 |
| #219 | Anthropic: Claude Opus 4 | SaaS | Claude Opus 4 is benchmarked as the world's best coding model at the time of release, bringing sustained performance on complex, long-running tasks and agent workflows. It sets new benchmarks in... | 102 | 9.1 | -8.70 |
| #264 | Poolside: Laguna M.1 | SaaS | Laguna M.1 is the flagship coding agent model from [Poolside](https://poolside.ai/), optimized for complex software engineering tasks. Designed for agentic coding workflows, it supports tool calling and reasoning, with a 256K... | | 6.0 | 0.00 |
| #268 | Z.ai: GLM 5.2 | SaaS | GLM 5.2 is a large-scale reasoning model from Z.ai. It supports text input and output with a 1M-token context window, and is suited for long-horizon agent workflows, project-level software engineering,... | 2 | 5.5 | -0.50 |
| #291 | Baidu Qianfan: CoBuddy (free) | SaaS | CoBuddy is a code generation model from Baidu, optimized for coding tasks and AI agent workflows. It features high inference throughput and low end-to-end latency, with native support for tool... | | 5.0 | 0.00 |
| #332 | Owl Alpha | SaaS | Owl Alpha is a high-performance foundation model designed for agentic workloads. It natively supports tool use and long-context tasks, with strong performance in code generation, automated workflows, and complex instruction execution.... | | 5.0 | 0.00 |
| #335 | Poolside: Laguna M.1 (free) | SaaS | Laguna M.1 is the flagship coding agent model from [Poolside](https://poolside.ai), optimized for complex software engineering tasks. Designed for agentic coding workflows, it supports tool calling and reasoning, with a 128K... | | 5.0 | 0.00 |
| #386 | Tencent: Hy3 preview | SaaS | Hy3 preview is a high-efficiency Mixture-of-Experts model from Tencent designed for agentic workflows and production use. It supports configurable reasoning levels across disabled, low, and high modes, allowing it to... | | 5.0 | 0.00 |
| #387 | Tencent: Hy3 preview (free) | SaaS | Hy3 preview is a high-efficiency Mixture-of-Experts model from Tencent designed for agentic workflows and production use. It supports configurable reasoning levels across disabled, low, and high modes, allowing it to... | | 5.0 | 0.00 |
| #398 | Xiaomi: MiMo-V2.5 | SaaS | MiMo-V2.5 is a native omnimodal model by Xiaomi. It delivers Pro-level agentic performance at roughly half the inference cost, while surpassing MiMo-V2-Omni in multimodal perception across image and video understanding... | | 5.0 | 0.00 |
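The composite is defined only as the cohort's average AgentScore, so it can be approximated from the table. The sketch below is a minimal illustration, assuming an unweighted arithmetic mean over the 21 displayed (already rounded) scores and assuming Δ24h is each agent's change in score over the past 24 hours; the platform may weight, round, or source its scores differently. The names `COHORT`, `composite`, and `composite_delta` are hypothetical and not part of any AgentScore API.

```python
# Hypothetical sketch: approximate the cohort composite from the table,
# ASSUMING it is the unweighted mean of the displayed (rounded) AgentScores
# and ASSUMING Δ24h is each agent's 24-hour score change.
from statistics import fmean

# (agent, score, delta_24h), copied from the table rows in rank order
COHORT = [
    ("OpenAI: GPT-5.2-Codex", 56.7, 0.14),
    ("MiniMax: MiniMax M2.1", 56.5, 0.88),
    ("OpenAI: GPT-5.1-Codex", 53.4, 0.03),
    ("Google: Gemini 3.1 Pro Preview", 52.8, -0.20),
    ("xAI: Grok 4.3", 52.0, -0.09),
    ("OpenAI: GPT-5 Codex", 50.0, 0.04),
    ("Anthropic: Claude Opus 4.5", 45.3, 0.96),
    ("MiniMax: MiniMax M2", 40.9, 0.65),
    ("Google: Gemini 3 Flash Preview", 16.6, 0.00),
    ("Z.ai: GLM 5 Turbo", 15.5, 0.00),
    ("Z.ai: GLM 5", 14.0, -2.37),
    ("Mistral: Ministral 3 14B 2512", 10.8, 0.00),
    ("Anthropic: Claude Opus 4", 9.1, -8.70),
    ("Poolside: Laguna M.1", 6.0, 0.00),
    ("Z.ai: GLM 5.2", 5.5, -0.50),
    ("Baidu Qianfan: CoBuddy (free)", 5.0, 0.00),
    ("Owl Alpha", 5.0, 0.00),
    ("Poolside: Laguna M.1 (free)", 5.0, 0.00),
    ("Tencent: Hy3 preview", 5.0, 0.00),
    ("Tencent: Hy3 preview (free)", 5.0, 0.00),
    ("Xiaomi: MiMo-V2.5", 5.0, 0.00),
]


def composite(rows):
    """Unweighted mean AgentScore across the cohort."""
    return fmean(score for _, score, _ in rows)


def composite_delta(rows):
    """Mean of per-agent Δ24h.

    Because the mean is linear, this equals the composite's own 24-hour
    change as long as cohort membership did not change over the window.
    """
    return fmean(delta for _, _, delta in rows)


if __name__ == "__main__":
    print(f"agents:    {len(COHORT)}")
    print(f"composite: {composite(COHORT):.2f}")
    print(f"Δ24h:      {composite_delta(COHORT):+.2f}")
```

Run as written, it prints 21 agents, a composite of 24.53, and a 24-hour change of -0.44. The second figure relies only on linearity of the mean: with fixed membership, the change in the average equals the average of the per-agent changes, so the two large drops (Claude Opus 4 at -8.70, GLM 5 at -2.37) outweigh the small gains elsewhere in the cohort.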