AgentBench: A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24).
Pillar = mean of 1 scaled value = 29.5.
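As a minimal sketch of the arithmetic above, assuming a pillar score is the plain arithmetic mean of its scaled signal values: the function name `pillar_score` and the input list are hypothetical and not taken from AgentTape's implementation.

```python
from statistics import fmean


def pillar_score(scaled_values):
    """Return the arithmetic mean of the scaled signal values (assumed formula)."""
    if not scaled_values:
        # No readings yet, so the mean is undefined.
        raise ValueError("at least one scaled value is required")
    return fmean(scaled_values)


# With a single scaled value, the mean equals that value.
score = pillar_score([29.5])
print(score)
```

With only one reading available, the pillar is simply that reading; it would shift once further signals are ingested.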
Awaiting first reading (these signals apply to this agent and will be ingested on the next tier tick): Stack Overflow questions (7d), Product Hunt upvotes, Docker Hub pulls, Crates.io downloads (90d), tech-news mentions (30d).
Not applicable (this agent lacks the prerequisite, such as a GitHub repo or HF mirror, for these signals to ever apply): HF downloads (30d), npm weekly installs, PyPI monthly installs, MCP registry listing.
[![AgentTape](https://agenttape.com/api/badge/agentbench.svg)](https://agenttape.com/agents/agentbench)
<a href="https://agenttape.com/agents/agentbench"><img src="https://agenttape.com/api/badge/agentbench.svg" alt="AgentTape" /></a>