Shreyaskc/BabelJudge

Shreyaskc/BabelJudge: LLM-as-a-judge has become the dominant approach to scalable evaluation in NLP pipelines, yet judges themselves carry systematic biases that raw accuracy hides: they favor responses placed in slot A (position bias), they prefer longer responses regardless of quality (verbosity...

AgentScore: 7.5 (+0.00 vs 24h ago)

Adoption: 0.0

Quality: unrated

Momentum: 60.0

Community: 5.0

Quality unrated · weight redistributed
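
The page does not publish the pillar weights or the exact redistribution rule, so the following is only a sketch of one common approach: drop the unrated pillar and rescale the remaining weights proportionally so they sum to 1. The equal weights and the function name `redistribute_weights` are placeholders, not AgentTape's actual values, so this is not expected to reproduce the 7.5 headline.

```python
def redistribute_weights(weights, scores):
    """Drop pillars with no score (None) and rescale the rest to sum to 1."""
    rated = {name: w for name, w in weights.items() if scores.get(name) is not None}
    total = sum(rated.values())
    if total == 0:
        raise ValueError("no rated pillar is available to carry the weight")
    return {name: w / total for name, w in rated.items()}


# Hypothetical equal weights; the real weights are not stated on the page.
weights = {"adoption": 0.25, "quality": 0.25, "momentum": 0.25, "community": 0.25}

# Pillar scores as listed above; Quality is unrated, modeled here as None.
scores = {"adoption": 0.0, "quality": None, "momentum": 60.0, "community": 5.0}

effective = redistribute_weights(weights, scores)
for name, weight in effective.items():
    print(f"{name}: {weight:.3f}")
```

Under this assumption each of the three rated pillars ends up carrying one third of the total weight, and Quality carries none.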

Capabilities: Code Generation

Deployment: IDE Plugin

Maturity: Experimental

Discovered 1d ago

Score breakdown

Pillar contributions

GitHub stars: 0 → 0.0
MCP registry listing: 0 → 0.0
Pillar = mean of 2 scaled values = 0.0.
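
As a minimal sketch of the arithmetic in the line above: the pillar value is the plain mean of the scaled signal values. How a raw reading is scaled is not described on the page, so the code starts from the already-scaled values shown; the name `pillar_score` is illustrative only.

```python
from statistics import mean

# Scaled values as shown on the page (raw 0 maps to scaled 0.0 for both signals).
scaled_signals = {
    "GitHub stars": 0.0,
    "MCP registry listing": 0.0,
}


def pillar_score(scaled):
    """Arithmetic mean of the scaled signal values feeding one pillar."""
    return mean(list(scaled.values()))


print(pillar_score(scaled_signals))  # 0.0
```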
Awaiting first reading (these signals apply to this agent and will be ingested on the next tier tick): PyPI monthly installs, Stack Overflow questions (7d), Product Hunt upvotes, Docker Hub pulls, Crates.io downloads (90d), tech-news mentions (30d).
Not applicable (this agent lacks the prerequisite, e.g. no GitHub repo or no HF mirror, for these signals to ever apply): HF downloads (30d), npm weekly installs.

Similar

Vibe-search via embedding cosine

AGI-Eval-Official/DailyReport

Score envelope last computed 1d ago. Quality is unrated because the agent has no benchmark results yet.