Refact.ai Agent got ~93% on Aider Polyglot bench: 0 user inputs, 100% autonomous

(refact.ai)

2 points | by bystrakowa a day ago

2 comments

bystrakowa a day ago
Refact.ai Agent + Claude 3.7 Sonnet scored 92.9% (no Thinking) and 93.3% (with Thinking) on Polyglot Benchmark — fully autonomous!
Benchmark: Aider’s Polyglot (225 hardest coding exercises) across C++, Go, Java, JS, Python, and Rust. Result: 20 points ahead of the current highest score on the leaderboard (72.9% by Aider with Gemini 2.5 Pro).
How? Refact.ai handles programming tasks end-to-end in your IDE — with high accuracy & without human input:
- Acts autonomously at every step.
- Takes an iterative approach: plans, executes, tests, and self-corrects, all by itself, until the task is fully solved.
- Deeply integrates with dev tools and the environment, enabling the Agent to act independently.
- Self-tests and revises steps mid-process, plus runs multiple checks if needed.
- Solves tasks in ≤30 steps, optimizing token usage.
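The plan, execute, test, self-correct cycle described above can be pictured roughly like this. It is only a toy sketch, not Refact.ai's actual implementation: `run_agent`, `AgentRun`, and the `plan` / `execute` / `run_tests` callbacks are hypothetical names, and counting one plan-execute-test round as one "step" is an assumption. Only the 30-step budget comes from the post.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

MAX_STEPS = 30  # step budget quoted in the post ("≤30 steps")


@dataclass
class AgentRun:
    solved: bool
    steps_used: int
    log: List[str] = field(default_factory=list)


def run_agent(
    plan: Callable[[Optional[str]], str],             # hypothetical: picks next action from test feedback
    execute: Callable[[str], None],                   # hypothetical: applies the action (e.g. edits code)
    run_tests: Callable[[], Tuple[bool, Optional[str]]],  # hypothetical: returns (passed, feedback)
    max_steps: int = MAX_STEPS,
) -> AgentRun:
    """Loop plan -> execute -> test until tests pass or the budget runs out."""
    log: List[str] = []
    feedback: Optional[str] = None
    for step in range(1, max_steps + 1):
        action = plan(feedback)          # plan, using the last failure as input
        execute(action)                  # act without asking the user
        passed, feedback = run_tests()   # self-test
        log.append(f"step {step}: {action} -> {'pass' if passed else 'fail'}")
        if passed:
            return AgentRun(True, step, log)
    return AgentRun(False, max_steps, log)  # budget exhausted, task unsolved


# Toy task: the "tests" pass once a counter reaches 3.
state = {"value": 0}


def toy_plan(feedback: Optional[str]) -> str:
    return "increment"


def toy_execute(action: str) -> None:
    state["value"] += 1


def toy_tests() -> Tuple[bool, Optional[str]]:
    if state["value"] == 3:
        return True, None
    return False, f"expected 3, got {state['value']}"


result = run_agent(toy_plan, toy_execute, toy_tests)
stuck = run_agent(lambda fb: "noop", lambda a: None, lambda: (False, "still failing"), max_steps=5)
```

In the toy run, `result` reports the task solved on the third step, while `stuck` shows what happens when nothing ever passes: the loop stops at the step budget instead of running forever.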
This is much closer to real-world software development and vibe coding: developers can delegate entire tasks to the AI Agent while doing other work, then simply receive the final result.
__
Thinking vs. No-Thinking Mode
Thinking Mode improved accuracy by 0.4 percentage points (92.9% to 93.3%) but used 2x the tokens.
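For a sense of scale, here is a quick back-of-the-envelope check of the figures quoted in this comment. It assumes the score is simply the share of the 225 exercises passed, with every exercise weighted equally. That is an assumption, not something the post states, and `implied_passes` is just an illustrative helper.

```python
TOTAL_EXERCISES = 225  # benchmark size quoted in the post


def implied_passes(score_percent: float, total: int = TOTAL_EXERCISES) -> int:
    """Number of passed exercises implied by a percentage score (assumes equal weighting)."""
    return round(score_percent / 100 * total)


no_thinking = implied_passes(92.9)  # 209
thinking = implied_passes(93.3)     # 210

# Differences in percentage points, rounded to one decimal to hide float noise.
thinking_gain = round(93.3 - 92.9, 1)     # 0.4
lead_over_best = round(93.3 - 72.9, 1)    # 20.4

print(no_thinking, thinking, thinking_gain, lead_over_best)
```

Under that assumption, the gap between the two modes works out to a single exercise (209 vs. 210 of 225), and the lead over the 72.9% leaderboard entry is 20.4 percentage points.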
__
Full breakdown, approach reveal & insights: https://refact.ai/blog/2025/refact-ai-agent-achieves-93-3-on...
Happy to discuss!
a day ago
[deleted]