Measuring AI Ability to Complete Long Tasks

(metr.org)

242 points | by spicypete 2 days ago

193 comments