Benchmark that evaluates LLMs using 759 NYT Connections puzzles

(github.com)

1 point | by ShrugLife 14 hours ago

No comments yet.