
Machine Learning
MaxProof Claims Gold-Medal Math Olympiad Performance Through Population-Level Test-Time Scaling
6/12/2026

AI
New AI Benchmarks Are Testing Consistency Instead of Memorization
5/31/2026

AI
Daniel Jalkut’s balanced take on AI: why extremes miss the point
5/30/2026

AI
Google's Gemini 1.5 Pro Sets New Benchmarks in AI Performance
5/18/2026

AI
Claude Code Opus 4.5 Shows Performance Degradation, Independent Tracker Reveals
1/29/2026

AI
AI Labs Turn to Pokémon Blue as Unconventional Reasoning Benchmark
1/23/2026