Vending-Bench 2: New Benchmark Exposes AI's Limits in Long-Horizon Business Simulation
Andon Labs launches Vending-Bench 2, a year-long simulation in which AI models run a vending machine business while contending with adversarial suppliers, delays, and competition. Gemini 3 Pro leads with $5,478 in profits, but all frontier models lag far behind human baselines, exposing gaps in coherence, negotiation, and strategy. The multi-agent Arena variant intensifies these challenges, pointing to the capabilities that autonomous economic agents still need to develop.