A new benchmarking study identifies surprising scenarios in which large language models perform worse than simple random baselines on reasoning tasks. The findings challenge common assumptions about LLM capabilities and point to critical gaps in complex decision-making.
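To make the comparison concrete, here is a minimal sketch of how a random baseline can be scored against a model on a multiple-choice reasoning benchmark. The data format, function names, and the `model_accuracy` figure are illustrative assumptions, not details from the study.

```python
import random


def random_baseline_accuracy(questions, seed=0):
    """Accuracy of uniform random guessing over multiple-choice questions.

    Each question is a dict with a list of 'choices' and the index of the
    correct 'answer'. This structure is hypothetical, used only to
    illustrate the baseline comparison.
    """
    rng = random.Random(seed)
    correct = 0
    for q in questions:
        guess = rng.randrange(len(q["choices"]))  # pick an option at random
        if guess == q["answer"]:
            correct += 1
    return correct / len(questions)


if __name__ == "__main__":
    # Toy benchmark: 1,000 four-option questions with arbitrary answer keys.
    toy_benchmark = [
        {"choices": ["A", "B", "C", "D"], "answer": i % 4} for i in range(1000)
    ]
    baseline = random_baseline_accuracy(toy_benchmark)

    # Placeholder value; substitute the measured accuracy of the model under test.
    model_accuracy = 0.21

    print(f"random baseline: {baseline:.2%}")
    print(f"model accuracy:  {model_accuracy:.2%}")
    print("model beats chance" if model_accuracy > baseline
          else "model underperforms random guessing")
```

The point of the comparison is simply that a model whose accuracy falls below the chance level of roughly 25% on four-option questions is, by that measure, doing worse than guessing.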