New research exposes critical flaws in how large language models handle Persian taarof, a complex system of ritual politeness: model accuracy falls 40-48% below that of native speakers. The study introduces TaarofBench, the first cultural benchmark for Iranian social norms, and shows that fine-tuning can boost alignment by up to 42%. Together, the results highlight the limitations of Western-centric AI training.