New research indicates that speculative decoding's potential for accelerating LLM inference hinges on a critical factor: domain-specific draft models. Theory promises speedups of roughly 3x, but real-world implementations fall short of that figure unless the draft model is custom-trained for the target domain, an important consideration for latency-sensitive AI applications.
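To make the mechanism concrete, here is a minimal sketch of the greedy variant of speculative decoding: a cheap draft model proposes a short block of tokens, and the expensive target model verifies them, accepting the matching prefix. The function names (`speculative_decode`, `draft_next`, `target_next`) and the toy models at the bottom are illustrative assumptions, not from the research discussed here; a production implementation would verify the whole draft block in one batched forward pass and typically uses rejection sampling rather than exact-match verification.

```python
from typing import Callable, List


def speculative_decode(
    prompt: List[int],
    draft_next: Callable[[List[int]], int],
    target_next: Callable[[List[int]], int],
    max_new_tokens: int = 32,
    draft_len: int = 4,
) -> List[int]:
    """Greedy speculative decoding sketch.

    The draft model proposes `draft_len` tokens; the target model checks them
    left to right. Tokens are accepted until the first disagreement, where the
    target's own token is kept instead, so the output matches plain greedy
    decoding with the target model alone.
    """
    tokens = list(prompt)
    while len(tokens) - len(prompt) < max_new_tokens:
        # 1) Draft model proposes a short block of tokens autoregressively (cheap).
        draft = []
        ctx = list(tokens)
        for _ in range(draft_len):
            t = draft_next(ctx)
            draft.append(t)
            ctx.append(t)

        # 2) Target model verifies each proposed token in order.
        #    (In practice this is a single batched forward pass, not a Python loop.)
        for i, t in enumerate(draft):
            expected = target_next(tokens + draft[:i])
            if expected != t:
                # First mismatch: keep the accepted prefix plus the target's token.
                tokens.extend(draft[:i])
                tokens.append(expected)
                break
        else:
            # Every draft token agreed with the target; accept the whole block.
            tokens.extend(draft)
    return tokens[: len(prompt) + max_new_tokens]


if __name__ == "__main__":
    # Toy deterministic "models" over integer tokens, for demonstration only.
    def target(ctx: List[int]) -> int:       # stand-in for the large target model
        return (sum(ctx) + 1) % 10

    def good_draft(ctx: List[int]) -> int:   # in-domain draft: always agrees
        return (sum(ctx) + 1) % 10

    def bad_draft(ctx: List[int]) -> int:    # out-of-domain draft: rarely agrees
        return 0

    print(speculative_decode([1, 2, 3], good_draft, target, max_new_tokens=8))
    print(speculative_decode([1, 2, 3], bad_draft, target, max_new_tokens=8))
```

The sketch also shows where the reported gap comes from: the speedup is proportional to how many draft tokens the target accepts per verification step, so a draft model that mismatches the target domain forces frequent rejections and most of the theoretical gain disappears.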