Overview
Released by Mistral AI in September 2023, the Mistral 7B model outperformed much larger models (like Llama 2 13B) on many benchmarks while being faster and cheaper to run.
Key Technologies
- Grouped-Query Attention (GQA): Speeds up inference by sharing each key/value head across a group of query heads, which shrinks the KV cache and reduces memory bandwidth during decoding.
- Sliding Window Attention (SWA): Lets the model handle longer sequences efficiently by having each token attend only to a fixed local window of preceding tokens; stacking layers still lets information propagate beyond the window.
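The GQA idea above can be sketched in a few lines of NumPy. This is a toy illustration, not Mistral's actual implementation: Mistral 7B uses 32 query heads and 8 key/value heads, so each group of 4 query heads shares one KV head. The tiny sequence length and head dimension below are arbitrary choices for the sketch.

```python
import numpy as np

# Toy grouped-query attention (GQA) sketch.
# Mistral 7B: 32 query heads, 8 KV heads -> groups of 4 query heads
# share one key/value head. seq_len and head_dim here are toy values.
n_q_heads, n_kv_heads, seq_len, head_dim = 32, 8, 16, 4
group_size = n_q_heads // n_kv_heads  # 4 query heads per KV head

rng = np.random.default_rng(0)
q = rng.standard_normal((n_q_heads, seq_len, head_dim))
k = rng.standard_normal((n_kv_heads, seq_len, head_dim))  # fewer KV heads
v = rng.standard_normal((n_kv_heads, seq_len, head_dim))

# Share each KV head across its group by repeating it along the head axis.
k_shared = np.repeat(k, group_size, axis=0)  # (32, seq_len, head_dim)
v_shared = np.repeat(v, group_size, axis=0)

# Standard scaled dot-product attention, per head.
scores = q @ k_shared.transpose(0, 2, 1) / np.sqrt(head_dim)
weights = np.exp(scores - scores.max(-1, keepdims=True))
weights /= weights.sum(-1, keepdims=True)
out = weights @ v_shared
print(out.shape)  # (32, 16, 4): full output despite a 4x smaller KV cache
```

The memory win is that only `k` and `v` (8 heads) need to be cached during generation, not the repeated copies, which can be materialized on the fly.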
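The sliding-window mask behind SWA is also easy to visualize. A hedged sketch: Mistral 7B's actual window is 4096 tokens; the toy window of 3 over 6 tokens below is chosen only so the pattern is visible.

```python
import numpy as np

# Sliding-window attention mask sketch: token i may attend only to
# tokens j with i - window < j <= i (causal + local window).
# Mistral 7B uses window = 4096; toy values here.
seq_len, window = 6, 3
i = np.arange(seq_len)[:, None]
j = np.arange(seq_len)[None, :]
mask = (j <= i) & (j > i - window)  # True where attention is allowed
print(mask.astype(int))
# Each row has at most `window` ones, ending on the diagonal, so
# attention cost per token is O(window) instead of O(seq_len).
```

Because each layer's window overlaps its neighbors', a stack of L layers gives an effective receptive field of roughly L times the window size.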
Significance
Mistral 7B demonstrated that architectural innovations can matter as much as raw parameter count: a well-designed 7B model can match or beat models nearly twice its size.