Overview

Developed by Mistral AI, the Mistral 7B model outperformed larger models such as Llama 2 13B on many benchmarks while being faster and cheaper to run.

Key Technologies

  • Grouped-Query Attention (GQA): Speeds up inference by sharing each key/value head across a group of query heads, which shrinks the KV cache and reduces memory bandwidth.
  • Sliding Window Attention (SWA): Handles long sequences more efficiently by letting each token attend only to a fixed local window of preceding tokens (4,096 in Mistral 7B); stacking layers still propagates information beyond the window.
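The two mechanisms above can be sketched in one toy attention function. This is an illustrative NumPy sketch with hypothetical names, not Mistral's implementation: GQA is the `np.repeat` of shared K/V heads across query-head groups, and SWA is the banded causal mask.

```python
import numpy as np

def gqa_swa_attention(q, k, v, window):
    """Toy scaled-dot-product attention combining GQA and SWA.

    q:    (n_heads, seq, d)     one query per attention head
    k, v: (n_kv_heads, seq, d)  fewer shared key/value heads (GQA)
    window: each token attends only to the previous `window` tokens (SWA)
    """
    n_heads, seq, d = q.shape
    n_kv_heads = k.shape[0]
    group = n_heads // n_kv_heads        # query heads per shared K/V head
    # GQA: replicate each K/V head so every query head in a group reads
    # the same keys/values (smaller KV cache than one K/V per head).
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)   # (n_heads, seq, seq)
    # SWA: causal mask restricted to a sliding window of `window` tokens.
    i, j = np.indices((seq, seq))
    banned = (j > i) | (i - j >= window)  # future tokens or too far back
    scores[:, banned] = -1e9
    # Numerically stable softmax over the key axis.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v                   # (n_heads, seq, d)

rng = np.random.default_rng(0)
out = gqa_swa_attention(
    rng.standard_normal((4, 8, 16)),     # 4 query heads
    rng.standard_normal((2, 8, 16)),     # 2 shared K/V heads
    rng.standard_normal((2, 8, 16)),
    window=3,
)
```

Note the ratio of query heads to K/V heads (4:2 here) is a free design choice; full multi-head attention is the special case where the two counts are equal, and multi-query attention is the case of a single K/V head.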

Significance

Mistral 7B demonstrated that architectural innovations can matter as much as raw parameter count.

Related Terms