Search Results: MI300X

AMD Achieves Milestone in Large-Scale MoE Pretraining with ZAYA1 on MI300X and Pollara

A new study details the first large-scale mixture-of-experts (MoE) pretraining run carried out entirely on AMD hardware, introducing ZAYA1, a 760M-active-parameter model that rivals Qwen3-4B and Gemma3-12B. Packed with microbenchmarks for Pollara networking and MI300X optimizations, it offers actionable guidance for developers eyeing non-NVIDIA AI training stacks. This work underscores AMD's readiness for competitive foundation model development.

AMD Matrix Cores Supercharge Llama.cpp: MFMA and Stream-K Unlock 9.5K Tokens/sec on MI300X

A major pull request in llama.cpp has enabled AMD's Matrix Cores (MFMA) and stream-K scheduling for CDNA 3 GPUs, dramatically accelerating quantized inference. The update removes NVIDIA-specific hardware assumptions and delivers up to 9.5K tokens/sec on MI300X hardware—rivalling high-end NVIDIA performance. This collaboration between AMD engineers and the llama.cpp community marks a leap forward for open-source AI on alternative hardware.
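
For readers who want to sanity-check quantized-inference throughput on their own MI300X, a minimal measurement loop along these lines may help. It uses the llama-cpp-python bindings and a placeholder GGUF path, neither of which comes from the article itself; the pull request described above concerns the underlying HIP/MFMA kernels, not this API.

import time
from llama_cpp import Llama

# Assumes llama.cpp was built with ROCm/HIP support so layers offload to the MI300X.
# The model path is a placeholder, not taken from the article.
llm = Llama(model_path="models/llama-3-8b-instruct.Q4_K_M.gguf", n_gpu_layers=-1)

prompt = "Explain stream-K scheduling in one paragraph."
start = time.perf_counter()
out = llm(prompt, max_tokens=256)
elapsed = time.perf_counter() - start

generated = out["usage"]["completion_tokens"]
print(f"{generated} tokens in {elapsed:.2f}s -> {generated / elapsed:.1f} tok/s")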

Chip Benchmark: Herdora's Open-Source Solution for AI Hardware Performance Chaos

As AI accelerators multiply, comparing their performance on large language model workloads has become a developer nightmare. Herdora's Chip Benchmark suite tackles this with open-source, standardized testing across NVIDIA and AMD hardware, delivering critical insights into throughput and latency. This tool empowers teams to make data-driven decisions for optimized LLM deployment.
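
As a rough illustration of the kind of metrics such a suite reports, a benchmark harness typically times each request and derives latency percentiles and tokens-per-second figures. The sketch below is generic and hypothetical, not Herdora's actual API, which the article does not detail; the generate callable is a stand-in for a hardware-specific backend.

import statistics
import time
from typing import Callable

def benchmark(generate: Callable[[str, int], int],
              prompts: list[str],
              max_tokens: int = 128) -> dict:
    """Time each request and summarize latency and throughput.

    `generate` is a placeholder that takes (prompt, max_tokens) and returns
    the number of tokens produced; a real suite would plug in CUDA or ROCm
    backends here.
    """
    latencies, tokens = [], 0
    for prompt in prompts:
        start = time.perf_counter()
        tokens += generate(prompt, max_tokens)
        latencies.append(time.perf_counter() - start)
    total = sum(latencies)
    ordered = sorted(latencies)
    return {
        "p50_latency_s": statistics.median(latencies),
        "p95_latency_s": ordered[int(0.95 * (len(ordered) - 1))],
        "throughput_tok_per_s": tokens / total,
    }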