
AI
QCon London 2026: Reliable Retrieval for Production AI Systems
3/17/2026

Cloud
Evaluating Azure Local: Strategies for Testing and Deployment
3/17/2026

AI
AI Agent Evaluation: Building Quality into Your Cloud-Native Ecosystem
3/17/2026

AI
Evaluating AI Agents in Practice: Benchmarks, Frameworks, and Lessons Learned
3/16/2026

AI
Improving LLM-as-a-Judge Evaluators: Calibration, Bias Mitigation, and Statistical Validation
3/5/2026

AI
Microsoft Open Sources Evals for Agent Interop Starter Kit to Benchmark Enterprise AI Agents
2/27/2026

AI
AI's Mathematical Renaissance: How Reasoning Models Are Transforming Mathematics and Evaluation
2/17/2026

AI
When AI Takes the Couch: Psychometric Jailbreaks Reveal Internal Conflict in Frontier Models
2/5/2026

LLMs
FACTS Benchmark Suite Introduced to Evaluate Factual Accuracy of Large Language Models
1/12/2026

LLMs
LLM poetry and the 'greatness' question
1/11/2026