LLMs Articles | LavX News

OpenAI’s GPT‑4 Turbo Beats GPT‑4 on Cost and Speed, but Gains Are Incremental

Real‑time LLM Inference on Standard GPUs Hits 3,000 tokens/s per Request

Stepfun releases Step 3.7 Flash: a 196 B sparse‑MoE model tuned for agent pipelines

Claude Opus 4.8: modest gains, honest trade‑offs

AI Models Need Sleep: CMU Research Shows Performance Boost from 'Napping' LLMs

Mysterious Hy3 LLM Surges to Top of OpenRouter Rankings

Claude Opus 4.8 Arrives with Bigger Coding Gains and Sharper Honesty

About LLMs at Zig Days

Beyond Benchmarks: Frontier LLM Disagreement on Fact-Checks

MiniMax’s M3 LLM promises faster sparse attention but still faces open questions

Self‑Improving Tax Agents with Codex: What the Pilot Actually Achieved

Maintainability Sensors for Coding Agents – A Pragmatic Field Report

Sleep‑Inspired Consolidation for Long‑Context Language Models
