#quantization Articles | LavX News

Machine Learning

PrismML launches 1‑bit and ternary Bonsai Image 4B models for on‑device diffusion generation

Machine Learning

BitCPM-CANN: 1.58-Bit Training Framework Opens New Path for Memory-Efficient AI on Domestic Hardware

Qwen 3.7 Max: Evaluating Alibaba's Long-Running LLM Claims

ByteDance's Lance: A Multimodal Model for Local AI Workloads

Zhipu AI’s GLM‑5.1‑highspeed API claims 400 tokens/s – what the numbers really mean

Redis Creator's DS4 Project Brings Frontier AI to Local Hardware

antirez’s ds4: A Narrow, Metal-Only Inference Engine for DeepSeek V4 Flash

sectorllm: Pushing the Boundaries of Minimalist AI with a 1369-Byte Llama2 Engine

LLMs vs. SLMs: Practical Trade-offs in Modern NLP Applications

Machine Learning

Gaussian Precision: How NF4 Quantization Transforms LLM Weight Distribution

Machine Learning

The Shrinking Universe of Numbers: FP4 and the Precision Revolution in Computing

Google's TurboQuant Compression Enables Faster LLM Inference on Modest Hardware

Machine Learning

TQ4_1S Weight Compression: Breakthrough in Model Quantization for llama.cpp