Scaling to 5,000 Virtual IoT Devices: The KVM-Powered Erlang Breakthrough on Ampere

Running thousands of virtual IoT devices for realistic testing has long been a resource-intensive challenge. When Underjord's Lars Wikman paired Ampere Computing's 192-core ARM server with a custom bootloader and KVM acceleration, he reached 5,100 concurrent virtual Nerves devices, each running a full Erlang/Elixir stack. The result is a useful reference point for embedded developers who want to scale cloud-based device simulations.

Why This Matters for Embedded Development

Nerves treats the BEAM virtual machine as the primary OS layer and uses Linux only for kernel and driver functionality. This enables memory-safe, fault-tolerant applications, but simulating thousands of devices demands extreme efficiency. Unaccelerated QEMU emulation consumed roughly 650MB per instance, making large-scale testing impractical:

# Legacy QEMU command (no acceleration)
qemu-system-aarch64 -machine virt -cpu cortex-a53 ...

The breakthrough came through two key innovations:
1. little_loader: Frank Hunleth's minimalist bootloader replaced U-Boot, enabling direct kernel loading from Nerves' A/B partition structure while slashing boot times to seconds
2. KVM Acceleration: With accel=kvm on the -machine option and -cpu host, QEMU runs guest code directly on the host's ARM cores instead of emulating every instruction:

# Accelerated configuration
qemu-system-aarch64 \
  -machine virt,accel=kvm \
  -cpu host \
  -m 110M \
  -kernel little_loader.elf
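
As a rough illustration of how the accelerated configuration might be repeated for many devices, the sketch below prints one command line per device instead of launching anything. The build_qemu_cmd helper, the nerves-N names, and the added -name and -nographic options are assumptions made for illustration. The per-device disk image and network options that a real Nerves boot would need are not given in this article, so they are left out.

```shell
# Dry-run sketch: print a KVM-accelerated QEMU command line per virtual device.
# build_qemu_cmd and the nerves-N naming are hypothetical, not from the original post.

# build_qemu_cmd INDEX -> echoes the command line for device number INDEX.
build_qemu_cmd() {
  echo "qemu-system-aarch64 -machine virt,accel=kvm -cpu host -m 110M -kernel little_loader.elf -name nerves-$1 -nographic"
}

# KVM needs a readable and writable /dev/kvm on the host; report whether it is there.
if [ -r /dev/kvm ] && [ -w /dev/kvm ]; then
  echo "KVM device available"
else
  echo "KVM device not available on this host"
fi

# Print the commands for three devices. Nothing is executed.
for i in 1 2 3; do
  build_qemu_cmd "$i"
done
```

Printing first makes it easy to review the flags before running them on an ARM64 host with KVM enabled.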

The Performance Transformation

Results defied expectations:
- 500MB+ memory reduction per instance
- Boot times dropped from more than 10 seconds to under 10 seconds
- 3,389 devices sustained before the OOM killer stepped in (pre-tuning)
- 5,100 devices achieved after aggressive optimization
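
To put the per-instance figures in perspective, the following back-of-the-envelope sketch multiplies them out for 5,100 devices. It uses only numbers quoted in this article (about 650MB unaccelerated, 150-160MB resident after tuning). The total_mb helper is hypothetical, and the totals are naive products that ignore ZRAM compression and any host overhead.

```shell
# Naive memory totals from the article's per-device figures (all values in MB).
legacy_mb=650      # unaccelerated instance, as quoted in the article
tuned_low_mb=150   # lower bound of tuned resident memory
tuned_high_mb=160  # upper bound of tuned resident memory
devices=5100

# total_mb PER_DEVICE_MB COUNT -> echoes the product in MB (hypothetical helper).
total_mb() {
  echo $(( $1 * $2 ))
}

echo "legacy total:     $(total_mb "$legacy_mb" "$devices") MB"
echo "tuned total low:  $(total_mb "$tuned_low_mb" "$devices") MB"
echo "tuned total high: $(total_mb "$tuned_high_mb" "$devices") MB"
echo "saved per device: $(( legacy_mb - tuned_high_mb )) to $(( legacy_mb - tuned_low_mb )) MB"
```

At the quoted figures, the unaccelerated setup would need over 3.3 million MB for 5,100 devices, against roughly 765,000 to 816,000 MB after tuning.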

Critical tuning included:
- BEAM Allocators: Switched to reduced-memory variants
- Linux Kernel: Adjusted vm.swappiness, vm.dirty_ratio, and vm.vfs_cache_pressure
- ZRAM: Enabled compression for in-memory blocks
- Erlang Mode: Used interactive code loading, which loads modules on demand, instead of embedded mode, which loads every module at boot, for leaner startup
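
The tuning knobs above could be captured roughly as follows. This is a sketch under stated assumptions: the article does not give the values that were used, so every number below is a placeholder. Whether ZRAM ran on the host or inside the guests is not stated either, and the file name 99-vm-density.conf is invented. The emit_* helpers are hypothetical and only print text.

```shell
# emit_sysctl_conf -> prints a sysctl.d-style fragment with PLACEHOLDER values.
emit_sysctl_conf() {
  cat <<'EOF'
# Placeholder values for illustration; the real settings are not given in the article.
vm.swappiness = 100
vm.dirty_ratio = 10
vm.vfs_cache_pressure = 200
EOF
}

# emit_vm_args -> prints the erl flag that selects interactive code loading.
emit_vm_args() {
  echo "-mode interactive"
}

emit_sysctl_conf
emit_vm_args

# Applying this on a test machine requires root, so it is shown as comments only:
#   emit_sysctl_conf > /etc/sysctl.d/99-vm-density.conf && sysctl --system
# ZRAM-backed swap (size and algorithm are assumptions):
#   modprobe zram
#   dev=$(zramctl --find --size 16G --algorithm zstd)
#   mkswap "$dev" && swapon --priority 100 "$dev"
```

Keeping the settings in a generated fragment makes it easier to change one value at a time and observe how many devices the host can sustain.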

Implications for Developers

- Cloud-Based Device Testing: ARM servers now enable realistic thousand-device simulations for CI/CD pipelines
- Cross-Platform Parity: Identical workflows function on Apple Silicon (using HVF) and ARM servers
- Resource Efficiency: 150-160MB of resident memory per device makes large-scale simulations economically viable
- Nerves Ecosystem Growth: The upcoming nerves_system_qemu_aarch64 package will make these capabilities widely available
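
The cross-platform point can be sketched as a small wrapper that chooses the QEMU accelerator from the host OS. The pick_accel and pick_cpu helpers are hypothetical. The sketch assumes an ARM64 host in the Linux and macOS cases and falls back to software emulation (TCG) with the cortex-a53 model from the legacy command elsewhere, since -cpu host needs hardware virtualization.

```shell
# pick_accel OS_NAME -> echoes the QEMU accelerator to request.
pick_accel() {
  case "$1" in
    Linux)  echo kvm ;;  # assumes an ARM64 Linux host with /dev/kvm
    Darwin) echo hvf ;;  # Apple's Hypervisor.framework on Apple Silicon
    *)      echo tcg ;;  # software emulation fallback
  esac
}

# pick_cpu ACCEL -> "host" needs hardware virtualization; TCG needs a named CPU model.
pick_cpu() {
  if [ "$1" = "tcg" ]; then
    echo cortex-a53
  else
    echo host
  fi
}

accel="$(pick_accel "$(uname -s)")"
cpu="$(pick_cpu "$accel")"

# Print the resulting command instead of running it.
echo "qemu-system-aarch64 -machine virt,accel=${accel} -cpu ${cpu} -m 110M -kernel little_loader.elf"
```

Only the accelerator and CPU model change between hosts; the rest of the command line stays the same.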

Pushing Boundaries Further

While NUMA/core pinning remains unexplored, idle CPU utilization stayed below 20% with thousands of devices, suggesting room for even denser packing. The real triumph is turning a "stunt" into production-grade tooling. As Wikman notes: "The result is something we should get good mileage out of." Deep dives into bootloaders and memory tuning can yield unexpected practical dividends.

Source: Underjord

#NervesFramework #KVMVirtualization #AmpereARM