Toykv turns RESP2 into a crash-tested Go KV store

A Go key-value store uses RESP2, append-only persistence and crash tests to show why disks should store absolute expiry times, not relative TTLs.

Toykv is a single-node, in-memory key-value store written in Go. It stores strings, speaks RESP2 on the socket and keeps state across restarts with an append-only file. Prajwal Mahajan tagged v1.0.0 on June 17, 2026, after about four weeks of work.

The project sits in the same family as Toymq, a message queue that made different calls on storage format, protocol and testing. Toymq kept its WAL bare at first. Toykv starts its AOF with a version byte. Toymq used its own wire format. Toykv writes RESP arrays to disk. Toymq defended custom delivery rules. Toykv borrows Redis clients as test tools.

The repo, prajwalmahajan101/toykv, frames the project as a learning artifact rather than a Redis replacement. That framing matters. A small store gives you room to inspect the failure surface without hiding behind clustering, snapshots, metrics and years of compatibility work. Redis and Valkey solve production needs. Toykv shows the mechanics.

The problem: persistence creates time bugs

A memory store can answer GET and SET with little ceremony. Persistence changes the contract. Once the server replies +OK, the client believes the write survived. The implementation now has to order memory mutation, log append, fsync and network response.

Toykv chooses appendfsync=always as the default. The handler mutates the store, appends the RESP record to the AOF, calls file.Sync() and then returns +OK. That choice caps throughput in the low thousands of SETs per second on commodity NVMe, but it keeps the commit point honest. The client receives acknowledgment after the kernel reports that bytes reached stable storage.
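The ordering above can be sketched in a few lines of Go. This is a minimal illustration, not Toykv's source: the `store` type, `encodeSet`, `set` and `demo` are hypothetical names, and the sketch does not decide what should happen to the in-memory change if the append or the sync fails.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"sync"
)

// store is a hypothetical stand-in for an in-memory map plus an AOF handle.
type store struct {
	mu   sync.Mutex
	data map[string]string
	aof  *os.File
}

// encodeSet renders SET key value as a RESP2 array of bulk strings.
func encodeSet(key, value string) []byte {
	return []byte(fmt.Sprintf("*3\r\n$3\r\nSET\r\n$%d\r\n%s\r\n$%d\r\n%s\r\n",
		len(key), key, len(value), value))
}

// set follows the order described above: mutate, append, fsync, then reply.
func (s *store) set(key, value string) (string, error) {
	s.mu.Lock()
	defer s.mu.Unlock()

	s.data[key] = value
	if _, err := s.aof.Write(encodeSet(key, value)); err != nil {
		return "", err
	}
	if err := s.aof.Sync(); err != nil { // commit point: no +OK before this returns
		return "", err
	}
	return "+OK\r\n", nil
}

// demo writes one key into a throwaway AOF and reads the file back.
func demo() (reply, logged string, err error) {
	dir, err := os.MkdirTemp("", "toykv-sketch")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)

	f, err := os.OpenFile(filepath.Join(dir, "appendonly.aof"),
		os.O_CREATE|os.O_WRONLY|os.O_APPEND, 0o644)
	if err != nil {
		return "", "", err
	}
	defer f.Close()

	s := &store{data: map[string]string{}, aof: f}
	reply, err = s.set("k", "v")
	if err != nil {
		return "", "", err
	}
	raw, err := os.ReadFile(f.Name())
	return reply, string(raw), err
}

func main() {
	reply, logged, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("reply=%q aof=%q\n", reply, logged)
}
```

The point of the sketch is the position of `Sync`: the reply string is not even constructed until the sync call has returned without error.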

TTL support adds a second trap. Clients want relative time because humans think in durations. SET k v EX 5 means the key should expire five seconds from now. Disk needs a different contract. If the server stores EX 5 in the AOF, then replay after a 12-hour outage grants the key five new seconds. The server has extended the key without a user command.

Toykv handles that by converting all accepted TTL commands into absolute deadlines before disk write. A wire command such as SET k v EX 5 becomes SET k v plus PEXPIREAT k <unix-ms>. Replay can run one second later or one year later and reach the same conclusion about the key.

Wire-to-disk TTL canonicalisation: relative durations on the wire collapse through now_ms to absolute PEXPIREAT records on disk

That rule gives Toykv its core storage invariant: the socket may accept durations, but the AOF records moments. A duration needs the process clock that interpreted it. A timestamp carries the decision in the bytes.
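A small Go sketch makes the conversion concrete. It assumes only the EX and PX options and a hypothetical `canonicalize` function; Toykv's real parser and the exact set of TTL commands it accepts may differ.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// canonicalize rewrites a wire-level SET with a relative TTL into the
// records that would go to disk: a plain SET plus an absolute PEXPIREAT.
// nowMs is the clock reading taken when the command was accepted.
func canonicalize(args []string, nowMs int64) ([][]string, error) {
	if len(args) < 3 || strings.ToUpper(args[0]) != "SET" {
		return nil, fmt.Errorf("expected SET key value [EX s | PX ms]")
	}
	set := []string{"SET", args[1], args[2]}
	if len(args) == 3 {
		return [][]string{set}, nil // no TTL, nothing to convert
	}
	if len(args) != 5 {
		return nil, fmt.Errorf("syntax error")
	}
	n, err := strconv.ParseInt(args[4], 10, 64)
	if err != nil || n <= 0 {
		return nil, fmt.Errorf("invalid expire time")
	}
	var ttlMs int64
	switch strings.ToUpper(args[3]) {
	case "EX":
		ttlMs = n * 1000
	case "PX":
		ttlMs = n
	default:
		return nil, fmt.Errorf("syntax error")
	}
	// The duration is resolved exactly once, here, against nowMs.
	deadline := strconv.FormatInt(nowMs+ttlMs, 10)
	return [][]string{set, {"PEXPIREAT", args[1], deadline}}, nil
}

func main() {
	recs, err := canonicalize([]string{"SET", "k", "v", "EX", "5"}, time.Now().UnixMilli())
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, r := range recs {
		fmt.Println(strings.Join(r, " "))
	}
}
```

Because `nowMs` is an argument, the same function can be driven by a fixed clock in tests, and replay never has to call it at all.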

The solution: write the same bytes you speak

Toykv stores one file at data/appendonly.aof. The file begins with an eight-byte header: TOYKV\0\0 plus a version byte. The rest of the file contains RESP arrays, the same frame shape the server sends and receives on the network.

That format removes a class of drift. The append path uses the RESP writer. The replay path uses the RESP reader. The server then dispatches parsed commands through the same command table that serves live clients. During replay, the AOF handle stays nil, so replayed mutations do not append themselves again.
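The nil-handle trick can be sketched as follows. The `server`, `command` and `table` names are hypothetical, and an `io.Writer` stands in for the AOF file so the example runs without touching disk.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"strings"
)

type server struct {
	data map[string]string
	aof  io.Writer // nil while replaying, so replayed writes are not re-logged
}

type command struct {
	arity   int
	mutates bool
	run     func(s *server, args []string) string
}

// table serves both live clients and replay in this sketch.
var table = map[string]command{
	"SET": {arity: 3, mutates: true, run: func(s *server, a []string) string {
		s.data[a[1]] = a[2]
		return "+OK\r\n"
	}},
	"GET": {arity: 2, run: func(s *server, a []string) string {
		v, ok := s.data[a[1]]
		if !ok {
			return "$-1\r\n"
		}
		return fmt.Sprintf("$%d\r\n%s\r\n", len(v), v)
	}},
}

// dispatch runs one parsed command and logs it only when an AOF is attached.
func (s *server) dispatch(args []string) string {
	if len(args) == 0 {
		return "-ERR empty command\r\n"
	}
	cmd, ok := table[strings.ToUpper(args[0])]
	if !ok || len(args) != cmd.arity {
		return "-ERR unknown command or wrong arity\r\n"
	}
	reply := cmd.run(s, args)
	if cmd.mutates && s.aof != nil {
		fmt.Fprintf(s.aof, "*%d\r\n", len(args))
		for _, a := range args {
			fmt.Fprintf(s.aof, "$%d\r\n%s\r\n", len(a), a)
		}
	}
	return reply
}

// replay feeds recovered records through dispatch with logging detached.
func (s *server) replay(records [][]string) {
	live := s.aof
	s.aof = nil
	defer func() { s.aof = live }()
	for _, rec := range records {
		s.dispatch(rec)
	}
}

// demo reports the log size after replay, the log size after one live
// write, and the value recovered for key "k".
func demo() (afterReplay, afterLive int, value string) {
	var aofBuf bytes.Buffer
	s := &server{data: map[string]string{}, aof: &aofBuf}
	s.replay([][]string{{"SET", "k", "v"}})
	afterReplay = aofBuf.Len()
	s.dispatch([]string{"SET", "k2", "v2"})
	afterLive = aofBuf.Len()
	return afterReplay, afterLive, s.data["k"]
}

func main() {
	r, l, v := demo()
	fmt.Printf("log bytes after replay=%d, after live write=%d, k=%q\n", r, l, v)
}
```

Replay and live traffic share `dispatch`, so any arity or syntax rule enforced for clients is enforced for the log as well.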

AOF replay header + version dispatch: an 8-byte header (magic plus version byte), then a loop over RESP records dispatched through the live server table

The design gives the code one codec with two consumers. A custom disk frame would force the project to prove that wire parsing and log parsing agree on arity, bulk lengths, command names and error handling. RESP already carries array counts and bulk lengths, so Toykv gets truncation detection for torn records without adding a per-record CRC in v1.
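To illustrate how RESP framing alone can flag a torn tail, here is a hedged sketch of a replay reader that refuses an incomplete record and reports the byte offset where it starts. The names `readRecord`, `replay` and `errTorn` are invented for this example, and the reader handles only arrays of bulk strings.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"io"
	"strconv"
	"strings"
)

var errTorn = errors.New("torn record")

// readRecord parses one RESP array of bulk strings and returns the bytes it
// consumed. io.EOF is returned only at a clean record boundary; running out
// of bytes inside a record yields errTorn.
func readRecord(r *bufio.Reader) ([]string, int, error) {
	if _, err := r.Peek(1); err != nil {
		return nil, 0, err
	}
	size := 0
	line := func(prefix byte) (int, error) {
		s, err := r.ReadString('\n')
		size += len(s)
		if err != nil {
			return 0, errTorn
		}
		if len(s) < 4 || s[0] != prefix || !strings.HasSuffix(s, "\r\n") {
			return 0, fmt.Errorf("malformed line %q", s)
		}
		return strconv.Atoi(s[1 : len(s)-2])
	}
	count, err := line('*')
	if err != nil {
		return nil, size, err
	}
	var args []string
	for i := 0; i < count; i++ {
		n, err := line('$')
		if err != nil {
			return nil, size, err
		}
		if n < 0 {
			return nil, size, fmt.Errorf("negative bulk length %d", n)
		}
		buf := make([]byte, n+2) // payload plus trailing CRLF
		m, err := io.ReadFull(r, buf)
		size += m
		if err != nil {
			return nil, size, errTorn
		}
		if buf[n] != '\r' || buf[n+1] != '\n' {
			return nil, size, fmt.Errorf("bulk string not terminated by CRLF")
		}
		args = append(args, string(buf[:n]))
	}
	return args, size, nil
}

// replay returns the complete records and, on failure, the offset of the
// first byte of the record that could not be read. The offset is -1 on success.
func replay(data string) ([][]string, int, error) {
	r := bufio.NewReader(strings.NewReader(data))
	var records [][]string
	offset := 0
	for {
		args, size, err := readRecord(r)
		if err == io.EOF {
			return records, -1, nil
		}
		if err != nil {
			return records, offset, err
		}
		records = append(records, args)
		offset += size
	}
}

func main() {
	whole := "*3\r\n$3\r\nSET\r\n$1\r\nk\r\n$1\r\nv\r\n"
	recs, at, err := replay(whole + "*3\r\n$3\r\nSET\r\n$1\r\nk")
	fmt.Println(recs, at, err)
}
```

The declared array count and bulk lengths tell the reader how many bytes it is owed, so a short read is distinguishable from a clean end of file without any checksum.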

The version byte pays off once TTL changes the on-disk command set. Version 1 files contain string operations. Version 2 files can include PEXPIREAT. A newer binary can read the header, choose the replay rules and accept older files. Without that byte, the binary would have to infer format from command contents or run a migration before it could trust replay.
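A header check along those lines might look like the sketch below. It assumes the version is stored as a raw numeric byte (1 or 2) rather than an ASCII digit, which the article does not specify; `readHeader`, `headerVersion`, `allowed` and `maxVersion` are hypothetical names.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// magic is the seven-byte prefix described above; the eighth byte is the version.
var magic = []byte("TOYKV\x00\x00")

// maxVersion is the newest format this sketch claims to understand (assumed).
const maxVersion = 2

// readHeader consumes the eight-byte header and returns the format version.
func readHeader(r io.Reader) (byte, error) {
	var hdr [8]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return 0, fmt.Errorf("short header: %w", err)
	}
	if !bytes.Equal(hdr[:7], magic) {
		return 0, errors.New("bad magic")
	}
	v := hdr[7]
	if v == 0 || v > maxVersion {
		return 0, fmt.Errorf("unsupported AOF version %d", v)
	}
	return v, nil
}

// headerVersion is a convenience wrapper for callers that hold raw bytes.
func headerVersion(b []byte) (byte, error) {
	return readHeader(bytes.NewReader(b))
}

// allowed reports whether a command may appear in a file of the given
// version: PEXPIREAT only exists from version 2 onward.
func allowed(version byte, cmd string) bool {
	if cmd == "PEXPIREAT" {
		return version >= 2
	}
	return true
}

func main() {
	v, err := headerVersion([]byte("TOYKV\x00\x00\x01"))
	fmt.Println(v, err, allowed(v, "PEXPIREAT"))
}
```

With the version known up front, replay can reject a PEXPIREAT record in a version 1 file as corruption instead of guessing at the writer's intent.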

The choice differs from Toymq for a reason. Toymq could defer its WAL version because the project owned its record shape and had a narrower migration story. Toykv writes a file that may sit on disk across binary upgrades and now includes time semantics. A single byte gives future replay code the context it needs.

The crash matrix keeps tests close to risk

Toykv does not rely on one large soak test to prove durability. The project maps each failure surface to a test that lives near the layer that introduced the risk.

Crash matrix layer flow: storage → TTL → live rewrite → protocol compatibility → composed chaos

The AOF crash test runs a child server, writes SET operations, kills the process and checks that acknowledged keys return after restart. The partial-tail replay test feeds torn records to the replayer and expects refusal with an offset. TTL tests cover sweeper races and crash round trips. Rewrite tests cover live appends, temp files and restart after compaction. Protocol tests use redis-cli and go-redis/v9 against the shipped binary.

The chaos suite then overlaps kill, pause, rewrite and writes. That suite checks that acknowledged SETs survive, INCR remains monotonic and the server avoids panics under combined faults. The project treats chaos as a release confidence check after lower layers prove their own invariants.

That layering keeps the tests cheap. If a protocol test fails, the maintainer inspects socket bytes. If a replay test fails, the maintainer inspects AOF parsing. If a rewrite test fails, the maintainer inspects the snapshot, side buffer and rename path. The test file points at the risk owner.

RESP2 doubles as a test strategy

Toykv could have invented a protocol. It chose RESP2 because Redis clients already encode the contract. redis-cli checks byte-level behavior. go-redis/v9 checks the API shape that a Go user would hit.

That choice gives Toykv third-party pressure. The clients do not know Toykv internals. They care about status replies, nil bulk strings, integer return values and command syntax. If Toykv drifts from RESP2, those clients fail without any custom client code in the repo.

The protocol layer owns wire behavior, including TTL sentinel values such as -2 for missing keys and -1 for keys without expiry. It does not own crash survival. Lower tests already prove that acknowledged mutations replay. That split matters because protocol compatibility and durability fail in different ways.
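As a small illustration of those sentinels, the sketch below builds the RESP integer reply for a PTTL-style query. It assumes a deadline of 0 means "no expiry", which is a convention chosen for the example rather than something the article states, and `pttlReply` is a hypothetical name.

```go
package main

import "fmt"

// pttlReply returns the RESP2 integer reply for a remaining-time query.
// deadlineMs is an absolute Unix-millisecond deadline; 0 means no expiry
// (an assumption made for this sketch).
func pttlReply(exists bool, deadlineMs, nowMs int64) string {
	switch {
	case !exists:
		return ":-2\r\n" // missing key
	case deadlineMs == 0:
		return ":-1\r\n" // key exists but has no expiry
	case deadlineMs <= nowMs:
		return ":-2\r\n" // expired but not yet swept: report as missing
	default:
		return fmt.Sprintf(":%d\r\n", deadlineMs-nowMs)
	}
}

func main() {
	fmt.Printf("%q\n", pttlReply(true, 6000, 1000))
	fmt.Printf("%q\n", pttlReply(true, 0, 1000))
	fmt.Printf("%q\n", pttlReply(false, 0, 1000))
}
```

These are exactly the values an off-the-shelf client will compare against, which is why they belong to the protocol tests rather than the crash tests.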

BGREWRITEAOF exposed a buffer bug

BGREWRITEAOF creates a compact AOF while the server continues to accept writes. Toykv snapshots the store, writes a temp file, captures live appends in a side buffer and then swaps the new file into place with the captured tail.

The hard part looked like it would involve rename ordering, directory fsync and file descriptor swap. Those invariants held. A smaller bug broke the path: the side buffer stayed empty after writes that looked successful.

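The cause sat in bufio. The live path built a RESP writer on top of an existing buffered writer, and bufio.NewWriter returns the existing writer when it is handed a *bufio.Writer whose buffer is already large enough, so the live path never gained a second buffer. The rewrite path built a RESP writer on top of a bytes.Buffer, which is not a *bufio.Writer, so bufio.NewWriter allocated a new inner buffer. The code wrote records into that inner buffer, but nothing flushed it before the swap.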

The fix required flushing the mirror writer before draining and swapping. The lesson applies to any dual-write path: two sinks mean two flush points, and the swap must order both.
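The behaviour is easy to reproduce in isolation. The sketch below uses a hypothetical `respWriter` that always wraps its destination with `bufio.NewWriter`, which is one way the situation described above could arise; it is not a copy of Toykv's writer.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// respWriter is a hypothetical encoder that always wraps its destination.
type respWriter struct{ w *bufio.Writer }

func newRESPWriter(dst io.Writer) *respWriter {
	return &respWriter{w: bufio.NewWriter(dst)}
}

func (r *respWriter) writeRecord(rec string) error {
	_, err := r.w.WriteString(rec)
	return err
}

func (r *respWriter) flush() error { return r.w.Flush() }

// reusesBuffer reports whether wrapping an existing *bufio.Writer of the
// default size hands back that same writer instead of a new layer.
func reusesBuffer() bool {
	outer := bufio.NewWriter(&bytes.Buffer{})
	return bufio.NewWriter(outer) == outer
}

// sideBufferSizes writes one record toward a bytes.Buffer and reports how
// many bytes the buffer holds before and after an explicit flush.
func sideBufferSizes(rec string) (beforeFlush, afterFlush int, err error) {
	var side bytes.Buffer
	rw := newRESPWriter(&side)
	if err = rw.writeRecord(rec); err != nil {
		return 0, 0, err
	}
	beforeFlush = side.Len() // the record is still parked in bufio's buffer
	if err = rw.flush(); err != nil {
		return 0, 0, err
	}
	afterFlush = side.Len()
	return beforeFlush, afterFlush, nil
}

func main() {
	before, after, err := sideBufferSizes("*1\r\n$4\r\nPING\r\n")
	fmt.Println(reusesBuffer(), before, after, err)
}
```

The write call reports success in both cases, so only a check on the sink itself, or a flush placed before the drain, reveals the difference.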

Durability commit point: handler writes RESP into the AOF, calls fsync, only then returns +OK to the client

The TTL sweeper test taught a memory-model lesson

Toykv uses a 1 Hz sweeper that samples keys under a read lock and then needs a write lock for eviction. Go's sync.RWMutex has no lock upgrade operation, so the sweeper releases the read lock and acquires the write lock separately. Another goroutine can change a key in that gap, so the sweeper checks expiry again before deleting.
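The release-and-recheck pattern can be sketched as two phases. The `store` type and its methods are hypothetical, and this version scans every key with a deadline instead of sampling, purely to keep the example short.

```go
package main

import (
	"fmt"
	"sync"
)

type store struct {
	mu     sync.RWMutex
	data   map[string]string
	expiry map[string]int64 // key -> absolute deadline in Unix ms
}

func newStore() *store {
	return &store{data: map[string]string{}, expiry: map[string]int64{}}
}

// setWithDeadline stores a value together with an absolute deadline.
func (s *store) setWithDeadline(key, value string, deadlineMs int64) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.data[key] = value
	s.expiry[key] = deadlineMs
}

// has reports whether the key is currently present.
func (s *store) has(key string) bool {
	s.mu.RLock()
	defer s.mu.RUnlock()
	_, ok := s.data[key]
	return ok
}

// candidates collects keys that look expired, holding only the read lock.
func (s *store) candidates(nowMs int64) []string {
	s.mu.RLock()
	defer s.mu.RUnlock()
	var keys []string
	for k, d := range s.expiry {
		if d <= nowMs {
			keys = append(keys, k)
		}
	}
	return keys
}

// evict takes the write lock and re-checks each candidate, because a writer
// may have refreshed or removed the key after candidates returned.
func (s *store) evict(keys []string, nowMs int64) int {
	s.mu.Lock()
	defer s.mu.Unlock()
	removed := 0
	for _, k := range keys {
		if d, ok := s.expiry[k]; ok && d <= nowMs {
			delete(s.data, k)
			delete(s.expiry, k)
			removed++
		}
	}
	return removed
}

func main() {
	s := newStore()
	s.setWithDeadline("k", "v", 100)
	keys := s.candidates(200)
	s.setWithDeadline("k", "v2", 500) // a writer slips in between the two locks
	fmt.Println(s.evict(keys, 200), s.has("k"))
}
```

Without the second check inside `evict`, the refreshed key in `main` would be deleted on the strength of a stale observation.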

The first stress test reported spurious nil reads for keys that appeared unexpired. The store behaved correctly. The test measured time at the wrong point.

The test captured t0 before calling Get. The Get method checked time later, after scheduler delay in some runs. A key could expire between t0 and the internal check, and the test would still accuse the store.

The corrected test loads the writer's last expiry timestamp with an atomic read before the call, captures tAfter once Get has returned, and reports a violation only if tAfter is still earlier than the loaded expiry. If the clock has not reached the deadline even after Get finished, the key cannot have expired during the call. That gives the assertion a real happens-before story instead of a scheduling guess.
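One way to express that assertion is sketched below. The helper names `publishExpiry` and `spuriousNil` are invented, the store and clock are passed in as functions so the check can be exercised without real timing, and the sketch assumes a single writer that publishes each deadline after storing the key.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// lastExpiryMs holds the deadline most recently published by the writer.
var lastExpiryMs int64

// publishExpiry is what the writer goroutine would call after each SET.
func publishExpiry(ms int64) { atomic.StoreInt64(&lastExpiryMs, ms) }

// spuriousNil reports a violation only when the store answered "missing"
// and the clock, read after the call returned, was still before the
// deadline that was loaded before the call.
func spuriousNil(get func() bool, now func() int64) bool {
	expiry := atomic.LoadInt64(&lastExpiryMs) // before the call
	found := get()
	tAfter := now() // after the call
	return !found && tAfter < expiry
}

func main() {
	publishExpiry(time.Now().UnixMilli() + 50)
	missing := func() bool { return false } // pretend the store answered nil
	now := func() int64 { return time.Now().UnixMilli() }
	fmt.Println("violation:", spuriousNil(missing, now))
}
```

Reading the clock after the call is the conservative choice: any scheduler delay pushes tAfter later, which can only hide a violation, never invent one.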

The trade-offs stay visible

Toykv keeps v1 narrow: strings only, no auth, no TLS, no cluster, no RDB snapshots and no segment rotation. That scope keeps the durability story readable. Startup replays the whole AOF, so recovery time grows linearly with file size. Without a per-record CRC, RESP framing catches torn writes but not corruption that leaves the framing intact, such as flipped bytes inside a bulk payload; catching that would need a future checksum. Per-write fsync protects acknowledged writes at the cost of throughput.

Those trade-offs suit a teaching store. A manifest would add another consistency object. Segments would add lifecycle rules. A background fsync mode would need a weaker acknowledgment contract. Each feature would widen the crash matrix.

The project still leaves room for a v2. Candidate additions include lists, sets, hashes, AUTH, TLS, RDB snapshots, INFO, metrics, SCAN and RENAME. Each one should arrive with its own failure surface and tests. A command that changes state also changes recovery.

The pattern worth reusing

Toykv's strongest decision is the smallest one: convert relative TTLs into absolute deadlines before persistence. That rule keeps replay deterministic. The second decision, RESP arrays on disk, keeps live dispatch and recovery on the same parser. The third decision, RESP2 on the wire, lets mature clients test the server from outside the repo.

Together, those choices reduce the amount of code that has to stay consistent under crash, restart and upgrade. The crash matrix still has real work to do, but each row has a clean owner.

A durable store needs more than a log file. It needs a precise commit point, a replay format that carries context and tests that kill the process at inconvenient times. Toykv gives you those pieces in about 5,000 lines of Go.
