An open‑source FPGA core called z386 rebuilds the 80386 around Intel’s recovered 37‑bit microcode. The project now boots DOS and runs protected‑mode extenders and classic games. A compact 16 KB L1 cache keeps the core fed from SDRAM, and the design runs at about 85 MHz on a Cyclone V, well above the clock rates of the original silicon.
z386 – an 80386 built from Intel’s own microcode
The z386 project is the fifth entry in a series that started with a microcode‑driven 8086 (z8086). Unlike most FPGA x86 cores, which re‑implement the instruction set in RTL, z386 tries to reuse the original control ROM of the 386, the chip Intel introduced in 1985. The result is a CPU that behaves like a genuine 386, but runs on modern FPGA fabrics at a clock speed that would have been impossible for the silicon of the era.
The problem it solves
- Preserving historical hardware – The 80386 introduced 32‑bit protected mode, paging, and a rich segmentation model. Those features are still the foundation of today’s x86 CPUs, yet no open‑source implementation reproduces the original hardware contract.
- Educational reconstruction – By wiring the recovered microcode to a set of FPGA‑friendly blocks, the project shows how the 386’s internal state machines interact, giving developers a concrete view of a classic architecture that is otherwise only described in Intel’s papers.
- Practical retro‑computing – Running real DOS software (DOS 6/7, DOS/4GW, Doom, Cannon Fodder) on an FPGA board provides a usable platform for hobbyists who want authentic 386 performance without hunting down vintage hardware.
Architecture at a glance
The 386 is organized around eight cooperating units rather than a deep, RISC‑style pipeline. z386 mirrors that layout closely, which makes the original microcode usable with only modest glue logic.

| Unit | Role in z386 |
|---|---|
| Prefetch | 16‑byte code queue filled with 32‑bit bursts from the memory system. |
| Decoder | Byte‑wise state machine that builds a 111‑bit decoded‑instruction record and pushes it into a three‑entry FIFO. |
| Microcode sequencer | Fetches 37‑bit micro‑ops from the recovered ROM, handles jumps, delay slots, and the run‑next‑instruction (RNI) contract. |
| ALU & shifter | Implements arithmetic, logic, flag updates, and uses FPGA DSP blocks for fast multiplication/division. |
| Segmentation | Calculates linear addresses, maintains hidden descriptor caches, and enforces segment limits. |
| Protection PLA | Re‑creates Intel’s selector/descriptor validation logic, feeding redirects back to the sequencer after three cycles. |
| Paging | 32‑entry TLB, page‑walk engine, Accessed/Dirty updates, and fault generation. |
| BIU / Cache | Connects the core to SDRAM, I/O, and a small on‑chip L1 cache (see below). |
Front‑end: prefetch and decode
The original 386 fetched four bytes every two clock cycles from a non‑multiplexed 32‑bit bus. To keep the pipeline fed, z386 implements a 16‑byte prefetch queue that is populated with 32‑bit reads. The decoder consumes the stream byte‑by‑byte to keep the state machine simple, but it can also grab a 32‑bit window when it knows the next bytes are just a displacement or immediate value. This hybrid approach reduces the number of cycles spent gathering literals while preserving the classic byte‑wise control flow.
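The hybrid consumption pattern described above can be illustrated with a toy software model. This is only a sketch: the class name `PrefetchQueue` and its methods are hypothetical, and it ignores timing, bus arbitration, and queue flushes, none of which are taken from the z386 sources.

```python
from collections import deque


class PrefetchQueue:
    """Toy model of a 16-byte code queue refilled in 32-bit (4-byte) reads."""

    CAPACITY = 16
    FETCH_BYTES = 4

    def __init__(self, memory: bytes, start: int = 0):
        self.memory = memory
        self.fetch_addr = start  # next address to read from memory
        self.queue = deque()

    def refill(self) -> bool:
        """Do one 32-bit read if four slots are free; report whether it happened."""
        if len(self.queue) + self.FETCH_BYTES > self.CAPACITY:
            return False
        chunk = self.memory[self.fetch_addr:self.fetch_addr + self.FETCH_BYTES]
        if not chunk:
            return False  # ran off the end of the toy memory image
        self.queue.extend(chunk)
        self.fetch_addr += len(chunk)
        return True

    def take_byte(self) -> int:
        """Byte-wise path, used for prefixes, opcodes, ModR/M and SIB."""
        return self.queue.popleft()

    def take_window(self, nbytes: int) -> int:
        """Wide path: consume up to four literal bytes as one little-endian value."""
        if not 1 <= nbytes <= 4 or nbytes > len(self.queue):
            raise ValueError("window not available")
        value = 0
        for i in range(nbytes):
            value |= self.queue.popleft() << (8 * i)
        return value


# B8 imm32 is MOV EAX, imm32; 90 is NOP.
q = PrefetchQueue(bytes.fromhex("B87856341290"))
q.refill()
q.refill()
opcode = q.take_byte()        # one byte at a time for the opcode
immediate = q.take_window(4)  # the whole immediate in one step
```

In this model the immediate costs one step instead of four, which is the saving the text attributes to the 32‑bit window.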
The decoder itself is driven by two small PLA tables that were recovered from Intel’s design:
- Control PLA – decides whether the current byte is a prefix, an opcode, a ModR/M, a SIB, or an immediate.
- Entry PLA – maps the partially decoded instruction to the microcode entry point in the ROM.
Because the 386’s microcode is dense and context‑dependent, many instructions need a second PLA pass after the ModR/M byte is known. For example, the instruction 8B 44 24 08 (MOV EAX,[ESP+8]) is resolved in two passes, ending up at microcode address 0x019.
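To make the byte sequence in that example concrete, here is a minimal sketch that splits the ModR/M and SIB bytes into their fields and prints the operand form. It handles only opcode 8B with 32‑bit addressing and no scaled index. The function names are hypothetical, and the sketch does not model the Control or Entry PLA or derive the microcode entry address.

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def split_233(byte: int) -> tuple:
    """Split a byte into 2-3-3 bit fields.

    ModR/M uses them as (mod, reg, rm); SIB uses them as (scale, index, base).
    """
    return (byte >> 6) & 0b11, (byte >> 3) & 0b111, byte & 0b111


def decode_mov_r32_rm32(code: bytes) -> str:
    """Decode a subset of 'MOV r32, r/m32' (opcode 8B), 32-bit addressing."""
    if code[0] != 0x8B:
        raise ValueError("only opcode 8B is handled in this sketch")
    mod, reg, rm = split_233(code[1])
    if mod == 0b11:  # register-to-register form, no memory operand
        return f"MOV {REG32[reg]},{REG32[rm]}"
    pos = 2
    if rm == 0b100:  # rm=100 means a SIB byte follows
        _scale, index, base = split_233(code[pos])
        pos += 1
        if index != 0b100:  # index=100 means "no index register"
            raise NotImplementedError("scaled index not modeled")
        rm = base
    if mod == 0b00 and rm == 0b101:
        raise NotImplementedError("disp32-only form not modeled")
    disp_size = {0b00: 0, 0b01: 1, 0b10: 4}[mod]
    disp = int.from_bytes(code[pos:pos + disp_size], "little", signed=True)
    operand = REG32[rm] if disp == 0 else f"{REG32[rm]}{disp:+d}"
    return f"MOV {REG32[reg]},[{operand}]"


print(decode_mov_r32_rm32(bytes.fromhex("8B442408")))  # MOV EAX,[ESP+8]
```

The opcode byte alone does not reveal that a SIB byte and a displacement follow; that only becomes known once ModR/M (0x44: mod=01, rm=100) has been examined, which is consistent with the need for a second pass.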
Microcode sequencer – the heart of the machine
Each micro‑instruction is 37 bits wide and is split into fields such as source → dest, alu_src, alu/jump, op, sub, and bus. The sequencer steps through these words, applying the specified register moves, ALU operations, and bus cycles. A key characteristic of the 386 microcode is the delay slot that follows every branch or RNI signal. The slot executes before the branch takes effect, which is why a simple register‑to‑register move always costs two cycles.
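The delay‑slot behavior can be sketched with a small software sequencer. The `MicroOp` record below is a hypothetical stand‑in, not the real 37‑bit word layout, and the model assumes a delay slot never contains a jump or RNI of its own.

```python
from typing import NamedTuple, Optional


class MicroOp(NamedTuple):
    label: str
    jump_to: Optional[int] = None  # hypothetical: absolute index of the target
    rni: bool = False              # "run next instruction" signal


def run(program, start=0, max_cycles=32):
    """Return the labels of the micro-ops executed, one per cycle."""
    DONE = -1
    trace = []
    pc = start
    pending = None  # redirect armed by the previous micro-op
    for _ in range(max_cycles):
        if pc == DONE or not 0 <= pc < len(program):
            break
        op = program[pc]
        trace.append(op.label)
        # If the previous op armed a redirect, this op was its delay slot.
        next_pc = pc + 1 if pending is None else pending
        pending = None
        if op.rni:
            pending = DONE  # instruction ends after the delay slot
        elif op.jump_to is not None:
            pending = op.jump_to
        pc = next_pc
    return trace


move = [MicroOp("move+RNI", rni=True), MicroOp("delay slot"), MicroOp("not reached")]
branch = [
    MicroOp("jump", jump_to=3),
    MicroOp("slot after jump"),
    MicroOp("skipped"),
    MicroOp("target+RNI", rni=True),
    MicroOp("final slot"),
]
print(run(move))    # ['move+RNI', 'delay slot']
print(run(branch))  # ['jump', 'slot after jump', 'target+RNI', 'final slot']
```

Even the shortest routine in this model occupies two cycles, because the micro‑op after the RNI always executes.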
The sequencer also respects the Protection PLA: after a selector check, the PLA may redirect execution, but the redirect only becomes visible three cycles later. Those three micro‑ops can be used for useful housekeeping (e.g., writing back a register) before the fault handler takes over.
Cache – making the FPGA version fast enough
The real 386 never had an on‑chip L1 cache; high‑end boards used external SRAM caches of 32 KB–128 KB. Running directly from SDRAM on an FPGA caused high CPI and contention with the prefetch unit. The solution is a 16 KB, 4‑way set‑associative VIPT cache:
- Line size – 16 bytes (four DWORDs)
- Associativity – 4‑way, giving 256 sets (1,024 lines in total, 4 KB per way)
- Policy – PLRU replacement, write‑through, read‑allocate
- Write buffer – 2 entries to hide SDRAM latency
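The list above names PLRU without saying which variant is used. As an illustration only, the sketch below assumes the common tree‑PLRU scheme, which needs three bits per 4‑way set; the class name and bit assignment are hypothetical and may differ from what z386 implements.

```python
class TreePLRU4:
    """Tree pseudo-LRU state for one 4-way set (3 bits).

    b0 selects the pair holding the next victim (0: ways 0-1, 1: ways 2-3);
    b1 selects the victim within ways 0-1, b2 within ways 2-3.
    """

    def __init__(self):
        self.b0 = self.b1 = self.b2 = 0

    def touch(self, way: int) -> None:
        """Record a hit or fill: point every bit on the path away from `way`."""
        if way < 2:
            self.b0 = 1          # next victim comes from the other pair
            self.b1 = 1 - way    # and, within this pair, from the other way
        else:
            self.b0 = 0
            self.b2 = 3 - way    # way 2 -> 1 (points at way 3), way 3 -> 0

    def victim(self) -> int:
        """Follow the bits from the root to the way to replace next."""
        if self.b0 == 0:
            return self.b1
        return 2 + self.b2


plru = TreePLRU4()
victims = []
for way in (0, 2, 1, 3):
    plru.touch(way)
    victims.append(plru.victim())
print(victims)  # [2, 1, 3, 0]
```

Three bits per set are much cheaper in FPGA fabric than true LRU ordering, which is the usual reason to pick a pseudo‑LRU policy.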
Because the set index and the line offset together fit inside the 4 KB page offset, those address bits are identical before and after translation. The cache can therefore start a virtually indexed, physically tagged (VIPT) lookup while the paging unit is still translating the address. The tag comparison happens a cycle later, and a hit returns data with zero wait states. Misses trigger a burst read from SDRAM; the requested word is forwarded as soon as it arrives, while the rest of the line continues filling.
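The arithmetic behind that property follows from the stated geometry (16 KB, 4‑way, 16‑byte lines). The sketch below derives the field widths and checks that a virtual address and a physical address sharing a page offset select the same set. The two example addresses are made up for illustration.

```python
CACHE_BYTES = 16 * 1024
WAYS = 4
LINE_BYTES = 16
PAGE_BYTES = 4096

LINES = CACHE_BYTES // LINE_BYTES           # 1024 lines in total
SETS = LINES // WAYS                        # 256 sets, i.e. 256 lines per way
OFFSET_BITS = LINE_BYTES.bit_length() - 1   # 4
INDEX_BITS = SETS.bit_length() - 1          # 8
PAGE_OFFSET_BITS = PAGE_BYTES.bit_length() - 1  # 12


def split_address(addr: int) -> tuple:
    """Return (tag, set index, byte offset) for a 32-bit address."""
    offset = addr & (LINE_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


# Index + offset must not exceed the page offset for VIPT to be alias-free.
assert OFFSET_BITS + INDEX_BITS <= PAGE_OFFSET_BITS

virtual = 0x00403A7C   # hypothetical linear address
physical = 0x0009BA7C  # hypothetical translation: same low 12 bits
v_tag, v_index, v_offset = split_address(virtual)
p_tag, p_index, p_offset = split_address(physical)
print(hex(v_index), hex(p_index))  # 0xa7 0xa7
print(hex(p_tag))                  # 0x9b
```

Only the tag differs between the two addresses, so the set can be read using untranslated bits and the physical tag compared once the TLB result is available.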
Testing methodology
- Real‑mode fuzzing – The SingleStepTests/80386 suite from gloriouscow runs one instruction at a time, comparing registers, flags, memory, and exceptions against a reference emulator. It catches early bugs before any BIOS is involved.
- Protected‑mode fuzzing – A custom harness built on 86Box runs the same single‑step tests in protected mode, exercising paging, privilege checks, and fault handling.
- Hand‑crafted corner cases – Programs that trigger call gates, VM86 transitions, interrupt‑shadow behavior, and prefetch flush loops were written to verify subtle contracts.
- Full‑system integration – SeaBIOS, FreeDOS, HIMEM, EMM386, DOS/4GW, DOS/32A, and classic games (Doom, FastDoom, Cannon Fodder) serve as end‑to‑end validation. Debug output is routed through an I/O port to make early boot visible.
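The single‑step comparisons in the first two items boil down to diffing two CPU state snapshots after one instruction. A minimal sketch of such a comparison is shown below; the snapshot layout (`regs`, `ram`, `exception`) is an assumption made for illustration and is not the actual file format of the SingleStepTests suite or the 86Box harness.

```python
def fmt(value) -> str:
    """Show integers in hex, anything else (e.g. None) via repr."""
    return hex(value) if isinstance(value, int) else repr(value)


def diff_state(expected: dict, actual: dict) -> list:
    """Return human-readable mismatches between two CPU state snapshots."""
    mismatches = []
    for group in ("regs", "ram"):
        exp = expected.get(group, {})
        act = actual.get(group, {})
        for key in sorted(exp):
            if act.get(key) != exp[key]:
                name = key if isinstance(key, str) else hex(key)
                mismatches.append(
                    f"{group}[{name}]: expected {fmt(exp[key])}, got {fmt(act.get(key))}"
                )
    if expected.get("exception") != actual.get("exception"):
        mismatches.append(
            f"exception: expected {fmt(expected.get('exception'))}, "
            f"got {fmt(actual.get('exception'))}"
        )
    return mismatches


# Hypothetical snapshots: the core under test computed the wrong flags.
reference = {"regs": {"eax": 0x10, "eflags": 0x46}, "ram": {0x1000: 0xAA}, "exception": None}
under_test = {"regs": {"eax": 0x10, "eflags": 0x06}, "ram": {0x1000: 0xAA}, "exception": None}
report = diff_state(reference, under_test)
print(report)  # ['regs[eflags]: expected 0x46, got 0x6']
```

Reporting every mismatch rather than stopping at the first one makes it easier to tell a flag bug from a wrong result that happens to disturb the flags as well.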
How z386 compares to ao486
| Feature | z386 (386‑style) | ao486 (486‑style) |
|---|---|---|
| Organization | Large cooperating units, coarse‑grained control | Fine‑grained pipeline stages |
| Control model | Original Intel microcode ROM drives hardware | Staged command flow, hand‑written RTL |
| Front‑end | 16‑byte raw queue, 3‑entry decoded FIFO | 32‑byte raw queue, instruction aligner feeding D1/D2 pipeline |
| Memory model | Segmentation, paging, VIPT L1 cache, explicit bus contracts | Similar pieces, but pipelined differently |
| Performance risk | High CPI, Fmax pressure due to coarse steps | Pipeline hazards, forwarding logic |
| Typical CPI | ~2–3 for simple ops (real 386) | ~1.5 on modern FPGA implementations |
Both cores run on a range of FPGA families (Cyclone V, Gowin GW5A, etc.), but they illustrate two very different philosophies for resurrecting legacy x86 hardware.
Current status and outlook
- Clock – 85 MHz on a Cyclone V (90 MHz on ao486 for reference)
- Performance – 3D Bench scores 34 FPS (43 FPS on ao486); Doom runs at 16.5 FPS at maximum detail (21 FPS on ao486)
- Software – DOS 6/7 boots, DOS extenders work, classic games run, but Windows still fails to start.
- Future work – Expand cache size, tighten timing for higher clock rates, improve protected‑mode coverage, and add support for modern peripherals (USB, SATA) via soft‑IP cores.
Why it matters
The 80386 was the first x86 that made 32‑bit protected mode practical, opening the door for OS/2, Windows NT, and eventually Linux. By reconstructing the chip with its original microcode, z386 provides a reference implementation that is both historically accurate and usable for modern retro‑computing. It also demonstrates how recovered microcode can serve as a high‑level specification, turning a dense, hand‑tuned control program into a functional hardware model.
Resources
- Project repository – https://github.com/nand2mario/z386
- Microcode disassembly – https://github.com/reenigne/80386-microcode
- Full write‑up series – https://nand2mario.github.io/posts/2026/z386/
- Live demo video – https://youtu.be/xxxxxx (Doom running on a DE10‑Nano)
The author is a regular contributor to the retro‑computing community and can be followed on X @nand2mario.
