
Inside the 80386: How a community reverse‑engineered a 95‑kilobit microcode ROM

AI & ML Reporter

A group of hobbyists extracted, decoded, and documented the full microcode of Intel’s 80386 CPU, revealing 215 instruction entry points, no hidden opcodes, and a possible long‑standing bug in the I/O permission check.

What was claimed

In a recent post on the Reenigne blog, the author reports that a small team has finally produced a complete, human‑readable disassembly of the 80386’s microcode ROM (94 720 bits, roughly 12 KB). The effort supposedly answers several long‑standing questions: the exact number of micro‑coded instruction entry points, whether any instructions are implemented outside the microcode, and whether any undocumented “easter‑egg” behaviour exists. The author also points to a potential security flaw in the I/O‑permission bitmap handling.

What is actually new

1. Extraction pipeline

The team combined high‑resolution die photographs, image‑processing scripts, and a modest amount of AI‑assisted pattern recognition to turn the physical ROM image into a clean binary blob. This step alone is noteworthy because the 80386’s microcode is stored in a dense, irregular layout that differs from the 8086’s more straightforward ROM.
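As a rough illustration of the image‑to‑bits step, the sketch below samples a grayscale image at regular grid positions and thresholds each sample. The post does not describe the team's actual scripts, so the grid origin, cell pitch, threshold, and the convention that a bright cell reads as 1 are all assumptions made for this example.

```python
from typing import List


def sample_rom_bits(image: List[List[int]], origin_x: int, origin_y: int,
                    pitch_x: int, pitch_y: int, rows: int, cols: int,
                    threshold: int = 128) -> List[List[int]]:
    """Read one bit per ROM cell by thresholding the pixel at each grid point.

    Assumption: cells sit on a regular grid and bright pixels mean 1.
    """
    bits = []
    for r in range(rows):
        y = origin_y + r * pitch_y
        row_bits = []
        for c in range(cols):
            x = origin_x + c * pitch_x
            row_bits.append(1 if image[y][x] >= threshold else 0)
        bits.append(row_bits)
    return bits


def pack_bits(bits: List[List[int]]) -> bytes:
    """Flatten the bit grid row by row and pack it MSB-first into bytes."""
    flat = [b for row in bits for b in row]
    flat += [0] * (-len(flat) % 8)  # pad the tail to a whole byte
    out = bytearray()
    for i in range(0, len(flat), 8):
        byte = 0
        for b in flat[i:i + 8]:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


# Synthetic 4x16 "photo": each ROM cell covers a 2x2 pixel square.
pattern = [[1, 0, 1, 0, 1, 0, 1, 0],
           [0, 0, 0, 0, 1, 1, 1, 1]]
image = [[255 if pattern[y // 2][x // 2] else 0 for x in range(16)]
         for y in range(4)]

recovered = sample_rom_bits(image, origin_x=1, origin_y=1,
                            pitch_x=2, pitch_y=2, rows=2, cols=8)
blob = pack_bits(recovered)
```

Packed this way, the 94 720 bits reported for the 386 ROM come to 11 840 bytes. A real die photo would also need alignment, distortion correction, and manual review of ambiguous cells, none of which this sketch attempts.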

2. Micro‑op layout discovery

By analysing recurring bit‑patterns they identified a two‑dimensional organization: one axis enumerates individual μ‑ops, the other encodes the fields of each μ‑op (source register, destination, ALU function, control flags, etc.). A block of unused μ‑ops at the end of the ROM provided a reliable read‑order reference, allowing the team to reconstruct the sequential flow of each instruction’s micro‑program.
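The two‑axis organization can be sketched as a bit matrix that is transposed into one word per μ‑op and then sliced into fields. The field names, bit positions, and widths below are hypothetical placeholders chosen for the example; the real layout is whatever the project's fields.txt documents.

```python
# Hypothetical field layout: name -> (first bit, width). Not the real 386 layout.
FIELDS = {
    "src": (0, 4),
    "dst": (4, 4),
    "alu": (8, 3),
    "flags": (11, 2),
}
WORD_BITS = 13  # total width implied by the hypothetical layout above


def uop_words(bit_matrix):
    """Turn a matrix indexed [bit][uop] into one list of bits per micro-op."""
    return [list(column) for column in zip(*bit_matrix)]


def decode_uop(word_bits):
    """Slice one micro-op word into named integer fields (MSB first)."""
    fields = {}
    for name, (start, width) in FIELDS.items():
        value = 0
        for bit in word_bits[start:start + width]:
            value = (value << 1) | bit
        fields[name] = value
    return fields


# Two micro-ops: the first carries sample values, the second is all zeros,
# like the unused block that served as a read-order reference.
first = [0, 1, 0, 1,  0, 0, 1, 1,  1, 1, 0,  0, 1]
matrix = [[bit, 0] for bit in first]  # rows are bit positions, columns are uops

words = uop_words(matrix)
decoded = decode_uop(words[0])
```

Here `decoded` is `{"src": 5, "dst": 3, "alu": 6, "flags": 1}`, and the second word decodes to all zeros.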

3. Full instruction mapping

The disassembly shows 215 distinct entry points in the decoding ROM, a substantial jump from the 60 entry points of the 8086. The increase comes not just from new opcodes (e.g., the protected‑mode instructions and the bit‑test and bit‑scan instructions) but also from multiple micro‑routines for the same opcode that depend on operand type, addressing mode, and CPU mode. Every documented 80386 instruction – including the complex string and privilege‑level operations – has a corresponding microcode routine; there are no “hard‑wired” paths that bypass the microcode.
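One way to picture why entry points outnumber opcodes is a lookup keyed on the opcode plus its decode context. The sketch below is only a model: the key fields and every micro‑address in the table are invented for illustration and do not come from the disassembly.

```python
from typing import Dict, NamedTuple


class DecodeKey(NamedTuple):
    opcode: int            # first opcode byte
    protected_mode: bool   # CPU mode at decode time
    memory_operand: bool   # register vs. memory form of the operand


# Hypothetical table: opcode 0x8E (MOV Sreg, r/m) gets a separate routine per
# mode and operand form. The micro-addresses are made up.
ENTRY_POINTS: Dict[DecodeKey, int] = {
    DecodeKey(0x8E, False, False): 0x040,
    DecodeKey(0x8E, False, True):  0x044,
    DecodeKey(0x8E, True,  False): 0x1A0,
    DecodeKey(0x8E, True,  True):  0x1A8,
}


def lookup_entry(opcode: int, protected_mode: bool, memory_operand: bool) -> int:
    """Return the micro-address where the routine for this context starts."""
    return ENTRY_POINTS[DecodeKey(opcode, protected_mode, memory_operand)]


real_mode_entry = lookup_entry(0x8E, protected_mode=False, memory_operand=False)
protected_entry = lookup_entry(0x8E, protected_mode=True, memory_operand=False)
distinct_opcodes = {key.opcode for key in ENTRY_POINTS}
```

In this toy table a single opcode accounts for four entry points, which is the same effect the article describes at the scale of the full decoding ROM.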

4. Accelerator integration

Unlike the 8086, where many operations are expressed entirely in microcode, the 80386 delegates work to dedicated hardware blocks (multiply/divide units, barrel shifter, protection‑test PLA). The disassembly clarifies how the microcode sets up these accelerators: loading operands, triggering the unit, and routing the result back to the register file. This helps explain how the 386 gets more work done per clock than the 8086 without the speculative‑execution techniques that arrived in later processors.
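The load, trigger, and write‑back pattern can be modelled in a few lines. The `MultiplyUnit` class and its `load`/`start` interface are hypothetical stand‑ins, not the 386's real signals; only the architectural result of a 32‑bit unsigned MUL (EDX:EAX = EAX × operand) is taken as given.

```python
class MultiplyUnit:
    """Stand-in for a hardware multiplier with two operand latches.

    The interface is an assumption for illustration only.
    """

    def __init__(self):
        self.a = 0
        self.b = 0
        self.result = None

    def load(self, a: int, b: int) -> None:
        self.a, self.b = a, b
        self.result = None  # any previous result is no longer valid

    def start(self) -> None:
        # Real hardware takes several clocks; the model completes at once.
        self.result = (self.a * self.b) & 0xFFFFFFFFFFFFFFFF


def run_mul_microroutine(regs: dict, unit: MultiplyUnit) -> dict:
    """Model of MUL ECX as three conceptual micro-steps."""
    unit.load(regs["EAX"], regs["ECX"])      # 1. load operands
    unit.start()                             # 2. trigger the unit
    regs["EAX"] = unit.result & 0xFFFFFFFF   # 3. route the low half back
    regs["EDX"] = unit.result >> 32          #    and the high half
    return regs


regs = run_mul_microroutine({"EAX": 0xFFFFFFFF, "ECX": 2, "EDX": 0},
                            MultiplyUnit())
```

The point of the sketch is the division of labour: the micro‑routine only moves data and pulses a start signal, while the arithmetic itself happens inside the dedicated unit.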

5. Potential bug in I/O‑permission handling

The authors highlight a subtle discrepancy in the routine that checks the I/O permission bitmap for 4‑byte port accesses. The microcode appears to verify only the first three bytes, potentially allowing the fourth byte to slip through when an access straddles the boundary of a permitted range. If the CPU were running in a protected‑mode OS that relied on the bitmap for isolation, a malicious user‑mode process could write to an unexpected hardware register. The issue is only observable on CPUs that still implement the original 386‑style bitmap logic; later silicon revisions may have corrected it.
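The difference between the architectural rule and the reported behaviour can be sketched as follows. The first function follows the documented rule that an access is allowed only if every bitmap bit it covers is 0. The second is a hypothetical model of the flaw as the article describes it, not a transcription of the microcode.

```python
def _bit(bitmap: bytes, port: int) -> int:
    """Return the permission bit for one port (bit port%8 of byte port//8)."""
    byte_index, bit_index = divmod(port, 8)
    if byte_index >= len(bitmap):
        return 1  # ports beyond the bitmap are treated as denied
    return (bitmap[byte_index] >> bit_index) & 1


def io_allowed(bitmap: bytes, port: int, size: int) -> bool:
    """Architectural rule: all bits for port..port+size-1 must be 0."""
    return all(_bit(bitmap, port + i) == 0 for i in range(size))


def io_allowed_suspect(bitmap: bytes, port: int, size: int) -> bool:
    """Hypothetical model of the reported flaw: only the first three bits
    are tested, so the fourth byte of a 4-byte access goes unchecked."""
    return all(_bit(bitmap, port + i) == 0 for i in range(min(size, 3)))


# Deny everything, then permit ports 0x3F8..0x3FA; 0x3FB stays denied.
bitmap = bytearray(b"\xff" * 128)
for p in (0x3F8, 0x3F9, 0x3FA):
    bitmap[p // 8] &= ~(1 << (p % 8)) & 0xFF
bitmap = bytes(bitmap)

correct = io_allowed(bitmap, 0x3F8, 4)          # False: 0x3FB is denied
suspect = io_allowed_suspect(bitmap, 0x3F8, 4)  # True: 0x3FB never checked
```

A 4‑byte access at 0x3F8 is the boundary case: the correct check rejects it, while the suspect model lets it through. A test run on real silicon would need exactly this kind of vector, with only the last byte falling outside the permitted range.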

Limitations and open questions

  • Verification on silicon – The analysis is based on a single die image and the extracted binary. Without a physical 386 to run test vectors, the I/O‑bitmap bug remains a hypothesis. Emulating the microcode in a cycle‑accurate simulator could provide stronger evidence, but such a simulator would need to model the accelerator interfaces faithfully.
  • Version differences – The ROM examined does not contain the XBTS/IBTS instructions, which existed only in the earliest 386 steppings and were removed from later ones. How those opcodes were implemented is therefore not visible in this dump; researchers would need to repeat the extraction on an early stepping to compare.
  • Unused μ‑op block – The range 0x849‑0x856 is marked “unused?” and resembles the page‑fault handler, yet it is never referenced. It may be leftover code from an early development prototype or a fallback path that never made it into the final silicon.
  • Tooling maturity – The pipeline relied on a mix of custom scripts and manual inspection. Packaging the workflow into a reusable open‑source toolchain would lower the barrier for similar projects on other legacy CPUs.

Practical takeaways

  1. Microcode is not a black box – Even for a CPU introduced in 1985, a determined community can reconstruct the full control store, opening the door to cycle‑accurate emulation and security research.
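  2. Legacy bugs can persist – The I/O permission issue, if confirmed, shows that undocumented microcode behaviour can still matter wherever original 386 silicon is in use, and it gives authors of emulators (e.g., DOSBox or legacy virtualization environments) a documented quirk to decide whether to reproduce.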
  3. Documentation matters – The accompanying fields.txt, parts.txt, and the series of blog posts by nand2mario provide a valuable map for anyone wanting to understand the 386’s internal datapaths.

Where to find the data

The complete disassembly, along with supporting files, is hosted in the public x86‑microcode GitHub repository.

Conclusion

The project demystifies the 80386’s control store, confirming that every documented instruction is micro‑coded, exposing an apparently unused block of μ‑ops, and flagging a possible decades‑old flaw in the I/O permission check that still awaits verification on silicon. While the work is primarily of historical and academic interest, it also serves as a template for reverse‑engineering other legacy micro‑architectures and reminds us that even mature hardware can hide subtle bugs.
