Rebuilding AlphaGo from Scratch: What Eric Jang’s Project Really Shows
Eric Jang describes how he recreated AlphaGo using modern tools, what the implementation adds beyond the original papers, and where the approach still falls short. This article breaks down the claimed breakthroughs, the concrete technical contributions, and the practical limits of a hobby‑scale Go AI.
[Featured image: a chalkboard sketch of Monte‑Carlo Tree Search]
What’s claimed
In a recent interview, Eric Jang (formerly VP of AI at 1X Technologies and senior researcher at DeepMind Robotics) walked through his open‑source recreation of AlphaGo. The headline‑grabbing claims are:
- Full AlphaGo pipeline – from board encoding to Monte‑Carlo Tree Search (MCTS) and self‑play, all implemented with a few thousand dollars of cloud compute.
- Modern tooling – using large language models (LLMs) for code generation, hyper‑parameter tuning, and automated experiment management.
- Performance comparable to open‑source bots such as KataGo, despite a dramatically smaller research budget.
These points make it sound as if the original AlphaGo research “was just a matter of engineering” and that anyone can now reproduce it with off‑the‑shelf components.
What’s actually new
1. A clean, well‑documented reference implementation
Jang’s repo (see the GitHub repository) provides:
- A minimal ResNet architecture whose shared trunk feeds joint policy and value heads.
- A compact MCTS loop that follows the PUCT rule used in AlphaGo Zero.
- A self‑play training script that alternates between data collection, supervised updates, and periodic evaluation.
The code is deliberately simplified compared to the production‑grade DeepMind stack: no distributed replay buffers, no asynchronous learners, and a single‑GPU training loop. This makes the system approachable for students and hobbyists.
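To make the PUCT rule concrete, here is a minimal sketch of the child‑selection step, assuming each tree node stores per‑child visit counts `N`, total action values `W`, and a prior `P` from the policy head. The attribute names are illustrative, not taken from Jang’s code:

```python
import math

def puct_select(node, c_puct=1.5):
    """Pick the child maximizing Q + U, as in the AlphaGo Zero PUCT rule.

    Assumes each child tracks: N (visit count), W (total value),
    and P (prior probability from the policy head).
    """
    total_visits = sum(child.N for child in node.children)
    best_score, best_child = -float("inf"), None
    for child in node.children:
        q = child.W / child.N if child.N > 0 else 0.0                   # mean action value
        u = c_puct * child.P * math.sqrt(total_visits) / (1 + child.N)  # exploration bonus
        if q + u > best_score:
            best_score, best_child = q + u, child
    return best_child
```

The exploration term `u` shrinks as a child accumulates visits, so search effort shifts from the prior’s suggestions toward moves whose estimated value holds up.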
2. Leveraging LLMs for the engineering pipeline
Jang used Claude‑4.6/4.7 to:
- Generate boilerplate Go engine code.
- Propose hyper‑parameter sweeps (learning‑rate schedules, network depth, number of simulations per move).
- Write experiment‑automation scripts that automatically plot win‑rate curves and log training statistics.
The LLM‑driven workflow is not a scientific contribution to Go AI itself, but it does illustrate how today’s coding assistants can shave weeks off the engineering effort.
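For illustration, here is a hedged sketch of what such a sweep runner might look like; the parameter grid and the `train_one_run` callback are hypothetical, not from Jang’s repo:

```python
import itertools
import json
from pathlib import Path

# Hypothetical sweep grid over the knobs mentioned above.
SWEEP = {
    "learning_rate": [1e-3, 3e-4, 1e-4],
    "num_blocks": [3, 6, 10],        # ResNet depth
    "sims_per_move": [200, 800],     # MCTS simulation budget
}

def run_sweep(train_one_run, out_dir="sweeps"):
    """Run every configuration in the grid and log results to JSON files."""
    Path(out_dir).mkdir(exist_ok=True)
    keys = list(SWEEP)
    for values in itertools.product(*SWEEP.values()):
        cfg = dict(zip(keys, values))
        result = train_one_run(**cfg)  # assumed to return e.g. {"win_rate": ...}
        name = "_".join(f"{k}={v}" for k, v in cfg.items())
        (Path(out_dir) / f"{name}.json").write_text(json.dumps({**cfg, **result}))
```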
3. Empirical scaling observations on a tiny budget
By training on a $10 k cloud budget, Jang demonstrated that:
- A 3‑layer ResNet with ~2 M parameters can reach KataGo‑level strength on 9×9 boards after a few hundred thousand self‑play games.
- Pre‑training on small boards (9×9) and transferring to 19×19 dramatically shortens the warm‑up phase – a practical tip for anyone building a new Go bot (see the sketch below).
These findings line up with the “Bitter Lesson” (general methods that scale with compute ultimately beat hand‑crafted domain tricks), but they are empirical anecdotes, not a systematic study of scaling laws.
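One reason the 9×9‑to‑19×19 transfer works is that a fully convolutional policy/value network is agnostic to board size: the policy head emits one logit per board point and the value head pools globally, so 9×9 weights load directly for 19×19. A minimal PyTorch sketch under those assumptions (layer sizes are illustrative, not Jang’s actual architecture):

```python
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    """Plain residual block: two 3x3 convs with a skip connection."""
    def __init__(self, c):
        super().__init__()
        self.conv1 = nn.Conv2d(c, c, 3, padding=1)
        self.conv2 = nn.Conv2d(c, c, 3, padding=1)

    def forward(self, x):
        h = torch.relu(self.conv1(x))
        return torch.relu(x + self.conv2(h))

class PolicyValueNet(nn.Module):
    """Board-size-agnostic policy/value net: every layer is convolutional
    or globally pooled, so weights trained on 9x9 load directly for 19x19.
    The pass move is omitted for brevity."""
    def __init__(self, in_planes=17, channels=64, blocks=3):
        super().__init__()
        self.stem = nn.Conv2d(in_planes, channels, 3, padding=1)
        self.trunk = nn.Sequential(*[ResBlock(channels) for _ in range(blocks)])
        self.policy_head = nn.Conv2d(channels, 1, 1)   # one logit per board point
        self.value_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),     # global pooling: size-agnostic
            nn.Linear(channels, 1), nn.Tanh())         # scalar value in [-1, 1]

    def forward(self, x):                              # x: (B, in_planes, H, W)
        h = self.trunk(torch.relu(self.stem(x)))
        policy_logits = self.policy_head(h).flatten(1) # (B, H*W) for any H, W
        value = self.value_head(h).squeeze(-1)         # (B,)
        return policy_logits, value
```

Under this design, the same `state_dict` trained on 9×9 can initialize a 19×19 run; only the self‑play data pipeline needs to change.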
Limitations and open challenges
| Aspect | Limitation |
|---|---|
| Compute efficiency | The original AlphaGo Zero used tens of thousands of TPU‑hours per training run. Jang’s implementation cuts that down by orders of magnitude, but the resulting bot still needs hundreds of GPU‑hours to become competitive on 19×19. This is far from “run on a laptop”. |
| Search depth | MCTS runs on a modest simulation budget per move (typically 800–2,000 playouts). Without the massive parallelism of DeepMind’s custom hardware, the bot cannot explore the same breadth of the game tree as the original system. |
| Policy/value architecture | Jang sticks with a ResNet because it works well on modest data. Recent work (e.g., KataGo) shows that carefully engineered attention‑based heads can squeeze extra performance, which is not explored here. |
| Off‑policy data handling | The replay buffer contains states that the current policy would never visit. Jang notes that this can degrade learning, but the implementation does not include advanced techniques such as importance sampling or prioritized replay that are common in modern RL pipelines (see the sketch after this table). |
| Evaluation against strong opponents | The bot is benchmarked mainly against KataGo at low strength settings. It has not been tested against other strong open‑source engines (e.g., Leela Zero), so the claim of “comparable strength” is provisional. |
| Generalization beyond Go | The project is a domain‑specific recreation. The lessons about using LLMs for code generation do not directly transfer to other combinatorial problems (e.g., StarCraft, protein folding) where the action space is vastly larger and the value function is harder to learn. |
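For context on the off‑policy row above, here is a minimal sketch of proportional prioritized replay in the style of Schaul et al. (2015); it is not part of Jang’s implementation:

```python
import numpy as np

class PrioritizedReplay:
    """Proportional prioritized replay: samples items with probability
    proportional to priority^alpha and corrects the bias with IS weights."""
    def __init__(self, capacity, alpha=0.6):
        self.capacity = capacity
        self.alpha = alpha
        self.data, self.priorities = [], []

    def add(self, item, priority=1.0):
        if len(self.data) >= self.capacity:   # drop the oldest entry
            self.data.pop(0)
            self.priorities.pop(0)
        self.data.append(item)
        self.priorities.append(priority ** self.alpha)

    def sample(self, batch_size, beta=0.4):
        probs = np.asarray(self.priorities)
        probs = probs / probs.sum()
        idx = np.random.choice(len(self.data), batch_size, p=probs)
        # Importance-sampling weights undo the non-uniform sampling bias.
        weights = (len(self.data) * probs[idx]) ** (-beta)
        weights = weights / weights.max()
        return [self.data[i] for i in idx], idx, weights
```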
Why the recreation matters (and why it doesn’t rewrite history)
- Educational value – A well‑commented, single‑machine implementation is an excellent teaching tool for students learning MCTS, self‑play, and the policy‑value network paradigm.
- Proof of concept for low‑budget research – Jang shows that with modern cloud pricing and LLM assistance, a solo researcher can reproduce a historically resource‑intensive result.
- No fundamental algorithmic breakthrough – The core ideas (PUCT, policy‑value head, self‑play) are exactly those introduced by the AlphaGo papers. The novelty lies in engineering convenience, not in new theory.
- Scalability remains an open problem – To push a bot from “strong amateur” to “world‑class” still requires massive compute, sophisticated distributed training, and many of the tricks (e.g., rollout‑policy mixing, dynamic simulation budgets) that Jang deliberately omitted for simplicity.
Take‑away for practitioners
- Start with a small board – Train on 9×9 first; the value head learns end‑game evaluation quickly and transfers to 19×19.
- Use a shared ResNet backbone – For modest budgets, a single network with two heads (policy & value) is more compute‑efficient than separate networks.
- Leverage LLMs for boilerplate – Let a coding assistant generate data‑loader scaffolding and experiment scripts, but verify the logic yourself; the models still make mistakes on subtle RL details.
- Monitor off‑policy drift – Periodically prune the replay buffer to keep states that the current policy actually visits; otherwise learning can stagnate (see the sketch after this list).
- Benchmark against multiple opponents – A single comparison to KataGo is insufficient for a rigorous claim of strength.
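As a hedged sketch of the pruning heuristic above: score each stored example by how likely the current policy is to replay the recorded move, and drop the stale tail. The `policy_prob` helper and the keep fraction are hypothetical:

```python
def prune_replay_buffer(buffer, policy_prob, keep_fraction=0.8):
    """Drop stored (state, move, outcome) examples the current policy has
    effectively abandoned, keeping the most on-policy `keep_fraction`.

    `policy_prob(state, move)` is a hypothetical helper returning the
    current network's probability of playing `move` in `state`.
    """
    scored = sorted(buffer, key=lambda ex: policy_prob(ex.state, ex.move),
                    reverse=True)
    keep = max(1, int(len(scored) * keep_fraction))
    return scored[:keep]
```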
Where to find the full project
- Code: https://github.com/ericjang/AutoGo (MIT‑licensed) – includes training scripts, a Dockerfile, and pretrained checkpoints for 9×9 and 19×19.
- Blog post with interactive visualisations: https://evjang.com/autogo
- Original AlphaGo papers:
- Mastering the game of Go with deep neural networks and tree search (Nature, 2016) – https://deepmind.com/research/case-studies/alphago
- Mastering the game of Go without human knowledge (Nature, 2017) – https://deepmind.com/research/case-studies/alphago-zero
- KataGo implementation: https://github.com/lightvector/KataGo – a strong open‑source baseline for comparison.
In short, Eric Jang’s recreation is a valuable educational resource that demystifies AlphaGo’s pipeline, but it does not overturn the fact that world‑class Go AI still demands massive compute and many engineering refinements beyond the core algorithm.
