Amateur AI Research: Training Transformers on a Laptop with OpenAI's Codex
When OpenAI released its Codex agent framework, promising to automate coding and research workflows, one developer posed an unconventional challenge: What's the strongest AI model you can train on a laptop in just five minutes? The experiment—dubbed "vibe research"—reveals fascinating insights about small-scale AI training and the emerging human-agent collaboration paradigm.
The Vibe Research Methodology
The researcher employed a tight feedback loop with Codex:
- Agent-Driven Experimentation: Codex modified training scripts and executed 3-4 runs per iteration (~20 minutes total)
- Hypothesis Generation: The AI suggested 2-3 next steps based on results
- Human Steering: The researcher selected directions (occasionally proposing alternatives)
"It's performing a difficult technical task by relying on the model. I have a broad intuitive sense of approaches but not deep enough understanding to do this unassisted," the developer noted about their "vibe research" approach.
Training Breakthroughs and Pitfalls
Initial Attempts
- N-gram Models: Fast to train (seconds) but produced incoherent remixes of the training data (perplexity: 18.5); a rough sketch follows this list
- Pure Transformers: Reached 8.53 perplexity but suffered from high variance across training seeds
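For context, an n-gram baseline of this kind takes only a few lines of code. The sketch below is an illustrative word-level version with add-alpha smoothing; the smoothing constant and tokenization are assumptions for illustration, not the author's choices.

```python
import math
from collections import Counter

def train_ngram(tokens, n=3):
    """Count n-gram and context occurrences over a token list."""
    ngrams, contexts = Counter(), Counter()
    for i in range(len(tokens) - n + 1):
        ctx, word = tuple(tokens[i:i + n - 1]), tokens[i + n - 1]
        ngrams[(ctx, word)] += 1
        contexts[ctx] += 1
    return ngrams, contexts

def perplexity(tokens, ngrams, contexts, vocab_size, n=3, alpha=0.1):
    """Perplexity under add-alpha smoothing (alpha is an assumed choice)."""
    total_log_prob, count = 0.0, 0
    for i in range(n - 1, len(tokens)):
        ctx, word = tuple(tokens[i - n + 1:i]), tokens[i]
        p = (ngrams[(ctx, word)] + alpha) / (contexts[ctx] + alpha * vocab_size)
        total_log_prob += math.log(p)
        count += 1
    return math.exp(-total_log_prob / count)
```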
The Perplexity Trap
After implementing the "shallow fusion" technique Codex suggested (blending the transformer's predictions with n-gram and kNN heads), perplexity dropped to 7.38, but output quality worsened:
"Once upon a time,, in a small house... Tim tried to climb the tree, but he was too big. He was too small..."
The lesson? Perplexity alone is a poor quality metric for small models.
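Shallow fusion here means mixing the transformer's next-token distribution with the n-gram and kNN distributions at inference time. A minimal sketch, with made-up mixture weights rather than the values used in the experiment:

```python
import torch

def fused_next_token_probs(transformer_logits, ngram_probs, knn_probs,
                           w_lm=0.7, w_ngram=0.2, w_knn=0.1):
    """Blend the transformer's distribution with n-gram and kNN heads.

    All three inputs cover the same vocabulary; the weights are illustrative
    assumptions, not the values from the original experiment.
    """
    lm_probs = torch.softmax(transformer_logits, dim=-1)
    fused = w_lm * lm_probs + w_ngram * ngram_probs + w_knn * knn_probs
    return fused / fused.sum(dim=-1, keepdim=True)  # renormalize the mixture
```

Interpolating this way can lower measured perplexity while making sampled text worse, which is exactly the trap described above.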
Distillation Innovation
The winning approach distilled knowledge from n-gram models into transformers:
1. Train n-gram teacher (10 seconds)
2. Warm-start transformer using n-gram predictions (200 steps; see the sketch after the sample output below)
3. Continue training on original data
This yielded dramatically improved coherence:
"Once upon a time, in a big forest, there lived a little bunny named Ben... They played together all day long. The moral of the story is to help others when they needed it."
Why This Matters for Developers
- Shortcutting Early Learning: N-gram distillation accelerates grammar acquisition, freeing compute for semantic learning
- Agent-Augmented Research: Codex efficiently explored hyperparameter spaces impractical for manual testing
- Hardware Democratization: Shows that meaningful experimentation is possible on consumer devices
As large labs chase trillion-parameter models, this experiment highlights untapped potential in optimized small-scale architectures—and the emerging reality of AI-assisted research.
Source: AI Research with Codex by Sean Goedecke