When OpenAI released its Codex agent framework, promising to automate coding and research workflows, one developer posed an unconventional challenge: What's the strongest AI model you can train on a laptop in just five minutes? The experiment—dubbed "vibe research"—reveals fascinating insights about small-scale AI training and the emerging human-agent collaboration paradigm.

The Vibe Research Methodology

The researcher employed a tight feedback loop with Codex:

  1. Agent-Driven Experimentation: Codex modified training scripts and executed 3-4 runs per iteration (~20 minutes total)
  2. Hypothesis Generation: The AI suggested 2-3 next steps based on results
  3. Human Steering: The researcher selected directions (occasionally proposing alternatives)

"It's performing a difficult technical task by relying on the model. I have a broad intuitive sense of approaches but not deep enough understanding to do this unassisted," the developer noted about their "vibe research" approach.

Training Breakthroughs and Pitfalls

Initial Attempts

  • N-gram Models: Trained in seconds but produced incoherent remixes of the training data (perplexity: 18.5; see the sketch after this list)
  • Pure Transformers: Reached 8.53 perplexity but suffered from high variance across training seeds
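
For context, perplexity is just the exponential of the average per-token negative log-likelihood, and an n-gram model is little more than smoothed counts. A minimal sketch of both, with a toy whitespace tokenizer and placeholder corpus rather than the author's actual training script:

```python
import math
from collections import Counter

def train_bigram(tokens, k=0.1):
    """Count-based bigram LM with add-k smoothing."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    vocab = len(unigrams)

    def prob(prev, tok):
        # P(tok | prev) with add-k smoothing so unseen pairs get nonzero mass
        return (bigrams[(prev, tok)] + k) / (unigrams[prev] + k * vocab)

    return prob

def perplexity(prob, tokens):
    """exp of the average negative log-likelihood over the sequence."""
    nll = 0.0
    for prev, tok in zip(tokens, tokens[1:]):
        nll -= math.log(prob(prev, tok))
    return math.exp(nll / (len(tokens) - 1))

# Toy usage: whitespace "tokenizer" and a tiny placeholder corpus.
train = "once upon a time there was a little bunny".split()
heldout = "once upon a time there was a bunny".split()
model = train_bigram(train)
print(f"held-out perplexity: {perplexity(model, heldout):.2f}")
```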

The Perplexity Trap

After implementing the "shallow fusion" technique Codex suggested (blending transformer predictions with n-gram and kNN heads), perplexity dropped to 7.38, but output quality worsened:

"Once upon a time,, in a small house... Tim tried to climb the tree, but he was too big. He was too small..."

The lesson? Perplexity alone is a poor quality metric for small models.
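
For reference, shallow fusion here means interpolating the next-token distributions of several models at decode time. A rough sketch of the idea, where the blend weights and the three input distributions are illustrative assumptions rather than the experiment's actual values:

```python
import numpy as np

def shallow_fusion(transformer_logits, ngram_probs, knn_probs,
                   w_tf=0.7, w_ngram=0.2, w_knn=0.1):
    """Blend next-token distributions from three heads at decode time.

    Lower perplexity on the blend is no guarantee of better samples: the
    mixture can hedge toward high-frequency n-gram continuations that score
    well on average but read badly, which is the trap described above.
    """
    tf_probs = np.exp(transformer_logits - transformer_logits.max())
    tf_probs /= tf_probs.sum()                      # softmax over the vocab
    mixed = w_tf * tf_probs + w_ngram * ngram_probs + w_knn * knn_probs
    return mixed / mixed.sum()                      # renormalize

# Toy vocabulary of 4 tokens, just to show the call shape.
vocab = [",", "the", "Tim", "tree"]
logits = np.array([0.2, 1.5, 2.0, 0.1])            # transformer head
ngram = np.array([0.55, 0.30, 0.10, 0.05])         # n-gram head
knn = np.array([0.25, 0.25, 0.25, 0.25])           # kNN head
probs = shallow_fusion(logits, ngram, knn)
print(dict(zip(vocab, probs.round(3))))
```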

Distillation Innovation

The winning approach distilled knowledge from n-gram models into transformers (a code sketch follows the list):

  1. Train an n-gram teacher (10 seconds)
  2. Warm-start the transformer on the teacher's predictions (200 steps)
  3. Continue training on the original data
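
In training-loop terms, the warm start amounts to minimizing cross-entropy against the n-gram teacher's soft next-token distribution for the first couple hundred steps, then switching to ordinary next-token prediction on the data. A rough PyTorch-style sketch, where `model`, `ngram_teacher.next_token_probs`, and the data loader are assumed placeholders, not the author's code:

```python
import torch
import torch.nn.functional as F

def warm_start_then_train(model, ngram_teacher, loader, optimizer,
                          warm_steps=200, total_steps=1000):
    """Phase 1: imitate the n-gram teacher; phase 2: train on real data."""
    step = 0
    for inputs, targets in loader:          # (batch, seq) token ids
        optimizer.zero_grad()
        logits = model(inputs)              # (batch, seq, vocab)

        if step < warm_steps:
            # Distillation: cross-entropy against the teacher's soft
            # next-token distribution (assumed helper, shape (batch, seq, vocab)).
            teacher_probs = ngram_teacher.next_token_probs(inputs)
            loss = -(teacher_probs * F.log_softmax(logits, dim=-1)).sum(-1).mean()
        else:
            # Standard next-token cross-entropy on the original data.
            loss = F.cross_entropy(logits.flatten(0, 1), targets.flatten())

        loss.backward()
        optimizer.step()
        step += 1
        if step >= total_steps:
            break
    return model
```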

This distillation recipe yielded dramatically more coherent output:

"Once upon a time, in a big forest, there lived a little bunny named Ben... They played together all day long. The moral of the story is to help others when they needed it."

Why This Matters for Developers

  1. Shortcutting Early Learning: N-gram distillation accelerates grammar acquisition, freeing compute for semantic learning
  2. Agent-Augmented Research: Codex efficiently explored hyperparameter combinations that would be impractical to test by hand (a minimal sweep sketch follows this list)
  3. Hardware Democratization: Shows that meaningful experimentation is possible on consumer hardware
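
To make point 2 concrete, the kind of exploration an agent can drive in a few minutes amounts to a time-budgeted grid search over a handful of knobs. This is a hypothetical sketch; the `train_and_eval` callback, the grid values, and the five-minute budget are illustrative assumptions, not the original experiment's settings:

```python
import itertools
import time

def sweep(train_and_eval, budget_s=300):
    """Time-budgeted grid search; train_and_eval(config) -> validation perplexity."""
    grid = {
        "lr": [3e-4, 1e-3, 3e-3],
        "n_layer": [2, 4],
        "warm_start_steps": [0, 200],
    }
    best = (float("inf"), None)
    start = time.time()
    for values in itertools.product(*grid.values()):
        if time.time() - start > budget_s:
            break                              # stop once the time budget is spent
        config = dict(zip(grid.keys(), values))
        ppl = train_and_eval(config)           # one short training run per config
        best = min(best, (ppl, config), key=lambda x: x[0])
    return best                                # (best perplexity, best config)
```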

As large labs chase trillion-parameter models, this experiment highlights untapped potential in optimized small-scale architectures—and the emerging reality of AI-assisted research.

Source: AI Research with Codex by Sean Goedecke