New research reveals LLMs struggle to create useful skills before solving tasks, but developers find success by generating skills after problem-solving to capture learned insights.

A recent study from Carnegie Mellon University and Microsoft Research reveals a counterintuitive truth about LLM-generated "skills": reusable documents of procedural instructions that help language models perform specific tasks. While these skills are valuable when consumed by LLMs, the research found that LLMs cannot reliably author their own skills before attempting a task. According to the paper:
"Self-generated skills provide no benefit on average, showing that models cannot reliably author the procedural knowledge they benefit from consuming."
The Flawed Pre-Task Approach
The study's protocol asked models to generate skills before solving a task, using a prompt with this structure:
- Analyze task requirements
- Write 1–5 modular skill documents
- Save skills as markdown files
- Solve the task using those skills
This approach mirrors common prompting strategies like "think step by step" but fails because LLMs already engage in internal reasoning before responding. Forcing them to formalize skills upfront bakes in flawed assumptions without real-world validation.
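A minimal sketch of that pre-task flow, assuming a generic `llm(prompt)` completion helper; the function names, prompt wording, and skill-delimiter format are illustrative, not the paper's exact harness:

```python
import re
from pathlib import Path


def llm(prompt: str) -> str:
    """Placeholder for any chat-completion call (OpenAI, Anthropic, local model, ...)."""
    raise NotImplementedError


def pre_task_skills(task: str, skill_dir: Path = Path("skills")) -> str:
    """Pre-task flow: author skills first, then solve the task with them."""
    skill_dir.mkdir(exist_ok=True)

    # 1. Ask the model to analyze the task and write 1-5 modular skill documents.
    authored = llm(
        "Analyze the requirements of the following task, then write 1-5 modular "
        'skill documents in markdown, each wrapped in <skill name="NAME"> ... </skill> tags.\n\n'
        f"Task: {task}"
    )

    # 2. Save each skill as a markdown file.
    skills = re.findall(r'<skill name="([^"]+)">(.*?)</skill>', authored, re.DOTALL)
    for name, body in skills:
        (skill_dir / f"{name}.md").write_text(body.strip())

    # 3. Solve the task with the freshly authored skills in context.
    context = "\n\n".join(body.strip() for _, body in skills)
    return llm(f"Use these skills to solve the task.\n\nSkills:\n{context}\n\nTask: {task}")
```

The weakness the study identifies sits in step 1: the skill content comes entirely from the model's priors, before any attempt at the task has tested them.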
The Post-Task Alternative That Works
Developers report success by reversing the sequence: Solve the task first, then generate the skill. This captures hard-won insights gained through trial and error. For example, when experimenting with Sparse Autoencoders (SAEs) to manipulate model features (like Anthropic's Golden Gate Claude technique), several critical lessons emerged only after iterative attempts:
- Extracting features from the final layer normalization is ineffective
- Optimal feature extraction occurs mid-network
- SAEs require orders of magnitude more training data than initially assumed
Only after solving these challenges could an LLM distill the process into a reusable skill (feature_extraction.md). When applied to new models, this post-task skill worked immediately.
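For a sense of the procedural detail such a skill ends up encoding, here is a minimal sketch of mid-network activation extraction feeding a sparse autoencoder, using Hugging Face transformers and PyTorch; the model name, layer-index heuristic, and SAE width are illustrative assumptions rather than the exact setup described above:

```python
import torch
from torch import nn
from transformers import AutoModel, AutoTokenizer


class SparseAutoencoder(nn.Module):
    """Tiny SAE: overcomplete dictionary with a sparsity-inducing ReLU encoder."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)
        self.decoder = nn.Linear(d_hidden, d_model)

    def forward(self, x):
        features = torch.relu(self.encoder(x))  # sparse feature activations
        return self.decoder(features), features


model_name = "gpt2"  # illustrative; any model that exposes hidden_states works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModel.from_pretrained(model_name, output_hidden_states=True)

inputs = tok("The Golden Gate Bridge spans the bay.", return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# Lesson from the list above: take activations mid-network, not after the final
# layer norm. hidden_states[0] is the embedding output, so the middle block sits
# roughly at len(hidden_states) // 2.
mid_layer = len(out.hidden_states) // 2
acts = out.hidden_states[mid_layer]  # shape: (batch, seq_len, d_model)

sae = SparseAutoencoder(d_model=acts.shape[-1], d_hidden=8 * acts.shape[-1])
recon, features = sae(acts)  # untrained here; training it well takes far more
                             # data than one might initially assume
```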
Why Timing Matters
The key distinction lies in the origin of knowledge:
- Pre-task skills rely on the model's pretrained biases, often containing inaccuracies
- Post-task skills encode newly discovered procedural knowledge from actual problem-solving
As one developer notes: "Skills should distill lessons from millions of tokens of iteration, not textbook knowledge." This makes them valuable for recurring workflows like data transformation, API integrations, or debugging procedures.
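A minimal sketch of that post-task pattern, reusing the placeholder `llm` helper from the earlier sketch; the prompt wording, helper names, and file handling are illustrative:

```python
from pathlib import Path


def distill_skill(task: str, transcript: str, skill_path: Path) -> None:
    """Post-task pattern: solve first, then distill the trial-and-error
    transcript into a reusable markdown skill."""
    skill_md = llm(
        "You just finished the task below. Distill what actually worked, "
        "including dead ends to avoid, into a concise reusable skill document "
        "in markdown. Capture procedural steps, not textbook background.\n\n"
        f"Task: {task}\n\nFull working transcript:\n{transcript}"
    )
    skill_path.write_text(skill_md)


def solve_with_skill(task: str, skill_path: Path) -> str:
    """Reuse the captured procedure on a new instance of the same workflow."""
    skill = skill_path.read_text()
    return llm(f"Follow this skill to solve the task.\n\nSkill:\n{skill}\n\nTask: {task}")
```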
Community Response
The findings sparked discussion among practitioners:
- Many confirm anecdotal success with post-task skill generation
- Some question if task complexity determines effectiveness
- Others explore hybrid approaches combining human oversight with automated distillation
For now, the consensus is clear: Treat LLM-generated skills as documentation of conquered challenges, not speculative roadmaps. As one Hacker News commenter put it: "First climb the mountain, then draw the map."
Further Reading: CMU Paper on Self-Generated Skills
