Local LLMs in the Terminal: A Developer's Experiment with Open-Source AI Coding Assistants
Every six months, I emerge from my developer cave to explore the latest AI trends. This time, I discovered a surge in terminal-based LLM tools promising autonomous coding—searching, writing, testing, and committing code. Skeptical of vendor lock-in from unprofitable companies, I embarked on a quest to find open-source alternatives. Here’s what I learned.
Model Selection: The Open-Source Contenders
I tested three leading open-source LLMs for local coding tasks:
Deepseek-R1:8b:
- Popular Chinese model (5.2GB) excelling in benchmarks.
- Issue: Reasoning loops caused indefinite hangs during coding tasks.
Mistral:7b:
- French model optimized for speed.
- Issue: Hallucinated functions and deleted code unpredictably.
Qwen3:8b:
- Alibaba’s model supporting agentic workflows.
- Verdict: Balanced accuracy and stability, running smoothly on my aging Mac. Chosen for further testing.
Building the Local Stack
Ollama: The LLM Workhorse
Ollama serves as the engine, managing models like "Docker for LLMs":
ollama pull qwen3:8b
ollama run qwen3:8b
It exposes a local API (http://localhost:11434), enabling integration with coding tools.
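Because it is just HTTP, you can talk to the model with nothing but a small script. Here is a minimal sketch in Python, assuming the Ollama server is running on its default port and qwen3:8b has already been pulled:
import requests  # third-party HTTP client (pip install requests)

# One-shot, non-streamed completion from the local Ollama server.
response = requests.post(
    "http://localhost:11434/api/generate",
    json={
        "model": "qwen3:8b",
        "prompt": "Explain what a borrow checker does, in one sentence.",
        "stream": False,  # return a single JSON object instead of a token stream
    },
    timeout=300,
)
print(response.json()["response"])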
Aider: Terminal Pair Programmer
Aider acts as the conductor, orchestrating file edits, linting, and Git commits. Setup is straightforward:
export OLLAMA_API_BASE=http://127.0.0.1:11434
aider --model ollama_chat/qwen3:8b
Key commands:
- /add [file]: Track files for context
- /ask: Query the model
- /code: Generate and write code
- /architect: Plan and implement features
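Aider can also be driven non-interactively. Here is a minimal sketch using its Python scripting interface, assuming Coder.create and coder.run behave as documented in the installed version (the file path is a placeholder):
import os
from aider.coders import Coder
from aider.models import Model

# Point aider at the local Ollama server, same as the export above.
os.environ["OLLAMA_API_BASE"] = "http://127.0.0.1:11434"

model = Model("ollama_chat/qwen3:8b")
# fnames plays the role of /add: these files become part of the model's context.
coder = Coder.create(main_model=model, fnames=["src/lib.rs"])  # placeholder path
coder.run("Add doc comments to every public function in this file.")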
Real-World Testing: Triumphs and Pitfalls
Refactoring Existing Code:
- Success: Modified a struct in my project Itako when given explicit instructions.
- Caveat: Changed an unrelated function, requiring manual correction.
- Verdict: Slower than human editing for simple tasks.
Greenfield Development:
- Task: Build a Japanese text parser with fugashi (a working baseline is sketched after this list).
- Result: Hallucinated, non-functional Python code.
- Insight: Lack of context cripples new project scaffolding.
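For comparison, a correct baseline for that task is only a few lines. A minimal sketch, assuming fugashi and the unidic-lite dictionary are installed (pip install fugashi unidic-lite); this is what a working answer looks like, not what the model produced:
from fugashi import Tagger

tagger = Tagger()  # picks up the installed UniDic dictionary

def tokenize(text):
    # Return (surface form, lemma, part of speech) for each token.
    return [(word.surface, word.feature.lemma, word.feature.pos1) for word in tagger(text)]

for surface, lemma, pos in tokenize("猫が座った"):
    print(surface, lemma, pos)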
Troubleshooting:
- Task: Diagnose introduced bugs using /ask (see the sketch after this list).
- Result: Halved debug time vs. Googling error messages.
- Strength: Access to actual code context proved invaluable.
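There is no magic in /ask; its advantage is that the failing code ends up in the prompt. A rough sketch of the same idea against Ollama's chat endpoint, with a hypothetical file and error message:
import pathlib
import requests

source = pathlib.Path("src/parser.py").read_text()  # hypothetical failing file
error = "AttributeError: 'NoneType' object has no attribute 'surface'"  # hypothetical error

response = requests.post(
    "http://localhost:11434/api/chat",
    json={
        "model": "qwen3:8b",
        "stream": False,
        "messages": [
            {"role": "system", "content": "You are a debugging assistant. Be concise."},
            {"role": "user", "content": f"This code:\n{source}\n\nfails with:\n{error}\n\nWhat is the likely cause?"},
        ],
    },
    timeout=300,
)
print(response.json()["message"]["content"])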
The Autonomous Illusion: Qwen CLI
Alibaba’s standalone qwen-code tool promised full autonomy but stumbled:
- Context Overload: Scanning the repository would have taken roughly 1M tokens of context, far beyond the model's 40K-token limit.
- Path Issues: Failed file writes due to absolute/relative path mismatches.
The Terminal Frontier: Pragmatic Over Autonomous
While fully autonomous coding remains elusive, aider + Qwen3 shines as a "supercharged rubber duck" for debugging. The setup eliminates cloud dependencies and enshittification risks—but demands guidance. For now, it’s a potent assistant, not a replacement.
Source: alicegg.tech