Local LLMs in the Terminal: A Developer's Experiment with Open-Source AI Coding Assistants

A developer tests open-source LLMs like Qwen3 for terminal-based coding workflows, revealing practical strengths in troubleshooting but limitations in autonomous development. The experiment highlights the viability—and current constraints—of running local AI coding assistants with tools like Ollama and aider.

Every six months, I emerge from my developer cave to explore the latest AI trends. This time, I discovered a surge in terminal-based LLM tools promising autonomous coding—searching, writing, testing, and committing code. Skeptical of vendor lock-in from unprofitable companies, I embarked on a quest to find open-source alternatives. Here’s what I learned.

Model Selection: The Open-Source Contenders

I tested three leading open-source LLMs for local coding tasks:

DeepSeek-R1:8b:
- Popular Chinese model (5.2GB) excelling in benchmarks.
- Issue: Reasoning loops caused indefinite hangs during coding tasks.
Mistral:7b:
- French model optimized for speed.
- Issue: Hallucinated functions and deleted code unpredictably.
Qwen3:8b:
- Alibaba’s model supporting agentic workflows.
- Verdict: Balanced accuracy and stability, running smoothly on my aging Mac. Chosen for further testing.

Building the Local Stack

Ollama: The LLM Workhorse

Ollama serves as the engine. It downloads, manages, and runs models locally, much like a "Docker for LLMs":

ollama pull qwen3:8b
ollama run qwen3:8b

It exposes a local API (http://localhost:11434), enabling integration with coding tools.
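As a rough illustration of what that integration can look like, the sketch below sends a single prompt to Ollama's /api/generate endpoint using only the Python standard library. The helper names build_generate_request and ask are hypothetical, and the code assumes the server is running on the default port with qwen3:8b already pulled.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # local address mentioned above


def build_generate_request(prompt, model="qwen3:8b", base_url=OLLAMA_URL):
    """Build (without sending) a POST request for Ollama's /api/generate endpoint."""
    # "stream": False asks for one JSON object instead of a stream of chunks.
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        f"{base_url}/api/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt, model="qwen3:8b", timeout=300):
    """Send the prompt to the local server and return the generated text."""
    # Assumption: a generous timeout, since local models can be slow on older hardware.
    request = build_generate_request(prompt, model)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["response"]
```

With the server up, calling ask("Summarize what this error message means: ...") should return the model's reply as a plain string. Tools like aider talk to the same local server, so nothing leaves the machine.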

Aider: Terminal Pair Programmer

Aider acts as the conductor, orchestrating file edits, linting, and Git commits. Setup is straightforward:

export OLLAMA_API_BASE=http://127.0.0.1:11434
aider --model ollama_chat/qwen3:8b
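Before launching aider, it can help to confirm that the Ollama server is reachable and that the model has actually been pulled. The following is a minimal pre-flight sketch, not part of aider itself: it queries Ollama's /api/tags endpoint, which lists locally available models. The function names model_names and is_model_available are hypothetical.

```python
import json
import urllib.request


def model_names(tags_response):
    """Extract model names from a parsed /api/tags response body."""
    return [entry["name"] for entry in tags_response.get("models", [])]


def is_model_available(model, base_url="http://127.0.0.1:11434", timeout=5):
    """Return True if the Ollama server at base_url lists the given model."""
    # Raises urllib.error.URLError if the server is not running.
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as response:
        return model in model_names(json.load(response))
```

For example, is_model_available("qwen3:8b") returning False would suggest running ollama pull qwen3:8b first.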

Key commands:

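/add [file]: Add files to the chat so the model can read and edit them
/ask: Ask questions about the code without changing any files
/code: Request changes, which aider writes to the files
/architect: Plan a change first, then have it implemented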

Real-World Testing: Triumphs and Pitfalls

Refactoring Existing Code:
- Success: Modified a struct in my project Itako when given explicit instructions.
- Caveat: Changed an unrelated function, requiring manual correction.
- Verdict: Slower than human editing for simple tasks.
Greenfield Development:
- Task: Build a Japanese text parser with fugashi.
- Result: Hallucinated, non-functional Python code.
- Insight: Lack of context cripples new project scaffolding.
Troubleshooting:
- Task: Diagnose introduced bugs using /ask.
- Result: Halved debugging time compared with Googling error messages.
- Strength: Access to actual code context proved invaluable.

The Autonomous Illusion: Qwen CLI

Alibaba’s standalone qwen-code tool promised full autonomy but stumbled:

Context Overload: The 40K-token limit fell far short of the 1M tokens needed for repository scanning.
Path Issues: File writes failed because of mismatches between absolute and relative paths.
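To get a feel for the context gap, a back-of-the-envelope estimate is enough. The sketch below walks a repository and approximates its token count, assuming about four characters per token (a common rule of thumb, not the model's real tokenizer) and a hypothetical list of file extensions. The 40,000 limit is the figure quoted above.

```python
import os

CONTEXT_LIMIT_TOKENS = 40_000  # limit reported above
CHARS_PER_TOKEN = 4            # assumption: rough heuristic, not a real tokenizer


def estimate_tokens(text):
    """Approximate the token count of a string from its length."""
    return len(text) // CHARS_PER_TOKEN


def estimate_repo_tokens(root, extensions=(".py", ".go", ".md")):
    """Sum approximate token counts for matching files under root."""
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune hidden directories such as .git so they are not scanned.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        for name in filenames:
            if name.endswith(extensions):
                path = os.path.join(dirpath, name)
                with open(path, encoding="utf-8", errors="ignore") as handle:
                    total += estimate_tokens(handle.read())
    return total


def fits_in_context(token_count, limit=CONTEXT_LIMIT_TOKENS):
    """Return True if the estimated token count fits within the limit."""
    return token_count <= limit
```

Running fits_in_context(estimate_repo_tokens(".")) from a project root gives a quick indication of whether whole-repository scanning is realistic, or whether files need to be added selectively, as aider's /add does.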

The Terminal Frontier: Pragmatic Over Autonomous

While fully autonomous coding remains elusive, aider + Qwen3 shines as a "supercharged rubber duck" for debugging. The setup eliminates cloud dependencies and enshittification risks—but demands guidance. For now, it’s a potent assistant, not a replacement.

Source: alicegg.tech