Every six months, I emerge from my developer cave to explore the latest AI trends. This time, I discovered a surge in terminal-based LLM tools promising autonomous coding—searching, writing, testing, and committing code. Skeptical of vendor lock-in from unprofitable companies, I embarked on a quest to find open-source alternatives. Here’s what I learned.

Model Selection: The Open-Source Contenders

I tested three leading open-source LLMs for local coding tasks:

  1. Deepseek-R1:8b:

    • Popular Chinese model (5.2GB) excelling in benchmarks.
    • Issue: Reasoning loops caused indefinite hangs during coding tasks.
  2. Mistral:7b:

    • French model optimized for speed.
    • Issue: Hallucinated functions and deleted code unpredictably.
  3. Qwen3:8b:

    • Alibaba’s model supporting agentic workflows.
    • Verdict: Balanced accuracy and stability, running smoothly on my aging Mac. Chosen for further testing.

Building the Local Stack

Ollama: The LLM Workhorse

Ollama serves as the engine. It downloads, stores, and runs models much as Docker does for container images, a kind of "Docker for LLMs":

ollama pull qwen3:8b
ollama run qwen3:8b

It exposes a local API (http://localhost:11434), enabling integration with coding tools.
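As a rough illustration of what "integration" means here, the sketch below sends a single prompt to that local API from Python using only the standard library. It assumes an Ollama server is already running on the default port and that qwen3:8b has been pulled. The helper names (build_generate_request, generate) are made up for this example and are not part of any tool mentioned in the article.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434"  # address quoted in the article


def build_generate_request(prompt, model="qwen3:8b"):
    """Build the JSON body for Ollama's /api/generate endpoint."""
    # stream=False asks for one complete JSON reply instead of a chunked stream.
    return {"model": model, "prompt": prompt, "stream": False}


def generate(prompt, model="qwen3:8b", base_url=OLLAMA_URL, timeout=300):
    """Send the prompt to a running Ollama server and return the reply text."""
    body = json.dumps(build_generate_request(prompt, model)).encode("utf-8")
    request = urllib.request.Request(
        f"{base_url}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # A non-streamed reply carries the generated text in the "response" field.
        return json.loads(response.read())["response"]


# Example call (needs a running server, so it is left commented out):
# print(generate("Write a Python function that reverses a string."))
```

Tools such as aider talk to the same server, so a quick call like this is a cheap way to check that the model responds before wiring up anything larger.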

Aider: Terminal Pair Programmer

Aider acts as the conductor, orchestrating file edits, linting, and Git commits. Setup is straightforward:

export OLLAMA_API_BASE=http://127.0.0.1:11434
aider --model ollama_chat/qwen3:8b

Key commands:
- /add [file]: Track files for context
- /ask: Ask questions about the code without editing any files
- /code: Generate and write code
- /architect: Plan and implement features
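The same setup can also be driven non-interactively. The sketch below is one possible way to wrap it in Python: it sets OLLAMA_API_BASE as in the setup commands and passes a single instruction through aider's --message option, with the files to edit given as positional arguments. The function names and the example file name are hypothetical, and the script assumes aider is installed and on the PATH.

```python
import os
import subprocess


def build_aider_command(files, message, model="ollama_chat/qwen3:8b"):
    """Return the argv list for a one-shot, non-interactive aider run."""
    # --model selects the LLM, --message sends one instruction and then exits;
    # the files to edit are passed as positional arguments.
    return ["aider", "--model", model, "--message", message, *files]


def run_aider(files, message, api_base="http://127.0.0.1:11434"):
    """Run aider once with OLLAMA_API_BASE pointing at the local Ollama server."""
    env = dict(os.environ, OLLAMA_API_BASE=api_base)
    return subprocess.run(build_aider_command(files, message), env=env, check=True)


# Example call (hypothetical file name, needs aider and Ollama installed):
# run_aider(["parser.py"], "Add docstrings to every public function")
```

Keeping the command builder separate from the subprocess call makes it easy to inspect exactly what would be run before letting the model touch any files.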

Real-World Testing: Triumphs and Pitfalls

  1. Refactoring Existing Code:

    • Success: Modified a struct in my project Itako when given explicit instructions.
    • Caveat: Changed an unrelated function, requiring manual correction.
    • Verdict: Slower than human editing for simple tasks.
  2. Greenfield Development:

    • Task: Build a Japanese text parser with fugashi.
    • Result: Hallucinated, non-functional Python code.
    • Insight: Lack of context cripples new project scaffolding.
  3. Troubleshooting:

    • Task: Diagnose introduced bugs using /ask.
    • Result: Halved debug time vs. Googling error messages.
    • Strength: Access to actual code context proved invaluable.
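For reference on the greenfield task, a working fugashi tokenizer can be very small. The sketch below is not the code from the experiment. It is a minimal example that assumes fugashi is installed together with a UniDic dictionary such as unidic-lite, which is what gives each token a pos attribute. The helper names are made up for illustration.

```python
def tokenize(text, tagger):
    """Return (surface, part_of_speech) pairs for each token in text."""
    # Calling a fugashi Tagger on a string yields one node per token;
    # .surface is the token text and .pos the comma-joined part-of-speech tags
    # (available when a UniDic dictionary is in use).
    return [(word.surface, word.pos) for word in tagger(text)]


def make_tagger():
    """Create a fugashi Tagger (assumes fugashi and unidic-lite are installed)."""
    from fugashi import Tagger  # imported lazily so this file loads without fugashi

    return Tagger()


# Example call (needs fugashi and a dictionary, so it is left commented out):
# for surface, pos in tokenize("麩菓子は麩を主材料とした日本の菓子。", make_tagger()):
#     print(surface, pos, sep="\t")
```

Passing the tagger in as an argument keeps the parsing logic testable without the dictionary, and gives the model a concrete, correct API call to extend rather than one to invent.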

The Autonomous Illusion: Qwen CLI

Alibaba’s standalone qwen-code tool promised full autonomy but stumbled:
- Context Overload: The 40K-token context limit is far below the roughly 1M tokens needed for repository scanning.
- Path Issues: Failed file writes due to absolute/relative path mismatches.
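To get a feel for the context gap, the sketch below estimates how many tokens a repository would occupy and compares that with the 40K limit mentioned above. It relies on the common rule of thumb of about four characters per token, which is only an approximation and not the model's real tokenizer. The function names and the default file suffixes are assumptions chosen for this example.

```python
from pathlib import Path

CHARS_PER_TOKEN = 4  # rough rule of thumb, not a real tokenizer


def estimate_tokens(text):
    """Approximate the token count of a string from its length."""
    return len(text) // CHARS_PER_TOKEN


def estimate_repo_tokens(root, suffixes=(".py", ".md")):
    """Sum the estimated tokens of all matching files under root."""
    total = 0
    for path in Path(root).rglob("*"):
        if path.is_file() and path.suffix in suffixes:
            # errors="ignore" skips bytes that are not valid UTF-8.
            total += estimate_tokens(path.read_text(encoding="utf-8", errors="ignore"))
    return total


def fits_in_context(token_count, context_limit=40_000):
    """Check an estimate against the 40K-token limit quoted in the article."""
    return token_count <= context_limit


# Example call (run from a project directory):
# tokens = estimate_repo_tokens(".")
# print(tokens, fits_in_context(tokens))
```

By this estimate, 40K tokens corresponds to roughly 160,000 characters of source, so even a mid-sized project exceeds the window once a tool tries to read everything at once.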

The Terminal Frontier: Pragmatic Over Autonomous

While fully autonomous coding remains elusive, aider + Qwen3 shines as a "supercharged rubber duck" for debugging. The setup eliminates cloud dependencies and enshittification risks—but demands guidance. For now, it’s a potent assistant, not a replacement.

Source: alicegg.tech