A deep dive into self-hosting Happy, an open-source Claude Code client, on Kubernetes to create a truly mobile development workflow. The author details their complete architecture—from Tailscale networking and secret management to multi-LLM provider strategies and security considerations—demonstrating how autonomous AI agents can be safely deployed in isolated containers to reclaim dead time and boost productivity.
The transformation from traditional IDE-bound development to a distributed, AI-assisted workflow represents one of the more subtle but profound shifts happening in software engineering practice. While much attention focuses on the capabilities of large language models themselves, the infrastructure that enables their seamless integration into daily work often receives less scrutiny. The author's journey to becoming a "Happy engineer" illustrates how thoughtful infrastructure design can fundamentally alter the temporal and spatial constraints of software development.
The Architecture of Untethered Development
Happy emerges from the community as an open-source alternative to terminal-bound Claude Code sessions, providing a client-server architecture that decouples the AI assistant from physical workstations. Unlike SSH-based workflows that suffer from terminal UI flickering, mobile input limitations, and painful copy-paste operations, Happy implements a proper mobile-first interface that treats text input as a first-class citizen.
The core insight driving this architecture is that mobile development isn't about writing code on phones—it's about reclaiming micro-sessions throughout the day. The author describes deploying Jellyfin to a Kubernetes cluster while walking to the grocery store, transforming what would have been a "handle later" task into immediate execution. This capability fundamentally changes the cognitive overhead of side projects: instead of context-switching costs accumulating across deferred tasks, progress becomes continuous and fluid.
Kubernetes Infrastructure: The Self-Hosting Decision
The decision to self-host Happy's server component stemmed from reliability issues with the public instance, which began timing out frequently before becoming completely unavailable. For someone relying on Claude Code daily, this wasn't an acceptable failure mode.
The production-grade Kubernetes deployment demonstrates careful consideration of operational requirements:
Core Components:
- Happy Server: Node.js/Express application serving on port 3000, deployed with resource requests of 128Mi memory and 100m CPU, a 512Mi memory limit, and health probes for automatic recovery (sketched after this list)
- PostgreSQL: Managed via CloudNativePG operator with 10Gi persistent storage on Longhorn, using init containers for Prisma migrations
- Redis: Single replica for session management and caching
- Object Storage: Backblaze B2 via S3-compatible API for file attachments
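As a rough illustration, the server Deployment might look something like the following. The port, resource figures, and probe pattern come from the list above; the image name, labels, and /health endpoint are assumptions:

```yaml
# Hypothetical Happy Server Deployment. Port, resources, and the probe
# pattern come from the setup described above; the image name, labels,
# and /health endpoint are assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: happy-server
spec:
  replicas: 1
  selector:
    matchLabels:
      app: happy-server
  template:
    metadata:
      labels:
        app: happy-server
    spec:
      containers:
        - name: happy-server
          image: ghcr.io/example/happy-server:latest   # placeholder image
          ports:
            - containerPort: 3000
          resources:
            requests:
              memory: 128Mi
              cpu: 100m
            limits:
              memory: 512Mi
          readinessProbe:            # gate traffic until the app responds
            httpGet:
              path: /health          # assumed endpoint
              port: 3000
          livenessProbe:             # restart the pod if the app hangs
            httpGet:
              path: /health
              port: 3000
```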
Network Architecture:
The author runs Traefik as the ingress controller and Tailscale for secure remote access, both operating inside the cluster. Tailscale uses hostNetwork: true to expose itself directly on the host network, creating a WireGuard-encrypted peer-to-peer tunnel. This eliminates public internet exposure entirely: no open ports, no public DNS records, no attack surface beyond the tailnet.
The request flow is: Client → Tailscale tunnel → Traefik ingress → Happy Server → workspace pods. All traffic remains encrypted at the network layer, and the Happy Server fetches secrets from OpenBao at runtime rather than storing them in Kubernetes Secrets.
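A minimal sketch of the Tailscale piece, assuming the stock tailscale/tailscale container image. Only hostNetwork: true is taken from the setup above; for simplicity the auth key here comes from a Kubernetes Secret, whereas the author prefers pulling secrets from OpenBao at runtime:

```yaml
# Hypothetical Tailscale Deployment. Only hostNetwork: true comes from
# the setup described above; the image, env vars, and Secret name are
# assumptions based on Tailscale's official container image.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: tailscale
spec:
  replicas: 1
  selector:
    matchLabels:
      app: tailscale
  template:
    metadata:
      labels:
        app: tailscale
    spec:
      hostNetwork: true              # tailscaled binds directly on the host
      containers:
        - name: tailscale
          image: tailscale/tailscale:stable
          env:
            - name: TS_AUTHKEY       # stock env var for the official image
              valueFrom:
                secretKeyRef:
                  name: tailscale-auth   # assumed Secret name
                  key: authkey
            - name: TS_USERSPACE     # userspace networking; no /dev/net/tun needed
              value: "true"
```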
Security Philosophy: Controlled Risk in Autonomous Agents
Perhaps the most interesting aspect of this setup is the author's approach to running AI agents in what they term "YOLO mode"—minimal oversight with calculated risk boundaries. This isn't recklessness; it's a pragmatic assessment of threat models for personal development environments.
The reasoning is straightforward: the primary value of AI agents is their autonomy. If every tool call requires manual approval, the workflow collapses back into traditional development patterns. The author accepts that a Personal Access Token might leak or a container might go rogue, but mitigates this through isolation rather than permission gates.
The workspace pods implement strict NetworkPolicy rules:
- Egress: Blocks all private ranges (10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16) while allowing DNS queries, LLM API endpoints, and Happy server communication
- Ingress: Allows SSH only from the Tailscale namespace, blocking all other inbound connections
This creates a sandbox where compromised agents can only affect resources that are either public already or easily restorable. The blast radius is contained to things within the author's control, while the agents retain enough freedom to be genuinely useful.
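A minimal NetworkPolicy sketch approximating those rules. The pod labels, namespace names, and the assumption that LLM traffic is plain HTTPS on port 443 are illustrative; only the CIDRs, the SSH port, and the overall intent come from the text:

```yaml
# Hypothetical NetworkPolicy approximating the rules above. Labels and
# namespace names are assumptions; CIDRs and the SSH port are from the text.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: workspace-isolation
spec:
  podSelector:
    matchLabels:
      app: workspace                 # assumed workspace pod label
  policyTypes: [Ingress, Egress]
  ingress:
    - from:                          # SSH only from the Tailscale namespace
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: tailscale   # assumed namespace
      ports:
        - port: 2222                 # Dropbear SSH
          protocol: TCP
  egress:
    - to:                            # DNS
        - namespaceSelector: {}
      ports:
        - port: 53
          protocol: UDP
        - port: 53
          protocol: TCP
    - to:                            # Happy server (assumed namespace)
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: happy
      ports:
        - port: 3000
          protocol: TCP
    - to:                            # public internet (LLM APIs); private ranges excluded
        - ipBlock:
            cidr: 0.0.0.0/0
            except:
              - 10.0.0.0/8
              - 172.16.0.0/12
              - 192.168.0.0/16
      ports:
        - port: 443
          protocol: TCP
```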
The Multi-Provider LLM Strategy
Cost optimization and capability matching drive a sophisticated multi-provider approach. Rather than relying on a single model, the author routes tasks through different LLMs based on specific requirements:
MiniMax M2.1 ($2-10/month): The workhorse for routine tasks—variable renaming, function extraction, simple refactors. The author notes its "stubbornness" can be beneficial for long-running tasks where model drift is undesirable. While described as "dumber" than alternatives, the cost-effectiveness makes it ideal for high-volume, low-complexity operations.
GLM 4.7 ($3-6/month): The surprising standout for frontend work. The author reports it sometimes outperforms Claude Opus for CSS adjustments and UI components, making it the preferred choice for interface development. There's also an ideological component: supporting open-source models pushing toward state-of-the-art feels better than paying the "Anthropic tax" for closed systems.
Gemini 3.0 via Antigravity (free tier): Specialized for UI debugging with its inspection capabilities, though used less frequently since GLM covers most needs.
Claude Opus 4.5 ($17-20/month): Reserved for complex planning, multi-step refactors, and architectural decisions requiring deep understanding across multiple files.
One current limitation is provider switching: the daemon must be restarted with different environment variables, so the author uses shell scripts (setup-minimax.sh, setup-zai.sh) to configure ANTHROPIC_* variables before starting sessions. Community work on in-app profile switching is underway (PR #272).
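Conceptually, each of those scripts just points the daemon's Anthropic-compatible client at a different endpoint before launch. The same switch expressed as a container env fragment; the variable names are the standard Claude Code ones, while the URL, Secret name, and model string are placeholders:

```yaml
# The author exports these from shell scripts before launching the
# daemon; here they appear as pod env. Variable names are the standard
# Claude Code ones; URL, Secret name, and model are placeholders.
env:
  - name: ANTHROPIC_BASE_URL         # Anthropic-compatible endpoint
    value: "https://api.example-provider.com/anthropic"
  - name: ANTHROPIC_AUTH_TOKEN
    valueFrom:
      secretKeyRef:
        name: llm-credentials        # assumed Secret name
        key: provider-api-key
  - name: ANTHROPIC_MODEL            # optional model override
    value: "example-model-name"
```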
Workspace Isolation and MCP Integration
The development environment runs in a containerized workspace built on Alpine Linux, running as a non-root user (UID 1000) for Kubernetes Pod Security Standards compliance. Key features (a pod-spec sketch follows the list):
- Persistent Storage: 60Gi per workspace split between Nix store (50Gi) and home directory
- SSH Access: Dropbear server on port 2222 via Tailscale
- Nix Integration: Single-user Nix store copied from image template to PVC on first run, avoiding ~4GB duplication
- Multi-arch: Built for AMD64 and ARM64 via GitHub Actions
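A pod-spec fragment reflecting those features. The split into two PersistentVolumeClaims, the claim names, and the mount paths are assumptions; the author may carve both paths out of a single 60Gi volume:

```yaml
# Hypothetical workspace pod-spec fragment. UID 1000 and the SSH port
# come from the list above; claim names and mount paths are assumptions.
spec:
  securityContext:
    runAsUser: 1000                  # non-root, per Pod Security Standards
    runAsNonRoot: true
    fsGroup: 1000
  containers:
    - name: workspace
      image: ghcr.io/example/workspace:latest   # placeholder multi-arch image
      ports:
        - containerPort: 2222        # Dropbear SSH, reached over Tailscale
      volumeMounts:
        - name: nix-store
          mountPath: /nix            # populated from the image on first run
        - name: home
          mountPath: /home/dev       # assumed home path
  volumes:
    - name: nix-store
      persistentVolumeClaim:
        claimName: workspace-nix     # 50Gi
    - name: home
      persistentVolumeClaim:
        claimName: workspace-home    # remaining 10Gi
```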
The workspace integrates with Model Context Protocol servers for tool access:
- gh-actions-mcp: GitHub Actions workflow management
- argocd-mcp: ArgoCD application deployment and sync
- woodpecker-ci-mcp: CI pipeline monitoring
- mcp-searxng: Self-hosted search (necessary because public SearXNG instances disable JSON output)
The Android Certificate Challenge
A notable technical hurdle emerged when connecting the Happy Android app to the self-hosted instance via Tailscale. The app's strict TLS validation rejected connections to domains using self-signed certificates or private CAs, even though Tailscale already provided end-to-end encryption.
The solution required forking the Happy app and modifying Android's network security configuration to trust the private certificate authority. The author contributed this fix back as PR #278, which adds support for user-trusted CAs and sets up GitHub Actions CI for Android builds.
This highlights a broader tension: security best practices (certificate pinning, strict validation) can conflict with legitimate private deployments. The author's approach—maintaining a private CA while accepting that Tailscale's encryption is sufficient—represents a pragmatic middle ground.
Operational Considerations and Costs
The monthly infrastructure costs are remarkably modest:
- LLM APIs: $22-36 (MiniMax + GLM + Claude Pro)
- Storage: Negligible (Backblaze B2 for a few MBs)
- Network: Free (Tailscale personal tier)
- Compute: Self-hosted (no cloud costs)
This cost structure makes the setup accessible for individual developers while providing enterprise-grade reliability through Kubernetes' self-healing capabilities.
Broader Implications
This architecture represents a maturation of AI-assisted development. We're moving beyond simple code completion toward autonomous agents that can handle multi-step tasks, interact with infrastructure, and operate across distributed environments. The key enablers aren't just better models, but better infrastructure for integrating them safely and efficiently.
The author's emphasis on "reclaiming dead time" through mobile access points to a fundamental shift in how we think about development productivity. Traditional metrics focus on lines of code or features shipped per hour. This approach optimizes for time utilization—turning previously lost moments into productive sessions.
The security model is equally significant. Rather than trying to make AI agents perfectly trustworthy, it accepts their fallibility and contains the consequences. This is more realistic than attempting to eliminate all risk, especially as agents become more capable and autonomous.
Getting Started
For those interested in exploring this pattern:
- Try the public server first: npm install -g happy-coder && happy provides immediate experience with the mobile workflow
- Explore the ecosystem: Happy documentation, Happy GitHub, and community Discord
- Start simple: Begin with one LLM provider and a basic Kubernetes setup (k3s or microk8s) before scaling to production-grade deployments
- Iterate: The author's setup evolved from simple to complex; start with what solves immediate needs
The Happy ecosystem demonstrates how open-source communities can rapidly iterate on infrastructure patterns that large companies might not prioritize. The result is a development environment that feels both cutting-edge and surprisingly practical—a combination that will likely define the next generation of software development tools.
For those seeking lighter alternatives, the community mentions HAPI as a potentially simpler option, though with fewer features than Happy's comprehensive approach.
The transformation described here isn't just about mobile access or AI assistance—it's about reimagining the developer's environment as a distributed, intelligent system that adapts to the developer's context rather than forcing the developer to adapt to the tool's limitations.
