Three Clients, One Auth System: Hard Lessons From Shipping OAuth and AI Under Deadline

A backend engineer's field notes from two internship tasks: building a single GitHub OAuth design that serves a web portal, a CLI, and automated graders, then wrapping an LLM in enough validation that a marketing pipeline stayed predictable for an entire team.

Most authentication tickets read like a single line of work. "Implement OAuth." The trouble starts the moment more than one kind of client needs to log in, because a browser, a command-line tool, and an automated grader each have different threat models, different storage constraints, and different ideas about where a token is allowed to live. Two recent tasks from an HNG internship, one solo and one on a team, make a useful pair of case studies in what actually breaks when you push auth and AI into production rather than a demo.

The problem: one identity system, three runtimes

The first project, Insighta Labs, is a queryable profile-intelligence API. By the time it grew past basic CRUD, the backend needed GitHub OAuth with PKCE, JWT access and refresh tokens with rotation, role-based access control separating admin from analyst, rate limiting, API versioning, and three first-class clients that all had to authenticate. The API issued tokens and set cookies. A React web portal authenticated with HTTP-only cookies plus CSRF protection. A CLI used PKCE with a local callback and Bearer tokens. Every /api/* route demanded an X-API-Version: 1 header and a valid session.

The token lifetimes were deliberately brutal: access tokens expired in three minutes, refresh tokens in five, with rotation on every refresh. That is far too aggressive for a production system serving real users, but as a forcing function during development it is excellent. Short TTLs surface refresh bugs immediately instead of letting them hide behind a token that stays valid all day. If your rotation logic is broken, you find out within minutes of logging in, not when a customer's session mysteriously dies a week later.
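To make the rotation rule concrete, here is a minimal sketch of refresh-token rotation with the three- and five-minute lifetimes described above. The in-memory RefreshTokenStore class, its method names, and the use of random UUIDs as refresh tokens are assumptions for illustration. The project's real storage and JWT signing are not shown in these notes, so only expiry times are modeled.

```typescript
import { randomUUID } from "node:crypto";

// Lifetimes from the text: 3-minute access tokens, 5-minute refresh tokens.
const ACCESS_TTL_MS = 3 * 60 * 1000;
const REFRESH_TTL_MS = 5 * 60 * 1000;

interface RefreshRecord {
  userId: string;
  expiresAt: number; // epoch milliseconds
}

interface TokenPair {
  accessExpiresAt: number;
  refreshToken: string;
  refreshExpiresAt: number;
}

// Hypothetical in-memory store; a real service would persist these records.
class RefreshTokenStore {
  private readonly active = new Map<string, RefreshRecord>();

  issue(userId: string, now: number): TokenPair {
    const refreshToken = randomUUID();
    const refreshExpiresAt = now + REFRESH_TTL_MS;
    this.active.set(refreshToken, { userId, expiresAt: refreshExpiresAt });
    return { accessExpiresAt: now + ACCESS_TTL_MS, refreshToken, refreshExpiresAt };
  }

  // Rotation: the presented token is consumed on every attempt, so a replayed
  // or expired token can never be exchanged a second time.
  rotate(presented: string, now: number): TokenPair | null {
    const record = this.active.get(presented);
    if (!record) return null; // unknown or already rotated
    this.active.delete(presented);
    if (record.expiresAt <= now) return null; // expired
    return this.issue(record.userId, now);
  }
}
```

Passing the clock in as an argument keeps the sketch testable: a broken rotation path shows up as a null result in a unit test rather than as a dead session five minutes into a manual login.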

The underlying constraint is a classic distributed-identity problem. Reviewers and real users had to prove who they were without sharing a single login mechanism. Browsers must never see raw tokens in JavaScript, which is the entire argument for HTTP-only cookies. A CLI cannot follow cookie redirects the way a single-page app does, so it needs the PKCE authorization-code flow with a loopback callback. And automated graders needed a deterministic test path that did not depend on a live round trip to GitHub's OAuth servers. One design, three execution environments, and a hard requirement that none of them be the "works on my machine" special case.

The solution: explicit paths, not a generic handler

The instinct to write one clever login handler that branches on flags is the wrong one here. The cleaner approach splits auth into separate, documented flows that happen to share a controller.

The web flow starts at GET /auth/github, where the server stores the PKCE verifier and redirects to GitHub. The callback at GET /auth/github/callback exchanges the code, sets HTTP-only cookies, and redirects to a configured success URL. A separate GET /auth/csrf-token endpoint hands the portal a double-submit token, which it then attaches as X-CSRF-Token on any unsafe method.

The CLI flow is structurally different. insighta login begins PKCE with a code_challenge and spins up a local http://127.0.0.1:<port>/callback listener. A POST /auth/github/token endpoint completes the exchange and returns JSON tokens rather than setting cookies, and the CLI persists those credentials at ~/.insighta/credentials.json, refreshing before expiry. The grader path adds an optional stub exchange for a test code, so automated checks can obtain real JWTs without ever touching GitHub.
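The PKCE half of both flows comes down to two small values: a random verifier that stays with the initiating side, and a SHA-256 challenge sent with the authorization request. The following is a minimal sketch using Node's built-in crypto module. The function names createCodeVerifier and codeChallengeS256 are hypothetical, and the S256 method is an assumption, since the notes do not say which challenge method the project used.

```typescript
import { createHash, randomBytes } from "node:crypto";

// RFC 7636 requires a verifier of 43 to 128 characters from the unreserved set.
// 32 random bytes encode to exactly 43 base64url characters (no padding).
function createCodeVerifier(): string {
  return randomBytes(32).toString("base64url");
}

// S256 challenge: base64url(SHA-256(ASCII(verifier))), without padding.
function codeChallengeS256(verifier: string): string {
  return createHash("sha256").update(verifier, "ascii").digest("base64url");
}
```

The verifier never leaves the side that generated it until the token exchange, which is what lets a CLI with no client secret prove that it started the login it is now completing.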

Middleware ordering carried real weight. Rate limits sat on /auth, JWT validation on /api, and CSRF enforcement applied only when a request actually used cookies. Bearer-only CLI calls skip CSRF entirely, because there is no cookie to forge against. Getting this order wrong produces failures that look like logic bugs but are really pipeline-ordering bugs, which are much harder to reason about after the fact.
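The rule that CSRF applies only to cookie-authenticated, unsafe requests can be written as one pure decision function, which makes the ordering easier to test than a stack of middleware. This is a sketch under assumptions: the RequestView shape, the checkCsrf name, and the cookie-versus-header comparison are illustrative, since the notes say only that the portal uses a double-submit token sent as X-CSRF-Token.

```typescript
import { timingSafeEqual } from "node:crypto";

const SAFE_METHODS = new Set(["GET", "HEAD", "OPTIONS"]);

// Hypothetical, framework-free view of the parts of a request that matter here.
interface RequestView {
  method: string;
  hasSessionCookie: boolean; // true when the request authenticates via cookie
  csrfCookie?: string;       // token the server handed out earlier
  csrfHeader?: string;       // value of X-CSRF-Token sent by the portal
}

type CsrfDecision = "skip" | "pass" | "reject";

function checkCsrf(req: RequestView): CsrfDecision {
  // Bearer-only calls carry no cookie, so there is nothing to forge against.
  if (!req.hasSessionCookie) return "skip";
  if (SAFE_METHODS.has(req.method.toUpperCase())) return "skip";
  if (!req.csrfCookie || !req.csrfHeader) return "reject";

  // Double-submit check: header must equal cookie, compared in constant time.
  const expected = Buffer.from(req.csrfCookie);
  const received = Buffer.from(req.csrfHeader);
  if (expected.length !== received.length) return "reject";
  return timingSafeEqual(expected, received) ? "pass" : "reject";
}
```

A middleware wrapper would map "reject" to a 403 and let the other two outcomes through, which keeps the pipeline-ordering question separate from the decision logic.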

The trade-offs, written down as failure modes

The most valuable artifact from this work was not the code but the catalog of things that broke, because nearly every one was configuration rather than logic.

The OAuth callback URL was registered against the portal hostname instead of the API. After a successful GitHub login, users hit a blank error page. The fix is a single canonical callback on the API; the portal is only ever the post-login redirect target, never the OAuth callback itself. Obvious in hindsight, genuinely painful at one in the morning.

CORS combined with credentials produced silent failures. With credentials: true, the browser rejects a wildcard and requires Access-Control-Allow-Origin to match the requesting origin exactly. A missing preview-deployment URL, or a stray trailing slash in a configured origin, fails the preflight before any route handler runs, so the real request is never sent and the UI shows nothing useful. The resolution was a comma-separated allowlist with no trailing slashes and careful origin echoing on auth routes.
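The allowlist fix can be sketched as two small functions: one that normalizes the comma-separated configuration value, and one that decides which origin, if any, to echo back. The names parseAllowlist and allowedOrigin and the example hostnames are hypothetical. The project's actual configuration variable is not named in these notes.

```typescript
// Split a comma-separated list, trim whitespace, and drop trailing slashes so
// that "https://portal.example.com/" cannot silently fail an exact match.
function parseAllowlist(raw: string): Set<string> {
  return new Set(
    raw
      .split(",")
      .map((origin) => origin.trim().replace(/\/+$/, ""))
      .filter((origin) => origin.length > 0),
  );
}

// Return the origin to echo in Access-Control-Allow-Origin, or null to send
// no CORS headers at all. Credentialed requests cannot use "*".
function allowedOrigin(origin: string | undefined, allowlist: Set<string>): string | null {
  if (!origin) return null;
  return allowlist.has(origin) ? origin : null;
}
```

Normalizing at parse time means a trailing slash in an environment variable is corrected once at startup instead of surfacing as a blocked preflight in someone's browser.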

CSRF bit the cookie sessions exactly as designed: reads worked, but a DELETE returned 403 until the portal started fetching and sending the CSRF token. The CLI, being Bearer-only, was correctly unaffected. And on the hosting platform, rate limits initially throttled everyone as one client because the limiter saw only the proxy's IP. Setting Express's trust proxy option to the number of proxy hops let it read the real client IP from X-Forwarded-For.
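Why the hop count matters is easier to see with the address chain written out. The function below is an illustrative approximation of hop-count-based resolution, not Express's own code. The name resolveClientIp and the sample addresses are assumptions.

```typescript
// Build the chain from nearest to farthest: the socket peer first, then the
// X-Forwarded-For entries read right to left. Trusting N hops means skipping
// the N nearest addresses and taking the next one as the client.
function resolveClientIp(
  socketAddr: string,
  xForwardedFor: string | undefined,
  trustedHops: number,
): string {
  const forwarded = (xForwardedFor ?? "")
    .split(",")
    .map((part) => part.trim())
    .filter((part) => part.length > 0);
  const chain = [socketAddr, ...forwarded.reverse()];
  const index = Math.min(Math.max(trustedHops, 0), chain.length - 1);
  return chain[index];
}
```

With zero trusted hops every request resolves to the proxy's own address, which is the "everyone is one client" symptom. With one trusted hop the limiter keys on the address the proxy itself recorded, and any extra entries a client prepends to the header are ignored.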

The through-line: auth is a product surface, not an implementation detail. If three clients cannot log in reliably, the API does not ship regardless of how clean the controllers look. Writing the failure modes into the README means the next engineer, or the next grader, does not rediscover them at the same hour of the morning.

The second problem: an LLM that ignores your schema

The team task, a marketing microservice for a product called SEIL inside the larger Flowbrand codebase, swapped one hard problem for another. Here the dependency that refused to follow the spec was not a configuration file but a language model.

The service is a NestJS API that accepts business documents (pitch decks, one-pagers, notes) and returns a structured marketing funnel. The flow is a pipeline: register or log in for a JWT; POST a PDF or DOCX under five mebibytes to an upload endpoint that extracts text and returns an upload ID; poll a progress endpoint until the extraction is ready; then call a generate endpoint where Claude produces four stages: awareness, engagement, conversion, and retention. The result persists in PostgreSQL.

The stack pairs NestJS 11 and TypeORM with pdf-parse and mammoth for text extraction, Anthropic's Claude for generation, and Swagger for the contract. The architectural decision that paid off most was separating upload from generation. Text extraction runs in the upload handler, and generation refuses any upload that is not yet marked ready. That split keeps slow model calls out of multipart file handling and gives the frontend a clean poll endpoint to drive a progress UI, instead of holding a single request open across both a file parse and an LLM round trip.

The solution: put a bouncer in front of the model

The central insight is that a system prompt is not a contract. Telling Claude to "return raw JSON only" works most of the time, which is precisely the problem, because the failures are intermittent and arrive in production. The model occasionally wrapped its output in markdown code fences, and JSON.parse choked on the backticks, surfacing as a 502 with an unhelpful parse error in the logs.

The fix was a small coercion layer: a function that strips code fences before parsing, followed by a validator that enforces exactly four non-empty strings and rejects anything else. Edge cases got unit tests. The model is allowed to fill four string slots and nothing more. The product defines the funnel structure; the model does not get to invent nested objects or essay-length markdown. Validation at the boundary, not hope in the prompt, is what makes an LLM-backed endpoint safe to expose to teammates.
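A coercion layer of this kind fits in a few dozen lines. The sketch below is an assumption-laden reconstruction, not the team's code: the function names stripCodeFences and parseFunnel are hypothetical, and the four JSON keys are assumed to match the stage names given earlier.

```typescript
// The fence marker is built at runtime so this example stays easy to embed.
const FENCE = "`".repeat(3);
const FENCED = new RegExp("^" + FENCE + "[a-zA-Z]*\\s*([\\s\\S]*?)\\s*" + FENCE + "$");

const STAGES = ["awareness", "engagement", "conversion", "retention"] as const;
type Stage = (typeof STAGES)[number];
type Funnel = Record<Stage, string>;

// Remove one surrounding markdown code fence (with optional language tag).
function stripCodeFences(raw: string): string {
  const trimmed = raw.trim();
  const match = FENCED.exec(trimmed);
  return match ? match[1] : trimmed;
}

// Accept exactly four non-empty string fields; throw on anything else.
function parseFunnel(raw: string): Funnel {
  let parsed: unknown;
  try {
    parsed = JSON.parse(stripCodeFences(raw));
  } catch {
    throw new Error("Model output is not valid JSON");
  }
  if (typeof parsed !== "object" || parsed === null || Array.isArray(parsed)) {
    throw new Error("Model output must be a JSON object");
  }
  const obj = parsed as Record<string, unknown>;
  if (Object.keys(obj).length !== STAGES.length) {
    throw new Error("Model output must contain exactly four fields");
  }
  const funnel = {} as Funnel;
  for (const stage of STAGES) {
    const value = obj[stage];
    if (typeof value !== "string" || value.trim().length === 0) {
      throw new Error("Stage '" + stage + "' must be a non-empty string");
    }
    funnel[stage] = value;
  }
  return funnel;
}
```

The controller can catch the thrown error and translate it into a structured upstream failure, so a malformed model response becomes a known error code rather than a raw parse exception in the logs.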

The rest of the hardening followed the same philosophy of making failure legible. Calling generate before the upload finished returned an explicit 422 carrying the upload ID and status, so the frontend could branch on a known code rather than guess. A missing API key in staging produced a dedicated 503 with an AI_NOT_CONFIGURED code, telling reviewers instantly that the problem was environment, not logic. A global validation pipe with whitelisting rejected unknown fields early as clean 400s instead of letting malformed payloads turn into mysterious 500s deeper in the stack.
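The precondition checks on the generate endpoint can be sketched as a single function that returns either null or a structured error. The status codes and the UPLOAD_NOT_READY and AI_NOT_CONFIGURED codes come from the project. The UploadRecord and ApiError shapes, the status names, and the checkGeneratePreconditions function are assumptions for illustration, written without NestJS so the logic stands alone.

```typescript
// Hypothetical upload states; the text only distinguishes "ready" from not.
type UploadStatus = "processing" | "ready" | "failed";

interface UploadRecord {
  id: string;
  status: UploadStatus;
}

interface ApiError {
  status: number;
  code: string;
  message: string;
  details?: Record<string, string>;
}

// Returns null when generation may proceed, otherwise a legible error.
function checkGeneratePreconditions(
  upload: UploadRecord,
  apiKey: string | undefined,
): ApiError | null {
  // Environment problem: say so explicitly instead of failing inside the SDK.
  if (!apiKey || apiKey.trim().length === 0) {
    return {
      status: 503,
      code: "AI_NOT_CONFIGURED",
      message: "The AI provider API key is not configured",
    };
  }
  // Client called generate too early: include what it needs to keep polling.
  if (upload.status !== "ready") {
    return {
      status: 422,
      code: "UPLOAD_NOT_READY",
      message: "Text extraction has not finished for this upload",
      details: { uploadId: upload.id, status: upload.status },
    };
  }
  return null;
}
```

Because the error is plain data, the frontend can branch on the code field and a reviewer can tell an environment problem from a sequencing problem without reading server logs.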

The trade-off the team chose to make honestly

One failure mode is worth singling out because the response was to scope down rather than paper over. Uploads worked until a redeploy, after which the upload ID still existed in the database but the file was gone. The hosting platform uses an ephemeral disk. The honest answer was to document that local filesystem storage is fine for a demo but that production needs object storage or a persistent volume, and to ship the MVP with that limitation stated plainly rather than pretend the filesystem was durable. Pretending a constraint does not exist is how you ship a system that fails the first time someone trusts it.

On a team, your API errors are user experience. Structured error codes like UPLOAD_NOT_READY and AI_NOT_CONFIGURED resolve at a glance what would otherwise become a thread of duplicate bug reports, most of which turn out to be a misconfigured base URL anyway. Owning a bounded slice end to end (one repository with its own docs, tests, deploy, and a live URL teammates can hit) is what lets parallel work actually converge instead of collapsing into a single repo nobody can integrate.

What connects the two

The solo task was an exercise in surviving your own complexity: making three authentication flows coexist without leaking tokens or special cases. The team task was the inverse, making complexity survivable for other people, frontend developers and reviewers and an AI model sitting in the middle of the pipeline, all consuming the same boring, predictable contract. Both reduce to the same systems discipline. The interesting failures cluster at the boundaries, in callback URLs and CORS preflights and the gap between what a model promises and what it returns, and the engineering that holds up is the engineering that writes those boundaries down and enforces them with code rather than good intentions.

#OAuth #API Design #LLMs #Authentication #error handling