Fifteen months after the first reasoning models, staff engineers now rely on fast, self‑correcting agents for most code changes, bug hunting, testing, and local setup, while still keeping human judgement for communication and UI work.
## From Occasional Helper to Daily Partner
Back in early 2025, OpenAI’s o1 was the best reasoning model available. Agents could suggest a line or two, but they often got stuck, required constant supervision, and produced noisy explanations. Fast‑forward to mid‑2026 and the picture is very different: a staff engineer now starts every change by asking an LLM agent to solve the problem, reviews the result in seconds, and pushes the PR after a single editing pass.

## What the Engineer Does With AI Today
| Category | How it’s used | Why it matters |
|---|---|---|
| Code changes | The Copilot CLI (or the GitHub Copilot desktop app) is invoked from a terminal tab. The agent writes the full pull request across one or many repos, then the engineer does a quick 30‑second sanity check before a deeper review if needed. | Turns what used to be a half‑day of manual edits into a matter of minutes, freeing time for design work. |
| Bug investigation | Every bug report is pasted into a fresh agent session. The model can cross‑reference logs, stack traces, and even multiple repositories, delivering a diagnosis about 80% of the time. | Reduces the number of dead‑end investigations; the engineer still narrows the search space and validates the final answer. |
| Testing & setup | Agents generate unit tests automatically, run integration tests on a temporary dev server, and even execute bash commands to fix local config issues (e.g., a broken nvm switch). | Replaces a lot of Googling and manual script writing, especially for repetitive setup tasks. |
| Learning & research | Quick follow‑up questions about unfamiliar tech (e.g., Unity, Kafka) are asked in a chat window. The model provides concise explanations and points to relevant docs. | Keeps the learning curve shallow without leaving the terminal. |
| Proofreading | Draft blog posts or long‑form documentation are run through GPT‑5.5 for grammar and style tweaks. | Improves readability while preserving the author’s voice. |
## What Still Stays Human‑Only
- Public communication – PR descriptions, ADRs, issue bodies, and Slack messages are written by the engineer. Agents tend to over‑explain and miss the nuanced “core idea” that reviewers expect.
- UI testing – Visual regressions and subtle interaction bugs are still inspected manually; agents aren’t trusted with look‑and‑feel judgments.
- Critical production code – Even when an agent produces a perfect diff, the engineer performs a thorough review before merging.
## The New Workflow in Practice
1. Open a terminal tab and start a Copilot CLI session.
2. Paste the problem statement (bug report, feature request, or refactor description).
3. Let the agent generate a PR – the diff may span several files and repositories.
4. Skim the diff (≈30 s). If it looks reasonable, move to step 5; otherwise reject and ask the agent to retry with clarified constraints.
5. Deep review – run the tests the agent added, verify edge cases, and ensure the change aligns with the design intent.
6. Merge – push the PR, add a hand‑written description, and notify reviewers.
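The skim and deep-review steps boil down to a short terminal loop. A minimal sketch in bash, assuming the agent pushed its change to a branch (the helper name `review_agent_pr` and the branch naming are hypothetical, and an npm test script is assumed):

```shell
# Hypothetical helper wrapping the skim + deep-review steps. Assumes the
# agent's change lives on a local branch and the project runs tests via npm.
review_agent_pr() {
  local branch="$1"                    # e.g. "agent/fix-123" (hypothetical)
  git switch "$branch" || return 1     # check out the agent's branch
  git diff "main...$branch" --stat     # quick skim: which files, how big
  npm test                             # deep review: run the tests it added
}
```

In practice the `--stat` skim decides whether the deep review is worth doing at all; a failing `npm test` sends the problem back to the agent with more context.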
For tough bugs the engineer may go through five or six agent attempts before accepting one. Each iteration is guided by additional context: logs, Slack threads, or a local reproduction of the issue. The final successful session often feels like a partnership – the model solves the narrowed‑down problem that the engineer set up.
## Testing Becomes Cheap
Because agents now write comprehensive unit tests by default, the engineer treats test creation as a low‑cost experiment. If a test looks useful and isn’t flaky, it is added without a second thought. The engineer still reads the generated tests for obvious mistakes, but the tolerance for minor quirks is higher than for production code.
## Local Setup as a Search‑Replace Task
When a developer’s environment misbehaves (e.g., nvm not switching Node versions), the engineer opens a Copilot CLI session and asks the agent to diagnose and fix it. The model runs the necessary bash commands, edits config files, and confirms the fix – essentially a faster, more reliable alternative to a Google search.
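A typical fix of that shape can be sketched in bash, under the assumption that the root cause is a missing nvm loader line in the shell profile (one common reason nvm stops switching versions):

```shell
# Diagnose: nvm is a shell function, so `command -v nvm` only succeeds
# after the loader has been sourced in the current shell.
if ! command -v nvm >/dev/null; then
  echo "nvm not loaded -- re-adding the loader to ~/.bashrc"
  # Standard loader lines from the nvm README (bash assumed).
  cat >> ~/.bashrc <<'EOF'
export NVM_DIR="$HOME/.nvm"
[ -s "$NVM_DIR/nvm.sh" ] && . "$NVM_DIR/nvm.sh"
EOF
fi

# Verify in a fresh shell: `nvm use` reads the project's .nvmrc,
# then `node -v` confirms the expected Node version is active.
```

The actual commands the agent runs vary with the root cause; the point is that it can both diagnose (inspect the shell state) and remediate (edit the profile) in one session.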
## Balancing Act: When to Trust the Agent
The engineer warns against two extremes:
- Under‑utilisation – refusing to let agents handle bug triage or test generation leaves a lot of low‑risk work on the human backlog.
- Over‑utilisation – delegating public communication or large UI rewrites to an LLM can erode trust with teammates and introduce subtle bugs.
Finding the sweet spot means constantly asking, “Is this a low‑risk, repeatable task that the agent can finish without my deep involvement?” If the answer is yes, the agent gets the job; if not, the engineer steps in.
## A Real‑World Example
An internal request needed the actions/ai-inference GitHub Action to run with Copilot‑backed inference. Previously the engineer would have marked it as “low priority” and left it for weeks. This time, a single agent session produced a working implementation, the engineer added a brief description, and the change shipped within a day. The output wasn’t perfect, but it was good enough to move the project forward quickly.
## Looking Ahead
The core responsibilities of a staff engineer—shipping projects, exercising judgement, influencing technical direction—haven’t changed. What has changed is the amount of low‑risk, repetitive work that can be offloaded to an LLM. By embracing agents for code generation, bug hunting, testing, and local setup, engineers can say “yes” to more small‑scale improvements without sacrificing quality.
If you found this perspective useful, consider sharing it on Hacker News or subscribing for future updates.
