Why Context Engineering Is Reshaping Agentic Software Architecture

Baruch Sadogursky explains how disciplined context artifacts turn LLMs into reliable coding partners, why micro‑services become the default unit of work, and what this means for architects, pricing, and migration strategies.

What changed – the rise of context‑driven AI agents

In the May 18, 2026 InfoQ podcast, Baruch Sadogursky (Tessl AI) argued that the real breakthrough in AI‑assisted development is not a bigger model but a systematic way of supplying the model with context artifacts – skills, rules, scripts, feedback loops and evaluation data. He contrasted this context engineering with the ad‑hoc “prompt hacking” that many teams still use. The key points are:

LLMs are powerful reasoning machines, but their stochastic nature makes them prone to hallucinations unless the intent is expressed unambiguously.
A specification becomes the single source of truth; generated code is treated as a disposable intermediate language that can be regenerated when the spec changes.
Agents must ask clarifying questions – a built‑in “question loop” – to achieve the same rigor architects have always applied manually.
Because current LLMs have limited context windows, the practical unit of generation is a micro‑service or small module; the architect orchestrates these pieces.
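The question loop and the spec-as-source-of-truth idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `Spec`, `ContextArtifact`, `question_loop` and `build_prompt` are hypothetical names invented here, not part of Tessl's tooling, and the `ask` callback stands in for whatever channel the agent uses to put questions to a human.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass(frozen=True)
class ContextArtifact:
    """One piece of external context (hypothetical shape, not a Tessl type)."""
    kind: str  # e.g. "skill", "rule", "script"
    name: str
    body: str


@dataclass
class Spec:
    """The specification: the single source of truth for one small module."""
    intent: str
    answers: dict[str, str] = field(default_factory=dict)


def open_questions(spec: Spec, required: list[str]) -> list[str]:
    """Return the required topics the spec has not answered yet."""
    return [topic for topic in required if topic not in spec.answers]


def question_loop(
    spec: Spec,
    required: list[str],
    ask: Callable[[str], str],
    max_rounds: int = 5,
) -> Spec:
    """Ask clarifying questions until no required topic is left open."""
    for _ in range(max_rounds):
        missing = open_questions(spec, required)
        if not missing:
            return spec
        for topic in missing:
            reply = ask(topic)
            if reply:  # an empty reply leaves the topic open
                spec.answers[topic] = reply
    remaining = open_questions(spec, required)
    if remaining:
        # Refuse to generate code from an ambiguous spec.
        raise ValueError(f"spec still ambiguous: {remaining}")
    return spec


def build_prompt(spec: Spec, artifacts: list[ContextArtifact]) -> str:
    """Assemble intent, clarifications and artifacts into one prompt string."""
    lines = [f"INTENT: {spec.intent}"]
    for topic, answer in sorted(spec.answers.items()):
        lines.append(f"CLARIFIED {topic}: {answer}")
    for artifact in artifacts:
        lines.append(f"[{artifact.kind}:{artifact.name}] {artifact.body}")
    return "\n".join(lines)
```

The design choice worth noting is that generation is blocked, not merely warned about, while required topics remain unanswered; code is only ever produced from a spec the loop considers complete.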

Provider comparison – how the major cloud AI platforms support (or hinder) context engineering

Feature	AWS Bedrock / Claude	Google Vertex AI / Gemini	Microsoft Azure OpenAI / GPT‑4o	Open‑source (e.g., Llama 3, Mistral)
Native support for external context	Retrieval‑augmented generation (RAG) via Amazon Kendra; requires custom code to inject skills/rules.	Retrieval‑augmented generation with Vertex Search and function calling; easier to bind scripts.	Chat‑completion with tool calling; Azure OpenAI now offers Prompt‑flow for artifact versioning.	No managed service; you must host a vector store and implement your own harness.
Artifact versioning & distribution	No built‑in package manager; teams use CodeArtifact or S3.	Vertex AI Model Registry can store artifacts, but not first‑class for skills.	Azure Machine Learning Model Registry supports versioned pipelines, but not a dedicated “skill store”.	Community projects like LangChain or LLM‑Ops provide registries.
Pricing model	Pay‑per‑token; extra cost for Kendra queries.	Pay‑per‑token + search‑index charges; function calls are free.	Pay‑per‑token; Azure OpenAI Studio adds a modest fee for tool‑use logging.	Free (compute cost only).
Migration considerations	Existing Java/Scala workloads on AWS Lambda can be wrapped with a Tessl‑compatible harness; need to rewrite context artifacts as Lambda layers.	Vertex AI pipelines can import existing Docker containers; moving specs requires re‑authoring in YAML for Cloud Build.	Azure Functions + Azure API Management can host micro‑services; you must map Tessl’s Intent Integrity Kit to Azure Logic Apps.	Open‑source stacks need self‑hosted vector DB (e.g., Qdrant) and CI pipelines for artifact publishing.
Strengths for agentic architecture	Strong enterprise security, IAM integration – good for regulated industries.	Tight integration with BigQuery and Dataflow – useful when specs contain data‑centric rules.	Excellent developer tooling (VS Code extensions, Azure DevOps); good for teams already on Microsoft stack.	Full control over model size and context window – ideal for experimental micro‑service granularity.
Weaknesses	Recent Claude models accept roughly 200 k tokens of context, but long prompts raise cost and latency, and chaining multiple services adds further latency.	Recent Gemini models accept very large contexts (on the order of 1 M tokens), but pricing can become opaque with heavy RAG usage.	GPT‑4o’s token limit (~128 k) is generous, but cost escalates quickly for large spec‑driven pipelines.	Requires significant ops effort; no SLA for model updates.

Bottom line: If you already run workloads on a single cloud, use that provider’s RAG + tool‑calling features and add a lightweight harness (e.g., Tessl’s Intent Integrity Kit) to manage skills, rules and scripts. For multi‑cloud or highly regulated environments, a hybrid approach – host the LLM on a managed service but keep the context artifacts in a private artifact registry (e.g., Azure Artifacts or AWS CodeArtifact) – gives the best control.
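The hybrid approach depends on context artifacts being versioned and immutable once published. A minimal in-memory sketch of that idea follows. `ArtifactRegistry` and its methods are hypothetical names; a real setup would delegate storage to a registry such as Azure Artifacts or AWS CodeArtifact, whose APIs are not shown here.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class PublishedArtifact:
    """A skill, rule or script pinned to a version and a content digest."""
    name: str
    version: str
    content: str
    digest: str


class ArtifactRegistry:
    """In-memory stand-in for a private artifact registry (illustrative only)."""

    def __init__(self) -> None:
        self._store: dict[tuple[str, str], PublishedArtifact] = {}

    def publish(self, name: str, version: str, content: str) -> PublishedArtifact:
        """Store an artifact; a published version can never change content."""
        digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
        existing = self._store.get((name, version))
        if existing is not None:
            if existing.digest != digest:
                raise ValueError(f"{name}@{version} is already published with different content")
            return existing  # re-publishing identical content is a no-op
        artifact = PublishedArtifact(name, version, content, digest)
        self._store[(name, version)] = artifact
        return artifact

    def resolve(self, name: str, version: str) -> PublishedArtifact:
        """Look up an exact version, as a harness would before building a prompt."""
        try:
            return self._store[(name, version)]
        except KeyError:
            raise LookupError(f"{name}@{version} not found") from None

    def versions(self, name: str) -> list[str]:
        """List published versions of one artifact (plain string sort)."""
        return sorted(v for (n, v) in self._store if n == name)
```

Pinning each artifact to a digest means a regenerated service can record exactly which rules and skills it was built from, which is what makes the audit trail discussed later in this article possible.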

Business impact – what this means for architects, budgets and migration plans

Shift‑left quality assurance – By evaluating the spec before any code is emitted, teams can catch design flaws early, reducing costly re‑work. In pilot projects reported by Tessl, defect density dropped from 1.2 defects/KLOC to 0.3 defects/KLOC after introducing context‑driven prompts.
Cost predictability – Since code is regenerated on demand, you no longer pay for long‑running build pipelines. Token‑based pricing becomes the primary cost driver; budgeting around spec size (average 2 k tokens per micro‑service) is far more stable than traditional CI compute.
Migration path – Existing monoliths can be decomposed incrementally:
- Extract a bounded context into a spec file (Markdown or YAML).
- Create a corresponding skill set (e.g., “CRUD‑service”, “auth‑policy”).
- Use the chosen cloud’s RAG + tool‑calling to generate the micro‑service.
- Orchestrate with a lightweight service mesh (e.g., Istio or Linkerd) while the architect maintains the overall topology.
Governance and liability – The three‑layer model (model, harness, external context) clarifies responsibility. Errors traced to a missing rule belong to the context engineering team; model‑level hallucinations remain the provider’s liability. Companies can now embed audit logs (automatically captured by the harness) into their compliance pipelines.
Talent strategy – Architects become context curators rather than code writers. Upskilling pathways focus on:
- Defining context artifacts (skills, rules, scripts).
- Building evaluation suites for non‑deterministic outputs.
- Managing micro‑service orchestration platforms.
This shift reduces reliance on senior developers for manual code reviews and opens career tracks for domain experts who excel at specification writing.
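To make the cost-predictability point concrete, here is a rough budgeting sketch. The 2 k-token average spec size comes from the text above; everything else (context and output token counts, the per-1,000-token prices, the function and class names) is a placeholder assumption and not a real provider rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TokenPricing:
    """Placeholder prices per 1,000 tokens; NOT real provider rates."""
    input_per_1k: float
    output_per_1k: float


def regeneration_cost(
    services: int,
    spec_tokens: int,
    context_tokens: int,
    output_tokens: int,
    regenerations: int,
    pricing: TokenPricing,
) -> float:
    """Estimate the token cost of regenerating every service `regenerations` times."""
    if min(services, spec_tokens, context_tokens, output_tokens, regenerations) < 0:
        raise ValueError("all counts must be non-negative")
    calls = services * regenerations
    # Each call sends the spec plus its context artifacts and receives generated code.
    input_total = calls * (spec_tokens + context_tokens)
    output_total = calls * output_tokens
    return (input_total / 1000) * pricing.input_per_1k + (output_total / 1000) * pricing.output_per_1k


# Example call with made-up numbers: 10 services, 2 k-token specs,
# 1 k tokens of generated code each, regenerated once.
example = regeneration_cost(
    services=10,
    spec_tokens=2_000,
    context_tokens=0,
    output_tokens=1_000,
    regenerations=1,
    pricing=TokenPricing(input_per_1k=1.0, output_per_1k=2.0),
)
```

The estimate scales linearly with the number of services and regenerations, which is why spec size and regeneration frequency are the figures to watch when budgeting.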

Getting started

Read the official Tessl documentation – it explains how to package skills and rules as versioned artifacts: https://tessl.io/docs.
Experiment with the Intent Integrity Kit – a minimal open‑source demo is on GitHub: https://github.com/tessl/intent-integrity-kit.
Pick a cloud provider and enable its RAG service; then wire a simple spec → skill → micro‑service pipeline using the provider’s SDKs.
Measure the spec‑to‑code quality with a tool like SonarQube; iterate on the context until the generated code meets your quality gate.
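The last step, iterating on the context until generated code passes a quality gate, can be sketched as a loop. Because model output is non-deterministic, the sketch samples several generations and gates on the pass rate. `generate_with`, `check` and `refine` are hypothetical callables standing in for the model call, the quality tool (for example a SonarQube scan) and a human or automated context edit; the 0.9 gate and the run counts are arbitrary assumptions.

```python
from typing import Callable


def pass_rate(generate: Callable[[], str], check: Callable[[str], bool], runs: int) -> float:
    """Generate `runs` samples and return the fraction that pass `check`."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    passed = sum(1 for _ in range(runs) if check(generate()))
    return passed / runs


def iterate_until_gate(
    context: list[str],
    generate_with: Callable[[list[str]], str],
    check: Callable[[str], bool],
    refine: Callable[[list[str]], list[str]],
    gate: float = 0.9,
    runs: int = 10,
    max_iterations: int = 5,
) -> tuple[list[str], float, int]:
    """Refine the context until the sampled pass rate reaches the gate.

    Returns the final context, its pass rate and the iteration that passed.
    """
    for iteration in range(1, max_iterations + 1):
        current = context  # bind now so the lambda sees this iteration's context
        rate = pass_rate(lambda: generate_with(current), check, runs)
        if rate >= gate:
            return context, rate, iteration
        # Gate missed: adjust the context (add a rule, tighten a skill) and retry.
        context = refine(context)
    raise RuntimeError("quality gate not met within max_iterations")
```

Note that the loop changes only the context, never the generated code itself, which matches the article's framing of code as a disposable output of the spec.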

Outlook

Sadogursky predicts that within three years the orchestration layer itself will become an autonomous agent capable of composing micro‑services without human intervention. Until then, the architecture stack resembles a classic three‑tier model – specs (source of truth) → AI‑generated services → human‑orchestrated integration – but the economics and risk profile have fundamentally changed.