Infrastructure as Code is evolving from manual scripting to AI-driven intent-based provisioning, where agents encode standards, reduce drift, and integrate seamlessly into existing workflows—transforming how teams build and maintain cloud environments.
A few years ago, I wrote a blog post on whether natural language Infrastructure as Code (IaC) was the future of cloud automation. At the time, it felt a little speculative. Interesting, promising – but not quite real yet. Fast forward to now, and that future feels a lot closer.
This post is a follow-on. Not a prediction, but a reflection on where we are today – and where I feel IaC is clearly heading when you design it with AI in mind.
IaC worked… until it didn’t (quite)
Infrastructure as Code was supposed to bring consistency and confidence. And for a while, it absolutely did. But as teams grew, platforms scaled, and everyone copied that one module “that worked last time”, things got messy. Not broken. Just… messy.
- Five ways to create an Azure VNet
- Three tagging strategies
- Security rules that exist in docs but not in code
Now AI has entered the chat. And honestly? This might be the biggest shift IaC has seen since we stopped clicking around in cloud portals.
Standardisation without the arguments
Let’s start with the big one: standardisation across teams. Traditionally, this meant documentation. Wiki pages. Architecture diagrams nobody updated. And that one person who just knows how things should be done.
AI changes this. Instead of telling teams how to write IaC, we can encode standards directly into agent skills. Think of agent skills as reusable capabilities that understand your rules, patterns, and guardrails.
So instead of saying: “Please follow naming convention A, tagging standard B, and security rule C…”
You get: “Generate infrastructure that already follows A, B, and C.”
No debate. No drift. No “but my service is special”.
That’s standardisation without the friction.
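To make that concrete, here’s a minimal sketch of standards encoded as data a skill can consume. The names (STANDARDS, render_skill_context) and the naming convention itself are hypothetical; the real mechanics depend on your agent framework:

```python
# Hypothetical example: platform standards captured as data a skill can consume,
# rather than prose in a wiki page.
import re

STANDARDS = {
    # Naming convention "A": <env>-<workload>-<resource>, e.g. prod-payments-vnet
    "naming_pattern": r"^(dev|test|prod)-[a-z0-9]+-[a-z0-9]+$",
    # Tagging standard "B": every resource carries these tags
    "mandatory_tags": ["owner", "cost_centre", "environment"],
    # Security rule "C": no public IPs on workload subnets
    "security_rules": ["deny_public_ip_on_workload_subnets"],
}

def render_skill_context(standards: dict) -> str:
    """Turn the standards into instructions an agent receives on every request."""
    return "\n".join([
        f"All resource names must match: {standards['naming_pattern']}",
        f"All resources must carry tags: {', '.join(standards['mandatory_tags'])}",
        f"Enforce security rules: {', '.join(standards['security_rules'])}",
    ])

def check_name(name: str) -> bool:
    """The same data doubles as a validation rule."""
    return re.fullmatch(STANDARDS["naming_pattern"], name) is not None

print(render_skill_context(STANDARDS))
print(check_name("prod-payments-vnet"))   # True
print(check_name("MyTestVNet"))           # False
```

The useful property: one source of truth both steers generation and validates the result.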
From “writing code” to “describing intent”
Most teams still work the same way today: open an editor, write Terraform (or Bicep), push a PR, hope you didn’t miss anything.
The future looks quieter than that. You start by describing what you want – in plain language – and an agent that understands your stack, policies, and quirks proposes the change.
Something like: “Spin up a production-ready environment that matches payments-prod, but with feature flags enabled and tighter cost alerts.”
Instead of hunting through docs or Slack threads, the agent already knows your naming conventions, tagging strategy, and security baselines. They’re encoded as skills and policies, not tribal knowledge.
Have you ever wished you could just explain a change the way you would to a teammate and skip the boilerplate? That’s the direction this is heading.
Standardisation without killing creativity
Standardisation used to mean someone sharing a 40-page “platform guidelines” PDF and quietly hoping people read it.
With AI-native IaC, standards move from static documents into living, enforceable skills and checklists that sit directly in the path of every change.
Instead of: “Please remember to tag your resources correctly.”
You get: “This plan will be rejected because three resources are missing mandatory tags – here’s the patch to fix them.”
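As a sketch of how that rejection can actually happen: Terraform can emit its plan as JSON (terraform show -json), and a small check can fail the pipeline when mandatory tags are missing. The tag names here are assumptions:

```python
# Sketch: reject a Terraform plan when resources are missing mandatory tags.
# Produce the input with:
#   terraform plan -out=plan.out && terraform show -json plan.out > plan.json
import json
import sys

MANDATORY_TAGS = {"owner", "cost_centre", "environment"}  # assumed standard

def missing_tags(plan_path: str) -> list[tuple[str, set[str]]]:
    with open(plan_path) as f:
        plan = json.load(f)
    failures = []
    for rc in plan.get("resource_changes", []):
        after = (rc.get("change") or {}).get("after") or {}
        if not isinstance(after, dict):
            continue
        tags = after.get("tags") or {}
        missing = MANDATORY_TAGS - set(tags)
        # Only flag resources that actually support tags
        if "tags" in after and missing:
            failures.append((rc["address"], missing))
    return failures

if __name__ == "__main__":
    problems = missing_tags(sys.argv[1] if len(sys.argv) > 1 else "plan.json")
    for address, missing in problems:
        print(f"REJECTED: {address} is missing tags: {sorted(missing)}")
    sys.exit(1 if problems else 0)
```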
Your golden patterns – networking, AKS clusters, API gateways – become composable blueprints that agents use by default.
Less drift. Fewer reviews. Fewer meetings that exist purely to say “this doesn’t match the platform”.
It’s still your architecture. Just consistently applied, even when things scale.
Humans still own the trade-offs, the context, and the occasional “wait, that’s not how prod works at 2 a.m.”
Let AI do the heavy lifting
Here’s the uncomfortable bit: most IaC work is repetitive. VNets. Subnets. Identity. Logging. Monitoring. Again. And again.
This is where AI is genuinely useful. It doesn’t get bored. It doesn’t forget diagnostics. And it definitely doesn’t skip tagging because it’s Friday at 4:59pm.
By combining natural language prompts with predefined agent skills, we move from writing IaC to reviewing it.
Instead of: Write 300 lines of Terraform
It becomes: “I need a production-ready AKS cluster, aligned with our platform standards.”
The result is already compliant. Already secure. Already a bit boring – which, honestly, is exactly what you want from infrastructure.
MCP servers: the quiet enabler
For any of this to work properly, AI needs context. Real context.
That’s where MCP (Model Context Protocol) servers come in. Think of them as the bridge between AI and your world:
- Your repos
- Your modules
- Your standards
- Your documentation
Without that context, AI guesses. With it, AI actually knows.
Under the hood, MCP servers act as a safe adapter between agents and your cloud, CI/CD, and observability tools – without handing over full access.
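Here’s a minimal sketch of such an adapter, assuming the official MCP Python SDK and its FastMCP helper; the tools and file paths are hypothetical stand-ins for your real repos and standards:

```python
# Sketch of a read-only MCP server exposing platform context to an agent.
# Assumes the MCP Python SDK: pip install mcp
from pathlib import Path

from mcp.server.fastmcp import FastMCP

mcp = FastMCP("platform-context")

@mcp.tool()
def get_tagging_standard() -> str:
    """Return the current mandatory tagging standard."""
    # Hypothetical location; point this at your real source of truth.
    return Path("standards/tagging.md").read_text()

@mcp.tool()
def list_approved_modules() -> list[str]:
    """List module sources the platform team has approved."""
    return [
        "git::https://example.com/modules/network?ref=v3.2.0",  # placeholder
        "git::https://example.com/modules/aks?ref=v5.1.0",      # placeholder
    ]

if __name__ == "__main__":
    mcp.run()  # defaults to stdio transport, suitable for local agents
```

Because the tools are read-only, the agent gets context without getting credentials to change anything.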
Vendors are already shipping IaC-focused MCP servers. Platform tools are baking them in so you can ask things like:
- “Why did this deployment fail?”
- “Show me all non-compliant stacks.”
And get real answers, using real infrastructure data, without jumping between portal, CLI, and editor.
Aligning with how you work (out of the box)
One thing I really like about where this is heading is that it doesn’t force a single way of working.
- Some teams love GitHub Actions
- Some prefer Azure DevOps
- Some live entirely in pull requests
Good AI adapts to you.
Out of the box, you get sensible defaults – baseline security checks, tagging models, cost guardrails. But the real value comes when you align AI with your own standards and quirks.
- Naming rules? Encoded
- Security baselines? Encoded
- Approved module versions? Encoded
The agent doesn’t just generate code. It generates your code.
Spend time on skills, not syntax
This needs a mindset shift. Instead of asking: “How do I write this Terraform?”
We should probably be asking: “What skills does my agent actually need?”
- Networking?
- Security review?
- Cost optimisation?
- Well-Architected checks?
Once you think this way, mandatory checklists appear naturally. Before IaC is merged, the agent confirms:
- Security checks done
- Observability enabled
- Cost controls applied
No more relying on memory. Or hope.
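A sketch of what that gate might look like as a plain script the agent (or CI) runs before merge; the three checks are stubs you would wire to your real tooling:

```python
# Sketch: a pre-merge checklist the agent must satisfy before IaC is merged.
# Each check is a stub; in practice they would call your security scanner,
# your observability config linter, and your cost policy engine.

def security_checks_passed(plan: dict) -> bool:
    # e.g. run tfsec/Checkov and parse the results
    return True

def observability_enabled(plan: dict) -> bool:
    # e.g. assert diagnostic settings / log wiring is present
    return True

def cost_controls_applied(plan: dict) -> bool:
    # e.g. check budgets and alerts exist for new resource groups
    return True

CHECKLIST = [
    ("Security checks done", security_checks_passed),
    ("Observability enabled", observability_enabled),
    ("Cost controls applied", cost_controls_applied),
]

def gate(plan: dict) -> bool:
    ok = True
    for label, check in CHECKLIST:
        passed = check(plan)
        print(f"[{'x' if passed else ' '}] {label}")
        ok = ok and passed
    return ok

if __name__ == "__main__":
    import sys
    sys.exit(0 if gate({}) else 1)
```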
Agent skills are your new platform API
If MCP servers are the wiring, agent skills are the API. A skill is just “a thing an agent can do with guardrails”:
- Generate Terraform
- Run terraform plan
- Validate security posture
- Query cost anomalies
You might define skills like:
- “Provision network according to Platform-Network-2026”
- “Validate this plan against our security baseline”
GitHub’s Agentic Workflows already show how these behaviours can be described in markdown and executed safely inside CI.
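However you express them, a skill boils down to a capability plus explicit guardrails. A language-agnostic sketch, with hypothetical field names:

```python
# Sketch: a skill as a named capability with explicit guardrails.
from dataclasses import dataclass, field

@dataclass
class Skill:
    name: str
    description: str
    allowed_commands: list[str] = field(default_factory=list)  # what it may run
    requires_approval: bool = True                             # human in the loop?

SKILLS = [
    Skill(
        name="provision-network",
        description="Provision network according to Platform-Network-2026",
        allowed_commands=["terraform plan"],   # plan only; apply stays gated
    ),
    Skill(
        name="validate-security-baseline",
        description="Validate a plan against our security baseline",
        allowed_commands=["terraform show -json"],
        requires_approval=False,               # read-only, safe to automate
    ),
]

for s in SKILLS:
    print(f"{s.name}: {s.description} (approval required: {s.requires_approval})")
```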
So the real question for platform teams becomes: What skills do we actually want our agents to have?
Under one roof: AI in your existing workflows
The important part: this isn’t a brand-new ecosystem. It’s being wired directly into tools you already use.
AI-powered Terraform reviewers comment on pull requests like any other reviewer – just with perfect consistency and far more patience.
Agentic workflows compile down to normal pipelines, with the same logs, approvals, and permissions you already trust.
There’s no separate “AI platform” to roll out. It quietly shows up in your repos, pipelines, and IDEs and starts helping.
Tackling tech debt with agents
Now for the elephant in the room: tech debt in IaC. Old modules. Deprecated resources. Patterns that made sense in 2019 but probably shouldn’t still exist.
This is where agents really start to earn their keep. An IaC-aware agent can scan repos, flag outdated patterns, and group them into sensible refactor tasks.
You can say: “Upgrade our networking modules to v3, keep behaviour identical, and open PRs per service.”
Is it perfect on the first pass? No. But it turns multi-month clean-ups into guided reviews instead of archaeological digs.
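A toy version of that scan, with assumed module paths and “approved” versions, to show how outdated references turn into grouped refactor tasks:

```python
# Sketch: flag outdated module versions across a repo and group them into
# refactor tasks. The approved versions and paths are assumptions.
import re
from collections import defaultdict
from pathlib import Path

APPROVED = {"modules/network": "v3.2.0", "modules/aks": "v5.1.0"}  # hypothetical
SOURCE_RE = re.compile(r'source\s*=\s*".*?(modules/[a-z]+)\?ref=(v[\d.]+)"')

def find_outdated(root: str) -> dict[str, list[tuple[str, str]]]:
    """Map each outdated module to the files (and versions) still using it."""
    tasks = defaultdict(list)
    for tf_file in Path(root).rglob("*.tf"):
        for module, version in SOURCE_RE.findall(tf_file.read_text()):
            wanted = APPROVED.get(module)
            if wanted and version != wanted:
                tasks[module].append((str(tf_file), version))
    return tasks

if __name__ == "__main__":
    for module, hits in find_outdated(".").items():
        print(f"Refactor task: upgrade {module} -> {APPROVED[module]}")
        for path, version in hits:
            print(f"  {path} (currently {version})")
```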
Over time, you can even treat IaC tech debt as an explicit backlog for agents – nightly workflows that find issues, open PRs, and gently nudge things back towards your desired architecture.
Humans still make the calls. Agents do the legwork.
So… where does this leave you?
IaC isn’t going away. But how we create it is changing quickly.
AI won’t replace platform engineers. But it will change where time gets spent.
Less typing. Less policing. More reviewing. More designing. More thinking.
And honestly? That feels like progress.
If you’re wondering where to start:
- Capture your standards in code and policy
- Try one or two focused AI-assisted use cases
- Start asking “what skills do our agents need?” alongside normal platform planning
And maybe most importantly: treat this as collaboration, not replacement.
The sweet spot is AI doing the heavy lifting while humans handle judgement, context – and the occasional “wait… that’s not how prod actually works at 2 a.m.”