A cloud‑init based script injects a short‑lived SSH host key, uses it to securely fetch long‑term host keys, and then discards the temporary key, thereby protecting the initial SSH handshake from MITM attacks without relying on provider‑specific features.
Thesis
When you spin up a fresh virtual machine, the very first SSH connection is traditionally protected by trust‑on‑first‑use (TOFU): you type yes to accept an unknown host key. That single interaction is a perfect window for a man‑in‑the‑middle (MITM) attacker who can reroute traffic or serve a forged host key. The script presented by Joachim Schipper replaces TOFU with a deterministic, provider‑agnostic handshake that guarantees the first connection is authenticated, even on clouds that expose no native solution.
Key Arguments
1. Temporary host key via cloud‑init
- Cloud‑init is widely supported on modern VPS and cloud images. By embedding a temporary private host key in the user‑data, the VM boots with a host key whose public half the provisioning script already knows, so the script can verify the very first connection.
- The script stores this key in a transient directory, never writing it to `~/.ssh/known_hosts`. Consequently, the temporary key cannot be inadvertently reused after the hand‑off.
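As a rough illustration of the injection step, the sketch below renders a `#cloud-config` document that installs a temporary ed25519 host key through cloud‑init's `ssh_keys` setting, plus the matching line for a throwaway known‑hosts file. The function names and the choice of ed25519 are assumptions made for this example; the original script may structure this differently.

```python
import json


def build_user_data(private_key: str, public_key: str) -> str:
    """Render a #cloud-config document carrying a temporary ed25519 host key.

    Hypothetical helper: both keys are assumed to have been generated on the
    provisioning workstation beforehand (for example with ssh-keygen).
    """
    # Indent every line of the private key so it sits inside a YAML block scalar.
    indented = "\n".join("    " + line for line in private_key.strip().splitlines())
    return (
        "#cloud-config\n"
        "ssh_keys:\n"
        "  ed25519_private: |\n"
        f"{indented}\n"
        # json.dumps yields a double-quoted string, which is also valid YAML.
        f"  ed25519_public: {json.dumps(public_key.strip())}\n"
    )


def known_hosts_line(host: str, public_key: str) -> str:
    """Build the entry for a temporary known_hosts file (never ~/.ssh/known_hosts)."""
    key_type, key_blob = public_key.split()[:2]  # drop the trailing comment field
    return f"{host} {key_type} {key_blob}\n"
```

Writing the `known_hosts_line` result into a file under `tempfile.TemporaryDirectory()` keeps the temporary key away from the administrator's permanent trust store.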
2. Secure hand‑off to the long‑term key
- Once the VM is reachable, the script logs in using the temporary key, generates a fresh long‑term host key pair (`/etc/ssh/ssh_host_*`), and copies the public part back to the administrator’s workstation.
- OpenSSH’s built‑in key‑rotation mechanism (the `UpdateHostKeys` client option) writes the received public key into `~/.ssh/known_hosts` in the exact format the client expects, respecting options such as `HashKnownHosts`. Because the client itself writes the entry, the script never has to parse or hand‑edit the file, which reduces the risk of a compromised VM injecting malicious entries.
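The first connection could be assembled roughly as follows. This is a sketch under stated assumptions, not the script's actual invocation: the helper name, the `true` remote command, and the exact option set are illustrative. The options themselves (`UserKnownHostsFile`, `GlobalKnownHostsFile`, `StrictHostKeyChecking`, `UpdateHostKeys`, `BatchMode`) are standard `ssh_config` settings.

```python
from typing import List, Sequence


def first_connection_argv(
    host: str,
    known_hosts_file: str,
    remote_command: Sequence[str] = ("true",),
) -> List[str]:
    """Build an ssh command line that trusts only the temporary host key.

    Hypothetical helper: known_hosts_file is assumed to contain just the
    temporary public host key for `host`.
    """
    # UserKnownHostsFile takes a whitespace-separated list of paths, so a path
    # containing whitespace would be misread as several files.
    if any(ch.isspace() for ch in known_hosts_file):
        raise ValueError("known_hosts path must not contain whitespace")
    return [
        "ssh",
        "-o", f"UserKnownHostsFile={known_hosts_file}",
        "-o", "GlobalKnownHostsFile=/dev/null",  # ignore system-wide entries
        "-o", "StrictHostKeyChecking=yes",       # refuse unknown keys, no TOFU prompt
        "-o", "UpdateHostKeys=yes",              # let the client record advertised host keys
        "-o", "BatchMode=yes",                   # fail instead of prompting interactively
        host,
        *remote_command,
    ]
```

The resulting list can be passed to `subprocess.run(argv, check=True)`. `UpdateHostKeys` is set explicitly because OpenSSH does not enable it by default when `UserKnownHostsFile` has been overridden.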
3. Ephemeral nature makes leaked cloud‑init data harmless
- Many providers allow the user‑data to be read from the metadata service (`http://169.254.169.254/...`). If an attacker later obtains that payload, the temporary private key is already destroyed, so the leaked data yields no useful credential.
- By contrast, a naïve approach that injects the long‑term private host key via cloud‑init leaves that key permanently exposed to any process that can reach the metadata endpoint, opening the door to SSRF attacks or insider compromise.
4. Threat model coverage
| Threat actor | Network control | Access to VM/metadata | Outcome when script is used |
|---|---|---|---|
| MITM only | ✓ | ✗ | Cannot impersonate the host because the temporary key is unknown and the long‑term key is never transmitted in clear. |
| Compromised admin workstation (no VM connection) | ✓ | ✗ | No long‑term private host key ever resides on the workstation, so the attacker gains nothing. |
| Compromised VM or provider | ✓ | ✓ | The attacker may read the long‑term private host key after it is generated, but by then the initial connection has already been authenticated securely; further attacks would require additional footholds. |
5. Minimal dependencies and portability
- The only prerequisite is a cloud‑init capable image; no provider‑specific APIs, no external key‑distribution services, and no reliance on proprietary console access.
- The script is hardened with defensive checks (e.g., ensuring the temporary key file is removed on exit, verifying that the retrieved public key matches the generated private key, and aborting on any unexpected state).
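One way to perform a key‑matching check of the kind listed above is to compare public keys by type and blob, and to log their fingerprints in the `SHA256:...` form that OpenSSH prints. The sketch below assumes keys in the usual `type base64-blob [comment]` layout of a `.pub` file; the function names are illustrative and not taken from the script.

```python
import base64
import hashlib


def sha256_fingerprint(public_key_line: str) -> str:
    """Return the OpenSSH-style SHA256 fingerprint of a public key line."""
    fields = public_key_line.split()
    blob = base64.b64decode(fields[1], validate=True)
    digest = hashlib.sha256(blob).digest()
    # OpenSSH prints the digest as base64 with the trailing padding removed.
    return "SHA256:" + base64.b64encode(digest).decode("ascii").rstrip("=")


def same_key(line_a: str, line_b: str) -> bool:
    """Compare two public key lines by type and blob, ignoring comments."""
    return line_a.split()[:2] == line_b.split()[:2]
```

Aborting when `same_key` returns false matches the fail‑closed behaviour described above.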
Implications
- Operational security improves dramatically for teams that spin up transient test environments, CI runners, or short‑lived bastion hosts. The first‑login step no longer requires a manual “yes” that could be spoofed.
- Compliance regimes that forbid TOFU (e.g., certain PCI‑DSS interpretations) can now meet the requirement without purchasing a managed SSH‑key service from the cloud provider.
- Cost efficiency: because the method works on any cloud, organizations can avoid premium offerings that bundle host‑key provisioning, thereby reducing per‑instance overhead.
Counter‑Perspectives
- Complexity vs. simplicity – Critics may argue that adding a custom script introduces more moving parts than the traditional TOFU flow. However, the script is deliberately small (≈ 70 lines) and self‑contained; its logic can be audited and version‑controlled alongside infrastructure‑as‑code repositories.
- Residual risk after hand‑off – Once the long‑term host key resides on the VM, a determined attacker who gains full VM control can still read it. The technique does not protect against post‑boot compromise; it merely secures the initial trust establishment.
- Dependency on cloud‑init correctness – If a provider’s cloud‑init implementation is buggy or stripped down, the temporary key injection could fail. In such cases, the script falls back to aborting the deployment, which is preferable to silently proceeding with an insecure connection.
Practical Steps to Adopt the Technique
1. Create a cloud‑init user‑data file that embeds a PEM‑encoded temporary private key (generated on the provisioning workstation) and the script itself.
2. Launch the VM with that user‑data via the provider’s API, CLI, or Terraform.
3. Run the script locally (or as part of your provisioning pipeline) to:
   - SSH into the VM using the temporary key.
   - Trigger key generation on the VM (`ssh-keygen -A`).
   - Pull the newly created public host key back to the workstation.
   - Append the public key to `~/.ssh/known_hosts` automatically.
   - Verify that the temporary key file has been removed from the VM (`/tmp/ssh_temp_*`).
4. Proceed with normal operations using the long‑term host key; future connections will be validated against the entry added in step 3.
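After the hand‑off, a pipeline may want to confirm that the resulting known‑hosts data holds at least one key and no longer contains the temporary one. The following is a minimal sketch of such a check; it is an addition for illustration rather than part of the original script, and it compares key blobs only, so it also works when host names are hashed.

```python
from __future__ import annotations


def key_entries(known_hosts_text: str) -> set[tuple[str, str]]:
    """Collect (key type, key blob) pairs from known_hosts content."""
    entries = set()
    for raw in known_hosts_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and marker lines (@cert-authority, @revoked),
        # which this sketch deliberately does not interpret.
        if not line or line.startswith("#") or line.startswith("@"):
            continue
        fields = line.split()
        if len(fields) >= 3:  # hosts, key type, key blob, optional comment
            entries.add((fields[1], fields[2]))
    return entries


def handoff_complete(known_hosts_text: str, temporary_public_key: str) -> bool:
    """True if some host key is recorded and the temporary key is not among them."""
    temp_type, temp_blob = temporary_public_key.split()[:2]
    entries = key_entries(known_hosts_text)
    return bool(entries) and (temp_type, temp_blob) not in entries
```

A provisioning pipeline could run this check on the known‑hosts content right after the script finishes and stop the deployment if it returns false.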
Conclusion
By leveraging a disposable SSH host key injected through cloud‑init, the script transforms the vulnerable TOFU moment into a deterministic, verifiable handshake that works across any VPS or cloud provider. The approach thwarts network‑level MITM attacks, renders leaked cloud‑init data inert, and does so without persisting private key material on the administrator’s workstation. While it does not replace comprehensive hardening of the VM itself, it fills a critical gap in the bootstrapping phase of cloud infrastructure, offering a pragmatic, provider‑agnostic safeguard for modern DevOps pipelines.