Axe:ploit
Cloud Control Plane Lateral Movement: How a Stolen Pod Token Becomes Full Account Compromise

Jason

The Capital One breach of 2019, which exposed roughly 100 million customer records, is often cited as an SSRF story, and the initial vector was indeed server-side request forgery against the EC2 metadata service. But the more important part of the story, the part that determined the breach's scope, was what happened after the SSRF: the attacker used the harvested temporary credentials to enumerate and access S3 buckets across Capital One's AWS environment. The SSRF was the initial access. The IAM permissions attached to the compromised workload, which allowed listing and reading S3 buckets far beyond what the workload functionally needed, determined the blast radius.

This distinction matters because it reveals where the real leverage is in cloud security. Preventing every possible initial access vector is a losing proposition; the attack surface is too broad, the software too complex, the dependencies too numerous. Whether an initial compromise becomes a catastrophic breach depends on what the attacker can reach after the initial foothold, and in cloud environments "what they can reach" is primarily determined by the IAM permissions and trust relationships that exist in the control plane.

The Cloud Security Architecture Gap

Traditional security architectures are network-centric. The mental model is a castle with walls: external networks are untrusted, internal networks are trusted (or semi-trusted), and security controls focus on the boundaries between zones. Firewalls, network segmentation, VPNs, and NACLs are the primary enforcement mechanisms.

Cloud environments operate on a different axis. The control plane, meaning the set of APIs that manage cloud resources (IAM, EC2, S3, KMS, CloudTrail, etc.), is accessible from any workload that has valid credentials, regardless of network position. A compromised pod in a Kubernetes cluster can call the AWS STS API to assume a role in a different AWS account, provided that role's trust policy allows it, even if the pod has no network path to any resource in that account. The network is not the trust boundary. The IAM policy is.

This means that network segmentation, while valuable for containing lateral movement at the workload level (preventing a compromised web server from directly accessing a database), does not contain lateral movement at the control plane level. An attacker who harvests IAM credentials from a compromised workload can enumerate resources, modify policies, and escalate privileges through the control plane APIs, all of which are reachable over HTTPS at public AWS/GCP/Azure endpoints.

flowchart TD
    subgraph WorkloadLayer ["Workload Layer (Network Segmented)"]
        CompPod["Compromised Workload"]
    end
    subgraph ControlPlane ["Control Plane (API-Accessible Everywhere)"]
        STS["AWS STS / AssumeRole"]
        IAM["IAM Policy Enumeration"]
        S3["S3 / Storage Enumeration"]
        KMS["KMS / Key Management"]
        CT["CloudTrail / Logging Config"]
    end
    subgraph Impact ["Impact"]
        DataAccess["Cross-Account Data Access"]
        Persistence["Persistent Backdoor Roles"]
        Evasion["Logging Suppression"]
    end
    CompPod -->|"IMDS credential harvest\n(169.254.169.254)"| STS
    STS -->|"Enumerate attached\npolicies and roles"| IAM
    IAM -->|"Discover overly\npermissive policies"| S3
    IAM -->|"Find role chaining\nopportunities"| STS
    STS -->|"Assume cross-account\nroles"| DataAccess
    IAM -->|"Create new IAM user\nor access key"| Persistence
    IAM -->|"Modify CloudTrail\nconfiguration"| CT
    CT --> Evasion
    S3 --> DataAccess

The Credential Harvesting Phase

The first step in cloud lateral movement is almost always credential harvesting, and the most common source is the instance metadata service (IMDS). Every EC2 instance, GCP VM, and Azure VM exposes a metadata endpoint at the link-local address 169.254.169.254 (GCP also answers at metadata.google.internal) that provides temporary credentials for the identity attached to the instance: an IAM role on AWS, a service account on GCP, a managed identity on Azure. Any process running on the instance can query this endpoint. Any SSRF vulnerability that lets the attacker direct requests at link-local addresses can query it. Any container with host networking can query it.

AWS introduced IMDSv2 in 2019. It requires a PUT request carrying a TTL header to obtain a session token, and that token must then accompany every metadata request. This mitigates SSRF-based credential theft because most SSRF vulnerabilities can only issue GET requests and cannot set custom headers. But IMDSv2 is not enforced by default on existing instances: it must be explicitly required, and enforcing it across an organization means auditing every instance, every auto-scaling group, and every launch template. Many organizations have required IMDSv2 on new instances while leaving existing instances on IMDSv1, creating a mixed environment where some workloads are protected and others are not.
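The audit described above can be sketched in a few lines. The following is a minimal illustration, assuming the input has the shape of an EC2 DescribeInstances response (Reservations, Instances, and a MetadataOptions block whose HttpTokens field is "required" when IMDSv2 is enforced). The function name, the sample data, and the instance IDs are hypothetical, and treating a missing HttpTokens value as "optional" is a conservative assumption rather than documented behavior.

```python
from typing import Any, Dict, List


def instances_allowing_imdsv1(describe_instances_response: Dict[str, Any]) -> List[str]:
    """Return IDs of instances whose metadata endpoint is on but does not require a session token."""
    flagged: List[str] = []
    for reservation in describe_instances_response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            options = instance.get("MetadataOptions", {})
            endpoint_enabled = options.get("HttpEndpoint", "enabled") == "enabled"
            # Assumption: a missing HttpTokens value is treated as "optional" (IMDSv1 still allowed).
            tokens_required = options.get("HttpTokens", "optional") == "required"
            if endpoint_enabled and not tokens_required:
                flagged.append(instance["InstanceId"])
    return flagged


# Hypothetical sample data, shaped like a DescribeInstances response.
SAMPLE_RESPONSE = {
    "Reservations": [
        {"Instances": [
            {"InstanceId": "i-legacy-0001",
             "MetadataOptions": {"HttpTokens": "optional", "HttpEndpoint": "enabled"}},
            {"InstanceId": "i-new-0002",
             "MetadataOptions": {"HttpTokens": "required", "HttpEndpoint": "enabled"}},
        ]},
        {"Instances": [
            {"InstanceId": "i-nometa-0003",
             "MetadataOptions": {"HttpTokens": "optional", "HttpEndpoint": "disabled"}},
        ]},
    ]
}

if __name__ == "__main__":
    print(instances_allowing_imdsv1(SAMPLE_RESPONSE))
```

In a real environment the same check would also have to cover launch templates and auto-scaling groups, since those decide how future instances are configured.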

Beyond IMDS, credentials can be harvested from environment variables (a common pattern in containerized workloads that receive credentials via injected secrets), from application configuration files, from the ~/.aws/credentials file on developer machines that are part of the workload environment, and from the Kubernetes service account token that is mounted into every pod by default. Each of these is a credential source that, if the workload is compromised, provides the attacker with cloud or cluster API access.
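A defender can inventory the same sources from inside a workload. The sketch below is illustrative only: it checks the standard AWS SDK environment variable names, the ~/.aws/credentials file, and the default Kubernetes service account token path. The function name is hypothetical, and the environment and filesystem check are passed in as parameters so the logic can be exercised without touching a real host.

```python
import os
from typing import Callable, List, Mapping

# Standard AWS SDK environment variables that carry credentials.
ENV_CREDENTIAL_VARS = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "AWS_SESSION_TOKEN")

# File locations named in the text; the second is the default projected token path in a pod.
FILE_CREDENTIAL_PATHS = (
    "~/.aws/credentials",
    "/var/run/secrets/kubernetes.io/serviceaccount/token",
)


def credential_sources(environ: Mapping[str, str],
                       path_exists: Callable[[str], bool]) -> List[str]:
    """List which well-known credential sources are present for this workload."""
    found = [f"env:{name}" for name in ENV_CREDENTIAL_VARS if environ.get(name)]
    found += [
        f"file:{path}"
        for path in FILE_CREDENTIAL_PATHS
        if path_exists(os.path.expanduser(path))
    ]
    return found


if __name__ == "__main__":
    # On a real host, inspect the live environment and filesystem.
    print(credential_sources(os.environ, os.path.exists))
```

Anything this reports is also what an attacker with code execution in the workload would find first, which makes it a reasonable starting point for deciding what to remove or narrow.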

The Enumeration and Escalation Phase

Once the attacker has credentials, the next step is understanding what those credentials can do. This is the IAM enumeration phase, and it is where the architectural debt of overly permissive roles becomes exploitable.

The basic enumeration is straightforward: aws sts get-caller-identity identifies who the credentials belong to, aws iam list-attached-role-policies reveals which managed policies are attached, and aws iam get-policy-version retrieves the actual policy document. If the compromised role is allowed to read IAM, the attacker has a near-complete picture of its permissions within minutes; if it is not, the attacker falls back to probing individual API calls and observing which ones succeed.

The privilege escalation paths that follow are well-documented. Rhino Security Labs' research on AWS privilege escalation identified over 20 distinct escalation techniques, including:

  • Creating a new IAM user or access key pair if the compromised role has iam:CreateUser or iam:CreateAccessKey
  • Attaching a more permissive policy if the role has iam:AttachRolePolicy
  • Assuming another role if the trust policy of that role permits the compromised role as a principal
  • Passing a role to a new Lambda function or EC2 instance and executing code under that role's permissions
  • Modifying a Lambda function's code to execute arbitrary commands under the function's execution role
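The same list can be turned into a defensive check over policy documents. The sketch below assumes a standard IAM policy JSON document and flags Allow statements whose Action patterns match a small watchlist. The watchlist is one reading of the bullets above (for example, iam:PassRole for the "passing a role" path and lambda:UpdateFunctionCode for the Lambda path), not the full set from the Rhino Security Labs research, and the function names are hypothetical. It deliberately ignores Deny statements, NotAction, resources, and conditions, so it over-reports rather than under-reports.

```python
from fnmatch import fnmatchcase
from typing import Any, Dict, List, Set

# Illustrative watchlist derived from the escalation paths listed in the text.
ESCALATION_ACTIONS = (
    "iam:CreateUser",
    "iam:CreateAccessKey",
    "iam:AttachRolePolicy",
    "sts:AssumeRole",
    "iam:PassRole",
    "lambda:UpdateFunctionCode",
)


def _as_list(value: Any) -> List[Any]:
    """IAM allows a single value or a list in most fields; normalise to a list."""
    return value if isinstance(value, list) else [value]


def risky_actions(policy_document: Dict[str, Any]) -> List[str]:
    """Return watchlist actions granted by any Allow statement (wildcards expanded)."""
    granted: Set[str] = set()
    for statement in _as_list(policy_document.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue  # Simplification: Deny statements are not subtracted.
        for pattern in _as_list(statement.get("Action", [])):
            for action in ESCALATION_ACTIONS:
                # IAM action names are case-insensitive and support * and ? wildcards.
                if fnmatchcase(action.lower(), pattern.lower()):
                    granted.add(action)
    return sorted(granted)


# Hypothetical policy used only to exercise the function.
SAMPLE_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow", "Action": ["s3:GetObject", "iam:Create*"], "Resource": "*"},
        {"Effect": "Allow", "Action": "lambda:UpdateFunctionCode", "Resource": "*"},
        {"Effect": "Deny", "Action": "iam:PassRole", "Resource": "*"},
    ],
}

if __name__ == "__main__":
    print(risky_actions(SAMPLE_POLICY))
```

Note how a single wildcard such as iam:Create* silently grants two of the escalation paths at once, which is typically how these permissions end up on roles that were never meant to manage IAM.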

The attacker does not need all of these. They need one. And in an environment that has grown organically over several years, with multiple teams creating roles for different purposes, legacy roles that were never cleaned up, and "temporary" broad permissions that became permanent, the probability that at least one escalation path exists is high.

Role Chaining and Cross-Account Movement

The most damaging lateral movement pattern in cloud environments is role chaining: using one assumed role to assume another, potentially across AWS account boundaries. This is possible because IAM role trust policies can specify other roles as trusted principals, creating a graph of assumable identities.

In organizations that use multiple AWS accounts (which is the recommended architecture), cross-account roles are necessary for legitimate operational purposes: a CI/CD account needs to deploy to production accounts, a logging account needs to read CloudTrail from all accounts, a security account needs to assess resources across the organization. Each of these cross-account trust relationships is an edge in a graph that an attacker can traverse.

The problem is that this graph is rarely visible to the security team as a whole. Each trust relationship was created by a specific team for a specific purpose, and the aggregate graph ("from which roles can an attacker reach which other roles, transitively?") is not something anyone has mapped. Tools like Cartography, PMapper, and CloudMapper attempt to map these graphs, but they need to be run continuously to stay current, and interpreting the output requires understanding which paths are operationally necessary and which are vestiges of past configurations.
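The transitive question is a plain graph reachability problem. The sketch below is not how Cartography, PMapper, or CloudMapper work internally; it is a minimal breadth-first search over an adjacency map in which an edge A -> B means "role A is allowed to assume role B". The role ARNs and account IDs are hypothetical, and building the map from real trust and identity policies is the hard part that this example skips.

```python
from collections import deque
from typing import Dict, List, Set


def reachable_roles(trust_edges: Dict[str, List[str]], start: str) -> Set[str]:
    """Return every role reachable from `start` by chaining AssumeRole edges."""
    seen: Set[str] = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for candidate in trust_edges.get(current, []):
            if candidate not in seen:
                seen.add(candidate)
                queue.append(candidate)
    seen.discard(start)  # The starting role is not a "reached" role.
    return seen


# Hypothetical roles in four hypothetical accounts.
WEB_POD = "arn:aws:iam::111111111111:role/web-pod"
CI_DEPLOYER = "arn:aws:iam::111111111111:role/ci-deployer"
PROD_DEPLOY = "arn:aws:iam::222222222222:role/prod-deploy"
LOG_READER = "arn:aws:iam::333333333333:role/log-reader"
ISOLATED = "arn:aws:iam::444444444444:role/isolated"

TRUST_EDGES = {
    WEB_POD: [CI_DEPLOYER],
    CI_DEPLOYER: [PROD_DEPLOY],
    PROD_DEPLOY: [LOG_READER, CI_DEPLOYER],  # Includes a cycle back to CI.
    ISOLATED: [],
}

if __name__ == "__main__":
    for role in sorted(reachable_roles(TRUST_EDGES, WEB_POD)):
        print(role)
```

In this toy graph a compromise of the web pod's role reaches three roles in three accounts, even though no single trust relationship was written with that outcome in mind.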

Why Detection Is Hard

Control plane lateral movement is particularly difficult to detect because the attacker's actions closely resemble legitimate operations. An AssumeRole call is a normal operational event: it happens thousands of times per day in any reasonably sized AWS environment. A ListBuckets call is how applications discover their storage. An AttachRolePolicy call is how administrators manage permissions.

The signals that distinguish adversarial control plane activity from legitimate operations are contextual, not technical:

  • A role that has never called AssumeRole before suddenly assumes five different cross-account roles in rapid succession
  • iam:CreateUser is called from an EC2 instance role that has no documented need for IAM management
  • CloudTrail configuration is modified by a principal that is not the organization's logging administrator
  • ListBuckets (IAM action s3:ListAllMyBuckets) is called from a role whose associated workload has no reason to enumerate storage

These detection rules require behavioral baselines: an understanding of what each role normally does, so that deviations are identifiable. Building and maintaining these baselines is a significant operational investment, and few organizations have made it. The result is that CloudTrail logs contain the evidence of the lateral movement, but nobody examines it with sufficient context to identify the anomaly until the breach is already complete.
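The simplest possible baseline is "which API calls has each role ever made?". The sketch below assumes events shaped like CloudTrail records, using the eventName field and the userIdentity.sessionContext.sessionIssuer.arn field (with userIdentity.arn as a fallback) to identify the role. The function names, the helper that builds events, and the sample role ARN are hypothetical, and a real baseline would also need time windows, source addresses, and target resources.

```python
from typing import Any, Dict, Iterable, List, Set, Tuple

Event = Dict[str, Any]
Key = Tuple[str, str]  # (principal ARN, eventName)


def principal_of(event: Event) -> str:
    """Prefer the role behind an assumed-role session; fall back to the caller ARN."""
    identity = event.get("userIdentity", {})
    issuer = identity.get("sessionContext", {}).get("sessionIssuer", {})
    return issuer.get("arn") or identity.get("arn", "unknown")


def build_baseline(history: Iterable[Event]) -> Set[Key]:
    """Record every (principal, API call) pair seen during the observation period."""
    return {(principal_of(e), e["eventName"]) for e in history}


def novel_events(baseline: Set[Key], recent: Iterable[Event]) -> List[Key]:
    """Return (principal, API call) pairs in `recent` that the baseline has never seen."""
    novel: List[Key] = []
    for event in recent:
        key = (principal_of(event), event["eventName"])
        if key not in baseline and key not in novel:
            novel.append(key)
    return novel


def _event(role_arn: str, name: str) -> Event:
    """Build a minimal, hypothetical CloudTrail-like record for the example."""
    return {
        "eventName": name,
        "userIdentity": {"sessionContext": {"sessionIssuer": {"arn": role_arn}}},
    }


WEB_ROLE = "arn:aws:iam::111111111111:role/web-frontend"  # Hypothetical role.
HISTORY = [_event(WEB_ROLE, "GetObject"), _event(WEB_ROLE, "PutObject")]
RECENT = [
    _event(WEB_ROLE, "GetObject"),
    _event(WEB_ROLE, "AssumeRole"),
    _event(WEB_ROLE, "CreateUser"),
    _event(WEB_ROLE, "AssumeRole"),
]

if __name__ == "__main__":
    for principal, api_call in novel_events(build_baseline(HISTORY), RECENT):
        print(f"first-seen: {principal} called {api_call}")
```

Even this crude first-seen rule surfaces two of the contextual signals listed above; the operational cost lies in keeping the baseline current and in triaging the legitimate first-time calls it will also flag.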

Architectural Controls That Actually Limit Blast Radius

The leverage in cloud lateral movement defense is not in detecting and stopping each step. It is in designing the IAM architecture so that the steps are not possible, so that a compromised workload's credentials cannot reach beyond a tightly bounded scope.

Least privilege, actually enforced. This is universally recommended and almost universally unachieved. The gap is not conceptual but operational: determining the minimum set of permissions a workload needs requires understanding what the workload does, which changes over time, which varies between environments, and which is often only discoverable by running with broad permissions and observing what is actually used. AWS IAM Access Analyzer and similar tools can generate least-privilege policies from CloudTrail access logs, but they require a representative observation period and produce policies that may break workloads if the observation period missed an infrequent operation. The pragmatic approach is iterative: start with a broad policy, log access, periodically tighten based on observed usage, and accept that the process never fully converges.
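One iteration of that tightening loop can be sketched as a comparison between what a policy grants and what the logs show was used. This is not how IAM Access Analyzer generates policies; it is a minimal illustration that treats granted actions as wildcard patterns and reports the ones that no observed action matched. The function name and the sample data are hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable, List


def unused_grants(granted_patterns: Iterable[str], observed_actions: Iterable[str]) -> List[str]:
    """Return granted action patterns that matched none of the observed actions."""
    observed = [action.lower() for action in observed_actions]
    return [
        pattern
        for pattern in granted_patterns
        # IAM action names are case-insensitive and support * and ? wildcards.
        if not any(fnmatchcase(action, pattern.lower()) for action in observed)
    ]


# Hypothetical grant list and observation window.
GRANTED = ["s3:GetObject", "s3:PutObject", "sqs:*", "kms:Decrypt"]
OBSERVED = ["s3:GetObject", "sqs:ReceiveMessage", "sqs:DeleteMessage"]

if __name__ == "__main__":
    print(unused_grants(GRANTED, OBSERVED))
```

The result is a list of removal candidates, not a list of safe removals: as noted above, an infrequent operation (here, perhaps a monthly job that calls kms:Decrypt) may simply have fallen outside the observation window. A matched wildcard such as sqs:* is also a candidate for replacement with the specific actions that were observed.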

Permission boundaries. IAM permission boundaries cap the maximum permissions a role can obtain, even if more permissive policies are attached. Setting a permission boundary on every role in an account that prevents iam:*, organizations:*, and sts:AssumeRole (except to specifically enumerated roles) limits the escalation surface substantially. It does not prevent all lateral movement, but it removes the most catastrophic escalation paths.
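The capping behavior can be illustrated with a deliberately simplified model: an action is effectively allowed only if both the identity policy and the boundary allow it. The sketch below models each as a flat list of allowed action patterns and ignores explicit Deny statements, resources, conditions, service control policies, and resource-based policies, all of which matter in real IAM evaluation. The function names and sample policies are hypothetical, and the boundary here is written as an allow-list rather than the deny-style boundary described above.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def allowed_by(patterns: Iterable[str], action: str) -> bool:
    """True if any allowed-action pattern matches the action (case-insensitive)."""
    return any(fnmatchcase(action.lower(), pattern.lower()) for pattern in patterns)


def effective_allow(identity_patterns: Iterable[str],
                    boundary_patterns: Iterable[str],
                    action: str) -> bool:
    """Simplified model: the boundary caps, but never grants, permissions."""
    return allowed_by(list(identity_patterns), action) and allowed_by(list(boundary_patterns), action)


# Hypothetical policies: a role that was given everything, capped by a narrow boundary.
IDENTITY_POLICY = ["*"]
BOUNDARY = ["s3:*", "sqs:*", "logs:*"]

if __name__ == "__main__":
    for action in ("s3:GetObject", "iam:CreateUser", "sts:AssumeRole"):
        verdict = "allowed" if effective_allow(IDENTITY_POLICY, BOUNDARY, action) else "blocked"
        print(f"{action}: {verdict}")
```

Under this model, a later attachment of a broader policy to the role changes nothing for iam:CreateUser, which is the property that makes boundaries useful against the escalation paths that rely on attaching policies.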

Workload identity narrowing. Kubernetes workloads should use pod-level identities (IAM Roles for Service Accounts, or IRSA, on EKS; Workload Identity on GKE) rather than node-level instance roles. This scopes the IAM identity to the specific pod rather than to the entire node, reducing the blast radius from "every pod on this node" to "this specific workload." Combined with controls that keep pods away from the IMDS endpoint (network policies, an IMDSv2 hop limit of 1, and admission controls that forbid host networking; PodSecurityPolicy itself was removed in Kubernetes 1.25), this significantly limits credential harvesting opportunities.

Cross-account trust minimization. Every cross-account AssumeRole trust relationship should have a documented justification, a defined review period, and conditions (such as ExternalId, source IP restrictions, or MFA requirements) that limit when the trust can be exercised. The goal is not to eliminate cross-account access, which is operationally necessary, but to ensure that each trust edge is narrow, conditional, and monitored.
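A first-pass review of trust edges can be automated. The sketch below assumes a standard role trust policy document and flags Allow statements that name a principal in a different account and carry no Condition block at all. It does not judge whether an existing condition is meaningful, and it only inspects AWS principals, not federated or service principals. The function names, account IDs, and ExternalId value are hypothetical.

```python
from typing import Any, Dict, List


def _as_list(value: Any) -> List[Any]:
    """IAM allows a single value or a list in most fields; normalise to a list."""
    return value if isinstance(value, list) else [value]


def account_of(principal: str) -> str:
    """Extract the account ID from an ARN; return bare account IDs and '*' unchanged."""
    parts = principal.split(":")
    return parts[4] if principal.startswith("arn:") and len(parts) > 4 else principal


def unconditioned_cross_account_trusts(role_account_id: str,
                                       trust_policy: Dict[str, Any]) -> List[str]:
    """Return foreign principals trusted by Allow statements that have no Condition."""
    findings: List[str] = []
    for statement in _as_list(trust_policy.get("Statement", [])):
        if statement.get("Effect") != "Allow" or statement.get("Condition"):
            continue
        principal = statement.get("Principal", {})
        # Principal may be the bare string "*" or a mapping such as {"AWS": ...}.
        aws_principals = [principal] if isinstance(principal, str) else _as_list(principal.get("AWS", []))
        for entry in aws_principals:
            if account_of(entry) != role_account_id:
                findings.append(entry)
    return findings


# Hypothetical trust policy on a role that lives in account 111111111111.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {"Effect": "Allow",
         "Principal": {"AWS": "arn:aws:iam::222222222222:role/ci-deployer"},
         "Action": "sts:AssumeRole",
         "Condition": {"StringEquals": {"sts:ExternalId": "example-external-id"}}},
        {"Effect": "Allow",
         "Principal": {"AWS": ["arn:aws:iam::333333333333:root",
                               "arn:aws:iam::111111111111:role/local-admin"]},
         "Action": "sts:AssumeRole"},
    ],
}

if __name__ == "__main__":
    print(unconditioned_cross_account_trusts("111111111111", TRUST_POLICY))
```

Trusting an account root principal without conditions, as the second statement does, delegates the decision to whoever administers IAM in that other account, which is usually a wider edge than the team that created it intended.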

The common thread is that cloud lateral movement is an identity problem, not a network problem. The attackers move through IAM permissions, not through network connections. The defenses must therefore be identity-centric: narrow the permissions, bound the escalation paths, monitor the identity-level telemetry, and assume that any credential might be compromised.
