Multi-Account AWS with Terragrunt
As your organization grows, a single AWS account becomes a liability: any mistake or compromised credential can reach every workload, IAM policies grow unmanageable, and cost attribution depends entirely on consistent tagging. AWS Organizations with multiple accounts addresses these problems, but managing Terraform across 10+ accounts introduces its own complexity. Terragrunt makes this manageable by keeping your configuration DRY and isolating state per account.
Repository Layout
The key architectural decision is separating Terraform modules (reusable infrastructure definitions) from Terragrunt live configurations (environment-specific parameterization). This separation enables code reuse while keeping each environment's state independent.
infrastructure/
  modules/                          # Reusable Terraform modules
    vpc/
    eks-cluster/
    rds-postgres/
    s3-bucket/
    iam-baseline/
  live/                             # Terragrunt live configurations
    terragrunt.hcl                  # Root config (provider, backend defaults)
    _envcommon/                     # Shared per-component defaults
      vpc.hcl
      eks.hcl
      rds.hcl
    management/                     # Management account (111111111111)
      account.hcl
      us-east-1/
        region.hcl
        organization/terragrunt.hcl
        sso/terragrunt.hcl
    security/                       # Security account (222222222222)
      account.hcl
      eu-west-1/
        region.hcl
        guardduty/terragrunt.hcl
        securityhub/terragrunt.hcl
    production/                     # Production account (333333333333)
      account.hcl
      eu-west-1/
        region.hcl
        vpc/terragrunt.hcl
        eks/terragrunt.hcl
        rds/terragrunt.hcl
    staging/                        # Staging account (444444444444)
      account.hcl
      eu-west-1/
        region.hcl
        vpc/terragrunt.hcl
        eks/terragrunt.hcl
        rds/terragrunt.hcl
Root Terragrunt Configuration
The root terragrunt.hcl defines settings inherited by all child configurations: the remote state backend, the provider, and common variables.
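The root configuration reads two small files through find_in_parent_folders(). Each account directory needs an account.hcl defining account_id and account_name. Each region directory can optionally have a region.hcl defining region. The article does not show their contents, so below is a minimal sketch of a helper that scaffolds both files for a new account, assuming the layout above. The function name scaffold_account and its argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scaffold_account LIVE_DIR ACCOUNT_NAME ACCOUNT_ID REGION
# Writes the account.hcl and region.hcl files that the root terragrunt.hcl
# locates with find_in_parent_folders().
set -eu

scaffold_account() {
  local live_dir="$1" account_name="$2" account_id="$3" region="$4"

  # AWS account IDs are exactly 12 digits.
  case "$account_id" in
    *[!0-9]*|"") echo "invalid account id: $account_id" >&2; return 1 ;;
  esac
  if [ "${#account_id}" -ne 12 ]; then
    echo "invalid account id: $account_id" >&2
    return 1
  fi

  mkdir -p "$live_dir/$account_name/$region"

  # account.hcl: read by the root config as local.account_vars.locals.*
  # The ID is quoted as a string so leading zeros survive inside ARNs.
  cat > "$live_dir/$account_name/account.hcl" <<EOF
locals {
  account_name = "$account_name"
  account_id   = "$account_id"
}
EOF

  # region.hcl: read by the root config as local.region_vars.locals.region
  cat > "$live_dir/$account_name/$region/region.hcl" <<EOF
locals {
  region = "$region"
}
EOF
}
```

For example, running scaffold_account live staging 444444444444 eu-west-1 from infrastructure/ produces live/staging/account.hcl and live/staging/eu-west-1/region.hcl. You can then add component directories under the region as usual.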
# live/terragrunt.hcl
locals {
  account_vars = read_terragrunt_config(find_in_parent_folders("account.hcl"))
  region_vars  = read_terragrunt_config(find_in_parent_folders("region.hcl", "empty.hcl"))
  account_id   = local.account_vars.locals.account_id
  account_name = local.account_vars.locals.account_name
  region       = try(local.region_vars.locals.region, "eu-west-1")
}
# Automatically configure the S3 backend with per-account state isolation
remote_state {
  backend = "s3"
  generate = {
    path      = "backend.tf"
    if_exists = "overwrite_terragrunt"
  }
  config = {
    bucket         = "terraform-state-${local.account_id}"
    key            = "${path_relative_to_include()}/terraform.tfstate"
    region         = "eu-west-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
    # State bucket lives in the management account
    role_arn       = "arn:aws:iam::111111111111:role/TerraformStateAccess"
  }
}
# Generate the AWS provider with cross-account role assumption
generate "provider" {
  path      = "provider.tf"
  if_exists = "overwrite_terragrunt"
  contents  = <<EOF
provider "aws" {
  region = "${local.region}"
  assume_role {
    role_arn = "arn:aws:iam::${local.account_id}:role/TerraformExecutionRole"
  }
  default_tags {
    tags = {
      ManagedBy  = "terraform"
      Account    = "${local.account_name}"
      Repository = "infrastructure"
    }
  }
}
EOF
}
DRY Environment Configuration
The _envcommon/ directory contains shared defaults for each infrastructure component. Individual environments override only what differs.
# live/_envcommon/vpc.hcl
locals {
  base_source_url = "git::git@github.com:myorg/infrastructure-modules.git//vpc"
}
terraform {
  source = "${local.base_source_url}?ref=v2.3.0"
}
inputs = {
  enable_nat_gateway   = true
  single_nat_gateway   = false
  enable_dns_hostnames = true
  enable_flow_logs     = true
  public_subnet_tags = {
    "kubernetes.io/role/elb" = "1"
  }
  private_subnet_tags = {
    "kubernetes.io/role/internal-elb" = "1"
  }
}
# live/production/eu-west-1/vpc/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}
include "envcommon" {
  path           = "${dirname(find_in_parent_folders())}/_envcommon/vpc.hcl"
  merge_strategy = "deep"
}
inputs = {
  name            = "production-vpc"
  cidr_block      = "10.1.0.0/16"
  azs             = ["eu-west-1a", "eu-west-1b", "eu-west-1c"]
  private_subnets = ["10.1.1.0/24", "10.1.2.0/24", "10.1.3.0/24"]
  public_subnets  = ["10.1.101.0/24", "10.1.102.0/24", "10.1.103.0/24"]
}
# live/staging/eu-west-1/vpc/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}
include "envcommon" {
  path           = "${dirname(find_in_parent_folders())}/_envcommon/vpc.hcl"
  merge_strategy = "deep"
}
inputs = {
  name            = "staging-vpc"
  cidr_block      = "10.2.0.0/16"
  azs             = ["eu-west-1a", "eu-west-1b"]
  private_subnets = ["10.2.1.0/24", "10.2.2.0/24"]
  public_subnets  = ["10.2.101.0/24", "10.2.102.0/24"]
  # Cost optimization: single NAT gateway in staging
  single_nat_gateway = true
}
Cross-Account Dependencies
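Terragrunt's dependency block lets you reference outputs from other Terragrunt configurations. Because those outputs come from the other configuration's remote state, this also works across accounts, as long as the credentials running Terragrunt can read that state. The example below wires the production EKS cluster to the VPC in the same account: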
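# live/production/eu-west-1/eks/terragrunt.hcl
include "root" {
  path = find_in_parent_folders()
}
include "envcommon" {
  path           = "${dirname(find_in_parent_folders())}/_envcommon/eks.hcl"
  merge_strategy = "deep"
}
dependency "vpc" {
  config_path = "../vpc"
  # Placeholder outputs so validate/plan work before the VPC has been applied;
  # restricted to those commands so an apply never runs against mock values.
  mock_outputs = {
    vpc_id          = "vpc-mock"
    private_subnets = ["subnet-mock-1", "subnet-mock-2"]
  }
  mock_outputs_allowed_terraform_commands = ["validate", "plan"]
}
inputs = {
  cluster_name    = "production"
  cluster_version = "1.29"
  vpc_id          = dependency.vpc.outputs.vpc_id
  subnet_ids      = dependency.vpc.outputs.private_subnets
}
Environment Promotion Workflow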
Promoting infrastructure changes through environments follows the same pattern as application deployments:
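- Bump the module version for staging only, by overriding the terraform source ref in the staging component's terragrunt.hcl. Avoid bumping it in _envcommon at this stage: _envcommon is shared by every environment, so production would pick up the change too.
- Open a PR. Atlantis (or a GitHub Actions workflow) runs terragrunt plan and posts the plan as a PR comment.
- Review and apply to staging. Validate with integration tests.
- Promote to production by updating the production config to use the same module version. Another PR, another review. Once every environment runs the new version, the pin can be consolidated back into _envcommon.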
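The final promotion step is easy to forget when one component is among dozens. Below is a minimal sketch of a CI check that compares the effective module version for a component in staging and production. It assumes pins are written as literal ?ref=<tag> values and that a component-level override takes precedence over the _envcommon default. The script and function names are hypothetical. The script parses files with sed rather than evaluating HCL, so it only understands literal refs.

```shell
#!/usr/bin/env bash
# Hypothetical promotion gate: fails when staging and production pin different
# module versions for a component. Assumes pins look like "...?ref=<tag>",
# either in the component's own terragrunt.hcl (an environment override) or
# in the shared _envcommon/<component>.hcl default.
set -eu

# module_ref FILE: print the first literal ?ref= value in FILE, or nothing.
module_ref() {
  [ -f "$1" ] || return 0
  sed -n 's/.*?ref=\([^"]*\)".*/\1/p' "$1" | head -n 1
}

# effective_ref LIVE_DIR ENV REGION COMPONENT: an override in the component's
# terragrunt.hcl wins; otherwise the _envcommon default applies.
effective_ref() {
  local ref
  ref=$(module_ref "$1/$2/$3/$4/terragrunt.hcl")
  if [ -z "$ref" ]; then
    ref=$(module_ref "$1/_envcommon/$4.hcl")
  fi
  printf '%s\n' "$ref"
}

# check_promoted LIVE_DIR REGION COMPONENT: non-zero exit if the versions differ.
check_promoted() {
  local staging production
  staging=$(effective_ref "$1" staging "$2" "$3")
  production=$(effective_ref "$1" production "$2" "$3")
  if [ "$staging" = "$production" ]; then
    echo "$3: production matches staging ($staging)"
  else
    echo "$3: staging=$staging production=$production" >&2
    return 1
  fi
}
```

Running check_promoted live eu-west-1 vpc from infrastructure/ exits non-zero while staging is ahead of production. A scheduled pipeline can use that exit code to flag components that have not been promoted yet.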
Atlantis Integration
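Atlantis provides a pull-request-driven workflow for Terraform. To use it with Terragrunt, define a custom workflow in the repo-level atlantis.yaml. The server-side repo config must allow this (allow_custom_workflows: true, with workflow listed in allowed_overrides).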
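With more than a handful of accounts, listing every Atlantis project by hand becomes error-prone, and the relative when_modified paths are easy to get wrong. Below is a minimal sketch that generates the projects: section from the live/ tree instead. It assumes the live/<account>/<region>/<component> layout shown earlier. The function name and the account-component naming scheme are hypothetical. The open-source terragrunt-atlantis-config tool automates the same idea more thoroughly.

```shell
#!/usr/bin/env bash
# Hypothetical generator for the projects: section of atlantis.yaml. Treats
# each live/<account>/<region>/<component>/terragrunt.hcl as one project.
# Project names assume one region per account; add the region if that changes.
set -eu

generate_projects() {
  echo "projects:"
  # find counts depth from live itself: live/<account>/<region>/<component>/
  # terragrunt.hcl is depth 4, so the root live/terragrunt.hcl is skipped.
  (cd "$1" && find live -mindepth 4 -maxdepth 4 -name terragrunt.hcl | sort) |
    while read -r file; do
      dir=$(dirname "$file")     # e.g. live/production/eu-west-1/vpc
      component=${dir##*/}       # e.g. vpc
      rel=${dir#live/}           # e.g. production/eu-west-1/vpc
      account=${rel%%/*}         # e.g. production
      # when_modified paths are relative to the project dir; ../../../ is live/.
      cat <<EOF
  - name: $account-$component
    dir: $dir
    workflow: terragrunt
    autoplan:
      when_modified: ["*.hcl", "../../../terragrunt.hcl", "../../../_envcommon/$component.hcl"]
EOF
    done
}
```

Running generate_projects . from infrastructure/ prints a projects: list. Merge that list with the version, workflows, and other top-level keys.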
# atlantis.yaml (repo-level config)
version: 3
automerge: false
parallel_plan: true
parallel_apply: false
projects:
  - name: production-vpc
    dir: live/production/eu-west-1/vpc
    workflow: terragrunt
    autoplan:
      # Paths are relative to the project dir; three levels up is live/
      when_modified: ["*.hcl", "../../../terragrunt.hcl", "../../../_envcommon/vpc.hcl"]
  - name: staging-vpc
    dir: live/staging/eu-west-1/vpc
    workflow: terragrunt
    autoplan:
      when_modified: ["*.hcl", "../../../terragrunt.hcl", "../../../_envcommon/vpc.hcl"]
workflows:
  terragrunt:
    plan:
      steps:
        - env:
            name: TERRAGRUNT_TFPATH
            value: terraform
        - run: terragrunt plan -no-color -out=$PLANFILE
    apply:
      steps:
        - run: terragrunt apply -no-color $PLANFILE
Drift Detection
Schedule regular drift detection runs to catch manual changes made outside of Terraform:
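#!/bin/bash
# scripts/detect-drift.sh
set -euo pipefail
: "${SLACK_WEBHOOK:?SLACK_WEBHOOK must be set}"
ACCOUNTS=("production" "staging" "security")
DRIFT_FOUND=0
for account in "${ACCOUNTS[@]}"; do
  echo "Checking drift in $account..."
  pushd "live/$account" > /dev/null
  # -detailed-exitcode: 0 = no changes, 1 = error, 2 = changes present.
  # Suspend errexit so a non-zero plan doesn't abort the script before we inspect it.
  set +e
  terragrunt run-all plan -detailed-exitcode -no-color 2>&1 | tee "/tmp/drift-$account.log"
  EXIT_CODE=${PIPESTATUS[0]}
  set -e
  if [ "$EXIT_CODE" -eq 2 ]; then
    echo "DRIFT DETECTED in $account"
    DRIFT_FOUND=1
    # Send Slack notification (a failed post should not hide the drift result)
    curl -fsS -X POST -H 'Content-Type: application/json' \
      -d "{\"text\":\"Drift detected in *$account* account\"}" "$SLACK_WEBHOOK" \
      || echo "Slack notification failed for $account" >&2
  elif [ "$EXIT_CODE" -ne 0 ]; then
    # A failed plan is not "no drift": surface it and fail the run.
    echo "Plan FAILED in $account (exit code $EXIT_CODE), see /tmp/drift-$account.log" >&2
    DRIFT_FOUND=1
  fi
  popd > /dev/null
done
exit "$DRIFT_FOUND"
Key Takeaways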
- Separate modules (what to build) from live configs (where and how to build it).
- Use _envcommon/ to define component defaults; override only environment-specific values.
- Isolate state per account with a dedicated S3 bucket per account and DynamoDB state locking.
- Pin module versions and promote them through environments like application releases.
- Automate plan/apply with Atlantis for auditability and team collaboration.
- Run scheduled drift detection to catch out-of-band changes before they cause incidents.