Multi-Cloud Infrastructure Management with Terraform: AWS, Azure, and GCP
Many organizations adopt multi-cloud strategies for redundancy, cost optimization, or access to best-of-breed services. Terraform can manage infrastructure across multiple cloud providers with a single workflow: one configuration language, one plan/apply cycle, and one approach to state. This guide walks through patterns for managing AWS, Azure, and GCP together.
Why Multi-Cloud?
Organizations choose multi-cloud strategies for several reasons:
- Avoid Vendor Lock-in: Reduce dependency on a single cloud provider
- Cost Optimization: Leverage competitive pricing across providers
- Geographic Coverage: Use providers with better regional presence
- Best-of-Breed Services: Choose optimal services from each cloud
- Regulatory Compliance: Meet data residency requirements
- Disaster Recovery: Cross-cloud backup and failover capabilities
Multi-Cloud Terraform Project Structure
Recommended Directory Layout
terraform-multi-cloud/
├── providers/
│   ├── aws/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── azure/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   └── gcp/
│       ├── main.tf
│       ├── variables.tf
│       └── outputs.tf
├── modules/
│   ├── networking/
│   │   ├── aws/
│   │   ├── azure/
│   │   └── gcp/
│   ├── compute/
│   │   ├── aws/
│   │   ├── azure/
│   │   └── gcp/
│   └── database/
│       ├── aws/
│       ├── azure/
│       └── gcp/
├── environments/
│   ├── dev/
│   ├── staging/
│   └── production/
└── shared/
    ├── backend.tf
    └── versions.tf
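To make the layout concrete, here is a minimal sketch of how an environment root might call the per-cloud networking modules. The file name, the module input names (`environment`, `vpc_cidr`, `vnet_cidr`, `network_cidr`), and the AWS and GCP CIDR ranges are assumptions for illustration; the modules themselves are not shown in this guide.

```hcl
# environments/production/main.tf (hypothetical file)

module "aws_networking" {
  source = "../../modules/networking/aws"

  environment = "production"
  vpc_cidr    = "10.0.0.0/16" # hypothetical input; must not overlap the other clouds
}

module "azure_networking" {
  source = "../../modules/networking/azure"

  environment = "production"
  vnet_cidr   = "10.1.0.0/16" # hypothetical input
}

module "gcp_networking" {
  source = "../../modules/networking/gcp"

  environment  = "production"
  network_cidr = "10.2.0.0/16" # hypothetical input
}
```

Keeping the address ranges non-overlapping from the start matters once the clouds are connected over VPN, because overlapping ranges cannot be routed between them.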
Provider Configuration
Multi-Provider Setup
# providers.tf
terraform {
  required_version = ">= 1.6.0"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }
    # Declared explicitly so the google-beta provider block below is version-pinned
    google-beta = {
      source  = "hashicorp/google-beta"
      version = "~> 5.0"
    }
  }

  backend "s3" {
    bucket         = "terraform-state-multi-cloud"
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}
# AWS Provider
provider "aws" {
  region = var.aws_region

  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "Terraform"
      Project     = var.project_name
    }
  }
}

# AWS Secondary Region (for DR)
provider "aws" {
  alias  = "secondary"
  region = var.aws_secondary_region

  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "Terraform"
      Project     = var.project_name
      Region      = "Secondary"
    }
  }
}

# Azure Provider
provider "azurerm" {
  features {
    resource_group {
      prevent_deletion_if_contains_resources = true
    }
    key_vault {
      purge_soft_delete_on_destroy = false
    }
  }

  subscription_id = var.azure_subscription_id
}

# GCP Provider
provider "google" {
  project = var.gcp_project_id
  region  = var.gcp_region
}

# GCP Beta Provider (for preview features)
provider "google-beta" {
  project = var.gcp_project_id
  region  = var.gcp_region
}
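An aliased provider is only used when a resource or module selects it explicitly. The sketch below shows both forms for the `aws.secondary` alias defined above. The bucket resource, the bucket name, and the `dr_networking` module call are hypothetical examples, not part of the original configuration.

```hcl
# Resource-level selection: this bucket is created in the secondary (DR) region
resource "aws_s3_bucket" "dr_backups" {
  provider = aws.secondary
  bucket   = "${var.project_name}-${var.environment}-dr-backups" # hypothetical name
}

# Module-level selection: inside the module, the default "aws" provider
# resolves to the secondary-region configuration
module "dr_networking" {
  source = "./modules/networking/aws"

  providers = {
    aws = aws.secondary
  }
}
```

Resources that omit `provider` keep using the default (unaliased) configuration for their provider type.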
Cross-Cloud Networking
VPN Connectivity Between Clouds
# aws-to-azure-vpn.tf
# AWS Side - Customer Gateway and VPN Connection
resource "aws_customer_gateway" "azure" {
  bgp_asn    = 65000 # must match the ASN configured on the Azure gateway
  ip_address = azurerm_public_ip.vpn_gateway.ip_address
  type       = "ipsec.1"

  tags = {
    Name = "Azure-VPN-Gateway"
  }
}

resource "aws_vpn_gateway" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "Main-VPN-Gateway"
  }
}

resource "aws_vpn_connection" "azure" {
  vpn_gateway_id      = aws_vpn_gateway.main.id
  customer_gateway_id = aws_customer_gateway.azure.id
  type                = "ipsec.1"
  static_routes_only  = false # false = dynamic routing over BGP

  # Use the same pre-shared key as the Azure connection;
  # if this is omitted, AWS generates its own key
  tunnel1_preshared_key = var.vpn_shared_key

  tags = {
    Name = "AWS-to-Azure-VPN"
  }
}
# Azure Side - Virtual Network Gateway
resource "azurerm_resource_group" "networking" {
  name     = "rg-networking-${var.environment}"
  location = var.azure_region
}

resource "azurerm_virtual_network" "main" {
  name                = "vnet-main-${var.environment}"
  location            = azurerm_resource_group.networking.location
  resource_group_name = azurerm_resource_group.networking.name
  address_space       = ["10.1.0.0/16"]
}

resource "azurerm_subnet" "gateway" {
  name                 = "GatewaySubnet" # Azure requires this exact name
  resource_group_name  = azurerm_resource_group.networking.name
  virtual_network_name = azurerm_virtual_network.main.name
  address_prefixes     = ["10.1.255.0/24"]
}

resource "azurerm_public_ip" "vpn_gateway" {
  name                = "pip-vpn-gateway"
  location            = azurerm_resource_group.networking.location
  resource_group_name = azurerm_resource_group.networking.name
  allocation_method   = "Static"
  sku                 = "Standard"
}

resource "azurerm_virtual_network_gateway" "main" {
  name                = "vng-main-${var.environment}"
  location            = azurerm_resource_group.networking.location
  resource_group_name = azurerm_resource_group.networking.name
  type                = "Vpn"
  vpn_type            = "RouteBased"
  active_active       = false
  enable_bgp          = true
  sku                 = "VpnGw2"

  ip_configuration {
    name                          = "vnetGatewayConfig"
    public_ip_address_id          = azurerm_public_ip.vpn_gateway.id
    private_ip_address_allocation = "Dynamic"
    subnet_id                     = azurerm_subnet.gateway.id
  }

  bgp_settings {
    asn = 65000
  }
}
# Local Network Gateway (represents AWS)
resource "azurerm_local_network_gateway" "aws" {
  name                = "lng-aws"
  location            = azurerm_resource_group.networking.location
  resource_group_name = azurerm_resource_group.networking.name
  gateway_address     = aws_vpn_connection.azure.tunnel1_address

  bgp_settings {
    asn = 64512 # default Amazon-side ASN of an AWS virtual private gateway
    # The peering address is the AWS-side inside tunnel IP, not the ASN
    bgp_peering_address = aws_vpn_connection.azure.tunnel1_vgw_inside_address
  }
}
# VPN Connection
resource "azurerm_virtual_network_gateway_connection" "aws" {
  name                       = "cn-azure-to-aws"
  location                   = azurerm_resource_group.networking.location
  resource_group_name        = azurerm_resource_group.networking.name
  type                       = "IPsec"
  virtual_network_gateway_id = azurerm_virtual_network_gateway.main.id
  local_network_gateway_id   = azurerm_local_network_gateway.aws.id
  shared_key                 = var.vpn_shared_key
  enable_bgp                 = true # BGP must also be enabled on the connection itself

  ipsec_policy {
    dh_group         = "DHGroup2"
    ike_encryption   = "AES256"
    ike_integrity    = "SHA256"
    ipsec_encryption = "AES256"
    ipsec_integrity  = "SHA256"
    pfs_group        = "PFS2"
    sa_lifetime      = 27000
  }
}
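With BGP in use, routes learned from Azure reach AWS subnets only if route propagation is enabled on the relevant route tables. The following is a minimal sketch of that step, assuming a route table named `aws_route_table.private` exists elsewhere in the configuration (it is not defined in this guide). The output is likewise illustrative.

```hcl
# Propagate routes learned over the VPN into a VPC route table.
# aws_route_table.private is an assumed, pre-existing resource.
resource "aws_vpn_gateway_route_propagation" "private" {
  vpn_gateway_id = aws_vpn_gateway.main.id
  route_table_id = aws_route_table.private.id
}

# Expose the tunnel endpoint so it can be checked against the Azure side
output "aws_tunnel1_outside_ip" {
  description = "Public IP of AWS VPN tunnel 1"
  value       = aws_vpn_connection.azure.tunnel1_address
}
```

An AWS VPN connection always provisions two tunnels. This guide wires up only the first; a production setup would typically configure the second as well for redundancy.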
Multi-Cloud Load Balancing
Global Load Balancer with Cloud-Specific Backends
# global-load-balancer.tf
# AWS Application Load Balancer
module "aws_alb" {
  source = "./modules/compute/aws"

  name               = "app-alb-${var.environment}"
  vpc_id             = aws_vpc.main.id
  subnet_ids         = aws_subnet.public[*].id
  security_group_ids = [aws_security_group.alb.id]
  target_instances   = aws_instance.app[*].id
}

# Azure Application Gateway
module "azure_app_gateway" {
  source = "./modules/compute/azure"

  name                = "ag-app-${var.environment}"
  resource_group_name = azurerm_resource_group.compute.name
  location            = var.azure_region
  subnet_id           = azurerm_subnet.app_gateway.id
  backend_pool_ips    = azurerm_linux_virtual_machine.app[*].private_ip_address
}

# GCP Load Balancer
module "gcp_load_balancer" {
  source = "./modules/compute/gcp"

  name              = "lb-app-${var.environment}"
  project           = var.gcp_project_id
  region            = var.gcp_region
  backend_instances = google_compute_instance.app[*].self_link
}
# DNS-based Global Load Balancing with Route 53
resource "aws_route53_zone" "main" {
  name = var.domain_name
}

# Geolocation routing to nearest cloud provider
resource "aws_route53_record" "app_us" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "app.${var.domain_name}"
  type    = "A"

  geolocation_routing_policy {
    continent = "NA"
  }

  alias {
    name                   = module.aws_alb.dns_name
    zone_id                = module.aws_alb.zone_id
    evaluate_target_health = true
  }

  set_identifier = "AWS-US"
}
resource "aws_route53_record" "app_eu" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "app.${var.domain_name}"
  type    = "A"
  ttl     = 60

  geolocation_routing_policy {
    continent = "EU"
  }

  # Route 53 alias records can only target AWS resources (or records in the
  # same hosted zone), so the Azure endpoint is published as a plain A record.
  # Assumes the module exposes the gateway's static public IP as an output
  # named public_ip_address (hypothetical name).
  records = [module.azure_app_gateway.public_ip_address]

  set_identifier = "Azure-EU"
}

resource "aws_route53_record" "app_asia" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "app.${var.domain_name}"
  type    = "A"
  ttl     = 60

  geolocation_routing_policy {
    continent = "AS"
  }

  # GCP load balancer IP, published as a plain A record for the same reason
  records = [module.gcp_load_balancer.ip_address]

  set_identifier = "GCP-Asia"
}
# Health check for failover
resource "aws_route53_health_check" "app" {
  fqdn              = "app.${var.domain_name}"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 30

  tags = {
    Name = "app-health-check"
  }
}
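Geolocation routing answers only queries whose origin matches one of the configured locations. With records for North America, Europe, and Asia alone, clients elsewhere may receive no answer. A default record, sketched below, catches everything else. Sending that traffic to the AWS load balancer is an arbitrary choice made for illustration.

```hcl
# Default geolocation record: used when no continent-specific record matches
resource "aws_route53_record" "app_default" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "app.${var.domain_name}"
  type    = "A"

  geolocation_routing_policy {
    country = "*" # "*" marks the default location
  }

  alias {
    name                   = module.aws_alb.dns_name
    zone_id                = module.aws_alb.zone_id
    evaluate_target_health = true
  }

  set_identifier = "Default"
}
```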
Cross-Cloud Database Replication
Multi-Cloud Database Setup
# databases.tf
# AWS RDS PostgreSQL (Primary)
resource "aws_db_instance" "primary" {
  identifier        = "postgres-primary-${var.environment}"
  engine            = "postgres"
  engine_version    = "15.4"
  instance_class    = "db.r6i.xlarge"
  allocated_storage = 100
  storage_encrypted = true

  db_name  = var.database_name
  username = var.database_username
  password = var.database_password

  vpc_security_group_ids = [aws_security_group.database.id]
  db_subnet_group_name   = aws_db_subnet_group.main.name

  backup_retention_period = 7
  backup_window           = "03:00-04:00"
  maintenance_window      = "mon:04:00-mon:05:00"

  enabled_cloudwatch_logs_exports = ["postgresql", "upgrade"]

  deletion_protection = true
  skip_final_snapshot = false
  # A fixed name avoids the permanent diff that timestamp() causes on every plan
  final_snapshot_identifier = "postgres-primary-final-${var.environment}"

  tags = {
    Name = "Primary Database"
    Role = "Primary"
  }
}
# Azure Database for PostgreSQL (replica target: a standalone server that
# receives data through the replication step configured below)
resource "azurerm_postgresql_flexible_server" "replica" {
  name                   = "psql-replica-${var.environment}"
  resource_group_name    = azurerm_resource_group.database.name
  location               = var.azure_region
  version                = "15"
  administrator_login    = var.database_username
  administrator_password = var.database_password

  sku_name   = "GP_Standard_D4s_v3"
  storage_mb = 102400

  backup_retention_days        = 7
  geo_redundant_backup_enabled = true

  high_availability {
    mode = "ZoneRedundant"
  }

  tags = {
    Name = "Replica Database"
    Role = "Replica"
  }
}
# GCP Cloud SQL (replica target: a standalone instance that receives data
# through the replication step configured below)
resource "google_sql_database_instance" "replica" {
  name             = "postgres-replica-${var.environment}"
  database_version = "POSTGRES_15"
  region           = var.gcp_region

  settings {
    tier              = "db-custom-4-16384"
    availability_type = "REGIONAL"
    disk_size         = 100
    disk_type         = "PD_SSD"

    backup_configuration {
      enabled                        = true
      start_time                     = "03:00"
      point_in_time_recovery_enabled = true
      transaction_log_retention_days = 7
    }

    ip_configuration {
      ipv4_enabled    = false
      private_network = google_compute_network.main.id
    }

    database_flags {
      name  = "max_connections"
      value = "200"
    }
  }

  deletion_protection = true
}
# Data replication configuration (using external tools like pglogical or Debezium)
resource "null_resource" "setup_replication" {
  depends_on = [
    aws_db_instance.primary,
    azurerm_postgresql_flexible_server.replica,
    google_sql_database_instance.replica
  ]

  provisioner "local-exec" {
    command = <<-EOT
      # Install and configure replication
      ./scripts/setup-multi-cloud-replication.sh \
        --primary ${aws_db_instance.primary.endpoint} \
        --azure-replica ${azurerm_postgresql_flexible_server.replica.fqdn} \
        --gcp-replica ${google_sql_database_instance.replica.connection_name}
    EOT
  }
}
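The database resources above take their credentials from input variables. A minimal sketch of how those variables might be declared follows. Marking the password as sensitive keeps it out of plan and apply output, although it is still stored in state. The 16-character minimum is an assumed policy, not a provider requirement.

```hcl
variable "database_username" {
  description = "Administrator login shared by the primary and replica servers"
  type        = string
}

variable "database_password" {
  description = "Administrator password; supply via TF_VAR_database_password or a secrets manager"
  type        = string
  sensitive   = true

  validation {
    # Assumed minimum length, chosen for illustration
    condition     = length(var.database_password) >= 16
    error_message = "database_password must be at least 16 characters long."
  }
}
```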
Unified Monitoring and Observability
Cross-Cloud Monitoring with Datadog
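# monitoring.tf
# Terraform allows only one required_providers block per module, so add the
# Datadog entry to the block in providers.tf instead of declaring a second one:
#
#   datadog = {
#     source  = "DataDog/datadog"
#     version = "~> 3.0"
#   }

provider "datadog" {
  api_key = var.datadog_api_key
  app_key = var.datadog_app_key
}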
# AWS Integration
resource "datadog_integration_aws" "main" {
  account_id = var.aws_account_id
  role_name  = "DatadogIntegrationRole"

  host_tags = [
    "cloud:aws",
    "environment:${var.environment}"
  ]

  account_specific_namespace_rules = {
    auto_scaling = true
    ec2          = true
    elb          = true
    lambda       = true
    rds          = true
    s3           = true
  }
}

# Azure Integration
resource "datadog_integration_azure" "main" {
  tenant_name   = var.azure_tenant_id
  client_id     = var.azure_client_id
  client_secret = var.azure_client_secret

  host_filters = "environment:${var.environment}"
}

# GCP Integration
resource "datadog_integration_gcp" "main" {
  project_id     = var.gcp_project_id
  private_key_id = var.gcp_private_key_id
  private_key    = var.gcp_private_key
  client_email   = var.gcp_client_email
  # client_id from the service account key is also required by this resource;
  # var.gcp_client_id is an assumed variable name
  client_id = var.gcp_client_id

  host_filters = "environment:${var.environment}"
}
# Multi-Cloud Dashboard
resource "datadog_dashboard" "multi_cloud" {
  title       = "Multi-Cloud Infrastructure Overview"
  description = "Unified view of AWS, Azure, and GCP infrastructure"
  layout_type = "ordered"

  widget {
    group_definition {
      title       = "AWS Metrics"
      layout_type = "ordered"

      widget {
        timeseries_definition {
          title = "EC2 CPU Utilization"
          request {
            q = "avg:aws.ec2.cpuutilization{environment:${var.environment}} by {instance_id}"
          }
        }
      }
    }
  }

  widget {
    group_definition {
      title       = "Azure Metrics"
      layout_type = "ordered"

      widget {
        timeseries_definition {
          title = "VM CPU Percentage"
          request {
            q = "avg:azure.vm.percentage_cpu{environment:${var.environment}} by {name}"
          }
        }
      }
    }
  }

  widget {
    group_definition {
      title       = "GCP Metrics"
      layout_type = "ordered"

      widget {
        timeseries_definition {
          title = "Compute Engine CPU Utilization"
          request {
            q = "avg:gcp.compute.instance.cpu.utilization{environment:${var.environment}} by {instance_name}"
          }
        }
      }
    }
  }
}
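# Multi-Cloud Alerts
# A metric alert takes a single query whose threshold must match "critical",
# so create one monitor per cloud instead of joining three queries with "or".
locals {
  cpu_alerts = {
    aws   = { metric = "aws.ec2.cpuutilization", critical = 80, warning = 70 }
    azure = { metric = "azure.vm.percentage_cpu", critical = 80, warning = 70 }
    # This metric is treated as a 0-1 fraction here, hence 0.8 rather than 80
    gcp = { metric = "gcp.compute.instance.cpu.utilization", critical = 0.8, warning = 0.7 }
  }
}

resource "datadog_monitor" "high_cpu_multi_cloud" {
  for_each = local.cpu_alerts

  name    = "High CPU Usage - ${each.key}"
  type    = "metric alert"
  message = "CPU usage is high on ${each.key} @pagerduty"
  query   = "avg(last_5m):avg:${each.value.metric}{environment:${var.environment}} > ${each.value.critical}"

  monitor_thresholds {
    critical = each.value.critical
    warning  = each.value.warning
  }

  notify_no_data    = false
  renotify_interval = 60
  tags              = ["multi-cloud", "cpu", "infrastructure", "cloud:${each.key}"]
}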
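The AWS integration above names an IAM role, `DatadogIntegrationRole`, that Datadog assumes to read metrics. A minimal sketch of that role follows. The account ID `464622532012` is Datadog's AWS account as published in Datadog's documentation at the time of writing; verify it before use. The permissions policy that grants read access is omitted here and would need to be attached separately.

```hcl
# Trust policy: allow Datadog's AWS account to assume the role,
# but only when it presents the external ID generated for this integration
data "aws_iam_policy_document" "datadog_assume_role" {
  statement {
    effect  = "Allow"
    actions = ["sts:AssumeRole"]

    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::464622532012:root"] # verify against Datadog docs
    }

    condition {
      test     = "StringEquals"
      variable = "sts:ExternalId"
      values   = [datadog_integration_aws.main.external_id]
    }
  }
}

resource "aws_iam_role" "datadog" {
  name               = "DatadogIntegrationRole" # must match role_name in the integration
  assume_role_policy = data.aws_iam_policy_document.datadog_assume_role.json
}
```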
Cost Management Across Clouds
Unified Cost Tracking
# cost-management.tf
# AWS Cost Anomaly Detection
resource "aws_ce_anomaly_monitor" "service" {
  name              = "ServiceMonitor"
  monitor_type      = "DIMENSIONAL"
  monitor_dimension = "SERVICE"
}

resource "aws_ce_anomaly_subscription" "alerts" {
  name      = "CostAnomalyAlerts"
  frequency = "DAILY"

  monitor_arn_list = [
    aws_ce_anomaly_monitor.service.arn
  ]

  subscriber {
    type    = "EMAIL"
    address = var.cost_alert_email
  }

  threshold_expression {
    dimension {
      key           = "ANOMALY_TOTAL_IMPACT_ABSOLUTE"
      values        = ["100"]
      match_options = ["GREATER_THAN_OR_EQUAL"]
    }
  }
}
# Azure Cost Management Budget
resource "azurerm_consumption_budget_resource_group" "main" {
  name              = "budget-${var.environment}"
  resource_group_id = azurerm_resource_group.main.id

  amount     = 5000
  time_grain = "Monthly"

  time_period {
    start_date = "2026-01-01T00:00:00Z"
  }

  notification {
    enabled   = true
    threshold = 80
    operator  = "GreaterThan"

    contact_emails = [
      var.cost_alert_email
    ]
  }

  notification {
    enabled   = true
    threshold = 100
    operator  = "GreaterThan"

    contact_emails = [
      var.cost_alert_email
    ]
  }
}
# GCP Budget Alert
resource "google_billing_budget" "main" {
  billing_account = var.gcp_billing_account
  display_name    = "Budget-${var.environment}"

  budget_filter {
    projects = ["projects/${var.gcp_project_id}"]

    labels = {
      environment = var.environment
    }
  }

  amount {
    specified_amount {
      currency_code = "USD"
      units         = "5000"
    }
  }

  threshold_rules {
    threshold_percent = 0.8
  }

  threshold_rules {
    threshold_percent = 1.0
  }

  all_updates_rule {
    monitoring_notification_channels = [
      google_monitoring_notification_channel.email.id
    ]
  }
}
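The GCP budget above references `google_monitoring_notification_channel.email`, which is not defined in this guide. A minimal sketch of that channel, reusing the same alert address as the AWS and Azure budgets, might look like this. The display name and the variable description are assumptions.

```hcl
variable "cost_alert_email" {
  description = "Address that receives cost alerts from all three clouds"
  type        = string
}

# Email channel used by the budget's all_updates_rule
resource "google_monitoring_notification_channel" "email" {
  display_name = "Cost alerts (${var.environment})"
  type         = "email"

  labels = {
    email_address = var.cost_alert_email
  }
}
```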
Best Practices for Multi-Cloud Terraform
1. State Management
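# Use separate state files per cloud provider.
# Backend blocks cannot reference variables, so bucket and key are supplied
# at init time (partial configuration), for example:
#   terraform init \
#     -backend-config="bucket=terraform-state-aws" \
#     -backend-config="key=production/aws/terraform.tfstate"
terraform {
  backend "s3" {
    region = "us-east-1"
  }
}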
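Once each cloud has its own state file, a configuration for one provider can read values published by another through the `terraform_remote_state` data source. The sketch below assumes the AWS configuration stores its state at the bucket and key shown and exports an output named `vpc_cidr`; all three names are hypothetical.

```hcl
# Read outputs from the AWS state, for example from the Azure configuration
data "terraform_remote_state" "aws" {
  backend = "s3"

  config = {
    bucket = "terraform-state-aws"              # assumed bucket name
    key    = "production/aws/terraform.tfstate" # assumed key
    region = "us-east-1"
  }
}

# Example use: make the AWS VPC range available to this configuration
output "aws_vpc_cidr" {
  value = data.terraform_remote_state.aws.outputs.vpc_cidr # hypothetical output
}
```

Only root-level outputs of the other configuration are visible this way, so anything that needs to cross the boundary has to be declared as an output there.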
2. Consistent Tagging Strategy
# locals.tf
locals {
  common_tags = {
    Environment = var.environment
    ManagedBy   = "Terraform"
    Project     = var.project_name
    CostCenter  = var.cost_center
    Owner       = var.owner_email
  }

  # Cloud-specific tag formats
  aws_tags   = local.common_tags
  azure_tags = { for k, v in local.common_tags : lower(k) => v }

  # GCP label values allow only lowercase letters, digits, "_" and "-",
  # so characters such as "@" and "." in the owner email are replaced
  gcp_labels = {
    for k, v in local.common_tags :
    lower(replace(k, " ", "_")) => replace(lower(v), "/[^a-z0-9_-]/", "_")
  }
}
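As an illustration of how these locals might be consumed, the sketch below applies each map to one resource per cloud. The resources and their names are hypothetical examples; `var.azure_region` and `var.gcp_region` are the variables used earlier in this guide.

```hcl
resource "aws_s3_bucket" "assets" {
  bucket = "${var.project_name}-${var.environment}-assets" # hypothetical name
  tags   = local.aws_tags
}

resource "azurerm_resource_group" "app" {
  name     = "rg-app-${var.environment}" # hypothetical name
  location = var.azure_region
  tags     = local.azure_tags
}

resource "google_storage_bucket" "assets" {
  name     = "${var.project_name}-${var.environment}-assets" # hypothetical name
  location = var.gcp_region
  labels   = local.gcp_labels # GCP uses labels rather than tags
}
```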
3. Module Abstraction
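# modules/compute/interface/main.tf
variable "cloud_provider" {
  type = string

  validation {
    condition     = contains(["aws", "azure", "gcp"], var.cloud_provider)
    error_message = "Cloud provider must be aws, azure, or gcp."
  }
}

# A module's source must be a literal string, so it cannot interpolate
# var.cloud_provider. Declare one module block per cloud and enable only the
# selected one with count.
module "compute_aws" {
  count  = var.cloud_provider == "aws" ? 1 : 0
  source = "./modules/compute/aws"

  instance_count = var.instance_count
  instance_type  = var.instance_type
  tags           = local.common_tags
}

# Repeat the block as module "compute_azure" and module "compute_gcp",
# changing only the source path and the value compared in count.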
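One practical wrinkle with a shared interface is that machine type names differ between clouds. A small lookup table can translate an abstract size into the provider-specific name before calling the interface module. The following is a sketch under stated assumptions: the size catalogue, the `target_cloud` variable, and the module call are all hypothetical, and the listed types are merely comparable, not equivalent.

```hcl
variable "target_cloud" {
  description = "Which cloud to deploy compute to: aws, azure, or gcp"
  type        = string
  default     = "aws"
}

locals {
  # Hypothetical catalogue: abstract size -> per-cloud machine type
  instance_sizes = {
    small  = { aws = "t3.small", azure = "Standard_B2s", gcp = "e2-small" }
    medium = { aws = "t3.medium", azure = "Standard_D2s_v3", gcp = "e2-medium" }
  }
}

module "app_compute" {
  source = "./modules/compute/interface"

  cloud_provider = var.target_cloud
  instance_count = 2
  instance_type  = local.instance_sizes["small"][var.target_cloud]
}
```

Callers then choose a size and a cloud, and the provider-specific naming stays in one place.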
Key Takeaways
- Unified Tooling: Use Terraform for consistent infrastructure management
- Network Connectivity: Implement VPN or dedicated connections between clouds
- Centralized Monitoring: Use tools like Datadog for unified observability
- Cost Management: Track and optimize costs across all providers
- Security: Implement consistent security policies across clouds
- Disaster Recovery: Leverage multi-cloud for resilience
- Abstraction: Create provider-agnostic modules where possible
Conclusion
Multi-cloud infrastructure management with Terraform provides flexibility, resilience, and optimization opportunities. While it adds complexity, the benefits of avoiding vendor lock-in and leveraging best-of-breed services make it worthwhile for many organizations.
Start with a single cloud, master Terraform fundamentals, then gradually expand to multi-cloud as your needs grow.
Want to dive deeper? Check out my posts on Terraform best practices and cloud-specific implementations!