Suyog Maid
Article · 2026-01-19

Multi-Cloud Infrastructure Management with Terraform: AWS, Azure, and GCP

#terraform #multi-cloud #aws #azure #gcp #infrastructure-as-code #devops

Many organizations adopt multi-cloud strategies for redundancy, cost optimization, or access to best-of-breed services. Terraform can manage infrastructure across multiple cloud providers with a single workflow. This guide walks through patterns for provider configuration, cross-cloud networking, load balancing, database replication, monitoring, and cost management.

Why Multi-Cloud?

Organizations choose multi-cloud strategies for several reasons:

  • Avoid Vendor Lock-in: Reduce dependency on a single cloud provider
  • Cost Optimization: Leverage competitive pricing across providers
  • Geographic Coverage: Use providers with better regional presence
  • Best-of-Breed Services: Choose optimal services from each cloud
  • Regulatory Compliance: Meet data residency requirements
  • Disaster Recovery: Cross-cloud backup and failover capabilities

Multi-Cloud Terraform Project Structure

Recommended Directory Layout

terraform-multi-cloud/
├── providers/
│   ├── aws/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   ├── azure/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   └── outputs.tf
│   └── gcp/
│       ├── main.tf
│       ├── variables.tf
│       └── outputs.tf
├── modules/
│   ├── networking/
│   │   ├── aws/
│   │   ├── azure/
│   │   └── gcp/
│   ├── compute/
│   │   ├── aws/
│   │   ├── azure/
│   │   └── gcp/
│   └── database/
│       ├── aws/
│       ├── azure/
│       └── gcp/
├── environments/
│   ├── dev/
│   ├── staging/
│   └── production/
└── shared/
    ├── backend.tf
    └── versions.tf
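
As a rough sketch of how an environment directory could consume this layout, a root module under environments/production/ might call the per-cloud networking modules directly. The input names (environment, cidr_block, address_space, subnet_cidr) are hypothetical, since the module interfaces are not shown in this article; only the relative paths follow from the tree above.

```hcl
# environments/production/main.tf (illustrative sketch)
# Module inputs are assumptions; match them to each module's variables.tf.

module "aws_networking" {
  source = "../../modules/networking/aws"

  environment = "production"
  cidr_block  = "10.0.0.0/16" # hypothetical input
}

module "azure_networking" {
  source = "../../modules/networking/azure"

  environment   = "production"
  address_space = ["10.1.0.0/16"] # hypothetical input
}

module "gcp_networking" {
  source = "../../modules/networking/gcp"

  environment = "production"
  subnet_cidr = "10.2.0.0/16" # hypothetical input
}
```

Keeping the address ranges non-overlapping across clouds matters as soon as the networks are connected over VPN.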

Provider Configuration

Multi-Provider Setup

# providers.tf
terraform {
  required_version = ">= 1.6.0"
  
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0"
    }
    
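    google = {
      source  = "hashicorp/google"
      version = "~> 5.0"
    }

    # Needed because a "google-beta" provider block is configured below
    google-beta = {
      source  = "hashicorp/google-beta"
      version = "~> 5.0"
    }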
  }
  
  backend "s3" {
    bucket         = "terraform-state-multi-cloud"
    key            = "production/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    dynamodb_table = "terraform-locks"
  }
}

# AWS Provider
provider "aws" {
  region = var.aws_region
  
  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "Terraform"
      Project     = var.project_name
    }
  }
}

# AWS Secondary Region (for DR)
provider "aws" {
  alias  = "secondary"
  region = var.aws_secondary_region
  
  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "Terraform"
      Project     = var.project_name
      Region      = "Secondary"
    }
  }
}

# Azure Provider
provider "azurerm" {
  features {
    resource_group {
      prevent_deletion_if_contains_resources = true
    }
    
    key_vault {
      purge_soft_delete_on_destroy = false
    }
  }
  
  subscription_id = var.azure_subscription_id
}

# GCP Provider
provider "google" {
  project = var.gcp_project_id
  region  = var.gcp_region
}

# GCP Beta Provider (for preview features)
provider "google-beta" {
  project = var.gcp_project_id
  region  = var.gcp_region
}
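
The aliased provider only takes effect when a resource or module selects it. The following is a minimal sketch of that wiring, assuming example default regions and a placeholder bucket name that are not part of the original configuration.

```hcl
# variables.tf (sketch): inputs referenced by the AWS provider blocks.
# The default regions are assumptions for illustration.
variable "aws_region" {
  type    = string
  default = "us-east-1"
}

variable "aws_secondary_region" {
  type    = string
  default = "us-west-2"
}

# A resource opts into the aliased provider with the provider meta-argument.
# The bucket name is a placeholder.
resource "aws_s3_bucket" "dr_backups" {
  provider = aws.secondary
  bucket   = "example-dr-backups-${var.environment}"
}

# A module receives an aliased provider through the providers map.
module "dr_network" {
  source = "./modules/networking/aws"

  providers = {
    aws = aws.secondary
  }
}
```

Resources that omit the provider argument keep using the default (unaliased) AWS provider.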

Cross-Cloud Networking

VPN Connectivity Between Clouds

# aws-to-azure-vpn.tf
# AWS Side - Customer Gateway and VPN Connection
resource "aws_customer_gateway" "azure" {
  bgp_asn    = 65000
  ip_address = azurerm_public_ip.vpn_gateway.ip_address
  type       = "ipsec.1"
  
  tags = {
    Name = "Azure-VPN-Gateway"
  }
}

resource "aws_vpn_gateway" "main" {
  vpc_id = aws_vpc.main.id
  
  tags = {
    Name = "Main-VPN-Gateway"
  }
}

resource "aws_vpn_connection" "azure" {
  vpn_gateway_id      = aws_vpn_gateway.main.id
  customer_gateway_id = aws_customer_gateway.azure.id
  type                = "ipsec.1"
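  static_routes_only  = false

  # Both ends of the tunnel must use the same pre-shared key, so set it
  # explicitly instead of letting AWS generate one
  tunnel1_preshared_key = var.vpn_shared_key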
  
  tags = {
    Name = "AWS-to-Azure-VPN"
  }
}

# Azure Side - Virtual Network Gateway
resource "azurerm_resource_group" "networking" {
  name     = "rg-networking-${var.environment}"
  location = var.azure_region
}

resource "azurerm_virtual_network" "main" {
  name                = "vnet-main-${var.environment}"
  location            = azurerm_resource_group.networking.location
  resource_group_name = azurerm_resource_group.networking.name
  address_space       = ["10.1.0.0/16"]
}

resource "azurerm_subnet" "gateway" {
  name                 = "GatewaySubnet"
  resource_group_name  = azurerm_resource_group.networking.name
  virtual_network_name = azurerm_virtual_network.main.name
  address_prefixes     = ["10.1.255.0/24"]
}

resource "azurerm_public_ip" "vpn_gateway" {
  name                = "pip-vpn-gateway"
  location            = azurerm_resource_group.networking.location
  resource_group_name = azurerm_resource_group.networking.name
  allocation_method   = "Static"
  sku                 = "Standard"
}

resource "azurerm_virtual_network_gateway" "main" {
  name                = "vng-main-${var.environment}"
  location            = azurerm_resource_group.networking.location
  resource_group_name = azurerm_resource_group.networking.name
  
  type     = "Vpn"
  vpn_type = "RouteBased"
  
  active_active = false
  enable_bgp    = true
  sku           = "VpnGw2"
  
  ip_configuration {
    name                          = "vnetGatewayConfig"
    public_ip_address_id          = azurerm_public_ip.vpn_gateway.id
    private_ip_address_allocation = "Dynamic"
    subnet_id                     = azurerm_subnet.gateway.id
  }
  
  bgp_settings {
    asn = 65000
  }
}

# Local Network Gateway (represents AWS)
resource "azurerm_local_network_gateway" "aws" {
  name                = "lng-aws"
  location            = azurerm_resource_group.networking.location
  resource_group_name = azurerm_resource_group.networking.name
  gateway_address     = aws_vpn_connection.azure.tunnel1_address
  
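  bgp_settings {
    asn = 64512 # default Amazon-side ASN of a virtual private gateway
    # The BGP peer is the AWS inside tunnel address, not the ASN attribute
    bgp_peering_address = aws_vpn_connection.azure.tunnel1_vgw_inside_address
  }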
}

# VPN Connection
resource "azurerm_virtual_network_gateway_connection" "aws" {
  name                = "cn-azure-to-aws"
  location            = azurerm_resource_group.networking.location
  resource_group_name = azurerm_resource_group.networking.name
  
  type                       = "IPsec"
  virtual_network_gateway_id = azurerm_virtual_network_gateway.main.id
  local_network_gateway_id   = azurerm_local_network_gateway.aws.id
  
  shared_key = var.vpn_shared_key
  
  ipsec_policy {
    dh_group         = "DHGroup2"
    ike_encryption   = "AES256"
    ike_integrity    = "SHA256"
    ipsec_encryption = "AES256"
    ipsec_integrity  = "SHA256"
    pfs_group        = "PFS2"
    sa_lifetime      = 27000
  }
}
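
Because the shared key is a secret used on both sides of the tunnel, it helps to declare it as a sensitive variable and validate its shape before any gateway is created. This is a sketch only: the validation rule reflects my understanding of the AWS pre-shared key constraints (8 to 64 characters; letters, digits, periods, and underscores; no leading zero), so check it against the current AWS documentation. The output name is hypothetical.

```hcl
variable "vpn_shared_key" {
  description = "Pre-shared key used by both ends of the IPsec tunnel"
  type        = string
  sensitive   = true

  validation {
    # First character: any allowed character except 0; then 7 to 63 more
    condition     = can(regex("^[A-Za-z1-9._][A-Za-z0-9._]{7,63}$", var.vpn_shared_key))
    error_message = "The key must be 8-64 characters (letters, digits, periods, underscores) and must not start with 0."
  }
}

# Hypothetical output: exposes the AWS tunnel endpoint for troubleshooting
output "aws_tunnel1_address" {
  value = aws_vpn_connection.azure.tunnel1_address
}
```

The value can be supplied through the TF_VAR_vpn_shared_key environment variable so it never lands in a committed .tfvars file.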

Multi-Cloud Load Balancing

Global Load Balancer with Cloud-Specific Backends

# global-load-balancer.tf
# AWS Application Load Balancer
module "aws_alb" {
  source = "./modules/compute/aws"
  
  name               = "app-alb-${var.environment}"
  vpc_id             = aws_vpc.main.id
  subnet_ids         = aws_subnet.public[*].id
  security_group_ids = [aws_security_group.alb.id]
  
  target_instances = aws_instance.app[*].id
}

# Azure Application Gateway
module "azure_app_gateway" {
  source = "./modules/compute/azure"
  
  name                = "ag-app-${var.environment}"
  resource_group_name = azurerm_resource_group.compute.name
  location            = var.azure_region
  
  subnet_id = azurerm_subnet.app_gateway.id
  backend_pool_ips = azurerm_linux_virtual_machine.app[*].private_ip_address
}

# GCP Load Balancer
module "gcp_load_balancer" {
  source = "./modules/compute/gcp"
  
  name    = "lb-app-${var.environment}"
  project = var.gcp_project_id
  region  = var.gcp_region
  
  backend_instances = google_compute_instance.app[*].self_link
}

# DNS-based Global Load Balancing with Route 53
resource "aws_route53_zone" "main" {
  name = var.domain_name
}

# Geolocation routing to nearest cloud provider
resource "aws_route53_record" "app_us" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "app.${var.domain_name}"
  type    = "A"
  
  geolocation_routing_policy {
    continent = "NA"
  }
  
  alias {
    name                   = module.aws_alb.dns_name
    zone_id                = module.aws_alb.zone_id
    evaluate_target_health = true
  }
  
  set_identifier = "AWS-US"
}

resource "aws_route53_record" "app_eu" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "app.${var.domain_name}"
  type    = "A"
  
  geolocation_routing_policy {
    continent = "EU"
  }
  
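  # Route 53 alias records can only target AWS resources, so the Azure
  # endpoint is published as a plain A record. public_ip_address is an
  # assumed module output (the gateway's static public IP).
  ttl     = 60
  records = [module.azure_app_gateway.public_ip_address]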
  
  set_identifier = "Azure-EU"
}

resource "aws_route53_record" "app_asia" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "app.${var.domain_name}"
  type    = "A"
  
  geolocation_routing_policy {
    continent = "AS"
  }
  
  # Route 53 alias records can only target AWS resources, so the GCP
  # load balancer's static IP is published as a plain A record.
  ttl     = 60
  records = [module.gcp_load_balancer.ip_address]
  
  set_identifier = "GCP-Asia"
}

# Health check for failover
resource "aws_route53_health_check" "app" {
  fqdn              = "app.${var.domain_name}"
  port              = 443
  type              = "HTTPS"
  resource_path     = "/health"
  failure_threshold = 3
  request_interval  = 30
  
  tags = {
    Name = "app-health-check"
  }
}
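
The three geolocation records cover North America, Europe, and Asia only. Queries from any other location get no answer unless a default record exists. A minimal sketch of such a default, assuming the AWS load balancer should serve as the fallback:

```hcl
resource "aws_route53_record" "app_default" {
  zone_id = aws_route53_zone.main.zone_id
  name    = "app.${var.domain_name}"
  type    = "A"

  geolocation_routing_policy {
    country = "*" # default location: matches queries no other record covers
  }

  alias {
    name                   = module.aws_alb.dns_name
    zone_id                = module.aws_alb.zone_id
    evaluate_target_health = true
  }

  set_identifier = "Default"
}
```

Which cloud acts as the fallback is a design choice; the one with the broadest capacity or the lowest average latency is a reasonable candidate.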

Cross-Cloud Database Replication

Multi-Cloud Database Setup

# databases.tf
# AWS RDS PostgreSQL (Primary)
resource "aws_db_instance" "primary" {
  identifier = "postgres-primary-${var.environment}"
  
  engine               = "postgres"
  engine_version       = "15.4"
  instance_class       = "db.r6i.xlarge"
  allocated_storage    = 100
  storage_encrypted    = true
  
  db_name  = var.database_name
  username = var.database_username
  password = var.database_password
  
  vpc_security_group_ids = [aws_security_group.database.id]
  db_subnet_group_name   = aws_db_subnet_group.main.name
  
  backup_retention_period = 7
  backup_window          = "03:00-04:00"
  maintenance_window     = "mon:04:00-mon:05:00"
  
  enabled_cloudwatch_logs_exports = ["postgresql", "upgrade"]
  
  deletion_protection = true
  skip_final_snapshot = false
  # A fixed name avoids the diff that timestamp() would produce on every plan
  final_snapshot_identifier = "postgres-primary-final-${var.environment}"
  
  tags = {
    Name = "Primary Database"
    Role = "Primary"
  }
}

# Azure Database for PostgreSQL (replica target, populated by the external replication set up below)
resource "azurerm_postgresql_flexible_server" "replica" {
  name                = "psql-replica-${var.environment}"
  resource_group_name = azurerm_resource_group.database.name
  location            = var.azure_region
  
  version                      = "15"
  administrator_login          = var.database_username
  administrator_password       = var.database_password
  
  sku_name   = "GP_Standard_D4s_v3"
  storage_mb = 102400
  
  backup_retention_days        = 7
  geo_redundant_backup_enabled = true
  
  high_availability {
    mode = "ZoneRedundant"
  }
  
  tags = {
    Name = "Replica Database"
    Role = "Replica"
  }
}

# GCP Cloud SQL (replica target, populated by the external replication set up below)
resource "google_sql_database_instance" "replica" {
  name             = "postgres-replica-${var.environment}"
  database_version = "POSTGRES_15"
  region           = var.gcp_region
  
  settings {
    tier              = "db-custom-4-16384"
    availability_type = "REGIONAL"
    disk_size         = 100
    disk_type         = "PD_SSD"
    
    backup_configuration {
      enabled                        = true
      start_time                     = "03:00"
      point_in_time_recovery_enabled = true
      transaction_log_retention_days = 7
    }
    
    ip_configuration {
      ipv4_enabled    = false
      private_network = google_compute_network.main.id
    }
    
    database_flags {
      name  = "max_connections"
      value = "200"
    }
  }
  
  deletion_protection = true
}

# Data replication configuration (using external tools like pglogical or Debezium)
resource "null_resource" "setup_replication" {
  depends_on = [
    aws_db_instance.primary,
    azurerm_postgresql_flexible_server.replica,
    google_sql_database_instance.replica
  ]
  
  provisioner "local-exec" {
    command = <<-EOT
      # Install and configure replication
      ./scripts/setup-multi-cloud-replication.sh \
        --primary ${aws_db_instance.primary.endpoint} \
        --azure-replica ${azurerm_postgresql_flexible_server.replica.fqdn} \
        --gcp-replica ${google_sql_database_instance.replica.connection_name}
    EOT
  }
}
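
A provisioner on null_resource runs only when the resource is created, so the script would not run again if a database endpoint later changed. One possible alternative, sketched below, is the built-in terraform_data resource (available since Terraform 1.4) with triggers_replace. Passing the endpoints as environment variables is an assumption here; the script shown above takes flags instead.

```hcl
resource "terraform_data" "setup_replication" {
  # Replace this resource, and so re-run the provisioner,
  # whenever any of the endpoints changes
  triggers_replace = [
    aws_db_instance.primary.endpoint,
    azurerm_postgresql_flexible_server.replica.fqdn,
    google_sql_database_instance.replica.connection_name,
  ]

  provisioner "local-exec" {
    command = "./scripts/setup-multi-cloud-replication.sh"

    # Hypothetical variable names; the script would need to read these
    environment = {
      PRIMARY_ENDPOINT = aws_db_instance.primary.endpoint
      AZURE_REPLICA    = azurerm_postgresql_flexible_server.replica.fqdn
      GCP_REPLICA      = google_sql_database_instance.replica.connection_name
    }
  }
}
```

Referencing the three database resources in triggers_replace also creates the dependency ordering, so an explicit depends_on is not needed in this variant.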

Unified Monitoring and Observability

Cross-Cloud Monitoring with Datadog

# monitoring.tf
terraform {
  required_providers {
    datadog = {
      source  = "DataDog/datadog"
      version = "~> 3.0"
    }
  }
}

provider "datadog" {
  api_key = var.datadog_api_key
  app_key = var.datadog_app_key
}

# AWS Integration
resource "datadog_integration_aws" "main" {
  account_id = var.aws_account_id
  role_name  = "DatadogIntegrationRole"
  
  host_tags = [
    "cloud:aws",
    "environment:${var.environment}"
  ]
  
  account_specific_namespace_rules = {
    auto_scaling = true
    ec2          = true
    elb          = true
    lambda       = true
    rds          = true
    s3           = true
  }
}

# Azure Integration
resource "datadog_integration_azure" "main" {
  tenant_name   = var.azure_tenant_id
  client_id     = var.azure_client_id
  client_secret = var.azure_client_secret
  
  host_filters = "environment:${var.environment}"
}

# GCP Integration
resource "datadog_integration_gcp" "main" {
  project_id     = var.gcp_project_id
  private_key_id = var.gcp_private_key_id
  private_key    = var.gcp_private_key
  client_email   = var.gcp_client_email
  client_id      = var.gcp_client_id # also required by this resource
  
  host_filters = "environment:${var.environment}"
}

# Multi-Cloud Dashboard
resource "datadog_dashboard" "multi_cloud" {
  title       = "Multi-Cloud Infrastructure Overview"
  description = "Unified view of AWS, Azure, and GCP infrastructure"
  layout_type = "ordered"
  
  widget {
    group_definition {
      title       = "AWS Metrics"
      layout_type = "ordered"
      
      widget {
        timeseries_definition {
          title = "EC2 CPU Utilization"
          request {
            q = "avg:aws.ec2.cpuutilization{environment:${var.environment}} by {instance_id}"
          }
        }
      }
    }
  }
  
  widget {
    group_definition {
      title       = "Azure Metrics"
      layout_type = "ordered"
      
      widget {
        timeseries_definition {
          title = "VM CPU Percentage"
          request {
            q = "avg:azure.vm.percentage_cpu{environment:${var.environment}} by {name}"
          }
        }
      }
    }
  }
  
  widget {
    group_definition {
      title       = "GCP Metrics"
      layout_type = "ordered"
      
      widget {
        timeseries_definition {
          title = "Compute Engine CPU Utilization"
          request {
            q = "avg:gcp.gce.instance.cpu.utilization{environment:${var.environment}} by {instance_name}"
          }
        }
      }
    }
  }
}

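# Multi-Cloud Alerts
# A metric alert query evaluates a single metric, and GCP reports CPU as a
# fraction (0-1) rather than a percentage, so create one monitor per cloud.
locals {
  cpu_monitors = {
    aws   = { metric = "aws.ec2.cpuutilization", critical = 80, warning = 70 }
    azure = { metric = "azure.vm.percentage_cpu", critical = 80, warning = 70 }
    gcp   = { metric = "gcp.gce.instance.cpu.utilization", critical = 0.8, warning = 0.7 }
  }
}

resource "datadog_monitor" "high_cpu" {
  for_each = local.cpu_monitors

  name    = "High CPU Usage - ${upper(each.key)}"
  type    = "metric alert"
  message = "CPU usage is high on ${each.key} @pagerduty"

  # The threshold in the query must match the critical threshold below
  query = "avg(last_5m):avg:${each.value.metric}{environment:${var.environment}} > ${each.value.critical}"

  monitor_thresholds {
    critical = each.value.critical
    warning  = each.value.warning
  }

  notify_no_data    = false
  renotify_interval = 60

  tags = ["multi-cloud", "cpu", "infrastructure", "cloud:${each.key}"]
}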

Cost Management Across Clouds

Unified Cost Tracking

# cost-management.tf
# AWS Cost Anomaly Detection
resource "aws_ce_anomaly_monitor" "service" {
  name              = "ServiceMonitor"
  monitor_type      = "DIMENSIONAL"
  monitor_dimension = "SERVICE"
}

resource "aws_ce_anomaly_subscription" "alerts" {
  name      = "CostAnomalyAlerts"
  frequency = "DAILY"
  
  monitor_arn_list = [
    aws_ce_anomaly_monitor.service.arn
  ]
  
  subscriber {
    type    = "EMAIL"
    address = var.cost_alert_email
  }
  
  threshold_expression {
    dimension {
      key           = "ANOMALY_TOTAL_IMPACT_ABSOLUTE"
      values        = ["100"]
      match_options = ["GREATER_THAN_OR_EQUAL"]
    }
  }
}

# Azure Cost Management Budget
resource "azurerm_consumption_budget_resource_group" "main" {
  name              = "budget-${var.environment}"
  resource_group_id = azurerm_resource_group.main.id
  
  amount     = 5000
  time_grain = "Monthly"
  
  time_period {
    start_date = "2026-01-01T00:00:00Z"
  }
  
  notification {
    enabled   = true
    threshold = 80
    operator  = "GreaterThan"
    
    contact_emails = [
      var.cost_alert_email
    ]
  }
  
  notification {
    enabled   = true
    threshold = 100
    operator  = "GreaterThan"
    
    contact_emails = [
      var.cost_alert_email
    ]
  }
}

# GCP Budget Alert
resource "google_billing_budget" "main" {
  billing_account = var.gcp_billing_account
  display_name    = "Budget-${var.environment}"
  
  budget_filter {
    projects = ["projects/${var.gcp_project_id}"]
    
    labels = {
      environment = var.environment
    }
  }
  
  amount {
    specified_amount {
      currency_code = "USD"
      units         = "5000"
    }
  }
  
  threshold_rules {
    threshold_percent = 0.8
  }
  
  threshold_rules {
    threshold_percent = 1.0
  }
  
  all_updates_rule {
    monitoring_notification_channels = [
      google_monitoring_notification_channel.email.id
    ]
  }
}
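
The budget above references google_monitoring_notification_channel.email, which is not defined in the snippet. A minimal sketch of that channel, assuming it should reuse the same alert address as the AWS and Azure budgets:

```hcl
resource "google_monitoring_notification_channel" "email" {
  display_name = "Cost alerts (${var.environment})"
  type         = "email"

  labels = {
    email_address = var.cost_alert_email
  }
}
```

With this in place, all three clouds send budget notifications to the address held in var.cost_alert_email.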

Best Practices for Multi-Cloud Terraform

1. State Management

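# Use separate state files per cloud provider.
# Backend blocks cannot reference variables, so leave the per-provider
# values out and pass them at init time (partial configuration), e.g.:
#   terraform init \
#     -backend-config="bucket=terraform-state-aws" \
#     -backend-config="key=production/aws/terraform.tfstate"
terraform {
  backend "s3" {
    region = "us-east-1"
  }
}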
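
Splitting state per provider means one configuration can no longer reference another's resources directly. One way to bridge that gap is the terraform_remote_state data source, sketched here from the point of view of the Azure configuration. The bucket name, state key, and the vpn_tunnel1_address output are assumptions for illustration.

```hcl
# Read the outputs exported by the AWS configuration's state
data "terraform_remote_state" "aws" {
  backend = "s3"

  config = {
    bucket = "terraform-state-aws"              # assumed bucket name
    key    = "production/aws/terraform.tfstate" # assumed state key
    region = "us-east-1"
  }
}

locals {
  # vpn_tunnel1_address is a hypothetical output of the AWS configuration
  aws_tunnel_address = data.terraform_remote_state.aws.outputs.vpn_tunnel1_address
}
```

Only values declared as root-level outputs in the AWS configuration are visible this way, which keeps the contract between the two states explicit.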

2. Consistent Tagging Strategy

# locals.tf
locals {
  common_tags = {
    Environment = var.environment
    ManagedBy   = "Terraform"
    Project     = var.project_name
    CostCenter  = var.cost_center
    Owner       = var.owner_email
  }
  
  # Cloud-specific tag formats
  aws_tags   = local.common_tags
  azure_tags = { for k, v in local.common_tags : lower(k) => v }
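  # GCP labels allow only lowercase letters, digits, underscores, and dashes,
  # so sanitize keys and values (e.g. the "@" and "." in an email address)
  gcp_labels = {
    for k, v in local.common_tags :
    replace(lower(k), "/[^a-z0-9_-]/", "_") => replace(lower(v), "/[^a-z0-9_-]/", "_")
  }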
}
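
To illustrate how these locals are consumed, here is a small sketch that applies each map to one resource per cloud. The resource names, bucket names, and the GCS location are placeholders, not part of the original configuration.

```hcl
resource "aws_s3_bucket" "logs" {
  bucket = "example-logs-${var.environment}" # placeholder name
  tags   = local.aws_tags
}

resource "azurerm_resource_group" "logs" {
  name     = "rg-logs-${var.environment}"
  location = var.azure_region
  tags     = local.azure_tags
}

resource "google_storage_bucket" "logs" {
  name     = "example-logs-${var.environment}" # placeholder name
  location = "US"
  labels   = local.gcp_labels # GCP calls them labels, not tags
}
```

On AWS, tags set on a resource are merged with the provider's default_tags, with the resource-level value winning on a key conflict.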

3. Module Abstraction

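# modules/compute/interface/main.tf
variable "cloud_provider" {
  type = string
  validation {
    condition     = contains(["aws", "azure", "gcp"], var.cloud_provider)
    error_message = "Cloud provider must be aws, azure, or gcp."
  }
}

# A module source must be a literal string and cannot interpolate variables.
# Declare one module per provider and enable only the selected one with count.
module "compute_aws" {
  source         = "../aws"
  count          = var.cloud_provider == "aws" ? 1 : 0
  instance_count = var.instance_count
  instance_type  = var.instance_type
  tags           = local.common_tags
}

module "compute_azure" {
  source         = "../azure"
  count          = var.cloud_provider == "azure" ? 1 : 0
  instance_count = var.instance_count
  instance_type  = var.instance_type
  tags           = local.common_tags
}

# module "compute_gcp" follows the same pattern with source = "../gcp".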
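
Instance type names differ per provider, so a provider-agnostic interface usually needs a translation layer. The sketch below maps a generic size to a provider-specific type. The size variable and the pairings are assumptions, and the listed types are only roughly comparable in CPU and memory.

```hcl
variable "size" {
  description = "Generic instance size (hypothetical interface input)"
  type        = string
  default     = "small"

  validation {
    condition     = contains(["small", "medium"], var.size)
    error_message = "Size must be small or medium."
  }
}

locals {
  # Approximate equivalents; adjust to your own sizing standards
  instance_sizes = {
    small = {
      aws   = "t3.small"
      azure = "Standard_B1ms"
      gcp   = "e2-small"
    }
    medium = {
      aws   = "t3.large"
      azure = "Standard_D2s_v3"
      gcp   = "e2-standard-2"
    }
  }

  # Resolved type for the selected provider, e.g. "e2-small" for gcp/small
  instance_type = local.instance_sizes[var.size][var.cloud_provider]
}
```

The resolved local.instance_type can then be passed to the compute module in place of a provider-specific variable.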

Key Takeaways

  1. Unified Tooling: Use Terraform for consistent infrastructure management
  2. Network Connectivity: Implement VPN or dedicated connections between clouds
  3. Centralized Monitoring: Use tools like Datadog for unified observability
  4. Cost Management: Track and optimize costs across all providers
  5. Security: Implement consistent security policies across clouds
  6. Disaster Recovery: Leverage multi-cloud for resilience
  7. Abstraction: Create provider-agnostic modules where possible

Conclusion

Multi-cloud infrastructure management with Terraform provides flexibility, resilience, and optimization opportunities. While it adds complexity, the benefits of avoiding vendor lock-in and leveraging best-of-breed services make it worthwhile for many organizations.

Start with a single cloud, master Terraform fundamentals, then gradually expand to multi-cloud as your needs grow.


Want to dive deeper? Check out my posts on Terraform best practices and cloud-specific implementations!
