# AWS EC2 Auto Scaling: Best Practices for High Availability and Cost Optimization
Auto Scaling is one of the most powerful features of AWS EC2, enabling your applications to automatically adjust capacity based on demand. In this comprehensive guide, I'll share battle-tested strategies for implementing Auto Scaling that balance performance, availability, and cost.
## Why Auto Scaling Matters

Auto Scaling provides critical benefits for modern cloud applications:

- **High Availability**: Automatically replace unhealthy instances
- **Cost Optimization**: Scale down during low-traffic periods
- **Performance**: Scale up to meet demand spikes
- **Fault Tolerance**: Distribute instances across multiple AZs
- **Predictable Costs**: Set minimum and maximum capacity limits
- **Zero Manual Intervention**: Fully automated scaling decisions
## Auto Scaling Components

### Auto Scaling Group (ASG)
```yaml
# CloudFormation template for the ASG
Resources:
  WebServerAutoScalingGroup:
    Type: AWS::AutoScaling::AutoScalingGroup
    # Wait for the cfn-signal calls sent from UserData (one per instance
    # in DesiredCapacity) before marking the resource complete
    CreationPolicy:
      ResourceSignal:
        Count: 3
        Timeout: PT15M
    Properties:
      AutoScalingGroupName: web-server-asg
      VPCZoneIdentifier:
        - !Ref PrivateSubnet1
        - !Ref PrivateSubnet2
        - !Ref PrivateSubnet3
      LaunchTemplate:
        LaunchTemplateId: !Ref WebServerLaunchTemplate
        Version: !GetAtt WebServerLaunchTemplate.LatestVersionNumber
      MinSize: 2
      MaxSize: 10
      DesiredCapacity: 3
      HealthCheckType: ELB
      HealthCheckGracePeriod: 300
      TargetGroupARNs:
        - !Ref WebServerTargetGroup
      Tags:
        - Key: Name
          Value: web-server
          PropagateAtLaunch: true
        - Key: Environment
          Value: production
          PropagateAtLaunch: true
```
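Whatever capacity a scaling policy later requests, the group never leaves the MinSize/MaxSize bounds above. A quick sketch of that clamping rule (an illustrative helper of mine, not an AWS API):

```python
def clamp_desired_capacity(desired: int, min_size: int, max_size: int) -> int:
    """Keep a requested desired capacity within the group's bounds,
    as EC2 Auto Scaling does for the MinSize=2 / MaxSize=10 group above."""
    if min_size > max_size:
        raise ValueError("MinSize must not exceed MaxSize")
    return max(min_size, min(desired, max_size))
```

A scale-out request for 14 instances against this group is capped at 10; a scale-in request for 0 is floored at 2.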
### Launch Template Configuration
```yaml
WebServerLaunchTemplate:
  Type: AWS::EC2::LaunchTemplate
  Properties:
    LaunchTemplateName: web-server-template
    LaunchTemplateData:
      ImageId: !Ref LatestAmiId
      InstanceType: t3.medium
      IamInstanceProfile:
        Arn: !GetAtt EC2InstanceProfile.Arn
      SecurityGroupIds:
        - !Ref WebServerSecurityGroup
      UserData:
        Fn::Base64: !Sub |
          #!/bin/bash
          set -e

          # Update system packages
          yum update -y

          # Install the CloudWatch agent
          wget https://s3.amazonaws.com/amazoncloudwatch-agent/amazon_linux/amd64/latest/amazon-cloudwatch-agent.rpm
          rpm -U ./amazon-cloudwatch-agent.rpm

          # Configure the CloudWatch agent. The quoted 'EOF' delimiter keeps
          # bash from expanding ${!aws:AutoScalingGroupName} (the ${!...}
          # form escapes it from Fn::Sub). append_dimensions tags each
          # metric with the ASG name, and aggregation_dimensions publishes
          # the ASG-wide rollup that scaling policies can track.
          cat > /opt/aws/amazon-cloudwatch-agent/etc/config.json <<'EOF'
          {
            "metrics": {
              "namespace": "WebServer",
              "append_dimensions": {
                "AutoScalingGroupName": "${!aws:AutoScalingGroupName}"
              },
              "aggregation_dimensions": [["AutoScalingGroupName"]],
              "metrics_collected": {
                "mem": {
                  "measurement": [
                    {"name": "mem_used_percent", "rename": "MemoryUtilization", "unit": "Percent"}
                  ],
                  "metrics_collection_interval": 60
                },
                "disk": {
                  "measurement": [
                    {"name": "used_percent", "rename": "DiskUtilization", "unit": "Percent"}
                  ],
                  "metrics_collection_interval": 60,
                  "resources": ["*"]
                }
              }
            }
          }
          EOF

          # Start the CloudWatch agent with that configuration
          /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl \
            -a fetch-config \
            -m ec2 \
            -s \
            -c file:/opt/aws/amazon-cloudwatch-agent/etc/config.json

          # Install the application
          aws s3 cp s3://${DeploymentBucket}/app-latest.tar.gz /tmp/
          mkdir -p /opt/app
          tar -xzf /tmp/app-latest.tar.gz -C /opt/app

          # Start the application
          systemctl enable app
          systemctl start app

          # Signal success to CloudFormation (with set -e, reaching this
          # line means every step above succeeded)
          /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} \
            --resource WebServerAutoScalingGroup --region ${AWS::Region}
      MetadataOptions:
        # Require IMDSv2 and keep the token hop limit at 1
        HttpTokens: required
        HttpPutResponseHopLimit: 1
      Monitoring:
        Enabled: true
      TagSpecifications:
        - ResourceType: instance
          Tags:
            - Key: Name
              Value: web-server
        - ResourceType: volume
          Tags:
            - Key: Name
              Value: web-server-volume
```
## Scaling Policies

### Target Tracking Scaling
```yaml
# CPU-based target tracking
CPUTargetTrackingPolicy:
  Type: AWS::AutoScaling::ScalingPolicy
  Properties:
    AutoScalingGroupName: !Ref WebServerAutoScalingGroup
    PolicyType: TargetTrackingScaling
    # EC2 Auto Scaling target tracking has no per-policy cooldowns;
    # EstimatedInstanceWarmup controls how long a new instance's metrics
    # are excluded from the aggregate
    EstimatedInstanceWarmup: 300
    TargetTrackingConfiguration:
      PredefinedMetricSpecification:
        PredefinedMetricType: ASGAverageCPUUtilization
      TargetValue: 70.0

# ALB request count per target
RequestCountTargetTrackingPolicy:
  Type: AWS::AutoScaling::ScalingPolicy
  Properties:
    AutoScalingGroupName: !Ref WebServerAutoScalingGroup
    PolicyType: TargetTrackingScaling
    TargetTrackingConfiguration:
      PredefinedMetricSpecification:
        PredefinedMetricType: ALBRequestCountPerTarget
        ResourceLabel: !Join
          - '/'
          - - !GetAtt ApplicationLoadBalancer.LoadBalancerFullName
            - !GetAtt WebServerTargetGroup.TargetGroupFullName
      TargetValue: 1000.0

# Custom metric - memory utilization (published by the CloudWatch agent)
MemoryTargetTrackingPolicy:
  Type: AWS::AutoScaling::ScalingPolicy
  Properties:
    AutoScalingGroupName: !Ref WebServerAutoScalingGroup
    PolicyType: TargetTrackingScaling
    TargetTrackingConfiguration:
      CustomizedMetricSpecification:
        MetricName: MemoryUtilization
        Namespace: WebServer
        Statistic: Average
        # Track the ASG-wide aggregate rather than per-host series
        Dimensions:
          - Name: AutoScalingGroupName
            Value: web-server-asg
      TargetValue: 75.0
```
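Under the hood, target tracking behaves like proportional control: capacity moves toward `current × (metric / target)`. A rough sketch of that math (the real service adds alarm evaluation, warm-up handling, and scale-in damping; helper names here are mine):

```python
import math

def target_tracking_capacity(current, metric_value, target_value,
                             min_size=2, max_size=10):
    """Approximate the capacity a target-tracking policy converges toward:
    scale proportionally to metric/target, round up, clamp to ASG bounds."""
    if current <= 0 or target_value <= 0:
        raise ValueError("capacity and target must be positive")
    raw = current * (metric_value / target_value)
    return max(min_size, min(math.ceil(raw), max_size))
```

For example, 4 instances averaging 87.5% CPU against the 70% target move toward ceil(4 × 1.25) = 5 instances; the same group at 35% CPU shrinks toward 2.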
### Step Scaling Policy
```yaml
ScaleUpPolicy:
  Type: AWS::AutoScaling::ScalingPolicy
  Properties:
    AutoScalingGroupName: !Ref WebServerAutoScalingGroup
    PolicyType: StepScaling
    AdjustmentType: PercentChangeInCapacity
    # Ensure percentage adjustments add at least one instance on small groups
    MinAdjustmentMagnitude: 1
    MetricAggregationType: Average
    EstimatedInstanceWarmup: 300
    StepAdjustments:
      # Bounds are offsets from the alarm threshold (80% CPU)
      - MetricIntervalLowerBound: 0
        MetricIntervalUpperBound: 10
        ScalingAdjustment: 10
      - MetricIntervalLowerBound: 10
        MetricIntervalUpperBound: 20
        ScalingAdjustment: 20
      - MetricIntervalLowerBound: 20
        ScalingAdjustment: 30

HighCPUAlarm:
  Type: AWS::CloudWatch::Alarm
  Properties:
    AlarmName: web-server-high-cpu
    AlarmDescription: Trigger scale up when CPU is high
    MetricName: CPUUtilization
    Namespace: AWS/EC2
    Statistic: Average
    Period: 60
    EvaluationPeriods: 2
    Threshold: 80
    ComparisonOperator: GreaterThanThreshold
    Dimensions:
      - Name: AutoScalingGroupName
        Value: !Ref WebServerAutoScalingGroup
    AlarmActions:
      - !Ref ScaleUpPolicy
```
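The step boundaries are offsets from the alarm threshold (80% CPU), so the three steps cover CPU of 80-90%, 90-100%, and beyond. A small resolver showing how a given CPU reading maps to a percent adjustment (illustrative helper, not an AWS API):

```python
def step_adjustment_percent(cpu_percent, threshold=80.0):
    """Map average CPU to the percent capacity adjustment defined by the
    StepAdjustments above (bounds are offsets from the 80% threshold)."""
    breach = cpu_percent - threshold
    if breach < 0:
        return 0    # alarm not in breach; no scale-out
    if breach < 10:
        return 10   # CPU in [80, 90) -> +10% capacity
    if breach < 20:
        return 20   # CPU in [90, 100) -> +20% capacity
    return 30       # offset of 20 or more -> +30% capacity
```

So a sustained 95% CPU average grows the group by 20%, subject to MinAdjustmentMagnitude and MaxSize.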
### Scheduled Scaling
```yaml
# Scale up for business hours. Recurrence expressions are evaluated in UTC
# unless the optional TimeZone property is set.
ScaleUpScheduledAction:
  Type: AWS::AutoScaling::ScheduledAction
  Properties:
    AutoScalingGroupName: !Ref WebServerAutoScalingGroup
    MinSize: 5
    MaxSize: 15
    DesiredCapacity: 8
    Recurrence: "0 8 * * MON-FRI"  # 8 AM weekdays

# Scale down for off-hours
ScaleDownScheduledAction:
  Type: AWS::AutoScaling::ScheduledAction
  Properties:
    AutoScalingGroupName: !Ref WebServerAutoScalingGroup
    MinSize: 2
    MaxSize: 5
    DesiredCapacity: 2
    Recurrence: "0 18 * * MON-FRI"  # 6 PM weekdays

# Weekend minimal capacity
WeekendScheduledAction:
  Type: AWS::AutoScaling::ScheduledAction
  Properties:
    AutoScalingGroupName: !Ref WebServerAutoScalingGroup
    MinSize: 1
    MaxSize: 3
    DesiredCapacity: 1
    Recurrence: "0 0 * * SAT"  # Saturday midnight
```
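Taken together, the three actions trace a weekly capacity profile. The sketch below approximates it as a pure function of UTC time (a simplification: real scheduled actions are event-driven, so for example early Monday morning still runs at weekend capacity until the 8 AM action fires; helper names are mine):

```python
from datetime import datetime

def approximate_capacity(now: datetime):
    """Return (MinSize, MaxSize, DesiredCapacity) roughly matching the
    scheduled actions above, treating `now` as UTC."""
    if now.weekday() >= 5:            # Saturday / Sunday
        return (1, 3, 1)
    if 8 <= now.hour < 18:            # weekday business hours
        return (5, 15, 8)
    return (2, 5, 2)                  # weekday off-hours
```

Plotting this function for a week is a handy sanity check that the cron expressions produce the capacity curve you intended.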
## Advanced Health Checks

### Custom Health Check Script
```bash
#!/bin/bash
# /usr/local/bin/health-check.sh

# Check application health
check_app_health() {
    local response
    # Capture output and exit status separately; 'local var=$(...)' would
    # mask curl's exit code with the status of 'local' itself
    if ! response=$(curl -sf http://localhost:8080/health); then
        echo "Application health check failed"
        return 1
    fi
    # Verify the response contains the expected data
    if ! echo "$response" | grep -q '"status":"healthy"'; then
        echo "Application not healthy"
        return 1
    fi
    return 0
}

# Check database connectivity
check_database() {
    if ! timeout 5 nc -z db.example.com 5432; then
        echo "Database connection failed"
        return 1
    fi
    return 0
}

# Check disk space
check_disk_space() {
    local usage
    usage=$(df -h / | awk 'NR==2 {print $5}' | sed 's/%//')
    if [ "$usage" -gt 90 ]; then
        echo "Disk usage critical: ${usage}%"
        return 1
    fi
    return 0
}

# Check memory
check_memory() {
    local mem_usage
    mem_usage=$(free | awk '/Mem/ {printf "%d", ($3/$2) * 100}')
    if [ "$mem_usage" -gt 95 ]; then
        echo "Memory usage critical: ${mem_usage}%"
        return 1
    fi
    return 0
}

# Run all checks
main() {
    check_app_health || exit 1
    check_database || exit 1
    check_disk_space || exit 1
    check_memory || exit 1
    echo "All health checks passed"
    exit 0
}

main
```
### ELB Health Check Configuration
```yaml
WebServerTargetGroup:
  Type: AWS::ElasticLoadBalancingV2::TargetGroup
  Properties:
    Name: web-server-tg
    Port: 8080
    Protocol: HTTP
    VpcId: !Ref VPC
    HealthCheckEnabled: true
    HealthCheckProtocol: HTTP
    HealthCheckPath: /health
    HealthCheckIntervalSeconds: 30
    HealthCheckTimeoutSeconds: 5
    HealthyThresholdCount: 2
    UnhealthyThresholdCount: 3
    Matcher:
      HttpCode: "200"
    TargetGroupAttributes:
      - Key: deregistration_delay.timeout_seconds
        Value: "30"
      - Key: stickiness.enabled
        Value: "true"
      - Key: stickiness.type
        Value: lb_cookie
      - Key: stickiness.lb_cookie.duration_seconds
        Value: "86400"
```
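With this configuration, it takes UnhealthyThresholdCount consecutive failures at HealthCheckIntervalSeconds spacing before a target is marked unhealthy, and draining adds the deregistration delay on the way out. Quick arithmetic (illustrative helper):

```python
def failover_window_seconds(interval=30, unhealthy_threshold=3,
                            deregistration_delay=30):
    """Approximate seconds from the first failed check until the target is
    marked unhealthy, and until in-flight requests finish draining."""
    time_to_unhealthy = interval * unhealthy_threshold
    return time_to_unhealthy, time_to_unhealthy + deregistration_delay
```

The values above give roughly 90 seconds to mark a target unhealthy and about 120 seconds including draining; tighten the interval and threshold if that detection window is too slow for your SLOs.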
## Lifecycle Hooks

### Instance Launch Hook
```python
# lambda_launch_hook.py
import boto3
import json

autoscaling = boto3.client('autoscaling')
ec2 = boto3.client('ec2')


def lambda_handler(event, context):
    """Process an Auto Scaling launch lifecycle hook delivered via SNS."""
    message = json.loads(event['Records'][0]['Sns']['Message'])
    instance_id = message['EC2InstanceId']
    lifecycle_hook_name = message['LifecycleHookName']
    auto_scaling_group_name = message['AutoScalingGroupName']
    lifecycle_action_token = message['LifecycleActionToken']

    try:
        # Wait for the instance to be running (the default waiter polls for
        # up to 10 minutes, so keep the Lambda timeout above that or pass a
        # tighter WaiterConfig)
        waiter = ec2.get_waiter('instance_running')
        waiter.wait(InstanceIds=[instance_id])

        # Perform custom initialization
        initialize_instance(instance_id)

        # Register with the monitoring system
        register_monitoring(instance_id)

        # Complete the lifecycle action
        autoscaling.complete_lifecycle_action(
            LifecycleHookName=lifecycle_hook_name,
            AutoScalingGroupName=auto_scaling_group_name,
            LifecycleActionToken=lifecycle_action_token,
            LifecycleActionResult='CONTINUE',
            InstanceId=instance_id
        )
        return {'statusCode': 200, 'body': 'Launch hook completed'}
    except Exception as e:
        print(f"Error: {e}")
        # Abandon the lifecycle action on error so the instance is replaced
        autoscaling.complete_lifecycle_action(
            LifecycleHookName=lifecycle_hook_name,
            AutoScalingGroupName=auto_scaling_group_name,
            LifecycleActionToken=lifecycle_action_token,
            LifecycleActionResult='ABANDON',
            InstanceId=instance_id
        )
        raise


def initialize_instance(instance_id):
    """Custom instance initialization."""
    # Add to configuration management
    # Update DNS records
    # Configure monitoring
    pass


def register_monitoring(instance_id):
    """Register the instance with the monitoring system."""
    # Add to Datadog, New Relic, etc.
    pass
```
### Lifecycle Hook Configuration
```yaml
LaunchLifecycleHook:
  Type: AWS::AutoScaling::LifecycleHook
  Properties:
    AutoScalingGroupName: !Ref WebServerAutoScalingGroup
    LifecycleTransition: autoscaling:EC2_INSTANCE_LAUNCHING
    # ABANDON terminates the instance if initialization never completes
    DefaultResult: ABANDON
    HeartbeatTimeout: 300
    NotificationTargetARN: !Ref LifecycleHookTopic
    # A NotificationTargetARN requires a role Auto Scaling can assume to
    # publish to the topic (LifecycleHookRole assumed defined elsewhere)
    RoleARN: !GetAtt LifecycleHookRole.Arn

TerminateLifecycleHook:
  Type: AWS::AutoScaling::LifecycleHook
  Properties:
    AutoScalingGroupName: !Ref WebServerAutoScalingGroup
    LifecycleTransition: autoscaling:EC2_INSTANCE_TERMINATING
    DefaultResult: CONTINUE
    HeartbeatTimeout: 180
    NotificationTargetARN: !Ref LifecycleHookTopic
    RoleARN: !GetAtt LifecycleHookRole.Arn
```
## Cost Optimization Strategies

### Mixed Instance Types
```yaml
MixedInstancesAutoScalingGroup:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    MixedInstancesPolicy:
      InstancesDistribution:
        OnDemandAllocationStrategy: prioritized
        OnDemandBaseCapacity: 2
        OnDemandPercentageAboveBaseCapacity: 30
        # SpotInstancePools only applies to the lowest-price strategy;
        # capacity-optimized selects pools with the deepest capacity itself
        SpotAllocationStrategy: capacity-optimized
      LaunchTemplate:
        LaunchTemplateSpecification:
          LaunchTemplateId: !Ref WebServerLaunchTemplate
          Version: !GetAtt WebServerLaunchTemplate.LatestVersionNumber
        Overrides:
          - InstanceType: t3.medium
            WeightedCapacity: 2
          - InstanceType: t3a.medium
            WeightedCapacity: 2
          - InstanceType: t2.medium
            WeightedCapacity: 2
          - InstanceType: m5.large
            WeightedCapacity: 4
          - InstanceType: m5a.large
            WeightedCapacity: 4
    MinSize: 2
    MaxSize: 20
    DesiredCapacity: 4
```
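The InstancesDistribution splits capacity in two stages: OnDemandBaseCapacity units run On-Demand first, then OnDemandPercentageAboveBaseCapacity percent of everything above the base runs On-Demand, with the remainder on Spot. A sketch of that split, ignoring WeightedCapacity (the rounding here is illustrative; AWS's exact rounding may differ):

```python
import math

def on_demand_spot_split(desired, base=2, pct_above_base=30):
    """Split desired capacity per the InstancesDistribution above: `base`
    units On-Demand, then `pct_above_base`% of the rest On-Demand
    (rounded up here), remainder on Spot."""
    above_base = max(0, desired - base)
    od_above = math.ceil(above_base * pct_above_base / 100)
    on_demand = min(desired, base) + od_above
    return on_demand, desired - on_demand
```

At the DesiredCapacity of 4 above this yields roughly 3 On-Demand and 1 Spot; scaled out to 12, roughly 5 On-Demand and 7 Spot, so the Spot share (and the savings) grows with the group.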
### Capacity Rebalancing
```yaml
AutoScalingGroupWithRebalancing:
  Type: AWS::AutoScaling::AutoScalingGroup
  Properties:
    AutoScalingGroupName: web-server-spot-asg
    # Proactively launch a replacement when AWS signals elevated
    # interruption risk for a Spot instance
    CapacityRebalance: true
    MixedInstancesPolicy:
      InstancesDistribution:
        SpotAllocationStrategy: capacity-optimized
        OnDemandPercentageAboveBaseCapacity: 0
      LaunchTemplate:
        LaunchTemplateSpecification:
          LaunchTemplateId: !Ref WebServerLaunchTemplate
          Version: !GetAtt WebServerLaunchTemplate.LatestVersionNumber
    # MinSize, MaxSize, subnets, and other required properties omitted
    # for brevity
```
## Monitoring and Alerts

### CloudWatch Dashboard
```python
# create_dashboard.py
import boto3
import json

cloudwatch = boto3.client('cloudwatch')

# Note: the AWS/AutoScaling group metrics only exist once metrics
# collection is enabled on the ASG, and every series needs the
# AutoScalingGroupName dimension to resolve ("." repeats the previous
# value in dashboard metric syntax)
dashboard_body = {
    "widgets": [
        {
            "type": "metric",
            "properties": {
                "metrics": [
                    ["AWS/AutoScaling", "GroupDesiredCapacity",
                     "AutoScalingGroupName", "web-server-asg", {"stat": "Average"}],
                    [".", "GroupInServiceInstances", ".", ".", {"stat": "Average"}],
                    [".", "GroupMinSize", ".", ".", {"stat": "Average"}],
                    [".", "GroupMaxSize", ".", ".", {"stat": "Average"}]
                ],
                "period": 300,
                "stat": "Average",
                "region": "us-east-1",
                "title": "Auto Scaling Group Metrics",
                "yAxis": {"left": {"min": 0}}
            }
        },
        {
            "type": "metric",
            "properties": {
                "metrics": [
                    ["AWS/EC2", "CPUUtilization",
                     "AutoScalingGroupName", "web-server-asg", {"stat": "Average"}],
                    ["WebServer", "MemoryUtilization",
                     "AutoScalingGroupName", "web-server-asg", {"stat": "Average"}]
                ],
                "period": 60,
                "stat": "Average",
                "region": "us-east-1",
                "title": "Instance Metrics"
            }
        }
    ]
}

response = cloudwatch.put_dashboard(
    DashboardName='AutoScaling-WebServer',
    DashboardBody=json.dumps(dashboard_body)
)
```
## Key Takeaways

- **Use Target Tracking**: Simplest and most effective scaling policy for most workloads
- **Multi-AZ Deployment**: Always distribute instances across Availability Zones
- **Health Checks**: Combine EC2 and ELB health checks for comprehensive monitoring
- **Lifecycle Hooks**: Use for custom initialization and graceful shutdown
- **Mixed Instances**: Combine On-Demand and Spot for cost optimization
- **Scheduled Scaling**: Predictable traffic patterns benefit from scheduled actions
- **Proper Warm-ups and Cooldowns**: Prevent scaling thrash with appropriate warm-up and cooldown periods
## Conclusion
AWS Auto Scaling is essential for building resilient, cost-effective infrastructure. By implementing these best practices, you'll achieve high availability while optimizing costs through intelligent capacity management.
Start with simple target tracking policies, monitor your metrics, and gradually add complexity as needed. Remember: the best scaling strategy is one that matches your application's specific traffic patterns and requirements.
Need help with AWS infrastructure? Check out my posts on Terraform for infrastructure as code and Kubernetes for container orchestration!