Auto-scaling Configuration - Complete Guide
Published: September 25, 2024 | Reading time: 25 minutes
Auto-scaling Overview
Auto-scaling automatically adjusts resources based on demand:
Scaling Types
# Scaling Types
- Horizontal scaling (scale out/in)
- Vertical scaling (scale up/down)
- Predictive scaling
- Scheduled scaling
- Manual scaling
- Target tracking scaling
AWS Auto Scaling
EC2 Auto Scaling Groups
AWS Auto Scaling Setup
# Create Launch Template
aws ec2 create-launch-template \
--launch-template-name web-server-template \
--launch-template-data '{
"ImageId": "ami-0abcdef1234567890",
"InstanceType": "t3.micro",
"KeyName": "my-key",
"SecurityGroupIds": ["sg-12345678"],
"UserData": "'$(base64 -w 0 user-data.sh)'",
"TagSpecifications": [{
"ResourceType": "instance",
"Tags": [{"Key": "Name", "Value": "web-server"}]
}]
}'
# Create Auto Scaling Group
aws autoscaling create-auto-scaling-group \
--auto-scaling-group-name web-asg \
--launch-template LaunchTemplateName=web-server-template,Version=1 \
--min-size 1 \
--max-size 10 \
--desired-capacity 2 \
--vpc-zone-identifier "subnet-12345678,subnet-87654321" \
--health-check-type EC2 \
--health-check-grace-period 300
# Create Scaling Policy
aws autoscaling put-scaling-policy \
--auto-scaling-group-name web-asg \
--policy-name scale-up-policy \
--policy-type TargetTrackingScaling \
--target-tracking-configuration '{
"TargetValue": 70.0,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "ASGAverageCPUUtilization"
}
}'
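The target-tracking policy above works like a thermostat: the autoscaler estimates the capacity that would bring average CPU back to the 70% target. A minimal sketch of that proportional estimate (the function name is illustrative, not an AWS API):

```javascript
// Proportional target-tracking estimate: how many instances would
// bring the average metric back to the target value.
function desiredCapacity(currentCapacity, currentMetric, targetMetric) {
  return Math.ceil(currentCapacity * (currentMetric / targetMetric));
}

// 4 instances at 85% average CPU against a 70% target -> scale out to 5.
console.log(desiredCapacity(4, 85, 70)); // 5
// 4 instances at 35% average CPU -> scale in to 2.
console.log(desiredCapacity(4, 35, 70)); // 2
```

The ceiling keeps the estimate conservative: scaling in rounds up, so the group never drops below the capacity the target requires.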
# Create CloudWatch Alarms
aws cloudwatch put-metric-alarm \
--alarm-name "High CPU Utilization" \
--alarm-description "Alarm when CPU exceeds 80%" \
--metric-name CPUUtilization \
--namespace AWS/EC2 \
--statistic Average \
--period 300 \
--threshold 80 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 2 \
--alarm-actions arn:aws:autoscaling:region:account:scalingPolicy:policy-id:autoScalingGroupName/web-asg:policyName/scale-up-policy
Advanced AWS Scaling
Advanced Scaling Configuration
# Predictive Scaling
aws autoscaling put-scaling-policy \
--auto-scaling-group-name web-asg \
--policy-name predictive-scaling-policy \
--policy-type PredictiveScaling \
--predictive-scaling-configuration '{
"MetricSpecifications": [{
"TargetValue": 70.0,
"PredefinedMetricPairSpecification": {
"PredefinedMetricType": "ASGCPUUtilization"
}
}],
"Mode": "ForecastAndScale",
"SchedulingBufferTime": 10
}'
# Scheduled Scaling
aws autoscaling put-scheduled-update-group-action \
--auto-scaling-group-name web-asg \
--scheduled-action-name scale-up-morning \
--start-time "2024-01-01T08:00:00Z" \
--recurrence "0 8 * * MON-FRI" \
--min-size 3 \
--max-size 15 \
--desired-capacity 5
# Mixed Instance Types
aws autoscaling create-auto-scaling-group \
--auto-scaling-group-name web-asg-mixed \
--mixed-instances-policy '{
"LaunchTemplate": {
"LaunchTemplateSpecification": {
"LaunchTemplateName": "web-server-template",
"Version": "1"
},
"Overrides": [
{"InstanceType": "t3.micro"},
{"InstanceType": "t3.small"},
{"InstanceType": "t3.medium"}
]
},
"InstancesDistribution": {
"OnDemandBaseCapacity": 1,
"OnDemandPercentageAboveBaseCapacity": 50,
"SpotAllocationStrategy": "diversified"
}
}' \
--vpc-zone-identifier "subnet-12345678,subnet-87654321" \
--min-size 1 \
--max-size 10 \
--desired-capacity 2
# Lifecycle Hooks
aws autoscaling put-lifecycle-hook \
--auto-scaling-group-name web-asg \
--lifecycle-hook-name scale-up-hook \
--lifecycle-transition autoscaling:EC2_INSTANCE_LAUNCHING \
--default-result CONTINUE \
--heartbeat-timeout 300
Kubernetes Auto-scaling
Horizontal Pod Autoscaler
Kubernetes HPA Setup
# HPA YAML Configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60
      - type: Percent
        value: 100
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60
# Apply HPA
kubectl apply -f hpa.yaml
# Check HPA Status
kubectl get hpa
kubectl describe hpa web-hpa
# Custom Metrics HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa-custom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
Vertical Pod Autoscaler
Kubernetes VPA Setup
# Install VPA
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler/
./hack/vpa-up.sh
# VPA Configuration
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  updatePolicy:
    updateMode: "Auto" # Auto, Initial, Off
  resourcePolicy:
    containerPolicies:
    - containerName: web
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 1000m
        memory: 1Gi
      controlledResources: ["cpu", "memory"]
# Apply VPA
kubectl apply -f vpa.yaml
# Check VPA Status
kubectl get vpa
kubectl describe vpa web-vpa
# VPA Modes:
# - Off: Only provides recommendations
# - Initial: Sets resources on pod creation only
# - Auto: Evicts and recreates pods to apply new recommendations
Docker Swarm Scaling
Docker Swarm Auto-scaling
Docker Swarm Scaling
# Initialize Docker Swarm
docker swarm init
# Create Service with Scaling
docker service create \
--name web-service \
--replicas 3 \
--publish 80:80 \
--constraint 'node.role==worker' \
--update-parallelism 1 \
--update-delay 10s \
--update-failure-action rollback \
nginx:latest
# Scale Service
docker service scale web-service=5
# Update Service
docker service update \
--image nginx:1.21 \
--update-parallelism 2 \
--update-delay 5s \
web-service
# Rolling Update
docker service update \
--image nginx:1.22 \
--update-parallelism 1 \
--update-delay 30s \
--update-failure-action rollback \
web-service
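With --update-parallelism 1 and --update-delay 30s, a 5-replica service is replaced in 5 batches with a 30-second pause between them, so the rollout takes at least two minutes before container start-up time is counted. A rough back-of-the-envelope sketch:

```javascript
// Lower bound on rolling-update duration: tasks are replaced in
// batches of --update-parallelism, with --update-delay between batches.
// Container start-up and health-check time are excluded.
function rollingUpdateSeconds(replicas, parallelism, delaySeconds) {
  const batches = Math.ceil(replicas / parallelism);
  return (batches - 1) * delaySeconds;
}

console.log(rollingUpdateSeconds(5, 1, 30)); // 120
console.log(rollingUpdateSeconds(6, 2, 5));  // 10
```

This trade-off is the point of rolling updates: a longer rollout in exchange for never taking more than `parallelism` tasks out of service at once.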
# Service Constraints
docker service create \
--name web-service \
--constraint 'node.labels.zone==us-east-1' \
--constraint 'node.role==worker' \
--replicas 3 \
nginx:latest
# Service Placement
docker service create \
--name web-service \
--placement-pref 'spread=node.labels.zone' \
--replicas 6 \
nginx:latest
# Health Check
# (the official nginx image has no /health endpoint and does not ship
# curl; the probe below assumes an image that provides both)
docker service create \
--name web-service \
--health-cmd "curl -f http://localhost/ || exit 1" \
--health-interval 30s \
--health-timeout 10s \
--health-retries 3 \
--replicas 3 \
nginx:latest
Google Cloud Auto-scaling
GCP Instance Groups
GCP Auto-scaling Setup
# Create Instance Template
gcloud compute instance-templates create web-template \
--machine-type=e2-micro \
--image-family=ubuntu-2004-lts \
--image-project=ubuntu-os-cloud \
--boot-disk-size=10GB \
--boot-disk-type=pd-standard \
--tags=web-server
# Create Managed Instance Group
gcloud compute instance-groups managed create web-instance-group \
--base-instance-name web-instance \
--size 3 \
--template web-template \
--zone us-central1-a
# Configure Auto-scaling
gcloud compute instance-groups managed set-autoscaling web-instance-group \
--zone us-central1-a \
--max-num-replicas 10 \
--min-num-replicas 1 \
--target-cpu-utilization 0.7 \
--cool-down-period 60s
# Create Health Check
gcloud compute health-checks create http web-health-check \
--port 3000 \
--request-path /health \
--check-interval 30s \
--timeout 5s \
--healthy-threshold 2 \
--unhealthy-threshold 3
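The healthy/unhealthy thresholds mean an instance only changes state after a run of consecutive probe results, which filters out one-off failures. A sketch of that consecutive-probe logic (illustrative, not GCP code):

```javascript
// Consecutive-probe health state, mirroring healthy-threshold 2 /
// unhealthy-threshold 3 above: state flips only after enough
// consecutive probe results in the opposite direction.
function healthTracker(healthyThreshold, unhealthyThreshold) {
  let healthy = true;
  let streak = 0;
  return function probe(ok) {
    // A result matching the current state resets the opposing streak.
    if (ok === healthy) { streak = 0; return healthy; }
    streak += 1;
    const needed = healthy ? unhealthyThreshold : healthyThreshold;
    if (streak >= needed) { healthy = !healthy; streak = 0; }
    return healthy;
  };
}

const probe = healthTracker(2, 3);
probe(false); probe(false);  // two failures: still healthy
console.log(probe(false));   // third consecutive failure -> false
probe(true);
console.log(probe(true));    // two consecutive successes -> true
```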
# Apply Health Check to Instance Group (autohealing)
gcloud compute instance-groups managed update web-instance-group \
--zone us-central1-a \
--health-check web-health-check \
--initial-delay 300s
# Custom Metrics Auto-scaling
gcloud compute instance-groups managed set-autoscaling web-instance-group \
--zone us-central1-a \
--custom-metric-utilization metric=custom.googleapis.com/http_requests_per_second,utilization-target=100,utilization-target-type=GAUGE \
--max-num-replicas 20 \
--min-num-replicas 2
# Scheduled Scaling (a scaling schedule sets a minimum required
# capacity; the autoscaler's max-num-replicas still applies)
gcloud compute instance-groups managed set-autoscaling web-instance-group \
--zone us-central1-a \
--set-schedule scale-up-morning \
--schedule-cron "0 8 * * MON-FRI" \
--schedule-duration-sec 36000 \
--schedule-min-required-replicas 5
Azure Auto-scaling
Azure Virtual Machine Scale Sets
Azure Auto-scaling Configuration
# Create Resource Group
az group create --name myResourceGroup --location eastus
# Create Virtual Machine Scale Set
az vmss create \
--resource-group myResourceGroup \
--name myScaleSet \
--image Ubuntu2204 \
--upgrade-policy-mode automatic \
--instance-count 3 \
--admin-username azureuser \
--generate-ssh-keys
# Configure Auto-scaling Rules
az monitor autoscale create \
--resource-group myResourceGroup \
--resource myScaleSet \
--resource-type Microsoft.Compute/virtualMachineScaleSets \
--name myAutoscaleSetting \
--min-count 1 \
--max-count 10 \
--count 3
# Add Scale-out Rule
az monitor autoscale rule create \
--resource-group myResourceGroup \
--autoscale-name myAutoscaleSetting \
--condition "Percentage CPU > 70 avg 5m" \
--scale out 1
# Add Scale-in Rule
az monitor autoscale rule create \
--resource-group myResourceGroup \
--autoscale-name myAutoscaleSetting \
--condition "Percentage CPU < 30 avg 5m" \
--scale in 1
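Note the gap between the scale-out (70%) and scale-in (30%) thresholds: it creates a dead band so a scale set hovering around a single value does not scale out and back in repeatedly. Sketched as a decision function:

```javascript
// Two-threshold scaling decision with a dead band between the
// scale-out and scale-in thresholds to prevent flapping.
function scaleDecision(avgCpu, outThreshold = 70, inThreshold = 30) {
  if (avgCpu > outThreshold) return 'out';
  if (avgCpu < inThreshold) return 'in';
  return 'none';
}

console.log(scaleDecision(85)); // 'out'
console.log(scaleDecision(50)); // 'none' (inside the dead band)
console.log(scaleDecision(20)); // 'in'
```

If both rules used the same threshold, adding an instance would drop average CPU just below it and immediately trigger a scale-in; the 40-point band absorbs that oscillation.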
# Create Health Probe
az network lb probe create \
--resource-group myResourceGroup \
--lb-name myLoadBalancer \
--name myHealthProbe \
--protocol tcp \
--port 80 \
--interval 15 \
--threshold 4
# Scheduled Scaling (recurring weekday profile; recurring profiles
# take start/end times of day rather than a fixed date)
az monitor autoscale profile create \
--resource-group myResourceGroup \
--autoscale-name myAutoscaleSetting \
--name "Weekday Profile" \
--recurrence week mon tue wed thu fri \
--start 08:00 \
--end 18:00 \
--timezone "Pacific Standard Time" \
--min-count 5 \
--max-count 15 \
--count 8
Monitoring and Alerting
Scaling Metrics
Scaling Monitoring Setup
# Prometheus Auto-scaling Metrics
# Custom metrics for scaling decisions
const promClient = require('prom-client');

// Create custom metrics
const httpRequestsTotal = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code']
});

const activeConnections = new promClient.Gauge({
  name: 'active_connections',
  help: 'Number of active connections'
});

const responseTime = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  buckets: [0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7, 10]
});
// Scaling decision logic
// (getCPUUsage, getMemoryUsage and getRequestRate are app-specific
// helpers, not shown here)
function shouldScale() {
  const cpuUsage = getCPUUsage();
  const memoryUsage = getMemoryUsage();
  const requestRate = getRequestRate();
  if (cpuUsage > 80 || memoryUsage > 85 || requestRate > 1000) {
    return 'scale_up';
  } else if (cpuUsage < 30 && memoryUsage < 40 && requestRate < 100) {
    return 'scale_down';
  }
  return 'no_change';
}
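As written, shouldScale() can flap when metrics hover near a threshold. One common mitigation is a cooldown: after any scaling action, suppress further actions until a quiet period has elapsed. An illustrative wrapper (not part of any SDK; the one-minute cooldown is an assumption):

```javascript
// Wrap a decision function with a cooldown: once an action fires,
// further actions are suppressed until cooldownMs has passed.
function withCooldown(decide, cooldownMs) {
  let lastActionAt = -Infinity;
  return function (nowMs) {
    const decision = decide();
    if (decision === 'no_change') return decision;
    if (nowMs - lastActionAt < cooldownMs) return 'no_change';
    lastActionAt = nowMs;
    return decision;
  };
}

const decide = withCooldown(() => 'scale_up', 60000);
console.log(decide(0));     // 'scale_up'
console.log(decide(30000)); // 'no_change' (still cooling down)
console.log(decide(61000)); // 'scale_up'
```

This mirrors the cooldown and grace-period settings the managed autoscalers above expose (e.g. --health-check-grace-period, --cool-down-period).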
// CloudWatch Custom Metrics (AWS)
const AWS = require('aws-sdk');
const cloudwatch = new AWS.CloudWatch();

// Send custom metrics for scaling decisions
const params = {
  Namespace: 'Custom/Application',
  MetricData: [
    {
      MetricName: 'RequestRate',
      Value: requestRate,
      Unit: 'Count/Second',
      Timestamp: new Date()
    },
    {
      MetricName: 'ActiveConnections',
      Value: activeConnections,
      Unit: 'Count',
      Timestamp: new Date()
    }
  ]
};

cloudwatch.putMetricData(params, (err, data) => {
  if (err) console.log(err, err.stack);
  else console.log('Metric sent successfully');
});
# Kubernetes Custom Metrics
apiVersion: v1
kind: Service
metadata:
  name: custom-metrics-apiserver
  namespace: kube-system
spec:
  ports:
  - port: 443
    targetPort: 6443
  selector:
    app: custom-metrics-apiserver
# Custom Metrics API
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: custommetrics.metrics.k8s.io
spec:
  group: metrics.k8s.io
  versions:
  - name: v1beta1
    served: true
    storage: true
    schema: # apiextensions.k8s.io/v1 requires a schema per version
      openAPIV3Schema:
        type: object
        x-kubernetes-preserve-unknown-fields: true
  scope: Namespaced
  names:
    plural: custommetrics
    singular: custommetric
    kind: CustomMetric
Best Practices
Scaling Strategy
Scaling Best Practices
- Monitor multiple metrics
- Set appropriate thresholds
- Use gradual scaling
- Implement health checks
- Plan for capacity limits
- Test scaling policies
- Monitor scaling events
Common Mistakes
- Aggressive scaling policies
- Inadequate monitoring
- No scaling limits
- Poor health checks
- Ignoring costs
- No testing
- Single metric scaling
Summary
Auto-scaling configuration involves several key components:
- Scaling Types: Horizontal, vertical, predictive scaling
- AWS: Auto Scaling Groups, launch templates, scaling policies
- Kubernetes: HPA, VPA, custom metrics
- Docker Swarm: Service scaling, rolling updates
- Cloud Providers: GCP, Azure managed scaling
- Monitoring: Custom metrics, alerting, health checks
- Best Practices: Gradual scaling, multiple metrics, testing
Need More Help?
Struggling with auto-scaling configuration or need help implementing scalable infrastructure? Our cloud experts can help you design robust auto-scaling solutions.
Get Auto-scaling Help