Auto-scaling Configuration - Complete Guide
Published: September 25, 2024 | Reading time: 25 minutes
Auto-scaling Overview
Auto-scaling automatically adjusts resources based on demand:
Scaling Types
# Scaling Types
- Horizontal scaling (scale out/in)
- Vertical scaling (scale up/down)
- Predictive scaling
- Scheduled scaling
- Manual scaling
- Target tracking scaling
AWS Auto Scaling
EC2 Auto Scaling Groups
AWS Auto Scaling Setup
# Create Launch Template
aws ec2 create-launch-template \
--launch-template-name web-server-template \
--launch-template-data '{
"ImageId": "ami-0abcdef1234567890",
"InstanceType": "t3.micro",
"KeyName": "my-key",
"SecurityGroupIds": ["sg-12345678"],
"UserData": "'$(base64 -w 0 user-data.sh)'",
"TagSpecifications": [{
"ResourceType": "instance",
"Tags": [{"Key": "Name", "Value": "web-server"}]
}]
}'
# Create Auto Scaling Group
aws autoscaling create-auto-scaling-group \
--auto-scaling-group-name web-asg \
--launch-template LaunchTemplateName=web-server-template,Version=1 \
--min-size 1 \
--max-size 10 \
--desired-capacity 2 \
--vpc-zone-identifier "subnet-12345678,subnet-87654321" \
--health-check-type EC2 \
--health-check-grace-period 300
# Create Scaling Policy
aws autoscaling put-scaling-policy \
--auto-scaling-group-name web-asg \
--policy-name scale-up-policy \
--policy-type TargetTrackingScaling \
--target-tracking-configuration '{
"TargetValue": 70.0,
"PredefinedMetricSpecification": {
"PredefinedMetricType": "ASGAverageCPUUtilization"
}
}'
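The target-tracking policy above works like a thermostat: the autoscaler estimates the capacity that would bring average CPU back to the 70% target. A minimal sketch of that proportional estimate (the function name is illustrative, not an AWS API):

```javascript
// Proportional target-tracking estimate: how many instances would
// bring the average metric back to the target value.
function desiredCapacity(currentCapacity, currentMetric, targetMetric) {
  return Math.ceil(currentCapacity * (currentMetric / targetMetric));
}

// 4 instances at 85% average CPU against a 70% target -> scale out to 5.
console.log(desiredCapacity(4, 85, 70)); // 5
// 4 instances at 35% average CPU -> scale in to 2.
console.log(desiredCapacity(4, 35, 70)); // 2
```

The ceiling keeps the estimate conservative: scaling in rounds up, so the group never drops below the capacity the target requires.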
# Create CloudWatch Alarms
aws cloudwatch put-metric-alarm \
--alarm-name "High CPU Utilization" \
--alarm-description "Alarm when CPU exceeds 80%" \
--metric-name CPUUtilization \
--namespace AWS/EC2 \
--statistic Average \
--period 300 \
--threshold 80 \
--comparison-operator GreaterThanThreshold \
--evaluation-periods 2 \
--alarm-actions arn:aws:autoscaling:region:account:scalingPolicy:policy-id:autoScalingGroupName/web-asg:policyName/scale-up-policy
Advanced AWS Scaling
Advanced Scaling Configuration
# Predictive Scaling
aws autoscaling put-scaling-policy \
--auto-scaling-group-name web-asg \
--policy-name predictive-scaling-policy \
--policy-type PredictiveScaling \
--predictive-scaling-configuration '{
"MetricSpecifications": [{
"TargetValue": 70.0,
"PredefinedMetricPairSpecification": {
"PredefinedMetricType": "ASGCPUUtilization"
}
}],
"Mode": "ForecastAndScale",
"SchedulingBufferTime": 10
}'
# Scheduled Scaling
aws autoscaling put-scheduled-update-group-action \
--auto-scaling-group-name web-asg \
--scheduled-action-name scale-up-morning \
--start-time "2024-01-01T08:00:00Z" \
--recurrence "0 8 * * MON-FRI" \
--min-size 3 \
--max-size 15 \
--desired-capacity 5
# Mixed Instance Types
aws autoscaling create-auto-scaling-group \
--auto-scaling-group-name web-asg-mixed \
--mixed-instances-policy '{
"LaunchTemplate": {
"LaunchTemplateSpecification": {
"LaunchTemplateName": "web-server-template",
"Version": "1"
},
"Overrides": [
{"InstanceType": "t3.micro"},
{"InstanceType": "t3.small"},
{"InstanceType": "t3.medium"}
]
},
"InstancesDistribution": {
"OnDemandBaseCapacity": 1,
"OnDemandPercentageAboveBaseCapacity": 50,
"SpotAllocationStrategy": "diversified"
}
}' \
--vpc-zone-identifier "subnet-12345678,subnet-87654321" \
--min-size 1 \
--max-size 10 \
--desired-capacity 2
# Lifecycle Hooks
aws autoscaling put-lifecycle-hook \
--auto-scaling-group-name web-asg \
--lifecycle-hook-name scale-up-hook \
--lifecycle-transition autoscaling:EC2_INSTANCE_LAUNCHING \
--default-result CONTINUE \
--heartbeat-timeout 300
Kubernetes Auto-scaling
Horizontal Pod Autoscaler
Kubernetes HPA Setup
# HPA YAML Configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60
      - type: Percent
        value: 100
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60
# Apply HPA
kubectl apply -f hpa.yaml
# Check HPA Status
kubectl get hpa
kubectl describe hpa web-hpa
# Custom Metrics HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa-custom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"
Vertical Pod Autoscaler
Kubernetes VPA Setup
# Install VPA
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler/
./hack/vpa-up.sh
# VPA Configuration
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  updatePolicy:
    updateMode: "Auto" # Auto, Initial, Off
  resourcePolicy:
    containerPolicies:
    - containerName: web
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 1000m
        memory: 1Gi
      controlledResources: ["cpu", "memory"]
# Apply VPA
kubectl apply -f vpa.yaml
# Check VPA Status
kubectl get vpa
kubectl describe vpa web-vpa
# VPA Modes:
# - Off: Only provides recommendations
# - Initial: Sets resources on pod creation only
# - Auto: Evicts and recreates pods to apply new recommendations
Docker Swarm Scaling
Docker Swarm Auto-scaling
Docker Swarm Scaling
# Initialize Docker Swarm
docker swarm init
# Create Service with Scaling
docker service create \
--name web-service \
--replicas 3 \
--publish 80:80 \
--constraint 'node.role==worker' \
--update-parallelism 1 \
--update-delay 10s \
--update-failure-action rollback \
nginx:latest
# Scale Service
docker service scale web-service=5
# Update Service
docker service update \
--image nginx:1.21 \
--update-parallelism 2 \
--update-delay 5s \
web-service
# Rolling Update
docker service update \
--image nginx:1.22 \
--update-parallelism 1 \
--update-delay 30s \
--update-failure-action rollback \
web-service
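With --update-parallelism 1 and --update-delay 30s, a 5-replica service is replaced in 5 batches with a 30-second pause between them, so the rollout takes at least two minutes before container start-up time is counted. A rough back-of-the-envelope sketch:

```javascript
// Lower bound on rolling-update duration: tasks are replaced in
// batches of --update-parallelism, with --update-delay between batches.
// Container start-up and health-check time are excluded.
function rollingUpdateSeconds(replicas, parallelism, delaySeconds) {
  const batches = Math.ceil(replicas / parallelism);
  return (batches - 1) * delaySeconds;
}

console.log(rollingUpdateSeconds(5, 1, 30)); // 120
console.log(rollingUpdateSeconds(6, 2, 5));  // 10
```

This trade-off is the point of rolling updates: a longer rollout in exchange for never taking more than `parallelism` tasks out of service at once.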
# Service Constraints
docker service create \
--name web-service \
--constraint 'node.labels.zone==us-east-1' \
--constraint 'node.role==worker' \
--replicas 3 \
nginx:latest
# Service Placement
docker service create \
--name web-service \
--placement-pref 'spread=node.labels.zone' \
--replicas 6 \
nginx:latest
# Health Check
# (the official nginx image has no /health endpoint and does not ship
# curl; the probe below assumes an image that provides both)
docker service create \
--name web-service \
--health-cmd "curl -f http://localhost/ || exit 1" \
--health-interval 30s \
--health-timeout 10s \
--health-retries 3 \
--replicas 3 \
nginx:latest
Google Cloud Auto-scaling
GCP Instance Groups
GCP Auto-scaling Setup
# Create Instance Template
gcloud compute instance-templates create web-template \
--machine-type=e2-micro \
--image-family=ubuntu-2004-lts \
--image-project=ubuntu-os-cloud \
--boot-disk-size=10GB \
--boot-disk-type=pd-standard \
--tags=web-server
# Create Managed Instance Group
gcloud compute instance-groups managed create web-instance-group \
--base-instance-name web-instance \
--size 3 \
--template web-template \
--zone us-central1-a
# Configure Auto-scaling
gcloud compute instance-groups managed set-autoscaling web-instance-group \
--zone us-central1-a \
--max-num-replicas 10 \
--min-num-replicas 1 \
--target-cpu-utilization 0.7 \
--cool-down-period 60s
# Create Health Check
gcloud compute health-checks create http web-health-check \
--port 3000 \
--request-path /health \
--check-interval 30s \
--timeout 5s \
--healthy-threshold 2 \
--unhealthy-threshold 3
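The healthy/unhealthy thresholds mean an instance only changes state after a run of consecutive probe results, which filters out one-off failures. A sketch of that consecutive-probe logic (illustrative, not GCP code):

```javascript
// Consecutive-probe health state, mirroring healthy-threshold 2 /
// unhealthy-threshold 3 above: state flips only after enough
// consecutive probe results in the opposite direction.
function healthTracker(healthyThreshold, unhealthyThreshold) {
  let healthy = true;
  let streak = 0;
  return function probe(ok) {
    // A result matching the current state resets the opposing streak.
    if (ok === healthy) { streak = 0; return healthy; }
    streak += 1;
    const needed = healthy ? unhealthyThreshold : healthyThreshold;
    if (streak >= needed) { healthy = !healthy; streak = 0; }
    return healthy;
  };
}

const probe = healthTracker(2, 3);
probe(false); probe(false);  // two failures: still healthy
console.log(probe(false));   // third consecutive failure -> false
probe(true);
console.log(probe(true));    // two consecutive successes -> true
```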
# Apply Health Check to Instance Group (autohealing)
gcloud compute instance-groups managed update web-instance-group \
--zone us-central1-a \
--health-check web-health-check \
--initial-delay 300s
# Custom Metrics Auto-scaling
gcloud compute instance-groups managed set-autoscaling web-instance-group \
--zone us-central1-a \
--custom-metric-utilization metric=custom.googleapis.com/http_requests_per_second,utilization-target=100,utilization-target-type=GAUGE \
--max-num-replicas 20 \
--min-num-replicas 2
# Scheduled Scaling (a scaling schedule sets a minimum required
# capacity; the autoscaler's max-num-replicas still applies)
gcloud compute instance-groups managed set-autoscaling web-instance-group \
--zone us-central1-a \
--set-schedule scale-up-morning \
--schedule-cron "0 8 * * MON-FRI" \
--schedule-duration-sec 36000 \
--schedule-min-required-replicas 5
Azure Auto-scaling
Azure Virtual Machine Scale Sets
Azure Auto-scaling Configuration
# Create Resource Group
az group create --name myResourceGroup --location eastus
# Create Virtual Machine Scale Set
az vmss create \
--resource-group myResourceGroup \
--name myScaleSet \
--image Ubuntu2204 \
--upgrade-policy-mode automatic \
--instance-count 3 \
--admin-username azureuser \
--generate-ssh-keys
# Configure Auto-scaling Rules
az monitor autoscale create \
--resource-group myResourceGroup \
--resource myScaleSet \
--resource-type Microsoft.Compute/virtualMachineScaleSets \
--name myAutoscaleSetting \
--min-count 1 \
--max-count 10 \
--count 3
# Add Scale-out Rule
az monitor autoscale rule create \
--resource-group myResourceGroup \
--autoscale-name myAutoscaleSetting \
--condition "Percentage CPU > 70 avg 5m" \
--scale out 1
# Add Scale-in Rule
az monitor autoscale rule create \
--resource-group myResourceGroup \
--autoscale-name myAutoscaleSetting \
--condition "Percentage CPU < 30 avg 5m" \
--scale in 1
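Note the gap between the scale-out (70%) and scale-in (30%) thresholds: it creates a dead band so a scale set hovering around a single value does not scale out and back in repeatedly. Sketched as a decision function:

```javascript
// Two-threshold scaling decision with a dead band between the
// scale-out and scale-in thresholds to prevent flapping.
function scaleDecision(avgCpu, outThreshold = 70, inThreshold = 30) {
  if (avgCpu > outThreshold) return 'out';
  if (avgCpu < inThreshold) return 'in';
  return 'none';
}

console.log(scaleDecision(85)); // 'out'
console.log(scaleDecision(50)); // 'none' (inside the dead band)
console.log(scaleDecision(20)); // 'in'
```

If both rules used the same threshold, adding an instance would drop average CPU just below it and immediately trigger a scale-in; the 40-point band absorbs that oscillation.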
# Create Health Probe
az network lb probe create \
--resource-group myResourceGroup \
--lb-name myLoadBalancer \
--name myHealthProbe \
--protocol tcp \
--port 80 \
--interval 15 \
--threshold 4
# Scheduled Scaling (recurring weekday profile; recurring profiles
# take start/end times of day rather than a fixed date)
az monitor autoscale profile create \
--resource-group myResourceGroup \
--autoscale-name myAutoscaleSetting \
--name "Weekday Profile" \
--recurrence week mon tue wed thu fri \
--start 08:00 \
--end 18:00 \
--timezone "Pacific Standard Time" \
--min-count 5 \
--max-count 15 \
--count 8
Monitoring and Alerting
Scaling Metrics
Scaling Monitoring Setup
# Prometheus Auto-scaling Metrics
# Custom metrics for scaling decisions
const promClient = require('prom-client');

// Create custom metrics
const httpRequestsTotal = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code']
});

const activeConnections = new promClient.Gauge({
  name: 'active_connections',
  help: 'Number of active connections'
});

const responseTime = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  buckets: [0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7, 10]
});
// Scaling decision logic
// (getCPUUsage, getMemoryUsage and getRequestRate are app-specific
// helpers, not shown here)
function shouldScale() {
  const cpuUsage = getCPUUsage();
  const memoryUsage = getMemoryUsage();
  const requestRate = getRequestRate();
  if (cpuUsage > 80 || memoryUsage > 85 || requestRate > 1000) {
    return 'scale_up';
  } else if (cpuUsage < 30 && memoryUsage < 40 && requestRate < 100) {
    return 'scale_down';
  }
  return 'no_change';
}
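As written, shouldScale() can flap when metrics hover near a threshold. One common mitigation is a cooldown: after any scaling action, suppress further actions until a quiet period has elapsed. An illustrative wrapper (not part of any SDK; the one-minute cooldown is an assumption):

```javascript
// Wrap a decision function with a cooldown: once an action fires,
// further actions are suppressed until cooldownMs has passed.
function withCooldown(decide, cooldownMs) {
  let lastActionAt = -Infinity;
  return function (nowMs) {
    const decision = decide();
    if (decision === 'no_change') return decision;
    if (nowMs - lastActionAt < cooldownMs) return 'no_change';
    lastActionAt = nowMs;
    return decision;
  };
}

const decide = withCooldown(() => 'scale_up', 60000);
console.log(decide(0));     // 'scale_up'
console.log(decide(30000)); // 'no_change' (still cooling down)
console.log(decide(61000)); // 'scale_up'
```

This mirrors the cooldown and grace-period settings the managed autoscalers above expose (e.g. --health-check-grace-period, --cool-down-period).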
// CloudWatch Custom Metrics (AWS)
const AWS = require('aws-sdk');
const cloudwatch = new AWS.CloudWatch();

// Send custom metrics for scaling decisions
const params = {
  Namespace: 'Custom/Application',
  MetricData: [
    {
      MetricName: 'RequestRate',
      Value: requestRate,
      Unit: 'Count/Second',
      Timestamp: new Date()
    },
    {
      MetricName: 'ActiveConnections',
      Value: activeConnections,
      Unit: 'Count',
      Timestamp: new Date()
    }
  ]
};

cloudwatch.putMetricData(params, (err, data) => {
  if (err) console.log(err, err.stack);
  else console.log('Metric sent successfully');
});
# Kubernetes Custom Metrics
apiVersion: v1
kind: Service
metadata:
  name: custom-metrics-apiserver
  namespace: kube-system
spec:
  ports:
  - port: 443
    targetPort: 6443
  selector:
    app: custom-metrics-apiserver
# Custom Metrics API
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: custommetrics.metrics.k8s.io
spec:
  group: metrics.k8s.io
  versions:
  - name: v1beta1
    served: true
    storage: true
    schema: # apiextensions.k8s.io/v1 requires a schema per version
      openAPIV3Schema:
        type: object
        x-kubernetes-preserve-unknown-fields: true
  scope: Namespaced
  names:
    plural: custommetrics
    singular: custommetric
    kind: CustomMetric
Best Practices
Scaling Strategy
Scaling Best Practices
- Monitor multiple metrics
- Set appropriate thresholds
- Use gradual scaling
- Implement health checks
- Plan for capacity limits
- Test scaling policies
- Monitor scaling events
Common Mistakes
- Aggressive scaling policies
- Inadequate monitoring
- No scaling limits
- Poor health checks
- Ignoring costs
- No testing
- Single metric scaling
Summary
Auto-scaling configuration involves several key components:
- Scaling Types: Horizontal, vertical, predictive scaling
- AWS: Auto Scaling Groups, launch templates, scaling policies
- Kubernetes: HPA, VPA, custom metrics
- Docker Swarm: Service scaling, rolling updates
- Cloud Providers: GCP, Azure managed scaling
- Monitoring: Custom metrics, alerting, health checks
- Best Practices: Gradual scaling, multiple metrics, testing
Need More Help?
Struggling with auto-scaling configuration or need help implementing scalable infrastructure? Our cloud experts can help you design robust auto-scaling solutions.
Get Auto-scaling Help