
Auto-scaling Configuration - Complete Guide

Published: September 25, 2024 | Reading time: 25 minutes

Auto-scaling Overview

Auto-scaling adjusts compute capacity automatically to match demand. The main approaches are:

Scaling Types
- Horizontal scaling (scale out/in)
- Vertical scaling (scale up/down)
- Predictive scaling
- Scheduled scaling
- Manual scaling
- Target tracking scaling
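Most of these strategies reduce to the same arithmetic: pick a target value for a metric, then size capacity so the per-instance load lands at or below that target. A minimal sketch with illustrative numbers (not any provider's exact implementation):

```javascript
// Target tracking: scale capacity proportionally to the metric so that
// (metricValue * current) / desired ends up at or below the target.
function desiredCapacity(current, metricValue, targetValue) {
  // Round up so the post-scaling metric does not overshoot the target.
  return Math.ceil(current * (metricValue / targetValue));
}

// 4 instances at 90% average CPU with a 70% target -> 6 instances
console.log(desiredCapacity(4, 90, 70)); // 6
// 6 instances at 35% average CPU -> scale in to 3
console.log(desiredCapacity(6, 35, 70)); // 3
```

Real autoscalers wrap this calculation in cooldowns and min/max bounds so a single noisy datapoint cannot whipsaw the fleet.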

AWS Auto Scaling

EC2 Auto Scaling Groups

AWS Auto Scaling Setup
# Create Launch Template
aws ec2 create-launch-template \
  --launch-template-name web-server-template \
  --launch-template-data '{
    "ImageId": "ami-0abcdef1234567890",
    "InstanceType": "t3.micro",
    "KeyName": "my-key",
    "SecurityGroupIds": ["sg-12345678"],
    "UserData": "'$(base64 -w 0 user-data.sh)'",
    "TagSpecifications": [{
      "ResourceType": "instance",
      "Tags": [{"Key": "Name", "Value": "web-server"}]
    }]
  }'

# Create Auto Scaling Group
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name web-asg \
  --launch-template LaunchTemplateName=web-server-template,Version=1 \
  --min-size 1 \
  --max-size 10 \
  --desired-capacity 2 \
  --vpc-zone-identifier "subnet-12345678,subnet-87654321" \
  --health-check-type EC2 \
  --health-check-grace-period 300

# Create Scaling Policy
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name web-asg \
  --policy-name scale-up-policy \
  --policy-type TargetTrackingScaling \
  --target-tracking-configuration '{
    "TargetValue": 70.0,
    "PredefinedMetricSpecification": {
      "PredefinedMetricType": "ASGAverageCPUUtilization"
    }
  }'

# Create CloudWatch Alarms
# (Target tracking policies create and manage their own alarms; wire
#  alarm actions like this to step or simple scaling policies.)
aws cloudwatch put-metric-alarm \
  --alarm-name "High CPU Utilization" \
  --alarm-description "Alarm when CPU exceeds 80%" \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:autoscaling:region:account:scalingPolicy:policy-id:autoScalingGroupName/web-asg:policyName/scale-up-policy
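With --period 300 and --evaluation-periods 2, the alarm only fires after CPU stays above 80% for two consecutive five-minute windows. A simplified sketch of that evaluation (it ignores CloudWatch's missing-data treatment and the INSUFFICIENT_DATA state):

```javascript
// Alarm evaluation sketch: the alarm goes to ALARM only when the threshold
// is breached for `evaluationPeriods` consecutive periods.
function alarmState(datapoints, threshold, evaluationPeriods) {
  const recent = datapoints.slice(-evaluationPeriods);
  const breaching = recent.length === evaluationPeriods &&
    recent.every(v => v > threshold);
  return breaching ? 'ALARM' : 'OK';
}

console.log(alarmState([60, 85, 90], 80, 2)); // ALARM
console.log(alarmState([85, 60, 90], 80, 2)); // OK (breaches not consecutive)
```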

Advanced AWS Scaling

Advanced Scaling Configuration
# Predictive Scaling
aws autoscaling put-scaling-policy \
  --auto-scaling-group-name web-asg \
  --policy-name predictive-scaling-policy \
  --policy-type PredictiveScaling \
  --predictive-scaling-configuration '{
    "MetricSpecification": {
      "TargetValue": 70.0,
      "PredefinedMetricSpecification": {
        "PredefinedMetricType": "ASGAverageCPUUtilization"
      }
    },
    "Mode": "ForecastAndScale",
    "SchedulingBufferTime": 10
  }'

# Scheduled Scaling
aws autoscaling put-scheduled-update-group-action \
  --auto-scaling-group-name web-asg \
  --scheduled-action-name scale-up-morning \
  --start-time "2024-01-01T08:00:00Z" \
  --recurrence "0 8 * * MON-FRI" \
  --min-size 3 \
  --max-size 15 \
  --desired-capacity 5

# Mixed Instance Types
aws autoscaling create-auto-scaling-group \
  --auto-scaling-group-name web-asg-mixed \
  --mixed-instances-policy '{
    "LaunchTemplate": {
      "LaunchTemplateSpecification": {
        "LaunchTemplateName": "web-server-template",
        "Version": "1"
      },
      "Overrides": [
        {"InstanceType": "t3.micro"},
        {"InstanceType": "t3.small"},
        {"InstanceType": "t3.medium"}
      ]
    },
    "InstancesDistribution": {
      "OnDemandBaseCapacity": 1,
      "OnDemandPercentageAboveBaseCapacity": 50,
      "SpotAllocationStrategy": "diversified"
    }
  }' \
  --min-size 1 \
  --max-size 10 \
  --desired-capacity 2 \
  --vpc-zone-identifier "subnet-12345678,subnet-87654321"
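Given the distribution above (OnDemandBaseCapacity 1, 50% On-Demand above base), the On-Demand/Spot split for a desired capacity can be estimated as follows. The rounding direction here is an assumption of this sketch, not a statement of AWS's exact behavior:

```javascript
// On-Demand/Spot split under an InstancesDistribution: the base capacity is
// always On-Demand; capacity above the base is divided by the percentage.
function splitCapacity(desired, onDemandBase, onDemandPctAboveBase) {
  const aboveBase = Math.max(desired - onDemandBase, 0);
  // Assumed here: the On-Demand share rounds up when the percentage
  // splits an instance.
  const onDemandAbove = Math.ceil(aboveBase * onDemandPctAboveBase / 100);
  const onDemand = Math.min(desired, onDemandBase) + onDemandAbove;
  return { onDemand, spot: desired - onDemand };
}

// At the group's desired capacity of 2: 1 base + 1 above-base On-Demand
console.log(splitCapacity(2, 1, 50));  // { onDemand: 2, spot: 0 }
// Scaled out to 10: roughly a 6/4 On-Demand/Spot mix
console.log(splitCapacity(10, 1, 50)); // { onDemand: 6, spot: 4 }
```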

# Lifecycle Hooks
aws autoscaling put-lifecycle-hook \
  --auto-scaling-group-name web-asg \
  --lifecycle-hook-name scale-up-hook \
  --lifecycle-transition autoscaling:EC2_INSTANCE_LAUNCHING \
  --default-result CONTINUE \
  --heartbeat-timeout 300

Kubernetes Auto-scaling

Horizontal Pod Autoscaler

Kubernetes HPA Setup
# HPA YAML Configuration
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
  namespace: default
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Pods
        value: 2
        periodSeconds: 60
      - type: Percent
        value: 100
        periodSeconds: 60
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Pods
        value: 1
        periodSeconds: 60

# Apply HPA
kubectl apply -f hpa.yaml

# Check HPA Status
kubectl get hpa
kubectl describe hpa web-hpa
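The HPA's core algorithm is a ratio: desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue), clamped to the min/max bounds. Sketched with illustrative numbers:

```javascript
// HPA replica calculation: scale the current replica count by the ratio of
// observed to target utilization, then clamp to [minReplicas, maxReplicas].
function hpaDesiredReplicas(current, currentUtil, targetUtil, min, max) {
  const desired = Math.ceil(current * (currentUtil / targetUtil));
  return Math.min(Math.max(desired, min), max);
}

// 3 pods at 90% average CPU against a 70% target -> ceil(3 * 90/70) = 4
console.log(hpaDesiredReplicas(3, 90, 70, 2, 10)); // 4
// A large spike is capped by maxReplicas
console.log(hpaDesiredReplicas(8, 200, 70, 2, 10)); // 10
```

The behavior section above then rate-limits how fast this target is approached: scaleUp may add at most 2 pods or 100% per 60s window, and scaleDown removes at most 1 pod per 60s after a 300s stabilization window.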

# Custom Metrics HPA
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa-custom
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metric:
        name: http_requests_per_second
      target:
        type: AverageValue
        averageValue: "100"

Vertical Pod Autoscaler

Kubernetes VPA Setup
# Install VPA
git clone https://github.com/kubernetes/autoscaler.git
cd autoscaler/vertical-pod-autoscaler/
./hack/vpa-up.sh

# VPA Configuration
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-deployment
  updatePolicy:
    updateMode: "Auto"  # Auto, Initial, Off
  resourcePolicy:
    containerPolicies:
    - containerName: web
      minAllowed:
        cpu: 100m
        memory: 128Mi
      maxAllowed:
        cpu: 1000m
        memory: 1Gi
      controlledResources: ["cpu", "memory"]

# Apply VPA
kubectl apply -f vpa.yaml

# Check VPA Status
kubectl get vpa
kubectl describe vpa web-vpa

# VPA Modes:
# - Off: Only provides recommendations
# - Initial: Sets resources on pod creation
# - Auto: Evicts running pods so they restart with updated requests

Docker Swarm Scaling

Docker Swarm Auto-scaling

Docker Swarm Scaling
# Initialize Docker Swarm
docker swarm init

# Create Service with Scaling
docker service create \
  --name web-service \
  --replicas 3 \
  --publish 80:80 \
  --constraint 'node.role==worker' \
  --update-parallelism 1 \
  --update-delay 10s \
  --update-failure-action rollback \
  nginx:latest

# Scale Service
docker service scale web-service=5

# Update Service
docker service update \
  --image nginx:1.21 \
  --update-parallelism 2 \
  --update-delay 5s \
  web-service

# Rolling Update
docker service update \
  --image nginx:1.22 \
  --update-parallelism 1 \
  --update-delay 30s \
  --update-failure-action rollback \
  web-service

# Service Constraints
docker service create \
  --name web-service \
  --constraint 'node.labels.zone==us-east-1' \
  --constraint 'node.role==worker' \
  --replicas 3 \
  nginx:latest

# Service Placement
docker service create \
  --name web-service \
  --placement-pref 'spread=node.labels.zone' \
  --replicas 6 \
  nginx:latest

# Health Check
docker service create \
  --name web-service \
  --health-cmd "curl -f http://localhost/ || exit 1" \
  --health-interval 30s \
  --health-timeout 10s \
  --health-retries 3 \
  --replicas 3 \
  nginx:latest
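The rolling-update settings above translate directly into deployment time: tasks are replaced in batches of --update-parallelism, with --update-delay between batches. A rough back-of-envelope estimate (the per-task startup time is a guess you would measure for your own image):

```javascript
// Rough rolling-update duration: replicas are replaced in batches of
// `parallelism`, with `delaySeconds` of settling time between batches.
function rollingUpdateSeconds(replicas, parallelism, delaySeconds, perTaskSeconds) {
  const batches = Math.ceil(replicas / parallelism);
  return batches * perTaskSeconds + (batches - 1) * delaySeconds;
}

// 6 replicas, 1 at a time, 30s delay, ~10s per task -> 6*10 + 5*30 = 210s
console.log(rollingUpdateSeconds(6, 1, 30, 10)); // 210
// Doubling parallelism and shrinking the delay cuts this dramatically
console.log(rollingUpdateSeconds(6, 2, 5, 10));  // 40
```

This is why the rollout examples pair a small parallelism with --update-failure-action rollback: slower rollouts give health checks time to catch a bad image before it reaches the whole service.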

Google Cloud Auto-scaling

GCP Instance Groups

GCP Auto-scaling Setup
# Create Instance Template
gcloud compute instance-templates create web-template \
  --machine-type=e2-micro \
  --image-family=ubuntu-2004-lts \
  --image-project=ubuntu-os-cloud \
  --boot-disk-size=10GB \
  --boot-disk-type=pd-standard \
  --tags=web-server

# Create Managed Instance Group
gcloud compute instance-groups managed create web-instance-group \
  --base-instance-name web-instance \
  --size 3 \
  --template web-template \
  --zone us-central1-a

# Configure Auto-scaling
gcloud compute instance-groups managed set-autoscaling web-instance-group \
  --zone us-central1-a \
  --max-num-replicas 10 \
  --min-num-replicas 1 \
  --target-cpu-utilization 0.7 \
  --cool-down-period 60s

# Create Health Check
gcloud compute health-checks create http web-health-check \
  --port 3000 \
  --request-path /health \
  --check-interval 30s \
  --timeout 5s \
  --healthy-threshold 2 \
  --unhealthy-threshold 3

# Apply Health Check to the Instance Group (auto-healing)
gcloud compute instance-groups managed update web-instance-group \
  --zone us-central1-a \
  --health-check web-health-check \
  --initial-delay 300s

# Custom Metrics Auto-scaling
gcloud compute instance-groups managed set-autoscaling web-instance-group \
  --zone us-central1-a \
  --custom-metric-utilization metric=custom.googleapis.com/http_requests_per_second,utilization-target=100,utilization-target-type=GAUGE \
  --max-num-replicas 20 \
  --min-num-replicas 2

# Scheduled Scaling (raises the minimum for the workday;
# --max-num-replicas still caps the group)
gcloud compute instance-groups managed set-autoscaling web-instance-group \
  --zone us-central1-a \
  --set-schedule scale-up-morning \
  --schedule-cron "0 8 * * MON-FRI" \
  --schedule-duration-sec 36000 \
  --schedule-min-required-replicas 5

Azure Auto-scaling

Azure Virtual Machine Scale Sets

Azure Auto-scaling Configuration
# Create Resource Group
az group create --name myResourceGroup --location eastus

# Create Virtual Machine Scale Set
az vmss create \
  --resource-group myResourceGroup \
  --name myScaleSet \
  --image Ubuntu2204 \
  --upgrade-policy-mode automatic \
  --instance-count 3 \
  --admin-username azureuser \
  --generate-ssh-keys

# Configure Auto-scaling Rules
az monitor autoscale create \
  --resource-group myResourceGroup \
  --resource myScaleSet \
  --resource-type Microsoft.Compute/virtualMachineScaleSets \
  --name myAutoscaleSetting \
  --min-count 1 \
  --max-count 10 \
  --count 3

# Add Scale-out Rule
az monitor autoscale rule create \
  --resource-group myResourceGroup \
  --autoscale-name myAutoscaleSetting \
  --condition "Percentage CPU > 70 avg 5m" \
  --scale out 1

# Add Scale-in Rule
az monitor autoscale rule create \
  --resource-group myResourceGroup \
  --autoscale-name myAutoscaleSetting \
  --condition "Percentage CPU < 30 avg 5m" \
  --scale in 1
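Note the gap between the two thresholds: nothing happens between 30% and 70% CPU. This deadband keeps the scale set from flapping between scale-out and scale-in. The combined rule logic, sketched:

```javascript
// The two autoscale rules above, as a single decision function:
// return the instance-count delta for a given 5-minute average CPU.
function stepScaleDelta(avgCpu) {
  if (avgCpu > 70) return 1;   // scale-out rule
  if (avgCpu < 30) return -1;  // scale-in rule
  return 0;                    // inside the deadband: no change
}

console.log(stepScaleDelta(85)); // 1
console.log(stepScaleDelta(50)); // 0
console.log(stepScaleDelta(20)); // -1
```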

# Create Health Probe
az network lb probe create \
  --resource-group myResourceGroup \
  --lb-name myLoadBalancer \
  --name myHealthProbe \
  --protocol tcp \
  --port 80 \
  --interval 15 \
  --threshold 4

# Scheduled Scaling
az monitor autoscale profile create \
  --resource-group myResourceGroup \
  --autoscale-name myAutoscaleSetting \
  --name "Weekday Profile" \
  --recurrence week mon tue wed thu fri \
  --start 08:00 \
  --end 18:00 \
  --timezone "Pacific Standard Time" \
  --min-count 5 \
  --max-count 15 \
  --count 8

Monitoring and Alerting

Scaling Metrics

Scaling Monitoring Setup
# Prometheus Auto-scaling Metrics
# Custom metrics for scaling decisions
const promClient = require('prom-client');

// Create custom metrics
const httpRequestsTotal = new promClient.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code']
});

const activeConnections = new promClient.Gauge({
  name: 'active_connections',
  help: 'Number of active connections'
});

const responseTime = new promClient.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  buckets: [0.1, 0.3, 0.5, 0.7, 1, 3, 5, 7, 10]
});

// Scaling decision logic
function shouldScale() {
  const cpuUsage = getCPUUsage();
  const memoryUsage = getMemoryUsage();
  const requestRate = getRequestRate();
  
  if (cpuUsage > 80 || memoryUsage > 85 || requestRate > 1000) {
    return 'scale_up';
  } else if (cpuUsage < 30 && memoryUsage < 40 && requestRate < 100) {
    return 'scale_down';
  }
  
  return 'no_change';
}

// CloudWatch Custom Metrics (AWS)
const AWS = require('aws-sdk');
const cloudwatch = new AWS.CloudWatch();

// Send custom metrics (requestRate and connectionCount are numeric
// readings gathered by the application, not the prom-client objects above)
const requestRate = getRequestRate();
const connectionCount = getConnectionCount();

const params = {
  Namespace: 'Custom/Application',
  MetricData: [
    {
      MetricName: 'RequestRate',
      Value: requestRate,
      Unit: 'Count/Second',
      Timestamp: new Date()
    },
    {
      MetricName: 'ActiveConnections',
      Value: connectionCount,
      Unit: 'Count',
      Timestamp: new Date()
    }
  ]
};

cloudwatch.putMetricData(params, (err, data) => {
  if (err) console.log(err, err.stack);
  else console.log('Metric sent successfully');
});

# Kubernetes Custom Metrics
apiVersion: v1
kind: Service
metadata:
  name: custom-metrics-apiserver
  namespace: kube-system
spec:
  ports:
  - port: 443
    targetPort: 6443
  selector:
    app: custom-metrics-apiserver

# Custom Metrics API registration (an APIService, not a CRD)
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  name: v1beta1.custom.metrics.k8s.io
spec:
  service:
    name: custom-metrics-apiserver
    namespace: kube-system
  group: custom.metrics.k8s.io
  version: v1beta1
  insecureSkipTLSVerify: true
  groupPriorityMinimum: 100
  versionPriority: 100

Best Practices

Scaling Strategy

Scaling Best Practices

  • Monitor multiple metrics
  • Set appropriate thresholds
  • Use gradual scaling
  • Implement health checks
  • Plan for capacity limits
  • Test scaling policies
  • Monitor scaling events

Common Mistakes

  • Aggressive scaling policies
  • Inadequate monitoring
  • No scaling limits
  • Poor health checks
  • Ignoring costs
  • No testing
  • Single metric scaling

Summary

Auto-scaling configuration involves several key components:

  • Scaling Types: Horizontal, vertical, predictive scaling
  • AWS: Auto Scaling Groups, launch templates, scaling policies
  • Kubernetes: HPA, VPA, custom metrics
  • Docker Swarm: Service scaling, rolling updates
  • Cloud Providers: GCP, Azure managed scaling
  • Monitoring: Custom metrics, alerting, health checks
  • Best Practices: Gradual scaling, multiple metrics, testing

Need More Help?

Struggling with auto-scaling configuration or need help implementing scalable infrastructure? Our cloud experts can help you design robust auto-scaling solutions.

Get Auto-scaling Help