Kubernetes Advanced Platform
Go beyond the basics with Helm chart authoring, KEDA event-driven autoscaling, custom operators, and advanced scheduling — the building blocks of a self-service platform.
What You Will Learn
- Author production-quality Helm charts with best-practice templating
- Configure KEDA for event-driven autoscaling beyond CPU/memory
- Understand when and how to write custom operators
- Use node affinity, taints, and tolerations for workload placement
- Implement Pod Disruption Budgets for high-availability guarantees
1. Helm — The Package Manager for Kubernetes
Helm bundles Kubernetes manifests into reusable, versioned packages called charts.
Chart structure
my-service/
├── Chart.yaml            # Chart metadata (name, version, dependencies)
├── values.yaml           # Default configuration values
├── values-prod.yaml      # Production overrides
├── templates/
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   ├── hpa.yaml
│   ├── configmap.yaml
│   └── _helpers.tpl      # Reusable template fragments
└── charts/               # Chart dependencies (subcharts)
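A typical development loop for a chart like this uses the standard Helm CLI commands (the chart directory and release name `my-service` here follow the structure above):

```shell
# Validate chart syntax and catch common templating mistakes
helm lint ./my-service

# Render the templates locally to inspect the generated manifests
helm template my-service ./my-service -f my-service/values-prod.yaml

# Install or upgrade in one idempotent command
helm upgrade --install my-service ./my-service \
  --namespace my-service --create-namespace \
  -f my-service/values-prod.yaml

# Roll back to a previous release revision if the upgrade misbehaves
helm rollback my-service 1
```

`helm template` is particularly useful in CI: it catches rendering errors without touching a cluster.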
Templating best practices
# templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "my-service.fullname" . }}
  labels:
    {{- include "my-service.labels" . | nindent 4 }}
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  # When autoscaling is enabled, the HPA manages replicas — don't set a fixed count
  template:
    spec:
      containers:
        - name: {{ .Chart.Name }}
          image: "{{ .Values.image.repository }}:{{ .Values.image.tag | default .Chart.AppVersion }}"
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
# values.yaml
replicaCount: 2

image:
  repository: ghcr.io/myorg/my-service
  tag: ""            # defaults to Chart.AppVersion

resources:
  requests:
    cpu: 100m
    memory: 128Mi
  limits:
    cpu: 500m
    memory: 512Mi

autoscaling:
  enabled: true
  minReplicas: 2
  maxReplicas: 20
  targetCPUUtilizationPercentage: 70
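The `fullname` and `labels` helpers referenced in the deployment template live in `_helpers.tpl`. A minimal sketch (the template names are conventions, assumed to match the chart above):

```yaml
{{/* templates/_helpers.tpl */}}
{{- define "my-service.fullname" -}}
{{- printf "%s-%s" .Release.Name .Chart.Name | trunc 63 | trimSuffix "-" -}}
{{- end -}}

{{- define "my-service.labels" -}}
app.kubernetes.io/name: {{ .Chart.Name }}
app.kubernetes.io/version: {{ .Chart.AppVersion | quote }}
app.kubernetes.io/managed-by: {{ .Release.Service }}
{{- end -}}
```

Defining labels once and pulling them in with `include ... | nindent` keeps every resource in the chart consistently labeled.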
2. KEDA — Event-Driven Autoscaling
HPA scales on CPU/memory. KEDA scales on any metric — queue depth, HTTP request rate, database row count, Prometheus queries.
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: order-processor-scaler
spec:
  scaleTargetRef:
    name: order-processor
  minReplicaCount: 0      # scale to zero when idle!
  maxReplicaCount: 50
  triggers:
    # Scale based on SQS queue depth
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.eu-west-1.amazonaws.com/123/order-queue
        queueLength: "10"   # 1 replica per 10 messages
        awsRegion: eu-west-1
    # Scale based on a Prometheus metric
    - type: prometheus
      metadata:
        serverAddress: http://prometheus:9090
        metricName: http_requests_pending
        query: sum(http_requests_pending{app="order-processor"})
        threshold: "100"
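To try this, KEDA itself needs to be running in the cluster first; the standard installation route is its Helm chart:

```shell
helm repo add kedacore https://kedacore.github.io/charts
helm repo update
helm install keda kedacore/keda --namespace keda --create-namespace
```

Once the KEDA operator is up, applying a `ScaledObject` is all it takes — KEDA creates and manages the underlying HPA for you.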
3. Custom Operators
When you find yourself manually doing the same Kubernetes operations repeatedly, write an operator to automate it.
What operators do
A human operator's runbook becomes a controller. Where a human would (1) check the app is healthy, (2) take a backup before upgrading, (3) apply schema migrations, (4) rolling-restart the pods, and (5) monitor the rollout, a Kubernetes operator watches a CRD and runs a reconcile loop on every change — the same steps, but automated, auditable, tested, and repeatable.
When to write an operator
Write an operator when:
- You’re managing stateful workloads (databases, message queues)
- You need domain-specific rollout logic beyond what Deployments offer
- You want to expose a simplified CRD API to application teams
Use tools like Operator SDK or Kubebuilder — don’t write the controller scaffolding by hand.
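As a sketch of the "simplified CRD API" idea — the resource below is hypothetical (`PostgresCluster` and the `platform.example.com` group are invented names; your operator would define and reconcile them), but it shows the shape of what application teams would interact with:

```yaml
apiVersion: platform.example.com/v1alpha1
kind: PostgresCluster        # hypothetical CRD defined by your operator
metadata:
  name: orders-db
spec:
  version: "16"
  replicas: 3
  storage: 100Gi
  backup:
    schedule: "0 2 * * *"    # the operator translates this spec into
    retention: 14d           # StatefulSets, PVCs, CronJobs, failover logic, ...
```

Teams declare intent in a few lines; the operator owns the dozens of low-level resources and procedures behind it.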
4. Workload Placement
Node affinity — prefer or require specific nodes
spec:
  affinity:
    nodeAffinity:
      # Hard requirement
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: node-type
                operator: In
                values: [gpu]
      # Soft preference
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 80
          preference:
            matchExpressions:
              - key: region
                operator: In
                values: [eu-west-1a]
Taints & tolerations — dedicated node pools
# Taint a node — only pods that tolerate it will be scheduled here
kubectl taint nodes node-pool-gpu dedicated=gpu:NoSchedule
# Pod toleration — this pod can run on the tainted node
tolerations:
  - key: dedicated
    operator: Equal
    value: gpu
    effect: NoSchedule
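Note that a toleration only *permits* scheduling onto the tainted node — it doesn't attract the pod there. To truly dedicate a pool, pair the taint/toleration with a node selector or affinity (the `node-type=gpu` label is assumed from the affinity example above):

```yaml
spec:
  nodeSelector:
    node-type: gpu        # pulls the pod onto the GPU pool
  tolerations:
    - key: dedicated
      operator: Equal
      value: gpu
      effect: NoSchedule  # lets it past the taint
```

Without the selector, a GPU workload might land on a general-purpose node; without the toleration, it can't land on the GPU pool at all.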
5. Pod Disruption Budgets
Prevent Kubernetes from evicting too many pods at once during node drains or cluster upgrades.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: api-service-pdb
spec:
  selector:
    matchLabels:
      app: api-service
  minAvailable: 2       # always keep at least 2 pods running
  # OR:
  # maxUnavailable: 1   # allow at most 1 pod to be unavailable
Careful: if you set minAvailable: 2 but only run 2 replicas, a node drain will be blocked indefinitely. Ensure minAvailable < replicas, and keep replicas ≥ 3 for critical services.
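You can check a PDB's headroom before a maintenance window — the ALLOWED DISRUPTIONS column shows how many voluntary evictions the budget currently permits:

```shell
# Inspect the budget (`<node-name>` is a placeholder for one of your nodes)
kubectl get pdb api-service-pdb

# A drain that would violate the budget blocks and keeps retrying the eviction
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
```

If ALLOWED DISRUPTIONS is 0, fix the replica count or the budget before draining, not during.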
6. Hands-on Exercise
- Create a Helm chart for a simple web service with configurable replicas, resources, and ingress
- Add a _helpers.tpl with standard labels (app.kubernetes.io/name, version, managed-by)
- Install KEDA and create a ScaledObject that scales a worker deployment based on a Redis list length
- Add a PodDisruptionBudget and test it by draining a node (kubectl drain --ignore-daemonsets)
- Use node affinity to ensure your database pod lands on a node with SSD storage (node-type=ssd)
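For the Redis exercise, a starting point might look like the sketch below — the deployment name `worker`, the list key `jobs`, and the Redis service address are assumptions to adapt to your setup:

```yaml
apiVersion: keda.sh/v1alpha1
kind: ScaledObject
metadata:
  name: worker-scaler
spec:
  scaleTargetRef:
    name: worker            # assumed deployment name
  minReplicaCount: 0
  maxReplicaCount: 10
  triggers:
    - type: redis
      metadata:
        address: redis.default.svc.cluster.local:6379   # assumed Redis service
        listName: jobs                                  # assumed list key
        listLength: "5"     # 1 replica per 5 pending items
```

Push items onto the list with LPUSH and watch the worker deployment scale up from zero.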
Summary
| Concept | Key takeaway |
|---|---|
| Helm | Charts = versioned, templated K8s manifests — use _helpers.tpl for DRY |
| KEDA | Scale on anything — queue depth, DB rows, custom Prometheus metrics |
| Operators | Automate complex stateful operations — use Operator SDK, not raw controllers |
| Affinity | required for hard rules, preferred for soft preferences |
| PDB | Always add a PDB for critical services — prevents unsafe eviction |