Infrastructure as Code
Manage cloud infrastructure declaratively with Terraform — modules, remote state, workspaces, drift detection, and testing patterns for production IaC workflows.
What You Will Learn
- Understand the Terraform core workflow and state model
- Structure reusable modules for team and organisation-wide use
- Manage remote state with locking in team environments
- Use workspaces and variable files for multi-environment deployments
- Detect and remediate infrastructure drift
- Test Terraform modules before production
1. The Terraform Core Workflow
```text
Write                  Plan                          Apply
──────                 ──────                        ──────
.tf files       →      terraform plan         →      terraform apply
                       (shows what will change)      (makes changes)
```
Every Terraform operation reads state — the source of truth about what currently exists:
```shell
# Initialise — download providers, configure backend
terraform init

# Preview changes without making them
terraform plan -out=tfplan

# Apply the saved plan (no interactive prompt)
terraform apply tfplan

# See what's currently in state
terraform state list
terraform state show aws_s3_bucket.artifacts
```
2. Module Structure
A well-structured Terraform module is reusable, self-documented, and testable.
```text
modules/
└── kubernetes-cluster/
    ├── main.tf       # Resources
    ├── variables.tf  # Input variables
    ├── outputs.tf    # Output values
    ├── versions.tf   # Required providers + Terraform version
    └── README.md     # Auto-generated by terraform-docs
```
Example: EKS cluster module
```hcl
# modules/kubernetes-cluster/variables.tf

variable "cluster_name" {
  type        = string
  description = "Name of the EKS cluster"
}

variable "node_count" {
  type        = number
  description = "Number of worker nodes"
  default     = 3

  validation {
    condition     = var.node_count >= 2
    error_message = "Production clusters need at least 2 nodes for HA."
  }
}

variable "instance_type" {
  type    = string
  default = "t3.medium"
}
```
```hcl
# modules/kubernetes-cluster/outputs.tf

output "cluster_endpoint" {
  description = "API server endpoint URL"
  value       = aws_eks_cluster.main.endpoint
}

output "cluster_ca_certificate" {
  description = "Base64-encoded cluster CA certificate"
  value       = aws_eks_cluster.main.certificate_authority[0].data
  sensitive   = true
}
```
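A root configuration consumes the module through its input variables and reads its outputs. A minimal sketch of a caller (the module name, `source` path, and chosen values are illustrative assumptions):

```hcl
# Hypothetical root configuration calling the module above
module "platform_cluster" {
  source = "./modules/kubernetes-cluster"

  cluster_name  = "platform-prod"
  node_count    = 4            # must satisfy the >= 2 validation
  instance_type = "m5.large"
}

# Module outputs can be re-exported from the root
output "kubeconfig_endpoint" {
  value = module.platform_cluster.cluster_endpoint
}
```

Because `node_count` carries a `validation` block, a caller passing `node_count = 1` fails at plan time rather than after resources are created.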
3. Remote State & Locking
Storing state locally is only safe for solo projects. Teams need remote state with locking.
```hcl
# versions.tf — configure the S3 backend
terraform {
  required_version = ">= 1.6"

  backend "s3" {
    bucket         = "mycompany-terraform-state"
    key            = "platform/eks/terraform.tfstate"
    region         = "eu-west-1"
    encrypt        = true
    dynamodb_table = "terraform-state-lock" # prevents concurrent applies
  }
}
```
State locking is critical: two engineers running `terraform apply` simultaneously without a lock can corrupt state. Always use a DynamoDB table (AWS) or a GCS bucket with versioning (GCP) as your lock backend.
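On GCP, the equivalent backend block is shorter because the GCS backend handles locking natively. A sketch, assuming a pre-created versioned bucket named `mycompany-terraform-state`:

```hcl
# versions.tf — GCS backend sketch (bucket name is an assumption)
terraform {
  backend "gcs" {
    bucket = "mycompany-terraform-state"
    prefix = "platform/eks"   # path prefix for the state object
  }
}
```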
Workspaces for environments
```shell
# Create and switch between environments
terraform workspace new staging
terraform workspace new production
terraform workspace select staging
```

```hcl
# Reference the workspace in code
resource "aws_instance" "app" {
  instance_type = terraform.workspace == "production" ? "m5.xlarge" : "t3.medium"
}
```
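Ternaries get unwieldy past one or two settings. A common pattern is a map of per-environment settings keyed by workspace name; a sketch with illustrative sizes and counts:

```hcl
# Hypothetical per-workspace settings map
locals {
  env_settings = {
    staging    = { instance_type = "t3.medium", node_count = 2 }
    production = { instance_type = "m5.xlarge", node_count = 4 }
  }
  settings = local.env_settings[terraform.workspace]
}

resource "aws_instance" "app" {
  instance_type = local.settings.instance_type
}
```

A side benefit: selecting a workspace with no entry in the map fails immediately with a lookup error instead of silently falling through to a default.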
4. Drift Detection
Drift = reality diverged from your Terraform state (someone applied a hotfix manually, or a cloud event changed a resource).
```shell
# Detect drift — shows changes made outside Terraform
terraform plan -detailed-exitcode
# exit code 0 = no changes, 1 = error, 2 = changes detected (drift)

# In CI — alert when drift detected
terraform plan -detailed-exitcode
if [ $? -eq 2 ]; then
  echo "DRIFT DETECTED — infrastructure diverged from state"
  # Send alert to Slack / PagerDuty
fi
```
Schedule drift detection to run every 4 hours in CI to catch manual changes early.
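The exit-code handling above can be wrapped in a small reusable function. In this sketch, `check_drift` is a hypothetical helper that runs whatever command it is given (in CI that would be `terraform plan -detailed-exitcode`) and maps the exit code to a message:

```shell
#!/usr/bin/env sh
# check_drift: run the given plan command and translate Terraform's
# -detailed-exitcode convention (0 / 1 / 2) into a readable message.
check_drift() {
  "$@"                       # e.g. terraform plan -detailed-exitcode
  code=$?
  case "$code" in
    0) echo "OK: no changes" ;;
    2) echo "DRIFT DETECTED: infrastructure diverged from state" ;;
    *) echo "ERROR: plan failed (exit $code)" ;;
  esac
  return "$code"             # preserve the original exit code for CI
}
```

A scheduled CI job would call `check_drift terraform plan -detailed-exitcode` and page on a non-zero return.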
5. Testing Terraform
| Tool | What it tests |
|---|---|
| `terraform validate` | Syntax and type checking |
| `terraform fmt -check` | Code formatting |
| `tflint` | Best-practice linting (AWS/Azure/GCP rules) |
| `checkov` | Security policy scanning (CIS benchmarks) |
| `terratest` | Integration tests — real infra, real assertions |
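Terraform 1.6+ also ships a native test runner, `terraform test`, which exercises a configuration using `.tftest.hcl` files. A sketch against the cluster module above, assuming its `main.tf` sets the cluster name from `var.cluster_name` (the file name and run label are illustrative):

```hcl
# tests/cluster.tftest.hcl: hypothetical native test file
run "plans_with_minimum_nodes" {
  command = plan

  variables {
    cluster_name = "test-cluster"
    node_count   = 2          # smallest value the validation allows
  }

  assert {
    condition     = aws_eks_cluster.main.name == "test-cluster"
    error_message = "Cluster name should come from var.cluster_name"
  }
}
```

Because `command = plan`, this runs without creating real infrastructure, so it fits between the lint and apply stages of the pipeline below.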
```yaml
# .github/workflows/terraform.yml
- name: Validate
  run: terraform validate

- name: Lint
  run: tflint --recursive

- name: Security scan
  run: checkov -d . --framework terraform --soft-fail

- name: Plan
  run: terraform plan -out=tfplan

# On merge to main only:
- name: Apply
  if: github.ref == 'refs/heads/main'
  run: terraform apply tfplan
```
6. Hands-on Exercise
- Write a Terraform module for an S3 bucket (or Azure Storage Account) with versioning and encryption enabled
- Add `validation` blocks on the bucket name and region variables
- Configure an S3/GCS remote backend with state locking
- Create `staging` and `production` workspaces with different instance sizes
- Add a GitHub Actions workflow that runs `terraform plan` on PRs and `terraform apply` on merge
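As a starting point for the bucket-name validation, a block like the following could be used; the regex approximates the S3 naming rules and should be treated as a sketch:

```hcl
variable "bucket_name" {
  type        = string
  description = "Globally unique S3 bucket name"

  validation {
    # 3-63 characters; lowercase letters, digits, dots, and hyphens;
    # must start and end with a letter or digit
    condition     = can(regex("^[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]$", var.bucket_name))
    error_message = "Bucket names must be 3-63 characters: lowercase letters, digits, dots, hyphens."
  }
}
```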
Summary
| Concept | Key takeaway |
|---|---|
| Core workflow | Init → Plan → Apply — always preview before applying |
| Modules | Input variables + outputs + validation = reusable, safe abstractions |
| Remote state | Always use S3/GCS backend + DynamoDB/GCS lock in team environments |
| Workspaces | One set of configs, multiple environments via workspace interpolation |
| Drift detection | Schedule terraform plan in CI to catch manual changes early |