Networking Basics Every DevOps Engineer Needs to Know
Primary keyword: networking basics for DevOps engineers
Secondary keywords: DevOps networking concepts, TCP/IP DevOps, DNS for DevOps, Kubernetes networking explained
Introduction
Most DevOps tutorials jump straight from “install Kubernetes” to “configure ingress” without explaining what’s actually happening in between. That gap is where production incidents live. When a pod can’t reach a database, when an ingress isn’t routing correctly, when a service mesh is dropping connections — the engineers who trace these issues quickly are the ones who understand what’s happening at the networking layer. This guide covers the networking fundamentals that come up most often in real DevOps work.
IP Addressing and Subnets
Every device on a network has an IP address. In cloud and container environments, you’ll work with private IP ranges constantly.
Private IP ranges (RFC 1918):
- 10.0.0.0/8 — large enterprise networks, common in VPCs
- 172.16.0.0/12 — the range Docker's default bridge network (172.17.0.0/16) is carved from
- 192.168.0.0/16 — home networks, smaller environments
CIDR notation describes a block of IP addresses. 10.0.1.0/24 means the first 24 bits are the network, leaving 8 bits for hosts — that’s 256 addresses (254 usable, minus network and broadcast).
10.0.1.0/24 → 10.0.1.0 to 10.0.1.255 (256 addresses)
10.0.0.0/16 → 10.0.0.0 to 10.0.255.255 (65536 addresses)
10.0.1.0/28 → 10.0.1.0 to 10.0.1.15 (16 addresses)
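The address counts above follow directly from the prefix length: a /N block has 2^(32 − N) addresses, which you can compute in any shell with a bit shift:

```shell
# Address count for a CIDR prefix: 2^(32 - prefix), computed as a bit shift
for prefix in 24 16 28; do
  echo "/$prefix -> $(( 1 << (32 - prefix) )) addresses"
done
```

Remember to subtract two (network and broadcast) to get usable host addresses; cloud providers often reserve a few more per subnet.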
Why it matters: When designing a VPC in AWS or Azure, you’re choosing CIDR ranges for your subnets. When Kubernetes assigns pod IPs, it carves them from a pod CIDR. When two services can’t communicate, overlapping CIDRs are a common culprit.
DNS — The Part Most Engineers Skip
DNS is the system that translates hostnames to IP addresses. Understanding it prevents a category of production issues that look mysterious until you know what’s happening.
How a DNS lookup works:
- Your process asks: "What's the IP for api.example.com?"
- The resolver checks its local cache (respecting TTL)
- If not cached, it queries the configured DNS server (usually your VPC's resolver)
- The resolver queries up the hierarchy until it gets an answer
Key concepts:
TTL (Time to Live) — how long a DNS response is cached before being re-queried. If you change a DNS record and services still see the old IP, they're serving from cache; the fix is waiting for the TTL to expire (or lowering the TTL before the change), not restarting the service.
# Check TTL and actual response
dig api.example.com
# Output shows TTL in seconds:
# api.example.com. 300 IN A 10.0.1.42
# ^^^
# TTL: cached for 300 seconds
ndots — a resolv.conf option that Kubernetes sets aggressively. Names with fewer dots than ndots (Kubernetes default: 5) get the search domains appended before a literal lookup: myservice becomes myservice.default.svc.cluster.local first. Understanding this prevents "DNS resolution failed" debugging sessions.
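You can see this behavior directly in a pod's resolver configuration (the pod name mypod is hypothetical; the output shown is typical for a pod in the default namespace of a standard cluster):

```shell
kubectl exec -it mypod -- cat /etc/resolv.conf
# search default.svc.cluster.local svc.cluster.local cluster.local
# nameserver 10.96.0.10
# options ndots:5
```

The search line is what turns myservice into myservice.default.svc.cluster.local before the lookup.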
Common DNS commands:
nslookup myservice.default.svc.cluster.local # basic lookup
dig myservice.default.svc.cluster.local # detailed response
dig @10.96.0.10 myservice.default.svc.cluster.local # query specific server
# 10.96.0.10 is kube-dns in a default Kubernetes cluster
TCP and the Connection Model
TCP is the protocol under most of your application traffic. Understanding how connections work explains a class of production problems.
The TCP handshake:
- Client sends SYN
- Server responds SYN-ACK
- Client sends ACK — connection established
Connection states you’ll see in ss or netstat:
ss -tanp   # -a shows all TCP sockets; ss -tulnp would list listeners only
# LISTEN — waiting for connections
# ESTABLISHED — active connection
# TIME_WAIT — connection just closed, waiting to ensure last packet received
# CLOSE_WAIT — remote side closed, local side hasn't yet
TIME_WAIT sockets are normal and temporary. CLOSE_WAIT sockets that accumulate indicate your application isn’t closing connections properly — a common bug in services that don’t handle connection lifecycle correctly.
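A quick way to spot a leak is to tally sockets by state. This sketch assumes iproute2's ss (it prints the state in the first column, e.g. ESTAB and CLOSE-WAIT):

```shell
# Tally TCP sockets by state; an ever-growing CLOSE-WAIT count points at a leak
ss -tan | awk 'NR > 1 { count[$1]++ } END { for (s in count) print count[s], s }' | sort -rn
```

Run it a few times over a minute: TIME_WAIT churn is normal, but a monotonically rising CLOSE-WAIT number is your application holding dead connections.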
Ports:
- < 1024 — privileged ports, require root (or CAP_NET_BIND_SERVICE) to bind
- 1024–49151 — registered ports (Postgres: 5432, Redis: 6379, HTTP: 8080)
- 49152–65535 — ephemeral ports, used by clients for outbound connections
Why it matters: “Connection refused” means nothing is listening on that port. “Connection timed out” means the packet isn’t reaching the destination (firewall, security group, wrong IP). These are different problems with different fixes.
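You can reproduce the "refused" case locally with bash's /dev/tcp: connecting to a closed port fails immediately because the kernel answers with a RST (this assumes nothing is listening on port 1 of localhost, which is almost always true):

```shell
# Closed local port: the kernel sends RST at once, so the failure is instant
if ! bash -c 'exec 3<>/dev/tcp/127.0.0.1/1' 2>/dev/null; then
  echo "refused immediately: nothing listening on that port"
fi
# A timeout behaves differently: packets are silently dropped en route
# (firewall, security group, wrong IP) and the client waits until it gives up.
```

The speed of the failure is itself diagnostic: instant failure means the host answered, slow failure means nothing did.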
Load Balancers: L4 vs L7
This distinction matters for Kubernetes ingress, service meshes, and cloud load balancer selection.
Layer 4 (TCP/UDP) load balancer:
- Routes based on IP and port
- Doesn’t understand HTTP
- Fast, minimal overhead
- Can’t do path-based routing or header manipulation
- Examples: AWS NLB, kube-proxy (default Kubernetes service routing)
Layer 7 (HTTP/HTTPS) load balancer:
- Routes based on HTTP content: path, headers, host header
- Can terminate TLS
- Can do canary routing, auth injection, rate limiting
- Higher overhead, but much more capable
- Examples: AWS ALB, NGINX, Envoy, Istio, Kubernetes Ingress
L4: client → LB → service-pod (based on IP:port only)
L7: client → LB (read HTTP) → route to /api/* or /web/* (based on path)
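As a concrete sketch of the L7 case, here is a minimal nginx fragment for path-based routing (the upstream names api-backend and web-backend are hypothetical, and these location blocks would sit inside a server block):

```nginx
# L7 routing: nginx reads the HTTP request path and picks a backend
location /api/ {
    proxy_pass http://api-backend;
}
location /web/ {
    proxy_pass http://web-backend;
}
```

An L4 balancer never parses the request, so it cannot make this decision at all.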
Why it matters: When you’re configuring Kubernetes Ingress, you’re using an L7 load balancer. When you set up a Kubernetes Service of type LoadBalancer, you’re typically getting an L4 or L7 cloud load balancer depending on your cloud and annotations. Knowing the difference helps you choose correctly and debug routing issues.
TLS — Certificates and Encryption
Every production service should be encrypted in transit. TLS is the protocol that provides this.
How TLS works (simplified):
- Client connects to server, requests a TLS session
- Server presents its certificate (which includes its public key)
- Client verifies the certificate against trusted CAs
- They negotiate a shared session key using asymmetric crypto
- Subsequent communication uses symmetric encryption with the session key
Key concepts:
Certificate chain — a certificate is signed by a CA (Certificate Authority). Your server cert is signed by an intermediate CA, which is signed by a root CA that browsers and systems trust. The full chain must be presented.
SNI (Server Name Indication) — allows a single IP to serve multiple TLS certificates. The client includes the hostname in the TLS handshake, so the server knows which certificate to present. Essential for multi-tenant and multi-domain setups.
Common TLS issues:
# Inspect a certificate and the chain the server presents
openssl s_client -connect api.example.com:443 -servername api.example.com </dev/null
# Check certificate expiry
echo | openssl s_client -connect api.example.com:443 2>/dev/null | \
openssl x509 -noout -dates
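You can practice the same inspection without a live endpoint by generating a throwaway self-signed certificate (the file paths and CN here are arbitrary):

```shell
# Create a short-lived self-signed cert with no passphrase...
openssl req -x509 -newkey rsa:2048 -nodes \
  -keyout /tmp/demo-key.pem -out /tmp/demo-cert.pem \
  -days 7 -subj "/CN=api.example.com" 2>/dev/null
# ...then read its subject and validity window, as you would for a real cert
openssl x509 -in /tmp/demo-cert.pem -noout -subject -dates
```

The notBefore/notAfter lines in the output are exactly what expiry monitoring checks.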
Why it matters: Certificate expiry is a recurring production incident. Cert-manager in Kubernetes automates renewal, but you need to understand the underlying mechanism to configure it correctly and debug it when it fails.
Kubernetes Networking Specifically
Kubernetes adds several networking layers on top of standard Linux networking.
Pod networking:
- Every pod gets a unique IP from the pod CIDR
- Pods can communicate directly without NAT (flat network model)
- A CNI plugin (Calico, Flannel, Cilium) implements this
Services:
- A stable virtual IP that load-balances across pod replicas
- Implemented by kube-proxy using iptables or IPVS rules
- ClusterIP — reachable only within the cluster
- NodePort — also exposed on every node's IP at a high port
- LoadBalancer — provisions a cloud load balancer
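A minimal ClusterIP Service manifest ties these pieces together (the name myservice and label app: api are illustrative):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: myservice
spec:
  type: ClusterIP        # the default; NodePort and LoadBalancer build on it
  selector:
    app: api             # traffic goes to ready pods carrying this label
  ports:
    - port: 80           # the stable Service port (ClusterIP:80)
      targetPort: 8080   # the container port on the selected pods
```

kube-proxy then programs iptables or IPVS rules so that connections to the ClusterIP are spread across the matching pod IPs.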
DNS in Kubernetes:
- CoreDNS (running behind the kube-dns service) provides service discovery
- myservice.mynamespace.svc.cluster.local resolves to the service's ClusterIP
- Within the same namespace, just myservice works (due to search domains)
# From inside a pod, debug DNS
kubectl exec -it mypod -- nslookup kubernetes.default
kubectl exec -it mypod -- curl http://myservice.mynamespace/health
NetworkPolicies: By default, all pods can talk to all other pods. NetworkPolicies restrict this:
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-only-api
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: database
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: api
This policy allows only pods with app: api to reach pods with app: database. Everything else is denied.
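One way to verify the policy behaves as intended is to probe from short-lived pods with and without the allowed label (this assumes a Service named database in the production namespace, and a CNI that enforces NetworkPolicies):

```shell
# Probe from a pod labeled app=api: the policy should allow the connection
kubectl -n production run probe-allowed --rm -i --restart=Never \
  -l app=api --image=busybox -- nc -z -w 2 database 5432

# Probe from an unlabeled pod: the policy should block it, so nc times out
kubectl -n production run probe-denied --rm -i --restart=Never \
  --image=busybox -- nc -z -w 2 database 5432
```

If both probes succeed, check that your CNI plugin actually enforces NetworkPolicies; Flannel, for example, does not on its own.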
Conclusion
Networking fluency is the skill that most separates DevOps engineers who can trace production issues from those who can’t. DNS TTL, TCP connection states, L4 vs L7 routing, TLS certificate chains — these aren’t theoretical. They appear in real incidents regularly. Build this knowledge by practicing the diagnostic commands in a real environment, and the next time a service can’t reach another service, you’ll know exactly where to look.
Want hands-on networking labs as part of a structured DevOps curriculum? The full curriculum is at ashoklabs.com.