Load balancing is the method of distributing network traffic equally across a pool of resources that support an application.
Load balancers improve an application’s availability, scalability, security, and performance. They:
- Increase the fault tolerance of systems by automatically detecting server problems and redirecting client traffic to available servers.
- Direct network traffic intelligently among multiple servers.
- Can contain built-in security features that add another layer of security to Internet applications.
- Improve application performance by reducing response time and network latency.
Static load balancing algorithms follow fixed rules and are independent of the current server state.
In the round-robin method, an authoritative name server does the load balancing by returning the IP addresses of different servers in the server farm turn by turn or in a round-robin fashion.
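A minimal Python sketch of this rotation (the server IPs below are made up for illustration; a real authoritative name server would return them in its DNS responses):

```python
from itertools import cycle

# Hypothetical server farm behind the name server
servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
rotation = cycle(servers)

def resolve() -> str:
    """Return the next server IP in round-robin order."""
    return next(rotation)

for _ in range(6):
    print(resolve())  # cycles 10.0.0.1, 10.0.0.2, 10.0.0.3, 10.0.0.1, ...
```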
In weighted round-robin load balancing, administrators can assign different weights to each server based on their priority or capacity. Servers with higher weights will receive more incoming application traffic from the name server.
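A naive sketch of the weighting, again with made-up IPs and weights; production implementations usually interleave the picks (smooth weighted round robin), but the proportions are the same:

```python
# Hypothetical weights: a weight of 3 means roughly 3x the traffic of a weight of 1
weights = {"10.0.0.1": 3, "10.0.0.2": 1}

def weighted_round_robin(weights):
    """Yield servers in proportion to their weights."""
    while True:
        for server, weight in weights.items():
            for _ in range(weight):
                yield server

rotation = weighted_round_robin(weights)
print([next(rotation) for _ in range(8)])
# ['10.0.0.1', '10.0.0.1', '10.0.0.1', '10.0.0.2', '10.0.0.1', ...]
```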
In the IP hash method, the load balancer performs a mathematical computation, called hashing, on the client IP address. It converts the client IP address to a number, which is then mapped to individual servers.
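A sketch of the idea, assuming a SHA-256 hash and a simple modulo mapping (real load balancers may use other hash functions or consistent hashing):

```python
import hashlib

servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]  # hypothetical backends

def pick_server(client_ip: str) -> str:
    """Hash the client IP into a number, then map that number onto the server list."""
    digest = hashlib.sha256(client_ip.encode()).hexdigest()
    return servers[int(digest, 16) % len(servers)]

# The same client IP always lands on the same server
print(pick_server("203.0.113.7"))
print(pick_server("203.0.113.7"))
```

Because the mapping is deterministic, a given client keeps hitting the same server, which is useful for session affinity.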
Dynamic load balancing algorithms examine the current state of the servers before distributing traffic.
A connection is an open communication channel between a client and a server. When the client sends the first request to the server, they authenticate and establish an active connection between each other. In the least connection method, the load balancer checks which servers have the fewest active connections and sends traffic to those servers. This method assumes that all connections require equal processing power for all servers.
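In code, the selection reduces to a minimum over the current connection counts (the counts below are made up):

```python
# Hypothetical live connection counts tracked by the load balancer
active_connections = {"10.0.0.1": 12, "10.0.0.2": 4, "10.0.0.3": 9}

def least_connections(counts: dict) -> str:
    """Pick the server with the fewest active connections."""
    return min(counts, key=counts.get)

target = least_connections(active_connections)
active_connections[target] += 1  # the new request opens one more connection
print(target)  # 10.0.0.2
```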
Weighted least connection algorithms assume that some servers can handle more active connections than others. Administrators therefore assign a weight to each server, and the load balancer sends new client requests to the server with the fewest connections relative to its capacity.
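One common way to express this is to rank servers by active connections divided by weight; a sketch with made-up numbers:

```python
# Hypothetical capacities (weights) and current active connections
weights = {"10.0.0.1": 5, "10.0.0.2": 1}
active  = {"10.0.0.1": 8, "10.0.0.2": 2}

def weighted_least_connections(active: dict, weights: dict) -> str:
    """Pick the server with the fewest connections relative to its capacity."""
    return min(active, key=lambda s: active[s] / weights[s])

print(weighted_least_connections(active, weights))
# 10.0.0.1 wins (8/5 = 1.6) despite holding more raw connections than 10.0.0.2 (2/1 = 2.0)
```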
The response time is the total time that the server takes to process the incoming requests and send a response. The least response time method combines the server response time and the active connections to determine the best server. Load balancers use this algorithm to ensure faster service for all users.
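One simple way to combine the two signals is to score each server by connections multiplied by response time and pick the lowest score; vendors differ in the exact formula, and the measurements below are made up:

```python
# Hypothetical per-server measurements
stats = {
    "10.0.0.1": {"connections": 10, "response_ms": 40},
    "10.0.0.2": {"connections": 6,  "response_ms": 90},
}

def least_response_time(stats: dict) -> str:
    """Score = active connections x average response time; lower is better."""
    return min(stats, key=lambda s: stats[s]["connections"] * stats[s]["response_ms"])

print(least_response_time(stats))  # 10.0.0.1 (400) beats 10.0.0.2 (540)
```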
In the resource-based method, load balancers distribute traffic by analyzing the current server load. Specialized software called an agent runs on each server and calculates usage of server resources, such as its computing capacity and memory. Then, the load balancer checks the agent for sufficient free resources before distributing traffic to that server.
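A sketch of the agent check, assuming made-up usage reports and an arbitrary 80% threshold:

```python
# Hypothetical agent reports: fraction of CPU and memory currently in use per server
agent_reports = {
    "10.0.0.1": {"cpu": 0.92, "memory": 0.60},
    "10.0.0.2": {"cpu": 0.35, "memory": 0.40},
}

def has_free_resources(report: dict, cpu_limit: float = 0.8, mem_limit: float = 0.8) -> bool:
    """Only servers whose agent reports usage below both limits receive new traffic."""
    return report["cpu"] < cpu_limit and report["memory"] < mem_limit

eligible = [server for server, report in agent_reports.items() if has_free_resources(report)]
print(eligible)  # ['10.0.0.2']
```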
The table below maps common cloud load-balancer components to their Kubernetes equivalents.

| Cloud / System | Component | Purpose | Kubernetes Equivalent | Where It Runs |
|---|---|---|---|---|
| AWS | Listener | Accepts external connections on port 80/443 and forwards to a target group | Service port definition | Inside cluster via kube-proxy |
| GCP | Forwarding Rule | Maps an external IP and port to a backend service (similar to AWS Listener) | Service port definition | Managed by GCP load-balancer control plane |
| AWS | Target Group | Defines backend EC2 instances or Pods (via EKS integration) | Service endpoints (Pod IPs) | Managed by kube-controller-manager |
| GCP | Backend Service | Defines backends (VMs, MIGs, or GKE Pods) and load balancing behavior | Service endpoints (Pod IPs) | Managed by GKE controller |
| AWS | Health Check | Checks targets’ health via HTTP/TCP pings | Pod readiness/liveness probes | Runs inside Pods |
| GCP | Health Check Probe | Similar to AWS, integrated with backend service | Pod readiness/liveness probes | Runs inside Pods |
| AWS | Elastic Load Balancer (NLB / ALB) | Front-end L4/L7 routing to healthy targets | Service type=LoadBalancer (via AWS Cloud Controller Manager) | In AWS Cloud |
| GCP | Network / HTTP(S) Load Balancer | Front-end L4 (Network) or L7 (HTTP(S)) routing | Service type=LoadBalancer (via GCP Cloud Controller Manager) | In GCP Cloud |
| AWS | Failover / Auto Scaling | Replaces unhealthy nodes using EC2 Auto Scaling groups | K8s control plane (scheduler, replicasets) | Cluster-wide |
| GCP | Managed Instance Group + Autoscaler | Replaces failed nodes / Pods using GCE or GKE autoscaling | K8s control plane (Horizontal Pod Autoscaler, replicasets) | Cluster-wide |
NodePort has load-balancing behavior, but with important caveats.

DIY Load Balancing

kube-proxy evenly distributes incoming connections across all ready Pods, regardless of which node the traffic lands on, so any node's IP:NodePort works as a gateway to all Pods. This exercise should be done on a Kubernetes cluster with at least two worker nodes (at least 3 nodes in total for CloudLab).

Set `externalTrafficPolicy` to `Local` on the NodePort Service so that a node only serves traffic if it has a ready Pod for the Service. Create nodeport.yaml with the following contents:
```yaml
# A1) Namespace
apiVersion: v1
kind: Namespace
metadata:
  name: lb-nodeport
---
# A2) NGINX Deployment with 2 replicas and hard anti-affinity to split the Pods across nodes
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
  namespace: lb-nodeport
  labels: { app: web }  # We can write this on a single line with curly braces
spec:
  replicas: 2
  selector:
    matchLabels: { app: web }
  template:
    metadata:
      labels: { app: web }
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels: { app: web }
              topologyKey: "kubernetes.io/hostname"
      containers:
        - name: nginx
          image: nginx:1.27-alpine
          ports: [{ containerPort: 80 }]
          # Make the default page show which node/hostname served it
          args: ["sh","-c","echo \"Node: $(NODE_NAME) | Pod: $(hostname)\" > /usr/share/nginx/html/index.html && nginx -g 'daemon off;'"]
          env:
            - name: NODE_NAME
              valueFrom:
                fieldRef: { fieldPath: spec.nodeName }
          readinessProbe:
            httpGet: { path: "/", port: 80 }
            initialDelaySeconds: 2
            periodSeconds: 3
          livenessProbe:
            httpGet: { path: "/", port: 80 }
            initialDelaySeconds: 10
            periodSeconds: 10
---
# A3) NodePort Service (L4), externalTrafficPolicy=Local to bind traffic to the local Pod only
apiVersion: v1
kind: Service
metadata:
  name: web-np
  namespace: lb-nodeport
spec:
  type: NodePort
  externalTrafficPolicy: Local
  selector: { app: web }
  ports:
    - name: http
      port: 80
      targetPort: 80
      nodePort: 30080
```
Apply the manifest and list the Pods:

```bash
kubectl apply -f nodeport.yaml
kubectl -n lb-nodeport get pods -o wide
```

The NODE column should show the two Pods running on different worker nodes.
Then send requests to the NodePort on each node; with `externalTrafficPolicy: Local`, each node responds with its own Node/Pod name:

```bash
# Replace with your node IPs (not the pod IPs)
curl -s http://<NODE1_IP>:30080/
curl -s http://<NODE2_IP>:30080/
```
```bash
# Find the pod that sits on NODE1
kubectl -n lb-nodeport get pods -o wide
kubectl -n lb-nodeport delete pod <pod-on-NODE1>
```
Curl both nodes again right away:

```bash
curl -s --max-time 2 http://<NODE1_IP>:30080/  # likely times out / connection refused
curl -s --max-time 2 http://<NODE2_IP>:30080/  # still serves traffic
```
With `externalTrafficPolicy: Local`, NODE1 no longer has a ready endpoint, so its NodePort stops serving while NODE2 keeps working. As an exercise, recreate the example above in a different namespace, this time without `externalTrafficPolicy: Local`, and observe what happens when a Pod replica fails.