
Kubernetes - Advanced Scheduling Techniques
As we build and scale our applications in Kubernetes, we begin to run into more complex scheduling needs. The default scheduler in Kubernetes does a great job out of the box, but there are many scenarios where we need to fine-tune how and where our Pods get placed in the cluster.
In this chapter, we'll explore advanced scheduling techniques in Kubernetes. These tools and strategies help us manage workloads more effectively, optimize resource usage, and ensure that applications meet performance, compliance, and high availability requirements.
What is Pod Scheduling?
Pod scheduling is the process of assigning a Pod to a suitable node. When we create a Pod, the Kubernetes Scheduler looks at the available nodes and decides where to place that Pod, based on various factors like available CPU, memory, node taints, affinity rules, and more.
In simple cases, Kubernetes just needs to find a node that meets the Pod's resource requests. But in production clusters, we often have extra rules or policies to consider, like spreading workloads across zones, running certain workloads only on specific machines, or isolating Pods from others.
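For the simplest of those cases, a plain nodeSelector is often enough. Here's a minimal sketch, assuming some nodes carry an illustrative disktype=ssd label; the rest of this chapter builds on the same idea with more expressive rules:
apiVersion: v1
kind: Pod
metadata:
  name: simple-placement
spec:
  nodeSelector:
    disktype: ssd      # only schedule onto nodes carrying this (assumed) label
  containers:
  - name: nginx
    image: nginx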
Key Concepts in Advanced Scheduling
Let's go over the main tools we can use for advanced scheduling in Kubernetes:
- Node Affinity and Anti-Affinity
- Pod Affinity and Anti-Affinity
- Taints and Tolerations
- Scheduling Constraints and Topology Spread
- Resource Limits and Requests
- Priority Classes and Preemption
- Dynamic Scheduling (with the Descheduler)
We'll go through each of these with explanations and examples.
Node Affinity and Anti-Affinity
Node Affinity
Node Affinity is used when we want a Pod to run only on nodes that meet specific criteria, like having a certain label.
For example, let's say we have some GPU nodes labeled hardware=gpu. We want our AI workloads to run only on these nodes.
First, make sure the node has the right label:
$ kubectl label nodes node01 hardware=gpu
Output
node/node01 labeled
Next, create the following file called gpu-pod.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: gpu-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: hardware
            operator: In
            values:
            - gpu
  containers:
  - name: pause
    image: k8s.gcr.io/pause
Apply it using:
$ kubectl apply -f gpu-pod.yaml
Output
pod/gpu-pod created
This rule ensures that the Pod will only run on nodes labeled with hardware=gpu.
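To confirm the placement, check which node the Pod landed on; the NODE column should show node01 (or another node labeled hardware=gpu):
$ kubectl get pod gpu-pod -o wide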
Node Anti-Affinity
Node Anti-Affinity lets us do the opposite: we tell Kubernetes to avoid certain nodes.
For example, if we don't want our workload running on nodes with env=testing:
apiVersion: v1
kind: Pod
metadata:
  name: no-testing-pod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: env
            operator: NotIn
            values:
            - testing
  containers:
  - name: nginx
    image: nginx
Apply the file:
$ kubectl apply -f node-anti-affinity-pod.yaml
Output
pod/no-testing-pod created
This helps us avoid unwanted environments or isolate certain workloads.
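If avoiding those nodes is only a preference rather than a hard requirement, we can use the soft form of the same rule, preferredDuringSchedulingIgnoredDuringExecution. The sketch below is a variation of the Pod above; the scheduler tries to honor the preference but can still fall back to env=testing nodes when nothing else fits:
apiVersion: v1
kind: Pod
metadata:
  name: soft-no-testing-pod
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100          # higher weight = stronger preference
        preference:
          matchExpressions:
          - key: env
            operator: NotIn
            values:
            - testing
  containers:
  - name: nginx
    image: nginx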
Pod Affinity and Anti-Affinity
While Node Affinity targets the nodes, Pod Affinity and Anti-Affinity work at the Pod level.
Pod Affinity
With Pod Affinity, we can place Pods close to each other, for example co-locating a frontend and a backend for low-latency communication.
First, deploy a backend Pod with the label app=backend:
apiVersion: v1
kind: Pod
metadata:
  name: backend
  labels:
    app: backend
spec:
  containers:
  - name: nginx
    image: nginx
Apply it:
$ kubectl apply -f backend-pod.yaml
Output
pod/backend created
Now create the Pod with Pod Affinity. Save it as pod-affinity.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: affinity-pod
spec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - backend
        topologyKey: "kubernetes.io/hostname"
  containers:
  - name: busybox
    image: busybox
    command: ["sleep", "3600"]
Here, the Pod will be scheduled on a node where another Pod with app=backend is already running.
Apply it:
$ kubectl apply -f pod-affinity.yaml
Output
pod/affinity-pod created
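To confirm the co-location, compare the NODE column of both Pods:
$ kubectl get pods backend affinity-pod -o wide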
Pod Anti-Affinity Example
Now let's make sure Pods don't land on the same node, which is useful for high availability.
First, deploy a frontend Pod with the label app=frontend. Save this as frontend-pod.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: frontend
  labels:
    app: frontend
spec:
  containers:
  - name: nginx
    image: nginx
Apply it:
$ kubectl apply -f frontend-pod.yaml
Output
pod/frontend created
Create a second Pod that uses Pod Anti-Affinity. Save it as pod-anti-affinity.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: isolated-frontend
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - frontend
        topologyKey: "kubernetes.io/hostname"
  containers:
  - name: busybox
    image: busybox
    command: ["sleep", "3600"]
Now this Pod won't be placed on the same node as other frontend Pods.
Apply it:
$ kubectl apply -f pod-anti-affinity.yaml
Output
pod/isolated-frontend created
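Keep in mind that the required form of anti-affinity is strict: on a single-node cluster (Minikube, for example) this Pod will stay in Pending because there is no second node to place it on. We can see the scheduling reason with:
$ kubectl describe pod isolated-frontend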
Taints and Tolerations
Taints let nodes repel Pods. This is useful when we want to dedicate nodes to special workloads.
Let's taint a node and run a Pod that tolerates it.
$ kubectl taint nodes node01 special=workload:NoSchedule
This means no new Pod will be scheduled on node01 unless it has a matching toleration (Pods already running there are not affected by NoSchedule).
To tolerate this taint, we add:
apiVersion: v1
kind: Pod
metadata:
  name: toleration-pod
spec:
  tolerations:
  - key: "special"
    operator: "Equal"
    value: "workload"
    effect: "NoSchedule"
  containers:
  - name: busybox
    image: busybox
    command: ["sleep", "3600"]
This way, we can set up dedicated GPU nodes, batch nodes, or even isolate workloads for security or compliance reasons.
Apply the Pod:
$ kubectl apply -f toleration-pod.yaml
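We can verify that the taint is in place, and remove it again later (note the trailing minus), with:
$ kubectl describe node node01 | grep Taints
$ kubectl taint nodes node01 special=workload:NoSchedule-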
Scheduling Constraints and Topology Spread
To ensure Pods are spread evenly across zones (or other topology domains), we can use Topology Spread Constraints. This is especially useful for high availability, so that a failure in one zone doesn't bring down all replicas.
To get started, create the following file named topology-spread-deployment.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myapp
spec:
  replicas: 3
  selector:
    matchLabels:
      app: myapp
  template:
    metadata:
      labels:
        app: myapp
    spec:
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: topology.kubernetes.io/zone
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: myapp
      containers:
      - name: nginx
        image: nginx
This tells Kubernetes:
- Spread the Pods across zones, based on topology.kubernetes.io/zone.
- maxSkew: 1 means the number of matching Pods in any two zones can differ by at most 1.
Apply it with:
$ kubectl apply -f topology-spread-deployment.yaml
Output
deployment.apps/myapp created
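To check the spread, list the Pods together with the nodes (and therefore zones) they were assigned to; on a single-zone cluster the constraint is trivially satisfied, so all replicas may end up in the same zone:
$ kubectl get pods -l app=myapp -o wide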
Resource Limits and Requests
While requests and limits aren't scheduling strategies on their own, requests are what the scheduler uses to find a node with enough free capacity, and limits keep a running container from consuming more than its share.
Create the following file named resource-requests-limits.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: resource-demo
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:
        cpu: "1"
        memory: "512Mi"
Explanation:
- Requests (CPU & memory) are used by the scheduler to place the Pod.
- Limits define how much the Pod can actually consume during runtime.
Apply it with:
$ kubectl apply -f resource-requests-limits.yaml
Output
pod/resource-demo created
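Describing the Pod shows the requests and limits the scheduler worked with; since the requests are lower than the limits, Kubernetes also assigns this Pod the Burstable QoS class:
$ kubectl describe pod resource-demo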
Priority Classes and Preemption
If the cluster is full, high-priority workloads can preempt (evict) low-priority ones. This is useful when running mission-critical apps.
First, create the PriorityClass. Save this as high-priority-class.yaml:
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 100000
globalDefault: false
description: "This class is for critical workloads"
Apply it:
$ kubectl apply -f high-priority-class.yaml
Output
priorityclass.scheduling.k8s.io/high-priority created
Now, use it in a Pod. Create this file as high-priority-pod.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: critical-app
spec:
  priorityClassName: high-priority
  containers:
  - name: nginx
    image: nginx
Apply it:
$ kubectl apply -f high-priority-pod.yaml
Output
pod/critical-app created
If the cluster is under pressure, Kubernetes may evict lower-priority Pods to make space for this one.
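We can check that the priority was resolved from the PriorityClass; on clusters with the default priority admission plugin enabled, this prints 100000:
$ kubectl get pod critical-app -o jsonpath='{.spec.priority}'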
The Descheduler (for Dynamic Scheduling)
Sometimes, the initial Pod placement isn't ideal. Maybe nodes are underused or overloaded.
Kubernetes doesn't automatically move Pods after they've been scheduled. But the Descheduler can help.
The Descheduler is a separate component that:
- Analyzes the current state of the cluster
- Identifies Pods that should be moved
- Evicts them safely, allowing the Scheduler to reschedule them
It supports strategies like:
- Removing duplicates from the same node
- Balancing resource usage
- Enforcing affinity/anti-affinity
You can install it as a Job or CronJob and run it periodically. It's great for dynamic environments where workloads change over time.
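As a rough illustration, a DeschedulerPolicy enabling two of those strategies could look like the sketch below. This uses the older v1alpha1 policy format, and field names differ between Descheduler releases, so treat it as an example rather than a drop-in config and check the kubernetes-sigs/descheduler manifests for your version:
apiVersion: "descheduler/v1alpha1"
kind: "DeschedulerPolicy"
strategies:
  RemoveDuplicates:
    enabled: true
  LowNodeUtilization:
    enabled: true
    params:
      nodeResourceUtilizationThresholds:
        thresholds:          # nodes below these values count as underutilized
          cpu: 20
          memory: 20
          pods: 20
        targetThresholds:    # nodes above these values can have Pods evicted
          cpu: 50
          memory: 50
          pods: 50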
Conclusion
Kubernetes scheduling isn't just about placing Pods on nodes; it's about making smart, context-aware decisions that align with how our applications actually work in the real world.
Whether we're keeping critical services apart for high availability, co-locating tightly coupled workloads for performance, or rebalancing workloads with the Descheduler as conditions change, the tools are all there. We just need to know how and when to use them.
Advanced techniques like affinity rules, taints and tolerations, topology spread constraints, and priority classes give us precise control over how our clusters behave, not just at startup, but as they evolve and scale.
By combining these strategies, we can build clusters that are more resilient, more efficient, and more tailored to our real operational goals.