Skip to main content

CPU Scheduling

CPU scheduling decides which process/thread runs on the CPU at any given time. The scheduler is the OS component responsible for this.

Scheduling Conceptsโ€‹

CPUโ€“I/O Burst Cycleโ€‹

Processes alternate between:

  • CPU burst: Actively executing instructions.
  • I/O burst: Waiting for I/O to complete.

CPU-bound processes: Long CPU bursts, infrequent I/O (e.g., video encoding).
I/O-bound processes: Short CPU bursts, frequent I/O (e.g., web server, database).

Preemptive vs Non-Preemptiveโ€‹

Non-PreemptivePreemptive
DescriptionProcess runs until it voluntarily yields or blocksOS can forcibly remove a running process
LatencyUnpredictableBounded
ComplexitySimpleComplex (need synchronization)
ExamplesFCFS, SJF (non-preemptive)Round Robin, SRTF, Linux CFS

Scheduling Metricsโ€‹

MetricDefinition
CPU Utilization% of time CPU is busy (target: 40โ€“90%)
ThroughputProcesses completed per unit time
Turnaround TimeCompletion time โˆ’ Arrival time
Waiting TimeTime spent in ready queue
Response TimeFirst response โˆ’ Arrival time

Scheduling Algorithmsโ€‹

1. First-Come, First-Served (FCFS)โ€‹

  • Type: Non-preemptive
  • Processes scheduled in arrival order.
  • Convoy Effect: Short processes stuck behind a long one.
Process: P1(24ms) P2(3ms) P3(3ms), all arrive at t=0
Order: P1 โ†’ P2 โ†’ P3
Avg Wait: (0 + 24 + 27) / 3 = 17ms โ† poor

2. Shortest Job First (SJF)โ€‹

  • Type: Non-preemptive (or preemptive โ†’ SRTF)
  • Selects process with shortest next CPU burst.
  • Optimal average waiting time (non-preemptive: among non-preemptive algorithms).
  • Problem: Cannot know burst length in advance โ†’ estimate using exponential averaging:
ฯ„(n+1) = ฮฑ * t(n) + (1 โˆ’ ฮฑ) * ฯ„(n)

Where t(n) = actual nth burst, ฯ„(n) = predicted nth burst, ฮฑ โˆˆ [0, 1].

3. Shortest Remaining Time First (SRTF)โ€‹

  • Type: Preemptive version of SJF.
  • When a new process arrives, if its burst < remaining time of current process โ†’ preempt.
  • Optimal average waiting time overall.
  • Starvation risk for long processes.

4. Round Robin (RR)โ€‹

  • Type: Preemptive
  • Each process gets a fixed time quantum (typically 10โ€“100 ms).
  • After quantum expires, process goes to end of ready queue.
Quantum = 4ms
P1(24ms) P2(3ms) P3(3ms)

P1(4)โ†’P2(3)โ†’P3(3)โ†’P1(4)โ†’...โ†’P1(4)โ†’P1(4)โ†’P1(4)โ†’P1(4)
Avg wait = (6 + 4 + 7) / 3 โ‰ˆ 5.67ms

Quantum size matters:

  • Too large โ†’ degenerates to FCFS.
  • Too small โ†’ too many context switches (overhead).
  • Rule of thumb: 80% of CPU bursts should be shorter than the quantum.

5. Priority Schedulingโ€‹

  • Each process has a priority; highest priority runs first.
  • Preemptive or non-preemptive.
  • Starvation: Low-priority processes may never run.
  • Solution: Aging โ€” gradually increase priority of waiting processes.

6. Multilevel Queue Schedulingโ€‹

Ready queue is divided into separate queues (e.g., foreground/background). Each queue has its own scheduling algorithm.

โ”Œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ” โ† Interactive (RR, high priority)
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค โ† System processes
โ”œโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ค โ† Batch (FCFS, low priority)
โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

7. Multilevel Feedback Queue (MLFQ)โ€‹

Processes can move between queues based on behavior:

  • Use a lot of CPU โ†’ demoted to lower queue (longer quantum).
  • Wait too long โ†’ promoted (aging).

This is used in many real OS schedulers.


Linux Completely Fair Scheduler (CFS)โ€‹

Linux uses CFS (since kernel 2.6.23) for normal processes (SCHED_NORMAL).

Core Ideaโ€‹

Each process maintains a virtual runtime (vruntime) โ€” how much CPU time it has used, weighted by priority (nice value). The scheduler always picks the process with the lowest vruntime.

vruntime += actual_runtime * (NICE_0_LOAD / weight)

Higher-priority (lower nice) processes have higher weight โ†’ vruntime grows slower โ†’ they get more CPU.

Data Structureโ€‹

CFS uses a red-black tree (sorted by vruntime) for O(log n) process selection. The leftmost node is always the next process to run.

Nice Valuesโ€‹

  • Range: โˆ’20 (highest priority) to +19 (lowest priority).
  • Each step โ‰ˆ 10% more/less CPU time.
nice -n 10 ./my_program # start with nice=10
renice -n -5 -p 1234 # change running process

Scheduling Classes (Priority Order)โ€‹

  1. SCHED_DEADLINE โ€” EDF (Earliest Deadline First), real-time.
  2. SCHED_FIFO / SCHED_RR โ€” Real-time, fixed priority 1โ€“99.
  3. SCHED_NORMAL / SCHED_BATCH / SCHED_IDLE โ€” CFS.

Real-Time Schedulingโ€‹

Hard vs Soft Real-Timeโ€‹

Hard Real-TimeSoft Real-Time
Deadline missCatastrophic (safety-critical)Degraded quality
ExamplesAirbags, pacemakersVideo streaming, audio

Algorithmsโ€‹

  • Rate Monotonic (RM): Static priority; shorter period โ†’ higher priority. Optimal for static-priority preemptive scheduling.
  • Earliest Deadline First (EDF): Dynamic priority; closest deadline โ†’ highest priority. Optimal for preemptive scheduling (can achieve 100% utilization).

Multiprocessor Schedulingโ€‹

Asymmetric vs Symmetricโ€‹

  • AMP (Asymmetric): One master CPU handles scheduling, others run user code.
  • SMP (Symmetric): Each CPU self-schedules from a shared ready queue.

Processor Affinityโ€‹

The tendency to keep a process on the same CPU to exploit warm cache:

  • Soft affinity: OS tries but does not guarantee same CPU.
  • Hard affinity: Process is pinned to specific CPU(s).
taskset -c 0,1 ./program # Pin to CPU 0 and 1

Load Balancingโ€‹

  • Push migration: Overloaded CPU pushes tasks to idle CPUs.
  • Pull migration: Idle CPU pulls tasks from busy CPUs.

NUMA-Aware Schedulingโ€‹

In Non-Uniform Memory Access systems, accessing local memory is faster. The scheduler tries to keep processes on CPUs near their memory.


Common Interview Questionsโ€‹

Q1: What is the difference between preemptive and non-preemptive scheduling?โ€‹

Preemptive: OS can interrupt a running process (e.g., timer interrupt). Non-preemptive: Process runs until it voluntarily yields, completes, or blocks. Preemptive provides better responsiveness but requires synchronization mechanisms.

Q2: Why is SJF optimal but rarely used in practice?โ€‹

SJF minimizes average waiting time but requires knowing the next CPU burst length in advance, which is impossible. It can be approximated using exponential averaging of past bursts.

Q3: What is the convoy effect?โ€‹

In FCFS, a CPU-bound process with a long burst holds the CPU while many short I/O-bound processes wait. This leads to poor CPU and I/O device utilization.

Q4: How does Linux CFS prevent starvation?โ€‹

CFS tracks vruntime for all processes. A sleeping process's vruntime is set to min_vruntime of the tree when it wakes, so it gets scheduled quickly but doesn't completely skip ahead. No process is ever fully starved.

Q5: What is the difference between SCHED_FIFO and SCHED_RR in Linux?โ€‹

Both are real-time. SCHED_FIFO: process runs until it blocks or yields โ€” no time quantum. SCHED_RR: like FIFO but with a time quantum; when it expires, the process goes to the back of its priority queue.

Q6: What is processor affinity and why does it matter?โ€‹

Affinity keeps a process on the same CPU to benefit from hot cache data. Cache misses on migration are expensive (100s of ns). Linux provides sched_setaffinity() syscall.

Q7: How does the time quantum in Round Robin affect performance?โ€‹

  • Large quantum โ†’ behaves like FCFS, high turnaround for short jobs.
  • Small quantum โ†’ more responsive but high context-switch overhead.
  • Ideal: quantum just larger than a typical interaction burst.

Q8: What is aging in scheduling?โ€‹

Aging incrementally increases the priority of a process that has been waiting a long time, preventing starvation. For example, increase priority by 1 every 15 minutes of waiting.


Advanced Editorial Pass: CPU Scheduling, Fairness, and Tail Latencyโ€‹

Senior Engineering Focusโ€‹

  • Reason about p99 latency through scheduler behavior, not average CPU utilization.
  • Account for cgroup quotas and NUMA effects in containerized deployments.
  • Align workload priority policy with business-critical paths.

Failure Modes to Anticipateโ€‹

  • CPU starvation due to noisy-neighbor effects.
  • Misleading utilization metrics masking run-queue congestion.
  • Latency regressions after container CPU limit changes.

Practical Heuristicsโ€‹

  1. Track run-queue length and throttling metrics with request latency.
  2. Pin high-priority workloads carefully; validate with load tests.
  3. Re-check scheduler assumptions after kernel/runtime upgrades.

Compare Nextโ€‹