TheHowPage
Interactive Explainer · For experienced engineers

Operating Systems, Visualized. Schedule. Page. Lock. Crash.

The OS is hiding under every line of code you ship. Twelve hands-on sections to make the kernel obvious — and pass the interview round while you're at it.

user space · ring 3 · unprivileged
↑ syscall ↓
kernel space · ring 0 · full hardware access
Scheduler · VM / Paging · Filesystem · Network stack · Drivers

Scroll to explore ↓

An operating system is the software that owns your hardware. It decides which program gets the CPU next, where each byte of memory lives, when a disk read returns, and which process is allowed to send packets. Every keystroke, every API call, every database query you ship in production travels through the OS — usually thousands of times per second.

Most of that machinery is invisible until something goes wrong. Then suddenly you're reading a stack trace pinned in futex(FUTEX_WAIT), staring at vmstat output where si/so is climbing, or watching p99 latency double after a deploy you can't correlate to anything obvious. The OS is showing through.

This page is built for engineers who've shipped real systems and want their OS knowledge tight enough for a senior interview — and tight enough to debug production. You won't read about schedulers; you'll race them. You won't learn deadlock from a diagram; you'll build one with your hands.

New to all of this? Start with The Basics below — plain prose, no jargon, gets you the vocabulary you need for everything after.

Start Here — The Basics

Everything below assumes you know what a process, a thread, and memory are. If those words feel fuzzy, this section gets you there from scratch — patient, no jargon dumps.

Already comfortable? Skip straight to The Kernel Boundary.

1. What's actually inside a computer?

Forget software for a moment. The physical machine in front of you has three things that matter for our story: a CPU, some RAM, and a disk. Everything an operating system does is ultimately about juggling those three.

CPU
the brain

Does the actual thinking — adds, compares, copies bytes around. Your laptop CPU does this billions of times per second.

RAM
the desk

Where the CPU keeps things it's working on right now. Big, fast, but forgets everything when power is cut.

Disk (SSD/HDD)
the filing cabinet

Where files live permanently. Survives power loss. Much slower than RAM — that's why we copy things into RAM to use them.

A simple way to picture it: the CPU is the brain, RAM is the desk where it spreads out the papers it's currently working on, and the disk is the filing cabinet down the hall. Anything the CPU wants to use, it first pulls onto the desk. That's why opening a big file feels slow — it's being walked over from the cabinet to the desk.

2. Bits, bytes, and how computers count

Every wire inside the CPU is either carrying voltage or it isn't. We call those two states 1 and 0. One such state is a bit — the tiniest possible piece of information.

Bits alone aren't very useful, so we group them. Eight bits make a byte. One byte can hold a number from 0 to 255, or one ASCII character like ‘A’. Almost everything you measure on a computer is a count of bytes:

  • 1 KB ≈ 1,000 bytes — a paragraph of text
  • 1 MB ≈ 1 million bytes — a phone photo
  • 1 GB ≈ 1 billion bytes — a movie
  • 1 TB ≈ 1 trillion bytes — your laptop's disk

When someone says your laptop has “16 gigs of RAM,” they mean the CPU's desk can hold roughly 16 billion bytes at once. That's a lot of mailboxes (more on those in section 10).

3. How a CPU actually works

A CPU only knows how to do very small things: add two numbers, compare two numbers, copy a byte from here to there, jump to a different instruction. Each one of those is called an instruction. Your entire program — Chrome, Python, this web page — is just millions of those little instructions strung together.

The CPU runs them one at a time, very quickly. The clock is what keeps the beat. A 3 GHz CPU ticks 3 billion times per second, and roughly one instruction happens per tick. That's how a chip the size of a stamp ends up running an entire video game.

Most CPUs have multiple cores. Each core is a complete little CPU on its own. An “8-core” chip can be running 8 instructions at the same instant — eight different programs, or eight parts of the same program. This is real hardware parallelism, not a trick.

4. Registers — the CPU's pockets

Inside each core there are about 16–32 tiny storage slots called registers. Each one holds one number. That's it — a few dozen numbers total. But they're the fastest memory that exists in the whole computer, because they're wired directly to the part of the CPU doing the math.

When the CPU adds two numbers, it doesn't reach into RAM. It loads both into registers, adds them in a register, then (eventually) writes the result back to RAM. Think of registers as the CPU's pockets: tiny, but right there, no walking required.

5. Why CPUs have caches (L1, L2, L3)

Here's the catch: RAM is dramatically slower than the CPU. The CPU does an instruction in ~0.3 nanoseconds. Reading from RAM takes ~100 nanoseconds. If the CPU had to wait on RAM for every instruction, it would spend 99% of its time doing nothing.

So chip designers slip in caches — small, fast memories that sit between the CPU and RAM. They're arranged in a hierarchy:

CPU registers · ~32 slots · 0.3 ns
L1 cache · ~64 KB · 1 ns
L2 cache · ~512 KB · 3 ns
L3 cache · ~8 MB · 10 ns
RAM · 8–32 GB · 100 ns
SSD · 256 GB – 2 TB · 16 μs
HDD (spinning disk) · 1–20 TB · 10 ms

Each level is bigger but slower than the one above. The CPU checks the closest one first; if the data isn't there it tries the next level down. The whole point of writing “cache-friendly” code — the kind that walks an array in order — is to keep the data the CPU needs near the top of this stack.

6. Interrupts — how the CPU notices things

Picture this: the CPU is grinding through a million instructions a millisecond. You press a key. How does the CPU even know?

The keyboard chip raises a signal called an interrupt. That signal yanks the CPU's attention: it stops what it's doing, saves where it was, jumps to a small piece of OS code that handles the keypress, and then resumes. The whole detour takes microseconds.

Almost every external thing you can think of arrives as an interrupt: keyboard, mouse, network packet, disk read finished, hardware timer ticking, USB device plugged in. The OS spends a huge chunk of its life servicing interrupts. Without them, the CPU would have to constantly poll every device — exhausting and wasteful.

7. 32-bit vs 64-bit — what those numbers mean

When someone calls a CPU “64-bit,” they mean two things: its registers are 64 bits wide, and it can address up to 2^64 bytes of memory.

That second part is why we moved from 32-bit to 64-bit. A 32-bit CPU could only address 2^32 bytes — about 4 GB. The moment laptops started shipping with more than 4 GB of RAM (around 2005–2008), 32-bit was done. A 64-bit CPU can theoretically address 16 exabytes — 16 billion gigabytes. We'll be fine for a while.

One side effect: 64-bit pointers are twice as big, so 64-bit programs use slightly more memory. The trade-off is worth it because the alternative is being stuck under 4 GB of RAM per process.

8. CPU architecture families

Not all CPUs speak the same language. The set of instructions a CPU understands is called its Instruction Set Architecture (ISA). Code compiled for one ISA generally won't run on another — this is why an iPhone app and a Windows app aren't interchangeable, even when they do the same thing.

x86 / x86-64

Examples: Intel Core, AMD Ryzen

Where: Most desktops, laptops, servers

Style: CISC — many complex instructions

ARM

Examples: Apple Silicon (M1/M2/M3), Snapdragon, AWS Graviton

Where: Phones, tablets, recent Macs, cloud servers

Style: RISC — fewer simple instructions, lower power

RISC-V

Examples: SiFive, Western Digital chips

Where: Embedded, growing into laptops/servers

Style: RISC, open-source ISA — anyone can build a chip

There are a handful of others (PowerPC, MIPS, SPARC) but they're mostly historical or niche. For day-to-day work you'll see x86-64 and ARM the most.

9. Intel vs AMD vs Apple Silicon

Inside the x86-64 family, two companies make almost every chip: Intel (Core i5/i7/i9, Xeon) and AMD (Ryzen, EPYC). Their chips speak the same instruction set, so a compiled binary runs on either — you don't recompile your Linux server when you swap an Intel box for an AMD box. They compete on speed, power efficiency, and price; AMD has been particularly aggressive on core counts in the last few years.

Apple Silicon (M1, M2, M3, M4) is different. It's ARM, not x86. When Apple switched their Macs from Intel to their own ARM chips in 2020, every macOS app had to be recompiled — or run through a translator called Rosetta 2. The payoff was huge: the M1 was faster than Intel chips at a fraction of the power, partly because of ARM's simpler instruction set.

That “simpler instruction set” is the famous RISC vs CISC argument. CISC (Complex Instruction Set Computing — like x86) has lots of fancy instructions that do a lot per tick. RISC (Reduced Instruction Set — like ARM, RISC-V) has fewer, simpler instructions and trusts the compiler to combine them. Modern chips on both sides have borrowed from each other so heavily that the line is fuzzier than it used to be — but RISC generally wins on power efficiency, which is why ARM rules phones and is now eating data centers.

10. What memory really is

Picture a long street with billions of identical mailboxes, numbered 0, 1, 2, 3, … and so on. Each mailbox holds exactly one byte. That's RAM.

Each mailbox number is called an address. When your code does x = 42, what really happens is: the compiler picks an address for x, and your CPU writes the byte 42 into that mailbox. Reading it back is the same in reverse — give the CPU an address, get the byte at that mailbox.

Numbers bigger than 255 take more than one byte. A 32-bit integer occupies 4 mailboxes in a row; a 64-bit double occupies 8. The address you read or write from is the address of the first mailbox in the run.

11. How a program uses memory

When your program runs, the OS hands it a chunk of those mailboxes and the program organises them into four named regions:

Code (text)

Your program's instructions. Read-only at runtime.

Globals (.data / .bss)

Variables declared outside any function. Live the whole time.

Heap

Memory you ask for at runtime — malloc, new, lists in Python. Grows up.

Stack

Local variables and function call info. Grows down. Auto-cleaned on return.

When you write int x = 10 inside a function, x lives on the stack and disappears the moment the function returns. When you write malloc(1000) in C or new Foo() in Java, the bytes come from the heap and stay until something explicitly frees them (or, in Java/Python, until the garbage collector decides to). Every memory bug you'll ever debug is a confusion between these regions.

12. From program to running code

What happens between “double-click an icon” and “the app is running”? Roughly five steps:

  1. The OS finds the executable file on disk.
  2. It allocates a chunk of RAM and copies the code section from disk into it.
  3. It sets up the four regions from the previous section (code, globals, heap, stack).
  4. It records this whole bundle as a new process with a unique ID (the PID).
  5. It tells the CPU: “start running instructions at the entry point of this code.”

That's it — that's how every running thing on your computer got there. The terminal command ps on Linux/macOS (or Task Manager on Windows) shows you all the processes currently in step 5.

13. What a process is, really

A process is a running program plus everything that program needs to actually run: its own private memory, its open files, its current state (which instruction it's on), and a unique number called the PID.

Crucially, every process gets its own private memory. Process A and Process B can both have a variable at memory address 0x1000, and they don't interfere — the OS makes sure those two “0x1000”s point to different bytes of physical RAM. (We'll explain how that magic works in section 16.)

If you open Chrome twice, you get two Chrome processes — same program, two independent runtimes, two PIDs. Modern Chrome actually splits itself into many processes (one per tab and extension) for safety: if a malicious page crashes its process, the other tabs keep working.

14. What a thread is, really

Sometimes one process needs to do several things at once. Imagine a video player: it has to decode video, play audio, and respond to your clicks. If those happened one after the other, the UI would freeze every time a frame decoded.

A thread is one stream of execution inside a process. A process can have many threads, all sharing the same memory and files but each running its own code in parallel (or taking turns on the CPU if there are more threads than cores).

Concrete example: when you open this web page, the browser spins up a UI thread that handles your scrolling, a network thread that fetches the page, and a JavaScript thread that runs the page's code. They share the page's data because they're all in the same process. That sharing is what makes threads powerful — and also what makes them dangerous (see the “Race Conditions” section below).

15. Process vs thread — when to use which

The cheat-sheet:

use a PROCESS when…
  • you want strong isolation (a crash in one shouldn't kill the others)
  • the work is fundamentally a different program
  • security boundaries matter (browser tabs, sandboxes)
use a THREAD when…
  • you need to do several things inside one program
  • they'll share a lot of data and copying is expensive
  • you want low startup cost — threads spin up much faster than processes

Real-world examples: Chrome chooses a process per tab for safety. nginx runs a small set of event-driven worker processes; Postgres forks a process per connection; many Java servers run one process with hundreds of threads. There's no universally right answer — it's a trade-off between isolation and performance.

16. Why memory needs to be managed

Now back to the problem we hinted at: your laptop has 16 GB of RAM, and right now there are probably 200+ processes running. They can't all just freely use whatever addresses they want. Two problems:

  • Conflict: If process A writes to address 0x1000 and process B also writes to address 0x1000, somebody's data gets stomped.
  • Security: Without isolation, any program could read your password manager's memory.

The OS solves both with virtual memory. Every process gets its own private “map” of addresses. Process A's address 0x1000 and Process B's address 0x1000 secretly point to different bytes of physical RAM — the CPU translates between them on every access. Each program thinks it has the whole memory to itself; the OS makes the illusion work.

That's the foundation of everything in the “Walk a Virtual Address” section below. The boring details (page tables, the TLB, page faults) are all in service of this one trick.

17. How the OS gets running — the boot process

When you press the power button, none of the OS is loaded yet. So how does it get there? Roughly:

  1. The CPU starts running instructions at a hard-coded address that points to a small program built into the motherboard, called BIOS (or, on newer machines, UEFI).
  2. That firmware does a self-test, finds the disks, and looks for a bootloader — usually GRUB on Linux, the Windows Boot Manager on Windows, or boot.efi on macOS.
  3. The bootloader loads the kernel into RAM and hands control to it.
  4. The kernel initialises hardware (CPU caches, RAM, disks, network), mounts the root filesystem, and starts the first user-space process — init or systemd on Linux.
  5. That first process spawns everything else — login screens, background services, your desktop. Eventually you see a login prompt.

The whole sequence usually takes 2–10 seconds on a modern laptop. On servers it can be longer because firmware does more hardware checks.

18. Types of operating systems

Not all OSes do the same job. The constraints they're built for shape almost every design choice:

Desktop OSWindows, macOS, Ubuntu

Run interactive apps for one user. Make the screen, keyboard, and mouse feel snappy.

Server OSLinux (most), Windows Server

Run thousands of background processes. No display needed. Optimised for throughput, uptime, and remote management.

Mobile OSAndroid (Linux-based), iOS

Battery-aware. Sandboxed apps. Touch-first UI. Aggressive about killing background work to save power.

Embedded OSFreeRTOS, Zephyr, custom firmware

Runs on small chips inside microwaves, routers, smartwatches, car ECUs. Tiny memory, no GUI.

Real-time OS (RTOS)VxWorks, QNX, FreeRTOS

Guarantees a task finishes within a deadline. Used in pacemakers, drones, factory robots, jet engines — anywhere late = disaster.

Distributed OSPlan 9, research systems

Treats many machines as one. Rare in practice — modern clouds use Linux + Kubernetes instead.

Many of these share roots. Android's kernel is Linux. iOS shares a lot of code with macOS. The differences are mostly in what runs on top of the kernel — the UI, the app model, the security policies.

19. What the OS actually does — the layered picture

Now we can put it all together. Every line of code you ship sits on top of a stack like this. The closer to the bottom, the more privileged — and the more dangerous when it goes wrong.

Your appsChrome · Postgres · your code

Run in user mode. Can't touch hardware directly.

Standard librarieslibc · libm · libpthread

Wrap raw kernel features into nicer functions like malloc, printf, pthread_create.

KernelScheduler · VM · Filesystem · Network stack

The privileged part of the OS. Full hardware access.

Device driversNVMe · NIC · GPU · keyboard

Translators between the kernel and the actual hardware chips.

HardwareCPU · RAM · disk · network card

Metal and silicon. Doesn't know or care about your program.

When you call malloc() from C, you're calling libc, which asks the kernel for memory. The kernel updates page tables. The hardware MMU does the translation. All of that happens for one line of code. That's the cost of the illusion — and the value of it.

20. Mini glossary

All the terms you'll need for the rest of this page, defined once.

Bit
A single 0 or 1. The smallest unit of data.
Byte
A group of 8 bits. The standard chunk of memory. One ASCII character is 1 byte.
CPU
Central Processing Unit. The chip that runs your code, one instruction at a time per core.
Core
A complete CPU inside the chip. A 'quad-core' CPU has 4 of them, running in parallel.
Register
A tiny storage slot inside the CPU itself. The fastest memory there is.
Cache
Small fast memory between the CPU and RAM. Holds recently-used data so the CPU doesn't keep waiting on RAM.
RAM
Random-Access Memory. The 'desk' the CPU works on. Loses everything on power-off.
Address
A number that identifies one byte in memory. Like a house number on a street.
Program
A file on disk containing instructions. Doesn't do anything until you run it.
Process
A running instance of a program. Has its own memory and a unique PID.
Thread
One stream of execution inside a process. Threads in the same process share memory.
Kernel
The privileged core of the OS. Has full access to hardware. Other code asks the kernel for help.
OS
Operating System. The software that runs the hardware and shares it among programs.
ISA
Instruction Set Architecture. The 'language' a CPU understands — x86, ARM, RISC-V are all ISAs.
x86
The Intel/AMD CPU family. Runs most desktops and servers.
ARM
A different CPU family. Runs phones, recent Macs (Apple Silicon), and increasingly servers.
That's the foundation. Everything below — schedulers, paging, deadlock, filesystems, IPC — is just zooming in on one part of what you just read. You don't have to remember all of it. As long as you have the rough mental picture, the interactive sections will fill in the rest.

The Kernel Boundary

Every program runs in user mode (ring 3). To do almost anything useful — open a file, allocate memory, send a packet — it has to ask the kernel. Click a syscall and watch the CPU mode-switch.

user space (ring 3)
kernel space (ring 0)
click a syscall to begin
Bare syscall
~100 ns
SYSCALL/SYSRET, no work
+ KPTI mitigation
~200–400 ns
Spectre/Meltdown page-table flip
Why io_uring exists
0 syscalls
Shared rings, SQPOLL kernel thread

Why does the boundary even exist?

Without a kernel/user split, any program could write to disk sectors directly, reprogram the network card, or read another process's memory. Multics in 1965 introduced ring-based protection precisely because researchers had spent the decade watching one buggy job crash the whole machine.

The cost is real but small. A bare syscall on modern x86-64 is around 100–300 cycles via SYSCALL/SYSRET — about 30–100 ns. After Spectre/Meltdown, the kernel page-table isolation (KPTI) mitigation roughly doubles that. This is the reason io_uring exists, the reason high-frequency traders run kernel bypass with DPDK — every cycle of boundary-crossing matters when you're doing millions of ops per second.

Processes vs Threads

Both run code. Both can be scheduled. The difference is what they share. Click fork() or pthread_create() and watch the tree grow.

pid 1234 · process
Processes
1
Threads
0
Address spaces
1

What gets shared?

Hover the segments to see how each is treated by fork vs threads.

CoW gotcha: after fork(), every page is shared and read-only. The first write to any page triggers a page fault, allocates a new physical page, and copies. Memory-heavy parents that write after fork can OOM unexpectedly.

The cheapest fork() in the world

fork() looks expensive in theory — duplicate an entire address space — but Linux (like every modern Unix) cheats. It copies only the page tables and marks every page read-only with a copy-on-write bit. The actual page contents are shared until somebody writes, which triggers a fault, allocates a new physical page, and copies on demand.

That's why Postgres can fork a worker per connection, why Redis BGSAVE forks the entire dataset for snapshotting, why Chrome spawns dozens of renderer processes. Without CoW, none of those architectures would be viable. (And it's also why a memory-heavy parent that writes after fork can suddenly OOM — CoW failure is real and silent.)

Race the Schedulers

Same workload. Four scheduling algorithms. Watch the Gantt chart and metrics shift as you switch. The interview answer is never “just use SJF” — it's knowing which algorithm loses on which workload.

Workload
Algorithm
Workload
P1 · arr 0 · burst 8
P2 · arr 1 · burst 4
P3 · arr 2 · burst 9
P4 · arr 3 · burst 5
FCFS
avg wait 8.8 · avg turnaround 15.3 · avg response 8.8 · context switches 3
Gantt: P1 (0–8) → P2 (8–12) → P3 (12–21) → P4 (21–26)
First Come, First Served: run in arrival order. Simple. Suffers from the convoy effect. Real-world: batch print queues, old IBM mainframes.

What real schedulers actually do

FCFS and SJF are teaching tools, not products. No production OS uses them — FCFS suffers the convoy effect catastrophically, and SJF requires knowing each burst length in advance, which you can't. Round Robin is closer to reality, but its fairness comes at the cost of latency: a 100 ms quantum is too coarse for an interactive desktop, and a 1 ms quantum spends most of its time on context-switch overhead.

Linux ran CFS (Completely Fair Scheduler) from 2007 to October 2023. Each runnable task got a virtual runtime; the scheduler always picked the task with the lowest vruntime. CFS used a red-black tree, so picking the next task was O(log n).

In Linux 6.6 (October 2023), CFS was replaced by EEVDF (Earliest Eligible Virtual Deadline First). EEVDF still gives each task a fair share, but it also assigns a deadline — interactive tasks with short slice lengths preempt CPU-bound tasks more aggressively. The first major scheduler change in 16 years.

Walk a Virtual Address

On x86-64, every memory access goes through a 4-level page-table walk — unless the TLB caches the translation. Type or pick an address; watch the walk light up.

TLB (4 entries · L1 dTLB on real x86 holds ~64)
empty
empty
empty
empty
PML4 · bits 47–39 · index 128
PDPT · bits 38–30 · index 1
PD · bits 29–21 · index 504
PT · bits 20–12 · index 4
Offset · bits 11–0 · offset 64
physical address: resolved after the walk completes
TLB miss — up to 4 memory accesses, ~50–250 ns

Why paging beat segmentation

Early systems used segmentation: each process had a base + limit, and the CPU added the base to every address. Simple. But external fragmentation killed it — once you'd allocated and freed enough segments, you had unusable holes everywhere.

Paging fixed that with one small idea: chop memory into fixed-size pages (4 KB on x86, 16 KB on Apple Silicon). Now every page is the same size, so any free page can hold any virtual page — no fragmentation. The price is per-process page tables and the TLB to cache recent translations. On x86-64 with 4-level paging, a TLB miss can cost up to 4 memory accesses per address. That's why huge pages (2 MB, 1 GB) exist — fewer entries, more hits.

Page Replacement Battle

Pick an access pattern. Set RAM size (frames). All four algorithms run on the same trace simultaneously. The winner is the highest hit ratio. Spoiler: Bélády's Optimal always wins because it cheats by knowing the future.

Access pattern · frames: 3
Reference string (20 accesses): 7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1
FIFO: evict the oldest loaded page.
hits 5 · misses 15 · 25.0% hit ratio
ref  7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1
f0   7 7 7 2 2 2 2 4 4 4 0 0 0 0 0 0 0 7 7 7
f1   · 0 0 0 0 3 3 3 2 2 2 2 2 1 1 1 1 1 0 0
f2   · · 1 1 1 1 0 0 0 3 3 3 3 3 2 2 2 2 2 1

LRU: evict the least-recently used page.
hits 8 · misses 12 · 40.0% hit ratio
ref  7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1
f0   7 7 7 2 2 2 2 4 4 4 0 0 0 1 1 1 1 1 1 1
f1   · 0 0 0 0 0 0 0 0 3 3 3 3 3 3 0 0 0 0 0
f2   · · 1 1 1 3 3 3 2 2 2 2 2 2 2 2 2 7 7 7

Clock: approximate LRU. Linux uses a variant.
hits 6 · misses 14 · 30.0% hit ratio
ref  7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1
f0   7 7 7 2 2 2 2 4 4 4 4 3 3 3 3 0 0 0 0 0
f1   · 0 0 0 0 0 0 0 2 2 2 2 2 1 1 1 1 7 7 7
f2   · · 1 1 1 3 3 3 3 3 0 0 0 0 2 2 2 2 2 1

Optimal: evict the page used farthest in the future. Oracle — unimplementable.
winner · hits 11 · misses 9 · 55.0% hit ratio
ref  7 0 1 2 0 3 0 4 2 3 0 3 2 1 2 0 1 7 0 1
f0   7 7 7 2 2 2 2 2 2 2 2 2 2 2 2 2 2 7 7 7
f1   · 0 0 0 0 0 0 4 4 4 0 0 0 0 0 0 0 0 0 0
f2   · · 1 1 1 3 3 3 3 3 3 3 3 1 1 1 1 1 1 1
Reading the table: Each column is one access in order. Green numbers in the header are hits, red are misses. Each row is one physical frame; the contents show what page lives there at that step. Watch how Optimal evicts differently from LRU on the “locality” pattern — that gap is why approximation algorithms ship in real kernels.

Why nobody implements LRU exactly

True LRU requires updating a recency timestamp on every memory access. That's billions of writes per second on a busy server — the bookkeeping costs more than the eviction it optimizes. So real systems approximate.

Linux uses a two-list variant of Clock-Pro: an active list and an inactive list, with reference bits set by the MMU on access and cleared by a periodic scan. Pages survive on the active list only if they're touched between scans. macOS does FIFO with reactivation. Windows tracks a per-process working set. Different bets, all approximating Bélády's unreachable optimum.

Race Conditions in 60 Seconds

Spawn N threads. Each increments a shared counter (TARGET / N) times. Without a lock, two threads can read the same value, increment, and write back — one update is silently lost.

threads: 4 · target counter: 1000 · counter value: 0
Why CAS wins: A mutex around a single counter increment costs ~25 ns uncontended, ~1–10 μs contended (the syscall path). A lock-free atomic fetch_add stays in CPU cache and costs ~5–15 ns even under contention. That's why high-perf counters (Prometheus internals, Linux per-CPU stats) use atomics, not mutexes.

Build a Deadlock

Each resource is a single-instance lock. A process can hold at most one of each. Click a cell to toggle. Watch the wait-for graph — when a cycle appears, you've built a deadlock.

process
R1
DB Lock
R2
Cache Lock
R3
File Lock
R4
Net Lock
P1
P2
P3
P4
✗ DEADLOCK DETECTED
Cycle: P1 → R2 → P2 → R1 → P1
All four Coffman conditions met: mutual exclusion, hold-and-wait, no preemption, circular wait. None of these processes will ever make progress.
1. Mutual exclusion: resources can't be shared
2. Hold and wait: hold one, wait for another
3. No preemption: the OS won't take a lock back
4. Circular wait: a P→R→P→R cycle

Production deadlocks: how the real world handles them

Most production systems don't prevent deadlock — they detect it and break it. PostgreSQL maintains a wait-for graph between transactions; if a cycle appears it picks the youngest transaction and aborts it with deadlock_detected. MySQL/InnoDB does the same. Java's ThreadMXBean exposes findDeadlockedThreads() for the same purpose.

Prevention happens in code review: enforce a global lock ordering. If every codepath always acquires lock A before lock B, the circular-wait condition can't happen. The Linux kernel's lockdep checker enforces this at runtime — it builds a graph of every lock-acquisition order seen and screams when a new path violates it.

Filesystem Internals

Reading /etc/passwd looks atomic but is actually three lookups: directory entry → inode → data blocks. Watch the path light up. Then crash the disk and see why journaling exists.

1. Directory entry
  • name → inode #
  • lookup in /etc/
  • result: inode 1234
2. Inode
  • size, perms, owner
  • atime / mtime / ctime
  • block pointers ↓
3. Data blocks
  • 28 KB / 4 KB blocks
  • → 7 direct blocks
  • DMA fetches into page cache
4. Returned to user
  • copy_to_user()
  • fd offset advances
  • syscall returns
inode 1234 (typical ext4 layout)
mode:        100644  (regular, rw-r--r--)
uid / gid:   0 / 0
size:        28672 bytes
blocks:      7 × 4 KB
atime:       2026-04-25T12:30:00Z
mtime:       2026-04-20T09:14:22Z
direct[0..11]:  → blk 0x4a01 ... 0x4a07
indirect:       (unused — file fits)
double-ind:     (unused)
triple-ind:     (unused)
after crash
Click crash mid-write to see what happens.
Modern filesystems do more: ZFS and btrfs use copy-on-write — never overwrite, always write new and update pointers. APFS adds clones at the file level. XFS uses delayed allocation to batch writes. Each is a different bet on how to balance crash safety, throughput, and metadata overhead.

Pick the Right IPC

Six classic IPC mechanisms, six different jobs. Tell me what you're trying to do; I'll show you what to reach for and what it costs.

Scenarios
recommended primitive
Pipe
why

One-way, byte stream, related processes. Kernel-managed circular buffer. Blocks on full.

in the wild

ls | grep foo — bash creates a pipe, fork()s twice, dup2()s the fds.

cost
~1–5 μs per syscall
Hierarchy of choice: Same machine + same parent → pipe. Same machine + unrelated → UDS. Same machine + huge data → shared memory. Off-host → TCP/gRPC. Async signal → signal. Picking the wrong one is how you end up re-implementing TCP in user space.

How Real Operating Systems Differ

Same problems, different bets. Click an OS to highlight its choices — and notice how often FreeBSD invented something Linux later cloned.

property | Linux | Windows | macOS | FreeBSD
Kernel architecture | Monolithic (modular) | Hybrid (NT kernel) | Hybrid (XNU = Mach + BSD) | Monolithic
Scheduler | EEVDF (since 6.6, Oct 2023; replaced CFS) | Multi-level priority + boosting (NT scheduler) | Mach thread policies + BSD priority + Grand Central Dispatch | ULE — interactivity + load-balancing aware
Default scheduling unit | Thread (task_struct) | Thread | Mach thread; processes are containers | Thread
Page replacement | Active/inactive LRU lists, Clock-Pro inspired | Working-set manager + Modified Page List | FIFO + reactivation (LRU approximation) | Two-handed clock
Default page size | 4 KB (huge pages: 2 MB / 1 GB) | 4 KB (large pages: 2 MB) | 16 KB on Apple Silicon, 4 KB on Intel | 4 KB
Filesystem (default) | ext4 / btrfs / xfs | NTFS (ReFS for servers) | APFS (since 10.13, 2017) | UFS2 / ZFS
Async I/O | io_uring (since 5.1, 2019) — kernel rings | IOCP (I/O Completion Ports) | kqueue / GCD / dispatch_io | kqueue (origin of the design)
IPC standout | eBPF, futex, io_uring | ALPC (Asynchronous Local Procedure Call) | Mach ports — everything is a port | Capsicum (capability-mode sandbox)
Container primitive | namespaces + cgroups | Server Silos / Job Objects | Sandbox profiles (macOS Seatbelt) | Jails (the original, 2000)
Linux

EEVDF (Earliest Eligible Virtual Deadline First) replaced CFS in kernel 6.6 (Oct 2023). It's the first major scheduler change in 16 years.

Windows

Windows boosts thread priority when a thread receives keyboard/mouse input — that's why your foreground app feels snappy even under load.

macOS

Apple Silicon Macs use 16 KB pages — 4× the standard size. This reduces TLB pressure but makes per-page memory waste 4× worse.

FreeBSD

FreeBSD invented kqueue (1999) and Jails (2000). Linux's epoll and Docker are both descendants of FreeBSD ideas.

OS Interview Rapid-Fire

15 questions across processes, scheduling, virtual memory, page replacement, concurrency, deadlock, syscalls, filesystems, I/O, and Linux internals. No timer — just answer. Get a scorecard at the end with the topics to revise. Share if you survive.

Frequently Asked Questions

What is an operating system, in one sentence?
An operating system is the software layer that owns the hardware — it shares the CPU, memory, disk, and network among many programs while pretending each program has the machine all to itself.
What's the difference between a process and a thread?
A process is an isolated address space — its own page tables, memory map, file descriptors. A thread is a unit of execution that shares the address space of its process. Threads are cheap; processes are expensive. fork() takes ~50–200 μs; pthread_create takes ~10–30 μs.
Why does Linux use a scheduler instead of just running one program at a time?
Because the CPU is orders of magnitude faster than disk and network. While one process waits on I/O, the scheduler runs another. Modern Linux's EEVDF scheduler handles ~100,000+ context switches per second on a busy server — the CPU is almost never idle.
What is virtual memory, really?
Virtual memory is a layer of indirection: every address your program uses is a virtual address that the CPU translates to a physical address via page tables. This gives every process its own isolated address space and enables features like memory-mapped files, copy-on-write, and demand paging.
What is a page fault, and is it always bad?
A page fault is a trap the CPU raises when it accesses a virtual page with no valid page-table entry. Minor faults (page is in RAM but not mapped — common after fork()) cost ~1–2 μs. Major faults (page must be read from disk) cost ~50 μs+. Minor faults are routine; major faults during latency-critical paths are how databases die.
Why does deadlock happen, and how do real systems prevent it?
Deadlock requires Coffman's four conditions: mutual exclusion, hold-and-wait, no preemption, circular wait. Real systems break circular wait by enforcing global lock ordering (always acquire lock A before lock B). Databases detect deadlock cycles in a wait-for graph and kill one transaction.
What's the difference between a mutex and a spinlock?
A mutex puts a blocked thread to sleep — the kernel reschedules another thread. A spinlock burns CPU in a tight loop waiting for the lock. Mutex wins for long critical sections; spinlock wins for very short ones (microseconds) under low contention. The Linux kernel uses spinlocks heavily; userspace usually wants a mutex (or an adaptive one that spins briefly, then sleeps).
What changed in Linux 6.6's scheduler?
Linux 6.6 (October 2023) replaced CFS (Completely Fair Scheduler, 2007) with EEVDF (Earliest Eligible Virtual Deadline First). EEVDF gives better latency for interactive tasks while keeping CFS's fairness. It's the biggest scheduler change in 16 years.
Why is fork() so fast, even for huge processes?
Because of copy-on-write. fork() doesn't actually copy memory — it copies page tables and marks every page as read-only. The copy happens only when one of the processes writes to a page, triggering a fault. Postgres relies on this; Redis BGSAVE uses it for snapshots.
What is io_uring and why is everyone excited about it?
io_uring (Linux 5.1, 2019) replaces the syscall-per-I/O model with a pair of shared-memory rings between user space and the kernel. With SQPOLL, the kernel polls the submission ring, so submitting I/O requires no syscalls at all. ScyllaDB, TigerBeetle, and other modern databases use it.
How much should an experienced engineer actually know about OS internals?
Enough to debug production. You need to know: how processes and threads share memory, what a context switch costs, what causes page faults, how mutexes and spinlocks differ, what causes deadlock, and where the OS hides — when `strace` shows you're stuck in `futex(FUTEX_WAIT)`, you should know that means lock contention.
Where should I go to learn more after this page?
Operating Systems: Three Easy Pieces (free online, ostep.org) is the modern standard. For Linux specifically: Brendan Gregg's blog and book Systems Performance. For real-world depth: read the Linux kernel mailing list when EEVDF, io_uring, or BPF threads come up.

Sources & References

Operating Systems: Three Easy Pieces — Remzi & Andrea Arpaci-Dusseau (ostep.org)

Brendan Gregg — Systems Performance, 2nd ed. + brendangregg.com

Linux Weekly News (lwn.net) — EEVDF scheduler, io_uring, BPF

Linux 6.6 release notes — kernel.org/doc/html/v6.6/admin-guide/

Discord Engineering — How Discord Stores Trillions of Messages (discord.com/blog)

Microsoft Docs — Windows NT scheduler, IOCP, ALPC

Apple — Kernel Programming Guide (XNU + Mach)

FreeBSD Architecture Handbook — kqueue, ULE scheduler, Jails

Jeff Dean / Peter Norvig — Latency Numbers Every Programmer Should Know

Every explainer is free. No ads, no paywall, no login.

If this helped you, consider supporting the project.

Buy us a coffee