TheHowPage

System Design — The Network Layer

How Your Request Travels the Internet

You just typed a URL and pressed Enter. Follow your request through DNS, TCP, HTTP, load balancers, and CDNs — the 8-stop journey that happens 8.5 billion times a day on Google alone.

💻 Your Browser → 📖 DNS Lookup → 🤝 TCP Handshake → 🔒 TLS Encryption → 📨 HTTP Request → ⚖️ Load Balancer → 🖥️ CDN / Server → 📬 Response Back

Every time you click a link, your request doesn't teleport to a server. It takes a specific, repeatable journey through layers of infrastructure that the internet has been refining since 1969. DNS translates the name. TCP opens the connection. HTTP sends the message. Load balancers decide who handles it. CDNs shortcut the distance. And the whole trip completes in under 100 milliseconds.

This page traces that journey step by step. Not in theory — you'll actually resolve DNS lookups, simulate load balancer algorithms, toggle CDNs on and off, and test API gateways. Every number is real: Cloudflare handles 4.3 trillion DNS queries per day, Netflix serves video from 19,000+ servers inside ISPs, and Google saw 15% fewer buffering events after switching to HTTP/3.

If you haven't seen System Design 101 yet, start there for the big picture. This page goes deep on the network layer — the first half of a request's journey.

DNS — Finding the Address

Before anything else, your browser needs to turn a domain name into an IP address. Type a domain and watch the resolution cascade through 6 servers.

1. Browser Cache (~0ms)
Chrome, Firefox, Safari all keep a local DNS cache. If you visited this site recently, the answer is already here.

2. OS Cache (~0ms)
Your operating system has its own DNS cache, shared across all apps. macOS, Windows, Linux all maintain one.

3. Recursive Resolver (~1-5ms)
Your ISP's DNS server (or Cloudflare 1.1.1.1, Google 8.8.8.8). This server does the heavy lifting — it queries other servers on your behalf.

4. Root Server (.) (~5-20ms)
13 root server clusters worldwide, operated by Verisign, NASA, the US Army, ICANN, and others. They don't know specific domains — they know who to ask next.

5. TLD Server (.com) (~5-20ms)
Top-Level Domain servers know every domain registered under .com, .org, .net, etc. Verisign operates the .com TLD servers, handling 30+ billion queries per day.

6. Authoritative Server (~5-30ms)
Google's own DNS server. This is the source of truth — it has the actual IP address mapping for google.com.
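The caching stops above (browser and OS) can be sketched as a tiny TTL store. This is an illustrative model, not a real resolver; the `DnsCache` class, hostnames, and record values are hypothetical:

```python
import time

class DnsCache:
    """Toy resolver cache: answer from memory until the record's TTL expires."""

    def __init__(self):
        self._store = {}  # hostname -> (ip, expires_at)

    def put(self, host, ip, ttl, now=None):
        now = time.time() if now is None else now
        self._store[host] = (ip, now + ttl)

    def get(self, host, now=None):
        now = time.time() if now is None else now
        entry = self._store.get(host)
        if entry is None:
            return None            # miss: ask the next layer in the cascade
        ip, expires_at = entry
        if now >= expires_at:
            del self._store[host]  # expired: must re-resolve
            return None
        return ip

cache = DnsCache()
cache.put("example.com", "93.184.216.34", ttl=300, now=1000.0)
print(cache.get("example.com", now=1100.0))  # fresh: answered in ~0ms
print(cache.get("example.com", now=1400.0))  # None: the 300s TTL has passed
```

A hit at steps 1 or 2 is why repeat visits skip the whole cascade; a miss falls through to the recursive resolver.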

DNS Record Types

A Record (IPv4 Address)
google.com → 142.250.80.46
What it does

Maps a domain name to a 32-bit IPv4 address. This is the most common record — every website you visit resolves through an A record.

Why it matters

Without an A record, browsers have no idea which server to connect to. When you type google.com, the A record is what turns that into an actual machine on the internet. Load balancers return different A records to distribute traffic across servers.


The 50-millisecond tax you pay every time

DNS is invisible until it breaks. In October 2021, Facebook engineers accidentally withdrew their BGP routes, making Facebook's DNS servers unreachable. For 6 hours, 3.5 billion people couldn't access Facebook, Instagram, or WhatsApp — not because the servers were down, but because nobody could find their IP addresses.

That's why DNS has extraordinary redundancy: 1,700+ physical root servers, aggressive caching at every level, and Anycast routing that sends you to the nearest server automatically. Cloudflare's 1.1.1.1 resolver alone handles 4.3 trillion queries per day across 330+ cities. The system is designed so that even if entire continents go offline, DNS keeps resolving.

Modern DNS does more than look up addresses. Netflix uses Route 53 for latency-based routing — returning the IP of whichever server is closest and healthiest. GitHub uses Anycast so the same IP routes to different datacenters depending on where you are. DNS isn't just a phone book — it's a traffic director.

TCP & HTTP — Connect and Speak

DNS gave you an IP. Now your browser connects (TCP) and speaks (HTTP). These two protocols power 99% of the web.

TCP — The Three-Way Handshake

Before exchanging data, client and server must agree to talk. This costs one round trip — 0.5ms to a nearby datacenter, 150ms across the ocean.

💻 Client → 🖥️ Server: SYN
🖥️ Server → 💻 Client: SYN+ACK
💻 Client → 🖥️ Server: ACK
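The exchange above can be replayed as a toy state machine. The states are named after real TCP states, but no sockets are involved; this is only a model of the message order:

```python
def handshake():
    """Replay the three-way handshake as messages between two state machines."""
    transcript = []
    client, server = "CLOSED", "LISTEN"

    transcript.append("client -> server: SYN")      # client proposes a connection
    client, server = "SYN_SENT", "SYN_RECEIVED"

    transcript.append("server -> client: SYN+ACK")  # server agrees and syncs back
    transcript.append("client -> server: ACK")      # client confirms; data can flow
    client = server = "ESTABLISHED"

    return transcript, client, server

msgs, client_state, server_state = handshake()
print(len(msgs), client_state, server_state)  # 3 ESTABLISHED ESTABLISHED
```

Three messages, one full round trip before a single byte of application data moves, which is exactly the cost the section above describes.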

TCP vs UDP

| Feature | TCP | UDP |
| --- | --- | --- |
| Connection | 3-way handshake required | No handshake — fire and forget |
| Reliability | Every packet acknowledged and retransmitted if lost | Packets may be lost, no retransmission |
| Ordering | Packets delivered in order, reassembled | No ordering guarantee |
| Speed | Slower — overhead from acknowledgments | Faster — no overhead |
| Use cases | HTTP, email, file transfer, SSH | Video calls (Zoom), gaming, DNS queries, live streaming |
| Analogy | Registered mail — signature on delivery | Shouting across a room — some words get lost |

UDP — Fire and Forget

No handshake. No acknowledgment. No retransmission. UDP blasts packets as fast as possible — if some get lost, that's fine. Speed over reliability.

💻 Client → 🖥️ Server: PKT 1, PKT 2, PKT 3, PKT 4, PKT 5, PKT 6, PKT 7, PKT 8 (no ACKs, no retransmits)
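The difference is easy to simulate. `deliver` is a made-up helper with a fixed set of lost packets: the UDP path simply loses them, while the TCP path pays extra round trips to recover them:

```python
def deliver(packets, lost, retransmit):
    """Simulate a lossy link; `lost` holds sequence numbers dropped on first send."""
    delivered, retransmissions = [], 0
    for seq, payload in enumerate(packets, start=1):
        if seq in lost:
            if not retransmit:
                continue             # UDP: the packet is simply gone
            retransmissions += 1     # TCP: a missing ACK triggers a resend
        delivered.append(payload)
    return delivered, retransmissions

packets = [f"PKT {n}" for n in range(1, 9)]
udp_got, _ = deliver(packets, lost={3, 6}, retransmit=False)
tcp_got, resends = deliver(packets, lost={3, 6}, retransmit=True)
print(len(udp_got), len(tcp_got), resends)  # 6 8 2
```

Six of eight packets arriving is fine for a video call; for a file transfer it is corruption, which is why each protocol owns its niche.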

Why Choose UDP?

When latency matters more than completeness. A retransmitted video frame or game update arrives too late to be useful, so dropping it and moving on beats stalling the stream. That's why Zoom, online games, and live streaming all ride on UDP, and why HTTP/3's QUIC builds its own reliability on top of it.

HTTP — The Language of the Web

TCP opened the pipe. Now your browser and server speak HTTP — a simple text protocol of requests and responses. Every web interaction is one of these methods.

GET (read a resource)

GET /users/42 — fetch user profile
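Under the hood, a request like this is just text on the wire. A sketch of what an HTTP/1.1 client actually sends; the header names are standard, while the host and path are examples:

```python
def build_request(method, path, host, headers=None):
    """Assemble the raw text an HTTP/1.1 client puts on the wire."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    lines += [f"{name}: {value}" for name, value in (headers or {}).items()]
    return "\r\n".join(lines) + "\r\n\r\n"   # blank line ends the header block

raw = build_request("GET", "/users/42", "api.example.com",
                    {"Accept": "application/json"})
print(raw)
```

The server's reply follows the same shape: a status line, headers, a blank line, then the body. The whole protocol is readable with a text editor, which is why `curl -v` is such an effective debugging tool.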

Status Codes — What the Server Said Back

The three-digit code at the top of every response tells you what happened: 1xx informational, 2xx success (200 OK, 201 Created), 3xx redirection (301 Moved Permanently, 304 Not Modified), 4xx client error (400 Bad Request, 404 Not Found, 429 Too Many Requests), 5xx server error (500 Internal Server Error, 503 Service Unavailable).

The Protocol Race — HTTP/1.1 vs 2 vs 3

Loading a page needs 70+ requests. How fast they travel depends on which HTTP version you're using.

Google saw a 15% reduction in rebuffering on YouTube after switching to QUIC — millions fewer 'buffering' spinners per day.

Why we're still iterating after 30 years

HTTP/1.1 shipped in 1997 and powered the web for nearly two decades. It was simple, text-based, and worked. But as pages grew from a few kilobytes to 2-3 megabytes with 70+ resources, the one-request-at-a-time model became a bottleneck. Browsers hacked around it by opening 6 parallel TCP connections — a workaround, not a solution.

HTTP/2 (2015) fixed this with multiplexing — many requests over one connection. Google had been using their own version (SPDY) since 2009, proving the concept. But HTTP/2 still ran on TCP, and TCP has a fatal flaw: head-of-line blocking. One lost packet stalls every stream on the connection.

HTTP/3 (2022) solved this by abandoning TCP entirely. It runs on QUIC, a Google-designed protocol built on UDP. Each stream is independent — a lost packet only affects its own stream. Plus, QUIC combines the connection handshake with TLS encryption in a single round trip (or zero for repeat connections). Google reported 15% fewer buffering events on YouTube after deploying QUIC. That's millions of people who stopped staring at a spinner.
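A back-of-envelope model of why multiplexing matters, assuming every request costs exactly one round trip and ignoring bandwidth, pipelining, and server processing time (a deliberate simplification, not a benchmark):

```python
import math

def http1_time(n_requests, rtt_ms, parallel_conns=6):
    """HTTP/1.1: requests queue on at most 6 connections, one at a time each."""
    rounds = math.ceil(n_requests / parallel_conns)
    return rounds * rtt_ms

def http2_time(n_requests, rtt_ms):
    """HTTP/2: all streams share one connection concurrently."""
    return rtt_ms

print(http1_time(70, rtt_ms=50))  # 600: 12 sequential rounds of 6 requests
print(http2_time(70, rtt_ms=50))  # 50: one multiplexed round trip
```

Even in this crude model, a 70-resource page over a 50ms link is an order of magnitude faster with multiplexing, and HTTP/3's per-stream loss recovery keeps that advantage on lossy networks.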

API Design — How Clients Talk to Servers

You know how requests travel (DNS → TCP → HTTP). Now here's the language they speak. REST, GraphQL, and gRPC each solve the same problem differently — and the choice shapes everything from mobile performance to team productivity.

REST vs GraphQL vs gRPC — Same Query, Three Approaches

REST: the universal default

Resource-based. Each URL is a noun (/users/123), HTTP verbs are actions (GET, POST, PUT, DELETE). Stateless — every request carries all context. The lingua franca of public APIs.

Protocol: HTTP/1.1 or HTTP/2 · Payload: JSON (text)

Strengths

  • Universal — every language, every platform, every tool supports it
  • Human-readable JSON payloads, easy to debug with curl
  • Massive ecosystem: Swagger/OpenAPI, Postman, API gateways
  • Cacheable — GET responses cache naturally with HTTP headers

Weaknesses

  • Over-fetching: GET /users/123 returns 50 fields when you need 3
  • Under-fetching: need user + posts + followers = 3 separate requests
  • No built-in schema — clients discover the API by reading docs
  • Versioning headaches: /v1/users vs /v2/users drift over time

Example request:

GET /api/users/123 HTTP/1.1
Host: api.example.com
Authorization: Bearer sk_live_...

Best for: Public APIs, CRUD apps, anything that needs maximum compatibility

Stripe, Twilio, GitHub (v3), Twitter/X. Stripe's REST API is considered the gold standard — clear resource naming, consistent error format, idempotency keys on every POST.

Which one should I use? (Decision Flowchart)


Is this a public API for external developers?

Pagination — Offset vs Cursor

Offset pagination: skip N rows, take M. Simple math — page 3 with 10 per page = OFFSET 20 LIMIT 10.

SELECT * FROM posts ORDER BY id LIMIT 10 OFFSET 20

Pros

  • + Simple to implement — just OFFSET + LIMIT
  • + Jump to any page: page 1, page 50, page 999
  • + Easy for UI with numbered page buttons

Cons

  • - Slow at scale: OFFSET 1,000,000 scans and discards 1M rows
  • - Unstable: if a row is inserted/deleted, items shift between pages
  • - Gets slower linearly — page 1 is fast, page 10,000 is painful

Fine for admin dashboards, internal tools, datasets under 100K rows. Most SQL ORMs default to offset pagination.
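Both styles are easy to compare against an in-memory SQLite table. The `posts` schema and ids here are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT)")
conn.executemany("INSERT INTO posts (id, title) VALUES (?, ?)",
                 [(i, f"post {i}") for i in range(1, 101)])

# Offset pagination: page 3 of 10 scans and discards the first 20 rows.
offset_page = conn.execute(
    "SELECT id FROM posts ORDER BY id LIMIT 10 OFFSET 20").fetchall()

# Cursor pagination: remember the last id seen and seek directly past it.
last_seen_id = 20
cursor_page = conn.execute(
    "SELECT id FROM posts WHERE id > ? ORDER BY id LIMIT 10",
    (last_seen_id,)).fetchall()

print([row[0] for row in offset_page])  # [21, 22, ..., 30]
print(offset_page == cursor_page)       # True: same page either way
```

Same result on page 3, but the cursor query stays an index seek no matter how deep you go, while OFFSET keeps rescanning everything it skips.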

Idempotency — Why Stripe Never Charges You Twice

Network fails. User clicks “Pay” again. Without idempotency, they get charged twice. With an idempotency key, the server recognizes the retry and returns the original result.

1. Client generates UUID
2. Sends request with key
3. Server checks key
4. Response cached
5. Retry is safe
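The five steps fit in a few lines. `PaymentServer` is a hypothetical stand-in for the pattern Stripe describes, not their actual implementation:

```python
import uuid

class PaymentServer:
    """Caches the first response per idempotency key; retries replay it verbatim."""

    def __init__(self):
        self._responses = {}  # idempotency key -> cached response
        self.charges = 0

    def charge(self, key, amount_cents):
        if key in self._responses:
            return self._responses[key]   # retry detected: no second charge
        self.charges += 1                 # only ever reached once per key
        response = {"status": "succeeded", "amount": amount_cents}
        self._responses[key] = response
        return response

server = PaymentServer()
key = str(uuid.uuid4())                # 1. client generates a UUID
first = server.charge(key, 1999)       # 2-4. request processed, response cached
retry = server.charge(key, 1999)       # 5. the retry is safe
print(first == retry, server.charges)  # True 1
```

In production the key store needs to survive restarts and expire old keys, but the core contract is exactly this: same key, same response, one side effect.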

API Versioning — Three Strategies

URL path versioning: GET /v2/users/123
Pros: Obvious, easy to route, easy to cache (different URLs = different cache keys)
Cons: A URL is supposed to name a resource, not a version, which bothers REST purists. Old versions pile up.

Who uses it: Stripe (/v1), GitHub REST (/v3), Twitter/X (/2), Google Maps (/v1)

Why the best teams use all three

Netflix doesn't pick one API style — they use all three. Their consumer-facing edge runs federated GraphQL, letting the TV app, mobile app, and web app each request exactly the fields they need from a single endpoint. Between internal microservices, they use gRPC for its 10x latency advantage — critical when a single “press play” triggers 50+ service calls. And their public partner API? REST, because every integration partner already knows how to call a REST endpoint.

Shopify made an even bolder bet. In February 2025, they mandated that all new public apps must use their GraphQL API — no more REST for new integrations. The result: 30-50% reduction in mobile bandwidth for storefront queries, because apps stop downloading product fields they never display. GitHub took a similar path, launching their GraphQL API (v4) after realizing that a single mobile screen required 11 separate REST calls to assemble.

The pattern is clear: REST for public compatibility, GraphQL for frontend flexibility, gRPC for internal speed. The interview answer isn't “which one is best” — it's “which one for which layer.”

Load Balancer — Who Handles Your Request?

Your request arrived at a wall of servers. Someone decides which one handles it. Switch algorithms and watch the behavior change in real time.

Requests go to each server in sequence: 1, 2, 3, 1, 2, 3... Simple rotation, no state needed.

Pros:

Dead simple. Zero overhead. Works great when all servers are identical and requests are uniform.

Cons:

Blind to server health and load. If Server 3 is handling a heavy query, Round Robin still sends traffic.

DNS round-robin is the simplest form — AWS returns multiple IPs and clients pick randomly.

Current connections: S1: 2, S2: 5, S3: 8, S4: 1, S5: 3. Least Connections sends the next request to S4, the quietest server.
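Both algorithms fit in a few lines. The connection counts mirror the example above; this is a sketch of the selection logic, not a production balancer:

```python
import itertools

servers = {"S1": 2, "S2": 5, "S3": 8, "S4": 1, "S5": 3}  # active connections

# Round Robin: rotate through servers regardless of their load.
rotation = itertools.cycle(servers)
rr_picks = [next(rotation) for _ in range(3)]

# Least Connections: always send to the currently quietest server.
def least_connections(conns):
    return min(conns, key=conns.get)

lc_pick = least_connections(servers)
print(rr_picks, lc_pick)  # ['S1', 'S2', 'S3'] S4
```

Round Robin is stateless and blind; Least Connections needs live connection counts but routes around busy servers automatically.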

L4 vs L7 — How Deep Does It Look?

Google's Maglev load balancer handles 10+ million packets per second per machine using kernel bypass.

No single machine can handle the internet

Google processes 8.5 billion searches per day. Netflix streams to 230 million subscribers simultaneously. Discord maintains millions of concurrent WebSocket connections. No single computer, no matter how powerful, can handle this. Load balancers are the reason these services don't collapse.

The choice of algorithm matters more than most engineers realize. Round Robin works fine for stateless APIs where every request takes roughly the same time. But for database connections that vary wildly in duration, Least Connections prevents hot spots. For caches that benefit from affinity, Consistent Hashing ensures the same key always hits the same server — and when you add or remove a server, only 1/N of traffic gets disrupted instead of all of it.
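A minimal consistent-hash ring, using MD5 purely as a convenient stable hash and 100 virtual nodes per server (both arbitrary choices). The key property the paragraph above describes: adding a server moves keys only onto the new server, never between old ones:

```python
import bisect
import hashlib

class HashRing:
    """Consistent hashing: each key belongs to the next server clockwise on a ring."""

    def __init__(self, nodes, vnodes=100):
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes))
        self._points = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def lookup(self, key):
        idx = bisect.bisect(self._points, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

small = HashRing(["cache-a", "cache-b", "cache-c"])
large = HashRing(["cache-a", "cache-b", "cache-c", "cache-d"])
keys = [f"user:{i}" for i in range(1000)]
moved = sum(small.lookup(k) != large.lookup(k) for k in keys)
print(f"{moved / len(keys):.0%} of keys moved")  # roughly 1/N, all onto cache-d
```

With naive modulo hashing (`hash(key) % num_servers`), adding a fourth server would have remapped about three quarters of the keys instead.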

The NGINX vs HAProxy debate is one of the longest-running in infrastructure. NGINX powers 33% of all websites — 10.9 million deployments. But HAProxy processes 35% more requests per CPU and uses half the memory. Most teams pick NGINX for its versatility (web server + reverse proxy + load balancer) and HAProxy for raw load balancing performance. Increasingly, Envoy (created by Lyft, now CNCF) is replacing both in cloud-native architectures.

CDN — Don't Make Them Cross the Ocean

A user in Mumbai shouldn't wait for a server in Virginia. CDNs cache content at edge servers worldwide. Toggle CDN off to see the difference.


Push vs Pull CDN

Pull CDN: edges fetch content from the origin on first request, then cache it. The first user is slow; everyone after is fast.

Best for: Content that changes frequently — news articles, API responses, user-generated content.

News sites use pull CDNs. Articles change every few minutes. Pushing to 300+ edge locations for content with a 5-minute lifespan wastes bandwidth.
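Pull-CDN behavior in miniature. `PullCdnEdge` is an illustrative model, not any real CDN's code:

```python
class PullCdnEdge:
    """Edge cache that fetches from origin on a miss, then serves from memory."""

    def __init__(self, origin):
        self._origin = origin
        self._cache = {}
        self.origin_fetches = 0

    def get(self, path):
        if path not in self._cache:       # first request: slow trip to origin
            self.origin_fetches += 1
            self._cache[path] = self._origin[path]
        return self._cache[path]          # everyone after: fast edge hit

origin = {"/article/42": "<html>breaking news</html>"}
edge = PullCdnEdge(origin)
for _ in range(1000):                     # 1,000 readers, one origin fetch
    edge.get("/article/42")
print(edge.origin_fetches)  # 1
```

A real edge also honors the TTLs below, evicting entries when they expire so stale articles don't linger.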

TTL — How Long to Cache?

| Content | TTL | Why |
| --- | --- | --- |
| Static images (PNG, JPG) | 1 week (604,800s) | Rarely changes. Cache aggressively. |
| CSS/JS bundles | 1 year (31,536,000s) | Filename includes a hash — new content = new URL. |
| HTML pages | 1-5 minutes | Content changes frequently. Short TTL = fresh. |
| API responses | 0 (no-cache) | Personalized and dynamic. Never cache. |
| Video files | 1 month+ | Large, expensive to transfer. Cache forever. |

Netflix's Open Connect has 19,000+ servers inside 1,500+ ISP locations across 100+ countries. Netflix provides the hardware free — ISPs provide power and space.

The internet is not in the cloud — it's underwater

There are 550+ submarine cables carrying 99% of intercontinental data. The longest, SEA-ME-WE 3, stretches 39,000 km from Germany to Australia. A packet from Mumbai to Virginia travels through fiber optic cable under the Indian Ocean, around the Arabian Peninsula, through the Mediterranean, across the Atlantic. That physical distance — roughly 14,000 km each way — works out to about 180ms of round-trip latency, bounded by the speed of light in fiber.

CDNs exist because physics is non-negotiable. You can't make light travel faster, but you can put the content closer. Netflix understood this better than anyone — instead of renting CDN capacity from Akamai (which at 15% of internet traffic would cost billions), they built Open Connect. They manufacture custom server boxes and give them to ISPs for free. ISPs agree because it reduces their upstream bandwidth costs by 95%. When you press play on Netflix, the video likely comes from a box sitting inside your ISP's building, one network hop away.

The CDN market is $37 billion in 2026 and growing at 20% per year. Akamai holds 30-40% market share with 365,000+ edge servers. Cloudflare serves 20% of all internet traffic from 330+ cities. And increasingly, the “edge” isn't just caching static files — Cloudflare Workers and Deno Deploy run entire applications at the edge, executing your code in the city closest to the user.

Proxies & API Gateways — The Bouncers

Before hitting your servers, requests pass through gatekeepers that handle security, routing, and rate limiting — so your backend doesn't have to.

Forward vs Reverse Proxy

Internet → Reverse Proxy (api.company.com) → Servers 1, 2, 3...

Sits between the internet and the SERVERS. Hides server identity from clients. You hit one domain, unaware there are 100 servers behind it.

Who uses it: Server operators — every major website uses one.

Every large website runs a reverse proxy — NGINX, HAProxy, Envoy. They handle SSL termination, compression, caching, and rate limiting at the edge.

API Gateway — The Swiss Army Knife

Toggle features on/off and send a request. Watch it pass through each layer.

Client → Rate Limiting → Authentication → Routing → Backend
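The rate-limiting stage is typically a token bucket. A sketch under simplified assumptions (single process, caller supplies the clock; real gateways shard this across nodes, often in Redis):

```python
class TokenBucket:
    """Classic gateway rate limiter: allows a burst up to capacity, then refills steadily."""

    def __init__(self, capacity, refill_per_sec):
        self.capacity = capacity
        self.tokens = float(capacity)
        self.refill = refill_per_sec
        self.last = 0.0

    def allow(self, now):
        elapsed = now - self.last
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True    # request continues to auth and routing
        return False       # reject with 429 Too Many Requests

bucket = TokenBucket(capacity=3, refill_per_sec=1)
burst = [bucket.allow(now=0.0) for _ in range(5)]
print(burst)              # [True, True, True, False, False]
print(bucket.allow(2.0))  # True: two seconds of refill bought new tokens
```

The bucket shape is why token buckets are popular: clients get short bursts for free, but sustained abuse settles to the refill rate.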

Popular API Gateways

Kong (open-source)

Plugin ecosystem (100+ plugins). Built on NGINX + Lua.

Used by 60,000+ organizations.

Netflix Zuul (open-source, Java)

Battle-tested at Netflix scale. 2+ billion requests/day.

Netflix, Amazon, and other Java shops.

AWS API Gateway (managed service)

Zero infrastructure. Auto-scales. Pay per request.

Startups and enterprises on AWS.

Envoy (open-source, C++)

Cloud-native. Service mesh (Istio). Observability built-in.

Lyft (created it), Uber, Airbnb, Stripe.

One gateway to rule them all (and the risks)

API gateways solve a real problem: without one, every microservice duplicates auth logic, rate limiting, logging, and SSL termination. Netflix has ~1,000 microservices — maintaining auth code in each would be a maintenance nightmare. Zuul centralizes it: change the auth method once, it applies everywhere.

But centralization has a cost. The gateway becomes a single point of failure and a performance bottleneck. Every request pays the gateway tax — parsing, auth checking, routing. At Uber's scale (100K+ requests per second), even 1ms of gateway latency means 100,000 milliseconds of aggregate delay per second.

This is why the industry is moving toward service meshes — tools like Istio and Linkerd that embed gateway logic into a sidecar proxy next to each service, rather than funneling everything through a central gateway. Envoy (created by Lyft) is the most popular sidecar proxy, used by Uber, Airbnb, and Stripe. The tradeoff: more operational complexity, but no single point of failure.

Real-Time Communication — When HTTP Isn't Enough

Chat, live notifications, multiplayer games — these need data pushed to you instantly, not when you ask for it. Four approaches, wildly different tradeoffs.

WebSocket: Client ↔ Server (bidirectional)

Single persistent connection. Both sides send data freely, anytime. No HTTP overhead after the initial handshake.

Wasted bandwidth: 0%
Pros:

True real-time. Lowest latency. Full duplex — both sides talk simultaneously. Binary and text support.

Cons:

More complex infrastructure. Stateful connections harder to load balance. Need fallback for restrictive firewalls.

Slack uses WebSocket for messages + Redis pub-sub for channel fan-out. Discord handles millions of concurrent WebSocket connections with Elixir on the Erlang VM.

Bandwidth Comparison

  • HTTP Polling: ~90% wasted
  • Long Polling: ~5% wasted
  • Server-Sent Events (SSE): ~0% wasted
  • WebSocket: 0% wasted

How Discord handles millions of simultaneous conversations

Discord maintains millions of concurrent WebSocket connections — every user in every voice channel and text channel is a persistent connection. They chose Elixir (built on the Erlang VM) for their gateway servers because Erlang was literally designed for this: it was built by Ericsson in the 1980s to handle millions of phone calls simultaneously.

But WebSockets alone aren't enough at scale. When you send a message to a Discord channel with 50,000 members, that message needs to reach 50,000 WebSocket connections, potentially across dozens of servers. Discord uses Redis Pub/Sub for this fan-out — the message is published once to a Redis channel, and every gateway server subscribed to that channel pushes it to their connected clients.

ChatGPT, interestingly, doesn't use WebSockets. It uses Server-Sent Events (SSE) — a one-way stream from server to client. This makes sense: you type a prompt (regular HTTP POST), and the server streams tokens back one at a time. There's no need for bidirectional communication. SSE is simpler, works through firewalls, and auto-reconnects. The right tool for the right job.
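SSE framing is simple enough to sketch in a few lines. The `[DONE]` sentinel mirrors OpenAI's streaming convention; the tokens and generator here are made up for illustration:

```python
def sse_stream(tokens):
    """Yield Server-Sent Events frames the way a ChatGPT-style backend streams text."""
    for tok in tokens:
        yield f"data: {tok}\n\n"   # one frame per token: "data: ..." plus a blank line
    yield "data: [DONE]\n\n"       # end-of-stream sentinel

frames = list(sse_stream(["Hello", ",", " world"]))
print(frames[0])    # data: Hello
print(len(frames))  # 4: three tokens plus the DONE sentinel
```

The client just opens a long-lived GET with `Accept: text/event-stream` and reads frames as they arrive; no handshake upgrade, no bidirectional plumbing.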

Frequently Asked Questions

What happens when I type a URL and press Enter?
Your browser performs a DNS lookup to find the server's IP address, establishes a TCP connection (3-way handshake), negotiates TLS encryption, sends an HTTP request, receives the response, and renders the HTML. This entire process takes 100-500ms depending on distance and caching.

Why are there only 13 DNS root server addresses?
The original DNS specification limited UDP packets to 512 bytes, which could only fit 13 server addresses. But each 'server' is actually a cluster — there are over 1,700 physical root servers worldwide using Anycast routing, where the same IP address routes to the nearest physical server.

What's the difference between a load balancer and a reverse proxy?
A reverse proxy sits in front of servers and forwards requests — it can do SSL termination, caching, and compression. A load balancer is a specific type of reverse proxy that distributes traffic across multiple servers. All load balancers are reverse proxies, but not all reverse proxies are load balancers.

Why did HTTP/3 switch from TCP to UDP?
TCP suffers from head-of-line blocking — if one packet is lost, ALL streams on that connection are stalled until it's retransmitted. HTTP/3 uses QUIC (built on UDP), which handles streams independently. A lost packet only affects its own stream, and the protocol includes 0-RTT connection resumption.

How does a CDN know what to cache?
CDNs use Cache-Control headers set by the origin server. These headers specify TTL (how long to cache), whether content is public or private, and whether it can be stored at all. Most CDNs also let you configure caching rules by file type, URL pattern, or custom headers.

Why doesn't everything use WebSockets?
WebSockets maintain persistent connections, which consume server memory and make load balancing harder. For request-response patterns (90% of web traffic), regular HTTP is simpler, more cacheable, and scales better. WebSockets are only necessary when you need real-time bidirectional communication.

What's the difference between L4 and L7 load balancing?
L4 (transport layer) load balancers route based on IP address and port — they're fast but can't read HTTP content. L7 (application layer) load balancers can read URLs, headers, cookies, and request bodies — they're smarter but slower. Most production systems use L7 for the routing flexibility.

How does Netflix serve 15% of internet traffic?
Netflix built Open Connect — their own CDN with 19,000+ servers deployed inside ISPs worldwide. They give ISPs free hardware in exchange for rack space and power. When you press play, the video comes from a box inside your ISP, possibly one network hop away. This saves Netflix billions in bandwidth costs and gives users better quality.

Why do API gateways exist? Can't each service handle auth itself?
They can, but that means duplicating auth logic, rate limiting, logging, and SSL across every microservice. Netflix has ~1,000 microservices — maintaining auth code in each would be a nightmare. A gateway centralizes cross-cutting concerns in one place. Change the auth method once, and it applies everywhere.

Is DNS a single point of failure?
By design, no. DNS has massive redundancy: 1,700+ root servers, multiple authoritative servers per domain, recursive resolvers that cache aggressively (TTLs of minutes to hours), and browser/OS caches as first lines of defense. But misconfigured DNS has caused major outages — in 2021, Facebook's BGP misconfiguration made their DNS unreachable, taking down Facebook, Instagram, and WhatsApp for 6 hours.

What's Next?

You've traced a request through DNS, TCP, HTTP, load balancers, CDNs, API gateways, and real-time protocols. Next, dive into the data layer — how databases, caches, and message queues keep your data alive across billions of requests.

Sources

  • Cloudflare Radar — DNS query statistics (radar.cloudflare.com)
  • Netflix Open Connect — CDN architecture (openconnect.netflix.com)
  • Cloudflare Learning Center — DNS, CDN, TLS explainers
  • DemandSage — Cloudflare Statistics 2026
  • Discord Engineering Blog — Real-time architecture
  • NGINX Documentation — Load balancing algorithms
  • HAProxy Documentation — Load balancing benchmarks
  • HTTP/3 RFC 9114 — QUIC specification
  • ByteByteGo — System Design resources
