TCP, UDP & Transport Layer
Transport Layer Role
The transport layer provides process-to-process communication using port numbers. While IP routes packets between hosts, the transport layer delivers data to specific applications on those hosts, using the port number to pick the right one.
Host A (IP: 10.0.0.1)                        Host B (IP: 10.0.0.2)
+-----------------------+                    +-------------------------+
| Chrome -> port 54321  |                    | nginx    -> port 443    |
| Slack  -> port 54322  | <--- TCP/UDP --->  | postgres -> port 5432   |
| Zoom   -> port 54323  |                    | redis    -> port 6379   |
+-----------------------+                    +-------------------------+
Port ranges:
- 0-1023: Well-known ports (HTTP: 80, HTTPS: 443, SSH: 22, DNS: 53)
- 1024-49151: Registered ports (PostgreSQL: 5432, MySQL: 3306)
- 49152-65535: Ephemeral (dynamic), assigned by the OS for outgoing connections
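As a small illustration of the three ranges, the sketch below classifies a port number. The class and enum names (`PortRanges`, `Range`) are made up for this example; only the boundaries come from the list above.

```java
// Sketch: classify a port number into the three ranges listed above.
final class PortRanges {
    enum Range { WELL_KNOWN, REGISTERED, EPHEMERAL }

    static Range classify(int port) {
        if (port < 0 || port > 65535) {
            throw new IllegalArgumentException("port out of range: " + port);
        }
        if (port <= 1023) return Range.WELL_KNOWN;
        if (port <= 49151) return Range.REGISTERED;
        return Range.EPHEMERAL;
    }

    public static void main(String[] args) {
        for (int port : new int[] {443, 5432, 54321}) {
            System.out.println(port + " -> " + classify(port));
        }
    }
}
```

Note that an operating system may hand out ephemeral ports from a different range than the IANA one; Linux, for example, defaults to 32768-60999 (`net.ipv4.ip_local_port_range`).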
TCP: Transmission Control Protocol
TCP provides reliable, ordered, connection-oriented delivery.
Key features:
- Connection establishment / teardown (stateful)
- Guaranteed delivery (acknowledgments + retransmission)
- Ordered delivery (sequence numbers)
- Error detection (checksum)
- Flow control (receiver window)
- Congestion control (sender limits)
TCP Three-Way Handshake
Client                          Server
  |                               |
  |---- SYN (seq=x) ------------->|  "I want to connect, my ISN is x"
  |                               |
  |<--- SYN-ACK (seq=y,ack=x+1) --|  "OK, my ISN is y, I got your x"
  |                               |
  |---- ACK (ack=y+1) ----------->|  "Got it, connection established"
  |                               |
  |<======= DATA TRANSFER =======>|
ISN (Initial Sequence Number): a randomly chosen starting point for sequence numbering. It prevents old duplicate packets from a previous connection from being accepted and makes sequence numbers hard for an off-path attacker to guess.
States involved: CLOSED -> SYN_SENT -> ESTABLISHED (client); LISTEN -> SYN_RECEIVED -> ESTABLISHED (server)
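// Java: the handshake is implicit in socket creation
import java.net.ServerSocket;
import java.net.Socket;

// Client side: the constructor connects, which triggers the 3-way handshake,
// and returns only once the connection is established
Socket socket = new Socket("google.com", 443);

// Server side: the OS completes handshakes for the listening socket
ServerSocket server = new ServerSocket(8080);
Socket client = server.accept(); // blocks until a client has connected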
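The sequence and acknowledgment numbers in the handshake can be traced with a toy model. This is a sketch for illustration only, not how a real TCP stack is structured; `HandshakeSketch`, `plusOne` and `run` are hypothetical names, and sequence numbers are kept in a `long` masked to 32 bits so the wrap-around is visible.

```java
import java.security.SecureRandom;

// Toy walk-through of the three-way handshake: tracks state names and seq/ack values only.
final class HandshakeSketch {
    enum State { CLOSED, LISTEN, SYN_SENT, SYN_RECEIVED, ESTABLISHED }

    // Sequence numbers are 32-bit and wrap around.
    static long plusOne(long seq) { return (seq + 1) & 0xFFFFFFFFL; }

    /** Runs the exchange for the given ISNs and returns {clientState, serverState}. */
    static State[] run(long clientIsn, long serverIsn) {
        State client = State.CLOSED;
        State server = State.LISTEN;

        // Step 1, client -> server: SYN(seq=x)
        long synSeq = clientIsn;
        client = State.SYN_SENT;

        // Step 2, server -> client: SYN-ACK(seq=y, ack=x+1). A SYN consumes one sequence number.
        long synAckSeq = serverIsn;
        long synAckAck = plusOne(synSeq);
        server = State.SYN_RECEIVED;

        // Step 3, client -> server: ACK(ack=y+1), sent only if the server acknowledged x+1
        if (synAckAck != plusOne(clientIsn)) return new State[] { State.CLOSED, server };
        long ackAck = plusOne(synAckSeq);
        client = State.ESTABLISHED;

        // The server is established once its own ISN has been acknowledged
        if (ackAck == plusOne(serverIsn)) server = State.ESTABLISHED;
        return new State[] { client, server };
    }

    public static void main(String[] args) {
        SecureRandom rng = new SecureRandom();
        long x = rng.nextInt() & 0xFFFFFFFFL;   // random client ISN
        long y = rng.nextInt() & 0xFFFFFFFFL;   // random server ISN
        State[] states = run(x, y);
        System.out.println("client=" + states[0] + " server=" + states[1]);
    }
}
```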
TCP Segment Structure
 0       7 8     15 16    23 24     31
+------------------+------------------+
|   Source Port    | Destination Port |  <- 32 bits
+------------------+------------------+
|           Sequence Number           |  <- byte offset of first data byte
+-------------------------------------+
|        Acknowledgment Number        |  <- next expected byte from sender
+------+----+-+-+-+-+-+-+-------------+
| Data |    |U|A|P|R|S|F|   Window    |
| Off  |    |R|C|S|S|Y|I|   Size      |
|      |    |G|K|H|T|N|N|             |
+------+----+-+-+-+-+-+-+-------------+
|     Checksum     |  Urgent Pointer  |
+------------------+------------------+
Key flags:
- SYN: synchronize sequence numbers (connection request)
- ACK: acknowledgment field is valid
- FIN: sender finished sending
- RST: reset / abort connection
- PSH: push buffered data to application immediately
- URG: urgent data present
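To make the layout concrete, here is a minimal sketch that decodes the fixed 20-byte header shown above with `java.nio.ByteBuffer`. The class name `TcpHeaderSketch` and the helper `sampleSynAck` are invented for this example; the sample bytes are hand-built rather than captured from a real connection, and only the six flags listed above are decoded.

```java
import java.nio.ByteBuffer;

// Sketch: decode the fixed part of a TCP header (options, checksum validation omitted).
final class TcpHeaderSketch {
    // Bit masks within the flags byte (byte 13 of the header)
    static final int FIN = 0x01, SYN = 0x02, RST = 0x04, PSH = 0x08, ACK = 0x10, URG = 0x20;

    final int srcPort, dstPort, dataOffsetWords, flags, window;
    final long seq, ack;

    TcpHeaderSketch(byte[] raw) {
        ByteBuffer b = ByteBuffer.wrap(raw);          // big-endian by default = network byte order
        srcPort = b.getShort() & 0xFFFF;              // mask to read unsigned 16-bit values
        dstPort = b.getShort() & 0xFFFF;
        seq = b.getInt() & 0xFFFFFFFFL;               // unsigned 32-bit
        ack = b.getInt() & 0xFFFFFFFFL;
        dataOffsetWords = (b.get() >> 4) & 0x0F;      // header length in 32-bit words
        flags = b.get() & 0x3F;                       // low six bits: URG..FIN
        window = b.getShort() & 0xFFFF;
        // checksum and urgent pointer (the last 4 bytes) are not decoded here
    }

    boolean has(int flag) { return (flags & flag) != 0; }

    /** Hand-built SYN-ACK header used only as sample input. */
    static byte[] sampleSynAck() {
        return ByteBuffer.allocate(20)
            .putShort((short) 443)        // source port
            .putShort((short) 54321)      // destination port
            .putInt(1000)                 // sequence number
            .putInt(501)                  // acknowledgment number
            .put((byte) 0x50)             // data offset = 5 words (20 bytes)
            .put((byte) (SYN | ACK))      // flags
            .putShort((short) 65535)      // window size
            .putShort((short) 0)          // checksum (not computed in this sketch)
            .putShort((short) 0)          // urgent pointer
            .array();
    }

    public static void main(String[] args) {
        TcpHeaderSketch h = new TcpHeaderSketch(sampleSynAck());
        System.out.println(h.srcPort + " -> " + h.dstPort + " seq=" + h.seq + " ack=" + h.ack
            + " SYN=" + h.has(SYN) + " ACK=" + h.has(ACK) + " window=" + h.window);
    }
}
```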
TCP Connection Termination: 4-Way Handshake
TCP termination is asymmetric: each side closes independently.
Client                          Server
  |                               |
  |---- FIN (seq=u) ------------->|  "I'm done sending"
  |<--- ACK (ack=u+1) ------------|  "Got it"
  |                               |  <- Server may still send data (half-close)
  |<--- FIN (seq=v) --------------|  "I'm done sending"
  |---- ACK (ack=v+1) ----------->|
  |                               |
  | [TIME_WAIT: 2×MSL = ~60s]     |
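TIME_WAIT state: the active closer waits 2 × MSL (Maximum Segment Lifetime) before releasing the socket. RFC 793 suggests an MSL of 2 minutes; Linux uses a fixed 60-second TIME_WAIT, which corresponds to an MSL of about 30 s. Why?
- Ensures the last ACK reaches the server (if lost, server retransmits FIN)
- Lets duplicate packets from the old connection expire
TIME_WAIT sits on the side that closes first, so a host that opens and actively closes many short-lived connections (for example, an application server talking to a database) can run out of ephemeral ports. Mitigations: reuse connections through a pool (HikariCP) or keep-alive, or enable tcp_tw_reuse on Linux so outgoing connections can reuse TIME_WAIT sockets. SO_REUSEADDR addresses a related problem: it lets a restarted server bind its listening port while old connections are still in TIME_WAIT.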
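In Java, SO_REUSEADDR has to be set before the socket is bound. The sketch below shows one way to do that with the standard `java.net.ServerSocket` API; the class name `ReuseAddrSketch` and the `listen` helper are hypothetical, and the loopback address and port 0 are chosen only to keep the example harmless to run.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

// Sketch: enable SO_REUSEADDR on a listening socket before binding it.
final class ReuseAddrSketch {
    static ServerSocket listen(int port) throws IOException {
        ServerSocket server = new ServerSocket();   // created unbound so options can be set first
        server.setReuseAddress(true);               // setting this after bind() is not guaranteed to work
        server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), port));
        return server;
    }

    public static void main(String[] args) throws IOException {
        try (ServerSocket server = listen(0)) {     // port 0: let the OS pick a free port
            System.out.println("listening on port " + server.getLocalPort()
                + ", SO_REUSEADDR=" + server.getReuseAddress());
        }
    }
}
```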
TCP Sequence Numbers & Acknowledgments
Sender sends bytes 1-1000 (MSS=500):
  Segment 1: seq=1, data=[1..500]
  Segment 2: seq=501, data=[501..1000]
Receiver:
  -> ACK 501 (got bytes 1-500, expecting 501 next)
  -> ACK 1001 (got bytes 501-1000, expecting 1001 next)
If Segment 1 is lost:
  -> Receiver gets seq=501, buffers it, but still sends ACK 1
  -> Sender sees duplicate ACKs or a timeout -> retransmits seq=1
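The receiver's side of this exchange can be sketched as a cumulative-ACK tracker. It is a simplification under stated assumptions: `CumulativeAckReceiver` is a made-up name, sequence-number wrap-around and SACK are ignored, and only segment boundaries (not payload bytes) are stored.

```java
import java.util.Map;
import java.util.TreeMap;

// Sketch: a receiver that buffers out-of-order segments and always ACKs the next byte it needs.
final class CumulativeAckReceiver {
    private long nextExpected;                                        // next in-order byte needed
    private final TreeMap<Long, Integer> buffered = new TreeMap<>();  // seq -> segment length

    CumulativeAckReceiver(long initialSeq) { this.nextExpected = initialSeq; }

    /** Accepts a segment and returns the ACK number to send back. */
    long receive(long seq, int length) {
        if (seq + length > nextExpected) {          // skip pure duplicates of delivered data
            buffered.merge(seq, length, Math::max);
        }
        // Deliver every buffered segment that is now contiguous with nextExpected.
        while (!buffered.isEmpty() && buffered.firstKey() <= nextExpected) {
            Map.Entry<Long, Integer> segment = buffered.pollFirstEntry();
            nextExpected = Math.max(nextExpected, segment.getKey() + segment.getValue());
        }
        return nextExpected;
    }

    public static void main(String[] args) {
        CumulativeAckReceiver receiver = new CumulativeAckReceiver(1);
        // Segment 1 is lost, so segment 2 arrives first and the ACK stays at 1.
        System.out.println("ACK " + receiver.receive(501, 500));
        // The retransmitted segment 1 fills the gap; the ACK jumps past both segments.
        System.out.println("ACK " + receiver.receive(1, 500));
    }
}
```

Running `main` replays the loss scenario above: the first call yields ACK 1 and the second yields ACK 1001.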
TCP Flow Control: Sliding Window
Prevents a fast sender from overwhelming a slow receiver.
Receiver advertises: Window Size = 65535 bytes (how much buffer space is available)
Sender must not have more than this many unacknowledged bytes in flight
[ Sent & ACKed | Sent, not ACKed | Can send | Cannot send yet ]
               |<-- in-flight -->|<-usable->|
               |<--- advertised window ---->|
If the receiver's buffer fills up:
  -> Receiver sends Window Size = 0 -> "stop sending"
  -> Sender pauses, sends zero-window probes periodically
  -> Receiver sends a Window Update when the buffer drains
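The sender's bookkeeping can be sketched in a few lines. This is a simplified model, assuming byte counters without wrap-around and no congestion window; the class `SendWindow` and its method names are invented for the example.

```java
// Sketch: sender-side view of the sliding window (flow control only).
final class SendWindow {
    private long lastAcked;    // first byte not yet acknowledged
    private long nextToSend;   // next new byte to transmit
    private int advertised;    // receiver's latest window advertisement, in bytes

    SendWindow(long initialSeq, int advertisedWindow) {
        this.lastAcked = initialSeq;
        this.nextToSend = initialSeq;
        this.advertised = advertisedWindow;
    }

    long inFlight() { return nextToSend - lastAcked; }

    /** Bytes that may still be sent: advertised window minus what is already in flight. */
    long usable() { return Math.max(0, advertised - inFlight()); }

    /** Tries to send 'wanted' bytes; returns how many the window actually allowed. */
    long send(long wanted) {
        long allowed = Math.min(wanted, usable());
        nextToSend += allowed;
        return allowed;
    }

    /** Processes an ACK carrying a new window advertisement. */
    void onAck(long ackNumber, int newWindow) {
        lastAcked = Math.max(lastAcked, ackNumber);
        advertised = newWindow;
    }

    public static void main(String[] args) {
        SendWindow window = new SendWindow(1, 65535);
        System.out.println("sent " + window.send(100_000));   // limited by the window
        window.onAck(1001, 0);                                // zero window: receiver is full
        System.out.println("sent " + window.send(10));        // nothing may be sent
        window.onAck(65536, 4096);                            // window update after buffer drains
        System.out.println("sent " + window.send(5000));      // limited to the new window
    }
}
```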
TCP Congestion Control
Prevents a sender from overwhelming the network (not just the receiver).
Phases
1. Slow Start
   cwnd = 1 MSS (congestion window starts small; modern stacks typically start at 10 MSS, per RFC 6928)
   cwnd doubles every RTT (exponential growth)
   Until: cwnd reaches ssthresh (slow start threshold) OR packet loss
2. Congestion Avoidance
   After the slow start threshold:
   cwnd += 1 MSS per RTT (linear growth)
   "Additive Increase"
3. Congestion Detection & Reaction
   Packet loss (timeout):
     ssthresh = cwnd / 2
     cwnd = 1 MSS -> restart Slow Start
   3 duplicate ACKs (fast retransmit):
     ssthresh = cwnd / 2
     cwnd = ssthresh -> skip slow start, enter Congestion Avoidance
   "Multiplicative Decrease"
This is AIMD (Additive Increase, Multiplicative Decrease).
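The three phases can be traced with a small per-RTT model. It is a teaching sketch, not a real congestion controller: `CongestionSketch` and its method names are hypothetical, windows are counted in whole MSS, growth is applied once per round trip rather than per ACK, and the floor of 2 MSS on ssthresh is an assumption borrowed from RFC 5681.

```java
// Sketch: Reno-style cwnd evolution, one step per round trip, in units of MSS.
final class CongestionSketch {
    int cwnd = 1;      // congestion window
    int ssthresh;      // slow start threshold

    CongestionSketch(int initialSsthresh) { this.ssthresh = initialSsthresh; }

    /** One loss-free round trip: double in slow start, +1 MSS in congestion avoidance. */
    void onRttWithoutLoss() {
        if (cwnd < ssthresh) {
            cwnd = Math.min(cwnd * 2, ssthresh);   // slow start, capped at ssthresh
        } else {
            cwnd += 1;                              // additive increase
        }
    }

    /** Timeout: halve the threshold and restart slow start from 1 MSS. */
    void onTimeout() {
        ssthresh = Math.max(cwnd / 2, 2);
        cwnd = 1;
    }

    /** Three duplicate ACKs: halve the threshold and continue in congestion avoidance. */
    void onTripleDuplicateAck() {
        ssthresh = Math.max(cwnd / 2, 2);
        cwnd = ssthresh;
    }

    public static void main(String[] args) {
        CongestionSketch tcp = new CongestionSketch(16);
        for (int rtt = 1; rtt <= 6; rtt++) {
            tcp.onRttWithoutLoss();
            System.out.println("RTT " + rtt + ": cwnd=" + tcp.cwnd);
        }
        tcp.onTripleDuplicateAck();
        System.out.println("after 3 dup ACKs: cwnd=" + tcp.cwnd + " ssthresh=" + tcp.ssthresh);
        tcp.onTimeout();
        System.out.println("after timeout: cwnd=" + tcp.cwnd + " ssthresh=" + tcp.ssthresh);
    }
}
```

With an initial ssthresh of 16, the window goes 2, 4, 8, 16 during slow start and then 17, 18 in congestion avoidance, which shows the switch from exponential to linear growth.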
Modern Algorithms
| Algorithm | Key Innovation | Use Case |
|---|---|---|
| Reno | Classic AIMD | Legacy |
| CUBIC | Cubic growth function | Linux default (LAN/WAN) |
| BBR | Bandwidth + RTT based (not loss-based) | High-BDP paths, satellite |
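| QUIC | Not a congestion-control algorithm itself: a UDP-based transport (used by HTTP/3) that runs its own congestion control, e.g. CUBIC or BBR | Modern web |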
TCP Options & Tuning
# Linux TCP tuning
# Increase socket buffer sizes for high-bandwidth links
sysctl -w net.core.rmem_max=134217728
sysctl -w net.core.wmem_max=134217728
sysctl -w net.ipv4.tcp_rmem="4096 87380 134217728"
sysctl -w net.ipv4.tcp_wmem="4096 65536 134217728"
# Enable TCP window scaling (default on modern kernels)
sysctl -w net.ipv4.tcp_window_scaling=1
# Selective Acknowledgment (SACK): retransmit only lost segments
sysctl -w net.ipv4.tcp_sack=1
# Enable BBR congestion control
sysctl -w net.ipv4.tcp_congestion_control=bbr
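Whether a buffer limit such as the 134217728 bytes above is large enough depends on the bandwidth-delay product (BDP) of the path: to keep a link full, roughly one BDP of data has to be in flight. The sketch below does that arithmetic; the 10 Gbps bandwidth and 100 ms RTT are assumed example figures, not values from the settings above, and `BdpSketch` is a made-up name.

```java
// Sketch: compare a socket buffer limit against the bandwidth-delay product of a path.
final class BdpSketch {
    /** Bandwidth-delay product in bytes: (bits per second / 8) x RTT. */
    static long bdpBytes(long bitsPerSecond, long rttMillis) {
        return bitsPerSecond / 8 * rttMillis / 1000;
    }

    public static void main(String[] args) {
        long bufferMax = 134_217_728L;                    // rmem_max / wmem_max from the settings above
        long bdp = bdpBytes(10_000_000_000L, 100);        // assumed path: 10 Gbps, 100 ms RTT
        System.out.println("BDP = " + bdp + " bytes");
        System.out.println("buffer limit covers BDP: " + (bufferMax >= bdp));
    }
}
```

For the assumed path the BDP is 125,000,000 bytes, which fits just under the 128 MiB limit.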
UDP: User Datagram Protocol
UDP provides connectionless, unreliable, fast delivery.
What UDP doesn't have (vs TCP):
- No connection setup/teardown
- No guaranteed delivery
- No ordering
- No congestion control or flow control
What UDP has:
- Very low overhead (8-byte header vs 20+ for TCP)
- No round trips before sending
- No retransmission delays
- Supports broadcast and multicast
UDP Header (8 bytes only):
+------------------+------------------+
|   Source Port    | Destination Port |
+------------------+------------------+
|      Length      |     Checksum     |
+------------------+------------------+
[ Data payload ]
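The whole header is four 16-bit fields, which a short sketch can build and read back. `UdpHeaderSketch` and its helpers are hypothetical names; the checksum is left at 0 here, which over IPv4 means "not computed" (IPv6 requires a real checksum), so this is for illustration rather than for putting on the wire.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Sketch: encode and decode the 8-byte UDP header in network byte order.
final class UdpHeaderSketch {
    static final int HEADER_BYTES = 8;

    /** Returns header + payload. The Length field counts both. */
    static byte[] encode(int srcPort, int dstPort, byte[] payload) {
        int length = HEADER_BYTES + payload.length;
        return ByteBuffer.allocate(length)
            .putShort((short) srcPort)
            .putShort((short) dstPort)
            .putShort((short) length)
            .putShort((short) 0)          // checksum left at 0 in this sketch
            .put(payload)
            .array();
    }

    // Each field is an unsigned 16-bit value at a fixed offset.
    static int srcPort(byte[] datagram) { return ByteBuffer.wrap(datagram).getShort(0) & 0xFFFF; }
    static int dstPort(byte[] datagram) { return ByteBuffer.wrap(datagram).getShort(2) & 0xFFFF; }
    static int length(byte[] datagram)  { return ByteBuffer.wrap(datagram).getShort(4) & 0xFFFF; }

    public static void main(String[] args) {
        byte[] datagram = encode(54321, 53, "query".getBytes(StandardCharsets.UTF_8));
        System.out.println(srcPort(datagram) + " -> " + dstPort(datagram)
            + ", length=" + length(datagram));
    }
}
```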
TCP vs UDP Comparison
| Feature | TCP | UDP |
|---|---|---|
| Connection | Stateful (3-way handshake) | Connectionless |
| Reliability | Guaranteed delivery | Best-effort |
| Ordering | Guaranteed | Not guaranteed |
| Speed | Slower (handshake, ACKs, retransmission) | Faster (no setup, no retransmission) |
| Header size | 20-60 bytes | 8 bytes |
| Flow control | Yes | No |
| Congestion control | Yes | No |
| Broadcast/Multicast | No | Yes |
| Use cases | HTTP, SSH, DB, email | DNS, video, VoIP, games |
When to Use UDP
| Use Case | Why UDP |
|---|---|
| DNS queries | Single request/response; retransmit at app layer if needed |
| Video/audio streaming | Slightly stale frame better than delayed one |
| VoIP / Video calls | Real-time; packet loss tolerable, latency is not |
| Online gaming | Low latency critical; game state syncs frequently |
| DHCP | Bootstrapping; no existing connection |
| SNMP | Simple polling; app handles reliability |
| QUIC / HTTP/3 | Reliability implemented in QUIC above UDP |
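Several rows above rely on the application handling reliability itself. A minimal sketch of that idea is a request that is resent after a timeout, roughly what a DNS stub resolver does. Everything here is illustrative: `UdpRetrySketch`, `requestWithRetry` and `demo` are invented names, and the demo's loopback echo server deliberately ignores the first datagram to imitate loss.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Sketch: application-level retransmission on top of UDP.
final class UdpRetrySketch {
    /** Sends 'request' and waits for one reply, resending after each timeout. */
    static byte[] requestWithRetry(DatagramSocket socket, InetAddress host, int port,
                                   byte[] request, int timeoutMillis, int maxAttempts)
            throws IOException {
        socket.setSoTimeout(timeoutMillis);
        byte[] buf = new byte[1500];
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            socket.send(new DatagramPacket(request, request.length, host, port));
            try {
                DatagramPacket reply = new DatagramPacket(buf, buf.length);
                socket.receive(reply);
                return Arrays.copyOf(reply.getData(), reply.getLength());
            } catch (SocketTimeoutException lost) {
                // No reply in time: the request or the reply was lost, so send again.
            }
        }
        throw new IOException("no reply after " + maxAttempts + " attempts");
    }

    /** Demo: a loopback echo server that drops the first datagram it receives. */
    static String demo() throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (DatagramSocket server = new DatagramSocket(0, loopback);
             DatagramSocket client = new DatagramSocket()) {
            Thread echo = new Thread(() -> {
                try {
                    DatagramPacket dropped = new DatagramPacket(new byte[1500], 1500);
                    server.receive(dropped);                       // ignored on purpose
                    DatagramPacket p = new DatagramPacket(new byte[1500], 1500);
                    server.receive(p);                             // echoed back
                    server.send(new DatagramPacket(p.getData(), p.getLength(),
                                                   p.getAddress(), p.getPort()));
                } catch (IOException ignored) {
                    // server socket closed; nothing to clean up in this demo
                }
            });
            echo.setDaemon(true);
            echo.start();
            byte[] reply = requestWithRetry(client, loopback, server.getLocalPort(),
                    "ping".getBytes(StandardCharsets.UTF_8), 200, 5);
            echo.join(1000);
            return new String(reply, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("reply: " + demo());
    }
}
```

A production client would also match replies to requests (DNS uses a transaction ID for this) and back off between attempts; both are left out to keep the sketch short.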
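Java Socket Programming
// Imports shared by the three snippets below
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// TCP client: connect, send one line, read one line back
try (Socket socket = new Socket("localhost", 8080)) {
    PrintWriter out = new PrintWriter(socket.getOutputStream(), true); // autoflush on println
    BufferedReader in = new BufferedReader(
        new InputStreamReader(socket.getInputStream()));
    out.println("Hello, server!");
    String response = in.readLine(); // null if the server closed without replying
    System.out.println("Server: " + response);
}

// TCP server: accept connections forever, one thread per client
try (ServerSocket server = new ServerSocket(8080)) {
    System.out.println("Listening on port 8080");
    while (true) {
        Socket client = server.accept(); // blocks until a client has connected
        // handleClient is application code (not shown); it should close the socket when done
        new Thread(() -> handleClient(client)).start();
    }
}

// UDP client: no connection setup; send one datagram, then wait for one reply
try (DatagramSocket socket = new DatagramSocket()) {
    socket.setSoTimeout(2000); // the reply may be lost, so do not block forever
    byte[] data = "Hello UDP".getBytes(StandardCharsets.UTF_8);
    InetAddress addr = InetAddress.getByName("localhost");
    DatagramPacket packet = new DatagramPacket(data, data.length, addr, 9090);
    socket.send(packet);
    byte[] buf = new byte[1024];
    DatagramPacket response = new DatagramPacket(buf, buf.length);
    socket.receive(response); // throws SocketTimeoutException after 2 s without a reply
}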
TCP Keep-Alive
Detects broken connections when no data flows.
# Linux kernel keep-alive settings
net.ipv4.tcp_keepalive_time = 7200 # idle before probes (seconds)
net.ipv4.tcp_keepalive_intvl = 75 # probe interval
net.ipv4.tcp_keepalive_probes = 9 # probes before declaring dead
// Java socket keep-alive
socket.setKeepAlive(true);
// Spring WebClient / RestTemplate connection pool keep-alive
// (handled by HttpClient connection pool settings)
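With the kernel settings shown above, a silently dead peer is detected only after the idle period plus all failed probes. The sketch below works out that delay; `KeepAliveSketch` and `timeToDeclareDead` are made-up names, and the formula assumes every probe goes unanswered.

```java
import java.time.Duration;

// Sketch: how long keep-alive takes to give up on an unresponsive peer.
final class KeepAliveSketch {
    /** Idle time before the first probe, plus one interval per unanswered probe. */
    static Duration timeToDeclareDead(int idleSeconds, int intervalSeconds, int probes) {
        return Duration.ofSeconds(idleSeconds + (long) intervalSeconds * probes);
    }

    public static void main(String[] args) {
        // tcp_keepalive_time=7200, tcp_keepalive_intvl=75, tcp_keepalive_probes=9
        Duration d = timeToDeclareDead(7200, 75, 9);
        System.out.println(d.getSeconds() + " s (" + d + ")");
    }
}
```

That comes to 7200 + 9 × 75 = 7875 seconds, a little over two hours, which is why applications that need faster failure detection usually lower these values or add their own heartbeats.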
🎯 Interview Questions
Q1. Describe the TCP three-way handshake.
Client sends SYN with its Initial Sequence Number (ISN). Server responds with SYN-ACK, acknowledging the client's ISN and sending its own ISN. Client sends ACK, acknowledging the server's ISN. After this, the connection is established and both sides have synchronized sequence numbers for reliable, ordered data transfer.
Q2. What is the purpose of sequence numbers in TCP?
Sequence numbers serve three purposes: (1) ordering: the receiver can reorder out-of-order segments; (2) duplicate detection: old or retransmitted segments with already-ACKed sequence numbers are discarded; (3) reliable delivery: the sender knows which data has been received via ACK numbers, and retransmits unacknowledged data.
Q3. What is TCP flow control vs congestion control?
Flow control prevents the sender from overwhelming the receiver's buffer: the receiver advertises its available window size in each ACK, and the sender limits in-flight data accordingly. Congestion control prevents the sender from overwhelming the network: it uses algorithms (slow start, AIMD) to probe for available bandwidth without causing queue overflow at routers.
Q4. Why does TCP have a TIME_WAIT state and what problems can it cause?
TIME_WAIT ensures: (1) the final ACK reaches the server (if lost, server retransmits FIN within the wait window); (2) duplicate packets from the old connection expire before a new connection on the same address and port pair is allowed. Problem: hosts that open and actively close many short-lived connections can exhaust ephemeral ports and see "address already in use" errors. Solutions: SO_REUSEADDR, connection pooling, or tcp_tw_reuse.
Q5. When would you choose UDP over TCP?
Choose UDP when: low latency is more important than perfect reliability (VoIP, gaming, real-time video); the application handles its own reliability (QUIC, DNS); data is time-sensitive and retransmission would be useless (live streaming, where a retransmitted old frame arrives after newer frames); or multicast/broadcast is needed (DHCP, mDNS).
Q6. What is TCP slow start and why does it exist?
Slow start is TCP's initial congestion probing phase. It starts with a small congestion window (1 MSS) and doubles it each RTT until a threshold or loss is detected. This prevents a new connection from immediately blasting traffic onto a congested network. Despite the name, exponential growth is actually fast: the window grows about a thousandfold every 10 RTTs, so even a 10 Gbps path with a 100 ms RTT (about 85,000 segments of 1460 bytes in flight) can be filled in roughly 17 RTTs, under 2 seconds.
Q7. What is a SYN flood attack and how is it mitigated?
A SYN flood sends many SYN packets without completing the handshake, filling the server's SYN queue (half-open connections). The server allocates state for each SYN, exhausting resources. Mitigation: SYN cookies. The server encodes the connection state in the ISN it sends back instead of allocating resources, then validates the client from the acknowledgment number of the final ACK (the cookie plus one). Also: firewall rate limiting on SYNs, shorter SYN timeout.
Q8. What is the difference between a TCP RST and FIN?
FIN is a graceful close: the sender has finished sending data, but the connection stays half-closed and the other side can still send. RST is an abrupt abort: the connection is terminated immediately and all buffered data is discarded. RST occurs when: connecting to a closed port, a firewall rejects or tears down the connection, or the application calls socket.close() while received data is still unread or with SO_LINGER set to 0 (as opposed to socket.shutdownOutput() for a graceful close).
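The graceful path can be demonstrated on the loopback interface. In this sketch the client sends its data, calls `shutdownOutput()` (which sends a FIN but keeps the read side open), and still receives the server's reply afterwards. `HalfCloseSketch` and `roundTrip` are hypothetical names, and `InputStream.readAllBytes()` assumes Java 9 or later.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Sketch: graceful half-close with shutdownOutput() over a loopback connection.
final class HalfCloseSketch {
    static String roundTrip(String message) throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress())) {
            Thread peer = new Thread(() -> {
                try (Socket s = server.accept()) {
                    // readAllBytes returns at end-of-stream, i.e. once the client's FIN arrives
                    byte[] received = s.getInputStream().readAllBytes();
                    String reply = "got " + received.length + " bytes";
                    s.getOutputStream().write(reply.getBytes(StandardCharsets.UTF_8));
                } catch (IOException ignored) {
                    // demo only: a failure here surfaces as an empty reply on the client
                }
            });
            peer.start();
            try (Socket client = new Socket(server.getInetAddress(), server.getLocalPort())) {
                client.getOutputStream().write(message.getBytes(StandardCharsets.UTF_8));
                client.shutdownOutput();   // "I'm done sending"; reading still works
                byte[] reply = client.getInputStream().readAllBytes();
                peer.join(1000);
                return new String(reply, StandardCharsets.UTF_8);
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("hello"));
    }
}
```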