Networking & IPC
OS Network Stack Overview
Application Layer   HTTP, gRPC, SMTP, DNS
↓
Transport Layer     TCP / UDP (src port, dst port, seq/ack)
↓
Network Layer       IP (src IP, dst IP, routing)
↓
Data Link Layer     Ethernet (MAC addresses, frames)
↓
Physical Layer      Electrical signals / photons
Kernel Network Path (Receive)
NIC → DMA → Ring Buffer → NAPI poll → sk_buff →
IP routing → TCP → Socket receive buffer →
application read()
Sockets
A socket is a bidirectional communication endpoint. On Unix it is represented as a file descriptor.
Socket Types
| Type | Protocol | Description |
|---|---|---|
| SOCK_STREAM | TCP | Reliable, ordered, connection-oriented |
| SOCK_DGRAM | UDP | Unreliable, connectionless, low overhead |
| SOCK_RAW | IP/custom | Direct access to IP layer |
| AF_UNIX (address family) | Unix Domain | Local IPC (no network); combined with SOCK_STREAM or SOCK_DGRAM |
TCP Server Lifecycle
Server:
socket() → bind() → listen() → loop: accept() → read/write → close()
Client:
socket() → connect() → read/write → close()
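The two call sequences above can be sketched in Java, where the `ServerSocket` constructor performs `socket()`, `bind()` and `listen()` in one step. This is a minimal sketch, not a production server: the class name `EchoLifecycle`, the backlog of 50 and the single-connection design are assumptions made for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

final class EchoLifecycle {
    /** Runs one accept() and one client round trip on the loopback interface. */
    static String roundTrip(String message) throws IOException, InterruptedException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Server: socket() + bind() + listen() happen inside the constructor (port 0 = any free port).
        try (ServerSocket server = new ServerSocket(0, 50, loopback)) {
            Thread serverThread = new Thread(() -> {
                // accept() -> read -> write -> close()
                try (Socket conn = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(conn.getInputStream(), StandardCharsets.UTF_8));
                     PrintWriter out = new PrintWriter(
                             new OutputStreamWriter(conn.getOutputStream(), StandardCharsets.UTF_8), true)) {
                    out.println(in.readLine()); // echo one line back
                } catch (IOException e) {
                    e.printStackTrace();
                }
            });
            serverThread.start();

            // Client: socket() + connect() -> write -> read -> close()
            try (Socket client = new Socket(loopback, server.getLocalPort());
                 PrintWriter out = new PrintWriter(
                         new OutputStreamWriter(client.getOutputStream(), StandardCharsets.UTF_8), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(client.getInputStream(), StandardCharsets.UTF_8))) {
                out.println(message);
                String reply = in.readLine();
                serverThread.join();
                return reply;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(roundTrip("ping"));
    }
}
```

A real server would call `accept()` in a loop and hand each connection to a worker or an event loop; the single accept here only keeps the lifecycle visible.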
Backlog Queue
listen(fd, backlog): On Linux, the backlog defines the size of the completed connection queue (TCP 3-way handshake done, waiting for accept()). The kernel silently caps the value at net.core.somaxconn.
# System max:
cat /proc/sys/net/core/somaxconn # Default 4096 since Linux 5.4 (128 on older kernels); often tuned up to 65535
sysctl -w net.core.somaxconn=65535
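To make the accept queue concrete, the following sketch connects a client before the server ever calls `accept()`. The kernel completes the handshake and parks the connection in the queue. The class name `BacklogDemo` and the backlog value of 16 are illustrative assumptions.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class BacklogDemo {
    /** Returns true if a client connects while the server has not yet called accept(). */
    static boolean connectsBeforeAccept() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The second argument is the requested backlog; the kernel may cap it.
        try (ServerSocket server = new ServerSocket(0, 16, loopback);
             Socket client = new Socket(loopback, server.getLocalPort())) {
            // The handshake is already complete; the connection waits in the accept queue.
            boolean connected = client.isConnected();
            try (Socket accepted = server.accept()) { // dequeues the established connection
                return connected && accepted.isConnected();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("connected before accept(): " + connectsBeforeAccept());
    }
}
```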
TCP Deep Dive
Three-Way Handshake
Client                 Server
  |------- SYN ------->|
  |<----- SYN-ACK -----|
  |------- ACK ------->|
Connection Established
TCP Connection Termination (4-Way)
Client                 Server
  |------- FIN ------->|
  |<------ ACK --------|
  |<------ FIN --------|
  |------- ACK ------->|
Client enters TIME_WAIT (2×MSL; Linux uses a fixed 60s)
TIME_WAIT: Prevents old packets from a previous connection being accepted by a new one. It is held by the side that closes first, and it can become a problem if a busy client runs out of ephemeral ports.
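ss -s # Socket summary (includes a timewait count)
ss -tan state time-wait # List TIME_WAIT sockets (append | wc -l to count)
sysctl net.ipv4.tcp_fin_timeout # FIN_WAIT_2 timeout, default 60s; does not change the TIME_WAIT duration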
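The ephemeral-port concern can be turned into a rough budget. The sketch below assumes the default Linux range from `net.ipv4.ip_local_port_range` (32768 to 60999), a 60s TIME_WAIT, a single source IP, and a client that always closes first; the class and method names are hypothetical.

```java
final class TimeWaitBudget {
    /**
     * Upper bound on new outbound connections per second to ONE destination IP:port,
     * assuming every closed connection holds its local port for the full TIME_WAIT period.
     */
    static double maxConnectionsPerSecond(int firstPort, int lastPort, int timeWaitSeconds) {
        int ephemeralPorts = lastPort - firstPort + 1;
        return (double) ephemeralPorts / timeWaitSeconds;
    }

    public static void main(String[] args) {
        // 28232 ports / 60 s: roughly 470 connections per second.
        System.out.printf("%.1f connections/s%n", maxConnectionsPerSecond(32768, 60999, 60));
    }
}
```

The bound applies per destination 4-tuple, which is why connection pooling or keep-alive connections usually remove the problem before any sysctl tuning is needed.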
TCP Congestion Control
| Algorithm | Description | Use Case |
|---|---|---|
| CUBIC | Default on Linux; slow start + cubic growth | General internet |
| BBR | Model-based: estimates bottleneck bandwidth and RTT; does not use loss as its primary signal | High-BDP networks (GCP) |
| RENO | Classic; additive increase, multiplicative decrease | Legacy |
sysctl net.ipv4.tcp_congestion_control # Current algorithm
sysctl net.ipv4.tcp_available_congestion_control # Algorithms currently available
sysctl -w net.ipv4.tcp_congestion_control=bbr # Requires the tcp_bbr module
Nagle's Algorithm vs TCP_NODELAY
Nagle: Buffers small writes until a full segment can be sent or the outstanding data is ACKed. Reduces packet count, increases latency.
// Disable for low-latency (e.g., game servers, trading systems):
socket.setTcpNoDelay(true);
// Or via SocketChannel (a ServerSocket does not support TCP_NODELAY; set it on each accepted connection):
socketChannel.setOption(StandardSocketOptions.TCP_NODELAY, true);
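As a quick check that the option is applied to the connection rather than to the listener, the sketch below sets `TCP_NODELAY` on a connected `SocketChannel` and reads it back. The class name `NoDelayDemo` and the loopback setup are assumptions for illustration.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;

final class NoDelayDemo {
    /** Connects to a local listener, disables Nagle's algorithm, and reads the option back. */
    static boolean enableNoDelay() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
                client.setOption(StandardSocketOptions.TCP_NODELAY, true);
                return client.getOption(StandardSocketOptions.TCP_NODELAY);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("TCP_NODELAY enabled: " + enableNoDelay());
    }
}
```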
TCP Keep-Alive
Detects dead connections by sending probes after idle time.
sysctl net.ipv4.tcp_keepalive_time # Idle time before first probe (default 7200s)
sysctl net.ipv4.tcp_keepalive_intvl # Interval between probes (default 75s)
sysctl net.ipv4.tcp_keepalive_probes # Probes before giving up (default 9)
// Java Socket:
socket.setKeepAlive(true);
// Fine-grained, per socket (Java 11+, jdk.net.ExtendedSocketOptions; values in seconds; support is platform-dependent):
socket.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, 30);
socket.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, 5);
socket.setOption(ExtendedSocketOptions.TCP_KEEPCOUNT, 3);
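Because the extended keep-alive options are not available on every platform, a defensive version checks `supportedOptions()` first. This is a sketch under that assumption; `KeepAliveDemo` and its method names are hypothetical.

```java
import java.io.IOException;
import java.net.Socket;
import jdk.net.ExtendedSocketOptions;

final class KeepAliveDemo {
    /** Enables keep-alive and, where the platform supports it, per-socket probe timings. */
    static boolean configure(Socket socket) throws IOException {
        socket.setKeepAlive(true); // SO_KEEPALIVE
        if (socket.supportedOptions().contains(ExtendedSocketOptions.TCP_KEEPIDLE)) {
            socket.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, 30);    // seconds idle before the first probe
            socket.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, 5); // seconds between probes
            socket.setOption(ExtendedSocketOptions.TCP_KEEPCOUNT, 3);    // unanswered probes before giving up
        }
        return socket.getKeepAlive();
    }

    /** Applies the settings to a fresh, unconnected socket. */
    static boolean demo() throws IOException {
        try (Socket socket = new Socket()) {
            return configure(socket);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("keep-alive enabled: " + demo());
    }
}
```

With these values a silent peer is detected after about 30 + 3 × 5 = 45 seconds, compared with more than two hours under the system defaults listed above.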
Socket Options
ServerSocketChannel ssc = ServerSocketChannel.open();
ssc.setOption(StandardSocketOptions.SO_REUSEADDR, true); // Rebind while old connections sit in TIME_WAIT
ssc.setOption(StandardSocketOptions.SO_REUSEPORT, true); // Multiple sockets on same port (Linux 3.9+, Java 9+)
ssc.setOption(StandardSocketOptions.SO_RCVBUF, 1024 * 1024); // Receive buffer size; set before bind so accepted sockets inherit it
// SO_SNDBUF is not a supported option on ServerSocketChannel; set it on each accepted SocketChannel:
// accepted.setOption(StandardSocketOptions.SO_SNDBUF, 1024 * 1024); // Send buffer size
SO_REUSEPORT
Allows multiple sockets to bind to the same port; the kernel load-balances incoming connections across them. Every socket must set the option before bind(). Used by Nginx (the reuseport listen parameter) for multi-process accept.
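The behaviour can be observed from Java by binding two listeners to one port. This sketch assumes a platform where `SO_REUSEPORT` is supported (it returns false otherwise); the class name `ReusePortDemo` is hypothetical.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.StandardSocketOptions;
import java.nio.channels.ServerSocketChannel;

final class ReusePortDemo {
    /** Returns true if two listeners shared one port, false if SO_REUSEPORT is unsupported here. */
    static boolean bindTwice() throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocketChannel first = ServerSocketChannel.open();
             ServerSocketChannel second = ServerSocketChannel.open()) {
            if (!first.supportedOptions().contains(StandardSocketOptions.SO_REUSEPORT)) {
                return false;
            }
            // The option must be set on every participating socket before bind().
            first.setOption(StandardSocketOptions.SO_REUSEPORT, true);
            second.setOption(StandardSocketOptions.SO_REUSEPORT, true);
            first.bind(new InetSocketAddress(loopback, 0));
            int port = ((InetSocketAddress) first.getLocalAddress()).getPort();
            second.bind(new InetSocketAddress(loopback, port)); // would fail without SO_REUSEPORT
            return ((InetSocketAddress) second.getLocalAddress()).getPort() == port;
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("two listeners on one port: " + bindTwice());
    }
}
```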
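Zero-Copy Networking
Traditional send(file), i.e. read() followed by write():
Disk → page cache → user buffer → socket buffer → NIC
(2 CPU copies across the user/kernel boundary, plus 2 DMA transfers)
sendfile() system call:
Disk → page cache → socket buffer → NIC
(0 user-space copies)
// Java: FileChannel.transferTo()
try (FileChannel in = FileChannel.open(path, StandardOpenOption.READ)) {
    long position = 0;
    long size = in.size();
    while (position < size) { // transferTo may move fewer bytes than requested
        position += in.transferTo(position, size - position, socketChannel); // can use sendfile() on Linux
    }
} // the socket channel stays open for the caller to reuse or close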
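A self-contained version of the `transferTo()` loop, sending a temporary file to a loopback receiver that counts the bytes, might look like this. The class name `SendFileDemo`, the temp-file name and the 64 KiB receive buffer are assumptions; whether the JDK actually uses `sendfile()` depends on the platform and channel types.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.concurrent.CompletableFuture;

final class SendFileDemo {
    /** Sends every byte of the file; loops because transferTo may move fewer bytes than requested. */
    static long sendFile(Path file, SocketChannel out) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = 0;
            long size = in.size();
            while (position < size) {
                position += in.transferTo(position, size - position, out);
            }
            return position;
        }
    }

    /** Loopback round trip: returns the number of bytes counted by the receiver. */
    static long demo(int fileSize) throws Exception {
        Path file = Files.createTempFile("sendfile-demo", ".bin");
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            Files.write(file, new byte[fileSize]);
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            // Receiver: accept one connection and count bytes until end of stream.
            CompletableFuture<Long> received = CompletableFuture.supplyAsync(() -> {
                try (SocketChannel conn = server.accept()) {
                    ByteBuffer buf = ByteBuffer.allocate(64 * 1024);
                    long total = 0;
                    int n;
                    while ((n = conn.read(buf)) != -1) {
                        total += n;
                        buf.clear();
                    }
                    return total;
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
            try (SocketChannel out = SocketChannel.open(server.getLocalAddress())) {
                sendFile(file, out);
            } // closing the sender signals end of stream to the receiver
            return received.get();
        } finally {
            Files.deleteIfExists(file);
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("bytes received: " + demo(1 << 20));
    }
}
```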
UDP and Multicast
UDP is connectionless and gives no delivery, ordering, or duplicate-suppression guarantees. Used for:
- Real-time video/audio streaming
- DNS queries
- DHCP
- Game state updates
DatagramChannel channel = DatagramChannel.open();
channel.bind(new InetSocketAddress(9999));
ByteBuffer buf = ByteBuffer.allocate(1024);
SocketAddress sender = channel.receive(buf);
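For a complete picture of both ends, the sketch below sends one datagram over loopback and decodes it on the receiving side. The class name `UdpLoopback` and the 1024-byte buffer are illustrative; the blocking `receive()` assumes the datagram is not lost, which is a reasonable assumption on loopback but not on a real network.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.DatagramChannel;
import java.nio.charset.StandardCharsets;

final class UdpLoopback {
    /** Sends one datagram to a local receiver and returns the decoded payload. */
    static String sendAndReceive(String payload) throws IOException {
        try (DatagramChannel receiver = DatagramChannel.open();
             DatagramChannel sender = DatagramChannel.open()) {
            receiver.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            // No connect() and no handshake: each datagram carries its own destination address.
            sender.send(ByteBuffer.wrap(payload.getBytes(StandardCharsets.UTF_8)),
                    receiver.getLocalAddress());
            ByteBuffer buf = ByteBuffer.allocate(1024);
            receiver.receive(buf); // blocks until a datagram arrives; bytes beyond the buffer are discarded
            buf.flip();
            return StandardCharsets.UTF_8.decode(buf).toString();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(sendAndReceive("hello"));
    }
}
```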
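Multicast
Send one packet to a group of subscribers. Used in market data feeds, service discovery.
// Open the channel with the group's protocol family: DatagramChannel.open(StandardProtocolFamily.INET)
NetworkInterface ni = NetworkInterface.getByName("eth0"); // returns null if no such interface exists
InetAddress group = InetAddress.getByName("224.0.0.1"); // all-hosts group; applications normally use their own group address
MembershipKey key = channel.join(group, ni); // key.drop() leaves the group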
IPC Mechanisms Comparison
| Mechanism | Direction | Scope | Performance | Notes |
|---|---|---|---|---|
| Pipe | Unidirectional | Related processes | Fast | pipe() syscall |
| Named Pipe (FIFO) | Unidirectional | Any local process | Fast | Has a filesystem name |
| Unix Domain Socket | Bidirectional | Same host | Fast | Socket API without TCP/IP overhead |
| Message Queue | Bidirectional | Same host | Medium | Kernel-managed; POSIX or SysV |
| Shared Memory | Bidirectional | Same host | Fastest | Needs explicit sync (mutex/semaphore) |
| TCP Socket | Bidirectional | Network | Variable | Portable across hosts |
| Signal | Notification only | Same host | Very fast | No data payload (except sigqueue) |
| Memory-mapped File | Bidirectional | Same host | Fast | Persistent; survives process death |
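Shared Memory (POSIX)
// Requires <fcntl.h>, <sys/mman.h>, <unistd.h>, <stdio.h>, <stdlib.h>, <string.h>
// Create/open shared memory:
int fd = shm_open("/my_shm", O_CREAT | O_RDWR, 0666);
if (fd == -1 || ftruncate(fd, SIZE) == -1) { perror("shm_open/ftruncate"); exit(1); }
void *ptr = mmap(NULL, SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (ptr == MAP_FAILED) { perror("mmap"); exit(1); }
// Write (SIZE must be large enough for the string and its terminator):
strcpy((char *)ptr, "hello");
// Another process: shm_open with same name, mmap, read
// Cleanup: munmap(ptr, SIZE); close(fd); shm_unlink("/my_shm");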
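Java has no direct binding for `shm_open`, so the closest standard-library analogue is a shared file mapping through `FileChannel.map`. The sketch below is an assumption-laden illustration rather than an equivalent of the C code: it uses two independent mappings inside one process to stand in for two processes, and the class name, the 4096-byte size and the length-prefixed layout are hypothetical.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedSharedMemory {
    static final int SIZE = 4096;

    /** Writes a length-prefixed message through one mapping and reads it through another. */
    static String writeThenRead(Path file, String message) throws IOException {
        byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
        // "Writer" side: map the file read-write (the file is extended to SIZE if needed).
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.CREATE,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            MappedByteBuffer shared = ch.map(FileChannel.MapMode.READ_WRITE, 0, SIZE);
            shared.putInt(bytes.length).put(bytes);
        }
        // "Reader" side: an independent mapping of the same file, as a second process would create.
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            MappedByteBuffer shared = ch.map(FileChannel.MapMode.READ_ONLY, 0, SIZE);
            byte[] out = new byte[shared.getInt()];
            shared.get(out);
            return new String(out, StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("shm-demo", ".bin");
        System.out.println(writeThenRead(file, "hello"));
    }
}
```

As with the POSIX version, concurrent writers and readers still need explicit synchronization; the mapping only provides the shared bytes.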
Unix Domain Socket vs TCP Loopback
Unix domain socket:
- Copies data directly between kernel socket buffers (no TCP/IP stack).
- Often measured at roughly 30 to 50% faster than TCP loopback; the gap depends on message size and workload.
- Supports credential passing (SO_PEERCRED).
# Benchmark comparison:
iperf3 -s -D && iperf3 -c 127.0.0.1 # TCP loopback throughput
socat UNIX-LISTEN:/tmp/s.sock,fork - # Unix socket listener; connect with: socat - UNIX-CONNECT:/tmp/s.sock
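From Java, Unix domain sockets are available through the NIO channel API starting with Java 16. The following sketch assumes Java 16+ and an operating system with AF_UNIX support; the class name `UnixSocketDemo` and the temporary socket path are illustrative.

```java
import java.io.IOException;
import java.net.StandardProtocolFamily;
import java.net.UnixDomainSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

final class UnixSocketDemo {
    /** Sends a short message (under 1024 bytes) over a Unix domain socket and returns what arrived. */
    static String roundTrip(String message) throws IOException {
        Path dir = Files.createTempDirectory("uds");
        Path socketFile = dir.resolve("s.sock");
        UnixDomainSocketAddress address = UnixDomainSocketAddress.of(socketFile);
        try (ServerSocketChannel server = ServerSocketChannel.open(StandardProtocolFamily.UNIX)) {
            server.bind(address); // creates the socket file on the filesystem
            try (SocketChannel client = SocketChannel.open(address);
                 SocketChannel accepted = server.accept()) {
                client.write(ByteBuffer.wrap(message.getBytes(StandardCharsets.UTF_8)));
                client.shutdownOutput(); // signal end of stream to the reader
                ByteBuffer buf = ByteBuffer.allocate(1024);
                while (buf.hasRemaining() && accepted.read(buf) != -1) {
                    // keep reading until end of stream or the buffer is full
                }
                buf.flip();
                return StandardCharsets.UTF_8.decode(buf).toString();
            }
        } finally {
            Files.deleteIfExists(socketFile); // the socket file is not removed automatically
            Files.deleteIfExists(dir);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("hello"));
    }
}
```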
High-Performance Server Patterns
Reactor Pattern (Java NIO)
// Single-threaded Reactor with Selector:
Selector selector = Selector.open();
ServerSocketChannel server = ServerSocketChannel.open();
server.configureBlocking(false);
server.bind(new InetSocketAddress(8080));
server.register(selector, SelectionKey.OP_ACCEPT);
while (true) {
    selector.select(); // block until at least one channel is ready
    Iterator<SelectionKey> it = selector.selectedKeys().iterator();
    while (it.hasNext()) {
        SelectionKey key = it.next();
        it.remove(); // the selected-key set is never cleared automatically
        if (key.isAcceptable()) {
            SocketChannel client = server.accept(); // may be null in non-blocking mode
            if (client != null) {
                client.configureBlocking(false);
                client.register(selector, SelectionKey.OP_READ, ByteBuffer.allocate(4096));
            }
        } else if (key.isReadable()) {
            SocketChannel client = (SocketChannel) key.channel();
            ByteBuffer buf = (ByteBuffer) key.attachment();
            int n = client.read(buf);
            if (n == -1) { // peer closed: cancel the key and release the file descriptor
                key.cancel();
                client.close();
                continue;
            }
            // process request...
        }
    }
}
Proactor Pattern
Uses asynchronous I/O: the handler is notified on completion, not readiness. Java's AsynchronousSocketChannel exposes this model (on Linux the JDK implements it with epoll and a thread pool rather than kernel AIO):
AsynchronousServerSocketChannel server =
    AsynchronousServerSocketChannel.open().bind(new InetSocketAddress(8080));
server.accept(null, new CompletionHandler<AsynchronousSocketChannel, Void>() {
    @Override
    public void completed(AsynchronousSocketChannel client, Void att) {
        server.accept(null, this); // accept next
        ByteBuffer buf = ByteBuffer.allocate(4096);
        // ReadHandler is a user-defined CompletionHandler<Integer, ByteBuffer> (not shown)
        client.read(buf, buf, new ReadHandler(client));
    }
    @Override
    public void failed(Throwable exc, Void att) {
        exc.printStackTrace(); // do not swallow accept failures silently
    }
});
Thread Per Request vs. Event Loop
| | Thread Per Request | Event Loop (NIO) |
|---|---|---|
| Simplicity | High | Lower |
| Memory | High (each thread ~256KB to 1MB stack) | Low |
| Throughput | Limited by thread count | Very high |
| Blocking I/O | Natural | Problematic (a blocking call stalls the loop) |
| Examples | Tomcat blocking, Java EE | Netty, Vert.x, Node.js |
| Java 21+ | Virtual threads remove most of the per-thread memory cost | Still valid for CPU work |
Common Interview Questions
Q1: What is the difference between TCP and UDP?
TCP: Connection-oriented, reliable (retransmits lost packets), ordered, flow/congestion control, higher overhead. UDP: Connectionless, unreliable, unordered, no flow control, low overhead. Use TCP for correctness (HTTP, databases); UDP for latency (DNS, gaming, streaming where a dropped frame is acceptable).
Q2: What is the TIME_WAIT state and why does it exist?
The side that closes the connection first (the active closer) enters TIME_WAIT for 2×MSL (Maximum Segment Lifetime; Linux uses a fixed 60s). Purpose: it lets that side re-send the final ACK if the peer retransmits its FIN, and it prevents old duplicate packets from contaminating a new connection on the same 4-tuple.
Q3: What happens when the receive buffer of a TCP socket is full?
TCP advertises a receive window of 0 to the sender. The sender stops sending data and periodically sends zero-window probes until the window reopens. This is flow control. If the application doesn't read fast enough, back-pressure propagates through the entire connection.
Q4: What is SO_REUSEADDR vs SO_REUSEPORT?
SO_REUSEADDR: Allows a new socket to bind to a port that's in TIME_WAIT state. Essential for server restarts. SO_REUSEPORT: Allows multiple sockets to bind the same port; the kernel distributes connections among them. Used for multi-process accept (Nginx) or per-thread accept.
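Q5: How does epoll handle thundering herd?
Classic problem: multiple threads/processes block on accept(), a new connection arrives, all are woken up, only one succeeds, and the others go back to sleep, wasting context switches. With the EPOLLEXCLUSIVE flag (Linux 4.5+), the kernel wakes only one or a few of the waiters per event instead of all of them. With SO_REUSEPORT, each thread/process has its own listening socket and accept queue, so a new connection wakes only the owner of the socket it was assigned to.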
Q6: What is a Unix domain socket and when would you use it?
A Unix domain socket provides socket-like communication between processes on the same machine but bypasses the TCP/IP stack, which typically makes it faster than TCP loopback. Use it for local inter-process communication (e.g., Nginx to PHP-FPM, PostgreSQL local connections, Docker daemon socket).
Q7: What is the difference between blocking, non-blocking, and async I/O?
- Blocking I/O: read() blocks until data is available.
- Non-blocking I/O: read() returns immediately with EAGAIN if no data; requires polling.
- I/O Multiplexing (select/poll/epoll): Block until any of multiple FDs are ready, then do non-blocking I/O.
- Async I/O (AIO): Initiate I/O, continue executing, get notified on completion. Kernel does the I/O in the background.
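The difference between non-blocking reads and I/O multiplexing can be seen in a few lines of Java NIO. In this sketch (class name `NonBlockingReadDemo` and the five-byte payload are assumptions), a non-blocking `read()` returns 0 when nothing has arrived, which is how Java surfaces the EAGAIN case, and a `Selector` then waits for readiness instead of spinning.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;

final class NonBlockingReadDemo {
    /** Returns {read() result before any data was sent, read() result after the peer wrote}. */
    static int[] readBeforeAndAfterWrite() throws IOException {
        try (ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress(InetAddress.getLoopbackAddress(), 0));
            try (SocketChannel writer = SocketChannel.open(server.getLocalAddress());
                 SocketChannel reader = server.accept();
                 Selector selector = Selector.open()) {
                reader.configureBlocking(false);
                ByteBuffer buf = ByteBuffer.allocate(64);
                int before = reader.read(buf); // nothing sent yet: returns 0 immediately
                writer.write(ByteBuffer.wrap("hello".getBytes(StandardCharsets.US_ASCII)));
                // Multiplexing: block until the channel is readable rather than polling read().
                reader.register(selector, SelectionKey.OP_READ);
                selector.select();
                int after = reader.read(buf); // data is ready, so this returns a positive count
                return new int[] {before, after};
            }
        }
    }

    public static void main(String[] args) throws IOException {
        int[] result = readBeforeAndAfterWrite();
        System.out.println("before=" + result[0] + " after=" + result[1]);
    }
}
```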
Q8: How does Netty achieve high performance?
Netty uses: NIO with event loops (one thread per selector, no blocking), zero-copy (direct ByteBuffers, sendfile), memory pooling (PooledByteBufAllocator to avoid GC pressure), pipeline architecture (chain of handlers), and efficient encoding/decoding. It avoids context switches by keeping I/O in dedicated event loop threads.
Advanced Editorial Pass: Networking and IPC for High-Concurrency Services
Senior Engineering Focus
- Design connection lifecycle and back-pressure policy explicitly.
- Choose IPC and socket models by ordering, throughput, and failure semantics.
- Understand kernel network queues and zero-copy constraints.
Failure Modes to Anticipate
- Connection storms exhausting file descriptors and accept queues.
- Head-of-line blocking in poorly partitioned message channels.
- Packet loss and retransmission patterns misdiagnosed as app bugs.
Practical Heuristics
- Set FD and backlog budgets with monitoring thresholds.
- Instrument queue depths and retransmit rates.
- Apply load shedding before saturation cascades.
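One way to apply load shedding before saturation is to give the request executor a bounded queue and reject work once it is full. The sketch below is a deliberately tiny illustration, assuming one worker and a queue of one; the class name `LoadSheddingDemo` and these sizes are hypothetical, and real budgets would come from the monitoring thresholds mentioned above.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class LoadSheddingDemo {
    /** Submits blocking tasks to a pool with 1 worker and a queue of 1; returns how many were shed. */
    static int shedCount(int requests) throws InterruptedException {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1),           // bounded queue: an explicit back-pressure budget
                new ThreadPoolExecutor.AbortPolicy()); // reject instead of queueing without limit
        int shed = 0;
        try {
            for (int i = 0; i < requests; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // simulate a request that is slow to finish
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    shed++; // a real server would fail fast here, e.g. with an HTTP 503
                }
            }
        } finally {
            release.countDown();
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
        return shed;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("shed requests: " + shedCount(5));
    }
}
```

With five submissions, one task runs, one waits in the queue, and the remaining three are rejected immediately, which keeps latency bounded for the work that was admitted.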