I/O Multiplexing

File Descriptor

In Unix-like systems, I/O devices are abstracted as files, in keeping with the Unix philosophy that “everything is a file.” Disks, network connections, terminals, and even pipes, a mechanism for inter-process communication, are all treated as files.
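
A minimal sketch of this idea in Python, whose `os` module is a thin wrapper over the Unix system calls. The helper names `roundtrip` and `demo` are hypothetical. Bytes move through a pipe and through a regular file using exactly the same two calls, `os.write` and `os.read`, because both are reached through file descriptors.

```python
import os
import tempfile


def roundtrip(write_fd: int, read_fd: int, payload: bytes) -> bytes:
    """Hypothetical helper: write through one descriptor, read back through another.

    The same two calls are used whether the descriptors refer to a pipe or a file.
    """
    os.write(write_fd, payload)
    return os.read(read_fd, len(payload))


def demo() -> tuple[bytes, bytes]:
    # A pipe: the kernel returns two descriptors, a read end and a write end.
    r, w = os.pipe()
    try:
        via_pipe = roundtrip(w, r, b"via pipe")
    finally:
        os.close(r)
        os.close(w)

    # A regular file: one descriptor opened for writing, another for reading.
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")
        wfd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        rfd = os.open(path, os.O_RDONLY)
        try:
            via_file = roundtrip(wfd, rfd, b"via file")
        finally:
            os.close(wfd)
            os.close(rfd)
    return via_pipe, via_file


print(demo())  # (b'via pipe', b'via file')
```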

Imagine customers waiting in line at a store. An employee gives each customer a number and later uses that number to call a specific customer. The employee knows nothing else about the customer, yet can still find them with that one number. In the Unix/Linux world, using a file requires a number called a file descriptor, and it plays the same role as the waiting number in this example.

When a program opens a file, the kernel returns a file descriptor. To perform any later operation on that file, the program passes the descriptor back to the kernel, which looks up all the information associated with that number and carries out the operation. The programmer therefore only needs to keep track of one number: the file descriptor.
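
The same round trip can be sketched in Python. It assumes a temporary file named `notes.txt`, and the function name `open_and_inspect` is hypothetical. The descriptor returned by `os.open` is a plain integer. Each later call passes only that integer, and the kernel looks up the file's size and current offset.

```python
import os
import tempfile


def open_and_inspect() -> tuple[bool, int, int, bytes]:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "notes.txt")  # illustrative file name

        # open() gives back a small non-negative integer: the file descriptor.
        fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
        try:
            is_int = isinstance(fd, int) and fd >= 0

            # From here on, only the integer is passed to the kernel.
            os.write(fd, b"hello")
            size = os.fstat(fd).st_size            # kernel looks up file metadata
            offset = os.lseek(fd, 0, os.SEEK_CUR)  # kernel tracks the current position

            os.lseek(fd, 0, os.SEEK_SET)           # rewind to the start
            data = os.read(fd, size)
        finally:
            os.close(fd)                           # hand the number back to the kernel
    return is_int, size, offset, data


print(open_and_inspect())  # (True, 5, 5, b'hello')
```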

I/O Multiplexing — The secret to high concurrency

High concurrency here means that a server can handle many user requests at the same time. A server typically reads the data of a user request first and then processes it. It is common for a server to communicate with tens of thousands of users simultaneously, which means it must handle tens of thousands of file descriptors.

If the server handles requests one at a time, a single slow request (for example, one whose data has not arrived yet) blocks the entire thread, and every other request waits behind it. Multithreading could work around this. However, high concurrency would then require a very large number of threads, which places a heavy burden on thread scheduling and context switching.
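
Here is a minimal Python sketch of the multithreading workaround. The names `start_readers` and `worker` are hypothetical, and pipes stand in for client sockets. The sketch dedicates one blocking reader thread to each descriptor. It works, but the thread count grows one-for-one with the number of descriptors. That growth is the scheduling and switching burden described above once connections number in the tens of thousands.

```python
import os
import threading


def start_readers(fds: list[int]) -> tuple[list[threading.Thread], dict[int, bytes]]:
    """Hypothetical thread-per-descriptor model: one blocking reader per fd."""
    results: dict[int, bytes] = {}
    lock = threading.Lock()

    def worker(fd: int) -> None:
        data = os.read(fd, 1024)  # blocks this thread only, until fd has data
        with lock:
            results[fd] = data

    threads = [threading.Thread(target=worker, args=(fd,)) for fd in fds]
    for t in threads:
        t.start()
    return threads, results


# Pipes stand in for client connections.
pipes = [os.pipe() for _ in range(3)]
threads, results = start_readers([r for r, _ in pipes])

# "Clients" send data in reverse order; each thread wakes independently.
for i, (_, w) in enumerate(reversed(pipes)):
    os.write(w, f"msg{i}".encode())
for t in threads:
    t.join()

print(len(threads), sorted(results.values()))  # 3 [b'msg0', b'msg1', b'msg2']
for r, w in pipes:
    os.close(r)
    os.close(w)
```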

The core of the problem is that there is no way to know in advance whether the I/O device behind a given file descriptor is readable or writable. One approach is to ask the kernel, descriptor by descriptor, whether the i-th file descriptor can be read or written. A more efficient approach is to tell the kernel which file descriptors are of interest and let it notify the program when any of them becomes readable or writable. This is I/O multiplexing.
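
The idea can be sketched with Python's `select.select`, a thin wrapper over the classic `select` system call. The function name `wait_readable` and the three-pipe setup are assumptions for illustration. The program hands the kernel the whole list of descriptors it cares about in a single call and gets back only the ready ones, instead of probing each descriptor in turn.

```python
import os
import select
from typing import Optional


def wait_readable(fds: list[int], timeout: Optional[float] = None) -> list[int]:
    """Hand the kernel every fd of interest in one call; get back the readable ones."""
    readable, _writable, _exceptional = select.select(fds, [], [], timeout)
    return readable


# Three pipes stand in for three client connections.
pipes = [os.pipe() for _ in range(3)]
read_ends = [r for r, _ in pipes]

os.write(pipes[1][1], b"ping")  # only the second "client" has sent data

ready = wait_readable(read_ends, timeout=1.0)
print(ready == [read_ends[1]])  # True

for r, w in pipes:
    os.close(r)
    os.close(w)
```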

The term multiplexing is widely used in communications. To make full use of a communication line, multiple signals need to travel over a single channel, so they are combined into one. The device that combines them is called a multiplexer. The receiving side must then split the combined signal back into the original signals, and the device that does this is called a demultiplexer. I/O multiplexing borrows the same idea: a single thread monitors many file descriptors through one system call.

epoll

In essence, select, poll, and epoll are all synchronous I/O multiplexing mechanisms. Suppose none of the monitored file descriptors has an event, such as becoming readable or writable. In that case the calling thread blocks, and the function does not return until an event occurs (or until the timeout expires, if one was given).
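
The blocking behavior can be observed directly in a Python sketch. The 0.2-second delay and the function name `blocking_demo` are arbitrary assumptions. First, `select.select` is called with a short timeout on an empty pipe and returns nothing once the timeout expires. It is then called with no timeout, and the calling thread stays blocked until a background timer thread writes to the pipe.

```python
import os
import select
import threading
import time


def blocking_demo(delay: float = 0.2) -> tuple[list[int], int, float]:
    """Show that select() blocks the calling thread until an event occurs."""
    r, w = os.pipe()
    try:
        # 1) Nothing to read yet: with a 0.05 s timeout, select returns empty lists.
        timed_out, _, _ = select.select([r], [], [], 0.05)

        # 2) No timeout: the calling thread sleeps inside select until the
        #    background timer thread writes one byte after `delay` seconds.
        writer = threading.Timer(delay, os.write, args=(w, b"x"))
        start = time.monotonic()
        writer.start()
        ready, _, _ = select.select([r], [], [])
        elapsed = time.monotonic() - start
        writer.join()
    finally:
        os.close(r)
        os.close(w)
    return timed_out, len(ready), elapsed


empty, n_ready, waited = blocking_demo()
print(empty, n_ready)  # [] 1
print(f"blocked for about {waited:.2f} s")
```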

select can monitor only a limited number of file descriptors: descriptor values must be below FD_SETSIZE, which is typically 1024. Its bigger problem is that, when the process wakes up, select does not directly report which file descriptors caused the event. The program therefore has to check every monitored descriptor to find out. poll works much like select. Its main improvement is that it has no fixed limit such as 1024 on the number of file descriptors, but it still requires scanning all of them after it returns.
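
For comparison, here is a sketch of the `poll` interface through Python's `select.poll`, which is Unix-only. The helper name `poll_readable` is hypothetical. Descriptors become entries in an array rather than bits in a fixed-size set, so there is no built-in 1024 cap. However, the whole array is passed to the kernel on every call. In C, the caller then loops over every entry checking `revents`. Python's wrapper performs that loop before returning the list of entries that fired.

```python
import os
import select


def poll_readable(fds: list[int], timeout_ms: int = 1000) -> list[int]:
    """Hypothetical helper: report which fds are readable, using poll()."""
    poller = select.poll()
    for fd in fds:
        # Each fd becomes one entry in an array; there is no FD_SETSIZE-style cap.
        poller.register(fd, select.POLLIN)
    # The whole array is passed to the kernel on this call. In C the caller
    # would then loop over every entry checking revents; Python's wrapper
    # does that loop and returns only the (fd, event_mask) pairs that fired.
    events = poller.poll(timeout_ms)
    return [fd for fd, mask in events if mask & select.POLLIN]


pipes = [os.pipe() for _ in range(5)]
read_ends = [r for r, _ in pipes]
os.write(pipes[0][1], b"a")
os.write(pipes[3][1], b"b")

ready = poll_readable(read_ends)
print(sorted(ready) == sorted([read_ends[0], read_ends[3]]))  # True

for r, w in pipes:
    os.close(r)
    os.close(w)
```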

epoll, by contrast, adds a file descriptor to a ready list as soon as an event of interest occurs on it, and then wakes up the waiting process. The process can therefore retrieve the ready file descriptors directly, without scanning every monitored descriptor from beginning to end. This makes epoll very efficient when many descriptors are being monitored.
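
A Linux-only sketch with Python's `select.epoll`. The class name `ReadyWatcher` and the 100-pipe setup are hypothetical. Each descriptor is registered with the kernel once. After that, each `poll()` call (which maps to `epoll_wait`) returns only the descriptors that are already on the ready list.

```python
import os
import select


class ReadyWatcher:
    """Hypothetical wrapper around select.epoll (Linux only)."""

    def __init__(self) -> None:
        self._ep = select.epoll()  # creates an epoll instance in the kernel

    def watch(self, fd: int) -> None:
        # Register interest once (epoll_ctl with EPOLL_CTL_ADD).
        self._ep.register(fd, select.EPOLLIN)

    def ready(self, timeout: float = 1.0) -> list[int]:
        # epoll_wait: the kernel returns only entries from its ready list,
        # so nothing here loops over the descriptors that stayed idle.
        return [fd for fd, mask in self._ep.poll(timeout) if mask & select.EPOLLIN]

    def close(self) -> None:
        self._ep.close()


# 100 pipes are watched, but only one of them receives data.
pipes = [os.pipe() for _ in range(100)]
watcher = ReadyWatcher()
for r, _ in pipes:
    watcher.watch(r)

os.write(pipes[42][1], b"ping")
ready = watcher.ready()
print(ready == [pipes[42][0]])  # True

watcher.close()
for r, w in pipes:
    os.close(r)
    os.close(w)
```

Interest is registered once rather than passed in on every call. As a result, the work per call roughly tracks the number of ready descriptors rather than the total number being watched.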


© 2025. All rights reserved.