Overview
The epoll family of system calls monitors multiple file descriptors to see whether I/O is possible on any of them. The family consists of epoll_create, epoll_ctl, and epoll_wait:
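- epoll_create creates an epoll instance and returns a file descriptor that refers to it.
- epoll_ctl registers the file descriptors of interest with the epoll instance.
- epoll_wait waits for I/O events and blocks the calling thread if none are currently available.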
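    int epoll_create(int size); // create an epoll handle; size is ignored but must be greater than zero
epoll_create creates an epoll handle. Since Linux 2.6.8 the size argument is ignored, although it must still be greater than zero. The returned epoll instance is itself a file descriptor, so it must be closed with close once it is no longer needed.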
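A minimal sketch of that life cycle, assuming Linux with glibc. The helper names make_epoll and drop_epoll are hypothetical and exist only for illustration:

```c
#include <assert.h>
#include <stdio.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Create an epoll instance; returns the epoll fd, or -1 on error. */
int make_epoll(void)
{
    /* size is ignored since Linux 2.6.8 but must still be > 0 */
    int epfd = epoll_create(1);
    if (epfd == -1)
        perror("epoll_create");
    return epfd;
}

/* Release the instance; returns 0 on success, -1 on error. */
int drop_epoll(int epfd)
{
    assert(epfd >= 0);
    return close(epfd); /* an epoll instance is closed like any other fd */
}
```

Passing a size of zero or less makes epoll_create fail, which is why the sketch passes 1 even though the value is otherwise unused.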
epoll_ctl
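    int epoll_ctl(int epfd, int op, int fd, struct epoll_event *event);
epoll_ctl performs an operation on the epoll instance referred to by epfd. The op argument takes one of the following values:
- EPOLL_CTL_ADD: start monitoring fd for the events given in event
- EPOLL_CTL_DEL: stop monitoring fd
- EPOLL_CTL_MOD: change the events monitored for fd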
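The three operations can be sketched on the read end of a pipe as follows. The names epoll_apply and ctl_lifecycle are hypothetical helpers, not part of the epoll API, and the sketch assumes Linux:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Apply op (EPOLL_CTL_ADD/MOD/DEL) to fd with the given event mask.
 * Returns 0 on success, -1 on error (errno is set by epoll_ctl). */
int epoll_apply(int epfd, int op, int fd, uint32_t events)
{
    struct epoll_event ev;

    assert(op == EPOLL_CTL_ADD || op == EPOLL_CTL_MOD || op == EPOLL_CTL_DEL);
    memset(&ev, 0, sizeof ev);
    ev.events = events;
    ev.data.fd = fd;          /* user data handed back by epoll_wait */
    return epoll_ctl(epfd, op, fd, &ev);
}

/* Walk one descriptor through add, modify, delete.
 * Returns 0 if every step succeeded, -1 otherwise. */
int ctl_lifecycle(void)
{
    int p[2], rc = -1;
    int epfd = epoll_create(1);

    if (epfd == -1)
        return -1;
    if (pipe(p) == -1) {
        close(epfd);
        return -1;
    }
    if (epoll_apply(epfd, EPOLL_CTL_ADD, p[0], EPOLLIN) == 0 &&
        epoll_apply(epfd, EPOLL_CTL_MOD, p[0], EPOLLIN | EPOLLET) == 0 &&
        epoll_apply(epfd, EPOLL_CTL_DEL, p[0], 0) == 0)
        rc = 0;

    close(p[0]);
    close(p[1]);
    close(epfd);
    return rc;
}
```

EPOLL_CTL_MOD on a descriptor that was never added fails, so the order add, modify, delete matters.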
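struct epoll_event is defined as follows:
    typedef union epoll_data {
        void     *ptr;
        int       fd;
        uint32_t  u32;
        uint64_t  u64;
    } epoll_data_t;

    struct epoll_event {
        uint32_t     events; /* epoll events (bit mask) */
        epoll_data_t data;   /* user data */
    };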
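The events member is a bit mask describing the event types of interest. Commonly used flags include:
- EPOLLIN: the file descriptor is ready for reading
- EPOLLOUT: the file descriptor is ready for writing
- EPOLLPRI: urgent data is available for reading, typically out-of-band data
- EPOLLERR: an error condition occurred on the file descriptor
- EPOLLET: put the file descriptor in edge-triggered mode
- EPOLLEXCLUSIVE: added in Linux 4.5 to address the "thundering herd" problem inside the kernel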
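Because events is a bit mask, individual flags are tested with a bitwise AND. The sketch below renders a mask as text for logging; describe_events is a hypothetical helper and covers only a handful of the flags listed above:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/epoll.h>

/* Render the flags set in events as text, e.g. "EPOLLIN|EPOLLOUT".
 * Writes at most len bytes (including the NUL) into buf and returns buf. */
char *describe_events(uint32_t events, char *buf, size_t len)
{
    static const struct { uint32_t bit; const char *name; } flags[] = {
        { EPOLLIN,  "EPOLLIN"  },
        { EPOLLOUT, "EPOLLOUT" },
        { EPOLLPRI, "EPOLLPRI" },
        { EPOLLERR, "EPOLLERR" },
        { EPOLLET,  "EPOLLET"  },
    };
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++) {
        if (!(events & flags[i].bit))
            continue;                     /* flag not set in the mask */
        int n = snprintf(buf + used, len - used, "%s%s",
                         used ? "|" : "", flags[i].name);
        if (n < 0 || (size_t)n >= len - used)
            break;                        /* out of space: output is truncated */
        used += (size_t)n;
    }
    return buf;
}
```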
epoll_wait
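    int epoll_wait(int epfd, struct epoll_event *events, int maxevents, int timeout);
epoll_wait waits for events on the epoll instance referred to by epfd. The events argument is a buffer in which the kernel returns the events that occurred, maxevents caps the number of events returned by one call, and timeout is the timeout in milliseconds (-1 blocks indefinitely, 0 returns immediately). The related function epoll_pwait additionally takes a signal mask that is installed atomically for the duration of the wait, so the masked signals cannot interrupt it and the call returns only when an event occurs or the timeout expires.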
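A small sketch of a timed wait, assuming Linux. wait_readable is a hypothetical helper that creates a throwaway epoll instance for a single descriptor; a real program would normally keep one instance alive for many descriptors:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Wait up to timeout_ms for rfd to become readable.
 * Returns the number of ready descriptors (0 on timeout), or -1 on error. */
int wait_readable(int rfd, int timeout_ms)
{
    struct epoll_event ev, ready[8];
    int n, epfd;

    assert(rfd >= 0);
    epfd = epoll_create(1);
    if (epfd == -1)
        return -1;

    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;
    ev.data.fd = rfd;
    if (epoll_ctl(epfd, EPOLL_CTL_ADD, rfd, &ev) == -1) {
        close(epfd);
        return -1;
    }

    do {
        /* at most 8 events are returned per call (maxevents) */
        n = epoll_wait(epfd, ready, 8, timeout_ms);
    } while (n == -1 && errno == EINTR); /* restart with the full timeout after a signal */

    close(epfd);
    return n;
}
```

On an empty pipe the call returns 0 once the timeout expires; after a byte has been written it returns 1 straight away.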
Edge-triggered and level-triggered
epoll supports two trigger modes for a file descriptor: edge-triggered (ET) and level-triggered (LT). The manual page (man 7 epoll) explains the difference. Consider the following scenario:
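1. Thread A, the pipe reader, registers the read end of a pipe (rfd) with the epoll instance and calls epoll_wait to wait for events.
2. Thread B, the pipe writer, writes 2 kB of data to the write end of the pipe.
3. In thread A, epoll_wait returns rfd as a ready file descriptor.
4. Thread A reads 1 kB of data from rfd.
5. Thread A calls epoll_wait again.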
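If rfd was added to the epoll instance in step 1 with the EPOLLET flag, the epoll_wait call in step 5 will probably hang even though rfd still has data available to read: no new event is triggered for rfd. Meanwhile thread B may be waiting for thread A's reply to the data it already sent, while thread A keeps waiting for more request data. The reason is that edge-triggered mode delivers an event only when a change occurs on the monitored file descriptor, so in step 5 thread A ends up waiting for data that is already sitting in rfd's input buffer. In the scenario above, the write in step 2 generates a readable event on rfd, and that event is consumed in step 3. Because the read in step 4 does not consume the whole buffer, the epoll_wait call in step 5 may block indefinitely, leaving the remaining data unread.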
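The scenario can be reproduced in a single thread, which is a simplification of the two-thread description above. In the sketch below, second_wait_result is a hypothetical helper, and a zero timeout stands in for "would block" so that the program cannot actually hang:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Run the five steps on a pipe. mode is 0 for level-triggered or EPOLLET
 * for edge-triggered. Returns what the second epoll_wait (step 5) reports
 * with a zero timeout: 1 if rfd is reported again, 0 if not, -1 on error. */
int second_wait_result(uint32_t mode)
{
    int p[2], result = -1;
    char chunk[2048];
    struct epoll_event ev, out;
    int epfd = epoll_create(1);

    assert(mode == 0 || mode == EPOLLET);
    if (epfd == -1)
        return -1;
    if (pipe(p) == -1) {
        close(epfd);
        return -1;
    }
    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN | mode;
    ev.data.fd = p[0];
    memset(chunk, 'x', sizeof chunk);

    if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0 &&             /* step 1 */
        write(p[1], chunk, sizeof chunk) == (ssize_t)sizeof chunk &&  /* step 2: 2 kB */
        epoll_wait(epfd, &out, 1, 0) == 1 &&                          /* step 3 */
        read(p[0], chunk, 1024) == 1024)                              /* step 4: 1 kB */
        result = epoll_wait(epfd, &out, 1, 0);                        /* step 5 */

    close(p[0]);
    close(p[1]);
    close(epfd);
    return result;
}
```

With mode 0 the second wait reports rfd again because 1 kB is still buffered. With EPOLLET it reports nothing, which is the hang described above when the timeout is infinite.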
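An application that uses EPOLLET should use nonblocking file descriptors, so that a blocking read or write does not stall a task that is handling multiple file descriptors. The suggested way to use epoll as an edge-triggered interface is: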
- Use nonblocking file descriptors.
- Treat an event as fully handled only after read or write has returned EAGAIN.
Usage example
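The original listing for this section did not survive, so what follows is a stand-in sketch rather than the author's code. It assumes Linux and combines the two recommendations: the descriptor is made nonblocking and each readable event is drained until read reports EAGAIN. The names set_nonblocking, drain, and et_wait_and_drain are hypothetical:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/epoll.h>
#include <sys/types.h>
#include <unistd.h>

/* Put fd into nonblocking mode; returns 0 on success, -1 on error. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Read from a nonblocking fd until read() reports EAGAIN or end of file.
 * Returns the number of bytes consumed, or -1 on a real error. */
ssize_t drain(int fd)
{
    char buf[512];
    ssize_t total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, sizeof buf);
        if (n > 0) {
            total += n;       /* a real program would process buf here */
            continue;
        }
        if (n == 0)
            return total;     /* end of file: the writer closed its end */
        if (errno == EINTR)
            continue;         /* interrupted by a signal: retry */
        if (errno == EAGAIN || errno == EWOULDBLOCK)
            return total;     /* buffer empty: the event is fully handled */
        return -1;
    }
}

/* Wait once on epfd and drain every descriptor reported readable.
 * Returns the total bytes consumed (0 on timeout), or -1 on error. */
ssize_t et_wait_and_drain(int epfd, int timeout_ms)
{
    struct epoll_event ready[16];
    ssize_t total = 0;
    int n;

    assert(epfd >= 0);
    n = epoll_wait(epfd, ready, 16, timeout_ms);
    if (n == -1)
        return -1;
    for (int i = 0; i < n; i++) {
        if (ready[i].events & EPOLLIN) {
            ssize_t got = drain(ready[i].data.fd);
            if (got == -1)
                return -1;
            total += got;
        }
    }
    return total;
}
```

The sketch expects each descriptor to be registered with EPOLLIN | EPOLLET and with data.fd set to the descriptor itself, after set_nonblocking has been called on it.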
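For performance reasons, with the edge-triggered interface a file descriptor can be added to the epoll instance once with EPOLL_CTL_ADD, specifying (EPOLLIN | EPOLLOUT). This avoids repeatedly calling epoll_ctl with EPOLL_CTL_MOD to switch between EPOLLIN and EPOLLOUT.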
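A sketch of the single registration on one end of a UNIX socket pair, assuming Linux. The socket pair and the names add_rw_edge and rw_edge_demo are illustrative choices and do not come from the original text:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/epoll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Register fd once for both directions in edge-triggered mode. */
int add_rw_edge(int epfd, int fd)
{
    struct epoll_event ev;

    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN | EPOLLOUT | EPOLLET;
    ev.data.fd = fd;
    return epoll_ctl(epfd, EPOLL_CTL_ADD, fd, &ev);
}

/* Fills seen[0] with the mask reported right after registration and
 * seen[1] with the mask reported after the peer has written one byte.
 * Returns 0 on success, -1 on failure. No EPOLL_CTL_MOD call is needed. */
int rw_edge_demo(uint32_t seen[2])
{
    int sv[2], epfd, rc = -1;
    struct epoll_event out;

    assert(seen != NULL);
    if ((epfd = epoll_create(1)) == -1)
        return -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1) {
        close(epfd);
        return -1;
    }
    if (add_rw_edge(epfd, sv[0]) == 0 &&
        epoll_wait(epfd, &out, 1, 100) == 1) {
        seen[0] = out.events;             /* a fresh socket is writable */
        if (write(sv[1], "x", 1) == 1 &&
            epoll_wait(epfd, &out, 1, 100) == 1) {
            seen[1] = out.events;         /* peer data makes it readable */
            rc = 0;
        }
    }
    close(sv[0]);
    close(sv[1]);
    close(epfd);
    return rc;
}
```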
Man page Q&A
This section is excerpted from the Questions and answers part of man 7 epoll:
What happens if you register the same file descriptor on an epoll instance twice?
You will probably get EEXIST. However, it is possible to add a duplicate (dup(2), dup2(2), fcntl(2) F_DUPFD) file descriptor to the same epoll instance. This can be a useful technique for filtering events, if the duplicate file descriptors are registered with different events masks.
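This answer can be checked with a short experiment, sketched here on a pipe and assuming Linux. The struct twice_result and the function register_twice are hypothetical names:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

struct twice_result {
    int second_add_errno; /* errno from adding the same fd a second time */
    int dup_add_rc;       /* return value from adding a dup() of that fd */
};

/* Returns 0 and fills *res, or -1 if the setup itself failed. */
int register_twice(struct twice_result *res)
{
    int p[2], epfd, dupfd, rc = -1;
    struct epoll_event ev;

    assert(res != NULL);
    if ((epfd = epoll_create(1)) == -1)
        return -1;
    if (pipe(p) == -1) {
        close(epfd);
        return -1;
    }
    dupfd = dup(p[0]);

    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;
    ev.data.fd = p[0];
    if (dupfd != -1 && epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0) {
        /* Second registration of the very same descriptor. */
        errno = 0;
        if (epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == -1)
            res->second_add_errno = errno;
        else
            res->second_add_errno = 0;

        /* The duplicate is a different fd number, here with another mask. */
        ev.events = EPOLLIN | EPOLLET;
        ev.data.fd = dupfd;
        res->dup_add_rc = epoll_ctl(epfd, EPOLL_CTL_ADD, dupfd, &ev);
        rc = 0;
    }
    if (dupfd != -1)
        close(dupfd);
    close(p[0]);
    close(p[1]);
    close(epfd);
    return rc;
}
```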
Can two epoll instances wait for the same file descriptor? If so, are events reported to both epoll file descriptors?
Yes, and events would be reported to both. However, careful programming may be needed to do this correctly.
Is the epoll file descriptor itself poll/epoll/selectable?
Yes. If an epoll file descriptor has events waiting, then it will indicate as being readable.
What happens if one attempts to put an epoll file descriptor into its own file descriptor set?
The epoll_ctl(2) call will fail (EINVAL). However, you can add an epoll file descriptor inside another epoll file descriptor set.
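The two answers about epoll descriptors (self-registration fails, nesting works and the inner descriptor becomes readable) can be sketched together. This assumes Linux, and nest_result and nest_demo are hypothetical names:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

struct nest_result {
    int self_add_errno; /* errno from adding an epoll fd to itself */
    int outer_ready;    /* epoll_wait result on the outer instance */
};

/* Returns 0 and fills *res, or -1 if the setup itself failed. */
int nest_demo(struct nest_result *res)
{
    int p[2] = { -1, -1 }, inner, outer, rc = -1;
    struct epoll_event ev, out;

    assert(res != NULL);
    inner = epoll_create(1);
    outer = epoll_create(1);
    if (inner == -1 || outer == -1 || pipe(p) == -1)
        goto done;

    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;

    /* An epoll fd cannot be put into its own set. */
    ev.data.fd = inner;
    errno = 0;
    res->self_add_errno =
        epoll_ctl(inner, EPOLL_CTL_ADD, inner, &ev) == -1 ? errno : 0;

    /* inner watches the pipe, outer watches inner. */
    ev.data.fd = p[0];
    if (epoll_ctl(inner, EPOLL_CTL_ADD, p[0], &ev) == -1)
        goto done;
    ev.data.fd = inner;
    if (epoll_ctl(outer, EPOLL_CTL_ADD, inner, &ev) == -1)
        goto done;

    /* An event pending on inner makes inner readable for outer. */
    if (write(p[1], "x", 1) != 1)
        goto done;
    res->outer_ready = epoll_wait(outer, &out, 1, 100);
    rc = 0;
done:
    if (p[0] != -1) {
        close(p[0]);
        close(p[1]);
    }
    if (inner != -1)
        close(inner);
    if (outer != -1)
        close(outer);
    return rc;
}
```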
Can I send an epoll file descriptor over a UNIX domain socket to another process?
Yes, but it does not make sense to do this, since the receiving process would not have copies of the file descriptors in the epoll set.
Will closing a file descriptor cause it to be removed from all epoll sets automatically?
Yes, but be aware of the following point. A file descriptor is a reference to an open file description (see open(2)). Whenever a file descriptor is duplicated via dup(2), dup2(2), fcntl(2) F_DUPFD, or fork(2), a new file descriptor referring to the same open file description is created. An open file description continues to exist until all file descriptors referring to it have been closed. A file descriptor is removed from an epoll set only after all the file descriptors referring to the underlying open file description have been closed (or before if the file descriptor is explicitly removed using epoll_ctl(2) EPOLL_CTL_DEL). This means that even after a file descriptor that is part of an epoll set has been closed, events may be reported for that file descriptor if other file descriptors referring to the same underlying file description remain open.
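The caveat in this answer is easy to trip over, so here is a hedged sketch of it on a pipe, assuming Linux. The registered descriptor is closed while a dup of it stays open, and an event is still reported carrying the closed descriptor's number. The names close_result and close_with_dup_open are hypothetical:

```c
#include <assert.h>
#include <string.h>
#include <sys/epoll.h>
#include <unistd.h>

struct close_result {
    int registered_fd; /* fd number that was registered and then closed */
    int nready;        /* epoll_wait result after the close */
    int reported_fd;   /* data.fd of the reported event, -1 if none */
};

/* Returns 0 and fills *res, or -1 if the setup itself failed. */
int close_with_dup_open(struct close_result *res)
{
    int p[2], epfd, dupfd, rc = -1;
    struct epoll_event ev, out;

    assert(res != NULL);
    if ((epfd = epoll_create(1)) == -1)
        return -1;
    if (pipe(p) == -1) {
        close(epfd);
        return -1;
    }
    dupfd = dup(p[0]); /* second reference to the same open file description */

    memset(&ev, 0, sizeof ev);
    ev.events = EPOLLIN;
    ev.data.fd = p[0];
    res->registered_fd = p[0];
    res->nready = 0;
    res->reported_fd = -1;

    if (dupfd != -1 && epoll_ctl(epfd, EPOLL_CTL_ADD, p[0], &ev) == 0) {
        close(p[0]); /* the registration survives because dupfd is open */
        if (write(p[1], "x", 1) == 1) {
            res->nready = epoll_wait(epfd, &out, 1, 100);
            if (res->nready == 1)
                res->reported_fd = out.data.fd;
            rc = 0;
        }
    } else {
        close(p[0]);
    }
    if (dupfd != -1)
        close(dupfd);
    close(p[1]);
    close(epfd);
    return rc;
}
```

One way to avoid the stale event is to remove the descriptor with EPOLL_CTL_DEL before closing it, as the answer itself suggests.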
If more than one event occurs between epoll_wait(2) calls, are they combined or reported separately?
They will be combined.
Does an operation on a file descriptor affect the already collected but not yet reported events?
You can do two operations on an existing file descriptor. Remove would be meaningless for this case. Modify will reread available I/O.
Do I need to continuously read/write a file descriptor until EAGAIN when using the EPOLLET flag (edge-triggered behavior) ?
Receiving an event from epoll_wait(2) should suggest to you that such file descriptor is ready for the requested I/O operation. You must consider it ready until the next (nonblocking) read/write yields EAGAIN. When and how you will use the file descriptor is entirely up to you.
For packet/token-oriented files (e.g., datagram socket, terminal in canonical mode), the only way to detect the end of the read/write I/O space is to continue to read/write until EAGAIN.
For stream-oriented files (e.g., pipe, FIFO, stream socket), the condition that the read/write I/O space is exhausted can also be detected by checking the amount of data read from / written to the target file descriptor. For example, if you call read(2) by asking to read a certain amount of data and read(2) returns a lower number of bytes, you can be sure of having exhausted the read I/O space for the file descriptor. The same is true when writing using write(2). (Avoid this latter technique if you cannot guarantee that the monitored file descriptor always refers to a stream-oriented file.)