Opening /proc/self/fd/1 is not the same as dup(1) · blog.gnoack.org

A dive into Linux’ /proc file system
Introduction
Unix processes have file descriptors which point to file descriptions (struct file in Linux). Multiple file descriptors can point to the same file description, for instance by duplicating them with dup(2), or by passing them across process boundaries using fork(2) or UNIX Domain Sockets (unix(7)).
For a long time, I was under the impression that that was also what happened behind the scenes when opening /dev/fd/${FD} (a.k.a. /proc/${PID}/fd/${FD}) on Linux. I assumed I'd get a new file descriptor which is also pointing to the same file description, just like if you were calling dup(fd). That is wrong!
This feature is mis-documented
The misunderstanding is even documented in “The Linux Programming Interface” (section 5.11):
Opening one of the files in the /dev/fd directory is equivalent to duplicating the corresponding file descriptor. Thus, the following statements are equivalent:
fd = open("/dev/fd/1", O_WRONLY); fd = dup(1);
This is a fairly simple explanation which is close enough to reality for many practical use cases, and which is true on other Unixes, but it is not entirely accurate on Linux.
This RedHat bug from 2000 discusses how that behaviour was apparently changed in Linux 1.3.34. The aforementioned equivalence between the open(2) and dup(2) calls is called the “Plan9 semantics” there.
/dev/fd/* behaves differently on other Unixes
On top of that, the behavior is implemented differently on other Unixes.
From a FreeBSD 14 box:
$ ./dup -dup > out; cat out; echo
1d
$ ./dup -proc > out; cat out; echo
1d
On FreeBSD, the result of open("/dev/fd/1", O_WRONLY); does share the same file description with the original file descriptor, as if we were calling dup(1).
Part 1: An experiment!
It turns out that opening /dev/fd/*, /proc/${PID}/fd/* or /proc/self/fd/* results in a separate file description (struct file) being allocated for you, but it refers to the same underlying file on disk.
You can try it out with the following program:
$ cat dup.c
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

// Prints a short usage message.
int usage(const char *name) {
  printf("Usage: %s [-dup|-proc]\n", name);
  return 0;
}

int main(int argc, char *argv[]) {
  int fd;
  if (argc != 2) {
    return usage(argv[0]);
  }
  if (!strcmp(argv[1], "-dup")) {
    fd = dup(1);  // stdout
    if (fd < 0) {
      err(1, "dup");
    }
  } else if (!strcmp(argv[1], "-proc")) {
    fd = open("/dev/fd/1", O_WRONLY);
    if (fd < 0) {
      err(1, "open /dev/fd/1");
    }
  } else {
    return usage(argv[0]);
  }
  // Write one byte through stdout, then one byte through the second
  // descriptor; where the second byte lands depends on whether both
  // descriptors share a file position.
  write(1, "1", 1);
  write(fd, "d", 1);
  close(fd);
}
When we build and run this program, we can see that the behavior of dup(2) and open(2) is actually different!
Duplicating the file descriptor using dup(2)
$ make dup
cc -Wall -static dup.c -o dup
$ ./dup -dup > out; cat out; echo
1d
$
In the dup(2) case, the struct file is actually shared: both file descriptors refer to the exact same file description. The first write(2) updates the file description's file position (f_pos). The second write(2) uses the exact same file description, so it sees the updated file position, and the byte gets written after the one that was written before.
Duplicating the file descriptor through /proc
$ ./dup -proc > out; cat out; echo
d
$
In the /proc case, we see only one byte written to the output file.
So there are two struct files created, and they use independent positions f_pos in the file, which are both set to 0 initially.
- The first write(2) through stdout (fd 1) updates the file position of the original file description from 0 to 1.
- The second write(2) uses a separate file description, whose position is still 0, and overwrites the byte that was previously written.
That's why we can only see “d” in the output.
Other file types
So far, this was a bit confusing. It's definitely inconsistent with the theory that opening /dev/fd/* does the same as dup(2). But what happens for file types other than regular files?
TCP Sockets: Cannot be reopened through /proc
You can try this out by redirecting stdout to a socket, using the obscure /dev/tcp extension in bash:
$ nc -l 9999 &
[1] 4166
$ ./dup -proc >/dev/tcp/localhost/9999
dup: open /dev/fd/1: No such device or address
[1]+ Done nc -l 9999
$
The error here is ENXIO: No such device or address.
For sockets, the /proc/self/fd/* entry is a symlink to a name like socket:[16902].
lrwx------ 1 gnoack gnoack 64 Feb 17 23:12 1 -> 'socket:[16902]'
Pipes: Can be reopened through /proc
However, a pipe can be reopened through /dev/fd/1, for example like this:
$ ./dup -proc | cat ; echo
1d
…and this works even though the pipe's symlink looks like this:
l-wx------ 1 gnoack gnoack 64 Feb 17 23:10 1 -> 'pipe:[15895]'
Part 2: What is really happening
First, let’s recall the in-kernel VFS structure:
The following things happen in sequence:
- A user space process calls open("/proc/self/fd/1")
- System call handler:
  - parses flags
  - does the path walk, which eventually invokes proc_pid_get_link():
    fs/proc/base.c:proc_pid_get_link():
    - invokes proc_fd_link() through a callback
      fs/proc/fd.c:proc_fd_link(): looks up the original struct file* from the target task and returns the ->f_path that existed on that struct file (through an output pointer argument).
    - invokes nd_jump_link(), which sets the result of the path walk in nameidata to the previously set path!
  - eventually calls path_openat().
    namei.c:path_openat(): Always allocates a new struct file, then calls the "open" file operation: f->f_op->open
Where did the no_open pointer come from?
For the TCP socket above, f->f_op->open is set to the no_open function, which unconditionally returns ENXIO. So that socket cannot be reopened through /proc.
The decision about which f_op->open is used for each file is made in inode.c:init_special_inode, for sockets and pipes.
Summary
- Every call to open(2) results in a new struct file* being allocated.
- The resulting struct file* refers to an existing inode, even for special files like pipes.
- Not all special files support this kind of re-opening.