fcntl & cmsg

James K. Lowden

2014-09-19 19:38:19 UTC

unix(4) contains an "interesting" sentence:

"The received descriptor is a duplicate of the sender's
descriptor, as if it were created with a call to dup(2). Per-process
descriptor flags, set with fcntl(2), are not passed to a receiver."

because fcntl(2) doesn't define "per-process" descriptor flags. Does
the sentence mean "any flag set with fcntl", or are some flags
per-process? If the latter, is the reader supposed to be able to
determine which flags are per-process from the context?

McKusick distinguishes between a "file entry" describing an open file,
and a descriptor, which is an index into an array of references to file
entries. The descriptor array -- and hence each descriptor -- is
unique to each process, whereas many references to the file entry may
be created by fork() and dup(), and via unix domain sockets.

But fcntl(2) says it "provides for control over descriptors" when in
fact sometimes it updates or interrogates the file entry. Examples
include F_SETLK and (afaict) F_SETOWN. fcntl(2) does mention "flags
associated with the file descriptor", which I'm willing to believe are
"per-process". They are:

* the close-on-exec flag via F_SETFD
* the O_NONBLOCK, O_APPEND, and O_ASYNC flags via F_SETFL

(Can anyone explain why close-on-exec isn't just another option for
F_SETFL? I see that dup(2) preserves the F_SETFL flags but not
close-on-exec. Interesting choice....)
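The dup(2) observation in the aside above can be checked with a short C sketch. It assumes a POSIX system with /dev/null; the function name is hypothetical and the code only illustrates the flag behavior, it is not taken from any kernel source.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if dup(2) keeps the F_SETFL-style status
 * flags but does not copy close-on-exec, 0 if not, -1 on error. */
int dup_keeps_status_drops_cloexec(void)
{
    int fd = open("/dev/null", O_WRONLY | O_APPEND);
    if (fd < 0)
        return -1;
    if (fcntl(fd, F_SETFD, FD_CLOEXEC) < 0) {
        close(fd);
        return -1;
    }

    int copy = dup(fd);
    if (copy < 0) {
        close(fd);
        return -1;
    }

    int orig_fd_flags = fcntl(fd, F_GETFD);   /* descriptor flags of the original */
    int copy_fd_flags = fcntl(copy, F_GETFD); /* descriptor flags of the duplicate */
    int copy_fl_flags = fcntl(copy, F_GETFL); /* status flags seen via the duplicate */
    close(copy);
    close(fd);
    if (orig_fd_flags < 0 || copy_fd_flags < 0 || copy_fl_flags < 0)
        return -1;

    return (orig_fd_flags & FD_CLOEXEC) != 0 &&
           (copy_fd_flags & FD_CLOEXEC) == 0 &&
           (copy_fl_flags & O_APPEND) != 0;
}
```

Calling dup_keeps_status_drops_cloexec() from a small main() and printing the result is enough to try it on a given system.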

Do I have that aright? Are those the flags unix(4) means will not be
passed?

I'm also trying to reason about what happens when we "pass a
descriptor" over a unix domain socket (a misnomer, because we're
creating a new descriptor on an existing file entry). Something a lot
like dup(2) happens, and the new descriptor must be held by the kernel
for the benefit of the receiving process until it is received, at which
point it's installed in the process's descriptor table. Sort of like
fork in slow motion.

It's unclear to me why the semantics of descriptor-passing are different
in any way from dup(2). Are the differences considered a wart, or is
there a good reason for them?

FWIW, Linux doesn't have the same restriction: that page says, "The
passed file descriptors behave as though they have been created with
dup(2)."

Many thanks for your insight and elucidation.

--jkl

Greg Troxel

2014-09-19 22:58:58 UTC

Very good questions. I can only suggest that you read the code and
explain to us what actually happens, and then we can argue about that,
and then maybe you can send a patch for the docs. (really, I am not
kidding).

Matt Thomas

2014-09-19 23:01:06 UTC

Is the descriptor sharing its reference with another descriptor (which happens with dup(2)/dup2(2)) or did it get a separate reference via an open or socket call?

If you have a dup'ed file descriptor, setting an attribute on it affects all the other descriptors dup'ed from it. If you open a file multiple times, each descriptor is a separate reference and setting an attribute on one doesn't affect the others.

There are two levels of reference involved. A descriptor indexes into a table of file structure pointers. dup/dup2 simply duplicates the file struct pointer for one descriptor into a new slot, increasing its reference count by one.

socket/open/creat allocate a new file structure and insert a pointer to the new file structure into the table. It's this file structure which keeps the attributes of a file descriptor.
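A minimal C sketch of the two levels of reference described above, assuming a POSIX system with /dev/null. The function name is hypothetical. It sets O_NONBLOCK through one descriptor and then looks at a dup'ed descriptor (same file structure) and a separately opened one (new file structure).

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if a flag set with F_SETFL shows up on a
 * dup'ed descriptor but not on a separately opened one, 0 if not,
 * -1 on error. */
int setfl_reaches_dup_not_reopen(void)
{
    int a = open("/dev/null", O_RDONLY);
    if (a < 0)
        return -1;
    int b = dup(a);                      /* second slot, same file structure */
    int c = open("/dev/null", O_RDONLY); /* new file structure for the same file */
    int result = -1;

    if (b >= 0 && c >= 0) {
        int fl = fcntl(a, F_GETFL);
        if (fl >= 0 && fcntl(a, F_SETFL, fl | O_NONBLOCK) == 0) {
            int b_fl = fcntl(b, F_GETFL);
            int c_fl = fcntl(c, F_GETFL);
            if (b_fl >= 0 && c_fl >= 0)
                result = (b_fl & O_NONBLOCK) != 0 && (c_fl & O_NONBLOCK) == 0;
        }
    }

    if (c >= 0)
        close(c);
    if (b >= 0)
        close(b);
    close(a);
    return result;
}
```

Note that this exercises only the status flags set with F_SETFL; the close-on-exec flag set with F_SETFD is kept per descriptor slot and is not covered here.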

David Holland

2014-09-22 05:26:47 UTC

Post by James K. Lowden
"The received descriptor is a duplicate of the sender's
descriptor, as if it were created with a call to dup(2). Per-process
descriptor flags, set with fcntl(2), are not passed to a receiver."
because fcntl(2) doesn't define "per-process" descriptor flags. Does
the sentence mean "any flag set with fcntl", or are some flags
per-process? If the latter, is the reader supposed to be able to
determine which flags are per-process from the context?

I don't know what the documentation is supposed to mean, but the way
it's supposed to behave is:

- fd passing is like calling dup2() except that (once received) the
new fd appears in another process;

- this means that the close-on-exec flag, which is per file handle,
is not shared;

- everything else is an attribute of the open file object (or in
some cases the vnode, e.g. with F_SETLK or flock()) and should be
shared between the two references.
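The three points above can be exercised with a short C sketch that passes a descriptor over a socketpair using SCM_RIGHTS. To stay short it sends and receives within a single process, which is an assumption of this example rather than the usual two-process setup. The helper names send_fd, recv_fd and passed_fd_flags are hypothetical.

```c
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer big enough for one descriptor, aligned for a cmsghdr. */
union fd_cmsg {
    struct cmsghdr hdr;
    char buf[CMSG_SPACE(sizeof(int))];
};

static int send_fd(int sock, int fd)
{
    char byte = 'x';                     /* one byte of ordinary data */
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;

    memset(&msg, 0, sizeof msg);
    memset(&u, 0, sizeof u);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    c->cmsg_level = SOL_SOCKET;
    c->cmsg_type = SCM_RIGHTS;
    c->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(c), &fd, sizeof fd);
    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    int fd;

    memset(&msg, 0, sizeof msg);
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof u.buf;
    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *c = CMSG_FIRSTHDR(&msg);
    if (c == NULL || c->cmsg_level != SOL_SOCKET || c->cmsg_type != SCM_RIGHTS)
        return -1;
    memcpy(&fd, CMSG_DATA(c), sizeof fd);
    return fd;
}

/* Returns 1 if the received descriptor lacks close-on-exec but still has
 * O_NONBLOCK, 0 if not, -1 on error. */
int passed_fd_flags(void)
{
    int sv[2], result = -1;
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
        return -1;

    int fd = open("/dev/null", O_RDONLY | O_NONBLOCK);
    if (fd >= 0 && fcntl(fd, F_SETFD, FD_CLOEXEC) == 0 &&
        send_fd(sv[0], fd) == 0) {
        int got = recv_fd(sv[1]);
        if (got >= 0) {
            int fd_flags = fcntl(got, F_GETFD);
            int fl_flags = fcntl(got, F_GETFL);
            if (fd_flags >= 0 && fl_flags >= 0)
                result = (fd_flags & FD_CLOEXEC) == 0 &&
                         (fl_flags & O_NONBLOCK) != 0;
            close(got);
        }
    }
    if (fd >= 0)
        close(fd);
    close(sv[0]);
    close(sv[1]);
    return result;
}
```

The sender's descriptor has both close-on-exec and O_NONBLOCK set; the sketch reports whether only the latter is visible on the received descriptor.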

Post by James K. Lowden
McKusick distinguishes between a "file entry" describing an open file,

I think by "file entry" this means "open file object" or in NetBSD,
"struct file". The per-process open file table is an array of
references to these.

They aren't the same as vnodes; vnodes are one layer deeper.

Some people use the words "file descriptor" to refer to the open file
object as well as to the integer that constitutes a user-level
reference to it. This can occasionally be confusing...

Post by James K. Lowden
(Can anyone explain why close-on-exec isn't just another option for
F_SETFL? I see that dup(2) preserves the F_SETFL flags but not
close-on-exec. Interesting choice....)

Because it's per-handle instead of per-object. That in turn is to make
close-on-exec and I/O redirection interact productively.
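One reading of the remark above, sketched in C: a program can keep a descriptor marked close-on-exec and still hand it to a child by dup2()ing it onto the slot the child expects, because the new slot does not carry the flag. The function name is hypothetical, and a second /dev/null descriptor stands in for descriptor 1 so the example does not disturb the caller's real standard output.

```c
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if the dup2() target is left without
 * close-on-exec while the source keeps it, 0 if not, -1 on error. */
int redirect_target_survives_exec(void)
{
    int logfd = open("/dev/null", O_WRONLY);  /* private descriptor */
    int target = open("/dev/null", O_RDONLY); /* stand-in for fd 1 */
    int result = -1;

    if (logfd >= 0 && target >= 0 &&
        fcntl(logfd, F_SETFD, FD_CLOEXEC) == 0 &&
        dup2(logfd, target) == target) {
        int src_flags = fcntl(logfd, F_GETFD);
        int dst_flags = fcntl(target, F_GETFD);
        if (src_flags >= 0 && dst_flags >= 0)
            /* After exec, logfd would be closed and target would remain. */
            result = (src_flags & FD_CLOEXEC) != 0 &&
                     (dst_flags & FD_CLOEXEC) == 0;
    }

    if (target >= 0)
        close(target);
    if (logfd >= 0)
        close(logfd);
    return result;
}
```

If close-on-exec lived in the shared open file object instead, the redirected slot would inherit it and be closed at exec along with the original.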

--
David A. Holland
***@netbsd.org

James K. Lowden

2014-09-23 13:38:47 UTC

On Mon, 22 Sep 2014 05:26:47 +0000

Post by David Holland
- fd passing is like calling dup2() except that (once received) the
new fd appears in another process;
- this means that the close-on-exec flag, which is per file handle,
is not shared;
- everything else is an attribute of the open file object (or in
some cases the vnode, e.g. with F_SETLK or flock()) and should be
shared between the two references.

Thank you, David. That conforms with my tests.

The fd read by the receiving process has the very same properties as
the sender's, except that close-on-exec is cleared. Other flags,
notably O_ASYNC and O_DIRECT, remain in force. So do record locks.

My question arose from the unix(4) page because it refers to "per-process
descriptor flags". The Linux fcntl(2) page I have handy seems to hold a
clue:

"The [F_GETFD and F_SETFD] commands manipulate the flags
associated with a file descriptor. Currently, only one such flag is
defined: FD_CLOEXEC, the close-on-exec flag."

It would seem that while the earth was still cooling someone had the
intention of creating other flags associated with the descriptor that
dup(2) would not propagate. Our unix(4) stands athwart the continents
declaring all such flags behave that way, a fact beyond dispute. Never
mind that the set has only one member!

I don't feel qualified to offer a change to the documentation yet. At
Greg's suggestion I did start to read the code, but I'm a long way from
understanding it.

--jkl

David Laight

2014-10-15 23:10:43 UTC

Post by David Holland

I don't know what the documentation is supposed to mean, but the way
it's supposed to behave is:
- fd passing is like calling dup2() except that (once received) the
new fd appears in another process;
- this means that the close-on-exec flag, which is per file handle,
is not shared;
- everything else is an attribute of the open file object (or in
some cases the vnode, e.g. with F_SETLK or flock()) and should be
shared between the two references.

IIRC...
Except that (posix) file locks are per process, not per open file.
This is fubar:
1) you can't lock against another thread in the same process.
2) if another thread opens and closes the same file you lose your locks.
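The second point can be reproduced with a short C sketch, assuming POSIX fcntl() record locks and a writable /tmp. The helper names are hypothetical. A forked child uses F_GETLK to report whether the parent's lock is still visible, before and after the parent opens and closes an unrelated descriptor for the same file.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that asks, via F_GETLK, whether another process holds a
 * conflicting lock on path. Returns 1 if locked, 0 if not, -1 on error. */
static int locked_as_seen_by_child(const char *path)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET };
        int fd = open(path, O_RDWR);
        if (fd < 0 || fcntl(fd, F_GETLK, &fl) < 0)
            _exit(2);
        _exit(fl.l_type == F_UNLCK ? 0 : 1);
    }
    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status))
        return -1;
    int code = WEXITSTATUS(status);
    return code == 2 ? -1 : code;
}

/* Returns 1 if the lock is visible before, and gone after, the locking
 * process opens and closes a second descriptor for the same file. */
int lock_lost_on_unrelated_close(void)
{
    char path[] = "/tmp/fcntl-lock-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;

    int result = -1;
    struct flock fl = { .l_type = F_WRLCK, .l_whence = SEEK_SET }; /* whole file */
    if (fcntl(fd, F_SETLK, &fl) == 0) {
        int before = locked_as_seen_by_child(path);

        int other = open(path, O_RDONLY); /* e.g. done by another thread */
        if (other >= 0) {
            close(other);                 /* drops this process's locks on the file */
            int after = locked_as_seen_by_child(path);
            if (before >= 0 && after >= 0)
                result = before == 1 && after == 0;
        }
    }
    close(fd);
    unlink(path);
    return result;
}
```

The child never owns a lock itself, so its own open and exit do not disturb the parent's lock; only the close in the locking process does.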

David

--
David Laight: ***@l8s.co.uk