Discussion:
How Unix manages processes in userland
Erik Fair
2013-12-05 20:34:46 UTC
So, daemons - watched over or not?

There's been a philosophical split (sort of) between the BSD/v7 school of thought and the System III/V school of thought, on how the Unix system should start up and manage userland processes where (background) daemons are concerned, with the structure and function of process 1 (whether you call that /{etc,sbin}/init or something else) at the center of it.

Process 1 has always been responsible for reaping processes whose parents did not do so (e.g. processes not started by shells), but what it does with those events beyond simple status collection has varied.

In BSD-land, derived from Version 7 Unix (or 7th Edition, if you prefer), daemons are expected to fork into independent processes and their parents exit, leaving them with the default parent of process 1. Monitoring independent daemons has been somewhat ad-hoc and messy; we've left process 1 more or less alone in this regard; it merely collects status (with one exception).

In USG/System III/V-land, they opted to add a more general process monitoring facility into process 1, in what's known as "inittab" (or by other names in other variants). I'd argue: good idea, pretty terrible implementation - when System III shipped, it was clear to me what needed to be done: the "daemonization" routines of all daemons run by inittab needed to be removed (no more fork/exit) so that /etc/init could properly manage those processes, and restart them when they died (and terminate them when the system is being shut down). I did that in the early 1980's at Dual Systems, a small mc68k Unix-box manufacturer that was my employer, and it worked well. Had to redo all the work for System V, alas (it always annoyed me that USG/AT&T added new facilities like inittab and then didn't perform the necessary code rototill for the system to properly use them), and I've never liked inittab's "run levels".

It is important to note right here that there's one area in which both schools of thought agreed: user login sessions initiated by getty & login on ttys needed to be explicitly managed by process 1. One can argue that USG/System III people merely extended that model to daemons, too.

One can also argue that BSD simply didn't change what had been inherited from v7 Unix, and then went and did its own thing when TCP/IP (network) sessions over telnet, rlogin, et alia, showed up. You don't have to hang getty off a pty (unlike a tty) to accept a network user session. A good thing, too, but that's where we get inetd(8) from.

Why isn't that inetd stuff in process 1, too? Reasonable fear of code bloat & bugs, I suppose, and a philosophy that process 1 needs to be as simple as possible so that it can be reasonably expected to work properly (after all, if process 1 dies unexpectedly, all kinds of bad bad things happen).

One more important aside that we should consider: "user sessions" now come in more flavors than a person pounding on a tty (pty) with a shell (or three): there's FTP logins, IMAP/POP logins, and so on. There have been some attempts at reflecting those in utmp(5) but I don't think anyone has been consistent about it. I think we ought to tie that stuff into the basic authentication libraries, i.e. when a user authenticates for something, if it's going to last more than a second or three (i.e. a user is asking for a "session"), it ought to get an entry in utmp(5) and wtmp(5) so that you can see with who(1) or w(1) the users of the system and what sort of session they're in.

We're NetBSD - it should be easy to see what the Network users are doing in our systems (never mind http or NFS, for now …).

End of "user session" digression - back to daemon management.

NetBSD's rc.d(8) system is great - proper dependency management, and it's easy to manually start, stop, or restart a given daemon or service, but we totally fall down on daemon monitoring - they're expected to "just work" (perfect code!) and if they're important enough, someone will notice and manually restart when they die. Or not.

I've had some problems with that - named(8) likes to die on some of my systems because it's a big, complicated beast, and the Internet now encompasses enough of the world that the totality of all code paths through named are being relatively regularly exercised and bugs discovered quite rapidly in deployment, but not fixed anywhere near fast enough. So, I wrote a little shell script for cron(8) the other day to keep those daemon processes that are polite enough to leave a PID file in /var/run alive, and after testing in my own environment, I posted it to tech-userlevel for those who might also be having the same problems. It's a simple, somewhat hacky patch to a design deficiency in NetBSD.
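
The idea, in outline - a sketch of the approach, not the exact script
I posted (it assumes a pid file is named after its rc.d script, which
is only mostly true):

    #!/bin/sh
    # keepalive sketch, run from cron: restart any rc.d service whose
    # pid file in /var/run names a process that no longer exists.
    for pidfile in /var/run/*.pid; do
        [ -f "$pidfile" ] || continue
        svc=$(basename "$pidfile" .pid)
        [ -x "/etc/rc.d/$svc" ] || continue    # only things rc.d knows
        pid=$(head -1 "$pidfile")
        kill -0 "$pid" 2>/dev/null && continue # still running
        logger -t keepalive "$svc (pid $pid) died; restarting"
        /etc/rc.d/"$svc" restart
    done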

The right place to deal with all of this is in process 1. It is deemed responsible for startup & shutdown of the system, which mode (single user, multi-user) to run in, the secure levels (ugh), and ultimate reaping of all processes, so it "knows" a priori whether a daemon should be running or not, and it can know whether one actually is, provided the relationship between a daemon (service) and its PID is known. The trick is in expressing in some kind of configuration system what we want in a simple but hopefully sufficiently rich syntax.

However, I don't like either of the two schemes I've seen to date for dealing with the issue. I've already expressed my distaste for inittab(5) as I've seen it (has Linux done something more sensible with it in the last many years?), and I had a look at Apple's OS X "launchd" and I don't like it either - it really wants to be talked to through a control program interface (launchctl, with yet another control language to learn) rather than allowing one to simply edit configuration files.

Worse, neither system has proper dependency management as we have in rc.d(8), and I really, really don't want to lose that.

So, clear statement of the problem: daemons should be started and managed by process 1 because it is in a position to monitor for their death and restart them as necessary, and log those deaths (kern.logsigexit is OK but not really the right thing, and I was the one who ported it from FreeBSD), but we need a configuration system for process 1 that not merely names all the daemons (services) to be started/stopped, but also expresses dependency for both startup and shutdown.
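
To make the shape of the thing concrete, a strawman - every keyword
below is invented, and the syntax is entirely up for debate:

    # hypothetical /etc/daemons.conf
    named:
        command=/usr/sbin/named -f    # run in foreground; no fork/exit
        requires=network syslogd      # startup order; reversed at shutdown
        restart=yes                   # respawn on death, log each death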

your comments and thoughts are solicited,

Erik <***@netbsd.org>
David Brownlee
2013-12-05 21:04:31 UTC
Post by Erik Fair
So, clear statement of the problem: daemons should be started and managed by process 1 because it is in a position to monitor for their death and restart them as necessary, and log those deaths (kern.logsigexit is OK but not really the right thing, and I was the one who ported it from FreeBSD), but we need a configuration system for process 1 that not merely names all the daemons (services) to be started/stopped, but also expresses dependency for both startup and shutdown.
A possible variant - process 1 needs to be involved, but does it need
to do everything?

We could have another process responsible for starting daemons and
'registering' their pids with init, and init could notify it when one
of them dies (or if the pid is not present by the time the
registration is processed). Whether this could be less complex than
just putting the functionality in init is another question...
James K. Lowden
2013-12-07 21:40:55 UTC
On Thu, 5 Dec 2013 21:04:31 +0000
Post by David Brownlee
Post by Erik Fair
So, clear statement of the problem: daemons should be started and
managed by process 1 because it is in a position to monitor for
their death and restart them as necessary, and log those deaths
(kern.logsigexit is OK but not really the right thing, and I was
the one who ported it from FreeBSD), but we need a configuration
system for process 1 that not merely names all the daemons
(services) to be started/stopped, but also expresses dependency for
both startup and shutdown.
A possible variant - process 1 needs to be involved, but does it need
to do everything?
We could have another process responsible for starting daemons and
'registering' their pids with init, and init could notify it when one
of them dies (or if the pid is not present by the time the
registration is processed).
ISTM you're on the right track here. We could take advantage of
the fact that process 1 reaps the exit status of expiring daemons. But
it doesn't know the process's name, much less how or whether to
restart it, nor should it. And that's OK.

A watchdog daemon could monitor /proc with kevent. It could take
note of the appearance of "interesting" processes in /proc, those
that it's charged with monitoring and perhaps restarting. When the
process disappears, it does what it's configured to do.

A very simple BSD approach would be to keep a list of interesting daemons
in, say, /etc/watchdog. For anything in that list, the watchdog would
use rc.d to restart the process. To add a little intelligence, rc.d
could be extended with an "auto-restart" action for the watchdog to
prefer if available.
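
A polling sketch of that loop (the kevent-on-/proc version would react
to exits rather than poll; this assumes each name listed in
/etc/watchdog has an rc.d script of the same name):

    #!/bin/sh
    # watchdog sketch: restart anything listed in /etc/watchdog
    # that is no longer running.
    while read svc; do
        case "$svc" in ''|'#'*) continue ;; esac  # skip blanks, comments
        if ! /etc/rc.d/"$svc" status >/dev/null 2>&1; then
            logger -t watchdog "$svc is down; restarting"
            /etc/rc.d/"$svc" restart
        fi
    done < /etc/watchdog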

That does leave unanswered the question of who's watching the
watchers. All the more reason to keep the watchdog itself simple and
separated from the restart logic.

I see no advantage to the launchd model or to changing the way daemons
daemonize, nor any reason to add anything to the daemon's list of things
it must do for the sake of management.

--jkl
Matt Thomas
2013-12-05 21:53:09 UTC
Post by Erik Fair
your comments and thoughts are solicited,
And then there's OS X launchd :)
Robert Elz
2013-12-06 00:01:28 UTC
There is no reason that management of daemons, or for that matter,
logins on ttys, needs to be done by process 1. tty login management
isn't done that way because it has to be, but because it always has been
(and it gives init some work to do, something less morbid than just
being a graveyard for orphans.)

That process 1 inherits orphans is irrelevant - there's no information
content in the death of an orphan except that it happened - there's no
meaningful way to extract identity from them, etc.

If we have switched from a fork & exit paradigm (the "old" way of
starting daemon processes, which makes monitoring using wait()
essentially impossible) to one where daemons are run (as daemons)
by some controlling process, which then (perhaps) cleans up and restarts
them when they die, then any process at all can be the parent process;
it certainly doesn't need to be process 1.

What's more, there can be lots of different ones - if you like launchd,
you could run it, if you like linux's inittab processing, you can run it.
If you like something different, you can run it - what's more, you can
run all of them in parallel if you like, each managing a particular subset
of the daemons that are best suited to the particular system's management
style.

There's no need to impose anything in particular on anyone, just create
a suitable monitoring program, and add it to pkgsrc - people who like it
can run it. All that might be needed is to make sure that any daemon
processes that might want to be run have some kind of "don't fork" option.
Most do, to ease debugging, but I think, not quite all of them.
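
Such a monitor can be tiny. A sketch, where "somed" is a made-up
daemon and "-f" stands for whatever its don't-fork option is:

    #!/bin/sh
    # minimal per-daemon supervisor; note it need not be process 1
    while :; do
        /usr/sbin/somed -f
        logger -t supervise "somed exited (status $?); restarting"
        sleep 1    # don't spin if it dies immediately
    done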

kre
Mouse
2013-12-06 04:57:54 UTC
Post by Robert Elz
There is no reason that management of daemons, or for that matter,
logins on ttys, needs to be done by process 1. [...more on daemon
process management...]
What kre said.
Post by Robert Elz
All that might be needed is to make sure that any daemon processes
that might want to be run have some kind of "don't fork" option. Most
do, to ease debugging, but I think, not quite all of them.
Some of the ones that do have that option also have it do something
else, suitable for debugging but undesirable here, such as forcing
logging to stdout. I really think all the daemons need at least a
look-in for this....

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
Matthew Orgass
2013-12-06 09:55:14 UTC
Post by Robert Elz
There's no need to impose anything in particular on anyone, just create
I started out disagreeing, but I think I somewhat changed my mind before
hitting send :/. What you say makes sense to me in that modularity
encourages more well defined interactions between simpler parts (dealing
with packages, etc.), which makes it easier to figure out what is going on
at each step of the way. I do think it is super important to have
something that works well in the base system.

I've had two basic problems with most init systems: the most common is
just having no idea WTF is going on or how to find out, and the much less
common one is wanting to replace as little as possible if the system just
doesn't do what I want. Being able to replace the whole system doesn't
help if there is no other system that does all of what I want, and it at least
means I need to learn how some other system works.

NetBSD seems to do very well on both of these basic issues and I think
it is a major reason to use NetBSD. I've had the misfortune of being
stuck mostly running various versions of Linux over the past year or two
and NetBSD's rc.d is one of the things I miss most (sh, mtree, vis, and
build.sh are other things that come to mind easily). My current Linux
system uses systemd and netctl and when something fails it is either
silent or tells me to check two different log files which have a bunch of
lines but usually nothing more helpful than "... status: FAILED".

daemontools is one that uses the "don't fork" model with per-service
monitors. I think it has some good ideas and is much simpler than the
others I've seen. I don't think it deals with dependencies, though, which
is a major issue. I think it would be great if something like that was
merged nicely with rc.d and used by default.
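
For reference, a daemontools-style per-service run script is about
this small (a sketch; "somed" and its "-f" flag are made up):

    #!/bin/sh
    # /service/somed/run - supervise respawns this whenever it exits
    exec 2>&1                  # stderr to the service's log
    exec /usr/sbin/somed -f    # run in foreground; no fork/exit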

daemontools has a fghack program to try to work around services that
fork by opening some pipes before starting the service and exiting when
they all get closed.

djb also wrote a single-service inetd-like program (ucspi-tcp).

OTOH, I'm personally not that fond of having an extra monitoring process
for every service and I also think it makes sense for init to do generic
"is it running" monitoring. I don't think it makes that much difference
either way, though, and I don't think init should try to do anything more
complex than checking whether something is running and restarting it if not.

Another thing that I think should be part of a service monitoring system
is dbus type "message bus" functionality and ability to deal with on
demand services. Unfortunately, dbus itself has a quite unfriendly UI.

IMO, the basic Unix/BSD security model, while useful for servers, does
not cover how users actually interact with a personal computer. Fixing
that involves defining and limiting what any individual application and
application instances can do (in a way relevant to the user, such as this
app can only modify files in this particular directory or must get
approval for files to be written through a notification mechanism or
something like that). Which probably involves moving more things into a
service realm with processing capability but not the direct ability to do
things with side effects. With fewer and simpler programs doing things
with side effects (and the OS enforcing this), it would be easier to
visualize and control what those side effects are.

-Matt
James K. Lowden
2013-12-07 21:40:52 UTC
On Fri, 6 Dec 2013 04:55:14 -0500 (EST)
Post by Matthew Orgass
the basic Unix/BSD security model, while useful for servers, does
not cover how users actually interact with a personal computer.
For that to be true, you'd have to explain how a personal computer is
not like a server, and why that matters to the security model. ISTM
the Unix security model was invented precisely to control user
interaction with the system.
Post by Matthew Orgass
Fixing that involves defining and limiting what any individual
application and application instances can do (in a way relevant to
the user, such as this app can only modify files in this particular
directory
So you want to associate permissions with programs instead of users.
Which is what setuid(2) gives you without creating a new vector of
things that can have permissions granted to them. That it's not used
very much suggests to me the "application may do" model is of limited
use.

--jkl
Matthew Orgass
2013-12-09 13:26:13 UTC
Post by James K. Lowden
Post by Matthew Orgass
the basic Unix/BSD security model, while useful for servers, does
not cover how users actually interact with a personal computer.
For that to be true, you'd have to explain how a personal computer is
not like a server, and why that matters to the security model. ISTM
the Unix security model was invented precisely to control user
interaction with the system.
Servers usually are trying to accomplish a more limited task, so it is
often possible to run fewer and simpler applications and to put some
effort into determining what should be run and as what uid/gid and then
setting it up so that happens. I think the basic issues affect servers as
well, just much less severely. Personal computer users run and install
software without trusting the source, but the current security model is
based on trusting applications. Often it is quite complex software and we
want to be able to try it and see how it works before putting any
significant effort into setup.

In terms of security properties, I think what I am most looking for is
to control:

1) what gets written to disk, particularly so that stuff I care about
doesn't get overwritten but also to be able to tell what is using disk
space and be able to delete stuff that isn't needed when the disk fills up
(and ideally prevent it from unnecessarily filling up in the first place)

2) private data should not be removed from my computer without my consent

3) preserve a trusted communication path with various levels of control
software

4) get good battery life on mobile devices and still be able to play audio
and video without skipping

With #3, #1 could be implemented as a reliable rollback mechanism. #2 is
harder, but very important IMO. Solving #2 is likely to make other
options available for #1. Also, in general an application like mplayer
usually has no need to write to any file at all but contains a lot of
complex code from a variety of sources, so being able to easily and
reliably prevent it from doing that ever would eliminate a whole class of
possible issues (there are still side channel issues, of course, and
potential issues with whatever i/o methods it uses).

Networking was never really integrated into the Unix security model that
much, and being able to revoke any network access for a process and its
descendants seems like a basic minimal starting point to even possibly be
able to address #2.

#4 has often not been classified as a security issue, but unintended
sucking of battery life is a DoS attack that can be very significant in
practice and the control methods needed to deal with it are closely
related to other security properties.
Post by James K. Lowden
Post by Matthew Orgass
Fixing that involves defining and limiting what any individual
application and application instances can do (in a way relevant to the
user, such as this app can only modify files in this particular
directory
So you want to associate permissions with programs instead of users.
Which is what setuid(2) gives you without creating a new vector of
things that can have permissions granted to them. That it's not used
very much suggests to me the "application may do" model is of limited
use.
setuid doesn't really do most of what I am looking for and is clunky at
what it does do. True capability systems are one extreme form of
application based privileges and I think there are some ways that those
ideas can be useful within an overall more unixy framework.

It seems to me that one core aspect of the traditional unix model is
"the privileges of a process or any of its direct decendents will never
increase" and IMO that seems likely to make a better foundation than a
true capability system if most code runs with almost no privileges and
simpler privileged code organizes higher level tasks (this is much more
easily said than made to work in practice :/). OTOH, there are already
multiple ways that things don't quite work that way in practice and it
seems to me like some such way is needed. IMO, descriptor passing should
be the main such method and there should probably be some increased
ability to pass capabilities via file descriptors (being careful how this
affects current apps). OTOH, passing a descriptor should give the
application the ability to use the descriptor, not have some magical
effect.

Describing file privileges in terms of read/write/execute for
user/group/other also seems fundamental to me; it might be possible to
tweak that a little (I've wondered if subuser users and groups might make
sense or some other method of one login user using multiple uids), but any
major change probably means it is a bad idea to try to run in the same OS
as apps designed for the current model. I think it would be possible to
associate decreased permissions with an application by some other method,
but not increased.

Since networking is currently so permissive there is a lot that could be
done there that would cause current apps to fail in a safe way when run in
the restrictive environment. I also think there could be a unified
privilege delegation model that could assign particular privileges to
particular uids without breaking the fundamental security model (NetBSD
already has some ways of doing that), although there are also many failed
attempts to do this and I don't like any of the more general ones I've
seen (most of what is called capabilities on unixy systems are trying to
do this).

Whatever basic model is used, the main challenge seems to me to be able
to define privileges in a way that is useful but not annoying to the user
and can be enforced reliably. When try 1 at doing this fails it should
also be possible to make a reasonably safe transition to try 2. It should
be possible to have split points in simpler privileged code that can
choose one method or another such that most code will not have access to
both methods, although this can easily get complicated. Some possible
uses of a dbus type system would need to do this to be safely used with
current code.

Things like systrace and SELinux try to apply privilege restriction at
a level where there are complex interactions between things treated as
separate privileges.

I don't think Plan9 directly tried to address the issues I mentioned but
has some ideas that might be helpful.

Phone oriented OSes seem to be trying various new security models,
although I don't know much about the details.

It might make sense to create a basic "server of hardware" via the
current code base and run a completely different OS with a completely
different security model in a virtual machine. OSes with significantly
different security models that I know about have so far tried to run
directly on hardware, which makes it very difficult to support enough
hardware to reach enough potential users to possibly have an impact.

OTOH, I tend to think it is possible to adapt the current model in a way
that provides a smooth transition with a configurable level of security
vs. interoperability with current applications that fails in a safe way.
Some things that make sense to me to start in that direction are:

1) a disable_network() system call and no_new_file_descriptors() system
call (and equivalent flags to posix_spawn). Possibly some other way to
pass descriptors that doesn't use AF_UNIX.

2) a highly restricted execution environment such that a process basically
can just use the file descriptors passed to it. Also, a way to associate
this with a particular application such that it cannot be accidentally run
outside that restricted environment. This might need changes to how
shared libraries are loaded to work.

3) a service framework and new shell functionality to help organize #2

-Matt
David Young
2013-12-11 05:44:51 UTC
Post by Matthew Orgass
In terms of security properties, I think what I am most looking
for is to control:
1) what gets written to disk, particularly so that stuff I care
about doesn't get overwritten but also to be able to tell what is
using disk space and be able to delete stuff that isn't needed when
the disk fills up (and ideally prevent it from unnecessarily filling
up in the first place)
2) private data should not be removed from my computer without my consent
3) preserve a trusted communication path with various levels of
control software
4) get good battery life on mobile devices and still be able to play
audio and video without skipping
With #3, #1 could be implemented as a reliable rollback mechanism.
#2 is harder, but very important IMO. Solving #2 is likely to make
other options available for #1. Also, in general an application
like mplayer usually has no need to write to any file at all but
contains a lot of complex code from a variety of sources, so being
able to easily and reliably prevent it from doing that ever would
eliminate a whole class of possible issues (there are still side
channel issues, of course, and potential issues with whatever i/o
methods it uses).
Networking was never really integrated into the Unix security
model that much, and being able to revoke any network access for a
process and its descendants seems like a basic minimal starting point
to even possibly be able to address #2.
#4 has often not been classified as a security issue, but
unintended sucking of battery life is a DOS attack that can be very
significant in practice and the control methods needed to deal with
it are closely related to other security properties.
It sounds like you want to give the user (and his agents: programs)
more fine-grained control over program resources. I don't
think that these controls are necessarily the same as security
properties/mechanisms/policies.

We can say, we need more control because there are security issues. Or
we can say, we have security issues because we lack control. It's more
useful and truthful to say the latter, I think: we lack control over
the resources programs use. If a program gets to use any of a user's
resources, it gets to use them all, so we have to be very careful what
programs we run.

I think that with good resource control, we will solve more problems
than we will solve with a security mechanism, but we will surely
solve the security problems, too. For example, I would like for my
NetBSD boxes both to conserve energy and to reduce heat and noise
most of the time, but it does not matter to me if build.sh and its
descendant processes make those boxes noisily draw power and generate
heat. Really, I would like to grant build.sh unlimited watts, but put
all other processes on an energy budget. If I had energy budgets for
processes, surely I could run an untrusted program under a strict power
limit?

If I could limit the universe of network 5-tuples for a process, so that
it could not bind(2) arbitrary local addresses or ports, connect(2)
to or send(2) to arbitrary remote hosts, then that would be enough to
implement a lot of useful policies for diagnostics, privacy, etc.

If I also could limit the number of unique disk blocks a program could
use, and the number of pages of virtual memory, and if I could restrict
the directories where it could link or unlink files, then maybe I could
depend on some untrusted program to apply some useful algorithm to its
standard input and write the result on its standard output, or else
crash trying to exceed its power, storage, network, or memory limits.
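
Pieces of that are crudely approximable today with sh's ulimit - a
sketch only (option letters and units vary by shell, and nothing here
touches the network or energy):

    # run an untrusted filter under coarse resource limits
    ( ulimit -t 60        # CPU seconds
      ulimit -v 262144    # virtual memory, in kbytes
      ulimit -f 20480     # largest file it may write, 512-byte blocks
      exec ./untrusted-filter < input > output )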

Dave
--
David Young
***@pobox.com Urbana, IL (217) 721-9981
Matthew Orgass
2013-12-11 15:42:31 UTC
Post by David Young
We can say, we need more control because there are security issues. Or
we can say, we have security issues because we lack control. It's more
useful and truthful to say the latter, I think: we lack control over
the resources programs use. If a program gets to use any of a user's
resources, it gets to use them all, so we have to be very careful what
programs we run.
That makes sense to me too. I think the two perspectives should be the
same, but aren't necessarily the same in practice, so IMO it would make
sense to consider particular issues both ways and try to solve for both
:).
Post by David Young
If I also could limit the number of unique disk blocks a program could
use, and the number of pages of virtual memory, and if I could restrict
the directories where it could link or unlink files, then maybe I could
depend on some untrusted program to apply some useful algorithm to its
standard input and write the result on its standard output, or else
crash trying to exceed its power, storage, network, or memory limits.
Yes, like that. Consider a restricted process that acts like a web
server and which I access through a web browser (via the service
framework, so the process itself doesn't deal with how it gets connected
to the web browser). I think there is both an access control issue (the
process only gets to interact with its data) and a resource control issue
(how much data can it store and how much processor and memory does it get
to use, etc). The web browser would filter interaction with the rest of
the system and might internally use restricted processes to help achieve
that reliably.

In some cases the server process might internally need additional
services (it might make use of a particular web API, for example). By
providing these services via a separate local process (through the service
framework) the use of these services can be filtered by trusted
applications.

This example also brings up that the networking API and Plan9
per-process file systems are to some extent different ways of solving the
same underlying issue of separating what you are connecting to or
providing from distinct data streams, and that underlying functionality is
useful even without network access.

I would also eventually s/web browser/well designed rendering console/,
such that a web browser would be a filter that converts sloppy XML or HTML
to well formed whatever the rendering console uses. That type of model
would also make good use of an "everything is a resource" perspective.

People have been running untrusted binaries for a while now and web
browsers are currently the main environment that has been attempting to
deal with this (not with binaries exactly, but the lack of a way to deal
with the problem effectively for actual executables means that higher
level attempts keep failing in addition to being slower than they need to
be). This is so much of what people actually want to do with a personal
computer that web browsers are literally turning into operating systems
(not wildly popular ones yet, but it is still a fairly new thing). I
don't think they are actually dealing with the fundamental problem that
well, but I predict that if no OS actually deals with this fundamental
problem well then people will end up using OSes that deal with it badly
but at least attempt to deal with it.

-Matt

David Laight
2013-12-06 23:51:54 UTC
Post by Robert Elz
There is no reason that management of daemons, or for that matter,
logins on ttys, needs to be done by process 1. tty login management
isn't done that way because it has to be, but because it always has been
(and it gives init some work to do, something less morbid than just
being a graveyard for orphans.)
One thing that has to be done for ttys is to reset the permissions when
the user logs out, and tidy up the utmp file.
IIRC on sysv this is usually done by writing a message to init (via a pipe)
giving it the pid and tty name.

We used to end up with /dev entries for ttys that were just minor numbers
on the protocol stack's device (enough STREAMS modules were pushed).
If all the processes closed their terminal, the minor number could get
reused [1], but the tty /dev entry would still be present - and then
programs like wall(1) would write into random connections!

One solution was to use fattach() (like a mount) to attach the cloned
tty device onto a dummy entry in /dev and then change the permissions.
This all 'fell apart' on last close.

David

[1] but not by the same process - it wasn't allowed to read/write
the major/minor pair that used to be its controlling terminal!
--
David Laight: ***@l8s.co.uk
Masao Uebayashi
2013-12-06 05:02:12 UTC
I like the idea that init(8) is responsible for managing daemons.

Some time ago I had to play with Pacemaker, a cluster management
system. I didn't fully understand it, but I viewed it as kind of yet
another launchd; it periodically polls various services/daemons, and
restarts them when they fail.

Then I got the idea to add a "monitor" method to rc.d scripts and make
init(8) periodically invoke rc "monitor" to check whether the system's
service tree is healthy.
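
Roughly like this - a sketch only, since rc.subr(8) has no "monitor"
convention today, and "somed" is a made-up service:

    # in /etc/rc.d/somed: expose a "monitor" method via extra_commands
    extra_commands="monitor"
    monitor_cmd="somed_monitor"

    somed_monitor()
    {
        # check_pidfile prints a pid only if the process is alive and
        # matches $command; empty output means "needs a restart"
        [ -n "$(check_pidfile "$pidfile" "$command")" ]
    }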
Post by Erik Fair
So, daemons - watched over or not?
[...]
Mouse
2013-12-06 05:31:54 UTC
Post by Erik Fair
So, daemons - watched over or not?
[...]
your comments and thoughts are solicited,
It would help if you didn't use paragraph-length lines. (If you want
your text to be reflowed by the recipient, see RFC 3676.)
Post by Erik Fair
NetBSD's rc.d(8) system is great - proper dependency management, and it's easy [...]
Maybe - but only if you're running a stock system and don't want to do
anything but the Officially Approved operations, the ones the system's
designers chose to support.

Step outside that box and it all flips upside down and you're faced
with a great deal of undocumented complexity which various other pieces
of the system assume is being used exactly as designed and which thus
ends up being a twisty little maze of shell scripts all different and
all getting in your way.

There's a reason I tend to turn off the stock daemons and run my own
from /etc/rc.local.
Post by Erik Fair
The right place to deal with all of this is in process 1. [...]
I disagree. I'm in agreement with what kre said: there is no reason,
possibly excepting history, to do any of this in process 1. Indeed, I
would prefer to move /etc/ttys processing out of init, into (say)
ttyspawner; I'm not sure what I'd do about runlevel processing (whether
BSD-style or SV-style or something else). I would actually be tempted
to cut process 1 back to nothing but reaping zombies, perhaps moving it
into the kernel or even eliminating it entirely (by arranging for a
parentless process that dies to be reaped within the kernel).

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
David Holland
2013-12-06 08:42:19 UTC
Post by Erik Fair
So, daemons - watched over or not?
There's been a philosophical split (sort of) between the BSD/v7
school of thought and the System III/V school of thought, on how
the Unix system should startup and manage userland processes where
(background) daemons are concerned, with the structure and function
of process 1 (whether you call that /{etc,sbin}/init or something
else) at the center of it.
I wouldn't go so far as to say that. A better way to describe that
split is that the AT&T unixes invented a new and monstrously complex
mechanism to monitor getty and nothing else. They *could* have used
their init arrangements to start and monitor/restart daemons, but as
you note they never did, and I don't think they ever really intended
to; they also at the same time invented a different monstrously
complex way to start up and shut down daemon processes and other
system services.

Anyhow, nowadays even most of the Linux world has realized that that
design is no good, and it's basically of no importance any more except
as a negative example.

What we do have, though, is a pile of hysterical raisins: the
init-getty-login combination works in a particular way that's been the
same all the way back to at least V7 and I think well before; init is
responsible for tidying up sessions started with getty because that
way you don't have an extra useless process hanging around underneath
the user's shell wasting memory. This mattered back then; now it's
just a poorly framed abstraction. It would be better if each getty
were just another anonymous daemon that spawned login (instead of
execing it) and cleaned up afterwards, like telnetd or rlogind or sshd
or basically everything else. (Note that if you're using PAM you get
an ill-conceived partial form of this behind your back for PAM
reasons...)

Even without that cleanup there's no reason init has to be the process
that spawns getty and cleans up after getty sessions; that work could
be farmed out to another daemon.

There are a number of recent attempts to rearrange the way services
and daemons get started (and restarted) -- there's launchd, upstart,
at least one other whose name I'm forgetting, and perhaps others. So
far, none of these has seemed to me like a very good idea; they all
seem hastily conceived (without e.g. an understanding of how
init/getty/login traditionally works) and some of them just don't seem
very ... unixish.

I think if we want to improve the state of the art in this regard the
way to do it is to look at what a "service" is (in the sense of things
like "service nfsd start", not /etc/services) and try to come up with
some abstractions that make sense and aren't oversimplified or
crippled.

Right now, for example, we have "services" that are rc.d scripts
(rpcbind, sshd, syslogd, ...) that start daemons; we also have
"services" (telnetd, fingerd) that are inetd.conf entries, although
most of these are basically dead nowadays; and there are also
"services" like ipf and npf that are rc.d scripts and behave much like
daemons except that they're really kernel state.

Most of these "services" are turned on and off via rc.conf, but not
all of them. (For example, if you want to enable fingerd, you have to
know it's an inetd service.)
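
(Concretely: turning sshd on is one line in rc.conf -

    sshd=YES

- while turning fingerd on means uncommenting a line like

    finger  stream  tcp  nowait  nobody  /usr/libexec/fingerd  fingerd

in inetd.conf: two unrelated switches for the same kind of decision.)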

Meanwhile, not all rc.d scripts are "services"; e.g. fsck isn't and
cleartmp isn't; these are purely part of the boot sequence.

In an ideal world, all "services" would be configured the same way
(whether that way is rc.conf or something else) and you wouldn't have
to know or care about the implementation to work with them.

Similarly, in an ideal world, all daemons (which might or might not be
part of a "service") would have a failure and recovery/restart path
(just respawning is not necessarily adequate) and would get run in a
framework that handles this, instead of requiring manual monitoring
and hand restart.

An ideal world would also allow users to have "services"; that is, you
log in, the system enables your talkd receiver or biff proxy or
whatever, and if it's a daemon sees to keeping it running... and shuts
it down when you log off. The absolute lack of all infrastructure
support for this in Unix is getting to be a fairly serious drawback.

We are something like 75% to 80% of the way to having a workable
abstraction for system services, but it's still too tightly coupled to
the implementation and still all mixed up with boot-time activities.

As you note, we have bupkis for daemon management, but I don't think
it makes sense to try to tackle that without fitting it into a clear
model of system services, and preferably also of user sessions.

Also, few of the daemons we commonly use have much in the way of
useful failure and recovery behavior; for many of them if you just
respawn them blindly they'll keep crashing, and most of the rest lose
all their state such that restarting them is a long way from
transparent. Some are even worse than this: in the case of syslogd, if
it crashes you (may) silently lose data, and if something respawns it
you may also lose the ability to notice that you may have lost data...
Post by Erik Fair
One more important aside that we should consider: "user sessions"
now come in more flavors than person pounding on a tty (pty) and a
shell (or three): there's FTP logins, IMAP/POP logins, and so
on. There have been some attempts at reflecting those in utmp(5)
but I don't think anyone has been consistent about it. I think we
ought to tie that stuff into the basic authentication libraries,
i.e. when a user authenticates for something, if it's going to last
more than a second or three (i.e. a user is asking for a
"session"), it ought to get an entry in utmp(5) and wtmp(5) so that
you can see with who(1) or w(1) the users of the system and what
sort of session they're in.
Yes, but more than that, there's X logins.
Post by Erik Fair
The right place to deal with all of this is in process 1.
I don't agree; to the extent init is magic, it should not do any
unnecessary work, because that exposes it to risk of failure. To the
extent init isn't magic, it doesn't need to be process 1 any more.

I've built systems where init (that is, the process that sequences
boot and shutdown) is not process 1. I've also built systems where pid
1 is reserved; if your parent exits, getppid() returns 1, but no
actual process 1 exists. Both of these things are perfectly
straightforward; there's no more reason to have a daemon hanging
around just to call wait() than there is to have a daemon hanging
around just to call nfssvc(). Less, in fact - it's fairly easy to
implement wait/exit in a way that doesn't require orphaned processes
to be waited for.
--
David A. Holland
***@netbsd.org
Roy Marples
2013-12-06 09:27:22 UTC
Post by David Holland
There are a number of recent attempts to rearrange the way services
and daemons get started (and restarted) -- there's launchd, upstart,
at least one other whose name I'm forgetting, and perhaps others. So
far, none of these has seemed to me like a very good idea; they all
seem hastily conceived (without e.g. an understanding of how
init/getty/login traditionally works) and some of them just don't seem
very ... unixish.
http://en.wikipedia.org/wiki/OpenRC

dh is right on one part: it was written without any knowledge of
init/getty/login.
But then, that wasn't its job. Its job was to start & stop things in a
dependent order.
The fact it dealt with them at all was just an aside. I would imagine
others have taken the same approach.
Post by David Holland
I think if we want to improve the state of the art in this regard the
way to do it is to look at what a "service" is (in the sense of things
like "service nfsd start", not /etc/services) and try to come up with
some abstractions that make sense and aren't oversimplified or
crippled.
This is a very important point, and most init systems I've seen get it
wrong.
Oh so wrong.

I'm not going to say too much on this topic other than that I've written
one, patched two others and tinkered with a fourth.
OpenRC can run quite nicely on NetBSD ... but I don't use it anymore.
That doesn't stop the existing devs emailing me with problems every
once in a while ;)

NetBSD's rc.d system works very nicely for me. Could I improve it? Sure!
Do I have the time? Probably not.
The only thing it needs from my pov is a dependency graph, so you can
see the dependency ordering and needs - it took me far too long to work
out how to get syslogd to start before dhcpcd when I had to debug a
boot-time-only issue.
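
(For anyone else hitting that: the ordering knobs are the rcorder(8)
keywords inside the scripts themselves, something like

    # PROVIDE: dhcpcd
    # REQUIRE: network syslogd

in /etc/rc.d/dhcpcd - that REQUIRE list is illustrative, not NetBSD's
actual one. What's missing is anything that shows you the resulting
graph.)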

Thanks

Roy
Alan Barrett
2013-12-09 15:57:27 UTC
Post by Roy Marples
NetBSD's rc.d system works very nicely for me. Could I improve it?
Sure! Do I have the time? Probably not.
The only thing that it needs from my pov is a dependency graph, [...]
See src/sbin/rcorder/rcorder-visualize.sh in the NetBSD source tree.
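
It emits a Graphviz graph, so something like

    sh /usr/src/sbin/rcorder/rcorder-visualize.sh | dot -Tps > rc.ps

should draw the ordering (assuming graphviz from pkgsrc and a source
tree under /usr/src).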

--apb (Alan Barrett)
Aaron B.
2013-12-07 01:08:39 UTC
On Fri, 6 Dec 2013 08:42:19 +0000
Post by David Holland
I think if we want to improve the state of the art in this regard the
way to do it is to look at what a "service" is (in the sense of things
like "service nfsd start", not /etc/services) and try to come up with
some abstractions that make sense and aren't oversimplified or
crippled.
Agreed!

As a sysadmin, I often care less about the internal details, and more about what a system provides. What I want to see:

1) Easy to define/install a new service
2) Easy to manipulate a service (enable/disable/restart/etc)
3) Easy to query a service's state.

IMHO, #3 is the tricky part. People often assume things are 'up' or 'down' and ignore the scope of all the other failures in between.

I think the question of what a service is, is in fact simple: anything in userland you can configure on or off, and anything in userland that might break. Daemons, firewall state, inetd-driven processes, possibly even network interfaces and filesystems are fair game.
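
To illustrate #3 with an entirely made-up interface:

    $ service named status
    named: enabled, up, pid 213, started Thu Dec  5 20:34,
           3 restarts since boot (last exit: signal 11)

That last part - failure history, not just up/down - is what no
current tool gives me.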
--
Aaron B. <***@zadzmo.org>
Roy Marples
2013-12-07 01:26:27 UTC
Post by Aaron B.
As a sysadmin, I often care less about the internal details, and more
about what a system provides. What I want to see:
1) Easy to define/install a new service
2) Easy to manipulate a service (enable/disable/restart/etc)
3) Easy to query a service's state.
IMHO, #3 is the tricky part. People often assume things are 'up' or
'down' and ignore the scope of all the other failures in between.
Well, from a service management perspective it's either up or down.
It may have an intermediate state of starting, but that still falls in
the down category.

Anything else would be a configuration error of the service, which is
outside the scope of this discussion.

Thanks

Roy
John Nemeth
2013-12-07 01:33:16 UTC
On Dec 7, 1:26am, Roy Marples wrote:
} On 07/12/2013 1:08, Aaron B. wrote:
} > As a sysadmin, I often care less about the internal details, and more
} > about what a system provides. What I want to see:
} >
} > 1) Easy to define/install a new service
} > 2) Easy to manipulate a service (enable/disable/restart/etc)
} > 3) Easy to query a service's state.
} >
} > IMHO, #3 is the tricky part. People often assume things are 'up' or
} > 'down' and ignore the scope of all the other failures in between.
}
} Well, from a service management perspective it's either up or down.
} It may have an intermediate state of starting, but that still falls in
} the down category.

Not quite. If it's down, you may want to do a restart.
However, if it is currently starting, then you want to give it time
to complete the startup process before doing a restart. And, of
course, if it fails to come up after a few restart attempts, you
want to mark it as failed, stop doing restarts, and bring it to
the attention of an administrator.

}-- End of excerpt from Roy Marples
Roy Marples
2013-12-07 01:37:24 UTC
Post by John Nemeth
} > As a sysadmin, I often care less about the internal details, and more
} > about what a system provides. What I want to see:
} >
} > 1) Easy to define/install a new service
} > 2) Easy to manipulate a service (enable/disable/restart/etc)
} > 3) Easy to query a service's state.
} >
} > IMHO, #3 is the tricky part. People often assume things are 'up' or
} > 'down' and ignore the scope of all the other failures in between.
}
} Well, from a service management perspective it's either up or down.
} It may have an intermediate state of starting, but that still falls in
} the down category.
Not quite. If it's down, you may want to do a restart.
However, if it is currently starting, then you want to give it time
to complete the startup process before doing a restart. And, of
course, if it fails to come up after a few restart attempts, you
want to mark it as failed, stop doing restarts, and bring it to
the attention of an administrator.
What you describe are actions.
I was defining states, sorry if that wasn't clear :)

Thanks

Roy
John Nemeth
2013-12-07 01:42:00 UTC
On Dec 7, 1:37am, Roy Marples wrote:
} On 07/12/2013 1:33, John Nemeth wrote:
} > On Dec 7, 1:26am, Roy Marples wrote:
} > } On 07/12/2013 1:08, Aaron B. wrote:
} > } > As a sysadmin, I often care less about the internal details, and more
} > } > about what a system provides. What I want to see:
} > } >
} > } > 1) Easy to define/install a new service
} > } > 2) Easy to manipulate a service (enable/disable/restart/etc)
} > } > 3) Easy to query a service's state.
} > } >
} > } > IMHO, #3 is the tricky part. People often assume things are 'up' or
} > } > 'down' and ignore the scope of all the other failures in between.
} > }
} > } Well, from a service management perspective it's either up or down.
} > } It may have an intermediate state of starting, but that still falls
} > } in the down category.
} >
} > Not quite. If it's down, you may want to do a restart.
} > However, if it is currently starting, then you want to give it time
} > to complete the startup process before doing a restart. And, of
} > course, if it fails to come up after a few restart attempts, you
} > want to mark it as failed, stop doing restarts, and bring it to
} > the attention of an administrator.
}
} What you describe are actions.
} I was defining states, sorry if that wasn't clear :)

Well, in that case, starting should be a state. Actions are
simply forced state transitions. You don't want to force a transition
from down to up (i.e. restart) if it is currently starting (assuming
it hasn't missed a startup timeout).

}-- End of excerpt from Roy Marples
Brett Lymn
2013-12-08 09:37:13 UTC
Post by John Nemeth
Not quite. If it's down, you may want to do a restart.
However, if it is currently starting, then you want to give it time
to complete the startup process before doing a restart. And, of
course, if it fails to come up after a few restart attempts, you
want to mark it as failed, stop doing restarts, and bring it to
the attention of an administrator.
Heh - you know, you are close to describing the Solaris SMF (service
management facility)...
--
Brett Lymn
Staple Guns: because duct tape doesn't make that KerCHUNK sound - xkcd.com
John Nemeth
2013-12-08 09:54:21 UTC
On Dec 8, 8:07pm, Brett Lymn wrote:
} On Fri, Dec 06, 2013 at 05:33:16PM -0800, John Nemeth wrote:
} >
} > Not quite. If it's down, you may want to do a restart.
} > However, if it is currently starting, then you want to give it time
} > to complete the startup process before doing a restart. And, of
} > course, if it fails to come up after a few restart attempts, you
} > want to mark it as failed, stop doing restarts, and bring it to
} > the attention of an administrator.
}
} Heh - you know, you are close to describing the Solaris SMF (service
} management facility)...

If we're serious about service management, then something like
that, or a similar facility from another OS is most likely what we
need. Using something that already exists, if a suitable one can
be found, would probably be a good thing.

In the above, I didn't even get into the issue of dependencies.
I.e. if you type "service start lockd" it should also start rpcbind.
Should rpcbind fail, then lockd should also be marked as failed.
Right now, typing "/etc/rc.d/lockd onestart" will not automatically
start rpcbind. If rpcbind isn't running, then lockd will simply
fail to start.
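
A sketch of the difference (the structure and function names are
invented for illustration; this is not how rc.d works today):

    /* Sketch of recursive, dependency-aware starting.  Everything
     * here is hypothetical; rc.d does not behave this way. */

    #include <stddef.h>

    struct svc {
        const char *name;
        struct svc **requires;   /* NULL-terminated dependency list */
        int running;
        int failed;
    };

    int
    svc_start(struct svc *sv)
    {
        if (sv->running)
            return 0;
        if (sv->failed)
            return -1;

        /* Start dependencies first: lockd pulls in rpcbind. */
        for (struct svc **dep = sv->requires;
             dep != NULL && *dep != NULL; dep++) {
            if (svc_start(*dep) != 0) {
                sv->failed = 1;  /* rpcbind failed, so lockd fails too */
                return -1;
            }
        }

        /* ... fork/exec the service's start method here ... */
        sv->running = 1;
        return 0;
    }

With something like that, failure propagates: a service whose
dependency cannot start is itself marked failed rather than being
left to fall over on its own.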

Doing service management properly can quickly get quite complex,
which is a good reason to use something that already exists where
the kinks have already been worked out. It is also a very good
reason to not have init doing service management, since init should
be kept simple.

}-- End of excerpt from Brett Lymn
David Holland
2013-12-08 21:03:50 UTC
Permalink
Post by John Nemeth
} Heh - you know, you are close to describing the Solaris SMF (service
} management facility)...
If we're serious about service management, then something like
that, or a similar facility from another OS is most likely what we
need. Using something that already exists, if a suitable one can
be found, would probably be a good thing.
In the above, I didn't even get into the issue of dependencies.
[...]
Doing service management properly can quickly get quite complex,
which is a good reason to use something that already exists [...]
Given the infrastructure we already have (for dependencies and other
things), trying to splice in third party code is not a good idea.
--
David A. Holland
***@netbsd.org
John Nemeth
2013-12-08 21:56:25 UTC
Permalink
On Dec 8, 9:03pm, David Holland wrote:
} On Sun, Dec 08, 2013 at 01:54:21AM -0800, John Nemeth wrote:
} > } Heh - you know, you are close to describing the Solaris SMF (service
} > } management facility)...
} >
} > If we're serious about service management, then something like
} > that, or a similar facility from another OS is most likely what we
} > need. Using something that already exists, if a suitable one can
} > be found, would probably be a good thing.
} >
} > In the above, I didn't even get into the issue of dependencies.
} > [...]
} > Doing service management properly can quickly get quite complex,
} > which is a good reason to use something that already exists [...]
}
} Given the infrastructure we already have (for dependencies and other
} things), trying to splice in third party code is not a good idea.

What infrastructure? We don't do service management. Our
rc.d startup code does not count as service management.

}-- End of excerpt from David Holland
David Holland
2013-12-08 22:17:22 UTC
Permalink
} > } Heh - you know, you are close to describing the Solaris SMF (service
} > } management facility)...
} >
} > If we're serious about service management, then something like
} > that, or a similar facility from another OS is most likely what we
} > need. Using something that already exists, if a suitable one can
} > be found, would probably be a good thing.
} >
} > In the above, I didn't even get into the issue of dependencies.
} > [...]
} > Doing service management properly can quickly get quite complex,
} > which is a good reason to use something that already exists [...]
}
} Given the infrastructure we already have (for dependencies and other
} things), trying to splice in third party code is not a good idea.
What infrastructure? We don't do service management. Our
rc.d startup code does not count as service management.
It is what we have and it handles dependencies, starting and stopping;
regardless of whether it's adequate as it is, bolting on something
else that doesn't interoperate with it would be a serious mistake.
--
David A. Holland
***@netbsd.org
Mouse
2013-12-08 22:32:09 UTC
Permalink
[rc.d] is what we have and it handles dependencies, starting and
stopping; regardless of whether it's adequate as it is, bolting on
something else that doesn't interoperate with it would be a serious
mistake.
Would it?

If what's there is insufficient for a task, as far as I can see the
alternatives are to bolt on something else (which will probably
interoperate poorly to not at all, or there wouldn't be any need for
it) or switch operating systems. This makes it sound as though your
position is that if NetBSD doesn't answer the user's desire out of the
box, the user should run some other OS. Surely that's not what you
mean to be saying....

/~\ The ASCII Mouse
\ / Ribbon Campaign
X Against HTML ***@rodents-montreal.org
/ \ Email! 7D C8 61 52 5D E7 2D 39 4E F1 31 3E E8 B3 27 4B
John Nemeth
2013-12-08 22:34:59 UTC
Permalink
On Dec 8, 10:17pm, David Holland wrote:
} On Sun, Dec 08, 2013 at 01:56:25PM -0800, John Nemeth wrote:
} > } > } Heh - you know, you are close to describing the Solaris SMF (service
} > } > } management facility)...
} > } >
} > } > If we're serious about service management, then something like
} > } > that, or a similar facility from another OS is most likely what we
} > } > need. Using something that already exists, if a suitable one can
} > } > be found, would probably be a good thing.
} > } >
} > } > In the above, I didn't even get into the issue of dependencies.
} > } > [...]
} > } > Doing service management properly can quickly get quite complex,
} > } > which is a good reason to use something that already exists [...]
} > }
} > } Given the infrastructure we already have (for dependencies and other
} > } things), trying to splice in third party code is not a good idea.
} >
} > What infrastructure? We don't do service management. Our
} > rc.d startup code does not count as service management.
}
} It is what we have and it handles dependencies, starting and stopping;
} regardless of whether it's adequate as it is, bolting on something
} else that doesn't interoperate with it would be a serious mistake.

A proper service management facility would replace it. The
only real question is whether things like rc.conf would be kept (rc.d files
might be used for dependency information, but generally aren't
suitable for real service management). There is no reason why rc.d
should be sacrosanct. It has served us well as a replacement for
a monolithic /etc/rc, but it doesn't adequately satisfy modern
service management.

On that score, I disagree with the idea that service management
should be a plug-in where one can drop in any number of different
service monitors. This way lies madness, as it would be difficult
for a random admin to figure out how to administer a random system
(I realise this may not happen a lot in the NetBSD world, but it
is something that we should think about), and it would be nearly
impossible for a package that wants to install a service to figure
out what it should do. Service management really needs to be part
of the base system.

}-- End of excerpt from David Holland
David Holland
2013-12-08 23:05:31 UTC
Permalink
Post by John Nemeth
} > What infrastructure? We don't do service management. Our
} > rc.d startup code does not count as service management.
}
} It is what we have and it handles dependencies, starting and stopping;
} regardless of whether it's adequate as it is, bolting on something
} else that doesn't interoperate with it would be a serious mistake.
A proper service management facility would replace it. The
only real question is would things like rc.conf be kept (rc.d files
might be used for dependency information, but generally aren't
suitable for real service management).
Yes, and if the question is: rework it, or throw it out and replace it
with something totally different and incompatible; then I'd rather
rework it. This is not a NIH response so much as a recognition of the
difficulties associated with migrating deployed systems.

Also, while your attention is still on this, please describe what
you'd consider the properties of a "proper service management
facility" to be. I haven't seen this Solaris thing you referenced
(thankfully the last time I had to deal with administering Solaris was
more than ten years ago) but I have seen various other things, most of
which seem like badly conceived bolt-ons to sysvinit.
Post by John Nemeth
On that score, I disagree with the idea that service management
should be a plug-in where one can drop in any number of different
service monitors.
If you're going to be monitoring whether a service is working, rather
than merely running, you need to be able to supply custom monitoring;
local conditions and local configuration often cause local problems that
need this. But I don't see this as a problem.
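
For instance, something as small as a per-service check command that
the monitor runs would do (the interface here is invented, just to
illustrate the hook):

    /* Sketch: a site-supplied health check run by the monitor.
     * The interface is hypothetical; exit status 0 means healthy. */

    #include <stdlib.h>

    int
    svc_healthy(const char *check_cmd)
    {
        if (check_cmd == NULL)
            return 1;            /* no local check configured */
        return system(check_cmd) == 0;
    }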

However, what I think you mean is that the general monitoring scheme
and framework has to be designed in and not an afterthought -- that I
agree with and that was my point that Mouse misunderstood about
"bolted on".
--
David A. Holland
***@netbsd.org
Brett Lymn
2013-12-09 09:13:00 UTC
Permalink
Post by David Holland
Yes, and if the question is: rework it, or throw it out and replace it
with something totally different and incompatible; then I'd rather
rework it. This is not a NIH response so much as a recognition of the
difficulties associated with migrating deployed systems.
or perhaps just the ability to import the current rc.d config into
something else - that could be automated to a fair degree.
Post by David Holland
Also, while your attention is still on this, please describe what
you'd consider the properties of a "proper service management
facility" to be. I haven't seen this Solaris thing you referenced
(thankfully the last time I had to deal with administering Solaris was
more than ten years ago) but I have seen various other things, most of
which seem like badly conceived bolt-ons to sysvinit.
SMF is a total rework, not a bolt-on to sysvinit. A service is
described by a manifest, a bit of XML that provides all the
information SMF needs to run the service; this XML is imported
into the service "database". The manifest contains information like
the name of the service, a description, dependencies, what action to
start the service, what action to stop it, what action to restart
it, timeouts for start and stop, command line flags for the daemon,
plus a lot of other things I have forgotten (I don't write SMF
manifests often...). Once SMF knows about the service, you will
usually see it in one of three states (there are more, but they are
not common): disabled - the service is not configured to start;
enabled - the service is configured to run; and, lastly,
maintenance - an attempt was made to start the service, but there
was a problem and it has been suspended pending administrator
intervention. Usually a service will hit maintenance if the daemon
restarts too quickly due to, say, a configuration error; once the
error has been corrected, there is a command to tell SMF that the
fault has been cleared and it can start the service again. Note that
if a daemon just exits it will be restarted; only if it exits
repeatedly will the service enter maintenance mode.
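
As a rough illustration of that restart-throttling policy (invented
thresholds and names, not SMF's actual code):

    /* Sketch of "too many restarts too quickly -> maintenance".
     * Thresholds and names are hypothetical; this is not SMF source. */

    #include <time.h>

    #define RESTART_WINDOW 60    /* seconds */
    #define RESTART_LIMIT  5     /* exits tolerated within the window */

    enum svc_state { SVC_DISABLED, SVC_ENABLED, SVC_MAINTENANCE };

    struct svc {
        enum svc_state state;
        time_t exits[RESTART_LIMIT];  /* ring of recent exit times */
        unsigned nexits;
    };

    /* Called when the daemon exits; returns 1 if it should restart. */
    int
    svc_on_exit(struct svc *sv)
    {
        time_t now = time(NULL);

        sv->exits[sv->nexits++ % RESTART_LIMIT] = now;
        if (sv->nexits >= RESTART_LIMIT &&
            now - sv->exits[sv->nexits % RESTART_LIMIT] < RESTART_WINDOW) {
            sv->state = SVC_MAINTENANCE;  /* wait for the admin to clear */
            return 0;
        }
        return sv->state == SVC_ENABLED;
    }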

Here is a more cogent description of SMF for those interested:

http://www.oracle.com/technetwork/articles/servers-storage-admin/intro-smf-basics-s11-1729181.html
--
Brett Lymn
Staple Guns: because duct tape doesn't make that KerCHUNK sound - xkcd.com
Aaron B.
2013-12-07 02:39:23 UTC
Permalink
On Sat, 07 Dec 2013 01:26:27 +0000
Post by Roy Marples
Post by Aaron B.
As a sysadmin, I often care less about the internal details, and more
1) Easy to define/install a new service
2) Easy to manipulate a service (enable/disable/restart/etc)
3) Easy to query a service's state.
IMHO, #3 is the tricky part. People often assume things are 'up' or
'down' and ignore the scope of all the other failures in between.
Well, from a service management perspective it's either up or down.
It may have an intermediate state of starting, but that still falls in
the down category.
Anything else would be a configuration error of the service, which is
outside the scope of this discussion.
What I had in mind are situations like Tomcat running out of PermGen space: it's up, but completely frozen. 'service tomcat status' says 'up', 'svstat /var/service/tomcat' says 'up', but it's down.

I know, the real fix to this problem is either 'fix Tomcat' or 'use a non-sucky service', but those aren't real world solutions in a lot of cases. What would be great is some kind of heartbeat or keepalive API where a service could inform the service manager that it is still alive.
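
Something along these lines, perhaps (entirely hypothetical; no such
API exists today): the daemon writes a byte down an inherited pipe
every so often, and the service manager treats silence past a
deadline as 'up but hung', distinct from 'exited':

    /* Hypothetical heartbeat check, supervisor side.  The daemon is
     * assumed to write a byte to an inherited pipe at least every
     * HEARTBEAT_SECS; silence past the deadline means "up but hung". */

    #include <poll.h>
    #include <unistd.h>

    #define HEARTBEAT_SECS 30

    /* Returns 1 if the service proved it is alive, 0 if it went silent. */
    int
    heartbeat_wait(int fd)
    {
        struct pollfd pfd = { .fd = fd, .events = POLLIN };
        char buf[64];

        if (poll(&pfd, 1, HEARTBEAT_SECS * 1000) <= 0)
            return 0;            /* timeout or error: treat as hung */
        (void)read(fd, buf, sizeof(buf));   /* drain pending heartbeats */
        return 1;
    }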
--
Aaron B. <***@zadzmo.org>
David Holland
2013-12-08 21:04:55 UTC
Permalink
Post by Roy Marples
Well, from a service management perspective it's either up or down.
It may have an intermediate state of starting, but that still falls
in the down category.
Anything else would be a configuration error of the service, which
is outside the scope of this discussion.
...no, not really.
--
David A. Holland
***@netbsd.org