crawshaw - 2016-01-07

Over the holidays I took some real vacation and finally got stuck back into my hobby project. As part of it I implemented a Unix shell with job control. It turns out a shell is a reasonably easy introduction to some of the crustier parts of Unix I've been avoiding for a while now. It lets you get a sense of TTYs without dealing with the really terrible parts.

(If you ever wanted to convince yourself that Unix is due for a replacement, take a look at how much kernel API surface is dedicated to terminals. It is a mess that can only be justified by historical argument.)

Here is a taste of some of the fun of shells, for those of you who find such things fun:

When you start a process in an interactive shell, the process gets a new process group id, and that pgid is brought to the foreground of the current terminal session. (See tcsetpgrp(3) for more details.) From here on out the terminal delivers signals to the new process group.

So far so good.

Now, what happens when you start a pipeline? Executing 'echo hello | rev' starts two processes, so where do the signals go? This is why signals are delivered to a process group: the shell starts echo in a new process group, and then starts rev and gives it the same pgid. Easy enough, and I implemented it similarly to what you will find in the GNU libc manual.

Except it sometimes did not work.

It turns out that 'echo hello' is so short-lived, and its output small enough to fit into the kernel pipe buffer, that it would exit before my shell had a chance to assign its pgid to 'rev'. By the time I got there, the pgid was invalid.

Digging around in the bash sources revealed that I was not the first to deal with this, jobs.c:200:

"Surely I spoke of things I did not
understand, things too wonderful
for me to know."

Oops, wrong Job. Try again:

/* Pipes which each shell uses to communicate
   with the process group leader until all of
   the processes in a pipeline have been
   started.  Then the process leader is
   allowed to continue. */
int pgrp_pipe[2] = { -1, -1 };

Bash plays with this pipe between fork and exec for 'echo'. The child is forked and then blocks on pgrp_pipe until the parent bash has forked all the subsequent processes in the pipeline. Only then does bash close the pipe and let echo continue.
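The same idea can be approximated in Go without touching the code between fork and exec. This is a stand-in for illustration, not how bash does it: the leader is started through a small sh wrapper that blocks reading a "gate" pipe on fd 3 and only then execs the real command. The names gateScript and runGated are made up, and the sketch assumes a POSIX sh plus echo and rev on the PATH.

```go
package main

import (
	"bytes"
	"fmt"
	"os"
	"os/exec"
	"syscall"
)

// gateScript waits for EOF on fd 3, closes it, then becomes the real command.
const gateScript = `read gate <&3; exec 3<&-; exec "$@"`

// runGated (hypothetical helper) runs `left | right`, holding the leader at
// the gate until the follower has joined its process group.
// Both argument slices must be non-empty.
func runGated(left, right []string) (string, error) {
	gateR, gateW, err := os.Pipe()
	if err != nil {
		return "", err
	}
	pr, pw, err := os.Pipe()
	if err != nil {
		gateR.Close()
		gateW.Close()
		return "", err
	}
	closeAll := func() {
		for _, f := range []*os.File{gateR, gateW, pr, pw} {
			f.Close()
		}
	}

	leader := exec.Command("sh", append([]string{"-c", gateScript, "sh"}, left...)...)
	leader.ExtraFiles = []*os.File{gateR} // the first extra file is fd 3 in the child
	leader.Stdout = pw
	leader.SysProcAttr = &syscall.SysProcAttr{Setpgid: true}
	if err := leader.Start(); err != nil {
		closeAll()
		return "", err
	}

	var out bytes.Buffer
	follower := exec.Command(right[0], right[1:]...)
	follower.Stdin = pr
	follower.Stdout = &out
	// The leader is still parked at the gate, so its pgid is still valid.
	follower.SysProcAttr = &syscall.SysProcAttr{Setpgid: true, Pgid: leader.Process.Pid}
	startErr := follower.Start()

	// Closing the gate's write end releases the leader. Closing the data
	// pipe ends lets the follower see EOF once the leader exits.
	closeAll()
	if startErr != nil {
		leader.Wait()
		return "", startErr
	}
	leaderErr := leader.Wait()
	followerErr := follower.Wait()
	if leaderErr != nil {
		return "", leaderErr
	}
	if followerErr != nil {
		return "", followerErr
	}
	return out.String(), nil
}

func main() {
	out, err := runGated([]string{"echo", "hello"}, []string{"rev"})
	if err != nil {
		fmt.Fprintln(os.Stderr, "pipeline:", err)
		return
	}
	fmt.Print(out)
}
```

Because exec replaces the sh wrapper in place, the pid that leads the group is the same pid that ends up running the real command.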

This was a bit of a novelty for me. I don't spend much time programming between fork and exec. It also turned out to be an unpleasant trick to replicate exactly in Go. (I am using the convenient fork/exec wrapper, syscall.StartProcess, and to modify it I have to copy a whole lot of OS-specific code.) So instead I create a dummy process for the duration of the pipeline initialization to pin the pgid for me.

The stuff inside our computers never ceases to amaze and worry me.