Relentless Coding

A Developer’s Blog

Pipe to Multiple Commands with Bash

Sometimes it’s handy to filter the output of a command (with grep, for example) while still having the column names (the first line) available. How do we go about that?

One example of this would be when filtering the output of ps. What does the following output mean, after all?

$ ps -ef | grep logd
    0   124     1   0  7:45AM ??         0:01.46 /usr/sbin/syslogd
    0   146     1   0  7:45AM ??         0:09.44 /usr/libexec/logd
 1000 17263 14631   0 10:36AM ttys010    0:00.01 grep --color=auto logd

It would be much more readable if the output included the header line:

  UID   PID  PPID   C STIME   TTY           TIME CMD
    0   124     1   0  7:45AM ??         0:01.46 /usr/sbin/syslogd
    0   146     1   0  7:45AM ??         0:09.44 /usr/libexec/logd
 1000 17263 14631   0 10:36AM ttys010    0:00.01 grep --color=auto logd

Now, the correct answer to this is:

$ ps -ef | tee >(sed -n 1p) >(grep logd) >/dev/null

Here, tee takes a variable number of files to write its input to. We use process substitution (see man bash) to make tee write to named pipes, which are a kind of file. tee also copies its input to standard output, and we suppress that by redirecting it to /dev/null.
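To see what tee actually receives as arguments, note that each >(…) expands to a filename. A quick illustrative check (the exact path is system-dependent; /dev/fd/… is what bash uses on Linux):

```shell
# >(true) expands to a path such as /dev/fd/63: a file-like handle to a
# pipe whose reader is the 'true' process. 'echo' receives that path as
# an ordinary argument and prints it.
ps_path=$(echo >(true))
echo "$ps_path"
```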

Because the commands inside the process substitutions run asynchronously, there is no guarantee that sed -n 1p (which prints the header) finishes before grep logd. To make the ordering more reliable, we can insert sleep 0.1s before running grep logd. That should give sed -n 1p enough time to print its line first.

Interestingly, >(head -1) instead of >(sed -n 1p) would not work under Linux, because head closes its stdin as soon as it has read enough bytes to print the first line. tee then receives SIGPIPE on its next write and exits with status 141 (128 + 13, where 13 is SIGPIPE's signal number – see man bash and search for “Simple Commands” for an explanation). Instead, we use a command that is guaranteed to read the entire output tee feeds it.
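We can watch the SIGPIPE effect in isolation with yes, which writes lines until its reader goes away (behavior observed with bash on Linux):

```shell
# 'yes' writes "y" lines forever; 'head -1' reads one line, exits, and
# closes the pipe. The next write by 'yes' raises SIGPIPE, so its exit
# status is 128 + 13 = 141, visible via bash's PIPESTATUS array.
yes | head -1 >/dev/null
echo "${PIPESTATUS[0]}"
```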

In the rest of this post I will look at another “solution” to this problem that uses group commands, and show why it is incorrect.

The Wrong Answer

I read the comments below the answer to the ServerFault question “How to grep ps output with headers”, which suggested we could use ps -ef | { head -1; grep logd; } to solve this problem (incidentally, { cmd1; cmd2; } is called a “group command”). If we didn’t know from our previous run what the correct output looks like, this would seem to work:

$ ps -ef | { head -1; grep logd; }
  UID   PID  PPID   C STIME   TTY           TIME CMD
    0   146     1   0  7:45AM ??         0:09.48 /usr/libexec/logd

But notice two of the three processes are missing. What gives?

The key is understanding that cmd1 | { cmd2; cmd3; } does not provide a copy of the standard output of cmd1 to cmd2 and cmd3 (so that both cmd2 and cmd3 would be working with the same input). Rather, the input is shared: whatever is consumed by cmd2 is no longer available for cmd3.

We can prove this very easily. If we substitute cmd2 with something that consumes all input, such as grep (which has to consume every line in the input looking for a match), the input for the next command should be empty:

$ printf '%s\n' foo bar baz | { grep nomatch; wc -c; }
0

Indeed, we see that wc -c has zero bytes to work with.

OK, you might think, but head -1 only consumes the first line (up to the first newline) of the input, right? So why exactly doesn’t ps -ef | { head -1; grep logd; } work? As it turns out, head reads more than just up to the first newline:

$ printf '%s\n' foo bar baz | { head -1; wc -c; }
foo
0

How much does head read by default? (I’ve tested with head from GNU coreutils 8.32.)

$ < /dev/urandom tr -cd '[:alnum:]' | head -c 10000 > urandom.txt
$ cat urandom.txt | { head -1 >/dev/null; wc -c; }
8976
$ echo '10000 - 8976' | bc
1024

It reads the first 1024 bytes, which are then no longer available to any commands that follow. This means you shouldn’t rely on group commands to split a single input stream between commands.
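As an aside, if you do want the group-command style, the shell’s read builtin avoids the over-reading problem, because on a pipe it consumes input one byte at a time and stops exactly at the first newline. A sketch (so ps -ef | { IFS= read -r header; echo "$header"; grep logd; } should behave as hoped):

```shell
# 'read' peels off exactly one line without over-reading: the remaining
# input ("bar\nbaz\n", 8 bytes) stays available for wc in the same group.
printf '%s\n' foo bar baz | { IFS= read -r first; echo "$first"; wc -c; }
```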