Pipe to Multiple Commands with Bash
Sometimes it’s handy to filter the output of a command (with `grep`, for
example) while still having the column names (the first line) available. How do
we go about that?
One example of this would be filtering the output of `ps`. What does the
following output mean, after all?

```shell
$ ps -ef | grep logd
0 124 1 0 7:45AM ?? 0:01.46 /usr/sbin/syslogd
0 146 1 0 7:45AM ?? 0:09.44 /usr/libexec/logd
1000 17263 14631 0 10:36AM ttys010 0:00.01 grep --color=auto logd
```
It would be much more readable if the output were:

```shell
UID PID PPID C STIME TTY TIME CMD
0 124 1 0 7:45AM ?? 0:01.46 /usr/sbin/syslogd
0 146 1 0 7:45AM ?? 0:09.44 /usr/libexec/logd
1000 17263 14631 0 10:36AM ttys010 0:00.01 grep --color=auto logd
```
Now, the correct answer to this is:

```shell
$ ps -ef | tee >(sed -n 1p) >(grep logd) >/dev/null
```
Here, `tee` takes a variable number of files to write the same output to. We use
process substitution (see `man bash`) to make `tee` write to named pipes, which
are a kind of file. `tee` also prints its input to standard output, and we
suppress that by redirecting it to `/dev/null`.
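To see that a process substitution really does stand for a file name, we can
hand one to `echo`, which simply prints its arguments instead of writing to it
(a minimal sketch; the exact path is system-dependent):

```shell
# Under Linux, the >(...) expression expands to a path like /dev/fd/63:
echo >(true)
```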
Because the commands inside the process substitutions run asynchronously, to
ensure `sed -n 1p` runs before `grep logd`, we could insert `sleep 0.1s` before
running `grep logd`. That should give `sed -n 1p` enough time to print its line
and will make the output ordering more reliable.
Interestingly, `>(head -1)` instead of `>(sed -n 1p)` would not work under
Linux, because `head` closes its stdin as soon as it has read enough bytes to
print the first line. At that point, `tee` quits with an exit status of 141
(128 + 13 for `SIGPIPE` – see `man bash` and grep for “Simple Commands” for an
explanation). Instead, we use a command that is guaranteed to read the entire
output `tee` feeds it.
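The `SIGPIPE` failure is easy to reproduce by feeding `tee` more data than a
pipe buffer holds (a sketch assuming GNU `tee` on Linux; `seq` just provides
bulk input):

```shell
# head exits after reading the first line; tee's next write to the
# now-closed pipe raises SIGPIPE, so bash reports 128 + 13 = 141:
seq 100000 | tee >(head -1 >/dev/null) >/dev/null
echo $?   # typically 141 on Linux
```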
In the rest of this post I will look at another “solution” to this problem that uses group commands, and show why it is incorrect.
The Wrong Answer
I read the comments below the ServerFault answer to “How to grep ps output
with headers”, which suggested we could use `ps -ef | { head -1; grep logd; }`
to solve this problem (incidentally, `{ cmd; cmd; }` is called a “group
command”). If we didn’t know from our previous run what the correct output
should look like, this would seem to work:

```shell
$ ps -ef | { head -1; grep logd; }
UID PID PPID C STIME TTY TIME CMD
0 146 1 0 7:45AM ?? 0:09.48 /usr/libexec/logd
```
But notice two of the three processes are missing. What gives?
The key is understanding that `cmd1 | { cmd2; cmd3; }` does not provide a copy
of the standard output of `cmd1` to each of `cmd2` and `cmd3` (so that both
would be working with the same input). Rather, the input is shared: whatever is
consumed by `cmd2` is no longer available to `cmd3`.
We can prove this very easily. If we substitute `cmd2` with something that
consumes all input, such as `grep` (which has to read every line of the input
looking for a match), the input for the next command should be empty:

```shell
$ printf '%s\n' foo bar baz | { grep nomatch; wc -c; }
0
```
Indeed, we see that `wc -c` has zero bytes to work with.
OK, you might think, but `head -1` only consumes the first line (up to the
first newline) of the input, right? So why exactly doesn’t
`ps -ef | { head -1; grep logd; }` work? As it turns out, `head` reads more
than just up to the first newline:

```shell
$ printf '%s\n' foo bar baz | { head -1; wc -c; }
foo
0
```
How much does it read by default? (I’ve tested with `head` from GNU coreutils
8.32.)

```shell
$ < /dev/urandom tr -d -c '[[:alnum:]]' | head -c 10000 > urandom.txt
$ cat urandom.txt | { head -1 >/dev/null; wc -c; }
8976
$ echo '10000 - 8976' | bc
1024
```
It read the first 1024 bytes, which are then no longer available to any command
that follows. This means you shouldn’t rely on group commands when multiple
commands need to share a single input.
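If a group command is still attractive, one variant that does work is bash’s
built-in `read`: when its input is a pipe it reads one byte at a time, so it
consumes exactly the first line and nothing more. A sketch with deterministic
input (the header and rows are made up):

```shell
# read takes only the header line; everything after it is still on
# stdin for grep to consume:
printf '%s\n' 'UID PID CMD' '0 1 init' '7 42 logd' |
  { IFS= read -r header; printf '%s\n' "$header"; grep logd; }
```

With real input this becomes
`ps -ef | { IFS= read -r header; printf '%s\n' "$header"; grep logd; }`.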