
Colorizing Stderr: racing pipes, and libc monkey-patching - amasad
https://repl.it/site/blog/stderr?hn=1
======
woodruffw
I came across stderred a few months ago while looking to do the same thing on
my terminals. I ultimately didn't go with it because people have reported
destroying their systems while using it[1]. Wrapping libc (which is itself
wrapping the actual syscall) is a tricky thing to do safely.

I looked into accomplishing the same thing with URxvt's pre- and post-text
rendering hooks, but couldn't get it to work.

[1]:
[https://github.com/sickill/stderred/issues/63](https://github.com/sickill/stderred/issues/63)

~~~
voltagex_
Disappointed in the response over at Ubuntu:
[https://bugs.launchpad.net/ubuntu/+source/initramfs-
tools/+b...](https://bugs.launchpad.net/ubuntu/+source/initramfs-
tools/+bug/1729836)

IMO this should be treated as a bug in initramfs-tools.

~~~
saagarjha
This isn't an initramfs-tools bug–if you're LD_PRELOADing into a binary,
presumably you know what you're doing. The "bug" here is setting LD_PRELOAD as
root (if accidentally through sudo -s) and running a sensitive operation,
knowing full well that stderred is a (admittedly very useful, since I use it!)
hack at best.

~~~
JdeBP
I suggest that it actually _is_ an initramfs-tools bug. As pointed out
elsewhere in this very discussion, the program in question must be using file
descriptor #2 as its output file descriptor for the files that get corrupted,
because stderred only inserts its hardwired control sequences on writes to
that file descriptor. That's a bad thing to be doing _independent of stderred_
, because there is potentially other (library) code in the relevant program
(whatever it is) that assumes that file descriptor #2 is usable for
diagnostics. Even without stderred in the mix, it is a bug waiting to be
triggered via another route.

------
Per_Bothner
This works better if there are distinct escape sequences to mark the start and
end of stderr output, as in DomTerm ([http://domterm.org/Wire-byte-
protocol.html#Special-sequences...](http://domterm.org/Wire-byte-
protocol.html#Special-sequences-sent-by-back-end-and-handled-by-DomTerm)). I
created a fork
([https://github.com/PerBothner/stderred](https://github.com/PerBothner/stderred))
of stderred: If DOMTERM is set, then it uses the DomTerm-specific escape
sequences. This has two advantages:

\- With plain stderred, if the error message has its own ANSI escape sequences
(as gcc's do), you get an ugly initially-red output line; the DomTerm
sequences avoid that.

\- You can do fancier styling with CSS, since the DomTerm escape sequence
creates an element explicitly marked as error output, which is more
semantically meaningful than "red".

~~~
basicer
Neat! We were actually considering doing something like this with custom CSI
sequences. Glad to see someone else has already picked some sensible ones.

------
tyingq
Bash has process substitution. It supports sending stderr to a process. So...

$ somecmd 2> >(someutility)

Should do roughly the same thing. To highlight stderr with red color:

$ somecmd 2> >(perl -ne '$|++;print "\033[31m$_\033[00m"')

I assume you could roll that into a bash function or script easily enough.

Here's another approach too:
[https://github.com/dmoulding/hilite/blob/master/hilite.c](https://github.com/dmoulding/hilite/blob/master/hilite.c)
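
For illustration, the same idea as the process-substitution one-liner can be sketched in Python. This is only a rough sketch: `run_with_red_stderr` and its `err_out` parameter are made-up names, and because stdout is inherited while stderr goes through a pipe, it has the same ordering caveat as the shell version.

```python
import subprocess
import sys

RED = "\033[31m"
RESET = "\033[0m"


def run_with_red_stderr(argv, err_out=sys.stderr):
    """Run argv; stdout is inherited, each stderr line is wrapped in red.

    Hypothetical helper: stderr travels through a separate pipe, so its
    ordering relative to stdout is not preserved.
    """
    proc = subprocess.Popen(argv, stderr=subprocess.PIPE, text=True)
    for line in proc.stderr:
        # Like the perl one-liner, the reset code lands after the newline.
        err_out.write(f"{RED}{line}{RESET}")
        err_out.flush()
    return proc.wait()


if __name__ == "__main__":
    sys.exit(run_with_red_stderr(sys.argv[1:]))
```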

~~~
amasad
Unfortunately, as explained in the post, both approaches -- which are similar
to red.c in the post -- suffer from the out-of-order issue.

~~~
foxhill
could you prepend all messages with a timestamp (inc stdout), then reorder in
your post-processing?

~~~
amasad
No, because we're running user programs we don't have control over and we're
trying to avoid doing any magic in each language environment (which was the
original approach that we decided to move away from).

~~~
foxhill
i meant in a similar approach to capturing the output of each fd - i.e outside
of the user’s programs

~~~
amasad
We're back to square one. If it's outside the program you've already lost
precision.

~~~
laumars
Depends on the shell. The shell I’ve been writing actually does the red
highlighting of STDERR output natively without needing to monkey patch libc
and still keeps the ordering precise.

Ironically it’s one feature I’ve never advertised because I thought it would
irritate other users besides myself. I had no idea there was a demand for that
kind of feature.

~~~
amasad
So what's your secret?

~~~
laumars
Ostensibly it’s the same as the Bash example above but because it’s baked into
the shell itself I’m able to add a very minimal wrapper around the shell's PTY
to append some ANSI escape sequences.

As an aside, I also started work on a feature in the shell to strip all SGR
escape sequences (colour and text formatting) for people who prefer their
shell output text only. But that proved a little more problematic because it
proved impossible to reliably differentiate between processes that expect a
TTY (eg top, lynx, etc) and thus would need to bypass the SGR stripping
wrapper, from the processes that don't.
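
A minimal sketch of what stripping SGR sequences could look like, in Python. The regex and the `strip_sgr` name are assumptions for illustration, not the shell's actual code; it removes only `ESC [ ... m` sequences and leaves other control sequences (cursor movement, screen clearing) alone, which is exactly why full-screen programs are the hard case.

```python
import re

# SGR = "Select Graphic Rendition": ESC [ <params> m.
# Parameters are digits separated by ';' (or ':' in some newer forms).
SGR_RE = re.compile(r"\x1b\[[0-9;:]*m")


def strip_sgr(text):
    """Remove colour/formatting sequences, keep every other byte as-is."""
    return SGR_RE.sub("", text)


if __name__ == "__main__":
    print(strip_sgr("\x1b[1;31merror:\x1b[0m something failed"))
```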

------
jakobegger
In principle, there's no way to fix the interleaving issue. But maybe you can
minimise it by reading both from a single process, without monkey patching
libc.

The first solution (with the JSON adaptor) is the best solution, the only
problem is the implementation (because it doesn't make sure that everything
that goes to stdout/stderr goes through the JSON adaptors)

You could fix this problem by using a little C program to do the JSON
wrapping.

\- The C program would start by setting up two pipes, let's call them
pipe_stdout and pipe_stderr.

\- Then the C program forks

\- The child replaces file descriptors 1 & 2 with the write end of the pipes
and closes the read ends.

\- The parent process closes file descriptor 0, closes the write end of the
pipes, and calls select() on the read end of the pipes

\- The child process calls execve to start the interpreter. All output to
stdout/stderr now goes into the pipes to the parent process.

\- the parent process reads data from the pipes, wraps it in JSON, and sends
it to stdout.

Is there a reason why you didn't go this route? This way you should have
minimal interleaving, and there's no way the interpreter writes data directly
to stdout breaking the JSON stream.
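
The steps above can be sketched roughly as follows, in Python rather than C to keep it short. The JSON record shape (`stream` and `data` keys) and the `run_wrapped` name are assumptions, and closing the parent's stdin is left out. Ordering between the two pipes is still whatever `select()` happens to report. Needs Python 3.9+ for `os.waitstatus_to_exitcode`.

```python
import json
import os
import select
import sys


def run_wrapped(argv, out=sys.stdout):
    """Run argv with fds 1 and 2 on pipes; emit one JSON line per chunk read."""
    out_r, out_w = os.pipe()
    err_r, err_w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: replace fds 1 and 2 with the write ends, then exec.
        try:
            os.dup2(out_w, 1)
            os.dup2(err_w, 2)
            for fd in (out_r, out_w, err_r, err_w):
                os.close(fd)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed
    # Parent: keep only the read ends and multiplex them with select().
    os.close(out_w)
    os.close(err_w)
    names = {out_r: "stdout", err_r: "stderr"}
    while names:
        ready, _, _ = select.select(list(names), [], [])
        for fd in ready:
            chunk = os.read(fd, 4096)
            if not chunk:  # EOF on this pipe
                os.close(fd)
                del names[fd]
                continue
            record = {"stream": names[fd],
                      "data": chunk.decode("utf-8", "replace")}
            out.write(json.dumps(record) + "\n")
            out.flush()
    _, status = os.waitpid(pid, 0)
    return os.waitstatus_to_exitcode(status)


if __name__ == "__main__":
    sys.exit(run_wrapped(sys.argv[1:]))
```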

~~~
basicer
We tried something like this. Minimal was still confusing a non-trivial
percentage of the time. Testing locally it was around 0.9%, but in production
under docker where execution is constrained to a single core, it seemed to
happen much more often.

It is really confusing to the user to have the prompt in the middle of their
output.

~~~
jakobegger
Interesting. I guess the specific issue with the prompt could be fixed by
always printing data from the stderr pipe first, but then you probably get
incorrect ordering in other cases...

------
catern
Why don't you just pass different pipes for stdout and stderr? Then you can
treat the two differently, you can do whatever tricks you want to prevent
interleaving, you don't have to inject anything...

~~~
amasad
> whatever tricks you want to prevent interleaving

What tricks do you suggest to prevent interleaving? It's not clear it's
possible at all.

~~~
catern
Just line buffer what you read from stdout/stderr. Or even display them in
separate sections in your UI.

There's no guarantee of ordering between stdout/stderr, so line buffering
them, or displaying them in separate UI sections, or even doing some non-
blocking reading to flush one fully before flushing the other, should be
sufficient.
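
As a rough illustration of the line-buffering idea, here is a small Python sketch: keep one buffer per stream and only hand complete lines to the UI. `LineBuffer` is a made-up name, and this says nothing about which stream's line should be shown first.

```python
class LineBuffer:
    """Accumulate raw chunks from one stream and release only whole lines."""

    def __init__(self):
        self._pending = b""

    def feed(self, chunk):
        """Add a chunk; return the list of complete lines now available."""
        data = self._pending + chunk
        # Everything after the last newline stays pending.
        *lines, self._pending = data.split(b"\n")
        return [line + b"\n" for line in lines]

    def flush(self):
        """Return any trailing partial line, e.g. when the stream hits EOF."""
        rest, self._pending = self._pending, b""
        return rest


if __name__ == "__main__":
    stderr_buf = LineBuffer()
    for chunk in (b"Trace", b"back:\n  File ", b"x.py\n"):
        for line in stderr_buf.feed(chunk):
            print(line)
```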

~~~
amasad
> Or even display them in separate sections in your UI.

That's a non-starter. We think barring a great UX improvement, the environment
should look as predictable and as close to a local setup as possible.

~~~
catern
Then line buffer stdout and stderr to the same UI element, and also provide an
option to toggle showing just one or the other. Sounds pretty useful to be
able to switch between just showing one or the other, and showing them line-
interleaved by timestamp, even if you do want to default to the latter.

If line buffering isn't enough, you can use heuristics around read sizes and
non-blocking reading to guess whether a given write to the pipe was intended
to be done as a block.

------
foxhill
oh please no. please don’t just simply set all stderr to red - it doesn’t help
at all, and makes reading multi-line things near impossible (there is a reason
why most image/video codecs use 1/2 as many bits to encode red colour data).

besides, what happens to colourisation of output from stderr when a process
gives useful color output?

UX is not really my remit (you could maybe guess :) but i’d _greatly_ prefer
bold, or stdout as grey, stderr as white - only if no colour control codes are
present in the written message.

still, kudos for monkey patching libc :)

~~~
basicer
You can actually set an environmental variable with any color code you want:
STDERRED_ESC_CODE. As for color codes in stderr, they work in most cases. If
all the color codes are in a single write(), it will more or less be
unaffected by the extra red and reset color codes that get stuck on either
end.
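
To make the "stuck on either end" point concrete, here is a hedged Python sketch of the wrapping. It is not stderred's actual code: the `wrap_stderr_payload` name, the default red code, and the reset code are assumptions; only the `STDERRED_ESC_CODE` variable name comes from the comment above.

```python
import os

DEFAULT_START = "\x1b[31m"  # assumed default: red
RESET = "\x1b[0m"           # assumed reset sequence


def wrap_stderr_payload(payload, env=None):
    """Surround one write()'s bytes with a start code and a reset code."""
    env = os.environ if env is None else env
    start = env.get("STDERRED_ESC_CODE", DEFAULT_START)
    return start.encode() + payload + RESET.encode()


if __name__ == "__main__":
    # A payload that sets its own colour inside a single write: the green
    # code immediately overrides the red prefix, so it renders as intended.
    print(wrap_stderr_payload(b"\x1b[32mok\x1b[0m", env={}))
```

If a program instead splits its own colour codes across two writes, the reset appended to the first write cancels the program's colour before the second write arrives, which is the case that does get affected.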

------
gourlaysama
This is probably a stupid question, but doesn't the PTY itself already have
that out-of-order problem? I mean that it reads both file descriptors and does
interleave them in the end. If it didn't they could still colorize things
after the PTY.

So I guess my question is, why can't whatever it does to handle the ordering
and merge the two be replicated in a small program that reads two named pipes
and does the same but with extra colorizing?

~~~
basicer
Not normally, no. The file descriptors sort of act like pointers. During
normal execution in a pty, stderr and stdout both point to the same device with
the same buffer. Nothing is handling the order because there is only one
buffer being written to.
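
A small Python sketch of the "two descriptors, one buffer" point. A pipe stands in for the PTY device here (an assumption for the sake of a self-contained demo), and `shared_buffer_demo` is a made-up name. Two descriptors created with `dup()` refer to the same underlying object, so the reader sees the bytes in the order they were written.

```python
import os


def shared_buffer_demo():
    """Write through two fds that share one pipe; return what the reader sees."""
    r, fake_stdout = os.pipe()
    fake_stderr = os.dup(fake_stdout)  # second descriptor, same pipe
    os.write(fake_stdout, b"out1 ")
    os.write(fake_stderr, b"err1 ")
    os.write(fake_stdout, b"out2")
    os.close(fake_stdout)
    os.close(fake_stderr)
    data = b""
    while chunk := os.read(r, 4096):
        data += chunk
    os.close(r)
    return data


if __name__ == "__main__":
    print(shared_buffer_demo())
```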

~~~
gourlaysama
> Nothing is handling the order because there is only one buffer being written
> to.

Ah, that's what I was missing. Thank you, that makes sense.

------
JdeBP
First: Standard error is not just for formatted error messages. It is ironic
that this is based upon dealing with REPLs when Unix shells, one of which is
even available on the WWW site, _use standard error for their prompts and for
their interactive line editors_. Users do like to put control sequences in
their prompts.

* [https://unix.stackexchange.com/a/434839/5132](https://unix.stackexchange.com/a/434839/5132)

Second: Even as an example that read() implementation is exceedingly bad. It
is stack frame corruption waiting to happen.

Third: This is another example of hardwiring control sequences for a
particular class of terminal and not respecting TERM=dumb. It probably won't
be a problem for the WWW site, but it will for the underlying general-purpose
tool.
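
On the third point, a minimal sketch of the kind of check a general-purpose tool could make before emitting colour codes. The `should_colorize` name is hypothetical, and treating an unset `TERM` or a non-terminal stream as "no colour" is an assumption, not something the post's tool does.

```python
import os
import sys


def should_colorize(stream=sys.stderr, env=None):
    """Emit colour only on a terminal whose TERM is set and is not 'dumb'."""
    env = os.environ if env is None else env
    term = env.get("TERM", "")
    return bool(stream.isatty()) and term not in ("", "dumb")


if __name__ == "__main__":
    print(should_colorize())
```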

------
vbernat
Small nitpick, LD_PRELOAD will override the write() function from the libc,
which is a wrapper around the system call, not the write system call itself.

~~~
bdarnell
This is significant for programs written in Go, which tends to make system
calls directly instead of using the libc wrapper functions. LD_PRELOAD
trickery often doesn't work with Go programs for this reason.

~~~
saagarjha
On macOS, Go just straight up prevents the use of DYLD_INSERT_LIBRARIES (the
equivalent of LD_PRELOAD), because macOS runs dyld and this mucks with some
sort of threading setup.

------
tyingq
Might be nice to splice this kind of functionality into tmux, probably right
about here:
[https://github.com/tmux/tmux/blob/486ce9b09855ae30a2bf5e576c...](https://github.com/tmux/tmux/blob/486ce9b09855ae30a2bf5e576cb6f7ad37792699/client.c#L615)

Wouldn't need any LD_PRELOAD tricks...

------
Rjevski
Great post, but I'm not sure running a monkey-patched libc is a "small"
sacrifice. I would be concerned this leads to very obscure behaviour down the
line, especially when your main purpose is to be a REPL and run the code
_correctly_ without any "surprises" like these.

~~~
amasad
That's a valid concern. However, given our scale, we find those obscure
behaviors, if they exist (like interleaving output), pretty quickly.

We're keeping an eye out in the beta. You can try it by going to your account
and adding the explorer role.

------
dickeytk
stderr is not for errors! (at least, not necessarily) It's for messaging
things to the user. These are often errors, but there are some examples like
curl that use stderr for progress.

Leave it up to the CLI to decide what colors to use.

~~~
lbotos
why does curl use stderr over say stdout? (and happy to read any links you
have, trying to learn)

~~~
black-tea
Because stdout is used for the output of the program, ie. the stuff downloaded
from the URL.

------
unilynx
I don't see out-of-order issues when using interactive SSH, and SSH still
funnels stdout and stderr over a single TCP connection (keeping them separate
at the client's side)

Seems to me that you could just look at how SSH does it ?

~~~
basicer
SSH only separates the streams when you don't request a PTY. For example try:

    ssh localhost -t 'echo hi 1>&2'  2>test

vs

    ssh localhost 'echo hi 1>&2'  2>test

------
bdavis__
all this complexity for what? seems like a lot for a little.

~~~
amasad
We care about user experience enough to justify the cost. Some experienced
hackers might not mind undifferentiated colors, but novices surely do, and
since Repl.it is increasingly a place where a lot of people start their
coding journey, this is important.

~~~
AnIdiotOnTheNet
If you care about the user experience that much, then you should do as others
have suggested and just separate the output into two different text areas. If
you're concerned about your UI matching the local behavior then it just
shouldn't be colorized at all. Dirty hack solutions are incompatible with both
goals.

