Hacker News
How stdbuf works (hmarr.com)
133 points by todsacerdoti on Nov 11, 2022 | 29 comments



This was a fun read! It started out "TIL! Where has this `stdbuf` been all my life?", but then turned to "jfc, if that's how it works, no wonder I've never heard of this".


djb had a similar tool called 'nobuf', released as part of his ptyget package [0] way back on comp.sources.unix during the Usenet era.

[0] http://jdebp.info/Softwares/djbwares/bernstein-ptyget.html


How I think it should work: associate with each file descriptor (or should I say file handle / struct file) a “buffering” state, initially blank, with IOCTLs to get/set it. In the stdio implementation, call the get IOCTL at startup on stdin/stdout/stderr, and also on any fdopen. If get IOCTL returns non-blank buffering settings, stdio adopts them for the FILE* pointing to that fd.
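A minimal sketch of the stdio side of this proposal. The IOCTL does not exist, so everything except `setvbuf()` is hypothetical: `enum fd_buffering`, `query_fd_buffering()` and the `simulated_state` table stand in for the proposed get-IOCTL and the per-file-handle state behind it.

```c
#include <stdio.h>

/* Hypothetical per-descriptor buffering states. No such IOCTL exists;
   these names are made up for this sketch. */
enum fd_buffering { FDBUF_BLANK, FDBUF_NONE, FDBUF_LINE, FDBUF_FULL };

/* Simulated kernel-side state for fds 0, 1 and 2, standing in for what a
   hypothetical "set buffering" IOCTL would have stored on the file handle. */
enum fd_buffering simulated_state[3] = { FDBUF_BLANK, FDBUF_BLANK, FDBUF_BLANK };

/* Stand-in for the proposed "get buffering" IOCTL. */
enum fd_buffering query_fd_buffering(int fd)
{
    if (fd < 0 || fd > 2)
        return FDBUF_BLANK;
    return simulated_state[fd];
}

/* What stdio would do at startup for stdin/stdout/stderr and inside
   fdopen(): adopt non-blank settings, otherwise keep its usual defaults.
   Must run before any I/O on `stream`, as setvbuf() requires.
   Returns the state that was found. */
enum fd_buffering adopt_fd_buffering(FILE *stream, int fd)
{
    enum fd_buffering state = query_fd_buffering(fd);

    switch (state) {
    case FDBUF_NONE:
        setvbuf(stream, NULL, _IONBF, 0);
        break;
    case FDBUF_LINE:
        setvbuf(stream, NULL, _IOLBF, 0);
        break;
    case FDBUF_FULL:
        setvbuf(stream, NULL, _IOFBF, BUFSIZ);
        break;
    case FDBUF_BLANK:
        break;  /* stdio keeps its tty/non-tty defaults */
    }
    return state;
}
```

In a real C library this would be called from stdio's startup code for descriptors 0, 1 and 2 and from fdopen(), before the stream is first used.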

Advantages: No mucking around with LD_PRELOAD and associated security-related restrictions. Even works for static binaries, so long as they are linked with a stdio which supports this feature (and running under a kernel with that IOCTL).

(Actually, I was thinking you could even have an IOCTL to get/set arbitrary metadata attributes on a file handle - unlike xattrs, this would only be for each opening of the file, not the file on disk. As well as controlling stdio buffering, one can imagine other potential uses - for example, content type negotiation in pipelines.)


I recently integrated stdbuf logic in an embedded product. Basically same logic: set the LD_PRELOAD environment variable, taking care to extend LD_PRELOAD if it already has a value.
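The "extend LD_PRELOAD instead of clobbering it" step can be sketched like this. It is an illustration, not the coreutils code: `extended_preload()` and `setup_stdbuf_env()` are hypothetical names, and the libstdbuf.so path is a parameter because its location differs between distributions. `_STDBUF_O` is the variable coreutils' libstdbuf.so reads for stdout (`_STDBUF_I` and `_STDBUF_E` cover stdin and stderr).

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated LD_PRELOAD value that appends `lib` to any
   existing value. glibc's dynamic loader accepts colon- or
   whitespace-separated lists; a colon is used here. */
char *extended_preload(const char *old, const char *lib)
{
    size_t len;
    char *val;

    if (old == NULL || *old == '\0')
        return strdup(lib);

    len = strlen(old) + 1 + strlen(lib) + 1;
    val = malloc(len);
    if (val != NULL)
        snprintf(val, len, "%s:%s", old, lib);
    return val;
}

/* Prepare this process's environment so children it execs get
   line-buffered stdout via the preloaded library. */
int setup_stdbuf_env(const char *libstdbuf_path)
{
    char *val = extended_preload(getenv("LD_PRELOAD"), libstdbuf_path);
    int rc;

    if (val == NULL)
        return -1;
    rc = setenv("LD_PRELOAD", val, 1);  /* setenv() copies the string */
    free(val);
    if (rc != 0)
        return -1;
    return setenv("_STDBUF_O", "L", 1);
}
```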

This was part of an effort to redirect the output of sub-programs such as external command executions and daemons into the application's logging system (fork and capture stdout). The problem is that some programs fully buffer their output when it is redirected to a pipe, so you don't get the output in a timely way.
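A minimal sketch of the fork-and-capture part, assuming a POSIX system. `capture_stdout()` is a hypothetical helper; a real logger would forward each chunk as it arrives rather than collect everything into one buffer. Any LD_PRELOAD/_STDBUF_O setup would happen before the fork (or via the environment passed to execve()), so the child inherits it.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with stdout redirected into a pipe and copy up to size-1
   bytes of its output into buf (NUL-terminated).
   Returns the child's exit status, or -1 on error. */
int capture_stdout(char *const argv[], char *buf, size_t size)
{
    int fds[2];
    int status;
    size_t used = 0;
    pid_t pid;

    if (size == 0 || pipe(fds) != 0)
        return -1;

    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {
        /* Child: make the pipe's write end its stdout, then exec. */
        dup2(fds[1], STDOUT_FILENO);
        close(fds[0]);
        close(fds[1]);
        execvp(argv[0], argv);
        _exit(127);  /* exec failed */
    }

    /* Parent: close our copy of the write end, or read() never sees EOF. */
    close(fds[1]);
    for (;;) {
        char chunk[512];
        ssize_t n = read(fds[0], chunk, sizeof chunk);
        size_t room, take;

        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            break;  /* 0 means EOF: every copy of the write end is closed */
        room = size - 1 - used;
        take = (size_t)n < room ? (size_t)n : room;
        memcpy(buf + used, chunk, take);  /* excess is drained but dropped */
        used += take;
    }
    buf[used] = '\0';
    close(fds[0]);

    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `capture_stdout((char *[]){"echo", "hi", NULL}, out, sizeof out)` stores "hi\n" in `out` and returns 0. Draining the pipe even after the buffer is full keeps the child from blocking on a full pipe.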

I noticed that the target system has a libstdbuf.so (even though its rootfs image lacks the stdbuf program) and the rest was history.
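For the curious, here is a rough sketch of what a preload library in the spirit of libstdbuf.so does. It is not the coreutils source: `parse_buf_mode()` is a simplified, hypothetical parser (plain byte counts only), and the constructor attribute is a GCC/Clang extension. The key point is that the constructor runs before main(), so setvbuf() is applied before the host program touches its streams.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse a mode string: "0" = unbuffered, "L" = line buffered, otherwise a
   buffer size in bytes. Returns 0 on success, -1 if not understood. */
int parse_buf_mode(const char *s, int *mode, size_t *size)
{
    char *end;
    unsigned long n;

    if (s == NULL)
        return -1;
    if (s[0] == 'L' && s[1] == '\0') {
        *mode = _IOLBF;
        *size = 0;
        return 0;
    }
    if (s[0] < '0' || s[0] > '9')
        return -1;  /* rejects "", signs and leading whitespace */
    n = strtoul(s, &end, 10);
    if (*end != '\0')
        return -1;
    *mode = (n == 0) ? _IONBF : _IOFBF;
    *size = (size_t)n;
    return 0;
}

static void apply_env_mode(FILE *stream, const char *env_name)
{
    int mode;
    size_t size;
    char *buf = NULL;

    if (parse_buf_mode(getenv(env_name), &mode, &size) != 0)
        return;  /* variable unset or not understood: leave stdio alone */
    if (mode == _IOFBF) {
        /* glibc ignores the size when buf is NULL, so allocate it here.
           Never freed: the buffer must live as long as the stream. */
        buf = malloc(size);
        if (buf == NULL)
            return;
    }
    setvbuf(stream, buf, mode, size);
}

/* Runs when the library is loaded, before the host program's main(). */
__attribute__((constructor))
static void preload_init(void)
{
    apply_env_mode(stdin, "_STDBUF_I");
    apply_env_mode(stdout, "_STDBUF_O");
    apply_env_mode(stderr, "_STDBUF_E");
}
```

Built as a shared object (e.g. `cc -shared -fPIC -o libmybuf.so mybuf.c`, with hypothetical file names), it would be used as `LD_PRELOAD=./libmybuf.so _STDBUF_O=L some_program`.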


Total tangent: What is the magic on that page that renders:

    if (putenv (var) != 0)
as

    if (putenv (var) ≠ 0)
in my browser (Firefox)?

When I try to copy that text, it copies as '!=' into my copypaste buffer.


They're using a webfont with a ligature for !=.


(Author here) I don't see that in Chrome so had no idea that was happening – thanks for pointing it out! I'm not a fan of coding ligatures so I just pushed a change to turn it off.


Sheeet, I didn't even know that was possible. Why would they use that font for displaying C code? I looked at that ≠ and spent a few minutes trying to figure out what language it was. Maybe I should uncheck the "Allow pages to choose their own fonts, instead of your selections above" option in Firefox, though it would probably break other websites.


FYI Fira Code is one example of a very popular programming font that supports ligatures (67k stars on GitHub) [1]. But totally get the frustration of seeing ligatures on the web and not understanding what was going on!

[1] https://github.com/tonsky/FiraCode


>Why would they use that font for displaying C code?

Using ligature fonts for programming became popular a few years ago. Some people even use them in their editors.


Lately my Firefox has been showing a ligature for "fi" in monospace code blocks, showing those two characters in the width of a single character. On every website with code blocks. It is mildly infuriating and I don't understand it.


Right-click the ligature text -> Inspect. In the Inspector, on the right there is a tab called "Fonts" (to the right of "Rules", "Layout", etc). That will show you what font(s) Firefox used to render all the text in the selected element.


I used Stylus to force ligatures off at some point, it mostly worked but caused some other issues.


Which is totally fine on their own machine, but using a webfont to force ligatures on anyone viewing their website is just confusing as a reader.


No, I get it. I don't use them either because I find them confusing. I'm just telling you you missed out on what was a quite popular trend from a few years ago.


> though it woulld probably break other websites.

It only breaks sites that are too poorly developed to respect the foundational principles of the web.


It breaks Google Calendar badly, almost unusable due to missing icons and spacing problems.


On Safari the 'notequal' is double-width, and you can select just half of it, which I didn't expect.


It's double wide on Firefox as well, and I could select half of it as well. I had to use a single "Unicode Character 'NOT EQUAL TO' (U+2260)" in my post on HN to describe what it looked like on my browser.


Those kinds of font ligatures are still separate characters.


I wonder if it could add support for pretending to be a tty for the benefit of programs where the current approach doesn't work (ex. static binaries).


That’s what unbuffer does, as mentioned in the article.
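A bare-bones sketch of that pty trick, assuming a POSIX system with `posix_openpt()`. unbuffer itself is an Expect script, so this is not its implementation; `run_on_pty()` is a hypothetical helper. Because the child's stdout is a pty slave, `isatty(1)` is true there and stdio picks line buffering, with no LD_PRELOAD involved, so it also works for static binaries.

```c
#define _XOPEN_SOURCE 600
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with stdout on the slave side of a new pty and collect up to
   size-1 bytes from the master side into buf (NUL-terminated).
   Returns the child's exit status, or -1 on error. */
int run_on_pty(char *const argv[], char *buf, size_t size)
{
    int master, slave, status;
    size_t used = 0;
    char *name;
    pid_t pid;

    if (size == 0)
        return -1;
    master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0)
        return -1;
    if (grantpt(master) != 0 || unlockpt(master) != 0 ||
        (name = ptsname(master)) == NULL ||
        (slave = open(name, O_RDWR | O_NOCTTY)) < 0) {
        close(master);
        return -1;
    }

    pid = fork();
    if (pid < 0) {
        close(slave);
        close(master);
        return -1;
    }
    if (pid == 0) {
        /* Child: stdout is now a terminal as far as isatty() can tell. */
        dup2(slave, STDOUT_FILENO);
        close(slave);
        close(master);
        execvp(argv[0], argv);
        _exit(127);
    }

    close(slave);  /* otherwise the master never sees the slave side close */
    for (;;) {
        char chunk[512];
        ssize_t n = read(master, chunk, sizeof chunk);
        size_t room, take;

        if (n < 0 && errno == EINTR)
            continue;
        if (n <= 0)
            break;  /* Linux reports EIO once the slave side is closed */
        room = size - 1 - used;
        take = (size_t)n < room ? (size_t)n : room;
        memcpy(buf + used, chunk, take);
        used += take;
    }
    buf[used] = '\0';
    close(master);

    while (waitpid(pid, &status, 0) < 0)
        if (errno != EINTR)
            return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Note that with the default terminal settings (ONLCR), output read from the master has each "\n" translated to "\r\n".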


I was recently working with a bunch of pseudo-terminal stuff, so I was curious about how unbuffer works. One thing that stuck out to me in the manpages is this:

    Caveats
    
    unbuffer -p may appear to work incorrectly if a process feeding input to unbuffer exits. Consider:
    
    process1 | unbuffer -p process2 | process3
    
    If process1 exits, process2 may not yet have finished. It is impossible for unbuffer to know long 
    to wait for process2 and process2 may not ever finish, for example, if it is a filter. For expediency, 
    unbuffer simply exits when it encounters an EOF from either its input or process2. 
Why does unbuffer care whether or not process2 exits? Can't it receive the EOF from process1, send it to process2, and then wait for process2 to exit? If process2 doesn't exit after receiving an EOF from stdin, then what's the issue about simply letting the pipeline hang? Isn't that what would happen without unbuffer?


From the Expect FAQ (unbuffer is part of Expect):

  #43. Why doesn't Expect kill telnet (or other programs) sometimes?
  
  From: libes (Don Libes)
  To: Karl.Sierka@Labyrinth.COM
  Subject: Re: need help running telnet Expect script from cron on sunos 4.1.3
  
  karl.sierka@labyrinth.com writes:
  >       The only problem I am still having with the script I wrote is that
  >    the telnet does not seem to die on it's own, unless I turn on debugging.
  
  Actually, Expect doesn't explicitly kill processes at all.  Generally,
  processes kill themselves after reading EOF on input.  So it just seems
  like Expect kills all of its children.
  
  >    I was forced to save the pid of the spawned telnet, and kill it with an
  >    'exec kill $pid' in a proc that is hopefully called before the script
  >    exits. This seems to work fine, but it makes me nervous since omnet
  >    charges for connect time, and leaving a hung telnet lying around could
  >    get expensive. I warned the rest of the staff so that they will also be
  >    on the lookout for any possible hung telnets to omnet.
  
  The problem is that telnet is not recognizing EOF.  (This is quite
  understandable since real users can't actually generate one from the
  telnet user interface.)  The solution is to either 1) explicitly drive
  telnet to kill itself (i.e., a graceful logout) followed by "expect
  eof" or 2) "exec kill" as you are doing.
  
  This is described further in Exploring Expect beginning on page 103.
  
  Don


Thanks, that's interesting background. From my pty experience, though, the answer is a bit unsatisfying. One reason is that I thought ^D generated an EOF in the terminal. For instance, if I use `wc` in the terminal, it reads from STDIN until I use ^D and then I get the output. Another reason is that it seems like a bug in the program to not respond to EOF. At the very least, unbuffer could offer an option? Anyway, I digress...

The PTY interface in general feels a bit lacking. Normally, if you use a pipe, you can close the write end of the pipe to generate an EOF. With pseudo-terminals, you only have one file descriptor for writing input and reading output, so you can't have the system generate an EOF without disabling your ability to read the output.
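The pipe half of that comparison fits in a few lines (`pipe_roundtrip()` is a hypothetical helper): once the only write end is closed, read() returns the buffered data and then 0, which the reader treats as EOF.

```c
#include <unistd.h>

/* Write msg into a pipe, close the write end, then read until EOF.
   Returns the total number of bytes read, or -1 on error. */
long pipe_roundtrip(const char *msg, size_t len)
{
    int fds[2];
    char buf[128];
    long total = 0;
    ssize_t n;

    /* Keep it small: nothing reads until after the write, so a large
       message could fill the pipe and block. */
    if (len > 512 || pipe(fds) != 0)
        return -1;
    if (write(fds[1], msg, len) != (ssize_t)len) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    close(fds[1]);  /* last write end gone: the reader gets EOF after the data */

    while ((n = read(fds[0], buf, sizeof buf)) > 0)
        total += n;
    close(fds[0]);
    return n == 0 ? total : -1;  /* 0 from read() is EOF, not an error */
}
```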


There should be an IOCTL to trigger an EOF condition on a PTY (even if it is in raw mode). Then, once the input buffer is empty, read() calls on the client side would just return 0 instead of blocking.

That requires code changes to the Linux (or whatever other OS you are using) kernel. An approach without any kernel code changes, for Linux, would be to use CUSE to implement a user-space character device. (FreeBSD has a CUSE too, although apparently its implementation is not very compatible with the Linux one). Note you can't actually create a custom tty using CUSE – it doesn't support registering its devices with the tty subsystem – but you can create a device which implements (yourself) enough of the TTY IOCTLs to trick other programs into thinking it is a TTY. (isatty() calls tcgetattr() which calls ioctl(TCGETS), so implementing that IOCTL is enough to convince most programs you are a TTY.)
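The isatty() point can be checked with a tiny helper. `looks_like_tty()` is a hypothetical name; it mirrors glibc's isatty(), which succeeds exactly when tcgetattr() does (other C libraries may test differently), so a device that answers TCGETS passes.

```c
#include <termios.h>
#include <unistd.h>

/* glibc-style terminal check: on Linux, tcgetattr() issues ioctl(TCGETS)
   and fails (ENOTTY) on pipes, regular files and most devices. */
int looks_like_tty(int fd)
{
    struct termios t;
    return tcgetattr(fd, &t) == 0;
}
```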


^D only does that in cooked (canonical) input mode. In raw mode ^D sends ^D, and it's up to the program to interpret that, or not. This is essential in, e.g., telnet, since it wants to be able to open a session on a remote host that behaves exactly like a session on a local host, including its behaviour in response to ^D.
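To see the difference concretely, here is a sketch, assuming a POSIX system where ptys are available (`read_after_veof()` is a hypothetical helper). It writes the VEOF character, normally ^D, to a pty master and reports what read() on the slave returns: 0 (EOF) with ICANON set, or the ^D byte itself with ICANON cleared.

```c
#define _XOPEN_SOURCE 600
#include <fcntl.h>
#include <stdlib.h>
#include <termios.h>
#include <unistd.h>

/* Returns what read(slave, got, 1) returned after the master wrote VEOF:
   0 in canonical mode, 1 in non-canonical mode, -1 on error. */
int read_after_veof(int canonical, unsigned char *got)
{
    struct termios t;
    int master, slave, rc = -1;
    unsigned char veof;
    char *name;

    master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0)
        return -1;
    if (grantpt(master) != 0 || unlockpt(master) != 0 ||
        (name = ptsname(master)) == NULL ||
        (slave = open(name, O_RDWR | O_NOCTTY)) < 0) {
        close(master);
        return -1;
    }

    if (tcgetattr(slave, &t) == 0) {
        veof = t.c_cc[VEOF];  /* read before touching VMIN, which may share its slot */
        if (canonical) {
            t.c_lflag |= ICANON;
        } else {
            t.c_lflag &= ~(tcflag_t)ICANON;
            t.c_cc[VMIN] = 1;   /* read() returns once one byte is available */
            t.c_cc[VTIME] = 0;
        }
        t.c_lflag &= ~(tcflag_t)ECHO;  /* don't echo back to the master */
        if (tcsetattr(slave, TCSANOW, &t) == 0 &&
            write(master, &veof, 1) == 1)
            rc = (int)read(slave, got, 1);
    }
    close(slave);
    close(master);
    return rc;
}
```

In canonical mode the ^D is consumed by the line discipline and ends the read with no data, which is what `wc` sees as EOF; in non-canonical mode it is just another input byte for the program to interpret.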


Anyone else misread "How Stuff Works"? No? Just me? Alright then...


Great article!

You can also get this behavior from within the program itself: std::flush flushes explicitly, and std::unitbuf makes flushing after every output operation the default.
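In C, the same two options look like this (`make_line_buffered()` and `emit_progress()` are hypothetical helpers): switch the stream to line buffering once with setvbuf(), or keep the default buffering and call fflush() where output must become visible.

```c
#include <stdio.h>

/* In-program alternative to `stdbuf -oL`: make a stream line buffered.
   Must be called before any other operation on the stream. */
int make_line_buffered(FILE *stream)
{
    return setvbuf(stream, NULL, _IOLBF, 0);
}

/* Alternative: keep the default buffering and flush explicitly at the
   points where output must become visible (the C analogue of std::flush). */
int emit_progress(FILE *stream, int step)
{
    if (fprintf(stream, "step %d done\n", step) < 0)
        return -1;
    return fflush(stream);
}
```

Typical use is `make_line_buffered(stdout);` as the first statement of main().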



