
Curl doesn’t spew binary anymore - okket
https://daniel.haxx.se/blog/2017/06/17/curl-doesnt-spew-binary-anymore/
======
baby
I just looked at the commit and I was surprised at how nice the code is! The
relevant line to my question is:

      if(isatty && (outs->bytes < 2000) && terminal_binary_ok) {
        if(memchr(buffer, 0, bytes)) {

memchr() returns a pointer to the first occurrence of the byte 0 (your second
argument) within the first `bytes` bytes of `buffer`, or NULL if there is none.
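Here is a minimal stand-alone sketch of what the quoted condition checks. The names (`refuse_binary_chunk`, `BINARY_CHECK_LIMIT`) are hypothetical. It assumes, as the condition suggests, that the running byte count means bytes already written, and it models the user override as a plain flag. It is not curl's actual code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limit mirroring the 2000 in the quoted condition: only chunks
 * that arrive while fewer than this many bytes have been written are checked. */
#define BINARY_CHECK_LIMIT 2000

/* Returns true if this chunk should be refused rather than written.
 *   to_tty          result of isatty() on the output descriptor
 *   written_so_far  bytes already sent to the output
 *   user_override   true if the user explicitly asked for raw output */
bool refuse_binary_chunk(const void *buf, size_t len, size_t written_so_far,
                         bool to_tty, bool user_override)
{
    if (!to_tty || user_override || written_so_far >= BINARY_CHECK_LIMIT)
        return false;
    /* memchr() returns a pointer to the first 0 byte in buf[0..len), or NULL. */
    return memchr(buf, 0, len) != NULL;
}
```

Output going to a pipe or a file is never inspected. Data whose early chunks contain no 0 byte is passed through unchanged.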

So a few things:

* what if your output is more than 2000 bytes?

* what if your output is binary but doesn’t contain a byte 0?

* what if your output is a normal UTF-8 string but contains a byte 0? ( see [https://stackoverflow.com/questions/6907297/can-utf-8-contai...](https://stackoverflow.com/questions/6907297/can-utf-8-contain-zero-byte) )

This is interesting to me as I am developing a tool that parses files in
search of bugs, and I need to ignore binary files. What I am doing right now:
when checking a line, if it is not a valid UTF-8 string, I skip the file. It's
not really nice, as I am doing this verification for every line of every
file...
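For a tool like this, one option is to classify each file once, from a fixed-size prefix, instead of validating every line. The sketch below applies the same 0-byte test to the first `SNIFF_BYTES` bytes. The function name and the 8000-byte size are arbitrary assumptions made for illustration.

```c
#include <stdio.h>
#include <string.h>

/* Arbitrary prefix size: larger values catch more binaries but read more. */
#define SNIFF_BYTES 8000

/* Returns 1 if the first SNIFF_BYTES bytes of the file contain a 0 byte,
 * 0 if they do not, and -1 if the file cannot be opened. */
int file_looks_binary(const char *path)
{
    unsigned char buf[SNIFF_BYTES];
    FILE *f = fopen(path, "rb");
    if (f == NULL)
        return -1;
    size_t n = fread(buf, 1, sizeof buf, f);  /* short files just give smaller n */
    fclose(f);
    return memchr(buf, 0, n) != NULL;
}
```

This approach has the same trade-off as curl's check. UTF-16 and UTF-32 text files will be flagged as binary. A binary file with no 0 byte near the start will slip through.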

~~~
arghwhat
1\. It is not sensible to check more than a small chunk of data, as it would
result in significant performance penalties: high resource consumption
(memory or temporary files) and a blocked pipeline (cURL streams data as it
is received). Imagine `curl http://somesite/10GBoftext | grep
"rarely-occurring-prefix" | do-something-with-found-data`. With a full scan,
curl would use 10GB of memory, checking every byte before sending anything to
grep and do-something-with-found-data. Without a full scan,
do-something-with-found-data processes data live, and curl uses negligible
resources.
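The streaming point can be illustrated with a small copy loop. This is a sketch under stated assumptions, not a description of how curl is structured internally. It uses POSIX `read()`/`write()`, so each chunk is forwarded as soon as it arrives. Memory use is one fixed buffer however large the input is. The 0-byte test runs only while the output is still short. `copy_stream`, `CHUNK` and `CHECK_LIMIT` are hypothetical names.

```c
#include <stdbool.h>
#include <string.h>
#include <unistd.h>

#define CHUNK 4096        /* fixed buffer: memory use does not grow with input */
#define CHECK_LIMIT 2000  /* inspect chunks only until this many bytes are out */

/* Copies in_fd to out_fd. Returns the number of bytes copied, -1 on an I/O
 * error, or -2 if check is true and a chunk read while fewer than CHECK_LIMIT
 * bytes had been written contains a 0 byte. */
long copy_stream(int in_fd, int out_fd, bool check)
{
    char buf[CHUNK];
    long written = 0;
    for (;;) {
        ssize_t n = read(in_fd, buf, sizeof buf);  /* returns what is available */
        if (n == 0)
            return written;                        /* EOF */
        if (n < 0)
            return -1;                             /* EINTR not retried here */
        if (check && written < CHECK_LIMIT && memchr(buf, 0, (size_t)n) != NULL)
            return -2;
        for (ssize_t off = 0; off < n; ) {         /* handle short writes */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        written += n;
    }
}
```

Scanning all 10GB before writing anything would instead require buffering the whole download. That is the resource cost and pipeline stall described above.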

2\. Then the check yields a false negative, which is not a problem.

3\. Then your UTF-8 string is unprintable, and the check will yield a true
positive. The UTF-8/ASCII NUL character is not printable, despite being valid.

If one only assumes ASCII/UTF-8/Shift-JIS/similar, then a blob containing a
null byte is guaranteed to be unprintable, while a blob not containing a null
byte may be printable. That's good enough for a warning, telling you that
you're doing something that you might not have intended.

Given that UTF-8 has become the standard, you will realistically never get a
false warning, but you may still get bonkers output. You can always override
the check if you have a fetish for UTF-32.
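A small C sketch shows why the 0-byte heuristic suits UTF-8 but not UTF-32. In UTF-8, every byte of a multi-byte sequence has its high bit set, so 0x00 appears only as the encoding of U+0000 itself. UTF-32 pads every code point to four bytes, so even plain ASCII contains zeros. The two encoder functions are illustrative helpers written for this example.

```c
#include <stddef.h>
#include <stdint.h>

/* Encodes code point cp (assumed valid: <= 0x10FFFF, not a surrogate) as
 * UTF-8 into out (room for 4 bytes) and returns the number of bytes. */
size_t utf8_encode(uint32_t cp, unsigned char *out)
{
    if (cp < 0x80) {                       /* 0xxxxxxx */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                      /* 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                    /* 1110xxxx 10xxxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char)(0xF0 | (cp >> 18));   /* 11110xxx + 3 x 10xxxxxx */
    out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char)(0x80 | (cp & 0x3F));
    return 4;
}

/* Encodes cp as UTF-32LE: always 4 bytes, and since cp <= 0x10FFFF the
 * highest byte is always 0. */
size_t utf32le_encode(uint32_t cp, unsigned char *out)
{
    for (int i = 0; i < 4; i++)
        out[i] = (unsigned char)(cp >> (8 * i));
    return 4;
}
```

For example, `'A'` encodes to the single byte 0x41 in UTF-8 but to 41 00 00 00 in UTF-32LE.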

~~~
baby
> 1\. It is not sensible to check more than a small chunk of data

You could just check the first X bytes. Also, I'm guessing curl doesn't print
out to the terminal if the data is more than 2000 bytes anyway?

> 2\. Then the check yields a false negative, which is not a problem.

The binary will be printed to the screen; that's a problem.

> 3\. Then your UTF-8 string is unprintable, and the check will yield a true
> positive.

How is it that the zero byte is part of UTF-8 then?

~~~
dozzie
>> 1\. It is not sensible to check more than a small chunk of data

> You could just check the first X bytes. Also, I'm guessing curl doesn't
> print out to the terminal if the data is more than 2000 bytes anyway?

Why wouldn't it? 2000 bytes is just 25 lines by 80 characters.

~~~
baby
right, it sounded bad for some reason!

------
TekMol
I think this is a bad idea. Trying to make programs intelligent and have them
behave in unexpected ways usually is.

It will make curl fail if a 0 byte is output within the text. There are
many reasons why this can happen: software errors, UTF encodings, special use
cases...

In fact, I run into this problem with grep from time to time and it is super
annoying:

        > grep stuff somefile.txt
        Binary file somefile.txt matches

So this change makes the code base more complex and the behavior fuzzy.

I prefer elegant software that behaves predictably.

~~~
Crontab
I am sympathetic to your comment because it feels like we are babying users.
To me, this is no different than when FreeBSD and GNU modified the rm command
to prevent people from shooting themselves in the foot.

~~~
shawnz
I thought they changed it to prevent buggy scripts from executing rm with
empty variables? That's not necessarily the fault of the user.

------
Retr0spectrum
How can a program determine whether its output is being piped to a
file/program, or the terminal?

Edit: It would appear that the answer is `isatty()`
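Here is a minimal sketch, assuming a POSIX system, of how a program can tell these cases apart. `isatty()` answers whether a descriptor is a terminal, and `fstat()` can distinguish a pipe from a regular file. The `fd_kind` enum and `fd_kind_of` are hypothetical names for illustration.

```c
#include <sys/stat.h>
#include <unistd.h>

enum fd_kind { FD_TERMINAL, FD_PIPE, FD_REGULAR_FILE, FD_OTHER, FD_INVALID };

/* Classifies what a file descriptor is connected to. */
enum fd_kind fd_kind_of(int fd)
{
    struct stat st;
    if (isatty(fd))                 /* real TTY or pseudo-TTY */
        return FD_TERMINAL;
    if (fstat(fd, &st) != 0)        /* e.g. closed or invalid descriptor */
        return FD_INVALID;
    if (S_ISFIFO(st.st_mode))       /* shell pipelines: cmd | other */
        return FD_PIPE;
    if (S_ISREG(st.st_mode))        /* redirection: cmd > file */
        return FD_REGULAR_FILE;
    return FD_OTHER;                /* sockets, character devices such as /dev/null */
}
```

Calling `fd_kind_of(STDOUT_FILENO)` inside curl would give `FD_PIPE` under `curl URL | less` and `FD_REGULAR_FILE` under `curl URL > out.html`. It would give `FD_OTHER` for `> /dev/null`, which is a character device but not a terminal.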

~~~
LogicX
Came here to say this. I can imagine a lot of people were just redirecting
output instead of using -o... If this changes the behavior of that, a lot of
people are not going to be happy when a curl upgrade breaks their scripts.

~~~
jacquesm
It won't, because isatty() returns false for a pipe (and for a redirect to a
file). So it should not break anything unless you plan on sending binary
output to your terminal, which is a bad idea anyway; and as per TFA, they
allow you to override the default.

~~~
arghwhat
You _can_ trick isatty, but only if you're very, _very_ set on doing so. It
requires allocating a pseudo-TTY and attaching it.

cURL already uses isatty checks elsewhere, such as to determine whether
progress info should be printed (it only shows if you're _not_ outputting to a
tty).

~~~
jacquesm
Sure, but that's _exactly_ what a pseudo-TTY is supposed to do: pretend to be
a TTY when it is actually a pipe, so that isatty returns true.

So that's not so much tricking isatty as using the system the way it is
intended. If you do that without the required background information, then
you have only yourself to blame.

~~~
Asooka
A pipe is different from a pseudo-TTY. Pseudo-TTYs are how all graphical
terminals work, but all shells use simple pipes on redirection and pipe-
composition. TTYs and pseudo-TTYs are supposed to be interactive, with a user
sitting on the other end with a keyboard. Plenty of tools make their output
different based on whether they're outputting to a tty, e.g. ls and others
do not colourise by default if not outputting to a tty, as colour control
characters are usually not what you want dumped to a text file or piped to
grep. I feel this change brings curl more in-line with other terminal programs
that try to be good citizens.

~~~
jacquesm
Pseudo-TTYs come in pairs: one end for the application to connect to, the
other for whatever is consuming the output or producing input to the
application. So, yes, of course they are not pipes, but they are similar in
that you have two end-points. Pipes and pseudo-TTYs have quite a bit in
common; thinking of a pseudo-TTY pair as a pipe that knows how to deal with a
bunch of terminal-specific ioctl calls is a pretty good approximation.

~~~
arghwhat
Considering it a pipe is a _very_ rough approximation, though. The only
purpose of PTYs is to provide access to a TTY with all its bells and whistles
(kernel provided line editing, yay!) in a world without hard terminals.
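Here is a sketch that makes the pipe versus pseudo-TTY distinction concrete. It assumes a POSIX system with PTY support. The first helper opens a PTY pair with `posix_openpt()`, `grantpt()`, `unlockpt()` and `ptsname()` and asks `isatty()` about the slave side. The second asks the same question about an ordinary pipe. The helper names are hypothetical. Returning -1 when no PTY is available, as in some containers, is a choice of this sketch.

```c
#define _XOPEN_SOURCE 600   /* exposes posix_openpt() and friends */
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns isatty() of a freshly opened PTY slave (1 or 0), or -1 if no
 * PTY could be allocated. */
int pty_slave_is_tty(void)
{
    int master = posix_openpt(O_RDWR | O_NOCTTY);
    if (master < 0)
        return -1;
    if (grantpt(master) != 0 || unlockpt(master) != 0) {
        close(master);
        return -1;
    }
    const char *name = ptsname(master);        /* e.g. /dev/pts/3 on Linux */
    int slave = name ? open(name, O_RDWR | O_NOCTTY) : -1;
    if (slave < 0) {
        close(master);
        return -1;
    }
    int result = isatty(slave);
    close(slave);
    close(master);
    return result;
}

/* Returns isatty() of the write end of an ordinary pipe, or -1 on failure. */
int pipe_is_tty(void)
{
    int fds[2];
    if (pipe(fds) != 0)
        return -1;
    int result = isatty(fds[1]);
    close(fds[0]);
    close(fds[1]);
    return result;
}
```

Where PTYs are available, the first function returns 1 and the second returns 0. As far as `isatty()` is concerned, the slave end of a PTY is a terminal and a pipe is not.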

------
rrdharan
Hooray! I'm gonna go ahead and claim credit for the feature request (though I
never attempted to implement it :)

[https://curl.haxx.se/mail/archive-2014-02/0032.html](https://curl.haxx.se/mail/archive-2014-02/0032.html)

------
kwvarga
There are a number of download URLs that are single-use. Shouldn't this at
least save the output to a /tmp file?

~~~
pebers
If you're relying on the previous behaviour of curl on your one-time URL,
you're already in trouble, because it's been printed to a terminal and the
real contents are probably irretrievable.

------
agentgt
I have wondered if there are security hazards to piping binary to your
terminal.

I guess: what really bad things can happen (and no, I'm not talking about
blindly piping to a shell to eval)?

Sure, it messes up the terminal, after which I usually type clear and/or
reset. If it's really bad, I usually just kill the terminal.

~~~
tyingq
There used to be some terminal emulators that supported "screen dump", where
the scrollback buffer would be written to an arbitrary named file.

    echo -e "\ec\n\e]55;/tmp/ouch.php\a"

I believe they've all removed that functionality by now, but there might be
some older copies around here and there.

~~~
eigengrau
IIRC, another angle that has been proposed is to switch between the primary
and secondary terminal buffers. This way, one could create a file that looks
harmless when piped to the terminal but contains a hidden, malicious payload.
The victim would, after manual inspection, finally pipe the file to a shell,
where the payload would do something evil.

~~~
joombaga
> The victim would, after manual inspection, finally pipe the file to a shell

Why would anyone do this? If I see some unknown, interesting file, I might run
cat, head, tail, less or vim on it. If it's binary then maybe I'll use xxd.
But it wouldn't even occur to me to pipe it to a shell.

~~~
lathiat
As people tend to reuse shell commands, editing them as they go, I can
totally see someone running `curl XX | less` and then `curl XX | sh` or
something, if the download is relatively speedy.

You could argue this is one of the reasons less doesn't interpret control
codes by default: doing so would let applications hide stuff like that by
redrawing the screen.

------
teddyh
Checking for the NUL byte seems like a very arbitrary test. Shouldn’t it be
using isprint_l()? Isn’t that, in fact, what isprint_l() is _for_?
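Here is a sketch of a byte-wise printability check using `isprint()` and `isspace()` from `<ctype.h>`. The locale-explicit `isprint_l()` takes an extra `locale_t` argument. The plain form is used here to keep the example portable, and the function name is hypothetical. One caveat shows up immediately. Classification is per byte, so in the default "C" locale every byte of a UTF-8 multi-byte character counts as unprintable.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts bytes that are neither printable nor whitespace according to the
 * current locale's single-byte classification. */
size_t count_unprintable(const unsigned char *buf, size_t len)
{
    size_t bad = 0;
    for (size_t i = 0; i < len; i++)
        if (!isprint(buf[i]) && !isspace(buf[i]))
            bad++;
    return bad;
}
```

For instance, `"caf\xc3\xa9"` ("café" in UTF-8) scores 2 in the "C" locale, so a byte-wise test would flag ordinary UTF-8 text. A multi-byte-aware check would decode characters first, for example with `mbrtowc()` and `iswprint()`.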

------
jkbr
Great feature. HTTPie, a CLI HTTP client I created, has always had a similar
one: No binary data in terminal output, and on top of that, you get to see the
response headers by default as well.

[https://httpie.org/doc#binary-data](https://httpie.org/doc#binary-data)

------
aaron_m04
Why not check for ESC (0x1b, ^[) bytes instead? Without those, you can't mess
up your terminal that badly.
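The ESC-based alternative is the same kind of one-line scan with a different byte. Here is a sketch with a hypothetical name. It is not a complete filter. Other control bytes, such as SO (0x0e), can also switch some terminals into a garbled character set.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if buf contains ESC (0x1b), the byte that begins the escape sequences
 * terminals act on (cursor movement, colours, screen modes). */
bool contains_escape(const void *buf, size_t len)
{
    return memchr(buf, 0x1b, len) != NULL;
}
```

The two heuristics catch different things. Data with a 0 byte but no ESC passes this check, while colour codes in otherwise plain text fail it.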

------
mlgh
Now you can annoy curl users by putting a null byte at the beginning of the
document.

------
yaronn01
There are some valid use cases here; I hope this will not affect users of wopr
([https://github.com/yaronn/wopr](https://github.com/yaronn/wopr)).

------
x0
Is there going to be a ~/.curlrc option so I can turn this off? Else I suppose
I can add -o- to my curl alias.

~~~
dullgiulio
You could simply write a wrapper that reads an options file.

In general, it's a bad idea to have a hidden state such as a configuration
file. Imagine, for example, scripts that expect the standard curl behaviour,
but your configuration file changes that.

(Of course setting shell aliases has the same problem...)

~~~
slrz
Shell aliases do _not_ have that problem as they're generally used with
interactive shells only. Non-interactive shells tend to read a different set
of configuration files for that reason.

------
vacri
I'd prefer they stop curl from detecting a pipe on STDOUT and assuming that
the user must want download progress garbage. Why would I want it in a pipe
any more than I'd want it standalone? Inconsistent behaviour violates POLS
(the principle of least surprise).

~~~
teddyh
The GNU Coding Standards address this problem:

“ _please don’t make the behavior of a command-line program depend on the type
of output device it gets as standard output or standard input._

[…]

 _There is an exception for programs whose output in certain cases is binary
data. Sending such output to a terminal is useless and can cause trouble. If
such a program normally sends its output to stdout, it should detect, in these
cases, when the output is a terminal and give an error message instead. The -f
option should override this exception, thus permitting the output to go to the
terminal._ ”¹

1\. [https://www.gnu.org/prep/standards/standards.html#User-Inter...](https://www.gnu.org/prep/standards/standards.html#User-Interfaces)

~~~
vacri
I don't know about the version of curl you use, but mine doesn't give download
progress in binary format, and doesn't contain any content of the target
you're curling.

        $ curl news.ycombinator.com > /dev/null
          % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                         Dload  Upload   Total   Spent    Left  Speed
        100   178    0   178    0     0    369      0 --:--:-- --:--:-- --:--:--   370

You have to add an extra arg to get rid of that progress meter.

~~~
joombaga
The behavior doesn't exactly switch on pipe detection though. The progress bar
consistently shows when outputting to a file.

      $ curl news.ycombinator.com -o /dev/null
        % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                       Dload  Upload   Total   Spent    Left  Speed
      100   178    0   178    0     0    948      0 --:--:-- --:--:-- --:--:--   951

I mean, it's effectively doing tty detection, but I think it's fair to argue a
semantic difference, and it's very similar to how ls strips colors and
switches off its multi-column listing format when piped, which is mentioned in
the next paragraph of the coding standards.

 _Compatibility requires certain programs to depend on the type of output
device. It would be disastrous if ls or sh did not do so in the way all users
expect. In some of these cases, we supplement the program with a preferred
alternate version that does not depend on the output device type. For example,
we provide a dir program much like ls except that its default output format is
always multi-column format._

------
est
should have used `-O -` so the option is consistent with wget

~~~
heinrich5991
`-O` is already used. `curl -O <URL>` is roughly equivalent to `wget <URL>`.

------
dstroot
the fact that cURL is hosted on haxx.se always throws me. It makes me
immediately associate haxx === hacking and I feel uncomfortable. This is
honestly why I use other tools instead even though it's more an unconscious
decision than a conscious one.

~~~
kam
Yet you post on a site called Hacker News.

~~~
dstroot
Ummm. You nailed me there.

~~~
lathiat
And people wonder why integrity of the "news" media is important xD

