
Unexpected places you can and can’t use null bytes - ammmir
https://eklitzke.org/unexpected-places-you-can-and-cant-use-null-bytes
======
kstenerud
It seems a bit odd to call it "unexpected" when any C API that accepts a char*
and doesn't include a length parameter is commonly understood to expect a
null-terminated string, and any API that has a length parameter likely won't
have this restriction.
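Here is a minimal C sketch of that contract. It uses the standard `fputs` (no length, so it stops at the first NUL) and `fwrite` (explicit length, so every byte goes out), and measures bytes written with `ftell` on a `tmpfile()`. The helper names and the `sample` buffer are made up for illustration.

```c
#include <stdio.h>

/* Five bytes with a NUL in the middle: 'a' 'b' '\0' 'c' 'd'. */
static const char sample[] = {'a', 'b', '\0', 'c', 'd'};

/* No length parameter: fputs() writes up to the first NUL only. */
static long bytes_via_fputs(FILE *f, const char *s) {
    long start = ftell(f);
    fputs(s, f);
    return ftell(f) - start;
}

/* Length parameter: fwrite() writes exactly n bytes, NULs included. */
static long bytes_via_fwrite(FILE *f, const char *s, size_t n) {
    long start = ftell(f);
    fwrite(s, 1, n, f);
    return ftell(f) - start;
}
```

With `sample`, the first helper reports 2 bytes and the second reports 5.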

~~~
ferzul
you missed the point. it's not that this specific api call can't take embedded
\0; it's that there is no alternative api that allows embedded null. there's
no way to open a file by using a string containing \0, no matter what api you
pick, but you can write data to a file which contains null if you pick the
right function. there's no a priori way to know which apis have this
duplication and which don't.
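A POSIX C sketch of that asymmetry follows. It assumes a Unix-like system with a writable /tmp.
- `openat()` receives the path as a plain C string, so the name `"report\0.txt"` silently becomes `report`.
- `write()` takes a byte count and stores the NULs.

The function name and scratch-directory layout are illustrative only.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>     /* open, openat, O_* flags */
#include <stdlib.h>    /* mkdtemp */
#include <sys/stat.h>  /* fstatat, struct stat */
#include <unistd.h>    /* write, close, unlinkat, rmdir */

/* Returns the size of the file that ends up named "report", or -1 on error. */
static long nul_in_path_demo(void) {
    char dir[] = "/tmp/nul-demo-XXXXXX";
    if (mkdtemp(dir) == NULL) return -1;
    int dfd = open(dir, O_RDONLY);
    if (dfd < 0) return -1;

    /* The path has no length parameter: openat() only sees "report". */
    int fd = openat(dfd, "report\0.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { close(dfd); rmdir(dir); return -1; }

    /* write() takes a length, so the NUL bytes in the payload are stored. */
    static const char payload[] = {'x', '\0', 'y', '\0'};
    ssize_t n = write(fd, payload, sizeof payload);
    close(fd);

    struct stat st;
    long size = (n == (ssize_t)sizeof payload &&
                 fstatat(dfd, "report", &st, 0) == 0) ? (long)st.st_size : -1;

    unlinkat(dfd, "report", 0);   /* clean up the scratch directory */
    close(dfd);
    rmdir(dir);
    return size;
}
```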

~~~
kstenerud
I don't follow what you're getting at. If there's no length field, it's going
to stop scanning on the first null. If there is a length field, it will keep
reading until the length is reached. That's the general contract with C APIs.

If you actually NEED strings with nulls in them (although I couldn't think of
a reason why), you'll need to use/find/create APIs with length fields.

~~~
taneq
> If there is a length field, it will keep reading until the length is
> reached.

Well, it _might_. If it ever uses that string internally with some function
that expects a null termination then it'll probably still get truncated.
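Here is a hypothetical C example of that failure mode. `blob_from` (an invented name) advertises a length parameter and copies every byte. It then recomputes the length with `strlen()`, so anything after an embedded NUL is effectively lost.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical length-carrying buffer type. */
typedef struct {
    char  *data;
    size_t len;
} blob;

/* Hypothetical API: the signature takes an explicit length... */
static blob blob_from(const char *buf, size_t len) {
    blob b = { malloc(len + 1), 0 };
    if (b.data != NULL) {
        memcpy(b.data, buf, len);   /* all len bytes are copied */
        b.data[len] = '\0';
        /* ...but an internal step trusts strlen(), which stops at the
           first NUL, so the recorded length is silently truncated. */
        b.len = strlen(b.data);
    }
    return b;
}
```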

~~~
kstenerud
Possibly, depending on the API. But then again, strings should not have
embedded NUL characters. There's no good reason for it. You have 32 other
control characters to choose from.
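The count of 32 holds for ASCII. There are 33 control codes (0 through 31, plus DEL at 127), which leaves 32 once NUL is set aside. Below is a quick C check, assuming the program runs in the default "C" locale, where `iscntrl()` matches exactly those codes. The function name is made up.

```c
#include <ctype.h>

/* Counts ASCII control characters other than NUL.
   In the "C" locale, iscntrl() is true for 0-31 and 127 (DEL). */
static int count_non_nul_controls(void) {
    int count = 0;
    for (int c = 1; c < 128; c++)
        if (iscntrl(c))
            count++;
    return count;
}
```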

------
cpeterso
Pedantic nit: the ASCII '\0' character is called NUL, not null or NULL. C
strings are NUL-terminated or zero-terminated, not null-terminated.

[https://en.wikipedia.org/wiki/ASCII](https://en.wikipedia.org/wiki/ASCII)

~~~
jfk13
IMO, you're being a bit _too_ pedantic here; it's perfectly reasonable to
refer to a "null byte" or "null character".
[https://en.wikipedia.org/wiki/Null_character](https://en.wikipedia.org/wiki/Null_character)

It's true that "NUL" is the usual abbreviation for this value in character
code charts/standards.

~~~
DonHopkins
I am disappointed this URL does not redirect to that page:

[https://en.wikipedia.org/wiki/%00](https://en.wikipedia.org/wiki/%00)

But what the hell does it do??? In Safari and Firefox, I get an nginx 400 Bad
Request page from en.wikipedia.org. But in Chrome, it seems to be redirecting
to a google search for the same url, when I type it into the address bar.
Well, that's meta. Chrome won't even let me drag-and-drop that icky %00
terminated url from one page into another page to navigate there -- it angrily
rejects it and sadly animates the evil url back to where it came from (though
dragging it into an existing or new tab mysteriously works). But actually
clicking on that link immediately goes to a blank purgatory page with the url
"about:blank#blocked". Those are Chrome's stories, and it's sticking with
them.

At least this works:

[https://en.wikipedia.org/wiki/%01](https://en.wikipedia.org/wiki/%01)

>Special Page

>Bad title

>The requested page title contains an invalid UTF-8 sequence.

>Return to Main Page.

Oh, yeah -- PHP:

[https://webmasters.stackexchange.com/questions/84008/url-encoded-query-string-with-embedded-null-00-breaks-on-some-servers](https://webmasters.stackexchange.com/questions/84008/url-encoded-query-string-with-embedded-null-00-breaks-on-some-servers)
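One plausible reason servers choke on `%00`: percent-decoding it yields a literal 0 byte. Any code downstream that treats the decoded path as a C string then sees it end right there. Below is a minimal C decoder sketch. The function names are invented, and real servers differ in how they validate.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decodes %XX escapes from in into out, which must hold strlen(in) + 1
   bytes. Malformed escapes are copied through unchanged. Returns the
   decoded length; "%00" becomes a real 0 byte inside that length. */
static size_t percent_decode(const char *in, char *out) {
    size_t n = 0;
    while (*in != '\0') {
        int hi, lo;
        if (in[0] == '%' && (hi = hex_value(in[1])) >= 0 &&
            (lo = hex_value(in[2])) >= 0) {
            out[n++] = (char)(hi * 16 + lo);
            in += 3;
        } else {
            out[n++] = *in++;
        }
    }
    out[n] = '\0';
    return n;
}
```

Decoding `/wiki/%00x` gives 8 bytes, but a C string view of the result is just `/wiki/`.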

If not the actual NUL character, then at least the Unicode Symbol for NUL
redirects to the page on the Null character.

[https://en.wikipedia.org/wiki/␀](https://en.wikipedia.org/wiki/␀)

>Null character

>From Wikipedia, the free encyclopedia

>(Redirected from ␀)

>For other uses, see Null symbol.

...But then again, shouldn't the Null symbol ␀ redirect to the page on the
Null symbol, which it actually is, not the page on the Null character, which
it only symbolizes?

[https://en.wikipedia.org/wiki/Null_symbol](https://en.wikipedia.org/wiki/Null_symbol)

~~~
missblit
> But what the hell does it do??? In Safari and Firefox, I get an nginx 400
> Bad Request page from en.wikipedia.org. But in Chrome, it seems to be
> redirecting to a google search for the same url, when I type it into the
> address bar.

Firefox makes a GET request to
"[https://en.wikipedia.org/wiki/%00](https://en.wikipedia.org/wiki/%00)", and
the server returns an HTTP/2 400 Bad Request, presumably because the web
server considered the URL invalid.

Chrome decides the string isn't a valid URL up front. So it does what it
normally does when you enter random junk in the address bar; it searches for
it.

The dirty secret of URLs is that no one can quite agree on which ones are
valid or how they should be canonicalized.

We can take WHATWG's spec as a modern way to handle URLs [1]. If I'm reading
it right (50/50 chance!) the URL would be considered valid by that spec.

See also this article from the developer of curl:
[https://daniel.haxx.se/blog/2016/05/11/my-url-isnt-your-url/](https://daniel.haxx.se/blog/2016/05/11/my-url-isnt-your-url/)

[1] [https://url.spec.whatwg.org/](https://url.spec.whatwg.org/)

~~~
kccqzy
> The dirty secret of URLs is that no one can quite agree on which ones are
> valid or how they should be canonicalized.

Fun story. An engineer was working to migrate an old system from Python 2 to 3
before the Python 2 EOL deadline. The engineer decided to use the str type to
represent URLs. Chaos ensued when non-UTF-8 URLs suddenly stopped working.
Turns out back when that system was designed, people were directly URL-
encoding binary data into URLs.

------
nneonneo
Unexpected places where you _can_ use null bytes: gets, fgets and scanf("%s").
All three will read and store null bytes into your string from the input, and
keep going: gets and fgets stop only at a newline or end of file (fgets also
stops when the buffer is full), and scanf stops only at whitespace (which
doesn't include the null byte).
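Here is a small C check of that behaviour, assuming a hosted environment where `tmpfile()` works. After `fgets()` reads `ab\0cd\n`, all six bytes plus the terminator are in the buffer. Yet `strlen()` on the result would report 2. The helper name is invented for the example.

```c
#include <stdio.h>

/* Feeds n raw bytes to fgets() through a temporary file and reads one
   line into out. Returns out, or NULL if nothing could be read. */
static char *fgets_from_bytes(const char *bytes, size_t n, char *out, int size) {
    FILE *f = tmpfile();
    if (f == NULL) return NULL;
    fwrite(bytes, 1, n, f);
    rewind(f);
    char *line = fgets(out, size, f);  /* does not stop at the NUL byte */
    fclose(f);
    return line;
}
```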

gets and scanf("%s") are also horrifically unsafe. gets is well-known to be
unsafe (to the point where you'll almost certainly get a compiler warning for
using it). However, scanf("%s") is unsafe for exactly the same reason (no
bound on the buffer length) yet will not produce a compiler warning. Add to
that the fact that these functions will accept null bytes, and you have a very
dangerous buffer overflow waiting to happen.
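The usual mitigation is a field width on `%s`. Below is a minimal sketch using `sscanf()` so it can be fed a string; the width works the same way with `scanf()` and `fscanf()`. The buffer size of 16 is an arbitrary choice for illustration. Note that the width only bounds the write: it does nothing about embedded NULs in stream input.

```c
#include <stdio.h>

enum { WORD_CAP = 16 };  /* illustrative buffer size */

/* "%s" alone has no bound. A field width one less than the buffer size
   (leaving room for the terminator) caps how much sscanf() stores. */
static int read_word_bounded(const char *input, char word[WORD_CAP]) {
    return sscanf(input, "%15s", word);  /* 15 == WORD_CAP - 1 */
}
```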

~~~
fao_
This is why you _always_ write:

    if (*s && *s != '\n' ...)

and never:

    if (*s != '\n' ...)
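Here is a minimal C sketch of the safe form. The hypothetical `line_length()` stops at either the newline or the terminating NUL. A line with no newline (the last line of a file, or one cut short by an embedded NUL) therefore can't walk it off the end of the buffer.

```c
#include <stddef.h>

/* Length of the line at s: stops at '\n' or at the terminating NUL,
   whichever comes first. */
static size_t line_length(const char *s) {
    const char *p = s;
    while (*p && *p != '\n')   /* check *p first so we never pass the NUL */
        p++;
    return (size_t)(p - s);
}
```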

------
msarnoff
One unexpected place where null bytes are acceptable: Wi-Fi SSIDs. That’s one
way to keep people off your network, I suppose.

------
ChrisSD
> While we’re on the topic, it’s worth noting that the only other restriction
> on filenames is that they cannot contain a /, which is the character
> used to denote directories. Filenames can contain arbitrary other binary
> data, including spaces and newlines, and there’s no defined character
> encoding.

I've seen this _byte_ people when junk gets written to a filename (either
accidentally or maliciously), especially in shells but also in other
programming languages. Issues that aren't always handled well include file
names that:

* include a newline or some other control characters

* start with a `-`

* aren't valid UTF-8
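Here is a C sketch of a checker for the three hazards listed; the names and flag values are invented. It treats bytes below 0x20 and 0x7F as control characters. It also applies the standard UTF-8 well-formedness rules: no overlong forms, no surrogates, nothing above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical hazard flags for a file name. */
enum {
    NAME_OK           = 0,
    NAME_CONTROL_CHAR = 1 << 0,  /* byte < 0x20 or 0x7F, e.g. '\n' */
    NAME_LEADING_DASH = 1 << 1,  /* may be parsed as an option by tools */
    NAME_NOT_UTF8     = 1 << 2,  /* not well-formed UTF-8 */
};

/* Strict UTF-8 check: rejects overlong forms, surrogates and values
   above U+10FFFF. */
static bool is_valid_utf8(const unsigned char *s, size_t n) {
    size_t i = 0;
    while (i < n) {
        unsigned char c = s[i];
        size_t len;
        uint32_t cp;
        if (c < 0x80) { i++; continue; }
        else if (c >= 0xC2 && c <= 0xDF) { len = 2; cp = c & 0x1F; }
        else if (c >= 0xE0 && c <= 0xEF) { len = 3; cp = c & 0x0F; }
        else if (c >= 0xF0 && c <= 0xF4) { len = 4; cp = c & 0x07; }
        else return false;              /* stray continuation or bad lead byte */
        if (n - i < len) return false;  /* truncated sequence */
        for (size_t k = 1; k < len; k++) {
            if ((s[i + k] & 0xC0) != 0x80) return false;
            cp = (cp << 6) | (s[i + k] & 0x3F);
        }
        if (len == 3 && (cp < 0x800 || (cp >= 0xD800 && cp <= 0xDFFF))) return false;
        if (len == 4 && (cp < 0x10000 || cp > 0x10FFFF)) return false;
        i += len;
    }
    return true;
}

/* Returns a bitmask of NAME_* flags for one path component. */
static int filename_hazards(const char *name) {
    size_t n = strlen(name);
    int flags = NAME_OK;
    if (name[0] == '-') flags |= NAME_LEADING_DASH;
    for (size_t i = 0; i < n; i++) {
        unsigned char c = (unsigned char)name[i];
        if (c < 0x20 || c == 0x7F) flags |= NAME_CONTROL_CHAR;
    }
    if (!is_valid_utf8((const unsigned char *)name, n)) flags |= NAME_NOT_UTF8;
    return flags;
}
```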

~~~
heavenlyblue
I recently accidentally created a folder named “~”. Then I tried deleting it
through the shell.

~~~
smichel17
I did something similar once. I had a script where I misquoted and ended up
with a directory whose name started with ~.

It was the only directory starting that way, so I typed "rm -rf ~<TAB><ENTER>"

Hit control-C a half second later, but the damage was done. Fortunately most
of my important files are backed up.

Lesson: when deleting files with tricky names, write the command without flags
first, then add "-rf" after the path is confirmed correct.

~~~
heavenlyblue
I got lucky with some directory called .asound or similar. It was the first
one in the home directory, and the deletion didn't get beyond it.

------
cdcarter
Neat article, but the author doesn't provide an actual reason you'd _need_ to
pass a NUL byte into something like a socket address, or command arg.

Is there a common (perhaps obvious, perhaps not) use for passing NUL byte
literals around, other than terminating strings? Or is it just terrible
ye-olde file formats?

~~~
singron
All kinds of binary data might have a NUL byte. E.g. if you want to write a
NUL byte to a file in the shell, you might try something like

    echo -n $'\0' > nul

This doesn't work for the reason stated in the article: the argument is
interpreted as a string terminated at the NUL byte, so the file will be
empty. BTW you can get around this with printf since it processes escape
sequences internally.

    printf '\0' > nul
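The underlying reason is that each command-line argument reaches a program as a NUL-terminated C string. An argument can't carry a 0 byte, but it can carry the two characters `\` and `0`. Below is a simplified C sketch of the printf(1)-style workaround, with an invented function name. Real printf(1) treats `\0` as the start of an octal escape of up to three digits. This version handles only a bare `\0` and `\\`.

```c
#include <stddef.h>

/* Turns the two-character sequence "\0" into a real 0 byte and "\\" into
   one backslash; other bytes are copied as-is. out needs strlen(in) bytes.
   Returns the number of bytes produced, which the caller can pass to
   fwrite() since the output may contain NULs. */
static size_t expand_nul_escapes(const char *in, char *out) {
    size_t n = 0;
    while (*in) {
        if (in[0] == '\\' && in[1] == '0') {
            out[n++] = '\0';
            in += 2;
        } else if (in[0] == '\\' && in[1] == '\\') {
            out[n++] = '\\';
            in += 2;
        } else {
            out[n++] = *in++;
        }
    }
    return n;
}
```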

~~~
nneonneo
There's also `echo -ne '\0'`, which works similarly (it tells echo to
interpret the escape sequences).

------
fwsgonzo
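You can printf a null byte just fine with %c, but giving %s a length doesn't
work the way it does with fwrite. This still stops at the first null byte,
because the precision only caps the output:

printf("%.*s", (int) nbytes, str);

For data that may contain nulls, use fwrite(str, 1, nbytes, stdout) instead.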
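Here is a C check of how many bytes each approach actually emits for a 5-byte buffer with a NUL in the middle, following the C standard's rules for `%s` and `%c`. The helper and enum names are invented, and the counts are measured with `ftell()` on a `tmpfile()`.

```c
#include <stdio.h>

/* Five bytes with a NUL in the middle. */
static const char with_nul[] = {'a', 'b', '\0', 'c', 'd'};

enum emit_method { VIA_PRECISION_S, VIA_PERCENT_C, VIA_FWRITE };

/* Returns how many bytes one call writes, or -1 if no temporary file
   is available. */
static long bytes_emitted(enum emit_method m) {
    FILE *f = tmpfile();
    if (f == NULL) return -1;
    switch (m) {
    case VIA_PRECISION_S:  /* precision caps output; %s still stops at the NUL */
        fprintf(f, "%.*s", (int)sizeof with_nul, with_nul);
        break;
    case VIA_PERCENT_C:    /* %c writes the 0 byte itself */
        fprintf(f, "%c", '\0');
        break;
    case VIA_FWRITE:       /* explicit length: every byte is written */
        fwrite(with_nul, 1, sizeof with_nul, f);
        break;
    }
    long n = ftell(f);
    fclose(f);
    return n;
}
```

Expect 2, 1 and 5 bytes respectively.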

------
skrebbel
PostgreSQL TEXT fields can't store null bytes; use BYTEA for binary data.

