
Pedantic nit: the ASCII '\0' character is called NUL, not null or NULL. C strings are NUL-terminated or zero-terminated, not null-terminated.

https://en.wikipedia.org/wiki/ASCII






IMO, you're being a bit too pedantic here; it's perfectly reasonable to refer to a "null byte" or "null character". https://en.wikipedia.org/wiki/Null_character

It's true that "NUL" is the usual abbreviation for this value in character code charts/standards.


I am disappointed this URL does not redirect to that page:

https://en.wikipedia.org/wiki/%00

But what the hell does it do??? In Safari and Firefox, I get an nginx 400 Bad Request page from en.wikipedia.org. But in Chrome, when I type it into the address bar, it seems to redirect to a Google search for the same URL. Well, that's meta. Chrome won't even let me drag and drop that icky %00-terminated URL from one page into another to navigate there -- it angrily rejects it and sadly animates the evil URL back to where it came from (though dragging it into an existing or new tab mysteriously works). But actually clicking on that link immediately goes to a blank purgatory page with the URL "about:blank#blocked". Those are Chrome's stories, and it's sticking with them.

At least this works:

https://en.wikipedia.org/wiki/%01

>Special Page

>Bad title

>The requested page title contains an invalid UTF-8 sequence.

>Return to Main Page.

Oh, yeah -- PHP:

https://webmasters.stackexchange.com/questions/84008/url-enc...

If not the actual NUL character, then at least the Unicode Symbol for NUL redirects to the page on the Null character.

https://en.wikipedia.org/wiki/␀

>Null character

>From Wikipedia, the free encyclopedia

>(Redirected from ␀)

>For other uses, see Null symbol.

...But then again, shouldn't the Null symbol ␀ redirect to the page on the Null symbol, which it actually is, not the page on the Null character, which it only symbolizes?

https://en.wikipedia.org/wiki/Null_symbol


> But what the hell does it do??? In Safari and Firefox, I get an nginx 400 Bad Request page from en.wikipedia.org. But in Chrome, it seems to be redirecting to a google search for the same url, when I type it into the address bar.

Firefox makes a GET request to "https://en.wikipedia.org/wiki/%00", and the server returns an HTTP/2 400 Bad Request, presumably because the web server considered the URL invalid.

Chrome decides the string isn't a valid URL up front. So it does what it normally does when you enter random junk in the address bar; it searches for it.

The dirty secret of URLs is that no one can quite agree on which ones are valid or how they should be canonicalized.

We can take WHATWG's spec as a modern way to handle URLs [1]. If I'm reading it right (50/50 chance!) the URL would be considered valid by that spec.

See also this article from the developer of curl: https://daniel.haxx.se/blog/2016/05/11/my-url-isnt-your-url/

[1] https://url.spec.whatwg.org/


> The dirty secret of URLs is that no one can quite agree on which ones are valid or how they should be canonicalized.

Fun story. An engineer was working to migrate an old system from Python 2 to 3 before the Python 2 EOL deadline. The engineer decided to use the str type to represent URLs. Chaos ensued when suddenly non-UTF-8 URLs don't work any more. Turns out back when that system was designed, people were directly URL-encoding binary data into URLs.


> ...But then again, shouldn't the Null symbol ␀ redirect to the page on the Null symbol, which it actually is, not the page on the Null character, which it only symbolizes?

Perhaps it should. That page actually did not exist when the redirect was created:

https://en.wikipedia.org/w/index.php?title=␀&action=history

https://en.wikipedia.org/w/index.php?title=Null_symbol&actio...


NUL is explicitly 7 bits. Any other null could be larger than a machine byte.

Even your link specifically says the name of '\0' is Null; NUL is merely an abbreviation.

See the table here https://en.wikipedia.org/wiki/ASCII#Control_characters


Honest question (and not being rude): is that strictly just being pedantic about it? Is the concept of NUL vs "NULL" semantically the same, so you are just pointing out the ASCII abbreviation being used (three characters, consistent with the spec)? Genuinely curious whether the concept of ASCII NUL '\0' can be interpreted differently in other contexts (other languages, maybe)?

[edit, to clarify the question]

Like for example, the ASCII code '\0' is still a valid "thing" (it's a byte sequence, kind of). But in other contexts (other languages maybe), NULL is not a "thing" per se; it's more of a non-thing. How does a C programmer see the difference between NUL and NULL?


The two things referred to by the terms "NUL byte" and "null pointer" (aka "NULL") are quite distinct, and mixing them up is not advisable, but calling the NUL byte a NULL byte, where it's clear from the context what is meant, is not that confusing.

In the context of C, where NUL characters are most likely to be discussed, "NUL byte" and "null pointer" refer to exactly the same thing, that thing being the integer value 0.

Only if pointers are 8 bits.

More specifically, "NUL byte" is the character that has value 0, whereas "null pointer" is the pointer that has value 0.

OK, I'll try to stop being the pedant. I'll let the real pedants poke holes in what I said...


> More specifically, "NUL byte" is the character that has value 0, whereas "null pointer" is the pointer that has value 0.

Where is this enforced? You can assign between notional types without a problem. Whatever the context, providing 0 will have the same effect no matter how you labeled it.


Enforced? Not, as I think you're pointing out, by prohibiting conversion. That is, if

  char c = '\0';
  void* p = (void*)0;
  int a = (int)c;
  int b = (int)p;
then

  a == b
resolves to true. In that sense, they are the same - which I believe is your point. On that point, you are completely correct.

The difference is that I can't do

  *c
In that sense, they are not the same - not in the sense of numeric value, but in the sense of type. Also, c is 8 bits, and p (at least these days) is 32 or 64. (I pity anyone who ever had to work in an environment where p was 8 bits!) So they are the same numerically, but they are different both in type and in memory footprint.

a == p will also resolve to true. So will c == p. a and b don't add anything to your example; they're just in there to make c and p look like they're more different than they are.

    $ cat zero.c
    #include "stdio.h"

    int main(int argc, char* argv[]) {
     char nul = 0;
     void* null = 0;

     if( nul == null ) {
      printf("compared char to pointer; they are the same\n");
     } else {
      printf("found a difference between char and pointer\n");
     }
     return 0;
    }

    $ gcc -o zero zero.c
    zero.c: In function ‘main’:
    zero.c:7:10: warning: comparison between pointer and integer
      if( nul == null ) {
              ^~
    $ ./zero
    compared char to pointer; they are the same
    $
You get a warning, but not an error, for making the comparison. By contrast, assigning the integer zero to a void* isn't even a warning -- it's just a natural thing to do. There isn't another way to set a pointer to NULL. There is another way to set a character to 0, the '\0' syntax, but that's not a warning either.

C will think nothing of adding '!' to 'P' and getting 'q'. That's not strange because addition is a pretty normal thing to do with integers. You're right that a char variable should only occupy 8 bits of memory, but that's an implementation artifact, not a theory of what the value '\0' means. That value is unambiguously the integer zero with infinite precision. The reason it only occupies 8 bits is that you can't let it have infinite bits.


Hmm. Probably true. What about c == p? I don't recall whether both will promote to numeric, or whether it's a type error.

For that matter, what does (uncasted) p = c do? How about c = p?

[Edit: You updated while I was writing; my first question you already answered. Warning, not error.]


    $ cat pointers.c 
    #include "stdio.h"

    int main(int argc, char* argv[]) {
     char Z = 'Z';
     char q = 'q';
     void* null = 0;

     printf("Z is \\x%02x\n", Z);
     printf("But if it were a pointer, it would be %08x\n", Z);
     printf("Watch this:\n\n");

     null = Z;

     /* %p to print a pointer value */
     printf("Our void* is now: %p\n", null);

     q = null;

     printf("And q is: %c\n", q);

     return 0;
    }
    $ gcc -o pointers pointers.c
    pointers.c: In function ‘main’:
    pointers.c:12:7: warning: assignment makes pointer from integer without a cast [-Wint-conversion]
      null = Z;
           ^
    pointers.c:17:4: warning: assignment makes integer from pointer without a cast [-Wint-conversion]
      q = null;
        ^
    $ ./pointers
    Z is \x5a
    But if it were a pointer, it would be 0000005a
    Watch this:

    Our void* is now:     0x5a
    And q is: Z
    $
The assignments are warnings. They work just like you'd expect them to work.

Notice all the different printf flags? This is why you need them.


The literal 0 in a pointer context will be converted to a NULL pointer, which can be a non-zero bit pattern (there are some systems where the actual NULL pointer isn't all zeros). Going through a variable might not do what you think. So this:

    char *p = 0;
is fine, but

    intptr_t a = 0;
    char *p = (char *)a;
might not do what you expect (set p to the NULL pointer).

That's not accurate. A NUL byte is a byte with an all-zeros bit pattern, whereas a null pointer is a special pointer value (and thus pointer-sized) that can be coerced from the integer literal 0, but not in general from an arbitrary integer with value zero; what's more, a null pointer value is not guaranteed to have an all-zero bit pattern!

Fine as long as you call the backspace character BS and the tab character HT.

But then you have to call vertical tab VT, and then you have no name for a virtual teletypewriter.

Better let Wikipedia know that then! https://en.wikipedia.org/wiki/Null-terminated_string

I used to be pedantic about this one, now I just use 'null' because that's what everyone else uses and so they'll know what I'm talking about. If I really want to be picky I'll use '\0'.

I agree, and I think it's an important distinction as sizeof('\0') is not the same as sizeof((void*)0).

However, the standard does talk about null characters throughout as being a character code of value zero, as opposed to NULL the macro.


So the BEL character rings a bel on the user's terminal?

ACK!


