
> The best example is an app that dereferences null. Good developers see it right away and answer “segfault”.

And more pedantic developers state that it is "undefined behavior" and anything can happen.



Interesting go-to response!

"So what does this code do?"

"Undefined behavior"

"Well, actually, it just prints out 'Hello World'"

"How can you be sure? Will the universe exist at that point? Will your computer not spontaneously combust? Will not a cosmic ray strike and flip the bits just-so, to make it print out 'Hello Girls'?"


You can define this behavior if you want to. I did that on an embedded system where location zero was mapped and it was a complete pain in the fundament to arrange things so it would generate a fault. So I put on my wizard hat, intoned "Indirecting through NULL yields the value <some-reset-vector-or-something>, so mote it be!" and sure enough, a few weeks later one of the Q/A folks wrote a test that indirected through zero and filed a bug when it didn't explode.

"But it's supposed to generate an exception!"

"Says who?"

"Uh... it just does, right? It's in the CPU or something."

"Really? How do you think that works?" I took the opportunity to explain the mechanisms that were in play on various platforms, including ours, because the engineer in question had always treated indirection through zero as something just universally and magically fatal somehow, and there is no magic, just details.

It's still a darned good idea to keep the first 64K or 1MB or whatever of your address space unmapped (as well as similar guards at the top of your address space) because it catches interesting mistakes, but it's not like this stuff was handed down to us on stone tablets.


The issue is that it is a C program, and according to the C standard, dereferencing a null pointer such as (int *)0 is undefined behaviour. If you actually want to get at memory address 0 in a C program, you need to use some other mechanism or use a compiler that explicitly defines dereferencing NULL.

I believe the other mechanism could be as simple as pointer arithmetic, and not involve any compiler specific construct, but I would want to check the spec carefully before assuming that. Also, I would yell at whoever thought it was a good idea to store data at 0.


This doesn't remove UB, though. Your compiler can still assume that a null dereference never happens and generate code based on that assumption.


This sort of standards-purism goes too far, IMHO.

The map is not the territory: C compilers are real, concrete things that have real behavior regardless of whether that behavior is defined by an international treaty.

"What does the C standard require of an implementation when given this program" and "based on your experience, how might you expect a typical implementation to react to this?" are different questions, but both are interesting.

None of this implies, of course, that I think it's a good idea to write UB! But I think an ideal candidate would both know what UB is and also some of the ways it manifests in practice. If you don't know that code often segfaults on null dereference, you are going to have a very hard time debugging segfaults that you see in the real world.


The only correct answer would be "you dereference a null pointer". That is the only part of this equation the C language interacts with. That's the entire reason SIGSEGV is a signal to your process. Page faults or access violations manifest as interrupts on most systems. Some systems have no such restrictions at all. Even in C on x86, within ring 0 you should be able to access *((char *)0). I even made an example branch of an old kernel I wrote in high school to test this, and it does work [0].

What is unfortunate is that everyone is looking for a different kind of person. If I'm applying to a job, saying "segfault" might be the right answer OR it will be the wrong answer and the interviewer will leave thinking I don't know how this fundamental functionality (in some industries) works. I could also give the more correct answer and look like a know-it-all which, when I'm trying to sell myself based on what I know, is somehow a bad thing.

It's a loss no matter which way I play it.

I distinctly remember one interview where someone asked me to define a RESTful API for a chat service. So I naturally defined the different objects (ChatRoom, Message, User) and what CREATE, LIST and DELETE did on each of them. The interviewer was confused because CREATE and LIST are not HTTP methods. I explained that RESTful design isn't really coupled to HTTP, that these objects and the operations on them could be implemented over RPC or over HTTP/JSON, and I listed the steps for both approaches. He cut the interview short and I never heard from that company again.

[0] - https://github.com/gravypod/Simple-OS/commit/b7a608500b2e70d...


You make it sound like there is no right answer, but why couldn't you just explain to the interviewer everything you wrote in this post?

"Well, according to the C standard it is undefined behavior. In context X it performs like Y" and so on.


An ideal candidate would have been bitten by bugs where the segfault never happens because the offending code got optimized away. Once you've been bitten by that, it becomes hard to answer with just "segfault".


I'm not sure it even has to get optimized away. Take something like:

  Foo *ptr = NULL;
  int x = ptr->member;

That isn't necessarily going to access address 0, even without optimizations, despite the only pointer here being null. (It depends on the offset of "member": if that offset is large enough, the access might not land in the first page anymore. What's mapped at 0x1000 and later?)


I am not suggesting someone should answer with "segfault" and in fact I think that might be a worse answer than just "UB".

A good answer would be something like "UB according to the standard, but often segfault in real-world code on x86"...


How about “are you aware that this ‘segfault’ may be turned into RCE by a clever compiler plus a clever attacker”?

This may not be such a good interview question but, in most industries, C programmers should internalize this.


As far as I've seen, on systems with MMUs you generally get a page fault. The OS then handles that and delivers a segfault (SIGSEGV), which is what usually happens when you start poking at random addresses.

On an ARM Cortex, if you try to read a null pointer you get the top-of-stack address, at least on my machine/compiler. If you try to write, you get a bus fault, because flash memory is mapped at that address.

On an AVR, I think reading gives you the reset vector, and a write to address 0 is usually a no-op. I think with some magic you can write to that page of flash, though.

So yeah. Kinda depends.



