C Pointers Explained, Really (2012) (karwin.blogspot.com)
60 points by dizzystar on Nov 25, 2014 | 45 comments



Note that "%x" is not the correct way of displaying the value of a pointer with printf(), since "%x" assumes that it's printing an unsigned int and a pointer may be longer than an int (e.g., on a 64-bit architecture a pointer would be 8 bytes and an int would typically be 4 bytes). The correct format to use to print a pointer in hex is "%p", which knows how long a pointer is on the current system.

For more details on printf() formats, see:

https://en.wikipedia.org/wiki/Printf_format_string


Actually, %p can only be used to print values of type void *. Since there is no requirement that other pointer types be converted to void * when calling a varargs function, you need to cast:

    int x = 4711;
    printf("x is at %p\n", (void *) &x);
Of course the conversion will be a null operation on many modern systems, where all data pointer types are implemented identically, but this is what the language standard requires.
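
If you specifically want raw hex digits rather than the implementation-defined output of %p, a minimal sketch of an alternative (assuming uintptr_t is available, which the standard makes optional) is to go through <inttypes.h>:

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        int x = 4711;
        /* uintptr_t can hold any object pointer value;
           PRIxPTR is the matching hex conversion for it. */
        printf("x is at 0x%" PRIxPTR "\n", (uintptr_t)(void *)&x);
        return 0;
    }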


Interesting, I thought pointer representation could only be different for function pointers. Has there ever been an architecture where data pointers could have different representations?


> Interesting, I thought pointer representation could only be different for function pointers.

Yes, this is the case, e.g. in microcontrollers where the program is stored in EEPROM. You can store data in program memory (e.g. constant strings), but you need to access it in a special manner.
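
A minimal sketch of what that looks like, assuming an AVR-style part and avr-libc's <avr/pgmspace.h> (the commenter doesn't name a specific architecture):

    #include <avr/pgmspace.h>

    /* String placed in program memory (flash), not in RAM. */
    static const char msg[] PROGMEM = "hello";

    char first_byte(void)
    {
        /* A plain *msg would read from the RAM address space;
           program memory needs its own access primitive. */
        return pgm_read_byte(&msg[0]);
    }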

> Has there ever been an architecture where data pointers could have different representations?

At least conceptually, this was the case in early GPUs, where there were different memories for different access patterns: a memory area for constants, another for global read-only memory, a third for thread-specific read-write memory, and a fourth that was shared between groups of threads.

Whether they were actually different physical memories and whether they had different address spaces is another issue (they at least partially did, some stuff was on chip, some was external DRAM), but this was how it was conceptually seen by the programmer and the way early GPGPU APIs worked (early versions of CUDA and OpenCL).


Yes, different representations as well as non-zero null pointers. More from comp.lang.c FAQ at http://c-faq.com/null/machexamp.html.


Ah, very interesting, thanks!


When I was learning C I had heard about how terrifying pointers were and so I expected them to be super difficult to learn. Maybe, my experience is atypical, but I found them pretty intuitive.


I'm not sure why pointers have such a bad rep; they're a pretty simple concept of storing and using memory addresses.

References in dynamically typed languages, however, often make me nervous, as I'm never entirely sure when different types in different languages get copied or not. If the function argument p gets modified here, did the original variable in the caller get modified? Do I have to write an elaborate check for its mutability?


I think the difficulty is not so much in pointers themselves but rather the many undefined behaviour landmines that surround them.

Stuff like pointer aliasing, dangling pointers, pointer arithmetic (for instance, it's UB even to form a pointer more than one past the end of an object), NULL pointer dereference and things like that. Debugging those kinds of bugs can be super tricky and time consuming.

Not that pointers are the only source of UB in C, but they're probably some of the most easily encountered.

There's also the whole array-to-pointer decay thing that can be a bit tricky to handle in certain cases (I think the sizeof() of a parameter declared as an array is probably a surprise to every C coder the first time they encounter it).
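
A minimal sketch of that last surprise:

    #include <stdio.h>

    /* The parameter 'a' silently decays to 'int *a', so sizeof(a)
       is the size of a pointer, not of the caller's array. */
    void takes_array(int a[10])
    {
        printf("callee: sizeof(a)   = %zu\n", sizeof(a));   /* sizeof(int *)    */
    }

    int main(void)
    {
        int arr[10];
        printf("caller: sizeof(arr) = %zu\n", sizeof(arr)); /* 10 * sizeof(int) */
        takes_array(arr);
        return 0;
    }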


I have a similar experience but I have to admit that pointers became reasonably trivial once our teacher told us to use pen and paper (or a digital equivalent) to schematically draw how these pointers work, much like this article does. The difficulty with pointers is, for me, keeping track of them in your head, especially when you work with arrays. Otherwise it's actually pretty straightforward!


> The difficulty with pointers is, for me, keeping track of them in your head, especially when you work with arrays.

I think this is mostly an artifact of C's type system. I wish pointers were taught in a language with similar runtime semantics as C with a more expressive type system. Drawing it out helps a ton, but the way C forces you to embed that structure into code doesn't help.


Agreed, and in the article, when he gives advice to his friend, he mentions he has a leg up due to his ASM background. I first learned ASM on a Commodore 64 and so I had a very close relationship with the hardware, such as indexed addressing and indirect indexed addressing, etc. By the time I learned C some 10 years later, I already knew what authors were talking about on the subject of pointers. Over the years, I wondered how I would have learned this stuff if I didn't have the ASM (assembly language) background and a knack for electronics.

Nowadays, things are more complicated with MMUs and virtual memory concepts, but I think the examples in the article are a good basis for anyone wanting to know what happens under the hood.


Yep. Once you understand how memory works, it's very straightforward how pointers work (the value of a pointer is actually the address of another "unit" of memory).


True. Never had much problem with pointers in my (limited I admit) C experience.

But I did have a problem with remembering to clean up memory properly.


When used within the context of pointers, it helps to think of '*' as the 'at address' operator and '&' as the 'address of' operator. Especially when reading code.
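
A tiny illustration of that reading:

    int  x = 42;
    int *p = &x;   /* &x : "address of" x                      */
    int  y = *p;   /* *p : the value "at address" p, i.e. 42   */
    *p = 7;        /* store 7 "at address" p, so x is now 7    */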


A shame they didn't use "@" for the dereference operator...


Wouldn't that also have made parsing simpler?


Used to think of '*' as 'value of' or 'value at' using the same logic :)


Just came to say I think it is impressive this guy has an email from 1990 backed up.


Is this really from 2012? Examples are in 1978 "K&R1" C:

  func(p)
  int *p;
  {
    body ...
  }
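
For reference, the modern prototype-style equivalent (standard since C89) would be something like:

  int func(int *p)   /* the K&R form's implicit int return made explicit */
  {
    /* body ... */
  }
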
No, memory is not one big array, sorry. Objects have boundaries even though there are valid addresses below their base and above their last byte. Not everything that looks sensible from a machine language point of view is well-defined.

Comment below article:

> Marin Todinov said...

> Dear Programming Guru

> You are an absolute legend, ive been programming for 4 years and i have a masters in computer science, your explanation of pointers has helped me increase my efficiency in recursive functions and made a map in my breain of how these basic fundamental structers.

Lol, what the heck? Troll or astro-turfed? Where can you get a CS Master's Degree on only four years of programming?


> No, memory is not one big array, sorry.

To the extent the operations are well-defined, both the compiler and the OS conspire to make this look true; if there's no OS, the compiler typically works harder to make it work, because the alternative would make programs too difficult to port to or away from the system in question.

The "big array of bytes, each with its own unique address" mental model is a useful lie which most programmers who know better don't pry into most of the time. Going beyond that would involve knowing about system-specific things that C is explicitly designed to abstract away, to make programs more portable.

So, no, the struct hack isn't valid C, but you can make huge arrays and, within those arrays, simple increments and decrements do work reliably, because the standard says so and the OS and the compiler will together contain enough code to make it work if they're any good at all.
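
A minimal sketch of the kind of arithmetic that is guaranteed to work (the one-past-the-end pointer may be formed and compared, but not dereferenced):

    #include <stdio.h>

    int main(void)
    {
        int a[8];
        int *end = a + 8;                /* one past the end: legal to form and compare  */

        for (int *p = a; p != end; p++)  /* increments within the array are well-defined */
            *p = (int)(p - a);

        printf("%d\n", a[3]);            /* prints 3 */
        return 0;
    }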


As a CS undergrad having only really learned the "everything is a big array" style, can you provide more materials on the reality of memory in a system? I'm very curious how that works. What would you recommend I search for to learn more about this topic?


There are a few things you should know: first, the actual physical memory of your computer is pretty much like a big array: if you were writing code in assembly and ran it with no operating system, that's exactly what you'd get.

However, with operating systems and multiple programs running at the same time, memory is no longer contiguous: instead, programs can request "pages" (blocks of memory). This is (more or less) what `malloc` does under the hood. That's the key difference: in a modern operating system, you can't expect memory to be one big array, since your program might have requested more than one page of memory. In that sense, it's more like a collection of smaller arrays.

We have to do it this way so we can have memory protection (similar to file permissions: a program can decide whether other programs can read one of its pages, write to it, etc.) and swapping (i.e. writing unused pages to non-volatile storage, like a hard drive, to free memory).
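
As a rough sketch of what requesting a page looks like in practice (assuming a POSIX-ish system with the common MAP_ANONYMOUS extension; this is closer to what an allocator does internally than to malloc itself):

    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = (size_t)sysconf(_SC_PAGESIZE);          /* one page */
        void *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                          MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (page == MAP_FAILED)
            return 1;
        printf("got a page at %p\n", page);
        munmap(page, len);
        return 0;
    }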


Not only that, but the linearity of physical ram is a fiction as well: in nearly all systems these days ram is made up of multiple memory modules (the MM in SIMM/DIMM), and to my knowledge, the OS is free to stitch them together in any way it sees fit.

(All of this is to say nothing of NUMA.)

However, one of the responsibilities of the OS is to hide all that messy detail from the bare-metal programmer or compiler writer and provide a simple(r) abstraction over the hardware. Thus, "(physical) memory is a big array".


On the PC, the BIOS configures the memory controller in such a way as to hide the boundaries of memory modules from the OS. The resulting address space still contains some holes and stuff that is not physical memory, but the OS gets a map of this from the BIOS. Originally (on systems with 36-pin SIMMs) this wasn't the case and you had to match memory modules so that they produced a contiguous block of addresses.

In essence the situation is pretty much the same as for a user-space program: you get a big address space and a list of memory regions that are mapped and usable.




Sounds like a job for CS:APP http://csapp.cs.cmu.edu/ . (I haven't got through it myself yet.)


> What would you recommend I search for to learn more about this topic?

Very simple:

Registers.

L1 cache.

L2 cache.

NUMA

Somewhat more complex:

Virtual memory.

Memory pages.

Other posts have more information, but that should get you going.


> Lol, what the heck? Troll or astro-turfed? Where can you get a CS Master's Degree on only four years of programming?

My experience (from interviewing people with various educational backgrounds) is that a lot of people who have a Masters in CS have very very little practical experience actually programming. People who go into the field to ascend the ivory tower often just don't do a lot of it, really.

Which is, I think, not really very different from a lot of fields. There's a distinct academic track to a lot of fields.


> No, memory is not one big array, sorry.

No, no, memory is in fact one big array of bytes. Everything else is just really nice syntactic sugar over that fact.

Now, it may well be that attempts to access that memory result in page faults or weird interrupts or IO behavior or what have you, but the computer really does only see a big array.


> but the computer really does only see a big array.

Define "the computer" in this context.

Certainly not the x86 chip itself - it sees memory as a series of caches (L1, L2, L3) and eventually the memory bus, which it manages through various lookup tables (TLB etc.) more closely resembling a series of hash tables on steroids than an array - and that's ignoring per-processor caches on multiproc systems and all the invalidation logic that needs to occur as a result.

What about processes? One flat memory space! ...except when you communicate with another process, say by sharing memory. Then you realize you can't share your 'indices' without translating them into the other process's indices, because even if the physical memory is the same, each process has its own 'array' for indexing into that memory (and yours doesn't even contain everything in theirs.) That's at least 68 arrays on my computer at the time of writing this, not one.

The kernel's the one managing this mess of arrays, pinning pages needed for interrupt handlers and software TLB support (since not even the kernel is addressing pure physical memory most of the time).

I guess you could argue that because your chip supports DMA, you can do all your array indexing through that to get to your 'one true' physical memory addressing scheme, label that as what your computer 'really sees', and ignore the 99.99% of instructions executing and making up the bulk of your computation, which have nothing to do with that addressing scheme, but that seems a bit disingenuous.


From C and assembly, that's very much how memory is accessed, compared with the notion of objects mentioned in the GP.

The fact that certain accesses may cause memory layout to change or other strange things is something better left to the computer engineers. :)


No, that's untrue. I find that the thinking that it is so is generally derived from C attempting to impose it as part of its spec. Any architecture that uses bank switching, for example, is very much not a "big array of bytes". Or go try to write to byte 0x382 of your modern graphics card's VRAM, will you?


Message 1956 (8 left): Thu Jan 25 1990 2:44am

(that's the first line of the 3rd paragraph, maybe a dozen whole lines before the part you quoted...)


>No, memory is not one big array, sorry.

Yet that fact is not very relevant for his explanation, sorry.

>Lol, what the heck? Troll or astro-turfed? Where can you get a CS Master's Degree on only four years of programming?

In lots of places.

In some cases a "CS Master's degree" can mean that you did 4 years of Economics or Physics and took a CS postgraduate course afterwards -- not a continuous BSc-plus-MSc CS education.

Other countries require 3 years for the BSc and 1 year for a master's degree.


It says 1990 in the original message.


> Lol, what the heck? Troll or astro-turfed? Where can you get a CS Master's Degree on only four years of programming?

... and still don't totally grasp pointers apparently.

But seriously, what's so confusing about pointers?


> what's so confusing about pointers?

As with any construct, you can make it confusing. Just do pointer arithmetic with different types and you get yourself a confusing mess.
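
For instance, a minimal sketch of how the scaling works:

    #include <stdio.h>

    int main(void)
    {
        int   arr[4] = {10, 20, 30, 40};
        int  *ip = arr;
        char *cp = (char *)arr;

        /* "+ 1" is scaled by the pointed-to type: ip + 1 advances by
           sizeof(int) bytes, cp + 1 advances by exactly one byte. */
        printf("%d\n", *(ip + 1));                   /* 20       */
        printf("%d\n", *(int *)(cp + sizeof(int)));  /* 20 again */
        return 0;
    }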


> Lol, what the heck? Troll or astro-turfed? Where can you get a CS Master's Degree on only four years of programming?

Introduced to computer science / programming halfway through undergrad 2 years ago. Going to finish up a masters next fall. (3 and some change years after I started)


Just for interest: I got my masters in CS in four years, and this is pretty typical in the UK.


Bachelors + masters in 4 years? Wow, that's pretty fast.

(I think that's the parent comment's question: if you have a bachelor's + master's in CS, you'd have had to code for 6 years or so.)


I don't have a bachelor's - I left the equivalent of US high school and started on a four-year master's degree, and now I'm doing a PhD. As I say, it's not uncommon in the UK - it's not a special accelerated course or anything.


Cool, that's new information!


At least in Japan, it's possible. I have a couple friends who are doing a master's in CS (taking 2 years) and did their bachelor's in something completely unrelated, so their programming experience will only be 2-3 years on graduation.



