Beej’s Guide to C Programming [pdf] (beej.us)
507 points by tumblewit on April 23, 2021 | 171 comments



Jens Gustedt's "Modern C" (https://modernc.gforge.inria.fr) is an excellent resource as well.


Jens Gustedt is a co-editor of the ISO C standard, so he knows his shit. Modern C is probably the best book about modern C, short of reading the standard.


It's far easier to read than the standard, too, since it's explanatory rather than declaratory. The standard tells you what C can do, Modern C teaches you how.


What does "modern" C imply? AFAIK, there are not very many new language features; is it about organizing code differently than what one would learn from K&R?


I'm chuckling to myself at "there are not many new language features" because that's what I thought.

There's a fair amount added to the core since C89, actually... And if you include the library changes, game over! I had no idea how much had been added.

More challenging is to find what's been subtracted.

This no longer defaults to int:

static i;

And gets() is toast.

Anyone know any others?
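
For what it's worth, the sanctioned replacement for gets() is the bounded fgets(). A minimal sketch (note that unlike gets(), fgets() keeps the trailing newline):

    #include <stdio.h>

    int main(void)
    {
        char line[256];

        /* gets(line) was removed in C11 because it cannot bound its
           input. fgets() takes the buffer size instead. */
        if (fgets(line, sizeof line, stdin) != NULL)
            printf("read: %s", line);
        return 0;
    }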


Array initializers! Lots of changes there. Oh, things subtracted? Wasn't there a change in typedef vs type tags?


VLAs were made optional in C11, so compilers that jumped from C89 to C11, like in the embedded space, or MSVC, never bothered supporting them.


Apart from the new features added in C11 or C17 that others have mentioned, both Beej's Guide and Modern C also cover threads and string encoding (neither of these topics is covered in K&R 2nd edition, as far as I remember). Beej's Guide also has a section on internationalization and one about date/time handling.

But yes, the code is also quite different from that in K&R, which is relatively terse. See for example this comment in K&R about strcpy on print page 105 (page 119 of the PDF)[1], where, after showing two versions of strcpy that are pretty readable and easy to follow, the book says:

> In practice, strcpy would not be written as we showed it above. Experienced C programmers would prefer

    void strcpy(char *s, char *t)
    {
        while ((*s++ = *t++) != '\0')
            ;
    }

Followed by a paragraph saying the condition could be simplified, and that

> the function would likely be written as

    void strcpy(char *s, char *t)
    {
        while (*s++ = *t++)
            ;
    }

Make of that what you will :-)

[1] https://kremlin.cc/k&r.pdf#page=119


Meanwhile, experienced C programmers who care about security wouldn't even touch strcpy().

All those code samples will crash and burn if s and t point to the wrong locations, t happens to point to a string without a null terminator, or s points to a buffer not big enough to handle the string pointed by t.
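
If you do want a copy in the same spirit that at least can't overrun the destination, here's a hedged sketch modeled loosely on BSD's strlcpy (bounded_copy is a made-up name, and it still assumes valid, terminated input, which is the deeper point above):

    #include <stddef.h>
    #include <stdio.h>

    /* Bounded copy: truncates if needed, but always NUL-terminates. */
    size_t bounded_copy(char *dst, const char *src, size_t size)
    {
        size_t i = 0;

        if (size == 0)
            return 0;
        while (i < size - 1 && src[i] != '\0') {
            dst[i] = src[i];
            i++;
        }
        dst[i] = '\0';
        return i; /* characters copied, excluding the terminator */
    }

    int main(void)
    {
        char buf[8];

        bounded_copy(buf, "hello, world", sizeof buf);
        puts(buf); /* prints "hello, " - truncated but terminated */
        return 0;
    }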


> All those code samples will crash and burn if s and t point to the wrong locations

This is true for all uses of pointers in C. The only validation you can do is whether a pointer is null. Apart from that you can never know that a pointer points to a non-wrong location. And still experienced C programmers seem to be OK with using pointers.


Actually no, many OSes do have APIs to validate pointer integrity.

But even then, most experienced C programmers don't care about using them anyway, which is why Apple, Oracle, Microsoft, ARM, Google, and Cambridge University are all leading efforts for hardware memory tagging.

So it won't matter how much they care, as the OS will kill their beloved application when pointers get misused.

A scenario already made reality in platforms like Solaris SPARC.


> many OSes

Are outside the scope of the C language. But if you allow these special APIs, you can use them to implement a safe strcpy. Either way your point from above is still invalid.


I know that

    while (*s++ = *t++);
is the type of thing seemingly designed to reach through history to make a modern programmer spit out their tea — but I also can’t help but feel that it’s a very direct “C machine” representation of the semantics of a processor string instruction, à la x86’s REPNZ MOVSB.


According to the introduction of the book, C17, which is mostly just bugfixes on C11. But, compared to C99, you get nice things like generic expressions, better Unicode support, and (standard) multithreading.


> But, compared to c99, you get nice things like generic expressions

I'd have a hard time calling the way they designed _Generic "a nice thing" :-)
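
For readers who haven't met it, a minimal sketch of _Generic, essentially the cbrt example from the standard. Part of the awkwardness alluded to above is that dispatch happens on the static type of an expression, has to be routed through a macro, and every branch must be spelled out:

    #include <math.h>
    #include <stdio.h>

    /* Selects a function based on the static type of x, then calls it. */
    #define cbrt_generic(x) _Generic((x), \
        long double: cbrtl,               \
        float:       cbrtf,               \
        default:     cbrt                 \
    )(x)

    int main(void)
    {
        printf("%f\n", cbrt_generic(27.0));  /* dispatches to cbrt  */
        printf("%f\n", cbrt_generic(27.0f)); /* dispatches to cbrtf */
        return 0;
    }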



The current standard is C17, of course: https://en.m.wikipedia.org/wiki/C17_(C_standard_revision).


Nice book. As a C++ programmer how much of its discussion related to C's storage/memory model would carry over to C++?


All of it. C++ is a superset of C.



You don't really have to ask others: the C and C++ standards disagree, and you can easily find instances of this disagreement if you just peruse them. The easiest examples to find are the difference in behavior of the auto keyword, and the fact that conversions from void * to something_else * are implicit in C but not in C++. There's now also an extreme number of subtleties in terms of undefined behavior which differ between the languages.
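
The void * point in two lines (valid C, rejected by a C++ compiler without a cast):

    #include <stdlib.h>

    int main(void)
    {
        /* Implicit void * -> int * conversion: legal C, an error in C++. */
        int *p = malloc(10 * sizeof *p);
        free(p);
        return 0;
    }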


And even if it were, the idioms would still be different. Even though YAML is a superset of JSON, you wouldn't apply JSON patterns (e.g. using a "_comment" key for comments) to YAML. And similarly, many JavaScript design patterns won't apply to TypeScript, because it has language features instead of conventions.


Bjarne Stroustrup, C++ creator, disagrees, IIRC. Think I read it on his site.

Edit:

https://www.stroustrup.com/bs_faq.html#C-is-subset

>Please note that "C" in the paragraphs above refers to Classic C and C89. C++ is not a descendant of C99; C++ and C99 are siblings.


foo_t foo = {.bar=1}; doesn't work in C++ (at least not before C++20, which added a restricted form of designated initializers).


I stumbled upon this gem a while ago [0] while looking for a decent tutorial and reference to C:

Stuff that should be avoided: [...]

Beej's Guide to C: http://beej.us/guide/bgc/output/html/singlepage/bgc.html

Full of mistakes.

[...]

Could someone confirm this? I've seen a lot of threads here on HN praising beej's guides so I am somewhat confused.

[0] http://www.iso-9899.info/wiki/Main_Page

edit: Formatting


Beej himself lists this as an 'alpha-quality document' on the download page [0] and if I remember correctly, it has been so for years. Wonder why this is posted here on HN.

[0]: http://www.beej.us/guide/bgc/


I recently overhauled it in a big way.. er, "am overhauling it".

And I'm sure it's full of mistakes. It's over 500 pages, most of which has yet to be edited, so if there are fewer than 1000 defects, I'd be shocked.

But I fix them all as I find them, or as they're pointed out. And after an eventual editing pass, things will be better.

And if it's not useful to someone, I take no offense if they don't like it. :-)


I think posts like this are handy just for the extra exposure an alpha document might need; there's a lot of good feedback and discussion here that hopefully beej comes across someday and can use to finish the book.


On further observation, the git repository of the book [0] seems to be quite active. Maybe this book will be finished after all.

[0]: https://github.com/beejjorgensen/bgc


Backstory: I started writing this book for novice programmers about 15 years ago. It was going to be a lot shorter.

But I lost interest, because:

1. Most beginning programmers don't start with C

2. I wouldn't get a chance to go deep and explore the language.

So I shelved it, unfinished.

Flash forward to about a year ago... I had a flash of inspiration: change the audience to intermediate programmers.

Now I could skim the general conceptual stuff and get into more details.

One thing, though. C actually added a lot of stuff in the intervening years. I didn't realize the magnitude of the project.

Oh well! Too late to turn back now!


Just wanted to say THANK YOU as I read your guide to network programming way back in 2000, over 20 years ago! I was just starting out in C network programming on VxWorks :) Glad to see you're still updating your guides.


Thanks! Makes my day to hear people find the work useful. :)


+1000 - I used your network programming guide extensively for my Computer Networking final project in 96, invaluable, Thankyou!


I read the beginning of this C guide and also found it very helpful and made things very easy to understand (coming from C++). Thanks!


> Most beginning programmers don't start with C

Except literally every Indian engineer. They teach C to mechanical and chemical engineers for some reason. (It's not an elective.)


Glad to hear this, thank you!


On a quick skim of some introductory parts I found:

> When you have a variable in C, the value of that variable is in memory somewhere, at some address. Of course. After all, where else would it be?

It would be in a register. Of course. Or it would be eliminated by a compiler optimization. Of course.

Same error later on:

> When you pass a value to a function,a copy of that value gets made in this magical mystery world known as the stack

No. In most common cases, arguments will not be passed via the stack. This goes on to clarify in a footnote that the implementation might not actually use a stack, and that the stack has something to do with recursion. That part is true, but the values saved on the stack for recursing are not the same as function arguments.

Neither in the Variadic Functions chapter nor anywhere else are the default argument promotions mentioned -- this will bite someone who tries to write a variadic function that gets floats out of the variadic argument list, which you cannot do, since passing a float to a variadic function promotes it to double.
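
A hedged sketch of the trap (sum is a made-up example): the caller's floats arrive as doubles, so va_arg must fetch double; va_arg(ap, float) is undefined behavior:

    #include <stdarg.h>
    #include <stdio.h>

    /* Sums n floating-point arguments. Because of the default argument
       promotions, any float passed here is promoted to double. */
    static double sum(int n, ...)
    {
        va_list ap;
        double total = 0.0;

        va_start(ap, n);
        for (int i = 0; i < n; i++)
            total += va_arg(ap, double); /* never va_arg(ap, float) */
        va_end(ap);
        return total;
    }

    int main(void)
    {
        float a = 1.5f, b = 2.5f;
        printf("%f\n", sum(2, a, b)); /* a and b arrive as doubles */
        return 0;
    }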

Speaking of floats... This is one of those tutorials that are very confused regarding their target audience. For example, in the "Variables" section it goes out of its way to define: "A “byte” is an 8-bit binary number. Think of it as an integer that can only hold the values from 0 to 255, inclusive." (which isn't the C definition, but this really is nit-picking) but then happily goes on to talk about Booleans and floats without explaining what those are. What reader has a background that would make this useful?

Overall, from the little I've seen, I'd give this an initial rating of "broadly correct but with definite mistakes".

Even if it were fully correct, I dislike the verbose style, and I wouldn't recommend this tutorial. For example, in the Hello World chapter, we have the following "explanation" of the line "#include <stdio.h>":

> Now, what is this #include? GROSS! Well, it tells the C Preprocessor to pull the contents of another file and insert it into the code right there.

> Wait—what’s a C Preprocessor? Good question. There are two stages (well, technically there are more than two, but hey, let’s pretend there are two and have a good laugh) to compilation: the preprocessor and the compiler. Anything that starts with a pound sign, or “octothorpe”, (#) is something the preprocessor operates on before the compiler even gets started. Common preprocessor directives, as they’re called, are #include and #define. More on that later.

> Before we go on, why would I even begin to bother pointing out that a pound sign is called an octothorpe? The answer is simple: I think the word octothorpe is so excellently funny, I have to gratuitously spread its name around whenever I get the opportunity. Octothorpe. Octothorpe, octothorpe, octothorpe.

> So anyway. After the C preprocessor has finished preprocessing everything, the results are ready for the compiler to take them and produce assembly code, machine code, or whatever it’s about to do. Don’t worry about the technical details of compilation for now; just know that your source runs through the preprocessor, then the output of that runs through the compiler, then that produces an executable for you to run. Octothorpe.

> What about the rest of the line? What’s <stdio.h>? That is what is known as a header file. It’s the dot-h at the end that gives it away. In fact it’s the “Standard I/O” (stdio) header file that you will grow to know and love. It contains preprocessor directives and function prototypes (more on that later) for common input and output needs. For our demo program, we’re outputting the string “Hello, World!”, so we in particular need the function prototype for the printf() function from this header file. Basically, if we tried to use printf() without #include <stdio.h>, the compiler would have complained to us about it.

> How did I know I needed to #include <stdio.h> for printf()? Answer: it’s in the documentation. If you’re on a Unix system, man printf and it’ll tell you right at the top of the man page what header files are required. Or see the reference section in this book. :-)

> Holy moly. That was all to cover the first line! But, let’s face it, it has been completely dissected. No mystery shall remain!

Only one sentence of this is relevant for an introductory Hello World chapter: "Basically, if we tried to use printf() without #include <stdio.h>, the compiler would have complained to us about it." None of the rest is relevant or helpful to a beginner who is just seeing their first ever C program. Also "completely dissected" isn't true either; there is a lot more to be said about headers.


> It would be in a register. Of course. Or it would be eliminated by a compiler optimization. Of course.

Since you mention relevance for beginners later on in your post I'd argue this isn't relevant either. This concept holds true for simple code that doesn't do advanced stuff like working with hardware. As soon as you do &variable, you get an address and can work with it. If the compiler optimized something away you never use you might as well just pretend it's in memory somewhere for the sake of a mental model that's easy to grasp. Same with passing variables via stack. A simple compiler could do it just like that.

That isn't to say the tutorial is good/not good, but these points in particular seem rather sane to me. Far from "Mastering C Pointers" at least :)


> Since you mention relevance for beginners later on in your post I'd argue this isn't relevant either.

Agreed. At the time that variables are introduced, it should just say "a variable is a name for a location where a value is stored". I didn't mean to suggest that the tutorial should go into needless detail at that point. Just that the needless detail that it currently goes into is wrong.

> Same with passing variables via stack.

Again, my problem is just with the needless detail. When you pass a value it is copied (that is the relevant part) somewhere where the callee can find it (that somewhere is the irrelevant part).


> It would be in a register. Of course. Or it would be eliminated by a compiler optimization

As long as you’re taking, and using, the address of that variable, it’s almost guaranteed to be in memory. Even if it isn’t, the compiler guarantees the observable output of the program will be equivalent to that of the unoptimized code.

> arguments will not be passed via the stack

I’m not sure explaining nuances of various calling conventions, and how they differ across processors and OSes, is useful information in a document about C and targeted towards beginners.

You’re talking about things which are underneath C in the abstraction layer hierarchy. The abstraction has many layers, the lowest one being quantum physics. One has to stop somewhere, and this article decided to stop at C, as opposed to assembly.


> One has to stop somewhere, and this article decided to stop at C, as opposed to assembly.

As soon as you mention the stack, you've gone beyond C and started talking about something that is not C.


See my reply to your sibling comment. I wasn't proposing adding more irrelevant detail. If anything, I was proposing removing the existing irrelevant and incorrect/very incomplete detail.


It's a tough line between intro and The Rabbit Hole. But I'll see what I can do there.

Appreciate the feedback. Some good suggestions here that I'll add.


> Only one sentence of this is relevant for an introductory Hello World chapter: "Basically, if we tried to use printf() without #include <stdio.h>, the compiler would have complained to us about it."

Quite the contrary, in my opinion. As a beginner I was very frustrated with most approaches that say "Just put in that thing that is needed; it will be explained later. And it works, good job, attaboy!"

And maybe 250 pages later, if the author didn't forget in the meantime, you get a one-liner mention that links back to the first introduction of the syntax.

At least this guide doesn't leave the reader in the fog, wondering.


IMO, pointers are less difficult to comprehend than other abstractions, like lambdas are.

If you know how to walk down a street and stop at the right street number, then you have used pointers. And if you've ever observed that one tall building may "cover" a range of street numbers, such as 200-220, then you should understand how to move from one 4-byte "value" to the next in an array in memory.

Anyway, many more analogies... probably better than this one.

Maybe unions could make using pointers a bit more challenging, but again, tall buildings next to short buildings and so on. We do this kind of pointer calculation in real life.


What pointers basically are is not particularly hard to grasp. What is harder to grasp is what can be done with them, and how you can shoot yourself in the foot with them in non-obvious ways.

I think I only understood much of it once I learned Rust, because you realize: ah, that thing I once did in C is something that maybe shouldn't be possible at all without extra steps. Even if I were to never use Rust again, this definitely helped me understand how to use pointers more safely.


I think that pointers are a tool to control a CPU's indirect addressing modes from a higher-level language.

Exposure to an assembler makes pointers easy to understand.


What's difficult to understand about pointers isn't the concept of a pointer itself, or even * and &, it's the fact that working with pointers requires you to simultaneously understand different abstraction levels. While it's not unique to pointers, and it's in fact the case for most nontrivial programming tasks, what's unique about C is that pointers are so pervasive you can't really do anything if you don't understand how to work with them.

IME languages like Python aren't any easier than C to work with (ignoring UB issues of course), but it's certainly the case that you can probably kinda sorta get your job done with Python even without understanding the first thing of what you're doing, and that's not happening if you write in C.


In C, pointers require you to think deeply about the ownership and lifetime of any "allocated object" at runtime. How long does it live, who is responsible for the deallocation, how many pointers does your program hold to that object (dangling issues). Ultimately, it can lead to a cleaner design if these issues are taken seriously up-front.


I don't disagree with that, but most cases fall within a pretty clear pattern:

- typedef struct { ... } foo

- foo *foo_create()

- void foo_destroy(foo *)

- a bunch of functions that take foo* as their first arg

which is kind of the same as a class and only more error-prone.
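
Spelled out, a hedged sketch of that pattern (foo and its members are placeholders):

    #include <stdlib.h>

    typedef struct foo {
        int bar;
    } foo;

    /* Allocation and initialization fused, like a constructor. */
    foo *foo_create(void)
    {
        foo *f = malloc(sizeof *f);
        if (f)
            f->bar = 0;
        return f;
    }

    /* De-initialization and deallocation fused, like a destructor. */
    void foo_destroy(foo *f)
    {
        free(f);
    }

    /* "Methods" take the object as their first argument. */
    void foo_set_bar(foo *f, int value)
    {
        f->bar = value;
    }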

I say this as someone who actually _likes_ C, but the manual memory management model is very often unnecessary, confusing, and repetitive. There was an idea some time ago of a language extension that would extend the concept of automatic storage duration to allow an explicit destructor to be called when the variable goes out of scope, like <close> variables in some languages. I genuinely think things like that would make the language a bit more ergonomic without fundamentally changing its nature.


That's why I tend to always prefer automatic and static storage to dynamic allocation wherever possible, especially in cases where you don't have "N" items or "N" cannot possibly exceed a certain small value. Also, allocation/deallocation of a given object need not be defined within its module. It should be up to the caller to decide whether to allocate the object on the stack, statically, or dynamically, depending on the caller's situation:

    foo f;                        /* automatic (stack) storage */
    foo_init(&f);
    foo_destroy(&f);
    ...
    foo *g = malloc(sizeof(foo)); /* dynamic storage */
    foo_init(g);
    foo_destroy(g);
    free(g);


That is of course fine, with the drawback that it requires a public definition of the foo type.

I would write the allocation as

    foo * const g = malloc(sizeof *g);
to avoid repeating the type name and "lock" the allocation to the variable. If the type on the left hand side ever would change, this still does the right thing.


Interesting, I tend to prefer a create and destroy function that allocates and frees the structure. That way you can't have foo without it being initialized, and you can't have foo be freed without being de-initialized. Where do you see the value in being able to move it between different memory types?


In games and embedded systems, a very common pattern is batched heap allocation: a single allocation of an array of foo, and/or an array of foo mixed with other types, or sometimes a memory-managed data structure like a memory pool. This is one big reason that the C++ Standard Template Library was shunned in game studios for a long time; it automatically did its own heap allocations, sometimes at inopportune times IIRC, like in constructors, and didn't let the caller choose. EA wrote their own version (EASTL) that was game-friendly because it allowed control of the allocator.

In GPU programming, there are actual different memory sub-systems, with different sizes and different performance implications, so it’s critical that the caller is able to do the allocation & deallocation any way they want. This is why most well designed GPU APIs rarely allocate GPU memory inside the API, but instead are designed to work with caller-provided pointers to buffers.


Coming from the embedded world, I am far more used to the style bluetomcat showed, where the caller handles allocation. It takes more work to use, and you have to make sure not to use it before initializing or after destroying, as you said. The advantage is that the caller has full control of memory allocation. The object can be a local variable, a static, malloc'd, come from a custom allocator (say, a slab allocator), or be part of a larger struct.


That makes a lot of sense. I do some embedded development, but when I don't, I tend to prefer the heap just because my memory debuggers are much more capable working with heap memory.

In the case you show, foo would have to be a struct that doesn't contain pointers to additional allocated memory, but it's an entirely valid use case and pattern.

Calling malloc and free has a cost associated with it that the stack doesn't. But the stack can in some cases, like with recursion, be scary to use, because you don't know where it ends. If malloc returns NULL you know you have found the end and can do something reasonable.

Thanks for your insight!


I always lacked a good discussion on the subject of allocation/deallocation strategies. Is there some recommended good read?


It’s not standardized, but if you’re using gcc, you can use the cleanup attribute for this: https://gcc.gnu.org/onlinedocs/gcc/Common-Variable-Attribute...

IIRC clang implements it as well, but I wasn’t able to find a reference to it in their docs.
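
A hedged sketch of typical usage (GCC/Clang only; the cleanup function receives a pointer to the annotated variable as it goes out of scope):

    #include <stdio.h>
    #include <stdlib.h>

    /* Called with &buf when buf leaves scope. */
    static void free_ptr(char **p)
    {
        free(*p);
    }

    int main(void)
    {
        __attribute__((cleanup(free_ptr))) char *buf = malloc(64);

        if (buf == NULL)
            return 1;
        snprintf(buf, 64, "freed automatically at scope exit");
        puts(buf);
        return 0; /* free_ptr(&buf) runs here, on every exit path */
    }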


I’m actually implementing that right now for my own use, as a pre-processor.

There is a much more advanced design and implementation at “A defer mechanism for C” (December 2020): https://gustedt.wordpress.com/2020/12/14/a-defer-mechanism-f...

For my own purposes, I think I can live without handling stack unwinding so I continue working on my pre-processor.

Since the pre-processor is not yet finished, there I use a vector¹ of {.pointer, .destructor} where I put objects after initialization, with one macro at the end of each managed scope that calls the destructors for that scope in reverse order, then another macro meant for the function exit points that calls all the destructors in that vector. This has been built many times before by other people of course, it’s just an exercise to see which difficulties arise.

¹ Vector, growable array: I did my own trivial type-generic growable array, with the classic {.capacity, .len, .items}, but again there are many previous implementations. The two I’ve found more interesting are:

- “C Template Library (CTL)”: https://github.com/glouw/ctl

- “Klib: a Generic Library in C”: https://github.com/attractivechaos/klib/


For a vector alternative, I find this one easy to read and use:

https://github.com/tezc/sc/tree/master/array

It is just an array of your type, e.g. int *numbers, so you have type info in the debugger as well.


No type safety and broken alignment. No thanks.


Isn’t it type safe? e.g you can’t add char* to a double array.

What do you mean by broken alignment?


It lacks type safety because it's not possible to distinguish between an sc_array and a pointer so there's no way of detecting that someone passed a char * that nobody had ever called sc_array_create on to sc_array_add for example.

The alignment is broken because nothing in the C standard guarantees that the elems member of sc_array will be aligned correctly for any possible element type.

I also spotted another problem, in sc_array_init the code `void *p = a` is also not guaranteed to work. In an example snippet such as `int iv; sc_array_create(iv, 0);` expands to `sc_array_init(&iv, sizeof iv, 0)` so the type of the expression `&iv` is `int *` which is then being converted to `void *` in the function which is actually not allowed by the standard. This is also the reason why if you were writing a wrapper around realloc which exited if the allocation failed you would still have to pass in the current pointer with void * and return the new pointer with void *. This could be applied here actually as an easy fix but it indicates even further to me that the author of the library is taking a very leisurely approach to writing conforming C. This pattern also appears in the other two functions though and I'm not sure if in those cases it's something which can be easily fixed.


With a bit of trickery CTL works much like the STL, in that containers can be built of containers:

    #define P
    #define T int
    #include <vec.h>

    #define T vec_int
    #include <deq.h>
A deq_vec_int - analogous to std::deque<std::vector<int>> - is a neat example.


There's also a CTL fork that might be of interest: https://github.com/rurban/ctl


I also like the Ken Thompson extensions that inspired part of Go:

    struct X { int a; }
    struct Y { *X; int b; int c;}

    void add(Y* self, int number) {
      self->a += number;
    }

    Y y;
    y.a = 10; // composition
    y.add(1); // y.a = 11 now
This alone would simplify C coding so much without taking any power out of it.

The other extension I would add is some sort of interface or protocol.

As soon as you can do something like "y.add(1)", having a generic contract to refer to things without having to know their concrete type is one of the good things from the OOP world.

With this you would also be able to call some cleanup code and even an initializer.

This is still C, and it's still much simpler than C++, yet almost as powerful.

C should adopt these kinds of things; if it weren't so conservative, it would retain a lot of the coders who migrate away instead, given that C has barely evolved from its 70s roots.
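
For now, that kind of contract has to be emulated by hand with a struct of function pointers; a minimal hedged sketch (counter and its members are made up):

    #include <stdio.h>

    /* Hand-rolled "interface": a struct carrying its own function
       pointers, which is how this is commonly emulated in plain C. */
    typedef struct counter counter;
    struct counter {
        int a;
        void (*add)(counter *self, int n);
    };

    static void counter_add(counter *self, int n)
    {
        self->a += n;
    }

    int main(void)
    {
        counter y = { .a = 10, .add = counter_add };

        y.add(&y, 1);        /* explicit self; the extension would hide it */
        printf("%d\n", y.a); /* prints 11 */
        return 0;
    }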


Yeah, except when it doesn't follow the pattern: for example, there is no foo_destroy(), you're supposed to "just" call free() when you're done with it. Used to be very common (not sure how it is now), and very frustrating when you link against non-standard allocators.


> In C, pointers require you to think deeply about the ownership and lifetime of any "allocated object" at runtime.

Unless you decide to use libgc, presumably?


These are good points. I can sometimes feel that Python is more pointer-y (?) than people expect, with stuff like:

    a = {"one": 1, "two": 2 }
    b = a
    b["two"] = 99
    print(a["two"])
The above prints 99, since "b = a" does not copy the value (the dictionary) but just the reference to the value ("the pointer", kind of). This is surprising to some people.


I agree, which is why I'm glad the first language I learned was C. I don't get to write a lot of C at work, but the concepts the language teaches you are the very fundamentals of programming.

I know it's probably baseless, but I can't shake the feeling that people who learn modern languages before learning C are just making their own lives harder.


I think the same goes for learning assembly before learning C. A simple assembly-based computer [0] should be the first programming target in every "introduction to programming" class. After that, C becomes obvious, and those who struggle with the fundamentals can be guided to more appropriate careers instead of dumbing down the tools everyone uses.

[0] https://www.instructables.com/CARDIAC-CARDboard-Illustrative...


This is one part (of many) of perl that I like, that it exposes the pointer-y parts to the programmer (to abuse :))

    # hash
    my %a = ( 'one' => 1, 'two' => 2 );
    my %b = %a;

    $b{ 'two' } = 99;

    # prints 2
    print $a{ 'two' }, "\n";

    # reference to hash
    my $a = { 'one' => 1, 'two' => 2 };
    my $b = $a;

    $b->{ 'two' } = 99;

    # prints 99
    print $a->{ 'two' }, "\n";


The real difference is Perl has typed variables, such as hashes or arrays, which other languages typically do not. That does make it convenient when you need to clone an object, as in the "%b = %a" example. Perl borrows the sigil concept from Bash and other shells (which makes sense, given the historical context of Perl). The only other language I know of that uses sigils in such a manner is BASIC. Other languages, such as Ruby or Common Lisp, use them as syntactic convention rather than as a feature that the compiler/interpreter understands.[1]

Most dynamic languages expose the data as references. In fact, the one thing that trips up JavaScript developers (especially in React) is that they do not understand how references work. I see senior and lead developers inadvertently doing mutation all the time. Or getting incredibly paranoid that two identical strings, for example, do not equal each other in the strictest sense in JS. They also throw in memoization everywhere due to their fundamental lack of understanding.

You can always tell the developers that do not have C/C++/Pascal experience.

[1] https://en.wikipedia.org/wiki/Sigil_(computer_programming)


Reminds me of the Python trap that I fall into every few months because I forget about it:

  def foo(mylist=[]):
      mylist.append("a")
      return mylist

mylist's default is only initialized once, when the function is defined, so the function will actually return one more "a" with each call


It's a feature! Use this memoization pattern and you'll never forget it

  def foo(x, cache={}):
    if x in cache:
      return cache[x]
    val = cache[x] = x*x+1
    return val
But, be warned, there's no mechanism for cache eviction; use @lru_cache if you don't know ahead of time that x will take a reasonably small number of values


It's also similar in Java's pass-by-value vs pass-by-reference (pass as well as copy), if I recall correctly.


Actually, most languages only have pass-by-value, being “pedantic” (but it is actually a useful distinction). I’m sure there are others, but of the major languages, only C++ has actual references. Pointers are passed by value, that is, the pointer value gets copied. References are a language-level construct.

Here is a great “rant” on this in the case of Java:

http://www.javadude.com/articles/passbyvalue.htm


People who can’t comprehend arrays/lists in Python, especially multi-dimensional ones, would have had no chance in C


Pointers seem easy until you understand the pointers you're working with do not correspond with the basic hardware model you have in mind:

https://www.ralfj.de/blog/2018/07/24/pointers-and-bytes.html

https://www.ralfj.de/blog/2020/12/14/provenance.html

https://www.ralfj.de/blog/2019/07/14/uninit.html


Pointers aren't that complicated, but C syntax is misleading IMHO until someone comes along and tells you that the declaration syntax "follows use", which does not seem like the greatest idea. Then you get used to it and forget about it, but when you don't know the principle behind the declaration syntax, it does not help you reason about the language.


Yup. I spent a ton of time trying to make sense of pointer syntax (the * operator/modifier), until I realized that it didn't really make sense (except for the simplest cases), and then I could get on with my life.


There's actually a logic to it, although not an intuitive one at all, and it's that declaration should follow use. That is, `int *ptr;` says that *ptr is an int, and to get the int you have to use * on it.


The logic is that by declaring `int *ptr;` you effectively say that `*ptr` is an `int`. Only the first token (which must be a type) is syntactically special, the rest is normal expression syntax.

Same goes for arrays, `int *x[30]` just says that `*x[30]` is an `int` (well, technically it's out of bounds by 1), thereby we're declaring an array of 30 int-pointers.

Of course, that hasn't been entirely true since function declaration changed with ANSI C in the late 80s or so.


But int-pointer is the type of it. It makes no sense that it “sticks” to the variable name.


There's a type and an identifier. In `int *pa[10]`, the identifier is `pa` and the type is `int *[10]`. A declaration simply specifies a name (the identifier), some operations around it (pointer dereferencing, array subscription, function calling, grouping parentheses) and the terminal type that you get after applying all those operators in the precise order as specified by operator precedence rules.

When using that variable with the allowed operators, the resultant type of the expression is determined by the declaration:

    pa;     // int *[10]
    pa[x];  // int *
    *pa[x]; // int


Thank you for the concise write up, but I think lowly of the [] syntax as well.

And unary operators on an identifier are a different thing, though I similarly don’t think adding one at the left, another at the right is a good thing. But I have trouble with the declaration site as well.

It’s getting out of reach of my knowledge, but I believe C’s grammar being hacky is in part due to that as well.

I didn’t find exactly what I was looking for but here is some more on it: https://pdos.csail.mit.edu/archive/l/c/roskind.html


The reason the asterisk naturally "sticks to" the name is simply the possibility to write a series of declarations like this:

    int a, *b, (*c)(int);


Which is a questionable saving of 2 lines and a few characters in a language where huge repetitions are a given, due to lack of templates/generics whatever.


You missed the whole point. Yes, it makes no sense until you grok the logic of it and then it starts making sense (while still being unintuitive). Re-read the comments you are responding to.


Yeah I get that it has logic (though by being pedantic, anything we can write a parser for has logic, but I’m sure we could create some stupid languages), but as you yourself note, not very intuitive. And as far as I know, it doesn’t ease the work of parser-writers.


  int * a, b;
Here it sticks to a.

Then typedef comes along and ruins everything:

  typedef int *  pint;
  pint c, d;


Pointers are like swazzles[1]. The construction is very simple. The principle behind how they work is very simple. Learning how to use one well enough that you can (a) consistently make it do what you want, and (b) not injure yourself in the process, though, is no mean feat.

[1] https://www.atlasobscura.com/articles/swazzle-punch-and-judy


Well, you've demonstrated that memory addresses aren't that hard. But you've also demonstrated how easy it is to get undefined behavior in C programs.

C's pointers aren't memory addresses. OK, they tend to be represented as such at run time, but that's not what the spec actually says they are. And as far as compiler authors are concerned, they can do anything they want as long as it's within spec. Further, the spec even requires some additional behaviors pure memory addresses aren't capable of. See https://www.ralfj.de/blog/2020/12/14/provenance.html for examples of the extra requirements.

Compared to that mess, lambdas are trivial. They're just functions.


> If you know how to walk down a street and stop at the right street number, then you have used pointers. And if you've ever observed that one tall building may "cover" a range of street numbers, such as 200-220

I see that as a European, I have virtually no chance to understand pointers using street numbers. :)

(Fortunately I've never had problems either with lambdas or with pointers.)


C is an abstraction over assembly, really (benefits being that it is simpler by being more abstract and portable across CPU types).

I've always thought that an introduction to CPUs (can take a simpler one as example) and how they work, how memory is (usually) organised, and to assembly would go a long way in helping understand many programming issues and C.

My experience is that C or programming concepts are often taught in a very abstract/mathematical way, which can be hard to grasp compared to a more practical approach.

If you take a concrete example where memory is effectively an array and indices are addresses (which holds true for most cases and, in any case is a good example) then understanding pointers becomes basically common sense and notations are simply conventions of the language you're using.


Treating C as an abstraction over assembly is a surefire way to step into all the thousands of sharp edges C has. In fact I would hazard a guess that the majority of bugs found in software written in C are a result of programmers treating it as a portable assembler instead of a language for programming an abstract machine. So many incorrect assumptions arise as a result of telling people to treat C as a portable assembler that I think it's safe to call it an extremely bad bit of advice.


I was discussing pointers. I only commented in passing on C being an abstraction over assembler. You could rephrase this as it being a language for programming an abstract machine, and it would not change anything about my comment (nor would it change the fact that C is an abstraction over assembly).

Thank you for your reply...


The fact that you didn't use the term "portable assembler" when saying: "If you take a concrete example where memory is effectively an array and indices are addresses (which holds true for most cases and, in any case is a good example)" doesn't change the fact that the statement makes links to how actual machine memory operates on what you personally think is a usual machine. This is really not a good idea because it encourages people to think of pointers as all existing in the same place and encourages erroneous thinking such as subtracting two pointers to different objects or confusion surrounding why the numbers you get when you `int a; char b; printf("%p, %p\n%p, %p\n", &a, &a+1, &b, &b+1);` differ in separation. Also subtler errors arise such as people assuming that all pointer types are effectively equal as long as you convert back to the correct pointer type when it comes to using the pointer (really subtle issues like people assuming that you can convert `int *` to `void *` safely).

The "very abstract" way C is taught actually prevents people from making such assumptions by not priming them to make them. The fact that people get complacent and start to lean on their understanding of (what they think are) real machines to write C is the result of the bugs I mentioned in the previous response.


Like my professor used to say: if you ever saw a pigeon, you know what vectors are.


For anyone who wants to learn about pointers, I can recommend studying a language simpler than C, for instance Oberon, where pointers are more restricted. Having a look at Oberon can also broaden your view even if you know pointers in C.

  https://www.miasap.se/obnc/oberon-report.html
  http://people.inf.ethz.ch/wirth/Oberon/PIO.pdf


Actually, this could confuse things even more, because the C pointer and Oberon's POINTER TO have very little in common.


When pointers are used to build dynamic data structures the differences are merely syntactical.


There was a "C Unleashed" book, a massive tome of 1000+ pages written by many famous programmers, many of whom were quite active in comp.lang.c, like Richard Heathfield and CB Falconer. It had quite insightful material in it.

Anyone remember the heyday of comp.lang.c? I wonder what goes on in there now.


I wasn't sure anyone but the authors remembered C Unleashed! I wrote the chapter on binary search trees and balanced trees.

Comp.lang.c was important to me for many years. I've met 5 or so of the regulars at least once. The most famous comp.lang.c regular is probably Tim Hockin of the Kubernetes project.


Ben Pfaff... There's a name I recognize from back then. :-)


Is it beej or bj?


One syllable, typically. But either works.


Thanks.


Thanks very much! The code from the book proved very useful since C does not have a standard library of rich data structures.


I bought that book ages ago. Good stuff. comp.lang.c still has a small group of knowledgeable regulars, but a lot of the "old guard" seems to have stepped away. And Usenet is a shadow of its former self, obviously.

Reddit has more traffic nowadays.


There was a thread about that group a few months ago and this post in particular really stuck in my mind: https://news.ycombinator.com/item?id=26127632


That's what I remember from comp.lang.c -- lawyer-ly obsession with ANSI C, rather than C as it's being used. That is an important distinction which I respect, but they should have had some kind of sister forum that's more practical. They denied that hardware exists, etc.

Also, obsession with ANSI C, analogous to obsession with POSIX shell, is sort of "middlebrow" in the sense that the people who WRITE the spec need to go outside of it to create a new version of it. Good specs are derived from real usage.


Yes, usenet was huge for me back in the early 90s when I had questions on C programming. I would ask them on comp.lang.c and had some of the best programmers providing guidance. Of course they were strict with questions/discussions being specific to ANSI C.


#c on EFnet!


It doesn't do the brisk business of the networking one, but there have been at least a couple past threads:

Beej's Guide to C Programming - https://news.ycombinator.com/item?id=26100391 - Feb 2021 (1 comment)

Beej's Guide to C Programming (2007) - https://news.ycombinator.com/item?id=15198093 - Sept 2017 (79 comments)

As long as we're talking C programming, I'd single out this large thread with C Standards committee members from last year:

Tell HN: C Experts Panel – Ask us anything about C - https://news.ycombinator.com/item?id=22865357 - April 2020 (962 comments)


I commented this before on a previous post on one of beej’s guides: https://news.ycombinator.com/item?id=26100075

These tutorials are the gold standard of tutorials. I wish more content would be as straight to the point and easy to follow.


> It’s especially insidious because once you grok pointers, they’re suddenly easy. But up until that moment, they’re slippery eels.

I'm sort of a C beginner myself. I understand pointers, and I do remember they clicked in my mind suddenly. The moment before, I didn't understand at all. I also love the quirkiness of this guide. Definitely going to give this a read.


Do you mean the general concept, or something like: a is a pointer to an array of functions which return pointers to functions which return ints and take double arrays as parameters?

This somehow never really clicked (or actually it clicked and de-clicked somehow).


For me it was the practical understanding. I understood the concept of a pointer, but I wasn't confident in writing code that used them or (more importantly) reading such code. I would see an asterisk, multiple asterisks, or ampersands, and would get confused by the code. I do think some of the issues I encountered had to do with the notation of pointers: the asterisk serving as a symbol both to declare a pointer variable and to dereference one.


If you always read from right to left (and add occasional parentheses for readability) you can get through anything. The trickiest ones show up in job interviews, while the ones found in the wild will often make use of typedefs to break down the complexity.


I love Beej’s guide to network programming. Back when I was just starting to learn to code, I wanted to get right down to the low level C stuff and his guide was what I used. It was simple and approachable and I had a running TCP client by the end of the day. It was a thrilling experience for a young novice.


I see a lot of people offering opinions that clearly show they have not read the entire guide. For the intended audience the author wanted to reach, I will say that he accomplished his goal.

For anything one finds as a mistake, the author went out of his way (via references) to let the reader dig further.


From the creator of the famous Network programming in C guide.


I've enjoyed this guide a number of times, but each time I hit a brick wall trying to understand pointers. I'm still keen to learn, but it just doesn't 'click' for me...

Edit: poor grammar


Around 25 years ago I got so frustrated because I hit the same brick walls over and over, especially with advanced pointer stuff. It sounds silly today, but what helped me were some really basic books about C. IIRC the "for dummies" series and others.

It took another couple of years until I understood that what I lacked wasn't time spent reading another section on pointers but additional tooling. Using a debugger and stepping through programs was the next breakthrough.

Looking back today, understanding C (on UNIX) isn't just about the language; it's about a whole ecosystem of tools to measure what is going on and manipulate state so that I can troubleshoot. After gdb came valgrind, strace, lsof, signal handling (kill), process control, gcov, the appropriate type of CFLAGS to use (e.g. the compiler itself), and how to stay sane using Make.

None of them have to do with pointers, but they make life a lot easier. Becoming productive at this takes years, and becoming good took me decades. C (IMHO) isn't just another language but a complete career path with dozens of branches into other areas.

If you stay patient with yourself and treat it as a journey instead of a milestone, it can deepen your understanding of systems (nod to eBPF) in situations where many others will bail out long before.

Don't give up, and then not much will look scary any more.


Thanks so much for this reply, really quite inspiring. Much appreciated.


Ability to inspect is vital indeed


1) A pointer is a memory address of an object.

2) Pointer arithmetic takes into account the size of the type pointed to.

3) The asterisk operator gives you access to the object.

4) The ampersand operator gives you the pointer to the object.

If you understand this, you understand pointers. (The arrow operator is syntactic sugar.)
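
A minimal sketch of points 1-4 in code:

    #include <stdio.h>

    int main(void)
    {
        int x = 42;
        int *p = &x;        /* 4: & yields the address (pointer) of x */

        printf("%d\n", *p); /* 3: * accesses the object; prints 42 */
        *p = 7;             /* writes to x through the pointer */
        printf("%d\n", x);  /* prints 7 */

        p = p + 1;          /* 2: advances by sizeof(int), not one byte */
        return 0;
    }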


Pointers are easy and fun; just keep experimenting with them. What is essential is understanding that a pointer is nothing more than a piece of paper saying where something is, rather than being part of that something, and that pointers are the same size as the address registers of the underlying hardware, since they must be able to point anywhere in the addressable memory. Therefore, on a 32-bit system pointers will be 32 bits long, on a 64-bit system they will be 64 bits long, etc., no matter what they point to.

That is, both a pointer to a single byte and a pointer to a 10GB memory chunk will be the same size, because they just hold a memory location whose value represents where that data starts. Therefore, declaring pointers to a certain type doesn't change their size at all; it just becomes handy when one needs to go back and forth in a memory area in which objects of that type are stored one after another, so that incrementing or decrementing the pointer by a number actually means that number times the size of the objects. Imagine asking someone for directions and getting the reply "3rd door" or "3rd building" or "3rd block"; you were given a pointer that is always the same size, but how much you have to walk depends on the destination (size).
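
A quick way to convince yourself of this (the sizes are typical for a 64-bit system, not guaranteed by the standard):

    #include <stdio.h>

    int main(void)
    {
        char *small = NULL;
        double (*huge)[1000][1000] = NULL; /* pointer to an ~8 MB array */

        /* Typically prints "8 8": a pointer holds only an address,
           no matter how large the pointed-to object is. */
        printf("%zu %zu\n", sizeof small, sizeof huge);
        return 0;
    }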

Once you've grasped the above, it should become a lot easier; I had the same problems, then one day had a flash and they became totally clear (and fun).

It may be of help to experiment with a debugger, or simply printf-ing all the values a pointer assumes when declared, assigned, incremented/decremented, etc. Also keep track of the pointed-to data values: changing the data instead of the pointer, or the other way around, are common mistakes.

An old small command line program like "cdecl" can help a lot to understand complex declarations, and someone has even made an interactive webpage around it (cdecl.org).

example:

    cdecl> explain char (*(*x())[])()
    declare x as function returning pointer to array of pointer to function returning char


That is fascinating, thanks for sharing. What part of pointers is it that makes it so hard for you to grasp, if you have any thoughts?

Do you have any experience in other programming languages? How familiar are you with low-level computer architecture, at the CPU/memory level?

I guess one important part is to realize that in C, variables are basically names for memory locations, that in turn hold values. In other languages variables can be more abstract, and you have no idea how the value(s) are being associated with the variable name.

I started writing an example here, but I ran out of time and it wasn't good enough. :) Sorry.

EDIT: Now I've taken a look at the relevant pages in the guide itself, and it seemed to explain the concepts very clearly and easily, so ... I'm not sure how to help. :)


Thanks for this - I think you are on to something when you mention the low-level architecture; on reflection, this is where I need to fully understand the concepts. I'm messing around in this area with the usual (Nand to Tetris) type info/tutorials. I'll then come back to this manual, I'm sure. Like I say, I've enjoyed it quite a bit. It's certainly me lacking - not the book!


Something that might help is understanding a very fundamental thing about computers: they are very, very stupid.

Essentially, they don't/can't know anything about anything, at the most fundamental levels of their construction, and have to be hand-held every tiny step of the way.

A computer is essentially a highly complex arrangement of on/off switches - there's little else fundamentally in there doing anything at all, other than something causing the first switch to cycle between on and off states and cascade to all the rest (this isn't entirely accurate, but it's close enough to make the point).

This gives rise to situations where in order to create greater levels of complexity, lots of unintuitive, and seemingly even pointless things need to be done. For example: assigning letterbox addresses to every discrete portion of memory. Then things like "I want to read the values from this part of memory up to this part" require laboriously adding 1 to a value (a pointer) that tracks which address the computer is currently "thinking" about. It's so stupid it needs to remember where it is all the time like this, or it can't do anything.

Because C is very close to this mundane and laborious fundamental architecture, it (usefully in that case) deals with concepts like "pointers".

In languages at just a bit higher level, the language internals deal with pointers so that we as programmers don't have to.


One tool I use with my own students to visually demonstrate C pointer code is CTutor from PythonTutor.

http://pythontutor.com/c.html#mode=edit

If you click down to the bottom and choose "Pointer Levels" from the examples, you'll get a good idea of the tool's powers.


One analogy I’ve used to teach pointers is to think of them as phone numbers. They’re an identifier used to get ahold of something when you might not otherwise know exactly where it is. If you call my number, I’ll pick up. But if I transfer my number to someone else, that doesn’t mean anything happened to me personally; the number just isn’t pointing to me anymore.


A pointer is a level of indirection.

A regular variable is a place in RAM that contains a certain value.

A pointer is a variable whose value is the address of another variable.

X = 5

Y = location in memory of X

Now you can use Y to manipulate X
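
In C syntax, a literal translation of that sketch:

    #include <stdio.h>

    int main(void)
    {
        int x = 5;    /* a place in RAM containing the value 5 */
        int *y = &x;  /* y's value is the address of x */

        *y = 10;           /* manipulate x through y */
        printf("%d\n", x); /* prints 10 */
        return 0;
    }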


Try watching this: https://youtu.be/443UNeGrFoM

It's long and filled with other stuff about C programming, but it goes into depth about how to think about pointers. Good luck!


Try to explain where you're stuck.


Just another integral data type which supports not only arithmetic (adding or removing an integral value), but also "dereferencing" which means getting a value of the pointed-to type at the pointed-to address.

The quantities added or removed in arithmetic operations are the same as the size of the type that is being pointed to.


It's incorrect to call a pointer an integral type. It doesn't behave like an integer (you can't add two pointers together). You can't even subtract two pointers from each other, unless they point into the same array (or one element past it), without causing undefined behaviour. You can't even reliably convert a pointer to an integer and back again unless it's a void *.

Edit: just remembered I wrote a rambling on this in 2014: https://ramblings.implicit.net/c/2014/04/21/pointers-are-not...
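
A small illustration of the subtraction rule:

    #include <stddef.h>
    #include <stdio.h>

    int main(void)
    {
        int a[10];
        int x = 0, y = 0;

        ptrdiff_t d = &a[7] - &a[2]; /* well-defined: same array, d == 5 */
        printf("%td\n", d);

        /* &x - &y would be undefined behavior: x and y are separate
           objects, even though both live somewhere in memory. */
        (void)x;
        (void)y;
        return 0;
    }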


Strictly it's not an integral type as you point out, but for the purpose of introducing a beginner who might be afraid that there is "deep magic" involved, it can be a helpful comparison.

Subtracting two pointers yields ptrdiff_t which is not a pointer type in itself.


Pretending it's an integral type leads to all sorts of conceptual mistakes, imho. It has almost no comparable semantics to integers.


Just learn C by reading and working through the K&R book. (It's one of the best CS books ever written.)

https://www.amazon.com/Programming-Language-2nd-Brian-Kernig...

Also, there's nothing magical about pointers - scripting languages use "handles", which is the same thing except they're read-only to the end-user programmer.

The real challenge with C is multi-threaded programming, so don't do that if you don't need it.


Personally, I learned C by following the CS50 course on Harvard's YouTube channel, and pointers came easily with http://alumni.cs.ucr.edu/~pdiloren/C++_Pointers/.

It's a great (free as in beer) start.


I agree. If you're only going to read one book, that's the book. :-)


> Just learn C by reading and working through the K&R book. (It's one of the best CS books ever written.)

This is really an amazing feat: books on programming languages age really fast. This one - not so much. And you can really appreciate the careful thought that was put in each sentence. Peerless.

> The real challenge with C is multi-threaded programming, so don't do that if you don't need it.

Well, these days it's harder and harder to avoid it if you want to write efficient apps - we got more cores and the clocks remain more or less the same.


> Well, these days it's harder and harder to avoid it if you want to write efficient apps - we got more cores and the clocks remain more or less the same.

There was some article a few years ago that said the apps they looked at didn't actually run faster after making some routines parallel, so it depends, and you will definitely have more debugging to do.


I haven't read these 2 books, but would like the opinion of someone who did.

[1] C Interfaces and Implementations: Techniques for Creating Reusable Software by David Hanson - HN's tptacek seemed to rave about this book, that's how I heard of it. Wonder what he thinks of it in 2021.

[2] C Programming: A Modern Approach by K. N. King - this one seems to be loved by many. Seems to be more 'beginner-friendly' than the 1st one I guess.


You can always check reviews published by ACCU.org:

K. N. King "C Programming":

https://accu.org/bookreviews/1999/graham_1260/

Ben Klemens "21st Century C":

https://accu.org/bookreviews/2016/demin_1882/

Robert C. Seacord "Effective C":

https://accu.org/bookreviews/2020/glassborow_1952/

https://accu.org/bookreviews/2021/bruntlett_1959/


Wow I never heard of Robert Seacord's 'effective C'. Thank you.

Edit: Here's a short rationale for the book by the author.

https://ieeexplore.ieee.org/document/9237323


It remains my favorite C book, though: I don't think you should write much C anymore.


My favorite is Expert C Programming: Deep C Secrets by Peter van der Linden. I started it as soon as I encountered C at university and could only get through about a chapter at a time. By third year, when the assignments in C got hard, I was in a great position, partly thanks to this book. Even crazier is that I enjoyed it.


My only complaint about that book is the title - by the time I'd got to it I was nodding along to things I'd figured out long before or nitpicking things that were "merely" right enough to be useful. Maybe I was already solidly expert, but I feel like for most things programming the knowledge available is so broad that there will be some significant gaps between experts; in any case, I wish I had picked it up ten years prior.


Is there any guide that operates at the level of both the C abstract machine and a real platform? It's always really jarring to have lots of hand-wavy statements like

> When compiling C, machine code is generated. This is the 1s and 0s that can be executed directly by the CPU.

No! Tell me about how the code is translated into an ELF executable, linked, has its memory laid out by the OS and then executed.

> I’m seriously oversimplifying how modern memory works, here. But the mental model works, so please forgive me

No! Tell me about how memory works in the C abstract machine which is what you can actually program against and guaranteed by the compiler.

> Nothing of yours gets called before main(). In the case of our example, this works fine since all we want to do is print a line and exit

No! Tell me that main is special because it's mapped to the _start symbol, or at least eventually jumped into by code at that symbol, whose address is stored by the linker in e_entry.

Like I might be the weird one but this kind of writing (which is common to seemingly all C texts) confuses me more than if it had just been explained.


It actually has working examples for all of the C library calls, even the math routines. Not sure if it's complete, but it seems so. This seriously helps bridge the gap between man (2) pages and putting things into working code, and the only beef I have is that I didn't have it 25 years ago. Very cool.


It's not complete... There are tons of functions in C11+. I'm whittling away at it.

Putting in the examples for all the calls--I stole that idea from The Turbo C Bible, a book I really loved back in the day... because of the examples.


Nit: most of the C library calls are in section 3; section 2 is system calls.


yepp (me facepalm)


Beej's Guide to Network Programming helped me survive a grad-level computer networking class as a naive undergrad in college.

If his guide to C is anywhere near as good it should be an awesome resource.


The C Guide is still rough around the edges (and even some in the center). But I'm working on it! :-)


Again, the string chapter is telling us about zero-terminated char arrays/pointers and doesn't even mention Unicode/UTF-8 or safety :-(


I do have a chapter on Unicode and wide characters, but it's separate from the "classic" strings chapter. That said, I could certainly refer forward to it.

And C11 only has minimal portable UTF-8 support, but I do talk about it. I think C21 will improve on that a bit.

A note on safety would be well worth it. I'll do that. Good suggestion.


Cool. Thank you. Lack of coverage of this subject in educational materials has been a pain ever since I studied C at school. All the books said the same thing: strings are zero-terminated char arrays. But this kind of string is almost useless in today's globalized world. Sorry I didn't notice you cover this topic in a separate chapter. This makes your guide really great. Let me suggest you mention that chapter in the strings chapter, unless you have a reason to omit such a reference. Thank you for a great job!


A thing I really enjoy about this guide is that it’s close to common C paradigms and practices. Many guides lack this and only show outdated ones.


I love C programming. Even though people say it's dangerous and easy to shoot yourself in the foot, it really is the simplest and most elegant way forward. That said, having to reinvent the wheel yourself so much, it is not as time efficient as some more modern languages.

I liken it to an artisanal craftsman's tool, versus a modern multi-tool like a Dremel, which would be something like Python.


Totally agree - sometimes I solve small problems in C rather than a scripting language. It feels like freedom and the result is so damn fast.


Just found out that it has an actively maintained GitHub repository, with 35 or more commits this April alone. https://github.com/beejjorgensen/bgc/commits/main


This is a wonderful book if you want to understand how the OSI model of the internet works, with clearly illustrated network APIs. Man, this book was amazing.


I think you’re confusing it with another book. This is a guide to C programming. You’re probably thinking of Beej's Guide to Network Programming :)

https://beej.us/guide/bgnet/


Thank you for posting!


Thanks for sharing this


[flagged]


The Rust Programming Language[0] is the officially endorsed guide. I thought it was a good starting point.

[0] https://doc.rust-lang.org/book/



