Essential C (2003) [pdf] (stanford.edu)
183 points by th33ngineer on Sept 2, 2020 | 82 comments



This, paired with their Pointers and Memory [1] guide, is how I learned C in college. They're both pretty short and to the point; I would highly recommend them.

[1] http://cslibrary.stanford.edu/102/PointersAndMemory.pdf


That and other useful links are listed on the parent page of this PDF.

http://cslibrary.stanford.edu/101/



> the greatest pointer/recursion problem ever (advanceed)

Does 'advanceed' mean 'more advanced than advanced?' [1]

[1] https://www.youtube.com/watch?v=YAYKnnWCzto


Now then, isn't that nicer? (editorial) Build programs that do something cool rather than programs which flex the language's syntax. Syntax -- who cares?

Although some feel it's too opinionated to belong in this article, I really appreciated the above. Edit: To be clear, the author is advocating for simpler syntax here to increase program readability. This could be taken other ways.


> Relying on the difference between the pre and post variations of these operators is a classic area of C programmer ego showmanship.

Ugh. He's talking about inline use of post-increment and pre-increment (i.e. x++ and ++x) here. This is perfectly readable to a C programmer, and sidestepping them actually makes the code harder to understand.


Can you give an example where not using them 'inline' makes code harder to understand?


Consider the idiomatic way of iterating backward through an array:

  for(i=n; i-- > 0 ;)
    { /* operate on a[i] */ }
Converting the i-- to a statement at the start of the block makes it less clear that it's part of the iteration idiom rather than an ad hoc adjustment specific to this particular logic. There are other examples, but they're either more involved or the statementification is less obviously wrong.


Hmm, I think `for (i = n - 1; i >= 0; --i)` is way clearer and maybe more common?

edit: Ah unsigned underflow. :O


Yeah, so then you write

    for (size_t i = n-1; i < n; --i) { /* operate on a[i] */ }
It works fine (unsigned overflow is well defined) but it's even less clear.


It seems sensible to always just use signed values for indices. Indices are difference types, which should include negative values so that you can subtract two indices and get a sane delta. The range of signed values seems 'big enough.'


> Indices are difference types

Umm, no? Indices are ordinals[0], forming the canonical/nominal well-ordering of a collection such as an array.

> an ordinal number, or ordinal, is one generalization of the concept of a natural number that is used to describe a way to arrange a (possibly infinite) collection of objects in order, one after another. [...] Ordinal numbers are thus the "labels" needed to arrange collections of objects in order.

0: https://en.wikipedia.org/wiki/Ordinal_number


In C an index is a difference that you add to a pointer to get a pointer. `a[i]` is `*(a + i)`. Given two indices `i` and `j`, you want `i - j` to be such that `a[j + (i - j)]` is `a[i]`, and it then makes sense to me that `i - j` is signed. The expression works out whether they are signed or unsigned, but just in terms of their interpretation on the part of a user (eg. "oh this is 2 elements before bc. it says -2") or so that comparisons like `i < j` are equivalent to `i - j < 0` and so on. That's why it's always made sense to me to use `ptrdiff_t` (or just `int`) for an index, vs. using `size_t`.
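To make that concrete (a small sketch with made-up values; pointer subtraction yields ptrdiff_t):

    #include <stddef.h>
    #include <stdio.h>

    int main(void) {
        int a[8] = {0};
        ptrdiff_t i = 2, j = 4;
        ptrdiff_t d = &a[i] - &a[j];        /* -2: "this is 2 elements before" */
        printf("%td\n", d);
        printf("%d\n", &a[j + d] == &a[i]); /* a[j + (i - j)] is a[i]: prints 1 */
        return 0;
    }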


ptrdiff_t exists for subtraction between pointers that produce negative values. But how many times have you ever needed to subtract p and q where p represents an array element at a higher index than q? For that matter, how many times have you ever needed to add a negative integer to a pointer?

In C an object can be larger than PTRDIFF_MAX, a real possibility in modern 32-bit environments. (Some libc's have been modified to fail malloc invocations that large, but mmap can suffice.) Because pointer subtraction is represented as ptrdiff_t, the expression &a[n] - a could produce undefined behavior where n is > PTRDIFF_MAX. But a + n is well defined behavior for all positive n (signed or unsigned) as long as the size of a is >= n.

There's an asymmetry between pointer-pointer arithmetic and pointer-integer arithmetic; they behave differently and have different semantics. Pointers are a powerful concept, but like most powerful concepts the abstraction can leak and produce aberrations. I realize opinions vary on whether to prefer signed vs unsigned indices and object sizes (IME, the camps tend to split into C vs C++ programmers), but the choice shouldn't be predicated on the semantics of C pointers because those semantics alone don't favor one over the other.


Negative offsets are often used to access fields in a parent struct when you only have a pointer to one of its fields, for example to implement garbage collection or a string type.


That should be:

  Parent* p = container_of(field,Parent,pa_somefield);
  access(p->pa_otherfield);
You'd usually define container_of using subtraction (not negative offset per se):

  #define container_of(FIELD,TYPE,MEMB) ({ \
   const typeof( ((TYPE*)0)->MEMB )* _mptr = (FIELD); \
   (TYPE*)( (char*)_mptr - __builtin_offsetof(TYPE,MEMB) ); \
   })
but you shouldn't actually be using that directly, because that's what the macro is for.
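Putting it together (a rough sketch that uses the container_of macro above; Obj and its fields are made up for illustration):

    #include <stdio.h>

    typedef struct {
        int refcount;   /* e.g. a GC/refcount header */
        int value;      /* the field we happen to have a pointer to */
    } Obj;

    int main(void) {
        Obj o = { 1, 42 };
        int* field = &o.value;                         /* only the field pointer gets handed around */
        Obj* parent = container_of(field, Obj, value); /* recover the containing struct */
        printf("%d\n", parent->refcount);              /* prints 1 */
        return 0;
    }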


But p - 2 is not the same as p + -2, and it's not clear in your example whether the former suffices or the latter is required. I can definitely imagine examples where the latter is required--certainly C clearly accommodates this usage--but IME it's nonetheless a rare scenario and not something that could, alone, justify always using signed offsets. Pointers are intrinsically neither signed nor unsigned; it's how you use them that matters.


The reason we are in this subthread is my message about being bitten by unsigned underflow when iterating backwards using an unsigned `i`.


The C language "de facto" uses size_t for indexing and ptrdiff_t for differences, or the rare case where you have a negative index.


size_t is unsigned? Since when?


It always has been. C89, 4.1.5[1]:

> The types are [...] size_t which is the unsigned integral type of the result of the sizeof operator

(Emphasis mine.)

1. https://port70.net/~nsz/c/c89/c89-draft.html#4.1.5


Couldn't find an online version of the C standard with links to parts of it, but here's one for C++: http://eel.is/c++draft/support.types#layout-3

> The type size_­t is an implementation-defined unsigned integer type that is large enough to contain the size in bytes of any object ([expr.sizeof]).


Yes. ssize_t is signed.


ssize_t should never be used.

It's not guaranteed to have a full negative range, only to be able to represent -1.

Use ptrdiff_t as a signed size type.


Out of curiosity, do you know of any implementations where ssize_t has that kind of range limitation?


Nope. But I do know of at least one implementation where it's not present at all—msvcrt. ssize_t isn't specified in the C standard; it's part of POSIX. ptrdiff_t is standard.


That's the idiomatic way? Cool. The more straightforward-looking way,

    for(i = n-1; i >= 0; i--)
        { /* operate on a[i] */ }
breaks if i is unsigned, like a size_t.


Yep. That's why it's an idiom, rather than an obvious-way-of-doing-it-that-anyone-competent-would-use.


You can't beat

    return x++;
:)


Why? This should be straightforward.


Imagine operating on something like a stack.

  x = *stack--; // pop 'x' off of the stack
  *++stack = y; // push 'y' onto the stack
This way is simple, direct, and it avoids inconsistent state.


I don't see how this proves the point.

For someone who doesn't have the operator precedence rules memorized, it isn't clear whether the above code means this:

    x = *stack;
    stack--;
or this:

    stack--;
    x = *stack;
Combining those two operations into one line is a trade-off I will never agree with. And I'm a fan of C myself: https://gist.github.com/cellularmitosis/3327379b151445c602ad... https://gist.github.com/cellularmitosis/d8d4034c82b0ef817913...

The two-liner is actually the one which is simpler and more direct, as it requires less knowledge of operator precedence rules. The one-liner and two-liner compile to the same number of instructions, so I don't see how either "avoids inconsistent state".

Many expert-level C programmers tend towards one-liners. Here's an example from the original "Red book":

    c = ((((i&0x8)==0)^((j&0x8))==))*255;
nooooo don't do it sadpanda.jpg


> The one-liner and two-liner compile to the same number of instructions, so I don't see how either "avoids inconsistent state".

It's not about performance, or thread safety, or anything like that; it's about having a coherent mental model of the code. A statement should, if possible, represent a single, complete operation. Invariants should not be violated by a statement, with respect to its environment. (This is more true for 'push' than 'pop'.) One way of solving that is to bundle the 'push' and 'pop' operations up into functions; someone else in this thread did that. But why bother with the mental overhead of a function call when you could just represent the operation directly? To be sure, there are cases where the abstraction is warranted, but a two- or three-line stack operation isn't abstraction, it's just indirection.

> For someone who doesn't have the operator precedence rules memorized, it isn't clear whether the above code means [snipped] or [snipped]

> The two-liner [...] requires less knowledge of operator precedence rules

It's not operator precedence—that's a separate issue; despite having implemented C operator precedence, I don't know all of them by heart—but simply the behaviour of the pre- and post-increment/decrement operations. It's even mnemonic—when the increment symbol goes before the thing being incremented, the increment happens first; else after—but even if not, it's a fairly basic language feature.

Even beyond that, though, it's an idiom. Code is not written in a vacuum. Patterns of pre- and post-increment fall into common use over time and become part of an established lexicon which is not specified anywhere. Natural language works the same way. Nothing wrong with that.
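For the record, the mnemonic in code (trivial, but that's the whole feature):

    int i = 5;
    int a = i++;   /* post: a == 5, i == 6 -- use the value, then increment */
    int b = ++i;   /* pre:  b == 7, i == 7 -- increment, then use the value */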


> It's not operator precedence—that's a separate issue

> It's even mnemonic—when the increment symbol goes before the thing being incremented, the increment happens first; else after—but even if not, it's a fairly basic language feature.

I think you missed the issue.

This is 100% about operator precedence, and has nothing to do with the decrement operator being in front of or behind the variable.

This expression:

    *stack--
means either this:

    (*stack)--
or this:

    *(stack--)
depending on the operator precedence rules.

If this is the layout of memory:

             ~~~~~~
    stack-1: | 52 |
    stack:   | 23 |
    stack+1: | 19 |
             ~~~~~~
(* stack)-- evaluates to 22, while *(stack--) evaluates to 52.

https://godbolt.org/z/P7Ghfc


> operator precedence

Right, yes. I got confused by your example, because the example is definitely about pre- vs post-increment. My point about idioms still stands, though.

> (* stack)-- evaluates to 22, while * (stack--) evaluates to 52.

Actually, (* stack)-- evaluates to 23, but changes *stack to 22 :)


> Actually, (* stack)-- evaluates to 23, but changes *stack to 22 :)

Thanks, I just realized I had that wrong :) https://pastebin.com/n7sHzW3p


Saving characters on spacing is a terrible thing to do. In fact, that jumble is missing a zero on the equality, which is made less evident because the characters are not spaced in a way that makes this mistake obvious.


    int pop_int () 
    {
      int x = *stack; 
      --stack;
      return x;
    }

    void push_int(int x)
    {
      ++stack;
      *stack = x;
    }

Genuine questions:

- Is this worse?

- How does the state get inconsistent?


For one it’s three and two lines for what is two logical operations. I assume the “inconsistent state” is the time between the lines where the stack is not truly in the right state-many people prefer to preserve their invariants as much as possible.


it will produce indistinguishable assembly language, no?


The use of that construct is mainly a stylistic choice. On any compiler from this millennium there should be no difference in the code that it produces.


Yep so if we're going with style I'm very happy with the functions dashed off there. Nobody will confuse those even when very, very tired (similar effect on the brain to being drunk). There is zero difference in the generated output. Calling those functions tells you exactly what they are and what they do. Vertical space is not an issue at all with 3 line functions.

Relying on post-increment? Make sure it's a one line block that is totally unbraced with only single letter variable names if you do it because otherwise it's just faux-macho C and that's /weak/.


> Make sure it's a one line block that is totally unbraced with only single letter variable names if you do it because otherwise it's just faux-macho C and that's /weak/.

I think you're projecting. The point being made was that when you're writing a simple stack (as you often might do in C, since the standard library and the language itself conspire against providing you one) and you don't want the overhead of writing multiple functions to wrap it up (vertical space is an issue when you make more than one of these–trust me, I used to write Java and everything about it was just a papercut in verbosity), the post- and pre-increment versions are concise, idiomatic, and–to be honest–more clear simply because they use the operators in the way that they are meant to be used. I can glance at them and see, OK, this one gives me whatever the stack is pointing to and then makes it point to the next element; this one first moves the pointer to the next element (which is free) and sets it. All in one line. There's nothing to show off here, this is just how you write C; those operators exist for exactly this purpose (and IMO single-letter variable names are generally only a good idea in the smallest of scopes, and I personally use braces even when optional).


Sorry, no. That's not aimed at you in particular; it's just a general comment on macho C, which I think we've all seen.

    int abc(int a, int b, int c)
    {

    }
I can do postincrement. I learned C the macho way. We all still have to read that crap. Now I know better when I'm writing it. I strongly disagree that

    a = *stack--;
    *++stack = b;
is better in any way beyond "I'm a macho C guy" than

    a = pop_int();
    push_int(b);
https://en.wikipedia.org/wiki/Duff%27s_device

It's fun when you first see it. Sure.


Now the stack pointer is hidden. Is there only a single global stack?

I agree with saagarjha: there is nothing unclear or "crap" about using basic operators in an idiomatic way.


If we're being serious about a stack you really /need/ to access it through functions so you can switch instrumentation on and off, e.g. bounds checks & backtrace on failure, poisoning, etc.

But this is as much beside the point under discussion as global pointers you raise.

Post-increment is an artefact from PDP-11 assembler and maps to a single instruction there. That's where it came from quite directly. It's completely unnecessary. Most modern languages find it useless enough that they remove it. Python gets by fine without it, relying on +=, for example. (Although some do repeat C mistakes when basing their syntax on C, e.g. the unbraced, single-line block, which serves only to add a non-zero probability of introducing a future bug with the benefit of precisely nothing. Hi Walter! Larry Wall cops flack for Perl syntax but he did not copy that.)

Post-increment is hardly the end of the world; it just isn't useful. It doesn't help readability. It can harm it. As a question of taste I find it lacking.

But hey, everyone else uses it, and Duff's device is fun to read, so go with them, knock yourself out.


I highly recommend the CS50 course to get familiar with C:

https://www.youtube.com/playlist?list=PLhQjrBD2T381L3iZyDTxR...

Sure, it doesn't go into details about the language, but you get the essentials and the videos are great.


I'm doing CS50x at the moment and I can definitely recommend it. It got me interested in C despite trying to avoid it my entire life. David Malan is one of the best lecturers I've seen.


It's how I started my career 7 years ago. Amazing course and lecturer


There is also Jens Gustedt's Modern C:

https://modernc.gforge.inria.fr/#org81433c2


I prefer something that is more up to date https://nostarch.com/Effective_C


While we are here, does anyone have a good resource about memory management strategies in C?

Topics:

* Best practices for C function signatures (caller allocates (which size?), callee allocates (where? which allocator?))

* Memory Ownership Models

* Borrowing

* Reference Counting

* Garbage Collectors and C-Libraries providing this functionality

* Interning Objects (Strings)

* RAII [1]? And its benefits/flaws

[1] https://en.wikipedia.org/wiki/Resource_acquisition_is_initia...

Context: I feel that understanding the C memory primitives is not that hard (stack variables, malloc/free, C++'s new). But how to use them is devilishly tricky. I have seen little information about this.
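To illustrate the first bullet, the two signature conventions I have in mind look roughly like this (a sketch; the names are made up):

    #include <stddef.h> /* size_t */

    /* Caller allocates: caller owns the buffer and passes its capacity.
       But which size should the caller pick? */
    size_t get_name(char* buf, size_t buf_size);

    /* Callee allocates: returns a malloc'd buffer the caller must free().
       Or should it take an allocator parameter? Which one? */
    char* get_name_alloc(void);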


Shameless plug:

This generally discusses the lack of RAII in C (towards the end), and what to do about it:

https://floooh.github.io/2019/09/27/modern-c-for-cpp-peeps.h...

...and this presents a (reasonably runtime-safe) general memory management strategy using tagged-index-handles instead of pointers:

https://floooh.github.io/2018/06/17/handles-vs-pointers.html

The gist is basically:

- don't allocate small chunks of memory all over the code base, instead move memory management into few centralized systems, and let those systems own all memory they allocate

- don't use pointers as public "object references", instead use "tagged index handles"

- don't use "owning pointers" at all, use pointers only as short-lived "immutable borrow references"
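A rough sketch of what a tagged index handle can look like (details vary a lot between systems; this is just the general idea):

    #include <stdint.h>
    #include <stdbool.h>

    /* 32-bit handle: low 16 bits = slot index, high 16 bits = generation tag */
    typedef struct { uint32_t id; } handle_t;

    #define MAX_ITEMS 1024
    static uint16_t generations[MAX_ITEMS];   /* bumped whenever a slot is freed */

    static handle_t make_handle(uint16_t slot) {
        return (handle_t){ ((uint32_t)generations[slot] << 16) | slot };
    }

    static bool handle_valid(handle_t h) {
        uint16_t slot = (uint16_t)(h.id & 0xFFFFu);
        uint16_t gen  = (uint16_t)(h.id >> 16);
        return slot < MAX_ITEMS && generations[slot] == gen;  /* stale handles fail here */
    }

A dangling handle doesn't scribble over memory like a dangling pointer; the generation check just rejects it.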


Another case of global (imperative style) vs. object (code and data grouped by object) vs. functional ...


Thanks for this!


"Teach Yourself C in 24 Hours" by Tony Zhang (1997) seems interesting as well:

http://aelinik.free.fr/c/

A previous HN discussion on that book:

https://news.ycombinator.com/item?id=15624521

EDIT: That earlier discussion has an excellent first post. Quoting:

"It bothers me so much that very few books (Kernighan) talk about WHY. WHY. WHY is a variable needed? WHY is a function needed? WHY do we use OOP? Every single book out there jumps straight into explaining objects, how to create them, constructors, blah blah blah. No one fricking talks about what's the point of all this?

Teaching syntax is like muscle memory for learning Guitar. It is trivial and simply takes time. Syntax - everyone can learn and it is only one part of learning how to code. Concepts are explained on their own without building upon it.

[... A list with learning resources the poster finds great ...]

This is learning how to produce music. Not learning the F chord. Teaching how to code is fundamentally broken and very few books/courses do it well."


> Now then, isn't that nicer? (editorial) Build programs that do something cool rather than programs which flex the language's syntax. Syntax -- who cares?

I never really got around to learning Haskell or Lisp until recently; it was always this --- I can do everything with C/C++/Java/Python, and I could. But the thing is, it was only after learning Lisp that I really got the hang of thinking in a top-down manner (recursively), and for that matter it took Haskell to teach me composition intuitively, which could then be extended to my main language (C++). I understand that syntax doesn't matter much, but FWIW I still think in terms of Lisp syntax when writing recursive code in C++/C. So yeah, take that for what you will.


This guide is short (which is always nice), but it has a couple of flaws in places, from the brief skim I gave it. For example:

> In particular, if you are designing a function that will be implemented on several different machines, it is a good idea to use typedefs to set up types like Int32 for 32 bit int and Int16 for 16 bit int.

Use <stdint.h> please

> The char constant 'A' is really just a synonym for the ordinary integer value 65 which is the ASCII value

Not always, especially right after you came off a paragraph explaining how different machines have implementation-specific behaviors

> The compiler can do whatever it wants in overflow situations -- typically the high order bits just vanish.

This is a good time to explain what undefined behavior actually means

> The // comment form is so handy that many C compilers now also support it, although it is not technically part of the C language.

Part of the language since C99

> C does not have a distinct boolean type

_Bool since C99

> Relying on the difference between the pre and post variations of these operators is a classic area of C programmer ego showmanship.

I'm fine with you mentioning that this can be tricky, but this is more opinion than I am comfortable with in an introductory text

> The value 0 is false, anything else is true. The operators evaluate left to right and stop as soon as the truth or falsity of the expression can be deduced. (Such operators are called "short circuiting") In ANSI C, these are furthermore guaranteed to use 1 to represent true, and not just some random non-zero bit pattern.

Under the assumption that there are no boolean types from earlier, this is not true

> The do-while is an unpopular area of the language, most everyone tries to use the straight while if at all possible.

I would argue that people use do-while more than they need to

> I generally stick the * on the left with the type.

Not a problem, but :(

> The & operator is one of the ways that pointers are set to point to things. The & operator computes a pointer to the argument to its right. The argument can be any variable which takes up space in the stack or heap

And constants/globals

> To avoid buffer overflow attacks, production code should check the size of the data first, to make sure it fits in the destination string. See the strlcpy() function in Appendix A.

strlcpy is non-standard and probably not what you want

> The programmer is allowed to cast any pointer type to any other pointer type like this to change the code the compiler generates.

> p = (int * ) ( ((char * )p) + 12); // [Some spaces added by me to prevent Hacker News from eating the formatting]

Only in some very specific cases…

> Because the block pointer returned by malloc() is a void* (i.e. it makes no claim about the type of its pointee), a cast will probably be required when storing the void* pointer into a regular typed pointer.

Casting malloc is never required (and I would say usually not a good thing to do)
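The idiomatic C version is just (a small sketch, assumed to be inside a function):

    #include <stdlib.h>

    size_t n = 100;                    /* however many elements you need */
    int* p = malloc(n * sizeof *p);    /* no cast needed: void* converts implicitly in C */
    if (p == NULL) { /* handle allocation failure */ }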


>> The value 0 is false, anything else is true. The operators evaluate left to right and stop as soon as the truth or falsity of the expression can be deduced. (Such operators are called "short circuiting") In ANSI C, these are furthermore guaranteed to use 1 to represent true, and not just some random non-zero bit pattern.

> Under the assumption that there are no boolean types from earlier, this is not true

Actually, I believe _Bool is guaranteed to use 0 for false, and any non-0 value is stored as 1 for true. Arithmetic on _Bool is also guaranteed, based on those values.

For example, I believe the standard guarantees:

    _Bool x = 255;
    assert(x == 1);

    size_t y = 10 + x;
    assert(y == 11);


This is exactly why I included the earlier bit where they claimed there was no boolean type: to show that their conclusion is logically inconsistent rather than just incomplete ;)


>> I generally stick the * on the left with the type.

> Not a problem, but :(

In a declaration, * is a type modifier. E.g. `int a;` declares a variable of type "int" named "a". `int* a;` declares a variable of type "pointer to int" named "a".

The only time this doesn't work is if you stick multiple declarations on the same line. That's annoying to me, because it means you're breaking the "declare variables as close to their first use as possible" practice. It's not K&R C any more, you don't need to declare everything all at once at the top of the scope.

Also, to prove that `*` in a declaration is part of the type, note that function declarations with no argument names, just types, are still valid C (though I'd discourage their use). So `void func(int* a, int* b);` is identical to `void func(int*, int*);`. It's very different from the `void func(int, int);` that you'd get if you assume the `*` goes with the `a`.


It took Microsoft 16 years to support most of C99 in MSVC, and they are still not completely done after 21 years. I think for a document last updated in 2003 it's ok not being based on C99 ;)


> Under the assumption that there are no boolean types from earlier, this is not true

Can you elaborate on this one? I thought && and || expressions always evaluated to 0 or 1.


C99 added type _Bool (also called bool if you #include <stdbool.h>) -- but it doesn't actually use it much. Operators that yield logically "boolean" values (<, <=, >, >=, ==, !=, &&, ||, !) yield values of type int, not _Bool. The value is always 0 or 1 -- in contrast with isdigit(), for example, which is specified to return 0 for false and some unspecified non-zero value for true.

Converting any scalar value (that includes pointers) to _Bool yields 0 or 1 (false or true).
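A quick way to see both points (a small sketch):

    #include <stdio.h>
    #include <stdbool.h>

    int main(void) {
        /* comparison/logical operators yield int, not _Bool: */
        printf("%d\n", sizeof(1 < 2) == sizeof(int));   /* prints 1 */

        /* converting any nonzero scalar to _Bool gives 1: */
        bool b = 42;
        printf("%d\n", b);                              /* prints 1 */
        return 0;
    }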


Now that I think about it, I think that depends on what you mean by "use". I was commenting from a perspective that you can pass in something that is not 0 or 1, and in general it is not advised to assume that a "boolean" is 0 or 1 especially given that this document doesn't mention the boolean type (which is guaranteed to have those values). This is true even under ANSI C, because as it mentions later, programmers depend on any nonzero value being "truthy".



Normal C is not difficult. But once you get into using it for, say, Game Boy hacking (I still remember the confusion over data, tiles, etc.) or basic bare-metal programming on a small machine, it gets hard.


I don't want to learn C, but for certain types of applications there's really no practical alternative.


I typically see Rust marketed as a better replacement for C. In what cases isn't it a practical replacement?


Rust is a fine replacement for C++, but not really for C, and the reasons why Rust can't be a replacement for C are very similar to why C++ can't be a replacement for C. Everybody who chooses C today has slightly different reasons to do so, but one important reason is that C is a very small language with a very small standard library, and most parts of the standard library can be ignored without losing any of the "qualities" of the C language. IME a small language feature set makes it easier to evaluate and integrate third-party code; it's hard to say exactly why that is, but that's my experience anyway. Of course one could say "nobody forces you to use the whole feature set of Rust", but the same has been said in the C++ world for decades. The problem is that everybody selects their own subset of the language when having the choice.


Core Rust is a very small language. It doesn't have a memory allocator, so it can be used in embedded environments where a heap is not available. C++ cannot do that.

https://doc.rust-lang.org/core/index.html


It's a replacement for C++, not C.


When you're targeting platforms that Rust doesn't, have a lot of legacy code that can't take on a new toolchain, or cannot handle the dependencies.


I learned from Sam's C Primer Plus.

https://www.amazon.com/Primer-Plus-5th-Stephen-Prata/dp/0672...

My version was older than the Amazon version as I learned in 1987.


This was the PDF used in my UNIX class! That brings back some memories...


A great resource!


I also liked his "Essential Perl" back in the day: http://cslibrary.stanford.edu/108/EssentialPerl.pdf


This is a pretty neat guide if you’re cheap and have moral qualms about pirating K&R. Still, I think the best introduction to C remains K&R.


K&R is woefully out of date. It gives you no info on how to do things safely and sanely, and it encourages a leet style of programming that results in catastrophic edge-case bugs, as you can see in the comments above where naive code that iterates backwards through an array fails when the array size is 0. Worse, K&R leet style buys you absolutely nothing with an optimizing compiler written in the last 30 years.


K&R is actually fairly reasonable about performance, and much less l33t than code I have seen in the wild. It is a very good introduction to the language, and although I would not ever call it "woefully out of date", I would say that it is a good idea to read more about the current state of bugs and tooling, which is not discussed because the book is a general overview of the language.


Eh, K&R Second Edition is a very useful book even today. The only downside is that it stops at C89 and hasn't been updated for C99.


I hate guides showing me how to do things safely and sanely from the get-go. Show me how to do it. If safety and sanity are priorities for me, I’ll seek those out on my own.



