This, paired with their Pointers and Memory [1] guide, is how I learned C in college. They're both pretty short and to the point; I would highly recommend them.
> Now then, isn't that nicer? (editorial) Build programs that do something cool rather than programs which flex the language's syntax. Syntax -- who cares?
Although some feel it's too opinionated to belong in this article, I really appreciated the above.
Edit: To be clear, the author is advocating for simpler syntax here to increase program readability. This could be taken other ways.
> Relying on the difference between the pre and post variations of these operators is a classic area of C programmer ego showmanship.
Ugh. He's talking about inline use of post-increment and pre-increment (i.e. x++ and ++x) here. This is perfectly readable to a C programmer, and sidestepping them actually makes the code harder to understand.
Consider the idiomatic way of iterating backward through an array:
    for (i = n; i-- > 0; )
    { /* operate on a[i] */ }
Converting i--; to a statement at the start of the block makes it less clear that it's part of the iteration idiom rather than an ad hoc adjustment specific to this particular logic. There are other examples, but they're either more involved or the statementification is less obviously wrong.
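To make the contrast concrete, here's a small self-contained sketch (array contents and variable names are mine) with the idiom next to the statement form:

    #include <stdio.h>

    int main(void) {
        int a[] = {1, 2, 3, 4};
        size_t n = sizeof a / sizeof a[0];
        size_t i;

        /* The idiom: the decrement is visibly part of the loop header,
           and it handles n == 0 correctly even with an unsigned index. */
        for (i = n; i-- > 0; )
            printf("%d ", a[i]);
        printf("\n");

        /* The statement form: equivalent, but the i-- now reads like an
           ad hoc adjustment inside the body. */
        for (i = n; i > 0; ) {
            i--;
            printf("%d ", a[i]);
        }
        printf("\n");   /* both loops print: 4 3 2 1 */
        return 0;
    }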
It seems sensible to always just use signed values for indices. Indices are difference types, which should include negative values so that you can subtract two indices and get a sane delta. The range of signed values seems 'big enough.'
Umm, no? Indices are ordinals[0], forming the canonical/nominal well-ordering of a collection such as an array.
> an ordinal number, or ordinal, is one generalization of the concept of a natural number that is used to describe a way to arrange a (possibly infinite) collection of objects in order, one after another. [...] Ordinal numbers are thus the "labels" needed to arrange collections of objects in order.
In C an index is a difference that you add to a pointer to get a pointer. `a[i]` is `*(a + i)`. Given two indices `i` and `j`, you want `i - j` to be such that `a[j + (i - j)]` is `a[i]`, and it then makes sense to me that `i - j` is signed. The expression works out whether they are signed or unsigned, but it matters for their interpretation on the part of a user (e.g. "oh, this is 2 elements before because it says -2") and so that comparisons like `i < j` are equivalent to `i - j < 0` and so on. That's why it's always made sense to me to use `ptrdiff_t` (or just `int`) for an index, vs. using `size_t`.
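A tiny illustration of that reading (the array and names are mine):

    #include <stddef.h>
    #include <stdio.h>

    int main(void) {
        int a[] = {10, 20, 30, 40, 50};
        ptrdiff_t i = 1, j = 3;

        ptrdiff_t delta = i - j;                        /* -2: "two elements before" */
        printf("delta = %td\n", delta);
        printf("a[j + (i - j)] = %d\n", a[j + delta]);  /* same as a[i], i.e. 20 */
        printf("%d %d\n", i < j, delta < 0);            /* both print 1 */
        return 0;
    }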
ptrdiff_t exists so that pointer subtraction can produce negative values. But how many times have you ever needed to subtract p and q where p represents an array element at a higher index than q? For that matter, how many times have you ever needed to add a negative integer to a pointer?
In C an object can be larger than PTRDIFF_MAX, a real possibility in modern 32-bit environments. (Some libcs have been modified to fail malloc invocations that large, but mmap can suffice.) Because pointer subtraction is represented as ptrdiff_t, the expression &a[n] - a could produce undefined behavior where n > PTRDIFF_MAX. But a + n is well defined for all positive n (signed or unsigned) as long as the size of a is >= n.
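A sketch that just prints the two limits (the values are platform-dependent), to show where the headroom mismatch comes from:

    #include <stdint.h>
    #include <stdio.h>

    int main(void) {
        /* On typical 32-bit targets PTRDIFF_MAX is 2^31 - 1 while SIZE_MAX is
           2^32 - 1, so an object can legally be bigger than any representable
           pointer difference. */
        printf("PTRDIFF_MAX = %jd\n", (intmax_t)PTRDIFF_MAX);
        printf("SIZE_MAX    = %ju\n", (uintmax_t)SIZE_MAX);
        /* For char *a pointing at an object of size n > PTRDIFF_MAX:
           a + n     is well defined (one past the end), but
           &a[n] - a overflows ptrdiff_t, which is undefined behavior. */
        return 0;
    }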
There's an asymmetry between pointer-pointer arithmetic and pointer-integer arithmetic; they behave differently and have different semantics. Pointers are a powerful concept, but like most powerful concepts the abstraction can leak and produce aberrations. I realize opinions vary on whether to prefer signed vs unsigned indices and object sizes (IME, the camps tend to split into C vs C++ programmers), but the choice shouldn't be predicated on the semantics of C pointers because those semantics alone don't favor one over the other.
Negative offsets are often used to access fields in a parent struct when you only have a pointer to one of its fields, for example to implement garbage collection or a string type.
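For anyone who hasn't seen it, this is the container_of pattern; a rough sketch (the struct and field names here are made up):

    #include <stddef.h>
    #include <stdio.h>

    /* Recover a pointer to the enclosing struct from a pointer to one of its members. */
    #define container_of(ptr, type, member) \
        ((type *)((char *)(ptr) - offsetof(type, member)))

    struct gc_header { int mark; };   /* bookkeeping the collector cares about */

    struct my_string {
        struct gc_header gc;
        size_t len;
        char data[32];
    };

    int main(void) {
        struct my_string s = { .gc = {0}, .len = 5, .data = "hello" };
        struct gc_header *h = &s.gc;  /* all we hand to the collector */

        /* Negative offset back from the member to the start of the parent object: */
        struct my_string *owner = container_of(h, struct my_string, gc);
        printf("%s (%zu)\n", owner->data, owner->len);   /* hello (5) */
        return 0;
    }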
But p - 2 is not the same as p + -2, and it's not clear in your example whether the former suffices or the latter is required. I can definitely imagine examples where the latter is required--certainly C clearly accommodates this usage--but IME it's nonetheless a rare scenario and not something that could, alone, justify always using signed offsets. Pointers are intrinsically neither signed nor unsigned; it's how you use them that matters.
Nope. But I do know of at least one implementation where it's not present at all: msvcrt. ssize_t isn't specified in the C standard; it's part of POSIX. ptrdiff_t is standard.
The two-liner is actually the one which is simpler and more direct, as it requires less knowledge of operator precedence rules. The one-liner and two-liner compile to the same number of instructions, so I don't see how either "avoids inconsistent state".
Many expert-level C programmers tend towards one-liners. Here's an example from the original "Red book":
> The one-liner and two-liner compile to the same number of instructions, so I don't see how either "avoids inconsistent state".
It's not about performance, or thread safety, or anything like that; it's about having a coherent mental model of the code. A statement should, if possible, represent a single, complete operation. Invariants should not be violated by a statement, with respect to its environment. (This is more true for 'push' than 'pop'.) One way of solving that is to bundle the 'push' and 'pop' operations up into functions; someone else in this thread did that. But why bother with the mental overhead of a function call when you could just represent the operation directly? To be sure, there are cases where the abstraction is warranted, but a two- or three-line stack operation isn't abstraction, it's just indirection.
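To make the comparison concrete, a toy int stack (the sentinel-slot convention and the values are mine):

    #include <stdio.h>

    int main(void) {
        int storage[16];
        int *stack = storage;   /* points at the current top; storage[0] is a sentinel slot */

        /* Single-statement forms: each line is one complete push or pop. */
        *++stack = 10;          /* push 10 */
        *++stack = 20;          /* push 20 */
        int a = *stack--;       /* pop -> 20 */

        /* Two-statement form: between the two lines, `stack` already points at
           a slot whose value hasn't been written yet. */
        stack = stack + 1;
        *stack = 30;            /* the push is only complete after both lines */

        printf("%d %d\n", a, *stack);   /* 20 30 */
        return 0;
    }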
> For someone who doesn't have the operator precedence rules memorized, it isn't clear whether the above code means [snipped] or [snipped]
> The two-liner [...] requires less knowledge of operator precedence rules
It's not operator precedence (that's a separate issue; despite having implemented C operator precedence, I don't know all of the rules by heart) but simply the behaviour of the pre- and post-increment/decrement operators. It's even mnemonic: when the increment symbol goes before the thing being incremented, the increment happens first; otherwise after. But even if it weren't, it's a fairly basic language feature.
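Spelled out in code, the mnemonic is just:

    #include <stdio.h>

    int main(void) {
        int i = 5;
        int a = i++;   /* post: a gets the old value, then i becomes 6 -> a == 5 */
        int b = ++i;   /* pre:  i becomes 7 first, then b gets it      -> b == 7 */
        printf("%d %d %d\n", a, b, i);   /* 5 7 7 */
        return 0;
    }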
Even beyond that, though, it's an idiom. Code is not written in a vacuum. Patterns of pre- and post-increment fall into common use over time and become part of an established lexicon which is not specified anywhere. Natural language works the same way. Nothing wrong with that.
> It's not operator precedence—that's a separate issue
> It's even mnemonic—when the increment symbol goes before the thing being incremented, the increment happens first; else after—but even if not, it's a fairly basic language feature.
I think you missed the issue.
This is 100% about operator precedence, and has nothing to do with the decrement operator being in front of or behind the variable.
Right, yes. I got confused by your example, because the example is definitely about pre- vs post-increment. My point about idioms still stands, though.
> (* stack)-- evaluates to 22, while * (stack--) evaluates to 52.
Actually, (* stack)-- evaluates to 23, but changes *stack to 22 :)
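A self-contained illustration of the difference (my own array and values, not the snipped example):

    #include <stdio.h>

    int main(void) {
        int data[] = {23, 52};
        int *stack = &data[0];

        int a = (*stack)--;   /* evaluates to 23, then decrements data[0] to 22; stack doesn't move */
        printf("%d %d\n", a, data[0]);   /* 23 22 */

        data[0] = 23;
        stack = &data[1];
        int b = *(stack--);   /* evaluates to 52 (the old *stack), then moves stack back to &data[0] */
        printf("%d %d\n", b, *stack);    /* 52 23 */
        return 0;
    }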
Saving characters on spacing is a terrible thing to do. In fact that jumble is missing a zero on the equality, which is made less evident because the characters are not spaced in a way that makes the mistake obvious.
For one, it's three and two lines for what is two logical operations. I assume the "inconsistent state" is the time between the lines where the stack is not truly in the right state; many people prefer to preserve their invariants as much as possible.
The use of that construct is mainly a stylistic choice. On any compiler from this millennium there should be no difference in the code that it produces.
Yep, so if we're going with style, I'm very happy with the functions dashed off there. Nobody will confuse those even when very, very tired (similar effect on the brain to being drunk). There is zero difference in the generated output.
Calling those functions tells you exactly what they are and what they do. Vertical space is not an issue at all with 3 line functions.
Relying on post-increment? Make sure it's a one line block that is totally unbraced with only single letter variable names if you do it because otherwise it's just faux-macho C and that's /weak/.
> Make sure it's a one line block that is totally unbraced with only single letter variable names if you do it because otherwise it's just faux-macho C and that's /weak/.
I think you're projecting. The point being made was that when you're writing a simple stack (as you often might do in C, since the standard library and the language itself conspire against providing you one) and you don't want the overhead of writing multiple functions to wrap it up (vertical space is an issue when you make more than one of these–trust me, I used to write Java and everything about it was just a papercut in verbosity), the post- and pre-increment versions are concise, idiomatic, and–to be honest–more clear simply because they use the operators in the way that they are meant to be used. I can glance at them and see, OK, this one gives me whatever the stack is pointing to and then makes it point to the next element; this one first moves the pointer to the next element (which is free) and sets it. All in one line. There's nothing to show off here, this is just how you write C; those operators exist for exactly this purpose (and IMO single letter variable names are generally only a good idea in the smallest of scopes, and I personally use braces even when optional).
Sorry, no. That's not aimed at you in particular; it's just a general comment on macho C, which I think we've all seen.
int abc(int a, int b, int c)
{
}
I can do postincrement. I learned C the macho way. We all still have to read that crap. Now I know better when I'm writing it. I strongly disagree that
    a = *stack--;
    *++stack = b;
is better in any way beyond "I'm a macho C guy" than calling the equivalent pop()/push() functions.
If we're being serious about a stack you really /need/ to access it through functions so you can switch instrumentation on and off, e.g. bounds checks & backtrace on failure, poisoning, etc.
But this is as much beside the point under discussion as global pointers you raise.
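For what it's worth, a rough sketch of what switchable instrumentation behind push/pop functions might look like (the names, the fixed-size int stack, and the STACK_CHECKS knob are all made up):

    #include <assert.h>
    #include <stdio.h>

    #define STACK_CHECKS 1        /* flip to 0 to compile the instrumentation out */
    #define STACK_CAP 64

    static int data[STACK_CAP];
    static int *sp = data;        /* points at the next free slot */

    static void push(int v) {
    #if STACK_CHECKS
        assert(sp < data + STACK_CAP && "stack overflow");
    #endif
        *sp++ = v;
    }

    static int pop(void) {
    #if STACK_CHECKS
        assert(sp > data && "stack underflow");
    #endif
        int v = *--sp;
    #if STACK_CHECKS
        *sp = (int)0xDEADBEEF;    /* poison the vacated slot to catch stale reads */
    #endif
        return v;
    }

    int main(void) {
        push(1);
        push(2);
        int a = pop();            /* 2 */
        int b = pop();            /* 1 */
        printf("%d %d\n", a, b);
        return 0;
    }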
Post-increment is an artefact from PDP-11 assembler and maps to a single instruction there; that's where it came from quite directly. It's completely unnecessary. Most modern languages find it useless enough that they remove it. Python gets along fine without it, relying on +=, for example. (Although some do repeat C's mistakes when basing their syntax on C, e.g. the unbraced, single-line block that serves only to add a non-zero probability of introducing a future bug, with the benefit of precisely nothing. Hi Walter! Larry Wall cops flak for Perl syntax but he did not copy that.)
Post-increment is hardly the end of the world; it just isn't useful. It doesn't help readability. It can harm it. As a question of taste I find it lacking.
But hey, everyone else uses it, and Duff's device is fun to read, so go with them, knock yourself out.
I'm doing CS50x at the moment and I can definitely recommend it. It got me interested in C despite trying to avoid it my entire life. David Malan is one of the best lecturers I've seen.
Context: I feel that understanding the C memory primitives is not that hard (stack variables, malloc/free, C++'s new). But how to use them is devilishly tricky. I have seen little information about this.
- don't allocate small chunks of memory all over the code base, instead move memory management into a few centralized systems, and let those systems own all memory they allocate
- don't use pointers as public "object references", instead use "tagged index handles" (see the sketch after this list)
- don't use "owning pointers" at all, use pointers only as short-lived "immutable borrow references"
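A minimal sketch of what the "tagged index handle" idea looks like (all names and the fixed pool size are made up for illustration):

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define MAX_THINGS 1024

    /* The handle callers hold instead of a pointer:
       low 16 bits = slot index, high 16 bits = generation tag. */
    typedef struct { uint32_t id; } thing_handle;

    typedef struct {
        uint16_t generation;   /* bumped every time the slot is recycled */
        bool     alive;
        int      payload;      /* whatever the object actually holds */
    } thing;

    static thing pool[MAX_THINGS];   /* the centralized system owns all the storage */

    /* Resolve a handle to a short-lived borrow; never store the returned pointer. */
    static thing *thing_lookup(thing_handle h) {
        uint32_t index = h.id & 0xFFFFu;
        uint32_t gen   = h.id >> 16;
        if (index >= MAX_THINGS) return NULL;
        thing *t = &pool[index];
        if (!t->alive || t->generation != gen)
            return NULL;   /* stale handle: the slot was freed (and maybe reused) */
        return t;
    }

    int main(void) {
        /* "Allocate" slot 0 by hand just to exercise the lookup. */
        pool[0] = (thing){ .generation = 1, .alive = true, .payload = 42 };
        thing_handle h = { .id = (1u << 16) | 0u };

        thing *t = thing_lookup(h);
        return (t != NULL && t->payload == 42) ? 0 : 1;
    }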
EDIT: That earlier discussion has an excellent first post. Quoting:
"It bothers me so much that very few books (Kernighan) talk about WHY. WHY. WHY is a variable needed? WHY is a function needed? WHY do we use OOP? Every single book out there jumps straight into explaining objects, how to create them, constructors, blah blah blah. No one fricking talks about what's the point of all this?
Teaching syntax is like muscle memory for learning Guitar. It is trivial and simply takes time. Syntax - everyone can learn and it is only one part of learning how to code. Concepts are explained on their own without building upon it.
[... A list with learning resources the poster finds great ...]
This is learning how to produce music. Not learning the F chord. Teaching how to code is fundamentally broken and very few books/courses do it well."
> Now then, isn't that nicer? (editorial) Build programs that do something cool rather than programs which flex the language's syntax. Syntax -- who cares?
I never really got to the point of learning Haskell or Lisp until recently; it was always "I can do everything with C/C++/Java/Python", and I could. But the thing is, it was only after learning Lisp that I really got the hang of thinking in a top-down (recursive) manner, and it took Haskell to teach me composition intuitively, which could then be extended to my main language (C++). I understand that syntax doesn't matter much, but FWIW I still think in terms of Lisp syntax when writing recursive code in C++/C. So yeah, take that for what you will.
This guide is short (which is always nice) but it has a couple of flaws in places, from the brief skim I gave it. For example:
> In particular, if you are designing a function that will be implemented on several different machines, it is a good idea to use typedefs to set up types like Int32 for 32 bit int and Int16 for 16 bit int.
Use <stdint.h> please
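That is, something like:

    #include <stdint.h>

    int32_t  a;   /* exactly 32 bits, signed   */
    int16_t  b;   /* exactly 16 bits, signed   */
    uint8_t  c;   /* exactly  8 bits, unsigned */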
> The char constant 'A' is really just a synonym for the ordinary integer value 65 which is the ASCII value
Not always, especially coming right after a paragraph explaining how different machines have implementation-specific behaviors
> The compiler can do whatever it wants in overflow situations -- typically the high order bits just vanish.
This is a good time to explain what undefined behavior actually means
> The // comment form is so handy that many C compilers now also support it, although it is not technically part of the C language.
Part of the language since C99
> C does not have a distinct boolean type
_Bool since C99
> Relying on the difference between the pre and post variations of these operators is a classic area of C programmer ego showmanship.
I'm fine with you mentioning that this can be tricky, but this is more opinion than I am comfortable with in an introductory text
> The value 0 is false, anything else is true. The operators evaluate left to right and stop as soon as the truth or falsity of the expression can be deduced. (Such operators are called "short circuiting") In ANSI C, these are furthermore guaranteed to use 1 to represent true, and not just some random non-zero bit pattern.
Under the assumption that there are no boolean types from earlier, this is not true
> The do-while is an unpopular area of the language, most everyone tries to use the straight while if at all possible.
I would argue that people use do-while more than they need to
> I generally stick the * on the left with the type.
Not a problem, but :(
> The & operator is one of the ways that pointers are set to point to things. The & operator computes a pointer to the argument to its right. The argument can be any variable which takes up space in the stack or heap
And constants/globals
> To avoid buffer overflow attacks, production code should check the size of the data first, to make sure it fits in the destination string. See the strlcpy() function in Appendix A.
strlcpy is non-standard and probably not what you want
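If you do want a bounded copy, here's one portable sketch that refuses to truncate silently (purely illustrative; not the strlcpy semantics):

    #include <stdio.h>
    #include <string.h>

    /* Copy src into dst (of size dstsize); return 0 on success, -1 if it won't fit. */
    static int copy_checked(char *dst, size_t dstsize, const char *src) {
        size_t need = strlen(src) + 1;
        if (need > dstsize)
            return -1;     /* refuse instead of silently truncating */
        memcpy(dst, src, need);
        return 0;
    }

    int main(void) {
        char buf[8];
        printf("%d\n", copy_checked(buf, sizeof buf, "hello"));        /*  0 */
        printf("%d\n", copy_checked(buf, sizeof buf, "way too long")); /* -1 */
        return 0;
    }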
> The programmer is allowed to cast any pointer type to any other pointer type like this to change the code the compiler generates.
> p = (int * ) ( ((char * )p) + 12); // [Some spaces added by me to prevent Hacker News from eating the formatting]
Only in some very specific cases…
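The conversion itself is usually fine; it's the access through the resulting pointer that's restricted. A sketch of the distinction (my example, not the guide's):

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        float f = 1.0f;

        /* Accessing any object through a char-typed pointer is allowed: */
        unsigned char *bytes = (unsigned char *)&f;
        printf("first byte: %u\n", (unsigned)bytes[0]);

        /* uint32_t *u = (uint32_t *)&f;
           printf("%" PRIu32 "\n", *u);   <-- the cast compiles, but reading through
                                              it violates strict aliasing (UB)      */

        /* The well-defined way to reinterpret the bits (assumes sizeof(float) == 4): */
        uint32_t u;
        memcpy(&u, &f, sizeof u);
        printf("bits: %" PRIu32 "\n", u);
        return 0;
    }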
> Because the block pointer returned by malloc() is a void* (i.e. it makes no claim about the type of its pointee), a cast will probably be required when storing the void* pointer into a regular typed pointer.
Casting malloc is never required (and I would say usually not a good thing to do)
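In C the void * converts implicitly, so the usual idiom is just this (a minimal sketch):

    #include <stdlib.h>

    int main(void) {
        size_t n = 100;
        int *p = malloc(n * sizeof *p);   /* no cast needed */
        if (p == NULL)
            return 1;                     /* handle allocation failure */
        free(p);
        return 0;
    }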
>> The value 0 is false, anything else is true. The operators evaluate left to right and stop as soon as the truth or falsity of the expression can be deduced. (Such operators are called "short circuiting") In ANSI C, these are furthermore guaranteed to use 1 to represent true, and not just some random non-zero bit pattern.
> Under the assumption that there are no boolean types from earlier, this is not true
Actually, I believe _Bool is guaranteed to use 0 for false, and any non-0 value is stored as 1 for true. Arithmetic on _Bool is also guaranteed, based on those values.
For example, I believe the standard guarantees:
    #include <assert.h>
    #include <stddef.h>
    int main(void) {
        _Bool x = 255;       /* any non-zero scalar converts to 1 */
        assert(x == 1);
        size_t y = 10 + x;   /* arithmetic promotes _Bool to int */
        assert(y == 11);
    }
This is exactly why I included the earlier bit where they claimed there was no boolean type, to show that their conclusion is logically inconsistent rather than just incomplete ;)
>> I generally stick the * on the left with the type.
> Not a problem, but :(
In a declaration, * is a type modifier. E.g. `int a;` declares a variable of type "int" named "a". `int* a;` declares a variable of type "pointer to int" named "a".
The only time this doesn't work is if you stick multiple declarations on the same line. That's annoying to me, because it means you're breaking the "declare variables as close to their first use as possible" practice. It's not K&R C any more, you don't need to declare everything all at once at the top of the scope.
Also, to prove that `*` in a declaration is part of the type, note that function declarations with no argument names, just types, are still valid C (though I'd strongly discourage their use). So `void func(int* a, int* b);` is identical to `void func(int*, int*);`. It's very different from the `void func(int, int);` that you'd get if you assume the `*` goes with the `a`.
It took Microsoft 16 years to support most of C99 in MSVC, and they are still not completely done after 21 years. I think for a document last updated in 2003 it's ok not being based on C99 ;)
C99 added type _Bool (also called bool if you #include <stdbool.h>) -- but it doesn't actually use it much. Operators that yield logically "boolean" values (<, <=, >, >=, ==, !=, &&, ||, !) yield values of type int, not _Bool. The value is always 0 or 1 -- in contrast with isdigit(), for example, which is specified to return 0 for false and some unspecified non-zero value for true.
Converting any scalar value (that includes pointers) to _Bool yields 0 or 1 (false or true).
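A quick illustration of both points (the exact non-1 value from isdigit varies by implementation):

    #include <ctype.h>
    #include <stdbool.h>
    #include <stdio.h>

    int main(void) {
        printf("%d\n", 3 < 5);               /* comparison operators yield int 0 or 1: prints 1 */
        printf("%d\n", isdigit('7'));        /* non-zero, but not necessarily 1 */
        printf("%d\n", (bool)isdigit('7'));  /* conversion to _Bool normalizes it to 1 */
        return 0;
    }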
Now that I think about it, I think that depends on what you mean by "use". I was commenting from a perspective that you can pass in something that is not 0 or 1, and in general it is not advised to assume that a "boolean" is 0 or 1 especially given that this document doesn't mention the boolean type (which is guaranteed to have those values). This is true even under ANSI C, because as it mentions later, programmers depend on any nonzero value being "truthy".
Normal C is not difficult. But once you get into using it for, e.g., Game Boy hacking (I still remember the confusion over the data, tiles, etc.) or basic bare-metal work on small machines, it's just hard.
Rust is a fine replacement for C++, but not really for C, and the reasons why Rust can't be a replacement for C are very similar to why C++ can't be a replacement for C. Everybody who chooses C today has slightly different reasons to do so, but one important reason is that C is a very small language with a very small standard library, and most parts of the standard library can be ignored without losing any of the "qualities" of the C language. IME a small language feature set makes it easier to evaluate and integrate third party code, it's hard to say exactly why that is, but that's my experience anyway. Of course one could say "nobody forces you to use the whole feature set of Rust", but the same has been said in the C++ world for decades. The problem is that everybody selects their own subset of the language when having the choice.
K&R is woefully out of date. Gives you no info on how to do things safely and sanely. And encourages a leet style of programming that results in catastrophic edge case bugs. As you can see in comments above, where naive code that iterates backwards through an array fails when the array size is 0. Worse, K&R leet style buys you absolutely nothing with an optimizing compiler written in the last 30 years.
K&R is actually fairly reasonable about performance, and much less l33t than code I have seen in the wild. It is a very good introduction about the language, and although I would not ever call it "woefully out of date" I would say that it is a good idea to read more about the current state of bugs and tooling, which is not discussed because the book is a general overview of the language.
I hate guides showing me how to do things safely and sanely from the get-go. Show me how to do it. If safety and sanity are priorities for me, I’ll seek those out on my own.
[1] http://cslibrary.stanford.edu/102/PointersAndMemory.pdf