To become a good C programmer (2011) (fabiensanglard.net)
272 points by __john on May 1, 2016 | 103 comments

I found that, after learning the basics of the language via K&R or a similar book, the best way to get a good understanding of C and its weird corner cases and eccentricities is just to go through the comp.lang.c FAQ page. http://c-faq.com/index.html

It's pretty comprehensive, and I found the level was pretty good for a "not a total newbie, but still not familiar with the subtleties" reader - a level that can be kind of hard to find resources for.

This, the standard document itself (it's not very long, and drafts are freely available), and the C tag on stackoverflow (often contains comprehensive answers by well informed people) are my recommended resources.




A good way to casually browse SO for interesting information is filtering by questions in the last <time period> and ordering by votes.


Making a blogging engine in C would be very informative (though difficult).

Not terribly difficult. I wrote my own blogging engine in C [1] which I still use [2] (it also uses this library I wrote: [3]). Last modification I made: support for gopher [4] for that "retro feel."

[1] https://github.com/spc476/mod_blog

[2] http://boston.conman.org/

[3] https://github.com/spc476/CGILib

[4] gopher://gopher.conman.org/

i think fefe's CMS is written in C... [1]

[1] http://www.fefe.de/poweredby.html

Eeyup. A 2009 version is archived at https://erdgeist.org/cvsweb/Fefe/blog/blog.c

The best part is where he stores his posts. SQL? Nope. NoSQL? Nope. Plain text files? Nope. Fucking LDAP. [1]

[1] Source: http://blog.fefe.de/?ts=a8d61c27

it would be pretty easy to make one using the C preprocessor

If you build it on ribs2 (with garbage collection) it wouldn't be too hard.

I'm not sure if I think writing C with a garbage collector is a good exercise for learning the subtleties of the language.

If you figure out how the garbage collector works (it's a C library) then you'll learn :)

When I first started out with C (and back at that time it was the first programming language I started learning) I read a majority of the "standard" books and felt rather confident in my understanding of the basics, but when it came to actually writing C, I never felt like I was writing good, pragmatic code. So I started combing through the sources of popular FOSS software like gtk, musl libc, Linux, OpenRC, gcc, etc, submitting bugfixes for a few of them, and asking for code reviews on mailing lists until I finally thought that I was doing okay. I still felt like my knowledge was lacking though, so I decided to learn the internals of the modern x86 machine by writing a hobby OS resembling MINIX (I think I even got X working at one point). Despite still not being the best programmer, it really helped with my understanding of why things are done in C the way they are.

This sounds like a good way to learn, but I would worry about picking up bad habits. Do you have any recommendations of (C) projects etc. that are particularly well written?

The more I learn C the more I hate it. At first it seems simple and easy, but reading "Expert C Programming" is like reading a laundry list of what's really messed up with the language. 80% of the problems would be solved by some sane syntactic sugar that compiles down to C.

I would say generally I feel that way about any language I've learned. Initially they are pretty easy and the examples given include slick solutions to contrived problems. Then you get into wanting to do real work and you learn about all the corner cases and ambiguities and landmines that are hidden farther afield.

Can anyone name a language that they've grown to like more the more they learned about it? I would guess maybe only those in the LISP family would make the cut.

My favorite language remains 'C'. You don't have to even look at the corner cases, landmines and ambiguities - use the subset of the language that works.

Other than things like signed integer weirdness, most complaints about 'C' revolve around the library. Well, don't use those parts of the library.

Except that generally the point of using C is to squeeze performance out of your algorithms - so you can't limit yourself. But even the simple straightforward stuff has been done just sloppily. I think we just take for granted that it's part of the learning curve and that programming just isn't easy.

And eventually your fingers memorize all the nuances, and sure, if you don't screw up it all works. But it's designed in a way where you end up shooting yourself in the foot. It's hard to reason about pointer and reference notations, everyone eventually misses a break in a switch - const is weird, arrays vs. pointers is weird. Symbols are needlessly overloaded to do different things. Things that seem like they'd work, for strange reasons, don't. EXAMPLE:

    foo(const char **p) { }

    main(int argc, char **argv) { foo(argv); }

I really, really recommend opening up "Expert C Programming". I promise that by page 50 you will be angry - not because there are so many gotchas, but because most of them are completely fixable; it's just that no one has bothered to do it.

Haskell can be annoying to work with in the beginning, since the slick quicksort examples don't really explain why you'd want to use it to write something real. But then your project grows bigger, and the type system maintains sanity in a way you don't otherwise see. Other languages usually use tons of unit tests everywhere to enable refactoring, but then they bog you down if you really need to change something.

I guess it has problems with records, space leaks, and deployment to old machines. Maybe ML or F# are just as good for this purpose; I haven't really used them.

Objective-C, actually. I had really only used Java and a little C++ and PHP before I learned it to start doing iOS development (still iPhone OS at the time), and while I initially hated the syntax and found it confusing, I really ended up appreciating a lot about how things were done in the language. Although that was mostly things which originated in Smalltalk.

> Can anyone name a language that they've grown to like more the more they learned about it? I would guess maybe only those in the LISP family would make the cut.

You're correct that a few of the Lisps have had this effect for me (Scheme, Common Lisp, EuLisp, Le-Lisp, elisp), but there have been a few dialects that I came to like less and less as I learned them (note that I do not necessarily dislike them, rather I'm disappointed by them): Clojure, Newlisp, and Racket, for example.

There are a few non-Lisps that I find myself liking more and more as I use them: Kitten and Mantra. They're both concatenative languages that take somewhat different approaches from the usual Forth-likes. I'm still not super proficient with them, though, so it's possible that the derivative of my fondness for them will invert yet. I've also had that experience with ksh (believe it or not) and Vim (a DSL for editing, if you will).

There have also been a few languages that have had a sort of "roller coaster" effect: at first they excited me greatly, then as I learned them better and better, I liked them more, then less, then more.... Some examples that come to mind are C#, Datalog, Haskell, Modula-3, OCaml, Prolog, Rust, and Standard ML.

> Can anyone name a language that they've grown to like more the more they learned about it? I would guess maybe only those in the LISP family would make the cut.

Well, personally I've fallen in love with using FORTH for when I'm doing my hardware hacking (mostly on Arduino). I don't think that anybody would call it 'in the LISP family', but the language has similar grammar-extension capabilities, and I adore it for that.

I'm not a super genius programmer by any means, but I've had a lot of pleasant surprises learning about the depths of Python.

I disagree: no static typing --> annoying runtime errors, no 'use strict'..

While I like static typing, there's a lot more that goes into making a language good or bad than that. On the whole, python seems pretty good to me. I'd take it over Java 90% of the time.

I learned to appreciate static typing after "diving into" a large Python codebase. Even with a good IDE (PyCharm), it was freaking hard to understand what the heck was going on. With all the fun in quick prototyping, there's no way I'm going to use a dynamic language for anything serious.

If a Python code base that's large carries no tests, it can be a nightmare to walk through (I've walked through my own ones that I wrote before I had learnt testing, too). Statically typed languages are great, but if they are also OOP, I find that unless someone really knows how to hold back on the "make things an interface" mentality, you can end up with a large codebase that looks just as crazy as a dynamically typed one.

Point here being that everyone's mileage will vary with dynamically and statically typed languages. But over time, as the code base grows, the helpfulness of the typing discipline diminishes (to the point of irrelevance). Instead, the value of how cleanly the code has been written and maintained matters much more. Code with good tests, classes that are divided logically, well named methods and variables, etc., should be able to tell you what you are looking for regardless of language type.

I have really enjoyed Scala. The more depth I gain in the language, the more I appreciate it. It has the ability to be compact without being overly terse, and is very readable if you don't get too crazy with using symbol overloading and overly-complicated types.

> Can anyone name a language that they've grown to like more the more they learned about it?


My guess is that liking/not liking 'C' depends on how interested you are in its representation model. If you're actively disinterested in how things work as bits in memory, then I can't blame you a ... bit.

No, the representation model is fine. It's where C tries to be something other than portable assembly that it goes awry. E.g., C has more syntax for arrays than actually exists at runtime, which leads to surprising behavior. The problem with array arguments discussed in the LKML thread posted above is a great example.

I am having trouble identifying "the LKML thread."

I think he meant this:


(It's Torvalds on a bug made harder to spot by using array arguments in C, which you shouldn't do.)

sorry for OT, but:

g+ share: 0, reddit share: 1k.

.. so I guess it becomes clear why Google is deemphasising this.

Thanks so much! Agreed.

I was actively interested how bits and hardware work, using Z80, 80x86, 68000, GW Basic, Turbo Basic, Turbo Pascal, Modula-2 and Oberon.

No need for C and its flaws.

You might like Go... I find it's 90% of the goodness of C with only like 10% of the badness.

Try learning Rust :)

Why are you learning it?

I cannot recommend Understanding and Using C Pointers by Richard Reese highly enough (http://www.amazon.com/Understanding-Using-Pointers-Richard-R...)

Learning C syntax is pretty easy. Learning to use the standard library is mostly a matter of reading man pages and other people's code. But I found understanding pointers and memory management completely opaque until I read that book. It definitely brought me from "beginning C hacker flailing about" to "intermediate C hacker flailing about in a more dangerous way".

You can be an excellent C programmer but still create horrible abstractions. Programmers underestimate the art that goes into that.

K&R is often recommended, and it's certainly fun to read and accessible. But I've also heard it's outdated, and doesn't really focus much on modern C software design, mostly because the world knew little about it when K&R was written.


Some of the examples in K&R are not up to date with respect to modern security practices. They are almost case studies in how to implement code vulnerable to buffer overflows, etc.

Or maybe it's better to say -- sanity checking on inputs is an (unmentioned) exercise left to the reader.

K&R Second Ed. was published going on two decades after C first emerged. Already many versions of Unix and the applications to run on them were written at the time of K&R Second. You can hardly say that little was known about software design with C at the time it came out.

It won't cover C99, C11, or modern practices to mitigate buffer overflows. But it is still an excellent introduction to the fundamentals of C. It's also aimed at people who already know how to program in, say, Pascal, so is probably best not approached as a rank beginner.

I first learned C from K&R, but I'm not really that big of a fan. I find the style of inter-mixing new information in the middle of examples kind of hard to follow. I prefer to have the basics set out in front, and than get a bunch of examples demonstrating them. I'm sure people with different learning styles have an easier time with it, though.

It is still relevant because K&R C is still extremely (but not completely) forward compatible with even the most modern C standards. Many software design best practices have changed since then, but a more modern book will have to either repeat or omit some of the first principles of the C language.

> thoughts ?

well, there is a 'ModernC' book by Jens Gustedt of INRIA available here: http://icube-icps.unistra.fr/img_auth.php/d/db/ModernC.pdf

which might suit your fancy ?

K&R is still a great book but there are also other great books. Programming in C by Kochan and A Modern Approach by King are two fantastic books and much more suited to a true beginner than K&R is IMHO. Also Head First C and 21st Century C (second edition) are great books to read.

The exercises in K&R are superb though, and I highly recommend taking the time to do them all while you read K&R (which I still feel you should); it is a small book, so it shouldn't take long to read.

Since we're throwing out books, "The Absolute Beginner's Guide to C" (by Perry) has helped a number of people who felt completely overwhelmed by programming in general...

I think the style should be avoided; code shouldn't be compact and variable names should be descriptive.

Artificial example:

    for( int i = 0 ; i < c+j ; i += b )
should be:

    for( int count = 0 ; count < sum+extra ; count += skip )
It may be acceptable to use i in place of count here, but that is the only place where single-letter variables should be permitted, and only if i really is just a simple array index iterator.

There should always be a balance between brevity and being too verbose.

all single letters = bad, java style thisIsALoopCounter = bad.

and in the C world, i is almost universally understood to be a loop counter and index. If I came across 'count' in C code, I might (briefly) think it was related to a count of items in an array or similar.

I agree. The problem is that those single character names get abused and you end up with code that consists only of them, including the function arguments, which is the worst.

I hope you don't have a stroke:


...just one of the many source files that make the interpreter for the J language.

The 'style' even leaked out to file naming: a.c, ab.c, af.c, am.c, c.c, ca.c, cc.c, cd.c, cf.c, ...


I doubt that this is either serious (unobfuscated) or handwritten code though.

No, it really is the hand-written source code for the J language (http://jsoftware.com/) interpreter. You may also enjoy:



Even "i" can be changed to "index". My rule of thumb is this: If you wouldn't use the abbreviation when speaking, don't use it when coding.

I don't understand why 'i' has been given a pass. The argument boils down to saved keystrokes. Typing is not the bottleneck.

> I don't understand why 'i' has been given a pass.

Because it is well enough established (in both programming and mathematics) that it communicates literally no less information to the reader than "iterator" or "index" would in the same context.

> The argument boils down to saved keystrokes.

Not at all. I, for one, find it easier to see the shape of the whole expression when the variable names are shorter.

Yes; prolixity yields clarity in programming about as much as in prose. The old quip needs an update: a modern programmer can write COBOL in any language.

My guess would be that it derives from the mathematical notation for sums.

But apart from that, I think using i, j, k as index variables in for-loops has become such a de facto standard that I would only prefer long names such as "count" or "index" in situations where multiple indices may cause confusion. But then I'd probably use even more specific names than "count" or "index".

> I don't understand why 'i' has been given a pass.

The reason is that in FORTRAN variables starting with i, j, k..n were defined as integers.

Back in the day I learned fortran first, then basic. People that learned basic first often would use the letter 'a' as a loop index instead of 'i'

Nowadays I still use 'i', but often use indx instead because my editor doesn't highlight single-letter variables easily.

I'm okay with "i" (like others said, it's a well-known shorthand for "iterator" or "index") but I often use "idx" because it can easily be combined with other words in nested loops (e.g. "rowIdx" and "columnIdx").

I agree, and I calculated this a while ago based on the average number of lines a programmer writes in 8 hours. I don't have the actual numbers, but unless you type slower than ~30 words per minute, typing speed isn't the bottleneck.

I have been using index instead of i in while loops lately. It's surprisingly concise.

When I started to program in C I used "n" for the longest time, simply because it was quicker to type on my first computer; "Next N" was a quick two taps on the N key. Guess the platform :-)

A ZX80, ZX81 or ZX Spectrum.

This is the first time I have seen sizeof used like this:

  sizeof( &array[0] )
This looks equal to:

  sizeof( array )
at first glance, which would give the size of the entire array in bytes, but of course the &array[0] expression is really:

  &*( array + 0 )
which simplifies to:

  array + 0 
which is a pointer. And using sizeof on it gives the size of a pointer to int.

Edit: (&* array) will also give a pointer.


This is just a really convoluted way to write 2:

   &array[2] - &array[0]

   &*(array+2) - &*(array+0)

   (array+2) - (array+0)

   2 - 0
Again I have never seen this written in such fashion.

Here are the relevant parts of the C standard:


"The sizeof operator... When applied to an operand that has array type, the result is the total number of bytes in the array."


"Except when it is the operand of the sizeof operator or the unary & operator, or is a character string literal used to initialize an array of character type, or is a wide string literal used to initialize an array with element type compatible with wchar_t, an lvalue that has type ``array of type '' is converted to an expression that has type ``pointer to type '' that points to the initial member of the array object and is not an lvalue."

Additionally, here is a thread of Linus Torvalds pointing out even more of the confusing nature of arrays and sizeof in C:



I really like the idiom for passing sized arrays suggested at the end of that LKML thread[1]: pass them by reference!

  void func(int (*arr)[256]) {
      printf("arr size: %zu\n", sizeof(*arr));
  }

  int main(void) {
      int array[256];
      func(&array);
  }
[1] https://lkml.org/lkml/2015/9/7/147

I doubt Torvalds telling everyone that sizeof was a function helped with this confusion.

The array->pointer "decay" (as the standard calls it) has 3 exceptions, of which two are when an array is the operand of "sizeof", and when it is the operand of "&". (The third involves a string initialiser, which is not relevant here.)

So your reasoning is not quite correct, it should really be that you think of

    sizeof( &array[0] )
as being

    sizeof( &(something) )
where "something" could be of any type T, and so the '&' operator yields "pointer-to-T", to which sizeof will yield the size of a pointer.

> and when it is the operand of "&"

Another slightly confusing aspect is that `&array` gives you a "pointer to array" (which has the same value as `&array[0]`, but is of a different type). Most importantly, it behaves differently in pointer arithmetic (the implied offset is the size of the array, rather than the size of its elements).

Fabien is hinting at the confusion: sizeof(&array[0]) != sizeof(array)

&array[0] and array evaluate to the same thing, a pointer to the first element, when used in an expression. But sizeof gives a different result, because inside &array[0] the array (as part of array + 0) 'decays' to a pointer, while as the direct operand of sizeof, array doesn't.

Lots of good book suggestions in the Ask HN thread I had posted earlier: https://news.ycombinator.com/item?id=11560509

"And no good book is as good as disassembly output."

[x] Strongly agree [ ] Agree [ ] Neutral [ ] Disagree [ ] Strongly disagree

For me, C, i.e., GCC, is most useful as a faster way to generate assembly for a particular CPU than typing it out from scratch. I use GCC as a code generator. I'd like to see more free asm code generators, but I am not holding my breath.

I do appreciate C as a medium for distributing reasonably efficient software.

I don't really understand what this means. How is that different from how anyone else uses a compiler?

I think he means that he works like this:

1. Write some C

2. Dump the corresponding assembly code

3. Modify the code by hand

4. Assemble

5. Load into assembly debugger

6. Learn

There is "C", the language, which can be relatively simple. Or hopelessly opaque depending on the author.

I think of C the language as just a shorthand for assembly. Only because that's how I use it.


But then there is "C" in practice: the specifications, the "standard" libraries, preprocessors, Makefiles, autoconf, etc.

But what would qualify as an "asm code generator" if not a compiler? There are other free compilers. Clang comes to mind.

Let us see... in my day, we called an "asm code generator" an assembler... anyone remember MasterSEKA, Devpac, TRASH'M-One and ASM-one?

  1. Program that reads some input (e.g., C language) and generates asm and/or opcodes.
  2. Program that reads asm and/or opcodes and generates binary numbers ("object files").
No. 1 is what I need. No. 2 is what I call an "assembler". Although terminology means less to me than what a program actually does.

2. is also what I learned as "assembler": a program which takes ASCII source code (written in a language we call assembler), and assembles that source code directly into a binary executable (no linking).

So, back in the day (and still for me, always):

assembler: ASCII source code in assembler language -- "which language did you code that in? - Assembler."

assembler: the integrated development environment (such as ASM-One) which produces executable machine code straight from source, with no linking step in-between; also known as a two-pass optimizing assembler.

No. 1 is literally the exact definition of a "compiler". I still don't understand the distinction you're making.

That's because it is a compiler.

No. 1 is a program that accepts input (e.g., RTL, MINIMAL, etc.) and generates asm.

I like to call this an asm code generator. Because today when people say "compiler" they are often referring to a collection of programs, some of which do not generate asm.

I see. So you are excluding the preprocessor, linker, type-checker (for some languages this is a separate program), assembler, etc.?

Yeah. I'm sorry about the terminology. I guess I just like the term "code generator".

When I use that term I envision simple filters that take ASCII input, maybe even some sort of "template", and transform it into some other format that's useful. Ideally, asm. But not always.

For example, in GCC, for x86, there are a couple of programs that operate on i386-opc.tbl and i386-reg.tbl. I would not call them "code generators", but I suspect they are needed in order for "gcc -S" to work.

Spitbol has one.

As in thinking in terms of what assembly you want the compiler to produce, and not trusting it to do so. Whereas people that program in higher level languages tend to think mostly in terms of the language semantics.

the biggest virtue of a c-programmer is temporary forgetfulness.

forget for the moment that all of the old-guard-tech foundations are basically a castle made of glued-together jello-filled rubber duckies. forget all the tricks needed to jump through that final hoop in assembly. forget even those hopeful endeavors of the language-wise who stood up, and then came back because performance is a bitch and their use cases too edgy. forget all those libraries that overpromised, under-allocated and disspointered. forget all the futile attempts to steer this boat, carried on the hands of the likes of you, towards some sailable waters. blissful unawareness settles in, while every "good c-programmer" near you starts to spit fire as soon as management declares a new megalomaniac C project worth the effort and kicks it off. forget that strange feeling of elated shame of being the best at repairing the most broken car in town.

Then, and only then, you will be a "good" C-Programmer, one that knows all the tricks of trade, while not getting wiser.

Very cool. I would like to see a similar recommendation for TCP/IP, DNS, and perhaps HTTP 1/2, including SSL. (Although I suppose you could start with C and then just get involved with the linux kernel, linux net utils, and nginx. But that's like, hardcore.)

i think 'TCP/IP Illustrated' is good if you want to get into networking.

I'm interested in learning about reverse engineering and malware analysis. Is learning C the proper first step in getting my hands dirty? I used C++ in a few college courses, but I've been primarily a Java developer for the past two years.

Definitely C, and you will also need to know a fair bit of assembly. Also familiarize yourself with the PE (Portable Executable) format, and learn how to use the IDA disassembler and possibly SoftICE (not sure if it still works, but it was/is a very powerful kernel debugger).

What's a good practical but small enough project you can do with C? Typically if you are learning Ruby or Node, people recommend creating a blog. What's something like that for C?

Go and grab yourself the latest edition of the excellent Advanced Programming in the UNIX Environment by Stevens [1]. C in practical action.

(Yes, I'm in the dive in the deep end learning camp.)

[1]: http://kohala.com/start/

A recursive descent parser.

A virtual machine and assembler. This is actually pretty straightforward and lots of fun.

Along the same lines: an emulator for an old computer or games console.

You could make a small game with SDL, perhaps?

One can make a commercial game title with C and SDL.

Case in point: Star Control 2, a commercial title with a cult following, which had subsequently been open sourced, then modified to use SDL. Still works all the way from Windows through mobile phones to Solaris, thanks to portable C and the SDL library, and is an excellent game to boot. Original ran on DOS and the 3DO gaming console.

Grab an AVR microcontroller. Make a clock.

Microcontroller C is extremely different from application C, but there are far fewer complicated concepts - like never having to touch memory allocation or string munging.

why not a blog? A pseudo device driver is a fun/easy project (at least for NetBSD); writing a simple lib and interfacing it with Ruby or Node is also a good weekend project.

I made a small database software engine. Learned a lot about pointers and C syntax, and I still use the software sometimes. SQLite is written in C, if you didn't know.

That's a really cool idea. Thanks.

What other experiences, non-C-based, will make you a better C programmer too? Ada, ML, Forth? Some other academic domain?

Learn assembly, then C will be easy.
