Amazingly the list is still relevant. I will try to check out the new suggestions on this thread.
There are a few mentioned on this thread I have not read yet so maybe these will change my mind.
I disagree with this comment: "No website is as good as a good book. And no good book is as good as a disassembly output."
There are many websites today that are excellent. And there are many websites that cover obscure topics not found in books.
Finally, I think it's important to move past C89 to at least C99. I realize that in 2011 (the date of this post) that was less feasible. But today, there is little reason for most projects not to use the advances found in C99.
printf("here's your int64_t: %ld\n", x);
printf("here's your int64_t: %lld\n", x);
printf("portably int64_t: %" PRId64 "\n", x);
The default integer types have not aged well...
This bothers me less because I can always cast it up:
fprintf(logfile, "pid: %lld", (long long)pid);
> The default integer types have not aged well...
To me, the real pisser is that every compiler copped out and decided to leave int at 32 bits when the architecture advanced to 64 bits. Part of the reason compilers exploit "signed can't overflow" is so they can optimize for loops that use 32-bit int variables, and this has ugly consequences because most of the conversion rules in C are defined with respect to int.
And honestly, Microsoft should burn for leaving long at 32 bits.
printf2("int64: %d64", x);
printf("int64: %" PRId64, x);
Does visual studio support c99 yet? Last I heard MS wasn’t interested in supporting it.
Why was it added to C99?
VLAs were made optional in C11, so it's better not to use them at all for portable code.
For example, I taught myself Pascal, C, C++, assembly and Java (badly) before attending undergrad. The good thing about the ugrad program I had was that there were SGI, HP-UX, Solaris and Linux boxes, such that writing portable C was essential: you couldn't turn in a project if make didn't work on every platform, because they wouldn't say which one they would use ahead of time. Then I had an embedded internship at a GPS manufacturer, working on 900 MHz and 2.4 GHz radio firmware, where a C++ subset was used on a codebase that spanned about 100 products. A lot of refactoring of the test/flashing tools was also required, because commented-out code remnants were checked in and interns banged out 2000-line megafunctions without any enforced coding standards.
Yup. Are there any books covering the same content that ctyme does? (http://www.ctyme.com/intr/int.htm)
You can search it using duckduckgo with !posix, too.
Something else I've found infinitely useful when digging into musl libc or doing assembly programming, is the SYSV x64 ABI: https://refspecs.linuxfoundation.org/elf/x86_64-SysV-psABI.p...
Late into a career invested in other areas, what's the advantage of becoming a good C programmer? Especially at a time when Rust and Go are viable options?
Go is probably never going to run on microcontrollers due to its large runtime overhead.
C perfectly matches on top of the hardware of a processor. Every single design decision about C was made with the computer in mind. Memory, pointers, stack, call stack, returns, arguments, just everything is so excellently designed.
Even Linus Torvalds says so! https://www.youtube.com/watch?v=CYvJPra7Ebk
If I were to make one change to C, it would be to completely rip out the #include system and bring a proper modules system. Apart from that, it's pretty much perfect.
The only problem is that computers have changed a bit in the last 50 years, and C largely hasn't. There are a couple issues:
First, C was designed for single-pass compilers, because the PDP-11 it was designed on was too small to run much fancier a compiler. So C is seriously sub-optimal for optimization in a lot of ways (the assumption was that you weren't going to do compiler optimizations anyway), and there are some user-visible warts, like forward declarations, that are completely unnecessary today.
Second, the relevant questions with regard to CPU performance have changed a lot. Most notably:
- CPU performance has completely outstripped memory perf, so memory hierarchies and locality are everything
- Parallelism everywhere. Multiple cores, but also deeper instruction pipelines and other such things.
The way those things map to C is completely implicit; they don't show up in the language at all, and getting the machine to do what you want requires knowing things that the code wouldn't suggest at all.
I think if the same people had designed a language for a similar niche today's hardware, a lot of things would be different.
Being tedious and almost as unexpressive as assembly does not make a language low-level.
C doesn't map to the machine and never did. Compilers and chip vendors map to C.
Further reading: https://queue.acm.org/detail.cfm?id=3212479
The problem with this line of argument: assembly has also not changed. If the things you talk about mattered, we would be using a different assembly.
We are. It's called microcode.
I'm utterly confused at this.
It's trivial to lay out memory as you please, where you please, very directly, in C. Set a pointer to an address and write to it. Better yet, I can define a packed struct that maps to a peripheral, point it at its memory address from the data sheet, and have a nice human-readable way of controlling it: MyPIECDevice.sample_rate = 2000.
Keeping things physically close in memory has always been a strong requirement, as long as cache, pages, and larger than one-byte-memory buses have existed.
Just make sure you don't forget `volatile` in the right places. A lot of codebases end up just using their own wrappers written in asm for this kind of thing, because the developers have learned (rightly or wrongly) not to trust the compiler.
To be clear, it's not that hard to get the memory layout semantics you want in C. But issues around concurrent access, when it is acceptable for the compiler to omit loads and stores, and whether an assignment is guaranteed to be a single load/store or may be split up (which affects both the semantics of MMIO and atomicity) are all subtle questions, and the answers are not at all suggested by the form of the code.

The language is very much designed with the assumptions that (1) memory is just storage, so it's not important to be super precise about how reads and writes actually get done (in fairness, the lack of optimization in the original compilers probably made this more straightforward), and (2) concurrent access isn't really that important (the standard was completely silent on concurrency until C11).

If you care about these issues, there's a lot of rules lawyering you have to do to be sure your code won't break when the compiler is cleverer than you are. A modern take on C should be much more explicit about semantically meaningful memory access.
I think you can make a sensible argument that with regard to memory hierarchies C is at least not a heck of a lot worse than the instruction set, so maybe I'm conceding that point -- though the instruction set hides a lot that's going on implicitly too. Some of this, though, I think is the ISA "coddling" C and C programs; in a legacy-free world it might make more sense to have an ISA let the programmer deal with issues around cache coherence. I could imagine smartly designed system software using the cache in ways that can't be done right now (example: a copying garbage collector with thread-local nurseries that are (1) small enough to fit in cache, (2) never evicted, and (3) never synced to main memory, because they're thread-local anyway).

Experimental ISA design is well outside my area of competency, though, so it's possible I'm talking out of my ass. But the general sentiment that modern ISAs hide a lot from the systems programmer, and that other directions might make sense, is something I've heard more knowledgeable people suggest as well.
>A modern take on C should be much more explicit about semantically meaningful memory access.
If you are working on concurrent code close to the hardware, you're going to either have to accept a less efficient language or engage in rules lawyering. Unfortunately, granting the compiler license to perform the most mundane optimizations interferes with concurrent structures. Fortunately, with C there are rules to lawyer with, and they actually are simple. No matter what, rules will always need to be learned.
It wasn't trivial before fixed width integral types, which is fairly recent in C terms (C99), and it's still far more complicated than it needs to be.
Furthermore, the fact that C is the defacto language of performance means that our hardware has been constrained by needing to run C programs well in order to compete.
Think of all the interesting innovation we could have had without such constraints. For instance, see how powerful and versatile GPUs have become because they didn't carry that legacy.
GPUs are the best example for why C is a good lower-level high-level language, seeing how CUDA is programmed in C/C++.
Do you have any examples of architectures that could exist if only they weren't constrained by legacy C?
CUDA is not C or C++. That you can program GPUs in a C/C++-like language does not entail that C/C++ is a natural form of expression for that architecture.
> Do you have any examples of architectures that could exist if only they weren't constrained by legacy C?
Turing tarpit means that every architecture could be realized, but that doesn't make it a particularly efficient or a natural fit for the hardware.
For instance, consider that every garbage collected language must distinguish pointers from integer types, but no such distinction exists in current hardware, and the bookkeeping required can incur significant performance and memory constraints (edit: C also makes this distinction but it doesn't enforce it).
Lisp machines and tagged hardware architectures do make such a distinction though, and so more naturally fit. With such distinctions, you could even have a hardware GC.
It's not a matter of what is/isn't a "natural form of expression." The point of C/C++ is to be high-level enough for humans to build their own abstractions over hardware. (sounds like an OS, right?) The success of the design of C/C++ is in that the creators had no knowledge of modern GPUs, yet GPUs can efficiently execute them with a little care from developers. We use other abstractions (e.g. SciPy on Tensorflow) because they are more appropriate to solve our problems, but they are built on C.
>Lisp machines and tagged hardware architectures do make such a distinction though, and so more naturally fit. With such distinctions, you could even have a hardware GC.
And why would that not be backwards-compatible with legacy C?
Particularly, I am rejecting the idea that C is somehow stunting hardware development - I see no evidence of this fact. I am also skeptical about the claim (although I will not reject it outright) that there is a language substantially better fit compared to C for low-level programming (e.g. embedded, kernel).
Sure it matters. If primitives don't map naturally to the hardware, then you have to build a runtime to emulate those primitives, just like GC'd languages do.
> The success of the design of C/C++ is in that the creators had no knowledge of modern GPUs, yet GPUs can efficiently execute them with a little care from developers
You cannot run any arbitrary C program on a GPU. This fact is exactly why GPUs were able to innovate without legacy compatibility holding them back.
Only later were GPUs generalised to support more sophisticated programs, which then permitted a subset of C to execute efficiently.
The progress of GPUs proves exactly the opposite point that you are claiming. If C were so perfectly suited to any sort of hardware, then GPUs would have been able to run C programs right from the beginning, which is not true.
> And why would that not be backwards-compatible with legacy C?
That's not the point I'm making. Turing equivalence ensures that compatibility can be assured no matter what.
The actual point is that CPU innovations were tested against C benchmark suites to check whether innovations effectively improved performance, and some or many of those that failed to show meaningful improvements were discarded, despite the fact that they would have had other benefits (obviously not all of them, but enough). It's simply natural selection for CPU innovation.
It's incredibly naive to think that only hardware influences software and not vice versa. For instance, who would create a hardware architecture that didn't have pointers? It would simply never happen, because efficient C compatibility is too important.
The problem is that C was given a disproportionately heavy weighting in these decisions. For instance, a tagged memory architecture would show zero improvement on C benchmarks, but it would have been huge for the languages that now dominate the software industry.
> that there is a language substantially better fit compared to C for low-level programming (e.g. embedded, kernel).
The limitations of C are well known (poor bit fields and bit manipulation, poor support for alignment and padding, no modules, poor standard library, etc, etc.).
Zig addresses some of those issues. Ada has been better than C for a long time. A better language than all of these could definitely be designed given enough resources, eg. see the research effort "House" .
That's only half the equation. Hardware cannot save you from semantics that are less efficient. To use your example: every GC'd language must have a runtime system track objects, whether that is implemented with or without hardware support. That system constitutes additional overhead -- either precious silicon is used delivering hardware support for GC or clock cycles are used emulating that support. Either way, you're losing performance. C/C++ have semantics that are easy to support, in contrast.
>You cannot run any arbitrary C program on a GPU.
Nor can you run any arbitrary C/C++ program written for POSIX on Windows, or a program written for x86 on an STM32, etc. You have always had to know your platform with C/C++. The point is that they are flexible enough to work very well on many platforms.
>This fact is exactly why GPUs were able to innovate without legacy compatibility holding them back.
GPUs have become a lucrative business precisely because they have begun exposing a C++ interface. Look at how the usage of graphics cards has changed in recent years.
> If C were so perfectly suited to any sort of hardware, then GPUs would have been able to run C programs right from the beginning, which is not true.
No. GPUs _were not_ general purpose compute devices from the beginning, as you pointed out. You had GLSL, etc. but the interface exposed to programmers was not Turing-complete. From what I gather, GPUs have only had a Turing-complete interface since shader model 3.0, which first appeared in 2004. By 2007, you had nvcc. Today, C++ is very well supported by CUDA. You may as well be saying "You can't run C on a cardboard box, so it's obviously not well-suited to all hardware." Obviously, your hardware needs to expose a Turing-complete interface for a Turing-complete language to be able to run on it.
>The problem is that C was given a disproportionately heavy weighting in these decisions. For instance, a tagged memory architecture would show zero improvement on C benchmarks, but it would have been huge for the languages that now dominate the software industry.
>For instance, who would create a hardware architecture that didn't have pointers? It would simply never happen, because efficient C compatibility is too important.
No. It would never happen because the machine you just described would make a very bad general purpose computer.
>The limitations of C are well known
Yes, they are. But everything you listed isn't substantial. It's C, with a better standard library, standard support for controlling alignment/padding, and modules. That's not significantly different.
Cuda is a C++ API. On modern hardware, it's programmed in purely standard C++.
A computer. The PDP-11. C was a terrible fit for a lot of the popular contemporary architectures when it was designed (PDP-10, Burroughs large systems, UNIVAC 1100, CDC mainframes, HP 3000), and continued to be a very poor fit for many computers in the 1980s (segmented 8086 and 286, 6502, AS/400, Lisp Machines except for Xerox, Connection Machine, Tandem, Novix/RTX, Transputer, etc.).
Another thing is that large C code bases tend to become ensconced in layers of preprocessor macros, which I think is a hack-y way of doing things.
My employer runs Go on microcontrollers. Definitely suboptimal, but as long as the cost of increasing hardware capacity to accommodate Go is feasible, it's a viable option.
no_std allows the important hooks of panicking, output and allocation to be implemented by the user. It's also very easy to put in hard-coded pointers that represent memory-mapped hardware. And there's no GC. Furthermore, it's entirely possible to convert only a portion of a project to Rust while working to gradually replace/reimplement the rest.
> If I were to make one change to C, it would be to completely rip out the #include system [preprocessor] and bring a proper modules system.
Congratulations, you've just reinvented Java, D, Rust and Go.
> Apart from that, it's pretty much perfect.
This seems like a religious opinion rather than an understanding of different paradigms. Have you been paying attention to why Java, Erlang, Go, and Rust exist?
Rust has numerous advantages over C that eliminate entire categories of problems without sacrificing speed. If you can't see that, then maybe you don't want to see it.
But anyway, programming skill is not so much about the language. A good programmer in one language will become a good programmer in any other language very quickly. Still, if I have to hire a programmer for a project in a language they don't know, I tend to prefer C programmers over those who only use higher-level languages. C programmers may do things that are not pretty, but they usually understand what they are doing; people accustomed only to some higher-level language may come up with better designs, but write code that makes no sense.
This is not true. It hasn't been true for decades. C is an abstract high level language on modern processors.
As a C++ person, basically all micro-optimizations start at "look at the damn assembly" since people really suck at predicting what code will actually be generated.
Even with inlining the point is mostly the same - there aren't small program changes that lead to big changes in what is generated.
My point is that learning C is definitely worth it. Not only does it have real-life applications, it will help you become a better programmer in general. But that doesn't mean you should limit yourself to C. Go and Rust are good too, and even the most hated languages (ex: PHP) can teach valuable lessons.
There are a lot of libraries written in C, like, thousands in Debian alone. You can use them from any language, but sometimes you need to write a little bit of glue C to get that to work. Sometimes it's less painful to just write your program in C++ or Objective-C so you don't have to debug your glue C.
If you want to write a library that can be used from any language, basically your options are C, C++ (but realistically its C subset), and Rust. Getting things to work all the time is easier in Rust than in C or C++. Getting things to work some of the time is easier in C.
But for most of these things it's probably adequate to be a mediocre C programmer. Unless your mediocrity is manifested in spending a week on tracking down a bug that should have taken an hour, maybe.
I think this is a good generalization of the learning curves of the languages, but not necessarily productivity for an experienced developer. Rust has modern amenities like pattern matching, generics and a package manager.
The biggest problem would likely be the bad code-quality standards of earlier periods (lots of budding programmers in the dot-com boom wrote a lot of poorly done code that survives today, which is responsible at least in part for Perl's reputation as write-only), but if you're just deploying your own code, that's less of an issue. Perl can be written to be readable and obvious; it just takes discipline. In that respect, I imagine it's a lot like C.
C is necessary in some paradigms/domains. But that list has been shrinking as other languages are born and mature.
Learning C, like learning most things, will still certainly lend itself to lots of things you do, even if you're not using C.
Why learn C? Because you want to do stuff on that list or because you feel like it. Why not learn something else instead? It may very well be that there's more worthwhile things to learn depending on your goals.
It is the most popular programming language. There is a lot of code written in it. Most jobs involve maintaining extant code. So it's good for that. It's safe to say there will be a need for C programmers for the next 100 years.
Why does the Rust community have to make every discussion about Rust?
Because it's the best no-bullshit book for learning C, period. Especially coming from garbage-collected languages. It's not going to teach you valgrind or GDB, or even how to compile and link C programs, but it will certainly teach you how to write C programs.
> only trivial toy examples with no real world application
Re-implementing POSIX commands is writing real world applications.
> and touches almost nothing on actual project architecture or best practices.
On what platform? Linux? Windows? Mac? Toolchains are so different on all these platforms that it wouldn't make sense to spend time on that. That book is about C, not learning autotools and other horrors.
It seems the author was aware of this. Isn't syntax a good place to start for a new language?
It teaches you how to write malloc. Plenty of the examples have real world applications. If you want a book on architecture then get a book on architecture. K&R is about C.
>To read great C code will help tremendously.
But then he just gives examples of games I don't know and am not really interested in. Anyone know some non-game C that is, I suppose, well-known and wonderfully-written?
Also, am learning about writing APIs, any good books or talks about that people could recommend, or examples of particularly good ones in C? (I'm particularly interested in maths/graphics-type stuff.) Thanks!
Including the sub projects, for instance:
It feels immediately obvious and understandable, which is quite impressive for a browser. Everything is well documented, well separated, the code is clean and seems simple, functions are small, etc. I am really impressed. I'd want to work on such code. I have never worked on it though, and I am not connected in any way to this project.
The tests in particular are very impressive.
Some other notable C codebases: Redis, LuaJIT, FreeBSD, Memcached
Varnish --> https://github.com/varnishcache/varnish-cache
Linux Kernel --> https://github.com/torvalds/linux
Also plan9; it's nice to read some kernel code that hasn't been tortured by practical requirements for decades.
And one of the few examples of literate programming one might come across!
The Rust Programming Language has since equalled and surpassed it in my mind. Hats off to Steve Klabnik & Carol Nichols :)
(note, I'm not saying Rust is a better answer than Fabien's recommendation, just that its book is as high quality as the excellent K&R)
1 - Recreate some of the Linux command line tools, including a command line interface similar to Bash;
2 - Implement a fully functional compiler or byte-code virtual machine for a stripped-down scripting language;
3 - Write a key-value data store.
Sadly I really can't name any project that is not system programming. I saw some projects that use C for backend (CGI style?) programming or game programming but I believe there are more suitable tools for either of them.
BTW, I also found C (or a stripped-down version of C++) very clear for teaching myself data structures.
Good (and often forgotten) advice for more than just C...
Learning through trial and error can be okay for custom problems that don't have well-defined solutions. For simple problems, it's highly inefficient. Extreme example: if I want to print something to the console in C, I can use trial and error for a half-hour to try to understand the nuances of how to use printf(), or I can spend a few minutes reading a few pages in a book or a (good) online tutorial. Not only will I learn faster, I will also learn best practices.
My experience has been that people who are really against learning through books/tutorials tend to have short attention spans. Which is okay—books don't work for you. But for people who do have the ability to sit and read, it can be much more efficient than struggling through trial-and-error.
Somebody has already solved your problems, and they can tell you in a few pages how you can solve them too.
> Or how errno came to existence?
This is interesting. According to this book, errno was created because they wanted system calls to work like ordinary C functions:
> Each implementation of UNIX adopts a simple method for indicating erroneous system calls.
> Writing in assembly language, you typically test the carry indicator in the condition code.
> That scheme is great for assembly language. It is less great for programs you write in C.
> You can write a library of C-callable functions, one for each distinct system call.
> You'd like each function return value to be the answer you request when making that particular system call.
This turned out to be unnecessary. For example, many Linux system calls return a signed value (e.g. ssize_t) holding either the result or the negated errno constant. When an error occurs, the C library stub function simply negates that value, assigns it to errno (which is typically a thread-local variable), and returns -1.
The C standard library provides many examples of bad design. All the hidden global data, the broken locale handling, the duplication found in so many str* and mem* functions... Freestanding C lacks most of the standard headers and is a better language for it. Understanding the historical context that contributed to these designs is very useful since it allows new languages to avoid repeating these mistakes.
Mel loved the RPC-4000
because he could optimize his code:
that is, locate instructions on the drum
so that just as one finished its job,
the next would be just arriving at the “read head”
and available for immediate execution.
There was a program to do that job,
an “optimizing assembler”,
but Mel refused to use it.
You mean, adopting a memory model that properly supports multi-threaded code?
> memcopy is for instance impossible to implement in C17.
What? I assume you mean memcpy? How is it impossible to implement given that it's implemented in the standard library?
> Due to these issues the Linux kernel is no longer using standard C.
As others have mentioned, the Linux kernel never used standard C. Just search for the substring "__builtin" to see where it uses compiler builtins directly. It's also quite recent that Linux can be compiled by clang; if it were using standard C, such compatibility issues would not have arisen in the first place.
The Linux kernel never used standard C, since the very first release (0.01) it already depended on GCC extensions.
void set_value_to_zero(int *p)
{
    if (p == NULL)
        *p = 0;
}
In C89, writing to NULL gives you undefined behavior. In C99, the compiler can assume that no code path will produce code that writes to NULL. That's a huge difference! It means that looking at the above code, the compiler can reason: since there is a path that writes through the pointer p, the compiler can assume that p will never be NULL. Therefore p == NULL is known to be FALSE at compile time, and the null test can be optimized out. I (and Linus) think this is insane.
Stopped reading right there.