Can anyone recommend an intermediate to advanced C book? I feel like most C material I've been exposed to in the past lacks what a modern C project should have when it comes to best practices, code organization, build tools (should I use make for this, cmake, where does autotools fit? what about clang vs gcc?), linters (valgrind?), testing, etc. I recognize that much of this falls outside of C, but all of these tools seem like things C projects routinely grapple with.
What I liked about this book is that it's very upfront about how messed up C is. I read about half of it and it made me never want to write any C ever again.
An especially memorable part for me was how it took 3 pages of language lawyering to explain why this doesn't compile:
I don't see this as a good example of "how messed up C is". It doesn't compile because you're violating const correctness, and any other language with similarly sound correctness requirements would flag it in a similar way. If anything, this is one of those rare cases where C chooses correctness over convenience.
It takes quite a bit to explain because the "common sense" is that if one level of pointer indirection allows you to pass non-const where const is expected, then two levels shouldn't be any different. But common sense is wrong, and the compiler is right. And it doesn't require any language lawyering, either - all you need to do is slightly tweak the example to show why exactly it is unsafe.
A better example of "messed up" in that snippet is that it's not a compile-time error (but is very nasty undefined behavior) for the function without an explicit return type (so defaulting to int) to end without returning anything.
Compilers can - and, indeed, do - diagnose UB as compile time errors, or at least warnings (which you can then turn into errors if you want) all the time.
Now, it is not undefined behavior for the function to not return anything despite having a return type. It is UB for the caller to try to use the returned value, but in this case it's not actually used.
The implicit int feature is really very much deprecated (in fact, it was already removed in C99, almost 20 years ago!). If, for some mysterious reason, you're trying to compile code like that, it's probably very old code dating to before C was an ANSI standard and before the void return type existed. In such code, it would be pretty common for functions to not return anything, because semantically they don't - it was just a quirk of the language that there was no way to express a non-value-returning function back then, and so returning an (undefined) int became idiomatic. In C89, this entire behavior was retained largely because backwards compatibility was necessary. C99 finally fixed it.
* Clang vs gcc: doesn't really matter. Clang has slightly better compile times. GCC has really advanced but in some cases still has inferior warning/error messages. Best practice is not to really use GNU C features unless you want to. If you do want to, decide if you want to just use the subset supported by clang or the whole shebang from gcc. Beyond that: if you want to support both, test on both before release and do release builds using whichever produces faster or smaller (choose whichever metric you prefer) binaries.
* Build systems: plain old make is sufficient in most cases. I would recommend autoconf over cmake just because it's significantly simpler, although cmake has better cross-platform support (it can generate Visual Studio build files, for instance). I wouldn't use something outside of make, cmake, or autoconf, even if it seems like it's better in every way than those, because it probably isn't, and even if it is, you lose out on the battle-tested, available-everywhere, and widely-used nature of the above build systems. I shouldn't have to install a build system to build your program, and if I do, I should be able to use that build system to build a significant number of other programs too.
* Linting: not really necessary IME. Aim for no compiler warnings, though.
* Yes, use valgrind. Anything it complains about, fix. Uninitialized values and out-of-bounds memory accesses are significantly more important and worrisome than memory leaks (because they represent potential attack vectors), but still, it won't complain about something unless it's actually something that should be complained about.
* Testing: like the other commenter said, just a little bit of macro magic and you're golden.
* Code organization: headers in include/, source files in src/. You can separate src/ into subdirectories if your project grows to sufficient complexity that the source files become difficult to wrangle. Separating the headers into separate directories is probably not necessary, however.
* Not directly C-related, but pick a SANE code style and stick to it.
* Debugging: make a separate target that compiles in debug symbols and disables optimizations, and use gdb (or lldb) on it. You don't need much to get started: 99% of the time, all I do is "break main", "run" ("r" for short), "backtrace" ("bt" for short), "frame <number>" (to switch between stack frames), and "print <variable>" ("p <variable>" for short); it's taken me quite far.
> Uninitialized values and out-of-bounds memory accesses are significantly more important and worrisome than memory leaks (because they represent potential attack vectors)
I agree with your conclusion but not your reasoning. Attack vectors are way down the list of reasons why uninitialised values and out of bounds memory accesses are higher priority. They're higher priority because they will likely (maybe definitely) crash your program or (even worse) cause it to fail in a frustratingly indeterminate manner. You should be so lucky that your program becomes significant enough to be targeted by "attackers".
I like your thinking regarding using simpler build tools. I took this to the extreme and now my build tool for personal C/C++ projects is just a file called build.sh/build.bat. That does little more than:
gcc main.c
main.c #includes any other .c files that are needed (the term for this appears to be a 'unity build'). Compiling this way is /really/ fast, which is why it's okay to use a dumb build script that always recompiles everything.
I would tbh recommend make (BSD make is a more widely supported subset of GNU make) over a shell script, because it can automatically detect which source files need to be rebuilt and which can be kept from previous builds, greatly reducing compile times. Also, it supports parallel builds, although this one is relatively easy with a shell script.
If the sole purpose is to track which files changed, then redo is arguably a simpler solution to this problem - you don't need to learn a new language either (and its funky rules like the tab/space difference), just shell scripts with a couple new commands.
I do exactly the same thing whenever I write C/C++. Genuinely large projects aside, I see no compelling reason to waste your time with build systems. This approach is simple, easy to maintain and lets you get on with writing actual code.
Because I don't see any value from using it. It's not easier to read or write, it doesn't force you to keep it sane and simple like a bash/cmd script generally would, and it's just another dependency I don't really need. If I take a minimalistic approach to build systems, I prefer to go all the way.
Also, although this is merely a personal quirk that shouldn't persuade anyone else, I've seen enough horrific, unreadable make files to instinctively dislike them by now.
In my mind, shell scripts are simpler. I have no reason to need the extra complexity that make brings.
Also, I do a lot of programming on Windows, where GNU make would be another dependency to install. (Also, in my experience, make is slow on Windows, since it has to emulate fork()). I guess I could use Microsoft nmake, since I assume it's still installed along with Visual Studio, but again, batch files are simpler.
And for GNU tools on Windows, I would heavily recommend MSYS2 these days - having Pacman as the package manager is very nice, and there are already a lot of packages there.
I second this. Seriously the best textbook on systems programming I've worked through especially when accompanied with the famous CMU labs[0]. Anyone who works thoroughly through this book can become a master systems programmer.
Sure! I recommend sitting with the book, a pen, and a notebook at a cafe or wherever you like and writing solutions to the practice problems you see sprinkled in each chapter as you read every single word. Then choose a few of the homework problems and do those; some will require a computer. Most of all, work through the labs and don't cheat yourself by looking at other (probably not very good) solutions posted online! Solving the labs with the textbook and TLPI[0] as a reference is how I got the most out of the course. A list of the assignments, as they're done at CMU, is posted below[1]. Good luck!
One of the most challenging exercises in any language is to learn how to organize your code and modularize it at a large scale. I recommend reading Postgres's source code. But don't wander aimlessly, set a goal. For example, figure out how an sql select query is parsed and processed, how types are deduced, etc and learn the programming patterns applied as you go through.
I often recommend postgres because it's highly commented, and easy to understand.
I would recommend C: A Reference Manual (by Samuel P. Harbison and Guy L. Steele Jr.) along with K&R The C programming language.
C has lots of edges where it's easy for a beginner to make mistakes. For example, the minimal range of a char is either -127 to 127 (yes, C cares about one's complement machines) or 0 to 255. Not knowing this may result in code that works on one machine but breaks on others. So learning the standard is a must, especially the undefined behaviors.
I'm not claiming any C ability myself, but in addition to things which have already been mentioned: the old Steve Summit comp.lang.c C FAQ http://c-faq.com/ was only updated up to 2005 but is still hard to ignore.
As far as books go; K&R contains a lot of wisdom despite its apparent simplicity; and the language hasn't really changed that much since the last edition. But to answer some of your questions:
CMake is by far the most convenient build tool I've come across for C so far, the link below will give you an idea of how it looks. I find linters mostly a waste of time, but I run all my tests through valgrind before committing; I don't miss programming in C without valgrind at all; it's also an excellent profiler in combination with KCacheGrind. For testing, simple functions and a bit of macro magic on top go pretty far; C programmers tend to value simplicity, you won't find that many epic unit testing adventure frameworks out there. clang or gcc doesn't really matter, I prefer clang's error messages; and both support GNU extensions, which solve most of the boring problems with coding in C.
Thirdly: grab a copy of FreeBSD (or OpenBSD) and either (a) set it up in VirtualBox and SSH into it locally or (b) use an old ThinkPad. Then grab the source code of the base system. Build and install it. And start reading the code of things like usr.bin/grep/grep.c
I haven't done the code walkthrough course, but I bought and watched Kirk McKusick's Kernel Internals course and it is excellent (https://www.mckusick.com/courses/introdescrip.html). It is based around FreeBSD, but is a generic enough Unix internals course that it is good for Linux.
I'm thankful to have the opportunity to learn from someone with such deep knowledge of Unix, who was involved with BSD from the early days in the 80s to modern FreeBSD.
Not to pick on you specifically, but I've really grown to hate this type of response to this very common question. Most people new to a topic want an instructional manual or guide, not a technical reference. Man pages and tables of syscalls are decidedly the latter, and therefore primarily intended for people who are already familiar with the topics they cover.
> Most people new to a topic want an instructional manual or guide, not a technical reference. Man pages and tables of syscalls are decidedly the latter, and therefore primarily intended for people who are already familiar with the topics they cover
But this thread is about brushing up on OS and C programming. So not novices, but people who are already familiar with the topic.
Even then... Where are you going to start in reading references? A random syscall or function a day? I think it is far more useful to e.g. read the late W. Richard Stevens' Advanced Programming in the Unix environment. It puts everything in context, provides historical background where necessary, and gives examples.
Reference pages are not really for brushing up, but more for the 'what was the address family field of sockaddr called again'-type of questions? Or put differently: they are external memory.
> Even then... Where are you going to start in reading references? A random syscall or function a day?
Sure. Or browse through a bunch of them?
> I think it is far more useful to e.g. read the late W. Richard Stevens' Advanced Programming in the Unix environment.
That is closer to a reference than a novice tutorial.
> It puts everything in context, provides historical background where necessary, and gives examples.
Sure. So do good references. Even man pages do.
> Reference pages are not really for brushing up, but more for the 'what was the address family field of sockaddr called again'-type of questions? Or put differently: they are external memory.
It depends on your level, experience and your competence in the material I guess. I'm not saying it's the only thing you need, but in many situations, it's the only thing you need to brush up.
The Linux Programming Interface is my favorite software book of all time. I had a problem that I think a lot of other people had, where I learned C the language pretty in depth, but didn't really know what to do with it until I read this.
I've gone through the first few chapters of this book, it's fine. The exercises are somewhat lacking though, in my opinion (the last course I did was nand2tetris, which was project-based, so perhaps I have unrealistic expectations). I started learning C concurrently, and didn't find it a problem.
To learn which traps to avoid in C, have a look at SEI CERT C. It's written very well with examples and explanations, and is structured as a coding guide.
I recently decided to review operating systems, but my old Tanenbaum textbook from college would take forever to read through, especially now that I'm working full time. I've been watching the Berkeley lecture series on YouTube (CS 162) and it's been pretty good at covering the main topics concisely.
I graduated from college more than 2 years ago, so I forgot almost all of the CS topics that I learned and wanted to review the fundamentals (OS, databases, computer architecture, etc). And no, I don't use these concepts very much at work, only surface level knowledge. I've also been trying to learn more about malware analysis and reverse engineering, and it seems like having a solid foundation in CS concepts is key to being proficient at it.
Would you mind sharing what kind of software you write?
This is interesting to me, since I'm self-taught and seem to have done well in my career so far (4 years in), but now I'm going through core CS books and courses, worried that I'll hit a ceiling and wanting to fill the gaps in my knowledge. I work in devops and infrastructure.
Knowing and fully understanding the concept of undefined behaviour (UB), and how it differs from unspecified and implementation-defined behaviour, is a must for safe and secure C programming, because arguably C and C++ are broken at the specification level. Chris Lattner puts this in kind words:
Somewhat unrelated, but I've always wanted to ask: what types of small projects can a beginner start with to develop a better understanding of C? (Beyond basic syntax, something that would actually make use of memory allocation, pointers, etc.)
The best guided project/intro I've seen for complete beginners is Daniel Holden's "Build Your Own Lisp". It's a free online book that walks you through writing a Lisp interpreter (something worthwhile even for experienced C programmers!):
I think it’s always good to start with implementing data structures and algorithms...
If you want C++, try helping out with Firefox’s codebase. I did and it was a very rewarding experience. (The Linux kernel project is hmm... just harder to get involved in, IMO).
Indeed! It's my favorite technical book, several of the chapters have information I've yet to find laid out in comparable detail anywhere online.
The book was written for a CS course (15-213) at Carnegie Mellon University, course materials here (supplement, but no replacement for the book): https://www.cs.cmu.edu/~213/
I like this little book: in a very small amount of space it teaches you how to build a bare-bones OS, and from there you can start doing your own experiments, which I think is the best way to really learn something:
It's good to not only know operating systems in general, but also to understand some of the internals of the operating system you are actually using. For Linux, this is good:
The CS50 courses (This is CS50!) at cs50.io are currently one of the easiest and best introductions to the C language. It's a very modern approach to teaching C programming.
Will recommending my own project get me hell-voted? Cixl is a straightforward 3-stage (parse/compile/eval) interpreter that is designed to be as fast as possible without compromising on simplicity, transparency and flexibility. The codebase has no external dependencies and is currently hovering around 7 kloc including tests and standard library. There are some articles explaining design decisions, but I've been mostly busy with code. The code should be perfectly readable though, and contains plenty of good ideas distilled from 32 years of daily practice.