To become a good C programmer (2011) (fabiensanglard.net)
364 points by 6581 11 days ago | 132 comments





I feel like I wrote that a decade ago. Wait. Damn. It WAS a decade ago.

Amazingly the list is still relevant. I will try to check out the new suggestions on this thread.


Have you read "C Programming: A modern approach" (2nd ed)? And if so, what do you think about it?

I've found "Understanding and Using C Pointers" by Richard Reese to be a really wonderful book.

So glad to see this little gem mentioned! I have many books on C, but this is the only one I've ever read cover-to-cover. Clear and to the point, and no fluff/silliness. One of my favorite programming books ever!

Have your book recommendations changed over this decade? Would you recommend different books today?

No, I think these books are still a decent recipe for a good start.

There are a few mentioned on this thread I have not read yet so maybe these will change my mind.


In the 90s, most serious C programmers knew about these books. They were the core books most devs relied on and referred to. They're all good.

I disagree with this comment: "No website is as good as a good book. And no good book is as good as a disassembly output."

There are many websites today that are excellent. And there are many websites that cover obscure topics not found in books.

Finally, I think it's important to move past C89 to at least C99. I realize that in 2011 (the date of this post) that was less feasible, but today there is little reason for most projects not to use the advances found in C99.


Personally (3+ decades as a journeyman C programmer), I find a lot of the advances are often about trying to get people out of bad practices (e.g. stdint.h in C99 to fix portability issues between 32-bit and 64-bit platforms).

The names in stdint.h are OK, but the way they interact with library functions like printf() and scanf() is not so great. Given an int64_t value x, each of these works on some platforms and is wrong on others:

    printf("here's your int64_t:  %ld\n", x);
    printf("here's your int64_t:  %lld\n", x);
And the portable way using inttypes.h is really ugly:

    printf("portably int64_t:  %" PRId64 "\n", x);
There's another religious argument to be had over size_t vs ssize_t, but that can wait :-)

Then there’s the perennial problem of missing format specifiers for POSIX types like pid_t. Although you could argue that POSIX should mandate macros to appropriately handle them, the root problem is the same.

The default integer types have not aged well...


> missing format specifiers for POSIX types like pid_t

This bothers me less because I can always cast it up:

    fprintf(logfile, "pid: %lld", (long long)pid);
Technically not portable, but find a platform with PIDs where this breaks. You can count on long long being at least 64 bits, and I think it'll be a while before we need 128 bit process IDs. :-)

> The default integer types have not aged well...

To me, the real pisser is that every compiler copped out and decided to leave int at 32 bits when the architecture advanced to 64 bits. Part of the reason compilers exploit "signed can't overflow" is to cope with for loops that use 32-bit int index variables, and this has ugly consequences because most of the conversion rules in C are defined with respect to int.

And honestly, Microsoft should burn for leaving long at 32 bits.


On the Amiga, long was 32-bit and int 16-bit, and that was a long time ago, when Microsoft's int was also 16-bit. So they already shifted it up once (going from DOS to Win32, I guess).

The concept of int hasn't aged very well. People think it's fine because it's always 32 bits on their machine. Go recompile on a machine where it's 16 bits and see how that works out.

What I wonder is, why didn't they add new *printf functions? So you could say 'printf("int64: %d64", x)'

Backward compatibility. Your example would impose additional constraints on the %d specifier that would likely break existing code. What if you want an int immediately followed by literal ”64”?

Oops, I typoed. I said 'new *printf functions'; the example was supposed to be 'printf2("int64: %d64", x)'

I see! That would make more sense, but I still think the inttypes.h macros are a sufficient solution. Implementers and standards influencing users probably aren't too keen on another formatting language when you can implement it as macros that only take another few characters to use.

    printf2("int64: %d64", x);
for example isn't significantly shorter than

    printf("int64: %" PRId64, x);
Perhaps it's easier to read, but when you have to support both, and you're more likely to run into the latter than the former anyway, I think it just adds cognitive overhead to have to consider which formatting language is being used.

I think it's important to push forward with getting people out of bad habits. In a lot of cases, the only way to do that is to force it. I don't often agree with things Apple does, but their forcing people to use x64 is similar to this, I think.

> But today, there is little reason for most projects not use the advances found in C99.

Does visual studio support c99 yet? Last I heard MS wasn’t interested in supporting it.


It supports most of it. Useless stuff like VLAs is still unimplemented, though.

Why is VLA useless?

Why was it added to C99?


Because even C isn't free from the occasional misfeature slipping into the standard. VLAs come with a lot of caveats (e.g. the available stack size being limited and opaque, sizeof() becoming a run-time computation, etc).

VLAs were made optional in C11, so it's better not to use them at all for portable code.


You can use clang on VS

Really? No longer an MS user but occasionally my code gets ported to Windows so would like to know more.

MSVC in C mode is essentially C89 plus the extensions needed to compile the C++/C99-isms present in the Windows SDK and DDK headers. And Microsoft seems to be mostly interested in whether the compiler can build the NT kernel and drivers, and not much else.

That hasn't been true for quite some time. Visual Studio 2013 added many C99 features that aren't in C++, like compound literals and designated initializers.

The only notable C99 features not supported on the MSVC C compiler are: (a) VLAs, and (b) type-generic macros. The distinction between the MSVC C and C++ compiler is important though. The MSVC C++ compiler doesn't support any C99 features.

Nuance is needed; multiple sources form the overall picture. I found that doing hard things on an embedded or conventional shipped-product team is the best way to learn, and that working with effective people, who used a particular language defensively and wrote obvious code for maintainability, contributed the most to code quality.

For example, I taught myself Pascal, C, C++, assembly and Java (badly) before attending undergrad. The good thing about the undergrad program I attended was that there were SGI, HP-UX, Solaris and Linux boxes, so writing portable C was essential: you couldn't turn in a project if make didn't work on every platform, because they wouldn't say ahead of time which one they would use. Then I had an embedded internship at a GPS manufacturer, working on 900 MHz and 2.4 GHz radio firmware, where a C++ subset was used on a codebase that spanned about 100 products. Lots of refactoring of the test/flashing tools was also required, because commented-out code remnants had been checked in and interns had banged out 2000-line megafunctions without any enforced coding standards.


> And there are many websites that cover obscure topics not found in books.

Yup. Are there any books covering the same content that ctyme does? (http://www.ctyme.com/intr/int.htm)


Ah, the famous “Ralph Brown’s Interrupt List”. It was actually published as a book: https://www.amazon.com/PC-Interrupts-Programmers-Reference-T...

Just curious: this is not relevant nowadays, right?

It's relevant for any computer with a BIOS, and it's relevant for the coreboot project (although of diminishing value these days).

Agreed. I need to get some time-off/tech resort to study the fundamentals.

It's unfortunate he doesn't list the POSIX system interface, which has personally been an absolute boon for me:

https://pubs.opengroup.org/onlinepubs/9699919799/

You can search it using duckduckgo with !posix, too.

Something else I've found infinitely useful when digging into musl libc or doing assembly programming is the SysV x64 ABI: https://refspecs.linuxfoundation.org/elf/x86_64-SysV-psABI.p...


Thanks for the link to the POSIX docs, looks very useful. More recent specs for the x64 ABI can be found at https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI

One of the things I tend to think about these days is return on investment. I spent several years at the beginning of my career being a bad and then mediocre C programmer, and once I found a few other languages, I got the sense that being a mediocre-to-good programmer in these languages would be much easier, and that seems to have been borne out.

Late into a career investing in other areas, what's the advantage of becoming a good C programmer? Especially in a time where Rust and Go are viable options?


Rust isn't viable for embedded platforms, at least not yet. It's not as easy to compile it to the most obscure ISAs as C, and the little support it has for stuff like STM32 is restricted to just that group of microcontrollers and doesn't support the entire ARM range. Maybe in the future? I'm looking forward to that day!

Go is probably never going to run on microcontrollers due to its very large runtime overhead.

C maps directly onto the hardware of a processor. Every single design decision in C was made with the computer in mind. Memory, pointers, the stack, the call stack, returns, arguments: everything is so excellently designed.

Even Linus Torvalds says so! https://www.youtube.com/watch?v=CYvJPra7Ebk

If I were to make one change to C, it would be to completely rip out the #include system and bring a proper modules system. Apart from that, it's pretty much perfect.


> Every single design decision about C was made with the computer in mind.

The only problem is that computers have changed a bit in the last 50 years, and C largely hasn't. There are a couple of issues:

First, C was designed for single-pass compilers, because the PDP-11 it was designed for was too small to run a much fancier compiler. So C is seriously sub-optimal for optimization in a lot of ways (because the assumption was you weren't going to do compiler optimizations anyway), and there are some user-visible warts, like forward declarations, that are completely unnecessary today.

Second, the relevant questions with regard to CPU performance have changed a lot. Most notably:

- CPU performance has completely outstripped memory perf, so memory hierarchies and locality are everything

- Parallelism everywhere. Multiple cores, but also deeper instruction pipelines and other such things.

The way those things map to C is completely implicit; they don't show up in the language at all, and getting the machine to do what you want requires knowing things that the code wouldn't suggest at all.

I think if the same people had designed a language for a similar niche on today's hardware, a lot of things would be different.


Yes, the "high performance" argument for C is a joke at this point. Modern programs are not bound by ALU-heavy cycles; it's all about cache locality. C does not help, besides forcing you, out of a lack of expressiveness, to stick to simplistic data structures without too much indirection. Where it fails is at being unable to inline well (sans spooky LTO magic) because it effectively has no type system, bottlenecking your instruction cache where C++/Rust/Java would create optimized code. If C really were made to map to hardware, it would have a better story (at the language level) for heterogeneous computing, SIMD, vectorization, etc. But instead vendors had to create DSLs for these things, because of course the base language doesn't support them, because it's not low level.

Being tedious and almost as unexpressive as assembly does not make a low-level language.

C doesn't map to the machine and never did. Compilers and chip vendors map to C.

Further reading: https://queue.acm.org/detail.cfm?id=3212479


> The only problem is that computers have changed a bit in the last 50 years, and C largely hasn't. There are a couple issues:

The problem with this line of argument: assembly has also not changed. If the things you talk about mattered, we would be using a different assembly.


> we would be using a different assembly.

We are. It's called microcode.


And, you write microcode?

> CPU performance has completely outstripped memory perf, so memory hierarchies and locality are everything

> they don't show up in the language at all... requires knowing things that the code wouldn't suggest at all

I'm utterly confused at this.

It's trivial to lay out memory as you please, where you please, very directly, in C. Set a pointer to an address and write to it. Better yet, I can define a packed struct that maps to a peripheral, point it at its memory address from a data sheet, and have a nice human-readable way of controlling it: MyPIECDevice.sample_rate = 2000.

Keeping things physically close in memory has always been a strong requirement, for as long as caches, pages, and wider-than-one-byte memory buses have existed.


> Set a pointer to an address and write to it. Better yet, I can define a packed struct that maps to a peripheral, point it at its memory address from a data sheet, and have a nice human-readable way of controlling it: MyPIECDevice.sample_rate = 2000.

Just make sure you don't forget `volatile` in the right places. A lot of codebases end up just using their own wrappers written in asm for this kind of thing, because the developers have learned (rightly or wrongly) not to trust the compiler.

To be clear, it's not that hard to get the memory layout semantics you want in C. But issues around concurrent access, when it is acceptable for the compiler to omit loads and stores, and whether an assignment is guaranteed to be a single load/store or may be split up (which affects both semantics in the case of MMIO and atomicity) are all subtle questions, the answers to which are not at all suggested by the form of the code. The language is very much designed with the assumptions that (1) memory is just storage, so it's not important to be super precise about how reads and writes actually get done (in fairness, the lack of optimization in the original compilers probably made this more straightforward), and (2) concurrent access isn't really that important (the standard was completely silent on the issue of concurrency until C11). If you care about these issues, there's a lot of rules lawyering you have to do to be sure your code isn't going to break if the compiler is cleverer than you are. A modern take on C should be much more explicit about semantically meaningful memory access.

I think you can make a sensible argument that wrt hierarchies C is at least not a heck of a lot worse than the instruction set, so maybe I'm conceding that point -- though the instruction set hides a lot that's going on implicitly too. Some of this though I think is the ISA "coddling" C and C programs; in a legacy-free world it might make more sense to have an ISA let the programmer deal with issues around cache coherence. I could imagine some smartly designed system software using the cache in ways that can't be done right now (example: a copying garbage collector with thread-local nurseries that are (1) small enough to fit in cache (2) never evicted and (3) never synced to main memory, because they're thread-local anyway). Experimental ISA design is well outside my area of competency though, so it's possible I'm talking out of my ass. But the general sentiment that modern ISAs hide a lot from the systems programmer and that other directions might make sense is something that I've heard more knowledgeable people suggest as well.


>If you care about these issues there's a lot of rules lawyering you have to do to be sure your code isn't going to break if the compiler is cleverer than you are.

>A modern take on C should be much more explicit about semantically meaningful memory access.

If you are working on concurrent code close to the hardware, you're going to either have to accept a less efficient language or engage in rules lawyering. Unfortunately, granting the compiler license to perform the most mundane optimizations interferes with concurrent structures. Fortunately, with C there are rules to lawyer with, and they actually are simple. No matter what, rules will always need to be learned.


I definitely agree with all your criticisms of the memory semantics in C, and I would love a language that fixed these flaws, but the "ideal" low-level language is still a lot closer to C than it is to anything else. I also think that C, being low-level, is much better poised to deal with experimental ISA designs than higher-level languages. For instance, one mechanism of manual cache control could be that you set bit 63 in a pointer to indicate that loads/stores through it should treat the corresponding cacheline as high priority. That's pretty trivial with a pointer in C, but a lot harder with, say, a C++ reference.

> It's trivial to layout memory as you please, where you please, very directly, in C

It wasn't trivial before fixed width integral types, which is fairly recent in C terms (C99), and it's still far more complicated than it needs to be.

Furthermore, the fact that C is the defacto language of performance means that our hardware has been constrained by needing to run C programs well in order to compete.

Think of all the interesting innovation we could have had without such constraints. For instance, see how powerful and versatile GPUs have become because they didn't carry that legacy.


>see how powerful and versatile GPUs have become because they didn't carry that legacy.

GPUs are the best example for why C is a good lower-level high-level language, seeing how CUDA is programmed in C/C++.

Do you have any examples of architectures that could exist if only they weren't constrained by legacy C?


> GPUs are the best example for why C is a good lower-level high-level language, seeing how CUDA is programmed in C/C++.

CUDA is not C or C++. That you can program GPUs in a C/C++-like language does not entail that C/C++ is a natural form of expression for that architecture.

> Do you have any examples of architectures that could exist if only they weren't constrained by legacy C?

Turing tarpit means that every architecture could be realized, but that doesn't make it a particularly efficient or a natural fit for the hardware.

For instance, consider that every garbage collected language must distinguish pointers from integer types, but no such distinction exists in current hardware, and the bookkeeping required can incur significant performance and memory constraints (edit: C also makes this distinction but it doesn't enforce it).

Lisp machines and tagged hardware architectures do make such a distinction, though, and so are a more natural fit. With such distinctions, you could even have a hardware GC.


>That you can program GPUs in a C/C++-like language does not entail that C/C++ is a natural form of expression for that architecture.

It's not a matter of what is/isn't a "natural form of expression." The point of C/C++ is to be high-level enough for humans to build their own abstractions over hardware (sounds like an OS, right?). The success of the design of C/C++ is that the creators had no knowledge of modern GPUs, yet GPUs can efficiently execute these languages with a little care from developers. We use other abstractions (e.g. SciPy on Tensorflow) because they are more appropriate for solving our problems, but they are built on C.

>Lisp machines and tagged hardware architectures do make such a distinction though, and so more naturally fit. With such distinctions, you could even have a hardware GC.

And why would that not be backwards-compatible with legacy C?

Particularly, I am rejecting the idea that C is somehow stunting hardware development - I see no evidence of this fact. I am also skeptical about the claim (although I will not reject it outright) that there is a language substantially better fit compared to C for low-level programming (e.g. embedded, kernel).


> It's not a matter of what is/isn't a "natural form of expression." The point of C/C++ is to be high-level enough for humans to build their own abstractions over hardware.

Sure it matters. If primitives don't map naturally to the hardware, then you have to build a runtime to emulate those primitives, just like GC'd languages do.

> The success of the design of C/C++ is in that the creators had no knowledge of modern GPUs, yet GPUs can efficiently execute them with a little care from developers

You cannot run any arbitrary C program on a GPU. This fact is exactly why GPUs were able to innovate without legacy compatibility holding them back.

Only later were GPUs generalised to support more sophisticated programs, which then permitted a subset of C to execute efficiently.

The progress of GPUs proves exactly the opposite point that you are claiming. If C were so perfectly suited to any sort of hardware, then GPUs would have been able to run C programs right from the beginning, which is not true.

> And why would that not be backwards-compatible with legacy C?

That's not the point I'm making. Turing equivalence ensures that compatibility can be assured no matter what.

The actual point is that CPU innovations were tested against C benchmark suites to check whether innovations effectively improved performance, and some or many of those that failed to show meaningful improvements were discarded, despite the fact that they would have had other benefits (obviously not all of them, but enough). It's simply natural selection for CPU innovation.

It's incredibly naive to think that only hardware influences software and not vice versa. For instance, who would create a hardware architecture that didn't have pointers? It would simply never happen, because efficient C compatibility is too important.

The problem is that C was given a disproportionately heavy weighting in these decisions. For instance, a tagged memory architecture would show zero improvement on C benchmarks, but it would have been huge for the languages that now dominate the software industry.

> that there is a language substantially better fit compared to C for low-level programming (e.g. embedded, kernel).

The limitations of C are well known (poor bit fields and bit manipulation, poor support for alignment and padding, no modules, poor standard library, etc, etc.).

Zig addresses some of those issues. Ada has been better than C for a long time. A better language than all of these could definitely be designed given enough resources; e.g., see the research effort "House" [1].

[1] http://programatica.cs.pdx.edu/House/


>If primitives don't map naturally to the hardware, then you have to build a runtime to emulate those primitives, just like GC'd languages do.

That's only half the equation. Hardware cannot save you from semantics that are less efficient. To use your example: every GC'd language must have a runtime system track objects, whether that is implemented with or without hardware support. That system constitutes additional overhead -- either precious silicon is used delivering hardware support for GC or clock cycles are used emulating that support. Either way, you're losing performance. C/C++ have semantics that are easy to support, in contrast.

>You cannot run any arbitrary C program on a GPU.

Nor can you run any arbitrary C/C++ program written for POSIX on Windows, or a program written for x86 on an STM32, etc. You have always had to know your platform with C/C++. The point is that they are flexible enough to work very well on many platforms.

>This fact is exactly why GPUs were able to innovate without legacy compatibility holding them back.

GPUs have become a lucrative business precisely because they have begun exposing a C++ interface. Look at how the usage of graphics cards has changed in recent years.

> If C were so perfectly suited to any sort of hardware, then GPUs would have been able to run C programs right from the beginning, which is not true.

No. GPUs _were not_ general purpose compute devices from the beginning, as you pointed out. You had GLSL, etc. but the interface exposed to programmers was not Turing-complete. From what I gather, GPUs have only had a Turing-complete interface since shader model 3.0, which first appeared in 2004. By 2007, you had nvcc. Today, C++ is very well supported by CUDA. You may as well be saying "You can't run C on a cardboard box, so it's obviously not well-suited to all hardware." Obviously, your hardware needs to expose a Turing-complete interface for a Turing-complete language to be able to run on it.

>The problem is that C was given a disproportionately heavy weighting in these decisions. For instance, a tagged memory architecture would show zero improvement on C benchmarks, but it would have been huge for the languages that now dominate the software industry.

At what cost? As I already pointed out, adding support for VHLLs at the hardware level means you are spending silicon space on that task, so languages like C will be slower. Yes, a lot of software is written in JavaScript, Java, and Python, and these languages would benefit from that hardware support. But people using JavaScript, Java, and Python generally rely on C services (memcached, Redis, Postgres, etc.) to do their heavy lifting anyway, which you just made slower.

>For instance, who would create a hardware architecture that didn't have pointers? It would simply never happen, because efficient C compatibility is too important.

No. It would never happen because the machine you just described would make a very bad general purpose computer.

>The limitations of C are well known

Yes, they are. But everything you listed isn't substantial. It's C, with a better standard library, standard support for controlling alignment/padding, and modules. That's not significantly different.


> CUDA is not C or C++.

CUDA is a C++ API. On modern hardware, it's programmed in purely standard C++.


Give me an example of where C is allowed to optimize your data layout and/or locality. AFAIK it is incredibly restrictive in this sense, because of how well-defined it is. The fewer guarantees it gave about layout, the more wiggle room it would have; languages with a GC can do some things to improve cache locality that languages like C cannot.

It's not; that's the point: the language/compiler cannot interfere with the programmer fine-tuning data structures to suit the underlying architecture.

I thought the discussion was around compiler optimizations? That's the point

I didn't read it as such. The point behind what I and the parent are saying is that the programmer is going to do much better at optimal memory layout than an optimizer can, and C allows manual control, while languages that can rearrange memory layout necessarily cannot.

> Every single design decision about C was made with the computer in mind.

A computer. The PDP-11. C was a terrible fit for a lot of the popular contemporary architectures when it was designed (PDP-10, Burroughs large systems, UNIVAC 1100, CDC mainframes, HP 3000), and continued to be a very poor fit for many computers in the 1980s (segmented 8086 and 286, 6502, AS/400, Lisp Machines except for Xerox, Connection Machine, Tandem, Novix/RTX, Transputer, etc.).


While it is true that C closely matches the hardware, it imperfectly matches the rich software abstractions needed for moderate-to-large software. The lack of namespaces, sane object creation and destruction features, etc., makes programming tedious. A large proportion of the code in many large C programs goes into imperfectly recreating the features that richer programming systems provide by default, and that repeats for every large program you write. It quickly becomes boring and unenlightening to recreate an exception-handling mechanism or a data structure implementation for the nth time. Why would anyone not prefer to skip the mere infrastructure and direct their attention to the interesting problems to be solved?

Another thing is that large C code bases tend to become ensconced in layers of preprocessor macros, which I think is a hack-y way of doing things.


TinyGo recently became an official project and seems extremely successful.

> Go is probably never going to run on microcontrollers due to its very big overhead.

My employer runs Go on microcontrollers. Definitely suboptimal, but as long as the cost of increasing hardware capacity to accommodate Go is feasible, it's a viable option.


Isn't it counterproductive to go out of your way to find better hardware to fit your language choice? Why not choose a lighter language?

Libraries might be easier to use, and you can use the same language on the server and hardware.

> Rust isn't viable for embedded platforms, at least not yet.

no_std allows the important hooks of panicking, output and allocation to be implemented by the user. It's also very easy to put in hard-coded pointers that represent memory-mapped hardware. And there's no GC. Furthermore, it's entirely possible to convert only a portion of a project to using Rust while working to gradually to replace/reimplement.

> If I were to make one change to C, it would be to completely rip out the #include system [preprocessor] and bring a proper modules system.

Congratulations, you've just reinvented Java, D, Rust and Go.

> Apart from that, it's pretty much perfect.

This seems like a religious opinion rather than an understanding of different paradigms. Have you been paying attention to why Java, Erlang, Go and Rust exist? Rust has numerous advantages over C that eliminate entire categories of problems without sacrificing speed. If you can't see that, then maybe you don't want to see it.


C is still king in embedded systems. It is also great at making you aware of the machine. In C, very little happens behind the scenes. If you want something to happen, you need to write some code. Objects will not initialize themselves, allocated memory will not free itself, there are no smart reference mechanisms, just explicit pointers. This understanding of the machine will help you understand why higher-level languages behave the way they do.

But anyway, programming skill is not so much about the language. A good programmer in one language will become a good programmer in any other language very quickly. Still, if I have to hire a programmer for a project in a language they don't know, I will tend to prefer C programmers over those who only use higher-level languages. The reason is that C programmers may do things that are not pretty, but they usually understand what they are doing; people who are only accustomed to some higher-level language may come up with better designs, but write code that makes no sense.


> It is also great at making you aware of the machine. In C, very little is happening behind the scene.

This is not true. It hasn't been true for decades. C is an abstract high level language on modern processors.

https://queue.acm.org/detail.cfm?id=3212479


True, but if it is still possible to be a C programmer who can more or less imagine what assembly their C code will generate (at lower optimization levels, at least), then isn't the real issue that x86 assembly abstracts away a lot of hardware detail about caches, pipelines, microcode, etc.? In that case, how can a low-level language exist on x86 at all?

I'm not sure that's true. There was a highly upvoted post recently on both HN and /r/programming that claimed to have a better sorting algorithm than libc's sort. After hundreds of comments across both forums, looking at the output in Compiler Explorer showed that the key difference was inlining.

As a C++ person, basically all micro-optimizations start at "look at the damn assembly" since people really suck at predicting what code will actually be generated.


How does inlining happening in C++ mean that the asm in C isn't predictable?

Even with inlining the point is mostly the same - there aren't small program changes that lead to big changes in what is generated.


Language fluency is very important but real programming is program architecture. Learning C++ or spending time with object oriented paradigms in general will boost your C abilities.

I agree with you 100%. Learning an object oriented language will make you a better C programmer just as learning C will make you a better, say, Java programmer.

My point is that learning C is definitely worth it. Not only does it have real-life applications, but it will help you become a better programmer in general. That doesn't mean you should limit yourself to C, though. Go and Rust are good too, and even the most hated languages (e.g. PHP) can teach valuable lessons.


If you are troubleshooting software problems in any language (except, usually, Java), sooner or later you dig down and hit C or C++. In those cases knowing C means you can solve the problem.

There are a lot of libraries written in C, like, thousands in Debian alone. You can use them from any language, but sometimes you need to write a little bit of glue C to get that to work. Sometimes it's less painful to just write your program in C++ or Objective-C so you don't have to debug your glue C.

If you want to write a library that can be used from any language, basically your options are C, C++ (but realistically its C subset), and Rust. Getting things to work all the time is easier in Rust than in C or C++. Getting things to work some of the time is easier in C.

But for most of these things it's probably adequate to be a mediocre C programmer. Unless your mediocrity is manifested in spending a week on tracking down a bug that should have taken an hour, maybe.


> Getting things to work all the time is easier in Rust than in C or C++. Getting things to work some of the time is easier in C.

I think this is a good generalization of the learning curves of the languages, but not necessarily productivity for an experienced developer. Rust has modern amenities like pattern matching, generics and a package manager.


You may be right; I'm still a novice at Rust. I've heard people with substantially more experience in Rust telling me they still find it slow going compared to C, but they may not be experts, and they may not be representative. And surely in domains like compilers, which benefit more from pattern matching and automated storage reclamation, Rust would have a significant edge.

For me it is the smaller dependency stack. For example, I needed some tooling that worked on RHEL/CentOS back to versions 4.x and 3.x (yes, I know). Also, someone at work wanted me to get into Swift... They had Ubuntu packages but nothing for RHEL or Fedora (that may have changed now though).

If you can spare the overhead, Perl is a good option for some of these. Much of the CPAN ecosystem will still work, and anything that doesn't require many external modules will almost definitely work with no changes. There are plenty of Perl scripts kicking around from that time, and earlier.

The biggest problem would likely be the bad code-quality standards of earlier periods (lots of budding programmers in the dot-com boom wrote a lot of poorly done code that survives today, which is responsible at least in part for Perl's reputation as write-only), but if you're just deploying your own code, that's less of an issue. Perl can be written to be readable and obvious; it just takes discipline. In that respect, I imagine it's a lot like C.


No idea when they were added, but there are Fedora packages for Swift now under the name "swift-lang". I don't use Fedora personally, but from what I can tell, they should be available in the default package repositories with dnf.

My take is this:

C is necessary in some paradigms/domains. But that list has been shrinking as other languages are born and mature.

Learning C, like learning most things, will still certainly lend itself to lots of things you do, even if you're not using C.

Why learn C? Because you want to do stuff on that list or because you feel like it. Why not learn something else instead? It may very well be that there's more worthwhile things to learn depending on your goals.


> Late into a career investing in other areas, what's the advantage of becoming a good C programmer? Especially in a time where Rust and Go are viable options?

It is the most popular programming language. There is a lot of code written in it. Most jobs involve maintaining extant code. So it's good for that. It's safe to say there will be a need for C programmers for the next 100 years.


Oh come on now. I will develop in "safe" languages where possible (my favorites are F# and Erlang), but when I need to do something on the hardware, I still use C (and C++). Rust and Go are not viable options for everything, especially when you need complete control.

Why does the Rust community have to make every discussion about Rust?


What can c do that rust can't?

The C ABI is still king.

Why does everyone always suggest K&R? I really don't see the big deal. It's short, contains only trivial toy examples with no real world application, and touches almost nothing on actual project architecture or best practices. I bought it expecting to learn to write programs in C but it's really just a reference manual on syntax.

> Why does everyone always suggest K&R?

Because it's the best no-bullshit book to learn C, period. Especially coming from garbage-collected languages. It's not going to teach you valgrind or GDB, or even how to compile and link C programs, but it will certainly teach you how to write C programs.

> only trivial toy examples with no real world application

Re-implementing POSIX commands is writing real world applications.

> and touches almost nothing on actual project architecture or best practices.

On what platform? Linux? Windows? Mac? Toolchains are so different on all these platforms that it wouldn't make sense to spend time on that. That book is about C, not about learning autotools and other horrors.


In short: It is a great starter. The book is so tiny it won't discourage people and get them going in no time.

> It is all you need to know about C…for the first few weeks.

It seems the author was aware of this. Isn't syntax a good place to start for a new language?


> trivial toy examples with no real world application

It teaches you how to write malloc. Plenty of the examples have real world applications. If you want a book on architecture then get a book on architecture. K&R is about C.


This is a great list of books - at least, I found the same ones were the most excellent. Also I really learnt a lot from 21st Century C (The author here said they wanted to stick to C89, which is fair enough.)

>To read great C code will help tremendously.

But then he just gives examples of games I don't know and am not really interested in. Anyone know some non-game C that is, I suppose, well-known and wonderfully-written?

Also, I'm learning about writing APIs; any good books or talks people could recommend, or examples of particularly good ones in C? (I'm particularly interested in maths/graphics-type stuff.) Thanks!


I find the source code of the Netsurf browser really beautiful:

- https://netsurf-browser.org/

- https://source.netsurf-browser.org/netsurf.git/tree/

Including the sub projects, for instance:

- https://source.netsurf-browser.org/libcss.git/tree/

- https://source.netsurf-browser.org/libdom.git/tree/

It feels immediately obvious and understandable, which is quite impressive for a browser. Everything is well documented, well separated, the code is clean and seems simple, functions are small, etc. I am really impressed. I'd want to work on such code. I have never worked on it though, and I am not connected in any way to this project.


Very well written code indeed. Thanks for the links. They could make error management less bug prone by using goto, but it's definitely high quality C code nonetheless.

The sqlite codebase is well-known for being a large, well-written C codebase:

https://github.com/sqlite/sqlite

The tests in particular are very impressive.

Some other notable C codebases: Redis, LuaJIT, FreeBSD, Memcached -->

https://github.com/antirez/redis

https://github.com/LuaDist/luajit

https://github.com/freebsd/freebsd

https://github.com/memcached/memcached


PostgreSQL’s source code is also really good. https://git.postgresql.org/gitweb/?p=postgresql.git;a=tree

Varnish and the Linux kernel should be in that list as well I think.

Nice! Here are the links:

Varnish --> https://github.com/varnishcache/varnish-cache

Linux Kernel --> https://github.com/torvalds/linux


I recommend the git source code, which was pretty neat last time I checked.

Recently I started working on a module for nginx and have developed a full-blown crush on the module loader and its relationship with the core functionality. I haven't seen anything else quite like it, in C, that is. Major newb talkin'.

Also Plan 9; it's nice to read some kernel code that hasn't been tortured by practical requirements for decades.


The core Linux kernel is well-known and mostly well-written C.

I'd say "advanced" C, not necessarily "well-written" C. There's certainly a lot of C in it, and I'd bet it's the largest open source C codebase.

They go beyond C, it's actually C + GCC extensions. The only reason clang can compile it (in some configurations) is that they painstakingly implemented each of the GCC extensions the kernel uses (and when they were almost done, the kernel started requiring a few more).

"C Interfaces and Implementations" is another one that comes highly recommended.

https://sites.google.com/site/cinterfacesimplementations/


This book is fantastic to tackle for anyone comfortable with C at the level of K&R. A great second book on C.

And one of the few examples of literate programming one might come across!


The K&R book was for decades my gold standard of a programming book. Perfectly clear and concise, its one notable failing being that it's obsolete.

The Rust Programming Language [1] has since equalled and surpassed it in my mind. Hats off for Steve Klabnik & Carol Nichols :)

(note, I'm not saying Rust is a better answer than Fabien's recommendation, just that its book is as high quality as the excellent K&R)

[1] https://doc.rust-lang.org/book/


I think for us laymen, the most difficult thing is to find a good reason to use C frequently. I have a few "legitimate" (in the sense that C is actually suitable for those) non-embedded projects that I'd like to take on in the future when I have time to pick up C:

1 - Recreate some of the Linux command line tools, including a command line interface similar to BASH;

2 - Implement a fully functional compiler or byte-code Virtual Machine for a stripped down scripting language;

3 - Write a key-value data store

Sadly I really can't name any project that is not system programming. I saw some projects that use C for backend (CGI style?) programming or game programming but I believe there are more suitable tools for either of them.

BTW, I also found C (or a stripped-down version of C++) very clear for teaching myself data structures.


C + knowing the Linux API and what is done in kernel vs user space can lead to very efficient Linux software. Since much consumer software these days is a webapp with a Linux backend, this sort of embedded optimization can make good consumer sense.

It's a pity that it's almost impossible for a non-programmer to find a C/C++ job nowadays. Much of the deeper knowledge takes too much time and effort to master from outside of the field, so I guess most people don't go this way.

A lot of widely used recent CLIs have been implemented in Go or Rust, for example; I don't think I would choose C for such a task these days.

https://github.com/BurntSushi/ripgrep

https://github.com/sharkdp/bat

https://github.com/jesseduffield/lazydocker

https://github.com/ogham/exa

https://github.com/docker/cli

https://github.com/cli/cli


> No website is as good as a good book

Good (and often forgotten) advice for more than just C...


I don't agree with this sentiment. People learn different ways. I personally learn through trial and error, and explanations of things that I haven't been able to "touch" yet don't help me at all.

> I don't agree with this sentiment. People learn different ways. I personally learn through trial and error, and explanations of things that I haven't been able to "touch" yet don't help me at all.

Learning through trial and error can be okay for custom problems that don't have well-defined solutions. For simple problems, it's highly inefficient. Extreme example: if I want to print something to the console in C, I can use trial and error for a half-hour to try to understand the nuances of printf(), or I can spend a few minutes reading a few pages in a book or a (good) online tutorial. Not only will I learn faster, I will also pick up best practices.

My experience has been that people who are really against learning through books/tutorials tend to have short attention spans. Which is okay—books don't work for you. But for people who do have the ability to sit and read, it can be much more efficient than struggling through trial-and-error.

Somebody has already solved your problems, and they can tell you in a few pages how you can solve them too.


I like the K&R book, but I think what really taught me the most about C was actually writing programs in C. And playing around with particularly interesting features like function pointers and OS library interaction such as dynamic memory allocation.

> The Standard C Library

> Or how errno came to existence?

This is interesting. According to this book, errno was created because they wanted system calls to work like ordinary C functions:

> Each implementation of UNIX adopts a simple method for indicating erroneous system calls.

> Writing in assembly language, you typically test the carry indicator in the condition code.

> That scheme is great for assembly language. It is less great for programs you write in C.

> You can write a library of C-callable functions, one for each distinct system call.

> You'd like each function return value to be the answer you request when making that particular system call.

This turned out to be unnecessary. For example, many Linux system calls return a signed value (ssize_t) holding either the result or the negated errno constant. When an error occurs, the C library stub function simply negates that value and assigns it to errno, which is typically a thread-local variable.

The C standard library provides many examples of bad design. All the hidden global data, the broken locale handling, the duplication found in so many str* and mem* functions... Freestanding C lacks most of the standard headers and is a better language for it. Understanding the historical context that contributed to these designs is very useful since it allows new languages to avoid repeating these mistakes.


>> Expert C Programming. This book is fantastic because it will bring your attention to what happens under the hood in a very entertaining way.

I am not a C programmer, nor was I trying to become one; I was just trying to be familiar with low-level programming and computer internals in general. But I enjoyed reading the book a lot. One of the stories I remember is about a programming contest at Carnegie Mellon University. The rules were simple: the program had to run as fast as possible, and it had to be written in Pascal or C. Here are some paragraphs extracted from the book about the result:

“The actual results were very surprising. The fastest program, as reported by the operating system, took minus three seconds. That's right—the winner ran in a negative amount of time! The next fastest apparently took just a few milliseconds, while the entry in third place came in just under the expected 10 seconds. Obviously, some devious programming had taken place, but what and how? A detailed scrutiny of the winning programs eventually supplied the answer.

The program that apparently ran backwards in time had taken advantage of the operating system. The programmer knew where the process control block was stored relative to the base of the stack. He crafted a pointer to access the process control block and overwrote the "CPU-time-used" field with a very high value. The operating system wasn't expecting CPU times that large, and it misinterpreted the very large positive number as a negative number under the two's complement scheme.”

This reminds me of Mel the real programmer. I'll quote:

Mel loved the RPC-4000 because he could optimize his code: that is, locate instructions on the drum so that just as one finished its job, the next would be just arriving at the “read head” and available for immediate execution. There was a program to do that job, an “optimizing assembler”, but Mel refused to use it.


Using C as a lowest-common-denominator for game engines makes sense but I do love having Objective-C (despite the platform limitations). It just makes expressing game logic so easy, with protocols, etc. and I would not want to do it all with just C.

Interesting comment they made about using C89 for portability. What's the state of C99 today? I know gcc doesn't support all C99 features, but what about other compilers? Is it worthwhile to use C99 in a new project?

You can safely use C11 or even C18 if you’re not too worried about support beyond Linux, macOS, and Windows with Clang. Game consoles and embedded development might require crappy old compilers that don’t support anything but C89, but otherwise use the latest version you can.


I believe GCC does support all of C99 and has for a while. https://gcc.gnu.org/c99status.html

GCC supports C17 already. Using C99 is fine, and plenty portable.

C99 is well supported now, but it has become increasingly clear that some of the changes to the definition of undefined behavior in the spec have made some very troubling compiler optimizations possible that can make C a lot less dependable. Later versions of C also adopt a new memory model that is very broken: memcpy is, for instance, impossible to implement in C17. Due to these issues the Linux kernel is no longer using standard C. So as it stands, C89 remains a good choice.

> Later versions of C also adopt a new memory model that is very broken.

You mean, adopting a memory model that properly supports multi-threaded code?

> memcopy is for instance impossible to implement in C17.

What? I assume you mean memcpy? How is it impossible to implement given that it's implemented in the standard library?

> Due to these issues the Linux kernel is no longer using standrard C.

As others have mentioned, the Linux kernel never used standard C. Just search for, say, the substring "__builtin" where it uses compiler builtins directly. It's also quite recent that Linux can be compiled by clang; if it were using standard C, such compatibility issue would not arise in the first place.


The new memory model tried to make it possible to implement fat pointers with reference counters. That means the compiler must be able to tell when a pointer is being copied. After the spec was ratified, several people pointed out that it's not possible to implement memcpy, since memcpy doesn't know what it is copying and might therefore copy a pointer without increasing the ref counter.

> Due to these issues the Linux kernel is no longer using standrard C.

The Linux kernel never used standard C, since the very first release (0.01) it already depended on GCC extensions.


It has always used extensions to do things like inline assembly, something you need when writing an operating system. What has changed is that compiler writers now interpret the new spec language in ways that Linus thinks are broken, due to the new definition of UB. Like this:

  void set_value_to_zero(int *p) {
      *p = 0;
      if (p == NULL)
          throw_an_error_and_exit();
  }

Writing through a NULL pointer is undefined behavior, and compilers now reason backwards from that: since *p = 0 is executed, the compiler can assume p was never NULL. Therefore p == NULL is known to be FALSE at compile time and the null test can be optimized out, silently removing a check the programmer deliberately wrote. (A dereference followed by a too-late NULL check like this caused a real Linux kernel vulnerability.) I (and Linus) think this is insane.


Where can I read more about this?

I consider [0] to be well written and fairly low fat.

0 https://github.com/Genymobile/scrcpy


I think you pasted the wrong URL.


I would add "C Interfaces and Implementations" to his (excellent) list.

This is a good list because it isn't a gigantic compendium about C.

C, the low-level language that cannot even get integer types right and believes that -1 is greater than 1... I wish the Zig language people get some industrial backers so C can finally start fading away to obscurity.

> I picked C89 instead of C99

Stopped reading right there.



