Hacker News
Essential C (stanford.edu)
222 points by krat0sprakhar on Aug 12, 2012 | 53 comments

C is an excellent first language when teaching in a CompSci-type environment.

If the students are from a different discipline (not aiming to be CompSci grads) or are much younger, then I think there are better choices, e.g. Python.

In a CompSci academic setting, I think learning assembly first (that is, as the first programming language, but in parallel with other classes such as computer architecture) is also a valid approach.

Programming from the Ground Up [1] is a very accessible introduction to x86 ASM, and even if you don't go any further than the examples in this book, you'll still feel the benefits.

[1] http://ftp.igh.cnrs.fr/pub/nongnu/pgubook/ProgrammingGroundU...

I am split on whether C/assembler or Scheme is the best introductory material for computer science.

What I am fairly certain of, though, is that just about everything in between is not. You don't get the best of both worlds, but rather the worst.

(I should probably clarify that I think there is a large disconnect between what makes a good language for teaching and what makes a good language for software development in a corporate environment.)

Those are two of the "pure" approaches, and recently there's a third one, something in the ML/Haskell orbit. C for the low-level/machine-oriented approach, Scheme for the lambda-calculus-esque approach, and ML/Haskell for the type-theory-oriented approach.

Whether the Scheme approach is the best theoretical introduction depends in part on how important you think type systems are to modern CS. Much of the PL community thinks the answer is "very important", since it views types as the basis for rigorously specifying program behavior. Rob Harper has taken that view at CMU, for example, using Standard ML in his revamped intro course. (I'm not strongly opinionated on that subject myself.)

I think the best introduction would be Scheme (to learn algorithms), assembly language (to learn machines), and then C (to unite the two). After that, I would recommend some strongly-typed language like Haskell or OCaml. And finally Python to actually get work done. :)

My friend said that UC Berkeley's undergraduate CS program followed this introduction. (In the 1990s, at least; I don't know about now.)

I think the case could be made for either C or Scheme because they are both quite small languages compared to most of the programming languages in common use. I've seen people start with something like C++ and spend more time playing language lawyer than learning about algorithms and data structures.

The benefit of Scheme over C is that people are less likely to have experienced it before university/college. We were taught Java and C/C++ in CS, but the problem was that many people already knew them, so you ended up with a situation where half the students were bored and the other half had no idea what the lecturer was talking about.

I think Scheme is the way to go because it's very easy to build a real compiler for it and introduce assembly that way (see http://scheme2006.cs.uchicago.edu/11-ghuloum.pdf if you don't believe me).

Thanks for the link, found a choice quote from there on page 234:

"If you are constantly looking for new and better ways of doing and thinking, you will make a successful programmer. If you do not seek to enhance yourself, 'A little sleep, a little slumber, a little folding of the hands to rest - and poverty will come on you like a bandit and scarcity like an armed man.' Proverbs 24:33-34"

"And thou shalt make loops" Exodus 26:4

I still prefer Python as the first language someone learns, especially in a school environment where someone is trying to decide on a major; starting off in C might be too much to grasp. It might turn the person off, whereas Python can get students up and running quickly and able to produce some really cool things (games with pygame, a Twitter-like app using Google App Engine, etc.). I'd rather get someone excited about CS first, then go back and teach them the important inner workings such as memory management, pointers, heap vs. stack, etc.

I'm in for assembly as a first language, provided it's for a clean architecture. A PC has all kinds of weird legacies (that enable it to boot MS-DOS and behave like an IBM 5150) that can make life miserable for an apprentice.

People say x86_64 is much cleaner than x86, but I'd prefer ARM, MIPS or 68K. I learned assembly on an Apple II and the 6502, while anachronistic by today's standards, is a nice place to start.

Maybe one day I'll find the time to learn enough of VHDL to implement a 6502-inspired 64-bit processor and build a reasonable computer around it.

I don't (technically) agree with assembly as a first language because of the daunting amount of system knowledge a beginning student needs.

However, once a student starts grokking C (or C++, for that matter), especially pointers, bit manipulation, and stacks, I think assembly is invaluable for really understanding what the computer and your code are doing. This is especially true when paired with the aforementioned systems course.

I guess I am agreeing with you in the end...

When you try to write JavaScript that takes advantage of a single engine such as V8 (hidden classes, type immutability, the performance of 31- vs 32-bit integers, the lack of 64-bit integer operations, the cost of closures, etc.), you end up spending so much time working within this indirection that C begins to look far less wasteful and more direct, in terms of both human and machine time, especially if you already have the general design of what you're trying to do figured out.

This gets at one of the reasons I like Common Lisp: Common Lisp allows you to program at a high level and then stepwise refine your program to make it more efficient, such as by adding type declarations, adding other declarations, changing your data types, and, at an extreme, changing your algorithms to run without CONSing (that is, eliminating dynamic allocation and garbage collection), not to mention whatever your specific implementation provides as an extension to the standard.

Most code can stay fairly high-level, but the tools are there and a standard part of the language if you need them.

I've seen some Lispers going 'full stack'. Clojurians (like this guy http://www.learningclojure.com/2010/09/clojure-is-fast.html) also aim for CPU cycles when they program. It's an interesting workflow: prototype high and fast, then project it down to bare metal.

The fact that Lisp systems have traditionally included an easy-to-use asm inspector right in the environment is an interesting aspect imo. It's actually more common in my experience for Lispers to look at their generated asm than it is for (non-embedded) C coders to do so, because it's so easy to just (disassemble #'foo).

It's indeed nice to have a direct connection between high and low. It's also part of Lisp culture to unfold layers of code transformations rather than stay in your own isolated layer, relying blindly on 'external tools' that will never be as syntactically and systemically integrated as... well, Lisp.

Lisp aside, having this point of view seems to be a good goal to reach.

ps: There's also one very important lower layer, the memory/cache subsystem.

The document talks about C89, even though the last copyright year is 2003. Isn't it about time we started moving on? Where are restricted pointers, flexible array members, and so on?

And, by the way, I think the comment at the bottom of page 27 is not correct:

The qualifier const can be added to the left of a variable or parameter type to declare that the code using the variable will not change the variable. As a practical matter, use of const is very sporadic in the C programming community. It does have one very handy use, which is to clarify the role of a parameter in a function prototype...

Actually, the use of const is encouraged as it helps the compiler to catch more errors as well as to enable some optimizations.

There's another error here:

The qualifier const can be added to the left of a variable or parameter type

If you have the declaration

  char* str;
Adding const to the left of the type doesn't make the variable immutable:

  const char* str;
This indicates that str will point to a character or string which must not be mutated. str itself can be reassigned. To make str immutable, you need

  char* const str; // immutable pointer to mutable string

  const char* const str; // immutable pointer to immutable string
Const does tend to be used little in pure C, though you pretty much can't escape it in C++ code. (I also use it a lot in C, and have type warnings/errors cranked up to max)

I just spent the past 4 months programming a library in C89. I would have MUCH preferred C99 if only for inline variable declaration.

The key issue for my project was being cross platform. Since Microsoft's compiler does not implement C99 and never will, my hands were tied.

Now, that's just my library. Other programmers might be willing to decide case by case which C99 features to use. Still, if the document dates back to 2003, C99 support everywhere would have been even more primitive. C89 would thus be the right choice for an introduction to C, to avoid confusion.

I'm curious: could you write your program in the subset of C99 that'll compile under C++, and then use C++ mode to compile it under Visual Studio?

I suppose I could, but that'd be too complicated for my taste. In fact, I wouldn't even need to go that far: many of the most useful features are implemented as extensions to C89 in the popular compilers. Instead I stuck to pure C89, which I knew would be universally implemented. I like knowing that I can set GCC to -ansi -pedantic and have a reasonable expectation of no drastic surprises.

C89 is primitive but charming in the simplicity. For instance while I disliked being forced to declare a function's variables before any code I now see that it encouraged me to keep functions very small. I would find ways to use fewer variables. Instead of 'for(int i = 0; i < max; i++)' I would use 'while(--max >= 0)', anything to avoid moving my cursor to the top of the function.

On the other hand, I miss C++'s exceptions. Checking for errors all over the place gets ugly and bloats the codebase. What would otherwise be a two-line copy operation between two objects gets two if blocks added to handle the minuscule chance of a memory allocation failure in some deeper frame.

This. Here are a few other especially inexcusable oversights that result from ignoring C99:

- telling students to make up types like Int32 or Int16 instead of using the fixed-width types defined in <stdint.h>.

- Claiming that // comments are "not technically part of the language".

- "C does not have a distinct boolean type."

The comment about const is correct in C. C++ created this dichotomy between const and non-const objects that makes life harder for everyone, but is needed because of how the language and libraries are organized. In C, const is used only with const strings and other data that is truly immutable, as well as a hint about the use of parameters.

I also thought C should be the first language anyone learns. Cause you learn the most (about the machine) and other languages are easy peasy compared (well except for assembler, of course).

I find assembler is _much_ easier than C. At least the x86/amd64 and arm assemblers that I know. And assembler is much closer to the hardware, so would recommend learning x86 assembler as the first language, then C, then some scripting language like Python. It is a hard start but totally worth it.

I don't think the first language you learn matters (except vocationally, of course!). What matters more is understanding that every language has its own 'take' on the problem of instructing computers.

This 'take' (more formally, the language's semantics) determines the kinds of runtimes, libraries, and tools that can be implemented for the language.

With this mindset, it doesn't matter which language you start with. When comparing languages, you go beyond simple syntax differences and compare the implications of their semantics. This lets you make more informed language choices for your projects.

tl;dr: learn multiple languages and compare them effectively to profit!

I consider myself a fairly decent programmer: better than a lot, but by no means near the best. Through necessity I've had to learn a LOT of languages over my career, and I have to agree with you that learning different types of languages will always be more beneficial than learning any one specific "special" language. That said, I HATE HATE HATE learning new languages. Most of the new languages I have to pick up these days are much like the old ones I've used in the past. The stumbling point for me is finding interesting starter projects to learn them with; I can't (and I assume a lot of other developers can't) just read a book on a new language and get comfortable with it. Even the best language books have pretty painful starter projects for experienced developers. What we need (or what I need to find if anyone knows one) is a site that lists a bunch of interesting projects that target the specifics of different languages with a repository system for people to upload their results to for any potential feedback.

Hello World or my CD collection manager is no fun for anyone who regularly just glances at the obvious first chapter of any language book (variables) for basic syntax.

Two comments:

I don't hate new languages. I hate the fact that I need 27 different programming and scripting languages on a weekly basis to accomplish my tasks. And for contracts in the Microsoft realm, it's a moving target that changes every two years.

As for project ideas, I usually port smaller projects that I've done for home, work or charity into the new language, then try to refine them using the appropriate patterns/style of the language. I've rewritten some semi-complex utilities in C#, JScript, PowerShell, Python and Ruby.

Since I deal mainly with databases, one idea that I've been playing with recently is grabbing public-domain data from the FCC or the Census Bureau and building desktop or mobile apps with the results. Example: take the Amateur Radio License information from FCC.gov and build an application.

"What we need(or what I need to find if anyone knows one) is a site that lists a bunch of interesting projects that target the specifics of different languages with a repository system for people to upload their results to for any potential feedback."

I would love this. The biggest hurdle for me learning any new language is what to write. I mean, I can only rearrange my iTunes library so many times. :)

I've long thought that people should start at the edges and move towards the middle. Start with an assembly language for exposure to the guts of the machine, and with something like Scheme for the theoretical stuff. Then work your way toward the middle with more common languages like C or Java or Ruby or whatever.

I think something higher level is a better first language, but C should be pretty soon after.

> Cause you learn the most (about the machine)

Except when it comes to cache, pipelining, registers, non-uniform memory access in general (not just cache), out-of-order execution and opcode pairing rules, SIMD architectures, and the existence of the processor status word. Other than that, yeah, C exposes you to a lot of worrying about memory allocation when algorithm design would be a better use of your time.

> other languages are easy peasy compared

The only way you could possibly say this is if the only languages you know are imperative Algol-derived ones with minimal type systems and no support for logic or declarative programming. Learning C doesn't make Prolog meaningfully easier. Learning C doesn't even make learning a mainstream language like SQL easier.

Understanding of the cache (specifically) and the memory hierarchy (in general), and of register allocation, are vital to writing efficient C code. Obviously, you don't have to understand that stuff to write marginal C code, but the idea that C hides it from programmers is a bit of a stretch.

I'm not sure what you mean by "the existence of a status word", since much of the expression syntax of C is a mapping of the status bits.

But C doesn't expose any of that in a meaningful, direct way.

You have to have a mental model of the cache and memory hierarchy in your head to write really efficient code in any language.

> the existence of a status word

When I'm writing a bigint package in assembly, I can see whether the previous addition set the carry flag. In x86, I even have an adc opcode. There's nothing like that in C.

This still doesn't successfully counter the statement that C is the best language to start with.

I'm biased since C is my first language. Do you recommend another better suited for the purpose?

I would recommend almost anything with an interactive shell over a traditional C environment.

Python and irb are nice starting points because you can start out by claiming they are just a calculator, where you have to press return instead of =.

From there, you can move on to variables (which save you from having to type, e.g., the gravitational constant or a VAT percentage over and over), then to looping (print multiplication tables), to arrays (store values for later use, or use them as input for a loop that prints year lengths for the planets, computed from their distance to the sun), and then to functions.

And all of that without having to teach people the difference between source, object code, and executable.

I agree that this is the case for casual programming. If you are teaching someone who has hardware experience (electrical engineer, etc), it is sometimes easier to build from the bottom up and C is as low as it goes without being assembly (a good and a bad thing).

For the low-level stuff, an assembly language is way better than C. C still involves a LOT of magic that you won't be able to fully understand without understanding how the machine and the compiler work. In assembly, a line corresponds to one instruction; the assembler translates each line into the numeric instruction encoding that the machine can interpret. That is a whole lot less magic than what happens in C. In addition, pointers are much easier to understand from a machine perspective than from a C perspective: they are just numbers indicating a location in memory. C makes that far too complicated with different pointed-to data types, pointers to local variables (including ones higher up the call stack), confusing pointer declaration syntax, etc.

You can learn the C syntax for assembly-language idioms later. Learning the concepts and the syntax at the same time leads to inefficient learning: when you're learning a concept like pointers, it doesn't help to also have to learn confusing syntax.

This does not apply to languages that cleanly abstract the machine like Scheme/Python/Haskell/what have you, but C lets the low level stuff shine through so much that you end up having to learn that anyway; you can't really learn it as an abstraction.

My first language was JavaScript/ActionScript and then later Java. It wasn't until after that that I learned C and C++. But I agree that C should be one of the first languages learned.

> This still doesn't successfully counter the statement that C is the best language to start with.

I wasn't actually trying to refute that statement above; I was responding to the specific arguments used to support it.

Anyway, I agree with the other poster who replied to you: Pick a language with a REPL, as instant reinforcement of concepts is essential to ingraining them into the mind. Having a longer turnaround time means the lesson gets diluted by being interleaved with too much process (save the file, build the program, run it, look at the output, consider it, etc.).

I don't understand how your comment is a refutation of the assertion that C will teach you more about the machine than other languages.

Which language would you recommend using if you want to learn more about the machine?

> I don't understand how your comment is a refutation of the assertion that C will teach you more about the machine than other languages.

It completely hides some of the most important aspects of any modern hardware from you. The only thing it really exposes you to is manual memory management, and even then the view of memory C gives you is grossly simplified compared to how memory actually works on any modern hardware.

> Which language would you recommend using if you want to learn more about the machine?

Pick a machine and learn that machine's assembly language.

He specifically called out assembly as obviously better in this respect than C...

I highly recommend this and a bunch of other C-related stuff in the Stanford CS library. Their pointer explanations helped me many, many years ago. http://cslibrary.stanford.edu/

Along with Kernighan and Ritchie, I would also highly recommend getting a copy of 'C: A Reference Manual' by Harbison and Steele. That book explains a lot of things by topic, with descriptive examples. It's especially useful as a quick reference.

Highly recommended.

I've been trying to get through K&R for the past week or so. I think this will make things a lot easier. Thanks a lot!

Well, that kind of invalidates part of the comment I was going to write: "How is this any better than K&R's 'The C Programming Language'?". I guess it all depends on background and personal preference.

I've never become comfortable with C's pointer/reference/dereference syntax -- I much prefer Pascal in this sense. I recently had a look at Ada and was pleasantly surprised -- it has a very straightforward syntax.

As others mention here, I think C should be learnt together with assembler (and computer architecture, especially the part about cache hits/misses). I find that:

is a great introduction to assembler and C -- and how the two (can) interact.

Another thing I find frustrating with C is that it's still a bit of a pain to work with Unicode/wide strings. On my to-do list is writing a short post on "Hellø wørld (with Unicode)", with some examples of wide strings in C, Ada, and possibly Pascal, along with the assembler output and a "pure" assembler version.

There is surprisingly little good material on the web for this (that I managed to find, anyway).

This looks like the perfect material to learn C if you already know how to program (or at least in C-like languages like Java and C#).

Nick Parlante is an exceptional instructor. I highly recommend Google's Python Class he instructed.

I really enjoy the short / succinct format of this, does anyone have links to similar documents for java / javascript / C++?

Oh, this is fantastic. Thank you!
