A Boggling Return to C (thraxil.org)
162 points by ColinWright on June 2, 2012 | 85 comments

I started with assembly language and I thought C was heaven. Of course, there are easier languages, there are more powerful languages, there are fancier languages and whatnot. I like Python and I adore Lisp. But I still love C most.

C is the sweet spot where I can extend my programs to do high-level stuff while still keeping my hands down on the actual hardware I'm programming. I like that a lot, probably because I grew up banging hardware. Or maybe it is because I like to keep in touch with the actual device that I program: bending some definitions, C is what my machine does. Even if I embed Lua and script parts of my program, I'm still conceptually working on a C runtime, complete with addresses, pointers, integers and registers. That's why I also like LLVM, as it abstracts away different instruction sets into a generic high-level instruction set, much like C abstracts different assembly languages into a generic high-level assembly language.

C also has the property of being the most enjoyable code to read. I've spent a lot of time just reading C out of sheer enjoyment. C is tricky: it can be a total mess or it can be terse and beautiful and clear. No matter what, it describes exactly what it does to my machine. Read some of D.J. Bernstein's source trees to get an idea of how neat C can be.

Reading DJB's code can be awesome in that his code is written clearly and neatly, yet at the same time it can be an exercise in frustration because he has forgone most of the standard library and written his own (especially string management, which is superior, in my humble opinion), so it can look foreign or weird to people looking at it.

The source to qmail/daemontools is a pleasure to read though and having read it I feel like I have learned a lot from what it has to offer.

djb once wrote a FORTH implementation for the IOCC. It would be cool to get it running under current BSD/Linux/Solaris.

Arthur Whitney is another I would put on par with djb. He's a bit older than djb.

For expertise you can't beat W. Richard Stevens. He also studied and wrote about FORTH before focusing solely on C.

Both Whitney and djb have a true appreciation for speed, efficiency and succinctness; both have solid foundations in maths; both can build very high-level abstractions. But they have different areas of focus.

djb - secure systems administration and networking. (Stevens - documentation.) Whitney - Lisp background; big data.

Whitney has proven that it's possible, using a matrix-based approach, to meet or beat the speed of C with an interpreted language.

But it's difficult to write UNIX systems or networking code without knowing C. For guidance on navigating the many pitfalls of C, djb and Stevens are as good as it gets.

My intro course in college was run this way as well and to this day, I'm very glad to have started my career bottom up instead of top down...

I did top down starting at Lisp. I'm not unhappy with this choice, know assembly by now, and by the way, totally love C.

Bottom-up and top-down are both better than the worst learning method, which is to start somewhere in the middle and just stay there.

Nice try Mr. Daniel J. Bernstein.

I absolutely love writing C. Up until late 2011, it was by far the language I used the most. I am now learning Lisp (not exactly, I'm using Lisp in SICP), and use Python more than before.

One thing I realized is that reading C is more tedious than reading code in other languages. Sure, that's a gross generalization and is not true for every piece of code out there. However, I find I have less trouble picking up a Python project, understanding how it's written, and starting to contribute than I have with C.

A few weeks ago, I was looking at the code of Qemu. The code relies heavily on preprocessor macros and some weird gcc-only syntax that made my head hurt. It was difficult.

I guess what I'm trying to say is that my only problem with C is that it doesn't force the programmer to write in a clear, understandable way. Or maybe that's just me.

> The code relies heavily on preprocessor macros and some weird gcc-only syntax that made my head hurt.

The preprocessor is generally recognized as one of C's biggest flaws. It is not for nothing that Ken Thompson, the first C programmer and the greatest influence on the language other than DMR himself, cut down most of the preprocessor when he wrote his own set of C compilers for Plan 9: http://doc.cat-v.org/plan_9/4th_edition/papers/comp

And of course Go has no preprocessor.

As for your second complaint, one can't blame C for gcc's extensions ;)

You should try Go. Many people consider it C's spiritual successor (or, as somebody put it, the language the people who created C would come up with if they had 40 years to think about how to polish and improve it). It keeps all the simplicity of C while being probably the most readable language I have used. It is concise but keeps everything explicit, and figuring out what code does is very easy, because code does what it says and says what it does, no dark magic needed.

I second your thoughts about Go. Much like the author of the post, I was really tired of Java and all the bloat that went with doing most things. I was going to get back into C and I had even blown the dust off my K&R. By random coincidence I saw a post about Go on HN and I've been using it ever since. It's a really great language. It does everything I wanted to get back into C for without any of the things I was dreading.

It keeps all the simplicity of C, without actually being able to write a kernel or device driver!

You can write kernels in Go, it even used to ship with a bare metal runtime and several people wrote kernels using it.

You cannot easily write device drivers for existing operating systems in Go because the existing operating systems provide a particular environment unsuited for Go and expect certain constraints from the device drivers themselves, constraints which Go breaks. In principle, it could be made to work.

The problem with the C preprocessor is that it's sat in a "sour spot" where it's often possible to use it for a task but only by bending it so far that the result is pretty ugly and not very comprehensible. If the preprocessor were less powerful it would be obvious you needed to use a different tool; if it were more powerful it wouldn't result in ugly messes; but it's sat in the sour spot in the middle.

(Make is another tool which suffers from 'sour spot' syndrome IMHO.)

Yep, but remember that both of these tools are wonderful if you don't abuse them.

The one thing that I have trouble with in Python projects is that it can be very difficult to figure out where the actual object is from that is being imported.

  from somedir.somefile import objectX
Then when you go to somedir.somefile you find out objectX is nowhere to be found only to figure out later that it is dynamically created and added to that namespace and it can be imported in the file you've been reading because some init function was already called by the module "foo".

There are quite a few codebases where I have spent a while hunting for a superclass, for example, to see whether it offered the functionality I wanted or how it was structured, or hunting for where some function was defined and what EXACTLY it did, because there was no documentation.

I love Python, don't get me wrong. It is by far one of my favourite programming languages, but sometimes it can be very non-obvious where something is coming from and how it is getting there. This may be more of an issue on a project to project basis, but it is an extra complexity that I have found can be rather annoying.
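A minimal sketch of how a name can end up importable from a module whose source never defines it. Everything here is hypothetical: `somefile` stands in for `somedir.somefile`, `init_plugins` for the init function that module "foo" would already have called, and `ObjectX` for whatever gets created dynamically.

```python
import sys
import types

# Hypothetical stand-in for somedir/somefile.py: a module whose source
# defines nothing at all.
somefile = types.ModuleType("somefile")
sys.modules["somefile"] = somefile


def init_plugins():
    """Hypothetical init function, of the kind module "foo" might call."""

    class ObjectX:
        def greet(self):
            return "hello"

    # The class is attached to the other module's namespace at runtime,
    # so reading that module's source will never show a definition.
    setattr(somefile, "objectX", ObjectX)


init_plugins()

# The import succeeds only because init_plugins() has already run.
from somefile import objectX

print(objectX().greet())            # hello
print("objectX" in vars(somefile))  # True
```

Grepping the source of `somefile` for `objectX` finds nothing, which is exactly the hunt described above.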

I have come to the exact same conclusion about Ruby. Finding the source of given behavior is where I tend to get the most annoyed. This is doubly so for any library which uses method_missing or other bits of Ruby dynamism. Yes, it can (and does) lead to terser, elegant code. But it also makes it challenging to analyze root causes, and I have become less enamored of this style as I have had to deal with it over the past few years.

In this respect method_missing is analogous to (over)using C's preprocessor. It seemed like a good idea at the time, but...

This isn't directly related to your complaint, but the source_location method in Ruby 1.9 is very useful:

    > method(:gem).source_location
     => ["/Users/wycats/.rvm/rubies/ruby-1.9.3-p194/lib/ruby/site_ruby/1.9.1/rubygems.rb", 1228]

if an object has not been excessively tampered with, the object's __module__ attribute will tell you where it was created.

The inspect module doc page is of great help: http://docs.python.org/library/inspect.html
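A short sketch of both suggestions, using `json.JSONDecoder` from the standard library purely as an example target. `__module__` is an ordinary writable attribute, which is why the "not excessively tampered with" caveat matters; the `Local` class is made up to show that.

```python
import inspect
import json

cls = json.JSONDecoder

# __module__ records the name of the module where the class was defined,
# which is not necessarily the module you imported it from.
print(cls.__module__)              # json.decoder

# inspect can map the object back to its module and its source file.
module = inspect.getmodule(cls)
print(module.__name__)             # json.decoder
print(inspect.getsourcefile(cls))  # path of the defining file


# The attribute can be overwritten, so it is a hint rather than a guarantee.
class Local:
    pass


Local.__module__ = "somewhere.else"
print(Local.__module__)            # somewhere.else
```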

maybe the biggest improvement in c programming over the last decade is valgrind, making purify-like debugging available to everyone. it's completely removed a whole class of annoyingly hard to fix bugs from my code.

Tools only created to solve the lack of safe constructs in C.

C made lots of sense in the context it was developed, but the world would be better if we had safer systems programming languages.

Valgrind, purify and friends are required for C, the same way Java requires IDEs, to improve language usability.

Having said this, the new trend of integrating static analysis tools into the development process, like Clang, Eclipse Codan, Visual Studio's tools or HP Code Advisor, among others, can bring a bit more safety into C.

C is popular, good enough, portable, and fast. Skilled C programmers do not need profilers, bounds checkers and memory leak detectors. They help some people, though, no doubt. When size and speed matter, as in operating systems, C has won and continues to win the survival of the fittest competition. Almost all the programs I use are C (and/or C++): bash, linux, perl, firefox, awk, sed, grep, apache, mysql, etc. I assume good chunks of Java are written in C, but I'm not sure.

We have "safer" languages but our systems and tools remain in C. Given the intense competition in software, there must be solid reasons C remains the foundation of computing.

I'm with you on the advantages of C, but this is not true:

"Skilled C programmers do not need profilers, bounds checkers and memory leak detectors."

Skilled programmers use profilers because the alternative (guessing) is a poor strategy for diagnosing poor performance. And no skilled programmer would spurn a useful tool like a bounds checker or memory leak detector, because nobody is perfect and these tools save immense amounts of time by pointing you straight to the problem.

Not need profilers? I haven't run into a C (or any other language) programmer so skilled that he's even usually correct at finding the bottleneck without a profiler.

Programmers who don't use profilers write slow programs (even though they spend many hours "optimizing" them).

By that metric, here's a list of non-skilled programmers:

- Donald Knuth (wrote a study on profilers)

- Rob Pike (wrote a profiler, 'though for FORTRAN)

- Brian Kernighan (used a profiler to double the speed of his AWK interpreter)

I wonder who you consider a skilled C programmer, if not K from K&R.

> ... Skilled C programmers ...

Sadly you don't find many of them in the enterprise world, especially when dealing with off-shoring companies. :(

> ... Given the intense competition in software, there must be solid reasons C remains the foundation of computing.

Because it is dumb to code everything new, just because another language is cooler, nicer, safer, etc. So existing software keeps being coded in C.

Even if I expressed my opinion the way I did, I will surely pick C if it makes sense for the project at hand.

In the case of C, safer also means weaker and less useful: it is a language where you need to be able to do everything. In some ways C is "Assembler-complete". With valgrind you have no compromises, because you have a specialized runtime giving you control over what happens only at debugging time.

There is lots of great discussion going on in this thread. I particularly enjoyed the article, which took a "deliberately novice" approach, and was very Zen-like.

Why attack C at all? Why not instead convince us of the merits of another language?

There are great tools around. I agree with that. Which parts of Clang do you like best?

The static analyser, friendlier error messages and the plugin architecture that allows it to integrate easily in IDEs.

The refactoring support and modules extensions are also welcome additions.

Absolutely. Also the gcc mudflap library, the splint static checker, and Torvalds's sparse have really helped.

And that's without the commercial tools such as Coverity.

Valgrind has almost single-handedly rescued C from the ash heap of history.

This is why Zed pushed it so hard in Learn C the Hard Way.

Some people actually had the gall to complain about him ensuring the noobies learned to use valgrind before proceeding to write any real code.

Ungrateful dipshits don't remember what the pre-valgrind/dtrace days were like.

Is Learn C the Hard Way a useful re-introduction? Like the article, I need to re-learn C.

Last time I wrote anything in C was on Netware NLMs, and it has been long enough that I have mostly forgotten what I knew.

i was looking for a good book to bring me up-to-date on recent changes to c and the best suggestion i received was to get the latest edition of harbison and steele's book (which includes c99).

i don't know if you know the book - an older copy is on my desk and i use it regularly when working in c - but it's part introduction and part informed guide to the libraries. it's not a "friendly" book (it's not for "dummies"), but it's well written and surprisingly compact for all it contains (at least, the copy i have is; i am waiting for delivery of the latest version).


valgrind is only available for a few platforms/architectures. If you are unable to write proper C programs without valgrind's help, please go home. :-)

Depends what you mean by "proper". Any fool can get a C program to work, but ensuring there are no memory leaks etc is very hard without a tool like Valgrind.

If you don't believe me, just run Valgrind on a random sampling of C programs that didn't use such a tool - I reckon it will find issues with most of them.

Valgrind (and tools using its methodology) isn't the only way to solve these problems. Libumem finds memory leaks too, and without imposing an immense runtime cost. It also finds many types of memory corruption without the runtime overhead that often changes the program's behavior that you're trying to debug.

Agreed. Valgrind can be very helpful; it has saved me a few times. You know: huge modular application, one-line bug, dead-end situation. Too bad the output is a mess, oops, full of details :-)

One can always write proper C programs without Valgrind: it is "just" a (really useful) debugging tool. The utility of it is that instead of spending days trying to track down any mallocs missing frees, one can just run Valgrind to find them and their locations straight away.

I guess what my grandparent said, albeit in a slightly dismissive tone, is that relying on tools like Valgrind or Purify to produce good code makes for weak programmers. The days spent trying to track down missing frees can be reduced to hours with practice, and the programmer skilled in this hunt is less likely to forget the free in the first place. (As opposed to the programmer relying on Valgrind and not caring much when writing the code the first time.)

I do not necessarily abide by this point of view, but I do respect the kind of harsh discipline it advocates. Valgrind is definitely useful, really useful, but great C programs (and programmers) existed long before it appeared.

You should not be worried about this: even with valgrind a few bugs will take days of debugging to get fixed...

Even with Valgrind, I can't believe our industry has ever produced a single great C program. Certainly not Linux or TeX or GCC, all of which I've seen crash yet still had to use because everything else is even worse. Software engineering is still at the leeches-and-evil-spirits stage of maturity, and the continued use of platforms like C (where very common constructs can cause undefined behavior) is a symptom.

you write bug-free code? please, tell me more.

Yes, please, write proper C without Valgrind so I have to spend my time fixing all the memory corruption issues in it =)

The advantage of not feeling tied to valgrind is that you don't need to run your program under valgrind to get the benefit of whatever mechanisms are in place to detect scribbles, accesses to uninitialised data, and leaks.

(valgrind's lack of availability is a bit of a problem. I'm not complaining - I bet it is a bit fiddly to port it to a new system - but it's very easy to never have come across any system that can run it in your professional life. So being able to work without it is no waste of time.)

Uninitialised data and memory scribbles can be tricky to detect 100% reliably without valgrind, but if you code appropriately, you'll spot it. Leaks are very easy to find (fixing them, not always so much), and I don't really understand why one needs this monster program to discover them - but maybe one day I'll actually be in a position to use it, and I'll find out what I'm missing.

I'd love to have time to play around with C again. I picked up C back in the late 80s when I became convinced that no matter what other programming languages I knew, I wasn't going to be a "true" professional unless I could sling code in C and C++. Fun times.

I also love the idea of using a trie here. That's something else I've been wanting to play around with for a while. Although now I'd do it in a functional language.

He brings up a good point: people coding in C tend to stop and think about data structures, memory usage, and clock cycles in a way people using higher-level languages very rarely do. It's part of the way to "think in C". Internal data structures are also much more important in FP. Interesting how different languages cause you to think in different ways. (Sapir-Whorf anyone?) :)

I also think it really depends on how you are taught or where you start...

I started programming in C/C++ (my first book, I'll admit at age 9 was a C++ for Dummies book, it came with a compiler :P).

I have learned and use a lot of higher level languages, but I still think about data structures, I still think about what the best way would be for handling the data most efficiently, mainly because I don't want to rely on the language doing the right thing.

I've seen Java programmers, though, who start programming in C++ or even C and never pick up the art of thinking about their data structures. I work with one co-worker now who went to the extreme trouble of implementing Java-like enums, which have caused all kinds of "warts" and all kinds of issues because they are not enums and they aren't "real" classes.

Watching Java developers turn C++/C developers is really interesting, they bring all kinds of "bad" practices back with them and the code is worse off because of it!

"By way of comparison, the Python program takes 1.5 seconds to run, so that's about a 10X speedup."

Only tenfold? Interesting. While Python is surely not the slowest interpreted language around, a result like that borders on the performance of Java. That seems unlikely, especially given the fact that the Python version uses a worse algorithm.

I would think about how big the portion of time eaten by I/O is - that is, actually reading the `words` file from disk. I wouldn't be surprised if it eats most of the ~100ms that C needs to perform the task, leaving only a tiny percent for actual computation.
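One way to check that suspicion is to time the loading and solving phases separately. The sketch below does that with `time.perf_counter`; the temporary word list and the `count_hits` placeholder are made-up stand-ins, not the article's code or the real `words` file.

```python
import os
import tempfile
import time

# Build a throwaway word list; a stand-in for the system `words` file.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("\n".join("word%d" % i for i in range(50000)))
    path = tmp.name


def load_words(filename):
    """I/O phase: read the dictionary from disk."""
    with open(filename) as handle:
        return [line.strip() for line in handle]


def count_hits(words):
    """Placeholder compute phase; a real solver would search the board."""
    return sum(1 for w in words if w.endswith("7"))


t0 = time.perf_counter()
words = load_words(path)
t1 = time.perf_counter()
hits = count_hits(words)
t2 = time.perf_counter()
os.remove(path)

load_s, solve_s = t1 - t0, t2 - t1
print("load:  %.4f s" % load_s)
print("solve: %.4f s" % solve_s)
```

A freshly written file will usually be served from the OS cache, so the load figure here is a lower bound on what a cold read from disk would cost.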

Yeah, it was only a tenfold improvement because I was only benchmarking the solving of one board at a time, so a significant amount of time was spent on overhead, reading in the word list from disk, etc. I suspect if I ran multiple passes over larger boards, I'd see a much larger improvement. And the "worse algorithm" that the Python implementation used still wasn't that inefficient.

Ultimately though, this is still why Python gets used so much for real world work. Slow and inefficient as it might be compared to C, on real world problems where performance is dominated by disk seeks and network latency, it's good enough.

> a result like that borders on the performance of Java

I haven't seen the "Java is slow" chestnut in years.

I think he's saying Java is fast (compared with Python at least).

The point is, if you assign nearly mystical properties to writing in C, but when you rewrite a brute force approach Python program with a much fancier algorithm in C and you "only" get 10x speedup then something is amiss.

That's exactly what I meant, yes. And I used Java as comparison because of its speed being not that far from C itself - and certainly surpassing that of Python by a long shot. I definitely didn't intend to propagate the outdated "Java is slow" myth.

>>a result like that borders on the performance of Java. That seems unlikely...<<

What Java program and timings are you looking at?

It's funny, because I recently tackled this exact same problem in Python as well. And guess what my first step was? Writing a prefix tree implementation, in order to use the exact same approach the author took in C. It never seriously occurred to me to do it any other way. It may be because I recently got into C as well after spending years with only Python, but honestly I think I would have done the same thing before that; that's just how I think. So I don't believe it's the language that dictates how much you think about efficiency, it's the programmer.
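For anyone who hasn't met one, a prefix tree can be sketched in a few lines of Python with nested dicts. This is an illustrative version, not the commenter's or the article's implementation; the function names and the `"$"` end-of-word marker are arbitrary choices, and it assumes `"$"` never appears in a word.

```python
def make_trie(words):
    """Build a prefix tree as nested dicts; the key "$" marks a complete word."""
    root = {}
    for word in words:
        node = root
        for letter in word:
            node = node.setdefault(letter, {})
        node["$"] = True
    return root


def walk(trie, prefix):
    """Return the node reached by following prefix, or None if there is none."""
    node = trie
    for letter in prefix:
        node = node.get(letter)
        if node is None:
            return None
    return node


def has_prefix(trie, prefix):
    """True if at least one stored word starts with prefix."""
    return walk(trie, prefix) is not None


def has_word(trie, word):
    """True only if word itself was stored, not merely a prefix of one."""
    node = walk(trie, word)
    return node is not None and "$" in node


trie = make_trie(["cat", "cart", "dog"])
print(has_prefix(trie, "ca"))   # True
print(has_word(trie, "ca"))     # False
print(has_word(trie, "cart"))   # True
print(has_prefix(trie, "cow"))  # False
```

The point for Boggle is `has_prefix`: a search path can be dropped as soon as no word starts with the letters collected so far.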

Well I didn't get back into C and last time I had to solve a similar problem typed:

  from Bio.trie import trie

Right, if time was an issue I would have used an existing implementation, but since I largely code as a hobby I figured I may as well take the time to practice implementing stuff like this.

Providing you are aware of prefix trees. I guess you could come up with the idea independently but I would assume a lot of people when presented with the problem wouldn't.

At the end of my first year of my software engineering degree I wrote a Boggle solving program in C# using a trie. About a year later I discovered that tries are a recognised data structure and that I wasn't the first person to use one.

Tries are notorious for this by the way.

I know a programmer who's been working in the industry since the 70s, she did the same damn thing when she was young with the same exact data structure.

Aha. Someone posted this up here. That's why I've gotten a huge burst of comments on the site in the last couple hours.

I wrote the same program in Clojure somewhat recently. Boggle seems to be a popular problem. https://github.com/dmansen/boggle

Nice. Implementing the same problem in Clojure has been on my todo list. Now I've seen yours though so I guess I'll have to think up another problem :)

May I suggest a wander through http://programmingpraxis.com/

The author implements an inefficient algorithm using Python and then a better one using C. He seems to feel Python is for dirty brute force and C is for "real" programming...

Yes and no.

I wrote the Python version in probably under an hour. My girlfriend had gotten into playing some stupid Facebook version of Boggle and I just wanted to see her face when I came out of nowhere with implausibly high scores. I didn't think hard about the problem, just reached for the tool I know best and implemented the first obvious approach that came to mind. It worked as needed and I moved on. You make it sound like I think that's a bad thing.

Later, when I had a bit of time to think about it, it occurred to me that a trie would be a better approach, so when I was feeling like getting back into C and wanted a toy problem, re-implementing the boggle solver in C with a trie seemed like a good choice.

The experience of programming in the different languages does feel different though and I think it can affect how one approaches solving problems. Python is so good at just letting me solve the immediate problem that sometimes I rush and don't think things through or settle for a less than optimal solution. This will come back and bite me if that suboptimal code ends up getting built on and re-used elsewhere.

When I write C (or Go, Erlang, Haskell, etc. basically any language that requires me to think a little more up-front about how I'll implement it), I know going in that I'm going to be putting some serious time and effort into the code, so I tend to be more careful about things at every stage. The game changes from "get a result as quickly and painlessly as possible" to "write something that is elegant in itself". That's not always a win. Sometimes you are much better off building the prototype quickly, seeing flaws that you never would've thought of and then being able to approach the problem in a whole new way. Sometimes you just need a result quickly and time spent making things elegant or efficient actually is wasted (I'm not going to build a framework out of the boggle solving code anytime soon, eg).

I code in Python pretty much every day. I have for years. I probably will for years to come. It works for me. I'm just saying that sometimes other languages push you in different directions and I can see why, despite taking more lines of code to write, taking longer to write, having more potential for segfaults, and so on, languages like C still find a niche for writing systems and platforms. And that reason isn't just that it runs a little faster.

From my experience Haskell has two faces. One is the one that says "write a piece of software right with the tools I provide" and another is a messy "one"-liner (happens to me particularly when using pointfree style) with the "line" being an indecipherable mess of operators and library functions that is right as "fire and forget" kind of code. Haskell has a very rich range of libraries on the Hackage.

My Python answer http://stackoverflow.com/questions/746082/how-to-find-list-o... runs in 200ms on my laptop. It probably isn't actually within a factor of 2 of this C code, since I don't know what input board it was tested with to get 100ms, and it matters how much of the dictionary gets pruned while loading. I just stuck in a random 5x5 board and got 412ms real time, still pretty tolerable.

Edit: The next Python answer there uses a trie and takes 16.7 seconds on the 4x4 board. I like tries because they're elegant, but I hardly ever use them because the built-in collections are well-engineered even for problems you'd think are made for a trie.

>> The C version is functional but will probably make more experienced C programmers cringe.

Oh boy. This looks like the type of C code I write. Could some experienced C programmer please point out what parts are cringe inducing ?

Not being an experienced C programmer by any measure, but this doesn't look right to me: https://github.com/thraxil/boggle/blob/master/boggle.c#L92

    struct foo f() {
        struct foo f;
        return f;
    }

isn't returning a stack-allocated struct a bad idea?

It returns a copy of it. Example:

    #include <stdio.h>

    struct foo { char space[1024]; };

    struct foo f(void)
    {
        struct foo f = {{0}};  /* initialised so the copy has defined contents */
        printf("address of f is %p\n", (void *)&f);
        return f;
    }

    int main(void)
    {
        struct foo g;
        g = f();
        printf("address of g is %p\n", (void *)&g);
        return 0;
    }

produces this output (the exact addresses will vary):

    address of f is 0x7fbfffec30
    address of g is 0x7fbffff050

If I remember correctly (been a long time), most compilers would implement many cases of struct return by letting the caller pass the address of a block of (stack) memory to contain the return value. The function can then optimize things to get rid of its local copy, operate directly on the intended target, and the caller doesn't need to perform any copies.

Thanks, I didn't know that.

If you wanted to get really fancy, you could convert that trie into a DAWG: http://en.wikipedia.org/wiki/Directed_acyclic_word_graph

Writing a boggle solver was the first assignment in CS106X at Stanford, which was still taught in C when I took it in the fall of 1994.

The trie code as well as the display UI were provided - you only had to write the board-walking code.

Being as this was the first time I had ever written any program, I remember it being quite challenging but also really fun. It was great to see your own program utterly house you when you played against it.

I wonder if I still have that code somewhere ... it would be fun to look at / cringe.
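For the curious, the board-walking part can be sketched roughly as below. This is not the CS106X code: the assignment supplied a trie, and to keep the sketch self-contained a plain set of prefixes is swapped in for it. The name `solve_board`, the sample board and the word list are all made up, and real Boggle rules such as the minimum word length and the "Qu" tile are ignored.

```python
def solve_board(board, words):
    """Return the set of dictionary words that can be traced on the board.

    board is a list of equal-length strings; a move goes to any of the
    eight neighbours, and a cell may be used only once per word.
    """
    words = set(words)
    # Every prefix of every word, so dead-end paths can be abandoned early.
    prefixes = {w[:i] for w in words for i in range(1, len(w) + 1)}
    rows, cols = len(board), len(board[0])
    found = set()

    def visit(r, c, path, seen):
        path += board[r][c]
        if path not in prefixes:
            return  # no word starts this way, so prune the search here
        if path in words:
            found.add(path)
        seen = seen | {(r, c)}
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                nr, nc = r + dr, c + dc
                if (0 <= nr < rows and 0 <= nc < cols
                        and (nr, nc) not in seen):
                    visit(nr, nc, path, seen)

    for r in range(rows):
        for c in range(cols):
            visit(r, c, "", frozenset())
    return found


board = ["cat",
         "xrx",
         "xtx"]
print(sorted(solve_board(board, ["cat", "car", "cart", "dog"])))
# ['car', 'cart', 'cat']
```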

I'm a student who learns C#, Java, PHP, JavaScript... at school. I bought "C: A Reference Manual" to learn this summer...

Hope it will make me a better programmer.

If you want to learn C, K&R is the right way to do it.

C: A Reference Manual is excellent and is probably the only other C book you need, but only if you are already a C programmer, and only for what the title implies: reference.

In addition to K&R I always recommend "C Interfaces and Implementations: Techniques for Creating Reusable Software" to anyone who is picking up a long-term hobby or career in C.

Someone on this forum recommended it to me. It's been invaluable assisting me in refactoring major portions of a legacy code base. In a way, it's helped me bring the DRY principle to our C application. I write in C every single day though and it may not be applicable for a hobbyist/generalist.

That's probably the book I need. I've wondered how I'm actually supposed to structure stuff in C so it's not too gross.

I've already read K&R! Don't worry. I know it's the must-read book on C.

K&R is not suitable as a beginner's tutorial in C, especially modern C. As a reference-type work, OK.

No, but it's perfect for an experienced programmer in other languages to pick up C.

I really would like an updated version of it, but I guess it will never be written.

I loved my Harbison/Steele C book, a long time ago. Google says the last version came out in 2002, with additions on the web. Good memories; I'll buy a copy and read it just for nostalgia. Thanks.

Did anybody notice the eC language (www.ecere.com)? Still under early development, documentation incomplete. Some issues with the preprocessor and other stuff fixed. Definitely has potential.

<butthead>Uhh...hehe..uhh..hehe...he said boxen</butthead>
