Curiosity was built with 2.5 million lines of C (programmers.stackexchange.com)
102 points by dustingetz on Aug 7, 2012 | 33 comments



I found it fascinating that the most productive C programmer on the team was a Python script.

Go Python! ;-)


productivity != KLOCs

autogenerated C is sometimes used for complex conditional nesting. in some failure situations it is programmatically more efficient, or easier to trace issues, if every condition is enumerated


I know that, of course. That's why I used a winking smiley.

In any case, programs that generate code are, at least for me, a code smell.


I frequently use a couple of custom tools I created in Excel (for ease of data entry) to generate and maintain complex state machines and event-driven loop callback C code in embedded systems. That does not mean that the embedded systems were written in Excel. In this case Excel is simply leveraged as a productivity tool, as I suspect might be the case for the use of Python scripts to generate chunks of code in the JPL case.
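As a rough illustration of the table-to-code workflow described above, here is a minimal sketch in C. It assumes the spreadsheet has been exported as rows of (state, event, next state). `struct transition`, `emit_state_machine`, `next_state`, and any state or event names passed in are hypothetical, not taken from the commenter's tools or from JPL.

```c
#include <stdio.h>
#include <stddef.h>

/* One row of a hypothetical transition table, as it might be exported
   from a spreadsheet: in state `from`, on `event`, go to state `to`. */
struct transition {
    const char *from;
    const char *event;
    const char *to;
};

/* Emit a C function `next_state` that encodes the table as a flat list of
   if statements. The identifiers in each row are assumed to be enum
   constants declared elsewhere in the target code base.
   Returns 0 on success, -1 on a write error. */
static int emit_state_machine(FILE *out, const struct transition *rows, size_t n)
{
    size_t i;

    if (fprintf(out, "/* Generated file: edit the table, not this code. */\n"
                     "int next_state(int state, int event)\n{\n") < 0)
        return -1;
    for (i = 0; i < n; i++) {
        if (fprintf(out, "    if (state == %s && event == %s) return %s;\n",
                    rows[i].from, rows[i].event, rows[i].to) < 0)
            return -1;
    }
    /* Unlisted (state, event) pairs leave the state unchanged. */
    if (fprintf(out, "    return state;\n}\n") < 0)
        return -1;
    return 0;
}
```

Calling `emit_state_machine(stdout, rows, n)` from a small driver prints the generated function. The point of the sketch is that the table stays the thing people edit, while the emitted C is flat enough to read and step through in a debugger.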


Most of it automatically generated, therefore it is not written in C.


This really depends on the kind of code generation being used. There's "dumb" code generation, and there's a DSL.

In the "dumb" case, the input is usually some tabular data, and the code generator simply translates it to valid C syntax. It makes sense to say "it was written in C" in this case, although it's a bit moot since this is just data.

In the DSL case, the logic is actually coded in a different language, and the code generator translates it to C. Here it would be more difficult to argue that "it was written in C", because C just serves as a compiler intermediate language.

Naturally there are levels in between the above two, but the point is that it's difficult to judge without examining the actual module that was auto-generated.


true, though it's unlikely that you would launch a DSL. it's far more likely that a lot of the code being generated is lookup tables and the like. engineers who write this sort of fault-tolerance code will do a lot of things the 'dumb' way because it's often less error prone or easier to trace failures. a number of friends work on space-related things, and there tends to be an emphasis on not being overly clever (e.g. practices like return values being true or false rather than enumerations)


huh? it is auto-generated C as described in the post


A Scheme or Haskell (or C++) program that is compiled with a compiler that uses C as an intermediate form is still a Scheme/Haskell/C++ program. Even if the intermediate code is analyzed by a static analyzer or by humans, the source language is the actual language. That holds especially if the intermediate code is not modified, but possibly to some extent even if it is slightly modified or used by other intermediate code written by hand.


By that logic, it's written in machine code.


no, autogenerated C code means someone (maybe a build script) runs a Python script that generates a bunch of C files, and then the C compiler picks up those files. Those C files exist in the file tree and may be human readable.

C files are being generated, not machine code, by some script. It's a fairly common practice in large C projects that have complex or heavily nested conditionals, where you're less likely to introduce incorrect conditions if you let program logic printf your code to a file
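A minimal sketch of what "letting program logic printf your code to a file" can look like, written in C rather than Python to match the zlib excerpt later in the thread. It emits one explicit `case` per combination of boolean conditions. The function name `emit_enumerated_cases`, the `fault_bits` variable, and the `handle_case_N()` handlers are all made up for illustration and say nothing about what the JPL generators actually emit.

```c
#include <stdio.h>

/* Emit a switch statement with one explicit case per combination of
   `nflags` boolean conditions (2^nflags cases in total). Bit i of the
   case value corresponds to flags[i]. The emitted code refers to a
   hypothetical variable `fault_bits` and hypothetical handlers
   `handle_case_N()`. Returns 0 on success, -1 on bad input or write error. */
static int emit_enumerated_cases(FILE *out, const char *const *flags, unsigned nflags)
{
    unsigned combo, bit;

    if (nflags == 0 || nflags > 8)
        return -1;                      /* keep the generated file small */
    if (fprintf(out, "switch (fault_bits) {\n") < 0)
        return -1;
    for (combo = 0; combo < (1u << nflags); combo++) {
        if (fprintf(out, "case 0x%02X: /*", combo) < 0)
            return -1;
        /* Spell out every condition in a comment so each case is traceable. */
        for (bit = 0; bit < nflags; bit++) {
            if (fprintf(out, " %s%s", ((combo >> bit) & 1u) ? "" : "!", flags[bit]) < 0)
                return -1;
        }
        if (fprintf(out, " */ handle_case_%u(); break;\n", combo) < 0)
            return -1;
    }
    if (fprintf(out, "}\n") < 0)
        return -1;
    return 0;
}
```

With two flags named `low_power` and `comm_lost`, the output has four cases, from `!low_power !comm_lost` through `low_power comm_lost`. Nobody would want to maintain that by hand for eight flags, which is roughly the argument being made here for generating it.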


You can tell gcc to keep around the assembly files it generates, and they are human readable. Does that mean that C code compiled with gcc is actually written in assembly? Is the litmus test that they are caching the intermediates of the compilation on disk instead of throwing them away and regenerating them every time?


Early C++ compilers actually translated C++ source to C source for compilation by a C compiler. Does that mean that the original source was C, not C++?


no, but there is a guy that wrote some python that prints out C code into a file. that is what auto-generating C code means.

the caching of intermediates and regenerating of the C files is the business of the Makefile


You are now officially being disingenuous. Stop it. Read the posts and apply thinking skills. Auto-generated means (by definition) that programmers did not write the C code; they wrote something else.

By ANALOGY (here is a definition with examples, since you seem to not understand analogy: http://grammar.about.com/od/rhetoricstyle/f/qanalogy07.htm ), the argument that the Curiosity code was written in C is the same as me saying my C code is actually written in machine code (raw high and low bits), since the machine code is generated by my compiler, even though I actually wrote C.


It sounds like you are saying that auto-generating refers only to DSLs. Here is an example of a guy writing some code that printf's some C code - http://www.opensource.apple.com/source/zlib/zlib-37.2/zlib/c...

        FILE *out;

        out = fopen("crc32.h", "w");
        if (out == NULL) return;
        fprintf(out, "/* crc32.h -- tables for rapid CRC calculation\n");
        fprintf(out, " * Generated automatically by crc32.c\n */\n\n");
        fprintf(out, "local const unsigned long FAR ");
        fprintf(out, "crc_table[TBLS][256] =\n{\n  {\n");
        write_table(out, crc_table[0]);

This example is in zlib. Whether it's the 'pretty' way to do it or not, this isn't a terribly uncommon thing to see in C code bases. In this case, it is a lookup table. Code is a tool; "auto-generating" code is not some fancy term. All it means is that a button was pushed and a bunch of code came out the other end. Having worked on large C codebases, I can say we used to do this sort of stuff all the time. Maybe it's the dumb way of doing things, but when you work on large teams that all maintain the same codebase, dumb and readable is sometimes more important than cleverness.
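For anyone who wants to try the pattern end to end, here is a self-contained sketch in the same spirit. It is not zlib's actual generator, which emits several tables (`TBLS`) and uses its own helpers. This version computes only the single standard reflected CRC-32 table (polynomial 0xEDB88320) and prints it as a C array. The names `make_crc_table`, `write_crc_header`, and `crc_table_gen` are invented for this example.

```c
#include <stdio.h>

/* Fill table[256] with the standard reflected CRC-32 lookup table
   (polynomial 0xEDB88320), one entry per possible byte value. */
static void make_crc_table(unsigned long table[256])
{
    unsigned n, k;

    for (n = 0; n < 256; n++) {
        unsigned long c = n;
        for (k = 0; k < 8; k++)
            c = (c & 1UL) ? (0xEDB88320UL ^ (c >> 1)) : (c >> 1);
        table[n] = c;
    }
}

/* Write the table to `out` as a C array definition, four entries per line.
   The array name `crc_table_gen` is a placeholder chosen for this sketch.
   Returns 0 on success, -1 on a write error. */
static int write_crc_header(FILE *out, const unsigned long table[256])
{
    unsigned n;

    if (fprintf(out, "/* Generated automatically: do not edit. */\n"
                     "static const unsigned long crc_table_gen[256] = {\n") < 0)
        return -1;
    for (n = 0; n < 256; n++) {
        const char *indent = (n % 4 == 0) ? "    " : "";
        const char *sep = (n == 255) ? "\n" : ((n % 4 == 3) ? ",\n" : ", ");
        if (fprintf(out, "%s0x%08lXUL%s", indent, table[n], sep) < 0)
            return -1;
    }
    if (fprintf(out, "};\n") < 0)
        return -1;
    return 0;
}
```

A driver would call `make_crc_table` and then `write_crc_header` on a file opened with `fopen`, much like the excerpt above. The first entries of the emitted array are 0x00000000 and 0x77073096, which match the widely published CRC-32 table.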


This doesn't change the point the top-level comment was making. Even if some N lines of C spit out some M lines of different C (hopefully with N < M), the actual code written by the programmers is the code that should be counted, not the generated code.

In the case you describe, the language used to generate the code is C, but the concept doesn't change. This is where you are being disingenuous by means of pointless pedantry.


should be counted != counted

most of the time line counts are just the .c and .h files in the file tree, and the generated files do often end up in the file tree and repository.

I'm merely speaking from my own extensive experience in dealing with large C code bases. What you are calling disingenuous and pedantic is ignoring where the hand-wavy line count numbers actually come from. They tend to come from code that is actually checked into the repository, which, whether you like it or not, can include large amounts of auto-generated code.


In the late 70's Viking had about 12 KBytes RAM (6000 WORDS) and 5 MBytes (40 MBit) tape storage.

http://en.wikipedia.org/wiki/Viking_program



I feel this is an interesting story to read alongside the OP: http://www.flownet.com/gat/jpl-lisp.html


There will always be someone lamenting that language X (insert Lisp, Smalltalk, Haskell or anything else here) is not being used instead of the horribly terribly utterly broken C, C++ or Java (yes, I'm being sarcastic). But in the end, pragmatic concerns win. Nobody works in vacuum, even if they really want to.

I'm a big fan of functional and otherwise dynamic languages, but if I had to write my own Mars-landing controller, I would use C. No doubt there.


> if I had to write my own Mars-landing controller, I would use C. No doubt there.

C is a good choice for code generation, as a replacement for assembler. But I would never dare write Mars landing code in C directly.

I guess the Curiosity team used several languages destined for a 100% verified C compiler for a well-defined subset of C.


> But I would never dare write mars landing code in C directly.

Why not? I used to write code for airborne radars directly in C, so I may have a different perspective.


I wouldn't. I would go with SPARK Ada. No doubt C has better tooling, though.


The problem with Ada is not only the lack of tooling (although it's a huge problem: relative to C, the tooling is scarce and horrible), it's also the shortage of people who know the language. I prefer C code written by a good programmer to Ada code written by a poor programmer any day, and finding good C programmers is much easier than finding good Ada programmers.


Detailed account of using SPARK and the PVS theorem prover in such an app:

http://www.cs.virginia.edu/~jck/publications/Xiang.Yin.disse...


And more people who can write it or prefer writing in it? Technical problems being people problems 90% of the time or whatever it was the fella said.


Well, the farther you abstract away from executable machine code the greater the inefficiencies introduced into the resulting machine code. Object oriented programming is there for the benefit of the programmer, the machine couldn't care less. The trouble is that any OO language will introduce unnecessary code into any given program. What I mean by "unnecessary" is that, if you were coding the solution in assembler you would, in all likelihood, not produce such code.

C lets you code at a level that is extremely close to the underlying machine language. If you are an experienced embedded developer you'll know exactly what code you are producing for the particular processor you happen to be working with.

I would not venture a guess, but I would not be surprised if the overwhelming majority of real-time, mission-critical systems out there are coded in C. The only other language I would consider would be Forth --which I have used extensively. I have played with the idea of going to something like C++. I have never found it to make sense in the context of a resource and horsepower-limited embedded system.

I don't know where the disdain for C comes from. Yes, I am aware of the issues one can run into by writing unsafe code. Isn't part of our job to write safe code? I certainly do.


Why are people discussing this for so long? Does it make a difference whether it was 500k or 2500k lines of code? It was a huge project and that's it.


> Why are people discussing this for so long?

Because it's fascinating.

It's not the same old bullshit minutiae over-hyped as the latest and greatest thing; it's honestly a feat of human achievement. It means different things to different people, creating inspiration, awe, and wonder. It shapes opinions on humanity, tech and life.

But forget all that for a second, and the intimate technical details of how things come together are still interesting to techy people. Even the logistics and implementation of traffic lights can tickle our curiosity. This thing is on another planet!

You sound like someone who has never built anything just for fun.


Point of interest: In one of the press conferences yesterday they said they're planning to push an update to the rover's software soon after they get the X-band antenna deployed (which should have happened already).


It was written in MSDOS Q-Basic



