Hacker News new | past | comments | ask | show | jobs | submit login

No. C programs have fine-grained control over the memory layout of all their data, and thus are far better positioned to exploit caching in general and optimize for locality in particular. It's an attractive fallacy to suggest that most programs are I/O bound and thus are equally performant in Java and in C; while that statement does bode well for async code and poorly for threads, it's not as relevant for language comparisons.

But higher-level languages can write the machine code that "exploits caching in general and optimizes for locality in particular" for you. You teach the computer how to do that once, and then you get it for free from then on. You can program your problem domain instead of the solution domain.

Most people using C for high-performance computing are just using it to glue together the high-performance libraries, anyway.

Remember, most people are writing big applications, not tiny procedures. If you just want to multiply a few numbers together, sure, C is going to be fast. If you want to build a complicated application, then C's advantages are going to be almost unnoticeable and its disadvantages are severe.

I think this comment oversimplifies the situation at both a tactical and a strategic level.

Tactically, even code that simply glues libraries together is still maintaining state, and there's clearly a benefit to being able to efficiently manage bit-level control over how that state is laid out and looked up. Lack of bookkeeping and overhead, cache-cognizant data structures, and simple compactness of representation all add up. Not to mention the fact that the top of the profile for lots of programs is malloc(), and C programs can swap out allocators --- I've gotten 500% (five hundred percent) speedups from retrofitting arena allocators into other people's code.

Strategically, well, there's a term for the strategy you're advocating, and it's called "Sufficiently Smart Compiler", and it's a punchline for a reason.

In theory garbage collectors could do a good job maintaining locality of data structures initially allocated in a local fashion. In practice, they don't, but compacting GC may be better than fragmentation for the average non-cache-aware program. In the case of Java, you have the additional problem that everything on the heap is boxed (except primitive arrays and primitive members of objects), which definitely strains memory (both capacity and bandwidth).

But higher-level languages can write the machine code that "exploits caching in general and optimizes for locality in particular" for you

Are there any in existence that do this particularly well (ie, at least as good or better than a reasonably experienced C programmer might)?

Sure, lots. They use the same C libraries that the reasonable experienced C programmer would.

(Also, take a look at some of the ghc on llvm benchmarks, it's very competitive with C, and doesn't require you to jump through any hoops. I'd link you, but Google is blocked at work due to incompetent firewall rules. Sigh.)

You overstate the degree to which large applications written in C have policy discretion over all memory allocations, to take your specific example. Compacting GC, by bringing together memory allocated close together in time, arguably has higher benefits for caching. The bigger problem is avoiding indirection, and that's a place where Java is weak in comparison to e.g. C#, as C# has value types. You can go further with some tricks in C, such as e.g. allocating string storage associated with a structure past the end of the structure's allocation itself, but here you're often just trying to regain what you lost by lack of GC.

OO-itis is something which definitely will sap performance if left unchecked, but it's true whether you're using C or Smalltalk. Much like denormalizing relational databases, careful placement of data members can reduce cache misses, and avoiding long chains of dereference for data commonly accessed together is important.

I don't understand your argument. In terms of effort vs. reward, the one-line change of swapping malloc() with a pool allocator is probably the most effective optimization available to any software in any language.

The idea that C# (or for that matter any GC'd language) is going to beat an allocator in which alloc is amortized to a single load, constant add, and store seems... I don't know, fill in the blank.

You seem to think I'm advocating C over GC'd languages. I'm not. I write in Ruby by default, a language that is not only GC'd but badly GC'd. I just do so with my eyes wide open on performance.

A pool allocator is great, so long as the lifetime of the pool meshes well with the allocations out of it. But the larger your application, the less likely that's going to be true. Different libraries will do their own allocations, and it's often hard to pass the right pool through enough layers of abstraction to be sure that e.g. that string got allocated in the right pool for where it's going to get plugged in when it comes back.

I'm not saying you're advocating C over GC'd languages. I'm specifically disagreeing with the idea that in practice, in large applications written in C, that the application author actually has discretion over allocation policy. I'm saying that you couldn't in practice use a pool allocator much of the time, even if you wanted to, unless your application is very self-contained.

The Delphi compiler I work on is written in C and uses pool allocators to great effect. There's a heap allocator which can be marked and shrunk, there's a pool for every unit, there's a pool for expressions, there's a pool for constants, etc. You've got to keep track of what got allocated where, make sure you don't pollute one pool with references to another, and sometimes do a bunch of copying to ensure that. Pooled allocation isn't a panacea, even for something as self-contained as a compiler, albeit a compiler which can be hosted by the IDE, so it needs to be long-lived, manage memory across debugging sessions, multiple compiles, etc.

This is like saying "there will always be some code you can't optimize, so this optimization doesn't matter". I know you know that isn't true.

Real library code either owns object lifetimes (and can use pools internally because the library's own _release() function is the only thing that can free its state), or keeps its hands completely off allocation. The few counterexamples, where for instance a library malloc()'s something and expects you to free it, tend to be notorious examples of error-prone and evil interfaces.

Meanwhile, just because everything isn't amenable to pool allocation (or, even better, arena allocation, where there is zero memory management overhead at all ever) doesn't mean you don't win huge on the places that are amenable.

You are raising the boogeyman of a hypothetical library that is going to take a pointer that I pool allocated and call free() on it, blowing up the program. I assert that any real example of such a library is going to be easy to shoot down as "an incredibly crappy library".

That's a complete straw man, and you know it. I'm saying that the specific optimization you mentioned is hard if you don't control all the pieces.

The primary advantage of a pooled allocator isn't in allocation - though that's nice - it's that you don't have the cost of iterating through each object to free it. But if you have external libraries, they'll abstract their allocations into handles (say), and now you have the problem of running what amount to destructors.

Nope, I don't think it is. I'm saying that when libraries track their own state and manage their own lifecycles, you're right, you can't make them use pool allocators (unless it's worth it to you to do surgery)... but that doesn't keep you from using pools on your own state.

I think people also overestimate the extent to which C programs depend on third-party libraries for their own statekeeping.

And again... what are we talking about here? I'm not advocating writing things in C. I'm saying, it's bogus to say that since code tends to be I/O bound, C isn't a performance win for most programs. That is simply a bogus argument. That the level of performance you can get out of C is usually not worth the investment is neither here nor there. Again: Ruby programmer here.

Again, I'm not saying that you're advocating writing things in C.

Where I think performance advantages that come out of writing things in C come from is being rarely being able to take shortcuts by relying on provided libraries and primitives which turn out to be not quite tweaked for the problem at hand. That is, C forces you to do so much yourself - largely because it has such poor abstraction tools - that you end up with a more specialized codebase. That specialization can include cache-oriented optimization, but I don't think it's the most important aspect or unique to C such that you can't get 95% of it - to the point where it's no longer a meaningful advantage - in a GC'd language.

Do you think that is true even for modern allocators like Hoard?

In other words, I'm curious what are the underlying malloc() implementations that pool allocation so outperforms.

Good luck trying to carefully align your C# arrays so that the length field (at index -1 and read accessed on basically every write) isn't in the same cache line as the 0th element the CPU across the bus is writing to.

The nice thing about using C# is that you don't need to use an array, if this is critical. With unsafe code, you can all but write C.

True, but I think his post isn't about generalized comparison of programming languages, but rather a pun towards "Its faster because its C" claims. Not that often our software can benefit from exploiting caching, especially in "release often, release early" world of web startups.

Not sure you're following: C programs can be optimized to take advantage of caching (or to use a low-overhead allocator, or to use more or less compact representations) in ways high-level languages either retard or disallow altogether. And all software benefits from caching.

I think your real observation is that C level performance just isn't that relevant in scale-horizontal network-bound programs, and I agree with you.

A lot of average C programmers write high-performance linked-lists in C, while they could have used high-performance hashtables in Python/Ruby/Java with the same programming effort.

(and no programs are high-performance if they have a pointer bug that makes the program crash).

"The difference between theory and practice is small in theory and large in practice..."

Who writes high-performance code using linked lists? Linked lists are awful for performance. Every --- every --- large C project I've been involved in has a good generic hash table and a vector-style resizeable array.

If that really is true, it sounds like you don't have much experience in C, to be frank. C apps abound in fixed-size statically allocated arrays and linked lists with next pointers (often multiple pointers, if the objects are part of different lists) embedded in the objects themselves. The hash tables often aren't broken out until the poor performance shows up as a bottleneck.

My resume isn't too hard to find. There's even some C code out there with my name on it if you look hard enough.

Performant C code doesn't use linked lists. Linked lists shred caches and allocators. Your go-to data structure for storing resizeable arrays is... wait for it... the resizeable array. I call mine "vector.h".

(That performant code often doesn't want a hash table is also why I backported STLport's red-black tree to C specialized on void* in my second-to-last major C project, but I'm just saying that because I'm proud of that hack.)

The kinds of lists that barrkel are talking about are what I call hooked-in lists as opposed to the container style lists seen in the STL. That is, objects that already existed are linked together through fields inside the object themselves. Since you're probably already operating on the object, it's not quite as bad for cache performance as the container style lists. Further, since the objects already exist, the memory allocation overhead is minimal. All objects of that type need an extra pointer, but there's no allocation required to add something to the list.

This is how the Linux kernel maintains just about everything. It's also how I've implemented lists inside of an allocator: http://github.com/scotts/streamflow/blob/master/streamflow.h...

I get that void* lists are even worse, but randomly malloc'ing a bunch of struct-foo's and linking them with pointers is worse than keeping a pool of memory chunks of struct-foo size and maintaining a length counter, since hopping from foo to foo isn't likely to hop from page to page.

With hooked-in lists, you rarely need to navigate through the whole list. You usually just need to get the next one. Having implemented hooked-in lists, the backing for them was a fixed size of memory chunks, but that was one level of abstraction lower. The Linux kernel does something similar.

(Felt I should clarify that it's nothing about hooked-in lists that prevent you from navigating the whole list. It's just the algorithms you end up using to solve the problems that pop-up at that level of systems programming.)

Err... why does the existence of shitty "C apps abound" have anything to do with how performant C programs are written?

Even I agree with this. You can't just do whatever you want in C and still get good performance, you have to have a clue, just like with higher-level languages.

I think the original article should add, "performance tends to depend on the programmer, not the language".

Yes and these containers are built into C++ so no need to role your own.

The side-discussion we're having now is also why <list> and <slist> are a pain to use, and why Stroustrop's examples tend to use <vector>. Storing things in contiguous memory and incurring occasional copies is usually a better plan than forcing every read to follow a chain of pointers.

these kinds of varying opinions that seem like wide schisms, or maybe just ultimately are different problem areas, often come up when i'm reading about C, and that makes me think "i'll learn it more when people figure out how to use it" ... is that fair?

> "i'll learn it more when people figure out how to use it" ... is that fair?

No, because you could be learning that today and get a head start on your own future that way. It's not that hard to understand and once you've seen how the clock ticks from the inside it becomes a lot easier to build other clocks.

Lists vs resize-able arrays is a very old debate, both have their places, it all depends on the problem you're trying to solve which one is the most appropriate. If you code your stuff in a not too stupid way (say using a macro for your accesses) it's trivial to replace one approach with the other in case you're not sure which one will perform better.

Lists work well when they're short and you use a an array of pointers to the beginnings of your lists and maintain a 'tail' pointer for fast insertion in case inserts are frequent. If you use resize-able arrays you probably want to allocate a bunch of entries in one go, roughly doubling the size of your array every time you need to increase the size. That seems to waste a lot of memory, but since it is heap memory (allocated using brk) it (usually) won't be mapped to real memory until you use it, so the overhead is smaller than it seems.

I hope that's correct and clear :)

Only if you're comfortable never learning it.

If you need to scan/remove if appropriate? They're optimal for that.

yeah i was wondering something. Let's say we are bound by I/O, wouldn't we reap more benefits by using lower level languages in the event that I/O boundaries improve? like faster drives or types of drives. Same applies for memory.

shouldn't the paradigm be "least amounts of bottle necks possible"?

Also the points mentioned by other comments are also valid (battery, VM behaviours)

However, his final point, or at least the crux of it, still stands: "Focus on stability and features first, scalability and manageability second, per-unit performance last of all, because if you don’t take care of the first two nobody will care about the third."

Moore's law says that CPUs improve faster than I/O. Therefore any program that is I/O bound today is likely to be I/O bound for every future generation of hardware. And programs that are not I/O bound today, may become so down the road.

Incidentally on the final point, there was an interesting test that was run many years ago. Multiple teams were given the same spec, but were each told to optimize for a different characteristics (speed of development, speed of execution, maintainability, memory use, etc). Most of the teams managed to come in #1 on what they were trying to optimize for. The team that optimized for maintainability came in #2 on most other characteristics. The team that optimized for speed of execution came in last on most other characteristics.

The lesson from this is that a consistent focus on maintainable code results in more of everything else you need. Yes, there really are times that you're writing throw-away code and can forget all that. But by default, code well and let the other details take care of themselves.

> there was an interesting test that was run many years ago

Source? I'd really like to read about it.

I read it in Code Complete. I don't have a copy handy to track down the page though.

Let's say we are bound by I/O, wouldn't we reap more benefits by using lower level languages in the event that I/O boundaries improve?

There are unlimited number of potential future requirements that a program might have to meet. You should only code for these if you have a concrete basis for believing these will become real at some point.

If there's a serious chance your sedan will do stock car racing, then you outfit it accordingly. Otherwise, that super-muffler's just an unnecessary expense and something more to break.

Edit: You'll notice that in average car, every part has about the same quality, power and durability. In a sense, engineering is actually about achieving the least cost and the largest number of bottle necks, since any unneeded quality is wasted time and money.

That's not a great analogy. Using cheaper components to cut costs requires more engineering effort; this is a form of optimization. For a prototype, you might just use more expensive components if it's convenient, since the per-unit cost doesn't matter much.

When writing software, there's no direct analogy because usually the components we build our programs out of have hardly any per-unit cost. We don't save any money by using a crappy third-party library over a high-quality one. Using a library with lots of unnecessary features may be cheaper both for prototyping and for the final product.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact