Observations from a real-world Clojure project (groups.google.com)
152 points by liebke on Oct 29, 2009 | 32 comments

"Everything* just worked. We took a snapshot at 1.1.0, and we didn't update it. [* The JVM does SEGV on us occasionally. Internal error, which we have not diagnosed.]"

How are memory protection errors just a footnote?

You shouldn't be able to SEGV a JVM no matter what bytecode you feed it (exempting JNI). So it is a JVM bug.

OTOH it doesn't really matter whose "fault" it is; if using Clojure causes SEGVs but some other Lisp doesn't cause SEGVs, that's a problem.

That depends on whether you're using it primarily because it is a Lisp, or primarily because of its Java interop / the fact that it runs on the JVM.

Still, if Sun doesn't fix it, then Clojure should provide a workaround.

I'd like to reiterate the lack of good debuggers for not just Clojure, but also free CL implementations. I haven't branched out much recently, but is sbcl/SLIME still state-of-the-art for lisp debugging in the open source world?

I personally use SBCL, but I've heard a lot of people use Clozure Common Lisp (http://www.clozure.com/clozurecl.html) because it has better Slime integration (better stack frame inspector, etc).

The very best tools are probably Allegro CL (w/ its non-emacs IDE) and LispWorks (with its own IDE, too). I've heard nothing but good things about both (and have used Allegro CL enough to know it - and its associated libraries - are very impressive).

As far as I can gather, no one really uses them, since they are just too expensive. They must be making money somewhere; I just don't know where ...

"I've heard a lot of people use Clozure Common Lisp (http://www.clozure.com/clozurecl.html) because it has better Slime integration (better stack frame inspector, etc)."

Really? I would have said CCL has worse Slime integration in general (Slime being an epiphenomenon of the SBCL-on-Linux world) and far worse when it comes to debugging. Am I missing something?

We're using CCL with Slime on OS X. The Clozure people are working on a new IDE, but I don't know how good its debugging facilities are yet. It would take a lot to get me off Emacs, but slick debugging might do it.

Alternatively, I often think that maybe we should just take the time to make sldb do what we want. But it's hard to take precious resources away from one's main project.

If anyone feels the same way and is interested in hacking on this, email me - maybe we can work something out.

IIRC, ITA Software (one of the biggest Lisp houses in the world ...) uses SBCL for their main QPX project. A few years ago when they were starting the new Polaris project, they decided to go with CCL instead of continuing with SBCL.

My understanding is that the two major motivating factors for this decision were that CCL compiles faster and has better debugging support.

This is all based on my memory of conversations that took place about a year ago, so it's entirely possible I'm misremembering, or that what used to be true no longer holds. I have no first-hand experience with CCL, so if you do, I'll take your word for it ...

I currently work on QPX, the air travel search engine. We continue to use SBCL, primarily because it produces faster code for our workload. We and our customers are very sensitive to the performance of the application.

Dan Weinreb, another ITA developer, did a talk at Google a few months ago on ITA's new reservation system (the Polaris project you mention). They do use CCL, but the code also compiles and runs with SBCL. The primary reason he cites in that talk for preferring CCL over SBCL (about 22 minutes in) is that CCL compiles code more quickly than SBCL.

I haven't used CCL much, but I understand SBCL and CCL have different debugging strengths. An SBCL developer tells me that a significant chunk of the compilation time for our application is in constraint propagation for types. That's part of the "generates more efficient code", but it also means that it's smart about telling you when you're doing something that's not going to work. For whatever reason, type propagation or otherwise, CCL won't do as much to help you at compile time. I am told that CCL gives better stack traces, so SLIME can help you more.

Yeah, that jibes with what I've seen in SBCL (and been told about CCL). Thanks for the inside scoop.

Remember also that during the time that ITA would have made that decision, SBCL and SLIME were both evolving rapidly.

5. The functionality of the docs hasn't kept up with Clojure.

6. Debugging facilities also have not kept up with the state of Clojure.

These two are the first things I check before trying to use any programming language.

That is true only if you use bleeding-edge Clojure (e.g. 1.1.0); the Clojure 1.0 docs are very mature. And since 1.1.0 is a developer version at the moment, it seems quite illogical IMO to complain about the state of its documentation.

They're mature, but they are also not very good. I still spend ages digging around trying to figure out what should be relatively simple stuff. The documentation around ns in particular is difficult to navigate.

> Example: ~10M records are processed and transformed, various computations occur, and ~100K records are spit out. Lots of statistics. One type of run takes 12 hours on an 8GB, 4-core Linux box

That seems insanely slow, unless they are doing something insanely complex. In other words, this doesn't tell us very much.

12 hours / 10,000,000 records = 4.32ms/record of wall-clock time. (4 cores maxed out ≈ 17ms per item?)

Doesn't tell us much, but that doesn't seem insanely slow either, necessarily.
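For what it's worth, the arithmetic can be sketched out (the run time and record count come from the post; the perfect 4-core parallelism is an assumption):

```java
// Back-of-envelope check of the quoted figures: a 12-hour run over ~10M
// records, with 4 cores assumed fully utilized in parallel.
public class Throughput {
    public static void main(String[] args) {
        double seconds = 12 * 3600;                        // 12-hour run
        double records = 10_000_000;
        double wallMsPerRecord = seconds / records * 1000; // ≈ 4.32 ms
        double coreMsPerRecord = wallMsPerRecord * 4;      // ≈ 17.3 ms of core time
        System.out.printf("%.2f ms/record wall, ~%.1f ms/record per core%n",
                wallMsPerRecord, coreMsPerRecord);
    }
}
```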

Assuming pretty cache-optimal code and 1.5GHz ops with no superscalar gains, that's over 25 million operations per record.

Most PC video games, for example, expect a refresh rate somewhere between 30Hz and 60Hz. 60Hz gives you 17ms for an entire frame, in which it's doing an incredible amount of work: drawing a whole scene, updating the world, running physics, etc. etc.

I think you have too low expectations. Modern hardware is very, very fast.
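The 25-million-ops figure falls out of the same numbers (the 1.5GHz, one-op-per-cycle rate is the assumption stated above):

```java
// ~17.3 ms of single-core time per record at 1.5e9 ops/sec, with no
// superscalar gains assumed, gives the "over 25 million operations
// per record" estimate.
public class OpsPerRecord {
    public static void main(String[] args) {
        double coreSecPerRecord = 12 * 3600 / 10_000_000.0 * 4; // ≈ 0.01728 s
        double opsPerRecord = coreSecPerRecord * 1.5e9;         // ≈ 25.9 million
        System.out.printf("%.1fM ops/record%n", opsPerRecord / 1e6);
    }
}
```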

This whole discussion is silly; there isn't nearly enough information given in the post to infer anything. The author is just making the point that it isn't a toy application.

This seems to be a somewhat unfair comparison you are drawing. Video games use dedicated hardware to achieve their drawing throughput, and frequently for their sound and physics throughput as well. 4.32ms/rec in a high-level, garbage-collected language built around a data structure like the cons cell is an impressive testament.

As has been said, though, this is all speculation.

If almost all allocations are specific to the processing of a record, and become garbage after that record is done with, GC is almost free. GC cost is proportional to retained allocations, not allocations made; and if old generation memory is never modified to point to new allocations, it doesn't even need to be scanned (this can be detected by write barriers, either injected into JIT code or via page faults, so it can be quite fine-grained). That's why GC is asymptotically faster than manual allocation and ideally suited to record-processing and server request/response kinds of applications. If it were using manual paired allocate/free memory management, it would actually be more impressive.

And it's not 4.32ms/rec of core time; it's 4.32ms/rec with 4 cores working, so estimating around 17ms (like the graphics frame) is closer.

I don't agree that the comparison is unfair just because graphics is usually accelerated and physics very rarely is. Take a look at Pixomatic, which Mike Abrash worked on: a DirectX 7-level API, done entirely in software, and efficient enough that the game can still do all its own work in its own time. Games still have to pump an awful lot of data through to the hardware; the hardware isn't going to do all the high-level scene graph calculations, culling and occlusion itself. The hardware expects a list of pretty basic primitives, and takes care of transforming them into the view frustum, with depth converting into Z-buffer value. The game still needs to make sure it doesn't give the hardware too much stuff that isn't actually intersecting with the view frustum.
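The earlier point that GC cost scales with retained allocations rather than total allocations can be sketched like this (a hypothetical record-processing loop; the names and the ~1% retention rate are illustrative, echoing the post's ~100K-of-10M figure):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical per-record processing: each call allocates scratch objects
// that are dead by the time it returns. A generational collector reclaims
// those young allocations almost for free, because minor GC work is
// proportional to what survives -- here, only the small retained list.
public class RecordPass {
    static double process(long key) {
        List<Double> scratch = new ArrayList<>();  // per-record garbage
        for (int i = 0; i < 8; i++) scratch.add(Math.sqrt(key + i));
        double sum = 0;
        for (double d : scratch) sum += d;
        return sum;                                // scratch is now unreachable
    }

    public static void main(String[] args) {
        List<Double> retained = new ArrayList<>();
        for (long k = 0; k < 1_000_000; k++) {
            double score = process(k);
            if (k % 100 == 0) retained.add(score); // keep ~1%; the rest dies young
        }
        System.out.println("retained " + retained.size() + " of 1000000");
    }
}
```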

We're not talking about the implementation details here, we're talking about the perception of dynamic garbage collected languages, which this summary helps to shift from an unfairly negative to a more balanced light. I'm well aware of why generational GC could be a faster choice for this kind of record processing (then again, we simply do not know from the description given how much information is shared).

As for the comparison being unfair, the person who initially made that comparison was you, sir. Bringing real-time visual simulation into the equation is unfair for a variety of reasons, including the fact that a lot more research has gone into optimizing that field. I'm not sure exactly what you want from a simple high-level example that says "Yes, Clojure can be used to do real work," but I don't think anyone here is going to give it to you if you measure it against something as outlandish as the tip of a heavily funded branch of computer science and mathematics devoted to real-time rendering.

> that's over 25 million operations per record.

If the algorithm is Θ(n*n) it's suddenly a bit more impressive.
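To spell that out (assuming n is the ~10M record count, which the post doesn't actually say):

```java
// If total work were Theta(n^2) in the record count, the 12-hour run on
// 4 cores would imply a per-core rate near the clock rate -- i.e. the
// job wouldn't be slow at all.
public class QuadraticImplies {
    public static void main(String[] args) {
        double n = 10_000_000;
        double totalOps = n * n;          // 1e14 abstract "operations"
        double seconds = 12 * 3600;
        double perCore = totalOps / seconds / 4;  // ≈ 5.8e8 ops/sec/core
        System.out.printf("%.2e ops/sec/core%n", perCore);
    }
}
```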

I do physics simulations, and can easily reach those times with a similar amount of input. And that is with highly optimized Fortran code.

But yeah, a comparison with a pure Java or C++ implementation would have been great, both in terms of speed and LOC. However, in the "real world", there usually isn't time for such silly things.

I am crunching graph parameters based on data stored in an "input table". A single row computation can take as long as five minutes, because I compute numerous costly parameters each time.

My work isn't really that complex, but it takes ages to compute everything.

My "slow sense" is tingling too. I'm deadly curious to know how fast I could make that "12-hour" job using Scala or Java.

I've done extensive work with Scala, Java, and bytecodes; writing performant Scala code requires some knowledge of the JVM and how Scala is compiled, and requires sacrificing some of its power; but because the static typing and method dispatching is so similar to Java's, you can get there. From everything I've read about Clojure, it sounds a whole level less efficient.

Not trying to knock Clojure, but 12 hours just sounds way way too long without more explanation.

How can anyone even approach this with a straight face? You don't even know what they are computing!

If you want to question Clojure's performance for technical reasons, do that. But let's not use feelings and instincts in discussions about concrete technical topics. There have certainly been several interesting articles on Clojure performance, and work done to improve the performance of native data structures (transients). I, and no doubt many others, would be genuinely interested in performance studies on more complex concurrency problems than the ones I've seen so far.

Then this discussion between Cliff Click and Rich Hickey may be of interest: http://blogs.azulsystems.com/cliff/2008/05/clojure-stms-vs.h...

It's a bit dated ('08), but it sounds like the jury is still out on STMs.

That's a great read, thanks very much. It might be worth submitting the link itself to get a wider audience.

"From everything I've read about Clojure, it sounds a whole level less efficient."

Curious as to what you have read regarding clojure performance. I recall hearing Cliff Click giving it a thumbs up at a JVM language summit:


This may also be apropos: http://groups.google.com/group/clojure/msg/cccf532ca04fcdf4?...

And I've read claims that with type annotations you can get bytecode on-par or close to that of pure Java.

Bah. I still can't forgive Google for wrapping everything sent though Gmail or Groups to 80 characters.
