You linked to the Alioth Programming Language Game — even they admit it's very flawed for determining much beyond how certain programs implementing certain algorithms perform. The game's highly contrived circumstances are considerably less realistic than the parent's anecdotes or the OP's report of huge space savings with little slowdown. In particular, many of Lisp's code size benefits lie in scaling better than other languages. The Alioth game is possibly a pathological case for the things being measured here.
I don't want to get into a language debate here, but... if you put random people's anecdotes above objective measures than you can pretty much make any point you want to make. Alioth may be flawed, but it's not necessarily synthetic in the sense that the benchmarks are real problems or are very similar to real problems. More importantly, it's actually an objective measure of language performance, compactness, etc. rather than people's "impressions".
I have no interest in a religious war over languages either, so no worry about that. I just don't like to see Alioth's one small data point extrapolated into a trend.
The benchmark tests are toy programs that solve a small set of problems under constraints that create a known bias in favor of languages like C++. They are an objective measure of something, but that something is not language performance and compactness is real-world, non-toy programs that solve problems unlike the ones in the game.
Impressions are admittedly not the best way to gauge such things, but they're better than relying on a test that does not make any attempt to address the question at hand.
My personal heuristic is to assume Alioth is roughly right with a largish margin of error, and then look for anecdotal evidence of specific areas that the game does a particularly poor job of reflecting. For Lisps, code size appears to be a large blind spot based on everything I have seen. Lisp's ability to create mini-languages and very high-level abstractions — a large source of its brevity — is pretty much useless on the scale of the Benchmarks Game.
I don't know if you're trolling or cocksure, but no, that is not what I was talking about. I said the constraints create a bias, not the measurements themselves. For example, the performance measurements are biased against most garbage collected languages because the rules don't allow any options to fine-tune the GC's behavior (which can make a big difference). Obviously, there are no equivalent rules forbidding people from fine-tuning C++'s manual memory management.