

Q: When are Tcl and Python faster than C? A: When they invoke faster, better libraries. - henning
http://shootout.alioth.debian.org/gp4/benchmark.php?test=regexdna&lang=all&sort=fullcpu

======
bayareaguy
A quick search turned up at least one project to "extract" the version of
Henry Spencer's regex code from Tcl here:
<http://www.straatinternet.com/opensource/hsre>

I wonder how much of the difference here between Tcl and Boost can be
attributed to the DFA vs NFA algorithm and how much to the underlying string
representation (Tcl's flat DString buffer vs the rope used by the Boost
program).

I suspect that for these regexes and for this size text (100k) we're really
just seeing better cache behavior.

------
mark-t
Seeing as how the Python and Tcl interpreters are both written in C, this
really just seems to imply that the C code wasn't written very well.

~~~
silentbicycle
Not necessarily. Code in a language implemented in C can potentially run faster
than hand-written C, because that language's compiler has a deeper understanding
of the code and can apply sophisticated optimizations a C compiler could not
confidently apply. In Haskell, for instance, the compiler often fuses multiple
transforming passes over a list into one pass with the transformations
composed: map x (map y (map z l)) becomes map (x . y . z) l. I'm sure there
are much, much better examples...
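To make the fusion idea concrete, here is a minimal Python sketch. Python does not perform this rewrite for you; the functions x, y, z and the compose helper are hypothetical stand-ins chosen for illustration. It shows that three nested map passes and a single map over the composed function produce the same list, with the fused version traversing the input once.

```python
from functools import reduce
from typing import Callable, Iterable, List


def compose(*fs: Callable) -> Callable:
    """Right-to-left composition: compose(x, y, z)(v) == x(y(z(v))), like Haskell's x . y . z."""
    return reduce(lambda f, g: lambda v: f(g(v)), fs)


# Hypothetical element-wise transformations standing in for x, y and z.
def z(v: int) -> int:
    return v + 1


def y(v: int) -> int:
    return v * 2


def x(v: int) -> int:
    return v - 3


def three_passes(l: Iterable[int]) -> List[int]:
    # map x (map y (map z l)): list() forces each intermediate list to be built.
    return list(map(x, list(map(y, list(map(z, l))))))


def one_pass(l: Iterable[int]) -> List[int]:
    # map (x . y . z) l: one traversal, no intermediate lists.
    return list(map(compose(x, y, z), l))


if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(three_passes(data))  # → [1, 3, 5, 7]
    print(one_pass(data))      # → [1, 3, 5, 7]
```

Python 3's map is already lazy, so the list() calls in three_passes are deliberate: they materialize the intermediate lists that fusion is meant to eliminate.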

This isn't 100% true, because _in theory_ you could have written out the
ideally optimized version of machine language for what you're trying to do, or
(perhaps) a direct transliteration thereof in C. In practice, this isn't
realistically possible, except in tiny pieces.

See also: Proebsting's Law
(<http://research.microsoft.com/~toddpro/papers/law.htm>)

~~~
drewp
The proof of Proebsting's Law seems kind of silly because running a compiler
with optimizations turned off wouldn't actually disable all the years of
advancements in compiler technology. It seems quite reasonable that many
compiler advancements have to be added at the same time as hardware support.
That doesn't make them purely hardware contributions.

Also, the proof skips more than a few critical steps when it says "Let's
assume that this ratio is about 4X" and then uses that number as a multiplier
in the answer.
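The arithmetic being objected to can be spelled out. Proebsting's note pairs the assumed 4X optimized/unoptimized ratio with roughly 36 years of compiler-optimization research, and solving 4 = 2^(36/T) for T gives the 18-year doubling period. A minimal Python sketch (the 36-year span comes from the note, not from this thread; the function name is hypothetical) makes it easy to see how strongly the answer depends on that assumed ratio:

```python
import math


def doubling_period(speedup: float, years: float) -> float:
    """Years per doubling if a total `speedup` accrued at a steady rate over `years`."""
    # speedup = 2 ** (years / T)  =>  T = years / log2(speedup)
    return years / math.log2(speedup)


if __name__ == "__main__":
    # The note's assumed inputs: ~4X from optimization over ~36 years.
    print(doubling_period(4.0, 36.0))  # → 18.0
    # Sensitivity check: vary the assumed ratio and watch the "law" move.
    for ratio in (2.0, 4.0, 8.0):
        print(ratio, doubling_period(ratio, 36.0))
```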

~~~
silentbicycle
I agree that the proof is pretty much handwaving, actually, but I still think
its underlying conclusion is valid: Our efforts at optimizing languages
should focus on making _programmers_ more efficient, before hardware. (In
other words, programmer time is almost always more expensive than processor
time.)

Also, I think most things in computing called X's Law are not entirely
serious, these days.

------
wmf
This sounds like a problem in the contest definition. If you want to compare
programming languages, come up with programs for which no existing libraries
will help or require that 90% of the execution time be spent in the named
language. If you want to compare libraries, call it a library comparison and
not a language comparison.

------
silentbicycle
I don't really know Tcl, but the most interesting part to me is how there
wasn't really any deep magic needed to make the Python version fast -- this
benchmark is almost entirely in the regular expression library's court.
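For a sense of how little the Python side has to do, here is a cut-down sketch in the spirit of the regex-dna task: strip the FASTA header lines and newlines, count matches of a few variant patterns, then expand IUB ambiguity codes. The pattern and substitution lists are an illustrative subset rather than the benchmark's full set, and the function name and sample input are hypothetical. Each step is a single call into Python's re module, so nearly all the work happens inside the regex engine.

```python
import re
from typing import Dict, Tuple

# Illustrative subset of variant patterns in the style of regex-dna.
VARIANTS = [
    "agggtaaa|tttaccct",
    "[cgt]gggtaaa|tttaccc[acg]",
]

# Illustrative subset of IUB ambiguity-code expansions.
SUBST = {
    "B": "(c|g|t)",
    "D": "(a|g|t)",
}

SAMPLE = ">ONE test\nagggtaaatttaccctB\ncgggtaaaD\n"  # hypothetical tiny input


def regex_dna(fasta: str) -> Tuple[int, int, Dict[str, int], int]:
    original_len = len(fasta)
    # Remove header lines and all newlines in one regex pass.
    seq = re.sub(">.*\n|\n", "", fasta)
    clean_len = len(seq)
    # Count non-overlapping matches of each variant pattern.
    counts = {pat: len(re.findall(pat, seq)) for pat in VARIANTS}
    # Expand each IUB code; again a single library call per code.
    for code, repl in SUBST.items():
        seq = re.sub(code, repl, seq)
    return original_len, clean_len, counts, len(seq)


if __name__ == "__main__":
    print(regex_dna(SAMPLE))
    # → (38, 26, {'agggtaaa|tttaccct': 2, '[cgt]gggtaaa|tttaccc[acg]': 1}, 38)
```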

~~~
henning
Yes. How sweet it would be to have C-competitive performance without having to
make your code ugly, sacrifice safety, sacrifice brevity, or sacrifice
anything, really, on a regular basis.

AFAIK this is the idea behind numpy.

~~~
jrockway
_Yes. How sweet it would be to have C-competitive performance without having
to make your code ugly, sacrifice safety, sacrifice brevity, or sacrifice
anything, really, on a regular basis._

SBCL, Haskell, Java, and OCaml all basically provide this.

~~~
silentbicycle
It's a pleasant surprise to see this coming from Python, though - Python is my
scripting language of choice, and I use it to prototype algorithms sometimes
(though OCaml is winning out there), but I usually expect it to be an order of
magnitude slower than most compiled languages. (Of course, that's almost
always good enough.)

------
drhowarddrfine
Agree with mark-t. These comparisons are worthless.

