

Python vs Clojure 2: Reloaded - lbj
http://blog.bestinclass.dk/index.php/2009/10/python-vs-clojure-reloaded/
Many Python users submitted better code than what was posted in the last article - This time I've dug up exemplary code from Rosetta.
======
icey
Since I know Lau reads comments here:

The way to advocacy is to start with an excellent example in language X and
then tell us how it can be done better with an example in language Y.

Cherry-picking examples isn't going to win you many converts.

~~~
jacquesm
Even worse, after two tries at it, it makes you look like you have an agenda to
push.

I've flagged this because I thought the first time around that that was the
case, now I'm sure.

Python and clojure both have their strengths, and their weaknesses.

Contrasting languages to show how one solves something vs how the other does
it is education, keeping 'score' is a childish attempt at proving 'your'
language to be the better one.

key quote: "But like I also told one disgruntled Scala user, it's not always
fun being compared to Clojure and I think a few Python users felt that they
got more competition than they hoped for."

And the picture at the end speaks volumes.

What I really don't get is, if you're that much of a clojure fan, why you don't
get that these stupid articles are doing clojure more harm than good.

It reminds me of the damage done to Erlang by flooding HN with all those
articles. If a language is good then there must be other ways of showing that.

Show me not how you implement some trivial benchmark in 16 vs 12 lines of
code, show me how you made a serious production system capable of handling a
couple of hundred thousand uniques in a real-world application.

And I'm sure that can be done, but that would be 'real work (tm)', whereas
sniping is so much easier.

And for extra points, then find an expert programmer in the language that you
wish to compare with and have them do the exact same thing.

That would impress me. This just annoys me.

~~~
lbj
jacques, calm down, please.

This is a fun comparison showing off some features of both languages. Notice
in example 2 where Python is actually the most concise, cleanly written
example where my code comes out looking a bit cramped? It's not a hatchet job,
it's a comparison which will tell you something about both languages.

Sure I like the real world examples too, but for a simple outline of a few of
their respective qualities this should tell you enough. If you have a larger
project you'd want to tag-team on, showing the capabilities of both languages
on a much larger scale I'm definitely up for it, that would also be
educational.

~~~
jacquesm
I'm pretty calm, thank you.

I don't have a dog in this fight, I like both languages for pushing the
envelope, but I wouldn't consider myself qualified to do a comparison that is
meaningful. The only language where I think I could make any statements is C,
and that's only because of decades of using it. I've used PHP for business
reasons for 10 years and I _still_ wouldn't think I'm qualified to write a
benchmark comparing PHP to something else.

The other day, there was a discussion here about 'the hundred year language'
(that essay by PG), and I wrote that if there is a candidate for that spot
it's either Clojure or Erlang, and my money would be on clojure fwiw.

The python guys ripping in to perl with cherry picked examples, the clojure
guys ripping in to python: it gets us exactly nowhere.

You don't need to compare to anything to show excellence.

But if you're going to do it, Rosetta is the right way to do it.

If you want to do runtime comparisons then I think you should get a person
that has at least your skills in python to have a look at what's going on, and
you should not just show one example where you come out ahead.

You should show many examples along the lines of the great programming
language shootout. Again, that's real work.

Who knows how many were tried where clojure was terribly slow that were not
shown for that reason and so on.

Hand-picked, biased, a waste of time.

~~~
lbj
jacques,

I don't quite agree with your sentiment that excellence stands on its own,
but time will tell.

I'd just like to correct two mistakes you made.

1) I didn't pick examples where Clojure runs circles around Python - Like I've
stated before, Example #2 is a lot cleaner in Python than what I wrote. The
fact that my implementation is 9.4x faster than the Python team's is not
subjective and tells you something important about parallelized sequence
consumption.

2) I didn't hand pick anything, I picked Lehman because it was on the front
page, followed another link to something which didn't have a Python solution,
another to the 'Count examples' and so on. I didn't set Python up for sniping,
which is why I think the comparison is interesting.

And for the record: I think Python came out looking pretty good.

~~~
jacquesm
> I didn't set Python up for sniping, which is why I think the comparison is
> interesting.

Ok, I'll take that at face value then, but from the writing I sure got a
different impression. This may be my language skills though, tone of voice is
a subtle thing.

Quotes that are confusing to me:

> which (until the next release) is a bit slow on the JVM.

Is that meant as an apology for bad performance on one benchmark ? If so why
apologize, in a benchmark all that matters is the facts, it's slow, or it's
not.

You leave out the timings on this benchmark, concentrate just on the line
count, which suggests in combination with the above that clojure performed
less well than python but you left out that data. Or maybe it didn't, but it
would have been consistent to show run-times for all examples, code size for
all examples (and coded up in roughly the same way), preferably gzipped to get
rid of formatting bias.
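
A rough sketch of what such a size comparison could look like, counting only non-blank lines and the gzipped byte size. The function name `code_size` and the two sample snippets are made up for illustration; nothing here comes from the article itself:

```python
import gzip


def code_size(source):
    """Return (non-blank line count, gzipped size in bytes) for a source string."""
    code_lines = [line for line in source.splitlines() if line.strip()]
    # Compress only the non-blank lines so vertical whitespace cannot skew the figure.
    normalized = "\n".join(line.rstrip() for line in code_lines)
    compressed = gzip.compress(normalized.encode("utf-8"))
    return len(code_lines), len(compressed)


# Hypothetical samples: the same code, with and without blank lines.
sample_dense = "def f(x):\n    return x + 1\nprint(f(1))\n"
sample_spaced = "def f(x):\n\n    return x + 1\n\n\nprint(f(1))\n"

if __name__ == "__main__":
    print(code_size(sample_dense))
    print(code_size(sample_spaced))
```

Both samples should report the same pair of numbers, which is the point: blank lines stop mattering once they are stripped before counting and compressing.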

> Might be a little confusing, but it's a great performance booster. << means
> bit-shift-left and

Who is your audience here ? Any programmer worth their salt recognizes a shift
operation when they see one.

In your clojure examples you consistently count only the lines with code, in
the python examples you count the blank lines as well.

Besides, who cares about line count when one language puts a whole pile of
expressions on one line and the other does not, clearly, line count is not
going to be a good metric to compare the one to the other.

You're also counting lines which define constants used only once and lines
that print headers, the clojure program doesn't have those luxuries.

So, let's rewrite the python code like you wrote the clojure code:

    
    
      from math import sqrt
    
      def is_prime ( p ):
        if p == 2: return True
        if p <= 1 or p % 2 == 0: return False
        for i in range(3, int(sqrt(p))+1, 2 ):
          if p % i == 0: return False
        return True
    
      def is_mersenne_prime ( p ):
        if p == 2: return True
        m_p = ( 1 << p ) - 1
        s = 4
        for i in range(3, p+1):
          s = (s ** 2 - 2) % m_p
        return s == 0
    
      for p in range(2, 33219):
        if is_prime(p) and is_mersenne_prime(p): print("M%d"%p);
    

Now, please note that I'm not a python programmer of any standing and that
I've mangled the code to make it shorter but keep it functional, to compare
apples with apples.

If linecount was that important then I could shorten it by quite a bit
further, but I don't think it is a very useful metric, especially not if you
count whitespace lines in one and not in the other.

Btw, this one is now 16 lines. Not as small as the clojure example, but less
than half of what it was before, I'm sure a real python guru could shorten
that by another couple of lines, essentially there is no real significant
difference between python and clojure here.

The solution where you claim an impressive speed boost is mostly due to
simply running a bunch of tasks in parallel, which when opening stuff through
the network is of course a big boon.

If the files would have been resident the test would have been more
meaningful, otherwise you would have to compare apples with apples by running
multiple crawlers feeding a single reducer.
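
A minimal sketch of that shape, several fetchers feeding one reducer, assuming a modern Python 3 with `concurrent.futures`. The names `count_examples` and `crawl_and_reduce`, the `==` heading heuristic, and the fake in-memory site are all hypothetical; the `fetch` callable is passed in so the sketch runs without touching the network:

```python
from concurrent.futures import ThreadPoolExecutor


def count_examples(page_text):
    """Toy stand-in for the per-page work: count lines starting with '=='."""
    return sum(1 for line in page_text.splitlines() if line.startswith("=="))


def crawl_and_reduce(urls, fetch, workers=8):
    """Fetch pages in parallel; a single reducer sums the per-page counts."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() runs fetch concurrently but yields results in input order.
        pages = pool.map(fetch, urls)
        per_page = {url: count_examples(text) for url, text in zip(urls, pages)}
    return per_page, sum(per_page.values())


if __name__ == "__main__":
    # Fake in-memory "site" (an assumption, for illustration only).
    fake_site = {
        "task/a": "==Python==\ncode\n==Clojure==\ncode\n",
        "task/b": "==C==\ncode\n",
    }
    per_page, total = crawl_and_reduce(list(fake_site), fake_site.__getitem__)
    print(per_page, "Total:", total)
```

With `workers=1` the same function gives the sequential baseline, so both runs differ only in the degree of parallelism.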

And that would have been a sample worth making, but you couldn't be bothered.
Then the clojure program would probably have been much shorter than the python
one.

By the way, running that counting example on my computer here I get a time of
5:47, which is about twice as fast as your clojure run, wonder why that would
be. It is also _24_!! times as fast as the time that you report for python.

I'm not quite accusing you of cooking the books but it would be nice to have
you find out what went so terribly wrong when you ran that test.

And so on. Really, I'm not impressed.


~~~
lbj
jacques,

I'll try to get everything.

I didn't benchmark the 2 examples since they run about 8 hours each. I
mentioned the next JVM release, only to highlight that some improvements are
being implemented for BigInteger, which currently is as slow as can be.

Regarding the 2nd example and linecounting. Firstly, if I counted blank lines
with Python I apologize, I must have been a bit too fast going over it.
Generally the line counting was not meant to be viewed as THE indicator for
quality, as I obviously compressed the Clojure-code more than was
optimal - The irony didn't come across though.

Your speed on the webcrawler is really quite amazing. I hit it at 2200 here,
which is around 4 o'clock on the east coast of the US if I'm not mistaken,
using my 4Mbit line. It took exactly 1 hour and 53 minutes, so if yours ran in
6 minutes ... Something was horribly wrong.

The fact that my gain is from opening several connections is not masked in
any way - I knew that was the key to speed, that's why I did it :)

Anyway, putting line counting aside I suggest focus be given to the clarity
and expressiveness of the samples - where Python is not lacking. Line counting
is fun for golfing.

/Lau

ps: Out of curiosity, did you get 8200+ something tasks from your crawl?

~~~
jacquesm
> I didn't benchmark the 2 examples since they run about 8 hours each.

ok.

> I mentioned the next JVM release, only to highlight that some improvements
> are being implemented for BigInteger, which currently is as slow as can be.

Ok, Afaik BigIntegers are slow everywhere.

> Regarding the 2nd example and linecounting. Firstly, if I counted blank
> lines with Python I apologize, I must have been a bit too fast going over
> it.

ok

> Generally the line counting was not meant to be viewed as THE indicator for
> quality, as I obviously compressed the Clojure-code more than was
> optimal - The irony didn't come across though.

It didn't, besides that, it degrades the whole of your post to focussing on
meaningless metrics.

> Your speed on the webcrawler is really quite amazing. I hit it at 2200 here,
> which is around 4 o'clock on the east coast of the US if I'm not mistaken,
> using my 4Mbit line. It took exactly 1 hour and 53 minutes, so if yours ran
> in 6 minutes ... Something was horribly wrong.

That was my take on it. When I saw how fast it did the first couple I decided
to time it, I was quite surprised on seeing the results. Now I'm curious how
fast clojure would run on this rig. fwiw it wasn't exactly doing nothing while
running that test either, I just didn't feel like shutting stuff down.

> The fact that my gain is from opening several connections is not masked in
> any way - I knew that was the key to speed, that's why I did it :)

And you're _still_ twice as slow as my python run. I think that benchmarking
anything in a context such as networking should be done by several runs at
different times of day and presenting both the average as well as the min and
max run times.
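
A small sketch of that kind of reporting, assuming Python 3. The helper name `time_runs` and the placeholder workload are hypothetical; a real test would call the crawler and repeat the whole measurement at different times of day:

```python
import statistics
import time


def time_runs(task, runs=5):
    """Call task() `runs` times; return min, mean, and max wall-clock seconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        task()
        durations.append(time.perf_counter() - start)
    return {
        "min": min(durations),
        "mean": statistics.mean(durations),
        "max": max(durations),
        "runs": runs,
    }


if __name__ == "__main__":
    # Placeholder workload standing in for the real crawl.
    print(time_runs(lambda: sum(range(100000)), runs=3))
```

Reporting min and max next to the mean makes it obvious when one slow run, for example a congested link, is dragging the average.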

> Anyway, putting line counting aside I suggest focus be given to the clarity
> and expressiveness of the samples - where Python is not lacking. Line
> counting is fun for golfing.

I think that expressiveness in both languages is comparable, I don't really
see a clear winner, it depends on the use case. I'm more used to imperative
languages (for the moment) so I can read the python code a little easier but
I've long ago learned that micro-optimizing the number of lines at the expense
of clarity is a losing strategy.

> Out of curiosity, did you get 8200+ something tasks from your crawl?

here's the condensed output:

    
    
      eleven:/tmp# date ; python examples.py
      Mon Oct 19 16:49:35 EDT 2009
      100 doors: 50 examples.
      99 Bottles of Beer: 67 examples.
      Abstract type: 18 examples.
      Ackermann Function: 56 examples.
      Active object: 8 examples.
      Adding variables to a class instance at runtime: 16 examples.
      Address Operations: 18 examples.
      ...
      XML Creation: 17 examples.
      XML Reading: 17 examples.
      XML and XPath: 15 examples.
      Xiaolin Wu's line algorithm: 3 examples.
      Y combinator: 20 examples.
      Yuletide Holiday: 30 examples.
      Zig Zag: 27 examples.
      Total: 8275 examples.
      eleven:/tmp# date
      Mon Oct 19 16:55:22 EDT 2009
      eleven:/tmp#
    

---

edit: I just saw jcl's post above, combining his trick with some more cramming
we now have:

    
    
      from math import sqrt
    
      def is_prime(i): return (i > 1 and all(i % x != 0 for x in range(2, int(sqrt(i)) + 1)))
    
      def is_mersenne_prime ( p ):
        if p == 2: return True
        m_p = ( 1 << p ) - 1; s = 4;
        for i in range(3, p+1): s = (s ** 2 - 2) % m_p
        return s == 0
    
      for p in range(2, 33219):
        if is_prime(p) and is_mersenne_prime(p): print("M%d"%p);
    

There must be a way to make it shorter still ;)

~~~
daivd
Making it shorter is trivial :) I have never tried posting code here, so
I'll try to get the formatting right.

    
    
      #The import line is unnecessary. It can be inlined, like so:
      def is_prime(i): return (i > 1 and all(i % x != 0 for x in range(2, int(__import__('math').sqrt(i)) + 1)))
    
      #Mersenne prime can be put on one line with some functional beautification:
      def is_mersenne_prime(p): return p == 2 or not reduce(lambda x, y: (x ** 2 - 2) % (( 1 << p ) - 1), range(3, p + 1), 4)
    
      #With a list comprehension we do not need "if ..:" and can put the loop on one line as well:
      for p in [p for p in range(2,33219) if is_prime(p) and is_mersenne_prime(p)]: print("M%d"%p)
    
    

Three lines. Of course the functions are unnecessary, so inline them in the
for-loop and we have our magic target, one line!

I have replaced the list comprehension and range with generators, so if you
run this one-liner in your python terminal you will get a continuous stream of
primes (if it flushes the prints properly).

    
    
      for q in (p for p in xrange(2,33219) if (p > 1 and all(p % x != 0 for x in range(2, int(__import__('math').sqrt(p)) + 1))) and (p == 2 or not reduce(lambda x, y: (x ** 2 -2) % (( 1 << p ) - 1), range(3, p + 1), 4))): print("M%d"%q)
    

Easy as pie.
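
For anyone on Python 3, where `xrange` is gone and `reduce` lives in `functools`, here is an unrolled sketch of the same generator idea. It assumes Python 3.8+ for `math.isqrt`; the name `mersenne_exponents` and the small limit of 128 are only for illustration:

```python
from functools import reduce
from math import isqrt


def is_prime(n):
    """Trial division up to the integer square root."""
    return n > 1 and all(n % d for d in range(2, isqrt(n) + 1))


def is_mersenne_prime(p):
    """Lucas-Lehmer test for 2**p - 1, assuming p is prime."""
    if p == 2:
        return True
    m_p = (1 << p) - 1
    # Apply s -> (s*s - 2) mod m_p exactly p - 2 times, starting from 4.
    return reduce(lambda s, _: (s * s - 2) % m_p, range(p - 2), 4) == 0


def mersenne_exponents(limit):
    """Lazily yield each p < limit for which 2**p - 1 is prime."""
    return (p for p in range(2, limit) if is_prime(p) and is_mersenne_prime(p))


if __name__ == "__main__":
    for p in mersenne_exponents(128):
        print("M%d" % p)
```

Because `mersenne_exponents` returns a generator, each exponent is printed as soon as it is found instead of after the whole range has been checked.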

I don't know anything about code golf, so perhaps I am breaking some rule with
the longer than 80 char line? Using ; to sequence statements as in the
original is definitely cheating, IMHO :)

------
lbj
Hi,

I just signed up for the single purpose of posting this, since quite a few
Python users felt that the examples in the last post were poor - This time
I've selected prime examples from Rosetta and despite jacques' comment, the
point wasn't to make Python look bad - it was exactly the opposite.

Lau out, hope you take something away from it.

