
DDG is using a dynamic language like Perl[1], which is the slowest of all languages, to achieve high scalability. This proves languages don't matter much; it's all about architecture.

I guess it's time to stop worrying about the performance of your programming language & start building better, highly scalable architectures.

[1] Slowest of all languages http://benchmarksgame.alioth.debian.org/u32/which-programs-a...

Those benchmarks are extremely suspicious. I remember comparing Perl and Python for bioinformatic work (non-trivial computational workloads), and finding that Perl was about 2x the speed of Python, on average.

Later, I did similar, non-trivial benchmarks with Python and Ruby, and found a similar, 2x factor. Ruby has improved since then, but unless Perl has become dramatically slower in the same interval, I suspect that these benchmarks are either trivial (i.e. simple loops), or badly written.


>> ... extremely suspicious ... I suspect ...<<

Please - less FUD and more looking at program source code (which is 2 clicks from the URL you were given).


If you want to debug the metrics they're using, you're more than welcome to do it. I've got plenty of direct experience to question the results shown by a random webpage on the internet, and not much incentive to figure out why this particular set of benchmarks looks wrong.

A quick inspection of the tests suggests a high enough bogon count that it already begins to confirm my suspicions: there's a "fasta-redux" test for Perl which dramatically outperforms the "fasta" test that's implemented cross-language. Also, many of these tests are written in least-common-denominator style, which doesn't reflect the way the languages are actually used.


Nope -- The Perl fasta program is 109x slower than the C fasta program; and the Perl fasta-redux program is about 112x slower than the C fasta-redux program.


Now you're just being deliberately obtuse; the Perl fasta-redux program is almost 30% faster than the Perl fasta program.


Now you're just being deliberately insulting, to avoid answering the obvious question -- what change was made to the algorithm?


Your benchmarks are likely very biased. The vast majority of the work done by your bioinformatics programs should be done in C code, with thin wrappers so that the code can be accessed through Perl and Python. If anything, your benchmarks are only comparing specific implementations, such as BioPerl vs BioPython.


I wasn't using bioperl or biopython. I wrote the algorithms myself.


1 million searches per day is not really that much. If evenly distributed, it's only 11.6 searches per second.


We also do about 12M API requests per day: http://duckduckgo.com/traffic.html


The interesting measure is the peak RPS.

EDIT: I mean, ~150 doesn't seem like that much, but I'm sure the requests are not uniformly distributed, so the system should be able to handle much more than that within reasonable latency bounds.


Indeed. Peak RPS is what is going to be interesting. Depending on where traffic is from, it's very possible to have 12 mil requests in a day only from 9-5 type of hours and a big huge empty space during the night (This is the kind of traffic my company sees).
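
The averages quoted in this subthread can be reproduced with a few lines of arithmetic. This is only a sketch: `average_rps` is an illustrative helper name, and the 8-hour window is an assumption standing in for "9-5 type of hours". It computes averages only; true peak RPS cannot be derived from daily totals.

```python
def average_rps(requests_per_day, active_hours=24):
    """Average requests per second if the daily total is spread evenly over active_hours."""
    return requests_per_day / (active_hours * 3600)

searches = average_rps(1_000_000)                            # 1M searches, spread over 24 hours
api = average_rps(12_000_000)                                # 12M API requests, spread over 24 hours
api_office_hours = average_rps(12_000_000, active_hours=8)   # assumption: all traffic in an 8-hour window

print(f"{searches:.1f} {api:.1f} {api_office_hours:.1f}")    # about 11.6, 138.9 and 416.7
```

Squeezing the same daily volume into a third of the day triples the average rate, which is why the daily total alone says little about the load the system has to sustain.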


> within reasonable latency bounds

Have you used DDG? Requests take over 500-600ms regularly for me. Compare to Google's instant search and excellent latency for non-instant queries... which handle orders of magnitude more traffic.


I'm not a DDG user, but it's easy to measure:

    $ time I=$((I+1)) curl -s  "https://duckduckgo.com/d.js?q=test&t=A&l=us-en&p=1&s=0" >/dev/null
    I=$((I+1)) curl -s "https://duckduckgo.com/d.js?q=test$i&t=A&l=us-en&p=1&s=0"  0.01s user 0.00s system 3% cpu 0.256 total

I saw it go over 800ms a couple of times. Anyway, I don't intend to run a DDG benchmark.

However, just for the sake of the discussion, about architecture and language choice etc, the interesting part would be to see how many rps before starting to degrade. There is no point bashing it and comparing it to other search engines that have more resources behind them.


Interesting. Any public big users of the API?


FastestFox is the biggest.


Great article. Thanks. Could you please share what you did that made the direct searches jump between Jan 2012 and Mar 2012?


Serious question: at what volume of traffic do you plan to close it down or start charging for an api?


DDG is mostly a front end for a bunch of search engines. Perl, PHP and Python are great for making front ends like that, because for the most part you are only dispatching queries, parsing XML feeds and building the GUI for the end user.

Writing an internet-scale search engine, on the other hand, would require different tools. Most successful projects have probably been done in C, C++ and maybe Java.


That page shows Perl as being roughly 5% slower than Ruby, however when comparing Ruby and Perl directly, things look significantly different:


So i have to say i am quite suspicious of the chart on the page you linked.


Take away those last three benchmarks (spectral-norm, fasta & binary-trees) then things do look quite different.

For eg: Here is Which programs are best with those benchmarks removed: http://benchmarksgame.alioth.debian.org/u32/which-programs-a...

This shows Perl higher up the chart with it being 49% better (on average) than Ruby.

Some other things to note:

1. Perl has never been fast with binary trees. But Ruby 1.8 is even worse (I recall it being many times slower than Perl at this). So hats off to the Ruby 1.9 VM guys because they've turned things around and now it's even outpacing Lua on this benchmark! - http://benchmarksgame.alioth.debian.org/u32/performance.php?...

2. Fasta in Ruby may be twice as quick as Perl, however it uses over 100 times more memory! This is because it's inlining some code (eval), giving it a huge speed bump at the expense of a bigger program footprint. If I port this to Perl then this fasta.pl runs 2.6 times faster than alioth's Perl version (which now means the Ruby version is about 30% slower than its direct equivalent Perl version).

3. My make_repeat_fasta idiomatic subroutine is actually a little slower than the one in fasta.pl on alioth. So both my fasta.pl & alioth's fasta.rb programs could be sped up more :)

4. I may even be able to shave a little bit more off the fasta.pl time. The same might be true for spectral-norm & binary-trees, however the Perl versions aren't showing up on the alioth site at the moment :( ... IIRC, when I last looked at the Perl binary-trees code on alioth a couple of years ago I was able to shave 10% off.

ref: My perl port of fasta.rb on alioth - https://gist.github.com/4675254


>> Take away those last three benchmarks... <<

What if we are selective with the evidence a different way, what if we take away k-nucleotide and pi-digits and reverse-complement :-)

>> My perl port of fasta.rb <<



On the contrary my dear igouy, specialisation can be key and knowing which tool in my toolset is best for different tasks is indeed insightful & helpful ;-)

re: contribute - Looking on my hard disk I see that I downloaded the bencher/shootout-scm back on 1st Jan 2010. However IIRC the process of contributing code back was a bit unwieldy. If some free tuits come my way then I may relook at it.


Contributing code is a simple matter of attaching a complete tested source code file to a tracker item ticket. Really not difficult.


Looking at my keychain I see I have three logins for alioth... draegtun, draegtun-guest & draegtun_guest... all created on 1st Jan 2010.

So it looks like I had issues logging (back) onto Alioth at that time :(

Anyway resolved it now because I see that draegtun-guest does work for me :)

It's a little convoluted and the contribute notes don't initially match what you see at login but after rummaging around I found the required tracker and have now submitted the faster fasta.pl.


Frankly, i'm not convinced there's any point in investing time on improving the benchmarks there, considering the overall comparison directly contradicts the detailed comparison, by putting perl in a worse position than languages it outperforms.


I think it might well be worth it. My improved fasta.pl is now on Alioth and (I believe) this one change moved Perl above Ruby & also Python on the Which programs are fastest - http://benchmarksgame.alioth.debian.org/u32/which-programs-a...

Here's a link to my fasta.pl on Alioth - http://benchmarksgame.alioth.debian.org/u32/program.php?test...

NB. For posterity here are the bottom five on this benchmark at this moment in time:

  PHP       40.49
  Perl      51.96
  Python 3  55.45
  Ruby 1.9  62.40
  Mozart/Oz 74.77

Previously Perl was second from bottom with something like 69.3.


Replying to myself because I want to add something interesting (to posterity) that I noticed today on Alioth:

Perl (or OS) was upgraded from 5.14.* to 5.16.2. From cursory glance this gave all the Perl benchmarks a little boost. For eg. My fasta is about 3 secs quicker and the "interesting alternative" fasta dropped below 2.0 barrier (now timed at 1.96).

However on the summary Perl slowed down a few points (here's the new bottom five on u32 single-core benchmark):

  Lua       31.09
  PHP       40.49
  Perl      54.90
  Python 3  55.45
  Ruby 1.9  62.40

I think the drop is because the Perl pidigits benchmark is now failing: the Math::GMP module can't be found. Pretty sure this wasn't a core module, so perhaps a Perl dependency has been removed in the OS (Debian).

PS. This may be a temporary glitch, so that dependency may be restored soon. If not then I may amend the pidigits benchmark accordingly.

PPS. I see that Python pidigits is using gmpy and is working fine. This means that GMP is installed (as is the gmpy Python library), so it's just the Math::GMP Perl module that's missing :(


You don't seem to understand what the overall comparison shows.

Are you familiar with descriptive statistics? Quartiles? Box plots?


> You don't seem to understand what the overall comparison shows.

I told you i don't. It looks entirely nonsensical. I asked you for clarification. So far your only response has been to parrot my saying that i don't understand why your data representations are dissonant.

> Are you familiar with descriptive statistics?

Possibly under another name, but i don't know what you mean when you say that.

> Quartiles?

In theory yes, i am unsure how you're applying it here, since we're not talking about binnable quantities.

> Box plots?



>> I told you i don't. It looks entirely nonsensical. I asked you for clarification. So far your only response has been to parrot my saying that i don't understand why your data representations are dissonant. <<

It would have been better if you had said -- "English is not my primary language and especially english maths are hard for me to grasp." -- instead of saying "it certainly seems deceptive".

You say you are familiar with box plots, so you should have no difficulty understanding what the box plot shows: the Perl and Ruby programs have very similar performance when compared to the fastest programs.

"Visual Presentation of Data by Means of Box Plots"



Those two sentences are not a contradiction. I may not be good with reading english descriptions of math, but i am good with applied math. The calculations i did with your numbers disagree with what your graph showed. So to me the graph seems deceptive. There is no contradiction in this.

Further, if you show me the actual calculations done, i will understand it perfectly fine. Yet you refuse to do so. I do not understand why, and i hope you can understand how that makes me even more distrustful.

On the graph in the overview page Perl was shown to significantly outperform Ruby in a number of benchmarks, yes, i could see that. Yet the median of Perl was still set higher than the median of Ruby, which could possibly be explained by perl also being outperformed significantly in one benchmark, but which was not supported by the actual direct comparison numbers.

So i ask again: Please show me the actual calculations performed to arrive at the median values shown in the overview graph.


>> but i am good with applied math <<

Really? http://news.ycombinator.com/item?id=5141025

>> Please show me the actual calculations performed to arrive at the median values shown in the overview graph <<



Okay, i worked it out, no thanks to you. Actually, i fucking worked it out IN SPITE of you. All your condescending hints and links and such were entirely bullshit and did not even remotely lead in the direction of explaining why the data seems dissonant. They were flat out orthogonal to the entire problem.

The important thing which you did not bother to point here even once is that the comparisons on the overview page are done against the fastest programs of all languages, thus weighting the results by a factor that is simply not present when one language is compared directly against another.

So, alright, the graphs do entirely make sense.
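
A minimal sketch of that effect, using made-up timings (the numbers below are hypothetical, not taken from the benchmarks site) and assuming the overview takes the median of each program's time divided by the fastest program's time, as described in this thread. `median_vs` is an illustrative helper name.

```python
from statistics import median

# Hypothetical timings in seconds for three tasks; NOT real benchmark data.
fastest = [2, 1, 5]      # fastest program in any language, per task
perl = [20, 60, 250]
ruby = [60, 40, 275]

def median_vs(times, baseline):
    """Median of the per-task ratios times[i] / baseline[i]."""
    return median(t / b for t, b in zip(times, baseline))

perl_vs_fastest = median_vs(perl, fastest)   # ratios 10, 60, 50
ruby_vs_fastest = median_vs(ruby, fastest)   # ratios 30, 40, 55
perl_vs_ruby = median_vs(perl, ruby)         # ratios 0.33, 1.5, 0.91

print(perl_vs_fastest, ruby_vs_fastest, round(perl_vs_ruby, 2))
```

With these numbers Perl is faster than Ruby on two of the three tasks and its direct median ratio is below 1, yet its median against the fastest programs (50) is higher than Ruby's (40). The two views pick their medians from different tasks, so they need not agree.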

Would you be open to a patch that reworks the language vs. language comparison pages in such a manner as to make this relationship obvious?


>> The important thing <<

Is stated in plain sight - twice - on the overview page.


That still does not change that your presentation of the data is not consistent in all parts, namely the language comparison.


I know it can be easy to mistake that, but i said calculations intentionally. I was not asking for the code.


Are you suspicious that you may not have understood what is shown? Do both show the same thing?


In fact, i know i do not understand how the table on this page is generated:


That is precisely why i am suspicious. I cannot say for a fact that it is deceptive, but it certainly seems deceptive.

On that page Ruby is being shown as 10% faster than Perl. Yet on the direct comparison page things look quite different:


On that page, for all benchmarks that can be compared, Perl has used an overall time of 9255 seconds, while Ruby has used an overall time of 10662 seconds. As such Ruby is actually 10% slower than Perl.

Where does this difference come from?


>> but it certainly seems deceptive <<

You go too far -- your lack of understanding is simply your lack of understanding ;)

What are you told the table shows?

>> Where does this difference come from? <<

Check the same thing for 2 other language implementations where the arithmetic should be easy. For example, Java median 2.04 and Smalltalk median 21.22 -- the direct comparison shows 11x as the rounded median of the Smalltalk/Java program times.


So, basically: Because Perl is considerably slower in one single comparison, even though it is faster in 7 others, it gets judged as slower overall?

Seems like your graphs up top in the language versus language comparison need to be reworked to make it clear how big the difference in reality between x3 and 1/3 is, because right now it is deceptive.


>> Because Perl is considerably slower in one single comparison <<

What are you looking at? Perl is shown slower on 3 tasks.


Note the word "considerably". There is only one single task in which perl takes a considerable amount of time longer than Ruby.


Another addendum, i'm not sure i'm getting what's happening here:

If i calculate the average of the time perl took divided by the time ruby took, i get this:

((226/724)+(5.35/16.8)+(3/9)+(2750/3960)+(1120/1368)+(3236/3837)+(30.5/35.8)+(939/618)+(263/135)+(662/214))/10 = 1.07

Which i understand to mean that perl on average took 7% longer.


However if i turn this around i get:

((724/226)+(16.8/5.35)+(9/3)+(3960/2750)+(1368/1120)+(3837/3236)+(35.8/30.5)+(618/939)+(135/263)+(214/662))/10 = 1.58

Which i interpret to mean that Ruby took, on average, 58% longer.


These things contradict and i made some mistake here. Can you clear up what i should've been doing?


>> i made some mistake here <<

The table you don't understand shows the median but you seem to be calculating the arithmetic mean.
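
To make the distinction concrete, here is a sketch using the ten timing pairs from the calculation quoted above. The arithmetic mean of ratios is not symmetric under inversion, which is why both directions come out above 1. The geometric mean is included purely as an illustration of a symmetric average; nothing in this thread says the site uses it.

```python
from statistics import mean, median, geometric_mean

# Timing pairs copied from the calculation in this thread (Perl first, Ruby second).
perl = [226, 5.35, 3, 2750, 1120, 3236, 30.5, 939, 263, 662]
ruby = [724, 16.8, 9, 3960, 1368, 3837, 35.8, 618, 135, 214]

perl_over_ruby = [p / r for p, r in zip(perl, ruby)]
ruby_over_perl = [r / p for p, r in zip(perl, ruby)]

mean_pr = mean(perl_over_ruby)       # about 1.07: "Perl took 7% longer"
mean_rp = mean(ruby_over_perl)       # about 1.59: "Ruby took 58-59% longer"
median_pr = median(perl_over_ruby)   # about 0.83: Perl's typical time is below Ruby's
gm_pr = geometric_mean(perl_over_ruby)
gm_rp = geometric_mean(ruby_over_perl)

print(f"mean P/R={mean_pr:.2f}  mean R/P={mean_rp:.2f}  median P/R={median_pr:.2f}")
print(f"geometric means multiply to {gm_pr * gm_rp:.6f}")
```

A few large ratios (such as 662/214) dominate an arithmetic mean in one direction, and their inverses barely register in the other. The median ignores how extreme the outliers are, and the two geometric means are exact reciprocals of each other, so neither produces the contradiction.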


Ok, i don't get it. Can you show me the formula i should be using?

Or even better: Try to explain in detail why on the overview page perl is claimed to be slower than Ruby, when in a direct comparison it is not.


You don't know how to calculate the median?



English is not my primary language and especially english maths are hard for me to grasp. That's why i am asking you to demonstrate, using the actual numbers for Ruby and Perl, what calculations should be performed to gain the numbers your site is showing.

Also, in addition, after reading your link, the situation seems even worse. Using the median Perl outperforms Ruby by ~15%, but the main site does not reflect that at all.


PS: The source doesn't show here: http://benchmarksgame.alioth.debian.org/u32/program.php?test...


Also see: Which programs are best? - http://benchmarksgame.alioth.debian.org/u32/which-programs-a...

One useful metric change is to add "memory (usage)" weight - http://benchmarksgame.alioth.debian.org/u32/which-programs-a...

Based on this we should all be using Pascal ;-)


Free Pascal statically links programs by default, avoiding libc.


Not even the slowest of all language implementations shown on the web page you reference.


This is simply not true. Not every problem can be solved with caching. If DuckDuckGo were a real search engine, they would not touch Perl with a 10-foot pole for any non-trivial algorithm/function.

edit: fix the name


Uh, hello!? Blekko is a web scale search engine that is built using Perl.

It's not the language that makes handling big data slow. It's the algorithms and how you move the bits around.


Well, sure, if you want to use 1000 servers instead of 100, go ahead and use it. No wonder Facebook noticed this and is desperately trying to compile PHP to C. Perhaps if Blekko ever gets popular we will see if their choice will bite them.

Also an anecdotal example. Two months ago a friend of mine wrote a rather complex algorithm in Perl (he is very fluent with it). It was a novel sentence alignment algorithm using Gibbs sampling. The algorithm was hard to make parallel and not cache friendly. He needed to wait more than a day to train the system on a server. Well, long story short, with his help another developer converted it to Java in a short time, and it ran around 200 times faster. So there. Moving bits around did not help Perl here at all.


>>Blekko ever gets popular we will see if their choice will bite them.

What choice?

The only reason why Blekko or Facebook is even able to launch and turn around things quickly to survive in competition is because they opt to use dynamic languages like Php and Perl. If they started doing their projects in C, with the current growth in rate of complexity they will never finish their projects ever.

>>No wonder Facebook noticed this and is desperately trying to compile PHP to C.

Dynamic languages aren't slower because they are not C. They are slow because they do a lot of magic. C is fast because the magic is left for you to perform. I can't see how anyone could gain speed by compiling PHP to C directly, unless they sacrifice things along the way, which really defeats the purpose of using PHP in the first place.

>>Well, long story short, with his help another developer converted it to Java in a short time, and it ran around 200 times faster. So there. Moving bits around did not help Perl here at all.

Programmers who even need to hear the word 'bits' in their day-to-day activities (talking of application programmers) are rare enough to make their case totally exceptional.

Besides there are places where C makes perfect sense. There is hardly any other language heard of in the embedded programming world.


Is the backend/search kernel in Perl too?

I wrote the first prototype of Boitho (Norwegian internet search engine, now defunct) in Perl in 2000. That did not work at all because memory access and sorting were so slow. We had to write the next version in C. Of course a lot has happened with Perl in the last 13 years, but a search kernel in Perl still sounds odd to my ears.


Granted, sometimes flipping bits is important. Then you write that part in C.

There is no reason to pay the development tax of C for everything else. Not when you can build it in Perl, get it working, and then find the hotspots and convert those to C.


> There is no reason to pay the development tax of C for everything else. Not when you can build it in Perl, get it working, and then find the hotspots and convert those to C.

Certainly not with fewer than 20 million req/day.


Uhm, no. Google can't be implemented in Perl.


Modern day software architecture is too complicated to run on one and only one programming language.

But regardless of that, Perl continues to power some very serious work happening in some very important places all over the world. And that is not likely to change soon, no matter how many web frameworks get written in PHP, Ruby or Python.

The reason for that is Perl has little or almost no competition in the niche it occupies. And anything that is likely to be invented to replace Perl will by and large (like 99%) look like Perl (read: Perl 6 or whatever). This being the case, we are likely to be using Perl (or a Perl-like language) very far into the future.

