Tons of anecdotal stories with no evidence. Obvious outsiders trying to imagine what it is like on the inside (all the guesses about how DDG works) and of course no facts to back anything up.
I would like to say this pandering, uninformed behaviour is not common on HN, but unfortunately it is.
People who engage in language flame wars simply do not understand that once you become a good programmer, languages do not matter; only the platforms matter.
I'd hardly call the argument over language speed a 'flame war' by any internet standard I've ever seen.
There are probably better examples than this one to pull the 'HN commenters suck' card. The conversation here is mostly civil and trying to learn things, one troll aside.
But that said, choosing anything interpreted does come back to haunt you later on, in some way or other.
We run Java on Jetty for most of our app (traffic is around 60k-70k requests a day, so much less in comparison). But even with this traffic, we need two large EC2 instances during the daytime. And mysteriously, the Jettys keep running out of memory every now and then.*
I am sure the same thing, done in C++, would need only one large EC2 instance. It would also help latency a bit, as a parallel gain. At present I am analyzing the cost/benefit of such a move. Inputs are welcome.
* With Java it's always the memory that hurts you first. Latency-wise there is not much of a difference, in most cases.
Edit: Downvote? Surprised. Why?
Guess you're building the next killer app in brainfuck?
Also, doesn't it seem inefficient to have crawlers here and indexes there?
Thanks for the searchco.de articles, they were nice to read (Saw them in the previous searchco.de HN) :-)
I guess it's time to stop worrying about the performance of your programming language & start building better, highly scalable architectures.
 Slowest of all languages http://benchmarksgame.alioth.debian.org/u32/which-programs-a...
Later, I did similar, non-trivial benchmarks with Python and Ruby, and found a similar, 2x factor. Ruby has improved since then, but unless Perl has become dramatically slower in the same interval, I suspect that these benchmarks are either trivial (i.e. simple loops), or badly written.
Please - less FUD and more looking at program source code (which is 2 clicks from the URL you were given).
A quick inspection of the tests suggests a high enough bogon count that it already begins to confirm my suspicions: there's a "fasta-redux" test for Perl which dramatically outperforms the "fasta" test that's implemented cross-language. Also, many of these tests are written in least-common-denominator style, which doesn't reflect the way the languages are actually used.
EDIT: I mean, ~150 rps doesn't seem like much, but I'm sure the requests are not uniformly distributed, so the system should be able to handle much more than that within reasonable latency bounds.
Have you used DDG? Requests take over 500-600ms regularly for me. Compare to Google's instant search and excellent latency for non-instant queries... which handle orders of magnitude more traffic.
$ time I=$((I+1)) curl -s "https://duckduckgo.com/d.js?q=test$I&t=A&l=us-en&p=1&s=0" >/dev/null
I=$((I+1)) curl -s "https://duckduckgo.com/d.js?q=test$I&t=A&l=us-en&p=1&s=0" 0.01s user 0.00s system 3% cpu 0.256 total
However, just for the sake of the discussion about architecture and language choice etc., the interesting part would be to see how many rps it can take before it starts to degrade. There is no point in bashing it and comparing it to other search engines that have more resources behind them.
Writing an internet-scale search engine, on the other hand, would require different tools. Most successful projects have probably been done in C, C++ and maybe Java.
So I have to say I am quite suspicious of the chart on the page you linked.
E.g., here is "Which programs are best" with those benchmarks removed: http://benchmarksgame.alioth.debian.org/u32/which-programs-a...
This shows Perl higher up the chart, with it being 49% better (on average) than Ruby.
Some other things to note:
1. Perl has never been fast with binary trees. But Ruby 1.8 is even worse (I recall it being many times slower than Perl at this). So hats off to the Ruby 1.9 VM guys, because they've turned things around and now it's even outpacing Lua on this benchmark! - http://benchmarksgame.alioth.debian.org/u32/performance.php?...
2. Fasta in Ruby may be twice as quick as Perl, however it uses over 100 times more memory! This is because it's inlining some code (eval), giving it a huge speed bump at the expense of a bigger program footprint. If I do a port of this to Perl, then this fasta.pl runs 2.6 times faster than alioth's Perl version (which now means the Ruby version is about 30% slower than its direct Perl equivalent).
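That speed-for-memory trade in (2) is a classic one. Here is a minimal Python sketch of the same idea (illustrative only, not the actual fasta technique): spend memory on a precomputed table to avoid recomputation on every call.

```python
import math

# Slow path: compute the value on demand, every time it's needed.
def slow(n):
    return math.sin(n / 1000.0)

# Fast path: spend several MB up front on a precomputed table,
# turning each later call into a plain list index.
TABLE = [math.sin(i / 1000.0) for i in range(1_000_000)]

def fast(n):
    return TABLE[n]

print(fast(500) == slow(500))  # True: same answer, bigger footprint
```

The inlining-via-eval trick in the Ruby fasta program is the same bargain: more resident code/data in exchange for less work per iteration.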
3. My idiomatic make_repeat_fasta subroutine is actually a little slower than the one in fasta.pl on alioth. So both my fasta.pl & alioth's fasta.rb programs could be sped up more :)
4. I may even be able to shave a little more off fasta.pl's time. The same might be true for spectral-norm & binary-trees, however the Perl versions aren't showing up on the alioth site at the moment :( ... IIRC, when I last looked at the Perl binary-trees code on alioth a couple of years ago I was able to shave 10% off.
ref: My perl port of fasta.rb on alioth - https://gist.github.com/4675254
What if we are selective with the evidence in a different way - what if we take away k-nucleotide and pi-digits and reverse-complement? :-)
>> My perl port of fasta.rb <<
re: contributing - Looking on my hard disk I see that I downloaded bencher/shootout-scm back on 1st Jan 2010. However, IIRC the process of contributing code back was a bit unwieldy. If some free tuits come my way then I may look at it again.
So it looks like I had issues logging (back) onto Alioth at that time :(
Anyway resolved it now because I see that draegtun-guest does work for me :)
It's a little convoluted and the contribute notes don't initially match what you see at login but after rummaging around I found the required tracker and have now submitted the faster fasta.pl.
Here's a link to my fasta.pl on Alioth - http://benchmarksgame.alioth.debian.org/u32/program.php?test...
NB. For posterity, here are the bottom five on this benchmark at this moment in time:
Python 3 55.45
Ruby 1.9 62.40
Perl (or the OS) was upgraded from 5.14.* to 5.16.2. From a cursory glance this gave all the Perl benchmarks a little boost. E.g. my fasta is about 3 secs quicker and the "interesting alternative" fasta dropped below the 2.0 barrier (now timed at 1.96).
However, on the summary Perl slowed down a few points (here's the new bottom five on the u32 single-core benchmark):
Python 3 55.45
Ruby 1.9 62.40
PS. This may be a temporary glitch, so that dependency may be restored soon. If not then I may amend the pidigits benchmark accordingly.
PPS. I see that the Python pidigits is using gmpy and is working fine. This means GMP is installed (as is the gmpy Python library), so it's just the Math::GMP Perl module that's missing :(
Are you familiar with descriptive statistics? Quartiles? Box plots?
I told you I don't. It looks entirely nonsensical. I asked you for clarification. So far your only response has been to parrot my saying that I don't understand why your data representations are dissonant.
> Are you familiar with descriptive statistics?
Possibly under another name, but I don't know what you mean when you say that.
In theory, yes; I am unsure how you're applying it here, since we're not talking about binnable quantities.
> Box plots?
It would have been better if you had said -- "English is not my primary language and English maths terminology is especially hard for me to grasp." -- instead of saying "it certainly seems deceptive".
You say you are familiar with box plots, so you should have no difficulty understanding what the box plot shows - that the Perl and Ruby programs have very similar performance when compared to the fastest programs.
"Visual Presentation of Data by Means of Box Plots"
Further, if you show me the actual calculations done, I will understand it perfectly fine. Yet you refuse to do so. I do not understand why, and I hope you can understand how that makes me even more distrustful.
On the graph on the overview page Perl was shown to significantly outperform Ruby in a number of benchmarks, yes, I could see that. Yet the median of Perl was still set higher than the median of Ruby, which could possibly be explained by Perl also being outperformed significantly in one benchmark, but which was not supported by the actual direct-comparison numbers.
So I ask again: please show me the actual calculations performed to arrive at the median values shown in the overview graph.
>> Please show me the actual calculations performed to arrive at the median values shown in the overview graph <<
The important thing, which you did not bother to point out here even once, is that the comparisons on the overview page are done against the fastest programs of all languages, thus weighting the results by a factor that is simply not present when one language is compared directly against another.
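A toy illustration of that weighting (all numbers are made up, not from the site): scoring each language against the cross-language fastest program can make two languages look identical even when their direct totals differ a lot.

```python
import statistics

# Hypothetical timings (seconds) for two benchmarks; "fastest" is the
# best program in any language on that benchmark.
fastest = {"fasta": 1.0, "binary-trees": 10.0}
perl    = {"fasta": 8.0, "binary-trees": 20.0}
ruby    = {"fasta": 4.0, "binary-trees": 60.0}

# Overview-page style: each language scored against the fastest program.
perl_ratios = [perl[b] / fastest[b] for b in fastest]   # [8.0, 2.0]
ruby_ratios = [ruby[b] / fastest[b] for b in fastest]   # [4.0, 6.0]
print(statistics.median(perl_ratios))  # 5.0
print(statistics.median(ruby_ratios))  # 5.0 -- a dead heat by this measure

# Direct-comparison style: one language's total time over the other's.
print(sum(perl.values()) / sum(ruby.values()))  # 0.4375 -- Perl well ahead
```

So the two pages answer different questions, and neither number is derivable from the other.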
So, alright, the graphs do entirely make sense.
Would you be open to a patch that reworks the language vs. language comparison pages in such a manner as to make this relationship obvious?
It is stated in plain sight - twice - on the overview page.
That is precisely why I am suspicious. I cannot say for a fact that it is deceptive, but it certainly seems deceptive.
On that page Ruby is being shown as 10% faster than Perl. Yet on the direct comparison page things look quite different:
On that page, for all benchmarks that can be compared, Perl has used an overall time of 9255 seconds, while Ruby has used an overall time of 10662 seconds. As such Ruby is actually ~15% slower than Perl.
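For the record, the arithmetic on those two totals:

```python
perl_total = 9255    # seconds, summed over the comparable benchmarks
ruby_total = 10662

print(ruby_total / perl_total)  # ~1.152 -> Ruby took ~15% longer overall
print(perl_total / ruby_total)  # ~0.868 -> Perl used ~87% of Ruby's time
```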
Where does this difference come from?
You go too far -- your lack of understanding is simply your lack of understanding ;)
What does the page tell you the table shows?
>> Where does this difference come from? <<
Check the same thing for 2 other language implementations where the arithmetic should be easy. For example, Java median 2.04 and Smalltalk median 21.22 -- the direct comparison shows 11x as the rounded median of the Smalltalk/Java program times.
Seems like your graphs up top in the language-versus-language comparison need to be reworked to make it clear how big the difference really is between 3x and 1/3x, because right now it is deceptive.
What are you looking at? Perl is shown slower on 3 tasks.
If I calculate the average of the time Perl took divided by the time Ruby took, I get this:
Which I understand to mean that Perl on average took 7% longer.
However if I turn this around I get:
Which I interpret to mean that Ruby took, on average, 58% longer.
These things contradict, so I must have made some mistake here. Can you clear up what I should've been doing?
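That apparent contradiction isn't necessarily a mistake in the arithmetic; it is a property of averaging ratios. The mean of x and the mean of 1/x can both come out above 1. A minimal sketch with made-up per-benchmark times:

```python
from statistics import mean

# Hypothetical per-benchmark times (seconds)
perl = [10.0, 1.0]
ruby = [5.0, 4.0]

perl_over_ruby = [p / r for p, r in zip(perl, ruby)]  # [2.0, 0.25]
ruby_over_perl = [r / p for p, r in zip(perl, ruby)]  # [0.5, 4.0]

print(mean(perl_over_ruby))  # 1.125 -> "Perl took 12.5% longer on average"
print(mean(ruby_over_perl))  # 2.25  -> "Ruby took 125% longer on average"
# Both statements come out "true" at once. For positive numbers,
# mean(x) * mean(1/x) >= 1 always holds, so the two averages can never
# both be below 1 -- the asymmetry is built into the method.
```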
The table you don't understand shows the median, but you seem to be calculating the arithmetic mean.
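The distinction matters here: a single benchmark where one language does terribly can drag the mean a long way while leaving the median untouched. A made-up illustration:

```python
from statistics import mean, median

# Hypothetical perl-time / ruby-time ratio for five benchmarks,
# with one outlier where Perl does very badly.
ratios = [0.8, 0.9, 1.0, 1.1, 6.0]

print(mean(ratios))    # 1.96 -- the outlier drags the mean way up
print(median(ratios))  # 1.0  -- the median barely notices it
```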
Or even better: Try to explain in detail why on the overview page perl is claimed to be slower than Ruby, when in a direct comparison it is not.
Also, after reading your link, the situation seems even worse. Using the median, Perl outperforms Ruby by ~15%, but the main site does not reflect that at all.
One useful metric change is to add "memory (usage)" weight - http://benchmarksgame.alioth.debian.org/u32/which-programs-a...
Based on this we should all be using Pascal ;-)
edit: fix the name
It's not the language that makes handling big data slow. It's the algorithms and how you move the bits around.
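A toy illustration of that point (Python used here just for brevity): the same task solved with two algorithms, where the algorithmic change dwarfs any constant-factor difference a language switch could buy. The task and sizes are made up.

```python
import time
from collections import Counter

def sum_pairs_naive(xs, target):
    """O(n^2): check every pair of elements."""
    return sum(1 for i in range(len(xs))
                 for j in range(i + 1, len(xs))
                 if xs[i] + xs[j] == target)

def sum_pairs_counted(xs, target):
    """O(n): count complements seen so far with a hash table."""
    seen = Counter()
    hits = 0
    for x in xs:
        hits += seen[target - x]   # pairs ending at x
        seen[x] += 1
    return hits

xs = list(range(2000))
t0 = time.perf_counter()
a = sum_pairs_naive(xs, 2000)
t1 = time.perf_counter()
b = sum_pairs_counted(xs, 2000)
t2 = time.perf_counter()
assert a == b
print(f"naive: {t1 - t0:.3f}s  hashed: {t2 - t1:.4f}s")
```

Rewriting the naive version in C would speed it up by a constant factor; picking the better algorithm changes the growth rate, which is usually the difference that matters with big data.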
Also an anecdotal example: two months ago a friend of mine wrote a rather complex algorithm in Perl (he is very fluent with it). It was a novel sentence-alignment algorithm using Gibbs sampling. The algorithm was hard to parallelize and not cache friendly. He needed to wait more than a day to train the system on a server. Well, long story short, with his help another developer converted it to Java in a short time and it ran around 200 times faster. So there. Moving bits around did not help Perl here at all.
The only reason Blekko or Facebook was even able to launch and turn things around quickly enough to survive the competition is that they opted to use dynamic languages like PHP and Perl. If they had started their projects in C, with the current rate of growth in complexity they would never have finished them.
>>No wonder Facebook noticed this and is desperately trying to compile PHP to C.
Dynamic languages aren't slow because they are not C. They are slow because they do a lot of magic. C is fast because the magic is left for you to perform. I can't see how anyone can squeeze speed out of compiling PHP to C directly, unless they sacrifice things along the way, which really defeats the purpose of using PHP in the first place.
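You can get a rough feel for that per-operation "magic" from Python itself (an illustrative micro-benchmark; absolute timings will vary by machine):

```python
import timeit

class Point:
    def __init__(self):
        self.x = 1.0

p = Point()

# Every p.x goes through a runtime dictionary lookup ("the magic"),
# where a C compiler would have resolved a struct offset at compile time.
dynamic = timeit.timeit("p.x + p.x", globals={"p": p}, number=1_000_000)

# Hoisting the lookup into a plain local removes part of that per-op cost.
x = p.x
static = timeit.timeit("x + x", globals={"x": x}, number=1_000_000)

print(f"attribute lookup: {dynamic:.3f}s  plain local: {static:.3f}s")
```

Naively translating the first form to C wouldn't help unless the translation also eliminated the dynamic lookup, i.e. sacrificed the dynamism.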
>>Well, long story short, with his help another developer converted it to Java in a short time and it ran around 200 times faster. So there. Moving bits around did not help Perl here at all.
The number of programmers who even need to hear the word 'bits' in their day-to-day activities (talking of application programmers) is small enough to make their case totally exceptional.
Besides there are places where C makes perfect sense. There is hardly any other language heard of in the embedded programming world.
I wrote the first prototype of Boitho (Norwegian internet search engine, now defunct) in Perl in 2000. That did not work at all, because memory access and sorting were so slow. We had to write the next version in C. Of course a lot has happened with Perl in the last 13 years, but a search kernel in Perl still sounds odd to my ears.
There is no reason to pay the development tax of C for everything else. Not when you can build it in Perl, get it working, and then find the hotspots and convert those to C.
Certainly not with fewer than 20 million req/day.
But regardless of that, Perl continues to power some very serious work in some very important places all over the world. And that is not likely to change anytime soon, no matter how many web frameworks get written in PHP, Ruby or Python.
The reason for that is that Perl has little to no competition in the niche it occupies. And anything likely to be invented to replace Perl will, by and large (like 99%), look like Perl (read: Perl 6 or whatever). That being the case, we are likely to be using Perl (or a Perl-like language) very far into the future.
Come to think of it, perhaps this is their feature that uses previous searches and treats your current search as a continuation of the last one; even though I've altered my query, the search terms I abandoned get resurrected.
[Edit: confirmed!] I just tried this in relation to another comment I posted. I typed peer to peer lending into Chrome's bar to search for this. A bunch of finance related results came up. I was looking for movie lending, so I altered my search to peer to peer lending movies and looked at the results. Still unsatisfied, I thought that maybe the term borrowing was better than lending. So I changed my search to peer to peer borrowing movies. Lo and behold, the search results show a page full of results with the word lending bolded and no results that show borrowing at all: http://imgur.com/lPPQmxY
If I absolutely didn't want the word lending in the results, I would need to alter the search to peer to peer borrowing -lending movies to avoid this. For development searches where I'm sometimes trying to find a needle in a haystack, I don't want to have to keep excluding numerous terms I have already decided are undesirable. As I never sign in when searching and I can't be bothered to find and change whatever Google setting causes this every time I fire up my browser, I find DDG to work the way I expect and often with better results too.
That said, I think the competition DDG has provided the other search providers has been a benefit to me.
Frankly, I think this is morally wrong. When they use an inferior search engine, or inferior maps app for that matter, people are dumber than they might otherwise be. They miss a boost of intelligence, perhaps not IQ, but they are less effective operating in the real world. The world as a whole is worse because of it.
However, you're saying that the only thing (or the best thing) that will make the world better is better quality results. This is wrong. Playing the long game by supporting such search engines means that quality can improve and you end up with both better quality search engines AND morally improved companies. The whole world will be better because of it. You're thinking too short term.
Now, if Google can pull off something similar with heuristics alone they may have an advantage, because those little boxes are run by code covering a whole bunch of special cases, and that could be a bottleneck.
But I love the exploration options of the instant answers.. What Google is doing is definitely more time-sensitive (they're justifiably tooting their own horn a bit with it), but it doesn't feel as useful in the typical case (does anyone really use Google like a newspaper?). If you search something like "barak obama news" on DDG, you get basically the same result as the Google headline summary, and it didn't get in the way of the list on the "barak obama" results page. It feels more natural, to me at least.
"I came for the privacy, but I stayed for the features..."
> We also do about 12M API requests per day: http://duckduckgo.com/traffic.html
By your own admission you don't have any experience with search engines, and yet you have a definitive opinion about everything related to them.
2) If you are offended by what he said, it doesn't give you the right to call him an asshole.
For example, I know many people whose projects you didn't complete on time and who rightly think you are a rotten dick for it. But I didn't call you a rotten dick, did I?
And also, stop defending DDG everywhere like a pussy, dude.
... and I see accounts (with davidpayne11) with just enough comments/karma to be useful for trolling -- and insults from them.
Please get a life or at least take this pathetic shit to some other place than HN. :-(
Andddd, next time, before you waste time insulting people on an international forum, try to use it constructively to at least complete the projects of people whose money you've taken after promising to complete them, which you didn't. Makes more sense than replying to my comments, right?
I understand why people value DuckDuckGo for privacy reasons. I respect that. But surely developing it is far, far easier than developing a real search engine.