How can he use Bing as a backend? I read their API's terms of service and it forbids you from changing search results in any way.
“(c) modify, filter, obscure, or replace the text, images, or other content of Bing results, including by changing the order in which Bing results appear (but this limitation will not apply to Bing results of type "Web"), intermixing Bing results with search results from other sources, or intermixing with Bing results any other content so that the other content appears to be part of Bing results;”
I was quite interested in using the Bing API myself but that does seem too restrictive. I hope you can prove me wrong!
That parenthetical only applies to the clause about result ordering. If you read the whole TOS, it's quite clear that Bing doesn't want you to do what DDG appears to be doing. In particular, mixing results from different services is a no-no in most search APIs.
The perl code isn't a full search engine though. He uses Yahoo! Build Your Own Search Service, so that perl code is just taking the results from there and throwing results out, reordering results, getting the zero-click info, etc. Yahoo does all the crawling.
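To make "throwing results out, reordering results" concrete, here is a rough sketch of that kind of post-processing step. It is in Python rather than Perl and is not DDG's actual code; the `Result` fields, the blocked/boosted domain lists, and the `postprocess` function are all made up for illustration, and the upstream API call itself is left out.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass
class Result:
    url: str
    title: str
    rank: int  # position assigned by the upstream search API (assumed field)


# Hypothetical lists; a real engine would maintain these very differently.
BLOCKED_DOMAINS = {"spam.example"}
BOOSTED_DOMAINS = {"en.wikipedia.org"}


def domain(url):
    """Return the lowercased host part of a URL."""
    return urlparse(url).netloc.lower()


def postprocess(results):
    """Drop results from blocked domains, then move boosted domains first.

    Within each group the upstream rank order is preserved.
    """
    kept = [r for r in results if domain(r.url) not in BLOCKED_DOMAINS]
    # False sorts before True, so boosted domains come first.
    return sorted(kept, key=lambda r: (domain(r.url) not in BOOSTED_DOMAINS, r.rank))
```

The point is only that the crawling and base ranking come from the upstream service, while the local code decides what to discard and what to promote.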
I do my own crawling & indexing in addition to using third-party APIs. It's a hybrid approach where I focus on the places I think I add the most value.
Yes, blekko has a full web crawler, indexer, and some of the query processing code in perl. Some of the innards and query execution are in C++ for speed.
Generally we tried to delay the conversion of a component to C++ as late as possible, since it makes it so much harder to change and iterate on. We call it "pouring cement on the code". For some things it's absolutely necessary to drop to C for speed, others it doesn't make a lot of difference since you're bound by other constraints like I/O.
That's cool. I've used DDG a few times in the past and generally like it so far. I had no idea it was done in perl though. It's nice to see new perl apps getting some press.
I wonder if they use any of the newer CPAN libs like Moose and family in the architecture?
Having written an indexer in Perl for my current startup product, I really can't see the need for something like Moose. I'm biased though, as I'm not a big fan of OO programming. I gave OO a try in Perl years ago and it really sucked. I guess Moose was designed to fix this, but it's a little too late for me.
I was taught C in school and I've learned to live without OO.
I am not sure why anyone cares about startup time. I start my apps about once a month. 1 second instead of .1 second doesn't really matter. It's like saying, "C++ has an unacceptable performance penalty" because you have to compile your code before you use it. Yeah, you do. You can pay a million times at runtime or you can pay once at compile time.
For desktop apps, start them when you log in and connect to them via App::Persistent.
I believe he was responding to your "I am not sure why anyone cares about startup time." While you may have been referring specifically to Moose, it's possible to interpret that as a blanket statement for all programs and languages, even in context.
Especially when you're learning Moose, writing some more involved Unix scripts would be an option. It's not that bad for those circumstances, though. Personally, I just have problems wrapping my Perl style around more involved types of OO, so for Perl I'm happy enough with normal packages and procedural programming.
And as a side note, the compilation times for C++ made a lot of people switch to another language, although Moose is a few orders of magnitude away from those kinds of problems.
Like all things, I think it depends on the problem that you are trying to solve. Some things lend themselves well to having a nice abstraction layer while other times, this abstraction layer just adds unnecessary complexity.
I don't want this to turn into a C vs C++ debate but I do agree with Linus in that it's nice being able to just grep for something.
Some Perl press has appeared on HN before, e.g. http://news.ycombinator.com/item?id=565152 (though sometimes the Perl community has done its best to keep its head low on this one :)
Thanks. Good post. This is a great example showing that good applications and complex systems can be written in perl as well.
On a side note, I saw the previous post on DDG, where the discussion was about how DDG does not store any private information. That makes sense in context: since he leverages other search engines, his running costs are low, so he can afford to ignore the user information that others use for commercial purposes.