Many people like DuckDuckGo for good reasons but let's be clear: It does not do ...

joenathan · on Nov 22, 2012

According to DuckDuckGo's FAQ you are wrong http://help.duckduckgo.com/customer/portal/articles/216399-s...

The DuckDuckBot crawls and indexes the web. http://duckduckgo.com/duckduckbot.html

ck2 · on Nov 22, 2012

I suspect it's trivial at best.

Every test search I've ever done on DDG shows near identical results to Bing.

I'd like to see a search that uses its own data. Examples?

Gigablast was the last serious third-party backend that had a chance for independent data. It's like old-school Google.

Gabriel should try to buy Gigablast and merge it with DDG so he has his own independent dataset.

lubujackson · on Nov 22, 2012

Blekko has millions behind it and does a full index of the web.

ck2 · on Nov 22, 2012

Ah now that is an interesting engine.

Checking to see if it's hit any of our sites.

3k pages, not bad. Data is kinda stale though.

boyter · on Nov 22, 2012

Gigablast's creator has now made this: http://procog.com/. The results are better than Gigablast's.

boyter · on Nov 22, 2012

Has anyone ever seen DuckDuckBot hit their site? I don't have any web property large enough to appear via DuckDuckBot, but maybe someone else does? I'm fairly certain it crawls Quora, going by this tweet: https://twitter.com/yegg/status/33693491838066688

ck2 · on Nov 22, 2012

It's not just you. Maybe he only crawls the Alexa top 10k or some similar minimal set.

metalruler · on Nov 22, 2012

I checked my logs and there are several fetches from 72.94.249.37 and 72.94.249.38, over a number of domains that I host. None are particularly popular as far as the greater internet is concerned; one is a semi private site that I set up for my daughter's photos, another is one that has not yet been developed, apart from a few words of text and an image.

Interestingly, the fetches do not have a user-agent that identifies itself as the DDG crawler:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

I'm assuming this is the crawler because it does not fetch anything besides text/html.
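For anyone who wants to run the same check on their own logs, here is a minimal sketch. It assumes the Apache/nginx "combined" log format, which is not stated above; the function name `crawler_hits` and the regex are illustrative, and only the two IP addresses come from the comment.

```python
import re

# Assumed layout: Apache/nginx "combined" log format.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# The two addresses reported in the comment above.
SUSPECT_IPS = {"72.94.249.37", "72.94.249.38"}


def crawler_hits(lines, ips=SUSPECT_IPS):
    """Return (ip, path, user_agent) for every request from the given IPs."""
    hits = []
    for line in lines:
        match = LOG_RE.match(line)
        if match and match.group("ip") in ips:
            hits.append(
                (match.group("ip"), match.group("path"), match.group("agent"))
            )
    return hits
```

Feeding it an open log file, for example `crawler_hits(open("access.log"))`, lists which paths those addresses requested and which user agent they sent. The file name is a placeholder.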

Matt_Cutts · on Nov 22, 2012

That's interesting.

Gabriel, does DuckDuckGo's crawler have a distinct user agent? Can you talk more about how DuckDuckGo observes/respects robots.txt?
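The user-agent question matters because robots.txt rules are matched against the name a crawler announces. A rough sketch using Python's standard `urllib.robotparser`: the robots.txt content and the `DuckDuckBot` token are assumptions for illustration, not a confirmed description of how DuckDuckGo's crawler identifies itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the "DuckDuckBot" token is an assumption.
SAMPLE_ROBOTS = """\
User-agent: DuckDuckBot
Disallow: /private/

User-agent: *
Disallow:
"""


def allowed(robots_txt, user_agent, url):
    """Check whether user_agent may fetch url under the given robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# A crawler announcing itself by name is caught by the named rule.
named = allowed(SAMPLE_ROBOTS, "DuckDuckBot", "http://example.com/private/x")

# A crawler sending a generic browser string only matches the "*" rule.
generic = allowed(
    SAMPLE_ROBOTS,
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1)",
    "http://example.com/private/x",
)
```

In this sketch a rule written for the named bot does nothing against a fetcher that sends a generic browser string, so a site owner cannot single it out.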

boyter · on Nov 30, 2012

Follow-up to this (it's been a week, so probably nobody will see this):

http://techzinglive.com/page/1028/179-tz-interview-gabriel-w...

Around the 70-minute mark, Gabriel mentions that DuckDuckBot is mostly about determining if pages are spam.

samuellb · on Nov 22, 2012

No, it crawls non-Alexa sites too. I host two sites, one PR4 with Alexa rank 1.6+ million and another with PR3 and no Alexa rank. Only the latter was accessed in the last year according to my server's access logs (and only with a single "GET /" request).

elssar · on Nov 22, 2012

Try searching for something using their api, and compare with websearch. There is a world of difference between the search results. IIRC they aren't allowed to share results from some of their other sources via their api, so you can get a good idea of how much they take from other sources and how much from their own index.

But that is a non-issue as far as I'm concerned. As long as the results are relevant and they got them legally, then who cares where or how they came from?
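The API-versus-web comparison suggested above (or any DDG-versus-Bing comparison) can be put into numbers with a simple overlap score. A minimal sketch: the function names and the normalization rules are assumptions, and the result URLs have to be collected by hand or by whatever means each service permits.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to host + path so trivial variants compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")


def overlap(results_a, results_b):
    """Jaccard similarity of two result lists: 0.0 is disjoint, 1.0 is identical."""
    a = {normalize(u) for u in results_a}
    b = {normalize(u) for u in results_b}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

This ignores ranking order, so two engines that return the same ten links in a different order still score 1.0. A rank-aware measure would be needed to capture that difference.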

bradly · on Nov 22, 2012

I've asked Gabriel specifically and he said that they use their own crawler/indexer.

ck2 · on Nov 22, 2012

Their results are nearly identical to Bing in every case I've tried.

Let me know if you find a search result that is different.

lubujackson · on Nov 22, 2012

They definitely use Bing as their backbone as Gabriel has said somewhere before.

ricardobeat · on Nov 22, 2012

First attempt: http://cl.ly/image/340S0R3b0K0d

ck2 · on Nov 22, 2012

Those aren't actually website results. Those are links to entries for the single word on other sites, followed by Bing results.

How often do you do single-word searches in the real world?

ricardobeat · on Nov 22, 2012

Both are search/website results. "Links to entries on other sites" are exactly that, and DDG's special sauce is presenting third-party results contextually.

As for "followed by bing results", there is a single one that is equal to Bing's. The results that follow are also far from similar (this is exacerbated by Bing insisting on giving me local results regardless of relevance, which DDG purposely avoids).

And yes, I do single-word searches very often, but if you're so inclined, here are results for a longer search: http://cl.ly/image/0m1E3I3J0M1U (DDG shows Hulu, tv.com, CTV, Amazon, and doesn't repeat the Wikipedia entry)

mbreese · on Nov 22, 2012

DDG is really only displaying definitions, aside from hacker.org. Given that this is a dictionary "word", I'm not sure how much of this query is really due to indexing and how much is due to identification of it as a single word. Because search engines do so much query-specific customization (context-dependent results), it's hard to identify what results would have been returned from their raw indexes.

If you adjust the term to "hacker movie", you get more similar results between DDG and Bing. But, overall, it does seem like DDG is returning more differing results now than in the past.

amitamb · on Nov 22, 2012

But they are using legitimate means to provide search results. And feed(/API) providers are not shutting them out.

The same argument can be made about Google: it does not produce its own data, it just copies webpages from content providers. If content providers decided to block it from indexing their websites, Google would be irrelevant.

ck2 · on Nov 22, 2012

Look at the Twitter token fiasco for an example of what happens when you build your business model on someone else's API.

mhartl · on Nov 22, 2012