Has anyone ever seen DuckDuckBot hit their site? I don't have any web property large enough to attract DuckDuckBot, but maybe someone else does? I'm fairly certain it crawls Quora, per this tweet: https://twitter.com/yegg/status/33693491838066688
I checked my logs and there are several fetches from 72.94.249.37 and 72.94.249.38 across a number of domains that I host. None are particularly popular as far as the greater internet is concerned; one is a semi-private site that I set up for my daughter's photos, and another has not yet been developed beyond a few words of text and an image.
Interestingly, the fetches do not have a user-agent that identifies itself as the DDG crawler:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)
I'm assuming this is the crawler because it does not fetch anything besides text/html.
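For anyone who wants to run the same check on their own logs, here is a rough sketch. It assumes the Apache/nginx "combined" log format, and the function names are made up for illustration. Since access logs normally don't record the response content type, the "only text/html" check is approximated by looking at file extensions, which is only a heuristic.

```python
import re

# Assumes the Apache/nginx "combined" log format; adjust for other formats.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# The two addresses mentioned above.
SUSPECT_IPS = {"72.94.249.37", "72.94.249.38"}

# Heuristic only: extensions that usually indicate non-HTML assets.
ASSET_EXTENSIONS = (".css", ".js", ".png", ".jpg", ".jpeg", ".gif", ".ico")


def suspect_fetches(lines, ips=SUSPECT_IPS):
    """Return (ip, path, user_agent) for every request from the given IPs."""
    hits = []
    for line in lines:
        m = LOG_RE.match(line)
        if m and m.group("ip") in ips:
            hits.append((m.group("ip"), m.group("path"), m.group("agent")))
    return hits


def asset_fetches(hits):
    """Return the hits whose path looks like a static asset rather than a page."""
    return [
        h for h in hits
        if h[1].split("?", 1)[0].lower().endswith(ASSET_EXTENSIONS)
    ]
```

Feed it the lines of an access log (for example `suspect_fetches(open(path))`, where `path` is wherever your server writes its log). If `asset_fetches` comes back empty, that matches the "pages only" behaviour described above, though it doesn't prove who is behind the requests.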
No, it crawls non-Alexa sites too. I host two sites, one PR4 with Alexa rank 1.6+ million and another with PR3 and no Alexa rank. Only the latter was accessed in the last year according to my server's access logs (and only with a single "GET /" request).
Try searching for something using their API, and compare with the web search. There is a world of difference between the search results. IIRC they aren't allowed to share results from some of their other sources via their API, so you can get a good idea of how much they take from other sources and how much from their own index.
But that is a non-issue as far as I'm concerned. As long as the results are relevant and they got them legally, then who cares where or how they came from?
Both are search/website results. "Links to entries on other sites" are exactly that, and DDG's special sauce is presenting third-party results contextually.
As for "followed by bing results", there is a single one that is the same as Bing's. The results that follow are also far from similar (this is exacerbated by Bing insisting on giving me local results regardless of relevance, which DDG purposely avoids).
And yes, I do single-word searches very often, but if you're so inclined, here are results for a longer search: http://cl.ly/image/0m1E3I3J0M1U (DDG shows Hulu, tv.com, CTV, Amazon, and doesn't repeat the Wikipedia entry)
DDG is really only displaying definitions, aside from hacker.org. Given that this is a dictionary "word", I'm not sure how much of this query is really due to indexing and how much is due to identifying it as a single word. Because there is so much query-specific customization with search engines (context-dependent results), it's hard to identify what results would have been returned from their raw indexes.
If you adjust the term to "hacker movie", you get more similar results between DDG and Bing. But, overall, it does seem like DDG is returning more differing results now than in the past.
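If you'd rather put a number on "more similar" than eyeball screenshots, one rough way is to compare the domains in the two result lists. This is just a sketch: it assumes you've already collected each engine's result URLs by hand (full URLs with a scheme), and the function names are my own.

```python
from urllib.parse import urlparse


def domain(url):
    """Lower-cased host of a URL, with a leading 'www.' stripped."""
    host = urlparse(url).netloc.lower()
    return host[4:] if host.startswith("www.") else host


def overlap(results_a, results_b):
    """Jaccard similarity of the domains in two result lists (0.0 to 1.0)."""
    a = {domain(u) for u in results_a}
    b = {domain(u) for u in results_b}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)
```

A score near 1.0 means both engines returned mostly the same sites and a score near 0.0 means almost no shared sites. It ignores ranking and treats every page on a domain as the same result, so it's a crude measure at best.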
But they are using legitimate means to provide search results. And feed(/API) providers are not shutting them out.
The same argument can be made about Google: they do not produce their own data, they just copy webpages from content providers. If content providers decided to block them from indexing their websites, Google would be irrelevant.
It does not do any of its own indexing.
It's just a frontend to other very very expensive backends that have millions of dollars behind them.
The entire company can be shut down overnight if its data feeds are cut.