I don't believe I have ever made that claim, and it isn't true either that we are "little more than a Bing mirror". See my other comment on this thread (https://news.ycombinator.com/item?id=36898807), but in short it isn't a binary thing, and I think the fallacy people keep making is treating a modern search engine as if it were just a web index; it isn't.
Over the past fifteen years, search has become "universal" with dozens of indexes and modules throughout the page. Also it is important to understand people click/engage with things on the page roughly half as much each position down, so by the time you get to the bottom it is like 100 times less than the top. This means that things on top, increasingly non-web links, have become more and more important.
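A rough sketch of the arithmetic behind that claim (the halving-per-position figure is the assumption, taken from the paragraph above; the numbers are illustrative, not measured data):

    # If engagement roughly halves with each position down the page,
    # the bottom of a results page sees on the order of 100x less
    # engagement than the top (0.5**7 = 1/128 by position 8).
    top_rate = 1.0  # normalize the top position to 1.0
    for position in range(1, 11):
        rate = top_rate * 0.5 ** (position - 1)
        print(f"position {position:2d}: {rate:.4f} of top "
              f"({top_rate / rate:.0f}x less)")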
In this context, as mentioned in the other comment, the largest modules on mobile and desktop we power ourselves, that is, local and knowledge graph. AI will be the same. We do use Bing as the primary source for traditional web links, but not for all, and even when we do it sometimes looks different in various ways. In a case like this, we can re-insert the link if that is what is needed.
> Bing is our largest source of traditional web links.
I think "traditional web links" is the main (and I think should be the only) product of a search engine, and it seems like you rely on Bing quite significantly to serve search results.
From lots of experience, I can tell you that if you don't have these things (and others; this is an incomplete list just off the top of my head) showing up in the right places (often above "traditional web links") at the right times, then you cannot get long-term search engine adoption from mainstream users:
I understand that you need these to be a full "search engine".
What we really want to know is: for links alone, to what extent do you index? Let's ignore the knowledge graph; let's say we just want a simple word match.
I'm surprised you didn't have techdirt's homepage indexed at all: searching "techdirt" did not return "techdirt.com". To me this means you aren't indexing the home pages of popular websites at all - let alone their content - and rely effectively exclusively on Bing, with maybe some "caching".
Talking about links alone - exact word matches against the domain name (or page content, other than your knowledge graph) - what do you index?
What specifically do you do other than adding sites manually to a list and, I assume, "caching Bing's results"? To me this would still be 99.999999% ~= 100% reliant on Bing for search results.
This isn't meant to be an attack in any way; I'm just looking to clarify what's going on.
Gabriel, could you please clarify once and for all: for "traditional web links", how much do you rely on Bing? The user consensus seems to be 100%. What is the official number from you? Thanks!
> but not for all and even when we do it sometimes looks different in various ways
Personally I'd like to know if that means "we have a manual list we re-added". I think what I, and probably many other people, are most interested in is: "do you rely on Bing for < 95% of word-match searches (no knowledge graph)?"
> when we do it sometimes looks different in various ways
IMO this is also vague - I assume this means they're knowledge-graph augmented? I really want to know if they have their own indexer (vs a manual list + caching Bing results) for text matching.
> I think "traditional web links" is the main (and I think should be the only) product of a search engine
This is context-dependent. If you're looking for a link, then you're looking for a link; but more often than not, you're actually looking for answers.
Say you're looking up an exchange rate or the value of a stock, converting units, checking the weather, or just looking up an actor's photo. Why would you click and wait for a (likely slow and ad-infested) page to load, if the search engine can provide factually correct data without any extra steps? How often do you read past the first paragraph on Wikipedia?
If you want to dive in / verify any of that information, it's still all just a click away. But having to only do one thing instead of two is almost the definition of technological progress.
I use DDG as my main search engine. Many of my searches use bangs into other sites. On DDG landing pages, I often go straight to the right-hand side of the page, with the info panel. Sometimes I click on traditional web links, but less frequently, as the sites are so often spam.
> I think "traditional web links" is the main (and I think should be the only) product of a search engine
I want room for both.
There are times I want the search engine to stop "helping" (sic) and just give me a straight search based on my terms.
There are other times where I'm looking for something specific but I don't remember exactly what it is, so I want help correcting/narrowing down my search.
You can spin it any way you like, but the fact is evident: you rely on Bing to the extent that if Bing delists a site, that site will be completely sanitised from every single part of a DDG search, including all those dozens of modules you keep referring to.
Thanks for looking into this. Would love to learn how a specific query like "site:techdirt.com" got wiped like that. Will you share your findings once it's been identified and resolved?
I doubt there will be any findings, because the removal from the index was done by Microsoft rather than DuckDuckGo. The postmortem has to come from Microsoft. Likely the only thing that Yegg can do is poke some internal contact at MS and tell them to fix it.
They can do better; however, it takes time and effort. They can and do augment Bing results with other data sources when needed, but up until recently there was no need to add anything extra with respect to Techdirt; it certainly makes no sense to index all of the internet and try to duplicate every search result just in case Bing has filtered something out.
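As a minimal sketch of what "augmenting Bing results with other data sources" could look like (the function and data here are hypothetical illustrations, not DDG's actual pipeline):

    # Hypothetical sketch: merge results from a primary provider with a
    # small supplemental index, so a site missing upstream can still be
    # re-inserted for queries where it is clearly relevant.
    def merge_results(primary, supplemental, query, limit=10):
        seen = {r["url"] for r in primary}
        extras = [r for r in supplemental
                  if query.lower() in r["title"].lower() and r["url"] not in seen]
        # supplemental hits are only added when the primary source missed them;
        # otherwise the primary ordering is left alone
        return (extras + primary)[:limit]

    primary = [{"url": "https://example.com", "title": "Example"}]
    supplemental = [{"url": "https://www.techdirt.com", "title": "Techdirt"}]
    print(merge_results(primary, supplemental, "techdirt"))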
I feel like they would love to be able to do that.
One potentially relevant line from the article:
>I love that first one. Microsoft, a company with a $2.5 trillion market cap, “may not have enough resources” to crawl and index Techdirt? Cool.
I have to imagine DDG's valuation wouldn't quite hit 2.5 trillion if it went public. It's not absurd that they aren't able to fully duplicate those efforts (or that they don't need to in order to have a decent product).
They fixed the problem, which shows they have the ability to. Maybe the issue is “detecting which parts we should do ourselves”? How do we detect censorship beyond blog posts like this? Can we diff various indices on a regular basis?
You can diff indices but it doesn't make economic sense to do so.
The whole point of using Bing as the base is so that you don't have to make and keep an index of everything yourself, so DDG wouldn't have another exhaustive index to compare to (and if they did, they wouldn't need Bing). They index some things to add onto Bing, but they don't try to capture everything.
Furthermore, any specific site missing from the index does not necessarily indicate a flaw. Okay, you diff some indices and get a long list of what Bing has excluded... and then what? For every 'fixable censorship' case (accidental or intentional) there will be thousands of spam or malicious sites which should be excluded; nowadays the key part of search is not finding everything but throwing out the results which want to be found but shouldn't be. Again, DDG wants to piggyback on the filtering effort Bing is doing, and if they want to second-guess all of Bing's filtering (as opposed to just making a fix for this specific case) then they have to replicate and improve on all of Bing's (huge and expensive!) filtering effort, which runs counter to the whole reason for using Bing: avoiding that expense.
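For what it's worth, the diff itself is the cheap part; here is a toy sketch (hypothetical domain lists, not real index data) of why the output, rather than the comparison, is the expensive bit:

    # Hypothetical sketch of "diff the indices": flag domains present in a
    # reference snapshot but missing from a Bing snapshot. The hard part is
    # triaging the output: most exclusions are spam that should stay out.
    def index_diff(reference_domains, bing_domains):
        return sorted(set(reference_domains) - set(bing_domains))

    reference = {"techdirt.com", "example.org", "spam-farm.example"}
    bing_snapshot = {"example.org"}
    for domain in index_diff(reference, bing_snapshot):
        print("needs review:", domain)  # human or automated triage goes here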
It's turtles all the way down, I mean, it's Bing. If your basic source of information, the thing you rely on to create any smoke and mirrors on top of it, i.e. the index, is basically Bing and just Bing, then you're just a Bing mirror. More and more I find myself using Yandex, of all things, to get decent results. The extremes the web has reached in 2023 are bizarre: the "best" search engine in the world today may very well come from a country run by a dictatorship.
Either you’re very misinformed about what a mirror is, or you’re being extraordinarily disingenuous here.
Bing is their initial and primary source of data for web links. Not the only one either, but the primary one. They maintain their own index from it.
That’s not “a mirror” by any stretch of the imagination.
What appears to have happened here is some form of replication bug where, in a way yet to be determined, a removal from the original data source was unintentionally replicated.
Data synchronization is hard when you’re not directly mirroring data. This kind of bug crops up
If they're not just a mirror, it seems rather strange that when a specific website disappears from Bing's index, it also disappears from DDG. In my very common-sense view of the situation, if DDG really builds its own "index" based on a variety of sources, then why did Techdirt completely disappear from its search results? DDG can argue all day about how much their "local and knowledge graph" is built in-house, yada yada yada. At the end of the day, a search engine must retrieve the most relevant websites for a given query. And nothing can be more relevant to "site:xxx.xxx.xxx" than the site itself. If it doesn't appear in DDG's results, then it's not indexed. And that clearly shows that DDG is nothing but a fancy mirror of Bing search.
> Bing is their initial and primary source of data for web links. Not the only one either, but the primary one. They maintain their own index from it.
What is their other data source? And how did it come to suffer from the same problem?
I keep seeing insinuations that DDG has some other secret source than Bing, but it's always hand waved and never explicitly named. Incidents like this seem to strongly imply that only Bing really matters at the end of the day.
Yeah, DDG has always been weirdly evasive about how their search engine works and where results are coming from. If it makes you feel any better, Bing has been caught copying results from Google (https://www.wired.com/2011/02/bing-copies-google/), so there's a chance you'll get Bing results plus results from whatever other search engine Bing copies.
Personally, I don't even mind if using DDG is the same thing as searching Bing, so long as it actually works, but they still can't get a simple search like "office -microsoft" or "headphones -best" right. I even thought they acknowledged the problem and were looking into it at one point.
It’s not an insinuation, it’s a straight fact direct from the founder.
That you don’t understand data replication problems—or know the inner workings of a private company’s codebase & data funnel—doesn’t make it some conspiracy.