More Privacy and Transparency for DuckDuckGo Web Tracking Protections (spreadprivacy.com)
78 points by TangerineDream on Aug 5, 2022 | hide | past | favorite | 44 comments



Hi, I'm the author of this post and the CEO & Founder of DuckDuckGo. I'm happy to answer questions about it. I hope folks can read the whole thing but in short, the post explains we're expanding our 3rd-Party Tracker Loading Protection to include blocking Microsoft trackers from loading, how this works with DuckDuckGo search ads, and a new help page that covers details on all our web tracking protections: https://help.duckduckgo.com/duckduckgo-help-pages/privacy/we...


How did you guys get into so much jeopardy over the Microsoft partnership in the first place? How did Startpage figure all this out, apparently with even better privacy protections, for longer, and with Google's Index - the best one? What is going on?


This is actually about our browser and web tracking protections within it around third-party scripts on other websites. From the post: "Microsoft scripts were never embedded in our search engine or apps, which do not track you. Websites insert these scripts for their own purposes, and so they never sent any information to DuckDuckGo."


If you expand the boundary of end user privacy services to "browser extensions," isn't the most intellectually honest answer to have people use an EasyList-adjacent accelerated solution, like uBlock Origin or AdGuard on mobile? That is certainly the consensus here. Is anything in the blog post an improvement compared to that offering?

That's why I'm asking about Startpage. Why do you think DuckDuckGo uniquely got put into a position of jeopardy - something more than scrutiny - when others did not?


Yes, please check out the comprehensive help page referenced for the list of web tracking protections we offer by default across platforms, some of which are not offered by related extensions, and almost all of which are not offered by most browsers by default: https://help.duckduckgo.com/duckduckgo-help-pages/privacy/we.... This is the full list for reference/comparison (platform support can vary and in some cases is impossible, but we are working hard to get them on all platforms -- the help page has all the details): 3rd-Party Tracker Loading Protection, 3rd-Party Cookie Protection, 1st-Party Cookie Protection, CNAME Cloaking Protection, Fingerprinting Protection, Smarter Encryption (HTTPS Upgrading), Link Tracking Protection, Referrer Tracking Protection, Embedded Social Content Tracking Protection, Google AMP Protection, Google Topics Protection, Google FLEDGE Protection, Surrogates, The Fire Button, Cookie Consent Pop-Up Management, Global Privacy Control (GPC).

More fundamentally though our web tracking protections are built upon a data set (we call Tracker Radar) that is frequently updated based on web crawling (we call our Tracker Radar Detector), which offers a much more comprehensive picture of third-party web tracking of which to base lists and evolving protections relative to solely community maintained lists. Both of these are open source on GitHub. We also have an analogous data set for our HTTPS upgrade list (we call Smarter Encryption), updated daily based on continuous crawling, which is also open source.


> 3rd-Party Tracker Loading Protection, 3rd-Party Cookie Protection, 1st-Party Cookie Protection, CNAME Cloaking Protection, Fingerprinting Protection, Smarter Encryption (HTTPS Upgrading), Link Tracking Protection, Referrer Tracking Protection, Embedded Social Content Tracking Protection, Google AMP Protection, Google Topics Protection, Google FLEDGE Protection, Surrogates, The Fire Button, Cookie Consent Pop-Up Management, Global Privacy Control (GPC)

C'mon, you're being intellectually dishonest here. uBlock Origin and AdGuard have all these things to check on. Even the obscure stuff, like FLEDGE (https://adguard.com/en/blog/adguard-privacy-sandbox-topics-b...).

At the end of the day, at your company and at AdGuard's, it is only really one person's full time job to find this stuff. And your guy and AdGuard's guy are both reading the same garbage Twitter posts all day from the guy who's actually sourcing issues. Which he does for free because his particular malfunction is being really passionate about hating Google. And that's why we hear about FLEDGE.

This is tough. I think people want you to succeed. But I suspect the reason you didn't answer my question - why do you think you guys got raked over the coals for the Microsoft thing? - is, from your point of view, you probably think it's a smear campaign from Google, and your people have cautioned you that it sounds really conspiratorial and salty. Maybe it's true!

I can't imagine you're going to say "because Startpage is better" or "because the issues people complained about were substantive" (they weren't). And indeed, Google complains to the press about how you guys are like a mere fraction of their size, "down the street," and receiving so much intellectual attention and funding from the digerati despite having, essentially, a meaningless impact on anyone's use of the web. No offense. Like I said people want you to succeed.

But you have to do so without the sales pitch. Because it makes it sound like you'll do anything to survive, which may be good in the long term but is probably the real reason the Microsoft thing looks bad: It smelled desperate and unbecoming of someone entrusted to provide a privacy focused service.

And maybe you're going to do a full Justin Roiland and accuse everyone on the anonymous Internet forum of being totally lazy compared to you. You can be one of those "ra ra Elon Musk" people. Listen, you have no idea who writes these anonymous posts.

That said, I think the people who are talking about the censorship thing are morons and I wish you'd just focus on what I'm talking about here.


You missed an important part of their comment:

> More fundamentally though our web tracking protections are built upon a data set (we call Tracker Radar) that is frequently updated based on web crawling (we call our Tracker Radar Detector), which offers a much more comprehensive picture of third-party web tracking of which to base lists and evolving protections relative to solely community maintained lists.

It is certainly feasible that a funded company has more time and resources to keep these lists up to date than community-maintained ones do.


Gabriel, whilst you’re answering questions here, I have one slightly tangential question and it’s as follows: why does DDG not have a warrant canary?

I’m skeptical of using the service since it’s a US based service and the US doesn’t have a good track record when it comes to privacy.

For this reason I only ever use DDG combined with an anonymous mixer service like Tor. It would be trivial for bad actors to be placed in your data center and tie specific IPs to particular searches, no?


Warrant canaries are problematic since they can't be trusted themselves due to possible confidentiality restrictions, can be easily misinterpreted, and can also be counter-productive to an in-process court proceeding if you are in an actual situation that would involve triggering one. Thankfully we haven't ever been in any situation like this because we simply don't collect any search histories or metadata that ties searches to individuals (like IP addresses). And our searches are end-to-end encrypted so network onlookers cannot see them, even if listening in on the network.


Unrelated (although not completely), on your news search, the msn.com reprint of an article usually shows up far more often than the original article. Is this on the list of things to get to?


Yes.


Are there plans to do more search than wrapping Bing and Yandex?


That isn't an accurate characterization of what we do now. We have on the order of a million lines of search code at this point and have a lot of talented people working on them. As an example, mobile searches are the largest category of searches, and local searches are the largest category of searches within mobile. We don't get any local search content from Bing. Instead our local search content is a combination of our own indexes in partnership with Apple, TripAdvisor, and others. We then also need a lot of code to know when and where to display that content on the page relative to other types of results, when to reject that content for not being relevant enough to display, and how to display parts of it that are relevant enough. In addition, we currently do not use Yandex for search content.



Best would be to report them to us and we can investigate. I see these ones coming up though right now: https://duckduckgo.com/?q=jessesquires.com, https://duckduckgo.com/?q=lapcatsoftware.com, https://duckduckgo.com/?q=bike.bikegremlin.com


To help clarify the current situation:

One of those websites reported having problems for a month or so, then having it sorted out all by itself. The author's article on that:

https://www.jessesquires.com/blog/2022/07/25/my-website-disa...

My sites are technically present, but shown only when I search for the exact domain (bikegremlin.com) or the site name (BikeGremlin).

But none of my articles are shown in SERP when I search queries that otherwise get my articles to top-3 on Google.

I've tried submitting URLs for indexing, to see if that will help, now that at least some problems seem to have been fixed. We'll know in a few days... weeks... months if that's helped. :)

So far, articles in my native language are back, at least to a degree. But none of my articles in English are, even those that used to rank highly (both on Google and Bing/DuckDuckGo etc).

I've used this situation to check which search engines rely on Bing for their results (primitive, unscientific method, but I think it's accurate): https://io.bikegremlin.com/28530/microsoft-bing-serp-gone-ov...


Yes, I was able to restore some sites to Bing by Twitter DMing a Microsoft Bing VP. ;-)

Where can cases like this be reported to DuckDuckGo?


Probably best is the 'Share Feedback' button on our SERP, though we're also listening on social channels.


What are you doing to improve the quality of these results? I often find myself hitting !g because Google just understands what I want better. And that's not with these constantly hyped "local" or "contextual" searches, no, it's the searches for errors, programming problems or memes (which, of course, can be vague). I would gladly submit you some of the requests I use !g for if you had a low-friction UI for it. Maybe you could use some more data on this stuff.

You're also in a unique position to compete with Google by offering advanced search tools. You have the data! Yet I can only search images by "past day", "past week", "past month". Off to Google I go again.

In the same vein, what are you doing to combat SEO spam? Again, you're in the unique position to do something for the public good and your profits here. Your search engine also suffers from the many low quality results of "buy buy buy" (the thing I want information on, not buy, god damn it) and github/SO rehashing sites, ad-laden blog spam and the like.

But imagine if it didn't! Imagine the web being searchable again!

Currently I somewhat follow projects like search.marginalia.nu and mwmbwl (or something), but those don't even have 0.1% of your funding.

I just want to be able to find content made by humans again ...


Doesn't change the fact that ddg is basically a front for mostly Bing (plus a few others for local stuff etc) though. Why not actually invest in real search technology and become an actual search engine?


We've invested many tens of millions of dollars in search technology and continue to do so. Local was just one example, chosen because as noted it is the largest category of searches.


Are the search results being "curated" by its political nature or affiliation?


They actually never were. We also recently put out a help page explaining how our news rankings work: https://help.duckduckgo.com/duckduckgo-help-pages/results/ne.... From that page: "when we apply our own ranking signals we do so in a strictly non-political manner, meaning we don’t evaluate or otherwise take into account any potential political bias or leanings of websites in our search result rankings."


That entire page suggests extremely opaque behavior for a post about increasing transparency. Where can I find a list of which sites have been manually "downranked" and by how much? What are the "non-governmental" and "non-political" agencies declaring said sites to be whatever you're declaring them to be? Where are the links to the reports that informed your decisions?


This is a new help page, and so it is just a first step in more transparency on this issue. That said, what you're referencing in the page applies extremely rarely, currently to less than 0.1% of news websites, and it is expected to stay that way. So, you are unlikely to regularly encounter this ranking signal. And even then its impact is relatively small, only a few slots on average -- from the page, "impacted sites are not moved so far down in the results that they are effectively removed."


So are you implying that you do have plans on revealing which sites you're "downranking"? And ideally precisely how they fit your criteria?

I feel quite strongly about this because I, like I suspect many who use DDG, swapped over largely because I didn't really like the games Google was playing with search. As DDG goes down the same path I've moved on once again, but have been left with quite a sour taste in my mouth. I evangelized for your site for years, including on this site/account. Changes like this are the antithesis of everything that drew many people to DDG to begin with.


Yes, we are constantly working towards more transparency in everything we do. More generally, from the new help page, "A search engine’s primary job is to rank results. In other words, search engines try to put results that most quickly and accurately answer the query on top." and "when we apply our own ranking signals we do so in a strictly non-political manner, meaning we don’t evaluate or otherwise take into account any potential political bias or leanings of websites in our search result rankings."


Could you elaborate on what you mean by "games"? I don't believe search engine bias is something that is an issue for Google or DDG. There is a lot of misinformation, and it shouldn't be ranked near factual information, because humans are still bad at determining what is good information. Medical advice is a good example; this is hyperbole, but if someone was choking and you needed to know the Heimlich, you probably don't want a result at the top for essential oils.

Politically speaking, lots of people switched to DDG not for privacy, but because it never down ranked politically biased news like Google. I don't believe DDG's mission statement is to be the bastion of misinformation.


Cool, thanks for the clarification.


Isn’t there a way to measure ad effectiveness that doesn’t require making exceptions for scripts from tracking domains? Browsers should implement a standard API for this. I trust my browser. I do not trust websites.


Congratulations on throwing yourself into the dumpster of internet search. I appreciate your efforts to push people toward using Brave Search, Qwant, and Presearch instead of DDG's censored results. Keep "curating" yourself out of the ecosystem.


We don’t censor results. Unless legally prohibited, you should find all media outlets in our results. Related, we recently published another help page explaining how our news rankings work: https://help.duckduckgo.com/duckduckgo-help-pages/results/ne....


"We don't censor, we just push sources I don't like to the 100th page of search results."

Still not buying it.

Does your hiring process still discriminate illegally? https://www.publish0x.com/late-to-the-pol/duckduckgo-alleged...

You're a bad actor with a product that does nothing but scrape Bing and pretends to stand for privacy. Just give it up.


Sure you do. Remember tank man? [0]

You don't index shit yourselves so when Bing censors something you do too.

[0] https://news.ycombinator.com/item?id=27394925


We do index many things (see related comment at https://news.ycombinator.com/item?id=32360874). We do not remove any results ourselves for political purposes and in fact we have been banned in China for many years for that very reason. What you're referring to was a temporary bug in our image search results from Bing that they promptly fixed. If they hadn't fixed it promptly then we would have taken further action.


Just now I went to RT.com and searched various article titles verbatim on DDG; they came up in the bottom half of the results. I repeated this test with WSJ and found the link as the #1 result. From this it wouldn't be unreasonable to believe that either you or the sources you pull from are engaging in arbitrary website throttling/downranking.

I hear you when you say you don't remove content, but do you, or to what extent do you throttle/downrank websites?


Might it have any effect on the ranking that rt.com is nowadays blocked on a national level in most parts of the western world?


Yes, that can have an effect. To the parent question, see https://news.ycombinator.com/item?id=32361888


So much negative in this thread! Thanks for a better alternative, which after a decade is a viable alternative daily driver.


It really is unfortunate, but I applaud yegg for responding regardless. Freedom and privacy are unfortunately polarizing topics. You can't make everyone happy, and people are more likely to comment on a negative experience than a positive one.


THIS.


As far as I know this forum hasn't become the Buzzfeed Book Reviews of technology (https://www.nytimes.com/2013/11/30/opinion/banning-the-negat...).

A browser? Extensions? Replacing your search engine? From one company? That sounds familiar. People are equally skeptical and negative of Chrome's security, tracking and ads policies.


Protection for thee but not from me "improving.duckduckgo.com"


All requests to improving.duckduckgo.com are anonymous. We have a help page explaining how that works here: https://help.duckduckgo.com/duckduckgo-help-pages/privacy/at.... From that page, "To be clear, this means we cannot ever tell what individual people are doing since everyone is anonymous."



