Hacker News
DuckDuckGo Launches DuckAssist (spreadprivacy.com)
114 points by bookofjoe on March 9, 2023 | hide | past | favorite | 71 comments


Hi all, I'm the author of this post (and the CEO/Founder of DuckDuckGo). Happy to take a few questions about how DuckAssist works, but to clarify what I think may be some confusion from the other comments: we're not just asking an LLM to summarize Wikipedia from its training, nor did we train an LLM on Wikipedia -- instead, we use our own Wikipedia index to pull out the relevant sentences, and then ask the LLM to verify the answer is in those sentences and then: if so, output it simply; if not, try to answer a similar question; and if that doesn't exist, then do nothing.

It can still be wrong for a variety of reasons mentioned in the post, but this is a different approach and in our validation testing (using banks of questions with known/unknown answers in Wikipedia) we know it works reasonably well, and have a lot of ideas of how to make it work much better over time. This is a beta after all, and we have plans both for improvements and additional features.
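The retrieve-then-verify flow described above can be sketched roughly as follows. Every function here is a stub I've invented to stand in for DuckDuckGo's actual (non-public) components, not their real code:

```python
def wikipedia_index_lookup(query):
    # Stub: stands in for DuckDuckGo's own Wikipedia index retrieval.
    corpus = {"capital of france": ["Paris is the capital and most populous city of France."]}
    return corpus.get(query.lower(), [])

def llm_answer_in_sentences(query, sentences):
    # Stub: stands in for asking the LLM whether the answer is actually
    # contained in the retrieved sentences.
    return bool(sentences)

def summarize(query, sentences):
    # Stub: stands in for the LLM summarizing only the retrieved text.
    return sentences[0]

def duckassist(query):
    sentences = wikipedia_index_lookup(query)   # 1. retrieve from the index
    if sentences and llm_answer_in_sentences(query, sentences):
        return summarize(query, sentences)      # 2. answer only from retrieved text
    return None                                 # 3. otherwise, show nothing

print(duckassist("capital of France"))
print(duckassist("meaning of life"))
```

The key design point, as described, is that the model summarizes retrieved text rather than answering from its training data, and stays silent when retrieval comes up empty.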


It still has the issue of treating some groups differently based on protected characteristics. What's being done about that?

For example "what is most attractive about men?" Produces an answer, but not if asked about 'women'.

And "why do men earn more?" Doesn't activate duckassist but "why do women earn less?" Does.

Well, it did load for me. The service must be overwhelmed at the moment.


As noted in the post, the triggering of the feature right now is quite low, so it would generally be expected not to trigger for most queries. That also means it is easy to find examples of where it would or would not trigger based on changing a word or two. To the broader point of potential disparate triggering for these types of searches in particular, we have not studied that yet, but duly noted as something we should look at.


Thank you for paving the way for user privacy in the search arena. The ability to generate tons of content with LLMs is going to be overwhelming by the summer. The worry that I have regarding future search is that we need a way to defend against SEO generated by these LLMs and against non-useful LLM output in general. Is there a way that LLMs can be used to help in this effort? I feel like we are going to have to go back to AltaVista-style directory results to defend against LLM-generated content. Any plans in this area?


Fighting SEO spam is a never-ending arms race. I agree with your sentiment that it is likely to get worse and we need to be prepared for that. I'm not sure yet if LLMs will be significantly useful themselves in stopping it.


> and then ask the LLM to verify the answer is in the sentences

How can you ask the LLM whether the answer is in the sentences without leaking the user's question?

Let's take a concrete example: if I ask "Is ADHD caused by non-genetic factors?", what does the LLM see? It's not clear how OpenAI does not learn from this example that user X is more likely to have/know someone with ADHD.


From the post, "As with all other third parties we work with, we do not share any personally identifiable information like your IP address. Additionally, our anonymous queries will not be used to train their AI models. And anything you share via the anonymous feedback link goes to us and us alone."


How can I invest in DDG?


> by asking DuckAssist to only summarize information from Wikipedia and related sources, the probability that it will “hallucinate” — that is, just make something up — is greatly diminished

This cannot be asserted and dismissed so easily.

I would not be surprised if it returns content that it thinks would have or should have been in Wikipedia.

The more you constrain it (e.g., "only answer truthfully") the more it bristles and tends to react in the opposite way, because it can't actually do that.


I asked it "what is a memory model in computer science" to see how it handles something slightly obscure. Its answer:

> The multi-store model (also known as Atkinson–Shiffrin memory model) is a memory model in computer science that describes the interactions of threads through memory and their shared use of the data. It allows a compiler to perform many important optimizations.

> More info in the History and significance section of the Memory model (programming) Wikipedia article.

It seems to think that the Atkinson–Shiffrin memory model is somehow related to computer science, which it is not. It's a model of human memory. And that article it references does not once mention the Atkinson–Shiffrin memory model. At least it's easy to verify.


Maybe it thinks linked sub-references on Wikipedia are more relevant than other information on the computer memory pages. Or maybe it just took "memory model" without the computer context.


Yeah, I think the problem is that the memory model (computer science) article is too short, so the model ended up spitting out information from something seemingly related, i.e., human memory models.


Something something, lame duck joke.

I really do not see any wisdom in turning search engines into answer engines. Yes, they're bespoke toys today, and you can still search the web alongside asking an LLM for an answer. But the day this becomes the default, it just becomes a countdown to the day "search" is deprecated entirely. And then rather than "doing your own research", you get to pick which Big Tech silo you trust to be the objective arbiter of truth. What a great future we're hurtling towards at breakneck speed...


I can't imagine the disasters that would result from getting answers from a search engine instead of human-written documents. Maybe it's an inconvenience to get the wrong showtimes for a movie but what about when you're asking about the weather on your long distance hiking trip? Whether it's legal for you to carry your gun on your trip downtown? This is going to cause so much damage.


That depends on the humans writing. ChatGPT has its faults, but it's got a better hit rate than people.

> A person is smart. People are dumb, panicky dangerous animals and you know it.

Can't get too much worse than thousands dying from a virus that has a vaccine because they're off drinking bleach and horse dewormer.


It clearly can given that some people still believe - or at least pretend to do so for the purpose of spreading propaganda - that people have been drinking bleach even without the 'assistance' of LLM propaganda machines to spread these false narratives.

And that horse dewormer? That is just ivermectin in a paste base (e.g. hydroxypropyl cellulose and castor wax) where the human version is ivermectin with a filler-diluent, probably lactose, microcrystalline cellulose or starch. There is no special 'veterinary ivermectin factory', just some factories which use the same ivermectin as used for humans which they then mix with a paste base. I'm not convinced ivermectin actually is effective as a prophylactic against or treatment for SARS2 but it is known that it shows antiviral activity. I am fully convinced that it has far fewer and far less serious side effects than the experimental vaccines which were pushed past the traditional safeguards. I am reasonably convinced those vaccines were not effective against the later strains of SARS2 - Delta and Omicron and whatever came/comes after - so it is likely that the 'boosters' actually caused more damage than they have avoided given the fact that these were administered when the bulk of the infections were caused by Delta and Omicron.

It will take a long time for the narratives (plural, both from the 'follow the leader' as well as the 'my body my choice' side of the population) to subside so that something resembling the objective truth may be discerned, so we'll just have to be patient until then.


Being able to search a ranked index of sources is extremely powerful, but that's nothing compared to what is possible when you can outsource intelligence.

In other words, when you can outsource the part of cognition where you need to understand the problem, go through the ranked sources, find the right answer, and digest it in the right way.

This is unbelievably powerful, comparing it to horse carriages vs cars does not do it justice, we are talking about Jarvis in real life!

P.S. I have some colourful opinions about the kind of thinking that leads to your comment, but I'll keep that to myself since it's likely to attract scrutiny or censure.


I wonder what happens when you outsource enough of your cognition...


Who Are You gives a pretty nice combination of search results and LLM responses to questions. I find it helpful to see that the responses also include community sites like Reddit, so in effect you can move from LLM to more-human responses or discussions fluidly in a lot of cases.


It's "hurdling towards". All's it takes is a quick search on Bingle Duck Jeeves! to know that fact. Do you're research :)


hurtle: to move with great speed, or to fling with great force.

hurdle: to leap over something, or to overcome an obstacle.

If you hurdle towards things with great speed I admire your dexterity. I'll stick with hurtling, thanks.

Edit: can I assume that "you're" entire comment was deliberate?


Yes, you can :)


    no sign-up required!
Yeah, you just need to install their native software on your system.

Having to make an account is annoying. Having to install native software is a no-go for me.


But: "If the trial goes well, we plan to roll it out to all DuckDuckGo search users soon."


That is the same playbook by which Microsoft plays this game.

    MS: "IT IS OPEN FOR EVERYBODY TO TRY!"
    Me: Ok, let me try it.
    MS: Ahem, ok, just give us your email
    Me: What? Ok, dammit, here it is.
    MS: Ok, now install our browser.
    Me: What?? Ok, done.
    MS: Ok, now allow us to track your location.
    Me: What??? No!
    MS: Sorry, something went wrong.
    Me: What?
    MS: Sorry, something went wrong.
    Me: What about the chat thing?
    MS: Sorry, something went wrong.


OpenAI drafted a paper proposing that in the future, platforms require proof of personhood. Plus it's MS, so they're going to slurp as much as they can get away with.



At least MS doesn't demand an active cell phone number through which they can physically track you everywhere you go (not just where you are right now), like ChatGPT (VOIP and landlines not accepted, only cell).


So no Dropbox, no Firefox/Chrome, no Thunderbird, no WhatsApp client, no Telegram client, nor Git?


You listed examples of free software and non-free software.

The former is much more likely to find a home on my devices.

I've found it limiting to not use WhatsApp, but not that much. It's kind of like if I knew someone who is known to stalk people, but we also have many friends in common. Maybe some friends won't come to my party, but I'm still not letting them into my house.


Those programs are all doing work on your computer. Therefore you install them on your computer.

The only reason to deny access to a website for other browsers is to get your own native software on the machine of the user.

If Google would deny access to a Google website with browsers other than Chrome, everybody would be mad at them.

There are a lot of angry posts about Google's websites being less optimized for Firefox than for Chrome.

Outright banning alternative browsers is what is happening here.


Mostly correct. Browsers can block ads, trackers, JavaScript libraries, and page elements at will using extensions. "Native" apps for web services, such as these Electron-wrapped clients, have full access to their trackers and analytics endpoints, and more. That's a big no-no.


It's not watertight, but controlling your own DNS, as through pi-hole, helps protect against this.
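For what it's worth, the underlying idea can be sketched with a plain hosts file; pi-hole applies the same null-routing trick network-wide at the DNS level. The tracker domain below is invented for illustration, and the file path is a local stand-in:

```shell
# Minimal sketch of hosts-style null-routing (what pi-hole does via DNS).
# The domain is hypothetical; on a real system the file is /etc/hosts (needs root).
HOSTS_FILE=./hosts.local
echo "0.0.0.0 telemetry.example-tracker.invalid" >> "$HOSTS_FILE"
grep "example-tracker" "$HOSTS_FILE"
```

Any app on the machine that resolves the blocked name then gets a dead address, extension support or not, which is why this works even for Electron-wrapped clients.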


Rather difficult to do that when on a phone.


Maybe you meant to illustrate how absurd such a stance would be, but in reality 5 of the 7 things you mention are an absolute non-starter for me.


It's unfortunate that you need a healthy dose of cynicism to make it in today's society. A free tool seemingly created solely for your convenience trips some internal alarms for me as well.


At least here, they are specifically training it on just educational public sources. Seems like a respectable usage of the tech.


What do you mean "at least here"? What's wrong with other usage? They are using OpenAI, so they haven't trained it themselves and are just inputting the Wikipedia article as a prompt. I don't see how that is more respectable than literally every other service built on OpenAI.


Right in the release:

> DuckAssist answers questions by scanning a specific set of sources — for now that's usually Wikipedia, and occasionally related sites like Britannica — using DuckDuckGo's active indexing. Because we're using natural language technology from OpenAI and Anthropic to summarize what we find in Wikipedia, these answers should be more directly responsive to your actual question than traditional search results or other Instant Answers.


You misunderstand how this works. They do not train it on Wikipedia. The Wikipedia article is part of the prompt. You can do this yourself by copy-pasting a Wikipedia article into ChatGPT and asking a question about it. The underlying LLM is still GPT-3 and trained on everything on the internet.
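Concretely, "the article is part of the prompt" just means building a string like the one below and sending it to the model. The prompt wording and passage here are invented for illustration, not DuckDuckGo's actual prompt:

```python
# Sketch of retrieval-augmented prompting: the source text travels inside
# the prompt itself; the model's weights are untouched.
article_excerpt = "Paris is the capital and most populous city of France."
question = "What is the capital of France?"

prompt = (
    "Answer the question using ONLY the passage below. "
    "If the passage does not contain the answer, say so.\n\n"
    f"Passage: {article_excerpt}\n\n"
    f"Question: {question}"
)
# `prompt` would then be sent to a hosted model (e.g. via OpenAI's API);
# that model was still trained on a broad web corpus beforehand.
print(prompt)
```

This is why "trained on Wikipedia" is the wrong mental model: the article is supplied at inference time, per query.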


I think the point is that it should only be used to summarize wikipedia information, and thus harder to use for other ethically murky purposes? But that remains to be seen; I haven't tried it yet.


No, the article doesn't say how it's trained.


Yes it does:

> If you enter a question that can be answered by Wikipedia into our search box, DuckAssist may appear and use AI natural language technology to anonymously generate a brief, sourced summary of what it finds in Wikipedia — right above our regular private search results.


That just means the initial prompt says something like, "You are an expert Wikipedia research assistant. You can only reference material found on Wikipedia. Try your best to help the user by giving summaries of Wikipedia content. Do not reference material from other websites!"

The model is still trained on a broad corpus like ChatGPT's, i.e., the Internet. The corpus of Wikipedia alone probably isn't enough training data.
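In the chat-completions message format used by hosted models, that kind of instruction would sit in a "system" message ahead of the user's query. The wording below is invented for illustration, not DuckDuckGo's real prompt:

```python
# Sketch of a system prompt steering a general-purpose model at inference time.
# All text here is hypothetical; the model's training is unchanged by it.
messages = [
    {"role": "system",
     "content": "You are a research assistant. Summarize only the Wikipedia "
                "text supplied by the user; do not use other knowledge."},
    {"role": "user",
     "content": "Wikipedia says: Paris is the capital of France.\n\n"
                "Question: What is the capital of France?"},
]
print(messages[0]["content"])
```

The system message only biases the model's behavior per request; it cannot actually remove knowledge the model picked up from its broad training corpus, which is why constraints like this are soft.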


I'm sure it makes sense to start with wikipedia and instant-answer type searches from a tech standpoint if that's what you can iterate on best, and if you've got ideas for preventing confident falsehoods.

It's also the type of search where I'm sure this addition is least valuable. If I'm searching for a discrete fact that I know exists, stuff that you find on wikipedia, that's the type of search that still works fine. Instant answers are convenient but not necessary.

Often when I'm trying to solve a problem I need to break it down into more generic, smaller components, discrete pieces of information that I know exist, then search for those and recombine/recontextualize the information I got. Or branch out into new searches based on anything that was an "unknown unknown" before the initial search.

I'm hoping LLMs can help with both of these things. Allow me to search for something closer to the problem I'm actually solving rather than requiring me to break it down into components; do the recombination for me and do some of the branching for me.


Don't really feel the direction DDG has been going lately. What are the best alternatives that still cover my need for !bangs?



Maybe startpage? I'm also looking for something. That has scratched my itch thus far, but not sure if there are better alternatives.


This just seems like a faster way to infect people with incorrect information. I mean every result returned will soon enough just be AI generated articles with that incorrect information, and you are saving us a step from clicking, but I'm getting sick of seeing this stuff.

If you can't do the job right and make sure the answer is right, then you shouldn't be publishing anything, or you are just damaging people and eroding trust. Returning search results in a fuzzy way I can't understand or fault you for is one thing and gives you plausible deniability, but not this. I'm all for Google alternatives (I use DDG), but this is definitely a turn off. Put the time into finding the right search result that actually has an accurate answer and just point me to that as the first result. Hell add a sublink that points to the actual answer on that page below the main result, but get rid of the gimmick that is AI.

Less hallucination is still hallucination.


Kinda crap compared to Brave, which uses an independent index, and AI to refine website descriptions and create answers.


How do the answers actually compare? I've refused to install brave after their crypto donation mess, but I know it has some nice features.


How is this not named "Ducky"?


Yup yup yup.

Though that very well could be trademarked by Disney.


The Land Before Time is a Universal Picture property. Which I only point out because Don Bluth very much left Disney.


So, AI continues to infect more things I use. That's unfortunate.


It seems like it's meant to be an extra feature and not a replacement for search. This is a good thing and will help DDG reduce dependence on Bing. Of course, maybe dependence on Wikipedia is not so good if you care about accuracy.


Trading one Microsoft dependency for another, since they're relying on OpenAI, which is now 49% Microsoft-owned.


And Anthropic, which has Google investment. They are bound by licensing not to block some of the Microsoft analytics, but the same might not be true of which NLP providers they use.


Yes, but certainly a good chunk of the company's finite resources will be shifted towards this effort vs. improving upon search. No judgement on whether this is good or bad, just... the way the world is headed.


This one is less awful than other similar features, since it relies on a single source.

However, it's going to take me a while to come to terms with the fact that AI makes ownership obsolete.


> Additionally, our anonymous queries will not be used to train their AI models. And anything you share via the anonymous feedback link goes to us and us alone.

I wonder if their licensing allows training their own models while using Anthropic and OpenAI? Seems like a conflict of interest, but I hope they can. I have mixed feelings about instant answers keeping users from clicking through to the sites where the content was originally posted, but this is going to help a lot of people.


All the search engines are adding LLMs for queries, but I don't see any of them talking about defending against crap results using LLMs - which would be more useful. I don't need generated results, I need accurate results that align with my intent. Use the LLMs for that, dudes.


Tried it out, pretty disappointing so far. "Is the Collatz conjecture solved?" gave no "intelligent" assist, even though the first result is the Wikipedia page on the Collatz conjecture.

"Do cows sleep standing up?" gave an answer of some sort, but very unnatural and not very helpful.


I asked it how old the Arthurian legends were and it said, "I didn't come across that information exactly, but I did find an answer to, how old is King Arthur? King Arthur is believed to have lived in the late 5th and early 6th centuries. More info in the List of Arthurian characters Wikipedia article."

It is using inference somewhat at least.


Meh, DuckDuckGo has lost its appeal when they started openly censoring search results, despite the fact that they aren't even using their own search engine under the hood.

I expect their AI to be heavily lobotomized to the point of being completely useless.


Brave Search's Summarizer is A LOT better!!!

Try, for example:

where is bill watterson?

in both and compare for yourself!


Missed opportunity to call it Duck Typing™


> Why do ducks fly together?

... proceeds to say that ducks commonly fly together.

Thanks for the non-answer. Surprised that was their example.


In fairness it wasn't asked why.


good try, but won't pass the bar


> good try, but won't pass the bar.

Maybe concentrate on not trying to censor websites.



