- A non-trivial part of the current contributions were "cheat sheets", which IMO required a lot of effort to ensure correctness/usability but don't provide much improvement to search results (I don't think I've used the feature more than 3-4 times in the past 1.5 years). So this should free up time for DDG staff to focus on the more important instant answers and features.
- The community has been getting smaller and contributing less for a while now. This is backed by data from the official repos (the number of commits over time, that is). After all, there are only a finite number of instant answers before they just become redundant.
- The current model for the triggers (when an instant answer gets displayed) is quite restrictive. It's just regex-based. IMO, a lot more growth can be achieved using ML models for triggering, A/B testing, etc.
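To make the regex point concrete, here is a minimal sketch of what pattern-based triggering looks like. The trigger table, answer names, and `match_trigger` helper are all hypothetical and written in Python for illustration; this is not DuckDuckHack's actual code. It mainly shows why a phrasing the pattern author didn't anticipate simply never fires.

```python
import re

# Hypothetical trigger table: each instant answer owns one regex.
TRIGGERS = [
    ("unit_conversion",
     re.compile(r"^convert\s+(\d+(?:\.\d+)?)\s+(\w+)\s+to\s+(\w+)$", re.I)),
    ("cheat_sheet",
     re.compile(r"^(\w+)\s+cheat\s*sheet$", re.I)),
]

def match_trigger(query):
    """Return (answer_name, captured_groups) for the first regex that matches, else None."""
    q = query.strip()
    for name, pattern in TRIGGERS:
        m = pattern.match(q)
        if m:
            return name, m.groups()
    return None

print(match_trigger("vim cheat sheet"))      # matches the cheat_sheet pattern
print(match_trigger("cheat sheet for vim"))  # same intent, different word order: no match
```

Every extra phrasing needs another hand-written pattern, which is the restriction being described.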
I'm still kind of disappointed with this. Perhaps unrelated, but does anyone have any suggestions for people willing to work on similar open source projects?
: https://github.com/duckduckgo/zeroclickinfo-spice/graphs/con... , https://github.com/duckduckgo/zeroclickinfo-goodies/graphs/c...
Entire web archives, such as the full dumps of Wikipedia and Stack Exchange (including media and indexes for search), can be stored locally. The missing piece is Google-level search quality on the local machine. Brute-force substring search can process gigabytes in seconds nowadays, and on enterprise-grade server hardware things are reaching 1000GB/s. At this rate, there is no reason to think that, in a couple of years, local search of all known human knowledge can't happen on a local device at Google-level result quality.
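For anyone who wants to try the brute-force idea on a local dump, here is a minimal Python sketch using the standard library's `mmap`. The function name is made up for illustration, and this makes no attempt at the throughput figures mentioned above; it only shows that a linear scan over a large file needs very little code.

```python
import mmap

def count_occurrences(path, needle):
    """Count non-overlapping occurrences of `needle` (bytes) in the file at `path`.

    The file is memory-mapped, so the OS pages it in on demand instead of
    reading it into Python memory all at once. The file must not be empty.
    """
    if not needle:
        raise ValueError("needle must be non-empty")
    count = 0
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(needle)
            while pos != -1:
                count += 1
                # Resume the scan just past the previous hit.
                pos = mm.find(needle, pos + len(needle))
    return count
```

Usage would be something like `count_occurrences("dump.xml", b"DuckDuckGo")`, where `dump.xml` stands in for whatever archive you have on disk. Ranking the hits is the hard part, and this does nothing for that.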
For anyone interested in the search space, look into what's possible today in local offline search.
Anyway, I wish we'd see more search and NLP related posts here on HN. It deserves far more attention than it gets.
All you have to do is look at the speed at which new info is being added to Wikipedia and Stackoverflow which is stabilizing, i.e. it is not growing as it once was. Basic/foundational knowledge is more or less all covered. https://en.wikipedia.org/wiki/Wikipedia:Modelling_Wikipedia%...
And that sum total comes to 50-60 GB compressed. Think about that number. It's not big.
Edit: Fortunately I'm left feeling foolish, rather than horrified.
You do not even need "Google level" for most of today's web users.
You can deliver what users need with respect to web search with much less than "Google level".
For example a simple "<title>" search. This is how Google started.
The entry point into the web should be search for domains. A "<title>" search can do that.
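A rough Python sketch of what a title-only search for domains could look like, assuming you already have a mapping of domain to `<title>` text from a crawl. The domains, titles, and function names below are invented for illustration, and there is no ranking or stemming.

```python
import re
from collections import defaultdict

# Hypothetical crawl output: domain -> text of its <title> tag.
TITLES = {
    "bikes.example": "City Bike Repair Guides",
    "recipes.example": "Simple Weeknight Recipes",
    "parts.example": "Bike Parts and Repair Tools",
}

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(titles):
    """Build an inverted index: word -> set of domains whose title contains it."""
    index = defaultdict(set)
    for domain, title in titles.items():
        for word in tokenize(title):
            index[word].add(domain)
    return index

def search(index, query):
    """Return the sorted domains whose title contains every word of the query."""
    words = tokenize(query)
    if not words:
        return []
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return sorted(hits)

index = build_index(TITLES)
print(search(index, "bike repair"))
```

With the sample data, the query "bike repair" returns both bike-related domains, while "bikes" returns nothing because matching is on exact words only.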
Most users today do not do much searching within websites via Google. They search for websites using Google.
Anyway, you are right about storage space and offline search but obviously that truth misaligns with the "cloud" business narrative and coaxing users to store all their personal data in datacenters instead of on their desk or in their pocket.
Expect much opposition to this simple truth.
Try it out. You'll find that it's... it feels like a trip back to 1998.
And searching for domains is only a tiny part of it, especially now that a lot of information is stuck in general sites with a lot of content (wikis, Q&A sites, social media sites) and not on special-interest sites. And for many generic searches the special-interest domains are various levels of spam/affiliate marketing.
It's been on my list of things that I will almost definitely never take the time to actually work on, but what I wish I had is (A) a browser extension or GNOME extension that incorporates an offline version of all the DuckDuckHack modules, and (B) the same thing in an open source mobile app. (This kind of thing could just as easily live in a command line app, though, and I'd be super happy if a project maintainer incorporated them into something like GNU Units.) I looked into it, especially for (B), but I realized that the DuckDuckHack code depends on Perl.
DDG does have official (and unofficial) browser extensions and apps for iOS/Android.
Sure, but there are a large number of instant answers that can and do work offline because they're simple, static tables, or are self-contained—existing only to apply transformations on the input (e.g., cheatsheets, natural language unit conversions, and calculations).
> DDG does have official (and unofficial) browser extensions and apps for iOS/Android
A browser extension that just sends the query the same as it would if you hit their homepage is in the "what's the point?" category, just like mobile sites that nag you to install their app when all it does is show you the same content that is (or could be) on the mobile site itself. The "is a browser extension" is not the interesting part. "Doesn't send data to a third party" and "can operate without being connected to the network" are.
!s is enough to redirect to Startpage. :-)
There are different instances:
I'd argue the opposite: the time when such a thing would have been possible is long past. If you want to get anywhere near the quality of results that the big engines offer, then you're going to be spending some pretty big cash.
I have no idea why that is, but it's really hard to find relevant information for anything outside mainstream stuff. Just the other day I was looking for how to post JSON in a test with Flask, and I hadn't found anything after tens of searches. Surely, something must be referenced in some code on GitHub, or a reddit comment, or some blogpost somewhere, but it proved impossible for me to find.
This is even expanding into my code search. Like I'll type "do something os related python linux" and get commands for Windows as the first few hits. Clearly I don't want Windows.
Needless to say I am not going to do the former, because it was intended to be an anecdote indicative of the general case -- so example-specific solutions are not likely to be transferable, because I don't hand portions of my web history to random people to prove something, and also because I have better things to do.
Success to you!
For the record, you do not have the right to demand that someone use their time to prove something that is only of interest to you, nor do you have the right to demand that someone debate you. It is extremely arrogant and self-righteous of you to state otherwise.
> Can you please provide an example of this type of search?
My "arrogant and self-righteous" "demand that someone use their time to prove something that is only of interest to" me
>> demand that someone debate you
Not found. Apologies for any failures of communication or reading comprehension on my part?
>> demand that someone debate you
> Not found.
I am surprised to discover the hard way how the intricacies of tone could cause anyone to document "for the record" an implication that I wrote something that I didn't write (re: demand), and then claim that I have written "arrogant and self-righteous" statements demonstrating my incorrectly supposed possession of two oddly specific non-existent rights (re: 1. others' use of time, and 2. debate) when I wrote no such thing.
I truly appreciate the potential value of this opportunity to learn from your feedback; please don't interpret this new request as a demand! (You are of course as always welcome to just say no [thanks?] or to not respond, among many other options.)
Perhaps my takeaway should be to add a disclaimer similar to the one immediately preceding this sentence to future questions; my mistake may be rooted in taking for granted what I believed to be understood by all as core to the function of this particular means of communication.
PS. FWIW, I have upvoted all of your replies as is my habit in very small part to thank you for your time.
I call this the "Retina watermark fallacy": equating something not being the latest and greatest with being unacceptable. When Apple introduced the "Retina" Hi-DPI display for the iPhone 4, it was good. But what's more, it was supposed to show that everything else was junk. And yet, if you looked at Apple marketing materials from ~5 years prior, you could find breathless ad copy about their then-latest displays that were (necessarily) not Retina quality. That means one of two things is true:
1. either Apple was selling unusable junk prior to the introduction of the Retina displays, and they managed to mistakenly convince themselves and everyone else that this stuff was acceptable when it actually wasn't, or
2. pre-Retina displays were good enough, and Hi-DPI displays are simply better
The truth lies in 2.
So, the takeaway as I understood it from the original question would be whether or not the FOSS world could produce today a search engine on par with, let's say, 2002-era Google. (I remember 2002-era Google, and not only did it work, but it was good!)
If there were any way the FOSS community could fund a reasonable search engine, I'd be happy to work at non-profit wages to make it happen. I don't see any way.
It allows you to run a document search engine that can be distributed over multiple machines. It could be adapted to create a web search engine. Interestingly, it looks like DDG uses Solr, though I'm not sure if it's used in their core search features or not.
If this is not correct, anyone have a link to the exact repo I should be looking at? The link in TFA only goes to the main account page, not any specific repo, and the repo names are not clear enough to tell if they have what I'm looking for.
* https://github.com/duckduckgo/zeroclickinfo-goodies - "Goodies" which are generally static answers such as cheat sheets, colour picker or unit conversions.
* https://github.com/duckduckgo/zeroclickinfo-spice - "Spice" for using public APIs, e.g. weather, transport status or currency conversions.
* https://github.com/duckduckgo/zeroclickinfo-fathead and https://github.com/duckduckgo/zeroclickinfo-longtail - "Fathead" and "Longtail" are less common and are for text lookups, e.g. of programming docs.
Disclaimer: DuckDuckGo staff
It's amazing to see so much human effort went into this project and the full 1200-word list. I thought I had read somewhere that this was automation backed by Wikipedia, but apparently it was entirely manual?
I was not aware of this being open source.
A quick look-through led to this sample search: "Movies with Keira Knightley". However, "Keira Knightley movies" fails to give the same instant answer. Any permutation of the words "Keira", "Knightley" and "movies" on Google seems to give the list of movies, which is how it should behave, I guess. Will probably open an issue/PR :)
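One way to get the same result for every word order is to match on the set of words rather than their sequence. A minimal Python sketch follows; the keyword sets and function name are hypothetical, and this is not a claim about how either DDG or Google handles the query.

```python
import re

TRIGGER_WORDS = {"movies", "films"}              # hypothetical trigger keywords
STOP_WORDS = {"with", "by", "of", "starring"}    # hypothetical filler words

def extract_name_tokens(query):
    """Return the non-trigger, non-filler words as a frozenset if a trigger
    word appears anywhere in the query, else None. Word order is ignored."""
    tokens = re.findall(r"[a-z']+", query.lower())
    if not TRIGGER_WORDS & set(tokens):
        return None
    name = frozenset(t for t in tokens
                     if t not in TRIGGER_WORDS and t not in STOP_WORDS)
    return name or None

a = extract_name_tokens("Movies with Keira Knightley")
b = extract_name_tokens("Keira Knightley movies")
print(a == b)
```

The downside is that ignoring order entirely can over-trigger, so a real trigger would probably want something in between.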
I was expecting "hundreds of new bugs".
If they are making so much money, why would they end the program?
On the other hand, if you view it as an announcement that they're going to be taking Instant Answers closed source to keep future changes in-house, then it makes sense.
Reddit has just pulled the plug on all their open source repositories. This will make it harder to develop and keep third party clients updated, like the ones without ads for example.
Maybe you have a relevant story to tell about DDG?