YaCy: Decentralized Web Search (yacy.net)
105 points by chatmasta on Dec 13, 2014 | 29 comments

I looked at YaCy not too long ago. It's an interesting technology but needs an economic incentive to work as a Google replacement, I think. If system activity was (carefully and judiciously) tied to cryptocurrency payouts it'd probably experience hockey-stick growth.

I'd love to see something like this. I would definitely contribute.

This is a brilliant idea. I'm sure a lot of people would contribute resources if they were compensated.

Payouts are not the problem. Pay-ins are :)

This has been around for a long time.[1] Have there been any noteworthy changes or improvements that prompted a new submission? I've always wondered if I should run this, but ever since DuckDuckGo came out, I never went through with setting up a self-hosted search engine.

[1] https://news.ycombinator.com/item?id=3288586

Hm, it's a computationally expensive thing to do, and the crawler must be fed and "crawl" 24/7... I mean, either it is run by a community running many, many nodes or by an individual with huge resources; otherwise it isn't worth it.

I'm not sure why a federated search engine must be running 24/7 ... what's so prohibitive about federating it?

A collection of well-funded corporations can compose a "community."

Google derives 98% of its revenue from advertising. I'm not sure how advertising revenue divides between content and search, but given the much higher prices on search advertising, it seems revenue distribution is heavily weighted toward revenue derived from search ads.

This is a huge liability for google. Any company with 98% of its revenues tied to one product necessarily creates a fundamental liability for itself. I'm sure google execs realize this, and that's why they are pouring so much revenue into "moonshots" and subscription services (not to mention government contracts). They are actively diversifying google's revenue because they realize it comes from a near single source, search, which may represent a fundamentally unsustainable product.

Can Google control the search market forever? Are people (specifically technophiles and "early adopters") not growing increasingly frustrated with them? The same groups of users that Google once relied upon as an initial source of traction are now abandoning the company's products in search of more open pastures.

Decentralization is an unstoppable trend with momentum across product verticals. File sharing was the first "mainstream" invocation of decentralization technology, and blockchain/Bitcoin is the most recent. (Interestingly, Bitcoin itself is a meta-enabler of decentralization as it introduces the possibility of an automated payment layer.)

Is it inevitable that a decentralized search engine will replace the current centralized model that Google requires for its sustained business? When will it happen? How can such a movement gain momentum?

If I were an executive at a google competitor, I would be actively exploring these questions, and finding ways to push decentralized search products into the mainstream. Mozilla, Yahoo, are you listening?

File sharing was not the first mainstream distributed technology. Packet-switched networking, then HTTP, SMTP, etc. were all around first.

This blind spot in people's understanding of how the internet worked, works, and should work is why we got stuck accepting centralized services as normal. They are not sustainable, though, and will be superseded.

It would be cool to see a peer-to-peer information retrieval system that works as well as or better than Google. I know this area of technology has been researched for a good long while now, and still no mainstream product has appeared.

I wonder if Bitcoin2.0 projects like StorJ, Bitcloud, Filecoin will be instrumental in the creation of such a product.

This is a huge liability for google. Any company with 98% of its revenues tied to one product necessarily creates a fundamental liability for itself.

The cost of providing Google's organic search service is dropping as compute hardware gets cheaper. But the ad volume isn't dropping along with it. That creates a vulnerability for Google - their "free" can be undercut on price. The problem with having high profit margins is that, to a lower-cost provider, you're lunch.

"Ello" is trying to do that to Facebook. "Ello" is probably much smaller than they claim to be; almost all their Google hits were in the first week of operation. But someone else may be able to bring that off.

I think they want a bigger piece of the pie, not to destroy the pie?

Mozilla currently has no search product. However, they do have a browser. Google also has a browser. Destroying Google search would destroy Google revenue, which would degrade browser quality. Long term, Mozilla should be interested in destroying Google search.

I can't imagine that declining revenues would hurt Chrome. There are lots of other ways for Google to cut costs before limiting their investment in the browser.

And Mozilla has always been happy to ride the coattails of Google Search. Declining AdWords revenue just means declining Mozilla revenue. Even if the two aren't partnering together going forward, Google helped Mozilla negotiate a high bid with Yahoo.

Long term, I think Mozilla wants to see Firefox OS compete with Android. There's a lot more revenue potential there and the best chance to rapidly grow their user base.

But Mozilla is itself mostly funded by search engines, which pay to be included or set as the default engine.

I tried this a (long) while back... and it just wasn't practical as a search engine.

Would love to see adoption of this though. It would be great if there was a web interface for it for folks who are afraid to or can't install a fat client. It wouldn't contribute to the network, but more people would use it, which might lead to people with cash supporting it.

The idea for this came to me a couple years ago, but after a quick search I found that YaCy had already attempted it. I've been recently thinking of going for it anyway though. There's enough room for innovation where a really clever design could potentially lead to a useable search engine, and not just a novelty.

This is such an interesting project!

I didn't see a way for anyone who wasn't operating their own server to use the service though. It seems like a lot of the quality control / model building power of Google has to do with the volume of inbound queries (they're able to see what user satisfaction with results is based on clicks). It seems like having a public-facing server that interacted with the peer-to-peer network might help with adoption from less technical users.

EDIT: Never mind, found this demo portal (http://search.yacy.de/), although it's not very prominent.

I started a similar project some months ago; however, my idea was to put the crawl and search software on a server so that the search engine can be used even on mobile devices, tablets and so on. One server installation could then serve all devices in a household or in a small company.

The software is far from ready, especially the kernel, and the distributed search is not really implemented. I haven't had much time to contribute in recent months; however, what is done is available here under an open source license: https://github.com/r10s/gosearch

It definitely has a nice feel to it, even if the search results aren't (at the moment) perfect.

On the left-hand side, after a search, it allows you to "refine" your search by categories: author, site, filetype, and language, each with a count next to it showing the number of hits you would get with that filter/category activated.
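To make the "refine" sidebar concrete, here is a minimal sketch of how per-category counts could be computed over a result list. It is not taken from YaCy's code; the result dictionaries, the field names, and the `facet_counts` and `refine` helpers are all assumptions made up for illustration.

```python
from collections import Counter

def facet_counts(results, facets):
    """Count how many results carry each value of each facet (hypothetical helper)."""
    counts = {facet: Counter() for facet in facets}
    for doc in results:
        for facet in facets:
            value = doc.get(facet)
            if value is not None:
                counts[facet][value] += 1
    return counts

def refine(results, facet, value):
    """Keep only the results whose facet matches the chosen value."""
    return [doc for doc in results if doc.get(facet) == value]

# Made-up sample results; real hits would come from the search index.
results = [
    {"url": "http://example.org/a.pdf", "site": "example.org", "filetype": "pdf", "language": "en"},
    {"url": "http://example.org/b.html", "site": "example.org", "filetype": "html", "language": "en"},
    {"url": "http://example.net/c.html", "site": "example.net", "filetype": "html", "language": "de"},
]

counts = facet_counts(results, ["site", "filetype", "language"])
german_only = refine(results, "language", "de")
```

The number shown next to each category is then just `counts[facet][value]`, and clicking a category corresponds to calling `refine` with that facet and value.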

Additionally, it looks like it has some sort of "stealth" mode. It appears to limit the search to your own peer, or what I'm guessing is already in your local index. That could be handy by itself, if properly configured, for a local or personalized search.

I've been thinking about this a lot since posting the link.

What are people's thoughts on combining social graph + blockchain + decentralized search? The idea is that your searches will be somewhat similar to others in your community, so the crawl index is sharded/partitioned to optimize for social graph proximity. If you want to index pages non proximate to you, you can get paid Bitcoin to do it.

This could be implemented with XMPP (look up the SocialVPN/IPOP project) for the social layer, a Chord DHT for search, and an altcoin with modified PoW for the incentive.
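As a rough illustration of the DHT part of that idea, the sketch below places peers and search terms on a hash ring and assigns each term to the first peer at or after the term's position, which is the key-assignment rule Chord uses. It is a simplification under stated assumptions: every peer is known up front, so there are no finger tables or network hops, and the `Ring` class, `ring_id` function, and peer names are hypothetical.

```python
import hashlib
from bisect import bisect_left

def ring_id(key: str) -> int:
    """Map a string to a 32-bit position on the ring (SHA-1, truncated)."""
    digest = hashlib.sha1(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")

class Ring:
    """Toy hash ring with global knowledge of all peers (assumption)."""

    def __init__(self, peers):
        # Sort peers by their ring position so lookups can use binary search.
        self._nodes = sorted((ring_id(peer), peer) for peer in peers)
        self._ids = [node_id for node_id, _ in self._nodes]

    def successor(self, key: str) -> str:
        """Return the peer responsible for key: the first peer whose
        position is >= the key's position, wrapping around the ring."""
        index = bisect_left(self._ids, ring_id(key))
        return self._nodes[index % len(self._nodes)][1]

# Hypothetical peer names; a search term is routed to one responsible peer.
ring = Ring(["peer-a", "peer-b", "peer-c"])
owner = ring.successor("yacy")
```

In a real Chord network each peer would know only a logarithmic number of other peers and the lookup would be forwarded over several hops; the ownership rule is the same.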

There are (at least) two hard(ish) problems involved: 1. Preventing dishonest results. 2. Not disclosing your queries to your friends.

I like the idea and ran a node for a year or so, but the truth is I never used it to actually search - the results were astoundingly bad 80% of the time.

The rest of the time they were just about ok and I'd find what I wanted on the 2nd/3rd page.

It was a good way to throw up some left-field results though, and for that reason it's worth keeping a node going if you can. Might get it up on the new VPS.

Maybe an economic incentive could be integrated into YaCy via ads?

We might get to a pretty efficient system that way. We see this with bitcoin: Mining is done with the cheapest power available and highly optimized algorithms and hardware.

If the advertising income flows back into the spidering, it could become a kind of perpetuum mobile of distributed web search.

On the technical rather than the practical side, how does it prevent garbage data/advertising from being added to the collective crawl?

How about using this type of technology for something Google can't (for legal reasons)? Say for example full text search of the library genesis archive? Over Tor or somesuch?

The search page for YaCy is at


but all it searches is the forums for the Free Software Foundation, Europe. That's a job that could probably be handled by one server.

https://metager.de/en/ is a German metasearch engine that includes YaCy P2P results.

What a twist! I remember when YaCy was more like a Tor alternative.
