

YaCy takes on Google with open source search engine - a_w
http://www.theregister.co.uk/2011/11/29/yacy_google_open_source_engine/

======
nextparadigms
I'm liking this P2P trend which seems to be spreading everywhere. I don't
think we're ready to have everything P2P yet, but it's good to see the trend
growing. At least now we know that if or when Google will be forced to censor
more results than we'd like them to, there will be a P2P alternative available
waiting for us.

~~~
nickpinkston
Let's hope your views are like those saying the same before open source came
into its own. We live in exciting times...

------
harryf
IMO what's missing in the search space is a web search engine with an API,
especially with access to the raw crawled content. Amazon used to do this with
the Alexa web crawl data (see
<http://www.readwriteweb.com/archives/alexa_turned_in.php>) but later
withdrew that part of the service.

~~~
4ad
I have a gut feeling that we won't see something like that soon due to legal
implications.

------
xtracto
I (among quite a lot of people I guess) have thought about using P2P for web
search.

In fact, P2P protocols like KAD have been used to _search_ for quite some
time. What I would like to see is a search system composed of a client:

1\. Is implemented in JavaScript (so that the user does not need to download a
program to use it).

2\. Defines a file format which describes one URL, with any extra useful
metadata (document type, last crawling date, text content, etc).

3\. Shares those files using a P2P protocol like KAD.

4\. Is able to search in the _content_ of the URL file for words, phrases, etc.
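
To make the per-URL file idea concrete, here is a minimal sketch in Python (not the JavaScript client described above, just for brevity). The `UrlRecord` name, its field names, and the JSON encoding are all assumptions for illustration; the P2P sharing step over something like KAD is left out entirely, and the search is a naive substring match over records already held locally.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class UrlRecord:
    """Hypothetical per-URL file: one record describes one crawled page."""
    url: str
    document_type: str
    last_crawled: str  # ISO 8601 date string, e.g. "2011-11-29"
    text: str          # extracted text content of the page

    def to_json(self) -> str:
        # Serialize to a stable JSON string that could be shared as a file.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, raw: str) -> "UrlRecord":
        # Rebuild a record from the JSON produced by to_json().
        return cls(**json.loads(raw))


def search(records, phrase):
    """Return the URLs whose text contains the phrase (case-insensitive)."""
    needle = phrase.lower()
    return [r.url for r in records if needle in r.text.lower()]
```

A peer would call `to_json()` to produce the file it shares, `from_json()` on files it receives, and `search(records, "some phrase")` over whatever records it holds. A real system would need an index rather than a linear scan.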

As gubatron said, having an online "frontend" would be optimal. In addition to
that, people could embed the "crawling" client in their webpage (which might
double as an ad server) to help the crawling effort.

------
gst
YaCy (while a cool project) is not new and has been around for lots of years
now. I think it has quite some potential, but don't expect it to suddenly lift
off. It had enough time to do so, but didn't.

------
turnersr
I'm sure a lot of people are interested in the implementation details of
YaCy's privacy mechanisms. Does anyone know the default privacy settings? Are
search words that are sent in any way protected? I found this page:
<http://yacy-websuche.de/wiki/index.php/En:Privacy>

But it's not that helpful. I'm currently looking at the source code:
<https://gitorious.org/yacy> .

------
derekreed
"Build a search engine" == "takes on Google" ? Well ... I guess so.

------
simonbrown
I haven't looked into the internals of it, but couldn't a black hat SEO run
nodes that manipulate results in favour of their own sites?

------
danmaz74
This could be a good idea, IF there was a way to stop all kinds of malicious
people from tampering with the search results in so many ways. Google already has
to deal with the manipulation of the signals about page relevance, just think
if you had to also deal with tampering with the ranking system itself...

------
ramanujan
The intranet search engine concept is interesting and will help this grow.
Anyone know of anything else which is a search engine in a box, basically an
open source competitor to the various Google Search Appliances?

~~~
wilkenm
The distributed search model that YaCy uses would never work in a large scale
enterprise. Security, safe harbor, etc are all difficult enough using a
traditional, centralized approach. Trying to imagine this done in a
distributed way across the enterprise is giving me a headache.

And the closest thing to open source, turnkey search is gluing together Apache
SOLR and a web crawler. Lucid Imagination offers this (plus other features) as
a commercial product, but it is not open source to the best of my knowledge.

~~~
halfasleep
I was playing with YaCy a little, and there is an "Intranet" mode. As far as I
can make out, this can operate in a distributed way, but behind the firewall.
I didn't look into how to set it up in great detail yet though, was playing
with web search.

------
Gigablah
Was that subtitle necessary ("good idea, stupid name")? People probably
thought Google and Yahoo were stupid names at first too.

------
4ad
So I have to install software on my computer to use it? No, thanks. They claim
an advantage: _"no content can be censored and no search results can be
recorded and analyzed on central servers"_ , this is extremely important for
some applications, but for searching source code, I couldn't care less. It
just raises the bar of adoption to the point I'm not interested in it.

General purpose client side software is dead. Client side software makes sense
only for niche applications.

------
gubatron
their idea is good, but the way that it's executed stunts its growth.

instead of having people install this on their computer, they should make it
instead so that sysadmins run nodes and put ads on their node search results.

the end user would just go to a .com site, and search. everyone running nodes
makes money, more nodes are installed. The network would be larger than google
in a short amount of time.

wonder why the hell they haven't thought of this.

people aren't going to be typing <http://localhost:port> to make a search and
keep an engine running, also uptime and firewall configurations leave a lot
of the desktop nodes out of the equation if they can't do NAT traversal to
participate in the network.

me #facepalms to still see them doing this, going to yacy.net is the most
frustrating thing ever to the curious non-techie user.

~~~
danssig
Well... you've thought of it... and they're open source...

