
Faroo: Peer-To-Peer Web Search - flippyhead
http://www.faroo.com/hp/p2p/p2p.html
======
lrvick
At first I thought, awesome, finally an open distributed search engine.

...Oh wait, you want me to install your closed source code on my computers, to
give you all my personal data and CPU cycles for you to own and use as you
like. (Which you "promise" won't be exploited or abused).

Make no mistake, this is a voluntary botnet for a single party to use to build
a company on. This voluntary botnet will provide search engine services from a
central point so they can keep the majority of profits without the overhead of
paying for resources like every other for-profit company.

This is not Peer-2-Peer; it is Peer-2-Company. Also, they are late to the party
on this approach, Microsoft already tried this with Bing:
[https://googleblog.blogspot.com/2011/02/microsofts-bing-uses...](https://googleblog.blogspot.com/2011/02/microsofts-bing-uses-google-search.html)

This model is just begging for people to troll them by releasing tools to
poison their database with nonsense data.

They should build a -real- open p2p distributed system (the blockchain is not
perfect but a good example). Short of that, if they want to be a for-profit
centralized search engine, fine. They should go buy some servers and databases
they can directly control, and try to be a closed-source competitor to
DuckDuckGo.

~~~
joshu
> They should build a -real- open p2p distributed system

what's stopping you?

~~~
CJKinni
I'm not the individual you're responding to, but probably a mix of time,
money, and interest. You don't need to make a better version of something for
your criticism to be valid.

------
ZenoArrow
So would it be fair to say that effectively Faroo works by:

1\. Scanning your browser cache for the sites you visit.

2\. Setting a ranking to the sites in your browser cache, based on how
frequently you visit the site.

3\. Merging this site + ranking information with other Faroo users in a search
index.

4\. Distributing the search index in a distributed way, perhaps all nodes only
having a fraction of the total index to prevent storage issues.

If that's a simplified version of what is happening then I could see it
working. If I've misunderstood something let me know.
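
To make the four steps above concrete, here is a minimal Python sketch. It is only a guess at the general shape, not Faroo's actual algorithm (which is undocumented): the function names, the visit-count ranking, and the hash-based sharding are all assumptions made for illustration.

```python
import hashlib
from collections import Counter, defaultdict
from urllib.parse import urlsplit


def rank_local_visits(visited_urls):
    """Steps 1-2 (hypothetical): count visits per site in one user's history."""
    return Counter(urlsplit(url).netloc for url in visited_urls)


def merge_rankings(per_user_rankings):
    """Step 3 (hypothetical): sum every user's per-site counts."""
    merged = Counter()
    for ranking in per_user_rankings:
        merged.update(ranking)
    return merged


def shard_index(merged, node_count):
    """Step 4 (hypothetical): assign each site to one node by hashing its name,
    so every node stores only a fraction of the total index."""
    shards = defaultdict(dict)
    for site, score in merged.items():
        digest = hashlib.sha256(site.encode("utf-8")).digest()
        node = int.from_bytes(digest[:8], "big") % node_count
        shards[node][site] = score
    return dict(shards)


# Two made-up users with tiny browsing histories.
alice = rank_local_visits([
    "http://example.com/a",
    "http://example.com/b",
    "http://example.org/",
])
bob = rank_local_visits(["http://example.org/x", "http://example.net/"])

merged = merge_rankings([alice, bob])
shards = shard_index(merged, node_count=4)
```

Note that nothing in this sketch stops a user from submitting a fabricated history, so a real system would need some defense against inflated counts.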

~~~
dogma1138
That's pretty accurate and quite scary, mostly because it will start pulling
bios from PornHub now instead of Wikipedia, and that can never end well.

------
magila
It appears they are relying on a combination of not documenting their ranking
algorithm and updating it periodically to prevent spammers from flooding the
system with bogus "attention" signals. I'll leave determining the likelihood
of this approach being successful as an exercise for the reader.

~~~
ljk
So... security through obscurity?

------
mdip
I found this a little problematic:

 _FAROO indexes only pages which are located in the Internet, but no Intranet
pages or HTTPS protected pages_

If it simply can't see HTTPS pages, it'll leave a large chunk of the internet
invisible to the search engine. I understand the reason for this, but it's a
technical limitation they'll have to find a way past to make it useful as more
and more sites encrypt by default.

~~~
dogma1138
It indexes pages you visit. It does some MITM/browser snooping, but at least
it's not intrusive enough to do SSL stripping.

------
wslh
Sorry but where can I try FAROO without downloading an app or connecting to an
API?

~~~
charlieegan3
[http://www.faroo.com/](http://www.faroo.com/)

~~~
daveloyall
Query: _sed split file by pattern_

Result: _No results were found._

~~~
8_hours_ago
_python string_ also has 0 results. It doesn't seem like this is quite ready
yet.

The website also added a new page to my browser history for every character
that I typed in the search box, which is kind of a pain (also, it's slightly
amusing that it added so much to my browser history when the search engine
presumably gives weight to websites that occur frequently in its users'
browser history).

------
jzelinskie
Something similar was posted to HN a while ago[0]. However, it was made
specifically for indexing scientific papers without keeping all the content.
Totally open source and runs in the browser. I feel like a more generic
implementation could be done well, but hasn't been so far.

[0]: [http://juretriglav.si/an-open-distributed-search-engine-for-...](http://juretriglav.si/an-open-distributed-search-engine-for-science/)

------
btown
Sadly, by its very nature, it misses a lot of the "long tail" of rarely visited
sites that Google et al. can crawl.

------
bhouston
We tried using Faroo for an app we developed and its results were fairly
poor, so we had to switch to Bing.

------
curiousjorge
Gene Kan
([https://en.wikipedia.org/wiki/Gene_Kan](https://en.wikipedia.org/wiki/Gene_Kan))
also had the same idea. I think he was working on a distributed peer-to-peer
real-time search engine until he shot himself in front of his computer.

I heard about him when I watched a documentary about Napster.

No documentary about Gene Kan exists. He is virtually unknown but his work was
an important contribution.

