
Why You Should Be Using a Private Search Engine - searchencrypt
https://blog.searchencrypt.com/news/search-encrypt-use-private-search-engine/
======
morinted
The post is titled "why you should be using a private search engine" and yet
there's only a paragraph at the end that's subtitled:

    
    
        Why You Should Use Private Search Engines?
    

Here's the entirety of that section:

    
    
        The truth is, people know that search engines are
        tracking them. Most websites have privacy policies
        which clearly state that they use cookies and other
        means of tracking users. Any in many cases, when we
        are asked to give apps, websites, or products
        permission to track us, we blindly agree. This may
        seem acceptable at first, but it’s problematic if that
        same tracker is following you two years later, when
        you’ve forgotten about it.
    
        Your search engine should be optimized for searching
        the internet, not tracking you once you’ve left. Your
        Google information can be used against you in legal
        cases, even in civil cases like divorces.
    

So if I understand correctly:

1. You might forget about trackers you've accepted.

2. Your search history can be used against you in the court of law.

They couldn't come up with anything better for this ad?

~~~
LV-426
It's pretty funny how anti-tracking they are in their daily submissions to HN,
considering how many resources their page loads from external sites: Google,
Facebook, Medium, Twitter, LinkedIn, etc.

------
ChuckMcM
I don't disagree with the premise, but given that nobody other than Microsoft,
Google, Yandex, and Baidu can afford to build the actual 'engine' part of a
search engine, you're pretty much stuck with your queries ending up at one of
those four anyway.

Masking front ends are OK, but they are not any more effective than, say, using
Tor to route your search requests.

Today, the suggestion that you should use a private search engine is, in my
mind, equivalent to saying you should only use cash to buy things. When someone
has a panoptic view of both digital and commercial activity, a reconstruction
of what you have been doing or thinking, built through metadata analysis,
becomes quite difficult to avoid.

At this point, having your browser spam the engine with other searches as a
fuzzing technique would be effective.
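One way to picture that fuzzing idea: each real query is sent alongside a few decoys, in random order, so someone reading the query log can't easily tell which one you meant. This is only an illustrative sketch; the decoy list and the `fuzzed_batch` helper are hypothetical, and it is not how any particular tool does it.

```python
import random
from typing import Optional

# Hypothetical decoy pool; a real tool would need far more varied, realistic queries.
DECOY_POOL = [
    "weather tomorrow", "banana bread recipe", "used bikes near me",
    "how tall is mount everest", "python list comprehension",
    "cheap flights to lisbon", "best hiking boots", "tax filing deadline",
]

def fuzzed_batch(real_query: str, n_decoys: int = 3,
                 rng: Optional[random.Random] = None) -> list[str]:
    """Return the real query plus n_decoys distinct decoys, shuffled together."""
    if rng is None:
        rng = random.Random()
    candidates = [q for q in DECOY_POOL if q != real_query]
    batch = rng.sample(candidates, n_decoys) + [real_query]
    rng.shuffle(batch)  # the real query's position in the batch is random
    return batch

# Each string in the batch would be sent as a separate search request.
batch = fuzzed_batch("divorce lawyer fees", n_decoys=3, rng=random.Random(42))
print(batch)
```

A small, fixed decoy pool like this is easy to filter out statistically; anything practical would need decoys that resemble real users' queries and realistic request timing.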

I suggested (only half joking) that a fundable startup might be a system where
subscribers send in their license plate number and, for $10/month or so,
receive an envelope every month with one, two, or three other license plate
symbols printed on magnetic material to stick on their car. It doesn't have to
_look_ like a license plate to you and me, only to an automated license plate
reader, which leaves a lot of room for avoiding state laws about having
multiple plates.

The idea is that members' plates would be thoroughly mixed in with a bunch of
false hits in license plate reader databases everywhere, making that data
impractical for your enemies to use.

~~~
hux_
Offline search is the obvious alternative. Every cell in your body has its own
data and search engine. It doesn't need a Google, Microsoft, Yandex, or Baidu.
How come?

Downloading a dump of Wikipedia or Stack Overflow and indexing it however you
like on midrange hardware is trivial today. Look at what Kiwix or Zeal (an
offline documentation browser) do today. It's just a matter of time before
people start packaging up data and indexes customized to your information
needs, downloadable like a song from iTunes.

~~~
ChuckMcM
Having built a search engine, I can tell you that your minimum index size is on
the order of 5 billion documents. You can backfill long-tail searches with one
of the big players; to minimize hits on the backfill, 10 billion documents are
better. To get 10 billion high-quality documents you will want a “crawl
frontier” of probably 500 billion URIs, which, given the power-law
distribution of the data, means you will need to actually crawl about 20% of
those every 7 days.
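To put those numbers in perspective, here is the back-of-envelope arithmetic using only the figures in the comment (a 500 billion URI frontier, about 20% re-crawled every 7 days). It ignores retries, politeness delays, and failed fetches, and it gives an average rate rather than a peak: 100 billion fetches a week works out to roughly 165,000 fetches per second, sustained.

```python
# Back-of-envelope crawl rate, using only the figures quoted in the comment.
FRONTIER_URIS = 500_000_000_000   # "crawl frontier" of ~500 billion URIs
RECRAWL_PERCENT = 20              # ~20% of the frontier is re-crawled...
RECRAWL_PERIOD_DAYS = 7           # ...every 7 days

seconds_per_period = RECRAWL_PERIOD_DAYS * 24 * 60 * 60       # 604,800 s
fetches_per_period = FRONTIER_URIS * RECRAWL_PERCENT // 100   # 100 billion fetches
fetches_per_second = fetches_per_period / seconds_per_period  # average, not peak

print(f"{fetches_per_period:,} fetches per {RECRAWL_PERIOD_DAYS} days")
print(f"~{fetches_per_second:,.0f} fetches per second")
```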

Of course, you can hold far fewer pages in your index if you are not
particularly broad in your searches. The gotcha there is that when you need
that thing you don’t normally need, you either have to go online or go
without.

Not impossible of course, just that the scale may be larger than you expect.

~~~
hux_
The value of the long tail has been oversold I feel.

My estimate, based on my own usage over the past year and a half or so of
curating my own local indexes, is about 30-40 million docs. This is the
equivalent of having your own personal Library of Alexandria (text and images,
no video). I don't have numbers, but I work offline a lot and my guess is that
70-80% of my queries are probably being satisfied by my local dumps.

~~~
ChuckMcM
Not disagreeing with you, I was just trying to connect that the index size is
a function of the breadth of information.

For example, I have digitized roughly 300 volumes (books), which collectively
represent about 100,000 pages of information. That collection, which
represents a big chunk of my originally print reference library, produces a
relatively small n-gram index of about 22 GB. But it is a small sample of
generally available reference material, and it doesn't include the 22 years of
digitized Scientific American articles (which are much harder to parse for
indexing when starting from the PDF form). Still, it answers a lot of reference
queries quickly and accurately for things I am interested in. For things that
I become interested in but haven't yet started curating references for, it's
worthless.
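The comment doesn't say how that n-gram index is built. As a rough illustration only, a minimal word-bigram inverted index could look like this; the class name, the lowercase `\w+` tokenization, and the sample document ids are all assumptions, and a real index over 100,000 pages would live on disk rather than in a Python dict.

```python
import re
from collections import defaultdict

def word_ngrams(text: str, n: int = 2) -> set[tuple[str, ...]]:
    """Lowercase, split into word tokens, and return the set of word n-grams."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

class NgramIndex:
    """Hypothetical in-memory inverted index: n-gram -> ids of documents containing it."""

    def __init__(self, n: int = 2):
        self.n = n
        self.postings: dict[tuple[str, ...], set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        for gram in word_ngrams(text, self.n):
            self.postings[gram].add(doc_id)

    def search(self, query: str) -> set[str]:
        """Return ids of documents that contain every n-gram of the query."""
        grams = word_ngrams(query, self.n)
        if not grams:
            return set()
        result = None
        for gram in grams:
            docs = self.postings.get(gram, set())  # .get avoids inserting empty keys
            result = set(docs) if result is None else result & docs
        return result

# Made-up page ids standing in for "volume/page" locations.
idx = NgramIndex(n=2)
idx.add("vol12-p34", "The resistor color code encodes resistance values.")
idx.add("vol07-p10", "Color code charts for capacitors and resistors.")
print(sorted(idx.search("color code")))           # → ['vol07-p10', 'vol12-p34']
print(sorted(idx.search("resistor color code")))  # → ['vol12-p34']
```

Requiring every bigram of the query is only a candidate filter: it doesn't guarantee the bigrams appear next to each other in the right order, so exact phrase search would still need to check word positions.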

As a result, my experience is that the closer I get to my long-term interests,
the more likely I am to find something in my library to answer the question.
Things that are more temporal (news, new research) are generally not there at
all, and things that are only now of interest are similarly not represented.
What search engines do so remarkably well is cover a very wide swath of
interesting material preemptively.

To host that locally would be a more significant effort for me.

~~~
hux_
Ah, full-text search... sorry, my bad. I have been referring to much simpler
indexes, mostly all under 1 GB, that fit in memory. For Stack Overflow, for
example: title, URL, and tag indexes. For Wikipedia: title, URL, category, and
geotag indexes. This has been working out somehow for my general use. It has
the feeling of working inside a library with card indexes, and a lot of decent
work can still get done. I agree with your points on why the bigbois are
valuable and relevant when it comes to temporal and constantly changing info.
But what I am finding (probably unique to my use cases) is that I have enough
info on disk to keep me occupied and productive for long periods totally
offline.
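The "card index" approach described here can be sketched as a tiny in-memory catalog over (title, URL, tags) records. This is a hypothetical illustration, not hux_'s actual setup: the `Record` and `CardCatalog` names, the word-level title index, and the example.com URLs are all made up for the example.

```python
import re
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Record:
    title: str
    url: str
    tags: list[str] = field(default_factory=list)

class CardCatalog:
    """Small in-memory title-word and tag indexes over (title, URL, tags) records."""

    def __init__(self) -> None:
        self.records: list[Record] = []
        self.by_title_word: dict[str, set[int]] = defaultdict(set)
        self.by_tag: dict[str, set[int]] = defaultdict(set)

    def add(self, rec: Record) -> None:
        rid = len(self.records)  # record id = position in self.records
        self.records.append(rec)
        for word in re.findall(r"\w+", rec.title.lower()):
            self.by_title_word[word].add(rid)
        for tag in rec.tags:
            self.by_tag[tag.lower()].add(rid)

    def lookup(self, words: str = "", tag: Optional[str] = None) -> list[str]:
        """URLs of records whose title contains every word and, optionally, carry the tag."""
        candidates = set(range(len(self.records)))
        for word in re.findall(r"\w+", words.lower()):
            candidates &= self.by_title_word.get(word, set())
        if tag is not None:
            candidates &= self.by_tag.get(tag.lower(), set())
        return [self.records[i].url for i in sorted(candidates)]

cat = CardCatalog()
cat.add(Record("How to reverse a list in Python", "https://example.com/q/1", ["python", "list"]))
cat.add(Record("Reverse a string in JavaScript", "https://example.com/q/2", ["javascript", "string"]))
cat.add(Record("Python list vs tuple", "https://example.com/q/3", ["python", "tuple"]))
print(cat.lookup("reverse"))             # → ['https://example.com/q/1', 'https://example.com/q/2']
print(cat.lookup("list", tag="python"))  # → ['https://example.com/q/1', 'https://example.com/q/3']
```

Storing only titles, URLs, and tags (rather than full text) is what keeps indexes like this small enough to sit in memory.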

------
jccalhoun
I couldn't find where they get their results. I searched for my real name,
which has a hyphen, and it didn't find anything for me, whereas Google, Bing,
and DuckDuckGo do. Instead it found a bunch of results for a baseball player
whose name is part of mine, plus a couple of low-quality links for things like
"______ arrest records" and other junk. I don't think I will switch.

~~~
jgillich
For any search term I enter, the first 5 results are ads and only one actual
result fits on my screen. What the actual fuck.

------
boomboomsubban
This is spam, probably for malware. They give no reason to trust them, their
site is a series of blogposts that only contain inane promises, and they're
owned by a company who developed "Adverify," which seems to be some service to
see how well your ads are getting past blockers.

------
anigbrowl
This seems like a press release. What's the difference from something like
DuckDuckGo other than branding?

~~~
tmikaeld
They seem to do the following according to their FAQ[1]:

- "We utilize the latest encryption technologies, including a feature known
as Perfect Forward Secrecy which goes a step further than traditional SSL by
using a unique public key for each individual session"

- "Even server logs are disabled to ensure that any identifiable information
your browser may be broadcasting with requests are never read or stored on
our servers"

- "On top of all that we utilize an extra layer of query encryption at the
client side in order to ensure that your history remains private from other
users who may access your computer"

[1]: https://www.searchencrypt.com/about/faq

~~~
greglindahl
The first 2 features are pretty common in privacy-focused search engines.

The 3rd involves a closed-source browser extension doing who-knows-what... and
who's it been audited by?

------
wafflesraccoon
I really love DuckDuckGo too. While I find its search results can be off
sometimes (e.g., searching for React gives YouTube reaction videos before the
JS library), its bang search feature rocks.

------
shafyy
Why the shit is this even being discussed on HN? This is clearly self-
promotion and should be banned.

