
Human Web – Data Collection Without Privacy Side-Effects - pythux
https://0x65.dev/blog/2019-12-03/human-web-collecting-data-in-a-socially-responsible-manner.html
======
dessant
I'm wary of companies that insist so much on collecting user data, especially
when it's done at such a granular level. I keep wondering what Mozilla had to
gain from associating themselves with Cliqz.

[https://en.wikipedia.org/wiki/Cliqz#Integration_with_Firefox](https://en.wikipedia.org/wiki/Cliqz#Integration_with_Firefox)

~~~
tschakkaMarc
Hi – thanks for the feedback. I’m Marc (disclaimer, I work at Cliqz). The goal
of the collaboration was to jointly build a better, more private search
engine. Don’t forget that every major browser today sends every keystroke of
the Omnibar to either Google or Bing. No privacy mechanism in place (and don’t
even get me started on all the tracker madness). We wanted to jointly replace
that. Cliqz is building a new search from ground up with privacy by design. In
the end the collaboration didn’t work out (for many reasons, lack of privacy
was not one of them though). Firefox changed back to Google as search
provider. Coming back to your point: There are many services that can’t be
built without data. Search is one of them, without data you will have a very
bad search engine, impossible to compete. We explain this in detail here:
[https://0x65.dev/blog/2019-12-02/is-data-collection-
evil.htm...](https://0x65.dev/blog/2019-12-02/is-data-collection-evil.html) .
We apply maximum scrutiny ourselves, and this article about Human Web is there
precisely to explain how we collect the data that is needed without the side
effect of collecting personal data. We are this transparent because we want
the scrutiny. Our business does not depend on collecting personal data, or
actually any data, but our product needs a lot of data. Denying everyone the
ability to collect data (even those who are open, transparent, and have no
interest in personal information) just means you support the incumbents, who
have no interest in privacy.

~~~
dessant
Is there a detailed writeup on the Human Web proxy network, specifically on
the data transmission? It would be interesting to see how it prevents Cliqz
and proxy server operators from learning the user's IP address. Was Tor
evaluated as an alternative for data transmission?

~~~
philippclassen
(disclaimer: I work at Cliqz) Not yet, but I'm literally working on it (or
rather taking a break from working on it).

Short answer: The proxy will see your IP but does not share that information
with us. To prevent the proxy from reading your content, we need to encrypt
the communication end-to-end (and prevent statistical attacks based on the
size of the encrypted data and so on).
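
To illustrate the point about message sizes, here is a minimal Python sketch of padding a payload to a fixed size class before it is encrypted, so that an observer such as the proxy only learns the bucket and not the exact length. The bucket sizes and the `pad`/`unpad` helpers are assumptions made up for this example, not the actual HPN format, and the encryption step itself is left out.

```python
import struct

# Hypothetical size classes in bytes; not the values HPN actually uses.
BUCKETS = (256, 1024, 4096, 16384)


def pad(payload: bytes) -> bytes:
    """Prefix the payload with its length and pad it up to the next bucket."""
    framed = struct.pack(">I", len(payload)) + payload
    for size in BUCKETS:
        if len(framed) <= size:
            return framed + b"\x00" * (size - len(framed))
    raise ValueError("payload too large for any bucket")


def unpad(padded: bytes) -> bytes:
    """Recover the original payload using the 4-byte length prefix."""
    (length,) = struct.unpack(">I", padded[:4])
    return padded[4:4 + length]


# The padded bytes, not the raw payload, would be handed to the
# end-to-end encryption step before being sent through the proxy.
short = pad(b'{"q": "weather"}')
longer = pad(b"x" * 200)
print(len(short), len(longer))
```

Both messages end up with the same length, so their sizes alone no longer distinguish them.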

Regarding Tor: Yes, we experimented with sending through Tor. The main issue
is that our code needs to run in a WebExtension, which is a restricted
environment. You can only use WebSocket communication but no real TCP sockets.
The next blog post in the series has more information and has a link to the
code of our experimental Tor client (the native Tor client compiled to
WebAssembly to be used in a WebExtension).

I hope the post will address your open questions. If not, you can ask tomorrow
about the details of HPN.

~~~
philippclassen
[https://0x65.dev/blog/2019-12-04/human-web-proxy-network-
hpn...](https://0x65.dev/blog/2019-12-04/human-web-proxy-network-hpn.html)

There is also a new thread about it:
[https://news.ycombinator.com/item?id=21704850](https://news.ycombinator.com/item?id=21704850)

------
lamkhanhsong
This is the 3rd post from the advent.

day 1: Why the world needs more search engines
[https://www.0x65.dev/blog/2019-12-01/the-world-needs-
cliqz-t...](https://www.0x65.dev/blog/2019-12-01/the-world-needs-cliqz-the-
world-needs-more-search-engines.html)

day 2: Is data collection evil [https://www.0x65.dev/blog/2019-12-02/is-data-
collection-evil...](https://www.0x65.dev/blog/2019-12-02/is-data-collection-
evil.html)

------
chrmod
Ask HN: can you point to another company that describes why and how it
collects data to this extent?

------
dessant
There is an even better solution: do not collect _any_ data, especially when
your customers are paying for your product.

If your business model involves subsidizing product prices with the prospect
of tracking users and collecting personal data, consider releasing a version
of your product that costs more, but does not engage in any data harvesting.

~~~
gfodor
For a given product requirement, you have a set of potential reifications
which include their data impact. There's not a zero or one dichotomy as you
suggest. You can cut requirements, but all the ones that remain will have
varying reliance upon data collection. The goal should be to minimize that
impact to its least possible surface area while still delivering the same user
value. It's especially toxic to pre-emptively collect data for unknown, or
conjectured later requirements. YAGNI applies.

~~~
__ka
We could not agree more. We tried our best to make the case that the
data-or-privacy dichotomy is a false one, especially in the context of
building a search engine - one that is competitive and independent. Our rules
are: no personally identifiable information (and even minimizing the room for
probabilistic attacks), and keeping the data collected to the bare minimum.
This is what we use ourselves and what we want our families to use. We care.

It does come with challenges though.

1. It requires a change of mindset by the developers.

2. Processing and mining data implies that code be deployed and run on the
client-side.

3. The data collected might not be suitable to satisfy other use cases.
Because data collected has been aggregated by users, it might not be reusable.

4. Aggregating past data might not be possible as the data to be aggregated
may no longer be available on the client.

However, these drawbacks are a very small price to pay in return for the peace
of mind of knowing that the data being collected cannot be transformed into
sessions with uncontrollable privacy side-effects.
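
As a rough illustration of what "cannot be transformed into sessions" means in practice, here is a small Python sketch in which the client turns its local observations into standalone messages that carry no user ID, session ID, or timestamp. The message shape and the `build_messages` helper are hypothetical, chosen for this example only; the message contents themselves can still leak identity, which this sketch does not address.

```python
import json


def build_messages(observations):
    """Turn local (query, landing_url) pairs into standalone messages.

    Each message has exactly the same three fields and nothing that
    identifies the sender, so two messages from the same client share
    no explicit field a receiver could join on.
    """
    messages = []
    for query, url in observations:
        record = {"type": "query-landing", "query": query, "url": url}
        messages.append(json.dumps(record, sort_keys=True))
    return messages


# Data that never leaves the client in this form.
local_observations = [
    ("python json sort keys", "https://docs.example/json"),
    ("weather berlin", "https://weather.example/berlin"),
]

for message in build_messages(local_observations):
    print(message)
```

Compare this with a server-side log where every row carries a user identifier: there, grouping by that identifier reconstructs the session, whereas here there is no such column to group by.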

Disclaimer: I work at Cliqz (some of the comment comes from the article
itself).

~~~
__ka
There was a reply to the parent comment (mine) that appears to have been
removed, but we believe it still deserves to be addressed.

\--------------

REMOVED COMMENT:

\--------------

Is data collected by Cliqz Browser and your search engine used in any way to
provide the MyOffrz marketing service?
[https://myoffrz.com/en/](https://myoffrz.com/en/)

Edit: according to the privacy policy the browser and the data collected
through it is indeed used to deliver targeted ads.

[https://cliqz.com/en/privacy-browser](https://cliqz.com/en/privacy-browser)

This is the problem, we do not need any more ad companies that care about our
privacy. We need browser vendors that are in the primary business of creating
browsers, and possibly charging for advanced features.

We need search engines that do not collect our entire browsing history and how
we interact with sites on the pretense that it's needed to deliver better
search results, while their business model actually revolves around using the
same data to deliver targeted ads.

Cliqz could very well collect data and use it only to make search better, and
release paid search products for businesses and customers, but they chose to
make their users the products.

I had doubts about why people were so upset about your company, but now I see
it. Cliqz is essentially capitalizing on the growing privacy movement to market
itself, uses subpar technical solutions to ensure data privacy (routing
sensitive data through FoxyProxy), and in the end delivers the same old
service: our data is harvested, and we are delivered targeted ads.

\-----

REPLY:

\-----

You're cherry-picking and taking things out of context:

> "the data collected is used to deliver targeted ads"

That is not stated anywhere (e.g. in the linked privacy policy), because it
would be false. We do targeted ads, but the targeting is done on the
>>client<<, which is the only place that has your history information. It's
not possible to do targeting outside the browser itself because, unlike the
rest, we do not have this information.
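
A minimal Python sketch of what targeting on the client could look like: every client receives the same catalogue, and the matching against the history happens locally, so the history never has to be sent anywhere. The catalogue format and the `select_offers` helper are assumptions invented for this illustration, not the actual MyOffrz implementation.

```python
def select_offers(catalogue, local_history_domains):
    """Return the offers whose trigger domain appears in the local history.

    Runs entirely on the client. Only the catalogue crosses the network,
    and it is identical for every user.
    """
    visited = set(local_history_domains)
    return [offer for offer in catalogue if offer["trigger_domain"] in visited]


# Hypothetical catalogue, downloaded unchanged by every client.
catalogue = [
    {"id": "offer-1", "trigger_domain": "shoes.example"},
    {"id": "offer-2", "trigger_domain": "travel.example"},
]

# Local history; it is read here but never transmitted.
history = ["news.example", "travel.example"]

print([offer["id"] for offer in select_offers(catalogue, history)])
```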

> you collect browser history

Also incorrect. We collect URLs, but in isolation; we never have the full
history. In the article we discussed at length why this is dangerous, so of
course we would not do the same.

> Uses subpar technical solutions to ensure data privacy (routing sensitive
> data through FoxyProxy)

There is a nice short reply here
[https://news.ycombinator.com/item?id=21696963](https://news.ycombinator.com/item?id=21696963),
but just in case please have a look at the paper:
[https://arxiv.org/abs/1812.07927](https://arxiv.org/abs/1812.07927).

> Cliqz could very well collect data and use it only to make search better,
> and release paid search products for businesses and customers, but they
> chose to make their users the products.

Everything we do is privacy preserving, the business model is no exception.
We've been experimenting with paid products too (see:
[https://www.lumenbrowser.com/en/](https://www.lumenbrowser.com/en/)),
but it seems to me that if you had found that, you'd say "Yeah, but you want
me to give credentials" and suggest we live on donations instead. We need
to be skeptical, and we thank you for that, but also be constructive and
reasonable. Preventing anyone from improving the web only helps companies that
do not care about privacy.

> Cliqz is essentially capitalizing on the growing privacy movement to market
> itself.

We do not do it for marketing. We have been building products with privacy in
mind since 2014, back when privacy was even more niche than it is today (from
Day 2: Is data collection evil? [0]).

We understand that whenever people see data collection they assume it's for
the worse, and whenever they see ads, doubly so. In a way, we are victims of
the wrongdoings of those who have come before us. But we are trying to build
good products while maintaining people's privacy. If you don't trust us, you
can verify: we are transparent about what we send and how we do it. If you
still don't like us, well, we cannot please everybody, but please do not
accuse us of doing things that we are actively fighting against. Unlike many,
we do not just claim that the world is wrong; we are trying to change it.

[0] [https://0x65.dev/blog/2019-12-02/is-data-collection-
evil.htm...](https://0x65.dev/blog/2019-12-02/is-data-collection-evil.html)

