
Show HN: I made a neural net that analyzes privacy policies - rameerez
https://useguard.com
======
rameerez
Hi guys!

So as the basis for my thesis on AI and NLP I've been working on an RNN-based text
classifier that basically reads and analyzes privacy policies. It understands
that "we don't share your data with third parties" is privacy friendly while
"we may share your data with anyone" is a potential threat.

I've then created this website with a bunch of analyzed services to showcase
the most relevant info about each service along with other interesting stuff
like recent data breaches or instructions to delete your account in said
service.

Happy to answer Qs about the tech behind it. It'd also be great to hear your
feedback on what the site lacks and possible improvements!

~~~
thecleaner
Oh my god thank you so much for doing this!

I think it's better (for me as a user) if you don't boil things down to a score
as different people expect different things when talking privacy. It would
help if you could simply highlight the potential problematic clauses in
different privacy statements along with some reason why it might be
problematic.

~~~
nickodell
I don't agree. I just looked in my password manager, and I have roughly 220
accounts across the web. If I want to go through that list and see which
websites rank well and which rank poorly, and I want to do that in under two
hours, that gives about 30 seconds per service.

In other words, giving a single score plus a two-sentence highlight is
probably about the right amount of information.

~~~
m-p-3
Or make the rank adjustable to some personal criteria that matches different
privacy expectations.

~~~
drusepth
This would also be helpful in determining how to weight (or not) user feedback
in the training portion. I just tried it out (the 10 questions) and there were
at least a few I thought, "huh, I know some others would disagree with me on
this" because I value X and they value Y more.

Having scores that weight X more than Y would give me more accurate scores,
while seemingly also giving other people more accurate scores at the same
time.
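
A minimal sketch of what such adjustable weighting could look like, assuming each service already has per-category scores between 0 and 1. The category names (`dataSharing`, `retention`), the weights, and the function `personalScore` are hypothetical and say nothing about how Guard actually computes its score.

```typescript
// Hypothetical per-category scores in [0, 1]; higher means more privacy friendly.
type CategoryScores = Record<string, number>;

// Weighted average of the category scores, using one user's personal weights.
// Categories with no score or a non-positive weight are skipped.
function personalScore(scores: CategoryScores, weights: Record<string, number>): number {
  let total = 0;
  let weightSum = 0;
  for (const category of Object.keys(weights)) {
    const weight = weights[category];
    const score = scores[category];
    if (score === undefined || weight <= 0) continue;
    total += score * weight;
    weightSum += weight;
  }
  return weightSum === 0 ? 0 : total / weightSum;
}

// Same service, two users with different priorities (made-up numbers).
const service: CategoryScores = { dataSharing: 0.2, retention: 0.8 };
const sharingFocused = personalScore(service, { dataSharing: 3, retention: 1 });
const retentionFocused = personalScore(service, { dataSharing: 1, retention: 3 });
console.log(sharingFocused < retentionFocused); // true
```

The same underlying labels produce a lower score for the user who weights data sharing heavily and a higher one for the user who cares more about retention, which is the effect described above.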

------
eleen
Hi, I like what you did. I am a trained German lawyer working with NLP at a
Computer science faculty. I would like to talk about this with you, as I am
very interested in this topic. Also, I've been working for the German Data
protection Agency...

------
foxes
The "game" (training) that asks you to analyse "privacy threats" is a bit
strange. It feels like it takes two random excerpts from a privacy policy and
asks you to compare them, but with this, it feels like it is missing some
global information, you are just looking at local details.

Eg one policy might be disclosing what they do (but it's actually relevant to
collect data, eg password manager) while the other just says "no we don't
collect anything". In this case one feels like it's a better option, but it's
not exactly the same situation, it's missing some context. I feel like this
could potentially bias ratings.

I'm not sure if you could add in extra information with some of that global
information, eg the type of service, classifying different "parts" of the
privacy policy etc.

~~~
raynr
Yeah, it's not comparing like for like. Feels like the system is trying to
collect training data from users.

To elaborate on your last sentence, context is critical in assessing whether a
clause is pro- or anti- privacy. Is the collection of information critical to
the provision of the service? What is collected, and how much? And so on.

~~~
mceachen
Agreed with parent and gp. I gave up with the 10 questions, as the sentence
comparisons were almost comically incomparable. I fear your model is going to
be a random number generator.

------
1f60c
I’m trying to help teach the AI, but some options don’t have to do with
privacy at all.

For instance, I got (option A):

      The Games Press Web site can, optionally, store a Cookie on your computer in order to automatically log you into the site on each visit.

and option B:

      Back to Top ^ NO THIRD-PARTY BENEFICIARIES There shall be no third-party beneficiaries to this Agreement.

It would be great to have a skip/flag option for cases like these.

Edit to add:

Some other notes:

      - Mozilla, a tech company I consider ethical, is right down there with Netflix, LinkedIn and Waze
      - The box under “Sentence Breakdown by Risk Level” is empty when my ad blocker is enabled (Adguard on iOS Safari)
      - Telegram, a company I also consider ethical, has a score of 105%—is this an oversight?

~~~
quickthrower2
And sometimes you get two privacy friendly policies and I'd like to say
"equal".

After a few of those I picked one at random and then it put me onto some kind
of 'Game' which made me feel they were trying to train me instead of vice
versa. The game didn't respond to clicks so I closed the app.

------
the_pwner224
I went to the homepage and ctrl-clicked a company card (Mozilla) to open the
details in a new tab. Instead, the site hijacked my ctrl-click and tried
to navigate to the link in the same tab. Middle-clicking does not work at all.

It then went to an error page instead of loading the details for Mozilla, but
while it's an interesting idea I'm not sure how useful it is. I don't usually
create an account on a website unless I have to do so, and the privacy policy
is nonnegotiable. So why would I want to check what their privacy policy is?

I only give personal data to websites when I have to (e.g. to services that
work or school use) or if I already trust the company to not do anything shady
with it (Mozilla has done some sketchy stuff but I believe they won't leak my
passwords).

And for websites where participation is more optional, like HN or Reddit, you
don't usually need to give much personal data anyway.

Edit: the website is fully working now. Mozilla has had one security breach
where emails and hashed passwords were leaked, in 2014. At the bottom the
sentence breakdown is 2.5/12/22% concerning/mild/friendly. Meanwhile Reddit
has no breaches, but keeps your messages forever and shares data with ad
companies. Their sentence breakdown at the bottom is 3.3/23/9%. Overall, the
AI rates Mozilla at 33% and Reddit at 41%. That doesn't really make sense to
me.

I would really like to see more details about the privacy policy sentences on
the website. If 2.5% of Mozilla's privacy policy is very concerning and 12% is
mildly bad, I would like to see the actual sentences to know the risks. There
is a button to view the full annotated policy, but clicking it says to send an
email to you. Edit: this seems like a bug, it shows a few sentences in a
WebKit-based browser [Falkon] but in Firefox it just shows the chain link
icons.

Finally, I took the A/B test from Guard, and quite a few A/B choices seemed
to not really have anything to do with privacy. If the dataset is kept the
same, then I think a different test format would be to rate each A and B
snippet as:

- Not about privacy

- Good for privacy # only if the other one is not about privacy

- Bad for privacy # only if the other one is not about privacy

- Better than [A/B] # only if both are about privacy
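
One reading of the format above, sketched as a type plus a function that decides which options to offer for a snippet. The names `Rating` and `allowedRatings` are hypothetical, and this is only an interpretation of the proposal, not anything Guard implements.

```typescript
// The four proposed labels for a single snippet.
type Rating =
  | "not_about_privacy"
  | "good_for_privacy"
  | "bad_for_privacy"
  | "better_than_other";

// Options offered for one snippet, given whether the other snippet in the
// pair is about privacy. "Better than the other" only makes sense when both
// snippets are about privacy; otherwise the snippet is judged on its own.
function allowedRatings(otherIsAboutPrivacy: boolean): Rating[] {
  if (otherIsAboutPrivacy) {
    return ["not_about_privacy", "better_than_other"];
  }
  return ["not_about_privacy", "good_for_privacy", "bad_for_privacy"];
}

console.log(allowedRatings(false).join(", "));
// not_about_privacy, good_for_privacy, bad_for_privacy
```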

Anyway, the data itself may be somewhat useful to me if I want to learn more
about a company's privacy practices. But for normal people, I think it would
be helpful for the website to also explain why privacy is important and why
people should care.

~~~
simongr3dal
More and more websites are so advanced that they can’t even use an <a> tag
anymore. Instead they do some convoluted onclick-scripting that breaks all
standard behavior and accessibility functionality.

~~~
gempir
I think this is mostly a problem of the tools used. An anchor tag by
definition is an inline element, so it shouldn't really be a giant box that's
clickable, so you default back to an onClick.

onClick in popular frameworks just means left click and nothing more, which
makes sense, except for that use case of opening in a new tab with middle
mouse button or right clicking etc. So you have to add a lot of logic to
support all that.

If you break the html spec and make an anchor tag a block element then you
have to deal with catching and stopping the event from it otherwise it would
work as a normal link but you actually just want to change state in your JS
app.

So I think tools like Angular, React, Vue etc. should get a better way to
create links on a website that just change state.
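
The extra logic mentioned above usually boils down to one check: only take over a plain primary-button click and leave everything else to the browser. A minimal sketch, assuming a real `<a href>` stays in the markup; the interface `ClickLike` and the function `shouldHandleInApp` are hypothetical names, and the fields mirror the standard DOM `MouseEvent` properties.

```typescript
// Minimal subset of the DOM MouseEvent fields needed for the decision.
interface ClickLike {
  button: number; // 0 is the primary (usually left) button, 1 the middle button
  ctrlKey: boolean;
  metaKey: boolean;
  shiftKey: boolean;
  altKey: boolean;
  defaultPrevented: boolean;
}

// Returns true only for an unmodified primary-button click that nobody else
// has already handled. Anything else (ctrl/cmd-click, shift-click, middle
// click) should fall through to the browser's normal link behavior.
function shouldHandleInApp(event: ClickLike): boolean {
  if (event.defaultPrevented) return false;
  if (event.button !== 0) return false;
  if (event.ctrlKey || event.metaKey || event.shiftKey || event.altKey) return false;
  return true;
}

// In a browser this would be used roughly as:
//   link.addEventListener("click", (e) => {
//     if (!shouldHandleInApp(e)) return; // browser opens the new tab or window
//     e.preventDefault();
//     navigateInApp(link.href);          // hypothetical router call
//   });
```

Because the element is still a real link with an `href`, right-click menus and "open in new tab" keep working without any additional code.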

~~~
mcbits
Styling an anchor as a block element has never violated the spec, and HTML5
deliberately added support for wrapping anchors around block elements because
people had been doing that anyway, even though the browser wasn't required to
make it work as intended (but it now is).

------
epoch_100
Hi. A related project that takes a more human-powered approach is PrivacySpy
([https://privacyspy.org](https://privacyspy.org)). Would be neat to see how
these tools intersect.

PrivacySpy is open source, community run, and more about grading policies on a
standardized rubric (as opposed to entrusting that to ML), so these tools
might complement one another.

(Full disclosure: I'm a contributor to PrivacySpy.)

~~~
jchook
Another one: [https://tosdr.org/](https://tosdr.org/)

~~~
epoch_100
ToS;DR is great, although it's more focused on terms of service so if you're
looking for privacy-only info, you'll have to cut through a bit of noise.

------
greggman2
Analyse LastPass. Even without ML it's clearly saying they spy on everything
you do and share it with anyone they want. I'm surprised they get recommended
so often given their privacy policy.

[https://www.logmeininc.com/legal/privacy](https://www.logmeininc.com/legal/privacy)

------
carbocation
Hmm, I was trying to go through your A/B options but it didn't seem to
register any click. So I started clicking repeatedly. Then, it processed those
clicks on the first 7 items, giving you bad data. FYI.

~~~
rameerez
This just hit the frontpage so the server might be a bit overloaded, I'm
sorry. Trying to resize resources right now. Thanks for the heads up, in the
long term (ideally) noise shouldn't be a huge problem in a sufficiently large
dataset (or at least I'm already expecting some noise haha)

------
tboyd47
Hi there. This is really neat. It reminds me of a talk I listened to recently
on digital privacy, where the guy was using the price of "privacy products" as
a way to measure how much people value their privacy. This seems like it would
be one of those.

Did you find that there's one single variable, like length or the presence of
certain words, that the system relies on heavily?

------
the_watcher
This is interesting, but I wonder what a boilerplate privacy policy would
score. Given the clustering of scores between 30 & 50% for what read like "our
lawyers pulled the standard policy and billed us for 4 hours of minor tweaks",
it seems like some of the most effective privacy advocacy would come by
challenging the most common "dangerous sentences" in court.

------
BrS96bVxXBLzf5B
This is a good tool, but the execution of the website is disappointing.

1. Only showing excerpts of the highest threat levels. Trying to view the
less severe threats asks us to email in. If you're willing to volunteer the
information, why the hoops?

2. "Play a short game to continue using this tool" ensures I'm not going to
share this with anyone. Putting a stranglehold on users is _never_ the way
forward. I might have volunteered my time if I were at home and browsing
through. I can't when I'm quickly flicking through while taking a five-minute
break from work. But it's left me with a final negative impression
before being unceremoniously blocked off.

------
lab00002
Holy Crap, Tinder shares your profile with potential employers (or rather
companies contracted by those employers).

So glad I gave up on online dating a long time ago.

But really this is amazing.

The next step would be to have a lawyer write a small opinion piece on the
most popular sites.

------
murukesh_s
Cool tech and good use of NLP. But isn't the privacy policy system entirely
broken? It's like driving on unmarked, unpaved roads. Why don't we have a
global template that lists at the beginning a checklist that is quickly
comprehensible to humans, like - Do we share your data: (Y/n)

Etc

Without something like that, the problem is that companies could change the
wording and the neural net would not detect it until trained again, which is
potentially more dangerous!

It's time we get some standard for user web privacy policy docs like GDPR

------
martin__
Hello, if the webmaster for this site is reading this, your `change.org` file
is getting a Content-Type of `application/octet-stream` instead of
`text/html`, which is giving me (in firefox) a prompt to download a file
instead of displaying the page.

The problem is probably with your type mappings:
[https://nginx.org/en/docs/http/ngx_http_core_module.html#typ...](https://nginx.org/en/docs/http/ngx_http_core_module.html#types)
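
A rough illustration of the likely mechanism, assuming the server picks the Content-Type from the file extension and falls back to a default when the extension is unknown: a file named `change.org` has the extension `org`, which is not a web type, so it gets the fallback. The table and the function `contentTypeFor` are hypothetical and only mimic the lookup; this is not nginx code.

```typescript
// Tiny made-up extension table, standing in for the server's type mappings.
const typesByExtension: Record<string, string> = {
  html: "text/html",
  css: "text/css",
  js: "application/javascript",
};

// Looks up the Content-Type by the text after the last dot in the file name,
// returning the fallback when the extension is missing or not in the table.
function contentTypeFor(fileName: string, fallback: string): string {
  const dot = fileName.lastIndexOf(".");
  const extension = dot === -1 ? "" : fileName.slice(dot + 1).toLowerCase();
  if (Object.prototype.hasOwnProperty.call(typesByExtension, extension)) {
    return typesByExtension[extension];
  }
  return fallback;
}

console.log(contentTypeFor("index.html", "application/octet-stream")); // text/html
console.log(contentTypeFor("change.org", "application/octet-stream")); // application/octet-stream
```

If that is what is happening, serving the per-service pages from paths without a misleading extension, or setting the default type for that location to `text/html`, would avoid the download prompt.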

------
olafure
Tinder and Mozilla have the same score of 33%. I don't agree with that. Tinder
willingly shares very, very personal data along with your contacts (presumably
without said contacts' consent).

~~~
scarlac
Keep in mind that it's analyzing their privacy policy, not their actions. Who
a company specifically chooses to share the data with and how often they do it
is likely not considered.

------
michaelaiello
Hello!

[http://www.privacyparrot.com/](http://www.privacyparrot.com/)

I made a similar project many moons ago and it is still kicking along. Thanks all
on HN for the feedback.

~~~
ignoramous
I used it as recently as a month ago! Thanks.

------
R888D0
You're doing incredibly noble work. I've bookmarked the site and will make
great use of it. Thank you for working hard to protect data privacy, I hope
you never stop.

~~~
rameerez
Thank you for these words :)

------
19ylram49
To the creator:

I actually just went through the trouble of resetting my Product Hunt account
(I haven’t been on there in a long time) just to give you that upvote!

Thank you for this! Cheers.

------
nerdponx
This is great, I've been wanting to do a project like this for a while.

Would be great to get some insight into your data collection/labeling and
model design process.

~~~
rameerez
Some of the process on gathering the data to create the labelling dataset is
described here:
[https://useguard.com/experiment](https://useguard.com/experiment)

But most probably I'll be publishing a paper later this year detailing all the
details and process :)

~~~
underanalyzer
Very cool idea! One question I couldn’t seem to find the answer to on your
site: are the policies featured in your A/B training exercise distinct from
the policies that the AI grades? For example, will a user going through your
A/B trainer ever see a snippet from the Instagram privacy policy?

~~~
rameerez
Yes, initially it will all draw from the same dataset, so a user in theory
could definitely see all services' snippets. But, to increase statistical
significance in the data it gathers, I've restricted the initial amount of
items in the test so _right now_ this will not be the case (otherwise, I'd be
dealing with circa 3,500,000,000 different pairwise comparisons hehe)
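
For a sense of scale: the number of distinct unordered pairs among n snippets is n(n-1)/2. The snippet count used below (84,000) is an assumption backed out from the figure quoted above, not a number taken from the site.

```typescript
// Number of distinct unordered pairs that can be formed from n items.
function pairCount(n: number): number {
  return (n * (n - 1)) / 2;
}

console.log(pairCount(84000)); // 3527958000
console.log(pairCount(1000)); // 499500
```

Cutting the pool to a thousand snippets drops the number of possible comparisons from billions to about half a million, which is why restricting the initial set makes it far easier to get several votes per pair.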

------
andrerm
Great work. Thank you. If I may, don't show badly scored results on the first
page. Bad advertising is still advertising, and we all assume they are all bad
anyway. I understand the surprise people have at first but the next step is
finding good apps. Also, sort by grade and search by name are a must.

Again, thank you for your time and work.

Edit: telegram had hacking news this year, search for "Telegram voicemail
account hijacking"

------
brenden2
The need for things like this is partly why I quit my job 6 months ago to
start my own company (in profile). We need to start building companies and
products that provide valuable communities and services (like social networks)
without the need for ads/privacy violations. My belief is that the issue is
largely related to incentive misalignment (users != customers).

Congrats on the launch!

------
VvR-Ox
Though it's nice you built this, I wish we didn't need it, thanks to
governments that do their job and protect the people they were originally
intended to serve.

How long can we keep arming ourselves in the battle against powerful and rich
entities who steal our data and buy politicians to have direct access to the
process of law-making?

------
a2x
Letting people do the training without any knowledge or context seems to
make no sense.

How do you ensure that it isn’t a bot, or an army of bots?

What is the neural network doing?

How do you ensure your personal integrity, that of your team members and the
overall integrity of your ‘system’?

------
nuc
Where is Facebook? I guess that would be one of the most interesting policies
to analyze.

------
Angostura
I'm slightly puzzled that Telegram is scoring more than 100%. What am I
missing?

~~~
rameerez
A bug in the scoring algorithm that I need to fix

------
dastx
Great website. Quick note - your site doesn't function without javascript.
Having enabled it and trawled through it I see no reason to require js to be
enabled. Adding a non-js fallback would be great.

------
29athrowaway
A great idea. Some ideas for features:

- Keep a timeline of privacy policy changes, being able to compare scores
between 2 versions.

- Subscribe to notifications for changes in score.

- Browser extension that shows you the score on the site.

------
A4ET8a8uTh0
I will join the chorus of great work. I love it. It may even make some people
more privacy conscious (very few people read those -- usually the ones who
wrote them).

~~~
rameerez
Thank you! :) I've read some recent research and it looks like this has
actually been measured: only 0.001% of all internet users start reading them
(and an even smaller number likely finish reading them). On top of that, if
you had to read all the privacy policies you accepted in the past 5 years
alone, you would need 3.040 hours of non-stop reading. Crazy. Love your
privacy-oriented username btw! ;)

------
aledalgrande
Love the "biggest threat" for Telegram!

------
jbduler
I love the work, and it is directly applicable to the work I am doing now.
Have you published your thesis?

------
lingrino
I own the domain policies.dev and would be happy to hand it over to this
project if you’re interested.

------
TheUSSR
Great site! Would be nice to have the option to submit scandals as some seem
to be missing.

------
foxhop
what if you took this idea and used it to normalize privacy policies into a
normalized form?!

------
paktek123
I get ERR_SSL_PROTOCOL_ERROR

~~~
rameerez
Some other user reported that same thing this morning but I couldn't find any
explanation for this error. It basically works for everyone except for these two
precise cases. One idea I have is that you might be behind some sort of
firewall that's blocking my website (because in the past either the IP or the
domain got flagged by some antivirus company and now some business networks
block it) – might this be the case?

~~~
jazzyjackson
I'm not an expert but I wonder if this error comes up in the case of the HTTPS
handshake not being able to agree on a protocol -- one side of the transaction
is trying to insist on a crypto protocol that's out of date or a little too
fashion-forward?

Just a thought.
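
A heavily simplified sketch of that idea, assuming each side just has a list of protocol versions it accepts and the connection uses the highest one they share. The function `negotiateVersion` is hypothetical, and real TLS negotiation involves much more (cipher suites, extensions, downgrade protection).

```typescript
// Returns the highest protocol version supported by both sides,
// or null when the two lists have nothing in common.
function negotiateVersion(client: number[], server: number[]): number | null {
  let best: number | null = null;
  for (const version of client) {
    if (server.indexOf(version) !== -1 && (best === null || version > best)) {
      best = version;
    }
  }
  return best;
}

console.log(negotiateVersion([1.2, 1.3], [1.0, 1.1, 1.2])); // 1.2
console.log(negotiateVersion([1.0, 1.1], [1.2, 1.3])); // null
```

The `null` case is the situation described above: one side only offers versions the other refuses, so there is no protocol to agree on and the handshake fails.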

------
artificial
This is really cool, what prompted the idea to combine the two?

~~~
rameerez
Thanks! Last semester I did a bootcamp on Artificial Intelligence and I had to
do a final project. Last year I started becoming concerned about digital
privacy when I discovered Facebook had an updated copy of your phone contacts
including nicknames [1] (which basically means strangers at Facebook know the
names I call my GF). I later found out this was explicitly said in FB's
privacy policy. So when I did the bootcamp and discovered how powerful RNNs
are to model the complexity of the English language I came up with the idea.

[1]
[https://news.ycombinator.com/item?id=16661735](https://news.ycombinator.com/item?id=16661735)

------
the_watcher
Telegram has a 105% score? Is that expected or a bug?

~~~
rameerez
It should be pretty close to 100%, though, they seem to be super privacy
friendly!

~~~
the_watcher
Yep, I read their policy. I just wasn't sure if extra credit existed.

------
quickthrower2
This’d be great for contracts too. E.g. an NDA.

------
nurettin
offtopic: Is it worth using producthunt for games? Or do we concentrate our
efforts on platforms like steam, appstore and play store?

~~~
lucb1e
Why are you posting that to this thread? I know you said off topic, but
usually when someone says that, it's still related: e.g. if the parent comment
mentioned something tangential, or I could imagine if the link was to
producthunt and you wondered about games on that platform... are you hijacking
this thread for a completely unrelated question that you want to ask the HN
audience, or is there a connection I'm missing?

~~~
nurettin
They are on producthunt.

~~~
lucb1e
Ah I missed that top banner, must have automatically scrolled past it. Fair
enough; thanks for responding.

------
privasim
This is amazing! How would you categorize this?

------
verbify
Great work! Would also work as a browser plugin.

~~~
rameerez
This was actually one of the ideas to evolve the project! Another one is
making an app that protects your digital privacy from these threats, kinda
like an antivirus but for privacy threats instead of viruses
([https://useguard.com/blog/future/](https://useguard.com/blog/future/)) Would
love to hear feedback on what this project should become next :)

------
godelmachine
How are you better than Firefox Monitor?

------
helloiloveyou
Very creative project! Congratulations!!

~~~
rameerez
Thank you! :)

------
ignoramous
This is great but like tosdr.org before it, what does it tell people that they
don't already _assume_ to be true? I use tosdr, only to keep ignoring
what it's telling me.

Also, I'm not sure if a majority care about privacy when the value delivered is
super high. They submit to the will of the service provider, as if it was the
cost of doing _business_ without realising they could either look for
alternatives or exercise stricter control over what they share and how [0].

To that end, I like tools that let users take action in addition to showing
what's wrong rather than simply point it out. Actions can include:

- Replace: Push the users towards alternatives and help them seamlessly take
their data elsewhere.

-- Help change their usage behaviour. Most digital-wellbeing / internet
decentralization tech fall under this category?

-- Translate / Pipe data exported from one service provider and import it
into another. For instance, it is tiresome to move away from wordpress to
ghost.org; or from WhatsApp to Signal. Emails work great.

- Reduce: Hand-hold them as they grasp various privacy and security settings
on offer and exercise them, as appropriate.

-- JumboPrivacy does this for popular social networks.

-- PrivacySettings Firefox plugin _for Firefox_ is another example.

-- Plenty write blog posts to help others navigate arduous settings across
popular web properties, and expect nothing in return.

- Restrict: Provide tools that let them control what the services can and
cannot collect.

-- Application sandboxes like firejail / sandboxie, firewalls like Snitch /
LuLu, DNS based content blockers like pi-hole, in-browser content blockers
like uBlockOrigin are some examples.

[0] One of the first questions folks asked after an exodus-privacy (which is
super nice and something I use every other month) presentation at fosdem was,
'What can I do now that you've exposed what apps on Android do with the
permissions granted to them and the SDKs they embed?' exodus-privacy, as great
as it is, doesn't let you take action but presents a nice overview of the
dangers to your privacy due to the app you've installed. Instead, you might
end up having to independently discover and install Blokada or AdAway or Pi-
Hole or NetGuard or XPrivacyLua or microG or GrapheneOS or...

------
acgan
Very cool! Reminds me of [https://tldrlegal.com/](https://tldrlegal.com/) and
[https://fossa.com/](https://fossa.com/).

