
Show HN: AI tool reads privacy policies, tells you which sites sell your info. - michaelaiello
http://www.privacyparrot.com
======
michaelaiello
I've been a member @ HN for quite a while, but I usually just tend to focus in
on security and privacy topics. One of my good friends is visiting, and we
wanted to work on something challenging together. Both of us find privacy
policies overly confusing and annoying, so we decided to tackle the problem.
We built a tool that crawls for privacy policies and uses guided machine
learning to analyze them. We would love any feedback you have.

~~~
davidcuddeback
It got the right answer for my company's website:
[http://www.privacyparrot.com/privacy-policy-for-
identified.c...](http://www.privacyparrot.com/privacy-policy-for-
identified.com)

The site is pretty slow. I noticed that entering my site a second time didn't
produce a result any faster than the first time. You might consider adding
some caching.

~~~
michaelaiello
Glad we got it right. The site should be much faster now thanks to some tips
folks have given here, and to turning off excessive debugging info.

------
pkulak
You can go so much farther with this. How about letting me paste in any TOS
and have it analyzed for important bits or things out of the ordinary? I'd
love having my own personal robot lawyer to read over all the stuff I sign or
agree to!

~~~
gglanzani
Would you trust such a lawyer? No offense to the developer, I think he did a
terrific job, but trusting a machine to parse a contract for something fishy
seems a little too... Well, risky.

Of course it'd be great when you have a positive match.

~~~
bostonvaulter2
I view it more as a bug-finding tool. That is, it can indicate the presence of
data sharing/selling (bugs) but cannot prove that there isn't any.

------
carbocation
The site is loading extremely slowly. Might I suggest you turn off, or
dramatically reduce, the KeepAliveTimeout?

Header:

    
    
      Connection:Keep-Alive
      Date:Fri, 11 Nov 2011 01:27:19 GMT
      Keep-Alive:timeout=15, max=100

~~~
michaelaiello
Thanks for the heads up and the tip, we are getting way more traffic than
expected here...

------
martey
What about inconsistent privacy policies?

For example, <http://www.privacyparrot.com/privacy> states that they never
"share any information about you" but then has an offhand mention about the
site using Google Analytics.

~~~
michaelaiello
Thanks for pointing that out, we've updated it to say "never share any
personal information about you."

We've trained the analyzer to detect mentions of sharing PII with 3rd parties
to point out sites that sell data.

~~~
aw3c2
I consider Google Analytics a great harm to my privacy and brushing it off
with "We give you a cookie so Google analytics works, but it's nothing
personal." gives me a dishonest and careless impression.

------
saalweachter
On a more serious note, I have two real questions:

1\. So, having crawled a boatload of privacy policies, what fraction of them
say that they'll sell your data?

2\. Are you worried that the lawyers will find your tool and tweak their
policies to beat it?

~~~
michaelaiello
So far we are looking at

14% can sell data

66% do not sell data

20% sell only when they go bankrupt or get acquired

We view people trying to beat the system as a way to improve it.

------
dylangs1030
I love this idea, but here's how I think it could be improved:

1\. Consider porting the entire service to a browser extension for Chrome or
Firefox, and making the homepage more of an information/FAQ center.

2\. Demo video. A good demo video explaining why "John Doe" should worry about
his personal information being sold would be more convincing - this is how you
get your service to less savvy internet users who aren't primarily concerned
with privacy.

3\. Find a way around inconsistencies. It would be better to report if a
website _actually_ sells/uses your personal information rather than returning
a simple search result with TOS findings. A website can tweak or flat out lie.
You should try to account for this.

4\. Are you planning to commercialize this in any way? How do you plan to fund
it, if at all?

~~~
michaelaiello
1\. Seems we got a lot of feedback and requests for an extension. Sounds like
a good idea to me too!

2\. Yes, a demo video explaining why this is important would be helpful to
folks.

3\. Not sure how to address this. The tool reads and classifies terms of
service. Any suggestions on how we can do what you suggest?

4\. Potentially, want to explore a bit more and make sure it works well first.
We have ideas for premium and related features/services that would key off of
this well if it becomes popular and trusted.

~~~
eitland
As for 3: A simple feature to add would be to add a form for users to submit
samples (links, comments).

BTW: Cool tool with a lot of potential! See also
<http://www.javacoolsoftware.com/eulalyzer.html> (They analyze EULAs, not
Privacy Policies though. Also I wasn't too impressed when I tested but a nifty
idea and I only tested it a few years ago so it might have been improved a lot
in the meantime.)

------
jeremyarussell
Just curious whether it could highlight the offending phrases it used to figure
out the difference between selling, not selling, and bankruptcy selling. This
way when we put in our revisions we can better help it learn.

Also if you aren't planning on making this a commercially viable product,
could you release source code? Things like this make the world better and
safer, (not to mention easier and funner.) All in all though it was rather
interesting. (Still trying out websites and I see myself doing this until the
end of the day at work.)

------
jasonkester
Of the two sites of mine that I checked, one came up as "Danger! Warning!
They're going to sell your information in case of a Bankruptcy!!!"

Why?

Reading one of the submitter's comments below, it seems to lump "sold the
entire company, therefore the user database went with it" into the same
category as "we're running out of money, so let's sell everybody's email
addresses to spammers."

They're not in any way related. I'd suggest splitting out those two
categories, as I suspect it will drop that "bankruptcy email fire sale"
category down to somewhere near 0%.

~~~
michaelaiello
That is a good point. Perhaps the way we've worded it is giving an incorrect
impression.

It's tough; most policies look something like this:

In the event that XXXXXXX is involved in a bankruptcy, merger, acquisition,
reorganization or sale of assets, your information may be sold or transferred
as part of that transaction

From what I gather, the example above implies that the company considers your
data an asset and it "may be sold" as part of the transaction or bankruptcy.

Help me come up with a better way to describe this situation succinctly! =)

~~~
pasbesoin
This may be OT, but I'd like to know more about organizational practices (and
corresponding contractual language) that mitigate this risk.

My vague impression is that unless one takes deliberate steps to remove such
information from the available... "asset pool", when bankruptcy strikes, all
bets are off (in the U.S., at least).

Any effective limits after that point seem more often to be PR-based (bad PR
decreasing, negating, or even outweighing the value of the information) than
due to legal stricture. Or else a matter of getting a "one-off" restriction
from a court proceeding.

This is just my impression from the news. I would welcome any clarification.

------
jrockway
Why does it automatically add www in front of what I type in the URL box? If I
type in news.ycombinator.com, it says "www.news.ycombinator.com does not
exist".

Well, yes. That's why I didn't type that.

~~~
michaelaiello
Thanks for the heads up. We've got an issue with our crawler when it comes to
subdomains. Working on it...
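
One possible way to avoid the forced "www." prefix, sketched in Python. This is not the site's crawler code; `candidate_hosts` is a hypothetical helper, and the rule of only guessing "www." for two-label names is an assumption that will miss suffixes such as .co.uk.

```python
from urllib.parse import urlparse


def candidate_hosts(user_input):
    """List hostnames to try, starting with exactly what the user typed."""
    text = user_input.strip()
    # urlparse only fills in the hostname when a scheme is present.
    if "://" not in text:
        text = "http://" + text
    host = urlparse(text).hostname
    if not host:
        return []
    candidates = [host]
    # Assumption: only guess a "www." variant for bare two-label names
    # such as example.com, never for names that already have a subdomain.
    if not host.startswith("www.") and host.count(".") == 1:
        candidates.append("www." + host)
    return candidates


print(candidate_hosts("news.ycombinator.com"))
print(candidate_hosts("facebook.com"))
```

Trying the typed host first means news.ycombinator.com is looked up as entered, and the "www." guess is only a fallback for bare domains.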

------
jakubw
It'd be interesting to have a browser extension based on this popping up a
warning on websites with a suspicious privacy policy.

~~~
michaelaiello
Browser plugin was our initial plan, but wanted to get something out there for
folks to test and give feedback on without having to install something.

~~~
dylangs1030
If you made a Google Chrome or Firefox extension I'd install (both of) them.
In fact, porting this service to an extension seems more logical and
streamlined if it could push notifications in real time as you entered a
website, the way Firefox can warn you if a website has a false certificate or
is a registered scam/phishing website.

------
route66
In a galaxy far away there was once conceived the idea of a machine readable
privacy policy ... checking the interwebs reveals that
<http://www.w3.org/P3P/> was updated for the last time in 2007.

After some more searching: <http://www.cdt.org/paper/looking-back-p3p-lessons-
future> points to more information about that path from the past. What would
the p3p.xml of facebook look like?

On another note: www dot freeprivacypolicy dot com [1] seems to generate the
kind of privacy policies the site featured in this post sets out to parse.
There is humor in that.

[1] don't want to feed page rank as privacyparrot says: "Your information may
be sold during a bankruptcy"

------
raheemm
I entered facebook and got two results, one saying facebook.com does not sell;
the other saying www.facebook.com does sell. See
<http://www.privacyparrot.com/search?search=facebook.com>

------
ams6110
Follow up idea, read mutual fund prospectuses and identify anything "out of
the ordinary."

~~~
michaelaiello
Cool, wonder if we can get some training data for "out of the ordinary" and
screen for companies that are big frauds and then short them (like
<http://www.muddywatersresearch.com/> does)

------
roryokane
A bug report: if someone already added “.com” to their results because of no
result, don’t offer to add it again if there are still no results.

Example:
[http://www.privacyparrot.com/search?search=http://www.noSuch...](http://www.privacyparrot.com/search?search=http://www.noSuch.com)

Also, I suggest when you suggest adding “.com”, you strip spaces from the
search as well. For instance, I searched for “Less Wrong” and found nothing,
and you suggested “Less Wrong.com”. That doesn’t exist, but “LessWrong.com”
does.
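
Both suggestions could be handled in one small function. The following Python sketch is only an illustration; `suggest_domain` is a made-up name and the site's real search code is not shown in this thread.

```python
def suggest_domain(query):
    """Return a ".com" suggestion for a failed search, or None if none applies."""
    # Remove all whitespace so "Less Wrong" becomes "LessWrong".
    candidate = "".join(query.split())
    if not candidate:
        return None
    # Do not offer ".com" again if the query already ends with it.
    if candidate.lower().endswith(".com"):
        return None
    return candidate + ".com"


print(suggest_domain("Less Wrong"))
print(suggest_domain("http://www.noSuch.com"))
```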

------
simonbrown
I'm not a lawyer but...

How can someone trust you to parse the policies correctly? What if someone
sues you for incorrectly interpreting a policy which they then use to make a
decision?

~~~
fleitz
A lawyer could probably advise them quite quickly as to a TOS that would limit
their liability from such an action by specifically informing users of the
site that the information should not be relied upon. I'm not a lawyer either,
but it doesn't look like it would be a big deal to get a good TOS to prevent
these issues.

~~~
michaelaiello
Yeah this is a very good suggestion. We'll get a TOS up there so we don't get
into hot water.

------
SoftwareMaven
Is it really possible to keep user data safe during a bankruptcy? It is a
tangible asset that may provide value to creditors.

I really want (for my co) the answer to be "yes".

~~~
billiamram
If the site's privacy policy explicitly states they will never sell your
information, it may be possible to sue if they do so. There is some precedent
with a toysmart.com case back in 2000, though it was settled out of court.

The main point of checking for the bankruptcy clause is to raise awareness of
the risks involved so people can make an informed decision before trying a new
site.

------
samg_
I am just learning some of these machine learning tools and am rapt, so
forgive me for asking, but would you be able to explain a little about what
you are doing?

How are you generating features? Stanford parser? Are you using logistic
regression or something more advanced?

I love the idea. I am interested in applying some of these concepts myself. Do
you have any ideas that you are not able to pursue yourself, that I might take
a crack at?

~~~
michaelaiello
Works just like spam filtering: We're using a naive Bayesian classifier with
training data. Built and tested a custom extractor that makes the most sense
for the legalese of privacy policies.

Ideas: email me at michaelaiello (at) michaelaiello.com
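
For readers wondering what "works just like spam filtering" might look like in practice, here is a minimal Python sketch of a multinomial naive Bayes text classifier with add-one smoothing. The training sentences, the labels, and the simple word tokenizer are all invented for illustration; the custom extractor and the real training data mentioned above are not shown in this thread.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocab = set()

    def train(self, text, label):
        self.doc_counts[label] += 1
        for word in tokenize(text):
            self.word_counts[label][word] += 1
            self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, float("-inf")
        for label in self.doc_counts:
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(self.doc_counts[label] / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word] + 1
                score += math.log(count / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training examples, not the site's real data.
clf = NaiveBayes()
clf.train("We may sell your personal information to third parties", "sells")
clf.train("Your data may be sold to advertisers and partners", "sells")
clf.train("We never sell or rent your personal information", "does_not_sell")
clf.train("We do not share or sell your data with anyone", "does_not_sell")

print(clf.classify("Your information may be sold to partners"))
```

A bag-of-words model like this ignores word order and negation, so with a small training set a word such as "sell" can count as evidence for either label depending on which examples it happened to appear in.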

~~~
raphman
Might I suggest adding at least this sentence to the training set: _"We sell
your private information."_

Your Policizer happily tells me that the sentence means _"They DO NOT sell
your private information."_

------
rednaught
Would it be possible to also identify sites that share your information?

A great example: facebook.com Does not sell your private information.

But they obviously do share information and while this is apparent to most
users, how many sites practice the same and users are not aware of it?

Also, any plans to capture change in privacy policies over time? Often times,
site owners do not proactively notify users when their policies or legalese
has changed.

~~~
michaelaiello
We do plan to note changes in the policies. Considered a feature that lets you
subscribe to the policies you are interested in and get notices if they
change, or if in the news, it is mentioned that the site loses your data. What
do you think?

~~~
johnbatch
I'd love a site that stored diffs of all legal docs for large organizations.
I was trying to find old TOS of amazon.com the other day, and couldn't find
them anywhere.
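
As a rough idea of how stored policy versions could be compared, here is a short Python sketch using the standard library's `difflib`. The two policy snippets and the `policy_diff` helper are invented for illustration; nothing here reflects how the site stores or compares policies.

```python
import difflib


def policy_diff(old_text, new_text, old_label="previous", new_label="current"):
    """Return a unified diff between two versions of a policy as a string."""
    diff = difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile=old_label,
        tofile=new_label,
        lineterm="",
    )
    return "\n".join(diff)


# Invented policy snippets for illustration only.
old = "We collect your email address.\nWe never sell your information."
new = "We collect your email address.\nWe may share your information with partners."
print(policy_diff(old, new))
```

An empty result means the two versions are identical, which would make it easy to decide whether subscribers need a change notice.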

------
simcop2387
It does not appear to like subdomains. I've been trying to get it to visit
<http://news.ycombinator.com> and see what it thinks. But I keep getting back,
We were unable to connect to <http://www.news.ycombinator.com>. If it exists,
please try again later.

~~~
siddarthcs
Also, this is a problem: <http://imgur.com/K0d2z>

~~~
michaelaiello
Thanks - had an old version of the scan in from when we weren't correcting for
the www.

------
a3_nm
> See if a site sells your personal information.

Rather, see if a site tells you that it sells your personal information. It's
an important difference.

------
rblackwater
This is very cool. Maybe you could make a scriptlet bookmark that pops
something on the page you are viewing. Here's one that will redirect you to
the privacy parrot page :
javascript:location.href="http://www.privacyparrot.com/privacy-policy-for-"+location.host;

------
user24
Would be useful to reproduce the part of the terms which causes the parrot to
reach the conclusion it does.
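
A very rough Python sketch of what surfacing the relevant passage could look like. It uses plain keyword matching as a stand-in, not the parrot's classifier, and the term list and the `flag_sentences` name are assumptions made for this example.

```python
import re

# Hypothetical list of word stems that hint at selling or transferring data.
SALE_TERMS = ("sell", "sold", "sale", "rent", "transfer")


def flag_sentences(policy_text):
    """Return the sentences that contain a word starting with a sale term."""
    # Naive sentence split on ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", policy_text.strip())
    flagged = []
    for sentence in sentences:
        lowered = sentence.lower()
        if any(re.search(r"\b" + term, lowered) for term in SALE_TERMS):
            flagged.append(sentence)
    return flagged


policy = (
    "We collect your email address. "
    "In the event of a bankruptcy, your information may be sold or transferred. "
    "We use cookies."
)
print(flag_sentences(policy))
```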

------
giulivo
I tried the policizer with some copy and pasted policies but it frequently
told me "CAN SELL" just because the text did not include any specifics
regarding selling and bankruptcy

Is that the intended behaviour?

------
omouse
Where's the code?

Seriously, where's the code? Your server seems to be getting hit pretty hard.
Would be nice to be able to hack on it and to be able to host a mirror.

~~~
dotBen
Or, where's the data?

Seems to me most of your traffic is going to be people asking about the same
sites - would be interesting just to publish that information.

------
JacobIrwin
I wonder if Facebook does (<http://tinypic.com/r/wjctow/5>) hmmm... that's
interesting...

------
01PH
Thank you so much for the effort. This is really something useful. Would love
to see more projects on making privacy themes more accessible.

------
dodo53
Who knew the singularity would start as an arms race between AI writing
obfuscated legal documents and AI decoding them :oP

------
michaelfeathers
What are the AI tool's terms of service?

------
saalweachter
bool CompanySellsPrivateData(string privacy_policy) { return true; }

------
rwaliany
it says doubleclick.net does not sell my private information...

~~~
michaelaiello
If it's wrong, help train the parrot =)

[http://www.privacyparrot.com/feedback/correction?site=http%3...](http://www.privacyparrot.com/feedback/correction?site=http%3A%2F%2Fwww.doubleclick.net&policylink=http%3A%2F%2Fwww.google.com%2Fintl%2Fen%2Fprivacy%2Fprivacy-
policy.html)

------
lucian303
facebook.com Does not sell your private information.

www.facebook.com Can sell your private information.

~~~
michaelaiello
Thanks - had an old version of the scan in from when we weren't correcting for
the www.

