The site is pretty slow. I noticed that looking up the same site a second time didn't return a result any faster than the first time. You might consider adding some caching.
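A minimal sketch of the caching suggestion. It assumes the slow part is a per-domain analysis step and that a verdict can be reused for a day before re-crawling. The `analyze_policy` function below is a hypothetical stand-in, not the site's actual code:

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory cache: remembers each result for `ttl_seconds`."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def get_or_compute(self, key: str, compute: Callable[[str], str]) -> str:
        now = self.clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # fresh cached result: skip the slow analysis
        value = compute(key)  # cache miss or stale entry: recompute
        self._store[key] = (now, value)
        return value


# Hypothetical stand-in for the slow crawl-and-classify step.
calls = []


def analyze_policy(domain: str) -> str:
    calls.append(domain)
    return f"verdict for {domain}"


cache = TTLCache(ttl_seconds=24 * 3600)
first = cache.get_or_compute("example.com", analyze_policy)
second = cache.get_or_compute("example.com", analyze_policy)  # served from cache
```

A per-process dictionary is the simplest option. A shared cache would be needed if the site runs on several servers.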
1. If it sells your information to third parties,
2. If it shares your information with ad campaigns without directly selling it.
Are there any other criteria that users might want to be aware of?
Of course, it'd be great if you get a positive match.
Date: Fri, 11 Nov 2011 01:27:19 GMT
For example, http://www.privacyparrot.com/privacy states that they never "share any information about you," but then mentions in passing that the site uses Google Analytics.
We've trained the analyzer on mentions of sharing PII with third parties so that it points out sites that sell data.
1. So, having crawled a boatload of privacy policies, what fraction of them say that they'll sell your data?
2. Are you worried that the lawyers will find your tool and tweak their policies to beat it?
14% can sell data
66% do not sell data
20% sell only when they go bankrupt or get acquired
We view people trying to beat the system as a way to improve it.
1. Consider porting the entire service to a browser extension for Chrome or Firefox, and making the homepage more of an information/FAQ center.
2. Demo video. A good demo video explaining why "John Doe" should worry about his personal information being sold would be more convincing - this is how you get your service to less savvy internet users who aren't primarily concerned with privacy.
3. Find a way around inconsistencies. It would be better to report whether a website actually sells or uses your personal information rather than returning a simple search result with TOS findings. A website can tweak its policy or flat-out lie. You should try to account for this.
4. Are you planning to commercialize this in any way? How do you plan to fund it, if at all?
2. Yes, a demo video explaining why this is important would be helpful to folks.
3. Not sure how to address this. The tool reads and classifies terms of service. Any suggestions on how we could do what you suggest?
4. Potentially. We want to explore it a bit more and make sure it works well first. We have ideas for premium and related features/services that would build on this well if it becomes popular and trusted.
BTW: Cool tool with a lot of potential! See also http://www.javacoolsoftware.com/eulalyzer.html (They analyze EULAs, not privacy policies, though. I wasn't too impressed when I tested it, but it's a nifty idea, and I only tested it a few years ago, so it may have improved a lot since then.)
Also, if you aren't planning to make this a commercially viable product, could you release the source code? Things like this make the world better and safer (not to mention easier and funner). All in all, though, it was rather interesting. (I'm still trying out websites and can see myself doing this until the end of the day at work.)
Reading one of the submitter's comments below, it seems to lump "sold the entire company, therefore the user database went with it" into the same category as "we're running out of money, so let's sell everybody's email addresses to spammers."
They're not in any way related. I'd suggest splitting out those two categories, as I suspect it will drop that "bankruptcy email fire sale" category down to somewhere near 0%.
It's tough; most policies look something like this:
In the event that XXXXXXX is involved in a bankruptcy, merger, acquisition, reorganization or sale of assets, your information may be sold or transferred as part of that transaction
From what I gather, the example above implies that the company considers your data an asset and it "may be sold" as part of the transaction or bankruptcy.
Help me come up with a better way to describe this situation succinctly! =)
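One way to act on the suggestion to split the categories is to record which corporate events a transfer clause names and then report each one separately. The sketch below uses a small, hypothetical keyword list with regular expressions. It is not how the analyzer actually works, and a real classifier would need far broader coverage:

```python
import re
from typing import Set

# Hypothetical trigger vocabulary; real policies use many more phrasings.
TRIGGERS = {
    "bankruptcy": re.compile(r"\b(bankrupt(cy)?|insolven(cy|t)|liquidation)\b", re.I),
    "acquisition": re.compile(
        r"\b(merger|acquisition|acquired|reorganization|sale of (all or substantially all )?assets)\b",
        re.I,
    ),
}


def transfer_triggers(clause: str) -> Set[str]:
    """Return the corporate events that a data-transfer clause names."""
    return {label for label, pattern in TRIGGERS.items() if pattern.search(clause)}


def summarize(clause: str) -> str:
    """Produce a short, user-facing description of the clause."""
    found = transfer_triggers(clause)
    if not found:
        return "No transfer-on-corporate-event clause found"
    return "May be transferred in the event of: " + ", ".join(sorted(found))
```

Run on the clause quoted above, this reports both triggers. The site could then show "may be transferred if acquired" separately from "may be sold in bankruptcy" instead of lumping them together.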
My vague impression is that unless one takes deliberate steps to remove such information from the available... "asset pool", when bankruptcy strikes, all bets are off (in the U.S., at least).
Any effective limits after that point seem more often to be PR-based (bad PR decreasing, negating, or even outweighing the value of the information) than due to legal stricture. Or else a matter of getting a "one-off" restriction from a court proceeding.
This is just my impression from the news. I would welcome any clarification.
Err on the side of caution when you've got somebody else's reputation in your hands.
Well, yes. That's why I didn't type that.
After some more searching: http://www.cdt.org/paper/looking-back-p3p-lessons-future points to more information about that path from the past. What would Facebook's p3p.xml look like?
On another note: www dot freeprivacypolicy dot com seems to generate the kind of privacy policies the site featured in this post sets out to parse. There is humor in that.
I don't want to feed it PageRank, since privacyparrot says: "Your information may be sold during a bankruptcy."
Also, I suggest that when you propose adding “.com”, you strip spaces from the search as well. For instance, I searched for “Less Wrong” and found nothing, and you suggested “Less Wrong.com”. That doesn’t exist, but “LessWrong.com” does.
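A sketch of that normalization, assuming the tool appends ".com" only when the query contains no dot. `suggest_domain` is a hypothetical name, not the site's code:

```python
def suggest_domain(query: str) -> str:
    """Turn a free-text search like 'Less Wrong' into a domain guess."""
    # Remove all internal whitespace, not just leading/trailing spaces.
    # Lowercasing is safe because domain names are case-insensitive.
    compact = "".join(query.split()).lower()
    # Only append ".com" when the user didn't already type a TLD.
    if "." not in compact:
        compact += ".com"
    return compact
```

With this, "Less Wrong" becomes "lesswrong.com" rather than "Less Wrong.com".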
How can someone trust you to parse the policies correctly? What if someone sues you for misinterpreting a policy that they then relied on to make a decision?
I really want (for my co) the answer to be "yes".
The main point of checking for the bankruptcy clause is to raise awareness of the risks involved so people can make an informed decision before trying a new site.
How are you generating features? Stanford parser? Are you using logistic regression or something more advanced?
I love the idea. I am interested in applying some of these concepts myself. Do you have any ideas that you are not able to pursue yourself, that I might take a crack at?
Ideas: email me at michaelaiello (at) michaelaiello.com
Your Policizer happily tells me that the sentence means "They DO NOT sell your private information."
A great example:
Does not sell your private information.
But they obviously do share information. While this is apparent to most users, how many sites do the same without users being aware of it?
Also, any plans to capture changes in privacy policies over time? Site owners often don't proactively notify users when their policies or legalese have changed.
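A minimal sketch of change tracking. It assumes the crawler already fetches each policy's plain text, stores a fingerprint per site, and compares it on the next crawl. `PolicyHistory` is a hypothetical helper. Collapsing whitespace before hashing keeps cosmetic reflows from being reported as changes:

```python
import hashlib
import re
from typing import Dict


def fingerprint(policy_text: str) -> str:
    # Collapse runs of whitespace so reformatted HTML doesn't look like a change.
    normalized = re.sub(r"\s+", " ", policy_text).strip()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


class PolicyHistory:
    def __init__(self) -> None:
        self._last: Dict[str, str] = {}

    def record(self, site: str, policy_text: str) -> bool:
        """Store the latest fingerprint; return True if the policy changed since the last crawl."""
        fp = fingerprint(policy_text)
        previous = self._last.get(site)
        self._last[site] = fp
        return previous is not None and previous != fp
```

Keeping the old text alongside each fingerprint would also make it possible to show users a diff of what changed.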
Rather, see if a site tells you that it sells your personal information. It's an important difference.
Is that the intended behaviour?
Seriously, where's the code? Your server seems to be getting hit pretty hard. Would be nice to be able to hack on it and to be able to host a mirror.
Seems to me most of your traffic is going to be people asking about the same sites - would be interesting just to publish that information.
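If the query log is available, surfacing the most-requested sites takes only a few lines. `top_queries` is a hypothetical helper, and the normalization (trim plus lowercase) is an assumption:

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_queries(queries: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Count normalized site queries and return the n most frequent."""
    counts = Counter(q.strip().lower() for q in queries)
    return counts.most_common(n)
```

The same list would also tell you which verdicts are worth precomputing and caching.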
Can sell your private information.