Hacker News
Show HN: Write your Privacy Policy once and export to multiple languages (github.com)
112 points by marco1 97 days ago | 43 comments



Very interesting! OP, if you are the developer of this, I have a few questions:

1) Did you consult with any legal entities when creating this? Is there any risk of putting people in legal jeopardy by implying that they could use a utility like this to create a legally sound privacy policy without consulting a lawyer?

2) What consumers currently exist for machine output of privacy policies? I can imagine a standardized machine-readable JSON file on all sites (a la sitemap.xml) could be useful for browser utilities to forewarn you about present or absent tenets of a site's privacy policy.

3) Have you considered using something like this as a boilerplate utility for actual lawyers to improve their throughput for creating privacy policies?

4) Have you considered a similar utility for site terms of service? Is the problem different in magnitude?

5) Are privacy policies legally onerous if you provide them in multiple languages? It seems like translation arbitrage would be disadvantageous.


Yes, I'm the author of this. Thanks for these interesting questions! (Happy to answer other questions as well.)

1) Yes, I consulted a lawyer for the version in one language, which became the basis for this project. That was only the starting point, though: things have already changed since then and are expected to keep changing. Some kind of regular review of the current version with legal experts may be a good idea. Apart from that, please see the disclaimer, which is included for obvious reasons.

2) None. It's as simple as that, sorry! So far, the main use case has been displaying the output on websites. But since all the input is already available in a structured, machine-readable form, it would obviously make sense to also offer output in various machine-readable formats. That was exactly the vision: something like a "privacy-policy.json" alongside your "sitemap.xml" or "robots.txt" could be helpful and an interesting first step towards the larger "ecosystem" that such a format would necessarily need.

3) No, I just haven't had the time for that yet. I also don't know whether that's something lawyers could use or would want to use. But it's definitely an interesting idea that might be worth exploring.

4) The "Terms of Service" are a completely different beast, I would say. There's probably not much that could be reused, and the challenges overlap only slightly, although the concept is quite similar in general.

5) You would probably declare one version of the policy to be the canonical one that really counts, and any translations would be for information and assistance only, without legal validity. As for the implementation in this project, translations could differ somewhat from each other and wouldn't need to be exact word-for-word translations. Languages such as "en" could also be split up into "en-US", "en-GB", etc.: not just to handle small language differences better, but also to adjust the policy to the jurisdiction.
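To illustrate the "privacy-policy.json" idea from answer 2) and the browser-utility idea from question 2), here is a minimal Python sketch. The schema (fields such as `collects`, `shares_with`, `uses_cookies`) and the functions `to_machine_readable` and `flag_collected` are hypothetical; neither the project nor any standard defines them.

```python
import json

# Hypothetical structured input, standing in for the options a site owner
# would configure in the generator. The field names are illustrative only.
POLICY_INPUT = {
    "site": "example.com",
    "canonical_language": "en",
    "data_collected": ["ip_address", "email_address"],
    "third_party_services": ["analytics"],
    "cookies": True,
}


def to_machine_readable(policy: dict) -> str:
    """Serialize the options as JSON, e.g. to be served as
    /privacy-policy.json next to /robots.txt and /sitemap.xml."""
    document = {
        "schema_version": 1,  # hypothetical versioning field
        "site": policy["site"],
        "canonical_language": policy["canonical_language"],
        "collects": sorted(policy["data_collected"]),
        "shares_with": sorted(policy["third_party_services"]),
        "uses_cookies": policy["cookies"],
    }
    return json.dumps(document, indent=2, sort_keys=True)


def flag_collected(policy_json: str, watchlist: set[str]) -> set[str]:
    """Consumer side: return the watched data categories that the policy
    declares as collected, so a browser tool could warn the user."""
    declared = set(json.loads(policy_json)["collects"])
    return declared & watchlist
```

For example, `flag_collected(to_machine_readable(POLICY_INPUT), {"email_address", "location"})` yields `{"email_address"}`.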
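Answer 5) can be sketched as a simple fallback lookup, assuming hypothetical string tables in which a regional variant such as "en-GB" overrides only the strings that differ from its base language "en", and assuming "en" is the declared canonical version. The names `STRINGS`, `CANONICAL`, `fallback_chain`, `lookup` and `render` are illustrative, not the project's API.

```python
# Hypothetical string tables. A regional variant such as "en-GB" only
# overrides the strings that differ from its base language "en".
STRINGS = {
    "en": {
        "notice": "This version is provided for information only; "
                  "only the canonical version is legally binding.",
        "cookies": "We use cookies to analyze traffic.",
        "contact": "Contact us at the address below.",
    },
    "en-GB": {
        "cookies": "We use cookies to analyse traffic.",
    },
    "de": {
        "notice": "Diese Fassung dient nur zur Information; rechtlich "
                  "verbindlich ist ausschließlich die maßgebliche Fassung.",
        "cookies": "Wir verwenden Cookies, um den Datenverkehr zu analysieren.",
        "contact": "Kontaktieren Sie uns unter der unten angegebenen Adresse.",
    },
}

CANONICAL = "en"  # assumption: the one version declared legally binding


def fallback_chain(tag: str) -> list[str]:
    """'en-GB' -> ['en-GB', 'en']; 'de' -> ['de']."""
    parts = tag.split("-")
    return ["-".join(parts[:i]) for i in range(len(parts), 0, -1)]


def lookup(tag: str, key: str) -> str:
    """Return the most specific string available for the language tag."""
    for candidate in fallback_chain(tag):
        table = STRINGS.get(candidate, {})
        if key in table:
            return table[key]
    raise KeyError(f"no string for {key!r} in {tag!r}")


def render(tag: str, keys: list[str]) -> str:
    """Assemble the requested clauses; every non-canonical version gets
    a notice that it is for information only."""
    lines = [lookup(tag, key) for key in keys]
    if tag != CANONICAL:
        lines.insert(0, lookup(tag, "notice"))
    return "\n".join(lines)
```

Because "en-GB" falls back to "en" for anything it doesn't override, a jurisdiction-specific variant only has to carry the clauses that actually differ.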


On 4), I can pretty much guarantee that this [1] was based on a standard template for a generic website and app that gathers user info and allows social integration. The same goes for the privacy policy.

[1] https://web.archive.org/web/20160603092300/https://www.thene...


Thanks! So what you're saying is that this proves terms of service, just like a privacy policy, can be generated from templates and used in practice, right?


Yes, exactly.


Well, there is a nice disclaimer:

Disclaimer

This project does not constitute legal advice and is not to be relied upon or acted on as such. Any material presented here is for general information purposes only and may be out of date, incomplete or not suitable for your jurisdiction. You should seek independent legal advice from a qualified professional to guide your decisions around a valid and complete privacy policy.


Right, that's the disclaimer that has been put up there, both for the safety of the authors and for the safety of any potential users.

I hope that disclaimer is not a deterrent. It's simply the truth: if you want to be on the safe side, you should never rely on any online resource for legal purposes without talking it through with a lawyer of your choice, no matter whether that resource is a manual guide or a software tool.


A PHP project? Not React.js with a Redis and Docker API mess on top?

How dare you call yourself a hacker! /s


Yes, I should probably have implemented it in Haskell, at least. But seriously, the language to choose was definitely a question.

I was considering JavaScript instead, to make this more accessible (to virtually every web developer). But I found server-side to be more effective than client-side. You could argue that the JavaScript module could also have been used in Node.js, but then you're back to Node.js vs. PHP, where PHP can be regarded as at least as accessible.

Anyway, since this makes use of the most primitive and basic language features only, a later rewrite would be trivial.


Legal questions aside, this should have been an online form that creates a zip file containing the generated privacy policies in various markup languages and a configuration file. I don't know about you, but the barrier to entry here is too high. You have to run a PHP instance yourself, at least momentarily.
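The kind of download such a form could produce might look like the following Python sketch: it bundles every requested language and format, plus the input as `config.json`, into one in-memory zip archive. `render_policy` is a hypothetical stand-in for the actual generator (which is written in PHP), and the file naming scheme is an assumption.

```python
import html
import io
import json
import zipfile

# Hypothetical wording; the real generator would produce full policies.
INTRO = {"en": "We collect: ", "de": "Wir erheben: "}


def render_policy(options: dict, language: str, fmt: str) -> str:
    """Hypothetical stand-in for the actual (PHP) generator."""
    text = INTRO[language] + ", ".join(options["data_collected"])
    if fmt == "html":
        return f"<p>{html.escape(text)}</p>"
    if fmt == "txt":
        return text
    raise ValueError(f"unsupported format: {fmt}")


def build_bundle(options: dict, languages: list[str], formats: list[str]) -> bytes:
    """Return a zip archive with config.json plus one file per language and format."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Keep the input so the policy can be regenerated or edited later.
        archive.writestr("config.json", json.dumps(options, indent=2))
        for language in languages:
            for fmt in formats:
                name = f"privacy-policy.{language}.{fmt}"
                archive.writestr(name, render_policy(options, language, fmt))
    return buffer.getvalue()
```

A form handler would then only need to return these bytes with a `Content-Type: application/zip` header, so nobody has to run the generator locally.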


You're right: for demo purposes, an additional form that simply submits all its input to the script would probably be helpful. I will definitely consider this going forward. Thanks!


Nice work. This is what I hoped would eventually come out of P3P (https://en.wikipedia.org/wiki/P3P); alas, it wasn't to be.


It's interesting to see how Facebook and Google currently handle P3P:

https://fb.me/p3p

https://www.google.com/support/accounts/bin/answer.py?hl=en&...


Oh, interesting, didn't know that.

> Some browsers require third party cookies to use the P3P protocol to state their privacy practices.

Which browsers would those be today?


Just old versions of Internet Explorer. Given Google's scale, it's worth the work, but it's not something the rest of us need to bother with.


Yes, P3P was an approach with similar concepts and goals, though with a much broader vision and less practical or pragmatic.

This project here does not fall into the trap of believing it could change how privacy works on the internet, at least not in the short term. The target audience is rather the developer community, not big corporations and consumers.


This is one of those things where you wonder why no one's thought of it before, or why this isn't just standard practice.

Maybe time to blow the dust off my PHP binary... Great work!


Thank you!

This could have been implemented in any language, of course, but server-side seemed to make a little bit more sense than client-side and PHP is still more or less ubiquitous.

I think there have indeed been small attempts to do this before, such as W3C's P3P [1]. But they always had the (end) user in mind and tried to create advantages for the user. As we all know, users aren't exactly clamoring for something like that, and so the big companies that could have pushed this forward had no incentive to do so.

This project, on the other hand, regards benefits for the user only as its long-term "vision"; its important short-term goals are benefits for developers, especially small teams.

By the way, the fact that this library requires PHP should not be much of a hurdle. It uses only the most basic and primitive language features and requires virtually no tooling. A minimal setup could consist of one single file that you write yourself, which includes all the individual files of the library and then makes the respective method calls.

Regarding the general concept, there's a lot of room for improvement, of course, e.g. more translations, support for additional clauses and topics, fixes and enhancements based on input from legal experts, etc.

[1] https://www.w3.org/P3P/


I would personally prefer to do this client side and then just embed the generated policy. Especially since I use several different languages server-side, often not PHP. Of course I can always run the PHP client-side as well :) Anyway, thanks OP!


Thanks!

Well, the preferences on this subject will probably vary. For me, server-side has been more helpful in practice.

And you really don't have to embed PHP into every project or site that you do. You could build a small internal service that generates the privacy policies in HTML for each project and then include those policies in the individual projects without any additional requirements on the server side.
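A minimal sketch of such an internal service, written here as a Python WSGI application using only the standard library. The project list, the URL layout (`/<project>`) and `render_html` are hypothetical; a real setup would call the actual generator instead.

```python
import html

# Hypothetical per-project settings held by the internal service.
PROJECTS = {
    "shop": {"name": "Example Shop", "data_collected": ["email_address", "postal_address"]},
    "blog": {"name": "Example Blog", "data_collected": ["ip_address"]},
}


def render_html(project: dict) -> str:
    """Stand-in for the real generator: return an HTML fragment."""
    items = "".join(f"<li>{html.escape(item)}</li>" for item in project["data_collected"])
    return (f"<h1>Privacy Policy of {html.escape(project['name'])}</h1>"
            f"<ul>{items}</ul>")


def app(environ, start_response):
    """WSGI app: GET /<project> returns that project's policy as HTML."""
    key = environ.get("PATH_INFO", "/").strip("/")
    project = PROJECTS.get(key)
    if project is None:
        start_response("404 Not Found", [("Content-Type", "text/plain; charset=utf-8")])
        return [b"unknown project"]
    body = render_html(project).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/html; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

To try it locally, `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` is enough; each project could then fetch its fragment and include it, without needing PHP on its own server.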


So the translation is done with a gigantic switch statement. Um... Please consider using gettext instead.


Of course we know gettext. But gettext is less portable and less user-friendly, especially for users who use PHP only casually. That's also the reason why some of the largest PHP frameworks don't primarily use gettext for translations.

The solution here is pragmatic and works. You can measure the performance if you think it's really an issue.


gettext is the most portable format I know of among translation systems.


Yes, I agree with you. And plain old arrays or switch statements are the simplest format that I know, understood by developers of (almost) any background.
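For comparison, here are both approaches side by side as a Python sketch: a plain dictionary keyed by message ID (the analogue of PHP arrays or a switch statement) and the standard library's `gettext` module. The catalog contents, the domain name `privacy_policy` and the `locale` directory are assumptions. With `fallback=True`, `gettext.translation` returns a `NullTranslations` object that echoes the source string when no compiled `.mo` file is found.

```python
import gettext

# Approach 1: plain dictionaries, the Python analogue of PHP arrays or a
# switch statement. Message IDs and texts are hypothetical.
DEFAULTS = {"cookies.intro": "This website uses cookies."}
CATALOG = {
    "de": {"cookies.intro": "Diese Website verwendet Cookies."},
    "fr": {"cookies.intro": "Ce site utilise des cookies."},
}


def translate_dict(language: str, message_id: str) -> str:
    """Look up a message, falling back to the default (English) text."""
    return CATALOG.get(language, {}).get(message_id, DEFAULTS[message_id])


# Approach 2: gettext. Messages are keyed by their source text and read from
# compiled files at <localedir>/<language>/LC_MESSAGES/privacy_policy.mo.
def translate_gettext(language: str, message: str, localedir: str = "locale") -> str:
    """Translate via gettext; without a matching .mo file, fallback=True
    yields NullTranslations, which returns the message unchanged."""
    translations = gettext.translation(
        "privacy_policy", localedir=localedir, languages=[language], fallback=True
    )
    return translations.gettext(message)
```

The dictionary version needs no extra tooling, while gettext expects `.po` files compiled to `.mo` (e.g. with `msgfmt`) but brings mature tooling for translators; that is essentially the trade-off being argued here.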


It doesn't seem smart to use machine translation for anything legal-related. I wonder if this kind of tool can get one in trouble, for example if a mistranslation causes the privacy policy to mean something other than what was intended.


This doesn't use machine translation. Did you look at the link?

It specifies a privacy policy with options using JSON, and then there is a set of (human-written) strings for each language to turn those options into text.


It's not machine translation, but according to the author's responses in this thread, the text in other languages wasn't written by a lawyer either, so the risk of mistranslation and changed legal meanings is still there.


The risk is substantial. Drafting standards are not equivalent across languages, and the translators are not legal translators.

Legal vocabulary contains words that are terms of art, which get distorted and change in usage when used in lay discussion, leading to divergent meanings.

Let's not even raise the issue of the language having dramatically different meanings and legal effects across jurisdictional lines, even if the wording of the text is unchanged. Or the fact that best practices for drafting specific clauses might change within a few weeks following new case law.


Isn't that why many places declare only one of the variants to be legally binding and the others only for convenience?


> many places declare only one of the variants to be legally binding and the others only for convenience

Variants of what?


Translations. Say, the English one is the official and legally binding one, and the Chinese and Spanish translations are only for added convenience, without any legal effect.


Ah, got it. You typically have to state that very explicitly in all the languages. Also, it depends. Generally speaking, however, best practice is to have one and only one copy, plus a phone call in which it is explained. Even better practice is not to take legal advice from my Internet comments, because I am not a lawyer :D.


Totally aware of that. Reviewing individual translations again, and repeating this from time to time, is definitely something that can and should be done in the future.

So there's obviously an easy fix for the problems you outline, which are not fundamental problems.


It's a whole different matter, though. Automatic machine translation is still not what is done here, and you can't compare human translations with it.

Regarding legal expertise, I or anybody else could go through this with a lawyer next week and have a review, including a "pull request" of required changes, no more than a week later.

For the translation, of course the process was not copying and pasting words into Google Translate one by one and using the result. The translation process has been much more thorough.


Exactly. Thanks for the explanation!


As SamBam noted, this does not use machine translation but relies exclusively on the good old approach of hand-crafted translations done by humans.

As you said, one should probably not use machine translation for any legal-related stuff, at least not in today's state of machine translation.


Have you ever checked out iubenda (http://www.iubenda.com)? That's pretty much what it does, with 600 modular clauses in 8 languages.


Not really, actually. I took a quick glance at their homepage during research. But seeing that they're a commercial service rather than a free project, and that they require sign-ups and upgrades for most things, I wasn't really interested in exploring this further.

By the way, I thought that they would mostly do terms for social integrations, such as Facebook Like buttons, widgets, etc. This is the area that got the least focus in my project. So I didn't realize they have 600 modules (whatever they are and do, exactly), as you say.


It was cheap enough and comprehensive enough that I’ve used it in the past without even thinking twice. It's far cheaper than a lawyer, and if it’s <$100/yr, that effectively counts as free for any real business.

I want it to be something that I pay for – I’d expect quality, updates as laws change, support if anything goes wrong, ideally some kind of risk sharing, etc.


Can understand that reasoning, and often do the same. Though it's important to keep in mind that quality is not something that can only be achieved by paying any random party some amount of money.

Developing and drafting things in the open can produce the same level or even higher levels of quality.

Since the project is not a commercial product, though, I don't care what solution you use :)


Yeah, there are all sorts of modules. There is also a free version, while the paid one is $27/yr.


The price was not really what prevented me from using it. I just had different use cases and an open model in mind.


What was the use case, exactly? The README doesn't go into much detail about this aspect.



