Show HN: Using AI to Summarize Terms and Conditions
91 points by andrewnc 10 months ago | 41 comments
88% of people never read the terms and conditions of websites or services they use. However, most people want to know what they are agreeing to in those terms. That is why we created Legal Leaf. We strongly believe that everyone should have easy access to those agreements, in language they can understand.

Legal Leaf works behind the scenes, in your browser, to read and summarize these terms using powerful AI. We're constantly working to improve the accuracy of these summaries. The results are displayed in the top right corner without affecting web speeds.

Legal Leaf is a beta product still going through development, but it's improving rapidly and we would love a group of willing beta testers.


One concern that comes to mind if my company uses this: A user reads the auto-generated summary but not the actual ToS, then does a thing that violates the ToS. The summary didn't say the thing was against the rules, so the user consciously made their choice based on that.

Not being a lawyer, this raises some questions. Some companies go the extra mile to make their ToS quite short and readable for their users, but that text is still reviewed by a lawyer (presumably). But if the summary is auto-generated, that review isn't necessarily in place unless leaf.legal is just a summary tool subject to lawyer review before approval and publication to my website.

Also, how do we keep track of which version of the automated summary was seen by which user? This seems like it could have legal ramifications. For example, take the user who violates the ToS because something isn't in the summary. Tomorrow leaf.legal updates its algorithm and regenerates all the summaries for its clients, and now the missing ToS article is in the summary.

I imagine there are some pretty solid answers on how to handle these situations. Would love to hear how you are approaching them.
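One way to approach the version-tracking question above is to fingerprint every summary and log which fingerprint each user was shown. The following is a minimal sketch under stated assumptions, not how Legal Leaf works: `summary_version`, `ViewLog`, the user IDs, and the algorithm labels are all hypothetical names made up for illustration.

```python
import hashlib
from dataclasses import dataclass, field


def summary_version(tos_text: str, summary_text: str, algo: str) -> str:
    """Derive a stable ID from the ToS text, the summary shown, and an algorithm label."""
    h = hashlib.sha256()
    for part in (algo, tos_text, summary_text):
        h.update(part.encode("utf-8"))
        h.update(b"\x00")  # separator so field boundaries stay unambiguous
    return h.hexdigest()[:16]


@dataclass
class ViewLog:
    """Hypothetical in-memory log of which summary version each user was shown."""
    seen: dict = field(default_factory=dict)

    def record(self, user_id: str, version: str) -> None:
        self.seen.setdefault(user_id, []).append(version)


tos = "You must not scrape this site. We may email you."
v1 = summary_version(tos, "We may email you.", algo="2018-01")
v2 = summary_version(tos, "No scraping. We may email you.", algo="2018-02")

log = ViewLog()
log.record("alice", v1)  # alice saw the summary that omitted the scraping clause
log.record("bob", v2)    # bob saw the regenerated summary
```

Because the ID changes whenever the ToS, the summary text, or the algorithm label changes, a later regeneration cannot silently overwrite the record of what an earlier user actually saw.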

This is an awesome question, and one that was raised by the lawyer on our team. Since summarization is not a lossless process, there will be some information that the user doesn't see.

For now, we have a blanket disclaimer that our product only provides a summary and should not be taken to represent the will of the company whose terms you are reading.

However, you're absolutely right, we want to make sure that the companies are also well represented. We are building out an avenue for companies to "contest" the summary on their page to be more accurate, or write a personalized summary.

The same question applies to anyone: what happens if they don't read the ToS at all and then do something that violates it? We think Legal Leaf is a step in the right direction towards education, but there is still lots of work left to do.

I can see the next progression being something like:

You're about to upload a photo. The T&C of this website gives them irrevocable legal rights to use this photo in [social media settings, advertising campaigns]. Click No to stop the upload, Yes to continue, or Always to stop warning for this website.

Oh that's awesome! I hadn't thought of that. This would really allow people to change behavior based on the information they get. That's great. Thank you!

This looks really interesting! Two points:

1) A quick side-by-side of a sample Terms/Conditions versus Leaf's summarized version would be helpful. It would help me understand the product more before I install it.

2) What ML/NLP tools did you use for this? It looks like Sumy for Python summarization, along with a specific list of clauses (will, agree, must, etc). When you get a chance, I'd be curious to know more about the technical process.

Also, I noticed that you are stemming words - you may also be interested in lemmatization, which is a slightly more complicated way of converting words into their base forms (like running -> run or ran -> run). Lemmatization also takes into account part-of-speech context. Given that legal documents are fairly grammatical (I'm assuming?), lemmatization should work well here. I've been fairly happy with spaCy's lemmatization results (https://spacy.io/).
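To make the stemming versus lemmatization distinction above concrete, here is a toy sketch using only the standard library. It is neither spaCy nor the stemmer the extension uses: `crude_stem` is a deliberately naive suffix stripper, and `LEMMAS` is a hypothetical lookup table standing in for a real lemmatizer's vocabulary.

```python
def crude_stem(word: str) -> str:
    """Toy suffix stripper: chops common endings with no knowledge of the language."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


# Hypothetical lookup table; a real lemmatizer derives this from a full vocabulary.
LEMMAS = {
    ("running", "VERB"): "run",
    ("ran", "VERB"): "run",
    ("agrees", "VERB"): "agree",
    ("agreed", "VERB"): "agree",
    ("meeting", "NOUN"): "meeting",
    ("meeting", "VERB"): "meet",
}


def toy_lemma(word: str, pos: str) -> str:
    """Look up the base form for (word, part of speech); fall back to the word itself."""
    return LEMMAS.get((word, pos), word)


for word, pos in [("running", "VERB"), ("ran", "VERB"), ("meeting", "NOUN")]:
    print(word, pos, "stem:", crude_stem(word), "lemma:", toy_lemma(word, pos))
```

The stemmer turns "running" into "runn" and leaves "ran" untouched, so the two never match; the lemma lookup maps both to "run". The part-of-speech key is what lets "meeting" stay a noun in one context and reduce to "meet" in another.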

Great feedback! 1) That is an awesome idea, I hadn't thought of that. We'll put that together. 2) Right now it's sumy/regex/bs4 for our tech. As you can see, it's nothing complex, but we're hoping to add some real ML to warrant our use of buzzwords. The hardest technical challenge was actually working within the Chrome Extension framework; the actual summarization (currently) is fairly straightforward. 3) spaCy lemmatization looks like exactly the next step! Thank you for that.
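For readers wondering what an extract-text-then-rank pipeline of this kind looks like, here is a rough standard-library sketch. It is not the extension's code and does not use sumy or bs4: `html.parser` stands in for BeautifulSoup, a simple word-frequency score stands in for sumy's summarizers, and the `CLAUSE_WORDS` boost is an assumption based on the cue words (will, agree, must) mentioned upthread.

```python
import re
from collections import Counter
from html.parser import HTMLParser

CLAUSE_WORDS = {"will", "agree", "must"}  # cue words mentioned upthread
STOPWORDS = {"the", "a", "an", "to", "of", "and", "or", "is", "in",
             "you", "we", "our", "your"}


class TextExtractor(HTMLParser):
    """Collect text nodes and ignore tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())  # normalize whitespace


def summarize(text: str, n: int = 2) -> list:
    """Return the n highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    freq = Counter(w for words in tokenized for w in words if w not in STOPWORDS)

    def score(words):
        if not words:
            return 0.0
        base = sum(freq[w] for w in words if w not in STOPWORDS) / len(words)
        boost = 1.0 + sum(1 for w in words if w in CLAUSE_WORDS)
        return base * boost

    ranked = sorted(range(len(sentences)), key=lambda i: score(tokenized[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]


html = ("<p>Welcome to our site.</p>"
        "<p>You agree that we may use your photos. You must not scrape the site.</p>")
print(summarize(html_to_text(html), n=2))
```

The output is purely extractive: every summary sentence is copied verbatim from the source, which is also why information can be lost without anything being misstated.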

A small resource, which may help you - http://thescipub.com/PDF/jcssp.2016.178.190.pdf

I am currently (and have been for some time, actually) interested in the same problem: text summarization. It is a hard one to master. Not a lot of work has been done. I will be happy to be of any help.

(Bonus - a recently published paper about extractive summarization - https://arxiv.org/abs/1708.04439)

These look like great resources! Thank you!

I like the concept, but using both "ai" and "blockchain" on the front page of your website triggers me.

Maybe look into the maps.org phase III clinical trials for MDMA? Pineapple Fund just gave them another $4MM in BTC (a blockchain-based cryptocurrency).

But to be fair, they were citing an award they were given. "Best use of AI/Blockchain"

I believe this is the relevant repo (tangentially linked in the footer): https://github.com/andrewnc/terms-and-conditions

When you say machine learning...do you mean NLP text summarization?

It's a cool project, I just wish its capabilities weren't being misrepresented like this. It's currently just sumy, tokenization, and summarization. Like got2surf said, you'll benefit greatly from lemmatization.

For people wondering what Sumy is:

Our students compared and benchmarked Sumy against a bunch of other popular summarization techniques, https://rare-technologies.com/text-summarization-in-python-e...

Yes, you guys are right. Currently it uses NLP text summarization. Next steps would be to incorporate some form of NaiveBayes to pick out "odd" sentences that are the norm in most documents.
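As a rough idea of what a Naive Bayes step could look like, here is a small multinomial classifier with add-one smoothing, written with the standard library only. It reads "odd" as "clauses that depart from the usual boilerplate", which is an interpretation rather than something the thread confirms. The training sentences and the two labels are made up for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def tokens(text: str) -> list:
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of training sentences
        self.vocab = set()

    def fit(self, examples):
        for text, label in examples:
            words = tokens(text)
            self.word_counts[label].update(words)
            self.doc_counts[label] += 1
            self.vocab.update(words)
        return self

    def log_scores(self, text):
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in self.doc_counts:
            total_words = sum(self.word_counts[label].values())
            score = math.log(self.doc_counts[label] / total_docs)  # log prior
            for w in tokens(text):
                count = self.word_counts[label][w]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Made-up training sentences, for illustration only.
TRAIN = [
    ("We may update these terms at any time.", "standard"),
    ("You are responsible for keeping your password secure.", "standard"),
    ("These terms are governed by the laws of the state.", "standard"),
    ("You grant us an irrevocable license to your photos.", "odd"),
    ("We may sell your personal data to third parties.", "odd"),
    ("You waive your right to a class action lawsuit.", "odd"),
]

model = NaiveBayes().fit(TRAIN)
print(model.predict("You grant us an irrevocable license to sell your data."))
```

Six sentences are nowhere near enough to train anything useful; a real version would need a labeled corpus of clauses from many terms-of-service documents.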

I don’t know if the title was changed after your comment, but I don’t see the term “Machine Learning”, and “AI” is not an inaccurate term in this context

The title used to be "Using machine learning to...".

Can it create a small FAQ based on the summary, e.g. Q) What happens if I upload a photo to this website? A) The website owns it and can use it anywhere. Q) What happens when I give them my e-mail? A) Be prepared to receive lots of promotional email from them and third-party sites, etc.

Also, if there was a sample on the home page it would be great, since I'm on mobile and can't open the Chrome store right now.

Currently we divide it up into 3 categories. 1. What you agree 2. What they agree 3. Other terms

But you're right. There is good room for thought there.

We're working on an example to put on the homepage. It's a good suggestion.
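The three-way split described above could be approximated with a couple of regular expressions keyed on who the subject of the obligation is. This is only a sketch: the two patterns and the `categorize` helper are hypothetical, since the extension's actual rules are not shown in the thread.

```python
import re

# Hypothetical cue patterns; the extension's real rules are not shown in the thread.
USER_OBLIGATION = re.compile(
    r"\byou\s+(?:will|agree|must|shall|may\s+not)\b", re.IGNORECASE
)
COMPANY_OBLIGATION = re.compile(
    r"\b(?:we|the\s+company)\s+(?:will|agree|must|shall)\b", re.IGNORECASE
)


def categorize(sentence: str) -> str:
    """Assign a sentence to one of the three categories named in the thread."""
    if USER_OBLIGATION.search(sentence):
        return "What you agree"
    if COMPANY_OBLIGATION.search(sentence):
        return "What they agree"
    return "Other terms"


for s in [
    "You agree not to resell the service.",
    "We will notify you of changes.",
    "These terms are governed by local law.",
]:
    print(categorize(s), "|", s)
```

A sentence that binds both parties lands in the first category because the user pattern is checked first; the suggested FAQ format would need something more fine-grained than this.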

Fun fact: Wordpress's permissively licensed terms of service (CC BY-SA 4.0; using it in my product, many thanks to the Automattic folks) has a treat. Just go to the page [1] and ctrl+f for "treat".

[1] https://en.wordpress.com/tos/

This is awesome! One thing: it would be really cool if clicking on one of the summary text boxes brought you to where the full details are approximately located on the page. That way if a certain clause or something is particularly relevant to you, you don't have to search the full thing for it.
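With an extractive summary, the jump-to-source idea above is mostly a matter of remembering where each sentence came from. Here is a sketch of the bookkeeping, with hypothetical helper names; an actual Chrome extension would do the equivalent in JavaScript against the page's DOM.

```python
def locate(full_text: str, summary_sentences: list) -> dict:
    """Map each summary sentence to its character offset in the full text (-1 if absent)."""
    return {s: full_text.find(s) for s in summary_sentences}


def scroll_fraction(full_text: str, sentence: str):
    """Return how far down the document the sentence starts (0.0 to 1.0), or None."""
    offset = full_text.find(sentence)
    if offset < 0 or not full_text:
        return None
    return offset / len(full_text)


full = "Intro text. You must not scrape. Bye."
print(locate(full, ["You must not scrape."]))
print(scroll_fraction(full, "You must not scrape."))
```

This only works while summary sentences are copied verbatim from the terms; once a summary paraphrases, the exact-match lookup returns -1 and some fuzzier alignment would be needed.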

There's a claim that you use blockchain technology for your technology. How's it used?

If you're referring to the hackathon award, they combined the categories into one award, "best use of AI/Blockchain". We qualified as using AI, but are not using blockchain.

How about an example of the results it produces on your website? It might convince more people to give this time if you can show some output, given either a known T&C like Apple's or, if that's a legal issue, a T&C template.

That's a solid idea. A couple of people have suggested it. I'm working on one right now to put up on the site. Thanks for the suggestion!

Alright, there is a screen shot of some of our results on the site for now. Thanks again!

Nicely done

Reading your summary wouldn't mean I've legally read the ToS though. I like that it's sort of like when you log in with FB and they clearly summarize what access they'll get.

Hey Andrew, cool product! I appreciate the work you all have put into it. I have a few things that crossed my mind:

If the TOS can be summarized into a shortened version that is understandable and readable, then was the original TOS too long and complicated to begin with?

I wonder if summarizing can really distill that which the TOS covers. I further wonder if someone reads a summarized TOS and then violates a part not covered in the summarized version, then who do they blame?

Does it matter what is written in website TOS? Are there any cases where a court decided that some surprising term in such a thing was legally binding to the user of a site?

There are cases about scraping and who owns the data, the user or the owner of the website.

Reminds me of https://tldrlegal.com, but I guess this has an AI component.

If this ever becomes the norm, we might see some attempt to obfuscate Terms and Conditions which might be really interesting to see.

I'm also concerned that this tool could miss some important pieces of information or subtleties. How can its reliability be improved?

This is a great question that we're working on. It obviously will never be as good as reading the whole thing, but you're right, how can it be improved?

I installed it and tried running it on your site, but it just showed the "summarizing" spinner forever.

Also tried it on https://www.humblebundle.com/terms (happened to have a bundle page open) - the output wasn't great. Coincidentally the bundle I was looking at is a bunch of ML books.

First, thank you for being willing to try it out! The results can be... underwhelming... at times. However, we're working hard to really nail down the tech, so if you stick with it for the next few weeks, I think you'll be pleasantly surprised at how the summarizations improve over time.

Either way, thanks for the feedback!

Please publish it as a Firefox addon!

Only 88%? I have totally given up on reading them.

What I really hate is when I get T&C at a credit card terminal.

I would guess 88% is nowhere near the actual number. People _never_ read those.

You're probably right. That was the number we got from our initial user interest survey
