
Show HN: Using AI to Summarize Terms and Conditions - andrewnc
88% of people never read the terms and conditions of websites or services they use. However, most people want to know what they are agreeing to in those terms. That is why we created Legal Leaf. We strongly believe that everyone should have easy access to those agreements, in language they can understand.

Legal Leaf works behind the scenes, in your browser, to read and summarize these terms using powerful AI. We're constantly working to improve the accuracy of these summaries. The results are displayed in the top right corner without affecting web speeds.

Legal Leaf is a beta product still going through development, but it's improving rapidly and we would love a group of willing beta testers.

http://leaf.legal
======
geuis
One concern that comes to mind if my company uses this: a user reads the
auto-generated summary but not the actual ToS, then does something that
violates the ToS. The summary didn't say it was against the rules, so the user
consciously made their choice based on that.

I'm not a lawyer, but this raises some questions. Some companies go the extra
mile to make their ToS short and readable for their users, but that text is
still (presumably) reviewed by a lawyer. If the summary is auto-generated,
that review isn't necessarily in place, unless leaf.legal is just a summary
tool subject to lawyer review before approval and publication on my website.

Also, how do we keep track of which version of the automated summary was
seen by which user? This seems like it could have legal ramifications. For
example, a user violates the ToS because something isn't in the summary.
Tomorrow, leaf.legal updates its algorithm and regenerates all the summaries
for its clients, and now the missing ToS article is in the summary.

I imagine there are some pretty solid answers on how to handle these
situations. Would love to hear how you are approaching them.

~~~
andrewnc
This is an awesome question, and one that was raised by the lawyer on our
team. Since summarization is not a lossless process, there will be some
information that the user doesn't see.

For now, we have a blanket disclaimer that our product is merely a summary
and should not be taken as representing the intent of the company whose terms
you are reading.

However, you're absolutely right, we want to make sure that the companies are
also well represented. We are building out an avenue for companies to
"contest" the summary on their page to be more accurate, or write a
personalized summary.

The same question applies to anyone: what happens if they don't read the ToS
at all and then do something that violates it? We think Legal Leaf is a step
in the right direction towards education, but there is still lots of work
left to do.

------
rhacker
I can see the next progression being something like:

You're about to upload a photo. The T&C of this website gives them irrevocable
legal rights to use this photo in [social media settings, advertising
campaigns]. Click No to stop the upload, Yes to continue, or Always to stop
warning for this website.

~~~
andrewnc
Oh that's awesome! I hadn't thought of that. This would really allow people to
change behavior based on the information they get. That's great. Thank you!

------
got2surf
This looks really interesting! Two points:

1) A quick side-by-side of a sample Terms/Conditions versus Leaf's summarized
version would be helpful. It would help me understand the product more before
I install it.

2) What ML/NLP tools did you use for this? It looks like Sumy for Python
summarization, along with a specific list of clause keywords (will, agree,
must, etc.).
When you get a chance, I'd be curious to know more about the technical
process.

Also, I noticed that you are stemming words - you may also be interested in
lemmatization, which is a slightly more complicated way of converting words
into their base forms (like running -> run or ran -> run). Lemmatization also
takes into account part of speech context. Given that legal documents are
fairly grammatical (I'm assuming?), lemmatization should work well here. I've
been fairly happy with spaCy's lemmatization results
([https://spacy.io/](https://spacy.io/))
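To make the stemming vs. lemmatization distinction concrete, here is a toy sketch. Both functions are hypothetical stand-ins written only for illustration: `toy_stem` is a crude suffix stripper (not the Porter stemmer, and not whatever Legal Leaf actually uses), and `toy_lemmatize` looks words up in a tiny hand-made table keyed on part of speech. Real lemmatizers such as spaCy's rely on full lexicons plus a part-of-speech tagger; the `LEMMAS` table here is an assumption standing in for that.

```python
# Toy comparison of stemming vs. lemmatization (illustration only).

SUFFIXES = ("ing", "ed", "es", "s")  # checked in this order


def toy_stem(word: str) -> str:
    """Strip the first matching suffix, ignoring context; may yield non-words."""
    w = word.lower()
    for suffix in SUFFIXES:
        # Keep at least 3 characters so short words are left alone.
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


# Hypothetical mini-lexicon keyed on (word, part of speech).
LEMMAS = {
    ("running", "VERB"): "run",
    ("ran", "VERB"): "run",
    ("agreed", "VERB"): "agree",
    ("terms", "NOUN"): "term",
    ("binding", "VERB"): "bind",     # e.g. "this clause is binding you to..."
    ("binding", "ADJ"): "binding",   # e.g. "a binding agreement"
}


def toy_lemmatize(word: str, pos: str) -> str:
    """Return the base form for (word, pos) if known, else the lowercased word."""
    return LEMMAS.get((word.lower(), pos), word.lower())


if __name__ == "__main__":
    for word, pos in [("running", "VERB"), ("ran", "VERB"), ("binding", "ADJ")]:
        # running: stem=runn, lemma=run
        # ran:     stem=ran,  lemma=run
        # binding: stem=bind, lemma=binding
        print(f"{word}: stem={toy_stem(word)}, lemma={toy_lemmatize(word, pos)}")
```

The stemmer turns "running" into the non-word "runn" and cannot relate "ran" to "run" at all, while the part-of-speech-aware lookup handles both and keeps the adjective "binding" intact. That context sensitivity is what makes lemmatization attractive for fairly grammatical text like legal documents.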

~~~
andrewnc
Great feedback! 1) That is an awesome idea; I hadn't thought of that. We'll
put that together. 2) Right now it's sumy/regex/bs4 for our tech. As you can
see, it's nothing complex, but we're hoping to add some real ML to warrant our
use of buzzwords. The hardest technical challenge was actually working within
the Chrome Extension framework; the actual summarization (currently) is fairly
straightforward. 3) spaCy lemmatization looks like exactly the next step!
Thank you for that link.
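The sumy/regex/bs4 combination described above amounts to extractive summarization: selecting existing sentences rather than writing new ones. The sketch below shows that general shape using only the Python standard library, so it is not Legal Leaf's code. `html.parser` stands in for bs4, `CLAUSE_RE` is a hypothetical take on the "will, agree, must" keyword list, and the scorer is a simple average-word-frequency heuristic in the spirit of Luhn-style summarization rather than any particular sumy summarizer.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.parts.append(data)


def html_to_text(html: str) -> str:
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.parts).split())  # collapse whitespace


# Hypothetical clause keywords: a sentence must contain one to be a candidate.
CLAUSE_RE = re.compile(r"\b(will|agrees?|must|shall|may not)\b", re.IGNORECASE)
STOPWORDS = {"a", "an", "and", "any", "are", "as", "be", "by", "for", "in",
             "is", "of", "on", "or", "our", "that", "the", "this", "to",
             "we", "with", "you", "your"}


def content_words(text: str) -> list[str]:
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]


def split_sentences(text: str) -> list[str]:
    # Naive splitter: breaks after ., ! or ? followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def summarize(html: str, max_sentences: int = 3) -> list[str]:
    text = html_to_text(html)
    candidates = [s for s in split_sentences(text) if CLAUSE_RE.search(s)]
    freq = Counter(content_words(text))

    def score(sentence: str) -> float:
        words = content_words(sentence)
        # Average document-wide frequency of the sentence's content words.
        return sum(freq[w] for w in words) / (len(words) or 1)

    best = sorted(range(len(candidates)),
                  key=lambda i: score(candidates[i]), reverse=True)[:max_sentences]
    return [candidates[i] for i in sorted(best)]  # restore document order


if __name__ == "__main__":
    page = ("<html><head><style>p { color: red }</style></head><body>"
            "<p>Welcome to our site.</p>"
            "<p>You agree that we may use your photos in advertising.</p>"
            "<p>We will share your email with partners.</p>"
            "<p>You must not scrape the site.</p></body></html>")
    for sentence in summarize(page, max_sentences=2):
        print("-", sentence)
```

Filtering on clause keywords before scoring keeps boilerplate like greetings out of the summary, at the cost of silently dropping any obligation phrased without one of those words, which is exactly the kind of omission raised elsewhere in this thread.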

------
rcshubhadeep
A small resource, which may help you -
[http://thescipub.com/PDF/jcssp.2016.178.190.pdf](http://thescipub.com/PDF/jcssp.2016.178.190.pdf)

I have been interested in the same problem, text summarization, for some
time now. It is a hard one to master, and not a lot of work has been done on
it. I would be happy to help.

(Bonus - a recently published paper about extractive summarization -
[https://arxiv.org/abs/1708.04439](https://arxiv.org/abs/1708.04439))

~~~
andrewnc
These look like great resources! Thank you!

------
HaHa31
I like the concept, but using both "ai" and "blockchain" on the front page of
your website triggers me.

~~~
ada1981
Maybe look into the maps.org Phase III clinical trials for MDMA? The Pineapple
Fund just gave them another $4MM in BTC (a blockchain-based cryptocurrency).

But to be fair, they were citing an award they were given: "Best use of
AI/Blockchain".

------
bckmn
I believe this is the relevant repo (tangentially linked in the footer):
[https://github.com/andrewnc/terms-and-conditions](https://github.com/andrewnc/terms-and-conditions)

------
needcaffeine
When you say machine learning...do you mean NLP text summarization?

~~~
needcaffeine
It's a cool project. I just wish its capabilities weren't being misrepresented
like this. It's currently just sumy, tokenization, and summarization. As
got2surf said, you'll benefit greatly from lemmatization.

~~~
Radim
For people wondering what Sumy is:

Our students compared and benchmarked Sumy against a bunch of other popular
summarization techniques:
[https://rare-technologies.com/text-summarization-in-python-e...](https://rare-technologies.com/text-summarization-in-python-extractive-vs-abstractive-techniques-revisited/)

------
superasn
Can it create a small FAQ based on the summary? For example:

Q: What happens if I upload a photo to this website?
A: The website owns it and can use it anywhere.

Q: What happens when I give them my email?
A: Be prepared to receive lots of promotional email from them and from
third-party sites, etc.

Also, if there were a sample on the home page it would be great, since I'm on
mobile and can't open the Chrome Web Store right now.

~~~
andrewnc
Currently we divide it into 3 categories:

1. What you agree
2. What they agree
3. Other terms

But you're right. There is good room for thought there.

We're working on an example to put on the homepage. It's a good suggestion.
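A rule-based version of the three-bucket split described above might look like the sketch below. The thread does not say how Legal Leaf assigns sentences to buckets, so `USER_RE`, `SITE_RE`, and `categorize` are hypothetical: sentences where the user is the subject of an obligation verb go to "What you agree", sentences where the site is the subject go to "What they agree", and everything else falls through to "Other terms".

```python
import re

# Hypothetical patterns: a subject directly followed by an obligation/permission verb.
USER_RE = re.compile(r"\b(you|users?)\s+(agree|must|will|shall|may not|consent)\b",
                     re.IGNORECASE)
SITE_RE = re.compile(r"\b(we|the company|the service)\s+(agrees?|will|shall|may|reserves?)\b",
                     re.IGNORECASE)

CATEGORIES = ("What you agree", "What they agree", "Other terms")


def categorize(sentences):
    """Assign each sentence to exactly one bucket; user obligations are checked first."""
    buckets = {name: [] for name in CATEGORIES}
    for sentence in sentences:
        if USER_RE.search(sentence):
            buckets["What you agree"].append(sentence)
        elif SITE_RE.search(sentence):
            buckets["What they agree"].append(sentence)
        else:
            buckets["Other terms"].append(sentence)
    return buckets


if __name__ == "__main__":
    sample = [
        "You agree to receive promotional email.",
        "We may terminate your account at any time.",
        "These terms are governed by applicable law.",
    ]
    for name, items in categorize(sample).items():
        print(name, items)
```

Checking the user-side pattern first means a sentence that binds both parties lands in "What you agree", which errs toward showing users their own obligations. Real terms use far more phrasings than these two regexes cover, so a sketch like this would miss plenty.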

------
adtac
Fun fact: WordPress's permissively licensed terms of service (CC BY-SA 4.0;
I'm using it in my product, many thanks to the Automattic folks) has a treat.
Just go to the page [1] and Ctrl+F for "treat".

[1] [https://en.wordpress.com/tos/](https://en.wordpress.com/tos/)

------
tetchart
This is awesome! One thing: it would be really cool if clicking on one of the
summary text boxes brought you to where the full details are approximately
located on the page. That way if a certain clause or something is particularly
relevant to you, you don't have to search the full thing for it.

------
DoritoChef
There's a claim that your product uses blockchain technology. How is it
used?

~~~
andrewnc
If you're referring to the hackathon award: they combined the category to say
"Best use of AI/Blockchain". We qualified for using AI, but we are not using
blockchain.

------
harryf
How about showing an example of the results on your website? Some sample
output might convince more people to give this a try, using either a
well-known T&C like Apple's or, if that's a legal issue, a T&C template.

~~~
andrewnc
Alright, there is a screenshot of some of our results on the site for now.
Thanks again!

~~~
harryf
Nicely done

------
jasonsmash
Reading your summary wouldn't mean I've legally read the ToS, though. I like
that it's sort of like when you log in with FB and they clearly summarize what
access they'll get.

------
sinab
Hey Andrew, cool product! I appreciate the work you all have put into it. I
have a few things that crossed my mind:

If the TOS can be summarized into a shortened version that is understandable
and readable, then was the original TOS too long and complicated to begin
with?

I wonder if summarizing can really distill everything the TOS covers. I
further wonder: if someone reads a summarized TOS and then violates a part not
covered in the summary, who do they blame?

------
TekMol
Does it matter what is written in website TOS? Are there any cases where a
court decided that some surprising term in such a thing was legally binding to
the user of a site?

~~~
riku_iki
There are cases about scraping and about who owns the data: the user or the
owner of the website.

------
swyx
Reminds me of [https://tldrlegal.com](https://tldrlegal.com), but I guess this
has an AI component.

------
jprissi
If this ever becomes the norm, we might see attempts to obfuscate Terms and
Conditions, which would be really interesting to see.

I'm also concerned that this tool could miss some important pieces of
information or subtleties. How can its reliability be improved?

~~~
andrewnc
This is a great question that we're working on. It obviously will never be as
good as reading the whole thing, but you're right, how can it be improved?

------
eppsilon
I installed it and tried running it on your site, but it just showed the
"summarizing" spinner forever.

~~~
eppsilon
Also tried it on
[https://www.humblebundle.com/terms](https://www.humblebundle.com/terms)
(happened to have a bundle page open) - the output wasn't great.
Coincidentally the bundle I was looking at is a bunch of ML books.

~~~
andrewnc
First, thank you for being willing to try it out! The results can be...
underwhelming... at times. However, we're working hard to really nail down
the tech, so if you stick with it for the next few weeks, I think you'll be
pleasantly surprised at how much the summaries improve.

Either way, thanks for the feedback!

------
fiatjaf
Please publish it as a Firefox addon!

------
BatFastard
Only 88%? I have totally given up on reading them.

What I really hate is when I get T&C at a credit card terminal.

------
asow92
I would guess 88% is nowhere near the actual number. People _never_ read
those.

~~~
andrewnc
You're probably right. That was the number we got from our initial user
interest survey.

