
Structuring Legal Documents with Deep Learning - arnaudmiribel
https://blog.doctrine.fr/structuring-legal-documents-with-deep-learning/
======
avinium
Interesting article, but it's not really clear to me what the actual
application is (and this is coming from someone who started their career in law,
and was recently working on using ML for contract review automation).

As far as I can tell, it's intended to classify each paragraph as one of
"facts", "pleas", "grounds", etc. On its own, this doesn't seem particularly
useful. I assume it feeds into the broader Doctrine platform (litigation
search and case summaries perhaps?). I jumped to their website but couldn't
say for sure as I don't speak French.

~~~
arnaudmiribel
That's totally right. In our original dataset, 45% of decisions have no
explicit titles splitting categories, so it's quite difficult to read them.

We classify each paragraph as one of "facts", "pleas", "grounds", etc. in
order to generate a table of contents and help users read a court decision
more easily. For example, one may want to jump directly to the operative part
of the judgement: they just have to click on the "Operative part" of the table of
contents, instead of having to read the whole decision and guess where that part
actually starts.
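
A minimal sketch of that last step, assuming each paragraph already has a predicted label: consecutive paragraphs with the same label are collapsed into one table-of-contents entry pointing at the first paragraph of the run. The function name `build_toc` and the sample labels are hypothetical illustrations, not Doctrine's actual code.

```python
from itertools import groupby


def build_toc(labels):
    """Collapse per-paragraph labels into (section, first_paragraph_index) entries.

    `labels` is a list with one predicted label per paragraph, in document order.
    """
    toc = []
    index = 0
    for label, run in groupby(labels):
        toc.append((label, index))   # the section starts at the first paragraph of the run
        index += len(list(run))      # skip past every paragraph in this run
    return toc


# Hypothetical predictions for a six-paragraph decision.
labels = ["facts", "facts", "pleas", "grounds", "grounds", "operative part"]
for section, start in build_toc(labels):
    print(f"{section}: starts at paragraph {start}")
```

A label that reappears later in the document simply produces a second entry, so a misclassified paragraph in the middle of a section shows up as a spurious split rather than being silently hidden.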

It's indeed intended to improve our users' experience on Doctrine's website
[https://www.doctrine.fr](https://www.doctrine.fr) (legal search engine).

------
randcraw
In reading the webpage, I must have missed the explanation for the purpose of
this system. I know nothing about the domain, so the following are newbie
questions.

What is the tool's intended output? Is it a discrete classification of
documents (or their problem spaces) into predefined categories? What's the
number of categories (and some examples)?

What's the cost of simplifying/ignoring most or all of the contextual details
in each case? Wouldn't that make a small number of output classes
dysfunctionally procrustean?

98.5% accuracy sounds pretty good, but how does system performance compare to
the state-of-the-art alternatives that use more traditional probabilistic
tag-based methods? Does it tend to fail consistently, perhaps on specific topics
or complications?

~~~
woliveirajr
> Our goal here is to detect the structure of decisions on Doctrine (i.e. the
> table of contents) to help users navigate through them more easily.

A legal document from France seems to have some common, structured topics, but
as lawyers are free to write in any fashion, simply finding where a specific
part sits in the document is time-consuming enough.

------
woliveirajr
This is interesting. I graduated in Law and did an MBA, a master's degree and a
PhD in IT, using ML (and then deep learning) to extract meaningful context from
legal documents. Our documents are structured more or less like the French
ones, and the main difficulty is the use of synonyms and expressions instead
of objective, direct words.

In our case, using Doc2Vec achieves better results; I will try wang2vec to see
if it improves anything.

------
amelius
This seems like a band-aid. Isn't it time lawyers started using a formal
language instead?

~~~
rayiner
To what end? This isn’t like indexing eBay listings, where using standard
structures and fields would help you categorize a large volume of low
complexity information. Categorization and search of information are not tasks
that take up much time in the law.

I just had a trial which turned on a couple of sentences in a contract. I
could have a paralegal pull every relevant legal in a couple of hours, through
“dumb” terms and connectors searching. More than a year of litigation boiled
down to applying a rather small volume of law to a large volume of human
facts. Even in appeals, which are case law heavy, the process of finding and
categorizing the law takes vanishingly little time compared to synthesizing it
into a legal framework, and applying it to a large volume of facts.

------
mingodad
Is the original data freely available on the net? I mean, do the courts make
this public on the web?

~~~
arnaudmiribel
Currently, not all this data is freely available on the net. Doctrine has
actually established partnerships with many French jurisdictions to collect
court decisions and publishes them on
[https://www.doctrine.fr](https://www.doctrine.fr) :)

------
ngc6677
AI providing legal analysis assistance? Taking legal decisions in court?
Great!

