Why isn't there a Google for the law? (openlawlib.org)
371 points by vqc on Jan 2, 2017 | 214 comments

Anecdote: I was at Google in 2006, and there was (is?) a database of ideas people shared that Google might pursue, as their motto was to organize the world's information and make it easily accessible.

I had gotten a ticket for obstructing the intersection (the "anti-congestion" law in California that sought to limit gridlock by making it illegal to enter an intersection that you couldn't exit before the other light changed). I was fighting it and wanted to find other cases that had been decided on this law[1]. The only way to do that was to go to the public library and look through their published volumes of decisions and cases. So I thought, here is a really useful thing Google could do, and it isn't even a hard problem: collect the decisions that the courts publish anyway, and just connect the ones that are about the same part of the code. Match a number, match a date. And it would be very useful to people who are fighting cases. It saved me some time and money [2]. But the idea never made it past the discussion stage because, as I was counseled, doing that would take on "powerful interests" who would really fight back hard, and Google didn't want to draw that level of scrutiny. It wasn't until Carl Malamud started attacking this problem in earnest[3] that it became clear to me what it means to take away a revenue stream from lawyers.

[1] And learned this is called 'Sheparding' based on finding citations -- https://en.wikipedia.org/wiki/Shepard's_Citations

[2] turns out there had been no case law on this particular law and lots of dismissals so I just entered a plea of not guilty at the clerk, and the court informed me a week before my trial date that the prosecution had declined to prosecute.

[3] https://www.techdirt.com/articles/20150726/23080731763/even-...
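The "match a number, match a date" idea in the comment above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: the citation regex, the record layout, and the sample decisions (case names and section numbers) are all invented for the example, and real citation formats are far messier.

```python
import re
from collections import defaultdict

# Hypothetical citation pattern: matches "§ 1234" or "section 1234".
# Real-world citation formats vary much more widely than this.
SECTION_RE = re.compile(r"(?:§|[Ss]ection)\s*(\d+(?:\.\d+)?)")

def build_index(decisions):
    """Map each cited section number to the decisions citing it, oldest first."""
    index = defaultdict(list)
    for decision in decisions:
        # A set avoids listing a decision twice when it cites a section repeatedly.
        for section in set(SECTION_RE.findall(decision["text"])):
            index[section].append(decision)
    for cited in index.values():
        # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
        cited.sort(key=lambda d: d["date"])
    return dict(index)

# Invented sample data, for illustration only.
decisions = [
    {"name": "People v. A", "date": "2004-03-01",
     "text": "Defendant was cited under Vehicle Code section 1234."},
    {"name": "People v. B", "date": "2001-07-15",
     "text": "Violations of § 1234 and § 5678 were alleged."},
    {"name": "People v. C", "date": "2005-01-10",
     "text": "The only issue is section 5678."},
]

index = build_index(decisions)
print([d["name"] for d in index["1234"]])  # ['People v. B', 'People v. A']
```

Looking up a section number then returns every decision that mentions it in date order, which is the easy part; as later comments point out, judging which of those decisions actually matter is much harder.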

Chuck, I'm not sure who you talked to at Google in 2006, but you should know we basically did it already :) It was being done then, in fact. It made it well past the discussion stage, and was already being implemented.

We spent a lot of time building up a case law corpus, pushing to open Pacer (meeting with the federal judiciary and offering to just pay the revenue they make), etc.

Scholar has pretty up to date legal data since then, from OCR'ing federal reporters to parsing it, to what have you.

In fact, ask Carl. We worked really closely with him and have helped fund him for years, both directly, and through public.resource.org (https://yeswescan.org/index.court.html - 9F is Anurag, who started/ran Google Scholar, 22F is the group I ran)

A lot of states have exclusive contracts with westlaw or nexis or what have you, so it's non-trivial.

The target was also to try to have data for the kind of situation you mention, not to try to replace the dedicated legal resources that exist.

Anyway, again, not sure why you think Google didn't do it.

> A lot of states have exclusive contracts with westlaw or nexis or what have you, so it's non-trivial.

Have you considered bidding for the contracts when they come up?

You say this as if they aren't like 10+ year contracts :) Also, they often come with crazy requirements and local IT support (designed this way in order to ensure only lexis/westlaw can fulfill them).

That sounds suspiciously like Corruption.

Sorta, but not really. Usually what happens is that a company works with a state or local government on custom projects for so long that they create their own niche which is extremely difficult for a newcomer to fill. This is mostly just an emergent property of bureaucracy. Eventually these niches are choked off and die, however it takes decades.

> pushing to open Pacer (meeting with the federal judiciary and offering to just pay the revenue they make)

Wait what? You offered to pay the ~$145M/year they bring in for access fees in return for an open access PACER system? Was this offer just made to the PACER people or was it brought to the attention of the Judicial Conference?

We talked to the Judicial Conference, of course. Honestly, this was a long time ago at this point. We may have only offered to pay what it costs them to run it instead of revenue. It was definitely somewhere between those two.

(I realize there is a huge difference in those two numbers, but we were also idealistic and thought they might actually care about public access at the time. Look how dumb we were)

Interesting. I have never been able to put my finger on if the resistance to opening pacer comes from the judiciary itself (judges fearing public scrutiny of the entire corpus of court filings) vs the Admin Office / IT wanting to maintain what is essentially a court-tech slush fund. Probably the reality is a little bit of both. I think the only way it gets fixed is with congressional action, so not holding my breath.

From what I can tell, it's the latter, not the former.

They are so underfunded otherwise that they can't afford to give it up.

Does Google Scholar have transcripts from trials before appealing?

The anti-gridlock law is actually a great law that DOES stop gridlock. The problem is you need to get people to actually follow it and in california for some reason no one does. In Chicago we all know about this law, and I never saw gridlock the whole time I lived there (23 years). Crazy, right? Not really. But to the people in LA it seems crazy. Within a year of moving to LA, I saw gridlock. I had no idea this was still a problem in the United States. I thought it was only a 3rd world problem. But nope. In LA they have it. And it's because people block intersections here ALL THE FUCKING TIME. It's crazy.

Why? I don't get it. Do LA drivers really not know it causes gridlock? Or do they just not care? I remember talking to an Uber driver who didn't even know what gridlock was, or why blocking intersections was bad. As someone from Chicago, I was amazed. And even on HN, OP doesn't know that blocking intersections causes gridlock; it amazes me. I thought this was common knowledge. For some reason people in these gridlocked towns are skeptical that the law works.

For example, in downtown LA they blocked one street for a festival and the entire downtown became gridlocked. I've never seen traffic this bad IN MY ENTIRE LIFE. All because of people blocking intersections. I was able to walk 10 blocks faster than cars did. I saw cars sitting in place for an hour and more. It was crazy. I've never seen anything like it. And when the lights turned green people still blocked the intersection. THEY WERE SITTING IN GRIDLOCK AND STILL DIDN'T STOP BLOCKING INTERSECTIONS. Like wow. I was dumbfounded. How are LA drivers so bad at driving?? I was so glad to be on foot.

It's a completely fair law; it's easy to follow, and it benefits everyone. blocking intersections DOES CAUSE GRIDLOCK. IF YOU STOP BLOCKING INTERSECTIONS, GRIDLOCK WILL GO AWAY. Get this through your head!

The problem is people don't know about the law in California, and don't have the common sense to figure it out themselves. It needs to be enforced more. LA has some of the worst gridlock problems I've ever seen.

> The problem is you need to get people to actually follow it and in california for some reason no one does.

This problem isn't specific to this law. There are many laws which exist but aren't being enforced. I'm from NL, and we're known for using bicycles a lot, and this is true. Yet cyclists don't use their hands to signal where they're going, they ride through red lights, ride without lights, and ride on the wrong side of the road all the time. And they get away with it because the cops aren't enforcing the rules.

> It's a completely fair law; it's easy to follow, and it benefits everyone. blocking intersections DOES CAUSE GRIDLOCK. IF YOU STOP BLOCKING INTERSECTIONS, GRIDLOCK WILL GO AWAY. Get this through your head!

Another reason it isn't working is due to selfishness plus ignorance. Inform drivers. Heck, teach it during driver license exams.


1) Inform the users (in this case the general public).

2) Enforce the rules.

I would like to add laws need to be simple to explain, but I'm not so sure on how to describe that point.

I'm sure that people do know, but selfishness drives (pun not intended) people to do it. When there was a horrible freeze down here a few years back, traffic became gridlocked. Yes, the lack of traction caused issue, but the main problem was that everyone was hellbent on getting home without regard to how the flow of traffic normally works.

I witnessed gridlock happen. At a particularly odd, 6-way intersection, people started edging the cars into the intersections. The light would change, cross traffic would get pissed, and then they'd edge around the blocking traffic creating a snaked grid. As cars moved out the way in the cross, the other section would block more and more until the whole intersection was blocked. It was a beautiful tragedy of bucket crabs[0] stopping progress for all. That's when I pulled into a nearby lot, called a friend, and just crashed with them for the night.

[0]: https://en.wikipedia.org/wiki/Crab_mentality

For what it's worth, I was taught to sit in the intersection when making a left turn in driver's ed.

Maybe it's an east west thing, but here is a video of a guy complaining about people who DON'T enter the intersection: https://www.youtube.com/watch?v=q_bcjCOzob4

Again, he is saying that the proper thing to do is to enter the intersection and sit, regardless.

So, it seems like the issue may be that in some areas this behavior is promoted (even in drivers ed classes) and in others it's illegal.

Of course, I am not from a large city that deals with gridlock, so that may be part of the difference.

If there is no space in the lane you're driving into, whether turning or going straight, you should not enter the intersection. That video just doesn't deal with the case of the lane you're turning into being full already.

You're supposed to enter the intersection to make a restricted turn across oncoming traffic. Though I believe only one car should do it at a time.

It's different from gridlock because you know that when the oncoming traffic stops, there is somewhere for your car to go.

Except that you block the view for oncoming vehicles turning the opposite direction (their left). A properly labeled intersection has white lines (in the US, other places may use something else) that indicate where to stop, and if engineered properly, stopping before these white lines leaves a clear view.

Which doesn't matter, because after the oncoming traffic gets the red light you take your left and are no longer blocking anyone's view.

Huh? You are still blocking the view and preventing the other side from turning left safely until the traffic clears and you can turn.

It annoys me as well when people don't enter the intersection on green when waiting for oncoming traffic to clear so that they can turn left. However I don't think it's related to the gridlock discussion unless the road you are turning onto is backed all the way up to the intersection, so that you will not be able to complete the turn even when oncoming traffic clears. In that case I believe you should not enter the intersection.

Your comment is not about gridlock.

You are supposed to enter the intersection when turning left (then turn when safe), but not if there is no room to complete your turn in the target lane.

The whole point is to not have people clogging the intersection and preventing movement when the lights change. That's what creates gridlock instead of just a backup on one street.

I would appreciate it if anyone could point to any noteworthy studies documenting the impact that avoidable gridlock can have on traffic flow. I would like to contact administrators in my city to get them to consider a public education campaign to teach drivers about turning lanes and pedestrians about walk signals. The combination of these two things means that at rush hour we're often lucky if we get 2 cars at a time through a light (1 lane plus a turning lane: a driver trying to turn leaves his ass in the main lane, preventing anyone else from passing; meanwhile pedestrians don't respect the don't-walk signal, so sometimes the turning driver can't even make it through after the light has turned red, because he has to wait for someone crossing and by then the main traffic is coming).

There is a similar law in the UK, which works well and is enforced automatically through the use of ANPR cameras at junctions. (As I know to my cost, since I misjudged and ended up stationary with my back wheels on the yellow hatching.)

Block an intersection in Chicago and people will usually make you feel like you're a bad person and you did a bad thing. I've seen people nearly pull onto the sidewalk to get out of an intersection they were blocking.

Similar in Manhattan ... every intersection has large signs that say, "Don't Block the Box". Driving there has made me more conscientious when driving in other areas that don't seem to enforce this at all.

It's a good law, but not always easy to follow. Sometimes traffic is moving along slowly, and then suddenly stops when you're already into the intersection. This mainly happens when you're immediately behind a large vehicle like a city bus and can't see what's further up ahead.

You don't have to predict whether your exit will be clear. You just wait until you _can_see_ it's clear.

In the UK, it's pretty simple: "You may enter a yellow box junction when your exit is clear and there is enough space on the other side of the junction for your vehicle to clear the box completely without stopping."

It's not complicated. It just requires an incentive (don't break the rule, don't pay a 1000 GBP fine).


Though it might not be obvious to American drivers, this is the solution.

There will always be that driver who attempts to change lanes mid-intersection to get into an opening first; that's part of what motivates American drivers to enter the intersection before their exit is clear.

(Lane changes within the intersection are illegal in the US, but rarely enforced.)

American drivers also accelerate quickly only to brake desperately, which creates "openings" for rapid lane changes as well as more trouble staying out of intersections. In spite of higher accidents, more fuel costs, and even occasional tickets (for not signalling properly), this is a persistent driver behavior in many areas like California.

False. In California and some other states the vehicle code doesn't prohibit changing lanes in an intersection. Although depending on the circumstances an officer might cite you for an unsafe lane change.

That might fly in the land of socialism but here on the free side of the lake sitting at the intersection while the light is green will just encourage people to take their "right on red" in front of you ensuring you never make it across, unless of course you're at an intersection where the locals have been deemed to be idiots and are not allowed to take a right on red.

Yes, but if traffic is moving slowly and you're at a stale green light, then you simply don't enter the intersection until you know you can get through it. Sure, you might miss the light, but blocking the intersection means everyone you're blocking will miss it instead.

Here in Europe people often enter roundabouts and junctions before the exits are clear and it is against traffic law, the reason given for it being a wrong thing to do is that there should be a clear path through for emergency vehicles. Surely that is reason enough to implement a law in US states?

Which part of Europe? According to everyone I know in Sweden (including myself) it is really frowned upon to do that. If you enter without being sure that you'll be able to exit in time you're a massive jerk.

From the UK originally, now living in Norway. Norwegian drivers are 50% ok-ish but often irrational, 50% massive jerk imo.

From the UK here too, I've been living in Norway for 30 years now and Norwegian drivers only really started to get the hang of roundabouts and pedestrian crossings (zebra crossings) about five years ago. In the Drammen area at least.

Lost count of the number of times I have been almost run down on pedestrian crossings but it's much better now than it was.

One thing that is very good here is that everyone gets out of the way of emergency vehicles very promptly.

I work with a guy who got a fine and points for not stopping at a zebra crossing in Holmestrand, and the person the police meant he should stop for wasn't even crossing. Apparently the rule is that you should stop even if they look like they might cross.

My biggest pet peeve is that many people don't seem to understand slip roads and tailgating is rife. Also that patience is a virtue, if I'm in the middle of a 3 point turn on a country road then don't drive around me, just wait the 4 more seconds it will take me complete the maneuver. And parallel parking, don't get me started.

I see this all the time: A group of (say) 3 cars enters the intersection when there is plenty of space for 3 cars on the other side of the intersection. One of the first two drivers stops way short, leaving the third car stranded in the intersection. Sometimes it doesn't matter if you "know".

No, it's very easy. Never enter the intersection unless you can make it all the way across. Even if the light is green you stop at the intersection if you don't have room to go on the other side.

So sit and wait for a couple seconds for a spot to be open before you move across the intersection.

> I just entered a plea of not guilty at the clerk, and the court informed me a week before my trial date that the prosecution had declined to prosecute.

It is disappointingly common. To the point you can unironically say: tickets are a tax on the busy, and those unfamiliar with or frightened by the courts.

This shotgun "see what sticks" approach is infuriating.

My wife had an even more annoying case of this in SF. They actually set a trial date, she went, sat in court for hours, and then when the trial came up it turned out that the cop who had given her the ticket hadn't shown up, so she got to go home and get the money from the ticket back. Then it took the city about a year to pay the money back. Just imagine if we had taken a year to pay the ticket...

CaseText (https://CaseText.com) is pretty close to Google for law. They have a new tool that is aimed at lawyers doing legal research where they suggest other relevant cases based on the cases you're already citing.

Disclaimer: I have nothing to do with CaseText but I do know the CEO as a result of being a legal operations software CEO and being in YC with them.

I came here to recommend CaseText as well. It's a superb resource for law-related research.

>I had gotten a ticket for obstructing the intersection (the "anti-congestion" law in California that sought to limit gridlock by making it illegal to enter an intersection that you couldn't exit before the other light changed).

You monster. :)

To be fair, there is a lot of traffic created from this problem--especially in the Financial District in San Francisco. Whereas NYC has those "don't block the box" signs and it seems like their drivers are much more aware of it.

SF does have these signs, for example on Battery crossing onto Market [1] (including most intersections onto Market). Unfortunately it doesn't look like the intersection I see block most often (California / Montgomery) has any, and I've been honked at plenty of times waiting for the intersection to clear, so education is probably pretty bad.

[1] https://www.google.com/maps/place/San+Francisco,+CA/@37.7914...

I don't know what NYC you're referring to but the one I live in has huge problems with asshats blocking intersections and I've never once seen the NYPD pick up anyone for it (or any other traffic violation).

Of course why anyone drives in Manhattan is beyond me.

If there's one thing New Yorkers like doing, it's blocking the box. I haven't seen the grid get locked though.

Oh it's not just traffic. It makes it dangerous for pedestrians as well. You're either waiting patiently for the intersection to clear (hint: it won't) or you're walking in traffic, around the obstruction. I suppose the latter isn't so bad until someone else that's blocking the box tries to move and nearly runs you over.

The MTA has the right approach. At a number of intersections downtown the traffic enforcement guys just hang out at the intersection taking pictures and writing down license plates of folks who block the box.

Oh, I agree. I was gently chiding OP. I think blocking the intersection like that is one of those things that everyone hates, but has also committed at one point or another. Though from his explanation of the situation in another subthread, it sounds like he was in the right after all.

No one is going to explain how they're wrong, if he was ticketed the officer involved had a perspective that included him not being in the right.

I'm sad to hear that gridlock laws were not enforced. Gridlock is a pet peeve of mine as it wastes a lot of time unnecessarily. It's a simple coordination problem, with an easy solution (yellow boxes with expensive, automatic fines) that's proven to work (at least in London).

One day ~3 months ago, I spent >3 hours going 16km in Beijing, as large parts of the city were gridlocked: https://news.ycombinator.com/item?id=12980961

Sheparding isn't hard and Google Scholar does an adequate job of it for cases in its database. Doing anything more sophisticated is much harder though.

Take a simple query I needed the other day: when is it proper to file a summary judgment motion before claim construction of a patent? The query "summary judgment before claim construction" on Google Scholar returns a bunch of highly-cited cases, and on the first page only one is even a patent case, and none answer the question.

In contrast, the first hit on WestLaw is a directly relevant case. It's a District of South Carolina case, cited by a single other case, but it answers the question! And that's with a free-form query, which I'd never do. One minute of planning to create a good terms-and-connectors search immediately returns mostly-relevant cases.

Why were you fighting it?

The citing officer was on the corner of the intersection, and upon seeing him, a car on my left that had been heading for the carpool lane opted instead to cut in front of me to stay in the legal lane. Both driving in the carpool lane and the unsafe lane change were illegal on the part of this other car. My argument (the officer had seen this whole thing; we talked about it when he wrote up my ticket) was that safely avoiding a collision with the illegal driver was the proximate cause of the alleged violation. I was also irritated he didn't cite the other driver for his egregious actions.

That would have rubbed me the wrong way too. Good on you for pushing back and winning.

s/Sheparding/Shepardizing </lawyer pedantry>

Reading this made lawyers the closed source programmers of society in my mind.

Yes. This. Not just in terms of access to code but general willingness to be helpful to your colleagues also. As a lawyer turned programmer I remember when I discovered IRC thinking "Why can't attorneys be this chill with each other?!"

> I had gotten a ticket for obstructing the intersection > I was fighting it

I hope you lost.

But see the charity BAILII: http://www.bailii.org/

They don't seem to have been targeted by UK lawyers. If anything they're a respected site.

So much for google ever really caring about anything other than an easy advertising revenue stream.

Appropriate that this subject makes an appearance on HN so close to the anniversary of Aaron Swartz's suicide under duress and attack from the Federal Bureau of Investigation.

Making court documents more publicly accessible was one of Aaron's projects (circa 2008). He and project collaborators downloaded more than a million documents from the government's PACER electronic access system using public library terminals and attracted the attention of the FBI.[1] Part of the goal at the time was to uncover privacy violations in filed court documents that were legally a matter of public record but behind a lucrative, government administrated pay wall.[2]

There is something important to be said for the social and moral importance of keeping the public record publicly accessible. The justification for these intermediaries to exist and extract rent from the cataloging of public information grows slimmer and slimmer, but cataloging and indexing everything in a common law (precedent based) system is tremendously expensive. I suspect that developing an algorithm to usefully search the dense and interweaving web of judicial opinions, case history, written legislation, and jurisdictions in which all those elements apply/overlap/supersede each other is also a massive capital investment.

It all does have to be paid for somehow, and I don't think how to fund is a settled question. Pay walls clearly have pernicious externalities (privacy violations go unnoticed; access to law is practically limited to professionals for whom the costs are a business expense). But I don't trust the state to properly fund or develop such a service through general tax either.

Consider supporting the individuals in this thread who are working to make that sort of open information access in law a reality, and consider also who will seek rent from the finished service and who will not.

[1]: https://en.wikipedia.org/wiki/Aaron_Swartz#PACER

[2]: https://public.resource.org/crime/

The law is formed by the government and the judiciary and applies to everyone living in or visiting the State. It's entirely reasonable to require the State to pay for unfettered access to the law.

While access to case law is in fact required unless the case has been sealed by a judge, convenient and free access is not required. The states can charge for paper copies and make those copies only accessible via a USPS mailed form. That's the underlying problem. Barriers to access can be an effective tool for creating "expert" silos, where only those who have the means and the correct keys may travel.

Any idea of what the cost to, say, buy up all the case laws would be? And what is the law on republishing said copies?

Most docs are available through Pacer. Pacer pricing is $0.10 per page capped at $3 per document. Buying everything would cost a lot. In addition, new case law is continuously being created. So you couldn't buy everything and be done with it.

You can republish... court docs are in the public domain. RPXCorp tried doing this with patent law. For a time, they made everything free. That practice didn't last and they now pass on their costs to customers.
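As a rough illustration of the pricing quoted above ($0.10 per page, capped at $3 per document), here is a small sketch of the arithmetic. The function name and the sample page counts are made up for the example, and it ignores any fee waivers or other rules not mentioned in this thread.

```python
def pacer_cost_cents(pages, per_page_cents=10, cap_cents=300):
    """Fee for one document in cents: per-page rate, capped per document."""
    return min(pages * per_page_cents, cap_cents)

# Invented page counts for four documents.
page_counts = [4, 12, 30, 250]

total_cents = sum(pacer_cost_cents(p) for p in page_counts)
print(f"${total_cents / 100:.2f}")  # $7.60
```

The cap kicks in at 30 pages, so a 250-page filing costs the same as a 30-page one; the total is driven mostly by the number of documents, which is why buying everything adds up.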

Just as it is entirely reasonable not to trust the state to do so either competently or altruistically.

As a lawyer I've thought a lot about this (and if anyone is working on this and wants to talk get in touch). There are two reasons:

1. Reasoning in law relies on complex language semantics, both in statute and case law. Take for example a court decision that says "in the circumstances of this case I do not agree that John v Doe applies". That can be expressed a million ways and I'm not sure our natural language processing can replace humans yet in this area.

2. There are a lot of copyright problems that need to be overcome. Companies like Lexis and Westlaw own the rights to a lot of decisions and even statutes and can paywall them. This is slowly changing, however; for example, in the UK the courts recently took back the rights to publish decisions.

It does not seem ethical to hold a person accountable for not following a law, if they do not have free access to read that law and the various ways the court has ruled to how that law should be applied.

That defense is, for particular reasons, not going to work. The supposition is trivial whether the texts are paywalled or not, because the sheer amount of text doesn't leave any chance to read it all.

Clearly though, reasons for the texts to be free are obvious and reasons against it are less than before the proliferation of the internet.

I agree entirely; however, law is like all professions, where access to information is only half the equation: its application and interpretation are derived from extensive training and experience. So I'd argue that until we nail 'Google for the law', access to free lawyers at least for the poor etc is more important than access to the legal databases.

If you start from the assumption that the law is whatever lawyers and judges tend to think it is, then access to lawyers is more important. If you take the egalitarian perspective that the law is the law and a lawyer is just someone particularly skilled in applying it, then a person of average education should be able to handle a routine legal dispute without paying a specialist. This is what people have in mind when they want to make the law more available online.

If there are laws out there that are currently applied or interpreted differently than their plain meaning as written down, that's a failure of government. Either legislators should have fixed a stupid law, or judges should have thrown it out for vagueness.

The problem is that both of your assumptions are true. The fact is that the law is constantly being discovered. To the extent that an area of law is well explored, a layperson should be empowered to handle it alone, but to the extent that it is not, it requires abilities that have not been instilled in the average citizen.

>access to free lawyers at least for the poor etc is more important than access to the legal databases

That itself is a problem, while we have public defense lawyers, we don't have public preventive lawyers (who I can call and ask if what I'm about to do is altogether legal and what can I do to avoid run-ins with the law).

That's not really the service we want because those lawyers won't be able to give definitive answers for all but the simplest cases. What I think you really want is a government sponsored law office that is given special privileges.

1. They are tasked to give well researched legal advice in all fields.

2. Their advice should be minimally restrictive.

3. If a person faithfully follows the advice of the office the office assumes criminal and civil liability.

Individuals are not capable of evaluating the law without the aid of legal professionals. Worse, individuals don't have the ability to evaluate the quality of lawyers. This system would allow individuals to be secure that they're not heading into legal gray areas or situations where the legality is truly unknown until there's a trial.

I like this kind of system because it's in the best interest of such an office to give the most accurate advice possible.

> 2. Their advice should be minimally restrictive.

> 3. If a person faithfully follows the advice of the office the office assumes criminal and civil liability.

The problem is these two are in conflict. If the office gets in trouble for approving something they shouldn't then they'll have the incentive to be overly restrictive in what they approve.

A better solution is to make this office a subdivision of the justice department and then if they say you're allowed to do it then you can't be prosecuted for it. And if they say you aren't allowed to do it then you can hire your own lawyer to appeal the decision to a court, and they get penalties for being wrong.

This sounds like a process for giving any citizen standing to challenge a law, which I think would be a very significant change to the way the system works today. It naively sounds like a good change, but I suspect there would be some ill effects - e.g, companies asking over and over about slightly different ways to manage taxes to try and find a loophole, people on both sides of the Obamacare contraception mandate trying to prove that loopholes did or did not exist in the law...

You say that like it's a bad thing. Then people would actually know what the law is.

If you don't want people looking for loopholes then don't put so many in the law. When you pass thousands of pages of tax code and then companies spend a lot of time trying to save themselves billions of dollars, what did you expect to happen? That's what happens already.

I think it would be easy to DDOS the proposed system and yes, I think that would be bad. Feel free to explain why either that would not happen or why it would not be bad.

There are several places where I strongly believe that access to the law can meaningfully improve access to justice. And I hope, through our work at Open Law Library, we can show this to be true.

1) Educated lay-people. If you have good reading comprehension, and if your problem is one many other people have faced, there is a chance the law that pertains to your situation is clear and unambiguous. Access to the law in this case means you can resolve your issue.

2) Legal services at the margin. At the high end, where you are paying an attorney hundreds of dollars per hour, that attorney is passing database costs straight through to you, but you can afford it. At the low end, legal aid clinics usually receive free or reduced cost access to the databases. However, at the margin, when you are scraping together the money to pay a $30/hr lawyer to represent you in a civil matter, neither you nor the lawyer can afford to pay. It is in these cases on the margin where access to high quality laws can make a significant difference.

3) Secondary legal sources. Many legal aid clinics put out high quality secondary sources written at a grade school reading level. Where I volunteered, we had around 100. We could have had many, many more. They don't really take that long to write and the number of people helped per hour of writing was quite high. The problem, however, was maintaining them. Each document we added to our library represented a commitment of several hours to a couple days of work quarterly or biannually to review the law and update the document. It was this maintenance commitment that limited our ability to provide understandable legal documents. This time commitment can be cut by an order of magnitude by pushing pertinent changes to the law to legal aid clinics, rather than them having to sort through all laws for pertinent changes.

4) Government opinions. Many governments have legal departments that will provide opinions on the law. These opinions are often (though not always) written with a general audience in mind, and explain a particularly complex or often misunderstood part of the law. Unfortunately, these opinions are not easily discoverable, especially if you don't even know to look for them. Open Law Library works with jurisdictions to help them coordinate publishing, linking, and discoverability across branches and departments.

Into the future, as we build the foundation of computer-readable laws, others will build tools, apps, and bots on top of this foundation that will make the law truly accessible to all.

> It does not seem ethical to hold a person accountable for not following a law, if they do not have free access to read that law and the various ways the court has ruled to how that law should be applied.

Well, that's incompatible with a common law legal system. Common law literally means that we respect legal traditions that aren't always codified and are instead established by precedent and/or consensus via tradition. That's the reason you'll sometimes see precolonial British law cited in US legal memorandums or court rulings - those laws literally are not part of US legal code, but they may provide persuasive precedent.

So there's not always a codified law to read, but that doesn't mean people can't be expected to uphold the societal structure.

The same works in reverse. If a law exists, it's possible for the law to become legally unenforceable (for a variety of reasons, not just court rulings) even without the law being repealed. So merely providing access to the legal code doesn't actually provide a complete picture of what the law is.

Engineers want to think about the law the way they think about code - it may not always do what you expect, but Von Neumann architecture means that it's at least consistent. But that's not how the law works - it's not always clear ahead of time what the inputs are (which is why litigation is so complicated), and that's even before you account for the judgment calls that enter the picture at different stages.

> Common law literally means that we respect legal traditions that aren't always codified and are instead established by precedent and/or consensus via tradition.

If access to court decisions is restricted, they hardly qualify as "established by precedent and/or consensus via tradition", no?

I'm saying that the set of things that can comprise "law" in a common law legal system is impossible to define precisely. So you could always find some obscure, hard-to-obtain source documenting a legal custom that could be used as persuasive precedent.

If we stated that (say) criminal laws could not be enforced unless the defendant had access to the full body of possible codes and precedent before the crime occurred, we would literally never be able to convict a single case, ever. Because any defense attorney could just find some arcane memo and prove that the defendant could not reasonably have been expected to have access to it before the crime occurred, and that would be sufficient for excusing them of culpability. And that's not even raising the question of whether or not they could reasonably be expected to interpret and understand the text, which would be the next hurdle. (The same applies to non-criminal cases too.)

(Persuasive precedent is not binding, so it's not "law", but it's undeniably influential enough that it's necessary to understanding the law.)

I think I understand your argument from the legal perspective, but what about from a public records perspective? It still feels like the precedents that have been established inside the US legal system should be publicly available without restrictions. Sure, other restricted sources may end up incorporated, but restricting access to actual US legal decisions feels like it only benefits those few with the access for commercial gain.

Any historically accepted source should still be available for establishing precedent, but that doesn't mean our legal system should conceal the decisions it has made.

If nothing else access to these decisions could be a great area of study for language processing.

>but what about from a public records perspective? It still feels like the precedents that have been established inside the US legal system should be publicly available without restrictions.

Yes, I agree with that. And in many cases (but not all), they are - court documents are generally available for nominal processing fees, though there's a long way to go before I'd say this is all truly "publicly available without [unnecessary] restrictions".

With some notable exceptions like FISA, I don't think most of the secrecy is out of a desire to conceal law from citizens. It's largely the fact that our legal system is shockingly low-tech and hasn't yet caught up to what technology now allows.

That state of it is actually quite sad, to be honest. I'm in South Africa, and I am baffled at how difficult it is to get access to the Acts that were approved by parliament and form part of the body of law. It's not even about precedent, and court-decisions, rulings, etc. Just the plain law, with all the out-dated portions removed/updated.

This makes it quite difficult for me, as an individual, to interpret and act on what the law says. Sure, we all know the "basics" of criminal law: Don't steal, hurt, go where you shouldn't, etc. But everything else (regulations) is a giant black-box of "you need to speak to a lawyer" and pay them money. There are probably hundreds of sites and blogs out there trying to help/guide people about what the regulations say, but that's a poor substitute and not something you want to rely on for anything more than mundane. There needs to be a clear, government-run, up-to-date resource that has all laws.

Sure, but rent-seeking, parasite lawyers will fight to prevent that, so that they can continue to extract money from the rest of us while being a big drag on the entire economy.

> our legal system is shockingly low-tech and hasn't yet caught up to what technology now allows.

Are there ways to improve this? What do you think the hurdles to technology adoption are in this field? Is it a document formatting problem, or a hosting problem?

Do we need to create a WordPress for state and local courts to adopt?

> Well, that's incompatible with a common law legal system. Common law literally means that we respect legal traditions that aren't always codified and are instead established by precedent and/or consensus via tradition.

> Engineers want to think about the law the way they think about code - it may not always do what you expect, but Von Neumann architecture means that it's at least consistent.

Well, that’s why Civil Law might be better – and why most Civil Law countries already have fulltext searchable archives of all laws and decisions. (the dejure indexing engine for Germany, for example, is quite awesome).

#2 is a really big barrier to innovation in this space.

Another issue is that the economics of codification and publishing disincentivize existing publishers from releasing laws and codes in open and accessible formats. And without open and accessible formats, it is extremely difficult and unsustainable to build things on top of the law.

If you are in either Texas or California check out https://www.oconnors.com/ (formerly Jones McClure Publishing)

They have an online reader for all their legal commentary books. They've invested heavily in search. It is pretty fantastic.

They only have one book listed for California.

Do you know of any similar companies more focused on CA? I'm specifically interested in the Vehicle Code.

You were originally correct. They currently have one book specific to California.

They do have federal books as well that apply nation wide, but as most lawyers are highly specialized, that may not apply to what you do?

I'll reach out to them. I used to do some contract coding for them. They may have more books for Cali that haven't made it to the marketing site.

Can #2 really stand up to scrutiny? My understanding is that decisions and statutes are not copyrightable in the US as they are "edicts of government", and thus owning rights to them ought to be impossible.

what do you think of 3) the way we make law is deliberately "broken" such that inconsistent laws can be passed (with humans in the loop to sort it out) otherwise nothing would pass - if we could build a type checker for laws, active law would not type check and would be impossible to fix

This approach doesn't acknowledge that law is a dynamically typed language, to continue your analogy. It responds to societal concerns and emotional and frequently irrational reasoning. There's a saying bad cases make bad law and I think this isn't going to change any time soon.

Nevertheless dynamic languages can fail to typecheck. To continue the analogy:

   try {
       fair(trial);
   } catch (e) {
       if (e instanceof InconsistentLawException) {
           // Oops, divided by zero:
           // add a tiny epsilon to the denominator and keep going
           law += epsilon;
       } else if (e instanceof ClassCastException) {
           // some types are more equal than others
           settlement(trial);
       } else {
           // escalate to higher instance
           mapReduce(court);
       }
   } finally {
       // might silently fail if
       // higher_instance == SCOTUS and
       // SCOTUS.busy == true
       sentence(trial).await(appeal(higher_instance));
   }

> This approach doesn't acknowledge that law is a dynamically typed language, to continue your analogy. It responds to societal concerns and emotional and frequently irrational reasoning.

The problem with this line of reasoning is that it leads down the wrong path. Formally verifying the consistency of the law cannot be done in practice because it has NP-complete problems inside of it. The amount of work it would take to create a legal code which is internally consistent and always yields an agreeable outcome is not feasible.

But it's foolish to go from there to the other pole where all the laws are overly broad and the only thing that determines whether you go to jail is prosecutorial discretion.

The formally-verified internally-consistent always-righteous version of the law is the unattainable platonic ideal. You never actually get there but progress is measured by whether we get closer today than we were yesterday.

> The amount of work it would take to create a legal code which is internally consistent and always yields an agreeable outcome is not feasible.

Is there something you can cite here? The infeasible thing to me is refactoring a pre-existing lawbase into something formally verifiable; if we could throw it all away and start over at the constitution, it might work.

The existing law isn't the underlying source of the complexity. It's that people want the complexity. They want killing to be against the law, but not if it was self defense, or you were under duress, or you couldn't reasonably know that your actions would kill someone etc.

You can easily pass a law that says all killing is illegal, but that isn't good enough. You either have to consider every possible thing that could happen in the universe and encode what should happen in each case into the law, which is clearly infeasible, or there will be things that can happen which you haven't considered ahead of time, and then you still have to specify something.

If what you specify is that unanticipated acts are illegal then everyone will be in prison. But if they aren't then it will be easy to find a provable loophole to murder. Neither of those is acceptable.

That's why we have judges. To address that. But that answer is still terrible because then you don't know what the law is until you're already in court. It's just less terrible than either putting everyone in prison or letting anyone get away with murder.

Which means the goal is to minimize the number of situations where that needs to happen, without causing the well-specified outcomes to be unrighteous.

>The formally-verified internally-consistent always-righteous version of the law is ...

Always righteous wasn't required by OP, but ideally it's a mere consequence of consistency. If you preclude Consistency, the apriori

haha dynamically typed language with no log trail!

So basically the law is Brainfuck?

haha except the fucking is more than just a brain

This is one of the tools we are building at Open Law Library. Open Law Draft is a linter that finds common errors in draft bills right in Word as they are being drafted. Fixing the error before the bill becomes law is much easier than having to pass another law to fix it. Extending the analogy, Open Law Codify is essentially our compiler. We will be integrating the two so we can find more complex errors at the drafting stage rather than having to wait for the codification stage, as we do now.

Unlike what I think you are proposing though, our system must work with any law that is passed, not just laws we deem "correct". We simply write software that helps legislative bodies pass laws that are as close to what they deem correct as possible.

Then you want a safe incremental process that allows you to run unchecked code and annotate which code has been upgraded to meet the type checking standard - this is already done for javascript projects, you can introduce Flow to typecheck only the annotated functions in a file: https://flowtype.org/docs/existing.html

I am not a lawyer, but from my computational linguistics background I have some experience with Legal English:

1. From my admittedly limited experience, legal language, while certainly complex, follows a clear set of rules to the extent that it's almost formulaic. There's no pragmatics (pragmatics is the main reason we don't have generalised natural language understanding yet: https://en.wikipedia.org/wiki/Pragmatics ) involved. Legal language is precise and its primary objective is to avoid ambiguity or potential for misinterpretation.

These features actually make legal documents a perfect area for application of NLP algorithms.

2. Frankly, I think this is outrageous. It's "Common Law" for a reason. How can one expect someone to abide by the law if the law is not commonly known but that knowledge is restricted to a select few? How law firms like those you mentioned could arrive at thinking about copyrighting legal decisions and thinking that's even remotely ethical is beyond me.

I'm pretty sure Lexis and Westlaw don't own the text of published cases or statutes. They do own the headnotes which are commonly seen in printed and electronic reporters.

See in part: MATTHEW BENDER & CO. v. WEST PUBLISHING CO., 158 F.3d 674 (2nd Cir. 1998)

For point one, it seems like increasing access would still help. It takes a lot of skill to read a mathematical proof and know if it is relevant to an open theorem, but making math publication public and searchable is still useful.

I'm bummed to hear about point 2. We are all ruled by law, and the courts are a branch of government. It's not encouraging to know the legal documents that are used to govern me are owned and copyrighted by a private organization.

How do companies own the rights to legal decisions? Can you not go to the court and purchase them somehow?

Uhm... isn't the entire _point_ of legal writing to be as unobfuscated and clear as possible? "Processing" legal documents should be tantamount to mapping words with boolean operators.

If it's not, the problem isn't with the software engineers...

hey eelliot. I'd love to get in touch. I'm a dev/lawyer working to get more of my state's local laws/cases/templates on GitHub here in Vegas. Best way to contact you?

Drop me a line eddie (at) technicality.io

The premise of this blog post is a little off base. (Though I think Open Law Library is doing good work.) The difficulty in building a high quality legal search engine is not in parsing the links between the documents. High quality links matter, but they only get you about 25% of the way there. The more important thing is to have a highly accurate and structured understanding of the law. (Think of Google's Knowledge Graph, or the maps they use for their driverless cars.)

Disclaimer: I worked on Google Scholar and am the CEO of Judicata.

A recent evaluation of various legal search engines [1] found: "The oldest database providers, Westlaw and Lexis, had the highest percentages of relevant results, at 67% and 57%, respectively. The newer legal database providers, Fastcase, Google Scholar, Casetext, and Ravel, were also clustered together at a lower relevance rate, returning approximately 40% relevant results."

Westlaw, Lexis and Google Scholar all have high quality citation parsing (i.e., links). And Scholar relies very heavily on PageRank (as [1] demonstrates). But it is Westlaw and Lexis that are the better search engines. That's because they have invested more into going beyond just links; they've invested a lot into understanding what is happening with the law.

At Judicata our own findings are that the average legal search query is significantly more complex than the average Google query -- having more terms and more concepts. Moreover, whereas only 15% of Google queries are unique, the inverse is true in legal research: more than 85% of queries are unique. What that means is that in order to return a good result, you need to understand a lot more about the query and the documents you've indexed. You can't rely on links between documents and past searches and clicks to power a quality search engine (the way that Google.com can).

As has been mentioned in other comments here, the real challenge for legal research is extracting structure out of the law (Shepardization, Procedural Postures, Causes of Actions, Dispositions, Legal Principles, Arguments, Facts, etc.). That is what will get legal search engines closer to where Google really shines -- results that are powered by the Google Knowledge Graph.

[1] https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2859720

Thank you for your perspective. It's quite helpful. However, wouldn't even a plain text searchable database be better than nothing? And I don't understand how this can be monopolized by Westlaw when law should be public domain…

Getting free or low cost access to a plain text searchable database is no longer a problem for lawyers. It was 8-10 years ago, but since the entry of Google Scholar (and Casetext, Ravel, and a half dozen or so other providers) getting access to the law is no longer difficult. To echo the original post, the law today is in a place like the "deep, dark, early days of the Web, using search engines like Lycos and Alta Vista". We do need a "Google for the law," but Google isn't good enough to be that. It's a very hard problem to create a good search engine for the law, but legal search engines will eventually get there.

The law is public domain. At least in the federal system, and increasingly in most state systems, everything is published as PDFs on courts' websites.

But being public domain doesn't mean someone is required to OCR and host it for you. And it doesn't mean someone needs to go and OCR and host all the hundreds of years of old cases.

Doctrine, a French startup, is actually doing this. https://www.doctrine.fr [Disclaimer: I'm a Data Scientist @ Doctrine]

Part of the reason why we are the only ones providing something clear is that, indeed, law data is a mess, and we are working hard to have a clean & consistent database.

Even something as simple as legal references is really complex. Every country has its own identifier system, some editors have their own identifier systems, and people cite in really different ways...

The second point is that we rely heavily on NLP/DL to extract insights and information from the data. This is something that couldn't have been done /easily/ in the past.

Shameless plug: We are hiring! https://doctrine.typeform.com/to/uyjXoE [French only]

Hehe, I was about to talk about you, but you were faster. Do you plan to expand to other countries' legal systems?

> Shameless plug: We are hiring! https://doctrine.typeform.com/to/uyjXoE [French only]

Great, but for what position? This link is a multistep application form, and I am pretty sure others will find it equally annoying to fill it out just to see the description of the job.

Hey! You're right, we have 4 open positions on Angel List: https://angel.co/doctrine-/jobs

[Disclaimer: Cofounder here ;)]

> The reasons this problem exists are complex, but they boil down to the fact that laws and the links between them are not being published in ways computers can easily process, making it difficult to extract the valuable information they contain.

I'd argue that laws and the links between them are not being published in ways that the average person can easily process or understand.

Furthermore, I believe this ambiguity directly impacts the governed, causing them to be, in general, distrustful of most laws that do not affect them in an observable way.

Making laws more digestible is only part of the solution; the "what" should be annotated with the "why". Otherwise, with so many decentralized cities in an already decentralized nation, fundamentally sound, universally-applicable legislation may be ignored due to stereotypes and generalizations. Whereas, documented and annotated legislation can be analyzed, duplicated, and modified to fit different environments around the country, or reasonably ignored on verifiable grounds.

> I'd argue that laws and the links between them are not being published in ways that the average person can easily process or understand.

So true. A premise of this blog post is that laws need to be easier for computers to understand, but that's skipping a step: making laws easy for humans to understand.

I really like what TLDRLegal did: made software licenses digestible by humans. I would love to see similar sites pop up for other verticals of the law, but it's a lot of work and there's not much incentive to get it done at the quality-level of TLDRLegal.

I had to note I also enjoy TLDRLegal... I wish they'd open an API.

> The reasons this problem exists are complex, but they boil down to the fact that laws and the links between them are not being published in ways computers can easily process, making it difficult to extract the valuable information they contain. If the format were computer-friendly, it is easy to imagine leveraging the links between laws to improve search results.

This is totally untrue. Legal documents are linked together with citations written according to very precise rules, which lawyers spend a lot of time getting correct. Almost all laws and cases are published in a quasi-append-only record: sequential publications in reporters organized by volume and page number. So unlike URLs on the Internet, 47 F.3d 167 will always refer to the same page of the same case. Forever. Most agency decisions, etc, have similar sequential records. Statutes and regulations are precisely identified by structured citations as well. WestLaw and Lexis have no problem parsing these, and will happily find you all the cases that cite to, say, a specific Supreme Court case from 1880.
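To illustrate why the volume/reporter/page structure described above is machine-friendly, here is a minimal Python sketch of a citation extractor. The `REPORTERS` list, the `Citation` type, and `find_citations` are hypothetical names made up for this example, and the pattern only handles the simplest "volume reporter page" form; it is not how WestLaw or Lexis actually parse citations.

```python
import re
from typing import List, NamedTuple


class Citation(NamedTuple):
    volume: int
    reporter: str
    page: int


# Assumption: a deliberately tiny reporter list for illustration only.
# A real citator would know hundreds of reporter abbreviations.
REPORTERS = ["U.S.", "S. Ct.", "F.3d", "F.2d", "F. Supp."]

# Matches "<volume> <reporter> <page>", e.g. "47 F.3d 167".
_PATTERN = re.compile(
    r"\b(\d+)\s+(" + "|".join(re.escape(r) for r in REPORTERS) + r")\s+(\d+)\b"
)


def find_citations(text: str) -> List[Citation]:
    """Return every simple volume/reporter/page citation found in text."""
    return [Citation(int(v), r, int(p)) for v, r, p in _PATTERN.findall(text)]


sample = (
    "See Matthew Bender & Co. v. West Publishing Co., "
    "158 F.3d 674 (2nd Cir. 1998); cf. 47 F.3d 167."
)
for cite in find_citations(sample):
    print(cite)
```

Because each citation resolves to a fixed volume and page, the extracted tuples can serve directly as stable keys in a link graph, which is the point being made about citations versus URLs.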

The reason lawyers use terms-and-connectors searches instead of a "Google-like" engine is that the underlying concept of PageRank absolutely sucks for legal research. PageRank equates in-degree in the link graph with relevance. This will get you highly cited cases that you knew anyway that are only tangentially related to the cases you actually need.

In a legal brief, a couple of trial court decisions that are factually similar but uncited are infinitely more valuable than a highly cited Supreme Court case that happens to pertain to the same general area of law.
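A toy Python sketch of that contrast, under loud assumptions: the four cases, their "fact" keywords, and both ranking functions are invented for illustration, raw in-degree stands in for PageRank (which is more elaborate), and keyword overlap stands in for whatever a real engine uses to judge factual similarity.

```python
from collections import Counter

# Hypothetical corpus: case id -> (fact keywords, ids of cases it cites).
CASES = {
    "seminal_scotus": ({"due", "process", "entitlement"}, []),
    "trial_a": ({"benefits", "termination", "hearing", "notice"}, ["seminal_scotus"]),
    "trial_b": ({"benefits", "termination", "hearing", "written"}, ["seminal_scotus"]),
    "appeal_c": ({"zoning", "permit"}, ["seminal_scotus"]),
}


def rank_by_in_degree(cases):
    """Order cases by how often they are cited (most cited first)."""
    counts = Counter(cited for _, cites in cases.values() for cited in cites)
    return sorted(cases, key=lambda c: (-counts[c], c))


def rank_by_fact_overlap(cases, query_facts):
    """Order cases by Jaccard overlap between their facts and the query facts."""
    def jaccard(facts):
        return len(facts & query_facts) / len(facts | query_facts)
    return sorted(cases, key=lambda c: (-jaccard(cases[c][0]), c))


query = {"benefits", "termination", "hearing", "notice"}
print(rank_by_in_degree(CASES))
print(rank_by_fact_overlap(CASES, query))
```

In this made-up corpus the link-based ranking puts the widely cited Supreme Court case first, while the fact-based ranking surfaces the two uncited trial decisions, which is the trade-off described above.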

Why are they more valuable?

In addition to VCQ's point below, there is the fact that such cases are often simply not helpful despite technically being relevant. A seminal Supreme Court case might state the broad principle of law: e.g. you have a right to due process before losing government entitlements. But stuff like that is never actually disputed. In practice the dispute is over e.g. "how much process is enough?" or "when does a government benefit rise to the level of an entitlement?" The judge knows the big overarching Supreme Court case. While it's technically relevant, it's not helpful. What you need as the lawyer is to show the judge case law that supports your specific argument applying that general principle.

Also elaborating on VCQ's point: I was trained that when citing cases to a court:

(1) The best case to cite is one where the previous trial judge ruled in favor of the position that's being taken on an issue by your opponent, but the trial judge was reversed on appeal on that issue;

(2) Next-best is one where the previous trial judge ruled in favor of the position that you're now taking and was affirmed on appeal; and

(3) The worst of all cases to cite is one where the previous trial judge ruled against the position you're taking and was affirmed on appeal; if you're going to cite that kind of case, you must find some way to distinguish your facts from those of the previous case.

It's great fun when your opponent cites a case in support of a position, and then upon reading the case you discover that the case went against the party that took your opponent's position.

EDIT: Which reminds me of a related story: My late senior partner, mentor, and friend, Tom Arnold, was once arguing a point before a judge. His opponent, seeking to discredit Tom's argument, cited a law-review article that Tom had recently published, where Tom had argued the exact opposite of the position he was taking before the judge. Tom didn't miss a beat: He responded, in effect: Your Honor, my friend is correct: My article does take exactly the opposite position from what I argue here today. I still hold to the view expressed in that article. What my friend hasn't told you is that my article criticized a recent decision of the Supreme Court of the United States. No matter what my personal view of the Court's decision might be, it's the law of the land unless and until the Court changes its mind or Congress changes the statute. And under the Court's decision, my client is entitled to prevail.

Tom won the argument, of course.

Judges like to see how other judges have ruled in similar situations. A generic SCOTUS opinion on a topic doesn't help too much.

When I was an active editor on Wikipedia, I rewrote the USA PATRIOT Act from scratch.

It wasn't easy, and I'm not talking about the law itself here (though at 10 titles long, with title III literally an anti-money laundering bill they bunged into the Act and had passed, it is still an extremely complex bit of legislation). No, I'm talking about the ability to find information on certain laws - I'm an Australian, so it was a major challenge to find good quality sources. I was lucky in a way, as the Patriot Act is so controversial I did eventually manage to track down info. But it wasn't easy, and when I tried to find sources for some truly ancient and tangential legislation a few times I hit a brick wall entirely.

It makes me think: ignorance of the law is not an excuse for breaking it... but with the current system you are often going to be ignorant of the law no matter what you do! Unless, of course, you have the money to pay for expensive legal searches.

How anyone could consider restricted access to information about the law and the law itself to be anything but a violation of human rights is beyond me.

What sort of information and sources were you looking for? The Patriot Act text is available online, as are the (unclassified) notes from Congressional debates and voting records.

All primary sources, which are allowed. However, there are plenty of other primary sources that I can't get easy access to, including case law. In fact, there are old Acts I found I didn't have any access to at all.

Co-founder of Open Law Library here. Happy to answer any questions about what we're building, the law, and anything else people are interested to discuss.

How can people (like me) contribute? The word Open makes me think you accept non-monetary contributions.

With respect to contributing code, the short answer is that we're not sure yet. There is certainly no shortage of code to write, but we want to make sure that we work with volunteers in the right way. Anyone who's interested, please reach out here or through the contact form on our website. I will personally respond to everyone.

Other than writing code, we need advocates in and out of government who understand and believe in the value of free (as in freedom) and accessible laws. Contact your local, state, federal representative and let them know that free and accessible laws are important to you. Let them know that a system exists that can not only make this a possibility, but that it will also make their lives a lot easier.

We would also love to hear about what you would want to build on top of computer-readable, always up-to-date laws that could programmatically alert you when something changed and let you diff against old versions of the law. E.g. a) internal annotations for civil servants that wouldn't immediately be obsolete once the legal code changed; and b) legal alert system for the part of the law you care about.
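As a rough idea of what "diff against old versions" plus alerts could look like, here is a small Python sketch using the standard library's `difflib`. The section ids, the sample text, and the `changed_sections` / `alerts` helpers are all hypothetical; this is not Open Law Library's actual data format or API.

```python
import difflib


def changed_sections(old: dict, new: dict) -> dict:
    """Map each section id whose text changed to a unified diff of the change."""
    changes = {}
    for section in sorted(old.keys() | new.keys()):
        before = old.get(section, "").splitlines()
        after = new.get(section, "").splitlines()
        if before != after:
            changes[section] = "\n".join(
                difflib.unified_diff(before, after, "old", "new", lineterm="")
            )
    return changes


def alerts(old: dict, new: dict, watched: set) -> dict:
    """Keep only the changes that touch sections the user is watching."""
    return {s: d for s, d in changed_sections(old, new).items() if s in watched}


# Assumption: two snapshots of a code, keyed by made-up section ids.
old_code = {"1-101": "The filing fee is $10.", "1-102": "Office hours are 9 to 5."}
new_code = {
    "1-101": "The filing fee is $15.",
    "1-102": "Office hours are 9 to 5.",
    "1-103": "Appeals must be filed within 30 days.",
}

for section, diff in alerts(old_code, new_code, watched={"1-101"}).items():
    print(f"Section {section} changed:\n{diff}")
```

Keying the snapshots by section id is what lets an annotation or alert stay attached to "its" part of the law across versions; that only works if the published law exposes stable identifiers.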

We'll put together a form that makes it easier to collect this information!

Aside from code, how are you gathering the laws themselves? Public libraries?

We get them directly from the source.

Previous attempts at accomplishing our mission saw organizations scraping government websites and re-hosting the laws on prettier websites. The problem was that a) the laws were only as up-to-date as the versions the governments made available (which, unfortunately, are not up-to-date at all) and b) the projects were not sustainable because no one pays to access the law and websites needed updating every time the law changed.

Sites like these are potentially very harmful. They haven't been updated in years and people who stumble upon them and miss the fine print end up relying on laws that have long since changed.

Because timeliness matters, the only way to guarantee that we get it is by working directly with the governments. So we build software into the law drafting, codifying, and publishing process that governments can really benefit from and enjoy using. The software changes the economics of codification and publication and permits publishing the laws freely and openly.

Not code (I could if need be), but I'd like to help raise awareness. Are you making a concerted effort that one could join? Maybe chat over email?

hey what's the best way to get in touch? the info email on the website?

The contact form goes straight to my email. Also try vchuang at the domain.

I have long wondered instead about a Stack Overflow for law, where one could get advice from other knowledgeable people. But I talked with a few (technologically open-minded) lawyers and they weren't interested. It seems the root cause is that in software development, the level at which you get help is not the level at which the end products exist and compete. For lawyers it would be different: advice is the core of their service, so it is the end product. If they helped with advice, they would directly help the competition. Secondly, their customers could end up using that site instead (whereas software customers cannot benefit from Stack Overflow).

There is http://law.stackexchange.com/

But I think it is not as active as other Stack Exchange sites.

Casetext is a startup working on that, at least for case law, which is the more difficult body to gain efficient access to in the US:


I'm not associated with them at all, other than once emailing the founder best wishes.

Casetext and Judicata are great companies working on searching through judicial opinions. There's a lot to be said about PACER and publishing state judicial opinions. Suffice it to say, those companies could probably save a lot of time on parsing and focus on other problems instead if judicial opinions were released with some structured information, e.g. party names and procedural posture. Open Law Library makes it possible for legislatures to, among other things, publish their laws in structured formats.

One thing to consider is that on a day-to-day basis, an individual might be impacted more by city/county/state law than by federal law.

Uh, the real question is "why isn't there git for the law".

There is at the Federal level.[1] It's XML-based. Here's an example of a bill in raw XML.[2] It displays in the form in which a bill is printed.[3] The GPO even puts in the XML, "Pursuant to Title 17 Section 105 of the United States Code, this file is not subject to copyright protection and is in the public domain."

There's a change control system behind all this. Here's a history of a bill, again, in XML.[4] There are change transactions, which are also in XML, but they're not in this database.

[1] https://www.gpo.gov/fdsys/bulkdata

[2] view-source:https://www.gpo.gov/fdsys/bulkdata/BILLS/114/2/hconres/BILLS...

[3] https://www.gpo.gov/fdsys/bulkdata/BILLS/114/2/hconres/BILLS...

[4] https://www.gpo.gov/fdsys/bulkdata/BILLSTATUS/114/sres/BILLS...

Another co-founder of Open Law Library, here. This is precisely what we are doing - building a patch-based xml database of the law. However, we are building a general tool that can work with the vast majority of state and local jurisdictions.

We can't build good tools without good data. The only way to get it is to work directly with governments and make it easier for them to publish their official laws as clean xml than to publish their laws as PDFs.

Revision control and the law deserves its own blog post. In addition to being patch-based, patches can be created today, but apply retroactively, or languish for years, then suddenly apply because a triggering event occurred.
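A rough sketch of why this differs from ordinary revision control, assuming a made-up data model (the `Patch` class, its fields, and the sample text are hypothetical, not our actual schema): each patch carries both an enactment date and an effective date, and the effective date may be unset until a triggering event occurs.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class Patch:
    section: str
    new_text: str
    enacted: date                     # when the patch was created
    effective: Optional[date] = None  # None: still waiting on a triggering event


def text_as_of(base: dict[str, str], patches: list[Patch], when: date) -> dict[str, str]:
    """Consolidate the code as it stood on `when`.

    Patches are applied in order of effective date, not enactment date, so a
    patch enacted later can still change the text for an earlier day.
    """
    law = dict(base)
    active = [p for p in patches if p.effective is not None and p.effective <= when]
    for p in sorted(active, key=lambda p: (p.effective, p.enacted)):
        law[p.section] = p.new_text
    return law


# Hypothetical example data.
BASE = {"5": "Tax rate is 5%."}
RETRO = Patch("5", "Tax rate is 6%.", enacted=date(2016, 6, 1), effective=date(2016, 1, 1))
DORMANT = Patch("5", "Tax rate is 7%.", enacted=date(2015, 3, 1))

print(text_as_of(BASE, [RETRO, DORMANT], date(2016, 2, 1))["5"])

# The triggering event finally occurs, so the dormant patch gets an effective date.
TRIGGERED = replace(DORMANT, effective=date(2017, 1, 1))
print(text_as_of(BASE, [RETRO, TRIGGERED], date(2017, 2, 1))["5"])
```

This sketch deliberately ignores the harder question of "what did the code look like on day X, as known on day Y", which needs both dates to be tracked when querying.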

Our system ingests official laws and outputs an xml version of the official law. A government attorney uses our IDE to review the xml output and annotate it with codification instructions. Our system then uses those instructions to automatically codify the law, which is then published openly on the web in multiple formats.

We have been rapidly iterating on our xml format. You can see the beta version for our first partner, the District of Columbia, here: https://github.com/dccouncil/dc-law-xml/tree/development (feedback welcome).

As we partner with more jurisdictions, we will build a foundation of open, clean, accurate, and timely computer-readable laws on which anybody can build tools to improve government, citizen engagement, and access to justice.

> In addition to being patch-based, patches can be created today, but apply retroactively, or languish for years, then suddenly apply because a triggering event occurred.

Hmm, interesting! I always assumed the patch applied immediately but the new law might contain some "effective x/y/z, ...".

This is rather unfortunate (of course not your fault), because as we all know rebasing is not necessarily conflict-free.

Thank you! I was unaware of any of this.

It is my understanding that bills are patches, and thus the law works like darcs: patch-oriented rather than revision-oriented.

Does this sound correct to you? I arrived at this conclusion asking people who know nothing about VCS, so something might have been lost in translation.

Yes, and no. Each bill is subject to revisions, but once it is passed, it is a single patch – you could imagine that each bill is a branch in git, developed in commits one after another, and then the latest status is squashed and merged.

Yeah, that is what I meant. The revision of the bill would be some meta-history that doesn't quite fit the darcs metaphor.

I also hear the applying of all these patches is a slow manual process only done periodically? :|

Amendment updating is done nightly, and the results are on "congress.gov", which is the user-friendly access interface. The XML dump is the raw data, made available to the public.

The user guide for the XML data is on Github.[1]

[1] https://github.com/usgpo/bill-status/blob/master/BILLSTATUS-...

"Amendment updating" would be the meta-history of the bills themselves, right? I meant applying passed and signed bills to the overall body of law. When is that done?

Good point, laws will be much easier to search if they are stored in a revision controlled system in the first place. But I think that it is even harder than what Open Law Library is trying to do.

And not just why isn't there something that links legislative citations together, but why isn't there something that can tell laypeople what is current legislation and what isn't? In the UK, we have numerous acts of the same name but different years. If I find some information about a slightly obscure issue of concern - say, a website that's ten years old - it might cite an act of a particular year - and I then look up the details of that legislation, I can see it, but I've no idea if or how it currently applies, if it's been superseded by newer legislation, etc.; it might even just be a list of 'edits' to previous legislation that aren't easily researched by someone without a formal understanding of law.

The common response to this has usually been "that's why you need a [jurist]", but I take issue with the idea that the legislation that applies equally to us should only be understood by those equipped with the means to make sense of it.

Our startup, Casetext (YCS13) was mentioned a few times here, so I thought I'd stop in.

The crux of the article is that most legal research solutions have ignored the immense power contained in the links between laws:

> Laws frequently reference other laws in order to reuse definitions, introduce exceptions, or make it clear that two concepts are meant to work together. Consequential laws tend to get referenced in other laws as their influence spreads throughout the legal system. Experienced lawyers build up detailed mental maps of these links, allowing them to jump immediately to core issues of complex legal problems.

> However, most laws can only be searched using the dark-age, Lycos strategy—guess at keywords and hope—and it’s often necessary to pay for even that limited functionality.

We at Casetext are taking a very different approach than the "dark-age, Lycos strategy" that you have to pay for:

1. On Casetext, the law is free, as is basic search. Honestly, it's insane that Westlaw and LexisNexis charge as much as they do for basic keyword search over a database that should have been free to begin with.

2. We make money by charging for advanced, data-driven ways that lawyers can research more efficiently. CARA, our premium product, enables a lawyer to drag-and-drop upload a document they're working on, and will recommend the research that the lawyer missed but is very relevant to what they're working on (https://casetext.com/cara). A key ingredient behind this awesome tech is the network of citations that the article mentions.
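For readers wondering what "using the network of citations" can look like at its simplest, here is a toy sketch. It is an illustration only and not a description of how CARA works: it assumes citations have already been extracted into a mapping from each document to the documents it cites (the ids below are invented), and it ranks documents by how many distinct documents cite them.

```python
from __future__ import annotations

from collections import Counter


def inbound_counts(cites: dict[str, list[str]]) -> Counter:
    """Count, for each document, how many distinct other documents cite it."""
    counts: Counter = Counter()
    for source, targets in cites.items():
        for target in set(targets):  # count each citing document once
            if target != source:     # ignore self-citations
                counts[target] += 1
    return counts


# Hypothetical citation data: key cites every id in its list.
CITES = {
    "law-a": ["law-c"],
    "law-b": ["law-c", "law-c", "law-a"],
    "law-c": [],
}

# Most-cited documents first.
print(inbound_counts(CITES).most_common())
```

A real system would weight citations (PageRank-style) and combine them with text relevance, but even raw inbound counts surface the "consequential" laws the article talks about.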

Whether it's us or other startups, I agree with the article that in the next few years you'll see a trend towards more "Google for Law" -- companies will make legal research free, and their comparative advantage will be on their technology, often driven by ML/AI. As a lawyer/coder, it's a pretty exciting time to be in the space.

Oh yeah, and we're hiring! https://casetext.com/jobs

There is, it just isn't free. Law firms pay mountains of money for an up-to-date 'Google for laws'. At least in the U.S.

Speaking from a global perspective: every nation has its own way of writing and linking their laws. Although civil law countries, and case law countries respectively, have a lot of similarities (historically induced), they are still unique.

That is not the case when it comes to searching the world wide web. Here, by design, everything is linked and national borders are mostly irrelevant. So if you want to implement a Google of law, you have to do it locally. The only exception would be international law, which itself can be seen as local.

You can also create a normalized system that aggregates all of the local systems together. And this also applies to international law (assuming you're talking about treaties) because there's a treaty and then there's the national implementing legislation in all of the treaty countries.

LexisNexis and Findlaw? As usloth_wandows said, there are law databases but that's a huge part of a lawyer's value and knowledge, so these databases are extremely expensive.

I'm the CTO of Global-Regulation.com which is the search service with the most number of countries (78) and machine translated laws. We are often described by clients as the "Google of laws" but there are huge differences. Getting to Google-level search, where the engine understands your intent, is very challenging. People often search for industry terms like "SAR reporting" and what they want is Suspicious Activity Reports (SAR). A Google-like engine would need to understand what the query means from an industry point of view (since the terms often don't actually appear in legislation) and then translate that to the specific term used in each country. This is far from trivial and requires looking at secondary sources, not just the laws themselves.
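To illustrate the translation step (industry term to the wording each country's legislation uses), here is a minimal query-expansion sketch. The glossary is a hand-written, illustrative placeholder rather than our production data, and real systems would build it from secondary sources instead of hard-coding it.

```python
from __future__ import annotations

# Hypothetical glossary: industry term -> jurisdiction -> phrases used in that
# jurisdiction's legislation. Entries are illustrative only.
GLOSSARY: dict[str, dict[str, list[str]]] = {
    "sar": {
        "US": ["suspicious activity report"],
        "CA": ["suspicious transaction report"],
    },
}


def expand_query(query: str, jurisdiction: str) -> list[str]:
    """Return the original query plus variants with industry terms expanded."""
    variants = [query]
    tokens = query.lower().split()
    for i, tok in enumerate(tokens):
        for phrase in GLOSSARY.get(tok, {}).get(jurisdiction, []):
            variants.append(" ".join(tokens[:i] + [phrase] + tokens[i + 1:]))
    return variants


print(expand_query("SAR reporting", "CA"))
```

The expanded variants would then be sent to the underlying full-text search alongside the original query; the hard part is building and maintaining the glossary, not the lookup.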

Other problems include official vs. unofficial laws, slow consolidations, updates to the law (of various kinds) and attempting to normalize the world's laws to a US standard (like excluding municipal laws and avoiding guidance-type documents from civil law countries). These are problems Google doesn't have to deal with and customers expect a very high standard for legal search results.

There is already one being worked on by Casetext.

Just saw this posted to HN RSS feed by lever...

Maybe lever watches what's trending on HN and then puts job adverts? If so, NICE ;)

Become a Data Scientist/Machine Learning Engineer at Casetext https://news.ycombinator.com/item?id=13307644 https://jobs.lever.co/casetext/c7f0129e-af9b-461e-b791-a9323...

Machine learning is at the core of Casetext's mission to make the law free and understandable and we're looking for an ML engineer/data scientist to help us build the next generation of legal research products. The data team at Casetext is working on groundbreaking legal technology for document recommendation and search. If you have industry experience developing production software for machine learning, especially in areas like NLP, graph models, topic modeling, and/or recommendation engines, we'd love to talk to you.

The closest thing right now is Cornell's Legal Information Institute--https://www.law.cornell.edu. There is also the same problem with academic journals--LexisNexis has a searchable database with cases for a pretty penny. Also, pacer.gov enables users to access cases and dockets but the structure and "per page" cost make it difficult to be a useful search engine.

I've argued a couple cases in district court (and one case is on the docket of the Supreme Court) and I've used a mix of law school textbooks, Scotusblog.com, Cornell's Legal Information Institute and lawyer's blogs to start background research.


There's a French startup killing it: https://www.doctrine.fr/ (works for France only)

Another group that's taking this on is https://github.com/statedecoded/statedecoded -- although they're targeting converting existing data rather than being a workflow for the states.

We have free access to law with canlii.org in Canada.

I'm not familiar at all with Canadian law. Does canlii.org provide statutes, regs, case law from every jurisdiction from the local level (cities/towns) up through the federal level? Is this a distinction that matters?

You can see CanLII's coverage here: https://www.canlii.org/en/databases.html.

That is magnificent. It seems to "end" at the province/territory level (*this is not a critique at all). How does lawmaking work at the city level?

Cities pass "bylaws" that affect their municipality. In Canada municipalities are more limited in their lawmaking powers than cities in some other countries.

Bylaws are hard to look up and are city-specific.

It is free-ish. It's free to use but there are restrictive licensing terms that prevent anyone from building anything with it: http://www.canlii.org/en/info/terms.html.

My blog post on this topic: https://www.cameronhuff.com/blog/ontario-case-law-private/ and https://www.cameronhuff.com/blog/canlii-licensing-terms/.

My previous comment was a bit harsh on CanLII. They actually do have an API for building some limited types of applications: http://developer.canlii.org/docs. But certainly not the sort of free and open access that a developer would want for building something like a search engine.

In France we have Legifrance, which is government-run and gives you access to every French law, code, etc., and lets you see modifications over time and where each article is cited (the "links").

Eg. https://www.legifrance.gouv.fr/affichCodeArticle.do;jsession...

It's probably made simpler by the fact that we are not a federation with law making bodies everywhere...

Another reason why it is difficult to create the "google" for the law is that some laws are copyrighted and are not in the public domain. Stop. Think about what I just said. There are laws that you must pay to read.

A good example of this is when a state or municipality enacts a building code. A common building code is the electrical code published by NFPA. Most states use this. NFPA owns the copyright for this. You cannot publish a PDF of the electrical code on your website, yet you are required by law to follow it.

There may be other cases of this, I don't know. But I think it is crazy!

I'll just mention PlainSite: http://www.plainsite.org/

It basically is "google for the law" (provided that your definition of law only extends to federal courts and state appeals courts). But they typically have full opinions available for free.

General summary: http://thelegalpioneer.blogspot.com/2014/02/plainsite-puttin...

In university I had full access to LexusNexus, of course it's not free, but you could literally find any obscure legal text written in the country's history, in the click of a button.

It's "LexisNexis"... it's not referring to the nexus of a car company. :)

I think the point is that Google "understands" what you mean even if you don't say it quite right, or use the right phrases or words, and that's especially needed when searching for legal rights and things where what you're looking for often uses completely obscure or weird language. From the article:

> "However, most laws can only be searched using the dark-age, Lycos strategy—guess at keywords and hope—and it’s often necessary to pay for even that limited functionality."

Well, for German law, there is https://lawly.org, the result of a recent bachelor's project at my university.

This looks nice. Are you involved in it?

I took a look at the Umsatzsteuer [Value Added Tax] part as I had to read it some years ago when I started freelancing: https://lawly.org/gesetz/UStG%201980/4.1#12-steuersaetze

The Inhaltsübersicht [content overview] list at the right side is yellow on white which makes it hard to read.

In comparison with the place where I've read the German law before, there seem to be surplus list elements in the HTML: https://www.gesetze-im-internet.de/ustg_1980/BJNR119530979.h...

It's nice though that by registering I could download the content for off-line use. AFAIK gesetze-im-internet.de does not provide that.

Are there any other relevant differences between those two services?

One thing that deserves a mention is the Lexis add-in for Microsoft Word. http://www.lexisnexis.com/en-us/products/lexis-for-microsoft...

It's a pretty cool utility to integrate Shepardizing, research and info from the online Lexis law database into the context of a document a lawyer/paralegal may already be working on.

This reminds me of the concepts written about here* basically trying to develop robust definitions for legal terms and objects so a question like 'what are my rights in X situation' could be answered accurately by a computer.


The issue with our system of justice is that it's too unfeeling and robotic. Everyone has an inherent sense of what's just, yet it takes at least 7 years of schooling and many difficult technical achievements to be allowed to participate in its implementation (beyond simply plucking suspects from the street).

Making it computerized does not seem like the correct course of action. Human judges and juries are needed to fully evaluate the context and pass judgment.

My personal preference would be to go toward a less rigid system of law, not one so rigid that computers could reliably enforce it.

When the law becomes less rigid that means that whoever happens to be in power at the time gets to decide on a whim what's legal or illegal. Historically that hasn't worked out very well.

"Whoever happens to be in power" already decides what's legal or illegal. That's what being in power means. No one respects a piece of paper; power ultimately comes down to the ability to exert force to see the decrees of the powerful imposed. If this isn't underneath the covers somewhere, the power moves to someone who does have this ability.

There's already a huge amount of finagling by powerful individuals and groups in our government, they just have a lot of pomp and circumstance to try to cover it up. Removing some of the formalities makes flexibility more accessible.

Sure, you can spend the millions of dollars it takes to successfully lobby Congress if you're a big multinational corporation. If you're a niche concern, you're stuck.

Everyone hates mandatory minimum sentences these days. They were put in for a lot of drug crimes in the late 80s-early 90s, and they result in a lot of unneeded incarcerations, not only costing the taxpayer a lot of money, but costing society, family, and community the participation and productivity of someone who would be much more beneficial outside than in. Because of our rigid legal traditions, mandatory minimums must be enforced regardless of circumstances.

When you get down to the bottom of it, no matter what system of governance you have, you need its administrators to be benevolent and wise to get desirable outcomes. I believe that more local authorities are more able to make wise decisions because they not only know the area more intimately, but are more impacted by the outcomes. A far-off judge doesn't care if he sends 40% of the community to jail. A local judge does.

This is kind of like being entitled to being judged by a jury of your peers. Peers know the cultural norms and the local expectations. High-powered attorneys sitting on a bench in Washington, D.C. may not.

>Why isn't there a Google for the law?

There's no (or little) money to be made doing it.

Maybe in the future when collecting and modeling such knowledge is cheap. For now, it's not cheap.

Quite simply, this is a racket. Courts are public, and anyone can attend and write down what goes on and publish it. There is an official court reporter who does this. There is no authorship, but the reporter is granted a copyright. They then charge fees for access. The problem lies there. The courts should publish all cases in the open, with no copyright. That however blocks lawyer and reporter revenue streams and they will protect their racket.

Is WorldLII not already a good start?


I was about to mention AustLII http://www.austlii.edu.au/ which has been around for about 20 years. I hadn't heard of WorldLII and CanLII; seems they're part of a network.

And it looks like AustLII was where it started; the WorldLII contacts are all AustLII people.

There are many others: http://www.saflii.org/ (Southern African countries) http://www.paclii.org/ (Pacific Island countries)

WorldLII is great in theory but try running some searches. The results are not nearly as good as just an aggregation of all the national "LII" systems would be (e.g. CanLII)

Has no one mentioned Ravel yet? https://www.ravellaw.com/search

I remember when Google Patent was around. My uncle and cousin found it very useful as patent lawyers on both ends of the experience spectrum. When it got shut down they were disappointed.

Google Patent shut down? Whaaat

It has risen.

Indian Judiciary System has its own online repository of court orders. http://www.judis.nic.in/

Why is google not enough google for the law? All my law requirements have been met by Google. And my lawyer friends also use it.

Judicata is a start. You can search California statutes and case law.

But lexis and westlaw are the tools needed for serious research.

My brother, who is a lawyer, said Westlaw and LexisNexis have the unfortunate problem of being 99.9% accurate, but the original source material still sometimes needs to be pulled because of errors during import, such as a dropped comma, which can change the entire meaning of a law.

It would be a good idea to make a wiki, since the laws themselves aren't very readable.

because money

source: my dad was a lawyer for 20+ years.

There are plenty of sources for acts (many national and supra-national services [1]) and for US case law (Google Scholar).

The _text_ of law alone is worthless. In very few cases you should search for keywords.

The unsolved problem is that what is needed (and there are private systems that can do this) is the ability to make queries like

   In 2014
   my only child was 17,
   my family lived in Italy
   but I worked most of my time in the UK;
   which version of the Italian child-care law applied to me at the time?

In order to answer this query you need to:

1. Know all the text of all the acts out there at the Italian level, European level and supra-national level.

2. Find the main acts that deal with child-care law.

3. Find all the acts that modify those main acts (they could extend their duration or modify their content).

3b. Find all the acts that modify the acts that modify those main acts (maybe the extension has been repealed).

3c. Find all the acts that modify the acts that modify the acts that modify those main acts (I think the point is clear now)

4. Consolidate (merge) all these acts using the rules that were in place at the time of the enactment. This produces a tree of versions for each point in time, not just a single version.

5. Find all the judgments that reference any of these acts.

6. Highlight the points that have to do with the user query.
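Steps 3, 3b and 3c above amount to a transitive closure over the "modifies" relation. A minimal sketch, assuming that relation has already been extracted into a dictionary (the act ids are placeholders, and extracting the relation from real texts is the hard part that this sketch skips):

```python
from __future__ import annotations

from collections import deque


def all_amending_acts(main_acts: set[str], modifies: dict[str, set[str]]) -> set[str]:
    """Return every act that directly or indirectly modifies one of `main_acts`.

    `modifies` maps an act id to the set of act ids it modifies.
    """
    # Invert the relation: for each act, which acts modify it?
    modified_by: dict[str, set[str]] = {}
    for act, targets in modifies.items():
        for target in targets:
            modified_by.setdefault(target, set()).add(act)

    found: set[str] = set()
    queue = deque(main_acts)
    while queue:  # breadth-first walk; `found` guards against cycles
        current = queue.popleft()
        for amender in modified_by.get(current, ()):
            if amender not in found and amender not in main_acts:
                found.add(amender)
                queue.append(amender)
    return found


# Hypothetical data: B modifies A, C modifies B, D modifies C, E is unrelated.
MODIFIES = {"B": {"A"}, "C": {"B"}, "D": {"C"}, "E": {"X"}}
print(sorted(all_amending_acts({"A"}, MODIFIES)))
```

Step 4 (consolidation) then has to apply the collected acts in the right temporal order, which is where the tree of versions comes from.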

Truth be told, having the raw text (point 1) is the easiest part. The rest is what is extremely complicated. Regardless of this, there are private systems in places that can perform this kind of queries (although in a very limited fashion: their idea of "the whole corpus of law" is extremely narrow).

To make the life of implementers easier, markup formats like AkomaNtoso [2] or Oasis LegalDocumentML/LegalRuleML [3] are being used, sadly not enough.

Making the corpus of the law accessible is an important first step. But the corpus alone is not going to be much help. It may even be dangerous if the single texts are not cross-referenced with other relevant texts.

Appeal to authority: I worked on versioning legal documents (bills, acts, judgments, etc) during my PhD. I also worked in the research group that shaped the early versions of AkomaNtoso.

[1] IT: http://normattiva.it (ex NormeInRete) DE: https://www.gesetze-im-internet.de EU: http://eur-lex.europa.eu US-CA: http://legisweb.com

[2] http://akomantoso.org

[3] http://www.legalxml.org/

I worked on a platform for EU law for about half a year and got quite deep into Akoma Ntoso. It's certainly the right direction but it was unfortunately extremely difficult to even get an overview of the status. The various websites felt dead and the documentation wasn't inspiring my confidence either – IIRC it was actually provided as a word document :).

Plus, I just couldn't find anybody publicly using it. The EU parliament supposedly does, but they could never give me an answer why they weren't sharing it online (only .doc and .pdf).

Apparently, the UN and some African countries also make use of Akoma Ntoso, but like you I've never seen it in practice. Open data is not provided well. For example, it is possible to get XML documents from the EU parliament, but you have to request access to their FTP server and it is a laborious process.

I've met Monica Palmirani of Akoma Ntoso recently and she told me they have just launched a case law standard, and so they're still working on it. I'm actually amazed she hasn't burned out yet. Trying to get governments to play nice data-wise is i n c r e d i b l y hard.

The case-law standard is called ECLI. It's used in a few countries and being rolled out in more. At least one Dutch newspaper uses ECLI when referring to court cases.

https://en.wikipedia.org/wiki/ECLI http://bo-ecli.eu/

Dutch law and government publications are available as XML and ODF.

The constitution: http://wetten.overheid.nl/BWBR0001840/2008-07-15

A publication about standards: https://zoek.officielebekendmakingen.nl/stcrt-2015-39782.htm...

Each article can be linked and documents referring to each article can be found as well.

For example all known documents that link to article 5 (equality) of the constitution:


The links are available as RDF.

Currently work is underway to publish law as XML and RDF with ODF/PDF/HTML as secondary formats. This will allow embedding of data such as property lines, lists of medicines, reusable financial reports.


Nexus/Lexus? Westlaw?
