The premise of this blog post is a little off base. (Though I think Open Law Library is doing good work.) The difficulty in building a high quality legal search engine is not in parsing the links between the documents. High quality links matter, but they only get you about 25% of the way there. The more important thing is to have a highly accurate and structured understanding of the law. (Think of Google's Knowledge Graph, or the maps they use for their driverless cars.)
Disclaimer: I worked on Google Scholar and am the CEO of Judicata.
A recent evaluation of various legal search engines [1] found: "The oldest database providers, Westlaw and Lexis, had the highest percentages of relevant results, at 67% and 57%, respectively. The newer legal database providers, Fastcase, Google Scholar, Casetext, and Ravel, were also clustered together at a lower relevance rate, returning approximately 40% relevant results."
Westlaw, Lexis and Google Scholar all have high quality citation parsing (i.e., links). And Scholar relies very heavily on PageRank (as [1] demonstrates). But it is Westlaw and Lexis that are the better search engines. That's because they have invested more into going beyond just links; they've invested a lot into understanding what is happening with the law.
At Judicata our own findings are that the average legal search query is significantly more complex than the average Google query -- having more terms and more concepts. Moreover, whereas only 15% of Google queries are unique, the inverse is true in legal research: more than 85% of queries are unique. What that means is that in order to return a good result, you need to understand a lot more about the query and the documents you've indexed. You can't rely on links between documents and past searches and clicks to power a quality search engine (the way that Google.com can).
As has been mentioned in other comments here, the real challenge for legal research is extracting structure out of the law (Shepardization, Procedural Postures, Causes of Action, Dispositions, Legal Principles, Arguments, Facts, etc.). That is what will get legal search engines closer to where Google really shines -- results that are powered by the Google Knowledge Graph.
Thank you for your perspective. It's quite helpful. However, wouldn't even a plain text searchable database be better than nothing? And I don't understand how this can be monopolized by Westlaw when law should be public domain…
Getting free or low cost access to a plain text searchable database is no longer a problem for lawyers. It was 8-10 years ago, but since the entry of Google Scholar (and Casetext, Ravel, and a half dozen or so other providers) getting access to the law is no longer difficult. To echo the original post, the law today is in a place like the "deep, dark, early days of the Web, using search engines like Lycos and Alta Vista". We do need a "Google for the law," but Google isn't good enough to be that. It's a very hard problem to create a good search engine for the law, but legal search engines will eventually get there.
The law is public domain. At least in the federal system, and increasingly in most state systems, everything is published as PDFs on courts' websites.
But being public domain doesn't mean someone is required to OCR and host it for you. And it doesn't mean someone needs to go and OCR and host hundreds of years of old cases.
[1] https://papers.ssrn.com/sol3/papers.cfm?abstract_id=2859720