Show HN: I built the first open LLM for Australian law (umarbutler.com)
40 points by ubutler 9 months ago | 9 comments



Hey HN,

Last month, I had the honour of seeing my article on how I built the largest open database of Australian law (https://umarbutler.com/how-i-built-the-largest-open-database...) reach the front page. Thanks in large part to the outpouring of support and encouragement my work received from HN, I became determined to publish the first open LLM for Australian law by training a model on my database. I am excited to share that I finally achieved that goal today with the release of Open Australian Legal GPT2, a finetune of GPT2 trained on 37,560 laws and regulations, comprising 635,482,112 tokens, taken from my database.

Although it may not be as large as I had originally hoped, I'm still quite proud of the model. It was a struggle to wade through mountains of options before finding something that worked, but I now have code I can reuse to train any other causal language model on any dataset. The model is thus a small but important step towards maturing the legal AI field here in Australia.
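
For those curious, here's a minimal sketch of what that reusable training loop looks like in spirit, assuming the Hugging Face transformers and datasets libraries; the corpus file, block size and hyperparameters below are placeholders rather than my actual settings:

    # Minimal causal-LM finetuning sketch (illustrative; not my exact training code).
    # Assumes: transformers, datasets; a local JSONL corpus with a "text" field.
    from datasets import load_dataset
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        DataCollatorForLanguageModeling,
        Trainer,
        TrainingArguments,
    )

    MODEL_NAME = "gpt2"   # smallest GPT2; swap in any other causal LM
    BLOCK_SIZE = 512      # placeholder context length

    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    tokenizer.pad_token = tokenizer.eos_token  # GPT2 has no pad token by default
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

    # Hypothetical corpus file standing in for the legal database.
    dataset = load_dataset("json", data_files="corpus.jsonl", split="train")

    def tokenise(batch):
        return tokenizer(batch["text"], truncation=True, max_length=BLOCK_SIZE)

    tokenised = dataset.map(tokenise, batched=True, remove_columns=dataset.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(
            output_dir="open-australian-legal-gpt2",
            per_device_train_batch_size=8,  # placeholder
            num_train_epochs=1,             # placeholder
            learning_rate=5e-5,             # placeholder
        ),
        train_dataset=tokenised,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()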

If you’re interested in playing around with the model, you can find it here on Hugging Face: https://huggingface.co/umarbutler/open-australian-legal-gpt2
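
If you just want a quick taste locally, something like this should work (a sketch using the standard transformers text-generation pipeline; the prompt is only an example):

    # Quick autocompletion demo with the released model (prompt is illustrative).
    from transformers import pipeline

    generator = pipeline("text-generation", model="umarbutler/open-australian-legal-gpt2")
    completion = generator(
        "Under the Corporations Act 2001 (Cth), a director must",
        max_new_tokens=50,
        do_sample=True,
    )
    print(completion[0]["generated_text"])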


Thanks for sharing. Two questions:

1) What’s the main use case for this?

2) Was training your own LLM necessary? How did GPT-4 compare?


> 1) What’s the main use case for this?

Originally, I intended to produce an LLM that could be used for answering questions related to Australian law, potentially via retrieval-augmented generation. The LLM I ended up building, however, is unfortunately too small to be useful for that. It could, however, be useful for autocompletion, and, if it were further finetuned, it might also be capable of tasks such as text summarisation and question answering. Nevertheless, the model has been valuable in teaching me how to train causal language models, and in showing me that, if I were to do the same thing with a bigger model and more of my legal database, I could achieve much better results than the major LLMs, which lack a sufficient amount of Australian legal material in their training data.

> 2) Was training your own LLM necessary? How did GPT-4 compare?

GPT-4 is certainly much better, given that I only trained the smallest version of GPT2. I do think that there is still value in training a larger LLM on my database as, in my experience, GPT-4 lacks a sufficiently advanced understanding of Australian law, although it has the best understanding of all the models I have tried (GPT-3.5 Turbo, Claude 2, Bard, Llama 2, Mistral, etc.).


Is there a link to try it?

This is literally the main application of ML that I’ve been awaiting for years. Making complex legislation and bureaucracy searchable and useful for people with no context.


How does your own trained LLM compare against using, for example, GPT-4 + RAG (a vector DB + your Australian law DB)?


See my response to /u/nextworddev (https://news.ycombinator.com/item?id=38399702). In short, GPT-4 outperforms my LLM simply because my model is quite small (it's the smallest version of GPT2). At the moment, I'm trying to figure out whether I can train a larger model, say Phi-1.5, and get results comparable to or even better than GPT-4. With enough computing power, I'm sure that would be possible given that:

1. GPT-4 lacks a sufficiently advanced understanding of Australian law (likely owing to the composition of its training data).

2. Australian legal English is a very small subset of broader Australian English, and of English in general, which should make it possible to build a model with a deep understanding of the domain using far fewer parameters than one would need to understand English in general, or even a popular dialect of English.


I might be wrong, but I think they’re suggesting you start with GPT-4, which as you’ve said is already somewhat knowledgeable, and then either finetune it on, or otherwise integrate, your law db.
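
Roughly, the retrieval half would look something like this (a sketch assuming sentence-transformers for embeddings; the embedding model, passages and question are all placeholders, and the final GPT-4 call is left as a comment):

    # Rough RAG retrieval sketch: embed legal passages, retrieve the closest ones,
    # then stuff them into a prompt for a stronger model such as GPT-4.
    # All names here (embedding model, passages, question) are illustrative.
    import numpy as np
    from sentence_transformers import SentenceTransformer

    embedder = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

    passages = [
        "Corporations Act 2001 (Cth) s 180: care and diligence...",
        "Fair Work Act 2009 (Cth) s 117: notice of termination...",
    ]  # in practice, chunks drawn from the law DB
    passage_vecs = embedder.encode(passages, normalize_embeddings=True)

    question = "What duties do company directors owe?"
    query_vec = embedder.encode([question], normalize_embeddings=True)[0]

    # Cosine similarity (vectors are normalised, so a dot product suffices).
    top = np.argsort(passage_vecs @ query_vec)[::-1][:2]
    context = "\n\n".join(passages[i] for i in top)

    prompt = f"Answer using only the sources below.\n\n{context}\n\nQuestion: {question}"
    # `prompt` would then be sent to GPT-4 (or any other model) via its API.
    print(prompt)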


What do you think about DevonThink?


One of the best current use cases for LLMs is to point out possible errors in human-produced work.

I would love to see every small town judge forced to submit a complete recording of the trial, a draft opinion, and what the AI thought of it before submitting their final ruling. All of this should be on the public record.

The problem is that, at least in Texas, the justices of the peace (JPs) would refuse to record, or would erase recordings, if the evidence hurt their buddy.



