This is awesome! It's been years since we've seen NLP projects we can actually run on the normal computers we have at home (no, a 24GB GPU still isn't normal). I have some questions:
How long did training take?
What are the specs of your system?
How big is the training dataset in total?
It's probably over 50GB, so it's very unlikely I'll be able to overfit the model. The best part is that this means the model is always learning new things and showing new abilities.
I noticed that right after giving it ONLY a coding dataset it got a bit better at logical puzzles, so I think there's a side effect to training an LLM on sequential information like code.
The LLM seems more inclined to learn structure, for example:
Structure of language
Structure of poems
Structure of music lyrics, etc.
RTX 3080 Ti (12GB VRAM), 64GB DDR4
AMD Ryzen 7
8TB of drive space (I built it with the idea that I'd be hoarding a lot of data for potential AI research)
---
The model has probably had ~2 weeks of straight training to get to this level (it learns VERY fast, I suspect thanks to GQA + ALiBi plus the type of training I've given it; rough sketch of those two pieces below).
That's why I think there's considerably more training it can do.
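For anyone curious what GQA + ALiBi actually look like in practice, here's a minimal, self-contained PyTorch sketch of the attention step, written from the published descriptions rather than this project's code; the head counts, dims, and slope formula are just illustrative.

```python
import math
import torch
import torch.nn.functional as F

def alibi_slopes(n_heads):
    # Geometric slopes 2^(-8/n), 2^(-16/n), ... from the ALiBi paper
    # (assumes n_heads is a power of two, for simplicity).
    start = 2 ** (-8.0 / n_heads)
    return torch.tensor([start ** (i + 1) for i in range(n_heads)])

def gqa_alibi_attention(q, k, v, n_kv_groups):
    # q: (batch, n_q_heads, seq, head_dim)
    # k, v: (batch, n_kv_heads, seq, head_dim), n_q_heads = n_kv_heads * n_kv_groups
    b, n_q_heads, seq, d = q.shape

    # GQA: each group of query heads shares one K/V head, so K and V are
    # simply repeated along the head dimension (smaller KV cache, fewer params).
    k = k.repeat_interleave(n_kv_groups, dim=1)
    v = v.repeat_interleave(n_kv_groups, dim=1)

    scores = q @ k.transpose(-2, -1) / math.sqrt(d)  # (batch, heads, seq, seq)

    # ALiBi: instead of positional embeddings, add a per-head linear penalty
    # that grows with the distance between query and key.
    pos = torch.arange(seq, device=q.device)
    rel = (pos[None, :] - pos[:, None]).float()          # rel[i, j] = j - i
    bias = alibi_slopes(n_q_heads).to(q.device)[:, None, None] * rel
    scores = scores + bias.to(scores.dtype)

    # Causal mask: queries can't attend to future tokens.
    future = torch.triu(torch.ones(seq, seq, dtype=torch.bool, device=q.device), 1)
    scores = scores.masked_fill(future, float("-inf"))

    return F.softmax(scores, dim=-1) @ v

# e.g. 8 query heads sharing 2 K/V heads (4 query heads per group)
q = torch.randn(1, 8, 16, 64)
k = torch.randn(1, 2, 16, 64)
v = torch.randn(1, 2, 16, 64)
out = gqa_alibi_attention(q, k, v, n_kv_groups=4)  # -> (1, 8, 16, 64)
```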
You can also use Project Gutenberg if you want a huge, legal dataset.
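If it helps anyone get started, here's a rough sketch of pulling plain-text books from Gutenberg into a single training file. The URL pattern, ebook IDs, and header/footer markers are assumptions based on how gutenberg.org serves its plain-text files, not anything from this project, and for bulk downloads you should use their recommended mirrors rather than hammering the main site.

```python
import re
import urllib.request

def fetch_gutenberg_text(ebook_id):
    # Assumed URL pattern for Gutenberg's cached plain-text files.
    url = f"https://www.gutenberg.org/cache/epub/{ebook_id}/pg{ebook_id}.txt"
    with urllib.request.urlopen(url) as resp:
        raw = resp.read().decode("utf-8", errors="ignore")
    # Strip the Gutenberg license header/footer between the
    # "*** START OF ... ***" and "*** END OF ... ***" marker lines.
    start = re.search(r"\*\*\* START OF.*\*\*\*", raw)
    end = re.search(r"\*\*\* END OF.*\*\*\*", raw)
    if start and end:
        raw = raw[start.end():end.start()]
    return raw.strip()

if __name__ == "__main__":
    # e.g. dump a few well-known books into one corpus file
    with open("gutenberg_corpus.txt", "w", encoding="utf-8") as f:
        for book_id in (1342, 84, 2701):  # Pride and Prejudice, Frankenstein, Moby-Dick
            f.write(fetch_gutenberg_text(book_id) + "\n\n")
```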
I've been collecting papers on small models, training with small data, using small ones to jump-start big ones, alternatives to backpropagation, etc. I wanted to eventually do a sub-500M model with those techniques that could be reused in other projects. I may or may not get around to it.
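To make the "small jump-starts big" idea concrete, here's one hypothetical way it's often framed: an already-trained small model supplies soft targets that a larger model matches alongside the usual next-token loss. The mixing weight, temperature, and the assumption that both models share a vocabulary are placeholders, not taken from any specific paper on that list.

```python
import torch
import torch.nn.functional as F

def jumpstart_loss(big_logits, small_logits, targets, alpha=0.5, temperature=2.0):
    # big_logits, small_logits: (batch, seq, vocab); targets: (batch, seq).
    # Assumes the two models share a tokenizer/vocabulary.
    # Standard next-token cross-entropy on the real data...
    ce = F.cross_entropy(big_logits.view(-1, big_logits.size(-1)), targets.view(-1))
    # ...plus a KL term pulling the big model toward the small model's
    # temperature-softened distribution.
    kl = F.kl_div(
        F.log_softmax(big_logits / temperature, dim=-1),
        F.softmax(small_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    return (1 - alpha) * ce + alpha * kl
```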
Email me and I’ll send you some of those links for your research.