
[1] is a good start, although if you want to train from scratch on CPU you'll have to scale down, since transformers need quite a bit of data before they learn to use their position embeddings. For example, try a single-layer RNN on the Shakespeare text [2] or a list of movie titles from IMDB [3]. You'll have to fill in some blanks yourself, because things have evolved quite a bit since RNNs were the standard for language models, but there are tutorials [4] and examples [5] to work from; a rough sketch of the RNN approach follows the links below.

[1] https://jaykmody.com/blog/gpt-from-scratch/
[2] https://github.com/demmojo/text-rnn/blob/master/datasets/sha...
[3] https://datasets.imdbws.com/
[4] https://pytorch.org/tutorials/beginner/chatbot_tutorial.html
[5] https://github.com/pytorch/examples/tree/main/word_language_...
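
To make the RNN suggestion concrete, here is a minimal character-level sketch in PyTorch. The filename ("shakespeare.txt"), the hyperparameters, and the step count are illustrative assumptions, not taken from the linked examples; the point is that a single-layer recurrent model plus a linear head is enough to train on CPU:

    # Minimal single-layer character-level RNN language model (CPU-friendly).
    # Assumes a plain-text training corpus saved as "shakespeare.txt".
    import torch
    import torch.nn as nn

    text = open("shakespeare.txt", encoding="utf-8").read()
    chars = sorted(set(text))
    stoi = {c: i for i, c in enumerate(chars)}
    data = torch.tensor([stoi[c] for c in text], dtype=torch.long)

    class CharRNN(nn.Module):
        def __init__(self, vocab_size, hidden_size=256):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden_size)
            self.rnn = nn.GRU(hidden_size, hidden_size, num_layers=1, batch_first=True)
            self.head = nn.Linear(hidden_size, vocab_size)

        def forward(self, x, h=None):
            out, h = self.rnn(self.embed(x), h)
            return self.head(out), h

    model = CharRNN(len(chars))
    opt = torch.optim.Adam(model.parameters(), lr=3e-3)
    loss_fn = nn.CrossEntropyLoss()

    seq_len, batch_size = 128, 32
    for step in range(2000):
        # Sample random subsequences; targets are the inputs shifted by one char.
        ix = torch.randint(0, len(data) - seq_len - 1, (batch_size,))
        x = torch.stack([data[i : i + seq_len] for i in ix])
        y = torch.stack([data[i + 1 : i + seq_len + 1] for i in ix])

        logits, _ = model(x)
        loss = loss_fn(logits.reshape(-1, len(chars)), y.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        if step % 200 == 0:
            print(f"step {step}: loss {loss.item():.3f}")

Sampling from the trained model is just repeated next-character prediction with the hidden state carried forward; the word-level example in [5] shows the same structure on a token vocabulary instead of characters.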



