[1] is a good start, although if you want to train from scratch on a CPU, you'll have to downscale, as transformers need quite a bit of data before they learn to use position embeddings. For example, try a single-layer RNN on Shakespeare texts [2] or a list of movie titles from IMDB [3]. You'll have to fill in the blanks, because things have evolved quite a bit since those were used for language models, but you can find some tutorials [4] and examples [5].
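If it helps, here's a rough sketch of what I mean by downscaling: a minimal single-layer character-level RNN in PyTorch (the framework choice and the shakespeare.txt filename are my own assumptions, not from the links above). Something like this trains to readable output in minutes on a CPU, since next-character prediction with a recurrent layer needs far less data than a transformer:

    import torch
    import torch.nn as nn

    # Assumed corpus file: any small plain-text file (e.g. tiny Shakespeare) will do.
    text = open("shakespeare.txt").read()
    chars = sorted(set(text))
    stoi = {c: i for i, c in enumerate(chars)}
    data = torch.tensor([stoi[c] for c in text], dtype=torch.long)

    class CharRNN(nn.Module):
        def __init__(self, vocab_size, hidden_size=128):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, hidden_size)
            # Single recurrent layer, as suggested above; no position embeddings needed.
            self.rnn = nn.RNN(hidden_size, hidden_size, batch_first=True)
            self.head = nn.Linear(hidden_size, vocab_size)

        def forward(self, x, h=None):
            out, h = self.rnn(self.embed(x), h)
            return self.head(out), h

    model = CharRNN(len(chars))
    opt = torch.optim.Adam(model.parameters(), lr=3e-3)
    loss_fn = nn.CrossEntropyLoss()
    seq_len, batch_size = 100, 32

    for step in range(1000):
        # Next-character prediction: targets are inputs shifted by one position.
        starts = torch.randint(0, len(data) - seq_len - 1, (batch_size,)).tolist()
        x = torch.stack([data[i:i + seq_len] for i in starts])
        y = torch.stack([data[i + 1:i + seq_len + 1] for i in starts])
        logits, _ = model(x)
        loss = loss_fn(logits.reshape(-1, len(chars)), y.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
        if step % 100 == 0:
            print(step, round(loss.item(), 3))

Swapping nn.RNN for nn.LSTM or nn.GRU is a one-line change and usually trains more stably; sampling from the trained model is left as one of those blanks to fill in.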
It doesn't appear to have an online component, nor is there any mention of OCW, so I'm guessing not. Perhaps they'll record it and post it later, though. I was curious why this was posted if it isn't open to remote participation or viewing, but I guess it was shared for the syllabus/readings?