I read that post recently, and even to someone who has not been deeply involved in ML it felt prescient
Even the HN discussion around it had comments like "this feels like my baby learning to speak...", which are the same comparisons people made when LLMs hit the mainstream in 2022
I had forgotten its existence by now, but I remember reading this post all those years back. Damn. I also remember thinking this would be so cool if RNNs didn't suck at long contexts, even with an attention mechanism. In some sense, the only things he needed were the transformer architecture and a "fuck it, let's just do it" compute budget to end up at ChatGPT. He was always at the frontier of this field.
I tried to find where I heard that Radford was inspired by that blog post, but the closest thing I found is that the "Sentiment Neuron" paper (Learning to Generate Reviews and Discovering Sentiment: https://arxiv.org/pdf/1704.01444.pdf), in its "Discussion and Future Work" section, cites this Karpathy paper from 2015: Visualizing and Understanding Recurrent Networks https://arxiv.org/abs/1506.02078
That blog post inspired Alec Radford at OpenAI to do the research that produced the "unsupervised sentiment neuron": https://openai.com/research/unsupervised-sentiment-neuron
OpenAI then decided to see what would happen if they scaled that model up using the new Transformer architecture invented at Google, and the result was GPT: https://cdn.openai.com/research-covers/language-unsupervised...