Ask HN: Resources to brush up from 'Intro to ML' to current LLMs/generative AI?
30 points by OJFord 17 days ago | 13 comments
My 'AI' experience to date consists of basically a Prolog course and an 'Intro to ML' course. I could say handwavy things about regression and SVMs. I'm pretty sure we covered convolutional neural nets, but I barely recall that at all.

I'm interested in understanding more about transformers/GPT/LLMs and more 'media-rich' generative AI like DALL-E and Midjourney etc. (I assume they're linked, because they seemed to have breakthroughs and blow up at the same time, but I don't understand that at all), but not 'prompt engineering' or specifics about tuning model parameters etc.

Can anyone recommend any resources for 'writing a CMS' vs. 'how to configure Wordpress and install a plugin', as it were?

(Prefer text, but understand it might be too new for good ones to have established themselves. In that case, something like OCW preferred to 'screamface'.)


I really like Andrej Karpathy's content.

Yeah, it's YouTube, I know... but it's hands-on too.

For GPT/LLM/ML in general: https://karpathy.ai/zero-to-hero.html

It starts with writing backprop from scratch and takes you through writing everything you need, ending with training a GPT-2-equivalent model.

I also thought his Stanford lectures (CS231n 2016, on YouTube) were really good. They cover GANs for generation, but I think after that you can read the source papers for the diffusion models that DALL-E and Midjourney use.

I just wanted to second this. I recently went through all of Karpathy's videos from an almost-zero baseline of ML knowledge and am now fairly comfortable writing language models from scratch (simple bigram statistical models, MLPs, transformers, etc.).
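
For anyone unfamiliar with the term, here is a minimal sketch of what a "simple bigram statistical model" can look like, assuming a character-level model built purely from counts. The function name `train_bigram` and the toy training string are made up for illustration and are not taken from Karpathy's videos.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count character bigrams, then normalise each row into probabilities.

    Hypothetical helper for illustration: returns a dict mapping each
    character to a dict of {next_character: probability}.
    """
    counts = defaultdict(Counter)
    # Pair every character with the one that follows it.
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    # Turn raw counts into conditional probabilities P(next | prev).
    return {
        prev: {ch: n / sum(row.values()) for ch, n in row.items()}
        for prev, row in counts.items()
    }


# Toy data: after "a" we saw "b" three times and "c" once.
model = train_bigram("abababac")
print(model["a"])
```

Sampling from such a table one character at a time gives a crude text generator; the neural models mentioned above replace the count table with learned parameters.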

Karpathy is quite good at helping you build an intuitive understanding of core concepts and at linking/referencing literature where appropriate for the more curious learners. Thanks to his videos, I was able to read through several of the foundational papers on ResNets, ConvNets, transformers, and some misc. normalisation techniques without _too_ much struggle.

At one point I also went through half of Andrew Ng's CS229 ML lectures from Stanford (https://www.youtube.com/watch?v=jGwO_UgTS7I&list=PLoROMvodv4...). I found them much more math/proof heavy, but definitely valuable for understanding the underlying statistical methods and theory that ML applies.

I'm building a course that teaches deep learning from the ground up - https://github.com/VikParuchuri/zero_to_gpt .

It balances theory and code, and builds from the foundation up, so you're never typing something without understanding it. Teaching method is text, diagrams, and code. Most lessons have optional videos, too.

It focuses on text models over image models (RNNs, transformers, etc.).

It's not 100% finished, but has enough to get you very far.


This is probably what you’re looking for.

Contrarian view. Don't bother with that, just use a GPT (ideally 4) to "write a neural network to do <x> where the input is <y> and explain your reasoning". You'll learn way more from doing this and actually get a lot further than "starting from scratch".

The handouts at https://fleuret.org/dlc/ are fantastic.

Also two books: Data Science from Scratch and Deep Learning from Scratch. They are more hands-on, but you'll build all the low-level things in Python and learn a lot.

I think you're asking for some of the things mentioned in the 'research science' section here - https://news.ycombinator.com/item?id=36195527 - is that right?

Quite possibly, that's certainly a helpful collection (and probably that whole thread, I did search for past topics but somehow missed that), thank you.

Depending on your math background, I would go for Bishop's Deep Learning: Foundations and Concepts and Simon Prince's Understanding Deep Learning.

Thanks for the reply, I'll have a more detailed look, but just on a cursory skim:

> After finishing this course you will know:

> How to train models that achieve [...]

That is exactly the kind of 'applied', straight-to-coding approach (using an existing model to create some product) that I'm not really so interested in.

Like, if I needed it for something, I (perhaps naïvely) think I'd just figure it out from API docs etc. I'm more interested in the theory (and in abstract, not to apply to some problem I actually have) - how/why does it work, what is that model, how was it produced, etc.

Good resources here. Thanks, guys.
