
Testing steps (based on thinking about this for 30 seconds - so probably can be improved):

Train a Transformer-based model with and without the modified softmax (suggestions: GPT-2 or nanoGPT).

Measure performance. I'd probably start with perplexity and see if there is any difference (we'd expect little difference between the unquantized models).

Quantize both models with different quantization strategies.

Measure the perplexity of the quantized models at different sizes. If this is working, we'd expect performance to drop off more quickly for the non-modified model than for the modified one.
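The two measurements in those steps can be sketched in a few lines. This is a minimal illustration, not the paper's method: the function names, the NumPy array standing in for real model weights, and the round-to-nearest symmetric quantizer are all my own simplifications.

```python
import numpy as np

def perplexity(nll_per_token):
    """Perplexity = exp(mean negative log-likelihood per token).
    Lower is better; 1.0 would be a perfect model."""
    return float(np.exp(np.mean(nll_per_token)))

def fake_quantize(w, bits=8):
    """Simulate symmetric uniform quantization: snap weights onto
    2**(bits-1) - 1 evenly spaced positive levels (and their
    negatives), then map back to float."""
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

# A model that gives each correct token probability 0.5 has mean
# NLL of ln(2), i.e. perplexity ~2.
print(perplexity([np.log(2)] * 10))

# Quantization error grows as the bit width shrinks -- the claim
# being tested is that the modified softmax keeps this drop-off
# shallower.
rng = np.random.default_rng(0)
w = rng.standard_normal(1000)
for bits in (8, 4, 2):
    print(bits, np.abs(w - fake_quantize(w, bits)).max())
```

In a real run you'd get the per-token NLLs from the model's cross-entropy loss on a held-out set, and quantize with a proper toolchain (e.g. PyTorch's quantization utilities) rather than this stand-in.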




I was thinking about a different problem as I was typing that and got some mental memory alias bug. I wanted to know a set of steps to take to train a model. My apologies.

In any case, that was an lmgtfy-level question. Here's what I found: https://til.simonwillison.net/llms/training-nanogpt-on-my-bl...

I shall try that soon.


Shaaaaameless plug:

I did a writeup like this (not as nicely as Simon, though) where I used modal.com (cloud GPU, containers, quick starts, free $30/month of spend) to run on their GPUs (e.g. T4, A100).

https://martincapodici.com/2023/07/15/no-local-gpu-no-proble...

The T4, I think, was good enough for the job; there's not much need for the A100.

Since that post I have been working on an easy way to do this with a script called lob.py that requires no code changes to the nanoGPT repo (or whatever repo you are using) and runs on modal.com. The script exists but is still being refined as I use it. Once it is battle-tested a bit more I will do a post.

(It is named lob.py because it "lobs the code over to the server", "lob" being UK slang for throw.)

Watch this space.


Thank you. FWIW, I often find a write-up plus a script superior to a script alone, because I often want to modify things. E.g. I might want to run GPU-only, and a bare script only gets me part of the way there, whereas the accompanying textual description fills in the gaps. So, much appreciated.


In the Qualcomm AI paper linked in this post it turns out they use a similar testing approach:

BERT (109M params), testing perplexity

OPT (125M params), testing perplexity

ViT (22M params), testing ImageNet top-1 accuracy.
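For the vision model, top-1 accuracy is just the fraction of examples where the argmax of the logits matches the label, unlike perplexity, which scores the whole predicted distribution. A minimal sketch (the function name and toy data are mine):

```python
import numpy as np

def top1_accuracy(logits, labels):
    """Fraction of rows where the highest-scoring class is the true label."""
    preds = np.argmax(logits, axis=-1)
    return float(np.mean(preds == labels))

# Three examples over two classes: rows 0 and 1 are classified
# correctly, row 2 is not, so accuracy is 2/3.
logits = np.array([[0.1, 0.9],
                   [0.8, 0.2],
                   [0.3, 0.7]])
labels = np.array([1, 0, 0])
print(top1_accuracy(logits, labels))  # 2 of 3 correct
```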



