55x Speedup of Andrej Karpathy's Minbpe LLM Tokenizer with PyTorch/CUDA

kuprel · 2024-02-22T21:56:41

This adds PyTorch/CUDA training support to Andrej Karpathy's minbpe. It takes 2min 28sec (148 seconds) on an RTX4090 to train the BasicTokenizer with a vocab_size of 512 on 307MB of Enron emails. The original code takes 2hrs 15min (8076 seconds) on an M2 Air with Python 3.11 to do this. That is a 55x speedup.

threesevenths · 2024-02-23T15:23:00

Am I reading this right? A 55x improvement while also going from an M2 Air to an RTX 4090?

If so this doesn’t seem like a logical comparison and the 55x claim would likely not translate when using the same hardware.

lostmsu · 2024-02-24T06:22:47

Why is it surprising? CPU-only M2 probably has under 1 teraops while RTX 4090 has 77. M2's GPU was not used, but even it only provides around 4 teraops, so would have been ~20x slower than 4090.

kuprel · 2024-02-23T17:06:10

The M2 Air was actually much faster than whatever CPU was on the cloud RTX4090 machine I rented. I chose the stronger benchmark to compare to

kuprel · 2024-02-23T20:16:52

Using int16 and an H100 the speedup is actually 108x over the M2 air

Havoc · 2024-02-23T01:18:27

> 307MB of Enron emails

Wait what?

Is that some sort of inside joke?

_aavaa_ · 2024-02-23T01:32:47

Nope!

See for example: https://www.cs.cmu.edu/~./enron/

Havoc · 2024-02-23T15:32:23

> This data is valuable; to my knowledge it is the only substantial collection of "real" email that is public.

Interesting. Something good came out of Enron after all

erichocean · 2024-02-23T13:41:33

Now someone needs to do a Mojo version, and write up the blog post.