Testing steps (based on thinking about this for 30 seconds - so probably can be improved):
Train a Transformer-based model with and without the modified Softmax (Suggestions: GPT-2 or nanoGPT)
Measure performance - I'd probably start with Perplexity and see if there is any difference (we'd expect little difference).
Quantize both models with different quantization strategies.
Measure the perplexity of the quantized models at each quantized size. If this is working, we'd expect performance to degrade faster for the non-modified model than for the modified one.
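A rough sketch of the pieces those steps need, in plain Python so it runs anywhere. Big assumptions here: the thread doesn't spell out the modified Softmax, so I'm assuming the variant that adds 1 to the denominator; the function names (`softmax_plus_one`, `perplexity`, `fake_quantize`) are made up for illustration; and the quantizer is a toy symmetric round-to-nearest scheme, not what a real quantization library does. For an actual run you'd swap these into GPT-2 or nanoGPT and use a proper quantization toolchain.

```python
import math

def softmax(xs):
    # Standard softmax, shifted by the max for numerical stability.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_plus_one(xs):
    # ASSUMED form of the modified softmax: exp(x_i) / (1 + sum_j exp(x_j)).
    # Shifting by m = max(max(xs), 0) keeps every exponent <= 0.
    m = max(max(xs), 0.0)
    exps = [math.exp(x - m) for x in xs]
    total = math.exp(-m) + sum(exps)  # exp(-m) is the shifted "+1" term
    return [e / total for e in exps]

def perplexity(token_log_probs):
    # exp of the mean negative log-likelihood (natural-log probabilities).
    return math.exp(-sum(token_log_probs) / len(token_log_probs))

def fake_quantize(weights, bits):
    # Toy symmetric uniform quantization: round to a grid, then dequantize.
    levels = 2 ** (bits - 1) - 1
    scale = max(abs(w) for w in weights) / levels
    if scale == 0:
        return list(weights)
    return [round(w / scale) * scale for w in weights]

if __name__ == "__main__":
    scores = [-10.0, -10.0, -10.0]
    print(sum(softmax(scores)))           # always sums to 1
    print(sum(softmax_plus_one(scores)))  # can sum to far less than 1
    for bits in (8, 4, 2):
        print(bits, fake_quantize([1.0, -0.5, 0.25, 0.01], bits))
```

The comparison in the last step is then just: compute `perplexity` on the same held-out text for each (model, bit-width) pair and look at how quickly each curve rises as the bit-width shrinks.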
I was thinking about a different problem as I was typing that and hit some kind of mental memory-aliasing bug. What I wanted was a set of steps to take to train a model. My apologies.
I did a writeup like this (not as nicely as Simon, though), where I used modal.com (cloud GPU, containers, quick starts, free $30/month spend) to run on their GPUs (e.g. T4, A100).
I think the T4 was good enough for the job; there was not much need for the A100.
Since that post I have been working on an easy way to do this with a script called lob.py, which requires no code changes to the nanoGPT repo (or whatever repo you are using) and runs on modal.com. The script exists but gets refined as I use it. Once it is a bit more battle-tested I will do a post.
(It is named lob.py because it "lobs the code over to the server"; "lob" is UK slang for throw.)
Thank you. FWIW, I often find a write-up plus a script superior to a script alone, because I often want to modify things. E.g. I want to run GPU-only, and another script that only gets me part of the way becomes usable once a textual description is added. Therefore, much appreciated.