> Is there a way to get something going on hardware a couple years old?
Tensor accelerators are a very recent thing, and GPU/WebGPU support is also recent.
RAM was also limited; 4GB was the barrier for a long time.
So the model should run on the CPU and fit within 4GB, or even 2GB.
Oh, I forgot one important thing: mobile CPUs from a couple of years ago were also weak (with the exception of the iPhone/iPad, by the way).
But if you had a gaming phone (or an iPhone), which at that time was comparable to a notebook, you might run something like Llama-2 quantized down to 1.8GB at about 2 tokens per second. Not very impressive, but it could work.
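For a sense of what that looks like in practice, here is a minimal sketch using llama-cpp-python as the CPU runtime (my assumption; any GGUF-capable CPU inference engine would do). The model file name is hypothetical, standing in for whatever heavily quantized model fits your RAM budget:

```python
# Minimal sketch: CPU-only inference with a heavily quantized model.
# Assumes `pip install llama-cpp-python` and a quantized GGUF file on disk.
from llama_cpp import Llama

llm = Llama(
    model_path="llama-2-7b.Q2_K.gguf",  # hypothetical file; pick a quant that fits your RAM
    n_ctx=512,       # small context window keeps the KV cache inside a tight RAM budget
    n_threads=4,     # match the device's physical core count
)

out = llm("Q: Name three uses for a paperclip. A:", max_tokens=48)
print(out["choices"][0]["text"])
```

On old mobile hardware the thread count and context size are the knobs that matter most; on a weak CPU, a few tokens per second is about the ceiling.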
Unfortunately, I can't remember when the median mobile CPU became comparable in performance to business notebooks.
I think Apple entered the race for speed with the iPhone X and iPad 3. For Android, things were even worse; it looks like the median device only reached notebook speeds around the Qualcomm Snapdragon 6xx.