At least Google is still "usable"; meanwhile, YouTube search has become the biggest piece of cr*p ever produced: "Shorts", "People like you also like this completely unrelated content", "Do you want results? Haha, here you have home results as well".
I recently went through the process of selecting MP3 player software for my Linux laptop and, after testing many, settled on the Strawberry player. It is actually very good: https://www.strawberrymusicplayer.org/
imo monorepos are great, but the tooling is not there, especially the open-source options. Most companies using monorepos have their own tailored tools for them.
I really hope we see an AI-PU (or under some other name; INT16PU, why not?) for the consumer market sometime soon. Or being able to expand GPU memory via a PCIe socket (not sure if that's technically possible).
My uninformed question about this is: why can't we make the VRAM on GPUs expandable? I know you need to avoid having the data traverse some kind of bus that trades overhead for wide compatibility, like PCIe, but if you only want it for more RAM, can't you just add more sockets whose traces go directly to where they're needed? Even if it's only compatible with a specific type of chip, it would seem worthwhile for the customer to buy a base GPU and add on however much VRAM they need. I've heard of people replacing the existing RAM chips on their GPUs[0], so why can't this be built in as a socket, like motherboards use for RAM and CPUs?
Expandable VRAM on GPUs has been tried before; the industry just hates it. It's like Apple devices: want more internal storage? Buy a new computer so we can have the fat margins.
The original Rev. A iMac in the late '90s had slotted memory for its ATI card, as one example: it shipped with 2 MB and could be upgraded to 6 MB after the fact with a 4 MB SGRAM DIMM. There are also a handful of more recent examples floating around.
While I'm sure there are also packaging advantages to be had by directly soldering memory chips instead of slotting them, etc., I strongly suspect that the desire to keep buyers upgrading the whole card ($$$) every few years massively trumps this if you are a GPU vendor.
Put another way, what's in it for the GPU vendor to offer memory slots? Possibly reduced revenue, if it became the industry norm.
Expansion has to answer one fundamental question: if you're likely to need more X tomorrow, why aren't you just buying it today?
The answer to this question almost has to be "because it will be cheaper to buy it tomorrow." However, GPUs bundle together RAM and compute. If RAM is likely to be cheaper tomorrow, isn't compute also probably going to be cheaper?
If both RAM and compute are likely cheaper tomorrow, then the calculus still probably points towards a wholesale replacement. Why not run/train models twice as quickly alongside the RAM upgrades?
> I strongly suspect the desire to keep buyers upgrading the whole card ($$$) every few years trumps this massively if you are a GPU vendor.
Remember as well that expandable RAM doesn't unlock higher-bandwidth interconnects. If you could take the card from five years ago and load it up with 80 GB of VRAM, you'd still not see the memory bandwidth of a newly-bought H100.
If instead you just need the VRAM and don't care much about bandwidth/latency, then it seems like you'd be better off using unified memory and having system RAM be the ultimate expansion.
> The answer to this question almost has to be "because it will be cheaper to buy it tomorrow."
No, it doesn't. It could just as easily be "because I will have more money tomorrow." If faster compute is $300 and more VRAM is $200 and I have $300 today and will have another $200 two years from now, I might very well like to buy the $300 compute unit and enjoy the faster compute for two years before I buy the extra VRAM, instead of waiting until I have $500 to buy both together.
But for something that is already a modular component, like a GPU, this is mostly irrelevant. If you have $300 now, you buy the $300 GPU; in two years, when you have another $200, you sell the one you have for $200 and buy the one that costs $400, which is the same one that cost $500 two years ago.
This is a very different situation from fully integrated systems, because the latter have components that lose value at different rates, or that make sense to upgrade separately. You buy a $1000 tablet and then the battery goes flat and it doesn't have enough RAM, so you want to replace the battery and upgrade the RAM, but you can't: the battery is proprietary and discontinued, and the RAM is soldered. So even though that machine has a satisfactory CPU, storage, chassis, screen, and power supply, still $700 worth of components, it's only worth $150, because nothing is modular and nobody wants it: it doesn't have enough RAM and the battery dies after ten minutes.
Hmm, seems you're replying as a customer, not as a GPU vendor...
The thing is, there's not enough competition in the AI-GPU space.
Currently the only option for not wasting time getting some random research project from GitHub running is to buy a card from Nvidia; CUDA can run almost anything on GitHub.
AMD GPU cards? That really depends...
And gamers often don't need more than ~12 GB of GPU RAM to run games at 4K, so most high-VRAM customers are in the AI field.
> If you could take the card from five years ago and load it up with 80 GB of VRAM, you'd still not see the memory bandwidth of a newly-bought H100.
This is exactly what Nvidia will fight tooth and nail: if it were possible, its profit margins could be slashed to a half or even an eighth.
Replacing RAM chips on GPUs involves desoldering and resoldering, which (for the most part) maintains the signal integrity and performance characteristics of the original RAM. Adding sockets complicates the signal path (IIRC), making it harder for the traces to go where they're needed, and realistically, given a trade-off between speed/bandwidth and expandability, I think the market goes with the former.
The problem with GPUs is they're designed to be saturated.
If you have a CPU with however many cores, the amount of memory or memory bandwidth you need to go with it is totally independent, and memory bandwidth is rarely the bottleneck. So you attach a couple of memory channels' worth of slots and people can decide how much memory they want based on whether they intend to have ten thousand browser tabs open or only one thousand. Neither of those will saturate memory bandwidth or depend on how fast the CPU is, so you don't want the amount of memory and the number of CPU cores tied together.
If you have a device for doing matrix multiplications, the amount of RAM you need is going to depend on how big the matrix you want to multiply is, which for AI things is the size of the model. But the bigger the matrix is, the more memory bandwidth and compute units it needs for the same number of tokens/second. So unlike a CPU, there aren't a lot of use cases for matching a small number of compute units with a large amount of memory. It'd be too slow.
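To put some rough numbers on "too slow": token generation is approximately memory-bandwidth-bound, because every weight has to be streamed from VRAM once per generated token. Here's a minimal back-of-envelope sketch in Python, where the bandwidth and model-size figures are illustrative assumptions rather than measurements:

    # Back-of-envelope: tokens/second is bounded by how fast the full set of
    # model weights can be streamed from VRAM for each generated token.
    # All numbers below are illustrative assumptions, not measurements.

    def max_tokens_per_second(model_size_gb: float, bandwidth_gb_s: float) -> float:
        """Upper bound: each token requires one full pass over the weights."""
        return bandwidth_gb_s / model_size_gb

    old_card_bw = 450    # GB/s, a plausible GDDR6-era figure (assumed)
    new_card_bw = 3350   # GB/s, roughly an H100 SXM's HBM3 (approximate)

    for model_gb in (7, 35, 70):   # fp16 model weights, in GB (assumed sizes)
        old = max_tokens_per_second(model_gb, old_card_bw)
        new = max_tokens_per_second(model_gb, new_card_bw)
        print(f"{model_gb:>3} GB model: old card <= {old:6.1f} tok/s, "
              f"new card <= {new:6.1f} tok/s")

Under those assumptions, a 70 GB model capped at roughly 6 tokens/second on the old card is exactly the "big memory, small pipe" combination described above: the extra VRAM makes the model fit, but not run at a useful speed.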
Meanwhile, the memory isn't all that expensive. For example, right now the spot price for 64 GB of GDDR6 is less than $200. Against a $1000 GPU that's fast enough to make use of that much, that's not a big number. Just include it to begin with.
Except that they don't. The high-end consumer GPUs are heavy on compute and light on memory. For example, you can get the RTX 4060 Ti with 16 GB of VRAM. The RTX 4090 has four times as much compute but only 50% more VRAM. There would be plenty of demand for a 4090 that cost $200 more and had four times as much VRAM, only they don't make one, because of market segmentation.
Obviously, if they don't do that, they're not going to offer one you can upgrade. But you don't really want to upgrade just the VRAM anyway; what you want is for the high-performance cards to come with that much VRAM to begin with. Which somebody other than Nvidia might soon provide.
Technically we definitely can, but are there sufficiently many people willing to pay a sufficiently high premium for that feature? How much more would you be willing to pay for an otherwise identical card that has the option to expand RAM, and do you expect that a significant portion of buyers would want to pay a non-trivial up-front cost for that possibility?