I wish US sellers would start doing this, too. Nvidia's continuing refusal to put higher amounts of VRAM into its "affordable" cards, combined with AMD's head-scratching resistance to entering the ML space, leaves average consumers who want to run bigger models locally with very few options.
> The GeForce RTX 2080 Ti 22GB has found its way outside China. eBay seller customgpu_official, an upgrade and repair store in Palo Alto, California, sells a similar MSI GeForce RTX 2080 Ti Aero with 22GB for $499. The store is touting the graphics card as a budget alternative for students and startups that want to get their feet wet in AI workloads. The graphics card is allegedly stable in Stable Diffusion, large language models (LLMs), and Llama 2. According to the merchant's website, it has sold over 500 units of the GeForce RTX 2080 Ti 22GB.
The China stuff is just background story I think.
> Nvidia's continuing refusal to put higher amounts of VRAM into its "affordable" cards [...] leaves average consumers who want to run bigger models locally with very few options.
That's market segmentation and product differentiation for you. If you want that much VRAM, you're doing ML; if you want to do ML, you can pay us ML prices.
What 'average consumer' wants to 'run bigger models locally' anyway?
If they made the 'affordable' cards with VRAM ranging from, say, 8GB to 80GB, only gamers would be buying at the bottom end and only MLers at the top. And they can charge the latter a lot more - so even if they did what you want, you'd end up with Apple-esque pricing bumps for beefier/more GDDR chips.
That's not how this traditionally worked. A few generations ago board partners were allowed to solder whatever RAM they wanted onto the GPU board.
Boards from different OEMs were different in meaningful ways and OEMs had the chance to differentiate themselves from the competition with some actual engineering. More RAM, multiple GPUs per board, AGP to PCIe chips, you name it.
Nowadays Nvidia restricts its partners from all that fun and undercuts them with their own models. No wonder EVGA quit this game.
The murder of SLI made me so angry. It was so wonderful to be able to buy a card, then a few years later buy another of the same card for a bargain price and boost your performance by ~60%.
That and increasing RAM down the road for the system itself were some of the few upgrades that made sense.
Usually by the time you could buy a high-end CPU replacement, it was a "better deal" to just replace the entire motherboard/CPU combo and get a whole new computer. But a second video card, some more RAM, additional SSDs, those made sense.
SLI was ultimately killed by the rise of TAA. Interleaving frames across GPUs breaks TAA's assumption that a consistent temporal history is available. That's why NVIDIA went down the path of building a better TAA model/upscaler instead.
TAA is also terrible and turns visuals into a smeary mess, particularly because devs keep building it into games in a way that's nearly impossible to turn off, as in Halo Infinite.
Nah, you hate bad TAA. DLSS is TAA too and it's actually great, DLAA is real nice, and the cool kids are doing some DLSS+DSR thing (bigger intermediate space?). Halo Infinite is the poster child for fucked-up TAA, same for RDR2 - you know what works better than anything else? DLAA, lol. It provides a minimum floor.
Games are just going to use temporal accumulation now. There is too much signal to be gained from re-using past samples. It makes things way cheaper and opens the door to extrapolation and non-uniform sampling etc. It's the least bad of all the options, and DLSS actually is quite good at weighting samples pretty reasonably. DLSS 2.5, 3.0, 3.5 and upwards are actually significantly better and that can be injected back into (non-anticheat) games and the bar will likely continue to be raised. It is a signal processing technique that recovers a lot of signal with very low "noise factor".
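To make "temporal accumulation" concrete, here's a toy sketch (plain NumPy, not anything from NVIDIA's actual DLSS/TAA code): each frame blends the new, noisy sample into a history buffer with an exponential moving average, so the per-pixel error drops well below what any single frame gives you. Real implementations also reproject the history with motion vectors and reject stale samples; this is just the core idea - and it's also why alternate-frame SLI hurts, since each GPU only ever sees half of the history.

    import numpy as np

    def accumulate(history, sample, alpha=0.1):
        """Exponential moving average: keep most of the history,
        blend in a little of the new noisy sample each frame."""
        return (1.0 - alpha) * history + alpha * sample

    rng = np.random.default_rng(0)
    truth = np.linspace(0.0, 1.0, 64)       # the "correct" pixel values
    history = np.zeros_like(truth)

    for frame in range(60):
        noisy = truth + rng.normal(0.0, 0.25, truth.shape)  # one jittered sample per frame
        history = accumulate(history, noisy)

    print("single-frame error:", np.abs(noisy - truth).mean())
    print("accumulated error :", np.abs(history - truth).mean())

Run it and the accumulated error comes out a few times lower than the single-frame noise - that's the "free" signal games don't want to give up.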
The only difference between the consumer RTX cards and a Quadro card is memory (and maybe power pin location, which is better on Quadro cards for rackmounted chassis). If nvidia put more memory inside their consumer cards, professionals would just buy the relatively less expensive consumer cards rather than the Quadro cards which are twice the price.
True, but they weren't doing such a range, it was like 512MB or 1GB, 2GB or 4GB, and there wasn't such ML interest in them as there is/would be today - if there had been that market then EVGA et al. would've been lapping up that premium too, I don't think it really affects the argument who's responsible for placing the chips. (And I wasn't really aware of that shift.)
Yes, and NVIDIA didn't care when it was all selling to gamers. Now it wants crypto bros (obviously a little historical there) and ML people who are using it to make money to pay a higher price point. So they do it by restricting the options available at the affordable level. See also: what happens when you want to use group administration features in Windows.
NVIDIA was already doing this back when it was all selling to gamers - although at least the Quadro cards had the decency to offer ECC memory and some other CAD-relevant bits, as well as signed, production-ready drivers. They've just transitioned from Quadro to RTX as ML became the dominant professional tier.
I think they're saying that Microsoft uses features like group admin as differentiating factors between Windows variants so that they can sell professional/enterprise-oriented variants for a higher price than Home. So, same sort of "pro features in the pricier Pro variants" logic.
The RTX 4090 is so huge and power-hungry you can barely fit two into a case, let alone any more than that.
And even if you replace your case, and your power supply (and maybe your motherboard and CPU too, gotta have enough PCIe lanes) you'll still only have 48GB of vram.
OK, so we can't use an RTX GPU. Good luck even understanding the rest of nvidia's product range. The A40? "The World's Most Powerful Data Center GPU for Visual Computing". V100? "The most advanced data center GPU ever built". H100? "Unprecedented performance, scalability, and security for every data center". A100? "Unprecedented acceleration at every scale to power the world's highest-performing elastic data centers". H200? "The world's most powerful GPU". L4? "The breakthrough universal accelerator".
Good luck figuring out who sells them in your country, or if they're in stock.
Oh, and remember to read the entire spec in excruciating detail. That $2700 L4 24GB is slower than an RTX 4090.
Think you'll launch something in the cloud? With Google Cloud half the cards are only available in some regions - and even if you're in a region that offers a given GPU, maybe there's a shortage and support tells you to just keep requesting GPUs repeatedly until you get lucky.
Yeah I went down the rabbit hole of trying to buy a GPU for local model training and execution about a year ago. My takeaway was “anything that starts with A is too expensive and often trades blows with a 3090 or 4090 anyway”. I ended up with a used 3090. 24GB of VRAM, which is more than some of the A series GPUs, and it was around $700 on eBay at the time. Now apparently they’re around a grand.
I had to upgrade to a 1000W PSU for the 3090 because the 3090s will occasionally voltage spike and trigger OCP on power supplies. TDP is supposed to be like 350W, but for a split second it might pull around 600-700W. Plenty of complaints about it on Reddit and elsewhere. It's a beast but it cooks my home office when I use it. C'est la vie.
Nvidia specifically indicates the VRAM size and whether the product is suitable, so you pick the ones with enough to satisfy your expected needs. What’s there to not understand?
* Is a given product their most advanced, or was it merely their most advanced when the marketing copy was written 5 years ago?
* Is the product actually available for purchase today, and at what price?
* In what form factors - Full height? Half height? Two slots? Three? SXM? Does it have a fan built in?
* What is the relative performance? Half the time the specs all quote different numbers. Why does one product quote 'single precision performance' and 'rt core performance' and 'tensor performance' while another quotes the 'tensor cores' and 'shader cores' and another quotes the 'FP64', 'FP32', 'FP16', 'BFLOAT16', 'TF32' and 'INT8'?
And don't imagine you're going to get away with ignoring those specs. A $2700 L4 24GB is slower than an RTX 4090, for example, because it's for power-efficient servers or something.
Most of this information is indicated on their website or online spec sheets?
Like I said, Nvidia themselves indicate whether it's suitable.
And if you're still confused, or doubt the accuracy, or don't want to spend any time reading spec sheets, I imagine the sales channel folks will be able to guarantee it in writing for a fee.
> Most of this information is indicated on their website or online spec sheets?
You'd think so, wouldn't you?
But nvidia is not that smart. They have decided that the GPUs should be split over at least three different pages, and those pages should be camouflaged.
I can only assume the marketing team are judged based on time spent on site or number of pages viewed, rather than on sales made.
When you go to the home page, point to products and click on "NVIDIA RTX / Quadro" you might expect to find the RTX 4090 and some Quadro products. You will in fact find neither - the RTX 4090 is under 'GeForce' and the 'Quadro' brand is no longer used.
Maybe on the home page you choose "NVIDIA RTX-Powered AI Workstations" - that'll give you a list of their workstation-suitable cards for ML, right? No, that page contains no products at all. The only option is to Find A Partner which links you to a bunch of partners - several of whom do not in fact sell AI Workstations.
Other partners sell AI workstations... without GPUs. As far as HP is concerned, $4000 only gets you the base model workstation, with 32GB of RAM and a 1TB hard disk. If you want GPUs with that, you'll need to call the sales team, who might perhaps deign to sell you one.
Or perhaps you're at nvidia's home page, and you're looking for data centre grade GPUs? For that product page, simply choose whether you mean the DGX, EGX, IGX, HGX, MGX or OVX platform?
But they make it very easy to find the keynote speech by His Excellency Omar Sultan Al Olama on the latest breakthroughs in AI. He does, in fairness, have a very appropriate surname.
This is such a bizarre rant - you're buying an AI GPU and are ranting because you need to understand what kind of workloads you need it for? And while also ranting that you won't pay the price of the obvious high-end models if you're too lazy to figure it out?
What, exactly, is your expectation here? If you want the simple Apple model where you don't need to look at the spec, do the Apple thing and pay up for the pricey GPU.
The Al Olama joke really made my day. I agree with what was said about the model lines and the confusing website. But perhaps only MS and IBM so far know how to do corporate sites…
Did you even finish reading my comment? The whole point of the last sentence is that you don't need to "seriously consider this problem of which GPU to buy" if you're willing to pay more for written guarantees through the sales channel.
It’s not even remotely that simple. Well, I guess it is if you have infinite money. For the rest of us, VRAM is just the tip of the iceberg. There’s what you can actually order and get shipped to you, how much that costs versus MSRP, whether to get one giant GPU or two, or to leave room in your computer/PSU budget for two down the line… it’s the Wild West. I think most of us end up with 3080s or 3090s mostly due to price/what’s available on the market.
There's a lot of chaos that stems from the crypto boom, COVID, the AI boom, the fact that NVidia is essentially a monopoly, some groups are making custom GPUs so they have more VRAM....
right. it's bizarre watching tech heads confused about price segmentation like it's some phantom force and not one of the glorious tools of monopolies and price gouging.
If the card's onboard memory is not enough, the card can use the PC's RAM over the PCIe connection - and then the PCIe bandwidth is very important. But the main reason for fewer lanes is to cut costs.
Yes, but it starts to crawl even on 16x anyway. My firsthand experience with LLMs is that there's little difference between VRAM-to-RAM spills and running on just RAM. Same for LoRA training. When I accidentally used up too much VRAM by e.g. running parallel upscales, it went from 1.3it/s to 30-40s/it and never recovered. NVidia even added a new setting in their control panel to disable CUDA RAM fallback, to work around unintentional slowdowns on low-VRAM cards, afaiu.
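If it saves anyone else that surprise: a rough sketch of checking headroom before loading a model, so it doesn't silently spill over PCIe into system RAM. torch.cuda.mem_get_info is the actual PyTorch call; the 2 GiB margin for activations/KV cache is just a number I picked, not any official guidance.

    import torch

    def fits_in_vram(model_bytes, margin_bytes=2 * 1024**3, device=0):
        """True if the weights plus a safety margin should fit in free VRAM."""
        free, total = torch.cuda.mem_get_info(device)   # both in bytes
        print(f"free: {free / 2**30:.1f} GiB of {total / 2**30:.1f} GiB")
        return model_bytes + margin_bytes <= free

    # e.g. a 7B model in fp16 is roughly 14 GB of weights
    if not fits_in_vram(14 * 1024**3):
        print("expect a crawl: part of the model will live in system RAM")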
> the modified GeForce RTX 2080 Ti will not face any Nvidia driver issues as the graphics card is seemingly supported and working without vBIOS modifications.
That seems very surprising. Bet Nvidia will push out an emergency update soon, and then you'll have bricked and unbricked cards.
Easy enough to get around - don't update your BIOS/drivers when they do. Same game that used to be played with pirated software and even iPhone updates for jailbreaking.
There aren’t really VBIOS updates for GPUs like there are for motherboards. They exist, but for the most part NVidia nerfs cards with driver updates. For a while, to mine Ethereum you just needed to use a specific GPU driver version. It really was that simple.
Aren't they already? That's still not much of a consequence. If anything happens, it'll be years from now, which is totally out of scope for quarterly profit management.
The dream of game graphics was murdered first by crypto and then by LLMs. Is there any hope for a time when the market for high-spec GPUs will actually be games? Or has that ship sailed forever? At least AI has some sort of real world utility I guess, but I still wish neither revolution had happened.
I think for the most part graphics have been "good enough" for a while, and since raytracing is the new hotness that's what games are working on improving rather than pure polygon count or resolution. The market for gamers spending insane amounts of money on hardware is kind of disappearing as well, I know so many people who just buy "upgrades" to their existing hardware to play newer games, but they certainly aren't buying 4090s.
It's a similar issue with EVs, the market they were trying to sell to didn't really exist. They wanted people with enough money to buy an expensive car, but also people who are not expecting oodles of luxury features which would normally come on a car that expensive. So now they are having to lower their EV prices in general because inflation kind of murdered any market they may have had. Add on the EV growing pains of battery tech still being susceptible to reduced range in the cold, increased tire wear, and defects...I don't know many people who will take such a risk for such a price.
Tbh, graphics is not what games lack today. Personally I was fine with what we had 7-10 years ago graphics-wise, and most modern cards hit FPS limits for those games, apart from a few pathological cases. But if you're into graphics, I agree it sucks, and probably forever now. Otoh, those who were into graphics brought us here, didn't they, so maybe it's karma ;). Imagine that the great filter is not supernovae nor nuclear war, but the fact that a species may happen to have good enough eyesight to glimpse into the Pandora's box of computing.
I really do like fancy graphics in games. Once you've seen the good ones, it's pretty off-putting when something looks like it was made 10 years ago (recent example: Starfield).
I think some games, like the modern iterations of CP2077, are just mind-blowing - all gameplay aside. But it's a shame that it costs basically north of $1000 to play it the way its developers hope you will.
Ugh. Cyberpunk I don't like, specifically for its neon darkness and those RTX-enabled monochrome colors and ambients. Same for modern young streamers who like the "pink from one side, blue from the other side" lighting. Also the speech is all f on f on f. Recently I watched an SF/CP comparison video that was supposed to show the superiority of the latter, but it left a very mixed aftertaste. Yes, CP is much more "technological" and "produced", but SF is at least not a bunch of teens who can't change a lightbulb and learned the word "fuck" yesterday.
That's always been the problem... it's hard to justify the expense of making a game few people have the hardware to run.
So instead most games just target whatever the current consoles are, which, aside from right after a new release, is usually relatively old hardware by PC standards - which then makes it hard for PC gamers to justify spending that $1000+ when few games utilize it.
Personally I spend a ton on gaming hardware so I can (ab)use mods that are horribly unoptimized, buggy, and require about as much effort to install and use as would be required to become competent in another programming language.
Minecraft and Skyrim have been my two constant companions for years now.
It depends. If it feels empty or walled off with loading screens then it doesn’t feel large just because it is big in extent. The experienced “size” of a world feels more related to density than how long it takes to traverse. Starfield vs CP2077 feels like an example here again.
Hmm, not sure what you mean. The consumer cards from AMD and Nvidia are still gaming-first, and games utilize new features like ray tracing and DLSS. Speaking as a lifelong PC gamer and current Twitch addict, the industry still seems very healthy.
Games are targeting 3090 or 4090 for decent settings and frame rates. My 2070S is tired as hell. The mid range jumped from $150 to $500 in a couple of years. The enthusiast range jumped from $250 to $1500. That’s painful.
It's happened before. Silly as it is, AMD once decided that a new 8GB card needed a 4GB variant at the last minute before launch, so they shipped the first batch of "4GB" cards with 8GB installed and half of it just disabled in the VBIOS. You could easily flash it to unlock the full 8GB - the one time you really could download more RAM.
Those were the Phenom Triple Core chips you are thinking of. What the pencil mod did was enable the 4th core on the Triple Core chips. However, the 4th Core was likely a factory rejected core for some reason or another, so enabling it could lead to instability.
> However, the 4th Core was likely a factory rejected core for some reason or another, so enabling it could lead to instability
It may have been a defective core, but when they make these cut-down SKUs they need to meet a quota of units regardless of whether that many salvageable defective chips roll off the line, so it's not uncommon for them to disable perfectly functional hardware. Especially as the process matures and yields improve.
I remember doing exactly that with one of my Athlons. I don't remember if it was an Athlon XP or Athlon 64, but they'd removed a jumper on the top and you just needed a 2B pencil to join the pads to get a free upgrade.
I did that with a Phenom II X2 555. The 3rd core was stable up to around 3.8GHz if I remember correctly, whereas the 2 enabled by default were fine up to at least 4.2GHz.
The signaling requirements are way too tight for pluggable VRAM to ever be a thing. If anything we're headed in the other direction, with CPUs losing pluggable memory in order to achieve tighter timings like GPUs do, Apple is already doing it and Intel is set to follow.
Exactly. There's a reason these chips are always surrounding the processor (since the 2000s) and why we haven't seen GDDR-based pluggable memory modules.
For this same reason (timing precision) you see that soldered DDR5 memory often reaches way higher speeds than what's available in DIMM or SODIMM form.
We're already halfway into a heterogeneous future, with chiplets[1] and mixed cores[2][3] etc. Could we expand this to memory, having some soldered (on-chip?) high-speed memory, and then slots for additional slower, yet faster than the alternatives, DIMMs?
Or would the cost of the extra complexity of the memory controller likely not be worth it ever?
> Could we expand this to memory, having some soldered (on-chip?) high-speed memory, and then slots for additional slower, yet faster than the alternatives, DIMMs?
Intel's already doing that with Xeon Max, it has both onboard HBM and an outboard DDR5 interface. It can be configured to run entirely from HBM with no DDR5 installed at all, or use the HBM as a huge cache in front of the DDR5, or to map the HBM and DDR5 into different memory regions to let software decide how to use each. I don't think there's been any indication of that approach filtering down to consumer architectures though, Intel is talking about doing RAM-on-package there but without any outboard memory interface alongside it.
Obviously high-end consumer CPUs already have about 30MB of on-chip memory, with server CPUs reaching a solid 300MB. We just prefer to call it L2 and L3 cache. If we add more memory in a chiplet format I suspect mainstream CPUs would simply expose (or rather hide) it as L3 or L4 cache.
Most software isn't even NUMA aware, and would completely fail to take advantage of a tiered memory hierarchy if it was given the option. But if we make the fast memory a big cache and let the CPU worry about it it's a "cheap" win.
Though there is the Xeon Phi, which has about 16GB of on-package memory that can either be configured as cache or as "scratchpad" memory. But of course that's not meant for general-purpose software.
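To make the "NUMA aware" point a bit more concrete: when a box exposes fast and slow memory as separate regions, they generally show up as distinct NUMA nodes, and it's on the software to notice. A rough sketch of what that looks like on Linux, just reading the standard sysfs files (the tiering interpretation is mine, nothing Intel-specific):

    from pathlib import Path

    # Each NUMA node appears as /sys/devices/system/node/nodeN on Linux.
    for node in sorted(Path("/sys/devices/system/node").glob("node[0-9]*")):
        # First line of meminfo looks like: "Node 0 MemTotal:  263673856 kB"
        first = (node / "meminfo").read_text().splitlines()[0]
        total_kb = int(first.split()[3])
        print(f"{node.name}: {total_kb / 2**20:.1f} GiB")

    # Software that wants the fast tier would then bind allocations to that node
    # (numactl --membind, libnuma, etc.) instead of letting the kernel decide.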
I'm looking forward to the performance, but not looking forward to higher capacity RAM being segmented off to overpriced "professional" SKUs like high VRAM capacity is on GPUs. Currently you can run up to 192GB RAM on a consumer CPU platform but I doubt RAM-on-package consumer parts will scale that high.
Yeah, manufacturers love this evolution because it means everyone who wants or needs a lot of memory will be forced to buy for their projected memory needs over the whole life cycle of the product on day one, and the only place they can get it is at inflated prices from the vendor.
I wonder how they will do this in the workstation and server space, I don't really see how they can do away with socketed CPUs.
I wonder if we will go back to slotted CPUs, with a SOM-style board carrying the CPU and memory being plugged into a motherboard/chassis that's really just an I/O backplane. What would multi-CPU communication look like then?
I guess we already have memory being pinned to a NUMA node and connecting to others via a vendor specific interconnect, so maybe it's not that strange and different from today.
> I wonder how they will do this in the workstation and server space, I don't really see how they can do away with socketed CPUs.
I'm guessing the endgame will be consumer parts all being RAM-on-package with no external memory interface, and workstation/server parts will take a hybrid approach like Intel is already doing with the Xeon Max chips which have 64GB HBM on the package and an external DDR5 interface supporting terabytes of slower bulk memory.
I haven't upgraded this CPU yet, as it's still too new, but my last motherboard got two new CPUs, and the one before that 2-3? (Maybe more - it was a while ago. Thanks, AMD.)
Given that AMD has been releasing AM4 CPUs since 2016, I think it's reasonable to assume that many of those who know how to build computers in the first place have upgraded their CPU. Why switch the whole motherboard/CPU combination when you can just plug in a better CPU?
Well, if you've been using Intel platforms you do because Intel obsoletes the chipsets at a rapid pace so there often isn't anything appreciably better to upgrade to on the platform.
One could imagine a two-deck PCB, where you have another PCB underneath the main one for additional close-in memory chip locations, with a high-density vertical interconnect.
EU companies selling in the US have to follow US regulations, US companies selling in the EU have to follow EU regulations. And in both cases, only for activities in the respective market.
For example, Apple would be free to sell Lightning-port iPhones in the US and USB-C iPhones in the EU. Or not make USB-C iPhones at all and not sell any iPhones in the EU, if they don't want to be "leeched". Same for Nvidia and this hypothetical regulation for swappable RAM (which is never going to happen, because it isn't technically viable).
If you do business with citizens of another country, in that country, you should expect to have to follow that country's regulations.
Your position is literally American exceptionalism. Do you think that because of NATO that US companies should be able to ignore EU consumer protection laws?
This is not a new kind of mod; it has existed for many years now and has been demonstrated on 30-series GPUs as well (see https://news.ycombinator.com/item?id=26389996 from 3+ years ago).
I am still unsure why doubling the VRAM works so seamlessly, but I wonder if you need modifications at the driver/vBIOS level to fully utilize the new capacity.
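If anyone picks one of these cards up, a quick way to sanity-check what the driver actually reports is NVIDIA's NVML bindings (the nvidia-ml-py / pynvml package; the calls below are the standard NVML ones). It won't tell you whether the extra chips are stable, just whether the full 22GB and the expected VBIOS show up:

    import pynvml  # pip install nvidia-ml-py

    pynvml.nvmlInit()
    handle = pynvml.nvmlDeviceGetHandleByIndex(0)

    name = pynvml.nvmlDeviceGetName(handle)          # str (bytes on older bindings)
    mem = pynvml.nvmlDeviceGetMemoryInfo(handle)     # .total / .free / .used in bytes
    vbios = pynvml.nvmlDeviceGetVbiosVersion(handle)

    print(f"{name}: {mem.total / 2**30:.1f} GiB VRAM, VBIOS {vbios}")
    pynvml.nvmlShutdown()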
Not sure why this comment got nuked, but I vouched for it. Perhaps automatically falsely detected as spam? Anyways the first and last link show the modification. Very cool.