$141 for 128GB of RAM + 2x 2TB NVMe SSDs, and a very fine 16-core chip with gobs of cache. This is a pretty solid offering.
If you get a 1-year upfront reserved instance (lots of caveats there), an AWS m5ad.2xlarge is $176/mo: 32GiB, 8 vCPUs, and a 300GB NVMe SSD on a 2019-era Epyc. That's the best offer I could find from AWS near that price point. Having 4x the RAM, way more SSD space, and 2x the cores, on a dedicated machine, sounds great. And it makes sense as an offering: it's reasonable and attainable.
Hopefully these come to more than the one data center! :) Hopefully we see more competitors offering great stuff like this. Wild that this is easily a sub-$2000 box: $700 CPU, $200 RAM, and a couple hundred for SSD, mobo, and case.
I love Hetzner, especially since you can have private networks between dedicated and cloud (VPS) servers; you can run endlessly scaling Kubernetes clusters for a fraction of the price you'd pay elsewhere.
Provisioning is relatively fast and everything has an API.
If you are brave enough you can even build your own server and have them colocate it for you, making advanced machine-learning clusters possible as well (albeit with more upfront investment, of course).
I have seen startups burn through their $250k cloud credits in three months because of unattended scaling; I reckon on Hetzner you would have a couple of years of headroom =)
> I have seen startups burn through their $250k cloud credits in three months because of unattended scaling; I reckon on Hetzner you would have a couple of years of headroom =)
But then how will you get invited to the next AWS conference to talk about your (self-inflicted) problems and then write about it on your company’s “engineering” blog?
I really would like to use Hetzner, but their ToS [1] forbid any cryptocurrency-related usage. I only want to archive block transaction data (think a block explorer). I don't know why they are so aggressive about it; I would understand a PoW and PoS ban, but not the whole industry.
I would love to have a dedicated server to run custom LLM models like LLaMA or Alpaca, but with dedicated context and memory (like knowing my codebase :).
I'm not an AI expert yet, but:
1. Is this processor suitable for that?
2. Is it feasible to customize one of the LLM models directly on that machine, or does retraining the model require more power?
Not really. If you're going for performance, you're going to be using NUMA-aware code, and your compute is going to stay on a single CCD (if you set up your NUMA domains correctly). Then it's up to you to decide whether you benefit more from cache or from frequency.
EDIT: AMD CPUs with nonuniform cache (such as this one), with the right motherboard, support L3 as NUMA, which means that each CCX is exposed as a NUMA domain.
But I don't have such a system, nor have I played with the BIOS settings of that chip. It only has one memory controller, though, so I doubt there's a NUMA setting. Correct me if I'm wrong, of course.
Memory access goes through the same IOD (so access to main memory is fairly uniform), but there are two different CCDs, i.e. two different NUMA zones. Communication between them is slower than communication within a single CCD. And one CCD has vastly more L3 cache than the other.
Epyc has a NUMA-node-per-CCX mode, but I don't know if the Ryzen "BIOS" includes any NUMA support. That also doesn't help if you're running a uniform workload.
There are consumer motherboards which do expose this setting, so I would expect it to also be available here.
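To make the CCX-pinning idea concrete, here's a minimal sketch. It assumes a 7950X3D-style layout with two 8-core CCDs and SMT, where the kernel enumerates each CCD's hardware threads contiguously (0-15 on CCD0, 16-31 on CCD1) — the real numbering depends on your kernel and BIOS, so verify it with `lscpu -e` or `numactl --hardware` first:

```python
import os

def ccd_cpu_set(ccd_index, cores_per_ccd=8, smt=2):
    """Return the hardware-thread IDs belonging to one CCD.

    Assumes contiguous enumeration: CCD0 -> 0..15, CCD1 -> 16..31
    on a 2-CCD, SMT-2 part. Check against `lscpu -e` on the actual
    machine before relying on this.
    """
    n = cores_per_ccd * smt
    start = ccd_index * n
    return set(range(start, start + n))

# Pin the current process to the cache-heavy CCD (CCD0 on a 7950X3D):
# os.sched_setaffinity(0, ccd_cpu_set(0))   # Linux-only
```

The `sched_setaffinity` call is commented out because it's Linux-specific and you'd only want it on the target box; `numactl --cpunodebind` achieves the same thing from the shell if the BIOS exposes each CCX as a NUMA node.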
There are very few workloads where you will have less per-core performance on a 7950X3D than, say, an EPYC CPU. Even those relatively lower clocks are pretty high by server standards.
It seems poorly suited to a lot of workloads absent some scheduler improvements. Take the workaround for gaming: park the CCD without the 3D cache. Well, at that point, why not just use a 7800X3D?
I don't think this inevitably follows. It means, for a web server, some requests will be faster than others, but some faster is probably still better than none faster. If you have a very specific need for consistency, then slower is better, but that's rather niche.
Do companies like Google and Facebook replace 100% of their servers all at once? Or do they upgrade in stages, allowing some requests to be processed faster than others?
For most workloads the 7950X3D is slower and costs more than the 7950X. The performance variability is just icing on the cake. It's really a gaming processor.
IMO predictability — that a given request to a single server should have the same response time — is very important for finding performance regressions. I'm still unsure how others using Hetzner deal with this: https://news.ycombinator.com/item?id=35165764
Process #0 runs on HardwareThread #0 through HardwareThread #15 inclusive. These will run faster because of the big L3 cache (96MB IIRC).
Process #1 runs on HardwareThread #16 through HardwareThread #31. These will run slower because they don't have the extra cache (only 32MB).
The load balancer checks the queue depth of Process #0 vs Process #1 and distributes as appropriate. Done. You probably should be doing this under NUMA conditions anyway (which bring different memory-access speeds and different PCIe access speeds/asymmetries: NUMA #0 will talk to PCIe #0 faster, and if PCIe #0 is Ethernet, then NUMA #1/#2/#3 will be penalized), but it's no different for asymmetric cores.
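A minimal sketch of that queue-depth dispatch — the pool names and the two-pool split mirror the comment's example, but everything here is an illustration, not anyone's actual setup:

```python
from collections import deque

class Pool:
    """One worker pool, e.g. the processes pinned to one CCD."""
    def __init__(self, name):
        self.name = name
        self.queue = deque()

    def submit(self, request):
        self.queue.append(request)

def dispatch(request, pools):
    # Send each request to whichever pool currently has the shortest
    # queue. The faster (big-L3) pool drains its queue quicker, so
    # over time it naturally absorbs more of the load without the
    # balancer needing to know anything about cache sizes.
    target = min(pools, key=lambda p: len(p.queue))
    target.submit(request)
    return target

fast = Pool("ccd0-96MB-L3")   # threads 0-15, big cache
slow = Pool("ccd1-32MB-L3")   # threads 16-31, small cache

for i in range(4):
    dispatch(f"req-{i}", [fast, slow])
```

With nothing draining, the requests just alternate; the asymmetry only shows up once the fast pool actually completes work sooner and its queue shortens.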
Yeah, this gets to the crux of our disagreement. People shouldn't do this IMO and they shouldn't buy NUMA either (or turn it off in the case of Epyc). These optimizations are a waste of time when the alternative is to buy hardware that simply doesn't require them.
> they shouldn't buy NUMA either (or turn it off in the case of Epyc). These optimizations are a waste of time when the alternative is to buy hardware that simply doesn't require them.
Turning off NUMA reporting doesn't make NUMA go away. Whether you can buy hardware that doesn't need this kind of optimization really depends on your needs and budget. The 7900X3D and 7950X3D don't have a very large niche, IMHO; but if you were thinking of a 7900X for a VM system, and you have some loads that would benefit from more cache and some that wouldn't, it could make sense. Depending on your needs, you might be better served with a 7950X, or maybe two 7800X3D systems, or ??? — there are lots of options.

I think if you're going with Epyc, you're needing or wanting lots of cores, lots of memory channels/capacity, and/or lots of PCIe; if you need that in a single machine, you're just going to have to deal with the NUMA fallout that results, because it's not like you have a choice. Sure, if you can manage to split your load across multiple small UMA machines, that'll be easier to manage, but then you have to manage more machines and do balancing at different levels. It's all tradeoffs.
> These optimizations are a waste of time when the alternative is to buy hardware that simply doesn't require them.
It's cheaper to own and maintain 50 servers than 100 servers. Running two processes per server is just cheaper.
You have half the motherboards, and your RAM, SSDs/hard drives, and Ethernet ports are consolidated. This leads to lower power draw and more efficiency (it's easier for threads to migrate over when possible).
I tried to sign up for a basic VPS on Hetzner, and they said my information was suspect, even though I had already purchased VPSes from these companies:
Amazon Lightsail
Atlantic.Net
DigitalOcean
Google Compute Engine
Linode Shared
Vultr
Kamatera
LunaNode
UpCloud
I tried to give them the benefit of the doubt, but once they asked for a photo of my driver's license, I gave up. No thanks, I'll just stick with DigitalOcean.
Due to their really low prices, and the fact that they don't force you to prepay for your VMs, they get a lot of spammers/scammers; so while this is unfortunate, it's somewhat understandable.
I've been using them for a few years already and only have good things to say.