AMD EPYC processors come to Google, and to Google Cloud (cloud.google.com)
197 points by jhealy 66 days ago | 84 comments

Wow, first photo is a 2000x1489px PNG (4.5MB) and it is scaled to just 451x335px. (not even clickable to view at full size, it's just for page decoration)

Nice web design, Google!

There's also an animated GIF (Turn_it_to_11.gif) that weighs 6.7MB and is not even visible until you click "Show related articles" at the bottom. And even if you do that, it's only displayed in 164x82px instead of the original 2880x1200px.

Looks like Google's CDN works so well that creating resized versions to save bandwidth is not worth it. :)

Sounds like the automatic compression functionality on their CDN isn't exactly working the way it's supposed to. I know AWS CloudFront has something like this, which I use in conjunction with serving website images from S3.

> Looks like Google's CDN works so well that creating resized versions to save bandwidth is not worth it. :)

Worth it to them, maybe; what about the poor reader?

Ouch, good catch!

The page loads 41 requests and 12.88 MB of data, with the page taking 2+ seconds to properly load the content (1.09s according to Chrome dev tools, but this is just the initial DOM load).

For a simple webpage with white background and dark text.

Not only that, all those MBs are there just to help deliver a 1800 byte text-payload. That's literally a 2500x bloat-factor.

But hey, AMP is because Google care deeply about quick web-pages. No alternate agenda here, no siree.

Google is a mind-bogglingly massive company. You can't infer anything about its stance on perf from a blog post, except perhaps that the people who care about web perf aren't babysitting the blog (which is probably using a third-party CMS).

Incidentally, I only see a 260K image when I dev-tools that page. I can't reproduce the 4.5MB download.

And resized to that size, saved as JPEG with 90% quality and optimized results in 25 KB. So 180x size reduction with almost no quality loss (and you can always link original image if someone wants details).
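For anyone checking the arithmetic, the displayed dimensions follow from simple aspect-ratio math (a sketch; a real pipeline would hand this to an image library such as Pillow's `Image.thumbnail` rather than doing it by hand):

```python
def fit_width(w, h, target_w):
    # Scale (w, h) down to target_w pixels wide, preserving aspect ratio.
    return target_w, h * target_w // w

# The blog's hero image: 2000x1489 PNG, displayed at only 451px wide.
print(fit_width(2000, 1489, 451))  # -> (451, 335)
```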

Yikes. Add this to the long list of Google not following their own recommendations.

Or maybe I've just revealed Google's secret new 5xdpi range of phones :-)

It's truly amazing how bloated and slow most of Google's products have become. You can tell that they have hired literally zero people who came from poor places with slow internet.

I think they just get too caught up in their tooling and silly over-engineering and never actually think about the end product.

Quanta Magazine <https://www.quantamagazine.org/> used to do similar numbnuttery, with 5MB animated crap being typical. They seem to have caught on - the current front-page banner is 'only' 1MB, i.e. only ~20% of Shakespeare's entire life's work, uncompressed.

But we need AMP! Faster, faster page loads, loads.

With 200 CPU threads (128 CPU cores), I think most startups can run a complex microservice environment on a single machine.

Kubernetes requires 3 nodes to start, but still, this greatly reduces the need to have a lot of separate machines.

256 threads per machine this year. 512 threads per machine 2 years from now? And then hopefully 1024 threads per machine 4-5 years from now? That would be really fun.

(I will take a laptop in 4 years with just a lowly 64 cores please, leaving the heavy iron for the cloud machines.)

I want Moore's law back, but in parallel form. The years since 2004 with x86 machines have been quite boring from a CPU performance increase perspective.

> With 200 CPU threads (128 CPU cores), I think most startups can run a complex microservice environment on a single machine.

Why would you want to? Your whole environment will be down when the machine, or some component thereof, fails or is rotated out for maintenance.

I think this is more of a benefit for cloud providers in that they can pack more disparate customer workloads on to a single machine.

So have two of them with synchronous replication and fail over between them. You have to think about dying or suddenly slow machines even in a microservices architecture.

The advantage of such a machine is that if it starts to die, you fail over everything atomically and then repair/replace the backup. You don't need to think about what happens if one component is on a dead machine and the others aren't (does your load balancer handle that well, given machines often "die" in ways that just make them slow rather than totally failed?)

The big win though is if you get rid of the microservices and run the whole thing in one big process. No complex RPC failures, obscure HTTP/REST attacks like https://portswigger.net/blog/http-desync-attacks-request-smu... and so on.

That might sound mad but modern JVMs can run lots of languages fast, and have ultra-low-pause GCs that can use terabytes of heap. Like less than one msec low. Many, many businesses fit into these really high end machines with a giant JVM.
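As a rough illustration of the "one giant JVM" setup (the flag names are real; the heap size and jar name are hypothetical), a modern JDK can be pointed at a multi-terabyte heap with a low-pause collector like ZGC:

```
# JDK 15+: ZGC is production-ready; on JDK 11-14 you also need
# -XX:+UnlockExperimentalVMOptions. ZGC supports heaps up to 16TB.
java -XX:+UseZGC -Xms4t -Xmx4t -jar monolith.jar
```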

I've written about the possibility of a swing back to big iron design here:


And a look at modern Java GCs - GC being historically the bottleneck to really large single processes:


It'd be great as a CI machine for doing full end-to-end integration testing.

> Although Kubernetes requires 3 to start

It doesn't actually, and if you're running on GKE then the master is a managed service - you can have a single-node cluster (I do this occasionally for testing).

Said startup might want to consider going with smaller machines in separate availability zones, though ;-)

(Disclaimer: I work in Google Cloud)

2 is a decent number for GKE. At least some hope of staying up if a node falls over.

3 is more realistic though. If you run at 66% utilization, you can deal with a single node down.

This greatly reduces the need to have complex microservice environment.

You can run k8s on a single machine. There is even an easy way to do so, k3s: https://k3s.io/

All the hassle of sharding out your workflow for horizontal scaling, with the downsides and expense of vertical scaling.

Just buy more cheap 4-8c servers.

Not a hugely surprising development given what was discussed yesterday:


If they bring a 64-core part to the desktop, Intel will have no market for years.

Never underestimate the entrenched incumbent.

I know plenty of people who know the Intel name and would consider AMD to be some kind of cheap knock-off (non-techs, obviously). And obviously desktop/laptop manufacturers are going to have some sweetheart deals with Intel.

As far as I understand it, AMD doesn't natively support thunderbolt and that's an emerging standard that people really like.

Intel graciously allowed the USB consortium to use Thunderbolt 3 tech for USB 4. There will be some minor differences, but this differentiator in Intel's favor should fade over time.

Here it is briefly explained: https://www.youtube.com/watch?v=Q0W7fHJMnyg

Never underestimate non-tech-savvy customers.

Given the choice between two notebooks, they will pick the one with 16 processors, because bigger numbers mean better.

The biasing isn't just non-techs. I've hand-built every PC I've ever owned for over 20 years now, in addition to building many for family and friends, and I still have to do a bit of mental gymnastics to get over my biases for certain manufacturers. When someone like Intel has consistently delivered better-performing parts for as long as they have, when AMD throws up a better part it almost feels like it's a trick. Hell, even now on the consumer desktop parts it looks like Intel is still edging out the newest AMD offerings on single-threaded performance - just not at a good price. People, even smart, informed people, still try to avoid decisions when it seems easy to do so. Reaching for an Intel processor has meant getting the faster part for so long that it's muscle memory for people. That's going to take a while to undo.

But a counter argument is that those people aren't the market for a 64-core desktop.

You are right, but that's not a reason not to bring it. Enthusiast-level hardware helps sell commodity hardware. NVIDIA releases RTX-class cards so that their cheaper commodity cards sell. They don't make money on the high end. The same is true for processors.

You only have to look at Nvidia’s or Intel’s balance sheet to know that’s patently false.

Perhaps you just have to use your common sense: there are only so many enthusiasts, but there are many more gamers who cannot afford a 1080 Ti. Enthusiasts make up a tiny fraction of the entire GPU and CPU market; if you are insisting that the majority of sales for NVIDIA or Intel comes from that segment of the market, then I think you are wrong.

On the other hand, they can charge extreme margins on high-end hardware, so one high-end device might be more profitable than 100 low-end devices.

Just have a look at the 10-Ks. Nvidia's: "Gross margin for fiscal year 2019 was 61.2%, compared with 59.9% a year earlier, which reflects our continued shift toward higher-value platforms". Datacentre is their biggest growth area today, and they have traditionally created specific high-margin products to extract value in areas with very limited competition.

The same thing is true for Intel when they don't have competition. They are the company they are today due to insane gross margin on Xeon and the explosive growth of cloud computing. DCG has been Intel's top performing BU for ages until recently. Expect the same story from AMD as it eats Intel's lunch in that market. High margin semiconductors are money making machines.

At least in the consumer market in the UK this is exactly what AMD is viewed as by non-techs.

If you go to an average big-box store and look for laptops, AMD-based systems can start at as little as £250, whereas you can't get a mobile i3-based system anywhere close to that.

When these pathetic CPUs are teamed with slow disks and a pile of bloatware it makes them seem cheap and awful.

> As far as I understand it, AMD doesn't natively support thunderbolt and that's an emerging standard that people really like.

Thunderbolt has been an "emerging standard that people really like" for almost a decade, with virtually no installed base outside select Apple products.

My wife's couple years old Lenovo Yoga 720 has a thunderbolt 3 port (type-C variety). I'm sure many other laptops have it.

> As far as I understand it, AMD doesn't natively support thunderbolt and that's an emerging standard that people really like.

Most of the AsRock X570 motherboards support Thunderbolt currently, afaik.

This might sound like a silly question, but what would one use a 64-core CPU on a desktop machine for? Or more precisely, in what situations is a 32-core Threadripper 2 insufficient?

I can answer this! So my work computers are 28, 32, and 96 physical cores each. And honestly, none of those are enough. I'd ideally have a very high clock speed 128-core desktop with a 1,000-10,000-core compute cluster available at 10 Gb/s internet speeds.

Now back to your question on why these are needed: parallelized simulations. I do computational fluid dynamics (CFD) for designing, fixing, troubleshooting, and optimizing processes and products. These simulations solve large systems of equations that need many cores for meshing and solving, and then we need high CPU/GPU core counts just to handle and process the data. In my case at least, the average industrial/manufacturing piece of equipment needs about 1000 cores to recreate as a digital twin due to the amount of multiphysics, complicated geometry, etc.

If you do data science/ML with CPU-computed models (basically everything except deep networks), you get a ~2x speedup when doing e.g. n-fold cross-validation (n >= 64) or hyperparameter optimisation via grid search.
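The reason this parallelizes so well is that every fold or parameter combination is independent, so the work fans out across however many cores you have. A minimal sketch (the objective here is a toy stand-in for "train a model and cross-validate it"; nothing below is a real ML library call):

```python
from concurrent.futures import ProcessPoolExecutor
from itertools import product

def score(params):
    # Stand-in for "fit a model with these hyperparameters and score it
    # via cross-validation" -- a real version would call your ML library.
    c, gamma = params
    return -((c - 1.0) ** 2 + (gamma - 0.1) ** 2)  # toy objective

def grid_search(cs, gammas, workers=8):
    # Each (C, gamma) combination is evaluated in its own process,
    # so throughput scales nearly linearly with core count.
    grid = list(product(cs, gammas))
    with ProcessPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(score, grid))
    return max(zip(scores, grid))

best_score, best_params = grid_search([0.5, 1.0, 2.0], [0.01, 0.1, 1.0])
print(best_params)  # -> (1.0, 0.1)
```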

But can it run Crysis?

I don’t think even enthusiasts can fully engage a 64-core device with any meaningful use for which a 32-core wouldn’t suffice.

This is a server chip that excels at virtualization.

Then you are not the target. There are many scientific workloads that can actually benefit from shared-memory multi-core processors like this; they do not scale properly in a cluster. Data science is a very simple and obvious target. Developers can also benefit a lot. Have you ever compiled or profiled a very large project of mixed C++ and Fortran (plus, for some reason, Python)? This is a godsend. I am currently working on a single node with 2x28 cores and I still need more. For licensing reasons I cannot use more than one node for my task.

I think most of the people finding themselves here would be in the “I wait on my computer several times a day” category. Data science and local compilation will see benefits for any amount of core scaling. One could argue all of this could be done on a server using a remote terminal, but that isn’t how many workflows are set up.

Try compiling Chromium or LLVM and tell me you don't want more than 32 cores.

Also there are plenty of embarrassingly parallel operations like 3D rendering, video editing, password cracking, and so on that would benefit. OK, usually you can use GPUs for that sort of thing, but not always.

Tell that to someone who is running 112 threads at 100% doing reinforcement learning... Overall, machine learning enthusiasts would eat any core they could get. With 64 cores, Threadripper might even get close to 5 TFLOPS and act like an older GPU for deep learning training.

I could definitely fully engage a 64 core CPU for fuzzing.

What you say is logical, but... have you opened your web browser's task manager recently?

3D rendering and video encoding need every bit of power they can get.

I don't know about others but my lightmapper will certainly benefit :-P. Perhaps compilation times, for some stuff.

Chrome, Google web properties, gmail javascript, dev perk. What kind of system do you think a developer making this

>first photo is a 2000x1489px PNG (4.5MB) and it is scaled to just 451x335px.

had on his desk?

Marketing copy is usually not written by developers.

Compilation of C++ code is one workload. Just make sure you also have a lot of RAM!

Isn't C++ compilation single-threaded and linking multi-threaded? I imagine with pre-compiled headers 64 cores would be massive overkill. I'm not sure even my 2700X isn't overkill already.

C++ compilation is massively multiprocess. Linking is mostly single-process but the gold (for little benefit) and lld (for good benefit) linkers support multithreading.

I am not sure if any C++ compiler can actually compile a single file in parallel, but usually a project has many small-to-large files, so "make -j" or "ninja" can build them in parallel - it is quite an embarrassingly parallel task (but very memory-intensive too).

C++ compilers are normally single-threaded, so you launch as many instances of the compiler as you can and have them compile different files in parallel.

If your code base has 1000+ C/C++ files, each of them needs to be compiled into a .o file, and all of that can be done in parallel.
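The fan-out that `make -j` does can be sketched in a few lines (the compiler command and file names are illustrative; this is not a real build system, and the final link step would still be mostly serial):

```python
from concurrent.futures import ProcessPoolExecutor
import subprocess

def compile_cmd(src):
    # One compiler invocation per translation unit.
    return ["g++", "-c", src, "-o", src.rsplit(".", 1)[0] + ".o"]

def compile_one(src):
    subprocess.run(compile_cmd(src), check=True)

def build(sources, jobs=64):
    # Every .cpp compiles independently, so one compiler process per core.
    with ProcessPoolExecutor(max_workers=jobs) as pool:
        list(pool.map(compile_one, sources))
```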

Run dozens of parallel VMs for Selenium tests. Honestly, I could utilize anything the industry could offer as long as the price was affordable.

Aside from that let me just remind that pretty much everyone in here is an edge case user. We're not the norm so my guess is that your question should be addressed to the general public. In that context I doubt anyone would need such processing power. It's no surprise that Apple is investing heavily in the iPad Pro product line. Most users could be fine with just a tablet.

The question is, what workloads do people have that they'll run often enough and/or profitably enough to warrant those 64 cores sitting around idle when they're not using them.

This is the advantage of a cloud host. Sure you pay a premium, but you're paying not to pay for it when you're not using it (and networking and power and updates and security etc.).

(Disclaimer: I work on Google Cloud)

Most cloud hosts are painfully expensive. If you don't value your own time at market value (as is typical and reasonable for hobby time; it isn't really exchangeable), then we're comparing an ongoing lease to just the hardware cost.

It doesn't take much usage for those lines to cross.

Oh there's definitely a point at which you save money buying the hardware and running it yourself. But especially for hobby usage my experience is that's often further out than real usage. YMMV.

I think I'd save money by renting a RPi for even $20/day, for instance. :)

Uses right now are fairly limited. However, advancing technology has a tendency to open up new possibilities unthinkable before.

Run Slack of course.

Laughed harder than I should have at this. Thanks for the morning chuckle.

Open 10+ tabs in Chrome.


“640k should be enough for anyone”?

I was not saying 16 or 32 cores should be "enough for anyone", I was just curious about the specific workloads people might want such a beast for. ;-)

Looking at the context, it is easy to expect that all available cores will inevitably be used by newer, more dynamic workloads. The same thing applied in the era of the "640k" quote, and the same effect will be noticed now.

Athlon 64 didn't kill Intel.

Don't underestimate a King like Intel that's backed into a corner. Lots of dirty tricks to play and lots of time to come back.

What is the pricing? Is it cheaper per core-hour than Intel? It should be.

Also how many machines will they have? Will it be a token amount in a single data center or will there be a good chunk of these around in multiple data centers?


>Intel can cut its prices, to be sure. Beyond that, it has limited maneuverability. Ice Lake servers will not arrive for another year. Pricing on these cores is simply amazing, with a top-end Epyc 7742 selling for just $6950, or roughly $108 per core. An Intel Xeon Platinum 8280 has a list price of over $10,000 for a 28-core chip, just to put that in perspective. If you want a 32-core part, the Epyc 7502 packs 32 cores, 64 threads, higher IPC, and an additional 300MHz of frequency (2.5GHz base, versus 2.2GHz) for $2600 as opposed to the old price of $4200 for the 7601. AMD doesn’t segment its products the way Intel does, which means you get the full benefits of buying an Epyc part in terms of PCIe lanes and additional features. AMD also supports up to 4TB of RAM per socket. Intel tops out at 2TB per socket, and slaps a price premium on that level of RAM support.
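The per-core arithmetic behind those list prices checks out (a quick sanity check using the figures quoted above, with $10,000 as a floor for the Xeon's "over $10,000" list price):

```python
def per_core(list_price, cores):
    return list_price / cores

# Epyc 7742: 64 cores at $6,950; Xeon Platinum 8280: 28 cores at >$10,000.
print(round(per_core(6950, 64)))   # -> 109, i.e. roughly $108 per core
print(round(per_core(10000, 28)))  # -> 357 per core, over 3x the price
```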

Why should it be cheaper than Intel? The performance is similar. The fact that it's cheaper for Google to run doesn't mean it's less valuable for the end user.

If anything, they should cut prices across the board. AMD just cut the price of an x86 core in half.

Because Intel has a better brand name / recognition. Many companies default to Intel, even when it's not the cheapest.

I'm glad AMD is making such progress that these companies can't ignore them anymore. Having one CPU maker would be horrible.

I'd love to see a comparison of the new EPYC vs Intel on database workloads. According to a recent AnandTech article, which quotes Intel, memory access can give a large edge to Intel.

I would love to see benchmarks with Clickhouse which scales much better than regular SQL databases on a single machine.

ClickHouse is happy to use multiple cores if the query is heavy enough. We tested it on an AMD EPYC 7351 more than a year ago and got promising results. (I have not saved them, but I'll try to reproduce and post them here.)

Another case of scalability: we have also tested ClickHouse on an AArch64 server (Cavium ThunderX2) with 224 logical cores, and despite the fact that each core is 3-7 times slower than an Intel E5-2650 and the code is not as optimized as for x86_64, it was on par in throughput on heavy queries.

There are also tests of ClickHouse on POWER9, if you're interested...

I hope they'll also attach GPUs to those machines. We switched part of our operation to local hardware because we couldn't get both fast GPUs and fast CPUs in the same node.

A proper solution of course would be to have the CPU intensive algorithms run on different nodes, but it's an integrated solution we pay for so we don't control that.

You can already get 1-4 GPUs on VMs with 64-96 vCPUs in a number of regions: https://cloud.google.com/compute/docs/gpus/#gpus-list

Is the problem that the max CPU count scales with the number of GPUs, so you can't get 1 GPU with 96 vCPUs?

No, we were looking for the high single-thread performance machines, so the new compute-optimized Cascade Lake ones, but those you can't get GPUs on.

> We switched part of our operation to local hardware because wouldn't get both the fast GPU's and the fast CPU's in the same node.

And how did that work out? Upfront costs but should be significant savings overall right?

Not sure yet; the machines should arrive next week. To be honest, we might switch back and forth between our own and cloud hardware as we scale up. Right now we process one dataset per week or so, but by December that should become multiple per day. The 3 machines we bought should put us at around 1 per 2 days, but it's more so we can do worry-free experimentation. It's really annoying to be constantly worried about wasting money when experimenting. We 'wasted' thousands just experimenting with flags and rendering bad datasets; it might still come out as cost-effective, of course, but it discourages experimentation.

The big question for us is if we're going to be able to afford to do it locally, there's both the upfront cost and the cost of system administration. These 3 I could still do with my dev team amateur style, but when that becomes 30, we'll probably need some technician that has experience managing compute clusters.
