When I left Sun in 1995, their "biggest" iron was the Enterprise 10K (which internally was called "Dragon" because of the Xerox bus). A system with 64 cores and 256GB of RAM was just under 2.5 million dollars list. It needed over 10kW of power provided by a 60A 240V circuit; the power cord weighed in at like 20 lbs. I put together a new desktop with the TR3960 and 128GB of ECC RAM, and that motherboard will take the 3990 and 256GB of RAM if I choose to upgrade it. It really boggles my mind what you can fit under your desk these days with a single 120V outlet.
In 2001, the fastest supercomputer in the world was ASCI White. It cost $110M, weighed 106 tons, consumed 3MW of power (plus 3MW for cooling), and had a peak speed of 12.3 TFLOPS.
Right now, sitting under my desk is an RTX 2080 Ti GPU, which cost around $1000, weighs 3 pounds, draws a maximum of 250 watts, and has a peak speed of 13.4 TFLOPS [1].
We truly live in amazing times.
[1] Not quite a fair comparison: the GPU is using 32-bit floating-point, while ASCI White used 64-bit. But for many applications, the precision difference doesn't matter.
It's not fast enough. After having access to 160 TPUs, it's physically painful to use anything else.
I hope in 20 years I'll have the equivalent of 160 TPUs under my desk. Hopefully sooner.
The reason it's not fast enough is that ... there's so much you can do! People don't know. You can't really know until you have access to such a vast amount of horsepower, and can apply it to whatever you want. You might think "What could I possibly use it for?" but there are so many things.
The most important thing you can use it for is fun, and intellectual gratification. You can train ML models just to see what they do. And as AI Dungeon shows, sometimes you win the lottery.
I can't wait for the future. It's going to be so cool.
I looked into using some for solving PDEs, but Google had literally zero documentation on how to cross-compile C to TPUs, launch kernels, etc.
AFAICT, you either use TensorFlow or some other framework that supports them (and whose TPU code is not open source), or you can't use TPUs at all.
I use Tensorflow 1.15. The world has been steadily pushing for Tensorflow 2.0 or Jax, but I like the simplicity of the Session model. It's so simple you can explain it in one sentence: it's an object that runs commands. Tell the session to connect to the TPU, and it will run all those commands on the TPU.
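For anyone who hasn't touched TF1 in a while, here's a minimal sketch of that Session-as-command-runner idea (the TPU address below is made up; in practice you'd get it from TPUClusterResolver or the ctpu tooling, and whether ops land on the TPU cores or the worker's CPU depends on device placement):

    import tensorflow as tf  # TensorFlow 1.15

    tpu_address = 'grpc://10.240.1.2:8470'  # hypothetical TPU worker address

    graph = tf.Graph()
    with graph.as_default():
        init = tf.tpu.initialize_system()
        shutdown = tf.tpu.shutdown_system()
        x = tf.random.uniform([1024, 1024])
        y = tf.matmul(x, x)

    # The session is just an object that runs commands; pointing it at the
    # TPU worker's address makes the graph execute on that worker.
    with tf.Session(tpu_address, graph=graph) as sess:
        sess.run(init)
        print(sess.run(y).shape)
        sess.run(shutdown)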
Jax is new to me (and to everyone; they just released it). But it looks like Google is pouring some serious R&D into it.
Two things help a lot. One, Twitter. You can get a direct line to the people who actually make these beasts. Exploit it when you can. Like you, I dislike using a black box, and I'm intensely interested in the details of how to communicate with a TPU at a low level. I recently asked someone on the Jax team about it here: https://twitter.com/theshawwn/status/1213221594052599808
Two, TFRC support has been incredibly helpful. https://www.tensorflow.org/tfrc I don't know who they have working the support channels, but those guys and gals are some of the most helpful and cheerful people I've come across. I often asked them very technical questions and to my surprise, they followed up with an A+ response almost every time, usually the next day.
PyTorch is giving TF a real run for its money, and to be honest I once felt it was a mistake to invest so much time into TensorFlow. But it turned out to be a big advantage due to Google's investment in the overall ecosystem. TPUs are something that only Google has the resources to pull off.
Note that the traditional path towards "just get a TPU up and running and start playing with it" is to use one of their Colab notebooks on the topic. https://cloud.google.com/tpu/docs/colabs I've been implicitly steering you away from these because you seem (like me) to want to know more of the low-level details. Those notebooks are designed to let ML researchers get results quickly, not for hardware enthusiasts to exploit heavy metal. The jax notebooks felt much more satisfying in that regard.
Can you mention how much human dev time is involved?
We have a stupid-basic single-machine deep reinforcement self-play setup. It takes about 24 hrs to run a full experiment. The NN is the bottleneck. Using TensorFlow. Nothing fancy.
How much dev time for a good engineer (backend, kernel, multi-core experience) to get this down to, say, 1 hr?
Obviously a very general question. Thanks for any input.
> You can't really know until you have access to such a vast amount of horsepower, and can apply it to whatever you want.
Something I've often wondered (and there are probably good reasons why) is why billionaire tech moguls - even the ones who are outwardly technical, or were in the past, like Bill Gates, who we know had technical chops - have never (that I'm aware of) tried to build "their ultimate computer".
For instance, if I had their kind of money, I've often thought that I would construct a datacenter (or maybe multiple datacenters, networked together) filled with NVidia GPU/TPU/whatever hardware (the best of the best they could sell me) - purely for use as my "personal computer". Completely non-public, non-commercial - just a datacenter I would own with racks filled to the brim with the best computing tech I could stuff into them (on a side note, I've also pondered the idea of such a personal datacenter, but filled with D-Wave quantum computing machines or the like).
What could you do with such a system?
Obviously anything massive parallelism could be useful for - the usual simulation, machine learning, etc; but could you make any breakthroughs with it - assuming you had the knowledge to do such work?
Which is probably why none have done it - at least as a personal thing.
I mean, sure, I would bet that people who own large swathes of machines in a datacenter, or those who outright own datacenters (like Google or Amazon) - their founders and likely internal people do run massively parallel experiments or whatnot on a regular basis, ad-hoc, and "free" - but it's a commercial thing, and other stuff is also running on those machines...
But a single person is probably unlikely to have or think of problems that would require such a grand scale before they would just "start a company to do it" or something similar; because in the end, just to maintain and administer everything in such a datacenter, if one were built, would require (I would think) the resources of a large company.
Of course, then I wonder if such companies - especially ones like Google and Amazon, which own and run many datacenters around the world, and also sell their resources for compute purposes - weren't started in some fashion (even if only in the back of their founders' heads) with that idea or goal in mind (that is, to be able to own and use on a whim "the world's largest amount of computing power")...?
Paul Allen kinda did just that, although in a different direction. He built a datacenter and filled it with a bunch of old computers he thought were cool, like the DEC PDP-10. It's now the Living Computer Museum in Seattle.
I feel like "tech moguls" are the wrong type to expect this kind of interest out of. They got rich on either tools or workflows (i.e. CRUD), not intelligence/analytics/prediction. It's not the same mindset.
If anyone were to own a secret HPC cluster, it'd probably be a finance billionaire. Or the owner of a think-tank who made their money as a subcontractor for state intelligence agencies.
Or, as it currently stands, you can run the buggier, more resource-intensive equivalents of the software you used to run! Now featuring pervasive spyware that tracks and catalogues your every action! Wanted a permanent copy of the software you paid for? Too bad, it's only available as "A Service", which means you get constant changes you never asked for AND you get to pay for them on a recurring basis whether you like it or not!
Seriously though I feel like most of the gains in hardware have been wasted by shittier software both in terms of quality and in the way the software itself acts against the interests of its users.
A bit off topic, but I am looking at TPUs at the moment. Can I ask, for clarity, whether you mean TPUs are easier to use than GPUs or vice versa?
I thought TPUs are harder to work with because they only support TensorFlow, rather than the range of high-level frameworks plus low-level CUDA that GPUs support.
TPUs aren't necessarily easier to use – it's about the same – but they're powerful. I've documented some benchmarks in this tweet chain, where I trained GPT-2 1.5B to play chess using a technique called swarm training: https://twitter.com/theshawwn/status/1214013710173425665
The power comes from the fact that every TPU gives you 8 cores at your disposal. I never use the Estimator API. I just scope Tensorflow operations to specific TPU cores. Works great.
It also gives you flexibility. TPUv2-8 can apparently allocate up to 300GB (!) if you don't scope any operations to any cores. Meaning, you run it in a mode where you only get 1 core of performance, but you get 300GB of flexibility. And then you can connect multiple TPUs together as described in the tweet chain, which quickly makes up the difference.
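Very roughly, "scoping operations to specific TPU cores" looks something like the sketch below. This is illustrative only: the exact device strings depend on how you connect to the TPU, so treat '/device:TPU:n' as an assumption rather than gospel.

    import tensorflow as tf  # TensorFlow 1.15

    # A TPUv2-8 / TPUv3-8 exposes 8 cores; build one op per core.
    partial_sums = []
    for core in range(8):
        with tf.device('/device:TPU:%d' % core):  # illustrative device string
            x = tf.random.uniform([1024, 1024])
            partial_sums.append(tf.reduce_sum(tf.matmul(x, x)))

    # Combine the per-core results; placement of this op is left to TensorFlow.
    total = tf.add_n(partial_sums)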
There is also the question of cost savings. A TPUv3-8 seems about as expensive as a V100. Which one is worth it? Well, it depends. In my experience a GPU is easier to use and quicker to set up if you only need one GPU of horsepower. But suppose you wanted to train a massive model in 24 hours. What's your best option? For us, it was TPUs.
The reason is subtle: It's hard to find any single VM that can talk to 140 GPUs simultaneously. But you can talk to 140 TPUs from a single VM no problem. And since you get 800MB/s to and from the VM, you can average the parameters across all TPUs very quickly.
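Conceptually, the averaging step is nothing fancier than the sketch below (this is not the actual swarm-training code, just numpy on the driver VM, assuming each TPU has already sent its parameter arrays back):

    import numpy as np

    def average_params(per_tpu_params):
        """per_tpu_params: one list of parameter arrays per TPU, shapes matching."""
        # zip(*...) groups the i-th parameter from every TPU together,
        # then each group is averaged elementwise.
        return [np.mean(group, axis=0) for group in zip(*per_tpu_params)]

    # Toy example: 3 TPUs, each with 2 parameter tensors.
    params = [[np.full((2, 2), i, dtype=np.float32), np.full(5, i, dtype=np.float32)]
              for i in range(3)]
    averaged = average_params(params)  # each entry is the elementwise mean (here, 1.0)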
This is similar to what TPU pods do internally. And while TPU pods are impressive, they are also impressively expensive. A TPUv3 pod will run you $192/hr at evaluation prices. Whereas you can play with a TPUv3-8 for $2.50/hr. You can also play with a TPUv2-8 for free using Colab: https://github.com/shawwn/colab-tricks
I think a swarm of TPUs can cost significantly less than a cluster of V100s with less engineering effort.
That said, right now most codebases are designed to work with V100s. It will take time before TPUs widely proliferate. But speaking as someone who was once skeptical of TPUs and who has spent several months trying to discover their secrets, I feel that TPUs can get the job done quicker and easier than a GPU cluster. The hardware is also more accessible, since you can more easily spin up 100 TPUs than 100 V100s. But mainly I like that it's all coordinated from a single machine. It's conceptually simpler to debug and to implement.
If you run into any issues or have any trouble with TPUs, please feel free to ask here or DM me. I love talking about this stuff.
EDIT: In regards to usability, the new Jax library works with TPUs out of the box. Google seems to be heading in the direction of Jax. My initial reaction was "Not another library..." but first impressions were positive. It's not quite the React of ML – an idea which I hope to see soon – but it does seem easier for certain research purposes.
PyTorch also recently gained TPU support, and as far as I know they've put in some serious efforts to make sure things run quickly. As for how you use all 8 cores of a TPU using PyTorch, I haven't looked into it yet. But I'd be surprised if you couldn't. It seems unlikely that they would design an API that would hamstring you to just 1 out of 8 cores.
Thanks a lot. This is a lot of information to digest. I will check out your Twitter feed, and I think I need to start playing around with TPUs then. Scaling seems to work fantastically for you. I am working more in computer vision, and we sometimes run into weird bottlenecks with our GPUs where neither GPUs nor CPUs are under full load. Unfortunately, drilling down on where the bottlenecks come from is not easy at all. I am assuming the profiler from Tensorflow works with TPUs in the same way it does with GPUs?
Oooh, you're so lucky you get to work on those kinds of problems. I know sometimes it feels frustrating to hunt for bottlenecks, but man is it satisfying to find it.
We had a similar situation at one point. The problem turned out to be that our CPU wasn't generating input data fast enough. So the first step is to confirm that your input pipeline isn't the issue.
The next step would be to break down the problem: Can you extract the smallest part of the codebase into a separate program, and try to make that run under full load?
That's not the technique I used, though. To figure out the multicore stuff, the trick for me was to comment out almost all of the code, until you're left with only a small part that actually runs on the device. Ideally the smallest part.
Basically, change your code so that the model file returns tf.no_op() (or as close to that as possible while still letting your input pipeline run). You want to be in a situation where your training loop is doing an equivalent of while(true) { read_input(); } so that you can verify that your pipeline is able to peg your GPUs to 100% usage.
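A minimal sketch of that setup, with a stand-in tf.data pipeline (the filename and batch size here are placeholders; swap in your real input pipeline):

    import tensorflow as tf  # TF 1.x

    # Stand-in for your real input pipeline.
    dataset = tf.data.TFRecordDataset(['train-00000-of-01024.tfrecord'])  # placeholder file
    dataset = dataset.repeat().batch(256).prefetch(tf.data.experimental.AUTOTUNE)
    batch = tf.compat.v1.data.make_one_shot_iterator(dataset).get_next()

    train_op = tf.no_op()  # the "model" does nothing at all

    with tf.Session() as sess:
        while True:                        # while(true) { read_input(); }
            sess.run([batch, train_op])    # only the input pipeline is doing work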
If you get 100% usage, fantastic! That means you're left with an easy problem: start turning parts of the code back on until you find which part is reducing your performance. Then study that part to figure out why.
If you're not at 100% usage, you're either running into a fundamental limitation (which sometimes happens) or the pipeline isn't designed correctly in some way. I would compare it against other popular codebases such as StyleGAN 2 https://github.com/NVlabs/stylegan2 which is designed to use 8 V100s. The optimizer.py file is pretty insightful: https://github.com/NVlabs/stylegan2/blob/eecd09cc8a067e09e12...
Finally, my biggest tip would be to step back from the problem and think: is there something simple you can do to reframe the problem? When I find myself in a situation where I'm spending a lot of time and energy trying to get a certain thing to work, I can sometimes do X instead for 80% of the benefit. Try to find something like that in this case.
FWIW the TPU profiler was the first tool I reached for. I never got it working. The bag of tricks above ended up giving me effective results on a variety of codebases with no profiler. (A usage graph is pretty crucial, though, which Colab TPUs don't provide.)
So those are a bunch of general tips for solving weird bottlenecks blindfolded.
To answer your question directly:
> I am assuming the profiler from Tensorflow works with TPUs in the same way it does with GPUs?
Honestly, I can't say; as mentioned above, I never got the TPU profiler working. But yeah, if you give specifics (ideally a link to a codebase + dataset + script that runs it) then I can try to look for candidates for what might be the bottleneck.
«A single GPU card like the AMD Radeon MI60 has more computing power than year 2000 supercomputer ASCI Red (fastest supercomputer in the TOP500 list of June 2000):
Comparing GPU floating point performance with CPU floating point performance is comparing apples and oranges. GPUs may have higher raw FLOPS, but they have issues with workloads that aren't massively parallel or require branching.
That's true when you're comparing a GPU to a single CPU. But when you're comparing a GPU to an entire supercomputer the requirement that the workload has massive parallelism to use all available resources is present in both.
It's a little more complicated than that. The main problem is that the way GPUs are designed, their execution units share the same instruction pointer[1]. That's not an issue if you're multiplying matrices, but it's an issue anytime you have branches. Therefore, even when your workload is massively parallel, it doesn't necessarily mean that a GPU cluster would perform nearly as well as a CPU cluster with the same amount of FLOPS.
Also, the supercomputer likely had a substantial amount of solid-state and fast spinning storage, even back then. That's also often overlooked in these comparisons, not just the difference in precision.
A single 80mm NVMe SSD would likely be faster than a significant amount of that supercomputer's storage. In 2000 a million IOPS was a lofty target. Now we can do it on a single device.
ASCI Red had 1TB of DRAM and 12TB of disk. Not bad, but three NVMe drives would clobber it. Putting that much DRAM in a box is still expensive today, but entirely feasible for about $5k-10k.
When the university I went to in Budapest got a second-hand VAX cluster (also around '95) for the bargain-basement price of only 50,000 CHF, they needed to dig up the street to the nearest substation and have a new power line installed. http://hampage.hu/oldiron/vaxen/9000_4.jpg - the cabinet furthest away is the power supply. This photo is not even half of the cluster.
A few years before that, at another university, they put an ancient IBM mainframe in place with a crane, temporarily removing the roof of the building.
I recently "upgraded" from a desktop to a laptop. This marks the end of an era for me. I always had a relatively powerful desktop at home, mostly running 24/7 since I couldn't be bothered to wait for it to boot. This ThinkPad X1 is the first laptop I own which is apparently powerful enough to host all my work in a 1.09kg package that easily fits in my backpack... OK, I am a text mode user, so gfx isn't what keeps my computers busy. Low latency realtime audio synthesis much more so.
And I still remember when 33.6kbps were an exciting thing to have :-)
Nice to see that tech moves ahead.
Are you sure it was in 1995? I joined Sun in 1997, and the E10k was launched a bit after that.
According to Wikipedia it was launched in 1997, so that lines up. If I remember correctly, the system was bought from Cray after SGI bought the rest, so I didn't think they even had it at Sun in 1995.
Also, the original model of the E10k supported 64 GB RAM.
My company bought one of those around that time. It was used to run large EDA software jobs.
One beautiful weekend day, only two weeks after delivery, our system admin noticed that the machine had gone offline. Logging in remotely didn't work at all. No ping either.
He drove to work and ... the machine was gone.
Thieves had used a crane to lift it out of the building through a window onto a truck.
Sun told him that this wasn't the first time such a thing had happened, and that somewhere in the chain from order to delivery, an insider had tipped off the thieves about where to find the latest hardware.
Back then, smaller nation-states wanting to do nuclear device simulation and the like would be my guess. Basically, countries that were restricted in some manner on gaining large amounts of parallel processing compute power for such simulations.
I was thinking about this for house building: you go to a panel and the bots that make up the house reconfigure themselves to add a swimming pool, or an extra guest room, etc. Could be pretty awesome. Let's hope they are not used for evil :-/
OTOH we'd rather have seen the single-core perf keep improving. What's the average performance speedup (vs 1 core) that the Sun customers got, or the AMD users get, on the average software they use? The progress in programming language technology hasn't been very kind to multiprocessing[1].
[1] GPUs are another kettle of fish of course, but have their own well-known problems that prevent widespread use outside graphics
2x Samsung EVO 970 NvME M.2 1TB SSDs in RAID0 configuration.
Running in a Coolermaster Cosmos case with a stupidly big air cooler at the moment, to be replaced with a decent liquid cooler. (It works; the Cosmos is a huge case because it previously had to hold a hacked Supermicro dual-Xeon server board, since I really wanted ECC for my workstation.)
nVidia 1080Ti+ GPU.
The ECC is detected and claims it is working, although I've yet to see it correct an SBE. Then again, I haven't been running non-stop memory tests either.
I may end up removing the Linux partition since WSL2 works so well on this box.
I have a 3970X with ECC RAM and got error-correction notifications in my Linux logs when I tweaked the RAM timings too tight. Note that memtest86 doesn't know about ECC on Ryzen, so error corrections may happen without you being notified if you use that.
No, but all TRX40 motherboards should support ECC the same. Mine is a Gigabyte Aorus Pro WiFi, FWIW.
BTW, Century Micro has the only unbuffered ECC modules at 3200MHz native speed (at least that was the case on the 39x0X release day). I don't know if they can be sourced outside Japan, though.
Fun story: I actually botched my order and got 2666MHz ones... but on closer inspection, it turned out the chips on the modules were actually native 3200MHz ones, with the SPD EEPROM saying they are 2666MHz. So I ended up overclocking them to their actual native speed, and I tweaked the timings to be a little shorter than what the 3200MHz modules were advertised for.
7 years ago it was doable to build a similar desktop at a similar price from a refurbished server (4x 6-core Opteron CPUs, 128GB DDR2 ECC RAM). It drew about 1kW and was loud, but it was a great complement for performance testing.
The memory access performance on ThreadRipper (and current Ryzen/Epyc) is much better than anything prior to this generation for workstation loads. Not that the shared I/O controller is without cost, only that it tends to average out better in most cases where multiple cores across chips are in use together.
Just got my 3950X w/ 64GB RAM; not sure that I'd be able to practically use any more compute than this for what I play with, which is mostly multiple back-ends and some container orchestration for local dev, plus occasional video re-encodes (Blu-ray ripping for the NAS).
Some think $4k for this CPU is too much... but considering the sheer performance that you can get these days for under $10K, there's never been a better time to build or buy a computer. My only regret is wasting time and money on aRGB that I cannot configure in Linux.
The real funny thing is that for most people (not OP) it would still be used mainly for word processing, email, and occasionally casual gaming - and it would still be slow to run and boot.
The amount of processing power we each carry in our pockets (even the cheapest throw-away smart phones) would have been almost unthinkable 30 years ago; it's akin to the difference of an Altair of the 1970s vs what was available just 10-20 years prior. What took up a room now sat on a desk and could be purchased for the price of a car.
Now, what took up a room sits in your pocket, and could almost be given away in a box of cereal, it's so cheap.
Heck - think about what's available in the embedded computing realm for pennies (or just a few dollars in single quantities) - it's mind boggling to an extent.
I concur but it depresses me somehow. Some say that it's worth it, to me it's just the same cycle of marketing trying to disguise the things as progress.
The comments are mentioning the Xeons in Mac Pros and how Apple should switch. I have no factual basis for this, but I figure Apple has got to be using AMD's new chips as leverage to get some pretty sweet deals on Intel silicon.
Indeed it does. Which nicely explains the rising popularity of Hackintoshes, particularly among developers and other technologists. AMD Hackintoshes[1] in particular have skyrocketed in maturity and simplicity since Ryzen.
To be fair, the Hackintosh community is pretty persistent. They added opcode emulation into the kernel, for example, to handle running on CPUs without the expected instructions (older AMD CPUs back in 10.8 or so). I wouldn't be surprised to see the cat-and-mouse continue.
Indeed, though keep in mind that the current method for running macOS on AMD doesn't use this and instead relies on patching through Clover. The downside is that whilst the OS itself may run, any applications that use an opcode that isn't implemented will simply crash.
I'd been running a Hackintosh and an rMBP for my desktop and laptop respectively... this past year I've passed on both and am now running Linux for my personal desktop, and will be getting a new laptop within the next few months; some of the Ryzen Asus laptops coming soon are interesting.
Although still not without issues, my workflow has aligned so much with Linux, and it's finally reached a good enough point for my day-to-day use... not having to use the VM-based Mac or Windows Docker has been really nice (WSL isn't good enough IMHO).
I know you are being facetious; however, there are options which the market provides.
1) Other companies that are willing to sell you workstation and laptop computers that you can run other operating systems on such as one of the Linux variants, Windows, BSD etc. Nobody is forcing you to buy a computer from Apple.
2) There is a thriving second hand market of Apple machines just look at ebay, craigslist, gumtree etc.
If a new Apple machine isn't worth it to you, you are free to buy alternatives.
I am going to buy one of the newer Lenovo Thinkpads as I don't think the MacBook pro is worth it to replace my ageing Macbook Pro.
I believe he’s wrong and they are not meaningfully rising in popularity. Hackintoshes started as soon as the Intel transition, and they do exist (I’ve seen a couple personally, both built around the Leopard era when tools were mature and the desirability of iCloud/iMessage integration was lower.) Today there aren’t many people who need to push computing beyond the relatively affordable Mac Mini and iMac configurations. Most Hackintosh practitioners want Apple to release the “XMac,” a cheap and configurable desktop tower* and operate their Hackintosh in its stead.
It’s a respectable hobby, though, like iOS jailbreaking or emulation, and for persistent people it does let them run MacOS on more powerful hardware than they could afford.
Not sure a citation exists for such an assertion -- but I'm a decade deep and numerous production (profitable) iOS apps shipped without ever touching Apple hardware. MacOS is a joy but the hardware is consumer rot.
I’m sure if it was indeed popular, there would be some way of demonstrating that. Aside from the fact that I have never seen one in my entire career as a consultant, working with hundreds of organisations, the reason this sounds ridiculous to me is that Apple has a long history of making very little effort to obstruct the hackintosh community. Which suggests very strongly that the community is too small for Apple to bother with. There are a few topics on HN that seem to bring out people claiming that incredibly niche interests are actually very common and popular. Apple is one of them. So I don’t think it’s unreasonable to expect that somebody making such an incredible claim should have at least some way of substantiating it.
> I’m sure if it was indeed popular, there would be some way of demonstrating that.
Google Trends suggests that searches for 'Hackintosh' peaked around 2009 and have been steadily declining to around half that level since then. Searches for 'Ubuntu' dominate so much they make the Hackintosh graph look flat by comparison, but Hackintosh seems about as popular as 'Manjaro' (a Linux distribution) currently is, fwiw:
The only reason I'm using a Mac now is by pure good luck as all my PC components at the time (back in 2011-ish) were OSX Snow Leopard compatible, right down to Wi-Fi, bluetooth, motherboard, soundcard, etc. I admit it wasn't completely vanilla due to the infamous tonymacx86 software method but it did get me running quickly.
I gave OSX a test drive and found it much simpler than Windows. As I already had an iPhone/iPad it made sense to switch. Come upgrade time I bought a MacBook Pro and have been on OSX since.
The market price is the equilibrium of both supply and demand.
What you're saying is the fact people are buying their machines means they shouldn't change price or value proposition... because there are in fact purchases; which, is quite frankly, baffling, to me, a humble idiot.
Mercedes must think their EQC is positioned perfectly in the market with 55 sales? I now imagine Magic Leap will be leaping to raise prices with their next version?
> The market price is the equilibrium of both supply and demand.
Obviously.
> What you're saying is the fact people are buying their machines means they shouldn't change price or value proposition... because there are in fact purchases; which, is quite frankly, baffling, to me, a humble idiot.
If Apple are selling the machines in sufficient quantities at whatever they are priced at (I haven't cared to look) then obviously Apple's customers think they are worth it. It isn't really more complicated than that.
You assume without any facts or supporting evidence that they are achieving optimal sales. What you're saying is literally something you're just making up out of thin air. It's okay to be completely full of shit, just don't market it as truth.
Just doing a quick web search, the company made $224 billion in 2018. Do you honestly think they aren't achieving optimal sales? The proof is in the pudding and they have a very, very large pudding.
> It's okay to be completely full of shit, just don't market it as truth.
I know you think you are being big-brained, but not everything is "you must provide a citation". It is pretty obvious Apple knows the market well and knows exactly what they can and can't charge for certain products. You pretending otherwise because I haven't provided you with a citation is a complete joke; it is like asking someone to cite evidence that the sky is blue.
> "Obviously the way it is, is the way it is. Obviously".
In conclusion, you think no company should adjust their prices, ever? Because the current price is the market price which is obviously the right price because it's the market price?
It's circular reasoning, which can be applied to any sales situation - and if it explains everything, it explains nothing.
> In conclusion, you think no company should adjust their prices, ever?
Obviously not. I am saying they have no incentive to change the price if the sales are in line with or above what they would have forecast.
> Because the current price is the market price which is obviously the right price because it's the market price?
>
> It's circular reasoning, which can be applied to any sales situation - and if it explains everything, it explains nothing.
Again, you don't seem to understand basic market economics. Your product is only worth what people are willing to pay for it. There is the odd exception to the rule (Head and Shoulders shampoo being one of them: it is priced far higher than originally intended because people assumed it didn't work when it was cheap).
Generally if there are two or more companies producing product X (in this case Computer Workstations and Laptops) then the market will coalesce around a particular price point for a particular specification. Sure there are those that will always stick to a brand, but the vast number of consumers won't be loyal.
Whether or not the company makes a profit on each unit sold is irrelevant to its market price. If they price their product higher than their competitors people will look at the alternatives.
e.g. I bought a MacBook Pro in 2015 because Apple's machine was cheaper than Lenovo, Dell for the same spec and had a better screen than any of the competitors machines.
This really isn't complicated stuff. I think that personal bias seems to cloud people to some basic truths.
No. There is usually no such thing as _a_ market price. There's a distribution of prices for purchases of the same item (and that's when we ignore the cases of transactions involving more than just the transfer of money).
> the equilibrium of both supply and demand.
Supply and demand for specific products are more the _result_ of socio-economic processes and phenomena rather than their _causes_.
If you're talking to me about socio-economic processes when we're talking about something simple, it pretty clearly shows you're not very educated in economics.
If you like your charts and economic formalisms, and believe in "market prices", perhaps you should take the time to read the Candide-like "Production of commodities by means of commodities" by Piero Sraffa.
Despite their price, there was a time when MacBooks were only 5-10% more than the equivalent PC laptop. People who complained about their price were inevitably comparing them to bottom-of-the-barrel PC laptops, not higher-end business laptops with comparable specs.
I haven't priced out any recent macbooks to know if that's still true though. Glancing at the new 16" macbook pro, it seems like it might be reasonably priced for what you're getting.
There is so much that goes into a laptop that doesn't make it to the spec sheet either. People only look at a couple of specs to decide how much it should cost, but some laptop makers put everything into those specs and cheap out on everything else, and you end up with a laptop with a fast CPU but brittle plastic, a TN display, a DAC that hisses, and a whole bunch of other nastiness.
Yeah, the MBP 16” is pretty comparable to the Dell XPS in price - at least when comparing base models.
However, the costs go up a lot if you spec out a custom config (+$400 just for 32GB RAM). Then the MBP starts looking quite a bit more expensive. Overall I don’t think they’re a bad buy though if you want macOS.
>Yeah, the MBP 16” is pretty comparable to the Dell XPS in price - at least when comparing base models.
The Dell XPS [1] with a comparable spec costs $1650 compared to the MBP 16" at $2399. In the old days Apple would have priced it closer to $2199 or slightly lower.
Somewhere along the line they started giving the Mac the same margins as the iPhone.
That said, the Mac Pro, even starting at the base price, is pretty outrageous... I mean, I get $500 for the case and $1500 for the motherboard, but the rest just seems to be too much in aggregate, and even more ridiculous for the manufacturer upgrades out of the box.
>Despite their price, there was a time when MacBooks were only 5-10% more than the equivalent PC laptop.
I seriously doubt 5-10%. You are talking about a minimum $50-$100+ difference; that has never happened. The Mac has always been roughly 20-30% more expensive than a laptop with comparable specs. So for a $1000 comparable-spec laptop, Apple will sell you one for $1300 (but with more expensive upgrades).
The 30% has been fine for years: the quality and finish, as well as macOS, were well worth the price tag. But in recent years it hasn't been 30% at all.
As you lower the price, more people can and will buy your product, giving you (hypothetically) more profit than before. I'd like to think Apple has done all their homework about what price point to sell at to maximize profit, but honestly at this point I think they just make up whatever huge number they want for the Mac Pro price to make it seem cool and go with it.
When I was in high school in the early 2000s, I had a friend who told me his parents purchased a 386 when they first came on the market where I live, and they paid around $15k. My jaw just dropped. Then my friend just started chuckling at his parents' folly, since, looking back, even at that time it was such an expensive paperweight. Heck, thinking about it now, it probably came up in conversation because at the time I had a hobby of picking up old computers that people had thrown out and cobbling together the working parts; I built a 386 and a 486 that way. Good times.
Apple has been successfully using the Good, Better, Best three-tiered pricing model for quite some time. I remember buying a Powerbook 140 at the time when I really wanted the 170 but could not justify the increased price.
From Wikipedia:
"Intended as a replacement for the Portable, the 140 series was identical to the 170, though it compromised a number of the high-end model's features to make it a more affordable mid-range option. The most apparent difference was that the 140 used a cheaper, 10 in (25 cm) diagonal passive matrix display instead of the sharper active matrix version used on the 170. Internally, in addition to a slower 16 MHz processor, the 140 also lacked a Floating Point Unit (FPU) and could not be upgraded. It also came standard with a 20 MB hard drive compared with the 170's 40 MB drive."
The MBP isn't even a very good example of Apple price gouging. Any other laptop that you can get for less money is likely to have rather significant trade-offs.
My hunch is that the ARM transition is coming sooner rather than later and instead of spending time and effort into rewriting a bunch of OS functions (like AirPlay Mirroring) to support AMD processors that lack Intel-only features like QuickSync, Apple is just going to drop the iOS pieces it has written for ARM64 into the MacOS (or whatever it's called) that runs on their upcoming ARM-based desktops and laptops.
I mean, I wouldn't re-write code to support hardware-accelerated video transcoding on Ryzen+GPU if I knew that in 2-3 years I was moving from x86-64 to ARM64.
How well would Premiere or Photoshop run on ARM? How long would it take to rewrite these apps to run natively (i.e. with acceptable performance) on an ARM platform? Until you can do video/photo/audio editing faster with ARM using existing software products, I do not see it being viable as a replacement for x86 in any of Apple's higher-end products. Perhaps ARM is included for the sake of mobile development, but with 32+ modern x86 cores you could just as well emulate ARM and barely feel any overhead.
Sure, there are a lot of neat tricks ARM can do with special instructions and hardware accelerators in very well-controlled use cases. But for the average creative professional who doesn't have the time or patience to play with hyper-optimizing their workflow, having an x86 monster that can chew through any arbitrary workload (optimized or otherwise) is going to provide the best experience for the foreseeable future.
Apple has dragged Photoshop kicking and screaming onto different platforms and even OSes before. If they succeeded when they were nearly dead (the OS X Cocoa timeframe), they will have no problem doing it now.
Like sibling comment says, it's not exactly the first cpu transition for Photoshop...
> Photoshop 1
> Photoshop 1 (1990.01) requires a 8 MHz or faster Mac with a color screen and at least 2 MB of RAM. The first release of Photoshop was successful despite some bugs, which were fixed in subsequent updates. Most users ended up using version 1.07. Photoshop was marketed as a tool for the average user, which was reflected in the price ($1,000 compared to competitor Letraset’s ColorStudio, which cost $1,995).
> Photoshop 1.x requires Mac System 6.0.3, 2 MB of RAM, a 68000 processor, and a floppy drive.
They are rewriting all those apps for iPad OS anyway, which is where many of the professionals are moving. It is the same with Autodesk: their CEO said he doesn't know how well iPad sales are doing, but he clearly sees a trend of more pros moving to the iPad. They are everywhere in their industry.
(Unfortunately I can no longer find the link to the video.)
Which is not to say they will move to ARM. I am still skeptical of it.
> to support AMD processors that lack Intel-only features like QuickSync
QuickSync is just a hardware video encoder, which AMD has as well; it is called VCN [1]. Not to mention Apple hasn't been using QuickSync for as long as they have been shipping the T2, where Apple uses their own video encoder within the T2. (The T2 is just a rebadged A10.)
And 100 comments, but not a single mention or suggestion as to how Apple would deal with their vested interest in Thunderbolt. I would not be surprised if 90% of the PCs shipped with Thunderbolt were from Apple.
USB 4 is out, the Thunderbolt 3 spec has been out for quite a long time as well, and yet we don't even have a single announcement of a USB 4 controller.
Unfortunately, Threadripper only ("only"..) supports 256GB of RAM, while Mac Pros can go up to 1.5TB.
So they definitely couldn't switch completely, and supporting both would be expensive for apple due to doubling mobos + testing + drivers etc, and confusing for the consumer because of the differing max RAM capabilities.
Can someone explain why server variants always have lower base clocks? In particular, I'm interested in whether consumer higher-clocked variants are less reliable for long-term 24-hour full-load use. It has to be something like that, and not just power consumption considerations.
AFAIK, you can overclock the Server variants if you have sufficient cooling, not sure on binning or the extra memory bandwidth in terms of CPU overclocking overhead... but there is room there.
If I'm not mistaken, most of these threadripper systems seem to come with a water cooling system that I'd bet isn't present in server systems. That's my guess.
It might also be that "consumer" workloads will run the cores at the high speed more infrequently than a server which might be running full tilt 24/7. Just a thought
AMD (currently) only has server and consumer CPU lines. So since Threadripper isn't a server product, it falls into the consumer line even though it will mostly be used by professionals and extreme enthusiasts with deep pockets.
That's because we're buying server racks for virtualized workstations, not deskside systems. Even from Tier 1 vendors, a viable deskside workstation for pro VFX doesn't approach $35K without shoving in dual-socket 24-core+ Intel chips that have no realistic purpose being in a workstation unless you're buying for specificity.
I'd love to know more about this, because I think it is the future for everyone. What VDI environment are you using, what runs on your desktop vs run on the racks? Are the racks shared or do you have dedicated hardware provisioned to you? Do you use more than one backend rack system at a time?
It's definitely not for gamers, even silly "I have to have the best thing" gamers. Games are considered "lightly threaded" in these sorts of conversations, so you're looking for max boost clock / IPC.
However, lots of the YouTube influencer ruling class will buy it, in part because of what you said ("those who can afford it..").
The real legitimate consumer base for this are people whose work productivity is held back by compute loads that are embarrassingly parallel. If you spend a lot of your time waiting for a (well threaded) compiler to finish, or blender to render something out, or whatever.
Twitch / "Influencer" video gamers aren't using Premiere for videos.
Twitch streamers need a "streaming" solution. They play live, and instantly react to the crowd. If someone pays for an emote or something, the Twitch-streamer is expected to look on camera and say thank you to the donor (and maybe repeat the message that the donor paid for).
This means that a Twitch streamer's computer MUST encode the gameplay live. Traditionally, Twitch streamers would buy two computers, one to play video games, and a 2nd computer to process the video stream and upload it to Twitch.
With the advent of 16+ core computers, Twitch streamers have begun to simply buy one computer, lock 8 cores to the video game, and then lock 8 cores to the Twitch encoder.
Presumably, something like a Threadripper (24+ cores) could process the video stream for better quality and lower bandwidth. Maybe live VP9 encoding, for example (8-cores for the video game, 16-cores or more for the encoder)
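For what it's worth, the core-locking itself is mundane; here's a hypothetical sketch using psutil (the PIDs are placeholders, and psutil is an assumed third-party dependency):

    import psutil  # pip install psutil

    game_pid = 4242     # hypothetical PID of the game process
    encoder_pid = 4343  # hypothetical PID of the encoder (e.g. OBS) process

    # Pin the game to cores 0-7 and the encoder to cores 8-15.
    psutil.Process(game_pid).cpu_affinity(list(range(0, 8)))
    psutil.Process(encoder_pid).cpu_affinity(list(range(8, 16)))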
Consoles have 8 cores and are about to have 8 cores / 16 threads. Multi-core utilization on Android is essential to running at lower clock speeds and not thermal throttling. Engines and graphics APIs are catching up, and there are fewer single-thread bottlenecks than there used to be.
Nobody wants a 64 core CPU for games. Single core performance is the most important factor by far since most games aren't really optimized for parallel computing.
I help people build PCs sometimes, and people's first pass at picking components almost always overspends on the CPU and underspends on the GPU. For any fixed budget you will usually get (much) better perf by getting a low-to-mid-range Ryzen 3/5 or i3/i5 CPU and spending the difference on a better graphics card.
Most games are not particularly CPU intensive (although there are exceptions like Ashes of the Singularity, which actually does usually get bottlenecked by the CPU unless you have a really high-end one).
Not really. You run the game loop on one core and physics simulation / sounds / world streaming on a handful of other ones. It's more about the maximum single core performance which is not that different on a $150 Ryzen. Going to the most expensive CPU on the market probably only gets you 5 FPS more compared to the top end GPUs that scale rendering in parallel like a dream and can give you 100+ more FPS.
AMD has started upping the thread counts with the Ryzen processors only. And now that Ryzen has started being adopted by gamers and streamers, newer games are getting better at utilising these cores. With current-generation games the extra threads won't help, but with newer titles they will.
Because "functional decomposition", a method of threading where you allocate threads per function like physics or rendering, fell out of fashion. Modern game engines instead use a task system where tasks are spun off a main thread and asynchronously computed.
They sure are a thing, because programmers. For example, for a stutter-free experience RDR2 needed 6 CPU cores until a patch one month ago, thanks to bad console-first optimizations.
I just upgraded my CPU and kept my GPU, an older Nvidia 960 GTX. My CPU was a Q6600 and now I have a Ryzen 2700X. CPU matters so much it's amazing. Some games are heavy on the GPU and some are heavy on the CPU. Depends on what kind of things they have going on.
I haven't had any experience with cheap current-generation CPUs. Have you any experience with ones such as the Intel Celerons that are about $30 on Amazon?
The original Apple II sold at an adjusted 2019 price of $5,476, and it sold millions. I would say there are plenty of consumers with that much buying power. Of course, that doesn't mean most consumers need that many cores - but that's still true even if it cost $100.
I'd be interested in that comparison. A fridge throws off quite a bit of heat when cooling. Luckily they're closed the vast majority of the time and are well insulated.
I see a fridge. Miele -> real fridge. Thermador -> also a real fridge. Difference in value not reflected in either functionality or efficiency, long term negative effect on pocketbook for non-income generating asset -> wasted money.
So wouldn't this be like 2.6 TFLOPS? I'm wondering if this can replace NVidia V100s to train something like ImageNet purely on CPU. However, the V100 has 100 TFLOPS, which seems 50x more than the 3990X. Perhaps I'm reading the specs wrong?
PS: Although FLOPS is not a good way to measure this stuff, it's a good indication of the possible upper bound for deep-learning-related computation.
DDR4 has a bandwidth of about 25 Gigabytes per second. The memory on a V100 does about 900 Gigabytes per second. Cerebras has 9.6 Petabytes per second of memory bandwidth. For stochastic gradient descent, which typically requires high-frequency read/writes, memory bandwidth is crucial. For ImageNet, you're trying to run well over 1TB of pixels through the processing device as quickly as possible while the processor uses a few gigabytes of scratch space.
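To make that gap concrete, here's a quick back-of-the-envelope using the numbers above (these are the quoted figures, not measurements, and the ~1TB-per-epoch size is an assumption):

    epoch_bytes = 1e12  # assume ~1 TB of pixels per epoch

    bandwidths_gb_per_s = {
        'single DDR4 channel': 25,
        'V100 HBM2': 900,
        'Cerebras (as claimed)': 9.6e6,  # 9.6 PB/s, disputed below
    }

    for name, gb_per_s in bandwidths_gb_per_s.items():
        seconds = epoch_bytes / (gb_per_s * 1e9)
        print('%s: %.3f s of pure memory traffic per epoch' % (name, seconds))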
DDR4 has a bandwidth of about 25GBps per channel. You can hit around 100-200GBps on Epyc processors if you're utilizing RAM efficiently. GPUs tend to enforce programming models that ensure more sequential accesses, but a CPU can do it too.
These stats are true, but the CPU's biggest advantage is L1, L2, and L3 cache.
In particular, the 3990X, the 64-core Threadripper, will have 256MB of aggregate L3 cache and 512kB of L2 cache per core (32MB of aggregate L2 cache). Highly optimized kernels may fit large portions of data within L3 cache and rarely even touch DDR4!
Note: each L3 cache is only 16MB shared between 4 cores. It will take some tricky programming to split a model into 16MB chunks, but if it can be done, Threadripper would be crazy fast.
True, GPUs have really fat VRAM to work with, but CPUs have really fat L3 cache and L2 cache to work with. And the CPU caches are coherent too, simplifying atomic code. GPUs do have "shared memory" and L2 caches, but they're far smaller than CPU-caches.
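As a quick sanity check on those cache sizes (pure arithmetic, assuming fp32 parameters and the figures quoted above):

    bytes_per_param = 4          # fp32
    l3_per_ccx = 16 * 2**20      # 16 MB of L3 shared by a 4-core CCX
    l3_total = 256 * 2**20       # 256 MB aggregate L3 on the 3990X

    print('params per 16MB chunk: %d' % (l3_per_ccx // bytes_per_param))   # ~4.2 million
    print('params in all of L3:   %d' % (l3_total // bytes_per_param))     # ~67 million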
More accurately, Cerebras has 9.6 "bullshitobytes" per second. If you can't verify this, it doesn't exist. You could claim insane "bandwidth" by considering your register file to be your "memory". But that doesn't make it so.
Nah, you’re probably not reading the specs wrong. GPU-type devices severely outclass CPUs in raw compute power. This has been true for years, and it’s why deep learning depends on them. But CPUs and GPUs fulfill very different niches computationally; GPUs are incredibly parallel but aren’t well-suited for serial tasks or tasks that require unpredictable branching/looping, which is pretty much exactly what CPUs are good at.
CPUs do general computing. They're super flexible, but if you have a specific workload you might be able to use a different piece of silicon to get more performance.
GPUs do more parallelized computing, but they don't do as many different operations. They're really good at doing small, not-super-complex operations fast and massively parallelized (like updating an array of pixels on a screen, for example).
TPUs are even more parallelized, but the operations they do are even more specific and often simpler than the operations GPUs do.
That's not really why they're slower. CPUs are significantly more complicated. Things like branch prediction, transactions, sophisticated prefetching all take up a lot of silicon.
The more specific you get with the circuit, the less flexible it is and the more bandwidth you get at a specific task.
The tradeoff is flexibility for application specific performance. CPUs can do hella stuff but they can’t do a specific thing faster than specialty hardware.
GPUs are surprisingly flexible. GPUs are a full-on Turing machine.
The thing is, GPUs have horrible latency characteristics compared to CPUs. Whenever a GPU "has to wait" for RAM, it only switches to another thread. In contrast, CPUs will search your thread for out-of-order work, speculative work, and even prefetch memory ("guessing" what memory needs to be fetched) to help speed up the thread.
--------
Consider speculative execution. Let's say there is a 50% chance that an if-statement is actually executed. Should your hardware execute the if-statement speculatively?
Since CPUs are latency optimized, of course CPUs should speculate.
GPUs however, are bandwidth optimized. Instead of speculating on the if-statement, the GPU will task switch and operate on another thread. GPUs have 8x to 10x SMT, many many threads waiting to be run.
As such, GPUs would rather "make progress on another thread" rather than speculate to make a particular thread faster.
---------
What problems can be represented in terms of a ton-of-threads ? Well, many simple image processing algorithms operate on 1920 x 1080 pixel entries, which immediately provides 2,073,600 pixels... or ~2-million items that often can be processed in parallel.
When you have ~2-million items of work (aka: "CUDA Threads") waiting, the GPU is the superior architecture. Its better to make progress on "waiting threads" than to execute speculatively.
But if you're a CPU with latency-optimized characteristics, programmers would rather have that if-statement speculated. The 50% chance of saving latency is worth more to a CPU programmer.
Remember that these millions of items must actually be able to do work independently, i.e. you need very few (or regular and localized) dependencies between the data processed by each thread, so that threads can just wait on things like memory accesses rather than waiting on _other threads_.
Hmm, I think I see what you're trying to say, but maybe more precise language would be better here.
GPU cores have extremely efficient thread-barrier instructions. NVidia PTX has "barrier", while AMD has "S_BARRIER". Both of which allow the ~256 threads of a workgroup to efficiently wait for each other.
-------
The other aspect is that "waiting on memory" (at least, waiting on L2 memory) is globally synchronized on both AMD and NVidia systems. Waiting for an L2 atomic operation to complete IS a synchronization event, because L2 cache has a total memory ordering on both AMD and NVidia platforms.
Tying L2 cache to higher levels allows for memory coherence with the host CPU, or other GPUs even. That is to say: "dependencies" are often turned into memory-sync / memory-barrier events at the lowest level.
Synchronizing threads is one-and-the-same as waiting on memory. (Specifically: creating a load-and-store ordering that all cores can agree upon).
---------
I think what you're trying to say is that dependency chains must be short on GPUs, and that there are many parallel-threads of dependency chains to execute. In these circumstances, an algorithm can run efficiently.
If you have an algorithm that is an explicit dependency chain from beginning to end (ironically: Ethereum hashing satisfies this constraint. You work on the same hash for millions of iterations...), then that particular dependency chain cannot be cut or parallelized.
But Ethereum Hashing is still parallel, because there are trillions of guesses that can all work in parallel. So while its impossible to parallelize a singular ETH Hash... you can run many ETH Hashes in parallel with each other.
It is reading the specs wrong, if you want apples to apples. OP is using the tensor-core numbers, which are half precision and only for matrix multiplies. Operations that don't fit that will use the standard fp32/16 performance of the chip, which is around 13 TFLOPS: still higher than 2, but nowhere near 100.
64 cores * 2.9 GHz * 8 single-precision lanes * 2 issue * 2 (FMA) = 5.9 TF. This compares with 14 TF for V100 (costs more and needs a host). The 100 TOPS for V100 refers to reduced precision (which may or may not be useful in a given ML training or prediction scenario). The V100 has much (~9x) higher memory bandwidth, but also higher latency.
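Spelled out as code, that estimate is (the 2.9 GHz all-core clock is the assumption baked into the figure above, not a measured value):

    cores = 64
    clock_hz = 2.9e9   # assumed all-core clock
    sp_lanes = 8       # 256-bit AVX2 = 8 single-precision lanes
    issue = 2          # two FMA pipes per core
    flops_per_fma = 2  # each fused multiply-add counts as 2 FLOPs

    tflops = cores * clock_hz * sp_lanes * issue * flops_per_fma / 1e12
    print('%.1f TFLOPS single precision' % tflops)  # ~5.9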
* Radeon VII GPU is 14.2 TFlops for $600 right now.
* NVidia RTX 2070 Super is 9 TFlops for $500
True, Radeon VII and RTX 2070 are "consumer" GPUs... but Threadripper is similarly a "consumer" CPU and commands a lower price as a result.
"Enterprise" products cost more. EPYC costs more than Threadripper, V100 costs more than RTX 2070 Super. If you're aiming at maximum performance at minimum price, you use consumer hardware.
Similarly, Threadripper loses on RDIMMs and LRDIMMs, and has 1/2 the memory channels and 1/2 the PCIe lanes. Most people don't need that either.
Of course consumer products have fewer features than enterprise products. The chip manufacturers need to leave some features for "enterprise". The general idea is to extract more wealth from the people who can afford it, while providing consumers the features they care about at a lower cost.
-----
Really, the feature enterprise GPUs need seems to be SR-IOV, or other PCIe-splitting technologies. This allows a singular GPU to be split over many VMs.
Double-precision floats are niche to the scientific fields, which are also "enterprise", but I don't think most enterprise customers use double-precision floats.
True story, AMD's original price was different. I suggested this price in a pre-briefing the night before. I got an email at 4am of the announcement to say it had been changed.
Can someone clarify what exactly is cut here compared to the EPYC 7742, other than PCIe lanes? I don't quite get how AMD wants to avoid competing with their own product.
The EPYC chips have twice the memory bandwidth as well, and allow for dual-socket systems. I don't believe the enterprise security and management features are in their HEDT counterparts.
Chips like these are for a specialty market, for those who are using applications and workloads that can actually take advantage of all those cores and aren't running a personal datacenter. You're not going to see this offered in any of the rack/blade systems offered by the likes of Dell, HPE, SuperMicro, Lenovo, etc where organizations are actually going to be purchasing EPYC chips.
I haven’t seen any reviews taking into account the price of unregistered DDR4 RAM at the densities required to use these things. AMD is suggesting 2GB/core or 128GB, but these HEDT CPUs cannot support the typical registered RAM used by server memory kits and require unregistered desktop RAM (for market segmentation, you see) which can be (depending on SKU) quite a bit more expensive. I feel any price comparisons with Epyc or Xeon need to take that into account.
This is a bit tangential but maybe someone could give me a some advice.
When running with many cores is there some way to get the Linux kernel to run them at a fixed frequency?
I've been doing multicore work lately on AWS - which works pretty well but you don't get access to event counters so sometimes I'm having trouble zeroing in on what the performance bottleneck is. At higher core counts I get weird results and I can never tell what's really going on. Running locally I have concerns about random benchmark noise like thermal throttling, "turbo" and other OS/hardware surprises (I know on modern chips single thread stuff can run really different from when you load all the cores). I've been thinking of getting something a bit dumber like an old 16 core Xeon (I'm on a bit of a budget) and clocking it down - or maybe there is some better solution?
seldon ~ # cpupower frequency-set -f 1.86GHz
Setting cpu: 0
Setting cpu: 1
Setting cpu: 2
Setting cpu: 3
Setting cpu: 4
Setting cpu: 5
Setting cpu: 6
Setting cpu: 7
seldon ~ # cpupower --cpu all frequency-info | grep -E '^analyzing CPU.*|current CPU frequency'
analyzing CPU 0:
current CPU frequency: 1.86 GHz (asserted by call to hardware)
analyzing CPU 1:
current CPU frequency: 1.86 GHz (asserted by call to hardware)
analyzing CPU 2:
current CPU frequency: 1.86 GHz (asserted by call to hardware)
analyzing CPU 3:
current CPU frequency: 1.86 GHz (asserted by call to hardware)
analyzing CPU 4:
current CPU frequency: 1.86 GHz (asserted by call to hardware)
analyzing CPU 5:
current CPU frequency: 1.86 GHz (asserted by call to hardware)
analyzing CPU 6:
current CPU frequency: 1.86 GHz (asserted by call to hardware)
analyzing CPU 7:
current CPU frequency: 1.86 GHz (asserted by call to hardware)
This is an oldish Intel(R) Core(TM) i7 CPU. I just tried it on an ODROID, and it seems to do the right thing there as well.
I don't have an answer to your frequency problem, but I ran into similar issues with memory usage when trying to process LIDAR data in pandas and keras. I ended up buying an old HP quad-Xeon 4U with 320GB of DDR3 for $330 shipped. I used https://labgopher.com/ to find the server deals.
You could disable Turbo and EIST in the BIOS - most will have a setting for these, and that will result in the cores running at nominal freq.
I don't know about a nice high-level linux interface, there may well be one, but if you need to access these settings from a running machine there are MSRs you can poke.
I think the smart buy (for most) is still the 32-core Threadripper. It is a lot less money, with much higher clocks, and is less likely to be RAM-throughput starved.
I've been experimenting with some ideas for new ML approaches (not neural networks). I was thinking about playing around with FPGAs, but the high-end ones are really expensive. 64 cores is making me think it probably is not worthwhile to focus on FPGAs, or even necessarily GPU programming like I was thinking before.
Right and it is possible that by doing something like that the power of the system would be multiplied by many times. It's really more about the programming model.
GPUs are similar. GPUs take ~20W while idling, but 300W or 500W while under load. FPGAs are also similar.
-------
The total system idle of a Threadripper + 2x GPU + FPGA rig probably is under 200W.
If you happen to utilize the entire machine, sure, you'll be over 1000W. But you'll probably only utilize parts of the machine as you experiment and try to figure out the optimal solution.
A "mixed rig" that can handle a variety of programming techniques is probably what's needed in the research / development phase of algorithms. Once you've done enough research, you build dedicated boxes that optimize the ultimate solution.
In embedded/automotive, the majority of the tooling does not have a Linux version. Compiling is a bitch. Still, you're probably right about the 99% comment.
Windows does "weird things" with high core counts. I don't remember the exact number, but when I had a high-core-count machine before, it presented as "2 cores" in the second NUMA zone and the remainder in the first.
Windows 7 [something] definitely allows for more than 1 CPU. I mistakenly configured KVM to show 10 CPUs with 2 cores each instead of 1 CPU with 10 cores with 2 threads each, and it definitely showed more than one CPU in the task manager (and less than ten).
There used to be limitations on Windows regarding CPU count and maximum RAM size. With a consumer-level OS, you could not run more than two processors, for example. Given how earlier Ryzen acted like it was actually a bunch of NUMA nodes in the past, this could pose a problem (the normal version of Windows 7 wouldn't allow for a NUMA setup, you'd need Pro for that).
I don't think Windows 10 has this limitation anymore, but it's a very valid question. Given their weird calculation for the number of licenses you need to run Windows Server, it's probably good to be cautious about licensing when it comes to running Windows on high-performance chips like Threadripper.
Having a blast with the Ryzen 5 3600X. Only problem is that Win10 appears to have a bug with it. Stutters all over. It only stopped when I reinstalled the newest chipset drivers and set the Ryzen Balanced energy profile. Windows' default energy profiles all stutter, the fans make too much noise, etc. Now the clock varies from ~3.8 to ~4.2; before, it was fixed at 3.79.
Updating this in case anyone is reading: it was caused by faulty SATA drivers ("AMD SATA Controller"). Apparently it doesn't work correctly with Win10. Just revert back to the default Microsoft SATA drivers and the problem is solved. I used a piece of software called "Multimon" to track it down;
Have you tried it in Linux? When the first Threadrippers came out, I remember seeing people having all sorts of problems with them in Windows 10, with the same machines running in Linux outputting double the performance.