Why isn’t CPU time more valuable? (johndcook.com)
93 points by panic 13 days ago | 87 comments

You could look at this as "CPUs are cheap, of course we can afford to leave them idle." But a more interesting angle is, "For one programmer's hourly cost, you could run 4000 CPU cores continuously. Can there really be no practical way to apply thousands of cores to boosting the programmer's productivity? What are we missing?"

For instance, couldn't https://github.com/webyrd/Barliman develop into something that makes computing worth spending on at that level?

> What are we missing?

We're missing the magic bit that can transform whatever the programmer needs to do into something the thousands of cores can calculate.

For example, the other day a colleague and I tried to optimize a query. This is something where thousands of cores could have tried all the variations of the problem, and we could have sat back and let them figure it out.

The issue is that I don't have any magic way to tell the horde of cores what to try and how to verify the result. Also there are so many variations to try, I'm not sure it would have been more cost effective without some clever sql-aware thing running the show.

In other words, superoptimization for SQL queries? That's a neat idea. I don't know that anyone has done that before.

It's dependent on the RDBMS, the schema, and the data (or at least the engine's current stats about the data). The good news, though, is that if you could extract just the stats that your RDBMS engine knows about your data, that could be a pretty small bit of metadata to send over the wire to an army of CPUs. You wouldn't need to send actual database tables over the wire (which would be big and slow, and probably a security red flag).

I once worked on a similar problem which also amounted to "enumerate every possibility in N-dimensional space to find the good ones". It was surprising to me how well this worked in practice. Starting with literally every possible solution and then chopping off obviously bad branches by hand will get you to a 90% solution pretty quick.

This seems like an entirely tractable problem.
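
A toy sketch of the enumerate-then-prune approach described above, in Python. The three-dimensional search space (batch size, worker count, cache size), the pruning rule and the cost model are all invented for illustration; nothing here comes from the actual problem.

```python
def search(dimensions, prune, prefix=()):
    """Yield every full assignment, skipping whole branches that `prune` rejects."""
    if prune(prefix):
        return  # chop off this branch and everything below it
    if len(prefix) == len(dimensions):
        yield prefix
        return
    for value in dimensions[len(prefix)]:
        yield from search(dimensions, prune, prefix + (value,))

# Hypothetical space: (batch size, worker count, cache MB)
dimensions = [(1, 10, 100), (1, 4, 16), (0, 64, 512)]

def prune(prefix):
    # Hand-written rule: more workers than items per batch is obviously wasteful.
    return len(prefix) >= 2 and prefix[1] > prefix[0]

def cost(point):
    batch, workers, cache_mb = point
    # Made-up cost model, lower is better.
    return 1000 / (batch * workers) + 0.01 * cache_mb

candidates = list(search(dimensions, prune))
best = min(candidates, key=cost)
print(len(candidates), "of 27 points survive; best:", best)
```

The point is only that the pruning rule is written by hand and applied to partial assignments, so a bad prefix removes its whole subtree before any of it is scored.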

RDBMS do query optimization by trying out different plans and picking the best via cost estimations. With prepared statements you can reuse that optimization so the cost is (kinda) amortized.
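
To make "trying out different plans and picking the best via cost estimations" concrete, here is a deliberately tiny sketch. The table names, row counts, selectivities and the cost formula (sum of estimated intermediate result sizes for a left-deep join) are all assumptions for illustration; real optimizers are far more elaborate.

```python
from itertools import permutations

# Hypothetical statistics, the kind of thing an engine keeps about its data.
ROWS = {"orders": 1_000_000, "customers": 50_000, "regions": 20}
SELECTIVITY = {
    frozenset(("orders", "customers")): 1 / 50_000,
    frozenset(("customers", "regions")): 1 / 20,
}

def plan_cost(order):
    """Estimated cost of a left-deep join order: sum of intermediate row counts."""
    joined = [order[0]]
    rows = ROWS[order[0]]
    cost = 0.0
    for table in order[1:]:
        selectivity = 1.0
        for previous in joined:
            # Pairs without a join predicate default to a cross product.
            selectivity *= SELECTIVITY.get(frozenset((previous, table)), 1.0)
        rows = rows * ROWS[table] * selectivity
        cost += rows
        joined.append(table)
    return cost

best = min(permutations(ROWS), key=plan_cost)
for order in permutations(ROWS):
    print(order, round(plan_cost(order)))
print("chosen:", best)
```

With these made-up numbers the cheapest orders join the two small tables first and bring in `orders` last. Three tables give only six orders; the number of orders grows factorially with the table count, which is where the search gets expensive.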

If you want something deeper you'd need an optimizing SQL compiler. The closest I know is DSH (Database Supported Haskell), which translates Haskell comprehensions into reasonably optimized SQL. The name is a nod to Data Parallel Haskell, which used the same flattening transformation for automatic parallelization of comprehensions. https://db.inf.uni-tuebingen.de/staticfiles/publications/the...

Though this system can't come up with answers like 'add an index', 'split this table' or 'add sharding' so for the complex cases it doesn't really help.

> Though this system can't come up with answers like 'add an index', 'split this table' or 'add sharding' so for the complex cases it doesn't really help.

And that's the answer we ended up with. Adding a couple of materialized views with indexes, and rewriting the original query to utilize the materialized views.

FWIW I think that DOES happen, but it happens on the "wrong" computers!

Google has at least tens of thousands of cores running builds and tests 24/7. And they're utilized to the hilt. Travis and other continuous build services do essentially the same thing, although I don't know how many cores they have running.

From a larger perspective, Github does significant things "in the background" to make me more productive, like producing statistics about all the projects and making them searchable. (Admittedly, it could do MUCH more.)

I think part of the problem is that it's cheaper to use "the cloud" than to figure out how to use the developer's own machine! There is a lot of heterogeneity in developer machines, and all the system administration overhead isn't worth it. And there's also networking latency.

So it's easiest to just use a homogeneous cloud like Google's data centers or AWS.

There's also stuff like https://github.com/google/oss-fuzz which improves productivity. I do think that most software will be tested 24/7 by computers in the future.

Foundation DB already does this:

"Testing Distributed Systems w/ Deterministic Simulation" by Will Wilson https://www.youtube.com/watch?v=4fFDFbi3toc&t=2s

Autonomous Testing and the Future of Software Development - Will Wilson https://www.youtube.com/watch?v=fFSPwJFXVlw

They have sort of an "adversarial" development model where the tests are "expected" to find bugs that programmers introduce. So it's basically like having an oracle you can consult for correctness, which is a productivity aid. Not exactly, but that would be the ideal.

Neat -- I haven't worked at Google but wondered what they might be doing on this score.

There's probably lots of potential for modern AI to hack at programmer productivity more directly. Machine learning so far has been more of a complement than a substitute, but I'm imagining a workflow where a lot of the time you're writing tests/types/contracts/laws and letting your assistant draft the code to satisfy them. You write a test, when you're done you see there's a new function ready for you coded to satisfy a previous test, you take a look and maybe go "Oh, this is missing a case" and mark it incomplete and add another test to fill it out.

Maybe in the sci-fi future programming looks more like strategic guidance; nearer term perhaps we might see 500 cores going full blast to speed up your coding work by 20% on average. Or maybe not! But it's one idea.

What practical problems can 4000 CPUs solve that 16 CPUs can't?

You get less and less benefit for each additional CPU you add to the problem, unless the CPU is the main bottleneck.

Also if something requires 4000 CPUs, it is going to start getting expensive if you need to double the output. These types of problems don't scale well.
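
One common way to put numbers on this kind of diminishing return is Amdahl's law. The comment above doesn't name it, so treat this as an illustration; the 95% parallel fraction is an assumption picked for the example.

```python
def amdahl_speedup(parallel_fraction, cores):
    """Ideal speedup when only `parallel_fraction` of the work can use extra cores."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)

for cores in (16, 4000):
    print(cores, round(amdahl_speedup(0.95, cores), 1))
# 16 9.1
# 4000 19.9
```

Under that assumption, going from 16 to 4000 cores only roughly doubles the speedup, because the 5% serial part dominates.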

We've spent decades developing a huge software/hardware edifice that we all stand atop of, thinking in terms of a single thread. The majority of programmers still have to actively push themselves to think in more than one thread (and why would they - the majority of problems programmers come across are single-threaded).

I don't know if there is a whole other edifice of computing out there, built atop decades of thinking in terms of multiple threads, but I have sympathy for the idea that if it's out there, we'd have an awful lot of trouble conceptualising it, and an awful awful lot of trouble conceptualising it after decades of development.

I don't know what kind of practical problems 4000 CPUs will solve that 16 CPUs can't, but I give weight to the argument that the way we think, the problems we've created for ourselves and subsequently solved, could have blinded us to them.

There is much that can be done that isn't easily done because of the way software and hardware evolved together. We think single threaded because our languages express single threads and because our languages express single threads the processors we use have to do outrageous things to reorder instructions trying to extract some meager parallelism across a couple execution units and do all this behind our backs. And, because they do that well enough, we don't bother inventing many new languages for doing that explicitly.

You can't easily express "do these two independent things as you can and, when finished, do this other thing" in C (or Python, or Java) and it's up to brave compiler writers to figure out (sometimes erroneously) what can be done with the independent execution flows.
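
For comparison, this is roughly what the library-level version looks like in Python with `concurrent.futures`. It rather supports the point: the independence is declared through an executor API bolted onto the language, not in the language itself. The three functions are placeholders invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor

def load_a():
    return [1, 2, 3]

def load_b():
    return [10, 20, 30]

def combine(a, b):
    return [x + y for x, y in zip(a, b)]

with ThreadPoolExecutor(max_workers=2) as pool:
    future_a = pool.submit(load_a)  # the two independent things
    future_b = pool.submit(load_b)
    # .result() blocks, so combine() runs only once both have finished.
    result = combine(future_a.result(), future_b.result())

print(result)  # [11, 22, 33]
```

In standard CPython, threads won't speed up CPU-bound work because of the GIL; `ProcessPoolExecutor` is the usual substitute when the two tasks are compute-heavy.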

> the majority of problems programmers come across are single-threaded

Surely it's solutions (or rather programs), not problems, that are single-threaded? A problem can probably be solved in many ways, and the fact that many programmers will first reach for a single-threaded program to solve it doesn't mean that's the only, or even the best, way to solve it.

One obvious answer is parallel builds. It's an immense waste of developer productivity to force builds on developer machines.

At sufficient speed too. A build on our build farm takes 2 hours, my local mac does it in 50 minutes.

Am I reading this right that a build farm is slower than your local development computer? Shouldn't that launch a re-evaluation of the build farm?

Not just builds but running tests as well. Actually I think I've heard that Google does exactly that (and caches compilation results, so they could just return them if they are ready).
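
The caching part is simple to sketch. This is not a description of Google's system (I'm only going on hearsay there), just the general content-addressed idea: key the result on a hash of the inputs and return the stored output when the key has been seen. `CompileCache` and `fake_compile` are hypothetical names.

```python
import hashlib

class CompileCache:
    """Toy content-addressed cache: same source and flags means same output."""

    def __init__(self, compile_fn):
        self.compile_fn = compile_fn
        self.store = {}
        self.hits = 0

    def build(self, source, flags=()):
        key = hashlib.sha256(repr((source, tuple(flags))).encode()).hexdigest()
        if key in self.store:
            self.hits += 1  # already compiled somewhere, just return it
        else:
            self.store[key] = self.compile_fn(source, flags)
        return self.store[key]

def fake_compile(source, flags):
    # Stand-in for a real compiler invocation.
    return f"obj({len(source)} bytes, flags={list(flags)})"

cache = CompileCache(fake_compile)
first = cache.build("int main() { return 0; }", ["-O2"])
second = cache.build("int main() { return 0; }", ["-O2"])
print(first == second, cache.hits)  # True 1
```

A real tool would also have to fold the compiler version and every included header into the key, otherwise it returns stale results.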

Google's build and test execution infrastructure is both huge in size and absolutely amazing. One of the biggest positive surprises for me when I had my technical onboarding period.

Feel free to tell us more about it! :)

Thank you, I hadn't realized that was out there!

I think the bottleneck there is actually memory. When building LLVM I run out of system memory before I run out of cores.

In a lot of C++ game development this is a huge factor for productivity, especially since a lot of translation units may need to be touched when changing APIs. Being able to utilize an entire idle office for each .cpp file can make a big difference in build times.

Most places doing any non-trivial C++ development will have a centralized distcc and ccache cluster, surely?

Most C++ shops I've worked at use MSVC and Incredibuild, so yes! I think MSVC is the norm in game development, especially for PC games, but I could be wrong on that.

> You get less and less benefit for each additional CPU you add to the problem, unless the CPU is the main bottleneck.

I disagree. Each CPU you add to a problem comes with 4x to 8x memory channels of DDR4 (2x on consumer systems, 4x on Threadripper, 6x on Skylake-X). So each CPU increases your memory bandwidth in a very predictable manner.

> Also if something requires 4000 CPUs, it is going to start getting expensive if you need to double the output. These types of problems don't scale well.

Finishing the problem in 1/4000th of the time is often good enough reason. That turns a problem that takes 10-years to finish into a problem that takes 1-day to finish.

You only get good scaling when all the data fits in memory and the problem scales well, but that happens often enough that it's worth studying these cases.
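
The 10-years-to-1-day figure checks out under perfect scaling, which is the best case and an assumption, not a guarantee:

```python
def ideal_wall_time_days(serial_days, cores):
    """Wall-clock time if the work divides perfectly across cores."""
    return serial_days / cores

print(ideal_wall_time_days(10 * 365, 4000))  # 0.9125 days
```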

> What practical problems can 4000 CPUs solve that 16 CPUs can't?

I am going to be the funny person and say that 4k CPUs can solve a scheduling problem on time so that jobs for 10Mi Idle CPUs can be assigned on time.

But yeah, there are problems where ~ 200x CPU power can make a lot of difference, especially if you're time bound (that's roughly solving in 2 days what 16 CPUs would solve in 1 year)


computational fluid dynamics, FEA, physics based simulations and the like

So many tasks can only take advantage of a single machine (and too often, only a single thread!). While that doesn't change the cost of the CPU time it uses, it does mean that you have to wait much longer to get those N hours of CPU time to save that hour of programmer time. That wait time could completely undo the productivity gains.

It might be useful. For example my IDE uses all cores for build but then those cores idle. If there's local data center nearby (so latency is something like 10ms and speed is gigabit), theoretically my IDE could upload its compiler in advance to cache it there and then just upload those files and get back object files. It'll allow for extremely fast compilation times while I could use very lightweight computer (for example energy-efficient laptop). But I don't think that it's possible to implement that transparently. Software must support this behaviour and developers should carefully decide, because network latency could kill all performance gains.
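
A back-of-the-envelope model of that trade-off. The 10 ms latency and gigabit link are the figures from the comment above; the file count, file size, per-file compile time and core counts are assumptions, and the model ignores sending object files back.

```python
import math

def local_seconds(files, compile_s, cores):
    """Each file compiles on one core; files are processed in waves."""
    return math.ceil(files / cores) * compile_s

def remote_seconds(files, compile_s, cores, file_bytes, rtt_s, bytes_per_s):
    """Same compile model plus one round trip and the upload time."""
    upload = files * file_bytes / bytes_per_s
    return rtt_s + upload + local_seconds(files, compile_s, cores)

GIGABIT = 125_000_000  # bytes per second

# Full rebuild: 200 files, 2 s each, 4 local cores vs 64 remote cores.
print(local_seconds(200, 2.0, 4))
print(remote_seconds(200, 2.0, 64, 50_000, 0.010, GIGABIT))

# Single-file edit: the remote farm has nothing to parallelise.
print(local_seconds(1, 2.0, 4))
print(remote_seconds(1, 2.0, 64, 50_000, 0.010, GIGABIT))
```

With these numbers the full rebuild drops from 100 s to roughly 8 s, while the single-file case is marginally slower remotely. So the win depends on job size, not just on the link.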

Regarding compilation, it's perfectly sensible: distcc has done that for a long time.

Part of the Plan9 design (which is almost 30 years old at this point) was the uniform access to computing resources across machines over the network. Unfortunately, we've mostly abandoned that and stuck with mainframe design.

It always pains me to think of what Could Have Been when it comes to things like this. We should have the everything-is-a-file, perfectly network transparent system, but instead most people are still using the bastard child of DOS and VMS

It's even more paining when you begin to realise that nearly every problem humanity has falls into this category - the solutions exist and are known, but don't get used because we're stuck in a local maximum (one that usually involves bank accounts).

>Can there really be no practical way to apply thousands of cores to boosting the programmer's productivity? What are we missing?

Surely that's the whole point of high-level languages?

That's using the end user's cores. Using something at programming-time could be something like static analysis. Then there are entire tests the programmer doesn't have to write and bugs that never get pushed.

I've seen people write Java or C# like I would write plain C. Those first two are insanely high level with large standard libraries.

I disagree with "Those first two are insanely high level" but totally agree with the "with large standard libraries." part.

Java: "I've got a huge standard library!"

C: "See, you're doing it wrong! I've got a really lean standard library!"

Java: "Ah, so you just focus on the basics. You have a dictionary structure? That's pretty basic."

C: "...no."

BSDs have <sys/queue.h> and <sys/tree.h>; the former is standard on Linux as well.

To my eyes just having generics and hiding away pointers and vtables make them very high level, but I think my argument was more about what is coming with the language (aka their standard libraries) and I've definitely seen people ignore the available libraries, even in python.

You could say the same about many things. My car is idle most of the time. I usually only use it for 1-2 hours a day during the week and less on the weekends. I could use Uber or Lyft instead but I don't. I could rent out my car when I am not using it but I don't. It is idle because I want to use it when I want to use it, without coordinating with anyone.

The same is true of my computer. I once worked at a Unix shop where people would routinely log into other people's computers to do builds. It locked the machine up (this was in the 90s) and made it hard to do anything else on the computer. The whole point of a personal computer is to have all of the power there for your use when you want it.

> My car is idle most of the time. I usually only use it for 1-2 hours a day during the week and less on the weekends. I could use Uber or Lyft instead but I don't. I could rent out my car when I am not using it but I don't. It is idle because I want to use it when I want to use it, without coordinating with anyone.

That's such an interesting comparison, and one I relate to. If availability and startup time are good enough, I'd be happy to use shared resources for both CPUs and cars.

The article contains the answer to the question: the CPU time is not the bottleneck. The bottleneck is the time of qualified people turning real-world problems into the code and interpreting the results. And this time is expensive.

Which, by the way, is also why we don't have dynamically reconfigurable FPGA coprocessors like was once dreamed.

A large part is the fact that memory access is comparatively slow. Getting the data to the core is the challenge. Most processing is so fast that the operation you want to perform is latency-limited by IO issues. Some have suggested offloading compilation to other machines; this works, but efficiency depends on having large enough compilation jobs that this results in net benefit. Similar arguments apply to data analysis - moving the data is too expensive, so you move the code to the data instead.

John is confusing CPU time with CPU-core time. I assume he gets the $0.025/hour figure from "a1.medium", which is 1 core ("vCPU") with 2GB RAM.

As a result of competition, the price of any service tends toward cost of the service. So a $2000 computer with 8 cores that lasts 4 years + $500 for power/year + $500 for support/year is $1500/year or $0.17/hour. That's $0.021/core/hour, and the difference is a small amount of profit for Amazon and a buffer for when the CPU is idle.
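
The arithmetic above, spelled out so the inputs are easy to change. All figures are the ones from this comment; the function name is made up.

```python
HOURS_PER_YEAR = 24 * 365

def cost_per_core_hour(hardware_usd, lifetime_years, power_usd_per_year,
                       support_usd_per_year, cores):
    """Amortised hardware plus yearly running costs, per core per hour."""
    yearly = hardware_usd / lifetime_years + power_usd_per_year + support_usd_per_year
    return yearly / HOURS_PER_YEAR / cores

per_core = cost_per_core_hour(2000, 4, 500, 500, 8)
print(round(per_core * 8, 2))  # 0.17 per machine-hour
print(round(per_core, 4))      # 0.0214 per core-hour
```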

>So a $2000 computer with 8 cores that lasts 4 years + $500 for power/year + $500 for support/year is $1500/year or $0.17/hour. That's $0.021/core/hour, and the difference is a small amount of profit for Amazon and a buffer for when the CPU is idle.

The other factor is power. That $500 for power for a year won't be $500 if the CPU is running full-tilt all the time, it'll be much more. Modern CPUs are designed to be power-efficient when idle, and then be able to perform many computations quickly when demanded, even if this actually exceeds the hardware's ability to dissipate heat (in which case the CPU throttles itself to avoid overheating). For computers used interactively by people (i.e., laptops, desktops, but not servers), the workload tends to be extremely bursty, with the CPU doing nearly nothing most of the time waiting for the user to do something, and then suddenly having to do a lot of work quickly when demanded (e.g., rendering a bloated webpage or watching a video).

In short, CPU time isn't just being "wasted"; modern CPUs are explicitly designed to be used this way.

Where I live (Texas) electricity costs about $0.08/kWh, which means $500 per year will buy about 700 watts of continuous power. That's much higher than the TDP of any 8-core processor on the market. Even if you have more expensive power and account for cooling costs, it's hard to see how you could end up spending much more than $500/year to power a single machine.
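
The conversion behind that "about 700 watts" figure, using the $500/year budget and the $0.08/kWh rate from this comment:

```python
def continuous_watts(budget_usd_per_year, usd_per_kwh):
    """Constant power draw that a yearly electricity budget pays for."""
    kwh_per_year = budget_usd_per_year / usd_per_kwh
    return kwh_per_year * 1000 / (24 * 365)

print(round(continuous_watts(500, 0.08)))  # 713
```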

Just for comparison, where I am (Massachusetts) it's $0.23/kWh or so for residential service.

It's $0.05 here in Florida. Cost of living is no joke.

It's only a penny for the big industrial users.

$0.05 for both generation _and_ transmission charges? I would have expected that to be more like $0.09 or $0.10 for Florida... Still a good bit better than MA, of course. :)

There are some extra things, like a fuel charge. It comes to about $0.08 or $0.09 adding those things in, plus about $8 per month just to be connected.

Yeah, I wasn't looking into how realistic that $500 number was, I just accepted it from the OP as a given.

But I think electricity tends to be more expensive in other parts of the country.

Datacenters are placed wherever energy can be easily and cheaply acquired. Some may negotiate special deals with local utility companies, I don't have hard numbers on what datacenters usually pay for electricity though (assuming they don't provide their own).

Maybe if you're in CA. Most other places in the US are less than $0.14/kWh.

The human brain has the equivalent of at least 30 TFLOPS of computing power [1]. Getting the same computing power from CPUs at $6.3/TFLOPS in AWS would cost $189/hr [2]. So renting a human as a "general AI computer" is more than 27 times less expensive at the moment.

Also, don't forget that humans come with powerful high precision mobile actuators and unmatched sensor arrays. And that humans come pretrained in a large array of complex skills including object recognition, text to speech etc. My question would actually be the reverse of what is posed in the article: Why isn't Mechanical Turk a far more lucrative business than AWS?

So in conclusion, if you have $1M lying around, you are more likely to find a profitable endeavor by renting humans than by renting the same amount of compute capacity in the cloud. The price of GFLOPS is falling, however, by about 10X every 13±3 years, so in 20-30 years things might be different.

[1] https://aiimpacts.org/brain-performance-in-flops/

[2] https://aiimpacts.org/recent-trend-in-the-cost-of-computing/
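
The arithmetic behind the $189/hr and "27 times" figures, reading $6.3/TFLOPS as a price per TFLOPS per hour (that unit is my assumption; the comment doesn't state it):

```python
def hourly_cost_usd(tflops, usd_per_tflops_hour):
    """Cost of renting a given amount of compute for one hour."""
    return tflops * usd_per_tflops_hour

brain_equivalent = hourly_cost_usd(30, 6.3)
implied_human_rate = brain_equivalent / 27

print(round(brain_equivalent, 2))    # 189.0
print(round(implied_human_rate, 2))  # 7.0
```

So "more than 27 times less expensive" corresponds to a human costing under about $7/hr.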

Because cost does not equal value. This is called the consumer surplus, the difference between how much you're willing to pay for a product and the market price. For example, most people would be willing to pay more for internet access than they currently do.

I think he hit the primary reason in this paragraph from the article:

We might need hundreds of hours of CPU time for a simulation, then nothing while we figure out what to do next, then another hundred hours to run a modification.

Now that CPU time is so readily available vs. the mainframe era or even just the pre-cloud era, there's far less capacity-forced "figure out what to do next" time unless you've got an extremely computationally heavy problem or are very resource-constrained. As such, there's a lot less unmet demand for compute out in the world, which naturally brings cost down.

This always bothered me too. Both when computers were expensive to me and now cheap. I get paid a good amount of money during the day to use a computer to compute things, display/interface with me, to communicate to humans and other computers. Then my computer is idle/sleep when I am not there. This feels inherently wrong to me. How is it that I cannot come up with something useful for my computer to do when I am not there to make real contributions when I resume work? (disclaimer: engineer who programs, but not a computer scientist)

Isn't this the point of Folding@Home, SETI, etc...?

Depends on what you do. When 3d modeling, I'll spend the day working on a project, then have it render overnight.

I think serverless computing attempts to address this issue. You have an image that you can spin up whenever and access like a normal desktop, but when it isn't in use the resources go back into a pool that is managed by some authority. It is then the responsibility of the authority to load balance the resources and ideally optimize out wasted resources. This makes it cheaper for everyone.

I think the answer is that CPU time doesn't have consistently high value to the CPU owner, and isn't a free market as CPU time is non-transferable due to security concerns.

It isn't practical to buy compute power from a diverse set of CPU owners, because any of them may be malicious, and this problem only increases with scale. The only exceptions are cases where you can afford or mitigate malicious CPU owners, which doesn't lend itself to general computing.

Part of the problem is that most entities who use CPUs have more than they need. If you want to make money with your CPU then you need to use it to produce a good or service worth more than a few cents an hour. The issue is that everyone else also has CPUs, so whatever you do can't be trivial.

At that point you're optimizing for the market of your good/service relative to its cost, and not optimizing for CPU usage.

It's commoditised very effectively. So the cost is capital(buy computer) + operating(electricity+sysadmins) + margin(tiny).

I don't really understand why you'd expect CPU time to be more valuable. If anything you'd expect it to be cheaper, given how the majority of CPU time goes unused for anything useful.

Many decades and untold billions in R&D went into modern computers. I wouldn't expect them to be anything but extremely cost-effective, and they are.

The author of the blog entry is looking at things from a different angle. He knows the cost is low because demand is low. He's trying to figure out why demand is low. He wants to find a way to make that idle time useful time that produces value greater than the cost.

Once upon a time it was extremely valuable, see timeshare systems of old. The capital cost was so high that if it wasn't loaded pretty much all the time you were shoveling money down the drain.

Fuzzing is a great use of spare CPU time…
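
For anyone who hasn't seen one, a minimal random fuzzer fits in a few lines. `parse_percent` is a deliberately buggy toy target invented for this example; real fuzzers such as the ones behind oss-fuzz are coverage-guided and far smarter about generating inputs.

```python
import random

def parse_percent(text):
    """Toy target with a planted bug: it indexes into a possibly empty string."""
    if text[-1] == "%":
        text = text[:-1]
    return int(text) / 100

def fuzz(target, trials=10_000, seed=0):
    """Throw short random strings at `target`, keep one input per exception type."""
    rng = random.Random(seed)
    alphabet = "0123456789%"
    crashes = {}
    for _ in range(trials):
        data = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 4)))
        try:
            target(data)
        except Exception as exc:
            crashes.setdefault(type(exc).__name__, data)
    return crashes

crashes = fuzz(parse_percent)
print(sorted(crashes))
```

Each trial is independent, which is why this kind of work spreads across idle cores so easily.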

We do have an energy constraint (i.e. catastrophic global warming), and idle CPUs use less energy.

I tend to agree with you, but CPUs only use a percent or two of our energy and idle CPUs still use power. It seems to me that there are lower hanging fruit that we should pluck first.

Overuse reduces the lifetime of CPUs. I'm pretty sure running a CPU at full throttle for a week will burn it out.

I'd like to see a source on that.

With most electronics, CPUs and GPUs included, the killer is heat. As long as you have the heat under control you're fine. Some devices, such as laptops, can't keep the thermals under control at sustained 100% CPU usage, so those are obviously at risk.

The only other form of "wear" is electromigration, and I HIGHLY doubt you'll kill a modern processor "in a week", even if you left it running at 95C.

Bollocks. You can run at 100% load 24/7 for years; if it's not overvolted too much, it won't degrade in any noticeable way.

We do not have an energy constraint. The amount of wind energy and solar energy hitting the earth at any one time is enormous.

We have a usable energy constraint.

Also: superoptimization. Would anyone pay for SOaaS?

Yeah, but what about GPU time? It has unfortunately become an arms race in AI, and everyone who doesn't work for a big co is essentially precluded from doing research or innovating.

GPU time is available by the minute from AWS/GCP/etc and it's dirt cheap. If you think it's expensive, I envy your youth.

Hey! With the article about Cringely and this mentioning Condor (https://research.cs.wisc.edu/htcondor/), it's flashbacks week.

It was originally designed to soak up CPU cycles on unused desktop machines, but I usually used it in dedicated clusters.

I suppose the modern batch-processing hotness is Docker/Kubernetes, which are very heavy-weight for that usage.

There's an abundance of CPU power. Access to those CPUs is one problem, but that is essentially why two-sided marketplaces exist. They've emerged for this exact problem in other categories (as pointed out in other comments), such as AirBNB/VRBO (housing), Turo/GetAround (cars), liquidspace (offices), boatsetter (boats).

I wonder, if access was democratized, would demand increase? Jevons paradox in action?

I wonder if this isn't why Moore's Law appears to be slowing down. Maybe there are still huge gains to be made but the economic drive is not there.

It's slowed down for several reasons:

1) We hit a brick wall with silicon clock speeds. Silicon apparently can only go 3-4GHz; after that, there are too many switching losses, too much power used, etc.

2) Because of #1, we jumped on the multi-core bandwagon. This worked OK for a while, but most tasks can only be broken up and run in parallel so much. You can't just throw 1000 cores at every problem and expect it to scale. For anything with user interaction, this is especially true, so there's no point in having more than 4-8 cores on a single-user machine.

3) For the stuff normal people do, there just isn't much demand for more speed any more. How much faster do you need MS Excel to calculate your spreadsheet, or PowerPoint to show you slides?

I'd also argue:

4) cpu speed isn't the limiting factor many times now. Disk, memory, network, user input etc all are much more impactful honestly.

Sure getting a Blender run down 10% is huge, but what is that time saved compared to how long setting up the render took?

The actual Moore's law is about integration of transistors. That is mostly slowing down because the physics of semiconductor manufacturing have been getting increasingly ridiculous for the last 5-10 years.

Using photolithography to build features with sizes smaller than the wavelength of the laser being used has been normal for years now. That's the level of ridiculousness we're talking about here.

(3) is what I'm talking about though. There is not enough demand for smaller and lower power processors because at the edge where size and power matter the most there are not enough applications that can profitably use more power.

Data centers are easier to scale by just adding more nodes. Power matters a lot in data centers but otherwise miniaturization is less critical and single threaded performance is less critical.

If my computer didn't overheat and throttle the CPU any time you push it over 50% for more than 5 minutes at a time, I might buy this.

Depends on what you mean by speeds, but there's a huge confluence of reasons that processors are where they are today.

If you mean the clock speed specifically, then it's largely due to the inability to manufacture smaller gate widths in silicon. The Core 2 architecture by Intel, for example, uses a 45 nanometer gate width for transistors in each core. Core 2 was part of the Penryn family. The latest family is Nehalem, and it, too, uses 45 nanometer gate widths. Core i5 and i7 belong to this family, among others.

Since the gate widths didn't shrink from the Penryn family to the Nehalem family, the power consumption of a single state change in a given transistor didn't decrease. Since the heat dissipation (and, therefore, power consumption) is proportional to both the gate width and the clock speed, this new architecture couldn't change the state of the transistors any faster than the previous one. Therefore, core clock speeds remained pretty constant.

Getting to 45 nm was really tough. Going to the next frontier, which will likely be 32 nm, will be even tougher. So tough, in fact, that STMicroelectronics, Freescale Semiconductor, NXP Semiconductor and Texas Instruments have all decided to stop their process research. An article in 2007 claimed that Intel, IBM and Matsushita, AMD and Renesas would be the only organizations still pursuing R&D in this area. That's a vastly reduced set of brains and dollars on the gate width problem.

If your question about "speed" is more general, well, then there's another discussion around multi-core architectures that's also fascinating. The primary technical advances in Nehalem versus predecessor families are its multithreading, caching, bus and memory management schemes. If you keep each core at 3 GHz, how can you efficiently use two 3 GHz cores to get, say, 1.5 times the speed of a single core? How can you efficiently use four 3 GHz cores to get, say, 1.5 times the speed of two cores? In this respect, processor speeds have increased significantly in the last 5 years, and will continue to do so as software is written to take advantage of these new architectures.

But, then again, when was the last time you really found yourself waiting for your processor? It was probably your disk, your network or your brain that was the bottleneck in the first place. :-)
