For instance, couldn't https://github.com/webyrd/Barliman develop into something that makes computing worth spending on at that level?
We're missing the magic bit that can transform whatever the programmer needs to do into something the thousands of cores can calculate.
For example, the other day a colleague and I tried to optimize a query. This is something where thousands of cores could have tried all sorts of variations of the problem, and we could have sat back and let them figure it out.
The issue is that I don't have any magic way to tell the horde of cores what to try and how to verify the result. Also, there are so many variations to try that I'm not sure it would have been cost effective without some clever SQL-aware thing running the show.
It's dependent on the RDBMS, the schema, and the data (or at least the engine's current stats about the data). The good news, though, is that if you could extract just the stats that your RDBMS engine knows about your data, that could be a pretty small bit of metadata to send over the wire to an army of CPUs. You wouldn't need to send actual database tables over the wire (which would be big and slow, and probably a security red flag).
I once worked on a similar problem which also amounted to "enumerate every possibility in N-dimensional space to find the good ones". It was surprising to me how well this worked in practice. Starting with literally every possible solution and then chopping off obviously bad branches by hand will get you to a 90% solution pretty quick.
This seems like an entirely tractable problem.
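To make the brute-force idea concrete, here's a minimal sketch of "enumerate the whole grid and keep the good ones", assuming a made-up parameter grid and a placeholder evaluate() cost function (a real version would score candidates against the engine's statistics rather than the data itself):

    import itertools
    from concurrent.futures import ProcessPoolExecutor

    # Hypothetical search space: each axis is one knob the query tuner could turn.
    GRID = {
        "join_order":   ["a-b-c", "a-c-b", "b-a-c"],
        "index_choice": ["none", "idx_date", "idx_customer"],
        "batch_size":   [100, 1000, 10000],
    }

    def evaluate(candidate):
        # Placeholder cost model; a real one would replay the plan against the
        # optimizer's statistics.
        cost = sum(len(str(v)) for v in candidate)
        return cost, candidate

    def search():
        candidates = list(itertools.product(*GRID.values()))
        with ProcessPoolExecutor() as pool:   # fan the grid out across all cores
            results = pool.map(evaluate, candidates)
        return min(results)                   # lowest estimated cost wins

    if __name__ == "__main__":
        print(search())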
If you want something deeper you'd need an optimizing SQL compiler. The closest I know of is DSH (Database Supported Haskell), which translates Haskell comprehensions into reasonably optimized SQL. The name is a nod to Data Parallel Haskell, which used the same flattening transformation for automatic parallelization of comprehensions. https://db.inf.uni-tuebingen.de/staticfiles/publications/the...
Though this system can't come up with answers like 'add an index', 'split this table' or 'add sharding', so for the complex cases it doesn't really help.
And that's the answer we ended up with: adding a couple of materialized views with indexes, and rewriting the original query to use them.
Google has at least tens of thousands of cores running builds and tests 24/7. And they're utilized to the hilt. Travis and other continuous build services do essentially the same thing, although I don't know how many cores they have running.
From a larger perspective, Github does significant things "in the background" to make me more productive, like producing statistics about all the projects and making them searchable. (Admittedly, it could do MUCH more.)
I think part of the problem is that it's cheaper to use "the cloud" than to figure out how to use the developer's own machine! There is a lot of heterogeneity in developer machines, and all the system administration overhead isn't worth it. And there's also networking latency.
So it's easiest to just use a homogeneous cloud like Google's data centers or AWS.
There's also stuff like https://github.com/google/oss-fuzz which improves productivity. I do think that most software will be tested 24/7 by computers in the future.
Foundation DB already does this:
"Testing Distributed Systems w/ Deterministic Simulation" by Will Wilson https://www.youtube.com/watch?v=4fFDFbi3toc&t=2s
Autonomous Testing and the Future of Software Development - Will Wilson https://www.youtube.com/watch?v=fFSPwJFXVlw
They have sort of an "adversarial" development model where the tests are "expected" to find bugs that programmers introduce. So it's basically like having an oracle you can consult for correctness, which is a productivity aid. Not exactly, but that would be the ideal.
There's probably lots of potential for modern AI to hack at programmer productivity more directly. Machine learning so far has been more of a complement than a substitute, but I'm imagining a workflow where much of your time goes into writing tests/types/contracts/laws and letting your assistant draft the code to satisfy them. You write a test; by the time you're done, there's a new function waiting for you, coded to satisfy a previous test. You take a look, maybe go "Oh, this is missing a case", mark it incomplete, and add another test to fill it out.
Maybe in the sci-fi future programming looks more like strategic guidance; nearer term perhaps we might see 500 cores going full blast to speed up your coding work by 20% on average. Or maybe not! But it's one idea.
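For what it's worth, the core loop of that workflow can be sketched very crudely: the tests act as the spec, and the machine just searches candidate programs until one satisfies them. Everything below (the candidate space, the tests) is a toy stand-in; real synthesis tools like Barliman search over program syntax with far cleverer pruning than this brute force:

    # ((args), expected) pairs -- the "spec" you write by hand.
    TESTS = [((2, 3), 5), ((0, 7), 7), ((4, 4), 8)]

    # A tiny, hypothetical space of candidate two-argument programs.
    CANDIDATES = {
        "a + b":     lambda a, b: a + b,
        "a * b":     lambda a, b: a * b,
        "a - b":     lambda a, b: a - b,
        "max(a, b)": lambda a, b: max(a, b),
    }

    def satisfies_all(fn):
        return all(fn(*args) == expected for args, expected in TESTS)

    matches = [name for name, fn in CANDIDATES.items() if satisfies_all(fn)]
    print(matches)  # -> ['a + b']; an ambiguous result means you need another test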
You get less and less benefit for each additional CPU you add to the problem, unless the CPU is the main bottleneck.
Also if something requires 4000 CPUs, it is going to start getting expensive if you need to double the output. These types of problems don't scale well.
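The usual way to make that concrete is Amdahl's law: if a fraction p of the work parallelises, n CPUs give you at most 1 / ((1 - p) + p / n). A quick sketch with an assumed 95%-parallel workload shows how fast the extra cores stop paying off:

    # Amdahl's law: speedup on n CPUs if fraction p of the work parallelises.
    def speedup(p, n):
        return 1 / ((1 - p) + p / n)

    p = 0.95  # assume 95% of the work parallelises (illustrative)
    for n in (1, 16, 64, 1000, 4000):
        print(n, round(speedup(p, n), 1))
    # 1 -> 1.0, 16 -> 9.1, 64 -> 15.4, 1000 -> 19.6, 4000 -> 19.9: capped at 20x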
I don't know if there's a whole other edifice of computing out there, built atop decades of thinking in terms of multiple threads, but I have sympathy for the idea that if it is out there, we'd have an awful lot of trouble conceptualising it, and an awful, awful lot of trouble conceptualising it after decades of development.
I don't know what kind of practical problems 4000 CPUs will solve that 16 CPUs can't, but I give weight to the argument that the way we think, and the problems we've created for ourselves and subsequently solved, could have blinded us to them.
You can't easily express "do these two independent things however you can and, when both are finished, do this other thing" in C (or Python, or Java), and it's up to brave compiler writers to figure out (sometimes erroneously) what can be done with the independent execution flows.
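For what it's worth, here's roughly what the explicit version looks like in Python with concurrent.futures; the point is that the fork and the join are bookkeeping you spell out yourself rather than something the language expresses. The two worker functions are hypothetical stand-ins:

    from concurrent.futures import ThreadPoolExecutor

    def fetch_orders():     return [1, 2, 3]         # stand-in for independent task A
    def fetch_customers():  return {"alice", "bob"}   # stand-in for independent task B

    def combine(orders, customers):                   # the "other thing"
        return len(orders), len(customers)

    with ThreadPoolExecutor() as pool:
        a = pool.submit(fetch_orders)                 # fork
        b = pool.submit(fetch_customers)
        result = combine(a.result(), b.result())      # join, then continue
    print(result)  # (3, 2)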
Surely it's solutions (or rather programs), not problems, that are single-threaded? A problem can probably be solved in many ways, and the fact that many programmers will first reach for a single-threaded program to solve it doesn't mean that's the only, or even the best, way to solve it.
I disagree. Each CPU you add to a problem comes with 4 to 8 channels of DDR4 memory (2 on consumer systems, 4 on Threadripper, 6 on Skylake-X). So each CPU increases your memory bandwidth in a very predictable manner.
> Also if something requires 4000 CPUs, it is going to start getting expensive if you need to double the output. These types of problems don't scale well.
Finishing the problem in 1/4000th of the time is often a good enough reason. That turns a problem that takes 10 years to finish into a problem that takes 1 day to finish.
You only get good scaling when all the data fits in memory and the problem scales well, but that happens often enough that it's worth studying these cases.
I am going to be the funny person and say that 4k CPUs can solve a scheduling problem quickly enough that jobs for 10Mi idle CPUs can be assigned on time.
But yeah, there are problems where ~ 200x CPU power can make a lot of difference, especially if you're time bound (that's roughly solving in 2 days what 16 CPUs would solve in 1 year)
Part of the Plan9 design (which is almost 30 years old at this point) was the uniform access to computing resources across machines over the network. Unfortunately, we've mostly abandoned that and stuck with mainframe design.
Surely that's the whole point of high-level languages?
C: "See, you're doing it wrong! I've got a really lean standard library!"
Java: "Ah, so you just focus on the basics. You have a dictionary structure? That's pretty basic."
The same is true of my computer. I once worked at a Unix shop where people would routinely log into other people's computers to do builds. It locked the machine up (this was in the 90s) and made it hard to do anything else on the computer. The whole point of a personal computer is to have all of the power there for your use when you want it.
That's such an interesting comparison, and one I relate to. If availability and startup time are good enough, I'd be happy to use shared resources for both CPU and cars.
As a result of competition, the price of any service tends toward the cost of providing it. So a $2000 computer with 8 cores that lasts 4 years + $500 for power/year + $500 for support/year is $1500/year, or about $0.17/hour. That's roughly $0.021/core/hour, and the difference is a small amount of profit for Amazon and a buffer for when the CPU is idle.
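Just to spell out that arithmetic (same figures as above):

    hardware_per_year = 2000 / 4            # $2000 machine amortised over 4 years
    power_per_year    = 500
    support_per_year  = 500
    total_per_year    = hardware_per_year + power_per_year + support_per_year  # 1500
    per_hour          = total_per_year / (365 * 24)                            # ~0.17
    per_core_hour     = per_hour / 8                                           # ~0.021
    print(round(per_hour, 3), round(per_core_hour, 4))  # 0.171 0.0214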
The other factor is power. That $500 for power for a year won't be $500 if the CPU is running full-tilt all the time, it'll be much more. Modern CPUs are designed to be power-efficient when idle, and then be able to perform many computations quickly when demanded, even if this actually exceeds the hardware's ability to dissipate heat (in which case the CPU throttles itself to avoid overheating). For computers used interactively by people (i.e., laptops, desktops, but not servers), the workload tends to be extremely bursty, with the CPU doing nearly nothing most of the time waiting for the user to do something, and then suddenly having to do a lot of work quickly when demanded (e.g., rendering a bloated webpage or watching a video).
In short, CPU time isn't just being "wasted"; modern CPUs are explicitly designed to be used this way.
It's only a penny for the big industrial users.
But I think electricity tends to be more expensive in other parts of the country.
Also, don't forget that humans come with powerful, high-precision mobile actuators and unmatched sensor arrays, and that humans come pretrained in a large array of complex skills, including object recognition, text-to-speech, etc. My question would actually be the reverse of what is posed in the article: why isn't Mechanical Turk a far more lucrative business than AWS?
So in conclusion, if you have $1M lying around, you're more likely to find a profitable endeavor by renting humans than by renting the same amount of compute capacity in the cloud. The price of a GFLOP, however, is falling at about 10x every 13±3 years, so possibly in 20-30 years things might be different.
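Rough sketch of what that rate implies over the 20-30 year horizon (taking the 13±3 years per 10x at face value):

    # Price drop factor over the horizon.
    for years in (20, 30):
        for tenfold_period in (10, 13, 16):           # the ±3-year uncertainty band
            factor = 10 ** (years / tenfold_period)
            print(years, tenfold_period, round(factor))
    # Somewhere between ~18x and ~1000x cheaper, depending on the horizon and the rate.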
We might need hundreds of hours of CPU time for a simulation, then nothing while we figure out what to do next, then another few hundred hours to run a modification.
Now that CPU time is so readily available vs. the mainframe era or even just the pre-cloud era, there's far less capacity-forced "figure out what to do next" time unless you've got an extremely computationally heavy problem or are very resource-constrained. As such, there's a lot less unmet demand for compute out in the world, which naturally brings cost down.
It isn't practical to buy compute power from a diverse set of CPU owners, because any of them may be malicious, and this problem only increases with scale. The only exceptions are cases where you can afford or mitigate malicious CPU owners, which doesn't lend itself to general computing.
At that point you're optimizing for the market for your good/service relative to its cost, and not optimizing for CPU usage.
Many decades and untold billions in R&D went into modern computers. I wouldn't expect them to be anything but extremely cost-effective, and they are.
With most electronics, CPUs and GPUs included, the killer is heat. As long as you have the heat under control, you're fine. Some applications such as laptops can't keep the thermals under control at sustained 100% CPU usage, so those are obviously at risk.
The only other form of "wear" is electromigration, and I HIGHLY doubt you'll kill a modern processor "in a week", even if you left it running at 95°C.
It was originally designed to soak up CPU cycles on unused desktop machines, but I usually used it in dedicated clusters.
I suppose the modern batch-processing hotness is Docker/Kubernetes, which are very heavy-weight for that usage.
I wonder: if access were democratized, would demand increase? Jevons paradox in action?
1) We hit a brick wall with silicon clock speeds. Silicon apparently can only go 3-4 GHz; after that, there are too many switching losses, too much power used, etc.
2) Because of #1, we jumped on the multi-core bandwagon. This worked OK for a while, but most tasks can only be broken up and run in parallel so much. You can't just throw 1000 cores at every problem and expect it to scale. For anything with user interaction, this is especially true, so there's no point in having more than 4-8 cores on a single-user machine.
3) For the stuff normal people do, there just isn't much demand for more speed any more. How much faster do you need MS Excel to calculate your spreadsheet, or PowerPoint to show you slides?
4) CPU speed isn't the limiting factor much of the time now. Disk, memory, network, user input, etc. are all much more impactful, honestly.
Sure, getting a Blender run down 10% is huge, but what is that time saved compared to how long setting up the render took?
Using photolithography to build features with sizes smaller than the wavelength of the laser being used has been normal for years now. That's the level of ridiculousness we're talking about here.
Data centers are easier to scale by just adding more nodes. Power matters a lot in data centers but otherwise miniaturization is less critical and single threaded performance is less critical.
If you mean the clock speed specifically, then it's largely due to the inability to manufacture smaller gate widths in silicon. The Core 2 architecture by Intel, for example, uses a 45 nanometer gate width for transistors in each core. Core 2 was part of the Penryn family. The latest family is Nehalem, and it, too, uses 45 nanometer gate widths. Core i5 and i7 belong to this family, among others.
Since the gate widths didn't shrink from the Penryn family to the Nehalem family, the power consumption of a single state change in a given transistor didn't decrease. And since heat dissipation (and therefore power consumption) scales with both the energy of each state change (which tracks gate width) and the clock speed, the new architecture couldn't switch its transistors any faster than the previous one without blowing its power budget. Therefore, core clock speeds remained pretty constant.
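The first-order model behind that is dynamic power ~ C * V^2 * f: with the switched capacitance and voltage pinned by the process node, power grows roughly linearly with clock. The numbers below are purely illustrative, only the scaling matters:

    # First-order dynamic power: P ~ C * V^2 * f.
    C, V = 1.0, 1.0                 # normalised switched capacitance and voltage
    for f in (3.0, 4.0, 5.0):       # clock in GHz
        print(f, "GHz ->", C * V**2 * f, "(relative power)")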
Getting to 45 nm was really tough. Going to the next frontier, which will likely be 32 nm, will be even tougher. So tough, in fact, that STMicroelectronics, Freescale Semiconductor, NXP Semiconductor and Texas Instruments have all decided to stop their process research. An article in 2007 claimed that Intel, IBM and Matsushita, AMD and Renesas would be the only organizations still pursuing R&D in this area. That's a vastly reduced set of brains and dollars on the gate width problem.
If your question about "speed" is more general, well, then there's another discussion around multi-core architectures that's also fascinating. The primary technical advances in Nehalem versus its predecessor families are its multithreading, caching, bus and memory management schemes. If you keep each core at 3 GHz, how can you efficiently use two 3 GHz cores to get, say, 1.5 times the speed of a single core? How can you efficiently use four 3 GHz cores to get, say, 1.5 times the speed of two cores? In this respect, processor speeds have increased significantly in the last 5 years, and will continue to do so as software is written to take advantage of these new architectures.
But, then again, when was the last time you really found yourself waiting for your processor? It was probably your disk, your network or your brain that was the bottleneck in the first place. :-)