Clock speeds seem to have leveled and IPC will only see another gain of 50-100%. Single threaded performance is close to the limit. What after that? Is this the end?
This is commonly claimed but it's actually false for x86_64 desktop parts. For a single core scalar integer workload the IPC boost from i7-2700k to i7-7700k was maybe 20-25% on a great day, but the base frequency increase was a further 20%, and max boost freq increase ~15%. The frequency increase is of similar importance as the IPC increase.
Was it the end of those industries?
Welcome to mature technology.
Economically? I think this or last year.
That's not a 50% or 100% improvement.
Maybe they'll get that improvement once they run recycled rockets all the time, but not before that.
Currently, SpaceX has prices around 56-62 million USD per launch of a normal satellite (with a weight and orbit where they can recover the first stage).
Arianespace launches such lighter satellites in pairs, always two at once, at a price of around 60 million USD per satellite.
The Chinese launchers offer the same at around 70 million USD per launch.
So, the prices aren’t that different.
But, for launches from reused rockets, SpaceX is damn cheap. The first launch on a reused rocket cost below 30 million USD.
So, to recap: Today, in best case, SpaceX is between 4 and 13% cheaper than the next competitor. But in a few years, once they launch mostly reused rockets, they’ll be around 50 to 60% cheaper than the next competitor.
Long term, Musk is shooting for a ~100x reduction in launch costs to make a Mars colony feasible. Hope he makes it.
Also more specialised cores e.g. DSP, and customisable hardware i.e. FPGA.
It wasn't explained in the benchmark, but the only reason I could imagine was the Iris chip worked as an L4 cache because the benchmark was not doing graphics stuff. That is what the Iris chip does, it sits right there in the socket with a whole bunch of memory available for the iGPU or work as L4 cache if available.
It's also a great way to do (almost) zero cost transfers from main memory to (i)GPU memory -- you'd do it at the latency of the L3/L4 boundary. With intel, that unlocks a few GFLOPs of processing power -- in theory, your code would have to be adapted to work this in a reasonable way, of course.
To sum things up, I agree with you, memory is a path that holds big speedups for processors. Don't know if "the Iris way" is the best path, but it indeed showed promise. Shame that Intel decided to lock it up for the ultrabook processors mostly.
My new Thinkpad has nvme and the difference is huge compared to my very fast desktop at work which has SATA connected SSD's.
This is behind much of the interest in machine learning these days. Deep learning provides a way to approximate any computable function as the composition of matrix operations with non-linearities. It does this at the cost of requiring many, many times the computing power. But much of this computing cost can be parallelized and accelerated effectively on the GPU, so with GPU cores still increasing exponentially, at some point it's likely to become more effective than CPUs.
Thanks, and I wish this sentence was one of the first things I read when I was trying to figure out exactly what Deep Learning really meant. It's much more comprehensible than the semi-magical descriptions that seem far more prevalent in introductory articles.
It's also fascinating that a seemingly simple computing paradigm is so powerful, kind of like a new Turing Machine paradigm.
This actually describes neural networks in general, not so much "deep learning".
Deep learning comes from being able to scale up neural networks from having only a few 10s or 100s of nodes per layer, to thousands and 10s of thousands of nodes per layer (and of course the combinatorial explosion of edges in the network graph between layers), coupled with the ability to process and use massive datasets to train with, and ultimately process on the trained model.
This has mainly been enabled by the cheap availability of GPUs and other parallel architectures, coupled with fast memory interconnects (both to hold the model and to shuttle data in/out of it for training and later processing) and the CPU (probably disk, too).
But neural networks have almost always been represented by matrix operations (linear algebra), it's just that there wasn't the data, nor the vast (and cheap) numbers of parallelizable processing elements available to handle it (the closest architectures I can think of that could potentially do it in the 1980/90s would be from Thinking Machines (Connection Machines) and probably systolic array processors (which were pretty niche at the time, mainly from CMU):
These latter machines started to prove some of what we take for granted today, in the form of the NAVLAB ALVINN self-driving vehicle:
Of course, today it can be done on a smartphone:
The point, though, is that neural networks have long been known to be most effectively computed using matrix operations, it's just that the hardware wasn't there (unless you had a lot of money to spend) nor the datasets - to enable what we today call "deep learning".
That, and AI winters didn't help matters. I would imagine that if somebody from the late 1980s had asked for 100 million to build or purchase a large parallel processing system of some form for neural network research - they would've been laughed at. Of course, no one at that time really knew that what was needed was such large architecture, nor the amount of data (plus the concept of convolutional NNs and other recent model architectures weren't yet around). Also - programming for such a system would have been extremely difficult.
So - today is the "perfect storm", of hardware, data, and software (and people who know how to use and abuse it, of course).
>> It does this at the cost of requiring many, many times the computing power. But much of this computing cost can be parallelized and accelerated effectively on the GPU, so with GPU cores still increasing exponentially, at some point it's likely to become more effective than CPUs.
So can be any matrix. Sadly, there aren't as many algorithms that are efficiently represented by one.
Do you know what papers that was? I would have thought that with infinite transistors you could speculatively execute all possible future code paths and memory states at the same time and achieve speedup that way.
Speculation can only take you so far. How do you speculatively execute something like:
a = a + b[x];
You can't even speculatively fetch the second operand until you have real values for b and x.
Trying to model all possible values explodes so much faster than all possible control paths that it's only of very theoretical interest.