
Fifty or Sixty Years of Processor Development for This? - curtis
https://www.eejournal.com/article/fifty-or-sixty-years-of-processor-developmentfor-this/
======
nostrademons
Wonder what this means for system software and application development.

There's a factor of 10-40x speedup from going from an interpreted language
like Python/Ruby/PHP to a tight compiled one like C++/Rust/OCaml, and another
2-4x from going from a good JIT like V8 or HotSpot (or Go's runtime, though
technically not a JIT).
Probably another 10-100x by cutting out bloated middleware like most web
frameworks or the contents of your node_modules.

All this was irrelevant when you could get your 2-4x speedup by waiting 18
months, and your 10x speedup by waiting 5 years. It's very relevant when your
2x now takes 20 years and 10x takes a lifetime. Maybe this is why Rust has
been getting so much attention recently.

~~~
jashmatthews
I run a production Rust web service. The speedup for this service over a
slightly stripped-down Rails was only about 5x. As you said, you can gain like
50-100x performance improvements from not using the default Rails JSON
serialization and skipping ActiveRecord.

After that, you're lucky to gain 5x performance from re-writing the whole
thing in Rust. Most of the hot spots of serving web applications using Ruby
are already written as native extensions.

I think Rust is fantastic. I'm writing a tinyrb-like "Ruby" VM in Rust at the
moment. But... it's just not worth the hassle for plugging web services
together. Maybe if you're at Google scale and already have web services in C++
it'd be a good choice.

~~~
0xffff2
I find the fact that anyone can speak dismissively about a 5x speedup
disheartening. Has anyone ever done a study on how much CO2 we are emitting in
the name of "developer productivity"?

~~~
nostrademons
It's probably less than you think. Humans - just by virtue of existence -
produce a _huge_ amount of CO2, both through the air they breathe, the meat
they eat, the automobiles they get to work in, the heavy machinery used to
build those roads & buildings, the manufactured goods they consume, etc. And
the CO2 cost of a developer isn't just that one developer's emissions; it's
also those of all the support staff needed: from managers/admins/HR at work,
to the food service workers that serve them meals out, to the
doctors/lawyers/therapists and other service providers they visit, to the
parents that raised them.

It's almost certain one developer generates more CO2 than any reasonable
number of servers that run their code. Anything that reduces manpower costs is
a net positive for emissions. Besides, when the equation changes (say, when
the software enters maintenance mode but the servers stay up), there'll be a
strong economic incentive to spend the developer time to rewrite it more
efficiently.

~~~
nkurz
_It's almost certain one developer generates more CO2 than any reasonable
number of servers that run their code._

I'm not so sure. Let's do a back-of-the-envelope estimate.

Assume a single really hefty server that consumes 1 kilowatt. Over one year,
that's about 10,000 kWh. One kWh of electricity produced by a coal-fired
plant generates about 1 kg of CO2
([https://carbonpositivelife.com/co2-per-kwh-of-electricity/](https://carbonpositivelife.com/co2-per-kwh-of-electricity/)).
Thus that big server running for a year produces about 10 metric tons of CO2.

An average American lifestyle (all in, total country production divided by
population,
[https://www.theguardian.com/environment/datablog/2009/sep/02...](https://www.theguardian.com/environment/datablog/2009/sep/02/carbon-emissions-per-person-capita))
involves the production of about 20 tons of CO2 per year. So if you write
code that runs full-time on more than 2 really big servers, your code might
be producing more CO2 than the rest of your lifestyle.

I'm guessing that most of the errors in this are probably overestimating the
code's CO2 (probably not coal-fired, probably less than 1 kW, a year is less
than 10,000 hours), so more realistically maybe it's 4-8 servers to break
even? Still, I think it's fair to say that there are some participants in this
forum whose running code probably generates more CO2 than the rest of their
lifestyle.
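
For what it's worth, here's the same back-of-the-envelope estimate as a
runnable C sketch (every figure is the comment's own assumption, not a
measurement):

    #include <stdio.h>

    int main(void) {
        double server_kw      = 1.0;           /* one really hefty server   */
        double hours_per_year = 24.0 * 365.0;  /* ~8,760 h ("about 10,000") */
        double kg_co2_per_kwh = 1.0;           /* coal-fired grid figure    */
        double lifestyle_tons = 20.0;          /* average American, all-in  */

        double server_tons =
            server_kw * hours_per_year * kg_co2_per_kwh / 1000.0;
        printf("one server:  %.1f t CO2/yr\n", server_tons);   /* ~8.8 */
        printf("break-even:  %.1f servers\n", lifestyle_tons / server_tons);
        return 0;
    }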

------
narrator
Speaking of Moore's law being dead this time, check out this old article from
2012 predicting we would be at 7nm Intel chips with 5nm on the way:
[http://www.tomshardware.com/news/intel-cpu-processor-5nm,175...](http://www.tomshardware.com/news/intel-cpu-processor-5nm,17578.html)

Intel is still trying to figure out 10nm because it is rumored that there are
material science problems that are causing yield issues. Remember the 1960s
when rapid gains in space tech made everyone think we'd be travelling around
the solar system by 2000? The tech hit a plateau and stopped. Maybe we're in
that situation with chip technology...

~~~
atomicnumber1
Why do we have to go below 10nm at all? We'll have hard physics limitations.
Can't we improve on other frontiers? Say, more cache, better design, more
cores? Etc. I don't know.

~~~
tremon
Probably the #1 area that can produce results is avoiding/conquering the
processor-memory gap: while processor performance has been growing
exponentially, memory (bandwidth) performance has basically grown linearly.
There is now a factor-of-1,000 difference between processor and memory speeds
compared to 1980.

One of the areas that I have much hope for is near-data processing: since
processors scale so much better, pretty much every peripheral device already
has its own microcontroller. The idea behind NDP is basically to offload some
data-heavy processing to the data layer. What if your disk layer could already
preselect your data so the database wouldn't have to read and discard so many
rows for each query? What if the network controller could evaluate your
firewall rules itself, so dropped packets wouldn't have to interrupt the main
CPU?
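
As a toy sketch of that idea (every name here is hypothetical, invented for
illustration): the host ships a simple predicate to the drive's controller
and gets back only the matching rows, instead of reading everything and
filtering host-side.

    #include <stdint.h>
    #include <stddef.h>

    struct row { uint32_t key; uint32_t value; };

    /* Predicate evaluated by the drive's onboard microcontroller. */
    typedef int (*ndp_pred)(const struct row *r);

    /* Runs on the drive: scan a block of rows, copy out only the matches. */
    size_t ndp_scan(const struct row *rows, size_t n,
                    ndp_pred pred, struct row *out)
    {
        size_t matches = 0;
        for (size_t i = 0; i < n; i++)
            if (pred(&rows[i]))
                out[matches++] = rows[i];
        return matches;   /* the host transfers 'matches' rows, not 'n' */
    }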

~~~
CountHackulus
So NDP is essentially what Commodore did with their 1541 disk drive. That disk
drive had a 6502 in there to complement the 6502 in the actual VIC-20.

From what I remember, the IBM System Z mainframes also do this sort of thing
and have dedicated IO processors that can decode XML on the fly for you and
other fun things like that.

~~~
jacquesm
Every modern hard drive is a computer in its own right.

~~~
sliken
Seagate had a cool project where each hard drive ran Linux, and they used the
physical SAS cable to run a 2.5 Gbit network (or two, actually) per drive.

So you could use that as block storage for Lustre, Hadoop, or similar and
enable things like direct disk-to-disk copies.

Cool idea, seems unlikely to hit a reasonable price point though.

------
api
The thing that really killed plain vanilla RISC is memory latency. Compared to
on-die registers and cache, memory might as well be disk. True RISC is more
efficient to execute, but it results in more instructions and hence more code
that has to be read from RAM.

Modern CISC chips that immediately unpack CISC into RISC micro-ops are really
something that I've termed "ZISC" -- Zipped Instruction Set Computing. Think
of CISC ISAs like the byzantine x86_64 ISA, with all its extensions, as a
custom data compression codec for the instruction stream.

We got ZISC accidentally and IMHO without us realizing what we'd actually
done. The x86_64 "codec" was not explicitly designed as such but resulted from
a very path-dependent "evolutionary walk" through ISA design space. I wonder
what would happen if we explicitly embraced ZISC and designed a custom codec
for a RISC stream that can be decompressed very efficiently in hardware? Maybe
the right approach would be a CPU with hundreds of "macro registers" that
store RISC micro-op chunks. The core instruction set would be very
parsimonious, but almost immediately you'd start defining macros. Of course
multitasking would require saving and restoring these macros which would be
expensive, so a work-around for that might be to have one or maybe a few
codecs system-wide that are managed by the OS rather than by each application.
This would make macro redefinition rare. Apps are compiled into domain
specific instruction codec streams using software-defined codec definitions
managed by the OS.

The neat thing about this hypothetical ZISC is that while 99% of apps might
use the standard macro set you could have special apps that did define their
own. These could be things like cryptographic applications, neural networks,
high performance video encoders, genetic algorithms, graphics renderers,
cryptocurrency miners, etc. Maybe the OS would reserve a certain number of
macros for user application use.
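
A software sketch of what that decode step might look like (entirely
hypothetical; the names and sizes are invented): the compiled stream is a
sequence of macro IDs, and the front end expands each ID into a chunk of RISC
micro-ops from an OS-managed dictionary.

    #include <stdint.h>
    #include <stddef.h>

    #define MAX_MACROS 256   /* the "hundreds of macro registers" */
    #define MAX_CHUNK    8   /* micro-ops per macro */

    typedef uint32_t uop;    /* one RISC micro-op */

    struct macro_table {     /* set up by the OS, rarely redefined */
        uop     chunk[MAX_MACROS][MAX_CHUNK];
        uint8_t len[MAX_MACROS];
    };

    /* "Decompress" a compiled stream of macro IDs into micro-ops. */
    size_t zisc_decode(const struct macro_table *t,
                       const uint8_t *stream, size_t n, uop *out)
    {
        size_t emitted = 0;
        for (size_t i = 0; i < n; i++) {
            uint8_t id = stream[i];
            for (uint8_t j = 0; j < t->len[id]; j++)
                out[emitted++] = t->chunk[id][j];
        }
        return emitted;
    }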

~~~
deepnotderp
I agree with a lot of what you said, but ZISC already stands for zero
instruction set computing.

Also, RISC and CISC instruction cache hitrates are pretty similar.

~~~
api
Ahh I forgot about zero instruction set computing. Maybe CISC should just
stand for Compressed Instruction Stream Computing because on today's chips
that's exactly what it is.

Cache hit-rates being similar may just show that the ad-hoc evolved
compression codecs represented by CISC instruction sets are sub-optimal, hence
my point about what might happen if we intentionally designed a CPU with on-
board compression codec support for the instruction stream.

------
hinkley
At the end of this he says transistors are now doubling every twenty years(!?)
and it reminded me of another law Patterson doesn't include in his graph:

    Proebsting's Law: improvements to compiler technology double the
    performance of typical programs every 18 years.
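
Annualized, the contrast with hardware's classic 18-month doubling is stark;
a quick sketch of the arithmetic:

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        /* Annual improvement implied by a given doubling period, in years. */
        printf("hardware  (doubles every 1.5 yr): %4.1f%%/yr\n",
               100.0 * (pow(2.0, 1.0 / 1.5) - 1.0));   /* ~58.7%/yr */
        printf("compilers (doubles every 18 yr):  %4.1f%%/yr\n",
               100.0 * (pow(2.0, 1.0 / 18.0) - 1.0));  /* ~3.9%/yr  */
        return 0;
    }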

~~~
marvy
The derivation of that law is very suspect:

[http://proebsting.cs.arizona.edu/law.html](http://proebsting.cs.arizona.edu/law.html)

(go on, it's just a paragraph.)

The key issue this ignores, in my opinion, is that a compiler optimization
will rarely make last year's program faster, but it will make next year's
program faster. Why? Because if the compiler can't make an optimization,
programmers will do it by hand, even if it makes the code worse in some way.

For instance, if your C compiler can't inline small functions, you would use a
macro instead. When it finally learns to inline, your program won't get any
faster, but the next version will be able to use functions in places where
macros are a bad fit.
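
A concrete C version of that workaround (illustrative only):

    /* Macro: no call overhead even without an inliner, but it evaluates its
       argument twice -- the "worse in some way" part. */
    #define SQUARE(x) ((x) * (x))

    /* Function: clean semantics, but only fast once the compiler inlines it. */
    static inline int square(int x) { return x * x; }

    int f(int (*next)(void)) {
        /* SQUARE(next()) would call next() twice; square(next()) cannot. */
        return square(next());
    }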

Pile up enough of these optimizations, and eventually it starts to feel as if
you're coding in a higher-level language than before, even though the syntax
that's accepted by the compiler never changed.

~~~
hyperpallium
> programmers will do it by hand

Only if better performance is needed.

Thus, corollary: compiler technology will double program performance every 18
years, _but only if it doesn't matter_.

~~~
hinkley
Developers have a nasty habit of convincing themselves that things aren’t
needed when they see them as too difficult. Even if the rest of the world
thinks your code is too slow you can convince yourself it’s good enough.

And in a world where we rely more and more on libraries, my ability to improve
on a piece of code is greatly curtailed. Sending in the compiler to help might
be my best option.

------
Animats
Yes, we're kind of stuck on individual CPU power. Clocks have been around 3GHz
for a decade now.

There are now architectures other than CPUs that matter. GPUs, mostly. "AI
chips" are coming. And, of course, Bitcoin miners. All are massively parallel.
What hasn't taken off are non-shared-memory multiprocessors. The Cell was the
only one ever to become a mass market product, and it was a dud as a game
console machine.

~~~
PostOnce
Perhaps it (Cell) would not have been a "dud" as you put it had IBM not been a
morally bankrupt villain.

I've read that Sony was under the impression that the licensing agreement
meant that IBM would market Cell tech to other customers, those customers
being in other computer markets like datacenters and stuff, rather than to
Microsoft, for the 360, at the same time that the PS3 was still in
development.

"As the book relates, the Power core used in the Xbox 360 and the PS3 was
originally developed in a joint venture between Sony, Toshiba and IBM. While
development was still ongoing, IBM–which retained the rights to use the chip
in products for other clients–contracted with Microsoft to use the new Power
core in their console. This arrangement left Sony engineers in an IBM facility
unknowingly working on features to support Sony’s biggest competitor, and left
Shippy and other IBM engineers feeling conflicted in their loyalties."

from [http://gamearchitect.net/2009/03/01/the-race-for-a-new-game-...](http://gamearchitect.net/2009/03/01/the-race-for-a-new-game-machine/)

(it's a book, and worth reading)

~~~
slavik81
I have not read the book, but I have a hard time imagining any world in which
the Cell could possibly be successful. Its heterogeneous architecture thrust a
huge amount of complexity onto software developers in exchange for meager
gains. Writing good code for it was difficult and expensive compared to other
platforms. Sony was just completely out of touch with reality.

In 2007, Gabe Newell famously complained that the Cell was "a waste of
everybody's time. Investing in the Cell, investing in the SPE gives you no
long-term benefits. There's nothing there that you're going to apply to
anything else. You're not going to gain anything except a hatred of the
architecture they've created."

~~~
wolfgke
> I have not read the book, but I have a hard time imagining any world in
> which the Cell could possibly be successful. Its heterogeneous architecture
> thrust a huge amount of complexity onto software developers in exchange for
> meager gains. Writing good code for it was difficult and expensive compared
> to other platforms. Sony was just completely out of touch with reality.

This was a different time. At that time researchers tried to build clusters
out of PS3s - because the speed advantages of the Cell made it worthwhile,
and "regular" Cell clusters were much more expensive. Some years later GPGPU
became feasible and one could foresee that it would become faster than the
Cell too in the near future - and at that point the same kind of researchers
dropped their PS3 clusters and built GPGPU clusters. Don't tell me that
GPGPU, particularly in the beginning, was easier to program for than the Cell.

It was also the time when Apple switched to Intel CPUs. I know at that time
IBM was also trying to sell the Cell to Apple, but Steve Jobs refused and
opted for Intel instead.

This decision by Apple, and the decisions of researchers to stop tinkering
with PS3 clusters and build GPGPU clusters instead, were in my opinion the two
landslides after which the fate of the Cell was sealed.

~~~
slavik81
I'm sorry, but I pretty much entirely disagree.

> Don't tell me that GPGPU, particularly in the beginning, was easier to
> program for than the Cell.

The alternative was to use bog-standard homogeneous cores.

Yes, the air force bought a compute cluster of PS3s for some specialized
calculations. I wouldn't read too much into that. It says little about the
suitability of the architecture for more general purpose computing.
Supercomputers were always weird.

> It was also the time when Apple switched to Intel CPUs.

I don't believe there was much chance of Apple moving to Cell. Their switch to
Intel was because IBM could no longer seriously compete outside of a few
niches. There's nothing positive to infer from IBM's unsuccessful pitch to
Jobs.

> This decision by Apple, and the decisions of researchers to stop tinkering
> with PS3 clusters and build GPGPU clusters instead, were in my opinion the
> two landslides after which the fate of the Cell was sealed.

You're assigning far more importance to research group purchases than I think
is warranted. They don't buy enough to create economies of scale. That's why
researchers so frequently adopt consumer products already manufactured at
scale, like the Novint Falcon, Microsoft Kinect, and gaming graphics cards.

The Cell was best-in-class for a few specialized use cases, but it was never
going to take the world by storm. If we turn to a heterogeneous architecture
in the future, it will be begrudgingly, after all simpler alternatives have
been exhausted.

~~~
wolfgke
> You're assigning far more importance to research group purchases than I
> think is warranted. They don't buy enough to create economies of scale.
> That's why researchers so frequently adopt consumer products already
> manufactured at scale, like the Novint Falcon, Microsoft Kinect, and gaming
> graphics cards.

This is true, but I have to disagree about the consequences: very often, from
this kind of "abuse" of consumer products for research purposes, quite
interesting applications emerge that _do_ become popular and economically
important. For example, from such research came the idea of using the Kinect
as a 3D scanner - and from this, commercial applications emerged. Or from
GPGPU (which NVidia was at the beginning quite the opposite of enthusiastic
about), CUDA and later OpenCL emerged (which are much better to program with
than abusing vertex and fragment shaders).

That is why I considered it quite important for the future of the Cell when
researchers went from tinkered-together PS3 clusters to GPGPU, and called this
a "landslide" event for the future of the Cell.

------
monochromatic
Mirror:
[https://web.archive.org/web/20180404023027/https://www.eejou...](https://web.archive.org/web/20180404023027/https://www.eejournal.com/article/fifty-or-sixty-years-of-processor-developmentfor-this/)

~~~
jaytaylor
Also here: [https://archive.is/tY5Cl](https://archive.is/tY5Cl)

------
tfmkevin
The problem is that we have designed ourselves into an architectural cul-de-
sac when it comes to processors. We have fifty-plus years of evolution on
programming methodologies built on top of von Neumann architectures. Moore's
Law has given us decades of exponential gain without significant challenge to
that architecture, and now that Moore's Law is reaping diminishing returns in
terms of compute performance we are in the situation where we'd have to go
backward forty years on our programming model in order to take advantage of a
superior (given today's technology) architecture. For example, FPGAs can in
many cases outperform von Neumann machines by orders of magnitude in terms of
compute performance and (more importantly) performance per watt. However, the
programming model and ecosystem for FPGAs is worse than primitive. Something
you could write in a couple hundred lines of C code could take months to get
up and running on an FPGA. We need a way to transition from von Neumann
computing to alternative architectures without starting over on computer
science. Or, perhaps recent trends in neural networks will eliminate the need
for that?

~~~
scroot
Just this afternoon I finished reading David Harland's 1988 book "Rekursiv:
Object-Oriented Computer Architecture". It describes a completely different
way of designing machines at the low level that can support better programming
environments at the high level. You might want to check it out.

I believe we are going to see further balkanization between different
operating systems / programming systems and computers based upon what they are
use for. Cloud services will be the domain of what today we call "systems
programmers" who work in compiled languages and care about speed. In contrast,
we might now be able to get real "personal computers" running environments
that teach their users how to peel back the layers and manipulate them — the
long sought personal computing medium. This all could have happened back in
the 80s, but we didn't have widespread or fast use of the Internet. Now it's
different, and both of these types of systems can interop together in the
blink of an eye because of it.

Both will require completely new computing architectures.

------
mcjiggerlog
It's not all bad - one upside is that you don't need to upgrade your hardware
anywhere near as often as 10 or 20 years ago.

I put this PC together in 2013 for maybe £500-600 total and apart from adding
some RAM I haven't needed to upgrade anything and can still run games on
highish settings.

~~~
criley2
You can probably run 2016-2018 games on medium settings if you are not
interested in 60fps. I imagine you can't play any graphically intense game at
60fps at any respectable resolution.

I say this because building a computer which can play, say, Assassin's Creed
Origins or Far Cry 5 at 1080p60 on high settings would easily run you over
$1,000 right now, due in no small part to the extravagantly over-priced GPUs.

Heck, it costs $400-600 to get a GPU to play those games on medium to
medium-high right now. Not a computer, JUST the graphics chip to get 60fps on
medium.

Crypto has destroyed affordable PC gaming and it makes me so sad. I can
recommend Alienwares on sale that are dramatically cheaper than self-built.
What happened to this industry :(

~~~
mort96
60 FPS? I had a desktop I built in 2014, with a 4770k and an r9 290x, run
Overwatch at 144 FPS at 1440p with low settings. The machine could still play
Just Cause 3 and GTA 5 (2015 games, but I didn't really play any graphically
intensive 2016+ games on it) at 1080p 60 FPS with decent graphics settings, if
I recall correctly.

I have since upgraded to a 6700k and 1080Ti, but that 2014 hardware lasted
well into 2017 - and the current GPU cost just under 3/4 the price of the
entire 2014 computer, despite the r9 290x being a top of the line GPU. High
end PC gaming definitely isn't affordable anymore.

~~~
AstralStorm
That is mostly due to cost of GPUs having been inflated by miners and perhaps
the expense of having a huge monitor.

Neither CPU nor GPU are progressing as fast as some predicted anymore.

Additionally the shift to consoles as stable hardware platforms over time has
put a damper on computing power required by economically viable games.

The remaining outlets are VR and huge resolution (same thing actually) - and
high quality and fidelity simulations. (Including AI.)

------
steve_musk
What is the limit on creating bigger chips? If some of the money/effort was
focused on being able to fab larger chips instead of decreasing feature
size... I don't know much about lithography so maybe the answer is obvious to
those that do.

~~~
KaiserPro
Chips are fabbed on a large wafer, which is then split up (see here:
[https://s3.amazonaws.com/ksr/assets/003/150/280/9b6a64c4d8ed...](https://s3.amazonaws.com/ksr/assets/003/150/280/9b6a64c4d8ed23068b87cd56737810ca_large.jpg?1421462097))

Now, the process isn't perfect, and you hear a lot about "yield", which is
basically how many chips on a wafer are working to spec. As you make a chip
bigger, you increase the chance of a mistake. This reduces the yield and
drives up the cost. (I'm not sure if it's actually possible to make a full-
sized wafer without a mistake; I'll defer that to someone who knows.)

In some cases those broken chips aren't all that bad, so they are shipped with
the broken bits deactivated. (This could be lies, but I think some AMD procs
were done like this.)

Yes, there are other factors like propagation time, but that's solved by not
having chip-wide cache coherency.

~~~
sp332
You don't increase the chance of a mistake, you increase the cost of a mistake
because each little defect means you're throwing out a whole chip. The larger
each chip is, the more expensive each little defect is.

Sony had a hard time when they were ramping up Cell processor production, so
they designed the chips with 8 SPEs but only shipped them with 7 activated.
That way if a defect happened to be in one of the SPEs, they could just turn
it off and still ship the chip.
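
Both effects fall out of the textbook first-order yield model, where the
probability that a die is defect-free drops exponentially with its area. A
quick sketch (the defect density is a made-up illustrative number, not a real
process figure):

    #include <stdio.h>
    #include <math.h>

    int main(void) {
        /* Poisson yield model: P(zero defects on die of area A) = exp(-D0*A) */
        double d0 = 0.1;   /* defects per cm^2 (assumed) */
        for (double area = 1.0; area <= 8.0; area *= 2.0)
            printf("%3.0f cm^2 die -> %5.1f%% of dies defect-free\n",
                   area, 100.0 * exp(-d0 * area));
        return 0;
    }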

------
fouc
Found an older youtube video that touches on the same topic:
[https://www.youtube.com/watch?v=1FtEGIp3a_M](https://www.youtube.com/watch?v=1FtEGIp3a_M)

------
rstuart4133
There is a lot of focus on the end of Moore's law here, but it isn't the main
driver of what's happening. The slowdown we are seeing is driven by the end
of Dennard scaling.

There is a thermal dissipation limit of about 200W per chip for air cooling.
We hit that decades ago of course, but it didn't matter while Dennard scaling
kept dropping the power consumption. Once that stopped, we squeezed out a bit
more by being more power efficient, which boiled down to two things - turning
stuff off when it wasn't needed, and devoting the transistors Moore's law gave
us to specialised tasks (like silicon dedicated to encryption or h264
encoding) that did the job more efficiently. However, that doesn't get you
very far.

Which is probably why he didn't mention 3D, even though we have 32 layers of
it now and they are talking about 256 layers. What is the point of having 128
CPUs on a single die when just running 4 of them exceeds your power budget?
Indeed, what's the point of spending billions pursuing Moore's law further?

Or to put it another way again, the human brain fits roughly the same number
of synapses per unit volume as modern 3D silicon has transistors. The brain's
raw switching speed is roughly 1,000,000 times slower than silicon (1ms vs
1ns), but power consumption of a synapse vs a transistor is roughly 100,000
times better.

So while AlphaZero learnt to play Go better than any human in a few days, it
used more energy than an entire human (not just their brain) would use in
several lifetimes to do it.

------
jtbayly
I wonder what will happen to the CPU, especially if it’s not speeding up much
anymore. Perhaps with Apple doing its own chips, the CPU will just have less
and less of the work assigned to it.

~~~
ianai
At some point it was going to be germanium to replace silicon.

~~~
resource0x
Remember GaAs? It was the Future in 1980.

~~~
deepnotderp
He's talking about germanium-channel FinFETs; GaAs was intended to entirely
replace silicon, which probably was never going to happen.

~~~
Tobba_
GaAs logic is probably happening at some point, just not yet. All improvements
like that which would be incredibly expensive to develop will be held off on
until all cheaper options have been exhausted. It does seem to be slowly
moving though.

------
hyperpallium
So Moore's Law really is dead this time. And TPUs are only faster than GPUs
through lower precision.

Will compute at least keep getting _cheaper_, perhaps through economies of
scale?

Is Kurzweil's magical next information technology, to carry on the
exponential, anywhere in sight?

~~~
nootropicat
Imagine ASICs for everything. I mean everything, like implementing a
JavaScript engine directly. The energy efficiency could go up 2-3 orders of
magnitude (looking at the difference in bitcoin mining between GPUs and
ASICs).

Old gaming consoles had cartridges (with memory); I can imagine a future in
which complex software is shipped in the same manner, except the cartridges
contain specialized ASICs. Or perhaps a step further - a chip-making device in
every home, an equivalent of sorts to burning music to CD.

~~~
iainmerrick
_I mean everything, like implementing a javascript engine directly._

In that case, rather than a Javascript engine, wouldn't you have an ASIC for
the script itself?

~~~
yjftsjthsd-h
You wouldn't want a new chip per website. Maybe hardware handling of a
standard library, though.

~~~
AstralStorm
Or even just of the expensive operations like synchronization, context
switches, DMA to cache from network interface...

Wait. We have most of that already. :)

------
strainer
This should be a sobering counterpoint to the remarkably popular theory that,
given enough time, we will inevitably make processors powerful enough to model
a universe in sufficient 'detail' that the creatures within it will be
convinced they are alive and as important as we find ourselves in this one.

~~~
quickthrower2
If I were to do that, I'd put some constraints in to limit how much the
creatures can explore of that universe. For example a maximum speed at which
any matter can travel, for one.

~~~
strainer
Even with the parallelism afforded by speed limits, a computer many trillions
of times as powerful as we have today could not model the thoughts and life
experiences of the billions of human beings and other creatures on this
planet.

It's not even clear that modelling thought in a virtual world has any
equivalence to thinking in this world.

It is clear that we are unlikely to ever model anything nearly as complicated
as this.

~~~
eloff
I don't think it's clear. How long before our digital neural networks exceed
the complexity of our human biological ones? It's possible today with a super
cluster of sorts - that is, to have roughly the same number of connections
between neuron-type thingies. It still wouldn't be able to match our brains
functionally - that's another problem. And yes, our biological neurons are way
more sophisticated than what we use today for AI, but again that can be
overcome eventually.

This suggests that it's possible one day to have computers some orders of
magnitude better. If you look at it from first principles, of course it's
possible. The brain is unlikely to be the most efficient design of neural
network allowable in this universe. So given enough time, we'll learn how to
build it better.

Then it's just a manufacturing and energy problem to match the number of human
minds on the planet. So no, I don't think it's impossible at all.

Just ridiculously freaking hard, and not likely to happen in our lifetimes.

~~~
mjburgess
It's not a matter of it being "hard". Nor is it a matter of "complexity" (how
many parts something has).

A simulation is a model which picks out a tiny subset of regularities in the
target to _model_. There is an infinite density of such regularities to pick
up on, because _we_ are imposing the structure on the target in order to model
it.

The target of the model has no "model structure"; it has causal structure.
That is, when light interacts with the surface of a mirror, its interaction
isn't "abstract", i.e., some description. It is an actual photon interacting
with an actual electric field, etc.

To "model to infinite density", i.e., to have every single test that can
possibly be applied to a model come out identical to that test of the target,
the model needs to be just another example of the target.

The only thing which can be investigated in _any_ way to behave as light
hitting a mirror, is light hitting a mirror.

A digital computer is just an electric field oscillating across a silicon
surface. It cannot be programmed into being a mirror, nor into being light.

Programming gives the electric field a "model structure". Chalk gives a
blackboard a "model structure". Lego gives a bridge a "model structure".

Programming cannot -- it is _impossible_ -- give silicon the causal
structure of light interacting with a mirror.

Model structure is actually just an observer-relative isomorphism: when the
user of the computer (chalkboard, lego, ...) looks at it, the _user_ is able
to inform himself of the target by use of the model. To do so the _user_
identifies certain aspects of the model with the target. The model is not _at
all_ causally like the target.

No amount of lego will make a lego brain. No amount of oscillation in an
electric field will make a thought. Neurological activity, and indeed every
causal mechanism of the universe, is only _described_ by a model.

~~~
narrator
My argument against simulation is similar, in that certain algorithms for
modeling physical processes can only run in exponential time. Protein folding
is a good example. How could a computer simulation perform exponential-time
operations efficiently? It wouldn't work no matter how big a computer was
made, because the complexity would explode very quickly while reality can do
it in real-time.

~~~
adrianN
Quantum computers can solve protein folding efficiently.

~~~
ziotom78
"Can" or "could in principle"?

~~~
adrianN
"Can" as in quantum chemistry is one of the best applications for quantum
computers.

~~~
ziotom78
Sorry for the naivety; I am not an expert in quantum computing, which is why I
asked. I am quite interested in the topic - might you give me some references,
please?

~~~
adrianN
Googling for "quantum chemistry computers" yields a number of interesting
results, e.g.

[https://www.chemistryworld.com/feature/quantum-chemistry-on-...](https://www.chemistryworld.com/feature/quantum-chemistry-on-quantum-computers/3007680.article)

[https://www.technologyreview.com/s/603794/chemists-are-first...](https://www.technologyreview.com/s/603794/chemists-are-first-in-line-for-quantum-computings-benefits/)

Or if you like something by Feynman:

[http://doc.cat-v.org/feynman/simulating-physics/simulating-p...](http://doc.cat-v.org/feynman/simulating-physics/simulating-physics-with-computers.pdf)

------
aap_
> Maurice Wilkes first conceived of microprogramming in 1951

Zuse's Z1 was microprogrammed in 1937.

------
mtgx
> _Consequently, the x86 processors in today’s PCs may still appear to be
> executing software-compatible CISC instructions, but, as soon as those
> instructions cross over from external RAM into the processor, an instruction
> chopper/shredder slices and dices x86 machine instructions into simpler
> “micro-ops” (Intel-Speak for RISC instructions) that are then scheduled and
> executed on multiple RISC execution pipelines. Today’s x86 processors got
> faster by evolving into RISC machines._

Going by that and the graph, can we then conclude that Intel saw the rapid
gains of the '90s and early 2000s because it was converting its chips into
RISC chips?

Also, that paragraph is basically saying that Intel's architecture has an
extra layer of abstraction - so now we actually see that there _is_ indeed an
"x86 bloat", and why ARM chips seem to be so much more efficient (assuming all
else, including process node, is equal). It also looks like Intel may have
made a "mistake" going with CISC decades ago, and tried to rectify that in
the '90s.

~~~
blattimwind
Which is not true, since ARM processors - the faster ones anyway - use micro-
ops as well. Arguably µops are not RISC, either, unless you consider "very
wide instruction word whose bits map to control lines" RISC.

~~~
Narishma
ARM isn't exactly your typical RISC either.

------
JoachimS
I found this article from a Synopsys User Group meeting to be very
interesting. The steps and changes needed to get to 7, 5 and 2 nm are really,
really big:

[https://www.eetimes.com/document.asp?doc_id=1333109](https://www.eetimes.com/document.asp?doc_id=1333109)

------
Const-me
I don't think the main reason for the Moore's law slowdown is a technical one.
Intel enjoyed no competition for quite a few years. They simply lacked the
incentive to improve the performance of their chips.

In areas with healthy competition, mobile processors and GPUs, Moore's law is
still doing OK.

E.g. here's a graph I recently made for top-of-the-line single-chip nVidia
GPUs: [http://const.me/tmp/nvidia-gpus.png](http://const.me/tmp/nvidia-gpus.png)
The numbers represent single-precision floating-point performance. The graph
is on a logarithmic scale and looks pretty close to the exponential growth
predicted by Moore's law.

------
PeterStuer
Couldn't find a video of the event, but probably this talk comes close?

"Past and future of hardware and architecture"

[https://www.youtube.com/watch?v=q9KRq2Ns0ZE](https://www.youtube.com/watch?v=q9KRq2Ns0ZE)

------
awiesenhofer
So, what's the way forward? FPGAs on die? ASICs? Or doubling down on EUV,
germanium, etc.?

~~~
Tobba_
I think we're far from the ceiling on CPU performance so far, but we seem to
have hit a (micro)architectural dead end. Currently a _lot_ of time and
transistors is spent simply shuffling data around the chip, or between the CPU
and memory, while the actual computational units simply sit idle. Or
similarly, units sit idle because they can't be used for the current
task, even if they _should_ be - the FPUs on modern x86 cores are a pretty
good example of this. FP operations are just fused integer/fixed-point
operations, but it's been designed into a corner where it _has_ to be a
special unit to deal with all the crap quickly.

We've probably optimized silicon transistors to death though; that's why it's
coming to a stop now. GaAs or SiGe are some of the alternatives there.
Although there's still quite a lot of advancements there that simply aren't
economical yet. For example, SOI processes at low feature sizes seem to be
suitable for mass-produced chips now, but it hasn't made it out of the low-
power segment yet. MRAM seems to be viable and might be able to provide us
with bigger caches (in the same die area), but right now it's mainly used to
replace small flash memories (plus some more novel things like non-volatile
write buffers, but it's horrifically expensive). So we've probably got a few
big boosts left there, but it's not gonna last forever.

The next obvious architectural advancement right now is asynchronous logic. In
theory, it's superior in every way - power and timing noise immunity, speed
isn't limited by the worst-case timings, no/reduced unnecessary switching (i.e.
lower power, meaning higher voltages without the chip melting itself). On
paper, you run into some big problems on the data path - quasi-delay-
insensitive circuits need a _lot_ more transistors and wires, and the current
alternative is to use a separate delay path to time the operations, which is a
bit iffy. You do at least get rid of the Lovecraftian clock distribution tree
that's getting problematic for current synchronous logic. In practice, the
tools to work with it and engineers/designers that know how to work it don't
exist, and the architecture is entirely up in the air. So it's many years of
development behind right now and a huge investment that nobody really bothered
with while they could just juice the microarchitecture and physical
implementation.

~~~
nominatronic
> You do at least get rid of the Lovecraftian clock distribution tree that's
> getting problematic for current synchronous logic.

No, you don't. You make it even bigger and far more complex.

You can take any synchronous design, and refine the clock gating further and
further, to the point where no part of it gets a clock transition unless it
actually needs it on that cycle.

And then when you're finished, congratulations, you've made an asynchronous
circuit.

Fully asynchronous design and perfect clock gating are one and the same thing.

The clock distribution and gating approaches we already have are actually a
sign of progress towards asynchronous design; they're just quite coarse-
grained.

Of course, it's probably not the case that a clock-gating transform of a
conventional synchronous design is also the best possible solution to a
problem, so there's clearly still scope for improvement. But a lot of the
possible improvements are probably equally applicable, or have equivalents in,
optimising clock distribution and gating in synchronous design - because
that's ultimately the same thing as moving towards asynchronicity.

So talking about clock distribution issues as a problem that will just go away
with asynchronous design is misleading.

------
guitarbill
Hmm, doesn't mention ARM once. A bit of an oversight, or a convenient omission
when one is advertising a new RISC instruction set for "purpose-built
processors"?

~~~
_chris_
"Purpose-built" means you can change the ISA to suit your whims, which for ARM
requires you to A) pay for an architecture license and B) pay for the
privilege of changing the architecture.

~~~
guitarbill
I'm sure RISC-V has merits; in fact, as a hacker who has used
microcontrollers, I think that would be great. Certainly ARM isn't the be-all
and end-all.
Technically it's not even 1 ISA, but you still know what I meant. Not
mentioning the most* used architecture though? Come on.

* most = number of CPUs shipped

------
XenophileJKO
I seriously wonder if cryogenic computing won't break out of this. From what I
hear it is very promising - several orders of magnitude in both power and
speed.

~~~
hyperion2010
I heard a rumor that the big guys did the math on the energy costs for running
the compressors to keep nitrogen or helium liquid and compared it to their
projected cooling cost for normal computers and found that the compressors
were cheaper. Trick is, apparently no one has a good story for superconducting
circuit parts, so everyone has to start from scratch.

------
kokey
It's also interesting to note that Fabrice Bellard has developed a RISC-V
emulator [https://bellard.org/riscvemu/](https://bellard.org/riscvemu/)

------
tempodox
Is the host down? I can't open that page.

~~~
krylon
It worked for me, but the page took a very long time to load (1 minute or
longer).

------
dannymulligan
Is there a video of this talk available anywhere?

~~~
fouc
Not sure, but this one might be similar:
[https://www.youtube.com/watch?v=1FtEGIp3a_M](https://www.youtube.com/watch?v=1FtEGIp3a_M)

------
excalibur
So the moral of the story is that everyone with a grand vision and an
ambitious project is doomed to failure, but there's still plenty of success to
be had for those willing to quickly slap some junk together.

~~~
cwp
To put it more positively, progress is made by stringing together many small
incremental improvements. Even the RISC revolution started as a special
purpose project that stripped away the inessential to achieve a specific,
narrow goal.

------
srcmap
"Moore's Law is dead" only for CPUs.

Moore's Law is alive and progressing at the same rate for GPUs.

Applications such as AI and cryptocurrency are leveraging that.

~~~
martinpw
Actually not true. Perhaps surprisingly, CPUs and GPUs are progressing at
about the same rate if you look at the high end. GPUs are all about massive
parallelism, and if you compare against high-end Xeons, the CPU core count
increases plus things like AVX512 and FMA mean they have been scaling
similarly to GPUs over the past 10 years or so.

Nice analysis here (URL says 2013 but he has updated his numbers to end-2016).
Looking at the graphs, you might even conclude that CPUs are improving faster
in some respects.

[https://www.karlrupp.net/2013/06/cpu-gpu-and-mic-hardware-ch...](https://www.karlrupp.net/2013/06/cpu-gpu-and-mic-hardware-characteristics-over-time/)

