
David Patterson Says It’s Time for New Computer Architectures and Languages - teklaperry
https://spectrum.ieee.org/view-from-the-valley/computing/hardware/david-patterson-says-its-time-for-new-computer-architectures-and-software-languages
======
philipkglass
Are there languages that have first-class support for representing/optimizing
memory hierarchy characteristics? Optimizing C compilers, for example, may
have extensions to force specific alignments:

[https://software.intel.com/en-us/articles/coding-for-
perform...](https://software.intel.com/en-us/articles/coding-for-performance-
data-alignment-and-structures)

But I'm not aware of languages where e.g. declaring alignments is part of the
base language. Awareness of L1, L2, L3 cache characteristics, plus NUMA nodes,
is increasingly important to writing high performance code. Every language
I've ever used exposes nothing of that hierarchy; it's all just "memory".

People who write performance-critical code are already aware of these issues
and can measure/hint/optimize to work with the memory hierarchy in existing
languages. But I feel like this could be better if we had languages that
modeled these issues up front instead of only exposing the abstracted view.
Maybe such languages already exist and I just haven't encountered them yet.
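
(To illustrate how far base languages go today: C++11's alignas is probably
the closest thing -- you can pin a structure to an assumed cache-line size,
but the language itself has no notion of what a cache line is. A minimal
sketch, assuming 64-byte lines and using false-sharing avoidance as the
example:)

    #include <atomic>
    
    // Two counters updated by different threads. Without the alignas they
    // could share a cache line and ping-pong between cores (false sharing).
    // The "64" is an assumption about the hardware; nothing in the language
    // expresses "one cache line" directly.
    struct Counters {
        alignas(64) std::atomic<long> produced;
        alignas(64) std::atomic<long> consumed;
    };
    
    static_assert(alignof(Counters) == 64, "each counter starts its own line");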

~~~
axilmar
Wouldn't it be better if the slow RAM we have was replaced more and more by
the fast RAM used in caches, before we do anything else?

~~~
gpderetta
Caches are significantly less dense than normal ram. You wouldn't be able to
fit gigabytes in the same area. They would probably be more expensive as well.

------
Animats
It's really hard. Remember the Itanium and the Cell. You can build it, but
they may not come.

GPUs, though. Those have turned out to be very successful, they can be
parallelized as much as you're willing to pay for, and they're good for some
non-graphics tasks.

Much of machine learning is a simple repetitive computation running at low
precision. Special purpose hardware can do that very well.
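
The kernel in question is basically a multiply-accumulate repeated billions of
times on narrow integers. A scalar sketch of the quantized dot product that
TPU-style hardware parallelizes (purely illustrative, not any particular
accelerator's API):

    #include <cstddef>
    #include <cstdint>
    
    // Quantized dot product: 8-bit weights and activations, 32-bit
    // accumulator. This inner loop, tiled and replicated, is most of
    // what inference hardware spends its time on.
    int32_t dot_i8(const int8_t* w, const int8_t* x, size_t n) {
        int32_t acc = 0;
        for (size_t i = 0; i < n; ++i)
            acc += int32_t(w[i]) * int32_t(x[i]);
        return acc;
    }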

So what's the next useful thing in that area?

~~~
marcosdumay
Well, past experiences are not that useful right now. I don't think the
Itanium or the Cell were great architectures, but it's clear that their most
obvious failure mode (mainstream architectures improving faster than them)
isn't a showstopper anymore.

As a guess at the next big thing, I would imagine that stuff that avoids the
need for memory coherence has nice odds.

~~~
OldHand2018
Does anyone else have real-world Itanium experience? We've got a couple 5+
year old HP Integrity servers running OpenVMS on Itanium at work that we use
to batch process large ASCII vendor files in an ETL process written in C. They
certainly don't embarrass themselves. We'll be connecting them to a new Pure
Storage SAN in a few months and the IT guys are really excited to see what
happens to performance.

I take the same exact C code, compile it with Visual Studio, and then run the
same ASCII files locally on my 7th-gen 4c/8t i7 desktop and am not seeing any
improvement. That's about the best I can do as far as benchmarking goes.

~~~
philipkglass
I used Itanium from 2003-2006 for scientific computing on a big cluster. For
that purpose it was much faster than Intel's Xeons of similar vintage. It was
also significantly faster than the MIPS and POWER systems we had. Caveats:

\- The simulation suite that I used most heavily was developed in-house at the
same lab that bought the hardware. It was profiled and tuned specifically for
our hardware. The development team was a _real_ software development team with
experience writing parallel simulation code since the early 1990s. It wasn't
just a bunch of PhD students trying to shove new features into the software to
finish their theses.

\- There was also close cooperation between the lab, the system vendor (HP),
and Intel.

\- Other software (like all the packages shipped with RHEL for Itanium) didn't
seem particularly fast.

\- God help you if you needed to run software that was available only as
binary for x86. The x86 emulation was not fast at all.

It was _great_ for numerical code that had been specifically tuned for it. It
was pretty good for numerical code that had been tuned for other contemporary
machines. Otherwise I didn't see particularly good performance from it. I
don't know if it really was a design that was only good for HPC or if (e.g.)
it also would have been good for Java/databases, given sufficient software
investments.

Maybe it would have been competitive against AMD64, even considering the
difficulty of architecture-switching, if it had not been so expensive. But I'm
not sure Intel had wiggle room to price Itanium to pressure AMD64 even if they
had wanted to; Itaniums were quite big, complicated chips.

~~~
gpderetta
Numerical code is usually the best case for VLIW machines. It is not
surprising that Itanium did well there.

------
Razengan
I fear, the more we advance, the more we are going to become permanently
entrenched in The Way Things Are.

To really shake things up at a fundamental level will probably require running
into an alien species that does basic things differently than us.

Even creative people with lots and lots of spare time and resources at their
disposal will still be too "colored" by existing human knowledge and
established practices to really try anything new (and without falling into the
trap of needlessly reinventing the wheel.)

~~~
z3phyr
Can we pretend to run into an alien species when we encounter innovators among
ourselves, and try to adapt? Most of the new Lisp people do it anyway...

------
amckinlay
I want a language where you can express time and constraints on time within
the language and the type system. I want to be able to guarantee program
correctness, pre-calculate fine-grained energy usage, optimize for
energy/power-saving mode usage, interface with asynchronous signals, and
whatnot -- all with respect to program execution at the ISA-level within some
hard real-time constraints.

Compilers make optimizations using detailed instruction timing information,
but as far as I know, these details do not bubble up to the surface in any
programming language.

It may be wise to keep these details under the surface at the compiler level,
but for 8-bit architectures, it would be awesome to have a language where you
have explicit control of time.

~~~
munificent
Part of the reason most languages obscure this is that it's a moving
target.

If a language let you say, "this chunk of code here should run in 7 cycles",
what happens when a new optimization finds a way to reduce that, or a new
architecture comes up where that operation gets slower but lots of others get
faster?

I'm not arguing against your desire, just explaining that it's not
unreasonable that we're where we are now. We've gotten so used to language
portability, that it's good to remember how painful it can be to lose that.
It's no fun having to rewrite all your code every time a new chip comes out.

~~~
yuushi
This could only ever be doable with extremely simple architectures anyway. Off
the top of my head, add in just one of branch prediction, micro-op fusion,
out-of-order execution, pipelines and pipeline stalls, or cache misses, and
this becomes impossible. Of course, this assumes you even know which CPU you
are targeting and its specific instruction latencies.

That's already an extremely niche set of processors. Further, for the few
bits of code where you're likely to care about this kind of extremely precise
timing, you'll either examine the emitted assembly or just hand-write the ASM
yourself.

It seems like a huge amount of effort for an extremely niche scenario.
Remember, the ISA is still just an abstraction, after all.

~~~
daemin
To add to that there's also the difference between cycles spent executing an
instruction and how many of those instructions can be executed at once in the
pipeline. So there is a difference between executing a set of instructions
once versus executing them millions of times.

------
api
Here's something that seems shockingly under-explored to me: languages that
incorporate relational data structures natively.

We have SQL of course, but SQL is not a general purpose language and is
(intentionally) often not Turing-complete.

I'm imagining something like Go, Ruby, JavaScript, or Rust with native tables,
queries, and other SQL-ish relational data structures.

The long term goal would be to kill the age-old impedance mismatch problem and
eliminate CRUD code. Toward this long term end the language runtime or
standard library could actually contain a full database engine with
replication/clustering support.
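
To make the gap concrete, here's roughly what the "table" looks like today in
a language without the feature -- a struct, a vector, and hand-rolled query
code (hypothetical User type, plain C++ standing in for the Go/Rust case):

    #include <algorithm>
    #include <iterator>
    #include <string>
    #include <vector>
    
    // Stand-in for a native table: just a vector of structs, with no
    // indexes, no query planner, and hand-written CRUD around it.
    struct User { int id; std::string name; int age; };
    
    std::vector<User> adults(const std::vector<User>& users) {
        std::vector<User> out;
        std::copy_if(users.begin(), users.end(), std::back_inserter(out),
                     [](const User& u) { return u.age >= 18; });
        return out;  // a table-native language would express this as a query
    }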

~~~
chubot
Data frames in R give you the relational model with a Turing complete
language. I think of it as a better SQL, at least for analytics (as opposed to
transaction processing).

Besides its data structures, R is surprisingly similar to JavaScript -- it
has Scheme-like semantics but C-like syntax.

For certain problems, it's a pleasure to program in. You don't have to go
through "objects" to get to your tables; you just have tables!

The examples here might give you a flavor:

[https://cran.r-project.org/web/packages/dplyr/vignettes/dply...](https://cran.r-project.org/web/packages/dplyr/vignettes/dplyr.html)

R itself has a lot of flaws, but the tidyverse [1] is a collection of packages
that provide a more consistent interface.

FWIW I agree that more mainstream languages should have tables natively. It is
an odd omission. Somehow they got coupled to storage, but they're also useful
for computation.

[1] [https://www.tidyverse.org/](https://www.tidyverse.org/)

~~~
api
Yes, but I'm thinking of a general purpose language like Go that would be used
to implement normal things. R is specialized for data analysis.

Most "normal things" have designs that are deeply inefficient and compromised
by bad under-powered data models.

~~~
chubot
I'm not suggesting that people actually write business applications in R --
I'm just saying it's probably the closest thing to what you're asking for.

And I agree that what you want should exist -- R is proof that it's perfectly
possible and natural!

Although, people are using (abusing) R to write web apps:

[https://shiny.rstudio.com/](https://shiny.rstudio.com/)

They mostly display data, but they have non-trivial interaction done in the
style of JS.

------
stellalo
Well, there are some interesting new languages out there already:
[https://julialang.org/](https://julialang.org/)

In fact Moore’s law must have been the reason why Python is what it is today:
“Who cares if it’s slow, tomorrow’s computers will be faster!”

~~~
elihu
I think the current attitude could be summed up better as "Who cares if it's
inefficient, yesterday's computers are fast enough!" I think that's true of
most software, but not all -- there will always be people who care about the
best performance, but we shouldn't expect them to be in the majority.

~~~
fsloth
"there will always be people who care about the best performance, but we
shouldn't expect them to be in the majority."

Given how machine learning is going to kick in in lots of fields quite soon,
there will be a lot of real-world applications that are _not_ ambivalent about
performance (if nothing else, then from an energy-use point of view).

We're going to get Clippy the Clipper 2.0 who can actually do something
useful, and how long he spends figuring out his suggestions, and how good they
are, will depend on the available CPU resources and how fast Clippy can think.

------
tabtab
I have to say that most bottlenecks are from lazy designs, not wimpy hardware.
As a thought experiment, suppose parallel processing scaled easily without
worrying about Amdahl's Law and similar bottlenecks. So then we put 128 cores
into our computers. Software companies would eat up every single core
eventually if there's no penalty for inefficient coding. They don't pay our
electric bill, we do. It's kind of like freeways: the more you build, the more
the traffic increases to fill them up again.
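
(For reference, the bound being assumed away there is Amdahl's Law: with a
fraction p of the work parallelizable across n cores,

    \text{speedup}(n) = \frac{1}{(1 - p) + p/n} \;\le\; \frac{1}{1 - p}

so if even 10% of the work stays serial, 128 cores deliver less than a 10x
speedup.)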

It is "nice" to be able to slap components together like magic legos and/or a
Dagwood Sandwich rather than smartly tune and coordinate everything, but doing
such is often computationally inefficient. The speed of light will probably
put a hard upper limit on this practice that no clever language or
architecture can work around to a notable degree.

~~~
chubot
Yes, a pithier way of saying this is:

 _Software is a gas; it expands to fill its container_ \- Nathan Myhrvold

Bad performance isn't really a software design problem; it's an economics
problem (ditto for security). You could speed up the browser by 2x now, or
reduce its memory usage by 2x, and websites would be exactly as slow after a
few months.

I'm not trying to be negative here, just stating an unfortunate systems
dynamic.

~~~
gone35
_Bad performance isn't really a software design problem; it's an economics
problem (ditto for security)._

Why do you say so?

~~~
saagarjha
It means that people haven't thrown enough {time, energy, money} at the
problem.

~~~
chubot
Not quite -- see my sibling response. It's probably more accurate to say that
some people have thrown a lot of time and energy at the problem.

But the market does not value their software; we use slower and less secure
alternatives.

Slack is a great example. There are probably 99 other chat services that
perform better than Slack. (Apparently it makes fans spin on laptops, which is
kind of shocking for a chat app.)

But the market doesn't necessarily care -- it values the features that Slack
provides more.

Part of it is a bad network effect. I care about speed, but I might have to
use Slack because the people I want to talk to on Slack don't care about
speed.

------
quickben
What's happening with that Mill Arch? Is it moving forward outside of them
publishing papers? It seemed promising.

~~~
zaarn
I believe at the end of next month there will be another talk where the Mill
designers will discuss the Spectre and Meltdown attacks in relation to the
Mill (though the spoiler is that it mentions the point "and why Mill is
immune", so I guess it'll be a very fun talk).

I very much hope that the Mill will succeed; it's an incredibly promising
architecture.

~~~
daemin
I do hope they just put paper to silicon and implement the damn thing, even if
they just distribute simple Raspberry Pi-type boards with a slow version on
them.

------
davidhyde
The author makes it sound as if there have been no material changes to
programming languages recently, but unless you've been hiding under a rock you
will know that this is not true.

He compares Python to C. Come on, please. It would have been more relevant to
speak about the LLVM project and how it enables architects to build more
expressive languages that can still make use of 30 years of optimization
wisdom (most of which was built when CPUs were much slower).

What about mentioning languages like Rust, which are designed for high
concurrency and zero overhead, and which let developers write software with
the safety of Python but the performance of C?

What about mentioning that some old languages, like C++, have evolved through
new language features AND guidelines into much more effective languages to
guide us into the future of concurrent programming (the easiest way to enjoy
the speedups of yesteryear)?

------
watmough
The lack of obsolescence is terrible news for Dell and HP. Ran into a client
this afternoon still running his seismic workflows on an old HP EliteBook
8570w with a couple big monitors, big SSD and 16 Gigs of memory.

That family was launched mid-2012!

------
mjfl
It would be nice if low volume foundry costs went down an order of magnitude.
I'm in a bio lab right now, but I did CS in college and really enjoyed VLSI.
If I want to turn a design into a prototype in my bio lab, it's going to cost
hundreds of dollars, but if I want to turn a chip I designed into reality,
it's going to cost thousands to tens of thousands of dollars even with
specialized low volume foundries like MOSIS. I feel like it should be cheaper.

~~~
tonysdg
Generally speaking, that's where FPGAs excel. You get a good chunk of the
speedup that comes with designing your own circuit at a lower cost than actual
ASICs (at least for prototyping and low-volume batches).

------
mark_l_watson
A little off topic, but I greatly admire generalists: Dr. Patterson (along
with Dr. Fox) taught a good Coursera course on agile web development with Ruby
and Rails. Quite a departure for an originator of RISC architectures!

I think Dr. Patterson makes a great point about there being plenty of headroom
for improvement in software efficiency.

------
nickpeterson
It's way easier to push the limits on simple things. We need languages with
fewer features and clear design, running on hardware with less exotic
features.

~~~
AnIdiotOnTheNet
I agree in general, and while I think that is entirely possible in languages
and operating environments, I'm not so sure it is in hardware, at least not
without sacrificing a lot of performance. People respected by people I trust,
who have a lot more domain knowledge than I do, don't seem to think it is.

~~~
justin66
On the other hand, we can afford to sacrifice a lot of performance.

------
ericand
I'm a VC and I've seen lots of pitches lately for various tech related to new
architectures, be it GPU, FPGA or RISC. I'd add cryptocurrency to the list of
workloads with insatiable computing demands.

~~~
crb002
Massive parallel prefix and reduction operations in RAM. Start with just the
MPI standard reduction ops. 1000x faster than the von Neumann bottleneck.
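
For anyone unfamiliar with those ops: a reduction combines one value per node
into a single result, and a scan (parallel prefix) gives each node the running
total up to itself. A minimal sketch with the standard MPI C API, which is the
kind of operation such in-memory hardware would presumably accelerate:

    #include <mpi.h>
    
    // Each rank contributes one value; the reduction and the inclusive
    // prefix sum are exactly the primitives proposed for in-RAM execution.
    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    
        long mine = rank + 1, total = 0, prefix = 0;
        MPI_Reduce(&mine, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);
        MPI_Scan(&mine, &prefix, 1, MPI_LONG, MPI_SUM, MPI_COMM_WORLD);
    
        MPI_Finalize();
        return 0;
    }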

~~~
snaky
> Massive parallel prefix and reduction operations in RAM

Sounds similar to Micron Automata Processor.

~~~
crb002
Micron's processor-in-memory doesn't have nearly the same bandwidth as a
processor-in-memory array geared to doing parallel prefix.

------
agumonkey
Probably old news, but Forth CPU arrays are so radically fun ... I wonder why
they don't attract more interest.

Every Forth core can pass Forth to its neighbors; Forth is very expressive yet
tiny since it's a stack language.

Anyway, my 2 cents.

~~~
snaky
> the so-called Adaptive Compute Acceleration Platform (ACAP) will deliver 20x
> and 4x performance increases on deep learning and 5G radio processing,
> respectively, Xilinx claimed. The first chip, called Everest, will tape out
> this year in a 7nm process.

> The centerpiece of the new architecture is word-based array of tiles made up
> of VLIW vector processors, each with local memory and interconnect.

> tiles communicate with each other to create data paths that best suit their
> application.

[https://www.eetimes.com/document.asp?doc_id=1333632](https://www.eetimes.com/document.asp?doc_id=1333632)

So the GA144 is finally going mainstream.

~~~
agumonkey
Great, but do they have actual lineage to the GA144 chips? Or is it just
another matrix of cores? Did Xilinx buy GA?

PS: funny reading 'post-Moore' in the linked article :)

------
Lerc
I have had this fun idea for an odd architecture floating in my head for ages.

Lots of small processors with local work ram, cached ram, shared ram.

Each processor has a numerical ID and communicates by optical link with each
processor whose ID differs in one bit, plus an additional link to the processor
whose ID complements all bits. They send messages and fill their caches from
the pool of shared RAM.

Place even-parity processors on one board and odd-parity processors on another
board facing the first. Thus all processors have line of sight to their
communication partners. Messages go back and forth, with processors relaying
messages by fixing one bit in the message address. All neighbours whose address
is one bit closer are candidates for relaying. If any are busy, there are other
paths to use.

This means the maximum number of hops a 2^19-core system would have to do is
9. If more than half of the bits are wrong, then jump to the complement.

So the example has half a million processors, each talking to 20 neighbours by
optical links. Messages can be sent anywhere with a latency of up to
HopTime*9. Filling a cache from anywhere in the shared memory pool would have
a latency of twice that. If the speed of light is the latency factor, roughly
150ns of latency would get you 9 hops across a 5 meter gap. Smaller is of
course always going to make it better.

This is the sort of thing that would also probably need a new language. I'm
not entirely sure you could come up with an appropriate language without at
least a simulation of the architecture.
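
A small sketch of the routing rule as I read it (19-bit IDs, one link per
differing bit plus the all-bits complement link; my interpretation, not a
spec):

    #include <bitset>
    #include <cstdint>
    
    constexpr int BITS = 19;                      // 2^19 cores, 19-bit IDs
    constexpr uint32_t MASK = (1u << BITS) - 1;
    
    // Minimum link traversals from 'from' to 'to': either fix the differing
    // bits one hop at a time, or take the complement link first when more
    // than half the bits differ, leaving at most 9 single-bit fixes.
    int hops(uint32_t from, uint32_t to) {
        int d = static_cast<int>(std::bitset<32>((from ^ to) & MASK).count());
        int direct   = d;                // one-bit links only
        int via_comp = 1 + (BITS - d);   // complement link, then fix the rest
        return direct < via_comp ? direct : via_comp;
    }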

~~~
smilekzs
Hop time is not constant. There will be longer and shorter optical links, and
if you want the system to be synchronous, everyone would have to wait on the
longer links.

~~~
Lerc
True, angles would add a little extra depending on how large the processor
panels were and their separation distance.

I had envisioned every link communicating independently. What would be the
advantage of communicating synchronously?

------
gone35
As part of one of those "45 hardware startups" (in stealth mode) trying to
tackle the problem, I nevertheless think there is _plenty_ of fat to trim just
on the software side still... I mean, just look at that very page: 200 MB
[edit: in memory consumption] for a single static article!

Information-theoretically at least, I bet that is another 10-100x speedup
opportunity right there (and the next multi-billion-dollar industry,
perhaps)...

------
reacweb
IMHO, machine learning is a tiny aspect of the issue. The main issue is that,
strangely, C is still the desert-island language (first hit: [http://www-cs-
students.stanford.edu/~blynn/c/intro.html](http://www-cs-
students.stanford.edu/~blynn/c/intro.html)). There is a huge intimacy between
the C language and the CPU; they have evolved together during the many years
of Moore's law. I think there should exist a far better language for accessing
hardware. The Linux kernel should migrate to this language. And the intimacy
between OS and hardware should drive CPUs (and GPUs) in better directions.
Maybe I am dreaming.

------
luma
Only tangentially related to the article but... I'm left to wonder about the
photograph selected. His left hand appears to be on a stack of 20-year-old HP
9000s (maybe rp3440, pre-grey box?), and he's standing across from a rack full
of ancient 4U systems mounted in 2-post relay racks. The white blanking panels
are early-2000s-era HP kit, and in the background are racks full of tower
systems stacked vertically.

Is this a recent photo? Do they have a computer history museum @ UCB or is
that datacenter actually running production workloads on that kit?

~~~
Varcht
Per the EXIF data embedded in the image: "Date Time Digitized: May 1, 2006,
9:10:59 PM, Creator: Peg Skorpinski photo".

~~~
luma
That would certainly explain it! After posting my comment it occurred to me
that having Dr Patterson pose next to a PA-RISC system was probably
intentional, given his history.

------
ArtWomb
In gamedev, the real-time rendering at 4K 60fps enabled by Nvidia's Turing
could be "future-proof" well into the next decade. I'm just not sure end users
are clamoring for more photo-realism.

Instead, the future of computing turns toward optimizing for the experience,
with 3D-printed form factors and smart haptics. European startup Canatu
provides a glimpse of what carbon nanotube (CNT) sheets make possible:

[https://canatu.com/](https://canatu.com/)

~~~
the8472
Now it's time for display technology to catch up. Once things like light
fields roll around, that 4K will seem laughable.

> I'm just not sure end users are clamoring for more photo-realism.

People are definitely making fun of characters looking waxy, fire effects
looking cheap, shadows being pixelated etc.

Plus, this is just the first generation of realtime raytracing. It's not like
entire scenes will be raytraced; just some parts will be raytracing-assisted,
e.g. lighting. The next generations will most likely enable a higher
percentage of the scene to be based on tracing.

------
snaky
> And Intel, he said “is trying to make all the bets,” marketing traditional
> CPUs for machine learning, purchasing Altera (the company that provides
> FPGAs to Microsoft), and buying Nervana with it specialized neural network
> processor (similar in approach to Google’s TPU).

Actually, Intel has an even wider view of things and an interest in
non-(yet?)-mainstream tech, like e.g. async circuit design. They have bought
Achronix, Fulcrum, and Timeless Design Automation. Fulcrum in 2002 made things
like "a fully asynchronous crossbar switch chip that achieves 260-Gbit/second
cross-section bandwidth in a standard 180-nanometer CMOS process" \-
[https://www.eetimes.com/document.asp?doc_id=1145012](https://www.eetimes.com/document.asp?doc_id=1145012)

------
zvrba
> "When performance doubled every 18 months, people would throw out their
> desktop computers that were working fine because a friend’s new computer was
> so much faster."

So we have finally reached some sustainability. Good.

~~~
mehrdadn
People still have to throw their computers out because other hardware still
advances, software also advances, and companies decide not to support older
hardware in their new software. The wastefulness really doesn't end.

~~~
earenndil
They don't have to throw them out as often, though.

~~~
mehrdadn
ionno... did people actually throw out their computers more often before? I'm
honestly skeptical that people threw out their computers every 18 months just
because their friend had a faster computer as was claimed here. I feel like
people are very quick to buy new computers now. And not just that, but now
we're having this problem with phones too.

------
dschuetz
While I agree with Patterson, I'd be careful with the expectations.
Computational engineering is _highest_ tech. Whatever purpose, whatever
specialization, processors are the central domain in modern technology and
engineering. It would still take decades to find a feasible approach to make
new architectures work (in silicon). But I agree that it is an exciting time;
I want to see new clever architectures emerging, competing, and being used for
different purposes.

------
j45
Most languages in use today have had to extend from their core to the web or
mobile through a framework.

On the other hand, web- or mobile-first frameworks/platforms are often high
level and have been a little ornery when digging into the weeds.

Our paradigm shifted from desktop-first to web-first to mobile-first a while
ago, but our frameworks are often still anchored in a desktop-first world
perspective.

If there are examples that do, or do not, highlight this, I'd love to see and
discuss :)

------
tokyodude
Not a new language, maybe not even a new idea, but Unity's new Entity
Component System is designed around making cache-friendly, parallelizable
coding easy:

[https://github.com/Unity-
Technologies/EntityComponentSystemS...](https://github.com/Unity-
Technologies/EntityComponentSystemSamples/blob/master/Documentation/index.md)

------
bcheung
It seems like hardware is moving away from RISC and even CISC architectures in
favor of lots of different types of silicon for specialized applications (GPU,
machine learning, vision, DSP, hashing, encryption, SIMD).

It makes sense from a hardware perspective but it seems software development
is not keeping pace. General purpose languages have trouble targeting this
kind of hardware.

~~~
calebh
This sounds good, as a person who specializes in domain specific languages.
Maybe we'll see higher demand and corresponding salary increases...

------
breckuh
> We are now a factor of 15 behind where we should be if Moore’s Law were
> still operative. We are in the post-Moore’s Law era.

Is this true? According to [https://ourworldindata.org/technological-
progress](https://ourworldindata.org/technological-progress) it looks like we
are maybe a factor of 2 or 4 off.

------
eveningcoffee
I think that we will see more and more specialized silicon like what has been
used for mining and training. This hardware is orders of magnitude faster than
a conventional CPU.

Another point is that processors are already incredibly fast and performance
is capped by the peripherals. This is where the huge potential has been hiding
for some time.

~~~
astrodust
Better FPGAs could bridge the gap between ASICs and software. FPGA chips are
still almost exclusively niche products, but if they became mainstream like
GPUs, economies of scale would kick in and it would be a whole different game.

------
bogomipz
The article states:

>"Google has its Tensor Processing Unit (TPU), with one core per chip and
software-controlled memory instead of caches"

I have heard of software-controlled caches before, but I am imagining this is
not the same thing? Could someone say? Might anyone have any decent resources
on software-controlled memory architectures?

------
nuguy
This is partly a question of computer science but more a question of
economics. Computer systems will drop their all-encompassing solutions for
more specific solutions as time goes forward.

------
graycat
IMHO, stochastic optimal control (SOC) is a framework that is much more
powerful than anything like current artificial intelligence, machine learning,
deep learning, etc. and, in particular, a much better version of something
like actual learning. The difference is like going from gnats to elephants,
like a bicycle to the starship Enterprise.

A good application of SOC looks not just "intelligent" but wise, prudent,
brilliant, and prescient.

There have been claims that SOC is necessarily the most powerful version of
learning there can be -- a bit much, but ... maybe.

For computing architectures, the computing that SOC can soak up makes the
computing needs of current deep learning look like counting on fingers in
kindergarten.

The subject is no joke and goes back to E. Dynkin (student of Kolmogorov and
Gel'fand), D. Bertsekas (used neural nets for approximations for some of the
huge tables of data), R. Rockafellar (e.g., scenario aggregation), and, sure,
R. Bellman. Some of the pure math is tricky, e.g., measurable selection.

I did my Ph.D. dissertation in SOC and there actually made the computing
reasonable, e.g., wrote and ran the software -- it ran in a few minutes, with
my progress in algorithms down from an estimate of 64 years. For larger
problems, it can be back to taking over all of Amazon for 64 years. More algorithmic
progress is possible, and, then, some specialized hardware should also help,
sure, by factors of 10; right, we want lots of factors of 10.

If you want to think about ambitious computing, past Moore's law, past
anything like current AI, with special-purpose hardware, go for SOC.

SOC Applications 101. For your company, do financial planning for the next 5
years, 60 months. So, set up a spreadsheet with one column for each month: one
column for the current state of the company and then 60 more columns. The goal
is, say, to maximize the expected value of something, maybe the value of the
company, in the last column -- with control over the probability of going
broke; if a bank, always be able to meet reserve requirements and pass stress
tests; if a property-casualty insurance company, stay in business after
hurricane Florence, etc.

For each variable of interest, have a row.

In the cells, put in the usual expressions in terms of the values of cells in
earlier columns.

Also have some cells with random numbers -- the stochastic part.

Also have some cells empty for the business decisions, the _control_ part.

This is the 101 version; the 201 version has more detail!

Can't get the solution with just the usual spreadsheet _recalc_ because the
best solution is _optimal_ and varies through the 60 months as more
information is gathered -- and that is the core of the need for more in
algorithms and taking over all of Amazon for the computing.

Really, the work comes too close to looking at all possible business state
scenarios over the 60 months yet still is astronomically faster than direct or
naive ways to do this.

Or, a little like football, don't call the play on 2nd down until you see the
results of the play on 1st down, BUT the play called on 1st down was _optimal_
considering the options for the later downs and plays.

The optimality achieved is strict, the best possible, not merely heuristic: No
means of making the decisions using only information available when the
decisions are made can do better (proved in my dissertation).
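
To show the shape of the computation, here is a toy backward-induction version
of that 60-column problem. Everything here is invented (a discretized cash
state, a three-point decision grid, coin-flip randomness); the point is only
the structure -- optimize over the control inside an expectation over the
noise, stage by stage from the last column back to the first:

    #include <algorithm>
    #include <vector>
    
    constexpr int    STAGES = 60;                    // one column per month
    constexpr int    STATES = 101;                   // cash level 0..100
    constexpr double DECISIONS[] = {0.0, 0.25, 0.5}; // fraction reinvested
    
    // Invented transition: reinvested cash earns 40% in a good month and
    // loses 30% in a bad one; the rest just sits there.
    int step(int cash, double invest, bool good_month) {
        double r = good_month ? 1.4 : 0.7;
        double next = cash * (1.0 - invest) + cash * invest * r;
        return std::clamp(static_cast<int>(next), 0, STATES - 1);
    }
    
    int main() {
        // value[s] = optimal expected terminal cash starting from state s
        std::vector<double> value(STATES);
        for (int s = 0; s < STATES; ++s) value[s] = s;  // terminal values
    
        for (int t = STAGES - 1; t >= 0; --t) {         // backward induction
            std::vector<double> prev(STATES);
            for (int s = 0; s < STATES; ++s) {
                double best = 0.0;
                for (double u : DECISIONS) {            // optimize the control
                    double ev = 0.5 * value[step(s, u, true)]
                              + 0.5 * value[step(s, u, false)];
                    best = std::max(best, ev);
                }
                prev[s] = best;
            }
            value = prev;
        }
        return 0;  // value[s0] is the optimal expected terminal cash from s0
    }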

Intel, AMD, Qualcomm, DARPA, NSF, Microsoft, etc., think SOC!!! You just heard
it here -- I might not bother to tell you again.

~~~
ssvss
What book would you recommend to know more about SOC?

~~~
graycat
Start with books on dynamic programming: discrete time, discrete state space,
no uncertainty. There's a nice one by Dreyfus and Law, _The Art and Theory of
Dynamic Programming_, that also gets into the stochastic case. It's called
_dynamic programming_ because the _program_, that is, the planning, changes
_dynamically_ over time as you learn more and more as time passes.

D&L show a nice result that could be used for some first-cut approximations:
if you are minimizing a quadratic cost, if the algebraic expressions in that
spreadsheet I mentioned are all linear, and if the randomness is all just
Gaussian, then you get "certainty equivalence" \-- that is, you get to f'get
about the Gaussian, random, stochastic part and use just the expectations
themselves. Huge speedup.

For more, there is Dynkin and Yushkevich, _Controlled Markov Processes_ \--
one of the key assumptions that permits treating the spreadsheet columns one
at a time is that the stochastic part obeys a Markov assumption (the past and
future are conditionally independent given the present).

There is, from MIT and CMU,

Dimitri P. Bertsekas and Steven E. Shreve, _Stochastic Optimal Control: The
Discrete Time Case_.

And there is

Wendell H. Fleming and Raymond W. Rishel, _Deterministic and Stochastic
Optimal Control_.

There are papers by R. Rockafellar, long at U. Washington.

There is now an ambitious program at Princeton in the department of Operations
Research and Financial Engineering.

I tried to get IBM's Watson lab interested; by now they would have been nicely
ahead. The guys I was working for wanted me to do software architecture for
the OSI/ISO CMIS/P data standards. Garbage direction. A really big mistake for
Big Blue, not so big now.

One of the reasons for IBM to have done SOC was that they had some vector
hardware instructions, that is, that would do an inner product, that is, for
positive integer n, given arrays A and B, each of length n, find the sum, i =
1, 2, ..., n of

    A(i)*B(i).

Well, inevitably, work in probability does a LOT of this. So, if thinking about
hardware instructions for SOC, such a vector instruction would be one of the
first features. Then maybe more for neural net approximations to some big
tables of several dimensions, e.g., computer language _arrays_ A(m, n, p, q)
for 4 dimensions but in practice several more. And multivariate splines can
play a role.

Commonly can get some nice gains by finding _non-inferior_ points -- so will
want some fast ways to work with those. The software for my dissertation did
that; got a speedup of maybe 100:1; on large problems, commonly could get much
more.

There is some _compiling_ that can get some big gains -- some of the big gains
I got were from just my doing the _compiling_ by hand, but what I did could be
a feature in a compiler -- there's likely still a stream of publications there
if anyone is interested in publications (my interests are in business, the
money making kind, now my startup).

There is a cute trick if, say, all the spreadsheet column logic is the same --
can "double up on number of stages".

There are lots of speedup techniques known. A big theme will be various
approximations.

If there's an opportunity to exploit sufficient statistics, then that could
yield big speedups and shouldn't be missed. Having the compiling, sufficient
statistics, and non-inferior exploitation all work together could yield big
gains -- that should all be compiled. Get some papers out of that! No doubt
similarly for other speedups.

There's lots to be done.

~~~
patrickg_zill
Curious if you have evaluated any of the APL family of languages as being
useful for your work.

~~~
graycat
Looked at APL long ago. My guess is that since it is interpretive it would be
too slow. I wrote my dissertation code in just quite portable Fortran.

As I suggested, I do believe that some progress in execution time could be had
from some new machine instructions, new language features to use those
instructions, and some _compiling_ help. For the compiling part, the language
would have some well defined semantics the compiler could exploit. E.g., for
something simple, the semantics could let the compiler exploit the idea of
non-inferior sets.

E.g., say we have 100,000 points in 2-space. Say, just for intuitive
visualization, plot them on a standard X-Y coordinate system. Say that the X
coordinate is time and the Y direction is fuel. Part of the work is to
minimize the cost of time and fuel. At this point in the computation, we don't
know how the costs of time and fuel trade off, but we do know that with time
held constant, less fuel saves cost, and with fuel held constant, less time
saves cost.

So, in the plot of the 100,000 options, we look at the lower left, that is,
the south-west parts of the 100,000 points.

Point (X2,Y2) is _inferior_ to point (X1,Y1) if X1 <= X2 and Y1 <= Y2. That
is, point (X2,Y2) is just equal to point (X1,Y1) or is to the upper right,
north east of (X1,Y1). If point (X1,Y1) is not inferior to any other of the
100,000 points, then point (X1,Y1) is _non-inferior_.

So, there in the work, can just discard and ignore all the inferior points and
work only with the non-inferior points. May have only 100 non-inferior points.
Then, presto, bingo, just saved a factor of 1000 in the work. Okay, have
sufficiently restricted programming language semantics that the compiler could
figure out all that and take advantage of it. When I wrote my code in Fortran,
I had to write and call a subroutine to find the non-inferior points --
bummer, that work should be automated in the compiler, but to do that the
compiler will need some help from some semantic guarantees.
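
Here is a quick sketch of that 2-D filter -- sort on one coordinate, then a
point survives only if it beats every survivor so far on the other coordinate
(not the original Fortran subroutine, just the idea):

    #include <algorithm>
    #include <limits>
    #include <vector>
    
    struct Point { double time, fuel; };   // minimize both coordinates
    
    // Keep only the non-inferior (Pareto-optimal) points: after sorting by
    // time, a point is kept only if its fuel is below every fuel kept so
    // far.  O(n log n) instead of comparing all pairs.
    std::vector<Point> nonInferior(std::vector<Point> pts) {
        std::sort(pts.begin(), pts.end(), [](const Point& a, const Point& b) {
            return a.time < b.time || (a.time == b.time && a.fuel < b.fuel);
        });
        std::vector<Point> keep;
        double bestFuel = std::numeric_limits<double>::infinity();
        for (const Point& p : pts) {
            if (p.fuel < bestFuel) {       // not dominated by anything seen
                keep.push_back(p);
                bestFuel = p.fuel;
            }
        }
        return keep;
    }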

The above is for just two dimensions, but in practice might have a dozen or
more. So, what's the fast way with algorithms, programming language semantics,
and hardware to find the non-inferior points for a dozen dimensions?

Non-inferior points are simple -- lots more is possible.

------
person_of_color
Any suggestions for MS in CompArch?

------
ilaksh
Does David Patterson know that Cython already exists?

------
StillBored
How about instead of inventing new languages we just pick a couple and tell
people that if they care at all about performance, not to use the rest.

I would start by throwing away all the interpreted/GC'ed languages, because
even after decades of massive effort they are frequently lucky if they even
manage to keep up with unoptimized C. But that really isn't the problem; the
problem is that they cannot be debugged for performance without directly
hacking the GC/interpreter. At least in C, when you discover your data
structures are being trashed by insufficient cache associativity (for
example), you can actually fix the code.

Put another way, up front people need to know that if they write in
Python/Java/etc. to save themselves a bit of engineering effort, and it's
anything more than throwaway low-volume code, the effort to optimize or
rewrite it will dwarf any savings they might have gotten by choosing Python.

~~~
ychen306
We need new languages, or at the very least new ways to approach optimization.
We are at the point where even C doesn't give you the performance you want --
that's why people invented things like Halide, XLA (TensorFlow), etc.

~~~
StillBored
There are a ton of domain-specific languages (SQL, GLSL, etc.), and I don't
have a problem with those; they have a place.

The problem is that we already have too many languages trying to be generic,
and languages like C++ and OpenMP can be wrangled into nearly any programming
paradigm in common use, and the results also tend to be significantly faster.

If you're talking about performance, the minimum baseline requirement should
be running faster than a language in common use, say C++. There are a few
languages that might manage this (CUDA/OpenCL), but they also tend to be
somewhat domain-specific. Similarly, Verilog/VHDL can produce fast results.
There have been calls for decades for languages that can express parallelism
better and are both expressive and safe. But what we have actually gotten
hasn't been better by most objective measures. What we have gotten is a pile
of "scripting-like" languages with very similar characteristics (Tcl, Ruby,
Perl, Python, JavaScript, PHP, Lua, Go, AppleScript, etc.).

