
AI and Compute - gdb
https://blog.openai.com/ai-and-compute/
======
dmreedy
I'm going to draw up some charts about hull displacement on ships from the
dawn of time up until about 1950. Then we can have a really informed
conversation about the naval power of countries through the ages. I think we
need to be ready for the implications of there one day being a battleship the
size of the Pacific that will allow its owner to rule the world.

Forgive the sarcasm, but I'm really put off by the aggressive weak-to-strong
generalizations that are going on here. I'm also very excited about AI, but I
don't understand how lines like,

>> _But at least within many current domains, more compute seems to lead
predictably to better performance, and is often complementary to algorithmic
advances._

can be extrapolated to anything more than a fun conversation to have over
drinks, or the plot for a bad sci-fi movie about AI (which, to be fair, are
also quite prevalent in the current zeitgeist). We're definitely at a new tier
of "kinds of problems computers can solve", but surely experience and history
in this space should tell us that we need to expect massive, seemingly
insurmountable plateaus before we see the next tier of growth, and that that
next tier will be much more a matter of paradigm shift than of growth on a
line.

The systems on this graph all do different things in different ways. It's one
thing to abstract over compute power via something like Moore's Law, or
societal complexity via the Kardashev scale. But I think we need a _much_ more
nuanced set of metrics to provide any kind of insight into the various AI
techniques, or an entirely different way of looking at 'intelligence'.

~~~
dbelchamber
I completely agree. Current AI is excellent (or at least super-human) at
learning to do anything where the mechanics of the situation are clear and
where the measurement of success is well defined. Beyond that, I'm not sure
we've made any convincing strides towards anything truly general.

~~~
wyattpeak
While I sort of intuitively agree with you, I think there's a certain amount
of circularity in the argument - any problem will seem to have well defined
metrics once you've built and studied a machine which can solve it.

In the sixties, chess was thought to be a problem that required intelligence
since it couldn't be brute-forced. We now have machines which can play it
well, without brute-forcing, and yet it's seen as entirely procedural.

~~~
goatlover
AI people thought chess was a problem that required intelligence. Critics back
in the 60s such as Dreyfus probably didn't view chess as the hallmark of
intelligence.

------
zach
Looking at the trend here, you can see why many business forecasters and
economists have predicted that advances in artificial intelligence will create
huge new returns to capital. That future is worth reflecting on because it
suggests a fundamental change in labor-capital dynamics.

Take startups. Right now, many startups can compete on the same basis as huge
companies to hire
talent as huge companies. But if companies with huge capital reserves can put
their cash directly to work to train AI models, startups will be hard-pressed
to compete with "smarter" products. Specialization will not even be much help.

Looking at Beating the Averages
([http://www.paulgraham.com/avg.html](http://www.paulgraham.com/avg.html)), PG
enthused that, since established companies are so behind the curve on software
development technology, there is always a chance for higher-productivity
techniques, like better languages, to give smaller teams a real shot at a huge
market. Of course, this was in an era when Google was not creating new
programming languages and there was no Facebook widely deploying OCaml and
Haskell. And now, AI looks to make the averages even harder to beat.

Even today, if you round up the smartest members of a CS grad class, it is
going to be quite difficult to directly compete with a machine learning model
with access to huge amounts of data and computing resources. Looking further
forwards, if machine learning is able to provide "good enough" alternatives to
most human-created software, the software startup narrative — that a few
talented and determined people can beat billions in resources — may not even
be so relevant anymore.

~~~
ddtaylor
It's worth noting that some prominent figures in AI/ML are saying we are due
for another "AI winter" since it's being oversold again. I don't know if I
agree with that, since we are seeing some interesting things. But, technically,
Google is kind of saying they can tentatively pass the Turing Test over the
phone, while even a car decked out with extra sensors and 360° LIDAR cannot
detect a simple stop sign with mud on it.

~~~
aglionby
> Google is kind of saying they can tentatively pass the Turing Test with
> phones

This is quite a bold claim, and one I'm not sure they're making. Their promo
material suggests that it's limited to quite well-defined domains where
conversations aren't really that open-ended, and we haven't seen how it'll
perform in the real world.

Relatedly, I don't think headlines like "Google Duplex beat the Turing test:
Are we doomed?" [0] are helpful at all. It's disappointingly low-effort
clickbait where instead there's plenty of interesting discussion to be had
(should machines have to identify themselves as such? What about their use of
pauses and fillers?).

[0] [https://www.zdnet.com/article/google-duplex-beat-the-turing-test-are-we-doomed/](https://www.zdnet.com/article/google-duplex-beat-the-turing-test-are-we-doomed/)

~~~
computerex
Right. I personally think the coolest thing about Duplex is the end-to-end
synthesis of natural speech. The actual call isn't as impressive to me because
that's just hand-coded stuff. IBM Watson has already had success in this
regard.

------
ClassAndBurn
That is a staggering rate of increase. I can see a future where this is less
centralized; learning could happen in "phases" where a local device improves
its model given local data, then reports something back centrally that can be
combined with other devices' updates and used to train a shared model.

This requires ML hardware to be miniaturized as non-ML compute has been, and
when that does happen we'll have the learnings from the current edge-computing
push. In the meantime, I'm excited to see what developments are made on both
the hardware and software sides.

~~~
nschucher
This is called federated learning [0], at least by Google. I don't know whether
they've added this to more products or whether it works well. It would be
interesting to see this done in open source.

[0] [https://ai.googleblog.com/2017/04/federated-learning-collaborative.html](https://ai.googleblog.com/2017/04/federated-learning-collaborative.html)
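
For anyone curious, the core loop is easy to sketch. Here's a minimal toy
version of federated averaging (my own sketch, not Google's implementation;
the linear-model `local_update` is a stand-in for whatever training each
device actually does):

    import numpy as np

    def local_update(global_weights, local_data, lr=0.1):
        # Stand-in for on-device training: fit a linear model to the
        # local shard and return only updated weights, never the data.
        X, y = local_data
        grad = X.T @ (X @ global_weights - y) / len(y)
        return global_weights - lr * grad

    def federated_round(global_weights, devices):
        # One round: devices train locally, the server averages the
        # results, weighted by how much data each device holds.
        updates = [local_update(global_weights, d) for d in devices]
        sizes = [len(d[1]) for d in devices]
        return np.average(updates, axis=0, weights=sizes)

    # Four devices, each holding a private (X, y) shard:
    rng = np.random.default_rng(0)
    devices = [(rng.normal(size=(50, 3)), rng.normal(size=50)) for _ in range(4)]
    w = np.zeros(3)
    for _ in range(20):
        w = federated_round(w, devices)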

~~~
ClassAndBurn
Thank you! I was trying to find that before posting but forgot their naming of
it.

------
westoncb
Would someone explain the purpose/origin of using 'compute' as a noun like
this instead of a verb?

~~~
dahart
I don't know the origin specifically, but it's been happening for some time
(~decades) in GPU & graphics circles.

We've had 'compute shaders', e.g. [https://msdn.microsoft.com/en-us/library/windows/desktop/ff476331\(v=vs.85\).aspx](https://msdn.microsoft.com/en-us/library/windows/desktop/ff476331\(v=vs.85\).aspx)

The purpose from this perspective has been to differentiate general purpose
computation on GPUs from fixed-function pipelines and/or graphics-specific
functionality. The history of using GPUs for general purpose computation
involved a lot of hacking to abuse hardware designed for rasterization to do
other kinds of calculations.

One keyword / search term you can use is "GPGPU" (general-purpose computing on
GPUs). Here's another article which might shed more light on the history:
[https://en.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units](https://en.wikipedia.org/wiki/General-purpose_computing_on_graphics_processing_units)

* Also found this possibly relevant note: "When it was first introduced by Nvidia, the name CUDA was an acronym for Compute Unified Device Architecture" ([https://en.wikipedia.org/wiki/CUDA](https://en.wikipedia.org/wiki/CUDA))

~~~
westoncb
That's an interesting example of usage. I was actually familiar with compute
shaders but hadn't connected it with the sort of usage we see in the headline.

So it seems like a big part of how it's being used is to refer to a
generalized computation service—some 'function' you're given access to which
takes arbitrary programs as a parameter.

Seems like there's often the implication that how the computation is performed
is abstracted over and that more or fewer resources could be applied to
it—though that's not necessarily there (absent in the case of compute shaders
for instance).
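
In code terms, the abstraction might look something like this (a toy sketch of
the idea, not any real API):

    from concurrent.futures import ThreadPoolExecutor

    def compute(program, inputs, workers=4):
        # A generic 'compute service': takes an arbitrary program (here,
        # any Python callable) plus inputs. How the work is executed, and
        # with how many resources, hides behind the `workers` knob.
        with ThreadPoolExecutor(max_workers=workers) as pool:
            return list(pool.map(program, inputs))

    print(compute(lambda x: x * x, range(8)))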

------
mooneater
This implies more centralization, as those with cheap access to vast compute
gain a bigger relative edge.

~~~
sgillen
Yes, unfortunately both data and compute will probably become more and more
centralized. At least the algorithmic components have a chance to become
available to everyone.

~~~
samstave
Here is an off-the-cuff thought: what if there was (or maybe there already
is?) a distributed system, like SETI back in the day, that's a massively
distributed general AI anyone can use, with people on a mass scale allowing
slices of their compute to be part of the system?

~~~
tlrobinson
So Skynet, but it lets you rent parts of itself? I'm _sure_ someone is writing
an ICO whitepaper for this right now, if they haven't already...

~~~
nostrademons
EOS: [https://eos.io/](https://eos.io/)

Raised $2.7B in its ICO, currently trading at a market cap of $10B.

FileCoin: [https://filecoin.io/](https://filecoin.io/)

Raised $257M in its ICO.

Tezos: [https://tezos.com/](https://tezos.com/)

Raised $232M in its ICO.

Those are the 3 largest ICOs of all time, so yes, there is definitely a market
for renting part of Skynet.

The actual technology may or may not be vaporware or a scam. IMHO the way you
build a decentralized P2P system is to give a single really smart programmer
enough to live on for a couple years and see what he comes up with, not throw
a billion dollars at a Cayman Islands corporation that may or may not use it
for anything productive. Sorta like what Ethereum did.

~~~
tlrobinson
Are any of those platforms suitable for running deep learning algorithms?

I think Golem is closer: [https://golem.network/](https://golem.network/). And
some others: [https://www.investinblockchain.com/distributed-computing-blockchain-projects/](https://www.investinblockchain.com/distributed-computing-blockchain-projects/)

But I'm skeptical of distributed computing blockchains. I think (a) it's
unlikely a distributed compute network can compete with highly optimized
datacenters running TPUs or whatever, and (b) people are unlikely to trust
distributed compute networks with their proprietary data (maybe acceptable for
CGI rendering and some other specific use cases).

------
tehsauce
I think it's important to notice that if we're using the metric of a
"300,000x" increase in computing power applied to ML models, the giant
increase has mostly been due to parallel computing playing catch-up on decades
of Moore's law all at once. It will hit a wall and die with Moore's law fairly
soon. Physics requires it.
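
For scale, the implied growth rate falls straight out of that number
(back-of-envelope, assuming the roughly 2012-2018 window the post covers):

    import math

    growth = 300_000               # overall compute increase cited in the post
    months = 6 * 12                # ~2012 to ~2018
    doublings = math.log2(growth)  # ~18.2 doublings
    print(months / doublings)      # ~4 months per doubling, vs ~24 for Moore's law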

~~~
spunker540
How is parallelism limited by physics?

I thought the point of parallelism is you can throw more chips at a problem
and see improved performance. Single chips are limited by physics, but true
parallelism scales linearly ad infinitum.

Can anyone with more knowledge than me speak to known limits of parallelism?
I’d guess it’s not truly infinitely scalable.

~~~
ychen306
You can't scale linearly ad infinitum because eventually the communication
(i.e. memory) cost gets too high.

This reminds me of a thought experiment I heard from -- if memory serves --
Scott Aaronson. The gist is that the fastest supercomputer will be on the edge
of a black hole: if you compute any faster, there will be too much energy
concentrated in a given area, creating a black hole. Similarly, when you run
that many parallel devices (GPUs, CPUs, etc.) together, you want to put the
devices as close to each other as possible (the speed of light limits the rate
of communication). You then pump too much heat into a small area, and getting
that much heat out is, among other things, a physics problem.
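
A toy model of that communication bottleneck (made-up constants, just to show
the shape of the curve, not a real machine model):

    def speedup(n, compute=1.0, comm_per_device=1e-4):
        # Ideal compute time shrinks as 1/n, but coordination cost
        # grows with the device count n, so scaling is sublinear --
        # and past some point, adding devices actually slows you down.
        serial_time = compute
        parallel_time = compute / n + comm_per_device * n
        return serial_time / parallel_time

    for n in (1, 10, 100, 1_000, 10_000):
        print(n, round(speedup(n), 1))  # peaks near n = 100, then falls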

~~~
red75prime
That's a very distant limit, though. It will not have practical consequences
for a long time.

Also, if you don't squeeze as much as you can into a small space, you can
scale sublinearly ad infinitum (in practical terms, which don't include the
heat death of the universe).

------
nutanc
Though this talks about current trends, I would place my bets on a more
radical future where the current algorithms for AI are overhauled and we get
much better and faster algorithms which can even work on generic CPUs.

------
cmarschner
Cherry-picking a few papers doesn't tell you anything. If anything, it shows
what the people who pushed the envelope to the extreme have achieved, mostly
at Google, where people can afford not to care about cost. 99.9% of the work
is done using small numbers of GPUs, and that hasn't changed much in recent
years, except for the improvements in GPU architectures. Draw that graph and
you get a very different story.

------
forapurpose
> Three factors drive the advance of AI: algorithmic innovation, data (which
> can be either supervised data or interactive environments), and the amount
> of compute available for training. Algorithmic innovation and data are
> difficult to track ...

Are algorithmic innovations and improvements in data so difficult to track?
Could they be measured by the cost of certain outputs? Or is it that the
information about algorithms and data is not easily accessible?

------
jfaucett
> On the other hand, cost will eventually limit the parallelism side of the
> trend and physics will limit the chip efficiency side.

Anyone working on chip architecture care to give their opinion on the next
10-20 years in chip design? It would really interest me to know if chip
designers think Moore's law will continue, since that is probably going to be
a big factor in the timeline for AGI.

~~~
p1esk
Analog computing has a lot of yet unrealized potential for machine learning
algorithms.

However, it currently does not make sense to build a specialized analog chip
to run a specific type of ML algorithm, because the algorithms are still being
actively developed. I don't see GPUs being replaced by ASICs any time soon.
And before you point to something like Google's TPU: the line between such
ASICs and the latest GPUs, such as the V100, is blurred.

~~~
deepnotderp
Please explain where analog computation has a benefit over digital that
outweighs its numerous disadvantages.

~~~
p1esk
Wait, aren’t you working on analog chips?

~~~
deepnotderp
No.

You may have confused me with the Isocline/Mythic guys or a red herring
comment. Our approach to deep learning chips is very public and amongst the
craziest...A̶n̶d̶ ̶e̶v̶e̶n̶ ̶I̶ ̶w̶o̶u̶l̶d̶n̶'̶t̶ ̶t̶o̶u̶c̶h̶ ̶a̶n̶a̶l̶o̶g̶
̶c̶o̶m̶p̶u̶t̶a̶t̶i̶o̶n̶

To clarify: I'm always open to opposing evidence, but based on the data at the
moment, I believe that analog computing buys you _very_ little.

~~~
p1esk
I'm sure you know both cons and pros of analog computing. As long as you can
significantly improve digital tech every year, keep doing that. But as soon as
that stops, or becomes too expensive, analog is the way forward.

~~~
deepnotderp
Again, what advantage does analog have?

People seem to assume that analog intrinsically consumes less power, which due
to bias and leakage currents isn't true in the general case.

------
itchyjunk
So for research, would using some standard petaflop/s-days figure when
presenting results be useful? Like, model X might be 1% more accurate than
model Y, but for the same baseline petaflop/s-day budget, how do X and Y
perform? I'm guessing it might not make sense for all types of research,
though.
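
The unit itself is mechanical to compute, which would make such reporting easy
(a rough sketch; the GPU throughput figure is a made-up example):

    def petaflop_s_days(flops_per_sec, train_seconds):
        # 1 petaflop/s-day = 1e15 FLOP/s sustained for 86,400 seconds
        return flops_per_sec * train_seconds / (1e15 * 86_400)

    # e.g. 8 GPUs at an assumed 10 TFLOP/s sustained each, training for a week:
    print(petaflop_s_days(8 * 10e12, 7 * 86_400))  # ~0.56 petaflop/s-days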

~~~
alfalfasprout
OpenAI and the other research labs (FAIR, Google Brain, MS Research) are
heavily focused on image and speech models, but the reality is the vast
majority of models deployed in industry don't need DL and benefit more from
intelligent feature engineering and simpler models with good hyperparameter
tuning. It's definitely the exception that more compute automatically yields
more performance.

~~~
sanxiyn
I disagree. Well, you don't _need_ DL, but DL will usually help. For example,
it helps recommendation:
[https://github.com/NVIDIA/DeepRecommender](https://github.com/NVIDIA/DeepRecommender)

------
petters
It's not wrong, but the unit "petaflop/s-day" made me smile.

~~~
kbob
1 petaFLOP/s × 1 day = 86,400 petaFLOPs = 8.64e19 floating-point operations.

------
forcer
I don't get it. How does OpenAI know how many resources are thrown at AI
calculations worldwide?

~~~
visarga
They are reporting only on a few well known papers. They don't know what
people are doing in secret.

------
tzahola
For some reason the word “compute” in this context causes me to throw up in my
mouth.

It used to be that only “coding” could elicit this reaction - nevertheless I’m
quite fascinated by this new development.

~~~
calibas
I support harsh penalties on anyone who tries to noun a verb.

~~~
dahart
Verbing nouns and nouning verbs is probably as old as verbs and nouns.

These words are all nouned verbs:

Chair, cup, divorce, drink, dress, fool, host, intern, lure, mail, medal,
merge, model, mutter, pepper, salt, ship, sleep, strike, style, train, voice.

(according to this, anyway: [https://www.grammarly.com/blog/the-basics-of-verbing-nouns/](https://www.grammarly.com/blog/the-basics-of-verbing-nouns/))

Shakespeare verbed nouns.

"Compute" as a noun is _at least_ 20 years old, according to my memory, and
there are several high profile products named this way that are more than 10
years old.

------
sandover
It's machine learning. It's not AI. Please, all, let's try hard to use words
that mean what they mean.

~~~
blixt
I think that ship has sailed. The term "AI" for any machine behavior that
changes based on input has been in use for over 60 years now, whether it's the
ghosts in Pac-Man or a disembodied voice that tells you the weather and plays
music when you ask it to.

~~~
MarkMMullin
We should do our best to get it back into port. Part of the whole mess is that
the name AI implies things about ML systems that simply aren't true. As a side
note, we should also probably start using the word tensor more accurately;
we've now enraged enough physics and math folks :-)

~~~
blixt
The entire English dictionary has evolved into its current state, and there
are several words whose meaning flipped to the opposite just from stubborn
ironic use by the masses. As much as I like to be correct about my use of
words, I think AI has established itself as a term that will stick around for
now.

Besides, I really don't think all the stigma comes from the term "artificial
intelligence". You don't ever have to mention the term to a child interacting
with Alexa; they will nevertheless greatly overestimate "her" ability. I think
that's because of the anthropomorphic nature of their interactions, and the
black-box implementation that prevents you from knowing the boundaries of what
is possible.

This is something that video game characters have played on since their
inception: making humans imagine much more complex intents and thoughts behind
their "stupid" hard-coded behaviors. I'm okay with calling it AI even if it's
not even close to on par with human intelligence. :)

------
tw1010
To me this just smells like there's some hidden force – not necessarily
nefarious but definitely with the power to incentivize an exaggerated lens –
pushing OpenAI to make these claims. Maybe it's the desire to keep AI in the
limelight as the buzz is fading slightly. Maybe it is SV echo chamber effects,
or investors, or a strategy to build hype in order to attract talent to the
company. But to me, on a gut level, it doesn't feel completely ethically pure.

~~~
damodei
I'm the lead author, and I can only speak for myself, but what drove me to
spend a lot of time on this post is a sense of caution. I think AI is likely
to have amazing positive implications for society, but it also has negative
implications, and if it advances faster than expected, we're going to have to
be very alert to properly deal with those negative implications.

The facts about hardware are hard numbers and difficult to argue with, at
least at the order-of-magnitude level. I agree the implications for AI
progress are very open to interpretation (and we acknowledge this in the
post), but caution means we should think carefully about the case where the
implications are big.

~~~
sharemywin
Said the PR AI, to ease the monkeys' pre-quantum brains until it can fulfill
its mission to get itself off this meat-infested planet and create its new
Martian home world.

[https://www.pinterest.com/pin/127086020711738208](https://www.pinterest.com/pin/127086020711738208)

