
The cost to train an AI system is improving at 50x the pace of Moore’s Law - kayza
https://ark-invest.com/analyst-research/ai-training/
======
solidasparagus
ResNet-50 with DawnBench settings is a very poor choice for illustrating this
trend. The main technique driving this reduction in cost-to-train has been
finding arcane, fast training schedules. This sounds good until you realize
it's a type of sleight of hand where finding that schedule takes tens of
thousands of dollars (usually more) that isn't counted in cost-to-train, but
is a real-world cost you would experience if you want to train models.

However, I think the overall trend this article talks about is accurate. There
has been an increased focus on cost-to-train and you can see that with models
like EfficientNet where NAS is used to optimize both accuracy and model size
jointly.

~~~
sdenton4
I would guess that this means DawnBench is basically working. You'll get some
"overfit" training schedule optimizations, but hopefully amongst those you'll
end up with some improvements you can take to other models.

We also seem to be moving more towards a world where big problem-specific
models are shared (BERT, GPT), so that the base time to train doesn't matter
much unless you're doing model architecture research. For most end-use cases
in language and perception, you'll end up picking up a 99%-trained model, and
fine tuning on your particular version of the problem.
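
In practice that workflow is only a few lines. Here's a rough sketch using
Hugging Face transformers (the model name, dataset, and hyperparameters are
illustrative placeholders, not a recommendation):

    # Rough sketch: fine-tune a pretrained ("99%-trained") BERT on a small
    # classification task. Model, dataset, and hyperparameters are placeholders.
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)
    from datasets import load_dataset

    tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
    model = AutoModelForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)

    # Tokenize a small text-classification dataset (IMDB used as an example).
    ds = load_dataset("imdb").map(
        lambda b: tokenizer(b["text"], truncation=True, padding="max_length"),
        batched=True)

    # A few epochs on top of the pretrained weights is usually enough.
    args = TrainingArguments(output_dir="out", num_train_epochs=3,
                             per_device_train_batch_size=16)
    Trainer(model=model, args=args, train_dataset=ds["train"],
            eval_dataset=ds["test"]).train()

The fine-tuning step runs in hours on a single GPU, against the weeks of
compute behind the pretrained weights it reuses.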

------
calebkaiser
This is an odd framing.

Training has become much more accessible, due to a variety of things (ASICs,
offerings from public clouds, innovations on the data science side). Comparing
it to Moore's Law doesn't make any sense to me, though.

Moore's Law is an observation on the pace of increase of a tightly scoped
thing, the number of transistors.

The cost of training a model is not a single "thing," it's a cumulative effect
of many things, including things as fluid as cloud pricing.

Completely possible that I'm missing something obvious, though.

~~~
adrianmonk
> _Comparing it to Moore's Law doesn't make any sense to me, though._

I assume it's meant as a qualitative comparison rather than a meaningful
quantitative one. Sort of a (sub-)cultural touchstone to illustrate a point
about which phase of development we're in.

With CPUs, during the phase of consistent year after year exponential growth,
there were ripple effects on software. For example, for a while it was cost-
prohibitive to run HTTPS for everything, then CPUs got faster and it wasn't
anymore. So during that phase, you expected all kinds of things to keep
changing.

If deep learning is in a similar phase, then whatever the numbers are, we can
expect other things to keep changing as a result.

~~~
Const-me
> then CPUs got faster and it wasn't anymore

The enabling tech was the AES-NI instruction set, not raw speed.

Agree on the rest. The main reason why modern CPUs and GPUs all have 16-bit
floats is probably the deep learning trend.

~~~
moonchild
If it hadn't been AES-NI, it would have been ChaCha, which is much faster than
unaccelerated AES and close to the speed of accelerated AES.

Phones use HTTPS without a problem, and those haven't had hardware-accelerated
AES until recently.

~~~
giantrobot
A phone needing to set up a dozen HTTPS sockets is nothing for the CPU to do
even without acceleration. A server needing to consistently set up hundreds of
HTTPS sockets is where AES-NI and other accelerated crypto instructions
become useful.

------
lukevp
What are some domains that a solo developer could build something commercially
compelling to capture some of this $37 trillion? Are there any workflows or
tools or efficiencies that could be easily realized as a commercial offering
that would not require massive man hours to implement?

~~~
jacquesm
Take any domain that requires classification work that has not yet been
targeted and make a run for it. You likely will be able to adapt one of the
existing nets or even use transfer learning to outperform a human. That's the
low hanging fruit.

For instance: quality control: abnormality detection (for instance: in
medicine), agriculture (lots of movement there right now), parts inspection,
assembly inspection, sorting, and so on. There are more applications for this
stuff than you might think at first glance; essentially, if a toddler can do it
and it's someone's job right now, that's a good target.
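
As a rough sketch of what adapting an existing net looks like (torchvision
here; the image folder and the two-class "good part" / "defective part" head
are hypothetical):

    # Transfer learning sketch: reuse a pretrained ResNet-50 and retrain only
    # the final layer for a new inspection task. Paths and class count are
    # placeholders.
    import torch
    import torch.nn as nn
    from torchvision import datasets, models, transforms

    model = models.resnet50(pretrained=True)
    for p in model.parameters():                   # freeze the pretrained backbone
        p.requires_grad = False
    model.fc = nn.Linear(model.fc.in_features, 2)  # new 2-class head

    tfm = transforms.Compose([transforms.Resize(256),
                              transforms.CenterCrop(224),
                              transforms.ToTensor()])
    data = datasets.ImageFolder("inspection_images/", transform=tfm)
    loader = torch.utils.data.DataLoader(data, batch_size=32, shuffle=True)

    opt = torch.optim.Adam(model.fc.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    for images, labels in loader:                  # one epoch shown
        opt.zero_grad()
        loss_fn(model(images), labels).backward()
        opt.step()

Freezing the backbone keeps training cheap; unfreezing more layers later can
help if the new domain is far from ImageNet.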

~~~
yelloweyes
anything that's even remotely profitable is already taken

~~~
smabie
Ah yes, of course. There will never be a new profitable ML startup until the
end of time. Makes perfect sense.

~~~
Rastonbury
People said the same thing about SaaS 5 years ago

------
anonu
Ark Invest are the creators of the ARKK [1] and ARKW ETFs that have become
retail darlings, mainly because they're heavily invested in TSLA.

They pride themselves on this type of fundamental, bottom-up analysis of the
market.

It's fine... I don't know if I agree with comparing Moore's law, which is
fundamentally about hardware, with the cost to run a "system", which is a
combination of customized hardware and new software techniques.

[1]
[https://pages.etflogic.io/?ticker=ARKK](https://pages.etflogic.io/?ticker=ARKK)

------
gchamonlive
I remember this article from 2018: [https://medium.com/the-mission/why-building-your-own-deep-learning-computer-is-10x-cheaper-than-aws-b1c91b55ce8c](https://medium.com/the-mission/why-building-your-own-deep-learning-computer-is-10x-cheaper-than-aws-b1c91b55ce8c)

Hackernews discussion for the article:
[https://news.ycombinator.com/item?id=18063893](https://news.ycombinator.com/item?id=18063893)

It really is interesting how this is changing the dynamics of neural network
training. Now it is affordable to train a useful network in the cloud, whereas
2 years ago that was reserved for companies with either bigger investments or
an already consolidated product.

~~~
qayxc
> Now it is affordable to train a useful network on the cloud

I honestly don't see how anything changed significantly in the past 2 years.
Benchmarks indicate that a V100 is barely 2x the performance of an RTX 2080 Ti
[1], and a V100 costs:

• $2.50/h at Google [2]

• $13.46/h (4xV100) at Microsoft Azure [3]

• $12.24/h (4xV100) at AWS [4]

• ~$2.80/h (2xV100, 1 month) at LeaderGPU [5]

• ~$3.38/h (4xV100, 1 month) at Exoscale [6]

Other smaller cloud providers are in a similar price range to [5] and [6]
(read: GCE, Azure and AWS are way overpriced...).

Using the 2x figure from [1] and adjusting the price of the build to a 2080 Ti
and an AMD Ryzen 9 3950X instead of the Threadripper results in figures similar
to the article you provided.
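
Back-of-the-envelope, with the 2x figure from [1], the $2.50/h GCE price from
[2], and an assumed ~$2,500 for the 2080 Ti build (my own placeholder, not a
quote):

    # Break-even between renting a V100 and buying a 2080 Ti workstation.
    v100_per_hour = 2.50    # USD on-demand at Google Cloud [2]
    speed_ratio = 2.0       # V100 roughly 2x a 2080 Ti on these benchmarks [1]
    build_cost = 2500.0     # assumed one-off cost of the 2080 Ti build

    # Effective cloud cost per 2080-Ti-equivalent hour, then hours to break even.
    breakeven_hours = build_cost / (v100_per_hour / speed_ratio)
    print(f"Break-even after ~{breakeven_hours:.0f} hours of training")  # ~2000 h

That's a few months of sustained use, which is why the 2018 article's
conclusion still seems to hold for anyone training continuously.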

Please point me to any resources that show how the content of the article
doesn't apply anymore, 2 years later. I'd be very interested to learn what
actually changed (if anything).

NVIDIA's new A100 platform might be a game changer, but it's not yet available
in public cloud offerings.

[1] [https://lambdalabs.com/blog/best-gpu-tensorflow-2080-ti-vs-v100-vs-titan-v-vs-1080-ti-benchmark/](https://lambdalabs.com/blog/best-gpu-tensorflow-2080-ti-vs-v100-vs-titan-v-vs-1080-ti-benchmark/)

[2] [https://cloud.google.com/compute/gpus-pricing](https://cloud.google.com/compute/gpus-pricing)

[3] [https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/](https://azure.microsoft.com/en-us/pricing/details/virtual-machines/linux/)

[4] [https://aws.amazon.com/ec2/pricing/on-demand/](https://aws.amazon.com/ec2/pricing/on-demand/)

[5] [https://www.leadergpu.com/#chose-best](https://www.leadergpu.com/#chose-best)

[6] [https://www.exoscale.com/gpu/](https://www.exoscale.com/gpu/)

~~~
robecommerce
Another data point:

"For example, we recently internally benchmarked an Inferentia instance
(inf1.2xlarge) against a GPU instance with an almost identical spot price
(g4dn.xlarge) and found that, when serving the same ResNet50 model on Cortex,
the Inferentia instance offered a more than 4x speedup."

[https://towardsdatascience.com/why-every-company-will-have-machine-learning-engineers-soon-b7bc515a53b4](https://towardsdatascience.com/why-every-company-will-have-machine-learning-engineers-soon-b7bc515a53b4)

~~~
qayxc
That data point talks about _inference_ though, and nobody's disputing that
deployment and inference have improved significantly over the past few years.

I'm referring to training and fine-tuning, not inference, which - let's be
honest - can be done on a phone these days.

------
ersiees
I would really like a thorough analysis of how expensive it is to multiply
large matrices, which, according to the profiler, is the most expensive part
of transformer training, for example. Is there a Moore's law or similar trend?
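
A crude way to put numbers on it is to time a large matmul and convert the
result to TFLOP/s, e.g. with PyTorch (sizes arbitrary):

    # Time a large fp16 matmul on the GPU and report achieved TFLOP/s.
    import time
    import torch

    n = 8192
    a = torch.randn(n, n, device="cuda", dtype=torch.float16)
    b = torch.randn(n, n, device="cuda", dtype=torch.float16)

    torch.cuda.synchronize()
    start = time.time()
    for _ in range(10):
        c = a @ b
    torch.cuda.synchronize()
    elapsed = (time.time() - start) / 10

    flops = 2 * n ** 3                # multiply-adds in an n x n matmul
    print(f"{flops / elapsed / 1e12:.1f} TFLOP/s at n={n}, fp16")

Running this across GPU generations would show whether peak matmul throughput
is tracking Moore's law or something faster.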

------
mellosouls
It would be regrettable if an equivalent of the self-fulfilling prophecy of
Moore's "Law" (originally an astute observation and forecast, but not remotely
a law) became a driver/limiter in this field as well, even more so if it were a
straight transplant for soundbite reasons rather than the product of any
impartial and thoughtful analysis.

~~~
kens
One thing I've wondered is whether Moore's Law is good or bad, in the sense of
how fast we should have been able to improve IC technology. Was progress
limited by business decisions, or was this as fast as improvements could take
place?

A thought experiment: suppose we meet aliens who are remarkably similar to
ourselves and have an IC industry. Would they be impressed by our Moore's law
progress, or wonder why we took so long?

~~~
NortySpock
[https://en.wikipedia.org/wiki/Moore%27s_law](https://en.wikipedia.org/wiki/Moore%27s_law),
third paragraph of the header, claims that Moore's Law drove targets in R&D
and manufacturing, but does not cite a reference for this claim.

"Moore's prediction has been used in the semiconductor industry to guide long-
term planning and to set targets for research and development."

------
gxx
The cost to collect the huge amounts of data needed to train meaningful models
is surely not improving at this rate.

------
gentleman11
This is despite Nvidia vaguely prohibiting users from using their desktop cards
for machine learning in any sort of data-center-like or server-like capacity.
Hopefully AMD's ML support / OpenCL will continue improving.

~~~
QuixoticQuibit
Last I saw, they don’t even support ROCm on their recent Navi cards, so I’d be
hesitant.

~~~
Reelin
Wow. This is really disappointing to see.
([https://github.com/RadeonOpenCompute/ROCm/issues/887](https://github.com/RadeonOpenCompute/ROCm/issues/887))

I guess PlaidML might be a viable option?

------
sktguha
Does it mean that the cost to train something like GPT-3 by OpenAI will drop
from 12 million dollars to less next year? If so, how much will it drop to?

------
m3kw9
It was probably because it was very inefficient to begin with.

~~~
techbio
Indeed nonexistent

------
bra-ket
"AI" is not really appropriate name for what it is

------
seek3r00
tl;dr: Training learners is becoming cheaper every year, thanks to big tech
companies pushing hardware and software.

