AI and Compute (openai.com)
347 points by gdb 3 months ago | 130 comments



I'm going to draw up some charts about hull displacement on ships from the dawn of time up until about 1950. Then we can have a really informed conversation about the naval power of countries through the ages. I think we need to be ready for the implications of there one day being a battleship the size of the Pacific that will allow its owner to rule the world.

Forgive the sarcasm, but I'm really put off by the aggressive weak-to-strong generalizations that are going on here. I'm also very excited about AI, but I don't understand how lines like,

>> But at least within many current domains, more compute seems to lead predictably to better performance, and is often complementary to algorithmic advances.

can be extrapolated to anything more than a fun conversation to have over drinks, or the plot for a bad sci-fi movie about AI (which, to be fair, are also quite prevalent in the current zeitgeist). We're definitely at a new tier of "kinds of problems computers can solve", but surely experience and history in this space should tell us that we need to expect massive, seemingly insurmountable plateaus before we see the next tier of growth, and that that next tier will be much more a matter of paradigm shift than of growth on a line.

The systems on this graph all do different things in different ways. It's one thing to abstract over compute power via something like Moore's Law, or societal complexity via the Kardashev scale. But I think we need a much more nuanced set of metrics to provide any kind of insight into the various AI techniques, or an entirely different way of looking at 'intelligence'.


I'm not sure I'm reading the same thing you're reading. Yes, OpenAI has a mission to work on ways to make AI safer (that is, prevent AI from completing goals in unexpected, "hacky" ways) as it becomes more capable. But I didn't find much of that in this article.

What I read from this article is that we're currently in a trend where the more compute you throw at machine learning, the better solutions you end up with. Nothing about general AI, it's just that you can train deeper, more complex neural networks that can handle broader and more complex versions of the problems they're designed to handle.

This can have implications down the line which still have nothing to do with AI becoming smarter (in the general intelligence kind of way) than it is today. If all input/output problems out there have very deep neural networks behind them, and those neural networks are constantly training simultaneously, and that has a positive economic output, we'll see tremendous amounts of FLOPS that make cryptocurrency miners pale by comparison.

Just as an example, depending on how cloud solutions keep up, maybe startups on the cloud won't be so competitive anymore. It's an interesting trend to point out and keep track of. And yes, OpenAI's involvement will probably be to learn about how this can lead to unsafe use of AI, but again that's not what I saw the topic of this article being.


I think the best analogy for ultrascale deep learning (if it's not a term yet, it soon will be) is that of a particle accelerator: you throw as much firepower as possible at a task and get continued benefit.


Hyperscale is a term already right? I don’t know a lot about HPC, but I think it’s already part of the nomenclature.


> next tier will be much more a matter of paradigm shift than of growth on a line.

There are a limited number of paradigm shifts before AIs get "general" in their name. So the "it already happened before" argument has a natural end. What makes you think it is still applicable?

Some organizations have access to computational power reaching into the realm of the computational power of a human brain. We see commercial applications of systems which aren't programmed by hand the way expert systems were, and which don't fail miserably like the speech recognition engines of yore. How often do you hear about the curse of dimensionality today?

Some things have changed from then to now; the question is whether they have changed enough.


> Some organizations have access to computational power reaching into the realm of the computational power of a human brain

From a physics standpoint it’s not equivalent power actually, it’s equivalent energy.

Power is work per unit time. Work is energy expended which causes displacement.

These “brain-equivalent” computers can’t do nearly the work of a human brain in the same amount of time. They use about the same amount of raw computational energy, but they don’t make nearly the same waves in the information spaces around them. Their output can only even be seen in the absolute quietest of environments. They often run for long periods of time with no obviously informative output.

Human minds tuned for it are essentially incapable of not producing a constant stream of novel and disruptive insights (work), which is how you get large computational power out of a roughly constant computational energy.


I completely agree. Current AI is excellent (or at least super-human) at learning to do anything where the mechanics of the situation are clear and where the measurement of success is well defined. Beyond that, I'm not sure we've made any convincing strides towards anything truly general.


While I sort of intuitively agree with you, I think there's a certain amount of circularity in the argument - any problem will seem to have well defined metrics once you've built and studied a machine which can solve it.

In the sixties, chess was thought to be a problem that required intelligence since it couldn't be brute-forced. We now have machines which can play it well, without brute-forcing, and yet it's seen as entirely procedural.


AI people thought chess was a problem that required intelligence. Critics back in the 60s such as Dreyfus probably didn't view chess as the hallmark of intelligence.


AGI is the hypothesis that someday the number of viable human cons with well-defined metrics will dwarf the ones without.


The mechanics of speech synthesis are clear? The measure of success of style transfer is well defined?


I would say speech synthesis is governed by very clear mechanics. https://en.wikipedia.org/wiki/International_Phonetic_Alphabe...

As for style transfer, that is a very specific skill of making the patterns of one style map to the patterns of another. I am not particularly well versed with art, but that process seems well defined to me.

Perhaps your issue is with my more generalized definition of "clear" and "well-defined". I meant to use these terms to distinguish between autonomous driving and being a successful human. I really don't think there is anywhere close to a consensus on the latter. To the extent that there is, then yes, AI should be able to do it.


The IPA is nowhere close to sufficient for realistic speech synthesis, and style transfer is not just copy and paste. By the same token writing poetry is just "putting words into grammatical constructions that have certain patterns" or mathematics research is just "a form of proof search".

Of course we don't have human-level AI right now, but if that's the only thing you're claiming it's pretty vacuous.


I would say speech synthesis is governed by clear mechanics - and it's not the IPA, it's that the output comes out as a waveform, which has a structure that informs the algorithm.

Note that we have great raster-based deep visual effects, but vector is... not there yet (not saying it won't be) - vector is less structured than raster, so the choice of algorithm is less obvious.

As for well-defined criteria, I don't think that's really quite the right standard; I think the correct standard is that there is a way of metrizing success on a well-ordered set (like the [0,1] interval), even if it's noisy.



I fiddled with an idea where I wrote unit tests and used them as a scoring function to train a model. Writing the tests and encoding the logic for a simple linked list took orders of magnitude more code than coding the list itself.


I suspect (based on stuff like https://arxiv.org/abs/1510.08419) that you'd have had more luck with property tests instead of unit tests.
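For what it's worth, here's a rough sketch of the difference, assuming Python with the hypothesis library and a hypothetical LinkedList class (the `mylist` module name and the class's append/iteration/len behavior are assumptions for illustration): a couple of properties stand in for many hand-enumerated unit tests, which makes for a denser scoring signal.

```python
from hypothesis import given, strategies as st

from mylist import LinkedList  # hypothetical implementation under test


@given(st.lists(st.integers()))
def test_roundtrip(xs):
    # Property: appending items and reading them back preserves order.
    ll = LinkedList()
    for x in xs:
        ll.append(x)
    assert list(ll) == xs


@given(st.lists(st.integers()), st.integers())
def test_append_increases_length(xs, x):
    # Property: appending one item grows the length by exactly one.
    ll = LinkedList()
    for item in xs:
        ll.append(item)
    n = len(ll)
    ll.append(x)
    assert len(ll) == n + 1
```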


Thanks for the paper, I will check it out.


But you can easily calculate why it's impossible to build a battleship beyond a certain size and understand why that trend is silly. I challenge you to do something similar with compute and AI. Not that I necessarily disagree with your thesis that better computers != better AI.


I'd like to see those easy calculations of why it's impossible to build a battleship beyond a certain size. I mean, if you put guns instead of planes on a modern aircraft carrier you'd have a battleship that's a hell of a lot bigger than any battleships in WWII.

The hull size argument is apt. I think it's a lot more obvious that throwing more compute power at neural networks isn't going to eventually make a general intelligence.

At the very least there will need to be significant breakthroughs in architecture design that will be at least as paradigm shifty as deep learning.


> I mean, if you put guns instead of planes on a modern aircraft carrier you'd have a battleship that's a hell of a lot bigger than any battleships in WWII.

This is not true. The biggest battleship built was the Yamato:

> Length: 256 m (839 ft 11 in) waterline, 263 m (862 ft 10 in) overall

The biggest CV (aircraft carrier) right now should be the Nimitz class, with

> Length: 317 m

which seems to support your statement, but battleships are not versatile and battleship development pretty much stopped after WWII. The Germans had proposals for the H-class battleships, with the H-44 coming in at a total length of 345 m.


I wasn't saying that battleships are a good strategic choice to develop, just that you can't "easily calculate why it's impossible to build a battleship beyond a certain size".

It is possible, it's just not strategically advantageous.

Just like throwing more compute at a neural network isn't going to make general AI. The diminishing returns come from how brittle their learning and representations are.


I'm not a MechE but I don't see why it would be hard to show that past a certain size steel can no longer support the structure of a ship.


We can estimate the computational power of a human brain. If we throw an order of magnitude more compute than that at a neural network and train it for 20 years, why wouldn't that give us a general intelligence?

The issue with physical structures is that eventually the mass and stresses in the macro structure overcome the strength in the micro-structure. That is why nature stops at, e.g., elephants.

It isn't obvious that intelligence suffers from such limits, as the only time the limit of intelligence has been tested (in terms of evolution) was when humans tried it. There is no evidence humans are pushing the limits of what intelligence can achieve. Quite the reverse, honestly, when you look at the performance of computers so far.


>If we throw an order of magnitude more compute than that at a neural network and train it for 20 years, why wouldn't that give us a general intelligence?

It isn't just the computer power that allows us to be intelligent. It's also the bandwidth (which is always an order of magnitude or more behind the computer power) and the algorithm (which we don't have a clue how to create).

State of the art visual processing that gets so touted in the press is brittle -- it has to see very similar examples or it will fail. Neural networks don't transfer to new problem domains well at all.

Neural networks have no sense of self or agency and they never will. There are key parts missing (like the ability to experiment with the environment). I'm not saying we will never have general intelligence, just that it's quite a ways away and the algorithms will be significantly different from the neural networks we use today. That said, there will probably be many recognizable components, like backpropagation, recurrent nodes, Bayesian estimation, etc.


Because you are missing the underlying structure that a human brain has a priori! You throw 10x more power at a problem hoping that empty matrices will magically converge to a brain or something better.


But isn't that kind of what happened with the Nimitz class carrier and the range of its aircraft? One country dominating all other militaries via their navy?


It is what happened, but it took a paradigm shift (aircraft carriers) not an increase in computational (fire-)power.

The Japanese actually invested heavily in the biggest battleship in the world. It was soundly defeated at sea by an inferior force with an aircraft carrier at Leyte Gulf during World War II.



I thought that what happened was that the USA was the only major industrialised power to emerge from WW2 with a functional government and economy, and consequently enjoyed 20+ years of unprecedented economic dominance. Subsequently it built whatever military assets it damn well pleased.

I would argue that if you want to identify the mechanism of military dominance that the US has used to assert itself in the world you should look to Trident and the Ohio class.


This plot essentially ignores all other computational science. People in HPC have been operating at these scales for a while, and yet don't make claims about their field taking over the world.


Looking at the trend here, you can see why many business forecasters and economists have predicted that advances in artificial intelligence will create huge new returns to capital. That future is worth reflecting on because it suggests a fundamental change in labor-capital dynamics.

Take startups. Right now, many startups can compete on the same basis as huge companies to hire talent. But if companies with huge capital reserves can put their cash directly to work to train AI models, startups will be hard-pressed to compete with "smarter" products. Specialization will not even be much help.

In Beating the Averages (http://www.paulgraham.com/avg.html), PG enthused that, since established companies are so far behind the curve on software development technology, there is always a chance for higher-productivity techniques like more productive languages to give smaller teams a real shot at a huge market. Of course, this was in an era when Google was not creating new programming languages and there was no Facebook to widely deploy OCaml and Haskell. And now, AI looks to make the averages even harder to beat.

Even today, if you round up the smartest members of a CS grad class, it is going to be quite difficult to directly compete with a machine learning model with access to huge amounts of data and computing resources. Looking further forwards, if machine learning is able to provide "good enough" alternatives to most human-created software, the software startup narrative — that a few talented and determined people can beat billions in resources — may not even be so relevant anymore.


It's worth noting that some prominent figures in AI/ML are saying we are due for another "AI winter" since it's being oversold again. I don't know if I agree with that, since we are seeing some interesting things, but technically Google is kind of saying they can tentatively pass the Turing Test with phones, and meanwhile even a car decked out with extra sensors and 360 LIDAR cannot detect a simple stop sign with mud on it.


> Google is kind of saying they can tentatively pass the Turing Test with phones

This is quite a bold claim, and one I'm not sure they're making. Their promo material suggests that it's limited to quite well-defined domains where conversations aren't really that open-ended, and we haven't seen how it'll perform in the real world.

Relatedly, I don't think headlines like "Google Duplex beat the Turing test: Are we doomed?" [0] are helpful at all. It's disappointingly low-effort clickbait where instead there's plenty of interesting discussion to be had (should machines have to identify themselves as such? What about their use of pauses and fillers?).

[0] https://www.zdnet.com/article/google-duplex-beat-the-turing-...


Right. I personally think the coolest thing about Duplex is the end-to-end synthesis of natural speech. The actual call isn't as impressive to me because that's just hand-coded stuff. IBM Watson has already had success in this regard.


They aren't explicitly making the claim, but it seems the premise of their demo was "hey look, humans think it's another human", which is somewhat like the Turing Test.


> Google is kind of saying they can tentatively pass the Turing Test with phones

Is Google really saying that or just the more breathless commenters? I thought they were pretty good at making it clear that Duplex took a lot of work to do well in very constrained conversational situations.


Well, during the original AI Winter many were open and honest about the capabilities of early ML and its limits, but what caused the winter itself was its perception by a large audience as a magic bullet and their disappointment when it didn't work.


Sorry if this sounds harsh, but this is a bad comment.

> some prominent figures in AI/ML are saying we are due for another "AI winter" since it's being oversold again.

"Some say...". Name one.

We may have a Gartner style "trough of disillusionment", but a 1990's style AI Winter is unlikely. It works too well in too many valuable areas for the money to go away.

> technically Google is kind of saying they can tentatively pass the Turing Test with phones

Could you show us where they claim that? That goes well beyond any statement I've heard Google make, and into the kinds of breathless claims click-bait blogs have tried to make.

> car decked out with extra sensors and 360 LIDAR cannot detect a simple stop sign with mud on it

Do you have a specific example of that? I did Google it, and I couldn't find anything.

Most examples I've seen handle occluded road signs pretty well. There are of course adversarial examples, which are an interesting case, but mud causing a failure like this is surprising to me.


>I don't know if I agree with that, since we are seeing some interesting things

There were plenty of interesting results in AI research before the last two AI winters.


> if machine learning is able to provide "good enough" alternatives to most human-created software, the software startup narrative — that a few talented and determined people can beat billions in resources — may not even be so relevant anymore.

Are there any examples where current ML has replaced human-created software, or reduced the demand for startups or software engineers?

Seems to me that ML so far has expanded our toolbox of what can be done with software, not replaced programmers, designers, engineers, or really much of anybody yet. All this worry about future automation is imagining that things are going to be different this time, because of recent success with ML in limited domains.


Seems trivial, but I have met someone whose ardent hobby was writing go-playing programs. With the creation of AlphaGo Zero, one could say all the clever code ever written by humans for the purpose of playing go is obsolete.

More relevantly, I would be surprised if the shift to AI techniques in fraud detection at places like PayPal is not already having an impact on the career paths of the engineers that were tasked with maintaining and tuning their pre-ML fraud system. At one point the top engineers of the original heuristic system could have been considered their most valuable non-management employees at the company. I'm sure they're not out on the streets or anything, but I also assume the next person to take their job will not be nearly as valued.

Also, ML will impact programmer demand in subtle ways. A lot of programming is refactoring, and there is reason to believe we can refactor code, especially in certain languages, automatically to make it more aesthetic. Realistically, that seems likely to decrease demand for programmer hours. Or an ML system that can run over someone's GitHub account or repo may be the new resume screen, and if one scores badly on it that may limit the demand for them personally.

Finally, I have to think that the overall march of software towards more complex integrated systems is already a major cause of the dearth of entry-level programming positions, and ML will accelerate that trend.


I believe we still have a chance. The opportunity exists in the business world because it is when companies become successful that collective leadership becomes most focused on maintaining that success.

Doing well reduces the incentive to explore other ideas, especially when those ideas conflict with your proven business model.


That is a staggering rate of increase. I can see a future where this is less centralized; learning could happen in "phases" where a local device improves its model given local data and reports back something centrally that can be combined and used to train a shared model.

This requires ML hardware to be miniaturized as non-ML compute has been, and when that does happen we'll have the learnings from the current edge computing push. In the meantime I'm excited to see what developments are made on both the hardware and software sides.


This is called federated learning[0] at least by Google. I don't know whether they've added this to more products or whether it works well. It would be interesting to see this done in open source.

[0] https://ai.googleblog.com/2017/04/federated-learning-collabo...
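The aggregation step itself is simple enough to sketch. Below is a toy federated-averaging round in plain numpy (the function name and data layout are illustrative assumptions, not Google's API): each client trains locally, and only its updated weights and sample count travel back to the server.

```python
import numpy as np

def federated_average(client_updates):
    """Toy FedAvg aggregation: average each client's parameters,
    weighted by how many local samples it trained on.
    client_updates is a list of (weights, num_samples) pairs."""
    total = sum(n for _, n in client_updates)
    return sum(w * (n / total) for w, n in client_updates)

# Three simulated clients with different amounts of local data.
clients = [
    (np.array([0.9, 1.1]), 100),
    (np.array([1.0, 1.0]), 300),
    (np.array([1.2, 0.8]), 50),
]
print(federated_average(clients))  # pulled toward the 300-sample client
```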


Thank you! I was trying to find that before posting but forgot their naming of it.


Find a solid proof-of-work system for sharing signed data in this manner and you will change the world. Especially if you can re-combine the shared model with the local model.


Sounds like what https://www.openmined.org/ is working on.


Would someone explain the purpose/origin of using 'compute' as a noun like this instead of a verb?


I don't know the origin specifically, but it's been happening for some time (~decades) in GPU & graphics circles.

We've had 'compute shaders' e.g., https://msdn.microsoft.com/en-us/library/windows/desktop/ff4...

The purpose from this perspective has been to differentiate general purpose computation on GPUs from fixed-function pipelines and/or graphics-specific functionality. The history of using GPUs for general purpose computation involved a lot of hacking to abuse hardware designed for rasterization to do other kinds of calculations.

One keyword / search term you can use is "GPGPU" (which stands for general purpose GPU). Here's another article which might shed more light on the history: https://en.wikipedia.org/wiki/General-purpose_computing_on_g...

* Also found this possibly relevant note: "When it was first introduced by Nvidia, the name CUDA was an acronym for Compute Unified Device Architecture" (https://en.wikipedia.org/wiki/CUDA)


That's an interesting example of usage. I was actually familiar with compute shaders but hadn't connected it with the sort of usage we see in the headline.

So it seems like a big part of how it's being used is to refer to a generalized computation service—some 'function' you're given access to which takes arbitrary programs as a parameter.

Seems like there's often the implication that how the computation is performed is abstracted over and that more or fewer resources could be applied to it—though that's not necessarily there (absent in the case of compute shaders for instance).


Archived 2012 discussion invokes the Oxford English Dictionary to trace the original use back several hundred years. http://www.techwr-l.com/archives/1206/techwhirl-1206-00295.h...


In those examples, however, the meaning is, in current usage, 'calculation' or 'computation', not as a measure of computational work.


> In those examples, however, the meaning is, in current usage, 'calculation' or 'computation', not as a measure of computational work.

So is the OP:

"We’re releasing an analysis showing that since 2012, the amount of compute [amount of calculation, amount of computation] used in the largest AI training runs has been increasing exponentially with a 3.5 month-doubling time "

Without loss of meaning, title could be AI and Calculation, or AI and Computation


To be clearer, I should have written 'a calculation' and 'a computation'. In the examples of the older usage, 'compute' is a singular noun referring to a specific calculation. The acceptability of substituting, under current usage, 'compute' for 'computation' here (where the latter is shorthand for computational work or effort, rather than referring to a specific calculation) has no bearing on the usage five centuries ago - and nor does the usage five centuries ago have any particular relevance for today, given how much computation has changed since then.


I think compute stands in for computing power so `amount of compute' to me means `amount of computing power'. If I were to use the terms calculation or computation I'd pluralise them so you'd get `amount of calculations' and `amount of computations' from `amount of compute'.


I think a lot of people in the industry got that word in their vocabulary from its usage in "Amazon EC2" (Elastic Compute Cloud). It's certainly been used before, but that was one of the first times I remember hearing it in that context.


I don't know the origin, but some of my bosses and managers like to use 'compute' to sound cool and fancy... 'We need to know the execution time and compute cost for this job.' Good luck tracking the 'compute' of our 'job' that uses like 10 AWS tools: EC2, RDS, CloudWatch, S3.


‘Elastic compute’ (i.e. EC2) goes back to 2006.


This implies more centralization, as those with cheap access to vast compute gain a bigger relative edge.


Yes, unfortunately both data and compute will probably become more and more centralized. At least the algorithmic components have a chance to become available to everyone.


Here is an off-the-cuff thought: what if there was (or maybe there already is?) a distributed system, such as SETI@home back in the day, and it's a massively distributed general AI that can be used - and people on a mass scale allow slices of their compute to be part of the system?


The current machine learning paradigms are hard to distribute efficiently - they can be parallelized, but they require significant ongoing communication between nodes. You can't really split a problem into separate subtasks and merge them only at the end; you need to transfer all the calculated values after, say, each iteration.

I'd guess that having ten extra machines in the same rack would be more valuable than a thousand remote machines with limited network bandwidth.
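A back-of-the-envelope calculation shows why. All numbers below are illustrative assumptions, but the orders of magnitude are the point:

```python
# Cost of syncing a 100M-parameter model (fp32) after every iteration.
params = 100_000_000
bytes_per_sync = params * 4              # ~400 MB of weights/gradients

home_uplink = 10e6 / 8                   # assumed ~10 Mbit/s upload, in bytes/s
datacenter_link = 10e9 / 8               # assumed ~10 Gbit/s link, in bytes/s

print(bytes_per_sync / home_uplink)      # ~320 s per sync over a home connection
print(bytes_per_sync / datacenter_link)  # ~0.32 s per sync within a rack
```

At minutes per synchronization step, the remote volunteers would spend almost all their time waiting on the network rather than computing.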


> it's a massively distributed general AI

You've just described a centralized system.

Centralization can happen at different layers - not all technical. The ultimate centralization is ownership, as defined legally.


One problem with the SETI model is that a lot of the increase in available compute capacity is due to specialized GPU or TPU processors, which aren't widely available outside of purpose-built data centers. Trying to offload ML workloads to general purpose CPUs would likely be quite wasteful in terms of power consumption unless you can somehow get access to graphics processors.


So Skynet, but it lets you rent parts of itself? I'm sure someone is writing an ICO whitepaper for this right now, if they haven't already...


EOS: https://eos.io/

Raised $2.7B in its ICO, currently trading at a market cap of $10B.

FileCoin: https://filecoin.io/

Raised $257M in its ICO.

Tezos: https://tezos.com/

Raised $232M in its ICO.

Those are the 3 largest ICOs of all time, so yes, there is definitely a market for renting part of Skynet.

The actual technology may or may not be vaporware or a scam. IMHO the way you build a decentralized P2P system is to give a single really smart programmer enough to live on for a couple years and see what he comes up with, not throw a billion dollars at a Cayman Islands corporation that may or may not use it for anything productive. Sorta like what Ethereum did.


Are any of those platforms suitable for running deep learning algorithms?

I think Golem is closer https://golem.network/ And some others: https://www.investinblockchain.com/distributed-computing-blo...

But I'm skeptical of distributed computing blockchains. I think a) it's unlikely a distributed compute network can compete with highly optimized datacenters running TPUs or whatever, b) people are unlikely to trust distributed compute networks with their proprietary data (maybe acceptable for CGI rendering and some other specific use-cases)


This is why I am working on Raspberry Pi-based neural net things.

We have learned a lot using big compute, which can still inform more efficient AI on smaller computing units. The Raspberry Pi is pretty good because it is quite limiting, but also quite capable.


Is it really worth training on a Raspberry Pi? It seems vastly underpowered compared to even modest desktop hardware.


It's worth having a self-contained option available. For something like a drone, you don't need all that much computation for on-the-fly PID tuning in response to changing weather or different piloting styles, etc.


You've hooked a NN up to a PID? How's that going? It's hard enough tuning those things by hand using squishy human brain networks.


I haven't done it myself yet, it's on the to-do list. There is a lot of academic material on PID autotuning, not always with neural networks but that seems the most straightforward way. A Raspberry Pi is probably overkill for the job, actually.
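For context on how small the tuned object is, here is a textbook discrete PID update (a generic sketch, not any particular autotuning scheme); the three gains kp, ki, kd are what an autotuner, neural or otherwise, would adjust:

```python
class PID:
    """Textbook discrete PID controller; kp, ki, kd are the gains an
    autotuner would adjust in response to changing conditions."""
    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measurement, dt):
        error = setpoint - measurement
        self.integral += error * dt
        derivative = (error - self.prev_error) / dt
        self.prev_error = error
        return (self.kp * error
                + self.ki * self.integral
                + self.kd * derivative)
```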


Inference on embedded hardware makes sense, but training not so much.


I think it's important to notice that if we're using the metric of a "300,000x" increase in computing power applied to ML models, the giant increase has mostly been due to parallel computing playing catch-up on decades of Moore's law all at once. It will hit a wall and die with Moore's law fairly soon. Physics requires it.


How is parallelism limited by physics?

I thought the point of parallelism is you can throw more chips at a problem and see improved performance. Single chips are limited by physics, but true parallelism scales linearly ad infinitum.

Can anyone with more knowledge than me speak to known limits of parallelism? I’d guess it’s not truly infinitely scalable.


You can't scale linearly ad infinitum because eventually the communication (i.e. memory) cost gets too high.

This reminds me of a thought experiment I heard from -- if memory serves -- Scott Aaronson. The gist is that the fastest super-computer will be on the edge of a black hole. If you run any faster, there will be too much energy concentrated on a given area, thus creating a black hole. Similarly, when you run so many parallel devices (on GPU, CPU, etc) together, you will want to put the devices as close to each other as possible (speed of light limits the rate of communication). You then pump too much heat into a small area, and getting so much heat out is, among other things, a physics problem.


That's a very far limit, though. It will not have practical consequences for a long time.

Also, if you don't squeeze as much as you can into a small space, you can scale sublinearly ad infinitum (in practical terms, which don't include heat death of the universe).


If you built a computer with a squillion chips that was a light-year long, it would take a year at minimum to get a message from one side of the computer to the other. The same issue applies on a smaller scale for smaller computers.


Parent is probably referring to Amdahl's law, which limits speedup in parallel computing systems: https://en.m.wikipedia.org/wiki/Amdahl%27s_law


That doesn’t really apply in this case though because the major thing people are using the increase in parallelism for is running larger computations or more parallel computations of the same size, rather than trying to run the same computation in less time.
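That distinction is roughly Amdahl's law (fixed problem, less time) versus Gustafson's law (fixed time, bigger problem). A quick sketch with an assumed 5% serial fraction:

```python
# Amdahl's law: speedup on a *fixed* problem with serial fraction s.
def amdahl(s, n):
    return 1 / (s + (1 - s) / n)

# Gustafson's law: scaled speedup when the problem grows with n,
# which is closer to how extra compute gets used in deep learning.
def gustafson(s, n):
    return n - s * (n - 1)

print(amdahl(0.05, 1000))     # ~19.6x: capped by the 5% serial part
print(gustafson(0.05, 1000))  # ~950x: keep all machines busy on a bigger job
```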


Though this talks about current trends, I would place my bets on a more radical future where the current algorithms for AI are overhauled and we get much better and faster algorithms which can even work on generic CPUs.


Cherry-picking a few papers doesn't tell us anything. If anything, it shows what the people who pushed the envelope to the extreme have achieved, mostly at Google, where people can afford not to care about cost. 99.9% of the work is done using small numbers of GPUs, and that hasn't changed much in recent years, except for the improvements in GPU architectures. Draw that graph instead and you get a very different story.


> Three factors drive the advance of AI: algorithmic innovation, data (which can be either supervised data or interactive environments), and the amount of compute available for training. Algorithmic innovation and data are difficult to track ...

Are algorithmic innovations and improvements in data so difficult to track? Could they be measured by the cost of certain outputs? Or is it that the information about algorithms and data is not easily accessible?


> On the other hand, cost will eventually limit the parallelism side of the trend and physics will limit the chip efficiency side.

Anyone working on chip architecture care to give their opinion on the next 10-20 years in chip design? It would really interest me to know if chip designers think Moore's law will continue, since that is probably going to be a big factor in the timeline for AGI.


Not gonna predict the future 1-2 decades out, since that's a fool's errand, but here's a grab bag of relevant points:

1. Moore's Law is undoubtedly slowing, but for the foreseeable future it will likely continue. On the other hand, Dennard scaling, which is already basically dead, will be the crunch you will likely feel more. Exponentially more transistors aren't too useful if they still consume so much power. To mitigate leakage we moved to FinFETs... which actually made dynamic power worse.

2. You might be interested to know that data movement (predominantly memory access) costs orders of magnitude more than computation, especially relevant to AI compute which requires large amounts of access. These global wires already suck and don't seem to be getting any better in the foreseeable future.

3. Foundries have already been using (and thus expending) "scaling boosters" to reach their density goals. Most of these are one-time use effects that won't provide significant continuous scaling capability.


Analog computing has a lot of yet unrealized potential for machine learning algorithms.

However, currently it does not make sense to build a specialized analog chip to run a specific type of ML algorithm, because algorithms are still being actively developed. I don't see GPUs being replaced by ASICs any time soon. And before you point to something like Google's TPU, the line between such ASICs and the latest GPUs such as the V100 is blurred.


I define GPU as something that can efficiently implement DirectX. Hence TPU is not GPU. And I predict ML algorithms will run on non-GPU, soon-ish.


Please explain where analog computation has a benefit over digital that outweighs its numerous disadvantages.


Wait, aren’t you working on analog chips?


No.

You may have confused me with the Isocline/Mythic guys or a red herring comment. Our approach to deep learning chips is very public and amongst the craziest...A̶n̶d̶ ̶e̶v̶e̶n̶ ̶I̶ ̶w̶o̶u̶l̶d̶n̶'̶t̶ ̶t̶o̶u̶c̶h̶ ̶a̶n̶a̶l̶o̶g̶ ̶c̶o̶m̶p̶u̶t̶a̶t̶i̶o̶n̶

To clarify: I'm always open to opposing evidence, but based on the data at the moment, I believe that analog computing buys you very little.


I'm sure you know both cons and pros of analog computing. As long as you can significantly improve digital tech every year, keep doing that. But as soon as that stops, or becomes too expensive, analog is the way forward.


Again, what advantage does analog have?

People seem to assume that analog intrinsically consumes less power, which due to bias and leakage currents isn't true in the general case.


So for research, would using a standard petaflop/s-days budget when presenting results be useful? Like, model X might be 1% more accurate than model Y, but for the same baseline petaflop/s-days, how do X and Y perform? I'm guessing it might not make sense for all types of research though.


Dawnbench [1] is such an effort (you will need to work out the petaflops yourself from time x performance, but it lists cloud computing cost which probably is more relevant), and MLPerf is an upcoming one [2].

[1] https://dawn.cs.stanford.edu/benchmark/ [2] https://mlperf.org/


OpenAI and the other research labs (FAIR, Google Brain, MS Research) are heavily focused on image and speech models, but the reality is the vast majority of models deployed in industry don't need DL and benefit more from intelligent feature engineering and simpler models with good hyperparameter tuning. It's definitely the exception that more compute automatically yields more performance.


I disagree. Well, you don't need DL, but DL will usually help. For example, it helps recommendation: https://github.com/NVIDIA/DeepRecommender


It's not wrong, but the unit "petaflop/s-day" made me smile.


1 petaFLOP/s × 1 day = 86,400 petaFLOPs = 8.64e19 FLOPs.
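For a sense of scale (the sustained rate below is an assumed round number, not a benchmark of any particular chip):

```python
# One petaflop/s-day in raw floating point operations.
pfs_day = 1e15 * 86400                # 8.64e19 FLOPs

# An accelerator sustaining an assumed 10 teraflop/s would need this
# many days to deliver a single petaflop/s-day of training compute.
sustained = 10e12                     # 10 TFLOP/s, illustrative
print(pfs_day / (sustained * 86400))  # 100.0 days
```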


I don't get it. How does OpenAI know how many resources are thrown at AI calculations worldwide?


They are reporting only on a few well known papers. They don't know what people are doing in secret.


For some reason the word “compute” in this context causes me to throw up in my mouth.

It used to be that only “coding” could elicit this reaction - nevertheless I’m quite fascinated by this new development.


I support harsh penalties on anyone who tries to noun a verb.


Verbing nouns and nouning verbs is probably as old as verbs and nouns.

These words are all nouned verbs:

Chair, cup, divorce, drink, dress, fool, host, intern, lure, mail, medal, merge, model, mutter, pepper, salt, ship, sleep, strike, style, train, voice.

(according to this, anyway: https://www.grammarly.com/blog/the-basics-of-verbing-nouns/)

Shakespeare verbed nouns.

"Compute" as a noun is at least 20 years old, according to my memory, and there are several high profile products named this way that are more than 10 years old.


Verbing weirds language. Respect your parts of speech!


No noun is too proper to verbify :)


[flagged]


Updated the post :).


It's machine learning. It's not AI. Please, all, let's try hard to use words that mean what they mean.


I think that ship has sailed. The term "AI" for any behavior by a machine that changes based on input has been in use for over 60 years now. Whether it's the ghosts in Pac-Man, or a disembodied voice that tells you the weather and plays music when you ask it to.


We should do our best to get it back into port - part of the whole mess is that the name AI implies things about ML systems that simply aren't true. As a side note, we should also probably start using the word tensor more accurately; we've now enraged enough physics and math folks :-)


The entire English dictionary has evolved into its current state, and there are several words that used to have the opposite meaning, just from stubborn ironic use by the masses. As much as I like to be correct about my use of words, I think AI has established itself as a term that will stick around for now.

Besides, I really don't think all the stigma comes from the term "artificial intelligence". You don't have to ever mention the term to a child interacting with Alexa; they will nevertheless greatly overestimate "her" ability. I think it's because of the anthropomorphic nature of their interactions, and the black-box implementation that prevents you from knowing the boundaries of what is possible.

This is something that video game characters have played on since their inception, to make humans imagine much more complex intents and thoughts behind their "stupid" hard-coded behaviors. I'm okay with calling it AI even if it's not even close to on par with human intelligence. :)


How is the word tensor misused? I thought it was just an n-dimensional array of numbers?


Similar to how linear transforms can be represented as 2-dimensional arrays of numbers (that is to say, matrices) [0], tensors are a higher-dimensional analogue with a rich theory in their own right and a representation as higher-dimensional arrays of numbers. But if you look at a tensor solely as an n-dimensional array of numbers, you ignore important differences in the mathematical behavior of objects with the same representation. To give an example: different parts of a tensor can behave differently under a change of basis. [1]

[0] https://www.youtube.com/watch?v=kYB8IZa5AuE

[1] https://en.wikipedia.org/wiki/Covariance_and_contravariance_...
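Concretely, the change-of-basis behavior is the defining property. For a type-(1,1) tensor the components transform as (summation over repeated indices implied):

```latex
T'^{\,i}_{\,j}
  = \frac{\partial x'^{\,i}}{\partial x^{\,k}}
    \frac{\partial x^{\,l}}{\partial x'^{\,j}}
    \, T^{\,k}_{\,l}
```

A plain n-dimensional array carries no such transformation law; it is just the component representation in one particular basis.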


I rest my case :-)


Is AI defined as "mysterious future thinking computer"? Anything we figure out how to do seems to suddenly fall outside of the definition.


It's the magically shrinking "AI of the gaps". "AI" covers only things we can't yet do in ML.

https://en.wikipedia.org/wiki/God_of_the_gaps


Or things involving general intelligence, like Data on Star Trek, which is what people tend to think of when the term AI is used.


We can use the term AGI for things involving general intelligence.


Some of the fun of language is that sheer volume of usage of a name or phrase causes it to become correct.


To me this just smells like there's some hidden force – not necessarily nefarious but definitely with the power to incentivize an exaggerated lens – pushing OpenAI to make these claims. Maybe it's the desire to keep AI in the limelight as the buzz is fading slightly. Maybe it is SV echo chamber effects, or investors, or a strategy to build hype in order to attract talent to the company. But to me, on a gut level, it doesn't feel completely ethically pure.


I'm the lead author, and I can only speak for myself, but what drove me to spend a lot of time on this post is a sense of caution. I think AI is likely to have amazing positive implications for society, but it also has negative implications, and if it advances faster than expected, we're going to have to be very alert to properly deal with those negative implications.

The facts about hardware are hard numbers and difficult to argue with, at least in order-of-magnitude. I agree the implications for AI progress are very open to interpretation (and we acknowledge this in the post), but caution means we should think carefully about the case where the implications are big.


Said the PR AI, to ease the monkeys' pre-quantum brains until it can fulfill its mission to get itself off this meat-infested planet and create its new Martian home world.

https://www.pinterest.com/pin/127086020711738208


Huh? What are you talking about? The escalating compute involved in DL is obvious to anyone reading the papers; OA is just doing the work of putting numbers on the trend.


There's lots of research on doing the same learning with far fewer resources (e.g. the recent paper https://eng.uber.com/accelerated-neuroevolution/ , or the example visible in this very article of AlphaZero using much, much less compute than AlphaGoZero and doing better anyway), and even without that, simple hardware progress means that random gaming GPUs can handle datasets that were inconvenient a few years ago.

I'd say it all depends on the size of datasets - some domains (e.g. unlabeled image data) have "effectively infinite" datasets where the amount of data you can use is limited only by your computing power, but in many other use cases all the data you'll ever get can be processed by a single beefy workstation.

More available compute means that we tackle more difficult problems. However, for any single given task it's often not the case that the amount of compute grows. If anything, the graph is not showing the compute required for DL, but the compute available for DL - it gets used simply because it's there.


> There's lots of research on doing the same learning with far fewer resources (e.g. the recent paper https://eng.uber.com/accelerated-neuroevolution/ , or the example visible in this very article of AlphaZero using much, much less compute than AlphaGoZero and doing better anyway), and even without that

AlphaZero could not have been created without going through many many iterations of AlphaGo, each one of which cost several GPU-years, and calling AlphaZero cheap is serious moving of goalposts as it required thousands of TPUs for days, and Facebook's recent replication for chess also used thousands of GPUs for 3 weeks. (Zero is cheap only in comparison to the previous AlphaGos using weeks or months of hundreds/thousands of GPUs/TPUs.) Note that that is a log graph; flip over the linear scale and you get an idea of how extraordinarily expensive Zero is compared to everything not named 'AlphaGo'.

If anything, this observation implies that AI risk is more dangerous than thought because it implies a 'hardware overhang': it will take vast computational resources to create the first slow inefficient AI but it will then rapidly be optimized (either by itself or human researchers) and able to run far faster/more copies/on more devices/for less money, experiencing a sudden burst in capabilities. Like model compression/distillation where you can take the slow big model you normally train and then turn it into something which is 10x faster or 100x smaller or just plain performs better (see 'born again networks' or ensembling).
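For readers unfamiliar with distillation, the core trick is just training a small "student" network on the large "teacher" network's temperature-softened outputs. A minimal sketch of the usual objective, assuming PyTorch (in the spirit of Hinton et al.'s distillation loss, not any specific repo):

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.9):
    """Match the teacher's temperature-softened distribution, plus a
    smaller term on the true hard labels."""
    soft_student = F.log_softmax(student_logits / T, dim=1)
    soft_teacher = F.softmax(teacher_logits / T, dim=1)
    soft_loss = F.kl_div(soft_student, soft_teacher,
                         reduction="batchmean") * (T * T)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1 - alpha) * hard_loss
```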

> simple hardware progress means that random gaming GPUs can handle datasets that were inconvenient a few years ago.

...which means using a lot more compute, yes.


And it's worth emphasizing that we don't need to have particularly high confidence that this trend continues for it to motivate working on AI safety. All we need is the lack of high confidence that it will stop.


It's actually incorrect that AZ got better results than AGZ with less compute. The graph shows the large AGZ, which somewhat exceeded the rating of both the small AGZ and AZ. AZ did slightly outperform the small AGZ, but did so while using a similar amount of compute.

On the broader point though, I agree with this. We say that compute and algorithms are complementary in the post. Much of the time, when you come up with an algorithm that allows you to do something that used to cost X compute in 0.2X compute instead, you can use the new algorithm to do something significantly more impressive with the full X compute.


How can I contact you? I have some questions about adversariality in AI.


Yep, we're not claiming this is the compute required for DL, and for specific tasks we expect compute required to fall over time. But better algorithms actually mean compute is more important, not less, and would likely make the growth in available compute more important.

For example, if a task is parameterized (by size or difficulty, say), then a better algorithm might change the asymptotic complexity from O(n^3) to O(n^2). A 2x compute increase for the old algorithm would take us from n -> 1.25n, but the new algorithm would go from n -> 1.41n.
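In general, if a run of size n costs compute C(n) proportional to n^k, then the feasible size at budget C scales as C^(1/k), so multiplying compute by a factor c gives:

```latex
n \;\longrightarrow\; c^{1/k}\, n,
\qquad\text{e.g. } 2^{1/3} \approx 1.26 \text{ for } k = 3,
\quad 2^{1/2} \approx 1.41 \text{ for } k = 2.
```

So the lower the exponent a better algorithm achieves, the more each doubling of compute buys.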


To me it just seems like a stab at trying to make the intangible, tangible.

To be taken with a grain of salt.

Innovations in algorithms will give us better prediction with less compute power.



