There is no need to go around criticizing people in the name of 'green AI' by myopically focusing solely on an abstract electrical cost of training. And if there is, then that applies to everything which uses electricity, and is better handled by putting a carbon tax on energy sources and letting the market find the most unprofitable uses of energy (which will probably not be AI research, I'll tell you that...), stop them, and substitute in more 'green' power sources for everything else.
More importantly, if you are concerned about the costs of training AI, you should be concerned about the total costs as compared to the total benefits, not slicing out a completely arbitrary subset of costs and ranting about how many 'cars' it is equivalent to (which is not even strictly true in the first place, considering that many data centers are located near cheap and renewable power like hydropower or nuclear plants!) while shrugging away the fact that people consider these performance gains important and well worth paying for. There are costs to a model which is worse than it could be. There are costs to models which run slower at deployment time even if they are faster to train. There are costs to models which cannot be used for transfer learning (as the criticized language models excel at, incidentally). And so on. What matters are the total costs, and corporations and researchers already pay considerable attention to that. (Not a single one of their metrics - 'carbon emission', 'electricity usage', 'elapsed real time', 'number of parameters', 'FPO' - is an actual total cost!)
No, it's not.
It's analogous to quantifying an F-test with watts.
Nothing wrong with that. I can give a billion-row dataset to a dozen people. John assumes normality, samples 100 rows, runs a linear regression, gets 70% R^2, in under 1 second on a 1980s-era computer. Mary doesn't assume normality, runs a GLM via IRLS, gets better explanatory power than John, still on a 1980s computer, though she takes 5 seconds instead of 1. Then James comes along & runs a decision tree & is twice as good as Mary, but he now needs a second on a 1990s PC. Tony uses a random forest & Baker wants a 128-layer neural net. And so on & so forth, until we end up burning enough energy to power a village, to overfit some dataset & excel at some completely artificial leaderboard metric.
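To make the diminishing-returns pattern concrete, here is a toy sketch (synthetic data, and polynomial degrees standing in for the escalating model complexity; none of this is from a real benchmark): each step up in complexity buys a smaller gain in fit for more compute.

```python
import time
import numpy as np

# Synthetic "dataset": a mildly nonlinear signal plus noise.
rng = np.random.default_rng(0)
n = 100_000
x = rng.uniform(-3, 3, n)
y = np.sin(x) + rng.normal(0, 0.3, n)

def r_squared(pred):
    return 1 - np.sum((y - pred) ** 2) / np.sum((y - np.mean(y)) ** 2)

# Increasing polynomial degree plays the role of "linear model -> GLM -> fancier model".
results = {}
for degree in (1, 3, 9):
    t0 = time.perf_counter()
    coefs = np.polyfit(x, y, degree)
    elapsed = time.perf_counter() - t0
    results[degree] = r_squared(np.polyval(coefs, x))
    print(f"degree {degree}: R^2 = {results[degree]:.3f}, fit time = {elapsed:.4f} s")
```

On data like this, the jump from degree 1 to 3 improves the fit substantially; the jump from 3 to 9 costs more compute for a much smaller gain, which is the pattern the comment is describing.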
At some point a grown-up comes in & says you don't need to light up your bedroom with industrial stadium lighting to do your homework; a 40 W table lamp will suffice. Hell, half the third world uses a candle & they perform quite well, if not better. That's really all this is.
The marginal gains in performance aren't justified by the amount of energy you end up expending. Half these models are brittle & have zero shelf life; people regularly throw away stuff they wrote just a year ago. So what is the raison d'être of this pursuit?
This comment exemplifies the myopic, narrow-minded focus on costs, and the dismissiveness of the benefits of greater resource investments, which I am criticizing here. Said 'completely artificial' tasks like ImageNet actually turn out to transfer really well to the real world and are why AI is becoming so useful; they also spurred the research into how to create efficient NNs via distillation and compression, which still cannot be done without training the large models first. You need to invest if you want efficiency. Performance precedes it; it doesn't follow it. As the programming saying goes: get it correct, then get it fast.
Come on, we've been through this before. At least for the past 10 years, if not much longer, hardware performance improvements haven't actually been spent on application performance; they've been spent on developer productivity.
If you create a language that’s half the speed, but developers can program twice as fast in it, it makes sense to use it. Processing power is much cheaper than a $150k+ developer. Also, being first to market can yield outsized gains on competitors.
They suggest reporting cost metrics next to benefit metrics so readers can judge the tradeoff for themselves. There's no proposal to ignore the benefits.
In this case, the metric I want is either TFLOPs (for a commercially available architecture) or kWh (for some in-house machine) rather than tons of CO2, which depends on their power source.
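To illustrate why (with made-up round numbers, not measurements from any paper or grid): the same training energy implies wildly different emissions depending on where the datacenter draws power, so kWh or TFLOPs travels better across papers than tons of CO2.

```python
# Same kWh, very different CO2. The intensity figures below are rough
# illustrative assumptions, not real grid data.
training_kwh = 10_000.0  # hypothetical training run

kg_co2_per_kwh = {
    "coal-heavy grid": 0.9,
    "US-average grid": 0.4,
    "hydro/nuclear-heavy grid": 0.02,
}

emissions_t = {grid: training_kwh * i / 1000 for grid, i in kg_co2_per_kwh.items()}
for grid, tons in emissions_t.items():
    print(f"{grid}: {tons:.2f} t CO2")
```

The same 10,000 kWh run reports as roughly 9 t of CO2 on one grid and a small fraction of a ton on another, which is exactly the objection to CO2 as the headline metric.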
Cost of training is currently ignored so much that some authors don't even mention it in their papers, yet it has a direct relationship with things like researcher productivity and reproducibility of results by mere mortals.
I wouldn't call this "myopically focusing on the cost of training". I'd call this "acknowledging cost of training as a valid metric worthy of publishing and optimization". I hope everyone can agree with this. Once you start optimizing this in addition to your mAP or BLEU or top-1 accuracy, the results could be multiplicative.
I suspect this will ultimately come down to math: both having architectures which are amenable to efficient gradient propagation and having better optimizers.
Do you think Google is funding all this research and just forgetting to consider the cost of training BERT? Do you think any researchers are ignoring the cost of renting GPUs or their university overhead? AI researchers talk all the time about costs! They're positively obsessed with it, in a way you don't see in any other area. (When was the last time you saw someone snicker about a CRISPR paper, 'yeah, nice results, but you know those reagents cost thousands of dollars, right?' Or covered in said paper?)
It's quite a double standard. Someone wants to spend >$20 billion on ITER (which doesn't work), you never hear about it; someone sketches out that the most inefficient way to train a neural net model (using on-demand cloud GPUs) costs $1m, suddenly it's a huge deal, a boondoggle, researchers should be ashamed of themselves, it's a threat to research, and we need 'green AI'.
I don't get your opposition to introducing approximate exaflops spent in training as an additional metric by which to judge "goodness" of a model. It is well known that some models are much easier to train than others. It's also known that some optimizers converge faster than others. All this stuff directly affects reproducibility of research. We're getting to the point where some of the papers aren't reproducible unless you're at Google or DeepMind or OpenAI or Microsoft Research. I think you'll agree that this is pretty terrible.
Between two models with approximately the same validation accuracy metrics and inference timings wouldn't you choose the one that's more economical to train? I know I would. Why aren't these metrics even mentioned most of the time? Beats me. Even better if the model requires less _data_ to train, but that seems to be less tractable than just estimating and surfacing the cost of training.
> I worked at Google for many years. Google has so much idle compute at any given moment, they tend not to care _at all_ about you using it. It's not a problem for a Google employee to just grab a few thousand cores on a whim and do whatever they want with those cores, and many people do use this freedom.
And it works out for them, does it not? Because the uses are useful on net, even if some of them seem speculative or silly or inefficient. If they weren't, if people were just constantly burning resources to do absolutely nothing, presumably things would change and those resources would be tracked by the markets too or other measures taken. Let Google as a whole decide where to optimize, end to end - don't push it onto individual researchers and guilt-trip them into worrying about whether they are 'green AI' or evil people who hate the planet.
> I think you'll agree that this is pretty terrible.
I would not. I think it's awesome that machine learning is finally seeing the tiniest crumbs of the resources invested in other fields (still grossly disproportionate to its potential), and that we are no longer in a field where all results are reproducible on a $200 gaming GPU run overnight, and that this is delivering big gains in performance and surprising nontrivial demonstrations of scalability (which would not be true of, say, IBM Watson). Again, as I said, imagine applying this sort of standard to any other field of science: 'You can't discover the Higgs boson using just 250 watts and a $200 interferometer you bought off eBay; I think you'll agree that this is pretty terrible.' Reproducibility doesn't mean 'should be reproduced, frequently, by anyone, at arbitrarily small budgets'.
> Between two models with approximately the same validation accuracy metrics and inference timings wouldn't you choose the one that's more economical to train?
No, I'd choose the one with the lowest total cost for my purposes. Do you program only in assembler to make sure you write programs with only the most economical runtime costs? What a narrow way of seeing costs... I'm often much more concerned with which one is easier to modify, or to scale, or to reuse for transfer learning, or which runs acceptably for users. The model which is 'more economical to train' is only better on one of many axes, and when someone else like OA or DM pays for it, it is a sunk cost of zero relevance to me. (How much does it matter to me how long it took GPT-2 to train? Not very much! Because I download the trained model and finetune it for other things.)
> Why aren't these metrics even mentioned most of the time?
They typically are. Most DL papers will mention the rough GPU/TPU count and wallclock time, or you can estimate them from the training graph or baselines. And that's really all you need to know to evaluate trainability as one of many cost variables; stuff like CO2 emissions or FLOPs is not.
If the goal is actual intelligence, as the 'I' in the name appears to suggest, then making resource consumption in the form of energy/compute a central part of the equation makes a fair deal of sense. It's arguably long overdue and might actually push research towards methods that actually do increase the capacity of agents to learn rather than just praying for faster chips and more data.
The benefit here can be twofold: more inclusivity increases the pace at which the field advances by including more researchers, and it also gives researchers an incentive not to treat compute/data as unlimited, moving closer towards figuring out how the brain actually learns.
Basically, adding energy requirements to the loss function of automated neural architecture search seems like a good idea as well. (I am thinking of frameworks like AdaNet, etc.)
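A minimal sketch of what that could look like: a scalarized search objective that penalizes estimated energy alongside task loss. The function name, the candidate numbers, and the weight `lam` are all hypothetical illustrations, not AdaNet's actual API.

```python
def search_objective(task_loss, estimated_kwh, lam=0.05):
    """Lower is better: task loss plus a weighted energy cost.

    `lam` trades accuracy against energy and would be tuned per application.
    """
    return task_loss + lam * estimated_kwh

# Hypothetical candidate architectures: (task loss, estimated training kWh).
candidates = {
    "small":  (0.35, 1.0),
    "medium": (0.22, 3.0),
    "huge":   (0.20, 40.0),
}

best = min(candidates, key=lambda k: search_objective(*candidates[k]))
print(best)  # the penalty prunes "huge" without always picking the tiniest model
```

With these numbers the search picks "medium": the energy term rules out the marginal gain of the huge model, but the penalty is not so large that accuracy stops mattering.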
I retired this year but I still spend a lot of time reviewing deep learning and also conventional AI literature (and I do tiny consulting projects to help people get started or build prototypes).
Since I now mostly pay for my own computer resources I try to mostly limit myself to what I can do on my System 76 laptop with a single 1070 GPU. The availability of pretrained models makes this not so bad, at all. I really appreciate efforts by huggingface (and other organizations) of offering reduced size models that still provide good results.
That's the reason brains are so energy-efficient today. Though you are right: energy constraints are built in with biological evolutionary search.
To train a brain to be competent in a classification task takes 20 W RMS for years on end for the individual, plus all of the wattage from the parents, grandparents, teachers, etc. who are training the individual for those years. It is very hard to determine the power allocation used for a specific human being trained on a narrow task, e.g. object classification, but that doesn't mean it's not the comparable measure.
Comparing the training of a single model to the "instant power" draw of the brain is not just oversimplified; the scale and time periods are wrong.
One cannot measure the total energy consumption of something without knowing both the wattage and the time spent running at that wattage. While training a neural network is often done from scratch or from a pre-trained model with a known training time, the brain of a human being does not start developing from zero at birth. Measuring the amount of energy spent on a specific human being trained on a task would also have to account for the billions of years of evolution that led up to the present-day structure of the human brain. It would be very hard but also very interesting to approximate the energy spent on this, but it may not be relevant to machine learning, as the processing-power scales and time periods are completely different from those involved in the development of the human brain.
That's completely ridiculous. It's like saying that to measure the energy used by a computer on a task, you have to start by measuring the entire energy usage of the human race since the invention of the transistor.
8 years at 20 W is:
0.02 kW × 24 h/day × 365.25 days/year × 8 years = 1,402.56 kWh
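The same arithmetic can be checked and put next to a hypothetical GPU training run for scale (the GPU figures below are illustrative assumptions, not a measurement of any particular model):

```python
# 20 W brain running continuously for 8 years, as computed above.
brain_kwh = 0.02 * 24 * 365.25 * 8

# Hypothetical comparison: eight 300 W GPUs training for one week (assumed figures).
gpu_kwh = 8 * 0.3 * 24 * 7

print(f"brain, 8 years: {brain_kwh:.2f} kWh")
print(f"8 GPUs, 1 week: {gpu_kwh:.1f} kWh")
```

Under those assumptions the week-long GPU run comes to a few hundred kWh, the same order of magnitude as a few years of brain time.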
Indeed, but at least you describe a slightly more accurate possible representation.
I remember feeling really discouraged by the level of discourse and downvoting in the comments at that time.
I know you have a prolific reputation, and I mean this sincerely, but did you actually read the link?
The point was that with power constraints, the class of possible algorithms is restricted heavily, requiring development of an understanding closer to what algorithms a person must use in their mind.
If you present me a Go-based Turing test, and I’m pretty fooled by AlphaGo, except I get blasted by the hot air venting out from a big cooling system, then it probably reveals to me that it’s failed the test (at least until some far future where there are life forms that regularly vent computer-levels of hot air from cooling systems).
This effect is much more pronounced with a larger system like Watson, hence the article.
> “If a NN uses more electricity than 150 watts to classify images at superhuman rates, why does this matter?”
It matters because pure computing achievements are not isolated from their resource constraints. If a NN could generate flawless natural language responses but took 1000 years per word, we would not be interested and certainly would not see it as valuable progress towards artificial intelligence (likely not even strong AI, but surely not tool AI; there just aren't a lot of ways to demonstrate a conceptual breakthrough so wasteful of resources). Intelligence is (partly) generating language within human-level time constraints. Why not also human-level RAM, physical volume, material density, etc.? Solving it within these constraints is much more valuable than solving it without them.
Likewise with power. Maybe using 100x power is an OK tradeoff for pure tool usage (heck, I work professionally in computer vision and NLP, and I make this concession to use machine learning models running on computers to do my work daily).
But that doesn’t mean it’s an ok trade-off for the philosophical side of the questions, like approximating a Turing test or understanding how far we must go from modern deep neural networks to the algorithms an actual brain is using. And, ultimately, these will move from being philosophical issues to being practical ones.
> “Is there some reason to think that AI can never be misused or cause problems if it happens to use somewhat more electricity?”
Huh? Where did I ever say anything like this? I mean, humans misuse intelligence and cause problems on less power.
It sounds like you’re trying to propagate your feelings about the “green” aspect of the post into what I’ve said, but if you’ll notice, I did not say anything about that, and don’t particularly care about this OP article’s take on “green” AI power consumption, it’s not related at all to what I’m talking about. (Plus they are more focused on power demands at training time, while I’m interested in power constraints, like runtime or space constraints, at inference time).
I did, and all the comments, to see where you were mistreated so badly that you'd drag it up 8 years later, like the Bourbons, having forgotten nothing and learned nothing.
> The point was that with power constraints, the class of possible algorithms is restricted heavily, requiring development of an understanding closer to what algorithms a person must use in their mind.
That's not true at all: it is not true that intelligences must exactly imitate humans to be intelligent, and even if it were, it would not follow (as a lower bound, upper bound, or average case) that an algorithm using the same energy as a human must be isomorphic to the human algorithm. Consider deep learning. The more compute it uses, the better it gets and the more human-like the results are, even though it typically uses more power than a human brain; at the smaller scales, deep learning methods are often among the worse-performing and still power-hungry of methods, and power consumption only goes up from there. By your logic, they would be getting less human-like as power rises, if power consumption and algorithmic equivalency were so inextricably intertwined. But they aren't. 'The bitter lesson' is itself a disproof of your claim that power efficiency has anything to do with potential for future progress or human-likeness of algorithms.
> But that doesn’t mean it’s an ok trade-off for the philosophical side of the questions, like approximating a Turing test or understanding how far we must go from modern deep neural networks to the algorithms an actual brain is using. And, ultimately, these will move from being philosophical issues to being practical ones.
None of that follows. I made this quite clear: this is why your complaints are irrelevant or applicable only to the most narrow economic situations. Does a plane not travel faster than a bird because it uses many more joules to do so? We can use far more resources than we do, and all of the consequences of AI can still follow. (This also holds for the silly arguments that computational complexity somehow means that AI can't work or can't be a threat: https://www.gwern.net/Complexity-vs-AI ) Everything about superintelligence, AI risk, new breakthroughs, all of that can still follow even if an AI uses 10, 1000, or 10,000x more joules to accomplish a task. Humans themselves use astronomical amounts of energy compared to, say, insects; does this mean humans are not intelligent, and insects should be questioning whether humans are, on a philosophical level, as powerful intelligences as insects are, since we need so many additional calories to solve basic vision and navigation tasks? Who cares if we are being eaten by gray goo and you go 'ah, well, it requires several orders of magnitude more energy and therefore isn't really real AI; this gray goo would be more impressive if it solved stuff within human-level time or energetic constraints'?
That being said, we need to definitely be thinking about lowering the cost (environmental and monetary) of training these models. I’m glad research is being done within this domain.
I’d love to see a study on what the human labor cost potential vs. training environmental costs would be for certain large models.
Whatever makes them happy: petting their dog, having a conversation with their friends/family, making some music that one person listens to. Literally anything produces more value for society than performing arbitrary unnecessary labor which could be automated at near-zero marginal cost.
We should divert that capital to something more productive, tax the process and kick them back a basic income to do whatever they want with. If what they really want to do is what they currently do at their job, whatever, they can keep doing it while supporting themselves on the basic income, but that'd be their uncoerced choice.
Most people are unhappy at work and only go because they are required to in order to survive. If our society weren't so myopic and steeped in puritan ethics, this would be as absurd as asking whether the Thirteenth Amendment was going to be surely positive.
Between pointless distraction and work, most people will pick the latter, especially after a solid decompression period, in spite of the surface enjoyability balance favoring the former.
Being paid is not a substitute for enjoying the fruits of one's labour.
On organizing work: chances are that at some point you'll want to work on things beyond the scope of what you can accomplish on your own, or want to have a larger impact on the world and other people's lives. The big question is how to organize the work of people at the societal level. The pay-per-work model has proven to be very effective and gave us the marvels of the modern world.
On the shadow side, the pay-per-work model has become heavily geared towards exploitation, as in exploration vs. exploitation. We are at a point where worker conditions in a BigTech warehouse, or as a Mechanical Turk, are meticulously quantified exploitation. At the same time, we find it natural to expend the energy budget of a small city to enable machines to do large-scale exploration.
On one hand, technically it is better than nothing immediately for those in need, and it is fraud-resistant in that it consumes time, so any better-paying job without major negative externalities would be a better use of their time.
The negative is that it does nothing productive, and those who are dependent on it aren't able to use their time productively on anything else.
While the problems of lack of income and lack of good uses for their labor certainly should be addressed, just because it provides income doesn't make it a good thing overall.
It is one of those things which puts the "dismal" in the "dismal science" term for economics, regardless of any misgivings over calling it a science. It is essentially a subset of the human organizational and incentive problems which the economy "is designed" (technically more emergent than designed, even if it is literally a planned economy) to solve.
People won't stop finding things to do, they'll just get to choose things further removed from survival.
If X becomes cheaper, humans spend the surplus resources on Y. If Y becomes cheaper, humans spend the surplus resources on Z. And on it goes.
"Save thousands of hours of human labor" doesn't mean humanity is working thousands of hours less total; it means that they accomplished the same specific X with thousands fewer hours (and then almost certainly spend thousands of hours on Y).
That's not just armchair theory; it's the time-honored tradition of technological advancement. Agriculture fell from 70% to 25% of all U.S. jobs in a single lifetime, a staggering economic change. And humanity is better off for it.
As the article shows, there's still a heavy focus on beating accuracy benchmarks, at huge computational cost, in terms of what actually gets accepted at major venues.
It can be quite difficult to make the case for a method that has worse performance but is cheaper. Hard to judge without real data, but I think such papers are much less likely to be accepted.
There are also sometimes demands to replicate very expensive techniques as baselines, which can be onerous for groups with limited resources.
That sort of quantity-vs-quality thing has happened a lot throughout history: the "VHS beats Betamax" moments. It goes way further back, just with more obscure examples.
Edit: pardon typos due to mobile device.
I feel this is misleading. Computation is cheap, so the computation thrown at deep learning research has been doubling every few months, but that doesn't mean it's required to do research (unless your research is "throw huge datasets at a neural net architecture and see if it sticks").
Like Lanchester's laws: it is obvious how outnumbering the foe can help, but knowing the scaling really hammers home how important it is to concentrate force against as weak a foe group as possible.
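The square law is easy to demonstrate numerically: under "aimed fire", each side's losses are proportional to the other side's numbers, so fighting strength goes as the square of force size. A rough Euler-step sketch, assuming equal unit effectiveness on both sides:

```python
def attrition(a, b, k=0.01, dt=0.1):
    """Lanchester square-law attrition: da/dt = -k*b, db/dt = -k*a.

    Steps the two forces forward until one side is wiped out and
    returns the survivors on each side.
    """
    while a > 0 and b > 0:
        a, b = a - k * b * dt, b - k * a * dt
    return max(a, 0.0), max(b, 0.0)

# Doubling numbers roughly quadruples effective strength:
# 200 vs 100 leaves about sqrt(200**2 - 100**2) ~ 173 survivors on the larger side.
survivors_a, survivors_b = attrition(200, 100)
print(f"side A survivors: {survivors_a:.1f}, side B survivors: {survivors_b:.1f}")
```

The continuous model conserves `a**2 - b**2`, which is why the larger side wins with far fewer losses than a linear intuition would suggest.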
> deep learning was inspired by the human brain, which is remarkably energy efficient
Yeah, sure, if you purely look at its energy consumption in isolation and ignore all of the flights of fancy we engage in (such as, for example, this NN model) in order to maintain its coherence.