Green AI (arxiv.org)
104 points by montalbano 14 days ago | 60 comments



This is ultimately a silly and misguided proposal, which focuses on the costs and not the benefits. There is nothing special about training an AI; it is simply another thing to spend resources on, not intrinsically worse or better than any other way. A biology experiment has a 'carbon footprint'. ITER has a 'carbon footprint' (and probably orders of magnitude larger than all AI research this year). HN burns CO2. Everything uses energy, or it uses things which require energy, like human labor, and they involve other costs, like opportunity cost, which are just as real and important. They stand or fall on their net merits, not how much electricity they use.

There is no need to go around criticizing people about 'green AI' by myopically focusing solely on an abstract electrical cost of training. And if there is, then that applies to everything which uses electricity, and it is better handled by putting a carbon tax on energy sources and letting the market find the most unprofitable uses of energy (which will probably not be AI research, I'll tell you that...), stop them, and substitute in more 'green' power sources for everything else.

More importantly, if you are concerned about the costs of training AI, you should be concerned about the total costs as compared to the total benefits, not slicing out a completely arbitrary subset of costs and ranting about how many 'cars' it is equivalent to (which is not even strictly true in the first place, considering that many data centers are located near cheap and renewable power like hydropower or nuclear plants!) and shrugging away the issue that people consider these performance gains important and well worth paying for. There are costs to a model which is worse than it could be. There are costs to models which run slower at deployment time even if they are faster to train. There are costs to models which cannot be used for transfer learning (which the criticized language models excel at, incidentally). And so on. What matters are the total costs, and corporations and researchers already pay considerable attention to that. (Not a single one of their metrics - 'carbon emission', 'electricity usage', 'elapsed real time', 'number of parameters', 'FPO' - is an actual total cost!)


> is ultimately a silly and misguided proposal

No, it's not.

It's analogous to quantifying an F-test in watts.

Nothing wrong with that. I can give a billion-row dataset to a dozen people. John assumes normality, samples 100 rows, runs a linear regression, gets 70% R^2, under 1 second on a 1980s-era computer. Mary doesn't assume normality, runs a GLM via IRWLS, gets better explanatory power than John, still on a 1980s computer, though she takes 5 seconds instead of 1. Then James comes along & runs a decision tree & is twice as good as Mary, but he now needs a second on a 1990s PC. Tony uses a random forest & Baker wants a 128-layer neural net. And so on & so forth, until we end up burning enough energy to power a village, overfitting some dataset to excel at some completely artificial leaderboard metric.
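A rough sketch of that kind of accuracy-versus-compute comparison, using scikit-learn on a synthetic dataset (the models, sizes, and timings here are illustrative stand-ins, not the ones in the story above):

    import time
    from sklearn.datasets import make_classification
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=50_000, n_features=30, random_state=0)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                        ("decision tree", DecisionTreeClassifier()),
                        ("random forest", RandomForestClassifier(n_estimators=200))]:
        t0 = time.time()
        model.fit(X_tr, y_tr)
        # report the benefit (accuracy) right next to the compute spent getting it
        print(f"{name:20s} acc={model.score(X_te, y_te):.3f} fit_time={time.time() - t0:.1f}s")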

At some point a grown-up comes in & says you don't need to light up your bedroom with industrial stadium lighting to do your homework; a 40W table lamp will suffice. Hell, half the third world uses candles & they are performing quite well, if not better. That's really all this is.

The marginal gains in performance aren't justified compared to the amount of energy you end up expending. Half these models are brittle & have zero shelf life; people regularly throw away stuff they wrote just a year ago. So what's the raison d'être of this pursuit?


The cure isn't prescribing "green AI"; it's adding a carbon tax that embeds the cost of the carbon externality into the price of power in general, then letting the free market make the decision.

> overfitting some dataset to excel at some completely artificial leaderboard metric.

This comment exemplifies the myopic, narrow-minded focus on costs, and the dismissiveness of the benefits of greater resource investment, which I am criticizing here. Said 'completely artificial' tasks like ImageNet actually turn out to transfer really well to the real world and are why AI is becoming so useful; they also spurred the research into how to create efficient NNs via distillation and compression, which still cannot be done without training the large models first. You need to invest if you want efficiency. Performance precedes it, it doesn't follow it. As the programming saying goes: get it correct, then get it fast.
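For reference, the distillation mentioned above is roughly the following recipe; a minimal sketch of the standard Hinton-style distillation loss, assuming PyTorch, where the teacher is the big, expensive model and the student is the small one you actually deploy:

    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
        # soft targets: match the teacher's softened output distribution
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                        F.softmax(teacher_logits / T, dim=-1),
                        reduction="batchmean") * (T * T)
        # hard targets: ordinary cross-entropy against the true labels
        hard = F.cross_entropy(student_logits, labels)
        return alpha * soft + (1 - alpha) * hard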


> The marginal gains in performance aren't justified compared to the amount of energy you end up expending.

Come on, we've been through this before. At least for the past 10 years, if not much longer, performance improvements have not gone into actually improving performance; they've gone into improving developer productivity.

If you create a language that's half the speed, but developers can program twice as fast in it, it makes sense to use it. Processing power is much cheaper than a $150k+ developer. Also, being first to market can yield outsized gains over competitors.


Money is already optimizing this, since the biggest investment is computing resources.

>This position paper advocates a practical solution by making efficiency an evaluation criterion for research alongside accuracy and related measures. In addition, we propose reporting the financial cost or "price tag" of developing, training, and running models to provide baselines for the investigation of increasingly efficient methods.

[emphasis mine]

They suggest reporting cost metrics next to benefit metrics so readers can judge the tradeoff for themselves. There's no proposal to ignore the benefits.
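Concretely, the side-by-side reporting they propose could be as simple as a table like this (all numbers below are made up for illustration):

    results = [
        {"model": "small baseline", "accuracy": 0.89, "gpu_hours": 12,   "est_cost_usd": 36},
        {"model": "big SOTA",       "accuracy": 0.91, "gpu_hours": 4800, "est_cost_usd": 14400},
    ]
    for r in results:
        # benefit metric and cost metrics side by side; the reader judges the tradeoff
        print(f"{r['model']:15s} acc={r['accuracy']:.2f} "
              f"GPU-hours={r['gpu_hours']:>5} ~${r['est_cost_usd']}")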


And, as I already pointed out, none of their suggested cost metrics are even close to a total cost, many of them are seriously misleading, and they use them in misleading rhetorical ways (as I noted about their bogus estimates of CO2 emissions). Everything about this paper, from the name 'green AI' on down, is framed to put sole attention on a narrow, arbitrary selection of costs of political relevance, at the cost of a holistic evaluation. A single vague clause about 'other measures' doesn't change that, no matter how you italicize it. Motte-and-bailey.

It is useful, though, to compare results of projects with different budgets. There are papers tackling the same problem with 4 orders of magnitude difference in computing. Including an objective metric in the results would be worthwhile.

In this case, the metric I want is either TFLOPs (for a commercially available architecture) or kWh (for some in-house machine) rather than tons of CO2, which depends on their power source.
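As a back-of-envelope example of why I'd rather see kWh than tons of CO2: the conversion from reported hardware and wall-clock time looks like this, but every number below is an assumption, and the last line only makes sense if you also know the grid mix.

    gpu_count = 8
    gpu_power_kw = 0.3       # ~300 W per GPU under load (assumed)
    pue = 1.2                # assumed datacenter power usage effectiveness
    hours = 120              # assumed wall-clock training time

    kwh = gpu_count * gpu_power_kw * pue * hours
    co2_kg_per_kwh = 0.4     # entirely grid-dependent; near zero for hydro/nuclear
    print(f"{kwh:.0f} kWh, ~{kwh * co2_kg_per_kwh:.0f} kg CO2e on a 0.4 kg/kWh grid")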


I disagree with some of this. The main thrust of their paper is correct, IMO: the cost of training (and inference) needs to be factored in when comparing models. You can't optimize what you don't measure, and currently many SOTA results have training regimes which take a ton of compute and aren't really doable in a research context if you're not Google and can't just use racks full of idle TPUs to do your stuff for free in the trough of the diurnal cycle while customers aren't using them.

Cost of training is currently ignored so much that some authors don't even mention it in their papers, yet it has a direct relationship with things like researcher productivity and the reproducibility of results by mere mortals.

I wouldn't call this "myopically focusing on the cost of training". I'd call this "acknowledging cost of training as a valid metric worthy of publishing and optimization". I hope everyone can agree with this. Once you start optimizing this in addition to your mAP or BLEU or top1 accuracy, the results could be multiplicative.

I suspect this will ultimately come down to math: both having architectures which are amenable to efficient gradient propagation and having better optimizers.


There is no reason given to think that anyone is ignoring the total cost of training and not factoring this in when comparing models.

Do you think Google is funding all this research and just forgetting to consider the cost of training BERT? Do you think any researchers are ignoring the cost of renting GPUs or their university overhead? AI researchers talk all the time about costs! They're positively obsessed with it, in a way you don't see in any other area. (When was the last time you saw someone snicker about a CRISPR paper, 'yeah, nice results, but you know those reagents cost thousands of dollars, right?' Or saw the costs covered in said paper?)

It's quite a double standard. Someone wants to spend >$20 billion on ITER (which doesn't work), you never hear about it; someone sketches out that the most inefficient way to train a neural net model (using on-demand cloud GPUs) costs $1m, suddenly it's a huge deal, a boondoggle, researchers should be ashamed of themselves, it's a threat to research, and we need 'green AI'.


I worked at Google for many years. Google has so much idle compute at any given moment, they tend not to care _at all_ about you using it. It's not a problem for a Google employee to just grab a few thousand cores on a whim and do whatever they want with those cores, and many people do use this freedom. No permission is required if your tasks are preemptible (which by design they mostly are, at Google). I'm not sure if the same applies to TPUs (I left before they became prevalent) but I'm sure for the right subset of people the right permissions could be procured. Most workloads follow the diurnal cycle, so at any given time anywhere between 20 and 80% of capacity is sitting idle. That's far more than you need to do most kinds of deep learning research. That's why you see work from them that would cost you $100K+ to replicate end-to-end (e.g. neural architecture search). At Google you get those resources for basically free, so it's not a huge issue to just spin up a rack full of TPUs. That's just how things are done over there.

I don't get your opposition to introducing approximate exaflops spent in training as an additional metric by which to judge "goodness" of a model. It is well known that some models are much easier to train than others. It's also known that some optimizers converge faster than others. All this stuff directly affects reproducibility of research. We're getting to the point where some of the papers aren't reproducible unless you're at Google or DeepMind or OpenAI or Microsoft Research. I think you'll agree that this is pretty terrible.

Between two models with approximately the same validation accuracy metrics and inference timings wouldn't you choose the one that's more economical to train? I know I would. Why aren't these metrics even mentioned most of the time? Beats me. Even better if the model requires less _data_ to train, but that seems to be less tractable than just estimating and surfacing the cost of training.
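For what it's worth, a rough estimate of that number can already be derived from figures papers tend to report (device count, per-device peak throughput, and wall-clock time), plus an assumed utilization; a sketch with assumed values:

    n_devices = 8             # e.g., GPUs used (assumed)
    peak_tflops = 125         # per-device peak, e.g., V100 FP16 tensor cores
    utilization = 0.3         # achieved fraction of peak (assumed)
    wallclock_hours = 72      # assumed

    total_flop = n_devices * peak_tflops * 1e12 * utilization * wallclock_hours * 3600
    print(f"~{total_flop / 1e18:.0f} exaFLOPs of training compute")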


From what I understand from other Googlers, there are internal markets designed to track and allocate compute resources; this is necessary because of their expense, and because of the low marginal value of many possible uses.

> I worked at Google for many years. Google has so much idle compute at any given moment, they tend not to care _at all_ about you using it. It's not a problem for a Google employee to just grab a few thousand cores on a whim and do whatever they want with those cores, and many people do use this freedom.

And it works out for them, does it not? Because the uses are useful on net, even if some of them seem speculative or silly or inefficient. If they weren't, if people were just constantly burning resources to do absolutely nothing, presumably things would change and those resources would be tracked by the markets too or other measures taken. Let Google as a whole decide where to optimize, end to end - don't push it onto individual researchers and guilt-trip them into worrying about whether they are 'green AI' or evil people who hate the planet.

> I think you'll agree that this is pretty terrible.

I would not. I think it's awesome that machine learning is seeing the tiniest crumbs of the resources invested in other fields (still grossly disproportionate to its potential), that we are no longer in a field where all results are reproducible on a $200 gaming GPU run overnight, and that this is delivering big gains in performance and surprising, nontrivial demonstrations of scalability (which would not be true of, say, IBM Watson). Again, as I said, imagine applying this sort of standard to any other field of science. 'You can't discover the Higgs boson using just 250 watts and a $200 interferometer you bought off eBay; I think you'll agree that this is pretty terrible.' Reproducibility doesn't mean 'should be reproduced, frequently, by anyone, at arbitrarily small budgets'.

> Between two models with approximately the same validation accuracy metrics and inference timings wouldn't you choose the one that's more economical to train?

No, I'd choose the one with the lowest total cost for my purposes. Do you program only in assembler to make sure you write programs with only the most economical runtime costs? What a narrow way of seeing costs... I'm often much more concerned with which one is easier to modify, or to scale, or to reuse for transfer learning, or which runs acceptably for users. The model which is 'more economical to train' is only better on one of many axes, and when someone else like OA or DM pays for it, it is a sunk cost of zero relevance to me. (How much does it matter to me how long it took GPT-2 to train? Not very much! Because I download the trained model and finetune it for other things.)

> Why aren't these metrics even mentioned most of the time?

They typically are. Most DL papers will mention roughly the GPU/TPU count and wallclock time, or you can guess from the training graph or baselines. And that's really all you need in order to evaluate trainability as one of many cost variables; metrics like CO2 emissions or FLOPs are not needed.


> There is nothing special about training an AI; it is simply another thing to spend resources on

If the goal is actual intelligence, as the 'I' in the name appears to suggest, then making resource consumption in the form of energy/compute a central part of the equation makes a good deal of sense. It's arguably long overdue and might push research towards methods that actually do increase the capacity of agents to learn, rather than just praying for faster chips and more data.

The benefit here can be twofold: more inclusivity increases the pace at which the field advances by bringing in more researchers, and it gives researchers an incentive not to treat compute/data as unlimited, moving us closer towards figuring out how the brain actually learns.


I like this idea! In last week's Lex Fridman AI interview, Gary Marcus touches on the roughly twenty-watt power requirements of the human brain compared to deep learning's energy requirements.

Basically, adding energy requirements to the loss function of automated neural architecture search also seems like a good idea. (I am thinking of frameworks like AdaNet, etc.)
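A minimal sketch of what that could look like, using parameter count as a crude stand-in for energy (this is not the AdaNet API; the candidate models, validation losses, and penalty weight below are all made up):

    import torch.nn as nn

    def penalized_score(model, val_loss, lam=1e-6):
        # lower is better; lam trades task loss against model size (a proxy for energy)
        n_params = sum(p.numel() for p in model.parameters())
        return val_loss + lam * n_params

    candidates = {
        "small": nn.Linear(128, 10),
        "big": nn.Sequential(nn.Linear(128, 512), nn.ReLU(), nn.Linear(512, 10)),
    }
    val_losses = {"small": 0.42, "big": 0.40}  # hypothetical numbers
    best = min(candidates, key=lambda k: penalized_score(candidates[k], val_losses[k]))
    print(best)  # with this lam the small model wins despite the slightly worse loss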

I retired this year but I still spend a lot of time reviewing deep learning and also conventional AI literature (and I do tiny consulting projects to help people get started or build prototypes).

Since I now mostly pay for my own computer resources I try to mostly limit myself to what I can do on my System 76 laptop with a single 1070 GPU. The availability of pretrained models makes this not so bad, at all. I really appreciate efforts by huggingface (and other organizations) of offering reduced size models that still provide good results.


Don't forget biological brain "neural architecture search" has also had a significant time and energy cost - billions of years of evolution, powered by a pretty large fusion reactor for those billions of years.

That's the reason brains are so energy efficient today. Though you are right, energy constraints are built in with biological evolutionary search.


The math is just wrong though.

To train a brain to be competent in a classification task, it takes 20W RMS for years on end for the individual, plus all of the wattage from the parents, grandparents, teachers, etc. that are training the individual over those years. It is very hard to determine the power allocation used for a specific human being trained on a narrow task, e.g. object classification, but that doesn't mean it's not the comparable measure.

Comparing the training of a single model to the "instant power" draw of the brain is not just oversimplified; the scales and time periods are wrong.


I agree. Using watts, which are energy (joules) per unit time (second), to measure the energy consumption of training a model is the wrong unit to begin with.

One cannot measure the total energy consumption of something without knowing both the wattage and the time spent running at that wattage. While training a neural network is often done from scratch or from a pre-trained model with a known training time, the brain of a human being does not start developing from zero at birth. Measuring the amount of energy spent on a specific human being trained on a task would also have to account for the billions of years of evolution that lead up to the present-day structure of the human brain. It would be very hard but also very interesting to approximate the energy spent on this, but it may not be relevant to machine learning, as the scales of processing power and the time periods are completely different from those involved in the development of the human brain.


> Measuring the amount of energy spent on a specific human being trained on a task would also have to account for the billions of years of evolution that lead up to the present-day structure of the human brain.

That's completely ridiculous. It's like saying that to measure the energy used by a computer on a task, you have to start by measuring the entire energy usage of the human race since the invention of the transistor.


Also it's important to note that the energy consumed by the brain is really really dirty, considering it's taken from sugars, which require farming to produce.

I'm willing to bet this will be used as one of the arguments in a corporate push for global automation in ~10 years.

8 year olds can accomplish pretty much any classification task. 20W * 3.154E7 seconds in a year * 8 years / 3600 seconds per hour gives about 70 kilowatt-hours, which would cost about $14 from the grid at $0.20/kwh. Our AI is nowhere near the intelligence of an 8 year old, who can presumably also do more things than just decide if a photo depicts a bus or an avocado.

I think your math is off.

8 years at 20W is:

0.02kW * 24h * 365.25 * 8 = 1,402.56 kWh


Yup, you're correct, forgot to multiply by the 20W. Still only about 1.5X average American monthly electricity usage.

> Our AI is nowhere near the intelligence of an 8 year old, who can presumably also do more things than just decide if a photo depicts a bus or an avocado.

Indeed, but at least you describe a slightly more accurate possible representation.


Back when I used to frequently visit Less Wrong, I made a post about this related to Watson from IBM.

I remember feeling really discouraged by the level of discourse and downvoting in the comments at that time.

https://www.lesswrong.com/posts/kaNErr6mbXvDF9YFf/watts-son


That's too bad, because your comments there are really bad. You argue solely by assertion and personal feeling, and then wonder why no one agrees with you. Why does it matter in the least that you are not 'impressed' or feel using more than 1 server 'waters it down'? Why does it matter how much IBM Watson uses? AlphaGo uses more electricity than any human; does it not win at Go? If a NN uses more electricity than 150 watts to classify images at superhuman rates, why does this matter? Is there some law of physics I am unaware of which stipulates that no computer is allowed to use more than 150 watts, or that we are unable to build systems which have more than 1 computer involved? Is there some theorem in computer science which proves that AI isn't real AI unless it solves problems in exactly the same way as a human brain at exactly the same efficiency, never mind how well it solves problems or how much superior to humans it may be otherwise? ('Jet planes, you see, do not actually provide "artificial flight" because they burn more joules than a bird would.') Is there some reason to think that AI can never be misused or cause problems if it happens to use somewhat more electricity? Have deep learning techniques not continued to scale over the 8 years since you posted that? IBM Watson failed; did it have anything to do with using 15TB RAM; would Watson now be a success and closer to 'real AI' if they had hand-optimized its routines for memory efficiency to do the same thing but only needing 1.5TB RAM?

> “AlphaGo uses more electricity than any human; does it not win at Go?”

I know you have a prolific reputation, and I mean this sincerely, but did you actually read the link?

The point was that with power constraints, the class of possible algorithms is restricted heavily, requiring development of an understanding closer to what algorithms a person must use in their mind.

If you present me a Go-based Turing test, and I’m pretty fooled by AlphaGo, except I get blasted by the hot air venting out from a big cooling system, then it probably reveals to me that it’s failed the test (at least until some far future where there are life forms that regularly vent computer-levels of hot air from cooling systems).

This effect is much more pronounced with a larger system like Watson, hence the article.

> “If a NN uses more electricity than 150 watts to classify images at superhuman rates, why does this matter?”

It matters because pure computing achievements are not isolated from their resource constraints. If a NN can generate flawless natural language responses, but it takes 1000 years per word, we would not be interested and certainly would not see it as valuable progress towards artificial intelligence (likely not even strong AI, but surely not tool AI; there just aren't a lot of ways to demonstrate a conceptual breakthrough so wasteful of resources). Intelligence is (partly) generating language within human-level time constraints. Why not also human-level RAM, physical volume, material density, etc.? Solving it within these constraints is much more valuable than solving it without them.

Likewise with power. Maybe using 100x the power is an OK tradeoff for pure tool usage (heck, I work professionally in computer vision and NLP, and I make this concession by using machine learning models running on computers in my daily work).

But that doesn’t mean it’s an ok trade-off for the philosophical side of the questions, like approximating a Turing test or understanding how far we must go from modern deep neural networks to the algorithms an actual brain is using. And, ultimately, these will move from being philosophical issues to being practical ones.

> “Is there some reason to think that AI can never be misused or cause problems if it happens to use somewhat more electricity?”

Huh? Where did I ever say anything like this? I mean, humans misuse intelligence and cause problems on less power.

It sounds like you’re trying to propagate your feelings about the “green” aspect of the post into what I’ve said, but if you’ll notice, I did not say anything about that, and don’t particularly care about this OP article’s take on “green” AI power consumption, it’s not related at all to what I’m talking about. (Plus they are more focused on power demands at training time, while I’m interested in power constraints, like runtime or space constraints, at inference time).


> I know you have a prolific reputation, and I mean this sincerely, but did you actually read the link?

I did, and all the comments, to see where you were mistreated so badly that you'd drag it up 8 years later, like the Bourbons, having forgotten nothing and learned nothing.

> The point was that with power constraints, the class of possible algorithms is restricted heavily, requiring development of an understanding closer to what algorithms a person must use in their mind.

That's not true at all: it is not true that intelligences must exactly imitate humans to be intelligent, and if it was, it would not be true either as a lower bound, upper bound, or average case that an algorithm using the same energy as a human must be isomorphic to the human algorithm. Consider deep learning. The more compute it uses, the better it gets, and the more human-like the results are, and this is true despite the fact that they typically use more power than a human brain even at the smaller scales where they are often among the worse-performing and still-power-hungry of methods, and power consumption goes up from there; by your logic, they would be getting even less human-like, if power consumption and algorithmic equivalency were so inextricably intertwined. But they aren't. 'The bitter lesson' is itself a disproof of your claim that power efficiency has anything to do with potential for future progress or human-like-ness of algorithms.

> But that doesn’t mean it’s an ok trade-off for the philosophical side of the questions, like approximating a Turing test or understanding how far we must go from modern deep neural networks to the algorithms an actual brain is using. And, ultimately, these will move from being philosophical issues to being practical ones.

None of that follows. I made this quite clear, that this is why your complaints are irrelevant or applicable only to the most narrow economic situations. Does a plane not travel faster than a bird because it uses many more joules to do so? We can use far more resources than we do, and all of the consequences of AI can still follow. (This also holds for the silly arguments that computational complexity somehow means that AI can't work or can't be a threat: https://www.gwern.net/Complexity-vs-AI ) Everything about superintelligence, AI risk, new breakthroughs, all of that can still follow even if an AI uses 10, 1000, or 10,000x more joules to accomplish a task. Humans themselves use astronomical amounts of energy compared to, say, insects; does this mean humans are not intelligent, and insects should be questioning whether humans are, on a philosophical level, as powerful intelligences as insects are, since we need so many additional calories to solve basic vision and navigation tasks? Who cares if we are being eaten by gray goo and you go 'ah, well, it requires several orders of magnitude and therefore isn't really real AI, this gray goo would be more impressive if it solved stuff within human-level time or energetic constraints'?


If you do consulting, you should consider using pre-tax money to buy whatever hardware you need for work. Trump made it much easier to amortize business expenses up to a certain limit, and this could save you a lot of money if you're in the higher tax brackets. I spent ~16K on deep learning hardware for my business this year so far (consumer GPUs, since I'm not running them in the "datacenter") because I'm not made of money and think current cloud GPU pricing is a rip-off.

I find it strange that the part of the conversation where AI has the potential to save thousands of hours of human labor doesn’t show up more often in these types of threads.

That being said, we definitely need to be thinking about lowering the cost (environmental and monetary) of training these models. I'm glad research is being done within this domain.

I’d love to see a study on what the human labor cost potential vs. training environmental costs would be for certain large models.


I find it interesting that you see 'save thousands of hours of human labor' as something that's surely positive. What should those people start doing instead? They also need an income to get bread on the table.

Literally anything else.

Whatever makes them happy: petting their dog, having a conversation with their friends/family, making some music that one person listens to. Literally anything produces more value for society than performing arbitrary, unnecessary labor which could be automated for near-zero marginal cost.

We should divert that capital to something more productive, tax the process and kick them back a basic income to do whatever they want with. If what they really want to do is what they currently do at their job, whatever, they can keep doing it while supporting themselves on the basic income, but that'd be their uncoerced choice.

Most people are unhappy at work and only go because they are required to in order to survive. If our society weren't so myopic and steeped in puritan ethics, this would be as absurd as asking whether the Thirteenth Amendment was surely going to be positive.


Work gives us purpose. There is no substitute for enjoying the fruits of one's labor. Working on teams anchors us socially. On the flip side, working for someone else with little agency and decompression time is not very enjoyable.

Between pointless distraction and work, most people will pick the latter, especially after a solid decompression period, in spite of the surface enjoyability balance favoring the former.


There's a lot of work that I want to do that I cannot be paid to do, therefore those are the things I would be doing with my time instead.

Being paid is not a substitute for enjoying the fruits of one's labour.


True. The point is that at a personal level, leisure is no long-term substitute for work.

On organizing work: chances are that at some point you'll want to work on things beyond the scope of what you can accomplish on your own, or want to have a larger impact on the world and other people's lives. The big question is how to organize the work of people at the societal level. The pay-per-work model has proven to be very effective and gave us the marvels of the modern world.

On the shadow side, the pay-per-work model has become heavily geared towards exploitation, as in exploration vs. exploitation. We are at a point where worker conditions in a BigTech warehouse, or as a Mechanical Turk worker, are meticulously quantified exploitation. At the same time, we find it natural to expend the energy budget of a small city to enable machines to do large-scale exploration.


Not possible under capitalism. Work or starve modulo the welfare state is the order of the day.

Apply it backwards: would a welfare system that pays people money for completing large-N Tower of Hanoi puzzles be a good idea?

On one hand, it is technically better than nothing, immediately, for those in need, and it is fraud-resistant in that it consumes time, so any better-paying job without major negative externalities would be a better use of their time.

The negative is that it does nothing productive and those who are dependent on it aren't able to use their time productively on anything else.

While the problem of lack of income and lack of good uses for their labor certainly should be addressed, just because it provides income doesn't make it a good thing overall.

It is one of those things which puts the "dismal" in the "dismal science" term for economics - regardless of any misgivings over calling it a science. It is essentially a subset of the human organizational and incentive problems which the economy "is designed" (technically more emergent, even if it is literally a planned economy) to solve.


What do you mean by "productive"?

Bertrand Russell's In Praise of Idleness might help:

http://www.zpub.com/notes/idle.html

People won't stop finding things to do, they'll just get to choose things further removed from survival.


There is no limit to demand.

If X becomes cheaper, humans spend the surplus resources on Y. If Y becomes cheaper, humans spend the surplus resources on Z. And on it goes.

"Save thousands of hours of human labor" doesn't mean humanity is working thousands of hours less total; it means that they accomplished the same specific X with thousands fewer hours (and then almost certainly spend thousands of hours on Y).

That's not just an armchair theory. Time-honored tradition of technology advancement. Agriculture fell from 70% to 25% of all U.S. jobs in a single lifetime. Staggering economic change. And humanity is better off for it.


Won't lowering the cost/complexity of deep learning just allow those with more resources to increase the complexity of their models while keeping the costs the same?

A lot of these AI research projects don't have immediate economic impact.

Like the abstract mentioned, I think this is also a good criterion for helping "level the playing field" for groups with lower budgets, so that the phenomenon of those with existing resources simply throwing more computation at a problem becomes less of a novelty.


Hm, I struggle to see an upside to levelling the playing field that way. Groups that have the budget to throw huge amounts of resources at problems are still providing important insights. That can happen in parallel to optimising wattage per computational unit. In fact, that can be an almost completely parallel track in AI research.

right now such a parallel track doesn't really exist, at least not on an even footing.

as the article shows, there's still a heavy focus on beating accuracy benchmarks, at huge computational cost, in terms of what actually gets accepted at major venues.

it can be quite difficult to make the case for a method that has worse performance, but is cheaper. hard to judge without real data but I think such papers are much less likely to be accepted.

there are also sometimes demands to replicate very expensive techniques as baselines, which can be onerous for groups with limited resources.


The upside is probably opportunity-cost related, essentially like the mainframes of yore. Technically they could handle higher throughput than workstations at the start (and even today), but once developed they became a dead end compared to more parallel approaches.

That sort of quantity-vs-quality thing has happened a lot throughout history, in "VHS beats Betamax" moments that go way further back, with more obscure examples.


The citation for the "surprisingly large carbon footprint" [0] is crazy. The paper alleges that a car, including fuel, has a lifetime footprint of 126,000 lbs of CO₂e, and that training a big NN transformer consumes 626,155 lbs of CO₂e, almost 5 times as much.

[0] https://arxiv.org/pdf/1906.02243.pdf


Training it _with neural architecture search_. Just training it with a given architecture, they cite as 192 lbs...

I thought Google etc. datacentres run mostly on renewable energy?

Research that demonstrates, say, a 1000x reduction in power for training a known problem (e.g. MNIST) won't be considered irrelevant by the community. So is there a specific need to bias against large works apart from the carbon footprint argument? There remain questions to be answered in that space too, such as: are current "neural" architectures adequate to cover the capabilities of the brain when upping only the scale? It was certainly worth knowing that scaling up was all that was required to compete with humans in DOTA. But will we hit a wall as we near human-level complexity? After all, the money spent on making a movie which is "just" for entertainment trumps multi-million-dollar deep learning experiments in cost, which I guess has some correlation with the carbon footprint... or the gases emitted in rocket launches. Do we really know whether this well-intentioned call for green AI (which I sure as hell want) will do too little for greenness while biasing people against possible discoveries that could lead to a greener future, a tad too early?

Edit: pardon typos due to mobile device.


> The computations required for deep learning research have been doubling every few months

I feel this is misleading. Computation is cheap, so the computation thrown at deep learning research has been doubling every few months, but that doesn't mean it's required to do research (unless your research is "throw huge datasets at a neural net architecture and see if it sticks".)


I understood it as “computation required to replicate stuff outlined in a paper has doubled” not as “we do more research so we do more computation”.

That...does seem to be a lot of the current research though.

This reminds me of that Simpsons quote "We can't fix your heart, but we can tell you exactly how damaged it is"


I'm not sure about that. I think you get what you measure, and if you start measuring efficiency, you're going to start seeing a major incentive to make it more efficient.

I agree that it is important, but I think understanding is more of a prerequisite for doing so deliberately (careful of Goodhart's law), and from there the incentives become evident.

Like Lanchester's laws: it is obvious how outnumbering the foe can help, but knowing the scaling really hammers home how important it is to concentrate force against as weak a foe group as possible.

https://en.m.wikipedia.org/wiki/Lanchester%27s_laws
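A toy Euler-step simulation of the square law (the coefficients and force sizes are arbitrary) makes the concentration point concrete:

    def fight(a, b, alpha=0.1, beta=0.1, dt=0.01):
        # aimed-fire (square law) model: each side's losses scale with the other side's size
        while a > 1 and b > 1:
            a, b = a - beta * b * dt, b - alpha * a * dt
        return max(a, 0), max(b, 0)

    print(fight(100, 100))         # equal concentrated forces grind each other down
    survivors, _ = fight(100, 50)  # 100 meets the first half of a split force of 100...
    print(fight(survivors, 50))    # ...and the ~87 survivors still beat the second half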


Yes, the first step to improvement is quantifying your objective, but it would have been nice to see some actual improvements and concrete implementations. The quantifying part is easy.

This quote amused me:

> deep learning was inspired by the human brain, which is remarkably energy efficient

Yeah, sure, if you look purely at its energy consumption in isolation and ignore all of the flights of fancy we engage in (such as, for example, this NN model) in order to maintain its coherence.


Efficiency metrics would be really useful in evaluating DNNs for embedded solutions as well.


