
Safety-first AI for autonomous data centre cooling and industrial control - dsr12
https://deepmind.com/blog/safety-first-ai-autonomous-data-centre-cooling-and-industrial-control/
======
candiodari
This explanation ignores the risk posed by the sudden absence of the AI.
Given that it's a remote system, this seems relevant. If, for whatever
reason, the AI decides to call it quits, or the network in between
disconnects, what happens?

I ask because I've implemented systems that allowed operators of large
networks to do exponentially more work, and therefore to grow the network
while keeping operator headcount down. The system essentially reacted
automatically to random outages, tracking their status and carefully
bringing things back up when possible. Then the system itself went down,
and we started repairing it knowing we'd have to call in everyone we could
to do the (very tedious) work it normally did. And those of us repairing
the system were working against the clock: we knew we had about 2 hours
before we'd start missing contractual obligations to customers, and about
2 days before we were likely to see serious degradation in network
performance and bandwidth. That part couldn't be prevented; it would have
happened regardless of any realistic amount of human effort spent doing
the work manually.

We fixed it (in about 1h15m) but I'd be lying if I said the patch was properly
tested when we set it live.

The scary thing is, this happened ~8 years ago, and we've since had pretty
much full turnover on the "manual" team: only 2 (out of ~80) have ever
manually performed these actions. If it were to fail now, and those 2 were
on vacation ... hell, it's been working so well I wonder if anyone would
even notice before it's too late.

So what happens to the DCs when the AI somehow disconnects from the
equipment? Shutdown?

~~~
Chopsah2
Hi, I'm one of the engineers working on this project at Google. If the AI
disconnects or starts to make bad control decisions, the local control system
(which has veto power) kicks it out and takes over. We lose some efficiency
when this happens, but the cooling system stays safe and operates in a mode
that the human operators understand completely.
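
In rough terms, the arrangement looks something like the sketch below. To
be clear, this is a simplified illustration, not our production code; the
timeout, the setpoint bounds, and all the names are made up.

    import time

    AI_TIMEOUT_S = 30.0             # assumed staleness threshold
    SAFE_RANGE_C = (15.0, 27.0)     # assumed chilled-water bounds, deg C

    def choose_setpoint(ai_setpoint, ai_timestamp, local_setpoint):
        """Return the setpoint the plant will actually run this cycle."""
        stale = (time.time() - ai_timestamp) > AI_TIMEOUT_S
        lo, hi = SAFE_RANGE_C
        unsafe = not (lo <= ai_setpoint <= hi)
        if stale or unsafe:
            # Veto: fall back to the rule-based setpoint that the human
            # operators understand completely.
            return local_setpoint
        return ai_setpoint

The local loop evaluates this every cycle, so a dead network link degrades
to plain local control within one timeout rather than requiring an
explicit failover event.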

~~~
candiodari
That was true for this system as well. How long until management fires
these people? How long until nobody in the industry knows how to do it
manually anymore?

~~~
Chopsah2
Here's an excerpt from
https://www.datacenterknowledge.com/google-alphabet/google-switching-self-driving-data-center-management-system
where our VP Joe Kava addresses that question:

What About the Jobs?

With more and more of the company’s data centers shifting to automated
infrastructure control, and with the real possibility that the same will
eventually start happening outside of Google, arises the inevitable question
of jobs. Are Google’s data center engineers engineering themselves and their
colleagues out of work?

So far, Kava hasn’t seen evidence of that happening.

“We still have people there, because they still have to do all the
maintenance,” he said. “So, you’re not getting rid of the people, you’re
augmenting” the existing team’s capabilities. “Instead of trying to tune the
system themselves, they can focus more of their time on preventative
maintenance and corrective repairs.”

Besides, AI still does poorly in situations “outside of the envelope of its
training,” he said. People are very good at making observations in what Kava
likes to call “corner cases” and coming up with a course of action on the
spot. AI isn’t.

In other words, it’s a good idea to have AI fine-tune a cooling system to
improve efficiency in pre-tornado conditions, but you better have some human
engineers around in case a tornado forms.

~~~
friday99
I think the problem is in the idea that "you better have some human
engineers around in case a tornado forms." That is a completely different
job from the original one, and the folks waiting around "in case a tornado
forms" likely won't have the skills to fix that tornado anymore. The issue
isn't that "the robots are going to take our jobs!"; it's that the new
jobs are ones humans actually aren't very good at: waiting around, ever
vigilant, until the automated system screws up, then immediately coming up
to speed on and fixing a system they no longer have any interaction with.

------
3pt14159
> It was amazing to see the AI learn to take advantage of winter conditions
> and produce colder than normal water, which reduces the energy required for
> cooling within the data centre.

Sorry, what? It must be more complex than that. That's something a basic
multi-linear optimizer could have accomplished.
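
To illustrate the point: with a made-up two-term cost model (every
coefficient below is invented; nothing here comes from the article), even
a brute-force search over water-temperature setpoints "discovers" the
colder-water-in-winter strategy.

    def total_energy(water_temp_c, outside_temp_c):
        # Chiller work grows with the lift from outside air to the target
        # water temperature; free cooling makes this term vanish in winter.
        chiller = 3.0 * max(outside_temp_c + 5.0 - water_temp_c, 0.0)
        # Warmer supply water means downstream fans and pumps work harder.
        downstream = 1.5 * water_temp_c
        return chiller + downstream

    setpoints = [t / 2.0 for t in range(10, 45)]   # 5.0 .. 22.0 deg C
    winter = min(setpoints, key=lambda t: total_energy(t, 0.0))
    summer = min(setpoints, key=lambda t: total_energy(t, 30.0))
    print(winter, summer)   # 5.0 22.0: colder water only when it's cold out

No neural network required for that part, which is why I suspect the real
system is doing something more subtle than the blog post lets on.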

~~~
sandworm101
Also seems like the AI is just taking advantage of over-engineered systems
with unused capacity/safety margins. Why can the system handle that colder
water? I'd expect condensation, even ice, if the chilled water was below
expectations. How much of this improvement is really just the AI making
changes that the humans would not normally be allowed to make? How much is
just that the AI is allowed that little bit closer to the edge than we allow
the humans?

------
AndrewKemendo
Whenever someone goes off the deep end making statements about how AI is going
to take over the world and become an existential threat, one of the key things
I point to is the lack of AI integration with control systems - those that
regulate physical systems with digital systems (SCADA or otherwise).

This effort, while seemingly benign, makes that argument harder to
sustain - and I actually think this is a major step. They threw the human
override in there at the end as a hedge, but the writing is on the wall.

I am a huge advocate for AI running the world, and don't fall into the AI fear
mongering camp, but as someone who has been building and implementing ML
systems for the past few years and working on AI problems for about a decade
now, I don't have a ton of confidence in our ML systems to be able to
autonomously control physical systems just yet.

And just to reiterate, DeepMind is explicitly trying to create artificial
general intelligence.

------
tomkat0789
I did my PhD dissertation on a topic similar to this. The company I
worked with despised neural-net methods because they weren't robust to the
new operating regimes that cropped up frequently, and they weren't
explainable enough for anybody to trust them.

That said, it'd be cool if Google opened the data set for everybody. The
process monitoring community would be excited!

------
giocampa
So did they close the feedback loop with an AI, or did they just use some ML
hackery to create a model of the system? Does anyone know how this compares to
a classical control system?

------
crunchlibrarian
Yeah just throw some of our magical AI dust at climate change, that'll fix it!

Why every company's PR team insists on this meaningless self-serving
propaganda being inserted into tech blog posts is beyond me. You know we all
see through it, right?

------
cbhl
I'd be curious to see what processes are in place for testing the failover
of these control systems. (For example, do the teams fail over to the
local cooling system during DiRT?)

~~~
Chopsah2
The AI control is architected such that failover isn't really necessary:
the local controller is always in control, and it just gets suggestions
(which it can safely ignore) from the AI. If the AI disappears or starts
sending bad suggestions, it gets kicked out.
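
Conceptually the loop looks something like this - again a simplified
sketch, with a made-up sanity band and strike count rather than our
actual logic:

    class SupervisedLoop:
        """Local control always executes; the AI only suggests."""

        MAX_STRIKES = 3      # assumed tolerance before the AI is kicked out
        SANITY_BAND_C = 2.0  # assumed max deviation from the local setpoint

        def __init__(self):
            self.strikes = 0
            self.ai_enabled = True

        def step(self, local_setpoint, ai_suggestion):
            """Return the setpoint actually sent to the plant this cycle."""
            if self.ai_enabled and ai_suggestion is not None:
                if abs(ai_suggestion - local_setpoint) <= self.SANITY_BAND_C:
                    self.strikes = 0
                    return ai_suggestion    # plausible: adopt the suggestion
                self.strikes += 1           # reject it and count the strike
                if self.strikes >= self.MAX_STRIKES:
                    self.ai_enabled = False # kicked out until re-vetted
            return local_setpoint           # default: plain local control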

We're trying to continually improve the AI so that its time in control is
maximized, much like Waymo's early days with self-driving car software +
safety drivers.

------
mlthoughts2018
The climate change spin annoys me. Not because the research effort isn’t
valuable. Not because these energy optimizations are unimportant. But because
Google is generally a bad actor, with deleterious side effects on people, and
this just distracts from that. Unfortunately, there is no time to celebrate
doing incremental good when your company's first-order characteristics are,
like Google’s, severely harmful and manipulative.

