
What Happened in the UK Blackouts? - mitchoneill1
https://mitchoneill.com/blog/uk-blackouts-interim-report/
======
londons_explore
It's notable that this report is completely consistent so far with the way I
would design a cyber attack to take out a country's power grid.

> The causes provided by RWE for the initiation of the trip of Little Barford
> steam turbine (ST1C) was due to discrepancies on three independent safety
> critical speed measurement signals to the generator control system.

A cyber attack would most likely, rather than directly taking capacity offline,
instead make the system less stable by attempting to make safety systems not
work properly. That has the benefit that the attack will be triggered at the
exact same time as another incident, increasing the impact.

Injecting false values for a sensor reading is a very good way to hide your
tracks as an attacker - just inject a bunch of "0" values, and then uninstall
whatever malware did it, and probably nobody will ever find out why that
sensor randomly acted up.

~~~
std_throwaway
An even stealthier method is to make the system work perfectly while everything
is running fine: like correctly reporting net frequency in the range 49-51 Hz
but reporting false values when outside of that range. That way there will be
no detectable errors until everything starts behaving erratically.

The best way for a society to prepare is to have a few small incidents that
put the safety systems to the test on a regular basis.

~~~
Reason077
_”Like correctly reporting net frequency in the range 49-51 Hz but reporting
false values when outside of that range.”_

You can’t fake the system frequency like that. Anyone connected to the grid
can observe/measure the system frequency trivially - you could do so right now
at the nearest 10A socket. The attack you describe would require compromising
large numbers of diverse, distributed generation and frequency response
assets.

It would be like trying to fool everyone into thinking it’s a hot day by
hacking every single thermometer!
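
For illustration, here's a minimal sketch of that measurement, assuming you
already have a sampled waveform (the 5 kHz sample rate and the synthetic
49.7 Hz test signal are made up for the example):

```python
# Estimate mains frequency by counting rising zero-crossings in a
# sampled voltage waveform. All inputs below are synthetic.
import math

def estimate_frequency(samples, sample_rate):
    """Count rising zero-crossings and convert to cycles per second."""
    crossings = sum(1 for prev, cur in zip(samples, samples[1:])
                    if prev < 0 <= cur)
    duration_s = len(samples) / sample_rate
    return crossings / duration_s

# Ten seconds of a synthetic 49.7 Hz sine wave, sampled at 5 kHz
sample_rate = 5000
samples = [math.sin(2 * math.pi * 49.7 * n / sample_rate)
           for n in range(10 * sample_rate)]
print(estimate_frequency(samples, sample_rate))  # within ~0.1 Hz of 49.7
```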

~~~
delfinom
Yep. Once one meter starts reporting the wrong frequency vs the thousands of
other ones, it'll simply be taken offline and replaced because it could just
be a broken one. Depending on severity, the jig could be up if the meter is
sent back to the manufacturer for "repair".

------
diggan
To understand how/why these cascading failures can happen, Grady Hillhouse
(Practical Engineering on YouTube) recently made a video about substations and
what kind of faults can affect them, interesting if you're not super
knowledgeable about our electric grids already:
[https://www.youtube.com/watch?v=7Q-aVBv7PWM](https://www.youtube.com/watch?v=7Q-aVBv7PWM)

~~~
4rt
I always upvote Grady. It's great to see his subscriber count finally starting
to match the quality of his videos.

------
Zenst
"Frequency drops below 48.8Hz which triggers the Low Frequency Demand
Disconnection scheme (LFDD). Approximately 1GW (5% of total) of Great Britain
electricity demand is turned off."

This is interesting as it highlights the importance of managed failure. Things
fail, but the ability to see those failures and control how they fail is
important. Had such an action not been taken, the result would have been a far
greater outage, and one where you had no idea which parts it would impact.
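
For illustration, a minimal sketch of how a staged scheme like this can work.
Only the first stage (below 48.8 Hz, shed roughly 5% of demand) comes from the
report; the other thresholds and percentages are invented placeholders:

```python
# Staged low-frequency demand disconnection (LFDD), toy version.
# First stage per the report; later stages are illustrative only.
LFDD_STAGES = [
    (48.8, 0.05),  # below 48.8 Hz: shed ~5% of demand (from the report)
    (48.6, 0.05),  # further stages: invented placeholders
    (48.4, 0.10),
]

def demand_to_shed(frequency_hz, total_demand_mw):
    """MW of demand the scheme would disconnect at this frequency."""
    fraction = sum(frac for threshold, frac in LFDD_STAGES
                   if frequency_hz < threshold)
    return fraction * total_demand_mw

# ~20 GW of total demand, frequency at 48.7 Hz: sheds ~1 GW,
# matching the "approximately 1GW (5% of total)" in the quote.
print(demand_to_shed(48.7, 20_000))  # 1000.0
```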

~~~
mytailorisrich
The issue with this specific outage is that it was badly managed.

The main gripe people and authorities have with what happened is that key
infrastructure immediately lost power. When (only) 5% of demand is turned off,
this should not impact railways, airports, and hospitals.

Beyond the initial loss of power other problems amplified the consequences:
For example, trains could not just restart but required specialist staff to
attend each of them to get them through a specific procedure, which turned a
short power loss into hours of delays.

This highlights the importance of looking at systems globally. The grid is
just one part of the system.

~~~
pm215
A lot of the headline grabbing stuff wasn't the grid's fault though:

* the railways didn't have traction power disconnected -- but one type of (new) train saw the within-specification frequency drop, shut down, and in many cases needed a fitter to go out and restart it. Everything else behind a dead train was stuck. I imagine the train company are discussing this with their supplier...

* Ipswich hospital had a power outage, but that was because their own systems disconnected from the grid when they saw the frequency drop, and then they had a problem with one of their generators.

* One regional airport seems to have been accidentally left off the list of key sites not to disconnect -- sounds like an admin snafu, and has now been fixed.

None of these seem like "bad management" -- they are a collection of
relatively minor, entirely unrelated, problems of the sort you're likely to
run into if you have to activate an emergency procedure you haven't needed to
use in a decade.

~~~
topspin
I read your list and see a bunch of 'technical debt.' Trains put into
operation without the ability to survive a power transient without
intervention. Life critical backup power system failures. The set of critical
sites is not being maintained properly. In my mind these failures are not
entirely unrelated; reliable power is taken for granted and the testing and
maintenance necessary to cope with power outages is neglected.

~~~
growse
> Trains put into operation without the ability to survive a power transient
> without intervention.

The trains were built to an environmental tolerance spec, and shutting down
when that spec is exceeded seems like a reasonable thing to do. Having an
expert test that all the safety systems are still working properly after
they've been exposed to conditions they weren't designed for also seems
prudent.

Do you think people should pay higher ticket prices / more tax to fund trains
that can tolerate a wider range of input frequencies, given how rarely these
sorts of events happen?

~~~
caf
The GP says that the frequency drop was within-specification, not exceeding
spec.

~~~
growse
Which specification though? At the time they were being designed and procured,
I think the specified tolerance for frequency change rate on the grid was
0.125Hz/s, but this was relaxed in 2014 to 1Hz/s (see
[https://www.nationalgrideso.com/document/10771/download](https://www.nationalgrideso.com/document/10771/download)).
It's not clear how much the effects of relaxing this on existing equipment
were considered.

Similarly, the grid's license gives them a ±1% variation around 50Hz (so
±0.5Hz from [https://www.nationalgrideso.com/balancing-
services/frequency...](https://www.nationalgrideso.com/balancing-
services/frequency-response-services)).

So we have a power anomaly where the frequency moved >0.125Hz/s and dropped
well below the 49.5Hz license threshold. I don't know what the procurement
spec was, but I'd be very surprised if this event was "in-spec" for the
trains, so in the absence of this it's probably reasonable to say that GP is
incorrect.

Transformers get quite unhappily melty if you drop the frequency too much, so
if your cooling system isn't designed to cope with this, it's probably better
to shut down than have an exciting fire.

~~~
makomk
The grid's license only requires them to be within ±0.5Hz of the nominal 50Hz
for at least 95% of the time. Anyone designing a system which needs to operate
reliably ideally ought to design it to operate over the full 47.5-52Hz range,
or at least not fail so completely it cannot be restarted. Low frequency
demand disconnect only starts at 48.8 Hz, so any system that can't handle that
is effectively setting itself up to be the first thing to fail when there's a
power shortfall. This is utterly inexcusable for a train given the amount of
disruption caused by multiple lines being clogged by failed trains in multiple
places.

~~~
growse
I don't disagree that a train that can tolerate a wider range is desirable,
but lots of things are desirable. It's also desirable to give everyone their
own carriage, or have a free bar on every train. But at what cost?

* Wider tolerance means heavier electrical and cooling systems, increasing both capital outlay and operating costs through track wear and maintenance.

* The trains procured were essentially "off the shelf" from Siemens and would have been designed and manufactured for other operators before the DfT put the tender together. A wider operating tolerance would have therefore either ruled out the Siemens trains, or at least significantly upped the cost as they would have had to redesign them.

An annual season ticket from St Albans to London is about £3,600. How much
more should that passenger (as well as the taxpayer) be paying so that they
can travel on trains that work when this type of event happens? How often does
this type of event happen?

> This is utterly inexcusable for a train given the amount of disruption
> caused by multiple lines being clogged by failed trains in multiple places.

People being stuck on trains for a few hours is pretty far down the severity
list of "things that could go wrong" when the power goes out. That said, what
I do think the outcome will be here is they'll figure out if/how the driver
can self-reset this type of issue rather than requiring a fitter to travel
out.

------
eDISCO
Most traditional generators (think CCGT, nuclear, hydro) run in synchronous
mode, which means the generator spins at an rpm proportional to the network
frequency. For 50Hz that would be 3000rpm. When the total electricity
demand exceeds the supply, the kinetic energy of spinning generators is
converted to electricity which slows down the generator rpm, which reduces the
frequency. All synchronous generators on a given network spin at the same
frequency, so when the demand is too high all of them slow down a bit.
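
To put rough numbers on this, here's a minimal sketch using the standard swing
equation df/dt = f0 * dP / (2 * H * S). The system size, inertia constant, and
size of the lost generation below are invented for illustration, not taken
from the report:

```python
# Swing equation sketch: how fast frequency falls when generation is
# lost and nothing responds. All numbers are illustrative.
f0 = 50.0          # nominal frequency, Hz
S = 30_000.0       # synchronous generation online, MW
H = 4.0            # average inertia constant, seconds
lost_mw = 1_000.0  # generation that suddenly trips

freq, dt = f0, 0.1
for _ in range(30):  # simulate 3 seconds with no governor response
    deficit = -lost_mw                      # demand now exceeds supply
    freq += f0 * deficit / (2 * H * S) * dt
print(round(freq, 1))  # ~49.4 Hz: every machine on the grid has slowed together
```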

~~~
circular_logic
That makes me wonder: if you were standing next to these turbines, would you
be able to hear them slow down?

~~~
bsder
Probably, although you are more likely to hear the 3000 rpm to 2928 rpm change
in the generator speed than the 50 Hz to 48.8 Hz change in the output.

------
aurbano
It actually sounds like this was really useful though, as a lot of the systems
involved will be improved in the future.

I'm not sure if the grid is already doing this, but they should trigger
failures randomly, including Low Frequency Demand Disconnection schemes from
time to time to ensure that all downstream systems are also ready (like the
hospital someone mentioned that had an issue with the backup generator, or the
trains that stopped working when the frequency dropped). A bit like chaos
monkeys in dev environments.

~~~
pm215
Triggering demand disconnection randomly is saying "we should provide
consumers a service including deliberate random blackouts". That's OK in
contexts like Netflix where the downstream systems are supposed to be able to
be resilient to faults and failures are comparatively low-stakes anyway. But
almost all grid consumers aren't resilient to power loss (would your house
continue to be fully functional without noticeable ill effects if power to it
went out? Mine wouldn't...) and even for industrial scale consumers who do
have generators the costs of finding out in a deliberate blackout that their
generator was faulty could be anything from expensive to life-threatening.

~~~
noir_lord
One of the hospitals affected by the blackout described in this article had
its generators fail to come online cleanly; they had to go manual in the ICU.

~~~
walshemj
Why are they not on battery at all times in the ICU?

------
kachnuv_ocasek
What an interesting read! I had no idea that a blackout happened in the UK
recently, so I clicked out of curiosity and was rewarded with this delightful
description of the initial events.

It's also interesting to see the interplay of automated and manual systems in
such a complicated scenario.

------
pm215
The article says near the bottom "At the time [the GT1A generator tripped]
there was 4000MW of unused generation in reserve, and if only a fraction of
this reserve was instructed to dispatch within 60 seconds of the initial loss
it’s very likely that load shedding would have not occurred."

My understanding was that having reserves available within 30-60s was pretty
expensive, and that that would have been what was used to handle the initial
frequency drop (the interim report PDF says there was 1000MW of this). Drax's
page on this ([https://www.drax.com/technology/power-systems-super-
subs/](https://www.drax.com/technology/power-systems-super-subs/)) says that
even spinning reserve can be on the order of two minutes, and the next tier
below that (STOR) is on the order of 20 minutes. So does anybody know what the
4000MW available within 60s but not called on was?

~~~
mitchoneill1
Hey! (author of the article here). I'm not happy with how I phrased this with
the 4GW of capacity -- and while I suspect generation could have increased by
hundreds of MW within 60 seconds I may be wrong. I've updated how I've phrased
it in the blog to hopefully give a bit more clarity to my position.

I don't mean to suggest that plants could ramp up fully within 60 seconds;
it's that 4GW means there's a bunch of headroom, and that there was at least the
physical possibility that dispatch instructions to increase generation/start-
up (the reserves were "available") could have been sent out and plants could
have started moving on an upward trajectory, creating even a few extra hundred
MW of generation if many of the more flexible plants (gas, hydro) could
increase their output even just a bit.

Here in Australia we dispatch on a 5 minute period, and there are some plants
that are very slow moving and have to ramp over tens of minutes or hours, but
moving grid-wide generation up and down many hundreds of MW within a 5 minute
dispatch period is a common occurrence, and so this coupled with the
relatively high amount of reserve makes me suspect it would have at least been
physically possible to increase output and hasten the return to a normal
operating frequency.

How you create both the systems and market structures/rules to oversee
dispatch and coordination of something like that in a fast and automated way
is very complex and infeasible to effectively do any time soon, but I think
it's a system we should be thinking about and moving towards.

Overall though I'm just grumpy about the amount of manual, slow work that's
still involved in grid operations (when we lose an inter-connector here in
Australia it takes 3-15 minutes for the dispatch engine to become aware that
there's no inter-connector anymore as it needs to be reconfigured manually).

~~~
guerby
I assume battery-based grid power stations like the Tesla one in Australia are
amongst the best in class to help in this kind of crisis situation? Compared
to a gas turbine the system is simpler (no mechanical/heat/fluid parts) and,
depending on design, will likely never be completely off since it is made of
dozens or hundreds of independent pieces of equipment (batteries and inverters).

Thanks for the article and comment!

~~~
barney54
Yes, batteries are very fast acting, but that battery in South Australia is
only 100MW and the grid lost over 1400 MW.
[https://www.news.com.au/technology/innovation/inventions/eve...](https://www.news.com.au/technology/innovation/inventions/everything-
you-need-to-know-about-teslas-battery-in-south-australia/news-
story/a989f74cfccb8a1211de83f5becc60ed) Batteries have a lot of potential, but
they are currently small in capacity.

~~~
jhayward
> _but that battery in South Australia is only 100MW and the grid lost over
> 1400 MW_

I hear this kind of misconception fairly often and so it is worth noting that
you don't need 1400 MW of battery to protect from this event.

The initial 735 MW drop happened, as I read TFA, when a major generating
source _took itself offline as a result of the frequency disturbance_. The
later generator drops are cascades of the frequency disturbance earlier in
time.

If a suitable battery had been present and reacting in milliseconds it could
have stabilized the frequency enough, for just the very short period required,
to avoid the whole generator trip in the first place. Thus you can use much
smaller amounts of fast-response frequency support to avoid much bigger drops.
This is how batteries can pay for themselves so quickly when placed in the
right topologies w.r.t. the grid demand and generation resources.
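
To make that concrete, here's a toy comparison using the same swing-equation
idea as the sketch earlier in the thread, with a crude ramped governor
response. Every number is invented; the point is only that a modest amount of
fast response makes the frequency dip noticeably shallower:

```python
# Frequency nadir after losing generation, with and without a
# fast-responding battery. All parameters are illustrative.
def nadir(lost_mw, battery_mw, f0=50.0, S=30_000.0, H=4.0,
          governor_ramp_s=10.0, dt=0.05, t_end=20.0):
    freq, lowest, t = f0, f0, 0.0
    while t < t_end:
        governor = lost_mw * min(t / governor_ramp_s, 1.0)  # slow ramp-up
        battery = battery_mw if t > 0.5 else 0.0            # full output in 0.5 s
        deficit = governor + battery - lost_mw              # MW imbalance
        freq += f0 * deficit / (2 * H * S) * dt
        lowest = min(lowest, freq)
        t += dt
    return lowest

print(round(nadir(1000, 0), 2))    # nadir with no battery
print(round(nadir(1000, 200), 2))  # noticeably shallower with 200 MW fast response
```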

~~~
generatorguy
If 200 MW of generation trips instantly, which is typical when electrical
equipment fails, you need 200 MW to replace it in the next few seconds, and
then additional generation to bring the frequency back to nominal.

So far the largest battery is 40 MW? So not really going to help more than 40
MW can, even if it comes online instantly, when 200 MW trips off.

~~~
jhayward
The point is to avoid the generator trip in the first place.

~~~
generatorguy
how much battery storage at what point in the system would have prevented this
blackout?

~~~
jhayward
Answering that question would take a detailed analysis which I am not aware
of. It would be less than the instant load of the Eaton Socon - Wymondley
transmission circuit that was struck by lightning.

It is basically a question of providing just enough frequency support, for
long enough, for the remainder of the grid to respond to the frequency drop.
In other words, let the frequency sag a bit so generators respond but not so
much that they disconnect.

I think there's also some control system error implied in the wind farm's drop
as I read the article. That perhaps shouldn't have happened at all and was an
un-analyzed state that will be corrected.

~~~
generatorguy
>> but that battery in South Australia is only 100MW and the grid lost over
1400 MW

>I hear this kind of misconception fairly often and so it is worth noting that
you don't need 1400 MW of battery to protect from this event.

I would be interested in any papers or literature that have led you to this
conclusion.

as soon as (generation - load) <> 0 the frequency is going to change. If you
lose 200MW of generation and have 100MW of batteries that are configured with
0 or very small amount of droop they see some small change in frequency and
crank out 100MW to try and stop it from falling further. Then they are maxed
out, there is still 100MW of imbalance for conventional generation with
governors to take up, which they will as the frequency falls as if the trip
had only been 100MW in the first place. How is this not a 1:1 relationship,
100 MW of batteries displace 100MW of lost generation?
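
For reference, a sketch of the droop response being described, with a made-up
1% droop setting (output rises in proportion to the frequency drop, capped at
the unit's rating):

```python
# Proportional "droop" response, toy version. Parameters illustrative.
def droop_output_mw(freq_hz, rating_mw, droop=0.01, f0=50.0):
    """1% droop: full rated output for a 1% (0.5 Hz) frequency drop."""
    deviation = (f0 - freq_hz) / f0
    return max(0.0, min(rating_mw, rating_mw * deviation / droop))

print(droop_output_mw(49.9, 100.0))  # ~20 MW injected at 49.9 Hz
print(droop_output_mw(49.4, 100.0))  # 100.0 MW: the unit is maxed out
```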

For this event in the UK in particular it appears the issue is that the large
windfarm, and 500MW of small embedded or distributed generation are en masse
improperly configured or by nature are not able to ride through any small
transient wobble in frequency. TFA didn't show us what voltage these plants
saw.

Although a 200MW steam turbine also tripped due to the same
lightning/transmission line trip event, so clearly it was subjected to
something that exceeded instantaneous thresholds and had to trip immediately.
And then that steam turbine tripping caused the gas turbines at the same plant
to trip which indicates there are some design or operational issues at that
plant.

Also is 20s reclose time for a transmission line standard? I've only
configured distribution line reclosers but it was more on the order of 1s for
the first reclose and maybe 7s for the second?

------
fyfy18
What effect does the frequency drop have on home users? Is there protection in
the substations near to homes that will cut off the power completely if the
frequency goes out of range?

~~~
rcxdude
The grid frequency itself doesn't matter much as far as consumers are
concerned; the deviations mentioned in this article would not cause any
problems for a home user directly (or even for generators, though as mentioned
below, if it goes too far out of range generators start to need to cut out).
But it is a far better indicator of the general balance of supply and demand
of power in the grid than any other metric, because it can be measured very
precisely, isn't subject to local effects like voltage, and correlates well
with the state of the generators.

One quirk of synchronous generators is you can think of the whole grid like
all of the generators in it are geared together. If demand rises, the torque
on the generators starts to increase and if the power plants don't feed more
mechanical power in then they will start to slow down in unison. Likewise
their speed will increase as the demand drops. So the frequency is the main
indicator of the health of the grid and so the grid operators will monitor it
closely.

~~~
the_mitsuhiko
> The grid frequency itself doesn't matter much as far as consumers are
> concerned

So I thought, and then we had a multi-week frequency shift in the EU and
various clocks started lagging behind. This does show up for consumers as
well.

~~~
jacques_chester
Generally grids try to average out to their target frequency over some period
(24hrs, for example) because so many timekeeping systems rely on counting
cycles.

So sometimes they will run slightly above target and sometimes they will let
it run slightly below target. Ordinary daily demand cycles help.
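
The arithmetic on cycle-counting clocks is simple enough to sketch (the
0.01 Hz average shortfall below is an invented example, not a real figure):

```python
# How far a cycle-counting clock drifts if the grid runs slightly slow.
nominal_hz = 50.0
actual_hz = 49.99          # grid averages 0.01 Hz low
hours = 24

real_seconds = hours * 3600
cycles_counted = actual_hz * real_seconds      # what the clock sees
clock_seconds = cycles_counted / nominal_hz    # time the clock shows
print(round(real_seconds - clock_seconds, 1))  # ~17.3 seconds lost per day
```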

------
hyperman1
I like this article as it gives a lot of advanced insights into what at first
seems an easy problem.

Generators are tech from, what, 2 centuries ago? Keeping them in sync sounds
boring: just talk to each other. And then, under the surface, looms an
incredibly complex problem of working at scale, with engineering, political,
and organizational aspects.

------
computator
Simplifying, this is what I get:

(1) A lightning strike creates a transient voltage on the grid.

(2) Some power generators remove themselves from the grid to prevent damage to
themselves from the transient voltage.

(3) Now there is too much demand on the grid -- too many people using power
and not enough generators to supply it. (The line frequency dropping indicates
that supply and demand are not balanced.)

(4) So the grid operator begins to (deliberately) remove some people from the
grid to keep the supply and demand balanced.

What's not clear to me is what would happen if you didn't do step #4.

~~~
NobodyNada
That’s when you end up with a cascading failure
([https://en.wikipedia.org/wiki/Cascading_failure](https://en.wikipedia.org/wiki/Cascading_failure)).
The remaining generators are overloaded by the demand and shut down to protect
themselves as well, which further decreases the supply and causes more
stations to shut off, etc.

The Wikipedia article about the northeast US blackout of 2003 is a very
interesting read:
[https://en.wikipedia.org/wiki/Northeast_blackout_of_2003](https://en.wikipedia.org/wiki/Northeast_blackout_of_2003)
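
A toy version of that cascade, with invented unit sizes, shows how quickly it
runs away once load is never shed:

```python
# If demand is never reduced, each remaining unit is overloaded and
# trips in turn. All capacities and the trip order are invented.
generators = [1800, 1200, 1000, 800, 600]  # unit capacities, MW
demand = 4800                               # MW, held constant (no load shedding)

online = list(generators)
online.pop(0)  # the initial fault: the biggest unit trips
while online and demand > sum(online):
    tripped = max(online)  # in this toy, the largest remaining unit trips next
    print(f"{demand - sum(online)} MW short, {tripped} MW unit trips")
    online.remove(tripped)

print("remaining generation:", sum(online), "MW")  # ends at 0: total blackout
```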

------
wyiske
CityAM published a really interesting article about this
[https://www.cityam.com/what-was-behind-fridays-national-
grid...](https://www.cityam.com/what-was-behind-fridays-national-grid-outage-
network-theory-not-conspiracy/amp/). It certainly fits the given explanation.

------
jonatron
It was a few minutes of downtime affecting a fraction of people. The last
large UK outage was 10 years ago. The uptime has a lot of 9s already, so even
if they ensure this particular problem wouldn't cause downtime again, the next
time it'd just be a different rare event that causes it.

------
Havoc
This to me suggests we need a hell of a lot more batteries in the system.
Don't see anything else reacting fast enough.

------
stygiansonic
For a related read: the Northeast blackout of 2003:
[https://en.m.wikipedia.org/wiki/Northeast_blackout_of_2003](https://en.m.wikipedia.org/wiki/Northeast_blackout_of_2003)

------
kzrdude
At one stage, they have to shed load (turn power off), and that sounds totally
reasonable. How are these areas divided, how does one turn power off for a
whole area, and how is it decided who to cut from power (temporarily)?

~~~
pbhjpbhj
From the little I know, UK settlements have substations that serve something
like 1000 homes. Older ones are open air affairs, newer ones are prefabbed
enclosed buildings. They're oil cooled transformers and switching gear. One
serves the village where my parents live, in a city there are two within about
400m of me (serving different neighbourhoods). Sometimes a neighbourhood will
trip out, like in a thunderstorm; or get turned off (maintenance). I assume
that the central grid has a switching order for these to be turned off to
maintain essential supply. Factories, hospitals, and such have their own
substations.

------
pedrocr
Isn't this kind of fault a good argument for synchronizing the UK grid to the
rest of Europe? They seem to already have an interconnect that's of the scale
of demand needed here, but it's a DC link.

~~~
radiowave
If they were synchronized, it would mean that in the case of any larger
failure that would draw too much current over the undersea link, the link
itself would have to trip out.

~~~
pedrocr
That depends on the link size. But I guess it may very well be better to have
a big enough DC link and have it automatically modulate power all the way up
to maximum and then keep it there, instead of tripping.

------
rwmj
Why does frequency drop when the system is overloaded?

~~~
techsupporter
Imagine if you are running on a conveyor belt at 60 steps per second. This
belt is turning a crank that, let's say, moves water. You are the energy input
(the sun for solar, heat energy for steam turbines, movement of wind for
windmills, and so on), the crank is the generator, and the water is the load.

Now imagine the water is suddenly turned into vegetable oil (more load has
come on). It is more difficult to move. For a brief moment, your energy input
is still the same but you can't push as much of the new liquid so your pace
slows down a bit, to 58 steps per minute. You increase your exertion (add more
energy input) and your pace climbs back to 60 steps per minute.

Finally, imagine that you're running as hard as you can. That 60 steps per
second to push that vegetable oil is as much as you can do. But the vegetable
oil turns to heavy cream (even more load). You've put in as much energy as you
can but your pace falters and drops to 49 steps per second.

That's why the frequency falls.

------
abedci
9k

