
Full tech report of UK 9th August power outage - trebligdivad
https://www.ofgem.gov.uk/publications-and-updates/ofgem-has-published-national-grid-electricity-system-operator-s-technical-report
======
eigenvector
To summarize:

* Multiple contingencies occurred simultaneously (loss of generation from two major generators and loss of distributed generation totalling 1,400 MW) resulting in a drop in system frequency to 49.1 Hz

* Standby generation (frequency response reserve) was deployed, totalling 1,000 MW, i.e. the size of the largest single generation contingency, and began to arrest the system frequency decline

* Just as system frequency began to recover, a third contingency occurred resulting in the loss of a further 210 MW of generation. This caused system frequency to decline again to 48.8 Hz

* Load shedding kicked in as designed and dropped 5% of load to stabilize the system

The largest loss of generation was from Hornsea offshore wind farm. The wind
farm should have ridden through the system disturbance, but instead its control
and protection systems rapidly curtailed active power generation in response
to an undamped oscillation in the response of its voltage regulator through
the disturbance.

Basically, the internal voltage of the Hornsea wind farm collector system
dropped due to the voltage regulator oscillations (from 35 kV nominal to 20
kV), while active power generation remained the same. Power = current *
voltage, so an overcurrent condition occurred and protection systems operated
to prevent overload of the wind turbine generators.
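To put a rough number on that: at constant active power, current scales inversely with voltage. A minimal sketch, assuming a balanced three-phase system at unity power factor; the 100 MW figure is purely illustrative and not taken from the report.

```python
import math


def line_current_amps(power_mw: float, line_voltage_kv: float) -> float:
    """Line current for balanced three-phase power at unity power factor."""
    return power_mw * 1e6 / (math.sqrt(3) * line_voltage_kv * 1e3)


POWER_MW = 100.0  # assumed, illustrative only

nominal = line_current_amps(POWER_MW, 35.0)  # 35 kV nominal collector voltage
sagged = line_current_amps(POWER_MW, 20.0)   # 20 kV during the oscillation
ratio = sagged / nominal

print(f"current at 20 kV is {ratio:.2f}x the current at 35 kV")
```

Whatever power level is assumed, the ratio is 35/20 = 1.75, so the same active power needs 75% more current at the sagged voltage.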

Subsynchronous oscillations (SSO), i.e. oscillations at below power frequency
(50 Hz), are a known issue in power system controls that can lead to unstable
or unexpected consequences during system disturbances. The reduction in system
inertia caused by the replacement of large synchronous machines with
asynchronous generators as wind and solar replace conventional generators
exacerbates the possibility of problematic SSO because there is less damping.

Nowadays, in North America, very specific modelling is done in design stage to
identify the possibility for such behaviour and ensure that if present it is
adequately damped. Some system operators, such as ERCOT (Texas), require this
for new wind projects. I imagine that this major occurrence will lead to
revisions to modelling and grid code testing standards in the UK to protect
against future incidents.

All in all, kudos to Ofgem, National Grid and all other participants for
producing a thorough, public technical report in just about one month.

~~~
ID1452319
Was the fact that the Hornsea site is a wind farm a contributing factor or is
that merely a coincidence?

~~~
pjc50
Sort of: what matters is that it was _not_ an AC synchronous generator, like
traditional thermal power plants. Wind and solar systems these days have fully
digital AC-AC conversion systems which take the variable frequency multi phase
output of the wind turbine(s) and turn it into standard three phase.

Turbine-generator systems have nice, simple behaviour in response to frequency
drop: they act to maintain the frequency by transferring more energy from the
shaft rotation to the generator. In the long run this slows the turbine down
or triggers a throttle response, but over the few second period we're talking
about the shaft speed is basically constant due to its own inertia.

The wind farm "saw" the rapid fluctuations in connection voltage, tried to
compensate, and instead went into oscillation. This appears to have been a
software bug:

> "During the incident, the turbine controllers reacted incorrectly due to an
> insufficiently damped electrical resonance in the subsynchronous frequency
> range, so that the local Hornsea voltage dropped and the turbines shut
> themselves down. Orsted have since updated the control system software for
> the wind turbines and have observed that the behaviour of the turbines now
> demonstrates a stable control system that will withstand any future events
> in line with Grid Code and CUSC requirements"

(Oscillation damping is "control theory 101", but in a complex system like
this it's not so easy!)

Good news is that, while more renewables do potentially have this kind of
vulnerability, battery systems are the perfect counter. Some are already being
deployed for "fast frequency response". Being a DC-AC system, they can deploy
power with any frequency and phase angle required to compensate for problems.

~~~
wglb
> Oscillation damping is "control theory 101", but in a complex system like
> this it's not so easy

This is a nice understatement.

My first gig (summer after freshman year) I worked with D. Van Ness, who had
an inquiry from the Bonneville Power Administration to determine why their
frequency was oscillating (yes, the frequency). This oscillation would rapidly
get worse until something tripped and the whole network in the Northwest would
go down.

He modeled the system with a state vector and interconnect matrix. The matrix
was 500x500 and the path to understanding it was to find Eigenvectors and
Eigenvalues of this system. If there are any poles to the right of the y-axis, you
have an oscillator. Over time, they changed enough to get it stable.
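The eigenvalue test can be sketched on a toy system. This is not the 500x500 model described above, just a two-state oscillator chosen for illustration, with eigenvalues taken from the characteristic polynomial so that only the standard library is needed. A real study would use a numerical eigen-solver on the full state matrix. All names here are hypothetical.

```python
import cmath
import math


def eigenvalues_2x2(a, b, c, d):
    """Eigenvalues of [[a, b], [c, d]] from the characteristic polynomial."""
    trace = a + d
    det = a * d - b * c
    root = cmath.sqrt(trace * trace - 4.0 * det)
    return ((trace + root) / 2.0, (trace - root) / 2.0)


def oscillator_eigs(zeta, omega):
    """State matrix [[0, 1], [-omega^2, -2*zeta*omega]] of a damped oscillator."""
    return eigenvalues_2x2(0.0, 1.0, -omega ** 2, -2.0 * zeta * omega)


def is_stable(eigs):
    """Stable only if every eigenvalue lies strictly in the left half-plane."""
    return all(e.real < 0 for e in eigs)


OMEGA = 2.0 * math.pi  # assumed 1 Hz natural frequency, illustrative only

damped = oscillator_eigs(zeta=0.05, omega=OMEGA)    # positive damping
growing = oscillator_eigs(zeta=-0.05, omega=OMEGA)  # negative damping

print(is_stable(damped), is_stable(growing))
```

With positive damping both eigenvalues have negative real parts; flipping the sign of the damping moves them to the right of the imaginary axis and the oscillation grows.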

And you make some good points about the synchronization available if
everything is a classic generator, and these other power sources are not.

And this was many years ago, so the power systems of today are likely much
harder to model.

You put this nicely:

> They act to maintain the frequency by transferring more energy from the
> shaft rotation to the generator

Another way to think of this is that in a system with more than one generator,
a phase difference anywhere in the hookup causes power to flow in direct
proportion to the difference in phase angle. In other words, the slow
generator becomes a motor.
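A small sketch of that relationship, using the textbook lossless two-machine transfer formula P = V1 * V2 * sin(delta) / X in per-unit. Strictly the flow goes with the sine of the angle, which is close to proportional for small angles. The reactance and voltages are assumed values for illustration.

```python
import math


def power_transfer_pu(v1, v2, x, delta_deg):
    """Active power sent from machine 1 to machine 2 over a lossless reactance."""
    return v1 * v2 / x * math.sin(math.radians(delta_deg))


V1 = V2 = 1.0  # per-unit voltages, assumed
X = 0.5        # per-unit reactance between the machines, assumed

leading = power_transfer_pu(V1, V2, X, 10.0)   # machine 1 ahead: it generates
lagging = power_transfer_pu(V1, V2, X, -10.0)  # machine 1 behind: it absorbs power

print(f"leading by 10 deg: {leading:+.3f} pu, lagging by 10 deg: {lagging:+.3f} pu")
```

The sign flip for the lagging case is the "slow generator becomes a motor" effect: the machine that falls behind in phase draws power from the other one.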

~~~
eigenvector
Yes, the SSO issue in the northwest US power system is a textbook case of
this, although in that case IIRC the main issue was control interactions
between the power system stabilizers* oscillating together and actually
exchanging quite a bit of energy over long distances at a low frequency
because those very low frequencies were not effectively damped (or in some
cases, negatively damped). At the time, the tools available for large scale
power system modelling were very rudimentary compared to what we have today.

In general I don't see it as a renewable vs. conventional issue. SSO/SSR/SSCI
have been around since the 1960s when PSS started to be deployed in
synchronous generator excitation control systems. Rather it reflects the
greater complexity involved in modelling high speed digital controls vs.
physical, inertial responses that are expressed very effectively by well-known
equations. As we layer on more and more controls, we don't only have to model
what is going on at power frequency (50 or 60 Hz) but also at harmonic
frequencies and sub-synchronous frequencies. Renewable generators just happen
to depend much more heavily on complex control systems for power conversion,
mimicking synchronous generator response characteristics and to marry all the
components of a large renewable plant together.

At the same time, we have far more powerful tools for power system simulation
today that can effectively mitigate this risk, as long as engineers realize
the risk is there.

A good reference explaining SSO as it applies to conventional generators can
be found here:
[http://www.cigre.org.br/archives/pptcigre/07_subsynchronous_...](http://www.cigre.org.br/archives/pptcigre/07_subsynchronous_oscillations.ppt)

* Power system stabilizers (PSS) are a part of synchronous generator excitation control that improves dynamic stability by damping generator oscillations against the grid. However, PSS systems can actually cause additional, long-distance oscillations _with other PSS systems_ in the frequency range of 0.1 to 1 Hz. See: [https://www.wecc.org/Reliability/Power%20System%20Stabilizer...](https://www.wecc.org/Reliability/Power%20System%20Stabilizer%20Tuning%20Guidelines.pdf) and [http://www.meppi.com/Products/GeneratorExcitationProducts/St...](http://www.meppi.com/Products/GeneratorExcitationProducts/Static%20Excitation%20System/Power%20System%20Stabilizer.pdf)

~~~
wglb
That is the thing about such a highly dynamic system. "Let's just add this
damping right here." Then, we have a system that is very slow to recover.

This is all made more complex by the fact that many of the components of these
systems have really non-linear behavior. Like a dam spill that hits a hard
boundary.

------
bmsleight_
Very interesting from engineering view and the impact to rail.

Rail Headline:-

    
    
        Page 27. "The effects were exacerbated as the fleet was undergoing a software change which meant the train drivers could not recover trains which were operating on the new software." 
    

My view: Twenty-eight more units would have required a technician to visit if
the software roll-out had been completed. Potentially exponentially increasing
the disruption to public.

Appendix F – Govia Thameslink Railway (GTR) technical report, Page 47-50.
[http://www.ofgem.gov.uk/system/files/docs/2019/09/eso_techni...](http://www.ofgem.gov.uk/system/files/docs/2019/09/eso_technical_report_-_appendices_-_final.pdf)

    
    
        Appendix F, page 49: "Therefore, the affected Class 700 and 717 sets did not react according to their design intent in these circumstances."
    

The ability for the driver to recover was removed as part of the software
update. (See, Appendix F, Cause point 8). Operators are a key stakeholder.

Great technical report, would have loved to have had more information on
Victoria line. Lessons can be learned from this report.

~~~
trebligdivad
Yeh, I'm guessing they were having problems with drivers doing a reset for any
random thing they couldn't figure out. But shit happens, you really need a way
to get out of trouble.

------
generatorguy
Oof. The report from the steam turbine operator is dicey. Their time stamps
are relative so they don’t have GPS time synced but they are ms precision so
must have a decent sequence of events record. But they don’t know why the
200MW steam turbine unit tripped, which causes a knock on trip of 400 MW of
gas turbines. Report says steam turbine tripped due to a discrepancy in the
speed signals. Will be interesting to see what that is! The turbine speed will
be measured either by the number of teeth passing a magnetic pickup or
proximity sensor per unit of time (with the unit of time varied so that it is
always a whole number of teeth), or by measuring the period of the generator
voltage waveform. The latter is also a nice speed signal, but maybe not if you
are trying to ride through a fault in which the voltage has collapsed or is
subject to bad harmonics. Maybe harmonics caused an instantaneous overspeed
trip in the turbine governor? Protection relays wouldn’t be involved in
measuring turbine speed and should be smart enough not to trip on
overfrequency in harmonics during a fault.

~~~
repolfx
Yeah I found that part the most concerning. Basically RWE don't have any clue
why they tripped even more than a month on from the event, and apparently have
to wait for their next scheduled outage to make progress? That's not good.

There also seem to be multiple faults at once that they don't know anything
about. Turbine trip? No clue, could be bad sensors, could have been some
actual physical problem in the turbine. Overpressure in the condenser? No
clue, could be anything. Second generator tripped? Dunno boss.

Implies that they don't have enough sensor coverage and are hoping to literally
eyeball something wrong when they next open it up. Also implies they can't
shut down their own plant to diagnose apparent faults in anything like a
reasonable timeframe? Not good.

Also worth noting - Newcastle Airport were totally fine on their UPS but
demanded to be considered a priority customer anyway, and that request was
granted? Why? They clearly don't need to be, they have a working UPS!

Honestly I'd be sending this report back for more work if I were the boss guy
receiving it. It's not good. Filled with repetition, bad grammar, missing
information (why did London Underground shut down, what was this 'internal
traction issue') and most seriously it leaves a gaping hole around Little
Barford.

~~~
michaelt
_They clearly don't need to be, they have a working UPS!_

The impression I get from the data centre outages reported here on HN is that
backup generators are about the least reliable thing in IT.

~~~
repolfx
Yeah, but most places don't have UPS at all. If Newcastle Airport can survive
a power outage but other places cannot, why should it be prioritised?

~~~
michaelt
Because the airport's UPS+Generator is only, say, 99% effective.

And a 1% chance of losing power to air traffic control and runway lights is
worse than a 100% chance of 1000 homes having their dinner spoiled by the
cooker turning off.

------
alexchamberlain
I’m really quite impressed with the ESO, who is getting all the flak. They’d
planned for a loss of 1GW; lost 1.9GW and had protected 95% of the network
within 5 minutes. I hope everyone else realises that these networks aren’t
infallible and they need to plan for very occasional outages.

------
generatorguy
The smoking gun is the last plot in appendix D. A 2% step change in voltage
results in VAR oscillation of initially +/-100 MVAR that takes 2 seconds to die
out, with 13 peaks over that two seconds. This was behavior before the outage.
Probably it has been like that since it was connected but nobody bothered to
look. This is a problem with the smart grid, so much data is generated but
nobody has time to look at any of it. Probably a good application for some
kind of ML or AI.

They say it required a software update to fix it which was applied the next
day - probably this was just a change in the gains in the voltage controller
rather than an update to the actual program or firmware.

Somebody did a bad job of commissioning the voltage regulators on the wind
turbines, and that is what caused a normal transmission line reclose to
escalate into such a large loss of generation.

I am still curious as to why the DAR delayed action reclose time on the
transmission line is 20s. I would have thought it would be more like 1-2s
tops.

~~~
eigenvector
One of the issues in commissioning facilities as large as Hornsea is that if
you want to do field validation of things like dynamic voltage regulator
response (instead of just steady state performance), actually creating a
sufficiently large disturbance to test it may not realistically be possible
without impacting grid reliability. Hence the reliance on modelling. The
presence of an underdamped SSO probably could have been identified in
modelling, but there's no indication that National Grid requires SSO studies
in design phase.

Also I'm not sure I agree that the SSO was there all along. The system
configuration appears to be several STATCOMs at the HV interconnection
substation plus the VAR capabilities of the individual wind turbines. There
may be an interaction between these control systems that leads to SSO under
certain conditions only while being effectively damped at other times.

As for the reclose time, it may have to do with circuit breaker duty cycles.
We don't know what equipment they're using but if it's dated stuff, it's
conceivable that it requires that level of delay before it's rated for another
interrupting operation.

~~~
generatorguy
So a 2% step change in voltage regulator setpoint for any individual WTG would
be non oscillatory but in aggregate, or in interaction with the statcom it
oscillates at 6.5 Hz?

When I’m setting the gains for any control loop I tend to prefer choosing
lower gains that still meet the performance requirements rather than having
high gains closer to the edge of stability. I would not leave a system behind
that had 13 oscillations after a step.
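For a feel of how damping relates to the number of visible swings, here is a sketch that treats the loop as a textbook underdamped second-order system. That is an assumption made for illustration; the real plant is certainly more complicated. It counts the overshoot peaks of a unit step response that still exceed a 2% settling band.

```python
import math


def overshoot_peaks(zeta: float, band: float = 0.02) -> int:
    """Count step-response overshoot peaks larger than the settling band.

    For a standard second-order system the k-th overshoot (k = 0, 1, ...)
    has size exp(-zeta * (2k + 1) * pi / sqrt(1 - zeta^2)).
    """
    if not 0.0 < zeta < 1.0:
        raise ValueError("underdamped case only: 0 < zeta < 1")
    k = 0
    while True:
        peak = math.exp(-zeta * (2 * k + 1) * math.pi / math.sqrt(1.0 - zeta ** 2))
        if peak <= band:
            return k
        k += 1


lightly_damped = overshoot_peaks(0.05)  # close to the edge of stability
well_damped = overshoot_peaks(0.5)      # lower gains, more margin

print(lightly_damped, well_damped)
```

Under these assumptions a damping ratio near 0.05 gives on the order of a dozen peaks before settling, while 0.5 gives a single overshoot, which is the trade-off described above.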

~~~
eigenvector
All I'm saying is that the frequency-impedance characteristic of the local
system changes depending on active power loading, other nearby generators in
or out of service, nearby reactors or capacitors in service, etc. So it's
conceivably possible that the oscillatory behaviour was more effectively
damped both in the modelling cases and in whatever field testing they did.
Though if you read the grid code compliance testing report for Hornsea, it's
not evident they actually did voltage step change field validation.

If this kind of oscillation was present under all conditions, it would be an
oversight to not have caught it in the modelling stage. The level of modelling
we have to do in some North American regions for a _50 MW_ wind farm would
catch that kind of behaviour, let alone an 800 MW unit.

~~~
generatorguy
Indeed when there are large outages is when utilities often find out that the
models don’t quite match reality. One example would be when 1200 MW of
generation tripped in the WECC region and they found the effective droop of
the system was about 8% instead of 5, and have since mandated that every
generator over 10 MVA is tested every 5 years to ensure the equipment still
matches the model.
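To see why 8% versus 5% matters, here is a rough sketch of the steady-state frequency drop from governor droop alone. The responding capacity is an assumed round number, not a WECC figure, and load damping and secondary control are ignored.

```python
def droop_frequency_drop_hz(lost_mw, responding_capacity_mw, droop, nominal_hz):
    """Steady-state frequency drop from governor droop alone.

    droop is the per-unit frequency change for a 100% change in output
    (0.05 means 5% droop).
    """
    return droop * nominal_hz * lost_mw / responding_capacity_mw


LOST_MW = 1200.0                    # from the comment above
NOMINAL_HZ = 60.0                   # WECC is a 60 Hz system
RESPONDING_CAPACITY_MW = 100_000.0  # assumed, illustrative only

modelled = droop_frequency_drop_hz(LOST_MW, RESPONDING_CAPACITY_MW, 0.05, NOMINAL_HZ)
observed = droop_frequency_drop_hz(LOST_MW, RESPONDING_CAPACITY_MW, 0.08, NOMINAL_HZ)

print(f"5% droop: {modelled:.4f} Hz drop, 8% droop: {observed:.4f} Hz drop")
```

Whatever capacity is assumed, the frequency drop is 1.6 times larger than the model predicts if the effective droop is 8% rather than 5%.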

------
trebligdivad
Fun bits:

* Software upgrade your wind farm to remove bad responses/oscillations

* Software upgrade on trains removed ability for drivers to reboot them (& they intend to keep it that way?!) so had to send techs with laptops

* Upgrades needed to improve loss-of-mains detection in small generators.

~~~
mrob
> Software upgrade on trains removed ability for drivers to reboot them (& they
> intend to keep it that way?!)

From section 5.2.1: "The train manufacturer, Siemens, are developing a patch
which will allow the drivers to recover the trains themselves without the need
for a reboot or technician to attend site."

~~~
goatinaboat
Finally an explanation of what a train driver actually does - they are there
to reboot the automation if necessary. But what have they been doing up until
now? Train driving is a very well paid job, over twice as much as a bus
driver, for a fraction of the work... they don’t even steer it!

~~~
tialaramex
Lots of responsibility in driving a train. That's usually going to be dozens,
and for some trains regularly hundreds of passengers, or in freight hundreds
or thousands of tonnes of freight.

It's psych-profiled, companies which employ drivers will be looking for
"compliance" (a psychological tendency to obey rules even if you don't
understand why) so that the driver obeys all the safety rules.

It's also a fairly complicated machine, not as complicated as a jet liner but
far more complicated to operate than a bus, so that reduces your pool of
candidates further, in most cases they'll be looking for someone with some
mechanical aptitude to understand how it works.

They need communication skills, the driver needs to work with their
signallers, and potentially also company dispatch, and on trains without
separate customer service personnel they need to talk directly to passengers.

For example yesterday I was on a train which was delayed by trespassers. The
driver will have needed to use "proceed with caution" rules, where they drive
the train slowly enough that they can always stop it within the distance they
can clearly see, obeying any signals, and then call their signaller back each
time a signal cancels that authority, to get a new authority overriding each
signal. Then, clear of the problem but much delayed, they needed to handle the
fact that their dispatch turned their train into an Express to get it back
where it should be, so they need to make announcements to passengers about
where passengers should disembark to get a different train that's still going
to their destination.

Mainline train drivers make similar money to me (or at least similar to what I
made five years ago) but I can't say I feel like they don't earn it. Like me
their job is pretty easy when things go right, but not so much when things go
wrong. Lots of people couldn't do it, and more wouldn't.

------
londons_explore
What is the purpose of these embedded generation protections:

Vector Shift Protection (triggered by lightning, led to loss of 150 MW):

As far as I can see, this protection shuts down generation when part of the
grid might be disconnected from the rest. Shutting down when islanding hasn't
occurred is wrong, and destabilises the grid. Perhaps we should be measuring
islanding another way? What about applying gold coded frequency modulation to
the actual system frequency? A gold code of length 1 million could be injected
on just a few points on the national grid, at a power of just a few kilowatts,
and be measurable from anywhere. When islanding occurred, the signal
disappears, and embedded generation can switch off?

Rate of change of frequency protection (led to loss of 350MW). What's the
purpose of this protection at all? If frequency is changing in a downward
direction, the faster it's falling, the more important it is _not_ to
disconnect supply.

High positive rate of change of frequency might be a reason to disconnect
generation to prevent oscillation (effectively acting as the "D" term in a pid
loop), but did this occur?
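For context on what a RoCoF relay is reacting to, the swing equation ties the initial rate of change of frequency to the size of the power imbalance and the kinetic energy stored in the rotating machines. A minimal sketch follows. The stored energy and the relay setting are assumed values for illustration, and treating the 1,400 MW as one instantaneous step is a simplification.

```python
def initial_rocof_hz_per_s(lost_mw, kinetic_energy_mws, nominal_hz):
    """Initial rate of change of frequency from the swing equation.

    df/dt = -(lost power) * f0 / (2 * stored kinetic energy)
    """
    return -lost_mw * nominal_hz / (2.0 * kinetic_energy_mws)


LOST_MW = 1400.0                # total loss quoted in the summary above
NOMINAL_HZ = 50.0
KINETIC_ENERGY_MWS = 200_000.0  # assumed system inertia, illustrative only
RELAY_SETTING_HZ_PER_S = 0.125  # assumed relay threshold, illustrative only

rocof = initial_rocof_hz_per_s(LOST_MW, KINETIC_ENERGY_MWS, NOMINAL_HZ)
low_inertia_rocof = initial_rocof_hz_per_s(LOST_MW, KINETIC_ENERGY_MWS / 2, NOMINAL_HZ)
would_trip = abs(rocof) > RELAY_SETTING_HZ_PER_S

print(f"RoCoF: {rocof:.3f} Hz/s, relay trips: {would_trip}")
```

Halving the stored kinetic energy doubles the RoCoF for the same loss. That is why a small island looks very different from the whole grid to such a relay, and also why a low-inertia grid can trigger it without any islanding.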

~~~
generatorguy
Signals on power lines get filtered out by transformers, line reactors,
capacitors, etc. They are also expensive to inject and extract since it
requires high voltage connections which have to be of the same quality as the
rest of transmission system components. I’ve worked both with power line
carrier for transfer trip schemes and ripple plants for controlling hot water
loads. More likely your ripple plant is going to have some failure resulting
in tripping all your generation when it didn’t need to.

Many important plants and transmission lines are connected by a utility's own
fiber which can be used to transmit the actual state of the system instead of
trying to infer it from the power waveforms. This is the best solution but
obviously expensive.

Uncontrolled decentralized embedded generation is not meant to ever energize a
dead line. If it has a solid state power electronics interface to inject power
(a fancy inverter) it is probably operating in a mode where it follows the
waveform on the grid to make sure it stays in phase. If the grid waveform is
poor quality, full of harmonics due to faults, or phase shifts due to major
loads or generation disappearing resulting in instantly changing power flows,
the inverter can’t stay in phase. It is probably a delicate balancing act to
be able to follow the grid frequency but also affect it by injecting active
and reactive power. I’m not an inverter guy so this is mostly speculation from
reading data sheets and manuals for grid tie small battery systems.

Not sure about RoCoF tripping generation on falling frequency. There are
situations in which injecting power in to an island can result in overvoltages
that would damage all of the equipment on the island, so it is better to avoid
it if it looks like an island might be forming. Just a guess based on
experience with an embedded steam turbine.

There could have been a very short increase in frequency for the 80ms or 4-5
cycles when the single phase to ground fault occurred as faults cause machines
to accelerate since it is the same as removing the load and replacing with a
short circuit. Otherwise the only increase in frequency was when it started to
recover by the system operator calling for more generation.

------
shmageggy
They say that the Hornsea and Little Barford plants would

 _" not be expected to trip off or de-load in response to a lightning strike.
This therefore appears to represent an extremely rare and unexpected event."_

Looking at the timeline, both of those events are logged within 1 second of
the strike. To me, with even a little bit of experience with systems having
complex interacting components, it seems _vastly_ more likely that there is
some unknown interaction rather than pure chance. I would imagine the prior
probability of either of those two going offline is very low, so the
probability of both independently going offline within one second of a
potential causal event seems vanishingly small.
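A back-of-envelope version of that argument, treating each plant's unplanned trips as an independent Poisson process. The trip rate is an assumed number chosen only to show the order of magnitude.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600


def p_trip_in_window(trips_per_year: float, window_s: float) -> float:
    """P(at least one trip inside the window) for a Poisson process."""
    rate_per_s = trips_per_year / SECONDS_PER_YEAR
    return -math.expm1(-rate_per_s * window_s)


TRIPS_PER_YEAR = 5.0  # assumed unplanned trip rate per plant, illustrative only

p_one = p_trip_in_window(TRIPS_PER_YEAR, 1.0)  # one plant, a given 1 s window
p_both = p_one * p_one                         # two independent plants, same window

print(f"one plant: {p_one:.2e}, both by chance: {p_both:.2e}")
```

Even with a generous trip rate, the chance of both plants tripping independently in the same second is many orders of magnitude below the chance of either one alone, so a common cause is the far more plausible explanation.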

~~~
generatorguy
I read it as they tripped because of the lightning strike and transmission
line trip, but they shouldn’t have.

------
Zenst
Very detailed info in the Appendix pdf, and wonder if some of that should have
been made public - when I see grid layouts like that.

~~~
eigenvector
Anyone with the level of engineering knowledge to do something nefarious with
that information is quite capable of doing something nefarious without it.

On the other hand, by releasing it publicly, the internal findings of the involved
power companies can be scrutinized by academics, independent engineers and
members of the public.

~~~
Zenst
Equally, might be some nice deliberate errors. Sort of like a trap street upon
a map
[https://en.wikipedia.org/wiki/Trap_street](https://en.wikipedia.org/wiki/Trap_street)
though in this case, not for copyright protection.

------
airnomad
How would you compare grid reliability to reliability of huge networks such as
Google's or Facebook's? Scale might be comparable.

