German nuclear plant infected with computer viruses, operator says (2016) (reuters.com)
80 points by edward on June 24, 2018 | 54 comments


If a virus can cause an accident that releases radiation, it's not a computer security problem, it's a failure in nuclear plant automation design.

Modern nuclear plants usually have analog monitoring and safety systems in parallel with the digital controls in critical areas like neutron flux measurement and calibration, fuel loading monitoring, control rod position indication, control rod actuators, and so on. Any error caused by a virus attempting to sabotage the plant should result in a shutdown.

You can divide the system into 1) process automation, 2) limiting systems, and 3) protection systems.

Process automation can be fully digitized and computerized. If it fails, or a computer virus attempts to sabotage the system, it should not cause critical conditions, because the limiting systems have diversified digital and analog controls that keep the given parameters within bounds. If that fails, independent protection systems trigger a shutdown when those same parameters are exceeded.
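
A rough sketch of that layering (the parameter names and limits here are hypothetical, purely to illustrate that the protection layer acts on the measured value independently of the digital automation):

    # Hypothetical sketch of the three-layer idea: process automation may fail or
    # be sabotaged, but the limiting and protection layers act on the same measured
    # parameter independently and can only push the plant toward a safe shutdown.

    PRESSURE_LIMIT = 155.0  # example limiting setpoint (bar), made up for illustration
    PRESSURE_TRIP = 160.0   # example protection (trip/scram) setpoint, also made up

    def process_automation(setpoint, measured):
        """Normal digital control; assume this layer can fail or be sabotaged."""
        return setpoint - measured  # naive proportional correction

    def limiting_system(measured, control_output):
        """Independent limiter: overrides the controller near the operating limit."""
        if measured >= PRESSURE_LIMIT:
            return min(control_output, 0.0)  # never allow a further increase
        return control_output

    def protection_system(measured):
        """Independent (possibly analog) protection: trips the plant on exceedance."""
        return measured >= PRESSURE_TRIP  # True => shutdown, regardless of other layers

    # A sabotaged controller demanding more pressure still cannot prevent the trip:
    measured = 161.0
    demand = process_automation(setpoint=170.0, measured=measured)
    print("limited output:", limiting_system(measured, demand),
          "| trip:", protection_system(measured))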

Another way to divide the safety design is into four principles:

1) parallel principle: the same function is performed by redundant, parallel subsystems.

2) separation principle: parallel subsystems are placed so that simultaneous damage to all of them is very unlikely.

3) diversity principle: the same function is implemented with different operating principles. For example, the same valve can be operated by an electric motor, by compressed air, or manually.

4) safe-state principle: a failing system falls back to a safe state (see the sketch after this list).
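
A minimal sketch of principles 1 and 4 combined, assuming a hypothetical two-out-of-three trip vote in which a channel that cannot be read counts as a trip vote, so failure falls back to the safe state:

    # Hypothetical 2-out-of-3 voting logic with fail-safe behaviour: a channel
    # that cannot be read is treated as voting for a trip (the safe state), so
    # losing channels can only make a shutdown more likely, never less.

    def read_channel(sensor):
        try:
            return sensor()   # returns True if this channel demands a trip
        except Exception:
            return True       # unreadable channel falls back to "trip"

    def two_out_of_three(sensors):
        votes = [read_channel(s) for s in sensors]
        return sum(votes) >= 2  # trip when at least two channels agree

    # Example: one healthy channel says "no trip", one says "trip", one is broken.
    sensors = [lambda: False,
               lambda: True,
               lambda: (_ for _ in ()).throw(IOError("channel dead"))]
    print("trip:", two_out_of_three(sensors))  # -> trip: True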


A shutdown is still a denial of service attack, which might be bad enough on its own. How well can the grid compensate for one or several plants suddenly dropping offline? We know cascading grid failures can happen.

But, I guess the risks there are about the same for a nuclear plant as they would be for a computerized non-nuclear plant.


> How well can the grid compensate for one or several plants suddenly dropping offline?

One? Pretty well usually. Reactor trips are common enough that the grid is designed to handle it. If multiple reactors shut down simultaneously the loss of supply could exceed available margins and disrupt service.

Here is the most recent (June 6th, BRAIDWOOD) full power trip reported in the US: https://www.nrc.gov/reading-rm/doc-collections/event-status/...

There was another last week but it was only from 2% power at the time. We see a couple every month and few ever notice.


This is a new risk category, unique to networked digital automation and absent from other systems. I don't think computer security people take it seriously enough.

Say you sabotage a "dumb" automation system so that it fails and suddenly turns the car to the right. When the failures start to happen, the manufacturer eventually notices and recalls the vehicles. Maybe tens or hundreds of people will be injured.

But what if the system has access to the date and time through the CAN bus? If you design the sabotage so that all affected cars steer suddenly to the right at the same moment, the cascade can kill and injure thousands and cause huge panic. People are afraid to drive until the problem is fixed.


High-assurance security has been pushing separation kernels to solve that kind of problem for a long time. Vendors even list automotive in their Solutions menus. Two got evaluated by the NSA for security. I think the NSA cancelled that protection profile because the hardware and firmware had too many leaks for a software-only solution to be believable. The boards aimed at that market are also increasingly complex. However, the separation kernels and virtualization platforms were the strongest thing produced for this goal, and some are still available.

Here's a good explanation of the default architectural style used in high-assurance security, from a team that open-sources a lot of their prototypes:

https://os.inf.tu-dresden.de/papers_ps/nizza.pdf

Here's a commercial one used in an automotive platform that got certified to high assurance at EAL6+:

https://www.ghs.com/products/safety_critical/integrity-do-17...

I'm not endorsing the [closed-source] product so much as the methods it claimed to use to get the job done. Look at the architecture, the minimalist runtimes (especially for Ada), the determinism, the partitioning, and the certification data at the bottom right. That last part hints at the kinds of things they have to do, and have reviewed, for the certification. If a developer claims to do security, ask to see the Covert Channel Analysis of their product. That certification requirement/activity is how cache-based timing channels, among other leaks, were discovered in 1992 by the VAX Security Kernel team. Covert and side channels are getting popular again, but most people still don't apply storage/timing analysis.

You'll find that most developers and "security professionals" don't build secure stuff using methods proven to work in the past. Companies building tech like this sell it at a high price since there are so few buyers. The networking-oriented ones started at $50,000 when I priced them. You can bet the big companies whose products are getting hit could've afforded them, though. Especially the royalty-free options.


Partitioning kernels have been used for decades in aviation and other safety-critical systems. I have developed software for INTEGRITY, for example. It's commercial software and not proper open source, but the code is available to their customers.

These systems address the issue of separating different functions as well as possible. What they don't address is the problem that widely used homogeneous systems face when a way to compromise the system is found.


I first encountered them built for aerospace under DO-178B. What's the oldest separation-kernel-like system you know of whose design details are publicly described? I like giving proper credit and knowing the history.

For me, we went from security kernels to Rushby's concept to separation kernels. Then they did the aerospace versions first, since the safety requirements were already a subset of the security requirements. Plus, aerospace companies would actually buy it. It's hard to sell that stuff in non-regulated markets.


I'm not good at history of operating systems.

A separation kernel is an efficiency thing. The oldest and most secure way to solve the same problem is to use separate computers and put data diodes between them if necessary. The need for integrated modular avionics was the driving force behind the development: aircraft have space and weight limits.
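
For context, a data diode only lets information flow one way. A hedged sketch of what the sending side of that pattern looks like in software, assuming the physical link has transmit wiring in one direction only (the address and measurement name are placeholders, not from this thread):

    # Sketch of the sending side of a data-diode link: fire-and-forget UDP with
    # no receive path, mirroring hardware that physically lacks a return channel.
    import json
    import socket

    DIODE_TX_ADDR = ("192.0.2.10", 5005)  # example address (TEST-NET range)

    def send_reading(name, value):
        """Push one measurement through the one-way link; never reads anything back."""
        payload = json.dumps({"name": name, "value": value}).encode()
        with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as tx:
            tx.sendto(payload, DIODE_TX_ADDR)

    send_reading("neutron_flux", 0.97)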


> If a developer claims to do security, ask to see the Covert Channel Analysis of their product.

> You'll find that most developers and "security professionals" don't build secure stuff using methods proven to work in the past.

Could you tell me which security professionals actually offer such in-depth analysis? I have the feeling that most penetration tests or security assessments on offer aren't that deep, and are mostly low quality. Could you also tell me how exactly the NSA does its security evaluations? I know there is Common Criteria, but those documents are so high-level that you can't derive from them the techniques for conducting thorough evaluations.


Almost none. Most people in the security industry don't use techniques like that, which proved out in older evaluations. My own framework for looking at things was heavily based on those methods. Common Criteria got too political, where people might try to dodge requirements with hand-waving. So, I'll give you that plus the VAX Retrospective, since the design, layering, and assurance sections show you what methods were in use in 1990s high-assurance security. Also, the covert channel analysis was how they found cache-based timing channels in it in 1992. Mainstream security ignored that for a long time, too. Most still don't give them credit. It's a social or clique thing, as far as I can tell.

http://lukemuehlhauser.com/wp-content/uploads/Karger-et-al-A...

https://pastebin.com/y3PufJ0V

For covert channels, you want to look up Kemmerer's Shared Resource Matrix for storage channels and Wray's method for timing channels. Those got early results. There are a lot of newer ones under the banner of "information flow security." Google variations of those terms with "PDF", since they're PDFs, to find lots of answers. Throw the word "survey" in with information flow, since I think there was a survey paper.
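
As a toy illustration of the shared-resource-matrix idea (my own simplification, not Kemmerer's actual notation): tabulate which operations can read or modify each shared attribute, then flag any attribute that some operation modifies while a low-privileged operation can observe it, since that pairing is a candidate storage channel:

    # Toy shared-resource-matrix style check (a simplification of Kemmerer's method):
    # for each shared attribute, record which operations can Read ("R") or Modify ("M")
    # it. An attribute that can be modified and also read by a low-privileged
    # operation is a candidate storage channel worth analysing further.

    matrix = {
        # attribute          : {operation: "R" / "M" / "RM"}
        "disk_free_blocks":    {"hi_write_file": "M", "lo_stat_fs": "R"},
        "file_lock_flag":      {"hi_lock": "M", "lo_try_lock": "RM"},
        "process_priority":    {"hi_set_prio": "M", "hi_get_prio": "R"},
    }

    def candidate_storage_channels(matrix, low_ops):
        """Flag attributes that are modifiable and readable by a 'low' operation."""
        flagged = []
        for attr, ops in matrix.items():
            modified = any("M" in mode for mode in ops.values())
            low_readable = any("R" in mode for op, mode in ops.items() if op in low_ops)
            if modified and low_readable:
                flagged.append(attr)
        return flagged

    print(candidate_storage_channels(matrix, {"lo_stat_fs", "lo_try_lock"}))
    # -> ['disk_free_blocks', 'file_lock_flag']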

Finally, I'll throw in below my summary of assurance techniques I found over time:

https://pastebin.com/uyNfvqcp


Thank you very much! Your pastebins were really interesting to read.


A plant dropping offline is still a relatively friendly and controlled event. In a malicious case you could imagine a plant acting in a maximally adversarial way towards the grid.


Back during SQL Slammer (IIRC) I was working in security, and subsequently SCADA security for critical infrastructure. There was a well-known story about a nuclear plant in the US that had its network taken down by the worm. The problem was that they had become so reliant on automation that they only had two operators on duty, not nearly enough.

The lesson, too often ignored, that I quickly found all over the critical infrastructure space was: yes, the software is terribly insecure, but more critically, when all is well, organizations develop an over-reliance on automation always going well, which leaves them under-equipped for when it stops going well.


Is this the incident?

https://www.securityfocus.com/news/6767

'The Slammer worm penetrated a private computer network at Ohio's Davis-Besse nuclear power plant in January and disabled a safety monitoring system for nearly five hours, despite a belief by plant personnel that the network was protected by a firewall, SecurityFocus has learned.

The breach did not pose a safety hazard. The troubled plant had been offline since February, 2002, when workers discovered a 6-by-5-inch hole in the plant's reactor head. Moreover, the monitoring system, called a Safety Parameter Display System, had a redundant analog backup that was unaffected by the worm. But at least one expert says the case illustrates a growing cybersecurity problem in the nuclear power industry, where interconnection between plant and corporate networks is becoming more common, and is permitted by federal safety regulations.'

Sounds like the plant was being operated by a skeleton crew since it was mostly disabled.


That could be it but that sounds much less concerning than how the guy at INL who was telling me about it described it. He may have enhanced the story. :-)


Do you have a reference to sources or requirements documents saying that the plants must tolerate a malicious actor in control of the computerized parts of a plant, the control rooms, and the supporting infrastructure? This sounds like quite a hard problem to solve.

Safety, meaning defense against accidental events, is in some respects much easier than defense against malicious events. Diversity and redundancy, for example, are much less effective: failure comes not from a coincidence of two very unlikely faults, but from the adversary simply attacking an additional subsystem in a workflow of several attacks.


The diversity and parallel principles are more general than cybersecurity. If the design is properly implemented, a complete failure in one subsystem (even an intentional one) is not going to cause a catastrophe. All the attacker can do is trigger a controlled shutdown from the analog subsystems.

Different countries have different safety requirements for nuclear power plants, so not all plants are necessarily equally secure. Safety is expensive to implement, and manufacturers offer different choices.


I'm sure we agree but a caveat like "if properly implemented" is quite a big one to make in this business - bad stuff happens largely because of implementation bugs. And we don't know how to eliminate implementation bugs. And this is one reason why we see successful attacks on systems with multiple "sound if properly implemented" levels of security.


I remember reading elsewhere that bad stuff happens largely because of incomplete or conflicting requirements, and that implementation bugs are secondary. My own experience confirms this, even though I usually deal with systems where a problem at worst results in lost sales / orders and thus sloppy coding and implementation bugs are much more common.

Makes me wonder how well-polished the requirements analysis for (nuclear) power plant software is...


There is no perfect security. What I mean is that if you design a plant in a way that does not rely on cybersecurity measures or digital automation for its nuclear safety and security, it's secure from cyberattacks.


We know how to eliminate almost all implementation bugs. I hope nuclear power plants are one of the areas where people are willing to pay for that level of correctness.


It's nice that they have all these principles. And they apply not only to digital systems but to literally anything created by engineers in nuclear power plants. Still, there have been accidents. Paradoxically, Chernobyl was caused by human error on the part of the operators. Possibly it might not have happened if there had been digital systems in place for direct control of the nuclear processes.

IMHO nuclear technology is inherently insecure, and every attempt to make it "secure" is just a hack. I mean, just compare it against best practices in security, like openness, especially when it comes to incidents. So many incidents only came to light through indirect routes, and sometimes only after years.


> I mean, just compare it against best practices in security, like openness, especially when it comes to incidents. So many incidents only came to light through indirect routes, and sometimes only after years.

You're going to have to be specific about claims like this, because it comes across as FUD in the US, given the stringent transparency requirements the NRC holds licensees accountable to.

To see current events: https://www.nrc.gov/reading-rm/doc-collections/event-status/...

Edit: Also, your argument about the failure of Chernobyl is not compelling, as you provide no evidence of how the existing analog systems failed in a way that digital systems would not have, which could have averted the accident. It is generally accepted that it was a combination of [a lack of] system redundancy (analog or digital does not matter), inherent risk from the reactor design's reactivity coefficient, and poor operator understanding of the neutron poison buildup from the low-power excursion.


Chernobyl was discovered through measurements by Western states, not through disclosure. In fact, it was worse for accidents that happened in the West. Even in recent years; just read about them, no need to reference.

It's not "my argument" but you can read it for instance on Wikipedia.

I'm aware that the official requirements are stringent, but that doesn't mean much, I'm afraid.


> Even in the recent years, just read about them, no need to reference.

I have studied lots of nuclear accidents because I studied Nuclear Engineering. While Fukushima was happening, for that matter. Telling me to "read up" and "no need to reference" comes off as extremely intellectually lazy at best and condescending at worst.

> It's not "my argument" but you can read it for instance on Wikipedia.

If it isn't your argument then why bother even bringing it up, except to spread FUD? I'm not going to seek out something you don't even care to advocate for. I will stick to my primary sources within the industry.

> I'm aware that the official requirements are stringent but that means not much I'm afraid.

I am glad you are self-describedly aware. But your (lack of) argument and vague handwaving generalization is not compelling me to reach the same conclusion. I've met a lot of anti-nuclear proponents who have made much more informed arguments, and I respect them, but I don't have much patience for vague "read Wikipedia" "arguments". Either stand for something, or please stop spreading FUD.


I have studied physics and been a member of ecological groups. So let's say I have a vested interest.

So you need references; here you go. Obviously this is going to take a few minutes to google:

- Chernobyl: https://en.wikipedia.org/wiki/Chernobyl_disaster#Delayed_ann...

- 3 Mile Island: "Twenty-eight hours after the accident began, William Scranton III, the lieutenant governor, appeared at a news briefing to say that Metropolitan Edison, the plant's owner, had assured the state that "everything is under control".[58] Later that day, Scranton changed his statement, saying that the situation was "more complex than the company first led us to believe."[58] There were conflicting statements about radioactivity releases." https://en.wikipedia.org/wiki/Three_Mile_Island_accident#Vol...

Fukushima might have had more direct communication with the media. But it puzzles me how long it took to accept help from foreign countries... did they actually ever?

I think the core problem with this discussion is that you have, on one side, really left-wing ecological groups and, on the other, the nuclear industry and its engineers. Each group is perceived as very biased by the other, sometimes with reason, sometimes irrationally. In any case, there aren't many topics where the argumentation goes to such ridiculous depths. Sometimes the arguments themselves sound ridiculous because they seem so counter-intuitive. It's already 10 years ago that I bet someone from a neoliberal party a box of beer that there were nuclear transports you could throw an egg onto and it would get baked. It took me hours to find that stupid article - from a reputable German news magazine. Of course he didn't admit that I won, he just stopped answering my mails.

Honestly, this is making no sense.


This quote sounds strange to me:

"As an example, Hypponen said he had recently spoken to a European aircraft maker that said it cleans the cockpits of its planes every week of malware designed for Android phones. The malware spread to the planes only because factory employees were charging their phones with the USB port in the cockpit.

Because the plane runs a different operating system, nothing would befall it. But it would pass the virus on to other devices that plugged into the charger."

What kind of malware would that be? Also obviously the OS is at risk since it also must have been infected to be able to pass the malware along (I'm guessing this might be the entertainment system or something and nothing critical).


You are thinking too conventionally. There are numerous ways to attack USB devices that don’t involve the OS. https://www.bleepingcomputer.com/news/security/heres-a-list-...


I guess that a regular app would not have the necessary low level access (that I assume is required) to do anything like that.

So that (another guess) means that the android phone needs to be under complete control of the malware.

Is that common or are my guesses/assumptions wrong?


Hmm, I guess that's the disadvantage of allowing USB to do anything.

Can't even limit what looks like a mouse to being a mouse only...


Has anyone told the employees that their phones have malware? Seems strange to let it be a weekly occurrence.


USB connections in the cockpit are used for charging electronic flight bag devices (tablets, etc.). I'm not sure whether the USB controller is connected to anything other than the power supply, but it might still be programmable, allowing it to distribute malware.


Why do they have data lines if they're only providing power?


Because high-speed charging requires negotiating a protocol between the two devices.


That's a good question. I suspect that it might be cheaper to use the same USB components they use in passenger systems.

Note: there are other USB or RJ-45 ports, or SD cards, that are used to update avionics software or aeronautical databases. These are usually available only to maintenance personnel and use digital signatures to verify the data.
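
As a rough sketch of the shape of such a signature check (using the Python 'cryptography' package purely for illustration; real avionics data loaders obviously use their own certified code paths and key handling):

    # Illustrative only: verify an update image against a detached RSA signature
    # before accepting it. The function and variable names are hypothetical.
    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives import hashes, serialization
    from cryptography.hazmat.primitives.asymmetric import padding

    def update_is_authentic(image: bytes, signature: bytes, pubkey_pem: bytes) -> bool:
        public_key = serialization.load_pem_public_key(pubkey_pem)
        try:
            public_key.verify(signature, image, padding.PKCS1v15(), hashes.SHA256())
            return True
        except InvalidSignature:
            return False

    # Usage sketch: refuse to install anything that fails the check.
    # if not update_is_authentic(image, signature, pubkey_pem):
    #     raise SystemExit("update rejected: bad signature")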


Then why don't they have a strict "no personal devices inside the wire" policy, and the same for air crew: all phones must be locked in a secure box?

Locking down the USB ports to authorised devices would also make sense.


Probably some type of app that automatically copies a payload to any plugged in USB device? Makes me think of all those .eml files you'd find on network shares in the early 00s.


Probably referring to the likes of BadUSB


That's Java malware. They are compiling everything down to Java bytecode, even Perl. The OS itself is not Android, so they are not affected.


I know this is not going to be a popular opinion, but using Windows for critical systems, even air-gapped and everything, is simply irresponsible. Windows seems too hard to secure, and these problems keep happening again and again.

It's not that Linux or some RTOS would necessarily be immune to a targeted attack like Stuxnet, but they would be immune from the random crap malware that exists in the Windows world.


I've been thinking about that on and off for quite some time. These kinds of embedded systems are actually a really hard problem, because updates are hard to impossible. So eventually there will be 10 years of known bugs in that system.

Digesting a lot of DEF CON / Black Hat talks, it seems that the best way to minimize the number of bugs is to deploy less code. And I think that's where a BSD or a Linux has a strong option to be more secure: you can strip down the kernel and the system as much as possible. Don't deploy any USB-related code; remove the entire network stack if you can.

That's the theory anyway. The IoT shows how a badly managed Linux can be equally open.


My experience of redundant embedded controller systems is that whilst they may not run Windows software, they are usually based on an RTOS such as VxWorks, QNX, etc.

I don't suppose they are any harder to "hack" than Windows. In fact, it is possible that they may be easier, depending on the security mentality of the manufacturer, and given the number of eyeballs on Windows.

I can only imagine this will get worse as these controllers become more and more sophisticated.

The only way, in my opinion, to safeguard against this is to have a fully disconnected system with no USB access. However, USB is easy.


> I can only imagine this will get worse as these controllers become more and more sophisticated.

That is exactly the thing though.

If you could reduce a digital control unit to a small program and run that from write-once memory in a dead simple controller, you'd avoid Spectre and Meltdown, you'd avoid the vulnerability of the program being overwritten post-deployment, and you'd simplify ensuring the correctness of the control program. Verification might actually be possible there.
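
A hedged sketch of what "small enough to verify" could mean: a control function whose entire input space is small enough to check exhaustively against its safety invariants (the 12-bit sensor range and thresholds below are made up):

    # Hypothetical minimal control logic for a dead-simple controller: a pure
    # function of one 12-bit sensor reading, checked exhaustively against its
    # safety invariants. Thresholds are invented for the example.

    TRIP_RAW = 3000        # made-up trip threshold in raw ADC counts
    VALVE_OPEN_RAW = 2500  # made-up relief-valve threshold

    def control(raw):
        """Return (relief_valve_open, trip) for a 12-bit sensor value."""
        return raw >= VALVE_OPEN_RAW, raw >= TRIP_RAW

    # "Verification" by brute force over the entire input domain (4096 values):
    for raw in range(4096):
        valve_open, trip = control(raw)
        assert trip == (raw >= TRIP_RAW)       # trips exactly at/above the setpoint
        assert not (trip and not valve_open)   # never trips without relieving first
    print("all 4096 inputs satisfy the invariants")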


Hi, yes, but they won't do that. Normally there is no write-once memory. Most oil rig control systems, for example, are not hammered down to that level. Heck, most aviation flight control systems are not like that. In fact, early in my career I was responsible for writing a Windows C# communications driver for an entire oil rig. It's still working :)


It seems incredibly stupid and irresponsible to use any 'modern' hardware systems for fairly simple but critical functions. Take voting machines: you need a chip less powerful than a damn 30-year-old scientific calculator to do something as simple as counting votes; everything more is just layers of vulnerability. And I would say the same for any critical infrastructure, but money and profit are apparently more important than safety or security.


It doesn't seem to affect the nuclear operation part in any way.

Nuclear power plants also have Windows PCs running MS Office. There are people there who plan budgets, write reports, send e-mails, etc... the boring stuff found in every company.

Nuclear plant operation, at least the critical part, is still surprisingly low tech. There are panels full of gauges, lights, switches, and levers. And there are people with paper sheets following procedures like "if pressure X exceeds Y, push button Z". I don't expect a virus to have much influence on that part. It is bad for economic operation, but not really a safety issue.

I don't know enough to say whether a well-targeted, Stuxnet-style attack could be a safety issue. I guess not, but it could probably cause a shutdown.


In the UK they did not update the control rooms for the nuclear plants, so, last time I checked (which was about a decade ago), they still looked like 1960s-vintage hardware. So I was sceptical about how a computer virus could possibly affect things.

Looking for a picture of a 'modern' control room from a British nuclear plant (with not so much as a Windows XP machine in sight) I came across this story of how a decommissioned nuclear plant is now a school:

https://www.businesswest.co.uk/blog/site-berkeley-nuclear-po...

Girls and boys can go there to learn about STEM things, including our friend the atom. Clearly that talk about how toxic nuclear power plants were going to be for future generations was idle speculation and fear-mongering.


> Clearly that talk about how toxic nuclear power plants were going to be for future generations was idle speculation and fear-mongering.

We are spending a lot of money decommissioning legacy nuclear power stations.


People are quick to forget that the human mind is a firewall in computing environments. A car that can be remotely turned off by satellite signal is not a car with which I would want to share the road. Process automation will probably create the Runaway Train effect in due time.


A lot of modern cars with OnStar have this.


> infections of critical infrastructure were surprisingly common, but that they were generally not dangerous unless the plant had been targeted specifically.

Stuxnet


It's an article from 2016. What's the point of sharing this?


2016


This article is from 2016.



