
Toshiba and WD NAND Production Hit by Power Outage: 6 Exabytes Lost - deafcalculus
https://www.anandtech.com/show/14596/toshiba-western-digital-nand-production-partially-halted-by-power-outage
======
pbhjpbhj
This doesn't make sense:

>Toshiba Memory and Western Digital on Friday disclosed that an unexpected
power outage in the Yokkaichi province in Japan on June 15 affected the
manufacturing facilities that are jointly operated. //

Surely that's not the reason, it would have to be "and local [backup] power
failed, and the failovers for that failed too"??

Toshiba manufacture generators too, it's not like they'd need to go far to get
backup power designed for them.

There must be more to this? (Which explains why people are assuming it's
suspicious, I guess; and this site is making 35% of global NAND output).

FWIW, I hadn't realised that it takes ~2 months to process a wafer into a
chip.

~~~
wyxuan
Yeah I was surprised. Don't they have a UPS for this kind of thing?

~~~
HankB99
Yes. My Google-fu is not up to finding them but I happen to be familiar with
this company's products and here is one. [https://www.energy-
xprt.com/products/purewave-ups-systems-55...](https://www.energy-
xprt.com/products/purewave-ups-systems-559832)

One application of this kind of product is chip fabs because they are so
sensitive to power disruptions.

Whether Toshiba/WD had this type of system and if so, why it didn't prevent
loss of product was not mentioned in the linked article. I have heard that
there is a glut in chips for SSDs so a reason to cut production can't be ruled
out. However it seems like Toshiba/WD would pay the price for this outage
while their competitors would reap the benefits (unless the competition agreed
to somehow share the cost.)

~~~
pault
I believe the memory industry is also known for price fixing, so it's not out
of the question. The oversupply is ridiculous though; the last time I shopped
for an nvme they were $1000 for 500gb and the other day I bought 1TB for $500.

~~~
nixgeek
Amazon was selling 2TB NVMe from Intel for under $200 just last week!

~~~
loeg
Yeah Intel 660p (QLC, but with some SLC cache and a decent controller) is
going for under $100 per TB. In my (limited) experience it makes a solid
non-24/7 workstation drive (e.g., I use it for local object storage for large
builds).

------
maheart
Here's an article from one month ago discussing the over-supply of NAND and
DRAM (and the effect it has on pricing):
[https://www.forbes.com/sites/tomcoughlin/2019/05/25/nand-
dra...](https://www.forbes.com/sites/tomcoughlin/2019/05/25/nand-dram-supply-
and-pricing)

I can't help but feel very skeptical about the timing of this event, given the
history of price-fixing in the industry.

~~~
GordonS
These kinds of issues do seem to hit with a suspicious degree of regularity -
it seems every 1-2 years there is a shortage due to some calamity or such...

~~~
wil421
Yea like that time a suspicious typhoon knocked out all the Hard Drive fabs in
Thailand.

~~~
sbr464
I was there at the time, was pretty bad.

~~~
iamnotacrook
"It would be a shame if we build labs in places with unavoidable weather
problems, wouldn't it?" "Yes, it would! I believe it's your turn to tee-off".

------
ksec
Important to quote from the comment section

>Five fabs and an R&D center, outage was after the batteries also ran out.

For perspective, the batteries at GF's leading fab can run 1/3 of the
systems for only a few minutes. That's the scale we're dealing with.

I think before we start on conspiracy theories, we need to look into
why there was an outage in Yokkaichi.

~~~
jsjohnst
Batteries (and giant multi-ton spinning wheels, which serve the same purpose)
are not a long term power supply. They are intended to only bridge the couple
minutes until generators can come online and provide stable power. So yes,
it’s expected that they drained, the question is why didn’t the generators
come online?

~~~
ksec
That is a good question. But if I had to guess:

Judging from the scale, the "generator" would have to be a power plant? I.e.,
it is not feasible to have generators operate at this scale?

~~~
londons_explore
Gas turbine power plants aren't _that_ expensive...

Lots of sites use them for backup power _and_ simultaneously use them as
regular power plants selling power back to the grid.

~~~
jsjohnst
I was estimating to the very high side to make a point. One can easily get
10MW in the 500k-1M USD range, but I did see some very elaborate setups
peaking out near $5M so went with that. Heck, I recently saw some on Alibaba
for $100k, but I highly doubt Toshiba/WD would buy from there.

------
taspeotis
I would be most grateful if someone could please explain what sort of tools
are likely to be used here, and why a power loss to those tools would ruin
days/weeks/months worth of output relative to the time they were offline?

~~~
furi
Semiconductor manufacturing involves a lot of precisely controlled processes.
You put the wafers into furnaces and pass a gas over them for X time and Y
flow rate at Z temperature to impregnate the wafer with various chemicals. You
put them in low pressure plasma environments to etch them, again for X time at
Y flow rate. There are half a dozen more of these as well, like applying metal
and implanting ions.

These values are experimentally tightened to get the highest possible accuracy
to the desired effect and improve the number of working chips that leave the
factory. If the power cuts out you don't know what conditions the wafer
experienced while the system was winding down completely uncontrolled and your
processes haven't been designed for the wafer going through the ramp up twice.

The reason why it's lost so much output is because modern semiconductor
processes have hundreds of steps and (I believe) a lead time in the months, so
the amount of material that's in flight at any one instant has to be huge to
get any reasonable throughput.
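
A rough sketch of that last point, using Little's law (average work in progress = throughput x cycle time). The wafer-start rate and the cycle times below are made-up illustrative numbers, not figures from Toshiba/WD or the article; the only thing the sketch is meant to show is that the share of a quarter's output sitting in the line depends on cycle time alone.

```python
def wafers_in_flight(starts_per_week: float, cycle_time_weeks: float) -> float:
    """Little's law: average work in progress = throughput * cycle time."""
    return starts_per_week * cycle_time_weeks


def share_of_quarter_in_flight(cycle_time_weeks: float,
                               weeks_per_quarter: float = 13.0) -> float:
    """In-flight wafers divided by one quarter's output (the start rate cancels)."""
    return cycle_time_weeks / weeks_per_quarter


# Hypothetical fab: 10,000 wafer starts per week (an assumption, not a real figure).
HYPOTHETICAL_STARTS_PER_WEEK = 10_000

for cycle_weeks in (4, 8):
    wip = wafers_in_flight(HYPOTHETICAL_STARTS_PER_WEEK, cycle_weeks)
    share = share_of_quarter_in_flight(cycle_weeks)
    print(f"cycle time {cycle_weeks} weeks: {wip:,.0f} wafers in flight, "
          f"{share:.0%} of a quarter's output")
```

Under these assumptions, a cycle time of six to seven weeks already puts about half a quarter's output in the line at any instant, regardless of how big the fab is.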

~~~
agumonkey
So they realized late that an early step of the pipeline was off, rendering
everything that has been through flawed ?

~~~
CydeWeys
No, they experienced a power outage which has ruined every chip that was
currently at any point in the pipeline. The realization would be immediate
(as soon as the power goes out).

~~~
shaklee3
It said that it was 1/2 their production output for one quarter. Is that
really possible for an instantaneous power outage?

~~~
msbarnett
If at that instant they have half their quarter’s product in various stages of
processing (and there are hundreds of such stages, in hundreds of parallel
pipelines) when all of those stages shit the bed, obviously yes.

~~~
shaklee3
That's the point. The only way that's possible is if things sit in a process
that takes many months.

~~~
londons_explore
_or_ that the restart procedure after a power failure is going to take a few
weeks...

I can imagine that if the factory is entirely automated and a full restart has
never been attempted, every single machine will probably be in some bad state
with unknown chemicals settled into unknown pipes in the machine, requiring
custom flush processes to be designed, and in some cases machines might have
to be replaced, which in a human-free clean room isn't easy...

------
baybal2
I recall the story of Micron's first Chinese fab: 1 millisecond out phase
brownout and they loose few megabucks instantly, and like that during every
electrical event.

Giant UPSes are not an option in the industry because fabs eat oodles of
electricity, and it is cheaper to loose a megabuck once a year than build a
stabilisation/ups plant

~~~
vpribish
my friend - it's lose, not loose.

------
gruez
So... does that mean they'll be hiking NAND prices, just like with HDD prices
after the Thai floods?

~~~
thesimp
Looking at the numbers it should not move that much. According to this
article,
[https://www.businesswire.com/news/home/20190307005812/en/TRE...](https://www.businesswire.com/news/home/20190307005812/en/TRENDFOCUS-
Combined-SSD-HDD-Storage-Shipped-Jumps), in 2018 912 exabytes of HD and SSD
storage was sold. 800 exabyte for HD and 112 exabyte for SSD. And the SSD
market grew 45% in 2018. If manufacturers project to grow at the same rate
then 2019 SSD shipments will be around 162 exabyte. This puts the 6 exabyte
loss at around 3.5%.

But we all know that markets are driven by emotion: losing 3.5% of your raw
materials in a market that is projected to grow 45% will cause big
fluctuations. But that is just my opinion.
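
The arithmetic above as a small sketch. The inputs are the figures quoted in this comment (112 EB of SSD in 2018, 45% growth, 6 EB lost); the assumption that 2019 grows at the same rate is the comment's own, and the function names are just for illustration.

```python
def projected_shipments(base_eb: float, growth_rate: float) -> float:
    """Next year's shipments if the same growth rate repeats (an assumption)."""
    return base_eb * (1 + growth_rate)


def loss_share(lost_eb: float, total_eb: float) -> float:
    """Lost capacity as a fraction of total shipments."""
    return lost_eb / total_eb


ssd_2018_eb = 112.0   # exabytes of SSD shipped in 2018, as quoted above
growth_2018 = 0.45    # 2018 SSD growth rate, as quoted above
lost_eb = 6.0         # loss figure from the headline

ssd_2019_eb = projected_shipments(ssd_2018_eb, growth_2018)
print(f"projected 2019 SSD shipments: {ssd_2019_eb:.1f} EB")
print(f"share of the year lost: {loss_share(lost_eb, ssd_2019_eb):.1%}")
```

With these inputs the share works out to roughly 3.7% of the projected year, in the same ballpark as the "around 3.5%" above.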

~~~
antpls
I believe you forgot Toshiba, which could be 9 exabytes lost according to the
article for this quarter, so we are talking about 15 exabytes lost.

According to your data, the total quarterly production is 41 exabytes for SSD,
which would mean losing about 37% of the total SSD production this quarter.

That being said, it is the first time I have read about the scale of storage
production worldwide. It makes you wonder what humanity stores in
those hundreds of exabytes _per year_. Probably a lot of duplicated data or
unused bytes.
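
A quick sketch of the per-quarter estimate, assuming production is spread evenly over four quarters (a simplification) and using the 162 EB annual projection from the parent comment plus the 6 EB and 9 EB loss figures mentioned here.

```python
def quarterly_share_lost(annual_eb: float, lost_eb: float, quarters: int = 4):
    """Return (production per quarter, fraction of one quarter that was lost)."""
    quarterly_eb = annual_eb / quarters
    return quarterly_eb, lost_eb / quarterly_eb


annual_ssd_eb = 162.0      # projected 2019 SSD shipments from the parent comment
combined_loss_eb = 6.0 + 9.0   # WD's 6 EB plus the estimated 9 EB for Toshiba

per_quarter, share = quarterly_share_lost(annual_ssd_eb, combined_loss_eb)
print(f"per quarter: {per_quarter:.1f} EB, share lost: {share:.0%}")
```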

~~~
simcop2387
These days a large portion of it is just basically logs. Logs of all the
traffic we're generating looking at content on the internet, to be used to try
to target ads. That and videos, youtube itself probably accounts for a
significant amount of that storage use.

~~~
owl57
Are the logs of this scale and less-popular videos usually stored on SSD? I
thought HDDs are still cheaper and RAID gives enough throughput given enough
disks?

~~~
londons_explore
HDD's access time (10 milliseconds or more), means a hard disk can't really
serve more than 100 concurrent users, assuming each wants to stream a chunk of
video every second.

That makes it a poor choice for serving anything but the rarest of YouTube
videos.

~~~
owl57
I believe a typical video, stored in all of the Youtube formats, uses on the
order of a megabyte per second. So, a 10TB disk probably holds about 100 days
of video. Seems fine for videos that are watched less than once a day, that's
probably a vast majority of Youtube's storage.
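
The back-of-the-envelope version, assuming decimal units (10 TB = 10^13 bytes) and a flat 1 MB per second of video across all stored formats. Both numbers are the rough guesses from this comment, not measured values.

```python
SECONDS_PER_DAY = 86_400


def days_of_video(disk_bytes: int, bytes_per_second: int) -> float:
    """How many days of continuous video fit on a disk at a given storage rate."""
    return disk_bytes / bytes_per_second / SECONDS_PER_DAY


disk_bytes = 10 * 10**12      # 10 TB disk, decimal terabytes
video_rate = 10**6            # ~1 MB stored per second of video (rough guess)

print(f"{days_of_video(disk_bytes, video_rate):.0f} days of video per disk")
```

That lands a bit above 100 days per disk, so "about 100 days" is the right order of magnitude.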

------
agumonkey
First time I have to really think about Exa<unit>.

    
    
         Giga / Tera / Peta / Exa
    

6 million terabytes of solid state memory.. quite a mass.

------
otakucode
I know that NAND involves no exotic raw materials, so does that enable them to
recycle any of the damaged/lost wafers? I don't know very much about the
physical processing/preparation of the raw silicon and such that goes into
making a wafer, could you simply grind up or perhaps chemically dissolve
everything back to base components and re-create a fresh wafer?

~~~
baybal2
At least some scrap is now being bought by the solar cell industry, but that
material is forever lost for IC making because it's already contaminated with
dopants and metals

------
icefo
I wonder what failed in their redundant power supply because they surely have
something.

I hope the postmortem will be public!

~~~
igravious
If my reading comprehension has not let me down then a 13 minute power
disruption can cause them to lose 1/2 of their output for a quarter.

Given the massive consequences of quite a short disruption maybe they need to
figure out how to weather disruptions more robustly?

~~~
dgacmu
Cycle times (the time it takes to process one wafer) can be in the range of a
month. Any disruption therefore kills roughly a month (plus or minus) of
output, at least for wafers in certain steps. It's brutal.

Fabs are engineered to have redundant power, but what's interesting is that
the same thing happened to Samsung last year:
[https://www.anandtech.com/show/12535/power-outage-at-
samsung...](https://www.anandtech.com/show/12535/power-outage-at-samsungs-fab-
destroys-3-percent-of-global-nand-flash-output)

~~~
igravious
Interesting!

That's my point though. If power outages hurt these fabs so severely why
aren't their power supply systems more robust?

I know it's easy for me to say but I'm having a hard time wrapping my head
around it.

Say in another engineering space where an hour of power outage means roughly
an hour of downtime then you'd maybe not care so much.

But if, as you link here, a 30 minute power outage can "destroy 3.5% of the
global NAND supply for March" wouldn't they make sure they have 0 minutes of
power outage – heck, that's nearly national security levels of threat –
wouldn't the South Korean government install two (or three) sets of power
lines from different parts of the grid. Or a local power source (diesel
generators and a small coal power plant.) Expensive? Sure. But so is 3.5% of
global NAND supply for a month?

~~~
dgacmu
Power is honestly really hard -- for example, if you read through this list of
datacenter power failure post-mortems:

[http://up2v.nl/2017/06/02/datacenter-complete-power-
failures...](http://up2v.nl/2017/06/02/datacenter-complete-power-failures/)

there are a lot of individual failure cases. DC operators learn from each
failure, but there are a lot of ways things can go wrong. Fabs can be upwards
of 50MW, which puts them in the range of a good-sized datacenter, so the
challenges probably end up similar. (I'm saying the last part carefully - I'm
much more familiar with datacenter power design than fab power design!)

------
YayamiOmate
It seems weird that a 13-minute outage can kill a month and a half of
production.

I wonder if this is standard hi-tech factory process reliability.

~~~
jotm
I guess it makes more sense to destroy everything affected by a power loss (
_even if some of it could be perfectly fine, or salvageable_ ) than risk
shipping products that will fail at a higher rate. That would cost way more in
lost trust and lost sales.

~~~
meruru
They could sell under a special brand.

~~~
jotm
Oh I just remembered! There used to be noname parts - motherboards, PCI cards,
RAM. Literally no brand markings, no warranty either.

I remember the RAM in particular - the chips had nothing etched on them, or a
single 5char line. Even the firmware was unbranded, with strange timings, too
(probably loosened because they would not work at standard specs).

I'm guessing it was Chinese companies buying up "bad" or excess stock and
reselling it.

Haven't seen this in a while, either they tightened up regulations or the
margins are too low to make a profit nowadays.

------
social_quotient
Conspiracy: this is how the NSA buys their disk space.

~~~
glbrew
That is an interesting idea but it would be so much easier for them to
constantly buy, say 5%, of the output.

~~~
social_quotient
Well I’m not typically a real conspiracy guy but if I had to really think on
this I’d say it’s easier to have a loss event like this and consequent
writeoff vs some unexplainable long term 5% buyer.

I also don’t think if I needed tons of storage like this I’d want to acquire
it over a 20+ month period.

Obviously I think I’m kidding but the thought is interesting.

~~~
glbrew
NSA knows they need the storage, they didn't just figure out they needed to
store a lot of data to perform a massive buy. It would be much easier for the
manufacturer to disguise 5% of constant purchases through various techniques,
they could hide it in all sorts of ways. Also the NSA could set up shell
companies to consistently buy output (frequently done during the cold war when
USA needed to buy supplies from Soviet aligned nations, USSR itself). The NSA
could hide behind major buyers like Google and Amazon. If the plant shut down
for false reasons literally thousands of plant workers would know it was
false. That wouldn't accomplish anything.

------
ggm
The fragility of the supply chain.

------
Rickvst
The stock of these companies really took a hit. Sarcasm.

------
loudouncodes
Every time I see Godzilla he’s ensnared in and tearing down power lines. This
was bound to happen sooner or later.

