I used to work in manufacturing test for an SSD supplier. This would normally be covered by an “ongoing reliability test” in quality. But I also witnessed that quality can be a highly politicized arm of manufacturing companies, and finding issues with products is not always well received, while approving products is always well received.
In many consumer products, tests like that are often not implemented, or are curtailed compared to OEM products. When you buy from a company like Dell or Apple, you get the benefit of having a large organization providing accountability. In other words, Dell's interest in receiving quality products to uphold its reputation is shared with the end consumer, but Dell carries a lot more weight because it represents large contracts with the supplier. Suppliers tend to put more effort into testing their OEM products so as not to damage those business relationships.
Anyway, this kind of thing happens all the time in consumer storage. Likely nobody was doing reliability testing on these drives in the first place since that costs money and can only expose problems they didn’t really want to know about.
In a perfect world this would be true, especially at the large business level where the integrator will get their ass sued by the customer or at least be forced to make good on the situation.
In the retail and small/medium business market the reality is that Dell, HP, and the like are under so much pressure to cut margins that they'll go with whoever is cheapest, and customers almost never escalate things to tort.
Dell PC power supplies are made for them by someone else, proprietary in size and connector, and gosh, wouldn't you know it - they have a pretty high failure rate. They last just long enough to make it out of the warranty period, and then they make for a really nice revenue stream for Dell via replacement PSUs or pushing the customer to buy a new system entirely.
Even failure within the warranty period is acceptable in the consumer market because integrators have it down to a science exhausting people on the customer support side. Long phone queue times, incompetent support agents who have to transfer you to different agents and likely drop the call entirely, silly policies like requiring a reformat/OS reinstall for everything, and so on.
>Even failure within the warranty period is acceptable in the consumer market because integrators have it down to a science exhausting people on the customer support side. Long phone queue times, incompetent support agents who have to transfer you to different agents and likely drop the call entirely, silly policies like requiring a reformat/OS reinstall for everything, and so on.
This is one reason why I believe Apple computers last much longer than Windows computers. With Apple, they only sell a few models in high volume. So if there's an issue, everyone will know about it and Apple will often have to do a mass recall or provide free repairs. And since Apple prices are higher, you'd assume that they use better-grade parts on average.
> So if there's an issue, everyone will know about it and Apple will often have to do a mass recall or provide free repairs.
I wouldn't say Apple is any better than anyone else - aging iPhone batteries and butterfly keyboards both ended in class action lawsuit settlements; it wasn't out of good PR that these got addressed. I suppose you are right that everyone will know about them, though, given that I could name those from memory.
>I wouldn't say Apple is any better than anyone else - aging iPhone batteries and butterfly keyboards both ended in class action lawsuit settlements; it wasn't out of good PR that these got addressed. I suppose you are right that everyone will know about them, though, given that I could name those from memory.
That's the point. If 5% of PSUs failed inside a Dell computer just outside of warranty, no one would care except those affected. If the same thing happened on a Mac, you'd get a media storm and a class-action lawsuit and Apple will eventually settle by giving out repairs - even if the failure happened outside of warranty.
I did get a free battery replacement for my iPhone 6S.
Buying from an OEM certainly doesn’t come with any guarantees. It’s a price/quality contract in almost all cases though. The OEM defines an acceptable defectivity rate in their contract (even if the allowed DPM is high). This effectively establishes a requirement at the supplier to ensure they will meet it.
For consumer products, you can assume that this added requirement doesn’t exist.
Edit: as another example, it’s well known among hardware suppliers that being a supplier to Apple can be a double-edged sword for this reason. They have very high quality expectations and they squeeze extremely hard on price. But for that, they bring high volumes. If your company doesn’t have its stuff together, it can easily get raked over the coals in an Apple contract.
> ...Dell, HP, and the like are under so much pressure to cut margins that they'll go with whoever is cheapest, and customers almost never escalate things to tort.
Can confirm. I have an office-supplied HP business desktop. One day I noticed that my system was slower than normal. After 5 minutes with smartctl, I found out that the SSD was constantly throttling down the SATA link (SATA downshift), was not reading or writing more than ~250 MB/s, and had some wonky latency issues.
I got a new SSD, moved the data over with dd, and all my problems were solved. The previous drive was a Samsung, but it was a "value" drive which even Google knew nothing about. It was probably built with bottom-of-the-barrel parts, and something went bad earlier than expected.
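If you want to run the same kind of check yourself, here is a minimal sketch, assuming a Linux box with smartmontools installed and a SATA drive at /dev/sda; SMART attribute names vary by vendor, so the matching below is only illustrative:

    # Run smartctl and flag attributes that often hint at SATA-link or media
    # trouble. Attribute names differ between vendors, so this list is a
    # starting point, not a definitive health check.
    import subprocess

    SUSPECT = ("Downshift", "CRC_Error", "Reallocated", "Pending", "Uncorrectable")

    def check(device="/dev/sda"):
        out = subprocess.run(["smartctl", "-A", device],
                             capture_output=True, text=True, check=False).stdout
        for line in out.splitlines():
            fields = line.split()
            # SMART attribute table rows start with a numeric attribute ID.
            if fields and fields[0].isdigit():
                name, raw_value = fields[1], fields[-1]
                if any(key.lower() in name.lower() for key in SUSPECT):
                    print(f"{name}: raw value {raw_value}")

    if __name__ == "__main__":
        check()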
> Even failure within the warranty period is acceptable in the consumer market because integrators have it down to a science exhausting people on the customer support side.
This has been true for decades. It's why I completely ignore all warranties when I'm deciding what to purchase -- they tend to be essentially worthless once you factor in the cost of trying to make a warranty claim.
Except that a non-overclockable CPU isn't a lower quality one. In fact, they may be sold cheaper to the OEM because they are less likely to be overclockable.
Generally, it did mean this, if we are to believe that Intel largely made the same CPU, "binned" processors into different speed grades based on what they were stable at, and locked the multipliers to speeds they would be reliable at (lower quality = lower multiplier). But one could still set the bus speed to whatever they liked, and the retail boxed chips handled this better.
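As a back-of-the-envelope illustration of how a locked multiplier and an adjustable bus interacted in that era (the 4x multiplier and bus speeds below are illustrative, not a claim about any particular SKU):

    # Core clock = front-side bus x multiplier. The multiplier was locked at
    # the factory; the bus speed was set on the motherboard, which is what
    # made bus overclocking possible.
    locked_multiplier = 4.0  # e.g. a part sold as 266 MHz on a ~66 MHz bus
    for fsb_mhz in (66.6, 75.0, 83.3, 100.0):
        core = fsb_mhz * locked_multiplier
        print(f"FSB {fsb_mhz:5.1f} MHz x {locked_multiplier} = {core:.0f} MHz core")
    # A chip locked at 4x still ends up at 400 MHz if the board runs a
    # 100 MHz bus and the silicon can take it.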
There would also be a market demand factor to it. If they got a large order for 266 MHz chips, they'd lock them at the multiplier for that speed, even if they could handle 300 or 333 MHz.
(Part of the rumour for some Celeron chips was that they were the same die but with only a fraction of the cache enabled, so "Pentium" chips produced with a cache defect could have that section locked out and be labelled a Celeron.)
Nowadays, CPUs can often throttle themselves, so this kind of binning isn't as necessary to mitigate batch-to-batch variation.
Not a rumor. Starting with Coppermine, Celerons were binned P3s. Same die size and all.
Interestingly, AMD did not typically do the same for the Duron (with one or two exceptions). My understanding at the time was their dies had extra cache to handle defects without full binning.
Hard to tell from appearance only, but my initial impression is that's an inductor, not a capacitor. The circuit looks like a switching power regulator. The capacitors would be beige with silver ends, while this one looks like an overmolded inductor, similar to [1], and is used as the main power inductor in a buck regulator.
If this is an inductor, my gut reaction is it has an insufficient current rating for the application and it is overheating. Inductors have a bunch of loss mechanisms that contribute to heating. Depending on the type of metal used to build the core, it can 'hard saturate' and effectively walk itself off a cliff once the current draw gets too high. At some point, it gets hot enough to desolder itself from the circuit board. It's possible they did not see this in validation because the power draw of SSDs depends heavily on the workload and process variations in the chips; erase current can have a fairly wide variation.
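To get a rough feel for the numbers, here is a crude copper-loss estimate; every value is an illustrative assumption for a small molded power inductor, not a measurement of the part in the picture:

    # Rough conduction-loss estimate for a small molded inductor in a buck
    # converter. Core/AC losses are ignored, and they get much worse once the
    # core starts to saturate, which is the runaway described above.
    dcr_ohm = 0.050        # assumed DC resistance of the winding
    i_rms_a = 2.0          # assumed RMS inductor current under heavy writes
    theta_c_per_w = 60.0   # assumed thermal resistance to ambient, C/W

    p_copper_w = i_rms_a**2 * dcr_ohm       # I^2 * R conduction loss
    delta_t_c = p_copper_w * theta_c_per_w  # steady-state temperature rise

    print(f"copper loss ~ {p_copper_w*1000:.0f} mW")
    print(f"temp rise   ~ {delta_t_c:.0f} C above ambient")
    # 2 A through 50 mOhm is ~200 mW and ~12 C of rise; if saturation pushes
    # ripple current up, losses climb steeply and the part can run away thermally.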
fwiw, voiding of solder joints is a problem. The solder is applied as a paste - fine particles of metal solder suspended in solder flux. During reflow the flux evaporates and leaves the metal behind, but if the process isn't tuned right bubbles of gas can be trapped in the joint. This can lead to reliability problems. It can also increase the effective thermal resistance to the circuit board, which for tiny components like this can often be the primary path for heat removal during normal operation.
> the problem lies in hardware, not firmware, which could explain the lack of corrective firmware updates for those models and SanDisk's continued silence about the source of the issues.
But I'd guess a firmware update that slowed down the erase process could let it cool down. It would come with a performance hit, though.
Are they not using charge pumps? Are these some of the first SSDs upgraded to on-board inductor-based boost converters?
These messes could be solved if system power supplies had a 20V rail instead of requiring tiny devices to make it. Maybe an integrated manufacturer (hi apple) will spec out proprietary SSDs like this one day.
Charge pumps are cheap and small, but not as efficient (ie: HEAT!):
> By using the boost converter with the optimized inductor, the energy during write-operation of the proposed 1.8-V 3D-SSD is decreased by 68% compared with the conventional 3.3-V 3D-SSD with the charge pump.
> One of the main causes is the on-die charge pump circuit, which has a low conversion efficiency and induces high heat generation.
> Using the in package boost converter, we show that the power consumption can be reduced by up to 39% while the temperature rise can be reduced by 50%.
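To make the heat difference concrete, here is a toy comparison of how much power the converter itself dissipates for a given delivered load; the efficiency figures are assumptions for illustration, not numbers taken from the papers quoted above:

    # Heat dumped by the regulator itself for a given delivered power,
    # at two assumed efficiencies.
    def dissipated_w(p_out_w, efficiency):
        return p_out_w * (1.0 / efficiency - 1.0)

    p_out_w = 1.0  # power actually delivered to the NAND high-voltage rail
    for name, eff in (("on-die charge pump", 0.30), ("inductive boost converter", 0.90)):
        print(f"{name:26s} eff {eff:.0%}: ~{dissipated_w(p_out_w, eff):.2f} W of heat")
    # The inefficient converter turns most of its input power into heat right
    # on the die, which is exactly where you least want it.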
> These messes could be solved if system power supplies had a 20V rail instead of requiring tiny devices to make it. Maybe an integrated manufacturer (hi apple) will spec out proprietary SSDs like this one day.
Then you'll get people (like me) who will deride Apple for requiring a proprietary component where COTS components are available, calling it an anti-consumer move.
Oh, if it were also smaller and lighter, we'd be in heaven. If it weren't for the proprietary devil lurking in the corner, showing us a fake heaven while keeping us in chains, sucking the life out of our dreams.
I am an electronics / PCB hobbyist and I can definitely see how their explanation could be true. I can't say it is, but I can see how it could be.
If you design a PCB for a given resistor size but then decide to use larger resistors without redesigning the pads, you may have reflow problems and weak joints. This is because components are positioned by surface tension during the reflow process (they are pulled into place as the solder melts). If the pads are sized for smaller components, there will be too little solder for the larger surface area and weight of the part, and it will pull at the wrong angle, potentially causing a higher rate of failure.
> What's more is that the component pictured is a capacitor.
And that means what? From the picture I can tell that there is very little solder between the component and the pad. Potentially too little to hold the component firmly in place.
> The only conclusion I can draw here is that the guy has no clue what he's talking about
Maybe he does, maybe he doesn't. Have you considered a possibility you are not an expert either?
As someone who designs circuit boards professionally, the explanation is clearly lacking. There might be a thermal issue or there might not be. There is nothing conclusive in the pictures either way. What I do see is the following:
1. Underfill (the brownish-tan smooth material surrounding the components towards the bottom of the picture) around the IC, which is typically done to make parts more mechanically robust.
2. No evidence of overheating on any of the thermal interface material that is left stuck to most of the components and no evidence of overheating on the PCB or the components themselves.
3. Completely insufficient evidence to declare a soldering issue. The way to prove this one way or another is x-ray inspection to look for voids in the solder or a mechanical cross-section of the suspect solder joints.
While this certainly could actually be the problem, I see insufficient evidence to conclude one way or another. Manufacturers don’t put underfill under a part unless it’s required through testing or experience with similar package types in prior designs since it adds cost, additional process steps and makes it a PITA or impossible to rework any bad components in the area.
As to the pad size/shape, there are three general classes of design defined by the IPC (standards body that deals with PCBs and PCB assemblies). Depending on how space constrained your design is, there are different recommended pad designs for passive components like these. They might be using one of the tighter spacing guidelines, but if their process is well controlled, it can be perfectly fine for the design life of the product.
If you want to see small pad layouts done well, look at an iPhone logic board.
If you want to know more about pad design for SMT parts, search for IPC-7352
My totally unsubstantiated guess from the description alone was 'I wonder if they switched to a larger package component and forgot to update the pads.' That could be described as the 'component being too large for the device' and while it might just fit, it may be borderline mechanically and electrically stable. That could also explain the added underfill. Is that possible?
It’s certainly possible someone did a BOM substitution and didn’t do due diligence on it, but I doubt it. PCB assembly houses tend to notice components that are suddenly too big for their pads, because they’ll have fallout in AOI or later testing.
The underfill was likely added before full production as the result of reliability tests that showed some mechanical susceptibility of that IC.
Does seem a bit strange, but the original article[1] in German, translated using Google Translate, reads as follows:
> “It's definitely a hardware problem. It is a design and construction weakness. The entire soldering process of the SSD is a problem,” says Häfele. A hard drive has components that need to be soldered to the circuit board. “The soldering material used, i.e. the solder, creates bubbles and therefore breaks more easily.”
> “In addition, the components used are far too large for the layout intended on the board,” says Häfele, explaining the technical problems: “As a result, the components are a little higher than the board and the contact with the intended pads is weaker. All it takes is a little something for solder joints to suddenly break.”
It sounds like what they're saying is that the solder pads are too small for some of the components. Not sure about what they're saying about the solder though.
> Not sure about what they're saying about the solder though.
There's more than one solder alloy in use. There's more than one class of solder alloy in use. Some are easier to use, some are harder to use. Some are high-performance, low-tolerance, some are low-performance, high-tolerance. Some are expensive, some are cheap.
The most troublesome family is SnBi. These alloys are relatively new. They have a big "greenwashing" problem in that they solder at lower temperatures, which is "environmentally friendly" (and cheaper to run). The base metal is also dirt cheap. (Wonder why manufacturers are interested?) The alloy is also very, very brittle, and because it melts at a low temperature, it's much easier to get hot enough to desolder during operation. Lots of trouble all around and, in general, a very high field failure rate. Not recommended... oh wait, but it's cheap and greenwashable. Sigh.
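For reference, the rough melting points of the common solder families (standard datasheet figures; the hot-spot temperature is an assumed illustrative number, not a measurement of any particular drive):

    # Melting points of common solder families versus an assumed hot spot on
    # a working drive.
    alloys_c = {
        "Sn63/Pb37 (leaded eutectic)": 183,
        "SAC305 (lead-free SnAgCu)":   217,
        "Sn42/Bi58 (low-temp SnBi)":   138,
    }
    hot_spot_c = 105  # assumed worst-case local temperature near the controller

    for alloy, melt_c in alloys_c.items():
        print(f"{alloy:30s} melts ~{melt_c} C, margin {melt_c - hot_spot_c:>4} C")
    # SnBi never quite reaches its melting point here, but solder creeps and
    # weakens well below it, so the usable margin is thinner than it looks.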
Are there places that use SnBi for production devices? I know Bismuth alloys are used to desolder stuff (and they work amazingly well for that), but the general rule is that you should clean it up before soldering something new. (And keep it for later use, because it isn't exactly cheap.)
Also Bismuth appears to be rare: https://en.wikipedia.org/wiki/Abundances_of_the_elements_(da... Rarer than palladium. All the even rarer elements are generally known to be rare and/or precious, or radioactive elements that normal people would never come across.
I won't ever forget the widespread BGA failures caused by the RoHS-forced switch to lead-free solder. No doubt massive amounts of additional ewaste were created, but at least it's "environmentally friendly" ewaste?
Military/aerospace are still exempt and continue to use leaded solder.
If you are talking about Nvidia's flip-chip problems, those were actually caused by the glue holding the chip onto the substrate, not the solder. The glue expanded at a different rate from the solder balls and caused them to crack.
This was especially the case on consoles. People kept reballing and doing other nominally useless repairs that solved the problem by accident, by remelting the solder balls between the substrate and the silicon chip. Some even managed to remelt the solder balls simply by replacing capacitors, which then made everyone think the capacitors were the problem, and everyone swallowed it because replacement capacitors were cheap.
I don't want to pick a fight, but here's my rando opinion on that:
Almost all electronic devices end up as e-waste after a few years. If a couple % fail prematurely, that doesn't create a massive amount of additional e-waste, but rather a _very_ slight increase in e-waste. And it's relatively benign e-waste. You could shred the board and sprinkle it over your field and it wouldn't be a huge problem (* don't take my word on this; there's flux residue and somewhat toxic stuff used in other components, the plastics will probably leak BPA and other stuff, etc.)
> It sounds like what they're saying is that the solder pads are too small for some of the components
The converse is also possible. Instead of being a design flaw with the pads too small for the component, it could be that a larger component was substituted during manufacturing. Even terrible freeware EDA packages have design rules that will flag improper solder pad layouts, so it seems like what might have happened is the physical part does not resemble its model.
> Even terrible freeware EDA packages have design rules that will flag improper solder pad layouts
No, they don't. EDA software doesn't really know what size the terminations are. It knows how big the pad itself is, and is very good at keeping those out of trouble, but it doesn't know what size the solderable area is. You might tell it, or give it a 3D model, but make a mistake there and you're right back here. As well, there are so many different kinds of terminations (pop quiz: what kind are these?) that even if it does know what size they are, it doesn't necessarily know what size or shape the pad should be.
Also the CM will totally edit this stuff and not tell you. Which they're not supposed to do, and are probably better at if you're a huge customer, but they still do it. EDA sure doesn't know about that.
If the correct amount of pad is not exposed at the edge of the part, the solder will have nowhere to form a fillet which is critical to its physical attachment. Solder is not glue, and even with more pad contact beneath this is a physically weaker connection which often results in tombstones like pictured in TFA.
If you read the integration documents for these packages, you'll see that they distinctly specify the requirements for these margins. Probably the length is the more important axis and may be what he was referring to when saying "large". I've seen this be a problem particularly during the "chip shortage" where jellybean parts like these capacitors have the weakest specs in a design, meaning unilateral substitutions can happen at many points in the design/mfg pipeline.
Indeed brittle solder is a real phenomenon which is often easily visible in hand soldered joints that we call "cold" joints. Formation of bubbles can happen for a number of reasons, but IME it's the result of low quality solder or flux/cleaning. The organic compounds gasify in the heat and form an internal structure similar to bread.
ETA: an interesting paper exploring the cause and minimization of voiding in the reflow process. Particularly, the decrease in thermal conductivity in voided solder can critically contribute to its failure in high-heat operational environments.
> Larger components will have more surface area at the joint and should be stronger than a smaller component
Larger components are also, well, larger, and have much bigger forces on them. For ceramic capacitors you need to avoid shearing and torquing as the body of the capacitor is very brittle and a small crack means a dead part, possibly dead short. Big ceramics are dangerous to use as they have a high failure rate. I personally won't use anything larger than a 1210. Some of my colleagues think I'm nuts and should stop at 0805, but I think the flexible terminations available these days make 1210 viable. At least in medium volumes, I don't ship SSDs!
> I can't for the life of me figure out how they came to such a weird conclusion
What I see when I look at this is they have a part with a 5-sided termination (typical MLCC capacitor with metallized cap) but they have a footprint that only gets fillets on 1 of those 5 sides (typical would be 3). This is common for resistors... but resistors (a) have only 3-sided terminations anyway and (b) are made of robust alumina bodies, not fragile ceramics. So someone either got dumb with the footprint library or more likely overly aggressive to pack things in, not appreciating what MLCCs really need to be happy. I don't think it's part size changes, because the fillets along the length dimension that are visible look about right in size.
This is something that is in my area of expertise, and your suspicions are correct.
Solder can "bubble" but this is a line process issue that is easily picked up even in old AOI systems (automatic optical inspections) from 10-15 years ago.
To be frank, this article reads to me like a piece put together by somebody who has no idea what they're on about, in order to generate publicity for their company. Nothing to see here.
The most charitable way I can read their statement is that the resistors are too large for the pad, and along with poor solder material it forms a weak joint which breaks over time.
I have a hard time accepting that, because there is not a lot of heat on that line, nor is there a lot of physical stress (like constant vibration) on SSDs.
These SSDs are tiny. The controllers can easily get up to 80C during sustained writes, so there could be mechanical stress from thermal cycling. (Source: we also make small USB-interfaced high-speed storage devices and do a range of reliability testing for stuff like this)
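For what it's worth, the thermal-cycling stress is easy to put rough numbers on. A minimal sketch, with CTE and size values taken from typical datasheet ranges rather than measured on this board:

    # Differential expansion between an MLCC and the PCB over one thermal
    # cycle; the resulting strain is taken up by the solder joints.
    cte_pcb_ppm  = 16.0  # FR-4 in-plane CTE, ppm/C (assumed)
    cte_mlcc_ppm = 10.0  # ceramic capacitor body CTE, ppm/C (assumed)
    part_len_mm  = 3.2   # 1210-size body length
    delta_t_c    = 60.0  # e.g. 20 C idle to 80 C under sustained writes

    mismatch_um = (cte_pcb_ppm - cte_mlcc_ppm) * 1e-6 * delta_t_c * part_len_mm * 1000
    print(f"~{mismatch_um:.1f} um of differential expansion per cycle")
    # Around a micron per cycle sounds tiny, but repeated over thousands of
    # cycles it is exactly how a marginal joint fatigues and cracks.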
It looks to me like some glued-on covering has been removed here, which in turn could have pulled the components off (they could still be weak solder joints) rather than it being a manufacturing problem. The components don't look too big for the pads to me.
Most modern manufacturing lines have manual and automatic (vision system) inspections that would detect badly soldered or tombstoned components like the ones shown here.
But there was something in the article about epoxy, so potentially the components are glued down with a conductive epoxy instead of being actually soldered. Why would you do this? I don't know. But it would explain why the solder is losing the plot.
"Too big" could mean the pads on the circuit board were made for a smaller component, and now with the larger one, there's less overlap and direct contact from the pads on the board and the contacts on the component.
I stopped buying anything WD in the early 2010s, but then they went on to acquire everyone else, like HGST, meaning even decent Hitachi disks would now be tainted to become typical WD garbage. I still won't buy anything WD, but alternatives are hardly attractive with the market limited to like 3-4 players.
Good old monopolies in effect, your options are bad or worse.
If the Backblaze yearly disk stats and my personal experience in our datacenter are anything to go by, WD has generally been the more reliable disk brand for the last decade or so.
I remember an era where Seagate Constellation (enterprise disks) were so bad, I was replacing them a dozen per week.
Also, from my experience, SanDisk didn't get tainted by the WD acquisition. Their Extreme Pro SDs are still as reliable as before, and their portable SSDs hit the speeds and reliability they advertise.
Every manufacturer makes a design error about once a decade. Seagate did it, Maxtor did it, WD did it before (their drives were very finicky); however, all the big producers are in good shape now, in my experience. I can equally trust a Seagate IronWolf Pro or its WD equivalent, or a Samsung SSD and its SanDisk equivalent.
Problems happen, PCBs got revised, things got recalled. Everything is new, but nothing has changed.
It's funny you say that. I always thought WD were the more reliable brand, and Seagate were trash.
I wonder if it's just a case of each of us having one HDD of a particular brand fail on us violently, and then finding others who were in the same boat.
Pronounce this in German: "Seagate oder Seagate nicht" ("sie geht oder sie geht nicht"). Meaning "she works or she does not work", it's a German word play on the early failure rates of Seagate drives.
Coined back when, if you didn't have your Seagate drives in a RAID, you were more likely to lose your data than not ;)
And yeah I started buying WD at that point. Backblaze stats weren't a thing back then tho.
> I wonder if it's just a case of each of us having one HDD of a particular brand fail on us violently, and then finding others who were in the same boat.
That is absolutely the case and anyone with enough experience could confirm it. Both WD and Seagate have made some real trash drives, and both made at least one or two models that were trash at scale. If you timed it just right you could jump from one to another and experience massive failures with both! You also probably have a drive from each that's been running for 20 years somehow.
And don't forget Hynix. They got into the B2C business somewhat recently, and while they command a premium, the SSDs I use from them, both OEM and retail, have been very solid.
I wondered if he was confusing the drama that happened with Seagate buying up Maxtor. A lot of people were upset when that happened because they trusted Seagate a lot more than Maxtor or Western Digital and suddenly the same shitty Maxtor drives many went out of their way to avoid were being sold under the Seagate name leaving people stuck with either buying WD or buying Seagate and probably getting Maxtor anyway. Seagate's quality and reputation took a huge hit.
For external drives, I would seriously consider using SSDs. Unless you use them exclusively as cold backups and handle them carefully and seldom, I would be far too worried about accidental drops. I have killed some external HDDs this way, never killed an SSD, even though I am far rougher with them. For extra reliability, buy two disks from different manufacturers (e.g. Sandisk/WD and Samsung) at different times and mirror the contents. Less chance of both disks going bad at the same time.
Talking about 3.5" HDDs sourced from external drives: WD is still ok in my book. Both the Backblaze report [1] (the newest, quarterly version; check the drive hours, WDC has fewer than HGST so far) and my own experience show they are ok. I used to buy HGST based on Backblaze's reports, but now I am using WD external drives in my NAS. My oldest and most used disk (one of the parity drives) has more than 3 years of power-on hours with nearly 900 start/stop cycles. It shows no signs of failure so far.
I get these HDDs from external drives (called "shucking"), 10TB WD My Book or WD Elements Desktop. It is a bit random what you get, but between 7 HDDs (+1 currently in testing) over about 3 years, I only got one non-helium drive, which runs hotter than the other, all-helium, drives. No failures yet, no bit errors either, and performance is at least good enough for media storage, currently reading at about 180 MB/s sequentially.
I saw one problem: USB errors with WD's USB-SATA bridge. I even had to remove the newest disk to run the test, because it would drop off the bus via USB. That might be because it is a refurbished disk, or something fishy with the USB 3.0 ports on my server, so I won't blame WD for it.
>For external drives, I would seriously consider using SSDs.
I wouldn't. I use my external drives as offline backups, so they don't get plugged in that often. SSDs lose their data if they aren't powered up regularly. And of course, they're much more expensive per TB than spinning rust.
The funny thing is that since these started making news months ago, there were almost immediate fire sales on all the main deal sites to sell them off. Everyone that bought them now has a ticking time bomb of a disk to use. Thanks, Western Digital, for your contribution to society.
Costco is actually a decent org, and if anyone knew they were selling this time-bomb garbage, they would stop it, as they will warranty stuff for YEARS, just to be a somewhat decent company in a time of pirates.
I own one of these disks and quit using it when the news came out, expecting I should hang onto it to get money back for a recall. Didn't even occur to me I could just have brought it back to Costco all this time because of their extremely generous return policy.
Not the same series. "Extreme Go" is not the same product as "Extreme Pro". I have two of these from Costco and they have worked fine for several years.
I had a Fujitsu (if I remember correctly) drive many, many years ago that had a hardware bug which would cause an IC on it to spontaneously flash fire and die.
There will probably be a class action lawsuit where everyone that bought one gets a $20 coupon towards a new WD product, and the lawyers make millions.
I told myself I'd never again buy a WD drive when I realised the WD Red NAS drives I bought were completely unsuitable for NAS use because they secretly replaced the product line with SMR drives.
And now you are telling me that the Sandisk SSD I bought as a replacement also has a fatal design flaw? And apparently Sandisk is a WD subsidiary?
I'm feeling slightly less bad about spending a fortune on getting a bigger built-in SSD in my Macbook. Please don't tell me they are flawed as well.
I'm unmoved and unsurprised. Retail parts are unreliable, cheap crap by the nature of the market created to perpetuate the fantasies of something for nothing.
Coincidentally, I recently selected Max Endurance with a 15 year warranty for a noncritical application and a non-retail channel Industrial XI for something else.
I'm also unsurprised there are no SLC or traditional EEPROM SD cards advertising these facts because of the race-to-the-bottom commodification of garbage by the price point obsession of users who don't know any better. In an ideal world™, all network and computing devices would use ECC memory but no we can't have nice things and would rather have silent corruption and bitsquatting to save a few cents.
PS: Circa 2001, I intentionally tried to induce errors, for failure analysis purposes, in industrial Maxim flash EEPROM ICs rated for 10k cell writes, using an environmental cycling chamber with heat, cold, and humidity. The damn parts wouldn't fail even 2.5 orders of magnitude beyond that rating, and I started to question whether the writes were actually happening. If I had more time, I would've burned it down to the ground until there were many errors to characterize it. At the end of the day, it had to be left at using turbo codes to ensure redundancy of data by cell and across chips.
SSDs, when they fail, usually fail catastrophically. Use automated backup software to regularly copy data to an HDD for anything you don't want to lose. And don't use SSDs for archiving or long-term backup purposes.
Also I stay away from Sandisk. They have always occupied the cheap space of drives and they have always been known to cut corners for profit.
Western Digital seems to be heading in that direction as well.
I have had a good experience with Samsung since the beginning of SSD storage.
Reminds me of my old Corsair Voyager. A "rugged" USB stick housed inside a fully rubber enclosure, which constantly caused the USB plug to snap off. I forget how many times I had to RMA that thing.
The firmware “fix” sounds suspiciously similar to their handling of a similar issue on the WD Blue SA510 SSDs, of which I’m on my third, after the previous two failed in less than 12 months. Didn’t they start using some new 3D NAND chips? I wonder if there’s a flaw in those chips. They would be in use in many different products, which may explain the similar failures.
I'm astonished that after WD bought the SanDisk brand they kept it alive. You couldn't pay ME to use anything under that name, it's so negative. Maybe now, with this critical failure, they'll start branding things with one of the other myriad brand names they've bought ("HGST", for instance) and slowly kill the SanDisk brand.
What's wrong with SanDisk? Out of the loop here -- I had a SanDisk SSD around 5 years ago and it was absolutely great; it's still going today (it's seen quite a bit of use, too.)
I've very rarely had an SSD fail in general, to be honest -- though I do generally stick to reliable brands[0], not "Xykdidlwo" or "Dyewkdlo" off Amazon.
Right now I've got 3 SSDs in my server (2 mirrored so 1TB for apps, and a 500GB boot drive), and I'm interested to see which one goes first.
[0]: Crucial, Samsung, Kingston, SanDisk (until I hear any information which discourages me) etc.
I don't have any experience with their SSDs, but I have a few SanDisk USB drives that have lasted far longer than any other brand in that hellish environment of being an OS system drive. It is not really that bad, but with the frequency that USB flash dies when used as a boot drive, you would think I am abusing them. The no-names I understand, junk from who knows where, but the worst offender was Kingston. They are probably fine on Windows as a rarely used backup unit, but as an OpenBSD system drive, hot garbage; I went through 6 in six months. I would expect better from a named brand. As a comparison, I am still on the original SanDisk units, 5 years and counting.
There's only four flash manufacturers: Samsung, Micron, SK Hynix and SanDisk/Kioxia. All of them have had problems over the years. All of them will change the internals of products without changing SKUs or anything visible to the consumer.
Also, always run perf tests (especially using large writes, preferably up to the full capacity of the drive!) on any drive where it matters that 'you got what you paid for'.
The number of counterfeit, badly designed to the point of defective, or DOA SD Cards and SSD drives I've seen over the last few years is crazy.
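Here is a crude fill-and-verify sketch in the spirit of tools like f3 or h2testw, assuming the drive under test is mounted at a hypothetical /mnt/testdrive and you can spare the space (it will happily fill whatever you point it at):

    # Write files of a known, per-file pattern across the mount point, then
    # read them back and verify. Paths and sizes are assumptions.
    import hashlib, os, time

    MOUNT = "/mnt/testdrive"   # assumed mount point of the drive under test
    CHUNK = 64 * 1024 * 1024   # 64 MiB per test file
    COUNT = 16                 # ~1 GiB total; raise this toward full capacity

    def pattern(i):
        # Deterministic per-file data so reads can be verified without
        # keeping a separate copy of what was written.
        return hashlib.sha256(str(i).encode()).digest() * (CHUNK // 32)

    start = time.time()
    for i in range(COUNT):
        with open(os.path.join(MOUNT, f"fill_{i:04d}.bin"), "wb") as f:
            f.write(pattern(i))
            f.flush(); os.fsync(f.fileno())   # make sure it really hits the device
    print(f"write: {COUNT * CHUNK / (time.time() - start) / 1e6:.0f} MB/s")

    bad = 0
    for i in range(COUNT):
        with open(os.path.join(MOUNT, f"fill_{i:04d}.bin"), "rb") as f:
            if f.read() != pattern(i):
                bad += 1
    print(f"{bad} of {COUNT} files failed verification")

For a real run you would also remount the drive or drop the page cache between the write and read passes, so the verification hits the device rather than RAM.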
I literally won't even buy USB sticks anymore. The last time I tried, all 5 different makes/models I tried were so dysfunctional they were useless. Literally unfit for purpose. Major brands too!
A lot of (all?) recent USB sticks have terrible thermal design and will throttle seemingly arbitrarily to very low speeds under sustained load. Like 2.5 MB/s type speeds. They seem like they were made to theoretically exist for the market niche, but nobody expected them to actually be used by anyone who paid any attention at all.
Same for ones bought in big box stores, on Amazon, or the like. Name brand or random brand.
A lot of less expensive 2.5+ Gig Ethernet dongles do the same.
Good performance for 5-10 seconds, then abysmal.
I switched to SD cards, and at least the good brands of those had decent and predictable performance (50-75MB/s sustained for the same price point). They were also a lot cheaper in general for the capacity.
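A quick way to see that throttling curve for yourself is to stream writes to one big file and log throughput per second. A minimal sketch, assuming the stick is mounted at a hypothetical /mnt/usbstick with enough free space for the test:

    # Sustained-write test: a healthy device prints a roughly flat line, while
    # the sticks described above start high and collapse after a few seconds.
    import os, time

    path = "/mnt/usbstick/throttle_test.bin"  # assumed mount point of the stick
    block = b"\xa5" * (8 * 1024 * 1024)       # 8 MiB per write
    duration_s = 60

    with open(path, "wb") as f:
        window_start, window_bytes = time.time(), 0
        end = time.time() + duration_s
        while time.time() < end:
            f.write(block)
            f.flush(); os.fsync(f.fileno())
            window_bytes += len(block)
            now = time.time()
            if now - window_start >= 1.0:
                print(f"{window_bytes / (now - window_start) / 1e6:6.1f} MB/s")
                window_start, window_bytes = now, 0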
One of the more interesting things to me is that while every storage medium has failures (which is why RAID and backups are a thing :-) there are more failure modes with flash storage that present as abrupt storage failure.
If it's a critical workflow on which your business rests, then you immediately replace it with a better model/brand, as that's a business tax write-off. Plus you have the usual on-site and off-site backups which you should already have for your business.
You do have a back-up set up that you also test, right? Right? </Anakin-Padme meme>
The fact you have one SSD in a critical workflow is an immediate red flag. You should have some kind of redundant solution with backups even if you didn't suspect particular SSDs are prone to failure.
I think one can enclose M.2 SSDs in USB adapters; then you just use well-proven tech like the Samsung 970 Pro, which has been chugging along on our build server for years now.
Many of these adapters have their own quality problems which vary with the version of the controller. That version number is rarely available prior to purchase.
No but by me or anyone else who can hold a soldering iron :)
It's much much easier than a BGA cracking issue, or something internal in the flash which is basically unfixable. This is just some components tombstoning. It shouldn't cost a lot to get it fixed (of course Sandisk should take care of that)
The article unfortunately was written by someone with no clue, so we don't know why tombstoned components (shown in the picture) were not caught in inspection/test. They imply the failures happened in the field, but that's not where tombstoning happens. Presumably what happened was that the supercap (looking like [1]) tombstoned in reflow. Then circuit test failed to verify that it was installed, so the unit was shipped. Subsequently, in the field, the unit suffered a sudden power loss with pending writes. Normally the supercap provides power for long enough to flush pending writes to NAND. But since it was open circuit, the power-fail flush never finished, resulting in corrupted storage. Fixing the open-circuit solder joint as you suggest does not remedy the problem for the user, because their data is still gone.
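For a sense of scale, here is a ballpark of how much flush time a hold-up capacitor buys; every number is an illustrative assumption, not a value from the SanDisk design:

    # Usable energy in a hold-up cap and the flush time it buys.
    c_f        = 0.5   # assumed hold-up capacitance, farads
    v_start    = 5.0   # rail voltage when power is lost
    v_min      = 3.0   # minimum voltage the regulator can still work from
    p_flush_w  = 2.0   # assumed controller + NAND power during the flush
    efficiency = 0.85  # assumed regulator efficiency during hold-up

    energy_j  = 0.5 * c_f * (v_start**2 - v_min**2)  # E = 1/2 C (V1^2 - V2^2)
    holdup_ms = energy_j * efficiency / p_flush_w * 1000

    print(f"usable energy ~{energy_j:.1f} J -> ~{holdup_ms:.0f} ms of flush time")
    # About 4 J here, or roughly 1.7 s of flush time. With the capacitor
    # tombstoned and effectively absent, that budget drops to whatever the bulk
    # ceramics hold, which is nowhere near enough to finish pending writes.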
Losing one capacitor from a tank array would definitely reduce its total capacitance, but the caps are nearly always in parallel, so it would not cause a failure of the whole tank; and the device would be inoperative if the output of the array were shorted.
I'm skeptical that losing one capacitor in the array would cause the failure mode you're describing. Especially if the age of the devices is considered, the array would have been designed with margin to withstand capacitance loss as the device ages.
"I'm skeptical that losing one capacitor in the array would cause the failure mode you're describing."
Depends on what the capacitor is being used for in the circuit. In many cases, having a cap fail open results in a higher current draw which kills the unit if left in operation for too long. This is the case on some of the off-road lighting I manufacture. If one cap is present and fails open at ground, the circuit overloads. If the cap is connected to ground but not the rest of the circuit, the circuit doesn't operate.
Regardless, one component being off can cause a whole chain of maladies.
By anyone who can operate a stereo microscope and a surface mount solder station.
A Fisher-Price “My First 40 Watt Weller Soldering Pencil” won’t cut it for this type of repair as you’re not just flicking diodes off a board to “unlock” something.
It does for me. I've soldered 0805 (and 1206, which most of them fortunately were) components with a screwdriver-tipped Aldi iron because I didn't have anything else available. It was not a great experience, but by being very careful with the corner of the tip, it worked.
But this is a supercapacitor, so it'll be a lot bigger than that.
But a hot air rework station or a really fine temperature-controlled tip is way better of course, which is what I usually use.
Yes, but this is more a problem with the mentality around today's disposable electronics than a real human problem. A lot of these skills have been lost.
In the 80s it was totally normal to get an electrical schematic with a TV for instance, and there were repair shops all over (or people doing it from home for a small fee as a side business).
These days it's not as impossible as people think. In fact, very often when a TV fails it's a through-hole capacitor that is trivial to replace for a couple of bucks. I have repaired several at work and for friends and they still work fine (I always replace them with good-quality, high-temperature-rated ones; manufacturers often use too low a temperature rating, so the equipment fails far too soon and the customer buys a new one).
Yeah, this stuff is harder than it looks. If you need too much time with the soldering iron, the heat can conduct through the leads and fry other components, the sensitive flash ICs in particular.
>A new report from a data recovery company now points the finger at design and manufacturing flaws as the underlying issue with the recent flood of SanDisk Extreme Pro failures that eventually spurred a class action lawsuit