My story: my first real job after grad school was at Texas Instruments. It was a good job, and I enjoyed working there.
A fellow new hire and I were tasked with fixing a machine that ran in a clean room where semiconductor wafers were made. On weekends while the line was down, we would go in, crank up one section of the line and let waste wafers travel through the line where very rarely they might get stuck in a multi-lane machine that would etch the wafers with some sort of acid.
The machine had over two dozen asynchronous motors, actuators, pumps, sensors and so forth, all generating interrupts and I/O events that were sent to a computer that ran the whole line and controlled all the machines.
We couldn’t slow down the machine, it had to run at full speed. The program controlling the machine was thousands of lines of assembly language—everything was assembly language, including the homemade OS that ran the computer running the line. It took like an hour for us to bring up the line and two more hours to see the machine do something strange.
The computer running all this had no user interface other than some front panel switches and some panel lights that would reveal 16 bits of its 128K of memory at a time. This was in the 1970s before Ethernet had been invented.
It felt a bit like those escape room events where you know there is a solution, but you don’t know if you will ever get out. Without my coworker cracking jokes about our plight, I’m not sure we would have ever triumphed over that stupid machine.
To jump on the bandwagon here- as a guy with zero hardware qualifications who likes to work on cars and motorcycles my first thought was a temperature issue. Not that I could’ve found the resistor issue, but this is not an obscure bug except maybe for pure software people.
I did too, but the first line of the thread: “winter is coming (freezing emoji)” primes you to think about temperature. I’m not sure it would have been nearly as obvious if after a few months in the field I suddenly got a few bug reports.
I think this might be a case of confirmation bias that people experienced when reading the article since the author tells you the solution in the first sentence.
I automatically thought temp too but only because of watching that aeroplane crash tv show which would go over a plane crash to figure out what went wrong.
I remember one episode where they figured the wings would freeze and not move so the pilots lose control and would crash.
They reengineered the part to not freeze during extreme temperatures.
Makes me think that it could also be something more indirect like humidity. Temperature inside the facility is likely controlled, but humidity might not be, or it might drop to the lower bound of what is desired.
I really wish hardware startups would at least hire people early on who have experience with actual manufacturing. Stuff like this really shouldn't happen and really shouldn't be a surprise, either. Temperature considerations are a standard part of the design criteria and testing in any mechanical device, and standard root cause analysis (did they even do FMEA?) would have forced them to find the problem the first time, not the second. When you're still in the early prototype phase, fine, but this is a 25? person team selling a product with a subscription and everything.
Edit: just realized their stated fix is to remove two resistors. Did they actually test the full effects of that change? Seems like there's a decent chance of a third head-scratcher a few months from now...
Ha, reminds me of something at a company I used to work at! One of their products included this little switched power supply of some kind. I don't recall the exact details, but basically it included a snubber circuit that was failing due to some transients being larger than designed for. So they fixed it by.. removing the snubber. That worked briefly, but then of course the transistor it was protecting started failing. So they installed a massive overkill transistor, which worked for a while, until that started overheating, so they redesigned the PCB to move it to the edge of the board where it could be heat-sinked to the case. Even with that they were still getting occasional returns with it having failed, so when I joined the company I was asked to look at it. Added the snubber back in with an appropriately-sized resistor, and problem solved. (And was able to switch back from the $30 transistor to one that cost a few cents.)
> standard root cause analysis (did they even do FMEA?) would have forced them to find the problem the first time, not the second
Not to mention that the standard checklist for solving "why is this joint having problems" is "is it cold or does it get worse when we make it cold"....
> just realized their stated fix is to remove two resistors. Did they actually test the full effects of that change? Seems like there's a decent chance of a third head-scratcher a few months from now...
As a half-decent analog guy I can actually believe this. I can imagine a lot of ways something like this could fix a problem, though of course I haven't seen any schematics here.
I'm guessing that a robotics company actually has very few analog-competent engineers. It may well have solved the issue if one of them had built an analog circuit. A lot of engineers don't think about operating conditions when they first learn to build circuits, and run them near the edge of their specified tolerances, so moving the component into a "comfortable" range would fix it.
However, if they were working with a COTS component and took some resistors off that, I doubt it was actually the right fix.
Yep, we used a temperature chamber that could ramp from -40°C to +80°C, along with humidity. We would push the limits until something broke, fix the design, and repeat.
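For anyone who hasn't seen this done, the "push until it breaks" loop can be sketched roughly as below. This is only an illustration: `step_stress` and `fake_device` are made-up names, the 5 C to 70 C working window is invented, and a real setup would drive an actual chamber and run a functional test at each setpoint.

```python
def step_stress(passes_at, start_c, stop_c, step_c):
    """Step the setpoint from start_c toward stop_c.

    Returns the first setpoint (in deg C) at which passes_at() reports a
    failure, or None if the unit survives the whole sweep.
    """
    if step_c == 0:
        raise ValueError("step_c must be non-zero")
    going_up = stop_c >= start_c
    step = abs(step_c) if going_up else -abs(step_c)

    temp = start_c
    while (temp <= stop_c) if going_up else (temp >= stop_c):
        if not passes_at(temp):
            return temp
        temp += step
    return None


# Hypothetical stand-in for "soak at this temperature, then run the self-test".
def fake_device(temp_c):
    return 5 <= temp_c <= 70


cold_limit = step_stress(fake_device, 20, -40, 5)  # sweep down from room temp
hot_limit = step_stress(fake_device, 20, 80, 5)    # sweep up from room temp
print(cold_limit, hot_limit)  # 0 75
```

After each failure you would fix the design and rerun the sweep, which is the "repeat" part.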
Surprised, but not surprised, that everyone doesn't do at least some stress testing.
I was expecting something more complicated, like electromagnetic interference from a passing vehicle. Or ESD (static discharge).
I’m just a bum with an automated hydroponic garden and I verified and tested all of my components for expected behaviour in the normal conditions the system will be in. I had to exclude several components as a result.
On one hand I’m surprised they didn’t do that, but on the other, I have no time constraints and I’m very, very uncertain of what I’m doing most of the time. I knew I wanted to have accurate readings and reliable performance so my garden doesn’t die due to malfunctions. So, I goofed around and made sure stuff was right. I’d do the same with code in my spare time, but my employers have me cut corners all the time. People could criticize me for it, but it’s not as though I don’t know better.
It might be similar for this team. By the time the bugs strike, it’s not clear what’s been properly vetted or who knows what about which components. Debugging becomes harder because the initial spec and how well it was met is no longer clear.
I’m totally guessing as I don’t really know their team, process, or hardware in general.
I also discovered the ruggeduino line of arduino boards in the process of component elimination, which are pretty cool. Overkill for my use case, but I hope to have a use for one some day. I’m thinking of making a robot shop vac and metal-picker-upper, and the ruggeduino would be great there I think.
If you buy a password manager from mechanical engineers but it doesn't encrypt anything because they didn't know it should, is the issue time constraints or that they didn't hire a software engineer?
No. They sell product profitably. Who am I to say they're wrong? Same with their customers using the product profitably.
If they develop a business need to avoid similar issues before they happen, then the answer would be yes. The work they unwittingly skipped (like design for environment) and struggled with (diagnosing an environment problem) are the daily tasks for many people, so they obviously don't have one of those people.
As a software person this is the first I have heard of FMEA (Failure Mode and Effects Analysis) [0]. Sounds like a very rigorous way to move through a system and identify all the ways it can break and develop ways to fix it.
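For fellow software people, the core bookkeeping in a classic FMEA worksheet is small: each failure mode gets a severity, occurrence and detection score (commonly 1 to 10), and their product is the Risk Priority Number used to rank what to tackle first. A minimal sketch, where the failure modes and every score are invented for illustration and are not taken from the robot in the article:

```python
from dataclasses import dataclass


@dataclass
class FailureMode:
    name: str
    severity: int    # 1 = negligible effect, 10 = hazardous
    occurrence: int  # 1 = remote chance, 10 = almost certain
    detection: int   # 1 = always caught before shipping, 10 = never caught

    @property
    def rpn(self) -> int:
        # Risk Priority Number: higher means "look at this first".
        return self.severity * self.occurrence * self.detection


def rank(modes):
    """Sort failure modes from highest to lowest RPN."""
    return sorted(modes, key=lambda m: m.rpn, reverse=True)


# Hypothetical entries and scores, purely for illustration.
modes = [
    FailureMode("coupler loosens", 6, 3, 4),
    FailureMode("control circuit misbehaves when cold", 8, 6, 9),
    FailureMode("battery connector corrodes", 7, 2, 5),
]

for m in rank(modes):
    print(f"{m.rpn:4d}  {m.name}")
```

The useful part is less the arithmetic than being forced to write down a detection score for each row: "we have no test that would catch this" shows up as a 9 or 10.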
Not everybody needs to run formal FMEA procedures on everything, but if you ship hardware without at least putting it in the fridge for a few hours next to the Jolt Cola and leftover kung pao shrimp, you are tempting fate.
Yeah I mean maybe it doesn't matter for robots intended to be used indoors, but they're making a robot intended to be used outside in all (dry) weather on construction sites!
So, sometimes you really don't anticipate the kinds of failures you can have. A really good one to remember is the "Camera Shy" Pi 2.
For those unfamiliar with this issue, or who forgot: The power regulator IC on the Pi 2 was a flip-silicon slab that was otherwise uncoated or encased. When engineers were taking pictures of it, they likely were using their cell phones and otherwise loose lighting under fluorescent or sunlight. When high intensity Xenon bulbs were used for flash photography, however, intense IR light would hit the silicon, excite something, and cause the power supply to go wobbly kneed, drop out like a Silicon Valley startup founder, and you'd have a bad day.
This failure was only figured out when someone happened to compare two flashing lights (one Xenon and one a bike light).
Engineers miss things all the time, even those who are supposedly really good. They are good, but humans err, because to err is human.
Prime example: I diagnosed an arcade cabinet that only had issues on sustained playtime during the summer, but never during the winter. It was fine on the off day that there was a lot of cloud cover, and only seemed to get worse after a shade awning was taken down: before the removal, it would intermittently sputter back to life during the later afternoon. Nobody knew what was going on.
I spent an afternoon watching it fail. At one point, I eventually went and grabbed an IR thermometer and pointed it at the cabinet. It was registering well into 100F. It was also outright refusing to work.
I rolled down the window blinds, pointed a fan at the poor machine, and waited for a while. Miraculously, it woke up after a while and started working.
I traced it to the three canned voltage regulators. The voltage regulators for the system were strapped to the metal chassis of the CRT, which contained the system's power supply. The machine happened to be right next to a window, and the cabinet had been painted black by the previous owner. As the daytime sun would wander over, the black paint would soak up the UV and absolutely cook the regulators. Once the sun had passed behind the shade, it was quickly cool enough inside the chassis to bring the regulators back into spec, which caused the machine to start working again. When the shade was removed, it only went back up once the sun was no longer cooking it like a roast chicken.
I believe my recommendation was "Replace the side paneling with woodgrain and move it out of the sun."
The engineer that designed that arcade machine probably never thought it'd be in a hot, New Mexico university faculty lounge next to a window that faced the sun. I remember going back about a year later and sure enough, it had moved about three feet and was "rarely, if ever broken" now.
“The reason is that, in other fields [than software], people have to deal with the perversity of matter. You are designing circuits or cars or chemicals, you have to face the fact that these physical substances will do what they do, not what they are supposed to do. We in software don't have that problem, and that makes it tremendously easier. We are designing a collection of idealized mathematical parts which have definitions. They do exactly what they are defined to do.
And so there are many problems we [programmers] don't have. For instance, if we put an if statement inside of a while statement, we don't have to worry about whether the if statement can get enough power to run at the speed it's going to run. We don't have to worry about whether it will run at a speed that generates radio frequency interference and induces wrong values in some other parts of the data. We don't have to worry about whether it will loop at a speed that causes a resonance and eventually the if statement will vibrate against the while statement and one of them will crack. We don't have to worry that chemicals in the environment will get into the boundary between the if statement and the while statement and corrode them, and cause a bad connection. We don't have to worry that other chemicals will get on them and cause a short-circuit. We don't have to worry about whether the heat can be dissipated from this if statement through the surrounding while statement. We don't have to worry about whether the while statement would cause so much voltage drop that the if statement won't function correctly. When you look at the value of a variable you don't have to worry about whether you've referenced that variable so many times that you exceed the fan-out limit. You don't have to worry about how much capacitance there is in a certain variable and how much time it will take to store the value in it.
All these things are defined a way, the system is defined to function in a certain way, and it always does. The physical computer might malfunction, but that's not the program's fault. So, because of all these problems we don't have to deal with, our field is tremendously easier.”
Any developer, hardware or software, will tell you that reproducibility is the key to solving a problem.
But reproducibility can be vague and sometimes, when you're under pressure, you can be quick to point to something and declare "aha! that's the root cause!" and be totally wrong.
While reading the thread, a red flag[1] was immediately raised in my mind when:
> We couldn't reproduce it, but we did come up with a theory for why it was happening.
...going right into mechanical subsystem redesign. Surely a cursory review would have challenged such a reactionary proposal: What meaningful steps were taken to falsify the prevailing theory?
There's something implied about discipline when this vacant QA Tester Hardware/Software engineering position description[2] bundles verification/validation test roles on the design/development/production/field support fronts with the following caveat:
> Initially, you'll be the only QA engineer and will perform active testing of new product releases in our lab and in the field at construction sites.
Also, non-rhetorical question: Selenium for industrial hardware test automation...is that really a thing in the wild?
Or even be totally right, but there's more to the problem. Peeling the onion.
We hope that fixing one piece of code will solve three or four exhibited problems. But it's often more like, change three or four pieces of code to make one problem go away.
As Heinlein is purported to have written, "If it's not one thing, it's two things."
If Heinlein did use a phrase like it, I expect my searches would have found it.
It doesn't appear to be a common saying, so I'm curious how you acquired the association between it and Heinlein. It doesn't seem like a common misquote people end up spreading.
[1] I've only read "Tramp Royale" up to the point where they left the US, I haven't read the "stinkeroos", nor his posthumous novels, nor most of what Wikipedia lists under "Other short fiction", nor a couple more non-fiction publications.
I agree, having read most Heinlein ever published (even including his unfortunate right wing and conservative and unscientific ramblings that wasted my time), I never came across "If it is not one thing, it is two things."
If Spider Robinson used that quote, I presume my text searches of archive.org would have found it. Yet across the different search combinations, I only found a dozen-odd books, none by an SF author.
Why do you associate it with something written by a famous author?
I don't see anything which suggests it's a well-known phrase. Google and DDG together found fewer than 100 pages using that quote. Most appear to have been spontaneous creation. One attributed it to "a Norwegian friend."
(This all assumes my search terms were meaningful.)
> As Heinlein is purported to have written, "If it's not one thing, it's two things."
By Any Other Name, Spider Robinson:
"
And McLaughlin rescued the moment, in that split second before Higgins’s control would have cracked, doing his prizefighter imitation. “Aw Jeez, Tom, that hard cider. If it ain’t one thing, it’s two things. Go ahead; we’ll keep your shoes warm.”
"
for completeness, it's a play on the more common expression, perhaps best summed up in the modern age by Snoop Dogg in Pump, Pump:
At least in software you can (mostly) easily re-create the conditions that lead to the problem... Hardware exists in the real world, not a virtual one, so while it's possible to find correlations if you've got enough data, getting a definitive root cause can be much harder as you've got to create the real world.
As a hardware guy I look at software with envy. Having to deal with physics is such a huge fucking pain in the ass all the time. Reality really fucking hates low entropy systems and will try and sabotage you at every turn. There is also the inherent opaqueness to reality based systems that makes debugging them a huge pain that can be enormously time consuming and expensive. And scaling is ridiculously difficult and expensive.
And worst of all, for me, there is no money in hardware. At best you make a trinket that requires a $9.99 subscription to really get use out of. At worst you make a cool trinket, get forced by pricing to make it in China, and then end up just having the idea stolen and reproduced to be sold for 1/2 the cost.
Considering China makes Apple's hardware, I would say that Apple's software is what makes Apple stand apart.
And their SoC's of course, but ain't no lone engineer spinning up their own phone, much less their own SoC.
I'm really talking about hobby projects. Hobby swe you can actually make a product, sell it, and make some income. Hardware? Maybe you can make something, but make some money? lol.
Not really. There is deep knowledge and capacity in doing the hardware design. It's not as if Apple sends over a couple powerpoint slides with some renderings and then Foxconn delivers a product a year later. Apple delivers highly detailed designs, exact geometry and tolerance of each and every part, all the process documentation, specs, materials, designs for how the assembly line should be, etc. All that gets handed to Foxconn and they manage the staffing and feeding of the lines.
I met a guy who worked for IBM on the AS/400s. Apparently they had a 'server room' where they could control the temperature and humidity to basically any combination that was likely to occur in the real world and they would test all their hardware there under any extreme condition they could think of.
"Turns out that last year's coupler problems had the same root cause. While people were opening up the robot and tightening the coupler, the robot would warm up. By the time they put it back together, the problem would have gone away. It had nothing to do with couplers at all.
By the time we had rolled out the coupler "fix" to all robots, the weather had warmed up enough across the country that the issue didn't reoccur. We thought we had fixed it, when actually spring fixed it."
At first they correlated a possible root cause and then after learning from that mistake they finally understood the root cause.
I've seen it happen many times where people with not enough time and knowledge to debug a huge system had to resort to shotgun debugging. IME taking the time to understand always ends up 1) solving the problem and 2) saving time and money.
This is especially true when the problem is actually caused by two or more root causes.
Is there a good name for “two root causes” where they work in tandem? I am blanking on it. Contributing factors?
Necessary but not sufficient conditions?
Climate chambers are a staple of industrial hardware testing.
There are ISO standards for testing temperature and humidity resilience. Environmental testing is just as important as testing for EM immunity and emissions, especially for industrial hardware.
It’s possible that in this case the customers were using the product outside its specification but when you are designing an electromechanical device there are tons of things that can go wrong once you’re out of the narrow band of environmental comfort.
Grease and lubricants can seize or liquefy, metals expand, humidity affects corrosion and heat convection, rubbers and seals can harden and contract, electronic components can overheat or change characteristics or prematurely age,…
Honestly, if the company didn't do basic environmental testing during development, they got off VERY easy.
I don't have an environmental chamber, but I wish I did. Everything I build goes into the fridge for at least a few hours (which means it gets tested at extreme humidity as well as extreme temperature, for better or worse.)
I should put boards on a hot plate at ~60C for a similar length of time, but I didn't do that recently, and I paid heavily for that bit of negligence. Probably wasted 100-200 person-hours at the factory, having them rework a NOR flash part that was fine all along but didn't like being inadvertently overclocked by 2x once the board reached operating temperature in the test area.
I have an HP Spectre x360 laptop. It wasn't registering some of its keystrokes. The same keys would fail, and for quite a while too - hitting them harder and repeatedly didn't help.
Turns out it was the cold. Now when I take a trip out for coffee & coding, I boot up and let it sit for a while before starting my work.
I've worked for many hardware companies. Thermal chamber testing is pretty standard. When doing a design it usually has a requirement for temperature range and the parts are specced accordingly, and thermal testing is done before release.
Back in the dark ages when I fixed avionics (I grew up and went into software) I was faced with many intermittent hardware failures. The standard procedure was to use freeze spray. One component at a time. I had a lot of theoretical education, but I loved freeze spray.
Wonder what the chances are that the first iteration of "problem solved!" actually was a lurking problem that would've bitten as well, sooner or later.
Debugging is a skill that is rapidly evaporating. The ability to debug like this is the secret to success in the hardware/physical realm. I’m just surprised it was resistors freaking out under cold temps and not the motor/gear configuration shrinking due to temps. You don’t often hear of electronics changing behavior in colder temps, but mechanical for sure.
I once lost about a month to a USB issue reported in the field on some custom hardware. Spent several weeks failing to reproduce it (setting up multiple machines to automatically hammer through typical usage). It eventually transpired that the issue correlated with cold temperatures and the recent outsourcing of assembly had resulted in some poor soldering.
It gets worse; some issues relate not just to a specific temperature range but also to whether that range is approached from above or below .. and at what rate of temp change.
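That means the test matrix is not just a list of temperatures. A rough sketch of enumerating the cases, where `approach_cases`, the 20 degree offset and the example targets and rates are all arbitrary choices for illustration:

```python
from itertools import product


def approach_cases(targets_c, rates_c_per_min, offset_c=20):
    """Build one test case per (target, ramp rate, approach direction)."""
    cases = []
    for target, rate, direction in product(
        targets_c, rates_c_per_min, ("from_below", "from_above")
    ):
        # Start offset_c degrees away on the chosen side of the target.
        start = target - offset_c if direction == "from_below" else target + offset_c
        cases.append(
            {
                "direction": direction,
                "start_c": start,
                "target_c": target,
                "rate_c_per_min": rate,
                "ramp_min": offset_c / rate,  # time spent ramping
            }
        )
    return cases


cases = approach_cases(targets_c=[0, 5], rates_c_per_min=[1, 10])
print(len(cases))  # 8
for case in cases:
    print(case)
```

Two targets and two rates already give eight runs, which is part of why this kind of bug survives so long.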
The field of engineering exists so that these types of problems are known by nature of what subfield you are working in, so that you can use your expertise at design time to avoid such problems. COTS (TM) isn't a scapegoat either. Don't buy bad components. Electronics and mechanics are hundreds of years old. You should have seniors in your company who know which vendors are reliable. If you are a startup, you still should have attended a university that explained the meta to you. I'm sure most software engineers who think they can just use some random lib written by some geek and inherit the bugs from it will disagree.
There is no mystery or surprise here. It’s basic functional qualification. You buy or make a thermal chamber and cycle release versions of your device before you ship one. This isn’t some uncatchable mystery, they just didn’t test adequately.
Depending on the product size and cost you may also do this to every individual robot off the line. This is not uncommon.
This isn’t “hardware is hard” this is “we thought it was software with screwdrivers.”
You are 100% right, but I would not be as dismissive to the engineering team.
When you run a hardware startup, you can only hope for an experienced team that would do everything by the book and implement best practices from the very first production unit. Reality is: that’s a luxury for most hardware startup teams out there.
Typically, there’s a frantic rush to get your device to market that you simply skip, or more likely don’t even have time to think about stuff like climate chamber cycling.
One thing I’m almost sure of: these guys have learned something — the engineer’s way. Good chance there’s a guy there googling climate chambers to ask the CEO for a budget to buy one.
> Typically, there’s a frantic rush to get your device to market that you simply skip, or more likely don’t even have time to think about stuff like climate chamber cycling.
And that, right there, is the difference between a company oriented around myopic management vs a company oriented around robust quality.
Any company trying to build a quality reputation would spec this stuff out AT THE BEGINNING — what are the operational requirements, what loads will they see, in what environments will they run, etc??? Then spec every component, and test the whole lot against those requirements. Sure, this is more like the dreaded "waterfall" vs "agile", but the result is a quality product from the start that has far fewer of these problems (because they did this whole test & fix routine at the prototype or Alpha test stages), rather than showing up with stories like this of how they recovered from customer-reported problems.
If you're telling your customers that they're the Alpha testers because they get early access, fine. If you're selling it as a finished product, then we know your company isn't prioritizing actual quality.
> If you're telling your customers that they're the Alpha testers because they get early access, fine. If you're selling it as a finished product, then we know your company isn't prioritizing actual quality.
Sounds like Tesla... although it is public knowledge there that you are still alpha testers years after a model was introduced.
> When you run a hardware startup, you can only hope for an experienced team
If you run a hardware startup and fail to acknowledge that places like Alaska or Ontario exist, you fail before even getting close to merely inexperienced. The most charitable word I can think of is "myopic".
That seems reasonably accurate on the high end for me. For _sustained_ use, iPhones generally stop working well around 100-105F.
I suspect I would be disappointed by the performance of an iPhone that could operate in >125F temperatures (temperatures which I have worked in outdoors for several years)
I did not say the documentation is wrong. I am saying that it is not only startups that have trouble imagining their product being used somewhere other than Silicon Valley.
Ah yeah, the first part of my response was just fluff.
I’m undecided if they lacked imagination. I work with a lot of electronics that go in oil wells and we have to make different models for different temperature ranges. We usually have to sacrifice a lot of functionality to gain high operating temperatures.
I think it’s possible Apple decided not to serve the markets which need high operating temperatures, rather than simply didn’t even think about the possibility.
It affected me, when I worked outdoors in Saudi Arabia and UAE and even Houston. But I probably would still buy a high performance, high battery life 13 Pro Max over a hypothetical lower performance 13 ExtremeEnvironment edition.
Operating at 125F ambient would require a whole lot of sacrifice on the power envelope and much greater overall size of the product. I like big phones but my current Pro Max is about as large as I’d like a phone to be.
I realize that there are physical limitations of the technology. I don’t have to go to Saudi Arabia for my phone to overheat. I have to be careful where I keep it in Florida.
It all looks so simple after someone else has debugged the problem and figured out it was temperature, that it's just too tempting to distill it down to some dismissive statement like "fail to acknowledge places like Alaska or Ontario exist". But ultimately it's about unknown unknowns. Sure, you can spend massive amounts of time and money trying to make the known unknowns into knowns (higher background radiation, higher cosmic rays, lower/higher air pressure, higher sun intensity, camera flashes, and so on). But even if you do this for a number of things including temperature, there will still be a bunch of factors you won't be preemptively testing. In which case respecting the general lesson will come in handy, even though it's been explained by someone who failed to do testing that you consider routine.
This isn't about unknown unknowns. The blog post made it clear that the team did not anticipate temperatures to change much in their surroundings. That is a very narrow-minded view: places with around-the-year stable temperatures are the exception, not the rule.
I was born and raised in a place where the annual temperature range can be 90 K. From -55°C in the winter to +35°C in the summer. That is not even an extreme range, there are well-known places with large populations (>10MM) that can experience >100 K differentials.
Perhaps more importantly, temperature changes of 40 K in less than 24h are not uncommon. Daily deltas of 25 K are experienced several times a year.
If a company builds hardware, and they don't factor in a routine impact of thermal effects, they really have no excuse. Choosing to ignore these can be a valid product development or marketing strategy, but not being aware of them is nothing short of myopic.
I too am accustomed to weather, although not nearly as dramatic.
My point is that turning things that are unknown unknowns (from the point of view of the company) into known and checked for possibilities takes concerted effort, time, and money. It's easy to look at a problem after it's found and post-facto determine how to find the same problem quicker next time.
I agree that temperature is very basic, low hanging fruit. Especially for a device that seems to be aimed at operation by construction crews. But regardless of where you draw the line on testing environmental factors, you have to draw it somewhere. And so you will still end up with unknown unknowns that escape your QA or debugging process, sending you down the same path of needing to question your assumptions to figure out what's going on.
(Also you're not really giving them the benefit of the doubt here with this assertion that they didn't take temperature into account at all. It seems that they at least looked at the part temperature ranges. And it'd be courteous to assume that they did power dissipation and temperature rise calcs. What they didn't do was component or integration testing at varying temperatures.)
I’ve worked for a hardware startup. We didn’t have the budget for a thermal chamber, but that doesn’t mean we didn’t test and anticipate temperature related issues. We had plenty of setups like the one in the blog with units in the fridge or on a car dashboard on a summer day.
The difference is we did it before the units reached customers.
We skip stuff like the temperature chamber all the time. The difference is that we know what we're skipping and why. This can easily make for a "10x" team: we are experienced enough to know what we can get away with, and experienced enough to know very quickly what happened when we fail to get away with it. (So it's often a pretty quick and direct fix! Well, that or a C-level "you declined this part of our proposal so now it's failing exactly like we told you it would so you're looking at this big of a redo, exactly like we told you...").
I'm arm-chairing a bit. This is the type of problem I'd expect on a prototype, but not on a production level device - especially on a construction robot that will be clearly out in the weather.
It reads to me like they didn't properly spec/source components that were appropriate for the weather conditions these robots are likely to see. The fact that this issue was reproducible at refrigerator temperatures is even more shocking. 39F (taken from a photo) is not very cold.
Speaking very broadly, because it's not all standardized and many companies have their own definitions, in degrees C:
Range                 Min    Max
Commercial              0    +70
Industrial            -40    +85
Automotive            -40   +105
Extended Automotive   -40   +125
Military              -55   +125
But that's for the parts themselves, and just working at that. Weird things can and do happen at temperature extremes. If you're very lucky, they're documented in datasheets/manuals. If you're lucky, they're known enough to be in white papers or industry presentations, or known to one of the companies that do this stuff for a living. If not... well, hope someone's heard of it, or you've got a huge testing budget....
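For what it's worth, here's a rough sketch of checking a required operating range against grades like these. The numbers are commonly quoted figures (industrial taken as -40 to +85 here), not any one vendor's definition, and the function names are made up for illustration.

```python
# Commonly quoted temperature grades in degrees C.
# Assumption: vendors define these differently; check the actual datasheet.
GRADES_C = {
    "Commercial": (0, 70),
    "Industrial": (-40, 85),
    "Automotive": (-40, 105),
    "Extended Automotive": (-40, 125),
    "Military": (-55, 125),
}


def fahrenheit_to_celsius(deg_f):
    """Convert degrees Fahrenheit to degrees Celsius."""
    return (deg_f - 32.0) * 5.0 / 9.0


def grades_covering(min_c, max_c):
    """Return the grade names whose rated range contains [min_c, max_c]."""
    return [
        name
        for name, (low, high) in GRADES_C.items()
        if low <= min_c and max_c <= high
    ]


if __name__ == "__main__":
    fridge_c = fahrenheit_to_celsius(39)  # the fridge reading from the photo
    print(round(fridge_c, 1), grades_covering(fridge_c, fridge_c))
    print(grades_covering(-20, 50))  # a hypothetical outdoor winter/summer spread
```

39F works out to roughly 3.9C, which sits inside even the commercial range. So a part-rating lookup like this would not have flagged the fridge case, which is the point: the ratings only tell you about individual parts, not about how the assembled board behaves.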
Agreed. In more heavily regulated industries (medical devices, automotive, aerospace) the kind of testing you mention is required by law. In all cases it's good practice to test your product to make sure it works as promised by its labeling. Typically you put an operating temperature and humidity range in your manual or labeling, and in those regulated industries you have to test to the range you state.
Completely agree and will continue snarking. Chest freezers, dry ice, pie warmers are cheap. The fridge+freezer and oven in your company kitchen are free. HALT/HASS, 4-corners, thermal cycling and thermal shock testing are all things.
They wasted resources on fixing what they "thought" was the problem. That money would have paid for some nice chambers and other equipment.
Next thread: "We learned our robot doesn't work when sitting in a shipping container at -40C for three months because customs was being difficult."
Thread after: "Hardware is hard! We learnt about EMC compliance."
Or even just drive down to Lake Tahoe and find a winter construction site and try it there.
This is a general problem with all these California-based companies and inventors (especially SV inventors cranking out crowd-funded bike stuff.) They seem blissfully unaware of things like cold weather, water, dirt/mud, and road salt...or combinations of them. I laugh at all those stupid fucking delivery bots because they'll fall apart anywhere there's snow, and get completely stuck on the slightest bit of ice.
For many years, driving a Model S in heavy rain would cause water to get into the drive unit via either seals or vents that weren't sufficiently designed to keep water out. It "totals" the drive unit, causing corrosion of the motor control boards. And Tesla denies warranty claims on such repairs, because of course they do - just like they did on the windows that randomly shattered in parked cars.
Raise your hand if you've owned a car that had problems with water ingress issues affecting its transmission. Or windows randomly shattering.
I have owned a couple of Philips bread makers. Basic ones and expensive ones.
If you make sourdough with them (ie bread) the coating is stripped off the bowl and the stirrer corrodes.
They will deny replacement and claim you sprayed something acid on it. Yes, fermentation is acid, but they don’t believe bread would damage their unit.
Based on the pictures of the robots, it looks like they were intended for indoor use. I guess no one planned for an indoor space outside of 50-90 degrees F, which is a pretty reasonable supposition: better than 80% of indoor environments are within that range.
The more I think about this the worse it gets. I'm trying to give them the benefit of the doubt, but it's really incompetence, and they decided to blog about it.
If they didn't account for temperature, they certainly didn't account for moisture, and these boards probably don't have any conformal coating on them. I guess it's good for the customer that it's robots-as-a-service, but these things are going to end up failing spectacularly in the field. These things are outside at construction sites; they need to account for the environmental conditions.
> You buy or make a thermal chamber and cycle release versions of your device before you ship one.
That is a total waste of time in a startup with a small number of units shipped.
A component behaving out of spec due to temperature excursions simply isn't that common nowadays. If my system is mostly ADC to digital to DAC (standard for robotics controllers), testing for temperature is a waste until I'm shipping significant volumes.
There is a video of one of the slightly famous YouTubers who has a high voltage thing that fails at the altitude of his lab. The manufacturer did check it for function at the elevation of Denver, but his lab is higher than that. There are limits to how much engineering effort you should put in until you get an actual failure. (Maybe someone can link the video as I can't remember it at this point.)
You can waste infinite engineering effort covering all possibilities. Or you can ship the thing and fix the failures. "Good engineering" is about balancing the two--you need to ship, but you don't want to have too many failures in the field either.
No, often components are dependent on the stress they are under. If you have inconsistent reflow, or lots of rework happening, then each device will behave differently due to thermal contractions.
Testing thermal performance is as simple as going to a restaurant down the street and asking if you can pay them $100 to borrow their walk-in fridge for an hour for cold, and leaving your device in a hot car for an hour for hot. It's also exactly the kind of thing you should be doing if you're selling an actual product to a customer instead of partnering with someone to test your prototype.
>A component behaving out of spec due to temperature excursions simply isn't that common nowadays.
You are right. People are commenting here as if the company didn't know what they were doing. I suspect there is more to the story than what has been shared in a tweet. My guess is that the deployment was done in a place with a temperature differential well outside the component specs/tests and so simply was not known/tested for.
Except that it isn't. They just proved it with their tweet. They wasted a ton of time and money on what they thought was the problem and what sounds like essentially a product recall.
A $5k temperature chamber running overnight or over a weekend would have caught this.
In fairness, it's generally known that a lot of consumer electronics have problems in freezing weather unless they're protected somehow (e.g. kept in an inner pocket) if only because of the battery. Some of it is possibly California designers not being especially focused on the sub-zero use case but it's also not clear how much focus there should be on that use case in general if there are costs (money, physical specs) associated with doing so.
Most people don't strongly mind keeping the phone warm in a pocket. There's a difference between experiencing very cold temperatures and reaching them.