This has been a known issue for some time, and it's amazing Tesla hasn't fixed it yet. As explained in the video, the issue is simply that the base Linux OS still has full syslogs enabled, which over time burn out the eMMC chip used for storage. Currently Tesla just replaces the entire media unit when this happens, which is extremely expensive for them. In reality they only need to replace a tiny part instead of tossing entire boards.
The GOOD news is that this issue is almost fully solvable through a software update. They just need to disable base syslogs, or at least only store logs in RAM while the car is running (if they need those logs for debugging purposes, which I doubt). Long term, they should replace this eMMC with a proper SSD, or a removable SD card that's easily serviced when it goes bad (as is the case in other parts of the car).
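A minimal sketch of the "only store logs in RAM" idea, written in Python with the standard `logging` module purely for illustration (the car's OS is not assumed to work this way; the logger name `media_unit` and the capacity are hypothetical). Log lines go into a fixed-size ring buffer, so nothing touches flash and memory use stays bounded; the buffer can be dumped only when something actually goes wrong.

```python
from __future__ import annotations

import collections
import logging


class RingBufferHandler(logging.Handler):
    """Keep only the newest formatted log lines in RAM; nothing is written to flash."""

    def __init__(self, capacity: int = 10_000) -> None:
        super().__init__()
        # A deque with maxlen silently drops the oldest line once it is full.
        self.lines = collections.deque(maxlen=capacity)

    def emit(self, record: logging.LogRecord) -> None:
        try:
            self.lines.append(self.format(record))
        except Exception:
            self.handleError(record)

    def dump(self) -> list[str]:
        """Snapshot the buffer, e.g. to upload or persist only after a fault."""
        return list(self.lines)


logger = logging.getLogger("media_unit")  # hypothetical logger name
logger.setLevel(logging.DEBUG)
ring = RingBufferHandler(capacity=3)
logger.addHandler(ring)

for i in range(5):
    logger.debug("tick %d", i)

print(ring.dump())  # ['tick 2', 'tick 3', 'tick 4']
```

The trade-off is that a crash or power loss loses whatever was in the buffer, which is usually acceptable for routine logs that nobody reads back.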
I think this must be an issue of visibility in the company, and the right engineer just isn't aware of this issue yet. I hope this gets more visibility and gets patched in a future software update.
eMMC flash is typically a ball-grid array part and it's soldered down hard to the motherboard of the processor. Almost every mobile device has the same setup and it's tricky and expensive to rework this chip.
You need to carefully desolder it off the board, reball the part with the correct masking stencil, then reapply it without torching the rest of the board, missing a solder connection, or zapping any other component with static electricity.
You can see a similar process done on an iPhone here: https://www.youtube.com/watch?v=y4M9uAZlbK4
You can understand why Tesla would rather replace/refurbish the board than try to field-repair it.
Sure, refurbish the processor module and replace the eMMC at a proper rework station, but replacing the whole media unit, especially when the failed part is on a removable board, just seems like a huge waste to me, especially when the customer is footing the bill.
In my experience, removable SD wouldn’t hold up to an application like this.
If it were held in and held against some robust contact pins it likely would be fine, even under automotive conditions. There are SIM card slots that are used on heavy vehicle computers and controllers (the ones that run the engine and so on) that are robust enough to hold a SIM card in place for years of excessive daily vibration.
I suspect that designing reliable spring contacts isn't as easy as it seems...
The trade-offs for a high vibration environment are very different.
You are not wrong that a soldered eMMC (compared to an eMMC held in a ZIF socket, or a removable SD card) would be more mechanically stable and potentially faster (though a fast UHS-II SD card can beat a cheap, slower eMMC, for example). I'm just saying that Tesla already uses removable SD cards in production cars today (which don't brick your media unit if they die or dislodge).
Storing the OS on a removable SD card is (imo) a bad idea, but if they are really interested in these logs, they already have an easier-to-replace removable storage "drive" they could write the logs to instead of burning up the eMMC.
Replace the board, then you can refurbish it in a specialized lab.
Same reason why Genius Bars won't replace components, just boards.
On the newer versions of the Media Unit, instead of disabling writing the syslog to the eMMC, they added more storage to the processor boards, which isn't going to solve the problem, just push it further down the road.
To say that the choice to use eMMC is the fault is to say that ANY use of flash storage is a faulty decision.
If there is sufficient space on the drive to mitigate write amplification, then the media will last a very long time indeed. Longer than the drivetrain, certainly.
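The write-amplification argument can be made concrete with the usual back-of-the-envelope endurance estimate: the total NAND writes the chip can absorb (capacity × rated program/erase cycles) divided by the NAND writes per day (host writes × write amplification factor). A sketch in Python; every number is a hypothetical input, not a measured Tesla figure. It also shows why adding capacity under the same log stream only pushes the failure date out linearly, while more free space helps through lower amplification.

```python
def estimated_lifetime_years(
    capacity_gb: float,
    pe_cycles: int,
    host_gb_per_day: float,
    write_amplification: float,
) -> float:
    """Rough flash endurance estimate; ignores spare area, retention and temperature."""
    total_nand_budget_gb = capacity_gb * pe_cycles           # what the cells can absorb
    nand_gb_per_day = host_gb_per_day * write_amplification  # what the workload costs
    return total_nand_budget_gb / nand_gb_per_day / 365


# Hypothetical inputs: 3,000 P/E cycles, 5 GB/day of logging.
small = estimated_lifetime_years(8, 3000, 5, 4)     # nearly full chip, amplification of 4
large = estimated_lifetime_years(32, 3000, 5, 4)    # 4x the capacity, same workload
roomy = estimated_lifetime_years(32, 3000, 5, 1.5)  # more free space, lower amplification

print(f"{small:.1f} {large:.1f} {roomy:.1f}")  # 3.3 13.2 35.1
```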
The problem is that Tesla doesn't delete logs it has collected from the cars. All Tesla vehicles (as I understand it) upload their logs more or less continually. Currently, the firmware on the cars doesn't delete those logs once they've been uploaded, presumably so that they're available when it is time to service the vehicle.
The disk filling up due to bad log file hygiene, coupled with the fact that the storage is flash-based, together cause the issue.
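Better log hygiene mostly means deleting what has already been uploaded. A minimal Python sketch, assuming rotated log files whose names sort chronologically (e.g. timestamped names) and a hypothetical `upload` callback that returns True only once the server has the file; the newest few files stay on the device for service visits.

```python
import tempfile
from pathlib import Path
from typing import Callable, List


def prune_uploaded_logs(
    log_dir: Path, upload: Callable[[Path], bool], keep_latest: int = 2
) -> List[Path]:
    """Upload rotated logs oldest-first and delete each one the server acknowledged."""
    logs = sorted(log_dir.glob("*.log"))  # assumes names sort chronologically
    candidates = logs[:-keep_latest] if keep_latest > 0 else logs
    deleted = []
    for path in candidates:
        if upload(path):   # hypothetical transport; False means "try again later"
            path.unlink()  # free the space so the flash never fills up
            deleted.append(path)
    return deleted


# Demo against a throwaway directory, with an upload that fails for one file.
with tempfile.TemporaryDirectory() as tmp:
    log_dir = Path(tmp)
    for name in ["2019-01.log", "2019-02.log", "2019-03.log", "2019-04.log", "2019-05.log"]:
        (log_dir / name).write_text("...\n")
    removed = [p.name for p in prune_uploaded_logs(log_dir, lambda p: p.name != "2019-02.log")]
    remaining = sorted(p.name for p in log_dir.glob("*.log"))

print(removed)    # ['2019-01.log', '2019-03.log']
print(remaining)  # ['2019-02.log', '2019-04.log', '2019-05.log']
```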
A simple firmware update to the eMMC chip could make it use a wear levelling algorithm that really does spread writes evenly over the whole chip.
Current implementations simply use a sector until it goes bad, then use a spare until that goes bad, and so on. After 1% or so go bad, the whole device rejects writes entirely.
With a decent implementation (like most SSDs have), you will never be able to kill a card in a lifetime.
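A toy model of the two strategies contrasted above, in Python. It is not how any particular eMMC controller works (real controllers generally do some wear levelling); it only assumes that every write lands on one hot logical location, such as an ever-growing log, and compares pinning that location to one physical block until it dies against always picking the least-worn good block. The block count, endurance and spare count are hypothetical small numbers so the loop finishes instantly.

```python
def writes_until_failure(num_blocks: int, endurance: int, spares: int, level_wear: bool) -> int:
    """Count writes accepted before more blocks fail than there are spares."""
    if not 0 <= spares < num_blocks:
        raise ValueError("need at least one more block than spares")
    wear = [0] * num_blocks
    bad: set = set()
    active = 0  # naive strategy: the physical block currently backing the hot data
    total = 0
    while len(bad) <= spares:
        if level_wear:
            # Spread writes: always pick the least-worn block that is still good.
            target = min((b for b in range(num_blocks) if b not in bad), key=lambda b: wear[b])
        else:
            target = active
        wear[target] += 1
        total += 1
        if wear[target] >= endurance:
            bad.add(target)
            if not level_wear and len(bad) <= spares:
                # Remap the hot data to the next good (spare) block.
                active = next(b for b in range(num_blocks) if b not in bad)
    return total


naive = writes_until_failure(num_blocks=10, endurance=5, spares=1, level_wear=False)
leveled = writes_until_failure(num_blocks=10, endurance=5, spares=1, level_wear=True)
print(naive, leveled)  # 10 42
```

In this model the naive scheme survives endurance × (spares + 1) writes, while levelling survives roughly endurance × num_blocks, so the gap grows with the size of the chip.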
Do you know that they toss the old media units? Are you sure they don't swap the entire media unit to simplify service procedures, and have the old units inspected and refurbished (i.e. eMMC module replaced) for future repairs?
That sounds way more efficient than training all of the shop workers to diagnose and swap out individual parts of the media unit.
Most of the time, the other firmware folks and I were chasing Elon's whims about what to do with firmware. When I should have been fixing critical issues in the system, I was pulled off to do shit like add farting unicorns.
The linked reddit post is from a user who claims to have worked for Tesla in the past, and is (presumably) not using their "real" reddit account to post it.
I don't read it as an insult.
Gotcha. That makes sense. Slashdot was never my thing. Thanks for clarifying; I had read it as bafflingly unjustified aggression.
Not sure of the benefits of an SSD over an eMMC. Guessing the former will have a more sophisticated FTL and other things.
But there's still the problem of a friction fit connection in an environment with a lot of vibration and contaminants. Seems like these would be good for use cases where the part needs to be removable and also have high write cycles, like a dash cam or data logger that has no other interface to a PC.
Also the eMMC interface exposes quite a bit of the underlying hardware (see mmc-utils on Linux). With SD cards I think you're generally stuck with some kind of SMART interface.
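As an illustration of what the eMMC interface exposes: eMMC 5.0 and later devices report wear estimates in their EXT_CSD register (DEVICE_LIFE_TIME_EST_TYP_A/B and PRE_EOL_INFO). `mmc extcsd read` from mmc-utils prints them, and the Linux MMC driver also exposes them as `life_time` and `pre_eol_info` attributes in the card's sysfs directory on kernels that support it. The Python sketch below decodes those two files; the sysfs path varies by system, so the demo uses a fake directory, and this is only an assumption about how one might monitor wear, not something Tesla ships.

```python
import tempfile
from pathlib import Path
from typing import Dict

# PRE_EOL_INFO encoding from the eMMC 5.0 EXT_CSD definition.
PRE_EOL = {0x01: "normal", 0x02: "warning", 0x03: "urgent"}


def describe_life_time(value: int) -> str:
    """Decode one DEVICE_LIFE_TIME_EST byte (0x01..0x0A are 10% steps)."""
    if 0x01 <= value <= 0x0A:
        low = (value - 1) * 10
        return f"{low}-{low + 10}% of rated life used"
    if value == 0x0B:
        return "rated life exceeded"
    return "not reported"


def read_emmc_health(device_dir: Path) -> Dict[str, str]:
    """Read the life_time ("0xAA 0xBB") and pre_eol_info ("0xCC") sysfs files."""
    typ_a, typ_b = (int(tok, 16) for tok in (device_dir / "life_time").read_text().split())
    pre_eol = int((device_dir / "pre_eol_info").read_text(), 16)
    return {
        "type_a": describe_life_time(typ_a),  # SLC-type memory per the spec
        "type_b": describe_life_time(typ_b),  # MLC-type memory per the spec
        "pre_eol": PRE_EOL.get(pre_eol, "not reported"),
    }


# Demo with a fake sysfs directory; on a real board this might be something like
# /sys/block/mmcblk0/device (an assumption, check your system).
with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp)
    (fake / "life_time").write_text("0x03 0x0b\n")
    (fake / "pre_eol_info").write_text("0x02\n")
    health = read_emmc_health(fake)

print(health)
```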
Am I the only one thinking about the recycling and cost implications of doing this?
I'm thinking they would have all the labor of taking it out, and all the labor of putting it in, but then much additional labor and repair time incurred by doing the component level repair instead of just grabbing another radio off the shelf. Recycling would be possible on the unit later and it could be installed as a refurb unit in another vehicle.
That's not to entirely excuse Tesla. If you have a modern operating system running from flash that needs to work for 20+ years as people expect from a car, it needs to be very carefully designed - all logging and runtime data written only to a RAM disk, system and user data on entirely separate partitions, etc.
Note that this doesn't excuse Tesla here, since the situation I discussed is very rare; normally, if that module fails, the vehicle will still start and run. Tesla engineers should absolutely have been aware of this issue. As pointed out upthread, there are multiple tutorials on preserving Raspberry Pi SD cards, and I have trouble believing a competent EE wouldn't be aware of lifetime issues with eMMC. It also shouldn't brick the car: automotive electronics are normally designed very carefully to avoid single points of failure, with fallback routines and safety "limp-home" modes in case of problems.
Wasn't preventing this one of the design goals/selling points of CAN?
Near as I could tell from my scope, the APIM was spamming the bus at exactly the right frequency to interrupt the ECM during its scan of critical sensors. It was an extremely rare failure, and to Ford's credit, they covered both the repair and my shop's diagnostic time.
edit: To make it clear, I have seen 2 vehicles that still operated with a direct CAN bus short to ground, as well as a vehicle that had the CAN bus shorted to 12V+. In these cases, aside from expected failures (such as the BCM systems not responding, or transmission limp-home), modules were able to fall back into either safe states (limp-home, in the case of the TCM) or just a dashboard warning light (in the case of BCM no-comms).
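One plausible way a single module can starve others on a CAN bus, sketched in Python as a toy model. CAN arbitration is bitwise and non-destructive: dominant 0 bits override recessive 1s, so the frame with the numerically lowest identifier always wins. CAN's fault confinement takes a node off the bus when its error counters climb, but a module sending well-formed frames never trips those counters. The module names, identifiers and one-frame-per-slot scheduling are hypothetical; this is not a reconstruction of the Ford fault described above.

```python
from typing import Dict, List, Optional


def arbitrate(ids: List[int]) -> int:
    """Bitwise CAN arbitration over 11-bit identifiers, most significant bit first."""
    contenders = list(ids)
    for bit in range(10, -1, -1):
        # Wired-AND bus: if any contender sends a dominant 0, every node reads 0.
        bus_level = min((i >> bit) & 1 for i in contenders)
        # Nodes that sent a recessive 1 but read back a dominant 0 stop transmitting.
        contenders = [i for i in contenders if ((i >> bit) & 1) == bus_level]
    return contenders[0]  # identifiers are assumed unique, so exactly one remains


def run_bus(
    nodes: Dict[str, int], babbler: Optional[str], slots: int, period: int = 3
) -> Dict[str, int]:
    """Count how many frames each node gets onto the bus over `slots` frame times."""
    pending = set()
    sent = {name: 0 for name in nodes}
    for slot in range(slots):
        if slot % period == 0:
            pending |= set(nodes)  # each node queues one frame per period
        if babbler is not None:
            pending.add(babbler)  # a babbling node always has another frame ready
        if not pending:
            continue  # idle slot
        winner_id = arbitrate([nodes[n] for n in pending])
        winner = next(n for n in pending if nodes[n] == winner_id)
        sent[winner] += 1
        pending.discard(winner)
    return sent


ecus = {"APIM": 0x0A0, "ECM": 0x0C0, "BCM": 0x3E0}  # hypothetical IDs; lower = higher priority
print(run_bus(ecus, babbler=None, slots=6))    # {'APIM': 2, 'ECM': 2, 'BCM': 2}
print(run_bus(ecus, babbler="APIM", slots=6))  # {'APIM': 6, 'ECM': 0, 'BCM': 0}
print(run_bus(ecus, babbler="BCM", slots=6))   # {'APIM': 2, 'ECM': 2, 'BCM': 2}
```

Under these assumptions, babbling only starves the bus when it comes from a high-priority identifier; from the lowest-priority node it merely soaks up otherwise idle slots.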
Thanks for the anecdote!
The car won’t run when the eMMC chip fails, and Tesla’s solution is a new MCU board, which costs $2,700 out of warranty. Not surprisingly, Tesla is not desoldering and reworking just the eMMC chip.
There are any number of components that can disable a car, from the battery to the starter to something with the ignition, electronics, anti-theft, etc.
Sometimes the repair is as simple as a new battery, sometimes the repair is an expensive piece of hardware.
Now, if the article had said that the Tesla Model S has a chip which wears out, that the board holding the chip is expensive to replace, and maybe even asked why Tesla doesn’t push a software update to lower the writes to that chip, I would not disagree.
The article falsely claimed the cars are “bricked”. As it’s the main thrust of the article, it should be retracted.
> However, until the company starts stocking parts like the eMMC chip, as well as release detailed service manuals to the public, Tesla is going to be looking at a number of newish cars dead in a junkyard real soon.
They should stock a chip which is soldered to the board, and what, do reworks? That’s asinine.
Newish cars dead in a junkyard? Totally false. It’s an expensive repair for a problem that could have been avoided, and hopefully Tesla will remedy with a firmware update.
This is the board the chip is on, which Tesla does not offer to replace: https://share.icloud.com/photos/0u78AylGb9fv8QHQnyr9IC3nA
This is what Tesla will offer to replace: https://share.icloud.com/photos/0MVG8edJheaHYiEp7yWKkVePg
Is that what you were imagining?
That a full memory chip, logging data which is never used, should brick a car is simply price gouging.
There is no need for it whatsoever.
Thousands of dollars to replace a board because of a full memory chip? That is even worse than Apple's repair racket; at least in Apple's case some other circuitry besides memory is faulty: https://www.youtube.com/watch?v=2yJKix17yYE
1. Log something they never read back
2. Use a soldered-down chip (they use an SD card in other locations)
3. Crash the media controls when the unused logs cannot be written
4. Disable certain car features, potentially immobilizing the vehicle because the media center is off.
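Point 3 is the easiest to guard against in software: diagnostic logging should be best-effort, so a full or failing disk costs some log lines rather than the UI. A minimal Python sketch; the `write` callable stands in for whatever actually appends to the log (an assumption made for testability), and treating ENOSPC/EROFS/EIO as "storage is full or dying" is an illustrative choice.

```python
import errno
import os
from typing import Callable

STORAGE_ERRORS = {errno.ENOSPC, errno.EROFS, errno.EIO}  # full, read-only, or failing media


class BestEffortLog:
    """Wrap a log writer so storage failures drop lines instead of raising."""

    def __init__(self, write: Callable[[str], None]) -> None:
        self.write = write
        self.dropped = 0  # count to report later instead of crashing

    def log(self, line: str) -> None:
        try:
            self.write(line + "\n")
        except OSError as exc:
            if exc.errno not in STORAGE_ERRORS:
                raise  # unexpected errors are still surfaced
            self.dropped += 1


def full_disk(_: str) -> None:
    """Stand-in writer that fails the way a full filesystem does."""
    raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))


log = BestEffortLog(full_disk)
for i in range(3):
    log.log(f"media unit heartbeat {i}")
print(log.dropped)  # 3

ok_lines = []
ok = BestEffortLog(ok_lines.append)  # a healthy "disk" just collects the lines
ok.log("boot complete")
```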
There are several ways that Tesla could have prevented this issue, and the fact that they've never bothered to resolve the issue in later iterations is just baffling.
I suspect "we can fix it in a software update" lowers the priority of actually shipping this design.
and then add in "autopilot this year" and "model y coming soon", you get indefinite postponement.
There is no bad chip on the board.
A full chip is not a bad chip, and there is no reason why the board should stop working on account of a full memory chip which contains a syslog that is not used for anything.
Hacker News is riddled with shills and apologists for atrocious corporate behaviour.
(Incidentally, I'm not a shill or shareholder, but I'm rooting for Tesla due to the fact that GM pushed for leaded gasoline and Ford would rather litigate than fix dangerous vehicles. We're rooting for the underdog here, hoping for global positive change.)
I don't think this is just me, but I believe cars should be able to last longer than 4 years without getting bricked through the fault of the manufacturer, and if the manufacturer won't fix them, it should at least be easy for me to fix them myself.
 - https://www.motor1.com/features/253277/comparing-new-vehicle...
The article is wrong to say these cars are bricked, but it is not a small issue, either.
Is Tesla not fixing things, or did they just use something that had a short life span? Not sure why I read this article.
What makes this somewhat egregious is the lack of foresight, as well as the fact that Tesla uses standard socketed SD cards in other modules in the vehicle.
Replacement cost of the entire board is unfortunately high at around $2,700.
 - https://teslamotorsclub.com/tmc/threads/2700-to-fix-mcu-migh...
>I think we are on drivetrain #5 and it is now that one clunking intermittently, so maybe we take that one back in. We are on battery charger #2 and battery rebuild #3. We had most of the upgrades (wind noise on the door windows, etc). We had a left rear door handle start to go bad, took it to the shop and they fiddled with it and wanted to replace it, but afterwards no problems so we did not replace it.
Drivetrain #5, charger #2, and battery #3. All under 125k miles. Wow I’m not so sure I’ll be buying a Tesla for my wife.
Time will tell with the Model 3 how reliable it is. A major selling point is the reduced complexity of an EV. The Tesla drivetrain and battery is now widely considered the world's best and most reliable. Tesla has claimed the Model 3 drivetrain can be re-used on the semi and that it's tested to 1,000,000 miles. Take that with a grain of salt, but it's a world away from a 2013 Model S.
 - https://www.greencarreports.com/news/1101153_two-thirds-of-e...
 - https://cleantechnica.com/2018/10/16/tesla-model-3-motor-gea...
EDIT: Also, I don't know how it was for dealers, but 3rd party repair shops just throw their hands up for those systems.
Tech scheduled a service appointment for 4-13, much better than I could find in the app. On 4-4 Tesla pushed a FW update, which I guess is pretty common before a service appointment. On 4-5, a Tesla rep reached out and said the issue WAS, in fact, with the original FW update and that the more recent FW update had resolved it:
“We have reviewed your vehicle logs and alerts. At this time we do not see any issues with the Autopilot system as well as you confirming that the system is operating as designed. Since this is the case there will be no need for you to come in for your scheduled appointment. While there is no way to say that you will never have an issue again, at this time there is no issue that needs correction. If you do have an issue in the future you are always welcome to create another appointment with your concerns.”
“Ok- I’d still like to understand what happened. It is generally acceptable that the universe is getting more disorderly. It is perfectly acceptable that something goes wrong and we don’t know why. It’s significantly less understandable that the issue mysteriously resolves itself. Does that make sense?”
“It was found to be a software bug in the last update that was performed causing your cruise control not to function. That was resolved in the last update you performed.”
The rep then canceled the 4-13 appointment without further explanation or communication.
The issue returned yesterday (5-13) and now the earliest appointment is mid June.
Cruise control existed in 1958. I get that Tesla has “tight integration”, but the lack of a fallback/degraded performance mode seems ridiculous. This is before considering the lack of NoAP and FSD, which adds insult to injury: those features represented nearly 15% of the vehicle cost in my case ($8,000 upfront).
Convenience in LA traffic was a primary consideration when purchasing the Tesla. The lack of access to purchased features is irritating, but I knew they were bleeding edge. The further loss of a ubiquitous feature like cruise control is even more irritating. Tesla’s handling of the situation feels ridiculous.
Call centers and customer service departments in general report statistics that are used to measure their performance. Individual employees are instructed to "resolve" tickets and to make sure that no single ticket takes more than a certain amount of time to be resolved. This creates an incentive to look at the tickets in first-in, first-out order and find a way to close them. An explanation was selected from a list because that explanation has gone unchallenged when addressing similar tickets. The appointment was cancelled and the ticket was recorded as having been resolved by customer service and software. This is a win for Tesla's internal reporting on the number of tickets that require a technician to physically interact with the vehicle. They can brag about most issues being resolved by software because they're a cutting-edge company and are above crude hardware fixes like those other boring car companies. If you show up in person for repairs, that means, in their view, that customer service has failed.
In short, this isn't Tesla being exceptionally bad or exceptional at all. It is an example of Tesla doing the same old thing that all the other companies have been doing.
My worst car ownership experience was yelling at the people at JiffyLube for replacing filters that were in perfectly good condition and trying to charge me for it even though I only asked for an oil change (they then made a big show of shoving them back in as violently as possible). As awful as mechanics tend to be, a component being too easy to repair or replace by anyone who can read a manual is better than begging a single entity for repairs.
At the same time, it is pretty amazing they are remotely logging into your car to review system logs and diagnose the issue, and then pushing firmware updates to try to remedy the problem.
There may be no choice but to drive back to the service center to try to get a quicker response from Tesla. At some point if they can’t fix it there could be a remedy under the lemon law — they only get 3 chances and a certain number of days AFAIK.
I can understand how a single fault could disable the adaptive cruise and likewise all of AutoPilot functionality. But Tesla is expected to be able to fix the issue promptly.
It really isn't if it didn't fix the problem and they used it as an excuse to not follow up with their scheduled appointment.
This would be like if I was fixing a bug, SSH'd into a live instance, didn't fix anything, then filed the ticket as "fixed" and refused to follow up on it. That's not amazing, that's pretty sketchy.
My boss called me around 9pm saying WTF. I started stalling, put him on speaker, then remoted in and fixed the strings (this was on my Samsung BlackJack running Windows Mobile 5, I think). I told him to check again because it was working for me. He gave an incredulous “huh” and said see you tomorrow. I thought I’d gotten away with it. He called me in first thing the next day and told me straight up: I get that things break, but they don’t tend to fix themselves. He gave me a chance to own up, and I didn’t. I was young and dumb. Said I don’t know what happened. A couple of years later I realized he knew. I’ve since learned that it’s almost always better to own up. Even if you’re never caught, most people will respect the honesty. Those that don’t probably have their own issues.
nemosaltat didn't provide a complete chronology, but did say they had a service appointment on 4-13, received a FW update on 4/4, and a rep contacted them on 4/5 saying they reviewed the logs and the new FW fixed the issue.
Then nemosaltat responded to Tesla:
> Ok- I’d still like to understand what happened....It’s significantly less understandable, that the issue mysteriously resolves it self...
> It was found to be a software bug in the last update that was performed causing your cruise control not to function. That was resolved in the last update you performed.
> The issue returned yesterday (5-13) and now the earliest appointment is mid June.
So the issue was fixed, and then it returned. It's unfortunate but hardly sketchy.
The sketchy part for me is the inconsistency between the Service Tech and the rep who canceled the appointment. The tech (HW flaw) was very convincing, and the rep (SW flaw) was very vague. The functionality did not return immediately after the FW update, and I think the timing may have been coincidental. It’s even more frustrating that they weren’t/aren’t the least bit apologetic or accommodating. I can’t seem to get a straight answer on why basic cruise can’t be restored via a FW patch while I wait for my appointment. Is the same hardware/software required for Cruise on Model 3s without NoAP and FSD?
On the surface it doesn’t make sense that a FW issue would be causing an issue with your TM3. What about the other ~200,000 cars on the road?
But to answer your last question, yes, it’s layers of incremental software functionality on the same hardware stack. Without knowing at which point it’s failing it’s hard to say if cruise could be enabled without AP.
As a point of comparison, I had my TM3 wrapped in protective film. As part of that process they removed trim, including the side cameras, to lay the film flat with fewer cuts. The shop (not Tesla-affiliated) left the side cameras disconnected. Upon starting the car, it reported an issue with the side cameras. AP worked but was degraded: it wouldn’t change lanes, and cruise still worked fine. So some hardware faults do not fully disable the feature.
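The layering described above can be modelled as a dependency table: each feature lists the subsystems it needs, and a fault only disables the features that depend on the faulty part. A minimal Python sketch; the feature names and dependencies are hypothetical and are not Tesla's actual architecture, but with these assumptions a side-camera fault leaves cruise and lane keeping working while disabling automatic lane changes, as in the anecdote above, and a camera fault still leaves plain cruise available.

```python
from typing import Dict, FrozenSet, Set

# Hypothetical dependency table: feature -> subsystems it cannot run without.
REQUIRES: Dict[str, FrozenSet[str]] = {
    "cruise": frozenset({"front_radar", "seatbelt_sensor"}),
    "lane_keeping": frozenset({"forward_camera", "front_radar", "steering_torque_sensor"}),
    "auto_lane_change": frozenset({"forward_camera", "side_cameras", "steering_torque_sensor"}),
}


def available_features(faults: Set[str]) -> Set[str]:
    """Return the features whose required subsystems are all healthy."""
    return {feature for feature, needs in REQUIRES.items() if not needs & faults}


print(sorted(available_features({"side_cameras"})))    # ['cruise', 'lane_keeping']
print(sorted(available_features({"forward_camera"})))  # ['cruise']
```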
That’s very interesting, and closer to what I would expect.
The newest development, when I drove in this morning, the Service Tech said it’s a seatbelt sensor malfunction.
There is a “driver seat buckled” interlock for cruise control, but the alert for that explains the reason. I’m just getting “cruise not available” with no explanation.
I think that service tech was grasping at straws, unless he actually examined the logs and saw it written there. If it was the seat belt, wouldn’t you see it indicated on the display that the seat belt was unbuckled? It displays the unbuckled indicator for all seats that have occupants detected (weight sensor).
Presumably the logs will contain the exact reason it fails? Next time they text you maybe ask if they can tell you exactly what the log file says when AP engagement fails.
If they could pull your logs, your eMMC did not fail.
(not that they are handling other issues well)
I guess you'll just have to drive with your phone.