Former Tesla Firmware Engineer Discusses the System (twitter.com)
948 points by swalsh 3 months ago | 555 comments



I used to work at SpaceX on the team that did a piece of software called "WarpDrive". It was a massive monolithic ASP.NET application, with large swaths done with ASP.NET WebForms and a slow frontier of ASP.NET MVC gradually growing when I was working there. This application was responsible for practically everything that ran the factory: inventory, supply chain management, cost analysis, etc. Elon is a big Windows fan and pushed hard to run the whole shop on Microsoft tech. Thankfully the rockets fly with a heavily customized Linux install.


I used to work at SpaceX on avionics software, in a role very similar to OP, and my experience was similar in some respects.

The tech and products were complex. The turnover rate was high and training new hires was a lengthy process. The new projects coming down the pipeline never ceased (this was during a period where FH/F9-1.1/Dragon/Crew was all under design/development and constant iteration).

It was fun for a young engineer, but burnout is real.

*WarpDrive was actually pretty impressive given the amount of stuff that it did.


I heard the same thing about the burnout and turnover. Do companies really think they are saving money by paying peanuts, grinding people down to burnout, and then constantly having to rehire/retrain new people as the old ones leave? Meanwhile the code is a mess because nobody has been there longer than a year and there is no architecture or design continuity. Just frantic patches over other frantic patches by burnt out junior engineers. It makes no sense!


I think it's a case of what is seen vs. what is unseen combined with "you get what you measure."

"We're paying more for programmers than almost anyone else in the industry!" is an obvious thing a bean counter would notice and point out. Productivity is harder to measure, and all that time lost to training on the job doesn't immediately leap out in spreadsheets because it's blended with actual work.


> "We're paying more for programmers than almost anyone else in the industry!"

I can't speak to development specifically, but I looked into one of Tesla's devops openings, and I make slightly more as a mid-level engineer than they offer for a senior position. I also 'only' work 40-hour weeks and my cost of living is about 20-30% lower than there.


This is a real problem. But if burndown charts can be translated into dollars, this might be noticeable.


It can be, and it has been, for academic purposes.

But no employer tracks that, not even in academia...

I can’t find the reference I’m looking for, but the cost associated with turnover due to narcissists is apparently as high as the cost of taking care of ASD. I would imagine the turnover cost associated with burnout that is due not to narcissism but simply to poor, short-term management and vision to be at least on par.


Academia is the worst offender when it comes to overworking junior people like teaching assistants...


“Do companies really think they are saving money“

Well, they landed a rocket on a barge in the ocean, so something about that model must be working right.


Looking at what's been done isn't a good way of determining whether your process is efficient at doing said thing. You need to be able to compare it to something else.

To put it another way: If your method of writing novels is to hire an infinite number of monkeys and put them to work on typewriters, you can't say "Something about this model must be working right, I came out of it with the complete works of Shakespeare!"

They landed a rocket on a barge in the ocean. Maybe with a better process, they could have done that two years faster, for 1/100th the cost, with no burnout. You don't know, and you can't say the model works right just because there's something to show for it.

All you know is that the process is able to eventually land a rocket on a barge. It doesn't tell you whether it's good at it.


True, but they also did that while maintaining the lowest launch prices in the industry and presumably they can now decrease that even further if they need to.

So while it may not be the most efficient process, the overall process is much better than their competitors since they can launch for so much less money.


At the end of the day that still doesn't really answer the question: "Do companies save money by mistreating employees?"

I don't think you can answer that question by just looking at Tesla.


I'm not sure that's quite a fair way of wording that (legitimate) question.

The whole company seems to be operating in "burning the candle at both ends" mode, not just the workers at the bottom. Also, it's not just "saving money" but pushing super hard to accomplish something extraordinary, i.e. generating new revenue, not just reducing costs. Additionally, the workers are partially compensated via stock options, so they share in the success of the company even if not through higher wages alone. So I'm not sure "mistreatment" is the right word to use.

At the end of the day, SpaceX (and Tesla) are not for everyone forever. I am not in a station in life to want to join right now, but may in the future. And maybe this strenuous effort is not especially profitable for SpaceX because of the churn that it creates. But that churn IS helpful for the industry (and thus, in my opinion, society) at large because it has spread SpaceX's know-how throughout the US aerospace community and resulted in alumni founding probably dozens of companies that can leverage the lessons learned from SpaceX. But some people work well in that environment and stay long term (which isn't to say it can't be improved).

So I am glad SpaceX is the way it is, and I hope they're successful in the future. But it also doesn't have to be the model for everyone else to copy. It might not work for everyone else, nor should it be expected to.


”Additionally, the workers are partially compensated via stock options, so they share in the success of the company even if not through higher wages alone.”

I wonder if the constant burnout and churn keeps employees from vesting and thus ever collecting much if anything in stock?


Of course you cannot answer that based on a single company's culture.

I'm refuting the statement that "saving money by paying peanuts, grinding people down to burnout, and then constantly having to rehire/retrain new people as the old ones leave" is unanswerable in the current context based on one company, especially because this company seems to be destroying their competition.


Part of the fun of the monkeys/Shakespeare mot is that it's completely inapplicable to the real world, and thus absurd. The point of the "landed a rocket on a barge in the ocean" response is that whatever methodology SpaceX uses is not in that category -- it's an existence proof that what they are doing works in the real world, and is thus not absurd.

If they have a process proven to work, in a world where they are already doing things no one else has been able to do, changes to that process should be introduced very slowly.


Or maybe that is THE SMART THING: realizing that you need resources to accomplish it, and that throwing money at it will do it for you.


Monkeys have no salary and no rights. It's more scalable and easier to manage than regular employees.

Get a lot of monkeys and put them to work. They will produce something. Better to have something than nothing.


That's a bit like trying to use Apple ][ machines for cloud business.


Although your simile is a reductio ad absurdum, a version of this idea does occasionally get surfaced, in the form of using (modern) small, lower-power processors (e.g. Atom, mobile ARM) in very large numbers in the datacenter.

While touting the purchase cost or energy benefits, these ideas routinely ignore the overhead cost inherent in a distributed system, let alone the Fallacies [1], which is the GP's and OC's (and possibly your) point, I believe.

[1] https://en.wikipedia.org/wiki/Fallacies_of_distributed_compu...


Does paying peanuts here mean—not very top of market? Or?


i assume it means an above median annual salary but a very poor hourly rate. usually these discussions omit a calculation of to what extent the engineers took the job knowing this already and still decided to do so of their own free will, and how much is... not so voluntary.


I think it's the converse of the idiom "working for peanuts", which means to make a relatively small salary (see https://idioms.thefreedictionary.com/work+for+peanuts)


Yes, but I don’t think that’s accurate in this case. Unless the “relatively” part is crafted to be an unreasonably narrow comparison group.


True. From Blind, I think it matches average startup pay. However, that does not account for the fact that most people work very long hours. My brother started off with 80-hour weeks and then came down to 60-hour weeks when he got into his groove at Tesla. So when you account for that, it is 33% below average startup pay (assuming other companies are doing 40), which itself is below big tech pay.


I don’t think other companies are doing 40. In fact, I don’t know anyone making six figures that does 40. (I’m not saying no such jobs exist, just that no one I know works in one of them.)

If other high end software jobs are paying the same for 45-50 that Tesla is paying for 60, that’s on the low side hourly, but the low side of the high end.

75% (45/60) of 150k is still $112.5k plus presumably good benefits and some sort of equity component. That’s damn fine compensation for someone fresh out of undergrad even in 2018.

I wouldn’t want to work 60 indefinitely even for great pay, but that’s a separate issue.


> I don’t think other companies are doing 40.

Maybe I am just lucky but I have worked for 2 of the major big tech companies and came out of college making 100k+ while primarily working 40 hour weeks at both.

> 75% (45/60) of 150k

Tesla new grad software engineer total comp is 150k? Damn, in that case they are pretty close with big tech (Amazon is 145k and G/FB is 165k from what I have heard). I assumed it was lower since my brother was a PM with 6 YOE and got paid 130k a year.

> That’s damn fine compensation for someone fresh out of undergrad even in 2018.

Oh totally, my girlfriend is probably going to make like 60k out of grad school. However, while it is much better than what anyone besides my finance friends is making, that does not mean they are paying well relative to the tech industry.


Sorry if I was being unclear. I have no inside knowledge and just wanted to throw some numbers out there so we didn't continue to talk past each other. If Tesla is paying $120k to fresh CS grads and expecting them to work 60 hour weeks, I'm still not sure I'd say they were "paying peanuts" but it's at least getting there.


Capitalism. Need for money and results. I don't think that anybody wants to burnout anybody, it's just the pressure of the whole system I guess. Money pushes to do things fast. Shareholders and clients to keep happy.


> Shareholders [..] to keep happy.

In my opinion, this is the biggest flaw of the system. For many investors, a company is less about what it makes and more like a process to grow their money. Even when a company becomes profitable, there's always pressure to make it even more profitable quarter after quarter.


> Even when a company becomes profitable, there's always pressure to make it even more profitable quarter after quarter.

Would you keep your retirement funds in a company that doesn't grow?


Sure, as long as it pays good dividends.

Not everything needs to grow into the sky.


You do get the fact that a company with no growth but paying dividends will be worth (including the dividends you extracted) exactly what you paid for it?

It would be worse than buying a bond: you'll get the risk of equity with the returns of a bond.


Not an economist, but I have tried to keep informed and what you say doesn't seem like it should follow.

(Then again English is not my first language.)


The question of course is whether the high turnover actually is the most efficient way: you're spending far more employee time training people (on the company's high-level approaches, the problem space, and their way around the code base) than you would if you had higher retention.

While you might need to pay more and improve working conditions, would you be more efficient given the individual staff would spend more time being productive?


Burning people out at a high rate is not effective, and persistent crunch is counterproductive. It is less about capitalism and more about wishful thinking combined with a wish to be seen as a tough manager.


I'd say it's the pace in which modern capitalism operates at, particularly in tech. When the rate of production goes higher and higher due to automation and a global outsource-able workforce, the shareholders chase after higher and higher profits.


Tesla can't outsource globally and does not have profits yet. Effectiveness is about whether burnout brings higher profits, not about whether people who are unable to attain them burn themselves out, or those under them, through wishful thinking.


Nothing has changed, and ironically, this whole part of the question - is it better to burn people out and rehire - is a capitalist question, and answered with capitalist goals.


I really don't care if they "don't really want to", they're still actively choosing paths that lead to that, and don't really care.


Former SpaceX Flight Software Engineer here.

Agree about WarpDrive being pretty amazing for all the stuff it did. Although amazing things tend to just clump up from all the features that you need, and you end up with an app that is hard to manage.


As someone who interfaced with the Warp system (both software and the organization), I agree with @cbanek. The number of features and the custom nature of it are impressive, but for anyone who had to deal with the politics of improving the system, it was a nightmare. There was (continues to be?) an effort to deconstruct the monolith, but it was a very painful process. Motives from different departments were in constant competition. Getting something done in Warp meant calling hours of meetings and getting the ear of a director/VP who would champion your cause -- and an associated PM that would be ready to serve said VP. This process was so backwards that, if the actual technical work that went in didn't burn someone out, the politics sure could.

It wasn't all bad. There were other groups at X that provided pretty amazing tools for people to get things done (thanks, @cbanek and friends!)


Elon nearly brought the newly merged X and Confinity to its knees during a critical period in its development by 1) insisting on the "X" brand when Paypal was more popular with users, and 2) insisting on Windows over Linux despite the protests of his tech team.

Soon after he was ousted and Thiel was made CEO. Interesting to see he's still pushing Windows.

Source: Paypal Wars by Eric Jackson


I read this book too. The insistence on Windows blew my mind, given his "reputation" as an engineer.


The argument (at the time) was that Windows C++ tools were better than anything on GNU/Linux of the time. Also, in most non-CS fields (aka "real" engineering fields ahem) Windows, for better or worse, is still heavily entrenched. I have mixed feelings on this, as I am a Unix junkie, but it requires a lot of arcane knowledge to be effective, which many folk don't have the wherewithal to acquire.


> The argument (at the time) was that Windows C++ tools were better than anything on GNU/Linux of the time.

That's still true today and it will probably always be true unless Microsoft ports Visual Studio one day. It's not a super great reason to choose it as an operating platform though. :\


There are many engineers who actually like Windows. Especially if they're not software engineers. A lot of industrial equipment runs some embedded Windows.


I would risk stating that all non-software engineers have to like Windows because there is no software for them running on Linux: all the CAD/CAM programs, MS Office, etc...


Thinking from first principles, you can reason about why Elon pushed for Windows. Elon is a guy who wants to get things done. On Windows you get linear output for the time you spend on it. On Linux the output is exponential, but relative to Windows a lot of time is required just to get started. Since Elon is a man of output, he did not want to spend so much upfront time. Also, he doesn't want to let go of control, as he is a micromanager.


This comment is complete gibberish.


you know what's non-linear and very stochastic on Windows? an OS update.


I don't think you know what you're talking about


This has to be a parody of a techbro.


Okay, I will give it another try. Elon was doing a PhD and discontinued it when he founded his first company in the '90s. Since computers were not his main interest, he used Windows to get the work done, because Windows is easier to learn and start getting output from. From then on Elon was crazy busy doing lots of other things and never got time to learn Linux. At the same time, Elon wants control over everything, which means at any point in time he wants to understand what the software is doing and even be involved in making changes to it first hand if required. With that background, Elon obviously went with Windows, because he doesn't have to learn Linux and can use the Windows knowledge he already has.


> the rockets fly with a heavily customized Linux install

My jaw hit the floor the first time I heard this. Why Linux instead of an RTOS?? Apparently Tesla's autopilot also runs Linux, which seems like a huge accident waiting to happen (pun intended).


As a person who used to write navigation and control software for autonomous vehicles who occasionally gets downvoted for telling people that you really don't need a fancy RTOS for this stuff, you really don't need a fancy RTOS for this stuff. Linux is a very common platform for highly responsive robotic systems. I promise that their pid control isn't an Electron app.


As someone who worked on RTOS systems and now works on autonomy.

These things frighten me every day. What frightens me even more is the people who work on autonomy without a real grasp of determinism. It's unfortunate that the people with the most high-tech backgrounds applicable to autonomy (PhDs in computer vision, AI, etc.) have never implemented safety-critical autonomous systems outside of a research project that tested some aspect of detection or control in a test environment and only had to work once to get a paper published.

For general robotics Linux is great. But there is an enormous difference between a robot roaming around your house bumping off walls and a vehicle carrying a whole family at 70mph.

Most of the Linux-based systems I have worked with have some form of redundancy, whether it be other chips running Linux or, ideally, ECUs running an RTOS that perform monitoring, gating, and/or some level of safety fallback control. The RTOS-based redundancies are often what provide ASIL-D. Trusting a single Linux processor is what everyone does to get funding, but when you go out and test on public roads with human lives at stake, or start selling a product, you better have some quantitative guarantees other than "It's been fine so far..." That kind of stuff makes me angry.


Having worked several years on critical embedded systems in aerospace, I would tend to agree with you.

But on the other hand, has any Tesla car ever had an accident because of this? At some point, "heavily tested and validated end to end in real-life conditions for years" and "formally proven on a simplified model using reasonable assumptions made by human engineers" become relatively close in terms of how much trust you can put in a system.

But somehow we tend to prefer the latter. I am not sure if this paradigm is still relevant these days.


> "heavily tested and validated end to end in real-life conditions for years"

That seems like a more wordy way to state "It's been fine so far".

> "formally proven on a simplified model using reasonable assumptions made by human engineers"

I didn't know about that. What kind of formal proofs did they do? Did they involve the linux scheduler?


Thankfully, the critical reactive components of the Tesla vehicles do seem to be run by an RTOS -- specifically FreeRTOS [0].

[0]: https://youtu.be/KX_0c9R4Fng?t=8m47s


SpaceX's Linux-based engine controllers at least have a decent amount of redundancy. They're all triple-redundant, and each of the three components consists of two cpu cores running in lockstep that are validated against each other as well.
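(To make the voting idea concrete: a minimal C sketch of bitwise two-out-of-three voting across three redundant outputs. The names and values are invented, and this is only an illustration of the concept, not SpaceX's actual scheme, which also compares lockstep cores within each unit.)

    /* Minimal bitwise two-out-of-three voter: keep, bit by bit, whatever at
     * least two of the three redundant command words agree on.  Purely
     * illustrative; names and values invented.
     */
    #include <stdint.h>
    #include <stdio.h>

    static uint32_t tmr_vote(uint32_t a, uint32_t b, uint32_t c)
    {
        /* A bit is set in the result iff it is set in at least two inputs. */
        return (a & b) | (a & c) | (b & c);
    }

    int main(void)
    {
        uint32_t cmd_a = 0x00F0A5A5;   /* units A and B agree           */
        uint32_t cmd_b = 0x00F0A5A5;
        uint32_t cmd_c = 0x00F0A5A7;   /* unit C has a flipped bit      */

        printf("voted command: 0x%08X\n", (unsigned)tmr_vote(cmd_a, cmd_b, cmd_c));
        return 0;
    }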


Signalling and train control systems typically have two independent implementations - full stack: hardware, os, software, different development teams.

I guess you'd need three separate implementations to achieve some redundancy when you can't just slam the emergency brakes if the systems disagree.

(... For some reason, I find it highly demotivating that another team is doing the exact same thing. Maybe I just want to be a snowflake...)


You're making a lot of unwarranted negative assumptions and setting up a lot of strawmen here.


Which ones?


+1. I wrote some near-flight software for an instrument using Ubuntu with an RT patch on modern hardware, in parallel with another team that took the traditional approach. Not to brag, but our “undeterministic” system ran far more reliably than the real-time one that didn’t have the advantage of modern application libraries. Plus, even though we had that pesky operating system in the way, we ran on blazing fast modern architectures and were actually more deterministic than the slow-as-hell hardware we were benchmarking against.

I’ve seen the same pattern a few other times. Slow, hand-built, rad-hard systems CAN be more stable and demonstrably safer... but that is rarely the case, and the effort required to get such a system right is orders of magnitude greater than using standard “undeterministic” systems. That engineering effort can be better spent innovating and building fundamentally more advanced solutions.

Just my experience. Just my opinion


Having done just enough robotics stuff to have used linux before (though not controlling anything even kind of large), I find this an odd notion. It's not that I think linux would go wrong for this sort of thing often, but when dealing with things like rockets or self-driving cars I'd think you want more assurance than "well we haven't had it be slow yet."

I haven't messed with an RTOS before but have done some fooling around with scheduling on microcontrollers and I can see why linux is tempting for ease and speed. But we're talking rockets and self-driving cars. These things are expensive as hell, can easily kill people, or both. It seems like the exact sort of place you'd want to take the time and effort to be sure.


Agree completely. Hard real time is possible with Linux, we use it for sub millisecond control of Traffic Lights. The only issue we ever hit is proving that the code running is the stuff we expected to run.
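(For anyone wondering what that looks like in practice: the usual Linux recipe is a SCHED_FIFO thread, locked memory, and absolute-deadline sleeps. A rough sketch below, not the actual traffic-light code; the 1 kHz period, priority, and control_step() are placeholders.)

    /* Sketch: a 1 kHz periodic control task on Linux using the POSIX
     * real-time APIs.  Needs root or CAP_SYS_NICE for SCHED_FIFO.
     * Build: gcc -O2 loop.c -o loop
     */
    #include <sched.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <time.h>

    #define PERIOD_NS 1000000L   /* 1 ms */

    static void control_step(void) { /* read inputs, compute, write outputs */ }

    int main(void)
    {
        struct sched_param sp = { .sched_priority = 80 };
        if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0)
            perror("sched_setscheduler");   /* falls back to normal scheduling */
        if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
            perror("mlockall");             /* avoid page faults mid-loop */

        struct timespec next;
        clock_gettime(CLOCK_MONOTONIC, &next);
        for (;;) {
            control_step();
            next.tv_nsec += PERIOD_NS;
            if (next.tv_nsec >= 1000000000L) {
                next.tv_nsec -= 1000000000L;
                next.tv_sec += 1;
            }
            /* Sleep until an absolute deadline so jitter doesn't accumulate. */
            clock_nanosleep(CLOCK_MONOTONIC, TIMER_ABSTIME, &next, NULL);
        }
    }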


>we use it for sub millisecond control of Traffic Lights

Why would anyone need "sub millisecond control of Traffic Lights"?

Traffic lights are mission critical systems, of course, but even millisecond precision should be more than enough, and possibly even 0.5x-1 second precision...


The sub-millisecond precision ensures they can totally destroy your day with short left turn arrows and yellow lights.


Yes, some green phases definitely feel sub-millisecond.


For legal reasons. Regs say that stop cycles have to meet minimum times in each state; if you're even off by a tiny bit, you could potentially challenge it legally in court.


Bear in mind that the software only controls the cycle. The lights are electrically wired so that it is impossible for example to have "GREEN" illuminated in crossing directions. Or so I've read -- I don't work in traffic control software or hardware.


if it controls a ton of them, then small stuff could get them to be intolerably out of sync maybe? I'm just guessing :p


The real question is: why do you need an entire OS kernel to control traffic lights?


To communicate with all the other traffic lights and sensors at the other intersections on the road to optimize traffic flow.


Yeah, I still think that's overkill, simply because the bulk of the computation is done on a remote server anyways. All you really need on the frontend is a TCP/IP stack to send telemetry and receive commands.

If the connection is lost, the exchange can just fallback to "naive" mode.


I guess using off-the-shelf mass market hardware combined with a software stack anyone can design, setup, and implement is way easier and cheaper than a customized solution.


Consider the overhead of maintaining two entirely separate software stacks, with different libraries and controllers, then. And the ongoing costs of discovering your "minimal" hardware can't accommodate a future improvement, compared to just using general compute at a marginal upfront cost and then having everything else be familiar.


Does such a thing exist? I thought traffic lights were only time-based, with no sensors involved.


Depends on the area but yeah many intersections use sensors (cameras, sometimes under the road pressure sensors, etc) to make anywhere from subtle to extreme changes based on traffic patterns. The under the road pressure sensor has been around for decades.

When I was a kid there was one light that, when you drove over the pressure sensor, it wouldn't really do much. But if you backed up and drove over it again it must have registered an additional car coming through and the light would almost immediately go through its light cycle to change. It was really interesting to see!

Nowadays I think it's mostly cameras? We have a light near my home and the left signal will literally never trigger unless someone is in one of the left lanes.


It's not a pressure sensor, but an induction loop. Basically, there is a coil placed on the road that has a small AC current passed through it. When a car (metal) sits on top of the coil, the two "coils" couple, changing the overall inductance. A simple sensor can detect this change.


The current is even able to pass through the tires?


No, the current passes through the coil, and the coil has no physical contact with the vehicle.

Look up inductive coupling. The basic idea is that a changing (AC) current in a conductor generates a changing magnetic field (Ampere's law). This changing magnetic field then induces a voltage in the second conductor (Faraday's law). This is the principle behind how transformers work.
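(Quantitatively, in the usual simplified model the loop is part of an oscillator whose resonant frequency is

    f = 1 / (2π √(L C))

so f scales as L^(-1/2): when the eddy currents induced in the car body reduce the loop inductance L by some small fraction, the frequency rises by roughly half that fraction, and the detector flags a vehicle once the shift crosses a threshold.)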


As far as I understand, it's just electromagnetic waves. No current passing through to the car, but the coil can "register" a change in its magnetic field and can determine that it's a car and how fast it goes


Can confirm. Also interesting to note that motorcycles often have trouble triggering these sensors (a common trick is to stick a heavy duty magnet underneath).


Really interesting! Thanks for the correction!


As a sibling comment noted, it's not a pressure sensor [1] but an induction loop.

The Wikipedia article https://en.wikipedia.org/wiki/Induction_loop#Vehicle_detecti... has fairly extensive details on modern implementation.

[1] These do exist for weight-in-motion systems, however.


An OS has a kernel. TRON [1] equally fits your description. Which is IIRC what traditionally runs on traffic lights.

[1] https://en.wikipedia.org/wiki/TRON_project


> Agree completely. Hard real time is possible with Linux, we use it for sub millisecond control of Traffic Lights.

Out of curiosity, why do traffic light controls need to be that precise?


Venturing a guess -- there's either some X% accuracy standard required by the government for who-knows-what reason, or it's for red light cameras and ticketing systems.

"Ironically, the biggest concern with red-light camera systems is that they are so precise. They measure a driver’s speed and exact location within a fraction of a second — but do not leave any wiggle room for the errors of traffic signals such as inconsistent yellow light times"[0]

If there's not an accuracy threshold for safety reasons, there's gotta be one when traffic ticketing revenue is on the line (also I guess determining fault at accidents, vehicular manslaughter cases, etc.)

[0] https://www.mercurynews.com/2014/06/06/red-light-cameras-how...


I don't actually know, but my guess is time drift with respect to other lights.


resume-driven development?


everything is revenue driven development


he said resume, i.e. working with tech you want to use at your next gig not the tech you need at this gig.


could have sworn i read that right ... ah well

still, revenue driven development actually makes sense as a thing too ...


On a tangent, but still germane to the thread:

Why aren't there efforts to create "autonomous only" traffic management scenarios, where people drive into a given, known area, and the area then takes control of managing the traffic and vehicles? Such that you relinquish control of the vehicle to that area's control system, with your destination stated, and then your vehicle is managed accordingly.

For example, a parking lot for a really large venue with an autonomous valet system.

You drive up and get out, and then the system takes over your car, drives off with it, and parks it, and you recall it when needed...

Or managing traffic in a very heavily trafficked bottleneck of a grid, such as the Bay Bridge merging egress from the SF financial district.

If you put in your destination and join the group, all the cars could then be managed for getting onto the bridge more rapidly...

Autonomy doesn't need to drive me from SF to LA, but it would be great if an autonomous hive mind could get all the cars to increase throughput in given situations, no?


If it were that easy to setup tech-based infrastructure, we would have had positive train control implemented years ago.


Rio Tinto have spent more than a decade on autonomous freight trains.

On tracks they own.

With no other traffic than theirs.

And unlimited funding.


What is Rio Tinto? I'm curious because it is the name of a local city here in Porto.


One of the world's largest mining companies.

https://en.wikipedia.org/wiki/Rio_Tinto_Group


BHP (largest mining company in the world) and Rio Tinto (second largest) are to parts of Australia as Google and Facebook are to the Bay Area.

And much less apologetic about wielding power.


Yep. Probably why Australia doesn’t churn out Tech behemoths. They’re too busy investing money to dig massive holes in the ground.


The dynamics of mine development show interesting parallels with tech, actually.

Lots of mines start from little companies searching for a possible ore body (the idea or market fit), then raising money to perform a closer survey (seed funding). If the closer geological work is promising they often obtain a lease (patents or other IP).

At this point it goes one of two ways. Either they raise enough money to start and operate the mine themselves (series A, B etc, leading to an IPO) or they sell the prospect to a major company.

Then the newly-minted millionaires, who know a lot about mining, invest in the next crop of junior miners.

So as with tech there are conceptual, exploratory, growth and liquidity phases, followed by a process of reinvestment.

I remember realising this when living in Perth and being frustrated that, with quite literally billions of dollars sloshing around the city looking to invest, you'd be hard-pressed to pitch anything smarter than a brochureware website to the local investment class.

There were other structural problems. Stock options are not A Thing for various legal reasons. Failure in starting a high-risk business is a bit of a black mark. There are VCs but so much of their money came from governments trying to jump-start a market that they were about as risk-taking as a loans officer at a bank (what government wants "10 MILLION WASTED ON PHONE APPS" as a headline?).

Meanwhile the super funds are collectively sitting on trillions of dollars[0] and investing an absurdly dumb fraction of it in the ASX. Putting just 0.5% of their holdings into VC would unlock tens of billions of dollars of potential investments.

For which, hey, VCs who lurk here and want to raise a fund: go talk to the Australian superannuation industry. It is a massive pool of underperforming cash languishing in the same dozen public companies and, because Australian law forces all Australians to set aside at least 9.5% of income for retirement, the industry will never stop having incoming funds. There will always be new money to raise[1] and it will probably be the 2nd largest pool of pension investments sometime in the next 10-15 years.

I will accept finder's fees and/or massively remunerative job offers as reward for this insight.

[0] https://www.superannuation.asn.au/ArticleDocuments/269/Super...

[1] https://www.willistowerswatson.com/-/media/WTW/Images/Press/...


You're right. We should all just give up.

I mean, heck. If it were that easy to start a unicorn...

But seriously - I am not saying its easy, I'm just surprised that we haven't made much (publicly known about/announced) efforts along these lines.

I mean, we have TCP down pretty good - if we are simply thinking of cars as packets, a lot of the math should exist to ensure collision-less delivery?


Well TCP doesn't do collision avoidance, it's a link layer thing. And on Ethernet, it is collision detection on the shared medium. Wireless does avoidance due to the hidden terminal problem.

Neither of these models is really analogous to cars on the road.

But applying collision detection and exponential back off in road traffic is a "fun" thought experiment.

A more apt model would be critical sections and semaphores from concurrent programming. Which is named after a collision avoidance scheme used to control trains. And we all know how difficult concurrent programming can be. I don't want traffic with deadlocks, starvation, busy waiting or live locks.
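(The deadlock version of that thought experiment is just circular wait: four cars each lock the quadrant of the intersection they're in and then wait for the next one. A toy C sketch below, nothing to do with any real traffic code; making everyone acquire quadrants in one global order removes the cycle.)

    /* Toy circular-wait deadlock: four "cars", four intersection quadrants.
     * Each car locks its own quadrant, then waits for the next one; with
     * unlucky timing all four block forever.  Build: gcc -pthread deadlock.c
     */
    #include <pthread.h>
    #include <stdio.h>

    static pthread_mutex_t quadrant[4] = {
        PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER,
        PTHREAD_MUTEX_INITIALIZER, PTHREAD_MUTEX_INITIALIZER
    };

    static void *car(void *arg)
    {
        int i = *(int *)arg;
        pthread_mutex_lock(&quadrant[i]);            /* the quadrant I'm in  */
        pthread_mutex_lock(&quadrant[(i + 1) % 4]);  /* the one I want next  */
        printf("car %d crossed\n", i);
        pthread_mutex_unlock(&quadrant[(i + 1) % 4]);
        pthread_mutex_unlock(&quadrant[i]);
        return NULL;
    }

    int main(void)
    {
        pthread_t t[4];
        int id[4] = { 0, 1, 2, 3 };
        for (int i = 0; i < 4; i++)
            pthread_create(&t[i], NULL, car, &id[i]);
        for (int i = 0; i < 4; i++)
            pthread_join(t[i], NULL);   /* may never return -- that's the point */
        return 0;
    }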


> if we are simply thinking of cars as packets

You better hope you don't get any dropped packets.


It's TCPizza delivery! Much better than UDPizza - I never know when they can't locate my MAC.


Oh, we lost a car. No big deal, we'll just send another copy!


That seems to match the right tool (AI driving) to the right job (well-defined, well-controlled situations).

I seem to recall that similar ideas go back to the early 1990s, at least, for highways: Drive your car to the entrance ramp, plug in your destination, and the autonomous system takes it from there.

But for many of these things, such as the Bay Bridge or a highway, it seems like there is a simpler solution: Put the cars on a train and take them across by rail. I suspect I'm not the first person to think of it so I wonder why it's never been done (i.e., what problem I'm overlooking).


I suspect it's never been tried because the cost necessary to get from where we are now to there outweighs the potential benefit compared to more conventional transit solutions, carpooling, etc. Once automated driving gets to a point where it's possible to implement "autopilot-only" lanes (and doing so gets past the sociopolitical hurdles), I suspect those will come into play too, though.


Cars, or more specifically the way "we"* tend to (mis)use them (single-occupancy mode), are a major and exponentially growing world problem.

* Full disclosure: I ride bicycles and only plan to ever live in localities where I won't need to purchase an entire car.

More cars, or even car-friendly tech will never be the solution for the issue of too many cars. See also "induced demand": https://en.wikipedia.org/wiki/Induced_demand

https://www.mrmoneymustache.com/2011/10/06/the-true-cost-of-...

https://carbusters.org/2011/09/08/are-cars-really-our-greate...


I've been picturing the rail problem for some time as well.

Not just for cars, but also for cargo... just have a constant gondola-like conveyor that detaches a platform from the line to slow it enough to allow cargo to get on, then re-zips it back into the line and speeds it along, and de-rails it once it hits its exit/location...

Ideally though, in cities, there would be no surface streets, and all cars would have their own level below that of bikes and pedestrians.

What would SF look like if a superstructure was built above all streets and all pedestrian and bike traffic was moved up there? (sure, SF may be a poor example, so just select [city])

Look at Singapore's vast underground connecting malls between facilities. Those are pretty amazing.

I grade US urban/city planners rather poorly.


> What would SF look like if a superstructure was built above all streets and all pedestrian and bike traffic was moved up there?

The street level would be dark and storefronts would become difficult to access. If the stores moved up to the 2nd floor (a massive transformation of real estate, probably greatly reducing available living space), what would go on the first level? Not many people would want to live in the dark.


It's not a pure hypothetical, large parts of contemporary Chicago were built up a level.

The results are mostly "garden apartments" which are damp, dim, and slightly less expensive.


Underground space isn't unlimited.

Besides, the best integrated transport solution in the world already exists in places like Utrecht, Groningen and Assen thanks to reforms that started decades ago.


Well this would mean collaboration between autonomous car manufacturers to build a common protocol. And this does not fit with their business model of getting massive investment on the grounds of potentially being the first player on the market.

I don't think there is any possibility of large scale autonomous driving without a shared control infrastructure. Autonomous driving will only work as long as autonomous cars are a small minority.

As soon as they stop being in the minority, some shared control infrastructure is necessary.

Case in point: 4 cars arriving in a no-lights 4 way intersection simultaneously will cause a deadlock. A tie breaking scheme requiring some form of communication is necessary.
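(A sketch of what one such tie-breaking scheme could look like, assuming every car broadcasts a claim and all of them apply the same total ordering. The struct and rule here are entirely made up for illustration, not any real protocol.)

    /* Illustrative tie-break for a no-lights intersection: every car
     * broadcasts a claim (arrival time + unique ID) and applies the same
     * total ordering, so four simultaneous arrivals still produce one
     * unambiguous winner.
     */
    #include <stdbool.h>
    #include <stdint.h>
    #include <stdio.h>

    struct claim {
        uint64_t arrival_ns;   /* when the car reached the stop line */
        uint32_t vehicle_id;   /* globally unique                    */
    };

    /* Returns true if a crosses before b. */
    static bool goes_first(const struct claim *a, const struct claim *b)
    {
        if (a->arrival_ns != b->arrival_ns)
            return a->arrival_ns < b->arrival_ns;
        return a->vehicle_id < b->vehicle_id;   /* deterministic tie-break */
    }

    int main(void)
    {
        struct claim a = { 1000, 42 }, b = { 1000, 7 };   /* a dead heat */
        printf("car %u goes first\n",
               (unsigned)(goes_first(&a, &b) ? a.vehicle_id : b.vehicle_id));
        return 0;
    }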


Not just between car manufacturers, but also between the cars and the area control system. Basically, all car manufacturers would have to agree on a common API that allows the control system to take over, with full access to sensors and drive controls. A manufacturer could not simply refuse to implement the API, because this would effectively make their cars unable to use certain parts of the road network.

Even if manufacturers would somehow manage to agree on an API, it could then be "abused" by competitors or accessory vendors to sell their own customized car assistants, which would instantly work with any car brand - without them having to negotiate with the manufacturers.

I fear we will sooner have a usable open IoT standard than manufacturers giving up that level of control.


Airport baggage systems are mostly automated.


But I have to add that we have a real embedded Cortex checking everything, just in case...


So what sort of interlocks do you have on top of that to prevent simultaneous greens?


Traffic signals have hardware interlocks so that can never happen. I don't know if they are using mechanical relays or what, but you can't turn both sides green in software.


Also, an RTOS is susceptible to priority inversion[0] so it’s not necessarily a panacea.

0. which infamously occurred on the Mars Pathfinder. https://www.rapitasystems.com/blog/what-really-happened-to-t...


There are solutions to priority inversion. This is an old limitation that is nowadays even taught in school.

Any decent RTOS should have priority inheritance, which avoids this.

Pointing to this one thing as an RTOS issue isn't really an accurate portrayal of current RTOS capabilities.


That’s good to know. Indeed, the Pathfinder’s OS (VxWorks) had priority inheritance but it wasn’t enabled on a particular mutex and enabling it was the fix.

Priority inversion had been known about since the 70s. Priority inheritance seems to have first been proposed in 1990:

https://www3.nd.edu/~dwang5/courses/spring18/papers/real-tim... (Priority Inheritance Protocols: An Approach to Real-Time Synchronization)

The Pathfinder engineers were apparently unaware of the priority inheritance option available in VxWorks until they had to debug the issue live from a few hundred million km away.
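(For the curious, turning priority inheritance on is typically a one-line attribute on the mutex. A sketch with POSIX threads, which expose the same knob that was eventually enabled in VxWorks; the lock name is made up.)

    /* Sketch: creating a priority-inheritance mutex with POSIX threads.
     * A low-priority thread holding bus_lock gets temporarily boosted to the
     * priority of the highest-priority thread blocked on it.
     * Build: gcc -pthread pi.c
     */
    #include <pthread.h>
    #include <stdio.h>

    int main(void)
    {
        pthread_mutexattr_t attr;
        pthread_mutex_t bus_lock;

        pthread_mutexattr_init(&attr);
        if (pthread_mutexattr_setprotocol(&attr, PTHREAD_PRIO_INHERIT) != 0)
            fprintf(stderr, "priority inheritance not supported here\n");
        pthread_mutex_init(&bus_lock, &attr);
        pthread_mutexattr_destroy(&attr);

        pthread_mutex_lock(&bus_lock);
        /* ... touch the shared resource ... */
        pthread_mutex_unlock(&bus_lock);

        pthread_mutex_destroy(&bus_lock);
        return 0;
    }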


Automotive ADAS systems generally require ASIL-D certification, which is much easier with an RTOS than Linux. I don't have much experience with real-time embedded Linux, but my understanding is that it is very difficult or impossible to certify to ASIL-D. Can someone correct me?


An RTOS helps because the vendor will usually provide the RTOS already certified for ASIL-D applications. The rest of the software components will also need to reach ASIL-D, but getting the RTOS to ASIL-D makes things a little bit easier.


Usually this boils down to the ASIL-D RTOS systems being much smaller, i.e. much simpler to verify, leaving the developers of the systems above them with much more work to verify their parts.

Also, in my experience it might have been easier to reach the ASIL-D requirements using a smarter combination of a limiter on an RTOS and more generic code on something like Linux for the rest of the code. That would probably also end up with more widely used and tested applications, and thus more stability. (That is partly outside ASIL-D.)

Functional safety and ISO 26262 are much misunderstood in automotive development and architecture.

Also, imho the certifications without the safety case are kind of useless. You still have to assess how you will find the problems with it in your use case, which might differ ever so slightly from what they certified. The automotive industry, though, loves to have someone else to blame, e.g. the supplier of the RTOS, the compiler, etc. Using Linux makes the blame game hard.


Agreed. I work on both VxWorks and Linux in the defense industry for a very popular armored fighting vehicle, and despite popular belief, the Linux kernel with the RT patch works well enough that both the cost of VxWorks and the issue of finding developers to maintain it aren't exactly justifiable anymore. Without going into too much detail, there have been a good few studies internally showing that our current fire control unit doesn't need the hard real-time precision it once required on legacy hardware, and all of the Linux ports with the RT patch perform just as well. The biggest hurdle, of course, is not exactly the performance. It's the certification process.


Is that like, you really don't need a fancy RTOS for this stuff 99.999% of the time, although sometimes you do? Or truly, despite the life safety element, there is never any need for an RTOS?


In every application I can think of off the top of my head, but mostly in the ones that apply to Tesla, you truly don't, except when the law or a contract says otherwise. I'm sure there are exceptions for things that don't apply to Tesla (or for that matter SpaceX).

Here's what's inside of every autonomous vehicle ever made: a message-passing subsystem, sensors, fusers, navigation, dynamic control, actuator device drivers, and thruster device drivers.

Sensors measure things and emit readings. Your most expensive, highest frequency general purpose sensors emit new readings at something too fast for a human but hella slow for a computer, like 100Hz-5KHz. Your common sensors, a video camera for instance, don't get even close to that. These sensors are often connected, even today because milspec companies hate modernity, via RS-232 serial cables. For those younger than 30, RS-232 is what non-Apple computers used for non-keyboard/mouse peripherals prior to the introduction of the first iMac in 1998 because USB didn't really take off until then.

Sensors send their readings via the message-passing subsystem to fusers.

Fusers take the readings from the sensors and, hur hur, "fuse" them together into a description of where the vehicle is and what the environment is like. This usually involves something like a kalman filter. Fusing even your very fastest sensors, the 5KHz IMUs of the world, is just a small bit of math and basically takes no time at all.
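(To make "a small bit of math" concrete: the entire per-sample cost of a one-dimensional Kalman update is a handful of multiplies and adds, sketched below in C. The constants and readings are invented.)

    /* Sketch: one-dimensional Kalman filter step.  x is the running estimate,
     * p its variance, z the new measurement, q the process noise, r the
     * measurement noise.  This is the whole per-sample cost.
     */
    #include <stdio.h>

    static void kalman1d_step(double *x, double *p, double z, double q, double r)
    {
        *p += q;                      /* predict: uncertainty grows by q      */
        double k = *p / (*p + r);     /* Kalman gain                          */
        *x += k * (z - *x);           /* update: blend estimate & measurement */
        *p *= (1.0 - k);
    }

    int main(void)
    {
        double x = 0.0, p = 1.0;                  /* initial guess/variance   */
        double z[] = { 1.02, 0.98, 1.05, 0.97 };  /* fake readings near 1.0   */
        for (int i = 0; i < 4; i++)
            kalman1d_step(&x, &p, z[i], 1e-5, 1e-2);
        printf("estimate: %.3f\n", x);
        return 0;
    }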

Fusers send their fused states via the message-passing subsystem to navigation.

Navigation takes the fused sense of self and the world and decides which direction to head and how fast to go. The objective could be something like hitting route waypoints or it could be something like staying in a lane and not being rear-ended and avoiding obstacles. Car navigation probably doesn't act on new input more frequently than 100Hz, you certainly can't act on new input more frequently than 100Hz, and it takes basically no time at all.

Navigation sends its directives via the message-passing subsystem to dynamic control.

Dynamic control takes navigation's "which way" and "how fast" directives and turns them into more realistic short-term goals accounting for hysteresis and other physical limitations of the system like minimum turn radius. This is just a small bit of math and basically takes no time at all.

Dynamic control sends its directives via the message-passing subsystem to the actuator and thruster drivers.

Actuator drivers convert dynamic control's "go more left" message into trying to go more left.

Thruster drivers convert dynamic control's "go more fast" message into trying to go more fast.

Actuator and thruster drivers send readings (hopefully) from the actuators and thrusters, because those are also sensors, back to dynamic control and fusion.

Sensors feed into fusers, fusers feed into nav, nav feeds into dynamic control, dynamic control feeds into actuation and thrust. When you have new data, you do something new with it which is technically doing the same old thing with it and just producing new output.

Now there aren't that many sensors. There are way fewer fusers. There's only one navigation. There's probably only one dynamic control, though there could be a couple.

Anything else that I haven't already described, like Waymo's machine learning object classifying 4D mustache adding hotdog detectors, are just sensors and fusers sitting on their own computers feeding new lat/lng/heading/speed to navigation at a rate that is hella slow for a computer. And for sure Waymo's convolutional neural network middle-out jaywalking yoga mom detector takes a lot of processing, but it's running on its own computer, not competing for resources, and emitting its fused readings at some hella slow for a computer rate.
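(Sketched as code, the control side of that description is roughly this shape: small structs handed from stage to stage in a fixed-rate loop. All names are invented, and a real system would run each stage as its own process exchanging messages rather than one loop.)

    /* Sketch of the sensor -> fuser -> nav -> control -> actuator flow above,
     * collapsed into a single 100 Hz loop for illustration.
     */
    #include <time.h>

    struct reading  { double gyro[3], accel[3], gps_lat, gps_lon; };
    struct state    { double lat, lon, heading, speed; };
    struct nav_cmd  { double desired_heading, desired_speed; };
    struct ctl_cmd  { double steer, throttle; };

    static struct reading read_sensors(void)       { struct reading r = {0}; return r; }
    static struct state   fuse(struct reading r)   { (void)r; struct state s = {0}; return s; }
    static struct nav_cmd navigate(struct state s) { (void)s; struct nav_cmd n = {0}; return n; }
    static struct ctl_cmd control(struct state s, struct nav_cmd n)
    {
        (void)s; (void)n;
        struct ctl_cmd c = {0};
        return c;
    }
    static void actuate(struct ctl_cmd c) { (void)c; }

    int main(void)
    {
        struct timespec tick = { 0, 10 * 1000 * 1000 };   /* 10 ms = 100 Hz */
        for (;;) {
            struct reading r = read_sensors();  /* sensors (IMU, GPS, ...)   */
            struct state   s = fuse(r);         /* Kalman-ish, cheap         */
            struct nav_cmd n = navigate(s);     /* waypoints / lane keeping  */
            struct ctl_cmd c = control(s, n);   /* respect turn radius etc.  */
            actuate(c);                         /* steering + throttle       */
            nanosleep(&tick, NULL);             /* crude fixed-rate tick     */
        }
    }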


Nice high level conceptual model of such a system. What you conveniently ignore is the complexity that is necessarily introduced by hard real time constraints, safety and all the reliable communication required.

This stuff really does get complex. A sensor controller will likely be on multiple cycles internally: one for oversampling the sensor hardware and one for transmitting the (filtered/corrected/calibrated) results. A "fuser", as you call it (never heard that term before), needs to make sure that it never acts on stale sensor information (sensor malfunction, accumulated communication issues). Transmission errors need to be detected. Random bit flips in values that are stored in volatile memory for long time spans need to be checked and acted upon.

Every independent controller in such a system requires some kind of watchdog that needs to be reset periodically. Too many watchdog resets in a row indicate a failure and the affected system must shut down in a defined way. You need ways to deal with any combinations of controllers going belly up and avoid taking unsafe actions. For many systems transitioning into a totally inert safe mode is sufficient, but not always.

All of the hardware must constantly run self tests. That includes periodic CPU and memory tests (both volatile and non-volatile memory) and also all peripherals that are involved. If, for example, a DAC is used to send a signal, the resulting signal must be read back by different hardware to check that the generated voltage is indeed correct.

Manually threading together all these different kinds of cycles and asynchronous events without a RTOS scheduler is hard and becomes error prone. The result is likely less resilient than a preemptively multithreaded firmware.


I had a very mean line-by-line response, but then I deleted it because it wasn't in keeping with HN guidelines. I apologize for having written it even though you'll never read it. Instead I'll just say that literally nothing you've mentioned has anything to do with whether you choose to use Linux. Yes your serial lines will be noisy. Yes you have to write software. Yes you need specific domain knowledge to do it well. Nobody has ever said otherwise. None of that, none, has anything to do with the operating system.

An operating system, real time or otherwise, handles activating processes, IO, and interprocess messaging. That's it. You don't get magical serial line noise clearing pixies with it, and it doesn't make your actuators less drunk.

And...

> Manually threading together all these different kinds of cycles and asynchronous events without a RTOS scheduler is hard and becomes error prone.

You just said "getting input and sending output is way too hard for software on a Linux kernel." That's a crazy person statement. It turns out that Linux is, and has been for a loooong time, very good at doing operating system things like activating processes and interprocess messaging.

> The result is likely less resilient

Saying "likely" here suggests that you don't know what the result actually is. So what are you arguing? What was my statement?

Whatever assumption you're making that says "This. This right here is the reason why we definitely need an RTOS." Just don't make that assumption. That assumption is wrong.


We need to both take a step back here and look at what we are saying. I am working on safety critical embedded software running mostly on small(ish) microcontrollers. It's the kind of environment where you're running either bare metal or an RTOS at best. There is simply no way to run anything else in this kind of very constrained hardware we have.

So to me, "you don't need an RTOS" means that you're running on bare metal. And that would be hard to pull off for the reasons that I outlined above. And I think this is where we ended up misunderstanding each other.

I enjoy the kinds of restricted RTOS environments that we use because their simplicity means that I can get a total understanding of what is going on quite easily.

This does not mean that Linux is completely inappropriate for real time tasks. I am sure that you could analyze and patch the kernel to match pretty high standards (others mentioned patches). Given the relative size and complexity of a Linux system, this is no simple task. But if you run it on appropriate hardware (not your run of the mill x86), I don't see why you couldn't get reliable realtime responses.

But safety essentially means that the software will not fail more often than once every x hours where x ranges between 10^5 and 10^8, depending on the level of safety required. Proving that for a complex system is hard. For example, how do you show that the essentially indeterministic pattern of dynamic memory allocations happening in a Linux system will never lead to memory exhaustion by fragmentation?

I know of no version of the Linux kernel (or GCC, for that matter) that got a functional safety certification. Safety standards are transitioning away from allowing positive track records as sufficient proof that a piece of software meets safety standards. DO-178 now only allows certified software AFAIK, and I expect this to be carried over into IEC 61508 and ISO 26262. This means a regulated development process, pretty strict coding standards, complete test coverage, full documentation, and so on, also for all 3rd-party software. Not sure how this transition is going to play out in practice.

Do you know any ASIL-C or ASIL-D (or SIL-2/SIL-3) software that is running on Linux? I am curious whether anybody managed to get that certified. I know that Linux is running on some class II medical equipment, but then, standards for these devices are inexplicably lower in practice in my experience.


A common distinction is between hard real-time and soft real-time tasks. Hard realtime means that missing a deadline results in a complete failure of the system. A soft real-time deadline means that missing a deadline will not result in system failure. Many soft real-time problems are incorrectly and unnecessarily promoted to being described as hard real-time. This can lead to completely different system architectures and much longer development time.

A stock modern Linux kernel on almost any hardware platform will give millisecond level responses. Much of the old PREEMPT_RT patch set features from the old 2.6.x days for real-time response has been merged to the mainline kernel.

There are lots of problems in software where you are controlling something physical with a control loop from 1 to a couple hundred Hz. Many people assume a hard real-time deadlines are necessary for this sort of system, but through good system design practices it often is not necessary. For example, if something physical must be sampled with very low jitter, let some hardware do the sampling and latch it in a register and then let the software come in with a variance of hundreds of microseconds to get its work done. Once again, write the output to a latched register and let hardware worry about taking the shadow register with very low jitter.

Having worked on bare metal microcontrollers, to various RTOSes, to higher-performance embedded CPUs with Linux, I prefer Linux on higher-performance hardware. Obviously, this isn't always possible, especially in power constrained situations. But with Linux, when you suddenly need to have support for an arbitrary network protocol, a database, a filesystem, graphical output, etc. you can have something together in no time. It is often a monumental effort for such a task when bare metal or with a RTOS. It is often difficult to get the supporting software and libraries to build on an RTOS in the first place.
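(The "let hardware do the sampling and latch it in a register" pattern from the middle paragraph looks roughly like this; the register layout and names are invented, and in a real system the pointer would come from mmap()ing the device rather than the fake_hw stand-in used here.)

    /* Sketch: a timer in the FPGA/peripheral latches the ADC at exactly
     * 10 kHz; software reads the shadow register whenever its ~1 kHz loop
     * gets around to it, and sample_count reveals any missed samples.
     */
    #include <stdint.h>
    #include <stdio.h>

    struct sampler_regs {
        volatile uint32_t latched_value;  /* last hardware-timed ADC sample */
        volatile uint32_t sample_count;   /* increments on every hw sample  */
    };

    static struct sampler_regs fake_hw;          /* stand-in for mmap'd MMIO */
    static struct sampler_regs *regs = &fake_hw;

    /* Called from the soft real-time loop: a few hundred microseconds of
     * jitter here is harmless, because the sample instant was fixed by hw. */
    static uint32_t read_latest_sample(uint32_t *count)
    {
        *count = regs->sample_count;
        return regs->latched_value;
    }

    int main(void)
    {
        fake_hw.latched_value = 1234;   /* pretend the hardware latched this */
        fake_hw.sample_count  = 1;
        uint32_t n, v = read_latest_sample(&n);
        printf("sample %u = %u\n", (unsigned)n, (unsigned)v);
        return 0;
    }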


Do you have a book where you just explain various things in this style? I’d buy it.


Interesting. I guess it's time for me to pick up another hobby!


check out comma.ai for a mostly open-source implementation


That's not the hobby I was talking about.


I think in terms of missiles, so everything needs an RTOS. Or bare metal.


Have you written missile guidance or are you armchair speculating?


Guidance? No. Infrared detectors? Yes.


A Falcon 9 and a missile share a lot of qualities.


Interesting talk you may be interested in: "Who needs a Real-Time Operating System (Not You!)" (2016)

https://kernel-recipes.org/en/2016/talks/who-needs-a-real-ti...


you really don't need a fancy RTOS for this stuff

maybe you don't, but genuinely curious how do you validate/guarantee scenarios then?


I suppose my best questions back to you might be:

1) What concrete guarantees do you think you get from a special RTOS?

2) Which of those guarantees are meaningful to the scenario?

3) How (by what mechanisms) do you think the special RTOS guarantees the things that it guarantees?

4) Which of those behaviors are something that only an RTOS can provide?


Among other things, determinism, since timing can be guaranteed (within a margin). An RTOS will run with consistent timing, which is guaranteed and facilitated by control over task priorities and by checks on whether timings are met. You could probably do that without a hard RTOS, but without any sort of formal guarantee. So it might work all fine and dandy, until it doesn't. "Doesn't" should not exist in a hard RTOS, by definition and by proof.


You don't need an RTOS to know that a system without extraneous background processes isn't doing extraneous background processing. Where precisely do you think your timings are going? You're not also running a Minecraft server on the nav computer. Using Linux doesn't mean you also need to enable stupid PC things like search indexing or seti@home. Navigation isn't resource constrained. The only resource constrained component in the whole system is the computer vision module and it's going to be on its own processors, and the rate at which it can hand off new output is on far faaaarrrr longer timescales than your interprocess communication latency.


Yes, that's one of the RTOS things, but almost always, when you try to pin down what those timing limits actually are, they turn out to be kind of arbitrary, a gut feeling put into a number. There are exceptions, yes, but usually those exceptions only apply to a limited subpart of the system. So using an RTOS for everything is as misguided as not using an RTOS (or possibly even putting those parts into hardware, an ASIC or FPGA) for those small subsystems. The timing you actually get is also mostly up to the scheduler, and Linux can run with a scheduler that gives it similar capabilities to most RTOS systems. It's a blurry grey area at best.
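For example, mainline Linux has had SCHED_DEADLINE since 3.14, which gives a task a runtime/deadline/period reservation with admission control, which is the kind of guarantee people usually mean. A rough sketch (the 2 ms budget in a 10 ms period is made up for illustration; glibc historically has no wrapper, so you call the syscall directly):

    #define _GNU_SOURCE
    #include <stdint.h>
    #include <stdio.h>
    #include <sys/syscall.h>
    #include <unistd.h>

    /* Layout per sched_setattr(2); defined here because glibc doesn't ship it. */
    struct sched_attr {
        uint32_t size;
        uint32_t sched_policy;
        uint64_t sched_flags;
        int32_t  sched_nice;
        uint32_t sched_priority;
        uint64_t sched_runtime;   /* ns of CPU time guaranteed per period */
        uint64_t sched_deadline;  /* ns: work must finish within this     */
        uint64_t sched_period;    /* ns between activations               */
    };

    #ifndef SCHED_DEADLINE
    #define SCHED_DEADLINE 6
    #endif

    int main(void)
    {
        struct sched_attr attr = {
            .size           = sizeof(attr),
            .sched_policy   = SCHED_DEADLINE,
            .sched_runtime  =  2 * 1000 * 1000,   /* 2 ms budget              */
            .sched_deadline = 10 * 1000 * 1000,   /* 10 ms deadline           */
            .sched_period   = 10 * 1000 * 1000,   /* 10 ms period, 100 Hz loop */
        };

        if (syscall(SYS_sched_setattr, 0, &attr, 0) != 0) {
            perror("sched_setattr");  /* needs root or CAP_SYS_NICE */
            return 1;
        }

        /* ... periodic work here; the kernel throttles the task if it
         * exceeds its budget and admission-controls competing tasks. */
        return 0;
    }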


Yes, this is so true. When someone has a truly jitter-sensitive task that needs to run with, say, microsecond or sub-microsecond accuracy, that is not the realm of a high-performance CPU with caches, even if it is running an RTOS. My first question is whether that tight a bound is truly necessary or just an over-specified requirement. If it is necessary, do it in hardware or on a simple microcontroller (e.g., a Cortex-R).


>I promise that their pid control isn't an Electron app.

Given all the wtf stuff in TFA, it might very well be...


> Why Linux instead of an RTOS?

Same reason SpaceX eschews radiation-hardened processors for redundant off-the-shelf cores: supplier competition. There aren't many RTOS engineers on the market; there are many Linux engineers. Once they got over the cost of hardening the kernel, SpaceX found itself at a scaling advantage versus RTOS-based competitors.


Not just that but holy shit some of those commercial RTOSes have major issues. I work in aerospace and we recently used one where the whole system would crash after ~230 days of uptime.

At least with Linux you're getting a system that's been used so much that all major issues like that are ironed out. Nothing beats a few million testers.


248 days of uptime by any chance?

edit: I saw the same on a fleet of thousands of JVMs which hung at 100% CPU after 248 days very consistently. The closest thing to an explanation I ever got was that it was perhaps storing uptime in hundredths of a second (why not ms???) in a signed 32-bit integer, see: https://ma.ttias.be/248-days/ In the end we solved it by restarting with a cronjob between 2am and 4am after 247 days...
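(The arithmetic fits: 2^31 hundredths of a second is 2147483648 / 100 ≈ 21.5 million seconds, which is about 248.55 days.)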


One thing to look at is the sum of AnonPages if you have THP enabled. That was enabled by default after CentOS 6.2. The usage itself isn't an issue, but there is a known memory leak in THP, and the fragmentation can get wedged after a couple hundred days depending on the usage characteristics of the server.
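If you want to keep an eye on those counters, they're exposed directly in /proc/meminfo; a trivial sketch that just prints the anonymous-page lines:

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        FILE *f = fopen("/proc/meminfo", "r");
        if (!f) { perror("/proc/meminfo"); return 1; }

        char line[256];
        while (fgets(line, sizeof line, f)) {
            /* AnonPages / AnonHugePages are the THP-relevant counters. */
            if (strncmp(line, "AnonPages:", 10) == 0 ||
                strncmp(line, "AnonHugePages:", 14) == 0)
                fputs(line, stdout);
        }
        fclose(f);
        return 0;
    }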


Ubuntu Linux has millions of testers, your custom version has only been tested by you and your customers :)


Hmm, doubt that there are millions of folks testing Ubuntu controlling a car.


Unless you patch the kernel very heavily, no.


Millions? For Linux, perhaps billions :)


> There aren't many RTOS engineers on the market; there are many Linux engineers.

There aren't many Linux engineers who have experience with resource-constrained systems or real-time programming requirements.


What about people who make video games? There is a great video on YouTube called "Software powering Falcon 9 & Dragon - Simply Explained" which goes more into this topic.


(I don't have any experience with real-time applications.) What are some things they had to do to harden the kernel?


>Same reason SpaceX eschews radiation-hardened processors for redundant off-the-shelf cores

A devil-may-care throw-caution-to-the-wind cavalier attitude?

>supplier competition

I stand corrected. And I'm never buying a Tesla or property even remotely close to SpaceX launch sites...


I would imagine commoditization generally increases those variables the market values. In this case I presume the variables include reliability, since that stuff is probably used in IoT systems where minimizing maintenance can be considered an asset.

Although I don't know the market. It might be screwed up by some weird market dynamic.


>> Apparently Tesla's autopilot also runs Linux, which seems like a huge accident waiting to happen (pun intended).

Not to worry, the tight control loops that require determinism all run on Arduino boards.


Not sure if you’re trolling or not. Well played


What would be particularly bad about running a real time application like that on an Arduino? Isn't that what they're made for?


Arduino boards aren't designed to operate in a rugged environment such as a car, which is pretty hostile in terms of vibration, electromagnetic interference and thermal cycling.


Arduino is not real time.


Sure it is. It's nothing but a gussied-up libc implementation for 8-bit AVRs. It's as "real time" as you want it to be.


[flagged]


Don't get that worked up, my friend! It was meant as a compliment for the sly sleight; I do have a sense of humor... (oblique, but that depends on the crowd's judgement.)


I have some good news for you. There is a place where that type of humor is appreciated. That place is Reddit.


I recognize jokes are rare here, but that one was borderline at most. Reddit is where that comment would birth a chain of Galaxy Quest references.


The GP's joke was good dry humour that, given the lack of downvotes, was clearly appreciated. Reddit is where you go for low-hanging fruit. So no, I'm going to push back against turbonerd NO FUN ALLOWED types when the jokes are actually funny.


Google's self-driving cars run a modified Ubuntu. Or did, last time they talked about it.

https://www.youtube.com/watch?v=7Yd9Ij0INX0


I'm sure anything embedded is almost nothing like the unmodified stock. The few papers I've read from embedded people suggest that they know how to, and will, strip everything down until things are the way they need them.


"strip everything down" consists in removing packages, especially daemons, you don't need.

You don't (want to) make huge changes to the kernel and libraries codebase tho, even if the changes are meant to remove code you don't need, because testing a heavily modified OS gets prohibitively expensive.

Especially on "modern" embedded from the last 10 years were RAM and storage are not that limited.


Ubuntu and CentOS come pretty naked; what would you strip and why?

I'm asking because I run a few instances with very heavy traffic and have no issues whatsoever. I just added nano and fail2ban, and it has run with no issues for about 2 years now.


Any kernel module that's not required (wifi, graphics, sound, USB, etc depending on application), any security system like selinux, any unnecessary libraries, helper utilities, etc etc.

Basically you rip out anything not strictly required for the task at hand.

Running on embedded hardware is quite different from running on server hardware, disk space and memory are measured in megabytes, not gigabytes...


I once ran an "embedded" Linux on a $10 Marvell SoC. It was a pretty vanilla kernel running a basic Debian install: 10MB out of 128MB of RAM used most of the time.

Obviously you can go much slimmer, but a $10 board is surprisingly capable.


Not necessarily. I'm working with B&R's range of industrial controllers at the moment, which are Atom processors with a few hundred MB of RAM and CF cards up to 32 GB, but still running a traditional RTOS with hard timing guarantees. They have built-in web servers and (basic) web browsers...!


Those two, particularly Ubuntu, are the opposite of naked. Try something like Arch.


Arch? Arch ships packages with debug symbols and docs included, and takes over a hundred MB for just a base install! Alpine is way smaller; base image under 10MB, packages broken apart so you only get binaries unless you ask for more, linked with musl to make it even smaller.

EDIT: This is meant to be a bit tongue-in-cheek, but I seriously do prefer Alpine over literally every other Linux distro I've yet seen for minimalism. Also geared towards embedded-type work.


Fair enough, and you're quite right about Alpine being geared more for embedded. For general use I find Alpine a bit of a pain due to lack of systemd (bring on the hate ;)), and of course lack of docs hurts usability a bit. With regard to debug symbols, I like what Redhat and Debian are doing with an embedded build id linking binaries to separate debug packages.


yeah come on, don't bring Alpine in the mix, we were talking about Ubuntu, not DamnSmallLinux. Alpine is great, no doubt, but it's in a class of its own.


Ubuntu is certainly not known as a lightweight distro.


That's because "lightweight" to Linux nerds is more like "weight of light" than "light of weight".


The Ubuntu message of the day when you log in on a shell runs curl to feed in advertisements. Pretty big attack surface.


? I’ve never seen that. Since what version?



Any substantiated argument why using Linux "is a huge accident waiting to happen"?

Did you know that Linux can handle hard RT, if you use the right hardware (and maybe the right kernel/patches)?


Certification. Many commercial RTOSes are already certified to ASIL-C or ASIL-D, which requires extensive testing to verify that the system will work as designed and that every code path is covered by tests. Products like Automotive Grade Linux exist, but you would have to go through the verification and certification process yourself, which isn't quick or cheap. I'm not even sure it's possible to certify Linux to ASIL-D.

So yes, I'm sure that Linux can work. But it will be difficult to prove to auditors that it will always work.


I'm speculating that he does know. But with all the filters you added (right hardware, right patches, etc.) you would be better off just adding the right patch: -Linux/+<real_realtimekernel>

Besides, if you follow Linux kernel development, you see that the effort is virtually never for real-time but for general purpose.


Any RTOS has the very same hardware restrictions, and often only works on a very small number of (certified/supported) platforms.

There is no reason per se to attribute any downsides onto these "filters".


>Did you know that Linux can handle hard RT, if you use the right hardware (and maybe the right kernel/patches).?

Does TFA (which sure, is for Tesla, but same leadership) describe an environment where people use "the right hardware"?


I'm not familiar with this space, but I'm pretty sure there are hard realtime kernels for Linux. I'd hope they're using one.


Technically, there used to be (still is?) such a thing as a realtime Linux, but not one _for_ Linux (by which you probably meant the userland?).

Regardless. There are many and far better alternatives to Linux for real time applications.


The RT-Preempt / PREEMPT_RT patch still exists. It is slowly being integrated into mainline, as per everyone's wishes, but it is still a thing.


The original RTLinux is also still usable, and it provides its own hard RT scheduler.


I've worked with RTLinux in automotive. It is only used in R&D and testing. It's much worse than INtime (a Windows RTOS) or L4, which are usually used. I've even patched g++ to work on RTLinux, eliminating all the dynamic, unreliable stuff.

People use Linux when they need hardware and driver support, e.g. gigabit Ethernet, FireWire and such. RTOS vendors charge a shitload of money for those drivers. I trust the Linux drivers more than the RTLinux scheduler or libc. But well, recently networking went to hell, so even there they're starting to fuck up.


Pretty much anything that actually needs hard realtime is very likely running on a dedicated MCU/FPGA/ASIC. Linux is vastly easier to use for everything else, though. Navigation or communication doesn't need hard realtime, for example.


NI's cRIO devices use a real-time variant of Linux.

http://www.ni.com/white-paper/14627/en/


Some of the rationale for the flight software design is explained in https://lwn.net/Articles/540368/


Now just hold on a minute. I'll bet you five bucks that Linux is just the brains of the rocket, controlling a network of RTOS-running (or not) microcontrollers (or maybe ARM SBCs).


I've worked on RT Linux before. Works fine.


Ditto. The Xenomai Linux co-kernel does too. Handy for CAN/EtherCAT controllers.


> Why Linux instead of an RTOS?

An RTOS has nice guarantees, and I definitely see the appeal, but on the other hand:

- SpaceX machines receive more radiation than computers on the ground, so computation errors already make the behaviour of the software chaotic,

- The wealth of complex, widespread libraries helps a lot.


They could have used RTEMS (it was designed for missiles). On some level, if you design accordingly, you don't need a real-time operating system anymore; you can get by without it, and in my opinion it's easier to find people to design software for something that isn't an RTOS.


How would you design a complex control software for a realtime application so that you don't need an operating system? I don't see how.


The operating system provides a standardized IO API and schedules processes/threads. If you don't have fancy IO (LCD, hard drive, network, etc.) and multiprocessing, there is no need for an OS.
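Concretely, "no OS" usually ends up looking something like this superloop sketch (the hw_init/read_sensor/compute_control/update_actuator names are hypothetical board-specific hooks): interrupts latch the inputs, and one main loop does the rest:

    #include <stdbool.h>
    #include <stdint.h>

    /* Hypothetical board-specific hooks, implemented elsewhere. */
    void     hw_init(void);
    uint32_t read_sensor(void);
    uint32_t compute_control(uint32_t sample);
    void     update_actuator(uint32_t out);

    static volatile bool     tick_pending;   /* set by the timer ISR */
    static volatile uint32_t latest_sample;  /* latched by the ISR   */

    void timer_isr(void)          /* wired to a hardware timer, e.g. 100 Hz */
    {
        latest_sample = read_sensor();
        tick_pending  = true;
    }

    int main(void)
    {
        hw_init();
        for (;;) {                /* the whole "OS" is this loop */
            if (tick_pending) {
                tick_pending = false;
                update_actuator(compute_control(latest_sample));
            }
            /* optionally: wait-for-interrupt here to save power */
        }
    }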


Sorry, I don't understand your point at all. Even fairly simple looking applications can end up doing all kinds of things at once / in an interleaved manner. Preemptive scheduling is simpler and therefore safer than trying to squeeze everything into some kind of gigantic global state machine.


Yeah I don't even do embedded work or anything real-time and even I know that in applications like this you should probably be using something like Green Hills Integrity RTOS.


I'm amazed that anyone would think that. I've worked on all sorts of embedded systems and worked with many more people who've worked on far more than I have, and RT Linux was very common and suitable for these sorts of things 10-15 years ago, never mind now.


(no judgement) Given that you've never worked in the field, what makes you think that?


It has a long history of use in Aerospace and military applications.


That's a reasonable starting assertion, I guess, but all kinds of things have long histories of use without precluding the validity of using other things. Sometimes things get used because of inertia. Sometimes things get used because people who don't really know the difference say things like "we definitely must use X" (historically X might be a megacorp technology company like SAP or Oracle or IBM) despite Y being just as good or better.

For instance, did you know that Windows XP has a long history of being used in military embedded devices that store user data (on a writeable, obviously, FAT32 file system) where the way you turn them off is to just cut the power? I shit you not. I've seen state-of-the-art Navy-used sonars where the internal computer was running Windows, and you would transfer data off of the internal hard drive by FTP over Ethernet, and it had no on/off switch, just power or no power.


I worked on some .NET modules that were indeed used in a rocket installation, so yep, both Linux and Windows run up there.


Eh, RT-Preempt works fine.


Given the many ASP.NET software packages that handle all the backoffice functions you've described, it sounds like they chose the right platform for that.

Why would you want to run backoffice on Linux and then re-create all those wheels by hand in-house? Relying on the expertise of other companies for basic backoffice systems is actually recommended practice until you become big enough to actually need custom software (generally, north of ten thousand employees).


Very little off-the-shelf software was in use. The whole thing was custom.


Custom as in customized Oracle/SAP, or custom as in from-the-ground-up custom?

The former is generally a given for a company of any significant size (employees or business activity). The latter is unheard of for most backoffice functions (other than specialized accounting and finance functions) since it's a waste of money and would place the company at significant legal and regulatory risks--it would require effectively becoming experts in accounting, HR, etc.


In my opinion, once you start to need heavy customization of an off-the-shelf ERP solution (i.e. because you do not fit in the vertical the system is designed for), you are generally better off writing the whole thing in house. And if you want to do it as a web application, then ASP.NET with WebForms is one of the more productive approaches to that problem.


As someone who now works in the "backoffice" I would say that building backoffice solutions from scratch is the height of hubris if your primary business doesn't involve those backoffice functions. There's a million small things on the compliance side that need to be addressed, and which the incumbents know about and already handle.

This applies even if you need heavy customization. In fact, it applies even more--since that level of customization usually means sufficient complexity of backoffice needs that only the pre-built service providers will have the sufficient depth and scope to cover you.


There aren't actually that many SAP/Oracle-like packages in .NET.

Even Dynamics isn't .NET.


There seems to be a misconception that enterprise (specifically where software is not considered a product) means Java and .NET with frameworks. The reality is that this is just the tip of the iceberg: a bank, a hotel chain, etc. might use Java or .NET for web applications, yet a great deal runs on proprietary software. Much of that software either runs on machines that are still sold and marketed as mainframes or, if on commodity hardware (which nowadays offers plenty of vertical scalability in terms of memory and total number of CPU cores), comes from a mainframe lineage.

It seems as if there were two distinct cultures of engineers. The first worked on workstation-grade hardware networked over TCP/IP (whether running proprietary UNIX, open source UNIX, or Windows NT) -- and Java emerged out of this.

The second culture was developers building mainframe applications; usually they would be the ones working on problems related to data processing, planning, and automation for businesses (not just enterprises but also many SMBs, government organizations, hospitals, etc.).

Java clearly emerged from the first culture, being built by a vendor of networked UNIX workstations. Some of Java's most memorable failures - either exceedingly complex and brittle systems like RMI, JMS, and J2EE (I mean this literally: not modern Java EE like Jersey/CDI/etc... but EJB 2.0) or features that were in retrospect far ahead of their time (JINI or JXTA; compare with consul/etcd/zookeeper and the idea of a service mesh today) - came as an attempt to commoditise, as frameworks, approaches commonly used by the first group for solving the domain-specific problems of the second.


This reply really exemplifies clear thinking. It is likely you will fit the role of a solution architect, rising above the menial arguments that frequently occur between developers.


What's wrong with ASP.NET? It might not be the most sexy framework out there, but it's great for this sort of thing - there's a huge business application ecosystem surrounding it.


While that may be true, a huge monolithic system that does everything is seldom great.


Monolithic does not always mean unmaintainable, though.


The majority of the professional world runs on Windows. There are many advantages:

(1) Stable platform that's backward compatible over long periods of time.

(2) Very good rapid application development tooling, e.g. Visual Studio which is probably still the best IDE overall.

(3) A huge trained developer base making it easy to recruit. Same goes for IT personnel.

(4) A huge pool of software, custom dev firms, etc.

(5) Certification for US DOD and other certification-heavy environments where Windows is used heavily, which may be important for an aerospace company.

(6) Integration with everything in the business and government world is already done.

(7) Windows has a lot of complex user, permission, and policy management stuff. Active Directory is The Standard for UAM in the corporate world.

The cost of Microsoft licensing is chicken feed compared to the cost of building and launching rockets.

Overall I don't think it's a bad decision. Not everything is an Internet startup or hacker project. Right tools for the job.


I don't agree with most of it (except that Visual Studio is a really good product). Windows is a consumer platform for playing games and having fun at home. It is also very expensive in the corporate/server space. I have seen backwards compatibility broken many times (drivers no longer supported), it doesn't even support architectures other than x86, and it is bloated.


If we're talking about more than 10 years ago, Visual Studio was the only IDE for C++, and C++ was the most common language.


Why do you think that is?

I've always wondered why there's such a small list of decent IDEs for C anything.

I usually just stick to vim and a handful of plugins.


For a long time the tools were either proprietary or GNU. Sadly, gcc worked well enough that its policy of not exposing anything that could be abused by proprietary software meant that any tooling that surfaced had to work around the free compiler. Remember the day emacs got full-featured refactoring support based on GCC? Stallman singlehandedly killed that for exposing too much. Good C++ tooling only started to turn up when Apple made its move from gcc to llvm/clang and we actually got competition in the free compiler space, plus a compiler-based framework to build tools on top of.


There are no good development tools for C++ because it's an impossibly hard language to parse and process.

For instance, try writing an auto-completion tool for C++ and for Java. The first one takes orders of magnitude more work for a mediocre result.

Using vim is a symptom of the problem. The available tooling is so bad or non-existent that it's comparable to a text editor.


Didn't NetBeans exist?


When Musk became CEO of PayPal, he tried to switch the servers from Unix to Windows. Of course the founders revolted and fired him for such a dumb and wasteful idea.


> This application was responsible for practically everything that ran the factory: inventory, supply chain management, cost analysis, etc.

They could have bought something off the shelf. Not sure what the value was in building everything from scratch.


What like SAP? And then spend 10s of millions in customizing it for their needs and dealing with an arcane, unintuitive interface? I've seen this over and over again.


I understand SAP's business model: common core business logic code that helps your legal compliance, then lightly customized at great expense.

What I don't understand is why the UI sucks so very, very much every single time. And why it's so very, very slow. It seems like it has to be on purpose. Can anyone with insight explain it to me?


Different priorities. The goal isn't to make beautiful software. The goal is a system flexible enough to meet the users' needs.

I also think it is cultural. See SAP's blog entry "Why users might think their SAP user interface is crumby": https://blogs.sap.com/2015/09/15/why-users-might-think-their...


Also see: JIRA


Maybe all of the contractual obligations for existing products were not a good idea


>Elon is a big Windows fan and pushed hard to run the whole shop on Microsoft tech

Wow - this is surprising to me and wasn't mentioned in the biography - any ideas why?


This is discussed at length in the first chapter of Founders at Work. The chapter is written by Max Levchin (co-founder of PayPal) and discusses his bitter feud with Elon Musk over Musk's desire to convert systems over to Windows. Interestingly Musk is never mentioned by name.

https://www.amazon.com/Founders-Work-Stories-Startups-Early/...


This was also in Eric Jackson's history of PayPal. Jackson was a marketing guy, so had no technical dog in the fight, but he wrote this particular fight up.

Jackson was the guy who realised that PayPal and eBay had massive synergy and worked super hard to get PayPal in there. Eventually leading to eBay buying them out, and Musk and Thiel going from merely rich to actually billionaires.


Yeah, it was (in the Ashlee Vance book). He chose Windows over Linux when developing PayPal back in the day because the tooling on Windows was far more advanced (the Visual Studio IDE), due to the parallel games industry driving development on that platform.


Some of the known hardships of working in the games industry seems to pervade present-day Tesla and SpaceX, particularly working over-time to get things done "in time". The upshot of SpaceX (and maybe other Musk ventures) is that devs are at least building things slightly more tangible than just pure entertainment.

Musk has even admitted as much, that he prefers game developers. Maybe he sees the parallel for working overtime and uses this "perk" to his advantage? From an article from 2015[0]:

> "We actually hire a lot of our best software engineers out of the gaming industry," said SpaceX CEO Elon Musk, when Fast Company posed this question during the May 29 Dragon V2 unveiling. "In gaming there's a lot of smart engineering talent doing really complex things. [Compared to] a lot of the algorithms involved in massive multiplayer online games…a docking sequence [between spacecraft] is actually relatively straightforward. So I'd encourage people in the gaming industry to think about creating the next generation of spacecraft and rockets."

[0]: https://www.businessinsider.com/why-is-spacex-at-a-video-gam...


Probably related to his use of Windows early on at home and at Zip2, PayPal, etc.


I heard in some YouTube video that one of the first post-merger conflicts at PayPal was about Elon pushing for Windows NT (and presumably MSSQL) instead of Oracle (presumably on Solaris).


In fairness: Competing with Oracle is one of the very few times when choosing Microsoft makes sense.

That felt weird to type.


Because it runs counter to the "epic nerd" persona Musk was trying to build with his authorized biography. Windows isn't cool right now.


When was it cool?


When NT4's kernel was released and Linux was on 2.2, there was a good reason to choose Windows for stability - or at least, there were trade-offs that were acceptable up to Windows 2000. After that, it became a battle of libraries. If you're using C# or other .NET, then you're on Windows (or Mono?!?); otherwise your platform is Linux.

Both are reasonably capable of high service uptimes and solid performance. With Server Core and PowerShell, there's a lot more parity than my fellow Linux admins want to admit, but either is a viable choice for general IT services at this point.

Note - I'm excluding licensing entirely from this, as well as infrastructure maintenance and control surfaces. Nobody likes DSC, and there are several superior config management solutions for Linux that don't have meaningful analogs on Windows.


The thing that made NT4 still "super uncool" for me was needing to reboot when the IP address changed.

Back at that time of NT4 & Linux 2.2, I'd argue Solaris was the best option.


> Thankfully the rockets fly with a heavily customized Linux install.

That's good to know. When I saw the inside of the Dragon capsule, with its shiny touch controls, I was already imagining Astronauts having to deal with installing Android updates on their control tablets while mid-flight, or some other crazy stuff along those lines.


Only naive people blame the language instead of the code written in the language.


Well said, though I think the GP (and many others) blames the platform, not the language.

I find C# and the .NET runtime (before it meets Windows) quite nice. I'm not a big C#-er myself though.


The .NET CLR and core C# runtime libraries are really nice to work with. But things become somewhat Microsoft-y (that is, nice-looking but amazingly half-assed in the most unexpected ways) when you start to do things like writing GUIs.


I've been learning WPF, and I want to shoot myself. It looks pretty, and is very flexible, but it's just so much damn typing. Plus the errors you get out of it are often pretty useless.

Maybe my brain just doesn't get it, but the documentation makes me crazy too. I just hate everything about it.


WPF is a huge change from "traditional" UI frameworks like Windows Forms or Swing. It requires some rethinking and to get the most out of it you should really do things the WPF way in many (not all) cases, even though other options appear to work (they're just more work in the end and less flexible).

The documentation is actually fairly good in my eyes iff you're only writing applications. As a library and custom control vendor (a position I find myself in at work) it can be atrocious and sourceof.net is hugely helpful (and I still find myself wanting to debug framework source code at times).

Still, if you have specific questions, you can throw me an e-mail if you want.


I came to WPF having not touched WinForms for nearly a decade and I fell in love immediately.

Once you accept that reactive data models are the 'correct' pattern things become so much simpler.

Throw in JSON.net and QuickType (amazing if you haven't seen it: feed it JSON or a JSON Schema and it outputs correct code to serialize to/from JSON in about 25 languages, pretty much idiomatically; for C# it uses JSON.net, and for TypeScript it emits interfaces and, if you want it, runtime validation).

It's a remarkably stable way of hoisting an API.


I'd like to know more about this, can you recommend some good places to learn about it?


The actual principle is simple: your classes have private fields that contain the thing, you use get/set on public properties, and in the setter you raise a property-changed event (declared via an interface).

WPF binds to those objects and when you change the thing via a public property the change notification is fired and the UI updates.

The docs you want are MVVM and particularly INotifyPropertyChanged.

https://www.c-sharpcorner.com/article/explain-inotifypropert...


Thanks, I appreciate the offer for help. If I get stuck again, you might hear from me =D

I think a lot of the problem probably depends on the work you're doing. I'm an engineer, usually I just want a GUI that shows me the information I need, I don't care much for design aesthetics beyond making sure it's not hideously ugly. Winforms was good for this, but it definitely looks dated and needed to be replaced or massively overhauled.

I can absolutely see how somebody doing more attractive design work would like WPF. It just makes me grouchy. Somebody else mentioned mixing blend into their workflow, maybe I'll take a look at that.


> but it's just so much damn typing.

Expression Blend saves a lot of typing. Also, if you provide design-time data, it saves time because WYSIWYG gives faster feedback than the change-build-test cycle.

When I work on XAML-based GUI, I open the project in both VS and Blend, and use them alternately.


The documentation can be pretty half-assed too, e.g. it states "x can throw these exceptions: a, b, c"... x throws something else at runtime.


Some languages are unfit for certain purposes. For example, Python is not meant for low-level programming, and C is not meant for writing secure, mission-critical applications.


Furthermore, some languages tend to attract less skilled programmers, in part due to having lower barrier of entry and requiring less domain knowledge for being able to crank out something functional, quickly.


> C is not meant for writing secure, mission-critical applications

What OS kernels are actually used that are written in anything other than C? Plenty of them (INTEGRITY, VxWorks, QNX) written in C are used in secure, mission-critical applications.


> Only naive ppl blame the language instead of the codes based on the language.

A language is a tool, and like any tool it can be badly designed and/or unfit for certain purposes.


> Thankfully the rockets fly with a heavily customized Linux install.

And this is why Torvalds takes kernel API stability seriously...


I can see this being an environment that’s too busy to pay tech debt or put in DevOps processes.


[flagged]


Monolithic in design, massive in size


Or they’re trying to raise it an order of magnitude.


My respect for Elon just dropped 10x (still high) because I know he is a fan of Windows.
