Hacker News
A Look into NASA’s Coding Philosophy (mystudentvoices.com)
247 points by astdb on July 30, 2017 | 51 comments

This article is very much at odds with my 15-year (1988-2000, 2001-2004) experience writing software at NASA (albeit at JPL, which is to say the unmanned space program, but still NASA). In my experience there was just as much politics, marketing hype, and general bullshit as I've seen in the commercial world. It was actually pretty amazing to me sometimes that things worked at all, let alone that they worked as well as they did. I think the real secret at NASA is not that they are better at writing code, but that they write it slowly (development cycles are measured in years) and they test the living shit out of it. Then, once it's working, they don't change it except under extreme pressure. But while it's being developed it's just as much of a sausage factory as anything else.

As I wrote when this showed up on proggit,

Not to put too fine a point on it, but I would be very skeptical about using NASA as a good example for much of anything, but especially software development.

My bona fides: I'm 49; have spent roughly 25 years as a systems programmer, systems administrator, and general bit wrangler; and I worked for 8 years at the Marshall Space Flight Center, specifically the NASA Enterprise Application Competency Center.^1 (And yes, that's a thing. Started as IFMP (Integrated(?) Financial Management(?) Program), and was IEMP (Integrated(?) Enterprise Management(?) Program) when I started.) If you're a NASA employee, you might recognize STaRs (I worked on the rewrite, post Perl 1 and Monster.com), NPROP/Equipment, or DSPL/Disposal. And IdMAX, which I noped out of shortly after moving to the project.^2

NASA itself is a massively dysfunctional organization, in my experience, and a failure to "cut through the bullshit" is a major reason why. For software development specifically, while I didn't do anything with "man-rated" development or the other important bits, I have strong doubts that they are any better than other avionics, automotive, or other embedded development organizations.

There was no mentoring. People tried; it didn't go over well, usually with the mentees.

You have to trust each other's potential, because there is no damn chance of getting any two projects to agree on anything. What goes on in that other silo is their business, not yours.

I did say, "I don't understand." A lot. Frequently pronounced "WTF?"

The list of "unreliable sources of knowledge" looks rather like a checklist of how things got done.

^1 NEACC is also the acronym for the North-East Alabama Community College, which I find ironic for no good reason.

^2 Unfortunately, I don't have a picture of me looking arrogant. Sorry.

> NASA itself is a massively dysfunctional organization, in my experience, and a failure to "cut through the bullshit" is a major reason why.

I completely agree with this, though I think it's important to note two things. First, despite the dysfunction, they do, more often than not, make things that work under extremely challenging conditions. And second, a lot of the dysfunction is not their fault, but is a consequence of the fact that they are a government agency and hence ultimately answerable to Congress. Not only that, but they are a government agency created to fight a (cold) war that ended almost thirty years ago, so they have been rudderless for a long time. The fact that they've done anything other than suck up taxpayer dollars is a testament to the incredible skill, both technical and political, of many of the people in the agency. Notwithstanding my vocal criticism of the organization, I have tremendous respect for many of the people who work there.

I will copy and paste my reply from your last thread:

NASA, like many large organizations known for prestigious accomplishments, can struggle to meet the needs of those doing the enterprise work (like internal HR apps as you mentioned). Especially in a place like the government, people's egos feel small, and they try to compensate by making their projects bigger, grander, and overstated. This just results in bloat, too much talking, and BS as you mentioned.

I'm sorry you had that experience. If you ever consider coming back, I'd recommend getting involved with something more directly related to space applications.

I never knew what code post-80s NASA regulars actually wrote. When the MCC was moved off the mainframe (MOC), it was all contractors. A crazy mix of skillsets in C/C++. A lot of subsystems/silos. Working across silos, the brain trust you could meet from the old crowd (often retirees brought in for 2 AM tests) was incredible. I learned a lot of kids were afraid to say wtf. These old coots would love to talk for hours if you asked why something was done that way.

Considering the TRAJ assembly probably went back to DEW Line, I have no idea how they peeled that logic off. I still run into LockMart folks running ancient ATC code on Unix variants that have been out of service for over a decade.

Our code will never last that long.

I've also heard first-hand stories, and there have been multiple comments on HN regarding NASA.

As best I can tell, the big difference in perception comes from people who worked on code for manned flights, or anything that will interact with an actual human being, versus unmanned and even non-mission-critical code.

It seems that anything that will come close to a human is developed like it's the new Ark of the Covenant, while for anything else it's YMMV based on the project and the team.

NASA has a classification of software into classes A through E. Class A is human spaceflight, class B is critical non-human spaceflight, and the rest is non-critical software.

The amount of process and oversight is highest at class A, and lowest at class E. If you are developing class E, you can do pretty much whatever you want and your team doesn't even need any training in software engineering.

This is completely true, in my experience.

If you want some fun, track down somebody working at the NASA Enterprise Applications Competency Center and ask them about the Office of Education software. (Me, I know nothing, through an exercise of good luck and skill. "No hablo ingles. Tengo hambre. Mucho muerte. <feigns death>")

I'm not sure about that. I worked at MSFC before and after Constellation was canceled and I'm not sure the perception isn't due to believing their own propaganda. From a short distance outside, it didn't seem to be running very well.

(I have a friend and ex-coworker there who was an aerospace engineer, specifically concerning the design of solid rocket motors, who explained why Constellation was not going to match its hype: you can't add a segment to a SSRB to get a bigger motor; it's a complete redesign anyway.)

To me, the point of career comparison is probably more for new programmers (e.g. assigning a mentor) rather than long term...though for a US Federal government job, fifteen years is not particularly long term. I mean the experience you describe would not be a surprising description of Google or Apple or IBM. Nor would it be surprisingly applied to the US Park Service or Harvard. Those features are applicable to pretty much every large institution.

Early in a career can be a good time to try out large institutions, because large institutions tend to have explicit structures for on-the-job training and career development. Early in a career is also a good time for some people to decide "big companies ain't for me" and for other people to decide "this is a good fit." Over time people in the startup ecosystem often become just as cynical as career government employees. I think it's more a function of age and experience and personality than anything else. Bright eyed and bushy tailed doesn't go with gray hair.

Not mentioned: Some software at NASA gets "human in the loop" testing. I work with GSFC, and that makes a lot of things difficult and very slow.

> ... they test the living shit out of it. Then, once it's working, they don't change it except under extreme pressure.

I wonder if self-driving car manufacturers will apply this same philosophy.

They're gonna have to. The first self-driving fatality (that isn't the fault of the driver or nearby drivers) is going to very severely impact any self-driving car manufacturer's brand.

Hasn't a Tesla self-driving system already caused a fatality when the system failed to identify a truck approaching from a perpendicular angle?

As far as I know, all investigations into Tesla self-driving fatalities didn't find the system at fault.

The author is young, and has been there for 3 years. I'm guessing this is anecdotal, and he happens to have had a good experience. Always possible things have improved though?

The fundamental truth that must be considered of any article like this is "YMMV". (Your mileage may vary)

Especially in a big organization, things are different depending on the division you're in and where you work. I work at a large .gov institution and am only here a decade later because my leadership team in the early days were great bosses and amazing mentors. They taught me a lot on both a professional and a personal level. I aspire, and in many ways fail, to be the same.

I look at other parts of the organization and see great places, as well as divisions where employees are treated like crap and are demoralized to the point of depression.

Working in a "good part" of a big company or institution can be amazing.

This is one of the best-chosen comments in the thread. The OP has encouraged commenters here to make generalizations about "NASA" software, but this does not make sense. It's too big, too dispersed, and the software developed is in a big range from research codes to operational codes to human-rated (see upthread: https://news.ycombinator.com/item?id=14886727).

In short, s/w practices and the dedication and capabilities of the people writing the software vary too much to characterize at that level of granularity.

I don't think you contradict each other. You just seem to focus on different sides of the NASA development process.

Does that mean that as long as you test the shit out of it, it doesn't matter how it gets done?

No, it still matters. But exhaustive testing can produce reliable products from unreliable processes. The extreme example of this is evolution itself.
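A toy illustration of that claim (the function and test harness here are my own invention, not anything from NASA): an implementation of unknown pedigree can still be trusted once it has been checked against an oracle over a large number of randomized inputs.

```python
import random

# An "unreliable process": a hand-rolled insertion sort someone hacked together
# with no review and no standards.
def hacked_sort(xs):
    out = []
    for x in xs:
        i = 0
        while i < len(out) and out[i] < x:
            i += 1
        out.insert(i, x)
    return out

# "Test the living shit out of it": compare against a trusted oracle
# (Python's built-in sorted) across thousands of random cases.
random.seed(0)
for _ in range(10_000):
    case = [random.randint(-100, 100) for _ in range(random.randint(0, 20))]
    assert hacked_sort(case) == sorted(case), case
print("10,000 randomized cases passed")
```

The process that produced `hacked_sort` tells you nothing about its quality; the testing regime is what earns the trust, which is the point being made about NASA's style.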

Pretty sure that if repeated tests reveal aspects of the program to be deficient, the "how it was coded" will come out the proverbial wash.

That seems a reasonable position - yes.

Reasonable if you ignore maintainability and standards compliance.

Not a coder, but if you used an idiom that worked only because of a compiler quirk, that could lead to problems down the line; that's a maintenance issue per se, hence why I added the second part.

I doubt NASA would ever, ever upgrade compilers on software once tested. If it is like aviation, there might even be manual passes over the assembly code for further verification.

I was thinking in a historic archiving sort of situation, bringing to mind recent stories of NASA code being unearthed. Basically I was expecting "it passes all the current unit tests" to be an insufficient standard for NASA coding.

In any mission critical situation the entire tool chain is just as locked down and controlled as the "source code," even down to the silicon.

The ISS laptops run on commodity hardware and software, often undocumented. From my interview with that team, the notion that they have some fully deterministic system set up for their operations is horrendously false, and they've run into many issues because of it.

As an aside, is JPL really NASA? Whatever happened with the lawsuit about HSPD-12 badges?

JPL's status is unique. It is technically an FFRDC (federally funded research and development center) whose money comes from NASA. It is considered a NASA center, but it is unique among NASA centers in that its employees are Caltech employees, not federal government employees. That's why I added the prominent disclaimer. Whether or not it's "really" NASA is somewhat in the eye of the beholder. But they are definitely not a private enterprise like SpaceX.

If a lot of process is applied to verify the robustness of the code, then to me the overall development process is not a sausage factory.

> This article is very much at odds with my 15-year (1988-2000, 2001-2004) experience writing software at NASA (albeit at JPL, which is to say the unmanned space program, but still NASA). In my experience there was just as much politics, marketing hype, and general bullshit as I've seen in the commercial world.

An institution's philosophy is almost always bullshit. Whether it is Google's "Don't be evil" or North Korea's "worker's paradise" or our own "capitalist democratic paradise".

The media isn't a fount of journalistic integrity. The government isn't a fount of "goodwill towards its citizens". The military isn't "defending the peace".

It's sad but the more you work and the older you get, the more the facade crumbles. At the end of the day, it's just fallible humans trying their best to get by.

Here's a paper[1], written a few years ago for a space conference in Montreal, on DO-178 standards and applying the FAA's certification process to space systems development. DO-178 sets the bar for "testing the shit" out of software; I've seen projects held up for years proving a system to its requirements, because it's like approaching a limit f(n) -> L, where f(n) is the software development cycle and L is getting a non-bought DER to sign off on it.

[1] http://articles.adsabs.harvard.edu/cgi-bin/nph-iarticle_quer...

DO-178B was one of the best things to happen in software assurance. It was a non-prescriptive method of certification that forced precise requirements, design, and code, with reviews of each and traceability between them. The high cost of certification and certification failure led to a booming ecosystem of supporting tools (esp static analysis or runtimes) and pre-made components (esp RTOS's & middleware). Some companies even specialized in handling the administrative overhead. The result was that certified systems usually had very high quality. Meanwhile, there were people arguing on HN and other places about whether a certification could ever be useful for software.

Anyone reading the above paper should skip to the summary to get a high-level view of the overhead and resulting benefits. The author of the paper agrees with me that the certification is beneficial. The gist is you keep things simple, think about everything in lifecycle, test everything in lifecycle, throw every analysis tool you can at the design or code, esp take your time to develop the thing, and document all of this in a way where any aspect is easy to verify by a 3rd party. That's how high-assurance systems are done. They usually work, too. Some work for a long, long, long time. :)

Applied very narrowly to the right problem, maybe.

But I have also seen DO-178B to lead to a completely non-maintainable mess. That mess helps trace each statement to a requirement, but leads to code that is unreadable to anyone used to normal code.

It is not that "no extra code" and exactly meeting requirements is a bad model in itself. However, writing code in a way that makes it convenient to prove the above encourages unnatural coding styles and programmatic contortions. YMMV however.

This happens a lot. IMO, the best method to mitigate this is to design and write your code base to the system requirements rather than try to perfect the low level requirements before writing any code.

You can then reverse-engineer the low level requirements from source, deriving your verbiage from the system requirements and logic from the actual code. This way your low-level/unit testing is precise and verifiable while also ensuring strong traceability up and down the chain.

If there ends up being low level requirements/code that don't trace up to the system for some reason, then either there's been a disconnect between design and execution or, if not, you make the case for a derived requirement.

I agree that the left-hand side of the Vee has to be seen as a two-way street. As we develop the code we gain experience and insights which enable us to elaborate and refine our requirements. The requirements then act both to guide the development of the system and to document the lessons that have been learned during development. The outputs of early requirements elaboration phases are then draft, rather than final, requirements sets.

The key, for me, is the tooling that we have around maintaining requirements and the traces between them and the code. I think that it should be possible to navigate traces and update (low level) requirements and test specifications without leaving the text editor. If it takes 5 minutes to open DOORS and to find the relevant requirement, then you aren't likely to keep it as up to date as you should. The same is true if the requirements are stored on a spreadsheet in a document management system.

Some ALM/IDE tools may enable you to manage things in a better way. From what I have seen, mbeddr is a really interesting experiment along these lines. I also suspect that Visual Studio and Eclipse have some pretty powerful features in this regard.

In my side project (and purely for my own 'entertainment') I am experimenting with ways of embedding low level requirements into comments in the code (the build extracts them and can update DOORS or some other tool), as well as a generic trace system that allows different classes of traceable item to be defined and indexed. The build can enforce requirements coverage in the same way that it enforces test coverage, and because the text and metadata of requirements is easily accessible to the build, we can use NLP tools to enforce the use of restricted natural language or DSLs in the requirements text. The ultimate goal being to use machine learning to learn the correlation between requirements changes and code changes so that a change in one can help generate advice and guidance on what needs to change in the other.
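A minimal sketch of the comment-embedded requirements idea. Everything here is my own invention for illustration (the `REQ-`/`SATISFIES` tag conventions, function names, and demo source), not the poster's actual side project or any DOORS integration:

```python
import re
from typing import Dict, List

# Hypothetical convention: a low-level requirement is defined in a comment as
# "# REQ-<id>: <text>", and code tracing to it is tagged "# SATISFIES: REQ-<id>".
REQ_DEF = re.compile(r"#\s*(REQ-\d+):\s*(.+)")
REQ_REF = re.compile(r"#\s*SATISFIES:\s*(REQ-\d+)")

def extract_requirements(source: str) -> Dict[str, str]:
    """Collect requirement IDs and their text from comments in the source."""
    return {m.group(1): m.group(2).strip() for m in REQ_DEF.finditer(source)}

def untraced(source: str) -> List[str]:
    """Return requirement IDs that are defined but never traced to any code."""
    defined = extract_requirements(source)
    referenced = set(REQ_REF.findall(source))
    return sorted(set(defined) - referenced)

demo = """
# REQ-101: Reject negative thrust commands.
# REQ-102: Log every mode transition.
def set_thrust(n):
    # SATISFIES: REQ-101
    if n < 0:
        raise ValueError("negative thrust")
"""
print(untraced(demo))  # ['REQ-102'] -- the build could fail on this gap
```

A build step like this gives you the "enforce requirements coverage in the same way that it enforces test coverage" behavior: any requirement with no `SATISFIES` trace becomes a build error to investigate or justify as derived.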

DO-178C has a formal methods follow-on. I haven't seen it, though.

The article severely misrepresents Amdahl's law, in particular in the sentence:

> Though you took half of your entire program, and optimized it to be four times faster, the overall system is only 1.6x faster.

This application of Amdahl's law doesn't tell you anything about what happens when you take half of the program; it tells you what happens when you take half of the execution time. If you only get a 1.6x speedup, it means you optimized the wrong part of the program!

Suppose 20% of the program is responsible for 80% of the execution time. If you speed up this bit four times, you end up with a 2.5x speedup (and not a 1.18x speedup as the article tries to imply).
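The arithmetic is easy to verify; here is a quick sketch of Amdahl's law (the function name is mine), reproducing all three figures above:

```python
# Amdahl's law: overall speedup when a fraction p of the execution time
# is accelerated by a factor s. The remaining (1 - p) of the time is unchanged.
def amdahl(p: float, s: float) -> float:
    return 1 / ((1 - p) + p / s)

print(round(amdahl(0.5, 4), 2))  # 1.6  -- half the *time* made 4x faster
print(round(amdahl(0.8, 4), 2))  # 2.5  -- 80% of the time made 4x faster
print(round(amdahl(0.2, 4), 2))  # 1.18 -- only 20% of the time made 4x faster
```

Note that `p` is a fraction of execution time, not of the codebase, which is exactly the distinction the article blurs.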

Basically this. The article makes it sound like "optimizing half the program" means "optimizing half of the codebase", not "optimizing a part of the codebase that consumes 50% of the execution time."

This quote is especially misleading:

> Amdahl’s Law is telling us that to significantly improve the speed of a system, we’ll have to improve a very large fraction of it.

No, it's telling us we'll have to improve the part of it which is responsible for a very large fraction of the execution time. Often this is only a tiny amount of the overall codebase.

I always found this article(1) about writing code for NASA fascinating. I'm sure a lot has changed in the 20 years since it was written, but it's such a stark contrast to what I'm used to.

(1) https://www.fastcompany.com/28121/they-write-right-stuff

20 years ago:

* 420,000 LOC

* 260 people

* $35,000,000 / year

There's something to be said about doing more with less.

Software is not Modern or Ancient. Software is either Good (i.e. it works as it is supposed to do AND it is easily maintainable and modifiable) or Bad. NASA clearly needs Good software, and indeed the article never mentions arbitrary deadlines.

I'd like to see how this compares to SpaceX, for example.

When it comes to physical components, SpaceX has a habit of selecting an off-the-shelf component and asking the vendor for a modified version with changes that make it capable of living in their application.

That's great for reducing cost (or increasing quality or buying time, they're all related) but in the long term it often leads to spaghetti.

Considering the time pressure they're perpetually under and how Elon is known to run his organizations I would suspect their software products are no different.

On that note, what language(s) are SpaceX, Tesla built on?

In job postings I see a mix of C# and Go. I cannot believe either a Model X or a rocket is rockin' C#...

The software team did an AMA on reddit a while back: https://www.reddit.com/r/IAmA/comments/1853ap/we_are_spacex_...

Basically it's C/C++ (as expected) for critical stuff, like the rocket or dragon, and C# for all enterprise systems. Plus an assorted set of other languages/technologies depending on team and project.

Probably not on the rocket itself, but C# is a very common language for ancillary testing tools in the industrial automation world. I wouldn't be surprised if a lot of their test infrastructure is written in C#.

Pity. I've been using Python a lot for test infrastructure and it fits really well.

The last I heard, from outside the loop, there was considerable concern about how they were going to get man rated.
