Ingenuity had more computing power than all NASA deep space missions combined (arstechnica.com)
104 points by rntn on Jan 29, 2024 | 64 comments



My dad's friend worked at JPL on Ingenuity's software. When I talked to him about it in 2019, he was sure he was going to retire after the rover launched/landed, because he thought all the consumer hardware would probably not work on Mars for very long and there'd be nothing to do after that.

I remember he was ranting about how they used Python. Like they had so much compute power available they could just "waste" it running "slow as hell" Python. It was such a departure from all his previous rover missions where they very judiciously optimized low level code.

When we met up in 2023, I was surprised he was still working. He was too, since he didn't expect Ingenuity to be in service for that long, but he figured, "well, no one's going to train a replacement for me, might as well see my last mission to its end."


I understand that point of view. It's just missing that most of the actual scientists are not top programmers. It is hard to be both. You can hire professional programmers to do some work, but that has its limits, and you will need more active participation from the actual scientists. We usually have this discussion at CERN. The very delicate components, and the code that has to work under harsh conditions and time constraints, go through a more thorough process with help from professional programmers (many of them very capable physicists), but the majority of physicists can work in a world where what matters is getting something done in a reasonable amount of time. Usually that's physics analysis, where the physics and statistics are more important than the implementation itself.

I don't know the dynamics of how this usually plays out at NASA, but I would think the same problem exists there and the approach is probably the same too.


I believe it was the same people: people who used to carefully plan how they'd use their 100 kB RAM budget in C, and who then switched over to mission-of-the-week Python scripts.

The code that runs on Mars was always written by career programmers, not the rocket scientists. The mentality that they didn't have to carefully plan resource use, or even care too much about it, and that if something didn't work they could just ship a patch, was a giant cultural shock for them, because it was so different from how they had written mission-critical software over the last several decades.


Is there anywhere to read about what CI/CD looks like for a Mars rover? Curious what a pipeline for something like that looks like and how they account for bugs, what kinds of conventions they must use to keep errors from occurring.


I would be shocked if patches were applied to a Mars rover in any way that doesn't have a human at the helm...


What's really interesting is that this has almost always been a debate. Back in the 80s, the average microcomputer could basically do nothing by today's standards. But they were still faster than a human. They were so fast in fact that most people used them with a very slow interpreted language called BASIC which was perhaps a hundred times slower than machine language and ate up a lot of the very precious memory. Still, people did a certain amount of useful work with these systems, ran businesses, managed finances, simulated reactors, etc.


Key quote:

"The miracle of Ingenuity is that all of these commercially bought, off-the-shelf components worked. Radiation didn't fry the Qualcomm computer. The brutal thermal cycles didn't destroy the battery's storage capacity. Likewise, the avionics, sensors, and cameras all survived despite not being procured with spaceflight-rated mandates."


I assume that, despite their lack of design pedigree, JPL still thoroughly tested all these components and discarded many that didn't make the cut.


That’s gotta hurt the radsafe equipment manufacturers


Somewhat; it will definitely move the needle for those who are on the fence about their risk posture with radiation. But the surface of Mars is by no means the worst radiation environment. For overall dose levels, the surface of Mars is about 2.5x worse than being on the ISS [0], which is in general a very low radiation environment.

If you look at slide 15 of this presentation [1], "TID (Total Ionizing Dose) Mitigation", the ISS would be in the "LEO-LOW" category on the curve. When you start looking at MEO and GEO orbits you have to start contending with the trapped protons and electrons of the radiation belts. Not a whole lot in MEO outside of the various GNSS constellations, but tons of things in GEO that still see doses that are orders of magnitude higher.

[0] https://mepag.jpl.nasa.gov/topten.cfm?topten=10#:~:text=Mars....

[1] https://www.osti.gov/servlets/purl/1524958

Edit: Also worth noting that the radiation environment for Single Event Effects (SEE) is going to be comparatively worse than the total dose when compared to the ISS. SEE are caused by individual high-energy particles, and this will be worse on Mars due to the lack of a magnetic field. So from that perspective it is quite a bit worse. However, if you go look at my other comments in this thread you can see that they specifically did SEE testing and not TID testing on the Qualcomm chips.


For what it is worth, I am confident Sony camera modules can handle about 24-48 hours in a typical boiling water reactor for visual inspections. We had to replace them constantly. Near the fuel band it looked like Mardi Gras with all the radiation-induced static.

Does anyone know what type of camera they used on the rovers?


Maybe, maybe not.. human-rated equipment will (almost certainly) always be held to a higher standard than autonomous drones/rovers.


I'm hoping their market will stay the same or grow as more space missions with human pilots/cargo take place in the future, but I'm thrilled about the possibility of many more 'cheap and cheerful' automated missions running on cheap hardware. Think of all the possibilities that open up if you can deploy 50 little rovers, accepting the fact that you'll lose 10 or 15.


"The processor on Ingenuity is 100 times more powerful than everything JPL has sent into deep space, combined," Tzanetos said. This means that if you add up all of the computing power that has flown on NASA's big missions beyond Earth orbit, from Voyager to Juno to Cassini to the James Webb Space Telescope, the tiny chip on Ingenuity packs more than 100 times the performance.

This is what blows my mind. a 2015 smart phone has more power than everything else. proof that modern programming can be massively wasteful on resources.


> This is what blows my mind. a 2015 smart phone has more power than everything else.

Not really. The things that regular end-users do on their smartphones are computationally much more intensive than what you deploy on the edge in space. I'll be the first to gripe about the inefficiencies of modern front-end programming, but the software on a Mars rover also just doesn't have that much number-crunching or throwing around ginormous assets to do.

A much more interesting question is to ask what you could do on Mars if you had that compute power. For example, how realistic is it to expand further on autonomous capability?


Percy has some degree of autonomous driving capability, requiring a fair bit of image processing. I'm sure the autonav programmers would love more compute to be able to literally drive faster; the improvements from Curiosity to Perseverance already made a huge difference.


The question is whether you need to. Yeah, it would be cool. Yeah, you could probably go faster. But usually, none of that is worth it. The science can still get done without it.


I think it’s a bit more complicated than that - yes, a ton of great science has been done, but there is also a considerable amount of time spent working around limitations, and some of those workarounds lead to less science being done over the life of the mission.

As a simple example, greater autonomy might allow the rovers to do more by reducing the number of times they have to wait out the speed of light (20-minute one-way latency plus annual blackouts) and bandwidth delays for someone at JPL to learn about an obstacle, decide what to do, send commands, and see what happens. They’ve spent a lot of time doing that cautiously because the failure mode of some outcomes is losing a rover, but there are other scenarios where the same is true in the other direction, so I’m sure they’re keenly working on ways to make the rovers better able to handle safe navigation and various recovery scenarios, like losing communications. I’d expect that might come in the form of a system which operates as they have but logs what it would have done, so they can compare the human and robotic commands.


You have to understand that most space missions are essentially webcams attached to a remote-controlled vehicle. You hardly need any onboard processing 99% of the time if you're just executing a sequence of actuator commands and maintaining PID loops.

It's only very recently that they've started to look at what the sensors are giving onboard the craft. I'm glossing over some very important details, but mainly, the sentiment holds: Spacecraft didn't have to do much, so they didn't have to think much.


The most intensive work spacecraft used to have to do would be to apply a (lossless, hopefully) compression algorithm to send more data over a limited high latency link. I wonder how much forward error correction capacity they allow for.
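For a rough sense of scale, here is a back-of-the-envelope sketch assuming the classic CCSDS Reed-Solomon (255,223) code that many deep-space links have used, optionally concatenated with a rate-1/2 convolutional code; the figures are illustrative, not specific to any particular mission:

    # Overhead of RS(255,223): 32 parity symbols protect 223 data symbols.
    codeword_bytes = 255
    data_bytes = 223
    parity_bytes = codeword_bytes - data_bytes            # 32

    overhead = parity_bytes / data_bytes                  # extra bytes sent per data byte
    print(f"RS(255,223) overhead: {overhead:.1%}")        # ~14.3%

    # Concatenating with a rate-1/2 convolutional inner code doubles the
    # transmitted bits again, for a combined code rate of roughly 0.44.
    concatenated_rate = (data_bytes / codeword_bytes) * 0.5
    print(f"concatenated code rate: {concatenated_rate:.3f}")   # ~0.437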


More than anything, this is great evidence that traditional aerospace has greatly overestimated the risks of using mass-produced terrestrial hardware.


It certainly points in that direction but it does make me wonder what it’d look like with n>1. Anything crewed would be more conservative but you’d also want that for other things - it’d be really bad if you had a correlated failure on an entire satellite network after a freak event or because a problem only showed up, say, 4 years in.


I think the networks actually get more reliable not less, because they're designed to be redundant and you have so many nodes one failure is tolerable. That's how it is for us at www.careweather.com.

I guarantee Starlink isn't using components from the 1990s.


Oh, I’m sure they’re big on redundancy. I mentioned correlated failures, however, because that has multiple failure modes and the one I was thinking about is something like physical degradation: redundancy doesn’t help if all of your redundant nodes have a component which fails at roughly the same time. There’s a reason why storage admins learned to mix hard drive brands and manufacturing batches in RAID arrays, and that problem is much harder to solve in space.


> proof that modern programming can be massively wasteful on resources

Depends what you consider resources. CPU cycles are resources. Programmer hours are resources.


> Proof that modern programming can be massively wasteful on resources

Alternatively, consider the possibility that what a smartphone does is a lot more computationally complex than you realize.

Disclaimer: I worked on smartphone chips for many years, including the Qualcomm chip that ended up on Mars.


The two sentiments are one and the same; the phone does indeed do a lot: I can observe that it is slow to load things. That's because it's doing computationally complex things. The problem is I don't want it to be doing computationally complex things. I want to use an IRC chat equivalent. The reasons behind it may be all sorts of fancy features I'm not using - the engineers who made those tradeoffs on my behalf have, collectively, added up to an enormous amount of wasted processor power.

The things that seem the most computationally complex, like viewing HD video streamed over the internet in realtime, work fine on my phone. But mundane tasks I know shouldn't by any rights be complex regularly take forever to complete. Because they're probably negotiating some nonsense protocol for the Bluetooth headphones I connected three weeks ago while Facebook tries to access my GPS signal for the tenth time this second while sending an updated list of every gesture, tap, and scroll to thousands of separate parties, each individually as a separate task.


> The problem is I don't want it to be doing computationally complex things. I want to use an IRC chat equivalent

Have you considered the possibility that the phone is doing what most phone users want, and that their needs being more complex than yours doesn't mean that "programming can be massively wasteful on resources"? Instead, the phone you purchased is not fit for your purposes.

> But mundane tasks I know shouldn't by any rights be complex regularly take forever to complete

It is also possible that what it takes to complete that task is somewhat more complex these days than you realize. Not because it's poorly programmed, but because the software is designed to support many more features than you are currently using. An individual user may only rely on 5% of the features, but different users want a different set of 5%. Supporting other people's needs isn't any more wasteful than supporting your needs.


One Voyager probe cost a billion dollars. I'm not sure your Todo app can afford to be written in assembly.


Does this mean the radiation hardening is a bit overkill for some applications? I mean, even if the 801 SoC is really weak against radiation (apparently not that weak, given the 90-day+ mission), it might be better to just throw in 2 or 3 of them with redundancy instead of going for the good old RAD750 with its pound of weight, crazy cost, and weak performance?

Or is it just crazy luck and the thing avoided the beams? That might be plausible; if the last flight's failure is not explained, maybe it was the one flight with less radiation luck?


I know this is what most of the SpaceX mission control systems do. They're set up with ~3x/5x redundancy using effectively consumer hardware. If they detect a memory mismatch on 1 of the 3 nodes, they'll power-cycle it while the other 2 take primary control. Big cost savings by not having to purchase radiation-hardened hardware, and AFAIK they see similar reliability.


IIRC they have 3 pairs (6 total), and if any pair has an internal mismatch they power-cycle both of its computers and let the other 4 continue as-is, as you described.
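A minimal sketch of that arrangement, purely illustrative (the pair/vote structure follows the description above; the names and helpers are made up, not SpaceX's actual software):

    from dataclasses import dataclass

    def flight_logic(cmd):
        # Stand-in for the real flight software; both halves of a pair run it.
        return cmd * 2

    @dataclass
    class Pair:
        name: str
        healthy: bool = True

        def compute(self, cmd):
            # A real system would compare outputs or memory checksums between
            # the two halves; a mismatch marks the whole pair as suspect.
            a, b = flight_logic(cmd), flight_logic(cmd)
            return a if a == b else None

    def step(pairs, cmd):
        results = []
        for p in pairs:
            if not p.healthy:
                continue
            out = p.compute(cmd)
            if out is None:
                p.healthy = False                      # take the disagreeing pair offline
                print(f"power-cycling pair {p.name}")  # stand-in for a reboot command
            else:
                results.append(out)
        # The surviving pairs' outputs decide the command to execute.
        return max(set(results), key=results.count) if results else None

    pairs = [Pair("A"), Pair("B"), Pair("C")]
    print(step(pairs, 21))   # -> 42 while all pairs agree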


IIRC, that's how the Space Shuttle control systems worked as well, but they were not COTS...


I believe the shuttles used five computers - four ran the same software and voted if there was a discrepancy, whereas the fifth ran simpler, independently developed software in case a common bug rendered all four primary computers inoperable.


Yes, most of the manned missions (at least Apollo and beyond as far as I'm aware) use the belt and suspenders approach of triple voting rad-hardened processors. So 3x RAD750s in a voting redundancy for main control systems.


I'm very curious if any analysis on this can be done and published, though I imagine it's difficult without access to the actual equipment.


Goddard does do a lot of radiation testing on commercial chips. I believe they usually publish the results, so you should be able to look them up.

Edit: database here, but I haven't looked for data on the chip: https://radhome.gsfc.nasa.gov/radhome/raddatabase/raddatabas...


Interesting data - I couldn't find the Snapdragon 801 ... but they have tested an Nvidia GeForce 1050 (https://nepp.nasa.gov/files/29573/NEPP-TR-2018-Wyrwas-TR-17-...)

I wonder why they are testing GPUs? I would imagine there are at least some people somewhere writing papers about sending GPUs on rovers etc for better on-device AI/ML (e.g. image processing)

Also: brief mention of the Raspberry Pi (no dedicated write-up like the GPU): https://nepp.nasa.gov/files/27888/NEPP-CP-2015-Campola-Paper...


Yeah, I didn't find it there either, but there is an IEEE paper with testing [0]. Looks like they only did single event effect (SEE) testing and not total ionizing dose (TID) as well. The SEE testing looks at effects from individual particles: memory upsets, latchup, gate rupture, etc. You can still do a lot with a chip if you are only getting the non-destructive effects, using memory scrubbing and resets (as long as you're not mission critical). But the destructive effects (like gate rupture) are the ones that could make this totally infeasible depending on the limit.

Not surprised that they didn't do TID as it was only supposed to be a 90 day demo mission so not terribly long. And TID can be a costly test because it can take so long.

As to the GPUs, yeah, the main application is some onboard processing when you are comm bandwidth constrained. Then you can send down processed products or snapshots from something that is collecting a lot of data (like a large imager or a high-bandwidth SDR).

[0] https://ieeexplore.ieee.org/document/8906649


Primarily for machine vision applications, both for robotic satellite servicing and for e.g. processing earth imagery data onboard to reduce downlink requirements.

Goddard does quite a lot of work on camera-based RPO (rendezvous and proximity operations) which require cameras and lidars to image a client spacecraft and calculate relative pose. This is the single most computationally taxing operation for robotic satellite servicing missions.


It stopped being about reliability twenty years ago. Now it's about lining pockets. The current top-of-the-line RAD5545 is based on a chip from 2010 with 45 nm lithography. It consumes 20 watts to perform 3.7 GFLOPS. A Snapdragon 8 Gen 3 has a TDP of 12 W and does 4.7 TFLOPS. Over a thousand times the performance at less than a thousandth the price.
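Taking those figures at face value (they are the parent's numbers, not independently verified), the gap per watt is even larger than the raw throughput gap:

    rad5545_gflops, rad5545_watts = 3.7, 20.0
    snapdragon_gflops, snapdragon_watts = 4700.0, 12.0    # 4.7 TFLOPS

    throughput_ratio = snapdragon_gflops / rad5545_gflops
    per_watt_ratio = (snapdragon_gflops / snapdragon_watts) / (rad5545_gflops / rad5545_watts)

    print(f"raw throughput: ~{throughput_ratio:.0f}x")    # ~1270x
    print(f"GFLOPS per watt: ~{per_watt_ratio:.0f}x")     # ~2117x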


Radiation tolerance goes down with node size. Companies aren't pushing much smaller because there's nothing they can do about the increased upset rates at those sizes.

Rather, they're turning to other options to achieve rad tolerance in critical systems, stuff like soft cores specifically designed for redundancy running on radiation hardened FPGAs.

Also, the rad-tolerant parts aren't there to do the crazy high-speed work. When people talk about voting architectures with redundant COTS parts, what do you think is counting the votes and resetting the affected systems?


So the big change seems to be foregoing the usual radiation-hardened components. Was the calculation that this was less risky for planetary exploration, or that the standard risk was acceptable given that it was not the central thrust of the mission? What would the total cost look like if you could launch several redundant cheap probes instead of a NASA-grade one any time this is a question?

Interesting to see how the lithium batteries did with such extreme temperature cycling. And here I am bringing my mower batteries inside in a cold snap.


My guess would be the latter. Ingenuity was primarily a technology demonstration whose failure wouldn't impact Perseverance's main objectives, so the risk was probably acceptable.

As for launching several cheap, redundant probes - I think the biggest issue would be the cost of the scientific instruments. I don't have the background to know how much money could be saved by using commercially available equipment. The basic scientific requirements might be too specialized for cheap commercial equivalents to exist, though maybe not; I really don't know.


The scientific instruments would need to be mass-produced too, of course. A 7-digit number of rovers would be extremely cheap per piece. Shipping would become a problem though.


Heh. I have long advocated that we attempt dozens of "Deep Impact" style missions against the NEOs - short mission lifetimes, perhaps months from launch to impact, and a high launch rate - which would bring the cycle time from innovation to exploration way, way down.


Unfortunately I think this will have less of an impact than many people think. As expensive as the radiation hardened hardware is, it's still series produced stuff that doesn't need reinventing for every mission. Almost all of the costs are in development of the unique scientific instruments each mission needs, and in keeping the mission teams staffed.

For example, NASA's Lucy mission that launched in 2021 had a budget of just under $1 billion, of which $560m was spacecraft development and $280m was for keeping a mission team staffed for 12 years. The launch on an Atlas V was $150m. What's $250k for a radiation-hardened CPU in that? It's not even peanut crumbs. And as expensive as the launch was, even if SpaceX and other new launch companies drive it down by 90%, the budget for this mission would still be about a billion dollars.
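A quick check on those numbers (all figures as quoted above; the $250k CPU price is the parent's ballpark, not an official one):

    spacecraft_dev, mission_team, launch = 560e6, 280e6, 150e6
    rad_hard_cpu = 250e3

    total = spacecraft_dev + mission_team + launch
    print(f"mission total: ${total / 1e9:.2f}B")               # ~$0.99B
    print(f"rad-hard CPU share: {rad_hard_cpu / total:.3%}")   # ~0.025%
    print(f"with a 90% cheaper launch: ${(total - 0.9 * launch) / 1e9:.2f}B")  # ~$0.86B, still roughly a billion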

Heck, even if you started series production of probes and their instruments and as such were able to amortize the development costs over a bunch of spacecraft, the fact remains that if you wanna go somewhere interesting it's gonna take years and you're gonna have to have a staff of mission specialists on retainer for most of that time. You can't disrupt away the fact that space is really big, at least not until someone discovers some new physics.


I have long advocated that we attempt dozens of "Deep Impact" style missions against the NEOs - short mission lifetimes, perhaps months from launch to impact, plus a high launch rate - which would bring the cycle time from innovation to exploration way, way down. The "Pluto Express" abuse of the word "express" rankled. Not having to fund a team for decades per single mission would be a savings also. If we shift our foci to targets outside LEO that are closer by delta-v and flight time, we will evolve our spaceflight technologies much faster. The moon is a lousy target, because landing on it and returning from it are so hard, and Mars is too far.


Note, also, that just being on a planetary surface significantly reduces your radiation exposure even if the planet doesn't have magnetic fields. The body of the planet shields you from the sun half the time. A thin atmosphere also has some effect.

In addition, being on a planetary surface greatly reduces the thermal cycling. In free space, the instantaneous thermal gradient on your spacecraft could be several hundred degrees, which the spacecraft either has to mitigate via thermal control or it just has to take; usually it will do some of both, so the electronics do need to survive a not insignificant thermal cycle.

You never get that on Mars. The planet is a huge heat sink.


How about during the journey in space, where there's more radiation? Was any shielding provided during transport? (e.g. even from the position of sensitive components relative to the rest of the mothership).

I notice the two flight-control MCUs connected to the Qualcomm CPU were still radiation-hardened.


Sure, there's some radiation, and it's shielded to some extent. But 10 mm of aluminum isn't that great at radiation shielding. The big difference is that deep space missions stay in space for decades. The rovers were in space for about a year.


My weed whacker batteries stay inside because the summer heat makes them nearly unusable.

I am surprised you are mowing the lawn in a cold snap though.


Yeah, I keep the batteries inside in the hottest and coldest weather. That cold snap is bookended by spring-like weather, and the grass is lush and wet and not fun to walk in.


Maybe it's usually stored in an unheated shed or similar.


> an automobile-sized nuclear-powered drone over the organic-rich sands on Titan

Wow! Can’t wait for Dragonfly!


So Titan gets its flying car before I do. The 1960s called and asked, "what happened?"


"The processor on Ingenuity is 100 times more powerful than everything JPL has sent into deep space, combined."

I think that's a bit exaggerated. The RAD750, which has been used for all sorts of devices, runs from 110 to 200 MHz, typically. A Snapdragon 801 is a quad core 32 bit CPU that can run up to 2.36 GHz. Even if the Snapdragon's IPC is twice that of the PowerPC, even if the Snapdragon is run at its full clock speed (which it certainly is not), and even if the comparison was with the slowest 110 MHz clock of the RAD750, that'd be a factor of 172 times faster.

I think that the person saying this is comparing the floating point throughput of the RAD750 and other CPUs sent in to space with the floating point throughput of the Adreno GPU built in to the Snapdragon 801. That's both misleading and not how things work. But I suppose it makes for catchy headlines, even if it's not factually correct.
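The arithmetic behind that 172x figure, spelled out (all assumptions deliberately generous to the Snapdragon, as described above):

    rad750_mhz = 110        # slowest quoted RAD750 clock
    snapdragon_mhz = 2360   # Snapdragon 801 peak clock
    cores = 4
    ipc_advantage = 2       # assume the Snapdragon retires 2x instructions per clock

    speedup = (snapdragon_mhz / rad750_mhz) * cores * ipc_advantage
    print(f"~{speedup:.0f}x a single RAD750")   # ~172x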


Snapdragon also has SIMD instructions (NEON) which can bring 2x-16x performance improvement depending on the algorithm.


I'm no expert, but I think more than these things go into it (and I suspect way more than a 2x IPC bump, but that's guesswork). Also more cache and I'm sure more and faster memory.


The point I was making is that even if one assumes the worst about the PowerPC and the best about the Snapdragon, the comparison doesn't make any sense.

Even if the IPC were FOUR times that of the PowerPC and we accepted the other assumptions in favor of the Snapdragon, that'd make one quad-core Snapdragon 344 times faster than the slowest RAD750, so if four or more RAD750s have ever been sent to space, then that'd make the Snapdragon LESS than 100 times faster than just those four RAD750s. That's not taking into account all of the other CPUs used in space missions.

Larger caches and faster memory are necessary to just maintain the IPC - they don't mean that the Snapdragon is faster per clock by virtue of those alone.

I have a 600 MHz PowerPC 750 system (original iMac with Sonnet accelerator) and a Cortex-A15 system (the Cortex-A15 is a 32 bit ARM from around the same time as the Snapdragon 801, and which has better IPC than the Snapdragon), and I can say that the IPC of the Cortex-A15 is much closer to 1.5 times than 2 times that of the PowerPC.


What are they planning to install as the computer for the Titan helicopter mission launching in 2026?


You know, I've always wondered why we've not used COTS parts for space missions. Every time I've heard of the next rover etc, it's been PowerPC CPUs from 20+ years ago.

"Radiation hardened" is what I've heard so far. But as I understand it, the work done to actually improve chips is small compared to the huge number of certifications they have to test / pass. Sounds like a LOT of paperwork.

It's similar with in-flight infotainment or in-car systems. Back in 2020, when I was shopping for cars, the car salesperson's big pitch on innovation was "touchscreen". I was very underwhelmed.


Original article about Ingenuity using this processor (2021):

https://news.ycombinator.com/item?id=26177619


TL;DR: an interesting frame through which to view Ingenuity's use of off-the-shelf components is as an indicator of where we are on the Moore's Law (etc.) type curves of rising performance and falling cost in our computational infrastructure.

There was some implicit dotted line in N-space representing the necessary reliability, performance, weight, etc. characteristics required for multi-million-dollar NASA missions.

The take-away is that we have now advanced the industry generally such that we are over that line.

I can connect this, observationally, to a hardware startup I was employee #1 at many years ago. At the time a friend started his company, it had only just—perhaps in a 12-month window—become viable for two people to design, program, and ship a hardware product based on embedded systems on microcontrollers and fast-turn PCB fabrication, and to do mechanical design in e.g. Solidworks on a PC. At the time, data sheets were still delivered via FAX trees. McMaster-Carr and Digikey were all we needed. It COULD be done, and we did it.

It felt at the time like what we were able to do represented a collective crossing of some event horizon. The garage was back, baby.

That we have now crossed a similar threshold wrt putting largely autonomous flying bots on Mars I didn't foresee. Nonlinearity is hard.

But at moments like these I do like to try to look over my shoulder, ahead. Where in another 25 years?

I'll play. Autonomous mesh-networked group-mind self-repairing and possibly bootstrap-replicating vestigial von Neumann probes, populating and mapping the ocean worlds of the gas giants.

And I'll bet that's conservative—one X factor being just how long it takes to get out there. But I'm cautiously optimistic we will be getting there in sub-year travel times by then.



