>The analysis reveals that the lander fully completed the entire planned deceleration process, slowing to the target speed of less than 1 m/s in a vertical position at an altitude of approximately 5 kms above the lunar surface.
Ouch.
What seems to have happened:
1. The probe descended normally.
2. It glided horizontally over the roughly 3 km cliff at a crater rim.
3. The altitude sensor suddenly reported a large change.
4. The onboard computer saw that the change was larger than expected, concluded the sensor was misreading, and filtered out the (correct) value.
>as the lander was navigating to the planned landing site, the altitude measured by the onboard sensors rose sharply when it passed over a large cliff approximately 3 kms in elevation on the lunar surface, which was determined to be the rim of a crater. According to the analysis of the flight data, a larger-than-expected discrepancy occurred between the measured altitude value and the estimated altitude value set in advance. The onboard software determined in error that the cause of this discrepancy was an abnormal value reported by the sensor, and thereafter the altitude data measured by the sensor was intercepted. This filter function, designed to reject an altitude measurement having a large gap from the lander’s estimation, was included as a robust measure to maintain stable operation of the lander in the event of a hardware issue including an incorrect altitude measurement by the sensor.
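To make the failure mode concrete, here is a minimal sketch of a gate that rejects any altitude measurement too far from the current estimate. It is not ispace's flight software: the class name, the 500 m threshold, the descent rate, and the altitudes are all made-up assumptions chosen only to show how a correct reading can be discarded once the terrain drops away.

```python
from dataclasses import dataclass


@dataclass
class AltitudeEstimator:
    """Hypothetical estimator with a simple outlier gate (illustrative only)."""
    estimate_m: float      # current altitude estimate, metres
    gate_m: float          # assumed rejection threshold, metres
    rejected: int = 0      # number of measurements thrown away

    def propagate(self, descent_rate_mps: float, dt_s: float) -> None:
        # Dead-reckon the estimate forward from the expected descent rate.
        self.estimate_m -= descent_rate_mps * dt_s

    def update(self, measured_m: float) -> bool:
        # Reject any measurement that disagrees too much with the estimate.
        if abs(measured_m - self.estimate_m) > self.gate_m:
            self.rejected += 1
            return False
        self.estimate_m = measured_m
        return True


def simulate():
    """Fly over a 3 km drop in the terrain at step 5 (all numbers invented)."""
    est = AltitudeEstimator(estimate_m=2000.0, gate_m=500.0)
    true_alt = 2000.0
    for step in range(10):
        if step == 5:
            true_alt += 3000.0  # ground falls away by 3 km under the lander
        true_alt -= 10.0        # lander descends 10 m per 1 s step
        est.propagate(10.0, 1.0)
        est.update(true_alt)    # the sensor reports the true altitude
    return est.estimate_m, true_alt, est.rejected


if __name__ == "__main__":
    print(simulate())
```

In this toy run every reading after the cliff is rejected, so the estimate keeps counting down from the old terrain level while the real altitude is 3 km higher. The gate never recovers on its own, because the gap between estimate and measurement stays constant.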
It's also explained further down in the article why the software was programmed the way it was:
>One major contributing factor to this design issue was a decision to modify the landing site after critical design review completed in February 2021. This modification influenced the verification and validation plan despite numerous landing simulations carried out before the landing. ispace as the mission operator maintained overall program management responsibility and took into account the modifications in its overall analysis related to completing a successful mission. It was determined that prior simulations of the landing sequence did not adequately incorporate the lunar environment on the navigation route resulting in the software misjudging the lander’s altitude on final approach.
TL;DR: The plans were modified after the software was written, and the software was not sufficiently revised because of overreliance on old, pre-modification simulation data.
As human errors go, this looks egregious. One hopes their subsequent missions don't run afoul of the same screw-ups.
> overreliance on old, pre-modification simulation data
It somewhat reminds me of the Genesis sample-return mission’s landing failure.
There the parachutes failed to open. They failed to open because the accelerometer intended to trigger them did not trigger. And it did not trigger because it was installed according to the plans, but the plans had it upside down. And they didn’t catch the issue because the submodule in question had already flown, so reviewing it in detail was deemed unnecessary. But what had changed from the previously successfully flown configuration was that they had turned the submodule around. So they introduced a change without realising that it invalidated some of their previous tests and analysis.
Obviously the technical root cause is very different here (software vs hardware; landing site change vs submodule orientation change), but the organisational root cause is similar. A change invalidates assumptions in previously performed tests/analysis/review, nobody spots this, and so the test/analysis/review is not performed again and a problem sneaks in.
This is something I’ve long struggled with: how you capture assumptions tied to decisions so you can revisit the decisions when the facts on the ground change.
(Architectural) Decision Records should explicate assumptions behind the decisions. There will be implicit assumptions too, but at least you can later go back to the records and analyse what was implicit in them.
Yup, we have a strong culture around this at work. Problem statement, decision drivers, decision criteria, alternate options to seriously evaluate. It should be clear why one is recommended over the other, and under what circumstances other options might be preferable.
Requirements traceability: a dependency tracking/tracing graph, upwards (because of) and downwards (therefore).
It is a very tedious process, and very few companies do it.
But once the graph is built, one can ask questions of it, like what would or might be affected if node X changes. Any assumption also counts as a requirement of sorts.
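As a rough illustration of the "what is affected if node X changes" query, here is a small sketch of such a graph. The class, its methods, and every node name are hypothetical, loosely modelled on the landing-site change discussed above; real traceability tools are far more elaborate.

```python
from collections import defaultdict, deque


class TraceGraph:
    """Hypothetical traceability graph: edges point from a cause to what follows from it."""

    def __init__(self):
        self._downstream = defaultdict(set)

    def add(self, source, dependent):
        # Record that `dependent` holds "because of" `source`.
        self._downstream[source].add(dependent)

    def affected_by(self, node):
        # Breadth-first walk over everything downstream of `node`.
        seen = set()
        queue = deque([node])
        while queue:
            current = queue.popleft()
            for nxt in self._downstream.get(current, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen


def build_example():
    # Invented nodes; assumptions are tracked as nodes just like requirements.
    g = TraceGraph()
    g.add("landing site", "terrain profile assumption")
    g.add("terrain profile assumption", "altitude filter threshold")
    g.add("terrain profile assumption", "landing simulations")
    g.add("landing simulations", "V&V sign-off")
    g.add("propulsion design", "fuel budget")
    return g


if __name__ == "__main__":
    print(sorted(build_example().affected_by("landing site")))
```

Asking the example graph what depends on "landing site" returns the terrain assumption, the filter threshold, the simulations, and the sign-off, but not the unrelated fuel budget. That is the list of things someone would need to re-check after the change.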
Is there something like penetration testing for projects? I.e., a business unit that works full-time on identifying “what could possibly go wrong” in very complex projects.
Of course. Since software errors are inevitable, there is no other way than to test extensively and systematically. In Europe, adaptations of the V-Model [1] are in widespread use. Even for smallish projects, simulation at substantial cost is completely normal [2].
Hence there should have been a test software component that could simulate realistic landing data. This component failed to identify the error, which is nothing more than a big blunder by the testing team.
Yes, though what such things are called varies by domain; audits of various types are popular terminology. In engineering, independent technical reviews can have significant legal implications.