
Jack Garman, Whose Judgment Call Saved Moon Landing, Has Died - helloworld
http://www.nytimes.com/2016/09/25/us/jack-garman-whose-judgment-call-saved-moon-landing-dies-at-72.html
======
pdm55
I think I have read about various aspects of this event. I would appreciate it
if someone with more knowledge could provide details to flesh out my
(false?) memories. The aspects I have read about are: (1) the programmer (a
woman) set up the code so that in case of overload it could simply be
rebooted; (2) Buzz Aldrin claimed that he made the decision not to abort - I
guess he had the final call; and (3) the overload occurred because Buzz didn't
shut down one function of the computer system that he was expected to.

It's interesting to me that there can be many viewpoints to such a critical
incident. It would be good to have the whole picture.

~~~
robryk
The chief programmer of both Apollo guidance computers was Margaret Hamilton.

The computer had only non-volatile memory. Upon restart it would clean some
parts of it up, but most of what the user would think of as the state would
remain.

No clue about whose decision it was. Runbook said "go if no apparent
degradation of control" for this alert.

The overload occurred because the rendezvous radar was on _and_ was started in
a way that caused some signals to be out of phase. AFAIK it shouldn't have
been on. The values it provided were ignored, but incrementing and
decrementing the hardware counters is what took the time. Buzz also noticed
that the alarms appeared only when he started a routine that refreshed some
data on his display, which he was supposed to start. This turned out
to be true -- the computer was so close to having all of its CPU time
used up that this simple routine would tip it over the edge.
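
To make the "tip it over the edge" point concrete, here is a minimal sketch in Python. It is not the flight software's scheduler, just a toy model: the function name, the percentages, and the backlog limit are all made up for illustration. The idea is that if the work demanded per period stays under 100% of the CPU, nothing piles up, while even a 1% excess accumulates until some limit is hit and an alarm is raised.

```python
def periods_until_alarm(demand_pct, backlog_limit_pct, max_periods=1000):
    """Toy model: return the period in which unfinished work first exceeds
    the backlog limit, or None if that never happens within max_periods.

    demand_pct        -- work requested per period, as a percentage of one
                         period's CPU time (hypothetical number)
    backlog_limit_pct -- how much unfinished work can be carried over before
                         the alarm fires (hypothetical number)
    """
    backlog = 0
    for period in range(1, max_periods + 1):
        # Anything above 100% cannot finish this period and carries over.
        backlog = max(0, backlog + demand_pct - 100)
        if backlog > backlog_limit_pct:
            return period
    return None


# Hypothetical loads: guidance work plus counter updates, without and with
# the extra display routine.
WITHOUT_DISPLAY = 98   # just under capacity: backlog never grows
WITH_DISPLAY = 101     # just over capacity: backlog grows by 1 each period

print(periods_until_alarm(WITHOUT_DISPLAY, backlog_limit_pct=10))
print(periods_until_alarm(WITH_DISPLAY, backlog_limit_pct=10))
```

In this toy model the small display routine is not expensive in itself; it matters only because everything else had already used up almost all of the margin.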

~~~
pdm55
Thanks. Good info. It helped my search.

THOSE ALARMS

 _" It wasn't 10 seconds after the LEM was secured on the (lunar) surface that
NASA was on the phones to the (MIT) Lab. This was the Lab's responsibility,
our system, our machine, our alarms. "What were those alarms? We're launching
(the lunar module from the moon's surface) in 24 hours and we're not going
with alarms. We must have an operational computer." We really went to work.
The computer seemed to be operating at 80% of its normal speed, but why?

We turned to our simulation facilities. We had a high-fidelity digital
simulation of the computer and the executing programs, surrounded by a digital
simulation of the LEM vehicle, the equations of motion, and the gravitational
environment. We also had an analog simulator with the real guidance computer,
the inertial measurement unit (IMU), and a man-in-the loop. We tried every
anomalous condition. We examined the executive code, the alarm mechanism, and
the fundamental algorithms. We worked all night and time was running short.
Our NASA buddies called us every 15-30 minutes anticipating, demanding a
solution. We had to find it. We re-covered old ground, new ground,
brainstorms, crazy ideas, anything."_ Fred H. Martin

[https://www.hq.nasa.gov/alsj/a11/a11.1201-fm.html](https://www.hq.nasa.gov/alsj/a11/a11.1201-fm.html)

[https://www.hq.nasa.gov/alsj/a11/a11.1201-pa.html](https://www.hq.nasa.gov/alsj/a11/a11.1201-pa.html)

[http://history.nasa.gov/alsj/a11/A11_MissionReport.pdf](http://history.nasa.gov/alsj/a11/A11_MissionReport.pdf)
Mission Report with 22 pages of text and 22 pages of diagrams describing
significant problems during the Apollo 11 mission.

[http://www.htius.com/Articles/r12ham.pdf](http://www.htius.com/Articles/r12ham.pdf)
Margaret H. Hamilton describes the software’s global error detection and
recovery mechanisms she helped design.

[http://www.doneyles.com/LM/Tales.html](http://www.doneyles.com/LM/Tales.html)
Don Eyles describes the components of the Lunar Guidance Computer.

~~~
robryk
>
> [https://www.hq.nasa.gov/alsj/a11/a11.1201-pa.html](https://www.hq.nasa.gov/alsj/a11/a11.1201-pa.html)

I'm sorry to say that this particular article contains some inaccuracies.

First: The Apollo 14 fix didn't involve providing any new code to execute. The
fix changed some data values in erasable memory so as to fool the computer
into thinking an abort had already started. For more details, see:
[https://www.ibiblio.org/apollo/index.html#Final_exam_for_the...](https://www.ibiblio.org/apollo/index.html#Final_exam_for_the_advanced_student_)

\-- Edit: The following paragraph is wrong. --

In fact, it was impossible to execute code from erasable memory: it was a
Harvard architecture machine and code memory was in the nonerasable memory.

\----

In fact, the address ranges for ROM and RAM were disjoint and the code
likely never jumped to RAM, so it seems to have been impossible to get any
code stored in RAM executed.

Second: The Apollo 11 problems were not caused by an undesired piece of
software operating during the landing. They were caused by a mechanism for
updating hardware counters that used up CPU cycles. In order to simplify
concurrency issues, hardware counters were not directly manipulated by
hardware in the LGC. There were "increase" and "decrease" interrupts, which
were raised whenever a counter needed to be increased or decreased. These
interrupts had a hardwired interrupt service routine that called a "hardware
counter inc/decrement" instruction (this instruction didn't even have an
opcode, because it wasn't designed to be called from actual code). This took
some CPU time each time it happened. The rendezvous radar angle was the
counter that caused the problem. The reason was that when the radar was not
set to computer control, the radar position signals going to the computer
could indicate random, quickly changing garbage. For more details, see:
[http://www.doneyles.com/LM/Tales.html](http://www.doneyles.com/LM/Tales.html)
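
A rough way to picture the mechanism described above is the Python sketch below. It is only a toy model, not AGC code: `stolen_cycles`, the sample values, and the noise range are all invented for illustration. It assumes that tracking a signal costs one counter update (one stolen cycle) per unit of change, so a steady signal costs nothing while a jittering one keeps the CPU busy even though nobody reads the value.

```python
import random


def stolen_cycles(samples):
    """Count the counter updates needed to track a sampled signal.

    Assumption for this sketch: the counter can only move one unit per
    update, so a jump of k units costs k inc/dec updates.
    """
    cycles = 0
    counter = samples[0]
    for value in samples[1:]:
        cycles += abs(value - counter)  # one inc or dec per unit of change
        counter = value
    return cycles


# A radar angle that sits still versus one that reports fast-changing garbage.
steady = [100] * 1000
rng = random.Random(0)  # fixed seed so the run is repeatable
noisy = [100 + rng.randint(-5, 5) for _ in range(1000)]

print("steady signal:", stolen_cycles(steady), "updates")
print("noisy signal: ", stolen_cycles(noisy), "updates")
```

The point of the sketch is that the cost depends on how fast the input changes, not on whether any program actually uses the value.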

Why do I trust my sources more? In the first case, because I can actually
verify that solution on the simulator and because the machine language
documentation that was used to create the simulator makes it clear that code
and data address spaces are separate. In the second case, because it's a more
detailed description that matches what I know about the LGC from other sources and
because it comes from Don Eyles.

------
NotSammyHagar
We can all wish to do something significant with our time on earth like this
guy.

~~~
onetimepadder
(s/wish/try) and I agree with the first part of your sentence 100%

~~~
NotSammyHagar
I am trying, but I'm not sure if my work building new database optimization
and execution engines is what my contribution should be. I'd rather be working
at SpaceX or Tesla.

