Apollo Guidance Computer switching power supply works after 50 years (righto.com)
234 points by asmithmd1 on Aug 24, 2019 | 78 comments

We found that the capacitors were all in good shape with the proper capacitances. This is in contrast to modern capacitors, which often leak or fail after a few years. NASA used expensive aerospace-grade capacitors and X-rayed each one to test for faults, and this made a large difference.

That's also because these are hermetically sealed wet tantalum caps, not the dry type that's notorious for shorting out and catching fire. "Wet tants" are very expensive and you can still buy them today:


We were somewhat surprised that both power supplies worked flawlessly after 50 years.

I'm not all that surprised, actually --- but perhaps that's because I've seen plenty of videos on YouTube of more mundane equipment, like vehicles and appliances, coming back to life after several decades of storage or exposure to the elements with minimal repairs, so for a clearly high-reliability component like this AGC PSU to work is almost expected.

Yes, tantalum caps are hard to beat, and China has a near monopoly on them, and coltan refining.

Even "Made in USA" "military" Kemet and Kyocera caps are only packaged in USA, with bulk film still being produced in China.

That's fairly recent then. They have a video from 2012 where they show them building the ceramic sheets on carrier film.


The interesting detail to look for in the video is the LCD with the yield on it - 84% of the capacitors tested within 4.5% of target, and 15% within 9.5%. This must not have been the space-rated product line :)

MLCC caps are not tantalum caps

Did it use wet tantalums, though? If the alternative is "solid tantalums", then it didn't -- SCD 1006755, the only type of tantalum used in the computer, specifies them as "Capacitor Fixed, Electrolytic, Solid Tantalum". Our computer has KEMET KGxxJyyKPS parts (xx=capacitance, yy=rating), but they also at least evaluated Sprague 150D, 151D, and 350D to fill the role (and utilized at least one of them).

Are wet tantalum capacitors still used in really mission-critical things, like whatever steps down the engine-alternator-supplied voltage on an F-22 to avionics circuit-board-level voltages?

Wet Tantalum caps have terrible vibration/shock sensitivities. I would use ceramic caps in aerospace 100 times out of 100 rather than consider wet tantalums. At least ceramics are repeatably manufactured and can be mounted J-lead to reduce shock loads. The tants will just randomly break open on you. Not sure what they're currently used in, but I wouldn't consider them for new applications.

When are the wet ones used?

I don’t understand downvotes for this.

That is beautiful work; the welded connections are really impressive. Obviously they had to take into account massive forces, vibration, and possibly impact damage, but still, components welded in place are something I've never seen before, not even in very high-end HF power electronics.

> The AGC that we're restoring belongs to a private owner who picked it up at a scrapyard in the 1970s after NASA scrapped it.

That is one lucky find. Imagine that it had not been found and scrapped for its metal value. The mind boggles at the potential destruction of such a historical artifact.

> the cordwood components are mounted differently from the other cordwood modules.

For those from cities who have never seen cordwood:

Cordwood is a stack of short lengths of wood seen from the end grain; the term typically refers to firewood but is also sometimes used in construction.

> ... Obviously they had to take into account massive forces, vibration and possibly impact damage ...

That's why they also potted them in epoxy. If you don't want your electronics to move (and thus break connections), encase them tightly in epoxy. And I mean completely encased, with no air left. This is just as valid for modern soldered joints as for welded ones.

The AGC we restored was used for ground testing, so most of the modules weren't encased in epoxy. This made things much more convenient for us. We did have to dig through encapsulation to fix one module, though.

Modern electric guitar pickups are epoxied in place precisely to prevent unwanted vibration and thus the additional noise.

Well wire-bond (within IC packages) is a form of welding, so it's not so uncommon.. Also the electrodes within a vacuum tube were all welded. Also wire welding is very common (it's the main technique) in automobile wiring harnesses.

Suppose you had an old radio made with all welded construction. I wonder if service would be easier or harder? There is no un-welding (so you'd have to cut the broken component out), but making the new weld is faster than soldering.

> Well wire-bond (within IC packages) is a form of welding, so it's not so uncommon.. Also the electrodes within a vacuum tube were all welded.

Sure, but that is inside the components. This is on the outside, after the components themselves have been manufactured.

> Also wire welding is very common (it's the main technique) in automobile wiring harnesses.

That may be so today but it certainly wasn't in the 1960's.

And even today your typical automotive board is just soldered. Makes me wonder whether satellite, current spacecraft, and possibly aircraft circuitry is still manufactured like this.

I'm not an expert on this, but there's a famous case from 1998 where the Galaxy IV telecommunications satellite developed "tin whiskers" on its soldered connections, which caused an electrical short resulting in a total loss.

So, at least in this case solder was used on a satellite.


Solder is used extensively in satellites nowadays. The catch, though, is that you must use leaded solder, because the lead dramatically decreases the occurrence of tin whiskers. RoHS has been rather annoying in the aerospace industry, because anything RoHS compliant can't safely fly.

Weren't a lot of components RoHS-compliant intrinsically anyway, without any change in process whatsoever?

What is the welding process here (for external leads)? I've been concerned about TIG welding on enclosures with components tied to the frame ground... in fact I'm pretty sure I've destroyed some stuff that way.

Almost certainly resistance welding. It's very fast and passes current only through the two pieces to be welded together, so avoids all the issues of stray currents and the other safety precautions needed with arc welding processes like TIG that generate UV and showers of molten metal.

19:50 in this video shows one in use in making a vacuum tube:


I don’t know, but I’d bet GTAW/TIG. But if you are careful about ground placement you might be OK. Certainly you can’t just use the table ground.

In the early 00s I went to upgrade the RAM in my father’s pre-built PC and was confronted with two sticks that had been welded in. It was a bit too bizarre for me to be overly upset.

Welded or soldered?

Oh, you’re right, soldered.

I've been told that welding components was considered a common practice in the USSR since it was cheaper than soldering them at some point in time.

I was shocked to see the title because I thought forward converters were invented in the 70s but it turns out they were invented in the 50s with some early examples in the 20s, if one gets loose enough with the definition of “converter”.


Yes, these converters go way back, although the AGC used a buck converter not a forward converter. They became more practical in the 1970s as transistor technology improved. I wrote a history of power supplies in the IEEE Spectrum recently: https://spectrum.ieee.org/computing/hardware/a-half-century-...

Nice article, thanks for sharing it. What definition are you using for forward converter? The definition I always use, which I thought I got from Erickson’s Fundamentals of Power Electronics, is any topology where the inductance is charged in series with the load (as opposed to in parallel, like a boost or flyback). I’ve always liked this definition because it still conveys critical meaning even when engineers make trivial topology changes from the “standard” forward converter topology.

I've seen a forward converter described as buck-derived, or buck with a transformer, but buck converters and forward converters are generally viewed as two separate things. I'm not too attached to definitions but I consider a forward converter as using a transformer and transferring energy while the switch is on. This is the definition used by the article you linked to, and by Wikipedia for Forward Converter. I also like the TI poster of power topologies: https://www.ti.com/lit/sg/sluw001f/sluw001f.pdf
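
For concreteness, the ideal continuous-conduction transfer functions show the "buck-derived" relationship (an idealized sketch of my own, not from the article; real converters deviate due to losses and the forward converter's transformer reset):

```python
def buck_vout(vin, duty):
    """Ideal buck converter: output voltage follows the duty cycle directly."""
    return vin * duty

def forward_vout(vin, duty, ns_over_np):
    """Ideal forward converter: a buck with a transformer turns ratio in front."""
    return vin * duty * ns_over_np

# With a 1:1 transformer the forward converter reduces to a buck:
assert forward_vout(28.0, 0.5, 1.0) == buck_vout(28.0, 0.5)
```

The 28 V input here is just an illustrative spacecraft-bus-like number; the point is only that the forward converter is the buck relation scaled by the turns ratio.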

I think the article I linked to didn’t require the transformer. Only that there be transfer with the switch on which implies the series connection. :) Wikipedia certainly does though and honestly, especially in power conversions, there’s not a lot of standardization in terms which is why I asked. I’m not too hung up on the definition either. I’m probably just over sensitive because all the marketing material I’m asked to review...

I have a 1926 book, "Alternating Current Rectification" [0], which describes a large (room-sized) DC-DC converter for an industrial application. It used rotating commutators to switch the current synchronously on both sides of a transformer. That's essentially a forward converter.

The book was written from the point of view that rectification was pretty much all figured out and it was just a question of some more tweaking and miniaturizing.

Also, 1930s-1950s car radios typically had a step-up converter from the battery voltage to the plate voltage which was a vibrating relay connected to a transformer with diodes to rectify the output.

[0] https://www.amazon.com/Alternating-current-rectification-all...

Huh? They got the whole computer working, not just the power supply.

Anybody who hasn't seen the two dozen or so videos made by CuriousMarc is in for a geekout of historic proportions. Some truly amazing work, with more ups and downs than the Apollo program itself. I can't easily come up with a link on mobile but you'll find it by looking up CuriousMarc on YT. Watch the whole playlist, it's awesome.

Yes, the whole AGC works; this article discusses the power supplies in particular. The video playlist is at: https://www.youtube.com/playlist?list=PL-_93BVApb59FWrLZfdli...

Were you able to see who made the tantalum capacitors? Or did the cordwood construction prevent that?

I ask because my dad was Group Product Manager at Kemet during this time period. Growing up, he mentioned their products going into several aerospace systems (various ICBMs and military projects, etc) as well as the IBM System 360. But he never mentioned involvement with Apollo and I'm curious if he might have had a hand in the AGC.

BTW, your link to the NASA contract drawing is incorrect - here's the one you probably wanted.


The (corrected) drawing says Sprague made these capacitors: http://www.ibiblio.org/apollo/SCDs/scd_1006755-.pdf (I don't know if Kemet made other capacitors used in Apollo.)

To the contrary! In our AGC, all of the capacitors not tucked into cordwood holes sport a KEMET logo and part number: https://photos.app.goo.gl/oGsrgXBpNMccmkWV7

I think the Computer History Museum's AGC might have Sprague caps in it, though.

Yep, that's the capacitor I was referring to below. At least some of them were definitely Kemet parts. I don't remember seeing any Sprague logos but I imagine there were some of those around, too.

That's a bummer, but thanks for taking a look. Sprague was and still is a competitor to Kemet.

I checked and there's a visible capacitor in the AGC's power supply with the Kemet name on it. So maybe your dad had a part in the AGC. Here's a photo by Mike Stewart showing the Kemet capacitor: https://photos.app.goo.gl/iWcWNqt1UzUzyeAz9

It would almost certainly have been Kemet, as they really made their reputation on those wet-slug hermetic tantalum caps. If you take apart a piece of HP gear from the 1960s-1970s timeframe, you find that those are the ones that are still good even today.

I definitely remember seeing Kemet markings on at least one of the caps in the power supply restoration videos.

I recall him saying they sold to Tektronix as well as HP.

The story he told me was that when the DoD wanted to buy scopes from Tek, rather than jumping through all the acquisition process hoops, Tek just handed them a glossy product catalog and told them to pick out which ones they wanted. Which went about as well as you might think, but back then there really was no substitute for a Tek scope so the Pentagon caved.

Yeah, I'm going through the same process right now with a similar product and DoD customer. What happens these days is that they send out an RFQ to half a dozen different "approved suppliers," which is government-speak for "People who have jumped through the necessary hoops to drop-ship stuff to the Federal government from the basement in their house."

Whichever "supplier" marks up the product the least -- a product they will never see, much less stock -- will presumably be the one who wins the right to pass along the order to me. This is the government's idea of competitive bidding.

It's just nuts. The waste, overhead, red tape, and opportunities for corruption are just totally batshit nuts. But what can you do...

> but back then there really was no substitute for a Tek scope so the Pentagon caved.

Well, they still got their own SKUs with (minor) modifications.

Such as being cloned shamelessly (and illegally) by Hickok, IIRC.

That was with the 500 series, Tektronix supplied 5000 and 7000 series systems to the DoD, e.g. as the AN/USM 281.

It's always impressive to see somebody try to build something that lasts, and succeed. It's even rarer to see such a thing in consumer electronics.

I'm currently sitting in the same room as a Bryston amplifier that, according to its date code, was manufactured in late 1998. That means it's almost 21 years old and just 1 year out of warranty. It's been switched on for most of those 21 years but still works great and has never been serviced. Even more surprisingly, it's not obsolete: it's currently hooked up to a 2018 receiver.

I'd love to see somebody do tear-downs of devices like this and explain how their construction takes longevity into consideration without resorting to the same extremes as NASA (e.g. X-raying caps).

I run my Carver amplifier, bought new in 1981, every day all day. That's 38 years of near constant use!

Damn! Take that NES!

Look at 90's automotive ECUs: most still work fine 25-ish years later. The way you do it is to overspec things, design defensively, and make system controls super conservative. A fun example of this is that if you look at information for modern automatic transmissions, they have pre-figured and pre-programmed "limp"/failsafe modes that can usually provide you with about half the gearing of a fully functional transmission, no matter what breaks internally.

It’s sad that there is a dichotomy between building a good product and building a product so good that you don’t get enough return customers.

The art would seem to be to build the durable product with room for extension, so that compelling upgrades are possible.

As seen with people who collect Apple products and just keep adding to the fruit basket.

Apple products are pretty much the opposite of good products, at least in the meaning of "good" implied in this context. Leaving aside their failure rate, most of their products are sealed with a proprietary battery. This gives them a pretty short service life, which guarantees returning customers.

Was that true of the old Mac computers, though?

I'm not talking about the post-Jobs stuff.

The old Mac computers were not as bad as the first iPod, but they weren't heavy-duty or anything of the sort. Nothing exceptional in that dimension.

Jobs era ipods, iphones and iPads had builtin batteries too, right?

Perhaps you could make your production capacity small enough that you won't need return customers, at least until you've earned enough, and then slowly reduce production as the market saturates?

Would need to resist a lot of temptation for short term gain.

If you look for instance at high end woodworking tools, you create a constellation of related tools and expect customers to buy others.

You also see variations in quality levels (eg, kitchen knives).

It's a common misconception that a good kitchen knife will last forever. It won't. Repeated sharpening will make the knife smaller and smaller. Here's a picture of a sushi chef's knives: the knife on the bottom right was once the same size as the top left one. http://i.imgur.com/EiQnGu3.png

If I could get my partner to agree to use the end-grain cutting boards I bought exclusively, my knives would last longer than I will be capable of safely using sharp implements.

'Forever' really just means 'until I elect to give them up' to most of us. That's why all those appliances came in ugly colors in the 70's. We were still flirting with planned obsolescence.

It's one of the big flaws in [Western] Capitalism.

Is it? Many people like low prices, and might not be interested in keeping stuff for more than a few years before getting a better widget. (Depending on what it is.)

It is, because I like widgets that last 10+ years and they are becoming increasingly hard to find (and yes, I would pay much more for them, if only they were made).

For household appliances there is Miele. They do cost much more. The vacuum cleaners are actually fairly affordable in Germany but not in the US AFAIK...

Slightly off topic but I understand as the Apollo 11 LEM descended, there were several "computer overload errors".

Is this out of memory, skipped input, dropped processing requests, or too much power drain? What exactly is an "overload error" on such an unusual machine?

In essence, an interrupt storm[1] caused by a design flaw that resulted in the rendezvous radar reporting erroneous readings at a high frequency.[2] The computer couldn't handle all of this work on top of what it was already doing (too much work, too little time) so ultimately it had to prioritize and drop some of the less important work.

[1]: https://en.wikipedia.org/wiki/Interrupt_storm [2]: https://arstechnica.com/science/2019/07/no-a-checklist-error...

If you google for "apollo 1201 or 1202 alarm" you'll find several good magazine-length articles on the design of the landing guidance computer systems.

The TLDR is: a switch misconfiguration was tasking the computer to process some rendezvous tracking information which was not needed. This exhausted a rather limited set of storage locations and triggered the alarm.

The computer was properly programmed to treat the condition as a secondary failure and continue with its primary task.

The thing that was remarkable to me was the control room engineer's response to the 1201 right before touchdown - they hadn't seen one of those, and it was not the same as the 1202's they had seen earlier.

In the videos I've seen you hear the audio - "1201 alarm" and within less than a second the 28-year-old (average age, just guessing) responds "1201 go". He just gave instant clearance to proceed past a malfunction a few hundred feet above the moon's surface, in real time. Talk about being in the zone.

Sadly most of what you read by googling these problems is misinformation. It was actually an incredibly sinister systems integration bug, that wasn't well described even at the time.

It wasn't a switch misconfiguration; the Apollo 11 astronauts were flying to the checklist, and did as they had simulated. The Rendezvous Radar switch has three settings -- LGC, AUTO TRACK, and SLEW. In LGC mode, the AGC controls the positioning of the antenna; in AUTO TRACK, the radar automatically tracks the CSM based on return strength; and in SLEW it is manually positioned by the astronauts.

The trouble came from how the trunnion and shaft angles of the antenna were measured. They used "resolvers", which are sort of like variable transformers. Resolvers look like motors, and attached to the shaft there are two windings positioned 90 degrees apart from each other. An AC "reference voltage" is applied to an outer winding in the case, and that voltage couples onto the two inner windings with a magnitude that depends on the angle of the shaft. One winding (the "sine" winding) produces an output equal to Vref*sin(theta) and the other ("cosine") winding produces an output equal to Vref*cos(theta), where Vref is the reference voltage and theta is the angle of the shaft. The voltage and phase of both windings can be used to determine exactly what the theta was that produced them.
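
In code, that resolver relationship looks like this (a DC caricature with made-up numbers, not anything from the AGC itself; the real signals are 800 Hz AC, where the phase carries the sign information that atan2 gets from the voltage signs here):

```python
import math

def resolver_outputs(vref, theta):
    """Sine and cosine winding voltages for shaft angle theta (radians)."""
    return vref * math.sin(theta), vref * math.cos(theta)

def recover_angle(v_sin, v_cos):
    """Recover the shaft angle from the two winding voltages.
    atan2 uses the signs of both to resolve the full 360-degree circle."""
    return math.atan2(v_sin, v_cos)

v_sin, v_cos = resolver_outputs(28.0, math.radians(30))
assert abs(math.degrees(recover_angle(v_sin, v_cos)) - 30) < 1e-9
```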

The circuitry to do this is a bit involved though, and lived outside the computer in a device called the CDU, or Coupling Data Unit. The CDU constantly maintained its own idea of the angle ("psi") in a digital register. It translated the incoming sine and cosine voltages into a digital representation by mechanizing the equation +-sin(theta-psi) = +-sin(theta)cos(psi) -+ cos(theta)sin(psi). It did so by using the bits of its digital register containing psi to switch on and off resistor dividers that applied cos(psi) and sin(psi) onto the incoming signals, which were then added together with a summing amplifier. The goal of the CDU is to zero this sum; to accomplish this, it "counts" the angle register up or down to reduce the magnitude of the sum. As it counts, switches are changed, which switch out resistors in the circuit, which in turn change cos(psi) and sin(psi) in the above equation. And with every other increment, a pulse is transmitted to the AGC to indicate that the angle has changed slightly.
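
The counting loop can be caricatured in a few lines (an idealized sketch of my own; the real CDU nulls an 800 Hz AC sum through switched resistor ladders, and the step size here is invented):

```python
import math

def cdu_track(theta, psi, steps=20000, increment=math.radians(0.01)):
    """Count psi up or down until sin(theta - psi) is nulled, as the CDU does."""
    for _ in range(steps):
        error = math.sin(theta - psi)
        if abs(error) < increment:
            break  # sum nulled; psi now tracks theta
        psi += increment if error > 0 else -increment
    return psi

psi = cdu_track(theta=math.radians(45.0), psi=0.0)
assert abs(math.degrees(psi) - 45.0) < 0.1
```

With a healthy reference voltage this error term always converges; the failure mode described below is precisely that the sum can become impossible to null.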

The problem comes in because in addition to the above, the CDU also, for many angles, added to the sum some fraction of the reference voltage directly. This is fine when the switch is in the LGC position; the resolvers are supplied with the same 28V, 800Hz reference voltage that is used inside the CDU. However, when the switch is put in either of the other two positions, the reference voltage for the RR resolvers is switched to an unrelated 15V rail. Critically, this 15V reference has no defined phase relationship with the CDU's 28V 800Hz reference. The phasing is locked in by the exact millisecond at which you power up your subsystems.

So when the switch is changed, the sine and cosine outputs from the resolver are suddenly derived from the 15V reference -- they are much lower before and at a random phase. The CDU doesn't know that this has happened, and still tries to perform the summing as before. However, for many theta/phase relationships, it becomes impossible for the CDU to actually null the sum. In these cases, the CDU becomes "manic", and starts seeking back and forth, frantically changing switches to try to figure out what the angle is, but never succeeding.

This causes a huge flurry of +1 and -1 pulses to the AGC. In order to minimize circuitry, the AGC implemented what was called "unprogrammed" or "cycle-stealing" instructions. The computer only contains a single adder, and adding or subtracting 1 from the current angle requires use of that adder and a memory cycle. Rather than generating a full interrupt, which would require many memory cycles and instructions to handle, the computer simply transparently inserts a single-cycle instruction in between two "programmed" instructions that performs the addition or subtraction. This is totally transparent to software, normally. But with a manic CDU that is incessantly seeking on both RR angles, the AGC receives something close to 12,800 pulses per second, which translates into something around 15% of its total computational time. The landing software had only been designed with a margin of 10% or so.
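
That ~15% figure is easy to sanity-check: the AGC's memory cycle time is commonly cited as about 11.7 µs, and each counter pulse steals one memory cycle. A quick back-of-the-envelope (my arithmetic, not from the comment above):

```python
pulses_per_second = 12_800   # manic CDU driving both RR angle counters
memory_cycle_us = 11.72      # commonly cited AGC memory cycle time, microseconds

# Each pulse steals one memory cycle from the running program.
stolen_fraction = pulses_per_second * memory_cycle_us * 1e-6
assert 0.14 < stolen_fraction < 0.16   # ~15% of all compute time, as stated
```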

The 1202s were also a lot less benign than is often reported. They occurred because of the fixed two-second guidance cycle in the landing software. That is, once every two seconds, a job called the SERVICER would start. SERVICER had many tasks during the landing, in order: navigation, guidance, commanding throttle, commanding attitude, and updating displays. With an excessive load as caused by the CDU, new SERVICERs were starting before old ones could finish. Eventually there would be too many old SERVICERs hanging around, and when the time came to start a new one, there would be no slots for new jobs available. When this happened, the EXECUTIVE (job scheduler) would issue a 1201 or 1202 alarm and cause a soft restart of the computer. Every job and task was flushed, and the computer started up fresh, resuming from its last checkpoint. It was essentially a full-on crash and restart, rather than a graceful cancellation of a few jobs. And contrary to what is often said, the computer wasn't dropping low-priority things; it was failing to complete the most critical job of the landing, the SERVICER.
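
A toy sketch of that slot-exhaustion behavior (the seven-slot count and the method name are illustrative assumptions on my part, not the real EXECUTIVE interface):

```python
class Executive:
    """Toy model of the AGC job scheduler: fixed job slots, alarm on overflow."""
    CORE_SETS = 7  # illustrative slot count, not taken from AGC documentation

    def __init__(self):
        self.jobs = []

    def schedule(self, job):
        """Schedule a job; on slot exhaustion, alarm and flush (soft restart)."""
        if len(self.jobs) >= self.CORE_SETS:
            self.jobs.clear()  # every job and task is flushed
            return "1202"      # no slots available for new jobs
        self.jobs.append(job)
        return None

ex = Executive()
assert all(ex.schedule(f"SERVICER-{n}") is None for n in range(7))
assert ex.schedule("SERVICER-7") == "1202"  # one job too many triggers the alarm
assert ex.jobs == []                        # ...and everything was flushed
```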

Luckily, the load was light enough that of the SERVICER's duties, the old SERVICER was usually in the final display updating code when it got preempted by a new SERVICER. This caused times in the descent when the display stopped updating entirely, but the flight proceeded mostly as usual. However, with slightly more load, it was fully possible that the SERVICER could have been preempted in the attitude control portion of the code, or worse yet, the throttle control portion. Since each SERVICER shared the same memory location as the last one (since there was only ever supposed to be one running at a time), this could lead to violent attitude or throttle excursions, which would have certainly called for an abort. Luckily, this didn't happen -- and the flight controllers didn't abort the mission not because 1202s were always safe, but because they didn't understand just how bad it could be, were the load just a tiny bit higher.

Could I ask how you know so much about this, or where I can read something more detailed than the usual story that's reported? Thanks.

Many years now of research and simulation of the system (I led the restoration of the computer mentioned in the article). There's not a single place where you can read everything, unfortunately, aside from the comment above. We're planning on making a video on it in the future. But I can cite sources:

CDU theory of operation (starting PDF page 15): http://www.ibiblio.org/apollo/Documents/HSI-208435-003.pdf

CDU coarse module schematic: https://archive.org/stream/apertureCardBox462NARASW_images#p...

Grumman memo (from 1968!) describing the problem, and mentioning it is due to the reference switching to a 15V 800Hz source: https://www.ibiblio.org/apollo/Documents/Memo-GAEC_LMO_541_1...

Excerpt from the LM-8 Systems Handbook showing the reference voltage RR switch wiring: https://i.imgur.com/fMsQ7RI.png

Don Eyles describes the software side best in his book Sunburst and Luminary (which I highly recommend) but he also talks about it in some detail on his website: https://doneyles.com/LM/Tales.html

> Sadly most of what you read by googling these problems is misinformation

Thanks for such a detailed account. Unfortunately, it will be added to the trove of otherwise-categorized information that Google returns.

Thanks also, very much, for the links you included below.

It is insane how many unknowable little variables need to go impossibly right on the journey to land on the moon and come back with that old technology.

Now I wonder if the fuel cells powering this PSU can be restored to a workable state? Then moving on to restoration of the whole spacecraft. Shouldn't be too hard :) given that some of the most complex units are already working...

Buck converters, as any fan of Big Clive videos knows, are still widely used in low cost power supplies for LED light bulbs and cheap USB chargers, due to the few components needed and wide tolerances the circuits can handle.

Buck converters are still widely used in absolutely everything.

I wonder if there were any issues with the STBY button being located where it is. It seems like a really risky place to have an off key, right between other frequently used buttons.

Nah, they put a couple of precautions in to make sure you couldn't put the computer into standby accidentally. First, software has to set a bit to enable standby mode to be entered. For the astronauts, this was keying in VERB 37 ENTER 06 ENTER, which started P06, the pre-standby program. They then had to press and hold the button for a period of time between 1.28 and 2.56 seconds (exactly how long it takes depends on where the clock divider chain was when they first pressed the button). To bring it out of standby, you have to push the button in for a similar duration.
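
One plausible reading of that 1.28-2.56 s window, sketched numerically (this is my interpretation of the divider-chain sampling, not a verified schematic): the press must span two consecutive edges of a divider output with a 1.28 s period, so the required hold time depends on where in the period the press lands.

```python
PERIOD = 1.28  # seconds between the assumed divider edges sampling the button

def hold_time_needed(phase):
    """Required hold time given where in the divider period the press lands.
    Assumes the press must span two consecutive sampling edges (hypothetical)."""
    assert 0.0 <= phase < PERIOD
    return (PERIOD - phase) + PERIOD

# Pressing just after an edge is the worst case (~2.56 s);
# pressing just before the next edge is the best case (~1.28 s).
assert abs(hold_time_needed(0.0) - 2.56) < 1e-9
assert abs(hold_time_needed(1.279) - 1.281) < 1e-9
```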
