The most remarkable legacy system I have seen (thehftguy.com)
526 points by user5994461 on July 13, 2020 | 182 comments

I'll share an anecdote along the same lines, albeit much smaller in scale:

In 2004 an acquaintance asked me for help with sharing an Internet connection among the residents of his condominium after he had failed to get a common router/switch solution to work. The router was not playing ball unless all clients presented themselves in one and the same subnet, which prompted the unmanaged switch to pass traffic directly between the ports, and that was a no-no for historic reasons relating to Windows and its malware of the time. So I repurposed an old circa-1999 ATX motherboard with a mix of Ethernet cards - the board offered 6 PCI slots and the condominium has 5 residents - 256 MB of RAM, and a low-power passively cooled Pentium III to act as router and switch in one, running OpenBSD with some dhcpd(8) and pf(4) to manage clients and traffic.

16 years later this set-up is still in use 24/7 because ISPs claim logistical problems prevent them from installing gateway equipment in the building. It serves a VDSL2 line of 60/20 Mbit/s coming in over the POTS. At its peak the computer had 6 years of uptime. It is still running OpenBSD 3.5 due to poor #OPSEC on my part.

I worked at an Insurance Broker ~1998 as the assistant to the CIO who was also the only network engineer. Which I guess made me the PFY.

We used to deploy those mini tower eMachines to the staff, but we'd rip out the WinModems and replace them with NICs. So we had a box of about 100 WinModems collecting dust in the corner. The newer machines came with 20+GB HDDs so we'd swap those out for smaller 4-8GB HDDs we had lying around.

At some point a 1-800 number got decommissioned, but the local number and internal distribution tree were still active in the PBX. I ended up taking a decommissioned IVR, swapping out the Rhetorex boards for WinModems, configuring it as a dial-up internet service, and using a bunch of those large HDDs to host an FTP server on it.

I left sometime in 2005 but my makeshift dialup ISP was still active as late as 2008 and there was an extensive collection of DIVX movies still on the FTP server.

Good lord, 6 years? I've had some managed switches with nearly 2 years uptime but 6 is just asinine. I assume it was on a UPS? If so, how did the battery stay healthy enough to be usable? Or was the electrical grid there just super dependable?

Six years without a power cut doesn't seem too surprising.

Was "asinine" the word you meant there? It doesn't make sense to me in context.

Hello from a developing country

I envy all of you.

Where I live, short (usually less than 5 minutes) power outages are common throughout the day. Half a dozen such cuts every single day.

And every time the power comes back after a cut, there is a 60 second delay for the internet to reconnect, and about 20 seconds more for the VPN to reconnect. Annoying, especially when working in teams.

It is unusual not to have a power outage during a 24 hour period.

Here's my current setup to stay ahead of the game:

* Solar panels, with automatic switching between the mains and solar when either of them goes down

* Voltage stabilizers

* Surge protectors

On top of all that I'm currently contemplating getting a UPS for my computer specifically, because there was this one time when the solar failed and it couldn't switch to mains because it was a power outage.

One thing I miss from my student days in the US is always-on electricity. During my 3 year period there, there was only 1 power outage, and that lasted less than a minute.

Not the OP, but I assume asinine was meant to be "insane", but got messed up by predictive text.

> Six years without a power cut doesn't seem too surprising.

It depends where you live and / or you might not notice them if they happen at night.

Even in France which is supposed to be a first world country I experience a couple power cuts every month (twice at night this week-end, it killed my tile generation and corrupted my filesystem on my OSM server).

I probably should invest in a UPS instead of complaining on HN !

Every month is not normal at all. You should check with Enedis.

I live in France and I've had fewer than 10 power cuts in 20+ years.

Had 5+ years of uninterrupted power at my previous place, and at my current flat it's at 1.5+ years so far.

Conversely, I live in SoCal and every month is very normal.

The only power outages I have here in Socal have been scheduled, and they replaced several poles and cabling. Out for like 8 hours.

Maybe they happen when I didn't notice them but in 15 years of living in Austria & then Germany I think the number of power-cuts I experienced would be countable on the fingers of one hand.

Yeah, that doesn't sound right at all. I live in the UK, and we have maybe 1-2 power cuts a year max. And I have equipment running 24/7 so I'm sure I'd know.

A couple a month seems way too much to me. I know for sure my last one here in southern Germany was more than 5 years ago. The only one that I actually remember experiencing during daytime was in my childhood, more than 20 years ago.

Here in Czech Republic it is also very infregvebt to have aby non-planned power cuts.

I'm intrigued by these typos. q to g, u to v, n to b. Are you using a handwriting recognition system, by any chance?

Just typing too fast on the default keyboard of Sailfish OS running Xperia X. :)

At our home we had one, in 2004.

Wow, I thought power in France would be great. I'm in NZ, and I'm sure I've gone six years without a power cut. I have a PC that's always on (no UPS), so I'd definitely notice.

The power in France is great; I have never heard of anyone having so many cuts. The only cuts I have encountered were from falling trees due to strong winds in the winter. Otherwise I can count the number of cuts over the past twenty years on two hands.

On the other side of the Tasman they're pretty common due to summer storms. In Queensland having candles ready when we sat down for dinner was an almost daily occurrence some months, the PC and just about everything else would be turned off and unplugged by then. The best nights were when the power went early and we had an emergency Barbecue.

Over the years it got more reliable but I'm not sure if that was universal or just my specific area getting a lot more built up.

I'm in Queensland and haven't experienced any power cuts in the last 5+ years.

Until the relatively recent planned power outages because of fire risk, here in Northern California (an hour north of SF) it was once every year or two.

If I had to guess, I would think it's wildly variable based on location and the factors of that location. A more remote community with one main power line through forested area might experience power loss more often, and a larger city with higher density probably has its own problems with load spikes which may cause problems. Maybe a suburb or medium sized city with semi-recent infrastructure is actually the sweet spot for power reliability?

Here in Gilroy (southern end of the Bay Area) we get one or two outages a month during the summer. Unless it's a planned rolling blackout due to heat, they're usually only a few minutes in length, at least. The Costco-special UPSes I use (CyberPower 1350VA units) have all proven adequate for keeping my work/compute infrastructure online.

I've got solar, though, and a house battery is very much on the list once my budget allows it. I don't see local system stability getting better any time soon, and likely worse now that PG&E has more leeway to shut down power during wind/heat events and the like.

I don't remember a power cut since I moved to my flat in a big city 11 years ago. I would most likely notice because I run my server almost 24/7.

Every month sounds very bad. Even when I lived in the countryside in the least wealthy EU province (Lubelskie, Poland) it was never this bad. Maybe once a year (and usually with a warning that it would happen).

It also depends on the status of your local grid/substation. My university was on the same local distribution as a nearby hospital. The hospital had special status for receiving power, for obvious reasons. So in the 10 odd years I spent doing undergrad and post grad studies there was only ever 1 brief power outage during a thunderstorm. Meanwhile nearby suburbs probably averaged 2 power outages per year, typically for a few hours at most.

The power was very reliable until I installed a solar panel and started feeding into the grid. Since then, every time there is a thunderstorm I have to worry that an outage will happen. I kept 2 UPSes in the house until they rerouted the power arrangement to isolate the solar from the household main (by doing something about the fuse sensitivity level). There has been no thunderstorm for some time, so I'm still waiting to see how it holds up.

Power is a funny thing for computer guys.

I have lived six years at my current location in Germany and have not had a power cut, except for a burned fuse in my apartment.

I grew up in a small city in Ireland and power cuts were a couple of times a year thing.

I now live in Dublin and I've had one in 6 years.

The only one I can remember in recent times - say the last 10 years or so - is when they did “something” out at the Facebook datacenter in Clonsilla ...

Yeah, twice in the same weekend is the trigger condition to go ahead and buy a small UPS (or several, depending on how your computers are arranged). Be sure to get one that communicates, usually via USB-HID, so your computer(s) can shut off cleanly if the outage is long.

Power cuts every month don't seem very normal to me. I had a home server running for a few years without a UPS, and at one point it had over 2 years of uptime.

That's too much. I live in Spain and the last power cut was maybe last summer. It's probably less than once a year on average. Maybe way less, but I'm not sure.

"Six years is just stupid" doesn't make sense?

Not in that context, no.

The computer used to sit behind a diminutive UPS, though I don't think it was ever fully discharged; off the top of my head I can only recall two brief power outages here in the last 20 years.

Where do you live? I don't think I've had a power outage in like 20 years. (Yes, I know it happens on a daily basis in some countries.)

Third world country here...

I live in a block with a bad circuit, and we have brief <5min outages at least once a week.

I don't like it because it makes my microwave's clock unusable.

Neighbours one block away don't have that many outages.

FWIW. 256MB should handle the latest OpenBSD, with little issue (although an upgrade to 512MB wouldn't hurt if possible). The main issue is going to be pf.conf syntax.

OpenBSD has become a little bit more RAM intensive on boot recently due to relinking shared libraries and the kernel to randomize them (Think .o level ASLR), but 256MB IIRC can handle it. Boots will be slow, but if you've dealt with 6 year uptimes in the past that probably won't lead to much of an issue.

Why wasn't the router playing ball?

It would only route traffic from one pre-configured subnet.

Ah, you could not edit the config? And it didn't have a route of last resort?

I would have thought having a second router in front of the ISP router would be the way to go.

As the comment by timsworkaccount says, this is definitely Athena at JPMorgan.

For those interested, I gave a talk on Athena at PyData UK 2018 called "Python at Massive Scale". 4500 developers making 20k commits a week. Codebase with 35m LOC.

The video is here: https://www.youtube.com/watch?v=ZYD9yyMh9Hk

It covers Athena's origins, what it is used for, application architecture, infrastructure, dev tooling and culture.

I worked on that project 10 years ago as a consultant, and it was certainly strategic at that time. It was fairly well known that one of the main stakeholders was pushing for those systems and making a career out of it by jumping banks every few years.

I understand the value of it, but as an experienced scientific developer and python dev, the culture shock was huge. I don't have fond memories of working on it. It is very different from a traditional programming environment, closer to a kind of reimplementation of a smalltalk env w/ python. I believe an influence was actually an old system that used smalltalk in the 90ies at JPM.

One of the fancy things was integrating reactive programming, which worked through ugly hacks, at least at that time, by parsing code to detect dependencies. IIRC, it could manage list comprehension but not loops. They also had their own python binary w/ both python2 and python3 in one process.

The culture shock you mention was very real. I joined in 2010 (when Athena was 3 years old) and left 8 years later in 2018.

I remember spending my first few months unlearning normal Python and figuring out how to build the 'pixie graph', a lazily-evaluated Python DAG suited for calculating financial instruments. It took a while to get your head around this, but when you did it was a very powerful and productive way to build trading and risk management applications.

To get some sense of how this worked, here are two public projects on GitHub with good introductions:

* https://github.com/timkpaine/tributary

* https://github.com/janushendersonassetallocation/loman
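
For anyone who has not seen this style before, here is a minimal sketch of a lazily evaluated dependency graph in plain Python. It is only an illustration of the general idea: it is not Athena's pixie graph (which is proprietary) and not the API of tributary or loman. The `Graph` class, its method names and the node names are all made up for this example.

```python
class Graph:
    """Toy lazily evaluated DAG: nodes are computed on demand and cached.

    Hypothetical illustration only; not the API of any real library.
    """

    def __init__(self):
        self._funcs = {}       # node name -> (function, tuple of input names)
        self._values = {}      # node name -> cached value
        self._dependents = {}  # node name -> set of nodes that read it
        self.evaluations = 0   # how many times a node function actually ran

    def set_value(self, name, value):
        """Set an input node and mark everything downstream as stale."""
        self._invalidate(name)
        self._values[name] = value

    def add_node(self, name, func, inputs):
        """Register a computed node; nothing is evaluated at this point."""
        self._funcs[name] = (func, tuple(inputs))
        for dep in inputs:
            self._dependents.setdefault(dep, set()).add(name)
        self._invalidate(name)

    def _invalidate(self, name):
        # Drop the cached value and recurse into every dependent node.
        self._values.pop(name, None)
        for child in self._dependents.get(name, ()):
            self._invalidate(child)

    def get(self, name):
        """Return a node's value, recomputing only the stale nodes it needs."""
        if name not in self._values:
            func, inputs = self._funcs[name]
            args = [self.get(dep) for dep in inputs]
            self._values[name] = func(*args)
            self.evaluations += 1
        return self._values[name]


g = Graph()
g.set_value("spot", 100.0)
g.set_value("quantity", 3)
g.add_node("notional", lambda spot, qty: spot * qty, ["spot", "quantity"])
g.add_node("half_notional", lambda notional: notional * 0.5, ["notional"])

first = g.get("half_notional")   # computes notional, then half_notional
again = g.get("half_notional")   # served from cache, nothing recomputed
g.set_value("spot", 101.0)       # marks both computed nodes stale
second = g.get("half_notional")  # recomputed on demand
```

The point of the design is that changing an input does no work by itself; the cost is paid only when somebody asks for a value that depends on it.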

Sounds like a descendant of Goldman Sachs's SecDB/Slang with automatic dependency graph building. Did it have purple children (nodes whose value influences the graph structure) and twiddle scopes (modified copies of subtrees)? :-)

Just to add a bit of history, SecDB itself was an acquisition from J Aron, a commodity trading shop in the early 90s.

It was ubiquitous in GS by the mid-2000s, and was rather instrumental to GS navigating the crisis in 2008.

Being able to accurately and quickly compute risks across the entire firm's books, rather than manually merging across separate systems, was a key advantage, which encouraged JP, Citi and BofA to build SecDB clones themselves.


Yes indeed. Quite a few of Athena's early developers were from GS.

I've not heard the term "twiddle scopes" before; in Athena it is "tweaking".

Come to think of it, it might have been a “diddle scope” (I’m not a native speaker).

For finance applications (risk) the whole concept works quite well, doesn’t it - combining the natural advantages of code and Excel.

It was indeed "diddle," not "twiddle." At BAML, it's a "tweak." Beacon Platform is another implementation by the same team (but with support for the public cloud, and many more advanced features, I believe, including tighter web integration). I think it uses the terminology "bind."

[Disclosure: I was part of the Quartz Core team in 2011/2012.]

I was positively surprised that Slang was actually quite usable. I had expected much worse.

I suspect the Common Lisp influence was beneficial.

The weirdest thing, language-wise, that I noticed was scoping. It's neither dynamic nor static scoping, but weird scoping. (But there are workarounds to get something like static scoping.)

Outside of the language, the whole CVS-based version control and review process was weird. But understandable as a product of the late 1990s, when review-before-going-into-permanent-history must have been way ahead of its time.

Slang was OK as a language. The SecDB/Slang ecosystem was years ahead of its time and all credit to its inventors and maintainers, but a monorepo full of decades of critical code from thousands of developers still gives me palpitations.

> The weirdest thing, language-wise, that I noticed was scoping.

Not the spaces in variable names?

Those annoyed me immensely. I can feel my heart rate go up 10 BPM just thinking about it. I guess it was meant to make scripts more readable to minimally-techy people, but actually just made it harder to parse.

No, actually not. I found that quite refreshing and occasionally useful.

But I was used to different and exotic conventions from the obscure languages I played with over the years.

Loman author here. Thank you very much for the mention. Amazed that I never heard of Athena or pixie graphs. Our intention with Loman was to create a library scoped for a single process - we looked at the possibility of creating a system responsible for executing much larger graphs on a real-time ongoing basis, but it felt like a larger project than we'd be able to execute well. It sounds like Athena was that, and it worked well, subject to being a culture shock for people coming into it?

A similar library from another asset manager - https://github.com/man-group/mdf. Although MDF seemed to work at the level of timeseries instead of scalar values.

I really like this video for explaining the DAG - https://www.youtube.com/watch?v=lTOP_shhVBQ

Man AHL also had a version that worked on top of nodes representing timeseries: https://github.com/man-group/mdf

Hi Steve, thanks for your talk and for the links. Are those repos pixie dependencies? Or are you saying that pixie works like those repos?

They are "toy" projects illustrative of the general concepts. ("toy" as in smaller in scope; one of the two is by a former colleague).

JPM's actual pixie code is proprietary, extremely performant after 12 years of pushing into bigger and bigger scale problems, and is definitely not on GitHub!

> I believe an influence was actually an old system that used smalltalk in the 90ies at JPM.

That system you're thinking of was Kapital.


It's the only Smalltalk usage at JPM I believe, they could have doubled-down on it but instead went to Python and as you say, tried reinventing the wheel in it.

> tried reinventing the wheel in it.

That's not reinventing the wheel, that's building a new wheel you hope performs better/different with different materials and maybe some different techniques, and has a long and storied history of both successes and failures.

Reinventing the wheel would be if they published this system and some other bank recreated it without using what they published, thus "inventing" something that already existed and was available.

I never realised Kapital was written in smalltalk. Interesting

Apparently when it booted in the terminal, there was a nice ASCII portrait of Marx. I also have it on good authority that the program that launched the Smalltalk image from the terminal was called "das," so you literally would type "das kapital" to get the thing going.

> IIRC, it could manage list comprehension but not loops.

If it was like the predecessor system I worked with, it could handle either. But in a list comprehension, it could figure out the individual dependencies of the individual list items.

So, suppose you had a function that took in a list, and returned a list with some computations performed on each element. If done with a list comprehension, and one input element changed, only the corresponding output element would need to be recomputed, and only that would be recomputed.

If done with a loop, and one input element changed, the entire output array would be recomputed (even if all but one element remained unchanged - it would be hard to guarantee that property by code inspection at the time the dependency graph is built).
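
A rough sketch of that difference, in plain Python. This is not how the real system did it (that worked by inspecting the code); the two classes below are hypothetical and simply hard-code the two caching granularities so the recompute counts can be compared.

```python
class ElementwiseNode:
    """Caches one output per input element, as a graph could do if it
    understood that each output item depends on exactly one input item."""

    def __init__(self, func):
        self.func = func
        self.calls = 0       # number of times func actually ran
        self._inputs = []
        self._outputs = []

    def evaluate(self, inputs):
        inputs = list(inputs)
        outputs = []
        for i, x in enumerate(inputs):
            if i < len(self._inputs) and self._inputs[i] == x:
                outputs.append(self._outputs[i])  # unchanged element: reuse
            else:
                self.calls += 1
                outputs.append(self.func(x))      # changed element: recompute
        self._inputs, self._outputs = inputs, outputs
        return outputs


class WholeListNode:
    """Treats the loop body as opaque: any change to the input list
    forces every element to be recomputed."""

    def __init__(self, func):
        self.func = func
        self.calls = 0
        self._inputs = None
        self._outputs = None

    def evaluate(self, inputs):
        inputs = list(inputs)
        if inputs != self._inputs:
            result = []
            for x in inputs:
                self.calls += 1
                result.append(self.func(x))
            self._inputs, self._outputs = inputs, result
        return self._outputs


def double(x):
    return x * 2


fine = ElementwiseNode(double)
coarse = WholeListNode(double)
for node in (fine, coarse):
    node.evaluate([1, 2, 3])  # first evaluation: 3 calls each
    node.evaluate([1, 5, 3])  # one element changed
```

After the second evaluation `fine.calls` is 4 (only the changed element was redone) while `coarse.calls` is 6 (the whole list was redone), even though both return the same result.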

> python2 and python3 in one process

This would have had me running screaming from the building

I stared at my screen with deer-headlight eyes for half a second while hearing screaming in my head too.


> I believe an influence was actually an old system that used smalltalk in the 90ies at JPM.

Imagine the benefit to the business had they just stuck with Smalltalk the whole time! It is remarkable how many businesses could have been quietly making money for years using a high-quality dynamic language such as Smalltalk or Lisp rather than going through the whole C/C++/Java/Python/Ruby/JavaScript/Go treadmill.

But I guess if they had done that then their programmers might not have gotten to type as many {}! Nor gotten to reinvent the wheel so often.

Kapital (the Smalltalk system) wasn't much of an influence on Athena. Athena was very much a SecDB/Slang derived effort.

And Kapital was an interesting experiment but rightfully died. Smalltalk just wasn't the right environment for something of that complexity and scale. I'd expect most lisps to suffer in the same way. And TBH it was more the image-based model than the dynamic language per se, though that was an issue too.

There's a point where dynamism becomes a liability and Kapital went way beyond that point. Python fares a bit better but still, a huge effort is needed to control its failings at scale. This isn't helped at all by sticking a DAG in the middle of it as was done with Athena and its ilk. This all can be made to work but in the "cos its Turing complete" sense rather than the language helping in any meaningful way.

Source: I suffered through it.

> This all can be made to work but in the "cos its Turing complete" sense rather than the language helping in any meaningful way.

I had to LOL. You have very nicely put into words an experience I bet many have had.

> I'd expect most lisps to suffer in the same way.

Why? Eg Scheme or even the uglier Common Lisp are quite disciplined. And Clojure ain't too bad either.

As far as I can tell, Lispers don't usually go around monkey-patching things.

Elispers certainly do!

Oh, definitely. But no one in their right mind would use emacs lisp. Or any dynamically-by-default scoped Lisp.

Kapital was also rewritten in Java and survived in a few successor places ('Derivatives Studio' at CSFB). There it was my introduction to 'thunks'.

I don't know anything about Smalltalk, but one issue that was highlighted at that time was the inability to scale the number of "objects" to the required numbers (I can't remember the exact terminology, sorry). They used one of the Smalltalk providers of that time, and went way beyond the upper limit.

That was the official story anyway. A lot of things were claimed when I worked there, a lot of it BS.

The culture in banking was definitely not for me: generally fairly smart people, but a very perverse culture. This was the only environment I ever worked in where people would actually lie to my face to make sure I made mistakes when coding and failed. Everything was custom, and sometimes that meant Athena, sometimes that meant terrible systems such as a "distributed database" that was a single-threaded wrapper on top of a C++ STL hashmap running on NFS. The guy behind it was reworking its protocol, basing it on UDP instead of TCP, because said system was too slow...

There was 0 abstraction, so if you wanted something such as the EUR/USD pair, you had to ask the one guy who knew which ID it was, so that you could fetch the data in code. To this day I am convinced this was done on purpose for job security.

And all the time they saved they could have used to train new devs, reinvent a(nother) new vcs, figure out a new way to deploy image based software etc?

(I realize Smalltalk is great, but my point is it has its issues as well; otherwise I reckon sooner or later the advantages would be so clear that Amazon or Google or someone would be all over it since it would give them a competitive edge ...)

Hi Steve, nice to see you on here and doing well! :)

My first thought was Athena as well, but it could be Quartz (it started in 08-09 at BaML, vs Athena starting in 2006).

Both Citi and Barcap also tried (albeit to less success than JPM/BaML) to do similar things.

With that said, for those unfamiliar: at least two major banks run insanely large production Python installs across thousands of developers. Core skills (nowadays) are Python and React - and yes, we’re all actively hiring :)

All of the large banks also have pretty good open source initiatives - just check out the GitHub repos for JPMorgan, BaML, or GS. We have also been active participants in PyCon (key sponsors and doing sessions, like Steve’s) since at least 2009.

This thread is a real trip down memory lane!

I joined JPMorgan in 2010, as the first bank in London to offer me a Python role. I ended up staying for 8 years, almost all of it working in Athena: Commodities and FX trading; a bit of Equities; then with the Athena Core team working on the machine learning environment.

I learned a ton, worked with many hugely impressive people - on both the trading and technology sides of the business - and left with lots of good memories.

I remember interacting with you at one point. Time flies!

The word "bank" in the first sentence [0] links to the Wikipedia page for JP Morgan, so I'd wager it is Athena.

[0] "There was one remarkable legacy system when I was working in a bank sometime back."

A serious question, how do you feel about working at such a powerful bank that has been involved in so many controversies, to put it lightly?

It is not a far-fetched argument that institutions like that are only a burden to society, existing only to widen socioeconomic gaps. The point of view of an insider would be very interesting, I think.

Do you believe banks are a net negative to society? Or more about institutions that partake in trading and the like?

This is really quite unlikely to be Athena, which isn't scheduled to complete python 3 migration until Q4 2020 according to:


The update isn't a job for one person, and I don't think it is finished yet, unless it is ahead of this alleged schedule...

This is talking about "Athena Server Pages" which is a web framework within Athena.

Ah! Not sure how you figured that out, but it would make sense and explain the "legacy" comments (Athena is not legacy as far as I know but a component like that could be).

As someone who has worked with these Python-based bank frameworks on the UI side, I can say the UI frameworks leave a lot to be desired. Lots of reinventing the wheel (resulting in an inferior wheel).

Thankfully, both JPM and BAML seem to have started to ditch the UI frameworks, at least on the web side (in favor of React mostly). I don't know what the situation is on the desktop UI side. But I know what they had before was loathed by many.

Developers hated it because it was like working with one hand tied behind your back. Traders hated it because developers couldn't deliver the best product because they were working with one hand tied behind their back.

I presume the core teams responsible for building the framework hated it because trying to get support for the proprietary UI tools was painful.

I imagine hiring managers and HR hated it because they would blatantly lie about the job - that the role would be WinForms/WPF (desktop) or JavaScript (web) UI development, without mentioning that all of that was really wrapped up by the proprietary Python framework, and very rarely would you get to touch the underlying industry standard tech stacks.


I thought it sounded like Athena. It’s the best CI/CD I’ve ever used despite its quirks. I speak fondly of it at the jobs I’ve gone to since.

I've lost count of the number of internal projects I've seen named after mythological figures.

> I've lost count of the number of internal projects I've seen named after mythological figures.

The Athena team ran out of names and one major component of it is called Bob.

Is it the job scheduler?

From the discussion here, it sounds like this is a Python version of K's (built-in) trigger + dependency mechanism, which traces its origins to A+ and A; A and A+ were originally developed inside Morgan Stanley by Arthur Whitney, and K was developed for a bank client IIRC (UBS? or MS?).

This sounds really interesting; can you elaborate on it at all?

K4 uses : for assignment, and :: for dependency, so:

    c::a+b

means c is defined to be the sum of a and b and dependent on both. If either a or b changes, c will be marked stale, and the next time you use it, it will be recomputed. It's essentially as simple as that, and is unified with database views - e.g.

    total::select sum(balance) from accounts

means that any insert/update to the "accounts" table will mark "total" as stale, and the next use will recompute it, but otherwise it behaves like any other variable.

I'm not sure if the K4 engine is smart enough to avoid recompute on updates to other columns.

If you want more fine grained control, you can set up triggers - e.g. "execute this code on update to variable a" - where your code gets the changed indices of a list/vector (or changed tuples of a table) as arguments; this is useful if a plain complete recompute is too costly, or you need to do something at the time of change, rather than at the next evaluation.
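
To make the trigger idea concrete for non-K readers, here is a small Python analogue. It is a sketch of the concept only, not K syntax and not how the K engine is implemented; `ObservedList`, `on_update`, `assign` and the `running` dict are hypothetical names invented for this example.

```python
class ObservedList:
    """A list that tells registered callbacks which indices changed."""

    def __init__(self, values):
        self._values = list(values)
        self._triggers = []

    def on_update(self, callback):
        """Register code to run whenever elements are updated."""
        self._triggers.append(callback)

    def assign(self, updates):
        """Apply {index: new_value} and fire the triggers with
        {index: (old_value, new_value)} for the elements that changed."""
        changed = {}
        for i, new in updates.items():
            old = self._values[i]
            if old != new:
                changed[i] = (old, new)
                self._values[i] = new
        if changed:
            for callback in self._triggers:
                callback(changed)

    def __iter__(self):
        return iter(self._values)


balances = ObservedList([100, 250, 40])

# Keep a running total up to date incrementally instead of re-summing
# the whole list after every change.
running = {"total": sum(balances)}


def adjust_total(changed):
    for old, new in changed.values():
        running["total"] += new - old


balances.on_update(adjust_total)
balances.assign({1: 300})  # only index 1 is passed to the trigger
```

The trade-off is the one described above: the trigger does work at the moment of the change, but that work is proportional to what changed rather than to the size of the data.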

K2 and A+ had a similar system, each with different syntax.

IIRC the folks who worked on SecDB at GS bounced around Wall St. recreating this abomination in Python.

And wasn't Athena JPMorgan's attempt at re-creating something like Slang and SecDB?

Yes. If I'm not mistaken it was from the same guy (and team?) that created Slang/SecDB at Goldman?

The same team subsequently moved on to Bank of America to create Quartz, which is BoA's version of the same concept.

Then subsequently moved on to found Washington Square Technologies, where I think they're doing something similar, but not attached to a specific bank? Not 100% sure.

I've heard the concept was shopped around to a few other banks as well, but did not take root.

I have seen stuff done two decades ago in the defense industry which put to shame some of the projects I am working on nowadays: extremely modular architecture, very good service-oriented framework with hot-swappable and hot-reloadable components, automatic distribution and automatic redundancy for fault tolerance.

There really are some gems out there.

Ditto here. My friend Jan de Leur architected the solution for the KVSA, the company that arranges for a very large fraction of all container shipping in the world to be distributed across shipping companies and vessels. We're talking 1983 or so here. The system used QNX, a very early service-oriented operating system, and was absolutely rock solid and scaled horizontally across many systems (it had to, given the typical system of the day). Using commodity hardware in a cluster of several tens of PCs, the system was hardware fault tolerant and, as far as I know, never went down except when someone deliberately caused the air conditioning to fail. On a typical Monday it would handle several million container loads without a hitch.

Hmm... it's almost like the organizations who solicit the development of software are not optimizing for the quality of that software, but some other variable. One day I'll figure out what it is.

I had the misfortune to work on (and re-write) a legacy system in a previous job and I constantly alternated between admiration and horror.

In some places the web GUI was designed exceptionally well, with items placed together precisely, all laid out expertly around the specific domain of concern.

On the other hand, there were race conditions, corrupted data, SQL injection, mountains of source files that re-implemented the same things, IE6-7 era compatibility issues, etc.

Then there were the mysterious parts of the program which I never understood, like the auto-email capabilities whose scripts could never be located, mysterious mirror servers that I would figure out existed by looking at the IP addresses of odd requests, and little bits of domain logic which I would accidentally break and have to cobble back together.

In a lot of ways, many of the problems just stemmed from age, it was an early 90's application in an early-2010's world. There definitely were some not ideal software development practices that contributed to difficulties as well. But I still had a sense that the previous developer had crafted this beautiful, unique, intricately complex and inter-dependent little world.

Sometimes I was sad I had to brush all of that aside even when my changes made things more robust, reliable and compatible with the modern world.

> In a lot of ways, many of the problems just stemmed from age, it was an early 90's application in an early-2010's world. There definitely were some not ideal software development practices that contributed to difficulties as well. But I still had a sense that the previous developer had crafted this beautiful, unique, intricately complex and inter-dependent little world.

I've had exactly that experience. It was an old Apache mod_perl CGI app that used labels and goto statements for flow control, and for well over a decade nobody wanted to touch it because of that. Once I dived in to change it I realized there was a method to the madness. Mod_perl essentially takes your code and wraps it in a sub (function), so if you have subs in your script and use any sort of globals, those turn into function variables, and their use in the subs causes closures. By emulating function calls with goto and labels this problem was side-stepped.

Not how I would have done it (and not something I kept as I revised the script), but what at first appeared to be complete insanity eventually ended up being based on the environment and the time it was built in. It's a good reminder that often what appears to be stupidity of the prior programmer is actually your own misunderstanding of the problem (and probably poor documenting of the problem as well...)

I don't know if I'm unique in this regard, but I actually find it rather fun / interesting to work on the "rewrite the legacy system" projects. I generally find it to be an interesting puzzle.

It takes a lot of patience but it is oh so rewarding. I like to think of it as "software archaeology".

It's fun to gradually discover the model that the programmer was intending to program, then to see where it broke down X months/years later and was fixed with a hack. Then, you get to design a new model that fits both cases elegantly, instead of just a subset.

You kind of feel like the scientists in Jurassic Park! :)

"software archaeology" is definitely a way I'd choose to describe it as well!

My favorite activity (in life really) is to "make order out of chaos". Refactoring, from a tiny feature to a full rewrite, is a prime example of this. Adding tests, adding documentation, all of it. Loves it!

In real life I get "chaos -> order" highs from organizing my collections, tidying up house, doing dishes, etc.

If I look around a code base or my house and there is chaos, I get stressed. If everything is "in its spot", ordered, understood, I literally feel a wave of "ease" wash over me and I am able to relax.

I like working on legacy systems too, but not to rewrite them wholesale, as that just tends to cause more problems --- instead, to carefully maintain and extend them.

We often call things legacy because they were made a while ago, or the person who made it isn't at the organisation anymore, or — frankly — because we weren't involved in making it. But those are just proxy indicators for the true indicator of legacy technology: how difficult it is to change.

As far as I can tell, the most consistent definition of "Legacy" is "Existed before I got here"

By that standard, this article would not be about a legacy system since it turned out to be easily maintainable.

I think Bob Martin said something like:

"Legacy code is code without tests"

Meaning if a codebase has no tests, its API and behaviours are extremely hard to determine and change without breaking things.

Michael Feathers, in Working Effectively with Legacy Code, uses that definition.

Super good book if you're dealing with legacy systems at work.

Mid-aughts I created a thing for healthcare data exchange. Inspired by postfix. Today you'd probably compare it to AWS Lambda and Kinesis. But radically more useful, useable. At the time, I considered it the anti-J2EE.

We could onboard "interface developers" in a week. As in first time seeing Java and deploying code to production. Normal onboarding time was 3-6 months. We figured we had a 10 year backlog of work using the more traditional tools (eg BizTalk, SeeBeyond). Our firm awarded a $20k bonus for hiring referrals, demand was so great.

So my stack was a serious competitive advantage.

Alas, it was too simple. Our startup got acquired by Quest Diagnostics. They loved themselves some InterSystems Caché. Their PHBs (Directors and VPs opposed to the acquisition) just couldn't handle that my stuff ran circles around theirs. So of course it had to be killed in the crib.

FWIW, technically, an "interface" (what healthcare calls something that munges HL7 or equiv) was just self contained code. Straight up data processing. Input, transform, output. Pass in a stream and some context, stuff comes out. Compose these snippets just like in Unix. In development, you'd just run it from the command line (or IDE). In production, we had a spiffy process runner with a spiffy management UI. Built-in logging, metrics, dashboards, etc.
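To make the "input, transform, output" idea concrete, here is a minimal Python sketch of composing small stream stages like a Unix pipeline. It is not the original (Java) system; the stage names, the `ctx` dict and the `LAB01` source tag are all made up for illustration.

```python
import io
from typing import Callable, Iterable, Iterator

# A "stage" takes an iterable of text lines plus a context dict and yields
# transformed lines. Every name below is hypothetical.
Stage = Callable[[Iterable[str], dict], Iterator[str]]


def strip_blank(lines: Iterable[str], ctx: dict) -> Iterator[str]:
    """Drop empty lines and trailing newlines."""
    for line in lines:
        if line.strip():
            yield line.rstrip("\n")


def tag_source(lines: Iterable[str], ctx: dict) -> Iterator[str]:
    """Prefix each record with the sending system taken from the context."""
    for line in lines:
        yield f"{ctx['source']}|{line}"


def compose(*stages: Stage) -> Stage:
    """Chain stages left to right, like `a | b | c` in a shell."""
    def pipeline(lines: Iterable[str], ctx: dict) -> Iterator[str]:
        for stage in stages:
            lines = stage(lines, ctx)
        return iter(lines)
    return pipeline


# An "interface" is just a composition of self-contained stages.
interface = compose(strip_blank, tag_source)

# In development you would feed it a file or stdin; here an in-memory stream.
sample = io.StringIO("MSH|a\n\nPID|b\n")
result = list(interface(sample, {"source": "LAB01"}))
print(result)  # ['LAB01|MSH|a', 'LAB01|PID|b']
```

Because each stage only sees a stream and a context, the same code can run from the command line or under a process runner without changes.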

I did some AWS Lambda stuff at my last gig. I absolutely fucking hated it. The managed scaling is maybe nice, but table stakes these days. The programming model is just a kick in the berries.

PS- Word about InterSystems. Of course I tried to be a good soldier. My architecture was more important than the implementation language, persistence engine; I really didn't care what tools we used. Oh boy. Caché might be the worst tool I've ever used. For example, one time a compiler error bricked my entire dev runtime, unrecoverable. The Caché partisan mocked me "What did you do?!" Um, a typo. "Duh, don't do that!!" Apparently Caché self immolation is normal. So keep regular image snapshots. At the time there were no version control options; you'd export "source" and hope it'd reimport later. Ludicrous.

I've worked at a few banks and early fintechs, starting 30 years ago. I've seen some very remarkable systems well ahead of their time:

realtime distributed stream processing in the late 90s

complex event processing before it was called that

distributed application frameworks

I wouldn't call them "ahead of time" - there was a need for a certain type of service and smart people recognised good solutions. It may have been ahead of time from general availability of that tech, and internal frameworks from Big Banks are rarely discussed publicly. E.g. by far the best build/deploy system for C++ I've ever seen will likely remain locked inside a certain Big Bank, never to be seen outside their walled gardens.

Are you referring to GS’ meta-make?

No, not GS. Am not at liberty to say I'm afraid

By the title, I was ready to read an article about software that was created in 70's or 80's. Nope, 2008.

At my work, we have (all in-house and most are more than twenty years old):

  - A (much more complicated) make clone
  - A test running framework
  - A remote session tool
  - A test specification framework
  - A preprocessing tool
  - At least six domain-specific languages for specifying compilers, assemblers, linkers, simulators.
Of course, you also need to know Linux, bash, C, C++, Perl, and Python.

Needless to say, it takes some time to get up to speed. On the other hand, you can run some very simple commands and have a bunch of servers run hundreds of thousands of tests on your code, on different OSes.

Out of curiosity is that an EDA company?

We make compilers for embedded platforms

Sounds like Athena (JPM) or Quartz (BAML). Though as I understand those have a lot more official buy-in than the article suggests.

Same DNA, aren't they? i.e. all created by ex-Goldmans people that worked on SecDB and similar for Mike Dubno [http://dubno.com/wsj/giveaway.html]. I know that is definitely the case for BAML. In fact, Dubno came out of retirement to work there for a while.

That’s what I understand too. (I worked on the Quartz Core team for a couple years at BAML.)

I was at BAML just as quartz was starting, but never worked on it. I heard all the stories of them creating their own bi-temporal database etc.

I also heard in the early days it was a bit resource heavy on a user's machine.

He's definitely talking about JPM. Since he's drawing parallels to AWS Lambda, I reckon he'll be talking about their grid compute.

Not sure when the bank deprecated IE though - it was certainly in use for most internal sites when I left in 2018.

By that they mean chrome is the “strategic” solution and all new things should target chrome. IE is still required for many internal sites. They are setting up to move to Edge so both the old and the new can be accessed in one browser

So that's why Edge exists.

Nah it exists because an endless stream of security bugs is a headache.

But... if I cannot describe it as a buzz word on my resume, clearly it is worthless.

You don't need a resume if you stay at the same employer for 40+ years. Win-win!

> You don't need a resume if you stay at the same employer for 40+ years. Win-win!

If you're happy there, why not. Unfortunately over the last few years I got the impression that loyal employees don't necessarily get the appreciation they deserve from their employers.

Getting a raise often becomes increasingly hard once you're working there for long enough. Switching jobs or even companies is often the best option to get a raise or better benefits.

Most companies nowadays assume their engineering hires will stay at most two years, so they see giving a raise or promotion as pointless and a waste of money.

This is self-perpetuating. It's been my experience that moving companies has gotten me +20% in pay for doing the same work with the same title. If employers assume people will leave after 2 years and hence don't give promotions and give paltry CoL raises then obviously why would people stay? So these employers won't give raises to their current workers but will happily give new employees +20% every 2 years just for swapping names on their Resume? Obviously something doesn't add up here.

Exactly. I like my current job at $COMPANY, but I also know that they have a pretty negative procedure for raises and promotions, even if you do an exceptional job. So, is anybody hiring? :)

They prefer to bring in/hire more candidates from other countries and pay them less than what current employees ask for, because it is cheaper.

From the other side, it seems like staying at one company for say...more than 5 years tops?... is considered a red flag. See that earlier HN thread about "expert beginners" for one reasoning.

Staying in the same role for more than five years would suggest stagnation to me.

There's plenty of big pond companies with enough roles to keep someone learning and improving for decades.

I am starting year 6 at my company, which would fit your description of big pond. I'm also painfully aware I'm stagnating.

That said, unless I completely change my role and/or specialization, I don't see any room to grow.

I'm a frontend engineer with some fullstack mixed in. Going backend would certainly be an interesting change, but it would be at the cost of atrophying frontend skill/knowledge, the end result being it would be harder to get a future FE (atrophy) or BE (beginner/mediocre skillset) job. Especially since companies these days seem to be creating FE specific interview tracks.

> unless I completely change my role and/or specialization

That would be the point, yes.

I definitely did not mean to imply that spending 6 years doing the same thing at a big company is in any way better than doing the same at a small company.

My point was that since a large company likely does allow you to completely change your role, domain and specialization without quitting, you should look at years-per-role and not years-per-company.

Don't the majority of FAANG engineers cap out at Sr. Engineer and that's a title you can get just a few years out of College? So are +/-50% of Google engineers stagnant?

For companies as prestigious as FAANG, I think such rules are thrown out the window. Assuming you choose to leave, you probably will have no shortage of companies asking you to come work for them.

It's a valid point. I should have specified that I meant stagnation skill-wise, not career-wise.

Don't they have a complex, coded level system?

I'm sorry but I cannot, in my right mind, recommend anyone ever use Tornado. Not least because Python now natively supports async/await. There was a thing.. an unholy union of threads, sleeps, and tornado.gen co-routines. It would always break. How could it not, when mixing every concurrency paradigm imaginable? When it did, everything started burning. Data stopped flowing. Many an engineer tried and failed to fix it. Have you ever seen three TLS client hellos on the same TCP connection? I have. Tornado is actually the most accurate name you can give the framework: a big whirling mess. Maybe I'm being too hard on Tornado, but I kinda blame Tornado for introducing co-routines before Python was ready for it. The company has since moved on to golang, but the tornado is still whirling and will be for years to come.

Also at a bank, a file/database server that had two physical sides. Each side could grab a cartridge containing basically a CD-R, push it into a reader, and read the track of data based on a file-system-like record in an Oracle database. This was done to meet SEC Rule 17a-4 (Write Once Read Many) requirements.

It was old, like "Side A got stuck have to run to the DC to fix it" old and you would get errors accessing those files but not side B files. I'm guessing 15+ years. Weirdly you can grab a PDF off this thing in less than 2 seconds. This had to exist at other organizations but this was the only one I saw.

Now Azure and AWS provide the same service for pennies on the dollar.

My father worked IT with a medical record imaging system that is very similar to what you're describing. They called them platters, and they were like CDs except the size of a dinner plate or so if I remember correctly. And there was basically a giant jukebox system to switch them out.

Sounds like a proper old WORM drive with integrated media changer - had seen one in telco I worked at, it's also referenced by Plan9 file server which used one as its main storage.

I never personally developed on Athena but I remember it requiring AIM (an internal distributed file system sort of like NFS built on Hadoop that had a fairly complex data model). All the Python libraries were compiled and linked against the AIM fs and required some shenanigans to import modules making the code not very portable

Athena had two levels of code deployment:

* The core Python libraries and third-party modules, with weekly rolling releases of new/old versions for controlled deployment. These came from internal repos (the AIM system you mention) so that security scanning and licencing was checked.

* Application-level code, with Python imports taking source code from a database rather than the filesystem.

These are explained in detail in my PyData talk video, linked in my other comment.
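For anyone wondering what "imports taking source code from a database" can look like, here is a rough sketch using Python's standard import hooks. This is not Athena's actual mechanism; the `modules(name, source)` table, the `DatabaseFinder` class and the `pricing_rules` module are all assumptions made up for the example.

```python
import importlib.abc
import importlib.util
import sqlite3
import sys


class DatabaseFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Resolve imports from a `modules(name, source)` table (hypothetical schema)."""

    def __init__(self, conn):
        self.conn = conn

    def _source(self, fullname):
        row = self.conn.execute(
            "SELECT source FROM modules WHERE name = ?", (fullname,)
        ).fetchone()
        return row[0] if row else None

    def find_spec(self, fullname, path=None, target=None):
        # Returning None means "not mine"; the import then fails or is
        # handled by another finder.
        if self._source(fullname) is None:
            return None
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        return None  # fall back to the default module object

    def exec_module(self, module):
        # Compile the stored source and run it in the new module's namespace.
        source = self._source(module.__name__)
        code = compile(source, f"<db:{module.__name__}>", "exec")
        exec(code, module.__dict__)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE modules (name TEXT PRIMARY KEY, source TEXT)")
conn.execute(
    "INSERT INTO modules VALUES (?, ?)",
    ("pricing_rules", "def fee(amount):\n    return amount * 2\n"),
)

# Appended last, so the database is only consulted when the normal
# filesystem lookup finds nothing.
sys.meta_path.append(DatabaseFinder(conn))

import pricing_rules  # source comes from the table, not from disk

print(pricing_rules.fee(21))  # 42
```

Updating a row and re-importing (for example with `importlib.reload`) is what makes this style of deployment fast, at the price of code that is hard to run outside the environment.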

> Maybe one third of internal apps developed in the last years don’t work at all outside of Chrome, the page remains blank or has broken widgets all over the place.

Does this statistic match what others have seen? If so, that’s staggering, and unfortunate. I know there are more sites that do this than there used to be, but I wasn’t under the impression it was such a large proportion.

I can confirm that this is the case within JPMC. Old things only work in IE and new things have been tested most in chrome. But this stat is quite high just because chrome is the “strategic” solution so it is seen as unnecessary to make things work elsewhere.

As a devout Firefox user it is frustrating (especially being told your web browser is no longer supported).

They are setting themselves up to eventually migrate to Edge so that one web browser can access both the old legacy sites and the new stuff.

I don't think something created in 2008 can be considered a legacy system

The date doesn’t matter so much (to me at least). Legacy means lack of tests and the inflexibility to change the system. Working on these systems it’s easy to have a part of the app become obsolete in 2-3 years because it’s written by someone who doesn’t really care about technology and they want to “deliver for the business” so they crank out some shitty code with no tests, documentation, or design of any kind (eg encapsulation). The business rules are complex and not explained or understood. Then the main developer leaves and all that knowledge vanishes out the door.

I've seen systems that were legacy before they were written.

It definitely can. We have a project on our books that was started in 2010. It was built using Flex (Flash) so was already legacy before one LOC was written.

I'm dealing currently with legacy code that is less than 1 year old. The discussion is already at the level of "we would do the rewrite from zero if not for timeline issues".

I'm working on code that was built three years ago. The quality is so low that it would have been considered legacy code as soon as it was deployed.

Well, this may be one of the more interesting and useful threads I've seen here in a while: Systems that are insanely robust; Well engineered; And have withstood the test of time among an ocean of hype. You don't see that very often. +1 for everyone's stories

You've made an old guy feel even older LOL!

A couple of years ago I inherited a software engineering department, near the bottom of a downward spiral shipping 25+ years of VB6 code to customers. Brought in external help, and we spent a year and a half rebuilding a modern version. During that project, I saw the most awful code on a daily basis. I still have nightmares about it!

I had something similar, currently running 3k "lambdas" of rules to check if a user action is fraud or not. Running 60k qps with 32 nodes, Python2+NSQ.

A new "lambda" takes about 1-60 seconds to propagate to all nodes.

The article does not make it immediately obvious what language version the original made-in-2008 product was written in. Python 2.7 was not released until two years later, in summer 2010.

It seems it is Python. The author specifically mentions an upgrade to Python 2.7

I suppose <2.7 then. I was thrown off by "...every good thing must come to an end; Python 2.7 was going end of life.", implying that the product was written for 2.7 and was hence facing its deprecation problem.

Python 2.7 was the last of the Python 2.x releases. They likely started with an earlier version, kept upgrading to 2.7 and then had to bite the bullet and actually rework it for Python 3.x (which is a highly similar but different programming language). Which is why that quote makes sense.

If it was only so easy. In the real world, IE might still be required for some obscure critical applications that are difficult and/or expensive to replace.

I'm surprised the SOX auditors didn't shut this down. It's the type of stuff the ITIL fanatics hate and makes IBs toxic places to work.

Remember that the alternative would’ve been a couple of massive Excel sheets held together by a few shell scripts and an intern.

A disturbing amount of the world is held together by Excel spreadsheets with no version control.

Agreed but the info sec nazis wouldn't care because there is SoD.

Fascinating story, but then it ends with a bust: 5 seconds to do anything? Seems like a show-stopper for real work...

Speaking as one conversant with how IT departments at financial institutions of this size work: this is most likely a system for running contract valuations. A five second lag between when the valuation is requested (process launched) and when it completes is entirely acceptable if it can be scheduled for running at regular intervals or on triggers if required, and five seconds is fine for the occasional interactive (user-initiated) dalliance.

This is almost certainly JPMC. I wonder how many systems are also relying on COBOL/mainframes?

This “bank tech is stupid” meme is ill-informed.

There is spectacular technology developed at banks, you just don’t hear about it because of secrecy.


Reads cool until you reach the "5 seconds baseline" part.

Computations in finance can be quite time intensive, and are typically repeated many times with slightly modified (bumped) inputs to compute risks (finite difference approximations to partial derivatives). An initial 5s startup is negligible then.
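A toy sketch of that "bump and revalue" pattern, in Python. The `price` function is a made-up stand-in for a real (and far slower) valuation, and the input names and bump size are assumptions for illustration only.

```python
import math


def price(spot: float, rate: float, years: float) -> float:
    """Toy valuation (hypothetical): forward value under continuous compounding."""
    return spot * math.exp(rate * years)


def bumped_risk(pricer, inputs: dict, name: str, bump: float = 1e-4) -> float:
    """Central finite difference: bump one input up and down, then revalue."""
    up = dict(inputs, **{name: inputs[name] + bump})
    down = dict(inputs, **{name: inputs[name] - bump})
    return (pricer(**up) - pricer(**down)) / (2 * bump)


inputs = {"spot": 100.0, "rate": 0.02, "years": 1.0}

# Two full revaluations per input: with an expensive pricer and thousands of
# inputs, this repetition dominates the run time, not the startup cost.
risks = {name: bumped_risk(price, inputs, name) for name in inputs}

for name, value in risks.items():
    print(f"d(price)/d({name}) ~ {value:.4f}")
```

Each risk number approximates a partial derivative of the price with respect to one input, which is why the valuation gets repeated so many times.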

I know. It's OK to add 5 seconds to a lengthy financial calculation, from a purely practical point of view. Nevertheless it still looks awkward. My entire computer boots in a comparable time nowadays (perhaps some seconds more), and somehow a Python app takes this long just to start up in such an environment.

Cloud-like run environments would typically have some network access to do. Comparing booting an OS from a local SSD with booting from network-reachable resources isn't really fair, in my opinion, due to the huge difference in performance of the "resource bus" in each case.

AWS lambda is also slow to warm up

It's an order of magnitude faster in terms of warm up times - Python Lambdas warm up in ~100 ms.

Depends on the base you use for orders of magnitude. In base 10, 100→5,000 ms is 1.7 orders of magnitude, but in base 2 (which I think is useful because we intuitively understand that twice as long or half as long make a big difference) it is 5.6.

I don't know if natural logarithms make sense in this context, but if so: using base e 100→5,000 ms is 3.9 orders of magnitude.

Regardless, it is IMHO a reasonably big deal, a bigger deal than 'an order of magnitude faster' indicates.
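For anyone who wants to check the arithmetic, a quick Python snippet reproduces the three figures from the 100 ms and 5,000 ms numbers quoted above (the variable names are just for illustration):

```python
import math

ratio = 5000 / 100  # 5 s startup vs. ~100 ms warm-up, i.e. a factor of 50

orders = {
    "base 10": math.log10(ratio),
    "base 2": math.log2(ratio),
    "base e": math.log(ratio),
}

for base, value in orders.items():
    print(f"{base}: {value:.1f}")
# base 10: 1.7
# base 2: 5.6
# base e: 3.9
```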

If you're allowed to pick the base then any difference can be 'an order of magnitude'.
