
The most remarkable legacy system I have seen - user5994461
https://thehftguy.com/2020/07/09/the-most-remarkable-legacy-system-i-have-seen/
======
daneel_w
I'll share an anecdote along the same lines, albeit much smaller in scale:

In 2004 an acquaintance asked me for help with sharing an Internet connection
among the residents of his condominium after he had failed to get a common
router/switch solution to work. The router was not playing ball unless all
clients presented themselves in one and the same subnet, which prompted the
unmanaged switch to pass traffic directly between the ports, and that was a
no-no for historic reasons relating to Windows and its malware of the time. So
I repurposed an old anno 1999 ATX motherboard with a mix of Ethernet cards -
the board offered 6 PCI slots and the condominium had 5 residents - 256 MB of
RAM, and a low-power passively cooled Pentium III to act as router and switch
in one, running OpenBSD with some dhcpd(8) and pf(4) to manage clients and
traffic.

16 years later this set-up is still in use 24/7, because the ISPs claim
logistical problems prevent them from installing gateway equipment in the
building. It serves a VDSL2 line of 60/20 Mbit/s coming in over the POTS. At
its peak the computer had 6 years of uptime. It is still running OpenBSD 3.5
due to poor #OPSEC on my part.

~~~
paco3346
Good lord, 6 years? I've had some managed switches with nearly 2 years of
uptime, but 6 is just asinine. I assume it was on a UPS? If so, how did the
battery stay healthy and usable? Or was the electrical grid there just super
dependable?

~~~
mkl
Six years without a power cut doesn't seem too surprising.

Was "asinine" the word you meant there? It doesn't make sense to me in
context.

~~~
jmnicolas
> Six years without a power cut doesn't seem too surprising.

It depends on where you live, and/or you might not notice them if they happen
at night.

Even in France, which is supposed to be a first-world country, I experience a
couple of power cuts every month (twice at night this weekend; it killed my
tile generation and corrupted the filesystem on my OSM server).

I probably should invest in a UPS instead of complaining on HN !

~~~
alibert
Every month is not normal at all. You should check with Enedis.

I live in France and I've had fewer than 10 power cuts in 20+ years.

I had 5+ years of uninterrupted power at my previous place, and at the flat I
currently live in, it's at 1.5+ years.

~~~
hilbertseries
Conversely, I live in SoCal and every month is very normal.

~~~
vondur
The only power outages I've had here in SoCal were scheduled ones, while they
replaced several poles and cabling. Out for like 8 hours.

------
stevesimmons
As the comment by timsworkaccount says, this is definitely Athena at JPMorgan.

For those interested, I gave a talk on Athena at PyData UK 2018 called "Python
at Massive Scale". 4500 developers making 20k commits a week. Codebase with
35m LOC.

The video is here:
[https://www.youtube.com/watch?v=ZYD9yyMh9Hk](https://www.youtube.com/watch?v=ZYD9yyMh9Hk)

It covers Athena's origins, what it is used for, application architecture,
infrastructure, dev tooling and culture.

~~~
cdavid
I worked on that project 10 years ago as a consultant, and it was certainly
strategic at that time. It was fairly well known that one of the main
stakeholders was pushing for those systems and making a career out of it by
jumping banks every few years.

I understand the value of it, but as an experienced scientific developer and
Python dev, the culture shock was huge. I don't have fond memories of working
on it. It is very different from a traditional programming environment, closer
to a kind of reimplementation of a Smalltalk env w/ Python. I believe an
influence was actually an old system at JPM that used Smalltalk in the 90s.

One of the fancy things was integrating reactive programming, which worked
through ugly hacks, at least at that time, by parsing code to detect
dependencies. IIRC, it could manage list comprehensions but not loops. They
also had their own Python binary w/ both Python 2 and Python 3 in one process.
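
To illustrate the parsing trick (purely a toy sketch; the real system's rules
were far more elaborate), dependencies can be discovered by walking a
snippet's AST for names that are read:

```python
import ast

def detect_reads(source):
    """Return the set of names a code snippet reads.

    A name appearing in Load context is a dependency; names that are
    only assigned (Store context) are not.
    """
    deps = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
            deps.add(node.id)
    return deps

# Hypothetical valuation snippet, just for illustration.
src = "discounted = spot * rate\nresult = discounted + vol"
print(sorted(detect_reads(src)))  # ['discounted', 'rate', 'spot', 'vol']
```

This naive version picks up anything read, including locals; handling
comprehensions correctly while giving up on loops, as described above, is
exactly the kind of special-casing such a hack accumulates.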

~~~
stevesimmons
The culture shock you mention was very real. I joined in 2010 (when Athena was
3 years old) and left 8 years later in 2018.

I remember my first few months unlearning normal Python and figuring out how
to build the 'pixie graph', a lazily-evaluated Python DAG suited to
calculating financial instruments. It took a while to get your head around
this, but when you did, it was a very powerful and productive way to build
trading and risk management applications.

To get some sense of how this worked, here are two public projects on GitHub
with good introductions:

* [https://github.com/timkpaine/tributary](https://github.com/timkpaine/tributary)

* [https://github.com/janushendersonassetallocation/loman](https://github.com/janushendersonassetallocation/loman)
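
The core idea of a lazily-evaluated DAG can be sketched in a few lines (a toy
illustration only; the actual 'pixie graph' is internal to Athena and far more
sophisticated):

```python
class Node:
    """A lazily-evaluated graph node: computes on first access,
    caches the result, and recomputes only after invalidation."""

    def __init__(self, fn, *inputs):
        self.fn = fn          # function combining the input nodes' values
        self.inputs = inputs  # upstream Node dependencies
        self._cache = None
        self._valid = False

    def value(self):
        if not self._valid:
            args = [n.value() for n in self.inputs]
            self._cache = self.fn(*args)
            self._valid = True
        return self._cache

    def invalidate(self):
        self._valid = False

# Hypothetical example: position value depends on spot and quantity.
spot = Node(lambda: 100.0)
qty = Node(lambda: 5)
position = Node(lambda s, q: s * q, spot, qty)
print(position.value())  # 500.0
```

A real system would also propagate invalidation downstream automatically;
here it is left manual to keep the sketch minimal.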

~~~
FabHK
Sounds like a descendant of Goldman Sachs's SecDB/Slang with automatic
dependency graph building. Did it have purple children (nodes whose value
influences the graph structure) and twiddle scopes (modified copies of
subtrees)? :-)

~~~
eru
I was positively surprised that Slang was actually quite usable. I had
expected much worse.

I suspect the Common Lisp influence was beneficial.

The weirdest thing, language-wise, that I noticed was scoping. It's neither
dynamic nor static scoping, but something weirder. (There are workarounds to
get something like static scoping, though.)

Outside of the language, the whole CVS-based version control and review
process was weird. But that's understandable as a product of the late 1990s,
when review-before-going-into-permanent-history must have been way ahead of
its time.

~~~
sweeneyrod
> The weirdest thing, language-wise, that I noticed was scoping.

Not the spaces in variable names?

~~~
silverdemon
Those annoyed me immensely. I can feel my heart rate go up 10 BPM just
thinking about it. I guess it was meant to make scripts more readable to
minimally-techy people, but actually just made it harder to parse.

------
brmgb
I have seen stuff done two decades ago in the defense industry which put to
shame some of the projects I am working on nowadays: extremely modular
architecture, a very good service-oriented framework with hot-swappable and
hot-reloadable components, automatic distribution, and automatic redundancy
for fault tolerance.

There really are some gems out there.

~~~
jacquesm
Ditto here. My friend Jan de Leur architected the solution for the KVSA, the
company that arranges for a very large fraction of all container shipping in
the world to be distributed across shipping companies and vessels. We're
talking 1983 or so here. The system used QNX, a very early service-oriented
operating system, and was absolutely rock solid and scaled horizontally across
many systems (it had to, given the typical system of the day). Using commodity
hardware in a cluster of several tens of PCs, the system was hardware fault
tolerant and as far as I know never went down except when someone deliberately
caused the air conditioning to fail. On a typical Monday it would handle
several million container loads without a hitch.

------
InfiniteRand
I had the misfortune to work on (and re-write) a legacy system in a previous
job and I constantly alternated between admiration and horror.

In some places the web GUI was designed exceptionally well: items placed
together precisely, everything designed expertly around the specific domain of
concern.

On the other hand, there were race conditions, corrupted data, SQL injection,
mountains of source files that re-implemented the same things, IE6-7 era
compatibility issues, etc.

Then there were the mysterious parts of the program which I never understood,
like the auto-email capabilities whose scripts could never be located,
mysterious mirror servers whose existence I would deduce from the IP addresses
of odd requests, and little bits of domain logic which I would accidentally
break and have to cobble back together.

In a lot of ways, many of the problems just stemmed from age: it was an early
90's application in an early-2010's world. There definitely were some
less-than-ideal software development practices that contributed to the
difficulties as well. But I still had a sense that the previous developer had
crafted this beautiful, unique, intricately complex and inter-dependent little
world.

Sometimes I was sad I had to brush all of that aside even when my changes made
things more robust, reliable and compatible with the modern world.

~~~
Teknoman117
I don't know if I'm unique in this regard, but I actually find it rather fun /
interesting to work on the "rewrite the legacy system" projects. I generally
find it to be an interesting puzzle.

~~~
gen220
It takes a lot of patience but it is oh so rewarding. I like to think of it as
"software archaeology".

It's fun to gradually discover the _model_ that the programmer was intending
to program, then to see where it broke down X months/years later and was fixed
with a hack. Then, you get to design a new model that fits both cases
elegantly, instead of just a subset.

You kind of feel like the scientists in Jurassic Park! :)

~~~
Teknoman117
"software archaeology" is definitely a way I'd choose to describe it as well!

------
timwis
We often call things legacy because they were made a while ago, or the person
who made it isn't at the organisation anymore, or — frankly — because we
weren't involved in making it. But those are just proxy indicators for the
true indicator of legacy technology: how difficult it is to change.

~~~
gitgud
I think Bob Martin said something like:

 _"Legacy code is code without tests"_

Meaning that if a codebase has no tests, its API and behaviours are extremely
hard to determine and change without breaking things.

~~~
disgruntledphd2
Michael Feathers, in Working Effectively with Legacy Code, uses that
definition.

Super good book if you're dealing with legacy systems at work.

------
specialist
Mid-aughts I created a thing for healthcare data exchange, inspired by
postfix. Today you'd probably compare it to AWS Lambda and Kinesis, but
radically more useful and usable. At the time, I considered it the anti-J2EE.

We could onboard "interface developers" in a week - as in, first time seeing
Java and deploying code to production. Normal onboarding time was 3-6 months.
We figured we had a 10-year backlog of work using the more traditional tools
(eg BizTalk, SeeBeyond). Our firm awarded a $20k bonus for hiring referrals,
demand was so great.

So my stack was a serious competitive advantage.

Alas, it was too simple. Our startup got acquired by Quest Diagnostics. They
loved themselves some InterSystems Caché. Their PHBs (Directors and VPs
opposed to the acquisition) just couldn't handle that my stuff ran circles
around theirs. So of course it had to be killed in the crib.

FWIW, technically, an "interface" (what healthcare calls something that munges
HL7 or equiv) was just self-contained code. Straight-up data processing:
input, transform, output. Pass in a stream and some context, stuff comes out.
Compose these snippets just like in Unix. In development, you'd just run it
from the command line (or IDE). In production, we had a spiffy process runner
with a spiffy management UI. Built-in logging, metrics, dashboards, etc.
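
The compose-like-Unix idea can be sketched in a few lines (in Python for
illustration; the original was Java and processed real HL7, and these stage
names are made up):

```python
def compose(*stages):
    """Chain stream-processing stages, Unix-pipe style: the output
    records of each stage feed the next."""
    def pipeline(records):
        for stage in stages:
            records = stage(records)
        return records
    return pipeline

# Hypothetical stages: parse pipe-delimited lines, keep observation
# segments, render back to text.
parse = lambda lines: (line.split("|") for line in lines)
keep_obs = lambda recs: (r for r in recs if r[0] == "OBX")
render = lambda recs: ("|".join(r) for r in recs)

interface = compose(parse, keep_obs, render)
out = list(interface(["OBX|1|glucose", "MSH|header"]))
print(out)  # ['OBX|1|glucose']
```

Because each stage is a plain function over a stream, the same snippet runs
unchanged from the command line, an IDE, or a managed process runner.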

I did some AWS Lambda stuff at my last gig. I absolutely fucking hated it. The
managed scaling is maybe nice, but table stakes these days. The programming
model is just a kick in the berries.

PS- A word about InterSystems. Of course I tried to be a good soldier. My
architecture was more important than the implementation language or
persistence engine; I really didn't care what tools we used. Oh boy. Caché
might be the worst tool I've ever used. For example, one time a compiler error
bricked my entire dev runtime, unrecoverable. The Caché partisan mocked me:
"What did you do?!" Um, a typo. "Duh, don't do that!!" Apparently Caché
self-immolation is normal. So keep regular image snapshots. At the time there
were no version control options; you'd export "source" and hope it'd reimport
later. Ludicrous.

------
fnord77
I've worked at a few banks and early fintechs, starting 30 years ago. I've
seen some very remarkable systems well ahead of their time:

realtime distributed stream processing in the late 90s

complex event processing before it was called that

distributed application frameworks

~~~
bnastic
I wouldn't call them "ahead of their time" - there was a need for a certain
type of service and smart people recognised good solutions. It may have been
ahead of the general availability of that tech, and internal frameworks from
Big Banks are rarely discussed publicly. E.g. by far the best build/deploy
system for C++ I've ever seen will likely remain locked inside a certain Big
Bank, never to be seen outside their walled gardens.

~~~
petr_tik
Are you referring to GS’ meta-make?

~~~
bnastic
No, not GS. I am not at liberty to say, I'm afraid.

------
werdnapk
By the title, I was ready to read an article about software that was created
in the 70's or 80's. Nope, 2008.

------
rustybolt
At my work, we have (all in-house and most are more than twenty years old):

    
    
      - A (much more complicated) make clone
      - A test running framework
      - A remote session tool
      - A test specification framework
      - A preprocessing tool
      - At least six domain-specific languages for specifying compilers, assemblers, linkers, and simulators.
    

Of course, you also need to know Linux, bash, C, C++, Perl, and Python.

Needless to say, it takes some time to get up to speed. On the other hand, you
can run some very simple commands and have a bunch of servers run hundreds of
thousands of tests on your code, on different OSes.

~~~
helltone
Out of curiosity is that an EDA company?

~~~
rustybolt
We make compilers for embedded platforms

------
jackric
Sounds like Athena (JPM) or Quartz (BAML). Though as I understand those have a
lot more official buy-in than the article suggests.

~~~
gadders
Same DNA, aren't they? I.e. all created by ex-Goldman people who worked on
SecDB and similar for Mike Dubno
[[http://dubno.com/wsj/giveaway.html](http://dubno.com/wsj/giveaway.html)]. I
know that is definitely the case for BAML. In fact, Dubno came out of
retirement to work there for a while.

~~~
Wald76
That’s what I understand too. (I worked on the Quartz Core team for a couple
years at BAML.)

~~~
gadders
I was at BAML just as Quartz was starting, but never worked on it. I heard all
the stories of them creating their own bi-temporal database etc.

I also heard that in the early days it was a bit resource-heavy on a user's
machine.

------
bbarnett
But... if I cannot describe it as a buzz word on my resume, clearly it is
worthless.

~~~
WJW
You don't need a resume if you stay at the same employer for 40+ years. Win-
win!

~~~
moooo99
> You don't need a resume if you stay at the same employer for 40+ years. Win-
> win!

If you're happy there, why not. Unfortunately, over the last few years I got
the impression that loyal employees don't necessarily see the appreciation
they deserve from their employers.

Getting a raise often becomes increasingly hard once you've been working there
long enough. Switching jobs or even companies is often the best way to get a
raise or better benefits.

~~~
csunbird
Most companies nowadays assume all their engineering hires will stay at most
two years, so they consider giving a raise or promotion a pointless action and
a waste of money.

~~~
devalgo
This is self-perpetuating. It's been my experience that moving companies has
gotten me +20% in pay for doing the same work with the same title. If
employers assume people will leave after 2 years and hence don't give
promotions and give paltry CoL raises, then obviously why would people stay?
So these employers won't give raises to their current workers but will happily
give new employees +20% every 2 years just for swapping names on their resume?
Obviously something doesn't add up here.

~~~
csunbird
Exactly. I like my current job at $COMPANY, but I also know that they have a
pretty negative procedure for raises and promotions, even when you do an
exceptional job. So, is anybody hiring? :)

They prefer to bring in/hire more candidates from other countries and pay them
less than what current employees ask for, because it is cheaper.

------
dcow
I'm sorry, but I cannot, in my right mind, recommend anyone ever use Tornado,
not least because Python now natively supports async/await. There was a
thing... an unholy union of threads, sleeps, and tornado.gen co-routines. It
would always break. How could it not, when it mixed every concurrency paradigm
imaginable? When it did, everything started burning. Data stopped flowing.
Many an engineer tried and failed to fix it. Have you ever seen three TLS
client hellos on the same TCP connection? I have. Tornado is actually the most
accurate name you could give the framework: a big whirling mess. Maybe I'm
being too hard on Tornado, but I kinda blame it for introducing co-routines
before Python was ready for them. The company has since moved on to golang,
but the tornado is still whirling and will be for years to come.

------
jabart
Also at a bank: a file/database server that had two physical sides. Each side
could grab a cartridge containing basically a CD-R, push it into a reader, and
read the track of data based on a filesystem-like record in an Oracle
database. This was done to meet SEC Rule 17a-4 (Write Once Read Many)
requirements.

It was old, like "Side A got stuck, have to run to the DC to fix it" old, and
you would get errors accessing those files but not side B's files. I'm
guessing 15+ years. Weirdly, you could grab a PDF off this thing in less than
2 seconds. This had to exist at other organizations, but this was the only one
I saw.

Now Azure and AWS provide the same service for pennies on the dollar.

~~~
jdmichal
My father worked IT with a medical record imaging system that is very similar
to what you're describing. They called them platters, and they were like CDs
except the size of a dinner plate or so if I remember correctly. And there was
basically a giant jukebox system to switch them out.

~~~
p_l
Sounds like a proper old WORM drive with an integrated media changer - I'd
seen one at a telco I worked at. It's also referenced by the Plan 9 file
server, which used one as its main storage.

------
nijave
I never personally developed on Athena, but I remember it requiring AIM (an
internal distributed file system, sort of like NFS built on Hadoop, that had a
fairly complex data model). All the Python libraries were compiled and linked
against the AIM fs and required some shenanigans to import modules, making the
code not very portable.

~~~
stevesimmons
Athena had two levels of code deployment:

* The core Python libraries and third-party modules, with weekly rolling releases of new/old versions for controlled deployment. These came from internal repos (the AIM system you mention) so that security scanning and licensing were checked.

* Application-level code, with Python imports taking source code from a database rather than the filesystem.
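
Importing from a database rather than the filesystem can be sketched with
Python's standard import hooks (a minimal toy illustration, not Athena's
actual machinery; the module name and source here are made up):

```python
import importlib.abc
import importlib.util
import sys

# Stand-in for the database: module name -> source code.
DB = {"dbmod": "def greet():\n    return 'hello from the database'\n"}

class DatabaseFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve Python modules from a 'database' instead of the filesystem."""

    def find_spec(self, name, path=None, target=None):
        # Claim only modules present in our database.
        if name in DB:
            return importlib.util.spec_from_loader(name, self)
        return None

    def exec_module(self, module):
        # Execute the stored source in the fresh module's namespace.
        exec(DB[module.__name__], module.__dict__)

sys.meta_path.insert(0, DatabaseFinder())

import dbmod
print(dbmod.greet())  # hello from the database
```

Once the finder is on `sys.meta_path`, plain `import` statements work
transparently, which is what makes this kind of scheme invisible to
application code.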

These are explained in detail in my PyData talk video, linked in my other
comment.

------
willj
> Maybe one third of internal apps developed in the last years don’t work at
> all outside of Chrome, the page remains blank or has broken widgets all over
> the place.

Does this statistic match what others have seen? If so, that’s staggering, and
unfortunate. I know there are more sites that do this than there used to be,
but I wasn’t under the impression it was such a large proportion.

~~~
chaseing
I can confirm that this is the case within JPMC. Old things only work in IE
and new things have been tested mostly in Chrome. But this stat is quite high
just because Chrome is the "strategic" solution, so it is seen as unnecessary
to make things work elsewhere.

As a devout Firefox user it is frustrating (especially being told your web
browser is no longer supported).

They are setting themselves up to eventually migrate to Edge so that one web
browser can access both the old legacy sites and the new stuff.

------
peterkelly
I don't think something created in 2008 can be considered a legacy system

~~~
wirthjason
The date doesn’t matter so much (to me at least). Legacy means a lack of tests
and the inflexibility to change the system. Working on these systems, it’s
easy to have a part of the app become obsolete in 2-3 years because it’s
written by someone who doesn’t really care about technology and wants to
“deliver for the business”, so they crank out some shitty code with no tests,
documentation, or design of any kind (eg encapsulation). The business rules
are complex and not explained or understood. Then the main developer leaves
and all that knowledge vanishes out the door.

~~~
StavrosK
I've seen systems that were legacy before they were written.

------
Uptrenda
Well, this may be one of the more interesting and useful threads I've seen
here in a while: systems that are insanely robust, well engineered, and have
withstood the test of time amid an ocean of hype. You don't see that very
often. +1 for everyone's stories.

------
binarysneaker
You've made an old guy feel even older, LOL!

A couple of years ago I inherited a software engineering department near the
bottom of a downward spiral, shipping 25+ years of VB6 code to customers. We
brought in external help and spent a year and a half rebuilding a modern
version. During that project, I saw the most awful code on a daily basis. I
still have nightmares about it!

------
est
I had something similar, currently running 3k "lambdas" of rules to check
whether a user action is fraud or not. Running 60k qps with 32 nodes,
Python 2 + NSQ.

A new "lambda" takes about 1-60 seconds to propagate to all nodes.

------
daneel_w
The article does not make it immediately obvious what language version the
original made-in-2008 product was written in. Python 2.7 was not released
until two years later, in summer 2010.

~~~
vegesm
It seems it is Python. The author specifically mentions an upgrade to Python
2.7

~~~
daneel_w
I suppose <2.7 then. I was thrown off by "...every good thing must come to an
end; Python 2.7 was going end of life.", implying that the product was written
for 2.7 and was hence facing its deprecation problem.

~~~
enjo
Python 2.7 was the last of the Python 2.x releases. They likely started with
an earlier version, kept upgrading to 2.7 and then had to bite the bullet and
actually rework it for Python 3.x (which is a highly similar but different
programming language). Which is why that quote makes sense.

------
doggydogs94
If it was only so easy. In the real world, IE might still be required for some
obscure critical applications that are difficult and/or expensive to replace.

------
x87678r
I'm surprised the SOX auditors didn't shut this down. It's the type of stuff
the ITIL fanatics hate, and it makes IBs toxic places to work.

~~~
FabHK
Remember that the alternative would’ve been a couple of massive Excel sheets
held together by a few shell scripts and an intern.

~~~
perardi
A disturbing amount of the world is held together by Excel spreadsheets with
no version control.

------
cpr
Fascinating story, but then it ends with a bust: 5 seconds to do anything?
Seems like a show-stopper for real work...

~~~
qubex
Speaking as someone conversant with how the IT departments of financial
institutions this size work: this is most likely a system for running contract
valuations. A five-second lag between when a valuation is requested (process
launched) and when it completes is entirely acceptable if it can be scheduled
to run at regular intervals, or on triggers if required, and five seconds is
fine for the occasional interactive (user-initiated) dalliance.

------
tomrod
This is almost certainly JPMC. I wonder how many systems are also relying on
COBOL/mainframes?

~~~
jmeister
This “bank tech is stupid” meme is ill-informed.

There is spectacular technology developed at banks, you just don’t hear about
it because of secrecy.

------
fabrijunca
Nice

------
qwerty456127
Reads cool until you reach the "5 seconds baseline" part.

~~~
FabHK
Computations in finance can be quite time intensive, and are typically
repeated many times with slightly modified (bumped) inputs to compute risks
(finite difference approximations to partial derivatives). An initial 5s
startup is negligible then.
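
The bump-and-revalue pattern is easy to sketch (toy pricer and made-up
numbers, just to show why startup cost is amortized across many revaluations):

```python
def delta(price_fn, spot, bump=1e-4):
    """Central finite difference approximation to d(price)/d(spot):
    revalue with the input bumped up and down, divide by the bump size."""
    up = price_fn(spot * (1 + bump))
    down = price_fn(spot * (1 - bump))
    return (up - down) / (2 * spot * bump)

# Hypothetical pricer, linear in spot, so delta should come out ~2.
price = lambda s: 2.0 * s + 7.0
print(round(delta(price, 100.0), 6))  # ~2.0
```

A real risk run repeats this for every input (spot, vol, rates, ...) and
every instrument, so the pricer is called thousands of times per report;
against that, a 5s process startup disappears in the noise.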

~~~
qwerty456127
I know. It's OK to add 5 seconds to a lengthy financial calculation, from the
purely practical point of view. Nevertheless, it still looks awkward. My
entire computer boots in comparable time nowadays (perhaps a few seconds
more), and somehow a Python app takes that long just to start up in such an
environment.

~~~
C4stor
Cloud-like run environments typically have some network I/O to do. Comparing
booting an OS from a local SSD with booting from network-reachable resources
isn't really fair, in my opinion, due to the huge difference in the
performance of the "resource bus" assigned to each case.

