I was the person who first deployed Python at Goldman Sachs. At the time it was an "unapproved technology", but the partner in charge of my division called me and said (this is literally word for word, because partners didn't call me every day, so I remember):
Err....hi Sean, It's Armen. Uhh.... So I heard you like python.... well if someone was to uhh.... install python on the train... they probably wouldn't be fired. Ok bye.
"Installing python on the train" meant pushing it out to all the computers globally in the securities division that did risk and pricing. Within 30 minutes every computer in Goldman Sachs's securities division had python, and I was the guy responsible for keeping the canonical python distribution up to date with the right set of modules etc. on Linux, Solaris and Windows.
Because it was unapproved technology I had a stream of people from technology coming to my desk to say I shouldn't have done it. I redirected them to Armen (who was a very important dude that everyone was frightened of).
The core engineers from GS went on to build the Athena system at JP and the Quartz platform at BAML.
Armen was very passionate about the value of "strats" (GS's own term for "quants", and later broadened to include software engineers and data scientists).
A favorite quip of his:
At GS, I'm like an arms dealer. When a desk has a problem, I send in the strats, and they blow away all the competition!
Also, SecDB's core idea is not just tight integration between the backend and development environment, but that all objects ("Security" in SecDB lingo) were functionally reactive.
For example, you would write a pure function that defines Price(Security) in terms of market inputs.
When the market inputs changed, Price(Security) would automatically update (the framework handled the fiddly bits of caching intermediate values for you, so even an expensive Price function was not problematic).
This is loosely the same idea that drives React, ObservableHQ, Kafka, and other event-streaming architectures, but I first encountered this ~15 years ago at a bank.
It's as old as VisiCalc, it's how spreadsheets work.
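A minimal sketch of the idea in modern Python (the names `Node` and `Input` here are invented for illustration, not SecDB's API): each node memoizes its value and is transitively invalidated when a market input changes, so recomputation is lazy and cached.

```python
class Node:
    """A memoizing node in a dependency graph, invalidated when its inputs change."""
    def __init__(self, fn, *inputs):
        self.fn = fn
        self.inputs = inputs
        self.dependents = []
        self._cache = None
        self._dirty = True
        for node in inputs:
            node.dependents.append(self)

    def value(self):
        if self._dirty:  # recompute only when something upstream changed
            self._cache = self.fn(*(n.value() for n in self.inputs))
            self._dirty = False
        return self._cache

    def invalidate(self):
        self._dirty = True
        for d in self.dependents:
            d.invalidate()

class Input(Node):
    """A leaf node holding a market input."""
    def __init__(self, value):
        super().__init__(None)
        self._cache, self._dirty = value, False

    def set(self, value):
        self._cache = value
        for d in self.dependents:
            d.invalidate()

# Price(Security) as a pure function of market inputs:
spot = Input(100.0)
notional = Input(10)
price = Node(lambda s, n: s * n, spot, notional)
assert price.value() == 1000.0
spot.set(101.0)                 # market moves; dependents recompute lazily
assert price.value() == 1010.0
```

The same shape underlies spreadsheets, React rendering, and SecDB-style pricing graphs: pure functions plus automatic invalidation.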
I built a similarly reactive system for web UI binding back in 2004, running binding expressions on the back end with cached UI state to compute the minimal update to return to the front end, in the form of attributes to set on named elements.
This immediately struck me when I was reading this article.
To be honest, this whole paradigm seems absurdly fucking efficient for the developers. But I wonder about stuff like
* What happens if the data model needs to change? If you need to move something from db["some/path"]?
* How is it coordinated at a larger scale, how does everyone know what is running and how it interacts with everything else - can you figure out what depends on an object? What if the data used by your Price(Security) object changes and breaks it?
You write conversions and there's a registry where you register them to be picked up by the unpickler. If necessary you can also customize the logic that determines which version a given pickled datum uses to deserialize. There aren't so many guardrails when you're writing that stuff, but the infrastructure does its best to support you.
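Roughly what such a registry can look like, sketched in plain Python (every name here is invented; the real infrastructure is internal): each class stamps a schema version into its pickled state, and registered converters upgrade old states one version at a time at unpickle time.

```python
import pickle

# (class name, from_version) -> function that upgrades a state dict by one version.
_CONVERTERS = {}

def register_converter(cls_name, from_version):
    def deco(fn):
        _CONVERTERS[(cls_name, from_version)] = fn
        return fn
    return deco

class Trade:
    VERSION = 2

    def __init__(self, notional, currency="USD"):
        self.notional = notional
        self.currency = currency

    def __getstate__(self):
        return dict(self.__dict__, _version=self.VERSION)

    def __setstate__(self, state):
        version = state.pop("_version", 1)  # pre-versioning pickles count as v1
        while version < self.VERSION:
            state = _CONVERTERS[("Trade", version)](state)
            version += 1
        self.__dict__.update(state)

@register_converter("Trade", 1)
def _trade_v1_to_v2(state):
    state.setdefault("currency", "USD")  # v1 predated the currency field
    return state

# A version-1 state (no _version, no currency) still loads cleanly:
t = Trade.__new__(Trade)
t.__setstate__({"notional": 1_000_000})
assert (t.notional, t.currency) == (1_000_000, "USD")

# And a normal pickle round trip works unchanged:
t2 = pickle.loads(pickle.dumps(Trade(5, "EUR")))
assert (t2.notional, t2.currency) == (5, "EUR")
```

The chain of single-step converters means you never have to write an N-squared matrix of "version X to version Y" upgrades.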
> If you need to move something from db["some/path"]?
There's support for both symlinks (db["some/path"] -> db["other/path"]) and for a kind of hardlink by making both paths point to the same inode-like id. You can usually find a way to do what you need to.
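A toy model of the two link kinds (not Minerva's actual API): paths resolve to object ids, so a hardlink is two paths sharing one id, while a symlink is a path that re-resolves through another path.

```python
class ObjectStore:
    """Toy path -> object-id -> value store with sym- and hardlinks."""
    def __init__(self):
        self._objects = {}   # object id -> value
        self._paths = {}     # path -> ("id", object_id) or ("symlink", other_path)
        self._next_id = 0

    def put(self, path, value):
        oid = self._next_id
        self._next_id += 1
        self._objects[oid] = value
        self._paths[path] = ("id", oid)

    def symlink(self, path, target):
        self._paths[path] = ("symlink", target)  # re-resolved on every read

    def hardlink(self, path, target):
        _, oid = self._resolve(target)           # pin the underlying id now
        self._paths[path] = ("id", oid)

    def _resolve(self, path):
        kind, ref = self._paths[path]
        while kind == "symlink":
            kind, ref = self._paths[ref]
        return kind, ref

    def get(self, path):
        _, oid = self._resolve(path)
        return self._objects[oid]

db = ObjectStore()
db.put("some/path", {"price": 42})
db.symlink("other/path", "some/path")
db.hardlink("third/path", "some/path")
assert db.get("other/path") == db.get("third/path") == {"price": 42}
```

The difference shows up when "some/path" is rewritten: the symlink follows the new object, while the hardlink keeps pointing at the old id.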
> How is it coordinated at a larger scale, how does everyone know what is running and how it interacts with everything else - can you figure out what depends on an object? What if the data used by your Price(Security) object changes and breaks it?
There's a common model for the things that are shared, and that has a versioning and release/deprecation cycle. Otherwise every type has an owner and you probably had to request their permissions to read their data, so you should have a channel of communication with them. But yeah people do rely on the fundamental business entities not changing too quickly, and things do break when changes are made.
True but not really helpful for this problem, because it can only tell you about the job you're debugging, whereas what you want to know is what code might ever depend on that data.
> This is loosely the same idea that drives React, ObservableHQ, Kafka, and other event-streaming architectures, but I first encountered this ~15 years ago at a bank.
See also the "observer pattern" [0]. It's a fun exercise to implement a reactive system in Python using the descriptor protocol [1]. IPython's traitlets library is an example of this in the wild [2].
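As a taste of that exercise, here's a bare-bones observable attribute built on the descriptor protocol (a toy version of what traitlets does far more thoroughly):

```python
class Observable:
    """Descriptor that notifies registered callbacks when the attribute changes."""
    def __set_name__(self, owner, name):
        self.public = name
        self.private = "_" + name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return getattr(obj, self.private)

    def __set__(self, obj, value):
        old = getattr(obj, self.private, None)
        setattr(obj, self.private, value)
        for callback in getattr(obj, "_observers", []):
            callback(obj, self.public, old, value)

class Security:
    spot = Observable()

    def __init__(self, spot):
        self._observers = []
        self.spot = spot

    def observe(self, callback):
        self._observers.append(callback)

events = []
s = Security(100.0)
s.observe(lambda obj, name, old, new: events.append((name, old, new)))
s.spot = 101.5
assert events == [("spot", 100.0, 101.5)]
```

traitlets layers validation, typed defaults, and cross-object links on top of essentially this mechanism.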
Thank you so much for this context! When I started at Lehman in 2007 and for years through the i-banks & hedge-funds, 'Sec-db' was the North Star for so much me and my friends built. It's amazing to hear from the folk who brought that to life!
Took many more years for us to understand that we had learned more about banking and making money than our masters had learned about the potential uses of the platforms we had built.
We started Sandbox Banking. Many of our friends are at hedge-funds :-(. What's the career paths of those that first built sec-db?
I worked on Quartz for 3 years and loved it. Some devs grumbled about various aspects of it, but I come from an application support background and taught myself python, so I suppose I had fewer developer habits to un-learn.
From what I understand, all this started with SecDB at Goldman, which was the prototype for all these systems but wasn't Python based. The lore is that SecDB was instrumental in Goldman being able to rapidly figure out what their exposure was during the 2008 crisis.
Some of that team, led by Kirat Singh, went on to start Athena at JP Morgan and then Quartz at BAML. I met Kirat once; he was considered a bit of a rock star in the bank tech community. He now runs Beacon, which is basically Bank Python as a service.
I work at Beacon.io and it's an awesome place to be. Kirat is indeed a rockstar and it's awesome to work with a CEO that knows great code. We also landed a Series C last month and we're growing :)
As a full stack quant dev who is struggling to go all-in on Beacon: Do people generally like to use Glint for complex applications? Are benefits of Beacon lost when interfacing from e.g. Angular? I'm afraid that professional frontend devs might be unwilling to work with a proprietary framework, but that's speculation from my side.
Glint is an integrated framework with the platform but it does not limit you to just using the framework. The platform is designed to be as extensible as possible, worry not about being locked in :)
I took a quick look, it seems like all the postings are London or New York. What's the feeling internally about remote hires? I'm assuming that's still out of fashion in finance and Beacon feels the same?
I know Kirat really well. Fun fact: one two-week dev cycle we had 667 distinct developers commit to the secdb code base, which Kirat's boss described to me as "The number of the beast.... plus Kirat"
Second fun thing. Kirat was advocating for lisp for secdb for a long time and used to rag on me for liking python when it's so slow.
Interesting to be reading all this about SecDB. About 15 years ago I was offered a job working on SecDB (I forget exactly what the position was now). It and Slang sounded really interesting.
I do sometimes regret not taking the job because the people there were wickedly sharp and the tech sounded great, but in hindsight I'm not sure I would have thrived in a bank long term. I did a 3 month internship at Lehman's which I enjoyed, but I don't think I'd have suited a career in it. One thing I did get out of it was a total lack of fear around job interviews: if I could survive the 14 hours of interviews at GS and come out with an offer, then I can handle pretty much any recruitment process :)
It's amazing how a few people have left such a big mark on a part of the investment banking industry. I missed Kirat right before exiting BAML but met all his "disciples" and Dubno... including the miniature dinosaur and telescope in his office :). Very much felt like tech religion, where the merits and drawbacks couldn't be openly debated. And a lot has changed in terms of engineering innovation with turnover since that era...
Why is the number of people who "left a big mark" so small?
1. An organization/industry can adopt each new technology only once. New technologies arise infrequently. Each time they arise, only a few people get to work on the projects introducing them. In other words, opportunities to leave a big mark are limited.
2. Credit for innovation is political capital. People hoard political capital and become powerful. They act as gatekeepers of innovation and take the credit for successful projects.
> From what I understand, all this started with SecDB at Goldman, which was the prototype for all these systems but wasn't Python based.
Correct. SecDB has its own language, Slang, which you could probably best describe as a "lispy Smalltalk, with a hefty sprinkling of Frankenstein". The concept of SecDB was so good that other large banks wanted to get their own. Athena and Quartz have been mentioned several times in this thread, by people far more knowledgeable than I could ever be.
It's not just banking, I know of at least one large pension/insurance company who are building their own version of SecDB, with direct lines to GS. (They don't try to hide it, btw: the company is Rothesay Life.) The last time I talked with them, they were looking for Rust devs.
> From what I understand, all this started with SecDB at Goldman, which was the prototype for all these systems but wasn't Python based. The lore is that SecDB was instrumental in Goldman being able to rapidly figure out what their exposure was during the 2008 crisis.
Correct. We used python for a bunch of infrastructure stuff (e.g. distributing all of secdb to all of the places it needed to go). The actual pricing and risk was written in Slang, with a lot of GUIs that were "written" in Slang but actually triggered auto-generation of JIT bytecode that was executed by a JVM. Most of the heavy lifting behind the scenes was C++. So a bit of everything.
My grandpappy always told me to cut out the middleman.
Modern C++ was heavily influenced by the need to make it simple to use directly. If you are in the business of writing code instead of reminiscing, you can now leverage move semantics, lambdas, and smart pointers to create software that is close to the silicon.
Python might be great, but it sure is slow. Its success is founded on smart people making it easier for not so smart people to call C++ that does the heavy lifting.
A big force multiplier in the old GS secdb model was simply the speed of the dev cycle vs speed of the code. As a strat you could push slang changes to pricing and risk literally in minutes with full testing, backout, object audit logging etc.
C++ changes went out in a 2 week release cycle so changes were still fast by most standards but much slower. But yeah we had 20m + lines of C++ code so it was extensively used.
For some context (as an ex-Goldman employee myself), "Armen" in the quote is most probably https://www.goldmansachs.com/insights/outlook/bios/armen-ava... , who has quite a legendary reputation within the firm for the work he's done. He was also one of the first to be hired as a "strat", which used to be how Goldman referred to its quants who sat between front office and tech systems and worked with both sides.
SecDB/Slang originated around 1992 at the commodity trading shop J Aron, which GS had bought in 1981. Later (end of the 90s) there was a push to extend it to the rest of the firm, first fixed income, then equities. Armen flew the whole world-wide strat team to NY and gave a presentation, and to drive the point home, he played a clip from Star Trek: "You will be assimilated. Resistance is futile!"
I worked on Athena at JPMorgan for 8 years, and loved it.
Seeing Python at the core of trading, risk and post-trade processing for Commodities, FX, Credit etc was such a great developer experience.
By the time I left JPM, there were 4500 devs globally making 20k commits weekly into the Athena codebase. (I did a PyData presentation on this [1] for more details).
The one downside was the delayed transition from Py2.7 to 3; I left just as that was getting underway.
There are a number of workplaces where I'd have been willing to rely on "probably wouldn't be fired", but a bank is definitely not one of them. Congratulations on shipping something useful in the face of that risk and uncertainty.
The trading community walk on a knife edge all the time, it's not a place for the faint of heart. I used to support derivatives trading systems and a few times there were issues that meant they'd lost control of orders on an exchange. Scary stuff. It requires a crazy mixture of careful, deliberate, calculated risk control on the one hand; but once you commit to something you jump in with both feet and throw everything into it.
You need to be both meticulously risk averse, and also willing to do whatever needs to be done when it needs doing, and accept responsibility. It was great!
I worked at UBS on the G10 FX options desks in Stamford and Singapore. I remember being very surprised by how incredibly stressful my interview and training were. It was very intentional as well: my trainer knew the exact reactions he was creating with his behavior.
It was only a couple of weeks in, when I had to react within 60 seconds (USDJPY option expiry, NYC cut) on a position our front book would have lost MM on; with senior sales MDs screaming at me as well. Lo and behold, I was just used to it and could focus and execute based on my training.
My wife calls my thinking on the training - Stockholm syndrome. I still believe those skills were incredibly valuable for me, just perhaps delivered in a more 2021 acceptable approach.
Investment banks are basically risk-management shops. The partner made an assessment and evaluated the potential benefits as higher than risks. Note the word "probably".
Almost definitely false. Provided you weren't doing anything intentionally malicious with it, the risk would be that regulators might fine the bank for inadequate controls. As such, the bank might fire you for doing something that could lead to such a situation, but I don't see a criminal charge. There was actually quite a decent bit of "unapproved" software in use at one of the banks I worked in - mostly stuff that was in the process for approval, but that could take forever, so it was reasonably common for teams to run through the checks themselves (security scan, license review, etc) and move forward while the official review confirmed no issues.
Well the login message I was greeted with on every ssh connection certainly threatened criminal prosecution for unapproved software at the extremely large bank I worked at.
Unlikely? Sure. But a lawyer somewhere thought it was worth reminding me 10x/day, so going to assume it's possible provided your unauthorized software caused a serious monetary loss.
I joined the BAML grad scheme 10 or so years ago. We had a presentation from one of the Quartz guys and someone asked how they’d manage upgrading the version of Python. They were using something like 2.6.5. The whole move to 3.x was a thing. The Quartz guy just flat out said they wouldn’t upgrade.
Seemed crazy to a new grad back then, but now I wouldn’t want to consider it either.
Thanks for your contribution! It was amazing that even in my role where I didn’t use Quartz, I could see and search all the code. Felt quite novel back then.
A friend I've known since I first started on Wall Street now rides herd on the BofA Quartz libraries. One component of his job is to make developers aware of existing libraries they can use to solve business problems instead of reinventing the wheel. His theory of why they always have excuses not to do that is that they have no training in software development. They are still at the point in their learning where they are just excited that they can press keys on a computer and get it to do things they are barely able to understand.
I remember Slang when I first saw the code, a parse-tree-based evaluator, in 1997. Come on folks. Separate parsing from evaluation. Opaque types with message passing. Inference anyone? Clearly no one read Hindley-Milner.
Add parse time optimizations, add locals, hey globals and locals weren't handled properly. Python in the 90s anyone?
Shitty KV store with 32K object limits related to some random OS/2 btree limits. Add huge pages.
Deal with random rubbish related to inconsistent non transactional indices.
Figure out you should layer nodes in a dag. A dag is topologically 2 dimensions, fairly limiting.
Figure out that's somewhat similar to layering databases, it's just another dimension.
Hmm, bottom up evaluators, you actually need top down as well, create a Turing-complete specification. Well, limit it a bit.
Ah KLL points out that layers on top of dimensions end up being combinatorial, but you can actually cache the meta traversal and it's small n.
Lots of people point out category theory parallels. Haskell is pretty but completely unusable. I'm a math guy, and it's still unusable, I don't like feeling smart when I do simple things.
But creating imperative functions with pure inputs/outputs with implied function calls is pretty interesting. You can create an OOP paradigm with self and args as known roots, aka Linda tuple spaces.
Ah and each tuple space point can be scheduled independently, some issues with serialization and locality...
Go to another bank and choose to use python, foolishly decide to rely on pickle. Do that twice. Bad idea.
But write a much better petabyte-scale multi-master optimistic-locking database with 4 operations: insert, upsert, update, delete. WAL as a first-class API.
Finally decide that writing a coding scheme to convert python objects to json is not really hard. And of course cloud native and cloud agnostic is really the only way to go nowadays.
I'm always confused why people complain about Athena/Quartz, hell we wrote it all, fix it if you don't like it. Open source it if you want other people to contribute. If we made stupid decisions on pickling data, well there's a version id, add json serialization, it's not hard, don't take things as given.
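For what it's worth, the "coding scheme to convert python objects to json" really is small in the simple cases; a sketch (the registry and tag names here are invented) tags each object with its type and recurses over its state:

```python
import json

REGISTRY = {}  # class name -> class, consulted when decoding

def jsonable(cls):
    REGISTRY[cls.__name__] = cls
    return cls

def encode(obj):
    """Turn an object graph into JSON-serializable data, tagging class instances."""
    if isinstance(obj, (str, int, float, bool)) or obj is None:
        return obj
    if isinstance(obj, (list, tuple)):
        return [encode(x) for x in obj]
    if isinstance(obj, dict):
        return {k: encode(v) for k, v in obj.items()}
    return {"__class__": type(obj).__name__, "state": encode(vars(obj))}

def decode(data):
    """Inverse of encode: rebuild tagged instances via the registry."""
    if isinstance(data, dict) and "__class__" in data:
        cls = REGISTRY[data["__class__"]]
        obj = cls.__new__(cls)
        obj.__dict__.update(decode(data["state"]))
        return obj
    if isinstance(data, dict):
        return {k: decode(v) for k, v in data.items()}
    if isinstance(data, list):
        return [decode(x) for x in data]
    return data

@jsonable
class Trade:
    def __init__(self, notional, currency):
        self.notional = notional
        self.currency = currency

blob = json.dumps(encode(Trade(1_000_000, "USD")))
t = decode(json.loads(blob))
assert (t.notional, t.currency) == (1_000_000, "USD")
```

Unlike pickle, the wire format here is readable from any language, at the cost of explicitly registering the classes you care about.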
Oh wow, I remember Quartz at BAML… Though this was several years after initial deployment and when core devs left.
One day I will sit down and write a small poem about the insanity of software development based on my experience with Quartz. It will be an intriguing story of love and hate being told through a sensual dance between sales and engineering. The battles will not be epic but the consequences of one’s actions will be far reaching.
Tell me more about what worked and didn't..I recall the pain of watching QzDesktop load and Bob/HUGS jobs failing..but what else, and what did you enjoy
When all you have is a DAG all solutions end up with sales person coming at your door problem. But years down the line we didn’t have traveling salesman problem but out of date libraries problem.
The monkeys tried to make their risk taking safer by introducing all kinds of random constraints, not realising that what gave power to the whole idea was actually the risk taking.
And no, they didn’t try to manage risk but have completely fail to understand that risk is what made product good in the first place.
And this is how every great idea in human history fell apart: the original people had a different view of a current problem, but the following generations only understood the simplified problem, and down the line the solution to the original problem became the actual problem.
I still believe the poem would do greater justice to the whole idea, but what I tried to explain here is that the simple things we have all witnessed actually hide much deeper truths about life in general.
And from there I assume QzDesktop loading times were not the issue at the beginning of this deterministic chaotic system, but simply one of the possible generational products.
I am yet to understand how to solve the monkey issue.
Curious why this high-powered person would go to bat for a technology decision they didn't seem to have done any risk assessment of. Wouldn't he be liable if something was exploited and hurt the company, like his head would be on the chopping block for giving the go-ahead when they traced it up the chain of command?
This is only my opinion, but I think the reason Armen said it like he did was because by not making it an order he's giving Sean the option of not doing it, if he's not up for accepting the risk. However the risk was both of them could have got fired.
Armen must have known people would know Python had been put on these machines and that he authorised it, in fact what's the point of putting it on them if nobody knows and nobody uses it? I can guarantee you that within 24 hours someone was asking Armen why he'd authorised this and was justifying it. There cannot have been any possibility of dodging responsibility for this decision. If anyone got fired it would have been Armen, with a possibility of Sean going as collateral damage.
This is the big league. You make your decisions and you accept responsibility for them.
Exactly, so what is this Armen character getting out of this other than a potentially big amount of liability and unarticulated risk.
The OP said he told people openly that Armen told him he could do it when asked.
This makes no sense to me, what’s the upside to Armen? If he is business savvy, he needs to be gaining something in exchange for having his name thrown around by OP as signing off on this.
>In 2011 Goldman Sachs put its top computer wizard, Armen Avanessians, in charge of the division. He has helped turn round its fortunes. The arm’s assets under management reached a nadir of $38bn in 2012, but it now manages $91.8bn...
Maybe that bank is more generous. The one I worked at begrudgingly counted out the pennies like it was coming out of the war orphans fund or something.
He's doing his job, which is to ensure people have the tools and resources available to do their jobs. You know, furthering the goals of the organisation.
You say he reached out to you and asked if you liked Python. He probably wanted to roll out Python and was looking for someone who wanted to do it. If he told someone to do something they weren't passionate about, they would fail. He wanted to make sure it succeeded, so he reached out to you.
If he's such a bigshot and everyone was frightened of him, he must not have been afraid of them. When he said, "you wouldn't get fired," he probably meant what he said. He was giving you air cover. And it worked. When the gnomes came out after you, you just sent them to him. And they didn't bother you again.
I can imagine how the conversation went:
"Armen, did you tell Sean to install Python"
"No, did he"?
"Yes he did!"
"Great!!"
Now the gnomes are on their back foot and have to defend why Sean shouldn't install Python. And if this guy Armen told him to do it, Armen has to defend himself to them.
Hehe. Well I guess you would have a unique insight into his thought process. ;-) But yes indeed that's certainly another explanation and it did indeed work that way.
Well thanks for the air cover, and for all the other opportunities you provided for me and others at GS. I really appreciate it. It was an amazing time and I learned a great deal.
I worked for GSAM for a bit as my first NAPA project - I guess you're referring to Armen Avanessians? Haha haven't heard references to the "train" in so long. Did you ever do any Slang/SecDB dev? I was mostly in FICC Tech so was pretty much slangin' slang most of my time there.
JSI (Java Slang Integration) was just getting off the ground, but there wasn't too much for the front office tech teams to do there until it matured in the coming years.
Yes, that's who I was referring to. I did absolutely tons of slang and a fair amount of TSecdb as well as a lot of work on the C++ infra and the build and distribution code.
There was no single person that introduced python at Citigroup that I am aware of. It came in via a variety of teams mostly because of the fact that the alternative was perl, and no one wanted to write perl (yet somehow kdb was acceptable a few years later).
I'm seeing a lot of people speculating about which bank this might be; I think the point is that it's all of them. I could loosely describe a previous job as implementing Morgan Stanley's Walpole and integrating more source code management into Minerva (even though that system wasn't actually Python-based).
Having a global view on everything is large banks' value-add, it's why they haven't been outcompeted by their more nimble competitors. Being able to calculate the risk of the whole bank isn't just a cool feature, it's the core value proposition of this platform.
Being able to just upload your code and run it is really cool, and if you squint it looks a bit like what the outside world is trying to set up with serverless/lambda-style platforms - just write a function, submit it, and there, it's running. (But it's worth remembering that Python is not a typical programming language; Python build, dependency and deployment management is exceptionally awful in every respect, so this isn't as big a pain point in other languages.) Obviously there's a tension between this and having good version control, diffs, easy rollbacks etc. - but because Minerva is already designed to do all that for data (because you need that kind of functionality for modifications to your bonds or whatever), doing it this way strikes a much better compromise than something like editing PHP files directly on your live server.
What this article calls data-first design has a lot in common with functional programming. I hope that as the outside world adopts more functional programming and non-relational datastores, Minerva-style programming will get more popular. It really is a much better way to write code in many ways. The difficulty of integrating with outside libraries is a shame though.
It's not a single giant pickle dump; each individual object gets pickled and stored in Minerva (which works more or less like Cassandra or something). It's a pretty similar high level design to what the likes of Google or Facebook do, where you store everything as protobufs in BigTable - the bank uses pickle rather than protobuf because they put a higher priority on being able to store arbitrary objects and deal with robustness/compatibility later, rather than having to write a proto definition and a bunch of mapping code up front. You wouldn't want to use a relational database because they're not properly distributed (and, frankly, kind of bad and overrated).
The Minerva I worked on was temporal and append-only, like a HBase that never did compactions (so "delete" actually just writes a tombstone row at a particular timestamp - there was an "obliterate" command but you needed special authorization to use that), and it was distributed (with availability zones even) so you didn't really worry about losing data; loading data as-of a particular timestamp was part of every query (and implemented efficiently). There were probably regular dumps somewhere too but I never needed to encounter those.
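That design fits in a few lines of Python (a toy, not the real thing): every write appends a (timestamp, value) row, a delete appends a tombstone row, and every read is as-of a timestamp.

```python
_TOMBSTONE = object()

class TemporalStore:
    """Toy append-only KV store: reads are as-of a timestamp, deletes are tombstones."""
    def __init__(self):
        self._rows = {}  # key -> list of (timestamp, value), appended in time order

    def put(self, key, value, ts):
        self._rows.setdefault(key, []).append((ts, value))

    def delete(self, key, ts):
        self.put(key, _TOMBSTONE, ts)  # a delete is just another appended row

    def get(self, key, as_of):
        latest = _TOMBSTONE
        for ts, value in self._rows.get(key, []):
            if ts <= as_of:
                latest = value  # rows are in time order; keep newest row <= as_of
        if latest is _TOMBSTONE:
            raise KeyError(key)  # never existed, or deleted as of this time
        return latest

db = TemporalStore()
db.put("trades/1", {"px": 100}, ts=1)
db.put("trades/1", {"px": 105}, ts=5)
db.delete("trades/1", ts=9)
assert db.get("trades/1", as_of=3) == {"px": 100}  # historical read
assert db.get("trades/1", as_of=7) == {"px": 105}
```

Because nothing is overwritten in place, "what did this trade look like at yesterday's close?" is just a read with a different timestamp, which is exactly what risk reruns need.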
So Minerva is like a distributed datastore, specifically for python object storage ?
Interesting. Do you think you would do this today with a Cassandra/Hbase? Can it be done - let's say take python 3.10 and the latest Cassandra (or even better - something like Firebase or Cloud Spanner).
Just curious that in a post AWS/Firebase world, can something like Minerva be built, without investing in writing the db store ground up.
The incarnation of Minerva I worked on actually used Cassandra as its storage backend. But it's something that's not particularly useful piecemeal; the great value of Minerva is that all the bank's data is there and it's all temporal, all access-controlled and all the rest. The most fragile and cumbersome parts of Minerva are the parts where it integrates with an external/legacy datastore - but if you tried to introduce a Minerva-style datastore as a small piece in a system that was otherwise using a "normal" technology stack, those integrations would be most of what you made.
ZODB is the object oriented database as a giant pickle dump. Surprisingly, it works and scales quite well. The downside is that non-Python tools cannot access it at all.
I learnt Python via Zope in 2000, and attended the Zope conference that year.
Joined JPMorgan in 2010 to work on Athena, and immediately had a real sense of deja vu... Athena's Hydra object db (essentially an append-only KV store of pickles) felt like a great grandchild of Zope's ZODB.
I remember explaining our tech stack (Python and Zope) to clients.
“Where is the code for that page?”
“It’s in the database”
“Oh… Like MySQL?”
“No. It’s an object database”
“???”
I called it “Martian Technology Syndrome”. But it worked. At later stages we paid the price and had to serialize the datastore for migrations, but that’s what you get for relying on pickles.
Specifically, until you realize that pickle changes based on python version, so updating from py3.x to py3.x+1 will prevent your application from reading previously stored data.
This is wrong. pickle can read files written by older versions just fine, and lets you generate files in older pickle protocol versions if you need backwards compatibility reaching further back than the current protocol (the protocol does not get bumped with each Python version).
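Concretely, the protocol is a property of the stream, not of the interpreter: newer Pythons read all older protocols, and you can pin an old protocol when writing for older readers. A quick demonstration:

```python
import pickle

record = {"trade_id": 17, "px": 101.5}

# Pin an old protocol when writing, if older readers must consume the stream:
blob = pickle.dumps(record, protocol=2)  # protocol 2 has existed since Python 2.3
# Any newer interpreter reads old-protocol streams back unchanged:
assert pickle.loads(blob) == record

# The newest protocol is also readable in the same interpreter, of course:
assert pickle.loads(pickle.dumps(record, protocol=pickle.HIGHEST_PROTOCOL)) == record
```

What does break across versions is unpickling instances whose class definitions have changed, which is a schema-evolution problem, not a protocol one.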
I think it’s mostly true. The complaint with this kind of “industrial global Python code base”, be it at banks or elsewhere, is that often they are hastily cobbled together and depend on extreme user care to not flop over all the time.
I guess banks are the archetypal places that care only about feature creation and not about maintenance or technical debt. When something does break in the end, someone senior just shouts at the poor devs until it works again - usually fixed with a hasty patch again.
Similarly, documentation? Access control? Sanity? These seem to be left behind.
You could not be more wrong. Quartz has first class documentation, solid tooling, very well thought out and rigorous code review and access controls. Banks are regulated up to the eyeballs, so everything has to be audited and justified in detail.
It's not nirvana, these are real working systems built by humans with human failings. There are tradeoffs. Not every application is suited to these sorts of platforms, but the people building these things are top notch technologists and know what they're doing.
For some reason people think everything is as good as it can be at FAANG and other big name tech companies, and everyone else just walks around with their pants down at their ankles bumping into walls until 5:00. It’s just not true.
I mean, it's not just FAANG. Literally everyone except banks use the standard Python environment. simonh is right about the reason for the forked tech stack.
Actually, it's probably not FAANG folks at all. I'd expect ex-FAANG folks to be more sympathetic to the forked python situation... FAANGs have an abundance of non-standard and frustrating infra (wasn't 5TB just posted yesterday?), and maybe even on steroids compared to banks (do any of the FAANGs not have at least one custom linux kernel?) Hell, both As roll a shitload of their own silicon.
Google invented "NoSQL" before anyone else knew what it was, and all those "cloud" tools they used internally were obviously proprietary (except the ones they open sourced). Ex-Googlers I work with typically had to spend quite a bit of time re-adjusting to the "inferior" tools and processes in other companies.
Microsoft invented their own development ecosystem, and the only reason it's "common" or "standard" in the tech community is because they sell it as a product. This is the same for Apple at least for iOS development, and Amazon for their cloud service offerings.
When companies have millions of dollars to spend on maintaining a custom development environment that they think will give them a competitive edge, they will do it. It's the smaller shops that can't afford not to go with the flow, so to speak.
I mean Google has Bazel which has reliable hermetic distributed builds with modern statically typed languages. I don't think this weird custom Python system is really in the same league. Barely even playing the same game.
Well, I don’t know about Quartz. I worked with DB systems and they were awful in that regard. They worked, but largely because people stuck to convention.
For example, changing scheduler jobs required submitting a change in Excel and having it approved (twice...) by someone. Except the table was world-writable and changes not logged. So in principle only your appropriate superior could approve change, in practice anyone could, and you’d never even know.
Youch, that's nasty. I can completely believe it though, banks are huge unwieldy organisations. I spent a fair bit of time working with auditors though, so I know a huge amount of effort goes into rooting out things like that.
The thing is just because a team in a bank did this thing, that doesn't mean "The Bank" thinks that's a good idea. Like any company, banks are communities. I'm not making excuses, the fact this system wasn't properly architected is a failure of governance, but I've been on the other side of this trying to get teams to fix their problems and adopt resilient processes and procedures. Every offender thinks their service is special and their violation of the standards is justified.
Yes, it's fine IMHO. Ok it's not your favourite Java IDE, but it's way, way better than some of the crap I've had to use at various places. But then I wrote a 5Kloc PyQt desktop app almost entirely in IDLE so yeah, maybe I'm not the best judge.
Hmm, I found the opposite - the fact that there was this global framework that managed all the data and code meant that access control was actually pretty good, better than most tech companies I've worked for. You had a single source of truth for what your access rights were, there was integrated Kerberos any time you needed to access a system outside Minerva. And having all the code in a managed place meant good deprecation cycles - not instant deprecation like the Google monorepo, but tracking and policy for which old versions of libraries were in use and how much that was tolerated. Documentation was at least attempted, and while platform stability/enhancement work did have to be coupled to business initiatives to a certain extent (e.g. "we're doing this performance work to enable us to run risk estimation more often to meet MIFID requirements at low cost") there was leadership that put a value on maintaining high quality code and this paid dividends.
This let the front-office devs be highly productive on adding real business value for their trading desks.
Remember also that these systems at GS, JPMorgan and BAML started around 2007-2010. The infra we all take for granted today at AWS/GCP/Azure simply did not exist back then, and banks' data security policies at the time did not allow cloud processing.
> Remember also that these systems at GS, JPMorgan and BAML started around 2007-2010.
GS had "these systems" well before 2000 (via J Aron). I think around the time you mentioned they spread to other firms (in their Python reincarnation).
> I guess banks are the archetypal places that care only about feature creation and not about maintenance or technical debt.
It depends which department you are in, but in general: absolutely not. Actually the reverse is true.
Banks have huge risks to manage: just imagine for instance what damage a hack of their account system could cause. Or a crash of their payment system. Therefore it is of the utmost importance that systems are stable and bug free. In most departments, feature development is only in second place: stability and reliability have priority one.
This concern is so important that it is not just left to the responsibility of the banks themselves: for many systems, banks have to comply with external standards and are audited for that by external agencies.
If you compare Python's deployment and dependency management to those of statically compiled languages like Go, Rust, Zig, or Nim, you quickly see the experience with Python is quite poor.
In all the above languages, you simply ship a statically compiled binary (often just 1 file), and the user needs nothing else.
With any sufficiently complex Python project, the user will need:
1. virtualenv
2. possibly a C compiler
3. recent versions of Python (and that keeps changing. 3.0->3.4 are "ancient", and 3.6 seems to be the absolute minimum version these days --- due primarily to f-strings)
4. Or you ship a dockerfile and then the users need 600mb of Docker installed
I sometimes joke that in the future every Python script will require a K8s deployment and people will call it "easy".
Python is a great language, but deployment is a massive pain point for the language.
When I know I am writing something that has to "just work" on a wide range of systems that I don't necessarily control, well I don't write the solution in Python. I pick Go, Nim, or Rust (Zig would be a good choice too).
Or you just provide your own Python package. Most of the time that will be less than 100 MB if you don't include huge libraries. You can test and build automatically. For deployment you then have an installer or rpm that is probably smaller than most of the other enterprise software your customer's infrastructure admins are handling.
The problem is "less than 100mb" is unacceptable (for my use cases).
There are use cases where 100-300mb is no big deal and customer can handle this.
But single binary deployments with a statically compiled language where a fully-featured binary can weigh in from 5-30mb are what I'm after.
And honestly, with upx I can take even those fat Go binaries down from ~35mb to 7-8mb. That's an order of magnitude less than 100mb of Python and all its dependencies. Not to mention with all those languages I mentioned (Go, Rust, Nim, Zig), I get multi-threading and high performance as well.
I think there are too many options, or not enough direction for busy people. Once you understand how it all works and pick / build the right tools it all works pretty well.
I would agree for all but deployment. I know my way around python reasonably well, but pyinstaller and friends still make me have bad days pretty regularly.
Four Python projects, same customer, five different deployment systems: Docker, a Capistrano look-alike I coded in bash, git pull (their former standard), git format-patch plus scp, and zip archives. Yes, python file.zip works if it contains the right files. The latter is probably the easiest way, except it doesn't address the dependencies.
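For what it's worth, the "python file.zip" trick is blessed by the standard library: the `zipapp` module builds runnable archives. A minimal sketch (the toy app directory and message are made up for illustration):

```python
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path

# Build a tiny app directory, pack it into a runnable .pyz archive with
# the stdlib zipapp module (Python 3.5+), then execute the archive.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "myapp"
    src.mkdir()
    (src / "__main__.py").write_text('print("hello from a zipapp")\n')

    target = Path(tmp) / "myapp.pyz"
    zipapp.create_archive(src, target)

    out = subprocess.run([sys.executable, str(target)],
                         capture_output=True, text=True)

print(out.stdout.strip())   # hello from a zipapp
```

As the comment notes, this packs only your own code; third-party dependencies still have to be vendored into the archive or installed separately.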
Python is literally decades behind. Dependency resolution is nondeterministic by default. The way you run a build is still not remotely standard (e.g.: I've downloaded the source of one of the top 20 packages on PyPI. How do I run the tests? Perl had a standard way to do that back in the '90s). Deployment is so bad that people recommend using containers as a substitute for something like fat jars.
Am I the only one who has never had enough headaches over the years to say it's awful? I do think dependency management is a somewhat difficult problem to solve, and a lot of systems have pros/cons, but I have never had huge issues with Python's.
BAML Quartz was conceived by a bunch of front-office quants who had not the first idea about the software needs of a big bank beyond the front office. There was an arrogant assumption that front office software is obviously the most complicated/difficult variety of software within a bank and therefore any system designed with front office requirements at the forefront would, of course, be perfect for universal use.
This assumption was challenged at the time by various groups - I was closest to the Equities Operations software team (although not part of it) who absolutely dug in their heels and refused to use Quartz. The assumption was explosively invalidated when people started implementing in Quartz applications that fell under Sarbanes/Oxley regulations and Quartz picked up a severity 1 audit finding - because Quartz was explicitly designed for "Hyper Agility" (literal quote from the quartz docs) - and anyone-can-change-anything-at-any-time does not make for applications that the regulators trust.
There was an interesting trajectory of Python hiring during my time at BAML. I joined just as Quartz was getting started and we managed to easily hire tens of python devs in London because it was easy to sell the fact that BAML was making a strategic investment in Python and therefore their (at the time relatively uncommon) skills would be highly valued. But as Quartz matured, Python developers generally came to dislike it (for reasons see original article) and it became hard to retain the best ones. And after a while Python 2.x became a massive embarrassment and, as Python became a more common skill in the marketplace, it became harder to hire good developers into BAML.
> BAML Quartz was conceived by a bunch of front-office quants...
It was worse than that. What they actually built was a system designed to support complex hybrid structuring. That's what markets desks had been making a lot of money in prior to the crash, especially GS. Unfortunately, post-crash there wasn't much money in structuring, so the Front Office was more interested in investing in flow. Quartz was really, really bad at flow.
It took a long time (and the departure of Mike, Kirat et al) to get Quartz to a position where it was a reasonably sane FO system for the world as was rather than as it had been.
I was at BAML when the sev 1 audit finding happened. My view was from an application support team in Risk. For us Quartz was fantastic, and it had a pretty decent permissions system. The problem is there were two misaligned goals.
On the one hand the goal was to build a single enterprise scale system with a holistic view of the bank's data to do rapid ad-hoc position evaluations and meet new needs rapidly.
On the other hand, access to all that data and all the code is clearly a security concern. By the time I left the sev 1 finding was well on the way to being mitigated, but for example it meant that instead of handing out quartz developer accounts and IDE access like candy it had to be restricted to technology personnel only.
> in London because it was easy to sell the fact that BAML was making a strategic investment in Python
I reckon I "felt" that push to hire at one of the early PyConUK events, where your boys suddenly showed up with a big contingent. I even thought about applying, but I was not based in London - and there were some red flags, like running a pretty old Python version (I think it was 2.2 or 2.1, when 2.4/2.5 were the expected mainstream), that kinda sounded like I'd be signing up for the modern equivalent of mainframe maintenance.
Also consider Application Support. I know it's not sexy rockstar dev stuff, but if you can get into App Support on the Quartz (or Athena I suppose) environments you get a dev account and access to all the tools. You can view all the code, config and running systems. If you have a good relationship with your dev team you can submit patches e.g. to improve logging. The live log files of all your applications are just a URL away.
If you're up for it, you'll spend a significant amount of time in the Quartz IDE. There are teams within App Support that develop monitoring and compliance reporting tools in Qz and do about 50% development. I know because I ran one. One of my team transferred into our dev team.
Yes. Many adverts will specify financial services experience but it's worth applying anyway. You'll probably find that roles in back-office technology areas (operations, finance etc) are less demanding in this respect. I hired mostly from outside the financial services industry because other industries had, on average, better-skilled developers, lower salaries and better development practices.
Depending on what kind of engineer, it is far better to go to the finance (front office quant, back office risk) side than the tech support side. They are less snobbish about autodidacts and pay is far better if you are willing to learn about things outside the dev sandbox.
(Our front office has a few quants and ex-quants with electrical engineering backgrounds; I don't know of any software engineers there.)
Rule of thumb: the closer to the business (ie front office), the more money and stress.
(Front office deals with clients, and in this context comprises sales, trading, structuring. Middle office run control functions, reporting, risk, compliance, etc. Back office would be settlement, accounting, operations, etc.)
> I've mentioned that programmers are far too dismissive of MS Excel. You can achieve an awful lot with Excel: more, even, than some programmers can achieve without it
This is one of the most underrated topics in tech, imho. The spreadsheet is probably the pinnacle of how tech can be approachable by non-tech people, in the "bicycle for the mind" sense. We've come a long way downhill from there, now that you need a specialist even to come up with a no-code solution to mundane problems.
Sure, the tech ecosystem evolved and became a lot more complex, but I'm afraid the concept of a non-tech person opening a blank file and creating something useful from scratch has been lost along the way.
The problem, as described to me, is that Excel starts being used for regulated processes and it's not well auditable, access controlled, change controlled, tracked, etc. Then people need to implement the exact same process across departments, they're all using separate Excel sheets, and they all submit different numbers. It becomes a huge mess, and much more complicated and expensive systems get commissioned.
Fun story: I was at a bank that used Excel for everything. As you say, there came a complaint from the auditors that it's not well auditable, and there needed to be "a system".
Solution: the bank put together a system that constructs (from Excel templates and the bank trading data and market data) Excel spreadsheets from scratch every day, then used those for the calculations, and stored them. But now it was "a system", so all good.
Well you can audit the code that generates spreadsheets, which seems to solve the audit problem. Kind of like I prefer reading a Dockerfile that builds a program from the GitHub repo, rather than downloading a pre-compiled package I can't trust.
Sounds like a great system. We have something similar where we put Excel in and out, but it doesn't sound as slick as that. On top of the system there is access control, versioning and such. The data gets approved and then stored in the backend to feed the regulated process.
This describes what I've seen happen with Excel over and over again. I'm curious if the use of collaborative Google sheets could be a fix for this? Something where a portion of the sheet could be shared globally, but the rest of the document would be local to the instance working on it.
There's an excellent example of this phenomenon in the JPM "London Whale" report where -- at various points -- poorly maintained and validated spreadsheets appear as minor villains in a $6.2bn loss.
The jargon for this is "user-developed application," and auditors do keep an eye out for these. Banks, from what I've seen at least, typically have some process to document these as they come up, replace them with supported solutions, and retire them. At least, that's the "happy path," where people are willing and able to get all that done before a big-three auditor comes in and tears you a new one.
Plus, a spreadsheet is basically purely functional (unless there's mucking around in VisualBasic), and has a beautiful dependency graph and calculation engine! (And that is a big part of what SecDB/Slang/Bank Python brought to the table.)
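That calculation engine can be sketched in a few lines of plain Python (a toy illustration, not SecDB code): cells cache their value and are marked dirty when an upstream input changes, so an expensive formula only recomputes when something it depends on actually moved.

```python
# Toy sketch of a spreadsheet-style dependency graph: cells cache
# their value and recompute lazily after an input is invalidated.
class Cell:
    def __init__(self, formula=None, value=None):
        self.formula, self._value = formula, value
        self.dirty = formula is not None     # formulas start uncomputed

    def get(self):
        if self.dirty:                       # recompute only when needed
            self._value = self.formula()
            self.dirty = False
        return self._value

    def set(self, value, dependents=()):
        self._value = value
        for d in dependents:                 # invalidate downstream cells
            d.dirty = True

# A1 + A2 -> A3, like three spreadsheet cells
a1, a2 = Cell(value=2), Cell(value=3)
a3 = Cell(formula=lambda: a1.get() + a2.get())
print(a3.get())                  # 5
a1.set(10, dependents=[a3])      # editing A1 marks A3 dirty
print(a3.get())                  # 13
```

In a real engine (Excel, or the SecDB-style graphs described elsewhere in the thread) the dependency edges are discovered automatically rather than passed by hand; the laziness and invalidation are the essential bits.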
I think part of the problem with Excel (or clones) is that you can do so much, haha. It's such a powerful tool that you end up doing things in it that it really wasn't optimized or designed for, and managing change history in Excel is pretty tough.
Very true. I often prototype algorithms and things in google sheets. One time I had backpropagation working in there, with a little button to process the next "row" of training samples.
The problem is sometimes analysts turn into shadow BI or even DE types who only know Excel. They know Excel so well that they create a whole monstrosity in it. MSFT has been sort of encouraging that too, by introducing Power BI features and now JavaScript into Excel.
I can see the benefits of this collection of tools within an all-in-one monolith. Ease of deployment is a big benefit. I can also see the costs. As a stack it's probably better in some ways than how a lot of other businesses operate, as well as worse. There's probably a lot both ways.
The mainframe mindset might be a factor here as well. The giant mainframe where all the magic happens is still a thing to behold and this is definitely part of banking's history and present. Mainframes are beasts and are still far from any kind of obsolescence. A monolithic Bank Python with a standardised set of libraries etc would slot right in to that mindset and way of thinking.
The part about programming languages frequently not having tables is interesting. The closest as mentioned is the hash, but you lose so much in that abstraction eg the relational aspects. The counter argument then becomes the obvious: why aren't you using a database library, or in a pinch, sqlite? Rightly so. Why would you add relational tables to python rather than have a generic python database spec or a collection of database connector libraries. Databases are separate and large projects in themselves.
I'd still be rather disturbed if they were running some old Python 2.5 or similar. Just saying. That would be a source of pity.
> The part about programming languages frequently not having tables is interesting. The closest as mentioned is the hash, but you lose so much in that abstraction eg the relational aspects. The counter argument then becomes the obvious: why aren't you using a database library, or in a pinch, sqlite? Rightly so. Why would you add relational tables to python rather than have a generic python database spec or a collection of database connector libraries. Databases are separate and large projects in themselves.
The separate datastore is the problem to be solved here - databases, especially relational databases, are extremely poorly integrated into programming languages and this makes it really painful to develop anything that uses them. You can just about use them as a place to dump serialized data to and from (not suitable for large systems because they're not properly distributed), but if you actually want to operate on data you need it to be in memory where you're running the code and you want it to be tightly integrated with your language and IDE and so on.
(It's not even the main benefit, but just as an example of that kind of integration, when you're querying large datasets Minerva works a bit like Hadoop in that it will ship your code to where the data is and run it there)
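For what it's worth, the closest stock Python gets to this is sqlite3's in-memory mode: the relational table lives inside your process, with no server or network hop, though the integration point is still SQL strings rather than the language itself. A sketch with an illustrative table:

```python
import sqlite3

# Illustrative in-process relational table: no separate server to talk
# to, but queries are still strings rather than language syntax.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE trades (id INTEGER, instrument TEXT, qty REAL)")
db.executemany("INSERT INTO trades VALUES (?, ?, ?)", [
    (1, "FX:EURUSD", 1e6),
    (2, "FX:EURUSD", -2.5e5),
    (3, "IR:SWAP", 5e6),
])

# Net position in one instrument, computed where the data lives.
(net,) = db.execute(
    "SELECT SUM(qty) FROM trades WHERE instrument = 'FX:EURUSD'").fetchone()
print(net)   # 750000.0
```

Which rather illustrates the comment's point: even in the best case, the database remains a thing you send commands to, not a native part of the language's data model.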
Funny thing is, databases were tightly integrated into programming languages all the way back in 80s - that's exactly what dBase was, and why it became so popular. FoxBASE/FoxPro, Clipper, Paradox etc were all similar in that respect.
And yes, it made for some very powerful high-level tooling. I actually learned to code on FoxPro for DOS, and the efficiency with which you could crank out even fairly complicated line-of-business data-centric apps was amazing, and is not something I've seen in any tech stack since.
> FoxBASE/FoxPro, Clipper, Paradox etc were all similar in that respect.
> the efficiency with which you could crank out even fairly complicated line-of-business data-centric apps was amazing, and is not something I've seen in any tech stack since.
Did you ever get to try Delphi? Those "line-of-business data-centric apps" is what it was all about.
And I'm not quite sure, but I think and hope Free Pascal / Lazarus is close to that in ease and power.
> The separate datastore is the problem to be solved here - databases, especially relational databases, are extremely poorly integrated into programming languages and this makes it really painful to develop anything that uses them.
Hence "Active Record" ORMs like Rails and Django being highly successful. They functionally embed the RDBMS into the language/app (almost literally if using SQlite), which is a huge boon for developer productivity...
...but also a significant footgun, because it means the database is now effectively owned by the Active Record ORM and its (SWE) team, and not by some app-agnostic data team.
Want to reuse that juicy clean data managed by Django? Write a REST API driven by the app; don't try to access the data directly over SQL, although it may be tempting.
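The Active Record idea can be sketched minimally (a hypothetical `Trade` model, not the actual Rails/Django API): the class owns its table, so the schema lives with the application code - which is exactly why the app, not a data team, ends up owning the database.

```python
import sqlite3

# Hypothetical minimal Active Record: the model class owns its table
# and its schema; rows load and save through the class itself.
class Trade:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE trade (id INTEGER PRIMARY KEY, instrument TEXT)")

    def __init__(self, instrument):
        self.instrument = instrument
        self.id = None

    def save(self):
        cur = self.db.execute(
            "INSERT INTO trade (instrument) VALUES (?)", (self.instrument,))
        self.db.commit()
        self.id = cur.lastrowid

    @classmethod
    def find(cls, trade_id):
        row = cls.db.execute(
            "SELECT instrument FROM trade WHERE id = ?", (trade_id,)).fetchone()
        return cls(row[0]) if row else None

t = Trade("FX:EURUSD")
t.save()
print(Trade.find(t.id).instrument)   # FX:EURUSD
```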
> Hence "Active Record" ORMs like Rails and Django being highly successful. They functionally embed the RDBMS into the language/app (almost literally if using SQlite), which is a huge boon for developer productivity...
Right, those are a step in the right direction, but still a lot more cumbersome than properly integrating your datastore with your application.
The first-blush conversion from Excel to this ecosystem only needs lookup tables. Excel has some static database I/O, but people who only know Excel use it as data input for lookup tables.
The Python results of that first conversion need to test against Excel, so it’ll have identical lookup tables.
> The part about programming languages frequently not having tables is interesting. The closest as mentioned is the hash, but you lose so much in that abstraction eg the relational aspects. The counter argument then becomes the obvious: why aren't you using a database library, or in a pinch, sqlite? Rightly so. Why would you add relational tables to python rather than have a generic python database spec or a collection of database connector libraries. Databases are separate and large projects in themselves.
This is covered in the article, in the distinction between "code-first" and "data-first". Databases mean that you leave the interaction with data to a third party, and the only thing you do is send commands and receive results. This is very different from having all the data in your program and starting from that. I'm not sure if "code-first" is the right word for it. Perhaps another way to put it would be that when data is the most important thing, you don't want to encapsulate it in a "database object"; you want it right here.
I worked at one of the largest of these systems. It seems to be the one referred by the post.
The global distributed store of pickled python objects using Event Sourcing was one of the most horrible and expensive database systems I've ever heard of. It runs on THOUSANDS of expensive servers with all data stored in-memory. To get the state of a single deal you had to open, decompress, deserialize, and merge hundreds if not thousands of instances. And 90% of the output was more often than not discarded.
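The read path being described - current state as a fold over a log of serialized events - looks roughly like this toy sketch (field names invented for illustration):

```python
import pickle

# Toy event-sourcing sketch: a deal's current state is reconstructed
# by deserializing and merging every event in its log, on every read.
events = [
    pickle.dumps({"deal": 42, "notional": 1_000_000}),
    pickle.dumps({"deal": 42, "rate": 0.0175}),
    pickle.dumps({"deal": 42, "notional": 1_250_000}),  # later amendment wins
]

state = {}
for blob in events:              # replay the whole log to get "now"
    state.update(pickle.loads(blob))

print(state["notional"], state["rate"])   # 1250000 0.0175
```

With three events this is trivial; with the hundreds or thousands per deal the comment describes, and no materialized snapshots, it's easy to see where the server bill comes from.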
The Python interpreter extensions reveal the ignorance of Python by the original developers. There was no good reason to fork CPython.
There were many small subsystems created and supported by lone rangers with impressive CVs and astronomical salaries. A JIT better than any other out there (but with a lot of limitations). An extremely elegant meta-query system.
But this all was a sham. The actual daily crunch/analytics ran on more classic SQL/columnar clusters. Hours-long batch jobs loaded stuff from the distributed object database into old-school DBs, and those blew up frequently. Sometimes those blow-ups cost many millions in delayed regulatory reports. The queries running on top of SQL were beyond stupid and the DB engine could not optimize for them. And of course, people blamed SQL and not the ridiculous architecture and the OOP dogma.
Don't work for old school banking, hedge funds, or anything like that. They are driven by tech cavemen and their primadonnas. Exceptions might be some HFT and fintech shops.
It looks like sound ideas, poorly implemented. A lot of these capabilities crop up in BEAM or Smalltalk. I do wonder if Erlang had better developer ergonomics we wouldn't have seen that instead of Python as a basis.
That proprietary IDE was a piece of crap. Monkey patching was prevalent, exponentially increasing startup time depending on the last time you opened it, and an “online” source tree where one could easily modify the source code in someone else’s ‘private’ workspace.
PyCharm was a move in the right direction, but the way it worked was absurd- it would run the internal IDE in the background and sync to the file system. Given the proprietary IDE took up more resources than PyCharm, you constantly had to shut down apps so the machine had enough memory.
IDEs should not be a requirement- they are tools… but you had no choice but to use it and their totally flawed code completion. Their measure of success was tantamount to having a Jupyter notebook- write code and get back results immediately.
I guess it was originally created as a productivity tool, as there wasn't any good Python IDE before 2010. But after a while it became a monstrosity; new tools emerged, but it was rooted so deeply that adopting them was impossible.
But as I said, I'd really love to work on those projects. Most people don't get the prestige of owning their own baby in a big corporation. 99% of the job is maintaining a shit mountain of code and piling new shit on top of it. It really takes a lot of luck to make that happen.
I mean, even people who get to work for JetBrains or the Microsoft Visual Studio team don't get to create new IDEs; they are buried deep in a shit mountain of code and JIRA issues.
Plus the pay and vacation is really good for the banks.
Both Front Arena and Murex use Python as their "VB" kind of language. If you thought your deployment pipelines were weird, ours have included putting entire Python apps into single strings and inserting them into an Oracle DB, where a fat Windows client selects them and runs them on a Windows Python interpreter... via Citrix... :/
This reminds me of an e-commerce system which stored data in a mixture of Oracle, and text files on the local disk. We handled backups by loading the text files into blob columns in the database, and then just backing up the database.
Reminds me of what we used at the ATLAS experiment at CERN*. Python was tightly integrated with the application framework, Athena (which I just realize has the same name as JPM's Python framework!). You could use it as a job description language, and you would compose computation steps from classes you could write in C++. I think there was a separate `athena` executable that was just Python with some packages pre-loaded. Because of all the binary modules, but even more so because of the minor syntax changes, the transition to Python 3 was really a problem (I hope they did it by now :-D).
There was also a bespoke time-span database. You could store keys and values in there, but every data point had a start and end time. Then you could query what the values were between certain times, or run numbers (operational periods). We used it for example to store what configuration the detector was using when a certain dataset has been recorded.
(* I've been out for a couple of years so I don't know what they use now, but I imagine it hasn't changed much.)
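The time-span idea sketches out like this (illustrative record names, not the actual CERN database): every record carries a validity interval, and queries ask what held at a given time.

```python
from dataclasses import dataclass

# Toy interval-tagged key/value store: each record is valid over
# [start, end), and lookups ask what was valid at time t.
@dataclass
class Record:
    key: str
    value: str
    start: int
    end: int

records = [
    Record("detector.config", "v1", 100, 200),
    Record("detector.config", "v2", 200, 300),
]

def value_at(key, t):
    """Return the value of `key` that was valid at time t."""
    return next(r.value for r in records
                if r.key == key and r.start <= t < r.end)

print(value_at("detector.config", 150))   # v1
print(value_at("detector.config", 250))   # v2
```

A real system would of course index the intervals (e.g. an interval tree) rather than scanning, but the query model is the interesting part.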
I go for James Bond references, when I can. 'Moonraker' is always a great choice.
Sometimes I'm constrained to a specific starting letter, so I've had to stretch it at times, like when I needed an 'S-' word... ended up going with 'Sinatra' since Nancy Sinatra performed 'You Only Live Twice' for that movie
Doesn't seem that strange compared to K[1] or Q[2], which are used by Wall Street banks. K encourages you to use single-letter variables and bunch your code up as tight as possible into long lines. Here's an example: [3]. Interestingly, their Github repo has some K-inspired C[4], Java[5], C#[6], and Javascript[7].
This took me a while to figure out but K/APL code is built on a different value system than most software. Specifically, the goal is to be able to see and operate on the entire program at once. Obviously this only works for programs up to a certain size but that size is larger than you'd expect when abstractions, variable names, and general legibility are sacrificed. I wouldn't write code this way but I can see how someone would find it valuable.
The big banks don't write code in K4 though, managers generally encourage people not to write code in it due to the difficulty of finding developers who are fluent in it.
They all use q and q is very wordy and highly readable if you speak english. It's mostly just developer defined functions which are compositions of the keywords of which there are not many: https://code.kx.com/q/ref/#keywords
Most of the code you would see in a kdb+ system in an investment bank won't look like any of the links you've provided.
Do you have any suggestions for q code to look at? Every time I try array languages I bounce off the ubercompact style, and I feel like there's actually a chance I could learn something more q-like.
Project Euler frequently has ultra-short solutions in K/J/whatever other single letter they’re using at the moment. It is quite intriguing, but ultimately I put too much store in readability, so decided not to pursue these.
> Dependency graphs are an elegant solution to risk management and pricing etc.
Dependency graphs are not a solution to risk and pricing. They are, in certain circumstances, a very useful tool. That's all. They also scale notoriously painfully.
Putting a dependency graph as a mandatory component in your risk system was one of the worst technical decisions I've come across (and I've been doing this lark a long time).
Wouldn't an observer pattern work better? The graph itself could even be used to instantiate subscriptions in a pub/sub system where changes in underlying pricing could be dealt with via an event queue. Compaction and debouncing could be applied on top of the queue to avoid lots and lots of redundant execution.
Better as a solution to what problem? In some cases a dependency graph is an excellent solution. In some cases it's not. In some cases it's fine for small graphs but scales poorly as it can be very hard to reason about (as attested by pretty much anyone who's supported a really big spreadsheet).
But that's the point; it's a really useful tool. Sometimes.
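The compaction/debouncing idea upthread can be sketched in a few lines (illustrative names): keep only the latest tick per key, then recompute once per drained batch instead of once per event.

```python
# Toy debounced pub/sub feed: publishing overwrites the previous tick
# for the same instrument, so a drain sees only the latest values.
class CompactingQueue:
    def __init__(self):
        self._latest = {}            # instrument -> most recent price

    def publish(self, instrument, price):
        self._latest[instrument] = price   # older ticks are superseded

    def drain(self):
        batch, self._latest = self._latest, {}
        return batch

q = CompactingQueue()
for instrument, price in [("EURUSD", 1.08), ("EURUSD", 1.09), ("USDJPY", 151.2)]:
    q.publish(instrument, price)
print(q.drain())   # {'EURUSD': 1.09, 'USDJPY': 151.2}
```

Whether this beats a dependency graph depends, as the parent says, on the problem: compaction trades away seeing every intermediate state in exchange for bounded recomputation.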
I worked on Quartz for a while as a contractor. Hated every second of it. Python version was old (2.4 if I remember correctly when 3.x was already the popular version). But that wasn't it. It was the proprietary version of everything in the stack that got me. Proprietary ide, source control, libs etc.
I noted that none of the others who had been there for years had any transferable skills that could carry them out of banking and into a startup, for example. All were good devs, but only knew Quartz. I can say they were great Quartz devs.
The pay was great, but the work was soul crushing.
The reason this was a problem was that it meant investment was needed for each of these things, and as such each fell behind.
The IDE fell behind most modern IDEs, presumably because it didn't get the budget. The source control/libs were usually modified versions of existing libs, which then needed to be maintained internally to remain compatible with the mainstream versions (which, again, they did not, and so fell out of compatibility).
> any transferable skills
It puts you in a position of arguing that you are familiar with <some-lib>, just a modified proprietary version of it. Makes those conversations a bit more difficult.
> All were good devs but only knew quartz
To be fair - this is their own fault. It's difficult providing proof in terms of "what you worked on in your last job"; but there's no reason a professional python dev couldn't become familiar with the popular versions of things on their own, given they are fairly close in functionality. Many of the quartz devs I knew already had backgrounds in Python, attended pycons etc; so knew more than Quartz.
It's not just the "knowing some lib". It's a way of working that is not compatible with the outside world. Every single character I typed had to get director approval. I stopped counting the broken things that required human intervention (on rota). The style of programming is... well... bankish? I would have had to take a big gamble hiring most of these guys.
Yes, it is their fault, but the organisation didn't even attempt to nurture professional development. Stagnation was a feature, not a bug. Arguably, these guys were paid very well so they would have taken a pay cut anywhere else anyway.
I think this would have depended on the teams/individuals you worked with. I recall some devs being completely clueless about environments - my code (that depends on dozens of other files) in uat is producing different outputs compared to my code in dev, why? Some didn't know how to debug. Many went on to FAANG as senior engineers (including Uber when it was the next big thing), hedge funds (citadel, 2s), startups (twilio, is twitter still a startup?). And as far as python knowledge is concerned - I recall attending a number of python talks given by my former colleagues at Python conferences, various PEP discussions around whether a given PEP would help or be detrimental to qz. As a matter of fact one of the qz core engineers is now also a PSF core engineer. Quartz was polarizing, but there was/is plenty of talent among the engineers.
P.S.: FWIW, when I left BAML the migration to 3.6 was nearing completion and the migration to 3.7 was in progress. I guess at some point we realized Python and qz were not going away and we had to migrate, so infra was built out to make future migrations easier.
The builders/maintainers (AKA the "Quartz Core Team") seemed to have a poor view of the users (AKA the "Line of Business" development teams). Of course you wouldn't catch them saying that openly, but it was sometimes implied during interactions. I will admit, that view was not completely unwarranted.
I second your opinion (interned @ Athena, not Quartz) - compared to my current BigN experience everything was worse by an order of magnitude: the IDE, the source control, the review mechanism, the job scheduler, and so on.
I'd expect that with so many devs working on this, the DevX would be ironed out.
Compared to one major IB bank Python system, this is all extremely clean and neat.
Consider a Python API that is a thin wrapper on COM calls intended to be used from Excel. Want to request some data? Fill in a 2D virtual Excel table. Want to pull some data? Query it and parse a text dump of a table excerpt (remembering to parse #N/A etc. as NaNs). Want to automate a job? Enter it as a new row in a global spreadsheet. And for God's sake, do NOT edit any of the other rows, lest the whole house go down in flames!!!
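A hedged illustration of the kind of parsing that workflow forces on you (the error-token list and the tab-separated layout are assumptions for the sketch, not the real system's API):

```python
import math

# A guess at the error tokens such a text dump might contain
EXCEL_ERRORS = {"#N/A", "#NA!", "#VALUE!", "#DIV/0!", "#REF!", "#NAME?"}

def parse_cell(text):
    """Turn one dumped cell into a float, mapping error tokens to NaN."""
    text = text.strip()
    if text in EXCEL_ERRORS or text == "":
        return math.nan
    return float(text)

def parse_dump(lines):
    """Parse a tab-separated text dump of an Excel range into rows of floats."""
    return [[parse_cell(cell) for cell in line.split("\t")] for line in lines]

rows = parse_dump(["1.5\t#N/A", "2.0\t3.25"])
print(rows[1])  # [2.0, 3.25]
```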
This is the approach I'm familiar with, but back when I did it, it was in Perl (yes, Win32 Perl with COM bindings) and Java (we wrote native Java plugins to do COM to Excel). All the important stuff was Excel VBA code that they'd spent years developing and could never replace, so any front-end type of thing had to somehow get back to the Excel models.
We eventually did rewrite the Excel models in Java, released something, and then the whole project probably got cancelled or something, 9/11 happened a few years later and the whole building in which all this code was written had to be demolished.
I remember back in 2000 converting the windows line of business app team at $ISP (I was mostly on the provisioning automation side) over to using a COM component called JendaRex which wrapped the perl VM just to expose the regexp engine.
This basically came about after the Nth time they asked for regexp help and I had a trivial solution that didn't work in whatever native implementation they were using and I basically gave them a choice between JendaRex and "not having me debugging their regexps anymore".
They unanimously chose JendaRex and everybody ended up happier as a result.
I can confirm that, in a non-bank financial institution, the date formats involved in an Excel->CSV->XML->certain (shit) magic application pipeline were a considerable pain point :|
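One classic source of that pain is Excel's serial-date representation, which any such pipeline has to convert explicitly. A minimal sketch:

```python
from datetime import datetime, timedelta

# The conventional epoch 1899-12-30 makes serials correct for dates after
# 1900-02-28 (Excel inherited a fictitious 1900-02-29 from Lotus 1-2-3)
EXCEL_EPOCH = datetime(1899, 12, 30)

def from_excel_serial(serial):
    """Convert an Excel date serial number to a datetime."""
    return EXCEL_EPOCH + timedelta(days=serial)

print(from_excel_serial(44197).date())  # 2021-01-01
```

CSV exports that silently flip between serials, d/m/y and m/d/y strings are where the real misery starts.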
That's why working in the FIRST generation of programmers at any big financial shop is so much fun. You get total ownership of whatever you build, and others totally rely on you for their jobs. Even better, you can take months to reply to a requirement if it doesn't come from a key stakeholder.
> There is an uncharitable view (sometimes expressed internally too) that Minerva as a whole is a grand exercise in NIH syndrome.
My brief experience with this (in an adjacent area - proprietary trading) was that the more charitable view is that these firms need to be able to fully own their software stacks, and have the resources to pay for that luxury.
Reading these descriptions from the article, I can't help drawing a connection to the Smalltalk ecosystem. It sounds like, to at least some extent, what these banks have built is a system that exhibits many of the more interesting characteristics of an enterprise Smalltalk system, only on top of a tech stack that they could own from top to bottom.
Sounds like the Quartz platform at Bank of America. When I interviewed with that team they joyfully espoused the virtues of their Principal Engineer who quote, “created the database software from scratch!”
Edit: For the record, I implement third-party vendor Excel functions (DLLs written in C++) in C# and it’s a great way to send useless processes to the shadow realm.
I'm curious about experiences in other similar orgs:
I work as a portfolio manager for a large reinsurance/insurance company but spend significant time in SQL, Python, Excel (not unexpected I'm sure).
The Walpole platform in the article struck a chord with me. We built something roughly similar - call it Trek - that handles jobs. Jobs encompass lots of tasks: reading/writing from Excel, executing SQL, running Python, running C#. I could list many limitations, but realistically, the biggest is that the platform can't handle something it can't configure and run. In other words, the platform isn't set up with R - so no one creates data pipelines/jobs that use R. Lots of people here use R (among other tools).
One key problem (maybe?) is that all this action happens inside the business. Trek was built by a talented actuary/programmer. No software engineering org involvement at all. I'm sure lots of folks here can imagine why: lots of red tape, a general aversion to software that isn't already here, long stretches of time to get things done. Also, frankly, lots of our software devs write bad code.
For folks familiar with the orgs in this article, and other similar orgs: is what the article discusses happening mostly with software devs in IT functions? Are these folks embedded in the business? And are there folks using the more technical bits of the systems who are business-oriented - analysts, investment professionals, etc.?
Realize the lines are very blurry these days, but interested to learn from everyone here about the types/roles of end-users
I work in a Technical PM role at a large North American Insurance company and used to work at one of the largest Banks as a Sr. Business Analyst (or Sr.Systems Analyst depending where you are).
>lots of our software devs write bad code
Ultimately, you get what you pay for, all our full stack devs are making 6 figures... While it might not be FAANG money they also almost never work overtime and the stress levels are relatively low.
>Are these folks embedded in the business?
No, that's my job. As a Tech PM I'm supposed to know exactly the business requirements, what my guys can do (to manage expectations), and any limitations of the software/business. I find the best PMs are the ones that have some dev experience but also have extensive people skills and understand how to manage stakeholders.
>are there folks using the more technical bits of the systems that are business-oriented?
It varies, I started off as that business-oriented person (corporate finance) and eventually made my way over to the Data side of things and finally some programming work and now I'm running projects. While you won't get many analysts/portfolio managers doing dev work I do try and get them to have a hands on approach especially when doing QA and UAT work.
> Ultimately, you get what you pay for, all our full stack devs are making 6 figures... While it might not be FAANG money they also almost never work overtime and the stress levels are relatively low.
Yes, it's the blessing and the curse. They are lovely people. They have lives. They aren't working 24/7. But there is misalignment between senior folks who want to innovate and build internal tech around core IP and the talent level of the folks tasked with actually getting that done. Insurance is hardly unique in that regard, but it's an acute issue nonetheless. Firms like GS, JPM, etc. have fatter margins (I think) and can afford to pay devs/strats/etc.
Interesting to hear that you went from corporate finance to technical PM. Quite the journey. Would love to dig more into that if you're willing.
>Firms like GS, JPM, etc have fatter margins (I think) and can afford to pay devs/strats/etc.
From what I've seen on my end the Insurance firms have started paying somewhat of a premium to compensate for the lack of "excitement" that is associated with the insurance industry as a whole.
>Would love to dig more into that if you're willing.
I'd guess that you have proprietary (e.g. valuation / capital calculation) systems that need to interface with Trek in some way. Could you share how you've approached that at all?
Also, is there a reason R couldn't be added to Trek alongside Python and C#?
Certainly - let me try to share a succinct version germane to my day-to-day:
- We regularly perform group/segment level risk roll-ups. Involves running computationally expensive (by insurance standards) in-house and third-party models that estimate loss from hurricanes, earthquakes, etc. A lot of our insurance data is unsurprisingly stored in disparate systems that don't talk to each other, and in some cases, don't have any useful interface that someone like me can query/view. Things are changing, but still quite a lot of history to overcome
- We also have lots of stuff from outside underwriting parties in form of Excel, CSV, MDF files.
- We have to bring all that together to make sense of the portfolio, so we use Trek for the various tasks involved: running the models, processing CSV data, processing Excel data, attaching databases, creating dashboards in PowerBI (tangent: hate it)
- Sample pipeline: query portfolio data from one DB, read CSV file from another third-party, pull both into risk model, kick off analysis, then execute a script to pull together results in the model's databases or elsewhere.
Happy to expand or answer other q's as you have them.
As for your other comment about R - it's just a matter of the install. Someone has to install R so that Trek can use it. Not a major problem. I pointed this out as a contrast: our org is probably smaller and has many fewer devs compared to what I'm reading in the post, where "bank python" sort of feels like the platform in which everything happens / everything is configured.
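For readers curious what such a Trek-style job might look like in miniature, here is a toy sketch in plain Python. Everything here is an invented stand-in: the portfolio table, the third-party CSV layout, and the flat 2% "loss model" bear no relation to the real platform.

```python
import csv, io, sqlite3

def run_rollup(db, third_party_csv):
    """Toy roll-up job: combine portfolio DB data with a third-party CSV
    submission, then run a stand-in loss model over the merged exposures."""
    # step 1: query portfolio data from one database
    exposures = dict(db.execute("SELECT region, tiv FROM portfolio"))
    # step 2: fold in the third party's CSV submission
    for row in csv.DictReader(io.StringIO(third_party_csv)):
        exposures[row["region"]] = exposures.get(row["region"], 0.0) + float(row["tiv"])
    # step 3: "kick off the analysis" - here just a flat 2% loss estimate
    return {region: round(tiv * 0.02, 2) for region, tiv in exposures.items()}

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE portfolio (region TEXT, tiv REAL)")
db.execute("INSERT INTO portfolio VALUES ('FL', 1000.0)")
print(run_rollup(db, "region,tiv\nFL,500\nTX,200\n"))  # {'FL': 30.0, 'TX': 4.0}
```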
Thanks for a really comprehensive reply. Enjoyed your comment on PowerBI.
My background is mainly life which is dominated (at least in Europe) by computationally demanding proprietary liability modelling systems but I think Python / R is getting a foothold in capital calculation / aggregation.
My perception is that there is a lot more use of in-house models in the GI / property-casualty world, so more Python etc., but it sounds like you still have to interface with proprietary modelling systems.
Absolutely - and for quite a long time (I work mainly property).
There's not much (if any) appetite to completely rebuild 3rd-party geophysical vendor models. Those folks have 20+ years of work behind them and a different talent base (e.g. different types of scientists building the base models).
But we do focus on all the other stuff. Making data input easier/more accurate. Same thing re: output. Also the vast majority of our capital and group-level risk work happens in-house - R, Python, etc.
>read CSV file from another third-party
Generally, if you're getting a CSV file from a client, that indicates to me that there's a good chance they're exporting info from their systems and sending you the CSV.
Have you considered developing a client facing API that can be used to send and digest data?
We have, but that's not something my team (actuarial) can accomplish without help/blessing/oversight from IT.
I don't know all the details but we do have connections to some of our MGAs through XML dumps or perhaps real-time feeds (I'm doubtful). But that data is often missing some of the details as I need. It's useful for policy admin - not for all the other stuff.
We've also explored portals but those are fraught with concerns about double-entry.
Title should be ".. of investment bank python", trading and risk has little in common with a retail digital bank like say N26.
The problem with these projects is that the folks leading them have never built a real trading system in their entire lives (the ones who have been there for many years worked with end-of-day batch systems), and there is a layer of useless and incompetent "business analysts" who hide behind their incompetence by finding ways to malign developers.
Pro tip: don't work for a bank before assessing its open source repos. They have none? Run in the opposite direction.
> In order to deploy your app outside of Minerva you now need to know something about k8s, or Cloud Formation, or Terraform. This is a skillset so distinct from that of a normal programmer (let alone a financial modeller) that there is no overlap.
This rang a bell. How did deployment become such an arcane skill?
One of them is you don't own where your code runs anymore.
This might scare some people until they realize they can get high availability without waiting 3 months for some server to arrive, but it makes deployment harder.
I still believe the main reason people adopt CI/CD today is that "suddenly" deployment in a complex environment becomes easy and software gets tested. A lot.
> One of them is you don't own where your code runs anymore.
That's not new. The first money I earnt from computers was making websites. We didn't own the webservers; we rented webspace from some company. To deploy it, we literally just uploaded PHP files to a place using an FTP client. It was that simple and it worked.
1. Sysadmins had to find new careers after cloud providers destroyed their livelihood.
2. cloud providers try very hard to lock you in, by offering all sorts of advanced goodies. They tend to come with a learning curve, and they all accumulate. Sooner or later someone comes up with cross-provider solutions, and they too have learning curves.
3. inventing new ecosystems means creating new work for advocates and ninjas. You don't become a rockstar by diligently doing what has been done before, but by finding (or inventing) a niche and becoming a guru.
4. some problems are indeed hard to solve, and the more products try to solve them, the more complex they get.
5. everyone thinks they will have Facebook-scale problems, even when they never will.
Prior to industry wide "solutions" such as Docker, or rather linux namespaces and cgroups, there was no obvious process isolation, so "deployment" was copying tarballs or using Windows installers.
Also, investment banks want "support" from vendors, so hardware was either Windows servers (mostly for Exchange and AD), or Sun Solaris boxes.
So although linux cgroups came around 2006 (?) and namespaces in 2001 (?), banks didn't do too much with linux until after 2005 (when Redhat were providing the aforementioned 'support'). I don't think the 'industry' widely recognised the potential of cgroups and namespaces.
I can report from another bank (in the top 10 globally), that recently moved from a more bespoke system (not even on Python) to having Python+Notebooks+Labs available to all - using Apache products and a global Anaconda-like Python distribution. The fact that you can use the Python, R or whatever programming language seems to be a factor.
Would you mind telling us which Apache products? I've been thinking about pushing for that around my organization (mostly for replacing reporting and the like), but the I/O interfaces in particular are not that great if you are not living in a central database yet.
I imagine something container-like with the notebook/batch job (it can be anything really) hooking up to data sources such as SMB shares, allowing people who want to automate generating report Z to just request access to folder X for their job, and thus seamlessly create dashboards and so on even if a lot of the org is still using "traditional" workflows.
> This kind of Big Enterprise technology however takes away that basic agency of those Excel users, who no longer understand the business process they run and now has to negotiate with ludicrous technology dweebs for each software change. The previous pliability of the spreadsheets has been completely lost.
> Financiers are able to learn Python, and while they may never be amazing at it they can contribute to a much higher level and even make their own changes and get them deployed.
Coming from a slightly different part of the finance world (insurance) this rang very true.
I think there is a huge opportunity here to build on the Python ecosystem - which is gaining more and more ground - and provide much more powerful alternatives to Excel and legacy proprietary systems.
Also in insurance. I'm a big fan of using python to generate "read-only" pretty formatted workbooks. It makes the process more reproducible but people who "need my data in excel" still get that.
How are the Barbara databases synchronized, given that multiple nodes are mentioned? The description makes it sound like it's just a large set of pickles in something like a Berkeley DB?
Each server in a ring has a complete copy of the data for that ring. Each ring consists of a network of servers which may have nodes in different geographies. They're called rings, but are actually acyclic networks (IIRC).
Replication occurs automatically so you need to manage consistency in your app architecture. For example if you have instances of an app running in different geographies, the specific data for those instances should be in different folders.
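A toy model of that replication scheme, as I understand the description (the mechanics here are guesses for illustration, not Barbara's actual implementation):

```python
class Node:
    """Toy model of one server in a Barbara-style ring: every node keeps
    a full copy of the ring's data and writes fan out to all peers."""
    def __init__(self):
        self.store = {}
        self.peers = []

    def write(self, key, value):
        self.store[key] = value
        for peer in self.peers:       # naive synchronous replication
            peer.store[key] = value

ldn, nyc = Node(), Node()
ldn.peers, nyc.peers = [nyc], [ldn]

# Per-geography folders keep instance-specific data from clashing,
# since replication itself offers no consistency control
ldn.write("/Apps/Pricer/ldn/state", "last_run=09:00")
print(nyc.store["/Apps/Pricer/ldn/state"])  # visible on the NY node too
```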
Always funny to see the objections of new hires without finance experience to the use of floating point for pricing. It’s more related to the inherent inaccuracy of any pricing model though, rather than clients not caring about pennies.
Exactly. Floating point is inappropriate for anything related to accounting (payments, balances, etc.), but when the numbers you're producing are effectively forecasts or estimates, it's no different than what floats were originally invented for (numerical modelling in physics and engineering).
I remember someone demanding that we needed to run our Monte Carlo pricer on 1024 paths, that 256 just wasn’t precise enough and one of the risk guys said “Well since we know our assumptions are wrong I’m not sure what difference it makes.”
Sure we add up the numbers, but there are thresholds for everything anyway. We might not be concerned about something within a 1MM range let alone a floating point inaccuracy. The uncertainty is accounted for already.
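A quick illustration of both points in the thread: exact decimals exist for where they matter (accounting), while the float representation error is dwarfed by Monte Carlo sampling error, which shrinks only as sigma/sqrt(N) (the sigma value below is an assumed payoff standard deviation, purely for illustration):

```python
import math
from decimal import Decimal

# Accounting wants exact decimal arithmetic; Decimal gives it:
assert Decimal("0.10") + Decimal("0.20") == Decimal("0.30")

# Binary floats carry a ~1e-16 representation error...
print(0.1 + 0.2)  # 0.30000000000000004

# ...which is dwarfed by model uncertainty: Monte Carlo standard error
# shrinks only as sigma/sqrt(N), so quadrupling paths merely halves it
sigma = 5.0  # assumed standard deviation of one simulated payoff
for n in (256, 1024):
    print(n, "paths -> standard error", sigma / math.sqrt(n))
```

Which is the quantitative version of the risk guy's quip: going from 256 to 1024 paths buys one extra bit of precision against assumptions that are wrong in the first decimal place.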
This seems baroque but I quite like the idea, seems similar in spirit to having a shared LISP environment where you can live code and snapshot images with minimal hassle. I also see as an improvement over the usual Excel mess, in particular there’s some version control and automatic propagation of changes.
Really interesting read. From what's described - Walpole (distributed job runner), Dagger (DAG that recalculates when dependencies change), Barbara (global key-value store), and a monorepo with fast deployment - it's not so different from some big tech companies.
Completely off topic - but I love the aesthetic of the post. "Vanilla HTML" is a design that isn't used enough. It's something I tried to apply to my personal blog, but I think it's been done much better here.
I worked on modeling/mapping the market risk schema in Quartz a few years ago and used to wonder why they were "customizing" open source software/systems in house when they could just as well have supported those initiatives directly and publicly. As a C++ dev, I had already realized the world of software tooling had passed me by, but I still wondered at the (over)engineering of everything in Quartz, involving skills that were not transferable.
My view now is that the value of these in-house systems is essentially a cost-efficiency play on run costs, and there is very little revenue/growth opportunity for the business from these investments. With Volcker (really the best thing to happen in the '10s) and the loss of prop trading, all the market makers live off the spread, so while there is value in minimizing operational costs, it is not worth the investments that have been made.
Bottom line for me: investments within large firms in capital markets are unlikely to generate revenue/profits at scale. I'm sure there are some exceptions and would like to hear about them.
There's one smalltalk vendor whose main product is an object-oriented database that is also a smalltalk instance (and you use other smalltalks as IDEs to it).
I had the exact same thought! A custom IDE with all the source stored in a database is extremely smalltalk, although here the source is stored in a shared DB (it seems), rather than a per-user DB as it is with a regular smalltalk image.
Back in the day, Morgan Stanley & JPMorgan & …, used ENVY/Developer — fine-grained Smalltalk config-map / sub-application / class / method configuration & version control in a central database.
Can anyone confirm whether JP Morgan were able to decommission Kapital when they went to Athena?
I've seen so many cases in banks where the old system is still running years and years after it was 'replaced'. And Kapital was used in so many, different, parts of the business.
There's a lot here in common with the higher-prestige systems operated by major tech companies. Giant custom monorepos, sometimes with custom IDEs built into them. Big proprietary services for running asynchronous jobs and collecting logs and everything else. Data-driven frameworks for spinning up new services. Bespoke databases. All of it rings a bell.
The one thing in there that really jumped out at me as "oh my god never ever do that" was using pickle for serious persisted data. I can see the upside around letting people who aren't dedicated programmers avoid thinking about serialization but...daaamn. My understanding is that this locks you into a "the API is all the code" situation where you can never change the data layout of any class once you've written it?
It’s not just straight pickle (usually) - there’s a layer in between that allows at the very least for the handling of deprecated fields/new fields/renames etc.
But - after a change - you have to choose either to leave that ‘backward compatibility’ in-place (essentially forever), or put together a job to run (on that scheduler! Hah) to go re-write in your new format. If you care enough, you might - and then you can remove your logic to handle the old names/shapes.
The charm in a lot of it is in its simplicity. It doesn’t claim to very smart, but people get it - and are often remarkably productive.
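A minimal sketch of what such a compatibility shim might look like in plain pickle terms (the `Trade` class and the field names are invented for illustration; the real layer is presumably far more systematic):

```python
import pickle

class Trade:
    def __init__(self, notional):
        self.notional = notional

    # The kind of shim described above: migrate objects pickled before
    # the hypothetical rename of 'amount' to 'notional'
    def __setstate__(self, state):
        if "amount" in state:              # old field name
            state["notional"] = state.pop("amount")
        self.__dict__.update(state)

# simulate an object pickled under the old layout
old = Trade(0)
old.__dict__ = {"amount": 1_000_000}
blob = pickle.dumps(old)

restored = pickle.loads(blob)
print(restored.notional)  # the shim mapped the old field to the new one
```

Once every persisted blob has been re-written under the new layout (by that batch job on the scheduler), the shim can be deleted.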
Reminds me of when I interviewed for an internship at Fidessa in London back in 2011-ish. I remember the team lead talking about an in-house programming language they used called FidessaC which used a mixture of C and SQL syntax.
Seems like a lot of the banking world like to invent their own tech stacks.
That's actually an unfortunate side-effect of banks having weird requirements. When I was at GS we had this enormous source repo built on CVS. So we made improvements to CVS to try to make this more manageable. For example, because branching in CVS absolutely sucks, we had to use tags (rather than branches) to identify releases. This meant you ended up with lots of tags, and when you looked at them it was really hard to tell (visually) whether code had a particular tag. So we patched CVS to sort the tags alphabetically. We tried to upstream this but the CVS devs didn't want to know. So we had to maintain it.
Likewise a bunch of fixes to the timezone handling code that iirc glibc simply wouldn't upstream so we had to maintain even though they were bugfixes.
We did used to upstream everything we could and I think the situation is improving.
Most licenses just require your users to have access to the source code. As all the users are bank employees, this is usually easily achieved. If the license is violated it's only by accidental oversight.
Pretty much everything described is a Python library not a change in the Python interpreter so can be under a proprietary license.
As I understand it from a legal point of view the user in this case is the bank, not individual employees running it on the bank's behalf, and the bank already has the code so it's a non-issue.
I know some people think this is contrary to the spirit of open source, but it isn't. One of the goals of open source is so that users can customise the code to their specific use case, with no obligation to share. That's all the banks are doing. They have the same rights as any other user.
This. Even RMS has said many times that a company is a single user/owner of said code, and it doesn't matter who works on it as long as it doesn't leave the company. It's all explained in the GPL but the gist is, if the company only uses it internally/doesn't try to sell the code, they can do whatever TF they want.
Quite well. You are under no obligation to commit back your changes under any OSI license. The old Sun CDDL required it, and was denied OSI "open sourceness" as a result.
As others have mentioned, it's fine, even with GPL, as the licences only really kick in when they try to distribute the software. They are only really hurting themselves. When starting a private fork you force yourself to maintain it alone. That means either letting it rot (ie. it becomes insecure and obsolete with no new libraries supporting it) or keeping up with the mainstream yourself. Either way it's a lot of work that wouldn't be necessary if they upstreamed their changes. Maybe one day they'll get the message. You'd think that long-term investors would understand this concept better.
Are they distributing/selling the code? If the answer is no (and it pretty much always is) then there's no implications. Nothing in the GPL says you can't have a web front end to gather info that's then processed by your modified GPL code (which never leaves your possession) and the results spat back out to that web front-end.
Well, with GPL particularly but open source licenses more generally, the user is allowed to do whatever they want with the code. It is only when the code is redistributed that the source must be provided. With AGPL the source must be provided also to anyone using a service running AGPL licensed code.
So it plays perfectly with the licenses. This is the sort of thing free software was designed for, allowing everyone that uses a codebase to own it 100%.
Before Open Source, you hired a company that "did" software/computers. You knew jack shit about it and couldn't do anything about it other than "run commands".
After Open Source, you still did that, but you also occasionally hired people to download and cobble stuff together to save money, and maybe one or two people to write code to help cobble together stuff.
Over time this evolved into managing entire "technology divisions" of people writing code and cobbling together stuff, to manage larger and larger internal projects, to support teams, to build components, to eventually be used by one internal product used by a customer. 50 different teams to build components, and 1 team actually servicing a customer. And each component built is exactly the same as components built at other companies. Sometimes even exactly the same as other components in the same company.
Nowadays a Big Bank might produce more software than Facebook, and none of it ever escapes into the world as Open Source. Whole oceans of software are birthed, live and die in the shadows. Millions of lines of code that live for half a decade. Constantly manufacturing their own hammers because they believe theirs will work better than an existing hammer, or because they're too lazy to learn how to use an existing hammer. And never sharing their custom-built hammers with the rest of the world. All because some clueless executives believe this solves their business case better than buying something off the shelf and making it work for their business case.
Somehow reading this article really made me think of lisp and some old jane street and k lang articles here on hn over the years. I wonder if it’s the composability or the centralization?
I'm getting serious MUMPS flashbacks with a critical hospital-wide database accessible through only one programming language with absolutely no access control or rollback.
I worked for a hedge fund that built their own database on top of MongoDB. Data was serialised and stored as binary blobs. This bonkers implementation took away any advantage of using MongoDB. Unfortunately the system had a lot of political backing, and rather than addressing the shortcomings we were told to simplify datasets to ensure they could be stored in this bespoke database. I suspect a lot of trading signals were lost this way.
> I suspect a lot of trading signals were lost this way.
If you can prove it and show evidence, I'm sure they will (grudgingly) accept the need to improve.
Financial institutions are mostly risk-averse, "if it works, don't fucking fiddle with it!" They need something that will impact the bottom line in an (almost) direct way.
I remember working with a financial institution back in 2010. I pushed for virtualization by presenting scenarios where the main servers are impacted, we have no warm backup servers, and calculate the impact to bottom line using known MTTR (Mean Time To Recovery) values of similar scenarios (gathered from incidents all over the world).
Took several back-and-forth meetings with the BoD, but in the end they accepted the need to improve and allocated the funding + greenlight the project.
(That was also back in the day when Microsoft had just released Hyper-V, and when we asked Microsoft to join in the project, I talked to their head engineer, and they respectfully declined because "their internal testing shows that Hyper-V, at the moment, is unable to fulfill the required parameters". Ended up with XenServer.)
It is very hard to prove you have lost trading signal. You would have to redo your backtesting, and that would be politically fraught (high risk of getting sacked). Trading systems will often perform worse in the real world than in simulation anyway, so it can generally only be proven if there is a political willingness to do so.
> Investment banks have a one-way approach to open source software: (some of) it can come in, but none of it can go out
I'd say that's how they see society and their role in it more generally. Doing God's work is a surprisingly directed graph. It applies to the broader banking world too, but investment banks, being the most lucrative (when not bailed out), attract talented people who in principle could close the loop and return something important back (much more so than the "sleepy" commercial banks or credit unions).
It would be interesting if the above (potentially biased) view could be backed up by computing an open-source leech ratio per industry sector: the amount of open-source code used versus the amount contributed.
NB: a high leech ratio does not necessarily make you the worst offender. If your business model is evil no amount of open source contribution will wash it out
Not entirely true. Two things to consider:
1. Public sentiment. When Goldman Sachs open sourced their collections library on GitHub, it gained marginal traction (opinion: it seemed more about PR to attract tech talent). When it was adopted by the Eclipse Foundation, usage rose by a non-trivial amount (based on usage stats on mvnrepository from projects that aren’t other Eclipse projects).
2. There was a massive hiring frenzy, and due diligence regarding IP was non-existent. Garden leave doesn’t compensate for ‘strategic’ systems. Apart from “competitive advantage”, when you have someone as tenured as he who shall not be named, you mitigate the risk of being sued by not making it obvious you’re cloning a system developed for another firm.
Bulge brackets are more risk-averse than smaller firms like hedge funds. Today we have the LMAX Disruptor and Real Logic’s Aeron (the basis for Akka Remote) thanks to their liberal policies towards open source.
I have been a developer for eight years and yet I am still surprised by the places where Python turns up. It is my favorite language, but in the communities I gravitate towards (basically communities like HN) it has so many detractors, for being dynamically typed, not being functional enough, being slow, etc., that sometimes I'm tempted to think it's actually a guilty pleasure of mine and that I should look for better pastures.
Then I read articles like this and remember why I like it: it gets the job done, and quickly (for the developers, at least). That's why it's so widely used and keeps climbing. There's nothing wrong with learning other languages, and I do try to keep up, but Python will remain my go-to for the time being.
People who make complaints like that are privileging their own personal aesthetics over pragmatism.
Same mistake as the people who keep talking about Perl being "dead" while deploying their production platforms on Debian or Red Hat based systems, ignoring the fact that the packaging and release QA work for those distros depends substantially on Perl projects actively maintained by the distros in question.
Sounds like someone else is putting their personal aesthetics over pragmatism. Perl is a dead language walking, the fact that there are some tools it hasn't been worth rewriting doesn't contradict that.
I'm talking about actively chosen new development because it's still the dynamic language most oriented towards being comfortable as part of a unix environment rather than simply running on top of one.
Modern async/await + heavily OO based perl is not, I suspect, the language that you're thinking of when you made your comment.
This storage of code and data in the same oddball data structure, and the "walled ecosystem", reminds me of Forth. Forth had great access to its hardware, but its "screens" and file structures were all unique to it... and extremely performant.
Lots of "table-driven" systems live in bank-land, and this Python system sounds like the natural evolution of them...
By virtue of Turing completeness there's nothing you can do in Excel that you can't do in a program. It's all a matter of speed.
Having seen Excel wizards work their magic before, the dizzying ways they can slice and dice their data with the help of a combination of GUI affordances, formulas, and hot keys is truly astounding. Often times a person could build out a full set of data and charts in half an hour that might be something like > 100 lines of equivalent Python/Pandas code.
And crucially, the report would often have fewer bugs than the equivalent code, because the analyst could see all the data in front of them as they were manipulating it and would naturally do spot checks along the way.
Now note the "some" in that statement. A Python/Pandas master could also probably whip up the equivalent > 100 lines in half an hour. But it was really astounding just how fast Excel experts worked.
Yep. The peculiarities that make this Excel workflow unreasonably effective are pretty easy to identify:
1. Tabular data. There's some tricks with named ranges etc, but for the most part your entire application state is spread out before you, scrollable in every direction. It's just tables, and clicking into a cell highlights relationships (data dependency) between cells.
2. Visible data, hidden code. Formulas are hidden behind their result; the most obvious thing is to treat them as tiny black boxes, each applying a single data transformation (or a small set of logically related transforms), and immediately see the result applied to your data set. This is a tiny bit like functional programming, and a tiny bit like Pandas or Spark (immutable data, lazy evaluation). Except unlike those worlds, Excel pushes the data front and center, not the code.
And probably a bunch more I'm forgetting. It doesn't even feel like programming; it's more like building data pipelines in Unix, except you can easily preview the data at each step of the pipeline. What I really want is Excel or Siag, but with Python, SQLite, and a nice spreadsheet UI.
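That "preview the data at each step" workflow can be approximated today with Python's stdlib sqlite3. A hedged sketch with invented sales data (the table, columns, and thresholds are all made up for illustration, not taken from anyone's actual workbook):

```python
import sqlite3

# Toy rows standing in for a spreadsheet tab.
rows = [("north", 120), ("south", 80), ("north", 200), ("east", 50)]

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
con.executemany("INSERT INTO sales VALUES (?, ?)", rows)

# Step 1: filter, then (Excel-style) peek at the intermediate result.
step1 = con.execute("SELECT * FROM sales WHERE amount > 60").fetchall()
print(step1)

# Step 2: aggregate, like dragging a SUMIF down a column, and peek again.
step2 = con.execute(
    "SELECT region, SUM(amount) FROM sales "
    "WHERE amount > 60 GROUP BY region ORDER BY region"
).fetchall()
print(step2)  # [('north', 320), ('south', 80)]
```

What's still missing relative to Excel is exactly what the list above identifies: the intermediate data is only visible because we explicitly print it, rather than being spread out in front of us by default.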
You've built this brilliant report because you're an Excel wiz, but because of that you've gotten the attention of someone up the chain, and now you need to produce it every day (or multiple times a day). Automating all your shortcuts, hotkeys, and UI clicks with macros becomes a horrifying kludge, whereas investing earlier in something more automation-oriented would have produced a more resilient, repeatable solution.
> Often times a person could build out a full set of data and charts in half an hour that might be something like > 100 lines of equivalent Python/Pandas code.
But only ten lines of R! (Excel is kinda awesome though).
WYSIWYG, in every sense of the expression. There's no written abstraction. You just see the data (and depending on how things are structured, sometimes the intermediate steps getting that data from point A to point B).
With code, you type the abstractions while imagining the data in your head, and then check whether the final result is the right one. The average user will usually litter the code with debug print statements to help understand what's going on live.
With Excel, you live in the runtime and stare at the results of your "code" all the time. The abstractions and the links between steps of the process either live in your head (because you know how that financial model works) or are buried in formulas in the most scattered of layouts (cell A1 may be the result of a calculation in cell Z99, for instance).