Hacker News
An oral history of Bank Python (calpaterson.com)
864 points by todsacerdoti 23 days ago | 325 comments



I was the person who first deployed Python at Goldman Sachs. At the time it was an "unapproved technology", but the partner in charge of my division called me and said (this is literally word for word, because partners didn't call me every day, so I remember):

    Err....hi Sean, It's Armen.  Uhh.... So I heard you like python.... well if someone was to uhh.... install python on the train... they probably wouldn't be fired.  Ok bye.
"Installing python on the train" meant pushing it out to all the computers globally in the securities division that did risk and pricing. Within 30 minutes every computer in Goldman Sachs's securities division had Python, and I was the guy responsible for keeping the canonical Python distribution up to date with the right set of modules etc. on Linux, Solaris and Windows.

Because it was unapproved technology I had a stream of people from technology coming to my desk to say I shouldn't have done it. I redirected them to Armen (who was a very important dude that everyone was frightened of).

The core engineers from GS went on to build the Athena system at JP and the Quartz platform at BAML.

//Edit for grammar


Armen was very passionate about the value of "strats" (GS's own term for "quants", and later broadened to include software engineers and data scientists).

A favorite quip of his:

  At GS, I'm like an arms dealer. When a desk has a problem, I send in the strats, and they blow away all the competition!

Also, SecDB's core idea is not just tight integration between the backend and development environment, but that all objects ("Security" in SecDB lingo) were functionally reactive.

For example, you would write a pure function that defines:

  Price(Security) = F(stock price, interest rate, ...)
When the market inputs changed, Price(Security) would automatically update (the framework handled the fiddly bits of caching intermediate values for you, so even an expensive Price function is not problematic).

This is loosely the same idea that drives React, ObservableHQ, Kafka, and other event-streaming architectures, but I first encountered this ~15 years ago at a bank.
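The caching/invalidation behavior described above can be sketched in a few lines of Python. This is a toy illustration, not SecDB's actual API; every name here is invented:

```python
class Node:
    """A toy dependency-graph node: either a market input or a pure function of other nodes."""

    def __init__(self, fn=None, value=None):
        self.fn = fn              # None for market inputs
        self._value = value
        self.deps = []            # nodes this node reads
        self.dependents = []      # nodes that read this node
        self._dirty = fn is not None

    def set(self, value):
        # changing an input invalidates everything downstream of it
        self._value = value
        self._invalidate_dependents()

    def _invalidate_dependents(self):
        for d in self.dependents:
            if not d._dirty:
                d._dirty = True
                d._invalidate_dependents()

    def get(self):
        # recompute only if an input changed; otherwise serve the cached value
        if self._dirty:
            self._value = self.fn(*(d.get() for d in self.deps))
            self._dirty = False
        return self._value


def derived(fn, *deps):
    node = Node(fn=fn)
    node.deps = list(deps)
    for d in deps:
        d.dependents.append(node)
    return node


# Price(Security) = F(stock price, interest rate)
stock_price = Node(value=100.0)
interest_rate = Node(value=0.02)
price = derived(lambda s, r: s * (1 + r), stock_price, interest_rate)

price.get()              # computed once, then cached
stock_price.set(110.0)   # marks `price` dirty
price.get()              # recomputed from the new input
```

The real systems of course add persistence, distribution and access control on top; the point is just that expensive intermediate values stay cached until an upstream input actually changes.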


It's as old as VisiCalc; it's how spreadsheets work.

I built a similarly reactive system for web UI binding back in 2004, running binding expressions on the back end with cached UI state to compute the minimal update to return to the front end, in the form of attributes to set on named elements.


Yes, although doing it in distributed fashion at the scale of SecDB or Athena introduces quite a few more complexities.


This immediately struck me when I was reading this article.

To be honest, this whole paradigm seems absurdly fucking efficient for the developers. But I wonder about stuff like

* What happens if the data model needs to change? If you need to move something from db["some/path"]?

* How is it coordinated at a larger scale, how does everyone know what is running and how it interacts with everything else - can you figure out what depends on an object? What if the data used by your Price(Security) object changes and breaks it?


> What happens if the data model needs to change?

You write conversions and there's a registry where you register them to be picked up by the unpickler. If necessary you can also customize the logic that determines which version a given pickled datum uses to deserialize. There aren't so many guardrails when you're writing that stuff, but the infrastructure does its best to support you.
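A conversion registry like the one described might look roughly like this. This is a hypothetical sketch, not the bank's actual API: the idea is that each pickle records a schema version, and registered converters upgrade old data on load.

```python
import pickle

_CONVERTERS = {}  # (type_name, from_version) -> upgrade function


def register_converter(type_name, from_version):
    def decorator(fn):
        _CONVERTERS[(type_name, from_version)] = fn
        return fn
    return decorator


class Trade:
    SCHEMA_VERSION = 2

    def __init__(self, notional, currency):
        self.notional = notional
        self.currency = currency

    def __getstate__(self):
        # stamp every pickle with the schema version it was written at
        return {"_version": self.SCHEMA_VERSION, **self.__dict__}

    def __setstate__(self, state):
        version = state.pop("_version", 1)
        # run registered converters until the state reaches the current version
        while version < self.SCHEMA_VERSION:
            state = _CONVERTERS[("Trade", version)](state)
            version += 1
        self.__dict__.update(state)


@register_converter("Trade", 1)
def _trade_v1_to_v2(state):
    # v1 pickles had no currency field; default old data to USD
    state.setdefault("currency", "USD")
    return state


# Loading v1-shaped state runs the converter transparently:
old_trade = Trade.__new__(Trade)
old_trade.__setstate__({"_version": 1, "notional": 500})
print(old_trade.currency)  # "USD"
```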

> If you need to move something from db["some/path"]?

There's support for both symlinks (db["some/path"] -> db["other/path"]) and for a kind of hardlink by making both paths point to the same inode-like id. You can usually find a way to do what you need to.
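The path-indirection scheme can be illustrated with a toy dict-backed store (invented names, not the real system): paths resolve to object ids, a hardlink makes two paths share one id, and a symlink redirects one path to another.

```python
class ObjectDB:
    """Toy object store with symlink and hardlink support on paths."""

    def __init__(self):
        self._objects = {}   # object_id -> value
        self._paths = {}     # path -> ("id", object_id) or ("link", other_path)
        self._next_id = 0

    def __setitem__(self, path, value):
        oid = self._next_id
        self._next_id += 1
        self._objects[oid] = value
        self._paths[path] = ("id", oid)

    def __getitem__(self, path):
        kind, target = self._paths[path]
        if kind == "link":             # symlink: resolve the other path
            return self[target]
        return self._objects[target]

    def symlink(self, path, target):
        self._paths[path] = ("link", target)

    def hardlink(self, path, existing):
        # both paths now point at the same underlying object id
        self._paths[path] = self._paths[existing]


db = ObjectDB()
db["other/path"] = {"kind": "bond"}
db.symlink("some/path", "other/path")
db.hardlink("hard/path", "other/path")
assert db["some/path"] is db["other/path"] is db["hard/path"]
```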

> How is it coordinated at a larger scale, how does everyone know what is running and how it interacts with everything else - can you figure out what depends on an object? What if the data used by your Price(Security) object changes and breaks it?

There's a common model for the things that are shared, and that has a versioning and release/deprecation cycle. Otherwise every type has an owner and you probably had to request their permissions to read their data, so you should have a channel of communication with them. But yeah people do rely on the fundamental business entities not changing too quickly, and things do break when changes are made.


There's also a graph debugger that allows you to step through the dependency graph node-by-node across the various globally distributed databases.


True but not really helpful for this problem, because it can only tell you about the job you're debugging, whereas what you want to know is what code might ever depend on that data.


Thanks! Interesting stuff.


> This is loosely the same idea that drives React, ObservableHQ, Kafka, and other event-streaming architectures, but I first encountered this ~15 years ago at a bank.

See also the "observer pattern" [0]. It's a fun exercise to implement a reactive system in Python using the descriptor protocol [1]. IPython's traitlets library is an example of this in the wild [2].

[0]: https://en.wikipedia.org/wiki/Observer_pattern

[1]: https://docs.python.org/3/howto/descriptor.html

[2]: https://github.com/ipython/traitlets
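A minimal version of that exercise, loosely in the spirit of traitlets (illustrative only; `Observable` and `observe` are invented names):

```python
class Observable:
    """Data descriptor that notifies registered callbacks on assignment."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        return obj.__dict__.get(self.name)

    def __set__(self, obj, value):
        old = obj.__dict__.get(self.name)
        obj.__dict__[self.name] = value
        # fire every callback registered for this attribute
        for cb in obj.__dict__.get("_observers", {}).get(self.name, []):
            cb(old, value)


class Security:
    price = Observable()

    def observe(self, attr, callback):
        self.__dict__.setdefault("_observers", {}).setdefault(attr, []).append(callback)


sec = Security()
changes = []
sec.observe("price", lambda old, new: changes.append((old, new)))
sec.price = 101.5
sec.price = 102.0
# changes == [(None, 101.5), (101.5, 102.0)]
```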


Thanks for the links. An eye-opener for me, despite using Python for several years.


Thank you so much for this context! When I started at Lehman in 2007 and for years through the i-banks & hedge-funds, 'Sec-db' was the North Star for so much me and my friends built. It's amazing to hear from the folk who brought that to life!

Took many more years for us to understand that we had learned more about banking and making money than our masters had learned about the potential uses of the platforms we had built.

We started Sandbox Banking. Many of our friends are at hedge-funds :-(. What's the career paths of those that first built sec-db?


I worked on Quartz for 3 years and loved it. Some devs grumbled about various aspects of it, but I come from an application support background and taught myself python, so I suppose I had fewer developer habits to un-learn.

From what I understand, all this started with SecDB at Goldman, which was the prototype for all these systems but wasn't Python based. The lore is that SecDB was instrumental in Goldman being able to rapidly figure out what their exposure was during the 2008 crisis.

Some of that team, led by Kirat Singh, went on to start Athena at JP Morgan and then Quartz. I met Kirat once; he was considered a bit of a rock star in the bank tech community. He now runs Beacon, which is basically Bank Python as a service.


I work at Beacon.io and it's an awesome place to be. Kirat is indeed a rockstar and it's awesome to work with a CEO that knows great code. We also landed a Series C last month and we're growing :)

https://www.crunchbase.com/funding_round/beacon-series-c--60...

We've also got a bunch of positions open too for those that are interested in joining!

https://www.beacon.io/careers/


As a full stack quant dev who is struggling to go all-in on Beacon: Do people generally like to use Glint for complex applications? Are benefits of Beacon lost when interfacing from e.g. Angular? I'm afraid that professional frontend devs might be unwilling to work with a proprietary framework, but that's speculation from my side.


Glint is an integrated framework with the platform but it does not limit you to just using the framework. The platform is designed to be as extensible as possible, worry not about being locked in :)


I took a quick look, it seems like all the postings are London or New York. What's the feeling internally about remote hires? I'm assuming that's still out of fashion in finance and Beacon feels the same?


Because of Covid I've been remote since last year. It's still an evolving situation. But things have worked out pretty well.

Our CEO was also remote during that time too and here's him giving a webinar from a cabin :)

https://www.youtube.com/watch?v=fXPDXbrPdxI&ab_channel=Beaco....


s/rockstar/primadonna

Build a complex system; when things start to get messy, move somewhere else. Rinse, repeat.


I know Kirat really well. Fun fact, one two-week dev cycle we had 667 distinct developers commit to the secdb code base which Kirat's boss described to me as "The number of the beast.... plus Kirat"

Second fun thing. Kirat was advocating for lisp for secdb for a long time and used to rag on me for liking python when it's so slow.


Interesting to be reading all this about SecDB. About 15 years ago I was offered a job working on SecDB (I forget exactly what the position was now). It and Slang sounded really interesting.

I do sometimes regret not taking the job because the people there were wickedly sharp and the tech sounded great, but in hindsight I'm not sure I would have thrived in a bank long term. I did a 3-month internship at Lehman's which I enjoyed, but I don't think I'd have suited a career in it. One thing I did get out of it was a total lack of fear around job interviews: if I could survive the 14 hours of interviews at GS and come out with an offer, then I can handle pretty much any recruitment process :)


It's amazing how a few people have left such a big mark on a part of the investment banking industry. I missed Kirat right before exiting BAML but met all his "disciples" and Dubno..including his miniature dinosaur and telescope in his office :). Very much felt like tech religion where no open debate on merits and drawbacks could be discussed. And a lot has changed in terms of engineering innovation with turnover since that era...


Why is the number of people who "left a big mark" so small?

1. An organization/industry can adopt each new technology only once. New technologies arise infrequently. Each time they arise, only a few people get to work on the projects introducing them. In other words, opportunities to leave a big mark are limited.

2. Credit for innovation is political capital. People hoard political capital and become powerful. They act as gatekeepers of innovation and take the credit for successful projects.


Had a lot of fun pulling things out of MetaDir and recreating what seemed like "early history" of SecDB when I was there, was a lot of fun :D


the spirit of dubno lives on...


But does he? There seem to be a lot of people in the know on this thread. He seems to have disappeared after leaving BofA.


deep sea diving was last i heard: https://www.youtube.com/watch?v=KKvJrCvIvOc and before that he was featured in a wsj article: https://www.wsj.com/articles/goldman-sachs-has-started-givin...


> From what I understand, all this started with SecDB at Goldman, which was a the prototype for all these systems but wasn't Python based.

Correct. SecDB has its own language, Slang, which you could probably best describe as a "lispy Smalltalk, with a hefty sprinkling of Frankenstein". The concept of SecDB was so good that other large banks wanted to get their own. Athena and Quartz have been mentioned several times in this thread, by people far more knowledgeable than I could ever be.

It's not just banking, I know of at least one large pension/insurance company who are building their own version of SecDB, with direct lines to GS. (They don't try to hide it, btw: the company is Rothesay Life.) The last time I talked with them, they were looking for Rust devs.

Disclosure: I work at Beacon.


> From what I understand, all this started with SecDB at Goldman, which was a the prototype for all these systems but wasn't Python based. The lore is that SecDB was instrumental in Goldman being able to rapidly figure out what their exposure was during the 2008 crisis.

Correct. We used python for a bunch of infrastructure stuff (eg distributing all of secdb to all of the places it needed to go). The actual pricing and risk was written in Slang with a lot of guis that were "written" in slang but actually that caused autogeneration of JIT bytecode that was executed by a JVM. Most of the heavy lifting behind the scenes was C++. So a bit of everything.


My grandpappy always told me to cut out the middleman. Modern C++ was heavily influenced by the need to make it simple to use directly. If you are in the business of writing code instead of reminiscing, you can now leverage move semantics, lambdas, and smarter pointers to create software that is close to the silicon. Python might be great, but it sure is slow. Its success is founded on smart people making it easier for not so smart people to call C++ that does the heavy lifting.


A big force multiplier in the old GS secdb model was simply the speed of the dev cycle vs speed of the code. As a strat you could push slang changes to pricing and risk literally in minutes with full testing, backout, object audit logging etc.

C++ changes went out in a 2-week release cycle, so changes were still fast by most standards but much slower. But yeah, we had 20M+ lines of C++ code, so it was extensively used.


For some context (as an ex-Goldman employee myself), "Armen" in the quote is most probably https://www.goldmansachs.com/insights/outlook/bios/armen-ava... , who has quite a legendary reputation within the firm for the work he's done. He was also one of the first to be hired as a "strat", which used to be how Goldman referred to its quants who sat between front office and tech systems and worked with both sides.


SecDB/Slang originated around 1992 at the commodity trading shop J. Aron, which GS had bought in 1981. Later (end of the '90s) there was a push to extend it to the rest of the firm, first fixed income, then equities. Armen flew the whole worldwide strat team to NY and gave a presentation, and to drive the point home, he played a clip from Star Trek: "You will be assimilated. Resistance is futile!"


> Prior to joining the firm, Armen was a member of the technical staff at Bell Laboratories in Murray Hill.

Wow. Bell Labs too.


I worked on Athena at JPMorgan for 8 years, and loved it.

Seeing Python at the core of trading, risk and post-trade processing for Commodities, FX, Credit etc was such a great developer experience.

By the time I left JPM, there were 4500 devs globally making 20k commits weekly into the Athena codebase. (I did a PyData presentation on this [1] for more details).

The one downside was the delayed transition from Py2.7 to 3; I left just as that was getting underway.

[1] https://www.youtube.com/watch?v=ZYD9yyMh9Hk


That's funny they mentioned replayable financial message queues. Those are a hit here


There are a number of workplaces where I'd have been willing to rely on "probably wouldn't be fired", but a bank is definitely not one of them. Congratulations on shipping something useful in the face of that risk and uncertainty.


The trading community walk on a knife edge all the time, it's not a place for the faint of heart. I used to support derivatives trading systems and a few times there were issues that meant they'd lost control of orders on an exchange. Scary stuff. It requires a crazy mixture of careful, deliberate, calculated risk control on the one hand; but once you commit to something you jump in with both feet and throw everything into it.

You need to be both meticulously risk averse, and also willing to do whatever needs to be done when it needs doing, and accept responsibility. It was great!


I worked at UBS on the G10 FX options desks in Stamford and Singapore. I remember being very surprised by how incredibly stressful my interview and training were. It was very intentional as well; my trainer knew the exact reactions he was creating with his behavior.

It was only a couple of weeks in, when I had to react within 60 seconds (USDJPY option expiry, NYC cut) on a position our front book would have lost MM on; with senior sales MDs screaming at me as well. Lo and behold, I was just used to it and could focus and execute based on my training.

My wife calls my thinking on the training - Stockholm syndrome. I still believe those skills were incredibly valuable for me, just perhaps delivered in a more 2021 acceptable approach.


An excellent description.


> but a bank is definitely not one of them

Investment banks are basically risk-management shops. The partner made an assessment and evaluated the potential benefits as higher than risks. Note the word "probably".


Also worth mentioning that "unapproved software" on bank infrastructure is what an aggressive prosecutor would call "felony bank hacking".


Almost definitely false. Provided you weren't doing anything intentionally malicious with it, the risk would be that regulators might fine the bank for inadequate controls. As such, the bank might fire you for doing something that could lead to such a situation, but I don't see a criminal charge. There was actually quite a decent bit of "unapproved" software in use at one of the banks I worked in - mostly stuff that was in the process for approval, but that could take forever, so it was reasonably common for teams to run through the checks themselves (security scan, license review, etc) and move forward while the official review confirmed no issues.


Well the login message I was greeted with on every ssh connection certainly threatened criminal prosecution for unapproved software at the extremely large bank I worked at.

Unlikely? Sure. But a lawyer somewhere thought it was worth reminding me 10x/day, so going to assume it's possible provided your unauthorized software caused a serious monetary loss.


Investment banks at that time were a little bit different.


I joined the BAML grad scheme 10 or so years ago. We had a presentation from one of the Quartz guys and someone asked how they’d manage upgrading the version of Python. They were using something like 2.6.5. The whole move to 3.x was a thing. The Quartz guy just flat out said they wouldn’t upgrade.

Seemed crazy to a new grad back then, but now I wouldn’t want to consider it either.

Thanks for your contribution! It was amazing that even in my role where I didn’t use Quartz, I could see and search all the code. Felt quite novel back then.


BAML(Quartz) and JPM(Athena) both had Python 3 migrations well underway as of PyCon UK 2019.

It took me more than halfway down the article to realize I have not worked at the same bank as the author...


A friend I've known since I first started on Wall Street now rides herd on the BofA Quartz libraries. One component of his job is to make developers aware of existing libraries they can use to solve business problems instead of reinventing the wheel. His theory of why they always have excuses not to do that is that they have no training in software development. They are still at the point in their learning where they are just excited that they can press keys on a computer and get it to do things they are barely able to understand.


Hello Sean,

I remember Slang when I first saw the code, a parse-tree-based evaluator, in 1997. Come on folks. Separate parsing from evaluation. Opaque types with message passing. Inference, anyone? Clearly no one read Hindley-Milner.

Add parse time optimizations, add locals, hey globals and locals weren't handled properly. Python in the 90s anyone?

Shitty KV store with 32K object limits related to some random OS/2 B-tree limits. Add huge pages.

Deal with random rubbish related to inconsistent non transactional indices.

Figure out you should layer nodes in a dag. A dag is topologically 2 dimensions, fairly limiting.

Figure out that's somewhat similar to layering databases, it's just another dimension.

Hmm bottom up evaluators, you actually need top down as well, create a turing complete specification. well, limit it a bit.

Ah KLL points out that layers on top of dimensions end up being combinatorial, but you can actually cache the meta traversal and it's small n.

Lots of people point out category theory parallels. Haskell is pretty but completely unusable. I'm a math guy, and it's still unusable, I don't like feeling smart when I do simple things.

But interestingly creating imperative functions with pure input/outputs with implied function calls is pretty interesting. You can create an OOP paradigm with self and args as known roots aka linda tuple spaces.

Ah and each tuple space point can be scheduled independently, some issues with serialization and locality...

Go to another bank and choose to use python, foolishly decide to rely on pickle. Do that twice. Bad idea.

But write a much better petabyte-scale multi-master optimistic-locking database with 4 operations: Insert, Upsert, Update, Delete. WAL as a first-class API.

Finally decide that writing a coding scheme to convert python objects to json is not really hard. And of course cloud native and cloud agnostic is really the only way to go nowadays.

I'm always confused why people complain about Athena/Quartz, hell we wrote it all, fix it if you don't like it. Open source it if you want other people to contribute. If we made stupid decisions on pickling data, well there's a version id, add json serialization, it's not hard, don't take things as given.
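For what "WAL as a first-class API" might mean in practice, here is a hedged toy sketch (all names invented, nothing to do with the actual system): every mutation appends to a write-ahead log before being applied, and the log itself is exposed so consumers can replay or tail it.

```python
class WalStore:
    """Toy KV store with the four operations and an exposed write-ahead log."""

    def __init__(self):
        self._wal = []   # append-only list of log records
        self._data = {}

    def _log(self, op, key, value=None):
        # log first, then apply: the WAL is the source of truth
        record = {"seq": len(self._wal), "op": op, "key": key, "value": value}
        self._wal.append(record)
        return record

    def insert(self, key, value):
        if key in self._data:
            raise KeyError(f"{key} already exists")
        self._log("insert", key, value)
        self._data[key] = value

    def upsert(self, key, value):
        self._log("upsert", key, value)
        self._data[key] = value

    def update(self, key, value):
        if key not in self._data:
            raise KeyError(key)
        self._log("update", key, value)
        self._data[key] = value

    def delete(self, key):
        if key not in self._data:
            raise KeyError(key)
        self._log("delete", key)
        del self._data[key]

    def replay(self, since=0):
        """First-class access to the log: tail it or rebuild state from any point."""
        return [r for r in self._wal if r["seq"] >= since]
```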


> foolishly decide to rely on pickle.

This scared me a bit TBH. It’s one of those decisions that come back to bite us repeatedly.


Oh wow, I remember Quartz at BAML… Though this was several years after initial deployment and when core devs left.

One day I will sit down and write a small poem about the insanity of software development based on my experience with Quartz. It will be an intriguing story of love and hate being told through a sensual dance between sales and engineering. The battles will not be epic but the consequences of one’s actions will be far reaching.

It was indeed an experience worth having.


Tell me more about what worked and didn't... I recall the pain of watching QzDesktop load and Bob/HUGS jobs failing... but what else, and what did you enjoy?


All great stories use mundane circumstances in life to convey much deeper and abstract ideas.

One such story, I humbly hope, would be my poem.

Yes the b00bs were all over the place as they are in real life, and in my opinion how they got there is as interesting as real life.

Original people solving the original problem had a good understanding of what they are dealing with.

However with subsequent generations it became a monkey problem https://m.youtube.com/watch?v=5QuwPeH9P7Y

When all you have is a DAG, all solutions end up with the salesperson-coming-at-your-door problem. But years down the line we didn't have a traveling-salesman problem but an out-of-date-libraries problem.

The monkeys tried to make their risk taking safer by introducing all kinds of random constraints, not realising that what gave power to the whole idea was actually the risk taking.

And no, they didn't try to manage risk; they completely failed to understand that risk is what made the product good in the first place.

And this is how every great idea in human history fell apart: the original people had a different view of a current problem, but the following generations only understood the simplified problem, and down the line the solution to the original problem became the actual problem.

I still believe the poem would do greater justice to the whole idea, but what I tried to explain here is that the simple things we have all witnessed actually hide much deeper truths about life in general.

And from there I assume QzDesktop loading times were not the issue at the beginning of this deterministic chaotic system, but simply one of the possible generational products.

I am yet to understand how to solve the monkey issue.


Curious why this high-powered person would go to bat for a technology decision they didn't seem to have done any risk assessment of? Wouldn't he be liable if something was exploited and hurt the company, like his head would be on the chopping block for giving the go-ahead when they traced it up the chain of command?


That's precisely why he did it the way he did. He had total deniability. Here's how the conversation would go if it went wrong somehow:

"Armen, did you tell Sean to install python?"

"No".

"Sean, did Armen tell you to install python?"

"Err... no. He said I probably wouldn't be fired."

"Well it turns out he's not right about everything. Here's a cardboard box for your things."


This is only my opinion, but I think the reason Armen said it like he did was because by not making it an order he's giving Sean the option of not doing it, if he's not up for accepting the risk. However the risk was both of them could have got fired.

Armen must have known people would know Python had been put on these machines and that he authorised it, in fact what's the point of putting it on them if nobody knows and nobody uses it? I can guarantee you that within 24 hours someone was asking Armen why he'd authorised this and was justifying it. There cannot have been any possibility of dodging responsibility for this decision. If anyone got fired it would have been Armen, with a possibility of Sean going as collateral damage.

This is the big league. You make your decisions and you accept responsibility for them.


Exactly, so what is this Armen character getting out of this other than a potentially big amount of liability and unarticulated risk.

The OP said he told people openly that Armen told him he could do it when asked.

This makes no sense to me, what’s the upside to Armen? If he is business savvy, he needs to be gaining something in exchange for having his name thrown around by OP as signing off on this.


Guess the Python was useful? From an FT article:

> In 2011 Goldman Sachs put its top computer wizard, Armen Avanessians, in charge of the division. He has helped turn round its fortunes. The arm's assets under management reached a nadir of $38bn in 2012, but it now manages $91.8bn...


Wow hope OP got a chunk of that!


It's a bank, so probably his reward for good performance was 10k at Christmas.


Hahahah 10k maybe if he was a new intern. Off by an order of magnitude


You may have dropped a couple of zeroes there.


Maybe that bank is more generous. The one I worked at begrudgingly counted out the pennies like it was coming out of the war orphans fund or something.


He's doing his job, which is to ensure people have the tools and resources available to do their jobs. You know, furthering the goals of the organisation.


Money or less risk of losing money. Partners originate deals and/or manage risk. He was looking for accelerated process to make informed decisions.


Nah, I don't think that's why he did it.

You say he reached out to you and asked if you liked Python. He probably wanted to roll out Python and was looking for someone who wanted to do it. If he told someone to do something they weren't passionate about, they would fail. He wanted to make sure it succeeded, so he reached out to you.

If he's such a bigshot and everyone was frightened of him, he must not have been afraid of them. When he said, "you wouldn't get fired," he probably meant what he said. He was giving you air cover. And it worked. When the gnomes came out after you, you just sent them to him. And they didn't bother you again.

I can imagine how the conversation went:

"Armen, did you tell Sean to install Python"

"No, did he?"

"Yes he did!"

"Great!!"

Now the gnomes are on their back foot and have to defend why Sean shouldn't install Python. If this guy Armen told you to do it, Armen has to defend himself to them.


Hehe. Well I guess you would have a unique insight into his thought process. ;-) But yes indeed that's certainly another explanation and it did indeed work that way.


Air cover is crucial, but you can't take land without good boots on the ground ;)


Well thanks for the air cover, and for all the other opportunities you provided for me and others at GS. I really appreciate it. It was an amazing time and I learned a great deal.


Ah, now I finally know what it means when I get job ads for Python+Quartz ...!

I could tell from the context it was nothing to do with QuickTime...


I worked for GSAM for a bit as my first NAPA project - I guess you're referring to Armen Avanessians? Haha haven't heard references to the "train" in so long. Did you ever do any Slang/SecDB dev? I was mostly in FICC Tech so was pretty much slangin' slang most of my time there.

JSI (Java Slang Integration) was just getting off the ground but there wasn't too much for the front office tech teams to do there until it was to mature in the coming years.

Good times, thanks for sharing the history ;)


Yes, that's who I was referring to. I did absolutely tons of slang and a fair amount of TSecdb as well as a lot of work on the C++ infra and the build and distribution code.


My former team, previously known as AIM, wrote JSI :)


If you don’t mind me asking which year was this?


Python quants for the win! I was lazy enough to stay with R, but it was nice of Wes to build pandas; it made adoption super easy.


Very interesting story. I'm actually impressed you're allowed to give us such "internals" about GS.


Beautiful!

I worked a bit with Athena and was in the group that got Anaconda into the banking sector.

Small world we live in. I think I remember your name from JPMC days, but it has been a while.


Ha, cool. I learned Python on your distribution ca. 2014.


There was no single person that introduced Python at Citigroup that I am aware of. It came in via a variety of teams, mostly because the alternative was Perl, and no one wanted to write Perl (yet somehow kdb was acceptable a few years later).


What year was this out of interest?


I think 2002 or 3. I worked there from 2001 to about 2010 and it was pretty early in my time there.


what version of Python was it?


I'm seeing a lot of people speculating about which bank this might be; I think the point is that it's all of them. I could loosely describe a previous job as implementing Morgan Stanley's Walpole and integrating more source code management into Minerva (even though that system wasn't actually Python-based).

Having a global view on everything is large banks' value-add, it's why they haven't been outcompeted by their more nimble competitors. Being able to calculate the risk of the whole bank isn't just a cool feature, it's the core value proposition of this platform.

Being able to just upload your code and run it is really cool, and if you squint it looks a bit like what the outside world is trying to set up with serverless/lambda-style platforms - just write a function, submit it, and there, it's running. (But it's worth remembering that Python is not a typical programming language; python build, dependency and deployment management is exceptionally awful in every respect, this isn't as big a pain point in other languages). Obviously there's a tension between this and having good version control, diffs, easy rollbacks etc. - but because Minerva is already designed to do all that for data (because you need that kind of functionality for modifications to your bonds or whatever), doing it this way strikes a much better compromise than something like editing PHP files directly on your live server.

What this article calls data-first design has a lot in common with functional programming. I hope that as the outside world adopts more functional programming and non-relational datastores, Minerva-style programming will get more popular. It really is a much better way to write code in many ways. The difficulty of integrating with outside libraries is a shame though.


Does everyone use a giant Pickle dump ? I mean - how big is that ? Petabytes ?

I'm kind of surprised nobody monkey patched python serialisation to use a database (much like GitHub did with ssh key lookup in MySQL).

What does the devops there look like? Snapshot every minute?


It's not a single giant pickle dump; each individual object gets pickled and stored in Minerva (which works more or less like Cassandra or something). It's a pretty similar high level design to what the likes of Google or Facebook do, where you store everything as protobufs in BigTable - the bank uses pickle rather than protobuf because they put a higher priority on being able to store arbitrary objects and deal with robustness/compatibility later, rather than having to write a proto definition and a bunch of mapping code up front. You wouldn't want to use a relational database because they're not properly distributed (and, frankly, kind of bad and overrated).

The Minerva I worked on was temporal and append-only, like a HBase that never did compactions (so "delete" actually just writes a tombstone row at a particular timestamp - there was an "obliterate" command but you needed special authorization to use that), and it was distributed (with availability zones even) so you didn't really worry about losing data; loading data as-of a particular timestamp was part of every query (and implemented efficiently). There were probably regular dumps somewhere too but I never needed to encounter those.
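The temporal, append-only design described above can be sketched in a few lines. This is a toy illustration of the idea (versioned writes, tombstone deletes, as-of reads), not Minerva's actual API - all names here are invented:

```python
import bisect
import pickle
import time

TOMBSTONE = object()  # sentinel marking a logical delete


class TemporalStore:
    """Append-only key/value store: writes never overwrite, reads are as-of a timestamp."""

    def __init__(self):
        # key -> chronologically ordered list of (timestamp, pickled value or TOMBSTONE)
        self._log = {}

    def put(self, key, value, ts=None):
        ts = ts if ts is not None else time.time()
        self._log.setdefault(key, []).append((ts, pickle.dumps(value)))

    def delete(self, key, ts=None):
        # "delete" just appends a tombstone row; history is preserved
        ts = ts if ts is not None else time.time()
        self._log.setdefault(key, []).append((ts, TOMBSTONE))

    def get(self, key, as_of=None):
        """Return the latest value at or before `as_of` (default: now)."""
        as_of = as_of if as_of is not None else time.time()
        versions = self._log.get(key, [])
        # find the last version with timestamp <= as_of
        idx = bisect.bisect_right([t for t, _ in versions], as_of) - 1
        if idx < 0:
            raise KeyError(key)
        _, payload = versions[idx]
        if payload is TOMBSTONE:
            raise KeyError(key)
        return pickle.loads(payload)


store = TemporalStore()
store.put("bond/XS123", {"coupon": 0.05}, ts=100)
store.put("bond/XS123", {"coupon": 0.04}, ts=200)
store.delete("bond/XS123", ts=300)

print(store.get("bond/XS123", as_of=150))  # {'coupon': 0.05}
print(store.get("bond/XS123", as_of=250))  # {'coupon': 0.04}
```

A real system would of course distribute the log across nodes and index it properly; the point is just that "load as-of timestamp" falls out naturally once you never overwrite.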


So Minerva is like a distributed datastore, specifically for Python object storage?

Interesting. Do you think you would do this today with Cassandra/HBase? Can it be done - let's say take Python 3.10 and the latest Cassandra (or even better, something like Firebase or Cloud Spanner)?

Just curious that in a post AWS/Firebase world, can something like Minerva be built, without investing in writing the db store ground up.


The incarnation of Minerva I worked on actually used Cassandra as its storage backend. But it's something that's not particularly useful piecemeal; the great value of Minerva is that all the bank's data is there and it's all temporal, all access-controlled and all the rest. The most fragile and cumbersome parts of Minerva are the parts where it integrates with an external/legacy datastore - but if you tried to introduce a Minerva-style datastore as a small piece in a system that was otherwise using a "normal" technology stack, those integrations would be most of what you made.


ZODB is the object oriented database as a giant pickle dump. Surprisingly, it works and scales quite well. The downside is that non-Python tools cannot access it at all.

https://zodb.org/en/latest/
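For anyone who hasn't seen the "objects persisted as pickles" pattern, here's the same idea sketched with the stdlib's shelve module (a pickle-backed key/value store), rather than ZODB itself - ZODB adds transactions, an object graph and more on top:

```python
import os
import shelve
import tempfile

# A throwaway location for the store (shelve creates the backing files)
path = os.path.join(tempfile.mkdtemp(), "objects")

# Persist an arbitrary Python object under a string key - no schema,
# no mapping code, the object is just pickled behind the scenes.
with shelve.open(path) as db:
    db["trade/1"] = {"instrument": "XS123", "notional": 1_000_000}

# A separate open sees the persisted object... but only from Python:
with shelve.open(path) as db:
    print(db["trade/1"]["notional"])  # 1000000
```

The convenience and the downside are the same thing: nothing outside a Python interpreter can make sense of those files.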


I learnt Python via Zope in 2000, and attended the Zope conference that year.

Joined JPMorgan in 2010 to work on Athena, and immediately had a real sense of deja vu... Athena's Hydra object db (essentially an append-only KV store of pickles) felt like a great grandchild of Zope's ZODB.


I remember explaining our tech stack (Python and Zope) to clients.

“Where is the code for that page?”

“It’s in the database”

“Oh… Like MySQL?”

“No. It’s an object database”

“???”

I called it “Martian Technology Syndrome”. But it worked. At later stages we paid the price and had to serialize the datastore for migrations, but that’s what you get for relying on pickles.


I use Pickle quite a lot for caching, a file read is almost always faster than a DB query.

For long-term persistent data? Seems very dangerous to me; even reading a pickle written by PyPy in a CPython interpreter can corrupt the damn thing.


This works until...

Specifically, until you realize that pickle changes based on python version, so updating from py3.x to py3.x+1 will prevent your application from reading previously stored data.


This is wrong. pickle can read old files just fine and lets you generate files in old pickle format versions if you require backwards compatibility further than when the current protocol was introduced (it does not get increased with each python version).
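This is easy to check: the pickle protocol version is explicit, old protocols stay readable, and `HIGHEST_PROTOCOL` is tied to when a protocol was introduced rather than to each Python release:

```python
import pickle

data = {"trade_id": 42, "price": 101.25}

# Write with an old protocol (protocol 2 dates back to Python 2.3) if you
# need files that older interpreters can still read...
blob = pickle.dumps(data, protocol=2)

# ...and any newer interpreter reads old-protocol pickles transparently.
assert pickle.loads(blob) == data

# The protocol number only bumps when a new format is introduced
# (e.g. protocol 5 arrived with Python 3.8), not on every 3.x release.
print(pickle.HIGHEST_PROTOCOL)
```

What *can* break across versions is unpickling instances of classes whose definitions changed, which is a different problem from the format itself.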


I think it’s mostly true. The complaint with this kind of “industrial global Python code base”, be it at banks or elsewhere, is that often they are hastily cobbled together and depend on extreme user care to not flop over all the time.

I guess banks are the archetypal places that care only about feature creation and not about maintenance or technical debt. When something does break in the end, someone senior just shouts at the poor devs until it works again - usually fixed with a hasty patch again.

Similarly, documentation? Access control? Sanity? These seem to be left behind.


You could not be more wrong. Quartz has first class documentation, solid tooling, very well thought out and rigorous code review and access controls. Banks are regulated up to the eyeballs, so everything has to be audited and justified in detail.

It's not nirvana, these are real working systems built by humans with human failings. There are tradeoffs. Not every application is suited to these sorts of platforms, but the people building these things are top notch technologists and know what they're doing.


For some reason people think everything is as good as it can be at FAANG and other big name tech companies, and everyone else just walks around with their pants down at their ankles bumping into walls until 5:00. It’s just not true.


I mean, it's not just FAANG. Literally everyone except banks use the standard Python environment. simonh is right about the reason for the forked tech stack.

Actually, it's probably not FAANG folks at all. I'd expect ex-FAANG folks to be more sympathetic to the forked python situation... FAANGs have an abundance of non-standard and frustrating infra (wasn't 5TB just posted yesterday?), and maybe even on steroids compared to banks (do any of the FAANGs not have at least one custom linux kernel?) Hell, both As roll a shitload of their own silicon.


Exactly.

Facebook literally maintains a python fork: https://github.com/facebookincubator/cinder

Google invented "NoSQL" before anyone else knew what it was, and all those "cloud" tools they used internally were obviously proprietary (except the ones they open sourced). Ex-Googlers I work with typically had to spend quite a bit of time re-adjusting to the "inferior" tools and processes in other companies.

Microsoft invented their own development ecosystem, and the only reason it's "common" or "standard" in the tech community is because they sell it as a product. This is the same for Apple at least for iOS development, and Amazon for their cloud service offerings.

When companies have millions of dollars to spend on maintaining a custom development environment that they think will give them a competitive edge, they will do it. It's the smaller shops that can't afford not to go with the flow, so to speak.


I mean Google has Bazel which has reliable hermetic distributed builds with modern statically typed languages. I don't think this weird custom Python system is really in the same league. Barely even playing the same game.


Well, I don’t know about Quartz. I worked with DB systems and they were awful in that regard. They worked, but largely because people stuck to convention.

For example, changing scheduler jobs required submitting a change in Excel and having it approved (twice...) by someone. Except the table was world-writable and changes not logged. So in principle only your appropriate superior could approve change, in practice anyone could, and you’d never even know.


Youch, that's nasty. I can completely believe it though, banks are huge unwieldy organisations. I spent a fair bit of time working with auditors though, so I know a huge amount of effort goes into rooting out things like that.

The thing is just because a team in a bank did this thing, that doesn't mean "The Bank" thinks that's a good idea. Like any company, banks are communities. I'm not making excuses, the fact this system wasn't properly architected is a failure of governance, but I've been on the other side of this trying to get teams to fix their problems and adopt resilient processes and procedures. Every offender thinks their service is special and their violation of the standards is justified.


> Quartz has ... solid tooling

Is that how you'd describe the IDE and (integrated) VCS?


Yes, it's fine IMHO. Ok it's not your favourite Java IDE, but it's way, way better than some of the crap I've had to use at various places. But then I wrote a 5Kloc PyQt desktop app almost entirely in IDLE so yeah, maybe I'm not the best judge.


Hmm, I found the opposite - the fact that there was this global framework that managed all the data and code meant that access control was actually pretty good, better than most tech companies I've worked for. You had a single source of truth for what your access rights were, there was integrated Kerberos any time you needed to access a system outside Minerva. And having all the code in a managed place meant good deprecation cycles - not instant deprecation like the Google monorepo, but tracking and policy for which old versions of libraries were in use and how much that was tolerated. Documentation was at least attempted, and while platform stability/enhancement work did have to be coupled to business initiatives to a certain extent (e.g. "we're doing this performance work to enable us to run risk estimation more often to meet MIFID requirements at low cost") there was leadership that put a value on maintaining high quality code and this paid dividends.


This was 100% my experience too.

The biggest productivity gains were:

- having a single source of truth for both data and code (in a closely coupled environment)

- strong, battle-tested libraries to take care of all infrastructure concerns.

- enforced code dev/test/review/deployment workflows

This let the front-office devs be highly productive on adding real business value for their trading desks.

Remember also that these systems at GS, JPMorgan and BAML started around 2007-2010. The infra we all take for granted today at AWS/GCP/Azure simply did not exist back then, and banks' data security policies at the time did not allow cloud processing.


> Remember also that these systems at GS, JPMorgan and BAML started around 2007-2010.

GS had "these systems" well before 2000 (via J Aron). I think around the time you mentioned they spread to other firms (in their Python reincarnation).


> I guess banks are the archetypal places that care only about feature creation and not about maintenance or technical debt.

It depends which department you are in, but in general: absolutely not. Actually the reverse is true. Banks have huge risks to manage: just imagine for instance what damage a hack of their account system could cause. Or a crash of their payment system. Therefore it is of the utmost importance that systems are stable and bug free. In most departments, feature development is only in second place: stability and reliability have priority one.

This concern is so important that it is not just left to the responsibility of the banks themselves: for many systems, banks have to comply with external standards and are audited for that by external agencies.


And that's why many old banks still use COBOL.


Well, it's repeated a lot, but I work in a big bank and I have yet to see tech older than Java 6 :(


> python build, dependency and deployment management is exceptionally awful in every respect, this isn't as big a pain point in other languages

I'm not sure how to react to that, but these features in Python are miles ahead of what many other languages have (or actually don't have).


If you compare Python's deployment and dependency management to those of statically compiled languages like Go, Rust, Zig, or Nim, you quickly see the experience with Python is quite poor.

In all the above languages, you simply ship a statically compiled binary (often just 1 file), and the user needs nothing else.

With any sufficiently complex Python project, the user will need:

1. virtualenv

2. possibly a C compiler

3. a recent version of Python (and that keeps changing: 3.0-3.4 are "ancient", and 3.6 seems to be the absolute minimum version these days, primarily due to f-strings)

4. or you ship a Dockerfile, and then the user needs 600 MB of Docker installed

I sometimes joke that in the future every Python script will require a K8s deployment and people will call it "easy".

Python is a great language, but deployment is a massive pain point for the language.

When I know I am writing something that has to "just work" on a wide range of systems that I don't necessarily control, well I don't write the solution in Python. I pick Go, Nim, or Rust (Zig would be a good choice too).


Or you just provide your own Python package. Most of the time that will be less than 100 MB if you don't include huge libraries. You can test and build automatically. For deployment you then have an installer or rpm that is probably smaller than most of the other enterprise software your customer's infrastructure admins are handling.


The problem is "less than 100mb" is unacceptable (for my use cases).

There are use cases where 100-300mb is no big deal and customer can handle this.

But single binary deployments with a statically compiled language where a fully-featured binary can weigh in from 5-30mb are what I'm after.

And honestly, with upx I can take even those fat Go binaries down from ~35mb to 7-8mb. That's an order of magnitude less than 100mb of Python and all its dependencies. Not to mention with all those languages I mentioned (Go, Rust, Nim, Zig), I get multi-threading and high performance as well.


I think there are too many options, or not enough direction for busy people. Once you understand how it all works and pick / build the right tools it all works pretty well.


I would agree for all but deployment. I know my way around python reasonably well, but pyinstaller and friends still make me have bad days pretty regularly.


Four Python projects, same customer, five different deployment systems: Docker, a Capistrano look-a-like I coded in bash, git pull (their former standard), git format-patch plus scp, zip archives. Yes, python file.zip works if it contains the right files. The latter is probably the easiest way, except it doesn't address the dependencies.


> except it doesn't address the dependencies

It does if you put them all in the zip :)

(and build for exactly the platform your customer is going to deploy on)
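The stdlib even has direct support for this: zipapp builds a single runnable archive from a directory tree. A minimal sketch (in real use you'd vendor dependencies into the tree first, e.g. with `pip install --target`):

```python
import os
import subprocess
import sys
import tempfile
import zipapp

# Build a trivial source tree with a __main__.py entry point.
src = tempfile.mkdtemp()
with open(os.path.join(src, "__main__.py"), "w") as f:
    f.write("print('hello from the zip')\n")
# Dependencies would be copied into `src` here too, so the archive
# is self-contained for the target platform.

target = os.path.join(tempfile.mkdtemp(), "app.pyz")
zipapp.create_archive(src, target)

# The single .pyz file runs on any compatible interpreter.
out = subprocess.run([sys.executable, target], capture_output=True, text=True)
print(out.stdout.strip())  # hello from the zip
```

The caveat from the parent still applies: compiled extension modules in the zip tie you to one platform and Python version.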


Python is literally decades behind. Dependency resolution is nondeterministic by default. The way you run a build is still not remotely standard (e.g.: I've downloaded the source of one of the top 20 packages on PyPI. How do I run the tests? Perl had a standard way to do that back in the '90s). Deployment is so bad that people recommend using containers as a substitute for something like fat jars.


That's just untrue. Maybe compared to C++ Python isn't too bad. But try something modern like Go, Rust or Deno. Python is light-years behind.


Am I the only one who has never had enough headache over the years to say its awful? I do think dependency management is a somewhat difficult problem to solve and a lot of systems have pros/cons but I never have had huge issues with Python's.


It's been 15 years and I'm still a bit traumatized by Aurora.


BAML Quartz was conceived by a bunch of front-office quants who had not the first idea about the software needs of a big bank beyond the front office. There was an arrogant assumption that front office software is obviously the most complicated/difficult variety of software within a bank and therefore any system designed with front office requirements at the forefront would, of course, be perfect for universal use.

This assumption was challenged at the time by various groups - I was closest to the Equities Operations software team (although not part of it) who absolutely dug in their heels and refused to use Quartz. The assumption was explosively invalidated when people started implementing in Quartz applications that fell under Sarbanes/Oxley regulations and Quartz picked up a severity 1 audit finding - because Quartz was explicitly designed for "Hyper Agility" (literal quote from the quartz docs) - and anyone-can-change-anything-at-any-time does not make for applications that the regulators trust.

There was an interesting trajectory of Python hiring during my time at BAML. I joined just as Quartz was getting started and we managed to easily hire tens of python devs in London because it was easy to sell the fact that BAML was making a strategic investment in Python and therefore their (at the time relatively uncommon) skills would be highly valued. But as Quartz matured, Python developers generally came to dislike it (for reasons see original article) and it became hard to retain the best ones. And after a while Python 2.x became a massive embarrassment and, as Python became a more common skill in the marketplace, it became harder to hire good developers into BAML.


> BAML Quartz was conceived by a bunch of front-office quants...

It was worse than that. What they actually built was a system designed to support complex hybrid structuring. It's what markets desks had been making a lot of money in prior to the crash esp GS. Unfortunately, post-crash there wasn't much money in structuring so the Front Office was more interested in investing in flow. Quartz was really, really bad at flow.

It took a long time (and the departure of Mike, Kirat et al) to get Quartz to a position where it was a reasonably sane FO system for the world as was rather than as it had been.

Fun times.


I was at BAML when the sev 1 audit finding happened. My view was from an application support team in Risk. For us Quartz was fantastic, and it had a pretty decent permissions system. The problem is there were two misaligned goals.

On the one hand the goal was to build a single enterprise scale system with a holistic view of the bank's data to do rapid ad-hoc position evaluations and meet new needs rapidly.

On the other hand, access to all that data and all the code is clearly a security concern. By the time I left the sev 1 finding was well on the way to being mitigated, but for example it meant that instead of handing out quartz developer accounts and IDE access like candy it had to be restricted to technology personnel only.


> in London because it was easy to sell the fact that BAML was making a strategic investment in Python

I reckon I "felt" that push to hire at one of the early PyCon UKs, where your boys suddenly showed up with a big contingent. I even thought about applying, but I was not based in London - and there were some red flags, like running a pretty old Python version (I think it was 2.2 or 2.1, when 2.4/2.5 were the expected mainstream), that kinda sounded like I'd be signing up for the modern equivalent of mainframe maintenance.


Can you have a career in finance as an engineer with Python and without C/C++ (professional) experience?

Your post really made me think about it; it's an attractive area to work in.


Also consider Application Support. I know it's not sexy rockstar dev stuff, but if you can get into App Support on the Quartz (or Athena I suppose) environments you get a dev account and access to all the tools. You can view all the code, config and running systems. If you have a good relationship with your dev team you can submit patches e.g. to improve logging. The live log files of all your applications are just a URL away.

If you're up for it, you'll spend a significant amount of time in the Quartz IDE. There are teams within App Support that develop monitoring and compliance reporting tools in Qz and do about 50% development. I know because I ran one. One of my team transferred into our dev team.


Yes. Many adverts will specify financial services experience but it's worth applying anyway. You'll probably find that roles in back-office technology areas (operations, finance etc) are less demanding in this respect. I hired mostly from outside the financial services industry because other industries had, on average, better-skilled developers, lower salaries and better development practices.


Absolutely yes.

Depending on what kind of engineer, it is far better to go to the finance (front office quant, back office risk) side than the tech support side. They are less snobbish about autodidacts and pay is far better if you are willing to learn about things outside the dev sandbox.

(Our front office has a few quants and ex-quants with electrical engineering backgrounds; I don't know of any software engineers there.)


Thanks for detailed pointers. What's the deal with front/back offices?


Rule of thumb: the closer to the business (ie front office), the more money and stress.

(Front office deals with clients, and in this context comprises sales, trading, structuring. Middle office run control functions, reporting, risk, compliance, etc. Back office would be settlement, accounting, operations, etc.)


Yes. Tons of it is in Java.


> I've mentioned that programmers are far too dismissive of MS Excel. You can achieve an awful lot with Excel: more, even, than some programmers can achieve without it

This is one of the most underrated topics in tech imho. The spreadsheet is probably the pinnacle of how tech could be easily approachable by non-tech people, in the "bicycle for the mind" sense. We've come a long way downhill from there, when you need a specialist even to come up with a no-code solution to mundane problems.

Sure the tech ecosystem evolved and became a lot more complex from there but I'm afraid the concept of a non-tech person opening a blank file and creating something useful from scratch has been lost along the way.


Reminds me of the investment banking dev cycle in the 2000s:

* trader writes a pricing "app" in Excel

* trader discovers MS Access db

* traders (plural) start copying around Access db files

* problems

* "IT" gets involved

* convert MS Access to oracle or sybase

* write some server process(es) in C++

* write some replacement front end (spend months arguing over best grid component to replace excel) in C++/MFC

* trading system emerges...

* rewrite in C#, Java

* etc.


The problem, as described to me, is that Excel starts being used for regulated processes, and it's not well auditable, access controlled, change controlled, tracked, etc. Then people need to implement the exact same process across departments, and they're all using a separate Excel sheet and they all submit different numbers. It becomes a huge mess, and so much more complicated and expensive systems get commissioned.


Fun story: I was at a bank that used Excel for everything. As you say, there came a complaint from the auditors that it's not well auditable, and there needed to be "a system".

Solution: the bank put together a system that constructs (from Excel templates and the bank trading data and market data) Excel spreadsheets from scratch every day, then used those for the calculations, and stored them. But now it was "a system", so all good.


Well you can audit the code that generates spreadsheets, which seems to solve the audit problem. Kind of like I prefer reading a Dockerfile that builds a program from the GitHub repo, rather than downloading a pre-compiled package I can't trust.


Sounds like a great system. We have something similar where we put Excel in and out, but it doesn't sound as slick as that. On top of the system there is access control, versioning and such. The data gets approved and then stored in the backend to feed the regulated process.


This describes what I've seen happen with Excel over and over again. I'm curious if the use of collaborative Google sheets could be a fix for this? Something where a portion of the sheet could be shared globally, but the rest of the document would be local to the instance working on it.


There's an excellent example of this phenomenon in the JPM "London Whale" report where -- at various points -- poorly maintained and validated spreadsheets appear as minor villains in a $6.2bn loss.


The jargon for this is "user-developed application," and auditors do keep an eye out for these. Banks, from what I've seen at least, typically have some process to document these as they come up, replace them with supported solutions, and retire them. At least, that's the "happy path," where people are willing and able to get all that done before a big-three auditor comes in and tears you a new one.


Plus, a spreadsheet is basically purely functional (unless there's mucking around in VisualBasic), and has a beautiful dependency graph and calculation engine! (And that is a big part of what SecDB/Slang/Bank Python brought to the table.)
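That dependency-graph-plus-calculation-engine idea is small enough to sketch: each cell is a pure function of other cells, results are cached, and changing an input invalidates only its dependents. A toy version (invented names, nothing like the real SecDB machinery):

```python
class Cell:
    """A spreadsheet-style cell: a pure function of other cells, with caching."""

    def __init__(self, fn, *deps):
        self.fn, self.deps = fn, deps
        self.dependents = []   # cells whose value depends on this one
        self._cache = None
        self._dirty = True
        for d in deps:
            d.dependents.append(self)

    def set(self, value):
        """Change an input cell; downstream cells are invalidated, not recomputed."""
        self.fn = lambda: value
        self._invalidate()

    def _invalidate(self):
        self._dirty = True
        for d in self.dependents:
            d._invalidate()

    def value(self):
        """Recompute lazily, only if some input changed since last read."""
        if self._dirty:
            self._cache = self.fn(*(d.value() for d in self.deps))
            self._dirty = False
        return self._cache


spot = Cell(lambda: 100.0)
qty = Cell(lambda: 3)
price = Cell(lambda s, q: s * q, spot, qty)

print(price.value())  # 300.0
spot.set(110.0)       # price is marked stale automatically
print(price.value())  # 330.0
```

That lazy invalidate-then-recompute pattern is exactly what makes expensive pricing functions tolerable: nothing recalculates unless something it actually depends on moved.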


I think part of the problem with Excel (or clones) is that you can do so much, haha. It's such a powerful tool that you end up doing things in it that it really wasn't optimized or designed for, and managing change history in Excel is pretty tough.

But for 95%+ of analysis you really can't beat it.


Very true. I often prototype algorithms and things in google sheets. One time I had backpropagation working in there, with a little button to process the next "row" of training samples.


I used to work for a very successful company that produced mobile games. The entire logic of the game, the rules, etc. was all in Excel


So Excel spreadsheets were deployed on mobile devices? What about the runtime?


The problem is sometimes analysts turn into shadow BI or even DE who only know Excel. They know Excel so well that they create a whole monstrosity in Excel. MSFT has been sort of encouraging that too by introducing some Power BI feature and now Javascript into Excel.


I can see the benefits of this collection of tools within an all-in-one monolith. Ease of deployment is a big benefit. I can also see the costs. As a stack its probably better in some ways than how a lot of other businesses operate as well as worse. There's probably a lot both ways.

The mainframe mindset might be a factor here as well. The giant mainframe where all the magic happens is still a thing to behold and this is definitely part of banking's history and present. Mainframes are beasts and are still far from any kind of obsolescence. A monolithic Bank Python with a standardised set of libraries etc would slot right in to that mindset and way of thinking.

The part about programming languages frequently not having tables is interesting. The closest as mentioned is the hash, but you lose so much in that abstraction eg the relational aspects. The counter argument then becomes the obvious: why aren't you using a database library, or in a pinch, sqlite? Rightly so. Why would you add relational tables to python rather than have a generic python database spec or a collection of database connector libraries. Databases are separate and large projects in themselves.

I'd still be overly disturbed if they were running some old python 2.5 or similar. Just saying. That would be a source of pity.


> The part about programming languages frequently not having tables is interesting. The closest as mentioned is the hash, but you lose so much in that abstraction eg the relational aspects. The counter argument then becomes the obvious: why aren't you using a database library, or in a pinch, sqlite? Rightly so. Why would you add relational tables to python rather than have a generic python database spec or a collection of database connector libraries. Databases are separate and large projects in themselves.

The separate datastore is the problem to be solved here - databases, especially relational databases, are extremely poorly integrated into programming languages and this makes it really painful to develop anything that uses them. You can just about use them as a place to dump serialized data to and from (not suitable for large systems because they're not properly distributed), but if you actually want to operate on data you need it to be in memory where you're running the code and you want it to be tightly integrated with your language and IDE and so on.

(It's not even the main benefit, but just as an example of that kind of integration, when you're querying large datasets Minerva works a bit like Hadoop in that it will ship your code to where the data is and run it there)


Funny thing is, databases were tightly integrated into programming languages all the way back in 80s - that's exactly what dBase was, and why it became so popular. FoxBASE/FoxPro, Clipper, Paradox etc were all similar in that respect.

And yes, it made for some very powerful high-level tooling. I actually learned to code on FoxPro for DOS, and the efficiency with which you could crank out even fairly complicated line-of-business data-centric apps was amazing, and is not something I've seen in any tech stack since.


> FoxBASE/FoxPro, Clipper, Paradox etc were all similar in that respect.

> the efficiency with which you could crank out even fairly complicated line-of-business data-centric apps was amazing, and is not something I've seen in any tech stack since.

Did you ever get to try Delphi? Those "line-of-business data-centric apps" is what it was all about.

And I'm not quite sure, but I think and hope Free Pascal / Lazarus is close to that in ease and power.


> The separate datastore is the problem to be solved here - databases, especially relational databases, are extremely poorly integrated into programming languages and this makes it really painful to develop anything that uses them.

Hence "Active Record" ORMs like Rails and Django being highly successful. They functionally embed the RDBMS into the language/app (almost literally if using SQlite), which is a huge boon for developer productivity...

...but also a significant footgun, because it means the database is now effectively owned by the Active Record ORM and its (SWE) team, and not by some app-agnostic data team.

Want to reuse that juicy clean data managed by Django? Write a REST API driven by the app; don't try to access the data directly over SQL, although it may be tempting.
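The Active Record shape is easy to see in miniature: the object carries its own persistence, so the app (not a data team) owns the schema. A minimal sketch over stdlib sqlite3 - this is the pattern, not Django or Rails code, and the `Trade` class is invented for illustration:

```python
import sqlite3


class Trade:
    """Active Record-style: the model class knows how to save and find itself."""

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trades (id INTEGER PRIMARY KEY, instrument TEXT, qty INTEGER)"
    )

    def __init__(self, instrument, qty, id=None):
        self.id, self.instrument, self.qty = id, instrument, qty

    def save(self):
        cur = self.conn.execute(
            "INSERT INTO trades (instrument, qty) VALUES (?, ?)",
            (self.instrument, self.qty),
        )
        self.conn.commit()
        self.id = cur.lastrowid
        return self

    @classmethod
    def find(cls, id):
        row = cls.conn.execute(
            "SELECT id, instrument, qty FROM trades WHERE id = ?", (id,)
        ).fetchone()
        return cls(row[1], row[2], id=row[0]) if row else None


t = Trade("XS123", 500).save()
print(Trade.find(t.id).instrument)  # XS123
```

Convenient inside the app; but note the footgun above - anything else that wants this data now has to go through the app's model layer, because the schema is an implementation detail of it.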


> Hence "Active Record" ORMs like Rails and Django being highly successful. They functionally embed the RDBMS into the language/app (almost literally if using SQlite), which is a huge boon for developer productivity...

Right, those are a step in the right direction, but still a lot more cumbersome than properly integrating your datastore with your application.


The first-blush conversion from Excel to this ecosystem only needs lookup tables. Excel has some static database I/O, but people who only know Excel use it as data input for lookup tables.

The Python results of that first conversion need to be tested against Excel, so it'll have identical lookup tables.


> The part about programming languages frequently not having tables is interesting. The closest as mentioned is the hash, but you lose so much in that abstraction eg the relational aspects. The counter argument then becomes the obvious: why aren't you using a database library, or in a pinch, sqlite? Rightly so. Why would you add relational tables to python rather than have a generic python database spec or a collection of database connector libraries. Databases are separate and large projects in themselves.

This is covered in the article, in the distinction between "code-first" and "data-first". Databases means that you leave the interaction with data to a third party, and the only thing you do is send commands and receive results. This is very different from having all the data in your program, and starting from that. I'm not sure if "code-first" is the right word from it. Perhaps another way to put it would be that when data is the most important thing, you don't want to encapsulate it in a "database object", you want it to be right here.


Both FrontArena and Murex use Python as their "VB" kind of language. If you thought your deployment pipelines were weird, ours have included putting entire Python apps into single strings and inserting them into an Oracle DB, where a fat Windows client selects them and runs them on a Windows Python interpreter... via Citrix... :/


This reminds me of an e-commerce system which stored data in a mixture of Oracle, and text files on the local disk. We handled backups by loading the text files into blob columns in the database, and then just backing up the database.


This made me laugh out loud. Technology is awesome.


Username checks out :-)


The horror


I worked on one of the largest of these systems. It seems to be the one referred to by the post.

The global distributed store of pickled python objects using Event Sourcing was one of the most horrible and expensive database systems I've ever heard of. It runs on THOUSANDS of expensive servers with all data stored in-memory. To get the state of a single deal you had to open, decompress, deserialize, and merge hundreds if not thousands of instances. And 90% of the output was more often than not discarded.
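A minimal sketch of the access pattern being described (all names are invented; this is nothing like the real system's API): every write appends a compressed, pickled delta, and reading a deal's current state means decompressing, deserializing, and merging every delta ever written for it.

```python
import pickle
import zlib

# Toy event-sourced store: deal state lives only as a list of pickled,
# compressed deltas. All names here are hypothetical.
_store = {}

def append_event(deal_id, delta):
    """Persist one event as a compressed pickle."""
    _store.setdefault(deal_id, []).append(zlib.compress(pickle.dumps(delta)))

def load_deal(deal_id):
    """Rebuild current state by replaying and merging ALL events."""
    state = {}
    for blob in _store.get(deal_id, []):
        state.update(pickle.loads(zlib.decompress(blob)))  # last write wins
    return state
```

Note that read cost grows with the number of events, and every field merged on the way is thrown away except the final value, which matches the "90% discarded" complaint.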

The Python interpreter extensions reveal the ignorance of Python by the original developers. There was no good reason to fork CPython.

There were many small subsystems created and supported by lone rangers with impressive CVs and astronomical salaries: a JIT better than any other one out there (but with a lot of limitations); an extremely elegant meta-query system.

But this was all a sham. The actual daily crunch/analytics ran on more classic SQL/columnar clusters. Hours-long batch jobs loaded stuff from the distributed object database onto old-school DBs. And those blew up frequently. Sometimes those blow-ups cost many millions in delayed regulatory reports. The queries running on top of SQL were beyond stupid and the DB engine could not optimize for them. And of course, people blamed SQL and not the ridiculous architecture and the OOP dogma.

Don't work for old school banking, hedge funds, or anything like that. They are driven by tech cavemen and their primadonnas. Exceptions might be some HFT and fintech shops.


It looks like sound ideas, poorly implemented. A lot of these capabilities crop up in BEAM or Smalltalk. I do wonder whether, if Erlang had better developer ergonomics, we would have seen it used instead of Python as a basis.


But it was good for the first generation who worked there. Big dollars, total control and even the opportunity to create a proprietary IDE...


That proprietary IDE was a piece of crap. Monkey patching was prevalent, exponentially increasing startup time depending on the last time you opened it, and an “online” source tree where one could easily modify the source code in someone else’s ‘private’ workspace.

PyCharm was a move in the right direction, but the way it worked was absurd: it would run the internal IDE in the background and sync to the file system. Given that the proprietary IDE took up more resources than PyCharm, you constantly had to shut down apps so the machine had enough memory.

IDEs should not be a requirement, they are tools… but you had no choice but to use it and its totally flawed code completion. Their measure of success was tantamount to having a Jupyter notebook: write code and get back results immediately.


I guess it was originally created as a productivity tool, as there wasn't any good Python IDE back before 2010. But after a while it became a monstrosity; new tools emerged, but it was rooted so deeply that it was impossible to adopt them.

But as I said, I'd really love to work on those projects. Most people don't get the prestige of owning their own baby in a big corporation. 99% of the job is maintaining a shit mountain of code and piling new shit on top of it. It really took a lot of luck to be able to make it happen.

I mean, even people who get to work for JetBrains or the Microsoft Visual Studio team don't get to create new IDEs; they are buried deep in a shit mountain of code and JIRA issues.

Plus the pay and vacation are really good at the banks.


Reminds me of what we used at the ATLAS experiment at CERN* . Python was tightly integrated with the application framework, Athena (which I just realize has the same name as JPM's Python framework!). You could use it as a job description language, and you would compose computation steps from classes you could write in C++. I think there was a separate `athena` executable that was just python with some packages pre-loaded. Because of all the binary modules, but even more so because of the minor syntax changes, the transition to Python 3 was really a problem (I hope they did it by now :-D).

There was also a bespoke time-span database. You could store keys and values in there, but every data point had a start and end time. Then you could query what the values were between certain times, or run numbers (operational periods). We used it for example to store what configuration the detector was using when a certain dataset has been recorded.
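That bespoke store might be sketched roughly like this (a guess at the semantics, not the actual CERN API): each row carries a validity interval, and queries ask for the value of a key at a point in time.

```python
# Hypothetical sketch of an interval-keyed store: every value is valid over
# [start, end), and a query returns the value of a key at a given time.
class TimeSpanDB:
    def __init__(self):
        self._rows = []  # list of (key, start, end, value) tuples

    def put(self, key, start, end, value):
        """Record that `key` had `value` during [start, end)."""
        self._rows.append((key, start, end, value))

    def get(self, key, at):
        """Return the value of `key` valid at time `at`, or None."""
        for k, start, end, value in self._rows:
            if k == key and start <= at < end:
                return value
        return None
```

So looking up "what configuration was the detector using when run 12345 was recorded" becomes a single `get(key, run_time)` call.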

(* I've been out for a couple of years so I don't know what they use now, but I imagine it hasn't changed much.)


The Greek and Roman gods have always been a go-to for project names, LOL. We need to give some other cultures a shot!


With particularly poor timing, I chose the Egyptian goddess Isis.


One of the systems downstream of the JPM ‘Minerva’ (Athena) was called ISIS.

It was renamed…


I go for James Bond references, when I can. 'Moonraker' is always a great choice.

Sometimes I'm constrained to a specific starting letter, so I've had to stretch it at times, like when I needed an 'S-' word... ended up going with 'Sinatra' since Nancy Sinatra performed 'You Only Live Twice' for that movie


Sean ? Spectre ? Sexism ?


Sean would be a weird codename and Spectre sounds a bit too ominous... We'll have to settle for Sexism


Welcome to your first day on Project Sexism.

Our PERT charts are really pert!

We burn story points not bras round here !

Agility is a core value ... nudge nudge ;-)

... oh well, The 1950s want their jokes back.


I’ve been a fan of metals (and more broadly elements) for project code names lately. My current project is called Cobalt!


I'm a fan of Norse!


Doesn't seem that strange compared to K[1] or Q[2], which are used by Wall Street banks. K encourages you to use single-letter variables and bunch your code up as tight as possible into long lines. Here's an example: [3]. Interestingly, their Github repo has some K-inspired C[4], Java[5], C#[6], and Javascript[7].

[1] https://en.wikipedia.org/wiki/K_(programming_language)

[2] https://en.wikipedia.org/wiki/Q_(programming_language_from_K...

[3] https://github.com/KxSystems/kdb/blob/master/holiday.q

[4] https://github.com/KxSystems/kdb/blob/master/c/c/odbc.c

[5] https://github.com/KxSystems/kdb/blob/master/c/jdbc.java

[6] https://github.com/KxSystems/kdb/blob/master/c/c.cs

[7] https://github.com/KxSystems/kdb/blob/master/c/c.js


Thank you. Now when I think some code I have to untangle is bad, I'll always remind myself "at least it's not K-inspired C"...


This took me a while to figure out but K/APL code is built on a different value system than most software. Specifically, the goal is to be able to see and operate on the entire program at once. Obviously this only works for programs up to a certain size but that size is larger than you'd expect when abstractions, variable names, and general legibility are sacrificed. I wouldn't write code this way but I can see how someone would find it valuable.


The big banks don't write code in K4 though, managers generally encourage people not to write code in it due to the difficulty of finding developers who are fluent in it.

They all use q, and q is very wordy and highly readable if you speak English. It's mostly just developer-defined functions which are compositions of the keywords, of which there are not many: https://code.kx.com/q/ref/#keywords

Most of the code you would see in a kdb+ system in an investment bank won't look like any of the links you've provided.


Do you have any suggestions for q code to look at? Every time I try array languages I bounce off the ubercompact and I feel like there's actually a chance I could learn something more q like.


I have a copy of Fun Q sitting in the tsundoku pile on my desk that seems pretty good from a quick flick through.

https://www.fun-q.net/


Very interesting.

I am feeling the urge to learn array processing languages.

There is something very tempting.


Project Euler frequently has ultra-short solutions in K/J/whatever other single letter they’re using at the moment. It is quite intriguing, but ultimately I put too much store in readability, so decided not to pursue these.


You mean like Numpy? =P


This pattern in large part came from SecDB at Goldman, and was then spread by a few people who moved to JPMC and BAML.

Dependency graphs are an elegant solution to risk management and pricing etc. There’s a reason this approach works in IBanks.

Check out Beacon.io, which is a SaaS implementation from the same team.
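The core idea can be sketched in a few lines (purely illustrative; not SecDB/Athena/Quartz code): each node caches its value and lazily recomputes when an upstream input changes.

```python
class Node:
    """One node in a toy reactive dependency graph (illustrative only)."""

    def __init__(self, fn=None, value=None):
        self.fn = fn                  # None for leaf/input nodes
        self._value = value
        self._dirty = fn is not None  # computed nodes start unevaluated
        self.deps = []
        self.dependents = []

    def depends_on(self, *nodes):
        for n in nodes:
            self.deps.append(n)
            n.dependents.append(self)

    def set(self, value):
        """Change an input; everything downstream becomes stale."""
        self._value = value
        self._invalidate_dependents()

    def _invalidate_dependents(self):
        for d in self.dependents:
            d._dirty = True
            d._invalidate_dependents()

    def value(self):
        """Recompute lazily when stale, otherwise serve the cached value."""
        if self._dirty:
            self._value = self.fn(*(d.value() for d in self.deps))
            self._dirty = False
        return self._value
```

With a `price` node depending on `spot` and `rate`, changing `spot` dirties `price` and the next `price.value()` call recomputes; untouched nodes keep serving their cache, which is the spreadsheet-like recalculation the parent comments describe.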


> Dependency graphs are an elegant solution to risk management and pricing etc.

Dependency graphs are not a solution to risk and pricing. They are, in certain circumstances, a very useful tool. That's all. They also scale notoriously painfully.

Putting a dependency graph as a mandatory component in your risk system was one of the worst technical decisions I've come across (and I've been doing this lark a long time).


Wouldn't an observer pattern work better? The graph itself could even be used to instantiate subscriptions in a pub/sub system where changes in underlying pricing could be dealt with via an event queue. Compaction and debouncing could be applied on top of the queue to avoid lots and lots of redundant execution.
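The compaction/debouncing step could look something like this (a hypothetical minimal sketch, not any bank's actual system): updates are queued per key, and draining keeps only the latest value for each key, so subscribers do one recompute per input rather than one per tick.

```python
from collections import OrderedDict

# Toy compacting queue: repeated updates to the same key collapse into the
# newest one, preserving arrival order of distinct keys.
class CompactingQueue:
    def __init__(self):
        self._pending = OrderedDict()

    def publish(self, key, value):
        self._pending.pop(key, None)  # drop any stale update for this key...
        self._pending[key] = value    # ...and keep only the newest one

    def drain(self):
        """Hand all compacted updates to the subscriber and reset."""
        items, self._pending = list(self._pending.items()), OrderedDict()
        return items
```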


> Wouldn't an observer pattern work better?

Better as a solution to what problem? In some cases a dependency graph is an excellent solution. In some cases it's not. In some cases it's fine for small graphs but scales poorly as it can be very hard to reason about (as attested by pretty much anyone who's supported a really big spreadsheet).

But that's the point; it's a really useful tool. Sometimes.


I worked on Quartz for a while as a contractor. Hated every second of it. The Python version was old (2.4, if I remember correctly, when 3.x was already the popular version). But that wasn't it. It was the proprietary version of everything in the stack that got me. Proprietary IDE, source control, libs, etc. I noted that none of the others who had been there for years had any transferable skills that could carry them out of banking and into a startup, for example. All were good devs but only knew Quartz. I can say they were great Quartz devs. The pay was great, but the work was soul-crushing.


> Proprietary ide, source control, libs etc

The reason this was a problem was that it meant investment was needed for each of these things, and as such fell behind.

The IDE fell behind most modern IDEs, presumably because it didn't get the budget for it. The source control/libs were usually modified versions of existing libs, but then needed to be maintained internally to remain compatible with the mainstream versions (which, again, they did not, and so fell out of compatibility).

> any transferable skills

It puts you in a position of arguing that you are familiar with <some-lib>, just a modified proprietary version of it.. Makes those conversations a bit more difficult..

> All were good devs but only knew quartz

To be fair - this is their own fault. It's difficult providing proof in terms of "what you worked on in your last job"; but there's no reason a professional python dev couldn't become familiar with the popular versions of things on their own, given they are fairly close in functionality. Many of the quartz devs I knew already had backgrounds in Python, attended pycons etc; so knew more than Quartz.


It's not just the "knowing some lib". It's a way of working that is not compatible with the outside world. Every single character I typed had to get director approval. I stopped counting broken things that required human intervention (on rota). The style of programming is... well... bankish? I would have had to take a big gamble hiring most of these guys.


Yes, it is their fault, but the organisation didn't even attempt to nurture professional development. Stagnation was a feature, not a bug. Arguably, these guys were paid very well so they would have taken a pay cut anywhere else anyway.


I think this would have depended on the teams/individuals you worked with. I recall some devs being completely clueless about environments - my code (that depends on dozens of other files) in uat is producing different outputs compared to my code in dev, why? Some didn't know how to debug.

Many went on to FAANG as senior engineers (including Uber when it was the next big thing), hedge funds (Citadel, 2S), startups (Twilio; is Twitter still a startup?). And as far as Python knowledge is concerned - I recall attending a number of Python talks given by my former colleagues at Python conferences, and various PEP discussions around whether a given PEP would help or be detrimental to qz. As a matter of fact, one of the qz core engineers is now also a PSF core engineer. Quartz was polarizing, but there was/is plenty of talent among the engineers.

P.S.: fwiw, when I left baml, the migration to 3.6 was nearing completion, and the migration to 3.7 was in progress. I guess at some point we realized python and qz were not going away and we had to migrate, so infra was built out to make future migrations easier.


I have no doubt the people who BUILT Quartz are top notch. Users of it... Mileage varies i suppose.


The builders/maintainers (AKA the "Quartz Core Team") seemed to have a poor view of the users (AKA the "Line of Business" development teams). Of course you wouldn't catch them saying that openly, but it was sometimes implied during interactions. I will admit, that view was not completely unwarranted.


I second your opinion (interned on Athena, not Quartz) - compared to my current BigN experience, everything was worse by an order of magnitude: the IDE, the source control, the review mechanism, the job scheduler and so on.

I'd have expected that with so many devs working on this, the DevX would be ironed out.


I wonder how they made the IDE? Must have been an interesting job for whoever got to write it, and hell for whoever is maintaining it and using it, lol.


Some questions are best left unanswered... :)


Compared to one major IB bank Python system, this is all extremely clean and neat.

Consider a Python API that is a thin wrapper on COM calls intended to be used from Excel. Want to request some data? Fill in a 2D virtual Excel table. Want to pull some data? Query it and parse a text dump of a table excerpt (remembering to parse #N/A etc. as NaNs). Want to automate a job? Enter it as a new row in a global spreadsheet. And for God's sake, do NOT edit any of the other rows, lest the whole house go down in flames!!!


This is the approach I'm familiar with, but back when I did it, it was in Perl (yes, Win32 Perl with COM bindings) and Java (we wrote native Java plugins to do COM to Excel). All the important stuff was Excel VBA code that they had spent years developing and could never replace, so any front-end type of thing had to somehow get back to the Excel models.

We eventually did rewrite the Excel models in Java, released something, and then the whole project probably got cancelled or something, 9/11 happened a few years later and the whole building in which all this code was written had to be demolished.


I remember back in 2000 converting the windows line of business app team at $ISP (I was mostly on the provisioning automation side) over to using a COM component called JendaRex which wrapped the perl VM just to expose the regexp engine.

This basically came about after the Nth time they asked for regexp help and I had a trivial solution that didn't work in whatever native implementation they were using and I basically gave them a choice between JendaRex and "not having me debugging their regexps anymore".

They unanimously chose JendaRex and everybody ended up happier as a result.


Or they could instead use CSVs. What could possibly go wrong?


Everything is fine, as long as no Americans come and write 1,000 where obviously they should have written 1000 or 1'000. /s


Or some american writes the date somewhere.

edit: /s we love you american colleagues,


Working with and for Americans is why I always use ISO year-month-day format.


That, and I like how they neatly sort.
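For anyone who hasn't seen why: ISO 8601 strings sort chronologically as plain strings, while US-style dates do not.

```python
# ISO year-month-day: lexicographic order coincides with chronological order.
iso = ["2021-11-15", "2021-02-03", "2020-12-31"]
assert sorted(iso) == ["2020-12-31", "2021-02-03", "2021-11-15"]

# US month/day/year strings sort by month first, not chronologically.
us = ["11/15/2021", "02/03/2021", "12/31/2020"]
assert sorted(us) != ["12/31/2020", "02/03/2021", "11/15/2021"]
```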


I can confirm that, in a non-bank financial institution, the date formats involved in an Excel->CSV->XML->certain (shit) magic application pipeline were a considerable pain point :|


Or those pesky Europeans writing 12,34 when they obviously meant 12.34!


> There is an uncharitable view (sometimes expressed internally too) that Minerva as a whole is a grand exercise in NIH syndrome.

My brief experience with this (in an adjacent area - proprietary trading) was that the more charitable view is that these firms need to be able to fully own their software stacks, and have the resources to pay for that luxury.

Reading these descriptions from the article, I can't help drawing a connection to the Smalltalk ecosystem. It sounds like, to at least some extent, what these banks have built is a system that exhibits many of the more interesting characteristics of an enterprise Smalltalk system, only on top of a tech stack that they could own from top to bottom.



Sounds like the Quartz platform at Bank of America. When I interviewed with that team they joyfully espoused the virtues of their Principal Engineer who quote, “created the database software from scratch!”

Edit: For the record, I implement third-party vendor Excel functions (DLLs written in C++) in C# and it’s a great way to send useless processes to the shadow realm.


That's why working as the FIRST generation of programmers in any big financial shop is so fun. You get total ownership of whatever you build, and others totally rely on you for their jobs. Even better, you can take months to reply to a requirement, if it doesn't come from a key stakeholder.

Edit: also applies to any large corporation.


I'm curious about experiences in other similar orgs:

I work as a portfolio manager for a large reinsurance/insurance company but spend significant time in SQL, Python, Excel (not unexpected I'm sure).

The Walpole platform in the article struck a chord with me. We built something roughly similar - call it Trek - that handles jobs. Jobs encompass lots of tasks - reading/writing from Excel, executing SQL, running Python, running C#. I could list many limitations, but realistically, the biggest limitation is that the platform can't handle something that it can't configure and run. In other words, the platform isn't set up with R - so no one creates data pipelines/jobs that use R. Lots of people here use R (among other tools).

One key problem (maybe?) is that all this action happens inside the business. Trek was built by a talented actuary/programmer. No software engineering org involvement at all. I'm sure lots of folks here can imagine why: lots of red tape, general aversion to software that isn't already here, long stretches of time to get things done. Also, frankly, lots of our software devs write bad code.

For folks familiar with the orgs in this article, and other similar orgs: is what the article discusses happening mostly with software devs in IT functions? Are these folks embedded in the business? And are there also business-oriented folks - analysts, investment professionals, etc. - using the more technical bits of the systems?

Realize the lines are very blurry these days, but interested to learn from everyone here about the types/roles of end-users


I work in a Technical PM role at a large North American Insurance company and used to work at one of the largest Banks as a Sr. Business Analyst (or Sr.Systems Analyst depending where you are).

>lots of our software devs write bad code

Ultimately, you get what you pay for, all our full stack devs are making 6 figures... While it might not be FAANG money they also almost never work overtime and the stress levels are relatively low.

>Are these folks embedded in the business?

No, that's my job. As a Tech PM I'm supposed to know exactly the business requirements, what my guys can do (to manage expectations) and any limitations of the software/business. I find the best PMs are the ones that have some dev experience but also have extensive people skills and understand how to manage stakeholders.

>are there folks using the more technical bits of the systems that are business-oriented?

It varies. I started off as that business-oriented person (corporate finance) and eventually made my way over to the data side of things and finally some programming work, and now I'm running projects. While you won't get many analysts/portfolio managers doing dev work, I do try to get them to have a hands-on approach, especially when doing QA and UAT work.


> Ultimately, you get what you pay for, all our full stack devs are making 6 figures... While it might not be FAANG money they also almost never work overtime and the stress levels are relatively low.

Yes, it's the blessing and the curse. They are lovely people. They have lives. They aren't working 24/7. But there is misalignment between senior folks who want to innovate and build internal tech around core IP and the talent level of the folks tasked with actually getting that done. Insurance hardly unique in that regard, but an acute issue nonetheless. Firms like GS, JPM, etc have fatter margins (I think) and can afford to pay devs/strats/etc.

Interesting to hear that you went from corporate finance to technical PM. Quite the journey. Would love to dig more into that if you're willing.


>Firms like GS, JPM, etc have fatter margins (I think) and can afford to pay devs/strats/etc.

From what I've seen on my end the Insurance firms have started paying somewhat of a premium to compensate for the lack of "excitement" that is associated with the insurance industry as a whole.

>Would love to dig more into that if you're willing.

Always willing to chat!


Really interesting comments.

I'd guess that you have proprietary (e.g. valuation / capital calculation) systems that need to interface with Trek in some way. Could you share how you've approached that at all?

Also not clear why R couldn't be added to Trek alongside Python and C#?


Certainly - let me try to share succinct version germane to my day-to-day

- We regularly perform group/segment level risk roll-ups. Involves running computationally expensive (by insurance standards) in-house and third-party models that estimate loss from hurricanes, earthquakes, etc. A lot of our insurance data is unsurprisingly stored in disparate systems that don't talk to each other, and in some cases, don't have any useful interface that someone like me can query/view. Things are changing, but still quite a lot of history to overcome

- We also have lots of stuff from outside underwriting parties in form of Excel, CSV, MDF files.

- We have to bring all that together to make sense of the portfolio, so we use Trek to do a lot of the various involved tasks like running the models, processing CSV data, processing Excel data, attaching databases, creating dashboards in PowerBI (tangent: hate it)

- Sample pipeline: query portfolio data from one DB, read CSV file from another third-party, pull both into risk model, kick off analysis, then execute a script to pull together results in the model's databases or elsewhere.

Happy to expand or answer other q's as you have them.

As for your other comment about R - it's just a matter of the install. Someone has to install R so that Trek can use it. Not a major problem. Pointed this out as a contrast in our org that is probably smaller and has many fewer devs compared to what I'm reading in the post where "bank python" sort of feels like the platform in which everything happens / everything is configured.
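A toy sketch of a Trek-style step like the sample pipeline above, using stdlib stand-ins (sqlite3 for the portfolio DB, the csv module for the third-party file; every table, file, and column name here is invented for illustration):

```python
import csv
import io
import sqlite3

def run_pipeline(db, csv_text):
    """Join portfolio rows from a DB with third-party exposure data from a
    CSV, producing the merged records a downstream risk model would consume.
    (Hypothetical schema: portfolio(policy_id, region); CSV: policy_id, tiv.)
    """
    exposures = {row["policy_id"]: float(row["tiv"])
                 for row in csv.DictReader(io.StringIO(csv_text))}
    joined = []
    for policy_id, region in db.execute("SELECT policy_id, region FROM portfolio"):
        joined.append({"policy_id": policy_id, "region": region,
                       "tiv": exposures.get(policy_id, 0.0)})
    return joined  # next step would kick off the model run on these rows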


Thanks for a really comprehensive reply. Enjoyed your comment on PowerBI.

My background is mainly life which is dominated (at least in Europe) by computationally demanding proprietary liability modelling systems but I think Python / R is getting a foothold in capital calculation / aggregation.

My perception that there is a lot more use of in-house models in the GI / property-casualty worlds so more Python etc but sounds like you still have to interface with proprietary modelling systems.


Absolutely - and for quite a long time (I work mainly property).

There's not much (if any) appetite to completely rebuild 3rd-party geophysical vendor models. Those folks have 20+ years of work behind them and a different talent base (e.g. different types of scientists building the base models).

But we do focus on all the other stuff. Making data input easier/more accurate. Same thing re: output. Also the vast majority of our capital and group-level risk work happens in-house - R, Python, etc.


>read CSV file from another third-party

Generally, if you're getting a CSV file from a client, that indicates to me that there's a good chance they're exporting info from their systems and sending you the CSV.

Have you considered developing a client facing API that can be used to send and digest data?


We have, but that's not something my team (actuarial) can accomplish without help/blessing/oversight from IT.

I don't know all the details, but we do have connections to some of our MGAs through XML dumps or perhaps real-time feeds (I'm doubtful). But that data is often missing some of the details that I need. It's useful for policy admin - not for all the other stuff.

We've also explored portals but those are fraught with concerns about double-entry.


Title should be ".. of investment bank python", trading and risk has little in common with a retail digital bank like say N26.

The problem with these projects is that the folks leading them have never built a real trading system in their entire lives (the ones who have been there for many years worked with end-of-day batch systems), and there is a layer of useless and incompetent "business analysts" who hide their incompetence by finding ways to malign developers.

Pro tip: Dont work for a bank before assessing its open source repos. They have none? Run in the opposite direction.


When you say "The problem with these banks..." you mean the ones like N26 (not investment banks)? (Seems like coffee hasn't kicked in for me yet)


Edited.


Thx


> In order to deploy your app outside of Minerva you now need to know something about k8s, or Cloud Formation, or Terraform. This is a skillset so distinct from that of a normal programmer (let alone a financial modeller) that there is no overlap.

This rang a bell. How did deployment become such an arcane skill?


There are too many factors here.

One of them is you don't own where your code runs anymore.

This might scare some people until they realize they can get high availability without waiting 3 months for some server to arrive, but it makes deployment harder.

I still believe the main reason people adopt CI/CD today is that "suddenly" deployment in a complex environment becomes easy and software gets tested. A lot.


> One of them is you don't own where your code runs anymore.

That's not new. The first money I earnt from computers was making websites. We didn't own the webservers; we rented webspace from some company. To deploy it, we literally just uploaded PHP files to a place using an FTP client. It was that simple and it worked.


Your example highlights the value chain of CI/CD!


1. Sysadmins had to find new careers after cloud providers destroyed their livelihood.

2. cloud providers try very hard to lock you in, by offering all sorts of advanced goodies. They tend to come with a learning curve, and they all accumulate. Sooner or later someone comes up with cross-provider solutions, and they too have learning curves.

3. inventing new ecosystems means creating new work for advocates and ninjas. You don't become a rockstar by diligently doing what has been done before, but by finding (or inventing) a niche and becoming a guru.

4. some problems are indeed hard to solve, and the more products try to do that, they more they get complex.

5. everyone thinks they will have Facebook-scale problems, even when they never will.


It is not in the value stream.

Our job as sold by the zeitgeist is to write code for features. Fix bugs. And run production.

The logistical part in the middle, how to get the code from commit to prod, is owned by no one and is not considered worth a budget.

Therein lies the reason we have tools to configure prod, but nearly no tools to deploy code. Even Docker and k8s dodge that.


> How did deployment become such an arcane skill?

Prior to industry-wide "solutions" such as Docker (or rather, Linux namespaces and cgroups), there was no obvious process isolation, so "deployment" was copying tarballs or using Windows installers.

Also, investment banks want "support" from vendors, so hardware was either Windows servers (mostly for Exchange and AD), or Sun Solaris boxes.

So although Linux cgroups came around 2006 (?) and namespaces in 2001 (?), banks didn't do much with Linux until after 2005 (when Red Hat was providing the aforementioned 'support'). I don't think the 'industry' widely recognised the potential of cgroups and namespaces.


Nahh mate you want to talk to the facilities team, they deal with that sort of stuff.


I can report from another bank (in the top 10 globally), that recently moved from a more bespoke system (not even on Python) to having Python+Notebooks+Labs available to all - using Apache products and a global Anaconda-like Python distribution. The fact that you can use the Python, R or whatever programming language seems to be a factor.


Would you mind telling which Apache products? I've been thinking about pushing that around my organization (mostly for replacing reporting/...), but especially the I/O interfaces are not that great if you are not living in a central database yet.

I imagine something container-like with the notebook/batch job (it can be anything really) hooking up to data sources such as SMB shares, thus allowing people who want to automate generating report Z to just request access to folder X for their job and thereby seamlessly create dashboards etc., even if a lot of the org is still using "traditional" workflows.


> This kind of Big Enterprise technology however takes away that basic agency of those Excel users, who no longer understand the business process they run and now has to negotiate with ludicrous technology dweebs for each software change. The previous pliability of the spreadsheets has been completely lost.

> Financiers are able to learn Python, and while they may never be amazing at it they can contribute to a much higher level and even make their own changes and get them deployed.

Coming from a slightly different part of the finance world (insurance) this rang very true.

I think there is a huge opportunity here to build on the Python ecosystem - which is gaining more and more ground - and provide much more powerful alternatives to Excel and legacy proprietary systems.


Also in insurance. I'm a big fan of using python to generate "read-only" pretty formatted workbooks. It makes the process more reproducible but people who "need my data in excel" still get that.


Sounds like it could be JPM's 'Athena' platform?

context: https://www.techrepublic.com/article/jpmorgans-athena-has-35...


Yes, but it is also probably similar to what they have in GS & BAML.

Model seems to originate from GS: https://news.ycombinator.com/item?id=29104401


Certainly does. Some discussion here from a few years back:

https://news.ycombinator.com/item?id=23819270


They are all similar but in this particular case this is definitely BAML's Quartz.


I think Minerva is clearly a reference to Athena.


Could be a misdirection because all of the rest fits Quartz to a tee.

The Quartz database is called Sandra (referred to as Barbara here).

The Quartz directed acyclic graph is called Dag (referred to as Dagger here).

The Quartz job runner is called Bob (referred to as Walpole here, which is a reference to Robert Walpole, whose short name is... Bob).

These and the horrible proprietary IDE make it obvious which particular system he's describing.


How are the Barbara databases synchronized, since multiple nodes are mentioned? The description makes it sound like it's just a large set of pickles in something like a Berkeley DB.


Each server in a ring has a complete copy of the data for that ring. Each ring consists of a network of servers which may have nodes in different geographies. They're called rings, but are actually acyclic networks (IIRC).

Replication occurs automatically, so you need to manage consistency in your app architecture. For example, if you have instances of an app running in different geographies, the specific data for those instances should be in different folders.


I think Athena has equivalents to all of these but I don't know what they're called. I only know Qz.


The Athena database is Hydra; the Athena graph is Pixie; the Athena job runner is also Bob.


I find it amusing that Bob made it in most of the banks, but few of the other names stuck.


Always funny to see the objections of new hires without finance experience to the use of floating point for pricing. It's more related to the inherent inaccuracy of any pricing model, though, than to clients not caring about pennies.


Exactly. Floating point is inappropriate for anything related to accounting (payments, balances, etc.), but when the numbers you're producing are effectively forecasts or estimates, it's no different from what floats were originally invented for (numerical modelling in physics and engineering).
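To make the accounting half of that concrete (a minimal Python sketch, not from any bank's codebase): binary floats cannot represent most decimal fractions exactly, while an exact decimal type can.

```python
from decimal import Decimal

# Binary floats cannot represent 0.1 or 0.2 exactly, so the sum drifts:
print(0.1 + 0.2 == 0.3)  # False

# Decimal stores base-10 digits exactly, which is what ledgers need:
print(Decimal("0.10") + Decimal("0.20") == Decimal("0.30"))  # True
```

For forecasts and model outputs, the drift is dwarfed by model error anyway, which is the point being made here.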


I remember someone demanding that we needed to run our Monte Carlo pricer on 1024 paths, that 256 just wasn’t precise enough and one of the risk guys said “Well since we know our assumptions are wrong I’m not sure what difference it makes.”
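The risk guy's point can be put in numbers (a toy sketch, not any bank's actual pricer; the payoff and its distribution are made up): the Monte Carlo standard error shrinks like sigma / sqrt(N), so quadrupling the paths from 256 to 1024 only halves the statistical noise.

```python
import math
import random
import statistics

random.seed(42)

# Hypothetical call-style payoff max(S - K, 0) with S from a toy distribution.
payoffs = [max(random.gauss(100.0, 20.0) - 100.0, 0.0) for _ in range(4096)]
sigma = statistics.stdev(payoffs)

def standard_error(n_paths: int) -> float:
    # Monte Carlo standard error scales as sigma / sqrt(N).
    return sigma / math.sqrt(n_paths)

print(standard_error(256) / standard_error(1024))  # 2.0: 4x the paths, half the noise
```

If the model's assumptions are off by more than that remaining noise, the extra paths buy precision without accuracy.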


The difference between precision and accuracy in a nutshell.


It can cause issues if you are accounting for things, e.g.

"The sum of these values needs to equal the sum of these values"

In that case you'd need to avoid

"sum_a == sum_b"

and use instead

"abs(sum_a - sum_b) < SOME_SMALL_VALUE"
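A minimal Python sketch of that check (sum_a, sum_b, and the tolerance are illustrative, not from any actual system); the standard library's math.isclose does the same job with a relative tolerance, which scales better than a fixed SOME_SMALL_VALUE:

```python
import math

# Ten payments of 0.10 should total 1.00, but binary floats drift slightly:
sum_a = sum([0.1] * 10)  # 0.9999999999999999
sum_b = 1.0

print(sum_a == sum_b)             # False: exact comparison fails
print(abs(sum_a - sum_b) < 1e-9)  # True: absolute-tolerance check
print(math.isclose(sum_a, sum_b)) # True: relative tolerance (default 1e-9)
```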


You don't do that, though; there's no use case for accounting with the output of a model.


I'm not sure what you mean - aside from speculative pricing models, there are regulatory constraints too that are part of the same codebase.

Not to mention that there is a use case for auditing pricing models (as in external requirement, or internally) or comparing alternative models.


Sure, we add up the numbers, but there are thresholds for everything anyway. We might not be concerned about something within a 1MM range, let alone a floating-point inaccuracy. The uncertainty is accounted for already.
