Python toolkit for quantitative finance (github.com/goldmansachs)
285 points by tzury 9 months ago | 60 comments



You can also check out OpenBB, one of the best-known projects in this category: https://github.com/OpenBB-finance/OpenBBTerminal

Most data vendors will require a free API key because that's their GTM. They want you to create an account with them to get a free API key and then expect to be able to upsell you over time.

Or you can access "free data" (e.g. yfinance) that relies on projects that actively scrape financial data from a website. These tend to need a lot of updates from the main maintainer, since there are no aligned incentives between the maintainer of the scraped API and the company that has the data.

PS: I'm the main creator behind the OpenBB project on GitHub.


> Most data vendors will require a free API key because that's their GTM. They want you to create an account with them to get a free API key and then expect to be able to upsell you over time.

> PS: I'm the main creator behind the OpenBB project on GitHub.

As you point out the GTM of others, can you tell us what your GTM is?


Of course.

We have a paid enterprise product: OpenBB Terminal Pro (https://openbb.co/products/pro).

In it, users have access to several datasets that we have redistribution rights for. This enables users to export any dataset but also to access that same financial data through our Excel Add-in.

Most retail financial products only have display rights as it's much cheaper than the redistribution license.


Is there a way out of this dilemma?

Are there any decent APIs for UK data, by the way?


You can pull public company data straight from SEC / EDGAR. They have a public and free API
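
A minimal sketch of what that can look like, assuming the `data.sec.gov/submissions/CIK##########.json` endpoint and SEC's request that automated clients send a descriptive User-Agent. The helper names and the contact string are made up for illustration, not part of any library.

```python
import json
import urllib.request

# Assumed endpoint shape: the CIK is zero-padded to 10 digits.
SEC_SUBMISSIONS_URL = "https://data.sec.gov/submissions/CIK{cik:010d}.json"


def submissions_url(cik: int) -> str:
    """Build the URL for a company's filing index (hypothetical helper)."""
    return SEC_SUBMISSIONS_URL.format(cik=cik)


def build_request(cik: int, user_agent: str) -> urllib.request.Request:
    """Attach a User-Agent that identifies the caller, as SEC asks."""
    return urllib.request.Request(
        submissions_url(cik), headers={"User-Agent": user_agent}
    )


def fetch_submissions(cik: int, user_agent: str) -> dict:
    """Download and parse the JSON document (requires network access)."""
    with urllib.request.urlopen(build_request(cik, user_agent), timeout=30) as resp:
        return json.load(resp)
```

Something like `fetch_submissions(320193, "Jane Doe jane@example.com")` would then request Apple's filing index (CIK 320193).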


Not really.

We have the OpenBB Platform CLI, a command line interface (CLI) that gives users access to a lot of financial data, but they need to bring their own API key for each provider.

The reason why this works is that once you run the CLI, you are running it on your machine and leveraging your own API keys, for which you agreed to the vendor's Terms and Conditions when you subscribed.

And those terms usually state that you cannot use the data for any commercial purpose.

This is what most data vendors do. Not only because they are trying to upsell you but because giving you access to data costs them money.

So, most of the APIs that you find that are free and have the rights to distribute that data are usually from governments (e.g. FRED or Companies House https://developer.company-information.service.gov.uk/get-sta...)


The only utility here is to study the design.

Access to anything useful is behind GS specific data APIs via https://developer.gs.com/docs/gsquant/authentication/gs-sess...


I was really surprised there aren't any real examples here. You have a couple of videos linked, but am I really going to pause the video and hand copy each line of code?


How could one study said design? By generating class diagrams or what?


By reading through the code, understanding how they've laid out their abstractions and user interfaces. I am in the field, and in my experience, the most impactful design decisions for libraries like this are related to how researchers will actually interact with the tool.


This seems pretty basic, really just classes of common data structures used in finance. Closer to what you would expect for a final project in an undergrad OOP course.


I have bad news for you. Most code you tend to come across is at that level.


That's what you find in a lot of domain specific libraries written by scientists, mathematicians, etc. Professional engineer-quality code written by people who aren't professional engineers is rare. Or it's some enormously popular library that has had a lot of attention from engineers over the years.


The toolkit may be free, but the data is very expensive.


Does GS still use their proprietary Slang language or has that been phased out in favour of Python?


Still very much present, it powers "SecDB" (which is pretty much the nervous system of the entire markets business). While there's certainly been openness to Python/etc and tech to integrate Slang into the 21st century, it's the kind of thing that's hard to imagine ever being phased out.


I worked at Bank of America for a while on the Quartz platform, which is their Python-based clone of SecDB. The lead architect was one of the founders of the Slang/SecDB platform at Goldman. It was great fun to work on. Incredible power at your fingertips.


Rather than just dump on it, I’d find it more interesting to hear how it came to be and what it’s used for internally (if so).

I’ve seen surprising stuff that had rational reasons under closer investigation. Companies have cultures and internal priorities that make sense when you’re inside the bubble, but look weird from outside.


Huge missed opportunity here to name the library "vampire-squid"...

From the README this looks like a piece of advertising to developers about what GS does, more than anything useful to the outside world.


I wonder why gs-quant/gs_quant/timeseries/statistics.py contains the SIR compartmental model class for infectious disease transmission, as well as the SEIR model (https://en.wikipedia.org/wiki/Compartmental_models_in_epidem...)...


I wouldn't touch anything goldmansachs with a 10 ft pole

https://en.wikipedia.org/wiki/Sergey_Aleynikov


I did a trawl for quant libraries a year ago and didn't see this. Any other big firms out there with quant libs esp vol related?


what's the use of epidemiology models in quant finance?

https://github.com/goldmansachs/gs-quant/blob/master/gs_quan...


"Copyright 2020 Goldman Sachs."

Probably added in panic mode around March 2020?


Aha. Wonder if they ever actually used it xD


For those interested in applying some of the models to crypto, I suggest checking out ccxt.



What about the @plot_function bit? Also, wrapping dependency calls for a more consistent interface isn’t necessarily a bad thing.


Does someone have LOC as a performance indicator?


Unironically probably. The only place I've ever worked that used LoC as a performance indicator was a hedge fund.


It’s so bizarre my first reaction was that surely it must be so an LLM can make some kind of sense of it, otherwise what on earth are they doing?


The pointless act of "encapsulating" nothing? I see that a lot unfortunately =[


Presumably @plot_function does something. Also there are lots of other functions in the same API that are more than one line.

I suspect he wouldn't have thought it was over engineering if it didn't have such a long comment for one line of code... Which is silly.


[flagged]


I’m a trader


Surely you've worked with other traders who get confused by import statements. Whole point of this file is to provide a common interface for common math operations that might be exposed by either plain python or pandas/numpy functions.

You're obviously not the target market if you're able to review good and bad code.
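
A minimal sketch of the "common interface" idea described above: one flat module of named operations, each a thin wrapper over whatever library actually does the work. The `plot_function` decorator here is a hypothetical stand-in that only registers the function. It is not gs-quant's actual decorator, whose behaviour is not shown in this thread.

```python
import statistics
from typing import Callable, Dict, Sequence

# Registry of functions a front end could offer to users (assumption).
PLOTTABLE: Dict[str, Callable] = {}


def plot_function(fn: Callable) -> Callable:
    """Hypothetical decorator: record the function by name, return it unchanged."""
    PLOTTABLE[fn.__name__] = fn
    return fn


@plot_function
def mean(values: Sequence[float]) -> float:
    """Arithmetic mean of a series.

    A one-line wrapper: callers use one name and one signature,
    whichever library (here the stdlib) sits underneath.
    """
    return statistics.fmean(values)
```

The wrapper body is trivial on purpose; the value, if any, is in the stable name, the docstring, and the registry entry rather than in the computation.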


That was unnecessarily mean. What did you mean?


This code commits so many sins. The contributing standards are so strange. And what is up with the licensing?

Looking at this code hurts my eyes.


The financial industry never considered a serious open source strategy to be aligned with their interests and that has painted the sector into increasingly narrow corners.

Compare, e.g., the acumen of the adtech sector, which supports (among countless other things) the most used open source mobile OS, the most used open source web browser, the most sophisticated open source suites for machine learning etc. etc.

In fact a good reason why "adtech" is (absurdly) considered part of "big tech" is that no other business sector has managed to articulate a long-term sustainable digitization story.


The problem is that the models (closed source or open source) only get you part of the way. For example, (to name just a few items) a stock option pricing model is useless without

- holiday calendars

- ex dividend dates

- interest rate curves

- real-time stock prices

- corporate actions database

Are there open source and free sources of the above? For the first two, sort of, for the remainder, no. And I'm sure I'm forgetting a number of other inputs.
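
To make the dependency on those inputs concrete, here is a minimal sketch of a European call under Black-Scholes with a continuous dividend yield. That model choice is an assumption for illustration (real equity options usually need discrete dividends and early exercise), and the function name is made up. Every argument is something the model cannot produce itself.

```python
from math import exp, log, sqrt
from statistics import NormalDist


def black_scholes_call(spot, strike, rate, dividend_yield, vol, years):
    """European call price; all inputs come from outside the model.

    spot           - needs a (real-time) stock price feed
    strike         - may be adjusted by corporate actions (splits etc.)
    rate           - read off an interest rate curve at the expiry date
    dividend_yield - stand-in for the actual ex-dividend schedule
    vol            - implied or estimated volatility
    years          - time to expiry, which depends on the holiday calendar
    """
    d1 = (log(spot / strike) + (rate - dividend_yield + 0.5 * vol ** 2) * years) / (
        vol * sqrt(years)
    )
    d2 = d1 - vol * sqrt(years)
    cdf = NormalDist().cdf  # standard normal CDF
    return spot * exp(-dividend_yield * years) * cdf(d1) - strike * exp(
        -rate * years
    ) * cdf(d2)
```

The formula itself is ten lines; sourcing trustworthy values for the six arguments is the expensive part.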


You can limp into a fair bit of the corpact data via open source/free channels, but reliable sources are definitely expensive.

Not to mention the real-time data which is, quite simply, catastrophically expensive. And that’s assuming the least sophisticated (retail) implementation of this stuff.


I can't see the financial industry getting behind FOSS, but there's very much a missed opportunity for Data API companies. Let trading firms pay standard rates to get access to high-resolution, realtime data; the secret sauce in each firm then becomes what kind of trading algorithm you write to make the most profit off that data. Ensuring that everyone has potential access to the same underlying data helps dissuade claims that profits are made from insider trading. There should be all kinds of data for all kinds of domains available when you fork over a little money for an API key.


Any data requirement in the above list that is public knowledge can be solved (in principle), but it takes coordination between parties that are not used to collaborative/coopetitive behavior.

There is also the bit of data cleaning work that is costly - somebody must be paid to design and operate it, but again with modern tech solutions it's likely that this could become immaterial.

Yet there is a broader challenge beyond concrete applications: the financial industry is 100% an information processing industry but is largely inconsequential and absent in the development of modern digital technology.


Yes, the US Treasury has an open API to get the yield curve from, and Yahoo Finance has a free stock price API.


That’s not the correct curve to use for pricing in general. You’d infer discount factors from overnight indexed swaps (OIS) instead as the overnight rate (ESTR, SOFR, SONIA, etc.) is what is used for collateralisation typically. To create such a discount curve you need OIS swap rates.
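
A minimal sketch of inferring discount factors from OIS rates, under heavy simplifying assumptions: par rates quoted for 1, 2, ..., n years, annual fixed payments, and an accrual fraction of exactly 1.0 per period. Real curves need day counts, holiday calendars, and single-payment short-dated swaps; the function name is made up. Each par rate s_n satisfies s_n * (DF_1 + ... + DF_n) = 1 - DF_n, which is solved for DF_n one maturity at a time.

```python
def bootstrap_discount_factors(par_rates):
    """Bootstrap annual discount factors from par OIS rates.

    par_rates[k] is the par rate for maturity k + 1 years (assumption:
    annual payments, accrual fraction 1.0 for every period).
    """
    discount_factors = []
    for rate in par_rates:
        # Sum of the discount factors already solved for earlier maturities.
        annuity_so_far = sum(discount_factors)
        # Rearranged par condition: DF_n = (1 - s_n * sum_{i<n} DF_i) / (1 + s_n)
        discount_factors.append((1.0 - rate * annuity_so_far) / (1.0 + rate))
    return discount_factors
```

With a flat 5% par curve this reproduces annually compounded discounting, e.g. `bootstrap_discount_factors([0.05, 0.05])` gives 1/1.05 and 1/1.05**2 up to floating point error.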


Several IBs and hedge funds are the largest contributors and maintainers of some of the most popular libraries in Python.

Jane Street keeps OCaml alive.


[flagged]


Isn't the interview the same anyway, just that candidates from top schools get interviews and the rest don't?

I would love to work at a hedge fund or a trading company, but since I don't have experience at a similar company or a prestigious education, I feel like I have no chance at all of even getting an interview. I would love to know how realistic it is to get a job at one of these companies with a non-conventional background or if I should give up trying.


My advice. Attend tech meetups and networking events that quants frequent. Socialize with them, and try to get them to refer your resume for open positions. Side projects and contributions to open source projects will help get their attention too.


It is very unrealistic to land a gig at places like Goldman Sachs if you don't have the socio-culturally compliant identification card that is bullshit education. There is really no point in trying.


This isn't too far from the truth. In a twisted way, firms outsource the screening process to the admissions process of elite schools. Also, the reputation of elite schools adds credibility to your team, which is very important to get clients to trust you enough and pay you lots of money to manage their finances.


else: print("lol no") pass


[flagged]


JP Morgan and Bank of America have Slang/SecDB clones built by devs hired from Goldman; they both use Python instead of Slang. They’re heavily used for risk management applications and quantitative analysis, among other things.


Nope, all critical internal models for pricing and risk management are written on top of SecDB in a language called Slang. This package is something so that hedge funds can access GS data APIs.


Cool to see some "bank Python" that Cal Paterson had described previously is now open source https://calpaterson.com/bank-python.html


Nah, this looks pretty orthogonal to that. This just looks like a collection of pure Python libraries for doing common quant work. The thing Cal Paterson is describing (which is pretty transparently JP Morgan's Athena) would be SecDB at Goldman and would be running on their proprietary scripting language called "Slang". None of that is open source.

Goldman was the first place to do a system like that, and when it was copied at other investment banks like JP Morgan and Bank of America, they opted to use Python instead of an in-house language and so "bank python" was born. Actually, the banks all poached engineers from one another, so many of the people that built the system at one place ended up building it again at another, hence why there are so many similarities between the equivalent systems at all these US investment banks. Some of those people eventually went on to build it again as a SaaS offering: https://www.beacon.io/


Beacon is more PaaS than SaaS from what I've seen, but it's all very neatly integrated and they even wrote their own compute scheduling engine. The data model is interesting: https://www.beacon.io/wp-content/uploads/2021/05/5.-WhitePap...


> Beacon is more PaaS than SaaS from what I've seen

You are correct. (Full disclosure: I work at Beacon.)

Financial institutions are extremely sensitive about where their data is held, processed, stored and/or sent to. Some of it is just basic corporate governance ("we do not like the additional risk"). Some you could lump in with secrecy and competitive edge ("this is our secret sauce, no way are we going to let anyone else get it"). Some is driven by regulations ("we hold/process highly sensitive financial and personal data on individuals, sending it to a third party is a huge no-no"). And some is just garden variety contract obligations.

[Note that I intentionally chose to omit any consideration for "plain" security. In this industry that can get political.]

Where data governance/sovereignty is concerned, the term "SaaS" is commonly understood as: "send data to a third party, get results back". You can imagine how well that plays with any data an institution considers precious.


In the readme, they should write a tutorial on how to make money with this. I mean, is there any other reason for using this software?


> In the readme, they should write a tutorial on how to make money with this.

You don't make money with this.


To make money, write software for people who will make money.


Trading is a zero sum game, no one is ever going to give up their secret sauce



