The anatomy of an ML-powered stock picking engine (principiamundi.com)
298 points by muggermuch on Sept 27, 2022 | 103 comments



This was a very enjoyable read. I built a nearly (architecturally) identical system a few years back that also had to be scrapped for different reasons. This brought back a lot of memories. The sanity checks, the index reconstitution issues, dealing with the insanity of security identification and tracking through time.

The fun cases are the ones where it's not even clear what the right answer truly is, e.g. company A spins out company B, and then 5 years later they re-merge. Whose time series and associated data is "the" canonical one? The data vendors often try to give their answers to this question, but maybe their answers don't make sense for your analysis.

Then there's the fact that a lot of vendors don't really do point in time correctly. They like to go back and helpfully revise data points for you that they or the company initially misreported. This is all well and good except that if you were trading for real, you wouldn't have known the correct information at the time, and so any backtest based on the updated information will be invalid. Vendors are a bit better now about providing true point in time data sets, or at the very least accurately describing when they are/aren't doing this. But we had a few cases where they said they were, but they definitely weren't.
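To make the point-in-time problem concrete, here is a minimal sketch of an "as of" lookup. The revision records, field layout, and the `as_of` helper are all hypothetical and not taken from any vendor's API; the idea is only that every value carries the date it became known, and a backtest may use nothing published after the simulated date.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical revision history for one metric of one company.
# Each record: (date the value became known, fiscal period, value).
REVISIONS = [
    (date(2020, 2, 10), "2019Q4", 1.00),  # initial report
    (date(2020, 5, 11), "2020Q1", 1.10),  # initial report
    (date(2020, 6, 1), "2019Q4", 0.90),   # later restatement of 2019Q4
]


def as_of(revisions, period, when):
    """Return the value for `period` as it was known on `when`, or None."""
    known = sorted((k, v) for k, p, v in revisions if p == period)
    dates = [k for k, _ in known]
    # Number of revisions published on or before `when`.
    i = bisect_right(dates, when)
    return known[i - 1][1] if i else None


# A backtest running "in March 2020" sees the original figure,
# while one running "in July 2020" sees the restated one.
march_view = as_of(REVISIONS, "2019Q4", date(2020, 3, 1))
july_view = as_of(REVISIONS, "2019Q4", date(2020, 7, 1))
```

A vendor that overwrites the first record with the restated value makes the March lookup return the July answer, which is exactly the leak described above.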


Thank you, and ha! An emphatic yes to all the points you raised! It's especially daunting when there are multiple vendors with incompatible point-in-time hygiene setups, a situation I faced at the beginning of setting up Didact.

Also, this was really my first time with equities - my professional trading career was derivatives-focused - both listed (CME) and OTC (FX forwards/swaps). I think I lost the first few months simply trying to reorient my style of thinking.

Water under the bridge I guess.


Yea, mergers and acquisitions is a hard table to incorporate.


Someone asked about how difficult it is to get outside investment....

It's usually very difficult and it takes a lot of money to run a proper fund.

Let's say you raise $50M. You can maybe charge 1 and 20, meaning you get 1% of assets each year for running the fund and 20% of profits.

1% of $50M (and keep in mind this is a large raise for someone without a track record on the sell side or inside another fund) gives you $500,000 a year to pay:

- salaries (let's say you pay yourself $100,000 all in, plus the same for a single analyst)

- a Bloomberg terminal $30,000 including data feeds

- market data feeds you need $25,000/year for basic market data and fundamental data that you are allowed to warehouse (you can't store data you get from the Bloomberg terminal).

- rent $50,000/year for office space

- outside lawyer fees and outside accounting fees $100,000/year

- similar fees for someone to run your back office, roughly $100,000/year.

And on the other side of expenses you have the money making side of things, which as the OP pointed out isn't great. If you return 10% on the $50M you get to keep 20% of that, so a 10% return gives $5M in profits and you keep $1M.

That allows you to bonus out yourself and analysts on good years. If you lose money one year then you get no bonus and have to bonus out the employees out of the retained earnings you kept from previous bonuses.

It usually gets worse, as most funds have what's called a high water mark. This means you don't collect the performance fee until your fund gets back to the high water mark. So if you are down 10% one year you need to make that back before you start to make any performance fee, which is why most funds shut down if they go down more than 20%.
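The fee arithmetic above can be sketched in a few lines. The expense figures are the ones listed in this comment; the fee mechanics are deliberately simplified and are assumptions of the sketch (fees on start-of-year assets, performance measured before fees against the high water mark, no investor flows), so treat it as an illustration and not as how any real fund document computes fees.

```python
# Yearly costs as listed in the comment above (USD).
EXPENSES = {
    "salaries (manager + one analyst)": 200_000,
    "Bloomberg terminal": 30_000,
    "warehousable market data": 25_000,
    "office rent": 50_000,
    "outside legal and accounting": 100_000,
    "back office": 100_000,
}


def fund_fees(start_aum, yearly_returns, mgmt_rate=0.01, perf_rate=0.20):
    """Yield (management_fee, performance_fee) for each year.

    Simplified: fees are based on start-of-year assets, the performance fee
    applies only to gains above the high water mark, and there are no
    subscriptions or redemptions.
    """
    aum = start_aum
    high_water = start_aum
    for r in yearly_returns:
        mgmt_fee = mgmt_rate * aum
        end_value = aum * (1 + r)
        gain_above_mark = max(0.0, end_value - high_water)
        perf_fee = perf_rate * gain_above_mark
        high_water = max(high_water, end_value)
        aum = end_value - mgmt_fee - perf_fee
        yield mgmt_fee, perf_fee


total_expenses = sum(EXPENSES.values())
# Hypothetical path: up 10%, down 10%, up 10%.
fees = list(fund_fees(50_000_000, [0.10, -0.10, 0.10]))
```

With these inputs the listed costs add up to roughly the whole first-year management fee, the first year pays the $1M performance fee mentioned above, and the third year's 10% gain pays no performance fee because the fund is still below its earlier peak.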

As to raising money... Anyone can show a model that makes money. That doesn't mean it's easy to create a model, it's just that there are a lot of people capable of building such a model.

It's the risk management that people with money are really looking for, and sadly that's really hard to show from a model, as part of risk management is things like position sizing and showing that your model doesn't pile into one asset class or trade correlated products.

It bodes well for the OP that they talk about market regimes as, IMHO, this is one of the biggest risk management tools that aspiring traders ignore.

And this risk management is why people ask for a track record of more than a year.


Working in the industry, I can confirm that the above numbers are approximately correct except for the employee costs -- those are roughly double and up. You also need to hire a fund administrator, auditors and compliance firms (maybe $50k to $100k per year each) which add on even more costs. And you can't skip the lawyers, outside administrator, outside compliance, etc. as they are required by regulations/law.


This is actually way too optimistic.

Your first 1-2 seed investors will:

- Only pay 1 and 10 (1% fixed fee and 10% of PNL)

- They will also get ownership of the actual fund management firm and will get that in the form of 20% of REVENUE (not equity, revenue, think about that)
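To see how much these seed terms change the picture, here is a rough comparison. It reuses the $50M and 10% return from the parent comment and assumes, purely for illustration, that the whole $50M comes from seed investors on the terms above; `manager_revenue` is a made-up helper.

```python
def manager_revenue(aum, gross_return, mgmt_rate, perf_rate, seed_revenue_share=0.0):
    """Fee revenue the manager keeps for one year, after the seed's revenue share."""
    mgmt_fee = aum * mgmt_rate
    perf_fee = max(0.0, aum * gross_return) * perf_rate
    return (mgmt_fee + perf_fee) * (1 - seed_revenue_share)


# "1 and 20" with no revenue share, as in the parent comment.
standard = manager_revenue(50_000_000, 0.10, mgmt_rate=0.01, perf_rate=0.20)

# "1 and 10" with 20% of revenue going to the seed investor.
seeded = manager_revenue(
    50_000_000, 0.10, mgmt_rate=0.01, perf_rate=0.10, seed_revenue_share=0.20
)
```

Under these assumptions the manager keeps about $1.5M in the standard case and about $0.8M in the seeded case, before paying any of the costs listed earlier.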

This is one reason new fund formation is way down. The economics are bad for years. Know a bunch of HF people that started vc-backed tech firms instead.

The other reason is 10+ year run where stocks, bonds, private firms and real estate just went up. No need for diversifying return streams.


BTW, data costs also too low.

Just a BB terminal around 30k and a lot of extra data from BB costs extra (can be 200-300k per additional product).

For quant strategy probably looking at 500k up to 2M for data initially. And you will likely be at a disadvantage to existing firms that have been collecting data for years.

And that is at the low end. Spent many millions per year for 1 strategy at last large firm. And that was small fraction of total firm spend.


I guess if you have a reasonably profitable strategy it probably just makes more sense to run it with your own money? I guess the only reason why you would want to trade as a hedge fund is if you want to scale up, but most strategies aren't really that scale-able anyways from what I understand, since when you start trading in any real significant size you start moving the markets.


Ha, you need a better network:)

I don't know too many people who have started funds in the past 5 years, but of the 3 who did, none gave up any ownership in the management firm.

That's a suckers game and the only people who would need to give up any ownership are people who are very green:)


facts, the people i know who started funds in the past few years gave no ownership. they also made the numbers work down starting at about $75mm iirc, tho it's a lot easier once past $500mm.


> a Bloomberg terminal $30,000 including data feeds

> market data feeds you need $25,000/year for basic market data and fundamental data that you are allowed to warehouse (you can't store data you get from the Bloomberg terminal).

Total nitpick: you can get those using soft dollars.

But your numbers are spot on. In my job we estimate running a hedge fund with AUM < 250MM is just not worth it.


Sure, if you generate enough commissions to pay for them, which is not a given, assuming the size of fund we are talking about.


Thank you for this comprehensive response!

I have often found myself struggling to explain the difference between building a strategy or trading system (which reduces to a technical/intellectual challenge) and running a hedge fund (essentially running a complex information-driven business).

Your cost breakdown really puts matters into perspective.

> it bodes well for the OP that they talk about market regimes

I concur. Market regimes (modeling, detecting, reasoning about them) are too delicious of an intellectual puzzle to resist.


Hardest part is raising those $50M - even more so if you have zero professional experience in high-finance. Getting a foot inside investment banking, hedge funds, private equity, etc. is extremely competitive to say the least.

I think the best shot for any outsider programmer would be to seek (and team up with) those finance professionals that are already thinking about exiting to start their own funds, and in the need of some technical partner...but even then, you're also competing against experienced devs. already in the field.

In the end, it is just really, really difficult for outsiders to just enter this sector, if they have any hopes of working with any substantial amount of capital.

I guess the better option would be to make some product you can sell as SaaS to the masses, or figure out how to manage thousands and thousands of low-$ investors.


What's the market beta? What's the average turnover/holding period? How are transaction costs modelled? What features explain most of the variance? How are they related to known factors? What's the beta hedged performance?

These are all things I'd want to know before deploying something like this. (Perhaps some mentioned in the post, might have missed them.)

To first order, I'd forget about fat tails and similar popular concerns. They matter, but not as much as structurally understanding what this model is up to. Perhaps one feature is explicitly selling tails? That might answer it already.
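A couple of these questions (market beta, beta-hedged performance) are cheap to check once you have matched return series. The sketch below uses plain least-squares beta on made-up weekly returns; the function names and the data are hypothetical, and it ignores financing costs and the risk-free rate.

```python
def mean(xs):
    return sum(xs) / len(xs)


def beta_and_alpha(strategy, market):
    """Least-squares beta of strategy returns on market returns, plus per-period alpha."""
    ms, mm = mean(strategy), mean(market)
    cov = sum((s - ms) * (m - mm) for s, m in zip(strategy, market)) / (len(market) - 1)
    var = sum((m - mm) ** 2 for m in market) / (len(market) - 1)
    beta = cov / var
    alpha = ms - beta * mm  # average return not explained by market exposure
    return beta, alpha


def beta_hedged(strategy, market, beta):
    """Returns left over after shorting `beta` units of the market each period."""
    return [s - beta * m for s, m in zip(strategy, market)]


# Made-up weekly returns: half the market's move plus 10 bps per week.
market = [0.01, -0.02, 0.03, 0.00]
strategy = [0.5 * m + 0.001 for m in market]
beta, alpha = beta_and_alpha(strategy, market)
hedged = beta_hedged(strategy, market, beta)
```

If the hedged series still has a positive mean with reasonable volatility, the strategy is doing more than carrying market exposure; if it collapses to noise, the "outperformance" was mostly beta.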


I know a bit about this industry and I have worked on some profitable systems. Honestly not a bad effort for someone working on their own with low-cost data. Don’t let the haters get you down. I would recommend picking up a more recent textbook on portfolio construction like Isichenko’s recent book.


Thank you for the note! Just picked up Isichenko from the online bookstore we all love-hate.

I'd love to get in touch (as per your HN profile) - my email is am(at)principiamundi.com


My heart goes out to this author, but you can tell even by his first table that he doesn't quite understand the mathematics of financial markets, the purpose of a hedge fund, how they grow etc.

1) It's plain by quickly looking at the allocation of capital in investment firms, that AUM is not made by performance; it's marketing. At best people invest when they believe a person is connected to inside information. Saying you have an ML advisor is really just a pre-req to these people.

2) Is that allocation stupid? No, it's not, because actually the powers of mathematics and by extension ML are intrinsically limited for investment returns because returns are fat-tailed </Taleb>. For example this author quotes a realistic sharpe (0.8), but didn't calculate the standard deviation in his sharpe, which I would bet a large sum was _at least_ 0.8. Ie: he doesn't really know what his sharpe is. This is because equity assets behave like a Student's t-distribution with a degree-of-freedom parameter ~2 or less </Mandelbrot, /Bergomi, /Gatheral etc.>. Ie: higher moments such as uncertainty in sharpe, literally do not exist or converge and are unknowable. The only exception is if your strategy explicitly cuts off tails.
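For a sense of scale on the "standard deviation of the Sharpe" point, here is a back-of-the-envelope check. It uses the large-sample standard error for i.i.d. returns from Lo's 2002 paper "The Statistics of Sharpe Ratios", which is a swapped-in textbook approximation and not something the author or this commenter computed. The one-year weekly track record (52 observations) is also an assumption.

```python
import math


def sharpe_standard_error(annual_sharpe, n_obs, periods_per_year):
    """Approximate standard error of an annualized Sharpe ratio.

    Assumes i.i.d. returns: Var(SR_hat) ~ (1 + SR^2 / 2) / T for the
    per-period Sharpe, then scales by sqrt(periods_per_year).
    """
    sr_period = annual_sharpe / math.sqrt(periods_per_year)
    se_period = math.sqrt((1 + 0.5 * sr_period ** 2) / n_obs)
    return se_period * math.sqrt(periods_per_year)


# Sharpe of 0.8 estimated from one year of weekly returns (assumed).
se_one_year = sharpe_standard_error(0.8, n_obs=52, periods_per_year=52)
# The same Sharpe estimated from ten years of weekly returns.
se_ten_years = sharpe_standard_error(0.8, n_obs=520, periods_per_year=52)
```

Even under the friendly i.i.d. assumption the one-year standard error comes out near 1.0, larger than the 0.8 estimate itself. Fat tails and autocorrelation, which this formula ignores, would only widen it.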

Once you understand 2) you begin to understand that there's no such thing as a real quant fund (ie a fund which truly makes money predictably using models) which doesn't trade a liquidity limited book that has quite advanced hedging. Wealthy people are aware of this, which is why the author can't market this product.

If you're doing something silly like holding equities without tail risk control, you literally cannot be quantitatively investing. You are just slowly rediscovering what Kelly, Bergomi, Mandelbrot, Bernay's etc. realized with a little deep thought over pen and paper (while clumsily writing boilerplate software.) That markets are entropy machines rougher than a normal distribution, and any gains come directly from information. (see: Kelly: "A New Interpretation of Information Rate".)

For a high latency (ms) market data feed, the returns on information are very very small. Markets are efficient.


> That markets are entropy machines rougher than a normal distribution, and any gains come directly from information.

Isn’t this partially what this model is accomplishing with sentiment analysis?

Also has there been a lot of investment into sentiment analysis for algo trading? I’m sure there have but references including books would be interesting.


No. Because to train the sentiment model you need estimable distributions.


I found your comments about rediscovering Kelly et al interesting. Could you recommend some textbooks that describe what you are referring to? Are there good overviews of the subject?


If you can read and understand bergomi's book you basically understand financial math


Nice writeup, thank you for sharing so openly!

The three things I always want to know from stock picking ML people:

1. Did you put your own money in it ?

2. How'd it go?

3. How well does your engine do vs a fixed stock allocation based on trend-statistics computed on the whole time window (i.e., compared to a fixed optimal portfolio computed with mean/std values you don't have access to, but which isn't allowed to change its choice. what's the regret if you are familiar with online learning)
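For readers unfamiliar with the regret framing in question 3, here is one way it could be computed. This is a toy sketch: two assets, a grid search over constant allocations, and log-wealth as the score are all simplifying assumptions, and the return numbers are invented.

```python
import math


def log_wealth(weights_per_period, returns_per_period):
    """Total log growth; each period's return is the weighted sum of asset returns."""
    total = 0.0
    for w, r in zip(weights_per_period, returns_per_period):
        total += math.log(1 + sum(wi * ri for wi, ri in zip(w, r)))
    return total


def best_fixed_weights(returns_per_period, steps=100):
    """Grid-search the best constant two-asset allocation, chosen in hindsight."""
    best = (-math.inf, None)
    n = len(returns_per_period)
    for k in range(steps + 1):
        w = (k / steps, 1 - k / steps)
        lw = log_wealth([w] * n, returns_per_period)
        if lw > best[0]:
            best = (lw, w)
    return best


def regret(strategy_weights, returns_per_period):
    """Log-wealth of the best fixed allocation minus that of the strategy."""
    best_lw, _ = best_fixed_weights(returns_per_period)
    return best_lw - log_wealth(strategy_weights, returns_per_period)


# Invented two-period example with two assets.
rets = [(0.10, -0.05), (-0.05, 0.10)]
always_wrong = [(0.0, 1.0), (1.0, 0.0)]
gap = regret(always_wrong, rets)
```

A positive value means the engine did worse than a single allocation picked with full hindsight; a strategy that adapts well can end up with negative regret against this benchmark.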


Thank you for appreciating the article; I tried to disclose all that I could!

1. Yes, I did put my own money in it (low 6 figures).

2. It went as described in the article - for the capital I allocated to Didact, I beat the market (SPY) by ~20% since inception.

3. If I understand your question correctly, this would be the equivalent of the payoff on an optimal lookback option (https://en.wikipedia.org/wiki/Lookback_option). I haven't actually done that analysis, but it sounds like a nice idea.


>2. It went as described in the article - for the capital I allocated to Didact, I beat the market (SPY) by ~20% since inception.

This seems extremely hard to believe. You should be running a multi-billion $ Quant fund if this is the case. The idea that you would try to push this as a newsletter rather than just taking investor money and becoming a billionaire literally makes the story seem farcical.


It is very easy to believe.

I could have flipped a coin, gone long or short at beginning of this year.

I would have had a 50% chance of outperforming the market by 40% this year (given it is down roughly 20%).


Right. The difficult part is doing it consistently.


>You should be running a multi-billion $ Quant fund if this is the case.

You seem to underestimate the level of effort and rigor required to achieve this level of capital allocation. In contrast, beating the market by 20% is table stakes. Folks in the industry do it all the time; the difference here simply is that I built an ML-powered engine to do it systematically.


Starting a hedge fund is a lot harder than beating the market by 20%.


I've spent the last few years helping to launch a quant fund, so I have a sense of what institutional investors look for. I'm impressed with the thought and hard work that went into Didact, but this guy never had a shot of attracting interest from the types of institutional investors who fund large quant funds.

The strategy has an 18% correlation to SPY, so "beating the market" is the wrong benchmark. The proper reference point is probably 0; when correlation is that low it shouldn't matter much whether the market's up or down.

The strategy had 14% return and .82 Sharpe ratio, so 17% vol. That's bad. With large asset levels and a long track record a Sharpe of 1 might be OK, for 1 year with minimal assets a Sharpe less than 2 isn't necessarily that impressive.
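The 17% figure follows directly from the definition of the Sharpe ratio if the risk-free rate is ignored, which is an assumption of this quick check:

```python
def vol_from_sharpe(annual_return, sharpe, risk_free_rate=0.0):
    """Annualized volatility implied by Sharpe = (return - risk_free) / volatility."""
    return (annual_return - risk_free_rate) / sharpe


vol = vol_from_sharpe(0.14, 0.82)  # roughly 0.17, i.e. about 17% annualized
```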

Another huge issue: this strategy was run with less than $1mm. It would certainly perform worse at higher asset levels as market impact becomes meaningful, the only question is how much worse.

Finally, results matter, but fund raising is primarily a sales process. Investors aren't just looking for the highest numbers. They're going to evaluate the people and processes involved, the risk management philosophy, really every aspect of the business. OP has some professional finance experience but it doesn't sound like he has the connections or reputation that would help with fund raising. If his sales pitch was anything like this article I don't think most institutional investors would be impressed (EG, minimal references to risk management, frequent comparisons to SPY performance when that's not an appropriate benchmark.)


Definitely possible by doing any number of things, e.g. following trend and just being in DXY or short SPY. It's a super short time-frame. Anything can happen. The true test is 10-year+ horizons.


Every gambler thinks they have a system, but often fails to recognize the game was unfair long before they arrived. lol =)


You can think outside the box to beat the unfair game but then you end up in jail.


Some simply build a portfolio by copying those who can't be charged for violating market rules. Not sure why some folks find this strategy so controversial. =)

Congress member holdings report:

http://clerk.house.gov/public_disc/financial-search.aspx

Senate member holdings report:

https://efdsearch.senate.gov/search/


That's a very interesting idea. One obvious downside is a congressman might change their position faster than you can because they have advance knowledge, so it would be riskier to hold the same position. Especially if that linked page updates even a little slowly. Secondly, they might have that position knowing exactly that: that they will have advance knowledge, and that therefore it's worth having at all. They know they have advance knowledge so they are capable of riskier things. They also know that people can see their positions so they could game everyone by luring them into positions which they could exit quickly so as to pump and dump.


This actually makes me think that an appropriate way to deal with congresspeople trading stock may be to require trade disclosure, say, a week in advance.


Do they actually do that though? Pump and dump their followers?


I'd wager it is heavily correlated with political lobbyist campaign donations. Theoretically they couldn't really trade quickly due to the speed of bureaucracy, but could change future legal policy that affects an industry commodity. =)


Is there an ETF yet?


Unsure if this was a joke Q, but the answer may just be yes:

"Two proposed exchange-traded funds would mimic stock trades made by members of Congress and their spouses. If approved, the ETFs would track trades by Democrats and Republicans, under tickers NANC and CRUZ. "

https://markets.businessinsider.com/news/etf/stocks-etfs-nan...


If you have a tool that can generate great returns, then why fall back to a newsletter?


Great question.

If I beat the market by 20% (say SPY generated 0% for the year, very optimistic at this point), and I have allocated $100k to this, I make $20k before taxes.

That's less than minimum wage.

Meanwhile, allocators expect a track record of at least 3-5 years.

Ideally, if I have an asset, I'd like to extract as much revenue as I can.

Hope this makes sense.


If you're sitting on a gold mine, you can wait 5 years. This does not make sense.


OP could parlay this experience into a high-paying finance job. Algorithmic edge tends to be short lived.


You also don't know if your alpha is going to last 5 years. The gold mine can run out of gold.


Indeed. I haven't shut down development, just shut down the newsletter. I'm continuing to work on it.


I'm a self proclaimed world class DevOps engineer. Can I help contribute in order to get access to the model?


:) hmu on LinkedIn!


How difficult is it to get investors when you can show your model beats the market consistently?

Of course, they have to check you're not trading a strategy with extreme tail risk, but here it sounds like that's not the case?


Very, because historically a levered long position in the market beats the market consistently over long periods. Beating the market turns out not to be a very interesting metric to sophisticated investors.


It's difficult. We made a lot of pitches. Investors/allocators require a fairly long track record and are extremely reluctant to fund (what they perceive to be) black box strategies.


Learn about hedging. Basically, for $100k, if your prediction could consistently beat some index, you don't just buy a stock, but you also sell (short) some other stock/index at the same time. So you own net 0 worth of stock but you get the difference in the increase as your profit. Obviously in the real world you would need some sort of deposit, but you could bet millions for $100k.


You're talking about both hedging and leverage and this is a very important difference.

Turning a long-only equity strategy into a long/short strategy or an "outperformance" strategy[1] with added leverage can seriously affect the volatility of returns and the risk of ruin so it's really important to understand well before embarking on this, because it will affect position sizing and a bunch of other things. You can indeed bet millions for $100k, but if your strategy has 10% volatility unlevered you can get completely wiped out in doing so whereas the risk of ruin of the unleveraged strategy is far lower.

[1] You could say long/short is where you long some things and short some other things generally whereas outperformance is where you long some things and specifically short an index. So in the latter case you are betting on the outperformance of your picks in particular and in the former you are just saying you have the ability to pick both things that go up and things that go down.
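The hedging-versus-leverage distinction is easy to see with numbers. In this sketch the $100k of capital comes from the parent comment, while the 10x leverage and the return scenarios are invented; margin rules, financing, and borrow costs are ignored.

```python
def outperformance_pnl(capital, leverage, picks_return, index_return):
    """P&L of long picks / short index, each leg sized at capital * leverage."""
    notional = capital * leverage
    return notional * (picks_return - index_return)


# Good period: picks +5%, index +3%, so a 2-point spread on $1M per leg.
good = outperformance_pnl(100_000, 10, picks_return=0.05, index_return=0.03)

# Bad period: picks -4%, index +6%, so a 10-point spread against you.
bad = outperformance_pnl(100_000, 10, picks_return=-0.04, index_return=0.06)
```

The good period earns about $20k, or 20% on capital. The bad period loses about $100k, the entire stake, even though the unlevered version of the same trade would have lost only 10%.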


> I have always kept in mind is that feature engineering is almost always the key difference between success and failure

I also developed an ML-powered service heavily relying on feature engineering

https://github.com/asavinov/intelligent-trading-bot Intelligent Trading Bot

Its difference from Didact is that this intelligent trading bot is focused on trade signal generation with higher frequency of evaluation. It is more suitable for cryptocurrencies but also works for traditional stocks with daily frequencies so it could be adapted for stock picking. What I find interesting in your work is the general design of this kind of ML system relying on feature engineering.


Hi, fellow HN'ers! Author here, please let me know if you have any questions or thoughts!


I'm not at all interested in finance / stock picking but found this to be one of the best walkthroughs of an ML system end-to-end that I've ever read. I'm not in the field of ML but I'm interested in learning more and this was fantastic, thank you.


Thank you so much for your kind words! Your comment made my day! :)


This is great! Thanks for writing this!

I have wanted to do something like this for a while, purely for learning. The thing which puts me off is that there is a huge amount of knowledge needed in understanding the features vs the ML.

Could you recommend a base system / reference one could use to get started which explains or bakes in some of the feature / signals engineering work?

Also would this approach work with crypto?


> Also would this approach work with crypto?

Some of it works on crypto. TBH I've stayed away from the asset class, but only because I find it difficult to build mental models and think about features (in my mind, it's a mix of commodity factors and currency factors, but I'd have to test it out).

I seem to remember coming across papers that have tested momentum factors at larger time-frames (e.g. weeklies).

> Could you recommend a base system / reference one could use to get started which explains or bakes in some of the feature / signals engineering work?

The references I put in at the end of the post will really help with this! I might actually write out a separate blog post about starting out in this space from an ML perspective. Thanks for the idea!


EDGAR filings (structured text) is an area unto itself, I see you've limited yourself to quarterlies.

Across any market area (eg: mineral resources) there are thousands of documents released daily across multiple exchanges (via EDGAR, SEDAR, etc) ranging from two line advisories, to 4,000 page technical reports on projects | acquisitions, alongside the usual quarterly | yearly annual reports, etc.

There's plenty to do parsing common forms for generic changes (board members, board member share changes, etc) and market regime specifics (exploration property acquisition) and trends (series of related acquisitions) for those that like the weeds.

Some might argue that 'understanding' these patterns lead the changes in stock price movements, and give insight wrt weathering short term changes for longer term returns.


This is a very insightful remark, thank you.

I focused on 10-Qs for the EDGAR filings module as you rightly pointed out - it seemed to be a good balance between implicit information and usefulness of the data. TBH I didn't actually investigate the other (many) patterns.

Having said that, I have really enjoyed Kai Wu's research from Sparkline Capital (https://www.sparklinecapital.com/), especially his extraction of the innovation factor from EDGAR filing texts. He's appeared in numerous podcasts, and they have all been super useful to listen to. Maybe someday when I re-investigate EDGAR filings and go further, I might target these signals you talk about here.


You're welcome.

14+ years back a small group of West Australians put together what became

https://www.spglobal.com/marketintelligence/en/campaigns/met...

which was based upon integrating (GIS and regular DB) every daily mineral lease record across the accessible globe together with every publicly filed document across the relevant stock markets (AU, TSX, South Africa, London, etc) using (and updating|refining) templated patterns that appear in various classes of forms .. I would assume that territory has been revisited with better ML techniques.


Thanks for sharing! Really curious about ML stock market models, as it seems extremely difficult for them to outperform the market consistently over time.

A few questions:

1) Were these stock picks for major stocks / ETFs? Or small market cap stocks?

2) How many people were subscribed to your newsletter?

3) What do you estimate the impact was of creating a “self-fulfilling prophecy” of entering a position and then recommending your subscribers take the same position?

4) Do you think your asset mix outperformed the market by picking high risk / high reward stocks in a bull market? Or picking safe stocks in a bear market? In other words, how do you know the engine wasn’t biased towards the market trend that happened to play out? For example, if I have a basket of tech stocks that would typically outperform in a bull market and flip a coin to buy or short them and guess right, I could outperform the market by chance. How did you account for this?

5) Have you backtested the engine to see what it would have returned in previous years? (Obviously on unseen data, rather than data it used for training)


This was not only a very informative read but felt like an amazing achievement if everything described here was developed by one person (the author - @muggermuch).

The breadth of knowledge demonstrated by the author, from technology (bringing performance down to 14 minutes) to ML to deep understanding of financial markets, is super-impressive.

Granted the author has an educational background in computer science and has been a trader, which probably explains many of his abilities, but to my small brain it feels like a next-level achievement.

Maybe I live in an average circle of finance but I have never met nor heard of a person who could single-handedly conceive and implement such a system. To my knowledge, a typical hedge fund has several highly-paid people in different teams to build and maintain such a system.

I never thought one person could do it. I genuinely wonder how he managed to wrap his head around this much knowledge. He seems to fall in the 10x category. Kudos!


Nice report. How did you do risk management? Have you been leveraged? Have you paid for data? Kudos for a view from the trenches.


Thank you!

>How did you do risk management? I put in a basic position management layer (1% fixed stop). Also, the market regime module would modulate participation, i.e. in really risky environments it would dial down the number of stock picks. I can definitely do much more on this front, but I wanted to nail down the stock picking first! :)

>Have you been leveraged? No leverage.

>Have you paid for data? Yes, my monthly running costs for data are ~$1.2k.
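One possible reading of the two mechanisms in this reply (a fixed 1% stop, and a regime module that dials down the number of picks) is sketched below. Everything here is hypothetical: the author has not published this logic, the `risk_score` scale is invented, and "1% fixed stop" is interpreted as a stop 1% below the entry price.

```python
def stop_price(entry_price, stop_pct=0.01):
    """Price at which a long position would be closed under a fixed percentage stop."""
    return entry_price * (1 - stop_pct)


def picks_for_regime(max_picks, risk_score):
    """Scale the number of picks down as the regime gets riskier.

    risk_score is a made-up scale: 0.0 means calm, 1.0 means very risky.
    """
    risk_score = min(1.0, max(0.0, risk_score))
    return round(max_picks * (1 - risk_score))


stop = stop_price(100.0)              # stop just under the entry price
calm_picks = picks_for_regime(20, 0.0)
risky_picks = picks_for_regime(20, 0.75)
```

The linear scaling is the simplest choice that matches the description; a real regime module could just as well use discrete states or a nonlinear mapping.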


Have you looked into Kelly criterion?


Yes! I use fractional Kelly extensively in my (separate) higher-frequency strategies (on MES/ES/NQ/VX futures).

I'm thinking of writing some follow-up posts on how to reason about ML-driven strategies in an intraday setting. Thanks to low-cost brokerages, there's a lot of alpha that can be captured by small league speculators such as myself.


This is true. I ran a 3+ Sharpe and over 40% annualized strategy on PredictIt back when they had the tweet markets a few years ago. It was literally just fitting a Poisson model to Trump, Pence, White House, POTUS, and VP twitter accounts and then Kelly betting based on the difference between the market price and the Poisson model.

I strongly believe the only reason I got such a solid performance is not because I’m some kind of trading savant, but simply that the $850 per contract limit prevents smart money/institutional traders from moving the market towards efficiency.

Similar opportunities exist in the equity world — one huge advantage is that little guys don’t really incur market impact.

For a lot of systematic RV plays, you might only have 10bps alpha and then a 7bp trading cost each way.
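The Poisson-plus-Kelly approach described in this comment can be sketched roughly as follows. The tweet counts, bracket, market price, and quarter-Kelly multiplier are invented, and the sketch ignores PredictIt's fees and the $850 per contract limit, so it shows the shape of the idea and not the commenter's actual model.

```python
import math


def fit_rate(counts):
    """Maximum-likelihood Poisson rate is just the sample mean."""
    return sum(counts) / len(counts)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam), computed in log space to avoid overflow."""
    return math.exp(-lam + k * math.log(lam) - math.lgamma(k + 1))


def prob_in_bracket(lam, low, high):
    """Probability that the count lands in [low, high] inclusive."""
    return sum(poisson_pmf(k, lam) for k in range(low, high + 1))


def kelly_fraction(p, price):
    """Kelly fraction of bankroll for a contract costing `price` that pays 1 on a win."""
    if p <= price:
        return 0.0  # no edge, no bet
    return (p - price) / (1 - price)


# Invented data: weekly tweet counts and a "45 to 54 tweets" contract at 40 cents.
weekly_counts = [48, 52, 50, 55, 45]
lam = fit_rate(weekly_counts)
p_model = prob_in_bracket(lam, 45, 54)
full_kelly = kelly_fraction(p_model, price=0.40)
stake_fraction = 0.25 * full_kelly  # fractional (quarter) Kelly to cushion model error
```

The bet size depends only on the gap between the model probability and the market price, which is why a small, persistent mispricing in a capped market can be worth trading.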


what data sources did you use? im interested in working on something similar.


If you have a guaranteed compounding money machine that outperforms the market by 20%, just let it run; sooner or later you will be able to buy out those who did not invest in you. If it's just a useful recommendation engine, then there are indeed a lot of questions relating to personal finance or investment strategies, which have nothing to do with machine learning, that need to be addressed for PMF. You don't need better models, you need to understand the needs of your customers.


Have you considered submitting your predictions to the Numerai Signals? It's market neutral so as long as your models can generate some alpha you can still get good returns.


That's a good idea. I'll try it out, thanks!


I'm curious what happens if you look at your returns and other metrics at different time scales, i.e. monthly and weekly, in addition to yearly. You can't make any argument based on a sample size of 2.

As someone who used to work in the industry, I am 99.99% confident that you cannot have any alpha with a system like this, you are basically flipping coins, as some other commenters have pointed out.


If your predictions are good, I'd be happy to get you $100 million in assets to manage. It's very unlikely that your predictions are good...


I've been thinking about trying to build something from scratch with a similar spirit, but very different methods, but I also doubt that my predictions will be any good without far more time investment than I have available.

As you say you have expertise in the area, any chance I could ask you for advice or how to decide if it's worth trying?


Sure ping me on twitter


It's very unlikely that you're able to get OP $100 million in assets...


"Inadequate Equilibria".

If you picked a HN comment at random, then the person who made that comment is overwhelmingly unlikely to raise $100MM at the drop of a hat. Picking a HN user at random won't do it either.

But there's not a lot of finance-related submissions on HN. The people in the comments may be unusual. And off the top of my head there certainly is an existence proof for at least a few investors on the news site for Y Combinator, a VC fund, such as pg or sama. There are obviously other VC people on here as well.


I founded and ran a quant trading firm for about a decade :)


no one said it would be fast but plenty of people on here know how to talk to allocators and have helped raise before.


"steadily beating the S&P 500 for over a year on a weekly basis"

Can be achieved by chance alone.
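As a rough sketch of what "chance alone" means here, assume a strategy with zero edge whose weeks are independent coin flips (win probability 0.5, 52 weeks). Both numbers are assumptions for illustration, and `prob_at_least` is a hypothetical helper. The exact binomial tail gives the odds of any given number of winning weeks:

```python
from math import comb

def prob_at_least(wins, weeks=52, p=0.5):
    """P(at least `wins` winning weeks out of `weeks`) for independent weeks."""
    return sum(
        comb(weeks, k) * p**k * (1 - p) ** (weeks - k)
        for k in range(wins, weeks + 1)
    )

# Probability that a pure coin-flip strategy wins at least this many weeks.
for wins in (27, 30, 33, 36):
    print(wins, prob_at_least(wins))
```

Multiply that tail probability by the number of people trying strategies and you get a feel for how many lucky track records to expect.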

If not chance, I can give you a strategy that would be highly likely to achieve such a result: it would take a lot of risk though!

I love it when people post this stuff to HN. Naive people try it, lose a bundle to market makers, then go back to their day job.


> "steadily beating the S&P 500 for over a year on a weekly basis"

If you're going to make a claim like that, you should actually follow up with the calculations. When you do that, you'll realize that the issue is quite a bit more complex than this shallow dismissal.

He has very low correlation to the index, which means he's not just levering beta and getting lucky on a trending market. His standard deviation is smaller than the index, which means he didn't just make one large and lucky bet. The evidence that he has real alpha is certainly not incontrovertible, but the numbers look quite good.
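For anyone who wants to run that kind of check themselves, here is a minimal sketch of the usual regression against the index. The return series are made up, and `beta_alpha_corr` is a hypothetical helper, not something from the writeup. A pure levered-beta strategy shows up as correlation near 1 and alpha near 0.

```python
from statistics import fmean

def beta_alpha_corr(strategy, index):
    """Regress strategy returns on index returns; return (beta, alpha, corr)."""
    mean_s, mean_i = fmean(strategy), fmean(index)
    cov = fmean((s - mean_s) * (i - mean_i) for s, i in zip(strategy, index))
    var_s = fmean((s - mean_s) ** 2 for s in strategy)
    var_i = fmean((i - mean_i) ** 2 for i in index)
    beta = cov / var_i
    alpha = mean_s - beta * mean_i        # per-period return not explained by the index
    corr = cov / (var_s * var_i) ** 0.5
    return beta, alpha, corr

# Made-up weekly index returns, purely illustrative.
index = [0.010, -0.020, 0.015, -0.005, 0.020, -0.010]
levered = [2 * r for r in index]          # 2x leverage on the index, no skill

print(beta_alpha_corr(levered, index))
```

Low correlation together with positive alpha is what separates the reported numbers from the levered case, although a short sample can still produce it by luck.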

It also doesn't appear that he cherry picked his reporting/aggregation cadence, because he sent out a weekly newsletter ex ante, and all his stats are reported weekly. He could still just be lucky, but his numbers are much better than this sort of dismissal would imply.

One real risk is that, in some implicit way, he's pursuing a negatively-skewed strategy. That is, one that has a latent large downside risk. Strategies like this can produce very good-looking numbers for longish periods, but still have ultimately negative alpha. Judging whether or not that is the case here is hard without more detail, but nothing he says in the writeup indicates to me that it is.
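To make the negative-skew point concrete, here is a toy sketch with entirely hypothetical numbers: a strategy that makes 0.5% in 98% of weeks and loses 30% in the other 2%. Neither the payoffs nor the helper names come from the writeup.

```python
def expected_weekly_return(p_win, gain, loss):
    """Expected return per week for a two-outcome strategy (loss given as a positive size)."""
    return p_win * gain - (1 - p_win) * loss

def prob_clean_run(p_win, weeks):
    """Probability of seeing no losing week at all over `weeks` independent weeks."""
    return p_win ** weeks

# Hypothetical payoff profile: small frequent gains, rare large loss.
P_WIN, GAIN, LOSS = 0.98, 0.005, 0.30

print(expected_weekly_return(P_WIN, GAIN, LOSS))  # negative: losing strategy on average
print(prob_clean_run(P_WIN, 52))                  # chance of a spotless first year
```

Under these assumptions the expectancy is negative, yet roughly a third of such track records would still show a full year without a single losing week.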


If he can do what he claims (which as I said above is less impressive than it sounds) he can take it to a Chicago prop shop. They'll give him a budget and a share of the PnL. Very straightforward, it happens all the time.

However, his write up is completely devoid of talk of risk (beta is not risk), bankroll, Kelly sizing, etc. This is integral to understanding the trade.
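For reference, a minimal sketch of the two textbook Kelly formulas. The inputs are hypothetical and nothing here is taken from the writeup. The binary form assumes a bet that wins `payoff_ratio` per unit staked with probability `p_win`. The continuous form is the usual Gaussian approximation, using the mean and variance of excess returns.

```python
def kelly_binary(p_win, payoff_ratio):
    """Kelly fraction for a binary bet: f* = p - (1 - p) / b."""
    return p_win - (1 - p_win) / payoff_ratio

def kelly_continuous(mean_excess, variance):
    """Kelly leverage under a Gaussian approximation: f* = mu / sigma^2."""
    return mean_excess / variance

# Hypothetical inputs: 60% win rate at even odds; 5% excess return at 20% volatility.
print(kelly_binary(0.6, 1.0))
print(kelly_continuous(0.05, 0.20 ** 2))
```

Both formulas are very sensitive to the estimated edge, which is one reason people usually size at a fraction of full Kelly.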

For example, he could have a successful strategy that works in small lots. However, absent from nearly every ML model is the impact on sizing up. As soon as you post a sizable bid, the market will lean against you, and the edge evaporates. Same if you cross bid-ask, plus you're now giving up edge. ML cannot take this into account, at least not very easily and with the usual models.

Most programmers with models like this fall into this last category.
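A rough way to see the sizing problem is the square-root impact rule of thumb, where the cost of an order grows with the square root of its share of daily volume. This is a swapped-in empirical approximation, not anything the model in the post uses. The coefficient `y`, the edge, the volatility and the volume below are all hypothetical.

```python
from math import sqrt

def impact(order_size, adv, daily_vol, y=1.0):
    """Square-root impact estimate as a fraction of price (rule of thumb)."""
    return y * daily_vol * sqrt(order_size / adv)

def net_edge(order_size, gross_edge, adv, daily_vol, y=1.0):
    """Edge left per unit traded after paying the estimated impact."""
    return gross_edge - impact(order_size, adv, daily_vol, y)

def break_even_size(gross_edge, adv, daily_vol, y=1.0):
    """Order size at which the estimated impact eats the whole edge."""
    return adv * (gross_edge / (y * daily_vol)) ** 2

# Hypothetical: 10 bps edge, 2% daily vol, 1,000,000 shares average daily volume.
EDGE, VOL, ADV = 0.001, 0.02, 1_000_000
for size in (100, 1_000, 10_000):
    print(size, net_edge(size, EDGE, ADV, VOL))
print(break_even_size(EDGE, ADV, VOL))
```

Under these made-up inputs the edge is positive in small lots and gone well before the order is a meaningful fraction of daily volume.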


Usually the answer for a negatively-skewed strategy is to set aside a long period of data for backtesting, right?


Thanks for the article! What tool do you use to create the figures? I like the sketch style.

For anyone else, what tools do you recommend for generating pretty figures for system architectures, workflows, etc.?


OP, what are you using to draw the diagrams? They look nice and are very readable.


Thank you!

I used Excalidraw (https://excalidraw.com), and I highly recommend it! It gives me 'xkcd' vibes.


"Predicting" markets isn't the challenge. Implementing real world strategies with associated frictions is. Show me a cash p&l or it's just a student project.


Do these frequent trading strategies ever account for taxation? Where I live, the returns compared to a buy and hold passive strategy would be cut by 50%.
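The size of the tax drag depends heavily on jurisdiction, but the mechanism is easy to sketch. Assume the same pre-tax return and the same tax rate for both approaches, positive returns every year, and no loss offsets. All of these are simplifying assumptions and the numbers are hypothetical.

```python
def buy_and_hold_after_tax(annual_return, years, tax_rate):
    """Gains compound untaxed; tax is paid once on the total gain at the end."""
    gross = (1 + annual_return) ** years
    return 1 + (gross - 1) * (1 - tax_rate)

def frequent_trading_after_tax(annual_return, years, tax_rate):
    """Gains are realized and taxed every year, so only the after-tax return compounds."""
    return (1 + annual_return * (1 - tax_rate)) ** years

# Hypothetical: 10% pre-tax return, 20 years, 30% tax rate on gains.
R, YEARS, TAX = 0.10, 20, 0.30
print(buy_and_hold_after_tax(R, YEARS, TAX))
print(frequent_trading_after_tax(R, YEARS, TAX))
```

Many jurisdictions also tax short-term gains at a higher rate than long-term ones, which would widen the gap further.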


Brilliantly written. As someone considering a move into the Quant field it is very informative.


Thank you!


this is very cool! where did you get your data from and how's the transition to airflow?


There are commercial feeds available via Nasdaq DataLink (FKA Quandl). I also bought bulk historical data to feed through my backtester (I haven't talked about this in the post; it was getting to be a bit too long).


Let's get a write up of your backtesting framework too please! Terrific post @muggermuch - thank you!


:) Will do! Thanks for the encouragement!


did you use vectorbt or event driven backtesting?


Great post! Very informative! Thoroughly enjoyed it.


Lmao this engine is down 6.9% for the year, when literally it's as simple as just buying some puts.


> this engine is down 6.9% for the year

That’s pretty damn good, and still beating the market by nearly 20%. Of course you can always make more with riskier strategies.


You realise that puts have a cost that is determined by the market?


Excellent article


Thanks!



