Hacker News
I've reproduced 130 research papers about “predicting the stock market” (reddit.com)
701 points by starpilot 34 days ago | 204 comments



Spoilers:

> Literally every single paper was either p-hacked, overfit, or a subsample of favourable data was selected (I guess ultimately they're all the same thing but still) OR a few may have had a smidge of Alpha but as soon as you add transaction costs it all disappears.

> I should caveat that I was a profitable trader at multiple Tier-1 US banks so I can say with confidence that I made a decent attempt of building whatever the author was trying to get at.


But also:

> Almost every instrument is mean-reverting on short timelines and trending on longer timelines.

i.e. he confirms the momentum factor, which isn't surprising since there's more solid evidence for it than anything else, going back hundreds of years.

He doesn't say what fundamental factors he looked at, so it's possible that value, size, and profitability/quality would hold up as well. All those have been studied pretty extensively in academia, in papers going back decades. The author took only a fairly random sampling of recent papers.


All the finance folks I know use "fundamentals" to mean some class of attributes that they all already know. Like when I say "GPL3" to someone who has a priori knowledge.

Not exactly sure but I think the "fundamentals" are top-line revenue, unit cost, unit margin, yoy-growth, EBITDA, cash-on-hand, default-alive - and likely the ratios those values produce.


These sorts of papers don't look at all the fundamentals at once. They'll isolate one factor, or a small related group, and see if it's predictive.

Two simple factors that are well supported: on a risk-adjusted basis, stocks of small companies do better than big ones, and value stocks do better than growth stocks. ("Value" means the stock is cheap relative to some simple fundamental measure like the company's book value.)


You’re pretty correct.

A simpler way to think of it is any data that comes off the Income Statement, Balance Sheet, or Cash Flow Statement - or can be directly calculated using such data.


Fundamentals mean zilch for trading unless you're talking earnings announcements or similar news.


Please don't downvote this comment. There's a huge difference between short term trading and investing. And yes, for short term trading (minutes/hours/few days) fundamentals matter rather little - after all you can be a successful trader of Bitcoin, which has pretty much no fundamentals. The exceptions are volatility events like macro data releases, earnings, Fed decisions, surprise news (looking at you GE) etc.


> fundamentals matter rather little

Having a small impact is not the same as having zero impact.

Traders operate over a longer timeframe: even if they are holding an individual stock for minutes, there is a limited pool of stocks. Keep playing the game and longer-term impacts add up.


But do the fundamentals of an individual investment matter in the short term compared to the fundamentals of the market as a whole? I would suspect that the “longer term impacts add[ing] up” would just be market health. Day-trading randomly-picked stocks with random buys and sells is a Markov approximation of an index fund :)


I think that depends on how stocks are chosen when you are day trading. Limiting things to high volatility stocks for example creates a bias in your index fund approximation.


I know as little about this as you do; my own gathering is that “fundamentals” are any signal you can get from a public company’s mandated quarterly reporting. So, every piece of data on the published balance sheet, plus maybe any reported board actions.


Look up technical vs fundamental analysis. It's trading 101.


Discredits a whole field -- doesn't post any papers that he attempted to replicate.


I had the same result from most papers. My personal conclusion is that when people get a winning strategy they don’t publish. I personally put my money where my mouth was for a few years:

https://austingwalters.com/backtesting-our-100-yoy-profit-ge...

That being said, I try to be honest too. This can disappear any time and the model I use may only be good in this environment. I do not know. I think that’s the challenge with papers, you don’t honestly know when or if the strategy works. It clearly won’t forever regardless.

That’s why I don’t share my exact method. And after doing all the research myself AND trying to sell my algorithm, I honestly don’t think the industry knows what it’s doing either. People are worried about Sharpe ratios and all this BS stuff. The reality is for these models you mitigate risk via temporary and ever changing methods. Can’t really publish on that.


> I had the same result from most papers. My personal conclusion is that when people get a winning strategy they don’t publish.

This is kind of obvious to me. It is also the reason the OP posted their results. I'm sure if they found one strategy that worked, after putting that much time into their research it would be really stupid to announce to the whole world that it works.

On the other hand, in the trading world where everyone is a competitor, you might want to deliberately introduce some confusion - but it looks like plenty of actors are doing this anyway.


I hope that your personal implementation of your strategy takes into account 2008 ;). Because, hooo, boy, your 20% YoY returns sound great, but may not be taking enough risk of a similar collapse into account.


It does take those risks into account and I tested it on 2006-2008 previously. However, there is little point in testing a collapse with the same model. If the stock market collapses anyone investing would be out of luck. Your standard models won’t be able to track that, because it’s usually something like a “black swan” event[1].

Instead you’d want some sort of meta model. In either case, I’m getting 100% YoY returns when I augment my model, in real life. I think even a 50% loss one year wouldn’t be the end of the world.

[1] https://en.m.wikipedia.org/wiki/Black_swan_theory


I'll take the Warren Buffett position.


If your algorithm was successful, why did you try to sell it?


Just a guess, but: if you come up with a successful algorithm, you still need to have money that can be invested in order to use the algorithm. So maybe someone else with 100x more money to invest would pay more for the algorithm than you could earn from it in a lifetime.


If you have a 100k ARR business and someone offers 5-6x* rev today, why not check out early and move onto the next one?

* Don't know what trading strats trade at


Is your "winning" strategy so different from other social data / sentiment analysis approaches? There is some novelty with how you weigh the sentiments (based on how much of an insider or expert they are) but I am sure existing trading strategies weren't just taking a dumb average of a twitter firehose either. Shouldn't it be easy for some large firm to replicate your approach and make the alpha disappear?


> Shouldn't it be easy for some large firm to replicate your approach and make the alpha disappear

First, I don't think alpha ever fully disappears.

Second, after speaking with twenty or so firms very few are using sentiment directly. Those that do, I suspect don't take the additional steps to build complex NLP based systems and weight insiders/experts. Even if you weight experts, the methods for doing so are also complicated (cross check against LinkedIn should be easy enough, but also limits information).

Anyway, I personally haven't seen much difference in the back testing.


It seems that you consider the Sharpe ratio to be not worthwhile. Would you be able to elaborate on that point? As someone who is getting into algotrading, I’m currently using the Sharpe ratio to quantify risk, but would like to hear another take on it.


Think there's a paper by Andrew Lo about how to adjust it for different distributions.
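
For reference, here is a minimal sketch of the plain Sharpe ratio being discussed: mean excess return divided by its standard deviation, scaled to a year. The function name, the zero risk-free default, and the 252-periods-per-year figure are assumptions made for illustration. The square-root-of-time scaling itself assumes independent, identically distributed returns, which is the kind of assumption an adjustment would have to address.

```python
import math
import statistics

def sharpe_ratio(returns, risk_free=0.0, periods_per_year=252):
    """Annualized Sharpe ratio of a list of per-period returns.

    risk_free is the per-period risk-free rate (assumed 0 here).
    The sqrt(periods_per_year) scaling assumes i.i.d. returns.
    """
    excess = [r - risk_free for r in returns]
    mean_excess = statistics.mean(excess)
    vol = statistics.stdev(excess)  # sample standard deviation
    return mean_excess / vol * math.sqrt(periods_per_year)

# Hypothetical daily returns, made up purely to show the call.
daily = [0.010, -0.005, 0.003, 0.007, -0.002]
print(f"annualized Sharpe: {sharpe_ratio(daily):.2f}")
```

Because the ratio only uses the mean and standard deviation, two return streams with very different tail behavior can score the same.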


Winning strategies are not published, generally speaking. You won't see many business people blogging in detail about how they went from 0 to 1. Which is why most blogs about business are a load of BS. Reading your statement and this article confirms my long suspicion that relying on internet strategies is a bad strategy.


There are no winning strategies. It's all BS.

There are people who are addicted to tracking weather fluctuations on Pluto and there are people who are not.


There are definitely winning strategies. The issue is whether they can be reliably identified beforehand (thus rewarding skill), or whether people who implement them are just lucky (regardless of whether they believe they were skilled or not).


If you can't tell until afterwards, it is merely a strategy that won, not one that will make you win.


Strategies like accusing your loyal customers of insurance fraud sound like a win...


Ever hear of the Medallion Fund, which has averaged ~40% annual returns since the late 1980s?

https://www.bloomberg.com/news/articles/2019-03-07/jim-simon...


Well, if you get 30 people to flip a coin six times, there is a good chance one of them will get all tails or all heads. Now ask him his strategy for such amazing coin flipping skills! (Example taken from the book "Statistics Done Wrong"[1].)

I would be interested in seeing a total distribution of hedge funds return, not just outliers.

[1]:https://www.statisticsdonewrong.com/
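
The coin-flip arithmetic above can be checked in a few lines. This is a minimal sketch assuming fair coins and independent flippers; the variable names are mine.

```python
# Chance that at least one of 30 people, each flipping a fair coin
# six times, gets all heads or all tails.
people = 30
flips = 6

# One person: all heads (1/2**6) plus all tails (1/2**6).
p_single = 2 * 0.5 ** flips  # = 1/32

# Complement of "nobody gets a perfect streak".
p_any = 1 - (1 - p_single) ** people

print(f"per person: {p_single:.5f}")
print(f"at least one of {people}: {p_any:.1%}")
```

Under these assumptions the chance comes out at roughly 61%, so "good chance" is a fair description.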


I understand what you are saying, but the equivalent to the Medallion fund would be something more akin to winning that same coin flip 30 times in a row. They have been running approximately 30 years, beating market returns by a significant margin, year after year. The 40% average is net of fees, so the overall return of their strategy has actually been higher than that. The odds that they have accomplished these returns through luck alone are astronomically low.


The trick is this: the chances that a specific fund will do well that long via luck are very low, but the chances that there exists a fund among all that exist that has done well via luck are quite high.


I can tell you haven't actually done the calculation. There have only been about 20,000 hedge funds in total over history. The odds of random chance producing Medallion's track record with that many draws are actually very low.
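
A rough version of that calculation, using the thread's own analogy: treat each year as a fair coin flip (beat the market or not), require 30 wins in a row, and give 20,000 independent funds one attempt each. All of that is a simplifying assumption, not a model of real fund returns, and it ignores how large the outperformance was.

```python
import math

funds = 20_000  # figure quoted in the comment above
years = 30      # approximate length of the track record, per the thread

# One fund winning a fair coin flip every year for 30 years.
p_single = 0.5 ** years

# P(at least one fund succeeds) = 1 - (1 - p)^N, computed with
# log1p/expm1 to stay accurate for very small p.
p_any = -math.expm1(funds * math.log1p(-p_single))

print(f"single fund: {p_single:.3e}")
print(f"any of {funds} funds: {p_any:.3e}")
```

Even with 20,000 draws the result is on the order of 2 in 100,000 under these assumptions.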


This is true, though I don't know how many hedge funds operate. I guess some people really have a working strategy!

The example I gave was more of a word of caution regarding past performance as an indicator of expertise, but I guess I made it sound more generalizable than needed!


Jim Simons is certainly an outlier and the chance that the implemented strategies are not winning strategies is astronomically low.


But if you keep an eye on the lucky ones, they should also go back to being noise, if what you're saying is correct.

Is that what happened?


That would be reversion to the mean[1]. This is a term I really dislike because it makes it sound as if there is some sort of equalizing force making over-performers under-perform later. It is more like the following: if you overestimate the expectation of a random process, you are going to be disappointed.

In our case, this does not make the "hot streak" any less probable when you start looking at a specific hedge fund. It is true it would be interesting to select a group of over-performers and study their future returns to know if past performance is a good predictor of future performance. I feel you would probably get mixed results!

Although as I said in the sibling comment, you are probably right, and no amount of statistics could explain the performance seen in this particular hedge fund.

[1]https://en.wikipedia.org/wiki/Regression_toward_the_mean
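
The "select the over-performers and watch what happens next" experiment can be sketched with simulated funds. The data below is entirely synthetic and assumes zero skill: every fund's return in each period is independent noise. It shows only what pure luck looks like, not what real funds do.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

n_funds = 10_000
# Zero-skill assumption: returns in each period are independent noise.
period1 = [random.gauss(0, 1) for _ in range(n_funds)]
period2 = [random.gauss(0, 1) for _ in range(n_funds)]

# Pick the top 10% of funds by period-1 return.
ranked = sorted(range(n_funds), key=lambda i: period1[i], reverse=True)
winners = ranked[: n_funds // 10]

past = statistics.mean(period1[i] for i in winners)
future = statistics.mean(period2[i] for i in winners)

print(f"winners in period 1: {past:+.2f}")
print(f"same funds in period 2: {future:+.2f}")
```

The winners look excellent in the period used to select them and unremarkable afterwards, with no equalizing force involved. The selection step simply overestimated their expected return.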


How can I pick out the Medallion Fund of the coming thirty years?


It's probably still the Medallion Fund. Except it doesn't take outside money.

One characteristic of a good money manager is that he knows when to start rejecting money. Large AUMs tend to converge by necessity towards index funds (or something underperforming an index fund).


Professional quant here. I have to say I strongly disagree with the conclusions of the OP.

> They were all found by using phrases like "predict stock market" or "predict forex" or "predict bitcoin" and terms related to those.

Yeah, searching for any finance papers with "predict" or "machine learning" is literally the lowest quality tier you can get. These papers are often written by grad students who can pump an easy paper out by "applying" some already known ML algorithm to financial markets. Of course it's not gonna work. It also kills me when I see ML models that need stationarity assumptions applied to non-stationary time series data. Yeah, good luck with that.
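
To make the stationarity point concrete, here is a minimal sketch on synthetic data. It assumes log prices follow a pure random walk and compares how much the average drifts from one chunk of the series to the next, for price levels versus returns. This is a crude illustration, not a formal unit-root test, and `chunk_means` is a helper made up for this example.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch is repeatable

# Synthetic data: i.i.d. returns, and log prices as their running sum.
returns = [random.gauss(0, 1) for _ in range(10_000)]
prices = []
level = 0.0
for r in returns:
    level += r
    prices.append(level)

def chunk_means(series, chunks=10):
    """Mean of each of `chunks` equal-length consecutive segments."""
    size = len(series) // chunks
    return [statistics.mean(series[i * size:(i + 1) * size])
            for i in range(chunks)]

# A stationary series has roughly the same mean in every segment.
price_spread = statistics.pstdev(chunk_means(prices))
return_spread = statistics.pstdev(chunk_means(returns))

print(f"spread of segment means, prices:  {price_spread:.2f}")
print(f"spread of segment means, returns: {return_spread:.2f}")
```

The segment means of the price levels wander far more than those of the returns, which is why a model that assumes a stable distribution is usually fed returns or some other differenced series rather than raw prices.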

THAT being said, there is lots of high quality research which has been replicated over and over, showing that alpha does exist in the market (and which funds have made billions off of). I would like to see the OP try to replicate some of these instead. To give some simple examples:

1. Try searching for papers with the keywords "and the cross section of expected returns". For example, the momentum factor, which can be tested and replicated with only linear regression: "There is substantial evidence that indicates that stocks that perform the best (worst) over a three- to 12-month period tend to continue to perform well (poorly) over the subsequent three to 12 months." https://papers.ssrn.com/sol3/papers.cfm?abstract_id=299107

2. Statistical arbitrage strategies, which were known to work well until the mid 2000s. These have also been replicated many times. Furthermore, you can see the gradual decline in profitability, pointing to the theory that "alpha decay" in this case is real. https://www.math.nyu.edu/faculty/avellane/AvellanedaLeeStatA...

3. High frequency strategies. No way OP or any retail trader can replicate this, but firms make billions of dollars per year consistently doing this.
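
As a sketch of what "tested with only linear regression" means for the momentum example in item 1: regress each stock's next-period return on its past return and look at the slope. The data here is synthetic, with a hypothetical momentum effect (`true_slope`) baked in by construction, so it demonstrates the mechanics only and says nothing about real markets. `ols_slope` is a helper written for this example.

```python
import random

random.seed(2)  # fixed seed so the sketch is repeatable

n_stocks = 5_000
true_slope = 0.1  # hypothetical effect built into the fake data

# Synthetic cross-section: past return and subsequent return per stock.
past = [random.gauss(0, 0.3) for _ in range(n_stocks)]
future = [true_slope * p + random.gauss(0, 0.3) for p in past]

def ols_slope(x, y):
    """Slope from a one-variable least-squares fit of y on x."""
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var = sum((a - mx) ** 2 for a in x)
    return cov / var

slope = ols_slope(past, future)
print(f"estimated slope: {slope:.3f} (built-in value: {true_slope})")
```

A real replication would use actual return data, a proper skip period between formation and holding windows, and standard errors on the slope, none of which this sketch attempts.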

In conclusion, to make a claim that there is no alpha in the market seems highly suspect, and perhaps just needs a more nuanced view of how trading firms make their profits.


I also find it highly unlikely anyone is able to implement 130+ papers in 7 months.

This would require insane productivity, implausible access to pricing and news data resources (which are often not freely available) and expertise in machine learning, natural language processing, finance, and data science. OP had to implement financial, time-series and linguistic feature engineering pipelines, as well as infer the architecture and hyper-parameters used AND train all these models.

He also claims he "web scraped" all the data which is highly unlikely as pricing datasets are often sold for a pretty penny and not publicly available in the detail described in several of these papers.

OP must be a genius to pull this off, all the while being a trader at "a Tier 1 US bank" (in itself that description is ridiculous).

All OP has to show for all this work is a hastily written Reddit post with dubious claims. There is no proof of the work done whatsoever, no code samples, not even result tables or graphs. And at the end OP shills his cryptotrading bot.

What's worse, HN seems to gobble it up naively. Seemingly because OP is critical of something that is popular to criticize.


> OP must be a genius to pull this off, all the while being a trader at "a Tier 1 US bank" (in itself that description is ridiculous).

Mostly agree with you, but what's ridiculous about that description? His LinkedIn says he worked at Merrills and Citi. Those are normally considered top tier US banks?


If those aren't top tier I don't know what is...

Generally the banks are no longer the place to do prop trading though. You're better off at a hedge fund. They will not only pay better but have way better access to resources, tech, experienced traders, etc


A lot of papers use data from a few sources which are typically available to universities. Also, a surprising (maybe not?) number scrape data themselves. That being said, as someone who was involved in paper replication and investigation, there is not enough time to implement substantive papers in that time frame.


Exactly, I work in the field of text mining financial news and have replicated parts of studies and 7 months for 130 papers is impossible.


Perhaps he led an R&D team at his bank, and most of this work was being done in parallel by his reports.

Or perhaps there are fewer ideas contained in these papers than there are papers themselves. I can “replicate 130 papers”, too, if they’re all substantially the same experiment on different data-sets. Just download all the data-sets and loop through them :)


> I also find it highly unlikely anyone is able to implement 130+ papers in 7 months.

Indeed, when I first read the headline I thought for a second this was one of those studies intended to show how easy it is to submit fraudulent work.


Hey, at no point did I make a claim that there's no alpha in the market. I generated around 25% annually myself on a fairly large balance sheet and made reference to RenTech's stellar performance several times. AND I'm working on a new commercial project which relies on finding alpha. But yes, the rest of your comments are valid and fair.

Also, for the other comments I edited the post to add links because I was asked several times what I was up to now.


This is a really informative comment, but the OP explicitly excluded the three strategies you mentioned. (There’s a blurb about ignoring “alpha” strategies, and he only went back 8 years).

I read this article more as answering the question “Did anything useful come out of the last few batches of finance PhDs?”, than “Are investment strategies totally futile?”


Most academic white papers on trading strategies are horrible. They are written by academics or grad students (as you point out) that have no firm understanding of the actual market mechanics. You can often tell this right away just by the language and terminology used... Even before you get to the actual math or strategy.

Nobody will make any money in the markets relying on others' work. It's just how arbitrage works. If you want to succeed you need to be creative beyond what's already been done.

Once you're there you should follow the first rule of trading:

Don't talk about your strategy.

The second rule of trading is the same as the first rule.

I cannot emphasize enough how much you need to keep things secret in trading.


Actually, I think the author is in agreement with you that there is alpha. He apparently wrote this article to attract attention to his scheme for selling access to his cryptocurrency trading "bots", which he claims are guaranteed to be highly profitable once he finishes developing them. Certainly, his survey is not credible and does not appear any more genuine than his business offering.


Trading strategies come and go but selling shovels is forever :)


I have seen the same stationarity oversight in other fields, and it always blows my mind that the people involved get away with it.


In my view, the predominant mistake made by those who seek to create profitable strategies is that they approach trading as if the market is a zero-sum game. In particular, doing things that harm the markets, like naively adding to existing momentum, is just promoting price overshoot and instability by reinforcing positive feedback loops. Such approaches hurt others and while they might make money for long periods of time, they will almost surely end up losing all that profit and more during a small number of extreme market events.

If you want to be reliably profitable, you need to first understand how the markets are not a zero-sum game and then you need to construct methods to improve the markets with your trading. There are countless ways that markets deviate from truly efficient behavior. Find some and develop strategies in areas that can benefit from your cognitive, experiential, and educational strengths. General examples of how one might improve the markets include things like providing liquidity when it's needed, limiting price overshoots when it's warranted, and incorporating new information about instrument values. The market will pay you in return if you do such things in a sound way. As long as you also do a good job of estimating and limiting your risk, you can be consistently profitable.

There will be no significant public disclosures of detailed ways to trade profitably. The markets rely on the robustness provided by many different points of view addressing market needs in a variety of ways. Any parties that overweight one of those points of view will ultimately lose money in the process of adding market instability. There's too much of that already. Don't trade until you figure out how to make the markets better.


> you need to construct methods to improve the markets with your trading

What percentage of traders do you think will approach the market with such altruism?

I think the markets are way too complex to rationalize "improving a market". Markets are by definition good markets when you have long term and short term traders mixed in with technical and fundamental traders all with different alpha time horizons. This is a healthy market.

If you're a speculator, so be it. As you point out, the market may teach you a lesson at some point. Those guys go away but new ones will join.

It's a beautiful virtuous cycle.

Not to say there aren't bad actors. Most speculators are not, IMHO. But they cross the line when they undertake certain market activities, like spoofing for example. This is why good markets also have good regulatory oversight (FINRA, SEC...)


> What percentage of traders do you think will approach the market with such altruism?

Approximately 0%. I don't know of any entities that trade as a nonprofit. The point is that if you find something the market needs, you will have inherently discovered an opportunity that can make money because there is demand which exceeds supply for the service. As an additional benefit, such strategies are unlikely to run afoul of regulatory oversight or be eliminated through future rule changes.

> I think the markets are way too complex to rationalize "improving a market".

This is not rationalizing, it is about how to effectively identify profitable opportunities. Let's turn the argument around. Do you expect a trading strategy that harms a market to be reliably profitable without risking fines, banning, or imprisonment? If you don't, then it probably makes sense to exclude such approaches from your strategy search space. Additionally, I would argue that strategies which are of neutral benefit to the markets are likely to generate small returns relative to their risk because entities on the other side of your trades are not receiving value and thus the market is more likely to turn against you. If we eliminate "harmful" and "neutral", we're left with "beneficial".

I completely agree with everything else you say.


> Do you expect a trading strategy that harms a market to be reliably profitable without risking fines, banning, or imprisonment?

Non-trader here, what do you mean by a particular trading strategy "harm[ing] a market," how would you actually measure harm to a market by a given trading strategy, and would whatever metric you decide on for that not just be a dressed-up subjective argument?


By "harming a market", I mean "causing a market to be less efficient". It is typically easier to identify harmful behavior than it is to quantify the amount of harm caused. Sometimes you can easily estimate a rough lower bound on the harm's cost. Easier cases include 1) Spoofing, where you could estimate the harm as being greater than the captured profit, and 2) Self-trading for the purpose of receiving liquidity provider incentives, where you could estimate the harm as being greater than the incentives received. Harder cases include situations where poor strategies caused market instability by underestimating risk or trading in a way that is too similar to others. Extreme examples of these include the Long-Term Capital Management (LTCM) blowout in 1998, the 2010 Flash Crash, or any number of firms that contributed to the financial crisis of 2007-2008. You should be able to find plenty of academic papers attempting to quantify the harm of those.


What are good resources for reading up in very granular executional detail how others have made markets better historically with strategies that presumably are no longer delivering alpha because they've run their course for whatever reason?


I don't think you're going to find much execution detail that is credible in that context. There is plenty of higher-level analysis of such things. You might start with Andrew Lo's papers and work your way out from there.


This is fantastic advice.


Currently I remain skeptical: 130 papers in 7 months plus meaningful experiments is quite some going. A list of the papers (so at least the authors can defend themselves), the source code and data used (because some of these methods require social media inputs) would definitely help.

After doing so much work though, why wouldn't you go the extra small few steps to publish? That way the work can be peer reviewed and the Scientific community has a chance to learn from it.


Yeah when I got near the end of the post, I was quite surprised. You spent a boatload of time on this and all you do with it is make a PSA on reddit going "hey lol I looked at all these papers and they're all overfit". Someone in the comments asked about sharing the work and was brushed off with "nah, the code is a mess".

This reeks of unscientific work, exactly what s/he describes the papers to be. While the results are in line with everyone's expectations (hence it getting upvotes despite offering zero proof, people already believed it and like it being confirmed), and so I don't really doubt they did try to reproduce papers and failed, it also isn't a reliable source in the slightest. If they were serious about disproving the papers, they could just have written down the steps, even if it's just a few lines of notes with snapshots of the code for each paper.

Just a list of papers isn't going to help the authors defend themselves, the methodology (if it can be called that) is so vague that the authors would basically be starting a "he said she said" discussion.


Yeah the paper does not pass a basic smell test.

OP also claims he does not know what a meta-analysis is while doing all the work for what could be a great meta-analysis.

He also answers evasively when people ask for results or proof.


> That way the work can be peer reviewed and the Scientific community has a chance to learn from it.

After trawling through so much peer reviewed work which in their view is utterly broken, I can understand why they'd be despondent/mistrustful of submitting it to a peer review process.


If you want to go that route, why not write a detailed blog post? Or a white paper in an archive? Or a GitHub repository? Not putting anything out there seems awfully unproductive.


Best survey of the subject I have found (most are bullshit) is Finding Alpha by Eric Falkenstein, which he has graciously offered for free on his blog. By the subject I mean finding an edge in trading. Spoiler alert: there is no system to follow. He wrote algos for a local options market maker and had a significant Econ PhD thesis. The basic premise of the book is that real Alpha is rare, idiosyncratic and gets exploited by those in the know, and eventually the edge goes away after several years. He gives several classic historic examples, which are what made this book so interesting and unique to me. http://falkenblog.blogspot.com/2016/08/finding-alpha-pdf.htm...


I'm not surprised by this finding; too many smart people are working on the same problem.

When it comes to investing in public markets as an outside player, I've seen three moves which can work....

1) Identify a consumer product category going through the technology adoption S-curve, ideally something which isn't very subject to short term innovation cycles or disruption.

My wife spotted the trend of people spending more on pet drugs before we got married and invested. We've ridden it for well over 20 years...

2) Buy when the entire industry is a flaming wreck and there is blood in the boardroom, assuming there is a good reason for demand for their products to continue.

I've done this several times in natural resources (gold in the early 2000s, oil in 2016) and it worked out decently.

3) One I haven't executed yet; certain firms are basically bets that a specific event will occur, at which point the demand for their product will go bananas. Buy in the quiet period and sell once the event happens.

Requires great patience to execute well.


I wish it wasn’t this easy to get crypto-scam SEO articles to the front page of Reddit and HN.


THANK YOU. I feel like I'm taking crazy pills watching everyone eat up the article where HE ASKS FOR YOUR MONEY.


Wait, what did I miss? I thought the writing was bad (both contents and style) so I didn't click through to the medium article, is that where they ask you to invest in their stuff instead?


That's where it is. He then calls it a "set it and forget it" crypto bot. Yes, please forget about the $1,000 you just gave me...


Wait you DON'T want to invest your money into a black box crypto trading bot system with guaranteed returns?

For a community that prides itself on its critical thinking, HN is quick to laud a post with barely any substance when it confirms their preconceptions.


You think this one was clever, take a closer look at the next couple of posts that blow up here having to do with HN’s favorite category of “privacy hysteria.”

One thing you learn when you start your own side projects, everything on the internet is content marketing. Everything. And that’s not necessarily a bad thing since it aligns incentives. Expecting people to produce great content for you for free makes no sense.

In this instance however, the post is obviously made up and the product this guy is hawking is a scam. I wouldn't have a problem with the self promotion if he provided real value, but he didn’t.


HN's culture of critique often involves blindly praising anything that is critical in tone. This makes it easy to guerilla market anything as long as you criticize anything established and well-used/liked.

HN is of course not the only place that has this problem, but it is very obvious here.


How did you do 130 papers in 7 months? That’s just over a paper every 2 days.

What was the setup, how did you set up a pipeline? Was it R or Python? What was the data source?

I am more surprised by your productivity than anything else.


It was answered in a comment in that thread. Python+Pandas+Keras.

I’d also love source code + data for this; without it, it’s a claim with nothing to back it, yet.


I don’t think he actually did it. He linked to a crypto scam medium post lol...


> crypto scam medium post

Could you expand on that? I've only skimmed the article, but I don't see any "crypto scams" pushed in the article. The article is just about something the author seems to know about (algorithmic trading), applied to the cryptocurrency market. It does promote the author's project (why else would you write a Medium post), but in the worst case, that would be a normal scam and not a crypto scam.


The whole article just seems like an attempt to steal money from uninformed people. He starts by giving vague information about trading strategies in general, then links to an article about Renaissance Technologies as an example of successful algorithmic trading, then states that most trading bots aren't successful and that the crucial differentiator for deciding whose bots to trust is the person's professional experience. That is obviously a reasonable thing to do. However, the picture at the beginning of the article of him at a trading desk and his repeated mentions of his 7 years of experience as a trader, combined with the complete lack of any actual proof that his bots are profitable, make it seem as if he is just trying to profit off his previous experience.

He ends the article writing this:

"All you’ll need to get started is: 1. $1000 2. To press a single button to get the bots started"

Furthermore in the reddit comments in response to the following question: "130 papers re-implemented in 7 months? I'm blown away. Write a software engineering book about how you did it so quickly. Then write a self-help book about having enough motivation to see it through." he writes:

"100-hour weeks and a desire for a better life for the ones you love will get you there pretty quickly"

This guy seems like a complete fraud. I find it sort of sad that this has landed on the front page of HN with that many upvotes.


Oh don't get me wrong, I wouldn't trust the guy farther than I can throw him either. I just wanted to be a bit pedantic about "normal fraud" vs. "crypto fraud" (= ICO or similar).


As long as he's not even willing to publish the list of those 130 papers, I think it's OK to be a bit skeptical, yes.


There was a fair amount of overlap in the papers. Makes the testing much easier.

> So with the papers, I found as many as I could, then I read through them and put them in categories and then tested each category at a time because a lot of papers were kinda saying the same things.


This is one of those cases where I would have guessed this would be the case, but it's nice that somebody else spent their time to verify, since I'm unwilling to spend my time to do so. Also nice that they shared their experience with the rest of us.

If it worked, it wouldn't be published, or at least not until it stopped working.


> Also nice that they shared their experience with the rest of us.

Except they didn't share the results with the rest of us. I know you said "experience" not "results", but when disproving papers, the least you can do is write down three sentences about each paper as you go along reproducing them, noting what you are seeing, perhaps with a snapshot (just a zip file or so) of the code. This is calling a whole field nonsense (that everyone expects to be full of nonsense) without giving enough evidence for anyone else to dispute your claims.


OP makes highly dubious claims and does not pass a basic smell test:

ALL CLAIMS MADE ARE DUBIOUS: Implementing 130+ papers in 7 months is highly implausible. This would require:

- Insane productivity.

- Implausible access to pricing and news data resources (which are often not freely available).

- Expertise in machine learning, natural language processing, finance, and data science. OP had to implement financial, time-series and linguistic feature engineering pipelines, as well as infer the architecture and hyper-parameters used in all papers AND train all these models. ALL WITHIN 7 MONTHS. All while previously being a trader professionally, i.e. not likely an expert in many of these fields.

- OP also claims he "web scraped" all the data which is highly unlikely as price datasets are often sold for a pretty penny and not publicly available in the detail described in several of these papers.

- Down the thread, OP says he does not know what a "meta analysis" study is, all the while being capable of implementing 130+ papers. So someone who is an expert in ML, statistics, data science and finance does not know one of the most basic types of scientific study. All the while essentially engaging in a meta analysis study.

- OP describes himself as "a trader at a Tier 1 US bank" to lend credibility to his post: in itself that description is ridiculous and sounds like a naive attempt at instilling authority.

- When others encourage OP to publish results, he answers evasively: "probably a bit deep for a public forum but I was kinda glad to see the back of that work. It was an awesome learning experience but it's pretty soul destroying experimenting with tonnes of stuff that just doesn't work."

EVIDENCE PROVIDED: Non-existent.

All OP has to show for all this work is a hastily written Reddit post with dubious claims. There is no proof of the work done whatsoever, no code samples, not even result tables or graphs. The basic results he discusses are just commonly made criticisms of this line of research.

MOTIVATION:

At the end OP shills his cryptotrading bot. This post was likely all just made up to market his cryptotrading bot service. OP uses some common criticisms of market prediction research to garner authority as a whiz kid to attract people to his crypto scam.

What's worse many on HN and Reddit seem to gobble it up naively. Seemingly because OP is critical of something that is popular to criticize.


I read the post and .. he basically said he did it, and just claims the results (that are so tasty because they fit everyone's preconceptions). And that's all the information there is. Not even listing the papers that he supposedly reproduced.

And then at the end of the post, you get a link to a blog post about crypto trading bots, which isn't even relevant in context. It starts out reading like an intro/tutorial, but it ends with "all you need to get started is $1000 and to press this button" ...

Upvoted all the way to the front page? What the hell, HN. Flagged.


more important than this study if done correctly is the fact that he built a framework that can ingest all this data and that he had access to all these datasets

a typical hedge fund spends millions of dollars to build such frameworks and buy datasets. sure, most academic papers fail if you replicate them, but the framework and datasets are very valuable because you can eventually find something on your own or an improvement on existing ideas if you keep trying hard enough + there are other sources of ideas like quant research from brokers, ideas from platforms like quantopian etc. but yea in general if you have an outstanding idea that works, you would have little or no incentive to publish it. why would jim simons have his researchers publish anything when they can make money for him all day long every day ... just my 2 cents.


i asked the author where did he source his data, his reply was "i scraped data"

how can you scrape pricing data? not all of this data is in the public domain, otherwise there would be no Bloombergs or CapitalIQs selling data for millions (sure they're overrated and overpriced but still!). Or in other words, if he is right, he can sell data and make millions. no need to look for an investment strategy. just my skeptical side saying :)

you need clean data to accurately test ideas. for instance getting tick data is quite expensive. most universities have free access to Bloomberg, CapitalIQ etc. datasets, which is the reason professors can test ideas and also the reason some smart guys in the industry work for a university on the side


You can get data for free from places such as alpha vantage

https://www.alphavantage.co/


it's not enough data to test all of those 130 papers. he said he has also tested short term reversion trends. for that you need tick data (or at least minute/hourly data)


Agreed — something about this post doesn’t pass the smell test.

1. He’s a profitable trader at a tier 1 firm who has the spare time to not only develop a series of algorithms based on 130 research papers, but also sufficiently backtest them in 7 months?

2. He said he looked at the past 8 years of papers, but refers to multiple models correctly predicting the 2008 financial crisis.

3. Where are the code samples?

Edit:

Lol, just realized his medium post ends with a crypto scam.


Ditto on the "130 papers in 7 months." I am not familiar with the field, but I assume the process would look like this:

* Read and understand paper

* Find and download appropriate input data

* Code paper model and validate (he said he wrote his own code)

I can see myself being able to do this for ONE paper in maybe a week. He claims he was doing more than one of these every two days. Wow. So either there is some exaggeration on his part, or he is a total wizard in his field.


I think you are a quick study; typically it takes me a week to figure out the detail of what's being done in a paper. Getting the data and testing would take longer. Charitably he/she has a framework with all the data required sitting ready to go and is just writing wrappers to the models. But even downloading frameworks from Github and getting them working takes a couple of days - for me. For example I've been playing with the Graph-network code from deepmind for a few weeks - I had to learn how the graphs were represented, how to build them and access them and how the models were made and put together. Just working that out was a solid three day job. Now I can build things and test out what's going on in the examples and get a feel for the framework, probably (if there was a problem) I would be in a reasonable position to say "this doesn't work like they think it does" (it does, but no surprise) but unless you've done that leg work I think you can't really. I think a proper replication effort is really 1 man month of expert time - or really you're just throwing stones.


Depending on the complexity of the model it would take me at least a month for a single paper. What makes it fully unbelievable to me is the claim of detecting p-value hacking in many of these 130 papers while doing more than 1 paper every 2 days.

To make that claim for a single paper I would 1. have to be able to reproduce their p-value, and 2. spend enough time with the model to understand how/what assumptions were unfairly tweaked to get to that p-value.

Just running your own implementation of a model on your own dataset and getting an insignificant or different p-value is not enough. You might just have implemented the model wrongly.
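
To make the p-hacking concern concrete, here is a minimal sketch (my own illustration, not anything from the Reddit post or the papers). It generates strategies whose returns are pure noise and counts how many look "significant" anyway. The return distribution, the 250-day sample, and the |t| > 1.96 cutoff are all assumptions chosen for the example.

```python
import math
import random
import statistics


def t_stat(returns):
    """One-sample t-statistic for the hypothesis that the mean return is zero."""
    n = len(returns)
    return statistics.mean(returns) / (statistics.stdev(returns) / math.sqrt(n))


def count_false_positives(n_strategies=1000, n_days=250, threshold=1.96, seed=0):
    """Count pure-noise 'strategies' whose mean return looks significant."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_strategies):
        # Daily returns drawn from N(0, 1%): by construction there is no edge.
        returns = [rng.gauss(0.0, 0.01) for _ in range(n_days)]
        if abs(t_stat(returns)) > threshold:
            hits += 1
    return hits


if __name__ == "__main__":
    hits = count_false_positives()
    print(f"{hits} of 1000 zero-edge strategies pass |t| > 1.96")
```

Roughly one in twenty noise strategies should clear the bar, which is why a single significant p-value says little unless you know how many variants were tried. It also cuts the other way: one failed re-run on different data does not by itself show the original result was hacked.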


>Literally every single paper was either p-hacked, overfit, or a subsample of favourable data was selected

including methods that use:

>News Text Mining. - This is where they'd use NLP on headlines or the body of news as a signal.

I have to call this out.

Is this author suggesting that you couldn't have made money by shorting Enron stocks milliseconds after the scandal was made public? Is it impossible to make money by buying a stock in a small company, seconds after an acquisition is announced? If a CEO gets sent to prison, will that company's stocks not be affected?

And then there are other methods that use:

>Fundamental data. So ratios from the income statement/balance sheet

So buying stocks in companies with good financial health is not profitable?

Something's being left out here.


> Is this author suggesting that you couldn't have made money by shorting Enron stocks milliseconds after the scandal was made public?

No, he's claiming that any of the actual published systems, if they would correctly have made money on that one special event, would not do so on enough other events to make up for those they would lose money on plus transaction costs on all the trades they would make to actually beat a broad market index.

It's pretty easy to (with hindsight) design a system that would make money on Enron or any other isolated event. It's harder to build a system that will consistently beat the market on future events that it's not designed against.
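
The transaction-cost part is just arithmetic, so here is a rough sketch of it. The function names and every number in the usage note are hypothetical, and the cost model (half-spread plus commission, paid on entry and on exit) is a simplifying assumption that ignores slippage and market impact.

```python
def net_edge_bps(gross_edge_bps, half_spread_bps, commission_bps):
    """Net expected profit per round-trip trade, in basis points.

    A round trip pays the half-spread and the commission twice
    (once to enter, once to exit).
    """
    round_trip_cost = 2 * (half_spread_bps + commission_bps)
    return gross_edge_bps - round_trip_cost


def annual_net_return(gross_edge_bps, half_spread_bps, commission_bps, trades_per_year):
    """Simple (non-compounded) annual return as a fraction of capital."""
    per_trade = net_edge_bps(gross_edge_bps, half_spread_bps, commission_bps)
    return per_trade * trades_per_year / 10_000
```

With a made-up 3 bps gross edge, a 1.5 bps half-spread and a 0.5 bps commission, `net_edge_bps(3, 1.5, 0.5)` comes out negative: a small positive edge before costs turns into a loss on every trade after them.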


The important concept to understand about profitable investing is that you have to have a strategy that others are not also using.

Sure, investing in companies in good financial health is profitable. Unless everyone else does it too, and they drive up the price of the profitable companies, until all upside is gone (i.e., price is baked in). You're not better at finding profitable companies than anyone else.

Shorting stock on headlines? Sure, if you can beat everyone else. (You can't.)

The author is merely stating that, according to his analysis, apparently none of these strategies provided an edge over the market.


> The important concept to understand about profitable investing is that you have to have a strategy that others are not also using.

Not quite, because stocks tend to go up.

What really is hard is making more profit than just holding stocks would. Because that takes actual new ideas.


Stocks do not tend to go up, that would imply a greater than 0 return average for series. We instead get mean blur and skewness (which is actually often to the left).

The aggregate of the traded stocks, i.e. the market, goes up on average. That's why you make money by holding diversified portfolios.


"The important concept to understand about profitable investing is that you have to have a strategy that others are not also using."

People always say this, but it doesn't make sense to me.

If I go to the grocery store and buy produce, and I assume I'm not more knowledgeable than an expert purchaser for a food service business, then am I necessarily better off buying stuff at random without even looking at it? If I'm really clueless, can I not learn to choose good stuff by examining it and trying it, and finding out what other people look for?

The problem, I think, is that when people are looking for a way to beat the market, they, pretty much without exception, look for ways to process the predigested information about a select group of companies that is already in a structured form. That stuff is the information that's most absorbed into prices, so why not stop "looking for your keys under the streetlight"?

Precisely because the market is very efficient at pricing everything that people can quantify, you don't have to quantify much at all! If you can tell rotten fruit from fresh, then you are adding value and you can assume that everything that you find difficult to evaluate is already factored in.

What if you just spent 10 seconds looking at each business description of a public company's 10-K? Instead of looking at numbers at all, just treat your investing like you have a stack of several thousand resumes and you need to hire 20. Of course, you want to look at some numbers later on before buying, just like doing a background check on a prospective employee, but just looking at a broad cross section of what's out there is enough to observe really obvious and educational patterns.

Here's something I read in a 10-K recently:

"In July 2014 as part of a diversification strategy we acquired companies engaged in the manufacture and marketing of electro-hydraulic servo-valves and the development of optical fiber hardware and software solutions for the security and protection industry. However, in the fourth quarter of 2015, we decided to focus on our hog farming operations, sell our operations in electro-hydraulic servo-valves and optical fiber based security & protection and seek to grow through internal expansion and acquisitions of businesses in the agricultural industry."

Now, if the market is efficient, and I am not an expert in hog farming, servo valves, or investing then how can I tell if this company is a better or worse bet (at the current price) than, say, Apple?


I see where you're going with the grocery store produce analogy, but it's not a useful analogy for how markets work.

1. Those grocery store items (let's say tomatoes) are priced equally to each other, by decision of the supermarket. This is not true of securities, which are individually traded and priced separately, which allow differences to be priced in.

2. The market for tomatoes at your local grocery store is geographically constrained, limiting the number of participants. This is not true of publicly traded stocks that nearly anyone in the world can trade.

3. The amount of money that can be put to use finding efficiencies at your local market is small. In global markets, the payoff is in the billions, and so many people are scouring for similar efficiencies - and in the process, eliminating them.

4. The tomato market has high transaction costs. What are you going to do when you find a better tomato for the same price? Sell it to someone else for a higher price? No. The arbitrage opportunity for tomatoes doesn't exist.

Efficient markets is not a thesis that is required to be true of all markets by some sort of law. They are a consequence of liquid markets that are large and traded by many well-funded participants. The analogy must have the same factors to be valid.


they're probably not accounting for HFT. that's not unreasonable to expect.

> So buying stocks in companies with good financial health is not profitable?

everybody has the same common sense, prices reflect all available information (at least, if you believe the efficient market hypothesis, which I do to some extent). so you shouldn't expect the method to be profitable in excess of the overall market profit -- what we call alpha.


Friendly reminder: markets are only efficient if P=NP.

(Strong form market efficiency has been disproven already, so this is weak form efficiency).

That's enough to make me believe the efficient market hypothesis is BS.


The market is not perfectly efficient. Small cap stocks outperform large cap stocks; the S&P500 outperforms leaving money under your mattress.

I am more than willing to bet that if you combine these methods with an algorithm that estimates a stock's "proper price" with the information, a sophisticated algorithm should be able to at least outperform a layman's "Buy-and-Hold" strategy.


You can put your money on that, but you would pretty consistently, it turns out, be wrong. Which you would know if you had read the OP.

Profiting off of other people's tendency to trade too much and too confidently is some of the surest money in the market, because regardless of evidence people want to believe that they can positively affect the outcome. Nest eggs are more like soufflés than caramels.


>Profiting off of other people's tendency to trade too much and too confidently is some of the surest money in the market

This proves my point; this statement is counter to the efficient market hypothesis, and it shouldn't be too difficult to algorithmically find trigger events that cause people to trade too much and too confidently.


Published academic studies on the size factor go back decades. I'm not ready to consider them refuted because someone on reddit says he google searched papers from the last eight years and found them lacking, especially since he doesn't specifically claim to debunk any of the major factors, and the momentum factor he even confirms.


> the S&P500 outperforms leaving money under your mattress

This doesn't conflict with the efficient market hypothesis. The S&P500 also outperforms 'investing' in blackjack. No one is claiming that holding your money as cash is on the efficient frontier.


>No one is claiming that holding your money as cash is on the efficient frontier.

But it would be in an efficient market. As more investors invest in more profitable assets, the price of those assets rises, which makes the return on those assets fall relative to the initial cost. The Efficient Market Hypothesis, in its strongest form, implies that every asset is on the efficient frontier.


No, it won’t.

1) You’d still get a risk premium, because only known information can be priced into the stock. Unknown information is risk.

2) Capital is scarce. There is no unlimited supply, so it’s distributed between assets as well as available information allows. But there are still unsatisfied capital needs where the money can be employed more efficiently than holding it as cash.


So an investor that can anticipate an increase in, decrease in, or general level of a) market risk, b) a market's risk premium, or c) available market capital, can predict market movements. Just because CFAs use fancy names for market imperfections doesn't mean that they're not exceptions to the EMH.


> Small cap stocks outperform large cap stock

Yes, and value stocks outperform growth stocks over long enough periods of time too. But what does that have to do with the market not being efficient?

The academic explanation for why small cap and value outperform large cap and growth is that small cap and value companies are riskier investments. The factor risk premiums exist because investors need a higher return to reward them taking on more risk.

Risk premiums actually support the existence of an efficient market.

Additional reading:

- https://faculty.chicagobooth.edu/john.cochrane/research/pape...

- https://www.investopedia.com/ask/answers/022715/are-small-ca...

- https://www.investopedia.com/terms/v/valuestock.asp


> Is this author suggesting that you couldn't have made money by shorting Enron stocks milliseconds after the scandal was made public?

Enron's collapse was 18 years ago. I suspect if this happened today, with today's trading environment, the answer to your question would be "yes." The algos today will parse an article and enter & exit a trade faster than a human can read the headline.

> So buying stocks in companies with good financial health is not profitable?

That alone, probably not. You need to have an edge. If everyone else knows its financial health clearly, then the price is already "bought up."


This brings an interesting point regarding reproducibility in economics. It's possible for a paper to be legit and be good science, but the moment it's published it becomes irreproducible because other actors are going to use the published approach from now on and the balance in a game theoretic way is not the same. By publishing a paper you can change the thing you are studying.


This is what the author is talking about when he says "alpha decay." He tried to account for it with backtesting (so simulating a market that has no knowledge of these strategies) and the strategies still failed.


> That alone, probably not. You need to have an edge. If everyone else knows its financial health clearly, then the price is already "bought up."

This is untrue. FAANG (all healthy companies) have been outperforming the S&P500 consistently. In fact, investing in FAANG is probably the "dumbest" smart play you can make. And, alas, you still come out on top.


Yes, and Enron was incredibly healthy too. Netflix has recently seen a downturn. The "health" of companies is not a constant. Can you predict when it will sour? Or are you certain that these companies will never fail? If so, why?


It’s well known black swan events are not predictable (see Nassim Taleb), but this doesn’t mean FAANG doesn’t, on average, outperform the S&P500.

A sound strategy would also hedge against black swan events — so some money might be in gold or jewels or something. But that’s beside the point.


I think it's a bit of a stretch to say that FAANG stocks have "consistently" beaten the S&P 500. Most of the FAANG companies haven't even existed for long enough to draw meaningful conclusions from. The one that has (Apple) once underperformed the S&P 500 for 11 years from 1993 to 2004.


No one got fired for buying IBM.


false


The claim is that you can't do it consistently. Your sentiment detector has to more accurately capture the state of a randomly selected set of companies (not one selected with the benefit of hindsight, like Enron) based on news sentiment than the information already incorporated into the stock's price.


Right? Just like how you could have made money shorting Facebook right after their recent FTC fine came down. Oh wait.

Market is irrational in the short term. Hindsight is 20:20.


I think this is not rigorous enough to draw any real conclusions.

If he had done a proper job of reproducing he would have created a write-up of his work explaining his reproduction methodology. The next step would be to get his work peer reviewed.

I think only then would you come close to the amount of analysis and rigour necessary to discredit so many authors of (possibly peer-reviewed) academic research.

The fact that he mentions that he doesn't know what a meta analysis is in the comments suggests that possibly _his_ results might not be what he purports them to be.


Exactly, I also find it extremely dubious a "Tier 1 professional quant trader" implemented 130+ papers in 7 months.

This means obtaining the same data, reproducing feature engineering and hyperparameters. Implementing learning algos. Maybe the guy is genius and god-like in NLP, finance, data science and machine learning but even then 7 months is too little time.

I was amazed at how few people call out this obvious lie here.


Some of the papers I've seen are ridiculously obviously over-fitted.

For example published in 2018, but "tested" on 3 months of 2010 prices of GBP/USD, USD/SEK and USD/THB. Quality forex data is so easy to get freely, that picking 3 months from 8 years ago on one major pair and two other random minor ones just stinks.


Where can I get good quality forex data from?


I used TrueFX for an assignment in a statistics-for-finance class I taught: https://www.truefx.com/


They don't test the predictive power of their models?


Has anyone tried a simple approach - trying to predict which of the S&P 500 will have the lowest 10% returns, and build an (S&P 500 - 10%) index? It seems obvious that the S&P is stacked with some great companies and some old dogs. Does that method not work?


Since it's obvious, everyone knows it. And since everyone knows it, it's already priced in. You cannot find an edge by acting on widely-known public information.


I’m acting on the fact that almost all my investments are in the S&P 500, and I don’t have an (easy) way to pick my favorite 450 out of those 500. How would knowing the worst 50 be already priced in? They would be priced lower than they should? Good, let’s get them out of my index. How else?


They'd be priced low. There is no "should" when it comes to markets - the market price is whatever people will transact at.

The problem is that stock price movements depend upon future events - making money in the markets is effectively a future-prediction problem. So if your strategy is to discard the bottom 10%, great, you got rid of Foot Locker and Sears. You also would've gotten rid of Apple in 1998, which was responsible for a good portion of the index's gains over the last 20 years. And you would've kept losers like PG&E, which went bankrupt over a black-swan event (they were doing fine until they burned down a town).
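
A toy sketch of the "(S&P 500 - 10%)" idea, to show where it goes wrong. Everything here is hypothetical: the ten tickers, their returns, and the equal weighting (the real index is cap-weighted). The only point is that the drop decision can use past returns alone, while the payoff depends on future ones.

```python
def drop_bottom_decile(trailing_returns):
    """Return the tickers that survive after removing the worst 10%.

    trailing_returns maps ticker -> past return; only past data is used,
    because future returns are not known when the index is rebuilt.
    """
    ranked = sorted(trailing_returns, key=trailing_returns.get)
    n_drop = len(ranked) // 10
    return set(ranked[n_drop:])


def equal_weight_return(tickers, next_period_returns):
    """Equal-weighted return of the given tickers over the next period."""
    return sum(next_period_returns[t] for t in tickers) / len(tickers)


# Hypothetical ten-stock universe; "J" is the beaten-down name that rebounds.
past = {t: 0.10 for t in "ABCDEFGHI"}
past["J"] = -0.50
future = {t: 0.05 for t in "ABCDEFGHI"}
future["J"] = 0.80

kept = drop_bottom_decile(past)
print(equal_weight_return(kept, future))       # index without the "old dog"
print(equal_weight_return(set(past), future))  # full index
```

In this made-up case the trimmed index lags the full one because the dropped stock is the one that recovers, which is the Apple-in-1998 scenario in miniature.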


Stock prices are based on future expectations of earnings. This is why you can have a company like Amazon have incredible earnings, but still have the share price tank. I.e., expectations were that they would have even more incredible earnings, but they were merely incredible.

This is also how you can have companies that are on the verge of bankruptcy get huge stock gains if they defy earnings expectations.

As such, expectations of future results are already priced into stocks. It's literally the stock price. Whether a stock moves up or down is based on the delta between reality and expectation.


There are no quick easy hacks that give you reliable above market returns. If there were, enough people would use them that the pricing would correct for it because of demand and the above market return opportunity would disappear.


Intuitively the problem you run into is that occasionally those obviously bad companies have amazing comebacks and you completely miss out on those, so you can still end up underperforming the S&P 500.


Everyone can drive a car in a straight line looking into the back mirror. Trouble only starts on the first turn.


The stock market is a complex adaptive system where the agents are constantly changing their strategies so that even if you were to find inefficiencies or patterns, they are only ephemeral.


From the post:

> The easiest way to test whether it was truly Alpha decay or just overfitting by the authors is just to reproduce the paper then go further back in time instead of further forwards. For the papers that I could reproduce, all of them failed regardless of whether you go back or forwards. :)
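
The test described in that quote can be sketched roughly as below. This is my reading of it, not the author's code: the function names are made up, and the input is assumed to be a list of (date, strategy return) pairs from whatever backtest you already have.

```python
from datetime import date


def split_by_period(dated_returns, sample_start, sample_end):
    """Split (date, strategy_return) pairs into before / in-sample / after."""
    buckets = {"before": [], "in_sample": [], "after": []}
    for day, ret in dated_returns:
        if day < sample_start:
            buckets["before"].append(ret)
        elif day <= sample_end:
            buckets["in_sample"].append(ret)
        else:
            buckets["after"].append(ret)
    return buckets


def mean_return(returns):
    """Average return, or None if the bucket is empty."""
    return sum(returns) / len(returns) if returns else None
```

If the edge only shows up in the "in_sample" bucket and is gone in both "before" and "after", publication-driven decay cannot explain the "before" failure, so overfitting to the paper's sample window is the more likely story.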


This is why the truly successful quant groups like Renaissance continuously adjust their strategies and come up with new ones. Renaissance in particular has invested heavily into their data processing pipeline which enables them to have a significant advantage over the rest of the field.


Yes, if I remember correctly from an interview I saw with the founder he mentioned that the barrier to entry used to be high and that commodities markets 'used to' trend.


Interesting but it would be nice if the author would, you know, write up his/her own detailed analysis with replication steps and post on arxiv or something.


I got the feeling this was more of a case of "I did all this work for myself. Nothing useful came up, so here's what I found." The author may be ok with spending an hour sharing the findings, but doesn't want to spend more time than that.


Which honestly feels like a reasonable position. Though the untrusting, conspiracy-minded part of my brain wonders whether, if I did review 130 different research papers and found 1 that worked, I'd keep it a secret and just tell the world that they were all crap.


Maybe he could work with someone else to get it to a publication stage? It's super interesting, and combats a well-known problem that you get a lot more out of publishing positive cases (we found a correlation) vs negative cases (we found no correlation), although they're both valuable knowledge.

Look, I know writing stuff up sucks. But it could be a great opportunity to learn a lot of things from a very knowledgeable person. With the right prof, it could be a great undergrad project.


>Nothing useful came up,

Except plenty useful came up. "Literally every single paper was either p-hacked, overfit, or a subsample of favourable data was selected" out of 130+ is a significant result. The media would jump on this, and they just might even without any proof of work.


I was referring to the word "useful" within the context of the author's hypothetical goal, not the parent poster's (i.e. a strategy useful for making money as opposed to sharing knowledge).


Which is mildly interesting, but not something other people should use for investment decisions.


I use the "everything is at least 10x greater on the internet" rule.


The author is trying to sell a crypto trading bot! Looks shady to me. If anything guarantees you profit, run away from it as fast as you can: https://credium.io/

https://towardsdatascience.com/crypto-trading-bots-a-helpful...


"The most frustrating paper:

I have true hate for the authors of this paper: "A deep learning framework for financial time series using stacked autoencoders and long-short term memory". Probably the most complex AND vague in terms of methodology and after weeks trying to reproduce their results (and failing) I figured out that they were leaking future data into their training set (this also happens more than you'd think)."

- Not sure how the author tried to implement it, but isn't this how you train LSTM networks, by feeding t+1 data back into the cell to predict t+2 data? It would be easier to tell if the author had made it open source as well.


I wrote my thesis last year comparing different RNNs against each other using this exact paper as baseline and basically concluded that you would be better off predicting the price yesterday than using their results. Authors did not respond when prompted for implementation details or comments.

Overall, concluded that amongst RNNs the GRU architecture proved most favorable but still would not outperform simple stochastic models of the financial industry toolbox.

You can check it out here https://github.com/jensgrud/financial-forecasting-lstm/tree/...


Leaking future data in would be using t+1 for t, e.g. something like a bi-directional LSTM. I assume he means the actual training dataset had some kind of signal in the data that was also in the test data.
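
Here is a minimal sketch of what "using t+1 for t" does to a backtest, on synthetic data only. It is an illustration of the general failure mode, not a reconstruction of what that paper did; the noise series and the feature names are assumptions.

```python
import random
import statistics


def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(42)
returns = [rng.gauss(0.0, 0.01) for _ in range(2000)]  # pure noise, no signal

target = returns[1:]          # what we want to predict: the return at t+1
clean_feature = returns[:-1]  # return at t, known at prediction time
leaky_feature = returns[1:]   # return at t+1, i.e. the answer itself

print(correlation(clean_feature, target))  # close to zero
print(correlation(leaky_feature, target))  # essentially 1.0
```

Real leaks are rarely this blatant (think scaling with statistics from the full series, or an off-by-one in a window), but the effect is the same: the model looks predictive on data that has no signal at all.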


People do this by doing things like testing that their features contain information in both the training and test set. Because they are not exposing the data directly to the classifier they think that they haven't compromised the test set - but what they have done is increased the chances of a chance correlation.


I could bet a ton that most people will make excuses as to why the papers failed. There's something within us that wants to hit the stock market lottery.

I truly believe that there are streaks of profit in the stock market in the same way you will find streaks in any set of random numbers, but they are impossible to find in a consistent manner. The road to wealth for most in the stock market is time and investing in a basket of good stocks.

If you think you have found a system for profiting in the stock market, test and retest your method a few times. It's unlikely you have a winning system.
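
The point about streaks in random numbers is easy to see in a quick simulation. This is a rough sketch under my own assumptions: 250 fair coin flips stand in for roughly a year of daily win/loss outcomes, and the seed and names are arbitrary.

```python
import random


def longest_streak(outcomes):
    """Length of the longest run of identical consecutive values."""
    best = current = 0
    previous = object()  # sentinel that compares unequal to every outcome
    for value in outcomes:
        current = current + 1 if value == previous else 1
        best = max(best, current)
        previous = value
    return best


random.seed(0)  # fixed seed so the run is repeatable
# "W" = winning day, "L" = losing day, each with probability 0.5 (assumption).
flips = [random.choice("WL") for _ in range(250)]

print(longest_streak(flips))
```

Even with no edge at all, a sequence this long will usually contain a run of several wins or losses in a row, which is exactly the kind of pattern that looks like a system in hindsight.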


But there is a foolproof way of profiting from the stock market. Insider trading! I find it fascinating that people do not believe it occurs at a grand scale given the low risks and huge rewards. Exactly like how people believe athletes don't use steroids so they get all upset when every once in a while one is caught. :)


One of my favorite not-quite-conspiracy-theories is that all of quants and crazy trading algorithms are just to provide cover for the insider trading.


The first hedge fund, Jones & Co., which launched in 1949, hired business writers who would consistently "find" profitable trades. The logic was that Jones & Co. assumed that they had a line to insider information.

Yup, you can make a consistent buck if you know the right people with the right info. That system will always work.


No, there are trading strategies that do work, over multiple years, and can then be adjusted and refined to work even longer.

But obviously nobody is going to publish such strategies. You’re looking at a negative selection of papers.


Describe one. I know if there are any current ones no one will talk about them. But have there been any consistent ones in the last 100 years? I'm sure someone would have at least written about them in their memoirs.


I don’t know any good ones... AFAIK value, size signals, merger arbitrage, and other kinds of predictions used to work well in the context of statistical arbitrage... but yeah, their existence is evidenced by successful companies / hedge funds (Renaissance, Citadel, Jump, Two Sigma, ...)



Wouldn't any given approach rapidly lose efficacy as soon as it's published?

I would even guess that a paper being published means that, at the point the paper started to be written, its alpha had already decreased to zero. Otherwise the writers of the paper would still be using that approach. That's how it can appear to provide no value even if you extrapolate it back in time.


The author mentions this, and said he tested for "alpha decay" by applying the method to datasets that preceded the data on which the model was tested/trained.


I barely skimmed this https://journals.plos.org/plosone/article?id=10.1371/journal... for the "most frustrating paper" but how did he determine that they were leaking "future data"?


TLDR, no results, no code, no details, just admission of failure from anonymous redditor with SEO link on his unrelated crypto-trading project.


Why is this getting downvoted? The original post is clearly marketing clickbait. It's trivially true that most "predict the stock market" papers are going to be bunk.


It’s baffling to me that anyone would believe this post. Do people here not have an ounce of skepticism?


I think people aren't skeptical about this because it lines up with what they already expect. That's the case for me.

If the guy is lying and it's all made up, I'd still wager that if you assembled a collection of 130 papers on this topic, at most 10% would be valid [1]. So even if he's wrong, he's probably not wrong.

[1] I have an academic background, and I think invalid papers make it through peer review constantly. Peer review is not what people think it is.


I tend to follow Jack Bogle's advice. Jack made many people wealthy.

His advice? "Nobody knows nothing." (Read Bogleheads.org to see how to leverage this into wealth.)


yeah but you can make a lot of money preaching technical analysis to your congregation a few times per week

protip: a fibonacci retracement from a randomly selected extreme will always tell you something

protip: it takes 5 months for your congregation’s account to get eaten up from transaction costs when their stop limits keep getting hit

in the mean time you can just play TA roulette and they’ll always be impressed by your “uncanny” perceptive abilities


If you know that 5.6% of the users (automated, or parroting the preacher) of an exchange will use fibonacci retracements, and that 15% of the amateur market will follow the market price change caused by either buying or selling activity, then you can play roulette with a decent edge. Of course, not as much as the preacher, who is allowed to bet before (s)he will speak to his or her congregation.

When you gather enough of these commonly used technical analysis techniques, it's like having to predict in which startup Ron Conway will invest, but you can calculate Conway in a Python one-liner, and keep up-to-date by going to the weekly sermon.


yes, this is possible: as more trades pile in it becomes a self-fulfilling prophecy, and you keep stop limits in place for all the times it doesn't work.

I wish the technical analysis flock would merely incorporate different things. TA takes a time series chart and imagines time series patterns. It primarily neglects what didn't get printed on a time series chart, and why. How big are the orders at the resistance level? Do you have a record of the order sizes that appeared at the last resistance level? Who is selling at the resistance level, and why? The TA answer is "just because it's a psychologically round number for the resistance price" or "because that's how high the last high candle was", but you can greatly improve your win rate by understanding who is in the market and why, which is possible to understand and is a large part of my trading strategies. It can be much more data intensive, though, so I can see why 1980s gurus did not do it.


Some of the smartest people I've ever known from college became professional stock market investors and traders. Not a single one of them has significantly outperformed the market.

I think of stock market investing as a white collar gig economy job. Both depend on some central entity, which makes a killing (brokerages in this case), and have extremely low barriers to entry.


Hasn't it recently been shown that only strategies using simple momentum-derived technical indicators were able to consistently bring returns in the stock market?


Not just recently, it's been known for a while. The author actually confirmed it:

> Almost every instrument is mean-reverting on short timelines and trending on longer timelines. This has held true across most of the data that I tested.

(But momentum isn't the only thing that appears to work.)


He didn’t state the time frame things are predicting. Going long is still the best way to beat any hedge funds or “pros”


Source OP deleted the post. Is there a mirror?


To OP (janny_kul@HN and chiefkul@RD): "Show me the code! Show me the code" - in obvious Cuba Gooding Jr. voice

Until then I call this a fraud


Buy low, sell high. Focus on reliable, proven stocks. It's not hard. Buy on bad days, sell on good days. Don't overcomplicate things.


Disappointed that this made it to the front-page of HN. Had it show up in my push feed because it was so popular; quite upset to click it and see the comments in here falling for someone who's a crypto scammer:

- Redditor for 28 days; first post made to /r/investing and contained self-promotion for a website that functions as a search engine for stock market fundamentals (post removed by moderators, take a look using https://ceddit.com)

- Has made a few, low impression comments across other trading subreddits (/r/algotrading and /r/securityanalysis)

- Claims to have been a trader at "multiple Tier-1 US banks" for 7 years but a glance at their LinkedIn shows less than five and a half years (65 months) between two organizations

- Claims that they found, analyzed, and reproduced "130+" academic papers in only "7 months". Even by superhuman standards, the outlook for completing this much work (at high quality) is grim. Here's some napkin math:

7 months * 30 days = 210 days
16 hours (superhuman) * 210 days = 3,360 hours
3,360 hours / 130 papers to reproduce = 25.8 hrs per paper

These numbers are intentionally rough to show that even if we're being generous, the idea that any one person could somehow recreate a full academic paper in less than two 16-hour working days is absurd. Add to this OP's claim of spending "weeks trying to reproduce [a paper's] results" and the task becomes even more daunting.
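
For anyone who wants to check the napkin math, here it is as a few lines of Python. The inputs are the same rough assumptions as above (30-day months, 16-hour days, 130 papers); the variable names are just for illustration.

```python
months = 7
days_per_month = 30      # rough assumption from the napkin math
hours_per_day = 16       # the "superhuman" schedule
papers = 130

total_days = months * days_per_month
total_hours = total_days * hours_per_day
hours_per_paper = total_hours / papers

print(total_days, total_hours, round(hours_per_paper, 1))
```

At a more ordinary 8 hours a day the same calculation gives roughly half that budget per paper.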

This is possibly the biggest red flag of all -- or OP is the world's first (and only) 100x developer.

- Does not provide any empirical data to back their claims, only provides the name (not a link) of one paper, and when asked in comments to provide more information -- fails to deliver.

This post desperately wants to come off as a research article but it's missing all the fundamentals to make it so. If OP's claims were true, why wouldn't they post their raw findings? We could make the argument that there's far too much to post, since 130 papers would be a good bit of code, but there's no reason that OP couldn't provide a listing of the papers they "reproduced" at the bare minimum.

OP has made extraordinary claims: most if not all "predicting the stock market" papers are fraudulent, but has failed to provide any supporting evidence to back this up beyond their own words. As someone who just analyzed well over a hundred papers and postures themselves as a data scientist, OP should know that citing yourself doesn't fly in this scenario.

- OP ends their post with the following: "I try to write a bit on medium even though I'm not a great writer if you wanted to read more from me."

There's absolutely nothing wrong with self-promotion, provided you're a quality creator and are transparent about what is being promoted. OP is neither of these things.

Clicking the Medium.com link will take you to a blog post titled "Crypto Trading Bots · A helpful guide for beginners [2019]". If you're like me, you might assume this is a tutorial related to crypto trading bots. Perhaps information on setting one up, coding one yourself, or an overview of the landscape. It is none of these things.

Much of the article explains how cryptocurrency trading bots work -- which is great, but quickly goes down the path of telling the reader that most bots are garbage and won't return a profit. Near the end of the post we're advised how to choose a viable trading bot and are provided with three questions to ask ourselves:

1. What is the professional experience level of the senior leaders of that firm?
2. Are their algorithms widely known and openly available to anyone?
3. Is their success aligned with your success?

Immediately after these questions are the following lines: "Unfortunately, choosing a trading bot to go with isn’t as trivial as answering these three questions. In my opinion, everything ultimately comes down to people."

What people, you ask? Perhaps someone like OP, who happens to be the founder of a trading bot platform. The post finishes off with a not-so-subtle advert for his company, along with the extraordinary claim that it will "take full responsibility for the profitability of our clients". According to the OP, all you need to get rolling on his platform is "$1000" and "to press a single button to get the bots started", never mind that the platform hasn't launched and AFAIK there's no start button to press.

Clicking through to the platform's website will bring you to a scroll-jacked landing page full of marketing fluff. Scroll further down (or click the "Get Started" button) and you'll see a pricing table with only one option currently available: pre-order a "$145 single fee lifetime license". Compared to the two unavailable plans, $145 is a steal -- the next plan down would run you $660 ($55*12) a year. Combine this with the "First 90 days profitability or money back guarantee" and the whole damn thing sounds like an incredible deal. But you better act fast because this offer is only available to the "first 1000 members".

OP's other trading bot articles aren't much better and in my opinion, directly promote his platform.

~~~

The whole thing sets off numerous alarm bells in my head -- and it should for you too.

A trader who worked at "Tier 1 US banks" should know that guaranteeing the profits on your first 1,000 customers is not only ridiculous but so ambiguous as to be useless. Every developer on this site should know that not a single one of us would be remotely capable of maintaining a death-march level working pace for 7 months, launching a (credible) startup, followed by making non-zalgoized Reddit posts lacking any reference to the void.

OP isn't an OG superhuman developer who worked for a bunch of big banks and learned all the secrets. They're a former trader turned wantrepreneur that's resorted to dirty tactics to promote their venture. As far as I'm concerned, and maybe it's inflammatory, but OP is a liar. Nothing more.


Technical analysis is pseudoscience.


Let's say an event occurs and you know a particular stock will go up 20%, but you pump enough money into the stock to make it go up 30%. Then you let others chase the stock and publish news about the move through media houses / content spamming / fake accounts. You short the stock once you get the desired movement, and finally you pull your money out so that it goes into free fall and everyone starts selling. You close the short position when you notice there is no room for the stock to go further down. Now let's say the stock ended up at 10%, so you buy more until it sits at 15%, just below what you initially predicted.

Assuming you've billions to move the stock.

Why would such a strategy not work?


Such a strategy does work, and it's also illegal. It is called market manipulation, specifically 'pump and dump': https://en.wikipedia.org/wiki/Market_manipulation


> Literally every single paper was either p-hacked, overfit, or a subsample of favourable data was selected (I guess ultimately they're all the same thing but still) OR a few may have had a smidge of Alpha but as soon as you add transaction costs it all disappears.

I could have told you that without testing. If anyone had a lucrative strategy would they disclose it in a paper to the general public? I think not.


Proof can be more useful than a hunch.


And the author has shown none.


CAPM, Fama French Three Factor, or five factor? Are you serious? They don’t work anymore but they did at one point. The foundation of modern finance is built on published Chicago school papers.


From what I've read, they don't work quite as well but they do still seem to work. Economists have spent a lot of time trying to figure out why. Generally, they come up with behavioral explanations, like recency bias, and structural ones, like agency issues.


This is unsurprising. P likely is not equivalent to NP [0], and predicting the market is NP-hard [1]. It's nice to see empirical work in the field, though, and especially nice to see reproductions of published papers.

[0] https://www.scottaaronson.com/papers/pnp.pdf

[1] https://arxiv.org/abs/1002.2284

Edit for downvoters and repliers: If enough market participants are irrational, then it can still be possible for people to predict other people, instead of the market, and make money that way.

NP-hardness indeed doesn't rule out heuristic approaches, but experience with 3-SAT and other NP-complete problems suggests that there will be arbitrarily bad times, and that in those times, the amount of loss can be exponential in the length of time that the heuristic poorly predicts the market.


Your second link doesn't say predicting the market is NP-hard. It says the opposite: that the market is only efficient if P=NP.

According to the paper, if you don't believe that P=NP then you believe that the market is inefficient, which means there's profit to be made. The paper even suggests how.


What do you mean? People are successfully making millions or even billions on predicting the market. Seems like a stretch to relate this to P=NP


You are talking about writing an algorithm that has a 100% accuracy on an NP-hard problem, and taking the impossibility of this to discard an approach that may yield 58% accuracy.
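
To put a number on that: under the simplifying assumption (mine, not the parent's) that wins and losses are the same size, a 58% hit rate has a positive expected value per trade. The `cost` parameter is a hypothetical per-trade transaction cost, included to show how quickly a small edge can disappear.

```python
def expected_edge(hit_rate, win=1.0, loss=1.0, cost=0.0):
    """Expected profit per trade, in units of the stake.

    hit_rate: probability the prediction is right
    win/loss: payoff sizes when right/wrong (assumed symmetric by default)
    cost:     hypothetical transaction cost paid on every trade
    """
    return hit_rate * win - (1.0 - hit_rate) * loss - cost


print(expected_edge(0.58))             # positive: a real edge before costs
print(expected_edge(0.58, cost=0.20))  # negative: costs exceed the edge
print(expected_edge(0.50))             # a coin flip has no edge
```

So an imperfect predictor can still be worth trading on, but only if the edge survives costs, which is the same caveat the original post raised.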


Philip Maymin seems like a serious guy... but that EMH ↔ P=NP paper is absolutely not even remotely a proof. Was genuinely very curious and it's at best an intuition. Some claims, e.g. Knapsack and 3SAT are (almost?) isomorphic to the efficient market hypothesis, are pretty bold. And the justification is hand-wavy at best.


It's not just hand-wavy. It's outright wrong. The size of the search space isn't just exponential in the amount of history -- that's the number of possible histories. The number of mappings from histories to positions is doubly exponential, so they can't even be uniquely described in less than exponential space.
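
The counting argument can be made concrete. This sketch assumes, for illustration only, that each period of history is a binary up/down move and that a strategy assigns one of three positions (long, short, flat) to every possible history. Under those assumptions there are 2^n histories and 3^(2^n) strategies, so just naming one strategy takes a number of bits that grows like 2^n.

```python
import math


def count_histories(n_periods, moves=2):
    """Number of distinct histories of length n_periods."""
    return moves ** n_periods


def count_strategies(n_periods, positions=3, moves=2):
    """Number of mappings from every possible history to a position."""
    return positions ** count_histories(n_periods, moves)


def bits_to_name_a_strategy(n_periods, positions=3, moves=2):
    """Bits needed to index one strategy: ceil(moves**n * log2(positions))."""
    return math.ceil(count_histories(n_periods, moves) * math.log2(positions))


for n in (1, 2, 3, 10):
    print(n, count_histories(n), bits_to_name_a_strategy(n))
```

The exact constants depend on the assumed encoding, but the shape does not: histories grow exponentially in n, and strategies grow exponentially in the number of histories.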


He's not a serious guy at all, he's a nutjob who likes listening to himself talk.


Why must serious guy and nut job be my only two options? And why must they be mutually exclusive?


Warren Buffett must have a secret proof of P=NP then, that would explain how he has made billions trading.


Warren Buffett is an interesting case, because his approach seems largely to consist of ways to reduce the N of what he is considering: choose a few bets, bet big, hold for the long term, choose bets where you already understand the market. Sampling, rarely recalculating and selecting subsets of N where the related data is already in cache are all well-understood techniques for addressing computationally-hard problems.


He's done a good job of avoiding expensive mistakes, not being pushed to invest cash in something marginal just because it's sitting there.

Half the battle is not doing something stupid.


NP-hardness is irrelevant here because algotrading is not 100% formalizable.



