Show HN: How I built a trading signal by scraping Nasdaq for short interest (quantopian.com)
89 points by fawce on April 5, 2013 | 51 comments



My comment in the last thread opened with a post from this source:

"Past performance does not guarantee future results" is still the operative principle here. Data-mining discovers patterns, but it doesn't lead to deep insight into causes, and markets are perturbed by many events that you don't put into your training algorithm. "The market can remain irrational longer than you can remain solvent" is still important investment advice.

You can never build a trading signal just by scraping historical data, unless you like losing your shirt.

Can you tell I'm reading Antifragile: Things That Gain from Disorder just now? I'm very sensitive to errors in statistical thinking today.


You can do it; you just need to evaluate your algorithm only on data it hasn't been trained on. The same as with any machine learning problem, really. Though it is indeed dismaying how many programmers dabbling in ML tend to do so without any scientific rigor...

Whether the stock market can or cannot be predicted in the short term based on its past is another question... but I've been gathering some convincing evidence that it cannot (as in, its variations have no intrinsic structure; though it can still be predicted based on various external factors).


Cross validation doesn't change the fact that you're trying to predict a non-stationary distribution. Machine learning techniques generally make an assumption that your samples are being drawn from at least a close enough distribution to any future data that may enter the system. With problems like text classification in NLP you can generally make a safe assumption that, though language does change, it changes slowly enough that you don't have to worry. In other cases a model may simply need to be retrained after a certain period of time. Even in text classification real world systems will often incorporate an unsupervised novelty detector as well to indicate when the models may need to be retrained.

Additionally, normal cross-validation (random or stratified sampling) does not work on time series data, since you are in effect cheating by training on events from the future, which your model would not have seen at prediction time. In order to test your model on time series data you would need to hold out a significant chunk of data at the end of the time series.
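
A minimal sketch of that idea in plain Python, assuming the samples are already sorted by time. The function names `chronological_split` and `walk_forward_splits` are made up for illustration and do not come from any library:

```python
def chronological_split(samples, holdout_fraction=0.2):
    """Split time-ordered samples, keeping the last chunk as the test set."""
    if not 0 < holdout_fraction < 1:
        raise ValueError("holdout_fraction must be between 0 and 1")
    n_test = max(1, round(len(samples) * holdout_fraction))
    cut = len(samples) - n_test
    return samples[:cut], samples[cut:]


def walk_forward_splits(samples, n_folds=3):
    """Yield (train, test) pairs; each test fold lies strictly after its training data."""
    fold_size = len(samples) // (n_folds + 1)
    if fold_size == 0:
        raise ValueError("not enough samples for the requested number of folds")
    for k in range(1, n_folds + 1):
        train_end = k * fold_size
        yield samples[:train_end], samples[train_end:train_end + fold_size]


# Stand-in for 10 time-ordered observations (hypothetical data).
prices = list(range(10))
train, test = chronological_split(prices, holdout_fraction=0.2)
folds = list(walk_forward_splits(prices, n_folds=3))
```

Unlike a random shuffle, no training sample here is ever later in time than a test sample.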

If you're interested in the general case of predicting the stock market you might enjoy reading up on the Efficient Market Hypothesis[0] which deals quite well with just that question.

[0] http://en.wikipedia.org/wiki/Efficient-market_hypothesis


Indeed. One reason why ML is quite powerless when looking at the stock market (from the point of view of past stock values) is that, even if you assume there is a latent model to be learned, it would seem reasonable that this model evolves as fast as the economic landscape, which is to say, quite fast. To such an extent that the valid data points available (the past few years of price variations) would not suffice to train even a very low-dimensional latent model.


what's an example of an external factor? maybe I can find it on quandl and backtest it.


The stock market is primarily driven by the news, both on the ultra-short term (minute scale) and at the scale of a few days. So, scraping the web and performing semantic analysis of investor sentiment can be a good way to get an edge in the game.

Though in order to be really effective you'd need to do it before Wall Street traders' reactions adjust the prices, i.e. get access to Bloomberg's B-Pipe [1], do real time semantic analysis and place orders with ultra low latency. Which quite a few trading firms are already doing...

Other factors include P/E ratio (only accurate in the longer term), div yield, recent growth...

[1] http://www.bloomberg.com/enterprise/enterprise_products/data...


Back testing is a real bitch. I've been building my own app for back testing recently, my specific interest being how published insider buys (SEC Form 4 transactions) affect the prices of stocks in the short, near, and long term. You can get dividend data and stock splits easily enough from some public feeds. But where do you get a database of ticker changes, bankruptcy events, and spin-offs, especially on the OTC markets? You can't unless you're willing to shell out a lot of money. Back testing properly is probably out of the cost range of the individual investor.

Some examples:

* Lehman's ticker changes on the way down

* GM going bankrupt and then coming back from the dead!

* Skye International used to trade under SKYY (at 0.35c/share), but now SKYY tracks a cloud SaaS ETF (at 20.60/share). Think you got a big win using a strategy that included buying SKYY? Think again!
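
One way to guard against the reused-ticker trap is to resolve symbols by date instead of by ticker alone. A rough sketch follows; the table layout, dates, and security IDs are invented for illustration and are not real listing dates:

```python
from datetime import date

# Hypothetical symbol history: (ticker, first_day, last_day, security_id).
# All dates and IDs below are placeholders, not real listing data.
SYMBOL_HISTORY = [
    ("SKYY", date(2005, 1, 1), date(2010, 12, 31), "skye-international"),
    ("SKYY", date(2011, 7, 1), date(9999, 12, 31), "cloud-etf"),
]


def resolve(ticker, on_day, history=SYMBOL_HISTORY):
    """Return the security a ticker referred to on a given day, or None."""
    for symbol, start, end, security_id in history:
        if symbol == ticker and start <= on_day <= end:
            return security_id
    return None


old_security = resolve("SKYY", date(2008, 6, 1))
new_security = resolve("SKYY", date(2013, 4, 5))
gap = resolve("SKYY", date(2011, 3, 1))  # no listing in this made-up gap
```

A backtest that joins prices on `security_id` rather than on the raw ticker would not splice the two unrelated price histories together.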


I've been working on a similar strategy after having read Nejat's book: http://www.amazon.com/Investment-Intelligence-Insider-Tradin...

The plan is to derive trading signals from insider purchase data while taking into account the insider's relative risk-aversion (estimated from age, salary, sex). At this point I'm just trying to recreate Nejat's results. Data quality seems to be an issue (stock splits aren't recorded in the Yahoo data).

If you would like to collaborate or trade ideas message kal00ma on reddit.


You're speaking truth. We (quantopian) deal with all of those headaches, and test the algos with fully adjusted data. Splits, symbol changes, mergers, divestitures, dead companies, dividends - they're all covered.


Have you looked into CRSP (Center for Research in Security Prices)? I know they have all that data, but I'm not sure how much it costs. Probably not profitable for the average retail trader.


tickdata.com has split-adjusted, survivorship-bias-free data. It isn't cheap, however. You'll still need to go elsewhere for the SEC filings though.


I have for two years now been playing around with algorithmic trading as a hobby, and I am amazed by people who think wave riders or simple mathematical transforms will get them profits in the market. I have found that the best method is still a good mix of modeling and trader input. I don't think a model exists that you can just turn on and have it print you money. So attempts like this to make one of those really are a waste of time. Your systems should be tuned to listen to you and then take what input you have and do what you cannot (make decisions in sub-second windows).


I don't know where you're getting your data from, but I know of at least one high frequency algorithmic trading firm that makes ~$1B a year using mathematical models. The models aren't simple, but they're entirely automated and they behave exactly the opposite of how you describe them: you turn them on and they print unbelievable gobs of money.


High frequency trading is a different game, though: quoting from an excellent article [1]:

"Most HFTs run a market making strategy. What this means is they play both sides of the table - they take no position on whether a stock will go up or down. Instead, they try to offer securities both to buy and sell. If you want to buy, they will sell to you at $20.10. If you want to sell, they’ll buy from you at $20. As long as their buys and sells match don’t get too out of whack, the HFT will collect $0.10 = $20.10 - 20.00."

[1] http://www.chrisstucchio.com/blog/2012/hft_apology.html
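
The arithmetic in that quote can be sketched as follows, using the $20.00 bid and $20.10 ask from the article. The 1,000-share volume is a made-up number, and the sketch ignores fees, rebates, and inventory risk, so it is an upper bound on a toy case, not a model of a real desk:

```python
from decimal import Decimal


def spread_per_share(bid, ask):
    """Gross amount captured when one share is bought at the bid and sold at the ask."""
    return ask - bid


def gross_spread_income(bid, ask, matched_shares):
    """Gross income when buys and sells match exactly (no leftover inventory)."""
    return spread_per_share(bid, ask) * matched_shares


bid, ask = Decimal("20.00"), Decimal("20.10")
per_share = spread_per_share(bid, ask)
income = gross_spread_income(bid, ask, 1000)  # 1000 shares is a hypothetical volume
```

`Decimal` is used here only to avoid binary floating-point noise in the cents.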


Anybody else think this is, like, inherently bad? I mean making money from nothing, producing nothing, doing no service to anybody. The only way you could possibly get that billion while doing nothing is to take it from other people, essentially stealing it. Why is this legal?


Is your insurance company stealing from you when your house doesn't burn down?

When someone buys or sells a security, they take on risk in exchange for money. When someone is on the other end of the transaction, they give up money to reduce their risk. That seems fair to me. Just because it's a zero-sum game with respect to money doesn't mean it's a zero-sum game in totality. It's just that you can buy fewer iPods with risk than you can buy with cash, so you mentally overvalue cash and undervalue the lack of risk.


RenTech doesn't do "nothing". They do a ton of research and write a lot of code. Then, they purchase securities from people who want to sell them, and sell them to people who want to buy them. Why is this more palatable if there is a trader in between?


They create a more efficient market, ensuring, for example, that the future price of a commodity matches the spot price when the future expires.

They provide an anonymous financial service.


To a financially impaired mind like mine, those look like great explanations. Thanks to all of you who responded; now it does seem a little more fair.


They aren't producing anything, but they are providing a service. Algorithmic futures traders are typically uncorrelated with, for example, the S&P, so pension funds use their services to smooth out volatility and generate better returns.


Not knowing the exact firm, I would still debate that it's purely mathematical. _Every_ professional I have talked to has pretty much said that no matter how good your math, there is still additional information required other than the raw data about the security.

Also, I am talking about trading, not market making, which can be done automatically. Market making only works when you can beat the other guys at making deals, and yes, you can do that with pure numbers, but again that's not really what we're discussing here.


Algorithmic hedge funds trade on statistical models successfully. I work for one at the moment, and over the long term they expect to make 15-20% per year. We have no traders and there's no intervention - it's purely the statistical model that determines how and when to trade.

So it can be done, but you need a brain the size of a planet. (I don't have a brain the size of a planet, so I don't build the models)


I can confidently predict that no, you won't make 15-20 percent as a long term average, though you may well manage a number of good years in a row. Save while the going is good.


May I ask where your confidence in this prediction comes from?


Your comment about needing a brain the size of a planet is what sets off my doubts. It's like people who say that credit swaps are too complex to understand so don't even pay attention to them. You're basically admitting you have no idea what's going on and trying to cover it up by saying it's just way too complex to understand.


That was a somewhat flippant comment... The guys who build the models have PhDs in physics plus advanced degrees in statistics and stochastic calculus etc. I understand the basics of how the system works, but due to my lack of that advanced maths education and a certain amount of secrecy I don't have every detail.

So in summary: it is too complex to understand, unless you have a very advanced education in the statistical techniques they use to build the models.


Agree, that always bothers me: hand-waving under the guise that it's too complicated to be understood. It may take some time, but it can always, at minimum, be understood at a high level. If it can't, then someone is lying or hiding something.

(For example, I don't have a PhD in physics, but a few years of reading and following up on linear algebra, and I can hold my own in a conversation on string theory with a PhD physicist.)


One example, there are many:

These guys certainly knew their stuff, and they also had a system: http://en.wikipedia.org/wiki/Long-Term_Capital_Management


Completely agree. I've been doing the same for the last 3 years using a combination of machine learning techniques, which without some human input are at best only as successful as putting money into a savings account, or in most cases would lose money.


what do you mean by trader input?


news events? info from discussions on trading/economic/product related forums? (insider info? don't tell the SEC :P)


Can somebody explain to me why, if this really works, you would publish it in a blogpost? Shouldn't you be hunting down investments of $X to turn $1.093X?


Like selling pickaxes during a gold rush, Quantopian is probably better off encouraging hobbyists to start speculating via their platform.


Yes. Which seems like a better investment: dedication to a platform that certainly will attract speculators and is underserved in general or dedication to a trading algorithm in an arena overpopulated by very smart people competing in a zero sum game -- against you? Both provide the potential for scalable, massive returns, yet in the quant realm, selling pickaxes is a much safer bet and is equally -- if not more -- lucrative. (In the contextually correct parlance, the Sharpe ratio is much higher for selling Quantopian.)


IMHO, when developing a trading strategy it helps to document and share your strategy with others as you'll come to better understand it from the questions and observations others make. No one strategy can or will be successful forever.

Many algorithms stop performing when market conditions (lasting hours, days, weeks, months) change. Having a deep understanding of your algorithm and what makes it successful for any given period of time can better help you make adjustments when needed.

Lastly, this may be where the algo started but not necessarily what they will run in production. It's much more likely to no longer be discussed at this point. Perhaps similar to "ideas are worthless, execution is everything" for startups.

edit: typo


If an algorithm stops performing after hours or days, it's likely you haven't discovered anything, but are simply seeing the effects of random noise on your hundreds, thousands, or millions of signal possibilities.


One example that comes to my mind could be an algo closely related to the price of another security or index or what have you. At times, this algo could be highly correlated and at other times less so.

I agree with you about random noise. Ultimately I'm just looking for something to make me feel like I'm taking an "informed position." You never really know what's going to happen.


Have there been any studies on the accuracy of backtesting data in predicting actual returns? Comments like yours always appear on these articles... "You can backtest, but you never know what's going to happen!" Well, obviously. But at what point is testing against 12 years of backdata not enough? Is it a matter of understanding exactly what your algorithm depends on and watching for those conditions to fail?


To me, backtests are constructive but, like you said, not predictive and perhaps questionably meaningful. Any good strategy using technical analysis will be able to identify a trend in any market. What differs are entries and exits. These factors can be influenced by current events, making them tough to model into an algorithm, especially one with a shorter timeframe.


I'd be hard-pressed to call this algo "working". If you look at the PnL, it makes most of its profit from Sept 15 2012 to Nov 15 2012. The rest of the time, it's either drawn down or chopping back and forth. 2 months of gains over a one year period sounds more like noise to me than anything else.


When the broad market is rising by over 10% annually, it is very difficult to come up with a trading strategy that loses money.

For example, buying SPY and holding it for the same period would have outperformed your algorithm.


Sorry, but saying that "it is very difficult to come up with a trading strategy that loses money" means you really have no credible experience with running trading algorithms that use real money.


Do you care to address his point? If the S&P 500 genuinely outperforms "your" (his? someone's) algorithm, said algorithm is a priori unimpressive.


Serious question: does this meet "Show HN" criteria? I mean, I value sharing the algorithm, but I thought that Show HN is reserved for entire projects (i.e. sites, SaaS platforms, etc.), not using one's platform to put up a description of an algorithm and some numeric data. I'm not trying to troll, just wanted to know how the community understands "Show HNs". In this case it can be seen as more of a Quantopian show-off (which is an interesting service, but had already been showcased) than the algorithm or project itself?


Why the downvotes? I explicitly said I just want an answer about how the community sees "Show HNs", not attacking anyone. Is that really that offensive and unconstructive? How are we supposed to improve the quality of this environment if one cannot ask about community guidelines?


While I'd love to take all the credit (blame?), the reformed academic in me feels compelled to admit that the idea to look for predictive value in stock loan data is not original to me. The finance literature has some fascinating articles on this dating back as far as the late 80s (look for Desai 2002, J of Finance, Asquith 2005 J Fin Econ, or most recently http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1570451).

The intuition behind this signal as a market inefficiency, or 'anomaly', is that the market sees short sellers as informed investors, the so-called 'smart money', and there is a herding effect to follow their trades which generates abnormal returns. The same logic can be applied to disclosed insider trades or institutional holdings filings made public via the SEC's EDGAR database.

Fawce's slick implementation of a 'Days to Cover' signal is a great way to highlight the power of aiming new tools like Quantopian at freely available public data stores (which exist expressly to increase market transparency). And sure, it doesn't go the whole way for you on execution details like borrow costs, liquidity etc. but those aspects tend to be unique to each trader.
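
For anyone unfamiliar with the term, days to cover is conventionally computed as shares sold short divided by average daily trading volume. A minimal sketch of that textbook ratio follows; the lookback window, thresholds, and any ranking used in the linked post are not described in this thread, so none are assumed here, and the numbers are made up:

```python
def days_to_cover(short_interest, daily_volumes):
    """Shares sold short divided by the average daily share volume."""
    if not daily_volumes:
        raise ValueError("need at least one day of volume")
    avg_volume = sum(daily_volumes) / len(daily_volumes)
    if avg_volume == 0:
        raise ValueError("average volume is zero")
    return short_interest / avg_volume


# Hypothetical inputs: 3M shares short, three days of volume.
dtc = days_to_cover(short_interest=3_000_000,
                    daily_volumes=[400_000, 500_000, 600_000])
```

A higher value means short sellers would need more days of typical volume to buy back their positions.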


You should put in some kind of protection for a max drawdown loss, like if you lose x%, you exit. Sometimes your algorithm messes up, or market conditions are bad. http://www.businessinsider.com/hedge-funds-smashed-worst-qua... Long short equity funds did poorly in 2008 financial crisis, and also in 2011, when there was high volatility.
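
A rough sketch of such a stop, assuming "lose x%" is measured as drawdown from the running peak of the equity curve (the comment above doesn't say which baseline is meant). The function names and the sample curve are invented for illustration:

```python
def max_drawdown(equity_curve):
    """Largest peak-to-trough decline, as a fraction of the running peak."""
    if not equity_curve:
        raise ValueError("equity curve is empty")
    peak = equity_curve[0]
    worst = 0.0
    for value in equity_curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def first_exit_index(equity_curve, max_loss):
    """Index of the first point where drawdown from the peak reaches max_loss, else None."""
    if not equity_curve:
        raise ValueError("equity curve is empty")
    peak = equity_curve[0]
    for i, value in enumerate(equity_curve):
        peak = max(peak, value)
        if (peak - value) / peak >= max_loss:
            return i
    return None


# Hypothetical account values; assumes all values are positive.
equity = [100.0, 110.0, 104.5, 99.0, 105.0]
worst = max_drawdown(equity)
exit_at = first_exit_index(equity, max_loss=0.08)
```

In a live algorithm the same check would run on each bar, and the position would be closed the first time the threshold is hit.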


It would be cool to do that with this signal, if the algo was buying/selling on another signal. Maybe use the short interest signal as a gate on momentum investing for example.


Very interesting stuff. "The Benchmark" is the SP500 I'm guessing? I couldn't find the answer after clicking around for a bit, sorry if I'm dumb. You might list the reference security in the chart, or do something like "SPY (benchmark)" in the key.


Did anyone actually try this with real money? Does the model include transaction costs and market impact effect?


If you click on the code and search for commissions, you'll see how those costs are taken into account. The big missing thing is the market for borrowing the stock to do the short side of the trade.

No, no money has traded on my version. But I understand that asset management firms have licensed the more sophisticated one Jess wrote at TR, so I would think they use it with real money. From what I understand, firms look at numerous signals like this, and then make investments based on a combination of the signals.



