Knight Capital Says Trading Glitch Cost It $440 Million (nytimes.com)
73 points by rkaplan 1788 days ago | 90 comments



These sorts of problems will occur in any distributed system which runs at a high rate without humans in the loop. There is no chaos monkey for the stock market, and transaction rollback is available only in the most extreme circumstances.

I have yet to see a convincing rationale for how high frequency trading adds value to the system -- it certainly doesn't seem to add pricing stability. Because this is Hacker News, I suspect a few people reading this work in the industry and/or have strong opinions about it, and I would love to hear why I am wrong. I do like it when clever computer scientists make money, but that seems to be pretty much the only social benefit of HFT, and it comes at the cost of flash crashes and incidents like this one.


These sorts of problems will occur in any distributed system which runs at a high rate without humans in the loop.

This implies that humans are actually better than machines at making data-driven decisions at high rates. They're not. They're astoundingly not once you compute the cost of humans versus the cost of machines on a per-decision basis.

Liquidity in the market used to be provided by large groups of sweaty, overpaid alpha males yelling at each other. We replaced them with dueling robots. The robots can provide liquidity for a fraction of the price. The alpha males really, really hate competing with robots, because the alpha males invariably lose, so they complain that robots are stealing the money that the alpha males used to extract from their customers by right of being the one with a license to be shouting and sweaty at a particular physical location.

Occasionally a robot blows up. Not a problem -- robots are easy to replace. Besides, humans blow up all the time. We only ignore their ridiculous strictly-inferior-in-every-way-at-this-task nature because they look more like us than the robots do, and because these particular humans being displaced used to be rich, whereas e.g. telephone operators tasked with manually doing call routing (also clearly inferior to highly reliable distributed algorithms) were poor.


Another way to look at this: it's a well-known phenomenon that expert radiologists will sometimes misdiagnose an X-ray, even contradicting a past judgment they made on the same image. It's plausible that a properly machine-learned system could make better judgments on a long-running average basis.

But when a computer screws up an edge case, versus the many, many times a doctor will screw up a diagnosis, which instance will get more attention? People are scared of black-box machines and will take particular note of the times an algorithm has screwed them over, regardless of how many times a human has done the same.

Also, when the possibility of examination exists, computerized decision-making is far, far easier to audit (i.e. to determine blame).


Exactly. It is the same as the problem with robotic cars.

Cars with human drivers kill thousands, but nobody blinks. The day a robotic car kills a single person, all the news networks will have a field day.


I think you're right about this. Maybe it has to do with the appearance of control/self-determination: it was a mistake (an accident), but it was my mistake/accident, as opposed to something I had no control over.


I once posed a question to a group of friends: would you rather drive a car with a 60% chance of a fatal crash in your lifetime, or use a robotic car with a 2% failure rate? Everyone chose the 60%, even though statistically it makes absolutely no sense.


Not (entirely) true. Robots are not strictly superior to humans in this case because the emergent properties of the system change due to the speed of robot action.

As more robots crowd into the very fast strategy space, there tends to be more collision of strategies and thus more extreme price movements as many robots all try to buy/sell the same thing at the same time. If they all pile onto one side of the market, liquidity disappears and price movements become dramatic. In the very fast strategy space, there is less capability for complex strategies requiring long memories. If you want to save some nanoseconds to win at this winner-take-all market, you need to reduce your memory and processing time.

http://arxiv.org/PS_cache/arxiv/pdf/1006/1006.5490v1.pdf

This isn't saying that humans would never display herd behavior, but that they do so less often.


Humans absolutely do display herd behavior. This has been documented in financial markets and other venues.

However, they do it (by definition) on a time scale that is perceptible to other humans.


I agree with your premise -- the issue is that when humans blow up it doesn't cause a cascading runaway failure like robots are susceptible to.


Humans are susceptible to cascading runaway "ethical" failures. I'm thinking of the $2 billion UBS loss. Humans also don't scale very well, so risk tends to be distributed across several of us. Perhaps robots should be set up the same way: several of them, with different algorithms tuned to different levels of risk... or perhaps it really doesn't matter all that much, because this is one firm and not the entire system, so the risk is already distributed correctly.


That's the first time I've heard an explanation of the benefits of HFT that actually adds up. Maybe that's because it's appealing to my prejudices, but thanks.


I think I stole "dueling robots" from tptacek or yummyfajitas, but sweaty alpha males are totally mine.

Edit to add: The dueling robot at HNsearch says a) it was tptacek and b) I should trade with it because it remembers HN comments better than I could ever hope to.


Well put.

A lot of this "but ... but ... but ... we NEED a human in the loop" handwringing that we're seeing on the news channels is obsolete people who used to be ridiculously overpaid complaining that they have been replaced by fast computers and efficient code.

(Disclaimer: Non-sweaty alpha male HFT algorithm writer here ... ha.)


There are cases where having a human in the loop is appropriate, particularly in manufacturing. A human, thanks to a combination of slowness and observational skill, will pick up on a fault after the first error, while a machine, thanks to a combination of speed and lack of observational skill, will produce fault after fault until things come to a crashing halt or someone notices.

This $440m example shows that a human should be thrown into the mix. Let the human/machine strengths cancel out each other's weaknesses.


The 'Does HFT add value?' question gets flogged to death every time one of these articles gets posted.

http://news.ycombinator.com/item?id=3894302

http://news.ycombinator.com/item?id=3852341

http://news.ycombinator.com/item?id=2828538


To return the discussion back to your main question, there is strong evidence that high frequency trading, or algorithmic trading (AT), increases liquidity: http://www.afajof.org/afa/forthcoming/6130p.pdf (See Figure 2). Increasing liquidity and reducing the bid/ask spread adds value to the system by increasing efficiency and information dissemination.

For a period after the introduction of autoquote in 2003, providers of liquidity captured most of the surplus and enjoyed larger margins on trades (See Figure 3). However, this advantage quickly dissipated as more parties implemented AT. The first-mover advantage doesn't apply in the world of equity markets; competitors quickly duplicated AT strategies and competition swiftly lowered spreads in the second half of 2003. Today, spreads on equities are much lower than pre-2003 largely thanks to AT.

As for pricing stability, let me set aside AT glitches for the moment. Algorithms can tirelessly monitor market information: media reporting, filings, event rumors (e.g. M&A), order trends, etc. Humans are relatively constrained to a few information sources when executing trades. In addition, AT reacts faster to new information and can adjust bids and asks near-instantly. Price volatility therefore increases, but as a by-product of increased information efficiency.

Glitches and flash crashes are a negative counter-example to the information-efficiency argument above. I leave it to the reader to decide whether the liquidity benefits justify the occasional flash crash. However, recognize that this phenomenon is not exclusive to AT: many human traders have caused similar crashes of their own -- I'm looking at you, London Whale.


People make massive mistakes all the time. http://dealbook.nytimes.com/2012/06/28/jpmorgan-trading-loss...


What would happen if a computer trading operation used the JPMorgan London unit's flawed risk model and a computer executed trades based on it as many times as the algo saw "opportunities"... until the humans who created the model noticed a problem with it?

Tens of billions? Hundreds?


The usual criticism of high frequency traders doesn't really apply to Knight. Knight is required by law to execute orders that are sent to them, instead of 'investing' in stocks for the long term.


"These sorts of problems will occur in any distributed system which runs at a high rate without humans in the loop"

You write a risk check gateway once. It rarely if ever needs to be touched.

If you do this right, you roll out changes on one machine with strict risk controls (a 1,000-share position limit, for example) to prevent such a blowout. Then you transition everything else.

Ironically, this is exactly what other tech firms should do: roll out changes to a small area, check for basic problems, and then move on to a larger rollout.
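
To make that concrete, here's a minimal sketch of such a canary gate in Python. Everything in it (the names, treating the 1,000-share figure as a hard cap) is illustrative, not Knight's actual setup:

    MAX_CANARY_POSITION = 1000  # shares, per the example above

    class PositionLimitBreached(Exception):
        pass

    class CanaryGateway:
        # New-build orders go through this gate until the build earns trust.
        def __init__(self, max_position=MAX_CANARY_POSITION):
            self.max_position = max_position
            self.assumed_position = 0  # net shares, counting every sent order as filled

        def send_order(self, side, qty):
            # Worst case: assume the order fills in full the moment it is sent.
            projected = self.assumed_position + (qty if side == "buy" else -qty)
            if abs(projected) > self.max_position:
                raise PositionLimitBreached(
                    "projected position %d exceeds cap %d" % (projected, self.max_position))
            self.assumed_position = projected
            # ... hand the order to the real execution path here ...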


> You write a risk check gateway once. It rarely if ever needs to be touched.

Such checks are only as good as the imagination of the author.

> Then you transition everything else.

Which is all well and good until you have an issue where two independent systems create a feedback loop. Such problems are only evident 'at scale'.

I'm not blaming or trying to exonerate anyone. I love armchair quarterbacking as much as the next guy, but a small trading shop runs a little differently than America's largest electronic market maker.


I went through this process myself in setting up a self-clearing BD, so I am fully aware of all of the relevant rules and regulations.

>Such checks are only as good as the imagination of the author.

SEC and FINRA regulations require very specific risk checks. One of those involves looking at potential positions if all of your (buy or sell) orders get filled. You have to sum your max potential gross position across all symbols and decide if you will trip a limit. If so you have to halt.

This is not imagination; this is codified. If you send an order for 100 shares, you HAVE TO ASSUME they were filled when you send the next order. That's the rule. This was not left to imagination.
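
In toy form (the limit number is made up; the actual mandated checks are more involved):

    from collections import defaultdict

    class RiskHalt(Exception):
        pass

    class GrossExposureCheck:
        def __init__(self, gross_limit=1000000):  # shares; illustrative number
            self.gross_limit = gross_limit
            self.assumed_filled = defaultdict(int)  # symbol -> shares assumed filled

        def approve(self, symbol, qty):
            # Per the rule: treat every order as filled the moment it is sent.
            self.assumed_filled[symbol] += qty
            gross = sum(self.assumed_filled.values())
            if gross > self.gross_limit:
                raise RiskHalt("max potential gross position %d > limit %d; halting"
                               % (gross, self.gross_limit))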

"Which is all well and good until you have an issue where two independent systems creat a feedback loop. Such problems are only evident 'at scale'."

There should be no feedback where the trading engine affects the risk check. In fact, the new rules require separate code bases and separate legal entities.


You created a self-clearing broker/dealer? That's incredible. I would hope the benefits outweigh the incredible costs of setting that up.


Nobody except for HFT people has ever argued that they add value to the system.

Everyone else believes that HFT exists because it's profitable, and that it subtracts value from the system.

You can choose who you want to believe.


Market makers used to be humans. Now, due to advances in technology, computers can do that job. Computers are (generally) cheaper than humans so they can do the job for less money. You can see this in the fact that spreads have decreased as market making has been turned over to computers. That's the value they are adding.

Market making has always been work so it's always cost money. But now it's cheaper. There's always been money subtracted from the system, but now it's less than it used to be.


This is incredibly strange.

You get back an acknowledgement with each fill. There are risk checks in place that should verify whether the order was a take- or add-liquidity fill (per Rule 15c3-5, among others).

This is a risk check that should have been handled by a component which appears not to exist. They could lose FINRA and SEC authorization if it is as bad as it sounds ...
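
For what it's worth, that missing component could be as simple as this sketch (thresholds invented; the liquidity flag is the kind of field FIX's LastLiquidityInd carries on execution reports):

    class LiquidityAlarm(Exception):
        pass

    class FillMonitor:
        # A passive market maker should mostly be adding liquidity, so a
        # long run of liquidity-removing fills is a red flag.
        def __init__(self, max_consecutive_takes=50):  # made-up threshold
            self.max_takes = max_consecutive_takes
            self.consecutive_takes = 0

        def on_fill(self, liquidity_flag):
            # liquidity_flag: "added" or "removed", from the fill acknowledgement
            if liquidity_flag == "removed":
                self.consecutive_takes += 1
                if self.consecutive_takes > self.max_takes:
                    raise LiquidityAlarm("passive strategy keeps taking liquidity; halt")
            else:
                self.consecutive_takes = 0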


Dark Knight Rises spoiler alert:

Isn't it really odd that a company called "Knight" makes all these nonsensical trades just a couple of weeks after a movie in which a similar thing bankrupted a company?


Hah, this is the best comment I've heard in a while -- what a great way to start the day. Thank you, sp332.


Lesson learnt: keep your fingerprints safe, people!


I love it.

Looks like someone's HFT system backfired. I'm looking forward to the Nanex analysis of this one.

Nanex on the 2010/5/6 'Flash Crash':

http://www.nanex.net/FlashCrashFinal/FlashCrashAnalysis_Theo...


> I'm looking forward to the Nanex analysis of this one

Already posted:

http://www.nanex.net/aqck2/3522.html


Wow, you can almost see the bug happening as you read this. Amazing stuff.


Second that.

Nanex has done some great work. To date, afaik, they're the only ones to try and figure out the Facebook IPO trainwreck from the technical side. And it's a doozy.

http://www.nanex.net/aqck/3099.html


$chadenfreude


It seems that Knight Capital ended up buying at the offer and selling at the bid. In other words, Knight essentially collected the opposite of a rebate: it paid away the spread for nothing beyond the act of creating liquidity, and those payments are where the losses came from.

As a result, they pushed certain stocks higher and higher, which created a huge incentive for a system like Knight's to keep trading. This explains not only the volume surge seen on the market but also the massive moves in names like China Cord Blood Corporation (CO) [1], which rose several hundred percent before someone finally stepped in. The important thing to note is that under the current SEC order-cancellation methodology, there is no basis for bailing out Knight Capital and its algorithm.

Similarly, they were trading Exelon Corporation (EXC) [2] and losing $0.15 on every single trade; when that's being done 2,400 times per minute, it's very easy to lose money (some back-of-the-envelope arithmetic below).

[1] http://finance.yahoo.com/q?s=co&ql=1

[2] http://finance.yahoo.com/q?s=EXC&ql=0
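
Rough arithmetic on that EXC figure. The lot size is my assumption, not from any filing:

    # $0.15 lost per share, 2,400 trades/min, ~45 minutes of trading.
    loss_per_share = 0.15
    shares_per_trade = 100       # assumed (hypothetical) lot size
    trades_per_minute = 2400
    minutes = 45

    total = loss_per_share * shares_per_trade * trades_per_minute * minutes
    print("$%s" % format(total, ",.0f"))   # -> $1,620,000 in this one symbol alone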


WTH is "China Cord Blood Corp" - they collect cord blood for stem cells I suppose - id be very wary of any such product like that from china, both for quality and humanitarian reasons.

Take a look at the Vice documentary on tiger-part harvesting. China states that tigers are endangered, yet there are these secret farms that breed thousands of them, and they make tiger dick wine and all sorts of things with the other parts as well.

I wouldn't assume any biological product produced in China is on the level.


My first job out of college was at a small HFT firm. The night before rolling out my first changes is forever imprinted in my memory. Lots of tossing and turning and images of exactly this kind of disaster.


How did its testing/security practices measure up to other places you've worked? I understand that a financial firm's tech division might be top notch, but the culture of the moneymaking departments may mismanage it.


SEC/FINRA have required practices. The risk controls have to be firewalled from the trading desks precisely so that this type of blowup is contained.


Is that new? Back then it was two devs and two traders in a room. See a problem? Press the big red button. Deploy change. Press the big green button. Cowboy style. I STRONGLY recommend against doing it this way. I was young and stupid and I'm pretty sure it took years off my life.


The issue is naked access. At that time, the requirements fell on the broker. Now that two-dev shop has to worry about risk etc. itself.


I am highly skeptical of this being a software glitch, and here is why: I have been working in banks' brokerage and capital-markets groups for years, and the number of checks and review processes they have put in place is just incredible. One does not simply deploy new software into production without several small pilot runs, staging tests, etc. There are committees with onerous questionnaires, and test results compiled by QA must be documented and approved before a new deployment goes ahead. Therefore I don't see this being a software issue. If the checks were not in place, then it seems the management was lax in safeguarding its capital.


I don't understand how it went on for 45 minutes without human intervention. Do they really not have a live person at the very least monitoring trades?


It probably _was_ noticed, but once it started you can't just pull all of the plugs, since that can make you lose even more...


I'd wager they thought they were operating in simulation.


Thank heaven I didn't write that code.


Or deploy it. Imagine being the last one in the push queue that day.


On the other hand, it's a good story to tell your grandkids one day...


"Trading Glitch" = a very good excuse for very bad trades. I'm not implying this happened, but if I was an analyst at Knight Capital and we just lost $440 million, I'd blame it on a "trading glitch".


Why is admitting you have buggy trading software better than admitting you have traders who made bad trades?


Because if I'm an analyst (an employee of the company), a trading glitch isn't my fault. A bad trade is.


Huh? Of course the employees in charge of the algorithm and the trading platform are going to get blamed. It's their system that went haywire.


There's no analyst involved in these trades. Knight is operating as a dealer providing liquidity. They provided a bit more liquidity than they intended today.


Buying high and selling low, several times per second for hours, seems like a glitch. That's not a "bad call" or even an insane call, that's a systematic, continuous destruction of value.


Yep.

They're calling it a "glitch" to suggest something unavoidable, a problem that could have happened to anyone.

It's the same vein of bullshit we heard in 2008 ("We couldn't possibly have known!") and more recently with the string of "rogue traders".

In every case, it's just a bunch of folks who don't want to take responsibility for their gambles when they lose.


You are completely wrong. Try looking into the circumstances before you make unfounded accusations about what caused this. Comparing a 45-minute isolated incident at a firm with an otherwise excellent reputation to the housing crisis or rogue traders is moronic.


You're missing the point entirely.

They gambled on an algorithm and everything related to engineering and operating it.

On some level or another they failed spectacularly and they're calling it a "glitch" - a minor malfunction, a transient error, a spurious little blip.

Say what?

$440 million pissed away over the course of 45 minutes is not a glitch by any stretch of the imagination.

It's a failure of epic proportions at multiple levels. Failure to test, to review, to react and finally failure to own up.


They did not gamble. Market making doesn't involve gambling. Someone checked in an egregious error that testing failed to find. The fact that it is in a trading system makes the cost of the bug easy to measure.


What on earth are you talking about?

Market makers gamble on their ability to set spreads that will produce a profit.

They're using code to generate spreads, execute head fakes and stuff quotes to their advantage. They're "players" as much as anyone.


How are they not taking responsibility? They lost the money. They aren't getting it back. What more responsibility could they take?


Bingo, and this is the main reason for the downfall of our world.


Given that there was suspicious activity in ~150 names over ~45 minutes, "trading glitch" is much more plausible than "bad analysis".


I've always wondered, what happens to the actual programmer[s] responsible for a bug like this? Assuming they find the exact issue, and then go back through the version control logs to the exact commit, how responsible, if at all, is the individual programmer? Or the QA team, for that matter.

Incidents like this make me think that I'd have a panic attack with every commit, if I worked in the financial industry.


I think this talk is relevant: https://ocaml.janestreet.com/?q=node/61


Why?

You might be far less likely to blow up the way Knight did, but you're hardly immune.


It is reasonable to use technology to make the universe of potential errors significantly smaller.


I'm not sure if it was a technical failure, but here is a great talk from DEF CON 19 about security and high-frequency trading systems: https://www.youtube.com/watch?v=ncpm6vRi9F4


Is there anything good written about failsafe programming? I try to code defensively so terrible bugs like this can't snowball. A heuristic like "only make 1000 trades a minute" could have saved Knight $400 million!

I try to build global failsafes into my code, particularly network code. For instance, a recent Twitter project I wrote has various loops calling their API. That code has a global counter that says "this program can only ever make 25 calls", enforced in the bottom-level HTTP calling function. If the rest of my code is correct I'll never trip that limit; my application-specific logic means I should only ever need to make 10 calls. But I appreciate having the failsafe in case that more complex logic fails.
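
The backstop itself is a few lines. A sketch of the pattern (the class name and the urllib transport are mine, not the actual project):

    import urllib.request

    class CallBudgetExceeded(Exception):
        pass

    class BudgetedClient:
        # The counter lives in the lowest-level HTTP function, so no amount
        # of looping logic above it can spend more than the global budget.
        def __init__(self, max_calls=25):
            self.max_calls = max_calls
            self.calls_made = 0

        def get(self, url):
            if self.calls_made >= self.max_calls:
                raise CallBudgetExceeded("global budget of %d calls spent" % self.max_calls)
            self.calls_made += 1
            with urllib.request.urlopen(url) as resp:
                return resp.read()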


> A heuristic like "only make 1000 trades a minute" could have saved Knight $400 million!

A heuristic like that would not allow Knight to function. Knight is an electronic market-maker (one of the biggest) and is probably making markets in thousands of symbols and likely updates those quotes several times a second. Their normal quoting rate is on the order of millions of messages/minute and in volatile times may spike 10x or more and still be within 'normal' limits.


A heuristic like "do not lose more than $100m a day" would have saved the company.
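
In sketch form (threshold per the figure above, wiring invented; note the caveat below that a PnL check alerts you but can't undo losses already on the books):

    MAX_DAILY_LOSS = -100000000  # dollars

    class KillSwitch(Exception):
        pass

    def check_pnl(realized_pnl_today):
        # Call on every fill; halting strategies and cancelling resting
        # orders is up to the surrounding system.
        if realized_pnl_today <= MAX_DAILY_LOSS:
            raise KillSwitch("daily PnL %.0f breached limit; halt everything"
                             % realized_pnl_today)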


PNL limits are required risk checks. Clearly Knight didn't implement them ...


Stop-losses are not guarantees of execution. If you're in an illiquid market, or one that moves faster than you're prepared for, then your PNL check may alert you, but it won't save you from the loss.


More likely they thought they had implemented them but they were implemented incorrectly.


Use your imagination to come up with a more appropriate heuristic.


Look up Jane Street Capital. They use functional programming with a very strict/smart type system (OCaml) to help make sure that, for example, there are no logically possible situations in the code that they haven't programmed a response for. They have some pretty good educational material online.


"there are no logically possible situations in the code that they haven't programmed a response for"

Until they blow up one day themselves...

Call me cynical, but no type system, no matter how sophisticated, nor team of programmers, no matter how talented & experienced, is immune to this type of thing.


As I said, it helps make sure that doesn't happen.

Anyway, I don't think they'll blow themselves up big, like Dark Knight Capital did. First, this specific kind of thing doesn't happen often (where one firm just messed up so much that it makes world news). Second, it's clear from Jane Street's videos that they're pretty serious about quality. If I were going to bet on which market maker will screw up, it would be somebody else.


Sure, it doesn't prevent them from doing the wrong thing in that situation, but it does require that they have given the situation enough attention to write code for it, which is a serious improvement over not handling it at all. As I understand it, they play defense in depth with regard to code quality.
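
Jane Street gets that property from OCaml's exhaustive pattern matching. A rough Python analogue (my analogy, not their code) uses typing.assert_never so the type checker rejects any unhandled case:

    from enum import Enum, auto
    from typing import assert_never  # Python 3.11+

    class OrderState(Enum):
        NEW = auto()
        FILLED = auto()
        CANCELED = auto()

    def handle(state: OrderState) -> str:
        # If a new OrderState member is added without a branch here,
        # mypy/pyright flag the assert_never call at type-check time.
        match state:
            case OrderState.NEW:
                return "working"
            case OrderState.FILLED:
                return "done"
            case OrderState.CANCELED:
                return "dead"
            case _:
                assert_never(state)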


It's only as good as the specs (APIs, conventions, contracts) they are working against.

In 2003, I was using a trade API that confirmed cancellation of an order at 10am, and then executed the same order at 1pm. Apparently, even though it confirmed the cancellation, the order wasn't actually canceled. No matter how well your code is equipped against that, it's a legal problem, not a technical one.

Similarly, when the exchange retroactively cancels ("busts") trades after a flash crash, with a rather arbitrary rule (e.g. all trades between 3:45pm and 3:54pm in AAPL with price < $500) - you're left holding the bag, regardless of how good your code is.

The only winning move in these cases is not to play -- unfortunately, you can only tell retroactively.


Interesting!


My initial skepticism makes me want to think this was maliciously orchestrated. Where one firm stands to lose, someone else would undoubtedly gain.


Interesting theory, but there is no implicit parity between the number of winners and the number of losers. Whereas one firm lost $440M, there were thousands (maybe tens of thousands) of firms and individuals that made hundreds or thousands of dollars each.


Unless someone comes in and scoops up Knight for cheap -- the stock is trading down 65% or so overnight.


They are apparently out of cash. It seems likely they'll be sold or file for bankruptcy shortly.


Is it possible to purchase a ridiculously expensive insurance policy protecting yourself against the risk of dangerous bugs like this?


No. Buying the insurance would probably increase the risk as checks would get more lax. And you can't hedge it.


I'd love to see what the patch looked like.


I wonder what's going on with the IT executives/CIOs at Knight at this point. They must be hanging by a thread?


They're probably very close to hanging from a thick rope.


Easy come, easy go.



