
How to lose $172,222 a second for 45 minutes - _wmd
http://pythonsweetness.tumblr.com/post/64740079543/how-to-lose-172-222-a-second-for-45-minutes
======
adambratt
The week after this we had a trader in our office who had a meeting at Knight
on the morning it happened.

He said he saw the whole dev team just power off and go home at 11am, followed
quickly by the rest of the employees. At that point, there was nothing they
could do.

The craziest thing is that it went on for so long. No one caught it until
their own traders saw it come across Bloomberg and CNBC. They actually thought
it was a rival HFT and tried to play against it.

The only people that came out of this ahead were aggressive algos on the other
side and a few smart individual traders. A lot of retail guys had stop losses
blown through that normally would never have been hit. After trading was
halted they set the cap at 20% loss for rolling back trades. So if you lost
19% of your position in that short period of craziness, tough luck.

~~~
chollida1
I don't want to call your friend a liar, but this is most likely false.

> The week after this we had a trader in our office who had a meeting at
> Knight on the morning it happened.
>
> He said he saw the whole dev team just power off and go home at 11am,
> followed quickly by the rest of the employees.

1) Dev and Trading/Sales happen at different physical locations.

2) I actually know someone who spent their day cleaning this up and according
to someone who was on the tech team and working that day no one went home
early.

Think about it, the firm just lost a shit load of money due to an IT issue.
Devs were frantically searching the code for the bug, sysadmins were rifling
through server logs. No one had nothing to do:)

> After trading was halted they set the cap at 20% loss for rolling back
> trades. So if you lost 19% of your position in that short period of
> craziness, tough luck.

This is just plain false. The normal procedures for busting trades were
followed. There was no 20% "cap" for losses, how would you even determine what
a 20% loss is?

~~~
adambratt
I really have no reason to doubt this guy as he's a pretty prolific trader.
That said...traders are rather known for hyperbole.

Per the 20%, I forget what it was but I know trades were rolled back and there
was some kind of threshold for them. When I first wrote this, 20% losses was
what I remembered. There's an article on it somewhere with the actual amounts.
I think it had to do with how far stop limits were off from the open price of
the stock.

------
jpatokal
Just another reminder of how systems that you'd think are rock solid often
aren't.

In my previous life working with telcos, I once tried to teach a particularly
huge customer how to use CVS to manage configurations across a cluster of 10+
machines. They didn't see any value in it, so they stuck to their
good old process of SSHing into each machine individually, "cp config.xml
config.xml.20131022", and then editing the configs by hand. Didn't take too
long until a typo in a chmod command took down the whole thing (= the node
couldn't take down a network interface anymore, so failover stopped working),
and they spent several weeks flying in people from all over the planet to
debug it... and they _still_ didn't learn their lesson!

~~~
fosk
Ha, probably no one ever got fired for manually SSHing into each machine. It
seems like a joke, but it really isn't.

~~~
falcolas
Considering that I've watched a new automated deployment process nuke an
entire network due to a mistake in the config... Sometimes slow and plodding
and potentially losing a node (as opposed to 100 nodes) is seen as preferable
to the PTBs.

Of course intelligent, incremental deployment works too, but let's not confuse
those poor suits...

------
scrrr
High Frequency Trading seems so abstract. There's no value created, it seems.
It's like something in between imperfect systems, scraping off the margin
created by that imperfection. It's fascinating, and interesting from an
algorithmic point of view (like a computer game), but at the same time I don't
feel sympathy for this company going out of business.

~~~
vecter
I really hate to go down this road because it's been rehashed thousands of
times on Hacker News, but high frequency traders add value to the market by
adding liquidity (and therefore reducing spreads --> cost to you for
executing) and price discovery.

~~~
sumedh
This liquidity argument has been rehashed a thousand times too, but did you
know that most of the orders made by HFT's end up getting cancelled?

Regulators found that high frequency traders exacerbated price declines. They
determined that high frequency traders sold aggressively to eliminate their
positions and withdrew from the markets in the face of uncertainty.

[http://en.wikipedia.org/wiki/2010_Flash_Crash](http://en.wikipedia.org/wiki/2010_Flash_Crash)

Berkshire Hathaway has a difference of 1000 dollars for its bid ask spread yet
you don't see a lot of people complaining, do you?

~~~
snake_plissken
>that most of the orders made by HFT's end up getting cancelled.

Thank you. That is all that needs to be said.

------
malbs
Just one of the risks of automation, and a good reminder why human monitoring
is necessary.

Having said that, we deployed a system that was mostly automated, with the
human operator to oversee investments and if any out-of-the-ordinary
transactions (based on experience) were taking place, to shut it down. She
happily sat there approving the recommendations even though the
recommendations were absolutely outside of anything we'd ever generated in the
past, and bled accounts dry in one evening, so sometimes even with a human
observing you're still boned.

~~~
fiatmoney
You should read the linked PDF - they had systems that were 100% dependent on
human monitoring, that no one was checking, or where no one recognized
anything unusual. If anything, their failures were due to massive lack of
automation in deployment, testing, and monitoring.

~~~
malbs
Yeah, our monitor just rubber-stamped it. Afterwards everyone remarked that
they could tell the recommendations were bad just by looking at what was in
front of them. Our theory was that she was too busy watching the Breaking Bad
final episode or something.

~~~
jrochkind1
You know how if the first 100 times a dialog box comes up, the correct
response is to click 'ok', then people start just clicking 'ok' on every
dialog box, and then the 201st one comes up "Destroy everything? [cancel]
[ok]", and they click 'ok' too, and don't think anything of it?

~~~
Dylan16807
Sure, you click reflexively, but you should notice the text was bad either
after clicking or within a couple more clicks. Letting the first few errors
through is reasonable, but letting a wall of them through without noticing
anything wrong, when reading them is your _job_ , is inexcusable.

~~~
jrochkind1
I don't know if it's excusable or not, but it may be incompatible with typical
human cognition to expect someone to be able to do that.

Maybe you have to figure out a way to test people for unusually high aptitude
at looking at mind-numbingly dull repetitive things over and over again, but
then still being able to notice the aberrant ones. And then only put people in
that job with unusually high aptitude there.

Or have people only do pretty short shifts at that task.

I'm pretty confident that this person wasn't unusually negligent. If you have
most anyone doing that job hour after hour, day after day, they will lose the
ability to flag the aberrant stuff.

~~~
comrade_ogilvy
Yours is the correct viewpoint: it is incompatible with human cognition.

If an alert system is not perceived as highly reliable in directing positive
action, then the humans involved _will inevitably_ disable the alert system,
either by pulling out a screwdriver or rewriting their mental rubrics to
ignore the messages as noise.

Knight Capital is just the finance version of Three Mile Island and Deepwater
Horizon -- the means to mitigate or prevent disaster were on hand, but the
people in charge just dithered by the kill switch because they were confused.
Well, if the people in charge are confused, _that_ is a reason to start the
emergency procedures.

~~~
jrochkind1
In the ancestral comment here, it wasn't even an alert system! It was just
"your job is to sit there all day and watch every single transaction and flag
the aberrant one"!!

~~~
Dylan16807
"Whoa this transaction is way bigger than normal. So is this one. And this
one."

People ignore repetitive things, but they usually notice when it changes. They
can tell you that it's shaped different or explain how it sounds different
from normal.

If this system made dissimilar transactions look very similar to the monitor,
then it is to blame, not the idea of having a monitor at all.

------
manishsharan
Don't humans also make similar large scale mistakes? JPMorgan's infamous
London Whale comes to mind. Also, I could be wrong, but aren't most
derivatives a zero sum game: don't I have to lose money on my puts for you to
make money on your calls ? Didn't so many people lose money on securities
because they misunderstood their exposure ?

The Knight computer error was spectacular and catastrophic but us humans have
a longer track record of making catastrophic financial decisions in the
market.

~~~
adambratt
Options are complicated. At their most basic level they are no different than
a bet, so yes zero-sum. However when used in a spread or as a hedge or any
other way to avoid risk or when sold against stock you own as an income
generator, it's tough to call them zero-sum.

Puts and Calls are confusing as they are both something you buy. It's not like
a sports bet where you're betting on the team to win so the other side will
lose if they win. You can buy a put and a call in the same stock and profit on
both if your strike prices are aligned properly.

The opposite side of buying a call is selling a call. Just like shorting a
stock, when selling an option the most you can make is 100% and your loss
potential is infinite. As an option seller, you're hoping the option expires
worthless and out of the money so that you can keep the premium you collected. Most
brokerages require a high level of clearance to allow you to sell naked puts
and calls as it's generally a bad idea if you can't cover the potential
upside.
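
To make the buyer/seller asymmetry concrete, here is a rough sketch of the payoff at expiry for one call, per share. The strike, premium, and prices are made-up numbers, and fees, margin, and early exercise are ignored.

```python
def long_call_pnl(spot: float, strike: float, premium: float) -> float:
    """Profit at expiry for the buyer of one call, per share."""
    return max(0.0, spot - strike) - premium


def short_call_pnl(spot: float, strike: float, premium: float) -> float:
    """Profit at expiry for the seller of the same call, per share."""
    return premium - max(0.0, spot - strike)


# Hypothetical contract: strike 100, premium 5.
for spot in (90.0, 105.0, 150.0):
    buyer = long_call_pnl(spot, 100.0, 5.0)
    seller = short_call_pnl(spot, 100.0, 5.0)
    print(spot, buyer, seller)
```

The seller's best case is keeping the premium, while the loss keeps growing as the stock rises. Taken as an isolated pair, the two sides always sum to zero; the "not zero-sum" argument only comes in once one side holds the option against another position.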

~~~
bradleyjg
For most products there is a legal framework that forbids buying an insurance
contract unless you have an insurable interest. However, under the Commodity
Futures Modernization Act of 2000, designed by Summers, Greenspan, Levitt, and
Rainer, state insurance regulators are forbidden from regulating OTC
derivatives as insurance products. They were already forbidden from regulating
exchange traded derivatives (e.g. options and futures).

------
sirsar
I'm shocked they didn't have a killswitch or automated stop-loss of some kind.
A script that says "We just lost $5M in a few minutes; maybe there's a
problem." Or, a guy paid minimum wage to watch the balance, with a button on
his desk. $172,222 is a lot of minimum-wage years.

~~~
ryporter
I work for a small automated trading firm (in foreign exchange), and marking
positions to market is one of the difficulties in designing an effective kill
switch, because these marks can easily make the difference between a large
gain and a large loss. In fast-moving markets (which is when a kill switch is
most useful), it's very hard to determine the true mid-market rate. Our system
of course always has such a notion, but if we used it to shut down our system
every time it looked like we had lost money, then a market data glitch (which
is not at all uncommon) would impose a large opportunity cost as a human
intervened during an active market.

Instead, we designed our system so that there's a very low threshold for it
to stop trading if it appears to have lost money, but to only do so temporarily.
If our marked-to-market position recovers shortly thereafter while the trading
system is idle, then the apparent loss was probably due to a market data
glitch. On the other hand, if our position does not recover, then the
temporary stoppage becomes permanent, and a human intervenes. (Obviously,
there are more details here, but this is the general idea, and it's worked
very well for us.)
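
A minimal sketch of that pause-then-halt idea, not the actual system (which is in C++ and has more details). The class name, the threshold, and the grace period are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class State(Enum):
    TRADING = "trading"
    PAUSED = "paused"   # temporary stop, waiting to see if the mark recovers
    HALTED = "halted"   # permanent stop, a human has to intervene


@dataclass
class TwoStageKillSwitch:
    loss_threshold: float  # pause once marked-to-market P&L drops below -loss_threshold
    grace_seconds: float   # how long a paused system waits for a recovery
    state: State = State.TRADING
    paused_at: float = 0.0

    def on_mark(self, pnl: float, now: float) -> State:
        """Feed the latest marked-to-market P&L and a timestamp in seconds."""
        if self.state is State.HALTED:
            return self.state
        losing = pnl < -self.loss_threshold
        if self.state is State.TRADING:
            if losing:
                self.state = State.PAUSED
                self.paused_at = now
        elif not losing:
            # The apparent loss went away while idle: probably a data glitch.
            self.state = State.TRADING
        elif now - self.paused_at >= self.grace_seconds:
            # The loss persisted for the whole grace period: stop for good.
            self.state = State.HALTED
        return self.state
```

The low threshold makes pauses cheap, and only a loss that survives the idle period escalates to a human.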

~~~
dmak
What kind of technology stack are you guys using? Also, is your system
constantly being improved to detect these things or was it just a onetime
setup kind of thing?

~~~
ryporter
Our trading system is purely in C++.

This particular subsystem was a replacement for a previous version, which was
a kill switch and had led to opportunity costs. We spent a lot of time
designing, implementing, and testing it, but haven't felt the need to touch it
since then. It's sufficiently general that it doesn't need to be adapted as
our strategies change, and it doesn't need to adapt to changing market
conditions (as, e.g., a trading strategy does).

------
protomyth
"During the deployment of the new code, however, one of Knight’s technicians
did not copy the new code to one of the eight SMARS computer servers. Knight
did not have a second technician review this deployment and no one at Knight
realized that the Power Peg code had not been removed from the eighth server,
nor the new RLP code added. Knight had no written procedures that required
such a review."

That is just painful to read. How many times do we hear a company couldn't
figure out how to migrate code properly? Do any software engineering programs
teach proper code migration?

Next time a manager questions money spent on integration or system testing,
hand them a printout of this SEC document and explain how much the problem can
cost.
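
Even without a second reviewer, a post-deploy check for the "one of eight servers" case can be tiny. A rough sketch, assuming each server can report a build identifier somehow; the host names and build strings below are hypothetical.

```python
def hosts_off_release(expected_build: str, reported: dict) -> list:
    """Return the hosts whose reported build is not the expected one."""
    return sorted(host for host, build in reported.items() if build != expected_build)


# Hypothetical post-deploy report from an eight-server cluster.
reported = {f"smars{i}": "rlp-2012.08" for i in range(1, 8)}
reported["smars8"] = "powerpeg-2003"  # the server that was missed

stale = hosts_off_release("rlp-2012.08", reported)
if stale:
    print("do not open for trading, stale hosts:", stale)
```

Comparing against the intended build, rather than against whatever most servers run, also catches the case where the deploy failed nearly everywhere.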

~~~
HillRat
Look up the 1999 WorldCom outage due to a screwed-up load of Lucent's Jade
platform upgrade. Fun times. Best I can say about that is that within a year
WCOM execs had bigger problems than just pissing off CBOT....

~~~
protomyth
Oh my, that sounds like a serious Career Ending Event for someone.

------
OSButler
The title reminds me of hosting clients, who would complain about losing
thousands of dollars per minute when their $10/month website was experiencing
downtime.

------
fiatmoney
"The best part is the fine: $12m, despite the resulting audit also revealing
that the system was systematically sending naked shorts."

Cool - all you have to do to get away with financial crimes is create a system
with no protections against breaking the law.

~~~
judk
This connects back to the 'autonomous corporation' being discussed on the
bitcoin discussion on HN today.

~~~
malandrew
Related link for others and for posterity:

[https://news.ycombinator.com/item?id=6589067](https://news.ycombinator.com/item?id=6589067)

------
mgav
Very interesting, though I was happy to see Knight Capital take the huge loss,
since they were such complete scumbags who stole hundreds of millions of
dollars by backing away from trades* during the dotcom boom and bust.

*Backing away is when a market maker makes a firm offer to buy or sell shares, receives an order to execute that transaction (which they are ethically and legally obligated to do) and instead cancels the trade so they can trade those shares at a more favorable price (capturing enormous unethical profits in fast-moving markets while regulators did virtually nothing to enforce the rules in a meaningful way)

Learn more: [http://bit.ly/1ddUzWP](http://bit.ly/1ddUzWP)

~~~
simoncion
Would you kindly provide the un-minified version of your link? This isn't
Twitter, and HN has its own method of shortening overlong links while also
displaying the original link target.

~~~
Ricapar
Un-minified link:

[http://books.google.com/books?id=geCHWBx-e9EC&pg=PA126&lpg=P...](http://books.google.com/books?id=geCHWBx-e9EC&pg=PA126&lpg=PA126&dq=%22backing+away%22+market+makers&source=bl&ots=UeWYm3gjwc&sig=ZMul3CSEI-vr60iMbcH6GpERdV4&hl=en&sa=X&ei=m1NmUonyOZTs8ATdtoGoCA&ved=0CDkQ6AEwAzgK#v=onepage&q=%22backing%20away%22%20market%20makers&f=false)

~~~
mgav
Gracias Ricapar.

------
vincie
I would love to hear from an ex-Knight tech. Wouldn't be surprised if they
wrote something along the lines of: "Management just wanted this thing in
ASAP!", or perhaps "Tests weren't part of the kpi's". I may sound biased
against non-techs, but I have seen this time and time again. Testing is a
barrier to quick deployment, and "How much money are we losing while doing all
that stoopid testing?".

~~~
SideburnsOfDoom
> Testing is a barrier to quick deployment

I really feel bad for people who think like that. A process where tests and
deployments are automated and repeatable is vital to quick, _robust_
deployment. Quick deployment without tests just isn't going to work well.

------
yogo
I remember when Knight was in the news regarding this but never the technical
details about what took place. It's scary stuff especially given the money on
the line, and it makes a good case study for devops. I understand the
temptation to re-use a field but normally I'm for using new values in those
fields.

------
pallandt
Wow, this could have been prevented at so many 'checkpoints' that it reads
like an almost cautionary, fake anecdote rather than a real story.

------
Narkov
Out of interest, what would have been the outcome for Knight if their
positions had caused them to be winners? $12m fine, keep the spoils and "carry
on" ?

~~~
vinceguidry
It's not really possible to fail upwards this way. It would be like forgetting
how to play chess in the middle of a game and then winning. Anomalies are
universally negative in high-stakes environments, or if they're positive, only
engender modest improvements.

------
at-fates-hands
>>>What kind of cowboy shop doesn’t even have monitoring to ensure a cluster
is running a consistent software release!?

I think you'd be surprised at what happens in large companies. I went through
four, count 'em, four major releases with a company and each time the failure
was on load balancing and not testing the capacity of the servers we had prior
to release.

Even after the second release was an unmitigated disaster, the CTO said we
needed more time to do load testing and make sure the servers were
configured to handle traffic spikes to the sites we were working on. It
happened again, TWICE after he said we needed to do this.

You would think something as basic as load testing would be at the top of the
list of "to do's" for a major release, but it wasn't. It wasn't even close.

------
sitkack
Dead code takes down another system. A perfect storm of failures that they
made themselves. My gut feeling is that most trading firms could suffer a
similar loss. Having worked for a 3rd party accounting management firm that
kept logs for smaller traders I really realized how borked the whole system
is. 60s era pen and paper stuff moving at the speed of light.

> Sadly, the primary cause was found to be a piece of software which had been
> retained from the previous launchers systems and which was not required
> during the flight of Ariane 5.

[http://www.vuw.ac.nz/staff/stephen_marshall/SE/Failures/SE_A...](http://www.vuw.ac.nz/staff/stephen_marshall/SE/Failures/SE_Ariane.html)

~~~
frankc
Actually, I would characterize this as the removal of dead code that brought
down the system, which is basically the opposite.

~~~
sitkack
Dead code (latent functionality) in both systems was reactivated without
proper testing or knowledge and both systems failed as a result. We need to be
very wary of latent functionality.

------
mischanix
Well, this makes me 1000x more scared of working in a DevOps role.

~~~
hedwall
How come?

DevOps isn't a role (to begin with) and a lot of the practices documented in
the text are the opposite of good DevOps practices.

~~~
twic
AOL. True devops is about building systems that make this kind of disaster
impossible. Or at least very hard.

However, like agile before it, despite the fact that it really means something
purposeful and rigorous, the word "devops" has become widely abused to
camouflage undisciplined, thoughtless, cowboy behaviour.

A handy way of telling the difference is to ask yourself "what would Devops
Borat do?"; if it's something Devops Borat would do, it's the false devops.

~~~
nasalgoat
I like to think of DevOps as one of those cheap, 3-in-1 printers that you buy
thinking that you'll be saving money and desk space.

Then you discover it does a mediocre job of each of those tasks as compared to
a dedicated printer, scanner and fax machine. Sure, they'll take up more desk
space, but you'll get higher quality results.

------
dror
Is there any benefit to the market as a whole to have these high speed
transactions trying to game the system?

Seems like as a rule, they're likely to cause instability, and I have a hard
time seeing any benefits in them.

~~~
tempestn
The standard answer to this is that they provide liquidity. Whether that
benefit outweighs the drawbacks is a subject of debate.

~~~
nhebb
A non-economist wants to know: if liquidity is beneficial to our economy and
liquidity is a function of time, how does the time-benefit curve look as t
approaches 0? I don't know if you can quantify the benefit and map this curve,
but if you could I don't imagine it would scale to infinity as time approached
zero.

~~~
gd1
Liquidity isn't a function of time, it is a function of _relative_ time
between the predators (arbitrageurs) and the prey (market makers). If the
predators are much faster than the prey, liquidity will disappear since the
market makers can't survive (their prices are too stale and they are getting
taken advantage of). It has always been this way, even since Nathan Rothschild
used carrier pigeons to get news of the Battle of Waterloo.

------
zipfle
The original report is remarkably well-written. It's nice when you get someone
with the domain knowledge to understand an issue and also the language skill
to explain it clearly.

------
telephonetemp
I assumed they had redundant servers with consensus algorithms in place in
finance but apparently they don't. Would it be impractical?

~~~
clearly
Yes- latency is a big issue for this type of trading system.

~~~
telephonetemp
Perhaps you could do consensus checking retrospectively? I.e., out of N
supposedly identical servers a random one gets to make any given decision in
real time but then a separate system goes back and compares all servers'
results and stops their operation if there's divergence?
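
Off the latency-critical path, that comparison could be as simple as the sketch below. It assumes every server logs the decision it would have made for each input, in the same order; the server names and decision strings are invented.

```python
from itertools import zip_longest
from typing import Optional


def first_divergence(decision_logs: dict) -> Optional[int]:
    """Return the index of the first decision the servers disagree on, or None."""
    # zip_longest pads shorter logs with None, so a missing decision
    # counts as a disagreement too.
    for index, decisions in enumerate(zip_longest(*decision_logs.values())):
        if len(set(decisions)) > 1:
            return index
    return None


logs = {
    "server1": ["buy 100 XYZ", "sell 50 XYZ", "hold"],
    "server2": ["buy 100 XYZ", "sell 50 XYZ", "hold"],
    "server3": ["buy 100 XYZ", "buy 9000 XYZ", "buy 9000 XYZ"],
}
bad = first_divergence(logs)
if bad is not None:
    print("servers diverged at decision", bad, "- stopping all of them")
```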

~~~
clearly
I guess, but it's more typical to do something like cap the total trading
volume, position, risk limits etc. It's a more fundamental check on what you
are doing.
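
Roughly what such a pre-trade gate might look like. This is a sketch only: the limit names and numbers are made up, and it assumes every accepted order fills immediately and in full, which real systems cannot assume.

```python
from dataclasses import dataclass


@dataclass
class RiskLimits:
    max_order_qty: int         # largest single order allowed
    max_abs_position: int      # largest long or short position allowed
    max_daily_notional: float  # cap on total traded value for the day


@dataclass
class RiskGate:
    limits: RiskLimits
    position: int = 0
    notional_traded: float = 0.0

    def accept(self, side: str, qty: int, price: float) -> bool:
        """Return True and record the order only if it fits every limit."""
        if side not in ("buy", "sell") or qty <= 0:
            return False
        signed_qty = qty if side == "buy" else -qty
        if qty > self.limits.max_order_qty:
            return False
        if abs(self.position + signed_qty) > self.limits.max_abs_position:
            return False
        if self.notional_traded + qty * price > self.limits.max_daily_notional:
            return False
        self.position += signed_qty
        self.notional_traded += qty * price
        return True
```

Because the gate does not care why an order was generated, it also bounds the damage from code paths nobody expected to run.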

------
tantalor
That explains how the deprecated "Power Peg" model was activated, but why was
that model so flawed?

~~~
michaelt
You may have heard people with things like backups and emergency generators
saying "you have to test this stuff weekly, in case someone has broken it so
it'll fail the moment you call on it."

Software is the same.

Knight had code that hadn't been run in 8 years. Sure, the code worked 8 years
ago, but things have changed around it since then. As the problem code never
ran, no-one noticed it getting broken, or had any reason to fix it if it broke
in testing.

Most likely the code worked fine 8 years ago, broke in the intervening 8
years, and hence was broken when activated.

~~~
Pitarou
If I understand this correctly, this isn't like having untested code around.
It's more like leaving highly toxic medicine in the bathroom cabinet when you
no longer need it, or leaving an electrically powered band saw plugged in when
it's not in use.

------
Houshalter
They fined them for losing money? What?

~~~
joshAg
No, the SEC fined them for losing money stupidly. In order to have access to
the market like they did, they had to follow certain laws that are enforced by
the SEC. When they were losing all that money they weren't following those
laws.

It's like if you cause an accident while you're driving by breaking the law;
you get a traffic citation (and the accompanying fine), even if your car is
totaled as a result of the accident, because you did something illegal.

~~~
Houshalter
Right but they didn't do anything except offer trades. The thing about selling
shorts they couldn't cover makes sense, but just for "acting stupidly" seems
silly.

~~~
comrade_ogilvy
These are pros who are paid very well to have a clue. It is not silly if
"acting stupidly" is described in writing, such that all parties adequately
understand when the hammer is likely to come down. I am sure a lot of traders
push the envelope and "drive 71 in a 65 mph zone". But it is still not silly to
give the guy driving 81 a ticket.

------
avty
Someone made $172,222 a second for 45 minutes on the opposite side of these
trades.

------
shtylman
Hindsight is 20/20

~~~
MBCook
Having code on your production servers that runs billions of dollars of
business per day, which you haven't run for _8 years_ , is obviously bad.

Deploying in such a way that all your servers are not running the same
codebase is obviously bad.

Deploying to production with no plan for how to roll it back if something goes
wrong is obviously bad.

Not having anyone monitor things closely enough, including the hundreds of
warning emails they got before the market opened, is obviously bad.

There is no hindsight necessary here. You could look at what they were doing
and predict a catastrophe.

~~~
coldcode
I wonder how you can have code sit on a server unused and then 8 years later
have it be called? What language was this written in?

~~~
yareally
Why would the language matter? Dormant code is the same in any language.

~~~
twistedpair
Poster was probably thinking of directories of interpreted code (e.g. Python,
PHP, Ruby), compared to compiled binaries (C++, C, C#, Java), since many
compiled languages have dynamic deployment mechanisms like OSGi's hotswapping
of individual modules.

------
meepmorp
Powder Keg is a distinctly un-reassuring name for finance related
functionality.

~~~
frankc
It is peg, not keg. Peg refers to an order where the limit price is
automatically adjusted to some benchmark. For instance, you always want to be
1 penny away from the best bid. I don't know specifically what "power peg" is,
though.
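
A toy illustration of a generic peg, not of Power Peg itself. It assumes a buy order that sits one cent below the best bid and works in integer cents; the direction of the offset and the prices are assumptions.

```python
def pegged_buy_limit(best_bid_cents: int, offset_cents: int = 1) -> int:
    """Limit price for a buy order pegged offset_cents below the best bid."""
    return best_bid_cents - offset_cents


def repeg(best_bids_cents, offset_cents: int = 1):
    """Yield a new limit price each time the best bid changes."""
    last_bid = None
    for bid in best_bids_cents:
        if bid != last_bid:
            last_bid = bid
            yield pegged_buy_limit(bid, offset_cents)


# Best bid moves 10.00 -> 10.00 -> 10.02 -> 9.99, so the order is repriced three times.
print(list(repeg([1000, 1000, 1002, 999])))
```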

~~~
meepmorp
I seem to have misread, but not in a way that markedly distorts things.

~~~
ycombobreaker
"Powder Keg" is a very marked distortion of "Power Peg".

~~~
JonnieCache
In the context of this story though, it's very apposite. I did the same thing.
Quite disappointed after I read it correctly.

------
drill_sarge
I still find it scary that at this moment automated systems are
shoving billions of fake money back and forth around the world.

~~~
phyalow
How is it fake?

~~~
rosser
As opposed to all the "real" money in the world?

~~~
qznc
How is it "real"?

