
Show HN: I'm building an open-source, high-frequency trading system - zygomega
http://scarcecapital.com/hft
======
kasey_junk
This is a nice little research project, and I hope you learn a lot from it.
Having written algorithmic trading systems, I think you are missing a couple of
central points: 1. There is no such thing as the "best" algorithmic trading
platform, because algorithmic trading is such a broad term. Architectures that
make sense for one class of trades do not make sense for another. 2. Contrary
to popular opinion on this forum, algorithmic trading is not necessarily only
low-latency and high-frequency. Frequency, latency, and algorithmic levels are
really dials on the specification system, and along with cost they dictate how
you will build the system. Do some work figuring out the trades you are
targeting before setting those dials. 3. The actual execution system part of an
algo trading system is usually the easiest part. If you've written one before,
writing new ones is usually trivial. Finding the trade, building appropriate
risk systems/practices, building backtesting frameworks and exchange reference
data systems are all much more challenging and take the majority of the time in
these systems. 4. Finally, if you are truly targeting low-latency and
high-frequency events, concurrency is your enemy, not your friend.

To everyone saying that the cost in these systems is the cost of co-locating
servers or fpga cards, you're wrong. You can get hosting deals/leases on those
kinds of things for the same cost as high-end web hosting. The cost of running
these systems is twofold: (1) paying the employees (because your competitors
can pay them a lot) and (2) having deep enough pockets to survive the bad
days/weeks/months. These are the same costs that have always existed in the
trading space and have nothing to do with electronic trading. In fact,
electronic trading has lowered the information asymmetry and made it easier
for new participants to be involved in the markets.

~~~
bink-lynch
> ...truly targeting low-latency and high frequency events concurrency is your
> enemy, not your friend.

This reminds me of the LMAX financial trading platform, where they started with
a concurrent model but ended up using a single thread "...that will process 6
million orders per second..."

<http://martinfowler.com/articles/lmax.html>

~~~
zygomega
Yes, but the concurrency has been shifted to the two disruptors at either end
of the business process logic. And I suspect the business logic is fairly
simple and lends itself to the straight-through solution. If you want to react
to a news event (say), you have to somehow send a signal to the feed requesting
recent history (the last 20 minutes, say), have something waiting to look at
it, crunch the numbers when they arrive to see if something's up, then
interrupt the existing trade decision/risk management schedule with a
potentially better trade. Complexity increases quickly, and concurrency helps
do all of that at high speed.
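
As a rough sketch of that pattern in haskell (every type and helper here is a
hypothetical stand-in, not project code): the news handler fires off an async
job to fetch and crunch the history, and the main decision loop just polls a
shared TVar for a better trade.

    import Control.Concurrent.Async (async)
    import Control.Concurrent.STM

    data NewsEvent = NewsEvent      -- hypothetical stand-ins
    data Tick = Tick
    data Trade = Trade

    -- Crunch recent history off the main path; if something's up,
    -- drop a candidate trade where the main loop can pick it up.
    onNews :: TVar (Maybe Trade) -> (Int -> IO [Tick]) -> NewsEvent -> IO ()
    onNews pending fetchHistory _event = do
      _ <- async $ do
        ticks <- fetchHistory 20                 -- last 20 minutes, say
        case crunch ticks of
          Just trade -> atomically (writeTVar pending (Just trade))
          Nothing    -> pure ()
      pure ()
      where
        crunch :: [Tick] -> Maybe Trade
        crunch _ = Nothing                       -- placeholder signal logic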

~~~
kasey_junk
As your complexity curve increases, your latency expectations have to be
relaxed.
That is what I meant when I said there is no single trading system that can be
the "best".

If what you are interested in is complex decision making then it may make
sense to use a different sort of messaging technology than LMAX, but you won't
be getting into anything remotely "low latency". Nothing wrong with that, just
needs to be a known expectation.

~~~
zygomega
Yep, I've pored over the technology and concepts there and it's awesome. So
awesome I think you can increase complexity without too much of a latency
penalty. Just behind the fast guys but way ahead of the straight algo guys is
where I think the opportunities lie (I'm kind of answering your previous
question about where the dials sit in my mind). I might be wrong, though, and
would like to test out all the dial positions before locking anything in. And
that's where a flexible haskell version could shine.

~~~
kasey_junk
I have no opinion about the choice of haskell. That said, you are already
making decisions about the dials if you go in with an architecture that relies
on massive concurrency, functional languages, etc. You cannot for instance get
the latency dial very low with that central architecture decision.

Again there is nothing wrong with that. It's just better to state it up front
as an expectation and/or a goal than to assume you are going to be able to
pivot easily once you have an architecture in place.

------
nazka
I am curious about this project because I love this subject.

However, it is impossible to build an open-source HFT system precisely because
it will be open source. The other sharks will know how you trade, and they will
only need to play against you by sending opposite orders every time. That is
what happened when the source code of a big US bank was stolen. So it would be
more appropriate to say it will be a fast trading system. The other reason is
that you can't build a perfect HFT trading system because it's a moving target.
It is a war in the microsecond market. The sharks will send you wrong orders
just to disrupt your system, or they will manipulate the market. And finally
there is a hardware problem. To do an HFT trading system you need to rent your
own servers inside the exchange, use FPGA cards, etc.

In conclusion, I would only say that it is the wrong goal to say "I will build
the best open source HFT trading system" rather than "I will build the best
open source fast trading system". This project is interesting (and there is
Haskell, cool!) so I will stay connected to see how it evolves.

~~~
olalonde
Couldn't the infrastructure be open source while the specific trading
algorithms are kept closed source?

~~~
VLM
This could be a lot more fun than classic codewars.

For 99% of the "system" there's no particular reason you'd have to connect to
the NYSE and trade real stocks for real money. Looks like they're feeding from
iqfeed, but I think it would be huge fun to create an imaginary competition
exchange, convince about 100 algo writers to compete, and shove them up
against each other purely for the fun of it, see who's a better algo writer.

With modern virtualization, it should be pretty easy to distribute a pack of
images, including practice data, to replicate an entire financial system on a
small scale in your basement, then once you think you have a decent algo,
upload your image to the competition league. If your hft codewars league wants
limits for storage or cpu cycles, virtualization is a pretty easy way to
implement it.

Superficially the most obvious thing to do is totally make everything up, from
top to bottom, but it would probably work just as well to use real live market
data and overlay on top of it. Just as air traffic control simulation has very
slowly moved over the decades from purely random to sorta realistic to actual
airport data.

This could be quite a bit of fun.

~~~
windust
We did this in collaboration with UofC, using our Algo trading platform
(<http://www.optionscity.com/event/uchicago-midwest-trading-competition>)
and it was really fun.

Hm, I wonder if there is an interest to make this a more open event :)

~~~
fnordfnordfnord
>Hm, I wonder if there is an interest to make this a more open event :)

Yes. I bet there is.

------
boothead
Have you considered python for the research/algo side of things?

I work in a hedge fund in London at the moment and there's a massive shift
away from R towards python in the algorithmic shops that I know about here.
There's also the great work that quantopian are doing on their backtesting
framework, zipline [1].

[1] <https://github.com/quantopian/zipline>

~~~
zygomega
Yes, I am aware of that trend and considering a switch. I'm personally more
used to R is all, but if I found collaborators I'd jump to where the code base
goes.

The only other factor is that I find R pretty aligned with haskell, having a
somewhat functional pedigree, so code translates pretty nicely from a rapid
hack at the problem to the more robust approach.

~~~
fawce
zygomega, I'm one of the zipline maintainers and we'd love to collaborate with
you. There are a few people doing HFT research with zipline, and there's a lot
of work to do. At quantopian (my day job), we focus on longer hold periods, so
there is room in the zipline ecosystem for you to do HFT.

The main benefit we've found with python as the algo language is that it
allows for stat programming with pandas, but also OO or functional programming
for the algo logic. This smoothes the transition from research to production,
just as you're describing with R -> haskell, but you can stay in one language.

I think one of the biggest potential wins with parallelization is if you can
assume all positions are closed overnight, which is most often true for HFT.
That way, you can simulate all the trading days in a test range in parallel.
This is
quite similar to the parallel processing we do to handle the large number of
concurrent backtests running at quantopian. We did all of that with python,
but I'd be fascinated to see it done with haskell.
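
Roughly, the shape in haskell would be something like this (simulateDay and the
types here are made-up stand-ins for a single-day backtest, not anything from
zipline or the project):

    import Control.Concurrent.Async (mapConcurrently)

    type TradingDay = Int            -- hypothetical stand-ins
    data DayResult  = DayResult

    -- One self-contained backtest per day: positions start and end flat,
    -- so each day is independent of the others.
    simulateDay :: TradingDay -> IO DayResult
    simulateDay _ = pure DayResult

    -- With the days independent, the whole test range can run concurrently.
    backtestRange :: [TradingDay] -> IO [DayResult]
    backtestRange = mapConcurrently simulateDay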

~~~
boothead
I would see haskell very much as the plumbing side of things. The tools
available for handling and reasoning about streams are streets ahead of
anything I've seen elsewhere. With zeromq and protocol buffers (that's what we
use in our stack) you could very nicely separate the plumbing of the data from
its consumption. I'd love to see something like this as well!
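
For example (made-up types, and the zeromq feed elided), the consumption side
can stay a pure stream transformer with conduit while the plumbing just feeds
it whatever source you have:

    import Conduit

    data Tick  = Tick  { price :: Double }   -- hypothetical stand-ins
    data Order = Order { qty   :: Int }

    -- The strategy is a transport-independent stream transformer.
    strategy :: Monad m => ConduitT Tick Order m ()
    strategy = filterC (\t -> price t > 100) .| mapC (\_ -> Order 1)

    -- Plug in whatever source the plumbing provides (zeromq, a replay file, ...).
    runPipeline :: Monad m => ConduitT () Tick m () -> m [Order]
    runPipeline feed = runConduit $ feed .| strategy .| sinkList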

How would you handle the position sizing part of the algo if you're testing
all days in parallel? Wouldn't the trade size depend on all of the previous
days' PNL?

------
white_devil
Are you trying to get hired by the financial industry, or is there some other
reason for doing this?

As you may well be aware, HFT is a scourge on the world's economy, and it's a
game only the biggest and best-connected players benefit from.

~~~
zygomega
No, I'm already in the finance industry. There are two main reasons I'm doing
this: (1) I think the finance industry is very closed when it comes to
intellectual property development, and an open source approach can be seriously
competitive. An open approach may well be the future when it comes to being
'connected'. (2) HFT is an interesting multi-disciplinary problem, and the
sheer breadth of expertise required - modern chip design, low-latency
software/hardware interaction, lock-free concurrency, fault-tolerant system
design, adaptive learning algorithms, k-means clustering - means I'm learning
heaps every day.

I just don't agree that HFT is a scourge. It's an ecological shift (neither
good nor evil) and longer-horizon investors need to evolve.

~~~
white_devil
> an open source approach can be seriously competitive

Do you really think any Joe Schmoe off the street could just grab your open
source HFT and start making money with it? If not, how does your project
benefit anyone?

Are you working in the industry, but not in HFT? Maybe this project is just
practice for getting a job in HFT? I imagine that's where programmers get the
fattest paychecks in the world. Only a fraction of what the sociopaths running
the show get, of course, but good money nevertheless.

> I just don't agree that HFT is a scourge.

Good thing you're not at all biased.

~~~
zygomega
Good points. To be more precise, I think that an open source approach to
system research and design can be hyper-competitive versus closed-door secret-
squirrel development run by a committee looking for short-term wins. This
project benefits me firstly because I get feedback on my initial scratchings
and maybe even collaboration on areas outside my comfort zone. And it could
well enable me and others to avoid having to work for the sociopaths.

I am biased in thinking that monitoring the market and trading at low latency
isn't fundamentally a bad thing, and shouldn't be taxed because others are too
lazy to do the same thing. I certainly think that hft has been used by evil
people to front-run unknowing third-parties, often in collaboration with
middle men and women who turn a blind eye to the morality of their business
models.

~~~
acjohnson55
Others are too lazy? It's my understanding that the average Joe has no access
to the hardware and low-latency network connections to be able to do this on
his own.

~~~
zygomega
There is a rather large economic rent attached to HFT at the moment. If my
project or others like it can eat into that rent a touch, it might not stay
that way.

~~~
fnordfnordfnord
Exactly, if you take the position that gains from HFT are basically "stolen"
profits from otherwise more productive market participants, then any that can
be "stolen back" is a win.

~~~
fbru02
There have always been "market makers"; why do you care whether they are
algorithmic or human?

------
cmdkeen
I'm curious about the "best trading platform" claim - what is going to make it
better than any other? Especially as the real High Frequency Traders are busy
spending fortunes on placing their hardware as close to the exchange as
physically possible.

~~~
zygomega
Speed is only one issue with autonomous algorithm design. Yes, the speed thing
grabs the headlines but the boiler-plate objective is to front-run the slower
players. There's a wealth of opportunity in processing the event stream in a
more robust way than others and faster too. Think semi-HFT, semi-autonomous.

Most trading platforms are primarily loss leaders for the 'professional'
version and otherwise attached to a non-open source business plan.

~~~
cmdkeen
But doesn't that mean you're at the mercy of the professional HFTs who can
"front run" anyone running on your platform?

~~~
kasey_junk
You've hit a pet peeve of mine so sorry about the tirade. People use the term
"front run" incorrectly and it is important to point this out. Front running
is a very specific activity and is illegal. If there is evidence of front
running people need to be prosecuted.

That said, I've worked on several systems and never seen any of them that
would allow a faster market participant to see your order flow without your
permission. This is what front running is.

People who don't understand the term seem to think that seeing market data
before other participants is front running. It's not. There have always been,
and will always be, some traders that see (and can react to) market data
faster than everyone else. In the era of electronic trading this information
latency cost is going down, not up.

------
fnordfnordfnord
FYI: This isn't an HFT bot but is a decent, recent Python + ncurses trading
client framework for Bitcoin on MtGox. <http://prof7bit.github.io/goxtool/>

Even if you aren't interested in Bitcoin, it might be useful as a real-ish and
cheap place to test with low barriers to entry. I've found that backtesting on
historical data is usually not realistic enough since most people fail to
consider liquidity.

I'm interested because I like to trade, would like to learn more, and I have
had some small success. But I have trouble sticking to my plan. I let emotion
get the best of me and wind up losing my gains.

~~~
zygomega
Isn't MtGox over as a bitcoin trading platform? I agree more generally that
bitcoin could be a useful market to test out.

And your story fits exactly with a hands-off automated approach to trading. We
won't ever be charting a price series in the production environment, because
computers don't get charts.

~~~
fnordfnordfnord
>Isn't MtGox over as a bitcoin trading platform?

I wish. I keep waiting for some more interesting news from Coinlab [1][2], but
so far, Mtgox are still the largest exchange by volume. Here is a chart
showing volume by (selected) exchanges.
<http://bitcoincharts.com/markets/currency/USD.html>

The recent events seem to have only got them more customers (though it remains
to be seen how many of them will stick). In any case, some of the other
exchanges have APIs that are similar to MtGox's.

[1] <https://mtgox.com/press_release_20130228.html>

[2] <http://www.forbes.com/sites/jonmatonis/2012/04/24/coinlab-attracts-500000-in-venture-capital-for-bitcoin-projects/>

------
ablut
Isn't the barrier to HFT the fact that you need enough capital so that your
profits cover the cost of co-located servers and FPGAs in the exchange
datacenter, without which you have a latency handicap? (in addition to, of
course, coming up with good algorithms)

~~~
zygomega
Yes, if the only edge you have is speed. But a good algorithm may well be the
next battleground as the benefits of ultra-low latency diminish.

------
mangrish
Cool! Very interesting.

I think, given the languages you have selected, you are coming more from a
quant background? These languages are great for heuristics and analysis, but
you would really want all 'static' components such as connectivity built in
assembly/C/C++. For 'algo' components I like Java, as you can still pull
microsecond-order latencies when crunching numbers, but more importantly it
gives you a huge time-to-market advantage over C/C++ for almost the same speed.
I'm also not clear on whether you are connecting directly to the market for
market data or using aggregation (like Reuters). The latter would be too slow.
I'm also not clear on what middleware you are using, which is probably the
biggest decision you will have to make. Most either use in-house tech or 29West
LBM (everyone still calls it that even though they were bought out).

An overlooked part of HFT, in my experience, is OS optimisation, and even
things like TCP bypass (for some components), which can lead to huge speed
advantages and end-to-end latency reductions. I agree with the others about
FPGAs: in my experience they really don't come into the equation except for
components that rarely change.

Anyway, a few guys including Martin Thompson have felt similarly to you and
initiated the lodestone project (<http://lodestonefoundation.wordpress.com/>).
If you are keen to learn more about their architectures for low latency,
distribution and componentisation, then I feel you could join forces and
contribute to
that initiative. FYI: Martin built most of the technology behind the disruptor
(<http://lmax-exchange.github.io/disruptor/>).

Great to see interest in making this knowledge more widely available :)

~~~
zygomega
thanks mangrish,

quant. How could you tell? I'm connecting with an aggregator because the
direct market feeds are monopolistic price gougers. It's an easy switch if we
make it up that curve.

I'm not sure if I even understand what middleware is (I'm a quant), but I
think the answer is the disruptor!

~~~
mangrish
From the languages.. I've worked with lots of quants and they all rave about
R!

Yeah..the aggregator is where you get done in both cost and latency. The prop
shops and hedge funds pay through the nose for that stuff so unless you come
packing a little capital, true HFT is an issue.

On the middleware, not really.. so you would have a market data component that
will be pushing stuff to various components (real time risk, the pricer, and
the trading engine). The disruptor sits on the 'in' queue to those
components.. the middleware is what pushes messages between instances running
on the same/different machines.
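
In haskell terms, the crude shape of those 'in' queues might look like the
sketch below (not a disruptor, just plain bounded STM queues to show the idea;
all the types are made up):

    import Control.Concurrent (forkIO)
    import Control.Concurrent.STM
    import Control.Monad (forever)

    data MarketData = MarketData     -- hypothetical stand-in

    -- The market data component fans each message out to the 'in' queue of
    -- every downstream component (real-time risk, the pricer, the trading engine).
    fanOut :: [TBQueue MarketData] -> MarketData -> IO ()
    fanOut queues msg = atomically (mapM_ (`writeTBQueue` msg) queues)

    -- Each component drains its own queue in its own thread.
    component :: TBQueue MarketData -> (MarketData -> IO ()) -> IO ()
    component q handle =
      () <$ forkIO (forever (atomically (readTBQueue q) >>= handle))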

Hope that helps :)

------
carterschonwald
In another month or two the analytical tooling for doing stats and numerics in
Haskell land is going to take a huge leap forward in capabilities. Might be
worth considering going full Haskell then :-)

~~~
sseveran
What's happening then?

~~~
spitfire
From my talks with Carter previously, he's working on a platform for analytic
tooling in Haskell. Basically the core primitives for making large-scale data
analysis apps in Haskell.

Or that's what he was up to in August. Hopefully he can chime in here and
update everyone on his progress. I honestly hope he succeeds in his plans.

~~~
carterschonwald
Still working on it!

Been taking a bit longer to get the core worked out than I'd have liked, but
life happens (e.g. my mom had cancer for a month this winter, though she's fine
now, which is awesome. She didn't even need chemo or rad!!).

Also, I was originally planning to _NOT_ write my own linear algebra substrate,
but I quickly realized all the current tools suck, and that I needed to come
up with a better numerical substrate if I wanted to do better.

What do I mean by this? With all the numerical tools out there presently,
there are none that address the following falsehood that many folks believe is
true: "you can have high level tools that aren't extensible but are fast, or
you can have low level tools that are extensible and fast.".

I want high level tools that are fast. I want high level tools that are fast
_AND_ extensible. I want it to be easy for the end user to add new matrix
layouts (dense and structured, structured sparse, or general sparse) and to
have generic machinery that gives you _all_ the general linear algebra
machinery with only a handful of new lines of code per new fancy layout. I want
to make it idiomatic and natural to write all your algorithms in a manner that
gives you "level 3" quality memory locality. I want to make sure that for all
but the most exotic of performance needs, you can write all your code in
haskell (and by exotic I mean maybe adding some specialized code for certain
fixed-size matrix blocks that fit in L2 or L1, but really that's not most
people's _real problems_).
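
As a toy illustration of the "new layout in a handful of lines" idea (nothing
here is from the actual library, just the rough shape): a layout only has to
say how (row, col) maps into flat storage, and the generic machinery is
written once against that class.

    import qualified Data.Vector.Unboxed as U

    class Layout l where
      flatIndex :: l -> (Int, Int) -> Int
      dims      :: l -> (Int, Int)

    data RowMajor = RowMajor !Int !Int          -- rows, cols

    instance Layout RowMajor where
      flatIndex (RowMajor _ cols) (r, c) = r * cols + c
      dims (RowMajor rows cols)          = (rows, cols)

    -- Generic indexing (and, in the same spirit, the rest of the machinery)
    -- is written once and works for any layout a user adds.
    at :: Layout l => l -> U.Vector Double -> (Int, Int) -> Double
    at l v ij = v U.! flatIndex l ij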

Here's the key point in that ramble that's _kinda_ a big deal: getting "level
3" quality memory locality for both sparse and dense linear algebra. I think
I've "solved" that, though ultimately the reality of benchmarks will tell me
over the coming few weeks whether I have or not.

Likewise, I think I have a cute way of using all of this machinery to give a
sane performance story for larger-than-RAM, single-machine linear algebra!
There's going to be some inherent overhead to it, but it _will work_, and doing
a cache-oblivious, optimal dense matrix multiply of two square 4GB+ ish sized
matrices on a MacBook Air with 4GB of RAM is going to be a cute benchmark that
no other lib will be able to do out of the box. Likewise, any sparse linear
algebra will have lower flops throughput than its dense equivalent, but that's
kinda the price you pay for sparse.

What I find very very interesting is that no one's really done a good job of
providing sparse linear algebra with any semblance of memory locality. I kinda
think I have a nice story for that, but again, at the end of the day the
benchmarks will say.

I at the very least hope the basic tech validates, because there needs to be a
good non-GPL lin alg suite with good perf for haskell. hmatrix being GPL has
cock-blocked the growth of a nice numerics ecosystem on Hackage / in haskell
for years, and it's about time someone puts on some pants and fixes that.

Assuming the tech validates, I really hope the biz validates too (despite me
likely making various pieces open source in a BSD3 style to enrich the
community / get hobbyist adoption / other libs written on top; people in
haskell land try to avoid using libs with licenses that aren't BSD/MIT/Apache
style), because there's so much more that needs to be done to really have a
compelling toolchain for data analysis / numerical computation / machine
learning / etc, and I really really like spending my time building better
tools. Building the rest of that stack will be outlandishly tractable assuming
my linear algebra tech validates, having the right regimes of performance on
large matrices. (Amusingly, no one ever benchmarks linear algebra tools in the
1GB+ regime, and I suspect that's because at that point vectorization means
nothing; it's all about memory locality, memory locality, and a dash of
cache-aware parallelism.)

That's the vague version :)

And that's also not even touching my thoughts on the analytics / data vis
tools that go on top. (Or the horrifying fact that everyone is eager for better
data vis tools, even though most data vis work is about as valuable as
designing pretty desktop wallpapers to background your PowerPoint
presentations... so even if I get everything working, I have a horrifying
suspicion that if I allowed unsophisticated folks to use the tools, most of the
revenue / interest would be around data vis tooling! Which would mostly be used
to provide their customers/end users with pretty pictures that make them feel
good but don't help them!)

Point being: I want to be able to say "you understand math, you understand
your problem domain, and you can learn stuff. Spend 2-3 weeks playing with
haskell and my tools, and you'll be able to focus on applying the math to your
problem domain like never before, because you didn't even realize just how
terrible most of the current tools you were wrestling with are!"

~~~
carterschonwald
I really really really hope the biz+tech combo validates... because then I
could occasionally stop and think "holy fuck, I'm bootstrapping my fantasy job
/ company, the likes of which I imagined / dreamed of as far back as middle
school and high school!"

Realistically there are 3 different outcomes:

The tech doesn't validate (and thus the biz doesn't either) --- then I'm
looking for a day job... (and I'm pretty darn egomaniacal and loud, so finding
a good-fitting day job would take a bit of work!)

The tech works yet the business doesn't --- not sure how that would happen,
especially since with no investors, enough income to support myself would
still be a successful business, though I guess I'd have some compelling
portfolio work if I went job hunting.

The tech and biz both validate, and I earn enough to move out of my parents'
--- magic pony fantasy land of awesome. What more could anyone want? MORE
AWESOME PROBLEM DOMAINS THAT NEED BETTER TOOLS (I mean, that would really be
sort of the ideal, but it remains to be seen if that can happen).

~~~
spitfire
Thanks for the update Carter. I'll root for you.

One point to make: the interface and the performance don't have to appear at
once. The interface will be the longer-lived portion. So sort that out, and
you can focus on performance as problems crop up.

I know it's horribly boring to say, but getting those first few customers gets
you into a virtuous cycle. Given that you're bootstrapped, even a few
customers will get you in a very good place, where you can spend on
development.

I remember us (or rather me) talking about Mathematica. When it first came out
it was _horrible_ for numerics. Truly terrible. But it was easy to transfer
technical papers to it. You simply wrote down what was on the paper, and you
were done.

So people used it, and eventually performance got better over time as they
invested in it.

~~~
carterschonwald
Agree with everything you say. Hence I'm just going to release the lin alg
soon. It actually turns out that, for linear algebra code done right, the API
has an intimate relationship with the possible performance! (This will be more
apparent once I get things out the door.)

There's a lot I'll not even be trying to do in the first release: e.g.
parallelism, distributed computation, SSE/AVX intrinsics.

Fret not, things are moving apace, and basic tech validation and thence
conditioned upon that, public release, are approaching scary fast! :-)

------
tolitius
1. Timestamps in your seed data might benefit from nanoseconds if you are
really talking about "high" frequency (see the sketch after this list).

2. I agree with your comment that it is easier to think about concurrency in
Haskell than in something like C++; however, you can't really compete with
C/C++ in Haskell. Not even with cgo (Go packages that call C code), nor with
OCaml or any other higher-level beasts that promise the speed. Fortran would be
the only one faster for the "algo" part of your initiative. But again, if this
is just an exercise, Haskell and others (I prefer Clojure, for example :) will
do just fine.

3. It would make sense to split the "platform" into two (very different) parts:
"Quantitative Analysis" (a collection of tools and rules) and "Technical Glue
to Read and Stream". Each can/should be divided further of course, but the two
above are essential yet very different for a true "HFT platform".
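
A minimal sketch of point 1 (not from the project, just the idea): nanoseconds
since the epoch fit comfortably in a Word64, which keeps timestamp handling to
plain integer arithmetic.

    import Data.Word (Word64)

    -- Nanoseconds since the Unix epoch; comparisons and differences
    -- stay plain integer operations.
    newtype TimestampNs = TimestampNs Word64 deriving (Eq, Ord, Show)

    fromMillis :: Word64 -> TimestampNs
    fromMillis ms = TimestampNs (ms * 1000000)

    elapsedNs :: TimestampNs -> TimestampNs -> Word64
    elapsedNs (TimestampNs later) (TimestampNs earlier) = later - earlier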

~~~
zygomega
I wish I could get down to nano units. iqfeed (a good value feed) just got
milliseconds in, so I will settle for that.

I'm preparing some speed tests between C++ and haskell on an identical block
of processing so stay tuned! You might be surprised - haskell is way ahead of
clojure on compiler smarts.
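
The haskell half of such a test might be timed with criterion along these
lines (crunch here is just a stand-in for whatever identical block of
processing gets compared, not the actual workload):

    import Criterion.Main

    -- Stand-in for the identical block of processing being compared.
    crunch :: [Double] -> Double
    crunch = sum . map (\x -> x * x)

    main :: IO ()
    main = defaultMain
      [ bench "crunch 1e6 ticks" (nf crunch [1 .. 1000000]) ]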

The split you suggest is exactly what I think is wrong with the way things get
done right now. I'd like to integrate the quant inside the read and stream -
now that's potentially a large speed-up that might compensate for a tight
budget.

~~~
mayank
Keep in mind that if you're not an experienced C/C++ coder, you're going to be
(largely) benchmarking your relative ability in either language rather than
the intrinsic speed of each language.

~~~
kasey_junk
This is true of any language/coder. It's what makes cross platform
benchmarking so difficult for non-trivial problems.

------
fmstephe
I'm building an open source matching engine. I would love to pair up the two
systems for a stress test. I'll keep tracking your project and ping you again
when I have a system up and running, if you're interested.

<https://github.com/fmstephe/matching_engine>

~~~
zygomega
We have some plans on the drawing board for market maker simulators that could
do with some fast matching so I'll keep an ear open for that ping.

~~~
fmstephe
What sort of performance baselines would you require?

~~~
zygomega
I have no idea yet. I see you're geeking out in Go, and there's been a lot of
advocacy coming through for looking at Go as an alternative.

~~~
fmstephe
Yeah, Go is a great language. However, I am trying to develop some low level
queues that should be faster, and less generic, than the ones that Go
provides. But I am having some difficulty putting memory fences into my code.

But if I can get a reasonable level of performance out of the standard Go
queues then I will just use those in the meantime.

------
rbc
I worked on a framework for managing stock and options positions based on
feedback methods developed by Stafford Beer. It doesn't do the
algorithmic/transaction part, but rather focuses on providing reporting and
alerting for large hierarchies of institutions, accounts, stock and options
positions. The proprietary algorithmic code could be added by subclassing the
code that I wrote. It's called the Viable System Agent. It's written in
Smalltalk, tested under both Squeak and Pharo. It's licensed under the BSD
license. It can be found at:

<http://home.rbcarleton.com/rbc/software/smalltalk/VSA/>

Look at the RBC-VSA-Portfolio category. That's where all the stock/option code
is.

------
SeanDav
The industry is doing the vast majority of its HFT development in C++ or Java.
Data analysis is mostly done using Python although I still see a lot of R.

If you are doing this to break into the industry, I suspect the languages you
used should have been the above. Also the above languages would probably have
been better to attract open source developers who are also hoping to use their
code and experience from this project to break into the industry.

~~~
zygomega
I've been in the industry for too long (though not in low-latency) - I'm doing
this to break out of the industry! And I think the finance game is long overdue
for a good disruptive technological event (not implying my humble project is
it). R is a personal choice for data analytics (I love ggplot2). But the
haskell thing is more than that. It's been done before
(<http://www.starling-software.com/misc/icfp-2009-cjs.pdf>) and even Goldman
has a thriving Erlang/OCaml hacker ethic
(<http://www.zerohedge.com/article/aleynikov-code-dump-uncovered>).

------
alexforster
I have no experience in the financial industry, but HFT has always fascinated
me as a potential source for very high rates of "events". Could you share your
general insight about the sheer volume of data that commonly gets pushed
through an HFT system? I'd also be terribly interested in a multi-
megabyte/gigabyte "recording" of HFT trade data.

<http://www.nyxdata.com/capacity>

~~~
sseveran
This project is not HFT. It does not use a full order book data feed. Really,
even the feeds that come from the exchanges are not precisely timestamped
enough, and HFT firms stamp their own. The data is not huge, although trading
systems need to be able to deal with large spikes in the rate of events.
Usually now people are using 10G/40G/Infiniband as the connection to the
matching engine.

------
jschulenklopper
This could be an interesting course for people wanting to know more about
'computational investing' and the algorithms behind stock (market) analysis
and trading: <https://www.coursera.org/course/compinvesting1>. Not specifically
targeted at HFT, but interesting basic background information nevertheless.

------
qompiler
Sorry to burst your bubble but high-frequency trading is only viable and
necessary for market makers. The end result of any type of algorithm you think
of is always a curve-fitting function of existing data.

~~~
zygomega
No need to apologise - curve fitting is a big issue. But I would think that
everything that happens on Kaggle would be curve fitting by your definition,
right? And how is the stuff that happens in brains something other than curve
fitting? It's how you connect the data dots and what you choose to focus on
that makes the difference.

~~~
qompiler
You are asking how the human brain works, that's a whole different issue and I
don't know what Kaggle is/does.

------
cr1t1calh1t
Check out IBrokers:
<http://cran.r-project.org/web/packages/IBrokers/index.html>

It'll get you up and running with IB in no time.

~~~
brotchie
On a similar note, a while back I wrote a bi-directional, fan-out adapter
between the Interactive Brokers API and ZeroMQ.

<https://github.com/brotchie/ib-zmq>

The annoying thing about the IB API is that there's no framing; that is, you
can't simply consume the message types you are interested in without parsing
the entirety of every variable-length message.

ib-zmq resolves this annoyance by parsing incoming messages and placing them
individually into ZeroMQ message frames.

I also wrote an alternative to the IBrokers R package with a much nicer
interface using this ZeroMQ adapter. It parses most IB API messages, but
hasn't been used in production yet.

<https://github.com/brotchie/r-zerotws>

------
zopf
I assume you've checked out Marketcetera ( <http://www.marketcetera.com> ).
What will be your platform's primary differentiators?

~~~
zygomega
Not having to register and log in is one immediate differentiator. Marketcetera
is a big, big code base - I think our ambitions are more focused.

------
z3phyr
Good to see Haskell. I must give you a compliment for being part of pushing
Haskell into the mainstream!

BTW, many trading firms do use Haskell in trading, but that's all privately
held.

------
relaxitup
On a possibly related note, Josh Levine, creator of the Island trading engine,
released the FoxPro source a few years ago. Interesting stuff.

<http://josh.com/notes/island-ecn-10th-birthday/default.htm>

<http://josh.com/notes/island-ecn-10th-birthday/ISLAND.PRG.TXT>

~~~
ycombobreaker
Jerk boy!

------
otikik
I would be much more interested in an open source solution to _fight against_
HFT, or at least make it more difficult.

I see the financial world as a necessary evil, and I think some of its parts
have benefits for the rest of the world. HFT is not one of those parts; I see
it as a pure burden.

You could have made an open source Patent Trolling system and I would feel the
same.

~~~
zygomega
Wow, that's a pretty low ranking, being shoved into the patent troll category
of evilness. When someone goes and patents an obvious algorithm at some point
in the future and my little project shows up as prior art, will I be redeemed?
:)

------
guiomie
I thought HFT was now based on FPGAs and ASICs... how could this compete in
terms of latency/processing?

~~~
zygomega
By putting it on FPGAs maybe. Stage one is to get a robust process for
measuring and researching latency - I'm in Australia and US data is bouncing
into my Mac via a lousy connection and a TCP port in 250 milliseconds on
average. There's a long way to go. I suspect that if you can keep within a
bull's roar of the low-latency crowd, the big gains will be on the algorithm
processing side. Algorithms still add up a few million numbers every time they
recalculate a moving average - there are better ways to do it.
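
For instance, a rolling mean can be kept with one subtraction and one addition
per tick instead of re-summing the window (a minimal sketch, not project code):

    import qualified Data.Sequence as Seq
    import Data.Sequence (Seq, (|>), ViewL(..), viewl)

    -- Rolling simple moving average: O(1) work per new price.
    data SMA = SMA { smaWindow :: !(Seq Double), smaSum :: !Double, smaLen :: !Int }

    newSMA :: Int -> SMA
    newSMA n = SMA Seq.empty 0 n

    -- Push a new price; once the window is full, the average comes for free.
    pushSMA :: Double -> SMA -> (SMA, Maybe Double)
    pushSMA x (SMA w s n)
      | Seq.length w < n = done (w |> x) (s + x)
      | otherwise        = case viewl w of
          old :< rest -> done (rest |> x) (s - old + x)
          EmptyL      -> done (Seq.singleton x) x
      where
        done w' s' =
          ( SMA w' s' n
          , if Seq.length w' == n then Just (s' / fromIntegral n) else Nothing )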

------
nezza-_-
You need to do <i class="icon-iconname"></i> instead of <i>icon-
iconname</i>... The links all have icon names in them.

Edit: Ah. It now works only with JS on when putting the icon-name into <i> and
not in the class. Sorry for that, seems to be new in Bootstrap.

~~~
Alexx
It's Twitter Bootstrap: <http://twitter.github.io/bootstrap/base-css.html#icons>.
Bootstrap V3 has tweaked the way icons are implemented slightly.

~~~
zygomega
thanks heaps for that!

------
dhosek
Aside from all the other points raised against this, there's also the fact
that the timing on HFT is so close that it requires close colocation of the
physical hardware to the trading systems. Only the big boys get access to
those racks.

------
dasickis
I have access to servers in all the major trading rooms if you need to start
deploying this to a production environment to start trading. Contact me at
dasickis [at] gmail.com (I'll reply back from my non-filtered e-mail address)

------
bdunbar
> using haskell, R and emacs org-mode.

That last is so very, very cool.

I'll have to dive in, take a look, see if I can find out _how_ you're using
it. Been wanting to do a project with org-mode for a while, now.

~~~
zygomega
It is - I can't imagine working without it. But I'm not sure I'm the poster
child for how to use org-mode properly. Every piece of code in the repo
actually resides in the org file, which is a big monolithic journal of where
I'm at.

o-blog is a great emacs package that lets you publish sites straight from org.

------
rags_123
+1 for using Haskell.

------
mikevm
How can someone who has absolutely no idea about the financial industry and
HFT learn about it?

Can anyone recommend any resources (books, tutorials, etc..)?

~~~
relaxitup
mikevm, check out Dark Pools by Scott Patterson. Great book about the
evolution of HFT:

<http://www.amazon.com/Dark-Pools-High-Speed-I-Financial/dp/0307887170>

------
ttty
Which software did you use to build the diagram on the main page?

~~~
zygomega
The graphviz dot language: <http://www.graphviz.org/Documentation.php>

It has good support in org-mode, and there's a nice haskell package that takes
the dot code and turns it into an internal graph representation.

------
niggler
Have you actually used this in a production setting?

~~~
zygomega
No, and I am making no claims. I have a market event feed coming in, a good
idea of what the event processing looks like and a rough idea of how to send
an order to a broker. I think the project needs to get to production fast
though.

------
conformal
you're doing it wrong: haskell is too slow.

~~~
zygomega
I thought so too for a long while, then I tried to do a touch of concurrent
code in c++ and had to gouge my eyes out. I'm excited about the speed up
haskell brings to development. You can plan things in haskell you can't
imagine in other languages.

~~~
NateDad
+1 for doing it in Go. If concurrency is your objective, Go makes it easy...
it's also very fast both to program in and to run.

~~~
cakebread
> +1 for doing it in Go

Conspicuously you don't address Haskell at all. Then again, who cares what
might actually be best for OP? There's an advocacy bingo card in play!

