Show HN: Historical order book reconstruction API for crypto markets (tardis.dev)
78 points by tardis_thad 64 days ago | 31 comments

I doubt I’ll ever be trading enough volume to support this expense, but you just might get me addicted with those monthly free API access days...

I have worked on something similar before: cryptomarketplot.com

Free API access is not 1 day but 24/7 at http://cryptomarketplot.com/api.json. If you have traffic limits, check http://cryptomarketplot.com/api.json.br, added a few months ago but not shown on the main page (I did that: https://news.ycombinator.com/item?id=18653590 )

The only drawback is the limitation to 1-minute resolution. Still good enough if you are not doing HFT.

If you are doing HFT, they had "all you can eat" feed packages: around $500/month for the BTCUSD pair on any 5 of the 75 supported exchanges, with different pricing for different latency tiers (going all the way to colo!), using a custom client and software integration (for SLAs on latency targets).

I'd have to ask if they now provide historical data.

If you have precise symbols, date ranges and exchanges you're interested in, get in touch and I'll work something out so it's not too taxing on your budget.

Hi, founder of https://tardis.dev here. Happy to answer any questions you have.

Hey this looks really well put together, nicely done! Maintaining a connection to all these exchanges, and managing that data, is no easy task.

A few random questions:

1. Can you speak more to the "synchronized clock" part? Do you augment data feed timestamps with your own?

2. What kind of database did you choose for this? How much data are you managing?

Also (promise you'll give me a discount if I tell you this?), this seems underpriced. I haven't seen anyone sell truly high-resolution depth data going months back yet. For BitMEX, no less. But maybe I'm out of the loop. Anyway, congrats on the launch!

Thanks for the feedback, really appreciate it.

1. Yes, every message also carries a local timestamp (100 ns precision), and all exchange feeds run on a single VM host, hence "synchronized clock" - I could definitely use better wording there. I know alternative services have problems with this (out-of-sync timestamps for different symbols on a single exchange, for example), which is why it's mentioned.
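For illustration, pairing each exchange message with a capture-side timestamp could be sketched roughly like this (hypothetical Python sketch: the real pipeline is .NET Core, and the field names are my assumptions, not the actual schema):

```python
import json
import time

def stamp_message(raw: bytes) -> dict:
    """Attach a local receive timestamp (nanosecond resolution) to an
    exchange message. Illustrative only: pairing the exchange's own
    timestamp with one observed locally on a single capture host is the
    idea behind the 'synchronized clock' claim."""
    local_ts_ns = time.time_ns()  # wall-clock time at receipt, in ns
    return {
        "localTimestamp": local_ts_ns,
        "message": json.loads(raw),
    }

stamped = stamp_message(b'{"table": "trade", "symbol": "XBTUSD"}')
```

Because every feed is stamped by the same clock on the same host, timestamps are comparable across symbols and exchanges, even if that clock is not perfectly accurate in absolute terms.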

2. It's all stored in Google Cloud Storage, plus Wasabi (an S3 alternative) as a backup, so no DB - it would have been too expensive for what I wanted to achieve. It's currently around 4-5 TB of compressed data (so 25-35 TB uncompressed - I need to check to be sure). Data filtering (by symbol and channel) is done on demand, and the API clients cache filtered data locally. Everything was written from scratch in .NET Core, with Cloudflare Workers as the public-facing API auth and caching proxy.
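A minimal sketch of on-demand filtering plus local caching in this spirit (the record layout, one JSON object per line with "symbol" and "channel" keys, and the function names are my assumptions, not the actual format):

```python
import json
import os

def filter_messages(lines, symbol, channel):
    """Yield only the messages matching the given symbol and channel.
    Assumes a hypothetical flat-file layout of one JSON object per line."""
    for line in lines:
        msg = json.loads(line)
        if msg.get("symbol") == symbol and msg.get("channel") == channel:
            yield msg

def fetch_filtered(lines, symbol, channel, cache_path):
    """Filter once, write the result to a local cache file, and serve
    subsequent calls from that cache instead of re-filtering."""
    if os.path.exists(cache_path):
        with open(cache_path) as f:
            return [json.loads(l) for l in f]
    result = list(filter_messages(lines, symbol, channel))
    with open(cache_path, "w") as f:
        for msg in result:
            f.write(json.dumps(msg) + "\n")
    return result
```

The design choice mirrors what's described: cheap immutable blobs in object storage, with filtering done at read time and the (much smaller) filtered slice cached on the client.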

Indeed, pricing is meant to be very affordable, as it's targeted at independent algo traders: without spending huge amounts of $$$, they can have good data to backtest at a professional level (if you can call crypto trading professional, as some dispute). Happy to provide you with a discount - please get in touch with me via email if interested.

Your timestamping approach is interesting.

It is well known that exchanges like BitMEX suffer from particular latency issues. People pay very well to have them studied and mitigated.

Many approaches are possible, but running on a single machine, let alone a single VM, is very dangerous.

I worked for a company that offered multiple timestamps + hardware timestamping for accuracy.

A VM will introduce too much latency. Even very well configured, your jitter will exceed 100 ns by several orders of magnitude.

So be very careful about considering your extra timestamp as authoritative.

You are spot on about the other challenge: the database. The person who focused on that achieved impressive results using a special hardware + software mix.

Thanks for the details. Flat files in the cloud plus heavy caching on the client sounds like the most effective approach to me too. I work at a company that collects data like this for the traditional derivatives markets, and fwiw about half of our data is in S3 too. The other half, Dynamo.

I appreciate you making this accessible to indie algo traders, it's def what we need. Will keep an eye on this.

This is a cool service. I worked as a researcher for a trading firm and we had similar internal tools.

Some hopefully not too harsh feedback:

You're capturing data in London. I know nothing about crypto markets, but they probably aren't all colocated in London, and your users won't be either. You should try to collect data at each source, synchronize it well, and let users adjust timings to suit their needs.

Data integrity is critical. You have incidentReports in your API, but I didn't see what goes in there. Ideally, make it machine readable (begin/end timestamps for each incident interval), or flag to the user whether the data is good or bad as they stream it.

To make this more useful as a product, consider building a normalization layer on top of what you have here. It's great that you provide the actual exchange messages for those who need them, but researchers often want to answer questions like "which market has the tightest average bid-ask spread over the past month?" without learning details of a dozen APIs and writing boilerplate code for each.
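As an illustration of the kind of question such a normalization layer enables, here is a sketch assuming quotes have already been normalized into plain (best_bid, best_ask) pairs per market (the shapes and names are hypothetical, not any existing API):

```python
def average_spread(quotes):
    """Mean bid-ask spread over a list of normalized (bid, ask) pairs.
    Assumes a hypothetical normalization layer has already mapped every
    exchange's feed into this common shape."""
    spreads = [ask - bid for bid, ask in quotes]
    return sum(spreads) / len(spreads)

def tightest_market(quotes_by_market):
    """Return the market whose average spread is smallest."""
    return min(quotes_by_market,
               key=lambda m: average_spread(quotes_by_market[m]))
```

Once the per-exchange quirks are hidden behind one shape, the "tightest average spread over the past month" question becomes a few lines instead of a dozen exchange-specific parsers.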

I'd suggest providing the user with a standardized object representing the limit order book for a market and ticker. Clients would subscribe to it and receive generic events like snapshot, order/price level added/deleted, trade, etc. As the data is being streamed, they could also access the current state of the book at each point in time through this object to get information like the best prices, size and number of orders at each price, spread, etc.
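Such a standardized book object might look roughly like this minimal sketch (the event shapes, snapshots carrying full ladders and updates carrying (price, size) levels where size 0 deletes, are assumptions, not an actual API):

```python
class OrderBook:
    """Minimal L2 order book consuming generic snapshot/update events.
    Hypothetical normalized event shape; real exchange feeds would be
    translated into these events by a normalization layer."""

    def __init__(self):
        self.bids = {}  # price -> size
        self.asks = {}

    def apply(self, event):
        if event["type"] == "snapshot":
            self.bids = dict(event["bids"])
            self.asks = dict(event["asks"])
        elif event["type"] == "update":
            side = self.bids if event["side"] == "bid" else self.asks
            price, size = event["price"], event["size"]
            if size == 0:
                side.pop(price, None)  # size 0 means the level was removed
            else:
                side[price] = size

    def best_bid(self):
        return max(self.bids) if self.bids else None

    def best_ask(self):
        return min(self.asks) if self.asks else None

    def spread(self):
        bb, ba = self.best_bid(), self.best_ask()
        return ba - bb if bb is not None and ba is not None else None
```

Clients would feed it events in capture order and query best prices, spread, or depth at any point of the stream.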

Thanks, I really appreciate constructive feedback! Some of the points you've mentioned were already on the roadmap and I'll definitely consider the rest, although crypto markets are quite specific and can't be compared 1:1 with traditional ones, hence my initial choices.

The main thing you offer is order book replay, am I understanding that correctly? I have been interested in something similar, but I am not sure I could actually justify the extra data.

Could you give a scenario where order book data at this granularity might come in handy, as opposed to say a single measure of liquidity (however that would be defined)? Thanks

Yes, you are correct, but it's not only the order book - it's also trades, liquidations, etc.: full market data replay. If you trade on a higher time frame it's not that useful, since you can use daily OHLC data, but for intraday and more HFT-like algo strategies it may be handy. The common wisdom is that the order book is mostly noise and fake data, but I disagree - check out https://www.reddit.com/r/highfreqtrading/comments/av5c4m/mar... for some ideas on why such data is useful.
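A full market data replay is essentially a time-ordered merge of the recorded streams. A sketch, assuming each recorded message carries a capture-side timestamp and each stream is already sorted by it (the "localTimestamp" field name is an assumption, not the actual schema):

```python
import heapq

def replay(*streams):
    """Merge several recorded, individually time-sorted message streams
    (e.g. trades, book updates, liquidations) into one time-ordered
    replay, as a backtester would consume them."""
    yield from heapq.merge(*streams, key=lambda m: m["localTimestamp"])
```

A backtester can then drive its strategy off this single merged stream exactly as it would off a live feed.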

Isn't all this data available for free from the exchanges' APIs? I haven't looked at this in depth, but what is the value proposition?

Indeed, this data is available as a real-time stream via the public exchange APIs, but you can't "go back in time", subscribe to data from, say, two months ago, replay it, and recreate the exact market state at that time. Using this API you can - does that make sense?

Yes I think I understand. Historical data is freely and publicly available but this lets me replay the market rather than just do static analysis. Not an active trader, just trying to clarify my thoughts about why I would need this/pay for it. Thanks

Yes, and there is also no way to get historical order book data via the exchanges' APIs.

Looks like a fine service, but as a matter of policy I (and many of my peers) do not partner with crypto services that do not list basic company details on their website. For new services, I'm looking for specific names of founders, location, mission statement/values etc.

Thanks for the feedback, noted, will change soon.

I saw for coinbase it returns client_oid. Is that a user account? Does that mean you could build a user's history on coinbase?

Not in general. client_oid is meant to be different for different orders. It is a "cookie" that clients can use to later identify orders placed through the REST API. "Later" here can be via the websocket stream, or after a crash/restart.
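To illustrate the "cookie" usage, a hedged sketch: only the client_oid field name comes from the actual Coinbase API; the tracker class and message shapes below are simplified assumptions.

```python
import uuid

class OrderTracker:
    """Track in-flight orders by a self-assigned client order id.
    Hypothetical sketch: real order placement and feed handling would go
    through the exchange's REST and websocket APIs."""

    def __init__(self):
        self.pending = {}

    def new_order(self, side, price, size):
        client_oid = str(uuid.uuid4())  # our "cookie" for this order
        self.pending[client_oid] = {"side": side, "price": price, "size": size}
        return client_oid

    def on_feed_message(self, msg):
        # Match a feed message back to the order we placed earlier,
        # even across a reconnect or restart, via the client_oid echo.
        return self.pending.get(msg.get("client_oid"))
```

Since the id is generated client-side, it identifies only the client's own orders; it doesn't let a third party reconstruct someone else's history.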

Indeed, it's client order id, not customer id.

Do you plan on extending the product offering?

Edit: Do you use FIX API for the exchanges that provide it?

For those who don't know: the 3 main ways to connect are REST, WebSockets and FIX.

But dedicated shops use custom made formats, with custom made software, often running on custom hardware.

If you want FIX, you may also need a low latency feed and at least a custom client where you just add your algorithms.

No immediate plans for a FIX API - many exchanges focus solely on WebSocket and REST APIs, and even where FIX is available it's often built on top of the WS API.

https://www.kaiko.com/ seems to offer the same data but with far longer historical coverage (Tardis starts from April this year). The drawback of Kaiko is the higher price tag.

Yes, Kaiko provides a similar service, and there are others in this space as well, but it's normalized data only, and only a snapshot of the top 10% of the order book taken every minute - not streaming order book data (initial snapshot + incremental updates). That works for some use cases, but not all, hence my API, which I hope fills that niche.

For fun, what's your tech stack and how do you think of it now that you built this project with it? Since we're on HN after all.

The whole backend is written in .NET Core, with a little bit of Cloudflare Workers (authentication, request validation, etc.). No DB, just Cloud Storage plus the Cloudflare Workers KV store. Hosted on GKE in London. I'm very pleased with this combination so far.

Great offering and slick design! I hope you guys expand to traditional markets as the providers there are a bit messier.

I'm pretty sure the crypto exchanges own this data and will be hostile when they see it being repackaged and resold like this.

Is this a Stripe product? The site is almost identical.
