
I got exposed to kdb+ while interviewing in quant and would kill to use it in a non-financial metrics setting, but the language and ecosystem just can't let me get there. kdb+ is so fast and powerful you'd think it impossible.

They're not directly comparable, but kdb+ would crush InfluxDB head to head in an apple/orange comparison if used for the same thing. I actually can't think of any time series-like store that remotely comes close to the feats I have seen kdb+ with Q accomplish. Too bad the productivity and knowledge overhead is so high.

Seriously, I'm working in realtime metrics right now and I spent a week playing with kdb+ as the central store. I can tell there's a gold mine there, but I'm just not swinging a big enough pickaxe. If there was a dumb Python/Go/etc typical ops hacker frontend to that whole system I would be throwing every dollar I have to my name at it; my kingdom for burying that system under a few layers of abstraction and paying some quant Q hacker a Ferrari or two a year to hide the sausagemaking from everyone else.

(Related aside: Like building furniture from stacks of cash? Learn K/Q/kdb+ and head for NYC.)



>kdb+ is so fast and powerful you'd think it impossible.

I suspect there is a religious/mythological aspect to K/Q's reputation for speed. During the year that I was using Q I often found it to be slower than the equivalent Matlab code (which benefits from a JIT) or even NumPy (which has many naively implemented operations).


IMHO, Matlab is too clunky and inelegant for data transformation compared to APL-inspired languages, which need a slightly mathematically inclined user to appreciate and unleash their true power, with a tenth of the code.

The biggest advantage of a language like Q is that it's inherently parallel. Map-reduce is naturally built into the language, so it encourages users to think in terms of breaking problems down into sub-problems and parallelizing them via the "parallel apply" construct [1].

Another very powerful feature is the dead-simple inter-process communication between kdb+ instances [2].

[1] http://code.kx.com/wiki/Reference/peach

[2] http://code.kx.com/wiki/Startingkdbplus/ipc

This makes a real difference in ad-hoc environments where you want to set up and tear down complex trading software in real time.


> Map-reduce is naturally built into the language

I hear this often about both array and functional languages but I think this kind of slogan leads to a misunderstanding of the MapReduce framework (and what's special about it).

In a functional or array language, map and reduce have the following signatures:

    map : key list -> (key -> value) -> value list

    reduce : value list -> (value, value -> value) -> value

These are strictly less powerful than the functions used in MapReduce:

    read : block -> (input_key, input_value) list
    map : (input_key, input_value) -> (output_key, output_value) list
    reduce : (output_key, (output_value list)) -> (output_value, output_value -> output_value) -> (output_key, output_value)

Combining this API with a (1) a distributed file system which lets you co-locate computation with data across many computers and (2) a parallel partitioning algorithm enables petabyte-scale computation. It's qualitatively different than in-memory array processing.
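The distinction can be sketched in Python (a toy illustration of the two shapes above, not how either Q or Hadoop actually implements them):

```python
from collections import defaultdict
from functools import reduce

# Array-language style: map over one flat list, then fold it to a single value.
xs = [1, 2, 3, 4]
total = reduce(lambda a, b: a + b, map(lambda x: x * x, xs))  # 30

# MapReduce style: the mapper emits (key, value) pairs, a shuffle groups
# values by key, and the reducer folds each key's group independently --
# it is the keyed shuffle that lets the framework partition work across machines.
def map_phase(record):
    for word in record.split():
        yield (word, 1)

def shuffle(pairs):
    groups = defaultdict(list)
    for k, v in pairs:
        groups[k].append(v)
    return groups

def reduce_phase(groups):
    return {k: reduce(lambda a, b: a + b, vs) for k, vs in groups.items()}

records = ["a b a", "b c"]
pairs = [p for r in records for p in map_phase(r)]
counts = reduce_phase(shuffle(pairs))  # {'a': 2, 'b': 2, 'c': 1}
```

The plain map/reduce has no notion of output keys, so there is nothing for a framework to re-partition between the two phases; that is the "strictly less powerful" part.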


kdb+'s map/reduce implementation operates across data in memory or across machines in a cluster.


Does the Q/K implementation actually execute in parallel? I've always had a hard time figuring out exactly why K is supposed to be so fast (except for column stores, interpreter fitting in L1 cache, and fast hand-written primitives).


Yes, you can start the server with any number of slave processes. As long as you're not writing data to the global namespace (i.e., if you can keep your code purely functional), then, as pointed out above, you can use the "peach" (parallel each) operator [1] to parallelize execution. You also get a "distributed each" to spread load across multiple processes instead of threads [2].

[1] http://code.kx.com/wiki/Reference/peach

[2] http://code.kx.com/wiki/Reference/peach#Peach_using_multiple...
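For readers without a kdb+ license, the shape of `peach` can be approximated in Python with a worker pool (a conceptual sketch only; q schedules across its slave threads, and real speedups depend on the workload and runtime):

```python
from concurrent.futures import ThreadPoolExecutor

def heavy(n):
    # stand-in for a pure, side-effect-free computation
    return sum(i * i for i in range(n))

xs = [10_000, 20_000, 30_000, 40_000]

# Serial "each":
serial = [heavy(x) for x in xs]

# Parallel "each", the analogue of q's peach: because heavy is pure,
# the result is independent of how the pool schedules the calls.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(heavy, xs))

assert serial == parallel
```

The purity requirement is the same one the comment above describes: if the function wrote to shared state, the parallel version would no longer be equivalent to the serial one.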


It's fast compared to other DBs because most datasets are in RAM.


kdb+ also supports petabyte-scale distributed databases, i.e. much larger than RAM.

E.g. http://kparc.com/q4/readme.txt (Trillion Row Benchmarks)


Exactly. I'd say its closest comparable is CEP.


>I suspect there is a religious/mythological aspect to K/Q's reputation for speed

It depends what you're doing, but the vast superiority of kdb+ versus its competitors has been demonstrated:

http://kparc.com/q4/readme.txt


> machine: 16 core 256GB (in all cases: date partition, sym index. all queries in RAM.)

> all query data is cached in RAM (no disk access).

1) What happens when you run these benchmarks on a machine with only 16GB of RAM?

2) How does the KDB performance compare to doing the equivalent operations on a Pandas DataFrame (which, since these are simple in-memory operations, seems like the only fair comparison).


> 1) What happens when you run these benchmarks on a machine with only 16GB of RAM?

kdb+ uses ~1.2 MB of resident RAM at startup. In addition, kdb+'s storage model, in memory and on disk, has very little overhead (a few hundred bytes) per column over the raw binary data.

The above, combined with memory mapping, allows large databases to be queried with great performance.

I query billion row futures databases on a Macbook Pro with 16 GB RAM with kdb+.
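kdb+'s on-disk format is proprietary, but the memory-mapping idea itself is easy to sketch with the Python standard library: the OS pages in only the bytes a query touches, so the resident set stays far below the file size (the file name and sizes here are invented for illustration):

```python
import mmap
import os
import tempfile
from array import array

# Write a "column" of one million 8-byte ints to disk.
path = os.path.join(tempfile.mkdtemp(), "price.bin")
with open(path, "wb") as f:
    array("q", range(1_000_000)).tofile(f)

# Query it through a memory map instead of reading it all into RAM.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        col = memoryview(mm).cast("q")  # zero-copy int64 view of the column
        # An aggregation over the map reads only the bytes it touches.
        total = sum(col[i] for i in range(0, len(col), 1000))
        col.release()                   # release the view before the map closes
```

This is roughly why a billion-row database can be queried on a 16 GB laptop: the query's working set, not the database size, is what has to fit in memory.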

> 2) How does the KDB performance compare to doing the equivalent operations on a Pandas DataFrame (which, since these are simple in-memory operations, seems like the only fair comparison).

Could someone provide the equivalent Pandas code running against similar TAQ data for the queries in the following benchmark?

http://kparc.com/q4/readme.txt


> They're not directly comparable, but kdb+ would crush InfluxDB head to head in an apple/orange comparison if used for the same thing. I actually can't think of any time series-like store that remotely comes close to the feats I have seen kdb+ with Q accomplish. Too bad the productivity and knowledge overhead is so high.

Do you have any more specific use cases for the kind of analysis that kdb+ is used for? I'm working with time-series sources that represent streams of internal variables from embedded systems, and would like to evaluate whether kdb+ brings something to the table that we could use. (What we are doing probably falls into the category of Complex Event Processing, i.e. windowed operations on the data stream with some predicates being evaluated globally.)


kdb+/q is great for this sort of thing (data-stream processing and analytics). The "traditional" kdb+ example/benchmark is some query on big data, and it excels at that. But another use case is to set up several real-time kdb+ processes, like a processing pipeline or a distributed workload, that react to real-time messages/events and produce some output (a derived analytic, alert, action, etc.). One example is trading, where real-time market data is sent to some kdb+ processes, which run analytics/rules and issue/manage orders sent to the market. It takes surprisingly few lines of q code to set something like this up with kdb+.


One very important use case is the asof join, i.e. "lining up" two or more time series by time so that they can be compared or analyzed together.

kdb+ does this very efficiently.
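q's `aj` is built in and heavily optimized; the semantics can be sketched in plain Python with a binary search (pandas offers the same operation as `merge_asof`; the timestamps and prices below are invented):

```python
from bisect import bisect_right

# Toy asof join: for each trade, pick the latest quote at or before its time.
quotes = [(1, 100.0), (3, 101.0), (7, 99.5)]   # (time, price), sorted by time
trades = [(2, 10), (3, 5), (8, 20)]            # (time, size),  sorted by time

def asof_join(trades, quotes):
    qtimes = [t for t, _ in quotes]
    out = []
    for t, size in trades:
        i = bisect_right(qtimes, t) - 1        # last quote with time <= t
        out.append((t, size, quotes[i][1] if i >= 0 else None))
    return out

joined = asof_join(trades, quotes)
# [(2, 10, 100.0), (3, 5, 101.0), (8, 20, 99.5)]
```

Note the join is inexact by design: the trade at time 2 matches the quote at time 1, not a quote at exactly time 2, which is what makes asof joins awkward to express in vanilla SQL.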


Could you provide representative CEP use cases? I may be able to provide some insight.


The salary for kdb is very good. If you are a student and interested in learning kdb, we are currently providing free access to our online kdb training course: http://www.timestored.com/kdb-training/free-student-access


There's an interface to it all from Python (https://github.com/cpcloud/qPython)...


To be clear, I didn't mean just bindings (and I did find that).


I really like J and the other array-based languages I've played with in the past. Can you quantify building furniture from stacks of cash? Particularly in comparison to the stories that have been circulating recently [0] comparing AmaGooSoftBook salaries?

0 - https://blog.step.com/2016/04/08/an-open-source-project-for-...


What levels of compensation are we talking about here? Indeed shows ~$150k when searching for kdb in NY.


In the interests of transparency...

That sounds like the low-end base salary for ~entry level. It probably gets topped up with a ~$100k bonus. There are a couple of places that hire many but don't pay well.

At what I think is the high end, I see about an offer per quarter of $500-900k base with a $1-2M bonus (usually guaranteed for the first year) for a pure technology role. Add more front-office work and the bonus potential shoots up (but it's a tough gig at the moment at least).

For context, non-kdb principal-level roles I've seen on the west coast top out at around $1M, mostly in RSUs, from, say, a $250k base at the usual names.

I've seen offers in the $1-10M range on the east coast to build competitive technology. Amusingly, on the west coast, I've only seen the stereotypical "be my technical cofounder and get screwed on comp" offers to do so.


Wow. I've been playing with APLs on and off for a while now, so I'd be very interested to hear more (e.g. where one finds such offers). Any chance you'd be up to chat? My email's in my profile.


Drop me a note (email in profile). I'm not a recruiter, these are the offers I've been contacted with, through my personal network based on past gigs.

It's not just knowing an APL, though -- far from it. For Chicago/NYC it's a mix of hardcore low-level stuff and some market knowledge. For the west coast, a mix of low-level stuff (less hardcore) and more distributed-systems stuff.


Your profile looks to be blank (https://news.ycombinator.com/user?id=stuntprogrammer)? Depending on the specifics, I might actually fit that bill though.


I'd also love to chat about this, when you add your email to your profile. :-)


Fixed; added to 'about' as well as email field.


> I've seen offers in the $1-10M range on the east coast to build competitive technology.

I'm interested to know more about this, assuming you have time.


I shouldn't say too much. But more than one large firm has explored writing an in-house replacement and floated actual $ offers.

Occasionally someone pops up wanting to do a company to compete with them but they're typically offering paper of dubious value, covered in slime.

In any event, I'm not particularly interested in going to war to steal their market share. I do worry that they're going to go into a long decline to irrelevancy though -- I'd very much prefer that not to happen. I'd rather they/FD write a new generation building on the best of Arthur's work, plus some things more suited to new platforms and workloads. There are big opportunities there.


Agreed. The variety of workloads (a la Stonebraker's One Size Doesn't Fit All paper) is where many big opportunities lie. I'm interested to see what kinds of workloads and use cases are of interest in the marketplace.


It is very sad that they won't open up, and very short-sighted as well. I'm sure Whitney and Kx are perfectly happy with their millions, but they are sitting on one of the most impressive software achievements around and refusing to share it. It's not even like it would cost them: First Derivatives could grow a consulting empire. I mean, look at open-source MongoDB. It is a piece of shit, and it is valued at $1.4B. KDB is the finest thing I have ever used, and First Derivatives purchased a controlling stake in Kx for £36.0M. It makes no sense.

Maybe kOS will be open-source. Maybe it will be the big one, Arthur's chance to change the course of software development history.


It is really sad indeed. But I suggest you take a look at the J language. It's free for commercial use, unless you want to integrate JDB, the equivalent of the kdb+ column store. It also provides a rich feature set:

http://code.jsoftware.com/wiki/JDB/Announcement

http://www.jsoftware.com/jdhelp/overview.html

https://scottlocklin.wordpress.com/2012/09/18/a-look-at-the-... comes very close to capturing the ingenuity of J language.

After going through the J primer, kdb+/Q feels like a distillation of J -- it basically took the most accessible ideas of J and packaged them for mainstream production use. Some of its features, such as these, make it easy to set up a computing harness for electronic trading in no time:

http://code.kx.com/wiki/Cookbook/LoadingFromLargeFiles

http://code.kx.com/wiki/Startingkdbplus/tick

Also, the footprint of the kdb+ runtime and installation is unparalleled. I've run systems 24x7x365.


I am aware of J, but it seems to me that J is comparable to K, not Q/kdb+. If you want a Python comparison, then K is Python, and Q/kdb+ is more like pandas -- except much, much quicker and more powerful, but without as many tools and packages to support it. That could be fixed: those tools and packages don't exist because the community isn't there, because it is (sadly) closed source.


J versus K is like C versus C++, similar syntax but different programming style. J is array-based and K is list-based. J is mostly just APL converted to ASCII representation. Q is mostly just some light syntactic sugar over K.


"they are sitting on one of the most impressive software achievements around and refusing to share it"

They tried. About a year ago they released the 32-bit version for free with a shockingly liberal license -- anything goes, including commercial use, unlimited. A couple of months ago they pulled it back and replaced it with a free non-commercial-use-only license. Well, they are the owners, so it is their call, but k will still remain "one of the most impressive software achievements around", orthogonal to most of the [fucked up] IT trends of the past 15 years. It is a shame if it fades into quantitative elitist oblivion.


This is the classic "innovator's dilemma". My 2 cents: Kx has a good business selling kdb+ licenses to the big banks and trading firms. They've been exploring how to expand beyond this niche, but many big-data/analytics startups go straight for the FOSS solutions. In addition to the cost/licensing advantages of FOSS, a large user base and ecosystem has developed around things like MongoDB, giving them compelling business advantages over kdb+ (even with kdb+'s performance advantages).

Although MongoDB is a demonstration of how to build a successful business around open source, it's very different to start out that way rather than move that way after you're already established as a closed/licensed product. Kx would have to essentially kill off their existing profit stream in the hope of building a larger business around a FOSS kdb+. That's a risky proposition, fraught with many pitfalls.

It is a shame they backed off of the fully open 32-bit version. At least that had some potential to spur user-base/ecosystem growth (at a minimum, it could have encouraged the development of novel clients/editors/REPLs/debuggers/charting/etc.), without threatening their core profit stream.


It's not obvious to me that an open-core business can sustain the necessary margins to be interesting as an engineering company rather than a glorified services business. There have been very few examples (a friend of mine argues it's just Red Hat).

You are correct though that their model over the years has been to extract a large amount, up front, from a small number of users. They (mostly FD) have failed to make the leap to users outside finance due to what I'd call cultural reasons. They also lack strong technical leadership imnsho (they're, at heart, not an engineering company).

I'd certainly take a swing at doing an open source version for them but not clear to me that they'd know how to play it.


If you got a free 32-bit version while the terms were "shockingly liberal", then maybe those are the terms that govern the use of that binary? I don't know for sure.

I do know in addition to changing the license they have made changes to the software since that time. The size of the binary has increased.

I worry more about Kx being acquired by some large company, maybe a competing database vendor, that cares little about software quality.


Yes, if you got the binary during that "liberal" period, you are free to use it under the "liberal" terms.


KDB with a 2GB addressable RAM limit is almost pointless. It was good that they were at least trying to encourage new developers, but you could never build anything serious with that limitation.

Ask the commenter one down from me, 'wsfull, he knows what I'm talking about!


q/kdb+ is _enormously_ fast (or it was when I used it), but that was a result of 1) having a specific subset of problems to which the binary was highly tuned, 2) Arthur Whitney, who in my mind is one of those 100x engineers and the key reason it was so good, as evidenced by 3) the fact that he's no longer with them, and you'll notice an appreciable decline in code quality. I'm sure if AW were still working on the code base, you'd be seeing a lot more AVX-512 SIMD usage and clever things like that. (Take IDA to your q binary; it's... acceptable, but nothing magical like AW's work. In fact, the lack of attention to detail is so significant that the new engineers allowed even a poor rev. eng. [1] like myself to use a very standard method to get a symbol table fully populated, confirming my suspicion that there's very little platform-specific code.)

RE: 36MM -- that makes sense. q/kdb+ is the definition of "low volume, high margins". You only have so many IBs in Midtown, wealth-management funds in Stamford, and a PIMCO here and there in Newport to shop your product to. (As opposed to, say, Oracle, where there's broad appeal and residual government contracts as a result of legacy PL/SQL code from 20 years ago that's kept in production.)

[1] E-mails in the profile if you want to hear how I got a symbol table in, but I'm sure you already guessed by now.


Have there been any new reports of kOS? It was supposed to be "impending" in 2014...

http://archive.vector.org.uk/art10501320


A lot of the decisions that some might quibble with are not actually Arthur's end of things. The former CEO drove a lot of it.



