Shakti, the new data platform from Arthur Whitney (efinancialcareers.com)
122 points by anonu 13 days ago | 60 comments





Morgan Stanley in recent years has used this where they’d previously used kdb:

https://github.com/Morgan-Stanley/hobbes

From the article, it sounds like the Kx folks are working on the issues that led to the development of hobbes in the first place.

I’d be surprised if they’ve changed their opinions about type systems though (or bullying people with lawyers — those guys get nasty when their ideas are challenged).


Does hobbes have any kind of db or persistence functionality? As a programming language, k/kdb is nice (and I personally prefer it to APL and J), but without the db part I suspect that it wouldn't have been as popular (in the niches that it is popular).

Seems like a mostly eager Haskell; am I correct? How does it compare performance-wise to GHC? (It obviously does much better on embedding and interfacing with C++.)


Yes there are persisted data structures and databases built on top of them. The whole type system applies to persisted data structures so you can make your own data structures and fill out a few type class instances to query them with plain list comprehensions (e.g. btree indexes are defined and queried this way in some internal tools).

Eager Haskell with a structural type system, deterministic garbage collection, and no overloading penalty: that would be a fair summary.

It'd be interesting to see how GHC would do in the same area; there would be some significant challenges. I haven't tried to use GHC there, so I can't say.


Interesting. So they're taking the "app engine" concept that Kdb provides and promoting that bit. Anyone have experience using Hobbes in the wild?

Scrolling through the thing: I'm pretty sure not many (possibly not any) people are going to have experience using this thing in the wild.

Except at Morgan Stanley (where the same could be said for kdb 20 years ago).

Well, do you know anyone at Morgan who is using it? I must say that I do not.

Five of his last eight posts contain links to it. So I surmise he is the primary author of it.

Yes I do.

Great! Glancing at it, I suspected it wasn't actually used; it looked too "Buck Rogers," but maybe I am old and set in my ways.

Do you have a pointer to setting up something like a ticker plant, as in the old K3 manuals? Maybe over here? https://hobbes.readthedocs.io/en/latest/logging/hog.html#on-...


Something about the language itself, the presentation and documentation, or the dependency environment?

BTW, it would be a treat to see an open sourcing of kerf at some point! I am an admirer.


I might be misinterpreting your message, but if not - Shakti is not made by Kx, which AW has been bought out of. Shakti is a new company (although AFAIK everyone there is ex-Kx)

They use hobbes mostly for high-speed trade capture and limited querying.

KDB is still used extensively for less structured and less performance-critical use cases.


An evaluation copy is available for installation via Anaconda:

https://anaconda.org/shaktidb/shakti

The tutorial mentions streaming and distributed capabilities, but then doesn't cover them:

https://shakti.com/tutorial/

Having used q/kdb+ a bunch in the beginning of my quantitative finance career, I hold a special place for Arthur's languages. I look forward to what Shakti develops into.


How widely used is kdb+ today?

Nitpick from the article: it suggests kdb underpins most algorithmic trading systems. This is simply not true. I've worked in a number of banks and hedge funds, and most have their own home-brew storage and analysis engines (I've helped build a few myself).

When this was first announced, they used the word “blockchain” a lot. It’s funny that this is now deleted, as they’ve learned the word is credibility poison.

A group of k enthusiasts working under the 'kparc' brand has been writing docs for this new in-development k:

https://kcc.kparc.io/ https://ref.kparc.io/


The submitted URL was https://shakti.com/database-software-history/.

Their press page links to https://news.efinancialcareers.com/uk-en/3002752/shakti-arth..., which reads like PR but arguably contains a bit more information (e.g. that it was released last month), so I guess we'll link to that instead.


It seems like Shakti is just the k language, version 7, as per the README at https://github.com/kevinlawler/kona

I remember reading an interview with Arthur and he said that he writes each version of k from scratch. Most likely this version has also been written from scratch.


It sounds like they are using AVX primitives to accelerate certain operations, given that they talk about 256- and 512-bit-wide instructions.

Maybe this is why shakti fails to start when I try it on 1.6 GHz Intel Atom and Celeron CPUs. When I tried on Ubuntu with a Celeron I got "Illegal instruction (core dumped)".

Last I was told, removing the AVX hard requirement is on the roadmap.

This Intel SDE emulator took care of the issues I ran into on an older laptop running a shakti release from around September:

https://software.intel.com/en-us/articles/intel-software-dev...


The first thing I would suggest is 'ldd binary', where binary is the name of the file, to make sure all shared libraries are found.

https://anaconda.org/shaktidb/shakti/2019.09.20/download/lin...

The k binary in this tarball is statically-linked.


I doubt there would be a hard requirement on any AVX instructions, just that it can take advantage of them if available.

The version I downloaded from the site does (did?) depend on AVX instructions and would die on startup. There are other ('nightly builds') that fix this. One of the goals of the current team seems to be portability.

FWIW, J is moving in this direction as well; the last release version (807) needed a special build for older Intel hardware (they subsequently made separate binaries for it). AVX has been around since 2011/2012, so it doesn't seem like that big a deal.

Admittedly I myself have older hardware kicking around that I see no real reason to upgrade, but AVX is such a big win for array processing that you might as well make it a dependency for the kinds of use cases Shakti or J get used for in business.


A hard requirement on AVX instructions is exactly the sort of thing I would expect out of Arthur Whitney. His software tends to be very opinionated, and that's one of the things that lets him do so much in so little code.

At present, there is a hard requirement, but from what I've heard, it is to be lifted in the future.

As a k mailing list reader, I noted in

https://news.ycombinator.com/item?id=21660005

that shakti (k7) MAY become FOSS...


Highly unlikely it'll go that route IMO; the messages in the Google group are just wishful thinking.

I'd like to see some real metrics. All the words are subjective and the technical ones are common to other databases as well. How many results can it return in how many seconds from a table with how many rows? Can it do full text indexing? Can it aggregate?

It feels like something from Wall Street; the language is almost pump-and-dump: "I have this really great stock tip from this guy who made millions once, he's a legend. He says buy, so you should."

We need more tangible and quantifiable information about this database. Until then it's not even hype, it's vaporware.


Calling anything from Arthur "vaporware" is really laughable.

This article isn't great, but it's from a finance site, not a technical site: of course it's going to be low on technical details. You would think that people who want technical details would know how to use a search engine.

Further, kdb/kdb+ (which Shakti is the successor of) is a column-oriented database. "How many results can it return [...] from a table with how many rows?" is completely irrelevant.

There are hundreds of benchmarks for kdb+, and the speed of k is unrivaled.


>There are hundreds of benchmarks for kdb+, and the speed of k is unrivaled.

Eh, I worked on array languages in grad school and the speed of k was significantly over-hyped. It's trivial to beat with Python, and no contest if you want to work in a compiled language. Wall Street develops weird religions around tech.


In my experience it is mostly trivial to beat with C if you adopt columnar storage (which isn't idiomatic), but even numexpr doesn't beat k (though it seems like it should), and numpy is still way behind; pandas and plain Python are not even in the same ballpark.

What approach did you use?

The one thing nothing beats k on is functionality per time spent (if you are proficient). It is often 100 times shorter but only takes 10 times as long to write.


I'm going to sidestep arguments about its terseness since some people are into that and it's really an aesthetic choice. But for machine learning or numerical algorithms, I found it quite hard to make k nearly as fast as vanilla NumPy code. Overall there's really no good reason for it to be faster. It didn't have (as of ~2011) a JIT or any mechanism for operator fusion, so you end up allocating many intermediates, sometimes using quadratic space which gets immediately reduced along some axis.

Yeah, I know "Arthur Whitney is really smart"...but that's not actually a technical reason why k would be faster. I don't remember the specifics but I think that many of the array operators weren't even multi-threaded and I remember only a few of the array operators getting FLOPs that would correspond to SIMD acceleration. So, for the most part, it seemed like a lot of vanilla single threaded C code behind the operators.

It's possible that the implementation got smarter over the past decade, but when I was working in the space it seemed like the hagiography and actual performance were worlds apart.


I didn't try modern machine learning stuff, so I cannot comment on that.

It doesn't do operator fusion or JIT, but neither does numpy ... numexpr, numba and pythran do, but you were referring to "plain python" in your original post.

Indeed, K is mostly vanilla single-threaded code. When I started using K, I was also thinking "neat, simplified APL, but it won't be fast" (I had previous positive experience with APL). But it was way faster than plain single-threaded code should be. The reasons, AFAICT at the time, were:

1. Access patterns are extremely predictable, which means much of the data is in L1 much of the time; that, on its own, delivers a factor of ~10 speedup compared to having the same data scattered in L2 if you are memory bound.

2. The code is incredibly small and, in 2003 when I looked at it, mostly fit inside the I-cache, which meant even the branch prediction available at the time did way better than one would expect.

3. The primitives are very well chosen and play well together.

The hidden elephant in the china shop is that your code has to be idiomatic for these to matter. See e.g. [0] from Stevan Apter, on aggregating functions in a toy database, e.g. group[avg]: the version he uses vs. the 4x slower one in the comment.
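Roughly the same contrast in q terms (my own sketch, not Apter's k code from [0]; the toy table is made up): the idiomatic form groups once and aggregates each bucket, while the naive form rescans the whole table for every group:

q)n:1000000; t:([]sym:n?`a`b`c;px:n?100.0)                / toy table
q)select avg px by sym from t                             / idiomatic: one grouped pass over the column
q){exec avg px from t where sym=x} each distinct t`sym    / naive: one full scan per distinct sym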

I have no doubt it is possible to beat k with numpy; it might even be easy if you are more proficient with the latter than with the former. However, for the kind of exploratory data work I did in the day, it is very hard to beat overall when you take thinking, debugging and experimenting into account. And if/when you have converged on a well-enough-defined computational pipeline, the best tool for that specific job (C++, CUDA, TF, MKL) is often the only reasonable choice.

> when I was working in the space it seemed like the hagiography and actual performance were worlds apart.

I don't know who you were talking to, but for me it was always an optimization on the "effort/result" metric, not on either on its own. Python often requires less effort, but iteration runtime speed (even with numpy) is horrible. C+CUDA delivers faster results but the iteration development speed is horrible. For me K struck a sweet spot. YMMV

[0] http://www.nsl.com/k/t.k


> But for machine learning or numerical algorithms, I found it quite hard to make k nearly as fast as vanilla NumPy code

Observe that KX’s approach to ML is... to just wrap Python https://code.kx.com/v2/ml/


At the least, 'range' (!3 -> 0 1 2) is now deferred (including with offsets, e.g. 1+!3).

Here are some actual benchmarks comparing Python (with pandas/numpy) vs k (aka q/kdb+). It seems k is orders of magnitude more performant, without applying any no-brainer optimizations/indexing:

https://www.linkedin.com/pulse/data-analysis-example-python-...

https://www.linkedin.com/pulse/lists-python-q-side-by-side-f...
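Not the queries from those articles, but to give readers who haven't seen q an idea of the flavour: such comparisons are typically over q-sql aggregations like the following, which run as single columnar passes (the table and column names here are invented):

q)n:5000000
q)trade:([]sym:n?`AAPL`MSFT`IBM;price:n?100.0;size:n?1000)
q)select vwap:size wavg price,total:sum size by sym from trade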


You must have been writing incredibly terrible k, or using a bootleg interpreter.

> Calling anything from Arthur "vaporware" is really laughable.

kOS?


It was a research project, and it actually booted on bare metal and ran what it was supposed to (a milspec bare-bones OS with GUI, filesystem, DB, etc. in a few hundred KB).

Do you have a link? This sounds fascinating.


kOS exists; at least one HN user has used it ('geocar). As a research operating system, a public release was never a likely goal, and as far as I know no one claimed it would be.

Shakti themselves used rows as the comparison metric in the original link: https://shakti.com/database-software-history/

Again, saying the speed of k is unrivaled is hype speak. You're adding negative value to the discussion.


To give you an idea, here are some benchmarks comparing Python (with pandas/numpy) vs k (aka q/kdb+).

https://www.linkedin.com/pulse/data-analysis-example-python-...

https://www.linkedin.com/pulse/lists-python-q-side-by-side-f...

I think that software history page is meant to convey general workloads achievable, rather than performance.

-(not the author of the benchmarks nor an employee of shakti FWIW)


It really depends on what specific benchmarks you're talking about, but I would be really interested in seeing benchmarks in which k beats C with AVX

I'm super skeptical of these articles about programmers with superhuman powers.

"Superhuman" hyperbole aside, some people are demonstrably orders of magnitude more productive than others, and that's true for almost any field of work (depending on your metric, of course - but that's true for some of the metrics that actually matter to most people), and programming is no different.

Arthur Whitney, Fabrice Bellard, Linus Torvalds, Mike Pall are well known examples.

It obviously requires a lot of skill, and I agree with [0] that it also requires the skill to be selective: some problems require a lot of work for which there are no "better ways". E.g. Mike Pall, alone and in a short time, produced a JIT that runs way faster than V8 (which probably already had 100x more man-months [1] of really, really smart people at that point). This amazing feat is a result of Pall's amazing talent, but it was only possible (IIRC, he said so himself) because Lua, although it is essentially as dynamic as JavaScript, made a few choices that make it much more amenable to effective JITting.

[0] https://yosefk.com/blog/10x-more-selective.html

[1] I use it properly here as a unit of effort/cost, not as a unit of productivity.


I suggest reading the J interpreter Whitney wrote in a weekend and reconsidering!

https://code.jsoftware.com/wiki/Essays/Incunabulum


Writing in that style is not that hard, I do something similar if I need to make a prototype of something. The short variable and function names really save time as long as you remember what is what. Reading or debugging it even a week later is torture tho.

Writing the interpreter in an afternoon is impressive but most likely he had done something similar before and had most of the algorithm in his head before he even began.

You don't need to be superhuman to do that.


Maybe they're not superhuman?

Maybe they applied their time, practice and attention to developing skills that have yielded very big returns?


q is an interesting language and I am using it for Advent of Code. While the language superficially seems simple, it features a lot of hidden complexity and dubious design decisions. Every operator is massively overloaded on rank and type. There are functions that accept functions or lists or atoms as arguments: supposedly there's mathematical commonality, but coming from the statically typed FP world, this kind of polymorphism makes me extremely uncomfortable.
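To make the overloading concrete, here is a small illustrative q session (my own examples, not from the post); the same glyph does quite different things depending on the type and shape of its arguments:

q)1 2 3,4 5        / dyadic , joins lists
1 2 3 4 5
q)(+/)1 2 3        / the same / glyph, used as an iterator, folds + over a list
6
q)2 3#til 6        / # takes from a list, or reshapes when the left argument is a list
0 1 2
3 4 5
q)"F"$"3.14"       / $ casts here, but elsewhere it pads strings, makes symbols, etc.
3.14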

The dynamic typing bugs me as well; it would really help to have an editor that can tell ahead of time whether an expression is valid.

Speaking of valid code, the error messages are absolutely terrible and stink of arrogance. I'm talking about ed-level error messages: no words, no pointing out the error, just some one-letter message that is supposed to indicate a type or function argument error.

A lot of the culture of the language is built in Arthur Whitney's own image: error messages are bad, comments are for stupid people, whitespace is for "Qbies".

Speaking of culture, the jargon of the language is didactic and annoying. All over the documentation there are references to monadic or niladic or dyadic functions. Also, calling things verbs or adverbs or nouns is annoying and unhelpful. Even in the book "Q for Mortals" the author insists on using different terminology than literally anyone else and strikes me as trying to be deliberately obtuse. He even spends an entire chapter talking about the mathematical nature of natural numbers, which strikes me as both unhelpful and narcissistic.

Going back to the language, not only are functions overloaded to a ridiculous level, but the syntax of the language itself is inconsistent. Multiple ways to call a function, all in the name of reducing unnecessary typing. Also the syntax and names of some functions strike me as very odd: '/' is not used for division, but instead for both comments and folds. ',' is used for appending lists, while ';' is used to separate function parameters. Instead of parens, q uses '[]' for function parameters. Here's an example of finding the square of the hypotenuse in idiomatic q:

pyth_sq:{x^2+y^2}

One can use x, y and z without explicitly declaring them as parameters, which is both nice and arbitrary. I would much prefer something like Scala's underscore, or maybe something like _1, _2, ... Here's the function written with explicit parameters:

pyth_sq:{[x;y]x^2+y^2}

Personally I think a syntax (and spacing) like:

pyth_sq = { (x, y) => x^2 + y^2 }

would be nicer.

While the naming of the functions and the syntax are arbitrary, I am beginning to understand the rationale and design of the language. For Whitney, code is the root of all evil: less code, fewer bugs. Also, the less code, the more you can fit in your head. This is the reason why everything is overloaded, I think: Whitney ran out of ASCII symbols and didn't want to use names (more information to remember).

Regardless, the real reason people use q isn't the language; it's often despite it. The kdb+ database and q sharing the same memory space is the real advantage. The performance of the system is so exceptional that a single server can outperform even a large Hadoop cluster: the code is out-of-this-world optimized.

I’ve began to think that q is a natural and easier to use extension of Chuck Moore and Forth’s philosophy and language. The languages both share the same tenets of performance, terseness, and minimal syntax. While forth works on a fixed number of stacks, q works on a arbitrary number of n-dimensional vectors, a natural extension to the one dimensional stack.

Anyways, sorry for the rant. I’m still learning q so take what I say with a grain of salt. I can imagine an expert q developer becoming extremely productive and comfortable: I’m personally just not there yet. When I finish advent of code, I’ll have a better perspective.


> Speaking of culture, the jargon of the language is didactic and annoying. All over the documentation there are references to monadic or niladic or dyadic functions. Also, calling things verbs or adverbs or nouns is annoying and unhelpful.

That is a valid opinion, but this is not a k thing. The same jargon is used by APL, J and practically any other array programming language. In fact, the term monad was first used in this context [1].

[1] https://en.wikipedia.org/wiki/Monad_(functional_programming)...


> Speaking of culture, the jargon of the language is didactic and annoying. All over the documentation there are references to monadic or niladic or dyadic functions. Also, calling things verbs or adverbs or nouns is annoying and unhelpful.

For what it's worth, Kx appears to agree and has been updating their documentation to use more "familiar" terminology for these things; monadic/niladic/dyadic and noun/verb/adverb are all going away. There'll still be a lot of older stuff scattered around the web with the old terms, but an effort is being made.


Interesting rant.

Your function pyth_sq:{x^2+y^2} should be pyth_sq:{(x xexp 2)+y xexp 2}.
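As a quick check of the corrected version (xexp is floating-point power, so the result comes back as a float):

q)pyth_sq:{(x xexp 2)+y xexp 2}
q)pyth_sq[3;4]
25f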

While you're right about all the idiosyncrasies with the overloading and error messages, they tend to melt away once you get the hang of it.

The language is only partly the cool thing. The real power of Kdb is, IMHO, the ability to quickly get a distributed "app engine" on its feet: an app engine with hot code swapping and zero downtime.
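A minimal sketch of what the hot code swapping looks like (standard kdb+ IPC; the port and the function f are arbitrary choices of mine): start a listening process with 'q -p 5001', then from another q session push a new definition into it while it keeps serving:

q)h:hopen `::5001        / connect to the running q process
q)h"f:{x*2}";            / redefine f on the server, no restart needed
q)h"f 21"                / call the new definition remotely
42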


Ten trillion rows, zero information, nice


