
Shakti, the new data platform from Arthur Whitney - anonu
https://news.efinancialcareers.com/uk-en/3002752/shakti-arthur-whitney
======
kthielen
Morgan Stanley in recent years has used this where they’d previously used kdb:

[https://github.com/Morgan-Stanley/hobbes](https://github.com/Morgan-Stanley/hobbes)

From the article, it sounds like the Kx folks are working on the issues that
led to the development of hobbes in the first place.

I’d be surprised if they’ve changed their opinions about type systems though
(or bullying people with lawyers — those guys get nasty when their ideas are
challenged).

~~~
beagle3
Does hobbes have any kind of db or persistence functionality? As a programming
language, k/kdb is nice (and I personally prefer it to APL and J), but without
the db part I suspect that it wouldn't have been as popular (in the niches
that it is popular).

Seems like a mostly eager Haskell; am I correct? How does it compare
performance-wise to GHC? (It obviously does much better on embedding and
interfacing with C++.)

~~~
kthielen
Yes, there are persisted data structures, and databases built on top of them.
The whole type system applies to persisted data structures, so you can define
your own data structures and fill out a few type class instances to query them
with plain list comprehensions (e.g. btree indexes are defined and queried
this way in some internal tools).
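To give a flavor of what that looks like (in Python rather than hobbes, with made-up names like `SortedIndex`, so don't read this as the hobbes API): once a custom structure implements a small iteration protocol, plain comprehensions become the query language.

```python
import bisect

class SortedIndex:
    """Toy sorted index; a real system would back this with a persisted btree."""
    def __init__(self, keys):
        self._keys = sorted(keys)

    def __iter__(self):  # the only hook a plain comprehension needs
        return iter(self._keys)

    def range(self, lo, hi):  # efficient range scan via binary search
        i = bisect.bisect_left(self._keys, lo)
        j = bisect.bisect_right(self._keys, hi)
        return self._keys[i:j]

idx = SortedIndex([5, 1, 9, 3, 7])
above3 = [k + 1 for k in idx if k > 3]  # "query" with a plain comprehension
scan = idx.range(3, 7)                  # range scan without touching the rest
assert above3 == [6, 8, 10]
assert scan == [3, 5, 7]
```

The interesting part in hobbes is that the same comprehension syntax works over in-memory and persisted structures alike, because the type system doesn't distinguish them.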

Eager Haskell with a structural type system, deterministic garbage collection,
and no overloading penalty: that would be a fair summary.

It'd be interesting to see how GHC would do in the same area; there would be
some significant challenges. I haven't tried to use GHC there, so I can't say.

------
chrisaycock
An evaluation copy is available for installation via Anaconda:

[https://anaconda.org/shaktidb/shakti](https://anaconda.org/shaktidb/shakti)

The tutorial mentions _streaming_ and _distributed_ capabilities, but then
doesn't cover them:

[https://shakti.com/tutorial/](https://shakti.com/tutorial/)

Having used q/kdb+ a bunch at the beginning of my quantitative finance career,
I hold a special place for Arthur's languages. I look forward to seeing what
Shakti develops into.

~~~
kragen
How widely used is kdb+ today?

------
pinewurst
When this was first announced, they used the word “blockchain” a lot. It’s
funny that this is now deleted, as they’ve learned the word is credibility
poison.

------
osrec
Nitpick from the article: it suggests kdb underpins most algorithmic trading
systems. This is simply not true. I've worked in a number of banks and hedge
funds, and most have their own home-brew storage and analysis engines (I've
helped build a few myself).

------
chrispsn
A group of k enthusiasts working under the 'kparc' brand has been writing docs
for this new in-development k:

[https://kcc.kparc.io/](https://kcc.kparc.io/)
[https://ref.kparc.io/](https://ref.kparc.io/)

------
dang
The submitted URL was [https://shakti.com/database-software-history/](https://shakti.com/database-software-history/).

Their press page links to [https://news.efinancialcareers.com/uk-en/3002752/shakti-arthur-whitney](https://news.efinancialcareers.com/uk-en/3002752/shakti-arthur-whitney),
which reads like PR but arguably contains a bit more information (e.g. that it
was released last month), so I guess we'll link to that instead.

------
LittlePeter
It seems like Shakti is just version 7 of the k language, per the README at
[https://github.com/kevinlawler/kona](https://github.com/kevinlawler/kona)

I remember reading an interview with Arthur and he said that he writes each
version of k from scratch. Most likely this version has also been written from
scratch.

------
shrubble
It sounds like they are using AVX primitives to accelerate certain
operations, given that they talk about 256- and 512-bit-wide instructions.

~~~
3xblah
Maybe this is why shakti fails to start when I have tried it on 1.6 GHz
Intel Atom and Celeron CPUs. When I tried on Ubuntu with a Celeron I got
"Illegal instruction (core dumped)".

~~~
dantiberian
I doubt there would be a hard requirement on any AVX instructions, just that
it can take advantage of them if available.

~~~
tluyben2
The version I downloaded from the site does (did?) depend on AVX instructions
and would die on startup. There are other builds ('nightly builds') that fix
this. One of the goals of the current team seems to be portability.

~~~
scottlocklin
FWIW, J is moving in this direction as well; the last release version (807)
needed a special build for older Intel hardware (they subsequently made
separate binaries for it). AVX has been around since 2011/2012, so it doesn't
seem like that big a deal.

Admittedly, I myself have older hardware kicking around that I see no real
reason to upgrade, but AVX is such a big win on array processing that you
might as well make it a dependency for the kinds of use cases Shakti or J get
used for in business.

------
emanuensis
As a k email list reader I noted in

[https://news.ycombinator.com/item?id=21660005](https://news.ycombinator.com/item?id=21660005)

that shakti(k7) MAY become FOSS...

~~~
SifJar
Highly unlikely it'll go that route IMO; the messages in the Google group are
just wishful thinking.

------
burtonator
I'm super skeptical of these articles about programmers with superhuman
powers.

~~~
jshaqaw
I suggest reading the J interpreter Whitney wrote in a weekend and
reconsidering!

[https://code.jsoftware.com/wiki/Essays/Incunabulum](https://code.jsoftware.com/wiki/Essays/Incunabulum)

~~~
kuroguro
Writing in that style is not that hard; I do something similar if I need to
make a prototype. The short variable and function names really save time as
long as you remember what is what. Reading or debugging it even a week later
is torture, though.

Writing the interpreter in an afternoon is impressive, but most likely he had
done something similar before and had most of the algorithm in his head before
he even began.

You don't need to be superhuman to do that.

------
anon1m0us
I'd like to see some real metrics. All the words are subjective, and the
technical ones are common to other databases as well. How many results can it
return in how many seconds from a table with how many rows? Can it do full-
text indexing? Can it aggregate?

It feels like something from Wall Street; the language is almost pump-and-dump:
"I have this really great stock tip from this guy who made millions once, he's
a legend. He says buy, so you should."

We need more tangible and quantifiable information about this database. Until
then it's not even hype; it's vaporware.

~~~
kick
Calling anything from Arthur "vaporware" is really laughable.

This article isn't great, but it's from a finance site, not a technical site:
of course it's going to be low on technical details. You would think that
people who want technical details would know how to use a search engine.

Further, kdb/kdb+ (of which Shakti is the successor) is a columnar database.
"How many results can it return [...] from a table with how many rows?" is
completely irrelevant.

There are hundreds of benchmarks for kdb+, and the speed of k is unrivaled.

~~~
iskander
>There are hundreds of benchmarks for kdb+, and the speed of k is unrivaled.

Eh, I worked on array languages in grad school, and the speed of k was
significantly over-hyped. It's trivial to beat with Python, and no contest if
you want to work in a compiled language. Wall Street develops weird religions
around tech.

~~~
beagle3
In my experience it is mostly trivial to beat with C if you adopt columnar
storage (which isn’t idiomatic), but even numexpr doesn’t beat k (though it
seems like it should), and numpy is still way behind; pandas and plain python
not even in the same ballpark.

What approach did you use?

The one thing nothing beats k on is functionality per time spent (if you are
proficient). The code is often 100 times shorter, and even if each line takes
10 times as long to write, you still come out well ahead.
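For anyone who hasn't run into the row-vs-columnar distinction, a minimal plain-Python sketch (illustrative only; the point is the layout, not the timings):

```python
# Row-oriented storage: a list of records; a scan touches every field of
# every record, so cache lines fill up with data you don't need.
rows = [{"sym": "A", "px": 10.0}, {"sym": "B", "px": 20.0}, {"sym": "A", "px": 30.0}]
row_sum = sum(r["px"] for r in rows if r["sym"] == "A")

# Columnar storage: each field is its own contiguous array, as in kdb+
# (or a columnar C design); a scan streams through exactly the columns it uses.
cols = {"sym": ["A", "B", "A"], "px": [10.0, 20.0, 30.0]}
col_sum = sum(p for s, p in zip(cols["sym"], cols["px"]) if s == "A")

assert row_sum == col_sum == 40.0
```

In C the same columnar layout is what makes the hand-written version fast, but as noted above it isn't the idiomatic way most C code holds records.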

~~~
iskander
I'm going to sidestep arguments about its terseness since some people are into
that and it's really an aesthetic choice. But for machine learning or
numerical algorithms, I found it quite hard to make k nearly as fast as
vanilla NumPy code. Overall there's really no good reason for it to be faster.
It didn't have (as of ~2011) a JIT or any mechanism for operator fusion, so
you end up allocating many intermediates, sometimes using quadratic space
which gets immediately reduced along some axis.
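A toy numpy sketch of that intermediate-allocation problem (my own example, not from k):

```python
import numpy as np

a = np.arange(1, 4, dtype=float)   # [1, 2, 3]
b = np.arange(4, 6, dtype=float)   # [4, 5]

# Naive: materializes the full len(a) x len(b) outer product,
# then immediately reduces it away.
naive = (a[:, None] * b[None, :]).sum()

# The same contraction expressed so no quadratic temporary is needed:
fused = a.sum() * b.sum()              # algebraically identical
also_fused = np.einsum('i,j->', a, b)  # einsum can contract without the big temp

assert naive == fused == also_fused == 54.0
```

Without a JIT or operator fusion, the interpreter has no way to turn the first form into the second, so you pay for the allocation every time.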

Yeah, I know "Arthur Whitney is really smart"...but that's not actually a
technical reason why k would be faster. I don't remember the specifics but I
think that many of the array operators weren't even multi-threaded and I
remember only a few of the array operators getting FLOPs that would correspond
to SIMD acceleration. So, for the most part, it seemed like a lot of vanilla
single threaded C code behind the operators.

It's possible that the implementation got smarter over the past decade, but
when I was working in the space it seemed like the hagiography and actual
performance were worlds apart.

~~~
beagle3
I didn't try modern machine learning stuff, so I cannot comment on that.

It doesn't do operator fusion or a JIT, but neither does numpy ... numexpr,
numba, and pythran do, but you were referring to "plain python" in your
original post.

Indeed, K is mostly vanilla single-threaded code. When I started using K, I
was also thinking "neat, simplified APL, but it won't be fast" (I had
previous positive experience with APL). But it was way faster than plain
single-threaded code should be. The reasons, AFAICT at the time, were:

1. Access patterns are extremely predictable, which means much of the data is
in L1 much of the time; that, on its own, delivers a factor of ~10 speedup
compared to having the same data scattered in L2 if you are memory bound.

2. The code is incredibly small and, in 2003 when I looked at it, mostly fit
inside the I-cache, which meant even the branch prediction available at the
time was doing way better than one would expect.

3. The primitives are very well chosen and play well together.

The hidden elephant in the china shop is that your code has to be idiomatic
for these to matter. See e.g. [0] from Stevan Apter: aggregating functions
in a toy database, e.g. group[avg], where the version he uses is 4x faster
than the one in the comment.
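A plain-Python sketch of the same idiomatic-vs-naive contrast (not Apter's k code; the naive version rescans the data once per distinct key, while the idiomatic one groups everything in a single pass):

```python
from collections import defaultdict

keys = ["a", "b", "a", "b", "a"]
vals = [1.0, 2.0, 3.0, 4.0, 5.0]

# Naive: one full pass over the whole data set per distinct key.
naive = {k: sum(v for kk, v in zip(keys, vals) if kk == k) /
            sum(1 for kk in keys if kk == k)
         for k in set(keys)}

# Idiomatic: group once in a single pass, then average each bucket.
buckets = defaultdict(list)
for k, v in zip(keys, vals):
    buckets[k].append(v)
grouped = {k: sum(vs) / len(vs) for k, vs in buckets.items()}

assert naive == grouped == {"a": 3.0, "b": 3.0}
```

Same answer either way; the difference is how many times you walk the data, which is exactly the kind of thing idiomatic k gets right for free.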

I have no doubt it is possible to beat k with numpy; it might even be easy if
you are more proficient with the latter than with the former. However, for the
kind of exploratory data work I did in the day, it is very hard to beat
overall when you take thinking, debugging, and experimenting into account. And
if/when you have converged on a well-enough-defined computational pipeline,
the best tool for that specific job (C++, CUDA, TF, MKL) is often the only
reasonable choice.

> when I was working in the space it seemed like the hagiography and actual
> performance were worlds apart.

I don't know who you were talking to, but for me it was always an optimization
of the "effort/result" metric, not of either on its own. Python often requires
less effort, but iteration runtime speed (even with numpy) is horrible. C+CUDA
delivers faster results, but iteration development speed is horrible. For me,
K struck a sweet spot. YMMV.

[0] [http://www.nsl.com/k/t.k](http://www.nsl.com/k/t.k)

------
smabie
q is an interesting language, and I am using it for Advent of Code. While the
language superficially seems simple, it features a lot of hidden complexity
and dubious design decisions. Every operator is massively overloaded on rank
and type. There are functions that accept functions or lists or atoms as
arguments: supposedly there is some underlying mathematical commonality, but
coming from the statically typed FP world, this kind of polymorphism makes
me extremely uncomfortable.
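To show the kind of rank polymorphism I mean, here is a rough Python model of how k-style “atomic” primitives behave (a toy sketch, not q’s actual implementation): one operator silently covers atoms, vectors, and nested lists.

```python
def atomic(f):
    """Lift a scalar binary function so it recurses into lists of any depth,
    the way k's arithmetic primitives do."""
    def g(x, y):
        if isinstance(x, list) and isinstance(y, list):
            return [g(a, b) for a, b in zip(x, y)]   # pairwise
        if isinstance(x, list):
            return [g(a, y) for a in x]              # list op atom
        if isinstance(y, list):
            return [g(x, b) for b in y]              # atom op list
        return f(x, y)                               # atom op atom
    return g

add = atomic(lambda a, b: a + b)
assert add(1, 2) == 3                       # atom + atom
assert add([1, 2, 3], 10) == [11, 12, 13]   # vector + atom
assert add([1, 2], [10, 20]) == [11, 22]    # vector + vector, pairwise
```

Convenient once you internalize it, but it is exactly the behavior a static type system would force you to spell out.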

The dynamic typing bugs me as well; it would really help to have an editor be
able to tell ahead of time whether an expression is valid.

Speaking of valid code, the error messages are absolutely terrible and stink
of arrogance. I’m talking about ed-level error messages: no words, no pointing
out the error, just a one-letter message that is supposed to indicate a
type or function argument error.

A lot of the culture of the language is built in Arthur Whitney’s own image:
error messages are bad, comments are for stupid people, whitespace is for
“Qbies”.

Speaking of culture, the jargon of the language is pedantic and annoying. All
over the documentation are references to monadic or niladic or dyadic
functions. Also, calling things verbs or adverbs or nouns is annoying and
unhelpful. Even in the book “Q for Mortals” the author prides himself on
using different terminology from literally anyone else, and strikes me as
trying to be deliberately obtuse. He even spends an entire chapter talking
about the mathematical nature of the natural numbers, which strikes me as
both unhelpful and narcissistic.

Going back to the language, not only are functions overloaded to a ridiculous
level, but the syntax of the language itself is inconsistent. There are
multiple ways to call a function, all in the name of reducing unnecessary
typing. Also, the syntax and names of some functions strike me as very odd:
‘/’ is not used for division, but instead for both comments and folds. ‘,’ is
used for appending lists, while ‘;’ is used to separate function parameters.
Instead of parens, q uses ‘[]’ for function parameters. Here’s an example of
finding the square of the hypotenuse in idiomatic q:

pyth_sq:{(x xexp 2)+y xexp 2}

One can use x, y, and z without explicitly declaring them as parameters, which
is both nice and arbitrary. I would much prefer something like Scala’s
underscore, or maybe something like _1, _2, ... Here’s the function written
with explicit parameters:

pyth_sq:{[x;y](x xexp 2)+y xexp 2}

Personally I think a syntax (and spacing) like:

pyth_sq = { (x, y) => x^2 + y^2 }

would be nicer.

While the naming of the functions and the syntax are arbitrary, I am beginning
to understand the rationale and design of the language. For Whitney, code is
the root of all evil: less code, fewer bugs. Also, the less code, the more you
can fit in your head. This is why everything is overloaded, I think: Whitney
ran out of ASCII symbols and didn’t want to use names (more information to
remember).

Regardless, the real reason people use q isn’t because of the language; it’s
often despite it. The kdb+ database and q sharing the same memory space is
the real advantage. The performance of the system is so exceptional that a
single server can outperform even a large Hadoop cluster: the code is
out-of-this-world optimized.

I’ve begun to think that q is a natural and easier-to-use extension of Chuck
Moore and Forth’s philosophy and language. The languages share the same
tenets of performance, terseness, and minimal syntax. While Forth works on a
fixed number of stacks, q works on an arbitrary number of n-dimensional
vectors, a natural extension of the one-dimensional stack.

Anyways, sorry for the rant. I’m still learning q, so take what I say with a
grain of salt. I can imagine an expert q developer becoming extremely
productive and comfortable; I’m personally just not there yet. When I finish
Advent of Code, I’ll have a better perspective.

~~~
yiyus
> Speaking of culture, the jargon of the language is pedantic and annoying.
> All over the documentation are references to monadic or niladic or dyadic
> functions. Also, calling things verbs or adverbs or nouns is annoying and
> unhelpful.

That is a valid opinion, but this is not a k thing. The same jargon is used by
APL, J, and practically every other array programming language. In fact, the
term monad was first used in this context [1].

[1]
[https://en.wikipedia.org/wiki/Monad_(functional_programming)#History](https://en.wikipedia.org/wiki/Monad_\(functional_programming\)#History)

------
__s
Ten trillion rows, zero information, nice

