Hacker News new | past | comments | ask | show | jobs | submit login
Wild claims about K performance (mlochbaum.github.io)
119 points by tosh on Aug 31, 2021 | hide | past | favorite | 70 comments

The notes by mlochbaum were also discussed briefly in the most recent ArrayCast episode on k


Yeah the fact that they forbid benchmarks pretty much tells you all you need to know. Anyway once you get to something that autovectorises array operations there's not really anywhere to go is there?

I seriously doubt it is really any faster than Matlab, Fortran, Eigen, etc. etc.

According to DeWitt himself, the DeWitt clause came about because he figured out that Oracle had no query optimizer.

However, the response of a lot of database vendors was to then include precomputed results for entire benchmarks, and also to have code that tried to figure out if you were running one of them. This obviously adds no value to the user, but is great for marketing.

Arthur is a non-nonsense guy, and, according to him, there's a ton of hand-tuned assembly language code in the intepreter. To make the most of it, however, you need to know how to set up the machine properly. Most people don't, and First Derivatives and Shakti don't have enough marketing people or consultants to set up machines for test runs or to explain away bad results.

In any event, k/kdb/q are wickedly difficult languages to program, but compared to modern C++ they are far faster to program and more flexible, and compared to pandas, they are faster and more likely to not have some massive performance hit due to a scalar function. I'm sure that they miss some optimizations that a good C++ compiler targeting a processor with some of the newer instructions can do, but they allow you to write fairly performant code that operates on vectors relatively easily.

> Yeah the fact that they forbid benchmarks pretty much tells you all you need to know.

I don't think this is the case. Oracle and MS SQL have the same clause and it's pretty hard to believe they aren't competitive with postgres and mysql given the resources behind the former.

More so it's just annoying of them to have this restriction.

> Oracle and MS SQL have the same clause and it's pretty hard to believe they aren't competitive with postgres and mysql given the resources behind the former.

I'm not trying to refute you here, just add some nuance.

I've used Oracle extensively for 3 years: installation, configuration, tuning, writing code against, querying for reporting, updating customer configuration etc.

There are things you can do in Oracle that can't be done in Postgres yet.

But I've never ever experienced a practical performance problem on Postgres that would magically disappear with Oracle.

I assume they are there (I'm not into running huge DBs with thousands of subscribers).

Also in my opinion it all comes down to knowing your database:

In 2009 I saw a Sybase Adaptive Server Anywhere -> MSSQL at the time migration fail miserably at the first migration attempt. Everything was ported correctly but it turns out Sybase ASA had "implicit indexes" that we needed to replicate.

In most cases I'd choose Postgres (or even SQLite) myself and in the rest I wouldn't immediately choose Oracle or MSSQL but rather stop and think carefully, do some tests and only then decide, I do not take it for granted that commercial databases perform better and they absolutely are more painful to manage.

> I do not take it for granted that commercial databases perform better and they absolutely are more painful to manage.

Oh for sure. It took me forever to figure out how to get Oracle running in a docker image as a new Oracle user. I'm not sure if I ever even figured out how to create a table. Postgres was hard to learn initially coming from the simplicity of MySQL and Oracle feels another step harder.

I didn't say I expect commercial software to perform better though, just that I expect them to be competitive.

I also wouldn't choose a commercial db personally but dunking on them doesn't make sense either. The engineers behind them are just as good as those behind OSS.


This is extremely easy to believe.

No one who has done serious work across the database platforms believes they restrict benchmarks because they're behind or uncompetitive. They both make very capable, competitive database systems.

The problem, as cited in the linked page, is that "It takes much more work to refute bad benchmarks than to produce them". We see on HN with regularity where people post egregiously flawed benchmarks, usually to demonstrate some preconceived notion or other. And while that can usually be ascribed to ignorance or sloppiness, it's trivial for vendors to contrive benchmarks that are purpose-suited to make their own product look good and the competition look bad, however completely artificial and unrealistic the scenario is.

DeWitt clauses are an abomination. They should not exist. But I get why they exist and are threatened, even if they are basically never actually enforced (seriously though -- are they ever actually enforced?).

I think even flawed benchmarks are good benchmarks.

Because real world code isn't optimized to the max. Because real world code makes bad or even wrong assumptions. Because there are usually much more requirements to real world code than "be as fast as you can".

Additionally, those flawed benchmarks make you think about a lot of interesting details. Details you usually don't think about when producing real world code.

Also it's a little bit alike as with statistics. "Don't trust any statistics that you haven't fabricated yourself". Still statistics are useful in general.

There's flawed benchmarks and there are completely wrong benchmarks.

I've seen so many benchmarks where the timing or loop overhead dominated to the point that they weren't measuring anything but the cost of taking timing values

I've seen benchmarks where they run a single time without warming up the disk cache, so whichever benchmark they happen to run first shows as 2 orders of magnitude slower.

I've seen benchmarks where the standard deviation was much larger than the difference between the times reported, but they didn't bother to check for this.

I've seen benchmarks where they don't even use the same algorithm, and the difference in the algorithm chosen dominates any other concerns.

Exactly! And even those very wrong results give quite interesting insides.

Such stuff is worthless as benchmarks, sure. But all the other things you can learn form such stuff is sometimes even quite fascinating.

So discussing completely wrong benchmarks has imho often some value.

It's entirely possible that the net utility of such benchmarks being published is negative; enough people come away with the initial wrong conclusion without critically investigating it.

It's almost certainly true that the net value to the company whose product looks bad in the benchmarks is negative, which is why these clauses exist.

Of course you can learn from these. But do they describe anything related to the product at hand?

If the benchmarks were the kind of bad that you allude to in your second statement it would probably be pretty reasonable for the reasons you state, and overall you would probably end up with a distribution of benchmarks that approximated overall performance. What about intentionally crafted, nigh pathological examples? With the kind of flaws that don't crop up in real world code? I don't think those have value.

Actually, I think I'm trying to distinguish between flawed benchmarks like you are talking about and actively malignant benchmarks.

Nah, flawed data is pretty much unusable. Like, if I were to benchmark some really short java program by restarting the JVM each time, do I get to say that it is slow? No, because I measured some utterly stupid metric.

I was on a team that migrated an app from SQL Server to Postgres a some years ago, and the performance dropped by around 30%.

Of course, due to the benchmark rules, I'm not allowed to post about it... ;)

That being said, in our application it came down to three main issues:

- Postgres doesn't allow optimizer hints, and the optimizer generally produced worse plans than Microsoft's (e.g. CTE were an optimization fence in Postgres).

- Postgres at the time did not support parallel queries.

- Postgres connection handling is terrible. Pgbouncer helps but introduces its own problems.

- Database bloat / vacuum issues

Generally I found SQL Server would perform very well with no tuning and minimal maintenance. And generally, if it doesn't do the right thing, a query hint will fix it. Postgres required a lot more configuration and testing to obtain acceptable performance.

We still switched to Postgres. It has lots of other benefits - and none of the licensing.

> Generally I found SQL Server would perform very well

Every database engine makes different trade-offs, or does some things better than others.

I've found that SQL Server handles complex queries over large data sets fantastically well, easily outperforming Oracle or open-source DBMS systems in like-for-like scenarios. It often papers over missing indexes shockingly well, because it automatically uses temporary hash indexes that it can build in parallel. Up to about ten million rows this can be fast enough that developers don't even notice that they forgot an index!

However, it has very high minimum latency that cannot be fixed through any mechanism. Not via RDMA, shared memory, named pipes, local loopback, or any other means. It always takes SQL Server about 125-150 microseconds to respond to a query, not matter how trivial, such as "SELECT 1". This makes it terrible as a KV store, and makes some types of ORM very sluggish when chasing references in code.

Similarly, there have been several articles published saying that while Postgres is well rounded and feature-full, it has high write-amplification that makes it an absolute no-go past a certain scale. Many orgs have written long blogs about how they started with PG and were forced to migrate to MySQL to get to the required write throughput.

I feel that the DBMS space is still immature, and there are very fundamental things missing. E.g.: There are no enterprise-ready open source pure in-memory relational database engines that have capabilities comparable to SAP HANA. I'm also yet to see a DBMS with decent programmability, so that developers don't need to write App->ORM->DB layers over and over.

The first two are already fixed, the third is fixed in v14.

The fourth will likely never be fixed as it is a byproduct of the extremely fine grained transactional capabilities in Postgres (which brings the advantage of fewer deadlocks). However, if you are disinclined to make that tradeoff, I believe that there is a backend in development that is designed to reduce bloat, using the new pluggable storage engine API.

I'm aware some of these things have changed, but equally SQL server has likely developed since then too.

That said for #1 are you saying PG now allows query hints or just the specific example is fixed? There are a lot of cases still where the optimiser does the wrong thing and with PG you're powerless to fix it.

With #3, do you have any more information about that? I can't find any on a brief Google.

I was specifically referring to the optimization fence on CTEs. PG still doesn't have query hints, and it seems like the core team has an ideological opposition to building query hint capabilities. Their stance has been that query hints often lead to suboptimal behavior because queries often don't change over time, but data sets do, so relying on statistics allows your query plans to change with different data. A query hint, on the other hand, may lead to an optimal query execution today, but a terrible one tomorrow.

Over the years Postgres has steadily chipped away at almost all of the reasons why I would ever want query hints. Only two frustrations remain:

* the optimizer is not aware of TOAST storage. This means that for tables with lots of large objects (geospatial in my case), the optimizer only sees the costs of the table with a pointer to the large object, as opposed to the cost of pulling table with the large object.

* The optimizer only has singular statically configured costs for functions and aggregates, but costs can actually dramatically vary based on the size of the data inputs, the size of the aggregate "window", etc.

Overall, these are rarely a big enough deal for me to not choose Postgres. I see the default planner behavior as better than the default planners for both Oracle and SQL Server, but it is still possible that the lack of query hints could be a deal breaker. If you get a chance, give it another try...I think you might be surprised at how far Postgres has come since the 9.X days.

As far as number 3, I guess I should clarify that they've dramatically reduced the connection overhead, but I shouldn't have said "fixed". It may or may not be reduced enough for your requirements. Here is a blog post with some links to various commits: https://www.depesz.com/2020/08/25/waiting-for-postgresql-14-...

Some rudimentary benchmark comparisons here: https://github.com/digoal/blog/blob/master/202008/20200817_0...

Also, wrt #4, I found one of the projects to reduce bloat with a different storage engine: https://cybertec-postgresql.github.io/zheap/

I do actually use Postgres still, despite its limitations.

I dislike their ideological opposition to query hints, partially because I've been bitten enough by the optimiser deciding in production to change the query plan to something that kills performance. If anything, I'd rather be able to pin the entire query plan! Obviously a better optimiser is ideal, but query plan changes are a big risk to app stability that are hard to mitigate. AWS Aurora for PGSQL actually has a query plan management system which helps with this.

Anyway, the difference between PG and the others doesn't make up for the licensing costs. Even in the specific cases where MSSQL does far, far better, the licensing costs still eliminate most of the benefit.

The query hint thing is a serious issue. I get the concern that people won't report issues and so on, but on heavily loaded DBs having no decent operational response to a bad plan change is really scary. In an ideal world something like oracle's plan stability would be pretty helpful.

Where can I read more about improved connection handling in v14?

How is that hard to believe? Clearly Oracle itself doesn't believe it. :P

Without general theories, the only reasonable question is "Faster for What"?

What are generalizable theories of 'fast'? Fewer bits moving less with fewer instructions, and using the most suitable, specialized hardware available.

Even once you have a reasonably well-defined goal, if it's complex enough you may start hitting subsidiary questions, like numerical precision and stability.

"This car is the fastest one on the planet, but we legally forbid you from talking about how fast it can go."


Not quite. More like "but we contractually forbid you from publishing the results of any speed tests without contacting us first."

It's possible, even probable, they look at your results and then give you permission to publish.

As long as they do not refuse permission or delay giving you an answer indefinitely, asking you to contact them first before publishing is arguably a reasonable requirement.

> It's possible, even probable, they look at your results and then give you permission to publish.

Which makes observable results inherently biased and any comparison useless.

How so. I dont understand.

Because what constitutes "bad" benchmark is subjective and db providers are incentivized to stop publishing benchmarks that put them in bad light.

And even if all they do is filter out actually crappy benchmarks - other db providers (like postgres or mysql) don't do that - so the comparison still isn't fair.

It's like you had 2 candidates for a job and phoned their previous bosses, but candidate A forced you to only phone guys he approves of.

Usually there would be some language that states they cannot refuse permission to publish unless they have reasonable grounds. Your hypothetical, i.e., a "bad" benchmark, would not qualify as reasonable grounds. Blocking the publication of benchmarks because they do not suit one's business interests is not going to be an enforceable provision in a license agreement. I challenge you to name a single case that went to trial over such a provision where a court upheld the provision's enforcement.

Might be worth considering a world in which legal pressure was enough to modify peoples behavior, even if in practice it was unenforceable. Still, wouldn't mind some examples, or a case study.

It's probably a clever thing lawyers put in there. I doubt it's enforceable.

So the author is incredulous but provides no proof or stats, just a long rant.

Another overlooked "performance" feature of terse languages is that of the developer. Vector based, concise languages give you more in one line of code. And I don't find that readability suffers. Yes you need to stare at that one line longer, but there's more going on.

I wrote this article to link in response to evidence-free claims about K, and have in fact used it in this way twice in the two months since I wrote it. Didn't expect HN in general to take an interest. So I need to stress again that K is a really cool language and I like it a lot!

If the article can provoke clear benchmarks against current Dyalog or J then that would be an ideal outcome, even if it proves me wrong. I already see some benchmarks linked here and will try to analyze these.

> So the author is incredulous but provides no proof or stats, just a long rant.

I thought the same thing but on the other hand, what can you do? If you're not allowed to publish benchmarks you only can rant like this even if it sounds petty.

By getting attention here on HN it helps put pressure on commercial vendors to stop sharing unconfirmable benchmarks otherwise they will lose credibility since they've been called out.

The author does make some good points about the fundamental nature of interpreters too. Although it's hard to confirm there too what these languages do without seeing their source.

I mean that if K or some language is actually interpreting without a virtual machine at some stage, that is indeed going to be slower than any language with code running on a virtual machine.

And also that any code that runs on a virtual machine must be slower than an equivalently intelligent compiled code since it's running on another layer of abstraction that compiled code isn't. ("equivalently intelligent" because yes a virtual machine could be faster than terrible compiled code.)

> I mean that if K or some language is actually interpreting without a virtual machine at some stage, that is indeed going to be slower than any language with code running on a virtual machine.

Not necessarily. The VM itself will introduce overhead and potentially blow out cache etc.. You can argue, as the article does, that this is a small proportion of execution time if you're executing a lot of loops, but that's inherently workload-dependent. If the language being interpreted is denser than the VM bytecode - something that's believable for a language like K - then interpreting directly makes more efficient use of memory bandwidth than interpreting via bytecode.

That's an interesting idea, I missed that.

Has anyone published a PoC of this (not necessarily using any of these languages) that would give some quantitative credence?

"And also that any language with a virtual machine must be slower than an equivalently intelligent compiled code since it's running on another layer of abstraction that compiled code isn't. "

I think this is not correct. A VM has access to the data that the code is operating on, while an AOT compiler doesn't. It's possible that a VM (eg. JVM) produces faster code than a normal compiler.

I'm talking about where the code is run. If it's run on a VM it's run on a VM. If it's run on the cpu by JIT-ed code it's not running on a VM.

Edit: But fair I did say "language with a VM" which would include Java JIT-ed or not.

Edit, edit: Modified the original comment to say "code on a VM" not "language with a VM".

Sadly it also has to consider the time it takes to compile code like that - we grumble, but not too loudly, at the time LLVM takes to compile things. If the JVM took that same time whilst running live that wouldn't be acceptable.

Surely this depends entirely on the total runtime of the program. A server-style program which will run for a couple of weeks can accommodate quite a bit of JIT compilation overhead.

> If you're not allowed to publish benchmarks you only can rant like this even if it sounds petty.

You can repeat whole-system benchmarks like STAC[1] or that some bloggers do[2], which specify the hardware and software, inputs and outputs for a problem. You can always do that.

[1]: https://stacresearch.com/m3

[2]: https://tech.marksblogg.com/billion-nyc-taxi-kdb.html

Interesting, but I don't understand how this is allowed under the license?

Edit: it's not that these licenses says you cannot ever publish benchmarks it's that they must be approved. So I assume these benchmarks were approved.

I'm not familiar with k, and I'm not understanding this whole "no benchmarks" thing.

So it's a proprietary lang? What stops benchmarks being run? Just how ridiculous are the licenses? Why has seemingly nobody run them anyway?

> I'm not understanding this whole "no benchmarks" thing.

Back in the 1980s, there was a paper published[1] that compared a number of database systems, and Oracle felt this paper harmed them. Perhaps if the authors knew Oracle better, they could have made more favourable benchmarks, they would argue, and whilst perhaps DeWitt et al could agree to that, they may point out they had not intended to harm Oracle, and being researchers and not marketers were given no notice or expectation that such a review could harm Oracle.

And so for the avoidance of doubt, Oracle, Microsoft, and many others (probably including KX, although I've never checked) insist in their licensing agreements that they be able to approve benchmarks, ostensibly so that they have an opportunity to correct any misunderstandings the author might have about the software before.

[1]: http://pages.cs.wisc.edu/~dewitt/includes/benchmarking/vldb8...

These benchmark things reminds me of an anecdote I saw today in the D group:

>Back in the 80's (!) computer magazines regularly ran C compiler benchmark results. (At one time I counted 30 C compilers available for the PC.)

>I took a class at Standford on compilers, which included a lot of info on data flow analysis. I decided to implement it. Data flow analysis optimizations basically deleted the benchmark code, because it didn't do anything (lots of dead assignments). This compiler was released as Datalight Optimum C.

>The article writer concluded that the compiler had a giant bug in it, because it ran the benchmarks impossibly fast, because it deleted the dead code. Instead of being lauded as the only C compiler on the PC that did global data flow analysis, it was labeled as a buggy piece of crap.

>By the time we dug ourselves out of that PR disaster, other compilers had implemented it, too.

>Ironically, these days DMD gets repeatedly charged with not doing data flow analysis, and clang is assumed to have invented data flow analysis.

>Just goes to show the power of marketing

> Back in the 1980s, there was a paper published[1] that compared a number of database systems, and Oracle felt this paper harmed them.

This is rather an argument why the benchmark code should be made public so that other people can check it and vendors can post modified versions if they feel treated unfairly.

Not sure about K, but I know for example that MS didn't allow publishing performance benchmarks of .NET (not sure of this changed with .NET core).

Both Oracle and MS SQL disallow this without approval.


I'm not aware of that ever being true for .net, but it's definitely not true now.


I think the article answers all your questions.

Here are some (micro) benchmarks of q vs python that they did publish: https://kx.com/blog/a-comparison-of-python-and-q-for-data-pr...

tl;dr, while q is orders of magnitude faster that naive python, it is, at best, 5 times faster than optimal python using numpy. Based on those numbers I would be very surprised if an optimal C program wouldn't be significantly faster than both.

Almost! I can't quite replicate these tests because it doesn't give the type (integer or float, and width) of the database columns. There are many possibilities[0] so I tested a few. Fortunately the testing CPU (Intel Xeon E5-2650 v4 @ 2.20 GHz) is comparable to mine (Intel Core i5-6200U CPU @ 2.30GHz). Turbo (one-core speed) is 3.00 GHz versus 2.80 GHz with advantage to the Xeon, but my laptop CPU is a year newer and on a smaller process so could have better IPC. Both have AVX2, and the server CPU has a much better cache, but I don't think Dyalog is bandwidth limited.

I adapted the vectorized Dőtsch solution, which gave the fastest times and is easy to write, to Dyalog APL. Pre-release 18.0 from when I worked there because I can't be bothered to do a real install, but I doubt the arithmetic code's been changed since 17.1. Here are results from a Dyalog session giving times in seconds for 8-byte floats, 4-byte integers, and 1-byte integers. From the article, Q takes 13E¯3 seconds.

          )copy dfns cmpx
          (sA sB pA pB)←?{1e6⍴  0}¨⍳4 ⋄ cmpx '(sA×pA≥pB) + sB×pA≤pB'
          (sA sB pA pB)←?{1e6⍴1e9}¨⍳4 ⋄ cmpx '(sA×pA≥pB) + sB×pA≤pB'
          (sA sB pA pB)←?{1e6⍴120}¨⍳4 ⋄ cmpx '(sA×pA≥pB) + sB×pA≤pB'
If the table consists of floats then Dyalog appears substantially faster, although this could plausibly be due to better IPC on my CPU. It could also be a real increase. Dyalog uses bit booleans for the comparison results which allows it to make smaller reads and writes, and code for packing results to bit booleans and multiplying floats by booleans does have to be written with vector intrinsics ([1] indicates Q doesn't pack booleans). If the benchmark uses 8-byte ints then they should be comparable to floats, and if it's using smaller ints then Dyalog is clearly much better.

Will see if I can get numpy benchmarks running.

[0] https://code.kx.com/q/basics/datatypes/

[1] https://code.kx.com/q4m3/3_Lists/#323-simple-binary-lists

Ran the following numpy program, which resulted in a time of 0.01116s, faster than the article's benchmark but slightly slower than my Dyalog timing of 0.0093s. Assuming relative times are consistent across CPUs, this would put Dyalog on 8-byte floats just ahead of Q on 8-byte ints. Floating-point vector instructions take longer than integer ones; while overflow checking could turn things the other way, K and Q famously don't do it. It's not consistent with a sum of 100 runs as geocar suggests.

    import numpy as np
    import time

    N = 1000 * 1000
    pA = np.random.randint(0, 5, N)
    pB = np.random.randint(0, 5, N)
    sA = np.random.randint(0, 100, N)
    sB = np.random.randint(0, 100, N)

    start = time.perf_counter()
    runs = 100
    for i in range(runs):
        bS = sA * (pA >= pB) + sB * (pA <= pB)
    end = time.perf_counter()
    print((end - start)/runs)

They're 8-byte ints, and the reported times are sum of 100 runs. This is what my 2019 i5 1.6ghz Macbook Air does:

    q)`sA`sB`pA`pB set'4 0N#1000000?100
    q)\t:100 (sA*pA>=pB)+sB*pA<=pB

Where do you get this information? Do you have some connection to the article?

It appears this benchmark uses vectors of a quarter-million, not a million, elements? My understanding is that 4 0N#a redistributes elements of a into four vectors without repeating them, and this is what ngn/k does. The article says a "sample table of size one million", refers to "one million rows" elsewhere, and later gives timings that scale linearly with the number of columns, so I don't think that's what is meant.

EDIT: Oh, there's generation code near the top. In an image for some reason so the Q version is transcribed below. Definitely a million rows. Seems both Numpy's random.randint and Q default to 8-byte ints? Dyalog would use 1-byte integers if the data fits, as it does in this example.

    N: 1000 * 1000

    t: ([] pA: N?5; pB: N?5
           sA: N?100; sB: N?100)

The author explicitly states that K licence prevents him from providing stats.

I'm surprised nevertheless that this is enforceable. Is there a precedent of someone being sued and losing?

If you're reliant on a vendor for a service why risk getting on their bad side?

What makes you think everyone doing benchmarks is reliant? it could be research, it could be curiosity, you actually could be competing...

Edit: and nevertheless is quite depressing that as a paying customer you should be afraid


You might enjoy the language created by the author of the article, then. BQN is a language that favours terseness, even above and beyond k. It is in many ways an improvement on APL and J, importantly through a more convenient set of operators than found in either.

Personally from what I've seen I wouldn't say BQN values terseness more than k, probably less than most other array langs.

(Edit: if you downvoted this: why?)

I agree; I don't think BQN values terseness more than k. I think k values terseness a lot.

I think it may be more convenient to write terse code in BQN over k, because there is greater facility to control function calls in a terse way. Things like under and over are quite common patterns that BQN finds easier (and shorter) than k.

I've not spent a great deal of time yet with BQN, so this could be wrong.

Yeah, under and over are great (wish k had more 'combinators' in general), but k does manage to cram a lot of stuff in with overloads. I feel like k is terser in general but I don't really have much evidence to back it up, just from playing a bit with both of them.

The different character sets make it impossible to say what this would mean. My limited code golf experience is that most often BQN solutions use fewer characters than K ones, but BQN has 63 primitive glyphs to K's 25 or so (depending on version). I do think K design values using its limited resources to enable terse code more than BQN, but if screen space is what you care about then BQN will tend to use less of it.

Concise languages are also more fun to use in a REPL (another aspect of "performance")

>So the author is incredulous but provides no proof or stats, just a long rant.

Isn't the burden on proof on the ones making the claim for the speed of their product?

Incredulity should be the default scientific stance here...

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact