
Kerf: a columnar tick database for Linux, OS X, BSD, iOS, Android - luu
https://github.com/kevinlawler/kerf
======
CyberShadow
Do I understand correctly that this is proprietary (closed-source) software,
even though it's hosted on GitHub? This repository seems to contain two binary
blobs without source code.

~~~
paulasmuth
I don't get all the negativity RE: kerf not being open source in this thread.
It's obvious the implementation is not free software (check the README in the
actual submission) because the author wants to sell commercial licenses.
Nothing inherently bad about that -- some of us need to make a living.

The query language and concepts which are explained in the documentation are
an exciting/novel (at least to me) contribution and "out in the open" for
everybody to see/discuss/adapt/improve upon. Let's not dismiss that because we
can't read the source.

~~~
jcwilde
Strangely, there was no negativity in the comment you replied to; they were
simply asking whether it is open source or not, as it didn't align with their
expectations of github-hosted projects (I would argue this would be the case
for most people).

Contrary to what you state, it is _not_ clear from the README that this is a
commercial venture, just that the author provides an email address for any
queries in relation to licensing, and actually includes _no_ copyright notice or
restrictions on running the code in the provided documentation. One might even
assume that since we've been provided a copy of the binary without any
notices, we might be free to run said code without restrictions... but really,
who knows? Hence the OP's questions.

------
paulasmuth
Off topic: it looks like you linked the Linux binary completely statically. I
am curious why you decided to do this and how you achieved it. Did you use
something other than glibc to compile these binaries?

Also, did you roll the SQL engine from scratch, or does it build on some
existing parser/runtime? Would love to learn more.

~~~
kcl
Linux linking: I wasn't doing anything special with the linking. The Linux
binary was compiled on a VirtualBox instance of Mint and stripped and
compressed, so it's possible your tools are misreporting, or I did something I
wasn't expecting. At any rate, I haven't made that as a conscious design
choice, and it's easy to change (fix?) compilation styles in a future build.
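
A quick way to see what such tools are looking at: one common heuristic treats
an ELF file as dynamically linked when its program headers contain a PT_INTERP
or PT_DYNAMIC entry, and as static otherwise. Below is a minimal Python sketch
of that heuristic. The helper names `program_header_types` and `looks_static`
are made up for illustration, and nothing here has been run against the kerf
binary. A compressed executable may expose only the headers of its unpacking
stub, which would be one possible explanation for a "static" report; that is a
guess, not a verified diagnosis.

```python
import struct

PT_DYNAMIC = 2  # dynamic linking information
PT_INTERP = 3   # path of the requested dynamic loader

def program_header_types(elf):
    """Return the p_type of every program header in an ELF image (bytes)."""
    if elf[:4] != b"\x7fELF":
        raise ValueError("not an ELF image")
    is_64bit = elf[4] == 2                # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if elf[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    if is_64bit:
        (phoff,) = struct.unpack_from(endian + "Q", elf, 32)
        phentsize, phnum = struct.unpack_from(endian + "HH", elf, 54)
    else:
        (phoff,) = struct.unpack_from(endian + "I", elf, 28)
        phentsize, phnum = struct.unpack_from(endian + "HH", elf, 42)
    # p_type is the first 4-byte field of each entry in both ELF classes.
    return [
        struct.unpack_from(endian + "I", elf, phoff + i * phentsize)[0]
        for i in range(phnum)
    ]

def looks_static(elf):
    """Heuristic only: no PT_INTERP and no PT_DYNAMIC header present."""
    types = program_header_types(elf)
    return PT_INTERP not in types and PT_DYNAMIC not in types

# Usage (the path is a placeholder):
# with open("path/to/binary", "rb") as f:
#     print(looks_static(f.read()))
```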

SQL engine: all from scratch

~~~
paulasmuth
Awesome stuff, I am actually hacking on something similar, so I would love to
hear your thoughts RE: supporting "windowed" aggregations. (E.g. a moving
average over a timeseries).

Are you planning on adding that, and which syntax are you planning to use? Have
you looked at Postgres's "OVER (PARTITION BY ...)"? While it seems powerful, it's also
fairly unintuitive IMHO. I was experimenting with adding a GROUP BY clause
that allows each input row to appear in more than one group in the result set.
Something like:

    SELECT time, mean(value) FROM mymetric GROUP OVER TIMEWINDOW(time, 60);

~~~
electrum
You can do that like this: GROUP BY date_trunc('minute', time)

~~~
paulasmuth
The snippet you posted will compute an aggregation based on a fixed time
interval. I.e. it will put every row into a "bucket" of per-minute granularity
and then compute an aggregate function for each of those buckets,
taking into account only the rows that ended up in that specific bucket (i.e.
only rows from that specific minute). To put it another way this is asking the
question "Please give me the aggregate of some value per minute".

To make my question more precise: I was trying to ask specifically about a
"moving window aggregation" (e.g. a moving average over a timeseries). This is
more like asking the question "Please give me every minute an aggregate based
on all values in the last N minutes". To do that you need each input row to
end up in more than one bucket (or have a special type of aggregation function
like postgres does).

For example, if you were doing a moving aggregation with a 1-minute interval
("bucket size") and a 5 minute window ("lookback"), you would need to place
each row into 5 buckets: the bucket into which it belongs based on its
timestamp and the 4 previous buckets. And a vanilla SQL GROUP BY can't do
that.

Hope that makes sense.
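
The difference between the two groupings can be sketched in a few lines of
Python. This is only an illustration of the idea, not how kerf or Postgres
implement it. The function names, the `(timestamp_seconds, value)` row shape
and the sample data are all made up. It also assumes each group is labelled by
the start of its window, which is what makes a row fall into its own bucket
plus the previous ones; labelling by the end of the window would put it into
the following ones instead.

```python
from collections import defaultdict

def fixed_buckets(rows, bucket=60):
    """Plain GROUP BY: every row lands in exactly one bucket."""
    groups = defaultdict(list)
    for t, v in rows:
        groups[t - t % bucket].append(v)
    return {start: sum(vs) / len(vs) for start, vs in sorted(groups.items())}

def moving_buckets(rows, bucket=60, window=5):
    """Moving window: every row lands in `window` consecutive buckets."""
    groups = defaultdict(list)
    for t, v in rows:
        own = t - t % bucket              # the bucket the row belongs to
        for k in range(window):           # ...plus the window-1 previous ones
            groups[own - k * bucket].append(v)
    return {start: sum(vs) / len(vs) for start, vs in sorted(groups.items())}

rows = [(0, 1.0), (30, 3.0), (60, 5.0), (130, 7.0)]
print(fixed_buckets(rows))             # {0: 2.0, 60: 5.0, 120: 7.0}
print(moving_buckets(rows, window=2))  # {-60: 2.0, 0: 3.0, 60: 6.0, 120: 7.0}
```

With `window=1` the second function degenerates into the first, which is the
sense in which a vanilla GROUP BY is the one-bucket-per-row special case.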

------
siganakis
I am really interested in this as it seems like a more accessible version of
KDB - especially in its support for SQL.

I'd like to look into it further, but I can't find any information about
licensing. Given that there is no source code in the repo, it appears that
this isn't an open source project.

~~~
e12e
I would generally consider it a bug to not have any form of license in the
repo, but from the top of the Readme (last modified 10 days ago):

"Contact Kevin (e.g., licensing, feature/documentation requests):
k.concerns@gmail.com"

So if you're interested, I suggest you send Kevin an email.

Personally I think it would've been interesting if it was Free Software -- as
this doesn't come with any license, it's essentially less useful than the
32bit kdb[1] -- and I can't imagine it's as feature complete, or has a similar
level of documentation, support or real-world testing.

If the intention is to make the binaries deployable/usable, a license
file/note in the Readme would be the absolute minimum requirement IMNHO.

Also find it strange to host just binary artefacts on github -- I can't see
how that's useful for the author or the users. I suppose one could submit
pull-requests against the python api, but then it would make more sense to
split the repo in two -- one for the source-available (unknown license) python
code -- and host the binary artefacts somewhere.

[1] [http://kx.com/software-download.php](http://kx.com/software-download.php)

------
mkevac
What is tick?

~~~
pmontra
I had no idea, so I googled it and found this page:
[https://www.mongodb.com/post/45116404296/how-banks-use-mongo...](https://www.mongodb.com/post/45116404296/how-banks-use-mongodb-as-a-tick-database)

"Tick databases are real-time database servers for capturing, managing, and
processing market data."

They go into the requirements after that.

------
thikonom
A somewhat similar project by Man Investments:
[https://github.com/manahl/arctic](https://github.com/manahl/arctic)

------
napkindrawing
As a data nerd, I got very excited once I read all the features, but I'm just
assuming this will wither and die since it's closed-source =(

------
james2vegas
I see Linux and OS X, where are the iOS, Android and *BSD binaries?

------
quotemstr
No source code? I'm not even going to give this thing a glance. There is
nothing so compelling in the database world that I would even be tempted to
give up source access. What an anachronistic throwback this kind of thing is.

~~~
quotemstr
And it's packed using UPX too, presumably to "protect" the program somehow.
Just say no to proprietary software from small vendors.

