tiredandgrumpy's comments

tiredandgrumpy · on April 3, 2014

and now they'll sue you for libel ;-)

tiredandgrumpy · on April 3, 2014

K is only hard if you try to read it without first studying it. Looping is achieved through adverbs. The key to it is understanding what is a noun (data), verb (operator/function) and adverb (takes a verb, creates a new verb to be used infix). A verb with a noun to its right is a dyad if there is also a noun to its left, and is otherwise a monad. If it is needed, the monad can be specified by appending a colon to the right of the symbol. Fortunately, most kdb+ developers program in Q, which has a bunch of helper routines defined in k, and assigns monads to names such as neg x instead of -:x.
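For readers who don't know K, here is a rough C analogy (not K itself, and not how kdb+ implements anything). A verb is modelled as a function pointer, and an adverb as a function that takes a verb and carries the loop. The names `dyad`, `monad`, `over`, `each`, `plus`, `mul` and `neg_j` are made up for this sketch.

```c
#include <assert.h>
#include <stddef.h>

typedef long long J;
typedef J (*dyad)(J, J);   /* a verb with a noun on each side */
typedef J (*monad)(J);     /* a verb with a noun on the right only */

J plus(J a, J b) { return a + b; }
J mul(J a, J b)  { return a * b; }
J neg_j(J a)     { return -a; }

/* "over" as an adverb: folds a dyad across a list, so the caller
   never writes the loop. Requires at least one item. */
J over(dyad f, const J *xs, size_t n) {
    assert(n > 0);
    J acc = xs[0];
    for (size_t i = 1; i < n; i++)
        acc = f(acc, xs[i]);
    return acc;
}

/* "each" as an adverb: applies a monad to every item of a list. */
void each(monad f, const J *xs, J *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = f(xs[i]);
}
```

With these, `over(plus, xs, n)` plays the role of a sum and `each(neg_j, xs, out, n)` negates every item. In K the adverb attaches directly to the verb symbol instead of being a named function.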

tiredandgrumpy · on April 3, 2014

it came up here before, but this time is different. It is now free for commercial use and is not restricted with timeouts or expiry.

tiredandgrumpy · on April 3, 2014

If you look closely at it, there's not much there, and it's actually easy to understand: it defines a variant struct and a bunch of accessors to the different types within the embedded union. He prefers short names, and finally, years later, Java recommends short variable names for lambdas too!

ryanobjc · on April 3, 2014

So, I guess I need to look closer than the pixels then:

typedef struct k0{signed char m,a,t;C u;I r;union{G g;H h;I i;J j;E e;F f;S s;struct k0*k;struct{J n;G G0[1];};};}*K;

Sorry I guess I'm just not seeing the "not much there and actually easy to understand"

Whatever a 'H' is

beagle3 · on April 3, 2014

Well, it is kind of pointless to look at the K<->C interface without knowing K. If you read the Python.h it would be about as understandable (assume you don't know what a "class" is when studying the Python.h file - because K does a lot of things in ways different enough from most languages).

To elaborate: K uses one-letter mnemonic codes for all of its basic storage types:

    G = General = 8-bit unsigned int
    H = sHort = 16-bit signed int
    I = Integer = 32-bit signed int
    J = bigger integer = 64-bit signed int

(Note how G,H,I,J follow each other?)

    E = 32-bit floating point "rEal"
    F = 64-bit Floating point

(Again, they are near each other)

    S = Symbol
    K = "general list type", the central K language type

And that's mostly it; the last unnamed union (with fields "n" and "G0") is for vectors, n being the length and G0 being the data.

The only other field you are ever going to need is "t" for type (saying which union member is actually in use). The rest are internal implementation details, but are also easy to remember: r=reference count; u=flags; m and a have something to do with memory mapping and allocation.

There are a few more basic types: b=boolean, t=time, d=date, p=timestamp, m=month - but they are merely different interpretations of the EFGHIJSK members above; to access data from C, all you need is the list given above.
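To make the "t says which union member is in use" point concrete, here is a minimal stand-alone sketch of the same pattern. It is not the kdb+ header: the struct `atom0`, the `TAG_*` values and the helper names are hypothetical, and only the one-letter typedefs follow the list above. It relies on C11 anonymous unions.

```c
#include <assert.h>
#include <stdlib.h>

typedef short H;       /* 16-bit signed int */
typedef int I;         /* 32-bit signed int */
typedef long long J;   /* 64-bit signed int */
typedef double F;      /* 64-bit floating point */

/* Hypothetical tag values, for this sketch only. */
enum { TAG_H = 1, TAG_I, TAG_J, TAG_F };

typedef struct atom0 {
    signed char t;                    /* type tag: which member is live */
    I r;                              /* reference count */
    union { H h; I i; J j; F f; };    /* anonymous union (C11) */
} *A;

static A new_atom(signed char tag) {
    A x = malloc(sizeof *x);
    assert(x != NULL);
    x->t = tag;
    x->r = 0;
    return x;
}

A make_i(I v) { A x = new_atom(TAG_I); x->i = v; return x; }
A make_f(F v) { A x = new_atom(TAG_F); x->f = v; return x; }

/* Read any numeric atom as a double by dispatching on the tag. */
F as_double(A x) {
    switch (x->t) {
    case TAG_H: return x->h;
    case TAG_I: return x->i;
    case TAG_J: return (F)x->j;
    case TAG_F: return x->f;
    default:    assert(!"unknown tag"); return 0;
    }
}
```

The caller only ever needs the tag and the matching member, which is the sense in which the rest of the header is implementation detail.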

ryanobjc · on April 4, 2014

interesting.

My thoughts on this area have changed a bunch, I think when I was young I was a lot more about cleverness and conciseness.

Now that I'm older and I've worked on a large variety of software systems, I am starting to believe that readability of code is one of the most important values. After all, you read the code a lot more than you write it.

I can say definitively that:

- I have often regretted using single-letter variables (outside of loop 'i')
- I have greatly regretted using non-descriptive names
- I have never regretted using longer variable/method names

Nowadays, in an IDE environment, longer names don't even carry a typing penalty. Yeah yeah, I know, Java, but it's a safe language, and in a world where I want to deliver working, correct, bug-free code, safety is more important than single-letter expressiveness.

After all, I don't think people hold up APL as good code.

beagle3 · on April 4, 2014

> My thoughts on this area have changed a bunch, I think when I was young I was a lot more about cleverness and conciseness.

I'm almost the other way around. I always valued elegance, which often manifests itself in conciseness (but most conciseness is NOT an example of elegance). Followed by readability, which usually manifests itself in verbosity (though most verbosity is NOT actually readable). And when decision time came, I'd prefer verbose inelegant code to non-verbose concise code.

But then, I spent some time using K. And I realized I need 100-1000 times fewer lines to achieve the same thing, AND it usually works about as fast (despite an interpreter), AND I have fewer bugs AND those bugs tend to be of one kind (off-by-one) and easy to find.

e.g., look at this example by Stevan Apter: http://nsl.com/k/t.k - 14 short lines implement an efficient (fast and memory compact) memory database that includes joins, selects, inserts, deletes, aggregates, grouping and sorting.

To the uninitiated, it looks like code golf, but this is actually very readable K if you know K. The definition order:where:{[t;f;o]{x y z x}/[_n;o;t f]} gives two names ("order" and "where") to an idiomatic K expression that takes a table "t", and paired lists "f" (of field names) and "o" (of re-index functions that can be used to filter or sort), and returns the resulting table after reindexing, one by one, applying those functions to their relevant fields. A sorting function would be "desc:>:" (that is, ">:", henceforth also named "desc") which returns a sorting permutation for its argument. A filtering function would be "&3>" (pronounced "where 3 is larger than" and can be written "where 3>" in the Q dialect).
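For comparison, here is one reading of that reindexing fold spelled out in C. This is a loose sketch of the idea, not a translation of the K source: it keeps a vector of selected row numbers, and each step views one column through that selection, asks a function for positions, and maps the positions back to row numbers. The names `index_fn`, `where_lt3`, `desc`, `reindex` and the fixed `MAXROWS` limit are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

#define MAXROWS 16   /* arbitrary limit to keep the sketch allocation-free */

/* An index function reads n column values and writes positions
   (each in 0..n-1) into out, returning how many it wrote. */
typedef size_t (*index_fn)(const int *vals, size_t n, size_t *out);

/* Filter: positions whose value is below 3 (the "where 3>" case). */
size_t where_lt3(const int *vals, size_t n, size_t *out) {
    size_t k = 0;
    for (size_t i = 0; i < n; i++)
        if (vals[i] < 3) out[k++] = i;
    return k;
}

/* Sort: positions ordered by descending value, built with a stable
   insertion sort over the positions. */
size_t desc(const int *vals, size_t n, size_t *out) {
    for (size_t i = 0; i < n; i++) {
        size_t j = i;
        while (j > 0 && vals[out[j - 1]] < vals[i]) {
            out[j] = out[j - 1];
            j--;
        }
        out[j] = i;
    }
    return n;
}

/* One reindex step: view col through rows[0..n), apply fn, then map
   the resulting positions back to row numbers in place.
   Returns the new number of selected rows. */
size_t reindex(size_t *rows, size_t n, const int *col, index_fn fn) {
    int vals[MAXROWS];
    size_t pos[MAXROWS], next[MAXROWS];
    assert(n <= MAXROWS);
    for (size_t i = 0; i < n; i++) vals[i] = col[rows[i]];
    size_t m = fn(vals, n, pos);
    for (size_t i = 0; i < m; i++) next[i] = rows[pos[i]];
    for (size_t i = 0; i < m; i++) rows[i] = next[i];
    return m;
}
```

Calling `reindex` once per (column, function) pair, starting from all rows, chains a filter and a sort in the same way. The C version needs dozens of lines where the K expression is a single short definition.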

Now, this conciseness does NOT come from golfing. It comes from eschewing the now obligatory object oriented programming, sticking with "down to the metal" data types, and rather than trying to find the minimal base of operations and endless compositions (like most languages do), use a wide base of operations and a precisely chosen set of compositions.

It is true that reading/writing a K line takes 10-50 times as long as reading a C line. But I've been unable to consider modern software engineering anything but ludicrous when a different set of primitives (and experience) can get you the same results for 1/100 or 1/1000 of the size of the executable specification, and everyone puts a SEP field on it.

tiredandgrumpy · on April 3, 2014

This convention is used within the type system throughout kdb+: i for 32-bit integer, j for 64-bit int, h for 16-bit int, e for real, f for float, etc. Anyone who knows q will automatically recognize these types in the C API. They'll also recognize the ref count r and the type t, and experienced C programmers will recognize the trailing array idiom.
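For anyone who hasn't met the trailing array idiom: the header and the data share one allocation, with the array declared as the last member. The quoted struct uses the older one-element form (`G G0[1]`); the sketch below uses the C99 flexible array member instead. The struct `vec0` and the functions `vec_new` and `vec_sum` are hypothetical and not part of the kdb+ API.

```c
#include <assert.h>
#include <stdlib.h>

typedef long long J;

/* Hypothetical vector header, for illustration only. */
typedef struct vec0 {
    int r;       /* reference count */
    J n;         /* number of items */
    J data[];    /* flexible array member: items follow the header */
} *Vec;

/* Allocate the header and n zeroed items in a single block. */
Vec vec_new(J n) {
    assert(n >= 0);
    Vec v = malloc(sizeof *v + (size_t)n * sizeof v->data[0]);
    assert(v != NULL);
    v->r = 0;
    v->n = n;
    for (J i = 0; i < n; i++) v->data[i] = 0;
    return v;
}

/* Sum the items, using the length stored in the header. */
J vec_sum(Vec v) {
    J total = 0;
    for (J i = 0; i < v->n; i++) total += v->data[i];
    return total;
}
```

A single `free(v)` releases both the header and the items, which is a large part of the idiom's appeal.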

igorii · on April 3, 2014

It definitely is strange to look at, but it's quite easy to understand if you know k/q.

H is a short in k/q, so H refers to a short here as well. This naming scheme is true for all of the above letters.

juziozd · on April 3, 2014

This is my favourite:

  // remove more clutter
  #define O printf
  #define R return
  #define Z static

  ...

Removes clutter indeed... :)

rcxdude · on April 3, 2014

Short names make sense if they are easily understandable locally. That means either something extremely common throughout the codebase (I think the most common example is localisation wrappers for string literals; ideally these are easy to trace to a clearer explanation, e.g. from renaming import statements), or something defined clearly and used only within a very small area of code. This API is neither of those.

silentbicycle · on April 3, 2014

Short names also make sense if the same conventions appear over and over throughout the code base. While they can be quite opaque at first, they're very consistent, and people with experience in other APLs will recognize many of them.