
Data Analysis with Vector Functional Programming [video] - srpeck
https://www.youtube.com/watch?v=ZGIPmC6wi7E&feature=youtu.be
======
ingenter
> "Only short programs have any hope of being correct"

Right, but I would not minimize the number of _bytes_ in a program, but rather
the number of nodes in its AST:
[http://www.paulgraham.com/arcchallenge.html](http://www.paulgraham.com/arcchallenge.html)

Here's an exercise: parse your favorite APL program into an AST, and see how
many nodes it has.
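
A minimal sketch of the same exercise for Python source, using the standard
`ast` module. This is only an illustration: it counts Python AST nodes (which
include context nodes such as `Load` and `Store`), so the numbers are not
comparable with an APL parse, and the two snippets are made-up examples.

```python
import ast


def count_ast_nodes(source: str) -> int:
    """Return the number of nodes in the Python AST for `source`."""
    tree = ast.parse(source)
    # ast.walk yields the root and every descendant node exactly once.
    return sum(1 for _ in ast.walk(tree))


# Two hypothetical ways to sum a list, written out as source text.
loop_version = """
total = 0
for x in xs:
    total += x
"""
builtin_version = "total = sum(xs)"

loop_nodes = count_ast_nodes(loop_version)
builtin_nodes = count_ast_nodes(builtin_version)
print(loop_nodes, builtin_nodes)
```

The explicit loop parses to more nodes than the `sum` call, which is the kind
of difference a node count captures and a byte count can hide or exaggerate.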

~~~
dang
Almost certainly still relatively few.

There's something to be said for being able to see more code at once, too.
This is something the vector language people emphasize (also Chuck Moore,
IIRC). The APL/J/K style is a different way of both writing and reading code,
and the standard objections ("readability", "that looks like line noise")
mostly are just because of the gap between that experience and the more
mainstream way of experiencing code.

Btw we once had a long HN discussion about pg's suggestion that one should
measure code size in tokens rather than lexically. I remember arguing in favor
but being persuaded out of it by someone who was even more radical about small
codebases than I am.

------
joe_the_user
Has anyone written on the relation between Haskell and similar languages and
APL, J and related languages?

~~~
michaelfeathers
I've given some talks making the case that what we are doing with Java
streams, LINQ, Rx, Ruby Enumerable, point-free style in Haskell, etc is
prepping us for array languages.

They all emphasize transformation pipelines. What the array languages
bring to the game is shape polymorphism and many more operators. I suspect
we'll see both move into more mainstream languages over the next 5 years or
so.
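
To make "shape polymorphism" concrete, here is a rough Python sketch, not how
any array language implements it. The helper name `lift` is made up for this
example: it turns a two-argument scalar function into one that accepts scalars
or nested lists of any depth, pairing up elements where both sides are lists.

```python
def lift(fn):
    """Make a two-argument scalar function work on nested lists of any depth."""
    def lifted(a, b):
        a_is_list, b_is_list = isinstance(a, list), isinstance(b, list)
        if a_is_list and b_is_list:
            # Both sides are lists: pair up elements and recurse.
            if len(a) != len(b):
                raise ValueError("length mismatch")
            return [lifted(x, y) for x, y in zip(a, b)]
        if a_is_list:
            # Scalar on the right: apply it to every element on the left.
            return [lifted(x, b) for x in a]
        if b_is_list:
            # Scalar on the left: apply it to every element on the right.
            return [lifted(a, y) for y in b]
        return fn(a, b)
    return lifted


add = lift(lambda x, y: x + y)

print(add(1, 2))                          # scalar + scalar
print(add([1, 2, 3], 10))                 # list + scalar
print(add([[1, 2], [3, 4]], [10, 20]))    # matrix + one value per row
```

The same `add` covers all three call shapes without a loop at the call site.
Array languages build this behavior into their primitive operators.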

~~~
joe_the_user
Any links, videos, etc.?

~~~
michaelfeathers
There's this one. It isn't deep. More about making people aware:
[https://www.youtube.com/watch?v=UX7xmhpUoi4](https://www.youtube.com/watch?v=UX7xmhpUoi4)

------
textmode
There are two other stories currently on the HN front page about IEX becoming
the 13th stock exchange.

IEX uses k.

------
rar_ram
I like the concepts introduced in the video. But, the Q syntax hurts my eyes.
Personally, I am not in favor of fitting my entire program in a tweet.
Readability helps!

~~~
RodgerTheGreat
Readability is subjective and largely colored by your experience. Verbosity or
succinctness isn't the problem here; it's that the syntax and semantics of Q
are unfamiliar to you. I think it's very important to take a moment to step
back and consider how many choices in language design you may be casually
taking for granted.

Many mainstream languages today- Java, Python, Ruby, C, PHP etc.- have very
similar core semantics and syntax. Choices of keywords and type systems
differ, but loads of ideas work the same, especially when you consider
idiomatic everyday code- for loops, scalar variables, some superset of the
algebraic rules for operator precedence you learn in math class. Whitespace in
an expression tends to be irrelevant, but sometimes its absence is
significant. To add a scalar to a list, you use a loop. Assignment operators
flow right to left. Lists tend to be indexed from zero. How many of these
choices are _essential_ and how many are arbitrary? All of these languages are
descendants of FORTRAN and ALGOL, sometimes with some ideas from the Lisp
family thrown in. They share a common heritage.

Q, K, J, A+ and APL represent an entirely parallel course of evolution. Within
this family there is a great deal of mutual intelligibility. I'm very familiar
with K, but the Q dialect doesn't surprise me. When learned, it's familiar-
readable. What did source code look like to you before you learned to program?

The APL family isn't amazing because programs tend to be short; that's a
_side-effect_ of the positive properties of these languages. They teach you to
write naturally parallel solutions to problems and offer a simple, consistent
way to compound together and apply functions to large structures at once.
Please don't say "this looks different than I'm used to" and then close your
mind to what the paradigm has to offer.
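
As a loose analogy in Python, and not a claim about how K or Q are
implemented: applying a function across a whole list as a fold or a scan,
roughly what K spells `+/` and `+\`. The `prices` data is invented for the
example.

```python
from functools import reduce
from itertools import accumulate
from operator import add

# Hypothetical sample data.
prices = [3, 1, 4, 1, 5]

# Fold: combine every element into a single result.
total = reduce(add, prices)

# Scan: the same combining function, keeping each intermediate result.
running_total = list(accumulate(prices, add))

# Swapping the function changes the computation, not the structure.
running_max = list(accumulate(prices, max))

print(total)
print(running_total)
print(running_max)
```

Neither computation needs an index variable or an explicit loop body. The
function and the way it is applied are the whole program.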

~~~
lqdc13
Debugging poorly written or convoluted K programs is much more complicated
than debugging equivalent C/C++/Java/Python programs.

It's decent for prototyping though, although I bet most people would prototype
just as fast or faster in Python.

~~~
RodgerTheGreat
I'll grant that the debugging story for K isn't as sophisticated as with many
other languages. Having a REPL and preferring programs that are mostly pure
functions helps, but there's room for improvement. Tooling is about the
community, though- not the language. I think that making documentation,
screencasts and generally encouraging the expansion of K's open-source
ecosystem will, with time, close that gap.

------
IndianAstronaut
This is functionally similar to dplyr in R, although the more SQL-like syntax
of dplyr is much handier.

~~~
michaelsbradley
And when the data gets bigger, there's data.table[1], which performs amazingly
well at certain tasks (vectorized ops ftw!), though the syntax can get a
little clunky (if you squint at it hard, it's SQL-ish). On my 2012 macbook
pro, I'm able to do (some) transformations of tables containing 10s of
millions of rows in only a few seconds (and sometimes faster).

It's possible to use dplyr and data.table together, as well, to good
effect[2].

[1]
[https://github.com/Rdatatable/data.table/wiki](https://github.com/Rdatatable/data.table/wiki)

[&]
[https://github.com/Rdatatable/data.table/wiki/Benchmarks-%3A...](https://github.com/Rdatatable/data.table/wiki/Benchmarks-%3A-Grouping)

[2] [http://stackoverflow.com/questions/21435339/data-table-vs-
dp...](http://stackoverflow.com/questions/21435339/data-table-vs-dplyr-can-
one-do-something-well-the-other-cant-or-does-poorly/27840349#27840349)

[&]
[https://twitter.com/hadleywickham/status/553169339751215104](https://twitter.com/hadleywickham/status/553169339751215104)

~~~
IndianAstronaut
Are you able to load data sets into data.table which are larger than memory?

~~~
michaelsbradley
AFAIK, with data.table it's all in-memory; whereas dplyr has the option of
working with a database backend.

On the other hand, data.table has robust support for modify and update ops by
reference, which can be a big performance saver.

------
nickpeterson
The problem with all variants is the incredibly minimal resources to learn
them. I wish a skilled practitioner would do something larger than a blog
post. Something like a pluralsight/oreilly course...

~~~
srpeck
There is a lot of material on kdb+ on kx's website
([http://code.kx.com/wiki/Tutorials](http://code.kx.com/wiki/Tutorials)), Q
for Mortals
([http://code.kx.com/wiki/JB:QforMortals2/contents](http://code.kx.com/wiki/JB:QforMortals2/contents))
being the best introductory book.

Also some good resources in this Quora: [https://www.quora.com/What-are-the-
best-resources-to-learn-q...](https://www.quora.com/What-are-the-best-
resources-to-learn-q-KDB+)

And if your interest is in the k language more than kdb+/q, then I have found
the docs in John Earnest's (RodgerTheGreat) oK interpreter a succinct,
example-focused introduction: [https://github.com/JohnEarnest/ok/blob/gh-
pages/docs/Manual....](https://github.com/JohnEarnest/ok/blob/gh-
pages/docs/Manual.md) Plus using his browser-based REPL
([http://johnearnest.github.io/ok/index.html](http://johnearnest.github.io/ok/index.html))
may lower the barriers to entry, and iKe
([http://johnearnest.github.io/ok/ike/ike.html](http://johnearnest.github.io/ok/ike/ike.html))
is great for
experimentation...[http://johnearnest.github.io/ok/ike/ike.html?gist=9c5f43baa4...](http://johnearnest.github.io/ok/ike/ike.html?gist=9c5f43baa443cccdb94e939893245aa2)

------
codygman
Is there anything like Q that is open source? J seems kind of similar.

~~~
ksherlock
There's an open source version of K(3) here:
[https://github.com/kevinlawler/kona](https://github.com/kevinlawler/kona)

~~~
codygman
Thanks! This is nice. I'm playing with J though since the resources seem very
nice.

------
haddr
It's a brilliant language and all, but I don't buy it. Here's why:

1) languages with this high level of abstraction are very nice if your
scenario maps perfectly to its usage (e.g. the wikipedia analysis given in the
video). Everything is nicely vectorizable, etc. But if there is some quirk in
your data, then sometimes you need to go the usual way, and Q is of no
help (no better than any other language), with the difference that now you do
some very inefficient things with those "stinky" loops.

2) it's a question of taste, but I find Q syntax a bit unusual. You probably
need more time to think about how to fix your simple problems with these
clever one-liners than to simply, well, solve them...

3) legibility: all of us working in software development know how much time
we waste due to illegible code, finding bugs, etc. Here this is raised to a
new level... of difficulty

4) this is a bit exaggerated, but I don't see how I could use Q in something
bigger? Is Q only a scripting language for one-off mini-batch programs? For
instance, R has the problem of not having any well-defined project structure,
and that makes many things hard: bigger programs are hard to maintain and
debug, and stream processing with R is a pain in the ass. Server-side stuff is
a little bit shoehorned (Shiny server is cool, but then it's just a
single-threaded thing for serving filtered dataframes to ggplots)

It is a cool niche language, but only for smart, simple analyses, nothing too
complex, as it will abstract you away from details to your loss.

~~~
PeCaN
Regarding 2 & 3, I assume you don't know Q or any other array language. If you
do, it's not really any less readable than any other language. Finding bugs
isn't particularly hard due to how friendly the languages are to REPL
development and how easy it is to trace array transformations.

I do agree with 1 & 4 though—Q (and to a lesser extent, K, APL, and J) are
niche languages. However, they're really, really good at their niche.

~~~
haddr
It has nothing to do with array languages: R is vectorized and is slightly
more legible than, say, Clojure, which is not an array language. It has to do
with very high density, though.

------
leephillips
Introduces Q, which is a proprietary, more verbose wrapper around K, another
proprietary language, inspired by APL.

