What is your tool of choice for interactive data exploration?
I've been looking for something similar for my phone, a sort of calculator-on-steroids even if it isn't web-scraping ready. J for Android fits the bill perfectly... except the language itself seems to get in the way, with its short mnemonics, hard-to-remember verb-train rules, function-composition words, and other weirdness in the construction of tacit phrases. I've (partially) learned and forgotten J three times already. Every time I come back, it's like starting anew.
I tried Clojure too but the Android REPL is stuck on Clojure 1.4.0 and the lack of an S-expression enabled keyboard/editor makes it a pain to code.
Any recommendations in this area?
The browser security model severely limits its potential for web scraping, but I have found it pleasant to use so far.
Docs can be found here: https://github.com/JohnEarnest/ok#mobile
As for handling non-DSV data... I dunno. Lisp is pretty nice for this, but it's a pain to get the data into the system. TXR and its Lisp dialect are pretty good at this, so that's an option.
But if you really want to handle tabular data interactively, doing manipulation and analysis on the fly... that's quite literally what Excel was designed for.
However, it's a lot slower than GNU Awk.
You don't get the duck typing of Awk (whereby strings that look like numbers can be treated arithmetically, and nonexistent or blank variables are zero, et cetera). To convert fields to numbers with reasonable succinctness, an "awk local macro" called fconv (field convert) is provided.
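The coercion rules described above can be sketched in Python. This `awk_num` helper is my own hypothetical illustration of the semantics, not TXR's actual fconv:

```python
def awk_num(v):
    # Hypothetical sketch of Awk-style duck typing, not TXR's fconv.
    # Missing or blank values act as 0; numeric-looking strings act as numbers.
    if v is None or v == "":
        return 0
    try:
        return float(v)
    except (TypeError, ValueError):
        # Real Awk parses a leading numeric prefix ("3x" -> 3); this
        # simplified version coerces only fully numeric strings, else 0.
        return 0

awk_num("")     # 0
awk_num("3.5")  # 3.5
```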
There is no function-for-function compatibility with the Awk or GNU Awk library, and the regular expressions are TXR's: that means no register capture, and the work of ^ and $ is done with functional combinators.
"Prior art" I was well aware of is cl-awk.
This awk macro was intended to be a detailed implementation of the Awk paradigm (with some GNU extensions), complete with obscure features like assigning to the nf variable to change the number of fields (extending with empty strings if the new count is larger), and a working regex-based record separation via the rs variable.
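The nf-assignment behavior described above can be sketched as a simple list operation in Python (an illustrative sketch of the semantics, not the macro's implementation):

```python
def set_nf(fields, nf):
    # Assigning to nf, as described above: shrinking truncates the record;
    # growing pads it with empty-string fields. (Illustrative sketch only.)
    if nf <= len(fields):
        return fields[:nf]
    return fields + [""] * (nf - len(fields))

set_nf(["a", "b", "c"], 2)  # ["a", "b"]
set_nf(["a"], 3)            # ["a", "", ""]
```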
I see Shivers implemented something that's on my TODO list: variants of ranges that exclude the start, the end, or both. However, he didn't lift the restriction that Awk ranges don't combine with other conditions, including other ranges.
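Those exclusive-range variants can be sketched as a stateful per-record predicate in Python (the name and structure here are my own illustration, not Shivers' or TXR's code):

```python
import re

def awk_range(start_pat, end_pat, exclude_start=False, exclude_end=False):
    # Stateful predicate mimicking an Awk range condition, with the
    # start/end-exclusive variants discussed above. (Illustration only.)
    inside = [False]
    def pred(record):
        if not inside[0]:
            if not re.search(start_pat, record):
                return False
            if re.search(end_pat, record):
                # A record can open and close the range at once, as in Awk.
                return not (exclude_start or exclude_end)
            inside[0] = True
            return not exclude_start
        if re.search(end_pat, record):
            inside[0] = False
            return not exclude_end
        return True
    return pred

p = awk_range("BEGIN", "END", exclude_start=True, exclude_end=True)
# [l for l in ["x", "BEGIN", "mid", "END", "y"] if p(l)]  ->  ["mid"]
```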
> I often find myself in the browser console, exploring data scraped off webpages (often from Wikipedia tables)
Are you aware of Wikidata? I don't have a lot of familiarity with it, only that some (a lot?) of the programmer types who were drawn to Wikipedia in its infancy have shifted their focus to Wikidata nowadays. As I understand it, Wikidata is meant to be the data source behind those tables that you're scraping.
(Forgive me if you're already involved with Wikidata and one of your primary uses for scraping is in pursuit of migrating content to Wikidata.)
from numpy import add, arange

arange(6)                  # Same as ⍳5
add.reduce(arange(6))      # Same as +/⍳5
add.accumulate(arange(6))  # Same as +\⍳5
From what I can tell, the core learning one takes from J/K/etc is the concept of an array language and treating arrays as first class objects. And nowadays Python/R/Julia provide this same functionality in a more verbose/more readable/more mainstream package.
That's just part of it. Other parts (by no means a complete list):
- Visual density puts the pattern-matching engine in your brain to good use. E.g. the K idiom "|/0(0|+)+\", which implements maximum subsequence sum, is a 10-character sequence instantly recognizable to many K programmers, and it is both efficient and well specified. Similarly, ",//" flattens a list. I don't know of any other language family that has this feature.
- Arrays with equivalent indices work better than structures ("column major" captures some, but not all of it)
- A properly selected core set of concrete operations (in K's case, about 60 of them) makes the needed abstractions shallower and simpler than a properly selected set of abstraction constructs.
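For readers who don't parse K: the max-subsequence-sum idiom quoted above is essentially Kadane's algorithm. A Python transcription (my own rendering, for illustration) looks like:

```python
def max_subseq_sum(xs):
    # Python rendering of the K idiom: keep a running sum clamped at
    # zero (the (0|+) scan step), then take the overall maximum (|/).
    best = cur = 0
    for x in xs:
        cur = max(0, cur + x)
        best = max(best, cur)
    return best

max_subseq_sum([-2, 1, -3, 4, -1, 2, 1, -5, 4])  # 6
```

The empty subsequence is allowed, so an all-negative input yields 0, matching the 0 seed in the K version.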
But your second point is definitely something that Python/R/Julia do - that's why data frames (e.g. Pandas) are so popular. Can you suggest something that K does beyond this?
By the way, I'm not asking in order to criticize K. I really want to know if there is something I'm missing. I've investigated J a bit, but never found the enlightenment that people say I'll find.
I'm not familiar enough with pandas to say for sure that K is way better - but I was familiar enough with numpy back in the day, and that was definitely way too clunky and cumbersome compared to K.
But look at e.g. advent-of-code solutions on https://kx.com/2016/01/04/puzzle-solutions-for-advent-of-cod... - I suspect even pandas won't be able to use vector correspondence so succinctly.
Where the manuscript writer (or print publisher) chose to put bar lines, for instance, can give clues as to intent in performance. I'm thinking specifically about Venetian publishers in the 17th century -- but there are plenty of other examples.
Notation can be incredibly important when trying to discern intent in some disciplines.
The BC in this particular instance might only have barlines every 8 measures. The soprano parts might have barlines every 8 measures too, but might also have them where a significant change in their melodies occurs -- like when a flurry of 16th notes begins to crop up.
Now, this isn't always the case -- it's definitely inconsistent, but such things show up enough that those hints cannot be discounted at all, and seem to be enough of a performance hint to at least be noted and appreciated by the performer.
One could interpret placement of barlines as something close to paragraph markers -- not all the time, but often enough that they seem to indicate specific intent.
APL's unofficial slogan, by the way, is "and you thought Perl was executable linenoise." :-)