
StructuredQueries.jl – A generic data manipulation framework - one-more-minute
http://julialang.org/blog/2016/10/StructuredQueries
======
cwyers
So, it seems reasonably dplyr-ish, and I like that there's a pipe-forward
operator available (and I do like the look of the F#/Elixir-style |> more
than R's %>%).

This seems like it's written for someone who has a bit more familiarity
with Julia than I possess, though, so is anyone able to give this some
context? Why does this need to be implemented with the @query macro rather
than just being implemented with functions like dplyr is?

~~~
CurtHagenlocher
You can use functions in R because R is a crazy language that lets functions
access the expressions that were passed to them, not just the values of those
expressions.
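
In Julia, by contrast, a plain function only ever sees the values of its
arguments; to get at the expression itself you have to write a macro, which
is presumably why @query is a macro rather than a function. A minimal sketch
of the difference (the macro name here is made up for illustration):

    # A plain Julia function receives only values -- the expression is gone:
    f(x) = string(x)
    f(1 + 2)            # "3"

    # A macro receives the unevaluated expression instead:
    macro sourceof(e)
        return string(e)    # expand to the argument's source text
    end
    @sourceof(1 + 2)    # "1 + 2"

Unlike in R, the @ at the call site makes it visible that the expression,
not just the value, may be inspected.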

~~~
cwyers
You mean like this?

http://adv-r.had.co.nz/Computing-on-the-language.html#nse

So, this use-case would seem to point out one advantage of being a "crazy
language" in this case. What are the downsides of NSE?

~~~
one-more-minute
It's a cool feature which enables some really nice APIs, but it also has the
potential to break your mental model when reading code. For example:

    x <- f(a)
    y <- g(x, z)

vs

    y <- g(f(a), z)

It would be extremely unexpected if the result `y` were different for these two
expressions, but you can't know for sure without reading `f` – since it could
be using NSE. This applies to any innocent-looking function in any library you
use – imagine trying to debug an issue caused by something like this.
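
In Julia that particular rewrite is safe by construction: plain functions
never see expressions, and anything that does must be spelled with an @ at
the call site. A small sketch, with throwaway definitions of f and g:

    f(a) = a + 1
    g(x, z) = x * z
    a, z = 2, 10

    x = f(a)
    g(x, z) == g(f(a), z)   # true -- guaranteed, since f and g only see values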

It's also awful for performance. Allowing functions to peek at their arguments
at any time is one of the things that makes R virtually impossible to optimise
effectively.

Of course, libraries like dplyr also show that it can be used to great effect
in the right hands.

~~~
GlennS
I think this is a big part of why R is such a success amongst data analysts.

A lot of R libraries have a really good human interface which you can play
around with and it will successfully DWIM a lot of the time.

Once you start trying to drive R in a more automatic way with your own
programs, though, it starts to get really messy. I wouldn't want to write
even a medium-sized program in it (10,000 lines would be too many, for
example). Then it's time to switch to Python with numpy.

I wonder if Julia will be able to cover both types of program effectively.
There's definitely a niche for it if it can.

------
olvar_
It could be that I'm misunderstanding the problem, but I don't think it
makes a lot of sense, in general, to define functions acting on NA values.
What you really want are functions acting on arrays that may be missing some
values, not necessarily functions that can act on the missing values
themselves. The NA values are an artifact of how the data was tabulated, not
some deeper mathematical necessity.

In that sense, wouldn't it be much easier to endow every array with a second
list recording the positions where values are not available, and to define
procedures that use ordinary underlying functions to compute the results
while filtering out the NAs? For instance, you could have two "mean"
functions: one that filters out the NAs and returns the mean of the rest,
and another that returns the mean of the whole array, returning NA if any
value is missing. Both could use the mean(x::Number) function, just wrapped
in different ways. Or is this missing the point completely?
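
A minimal Julia sketch of that idea (the MaskedArray type and both function
names are made up for illustration):

    using Statistics: mean   # in current Julia, mean lives in Statistics

    # An array plus a record of which positions are "not available".
    struct MaskedArray{T}
        data::Vector{T}
        na::Set{Int}
    end

    # Wrapper 1: drop the NA positions, then reuse the ordinary mean.
    mean_skipna(a::MaskedArray) =
        mean(v for (i, v) in enumerate(a.data) if !(i in a.na))

    # Wrapper 2: propagate -- any NA makes the whole result NA.
    mean_or_na(a::MaskedArray) =
        isempty(a.na) ? mean(a.data) : missing

    a = MaskedArray([1.0, 2.0, 4.0], Set([2]))
    mean_skipna(a)   # 2.5 (mean of 1.0 and 4.0)
    mean_or_na(a)    # missing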

