
Dplython: Dplyr for Python - kiechu
https://github.com/dodger487/dplython
======
carljv
This is an interesting project, and it's illuminating to see what it takes to
emulate some R features in Python (custom infix ops, non-standard evaluation,
dataframes as namespaces/environments, etc.)

But I feel like it would be better to use method chaining for piping
transformations rather than overloading dunder method operators. It would
preserve one of the nice things about dplyr, composing complicated
transformations from a simple vocabulary, while being more Pythonic. That
vocabulary is something I see as a relative weakness in the design of pandas,
and I would love to see it ported over.
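
To make the method-chaining suggestion concrete, here is a minimal sketch in plain Python. The `Table` class and its `filter`/`mutate`/`select`/`arrange` methods are hypothetical (named after dplyr's verbs), not Dplython's or pandas' API, and rows are plain dicts so the example has no dependencies. Each verb returns a new `Table`, so a pipeline reads top to bottom without any overloaded operators.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass(frozen=True)
class Table:
    """Hypothetical immutable table; every verb returns a new Table so calls chain."""
    rows: List[Row]

    def filter(self, pred: Callable[[Row], bool]) -> "Table":
        # Keep only the rows for which the predicate is true.
        return Table([r for r in self.rows if pred(r)])

    def mutate(self, **cols: Callable[[Row], Any]) -> "Table":
        # Add or overwrite columns; they are computed in order, so a later
        # column may refer to an earlier one (as in dplyr's mutate).
        out = []
        for r in self.rows:
            new = dict(r)  # copy, so the input table is left untouched
            for name, fn in cols.items():
                new[name] = fn(new)
            out.append(new)
        return Table(out)

    def select(self, *names: str) -> "Table":
        # Keep only the named columns, in the given order.
        return Table([{n: r[n] for n in names} for r in self.rows])

    def arrange(self, key: str, descending: bool = False) -> "Table":
        # Sort rows by a single column.
        return Table(sorted(self.rows, key=lambda r: r[key], reverse=descending))


diamonds = Table([
    {"cut": "Ideal", "carat": 0.5, "price": 1500},
    {"cut": "Fair", "carat": 1.2, "price": 3000},
    {"cut": "Ideal", "carat": 1.0, "price": 5000},
])

result = (
    diamonds
    .filter(lambda r: r["cut"] == "Ideal")
    .mutate(price_per_carat=lambda r: r["price"] / r["carat"])
    .select("carat", "price_per_carat")
    .arrange("price_per_carat", descending=True)
)
print(result.rows)
```

The design choice is the one suggested above: the "vocabulary" lives in ordinary methods, so editors, docs, and tab completion all work as usual, at the cost of lambdas where dplyr would use bare column names.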

But also, dplyr really goes beyond pandas. It's an elegant, SQL-like DSL for
transforming (mostly) arbitrary data. In this way it's more like LINQ than a
specific implementation/API of a data structure.

------
CurtHagenlocher
It looks like both Dplython and pandas-ply are missing one of (what I think
is) the core value propositions of dplyr: the ability to use the same
abstractions on local data and on remote data, with execution against the
remote source happening lazily such that the entire table doesn't need to be
downloaded in order to run a filter locally.

(Of course, I may be biased in that I work on a commercial product which also
has this characteristic.)
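
The lazy, push-down behavior described here can be sketched with nothing but the standard library. In this hypothetical `LazyQuery` (not an API of dplyr, Dplython, pandas-ply, or Blaze), each verb only records a SQL fragment, and nothing is sent to the database until `collect()`, so the filter runs inside SQLite and only matching rows come back. SQLite stands in for any remote source.

```python
import sqlite3
from typing import List, Tuple


class LazyQuery:
    """Hypothetical lazy table: verbs record SQL fragments; only collect() runs a query."""

    def __init__(self, conn: sqlite3.Connection, table: str,
                 wheres: Tuple[str, ...] = (), params: Tuple = (),
                 columns: Tuple[str, ...] = ("*",)):
        self.conn = conn
        self.table = table
        self.wheres = wheres
        self.params = params
        self.columns = columns

    def filter(self, condition: str, *params) -> "LazyQuery":
        # Record the condition and its bound values; no query is executed here.
        return LazyQuery(self.conn, self.table, self.wheres + (condition,),
                         self.params + params, self.columns)

    def select(self, *columns: str) -> "LazyQuery":
        return LazyQuery(self.conn, self.table, self.wheres, self.params, columns)

    def to_sql(self) -> str:
        # Table/column names are interpolated directly (trusted input only);
        # values go through "?" parameters.
        sql = f"SELECT {', '.join(self.columns)} FROM {self.table}"
        if self.wheres:
            sql += " WHERE " + " AND ".join(f"({w})" for w in self.wheres)
        return sql

    def collect(self) -> List[tuple]:
        # Only now does the database do the work and return the matching rows.
        return self.conn.execute(self.to_sql(), self.params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE flights (carrier TEXT, dep_delay INTEGER)")
conn.executemany("INSERT INTO flights VALUES (?, ?)",
                 [("AA", 5), ("UA", 45), ("AA", 60), ("DL", -3)])

q = LazyQuery(conn, "flights").filter("dep_delay > ?", 30).select("carrier", "dep_delay")
print(q.to_sql())  # SELECT carrier, dep_delay FROM flights WHERE (dep_delay > ?)
print(q.collect())
```

Because the same verbs could instead be applied to an in-memory table, this is the "same abstraction, local or remote" property; the row order from `collect()` is not guaranteed without an `ORDER BY`.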

~~~
capybara
Dplython developer here... Definitely a big and important next step. I wanted
to get some feedback on the in-memory case first.

~~~
tanlermin
Blaze has already spent a lot of time on this use case and seems quite
promising for a whole host of other reasons. Perhaps it would be best to
combine forces?

Also see this:

[https://github.com/blaze/blaze/pull/484](https://github.com/blaze/blaze/pull/484)

~~~
capybara
The idea of putting the column names into the namespace is very interesting,
and blaze looks pretty promising overall as a way to connect to big data
sources.

------
baldfat
Piping in R actually comes from
[https://github.com/smbache/magrittr](https://github.com/smbache/magrittr), not
dplyr, and is itself inspired by F#.

"R package to bring forward-piping features ala F#'s |> operator."

~~~
andrewla
At least from a historical perspective, dplyr introduced the `%.%` operator at
a time when Hadley was not aware that magrittr existed; once it became clear
that the work had already been done, dplyr was retrofitted to use magrittr's
`%>%`, which is easier to type because you don't have to let go of the shift
key.

That said, the hoops you have to jump through when interfacing with stock R
code and dplyr make me think that having an operator that is less generic than
`%>%` makes sense: `X %>% foo()` is equivalent to `foo(X)`, except when it
isn't. `X %>% plot(x~y, data=.)` is not equivalent to
`plot(X, x~y, data=.)`, and sometimes you need tricks like
`X %>% { plot(.$x, .$y) }` to work around the non-obvious behaviors.
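
The placeholder rule is easy to reproduce in Python, which shows why it surprises people. Below, a hypothetical `Call` wrapper and `DOT` placeholder (illustrative names, not part of Dplython or any library) roughly follow magrittr's documented behavior: the piped value is inserted as the first argument unless the placeholder appears as a direct argument, in which case it is substituted there instead. Nested uses of the placeholder are not modeled.

```python
from typing import Any, Callable


class _Dot:
    """Placeholder standing in for the piped value, playing the role of magrittr's `.`."""

    def __repr__(self) -> str:
        return "DOT"


DOT = _Dot()


class Call:
    """Hypothetical deferred call used on the right-hand side of `value >> Call(...)`."""

    def __init__(self, fn: Callable[..., Any], *args: Any, **kwargs: Any):
        self.fn, self.args, self.kwargs = fn, args, kwargs

    def __rrshift__(self, value: Any) -> Any:
        # Python calls this for `value >> Call(...)` when value's own `>>`
        # returns NotImplemented (true for lists, strings, ints, ...).
        uses_dot = any(a is DOT for a in self.args) or any(
            v is DOT for v in self.kwargs.values())
        if uses_dot:
            # Placeholder given as a direct argument: substitute it and do NOT
            # also insert the value as the first argument.
            args = [value if a is DOT else a for a in self.args]
            kwargs = {k: value if v is DOT else v for k, v in self.kwargs.items()}
            return self.fn(*args, **kwargs)
        # Default rule: `X >> Call(foo)` behaves like foo(X).
        return self.fn(value, *self.args, **self.kwargs)


def describe(label: str, data: Any = None) -> str:
    return f"{label}: {data}"


print([3, 1, 2] >> Call(sorted))                    # [1, 2, 3]
print([3, 1, 2] >> Call(describe, "xs", data=DOT))  # xs: [3, 1, 2]
```

The two calls look alike, but the second one silently switches from "insert first" to "substitute", which is exactly the kind of implicit branch that a less generic operator would avoid.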

I find most impressive the work that this project did to emulate the lazy
evaluation that is a hallmark of dplyr. I wasn't aware that Python had
anything like this ability.
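
One common way to get dplyr-like deferred evaluation in Python is to have column references build expression objects instead of values, and evaluate them only once data is supplied. A minimal sketch of that idea follows; `X`, `Later`, and `evaluate` are illustrative names, not taken from Dplython's source, and the example works on plain dicts rather than DataFrames.

```python
import operator
from typing import Any, Callable, Dict

Row = Dict[str, Any]


def _lift(value: Any) -> "Later":
    # Wrap plain constants so they can be combined with deferred expressions.
    return value if isinstance(value, Later) else Later(lambda row: value)


class Later:
    """Hypothetical deferred expression: a function of a row, built now, evaluated later."""

    def __init__(self, fn: Callable[[Row], Any]):
        self._fn = fn

    def evaluate(self, row: Row) -> Any:
        return self._fn(row)

    def _combine(self, other: Any, op: Callable[[Any, Any], Any]) -> "Later":
        other = _lift(other)
        return Later(lambda row: op(self.evaluate(row), other.evaluate(row)))

    def __add__(self, other: Any) -> "Later":
        return self._combine(other, operator.add)

    def __sub__(self, other: Any) -> "Later":
        return self._combine(other, operator.sub)

    def __mul__(self, other: Any) -> "Later":
        return self._combine(other, operator.mul)

    def __truediv__(self, other: Any) -> "Later":
        return self._combine(other, operator.truediv)

    def __gt__(self, other: Any) -> "Later":
        return self._combine(other, operator.gt)

    def __lt__(self, other: Any) -> "Later":
        return self._combine(other, operator.lt)


class _Columns:
    """`X.price` returns a Later that looks up column 'price' when evaluated."""

    def __getattr__(self, name: str) -> Later:
        return Later(lambda row: row[name])


X = _Columns()

price_per_carat = X.price / X.carat  # builds an expression; nothing is computed yet
is_big = X.carat > 1

row = {"carat": 2.0, "price": 8000}
print(price_per_carat.evaluate(row))  # 4000.0
print(is_big.evaluate(row))           # True

rows = [{"carat": 0.5, "price": 1500}, {"carat": 2.0, "price": 8000}]
print([r for r in rows if is_big.evaluate(r)])  # [{'carat': 2.0, 'price': 8000}]
```

Since `row[name]` also works on a pandas DataFrame (returning a column), passing a DataFrame to `evaluate` should yield a Series computed column-wise, though that path isn't exercised here.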

~~~
baldfat
Well, Python has been building its own R equivalent for a while now: pandas.
Pandas and related work have made Python a good choice for data science work
(though I prefer R, and started in Python years ago). The pandas and Python
data science developers have been awesome; the wider Python community, not so
much (in regard to flaming R).

------
dandermotj
Looks like the Hadleyverse is spreading! Honestly, I think his contributions
have led the way to R being the leading language in data science.

------
stared
I was not aware this was possible in Python, but now I see:
[http://stackoverflow.com/questions/33658355/piping-output-fr...](http://stackoverflow.com/questions/33658355/piping-output-from-one-function-to-another-using-python-infix-syntax)
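
The usual trick behind this kind of infix syntax is Python's reflected operators. The sketch below follows a widely circulated "Infix" recipe, written from memory rather than copied from the linked answer; `Infix` and `then` are illustrative names. It relies on the left operand's `__or__` returning `NotImplemented` for unknown types, which holds for built-ins like lists and ints but is worth checking for other types.

```python
import operator
from typing import Any, Callable


class Infix:
    """Wraps a two-argument function f so that `a |op| b` evaluates f(a, b)."""

    def __init__(self, function: Callable[..., Any]):
        self.function = function

    def __ror__(self, left: Any) -> "Infix":
        # Python reads `a |op| b` as `(a | op) | b`. Since `a` doesn't know how
        # to `|` an Infix, Python falls back to this reflected method, which
        # captures `a` and returns a half-applied Infix.
        return Infix(lambda right: self.function(left, right))

    def __or__(self, right: Any) -> Any:
        # `(a |op) | b`: the half-applied Infix now receives the right operand.
        return self.function(right)


then = Infix(lambda value, fn: fn(value))  # hypothetical "pipe" operator
add = Infix(operator.add)

print([3, 1, 2] |then| sorted)             # [1, 2, 3]
print([3, 1, 2] |then| sorted |then| len)  # 3
print(2 |add| 3)                           # 5
```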

------
lamecicle
I'm hugely excited by this.

