I also appreciate your idea of porting dplyr to Python. Keep up the good work :)
This table sums up some of it:

| operation | time |
|---|---|
| `apply`: `score + 1` | 30 s |
| `apply`: `score.values + 1` | 3 s |
| `transform`: `score + 1` | 30 s |
| `transform`: `score.values + 1` | 20 s |
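Below is a minimal sketch for reproducing this kind of comparison. It assumes the four operations are per-group `apply`/`transform` calls on a `score` column. The grouping column `g`, the data sizes and the random data are hypothetical and are not the original benchmark, so absolute timings will differ:

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: the original benchmark's shape is unknown.
# "g" is an assumed grouping column; "score" matches the column in the table.
rng = np.random.default_rng(0)
n_rows, n_groups = 20_000, 2_000
df = pd.DataFrame({
    "g": rng.integers(0, n_groups, size=n_rows),
    "score": rng.random(n_rows),
})
grouped = df.groupby("g")["score"]

# The four variants from the table, each run once on the grouped Series.
variants = {
    "apply score + 1": lambda: grouped.apply(lambda s: s + 1),
    "apply score.values + 1": lambda: grouped.apply(lambda s: s.values + 1),
    "transform score + 1": lambda: grouped.transform(lambda s: s + 1),
    "transform score.values + 1": lambda: grouped.transform(lambda s: s.values + 1),
}

for name, fn in variants.items():
    seconds = timeit.timeit(fn, number=1)
    print(f"{name:<28} {seconds:.3f}s")

# Both transform variants return a Series aligned to df's original index.
t_series = grouped.transform(lambda s: s + 1)
t_values = grouped.transform(lambda s: s.values + 1)
```

Only the two `transform` variants are directly comparable element by element, since both return a Series aligned to `df`'s index. With `.values`, `apply` instead returns one array per group.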
It seems to me that pandas is simply a leakier abstraction than dplyr, data.table, etc. As a user of the library, in most cases you shouldn't have to profile your code to figure out why it behaves the way it does (btw, thanks for pointing out snakeviz; it seems like a useful tool).
That being said, we shouldn't complain too much about pandas; in the end, it is a very important and useful tool.