
This. I’ve contributed code to popular libraries in both languages, and while I have an overall preference for Python (mostly because it is general purpose), I find R unparalleled when it comes to raw data manipulation/analysis.

The overall API of the tidyverse packages is such a joy, and recent improvements in purrr/tidyr allow me to construct nested data analysis workflows I couldn’t even dream of in Python.




One random example I found recently is a tidyverse package called forcats, which has lots of nice functions for categorical data. For example, it has a single function that merges all categories whose frequency in the table falls below a certain threshold into a new category like "other" or whatever. This is a task I often need to do, but as far as I can see it's a bit of a hack in Python with pandas. It's just lots of little things like this, especially when wrangling data tables.

https://forcats.tidyverse.org/reference/fct_lump.html
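
For comparison, here is a rough pandas sketch of the same idea: count each category, then replace the ones that fall below a threshold with a catch-all label. The helper name `lump_rare`, its `min_count` and `other` parameters, and the sample data are all made up for illustration; this is not a pandas or forcats API, just one way the workaround might look.

```python
import pandas as pd


def lump_rare(values: pd.Series, min_count: int, other: str = "Other") -> pd.Series:
    """Replace categories seen fewer than `min_count` times with `other`.

    Hypothetical helper, not part of pandas.
    """
    counts = values.value_counts()            # frequency of each category
    rare = counts[counts < min_count].index   # categories below the threshold
    # Keep values that are not rare; substitute `other` for the rest.
    return values.where(~values.isin(rare), other)


# Made-up sample data: "green" and "teal" each appear only once.
colors = pd.Series(["red", "red", "red", "blue", "blue", "green", "teal"])
lumped = lump_rare(colors, min_count=2)
print(lumped.tolist())
```

It works, but you have to write and carry the helper around yourself, which is more or less the point being made about forcats.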

There's also the data.table package for this kind of data work, which is maybe less used but seems to have better performance.


Would you have an example of that?



