How does Pandas compare to R's Tidyverse?

Tidyverse was super easy to pick up, and I can do almost anything I want with it. Why would I want to switch to Pandas?

Has anyone tried the Python tidyverse port? How does it compare to the original?

Echoing other comments, Tidyverse is somewhat more coherent (aided significantly by magrittr's %>% operator). Beginners might get tripped up by Non-Standard Evaluation (NSE), which is a little unintuitive, but there are packages to help with that.
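For comparison, here is a rough sketch of what a piped workflow tends to look like on the Pandas side, using method chaining and `DataFrame.pipe`. The `flights` table and the `add_delay_hours` helper are made up for illustration; this is not a port of any particular Tidyverse pipeline.

```python
import pandas as pd

# Hypothetical toy data, invented for this example.
flights = pd.DataFrame(
    {
        "carrier": ["AA", "AA", "UA", "UA"],
        "delay_min": [5, 45, 0, 30],
    }
)


def add_delay_hours(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: add the delay expressed in hours."""
    return df.assign(delay_h=df["delay_min"] / 60)


# Each step returns a new DataFrame, so the steps read top to bottom,
# loosely like a chain of %>% calls.
summary = (
    flights
    .query("delay_min > 0")            # keep delayed flights only
    .pipe(add_delay_hours)             # plug a plain function into the chain
    .groupby("carrier", as_index=False)
    .agg(mean_delay_h=("delay_h", "mean"))
    .sort_values("mean_delay_h", ascending=False)
)
```

Note that `query` takes its expression as a string, which is about as close as Pandas gets to NSE; everything else refers to columns by quoted name.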

The Pandas API is a generalized solution to complicated, variegated use cases, and its syntax reflects that (it was also hemmed in by the strictures of Python). There are several indexing methods, several ways to slice, and several ways to apply functions, all of which behave slightly differently. Even expert Pandas users have trouble remembering the syntax for all of these, so they typically have a Pandas API browser window open or a printed cheat sheet pasted on some corkboard. Pandas definitely takes longer to get used to than Tidyverse, but the payoff is that you get to use Python, which is a somewhat "deeper" language than R.
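A small sketch of the kind of overlap being described, on a made-up three-row table (the data and variable names are purely illustrative). The integer index labels are deliberately not 0, 1, 2, so label-based and position-based access give different answers.

```python
import pandas as pd

# Hypothetical toy data; index labels are 10, 20, 30 on purpose.
df = pd.DataFrame(
    {"city": ["Oslo", "Lima", "Pune"], "pop_m": [0.7, 10.0, 7.0]},
    index=[10, 20, 30],
)

# Indexing and slicing: same-looking syntax, different rules.
by_label = df.loc[10:20]        # label-based, end label is included
by_position = df.iloc[0:1]      # position-based, end position is excluded
one_column = df["city"]         # [] with a string selects a column
big_rows = df[df["pop_m"] > 5]  # [] with a boolean mask selects rows

# Applying functions: each call hands the function something different.
per_value = df["pop_m"].apply(lambda v: v * 2)            # one scalar at a time
per_column = df[["pop_m"]].apply(lambda col: col.max())   # one column (Series) at a time
per_row = df.apply(
    lambda row: f"{row['city']}: {row['pop_m']}", axis=1  # one row (Series) at a time
)
```

Here `by_label` has two rows and `by_position` has one, even though both slices look like "from the start up to the second item".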

R is great for interactive work, and for data munging jobs that don't interact too much with non-R libraries. However, Python is simply more versatile end-to-end.

I used to start my interactive analysis in R and port to Python for production, but these days I start in Python straight away so there's no impedance mismatch. I've personally found writing production code in Python (and by extension Pandas) to be much more pleasant than in R, even with Tidyverse.

The Tidyverse is more coherent and is generally bigger than what’s just in Pandas (R’s Tidyverse; I haven’t used the Python port).

If you already have a good grasp of Python, sure, why not learn Pandas too? In my case, I’m reasonably ambidextrous in Python and R but find myself not reaching for Python unless there are colleague / deployment considerations that remove R as an option. The reason? R’s Tidyverse is pretty awesome, and reflects one of the better parts of the R language, namely the metaprogramming that is a holdover from Scheme’s influence on R.

Now, if you don’t already know Python and don’t have some other reason (such as specific deployment considerations or a team of Python collaborators) to learn it? I don’t think so. Python is a fine language, just as R is a fine language. You’re already getting things done in R.

If you want a mental challenge, or to get in on the ground floor of something that might be the future, learn Julia, or F#, or (my favorite) Racket. Or heck, learn Spark, or a new modeling method.

Pandas' syntax and conventions are significantly more cumbersome than R's, but it does pretty well given the Python syntax and conventions that it has to work with. I haven't done a lot with pandas because of how difficult it is to remember the syntax and API, but I feel it's good enough that if you're already a Python user, you can stick to doing your data work in pandas rather than move over to R.

I haven't used tidyverse myself, but I know that pandas is heavily influenced and inspired by R. Most analysis tasks are doable in both platforms. If later stages of your pipeline involve deep learning (or machine learning, generally), then it could pay to be in the python universe given the wide adoption of python ML/DL tools. I generally wouldn't advise switching unless you have a certain pain point, though.

> pandas is heavily influenced and inspired by R.

Is it? How so?

I use both: Python/Pandas for working with production code and pipelining TensorFlow/Keras code, and R/tidyverse/ggplot2 for ad hoc data reports and visualizations. They both have their advantages and disadvantages and it doesn't hurt to know both workflows.

I find pandas far easier to actually program with, whereas the tidyverse is better for quick one-off scripts. The tidyverse's obsession with non-standard evaluation makes writing functions more difficult than it should be, and readability goes out the window when using tidyeval.
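To illustrate the point about writing functions: in pandas, column names are ordinary strings, so they can be passed as regular arguments with no quoting or unquoting step. A minimal sketch, where `mean_by` and the `sales` table are hypothetical names made up for the example:

```python
import pandas as pd


def mean_by(df: pd.DataFrame, group_col: str, value_col: str) -> pd.Series:
    """Average value_col within each group_col; both are plain strings."""
    return df.groupby(group_col)[value_col].mean()


# Hypothetical toy data.
sales = pd.DataFrame(
    {
        "region": ["east", "east", "west"],
        "units": [10, 20, 6],
        "price": [1.0, 3.0, 5.0],
    }
)

# The same function works for any column pair, chosen at call time.
units_by_region = mean_by(sales, "region", "units")
price_by_region = mean_by(sales, "region", "price")
```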

Neural net universe is in Python and you can use Python to build production pipelines.

Pandas is inspired by R's dataframes, which I'm told are native.

Native doesn't necessarily mean it's the best option. (tidyverse/dplyr leverages Rcpp for data transformation, which makes it a lot faster at common ETL tasks)
