
I like Python better as a language, but Python's libraries take more work to understand and the APIs aren't very unified. R is much more regular and the documentation is better. Even complicated and obscure machine learning tasks have good support in R. BUT the performance of R can be very, very annoying. Assignment is slow as all hell, and it often takes work to figure out how to rephrase complicated functions in a way that R can execute efficiently. I think being much more functional than Python works well for data. I mean, the L in LISP stands for list! Visualizations are also easier and more intuitive in R, IMO, especially since half the time you can just wrap some data in "plot" and R will figure out which kind of plot it should use.

I think the conclusion of the article is correct. R is more pleasant for mathier type stuff, while Python is the better general-purpose language. If your job involves showing people PowerPoint presentations of the mathematical analysis you've done, you'd probably want to use R. If, on the other hand, you're prototyping data-driven applications, Python would probably be better.

That said, I really like Julia, but can't justify really diving into it at this point. :\




> prototyping data-driven applications, Python would probably be better

I would disagree. Python's libraries (mainly pandas) are really reimplementing R in Python. I find R to be very flexible, and especially in the last 5 years, with Hadley Wickham's libraries, R code has become concise and very powerful.

Please look at dplyr and see how this new way of doing R works, especially piping with %>%. https://cran.rstudio.com/web/packages/dplyr/vignettes/introd...

Code in R can look like the example below. Even if you don't code in R, I would expect you can see what is happening, which is why I disagree that prototyping in Python would be better:

  flights %>%
    group_by(year, month, day) %>%
    select(arr_delay, dep_delay) %>%
    summarise(
      arr = mean(arr_delay, na.rm = TRUE),
      dep = mean(dep_delay, na.rm = TRUE)
    ) %>%
    filter(arr > 30 | dep > 30)

pandas has .pipe, but I find it strange that the chain breaks onto a new line before each call, so every line starts with a dot.

Python code:

  >>> (df.pipe(h)
  ...    .pipe(g, arg1=a)
  ...    .pipe((f, 'arg2'), arg1=a, arg3=c)
  ... )
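For anyone unsure what that snippet does, here is a minimal sketch of how .pipe passes the frame along. The bodies of h, g and f and the values of a and c are made up purely for illustration; only the call shape comes from the snippet above. The tuple form (f, 'arg2') tells pandas to pass the frame as the keyword argument arg2 instead of as the first positional argument.

```python
import pandas as pd

# Hypothetical helpers standing in for the h, g, f placeholders.
def h(df):
    # The frame arrives as the first positional argument.
    return df.assign(total=df["x"] + df["y"])

def g(df, arg1):
    # Keep only rows whose total exceeds arg1.
    return df[df["total"] > arg1]

def f(arg1, arg2, arg3):
    # Here the frame is NOT the first argument; it arrives as arg2.
    return arg2.assign(scaled=arg2["total"] * arg1 + arg3)

df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})
a, c = 15, 1  # made-up values

# The chained form from the comment above.
result = (
    df.pipe(h)
      .pipe(g, arg1=a)
      .pipe((f, "arg2"), arg1=a, arg3=c)
)

# The same computation written as nested calls, read inside-out.
nested = f(arg1=a, arg2=g(h(df), arg1=a), arg3=c)
```

The chained version reads top to bottom in execution order, which is the same thing %>% buys you in R.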


I find the following Pandas code pretty easy to read:

  (df
   .groupby(['a', 'b', 'c'], as_index=False)
   .agg({'d': 'sum', 'e': 'mean', 'f': 'std'})
   .assign(g=lambda x: x.a / x.c)
   .query("g > 0.05")
   .merge(df2, on='a'))

There are now methods in pandas to do pretty much anything, so you can chain them together into one easy-to-read manipulation without lots of intermediate variables.
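For a side-by-side comparison, the dplyr flights pipeline quoted earlier in the thread could be sketched in pandas roughly as follows. The flights frame here is a tiny made-up stand-in, not the real dataset, and the variable name delayed is my own. Note that pandas' mean skips missing values by default, which plays the role of na.rm = TRUE.

```python
import pandas as pd

# Toy stand-in for the flights table (values are invented for illustration).
flights = pd.DataFrame({
    "year": [2013, 2013, 2013, 2013],
    "month": [1, 1, 1, 1],
    "day": [1, 1, 2, 2],
    "arr_delay": [40.0, 50.0, 5.0, None],
    "dep_delay": [10.0, 20.0, 0.0, 4.0],
})

delayed = (
    flights
    # group_by(year, month, day)
    .groupby(["year", "month", "day"], as_index=False)
    # summarise(arr = mean(arr_delay), dep = mean(dep_delay)); NaN is skipped
    .agg(arr=("arr_delay", "mean"), dep=("dep_delay", "mean"))
    # filter(arr > 30 | dep > 30)
    .query("arr > 30 or dep > 30")
)
```

With this toy data only the first day survives the filter, since its mean arrival delay is above 30.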


> R is much more regular

Compare scikit-learn to the large number of R libraries with incompatible interfaces. In this respect Python is more regular.
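To make the regularity point concrete, here is a small sketch: three unrelated scikit-learn classifiers driven through the same fit/predict calls. The data is a made-up one-feature toy set and the choice of models is arbitrary; the point is only that the loop body never needs to know which model it is handling.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Made-up, clearly separable toy data: small values are class 0, large are class 1.
X = [[0.0], [1.0], [2.0], [3.0], [10.0], [11.0], [12.0], [13.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

models = [
    LogisticRegression(),
    DecisionTreeClassifier(random_state=0),
    KNeighborsClassifier(n_neighbors=3),
]

predictions = {}
for model in models:
    # Every estimator exposes the same fit(X, y) / predict(X) interface.
    model.fit(X, y)
    predictions[type(model).__name__] = model.predict([[1.5], [11.5]]).tolist()
```

Swapping in another classifier means changing one line in the models list and nothing else.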



