
Python's main problem is that it's moving in a CS direction and not a data science direction.

The "weekend hack" that was Python, a philosophy carried into 2.x, made it a supremely pragmatic language, which the data scientists love. They want to think algorithms and maths. The language must not get in the way.

3.x wants to be serious. It wants to take on Golang, JavaScript, Java. It wants to be taken seriously: enterprise and web. There is nothing in 3.x for data scientists other than the fig leaf of the @ operator. It's more complicated to do simple stuff in 3.x. It may be more robust from a theoretical point of view, but it also imposes a cognitive overhead on people whose minds are already FULL of their algo problems and who just want to get from a -> b as easily as possible, without CS purity or implementation elegance putting up barriers to pragmatism (I give you Unicode v ASCII, print() v print, xrange v range, 01 v 1 (the first is a SyntaxError in 3.x; why exactly?), the focus on concurrency rather than raw parallelism; the list goes on).
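To make the complaints concrete, here is a minimal, runnable Python 3 sketch of the breakages and additions listed above. The `Mat` class is purely hypothetical, invented only to show what the @ operator dispatches to (PEP 465's `__matmul__`); in practice you'd use NumPy.

```python
# print is now a function; `print "hello"` is a SyntaxError in 3.x.
print("hello")

# xrange is gone; range itself is now lazy (it behaves like 2.x's xrange).
assert list(range(3)) == [0, 1, 2]
try:
    xrange(3)
except NameError:
    pass  # NameError: name 'xrange' is not defined

# Leading-zero integer literals are a SyntaxError in 3.x;
# 2.x silently read 01 as octal, 3.x demands an explicit 0o prefix.
assert eval("0o1") == 1
try:
    eval("01")
except SyntaxError:
    pass  # "leading zeros in decimal integer literals are not permitted"

# The @ operator (new in 3.5, PEP 465) calls __matmul__.
# Toy matrix class, for illustration only:
class Mat:
    def __init__(self, rows):
        self.rows = rows

    def __matmul__(self, other):
        cols = list(zip(*other.rows))
        return Mat([[sum(a * b for a, b in zip(row, col)) for col in cols]
                    for row in self.rows])

identity = Mat([[1, 0], [0, 1]])
m = Mat([[2, 3], [4, 5]])
assert (m @ identity).rows == [[2, 3], [4, 5]]
```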

R wants to get things done, and is vectors-first. Vectors are what big data typically is all about (if not matrices and tensors). That's an order of magnitude higher dimensionality in the default, canonical data structure. The apply functions and vector-wise indexing in R feel natural. NumPy makes a good effort, but must still operate in the scalar/OO world of its host language, and inconsistencies inevitably creep in, even in Pandas.
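A short sketch of the vector-first style being contrasted here, written against NumPy (assumed installed), including one example of the scalar/vector seam that the host language introduces:

```python
import numpy as np

v = np.array([1.0, 2.0, 3.0, 4.0])

# Whole-vector operations, R-style: no explicit loop.
assert (v * 2).tolist() == [2.0, 4.0, 6.0, 8.0]

# Logical (boolean-mask) indexing, much like R's v[v > 2].
assert v[v > 2].tolist() == [3.0, 4.0]

# One scalar/vector seam: indexing with an int drops out of the
# vector world to a scalar-like object, while a one-element slice
# stays a vector. R's default has no such distinction.
assert v[0:1].shape == (1,)          # still an array
assert isinstance(v[0].item(), float)  # back to a host-language scalar
```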

As a final point, I'll suggest that R is much closer to the vectorised future, and that even if it is tragically slow, it will train your mind in the first steps towards "thinking parallel".



