
Don't forget that the higher-level functionality (e.g. the scikit-learn routines Radim uses) is typically a wrapper around underlying C/Fortran routines, and those compiled routines are where the time is actually spent. The relatively few lines of interpreted Python that drive them are 'slow' compared to e.g. C, but they aren't the bottleneck.
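A minimal sketch of that split, assuming NumPy is installed: the same dot product written as a pure-Python loop (every multiply/add goes through the interpreter) and as a single call into NumPy, which hands the whole loop to compiled code (typically a BLAS routine). The function names `python_dot` and `numpy_dot` are hypothetical, chosen only for illustration; actual timings will vary by machine.

```python
import math
import timeit

import numpy as np


def python_dot(a, b):
    # Pure-Python loop: each iteration is executed by the interpreter.
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total


def numpy_dot(a, b):
    # One Python-level call; the loop itself runs in compiled code.
    return float(np.dot(a, b))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.random(200_000)
    b = rng.random(200_000)
    a_list, b_list = a.tolist(), b.tolist()

    # Same answer, up to differences in floating-point summation order.
    assert math.isclose(python_dot(a_list, b_list), numpy_dot(a, b), rel_tol=1e-9)

    t_py = timeit.timeit(lambda: python_dot(a_list, b_list), number=5)
    t_np = timeit.timeit(lambda: numpy_dot(a, b), number=5)
    print(f"pure Python: {t_py:.4f}s  NumPy: {t_np:.4f}s")
```

The Python code in `numpy_dot` is a single line; almost all of its runtime is spent inside the compiled routine, which is the point being made above.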

The win with Python (and other dynamic languages) is that you can experiment quickly with ideas while you're formulating a solution, which is a big part of exploratory data science.

If you're curious about high-speed work in Python - Radim did a blog series on how he sped up word2vec to be faster than Google's original C code: http://radimrehurek.com/2013/09/deep-learning-with-word2vec-...

I'll also note [self promo!] that I wrote a book on High Performance Python, if that's your cup of tea (and Radim wrote a section in it): http://shop.oreilly.com/product/0636920028963.do




(tutorial author here) Good answer, and I can only recommend Ian's book!

I cut the marketing speak down to a minimum in my articles and tutorials, but if you're interested in cutting-edge machine learning & no-nonsense data mining, get in touch! I run a world-class consulting company, http://radimrehurek.com.


> The win with Python (and other dynamic languages) is that you can experiment quickly with ideas when you're formulating a solution, that's a big part of exploratory data science.

And in my experience, very hard to reproduce after a couple of years. With enough discipline, it's obviously possible to make well-structured Python programs that will last. But in practice that rarely happens with scientific software written in Python. Usually, there are many external dependencies, it's fragile (no static type checking), and platform-dependent (usually OS X or Linux). To add to the mess, most scientists like to hardcode paths to the input data, etc.
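One common way to avoid the hardcoded-path problem is to take the input location as a command-line argument and to annotate types so a static checker such as mypy can catch some mistakes. A minimal sketch, assuming a CSV input; the names `load_rows`, `parse_args`, and `main` are hypothetical and not taken from any project mentioned here.

```python
from __future__ import annotations

import argparse
import csv
from pathlib import Path


def load_rows(path: Path) -> list[dict[str, str]]:
    # Read the input from whatever path the caller supplies, not a baked-in one.
    with path.open(newline="") as f:
        return list(csv.DictReader(f))


def parse_args(argv: list[str] | None = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="Run the analysis on a CSV file.")
    parser.add_argument("input", type=Path, help="path to the input CSV file")
    return parser.parse_args(argv)


def main(argv: list[str] | None = None) -> int:
    args = parse_args(argv)
    rows = load_rows(args.input)
    print(f"loaded {len(rows)} rows from {args.input}")
    return 0
```

In a script you would finish with `if __name__ == "__main__": raise SystemExit(main())`, so the data location lives on the command line (e.g. `python analysis.py data/input.csv`) rather than in the source.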

Although I am not a fan of Java, I usually don't encounter the same problems with older scientific Java software. If it's Mavenized you are usually ready to go after a 'mvn compile', otherwise, you just dump the project structure in an IDE and it usually works.

(The plague with scientific software in Java is that it is often not thread-safe.)

Also, I think the quick experimentation is not limited to Python and statically typed languages with a REPL can also provide that (Haskell, OCaml, Scala). And since Go was mentioned: since compilation time in Go is usually near-zero, it's the same.


> And in my experience, very hard to reproduce after a couple of years.

Well, let's be honest with ourselves... this isn't limited to Python. Scientific code that isn't a mess is almost nonexistent. For a lot of scientists, writing code is totally secondary and many simply aren't skilled programmers (nor should we necessarily expect them to be).

It is, however, deeper than that. As a graduate student, I was involved in a government initiative to write a high-quality, large-scale code package. This was (still is, the program just got extended) a well-funded and well-organized effort with hundreds of people, including literally dozens of people who can legitimately claim to be the best in the world at their specialties. This included some genuinely amazing computer scientists and software engineers who enforced well-planned coding practices.

And yet, the code is still far from ideal. A big part of this is its scale - millions of lines of very technical numerics code and libraries all working together. Most of what I consider to be the toughest work was on integrating various disparate pieces and unifying them under one common input structure.

Point being, even with effectively unlimited resources, rigorous development standards, and statically typed languages (primarily C++11), there are still tons of issues. A lot of it is because of incorporation of older codes, which is inescapable in any non-trivial scientific code.


> I'll also note [self promo!] that I wrote a book on High Performance Python

I've really enjoyed this book so far, so thanks!


Glad you enjoyed it :-) If you have a moment, leaving a review (e.g. on Amazon) would be most appreciated (there's a dearth of reviews, as it is a bit of a niche subject!)


Nice! Just bought your book! :)


Also you've given some amazing talks in various PyCons!


Much obliged (assuming that's directed at me!) - Radim's started with some rather nice talks too :-)




