
Python, Machine Learning, and Language Wars (2015) - dekhtiar
http://sebastianraschka.com/blog/2015/why-python.html
======
gtrubetskoy
I'm surprised the article doesn't mention Anaconda, which is Python with all
the things he lists pre-installed for you. I've been a fan for some time now:
[https://www.continuum.io/why-anaconda](https://www.continuum.io/why-anaconda)

~~~
alceufc
I also think that Anaconda is great. However, I hope that in the future we
could install numpy, matplotlib, jupyter, etc. just using pip.

~~~
zo1
I did that earlier today:

pip install jupyter

pip install numpy

pip install scipy

pip install scikit-learn

pip install matplotlib

The only problem I had was with OpenCV, which requires manual make
installation if you want the contrib package. The other problem was when
trying to install scikit-learn, it requires manual pip installation of scipy.
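
One way to confirm that installs like these landed in the active environment is to ask Python which modules it can find. This is only a sketch: the helper name `missing_modules` and the `EXPECTED` list are made up for illustration. Note that import names can differ from pip names (scikit-learn imports as `sklearn`, OpenCV as `cv2`, and `jupyter_core` is checked here as a stand-in for the Jupyter install).

```python
import importlib.util

# Import names (not pip names) for the packages discussed above.
# This list is an illustrative assumption; adjust it to your own setup.
EXPECTED = ["jupyter_core", "numpy", "scipy", "sklearn", "matplotlib", "cv2"]


def missing_modules(names):
    """Return the top-level module names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(EXPECTED)
    print("missing:", ", ".join(missing) if missing else "none")
```

Anything it reports as missing was either not installed or went into a different interpreter than the one running the script.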

~~~
timClicks
I would always be worried with installing numpy/scipy via pip that I wouldn't
be linking to BLAS/LAPACK correctly.

~~~
rhodysurf
What do you mean? I have always installed via pip with zero errors on all
three platforms.

~~~
moyix
The worry is not that it will throw an error, it's that it will silently link
against lower-performance math libraries and make your code inexplicably slow.
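
For anyone wanting to check this on their own machine, a rough sketch follows. It captures what `numpy.show_config()` prints about the BLAS/LAPACK libraries NumPy was built against, and times a single matrix product as a crude sanity check. The helper names `blas_report` and `time_matmul` are hypothetical. What counts as "slow" depends on the hardware, so the timing only means something when compared across installs on the same machine.

```python
import contextlib
import importlib.util
import io
import time


def blas_report():
    """Return NumPy's build/link configuration as text, or a note if NumPy is absent."""
    if importlib.util.find_spec("numpy") is None:
        return "numpy is not installed"
    import numpy as np

    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        np.show_config()  # prints build info, including BLAS/LAPACK details
    return buffer.getvalue()


def time_matmul(n=500):
    """Time one n x n matrix product in seconds; returns None without NumPy."""
    if importlib.util.find_spec("numpy") is None:
        return None
    import numpy as np

    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    start = time.perf_counter()
    a @ b
    return time.perf_counter() - start


if __name__ == "__main__":
    print(blas_report())
    print("matmul seconds:", time_matmul())
```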

------
faaef
Buddha about Language Wars:

Then the Buddha gave advice of extreme importance to the group of Brahmins:
'It is not proper for a wise man who maintains (lit. protects) truth to come
to the conclusion: "This alone is Truth, and everything else is false".' Asked
by the young Brahmin to explain the idea of maintaining or protecting truth,
the Buddha said: 'A man has a faith. If he says, "This is my faith", so far
he maintains truth. But by that he cannot proceed to the absolute
conclusion: "This alone is Truth, and everything else is false".' In other
words, a man may believe what he likes, and he may say 'I believe this'. So
far he respects truth. But because of his belief or faith, he should not say
that what he believes is alone the Truth, and everything else is false.

(from "What the Buddha Taught", a really great book!)

~~~
platz
> 'I believe this'. So far he respects truth

What if what he believes is objectively wrong?

~~~
faaef
He only talks about wise men who protect truth, so the truth being true is a
given I'd say.

EDIT: Of course he's talking to people who swear by the truth they protect.
Instead of telling them they're wrong, telling them others might be right is
far more likely to get them to consider his point of view -- that other
"truths" are just as equal.

------
p4wnc6
I too am a Python-preferring machine learning engineer, but my reasons are
almost precisely the opposite of the post's.

1. No matter how much you ever think, as a scientist, that you "only do an
analysis one time" it is false 99.99999% of the time. You will always want to
run it multiple times. Other people will want help modifying and running
variations of it. Employers will need you, the scientist, to "productionize"
it and make it suitable for automated deployment, probably cross-platform.

2. Your analysis will have to adapt to changing data inputs, which means you
invariably have to create a (well-designed, unit-tested, and best-practices
compliant) tool kit for custom data cleaning, pre-processing, database I/O,
file system I/O, and visualization.

3. You will inevitably need to be concerned with raw-metal performance, but
generally in isolated pockets of your code, so you'll need a language like
Python that supports targeted performance optimization with tools like Cython.

4. Code is read (especially by newbies who need your help) much more than it
is written, so you need a language that is easy to explain and reason about,
with very few syntactical tricks and complicated conceptual nuances.

Overall, Python suits this niche very well. It is a full-service object-
oriented language with a huge and well-maintained standard library. The third
party tools for machine learning and general numeric computing are by far the
best in the open source world (apart from a handful of boutique R libraries,
which you can use via rpy2 anyway), and Python is a simple language that is easy
to teach and explain but also supports lots of targeted optimization in the
CPython layer.

From the very first line of code you write, when you still naively believe "I
will only run this once and I just need to crank it out," you need to be
obsessed with writing well-designed, extensible, unit-tested code that is only
a short distance from already being "production ready" -- and Python is a
great language choice for this.
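
As a rough illustration of point 2 and of testing from the first line, here is a minimal sketch of a cleaning helper shipped together with its test. The function `clean_readings` and its rules (drop blanks, unparseable entries, and non-finite values) are hypothetical. Real cleaning rules depend entirely on the data.

```python
import math


def clean_readings(raw_values):
    """Convert raw readings to floats, dropping blanks, junk, and non-finite values.

    Hypothetical helper for illustration only.
    """
    cleaned = []
    for value in raw_values:
        text = str(value).strip()
        if not text:
            continue  # skip empty entries
        try:
            number = float(text)
        except ValueError:
            continue  # skip unparseable entries such as "n/a"
        if math.isfinite(number):
            cleaned.append(number)  # keep only real, finite numbers
    return cleaned


def test_clean_readings():
    """Tiny test that lives next to the code from day one."""
    assert clean_readings([" 1.5", "", "n/a", "2", "nan", "inf"]) == [1.5, 2.0]
    assert clean_readings([]) == []


if __name__ == "__main__":
    test_clean_readings()
    print("ok")
```

Even a throwaway script is cheaper to rerun on new inputs when its cleaning rules are written down as assertions like these.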

------
danso
As someone who's switched from Ruby to Python (for now, because the latter is
far easier to teach, IMO) and also put significant time into learning R,
because of how strong ggplot2 is...I was really surprised at the lack of
Google results for "switching from python to r" -- or similarly phrased
queries to find guides on how to go from Python to R...in fact, that
particular query will bring up more results for R -> Python than the other way
around (e.g. "Python Displacing R as The Programming Language For
Data")...Talk of R is so ubiquitous in academia (and in the wild, ggplot2
tends to wow nearly on the same level as D3) that I had just assumed there was
a fair number of developers who have tried jumping into R...but there
aren't...I think minimaxir's guides are the most, and only, comprehensive
how-to-do-R-as-written-by-an-outsider things I've seen on the web [1]. But by
far the common scenario is that of the author's: "Well, I guess it’s no big
secret that I was an R person once"

That said, one of the things I've appreciated about R is how it "just
works"...I usually go through Homebrew, but RStudio works just as well. I can
see why that's a huge appeal for both beginners and people who want to do
computation but not necessarily become developers.

Also, I used to hate how `<-` was used for assignment...but now, that's one of
the things I miss most about using R...I've grown up with single-equals-sign
assignment in every other language I've learned, but after having to teach
some programming...the difference between `==` and `=` is a common and often
hugely stumping error for beginners. Not only that, they have trouble
remembering how assignment even works, even for basic variable
assignment...I've come to realize that I've programmed so long that I
immediately recognize the pattern, but that can't possibly be the case for
novices, who if they've taken general math classes, have never seen the equals
sign that way. The `<-` operator makes a lot more sense...though I would've
never thought that if I hadn't read Hadley Wickham's style guide [2]

[1] [http://minimaxir.com/2015/02/ggplot-tutorial/](http://minimaxir.com/2015/02/ggplot-tutorial/)

[2] [http://adv-r.had.co.nz/Style.html](http://adv-r.had.co.nz/Style.html)

~~~
CalRobert
You know, I remember when I was trying my first language other than BASIC
(VB6, perhaps? or maybe 1995 era JS?) and it bugged me that "x = y" wasn't the
same as "y = x". Remembering it as "LET x = y" was helpful.
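
To make the asymmetry concrete, here is a small Python sketch of the points above: `=` assigns in one direction only, so `x = y` and `y = x` do different things, while `==` merely compares. The variable names are arbitrary.

```python
# Assignment: read "=" as LET x = y, not as a statement that both sides are equal.
x = 3
y = 5
x = y            # x takes the value 5; y is unchanged
assert (x, y) == (5, 5)

# Swapping the sides gives a different result from the same starting point.
x = 3
y = 5
y = x            # y takes the value 3; x is unchanged
assert (x, y) == (3, 3)

# Comparison: "==" asks a question and changes nothing.
x = 3
y = 5
are_equal = (x == y)
print(are_equal)  # False
```

Python at least rejects `if x = y:` with a SyntaxError, which catches one common form of the `=` versus `==` mix-up early.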

~~~
ktRolster
People have been upset about that since at least the 1950s. A lot of people
who make programming languages use the := for the assignment operator instead.

~~~
RodgerTheGreat
Helps when you pronounce the assignment operator as "becomes" rather than
"equals", too.

------
akshayB
There are a lot of options when it comes to machine learning frameworks, both
open source and commercial. Also, many of these frameworks are designed to
solve a specific problem, and some are optimized to run on certain types of
hardware. In my opinion, selecting a machine learning framework depends on the
technology stack of your company, because it makes a lot of sense to leverage
existing systems rather than developing everything on an entirely new
infrastructure and language.

------
stared
Previous submission:
[https://news.ycombinator.com/item?id=10113413](https://news.ycombinator.com/item?id=10113413)

------
spot
why pick just one language? with the polyglot Beaker Notebook, you can work
with many languages, even in the same notebook, and your data is automatically
translated between them.

each of these languages has its strong point. there is always some library you
want to use in some other language. or you want to collaborate with someone.
or next year you change your mind and Julia is finally good enough.

[http://BeakerNotebook.com](http://BeakerNotebook.com)

~~~
stuartaxelowen
Knowing other languages is great, but there is very real overhead in learning
them. Python is mature enough that you have mature libraries available for
pretty much everything.

~~~
spot
believe it or not, there are some people who know R and face overhead to learn
Python. and they love ggplot2.

and frankly, Python has no libraries for interactive visualization in your
browser because only JavaScript runs there.

etc.

~~~
evanpw
[http://bokeh.pydata.org/en/latest/](http://bokeh.pydata.org/en/latest/)

"Bokeh is a Python interactive visualization library that targets modern web
browsers for presentation."

(Of course, it works by generating JavaScript)

~~~
spot
right, and in order to really customize what bokeh does you need to write JS
in strings in your python code, so this just proves my point.

[http://bokeh.pydata.org/en/latest/docs/user_guide/interactio...](http://bokeh.pydata.org/en/latest/docs/user_guide/interaction.html#userguide-interaction) see the CustomJS function.

~~~
tanlermin
No you don't. It now does python to JS compilation. And that's if you want
client side callbacks.

------
gavinh

> I know what you want to ask next: “Okay, what about turning my model into a
> nice and shiny web application? I bet this is something that you can’t do in
> R!” Sorry, but you lose this bet; have a look at Shiny by RStudio, a web
> application framework for R.

True, but it is a dumpster fire.

------
knite
Can we get "(2015)" on the title?

