
Ask HN: Why is Python so popular for ML/DS? - arialeks
I'd like to start learning ML and it seems like Python (and R) is the default choice. But why? Why aren't languages like C#/Go/Java relevant in that area, or at least a compelling option?
======
techjuice
Java, R, Python, C and C++ are massive in machine learning and data science.
The reason Python is at the top is that it has the largest set of machine
learning and data analysis tools and tutorials, the easiest setup and
usability, and little to no boilerplate needed to get results.

------
Sevii
Python was big in scientific computing before ML went mainstream. In college
we were taught using MATLAB, Mathematica and SciPy. So there were a lot of
stats and science libraries available in Python. Academics typically don’t
care about software engineering or static typing, so Python is an easy,
approachable choice.

~~~
askvictor
Add IPython (now Jupyter) notebooks to this for easy-to-publish analyses.

------
stealthcat
Prototyping is what you usually do in ML/DS. You code incrementally and break
things really fast.

The interactive modes of Python (IPython/Jupyter) and R are really
indispensable for this. Without restarting and re-initializing every variable
and module from the beginning, you can run a code snippet, edit part of the
code, run, add some code, plot something, run, repeat...

~~~
gesman
This, plus a gazillion optimized math functions and algorithms that are
available right out of the box for all platforms.

------
IpV8
Anecdotally, I have programmed in python, js, java, c#, go, C, C++, ruby, and
php all in a professional environment and I'd have to say I prefer python the
most. It is a very logically made language with a nice balance of abstraction
and expressibility. My general language choosing path is: can I do it in
python? If so, use python. Obviously you need to drop down to lower level
languages for certain cases, but why work with memory management for
application code if you can help it? I guess my point is that if you're going
to be learning ML, you should be thrilled that you have the option of using
python. That said, if other languages float your boat, I've certainly done some
ML in C++ with OpenCV and it was a positive experience. Use whatever you want.

~~~
stefanpie
Python also lets you write the smaller, more computationally intensive parts
of your code in C and call that C code from Python. This allows you to dip
into low-level languages as you said, while keeping all the main functionality
in Python.
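
As a rough illustration of calling C from Python, here is a minimal sketch using the standard library's ctypes module. It calls the C math library's existing sqrt rather than hand-written C, and the library lookup assumes a POSIX-like system; load_c_sqrt is a hypothetical helper name, not something from this thread.

```python
import ctypes
import ctypes.util
import math


def load_c_sqrt():
    """Return the C library's sqrt as a Python callable.

    Assumption: a POSIX-like system where the C math library can be found.
    """
    path = ctypes.util.find_library("m")
    # CDLL(None) falls back to symbols already loaded into the process.
    lib = ctypes.CDLL(path) if path else ctypes.CDLL(None)
    c_sqrt = lib.sqrt
    # Declare the C signature: double sqrt(double).
    c_sqrt.argtypes = [ctypes.c_double]
    c_sqrt.restype = ctypes.c_double
    return c_sqrt


c_sqrt = load_c_sqrt()
# The C result should agree with Python's own math.sqrt.
print(c_sqrt(2.0), math.sqrt(2.0))
```

In practice, packages like numpy do this kind of wrapping for you, so most of the time you never touch the C layer directly.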

------
eggie5
It's just where a lot of the tools are. And the reason all the tools are in
the Python camp is probably that the academic/scientific community adopted the
Python ecosystem at a high rate.

~~~
tedmiston
And many in scientific industries came from tools like Matlab backed by heavy
enterprise support contracts. Python was the first real foray into open source
(FOSS) for many of them.

Some adopted R as well but it just didn't take off the same way even though it
really was/is a better fit for some use cases.

------
rpedela
If the question is historical, it is because Python is easy to learn and easy
to read, interfaces easily with C/C++ so existing ML/DS code can be reused,
and was one of the first scripting languages to handle huge numbers, which is
important in science.
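
Assuming "huge numbers" here refers to Python's built-in arbitrary-precision integers, a small sketch of what that looks like (the variable names are just for illustration):

```python
import math

# Python's built-in int is arbitrary precision, so values past 64 bits stay exact.
big = 2 ** 64                  # one more than the largest unsigned 64-bit value
bigger = math.factorial(30)    # far beyond what a 64-bit integer can hold

# Exact integer arithmetic: dividing 30! by 30 gives back 29! with no rounding.
assert bigger // 30 == math.factorial(29)

print(big)
print(bigger.bit_length())     # number of bits needed to store 30!
```

No special big-number type or import is needed for the integers themselves; the same int type grows as required.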

If the question is about now, Python has lots of libraries to help in addition
to the previously mentioned reasons. However, Java still has better library
support for NLP overall, such as OpenNLP, Stanford CoreNLP, etc. NLP in Python
is catching up though, thanks to gensim, spaCy, etc.

------
tedmiston
The main reasons Python is so popular in science (ML and before) are:

1. It's easier to get started than Java, Go, etc.

2. It's faster to write/prototype. (The IPython REPL and Jupyter notebooks
are awesome.)

3. The Python community is also very open source friendly and has significant
momentum in third party packages like pandas, numpy, scipy, etc.

Check out some talks from past PyCons and you will see a very strong
scientific presence more so than the other languages you mentioned.

------
est
Python stands on the shoulders of giants, namely MPI/MKL/BLAS/LAPACK,
which are at the core of numpy, scipy and sympy. Maybe even SageMath.

If you are developing ML/DS models, a REPL environment comes in very handy,
and there is IPython for that.

You'll find a large chunk of your time is spent on gathering and cleaning
data, at which Python really excels.
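
To make the data-cleaning point concrete, here is a small sketch using only the standard library. The sample data, column names and the clean_rows helper are all made up for illustration.

```python
import csv
import io

# Hypothetical raw data: stray whitespace, a missing value, a non-numeric value.
RAW = """name,height_cm
 Alice ,170
Bob,
carol,not available
Dave, 182.5
"""


def clean_rows(text):
    """Yield (name, height) tuples, skipping rows without a usable height."""
    for row in csv.DictReader(io.StringIO(text)):
        name = row["name"].strip().title()
        try:
            height = float(row["height_cm"])
        except (TypeError, ValueError):
            continue  # missing or non-numeric height: drop the row
        yield name, height


cleaned = list(clean_rows(RAW))
print(cleaned)
```

For larger datasets people usually reach for pandas instead, but the idea is the same: normalize the fields and drop or repair the rows that do not parse.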

Want to make a website that integrates or visualizes the data with Python? Sure.

You cannot find a second language this versatile that also has a
well-supported ecosystem.

------
solomatov
It's the easiest language to learn from the list you described. C#/Go/Java are
statically typed and compiled, which creates a barrier for learning. R is too
complicated and showing its age. Python is a sweet spot.

Data scientists aren't interested in learning industrial programming languages
(C#, Java and maybe Go). They just want to do their job, and it doesn't
require an industrial language.

------
ralphc
I'd say because Python is a good prototyping language that has easy
interfacing to the underlying C/C++ code used for fast numeric computation.

------
natalyarostova
Going from raw data to pandas-driven analysis is pretty seamless.

I can use our production code for heavy analytics or modeling, and just as
easily load the data in memory into the Python interpreter on my laptop.

------
hahahaha23
To be honest, I don’t understand its popularity.

For example, I was trying to understand a batch normalization function defined
as

def batchnorm_forward(x, gamma, beta, eps):

I can’t tell whether gamma/beta are scalars or vectors.

~~~
tedmiston
We have type annotations in python now which can be used to clarify data types
in function signatures.

https://docs.python.org/3/library/typing.html
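
For example, here is a sketch of how the signature above could be annotated. The body is not given in this thread, so this pure-Python version is only an illustration, and treating gamma and beta as per-feature vectors is an assumption based on the usual formulation of batch normalization. The Vector and Matrix aliases are hypothetical names.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]    # one value per feature
Matrix = Sequence[Vector]   # one row per example


def batchnorm_forward(x: Matrix, gamma: Vector, beta: Vector, eps: float) -> List[List[float]]:
    """Normalize each feature column of x, then scale by gamma and shift by beta."""
    n = len(x)
    out = [[0.0] * len(gamma) for _ in range(n)]
    for j in range(len(gamma)):
        column = [row[j] for row in x]
        mean = sum(column) / n
        var = sum((v - mean) ** 2 for v in column) / n
        for i in range(n):
            out[i][j] = gamma[j] * (column[i] - mean) / math.sqrt(var + eps) + beta[j]
    return out


# Two examples with two features each; gamma and beta have one entry per feature.
y = batchnorm_forward([[1.0, 10.0], [3.0, 30.0]], gamma=[1.0, 1.0], beta=[0.0, 0.0], eps=1e-5)
print(y)
```

The annotations are not enforced at runtime, but a reader (or a checker such as mypy) can now see that gamma and beta are expected to be sequences rather than scalars.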

