Introduction to NumPy (python-course.eu)
131 points by sebg on Feb 5, 2013 | 21 comments



Well, this is pretty basic stuff. For people who want to go a little deeper into NumPy, I recommend the SciPy lectures: http://scipy-lectures.github.com/intro/numpy/index.html http://scipy-lectures.github.com/advanced/advanced_numpy/ind...


At the moment, scipy-lectures is probably the best comprehensive introduction to the scientific python ecosystem.

If you're looking for a more detailed overview of just numpy, the original numpy book (http://www.tramy.us/numpybook.pdf) is quite good as well.


Wow, great links. Been working with NumPy recently and I'd struggled to find a good introduction. Straight away these docs are useful:

np.lookfor('create array')


For people who are not satisfied with knowing just Numpy and want to delve into its applications, I would recommend Wes McKinney's book "Python for data analysis":

http://shop.oreilly.com/product/0636920023784.do

I have the book and it's been great reading so far. Ch 4 gives a nice introduction to Numpy (about 30 pages). Concise but also useful for immediate real-world usage.


Beyond that, pandas (Wes McKinney's library that the book partly focuses on) is really, really fantastic. It has rapidly become a central part of the scientific python ecosystem.


- First example is a bit unfair to vanilla python. A better implementation would be the following, which is two times faster (than the given vanilla example):

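  # Python 2 only: izip and xrange were removed in Python 3 (use zip and range)
  from itertools import izip

  X = xrange(10000000)
  Y = xrange(10000000)

  Z = [x + y for x,y in izip(X,Y)]

- You didn't mention the difference between flatten() and ravel(). flatten() always makes an explicit copy; ravel() returns a view whenever it can and copies only when it has to.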

- "Arrays must have the same shape to be concatenated with concatenate()."

Except for the concatenation axis obviously.
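To make the flatten()/ravel() and concatenate() points above concrete, here is a minimal sketch. The array names and shapes are made up for illustration; only the NumPy calls themselves are real.

```python
import numpy as np

a = np.arange(6).reshape(2, 3)

flat = a.flatten()   # always allocates a new array
rav = a.ravel()      # a view here, because `a` is C-contiguous

flat_shares = np.shares_memory(a, flat)
rav_shares = np.shares_memory(a, rav)

rav[0] = 99          # writes through to `a`; `flat` is unaffected

# ravel() still has to copy when no view is possible,
# e.g. when flattening a transposed array in C order.
rav_t = a.T.ravel()
rav_t_shares = np.shares_memory(a, rav_t)

# concatenate(): shapes may differ along the concatenation axis only.
top = np.zeros((2, 3))
bottom = np.ones((4, 3))
stacked = np.concatenate([top, bottom], axis=0)   # 2 + 4 rows, 3 columns

# Along axis=1 the row counts (2 vs 4) would have to match, so this fails.
try:
    np.concatenate([top, bottom], axis=1)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```

So ravel() is the cheaper default when a view is acceptable, and flatten() is the safer one when the result will be modified independently of the original.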


Vanilla CPython, to be precise. PyPy is drastically faster on stuff like that.


In my experience the best way to ramp up on NumPy is to just dive right in and attempt to translate a modestly sized Matlab program to NumPy. The documentation, at least as it was a year ago, seems more geared towards that kind of learning.


Agreed. The Matlab to NumPy reference page [1] from the documentation gets you going in no time.

[1] http://www.scipy.org/NumPy_for_Matlab_Users
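For a flavour of what that kind of translation looks like, here is a small sketch of a few common Matlab idioms next to their NumPy equivalents. The arrays are arbitrary examples, not taken from the reference page, and the `@` operator assumes Python 3.5 or later.

```python
import numpy as np

A = np.array([[1.0, 2.0], [3.0, 4.0]])   # Matlab: A = [1 2; 3 4]
B = np.eye(2)                             # Matlab: B = eye(2)

elementwise = A * B     # Matlab: A .* B
matmul = A @ B          # Matlab: A * B
first = A[0, 0]         # Matlab: A(1,1)   (NumPy indices start at 0)
last_row = A[-1, :]     # Matlab: A(end,:)
Z = np.zeros((2, 3))    # Matlab: zeros(2,3)  (NumPy takes the shape as a tuple)
At = A.T                # Matlab: A.'
```

The two habits that usually bite first are the 0-based indexing and the fact that `*` is elementwise in NumPy.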


Out of curiosity, why not use Matlab?


There were several reasons, most of what other people mention here factored in. Matlab was being a PITA to deploy because of the licensing, Octave wasn't matching it in performance, it's relatively unpleasant to work with, etc.

The big reason though is that I was trying to use it "in production" with MCC and it just really wasn't up for the task. It became a nightmare to debug and had major stability issues; I couldn't keep it running for more than a few hours. The logged errors were cryptic null-pointer exceptions from deep down the stack and the input that caused those conditions would run fine upon restarting the process. A real nightmare to deal with.

Eventually I realized that I'd spent a solid two days' worth of work trying to fix it with no end in sight and decided that splitting off all the logic into a Twisted service wrapping NumPy could be done in less time than I was probably going to continue spending messing with Matlab/MCC. Took about a day to do that, and mostly because some of the math was slightly beyond me so I had to use some significant caution.

Ran like a dream after that for months.


* functions have to be written in separate files from "scripts"

* language features are mostly about matrix manipulation

* OO and functional programming are rudimentary at best

* you will avoid newer features because they are buggy and your colleagues will probably have older versions anyway.

* you might dislike the crashing IDE

* MATLAB is essentially a niche language compared to the universality of python.

The reasons to use Matlab:

* your program structure is going to be simple, probably a single algorithm applied to many matrices or a few files (streams)

* toolboxes

* toolboxes

* toolboxes

* Simulink


Cost is a huge issue. My spouse worked with an image processing professor on her master's degree and he was converting his entire group from Matlab to numpy. While academic licenses for Matlab are very cheap, they don't actually allow you to publish research in journals; for that you need a much more expensive class of license.


The programming language in Matlab is not very good. The main advantages of Matlab are that it has so many built-in functions and great plotting tools. There are a lot of plugins too (often commercial).

Matlab is also not open source and rather expensive. There is GNU Octave which is free and mostly compatible with Matlab.


Licensing costs (and the student version is full of limitations; compare to the student version of Mathematica, which has almost no limitations), missing features (e.g. weak OO), vendor lock-in (especially when 99% of your coworkers "just use a pirated copy, it's what we do") etc...


Can't speak for him but personally I find Matlab to be rather expensive for commercial use and it totally falls apart when you need to do anything outside of numerical computation.

It is really nice having a full on general purpose programming language to use right next to your scientific code.


Don't forget maintainability.

Anything moderately complex in matlab quickly becomes a maintainability nightmare, in my experience. Matlab discourages modularity (functions have to be either one line or in a separate file), and writing tests of any sort is unnecessarily difficult in Matlab.

Beyond that, Numpy is remarkably clean once you know how to use it. How many times do you see "tile" and "repmat" calls in matlab just to multiply two matrices? Numpy's broadcasting is a simple but remarkably useful feature.
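As a rough illustration of the tile/repmat point, here is a sketch that does the same elementwise operation twice: once by tiling explicitly, once by letting broadcasting do it. The data and variable names are invented for the example.

```python
import numpy as np

data = np.arange(12, dtype=float).reshape(4, 3)   # 4 rows, 3 columns
col_means = data.mean(axis=0)                      # shape (3,)

# repmat-style: build a full (4, 3) copy of the means, then subtract.
tiled = np.tile(col_means, (data.shape[0], 1))
centered_tiled = data - tiled

# Broadcasting: the (3,) array is stretched across the rows implicitly,
# without materialising the tiled copy.
centered = data - col_means

# A per-row factor needs an explicit column shape, (4, 1), to broadcast.
row_scale = np.array([1.0, 2.0, 3.0, 4.0])
scaled = data * row_scale[:, np.newaxis]
```

Both routes give the same result; the broadcast version just skips the intermediate array.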

Finally, numpy gives you much tighter control over memory usage than matlab does. This was actually the main reason I switched, personally. I didn't realize how much I was missing for quite a while.
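The comment doesn't say which features it has in mind, so the following is only a guess at the kind of control meant: explicit dtypes, views instead of copies, and in-place operations. The array size is arbitrary.

```python
import numpy as np

# Choose the element size explicitly: float32 is 4 bytes, float64 is 8.
x = np.ones(1_000_000, dtype=np.float32)
x64_bytes = np.ones(1_000_000).nbytes     # default dtype is float64

# Basic slicing returns a view onto the same buffer, not a copy.
view = x[::2]
view_shares = np.shares_memory(x, view)

# In-place operations reuse the existing buffer instead of allocating.
x *= 2.0
np.add(x, 1.0, out=x)    # out= writes the result into `x` itself
```

After the last two lines every element of `x` has been updated in place, and `view` sees the new values because it points at the same memory.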


  > functions have to be either one line or in a separate file

Hasn't been true for a while now. You can have sub-routines and scope-local functions within functions now; embedding a procedure within a script is, of course, still impossible. So the situation has improved ever so slightly...


Good point. I was thinking of lacking functions within scripts, which is quite annoying for small projects, but not a major problem for larger things.


Strings in python are much easier to deal with, so if your program is going to do almost anything with them, Matlab is a royal pain.


[Sage](sagemath.org) provides a very nice, unified approach to NumPy, SciPy, Maxima, Octave (if it is installed), and most other open source mathematical packages, with the very important exception of [Gambit](gambit-project.org).



