Scipy Lecture Notes – Learn numerics, science, and data with Python (scipy-lectures.org)
473 points by kercker on July 28, 2016 | hide | past | favorite | 17 comments

This is a great start. Some resources the author might consider drawing on, depending on whether and how they choose to expand:

[1] http://quant-econ.net/py/index.html

[2] http://people.duke.edu/~ccc14/sta-663/

One doc to learn them all :)

The `Optimizing and debugging code` part is where most data scientists falter.

So, this is a really nice effort for bringing it all together!

I wish I had this when I was learning the Python data analysis ecosystem. It does a nice job of clearly distinguishing the major elements.

I hope you don't take this as trolling, but: What's the deal with matrix multiplication in numpy? I wanted to dot-product two vectors yesterday, and I got it right only on the third try:

   x.T * y    # nope
   x.dot(y)   # still no
   np.inner(x, y) # ok
This is a disaster. I'm sure there are valid historical reasons for this state of affairs, but this makes numpy an environment where random idiosyncrasies get cast in concrete.

It's not historical. It's conceptual and deliberate (and far less confusing than the alternative, i.m.o.).

What's likely happening is that you're using a 1D array and a 2D array. If they were both actually vectors, `x.dot(y)` would work fine.

For example:

    import numpy as np
    x = y = np.arange(10)
    x.dot(y) # Yields 285
However, your problem is likely that one's a 1D vector and one's a 2D array. Matlab doesn't actually have vectors at all, so this distinction confuses a lot of people. A Matlab row vector and column vector are both 2D arrays, not vectors: you can transpose them and swap the dimensions. When you transpose a vector in numpy, you still get a 1D vector; transposing doesn't change the number of dimensions (to me it would be _incredibly_ confusing if it did...).

So, in your case, you probably had something like:

    import numpy as np
    x = np.arange(10)
    y = x[np.newaxis, :]
    x.dot(y) # Raises a ValueError
Note that the opposite (`y.dot(x)`) would have worked fine in this case, as well as `np.inner`.

`np.dot` uses the last dimension of the first array (10) and the second-to-last dimension of the other array (1). They don't match. It's that simple.

You might have also been trying to take the dot product of two 2D arrays with the same dimension. In that case, the same thing happens. The last dimension of one doesn't match the second-to-last dimension of the other.
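A minimal sketch of that failing 2D case (the array contents here are illustrative assumptions, not the original poster's data):

```python
import numpy as np

# Both arrays are 2D "row vectors" with shape (1, 10).
a = np.arange(10)[np.newaxis, :]
b = np.arange(10)[np.newaxis, :]

mismatch = False
try:
    a.dot(b)  # last dim of a is 10, second-to-last of b is 1: ValueError
except ValueError:
    mismatch = True

# Transposing one operand aligns the dimensions: (1, 10) x (10, 1) -> (1, 1)
result = a.dot(b.T)
```

Here `result` is a (1, 1) array holding 285, the same value the 1D `x.dot(y)` example above produces.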

Very thorough explanation!

If you're coming from a MATLAB background, you might find this helpful [0].

When I switched from MATLAB to Python/numpy/scipy, I had to deal with these syntactic annoyances, but after many years of use I feel comfortable with them.

Don't forget the biggest edge Python has over MATLAB for numeric simulation: you can prototype your work just as quickly and, for the same program, convert the slow portions into C or Fortran without rewriting the whole thing. Not to mention a proper programming language underlying the numeric system.

[0] https://docs.scipy.org/doc/numpy-dev/user/numpy-for-matlab-u...

Don't forget the biggest edge Python has over MATLAB for numeric simulation: you can prototype your work just as quickly and, for the same program, convert the slow portions into C or Fortran without rewriting the whole thing.

That statement applies equally to Matlab/Octave. You can link C/C++/Fortran routines into Matlab code using its "mex" interface. Octave also supports this interface.

Python is more flexible, but less simple to start out with because it has a more complicated ecosystem. For example, there are many different ways to link in fast code (ctypes/cffi/cython/numba/weave/f2py/...). Moreover, the data in a numpy array can be stored in memory in different ways (C or Fortran order, not necessarily contiguous, ...), which requires some care when passing it to external routines.
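A small illustration of the memory-layout point, using numpy's `flags` attribute and `np.ascontiguousarray` (the shapes are just examples):

```python
import numpy as np

a = np.arange(6).reshape(2, 3)   # default layout: C (row-major) order
f = np.asfortranarray(a)         # same values, Fortran (column-major) layout

assert a.flags['C_CONTIGUOUS']
assert f.flags['F_CONTIGUOUS']

# Slicing can produce a non-contiguous view...
v = a[:, ::2]
assert not v.flags['C_CONTIGUOUS']

# ...so before handing a buffer to an external C routine,
# force a contiguous copy:
c = np.ascontiguousarray(v)
assert c.flags['C_CONTIGUOUS']
```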

By default, numpy.ndarray objects multiply element-wise. There is a separate type, numpy.matrix, which does matrix math. Alternatively, as of Python 3.5, `a @ b` performs matrix multiplication on ndarrays.
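A quick sketch contrasting the two (the matrices are arbitrary examples):

```python
import numpy as np

x = np.array([[1, 2],
              [3, 4]])
y = np.array([[5, 6],
              [7, 8]])

elementwise = x * y   # element-wise product: [[5, 12], [21, 32]]
matmul = x @ y        # matrix product (Python 3.5+): [[19, 22], [43, 50]]
                      # equivalent to x.dot(y) or np.dot(x, y)
```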

The first is just a misunderstanding of the `*` operator. I don't believe there is any context for which this would yield anything other than the element-wise product between `x` and `y` (provided these are `ndarray`s).

The second, (the matrix product between `x` and `y`) works provided that the dimensions of `x` and `y` make sense to perform matrix multiplication, i.e., `x` is 1xN and `y` is Nx1.

While I agree there are some idiosyncrasies, I think calling it a disaster is hyperbole. One idiosyncrasy that did trip me up when moving from Octave/MATLAB is that NumPy's `zeros` and `ones` functions take the dimensions as a tuple, while others, like `rand` or `randn`, take them as multiple arguments.
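For example (shapes chosen arbitrarily):

```python
import numpy as np

a = np.zeros((2, 3))      # dimensions as a single tuple
b = np.ones((2, 3))       # likewise
c = np.random.rand(2, 3)  # dimensions as separate arguments

# Mixing the conventions fails: np.zeros(2, 3) raises a TypeError
# (the second argument is interpreted as a dtype), and
# np.random.rand((2, 3)) raises a TypeError as well.
```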

Does x.T.dot(y) work?

This looks like a great document to help one get started on the road to using Python for lots of STEM type tasks. Pandas fans: don't be put off by the lack of mention of it on the title page, as it is covered in there, too.

Looks like it covers both Python 2 and 3. Good job.

Is Python 2 going to be kept alive forever and ever? Is there any other language on the planet that keeps a previous version supported years after a major release came out?

C, C++, Java... I wonder whether there are languages that are used to make money that don't provide backwards compatibility.

Fortran 77?

Good job
