

NumPy Exercises for Data Analysis in Python - selva86
https://www.machinelearningplus.com/101-numpy-exercises-python/
I compiled a list of numpy practice exercises related to data analysis. Might be helpful if you want to practice some data munging problems. Feedback welcome!
======
amch
Does anyone know of any similar resources for Pandas?

I've found the following to be quite helpful but would love to know if anyone
knows of other resources in a similar vein:
[https://pandas.pydata.org/pandas-docs/stable/cookbook.html](https://pandas.pydata.org/pandas-docs/stable/cookbook.html)

~~~
00ajcr
I started writing a '100-pandas-puzzles' set of exercises here:
[https://github.com/ajcr/100-pandas-puzzles](https://github.com/ajcr/100-pandas-puzzles)

There's also pandas_exercises by Guilherme Samora
([https://github.com/guipsamora/pandas_exercises](https://github.com/guipsamora/pandas_exercises))
which is very good - it's split across multiple notebooks and is more
extensive than my repo.

~~~
selva86
Nice stuff!

------
jofer
This is very similar in spirit to
[https://github.com/rougier/numpy-100/blob/master/100%20Numpy...](https://github.com/rougier/numpy-100/blob/master/100%20Numpy%20exercises.md).
In fact, now that I look at it a bit more, it seems like all of this post's
examples are reworded versions of Nicolas Rougier's "numpy 100"...

~~~
amelius
There's also Rosalind for bioinformatics problems to be solved in Python.

[http://rosalind.info/problems/locations/](http://rosalind.info/problems/locations/)

~~~
res0nat0r
This looks interesting. Thanks for the link.

------
00ajcr
There are some nice exercises here, good work.

For question 48 it might be simpler to just write

    
    
      np.sort(a)[-5:]
    

instead of using argsort() and then using fancy indexing. Better yet, use

    
    
      np.partition(a, kth=-5)[-5:]
    

which scales linearly with the size of the array.
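
A minimal sketch of the two approaches side by side, assuming a hypothetical shuffled array `a` (the data and the seed are made up for illustration). Note that `np.partition` only guarantees which values end up in the last five slots, not their order, and `np.argpartition` is the variant to reach for if the positions are needed rather than the values:

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, only for reproducibility
a = rng.permutation(100)         # hypothetical data: the integers 0..99, shuffled

# Full sort, then take the tail.
top5_sorted = np.sort(a)[-5:]

# Partial sort: the last 5 slots hold the 5 largest values,
# but their order within those slots is not guaranteed.
top5_part = np.partition(a, kth=-5)[-5:]

# Positions (in the original array) of the 5 largest values.
top5_idx = np.argpartition(a, -5)[-5:]

print(top5_sorted)
print(np.sort(top5_part))
print(np.sort(a[top5_idx]))
```

If the five values are needed in order, sorting just those five afterwards is cheap.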

Also, the one-hot encoding puzzle (51) would be more efficiently solved using

    
    
      (arr[:, None] == np.unique(arr)).view(np.int8)
    

In general, `for` loops over NumPy arrays should be avoided where at all
possible.
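
To make the one-hot expression concrete, here is a small sketch on a hypothetical label array (the values are made up). The comparison broadcasts a column of labels against the row of sorted unique values; `.view(np.int8)` then reinterprets the one-byte booleans in place, while `.astype(int)` is the copying alternative if a wider integer dtype is wanted:

```python
import numpy as np

arr = np.array([2, 3, 2, 2, 2, 1])     # hypothetical labels

# Shape (6, 1) compared against shape (3,) broadcasts to a (6, 3) boolean mask.
mask = arr[:, None] == np.unique(arr)

one_hot_view = mask.view(np.int8)      # same bytes reinterpreted, no copy
one_hot_copy = mask.astype(int)        # new array with the default integer dtype

print(one_hot_view)
```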

~~~
selva86
Thanks for the suggestion, I will factor those in.

------
glup
One thing that is holding me back in numpy is not knowing the runtime
complexity of operations. Of course I can profile code, but I should have
better awareness when writing code in the first place. Without an algorithms
background, I don't have strong intuitions on the runtime complexity of the
primitives (np.unique, for example). Any suggestions?

~~~
stabbles
Switch to Julia! Hit @edit unique([1,2,3,2]) in the REPL and you see the
implementation.

~~~
dman
Nice! I've been meaning to try out Julia for a while now. Is the numpy
equivalent in Julia largely written in Julia itself? (as opposed to C/Fortran)

~~~
mindB
Julia's numpy equivalent is basically the standard Array type from the
standard library, which I'm 99% sure is native Julia.

~~~
thebooktocome
If one is working on small (<= 15 by 15) matrices, the StaticArrays module [1]
is also native Julia and is much faster than Base.Array. Since the size of a
StaticArray is known after type inference, it is allocated on the stack,
which is nice.

One downside is that unless you're doing BLAS-style operations, writing non-
trivial transformations of StaticArrays always seems to require generated
functions.

Anyway, I think this is a feature that numpy doesn't provide.

[1]
[https://github.com/JuliaArrays/StaticArrays.jl](https://github.com/JuliaArrays/StaticArrays.jl)

------
dotancohen
After a quick perusal, about half the exercises included new material for me.
Anybody learning NumPy would do well to review this. Bookmarked!

------
loganzk
Working through them and noticed a few small things.

For #3, you can make a boolean array with np.ones/np.zeros using the same
dtype argument, which saves a little bit of space.

i.e. np.ones((3,3), dtype=bool)

For #14, you can make use of the same compound boolean statements as you can
in pandas to make it a bit simpler.

i.e. a[(a > 5) & (a < 10)]

For #15, this is a built-in numpy function:

np.maximum(a, b)

That's as far as I've made it, but I'm really enjoying them.
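
Put together as a runnable sketch, with made-up input arrays since the exercises' own inputs aren't quoted here:

```python
import numpy as np

# #3: a boolean array straight from np.ones
flags = np.ones((3, 3), dtype=bool)

# #14: compound condition; the parentheses are required because
# & binds more tightly than the comparison operators
a = np.arange(15)
between = a[(a > 5) & (a < 10)]

# #15: element-wise maximum of two arrays
x = np.array([5, 7, 9, 8, 6, 4, 5])
y = np.array([6, 3, 4, 8, 9, 7, 1])
pair_max = np.maximum(x, y)

print(flags.dtype, between, pair_max)
```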

~~~
selva86
Thanks for the tip on No. 14, man!

However, for No. 15, that is not the point of the exercise.

------
shrx
> 13. How to get the positions where elements of two arrays match?
    
    
      > Desired Output:
      > #> (array([1, 3, 5, 7]),)
    

Why is (array([1, 3, 5, 7]),) the desired output, and not
array([1, 3, 5, 7])?
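
For context, a small sketch of where the tuple comes from, assuming the solution calls `np.where` with a single condition (the input arrays here are illustrative, picked to reproduce the quoted output). In that form `np.where` returns a tuple with one index array per dimension, so a 1-D input gives a one-element tuple; indexing it with `[0]` or using `np.flatnonzero` yields the bare array:

```python
import numpy as np

a = np.array([1, 2, 3, 2, 3, 4, 3, 4, 5, 6])
b = np.array([7, 2, 10, 2, 7, 4, 9, 4, 9, 8])

matches = np.where(a == b)               # tuple: one index array per dimension
positions = matches[0]                   # the plain 1-D index array
also_positions = np.flatnonzero(a == b)  # same indices, no tuple wrapper

print(matches)
print(positions)
```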

------
longqzh
For #15, if the number of elements is large, it will be slower than expected,
since the maxx function is written in pure Python. In my experience, though,
it is still much faster than a for loop in pure Python.
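
A rough sketch of the comparison, assuming the exercise wraps a scalar function with `np.vectorize` (the `maxx` body and the arrays below are stand-ins, not copied from the exercise). `np.vectorize` still calls the Python function once per element, whereas `np.maximum` does the loop in compiled code, and both give the same values:

```python
import numpy as np

def maxx(x, y):
    """Scalar maximum of two numbers; a stand-in for the exercise's function."""
    return x if x >= y else y

# Wraps maxx so it accepts arrays; the Python function still runs per element.
pair_max = np.vectorize(maxx, otypes=[float])

a = np.array([5, 7, 9, 8, 6, 4, 5])
b = np.array([6, 3, 4, 8, 9, 7, 1])

via_vectorize = pair_max(a, b)
via_ufunc = np.maximum(a, b).astype(float)  # cast only to match otypes above

print(via_vectorize)
print(via_ufunc)
```

Timing both with `timeit` on a large array is the easiest way to see how big the gap is on a given machine.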

------
blt
Oh wow, I wish I had known about r_ and c_ a few months ago! I'm still annoyed with
numpy for being more clunky than Matlab for linear algebra, but resources like
this are good for verifying that I'm doing stuff in a numpy-ic way. Thanks!

(Also numpy has some really nice features over Matlab, like [None,:]
broadcasting and being able to index a parenthesized expression or function
output without naming it. Ok, the latter is not really a feature, more of an
example of how Matlab is broken as a language)
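
For anyone who hasn't met these yet, a quick sketch of the features mentioned above, using two made-up 1-D arrays:

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

stacked = np.r_[a, b]    # concatenate along the first axis -> shape (6,)
columns = np.c_[a, b]    # pair the arrays up as columns   -> shape (3, 2)

row = a[None, :]         # shape (1, 3)
col = a[:, None]         # shape (3, 1)
outer_sum = col + row    # broadcasts to shape (3, 3)

# Index a parenthesized expression directly, without naming it first.
last_of_sum = (a + b)[-1]

print(stacked, columns, outer_sum, last_of_sum, sep="\n")
```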

------
cosmosa
Looks good, I will definitely use this as a reference!

------
boruto
Are there similar exercises for Java 8 streams?

------
budadre75
Does anyone know of anything similar for matplotlib?

