
Python Pandas: Tricks and Features - endlesstrax
https://realpython.com/python-pandas-tricks/
======
closed
Woah, I had no idea the testing module existed. Two things I've found useful in
pandas are the DataFrame .query and .eval methods [1]. They're nice for
cutting out tons of lambda functions in pipes.

E.g.

    
    
      df.somemethod() \
        .loc[lambda df: df.x < 2]
    
    

becomes

    
    
      df.somemethod() \
        .query("x < 2")
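
For anyone who wants to try it, here is a minimal sketch of the same idea on a toy frame. The column names and values are made up for illustration, and the pandas.testing call just confirms that both spellings select the same rows.

```python
import pandas as pd

# Toy data; the columns "x" and "y" are placeholders for illustration.
df = pd.DataFrame({"x": [0, 1, 2, 3], "y": [10, 20, 30, 40]})

# Lambda-based filtering inside a method chain.
via_lambda = df.loc[lambda d: d.x < 2]

# The same filter written as a query string.
via_query = df.query("x < 2")

# pandas.testing raises an AssertionError if the two frames differ.
pd.testing.assert_frame_equal(via_lambda, via_query)

# .eval with an assignment returns a new frame with the extra column;
# the original df is left untouched because inplace defaults to False.
with_total = df.eval("total = x + y")
```

The .eval line is there because it covers the other common use of lambdas in chains, namely computing a derived column.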
    
    

One issue I've noticed is a frustrating bug [2] that causes many queries to
raise an error before evaluating, but you can work around it by changing the
engine argument:

    
    
      df.query("x.str.contains('a')", engine = "python")
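
A small self-contained version of that workaround, assuming a string column named "x" (the data is invented for the example). It only exercises the engine="python" path, since whether the default engine fails depends on the pandas version and on whether numexpr is installed.

```python
import pandas as pd

# Invented data: "x" holds strings, "n" is just a row label.
df = pd.DataFrame({"x": ["apple", "kiwi", "banana"], "n": [1, 2, 3]})

# engine="python" evaluates the expression with plain Python/pandas,
# so the .str accessor can be used inside the query string.
matches = df.query("x.str.contains('a')", engine="python")
```

Note that missing values in "x" would make str.contains return NaN for those rows, so a real column may need na=False passed to contains.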
    
    

1: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.query.html

2: https://github.com/pandas-dev/pandas/issues/22435

~~~
mlthoughts2018
I wish more teams considered it important to expose the tests as a module or
subpackage that is included in the distribution, such as what numpy does with
numpy.test('full') [0].

When you are knee-deep in some long-running Docker container with some data
analysis going on in an interactive console and get hit by a weird bug, it can
be so, so helpful to easily run unit tests post-installation to verify
everything is set up correctly.

It can also be a good step in CI if you build minimal Docker containers that
should house an installation of the package at a given commit: have e.g.
Jenkins build the container with the package installed from that commit and
then launch the container with a simple command like

    python -c "import mymodule; mymodule.test()"
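
As a rough sketch of what such an entry point could look like, here is one way to do it with only the standard library. Everything in it is hypothetical: the add function stands in for the package's real code, and test() is an illustrative helper, not how numpy implements numpy.test.

```python
import unittest


def add(a, b):
    """Stand-in for the package's real functionality (hypothetical)."""
    return a + b


class _AddTests(unittest.TestCase):
    """Tests that ship inside the package rather than next to it."""

    def test_add(self):
        self.assertEqual(add(2, 3), 5)


def test(verbosity=1):
    """Run the bundled tests and return True if they all pass."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(_AddTests)
    result = unittest.TextTestRunner(verbosity=verbosity).run(suite)
    return result.wasSuccessful()
```

In a real package the test cases would live in their own subpackage and test() would discover them, but the shape is the same: one importable function that runs the tests and reports success.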

[0] https://stackoverflow.com/questions/9200727/is-there-a-test-suite-for-numpy-scipy/9200923#9200923

~~~
closed
That's a good point--I used to keep tests outside the package, but it seems
like some projects make good use of having people who open issues run the unit
tests beforehand.

------
abakker
I recently spent a bunch of time trying to restructure an SPSS dataset that
had a sub-optimal structure. After failures with Excel macros and SPSS syntax,
I ended up with about 100 lines of Python using a pandas column MultiIndex and
stack(). Stack/unstack is so fantastic for preparing data for Tableau that I
recommend everyone learn to use it.
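
To give a flavor of it, here is a minimal sketch on invented survey-style data. The respondent, question and wave labels are assumptions for illustration, not the actual SPSS layout.

```python
import pandas as pd

# Wide layout: one row per respondent, columns keyed by (question, wave).
columns = pd.MultiIndex.from_product(
    [["q1", "q2"], ["wave1", "wave2"]], names=["question", "wave"]
)
wide = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    index=pd.Index(["r1", "r2"], name="respondent"),
    columns=columns,
)

# stack() moves the "wave" column level into the row index,
# giving one row per (respondent, wave) pair.
tall = wide.stack("wave")

# reset_index() flattens the index into ordinary columns,
# which is the shape most BI tools expect.
tidy = tall.reset_index()

# unstack() goes the other way, back to one row per respondent.
wide_again = tall.unstack("wave")
```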

------
gcmac
Thanks for posting - totally worth the read to learn there's a
pd.read_clipboard() function.

~~~
Ftuuky
Came here to say that. How come no other tutorial or MOOC on pandas mentions
that? It's so useful.

~~~
joelschw
Whilst it has its uses, I think we should encourage people to do things in a
reproducible way.
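
One way to keep the convenience without depending on clipboard state, sketched under the assumption that the copied data is tab-separated: pd.read_clipboard() hands the clipboard text to read_csv, so pasting that text into the script and reading it through io.StringIO gives a version anyone can rerun. The names and scores below are made up.

```python
import io

import pandas as pd

# Text pasted into the script once, instead of living only on the clipboard.
# The \t escapes become real tab characters in this (non-raw) string.
PASTED = """name\tscore
ada\t90
grace\t95
"""

# Same parsing path as read_clipboard, but the input is part of the script.
df = pd.read_csv(io.StringIO(PASTED), sep="\t")
```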

