
Lesser-known Pandas tricks (2019) - HIP_HOP
https://towardsdatascience.com/5-lesser-known-pandas-tricks-e8ab1dd21431
======
jpxw
Something I love about pandas is that often you can pass a URL in place of a
file name.

The other day I needed to scrape data from a table on a webpage. Thinking
about traversing the DOM and building up an array was already giving me a
headache. Thankfully pandas has the “read_html” function. Getting a list of
dataframes for each table on the page was as easy as:

    
    
      dfs = pd.read_html(url)

~~~
reportgunner
1\. pandas can do date ranges _date_range =

pd.date_range(date_from, date_to, freq="D")_

2\. merge with indicator

if you set `indicator` parameter of _merge()_ to _True_ pandas adds a column
that tells you which dataset the row came from

3\. merge with approximate match - the _tolerance_ parameter of _merge_asof()_

 _pd.merge_asof(trades, quotes, on= "timestamp", by='ticker',
tolerance=pd.Timedelta('10ms'), direction='backward')_

4\. Create an Excel report and add some charts

5\. Save the dataframe in gzipped form

edit: formatting

edit2: I managed to get an outline link

[https://outline.com/fY26Aw](https://outline.com/fY26Aw)

~~~
dr_zoidberg
This is basically a summary of the link posted by OP.

Edit: not that it's a bad thing, I just mention it so people who read this
know for sure they can avoid the link if they already know how to use the tips
listed here. The article has code examples and more details.

------
aksakalli
Medium wants me to upgrade my account to read this article, please people
share your posts in somewhere else.

~~~
reportgunner
You are not missing much, the article isn't groundbreaking.

------
andreareina
Merge with indicator is also useful for doing anti-joins:

    
    
        left.merge(right, how="left", indicator=True, ...)
        [lambda df: df._merge == "left_only"]

------
staticautomatic
My favorite, most elegant SO answer I've ever gotten was to a question about
Pandas.

The question was "How do I create a column where each row's value is the mean
of another column's values starting at that row?" The answer was:

    
    
      df.loc[::-1, 'col_1'].expanding().mean()[::-1]

~~~
maest
I guess elegance is in the eye of the beholder.

Separately, I find it upsetting there are at least 6 ways of reversing a
dataframe. It suggests some API smell.

~~~
staticautomatic
I get tripped up by the Pandas API no matter how often I use it, and when I
haven't used it in a while I practically have to re-learn all the basics. But
whenever it does exactly what I need in a readable one-liner, I forgive it for
all its transgressions. It's an abusive relationship.

------
closed
Note that there is a handy PeriodIndex version of pd.date_range:

    
    
      pd.period_range(date_from, date_to, freq = "D")
    

AFAICT, a PeriodIndex and DateTimeIndex function mostly the same, and have
many of the same methods, except...

    
    
      * DateTimeIndex can't hold dates far in the future
      * PeriodIndex can't easily round to the end of a period (e.g. date + 0*MonthEnd() errors)
      * PeriodIndex doesn't handle timezones?

------
HIP_HOP
TLDR;

5 lesser-known pandas tricks:

1\. Date Ranges

2\. Merge with indicator

3\. Nearest merge by timestamp

4\. Create an Excel report from pandas

5\. Use gzip with when saving to csv

------
collyw
Does anyone want to do a TLDR? I don't especially want to sign into Medium.

~~~
HIP_HOP
you have it above

~~~
collyw
Ok, that's been posted since I posted this question.

