
Show HN: A tool to scrape senators' stock transactions for your own analysis - nsomani
https://github.com/neelsomani/senator-filings
======
nsomani
Hi everyone, with the recent interest in U.S. senator insider trading, I open-
sourced this repo for you to scrape and analyze the senators' stock
transactions yourself: [https://github.com/neelsomani/senator-
filings](https://github.com/neelsomani/senator-filings)

I provided a Jupyter notebook with my own simple analysis (limitations
included in the README), but of course it can be modified to suit your needs.
Let me know if you have any questions - happy to help!

~~~
amykyta
Great idea. Have you done any analysis on the data?

I’d be curious to see if a passive index tracking members of Congress
would beat the S&P.

Although it may not account for other sweet deals like this option expiry
extension and accelerated vesting [https://theintercept.com/2020/05/12/david-
purdue-senate-card...](https://theintercept.com/2020/05/12/david-purdue-
senate-cardlytics-stock/)

~~~
bko
I didn't use this tool; I used Senate Stock Watcher [0] instead. But a cursory
review [1] of the trades made by senators showed the amounts to be very small
(66% of trades were under $15k), and the typical senator is a millionaire. In
the 4-month period I analyzed, only 3 trades were between $250k and $500k.

I looked at performance as well, but it's tough to benchmark these things.
There were net outflows in the market during that period and if you sold you
did relatively well in a falling market, so the typical trade did well. But
the relatively small size of the trades makes me skeptical that insider
information was used. Matt Levine said it best [2]:

> Of course they could be lying, but in context the defense seems pretty
> plausible. (Kelly Loeffler, for instance, controversially dumped about 0.6%
> of her portfolio at around the same time, which sure seems like the sort of
> thing an investment adviser would do without any input from her? You could
> call your adviser and say “a disaster is coming, sell everything!,” but
> calling them to say “a disaster is coming, sell a tiny bit!” seems
> pointless.)

But this doesn't include William Barr as he is not a Senator.

[0] [https://senatestockwatcher.com/](https://senatestockwatcher.com/)

[1] [https://medium.com/ml-everything/analyzing-us-senators-
stock...](https://medium.com/ml-everything/analyzing-us-senators-stock-
picks-e8106bbbe8e)

[2]
[https://www.bloomberg.com/opinion/articles/2020-05-14/senato...](https://www.bloomberg.com/opinion/articles/2020-05-14/senator-
s-stock-trades-make-trouble)
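
One rough way to handle the benchmarking problem described above is to compare each trade with an index over the same window and flip the sign for sales, so that selling into a falling market only counts as "good" if the stock fell more than the market did. A minimal sketch follows; the `excess_return` helper and all prices are made up for illustration and come from neither tool.

```python
def simple_return(start_price: float, end_price: float) -> float:
    """Fractional price change over the window."""
    return end_price / start_price - 1.0


def excess_return(side: str, stock_start: float, stock_end: float,
                  bench_start: float, bench_end: float) -> float:
    """Return of a trade relative to a benchmark over the same window.

    For a purchase this is stock return minus benchmark return.
    For a sale the sign is flipped: selling ahead of a drop that is
    larger than the market's drop counts as a good trade.
    """
    diff = (simple_return(stock_start, stock_end)
            - simple_return(bench_start, bench_end))
    if side == "buy":
        return diff
    if side == "sell":
        return -diff
    raise ValueError(f"unknown side: {side!r}")


# Illustrative numbers only, not real prices:
# after a sale, the stock falls 20% while the index falls 10%.
print(excess_return("sell", 100.0, 80.0, 3000.0, 2700.0))
```

This still says nothing about trade size, which is the other half of the argument above.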

------
beager
It would be fun to compare the timeline of stock transactions by legislators
with the timeline of public statements by those legislators and their
colleagues pertaining to the companies’ stocks they’re selling, or those
companies’ industries in general. Not that I expect such nakedly corrupt
pumping and dumping or hyping... but I’ve been more surprised in the past.

~~~
wdn
"Throw Them All Out" by Peter Schweizer documented a lot of this.

[https://www.amazon.com/Throw-Them-All-Out-Politicians-
ebook/...](https://www.amazon.com/Throw-Them-All-Out-Politicians-
ebook/dp/B0062N35X8)

------
inglor
Hey, you should probably email the guys at TipRanks (support@tipranks.com)
with this.

They have a tool that takes returns, backtests them, and finds correlations.
They run it on hedge funds and can probably find patterns quickly.

~~~
inglor
Plus they have funding from Eliot Spitzer to do things that bring
transparency to the stock market.

Source: I worked there ~4 years ago.

------
at_a_remove
Too bad they do not have to report their activity in real time. If they did,
the next step would be to create something like an "index fund" that simply
mimics their choices.

~~~
buckhx
I've always wondered if some hedge fund uses senators' transactions as an
input in their model.

------
pjc50
While the impact of this on corruption is likely to be small, it would be
interesting to see how much of an edge on the market it would give you...

~~~
cochne
The data are old, but from 1993 to 1998, US senators outperformed the market
by up to 12.3% [0]

[0]
[https://insidertrading.procon.org/view.answers.php?questionI...](https://insidertrading.procon.org/view.answers.php?questionID=001034)

~~~
booleandilemma
Who knew our officials were such savvy traders? We must be electing the right
people.

~~~
wpietri
Personally, my suggestion is that federal elected officials be required to put
all their assets into a blind trust managed by some common third party (e.g.,
the CBO or a not-for-profit money manager). If public service is truly
important to them, that shouldn't be a big barrier. We don't need people with
a conflict of interest leading the nation.

~~~
curiousllama
How would you deal with illiquid assets? I can't imagine a republican taking
the business they spent their whole life building and putting it into a trust
managed by government bureaucrats.

~~~
pjc50
If they don't want to do that, they could always not become a politician?
Maybe leave some space for people who aren't business owners, who are
spectacularly over-represented?

~~~
wpietri
Exactly. With 325 million people, we shouldn't have a problem finding 542
elected officials who are willing to put their full-time focus on the job.

------
nathell
I've created a Clojure library to facilitate scraping scenarios like this one:

[https://github.com/nathell/skyscraper](https://github.com/nathell/skyscraper)

Writeup of a sample data acquisition + analysis use case:

[http://blog.danieljanus.pl/2020/05/08/making-of-clojure-
depe...](http://blog.danieljanus.pl/2020/05/08/making-of-clojure-dependency/)

------
JoshuaEddy
Sounds like "Senate Stock Watcher":
[https://news.ycombinator.com/item?id=22834524](https://news.ycombinator.com/item?id=22834524)

[https://senatestockwatcher.com/](https://senatestockwatcher.com/)

~~~
nsomani
Senate Stock Watcher is really cool. I chatted a bit with Tim Carambat (who
built the site), and right now he's basically crowdsourcing the labeling for
PDFs. Hoping that we can develop on this package and eliminate the manpower
required for projects like Senate Stock Watcher.

------
stuaxo
Could this be adapted to work for British MPs?

~~~
alibaba_x
Is their data public? The US publishes this data about its senators, but I’m
not sure if the UK does as well.

------
kumarski
You could actually implement the trades with
[https://alpaca.markets](https://alpaca.markets) if you wanted to.

~~~
jchook
Unfortunately the senators get 45 days to disclose their trades.
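
To put a number on that lag for any given trade, here is a small sketch. It takes the 45-day figure from the comment above as given, and both helper names are hypothetical.

```python
from datetime import date, timedelta

# Figure taken from the comment above; treat it as an assumption.
DISCLOSURE_WINDOW_DAYS = 45


def latest_disclosure_date(transaction_date: date) -> date:
    """Last day on which a trade could still be disclosed on time."""
    return transaction_date + timedelta(days=DISCLOSURE_WINDOW_DAYS)


def disclosure_lag_days(transaction_date: date, filed_date: date) -> int:
    """Days a copycat trader would have been behind the original trade."""
    return (filed_date - transaction_date).days


trade = date(2020, 1, 1)
print(latest_disclosure_date(trade))                  # 2020-02-15
print(disclosure_lag_days(trade, date(2020, 1, 31)))  # 30
```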

------
davidkuennen
This is nice. I think I will integrate it in my new stock events app. It's
exactly the kind of content that belongs there. Thanks!

------
feep
Takes about 2.5 hours to run on a slow machine (doesn’t look like CPU is an
issue) with a slow internet connection.

There is more processing to be done. There is a Jupyter cell that he says
takes 3 hours to run.

I will make a processed data file available, hopefully this weekend. If
anybody wants it, let me know.

~~~
malshe
That would be great! I am interested in the data. Thanks

~~~
feep
I hope you have a notifier running, because I cannot find an email for you.

Published at
[https://qri.cloud/feep/senate_financial_disclosures](https://qri.cloud/feep/senate_financial_disclosures)

I may add more fields, so check for updates.

------
TekMol
It says it uses the "Yahoo Finance API" but it seems no such API is offered by
Yahoo anymore.

There are other sites that offer access to what they call the "Yahoo Finance
API". I wonder how that works and if those are legit offers.

Does anybody know?

~~~
nsomani
Here's the specific package that I'm using:
[https://github.com/ranaroussi/yfinance](https://github.com/ranaroussi/yfinance)

~~~
TekMol
That seems to be a scraper.

I wonder if it is legal to scrape stock market data and then republish it.

~~~
Ill_ban_myself
Not only is it legal, it’s insanely profitable! See:
[https://www.spglobal.com/marketintelligence/en/campaigns/snl...](https://www.spglobal.com/marketintelligence/en/campaigns/snl-
financial)

~~~
nl
Firstly, this doesn't scrape prices - they pay for a live feed direct from the
market.

Secondly, it isn't legal. They are protected by a license agreement.

But stock prices are _not_ where the value of SNL Financial and/or Capital IQ
lies for most people. I know a bunch of people who pay the $30K license fee
for these (as well as Bloomberg etc.) who barely use the stock price at all.

------
fratimo66
Feinstein selling $ALLO on Feb 18th. Not a good call, I'd say.

~~~
rasengan
Agreed - and she says it was her husband and not her. There needs to be a law
that politicians are no longer single people but in fact come as a package -
and politicians should have 0 expectation of privacy as they serve the public.

~~~
fratimo66
I still believe in the value of individuality. Politicians should have the
same expectation of privacy as regular citizens.

Transactions should not.

~~~
save_ferris
The argument that politicians should have the same expectation of privacy as
regular citizens while also making their stock transactions public is inherently
contradictory. Regular citizens do not have to make their stock transactions
public, but politicians should be required to.

I think it’s entirely reasonable for politicians to assume a life of less
privacy since every aspect of their lives is scrutinized by the press anyway.

------
dznodes
Is it legal for companies to pay politicians in stock? ("donate" stock to them
for favorable policies)

------
dhruvkar
Nice!

Any particular reason you picked using selenium+chromedriver instead of an
http lib (like requests) and beautifulsoup?

~~~
nsomani
It's a dynamic page, so I have to do things like mark checkboxes, click
buttons, etc. But if there's a better way to do it, I'm down to incorporate
it!

~~~
dhruvkar
Usually for dynamic pages, it's still working off of some internal API. So
when I search for first name starts with "a", it's sending a POST request to
[https://efdsearch.senate.gov/search/report/data/](https://efdsearch.senate.gov/search/report/data/)
with the following form data, which returns a nice JSON response.

Obviously you'd have to handle the correct headers and CSRF tokens, but it'll
be easier than Selenium for sure.

      draw: 1
      columns[0][data]: 0
      columns[0][name]: 
      columns[0][searchable]: true
      columns[0][orderable]: true
      columns[0][search][value]: 
      columns[0][search][regex]: false
      columns[1][data]: 1
      columns[1][name]: 
      columns[1][searchable]: true
      columns[1][orderable]: true
      columns[1][search][value]: 
      columns[1][search][regex]: false
      columns[2][data]: 2
      columns[2][name]: 
      columns[2][searchable]: true
      columns[2][orderable]: true
      columns[2][search][value]: 
      columns[2][search][regex]: false
      columns[3][data]: 3
      columns[3][name]: 
      columns[3][searchable]: true
      columns[3][orderable]: true
      columns[3][search][value]: 
      columns[3][search][regex]: false
      columns[4][data]: 4
      columns[4][name]: 
      columns[4][searchable]: true
      columns[4][orderable]: true
      columns[4][search][value]: 
      columns[4][search][regex]: false
      order[0][column]: 1
      order[0][dir]: asc
      order[1][column]: 0
      order[1][dir]: asc
      start: 0
      length: 25
      search[value]: 
      search[regex]: false
      report_types: []
      filer_types: []
      submitted_start_date: 01/01/2012 00:00:00
      submitted_end_date: 
      candidate_state: 
      senator_state: 
      office_id: 
      first_name: a
      last_name:
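
A rough sketch of rebuilding that form data in Python, using only the standard library, might look like the following. The field names come from the dump above. The `X-CSRFToken` header name, the `Referer` value, and both helper names are assumptions to check against the browser's network tab, and the request is only prepared here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

SEARCH_URL = "https://efdsearch.senate.gov/search/report/data/"


def build_search_payload(first_name="", last_name="", start=0, length=25):
    """Rebuild the form fields listed above as (key, value) pairs."""
    fields = [("draw", "1")]
    for i in range(5):  # five table columns, all configured identically
        prefix = f"columns[{i}]"
        fields += [
            (f"{prefix}[data]", str(i)),
            (f"{prefix}[name]", ""),
            (f"{prefix}[searchable]", "true"),
            (f"{prefix}[orderable]", "true"),
            (f"{prefix}[search][value]", ""),
            (f"{prefix}[search][regex]", "false"),
        ]
    fields += [
        ("order[0][column]", "1"), ("order[0][dir]", "asc"),
        ("order[1][column]", "0"), ("order[1][dir]", "asc"),
        ("start", str(start)), ("length", str(length)),
        ("search[value]", ""), ("search[regex]", "false"),
        ("report_types", "[]"), ("filer_types", "[]"),
        ("submitted_start_date", "01/01/2012 00:00:00"),
        ("submitted_end_date", ""),
        ("candidate_state", ""), ("senator_state", ""), ("office_id", ""),
        ("first_name", first_name), ("last_name", last_name),
    ]
    return fields


def build_search_request(csrf_token, **search):
    """Prepare (but do not send) the POST request.

    ASSUMPTION: the token travels in an "X-CSRFToken" header and the
    Referer below is acceptable to the site; neither is confirmed.
    """
    body = urlencode(build_search_payload(**search)).encode("utf-8")
    headers = {
        "X-CSRFToken": csrf_token,
        "Referer": "https://efdsearch.senate.gov/search/",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return Request(SEARCH_URL, data=body, headers=headers, method="POST")


req = build_search_request("token-from-session", first_name="a")
print(req.get_method(), req.full_url)
print(len(build_search_payload(first_name="a")), "form fields")
```

Sending it still needs a cookie-carrying session that has picked up a token first, and paging would presumably mean bumping `start` by `length` on each call.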

~~~
nsomani
That works for the search results, but the pages to display the periodic
transaction reports embed the data directly into tables. Here's an example:
[https://efdsearch.senate.gov/search/view/ptr/8db16cde-8a14-4...](https://efdsearch.senate.gov/search/view/ptr/8db16cde-8a14-4ea2-92ed-71d7f73ad131/)

So I think we'll still need to parse from the HTML. But it might still be
cleaner than using Selenium, so I'm open to any code changes.
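
For the table parsing itself, a minimal sketch using only the standard library's `html.parser` could look like this. The sample markup and column names are made up for illustration and are not copied from a real report page, and fetching the page is left out.

```python
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
            self._cell = None


# Made-up sample markup; real pages will have different columns.
SAMPLE = """
<table>
  <tr><th>Date</th><th>Ticker</th><th>Type</th><th>Amount</th></tr>
  <tr><td>01/02/2020</td><td> XYZ </td><td>Sale (Full)</td>
      <td>$1,001 - $15,000</td></tr>
</table>
"""

parser = TableRows()
parser.feed(SAMPLE)
header, *records = parser.rows
print([dict(zip(header, record)) for record in records])
```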

~~~
dhruvkar
Ah got it. Yeah seems like that's direct HTML.

It would make the script faster and less error-prone for sure. I'll check out
the repo in depth and see if there is any low-hanging fruit.

------
mvkel
What’s the lag time on this? 30 days?

------
zeeone
Great effort, though these kinds of tools are useless unless you can include
senators’ spouses and family </sarcasm>

------
filtercoffee37
Nice idea! This should be an easily navigable web app that every common person
should be able to use. Thanks for building.

