Show HN: A tool to scrape senators' stock transactions for your own analysis (github.com/neelsomani)
350 points by nsomani on May 15, 2020 | 80 comments

Hi everyone, with the recent interest in U.S. senator insider trading, I open-sourced this repo for you to scrape and analyze the senators' stock transactions yourself: https://github.com/neelsomani/senator-filings

I provided a Jupyter notebook with my own simple analysis (limitations included in the README), but of course it can be modified to suit your needs. Let me know if you have any questions - happy to help!

Great idea. Have you done any analysis on the data?

I’d be curious to see if a passive index tracking members of Congress would beat the S&P.

Although it may not account for other sweet deals like this option expiry extension and accelerated vesting https://theintercept.com/2020/05/12/david-purdue-senate-card...

I didn't use this tool, I instead used Senate Stock Watcher [0]. But a cursory review [1] of the trades made by senators showed the amounts to be very small (66% of trades were < 15k) and the typical Senator is a millionaire. In the 4 month period I analyzed, only 3 trades were between 250-500k.

I looked at performance as well, but it's tough to benchmark these things. There were net outflows in the market during that period, and if you sold you did relatively well in a falling market, so the typical trade did well. But the relatively small size of the trades makes me skeptical that insider information was used. Matt Levine said it best [2]:

> Of course they could be lying, but in context the defense seems pretty plausible. (Kelly Loeffler, for instance, controversially dumped about 0.6% of her portfolio at around the same time, which sure seems like the sort of thing an investment adviser would do without any input from her? You could call your adviser and say “a disaster is coming, sell everything!,” but calling them to say “a disaster is coming, sell a tiny bit!” seems pointless.)

But this doesn't include William Barr as he is not a Senator.

[0] https://senatestockwatcher.com/

[1] https://medium.com/ml-everything/analyzing-us-senators-stock...

[2] https://www.bloomberg.com/opinion/articles/2020-05-14/senato...

That's a neat idea, but I think the reporting lag is too long to act on it. It would still be good for retroactive benchmarking.

Also consider that extremely wealthy senators (many are) might only have a small part of their assets in stocks and could be part of a bigger financial strategy, as a hedge against other asset classes rather than an indicator of where they or their agent thinks the market will move.

This is amazing transparency that we needed. With this tool, you may have really changed the future of America to one that’s less corrupt.


I don't think so. Hardly anyone is stupid enough nowadays to trade their own assets on privileged information. Corruption, especially in Western countries, tends to be a bit more sophisticated. A simplified example: Senator A figures out that the stock of company X will drop because of upcoming legislation. He contacts his friend from college, who happens to own a lot of this company's stock. The friend sells in time, just before the stock price crashes, and nets millions. Now Senator A wants something in return. It so happens that his daughter is finishing her marketing studies and will need a job. The friend places the senator's daughter in one of his firms, and now she's a newly minted VP of Marketing.

> The friend sells in time, just before the stock price crashes, and nets millions.

Doesn’t this get looked at by the SEC? Not always, but they monitor these kinds of transactions especially when millions are involved.

Since the "upcoming legislation" is public information, wouldn't this count as simple market speculation rather than insider trading?

This was a simplified example. In real life people usually have very complex social networks, and it can very easily become Senator A -> Senator B -> friend's friend -> neighbor's daughter, etc.

I agree, although I think the currency of favors is usually something less direct than stock tips. Those are too easy to prove.

It will force senators to need a partner in crime. Communication increases the likelihood of getting solid evidence of insider trading, and also gives another leverage point if the partner in crime can be pressured to produce evidence. The smart ones will still get away with it.

I was surprised to hear of blatant Covid-related insider trading by individuals in government. Either some really are too stupid to know better or they feel they have impunity. I fear it's the latter.

At least this will filter out some of the ones too stupid to be allowed in leadership roles.

I would move away from the Yahoo Finance API. There are a few data sources with free access to their APIs; see [1] for an example.

- [1] https://www.alphavantage.co/

I took a look at Alphavantage, but I wanted other data like daily transaction volume + market caps to do things like weighted regressions. The rate limiting was also a little restrictive.

It seems that adding the PDF analysis feature wouldn't be that hard, or am I missing something? Have you tried to incorporate it?

The good news is that it's open source, so instead of asking OP why they didn't add a feature, you can add it.

If someone could help out and make a PR to do the OCR for PDFs, that'd be awesome! Would love to add it in.

It would be fun to compare the timeline of stock transactions by legislators with the timeline of public statements by those legislators and their colleagues pertaining to the companies’ stocks they’re selling, or those companies’ industries in general. Not that I expect such nakedly corrupt pumping and dumping or hyping... but I’ve been more surprised in the past.

"Throw Them All Out" by Peter Schweizer documented a lot of this.


You'd get more meaningful results if you checked _their_ statements with the transactions of their _family members_.

But that's too creepy for me (which is probably what is being counted on).

Hey, you should probably email the guys at tipranks (support@tipranks.com) with this.

They have a tool that takes returns, backtests them, and finds correlations. They run it on hedge funds and can probably find patterns quickly.

Plus they have funding from Eliot Spitzer to do things that bring transparency to the stock market.

Source: I worked there ~4 years ago.

Too bad they do not have to report their activity real-time, then the next step would be to create something like an "index fund" that simply mimics their choices.

I've always wondered if some hedge fund uses senators' transactions as an input to its model.

While the impact of this on corruption is likely to be small, it would be interesting to see how much of an edge on the market it would give you...

The data are old, but from 1993 to 1998, US senators outperformed the market by up to 12.3% [0]

[0] https://insidertrading.procon.org/view.answers.php?questionI...

Who knew our officials were such savvy traders? We must be electing the right people.

Personally, my suggestion is that federal elected officials be required to put all their assets into a blind trust managed by some common third party (e.g., the CBO or a not-for-profit money manager). If public service is truly important to them, that shouldn't be a big barrier. We don't need people with a conflict of interest leading the nation.

Most of them do this. In the recent bashing of a bunch of senators for insider trading, a few of them had their money in blind trusts, which invalidates the criticism. One of them was Feinstein.

How would you deal with illiquid assets? I can't imagine a Republican taking the business they spent their whole life building and putting it into a trust managed by government bureaucrats.

If they don't want to do that, they could always not become a politician? Maybe leave some space for people who aren't business owners, who are spectacularly over-represented?

Exactly. With 325 million people, we shouldn't have a problem finding 542 elected officials who are willing to put their full-time focus on the job.

I'm not seeing the bad part here. Every day there are people who sell off their business and go do something else. If an owner isn't ready for that, well, they can keep running the business. I also think it's fine if the trust doesn't really manage the business, but instead just sells it off and puts the money in stock and bond index funds.

And I think it's not just Republicans who would have to do some hard thinking here. Bloomberg is a Democrat, for example. Does he want to be president, or does he want to run Bloomberg? Both of those seem like plenty for one person to do. I think it's reasonable to say that if he wants to be our leading public servant for a number of years, he should sell off his business to people who have time to focus on it.

On the contrary! In some cases, individuals strive for elected/appointed office because there are clauses which allow for penalty-free exits from locked-up equity.

I realize this doesn't help for illiquid assets, but it does help for assets where you are locked into vesting schedules.

> common third party

so you're proposing one unelected entity which basically has full control of all personal assets of the legislature?

seems like a pretty big potential for vulnerability

You realize that one unelected entity has control over all the retirement assets of 40% of America, right? I think if we can handle the Social Security Administration existing, we can handle somebody running a small fund for less than 600 people.

You know that the SSA does not manage any market invested assets, right?

When Social Security taxes exceed payouts (which they haven't since 2009), the surplus is used to finance deficit spending and the SSA is given a special federal bond.

Sure. But there's an enormous amount of money flowing there, far more than in a tiny investment fund, so it's still a potential vulnerability. The US Treasury and the US Federal Reserve do manage actual assets, and again at enormous scale.

In that time period it was perfectly legal for members of Congress to engage in insider trading.

You are right, the STOCK Act to prevent insider trading was passed in 2012. Since then, studies seem to show that the advantage has disappeared [0]

[0] https://www.nber.org/papers/w26975?utm_campaign=ntwh&utm_med...

That's a suspicious timeline. 98 was the Russian financial crisis. Do you get the same number if you include the crisis?

There is a substantial delay between the time of the actual trades and their filing, so any edge they might have had is long gone by the time you have access to the data.
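
To make the lag argument concrete, here is a minimal sketch of the comparison, not anything from the repo. It measures the same excess return twice: once from the trade date and once from the disclosure date. The 45-day lag is the disclosure window mentioned elsewhere in this thread; the prices, dates, 30-day horizon, and function names are all made up for illustration.

```python
from datetime import date, timedelta

DISCLOSURE_LAG_DAYS = 45  # assumption: filing arrives a full 45 days after the trade


def forward_return(prices, start, horizon_days):
    """Simple return from `start` to `start + horizon_days`.

    `prices` maps date -> closing price and must contain both endpoints.
    """
    end = start + timedelta(days=horizon_days)
    return prices[end] / prices[start] - 1.0


def excess_return(stock, benchmark, start, horizon_days):
    """Stock return minus benchmark return over the same window."""
    return (forward_return(stock, start, horizon_days)
            - forward_return(benchmark, start, horizon_days))


# Toy, made-up prices: the stock jumps 10% ten days after the trade,
# while the benchmark stays flat.
trade_date = date(2020, 1, 2)
days = [trade_date + timedelta(days=i) for i in range(120)]
stock = {d: (100.0 if i < 10 else 110.0) for i, d in enumerate(days)}
benchmark = {d: 100.0 for d in days}

disclosure_date = trade_date + timedelta(days=DISCLOSURE_LAG_DAYS)
edge_for_senator = excess_return(stock, benchmark, trade_date, 30)
edge_for_copycat = excess_return(stock, benchmark, disclosure_date, 30)
print(f"from trade date: {edge_for_senator:.1%}, "
      f"from disclosure date: {edge_for_copycat:.1%}")
```

With real data you would swap the toy dictionaries for actual price series and run this per filing. The point is only that whatever move happens inside the lag window is invisible to anyone trading off the filings.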

Probably not a very good one, somewhat surprisingly. I haven't run any analyses / haven't looked at this dataset in depth but from past forays into congressional datasets I don't really get what a lot of these people are doing. Usually you can throw a dart, check the financial portfolio, and see massive glaring problems immediately. Huge tilts into single / a handful of companies, totally outsized amounts of money in some random local small cap / mid cap, weird biases and tilts into commercial real estate (for example), you name it and you can see it. If anything I might investigate a counter-congress trading plan if you're looking for edge...

Actually, there have been studies done showing their trades beat the market! It is not possible to know their exact return but the studies I've read suggest they have alpha.

I think I’m gonna have to read up a little more, but I’ve seen this in support of the excess returns view:


And this in support that avg congressional portfolio is not performing well:



Looks like there might be a historical component around the STOCK Act that different studies play with for their data window. In either case — I still say for the poor level of diversification within & around asset classes that I don’t see a lot of shining examples of how to achieve success.

> I still say for the poor level of diversification within & around asset classes that I don’t see a lot of shining examples of how to achieve success.

Wouldn't you expect a poor level of diversification in a stock portfolio based on corruption? They aren't dollar cost averaging and diversifying. They're doing the exact opposite to maximize the trade.

Kinda, I guess for a sophisticated “corruption portfolio” I would expect to see suspiciously timed trading but maybe more coherent core holdings. Just because you have inside info doesn’t mean you have a time machine, you still have to protect against unknowns.

I've created a Clojure library to facilitate scraping scenarios like this one:


Writeup of a sample data acquisition + analysis usecase:


Senate Stock Watcher is really cool. I chatted a bit with Tim Carambat (who built the site), and right now he's basically crowdsourcing the labeling for PDFs. Hoping that we can develop on this package and eliminate the manpower required for projects like Senate Stock Watcher.

Could this be adapted to work for British MPs?

Is their data public? The US publishes this data about their senators but I’m not sure if the UK does as well

You could actually implement the trades with https://alpaca.markets if you wanted to.

Unfortunately the senators get 45 days to disclose their trades.

This is nice. I think I will integrate it in my new stock events app. It's exactly the kind of content that belongs there. Thanks!

Takes about 2.5 hours to run on a slow machine (doesn’t look like CPU is an issue) with a slow internet connection.

There is more processing to be done. There is a Jupyter cell that he says takes 3 hours to run.

I will make a processed data file available, hopefully this weekend. If anybody wants it, let me know.

That would be great! I am interested in the data. Thanks

I hope you have a notifier running, because I cannot find an email for you.

Published at https://qri.cloud/feep/senate_financial_disclosures

I may add more fields, so check for updates.

It says it uses the "Yahoo Finance API" but it seems no such API is offered by Yahoo anymore.

There are other sites that offer access to what they call the "Yahoo Finance API". I wonder how that works and if those are legit offers.

Does anybody know?

There definitely is still a Yahoo Finance API for large corporate customers. That's where Apple's Stocks app gets its data, for example. I'm not sure about the third party services---they probably just scrape Yahoo's website and repackage the data.

Here's the specific package that I'm using: https://github.com/ranaroussi/yfinance

That seems to be a scraper.

I wonder if it is legal to scrape stock market data and then republish it.

Not only is it legal, it’s insanely profitable! See: https://www.spglobal.com/marketintelligence/en/campaigns/snl...

Firstly, this doesn't scrape prices - they pay for a live feed direct from the market.

Secondly, it isn't legal. They are protected by license agreement.

But stock prices are not where the value of SNL Financial and/or Capital IQ lies for most people. I know a bunch of people who pay the $30K license fee for these (as well as Bloomberg etc.) who barely use the stock prices at all.

Feinstein selling $ALLO on Feb 18th. Not a good call, I'd say.

Agreed - and she says it was her husband and not her. There needs to be a law that politicians are no longer single people but in fact come as a package - and politicians should have 0 expectation of privacy as they serve the public.

Still applies to standard insider trading law. It's not like a CEO's spouse can trade the company's stock after hearing about his/her day at work.

I still believe in the value of individuality. Politicians should have the same expectation of privacy as regular citizens.

Transactions should not.

The argument that politicians should have the same expectation of privacy as regular citizens while also making their stock transactions transparent is inherently contradictory. Regular citizens do not have to make their stock transactions public, but politicians should be required to.

I think it’s entirely reasonable for politicians to assume a life of less privacy since every aspect of their lives is scrutinized by the press anyway.

This whole debate stems from the fact that we have career politicians, which (at the federal level, at least) is not what the founders envisioned. A citizen legislature cannot, by definition, exist when it's filled by people who have never had a real job.

Term limits fix this from both ends. Politicians are allowed their privacy, because they're just regular people and nobody is going to get rich from a single Senate term or 3-4 House terms. It eliminates the incentive to associate oneself more with DC than with your home district. And it would probably result in people having a better opinion of Congress as a whole because it's not filled with people who have never done anything else.

Is it legal for companies to pay politicians in stock? ("donate" stock to them for favorable policies)


Any particular reason you picked using selenium+chromedriver instead of an http lib (like requests) and beautifulsoup?

It's a dynamic page, so I have to do things like mark checkboxes, click buttons, etc. But if there's a better way to do it, I'm down to incorporate it!

Usually for dynamic pages, it's still working off of some internal API. So when I search for first name starts with "a", it's sending a POST request to https://efdsearch.senate.gov/search/report/data/ with the following form data, which returns a nice JSON response.

Obviously you'd have to send the correct headers and CSRF tokens, but it'll be easier than Selenium for sure.

  draw: 1
  columns[0][data]: 0
  columns[0][searchable]: true
  columns[0][orderable]: true
  columns[0][search][regex]: false
  columns[1][data]: 1
  columns[1][searchable]: true
  columns[1][orderable]: true
  columns[1][search][regex]: false
  columns[2][data]: 2
  columns[2][searchable]: true
  columns[2][orderable]: true
  columns[2][search][regex]: false
  columns[3][data]: 3
  columns[3][searchable]: true
  columns[3][orderable]: true
  columns[3][search][regex]: false
  columns[4][data]: 4
  columns[4][searchable]: true
  columns[4][orderable]: true
  columns[4][search][regex]: false
  order[0][column]: 1
  order[0][dir]: asc
  order[1][column]: 0
  order[1][dir]: asc
  start: 0
  length: 25
  search[regex]: false
  report_types: []
  filer_types: []
  submitted_start_date: 01/01/2012 00:00:00
  first_name: a
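
For anyone who wants to try this, here is a rough stdlib-only sketch of building that request. The URL and form fields are copied from the listing above (the field naming looks like DataTables server-side processing). Everything else is an assumption: the function names are hypothetical, and the `X-CSRFToken` and `Referer` headers are a guess at what the site expects, so check the real request in your browser's network tab before relying on it.

```python
from urllib.parse import urlencode
from urllib.request import Request

SEARCH_URL = "https://efdsearch.senate.gov/search/report/data/"


def build_search_payload(first_name, start=0, length=25, n_columns=5):
    """Rebuild the form data shown above for one page of search results."""
    payload = {"draw": 1}
    for i in range(n_columns):
        payload[f"columns[{i}][data]"] = i
        payload[f"columns[{i}][searchable]"] = "true"
        payload[f"columns[{i}][orderable]"] = "true"
        payload[f"columns[{i}][search][regex]"] = "false"
    payload.update({
        "order[0][column]": 1,
        "order[0][dir]": "asc",
        "order[1][column]": 0,
        "order[1][dir]": "asc",
        "start": start,    # offset of the first row on this page
        "length": length,  # rows per page
        "search[regex]": "false",
        "report_types": "[]",
        "filer_types": "[]",
        "submitted_start_date": "01/01/2012 00:00:00",
        "first_name": first_name,
    })
    return payload


def build_request(payload, csrf_token):
    """Wrap the payload in a POST request (nothing is sent here).

    Assumption: the token goes in an X-CSRFToken header; the site may
    instead (or also) want it as a form field or cookie.
    """
    body = urlencode(payload).encode("utf-8")
    headers = {
        "X-CSRFToken": csrf_token,
        "Referer": "https://efdsearch.senate.gov/search/",
    }
    return Request(SEARCH_URL, data=body, headers=headers, method="POST")


payload = build_search_payload("a")
request = build_request(payload, csrf_token="token-from-the-search-page")
```

To actually send it you would pass `request` to an opener that carries the session cookies from the search page, then bump `start` by `length` to page through the results.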

That works for the search results, but the pages to display the periodic transaction reports embed the data directly into tables. Here's an example: https://efdsearch.senate.gov/search/view/ptr/8db16cde-8a14-4...

So I think we'll still need to parse from the HTML. But it might still be cleaner than using Selenium, so I'm open to any code changes.
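
A minimal sketch of what that HTML parsing could look like without a browser, using only the standard library's `html.parser`. The sample markup and its column names are invented for illustration and are not the real layout of the report pages; BeautifulSoup, mentioned upthread, would do the same job with less code.

```python
from html.parser import HTMLParser


class TableRowParser(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr>."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def parse_table(html):
    """Return the table rows in `html` as lists of cell strings."""
    parser = TableRowParser()
    parser.feed(html)
    return parser.rows


# Made-up sample; the real pages will have different columns and markup.
SAMPLE = """
<table>
  <tr><th>Date</th><th>Ticker</th><th>Type</th><th>Amount</th></tr>
  <tr><td>01/02/2020</td><td><a href="#">XYZ</a></td>
      <td>Sale</td><td>$1,001 - $15,000</td></tr>
</table>
"""

rows = parse_table(SAMPLE)
header, first_trade = rows[0], rows[1]
```

This assumes one flat table per page; nested tables would need extra handling.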

Ah got it. Yeah seems like that's direct HTML.

It would make the script faster and less error-prone for sure. I'll check out the repo in depth and see if there is any low-hanging fruit.

What’s the lag time on this? 30 days?

Great effort, though these kinds of tools are useless unless you can include senators’ spouses and family </sarcasm>

Nice idea! This should be an easily navigable web app that every common person should be able to use. Thanks for building.

