
Search the Full Text of Nonprofit Tax Records for Free - walterbell
https://www.propublica.org/nerds/new-search-full-text-of-3-million-nonprofit-tax-records-for-free
======
schwanksta
Oh hey I built this. Let me know if you have any questions about how it works.

Edit: wrote a little bit about that here -
[https://news.ycombinator.com/item?id=20141744](https://news.ycombinator.com/item?id=20141744)

~~~
ma2rten
How do you get the data? I wasn't able to find the forms for 2018 for some
charities. Did the IRS make them available yet?

~~~
schwanksta
The IRS puts them in an S3 bucket:
[https://docs.opendata.aws/irs-990/readme.html](https://docs.opendata.aws/irs-990/readme.html)

There are 2018 filings in there, but many charities have fiscal years that end
in Dec. IIRC, they generally file within 6 months. Given things like human
error, bureaucracy and filing extensions... more should start rolling in over
time.

------
rtempaccount1
The Linux Foundation results on here are quite interesting. They're showing
pretty rapid "revenue" growth from $39M in 2015 to $61M in 2016 to $81M in
2017.

I'm guessing this is, in no small part, down to increasing conference/event
revenues.

It'll be interesting to see what they do with all the additional cash.

~~~
ec109685
I don’t think so. Corporate sponsorships are a much better source of revenue.

~~~
rtempaccount1
Well sponsorships are part of conf. revenue in a lot of cases.

I've looked at the sponsor packs for the CNCF conferences, and those higher
tiers do not come cheap.

Also don't underestimate smaller individual payments * very large number.

E.g. KubeCon Barcelona: tickets were $900+ each and there were 7700 people, so
we're talking almost $7M from ticket sales alone.

Now the venue ain't cheap but that + all the sponsor cash == a fair profit,
I'd expect.
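
To make that back-of-the-envelope number explicit, here is a minimal sketch using only the two figures quoted above. It assumes every attendee paid the $900 floor price, which is a simplification (speakers, sponsors and discounted tickets would change it).

```python
# Rough estimate of ticket revenue from the figures quoted in the comment.
# Assumption: all 7700 attendees paid the $900 floor price.
ticket_price_usd = 900   # "$900+" each, so treated here as a floor
attendees = 7700

ticket_revenue_usd = ticket_price_usd * attendees
print(f"${ticket_revenue_usd:,}")  # $6,930,000
```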

~~~
RyJones
look here on page 9:

[https://pp-990-rendered.s3.amazonaws.com/201823199349304962_...](https://pp-990-rendered.s3.amazonaws.com/201823199349304962_IRS990_0.html?X-Amz-
Algorithm=AWS4-HMAC-SHA256&X-Amz-
Credential=ASIA266MJEJY4BPWHTVU%2F20190610%2Fus-
east-1%2Fs3%2Faws4_request&X-Amz-Date=20190610T010454Z&X-Amz-
Expires=1800&X-Amz-SignedHeaders=host&X-Amz-Security-
Token=FQoGZXIvYXdzEGgaDFu7GJiems7CDWtIqiKWBIyEJe3KSCuYwUSfrCz1pGk1sf3x9K430voq74APfo4tSoZ7M9X2WosDkPnGTKjakqlOSrRo5oFaj51dg%2BiN1T9I3O4%2Fck8ZxhIoTCMbWJlsSAnksvtTAiEsA1JdJr4N9EL3p4ORQfjHpbRg4caMIrtK5q1vflqGwjX1lNlLxOjnlpxFSuhfDF%2FenCNMU%2F54n33EVQh4tT4aWC6ALs4xu%2FvLnH5zHiEERaOzflZwu5zGpMMyreYMi%2B%2FzLbFOUccTxgdOZzIkdbFId%2BRC%2BcclbLfkeav2EwyJV09safmRGbypowM8muKEHiRXzMXwQ4D1Cy%2B%2BuPLfhnnILK%2Fd9lQ9KiNy%2Fv3fYkdiy%2BJNer2voBabYPlakbD%2FjJeFG4KEodkuG9iPufxc%2Fv7qcwGLIR5hxHDPs4AObWYK6dBIi%2FxG1K1TD4PDjqq4xMay3u9Z5Jh5hoYNF5%2F678IqZ0st2Y80wpaXZAqfeuzDZuLLBVs5j2rqZRhxqFkpIGG0RWzYRbKHJHOMI%2BEJdLwmkSHYfJ%2FNQx%2FDMJkLR4x1W8Qo9VRU2P%2BJlJqTmvGFgRZsxhzUjPVLMxbSke5J79FrLBwyeCXLr3WRpgGBOSzdXpCAJWoUbYn3m45GRTQH8wIc5gM7Z%2FMmDzNaRj3E0TbkvF%2FPeg6XQBiSgGtZythlTGWJqERhp2beLlHSKRlP6mTYFIhL5Ed%2For8h%2FIxK1ijqi%2FbnBQ%3D%3D&X-Amz-
Signature=867dd41becf325410ef0145b577839184ee509a57f180106fe123ddd687a5396)

it breaks down the income into broad buckets

------
danhilltech
We’re building a graph of this same underlying data at
[https://alma.app](https://alma.app) (with a lot of enrichments) to help
people discover and donate to nonprofits.

E.g. here’s Stanford: [https://alma.app/charities/941156365-the-board-of-
trustees-o...](https://alma.app/charities/941156365-the-board-of-trustees-of-
the-leland-stanford-junior-university)

Would love to know what types of analysis people would like to see about
organizations and their relationships?

~~~
walterbell
_> Would love to know what types of analysis people would like to see about
organizations and their relationships?_

Related public records, e.g. court filings, municipal or other co-investments
alongside the nonprofits, adjacent (time or geo) legislation/policy changes,
revolving doors between nonprofit, government and commercial roles.

~~~
danhilltech
Interesting, thanks!

------
netwanderer3
This is amazing work that enables transparency, if every transaction was
indeed reported to the IRS. Typically we would have to manually search each
organization's website to obtain this information, so kudos to them for
publishing everything in one place, with a full-text search feature and an
API.

------
gravypod
How was this created? It would be cool to see how I could download copies of
this for personal analysis.

~~~
PowerfulWizard
[https://registry.opendata.aws/irs990/](https://registry.opendata.aws/irs990/)

A dataset of IRS 990 filings is available there. It is a big collection of
XML files.

Here is an example of one chosen at random:
[https://pastebin.com/pzNYBZYQ](https://pastebin.com/pzNYBZYQ)

EDIT: here is the same thru propublica explorer:

[https://projects.propublica.org/nonprofits/organizations/437...](https://projects.propublica.org/nonprofits/organizations/43740037)

which links here, which is the document I posted:

[https://s3.amazonaws.com/irs-
form-990/201643199349201044_pub...](https://s3.amazonaws.com/irs-
form-990/201643199349201044_public.xml)
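
The link above suggests a simple naming pattern for the XML files: a numeric ID followed by `_public.xml`. A small sketch of building such a URL is below. The pattern is inferred from that single example, and the function name and the term "object ID" are illustrative assumptions, not anything the IRS or ProPublica documents in this thread.

```python
# Base path taken from the example link in the comment above.
S3_BASE = "https://s3.amazonaws.com/irs-form-990"

def filing_url(object_id: str) -> str:
    """Build the XML URL for one e-filed return (hypothetical helper).

    Assumes the "<numeric id>_public.xml" pattern seen in the example link.
    """
    if not object_id.isdigit():
        raise ValueError(f"expected a numeric ID, got {object_id!r}")
    return f"{S3_BASE}/{object_id}_public.xml"

print(filing_url("201643199349201044"))
```

Whether every filing follows this pattern, or whether the bucket stays at this address, is not something the thread confirms.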

~~~
schwanksta
Yup. We’ve been using this data for a while to render e-filed 990s on our site
and to extract highly paid employees. Now we just strip the markup out and
toss it all into elasticsearch for search. It’s really interesting to surface
things like grants.
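
For anyone curious what "strip the markup out" might look like, here is a minimal sketch using Python's standard library. It is not ProPublica's actual pipeline, the element names in the sample are made up for illustration, and the step that sends the text to a search index is left out.

```python
import xml.etree.ElementTree as ET

def strip_markup(xml_text: str) -> str:
    """Return only the text content of an XML document, space-separated."""
    root = ET.fromstring(xml_text)
    # itertext() walks the tree in document order and yields text nodes only.
    chunks = (text.strip() for text in root.itertext())
    return " ".join(chunk for chunk in chunks if chunk)

# Hypothetical sample; real 990 filings use different element names.
sample = """<Return><Filer><Name>Example Org</Name></Filer>
<GrantDesc>General support</GrantDesc></Return>"""

print(strip_markup(sample))  # Example Org General support
```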

I will say for personal analysis that the schema has a habit of changing, and
things like grants can appear in multiple places depending on the context.
What’s more, just 2/3rds of nonprofits e-file now (and I’m sure fewer and
fewer the further back you go). Just some things to look out for.
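
One way to cope with elements moving around between schema versions is to search by tag name anywhere in the tree rather than by a fixed path. The sketch below shows the idea. The namespace and the `GrantAmt`, `ScheduleA` and `ScheduleB` names are placeholders, not real 990 schema names.

```python
import xml.etree.ElementTree as ET

def find_by_local_name(root, local_name):
    """Return all elements with this tag name, ignoring namespace and depth."""
    return [
        el for el in root.iter()
        # Namespaced tags look like "{namespace}Name"; keep the part after "}".
        if isinstance(el.tag, str) and el.tag.rsplit("}", 1)[-1] == local_name
    ]

# Hypothetical document with the same element in two different places.
sample = """<Return xmlns="urn:example">
  <ScheduleA><GrantAmt>100</GrantAmt></ScheduleA>
  <ScheduleB><Detail><GrantAmt>250</GrantAmt></Detail></ScheduleB>
</Return>"""

root = ET.fromstring(sample)
amounts = [int(el.text) for el in find_by_local_name(root, "GrantAmt")]
print(amounts)  # [100, 250]
```

This trades precision for robustness: it will also pick up same-named elements that mean something different in another context, so results still need checking against the schema for each year.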

If you’re interested in processing the 990 XML data though, check out the
truly excellent irsx: [https://github.com/jsfenfen/990-xml-
reader](https://github.com/jsfenfen/990-xml-reader)

~~~
pbhjpbhj
If you don't e-file does that mean the IRS don't digitise your accounts and so
you avoid appearing in these sorts of data sets?

Sounds like a lot of interesting data will be in that last third, in that
case.

------
ykevinator
They also have an API.

------
spenczar5
Deletable nit: could this be retitled to spell out “3 million”? I expected
something about 3M, the industrial conglomerate.

~~~
heimatau
I actually thought the same thing. Or maybe a $ sign prior to 3M. $3M could be
easier to distinguish quickly.

[edit] It seems that this is a link to 3 million records, not 3 million
dollars' worth. It seems I was confused by yet another way the headline
could be read.

~~~
Scipio_Afri
It's not $3 million, it's 3 million tax records from nonprofits, digitally
filed by those nonprofits, from 2011 until now.

"The new feature contains every electronically filed Form 990, 990-PF and
990-EZ released by the IRS from 2011 to date. That’s nearly 3 million filings.
The search does not include forms filed on paper."

~~~
heimatau
Ah, then it should be something like 'Search the Full Text of 3 million
records from Non-profit tax filings' or just add the word million. Thanks
for the correction.

