
An open API for company SEC filings - pranade
http://kimonolabs.com/sec/explorer
======
Animats
There's no need for a third party "open API" for SEC filings. The SEC already
has one.[1] You can also download all SEC filings via FTP.

We've been indexing all SEC filings since 2000, from before the SEC had a
public search engine. You can access them at "www.downside.com". While the
site hasn't been updated in years, the indexing engine continues to run every
day at 4AM, downloading the new SEC filing index for the day. Costs me $15 a
month to keep it running.
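
As a rough illustration of what consuming a daily index involves, here is a minimal sketch. It assumes a pipe-delimited layout of the kind EDGAR's master index files use (CIK, company name, form type, date filed, file name); the sample lines, company names, and the `parse_index` helper are all made up for the example, and this is not the code behind downside.com.

```python
from typing import Iterator, NamedTuple


class IndexEntry(NamedTuple):
    cik: int
    company: str
    form_type: str
    date_filed: str
    filename: str


def parse_index(text: str) -> Iterator[IndexEntry]:
    """Yield one entry per data line, skipping headers and separator lines."""
    for line in text.splitlines():
        parts = line.strip().split("|")
        # Assumption: data lines have exactly five fields and a numeric CIK.
        if len(parts) != 5 or not parts[0].isdigit():
            continue
        yield IndexEntry(int(parts[0]), *parts[1:])


# Invented sample in the assumed layout; not real filings.
SAMPLE = """\
Description: Daily Index (made-up sample)

CIK|Company Name|Form Type|Date Filed|File Name
--------------------------------------------------
1111111|EXAMPLE CORP|10-K|20150128|edgar/data/1111111/0000000000-15-000001.txt
2222222|SAMPLE HOLDINGS INC|DEF 14A|20150128|edgar/data/2222222/0000000000-15-000002.txt
"""

entries = list(parse_index(SAMPLE))
proxies = [e for e in entries if e.form_type == "DEF 14A"]
```

Filtering on `form_type` is then enough to pick out, say, the proxy statements filed that day.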

There's a paid commercial service, "edgar-online.com", which also does this.
They've been unnecessary since the SEC put up a search engine.

[1]
[http://www.sec.gov/edgar/searchedgar/webusers.htm](http://www.sec.gov/edgar/searchedgar/webusers.htm)

~~~
msherry
It's true that the SEC makes this data available free of charge, but it's
typically not useful without a lot of extra processing.
XBRL ([http://en.wikipedia.org/wiki/XBRL](http://en.wikipedia.org/wiki/XBRL))
is a standard of sorts, but there seems to be no enforcement of how it's used
across different companies' filings (or even across multiple filings from the
same company), and many critical pieces of information are filed under e.g.
company-specific extensions. It's definitely a pain working with the data in
any meaningful way.
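
To make the extension problem concrete, here is a small sketch that splits a filing's tags into standard-taxonomy and company-specific ones by namespace prefix. Treating `us-gaap` and `dei` as the "standard" set is a simplifying assumption, and the `xyz:` tag and the `split_by_taxonomy` helper are hypothetical.

```python
# Assumption: anything outside these prefixes counts as a company extension.
STANDARD_PREFIXES = {"us-gaap", "dei"}


def split_by_taxonomy(tags):
    """Return (standard, extension) lists based on each tag's prefix."""
    standard, extension = [], []
    for tag in tags:
        prefix, _, _name = tag.partition(":")
        # A tag with no prefix at all is treated as an extension here.
        if prefix in STANDARD_PREFIXES:
            standard.append(tag)
        else:
            extension.append(tag)
    return standard, extension


# "xyz" is a made-up filer prefix used only for illustration.
tags = [
    "us-gaap:Revenues",
    "us-gaap:Assets",
    "dei:EntityRegistrantName",
    "xyz:SegmentWidgetRevenue",
]
standard, extension = split_by_taxonomy(tags)
```

Anything that lands in `extension` has no shared definition across filers, which is where comparing companies gets painful.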

FWIW, www.downside.com had a hard time reading the latest filings for AAPL,
the only one I tried. Visiting
[http://www.downside.com/cgi/testfinancialsextract.cgi?url=ht...](http://www.downside.com/cgi/testfinancialsextract.cgi?url=http://www.sec.gov/Archives/edgar/data/320193/0001193125-15-023697.txt)
threw an error "ERROR: SGML Parse error: EXCEPTION reading from net: The
filing is bigger than the maximum allowed size of 5000000 bytes. at
EDGAR/netutil.pm line 75."

~~~
Nicholas_C
Getting SEC filing data is an absolute nightmare. Every time I think of a
project that includes SEC filing data (Executive names/ages, MD&A text
analysis, etc.) I skip it and move on to something that's just as interesting
but less time consuming and more doable. There doesn't appear to be a scalable
solution.

~~~
Animats
What's the problem finding executive names and ages? Get the SEC index for a
CIK, pull the latest DEF 14A form[1], and start parsing the tables. Build a 2D
data structure for each table. Look for tables that have column headings
including "Name" and "Age". Then back up from the start of the table to the
previous heading that's not associated with a previous table, and look for
keywords in the heading such as "Director(s)" or "Executives".
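
The table step can be sketched roughly like this, using only Python's standard `html.parser`. The sample HTML, the people in it, and the `TableGrabber` / `name_age_rows` names are invented for illustration. It assumes well-formed, non-nested tables, and it leaves out the step of walking back to the preceding heading.

```python
from html.parser import HTMLParser


class TableGrabber(HTMLParser):
    """Collect each <table> as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


def name_age_rows(html):
    """Return (name, age) pairs from tables with "Name" and "Age" columns."""
    grabber = TableGrabber()
    grabber.feed(html)
    results = []
    for table in grabber.tables:
        for i, row in enumerate(table):
            headers = [cell.lower() for cell in row]
            if "name" in headers and "age" in headers:
                n, a = headers.index("name"), headers.index("age")
                for r in table[i + 1:]:
                    if len(r) > max(n, a) and r[a].isdigit():
                        results.append((r[n], int(r[a])))
                break  # only the first header row per table is used
    return results


# Made-up sample; real proxy statements are far messier.
SAMPLE = """
<h2>Executive Officers</h2>
<table>
  <tr><th>Name</th><th>Age</th><th>Position</th></tr>
  <tr><td>Jane Doe</td><td>52</td><td>Chief Executive Officer</td></tr>
  <tr><td>John Roe</td><td>47</td><td>Chief Financial Officer</td></tr>
</table>
<table><tr><td>Fiscal Year</td><td>Revenue</td></tr></table>
"""

officers = name_age_rows(SAMPLE)
```

Real filings would also need handling for `colspan`, nested layout tables, and headers split across several rows.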

It's tougher when the filer tries to be cool and doesn't use tables for
tabular data.[2] Then you have to figure out which <div> items are line breaks
and which aren't. Fortunately, the SEC doesn't let you put JavaScript or
off-site CSS in a filing; it all has to be in one document.

Yes, dumb scraping techniques like looking for CSS class names won't help, but
it's not really that hard.

[1]
[http://www.sec.gov/Archives/edgar/data/1288776/0001308179140...](http://www.sec.gov/Archives/edgar/data/1288776/000130817914000114/lgoogle2014_def14a.htm)
[2]
[http://www.sec.gov/Archives/edgar/data/1326801/0001326801140...](http://www.sec.gov/Archives/edgar/data/1326801/000132680114000016/facebook2014proxystatement.htm)

~~~
readme
You just justified the existence of OP's API with your explanation.

~~~
Animats
Their system isn't capable of extracting complex info such as executive names
and ages, which is what the requestor wanted. The API only does the easy
stuff, returning fields from XML.

Edgar Online was sort of a data troll. They bought FreeEdgar to make them go
away. After the SEC put up their own search engine, Edgar Online was mostly
unnecessary, and it was sold to RR Donnelly. There's also "secinfo.com", which
someone runs as a spare-time activity and does about as much as Edgar Online.
There's no need for a pay service to get this free data.

------
apendleton
"The information provided herein may be displayed and printed for your
internal use only and may not reproduced, retransmitted, distributed,
disseminated, sold, published, broadcast or circulated." That's a curious
definition of "open."

------
csandstedt
This is a good idea, but as others have highlighted, the issue will be data
quality. At TagniFi we've been pretty vocal about the quality of the XBRL data
because we find a lot of errors. Using the XBRL data directly from the SEC is
the equivalent of drinking pond water since there is very little validation
occurring[1]. This has resulted in a significant number of errors that will
need to be corrected before consuming the data. We've automated some of this
error correction but there are still quite a few that need human involvement.
We also run all of the data through hundreds of QA checks to ensure data
quality in the absence of validation.
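
For a sense of what one such check might look like, here is a minimal sketch of a balance-sheet sanity test. The specific rules, the `check_balance_sheet` name, the tolerance, and the sample numbers are assumptions made for illustration and are not a description of TagniFi's actual checks.

```python
def check_balance_sheet(facts, tolerance=1.0):
    """Return a list of human-readable problems found in one filing's facts.

    `facts` maps a concept label to a reported value for a single period.
    """
    problems = []
    required = ("Assets", "Liabilities", "StockholdersEquity")
    missing = [key for key in required if key not in facts]
    if missing:
        problems.append("missing: " + ", ".join(missing))
        return problems

    # Accounting identity: assets should equal liabilities plus equity.
    gap = facts["Assets"] - (facts["Liabilities"] + facts["StockholdersEquity"])
    if abs(gap) > tolerance:
        problems.append(f"assets differ from liabilities + equity by {gap:,.0f}")

    # A negative total is more likely a sign error than a real figure.
    if facts["Assets"] < 0:
        problems.append("negative total assets (possible sign error)")
    return problems


# Invented numbers: the second filing has equity off by a factor of ten.
good = {"Assets": 1000.0, "Liabilities": 600.0, "StockholdersEquity": 400.0}
bad = {"Assets": 1000.0, "Liabilities": 600.0, "StockholdersEquity": 4000.0}

good_problems = check_balance_sheet(good)
bad_problems = check_balance_sheet(bad)
```

Checks like this only flag that something is wrong; deciding which of the reported values is the bad one is the part that tends to need a human.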

[1] [http://www.tagnifi.com/dont-drink-the-pond-water/](http://www.tagnifi.com/dont-drink-the-pond-water/)

------
wrd
I've lately spent a lot of time trudging through 10-Ks and financial data, so
this is really cool! I've almost made this exact service myself on a number of
occasions but stopped due to the unreliability of XBRL, which meant that even
if I had a sweet API I'd still have to go back and double-check the numbers by
hand.

For me, Excel integration with products like CapIQ is good enough -- for now.

------
hbcondo714
Glad to see this got some good upvotes and discussion as some investors aren't
even aware of the wealth of information SEC filings contain. Having worked
with the SEC's archaic EDGAR "database" at another provider of free SEC
filings and an API[1], I respect what Kimonolabs is doing.

[1] [https://www.Last10K.com](https://www.Last10K.com)

------
cymetica
We use the SEC Edgar API/XBRL/FTP datasets with a good amount of success over
at [http://cymetica.com](http://cymetica.com)

This may seem counter-intuitive, but we are glad to see more advancement in
this area, which results in greater opportunity for information arbitrage.

