
Aleph: Search Over 2 Million Corporate Filings from Oil, Gas and Mining - okket
http://aleph.openoil.net/
======
pudo
The underlying code base is open source. We'd love to have more contributors
for entity/feature extraction, better UX, etc.:
[http://github.com/pudo/aleph](http://github.com/pudo/aleph)

~~~
cs25
How do you scrape the data in real time? Do government agencies have APIs, or do
you guys use something like Scrapy to scrape their websites?

~~~
pudo
The latter. I'm not part of the OpenOil team, but with the install I work on
([https://data.occrp.org](https://data.occrp.org)) we run crawlers every day,
week, or month, depending on how often the source changes.
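
As a rough illustration of that kind of per-source schedule (a minimal sketch, not the actual Aleph or OCCRP code; the source names, the intervals, and the `due_sources` helper are all made up for the example), picking which crawlers to run could look something like this:

```python
from datetime import datetime, timedelta

# Hypothetical sources and crawl intervals; not taken from Aleph or OCCRP.
CRAWL_INTERVALS = {
    "daily_gazette": timedelta(days=1),
    "weekly_registry": timedelta(weeks=1),
    "monthly_filings": timedelta(days=30),
}


def due_sources(last_run, now):
    """Return the names of sources whose crawl interval has elapsed.

    last_run maps source name -> datetime of the last crawl. A source
    that has never been crawled is always considered due.
    """
    due = []
    for name, interval in CRAWL_INTERVALS.items():
        previous = last_run.get(name)
        if previous is None or now - previous >= interval:
            due.append(name)
    return due
```

Something like cron could call this once a day and start a crawler for each name it returns, so slow-changing sources don't get hit more often than they need to.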

