
U.S. OMB to release largest index of government data in the world - danso
http://www.ire.org/blog/transparency-watch/2015/02/24/omb-release-largest-index-government-data-world/
======
danso
Besides the usual quality/depth issues of any dataset, this will likely be a
constant obstacle (emphasis mine):

> These indexes, _if developed and maintained properly by the agencies_, will
> reveal a vast trove of government information to the public.

However, even just having a straight-up list of what exists is a really good
start...a lot of data (and information in general) is not really _kept_
secret, it's just obscure.

A great data anecdote comes to mind, from an investigation that found Florida
police officers with severe misconduct charges were still being employed. The
records of their misconduct were public, in a SQL database, but reporters
didn't previously know about it, and the agency that kept the data...well,
it's not their job to monitor it:

[http://ire.org/blog/on-the-road/2011/12/20/behind-story-trac...](http://ire.org/blog/on-the-road/2011/12/20/behind-story-tracking-police/)

> The misconduct database was the big thing. It had multiple tables in it.
> Then there was a separate employee database, which was a state-wide
> database. It’s basically like a glorified rolodex. (The state) is in charge
> of certification, which is why they keep the database. There is a form that
> officers have to fill out if they change jobs or departments. It is the
> cleanest set of data I have ever worked with. There was no big clean up with
> the data. Sometimes you get a data set and find out it has errors or wrong
> information. Everywhere we turned this data pointed us correctly.

> This was a case where the government had this wonderful, informative dataset
> and they weren’t using it at all except to compile the information. I
> remember talking to one person at an office and saying: “How could you guys
> not know some of this? In five minutes of (SQL) queries you know everything
> about these officers?” They basically said it wasn't their job. That left a
> huge opportunity for us.
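The "five minutes of (SQL) queries" described above boils down to joining the misconduct records against the employment records. A minimal sketch with Python's built-in `sqlite3`, assuming a hypothetical two-table layout: the interview doesn't describe the real tables, so `misconduct`, `employment`, their columns, and the toy rows are all made up for illustration.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only,
# not the real Florida database layout.
SCHEMA = """
CREATE TABLE misconduct (officer_id INTEGER, charge TEXT, severity TEXT);
CREATE TABLE employment (officer_id INTEGER, name TEXT, agency TEXT,
                         end_date TEXT);  -- NULL end_date means still employed
"""


def build_demo_db():
    """Create an in-memory database filled with made-up toy rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO misconduct VALUES (?, ?, ?)",
        [(1, "excessive force", "severe"),
         (2, "late paperwork", "minor"),
         (3, "falsified report", "severe")],
    )
    conn.executemany(
        "INSERT INTO employment VALUES (?, ?, ?, ?)",
        [(1, "Officer A", "Agency X", None),
         (2, "Officer B", "Agency Y", None),
         (3, "Officer C", "Agency Z", "2010-06-30")],
    )
    return conn


def still_employed_with_severe_charges(conn):
    """Return (name, agency, charge) for officers who have a 'severe'
    misconduct record and no employment end date."""
    return conn.execute(
        """
        SELECT e.name, e.agency, m.charge
        FROM misconduct AS m
        JOIN employment AS e ON e.officer_id = m.officer_id
        WHERE m.severity = 'severe' AND e.end_date IS NULL
        ORDER BY e.name
        """
    ).fetchall()


if __name__ == "__main__":
    for row in still_employed_with_severe_charges(build_demo_db()):
        print(row)
```

With the toy rows, only Officer A matches: Officer B's record isn't severe, and Officer C has an end date.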

At least with an index of datasets, you can grep it for something like
"misconduct" or "inspection" or "investigation" and start from there.
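A rough sketch of that kind of keyword search, assuming the index follows the Project Open Data `data.json` style (a top-level `"dataset"` list whose entries have `"title"`, `"description"`, and a `"keyword"` list). The post doesn't specify the index format, and the catalog below is a made-up toy example.

```python
import json

# Assumption: a Project Open Data style data.json; this toy catalog is
# invented for illustration.
TOY_CATALOG = """
{"dataset": [
  {"title": "Restaurant Inspections",
   "description": "Health inspection results by establishment",
   "keyword": ["food", "health"]},
  {"title": "Budget Summary",
   "description": "Annual spending totals",
   "keyword": ["budget"]},
  {"title": "Officer Certification Records",
   "description": "State certification data for law enforcement",
   "keyword": ["police", "misconduct"]}
]}
"""


def search_catalog(catalog, terms):
    """Return titles of datasets whose title, description, or keywords
    contain any of the terms (case-insensitive substring match)."""
    terms = [t.lower() for t in terms]
    hits = []
    for ds in catalog.get("dataset", []):
        text = " ".join(
            [ds.get("title", ""), ds.get("description", "")]
            + ds.get("keyword", [])
        ).lower()
        if any(t in text for t in terms):
            hits.append(ds.get("title", ""))
    return hits


if __name__ == "__main__":
    catalog = json.loads(TOY_CATALOG)
    print(search_catalog(catalog, ["misconduct", "inspection", "investigation"]))
```

Plain substring matching is crude (it will also hit "reinspection"), but for a first pass over an index it's usually enough to decide where to dig.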

~~~
esonderegger
I'll remain skeptical about OMB's ability to know about all the datasets from
federal agencies, because as you said, there is a wealth of government data
that is "public" but obscure.

Another example is the Department of Defense's budget justification data. The
PDF documents that go into the annual President's budget are made public and
are easily found on agency websites. They look like this:

[http://www.saffm.hq.af.mil/shared/media/document/AFD-150130-...](http://www.saffm.hq.af.mil/shared/media/document/AFD-150130-006.pdf)

What the public would have no way of knowing, though, is that attached to
those documents are files with the extension *.zzz. Those are really zip
files, but they need to be renamed because DoD computer systems tend to view
anything with a zip extension as toxic. Inside those zip files are the XML
files used to create the PDF document. The data is both clean and
comprehensive.
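Since the .zzz files are just zip archives, a minimal sketch for pulling the XML out with Python's standard `zipfile` module might look like this. The function name and output layout are my own; the demo builds a toy .zzz file rather than assuming any real DoD file names.

```python
import tempfile
import zipfile
from pathlib import Path


def extract_xml_from_zzz(zzz_path, out_dir):
    """Open a .zzz attachment as the zip archive it really is and
    extract only its .xml members into out_dir. Returns member names."""
    zzz_path, out_dir = Path(zzz_path), Path(out_dir)
    # zipfile checks the archive structure, not the extension, so the
    # .zzz -> .zip rename isn't needed when working from Python.
    if not zipfile.is_zipfile(zzz_path):
        raise ValueError(f"{zzz_path} is not a zip archive")
    out_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    with zipfile.ZipFile(zzz_path) as zf:
        for name in zf.namelist():
            if name.lower().endswith(".xml"):
                zf.extract(name, out_dir)
                extracted.append(name)
    return extracted


if __name__ == "__main__":
    # Build a toy .zzz file so the example runs without any real download.
    with tempfile.TemporaryDirectory() as tmp:
        fake = Path(tmp) / "example.zzz"
        with zipfile.ZipFile(fake, "w") as zf:
            zf.writestr("exhibit.xml", "<root/>")
            zf.writestr("notes.txt", "not xml")
        print(extract_xml_from_zzz(fake, Path(tmp) / "xml"))
```

Getting the .zzz attachment out of the PDF in the first place is a separate step that this sketch doesn't cover.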

I'm writing a set of scrapers that will download the publicly available PDF
files and extract the XML, so if someone wanted to import them into their own
database for querying, they could. I really wish that weren't necessary,
though.
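For the "import them into their own database for querying" step, one schema-agnostic option (an assumption on my part, not necessarily what these scrapers do) is to flatten every leaf element into a generic path/value table using only the standard library. The element names in the toy XML are hypothetical, since the real budget XML schema isn't shown here.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical element names; the real budget XML schema isn't shown.
TOY_XML = """<Budget>
  <LineItem><Title>Widget</Title><Amount>100</Amount></LineItem>
  <LineItem><Title>Gadget</Title><Amount>250</Amount></LineItem>
</Budget>"""


def load_xml_leaves(conn, source, xml_text):
    """Store each leaf element as a (source, path, value) row so the data
    can be queried with SQL before its schema is fully understood.
    Namespaced tags keep ElementTree's '{uri}tag' form in the path."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS leaves (source TEXT, path TEXT, value TEXT)"
    )
    rows = []

    def walk(elem, prefix):
        path = f"{prefix}/{elem.tag}"
        children = list(elem)
        if not children:
            rows.append((source, path, (elem.text or "").strip()))
        for child in children:
            walk(child, path)

    walk(ET.fromstring(xml_text), "")
    conn.executemany("INSERT INTO leaves VALUES (?, ?, ?)", rows)
    return len(rows)


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_xml_leaves(conn, "toy.xml", TOY_XML)
    total = conn.execute(
        "SELECT SUM(CAST(value AS INTEGER)) FROM leaves "
        "WHERE path = '/Budget/LineItem/Amount'"
    ).fetchone()[0]
    print(total)
```

Flattening like this loses which Title goes with which Amount; once the structure is understood, a real loader would key rows by their parent record instead.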

Edit: I should mention that those XML attachments only exist for the
Procurement and RDT&E budgets, which account for about a third of total DoD
spending.

~~~
philipashlock
For what it's worth, OMB is overseeing the Executive Order that requires
agencies to compile these inventories. You can find more background at:

- [https://project-open-data.cio.gov/](https://project-open-data.cio.gov/)

- [http://www.digitalgov.gov/resources/how-to-get-your-open-dat...](http://www.digitalgov.gov/resources/how-to-get-your-open-data-on-data-gov/#federal-data-with-project-open-data)

And you can see how agencies are currently performing at
[http://labs.data.gov/dashboard/offices](http://labs.data.gov/dashboard/offices)

Can you point to some examples or links for the .zzz files you mentioned?

------
mayneack
Is there a list somewhere of what is contained in all these datasets?

~~~
mikecb
That's what OMB is supposedly going to release.

~~~
mayneack
so we just know it's "a lot of data" right now, not "tax stats by county,
Medicare patients in each ZIP code, etc."

------
Shivetya
If anything, perhaps people can show them how to organize their data better to
see where all the duplicated efforts are and rein in some of the spending.

Then maybe, if the right data is available, we can query all the regulations
to find obsolete ones or weed out all the special-interest-driven ones.

------
rrggrr
For those wondering when China, the EU, or others will overtake the United
States' reserve currency status and/or safe-haven status, consider this data
dump.

Currency <-- confidence <-- accountability <-- transparency.

Is the USGOV transparent? No. Is it more transparent than any alternative?
Yes. Auditing the Fed and Ft. Knox would be welcome additions. FOIA access to
US and State Courts would be nice as well.

~~~
taylorwc
Actually the Fed is audited[1]. It's been done by Deloitte for a while now.

[1] [http://www.federalreserve.gov/publications/annual-report/201...](http://www.federalreserve.gov/publications/annual-report/2013-federal-reserve-system-audits.htm)

~~~
dantheman
That's an audit of the Fed's expenses etc., not an audit of its
policy/actions.

~~~
snowwrestler
Yes, the "audit the Fed" movement is really about increasing the influence of
Congress on Fed policy decisions, not financial controls the way we normally
think of them.

The Fed was designed to be independent of Congress and the fight over whether
that was the right decision is still being waged.

~~~
sswaner
And the outcome rrggrr is worried about, China overtaking the US, will happen
shortly after the Fed loses its independence. The open deliberations of
Congress lend little to the fight against special interests subverting common
sense.

