
President Signs Government-Wide Open Data Bill - rmason
https://www.datacoalition.org/press-releases/president-signs-government-wide-open-data-bill/
======
ajr0
> The Chief Data Officer (CDO) will "(1) be responsible for lifecycle data
> management"

I am very interested in what this type of lifecycle might look like
considering that I feel most data should be kept forever. I wonder how a
lifecycle might collide with the challenges that bit rot[0] poses.

[0] [https://www.theguardian.com/technology/2015/feb/13/what-is-bit-rot-and-is-vint-cerf-right-to-be-worried](https://www.theguardian.com/technology/2015/feb/13/what-is-bit-rot-and-is-vint-cerf-right-to-be-worried)
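A common digital-preservation answer to bit rot is fixity checking: record a cryptographic checksum for every file at ingest, then periodically recompute and compare. Here is a minimal sketch, assuming the archive is a plain directory tree and the manifest is just a dict. The function names are hypothetical, not from any particular archiving system.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large scans never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict[str, str]:
    """Record a checksum for every file under root, keyed by POSIX-style relative path."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(root: Path, manifest: dict[str, str]) -> dict[str, list[str]]:
    """Compare the files on disk today against a previously stored manifest."""
    current = build_manifest(root)
    return {
        "missing": [k for k in manifest if k not in current],
        "changed": [k for k in manifest if k in current and current[k] != manifest[k]],
        "new": [k for k in current if k not in manifest],
    }
```

Any file reported as "changed" has silently rotted (or been altered) and would need to be restored from another copy. This is why fixity checks are usually paired with keeping multiple replicas.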

~~~
ocdtrekkie
This is already getting really exciting. Some government entities (including
lower levels like local, county, and state governments) have moved to
digitizing their old paper and microfilm records. But if they're expected to
maintain many types of records essentially forever, it places a constant
burden to continue to update and migrate data in perpetuity, whereas paper or
microfilm can sit in a box in a closet for decades.

For the most part, common file formats like PDFs, JPGs, and TIFs are likely to
be understood for a very, very long time, but you don't just have file
storage, you have systems to manage, index, and find those files, and those
systems are likely to need constant maintenance.

~~~
swebs
I've seen Blu-ray disc manufacturers claim lifespans of over 100 years with
capacities of 100 GB per disc. An entire warehouse of paper and microfilm
documents would be able to fit in a shoebox.

~~~
ocdtrekkie
Note that most record digitization projects are likely looking at live storage
(online disk arrays), as it allows constant access to said records. Also note
that the discs may last 100 years but having disc readers which can read them
may not, and one would need to load all the discs to do a media conversion.
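To trust such a media conversion, each copy is typically verified against the source before the old medium is retired. Below is a rough sketch, assuming both the old disc and the new storage are mounted as ordinary directories. `migrate` and `file_digest` are hypothetical helpers.

```python
from __future__ import annotations

import hashlib
import shutil
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def migrate(src_root: Path, dst_root: Path) -> list[str]:
    """Copy every file from src_root to dst_root.

    Returns the relative paths whose copies do not match the source checksum.
    """
    failures = []
    for src in sorted(src_root.rglob("*")):
        if not src.is_file():
            continue
        rel = src.relative_to(src_root)
        dst = dst_root / rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)  # copies contents plus timestamps
        if file_digest(src) != file_digest(dst):
            failures.append(rel.as_posix())
    return failures
```

An empty return list is the signal that the old discs can be retired. Anything listed would be re-read or recovered from another copy.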

------
teddyh
I am reminded of the "API" decision made by Jeff Bezos at Amazon, as famously
described by Steve Yegge:

 _So one day Jeff Bezos issued a mandate. He's doing that all the time, of
course, and people scramble like ants being pounded with a rubber mallet
whenever it happens. But on one occasion -- back around 2002 I think, plus or
minus a year -- he issued a mandate that was so out there, so huge and
eye-bulgingly ponderous, that it made all of his other mandates look like
unsolicited peer bonuses._

 _His Big Mandate went something along these lines:_

 _1) All teams will henceforth expose their data and functionality through
service interfaces._

 _2) Teams must communicate with each other through these interfaces._

 _3) There will be no other form of interprocess communication allowed: no
direct linking, no direct reads of another team's data store, no
shared-memory model, no back-doors whatsoever. The only communication allowed
is via service interface calls over the network._

 _4) It doesn't matter what technology they use. HTTP, Corba, Pubsub, custom
protocols -- doesn't matter. Bezos doesn't care._

 _5) All service interfaces, without exception, must be designed from the
ground up to be externalizable. That is to say, the team must plan and design
to be able to expose the interface to developers in the outside world. No
exceptions._

 _6) Anyone who doesn't do this will be fired._

[https://plus.google.com/+RipRowan/posts/eVeouesvaVX](https://plus.google.com/+RipRowan/posts/eVeouesvaVX)
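To make rules 1 through 3 concrete: in this model, a team's data is reachable only through a network call, never through a shared table. The following is a minimal sketch using Python's standard-library HTTP server. The `/prices/` endpoint, the item IDs, and the prices are invented for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Hypothetical team-owned data store: other teams never read this directly.
_PRICES = {"book-123": 19.99, "book-456": 7.50}


class PriceService(BaseHTTPRequestHandler):
    """Expose the team's data only through an HTTP interface."""

    def do_GET(self):
        prefix = "/prices/"
        item = self.path[len(prefix):] if self.path.startswith(prefix) else None
        if item in _PRICES:
            body = json.dumps({"item": item, "price": _PRICES[item]}).encode()
            self.send_response(200)
        else:
            body = json.dumps({"error": "not found"}).encode()
            self.send_response(404)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # silence per-request logging for the sketch


def make_server(port: int = 0) -> ThreadingHTTPServer:
    """Bind to localhost; port 0 lets the OS choose a free port."""
    return ThreadingHTTPServer(("127.0.0.1", port), PriceService)
```

Consumers would call `GET /prices/<item>` over HTTP. Because the dict sits behind the handler, the owning team can later swap it for a real database without breaking anyone, which is the point of rule 3.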

~~~
randyrand
This sounds "great", but how exactly does your team that works on, say, a
matrix library for computer graphics, expose its data over the network?

Perhaps instead of it being "all teams" it should be "all cpu processes"?

~~~
anth_anm
They don't, they write a library and others consume it.

It's not a "you must find some data or service to expose". If you have data
and someone wants to use it, they do it via service.

~~~
randyrand
The SQL services started without any data; the service started with empty
tables. Yet they still have a SQL service.

~~~
lugg
Nobody said you couldn't share, communicate or transfer ownership of your data
to another service.

I'd say you're being a pedant but you're not even technically correct.

------
mLuby
> The OPEN Government Data Act requires all non-sensitive government data to be
> made available in open and machine-readable formats by default.

That sounds pretty awesome (and expensive)!
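"Open and machine-readable" generally means formats like CSV or JSON with named fields, as opposed to scanned PDFs. Here is a small sketch that publishes the same records both ways using only the standard library. The records and field names are invented for illustration.

```python
from __future__ import annotations

import csv
import io
import json

# Hypothetical records an agency might publish.
records = [
    {"agency": "Example Dept", "program": "Grants", "fiscal_year": 2018, "amount_usd": 125000},
    {"agency": "Example Dept", "program": "Loans", "fiscal_year": 2018, "amount_usd": 98000},
]


def to_csv(rows: list[dict]) -> str:
    """CSV with a header row: readable by any spreadsheet or script."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def to_json(rows: list[dict]) -> str:
    """JSON array of objects; numbers stay numbers instead of becoming text."""
    return json.dumps(rows, indent=2)
```

The serialization itself is cheap. The expensive part is getting messy internal data into a clean, documented shape like `records` in the first place.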

~~~
dak1
Not if you expand what's considered sensitive!

~~~
mooman219
I'd always want to make sure no PII is accidentally leaked. Example: in 1997,
researchers at MIT showed that using only gender, date of birth and ZIP code,
it is possible to identify the majority of US residents! They proved their
point by identifying the Massachusetts governor's medical records in a
publicly-available dataset that was presumed anonymous. In 2010, Netflix
published an “anonymized” dataset of movie ratings by users. After it was
released to the public, researchers were able to identify many Netflix users,
even though the dataset only contained user ID, movie, rating, and rating
date.
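Both attacks work because several individually harmless columns (quasi-identifiers) combine to single someone out. One standard pre-release check is k-anonymity: every combination of quasi-identifier values must be shared by at least k rows. The following is a minimal sketch. The column names, the sample rows, and the specific generalization rule are assumptions made for illustration.

```python
from collections import Counter

QUASI_IDENTIFIERS = ("gender", "birth_date", "zip")

# Invented sample rows; every quasi-identifier combination is unique.
rows = [
    {"gender": "M", "birth_date": "1950-04-12", "zip": "02138", "diagnosis": "A"},
    {"gender": "M", "birth_date": "1950-09-03", "zip": "02139", "diagnosis": "B"},
    {"gender": "F", "birth_date": "1960-01-15", "zip": "02138", "diagnosis": "C"},
    {"gender": "F", "birth_date": "1960-11-20", "zip": "02134", "diagnosis": "D"},
]


def smallest_group(rows, keys=QUASI_IDENTIFIERS) -> int:
    """Size of the smallest set of rows sharing identical quasi-identifier values."""
    counts = Counter(tuple(row[k] for k in keys) for row in rows)
    return min(counts.values())


def is_k_anonymous(rows, k: int, keys=QUASI_IDENTIFIERS) -> bool:
    return smallest_group(rows, keys) >= k


def generalize(row: dict) -> dict:
    """Coarsen values: keep only the birth year and a 3-digit ZIP prefix."""
    return {**row, "birth_date": row["birth_date"][:4], "zip": row["zip"][:3] + "**"}
```

Here the raw rows are only 1-anonymous, while the generalized rows are 2-anonymous. k-anonymity alone does not stop Netflix-style attacks, which exploit many sparse attributes rather than a few fixed columns.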

~~~
aaomidi
Meh. Voter records are all basically public and most have full name, address,
phone number and even sometimes party affiliations.

I'd argue that keeping those public is also a net good for society.

~~~
skookumchuck
There are two issues in tension here. One is a right to privacy, but the other
is a right to audit who is actually voting. If the voter rolls were secret,
there's nothing to stop a massive fraud by those in power to stay in power.

~~~
mattferderer
Party affiliation should not be public record. I would like to understand why
it is.

~~~
hndamien
Kind of defeats the purpose of a public ballot.

~~~
anth_anm
No, you can have the lists of voters without having any information about who
they voted for (or in this case, party affiliation which gives who they
probably voted for).

~~~
hndamien
Wow! Did I write "public"? I meant "secret" - it's been a long day.

Of all the comments I've made that should have been downvoted, the one I made
above was it. The HN gods are having mercy on me, it seems.

------
crabl
The deep irony here is that [https://www.data.gov/](https://www.data.gov/) is
still down due to the government shutdown.

~~~
randyrand
Just curious, do government services really not have any reserve funding? It
seems like avoiding shutdowns could be solved pretty easily by having reserve
funds (at least for 1-3 months or so).

But perhaps that's the point. Shutdowns are supposed to be inconvenient.

~~~
Maxious
Indeed. "Many agencies, particularly the military, would intentionally run out
of money, obligating Congress to provide additional funds to avoid breaching
contracts."
[https://en.wikipedia.org/wiki/Antideficiency_Act](https://en.wikipedia.org/wiki/Antideficiency_Act)

------
torstenvl
I am deeply afraid of the impact of this law. The amount of meta-work required
to consolidate and annotate data we collect, in order to prepare it for public
consumption, seems likely to hurt government efficiency.

In addition to the administrative burden, it appears to ignore the fact that
non-sensitive information, in sufficient quantity and correlation, becomes
sensitive information.

Perhaps my skepticism is misplaced, but my initial reaction is that this
sounds better in the abstract than it will turn out to be in practice.

~~~
CWuestefeld
Part of my wife's job is to research Medicaid billing codes for every state
(yes, this is a state thing, but I'm just giving an example). Once in a while
she can get their codes in a form as "advanced" as an Excel spreadsheet. But
more likely she'll get a PDF that she has to run through an OCR program to
convert it to a spreadsheet, and then she has to check it for errors. Or
for some states, nothing is published at all - she's got to piece it together
from partner hospital billing records.
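The "OCR, then check for errors" step can be partly automated. One approach is to parse each OCR'd line with a strict pattern and route anything that doesn't match to a human. This is a rough sketch under an invented line layout (a five-character code, a description, and a dollar rate); real state fee schedules vary, and the sample codes used in testing are made up.

```python
import csv
import io
import re

# Assumed layout: 5-char alphanumeric code, description, "$" rate with cents.
LINE = re.compile(r"^(?P<code>[A-Z0-9]{5})\s+(?P<desc>.+?)\s+\$(?P<rate>\d+\.\d{2})$")


def parse_ocr_lines(lines):
    """Split OCR output into parsed rows and lines that need manual review."""
    rows, rejects = [], []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        m = LINE.match(line)
        if m:
            rows.append({"code": m["code"], "description": m["desc"], "rate": m["rate"]})
        else:
            rejects.append(line)  # e.g. an OCR'd letter "O" where a digit belongs
    return rows, rejects


def rows_to_csv(rows) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["code", "description", "rate"])
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

A pattern check like this only catches errors that break the format. An "O" misread inside the code itself would still slip through, which is why published machine-readable data beats any OCR pipeline.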

There's no doubt that getting this data into a sane format will take the
states some extra resources.

But when you consider how much more efficient this will make my wife's
company, and every other provider of Medicaid services, it's bound to be a
huge win on net. And improving efficiency of delivering healthcare should be
important.

The government is big, but the private sector is still much larger. So there's
great leverage to make our overall systems more efficient because an
investment in efficiency on the government side will be multiplied many times
over as seen by the many private entities that the government is overseeing.

~~~
mywittyname
There's money to be made selling this information to hospitals. It's just
really hard to sell things to hospitals.

~~~
drak0n1c
Hospitals routinely spend huge sums of money on new equipment that
significantly improves their competitive edge on diagnoses and outcomes. They
are also willing to spend money on drop-in solutions that lessen the need for
paperwork that eats up admin, nurse, or doctor time.

However, you're right that they are notoriously stingy on buying new things if
the economics aren't immediately apparent, and will never buy into something
that demands radical workflow/org changes.

~~~
mywittyname
By hard to sell to, I mean, it's hard to get in front of the right person. The
people who control the purse strings are not necessarily the most
knowledgeable about the problems at-hand either.

------
En_gr_Student
This sounds like an amazing thing! Anyone serious does their work
reproducibly. In terms of cost, this shouldn't add much beyond the storage on
devices and paper.

------
tracker1
Haven't dug in... does this include data as part of government funded grant
studies? It is a nice start to more open data from the govt.

~~~
jointpdf
I searched/skimmed through the full text of the bill, and it doesn't seem that
it applies to data used/generated by grant-based research (or other projects).
The language of the bill is heavily focused on executive agencies.

That would be ideal, although a large portion of grant funding goes to medical
research (e.g. via NIH), where it would be difficult to anonymize the data.
They could require that it be sanitized (differential privacy, etc.), but I
don't know how that could be verified effectively/efficiently. The grants
process is quite time-consuming for the grantees (and the grantors) without
this requirement.

*not a lawyer
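For a sense of what "sanitized (differential privacy)" involves, the textbook building block is the Laplace mechanism: release a statistic plus noise drawn from Laplace(0, sensitivity/ε). This is a sketch using only the standard library. It relies on the fact that the difference of two independent exponential draws with mean b is Laplace-distributed with scale b. The ε values are arbitrary choices.

```python
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Laplace(0, scale) sample: difference of two Exponential draws with mean `scale`."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(true_count: int, epsilon: float, rng: random.Random,
                  sensitivity: float = 1.0) -> float:
    """Laplace mechanism for a counting query.

    Adding or removing one person changes a count by at most 1 (the sensitivity),
    so noise with scale sensitivity/epsilon gives epsilon-differential privacy.
    """
    return true_count + laplace_noise(sensitivity / epsilon, rng)
```

For example, `private_count(1200, epsilon=0.5, rng=random.Random())` returns 1200 plus noise with scale 2. A smaller ε means more noise and stronger privacy. The sketch does nothing about the verification problem raised above: checking that a grantee actually applied it correctly is a separate matter.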

~~~
count
Data generated by a researcher on a government grant generally includes a
provision in the grant that says the govt owns the data, doesn't it? And if
the govt owns it, this seems to indicate it should be open...

~~~
jointpdf
I'm not really knowledgeable enough on policy to know how this might interact
with the new law. But based on my reading, it seems that institutions
conducting federally-funded research are required to retain any data generated
by the research, but not necessarily that the government "owns" it, per se.
There was a change in policy in 1999 that required the ability for the public
to use FOIA to access grant-funded research data, with some limitations.

"To balance the need for public access while protecting the research process,
OMB’s revision limits the kinds of data that will be made accessible (it
excludes personal and business-related confidential data) and limits
applicability to federally funded data relating to published research findings
produced under a federal award and used in developing an agency action that
has the force and effect of law."

[https://fas.org/sgp/crs/secrecy/R42983.pdf](https://fas.org/sgp/crs/secrecy/R42983.pdf)

------
maxxxxx
I am surprised that in the current political climate any reasonable laws can
pass. My only explanation is that there were no lobbyists fighting it.

~~~
HumanDrivenDev
Lobbyists are probably fighting laws you don't like as well.

~~~
TomMckenny
Even a broken clock is right twice a day.

~~~
maxsavin
Time isn't real

------
RickJWagner
At least two things to like in this:

1) Open is better

2) Bipartisan legislation, indicates some progress is possible

------
ams6110
Is this the first act whose name is a recursive acronym?

------
kgwxd
Does he get paid to work today?

Edit: It was an honest question I wanted to know the answer to, but I've got
my answer.

~~~
zbyte64
Depends: is the Secret Service still able to pay the Trump hotel for their stay?

