
Let’s Move Beyond Open Data Portals - anemani10
https://medium.com/civic-technology/rethinking-data-portals-30b66f00585d#.6c9bktzge
======
dj-wonk
To me, the article missed the point.

The central claim of the article is: "I actually think it’s time we abandon
data portals altogether." I didn't find any compelling argument to support
this claim. The author mentions some trends; however, those trends do not
suggest that source data isn't valuable. Rather, they suggest that we need to
_build on top of_ data. No surprise there!

Sure, particular applications and services may appear to be more valuable than
the raw data, but this does not suggest that the raw data is not valuable.
Quite the opposite. I don't think any savvy person would suggest that a data
portal is the end goal -- they are just the beginning.

I'd suggest something simple and probably non-controversial: data portals make
sense to the extent that they provide value over a long time frame (perhaps 5
to 20 years -- or longer).

Here is a simple way to look at how data can add value.

1. Availability first (it has to exist).

2. Discoverability second (people and services need to find it).

3. Applications third (higher value can be extracted here).

(Note about me: I've worked on a now-defunct data catalog a while back. I
don't have illusions about them, and I think many can be improved in key ways.
Also, I know they require more maintenance than some would admit.)

------
6d0debc071
As long as I can get the data, it's fine. What I'd be worried about is apps
becoming the only way an end user can get access to information about their
environment. That would seem like a massive step backward, however much those
applications, as part of a broader context, may enhance services.

~~~
echochar
Agree. Webmasters and app developers are free to follow the trends and pursue
whatever presentation and delivery mechanisms they like. But as a backup, the
bulk data also needs to be mirrored on at least one FTP server and available
to the public in an open, accessible file format (e.g., CSV, SQL tables, XML,
JSON).

------
ThomPete
_"...A good example of this is Foursquare. Beforehand you’d do everything in
one app. Now there’s Foursquare and Swarm. Facebook has Messenger as a
separate app. Google has like 17 different apps. You’re seeing this shift from
just one specific application that does everything to many different
applications designed for a particular experience..."_

This is true in the West, but is the same true everywhere else? As far as I
remember, China and Japan have apps that do everything in one, with no
indication that this is going to change.

------
danso
Sorry... I just have to disagree with the OP. Several years ago, Socrata
stopped by where I worked (a news organization) and told us of their idea to
build portals for city governments everywhere that would host their datasets.
They were new at the time and I just thought they were bonkers.

Now, I can't believe what you can find on the various data portals. There's a
lot of shit data but that's because lots of organizations collect shit data.
But for the organizations that _do_ have data, Socrata is such a huge step
from what existed before.

I'll set aside the many situations in which agencies just didn't put out data
at all. Dallas, Texas is one exception: it has been posting its crime data for
years. Except it was on an FTP site with a convoluted structure, and it wasn't
all in one file. So you had to write a script that spidered the
subdirectories, downloaded the files, unzipped them, and concatenated them
(and I don't think it was a straight-up concatenation).
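The merge step alone looked something like this. A hedged sketch (the real
directory layout and file names are long gone, so these are invented):
concatenating the downloaded CSV fragments while dropping each fragment's
repeated header row, which is why it was never a straight `cat *.csv`:

```python
import csv
from pathlib import Path

def concat_csvs(parts_dir, out_path):
    """Merge all CSV fragments in a directory into one file, keeping
    only the first fragment's header row (each fragment carries its own)."""
    parts = sorted(Path(parts_dir).glob("*.csv"))
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for i, part in enumerate(parts):
            with open(part, newline="") as f:
                reader = csv.reader(f)
                header = next(reader)
                if i == 0:
                    writer.writerow(header)
                writer.writerows(reader)
```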

Now it's all on one page, from which you can export the data as bulk CSV [1].
Because Socrata's REST API is so straightforward, you can just script your
data requests to hit the right endpoints. And not only is there the incident
data; there's the narrative data [2] (which had also been on the FTP site, but
required its own spidering), and there are tables I hadn't seen before, such
as unknown suspects [3] and real-time active calls [4].
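The scripting really is about this simple. A sketch assuming Socrata's
standard SODA `/resource/<id>.json` endpoint with `$limit`/`$offset` paging
(the dataset IDs are the ones from the links below):

```python
import json
import urllib.parse
import urllib.request

BASE = "https://www.dallasopendata.com/resource"

def soda_url(dataset_id, limit=1000, offset=0, where=None):
    """Build a SODA query URL for one page of a dataset."""
    params = {"$limit": limit, "$offset": offset}
    if where:
        params["$where"] = where  # optional SoQL filter, e.g. "beat = '114'"
    return f"{BASE}/{dataset_id}.json?" + urllib.parse.urlencode(params)

def fetch_all(dataset_id, page_size=1000):
    """Page through a dataset until an empty page comes back."""
    rows, offset = [], 0
    while True:
        with urllib.request.urlopen(soda_url(dataset_id, page_size, offset)) as resp:
            page = json.loads(resp.read())
        if not page:
            return rows
        rows.extend(page)
        offset += page_size
```

So `fetch_all("9fxf-t2tr")` would pull down the active-calls table in one
call; swap in any other dataset ID for the rest.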

On top of that, the police department has even decided to put up the data for
their _officer-involved shootings_. Mind you, they were _already_ ahead of the
game nationwide last year, when they created a parseable (via scraper) webpage
with HTML tables and PDFs. They certainly didn't have to make their data even
easier to get, but they did [5].

Texas has always been generally good about public records because of its
broad sunshine law. But it's not that the law turned agencies into free-data-
hippies overnight... they just have a tradition of doing it. It helps a lot if
you're a Texas employee and you've seen how everyone else just agrees to
potentially damaging records requests, and yet no one gets stressed out.

I have to think that Socrata, just by being _there_ as an option, not just in
Dallas, but everywhere, has made bureaucrats more aware of how data sharing
can just be... _done_. Certainly, there are always officials who will push
back, because they're power-control-freaks or because they have something to
hide. But plenty of bureaucrats don't really care...they've just been told by
their IT people that putting up data in an easy way would cost too much and be
too much of a security compromise. Now that a general-purpose data portal is
an option, there are fewer reasons to say no.

Just to give you an idea of how technically clueless many bureaucrats are (and
I don't really blame them, but their agencies for not prioritizing tech
training)...it is still not unheard of to be denied access to machine-readable
data -- e.g. they _print out a spreadsheet_ and fax it to you, instead of just
sending you the XLS -- because they think that if they give you the
spreadsheet, you can "alter the data".

Yes, it really is that dumb.

edit: to the author's credit, he's not saying that open data portals should be
closed, just that governments should move beyond them. That's a nice
sentiment, but in reality, it's an idea that takes away resources from
_improving_ data portals.

From TFA:

> _Now we actually give that directly to Waze, so they can reroute people
> dynamically. Indeed, this is a good open data story — taking the data to
> where people are —but there’s something more interesting: it’s a two-way
> street. Not only does Waze now share pothole and road condition data it
> collects regularly through its app, they went one step further. They began
> to proactively collect and share data in the interest of public safety._

But _that can already be done_ with the existing LA data portal and its REST
API. Why does the city of Los Angeles have to give Waze anything other than
the GET endpoint, from which Waze engineers can download whatever they like?
And not just Waze, but everyone else, in equal measure. So there's nothing
_wrong_ with what the author wants; he just doesn't seem to realize that, with
APIs, developers can create far better and far more resources than the city
could build itself.

And no, the city (unless it has a magical source of revenue) can't both build
out more "human" data applications and improve its open data pipelines. The
latter has much, much further to go before the city should spend IT money on
building new apps.

[1] https://www.dallasopendata.com/Police/Dallas-Police-Public-Data-RMS-Incidents/tbnj-w5hb

[2] https://www.dallasopendata.com/Police/Bulk-Police-Narrative/inke-qqax

[3] https://www.dallasopendata.com/Police/Dallas-Police-Public-Data-Unknown-Suspects/jitt-qwwh

[4] https://www.dallasopendata.com/dataset/Dallas-Police-Active-Calls/9fxf-t2tr

[5] https://www.dallasopendata.com/Police/Dallas-Police-Public-Data-Officer-Involved-Shootin/4gmt-jyx2

------
programnature
The faster people realize this the better.

What does the technology look like for achieving these kinds of goals?

~~~
pandacam
I would say it's actually fairly simple from a technology perspective: good,
well-documented APIs, as many SaaS apps as reasonable, and CTOs in government
who get it.

~~~
BMarkmann
I think the first two are, indeed, fairly simple from a technology
perspective. The third, CTOs in government (or whatever other role is
responsible for making the data in question available), isn't simple or even
technology-related.

From my perspective -- both working with groups mandated to make data
available and with researchers consuming public datasets -- those responsible
for making data available DON'T get it in the vast majority of instances.
It's tough to tell whether it's obtuseness, incompetence, or... call it what
you will, but if you are mandated to publish information that can directly
assess your performance, or the performance of the organization you lead, you
might not have the right incentives.

A recent experience: the federal (US) government releases data on clinical
trials conducted by drug companies and universities for download in a format
that it basically made up. OK, no problem, I've written lots of parsers.
Ingest the data from the source files... but wait! There's no data dictionary,
or even a vague description of the relationships between the contents of the
(many) files they publish. You can make pretty good guesses, but it definitely
doesn't follow a well-documented API (or schema, whatever). Just a recent
gripe that's stuck in my craw, but it's not an isolated case in my experience.
I have come across a few that are very good and follow the best practices you
note, but most I've worked with do not. I would guess that the former have
your third characteristic; the latter likely do not.
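With no data dictionary, the first move is usually to reverse-engineer
candidate join keys yourself. A rough heuristic, sketched with invented file
and column names: flag any column name that recurs across files as a possible
foreign key, then verify the guesses by hand:

```python
import csv
from collections import defaultdict
from pathlib import Path

def candidate_join_keys(dump_dir):
    """Map each column name to the set of CSV files whose header contains it;
    columns appearing in two or more files are candidate join keys."""
    seen = defaultdict(set)
    for path in Path(dump_dir).glob("*.csv"):
        with open(path, newline="") as f:
            header = next(csv.reader(f), [])  # first row only
        for col in header:
            seen[col].add(path.name)
    return {col: files for col, files in seen.items() if len(files) > 1}
```

It only surfaces guesses, of course, which is exactly the problem: a one-page
data dictionary from the publisher would make this script unnecessary.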

