

Featured dataset cannot be downloaded - yid
http://www.data.gov/research/

======
philipashlock
Thanks for pointing this out. We've updated the highlighted dataset to
something that is better maintained, but if you find other problems please
file a ticket on GitHub
([https://github.com/GSA/data.gov/issues](https://github.com/GSA/data.gov/issues))

These kinds of issues will improve as a new approach to listing datasets is
implemented. This is currently underway and is described on Project Open Data
- [http://project-open-data.github.io/implementation-guide/](http://project-open-data.github.io/implementation-guide/)

We're also working on some basic automated tests to ensure listings have the
correct MIME type (which is incorrect far too often) and that URLs aren't 404s
or redirects or HTML pages rather than raw data.
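As a rough illustration of what such a check might look like (a minimal sketch, not the actual data.gov tests; the function names `find_problems` and `check_listing` are made up for this example), the three conditions can be tested against the response headers. It assumes the server answers HEAD requests, which not every server does.

```python
from urllib.error import HTTPError
from urllib.request import Request, urlopen


def find_problems(url, declared_mime, status, final_url, content_type):
    """Return a list of problems for one listing, given what the server sent."""
    problems = []
    if status >= 400:
        problems.append(f"HTTP {status}")
    if final_url != url:
        problems.append(f"redirected to {final_url}")
    # Drop parameters such as "; charset=utf-8" before comparing.
    served = content_type.split(";")[0].strip().lower()
    if served == "text/html" and declared_mime.lower() != "text/html":
        problems.append("HTML page instead of raw data")
    elif served != declared_mime.lower():
        problems.append(f"declared {declared_mime}, served {served or 'nothing'}")
    return problems


def check_listing(url, declared_mime, timeout=10):
    """Fetch headers for `url` and report problems (performs a network request)."""
    request = Request(url, method="HEAD")
    try:
        # urlopen follows redirects, so compare the final URL with the original.
        with urlopen(request, timeout=timeout) as response:
            status = response.status
            final_url = response.geturl()
            content_type = response.headers.get("Content-Type", "")
    except HTTPError as error:
        # 4xx and 5xx responses are raised as exceptions by urlopen.
        status, final_url = error.code, error.url
        content_type = error.headers.get("Content-Type", "")
    return find_problems(url, declared_mime, status, final_url, content_type)
```

Keeping the comparison logic in `find_problems` separate from the network call means the rules can be exercised without touching any live server.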

For those interested, please do consider applying for the open position -
[https://www.usajobs.gov/GetJob/ViewDetails/363324500](https://www.usajobs.gov/GetJob/ViewDetails/363324500)

I'll do my best to answer any questions about the position and the process of
applying.

------
yid
(Submitter here) This is so frustrating. There are so many links to important
datasets that are just broken, such as the NHTSA crash ratings:
[http://catalog.data.gov/dataset/new-car-assessment-program-n...](http://catalog.data.gov/dataset/new-car-assessment-program-ncap-5-star-safety-ratings)
(try clicking "download" for an empty XML file)

The others link to agencies' download pages. It feels like they forgot that
the purpose of data.gov is data first.

~~~
8ig8
I googled that dataset. I was happily going to post this link from my search
and be your hero:

[https://explore.data.gov/Science-and-Technology/DOE-Green-En...](https://explore.data.gov/Science-and-Technology/DOE-Green-Energy-Patents-Data-Service/4dnd-s6im)

But, I figured I'd try to download the CSV to make sure it works. No dice. I
end up with this:

[http://www.osti.gov/home/RETIRED/greenenergy?format=csv](http://www.osti.gov/home/RETIRED/greenenergy?format=csv)

"The DOE Green Energy product has been discontinued. All of the information in
DOE Green Energy can be found in SciTech Connect."

OK, so I follow the link to SciTech Connect:

[http://www.osti.gov/scitech/](http://www.osti.gov/scitech/)

Of course it is to a general landing page. So I search on that page for: "DOE
Green Energy" (using quotes)

And I'm presented with one result: "Ellens Test2". Here's a screenshot:
[http://i.imgur.com/08rHfcZ.png](http://i.imgur.com/08rHfcZ.png)

If that's what you're looking for, then I found it. If not, I give up.

~~~
yid
> If that's what you're looking for, then I found it. If not, I give up.

Thank you for the effort :) Unfortunately, I have no idea what that is, but
someone could probably tell Ellen that it works.

------
jowiar
They're offering a pretty decent chunk of change if someone wants to fix it:

[https://www.usajobs.gov/GetJob/ViewDetails/363324500](https://www.usajobs.gov/GetJob/ViewDetails/363324500)

~~~
yid
Something makes me feel like that might be an uphill battle. The focus of the
entire site appears to be a slick superficial UI. There is no attempt to
consolidate data from various agencies into a uniform access format (say a
JSON API or periodic dumps to a user-pays S3 bucket). The data is in XML or
CSV or even XLS. The whole implementation seems a bit misguided, to be honest.
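
To make the "uniform access format" idea concrete, here is a minimal sketch (not anything data.gov provides; `csv_to_records`, `xml_to_records`, and the `row_tag` parameter are hypothetical names) that normalizes CSV and a flat XML layout into the same list of records, ready to be served as JSON. It assumes each XML row is an element whose children are the fields.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def csv_to_records(text):
    """Parse CSV text with a header row into a list of dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def xml_to_records(text, row_tag):
    """Parse flat XML: every <row_tag> element becomes one dict of its children."""
    root = ET.fromstring(text)
    return [{child.tag: child.text for child in row} for row in root.iter(row_tag)]


# Made-up sample data in two source formats.
csv_text = "id,name\n1,alpha\n"
xml_text = "<rows><row><id>1</id><name>alpha</name></row></rows>"

# Both sources end up as the same JSON document.
print(json.dumps(csv_to_records(csv_text)))
print(json.dumps(xml_to_records(xml_text, "row")))
```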

~~~
eplanit
I agree. I wonder if CGI (who also developed healthcare.gov) was hired to
implement it.

------
bubbafat
The data set disappearing is explained a bit more here:

[http://www.osti.gov/home/RETIRED/greenenergy?format=csv](http://www.osti.gov/home/RETIRED/greenenergy?format=csv)

Which leads to here:

[http://www.osti.gov/home/ostiblog/osti-re-focusing-and-re-ba...](http://www.osti.gov/home/ostiblog/osti-re-focusing-and-re-balancing-its-operations-%E2%80%93-and-refreshing-its-home-page-%E2%80%93-advance-p)

Which explains the data is here:

[http://www.osti.gov/scitech/](http://www.osti.gov/scitech/)

Which I appreciate is not what you wanted - but it's what you got.

------
pbbakkum
This is a problem close to my heart, thanks for pointing it out. I have a
project up at [http://commonwealth.io](http://commonwealth.io) attempting to
reduce the barrier to using data like this. Even when the data is accessible,
it could be in any format and you have to load it into a DB to get any value.

The idea is to enable real SQL queries directly on data sets, so you don't
need to worry about accessibility or how you'll query it. Not much data in it
now, and it doesn't solve your immediate problem, but perhaps a better model
for storing and representing this data.
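
For a sense of the "load it into a DB" step being described, here is a minimal local sketch using Python's built-in `sqlite3` (this is not how commonwealth.io works; the `load_csv` helper, the table name, and the sample rows are all made up for illustration). Every column is stored as text, so numeric comparisons need a cast.

```python
import csv
import io
import sqlite3


def load_csv(connection, table, csv_text):
    """Create `table` from CSV text with a header row; all columns are TEXT."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, records = rows[0], rows[1:]
    columns = ", ".join(f'"{name}" TEXT' for name in header)
    placeholders = ", ".join("?" for _ in header)
    connection.execute(f'CREATE TABLE "{table}" ({columns})')
    connection.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', records)


# Made-up sample rows, not real ratings data.
SAMPLE = "make,model,stars\nAcme,Roadster,5\nAcme,Wagon,3\nZenith,Coupe,4\n"

connection = sqlite3.connect(":memory:")
load_csv(connection, "ratings", SAMPLE)
top = connection.execute(
    "SELECT make, model FROM ratings "
    "WHERE CAST(stars AS INTEGER) >= 4 ORDER BY model"
).fetchall()
print(top)
```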

------
ewams
This is a great example of a use case for BitTorrent Sync. They just
need to create a read-only secret for each dataset they want to share and then
put it on the website. Then the data is P2P'd for cheaper and much easier to
maintain. Plus if the Government shuts down or has an issue the data can still
be transferred.

------
bartleeanderson
I found this link after playing around on their site. I just entered "green
energy" as a search term. Hope this helps:
[http://www.osti.gov/doepatents/index.jsp](http://www.osti.gov/doepatents/index.jsp)

------
dlapiduz
The project is open source, you can submit an issue (or a PR) here:
[https://github.com/gsa/data.gov](https://github.com/gsa/data.gov)

------
ChuckMcM
Interesting, I left a note saying the dataset was unavailable. It looks like
it was created 22-Apr-2010 so it is probably fairly stale at this point.

------
stevejohnson
This headline is very editorialized. Submitter should write a short blog post
and submit that.

~~~
yid
Sorry, I thought this was worth sharing and didn't have the inclination to
write a blog post.

~~~
alialkhatib
Those clauses seem contradictory. You have a very clear (and in my opinion on
point) critique of what you're presenting us, but you don't have the
inclination to write a blog post? You even wrote your own top-level comment
providing some context. _That_ would have made a sufficient blog post right
there.

(A great blog post might have compared bad vs good cases, or provided some
suggestions on improving, or even looked more broadly at whether this was an
outlier of badly structured/provided data or just the norm)

The whole "Try (downloading|accessing) _this_..." strikes me as the same kind
of title as Upworthy's "You'll never believe _this_..." in that it seems
vaguely like link-baiting. I'm a little too curious not to click just to see
what the dataset is about.

~~~
yid
> You have a very clear (and in my opinion on point) critique of what you're
> presenting us, but you don't have the inclination to write a blog post?

Correct. I don't have a blog, and don't care to set one up. And I'd imagine a
.gov link would have a lower chance of being linkbait.

~~~
alialkhatib
There are mediums (pun not intended) where you can make the equivalent of a
blog post without much effort (Medium, Quora, I'm sure there are others). And
link-bait is more about the marketing used to get people to click on the link
(e.g. editorializing), isn't it?

I don't want to get into semantics, so let's consider this a minor detail,
because the more important matter is that you're editorializing. HN has a
pretty clear (and helpful) set of guidelines on posting:
[http://ycombinator.com/newsguidelines.html](http://ycombinator.com/newsguidelines.html)

> If you want to add initial commentary on the link, write a blog post about it
> and submit that instead.

I'm not trying to harangue you or anything; you make a good point but it's
hamstrung by a length-limited title section.

~~~
warrenmcwin
I really appreciate your efforts in explaining the posting guidelines here!
It's about respecting the forum; this is _not_ any one individual's personal
editorial-logspace, but instead a place to share and discuss full-bodied web
content.

Now, to defend this poster -- it's a quick point that speaks volumes.

"A link is worth a thousand bytes," sometimes.

