
Congressional Data is Defective By Design - fogus
http://www.opencongress.org/articles/view/1227
======
seiji
The author focuses too heavily on a useless anti-PDF rant. Why is it a PDF?
Because it's better than just sending us the Word file the PDF came from.

The PDF file also meets the exact requirements the author is complaining
about: "You should have — at your fingertips — immediate, unrestricted digital
access to the full text of any piece of legislation the very moment it’s
released publicly by Congress." I'm reading the document on my laptop and I
can select and copy text from it. How much more "unrestricted digital access"
can we get?

Sure, the PDF in question doesn't have in-document links to jump to parts. But
it has delineated pages. Never underestimate how un-tech-savvy congresspeople
currently are. They probably want one authoritative version of their document.
For modifications, they probably say "the third paragraph on page 20" for ease
of reference [1].

The author ideally wants everything generated by a government to be handed
over on a silken pillow in perfect Legislative DocBook markup with all
metadata painstakingly entered and kept up to date. While demanding all
information be free is a laudable goal, it would be better to spend effort
implementing and backing such plans than screaming at people who don't have a
clear path to fix their current shortcomings [2].

[1]: They could use section headings and then reference from there, but that
incurs another lookup of "Go to the table of contents, find the page number,
go to the page, find the reference" instead of "Turn to page, see reference."

[2]: If a clear path to fixing their current shortcomings does exist, please
work with them instead of working at them.

~~~
raintrees
It might be beneficial to get the digital copier companies involved in this
discussion. If Congress is anything like my various clients who are attorneys,
they are using a digital copier to "publish" the file as a PDF. The only other
choices currently offered by the Minolta and Kyocera machines I have supported
are TIFF... or paper.

------
tokenadult
The article proposes a process for openness and Web publication of
congressional draft legislation that I'm pretty sure is already followed by
the Minnesota Legislature on its website.

<http://www.leg.state.mn.us/>

There should be no technical difficulty in doing this at all, if a large-
population state has already been doing it for years, so the only barrier to
Congress being equally open is cultural. Note that Minnesota has, officially,
three "major" political parties and currently has divided state government,
with the governor from one major party and the two houses of the state
legislature controlled by another. If those competing politicians can get
along well enough to publish current information on draft bills, Congress
should too.

~~~
scott_s
I agree that there should be no "technical difficulty" in that it's already a
solved problem. But so are bridges, and those only get built when people
recognize a bridge is necessary, and someone is willing to pay for the time
and materials to engineer it.

I get the impression the author sees some malice behind how Congress publishes
its information. I don't see malice; I just think no one with the power to
start such a project even recognizes it's a possibility. That it's worthwhile
would come second.

------
arohner
While we're at it, we really need the blame command present in version control
systems to show who added particular text to a bill.
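
A rough sketch of what "blame" could mean for a bill, assuming each revision
is stored as a list of lines together with the name of whoever submitted it.
The function name, the sample authors, and the bill text are all made up for
illustration; real version control systems do this far more carefully.

```python
from difflib import SequenceMatcher


def blame(revisions):
    """Attribute each line of the latest revision to an author.

    revisions: list of (author, lines) tuples, oldest first.
    Returns a list of (author, line) tuples for the final revision.
    """
    annotated = []  # (author, line) pairs for the revision processed so far
    for author, lines in revisions:
        old_lines = [line for _, line in annotated]
        matcher = SequenceMatcher(a=old_lines, b=lines, autojunk=False)
        updated = []
        for tag, i1, i2, j1, j2 in matcher.get_opcodes():
            if tag == "equal":
                # Unchanged lines keep whoever introduced them.
                updated.extend(annotated[i1:i2])
            elif tag in ("replace", "insert"):
                # New or rewritten lines belong to this revision's author.
                updated.extend((author, line) for line in lines[j1:j2])
            # "delete" contributes nothing to the new revision.
        annotated = updated
    return annotated


# Hypothetical history: two revisions of a made-up bill.
history = [
    ("Member A", ["Section 1. Short title.", "Section 2. Funding."]),
    ("Member B", ["Section 1. Short title.", "Section 2. Funding.",
                  "Section 3. Late addition."]),
]
result = blame(history)
```

Here `result` pairs the first two lines with "Member A" and the added third
line with "Member B".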

~~~
olefoo
Why do I envision sneaky history edits and committee rules allowing the
equivalent of a git rebase with amended commits?

~~~
cheriot
That or it will be red shirts doing all the commits.

------
makecheck
Documents _should_ be more open, but please don't push for something data-like
(i.e. send the XML proponents off the hill). It's time to make files human-
readable again.

For instance, use something completely plain but deceptively simple, like
reStructuredText. This can be parsed or edited by a ton of established tools.
It's also easily "diffed". Text formats are also well handled by revision
control tools (one of which should be used, publicly, by the editors of these
documents). This even achieves the clean markup he desires, because multiple
tools can parse .rst files and produce very good HTML out of them.
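
To show what "easily diffed" looks like, here is a minimal sketch using
Python's standard difflib on two versions of a plain-text draft. The file
names and the draft wording are invented for the example.

```python
import difflib


def diff_drafts(old_text, new_text):
    """Return a unified diff between two plain-text drafts as a list of lines."""
    return list(difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="draft_v1.rst",  # hypothetical file names
        tofile="draft_v2.rst",
        lineterm="",
    ))


# Two made-up revisions of a reStructuredText draft.
v1 = "Section 1\n=========\n\nThe filing fee is $10.\n"
v2 = "Section 1\n=========\n\nThe filing fee is $12.\n"

changes = diff_drafts(v1, v2)
```

Printing `"\n".join(changes)` shows the changed sentence as one removed line
and one added line, which is the kind of change tracking a binary or scanned
document can't give you.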

~~~
RyanMcGreal
For human-readable document content, there's nothing wrong with plain old
HTML, though formats like RST or markdown are somewhat easier to produce. (I
save my blog documents in markdown format - they're just easier to write and
update, and I don't have to futz around with JS-heavy WYSIWYG editors.)

However, these aren't adequate for structured data that falls outside the
scope of a generic print document. For that, I would recommend YAML or JSON
over XML, not only because they're human-readable but also because they're far
less verbose.
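
As a quick illustration of the verbosity point, this sketch serializes the
same record both ways using only the Python standard library. The record and
its field names are hypothetical, not taken from any real legislative schema.

```python
import json
import xml.etree.ElementTree as ET

# A made-up record describing a bill.
record = {"bill": "H.R. 0000", "sponsor": "Example", "status": "introduced"}

# JSON: one call, keys and values appear once each.
as_json = json.dumps(record)

# XML: every field name is written twice, as an opening and a closing tag.
root = ET.Element("record")
for key, value in record.items():
    ET.SubElement(root, key).text = value
as_xml = ET.tostring(root, encoding="unicode")

json_is_shorter = len(as_json) < len(as_xml)
```

For this record the JSON form comes out shorter, mostly because XML repeats
each field name in its closing tag.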

~~~
makecheck
I'm definitely fine with using data formats for things that really are just
raw data (e.g. a series of records with specific fields), and I like JSON too.

I just hope that they don't try to chop up something that should be "just a
document" into a pile of ugly marked-up containers that make it very hard to
see what's even in the document anymore.

------
RyanMcGreal
I have been making a similar case at the municipal level:

<http://raisethehammer.org/article/893>

I feel the author's pain regarding PDFs.

