
Why Gov.uk content should be published in HTML and not PDF - edent
https://gds.blog.gov.uk/2018/07/16/why-gov-uk-content-should-be-published-in-html-and-not-pdf/
======
rekshaw
I've worked with gov.uk. They are essentially a startup inside a huge
organization, and their great work reflects that. One of the things they told
me that struck me: most websites have customers, we have users. What they mean
by this is that being a customer implies choice, whereas there is no
alternative to .gov.uk, hence users don't have a choice. So it's gov.uk's
responsibility to make their website as clear, accessible and useful as
possible.

~~~
ReverseCold
I think there are some parallels between "just leave (insert technology
provider here)" and "just leave the country you call home".

> most websites have customers, we have users.

Interesting thought: Do Facebook, Google, etc have "users" and not "customers"
for their consumer products?

For many people, leaving digitally can have bad effects on their social life.
It's arguably not as bad as moving out of the country, but it's still a
practically impossible hurdle for a lot of people.

~~~
masklinn
> Interesting thought: Do Facebook, Google, etc have "users" and not
> "customers" for their consumer products?

Advertisers are the customers. Non-advertisers are the product. The goal is
not to provide a good service to the product, it's to drive engagement of the
product up so customers are happy.

~~~
hrktb
Both are customers though. As long as there is a paying tier on the end-user
side, the end user is also a customer who has a say in the decisions that are
made.

To Google’s credit, Youtube Red for instance goes in that direction. Same for
Google Suite customers, Google Play users etc.

Of course the power balance is tipped toward the companies who are willing to
pay more, but that’s a balance, and not a one sided relation.

For Facebook I think the picture is darker, but then that’s Facebook after all.

~~~
darth_mastah
The paying tier are customers (sort of). All the rest are product. It's quite
a specific product, because it has to be seduced and tempted by perceived
benefits to forfeit its privacy, but still a product it is.

------
_bxg1
I listened to a talk at a web development conference from one of the UK
government's accessibility developers. Among other things, they do extensive
accessibility testing and discovered things you wouldn't expect. For example,
to people who didn't grow up with them, dropdown boxes are apparently
unintuitive, so gov.uk avoids them. I was very impressed.

[https://www.youtube.com/watch?v=Q8Mj7_0Lok0](https://www.youtube.com/watch?v=Q8Mj7_0Lok0)

~~~
reaperducer
I find irony in the fact that this is a YouTube video with no text transcript,
massively reducing its accessibility to both those with disabilities and those
who aren't in an environment where they can sit and watch a 20-minute video.

Does YouTube have the ability for authors to attach or display a text version
of their video?

~~~
_bxg1
1) The conference did the uploading, not the speaker

2) I think the videos were thrown up on YouTube in a "why not" sort of way,
with minimal effort. Notice how the whole channel is just that specific event
over that couple of days. I'm glad they posted them, but it was primarily an
in-person event.

~~~
elil17
Deaf and hard of hearing people get routinely left out of the public sphere
because people excuse themselves from their obligation to provide accessibility.
People say “I wouldn’t make this if I had to caption it.”

In reality, captioning isn’t that hard. Very few people are willing to go to
the trouble of uploading a video but won’t caption it.

~~~
jolmg
> because people excuse themselves from their obligation to provide accessibility

What obligation? They're not even obligated to put the video up; it's just a
gesture of public service. Giving people the obligation to put captions when
they upload videos for free consumption is like giving a beggar part of your
lunch out of generosity and him complaining that he likes his sandwiches with
more cheese.

Not to specifically liken the deaf and hard of hearing to beggars, since all
YouTube viewers are beggars in my analogy. However, the point I'm getting at
is that when someone does you a favor like upload a video for free for your
consumption you should generally be grateful and do what you can with it, not
complain and demand more. It's just not the right attitude.

It's like FOSS. You can ask the developer for more, but not demand it.

I think if we want more captions from videos submitted without remuneration,
it's gonna need to be automated. Have the machine do the work. I think YouTube
already does this, now that I think about it.

EDIT: Added some more on the analogy in case people thought I had something
against deaf people.

~~~
bobthepanda
You don't even need automation, that's why we have platforms like Mechanical
Turk.

In fact I would be very surprised if people were not already using gig
platforms like that for captioning already.

~~~
monetus
Correct me if I'm wrong, but it's not the advertisers paying the Mechanical
Turk?

------
red_admiral
When gov.uk first started to replace the previous online presence, they were
like first-level tech support: brilliant if your question is in the FAQ,
useless otherwise. A lot of detailed information on aspects of running a
charitable organisation, which I presume weren't accessed every day, simply
disappeared. For a while, the web archive of the old site was my lifeline.

My feeling is that gov.uk has got a lot better since, but it's low-
information-density by design. This is absolutely the right thing to do for a
lot of their users in a lot of cases, and doubly so when targeting users who
might not speak good English for whatever reason. But I still feel that
information meant for professionals could be presented in a more useful way.

On the web design side, they definitely deserve 5 out of 5. Good general
design, sensible spacing, not requiring a huge javascript framework just to
display some text - there's so much right with this site.

~~~
zhte415
As a UK national living outside the UK, the website is fantastic. It has
exactly what I need and is highly discoverable, no more than a few levels of
depth to the form or submission I need. The multilingual support is also
excellent.

------
osrec
Having interacted with a number of government websites, I have to say the UK
government does an excellent job of theirs. Most things are simple to
understand and clearly presented. Even though I only use it for the most
mundane of things, using it is generally a very pleasant experience!

~~~
nothrabannosir
Could not agree more. In fact, I’d go as far as to say they’re not just the
best .gov site out there, but one of the best sites, full stop.

(I know a few people who worked on various parts of it and for what it’s
worth: they’re all legit. They care, and they’re good. It should come as no
surprise that the site turned out the way it did, if they’re hiring these
sorts of people.)

~~~
Schoolmeister
I recollect reading about the UK GDS in Tim O'Reilly's book from last year
"WTF: What's The Future And Why It's Up To Us". In the chapter "Government as
a platform" he's got a small, amusing anecdote about the service.

Having looked up the relevant part, let me transcribe it for you:

> "One of the first things that struck Jen and me as we entered the GDS office
> on an upper floor of an old office building high above a busy London street
> was a large sheet of butcher paper covering the picture window in the lobby.
> In the paper was a small cutout through which you could see the people on
> the street below. The cutout had a large arrow pointing to it, labeled
> "Users", reminding everyone when they walked in just whom the unit was meant
> to serve."

I found it really drove home the service's users-first approach. Following
that anecdote he elaborates on the service's "10 commandments" (the 10 GDS
Design Principles). They're also quite interesting, but I'm sure a search
engine lookup can help you there. I really recommend the book if you want an
optimist's outlook on the future of technology, government and economy.

~~~
nine_k
"Be consistent, not uniform" is gold.

------
Steve44
I can see for dynamic information PDF definitely has limitations but a lot of
what I get from gov.uk I feel is better in PDF. The web pages are fine for
navigation and overviews but for hard information I much prefer a PDF.

A PDF document can be dated and versioned, you download a report from April
2010 and you can keep that as a document. You can then download a newer one if
you want, the older version is still available and largely immutable.
Archiving a dynamic web page is a lot harder.

~~~
throwaway37585
> A PDF document can be dated and versioned

So can an HTML page.

> you download a report from April 2010 and you can keep that as a document...
> Archiving a dynamic web page is a lot harder.

Right click, save page as... Am I missing something?

~~~
mijamo
If it's not in archive, you will NOT find the old HTML at all, whereas old
PDFs are usually easy to find because _someone_ will have it.

In addition saving web pages is a huge pain with all the scripts and CSS to
save as well, so then you need to compress that.

Lastly, saving and browsing saved HTML is a pain on mobile and similar
devices without a specialized tool.

~~~
throwaway37585
> If it's not in archive, you will NOT find the old HTML at all, whereas old
> PDFs are usually easy to find because someone will have it.

I don’t understand this argument at all. The _exact same thing_ can be said
for HTML pages.

> In addition saving web pages is a huge pain with all the scripts and CSS to
> save as well, so then you need to compress that.

That’s what the webarchive format is for.

~~~
slavik81
The GDS themselves state that users are more likely to archive their own
copies of PDF documents than HTML documents: "users are more likely to
download a PDF and continue to refer to it and share it offline"

Part of the reason may be that some browsers download PDFs for display in the
user's native reader. Though, even in-browser PDF readers like Firefox on
desktop have a toolbar with a prominent save button. The use of PDF is a
signal to the user that the document can be saved.

------
ksec
I wouldn't like HTML for government documents. It is perfectly fine for
websites or live documents. But for anything published by government there
will likely be a printed version, and the online "file" will need to be in
sync with the printed version as well. As a paper / documents based format,
nothing beats PDF/A.

I remember Obama said something along these lines to tech leaders: "Your job
is a lot easier when you only have a selected group of customers to please,
but when you have to cater for everybody and every interest group, things are
a lot harder." Paper-based documents will continue to be used for a decade or
two more.

Why can't we just have both? Too much manpower in auditing or publishing? Well,
that is what tech is for: automate it. The tech should be catering for its
users, which is not just the public but civil servants as well (I can't
believe I just wrote that), not trying to force the tech on everyone else.

P.S - I thought PDF/A is an open standard. Why is everyone suggesting PDF is a
closed format?

~~~
jcrawfordor
PDF/A is an open standard, but it's pretty rare to see a PDF document in the
wild that is actually PDF/A compliant. People generally just use the built-in
PDF export in some word processor (usually no PDF/A option at all) or use
Acrobat (doesn't do PDF/A by default). And non-PDF/A documents can have a
laundry list of odd closed features that may or may not be supported by
various readers.

In general the landscape of PDF compatibility has improved a lot, but it's
still a lot worse than HTML.

~~~
ksec
> And non-PDF/A documents can have a laundry list of odd closed features that
> may or may not be supported by various readers.

Is there a test suite for PDF/A compatibility?

~~~
oever
Yes. [http://verapdf.org/](http://verapdf.org/) At the Dutch publication
office we use this for spot-check testing for PDF/A-1a compliance.

------
djhworld
One thing I find terrible about PDFs is when organisations embed useful data
in them, in tables or graphs. Usually in the context of an annual report, or
study, or some sort of policy document.

They look great printed on paper, but are not very transferable.

I know there is software [1] out there that tries to parse tables out of PDF
documents, but from my experience you'll still end up doing manual adjustments
afterwards to correct what the parser could not infer.

That's the main reason why I support this motion. At least make the data both
PDF + HTML, so you have options.

[1] [https://tabula.technology/](https://tabula.technology/)

~~~
nv-vn
For data in general, I don't think either HTML or PDF is perfect. Both are
designed for you to lay out how a document is displayed, rather than the
content specifically. For example, some content in HTML is invisible or split
between different divs/spans/whatevers. A lot of it is generated dynamically
by JavaScript and async requests. Something like JSON or XML is much more
powerful for data storage. For example, you can store the data for graphs as
raw data so that the consumer can render the graph however it wants. Display-
agnostic data is always more flexible, and I think the best way to provide
data on a webpage is both as a visual version and as raw data.
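
A minimal sketch of that idea, assuming a made-up dataset and hypothetical helper names (`to_html_table`, `to_text_bars`): the raw data lives in JSON, and each consumer picks its own rendering.

```python
import json
from html import escape

# Hypothetical raw data behind a chart; the figures are invented for this sketch.
raw = (
    '{"title": "Visits by format", "series": ['
    '{"label": "HTML", "value": 120}, {"label": "PDF", "value": 60}]}'
)


def to_html_table(data: dict) -> str:
    """Render the dataset as an HTML table (one possible visual form)."""
    rows = "".join(
        f"<tr><td>{escape(p['label'])}</td><td>{p['value']}</td></tr>"
        for p in data["series"]
    )
    return f"<table><caption>{escape(data['title'])}</caption>{rows}</table>"


def to_text_bars(data: dict, width: int = 20) -> str:
    """Render the same dataset as a plain-text bar chart."""
    top = max(p["value"] for p in data["series"])
    lines = []
    for p in data["series"]:
        bar = "#" * round(width * p["value"] / top)
        lines.append(f"{p['label']:<5}{bar} {p['value']}")
    return "\n".join(lines)


data = json.loads(raw)
print(to_html_table(data))
print(to_text_bars(data))
```

The page can ship the table for readers and the JSON for anyone who wants to re-plot or re-analyse the numbers.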

------
CJefferson
This is good. One of the biggest (in my opinion) failures of accessibility in
academia is that everything is published as PDFs generated from LaTeX, which
is THE least accessible format in existence (Word's PDFs are fine, so it is
LaTeX's fault they are so bad).

Some people provide their LaTeX, which improves matters, but most people
don't.

~~~
detaro
What's inaccessible about LaTeX-generated PDF?

~~~
dubya
It used to be a problem that you couldn't cut and paste text from TeX-
generated PDFs, and I assume this would be a problem for screen readers. I
think the dvi->pdf conversion produced a PDF consisting mostly of coordinates
and characters (in a non-standard encoding), which PDF readers couldn't put
back together into words. This seems mostly fixed on recent pdfs, but I don't
know if it's TeX (pdflatex?) or the readers that have gotten smarter.

Funnily, Donald Knuth's AoCP fascicles ship as postscript files, which
Preview.app converts to PDFs that can't be copied from.

~~~
c4h8o3del
If you're using a screen reader anyway, why not export to text instead of pdf?
You're just wasting more storage for no benefit you can perceive.

This relies on what should be a reasonable assumption that the latex source is
provided.

------
qwerty456127
I can see no reason why it can't be HTML _AND_ PDF. A well-composed HTML
document can be turned into a well-composed PDF automatically on the server
side or in a couple of clicks on the client side. It is always more convenient
to use PDF rather than HTML when you want to save a document to your computer
or put it on a USB drive to print it on a nearby printer.

~~~
sjwright
The article makes exactly this point. The argument is against material being
published with layout software (or printed from Microsoft Word) as a PDF. They
argue that the canonical version of such documents should be in HTML, which
can in turn be transformed to be suitable for desktop, mobile or print output.

~~~
qwerty456127
I don't really know about Microsoft Word but LibreOffice can produce fairly
good PDFs including a table of contents if the original document is outlined
properly AFAIK and it can even include the source document in the PDF so the
user can open it for editing. Your commentary actually makes me curious what
might be some problems of Word/LibreOffice-produced PDFs, can you name some?

------
velcrovan
This tradeoff between the relative virtues of PDF and HTML is a good argument
for looking at Pollen (pollenpub.com) a framework for designing your own
markup language that is capable of targeting HTML and PDF (or any output you
need: audio, Kindle, whatever).

A proof of concept is here:
[https://thelocalyarn.com/excursus/secretary/](https://thelocalyarn.com/excursus/secretary/)
On every page at that micro-site, you can view the Pollen markup source, or
the PDF version. The PDF and the HTML are generated at the same time, from the
same source.

Another example is my blog The Notepad
([https://thenotepad.org/](https://thenotepad.org/)). Both sites have links to
their source on Github.

~~~
Numberwang
I'm a big fan of asciidoc and quite sad that it didn't take off as a default
format for documents.

~~~
velcrovan
I think what's sad about formats per se is that you rely on toolmakers to
implement them, and to decide how to translate them into the target format;
and for all toolmakers to support them in ways that are coherent with each
other.

You also end up with this awkward two-step between the format and the tools.
If some capability is missing, you wait for the format to define a way to do
it, then the toolmakers to support it; or less ideally, the toolmakers define
their own incompatible ways of doing it without waiting for consensus.

This covers the problems I have with Markdown. It has wide support, but because
vanilla Markdown only covers a 1995-era subset of HTML, there are all kinds of
things (footnotes, figures, formatted code blocks etc) that people want it to
do. Any given editor or CMS or site generator will support 95% of your
preferred flavor's way of doing things and disagree with your other tools
about the last 5%.

The difference with Pollen is that it isn't a format or a markup
specification; it's a programming environment. So you design the markup, and
you tell it how to get from the source markup to your target format. The
format is yours and the implementation is yours; they are one and the same.

It is a bit more work, true, but it's less work in Pollen than it would be in
any other environment because it does parsing for you and applies your
transformations in a logical, ordered way.
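
To illustrate the general idea of one source and several targets, here is a rough sketch in Python. It is not Pollen (which is a Racket environment) and does not use its API; the tag names and the `render` helper are invented for this example.

```python
from html import escape

# A document is a nested tuple: (tag, child, child, ...). Children are
# strings or further tuples. The tag names are made up for this sketch.
doc = (
    "root",
    ("title", "Why HTML"),
    ("para", "Publish ", ("em", "once"), ", render everywhere."),
)

# One rule table per target format: each tag maps to a template.
HTML_RULES = {
    "root": "<article>{}</article>",
    "title": "<h1>{}</h1>",
    "para": "<p>{}</p>",
    "em": "<em>{}</em>",
}
LATEX_RULES = {
    "root": "{}",
    "title": "\\section{{{}}}\n",
    "para": "{}\n",
    "em": "\\emph{{{}}}",
}


def render(node, rules, escape_text):
    """Walk the tree and apply the target's template to every tag."""
    if isinstance(node, str):
        return escape_text(node)
    tag, *children = node
    inner = "".join(render(child, rules, escape_text) for child in children)
    return rules[tag].format(inner)


html_out = render(doc, HTML_RULES, escape)
# No LaTeX escaping of special characters here; a real pipeline would need it.
latex_out = render(doc, LATEX_RULES, lambda text: text)
print(html_out)
print(latex_out)
```

Adding a new target means adding a rule table, and adding a new tag means adding one entry per target, which is roughly the "format and implementation are yours" trade-off described above.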

~~~
Numberwang
That sounds interesting. I will check it out.

------
cmurf
Somewhere in the realm of Adobe, the PDF design/spec, and open source
software, there are big problems properly supporting forms and signing.

Adobe Acrobat Reader is now only supported on macOS and Windows. And only
Acrobat Reader fully supports all the varieties of forms in various PDF specs.
I think it's a big problem for government to in effect require the use of a
monolithic and proprietary operating system to fill out government forms.

And then the whole state of signing PDFs is a confusing mess. Learning how to
create or buy a certificate is irritatingly confusing, and then where the
certificate goes and how to use it. Google search guides and it's completely
different instructions depending on platform and Acrobat version. The latest
versions of Acrobat Reader let you do a thing called signing a document, but
it doesn't use certificates, doesn't let you add a password to the document to
prevent modification, but you can add text/image/drawing of a "signature" -
no doubt this confusing thing exists because the certificate based signing is
so difficult. And then verifying digital signatures isn't something people
know how to do: technical companies I work with do not accept them unless the
digital signature includes a visible human signature and the PDF security
options enable printing the PDF!

------
kqr
One of the hardest things I remember from my time as a web consultant was
explaining to my customers why PDF is not a suitable replacement for HTML.
They always tried to replace half of their pages with pdf documents...

~~~
Spivak
Well don't leave us in suspense, why isn't PDF a suitable replacement?

* If your internal workflow is designing documents then the conversion to a web document will be clunky at best.

* Every end-user device has a PDF reader and their web browser probably opens it transparently.

* You need the PDF either way because the website will never be the authoritative source so now you have the problem of making sure they don't get out of sync.

* You probably have a few internal graphic designers that can do wonders with print but it's unlikely you have an internal web development team.

* It's much easier to make a PDF accessible than a website. (Before you disagree remember that the offices we're talking about likely don't have a dedicated web designer) You can be sure it will print correctly, and for most users it's automatically offline.

~~~
fjsolwmv
> accessible

Let's take perhaps the simplest most important accessibility feature: How do I
increase the font size without making the content absurdly wide, while viewing
a PDF?

~~~
fshaun
As a visually impaired user, this is my single biggest gripe with PDFs. It's
also annoying that zooming on many websites completely breaks the layout,
often overlapping text or pushing important elements off of the screen. I find
this particularly sad, since HTML largely supports a separation of content and
presentation.

------
saagarjha
> We cannot get as much information from analytics about how people are using
> PDFs

This might be considered a positive.

~~~
flashman
Not on Surveillance Island

------
ggm
I'm a huge fan, having used the HMRC web portal from overseas. The sea change
in how good the UI is, and the data collecting on what we want from the UI, is
also impressive.

I set myself low-bar goals measuring government engagement. The Australian
whole-of-government portal is abjectly awful, it has well done 2FA but
continually nags me with badly designed 'do we have your permission' and
'remember you're talking to government' interstitials.

The state government planning website uses a web design method which is simply
unworkable on touch: the 'permit us to say we want you to agree to terms and
conditions' overlay won't scroll when the underlying page does, so the
[agree] button can't be pressed because it's off-screen. Gak!

------
Spooky23
Overall, decisions like this are compromises, with pros and cons:

"They’re not designed for reading on screens"

That's a subjective distinction.

"It’s harder to track their use"

Not relevant for most .gov use cases. Also, sounds like something that isn't a
user problem.

"They cause difficulties for navigation and orientation"

So does responsive design that makes discovery of content difficult in many
scenarios, especially atypical scenarios.

"They can be hard for some users to access"

So does HTML where a poor accessibility process is in place.

"They’re less likely to be kept up to date"

Conversely, they make it easier for a consumer to understand when changes take
place, and encourage a stronger release process.

"They’re hard to reuse"

That may or may not be a bad thing.

~~~
allannienhuis
I think the point is: it's harder to do those things properly in PDF format.

Also, I'm not sure how 'not designed for reading on screens' is subjective.
PDFs are paper-document oriented and don't display well for reading on
anything other than a large size monitor in virtually every case I've seen.
When delivering content in a web browser, why on earth would someone prefer to
view PDF vs a reasonably well laid out HTML version?

I think PDFs are common in gov't websites because much of the internal culture
is paper-document centred, and having those nicely printed PDFs solve internal
problems for gov't employees, not because they are actually any better for the
users of the website.

Canadian Gov't websites are full of PDF content too. :(

------
rossdavidh
I generally agree, as PDFs were designed for (and are best used for) making
documents for printing on paper. BUT: "We cannot get as much information from
analytics about how people are using PDFs. We can get data on how many times a
PDF has been downloaded from GOV.UK, but we cannot measure views of the file
offline." ...that section made me wonder if perhaps I have been missing a
possible upside to PDFs.

------
textmode
"[PDFs: ] They're quick and easy to create

... they can be easily created from popular applications that people are
already using to author and share documents."

This appears under the heading "Why do people use PDFs?"

However I would have listed this as the sole reason that documents should be
distributed as HTML. The reasoning is simple.

Imagine a hypothetical where one has a choice of distributing documents in two
formats, A and B, and there are particular advantages to each format. As such,
some users prefer format A, while others prefer format B. Not to mention those
users who would like to have both formats available.

In the hypothetical, _users can easily convert from format A to B_ however
converting from format B to A is difficult.

Assuming one can distribute the documents in format A, it makes no sense to
distribute in format B. Users who prefer format A will be unhappy.

Distributing in format A keeps users who like format B happy because they can
easily convert from A to B.

------
wwarner
I'll make a defense of pdf publishing here. The points the author makes are
good, but also show why pdf has a role to play. Responsive design being the
most difficult. Imagine you're producing a scientific paper, and it contains a
big table. It's unreasonable to ask an author to figure out how to make every
table and figure display correctly on every screen.

Think of it this way. If you're publishing a pdf, then you master the
formatting using your word processor (latex, word, what have you). On the
other hand, if you're publishing on a responsive web site, then you really
ought to have a content management system to guide you through the
requirements of the platform. It's a significantly higher hurdle, both for the
authors and the platform owners.

~~~
fifnir
Publishing shouldn't be about displaying nicely on screen though, but rather
about sharing the information.

As a bioinformatician, big tables inside pdfs are essentially useless. What's
the point of a few hundred rows worth of a table if you can't manipulate it
with whatever tool you prefer?

Moreover, I'm writing my thesis and dealing with many pdfs from the 90s in
most of which I can't just highlight and copy text so I need to type it out
like a savage. Is it guaranteed that today's pdfs will be easy to handle for
future people?

In my opinion publishing should be done in plain text and .tsv files and the
onus of displaying it on screen should fall on the editor (isn't that their
job anyway ??)
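
As a small illustration of why a .tsv beats a table locked in a PDF, here is a sketch using only Python's standard library. The gene names and values are invented for the example.

```python
import csv
import io

# Hypothetical table as it might be published alongside a paper.
tsv_text = (
    "gene\tsample\texpression\n"
    "BRCA1\tA\t12.5\n"
    "BRCA1\tB\t7.5\n"
    "TP53\tA\t3.0\n"
)

# Parse the tab-separated text into one dict per row.
rows = list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))

# Group expression values by gene, then average them.
by_gene = {}
for row in rows:
    by_gene.setdefault(row["gene"], []).append(float(row["expression"]))
means = {gene: sum(values) / len(values) for gene, values in by_gene.items()}

print(means)
```

The same file opens just as easily in R, a spreadsheet, or `awk`, which is the point: the reader chooses the tool.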

------
Y_Y
On a related note, I have to fill out a PDF form to make a GDPR complaint to
the Data Commissioner (in Ireland). This is a plain-text form, the information
could just as easily be sent in the body of an email.

Using PDF means having to deal with crappy PDF software, hurting accessibility
and scriptability and adding needless overhead on the other end.

I wish these systems were designed by sane and benevolent programmers, rather
than Pointy Haired Bureaucrats.

( [https://www.dataprotection.ie/docs/raise-a-concern-
Form/m/17...](https://www.dataprotection.ie/docs/raise-a-concern-
Form/m/1727.htm) )

~~~
crtasm
That's bad but small silver lining is that it's not a word document that
silently renders incorrectly in other editors so you don't even see some of
the fields.

------
gwbas1c
I love using PDF to create backups of webpages if I think I'll want to look at
it in the future. I generated PDFs as backups for references in college
assignments, and it really saved me when pages moved mid-assignment.

------
xchaotic
Sadly, the great GDS team that is bringing about the great design/technical
change in the UK gov is falling prey to politics from the same people that
brought you #brexit. GDS used to be an initiative directly from the Cabinet
Office, it has now been moved to the ministry of Culture and Sport.

You might think what does sport have to do with gov website UX? But it's all
politics - GDS delivered and it is making the other politicians and gov
officials look bad so they are being undermined. That's my reading of that
anyway - no good deed goes unpunished.

~~~
JdeBP
Actually, once you get the name right, it being the _Department for Digital,
Culture, Media, and Sport_, it becomes somewhat clearer what it has to do
with this WWW site. But this then prompts the question of exactly when
"digital" became a noun. (-:

------
IanCal
I'd love to have a format that is just HTML + whatever zipped up that browsers
and operating systems happily just open in a browser.

I like making HTML reports (rmarkdown), but sharing them requires telling
people to download then open them in a browser. Google drive, for example,
happily just shows you a preview but then if you click on the file you get
_raw html_. Customers just don't understand.

PDF however, is absolutely fine to move around as a single chunk, but has
problems in almost every other way.
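
One partial workaround for the "single chunk" part is to inline every resource into the HTML file itself. The sketch below assumes a hypothetical `bundle` helper and a tiny inline SVG standing in for a real chart; it does not fix the problem of file hosts showing raw HTML instead of rendering it.

```python
import base64


def data_uri(payload: bytes, mime: str) -> str:
    """Encode a resource as a data: URI so it can live inside the HTML file."""
    encoded = base64.b64encode(payload).decode("ascii")
    return f"data:{mime};base64,{encoded}"


def bundle(body_html: str, css: str, image: bytes, image_mime: str) -> str:
    """Build one self-contained HTML page with the CSS and image inlined."""
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8">'
        f"<style>{css}</style></head>"
        f'<body>{body_html}<img alt="chart" src="{data_uri(image, image_mime)}">'
        "</body></html>"
    )


# A minimal SVG stands in for a real chart image in this sketch.
svg = (
    b'<svg xmlns="http://www.w3.org/2000/svg" width="10" height="10">'
    b'<rect width="10" height="10"/></svg>'
)
page = bundle("<h1>Report</h1>", "h1 { color: navy; }", svg, "image/svg+xml")
print(len(page), "characters in one file")
```

The result is a single .html file with no external dependencies, at the cost of larger files because base64 inflates binary content by roughly a third.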

~~~
masklinn
> I'd love to have a format that is just HTML + whatever zipped up that
> browsers and operating systems happily just open in a browser.

There is MHTML, sadly it fails the second bit because AFAIK Chrome dropped its
support and FF and Safari can't open it natively. Apple has its own WebArchive
format, and Firefox's MAF extension generated MAFF files but is not compatible
with newer versions.

~~~
ksec
At one point in time, I thought MHTML was precisely what we need to get rid of
PDF, double click, and everything works. Sadly everyone abandoned it, and
Google doesn't want you to download any webpage for archiving purposes.

------
qalmakka
Your government uses PDF? My dear Britons, you have it easy. The Italian
government and Italian local administrations make massive use of MS Word
documents, and I'm not talking about .docx, but the old, venerable, write-
once-broken-everywhere Office 97 .doc. So, I wouldn't complain that much,
because it could be much, much worse, especially when you have to fill out a
20-year-old form that does not render anymore on any modern word processor.
Oh, well.

------
bo1024
This is really interesting but it's missing discussion about what makes each
format right for what kinds of information/pages. Even some examples of things
currently commonly PDF that would be most helpful to switch to HTML. I have
trouble believing that HTML is right for every kind of government document
(for example, a quarterly or yearly report published by some agency seems like
a reasonable PDF).

------
pmontra
PDF is easier for some authors: write in Word, export as PDF, mail to somebody
who will upload to the web server and maybe print it for non digital
distribution.

Web CMS were invented (among many other reasons) to let non technical people
write and edit content directly inside the browser and publish it. I wonder if
gov.uk doesn't have a CMS or their authors don't want to use it.

~~~
petepete
The gov.uk folks make heavy use of Markdown[0], and so they should - the sooner
we leave this PDF/Word doc nonsense in the past the better.

HTML in the browser is the best tool for consuming documents; we can read and
write on any device, bookmark documents or chapter headings, resize and style
at will in a device independent manner, enhance documents and make them
interactive[1], even add videos if required. The best part is that the source
is all stored in plain text and is version controlled without requiring
Sharepoint or similar.

[0] [https://www.gov.uk/guidance/how-to-publish-on-gov-
uk/markdow...](https://www.gov.uk/guidance/how-to-publish-on-gov-uk/markdown)
[1] [https://insidegovuk.blog.gov.uk/2013/08/21/barcharts-in-
html...](https://insidegovuk.blog.gov.uk/2013/08/21/barcharts-in-html-
publications-new-feature/)

------
aquamo
I'd like to see version controlled text files used for important legal
documents and many other government docs. Allow the presentation layer to be
handled on the end points. Maybe something like markdown will be sufficient.

I always look up to the fixed width font model used by IETF RFCs. They are
extremely readable and searchable and last for a long time.

~~~
reaperducer
_I'd like to see version controlled text files used for important legal
documents and many other government docs._

Someone mentions "blockchain" in 3... 2... 1...

------
yosefzeev
Everyone always forgets about epub.

------
Aaargh20318
The problem with HTML is that it's not a stable format. Will a browser in 2028
be able to render those pages correctly? What about 2038? If the content
needs to be available for a longer time, PDF/A would be a much better choice
than HTML.

~~~
detaro
What HTML + CSS from 1998 doesn't render usefully anymore?

~~~
Aaargh20318
What HTML + CSS from 1998 renders in exactly the same way (pixel perfect copy)
on a modern browser as it did on a browser in 1998?

~~~
detaro
Notice how I specified "renders usefully". Few documents have to be pixel
perfect, even fairly old HTML had tools to make individual parts where it
might be more important pixel perfect, and given the higher level of
standardization and standards compliance nowadays this would likely be better
for documents created today.

------
hidiegomariani
Big kudos for the work you are doing. The countries I've lived in before (Spain
and Italy, terrific experiences with gov digital services) should take lessons
from you on how to run a digital and accessible PA portal online.

------
ggchappell
I agree with the point of the article.

Still, there is so much out there that is available only in one of the MS
Office formats, and Gov.UK is apparently doing better than that. So there is
actually some cause for celebration here, IMHO.

------
cmurf
There's a distinction between PDF and PDF/A. I agree with favoring HTML over
just any PDF, for all the reasons cited in the article. But for certain kinds
of documents you'd want PDF/A over HTML or perhaps in addition to HTML.

PDF/A can certainly accept, and a government policy can require the use of,
accessibility features and even digital signatures.

------
spystath
I don't understand why these should be mutually exclusive. Starting from a
common parseable markup you can have both responsive HTML for browsers and
formatted PDF for print. Why not decouple content from presentation?
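
A rough sketch of what I mean, in Python. Everything here is made up for illustration: the `DOC` structure, the block kinds and both function names are assumptions, and the second renderer only produces fixed-width text as a stand-in for a real PDF/print step.

```python
from html import escape
import textwrap

# Hypothetical in-memory document: (block kind, text) pairs.
# In practice this would come from parsing some common markup.
DOC = [
    ("h1", "Fees & charges"),
    ("p", "The fee is 12 pounds. Apply by post."),
]


def to_html(blocks):
    """Render blocks as HTML; layout is left to the browser and CSS."""
    return "\n".join(f"<{kind}>{escape(text)}</{kind}>" for kind, text in blocks)


def to_print_text(blocks, width=40):
    """Render the same blocks as fixed-width text (stand-in for a print/PDF step)."""
    out = []
    for kind, text in blocks:
        if kind == "h1":
            # Headings become an upper-case line with an underline.
            out.append(text.upper())
            out.append("=" * len(text))
        else:
            # Everything else is wrapped to the page width.
            out.append(textwrap.fill(text, width=width))
    return "\n".join(out)


if __name__ == "__main__":
    print(to_html(DOC))
    print()
    print(to_print_text(DOC))
```

The content lives in one place and each output format is just another function over it, so adding a third target would not touch the source.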

~~~
mwcampbell
If the HTML is responsive, why can't it be good for print? And, for the sake
of reducing waste, should we not discourage people from printing in the first
place?

~~~
masklinn
You'd probably want a dedicated print CSS to remove various bits of UI and
decoration, as well as styling which works well on a screen but is terrible in
print (e.g. the white-on-black and white-on-blue headers).

However gov.uk already does both so it's not an actual issue here.

------
zmix
They should publish in XML, of whatever kind, not HTML.

------
Nursie
Gov.uk pages are still using Google Analytics and reporting British citizens'
interactions with their own government to a US-based multinational, I see.

And the only way to disable it is to click through and install a browser
add-on to opt out.

This doesn't seem very GDPR-friendly.

~~~
barrucadu
No personally identifying information is sent to Google Analytics. IP
addresses are anonymised, email addresses, dates, and postcodes are stripped
too.

See
[https://github.com/alphagov/govuk_frontend_toolkit/blob/cf1c...](https://github.com/alphagov/govuk_frontend_toolkit/blob/cf1c3fefe6608fc9e82883b09c2307513996f666/docs/analytics.md)
and
[https://github.com/alphagov/govuk_frontend_toolkit/blob/cf1c...](https://github.com/alphagov/govuk_frontend_toolkit/blob/cf1c3fefe6608fc9e82883b09c2307513996f666/javascripts/govuk/analytics/analytics.js)

~~~
Nursie
It's present on every page, and it's loaded from Google. Anonymisation is
pointless.

The fact that they use Piwik on a couple of pages they consider sensitive
(usually to do with payment) shows that even gov.uk know this cannot be relied
upon to fully hide things.

It shouldn't be there and I've raised a complaint with the ICO.

Note that the page says that it can be configured to strip all that info, not
that it does by default. One would have to look at each page to see how this
is configured. And it could still be wrong to switch this on by default, under
the GDPR.

~~~
MaxBarraclough
> Anonymisation is pointless.

How's that?

------
auslander
Because anyone can embed JavaScript in a PDF file and launch stuff.
[https://resources.infosecinstitute.com/analyzing-malicious-p...](https://resources.infosecinstitute.com/analyzing-malicious-pdf/)

------
fifnir
I am currently doing some literature research for my PhD thesis and all I have
to say is: "fuck pdfs in the face"

------
JdeBP
> _On a responsive website like GOV.UK, content and page elements shift around
> to suit the size of the user’s device and browser._

But they do not, _on that very page_. I resized the page up to full screen and
then back again in a WWW browser and all that happened is that huge areas of
whitespace opened and closed around the text, which remained word-wrapped in
exactly the same places.

~~~
ghostly_s
Yes, they do. The line width only increases up to a certain maximum because
this is usability best practice; there is an optimal maximum line length for
readability.

~~~
JdeBP
No, they did not. I performed the experiment myself and know what I saw, thank
you. If this is a question of my "device" always being above some maximum line
length, then clearly this is _not_ suiting the size of the device. It is
suiting some guidelines, _not_ the user's device.

~~~
ghostly_s
Try another browser. I'm telling you it is clearly responsive on mine, and the
down-votes are others telling you the same. Now if you're just trying to
retcon the definition of 'responsive design' to grind some personal axe, go do
it somewhere relevant.

~~~
JdeBP
The down-votes are showing you why votes are not equal to truth, because this
effect is easily reproducible in Opera, Vivaldi, Chrome, Edge, and Firefox.

And it's clearly you who has some axe to grind. I merely point out that the
behaviour _of the very page itself_ is not as the article describes the
operation of that WWW site. It does not behave as advertised, and does not
change to suit the size of my device. The headline remains word-wrapped after
the word "should", for example, and huge areas of whitespace open and close
around it.

------
thanatropism
[http://pandoc.org/](http://pandoc.org/)

"If you need to convert files from one markup format into another, pandoc is
your swiss-army knife. Pandoc can convert documents in (several dialects of)
Markdown, reStructuredText, textile, HTML, DocBook, LaTeX, MediaWiki markup,
TWiki markup, TikiWiki markup, Creole 1.0, Vimwiki markup, OPML, Emacs
Org-Mode, Emacs Muse, txt2tags, Microsoft Word docx, LibreOffice ODT, EPUB, or
Haddock markup to:

HTML formats: XHTML, HTML5, and HTML slide shows using Slidy, reveal.js,
Slideous, S5, or DZSlides

Word processor formats: Microsoft Word docx, OpenOffice/LibreOffice ODT,
OpenDocument XML, Microsoft PowerPoint

Ebooks: EPUB version 2 or 3, FictionBook2

Documentation formats: DocBook version 4 or 5, TEI Simple, GNU TexInfo, Groff
man, Groff ms, Haddock markup

Archival formats: JATS

Page layout formats: InDesign ICML

Outline formats: OPML

TeX formats: LaTeX, ConTeXt, LaTeX Beamer slides

PDF: via pdflatex, xelatex, lualatex, pdfroff, wkhtml2pdf, prince, or
weasyprint

Lightweight markup formats: Markdown (including CommonMark and GitHub-flavored
Markdown), reStructuredText, AsciiDoc, Emacs Org-Mode, Emacs Muse, Textile,
txt2tags, MediaWiki markup, DokuWiki markup, TikiWiki markup, TWiki markup,
Vimwiki markup, and ZimWiki markup

Custom formats: custom writers can be written in lua."

~~~
tjoff
You cannot have PDF as an input format in pandoc.

PDF is terrible because you cannot even parse the text easily or sensibly. That
also means it is hard to diff two versions and see the differences.
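
For contrast, here is roughly how little it takes when the source is plain text: a minimal sketch using Python's standard `difflib`. The function name `diff_versions` and the two sample strings are made up for illustration, not taken from any real document.

```python
import difflib


def diff_versions(old_text: str, new_text: str) -> list[str]:
    """Return unified-diff lines between two plain-text document versions."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile="v1",
            tofile="v2",
            lineterm="",  # inputs have no trailing newlines, so add none
        )
    )


# Hypothetical sample content.
OLD = "Fee: 10 pounds\nApply by post."
NEW = "Fee: 12 pounds\nApply by post."

if __name__ == "__main__":
    print("\n".join(diff_versions(OLD, NEW)))
```

Doing the same for two PDFs first needs a text-extraction step, and that step is exactly the unreliable part.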

~~~
oever
"Tagged" PDF files allow text extraction. Tagged PDF is often required for
accessibility: screen readers for the blind should be able to extract the
text, and this is possible with Tagged PDF. PDF/A-1a requires tagging
(PDF/A-1b does not), and PDF/A is a required format in some governments.

[https://en.wikipedia.org/wiki/Pdf#Logical_structure_and_acce...](https://en.wikipedia.org/wiki/Pdf#Logical_structure_and_accessibility)

