Why Gov.uk content should be published in HTML and not PDF (gds.blog.gov.uk)
541 points by edent 7 months ago | 241 comments

I've worked with gov.uk. They are essentially a startup inside a huge organization, and their great work reflects that. One thing they told me that struck me: most websites have customers, we have users. What they mean by this is that being a customer implies choice, whereas there is no alternative to gov.uk, hence users don't have a choice. So it's gov.uk's responsibility to make their website as clear, accessible and useful as possible.

I think there are some parallels between "just leave (insert technology provider here)" and "just leave the country you call home".

> most websites have customers, we have users.

Interesting thought: Do Facebook, Google, etc have "users" and not "customers" for their consumer products?

For many people, leaving digitally can have bad effects on their social life. It's arguably not as bad as moving out of the country, but it's still a practically impossible hurdle for a lot of people.

> Interesting thought: Do Facebook, Google, etc have "users" and not "customers" for their consumer products?

Advertisers are the customers. Non-advertisers are the product. The goal is not to provide a good service to the product, it's to drive engagement of the product up so customers are happy.

Non-advertisers are consumers. That's the term the television industry adopted to differentiate between the two groups.

Both are customers, though. As long as there is a paying tier on the end-user side, end users are also customers with a say in the decisions that are made.

To Google’s credit, Youtube Red for instance goes in that direction. Same for Google Suite customers, Google Play users etc.

Of course the power balance is tipped toward the companies who are willing to pay more, but that’s a balance, and not a one sided relation.

For Facebook I think the picture is darker, but then that’s Facebook after all.

The paying tier are customers (sort of). All the rest are product. It's quite a specific product, because it has to be seduced and tempted by perceived benefits to forfeit its privacy, but still a product it is.

> Advertisers are the customers. Non-advertisers are the product. The goal is not to provide a good service to the product, it's to drive engagement of the product up so customers are happy.


Advertisers are the customers of the Facebook Ad product line (Business manager, boost, campaign, pixel facebook).

Users of Facebook Social Networking services are prospects tracked by Facebook.

2 products:

- the tracking and targeting tools

- the social network

Users aren't the product. Interfaces for targeting segments based on users' data are one of the two products Facebook offers.

Damn... this hit me hard and makes me realize why corporations can be pretty evil :/... in the back of their minds, they're always thinking of how to please their "true" customers.

Corporations, at least the for-profit ones, are principally amoral legal constructs, which I find even more terrifying than the notion that they could be evil.

Insofar as they can be anthropomorphized, they don't truly care about their 'customers' or their well-being, so long as the bottom line is optimized over time.

The rank and file humans that make up the functions of a corporation might be moral and may influence the corporation to make 'irrational' choices due to human morality, but that is not the rule.

Remember when everyone was talking about how an amoral AI paperclip optimizer could destroy the planet? We have those, they're called corporations and they optimize dollars.

> Corporations, at least the for-profit ones, are principally amoral legal constructs, which I find even more terrifying than the notion that they could be evil.

I had a pretty spirited argument, which went on for about a week, with my friend, who was my co-founder at the time, where my position was similar to yours. (Note: this is all in the context of U.S. law and government.) His argument was that it's impossible for any entity to be amoral because corporations and people are both treated as persons, meaning that they can express a will. So once a corporation reaches a certain point it can indeed be moral, because it goes beyond a legal instrument and is directed by a collective staff, leadership, and/or stakeholders. He would have said that a solo entrepreneur's company is just an abstraction of the entrepreneur's will. I eventually got to the point where I couldn't argue against the legal precedent of corporate personhood. [0] I think there is a point when a corporation outgrows its founder and becomes self-sustaining, where its corporate culture dictates its "morals".

[0] https://en.wikipedia.org/wiki/Corporate_personhood

I agree that a corporation can take what would be perceived as moral actions, precisely because of the human staff running it, but that does not change the fact that morality does not factor into the objective function a corporation is meant to optimize. This is exacerbated as the morality of its actions becomes harder for any individual to analyze, and as the moral actions of individual employees become less significant. The concerted effort, by which the fitness of the corporation is ultimately measured, will still amount to optimizing the bottom line.

This is especially true when the corporation outgrows the control of its original founders who may have had a moral vision.

In the end the analogy I am making is that a corporation has reins, and as it grows it becomes increasingly unwieldy for the handlers (= staff) to direct it where it does not want to go (away from profit).

It is of course easy to have a corporation act moral when this objective overlaps with optimization of its objective function. I don't consider such a happy occurrence to qualify as truly moral, however.

The franchise and the virus work on the same principle: what thrives in one place will thrive in another. You just have to find a sufficiently virulent business plan, condense it into a three-ring binder — its DNA — xerox it, and embed it in the fertile lining of a well-traveled highway, preferably one with a left-turn lane...

- Neal Stephenson, Snow Crash

For that to be always true, the customers would have to be amoral too. Many companies find success in differentiating themselves to appeal to morally conscious consumers, and even more companies find themselves having to react to them from time to time.

But that is not moral, is it? The company is still not beholden to a willful idea of morality, but rather sociopathically concerned with appearance. For companies where this matters little this concern for appearance is quickly shed.

You could say the same about any social contract. The criticism applies to any person as much as it applies to any company. It’s the same as saying that altruism doesn’t exist because it’s logically impossible to conceive a scenario where an altruistic act doesn’t benefit the actor.

A vital difference is that companies do not have a sapient capacity for emotion and empathy.

> For many people, leaving digitally can have bad effects on their social life. It's arguably not as bad as moving out of the country, but it's still a practically impossible hurdle for a lot of people.

That's nonsense. It is absolutely not "practically impossible" to leave Facebook, and anybody saying so is exaggerating its importance and being melodramatic.

On the other hand, it can be literally impossible for a person to legally leave their country.

I wouldn't say it's impossible, but it does have a non-negligible impact on your social life. It would be rather dismissive to wave away the impact of removing your social media accounts if you have, by any stretch of the imagination, a social life.

I think this is an age thing. For that group of people that have not known life without the internet, having a social life not attached to being online seems unfathomable. For those of us old farts that had a life before the internet, having a social life without the facebook is just another day. For those that think not having social media is the end of the world, I'd suggest you're just not very imaginative.

As someone who experienced life before the internet: it's not that one can't live without social media, but it does affect how easy it is to organize things when everyone else uses it. As another comment pointed out, social events are organised through Facebook events or group chats. Buying and selling things locally is done through Facebook groups rather than a newspaper or even eBay or Craigslist. Photos and the like from key events are shared on social media by people you mightn't have the opportunity to see all that often.

All of these things can be worked around, but the alternatives are slower paced and less efficient and leave you ultimately out of the loop. So it's not a problem of imagination, it's a problem of wanting to be able to keep up with what's going on.

I think rather that it's not the person themselves but their peers and acquaintances. For example, I had never used Facebook before moving to my current town, but after moving here I practically had to create an account and use it semi-regularly, because everything from finding a house to advertising a music concert was done on Facebook.

Now that I am settled in, my usage has wound down a lot. I have found a comfortable rhythm with the activities and things I need, but if I ever want to discover new experiences or join social gatherings I would have to pick it up again.

Facebook has users who don't know of anywhere else to go, hence no need for FB to fix bugs, and IMO FB is riddled with bugs and poor UX that would have killed a startup.

There is consumer choice. If there's anything the Internet has taught us, seemingly unstoppable companies usually have the most spectacular explosions.

For at least some (perhaps many) businesses (especially small ones), leaving Facebook is not an option. Perhaps even more so for Google.

>hence users don't have a choice. So it's gov.uk's responsibility to make their website as clear, accessible and useful as possible.

If only that sentiment were contagious. The status quo with every government department where I live seems to be "fk you, we're the government we do what we want." I wish more departments would take a "well our users have no choice so let's try not to suck too bad" attitude.

As someone who previously worked in the government of an American state known for its bureaucracy and corruption: there are a lot of people on the inside who dislike bad processes and bad apples just as much as users do, but they're not empowered to change them, whether through inertia, lack of funding, or whatnot.

The quintessential examples are transportation departments and urban planners. A lot of them are aware of newer developments in those fields, and that the current paradigms are problematic and misguided. But rocking the boat will just result in the status quo and leave the person without a job and no references for any future ones, especially if it is potentially unpopular.

Not to nitpick, but I thought there was a choice, in the sense that if a citizen can't (or won't) find what they need on gov.uk, they'll ring a government office or go in person, getting their question answered at much greater expense.

It all comes down to the same thing, though, which is the importance of making the site useful.

Great point! I hope people working on Indian government websites are also reading this. Most of them are far from "user-friendly". Things have improved a lot in recent years (based on my experience with the new IRCTC website and the Aadhaar website). But earlier, almost all of them were just "horrible". Some still are, for example: https://www.incometaxindiaefiling.gov.in/home. I hope user experience is taken seriously by developers and the government. Unlike Facebook, these sites do more than just provide some service. Messed-up UI and incompatibility can cause ridiculous delays in getting government work done.

I listened to a talk at a web development conference from one of the UK government's accessibility developers. Among other things, they do extensive accessibility testing and discovered things you wouldn't expect. For example, to people who didn't grow up with them, dropdown boxes are apparently unintuitive, so gov.uk avoids them. I was very impressed.


I find irony in the fact that this is a YouTube video with no text transcript, massively reducing its accessibility to both those with disabilities and those who aren't in an environment where they can sit and watch a 20-minute video.

Does YouTube have the ability for authors to attach or display a text version of their video?

1) The conference did the uploading, not the speaker

2) I think the videos were thrown up on YouTube in a "why not" sort of way, with minimal effort. Notice how the whole channel is just that specific event over that couple of days. I'm glad they posted them, but it was primarily an in-person event.

Deaf and hard of hearing people get routinely left out of the public sphere because people excuse themselves from their obligation to provide accessibility. People say “I wouldn’t make this if I had to caption it.”

In reality, captioning isn’t that hard. Very few people who go to the trouble of uploading a video couldn’t also caption it.

> because people excuse themselves from their obligation to provide accessibility

What obligation? They're not even obligated to put the video up; it's just a gesture of public service. Obligating people to add captions when they upload videos for free consumption is like giving a beggar part of your lunch out of generosity and having him complain that he likes his sandwiches with more cheese.

Not to specifically liken the deaf and hard of hearing to beggars, since all YouTube viewers are beggars in my analogy. However, the point I'm getting at is that when someone does you a favor like upload a video for free for your consumption you should generally be grateful and do what you can with it, not complain and demand more. It's just not the right attitude.

It's like FOSS. You can ask the developer for more, but not demand it.

I think if we want more captions from videos submitted without remuneration, it's gonna need to be automated. Have the machine do the work. I think YouTube already does this, now that I think about it.

EDIT: Added some more on the analogy in case people thought I had something against deaf people.

A better analogy would be offering your sandwich to a beggar, only to be told that it's unacceptable because they have a gluten intolerance, and that you are therefore obligated to buy them a sandwich made with gluten-free bread.

You don't even need automation, that's why we have platforms like Mechanical Turk.

In fact I would be very surprised if people were not already using gig platforms like that for captioning.

Correct me if I'm wrong, but it's not the advertisers paying for Mechanical Turk, is it?

Saying that a DHH (d/Deaf or hard of hearing) person demanding captions is like a beggar wanting extra cheese is wrong.

It’s more like you’re giving a sandwich to every beggar except the Deaf one. And eventually that Deaf beggar will starve.

Deaf people have significantly less access to public discourse. I think it’s the responsibility of every person engaged in public discourse to make their content accessible.

Uploading a video is trivial, while captioning still requires significant effort. Most organizations either opt to manually transcribe internally (slow, repetitive, mind-numbing work) or outsource (expensive). Machine transcription is available but still fairly technical.

Transcripts are useful beyond hearing-impaired people, but the cost realistically must be balanced with the organization's other responsibilities...

> Uploading a video is trivial while captioning still requires significant effort.

No, actually it doesn't. You start with the automatic one that youtube creates and then edit it.

It takes about twice as long as watching the video (i.e. about 2 minutes for each minute of video).

Source: I've started doing this for videos I upload.

Yes, if your audio quality is bad, and youtube can't understand anything so you have to start from nothing it would be significant effort, as you say. But if your audio is clear, it's really not that hard.

If you want perfection (line breaks in logical places, not too much text at a time on the screen, captions synchronized perfectly with the speaker), the time goes up to about 4 or 5x, which is still not "significant effort".

In my opinion, spending double the presentation time on editing transcripts qualifies as significant effort.

I guess it depends on your role. If you are editing the video you are spending way more time than that already.

A lot of videos get uploaded without any editing whatsoever, so is it really that hard to believe anything besides that is considered "significant effort"?

It takes 4-5x for my videos -- not going for perfection, just making nonsense into the math words I said reasonably clearly. Jargon can be tough for the automated captioning services. They are improving, and for the version I use for teaching it seems to learn the vocabulary as I go through the class. Maybe next year it'll know what an eigenvector is!

> Machine transcription is available but still fairly technical.

This isn't really true, on YouTube it's basically a check box[1]. Admittedly the quality isn't always perfect but it's trivial and better than nothing.

[1] https://support.google.com/youtube/answer/6373554?hl=en

It is not better than nothing if it actively confuses the listener. I experienced this recently when I uploaded something that got captioned automatically without my knowledge (new rollout of automated captioning on a university content management system, not YouTube). I got a visit in office hours from a confused student who said he'd spent significant time looking for African and Afghan linear spaces and simply could not find the definitions.

Affine. Affine linear space, my friend.

I meant in general. YouTube does make it less technical, but the time investment is still there.

If it's not that hard, do it and re-upload? A lot of people would be grateful to quickly glance over transcript instead of watching video.

Apple are way ahead here. Put a product on their store with a video and you have to do captions.

Actually their captions are the karaoke format and you can do decent typesetting. I hope for a future when all video has captions with CSS styling rather than the baked in format of TV news.

Have you looked at EBU subtitle standards? https://tech.ebu.ch/publications/tech3390

A colleague of mine is heavily involved in them and gets very animated in championing them. It all sounds very interesting when he talks about them; however, it doesn't tend to stick in my mind.

> In reality, captioning isn’t that hard.

Maybe if you're an expert at it. Transcribing and then timing the subtitles seems like a lot of work.

Have you even tried the auto captions? YouTube is pretty good with them lately. I watched the first 2 minutes of the video with auto captions and they were about 80-90% accurate.

In the USA, institutions have run into problems with the ADA and similar laws, which block posting videos unless they have been transcribed to text.

(and thus often simply not uploading at all as the cost outweighs the benefit)

Click the little 'CC' button in the toolbar. Video authors can upload closed captions, or YouTube will autogenerate them using voice-to-text, which, while not perfect, does a decent enough job.

Worth noting that this was a presentation at Texas JavaScript Conference, and the presenter likely had nothing to do with the video being uploaded to Youtube.

IME, the autogenerated captions are spectacularly bad. I gave up even bothering to try them ages ago.

They are bad, but it is better than nothing. I think if you are completely deaf they probably are not good enough. For myself, I have some hearing damage from working with fire alarms without hearing protection which makes it sometimes hard to understand speech with some background noise, it also doesn't help that English is not my native language. These subtitles make some videos that are hard for me to understand much better than without, because I am able to combine the hearing that I got with the subtitles to figure out what is being said. So yea, I hope they could improve them, but without them my experience is definitely worse.

I think they're okay, given what they are. I've used them a few times to watch foreign-language videos. They're not perfect, but they're at least enough to understand what's happening.

Unfortunately auto generated captions suck especially for technical content. If people don’t caption their videos, d/Deaf people get left out.

You can add subtitles or closed captions to the video. It also supports adding a transcript (like subtitles or closed captions, but without timing information), and it will use speech recognition to try to line it up with the video.

You can also add a transcript or a link to a transcript in the video description.

> It also supports adding a transcript (like subtitles or closed captions, but without timing information), and it will use speech recognition to try to line it up with the video.

that is genuinely pretty cool and a good compromise between manually adding timestamps to captions and dealing with the hilarious but often very incorrect auto-captioning.

It would be ironic if YouTube were owned by the UK Government. I don't think this particular situation is ironic. I think it's just simply irritating.

YouTube provides the video uploader with extensive subtitle/CC capabilities, which they did not take advantage of.

The answer is yes. If they have the text of their video, they can upload it and Youtube will try to match it to the underlying speech.

Youtube can do some automated transcription but it can be buggy.

For funsies, I recently did the transcription of a video for a creator I enjoy. There was a blog post based on the video but it wasn't a 1:1 match so it had to be tweaked by hand. It was time consuming.

Re: accessibility - the GOV.UK team have produced some basic but helpful accessibility posters ("Do's and Don'ts"). The posters are on Github to allow anyone to use them. There are translations in multiple languages:


Here is the (2016) blog post introducing the posters: https://accessibility.blog.gov.uk/2016/09/02/dos-and-donts-o...

Ironic that these posters (at least the en-UK versions) are in PDF

I just realized, they cut out all the actual user videos in the video posted to YouTube... So that's unfortunate, but it's still a good talk.

When gov.uk first started to replace the previous online presence, they were like first-level tech support: brilliant if your question is in the FAQ, useless otherwise. A lot of detailed information on aspects of running a charitable organisation, which I presume weren't accessed every day, simply disappeared. For a while, the web archive of the old site was my lifeline.

My feeling is that gov.uk has got a lot better since, but it's low-information-density by design. This is absolutely the right thing to do for a lot of their users in a lot of cases, and doubly so when targeting users who might not speak good English for whatever reason. But I still feel that information meant for professionals could be presented in a more useful way.

On the web design side, they definitely deserve 5 out of 5. Good general design, sensible spacing, not requiring a huge javascript framework just to display some text - there's so much right with this site.

As a UK national living outside the UK, the website is fantastic. It has exactly what I need and is highly discoverable, no more than a few levels of depth to the form or submission I need. The multilingual support is also excellent.

Having interacted with a number of government websites, I have to say the UK government does an excellent job of theirs. Most things are simple to understand and clearly presented. Even though I only use it for the most mundane of things, using it is generally a very pleasant experience!

Could not agree more. In fact, I’d go as far as to say they’re not just the best .gov site out there, but one of the best sites, full stop.

(I know a few people who worked on various parts of it and for what it’s worth: they’re all legit. They care, and they’re good. It should come as no surprise that the site turned out the way it did, if they’re hiring these sorts of people.)

I recollect reading about the UK GDS in Tim O'Reilly's book from last year "WTF: What's The Future And Why It's Up To Us". In the chapter "Government as a platform" he's got a small, amusing anecdote about the service.

Having looked up the relevant part, let me transcribe it for you:

> "One of the first things that struck Jen and me as we entered the GDS office on an upper floor of an old office building high above a busy London street was a large sheet of butcher paper covering the picture window in the lobby. In the paper was a small cutout through which you could see the people on the street below. The cutout had a large arrow pointing to it, labeled "Users", reminding everyone when they walked in just whom the unit was meant to serve."

I found it really drove home the service's users-first approach. Following that anecdote he elaborates on the service's "10 commandments" (the 10 GDS Design Principles). They're also quite interesting, but I'm sure a search engine lookup can help you there. I really recommend the book if you want an optimist's outlook on the future of technology, government and economy.

"Be consistent, not uniform" is gold.

I remember attending a talk by some of those involved a couple of years ago where they stated that the air cover from the (then) minister responsible - Francis Maude - was what enabled the progress which was made, and that other ministers attempted to frustrate the process throughout.

I don't know if it was recorded or not, but if it was it's an interesting insight into getting things done in the UK government.

Oh yes, these things don't happen without air cover from a wise strongman (or strongwoman)

> (I know a few people who worked on various parts of it and for what it’s worth: they’re all legit. They care, and they’re good. It should come as no surprise that the site turned out the way it did, if they’re hiring these sorts of people.)

I imagine working for a government is interesting from an incentives perspective too, simply because the optimisation is not about profit in the market sense.

I worked for 18F (a digital transformation organization within the Federal Government, working with many different parts of it) for nearly four years. The lack of profit pressure was one of my favourite things about it, and one of the reasons why we had more room to do things "the right way". (Another reason: you're never sure when a codebase will get improvement/maintenance windows.)

Related: GDS (the GOV.UK people) was a huge inspiration to us from the start. Both 18F and USDS (another Federal Gov org focused on digital transformation) have strong connections to GDS, and there's lots of conversation between the groups.

The team behind it have done some excellent writeups on the methodology and reasoning behind a lot of the design decisions. I only wish 1/10th of the web was this considerate towards the user.

Can you point me to any of these writeups? We're a local government site undertaking a design refresh, and they sound very helpful.

Splendid! Thanks.

This might be part of it: https://www.gov.uk/service-manual/design

Yeah, it's been a real turnaround; it used to be awful.

My only complaint is that they're sometimes too dogmatic about the one question per page thing, which is actually a pretty stupid UX commandment as soon as you get to complex logic flows.

I think they've relaxed a bit on that though, I didn't notice it last time I was doing my company tax return.

They do have to make the site usable by everyone including the non-tech savvy, I can definitely see good reasons for keeping the input extremely limited.

One of the "downsides" of it being a government site is that it needs to be usable for everyone. Which can be a bit frustrating for power users.

They could add a single-page view. It's not very hard to automatically generate single-page and multi-page views of the same form.
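A minimal sketch of that claim, assuming the form is described once as data and both views are rendered from that single definition. The `Question` type, the field names and the HTML output are all hypothetical; this is not how GOV.UK actually builds its forms.

```python
from dataclasses import dataclass
from html import escape


@dataclass
class Question:
    name: str   # form field name (hypothetical)
    label: str  # question text shown to the user


# Hypothetical form definition, written once and shared by both views.
FORM = [
    Question("company_number", "What is your company number?"),
    Question("period_end", "When does your accounting period end?"),
    Question("turnover", "What was your turnover?"),
]


def render_question(q: Question) -> str:
    # One labelled text input; escape() guards against markup in the label.
    return (f'<label for="{q.name}">{escape(q.label)}</label>\n'
            f'<input id="{q.name}" name="{q.name}">')


def single_page(form: list[Question]) -> str:
    # Every question in one <form>, for users who want to see it all at once.
    fields = "\n".join(render_question(q) for q in form)
    return f"<form>\n{fields}\n</form>"


def multi_page(form: list[Question]) -> list[str]:
    # One question per page, i.e. the one-question-per-page pattern.
    return [f"<form>\n{render_question(q)}\n</form>" for q in form]
```

Because both views come from `FORM`, adding or reordering a question changes them together. Branching logic (questions that depend on earlier answers) is where real forms get harder, and this sketch leaves it out.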

Not necessarily easy either, and I don't really see the point. It's perfectly usable as it is, might as well keep it simple and invest time/money in stuff more useful than satisfying the .5% of users that are techy (made up number) for the few times a year they have to use the website

It really wasn't. Two years ago there was one section that told you which type of accounts to submit, made up of 10 pages of single questions.

Except I knew which type I had to submit, and it was telling me the wrong one. I was obviously answering one question wrong, but I couldn't go back and I couldn't see my previous answers; I could only start again.

That's basically a ten page glorified wizard with no back button, which is actually a big UX no-no. Must have gone through the damn thing 5 times before I finally got the right form.

Singapore government's website is also very good.

Are we looking at the same site?

Sadly, not everything is yet on the new gov.uk site. But those thing which are tend to be presented very well indeed.

I can see for dynamic information PDF definitely has limitations but a lot of what I get from gov.uk I feel is better in PDF. The web pages are fine for navigation and overviews but for hard information I much prefer a PDF.

A PDF document can be dated and versioned, you download a report from April 2010 and you can keep that as a document. You can then download a newer one if you want, the older version is still available and largely immutable. Archiving a dynamic web page is a lot harder.

> A PDF document can be dated and versioned, you download a report from April 2010 and you can keep that as a document. You can then download a newer one if you want, the older version is still available and largely immutable. Archiving a dynamic web page is a lot harder.

How is it any harder, let alone a lot harder? Save a timestamped copy of the file. Save it as a web archive if you fear losing assets. Hell, pretty much every platform lets you save pages as PDFs natively; good luck doing the opposite.
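A minimal sketch of "save a timestamped copy", assuming you already have the page's bytes. The function name `archive_copy` and the file-naming scheme are illustrative assumptions. It only captures the HTML document itself, not the scripts, stylesheets or images it loads.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def archive_copy(content: bytes, directory: Path, stem: str,
                 when: Optional[datetime] = None) -> Path:
    """Save `content` as <stem>-<UTC timestamp>.html inside `directory`.

    `when` is assumed to be UTC; it defaults to the current UTC time.
    """
    when = when or datetime.now(timezone.utc)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{stem}-{when.strftime('%Y%m%dT%H%M%SZ')}.html"
    # Mode "xb" raises FileExistsError instead of overwriting, so saving
    # again never clobbers an earlier dated copy.
    with path.open("xb") as f:
        f.write(content)
    return path
```

For example, `archive_copy(urllib.request.urlopen(url).read(), Path("archive"), "report")` would store a dated copy alongside any earlier ones.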

Many pages don’t render correctly when printed to PDF (it’s easy to accidentally make your page entirely illegible after printing), and most pages that rely on JavaScript completely fail to operate when saved as a web archive. So saving pages is, in my experience, quite a crapshoot.

In addition to that point /static/ content on websites can be easily cached.

Want to have a forum as an optional /extra/ on an otherwise static page? Maybe OK, but please have a way of getting that forum as a different page (which is updated on some basis, maybe per post, maybe with a cooldown of maximum 1 generation every X or only during low server load, etc).
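A minimal sketch of the cooldown idea, assuming a single process and a hypothetical `render` callable that builds the forum HTML: new posts only mark the cached static page as stale, and it is regenerated at most once per `min_interval` seconds. The "only during low server load" variant is not shown.

```python
import time
from typing import Callable


class CooldownRenderer:
    """Serve a cached static page, re-rendering at most once per `min_interval` seconds."""

    def __init__(self, render: Callable[[], str], min_interval: float,
                 clock: Callable[[], float] = time.monotonic):
        self._render = render            # hypothetical: builds forum HTML
        self._min_interval = min_interval
        self._clock = clock              # injectable so it can be tested
        self._html = render()            # initial static copy
        self._last_render = clock()
        self._dirty = False

    def on_new_post(self) -> None:
        # Don't regenerate immediately; just note the cached copy is stale.
        self._dirty = True

    def page(self) -> str:
        now = self._clock()
        # Regenerate only if something changed AND the cooldown has elapsed.
        if self._dirty and now - self._last_render >= self._min_interval:
            self._html = self._render()
            self._last_render = now
            self._dirty = False
        return self._html
```

Between regenerations every request gets the same cached string, which is what makes the page cheap to serve or hand to a CDN.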

> A PDF document can be dated and versioned

So can an HTML page.

> you download a report from April 2010 and you can keep that as a document... Archiving a dynamic web page is a lot harder.

Right click, save page as... Am I missing something?

If it's not in archive, you will NOT find the old HTML at all, whereas old PDFs are usually easy to find because someone will have it.

In addition saving web pages is a huge pain with all the scripts and CSS to save as well, so then you need to compress that.

Lastly, saving and browsing saved HTML is a pain on mobile and similar devices without a specialized tool.

Why not print to PDF in that case? HTML can be converted to many other formats, while PDF cannot.

> If it's not in archive, you will NOT find the old HTML at all, whereas old PDFs are usually easy to find because someone will have it.

I don’t understand this argument at all. The exact same thing can be said for HTML pages.

> In addition saving web pages is a huge pain with all the scripts and CSS to save as well, so then you need to compress that.

That’s what the webarchive format is for.

The GDS themselves state that users are more likely to archive their own copies of PDF documents than HTML documents: "users are more likely to download a PDF and continue to refer to it and share it offline"

Part of the reason may be that some browsers download PDFs for display in the user's native reader. Though, even in-browser PDF readers like Firefox on desktop have a toolbar with a prominent save button. The use of PDF is a signal to the user that the document can be saved.

> Right click, save page as... Am I missing something?

The success of this depends on how the page is written. If it fetches some content dynamically you may not get a full and accurate picture of the contents of the page as viewed at the time the article was available live.

I wouldn't like HTML for government documents. HTML is perfectly fine for websites or live documents. But anything published by government will likely also have a printed version, and the online "file" will need to stay in sync with it. As a paper/document-based format, nothing beats PDF/A.

I remember Obama saying something along these lines to tech leaders: "Your job is a lot easier when you only have a select group of customers to please, but when you have to cater for everybody and every interest group, things are a lot harder." Paper-based documents will continue to be used for a decade or two more.

Why can't we just have both? Too much manpower needed for auditing or publishing? Well, that is what tech is for: automate it. The tech should cater to its users, which means not just the public but civil servants as well (I can't believe I just wrote that), rather than being forced on everyone else.

P.S - I thought PDF/A is an open standard. Why is everyone suggesting PDF is a closed format?

PDF/A is an open standard, but it's pretty rare to see a PDF document in the wild that is actually PDF/A compliant. People generally just use the built-in PDF export in some word processor (usually no PDF/A option at all) or use Acrobat (doesn't do PDF/A by default). And non-PDF/A documents can have a laundry list of odd closed features that may or may not be supported by various readers.

In general the landscape of PDF compatibility has improved a lot, but it's still a lot worse than HTML.

>And non-PDF/A documents can have a laundry list of odd closed features that may or may not be supported by various readers.

Is there a test suite for PDF/A compatibility?

Yes. http://verapdf.org/ At the Dutch publication office we use this for spot-check testing for PDF/A-1a compliance.

> Paper-based documents will continue to be used for a decade or two more.

I hope not. I look forward to the death of paper.

You want to restrict information to people able to purchase electronic devices and data subscriptions?

That's a false dichotomy. If you can't afford a cheap phone these days you likely can't afford five books a year anyway.

Guess what libraries provide? Free book and computer access. Nothing changes.

You can take the book/printed docs with you to read or study them at night, during breaks at work, on the train, etc. Libraries don't usually let you take their computers with you, at least in my country.

Paper is an amazing medium for transmitting information: cheap, battery-free, easily copied or lent, transferable, DRM-free, non-proprietary, no monthly subscription needed...

The battery issues can be mostly, though not entirely, averted with e-readers, which use HTML (EPUB is just a zip with some XML metadata).
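To make the "EPUB is just a zip" point concrete, here is a sketch that builds an EPUB-style skeleton in memory with Python's `zipfile`. It follows the EPUB container layout as I understand it: an uncompressed `mimetype` entry first, then `META-INF/container.xml` pointing at a package document. This is deliberately incomplete: the package document (`OEBPS/content.opf`, with metadata and a manifest) and the navigation document are left out, so it will not pass an EPUB validator. The `OEBPS/` folder and file names are conventional choices, not requirements.

```python
import io
import zipfile

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body><p>Plain (X)HTML content.</p></body>
</html>
"""


def build_epub_skeleton() -> bytes:
    """Return the bytes of an EPUB-style ZIP skeleton (not a complete, valid EPUB)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry must come first and be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML, compress_type=zipfile.ZIP_DEFLATED)
        zf.writestr("OEBPS/chapter1.xhtml", CHAPTER_XHTML, compress_type=zipfile.ZIP_DEFLATED)
        # Omitted: OEBPS/content.opf (metadata + manifest) and a nav document.
    return buf.getvalue()
```

The actual reading content is ordinary XHTML, which is why e-readers can reflow it the way a browser does.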

Now books, easily copied?

When's the last time you tried to share a copy of a book with a friend? Or did you buy a blank and copy it by hand? While that's a laudable example of dedication, it's not exactly easy or convenient.

Cheapness is likewise debatable: printing out a 300-page book is fairly expensive.

Most other concerns are concerns only if you use broken-by-design sources.

One thing I find terrible about PDFs is when organisations embed useful data in them, in tables or graphs. Usually in the context of an annual report, or study, or some sort of policy document.

They look great printed on paper, but not very transferrable.

I know there is software [1] out there that tries to parse tables out of PDF documents, but from my experience you'll still end up doing manual adjustments afterwards to correct what the parser could not infer.

That's the main reason why I support this motion. At least make the data both PDF + HTML, so you have options.

[1] https://tabula.technology/
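The contrast with Tabula is worth spelling out: when the same table is also published as HTML, getting the rows back out needs no layout inference at all. A minimal sketch using only Python's standard `html.parser`; the `TableExtractor` class and the sample table are made up for illustration, and it assumes well-formed tables with explicit `<tr>` rows (no nested tables, no `colspan`/`rowspan` handling).

```python
from html.parser import HTMLParser


class TableExtractor(HTMLParser):
    """Collect the text of every <td>/<th> cell, row by row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: list[list[str]] = []
        self._in_cell = False
        self._cell: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("td", "th"):
            self._in_cell = True
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._in_cell:
            self.rows[-1].append("".join(self._cell).strip())
            self._in_cell = False

    def handle_data(self, data):
        if self._in_cell:
            self._cell.append(data)


def extract_rows(html: str) -> list[list[str]]:
    parser = TableExtractor()
    parser.feed(html)
    parser.close()
    return parser.rows
```

For example, `extract_rows("<table><tr><th>Year</th><th>Total</th></tr><tr><td>2017</td><td>1,204</td></tr></table>")` gives `[["Year", "Total"], ["2017", "1,204"]]`, with no manual clean-up step.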

For data in general, I don't think either HTML or PDF is perfect. Both are designed for you to lay out how a document is displayed, rather than the content specifically. For example, some content in HTML is invisible or split between different divs/spans/whatevers. A lot of it is generated dynamically by JavaScript and async requests. Something like JSON or XML is much more powerful for data storage. For example, you can store the data for graphs as raw data so that the consumer can render the graph however it wants. Display-agnostic data is always more flexible, and I think the best way to provide data on a webpage is both as a visual version and as raw data.
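A small sketch of the "display-agnostic data" idea: the numbers behind a chart live in JSON, and each consumer picks its own presentation. The JSON shape (`title`, `unit`, `points`) and the figures are invented for illustration; the two renderers (an HTML table and a crude text bar chart) are just examples of what a consumer might choose.

```python
import json
from html import escape

# Made-up example data: the raw numbers behind a chart, independent of presentation.
CHART_JSON = json.dumps({
    "title": "Applications per quarter",
    "unit": "applications",
    "points": [{"label": "Q1", "value": 120}, {"label": "Q2", "value": 95}],
})


def to_html_table(chart: dict) -> str:
    """One possible presentation: an accessible HTML table with a caption."""
    rows = "".join(
        f"<tr><td>{escape(p['label'])}</td><td>{p['value']}</td></tr>"
        for p in chart["points"]
    )
    return (f"<table><caption>{escape(chart['title'])}</caption>"
            f"<tr><th>Label</th><th>{escape(chart['unit'])}</th></tr>{rows}</table>")


def to_text_bars(chart: dict, scale: int = 10) -> str:
    """Another presentation: a crude text bar chart, one '#' per `scale` units."""
    return "\n".join(f"{p['label']:>4} {'#' * (p['value'] // scale)}" for p in chart["points"])


chart = json.loads(CHART_JSON)
print(to_html_table(chart))
print(to_text_bars(chart))
```

A page could embed the rendered graph for people and link the JSON for machines, which is roughly the "both a visual version and raw data" suggestion.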

This is good. One of the biggest (in my opinion) failures of accessibility in academia is that everything is published as PDFs generated from LaTeX, which is THE least accessible format in existence (Word's PDFs are fine, so it is LaTeX's fault they are so bad).

Some people provide their LaTeX, which improves matters, but most people don't.

What's inaccessible about LaTeX-generated PDF?

It used to be a problem that you couldn't cut and paste text from TeX-generated PDFs, and I assume this would be a problem for screen readers. I think the dvi->pdf conversion produced a PDF consisting mostly of coordinates and characters (in a non-standard encoding), which PDF readers couldn't put back together into words. This seems mostly fixed on recent pdfs, but I don't know if it's TeX (pdflatex?) or the readers that have gotten smarter.

Funnily enough, Donald Knuth's TAOCP fascicles ship as PostScript files, which Preview.app converts to PDFs that you can't copy text from.

If you're using a screen reader anyway, why not export to text instead of pdf? You're just wasting more storage for no benefit you can perceive.

This relies on the assumption, which should be reasonable, that the LaTeX source is provided.

It's not at all fixed. It's slightly better than it was, but anything non-trivial (tables, columns) is often still unreadable.

It's... inaccessible. Try any screen reader you like. If you are lucky you will get most of the text out with some characters randomly broken. Columns often don't read in the right order, and tables and maths are both unreadable.

Good luck reading a scientific paper on your phone without lots of zooming and scrolling around.

Good luck reading a scientific paper on your phone period. Many journals are now making available the full text of papers in straight HTML (possibly only for users of participant libraries, I don't know) but that still sucks.

I mean, it's a small screen.

And that's somehow better in Word-generated scientific papers?

Actually, word produces PDFs which contain markup for screen readers, which make them much more accessible to blind people.

Almost everybody on arXiv provides their LaTeX source, which lets us do things like this: https://www.arxiv-vanity.com

I can see no reason why it can't be HTML AND PDF. A well-composed HTML document can be turned into a well-composed PDF automatically on the server side or in a couple of clicks on the client side. It is always more convenient to use PDF rather than HTML when you want to save a document to your computer or put it on a USB drive to print it on a nearby printer.

The article makes exactly this point. The argument is against material being published with layout software (or printed from Microsoft Word) as a PDF. They argue that the canonical version of such documents should be in HTML, which can in turn be transformed to be suitable for desktop, mobile or print output.

I don't really know about Microsoft Word, but AFAIK LibreOffice can produce fairly good PDFs, including a table of contents if the original document is outlined properly, and it can even embed the source document in the PDF so the user can open it for editing. Your comment actually makes me curious what the problems with Word/LibreOffice-produced PDFs might be; can you name some?

This tradeoff between the relative virtues of PDF and HTML is a good argument for looking at Pollen (pollenpub.com) a framework for designing your own markup language that is capable of targeting HTML and PDF (or any output you need: audio, Kindle, whatever).

A proof of concept is here: https://thelocalyarn.com/excursus/secretary/ On every page at that micro-site, you can view the Pollen markup source, or the PDF version. The PDF and the HTML are generated at the same time, from the same source.

Another example is my blog The Notepad (https://thenotepad.org/). Both sites have links to their source on Github.

I'm a big fan of AsciiDoc and quite sad that it didn't take off as a default format for documents.

I think what's sad about formats per se is that you rely on toolmakers to implement them, and to decide how to translate them into the target format; and for all toolmakers to support them in ways that are coherent with each other.

You also end up with this awkward two-step between the format and the tools. If some capability is missing, you wait for the format to define a way to do it, then the toolmakers to support it; or less ideally, the toolmakers define their own incompatible ways of doing it without waiting for consensus.

This covers the problems I have with Markdown. It has wide support, but because vanilla Markdown only covers a 1995-era subset of HTML, there are all kinds of things (footnotes, figures, formatted code blocks, etc.) that people want it to do. Any given editor or CMS or site generator will support 95% of your preferred flavor's way of doing things and disagree with your other tools about the last 5%.

The difference with Pollen is that it isn't a format or a markup specification; it's a programming environment. So you design the markup, and you tell it how to get from the source markup to your target format. The format is yours and the implementation is yours; they are one and the same.

It is a bit more work, true, but it's less work in Pollen than it would be in any other environment because it does parsing for you and applies your transformations in a logical, ordered way.

That sounds interesting. I will check it out.

Somewhere in the realm of Adobe, the PDF design/spec, and open source software, there are big problems properly supporting forms and signing.

Adobe Acrobat Reader is now only supported on macOS and Windows. And only Acrobat Reader fully supports all the varieties of forms in various PDF specs. I think it's a big problem for government to in effect require the use of a monolithic and proprietary operating system to fill out government forms.

And then the whole state of signing PDFs is a confusing mess. Learning how to create or buy a certificate is irritatingly confusing, as is working out where the certificate goes and how to use it. Search for guides and you get completely different instructions depending on platform and Acrobat version. The latest versions of Acrobat Reader let you do a thing called signing a document, but it doesn't use certificates and doesn't let you add a password to prevent modification; instead you can add a text/image/drawing of a "signature". No doubt this confusing feature exists because certificate-based signing is so difficult. And then verifying digital signatures isn't something people know how to do: technical companies I work with do not accept them unless the digital signature includes a visible human signature and the PDF security options allow printing the PDF!

One of the hardest things I remember from my time as a web consultant was explaining to my customers why PDF is not a suitable replacement for HTML. They always tried to replace half of their pages with pdf documents...

Well don't leave us in suspense, why isn't PDF a suitable replacement?

* If your internal workflow is designing documents then the conversion to a web document will be clunky at best.

* Every end-user device has a PDF reader and their web browser probably opens it transparently.

* You need the PDF either way, because the website will never be the authoritative source, so now you have the problem of keeping the two in sync.

* You probably have a few internal graphic designers that can do wonders with print but it's unlikely you have an internal web development team.

* It's much easier to make a PDF accessible than a website. (Before you disagree remember that the offices we're talking about likely don't have a dedicated web designer) You can be sure it will print correctly, and for most users it's automatically offline.

> Well don't leave us in suspense, why isn't PDF a suitable replacement?

I'm pretty sure the team at Gov.uk wrote a pretty decent article on this topic.

Since I was specifically asked, yes, the OP covers my main arguments quite nicely. But in essence, I would argue the web is content-oriented, and the PDF is not.

For example,

1. The web is a medium where you cannot (within reason) exercise too much control over the design of things. You lay out general guidelines that tell the browser how to render whatever content is thrown at it. I won't pretend this gets us predictable design, but at least it's uniform, which is good for human consumption.

2. The web is a medium that lends itself to structural document construction in such a way that it is easy to use and reuse the content in ways not anticipated. This is good for machine consumption, which in turn empowers the human.

These arguments may seem philosophical, but they do have real-world effects. Once I managed to explain it in ways the customer understood, they were always eager to be of assistance in the endeavour.

As long as we're speaking in idealistic terms I pretty much agree with you but when you compare websites and PDFs that exist in practice it tells a different story.

Zooming PDFs on mobile can be a pain but it's certainly less of a pain than using some fixed width site from the early 2000s with a jQuery menu you can only operate with a mouse and links so small that a mouse would fat finger them.

Machine consumption of the documents is one of those benefits that sounds cool but really just boils down to SEO because very few other things are going to be scraping your site. Certainly worth something but it's usually not high on the priority list for governments and search bots scrape PDFs anyway.

I think the main point in favor of the web is that you can have multiple presentation layers for the same underlying content and you can improve presentation independently of content. Not necessarily that it's reusable because that's just templates and a style guide but that you can backport all your fixes.

Because you are building a website? I mean, I think there are some things that are evident about those, and one of them is that they're not PDFs.

All the issues you stated stem from the fact that people don't want to change. Just because it's not the easiest way doesn't mean it's not a good one.

Everyone that can design a PDF can learn to design a nice HTML document.

Seems a little tautological. I mean the web is certainly an interlinked set of documents but nothing really says they have to be HTML. I mean people around here link to arXiv all the time and that's essentially just a repository of PDFs. Browsers just get you to the content, doesn't mean it has to be able to display all of it.

I don't think it's that people don't want to change, you're still going to need print designs regardless and it's much easier to host a print design than it is to convert a website into one.

> Everyone that can design a PDF can learn to design a nice HTML document.

I mean I wish that was true. I think people can learn to write content in a web compatible way but a designer that spends all their time in PS and ID isn't going to suddenly be able to crank out high quality web pages.

Not suddenly, no, but with training? Sure. HTML is not rocket surgery.

> you're still going to need print designs regardless

Hopefully not forever.

> accessible

Let's take perhaps the simplest most important accessibility feature: How do I increase the font size without making the content absurdly wide, while viewing a PDF?

As a visually impaired user, this is my single biggest gripe with PDFs. It's also annoying that zooming on many websites completely breaks the layout, often overlapping text or pushing important elements off the screen. I find this particularly sad, since HTML largely supports a separation of content and presentation.

In the Adobe Reader app for Android on the bottom right you can select multiple 'view modes' one called 'Reader Mode' which will allow you to reflow a document and increase the font size considerably. I capped it out at about a letter per screen.

Trying to read a PDF on a mobile device is usually painful.

Sure, but that's because mobile devices have really small screens. You could create PDFs that were meant to be viewed at that size, but why?

You'd have the same problem trying to read a regular book through a mobile-phone-sized window. That doesn't make books an unsuitable vehicle for document delivery.

It does make it an unsuitable vehicle for delivery of information which people may want to access from a mobile device, however.

Not at all; there is no information that people won't want to access from a mobile device. But again, a mobile device is not necessarily suitable for the job.

Telling your customers to carry a laptop/tablet because you feel linebreaks are just a nice-to-have is quite user-hostile.

PDFs have line breaks.

Mobile devices works perfectly well for accessing this kind of information when developers aren't outright hostile to the users they are meant to serve.

It's not a developers job to tell me whether or not my device is suitable for reading their information because they can't be bothered to make it available as HTML.

> a mobile device is not necessarily suitable for the job.

Certainly not if the publishers took that attitude. And what you consider suitable simply isn't relevant if half the audience is now using a mobile device.

HTML will wrap the text to fit the page. PDFs don't do that.
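As a rough analogy for what reflow means (not how browsers actually lay out text, which is far more sophisticated): text in a PDF is typically placed at fixed positions on a fixed-size page, so a narrow screen can only shrink or crop it, whereas reflowable text is re-broken for whatever width is available. Python's `textwrap` stands in for the browser here; the widths 72 and 32 are arbitrary "desktop" and "phone" choices.

```python
import textwrap

PARAGRAPH = ("HTML lets the reader's device decide where lines break, so the same "
             "paragraph fits a desktop monitor or a phone without horizontal scrolling.")


def reflow(text: str, width: int) -> list[str]:
    """Re-break `text` into lines of at most `width` characters."""
    return textwrap.wrap(text, width=width)


for width in (72, 32):  # arbitrary "desktop" vs "phone" column widths
    print(f"--- width {width} ---")
    print("\n".join(reflow(PARAGRAPH, width)))
```

Adobe's Reader Mode, mentioned above, is essentially an attempt to recover this kind of reflow from a fixed-layout PDF after the fact.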

Adobe's PDF viewer will actually do that. Check it out if you're frustrated by PDFs on mobile.

Opening PDFs on an iOS device doesn’t work transparently - you have to open them with iBooks which is a super awkward workflow.

Only if you're using some 3rd-party app that doesn't support PDF rendering. Apple's Safari and Files both handle them in-app.

Not for me. I’m not sure what’s up then.

It's possible there's a size limit, but I remember waiting a fair bit of time for large PDFs to load in Safari, so ¯\_(ツ)_/¯

If you click on a link to a .pdf in Safari on iOS (certainly on iOS 11, but I'm pretty sure many versions prior to that) it will transparently show the PDF within the browser (it's a fast native PDF viewer that can be zoomed/scrolled very easily).


Yes, I read the article, I'm asking this person specifically about their experience and perspective.

Accusing people of not reading the article is against HN community guidelines.


PDF is easy to create from a Word document; try using Word to create HTML and you get a dog's dinner. If our industry were better at creating tools for ordinary users to create HTML, the problem would solve itself.

The issue is that people want a frozen format so they can line up images precisely where they want them to be. This is antithetical to displaying things in a browser. Sure, it can be done, but it's CSS that's fighting the browser.

What we really should have done for our governments is figure out the 20 or so essential layouts and pre-code CSS for them, while always allowing the government to fall back to custom HTML / CSS.

Never mind that shifting one of them produces a cascade of changes that leads to some office worker spending a fruitful hour lining them all up again for version two...

This is one of the reasons I strongly prefer text-based/command-line interfaces: they're always scriptable. In the worst case you'll just generate a bunch of commands that'd take a human forever to execute.

HTML is easy. CSS is hard. CSS is hard partly because it's a mess, but mostly because designing for the web is hard.

Normal human beings understand pages, in the papery sense of the word. They don't understand viewports. It takes an enormous leap of abstraction to reason about a document that could be any shape or size. Responsive design is a fundamentally unintuitive process. A lot of professional designers fail to understand this, even after years of designing for the web.


Documents are generally created and laid out for print and email distribution and often have very carefully laid out data tables, charts and page number referencing (as well as less important cosmetic formatting). Simply changing the file format isn't going to solve the accessibility problems and is going to create new formatting/compatibility issues, and changing the workflow of every member of the civil service to require them to use an accessible HTML generation tool in place of word processors for anything that might later be made public isn't a trivial undertaking either. Said accessible HTML generation tool is probably going to want to change those nice HTML files back to PDF or Word files for restricted email distribution too...


Yes, but it's not a fully ACCESSIBLE document, which is half of the point of the article.

Pandoc already exists, it's just difficult to use.

You can write everything in Markdown and export to your org's Word template and have HTML, LaTeX, etc. export.
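A sketch of what "one Markdown source, several outputs" might look like when scripted, so nobody has to remember the pandoc incantations. The file names (`report.md`, `org-template.docx`) are hypothetical; the pandoc options used (`-o`, `--standalone`, `--reference-doc`) are standard ones, and pandoc picks the output format from the file extension. The commands are only built as lists here; `run_all` executes them if pandoc is installed.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def pandoc_commands(source: Path, out_dir: Path,
                    reference_docx: Optional[Path] = None) -> List[List[str]]:
    """Build one pandoc invocation per output format from a single Markdown source."""
    stem = source.stem
    html = ["pandoc", str(source), "--standalone", "-o", str(out_dir / f"{stem}.html")]
    docx = ["pandoc", str(source), "-o", str(out_dir / f"{stem}.docx")]
    if reference_docx is not None:
        # --reference-doc copies styles (fonts, headings) from an existing Word file.
        docx.append(f"--reference-doc={reference_docx}")
    latex = ["pandoc", str(source), "--standalone", "-o", str(out_dir / f"{stem}.tex")]
    return [html, docx, latex]


def run_all(commands: List[List[str]]) -> None:
    """Run each command, stopping at the first failure."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

Wrapping this in a button or a shared-drive hook is the part that would actually matter for non-programmers, which is the objection raised in the next reply.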

I really like markdown but I doubt it's the answer to this.

These people aren't programmers, most of them will only have a loose grasp of what a file format is. What they need are tools that don't require a programming background.

> We cannot get as much information from analytics about how people are using PDFs

This might be considered a positive.

Not on Surveillance Island

I'm a huge fan, having used the HMRC web portal from overseas. The sea-change in how good the UI is, and the data collection on what we want from the UI, are also impressive.

I set myself low-bar goals measuring government engagement. The Australian whole-of-government portal is abjectly awful: it has well-done 2FA but continually nags me with badly designed 'do we have your permission' and 'remember you're talking to government' interstitials.

The state government planning website uses a design that is simply unworkable on touch devices: the 'permit us to say we want you to agree to terms and conditions' overlay won't scroll when the underlying page does, so the [agree] button can't be pressed because it's off-screen. Gak!

Overall, decisions like this are compromises, with pros and cons:

"They’re not designed for reading on screens"

That's a subjective distinction.

"It’s harder to track their use"

Not relevant for most .gov use cases. Also, sounds like something that isn't a user problem.

"They cause difficulties for navigation and orientation"

So does responsive design that makes discovery of content difficult in many scenarios, especially atypical scenarios.

"They can be hard for some users to access"

So can HTML where a poor accessibility process is in place.

"They’re less likely to be kept up to date"

Conversely, they make it easier for a consumer to understand when changes take place, and they encourage a stronger release process.

"They’re hard to reuse"

That may or may not be a bad thing.

I think the point is: it's harder to do those things properly in PDF format.

Also, I'm not sure how 'not designed for reading on screens' is subjective. PDFs are paper-document oriented and don't display well for reading on anything other than a large size monitor in virtually every case I've seen. When delivering content in a web browser, why on earth would someone prefer to view PDF vs a reasonably well laid out HTML version?

I think PDFs are common in gov't websites because much of the internal culture is paper-document centred, and having those nicely printed PDFs solve internal problems for gov't employees, not because they are actually any better for the users of the website.

Canadian Gov't websites are full of PDF content too. :(

I generally agree, as PDFs were designed for (and are best used for) making documents for printing on paper. BUT: "We cannot get as much information from analytics about how people are using PDFs. We can get data on how many times a PDF has been downloaded from GOV.UK, but we cannot measure views of the file offline." ...that section made me wonder if perhaps I have been missing a possible upside to PDFs.

"[PDFs: ] They're quick and easy to create

... they can be easily created from popular applications that people are already using to author and share documents."

This appears under the heading "Why do people use PDFs?"

However I would have listed this as the sole reason that documents should be distributed as HTML. The reasoning is simple.

Imagine a hypothetical where one has a choice of distributing documents in two formats, A and B, and there are particular advantages to each format. As such, some users prefer format A, while others prefer format B. Not to mention those users who would like to have both formats available.

In the hypothetical, users can easily convert from format A to B however converting from format B to A is difficult.

Assuming one can distribute the documents in format A, it makes no sense to distribute in format B. Users who prefer format A will be unhappy.

Distributing in format A keeps users who like format B happy because they can easily convert from A to B.

I'll make a defense of PDF publishing here. The points the author makes are good, but they also show why PDF has a role to play, responsive design being the most difficult. Imagine you're producing a scientific paper, and it contains a big table. It's unreasonable to ask an author to figure out how to make every table and figure display correctly on every screen.

Think of it this way. If you're publishing a pdf, then you master the formatting using your word processor (latex, word, what have you). On the other hand, if you're publishing on a responsive web site, then you really ought to have a content management system to guide you through the requirements of the platform. It's a significantly higher hurdle, both for the authors and the platform owners.

Publishing shouldn't be about displaying nicely in screen though but rather about sharing the information.

As a bioinformatician, I find big tables inside PDFs essentially useless. What's the point of a few hundred rows' worth of table if you can't manipulate it with whatever tool you prefer?

Moreover, I'm writing my thesis and dealing with many PDFs from the 90s, most of which don't let me just highlight and copy text, so I need to type it out like a savage. Is it guaranteed that today's PDFs will be easy for future people to handle?

In my opinion, publishing should be done in plain text and .tsv files, and the onus of displaying it on screen should fall on the editor (isn't that their job anyway?).
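For the TSV route, the standard library already covers both directions. A minimal sketch with Python's `csv` module; the rows are made-up example data. `lineterminator="\n"` is set because the csv writer otherwise defaults to `\r\n` line endings, and any field containing a tab or quote gets quoted automatically.

```python
import csv
import io

# Made-up example rows; in practice this would be the table behind a paper.
ROWS = [["gene", "sample", "count"], ["BRCA1", "S1", "42"], ["TP53", "S1", "17"]]


def to_tsv(rows: list) -> str:
    """Serialize rows as tab-separated text with Unix line endings."""
    buf = io.StringIO()
    csv.writer(buf, delimiter="\t", lineterminator="\n").writerows(rows)
    return buf.getvalue()


def from_tsv(text: str) -> list:
    """Parse tab-separated text back into a list of rows (all values are strings)."""
    return list(csv.reader(io.StringIO(text), delimiter="\t"))
```

The round trip is lossless for string data, and the same file opens directly in R, pandas, or a spreadsheet, which is exactly what a table locked in a PDF can't offer.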

latex2html and pandoc can both convert to responsive HTML5; you don't need to learn anything new.

For bonus points, simply distribute the LaTeX file so that users can convert it and read it in whatever alternative format they prefer besides HTML.

Or basically, semantic focus rather than presentation.

On a related note, I have to fill out a PDF form to make a GDPR complaint to the Data Commissioner (in Ireland). This is a plain-text form, the information could just as easily be sent in the body of an email.

Using PDF means having to deal with crappy PDF software, hurting accessibility and scriptability and adding needless overhead on the other end.

I wish these systems were designed by sane and benevolent programmers, rather than Pointy Haired Bureaucrats.

( https://www.dataprotection.ie/docs/raise-a-concern-Form/m/17... )

That's bad, but a small silver lining is that it's not a Word document that silently renders incorrectly in other editors, so that you don't even see some of the fields.

I love using PDF to create backups of webpages if I think I'll want to look at it in the future. I generated PDFs as backups for references in college assignments, and it really saved me when pages moved mid-assignment.

Sadly, the great GDS team that is bringing about the great design/technical change in the UK gov is falling prey to politics from the same people that brought you #brexit. GDS used to be an initiative run directly from the Cabinet Office; it has now been moved to the ministry of Culture and Sport.

You might ask what sport has to do with gov website UX. But it's all politics: GDS delivered, and that makes the other politicians and gov officials look bad, so they are being undermined. That's my reading of it anyway; no good deed goes unpunished.

Actually, once you get the name right, it being the Department for Digital, Culture, Media, and Sport, it becomes somewhat clearer what it has to do with this WWW site. But this then prompts the question of exactly when "digital" became a noun. (-:

As far as I'm aware, GDS has no plans to move from Cabinet Office (source: I work at GDS).

Eh. Portfolios get shuffled around all the time. I wouldn't read too much into it.

I'd love to have a format that is just HTML + whatever zipped up that browsers and operating systems happily just open in a browser.

I like making HTML reports (rmarkdown), but sharing them requires telling people to download them and then open them in a browser. Google Drive, for example, happily shows you a preview, but if you then click on the file you get raw HTML. Customers just don't understand.

PDF however, is absolutely fine to move around as a single chunk, but has problems in almost every other way.

> I'd love to have a format that is just HTML + whatever zipped up that browsers and operating systems happily just open in a browser.

There is MHTML; sadly it fails the second part, because AFAIK Chrome dropped its support and FF and Safari can't open it natively. Apple has its own WebArchive format, and Firefox's MAF extension generated MAFF files, but it is not compatible with newer versions.
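The frustrating part is that the MHTML format itself is simple: it's a MIME `multipart/related` message (RFC 2557) where each part carries a `Content-Location` header, so relative references in the HTML resolve to the bundled parts. A sketch using Python's standard `email` package; the `build_mhtml` helper, the base URL, and the PNG-only assumption for images are all illustrative. Whether a given browser will open the result is exactly the support problem described above, so no promises there.

```python
from email.mime.image import MIMEImage
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(html: str, images: dict, base_url: str) -> str:
    """Bundle an HTML page and its images into one multipart/related (MHTML-style) message.

    `images` maps a relative path used in the HTML (e.g. "chart.png") to PNG bytes.
    """
    msg = MIMEMultipart("related", type="text/html")
    msg["Subject"] = "Saved page"
    page = MIMEText(html, "html", "utf-8")
    # The root part's location is the base that relative src="..." values resolve against.
    page["Content-Location"] = base_url + "index.html"
    msg.attach(page)
    for path, data in images.items():
        part = MIMEImage(data, "png")  # subtype given explicitly, so no format sniffing
        part["Content-Location"] = base_url + path
        msg.attach(part)
    return msg.as_string()
```

Everything ends up in one text file, which is the single-chunk property people like about PDF.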

At one point, I thought MHTML was precisely what we needed to get rid of PDF: double-click, and everything works. Sadly everyone abandoned it, and Google doesn't want you to download any webpage for archiving purposes.

Chrome supports MHTML as an experimental feature. Visit chrome://flags/#save-page-as-mhtml in Chrome to enable it.

I wish there was an agreed-upon standard format that had cross-browser support.

> I'd love to have a format that is just HTML + whatever zipped up that browsers and operating systems happily just open in a browser.

Does HTML with data URIs meet these criteria?
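A minimal sketch of the data-URI approach, assuming the report's images are local files: each image's bytes are base64-encoded into the `src` attribute (an RFC 2397 `data:` URI), so the HTML file has no external dependencies. The helper names are illustrative, not from any particular tool.

```python
import base64
import mimetypes
from pathlib import Path


def to_data_uri(data: bytes, mime: str) -> str:
    """Encode raw bytes as a base64 data: URI."""
    return f"data:{mime};base64,{base64.b64encode(data).decode('ascii')}"


def inline_image(path: str) -> str:
    """Return an <img> tag whose src embeds the image file itself."""
    mime, _ = mimetypes.guess_type(path)
    data = Path(path).read_bytes()
    uri = to_data_uri(data, mime or "application/octet-stream")
    return f'<img src="{uri}" alt="">'
```

The trade-off is size: base64 inflates each embedded file by roughly a third, which is usually acceptable for a handful of report charts.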

That part works fine, actually, and is how I usually generate them. The problem is that I then need to tell people to download them and open them specifically in a browser. If they see previews in something like Google Drive, they see raw HTML.

Technically, all this is fine - the problem comes entirely from not having a nice agreed way of opening them. Which makes it more frustrating.

File > Save As... > Web Archive

I don't think you read the comment.

That, in Chrome, generates an HTML file and a separate folder. I would need to tell users to download the HTML file and the folder, put them in the same place, and open the file in Chrome. If I put it on Google Drive to share it, they'll just be shown raw HTML unless they download and open it.

I also tested it on a page using Chrome, and it didn't load the pictures properly.

Works great in Safari. Generates a single file that loads into the browser. Can be shared like any other file.

WebArchive uses Cocoa binary object serialization and is supported only by Safari.
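For what it's worth, Safari's `.webarchive` files can usually be inspected as property lists (for example with `plutil`), which would make them readable off a Mac with generic tooling. A sketch under that assumption, using Python's `plistlib`; the `WebMainResource`/`WebSubresources` key names are how such archives commonly appear, not something confirmed in this thread, and some nested values may still be Cocoa-archived objects.

```python
import plistlib


def extract_main_html(archive_bytes: bytes) -> tuple[str, str]:
    """Return (url, html) of the main resource of a .webarchive file.

    Assumes a property list with a WebMainResource dictionary holding
    WebResourceURL, WebResourceData and optionally WebResourceTextEncodingName.
    """
    archive = plistlib.loads(archive_bytes)  # detects binary or XML plist
    main = archive["WebMainResource"]
    encoding = main.get("WebResourceTextEncodingName", "utf-8")
    return main["WebResourceURL"], main["WebResourceData"].decode(encoding)


def list_subresources(archive_bytes: bytes) -> list[str]:
    """Return the URLs of embedded images, scripts, etc. (assumed key names)."""
    archive = plistlib.loads(archive_bytes)
    return [r["WebResourceURL"] for r in archive.get("WebSubresources", [])]
```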

I just tried it in Safari, and it generates something that doesn't open in Chrome, and when I click on the file alone I get a warning that it "can’t be opened because it is from an unidentified developer".

Can it be shared to a non-Mac, or is it a bundle (directory) that Safari has special support for navigating?

No idea. The few times I've used it, it's always been sent to other Mac users using Safari because their IT department doesn't let them use Chrome.

I'd be surprised if there isn't a similar Chrome feature. Chrome and Firefox usually have all the bells and whistles.

Your government uses PDF? My dear Britons, you have it easy. The Italian government and Italian local administrations make massive use of MS Word documents, and I'm not talking about .docx, but the old, venerable, write-once-broken-everywhere Office 97 .doc. So I wouldn't complain that much, because it could be much, much worse, especially when you have to fill in a 20-year-old form that no longer renders in any modern word processor. Oh, well.
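The two Word formats are easy to tell apart by their first bytes: legacy Word 97-2003 files are OLE2 Compound File Binary containers, while `.docx` is a ZIP package. A small sniffing sketch; the labels returned are illustrative, and a ZIP signature alone does not prove the file is a Word document (ODT and many other formats are ZIP-based too).

```python
# Compound File Binary signature used by legacy .doc/.xls/.ppt files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Local file header signature of a ZIP archive (.docx, .odt, .epub, ...).
ZIP_MAGIC = b"PK\x03\x04"


def sniff_office_container(header: bytes) -> str:
    """Classify a file by its leading bytes: 'ole2', 'zip' or 'unknown'."""
    if header.startswith(OLE2_MAGIC):
        return "ole2"
    if header.startswith(ZIP_MAGIC):
        return "zip"
    return "unknown"


def sniff_file(path: str) -> str:
    with open(path, "rb") as f:
        return sniff_office_container(f.read(8))
```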

This is really interesting, but it's missing a discussion of which format suits which kinds of information or pages. Even some examples of documents commonly published as PDF today that would benefit most from switching to HTML would help. I have trouble believing that HTML is right for every kind of government document (for example, a quarterly or yearly report published by some agency seems like a reasonable PDF).

PDF is easier for some authors: write in Word, export as PDF, mail to somebody who will upload to the web server and maybe print it for non digital distribution.

Web CMSes were invented (among many other reasons) to let non-technical people write and edit content directly in the browser and publish it. I wonder whether gov.uk doesn't have a CMS or their authors don't want to use it.

The gov.uk folks make heavy use of Markdown[0], and so they should; the sooner we leave this PDF/Word doc nonsense in the past the better.

HTML in the browser is the best tool for consuming documents; we can read and write on any device, bookmark documents or chapter headings, resize and style at will in a device independent manner, enhance documents and make them interactive[1], even add videos if required. The best part is that the source is all stored in plain text and is version controlled without requiring Sharepoint or similar.

[0] https://www.gov.uk/guidance/how-to-publish-on-gov-uk/markdow...

[1] https://insidegovuk.blog.gov.uk/2013/08/21/barcharts-in-html...

I'd like to see version controlled text files used for important legal documents and many other government docs. Allow the presentation layer to be handled on the end points. Maybe something like markdown will be sufficient.

I've always admired the fixed-width plain-text model used by IETF RFCs. RFCs are extremely readable and searchable, and they last a long time.
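One practical upside of keeping such documents as version-controlled plain text is that every change shows up as a readable line diff. A sketch using Python's standard `difflib`; the function name and the two versions of the document are made up for illustration.

```python
import difflib


def text_diff(old: str, new: str, name: str = "document.txt") -> str:
    """Unified diff between two plain-text versions of a document."""
    return "".join(difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{name}",
        tofile=f"b/{name}",
    ))


print(text_diff("Fee: 10 GBP\nDeadline: 31 Jan\n",
                "Fee: 12 GBP\nDeadline: 31 Jan\n"))
```

The same comparison on two PDFs would first need reliable text extraction, which is where that format struggles.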

> I'd like to see version controlled text files used for important legal documents and many other government docs.

Someone mentions "blockchain" in 3... 2... 1...

Because anyone can embed JavaScript in a PDF file and launch stuff. https://resources.infosecinstitute.com/analyzing-malicious-p...
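A naive way to spot this is to count the PDF name objects associated with active content, similar in spirit to what triage tools such as Didier Stevens' pdfid do. A sketch under that assumption; it undoes `#xx` name escapes but cannot see inside compressed object streams, so a zero count does not mean a file is safe.

```python
import re

# Name objects that commonly indicate active content in a PDF.
SUSPICIOUS_NAMES = [b"/JavaScript", b"/JS", b"/OpenAction", b"/AA", b"/Launch"]


def _decode_name_escapes(raw: bytes) -> bytes:
    """Undo #xx hex escapes in PDF names (e.g. /J#61vaScript -> /JavaScript)."""
    return re.sub(rb"#([0-9A-Fa-f]{2})", lambda m: bytes([int(m.group(1), 16)]), raw)


def count_suspicious_names(pdf_bytes: bytes) -> dict[str, int]:
    """Count each suspicious name token in the raw (uncompressed) bytes."""
    data = _decode_name_escapes(pdf_bytes)
    counts = {}
    for name in SUSPICIOUS_NAMES:
        # A name ends at whitespace or a delimiter, so /JS does not match /JSON.
        pattern = re.escape(name) + rb"(?=[\s/<>\[\]()%{}]|$)"
        counts[name.decode()] = len(re.findall(pattern, data))
    return counts
```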

Everyone always forgets about epub.
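EPUB is close to the "HTML + whatever zipped up" format wished for earlier in the thread: a ZIP whose first entry is an uncompressed `mimetype` file, plus a container file pointing at a package document that lists XHTML content. A minimal EPUB 3 sketch using only the standard library; the `OEBPS/` layout and the single document doubling as the navigation document are assumptions, and the output has not been run through a validator such as epubcheck.

```python
import io
import zipfile
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

PACKAGE_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">{uid}</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="doc" href="doc.xhtml" media-type="application/xhtml+xml" properties="nav"/>
  </manifest>
  <spine>
    <itemref idref="doc"/>
  </spine>
</package>
"""

DOC_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>{title}</title></head>
<body>
  <nav epub:type="toc"><ol><li><a href="#main">{title}</a></li></ol></nav>
  <section id="main">{body}</section>
</body>
</html>
"""


def build_epub(title: str, body_xhtml: str, uid: str,
               modified: str = "2018-01-01T00:00:00Z") -> bytes:
    """Package one XHTML fragment as a minimal EPUB 3 file, returned as bytes.

    `body_xhtml` is inserted as-is and must already be well-formed XHTML.
    """
    title, uid = escape(title), escape(uid)
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as z:
        # The mimetype entry must come first in the archive and be stored uncompressed.
        z.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        z.writestr("META-INF/container.xml", CONTAINER_XML)
        z.writestr("OEBPS/content.opf",
                   PACKAGE_OPF.format(uid=uid, title=title, modified=modified))
        z.writestr("OEBPS/doc.xhtml", DOC_XHTML.format(title=title, body=body_xhtml))
    return buf.getvalue()
```

Writing the returned bytes to `report.epub` gives a single file that e-readers, and many browsers via extensions, can open.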

The problem with HTML is that it's not a stable format. Will a browser in 2028 be able to render those pages correctly? What about 2038? If the content needs to be available for a longer time, PDF/A would be a much better choice than HTML.

> The problem with HTML is that it's not a stable format.

Not a stable format as in still gets updated? Very few formats are stable by that yardstick, and PDF certainly is not one: https://www.iso.org/standard/63534.html

> Will a browser in 2028 be able to render those pages correctly?

Correctly as in "in a way which can be consumed", most certainly. A modern browser can consume and render 20-year-old websites just fine.

> If the content needs to be available for a longer time PDF/A would be a much better choice than HTML.

It really is not. PDF is a very complex format and an absolute bear to manipulate and extract data from.

Notice how I specified PDF/A, not PDF, which is a specific (and thus stable) version of PDF with some other requirements like embedding the fonts, specifically meant for archiving purposes. The whole point of PDF/A is that it’s a stable format.

Your browser should be completely able to render the first webpage ever created, from the early 1990s: http://info.cern.ch/hypertext/WWW/TheProject.html

What HTML + CSS from 1998 doesn't render usefully anymore?

What HTML + CSS from 1998 renders in exactly the same way (a pixel-perfect copy) on a modern browser as it did on a browser in 1998?

Notice how I specified "renders usefully". Few documents have to be pixel-perfect; even fairly old HTML had tools to make individual parts pixel-perfect where that matters, and given the higher level of standardization and standards compliance nowadays, this would likely work even better for documents created today.


Is actually still supported by many browsers, and was never actually part of an HTML spec (it's only mentioned in HTML5, where it is marked as deprecated). Even where it isn't supported, the contents are likely still visible (same with blink, which is supported in fewer browsers as far as I know, and was also never in an HTML spec).

> Will a browser in 2028 be able to render those pages correctly?

Why not? Modern browsers can still render pages from the 1990s.

With a result pixel-for-pixel identical to what it was in the '90s?

None of the current browsers in 2018 even render the same URI in the same way. Depending on your OS, browser, and screen DPI, the page will be subtly different. However, none of that actually prevents you from reading the page content, which is what this is all about. Pixel-perfect rendering is just not that important.

Big kudos for the work you are doing. Countries I've lived in before (Spain, Italy: terrible experiences with gov digital services) should take lessons from you on how to run a digital and accessible public administration portal online.

I agree with the point of the article.

Still, there is so much out there that is available only in one of the MS Office formats, and Gov.UK is apparently doing better than that. So there is actually some cause for celebration here, IMHO.

There's a distinction between PDF and PDF/A. I agree with favoring HTML over just any PDF, for all the reasons cited in the article. But for certain kinds of documents you'd want PDF/A over HTML or perhaps in addition to HTML.

PDF/A can certainly accept, and a government policy can require the use of, accessibility features and also even digital signatures.

I don't understand why these should be mutually exclusive. Starting from common parseable markup, you can have both responsive HTML for browsers and formatted PDF for print. Why not decouple content from presentation?
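The idea can be sketched as one presentation-free content model with several renderers. In this hypothetical example the `Document` type, its fields, and both renderers are illustrative; plain text stands in for the print/PDF renderer, which in practice would be a print stylesheet or a PDF toolchain fed from the same model.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Document:
    """Presentation-free content: a title and ordered (heading, text) sections."""
    title: str
    sections: list[tuple[str, str]] = field(default_factory=list)


def to_html(doc: Document) -> str:
    """Render for browsers; all text is HTML-escaped."""
    out = [f"<h1>{escape(doc.title)}</h1>"]
    for heading, text in doc.sections:
        out.append(f"<h2>{escape(heading)}</h2>")
        out.append(f"<p>{escape(text)}</p>")
    return "\n".join(out)


def to_plain_text(doc: Document) -> str:
    """Render the same content as underlined plain text."""
    out = [doc.title, "=" * len(doc.title)]
    for heading, text in doc.sections:
        out += ["", heading, "-" * len(heading), text]
    return "\n".join(out)
```

Edits happen once, in the model; each output format is regenerated rather than maintained by hand.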

If the HTML is responsive, why can't it be good for print? And, for the sake of reducing waste, should we not discourage people from printing in the first place?

You'd probably want a dedicated print CSS to remove various bits of UI and decoration, as well as styling which works well on a screen but is terrible in print (e.g. the white-on-black and white-on-blue headers).

However gov.uk already does both so it's not an actual issue here.

I do a lot of lit review.

Having hard pagination and consistent layout is, for me, a cognitive gain, especially for longer documents, say, 20+ pages. (I frequently read 500+ page docs.)

Other formats such as ePub are frequently compact in space utilisation, but again, the free-flowing text lacks the mnemonic framing of even a basic print book, let alone the expertise of a masterpiece of layout and typography such as Tufte's books.

Not that HTML isn't well-suited to other cases, or that PDFs can't be awful. But PDF has its place.

This is what I do with my CV. I just keep a markdown file up to date and can quickly generate PDF and HTML versions.

They should publish in XML (whichever kind), not HTML.

Gov.uk pages are still using google-analytics and reporting the activities of British Citizens' interactions with their own government to a US-based multinational I see.

And the only way to opt out is to click through and install a browser add-on.

This doesn't seem very GDPR-friendly.

No personally identifying information is sent to Google Analytics. IP addresses are anonymised, email addresses, dates, and postcodes are stripped too.

See https://github.com/alphagov/govuk_frontend_toolkit/blob/cf1c... and https://github.com/alphagov/govuk_frontend_toolkit/blob/cf1c...
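As a rough sketch of what that kind of scrubbing involves: Google Analytics' IP anonymisation zeroes the last octet of an IPv4 address and the last 80 bits of an IPv6 address, and URLs can be scrubbed with pattern matching before being sent. The regular expressions below are illustrative only, not the ones in the GOV.UK toolkit linked above, and they cover emails and postcodes but not dates.

```python
import ipaddress
import re


def anonymise_ip(addr: str) -> str:
    """Zero the host part: keep the /24 of IPv4 and the /48 of IPv6."""
    ip = ipaddress.ip_address(addr)
    prefix = 24 if ip.version == 4 else 48
    return str(ipaddress.ip_network(f"{ip}/{prefix}", strict=False).network_address)


# Illustrative patterns only; real-world postcode and email matching is messier.
EMAIL_RE = re.compile(r"[^\s=/?&]+@[^\s=/?&]+")
UK_POSTCODE_RE = re.compile(r"\b[A-Z]{1,2}[0-9][A-Z0-9]?\s*[0-9][A-Z]{2}\b", re.IGNORECASE)


def strip_pii(text: str) -> str:
    """Replace email addresses and UK-style postcodes with placeholders."""
    text = EMAIL_RE.sub("[email]", text)
    return UK_POSTCODE_RE.sub("[postcode]", text)
```

Note that this only limits what ends up in the analytics payload; it does not address the point raised below that the script itself is fetched from Google.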

It's present on every page, and it's loaded from google. Anonymisation is pointless.

The fact that they use Piwik on a couple of pages they consider sensitive (usually to do with payment) shows that even gov.uk know this cannot be relied upon to fully hide things.

It shouldn't be there and I've raised a complaint with the ICO.

Note that the page says that it can be configured to strip all that info, not that it does by default. One would have to look at each page to see how this is configured. And it could still be wrong to switch this on by default, under the GDPR.

> Anonymisation is pointless.

How's that?

I am currently doing some literature research for my PhD thesis, and all I have to say is: "fuck PDFs in the face"

> On a responsive website like GOV.UK, content and page elements shift around to suit the size of the user’s device and browser.

But they do not, on that very page. I resized the page up to full screen and then back again in a WWW browser and all that happened is that huge areas of whitespace opened and closed around the text, which remained word-wrapped in exactly the same places.

Yes, they do. The line width only increases up to a certain maximum because that is usability best practice; there is a maximum optimal line length for readability.

No, they did not. I performed the experiment myself and know what I saw, thank you. If this is a question of my "device" always being above some maximum line length, then clearly this is not suiting the size of the device. It is suiting some guidelines, not the user's device.

Try another browser. I'm telling you it is clearly responsive on mine, and the down-votes are others telling you the same. Now if you're just trying to ret-con the definition of 'responsive design' to grind some personal axe, go do it somewhere relevant.

The down-votes are showing you why votes are not equal to truth, because this effect is easily reproducible in Opera, Vivaldi, Chrome, Edge, and Firefox.

And it's clearly you who has some axe to grind. I merely point out that the behaviour of the very page itself is not as the article describes the operation of that WWW site. It does not behave as advertised, and does not change to suit the size of my device. The headline remains word-wrapped after the word "should", for example, and huge areas of whitespace open and close around it.

Actually, I just opened the site in Firefox's "responsive design mode" web-dev tool. The site is responsive up to about 1023 pixels wide, after which it no longer re-flows at all. It just becomes a 1023 pixel wide site inside of a much wider browser window.

I think the OP's point was that they wanted the "responsiveness" not to have an upper width limit. If (for example) they had a 4K monitor 3,840 pixels wide and maximized the browser window, they wanted the site layout to reflow and use all 3,840 pixels of width (minus window decorations) to lay out content. As it is now, on a monitor that wide the site will only ever use 1,023 pixels of width to lay itself out, leaving large margins of unused space on the sides.

[edit]: A bit more research shows this CSS declaration on a <div> with the class of "container": max-width: 964px;.

Turning off that one CSS declaration allows the site to widen to fit a maximized browser window.
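A simplified model of why the wrap points stop moving: the usable width follows the viewport only until it reaches the container's `max-width`. The sketch treats every character as a fixed, hypothetical 10 px wide and ignores padding and scrollbars, so it will not reproduce the exact 1023 px breakpoint observed above.

```python
import textwrap

MAX_CONTAINER_PX = 964  # the max-width found on the .container div
AVG_CHAR_PX = 10        # hypothetical average glyph width, for illustration only


def wrap_for_viewport(text: str, viewport_px: int) -> list[str]:
    """Wrap text as a capped-width container would: the usable width grows
    with the viewport until it hits max-width, then stays fixed."""
    usable_px = min(viewport_px, MAX_CONTAINER_PX)
    return textwrap.wrap(text, width=max(1, usable_px // AVG_CHAR_PX))
```

Past the cap, a wider window only adds margin, which matches the "huge areas of whitespace" described earlier.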


"If you need to convert files from one markup format into another, pandoc is your swiss-army knife. Pandoc can convert documents in (several dialects of) Markdown, reStructuredText, textile, HTML, DocBook, LaTeX, MediaWiki markup, TWiki markup, TikiWiki markup, Creole 1.0, Vimwiki markup, OPML, Emacs Org-Mode, Emacs Muse, txt2tags, Microsoft Word docx, LibreOffice ODT, EPUB, or Haddock markup to

HTML formats: XHTML, HTML5, and HTML slide shows using Slidy, reveal.js, Slideous, S5, or DZSlides

Word processor formats: Microsoft Word docx, OpenOffice/LibreOffice ODT, OpenDocument XML, Microsoft PowerPoint

Ebooks: EPUB version 2 or 3, FictionBook2

Documentation formats: DocBook version 4 or 5, TEI Simple, GNU TexInfo, Groff man, Groff ms, Haddock markup

Archival formats: JATS

Page layout formats: InDesign ICML

Outline formats: OPML

TeX formats: LaTeX, ConTeXt, LaTeX Beamer slides

PDF: via pdflatex, xelatex, lualatex, pdfroff, wkhtmltopdf, prince, or weasyprint

Lightweight markup formats: Markdown (including CommonMark and GitHub-flavored Markdown), reStructuredText, AsciiDoc, Emacs Org-Mode, Emacs Muse, Textile, txt2tags, MediaWiki markup, DokuWiki markup, TikiWiki markup, TWiki markup, Vimwiki markup, and ZimWiki markup

Custom formats: custom writers can be written in Lua."
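For the one-source, HTML-plus-PDF workflow mentioned earlier in the thread (the Markdown CV), a sketch that drives the pandoc CLI from Python. It assumes pandoc 2.x or later on `PATH` (for `--pdf-engine`) and, for the PDF step, WeasyPrint installed; the file names and helper names are hypothetical.

```python
import shutil
import subprocess


def pandoc_commands(source: str, stem: str, pdf_engine: str = "weasyprint") -> list[list[str]]:
    """Build pandoc invocations turning one Markdown source into HTML and PDF."""
    return [
        ["pandoc", source, "--standalone", "-o", f"{stem}.html"],
        ["pandoc", source, f"--pdf-engine={pdf_engine}", "-o", f"{stem}.pdf"],
    ]


def convert(source: str, stem: str) -> None:
    """Run both conversions, failing loudly if pandoc is missing or errors."""
    if shutil.which("pandoc") is None:
        raise RuntimeError("pandoc is not installed or not on PATH")
    for cmd in pandoc_commands(source, stem):
        subprocess.run(cmd, check=True)
```

Usage would be something like `convert("cv.md", "cv")`, producing `cv.html` and `cv.pdf` from the same source.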

You cannot use PDF as an input format in pandoc.

PDF is terrible because you cannot even parse the text easily or sensibly. That also means it is hard to diff two versions and see the differences.

"Tagged" PDF files allow text extraction. Tagged PDF is often required for accessibility: screen readers for the blind need to extract the text, and Tagged PDF makes that possible. PDF/A-1a requires Tagged PDF (the looser PDF/A-1b level does not), and PDF/A formats are required in some governments.
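Whether a PDF claims to be tagged can be checked heuristically: Tagged PDF marks the document catalog with `/MarkInfo << /Marked true >>` and a `/StructTreeRoot` entry. A naive byte-scanning sketch under that assumption; it can miss a catalog stored in a compressed object stream, and it says nothing about whether the tags are actually any good.

```python
import re


def looks_tagged(pdf_bytes: bytes) -> bool:
    """Heuristically report whether a PDF declares itself as Tagged PDF.

    Only uncompressed bytes are scanned, so a catalog inside a compressed
    object stream gives a false negative.
    """
    marked = re.search(rb"/MarkInfo\s*<<[^>]*/Marked\s+true", pdf_bytes) is not None
    has_struct_tree = b"/StructTreeRoot" in pdf_bytes
    return marked and has_struct_tree
```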

