No-cost access to the latest PDF standard: ISO 32000-2 (PDF 2.0) (pdfa.org)
187 points by MrRadar on April 5, 2023 | 68 comments



> ISO 32000-2 was also, however, the first and only core PDF specification published exclusively — and only for purchase — by ISO, inhibiting its use and adoption over the last five years.

Who would have thought that "international standards" that require payment don't work well? How many people are willing to spend a couple hundred CHFs to access the PDF specs? Also it seems insane to me that the PDF creator, Adobe, now has to pay ANSI to distribute their own spec for free.

For some reason you still need to "add to cart" the standards and they will ask you for your address and email. You don't need to provide real information for anything, not even email, they just redirect you to a page with download links.

Also the EULA is insane.

> You may install one copy of the Product on, and permit access to it by, a single computer owned, leased or otherwise controlled by you. In the event that computer becomes dysfunctional, such that you are unable to access the Product, you may transfer the Product to another computer, provided that the Product is removed from the computer from which it is transferred and the use of the Product on the replacement computer otherwise complies with the terms of this Agreement. Neither concurrent use on two or more computers nor use in a local area network or other network is permitted. You may print one copy of the Product for your personal use. You shall not merge, adapt, translate, modify, rent, lease, sell, sublicense, assign or otherwise transfer any of the Product, or remove any proprietary notice or label appearing on any of the Product. You may make one (1) copy the Product for backup purposes only.

They don't even allow you to use it on your 2 devices at the same time. Complete nonsense that I'm sure everyone will follow.


> Who would have thought that "international standards" that require payment don't work well?

Who says that they don't work well?

All of the manufactured goods that you own, and how they got transported (e.g., shipping containers, boats, fuel for boats, etc) are probably based on dozens of ISO (and ANSI, SAE, etc) standards that had to be paid for.

It's been the norm for about a century and helped build the modern world. See Engineering rules: global standard setting since 1880 by Yates and Murphy for a history. There are some good lectures by them online as well:

* https://www.hagley.org/research/history-hangout-0


They don’t work well for product development. I’ve worked in consumer and medical product development for 30+ years. I’ve worked as a consultant or contractor or directly for companies from startups to Fortune 50.

So many small companies or even teams at big companies don’t know about the standards that they should follow (ISO, ASME, UL etc) or if they do they don’t want to pay for them. Then there’s the complexity of tracking licenses and copies with electronic downloads, which I promise you no one does 100% right. There are third parties who will track this for companies but I’ve never managed to get anyone to use them.

It’s hard to even get everyone I’ve worked with to follow ASME Y14.5, which is the drawing and dimensioning standard for mechanical engineering (in the US).

Part of the challenge is that so many of these standards are broken up into 50 different parts that are purchased separately. Oh, you want to know how to dimension castings, that’s a separate document. Abbreviations to use on drawings, that’s separate. Drawing formats, that’s separate. Etc.

The result is that people collect old PDFs they “found” somewhere, and no one has a complete, up to date set of standards for any field. Even if someone at a company tries to get people to pay for standards, they maybe put them on a shared drive somewhere and they aren’t properly licensed for the team.

The results are that standards aren’t known or properly followed. Which is supposedly the purpose of standards orgs, but most of them use publications as a source of income, which defeats their purpose.

There are a few standards like USB, which are free to download. This should be universal.


> They don’t work well for product development.

I've used the ISO standards to develop smart card software. Before that I'd mostly consulted IETF standards.

The IETF standards are carefully crafted, clearly written by someone with experience in developing the thing they are documenting. That's probably because it's IETF policy to insist on working code before the standard is ratified. The approach seems to be making it possible to develop an interoperable version with that existing code just from the published standard. Since that's inevitably what you are trying to do it's a joy to work with.

The ISO standards on the other hand come across as a brain dump made to look organised by the addition of bullet points, numbered lists and a table of contents. They give no explanation of why they made the design decisions behind the standard, so you end up having to re-discover it all yourself. It takes forever.

Having to pay for the ISO standards just adds insult to injury. As I was developing something that was a one off (I later open sourced most of it), I could not justify the thousands ISO was demanding. Fortunately if you dig long and hard enough, you can find copies of most of it lying around on the web.


If they were free then the readers of such standards wouldn't be the customers of the standards organizations but the product.


Sure, but the current model is just ridiculous. 290 or so CHF* is probably less than the procurement bureaucratic overhead in a large corporation that bases an entire line of business on a given standard, yet is prohibitive for hobbyists and many open-source projects.

A more reasonable model these days would probably be a per-seat license for an entire segment of the ISO library. As it is, I don’t even know if a given standard would actually be useful to have for my work, companies need to decide which employee(s) to license a copy for etc…

* I’m surprised it’s not billed in special drawing rights.


I run a business that extracts data from PDFs, even I couldn’t justify paying 290 CHF to buy a copy of the standard


I found an error/omission in the PDF spec 20 years ago and it's still there despite my efforts to characterize and report it to the grand overlords of PDF. (There is no security implication, hence I am nobody.)

When Apple eventually changed their implementation to be consistent with Adobe's, I gave up all further attempts as a waste of time.

My errata document is available under a $5000/yr corporate subscription agreement to my personal standards and practices library. Sorry, I do not offer memberships to individuals.


> standards that had to be paid for.

I am extremely skeptical that everyone who has ever designed a glass bottle cap has paid for the glass packaging neck thread specifications. I know I sure as shit didn't the last time I had to make one. Come at me, bottlecap cartel.

Sure, there is a collective cost to develop and maintain standards, but making sure that they can be openly accessed afterwards should be considered as part of this cost burden.

I really couldn't care less if something is "the norm" when it's still obviously antithetical to its purpose. Saying "the norm" is always, always, always a red flag phrase that something is being held very far away from a natural state of balance.


When nobody tells lawyers no, this is what happens. This protects nobody other than some lawyers feeling fuzzy that they "mitigated risks" for their customer.


this is an old school organization, they don't really have any incentives to be user friendly. they manage a lot of standards, it's a stable gig, they are doing okay, so they did what old school organizations usually do, they got an EULA decade(s) ago and now it's just there and maybe gets updated from time to time. but no one even questions the need for it.

maybe if more things move online, if they face some serious competition, then they might start thinking about these as potential UX issues


What if they are not allowed to say no to lawyers?


Legal is a cost center, which means that they don't get to say no. Here is how that conversation goes:

Lawyer: You must include my unenforceable EULA.

Anyone-with-a-brain: lol, no. Your "contribution" to this project only serves to do the direct opposite of our stated intent... Actually, why are you involving yourself here - are you billing for this?

Lawyer scampering noises recede into the distance


I think what a lot of people don't understand is that a lawyer's job is to provide advice, not to make decisions.


Specifically, lawyers advise their clients on how to avoid liability.

The minimum liability comes from doing nothing, so the default advice is "Don't."


> The minimum liability comes from doing nothing

You must be working in a very unregulated industry if that’s true!


It still follows:

"How do I not get sued for my new medical device?"

"Don't sell it to anybody."


Poorly conceived advice apparently.


Only in the minds of HN engineers. Lawyers who tell their clients to do nothing rather than actually help try to figure out a path forward go out of business very quickly.

In-house counsel is a little different but again even at places like IBM the lawyers could not get away with blocking meaningful business


As a recovering lawyer, all I will say is that, it is extremely rare for any lawyer to have the power to do anything directly themselves of their own initiative, which is even more true in business than in other areas of law (lawyers can railroad their immigration, criminal defense, family law, etc clients a bit more because these clients have even less clue what is the right thing to do).

More likely in your scenario is:

Lawyer: You must include this EULA that this person/board/committee demands, and will fire both of us if you don't, because they asked me how to cover their asses in case of XYZ.

Anyone-with-a-brain: okay.

Lawyers themselves are certainly powerful, but they only really derive power from people with real power; states, boards of directors, judges, etc.


As a person who has been described as an "activist investor" by several angry businessmen, I've never seen a shareholder or board member demand anything like that. The only thing that I can think of that would loosely fit your hypothetical scenario would involve a non-human shareholder like a hedge fund making the demand, at the behest of their lawyers. I suppose another potential loose fit would be a fresh business school grad whining about the absence of a "moat" - and this sort of thing being the best that legal could offer in response. In either case both legal and management are to blame: legal for the bad counsel and management for not pushing back against something so stupid.


In a room sit three great men. A CEO, a lawyer, and an investor paying for both.


Yep, it's a giant scam...

Same for (a lot of) science..

You pay taxes to build universities, you pay taxes for professors' and other educators' paychecks, you pay taxes for research grants, and all that money from you is then used to 'discover' something, but you need to pay some private company to actually read the article.

It's slowly changing, but still, a lot of science, for "poor people", is only available on pirate sites.


Not just for poor people - I've had to pirate papers that I've co-authored.


Damn. I would be ultra mad. Good for you for pirating.


This works for ISO standards in many areas.

In many environments there are requirements to be standards-compliant. Buying the standard is a proof of compliance. (Turn it around: "you didn't even obtain the standard! How could your product be compliant?")

This is of course mostly relevant for engineering. Less so in IT, but even IT standards compliance is sometimes part of government contracts. At least for language compilers.

And yes, nobody will be able to verify the compliance, but commercial vendors document how well they comply with the standard and that limits the liability.


The specifications for the electronic driving license - recently agreed to by the EU as one of their core standards - are only available as (a number of) ISO specifications.

ISO 18013-5 and co.


ISO C and ISO C++ also technically cost money.


This is true, but that's not good either. Thankfully finding a final draft isn't hard for these, but we shouldn't have to do it in the first place.


One interesting thing I noticed is that the shortest conforming PDF 2.0 file clocks in at 254 bytes long (if you strip out the indentation):

  %PDF-2.0
  1 0 obj<</Count 0/Kids[]/Type/Pages>>endobj
  2 0 obj<</Pages 1 0 R/Type/Catalog>>endobj
  xref
  0 3
  0000000000 65535 f 
  0000000009 00000 n 
  0000000053 00000 n 
  trailer<</ID[(                )(                )]/Root 2 0 R/Size 3>>
  startxref
  96
  %%EOF

This is 41 bytes longer than the shortest conforming file in PDF 1.0 through PDF 1.7, which do not require the ID entry in the file trailer dictionary. (Apart from the ID entry and file header, the shortest file is otherwise identical.) The standard justifies its mandatory inclusion by saying that "some workflows" require PDF files to be "uniquely identified". I wonder what sort of workflows those might be.
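
The byte count and the cross-reference offsets in the listing above can be checked mechanically. Here is a minimal Python sketch, assuming single LF line endings, no indentation, and no newline after `%%EOF`. The function name `build_minimal_pdf` is made up for illustration, and the ID value is inserted as-is without any string escaping:

```python
def build_minimal_pdf(file_id: bytes = b" " * 16) -> bytes:
    """Assemble the minimal PDF 2.0 file from the listing, computing offsets.

    Assumption: file_id contains no parentheses or backslashes, since this
    sketch does not escape PDF literal strings.
    """
    header = b"%PDF-2.0\n"
    objects = [
        b"1 0 obj<</Count 0/Kids[]/Type/Pages>>endobj\n",
        b"2 0 obj<</Pages 1 0 R/Type/Catalog>>endobj\n",
    ]

    body = header
    offsets = []
    for obj in objects:
        offsets.append(len(body))  # byte offset where this object starts
        body += obj

    xref_offset = len(body)  # the xref table starts right after the objects
    size = len(objects) + 1  # +1 for the free entry of object 0

    xref = b"xref\n0 %d\n" % size
    xref += b"0000000000 65535 f \n"  # each xref entry is exactly 20 bytes
    for off in offsets:
        xref += b"%010d 00000 n \n" % off

    trailer = b"trailer<</ID[(%b)(%b)]/Root 2 0 R/Size %d>>\n" % (
        file_id, file_id, size)
    tail = b"startxref\n%d\n%%%%EOF" % xref_offset
    return body + xref + trailer + tail


pdf = build_minimal_pdf()
print(len(pdf))  # total size in bytes
```

Under those assumptions the result is 254 bytes with the offsets 9, 53 and 96 from the listing, and the `/ID[(…)(…)]` entry alone accounts for the 41 extra bytes (3 + 1 + 18 + 18 + 1).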


DRM fingerprinting


At least the standard doesn't seem to support that: it suggests creating the ID by hashing the current time, file path, and file size. Besides, there are a billion other side channels that can be used to digitally watermark PDFs.
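
As a rough illustration of that suggestion, here is a sketch in Python. The choice of MD5, the way the inputs are serialized, and the function name `make_file_id` are all assumptions made for the example, not something the comment above specifies:

```python
import hashlib
import time
from typing import Optional


def make_file_id(path: str, size: int, now: Optional[float] = None) -> bytes:
    """Derive a 16-byte identifier from the time, file path, and file size.

    Hypothetical helper: the hash (MD5) and the input encoding are
    assumptions chosen for illustration.
    """
    if now is None:
        now = time.time()
    h = hashlib.md5()
    h.update(repr(now).encode("ascii"))
    h.update(path.encode("utf-8"))
    h.update(str(size).encode("ascii"))
    return h.digest()


file_id = make_file_id("/tmp/example.pdf", 254, now=0.0)
print(file_id.hex())
```

Nothing in these inputs ties the ID to a particular recipient, which fits the point that it is a poor fit for fingerprinting.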

Anyway, I've since found an explanation in the PDF 1.7 spec. PDF files can include "file specifications" (file paths or URLs) to refer to other files. An ID can be added to a file specification for an external PDF, which the PDF reader can validate against the ID contained in the located file's trailer dictionary, to ensure that the external PDF is the expected one.

I guess "some workflows" must have gotten upset when trying to generate a link to a PDF file with no ID, so they finally required its presence in the spec.
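
To make the check concrete, here is a toy Python sketch of what such a validation could look like. It is deliberately naive: it finds the first `/ID` array with a regex instead of parsing the file, handles only hex strings and unescaped literal strings, and compares both array elements, which is an assumption of this sketch (a real reader may apply a different rule). The names `read_trailer_id` and `id_matches` are hypothetical:

```python
import re
from typing import Optional, Tuple

# Hex-string form: /ID[<...><...>]
_HEX_ID = re.compile(rb"/ID\s*\[\s*<([0-9A-Fa-f]*)>\s*<([0-9A-Fa-f]*)>\s*\]")
# Literal-string form without escapes or nested parentheses: /ID[(...)(...)]
_LIT_ID = re.compile(rb"/ID\s*\[\s*\(([^()\\]*)\)\s*\(([^()\\]*)\)\s*\]")


def _decode_hex(digits: bytes) -> bytes:
    s = digits.decode("ascii")
    if len(s) % 2:
        s += "0"  # a missing final hex digit is treated as 0
    return bytes.fromhex(s)


def read_trailer_id(pdf: bytes) -> Optional[Tuple[bytes, bytes]]:
    """Return the two ID byte strings found in the file, or None."""
    m = _HEX_ID.search(pdf)
    if m:
        return _decode_hex(m.group(1)), _decode_hex(m.group(2))
    m = _LIT_ID.search(pdf)
    if m:
        return m.group(1), m.group(2)
    return None


def id_matches(pdf: bytes, expected: Tuple[bytes, bytes]) -> bool:
    """Check whether the located file carries the ID the link expects."""
    return read_trailer_id(pdf) == expected


sample = b"trailer<</ID[(aaaa)(bbbb)]/Root 2 0 R/Size 3>>"
print(id_matches(sample, (b"aaaa", b"bbbb")))
```

A file with no ID at all makes `read_trailer_id` return `None`, so the link cannot be validated, which is presumably the case that upset "some workflows".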


It's worth noting that while https://www.pdfa.org/resource/iso-32000-pdf/ mentions the 2011 patent release for PDF 1.7 (https://www.adobe.com/pdf/pdfs/ISO32000-1PublicPatentLicense...) it makes no mention of such a release for PDF 2.0.

Additionally, https://www.pdfa-inc.org/eula/ seems to indicate that you may only use the downloaded files on one computer - and does not establish a license agreement between the user and any third party IP holders.

No-cost access does not mean free-as-in-software. Caveat emptor.


Formally speaking, I'm pretty sure you can satisfy the terms of the EULA by "buying" another copy of the files (another instance of the "Product") for every computer you want to store it on. IANAL though. (And regardless, it definitely doesn't meet the principles of freedom as in free software.)


Nothing encumbered by fees or patents should be accepted as a "standard." Period.


The problem is that work on standards costs a lot of money. Somehow we need to pay for that. The solution could be the state.


The fun is: The fees for obtaining the standard don't really go to the ones who create the standard.

Standards are created by volunteers, typically sponsored by their employers to go to standards meetings, who also pay (depending on national body) for being members.


> The fun is: The fees for obtaining the standard don't really go to the ones who create the standard.

The standard organization itself has and needs some infrastructure to exist, and that has to be paid for.

While the folks getting together creating the standards are (often) sponsored by their employer, the employer thinks this is a net benefit because they get a say in setting it, after which they can turn around and sell more product(s) based on the standard.

The standards organization themselves do not have such an option, but they still have bills to pay.


What services do standard organizations provide? Apart from scheduling meetings and hiding the standards behind a paywall?


I guess you can take a look there: https://www.iso.org/what-we-do.html


But what is a standard? Either one company or multiple companies wanting to be interoperable with their software decide on a way to save data (communicate, whatever), so more people buy their software and use their products.

Adobe has an interest in more and more people using PDFs because more people will then buy their products.. if they're afraid of the costs, don't publish the standard and risk people switching to other standards.


Like what? And if they do, is it really sustainable? Are these standard organisations really funded by the papers they sell?


ISO is funded by sales and membership fees from member states. It's been around for 75 years. If they're not sustainable, I don't know what organizations are.


There should be organizations where people can get paid to just chill and invent stuff.


Heads-up: apparently some kind of signup is required to download anything from this page. So if you heartily dislike ISOs not because of cost, but because of how insane it is to make standards difficult to actually access at all - nothing has changed.


Unsure how long it'll be valid, but to save you having to enter fake details, here you go:

https://www.pdfa-inc.org/?download_file=2009&order=wc_order_...



The discussion around this being a non-free standard encumbered with license restrictions makes me wonder: is there an alternative for pixel-perfect rich document rendering? HTML needs sandboxing to be safe, rtf is barebones, and .docx has external dependencies.


If you're describing an image format where all subobjects are bitmaps and not vectors, and all math is done with integers, then pixel perfection is possible.

Once you introduce any mildly interesting vector or typesetting feature, you will not get pixel perfection. Examples:

* You want to draw a circle. Antialiasing is desirable because jaggies look really ugly. Do you want to prescribe the exact antialiasing algorithm? What about other ones that are faster or more accurate? Which AA convolution kernel are you using - box, Gaussian, sinc?

* Which approximation algorithm will you use to draw cubic Bézier curves? I don't think there is a closed-form solution for them.

* You support affine transformations. Define a 10×10 square. Scale it by 1/100. Scale it by 100. What if, due to floating-point rounding errors, your square is now 9.999×9.999 and now renders to 9×9 pixels without antialiasing?

* You support automatic text wrapping. One renderer uses float32, looks at how wide each word is, and decides to break at some point in the sentence. Another renderer uses float64, looks at how wide each word is, and decides to break at a different point in the sentence.
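
The text-wrapping case is easy to reproduce. Here is a small Python sketch with made-up numbers: ten "words" of width 0.1 on a line of width 1.0, with float32 simulated by rounding through `struct` after every operation. The helper names `to_f32` and `words_on_first_line` are hypothetical, and the greedy break rule is an assumption rather than any particular renderer's algorithm:

```python
import struct


def to_f32(x: float) -> float:
    """Round a Python float (float64) to the nearest float32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def words_on_first_line(widths, line_width, rounder=float):
    """Greedy wrap: count the words that fit before the running total
    exceeds line_width. `rounder` is applied after every operation to
    emulate the renderer's working precision."""
    total = rounder(0.0)
    count = 0
    for w in widths:
        total = rounder(total + rounder(w))
        if total > line_width:
            break
        count += 1
    return count


widths = [0.1] * 10  # hypothetical word widths, spacing included
n64 = words_on_first_line(widths, 1.0)          # float64 accumulator
n32 = words_on_first_line(widths, 1.0, to_f32)  # float32 accumulator
print(n64, n32)
```

With these inputs the float64 total ends just below 1.0 and all ten words fit, while the float32 total ends just above 1.0 and the line breaks after nine. Neither renderer is wrong on its own terms; they just disagree.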


I've only had tangential contact with it, but Wikipedia claims (https://en.wikipedia.org/wiki/DjVu#Format_licensing) that DjVu is open, albeit they do talk about patents and I'm not qualified to say where that falls on the non-free spectrum

I downloaded their example file and it's entirely binary, unlike PDF which is just pseudo-binary, but Evince did open it and it seems unlike PDF it's entirely raster based and would require a separate OCR layer on top of the text to make it eligible for copy-paste, if that's one of the goals


> it seems unlike PDF it's entirely raster based and would require a separate OCR layer on top of the text to make it eligible for copy-paste, if that's one of the goals

Not really all that different from PDF:

"Like PDF, DjVu can contain an OCR text layer, making it easy to perform copy and paste and text search operations." https://en.wikipedia.org/wiki/DjVu

DjVu however really does seem to be biased toward scanned-origin documents, not digitally produced ones.


> Not really all that different from PDF

That is not correct, PDF supports text spans natively; perhaps you're thinking of scanner software that merely uses PDF as a convenient packaging for their JPEGs?

I cannot defend DjVu as I've only had tangential contact with it, and for sure have never tried to author any such file. I was just raising awareness that there are competing standards that appear to be libre and are designed for pixel perfect output


The obvious example would be PostScript. After all, PDF grew out of a desire to have a postscript variant better suited for digital documents instead of just printing. Illustrator's file format (.ai) was also just postscript up to version 8 or so.


I worked on PDF for years, and yearned for a better way to render pixel perfect. I think that handling of lines, shapes and images could be drastically simpler, but, alas, handling Glyphs (characters/fonts) will always be a nightmare of complexity.


PDF isn’t pixel-perfect (rounding and precision isn’t exactly specified). For literal pixel-perfectness, use PNG.


Pixel-perfect is an antipattern; documents should be scalable, reflowable, restylable.


So I'm supposed to have a PDF reader to implement a PDF reader? What's the bootstrapping strategy here? tsk.


> What's the bootstrapping strategy here?

You open it in Adobe™ Acrobat™ Reader, of course. The first version of the PDF Reference Manual was published to accompany the first version of Acrobat™ [0].

[0] https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandard...


Half-joking: does the standard explain why some documents I open in Acrobat Reader ask me if I want to save, when I only opened the file (and made no intentional edits)??


PDF includes JavaScript as an extension, although I don't know whether JavaScript actions can be triggered without user input.

Also many PDF viewers have little to no support for JavaScript in PDF.


This can happen for PDFs that are broken in some way and that Acrobat auto-repairs, and thus regards as modified.

And no, the standard doesn’t concern itself with that particular behavior. Also, many PDFs “in the wild” do not fully conform to the standard, because Acrobat Reader is so lenient.


No cost... except a ton of your personal info.


You can lie on the Internet.


The way pdf has become a standard in everyday life is truly disgusting.

One is expected nowadays to have a pdf reader installed on one's device. This is a problem in itself, because most of the "free" software does little beyond displaying the pdf and lacks most of the features one would expect in order to make use of the standard. Now I don't have to sign pdfs daily, but when you start a new job, for instance, and have to sign a pdf, it becomes troublesome.

As a developer the ecosystem is pretty terrible too, in order to utilize said functions such as signing you'd need to fork out a huge chunk for a subscription to some library. Don't get me wrong, paying for software I'm all for but forcing your users to pay for some third-party software to interact with your documents is not fine.

To me, this is simply disgusting that you'd have to buy into this "standard" in every aspect, both as a user and as a developer.

We've come a long way over the years of having more options to both users and developers but honestly, it's simply not enough. And pdf as a format being very complex it's more than understandable that it would be a challenge to make it more available.


Or just write plaintext documents IETF style.


IETF RFCs haven’t been plain text for quite some years. RFC 2629 (June 1999) defined the first XML vocabulary, and RFC 7990 (December 2016) completed the process of declaring an XML format the canonical source, rather than plain text. To learn more, start at https://www.rfc-editor.org/rse/format-faq/.


That's just for ease of translation into other formats, right? No human is reading/implementing the XML.


I don’t know altogether.

I know it’s common to author in something else and then convert into the canonical XML form, but I’d be surprised if it wasn’t also common to author in the XML format. It’s not an onerous format.



