
In Defense of COS, or Why I Love JSON and Hate XML - hoov
http://jimpravetz.com/blog/2012/12/in-defense-of-cos/
======
jimfuller
XML, being in the markup family tree, has a lot more history than simple JSON
encoding ... measuring its usefulness on a corner case has always been, well
... boring. I am glad people are using JSON to sling simple data across the
web versus markup.

Come back to me when you are using json to encode an entire document ... you
might look at XML a bit differently.

tl;dr use the right tool for the right job.

~~~
billysilly
_"Come back to me when you are using json to encode an entire document"_

How is that relevant to the article, which is about COS? In other words, what
does _COS_ lack that XML has?

~~~
patrickg
* Infrastructure (schema support - DTD, Schema, RelaxNG; transformation - XSLT)

* No obvious document format (What encoding are the strings? How do you escape characters?)

* Only used to describe predefined object types (booleans, strings, arrays, dictionaries)

* Hard to ensure the integrity of the data without interpreting it from the interpreter itself (no external validation)
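On the last point, a minimal sketch of what that means in practice, assuming Python
and a made-up record shape: the JSON parser accepts anything structurally valid, so
every integrity rule becomes hand-written application code instead of an external
schema.

```python
import json

def validate_order(obj):
    """Hand-rolled integrity check -- the kind of code an external
    schema language (DTD/RelaxNG/Schematron) would let you avoid."""
    if not isinstance(obj, dict):
        raise ValueError("expected an object")
    if not isinstance(obj.get("id"), int):
        raise ValueError("'id' must be an integer")
    if not isinstance(obj.get("items"), list):
        raise ValueError("'items' must be an array")
    return obj

doc = json.loads('{"id": 7, "items": ["a", "b"]}')
validate_order(doc)                              # passes

bad = json.loads('{"id": "7", "items": ["a"]}')  # parses fine...
try:
    validate_order(bad)                          # ...but is semantic garbage
except ValueError as e:
    print(e)
```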

~~~
rauar
You know this can be layered on top when there is demand for it? I prefer a
non-bloated protocol format over XML any time. How often does the DTD not
matter at all? How often is the encoding fixed by convention? ...

~~~
patrickg
I use schemas (in the form of RelaxNG most of the time) almost every time I
deal with XML. Together with Schematron you can write very complex lint-like
scripts to verify your data. I actually program in XML with a self-created
programming language (formulated in XML). That, together with RelaxNG and a
good XML editor, makes writing XML fun and absolutely (syntax-)error free.

------
randomfool
Dings XML for crappy commenting syntax then gives JSON a pass for not
supporting comments at all.

I love JSON, but it does have its issues.

~~~
wvenable
JSON not having comments is a benefit when used for its primary purpose:
information exchange. JSON is not a great format for configuration files or
static documents, even though it's increasingly used for both.

~~~
ishbits
YAML is better suited for configuration files.

~~~
joe_the_user
Yaml is not good for configuration files because it is not easily human-
editable. It seems "easy" but meaningful whitespace is a cluster-f _ck_.

Edit: I have been looking for reasonable configuration file formats for a
while. JSON actually has pretty bad human-readability at any scale because of
its quoted key-values. YAML is easily _readable_, but when a user tries to
change anything, things go to hell. The humble ini-file has won so far. It's
limited and not fully standardized, but it is human-readable and
human-writable. I'd love to see something better, but human
readability/writability is going to trump all sorts of cleverness.

Edit2: From <http://en.wikipedia.org/wiki/YAML> " _The specific number of
spaces in the indentation is unimportant as long as parallel elements have the
same left justification and the hierarchically nested elements are indented
further._ "

Yeah, a user of your software is going to be able to understand when they mess
up on that rule, riiiight. Screw Yaml.
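To make the ini-file point concrete, here's a minimal sketch using only Python's
stdlib (the section and key names are invented): the INI version needs no quoting
and no brace matching, at the cost of everything parsing as a string.

```python
import configparser
import json

# The same settings as INI text: no quoting, no braces or commas to
# balance, and leading whitespace carries no meaning.
ini_text = """
[server]
host = example.com
port = 8080
"""

# And as JSON: every key and string quoted, braces to keep matched.
json_text = '{"server": {"host": "example.com", "port": 8080}}'

cp = configparser.ConfigParser()
cp.read_string(ini_text)
print(cp["server"]["host"])          # example.com

cfg = json.loads(json_text)
print(cfg["server"]["port"])         # 8080

# The caveat the INI format glosses over: values are all strings,
# so the application converts types itself.
print(cp["server"].getint("port"))   # 8080
```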

~~~
jscn

      It seems "easy" but meaningful whitespace is a cluster-fck.
    

Tell that to everyone using Python. If your editor can't handle meaningful
whitespace transparently, it sounds like you need a better editor.

~~~
joe_the_user
_you need a better editor_

I think you misunderstand the concept of "configuration file". These are not
intended for programmers but for users of the program you write.

While complaints about Python's meaningful whitespace are still common,
programmers can get over that by, indeed, getting a better editor or otherwise
wrapping their heads around the situation.

But end-users can't be expected to have this kind of sophistication.
Explaining to someone why things they can't see have made their program
unusable is no fun.

------
pom
Like many, I feel that the poster is throwing the baby out with the bathwater.
Yes, SOAP and XML Schema are horrible. Don't use them, then. Yes, XML is
verbose, but that's exactly why Relax NG has a compact syntax. Use that if you
don't like the XML syntax. Yes, data can be expressed as attributes or
elements, but there are simple rules of thumb to decide between one or the
other: if your data can have structure, or you may want to have multiple
instances of the same thing, it's generally better to use an element;
otherwise an attribute should do the trick.
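A minimal sketch of that rule of thumb, using Python's stdlib ElementTree and an
invented `book` vocabulary: the scalar fact goes in an attribute, while the
repeatable, structured data gets elements.

```python
import xml.etree.ElementTree as ET

# A scalar fact ("id") as an attribute; repeatable, structured data
# ("author", which may occur many times and has internal structure)
# as elements.
doc = ET.fromstring("""
<book id="42">
  <author><name>Ann</name></author>
  <author><name>Bob</name></author>
</book>
""")

print(doc.get("id"))                                        # 42
print([a.findtext("name") for a in doc.findall("author")])  # ['Ann', 'Bob']
```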

There are also errors and approximations: XML did not introduce the bracket
syntax, it inherited it from SGML. A DTD is not a schema (and if you want to
criticize XML, you should point out that it should _not_ have inherited DTDs
from SGML.) He doesn't even mention the worst part about comments, which is
that you can't have -- inside a comment (very annoying when commenting a large
block of data...)
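That restriction is a well-formedness error, not just a style rule, so a conforming
parser rejects the whole document. Python's stdlib parser demonstrates it:

```python
import xml.etree.ElementTree as ET

# A normal comment parses fine.
ET.fromstring("<a><!-- a normal comment --></a>")

try:
    # '--' inside a comment is a well-formedness error per the XML
    # spec, which is what makes commenting out a large block that
    # itself contains '--' so annoying.
    ET.fromstring("<a><!-- x -- y --></a>")
except ET.ParseError as e:
    print("rejected:", e)
```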

XML has many beautiful applications, like SVG, SMIL (which never took off but
keeps getting rediscovered/reimplemented in an inconsistent manner [full
disclosure: I participated in the SMIL and CDF W3C working groups]), XSLT, &c.
XHTML was not perfect by a long stretch but the new HTML5 syntax is much, much
worse.

Use XML, JSON, and whatever is necessary to get the job done. For the project
that I am working on right now, I am using XML for serializing Web app
descriptions; in _this_ situation, XML is clearly better than JSON.

------
andrewcooke
the main arguments are about syntax. there's no mention of namespaces and
schema are dismissed because the author didn't use them. no mention of tools
for automated processing.

this is not a very good article, in all honesty. he doesn't like the syntax,
but doesn't seem to consider that different technologies can be suited to
different problems, or that he simply hasn't experienced the kind of uses
where xml works well.

~~~
w0utert
Agreed. I like JSON and I think it's the better choice for serialization and
data exchange in many cases where XML is used today, but pretending it's the
magic alternative to anything you could also do with XML is crazy, and it only
shows a very limited understanding of what XML actually is.

Transformations, schema checking, xpath queries, well defined linking and
embedding, a host of tools that support applications of XML on every platform,
the list goes on and on. JSON has none of this, it's basically exactly what
its name says it is ('JavaScript object notation') and not much more. It
covers maybe 10% of what XML and all its related tools and standards are. The
current trend seems to be that XML is 'old technology' and all the cool kids
use JSON, and it bugs me. You can't possibly sincerely say JSON is 'just like
XML but less verbose' unless you simply don't really know much about XML at
all.
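On the XPath point: even Python's stdlib ships a (limited) XPath subset, no extra
tooling required. A small sketch with an invented catalog:

```python
import xml.etree.ElementTree as ET

catalog = ET.fromstring("""
<catalog>
  <book id="1"><title>SGML Handbook</title></book>
  <book id="2"><title>XML in a Nutshell</title></book>
</catalog>
""")

# Attribute predicates and path steps work out of the box.
print(catalog.find('.//book[@id="2"]/title').text)        # XML in a Nutshell
print([b.get("id") for b in catalog.findall(".//book")])  # ['1', '2']
```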

~~~
ams6110
I must be one of the few who thought that XSL was a great way to build visual
interfaces to data, in browsers that support XSLT (this was an area where IE
had a very early advantage over other browsers; not sure what the state of
things is today). Sorting, filtering, drilling down... all trivially easy and
declarative with XSLT, once you grokked XPath. Yeah, it was verbose, but
otherwise great.

There are some similar approaches for JSON data now, I think... but none are
standards-based AFAIK, meaning they are mostly single-vendor or project
solutions.

------
DougWebb
XML can do the same trick with indirect objects as COS, using ID and IDREF
type attributes. A number of years ago I was dealing with the archival and
retrieval/display of enormous medical textbooks in XML, and I couldn't
efficiently pull out arbitrary elements (chapters, sections, paragraphs, etc)
because of the hierarchical nature of the XML document structure. I had to
parse the whole thing to use an XPATH to get the element I needed, and that
took too long. (My parser could handle 3MB/sec, and some of these books were
over 100MB.)

The solution I came up with was a program that transformed the documents by
flattening them into a relatively small hierarchical structure that
represented the volume/chapter/section headings of the book, and a flat list
of elements that were small enough to parse quickly. I inserted ID and IDREF
elements to link these two parts together, and created an external index of
the file offsets and lengths of each element in the flattened list. That let
me use simple file I/O to access any element by ID, pull it out of the larger
file, and only then start the parsing engine.

It was like the article mentions: my XML file, together with the external
index (in a simple Unix DBFile file) was a miniature NoSQL database of the
textbook.

BTW, this predated the "NoSQL" label, and was developed after testing of
Oracle and the XML databases of the day completely failed to meet performance
and scalability requirements. My solution has an infinite capacity to scale;
its performance is not impacted by the number of books in the system nor their
size. The retrieval and display time of a single chapter or any subelement is
a constant proportional only to the size of that chapter, and is not affected
by the overall size of the collection. All of the other solutions we looked at
got slower as the number of books increased and as the size of the books
increased. (I mention this only to head-off any comments about reinventing the
wheel.)
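A toy sketch of the offset-index idea in Python (the element names are invented,
and a real system would persist the index to disk, as the DB file above does):
once you know each element's byte offset and length, retrieval is plain file I/O
plus a parse of only the element you wanted.

```python
import io
import xml.etree.ElementTree as ET

# A flat list of small elements, one per line, each carrying an ID.
flat = (b'<sec id="s1">Intro</sec>\n'
        b'<sec id="s2">Anatomy</sec>\n'
        b'<sec id="s3">Treatment</sec>\n')

# Build the external index once: id -> (byte offset, length).
index = {}
offset = 0
for line in flat.splitlines(keepends=True):
    elem_id = line.split(b'id="')[1].split(b'"')[0].decode()
    index[elem_id] = (offset, len(line))
    offset += len(line)

def fetch(storage, elem_id):
    """Seek straight to one element and parse only that --
    no need to parse the whole document first."""
    off, length = index[elem_id]
    storage.seek(off)
    return ET.fromstring(storage.read(length))

storage = io.BytesIO(flat)           # stands in for the big file on disk
print(fetch(storage, "s2").text)     # Anatomy
```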

~~~
thristian
ID/IDREF isn't exactly the same as indirect objects in COS. When you're
parsing an XML document, ID/IDREF are just attributes like any other that get
added to the DOM, and then application code can dereference them later if it
wants. In COS, indirect objects are part of the serialisation format and the
parser _needs_ to understand them and dereference them in order to be able to
parse the file.
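A quick illustration of that difference, assuming Python's stdlib ElementTree and
an invented vocabulary: the parser hands back the linking attributes as ordinary
strings, and it's the application that builds and consults the id map.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""
<doc>
  <para>See <ref target="fig1"/> for details.</para>
  <figure id="fig1">Anatomy diagram</figure>
</doc>
""")

# The parser has no idea these attributes are related; application
# code builds the id map and dereferences by itself, after parsing.
by_id = {el.get("id"): el for el in doc.iter() if el.get("id")}
ref = doc.find(".//ref")
print(by_id[ref.get("target")].text)   # Anatomy diagram
```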

For example, the COS "stream" object type is serialised as a settings
dictionary, followed by the 'stream' keyword, the stream data, and the
'endstream' keyword. But what happens when the stream happens to contain the
bytes 'endstream'? Well, the settings dictionary has a "Length" key that tells
you how long the stream is, without you having to scan for the 'endstream'
keyword. However, because most streams are compressed and compression makes it
difficult to guess in advance exactly what the compressed size will be, COS
allows you to make the Length key an indirect reference to an integer defined
_later_ in the file. Like so:

    
    
        <<
        % Here, the value associated with the Length key is
        % indirect object 42 revision 0.
        /Length 42 0 R
        /Filter /FlateDecode
        >>
        stream
        ...compressed data goes here...
        endstream
        % And now we define object 42 revision 0
        42 0 obj
        12345
        endobj
    

So, the parser needs to know the length of the compressed stream in order to
parse it, but it needs to have parsed the compressed stream in order to get to
the length data. The way out of this catch-22 is the object index at the end
of the file, which gives you the index and location of each indirect object. A
COS parser needs to start at the end of the file and load all the indirect
objects, cache them, then go back to the beginning and stitch them into the
deserialised object graph as they're referenced.
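The shape of that two-pass scheme, reduced to a toy Python sketch (an illustration
of the idea only, not a real COS/PDF parser): pass one loads the indirect objects
via the index at the end of the file; pass two resolves references like "42 0 R"
against that table while stitching the object graph together.

```python
# Pass 1: object number -> parsed value, built by following the
# index at the end of the file to each indirect object.
objects = {42: 12345}

# The stream's settings dictionary as parsed, with the reference
# left as a placeholder rather than a value.
stream_dict = {
    "Length": ("ref", 42),   # the serialised form said "42 0 R"
    "Filter": "FlateDecode",
}

def resolve(value):
    """Dereference ('ref', n) placeholders against the object table."""
    if isinstance(value, tuple) and value[0] == "ref":
        return objects[value[1]]
    return value

# Pass 2: only now can the parser know how many bytes of stream
# data to read before expecting the 'endstream' keyword.
length = resolve(stream_dict["Length"])
print(length)    # 12345
```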

------
xrt
How is COS separate from PostScript? It looks like a straight PS dictionary.

~~~
thristian
Well, PostScript probably doesn't have the indirect-object-reference index at
the end, but I don't really know much about PostScript so I couldn't say for
sure.

I wonder if PostScript is to COS as JavaScript is to JSON.

------
armored_mammal
And here I thought everyone hated XML...

~~~
georgemcbay
I don't hate XML, I just think it tends to be misused.

I (still) think XML is fine for configuration files and things like markup of
UI/UX components (See: Flex, WPF, Android layouts, etc). In this context,
where you may be modelling the relationship between many parents with many
children nested fairly deeply, I find XML _much_ more readable and easy to
edit and reason about than JSON.

Where XML makes little sense is as an over-the-wire network protocol format or
in any context where its "human readable"/"human editable" properties aren't
as important. In those contexts, I think JSON is far better suited though
neither is completely ideal.

~~~
DougWebb
I don't think XML is good at all for config files or UI/UX component markup,
but for me that's mostly because of the end tags. Other formats, such as YAML,
are able to describe hierarchical data without the verbosity of end tags, and
I think that makes the files much more readable.

In fact, I think it's fair to say that WPF and all of Microsoft's config and
resource file formats inherit their use of XML from mimicry of the Java
ecosystem when C# and .NET were being developed as an alternative stack to
Java. Android's inheritance is more direct, since it's based on the Java
ecosystem. And as we all know, the crazy hype phase that XML went through was
hand-in-hand with the crazy hype phase that Java went through. If Java hadn't
happened, XML probably still would have as an improvement over SGML, but it
probably would have remained the simple document markup language that it's
pretty good for rather than becoming the "XML Everywhere" beast.

------
ucee054
[http://linuxmafia.com/faq/Devtools/parable-of-the-
languages....](http://linuxmafia.com/faq/Devtools/parable-of-the-
languages.html)

~~~
jimfuller
I love how some technologies cause some folks to get hostile ... HTML and XML
are very close cousins in the same markup family.

On the basis of correct usage (e.g. not insane application of XML to corner
cases) I don't fully understand the logic of people who profess love for HTML
and hatred of XML; they are in the same family. The only difference is that
HTML (now) is a controlled vocabulary. Eventually folks will want to add
markup of their own design without a committee discussing the merits.

Perhaps next decade ;)

