
Is the Unix shell ready for XML? (2006) - networked
http://www.lbreyer.com/unix_xml-1.html
======
_paulc
The Juniper libxo [1] library is an interesting approach which preserves the
traditional text output but also gives an option to output structured data:

    
    
      The libxo library allows an application to generate text,
      XML, JSON, and HTML output using a common set of function
      calls. The application decides at run time which output
      style should be produced.
    

There is ongoing work to convert FreeBSD utilities to support libxo [2], and
there is a long thread about it on freebsd-arch [3].

[1] [https://juniper.github.io/libxo/libxo-manual.html](https://juniper.github.io/libxo/libxo-manual.html)

[2] [https://wiki.freebsd.org/LibXo](https://wiki.freebsd.org/LibXo)

[3] [https://lists.freebsd.org/pipermail/freebsd-arch/2014-July/0...](https://lists.freebsd.org/pipermail/freebsd-arch/2014-July/015633.html)
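The core idea is that one set of emit calls produces several output styles, chosen at run time. A minimal Python sketch of that pattern follows. It is not libxo's API (libxo is a C library). The `emit_record` function, its style names and the HTML class names are all hypothetical.

```python
import json
import sys
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical sketch of the libxo idea: the caller describes one record and
# the output style is picked at run time. This is NOT libxo's real API.

def emit_record(name: str, fields: dict, style: str) -> str:
    """Render one record (fields kept in insertion order) in the chosen style."""
    if style == "text":
        # Traditional whitespace-separated line for humans and awk.
        return " ".join(str(v) for v in fields.values())
    if style == "json":
        return json.dumps({name: fields})
    if style == "xml":
        root = ET.Element(name)
        for key, value in fields.items():
            ET.SubElement(root, key).text = str(value)
        return ET.tostring(root, encoding="unicode")
    if style == "html":
        cells = "".join(
            f'<div class="{escape(k)}">{escape(str(v))}</div>' for k, v in fields.items()
        )
        return f'<div class="line">{cells}</div>'
    raise ValueError(f"unknown style: {style}")

if __name__ == "__main__":
    # e.g. `python emit.py json`; defaults to plain text like a classic utility.
    style = sys.argv[1] if len(sys.argv) > 1 else "text"
    print(emit_record("file", {"name": "motd", "size": 1024}, style))
```

Existing scripts keep receiving plain text by default. Callers that want structure can ask for JSON or XML without anyone writing a second code path for the data itself.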

~~~
xenophonf
I've just started working with the new AWS CLI, and while in the long term I
will probably switch to using boto directly from Python, for now I process the
CLI's JSON output in bash scripts. Using structured data this way is really
quite interesting. Because it isn't sensitive to whitespace or newline
mangling, the JSON output can safely be stored in a shell variable à la Perl
(e.g., `VAR=$(aws ... --output json)`). Rather than hacking up some
rudimentary parsing with sed/awk/etc., I can pipe it to an existing parser
(e.g., jshon), which extracts the data I want with minimal effort. I think
that if I had an objection to XML, it wouldn't be because it's structured but
because it's so verbose compared to simpler serialization formats like JSON or
YAML. Then again, I don't know why we even bother with those when we're just a
tiny bit of syntax away from sexps. :)
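The sketch below shows the "hand it to a real parser instead of sed/awk" approach in Python, with a check that the whitespace mangling doesn't matter. `SAMPLE` is made-up data loosely shaped like `aws ec2 describe-instances --output json`. The real output has many more fields, so treat the key names as illustrative.

```python
import json

# Made-up sample loosely shaped like `aws ec2 describe-instances --output json`.
SAMPLE = """
{
  "Reservations": [
    {"Instances": [
      {"InstanceId": "i-0aaa", "State": {"Name": "running"}},
      {"InstanceId": "i-0bbb", "State": {"Name": "stopped"}}
    ]}
  ]
}
"""

def running_instance_ids(json_text: str) -> list[str]:
    """Return the IDs of instances whose state name is 'running'."""
    data = json.loads(json_text)
    return [
        inst["InstanceId"]
        for res in data.get("Reservations", [])
        for inst in res.get("Instances", [])
        if inst.get("State", {}).get("Name") == "running"
    ]

# Collapsing all whitespace and newlines (as careless quoting might) changes nothing.
mangled = " ".join(SAMPLE.split())
print(running_instance_ids(SAMPLE), running_instance_ids(mangled))
```

For comparison, a jq filter along the lines of `.Reservations[].Instances[] | select(.State.Name == "running") | .InstanceId` (run with `jq -r`) does the same selection directly in the shell.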

~~~
jcrites
I've also found the utility called jq [1] to be a great help when working with
JSON data on the CLI, such as output from the AWS CLI. It's great at parsing
and selecting output, or performing simple transformations.

[1] [http://stedolan.github.io/jq/](http://stedolan.github.io/jq/)

------
AndrewGaspar
If you're looking to work with structured, hierarchical data from a CLI, this
is where PowerShell really shines. Its OO model is a really excellent fit for
Windows, and all of the common utilities in PowerShell are designed around
this model. And it solves this problem much more elegantly than trying to
massage XML through UNIX-style text processing utilities.

On the one hand, I think it would be interesting to see a truly OO shell for
UNIX, but on the other hand I don't know if that's really necessary for a UNIX
system.
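Here is a rough sketch of what an object pipeline means, as opposed to a text pipeline. Each stage consumes and yields structured records (plain Python dicts here), so no stage has to re-parse columns. The stage names echo PowerShell's `Where-Object`, `Sort-Object` and `Select-Object`, but this is only an analogy in Python. The process list is hard-coded sample data.

```python
from typing import Callable, Iterable, Iterator

# Object-pipeline sketch: stages pass structured records, not lines of text.
# The process list is hard-coded sample data, not read from the OS.
Record = dict

def processes() -> Iterator[Record]:
    yield {"name": "sshd", "pid": 812, "mem_mb": 6}
    yield {"name": "postgres", "pid": 1033, "mem_mb": 210}
    yield {"name": "nginx", "pid": 1201, "mem_mb": 35}

def where(records: Iterable[Record], pred: Callable[[Record], bool]) -> Iterator[Record]:
    """Keep records matching pred (cf. Where-Object)."""
    return (r for r in records if pred(r))

def sort_by(records: Iterable[Record], key: str, descending: bool = False) -> list[Record]:
    """Sort on a named field (cf. Sort-Object); no column-position guessing."""
    return sorted(records, key=lambda r: r[key], reverse=descending)

def select(records: Iterable[Record], *fields: str) -> Iterator[Record]:
    """Project only the named fields (cf. Select-Object)."""
    return ({f: r[f] for f in fields} for r in records)

big = where(processes(), lambda r: r["mem_mb"] > 10)
result = list(select(sort_by(big, "mem_mb", descending=True), "name", "mem_mb"))
print(result)
```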

~~~
tdicola
Unfortunately, in practice I've found PowerShell hasn't been much of a
panacea. You escape all the problems of parsing text, but now have new
problems dealing with tools that don't output objects and getting them to do
so.

I used PowerShell for some moderately complex workflow & reporting stuff, but
in the end I was kind of on the fence about whether I should have just scrapped
it all and done it with traditional tools. It really felt like PowerShell made
hard things easy but easy things hard. I don't think I would use PowerShell
again unless there was absolutely no other choice.

~~~
GauntletWizard
I was all about PowerShell, and then a coworker called it a gimped Python. I
realized nearly everything I'd ever 'solved' in PowerShell was trivial in
Python, and nearly every API and utility that was good in PowerShell would
have been just as well off with a Python API. Not to fault Microsoft, it was a
huge step forward, but now that they've embraced open source, I'd love it if
they made bindings easy and deprecated it.

~~~
MetaCosm
Sadly, deploying Python (or Ruby, Perl, etc.) at scale is a horrific pain.

If you are automating a large organization, PowerShell has one massive edge:
it's already deployed and can be managed sanely via the domain management
tools, with a permission model around it.

Some large (30k+ machine) organizations actually have fairly decent-sized
research projects on the best ways to automate deployments and system
management. It ends up being far trickier than it appears on the surface, and
any tool blessed by MS and shipped by default has a huge edge. PowerShell might
not be great, but it is already on the machine and the competition is batch
files.

------
tdicola
Where are the UML diagrams and how many sprints will it take to implement all
the tools? Has an architect approved all the design patterns that will be used
to implement them?

On a serious note though, wouldn't XML sed and awk just be XSLT transforms?

~~~
jeff_marshall
XSLT is rather verbose compared to typical sed/awk syntax.

While it's not quite sed/awk (there isn't a programming language per se, just
simple edit/select commands), xmlstarlet can do some of the things to an XML
document that you might do to a text document with sed, and it uses XSLT
internally.
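The sketch below shows sed-style "select" and "edit" operations on XML without writing XSLT. It uses Python's standard `xml.etree.ElementTree` and the limited XPath subset it supports, which is roughly the kind of job xmlstarlet's `sel` and `ed` commands handle. The `<hosts>` document is made up.

```python
import xml.etree.ElementTree as ET

# Made-up document; ElementTree supports a small XPath subset, including
# attribute predicates like [@env='prod'].
DOC = """<hosts>
  <host name="web1" env="prod"><ip>10.0.0.1</ip></host>
  <host name="web2" env="staging"><ip>10.0.0.2</ip></host>
  <host name="db1" env="prod"><ip>10.0.0.3</ip></host>
</hosts>"""

root = ET.fromstring(DOC)

# "Select": the names of all prod hosts, in document order.
prod = [h.get("name") for h in root.findall(".//host[@env='prod']")]
print(prod)

# "Edit": rename the staging environment to test, then re-serialize.
for host in root.findall(".//host[@env='staging']"):
    host.set("env", "test")
print(ET.tostring(root, encoding="unicode"))
```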

------
fsniper
Isn't this a solution to a non-problem? I wish people would leave the Unix
shell and clear text alone.

edit: new "Avoid Gratuitous Negativity" compliance.

~~~
z1mm32m4n
I wouldn't say it's a non-problem, but I agree that there is a certain
elegance to line-oriented text processing. It's the same design problem as
choosing a data structure; sometimes you can get by with a list, but some
problems benefit from non-linear data structures.

~~~
fsniper
The problem isn't XML, JSON or any other non-linear data structure. It's
trying to use the shell to manipulate data it wasn't intended for.

The shell will always have trouble handling bloated, overhead-heavy data
structures. In these situations, using an intermediary higher-level language
helps.

PS: Sorry, I just realized my first post unintentionally broke the new rule.

~~~
zer0rest
what new rule?

~~~
fsniper
[http://blog.ycombinator.com/new-hacker-news-guideline](http://blog.ycombinator.com/new-hacker-news-guideline)

------
zackmorris
It's really too bad that there isn't a standard binary data structure format
that we could process the same way we process stdin/stdout/stderr. Stumbled
onto BSON the other day, which allows MongoDB to store binary JSON:

[http://bsonspec.org](http://bsonspec.org)

[http://en.wikipedia.org/wiki/BSON](http://en.wikipedia.org/wiki/BSON)

The main issue is that it doesn’t have arbitrary precision like, say, OpenTNL.
That could probably be overcome somewhat with compressed streams.

It would also be nice to have a leaf-first format, or hints for depth-first or
breadth-first traversal so that the stream could be processed as it’s
received.

These are really old ideas, so maybe I just haven’t stumbled onto a solution.

~~~
userbinator
ASN.1 DER? It's very compact and length-delimited.
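"Length-delimited" means every value carries its size up front, in a tag-length-value (TLV) layout. A reader can therefore frame, skip or stream values without scanning for delimiters. Below is a minimal Python sketch of DER's definite-length TLV encoding. It is not a full ASN.1 library: it handles single-byte tags only and does not enforce every DER rule on decode. The example record is hypothetical.

```python
# Minimal DER TLV sketch: single-byte tags, definite lengths only.
OCTET_STRING = 0x04
UTF8_STRING = 0x0C
SEQUENCE = 0x30  # universal tag 16 with the "constructed" bit set

def encode_length(n: int) -> bytes:
    """DER length octets: short form below 128, minimal long form otherwise."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body  # 0x80 | byte count, then the length

def encode_tlv(tag: int, value: bytes) -> bytes:
    return bytes([tag]) + encode_length(len(value)) + value

def decode_tlv(buf: bytes, offset: int = 0) -> tuple[int, bytes, int]:
    """Read one TLV at offset; return (tag, value, offset just past it)."""
    tag = buf[offset]
    first = buf[offset + 1]
    pos = offset + 2
    if first < 0x80:
        length = first
    else:
        count = first & 0x7F
        if count == 0:
            raise ValueError("indefinite length is not allowed in DER")
        length = int.from_bytes(buf[pos:pos + count], "big")
        pos += count
    value = buf[pos:pos + length]
    if len(value) != length:
        raise ValueError("truncated TLV")
    return tag, value, pos + length

# Hypothetical record: a command name plus a 200-byte binary blob.
record = encode_tlv(SEQUENCE,
                    encode_tlv(UTF8_STRING, b"ls") + encode_tlv(OCTET_STRING, bytes(200)))
tag, body, end = decode_tlv(record)
children = []
pos = 0
while pos < len(body):  # walk the SEQUENCE members one TLV at a time
    child_tag, child_value, pos = decode_tlv(body, pos)
    children.append((child_tag, len(child_value)))
print(hex(tag), len(record), children)
```

The 200-byte blob needs a two-byte length (`0x81 0xC8`). The enclosing SEQUENCE's length (207) also fits in one long-form byte, so the whole record is 210 bytes with only a few bytes of framing overhead.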

~~~
vog
That was my first thought, too. Why do people forget about good old standards?
There are so many attempts at Binary JSON, Binary XML, Binary YAML, ...
whatever. Followed by language-specific serialization formats like Python's
Pickle, PHP's serialize, and so on.

What a huge mess!

If any binary format wants to become a universal competitor to those, it:

- should offer some big advantages over the already established ASN.1, and

- should be significantly smaller and faster than compressed human-readable
formats (e.g. json+gzip or xml+xz)

------
slapresta
Thankfully, no.

XML, as a medium of transmission of structured data, pales when compared to
basically any other commonly used alternative in 2015, in ease of use,
readability and available tooling.

Now, I enthusiastically agree with the underlying ideal of having programs
communicate on the shell through data structures instead of through eight-bit
bytes. Let's do that, be it XML or anything else.

~~~
carterehsmith
I'm not one to promote XML, far from it, but the tooling for it is superb. For
example, we recently had to connect our (Java) app to the NetSuite web
service. All it took was pointing the tool at their web service endpoint, and
it generated strongly-typed (Java) client classes and proxies. That's Java,
and it works with C# too (Visual Studio): provide the endpoint, VS reads the
WSDL exposed by the service, and generates all the required classes and
proxies. And at all times, you can see what functions the web service exposes,
what parameters you need to send, and what kind of response you will get.

Much better than your typical REST API.

~~~
wantab
So it does everything you want, with ease, in every language, better than
everything else, but you won't promote it?

~~~
carterehsmith
True. What happened was, back in the 2000s, XML was so hyped that people
pushed it as a solution for everything.

On the negative side, we had to endure managers coming back from “XML
conferences” and recommending we ditch our Oracle or Sybase or whatever and
replace it with an XML database, because XML.

On the plus side, vendors made sure to support it so to this day we have good
tooling in Eclipse, VS, etc.

Also on the bad side, they went crazy overboard, so we have security issues
like this:

[https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Pr...](https://www.owasp.org/index.php/XML_External_Entity_\(XXE\)_Processing)

So I still have the WS endpoint, but I rewrote it to parse the XML manually
and avoid all the crappy extensions they put in that I can't disable.

So yeah, this is loaded… I am not promoting XML. It has its issues. Use what
you like.
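XXE attacks work by declaring external entities in a DOCTYPE. One common hardening approach, shown here as a swapped-in technique rather than what the commenter did, is to refuse any DOCTYPE outright. Below is a minimal sketch using Python's standard `xml.parsers.expat`. The `ForbiddenDTD` exception and `parse_without_dtd` function are hypothetical names. Third-party packages such as defusedxml wrap this kind of protection more thoroughly.

```python
import xml.parsers.expat

class ForbiddenDTD(ValueError):
    """Raised when input contains a DOCTYPE (and so possibly entity declarations)."""

def parse_without_dtd(data: bytes) -> list[tuple[str, dict]]:
    """Parse XML, refusing any DOCTYPE; return (tag, attributes) per element."""
    elements: list[tuple[str, dict]] = []
    parser = xml.parsers.expat.ParserCreate()

    def reject_doctype(name, system_id, public_id, has_internal_subset):
        # Fires before any entity declaration is processed; the exception
        # propagates out of Parse() and aborts parsing.
        raise ForbiddenDTD(f"DOCTYPE {name!r} not allowed")

    parser.StartDoctypeDeclHandler = reject_doctype
    parser.StartElementHandler = lambda tag, attrs: elements.append((tag, attrs))
    parser.Parse(data, True)
    return elements

print(parse_without_dtd(b'<order id="7"><item/></order>'))

xxe = (b'<?xml version="1.0"?><!DOCTYPE foo '
       b'[<!ENTITY xxe SYSTEM "file:///etc/passwd">]><foo>&xxe;</foo>')
try:
    parse_without_dtd(xxe)
except ForbiddenDTD as exc:
    print("rejected:", exc)
```

Rejecting DTDs is blunt, but for a web service endpoint that never needs them it removes the whole class of entity-expansion problems rather than trying to disable features one by one.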

~~~
wantab
Your complaint is about people who don't know what XML is or how to use it,
and about broken tools, not about XML. None of the examples you show are
caused by XML itself.

------
agumonkey
Out of curiosity, Google Trends (xml)
[https://www.google.com/trends/explore#q=xml](https://www.google.com/trends/explore#q=xml)

~~~
delbel
Cuba is the #1 country interested in XML.

~~~
agumonkey
Maybe they just got news of its existence.

------
haddr
Same now for JSON. Thankfully there is jq (JSON path).

------
ynrbode
Be very cautious of texts that start with "this is how we can" instead of
"this is why we should".

