
Show HN: UXY – adding structure to Unix tools - rumcajz
https://github.com/sustrik/uxy
======
dima55
This is becoming a really crowded space. Some other similar tools that make
slightly different design choices and have varying envisioned use cases:

\- [https://github.com/dkogan/vnlog](https://github.com/dkogan/vnlog)

\- [https://csvkit.readthedocs.io/](https://csvkit.readthedocs.io/)

\- [https://github.com/johnkerl/miller](https://github.com/johnkerl/miller)

\- [https://github.com/BurntSushi/xsv](https://github.com/BurntSushi/xsv)

\- [https://github.com/eBay/tsv-utils-dlang](https://github.com/eBay/tsv-utils-dlang)

\- [https://stedolan.github.io/jq/](https://stedolan.github.io/jq/)

\- [http://harelba.github.io/q/](http://harelba.github.io/q/)

\- [https://github.com/BatchLabs/charlatan](https://github.com/BatchLabs/charlatan)

\- [https://github.com/dinedal/textql](https://github.com/dinedal/textql)

\- [https://github.com/dbohdan/sqawk](https://github.com/dbohdan/sqawk)

(disclaimer: vnlog is my tool)

~~~
nailer
> This is becoming a really crowded space.

Those who fail to understand powershell are condemned to recreate it poorly.

It'd be great for GNU to create a standard for native structured output (as
well as a converter tool like the one in this post), then have other tools be
able to do it.

But realistically, pwsh is Open Source, runs just fine on Unix boxes and does
this now.

~~~
majkinetor
Amen to that

------
bayareanative
A related problem is the constant churn of logging: taking structured data,
destructuring it with a string serialization and then parsing it again.

This resource-wasting antipattern pops up over and over again.

Also, logs are message-oriented entries and serializing them as discrete,
lengthy files is insane.

Structured data should stay structured, say in a time-series / log-structured
database. Destructuring should be a rare event.

~~~
xelxebar
I think Plan 9 gives a nice distinction. We use files as both a persistent
store as well as an interface, so it seems nice to separate those two concerns
out. That way you could have your logs as a UI into application state and only
incur the overhead of serialization and persistence when you deem necessary.

Caveat, my Plan 9 experience is mostly theoretical.

------
jph
Excellent, thank you for creating UXY!

I will donate $50 to you or your favorite charity to encourage a new feature:
to-usv, which outputs Unicode separated values (USV) with unit separator
U+241F and record separator U+241E.

Unicode separated values (USV) are much like comma separated values (CSV), tab
separated values (TSV) a.k.a. tab delimited format (TDF), and ASCII separated
values (ASV) a.k.a. DEL (Delimited ASCII).

The advantages of USV for me are that it handles text that happens to contain
commas, tabs, and/or newlines, and that its separators have a visible
character representation.

For example, USV is great for me within typical source code, such as Unix
scripts, because the characters show up, are easy to copy/paste, and are easy
to use within various kinds of editor search boxes.

Bonus: if the programming implementation of to-usv calls a more-flexible
function that takes a unit separator string and a record separator string,
then you can easily create similar commands for to-asv, to-csv, etc.
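
A minimal sketch of that "more-flexible function" idea, in Python. The names `to_separated`, `to_usv`, and `to_asv` are hypothetical and not part of UXY, and the sketch deliberately does no escaping: it rejects any field that already contains a separator.

```python
# Hypothetical helper names for illustration; none of this is part of UXY.
USV_UNIT = "\u241f"    # U+241F SYMBOL FOR UNIT SEPARATOR
USV_RECORD = "\u241e"  # U+241E SYMBOL FOR RECORD SEPARATOR


def to_separated(rows, unit_sep, record_sep):
    """Join fields with unit_sep and end each record with record_sep.

    No escaping is attempted: a field containing a separator is rejected.
    """
    out = []
    for row in rows:
        for field in row:
            if unit_sep in field or record_sep in field:
                raise ValueError(f"field contains a separator: {field!r}")
        out.append(unit_sep.join(row) + record_sep)
    return "".join(out)


def to_usv(rows):
    return to_separated(rows, USV_UNIT, USV_RECORD)


def to_asv(rows):
    # ASCII unit separator (31) and record separator (30)
    return to_separated(rows, "\x1f", "\x1e")


rows = [["NAME", "SIZE"], ["a,b.txt", "12"]]
print(to_usv(rows))
```

Commas, tabs, and newlines pass through untouched here; only the chosen separators themselves are a problem.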

~~~
inimino
Eventually you have to deal with content that contains your separator
characters, however obscure. So essentially you have two choices:

A. use some "weird" separators and hope those don't appear in your input

B. bite the bullet and escape and parse properly

Option A is perfectly reasonable for one-offs, where you can handle
exceptional cases or know they won't occur because you know what's in the
data. However for reusable code, you need option B, which means not using
`cut` to parse CSV files, for instance (since commas can occur inside double-
quoted strings). In that case, what's the benefit of using USV over an
existing, more common, format?
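
To make the `cut` problem concrete, here is a small Python sketch comparing a naive comma split (roughly what `cut -d, -f2` sees) with the standard library's `csv` parser. The sample line is made up for illustration.

```python
import csv
import io

# Made-up sample record: the second field contains a comma inside quotes.
line = 'alice,"Smith, Jr.",42\n'

# Naive split on every comma, ignoring quoting
naive = line.rstrip("\n").split(",")

# A real CSV parser honours the double quotes
parsed = next(csv.reader(io.StringIO(line)))

print(naive[1])   # "Smith
print(parsed[1])  # Smith, Jr.
```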

~~~
jph
Yes you're exactly right about escaping.

Orthogonal to escaping, the choice is what characters to use for unit
separator and record separator.

If the data are for machines only, then for me the choice of characters
doesn't matter. If the data are potentially for reading or editing, such as by
a programmer, then my choice is to prefer typically-visible characters over
typically-invisible characters and/or zero-width characters (e.g. ASV a.k.a.
DEL a.k.a. ASCII 30 & 31).

My choice of USV is thus because U+241F and U+241E are visible, and also in
Unicode they are semantically meaningful.

~~~
inimino
Glad we agree on escaping.

I'm still not sure what the value is over CSV, which also has visible
delimiters. It's true that you have to establish/enforce a specific convention
around escaping and quoting, since CSV has historical variation here. But it
would make more sense to me to encourage any particular consistent handling of
CSV, rather than yet another entirely new separator. At least some tools
already support CSV, whereas nothing currently supports USV, as far as I know.

------
nerdponx
Seems a lot like the Powershell model, which I have mixed feelings about. It's
nice for shell _scripts_ , but it makes day-to-day usage cumbersome. I think
you can actually use Powershell on Linux, but I'm interested to see where this
tool goes.

~~~
nailer
> It's nice for shell scripts, but it makes day-to-day usage cumbersome.

How? `ps node | kill`. No pgrep hack, because ps outputs a list of processes,
not lines of text. As a Unix person, Windows Terminal and pwsh are where I
spend most of my day.

~~~
majkinetor
> In my experience Powershell is quite a bit more verbose than that.

This is a common misconception. Posh allows both verbose and shortened styles
via various mechanisms: command aliases, parameter abbreviations and aliases,
proxies, pipeline settings for objects, etc.

------
koolba
> uxy align

> Aligns the data with the headers. This is done by resizing the columns so
> that even the longest value fits into the column.

> ...

> This command doesn't work with infinite streams.

Does this do nothing with infinite streams or does it do a "rolling"
alignment?

Even with an infinite stream you can keep track of the max width seen thus far
and align all future output to those levels. It'll still have some jank to the
initial alignment but assuming a consistent distribution of the lengths over
time it'd be good enough for eyeballing the results.
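
A rough Python sketch of that "rolling" alignment, assuming records arrive as lists of already-parsed fields. This is not how `uxy align` is implemented; `rolling_align` is a hypothetical name.

```python
def rolling_align(records):
    """Yield aligned lines, widening columns as longer values are seen.

    Works on unbounded iterators: only the running maximum width of each
    column is kept, never the records themselves.
    """
    widths = []
    for fields in records:
        # Add width slots if this record has more columns than seen so far
        widths.extend([0] * (len(fields) - len(widths)))
        for i, field in enumerate(fields):
            widths[i] = max(widths[i], len(field))
        padded = (f.ljust(widths[i]) for i, f in enumerate(fields))
        yield "  ".join(padded).rstrip()


sample = [["PID", "CMD"], ["1", "init"], ["123456", "node"], ["7", "sh"]]
for line in rolling_align(sample):
    print(line)
```

Earlier lines stay at the narrower width (the "jank" mentioned above), while every record after the widest value so far uses the new width.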

~~~
rumcajz
Currently it uses the alignment of the headers as the default. The output is
misaligned only when a field exceeds the size of its header. The next record
returns to the default alignment, though.

I was thinking about adding a 'trim' command that would trim long fields to
fit into the default field size.

------
no_gravity
I think this is putting too many different functions into a single command.

    uxy ls

This looks like it "tabifies" the output of a given command. That is, it turns
the output of the given command into a tab-separated format.

    uxy reformat "NAME SIZE"

This seems to collide with the above since "reformat" is not a command which
will be tabified. Instead it filters stdin for two columns.

    uxy align

This seems to do the same as "column -t".

------
adrianratnapala
> * any other escape sequence MUST be interpreted as ? (question mark)
> character.

Isn't it better to forbid them? Presumably you are saving the space for
further extensions, but this is allowing readers to interpret them as '?'

Similarly, what is the rationale for interpreting control characters as '?'?
Instead you can ban them, with the possible exception of treating tabs as
spaces.

~~~
rumcajz
Postel's principle: Be liberal in what you accept... It means that the tool
won't crash just because there's weird input.

~~~
wgoodall01
Isn't it better to crash than to fail silently, possibly storing malformed
data?

~~~
lioeters
Known as "Fail early, fail loudly".

[https://en.wikipedia.org/wiki/Fail-fast](https://en.wikipedia.org/wiki/Fail-fast)
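
The two positions in this subthread can be put side by side in a short Python sketch: a lenient decoder that maps unknown escapes to `?`, and a strict mode that fails fast. The set of recognised escapes below is an assumption for illustration only (check the UXY README for the real list), and `unescape` is a hypothetical name.

```python
# Assumed set of recognised escapes, for illustration only;
# the actual UXY specification may define a different list.
KNOWN = {"\\": "\\", '"': '"', "t": "\t", "n": "\n"}


def unescape(text, strict=False):
    """Decode backslash escapes; unknown ones become '?' unless strict."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        nxt = text[i + 1 : i + 2]  # empty string if the backslash is last
        if nxt in KNOWN:
            out.append(KNOWN[nxt])
        elif strict:
            # Fail-fast: refuse malformed input instead of storing it
            raise ValueError(f"unknown escape sequence at offset {i}: \\{nxt}")
        else:
            # Lenient (Postel-style): substitute a question mark
            out.append("?")
        i += 2
    return "".join(out)


print(unescape(r"a\qb"))  # a?b
```

With `strict=True` the same input raises `ValueError`, so the choice between the two behaviours can be left to the caller.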

------
vram22
For anyone interested in learning how to create their own Unix command-line
tools (not just use them), feel free to check out these links to content by me
(about doing such work in C and Python):

1) Developing a Linux command-line utility: an article I wrote for IBM
developerWorks:

[https://jugad2.blogspot.com/2014/09/my-ibm-developerworks-ar...](https://jugad2.blogspot.com/2014/09/my-ibm-developerworks-article.html)

Follow links in the article to go to the source code of the tool described in
the tutorial, and the PDF of the IBM dW article.

2) My comment, here:

[https://news.ycombinator.com/item?id=19564706](https://news.ycombinator.com/item?id=19564706)

on this HN thread:

Ask HN: Looking for a series on implementing classic Unix tools from scratch:

[https://news.ycombinator.com/item?id=19560418](https://news.ycombinator.com/item?id=19560418)

------
dharmatech
Cool project!

Have you considered having a way to render output in a graphical toolkit?

See for example:

[https://github.com/dharmatech/PsReplWpf](https://github.com/dharmatech/PsReplWpf)

which renders PowerShell output in WPF presentations.

~~~
dima55
You can use this (I wrote it, and have been using it daily for many years):
[http://github.com/dkogan/feedgnuplot](http://github.com/dkogan/feedgnuplot)

------
mijoharas
Can anyone elaborate on why the tool is named UXY? I couldn't find anything in
the repo, and there is no wiki.

~~~
imglorp
Seems like an acro-mondeau of UX (user experience) and XY (tabular format).
The tool normalizes some of the Unix tool outputs as a table which can be
manipulated.

------
rabidrat
Very cool, I've had a similar idea myself recently! Though, why not go with a
simpler format like TSV (tab-separated values)? Then you don't have to worry
about quoting and escaping anything but tabs and newlines (which are very rare
in tabular data).

~~~
rumcajz
Tabs are a nightmare to deal with when you want to align the columns. Also, I
don't consider tabs to be human readable: They are too easily confused with
spaces. (Case in point: make)

~~~
rabidrat
Fair enough, I've experienced those pains myself. But what is the strategy
with UXY? Kind of a semi-fixed-width format that is only partially aligned,
but still requires quoting/escaping? I'm not sure it's any better than CSV or
PSV (pipe), and it also doesn't interoperate with existing tools.

I'm not attacking your overall idea, btw. I've just given this a bunch of
thought myself, and the design space is very tricky. My current approach would
be to use ASV (ASCII codes 28-31) and abandon 'cat'-based readability in favor
of a 'vcat' which gives you a better visual representation. Of course that has
its issues too. :)

~~~
kbd
> I'm not sure it's any better than CSV or PSV (pipe), and it also doesn't
> interoperate with existing tools.

I think the point is that it's largely a formalization of what unixy tools
already do.

