
Next generation Unix pipe by Alex Larsson - jobi
http://blogs.gnome.org/alexl/2012/08/10/rethinking-the-shell-pipeline/
======
rbanffy
I find the tendency to repeat Microsoft's mistakes deeply disturbing. Even if,
in this case, the author acknowledges PowerShell goes too far, his own idea
goes too far as well.

I'd be all in with flags that make ps or ls spit out JSON or XML, but this
typed nonsense? What about when I want to output a color? Will I need a new type?

Oh... and the sort thing... it's not hard to sort numerically.

~~~
zokier
>I find the tendency to repeat Microsoft's mistakes deeply disturbing.

What, in this case, do you consider "Microsoft's mistake"? I thought
PowerShell was commonly considered conceptually sound but flawed in its
implementation, mostly because its verbosity makes it unwieldy for interactive
use. If this project can solve that, then I don't see it "repeating
Microsoft's mistakes"; instead it would be correcting them.

------
yason
I wonder where this comes from. There's no need for a next generation.

People have, for thirty years or so, successfully printed their data into
suitable textual streams for processing with programs glued together with
pipes, and optionally parsed the results back into some native format if so
required.

Meanwhile, none of the "next generation" pipes have gained any momentum.
Obviously they either solve something which is not a problem, or they do solve
some problems but create new ones in greater numbers than they solved,
tipping the balance into the negative.

Any object or intermediary format you can think of can be represented in text,
and you're back to square one. For example, even though there's XSLT and
XQuery, you can just serialize a tree of XML elements into a row-based
representation and use grep on the resulting stream to effectively perform
hierarchical searches inside the tree.
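
Concretely (a sketch, assuming a flattener such as the xml2 utility, which
prints one /path=value line per node; the file and element names here are made
up):

        xml2 < inventory.xml | grep '^/inventory/item/price='

Because every line carries its full ancestor path, a plain grep becomes a
hierarchical query.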

~~~
ufo
I have the opposite experience, actually. It's so damn annoying to have
programs communicate structured data with each other over pipes that more
complicated things inevitably devolve into a) monocultural programs written in
a single programming language that don't communicate with the outside world,
or b) some form of exchange format (JSON, XML, etc.) that needs to be
explicitly supported by every participant.

And Unix utilities suck at handling structured data. If your file format is
line-based you might have a chance of it being easy to work with, but don't
even ask what happens when a newline ends up inside a text field.
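
A minimal demonstration (the field contents are made up): one logical record
whose second tab-separated field contains a newline is silently split in two
by any line-based tool:

        $ printf 'alice\tline one\nline two\n' | cut -f2
        line one
        line two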

------
veyron
That is a terrible idea: sometimes the app can take advantage of a constraint
to minimize work done.

In your example, if we just wanted to filter for a particular user, dps would
have to print out ALL of the information and then you could pick at it. This
doesn't seem bad for ps (because there's a hard limit) but in many other
cases the output could be much larger than what is needed. That's why
having filtering and output flags in the program itself is, in many cases,
more efficient than generating everything.

As a side note, to demonstrate a dramatic example, I tried timing two things:

    
    
        - dumping NASDAQ feed data for an entire day, pretty-printing, and then using fgrep
        - having the dumper do the search explicitly (new flags added to program)
    

Both outputs were sent to /dev/null. The first ran in 35 minutes, the second
in less than 1 minute.

~~~
Someone
If "sometimes the app can take advantage of a constraint" is an argument here,
you should be against all usage of pipes.

~~~
veyron
That's not true. In the case of `ps`, there is a known limit to the number
of processes, and it is fairly small, so the performance hit is limited.

As another example in this context, if the original data source is gzip'd,
it's faster to gunzip and then pipe rather than integrating the gzip logic
into the app itself.
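
The split also buys pipeline parallelism for free: decompression and filtering
run as separate processes, typically on separate cores (the file name is made
up):

        gunzip -c feed-20120810.gz | fgrep AAPL > /dev/null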

~~~
Someone
I still disagree. I think you are arguing for the inclusion of, at the least,
grep, cut, head and tail in cat.

I do not claim that is a bad idea (conceptually, pipes do not require multiple
processes, and those tools could be dynamically linked in) but why stop at
those tools? Some people would argue that sed and awk also should be in,
others would mention perl, etc.

I also do not see why it would be faster to use an external gzip tool through
a pipe. If it is, the writer of the 'tool with built-in unzip' could always,
in secret, start an external unzip process to do the work.

------
mongol
For certain types of Unix piping, I have found it useful to pipe tool output
to CSV and then let SQLite process the data using SQL statements. SQL solves
many of the sorting, filtering, and joining tasks that you can do with Unix
pipes too, but with a syntax that is broadly known. Joining in particular I
have found hard to do well with shell/piping.

I think an SQLite-aware shell would be awesome, especially if common tools had
a common output format (like CSV with a header) that also included the
schema / data format.
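
Something along these lines already works today (a sketch; the column choices
are arbitrary, and it breaks on command names containing spaces or commas,
which is exactly the fragility this thread is about):

        # CSV with a header row (GNU ps):
        { echo pid,pcpu,comm
          ps -eo pid,pcpu,comm --no-headers | awk -v OFS=, '{print $1,$2,$3}'
        } > /tmp/ps.csv
        # SQL over the result (recent sqlite3; .import takes column names from the header):
        sqlite3 -csv :memory: \
            '.import /tmp/ps.csv procs' \
            'SELECT comm, pcpu FROM procs WHERE CAST(pcpu AS REAL) > 1.0 ORDER BY CAST(pcpu AS REAL) DESC'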

~~~
agumonkey
very clever

------
peterwwillis
My preference for a "next generation pipe": Shared file descriptors. (Sort of)

It would work virtually the same as a standard pipe; the difference being you
could control whether it was read, write, or both, and every application you
'piped' to would have access to the same file descriptors as the parent,
unless a process in the path of the pipe closes one.

The end result would be the equivalent of passing unlimited individual
arbitrary bitstreams, combined with the ability to chain arbitrary programs. In
fact, you could simplify things by passing the previous piped command's
output as a new file descriptor to the next program, so you could easily
reference the last piped program's output, or any of the ones before it.

For example:

        cat arbitrary_data.docx | docx --count-words <$S[0] | docx --head 4 <$S[0] |
        docx --tail 4 <$S[0] | docx --count-words <$S[2] | echo -en "Number of words:
        $STREAM[1]\n\nFirst paragraph: $STREAM[2]\n\nLast paragraph:
        $STREAM[3]\n\nNumber of words in first paragraph: $STREAM[4]\n"

STREAM[0] is the output of 'cat'. STREAM[1] is the counted words of STREAM[0]
($S[0] is an alias). STREAM[2] is the first 4 lines of the doc. STREAM[3] is
the last 4 lines of the doc. STREAM[4] is the counted words from STREAM[2]
(note the "<$S[2]"). And STREAM[5] is the output of 'echo', though since
it's the last command, it becomes STDOUT.

There may be a slicker way of doing this, but you can see the idea: pass
arbitrary streams as you pipe, and reference any of them at any point in the
pipe to continue processing data arbitrarily in a one-liner.

...

Actually, it looks like this is already built into bash (sort of), as the
_Coprocesses_ functionality. I don't know if you can use it with pipes, but
it's very interesting.
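
For the record, a minimal coprocess sketch (bash 4+; bc is just a stand-in for
any filter):

        coproc BC { bc -l; }
        echo '2^10' >&"${BC[1]}"     # write to the coprocess's stdin
        read -r answer <&"${BC[0]}"  # read from its stdout
        echo "bc says: $answer"      # -> bc says: 1024
        kill "$BC_PID"

The coprocess's endpoints show up as extra file descriptors in the BC array,
alongside the shell's normal stdin/stdout, which is at least a cousin of the
scheme sketched above.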

------
DHowett
I like the idea of processes dumping structured objects: pipes are rather
often used for the processing of structured data, and while tabulated output
certainly makes it easier, we still end up effectively using constants: cut to
the third column, sort the first 10 characters, and print the first four
lines.

This method is fragile when given diverse input: what if the columns
themselves contain tabs, newlines, or even NUL bytes?

Passing objects as binary blobs, on the other hand, doesn't allow for ease of
display or interoperability with other tools that don't support whatever
format they happen to be in. This, of course, can be rectified by a smart shell
that pretty-prints columnar data (insofar as a shell could be charged with
data parsing; you might imagine an implicit |dprint at the end of each command
line that outputs blobs).

I'd also be interested in seeing a utility that took "old-format" columnar
data and generated structured objects from it, of course, with the above
format caveats.

~~~
chris_wot
Something like a cut, only we call it dcut? Actually sounds like a pretty good
idea - that way those who don't want to switch to the new format don't have
to, and you can pipe it through this program to create the new style
structured output...

~~~
alexlarsson
The inverse of dtable? Yeah, that would be very nice.

------
rogerbinns
What would be ideal to solve first is some sort of initial format negotiation
on pipes. Otherwise you will end up with the wrong thing happening (e.g. having
to reimplement every tool, spewing "rich" formats at tools that don't know
them, or regular text at tools that could do better).

We've already seen something like this. For example, ls produces columnar
output when writing directly to a screen and one file per line otherwise, and
many tools will output in colour if applicable. However, this is enabled by
isatty() system calls and by inspecting the terminal environment for colour
support.
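
You can see this from any shell:

        $ ls          # stdout is a tty: multi-column output
        bar  baz  foo
        $ ls | cat    # stdout is a pipe: one name per line
        bar
        baz
        foo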

Another example is telnet which does feature negotiations if the other end is
a telnet daemon, otherwise just acts as a "dumb" network connection. (By
default the server end initiates the negotiations.)

However, the only way I can see this being possible with pipes is with
kernel/syscall support. It would provide a way for either side to indicate
support for richer formats and learn whether that is mutually agreeable,
otherwise defaulting to compatible plain old text. For example, an ioctl could
list the formats supported: the recipient would supply a list before its first
read() call, and the sender would then fetch that list and make a choice before
its first write() call. (This is somewhat similar to how clipboards work.)

So the question becomes: would we be happy with a new kernel call in order to
support rich pipes, one that automatically falls back to the current standard
behaviour in its absence or when talking to non-rich-enabled tools?

I would love it if grep/find/xargs automatically knew about null terminating.

~~~
dfc
man grep:

    
    
       -Z, --null
          Output a zero byte (the ASCII NUL character) instead  of  the  character  that  normally
          follows  a  file  name.   For example, grep -lZ outputs a zero byte after each file name
          instead of the usual newline.  This option makes the output  unambiguous,  even  in  the
          presence  of file names containing unusual characters like newlines.  This option can be
          used with commands like find -print0,  perl  -0,  sort  -z,  and  xargs  -0  to  process
          arbitrary file names, even those that contain newline characters.
    
       -z, --null-data
          Treat the input as a set of lines, each  terminated  by  a  zero  byte  (the  ASCII  NUL
          character)  instead of a newline.  Like the -Z or --null option, this option can be used
          with commands like sort -z to process arbitrary file names.
    

man xargs:

    
    
       --null
       -0     Input  items are terminated by a null character instead of by whitespace, and the quotes
          and backslash are not special (every character is taken literally).  Disables the end of
          file  string,  which  is treated like any other argument.  Useful when input items might
          contain white space, quote marks, or backslashes.  The GNU find -print0 option  produces
          input suitable for this mode.
    

man find:

    
    
       -print0
          True;  print  the  full  file  name on the standard output, followed by a null character
          (instead of the newline character that -print uses).  This allows file names that contain
          newlines or other types of white space to be correctly interpreted by programs that
          process the find output.  This option corresponds to the -0 option of xargs.
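
Chained together, the whole pipeline stays NUL-delimited end to end, e.g.:

        # Delete every .log file containing ERROR, even if names have spaces or newlines:
        find . -name '*.log' -print0 | xargs -0 grep -lZ ERROR | xargs -0 rm --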

~~~
alexlarsson
Yes, in other words, the parent is right that zero termination is currently
_not_ automatic.

~~~
dfc
He did not say automatic, he said "knew about" nulls. When you talk about
automagically detecting nulls I have this image of an ascii-art Clippy with a
cowsay bubble that says "I see you are using null terminated data, I have
enabled --null for you."

~~~
rogerbinns
I did say automatic, exactly: it is the word immediately before the "knew
about" you quoted!

And yes, I would expect find to detect that it is talking to xargs and use
NUL termination, without the user having to go fish out the relevant option
for each tool. And if you used ps with another tool that prefers JSON, then ps
could automatically output that, again without the user having to find and
maintain flags.

------
huhtenberg
At the risk of stating the obvious - this won't take off for a simple reason
of being too complex _by Unix standards_.

~~~
IsTom
I'm not sure, it's probably not too complex by GNU standards.

------
dfc
_"Even something as basic as numerical sorting on a column gets quite
complicated."_

    
    
        sort -g -k field_num

~~~
comex
Two problems: the header is sorted along with the fields, and you have to look
up the field number. Insurmountable? No; but somewhat complicated.

~~~
dfc
`grep -v` or `tail` seems like a much easier workaround than designing a brand
new shell piping system. Maybe there are other use cases, but sorting numeric
fields is definitely not worth the effort. A lot of black magic can be easily
conjured up with `col`, `tr`, `cut`, `column`, `head`, `tail`, `grep`, `pr`,
and `sort`; and that's without ever even touching `sed` and `awk`.
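
For instance, keeping the header on top while sorting numerically takes one
extra read (a sketch):

        # Numerically sort ps output by %CPU (column 3), header stays on top:
        ps aux | { IFS= read -r header; printf '%s\n' "$header"; sort -g -k3,3 -r; }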

~~~
alexlarsson
Obviously a lot can be done, but it's hardly easy; you yourself call it black
magic. But it's _very_ easy to do with typed data, and that's only the
beginning of what you _can_ do.

~~~
dfc
When I said black magic[1] the last thing I was trying to convey was that
using coreutils/bsdmainutils was complicated. Typed data is easy to work with,
but creating a sophisticated unix pipes 2.0 is not. No matter how complicated
you think coreutils/bsdmainutils mastery is, you have to admit it's a lot
easier than building unix pipes 2.0.

If you throw in numutils and moreutils you can go nuts with columns of data.
What tasks would you like to accomplish on the command line with columns of
typed data and pipes 2.0?

[1] On a side note, I was surprised that we had different conceptions of what
black magic is. I was going for evil, nefarious, and/or unorthodox. Have I been
using the term wrong? That's an honest question; it would not surprise me if I
have been oblivious.

~~~
alexlarsson
I don't expect every user to create unix pipes 2.0, so the difficulty of that
is not really what needs to be compared. It will only have to be done once.

And once this is done, any user can avoid painstakingly constructing
pipelines that try to cut out the right columns to treat as numbers, and avoid
all the problems of parsing strings that may contain spaces or other control
characters. You can do an operation like:

filter out all processes with %cpu > 20 and uid > 1000, then sort by the second
cmdline arg, as:

        dps | dfilter pcpu ">" 20 uid ">" 1000 | dsort "cmdvect[1]"

Obviously a made-up example, but something like this is easy to read and
write, whereas the equivalent working on tabular ASCII data would be quite long
and complicated.

As per black magic, I have about the same interpretation as you. I didn't
really misunderstand it to be about how complicated it was. However, "black
magic" certainly has a feel of "you should not do this", and arguing that you
can then use that in order to do something which could instead be simple and
obvious in a typed system seems kind of weird.

~~~
vidarh
Your example would be about the same length with awk and sort, with the only
caveat that you need to figure out the field numbers, and the upside that I
can trust the tools are available pretty much everywhere.

~~~
alexlarsson
It's doable, yeah, but it's a lot more work.

First you have to handle the header specially (want it in the result but not
in the comparisons).

In order to compare by uid you need numeric uids (-n), but that means you
can't also get the readable username, so you need a custom output format.

Then you need to ensure the output format is such that nothing with possible
spaces or control chars can end up in a column before the data you're looking
at, as then finding the right column is hard.

Even then, extracting the first command line arg as in the example will fail
for a binary name that has a space in it (as there is no way to know which
spaces in the command line correspond to actual spaces in the arguments and
which are just delimiters).
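
To make that concrete, a rough equivalent (a sketch: it drops the header,
compares numeric uids only, and treats column 4 as the second cmdline token,
which is precisely what breaks when the binary name contains a space):

        ps -eo pcpu,uid,args --no-headers | awk '$1 > 20 && $2 > 1000' | sort -k4,4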

------
snprbob86
Neat! I've commented on this very problem before on several of the many
threads regarding "object pipes", i.e. REPLs.

<http://news.ycombinator.com/item?id=1033623>

<http://news.ycombinator.com/item?id=1566325>

<http://news.ycombinator.com/item?id=2527217>

Since that last comment, I've been working a bunch with Clojure, which has a
far more expressive variant of JSON, and doing some heavy-duty work with
Google's Protocol Buffers.

A few points:

1) Piping non-serializable objects is a _BAD IDEA_. That's not a shell, that's
a REPL. And even in a REPL, you should prefer inert data, a la Clojure's
immutable data structures.

2) Arbitrary bit streams are, fundamentally, unbeatable. They're completely
universal. Some use cases really don't want structured data. Consider gzip:
you just want to take bytes in and send bytes out. You don't necessarily want
typed data in the pipes, you want typed pipes, which may or may not contain
typed data. This is the "format negotiation" piece that is mentioned in the
original post. I'd like to see more details about that.

3) There seems to be some nebulous lowest common denominator of serializable
data. So many things out there: GVariant, Clojure forms, JSON, XML, ProtoBuf,
Thrift, Avro, ad infinitum. If everything talks its own serialization
protocol, then none of the "do one thing well" benefits work. Every component
needs to know every protocol. One has to "win" in a collaborative shell
environment. I need to study GVariant more closely.

4) Whichever format "wins", it needs to be self-describing. A table format
command can't work on field names, unless it has the field names! ProtoBufs
and Thrift are out, because you need to have field names pre-compiled on
either side of the pipe. Unless, of course, you start with a MessageDescriptor
object up front, which ProtoBufs support and Avro has natively, but I digress:
Reflection is necessary. It's not clear if you need header descriptors a la
MessageDescriptor/Avro, or inline field descriptions a la JSON/XML/Clojure. Or
a mix of both?

5) Order is _critical_. There's a reason these formats are called
"serializable". Clojure, for example, provides sets using the #{} notation.
And, like JSON, supports {} map notation. Thrift has Maps and Sets too.
ProtoBufs, however, don't. On purpose. And it's a good thing! The data is
going to come across the pipe in series, so a map or set doesn't make sense.
Use a sequence of key-value-pairs. It might even be an infinite sequence! It's
one thing to support un-ordered data when printing and reading data. It's
another thing entirely to design a streaming protocol around un-ordered data.
Shells need a streaming protocol.

6) Going back to content negotiation, this streaming protocol might be able to
multiplex types over a single stream. Maybe gzip sends a little structured
metadata up front, then a binary stream. ProtoBufs label all "bytes" fields
with a size, but you might not know the size in advanced. Maybe you need two
synchronized streams on which you can multiplex a control channel? That is,
each pipe is two pipes. One request/response pair and the other a modal byte
stream vs typed message stream.

Overall, this is the nicest attempt at this idea I've seen yet. I've been
meaning to take a crack at it myself, but refused to do it without enough time
to re-create the entire standard Unix toolkit plus my own shell ;-)

~~~
alexlarsson
Regarding order: the dtools approach uses a stream (i.e. potentially infinite)
of variants. Each variant is a self-contained typed data chunk which is by
itself not "streamable" (i.e. you have to read all of it). The data chunk is
strongly typed and the type is self-describing.

The supported primitive types are: bool, byte, int16, uint16, int32, uint32,
int64, uint64, double, utf8 string (+ some dbus specific things).

These can be recursively combined with: arrays (of the same type), tuples,
dicts (primitive type -> any type maps), maybe types, and variant types.

In my dps example I generate a stream of dictionaries mapping from string to
variant (i.e. any type). The type of each item in the map differs. For
instance cmdvec is an array of strings, whereas euid is a uint32.

~~~
snprbob86
Thanks. I looked at the GVariant page a bunch too.

It seems like the encoding is a stream of _{type, value}_ pairs, where values
can contain per-type headers as well.

Protobufs, on the other hand, use _{field, wire-type, value}_ where _field_ is
required to have an externally known type to parse _value_, but _wire-type_
is sufficient to determine the length of the value, so you can skip unknown
fields (used for backwards-compatible protocols). In theory, required fields
could omit field and wire-type, but Protobufs deemed it more complexity than
the space and performance savings justify.

Primitive values like integers are totally expected in any format like
this. Their salient feature is that they're of known length. I'm a little
more leery about "arrays" or other data structures of variable length which
are encoded with a known length. Consider Pascal strings _{length, [chars]}_
vs C strings _{[chars], NUL}_. The latter lends itself much better to
streaming protocols, but the former is far simpler to work with when you have
a complete dataset.

I ran into this situation with a Protobuf I was designing where the first
attempt had a message with a repeated field, but it became obvious that I
wanted a begin message, a repeated message of singular fields, and then an end
message, to allow a fast start on the send, which didn't require knowing the
full data set length up front.

There are, however, situations where you _do_ want the length up front. For
example, if you need to allocate space to put things. You can get faster
parsing if you know the total message size immediately. In general, however, I
don't think it matters all that much with modern languages and hardware.

This is one reason why Clojure has both lists and vectors. Lists are lazy
head/tail pairs and (count some-vector) is a constant time operation.
Unfortunately, Clojure's reader doesn't seem to offer streaming reads of lists
(I may be wrong about this).

The bigger issue with unbounded values is that they are more difficult to work
with in most languages. Haskell, Lisps, and other functional languages fare
far better than most, but once you start mixing fixed-sized messages with
known fields, with variable-sized sequences, you wind up with a situation like
_{x, [ys], z}_ where a piece of code wants to look at _z_ before looking at
_ys_. If that tuple is represented as an associative structure _{:x 1, :ys [2
3], :z 4}_ then it's suddenly very confusing that it's an ORDERED map and all
sorts of assumptions go out the window.

Even more fundamentally: Source code is a serialized protocol. You write down
text, and the order of the characters on the page has meaning. Sometimes, that
order may be over-specified, but regardless, humans see order and make
assumptions from it, even when order doesn't matter.

------
forgotusername
I've quickly jotted some thoughts here:
<http://damnkids.posterous.com/rich-format-unix-pipes>

Regarding this version, standardizing on a particular transfer format is a bad
idea. If history has shown anything, it's that we like to reinvent this stuff
and make it more complicated than necessary (see also XDR, ASN.1, XML, etc. :)
pretty much on a 5 year cycle or thereabouts.

Do the bare minimum design necessary and let social convention evolve the
rest.

~~~
alexlarsson
Having too many different formats is also a problem though, as incompatible
formats mean you can't combine two apps in a pipeline.

The negotiation in dtools is done using an F_GETLK hack with a magic value
offset. That approach could easily be extended to support multiple formats.

~~~
forgotusername
The way I see it there are two distinct problems being coupled together, kinda
like inventing HTTP but defining it only to be used with (html, gif, png) or
something. The reality is that if your solution gets even some adoption,
conventions will quickly emerge based on actual use rather than _expected_
use, which almost never goes well. Additionally when some after-market use is
discovered that wasn't part of the original spec, yet makes fabulous sense,
existing implementations may be better positioned to deal with it (instead of
suddenly finding they're being fed PNG files which are actually base64-encoded
XML, or something mad like that, typical shoehorning crap).

Similarly, producing a big ecosystem of utilities to go along with it will
probably result in a bunch of 1970s style compatibility commands that nobody
actually uses any more (say, in 2030).

But feel free to bake a glib-specific serialization in and I'll feel free to
pass it up. ;)

Love your fcntl() hack. My kind of hack!

------
rhizome
If I can do a slight PG impression, "what problem does this solve?"

~~~
comex
Among others, this problem:

<http://www.dwheeler.com/essays/fixing-unix-linux-filenames.html>

find -print0 is a lame hack, and even filenames with spaces (not newlines) are
somewhat messy to work with on the Unix shell.

Or a little recurring problem I have: How do I grep the output of grep -C
(matches showing multiple lines delimited with a "--" line)? I wrote a custom
tool to do it, which does the job, but really it would be nice if I could use
all the normal line-based Linux tools (sort, uniq, awk, wc, sed) with a match
as a "line".

~~~
luriel
> find -print0 is a lame hack, and even filenames with spaces (not newlines)
> are somewhat messy to work with on the Unix shell.

This problem is simply a flaw in sh (and its descendants); other shells handle
it much better. See for example Tom Duff's rc shell: <http://rc.cat-v.org>

Also note that Plan 9, the successor to Unix (which uses the rc shell as
its main shell), doesn't even have a find command; find's design is not really
very unix-y.

As for your second question, the answer might be structural regular
expressions: <http://doc.cat-v.org/bell_labs/structural_regexps/>

~~~
comex
> This problem is simply a flaw in sh (and its descendants)

Indeed, although it's not _just_ sh; if you want to, say, make a table of
filenames and some attributes of each file, you're in trouble if the filenames
contain spaces (awk, cut, sort don't work as easily) and screwed if they
contain newlines.

What does Plan 9 use instead of find?

> As for your second questions, the answer might be structural regular
> expressions:

I've actually been meaning to write a clone of the command line portion of
sam, tack on some slightly more powerful features, and try living with it...
it would be able to solve much of that use case, but I think it would be
cleaner if all the normal tools just knew that the output of grep -C is, in
fact, a list of multiline strings.

~~~
p9idf
The Plan 9 approach is to avoid creating problems for yourself by not using
spaces in file names to begin with. The file server initially disallowed
spaces in file names just as nulls and slashes are disallowed. That
restriction has since been relaxed,† but everyone still avoids spaces. If you
cannot avoid files with spaces in their names, there exists trfs,†† a file
system that transparently replaces spaces with something more convenient.

Instead of find, I run /bin/test on a list of files. For anything more
complicated than what test can handle, I use Inferno's fs program.†††

† [http://swtch.com/cgi-
bin/plan9history.cgi?f=1999/0323/port/c...](http://swtch.com/cgi-
bin/plan9history.cgi?f=1999/0323/port/chan.c;v=diff;e=1)

†† <http://a-30.net/inferno/man/4/trfs.html>

††† <http://www.vitanuova.com/inferno/man/1/fs.html> It has a misleading name.
It is not a file server.

------
m_eiman
Also see <https://github.com/unconed/TermKit/>

------
sprobertson
I've been playing around with a similar idea - using plain JSON as the message
format, you can make a set of pipeable command line utilities for manipulating
data from many web APIs.

~~~
zootm
Have you seen RecordStream? It's a set of good CLI tools based on streams of
JSON which might be handy.

<https://github.com/benbernard/RecordStream>

I've used it a lot and it's a godsend for most "record-y" manipulation.

------
lubutu
I've noticed that the NUL-termination problem [1] has come up a number of
times in these comments. If you want a solution to this that isn't so drastic
as an object system, perhaps take a look at Usul [2], non-POSIX 'tabular' Unix
utilities which use an AWK-style $RS.

[1]: <http://news.ycombinator.com/item?id=4369699>

[2]: <http://lubutu.com/soso/usul>

------
enthalpyx
Similar idea: <http://code.google.com/p/recordstream/>

------
chris_wot
What about providing a filter that converts to whatever format you can think
of, e.g. one that outputs JSON or XML?

~~~
Groxx
Because what are you converting _from_? It can't be turtles all the way down,
at some point there must be a defined system that everything speaks. Adding
output formats after that is relatively simple.

~~~
chris_wot
The output is typed, in a standard format. He defines it in his post (cf. the
output of the dps program).

------
pjmlp
Actually I prefer PowerShell's approach of transferring objects, as it is more
flexible than standardizing on a specific transfer format.

But I do concede that it has the downside that if the object lacks the
properties you want to access, then it might be painful in some cases.

~~~
manojlds
But it is not like you cannot do normal string processing using cmdlets like
"Select-String". And an object missing a property is almost the same as a
column missing in the returned text output, right?

~~~
pjmlp
Good point, I've forgotten about that.

------
pka
I can't believe this. Just two or so weeks ago I set about writing exactly
this kind of thing in Haskell [1]. It's by no means complete or even working
at this point, but basically what I had in mind was something like:

    
    
        yls | yfilter 'mdate = yesterday && permissions.oread = true' | yformat -ls
    

Every tool emits or consumes "typed" JSON (i.e. JSON data with an additional
JSON schema). Why typed? Because then the meaning of things like _mdate =
yesterday_ can be inferred from the type of _mdate_ and mean different things
depending on whether _mdate_ is a string or a date. In the case of a date, the
expression _mdate = yesterday_ can automatically be rewritten to _mdate >=
201208110000 && mdate < 201208120000_ etc. In the case of a string we do
string comparison. In the case of a bool we emit an error if the compared-to
value isn't either _true_ or _false_, etc.

Basically, I wanted to build a couple of standard tools inspired by the FP
world, like filter, sort, map, fold (reduce), and have a universal tool for
outputting formatted data in whatever form is desired - be it JSON, csv files,
text files or custom formats. Every tool would support an -f parameter, which
means that its output is automatically piped through the format tool, so that
something like

    
    
        yls -fls
    

is functionally equivalent to

    
    
        yls | yformat -ls
    

which would output the JSON data from _yls_ in the traditional ls way on a
unix system.

    
    
        yls | yformat -csv
    

would output csv data. Some more examples:

    
    
        yls | yfold '+ size' 0
    

prints out the combined size of all files in the current directory.

    
    
        yls | ymap 'name = name + .jpg' | ymv
    

would append .jpg to all files in the current directory.

    
    
        ycontacts | yfilter -fcsv 'name = *John*'
    

would print out all Google contacts containing _John_ in their name as a csv
file.

    
    
        yps | yfilter 'name = java*' | yeval 'kill name'
    

would kill all processes whose names start with 'java'.

The cool thing about this is that this approach preserves one of the main
selling points of FP: composability. I.e. you can throw something like _yfold
'+ size' 0_ into a shell script and then write:

    
    
        yls | size.sh
    

This way people would be able to build an ever-growing toolbelt of abstracted
functionality specifically tailored to their way of doing things, without
losing composability.

[1] <https://github.com/pkamenarsky/ytools>

~~~
flogic
Personally, I'm not feeling the quotes and would prefer parens since they're
nestable.

------
Rovanion
So JSON?

