

Convert JSON to a Unix-friendly line-based format - adulau
https://github.com/dvxhouse/jsonpipe

======
haberman
"Everyone I know prefers to work with JSON over XML, but sadly there is a sore
lack of utilities of the quality or depth of html-xml-utils and XMLStarlet for
actually processing JSON data in an automated fashion, short of writing an ad
hoc processor in your favourite programming language."

Actually there is just such a suite of utilities! See
<https://github.com/benbernard/RecordStream>

~~~
adulau
RecordStream seems more complete but has many more dependencies. I really
like the simplicity of jsonpipe and how easy it is to use:

    
    
          curl -s "http://feeds.delicious.com/v2/json/adulau" | jsonpipe -s "#" \
            | grep "#u" | cut -f2 | sed -e 's/"//g' \
            | xargs -d "\n" wget -r -l 1 -p --convert-links
    

A simple example that makes a local mirror of my latest del.icio.us bookmarks...

~~~
jedsmith
"simple"

------
sigil
Maybe you're solving a different problem, but I'm not sure emitting one key-
value pair per line is ultimately the way to go:

    
    
      $ echo '[{"a": [{"b": {"c": ["foo"]}}]}]' | jsonpipe
      /   []
      /0  {}
      /0/a        []
      /0/a/0      {}
      /0/a/0/b    {}
      /0/a/0/b/c  []
      /0/a/0/b/c/0        "foo"
    

In my own work, I've built up a suite of stream-based nested record processing
tools that accept & produce JSON, protocol buffers, and a unix tab-delimited
format. For the unix format it's been more useful to stick to the standard
one-record-per-line thing, and let the user specify what fields to extract and
their order.
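That one-record-per-line, user-specified-fields idea can be sketched in a few
lines of Python (hypothetical function name; this is not the actual mill tool):

```python
import json

def extract_fields(lines, fields):
    """Read one JSON record per line; emit the chosen fields, tab-delimited."""
    for line in lines:
        record = json.loads(line)
        # Missing fields become empty strings instead of raising KeyError.
        yield "\t".join(str(record.get(f, "")) for f in fields)

# e.g. pull one column out of a tweets file:
# for row in extract_fields(open("tweets.json"), ["in_reply_to_screen_name"]):
#     print(row)
```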

Here's a depressing example of the fun you can have with new media and old
unix tools, to give you some idea:

    
    
      $ mill io -r json -w texty -W fields=in_reply_to_screen_name < tweets.json | \
        grep -v -E '^(None|)$' | sort | uniq -c | sed -e 's#^ *##' | \
        sort -k1,1 -nr | head -5
      258 justinbieber
      248 ddlovato
      184 gypsyhearttour
      164 Logindaband
      145 Louis_Tomlinson

------
jedsmith

        import os.path as p
    

Renaming imports to single letters bugs me. I went to see where it's used, and
it isn't. Binding pyflakes to Cmd+S in TextMate was the best thing I ever did,
and it would have caught this. pyflakes is seriously awesome in that role, and
pylint before committing.

I'm also surprised that simplejson is used, instead of the built-in json in
Python 2.6 and up. A good solution is:

    
    
        try: import json
        except ImportError: import simplejson as json

~~~
masklinn
> I'm also surprised that simplejson is used

2.6's json lacks simplejson's C accelerators [0], which makes it roughly 20
times slower than simplejson (2.7's json is also ~50% slower than simplejson,
as new performance optimizations were added in 2.1.0, as well as
memoizations). simplejson is also updated more often, which leads to it having
fewer bugs and interesting new features (the ability to natively serialize
decimals was added in 2.1, for instance).

simplejson should be the preferred import when available, with json as the
fallback.

[0] 2.6's json is simplejson 1.9; 2.7's is 2.0.9; the latest release of
simplejson is 2.1.3 and 2.1.4 is in preparation
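In code, that preference is simply jedsmith's snippet with the clauses
swapped:

```python
try:
    import simplejson as json  # external package: C accelerators, more releases
except ImportError:
    import json  # stdlib fallback, available in Python 2.6+
```

Either way, the rest of the program just uses the `json` name.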

~~~
jedsmith
I elect to stick with the standard library in all cases, so I know how my
software will perform everywhere without a redundant external dependency.
Sticking with the standard library means you'll eventually get those
improvements, too.

~~~
bretthoerner
> a redundant external dependency

Is pip install that hard? Surely any deployed project is already managing a
requirements.txt/buildout/setup.py/etc.

> means you'll eventually get those improvements

The C speedup extension for simplejson existed well before it was merged into
Python stdlib as "json". Are you so sure you'll eventually get said
improvement?

------
zpoley
Here is my contribution to Unix friendly JSON command line tools (requires
Node.js and NPM): <https://github.com/zpoley/json-command>. Here are a couple
other good ones: <http://kmkeen.com/jshon/> <https://github.com/micha/jsawk>

------
y0ghur7_xxx
Why not just use Rhino or any other stand alone javascript interpreter?

<http://www.mozilla.org/rhino/>

a simple example:

    
      echo 'x=[1,2,3];x[1]' | js

------
edd_dumbill
Line-based processing is still important! This work reminds me of an article I
wrote 11 years ago covering Sean McGrath's work on PYX, a line-based format
for XML: <http://www.xml.com/pub/a/2000/03/15/feature/index.html>.

That work in turn derived from Charles Goldfarb's work on SGML (ISO 8879) and
its ESIS format, dating from 1989.

We'll always be downsampling to something we can use with sed, grep and awk.
They're too handy not to.

~~~
wladimir
Does line-based processing still hold up? I tend to use it less and less these
days, in favor of tools that process records instead of lines. There's only so
much you can meaningfully store in a line of text; there is no standardized
parsing; and you run into all kinds of escaping issues once fields contain
embedded newlines or separators.

(FYI even syslog is moving from strictly line based to a more structured
format, RFC5424/5425)
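One middle ground is newline-delimited JSON: keep the standard one-record-per-
line framing, but let the JSON encoder do the escaping, so embedded newlines
and separators can never break a line. A minimal sketch:

```python
import json

records = [
    {"id": 1, "note": "line one\nline two"},  # embedded newline in a field
    {"id": 2, "note": "plain"},
]

# json.dumps escapes the newline as \n, so each record stays on one line
# and the framing survives whatever the field values contain.
lines = [json.dumps(r) for r in records]

# Round-trip: each line parses back to the original record.
parsed = [json.loads(line) for line in lines]
```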

------
wnoise
> Because the path components are separated by / characters, an object key
> like "abc/def" would result in ambiguous output. jsonpipe will throw an
> error if this occurs in your input, so that you can recognize and handle the
> issue. To mitigate the problem, you can choose a different path separator:

Ugh. I should never ever have to pick details of the format to work around
content in the format. The only real solution is escaping, though it does add
complexity.

~~~
asymptotic
Hahahah he uses a Unicode snowman as his example delimiter!

    
      $ echo '{"abc/def": 123}' | jsonpipe -s '☃'
      ☃   {}
      ☃abc/def    123

I'm ready to face the day with a smile on my face.

------
tebeka
    
      cat foo.json | python -m json.tool

~~~
vlisivka

      $ cat foo.json 
      { foo:"bar" }
      $ js -e "var doc=`cat foo.json`; print(doc.foo);"
      bar

------
sigil
Interesting. For faster flattening of nested structures into paths -- and the
inverse operation, unflattening -- you could use this Python C extension:

<https://github.com/acg/python-flattery>

Full disclosure, I'm the author. ;) It uses "." as the path separator, but
would be easy to allow "/".
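For reference, flatten/unflatten look roughly like this in pure Python (a slow
sketch of what python-flattery does in C, using its "." separator):

```python
def flatten(obj, prefix=""):
    """Flatten nested dicts/lists into a {path: leaf_value} mapping."""
    if isinstance(obj, dict):
        items = obj.items()
    elif isinstance(obj, list):
        items = enumerate(obj)
    else:
        return {prefix: obj}
    flat = {}
    for key, value in items:
        path = "%s.%s" % (prefix, key) if prefix else str(key)
        flat.update(flatten(value, path))
    return flat

def unflatten(flat):
    """Rough inverse: rebuild nesting as dicts (list indices become keys)."""
    nested = {}
    for path, value in flat.items():
        parts = path.split(".")
        node = nested
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = value
    return nested
```

Deciding whether "0" on the way back means a list index or a dict key is
exactly the kind of ambiguity the real library has to resolve.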

------
brendano
The use case is a bit different, but I wrote a little converter to TSV, adding
in a header. I mostly use it for input into R, but I use it a lot. It only
works for fairly flat JSON objects.
<https://github.com/brendano/tsvutils/blob/master/json2tsv>
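The core of such a converter is small. A sketch (hypothetical code, not
brendano's actual script) that builds the header from the union of keys, in
first-seen order:

```python
import json

def json_to_tsv(lines):
    """One flat JSON object per line in; TSV with a header row out."""
    records = [json.loads(line) for line in lines if line.strip()]
    header = []
    for record in records:
        for key in record:          # union of keys, first-seen order
            if key not in header:
                header.append(key)
    yield "\t".join(header)
    for record in records:
        # Missing keys become empty cells, keeping columns aligned.
        yield "\t".join(str(record.get(k, "")) for k in header)
```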

------
fforw
I've written a "json" command-line tool in Node.js that transforms JSON with
modern JavaScript expressions (with support for Array.map, Array.reduce,
etc.).

<http://fforw.de/post/scripting-json/>

