Hacker News
Jtc – CLI tool to extract, manipulate and transform source JSON (github.com)
122 points by vyuh 6 days ago | 39 comments

> jtc is written in idiomatic C++ (the most powerful programming language to date)

Citation needed. The `jq` vs `jtc` section is interesting, but the author seems a little full of himself in some of the explanations.

Oh yeah, immediately above that line:

> jq is written in C, which drags all intrinsic problems the language has dated its creation

And then to follow on the C++ claims:

> Main JSON engine/library does not have a single new operator, nor it has a single naked pointer acting as a resource holder/owner, thus jtc is guaranteed to be free of memory leaks (at least one class of the problems is off the table) - STL guaranty.

That's a lot more faith than I would be willing to put into C++ or C. Sure, that claim might be correct, but there are enough edge cases and undefined behavior in the language that I take it as a fairly bold claim unless the code has also been thoroughly reviewed for any places where undefined behavior may be induced.

I mean, it's probably fine, but just the willingness to make such a bold claim in that way communicates the opposite of what the author likely intended for me.

> jq is written in C, which drags all intrinsic problems the language has dated its creation

I can't even parse this sentence. What does it mean?

FWIW jq was originally written in Haskell, then ported to C. The C source for the core parts is strikingly clean. I think it's partly from this Haskell heritage that jq gets its nice composability.

However, TBF, I do find I must refresh my memory of jq syntax whenever I use it (probably only because I don't process JSON often). It will be interesting to see what jtc does - though this foreshadowing does not bode well...

> "undefined behaviour"

Yeah, you don't know what you're talking about.

"Undefined behavior" simply means "stuff not covered by the ISO standard".

So Python and Rust are 100% UB, and people don't really care.

Undefined behavior can cause code that functions perfectly under one compiler version to be elided completely under a different version, as we've seen numerous times in the past. Review the article and comments here[1] for a good example of exactly this, with discussion of the how and why. The TL;DR: a specific optimization in a specific compiler release determined that adding two variables, one of which was a pointer, could only produce an unexpectedly small result through overflow, which is undefined for pointers. It therefore elided the check for whether the result was smaller than should be possible (i.e., had overflowed), along with the entire error-handling branch behind that check. The value was then used while invalid, causing a segfault.

Perhaps the features used in this project protect against that, but my point is that undefined behavior and the way compilers deal with it is variable and problematic.

1: https://news.ycombinator.com/item?id=14163111

It's definitely problematic, but so is having your interpreter completely redefine language constructs on every point release. Or having different interpreter implementations act differently because there's no standard.

Somehow we make do in these cases and don't act like the sky is falling.

I do agree with the author that the learning investment required by jq (learning a full blown DSL) is total overkill for most simple JSON based tasks.

For those, I mostly use "jq ." to get all leaves on a single line and then feed the output to standard unix tools like grep cut awk and friends.

And for more complex tasks, python, perl, even C++ if speed is needed.

> For those, I mostly use "jq ." to get all leaves on a single line and then feed the output to standard unix tools like grep cut awk and friends.

There's a tool called `gron` designed for workflows like this, I've found it incredibly useful.

https://news.ycombinator.com/item?id=16727665 https://github.com/tomnomnom/gron/
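
For anyone wondering what that kind of flattening amounts to, here is a rough Python sketch of the idea. It is not gron's implementation and does not reproduce gron's exact output format; the `flatten` function and the sample document are made up for illustration. Each node becomes one self-contained line, which is what makes the result greppable.

```python
import json


def flatten(value, path="json"):
    """Yield one 'path = value;' line per node (illustrative format only)."""
    if isinstance(value, dict):
        yield f"{path} = {{}};"
        for key, child in value.items():
            # json.dumps quotes and escapes the key for us.
            yield from flatten(child, f"{path}[{json.dumps(key)}]")
    elif isinstance(value, list):
        yield f"{path} = [];"
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}[{index}]")
    else:
        # Scalars (strings, numbers, booleans, null) are leaves.
        yield f"{path} = {json.dumps(value)};"


if __name__ == "__main__":
    # Hypothetical sample input.
    doc = json.loads('{"name": "Ann", "children": ["Bo", "Cy"]}')
    print("\n".join(flatten(doc)))
```

Because every line carries its full path, piping the output through grep keeps enough context to see where a match came from.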

> the learning investment required by jq (learning a full blown DSL) is total overkill for most simple JSON based tasks.

There's no reason why you would need to learn the whole DSL for simple tasks. Just learn what you need. For simple stuff, the DSL is also simple.

jtc also seems to have its own DSL by the way, and it doesn't really seem more intuitive than jq's:

  jtc -x'[0][:][name]<person>v [-1][children]<kids:false>f[0]<kids:true>v' -T'{"{person} has children":{kids}}' -r

That's apparently the equivalent of:

  jq '.Directory | map({"\(.name) has children": (.children | length == 0)}) | .[]'

There are some things that seem simpler on jtc, though, like searches of string values without regard to the json structure.
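
To show what a structure-agnostic search boils down to, here is a small Python sketch. It is only an illustration of the concept, not how jtc implements it, and it assumes substring matching; `find_strings` and its path tuples are names I made up.

```python
def find_strings(value, needle, path=()):
    """Yield (path, string) for every string value that contains needle.

    The path is a tuple of dict keys and list indexes leading to the match.
    """
    if isinstance(value, str):
        if needle in value:
            yield path, value
    elif isinstance(value, dict):
        for key, child in value.items():
            yield from find_strings(child, needle, path + (key,))
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from find_strings(child, needle, path + (index,))
    # Numbers, booleans and null are ignored: only string values are searched.


if __name__ == "__main__":
    # Hypothetical sample document.
    doc = {"a": [{"city": "New York"}, "York"], "b": 3}
    for path, text in find_strings(doc, "York"):
        print(path, text)
```

The caller never has to know where in the tree the strings live, which is the convenience being described.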

I’d really like a CLI tool which solves this problem using a syntax I already know. I struggle with jq because I use it quite infrequently.

IMO the ideal solution is something using pure JavaScript syntax, possibly with a library resembling jQuery for tree traversal.

Most of jq's syntax is just a mix of JavaScript and a typical shell's. Every expression has an input and output like a shell command's stdin and stdout. Functions work on arguments and their stdin to produce their stdout.

Here's an explanation of the syntax in the command I posted:

  jq '
    # output the value at "Directory" from input object
    .Directory
    # pipe to map (JavaScript also has map()). The argument of map works in
    # the context of each element in the input array of map.
    | map(
      # Produce an object where the property name is a string that
      # interpolates the value of the "name" property of this element.
      # Instead of interpolating, we could have also used this more
      # JavaScript-ish (ES5) syntax:
      #   {(.name + " has children"): ...
      # The property value is an expression that gets the value of the
      # "children" property and pipes it to the expression `length != 0`.
      # `length` (which JavaScript also has) outputs the length of the
      # piped input, and then we compare that with 0.
      {"\(.name) has children": (.children | length != 0)}
    )
    # The output of map is a single record which is an array. We pipe that
    # to .[] to make multiple records, each an element of the array. The
    # syntax here consists of 2 parts: `.`, which is the input object,
    # and `[]` which is the subscript syntax without an index.
    | .[]
  '

> IMO the ideal solution is something using pure JavaScript syntax

The greatest advantage of the current syntax is the great balance it has between legibility and terseness. I don't think making it pure JavaScript would be better.

> possibly with a library resembling jQuery for tree traversal

Using jQuery in JavaScript to traverse JavaScript objects? I don't know what to say...

Something like this would do the same, if somebody implemented a tool called jsed:

    jsed '
        x => [
            x.name + " has children",
            x.children && x.children.length != 0
        ]
    '

Interestingly enough there is already a tool called jsed which seems to kind of do this... https://www.npmjs.com/package/jsed

Edit: Note the jQuery part of it is for advanced cases like searching for specific nodes in the tree, then navigating back up to the parent. Basically the cases like "<Work>[-1][children]" from the jtc guide, I would write as: '$("Work").parent().find("children")'

That was supposed to be a `!=` instead of a `==`.

  jq '.Directory | map({"\(.name) has children": (.children | length != 0)}) | .[]'
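
For comparison, roughly the same transformation in plain Python. This is a sketch that assumes the input is an object with a "Directory" array whose elements each have "name" and "children"; the sample data and the `has_children_records` name are made up for illustration.

```python
import json

# Hypothetical input shaped the way the query expects.
SAMPLE = """
{"Directory": [
  {"name": "John", "children": ["Olivia"]},
  {"name": "Ivan", "children": []}
]}
"""


def has_children_records(doc):
    """Build one single-key object per entry, like map(...) followed by .[]"""
    return [
        {f"{entry['name']} has children": len(entry["children"]) != 0}
        for entry in doc["Directory"]
    ]


if __name__ == "__main__":
    # Print one JSON record per line, as jq does for multiple outputs.
    for record in has_children_records(json.loads(SAMPLE)):
        print(json.dumps(record))
```

One difference to keep in mind: this version raises a KeyError if an entry has no "children" key, so it is stricter about missing fields than the jq one-liner.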

The author has also written related tools. One to convert XML to JSON and back (https://github.com/ldn-softdev/jtm) and another to convert JSON to SQLite tables (https://github.com/ldn-softdev/jsl). Combining these with the hxnormalize tool ( https://www.w3.org/Tools/HTML-XML-utils/man1/hxnormalize.htm...), one can do very sophisticated manipulation on HTML web pages.

HTML -> XML (via hxnormalize) -> JSON (via jtm) -> process using jtc (or even jq)

> convert XML to JSON and back

This is basically impossible to do in a way that is compatible with other tools. Things like duplicate attributes of an object can exist in XML, but not in JSON. You can still work-around these limitations if you just have a pipeline using the same toolset, but part of the point of these tools is to then convert them back to a format that some other tool can use, which is where this pattern breaks down.

Here's a list of pitfalls: https://stackoverflow.com/questions/33072812/potential-probl...
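
To make the collision concrete, here is a minimal Python sketch using only the standard library. It reads "duplicate attributes" as repeated child elements, which is my assumption about what is meant; the `person`/`phone` document and both helper functions are made up for illustration and say nothing about how jtm actually does the conversion.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical XML with a repeated child element.
XML = "<person><name>Ann</name><phone>111</phone><phone>222</phone></person>"


def naive_to_dict(element):
    """Map child tag -> text. A repeated tag silently overwrites the earlier one."""
    return {child.tag: child.text for child in element}


def grouped_to_dict(element):
    """Map child tag -> list of texts, so repeated tags survive the conversion."""
    result = {}
    for child in element:
        result.setdefault(child.tag, []).append(child.text)
    return result


if __name__ == "__main__":
    root = ET.fromstring(XML)
    print(json.dumps(naive_to_dict(root)))    # the first phone number is lost
    print(json.dumps(grouped_to_dict(root)))  # both phone numbers are kept
```

The grouped form keeps the data, but only tools that know about the "everything is a list" convention can turn it back into the original XML, which is the compatibility problem described above.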

This suggests a very scalable, easy approach to extract data from somewhat regular HTML...

I generally use xidel [1] for that type of task. Feed it xpath, css selectors or its own pattern matching thing.

[1] https://github.com/benibela/xidel

or just use xpath

At first I was a little skeptical of this tool's claims of being simpler than jq, as I struggled to understand why (for instance) they were using <> in some parts of their example and [] in others, but after going through their step-by-step explanation it all made sense, and it does seem very simple. A tool that simplifies things is always welcome.

The "User Guide.md" file is over 3000 lines long, comprising some 133KB. I read through about a quarter of it, before my eyes started to glaze over.

I'm not convinced that I would call this "simpler" than jq.

jq is one of the most intentionally designed pieces of software I've seen. In due time, I expect it to become as ubiquitous and indispensable as ls, grep, sed and awk. The jq language could use a formal spec, but other than that jq has a sublime elegance.

By contrast, this seems like one of the least intentionally designed pieces of CLI software I've seen.

Looks neat, but seems slow?

  $ wc -l conn.log
    8505 conn.log
  $ cat conn.log |time ~/src/jtc-standard.json/jtc -a -w '<uid>l' -qq  |shasum
        0.44 real         0.43 user         0.00 sys
  167f6d638a4ccf9e0be1ff4ed74caa01a461bd9a  -
  $ cat conn.log | time jq -r .uid|shasum
        0.10 real         0.10 user         0.00 sys
  167f6d638a4ccf9e0be1ff4ed74caa01a461bd9a  -
  $ cat conn.log | time json-cut uid |shasum
        0.04 real         0.01 user         0.01 sys
  167f6d638a4ccf9e0be1ff4ed74caa01a461bd9a  -

json-cut is a crappy tool I wrote that uses github.com/buger/jsonparser

It probably should not invent its own language or, if it does, it should take inspiration from XPath/JSONPath, which are arguably simpler.

I agree. Modern SQL supports JSONPath. PostgreSQL 12 has support for it. A lot of people are/will be familiar with it.

Be careful with terminology: the thing defined in SQL:2016 is called the SQL/JSON path language, and although it’s similar to the older JSONPath, it’s not the same.

PostgreSQL also calls the SQL/JSON path language “JSON Path”, and the data type is called `JSONPATH`.

Thanks for the clarification! I guess it would make a lot of sense for a tool such as jtc to use SQL/JSON path language then. I'm sure it'll be the most common JSON query language soon.

> JsonPath has some drawbacks, such as a lack of operators for reaching parent or sibling nodes, -- https://www.baeldung.com/guide-to-jayway-jsonpath

Unfortunately, ".." just doesn't fit nicely with the "." operator of C-style languages.

I'm almost inclined to go full-XPath, and use "/", so I can use "..".

Sorry, but the API seems quite obtuse at first glance. I’ll just use Ruby or JavaScript or literally any modern language with good JSON support.

JSON functionality is a subset of JavaScript. Surely the most sensible approach, therefore, is to pipe to a JavaScript runtime such as Node.js? I can see how a C/C++ lib might be useful, but why build a standalone CLI just for JSON manipulation? Surely this functionality exists already? Or am I just missing something?

What makes this different from the 300 JSON libraries provided by and for the programming languages we already use?

If I really wanted a CLI, I would use one of those languages to write a program that reads arguments and makes use of one of those JSON libraries.
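
As a rough idea of how little that takes, here is a Python sketch using only the standard library. The dot-separated path syntax and the `extract` function are my own invention for illustration, not something borrowed from jq or jtc.

```python
import json
import sys


def extract(doc, path):
    """Walk doc along a dot-separated path; numeric parts index into lists."""
    current = doc
    for part in path.split("."):
        if isinstance(current, list):
            current = current[int(part)]
        else:
            current = current[part]
    return current


# Only act as a CLI when exactly one path argument is given, e.g.:
#   python extract.py Directory.0.name < input.json
# (extract.py is a hypothetical file name.)
if __name__ == "__main__" and len(sys.argv) == 2:
    print(json.dumps(extract(json.load(sys.stdin), sys.argv[1])))
```

This covers plain lookups only. Anything like filtering, searching or templating would need more code, which is arguably where the dedicated tools start to pay off.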

That's a complicated query tool. I'll be sticking with Taskwarrior for my simple use cases.


> jtc stand for: JSON test console, but it's a legacy name, don't get misled.

It's time to stop admitting that (if the name is so terrible) and re-brand or, more likely, come up with a better backronym meaning for your name.

JSON Transform Command

That is pretty perfect - the term usage is a bit unnatural but nobody will notice since developers use weird names for things all the time (thanks, Global Regular Expression Print!)

"JSON tool CLI"? Idk, naming is hard

Can anyone compare to jq? https://stedolan.github.io/jq/
