Show HN: Jb / json.bash – Command-line tool (and bash library) that creates JSON (github.com/h4l)
183 points by h4l 5 months ago | hide | past | favorite | 52 comments
jb is a UNIX tool that creates JSON, for shell scripts or interactive use. Its "one thing" is to get shell-native data (environment variables, files, program output) to somewhere else, using JSON to encapsulate it robustly.

I wrote this because I wanted a robust and ergonomic way to create ad-hoc JSON data from the command line and scripts. I wanted it to not let errors pass silently, not coerce data types, and not put secrets into argv. I wanted to leverage shell features/patterns like process substitution, environment variables, reading/streaming from files and null-terminated data.

If you know of the jo program, jb is similar, but type-safe by default and more flexible. jo coerces types, using flags like -n to coerce to a specific type (number for -n), without failing if the input is invalid. jb encodes values as strings by default, requiring type annotations to parse & encode values as a specific type (failing if the value is invalid).

If you know jq, jb is complementary in that jq is great at transforming data already in JSON format, but it's fiddly to get non-JSON data into jq. In contrast, jb is good at getting unstructured data from arguments, environment variables and files into JSON (so that jq could use it), but jb cannot do any transformation of data, only parsing & encoding into JSON types.

I feel rather guilty about having written this in bash. It's something of a boiled frog story. I started out just wanting to encode JSON strings from a shell script, without dependencies, with the intention of piping them into jq. After a few trials I was able to encode JSON strings in bash with surprising performance, using array operations to encode multiple strings at once. It grew from there into a complete tool. I'd certainly not choose bash if I was starting from scratch now...
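To give a flavour of the kind of trick involved (this is a hypothetical sketch, not json.bash's actual code), bash can escape a whole array of strings with a handful of array-wide pattern substitutions, rather than looping per string. The `batch_escape` function and variable names here are made up for illustration; namerefs need bash 4.3+.

```shell
# Sketch: batch-escape JSON strings with array-wide pattern substitution.
# One ${arr[@]//pat/rep} expansion processes every element at once,
# avoiding a slow bash-level loop per string.
batch_escape() {
  local -n _in=$1 _out=$2
  _out=("${_in[@]//\\/\\\\}")    # backslashes first, so later escapes survive
  _out=("${_out[@]//\"/\\\"}")   # then double quotes
  _out=("${_out[@]//$'\n'/\\n}") # then literal newlines
}

strings=('plain' 'has "quotes"' $'two\nlines')
batch_escape strings escaped
printf '"%s"\n' "${escaped[@]}"
```

(The real library handles the full set of JSON control-character escapes; this only shows the batching idea.)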




This is incredibly high-quality bash programming. As a fellow bash freak I am studying this code, and even I am learning some new techniques.

https://github.com/h4l/json.bash/blob/main/json.bash

You've boiled it down to a set of very elegant constructs. Respect. Thank you @h4l, this is badass.

I hope you follow up with a golang or rust implementation, that would really be something else.

P.S. I noticed the following odd behaviors with escaping delimiters (e.g. "="); is there a way to get an unescaped equal sign as the trailing part of a key or the leading part of a value?

  $ docker container run --rm ghcr.io/h4l/json.bash/jb msg=Hi
  {"msg":"Hi"}
  $ docker container run --rm ghcr.io/h4l/json.bash/jb msg=\=Hi
  {"msg=Hi":"msg=Hi"}
  $ docker container run --rm ghcr.io/h4l/json.bash/jb "msg=\=Hi"
  {"msg":"\\=Hi"}
  $ docker container run --rm ghcr.io/h4l/json.bash/jb "msg\==\=Hi"
  {"msg\\=\\":"Hi"}
  $ docker container run --rm ghcr.io/h4l/json.bash/jb "msg\\==\=Hi"
  {"msg\\=\\":"Hi"}
  $ docker container run --rm ghcr.io/h4l/json.bash/jb "msg\\===Hi"
  {"msg\\=":"Hi"}


Thank you, that's high praise! I learnt a lot about bash writing this, but I've also not looked at the code in a few months, and it's already starting to look quite intimidating!

I definitely like the idea of a golang/rust implementation; there are certainly things I could improve.

So the argument syntax escapes by repeating a character rather than with a backslash. I chose this because with backslash escapes it would be unclear whether a backslash belonged to the shell syntax or the jb syntax, and users could end up needing to double-escape backslashes, which is no fun! Whereas the shell will always pass two copies of a character like =, : or @ through unchanged.
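The shell-side half of the problem can be seen with plain printf, no jb involved: an unquoted backslash is consumed by the shell before the program ever sees its argument, while a doubled character arrives intact either way.

```shell
# Plain bash: how the shell treats backslash vs doubled-character escapes
# before the program ever sees its arguments.
printf '%s\n' msg=\=Hi     # shell eats the backslash: prints msg==Hi
printf '%s\n' "msg=\=Hi"   # quoted: the backslash survives: prints msg=\=Hi
printf '%s\n' msg===Hi     # doubled '=' passes through unchanged: msg===Hi
```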

The downside of escaping by doubling is that the syntax can be ambiguous, so sometimes you need to include the middle type marker to disambiguate the key from the value. But the type can be empty, so a bare : works:

    $ jb ===msg==:==hi=
    {"=msg=":"=hi="}
In the key part, the first = begins the key, the == following are an escaped =. The first = following the : marks the value, and everything after is not parsed, so =hi= is literal.

When you have reserved characters in keys/values (especially if they're dynamic), it's easiest to store the values in variables and reference them with @var syntax:

    $ k='=msg=' v='=hi=' jb @k@v
    {"=msg=":"=hi="}


How is this different from https://github.com/kellyjonbrazil/jc ?


jc has many parsers for the specific output formats of various programs; it can automatically create a JSON object structure using its knowledge of each format.

jb doesn't have high-level knowledge of other formats; instead it can read from common shell data sources, like command-line arguments, environment variables and files. It gives you ways to pull several of these sources into a single JSON object.

jb understands some simple/general formats commonly used in shell environments:

- key=value pairs (e.g. environment variable declarations, like the `env` program prints)

- delimited lists, like a,b,c,d (any character can be the delimiter), including null-delimited lists (commonly used to separate lists of file paths)

- JSON itself — jb can validate and merge together arrays and objects

You can use these simple sources to build up a more complex structure, e.g. using a pipeline of other command line tools to generate null-delimited data, or envar declarations, then consuming the program's output via process substitution <(...). (See the section in the README that explains process substitution if you're not familiar, it's really powerful.)
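As a plain-bash illustration of the null-delimited case (no jb involved), process substitution plus `mapfile -d ''` handles items that whitespace splitting would mangle:

```shell
# Read null-delimited items via process substitution. NUL is the one byte
# that can't appear in a file path, so names containing spaces or even
# newlines survive intact. Requires bash 4.4+ for mapfile -d ''.
mapfile -d '' -t files < <(printf '%s\0' 'a file.txt' $'line\nbreak.log')
printf 'got %d items\n' "${#files[@]}"
```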

So jb is more suited to creating ad-hoc JSON for specific tasks. If you had both jc and jb available and jc could read the source you need, you'd prefer jc.


As well as anyone's general thoughts/experiences, I'd appreciate opinions on the error handling mechanism jb uses to detect errors in upstream jb processes that it's reading from.

Normally, detecting errors on the other end of a pipe requires care in a shell environment (e.g. retrospectively checking PIPESTATUS). I used an approach I've called Stream Poisoning. It takes advantage of the fact that control characters are never present in valid JSON. When jb fails to encode JSON, it emits a Cancel control character[1] on stdout. When jb encounters such a character in an input, it can tell the input it's reading from is truncated/erroneous. This avoids the typical problem of a pipe silently being read as an empty file.

I've got a page explaining this with some examples here: https://github.com/h4l/json.bash/blob/main/docs/stream-poiso... I can imagine using control characters in a text stream being rather controversial, but I feel it works quite well in practice.

[1]: https://en.wikipedia.org/wiki/Cancel_character
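The idea can be sketched in a few lines of plain bash (the `read_json_stream` function here is hypothetical, for illustration only, not json.bash's real code):

```shell
# Minimal sketch of stream poisoning. A failing encoder emits a Cancel
# byte (0x18) on stdout; any reader that sees that byte knows the stream
# is truncated, even though the pipe itself delivered data "successfully".
CAN=$'\x18'
read_json_stream() {
  local data
  data=$(cat)
  if [[ $data == *"$CAN"* ]]; then
    echo 'error: stream poisoned, upstream encoder failed' >&2
    return 1
  fi
  printf '%s\n' "$data"
}

# A truncated stream ending in a Cancel byte is rejected, not read as-is:
printf '{"truncated":\n%s' "$CAN" | read_json_stream || echo 'caught failure'
```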


What happens if the next program in the pipe is not jb? Does jb also exit with a code?

For example `jb | jq`, where jq or a similar program discards the cancel character.

(Away from pc, unable to check right now.)


Good question! Yep, jb exits with non-zero:

    $ jb size:number=oops; echo $?
    json.encode_number(): not all inputs are numbers: 'oops'
    json(): Could not encode the value of argument 'size:number=oops' as a 'number' value. Read from inline value.
    ␘
    1
If you pipe the jb error into jq, jq fails to parse the JSON (because of the Cancel ctrl char) and also errors:

    $ jb size:number=oops | jq
    json.encode_number(): not all inputs are numbers: 'oops'
    json(): Could not encode the value of argument 'size:number=oops' as a 'number' value. Read from inline value.
    parse error: Invalid numeric literal at line 2, column 0

    $ declare -p PIPESTATUS
    declare -a PIPESTATUS=([0]="1" [1]="4")
So jq exits with status 4 here.


I find writing bash really gratifying - almost relaxing - but you're right, it also makes me feel kind of 'guilty', especially when I start getting carried away and reading tput docs.

However, I think in your case the rationale in the performance section of your Readme totally makes sense, and every single use case I can think of for this would prioritise minimal latency over increased throughput. I've seen init containers that would execute probably 100x faster with this for the exact reasons you point out. I'm quite curious as to what you would choose instead of bash if you were starting from scratch now?

FYI Shellcheck has a couple of superficial nits that you might wanna address (happy to send a PR). And your Readme is great.


I did find it quite satisfying to coerce bash into doing this while maintaining decent performance. I definitely came to appreciate some aspects of bash more from this, but it's so easy to shoot yourself in the foot!

If I started from scratch now I'd use a compiled language that could produce a single static binary and start with really low latency. I'm pretty sure jo isn't tuned for startup time; if they optimised that, they should be able to get it way faster than bash can start up and parse json.bash. I was pretty surprised that bash can currently start faster!

The codebase is basically at the limit of what I'd want to do with bash, but there are features I could add if it was in a proper programming language. e.g. validating :int number types, pretty-printing output, not needing the :raw type to stream JSON input.

Thanks for the heads up on Shellcheck, I'd be happy to take a PR if you'd like to send one.


I like the syntax to send typed values from the terminal:

    jb id=42 size:number=42 surname=null data:null

    => {"id":"42","size":42,"surname":"null","data":null}
I never had the need to use typed arguments in bash, but if I ever have it, this might be the syntax I'd use.

In fact, I was thinking about such a syntax recently. I am writing a tool which lets you call functions in Python modules from the command line. At first, I thought I'd need to define the argument types on the command line. But then I decided it is more convenient to use inspection and auto-convert the values to the needed types.


Glad to hear, this was something I wanted to make reliable, ergonomic and intuitive. I figured a lot of languages use `: type` to declare types.

The same using jo would be like this, which I find harder to type and remember:

  jo -- -s id=42 -n size=42 -s surname=null data=null
  {"id":"42","size":42,"surname":"","data":null}
Notice that surname comes out as the empty string though, I think this must be a bug in jo!


> I like the syntax to send typed values from the terminal:

Incidentally, this syntax shows a notation that is clearly superior to json (at least for non-nested stuff). If all you need is this, you'd be better off by avoiding json altogether.

[Rant: if json is so unergonomic that people keep inventing alternatives like this syntax and stuff like "gron" to de-jsonise their lives, maybe using json was always a bad idea, after all... I guess in a decade everybody will look at json with the same disdain as we do XML today.]


> shows a notation that is clearly superior to json

I don’t see that at all? Why is `n:number=1` superior to `{n:1}`? If anything, CLI commands are awful for anything other than strings.


But strings are often the most common case (or even, the only case that is needed). And they need much less punctuation. Compare:

    a=1 b=2 c=3
with

   {"a": "1", "b": "2", "c": "3"}
the json version needs 19 punctuation characters just to define three variables, against the bash version that only has 3. Which one would you prefer to type with your keyboard?


Depends on the shell. The following parses as a number in Murex:

    %{n:1}
https://murex.rocks/parser/create-object.html

I’m sure you can do similar things in other modern shells too. So the real problem is that people are stuck on the constraints of 1970s command lines.


{"password":"hunter2"}

A man of culture I see.

This looks really useful where you don't want to introduce another scripting VM just to spit out some JSON; e.g. I have used Ruby a lot for this in the past.

I can see myself using this in container init scripts and other very low dep environments to format config files from env vars etc.


I wouldn’t do that to my users; the happy path here is nice, but the unhappy path seems very likely to end in garbled bash errors that are impossible to track down.

As a user, I’m fine with embedding a reasonably small VM to handle the configs; disk space is cheap. Better yet would be a compiled binary that handles it, but that feels like asking a lot of maintainers.

There’s a lot of surface area for someone to mis-quote stuff in their environment and generate unintelligible bash errors.

Or that may be just me; I hate bash in general, so maybe it’s just that bleeding over.


How did you guess my password?!?!

This is just the kind of use case I had in mind. Something I've considered is publishing a mini version with only the json.encode_string function, as that's enough to create an array of JSON-encoded strings and use a hard-coded template with printf to insert the JSON string values.

That would be a fraction of the overall json.bash file size.
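Such a mini version might look something like this (a sketch covering only the common escapes; the real json.encode_string also handles the full set of control characters, and the function name here is just illustrative):

```shell
# Sketch of a minimal JSON string encoder plus hard-coded printf template.
json_encode_string() {
  local s=$1
  s=${s//\\/\\\\}      # backslash first, so later escapes survive
  s=${s//\"/\\\"}      # double quote
  s=${s//$'\n'/\\n}    # newline
  s=${s//$'\t'/\\t}    # tab
  printf '"%s"' "$s"
}

msg=$'hello "world"\ttabbed'
printf '{"msg":%s}\n' "$(json_encode_string "$msg")"
```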


RIP bash.org!


I found the JSON array syntax a little unintuitive:

    $ jb dependencies:[,]=Bash,Grep
    {"dependencies":["Bash","Grep"]}
One possible alternative would be to accept JSON literal snippets, like this:

    $ jb dependencies='["Bash", "Grep"]'
This should support all forms of nested JSON objects. You could have a rule that if an argument does NOT parse as a valid JSON value it is treated as a raw string, so this would work:

    $ jb foo=bar bar='"this is a well formed string"'
    {"foo": "bar", "bar": "this is a well formed string"}
You could even then nest jb calls like this:

    $ jb foo=$(jb bar=baz)
    {"foo": {"bar": "baz"}}


Thanks for giving it a try and your feedback. I agree, the array splitting is a bit fiddly. It is actually possible to pass JSON directly, you use the :json type on the argument:

    $ jb dependencies:json='["Bash","Grep"]'
    {"dependencies":["Bash","Grep"]}

    $ jb foo=bar bar:json='"this is a well formed string"'
    {"foo":"bar","bar":"this is a well formed string"}
And then you can indeed use command substitution to nest calls:

    $ jb foo:json=$(jb bar=baz)
    {"foo":{"bar":"baz"}}
It works even better to use process substitution: this way the shell gives jb a path to a file to read, so you don't need to quote the $() to stop whitespace breaking things:

    $ jb foo:json@<(jb msg=$'no need\nto quote this!')        
    {"foo":{"msg":"no need\nto quote this!"}}
Another option is to use jb-array to generate arrays. (jb-array is best for tuple-like arrays with varying types):

    $ jb dependencies:json@<(jb-array Bash Grep)
    {"dependencies":["Bash","Grep"]}
And if you use it from bash as a function, you can put values into a bash array and reference it:

    $ source json.bash
    $ dependencies=(Bash Grep)
    $ json @dependencies:[]   
    {"dependencies":["Bash","Grep"]}


BATS was a little heavy for me as a testing dependency for my own use (I ended up writing what I intended to be "the most minimalist shell testing library possible", see below; I think it still needs work though!), but I at least want to commend you for having what looks like a great test suite to begin with!

https://github.com/pmarreck/tinytestlib


It's nice to have compact single-file dependencies like this! I like the look of your assertions (checking out, err & status). I definitely found myself writing my own assertions to get understandable errors.


Amazing tool and syntax. Hat down!


Thanks!


Amazing bash programming skills, this is so cool that I want to find a problem to solve using it right now!!


Appreciated. Slightly related: https://github.com/bashtools/JSONPath.sh, JSONPath handling in bash, very usable for huge files.


I wonder if I could use this on my project which uses multiple glue functions to piece together JSON strings. https://github.com/fieu/discord.sh


If it helps, there's a little example of using the bash API with bash variables/arrays, should give you an idea of how it could be to use: https://github.com/h4l/json.bash/blob/main/examples/notify.s...

This example uses the pattern of setting out=varname when calling a json function; the encoded JSON goes into the $varname variable. This pattern avoids the overhead of forking processes (e.g. subshells) when generating JSON.

Otherwise you can use the more normal approach of jb writing to stdout, and capturing the output stream.
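The out=varname idea can be sketched in plain bash like this (`encode_pair` is a made-up function for illustration, not part of the json.bash API):

```shell
# Sketch of the out=varname pattern: printf -v writes the result straight
# into the named variable, so no subshell is forked the way a $(...)
# capture would fork one.
encode_pair() {
  local key=$1 value=$2
  printf -v "${out:?out must name a variable}" '{"%s":"%s"}' "$key" "$value"
}

out=result encode_pair greeting hello
echo "$result"
```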


jq has quite a few ways to input things other than from stdin and files. Some examples:

  $ abc=123 jq -cn '$ENV | {abc, othername: .abc}'
  {"abc":"123","othername":"123"}

  $ jq -cn --arg abc 123 '{$abc, othername: $abc}'
  {"abc":"123","othername":"123"}

  $ jq -cn --argjson abc 123 '{$abc, othername: $abc}'
  {"abc":123,"othername":123}


jshn.sh: https://openwrt.org/docs/guide-developer/jshn src: https://git.openwrt.org/?p=project/libubox.git;a=blob;f=sh/j... :

> jshn (JSON SHell Notation), a small utility and shell library for parsing and generating JSON data


Is there a minimum bash version required? I.e. will it work with bash 3 or whatever ships with macos by default?


There is; the earliest version I've tested with is 4.4.19, but ideally use a 5.x version. Bash 3 certainly won't work, I'm afraid. If you use Homebrew on Mac, it's a good way to get the latest bash.


This is great! no doubt I'll be reaching for it very soon.



Awesome, thanks for packaging it!


Yeah if you need this it's definitely a sign you shouldn't be using Bash.

Can you give a concrete example of when this is the sanest option?


Two main situations I think. The first is just interactive use in any shell to encode ad-hoc JSON. If you have a next-gen shell which can handle structured data directly, then you probably don't need it.

Second is situations where you'd rather not add an additional dependency, but bash is pretty much a given. For example, CI environments, scripts in dev environments, container entrypoints. Or things that are already written in bash.

I don't advocate writing massive programs in bash, for sure it's better to turn to a proper language before things get hairy. But bash is just really ubiquitous, and most people who do any UNIX work will be able to deal with a bit of shell script.


> Second is situations where you'd rather not add an additional dependency, but bash is pretty much a given. For example, CI environments, scripts in dev environments, container entrypoints. Or things that are already written in bash.

Is this tool not an additional dependency?

> But bash is just really ubiquitous

Biggest crime of the Unix world probably.


> Is this tool not an additional dependency?

"Dependency" generally means "external dependency".

It's only a dependency if your solution's build process fetches it from its upstream repo. (Or worse: it's just mentioned in some manual build instructions as one of the things your solution needs.)

This is small enough to copy into the source tree of your solution, in which case it's no longer a dependency.


> Is this tool not an additional dependency?

It is, but if you already have bash, adding another shell script isn't much of a jump. e.g. I'd feel OK about committing jb to another repo for use from a .envrc file to set up an environment, whereas committing a binary would not feel good.

> Biggest crime of the Unix world probably.

Sorry if I'm perpetuating this! :) My take is that the problem is not with bash, the problem is that it's hard for more advanced tools to replace it.


I agree with the interactive usecase.

But for when you don't want an extra dependency, awk and perl are better than bash and just about as ubiquitous. (I might dare to say more ubiquitous, since MacOS in particular ships with an ancient version of bash that can't even use this jb tool. But the versions of awk and perl it comes with are fine.)


Built into Powershell:

    > @{ hello = 'world' } | ConvertTo-Json
    { "hello": "world" }


Not only is it built in, the syntax is on another level, i.e. you don't need to learn special syntax if you know PowerShell. This alone makes pwsh worth using instead of a number of other tools.

    @{ Hello = 'world'; array = 1..10; object = @{ date = Get-Date } } | ConvertTo-Json
 
    {
      "array": [
        1,
        2,
        3,
        4,
        5,
        6,
        7,
        8,
        9,
        10
      ],
      "object": {
        "date": "2024-07-03T21:07:21.6562053+02:00"
      },
      "Hello": "world"
    }


That is pretty cool, and I wish such features were common in regular UNIX shells.

For good measure, this is how you might do the same with jb:

    $ jb Hello=world array:number[]@<(seq 10) object:json@<(date=$(date -Iseconds) jb @date)
    {"Hello":"world","array":[1,2,3,4,5,6,7,8,9,10],"object":{"date":"2024-07-03T19:26:36+00:00"}}
Alternatively, using the :{} object entry syntax:

    jb Hello=world array:number[]@<(seq 10) object:{}=date=$(date -Iseconds)
    {"Hello":"world","array":[1,2,3,4,5,6,7,8,9,10],"object":{"date":"2024-07-03T19:30:26+00:00"}}


Powershell has the upper hand here!

Still, bash can try to keep up using json.bash. :)

    $ source json.bash
    $ declare -A greeting=([Hello]=World)
    $ json ...@greeting:{}
    {"Hello":"World"}
... is splatting the greeting associative array entries into the object created by the json call.

Without the ... the greeting would be a nested object. Probably clearer with multiple entries:

    $ declare -A greeting=([Hello]=World [How]="are you?")
    $ json @greeting:{}   
    {"greeting":{"Hello":"World","How":"are you?"}}
Vs:

    $ json ...@greeting:{}                                
    {"Hello":"World","How":"are you?"}


    $h=@{x=1; y=2}; $h + @{z=3} | ConvertTo-Json

    {        
      "y": 2,
      "z": 3,
      "x": 1 
    }
You can even create the hashtable with [ordered]@{...} to keep keys in insertion order rather than landing in a random place.


Windows: what if everything was an (command) object?

Linux: what if everything was a file?

Soon we might have...

Mong/Os: what if everything was JSON?

YiAM/OS: YiAM/OS is ANOTHER MARKUP OPERATING SYSTEM... would come out shortly thereafter...

I like JSON, and getting it into the terminal is a challenge. GOOD JOB!


Kubernetes already got you covered: What if everything was a YAML?


Similar to jo, which is written in C [1]

[1] https://github.com/jpmens/jo


This is mentioned.


I'll wait right here using Nushell while you guys spend the next 10 years re-inventing it.



