JMESPath – A query language for JSON (jmespath.org)
219 points by selrond on Feb 17, 2018 | 125 comments



I like JMESPath, but it has some serious limitations which prevent it from being as general purpose as jq.

JMESPath limitations:

- No simple if/else, but it is possible using a hack, documented below.

- The to_number function doesn't support boolean values, but it is possible using a hack, documented below.

- Can't reference parents when doing iteration. Why? All options for iteration, [*] and map, use the iterated item as the context for any expression. There's no opportunity to get any other values in. It may be possible for a fixed set of lengths, with something akin to the following (except there is no syntax for switch or if statements):

  switch (length):
   case 1: [expression[0]]
   case 2: [expression[0], expression[1]]
   case 3: [expression[0], expression[1], expression[2]]
   ...

- Key name can't come from an expression. Why? The ABNF for constructing key-value pairs is given as: keyval-expr = identifier ":" expression. The key is an identifier, which gives no possibility for making it an expression. No functions modify keys in such a way as to allow using an expression as a key.

- No basic math operations, add, multiply, divide, mod, etc. Why? Nobody added those operators/functions.

- There's a join, but no split.

- No array indexing based on expression. Why? Indexing is done based on a number or a slice expression, which also doesn't support expressions. Here's the ABNF:

  bracket-specifier = "[" (number / "*" / slice-expression) "]" / "[]"

- No ability to group_by an expression.

- No ability to get the index of an element in a list

Hacks:

Convert true/false to number:

  boolean_expression && `1` || `0`

If/else:

Option 1)

  [{q:CONDITIONAL_EXPRESSION, v:IF_RESULT_EXPRESSION},{q:!CONDITIONAL_EXPRESSION,v:ELSE_RESULT_EXPRESSION}][?q]|[0].v

Option 2)

  {"if":CONDITIONAL_EXPRESSION, "ctx":@} | [{q:if ,v:ctx.IF_EXPRESSION},{q:!if,v:ctx.ELSE_EXPRESSION}][?q]|[0].v
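For anyone wondering why these hacks work, here is a rough Python sketch that emulates the relevant JMESPath semantics as I read them in the spec: only false, null, the empty string, the empty list and the empty object are falsy (so the number 0 is truthy), and `&&` / `||` return one of their operands. It does not use the jmespath library, and all helper names are made up for illustration.

```python
def jp_truthy(value):
    """Emulate JMESPath truthiness: only false, null, "", [] and {} are falsy."""
    if value is None or value is False:
        return False
    if isinstance(value, (str, list, dict)) and len(value) == 0:
        return False
    return True  # note: the number 0 is truthy here, unlike in Python


def jp_and(left, right):
    """`left && right`: yields left if it is falsy, otherwise right."""
    return right if jp_truthy(left) else left


def jp_or(left, right):
    """`left || right`: yields left if it is truthy, otherwise right."""
    return left if jp_truthy(left) else right


def bool_to_number(flag):
    """Emulates: flag && `1` || `0`"""
    return jp_or(jp_and(flag, 1), 0)


def if_else(condition, if_result, else_result):
    """Emulates Option 1: [{q: c, v: a}, {q: !c, v: b}][?q] | [0].v"""
    candidates = [
        {"q": condition, "v": if_result},
        {"q": not jp_truthy(condition), "v": else_result},
    ]
    selected = [c for c in candidates if jp_truthy(c["q"])]
    return selected[0]["v"]


print(bool_to_number(True), bool_to_number(False))
print(repr(if_else(True, "", "fallback")))
```

In this emulation the filter-based form returns the empty string for a true condition, whereas the shorter `condition && a || b` pattern would fall through to the else branch whenever `a` itself is falsy. That is presumably why the if/else hack goes through a filter instead of reusing the boolean trick.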


I love JMESPath. I first discovered it when using the AWS CLI `--query` option[0].

I then realized that using it in my code would make things much more declarative and easy to grok than a bunch of maps, filters etc. Here's a real example which I think illustrates it[1]. It has libraries for lots of languages with a clear specification/compliance test[2].

The cherry on the top is the interactive query on the website. You can tweak any of the examples (both queries and data) and get results instantly. Extremely useful for playing around, building queries to work with JSON data (webhooks, API responses etc)

[0] https://docs.aws.amazon.com/cli/latest/userguide/controlling...

[1] https://gist.github.com/gingerlime/757c7b4778c1ab68605dfce66...

[2] http://jmespath.org/libraries.html


Besides this project's CLI, jp (https://github.com/jmespath/jp), I see jl (https://github.com/chrisdone/jl) and jq (https://github.com/stedolan/jq/) in the comments. I wonder if anyone has had experience with all three (or even just one) and can comment on their experiences?


I have only used jq, but I noticed that even the AWS CLI docs [0] (already mentioned in the comments a couple of times) suggest using jq for "more advanced features that may not be possible with --query".

[0] https://docs.aws.amazon.com/cli/latest/userguide/controlling...


jq is by far the best developed and has the most intuitive syntax, but it doesn't have a formal spec for its language.

I have been maintaining https://github.com/kislyuk/yq, which wraps jq with a transcoder for YAML and XML.


Is there demand for a formal spec for jq? Would that lead to additional implementations? Serious question.


I would love to have a jq lib in every lang, which would probably require a spec


That's fair.


jq has a more polished CLI and can do 'everything'.

jmespath is more limited, but is well specified and easier to fit in your head. It's also much more appropriate for using as a library from other projects, since it has clean implementations in many languages. It's also an advantage in this case that it can't do everything, since you can more realistically provide untrusted user input.

jmespath's default (Go) CLI isn't as fully featured as jq, unfortunately.


I use jq on simple projects and to prototype. It is lightweight, fast, and usually consistent in operation.


Apparently JMESPath is a better JSONPath/dot-notation alternative for querying JSON, while jq and jl are full-featured fully functional programming languages that happen to use JSON as their underlying data type.

I had never heard of jl before, but I've recently compiled an "awesome" collection of jq tools, libraries and use-cases at https://github.com/fiatjaf/awesome-jq that may be worth checking out.


I've been using jq quite a bit with AWS CLI inside various bash scripts. It allows manipulation and filtering the native api doesn't, which makes it pretty straightforward to get certain values - such as specific tags on the current instance (converted to key-value form, or to turn into bash variables), a list of other systems in the current auto scaling group, etc.

I've also started using it just to get colorized/formatted output from curl or a local json file.

    cat file.json | jq


I've used jq and jp/JMESPATH quite a bit.

I love jq, and use it more often than JMESPATH. However, recently I've noticed that I arrive at "the solution" I need more quickly with JMESPATH.

In retrospect, JMESPATH's documentation and examples have been more useful for me than jq's (docs/examples).


jq has a community wiki. We really should rewrite the docs at some point and/or add lots of examples, as the language has grown quite a lot.


I use jq frequently. It covers most of what I want to do.


The reinvention of XML in JSON is almost complete - JMESPath vs XPath, JSON Schema vs XML Schema etc. If you need semi structured data to that level, consider using XML instead - you can validate it, there's plenty of tools, it's very stable and mature etc


Isn't that what we want? I would think that solving the same problems for a less verbose format is a good thing. And looking at the examples and the library support (http://jmespath.org/libraries.html) I would say this solves a real problem. I can already think of several use cases for my own system.

The fact that a solution is treading familiar ground does not invalidate its utility.


It's not very much less verbose, really. You still need a key and a value. The only thing you lose, really, is the end tag. For a complex text document in say TEI or Docbook, I don't see how this is much of an advantage.


No, you also lose the attribute system and thus the ability to extend existing elements without changing their structure.

In addition, arrays must be indicated clearly in JSON while you can consider the children of an XML node to always form a list (be it empty or containing one element).

So while the two formats are very similar, there are still some differences that keep them from being interchangeable for every use case.


Try implementing an xml parser sometime. The encoding and entity parsing alone is so complicated it’s brain melting. Compare that to the JSON spec and then ask why the world has decided JSON is the better format for casual object graph encoding. Not sure I could make the same argument about YAML though.


The reason I prefer JSON over XML is the latter's ambiguity. It's completely obvious how to convert a native object into a JSON object, for example. The translation to XML brings all sorts of questions about when to use tags vs. attributes of tags. It's cognitive load I'd rather spend on solving the problem at hand, not serialization.

If there had been an opinionated SGML syntax which mapped directly and unambiguously to common language primitives and back, I'm sure it would have been more popular than JSON.


If your JSON is so complex it needs a schema, doesn't XML's extra structure help?


How would the extra structure help?


It's less complex since there aren't attributes. Of course, if you like attributes, then that's a downside.


You could have attributes, they just need to be another nested object. This makes a mess, though.


We still need XSLT for JSON to complete the circle.

Though it seems like XSLT should support JSON:

https://www.w3.org/TR/xslt-30/#json


I thought JavaScript was the xslt of json.


There are plenty of transformers already, for example: http://goessner.net/articles/jsont/


XML Stylesheet Transformation, the language to describe transformations of XML documents into other XML documents. Isn't jq or JMESPath an example of such a language for transformations?


No, those are query languages akin to XPath.


What's the difference between query and transformation then? Isn't transformation a (vector) function ^f of input ^x, with n dimensions of x and m dimensions of f where f_i = f_i(x_1, ... x_n)?


XPath can be used in Xslt, but not the other way round. In Xslt you can create new elements, while in XPath you can't. Seems like a pretty big difference to me. XPath isn't Turing complete either, while Xslt is.


Working on that, was thinking of naming it JST.. building it as part of a high level programming platform we are building.. will be open sourced once complete.


The problem with XML is that almost every tool has bad usability — both things like basic tasks frequently dumping you into a thicket of cryptic standards docs or that the libraries in most common languages have unnatural API choices (i.e. they follow the libxml2 C interfaces) and leave a lot of basic usability on the floor.

As a simple example, there's no technical reason why XPath couldn't allow you to either use namespaces as written in the document[1] or ignore them when appropriate[2]. Nobody cared about usability and it's the number one reason I've heard why developers with jobs to do fled to JSON / YAML as quickly as possible. A similar story arises with XPath 2 — libxml2 never got support so that standard effectively doesn't exist for most projects, but there's no way to shift the resources away from developing further revisions which will also not be widely used towards much cheaper basic investment in shared infrastructure.

1. e.g. if the doc is <foo:bar>, it should never require you to write {http://path/to/foo}bar to match that element.

2. Ever try to target docs in multiple versions of a standard?


Last time I checked, XML was uncertain which validation should I use. DTD, Schema, other solutions. Each has a syntax/structure and a first-page explanation so cryptic that I don’t even know where to begin.

I don’t like js/json at all, but for json (and without much js knowledge) I can roll out simple validation in less time than is needed to understand these schema formats. If my structure is dynamic, omg it will be hard to explain it to declarative validator. If it contains value-level type logic (graphs, references), I bet second phase of validation by hand is inevitable anyway.

A second thing, full-blown xml is alien to any structures of simple languages and cannot be serialized like jsonlib.encode(foo). You create nodes, set attributes, build trees, all that mess. It feels like using a 17th century official mail ceremony to send “)))” to your buddy.
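To make the contrast concrete, here is a small sketch using only the Python standard library. The record and the choice of putting `id` in an attribute and `name` in a child element are arbitrary assumptions for illustration; that choice is exactly the kind of decision JSON doesn't ask for.

```python
import json
import xml.etree.ElementTree as ET

record = {"id": 7, "name": "Ada"}

# JSON: native dicts, lists, strings and numbers map directly onto the format.
json_text = json.dumps(record)

# XML: someone has to decide what becomes an attribute and what becomes
# a child element, then build the tree node by node.
user = ET.Element("user", {"id": str(record["id"])})
name = ET.SubElement(user, "name")
name.text = record["name"]
xml_text = ET.tostring(user, encoding="unicode")

print(json_text)  # {"id": 7, "name": "Ada"}
print(xml_text)   # <user id="7"><name>Ada</name></user>
```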


> Last time I checked, XML was uncertain which validation should I use. DTD, Schema, other solutions

DTD was considered obsolete more than a decade ago.

Nobody uses anything but XSD nowadays.


> and cannot be serialized like jsonlib.encode(foo)

Huh? Could have fooled me. In Cocoa, for example, serializing to JSON and serializing to an XML property list (or binary property list) is effectively the same code.

https://developer.apple.com/documentation/foundation/nsjsons...

https://developer.apple.com/documentation/foundation/nsprope...


Yes, but property list is a subset of XML and Cocoa’s almost internal format. It doesn’t even connect keys to values, probably losing all the xpath/xslt/etc abilities. No custom schema too?


Hmm...since XML is a meta-language describing a family of markup languages[1], every actual XML-based language will be a "subset".

And of course that is why some of the XML tools are so heavy: they deal with the full meta-format. The "nice" thing about JSON is that it doesn't have this indirection step, it's just one concrete language, and that's why it is simple. But again, it's trivially easy to define an actual concrete markup language using XML that is just as simple.

I agree with you that the choice of <key>theKey</key><string>value</string> rather than <theKey>value</theKey> was regrettable; had I designed the format, I would have chosen differently. I think they wanted one DTD to describe this format, which wouldn't have been possible without keeping the meta-level indirection.

I created XML-based archivers for Cocoa that don't have this problem[2]. Again, this wasn't hard, and the API is NSKeyedArchiver compatible, so [MPWXMLArchiver archivedDataWithRootObject:someObject]; gets you a nice XML representation.

[1] http://www.xml.com/pub/a/98/10/guide0.html?page=2

[2] https://github.com/mpw/Objective-XML/blob/master/MPWXmlArchi...


You can still use XPath/Xslt in that scenario, it's just more annoying. Of course it wouldn't be hard to write an Xslt to rewrite the XML so that the keys and values are better connected, so you can query it more easily using XPath.


People become just furious at the existence (especially when they never write xml by hand) of closing tags. Because the XML people wanted to be able to spot unclosed elements before they reached EOF and because they thought it would be nice to be able to report what was actually unclosed we're going to have to just accept a world in which programmers will use absolutely anything before using xml.

see also: hating pascal's 'end' instead of '}' or claims that python is readable because it doesn't even have '}'


Is that a bad thing? For cases where the document is at least partially written by hand it makes sense to replace XML with something more friendly, such as JSON or YAML.


Wasn't the basic idea of JSON to be small and unstructured to be sent between applications? Shoehorning everything into JSON that is already solved by XML seems illogical, as GP already said. It reeks of NIH.

People shouldn't use JSON for anything hand-written. YAML or even the older INI format are way better suited for configuration files for one reason alone: they allow comments.


To me, JSON can be viewed as slightly enhanced S-expressions, enhanced in a particular way. It would sound strange hearing "shoehorning everything into S-expressions in Lisp that is already solved by structural Fortran syntax seems illogical".

The NIH claim is probably valid when there is no noticeable difference. In my opinion, XML differs a lot from JSON, particularly when one needs to write snippets by hand. JSON also seems more logical/laconic (once upon a time M-expressions were supposed to come to stage, but S-expressions turned out to be good enough).


Small, or smaller than XML, yes. Unstructured — why?


Exactly. I'll stick with XML, XPATH, and XSLT - thank you very much. It's standards-based, natively-implemented, and super-fast. If a web service sends me JSON, the first thing I do is serialize it to XML.


Hm. One reason why I prefer Json to XML (and I actually like XML) is, that JSON is simpler. The fact that XML has schema definitions out of the box, and that these schema definitions can reference other definitions would lead to more complex parsers that can contain more bugs and vulnerabilities.


> would lead to more complex parsers that can contain more bugs and vulnerabilities.

Partially yes, but it's also at least in part due to a "fuck all this language theory crap, let's just standardize something already" approach.


jq is a better XPath/XSLT. It's been for a long time now.

(Fair disclosure: I'm a jq maintainer.)


JSON still doesn’t support comments though, they need to work on that.


To me this sounds almost like "S-expressions still don't support comments".

Why do you need comments in a structure of nested - unique key mappings - arrays - strings - numbers - booleans - nulls ? You can include comments, just like any string, into it, just reserve a key with unique name, if you want. From JSON parser/transformer point of view, "comment" as a concept isn't a data structure piece, it's rather "intent" piece.


Something seems odd about wanting to put comments inside the JSON instead of before it. Then people will want a way to read the comments programmatically, then they'll want annotations, etc.


You miss an important point, JSON is a subset of valid parseable JavaScript. This is a huge win when working with the web stack.


We can create first-class arrays in JSON.


That's the only advantage I see in JSON but it's a good one.


But can you store XML in a database that is Web Scale?? /s


SQLite will happily store whatever you throw at it and scales perfectly to my Web Site ;)


Yes. It's one of the many abilities a database like PostgreSQL has.

The "Web scale" is what MongoDB tried to sell (unfortunately, successfully), and it's not even close to being web scale.


Marklogic.... sort of.


The "reinvention" is not complete and will never be necessary. The difference is that XPath is necessary to query XML because it's a botched horribly overcomplicated, designed-by-committee markup language. Except for tools like jq no such language is actually required for JSON because it maps on to language structures that always exist.

Neither JSON Schema nor XML Schema is particularly popular - and for good reason. Let's say you want to create a schema that restricts the field "country" to ISO 3166-1 country codes - either you:

* Keep that schema file updated by hand every time something like Sudan breaking in two happens (no).

* Write a program that generates the schema (seriously... no)

* Do schema validation in code where it belongs - pulling in relevant validation data from canonical sources, rather than some markup language invented by people who didn't have the imagination to consider a really common use case.


There's a lot of benefit to being able to state what keys may be specified in a certain location, though. Look at DSLs like Cloudformation, for instance. Having schema validation could make static analysis of this kind of code much easier to handle. E.g.: Fn::Sub may be used inside of Fn::Join, but the reverse is not true, regardless of the types "returned" by each. It's certainly possible to validate via the api, but being able to do it in my editor will make finding errors much faster.

To your other point, however, dynamic code generation is becoming much more common. AWS generates a huge amount of its code from JSON definitions across multiple languages to keep its SDKs up to date. I could see schema validation being valuable in this domain as well.


>There's a lot of benefit to being able to state what keys may be specified in a certain location, though.

There is. I find examples - snippets of XML/JSON - to be the best way of communicating this - not schema languages.


> * Keep that schema file updated by hand every time something like Sudan breaking in two happens (no).

There is a lot of use for libraries dealing with time and dates. When you want to cover all cases, at some point you get to the situation where you have to allow a variable number of seconds in a minute - not always 60, but sometimes 59 or 61, or maybe even different numbers. And you don't know in advance - for an arbitrarily long future - which minutes will have which number of seconds.

So, for your timekeeping system to maintain precision, you have to allow external updates for when a minute will be considered non-60 seconds.

And those cases could happen more often than changing a list of valid country codes.

What would you do with time then?


Ideally, use a library that intelligently parses dates as part of a Turing-complete validator.


The point is that you can't always avoid scenarios with keeping something updated. List of countries is another example.


The point isn't to avoid it. Of course it's inevitable - that was my point! The point is to use code to validate instead of some markup so that the programmer can use their judgment about how it should be delegated.

I wrote some example code below that shows how you can validate with list of countries in such a way that no code changes will be required when the list changes.


That ability to use outside canonical sources is really interesting. Are there some existing examples of schema languages with that feature?


JSON Schema, at least, can refer to a URI for the definition of something, and that URI can refer to only a specific section of the JSON document to which it points.


The point I was making was that you shouldn't use a "special" language for validation at all - you should just use a library in a regular language to do it.

Anyway, code:

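yaml_text:

   John: Yemen
   James: South Sudan

python code:

   from strictyaml import load, MapPattern, Str, Enum
   import pycountry

   # The YAML document shown above, as a string
   yaml_text = "John: Yemen\nJames: South Sudan\n"

   # Keys are arbitrary strings; every value must be the name of a
   # country currently known to pycountry
   result = load(
       yaml_text,
       MapPattern(
           Str(),
           Enum([country.name for country in pycountry.countries]),
       )
   )

full disclosure: I wrote the validation library ^^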


The idea behind XML schema, DTD, etc. is to pick a simple language to express schemas in, so that implementations in different languages have a decent chance of being compatible with each other.

Python isn’t a good choice there, as it is too flexible. For example, that code could have gotten the list of allowed country names from a file, database, or URL.

⇒ If I have to send such json to you, I almost would have to write my program in python, and even then, it could be hard for me to replicate your setup.


>that code could have gotten the list of allowed country names from a file, database, or URL.

That is exactly the point. You should be able to do that, because the canonical list of data could easily come from any of those and it should be up to the programmer's discretion how to fetch it.

The point of validation is to prevent invalid data from slipping through a net at minimum cost and that's how you do that.

Suden, Sudaan and South Sudan were all invalid countries in 2010 and that YAML was invalid. In 2012, Suden and Sudaan were invalid but South Sudan was not so that YAML was valid.

In the above example you have to make no code changes in order to account for that - just update pycountry every so often.

With XML schemas and DTDs, either you don't validate country at all (letting Suden and Sudaan through the net), or you rewrite and redistribute the schema by hand every time some dependency like a list of countries changes.

>If I have to send such json to you, I almost would have to write my program in python

Only if I choose to validate that data using a shared schema. Frankly, I've dealt with XML a lot and the number of times I've been handed a shared schema of any kind is very low. People just don't seem to use them. If they define an API in XML for instance they tend to just send examples and give a written explanation (e.g. insert valid country name here).

I don't see much value in making a schema more inherently "shareable" especially not if it means it has to be re-released every month.


The example on the front page seems equivalent to (and only marginally less verbose than):

    locations
      .filter(l => l.state === 'WA')
      .map(l => l.name)
      .sort()
      .join(', ')


The benefit of a query language is that it can be described declaratively (i.e. in a non-executable text file, perhaps within JSON itself), and then programs written in any language can execute its query logic using a standard interpreter written in that specific programming language.

So you get reusability of queries across the stack, in all languages that implement a parser against the spec. Your example only provides re-usability in JavaScript, and requires evaluating code at run-time so may not be suitable for queries based on user-submitted data in multi-tenant environments.
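As a toy illustration of that idea (not JMESPath itself), here is a Python sketch where the query is stored as plain JSON and run by a small interpreter that only knows a fixed set of operations. The step names (`filter_eq`, `pluck`, `sort`, `join`) and the sample data are invented for this sketch.

```python
import json

# A query stored as plain data (it could live in a config file or database).
# The step names below are invented for this sketch; they are not JMESPath.
QUERY_JSON = """
[
  {"op": "filter_eq", "field": "state", "value": "WA"},
  {"op": "pluck", "field": "name"},
  {"op": "sort"},
  {"op": "join", "separator": ", "}
]
"""


def run_query(steps, data):
    """Interpret a list of whitelisted steps; no user-supplied code is executed."""
    result = data
    for step in steps:
        op = step["op"]
        if op == "filter_eq":
            result = [row for row in result if row.get(step["field"]) == step["value"]]
        elif op == "pluck":
            result = [row[step["field"]] for row in result]
        elif op == "sort":
            result = sorted(result)
        elif op == "join":
            result = step["separator"].join(result)
        else:
            raise ValueError(f"unsupported operation: {op!r}")
    return result


locations = [
    {"name": "Seattle", "state": "WA"},
    {"name": "New York", "state": "NY"},
    {"name": "Bellevue", "state": "WA"},
    {"name": "Olympia", "state": "WA"},
]

print(run_query(json.loads(QUERY_JSON), locations))  # Bellevue, Olympia, Seattle
```

Because the query is just data, the same JSON could be handed to an equivalent interpreter in another language, and an unknown operation is rejected instead of being evaluated.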


I really appreciate this comment. I was trying to figure out why I wouldn't use native data types and functions, but this makes it clear.

In your opinion, where would someone be storing the json such that they'd benefit from a tool like this? The only time I use json outside of pulling it from an API (where I can convert it to a native object) is probably storing it in postgres, where I've already got json querying tools.


Some cases off the top of my head:

- Infrastructure configuration stored in JSON. Query could reference other JSON files, or the JSON file itself (loops would need to be considered).

- Declarative reactive programming, e.g. platforms like IFTTT. You might want to take certain actions based on data in a JSON post. The IFTTT GUI would create JSON config files that its server side parsers can safely use without eval'ing code to decide which action to take.

- Adding conditional logic to jsonschema form generation. Recently I've built a questionnaire renderer in react that renders forms based on jsonschema. The user creates forms with a GUI, which compiles them to JSON, and then the renderer knows how to render. Conditional logic (e.g. question B is required if question A === true) can be quite limited when constrained to pure JSON. Something like this could help with that.

The nice thing about declarative syntax is you can build a GUI to generate it, so users never use the JSON itself, but you can store it in a database, safely execute rules based on it, etc. without requiring programming from the user.

That said, there are usually better ways to accomplish this, like in pure JSON for example. Mongo syntax achieved this, with declarative operators like $or, $sum, etc., but it can be quite cumbersome.


I'm guessing the point is to be a common type of query language that can be used from any number of other languages: http://jmespath.org/libraries.html

In at least some languages if these queries can be used without deserializing into a native data structure and then serialized back into a string, this could be a major win.


Your example however will not support dynamic or user-configurable paths without eval(). Alternatively, instead of eval you could run expressions through a JS parser, but it'll be more code than your example. The library we're discussing also defines a grammar for the query language.


If we're talking about JS, it would seem to me to be trivial to accomplish, by simply decomposing paths and using bracket notation to access nested props, just like lodash _.get does.
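The same idea, sketched in Python rather than JS to show how little code the plain path lookup needs. `get_path` is a hypothetical helper loosely modelled on what lodash's `_.get` does for dotted paths; it handles only dictionary keys and list indexes.

```python
def get_path(data, path, default=None):
    """Follow a dotted path such as "a.b.0.c" through nested dicts and lists."""
    current = data
    for key in path.split("."):
        if isinstance(current, dict) and key in current:
            current = current[key]
        elif isinstance(current, list) and key.isdecimal() and int(key) < len(current):
            current = current[int(key)]
        else:
            return default
    return current


doc = {"locations": [{"name": "Seattle", "info": {"state": "WA"}}]}
print(get_path(doc, "locations.0.info.state"))            # WA
print(get_path(doc, "locations.3.name", default="n/a"))   # n/a
```

This covers user-configurable paths without eval, but not filters, projections or functions, which is where a defined grammar starts to earn its keep.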


Thanks for taking the time to write this tool. Can you explain how it is distinguished from jq?


I want to add one more into the mix - Couchbase has something called "N1QL" ('nickel'), which is actual SQL adapted for JSON:

https://www.couchbase.com/products/n1ql

It's not standalone though, you need Couchbase to use it.


Does this have any mathematical foundation like the relational algebra for SQL? Or more generally, does a mathematical framework exist to treat this or similar constructs and that goes beyond what relational algebra provides and that, for example, also handles aggregate functions?

The reason I am asking is that I am currently trying to build a tool to analyze a kind of time series data, think log file entries, in order to look for anomalies and visualize them. I could of course just build all the transformations I am interested in in an ad hoc fashion but it would be nice to have a mathematical framework in order to start out with a small set of basic operations and then compose those while having some guarantees about the expressiveness of the basic operations and ideally also a rigorous foundation for transforming them, for example for performance optimizations.

But so far I was unable to find something that seems fitting, everything I am aware of is either too limited like relational algebra or way too general like general functions. It feels like what I am looking for should exist but I am unable to find it.


What reasons did you not go with SQL itself? I may not fully understand what you're trying to do, but in any case it sounds really interesting.


I never tried it but I am expecting the performance to be not good enough, it takes already several minutes with code specifically written to perform the calculations I am interested in. And because I don't know what exactly I am looking for I need more or less interactive speed so that I can try out many different ways to look at the data. But maybe I could use [materialized] views to convey enough information to the query planner how to efficiently carry out the calculations or maybe I am even underestimating how good query planners are. I just have the gut feeling that performing a lot of aggregation will make a database perform a lot of unnecessary work. But maybe I should and will try loading the data into SQL Server and see what happens.

The other thing is that SQL seems not the best fit to me. Say you just want to know how many events occurred in the last three months in any hour, that is straight forward grouping and counting at first, but already rounding the timestamps to an hour is not as obvious as it should be. But if there was no event in a specific hour, your result will just have no row for that hour instead of a row saying there were zero events in that hour. This in turn will cause more trouble if you want to build a histogram showing in how many hours there were say 0 to 9, 10 to 19, 20 to 29, and so on events. Certainly still doable with SQL but we are already entering the territory where writing a single query will take most people several hours to get the desired result.
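For what it's worth, the "no row for empty hours" part can be sketched like this. It uses SQLite through Python's standard library as a stand-in (not SQL Server), the timestamps are made up, and the hour series is generated with a recursive CTE and then LEFT JOINed against the events. Treat it as one possible approach under those assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (ts TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [("2018-02-17 00:05:00",), ("2018-02-17 00:40:00",), ("2018-02-17 02:15:00",)],
)

rows = conn.execute(
    """
    WITH RECURSIVE hours(hour) AS (
        -- generate one row per hour in the window of interest
        SELECT '2018-02-17 00:00:00'
        UNION ALL
        SELECT datetime(hour, '+1 hour') FROM hours
        WHERE hour < '2018-02-17 03:00:00'
    )
    SELECT hours.hour, COUNT(events.ts) AS n
    FROM hours
    LEFT JOIN events
      -- round each event timestamp down to the start of its hour
      ON strftime('%Y-%m-%d %H:00:00', events.ts) = hours.hour
    GROUP BY hours.hour
    ORDER BY hours.hour
    """
).fetchall()

for hour, n in rows:
    print(hour, n)
```

Hours without events show up with a count of 0 because COUNT only counts the non-NULL timestamps produced by the LEFT JOIN. A histogram of "hours with 0 to 9 events" and so on would be a second GROUP BY over this result.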

I also couldn't easily tell how to express calculating the 99th percentile of the event size for every day of the week and hour of the day. I am pretty sure it is possible but I guess it would also be pretty unreadable unless you put in quite a bit of effort to create utility functions instead of hacking together one huge SQL statement. Then again I don't really know much about the more recent SQL features for partitioning and aggregating, maybe I should have a closer look at that first.


Is this for an open source project, or anything you'll be publishing? I'd be interested to follow on with the results!


Right now it is just an effort to develop a tool to diagnose and hopefully thereafter fix random performance problems we are experiencing with one of our applications in production. Despite having a small team dedicated to investigating the problems, monitoring every click and function call with Dynatrace, having had a Microsoft SQL server expert look into it, and getting the system audited by one of the big consulting companies, the problem has persisted for years and nobody really has any clue about what is going wrong.

The performance is never really great, it is [one of] the central applications of the company and depends on the interaction with a sizable chunk of the system landscape developed over decades and therefore it is prone to be affected by incidents in a lot of systems but most of the time it is good enough. But once every couple of weeks or months something goes badly wrong and requests (it's a web application) start taking several seconds or even minutes to complete. Minutes later everything is back to normal.

But I digress. If I would manage to come up with a reusable and somewhat general tool to analyze data similar to what I am looking at, I would consider releasing it. It could either be a somewhat general data analysis and visualization tool, think R, or it could be more specifically tailored towards looking for anomalies in data sets like the one I am investigating. But as of now I am struggling to come up with a general framework to express the analyses I am performing and therefore all I have is a rather ad hoc collection of transformations that extract and visualize aspects of the data that could lead to new insights into what is going on.

But right now it is really driven by our specific issue: I notice something in one view of the data and then come up with a new transformation to look at it in more detail or from a different angle. It is nothing that could easily be reused by anyone else, so for the moment it seems most likely that this will never become public, or maybe only in the form of a blog article explaining what kind of information might be useful to look at and how to derive it from logs that look rather uninteresting at first glance.


Aha, well thanks for sharing this far :)


100% you should.

All of what you have just described isn't very hard in SQL: coalescing on null, aggregating, etc.

GROUPING SETS() will give you your missing rows.


load the data into a database (sqlite is very good). use the database. delete the database.
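A minimal sketch of that load/use/delete cycle with Python's built-in sqlite3 module. The table name, columns, and sample records are made up for illustration; an in-memory database is used so that closing the connection is the "delete" step.

```python
import json
import sqlite3


def query_json_records(records, sql):
    """Load a list of {"day", "size"} dicts into a throwaway table and run sql on it."""
    conn = sqlite3.connect(":memory:")  # lives only as long as the connection
    try:
        conn.execute("CREATE TABLE events (day TEXT, size INTEGER)")
        # Named placeholders pull the matching keys out of each dict.
        conn.executemany(
            "INSERT INTO events (day, size) VALUES (:day, :size)", records
        )
        return conn.execute(sql).fetchall()
    finally:
        conn.close()  # discards the in-memory database


# Hypothetical input data.
records = json.loads(
    '[{"day": "Mon", "size": 10}, {"day": "Mon", "size": 30}, {"day": "Tue", "size": 5}]'
)
rows = query_json_records(
    records, "SELECT day, SUM(size) FROM events GROUP BY day ORDER BY day"
)
```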


Might as well just keep everything in a relational database to begin with, in that case, and generate JSON from the query results.



A worthwhile alternative to this approach (a JSON-specific query language) is a language for converting JSON structures to newline-delimited records. Then, standard shell tools can be used to query and join: https://github.com/micha/json-table


For those interested in arbitrarily transforming JSON objects (for example, in a communications pipeline) I’d recommend JSONata. It’s quite useful and we’re well along in a Golang port with $function extensibility. http://jsonata.org


Agreed. I like JSONata a lot, even though it's the dark horse among JSON traversal languages. I've had a good experience parsing semi-unstable JSON with it.


There is also JSON Pointer (https://tools.ietf.org/html/rfc6901) from IETF.

Very simple standard and easy to implement, but not as powerful as jmespath nor jq.
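To give an idea of how little there is to implement, here is a rough sketch of a resolver in Python. It follows the escaping rules of RFC 6901 (`~1` means `/`, `~0` means `~`) but is not a complete implementation: it does not reject array indices with leading zeros and does not handle the special `-` index. The function name is my own.

```python
def resolve_pointer(document, pointer):
    """Resolve a JSON Pointer string against parsed JSON (dicts and lists)."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")

    current = document
    for raw in pointer[1:].split("/"):
        # Unescape "~1" before "~0" so that "~01" correctly becomes "~1".
        token = raw.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            if not (token.isascii() and token.isdigit()):
                raise ValueError("array index must be a non-negative integer")
            current = current[int(token)]
        else:
            current = current[token]
    return current


# Sample document based on the examples in the RFC.
doc = {"foo": ["bar", "baz"], "": 0, "a/b": 1, "m~n": 8}
```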


To say that it’s “from IETF” is sort of a misnomer I think. RFCs are submitted to IETF by others, not by IETF itself.


jq exists, is fast, and works well. Is this compatible?


No, and I generally like jq’s syntactic choices better.


This looks interesting - but doesn't MongoDB basically achieve the same effect? I kind of prefer MongoDB because you query JSON with JSON - but I'm open to changing my mind :)


If you're using MongoDB already, then sure use MongoDB's query tools. But if you are just working with raw JSON from a potential variety of sources, or in a streaming context, then you need something more in-place and general-purpose, which this appears to be.


why choose this over jsonpath (like xmlpath) or jq?


I already have a query language for json. I insert json into mssql 2017 community edition and query it there. https://docs.microsoft.com/en-us/sql/relational-databases/js...


One more alternative to many listed here is SPARQL-Generate. A single query language that works for XML and JSON, and has syntax borrowed from SPARQL.

https://ci.mines-stetienne.fr/sparql-generate/playground.htm...


My tiny lib with very similar functionality: [1]. The query syntax is slightly different though. Also I decided to re-use JS for evaluation of sub-expressions instead of implementing own full-fledged parser.

[1] https://github.com/xonixx/jsqry


I love libraries like this, ones that are small enough to be read in one sitting. I can scan through and get a general understanding of everything that it does.

The "evaluation of sub-expressions" made me curious. This line:

   token.func = Function('_,i,args', 'return ' + token.val);
..could be a potential security issue with user-submitted expressions?


Thanks. I doubt this could be a security issue. Typical usage like so

    var name = one(users, '[_.id==?].name', 123)
uses parameterized queries, same idea as with SQL to eliminate injections.
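For anyone unfamiliar with the SQL side of that analogy, a small sketch using Python's sqlite3 module. The table and values are invented, and this only illustrates the SQL technique, not how the library handles `?` internally.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [(123, "alice"), (456, "bob")])


def name_by_id(user_id):
    """Look up a name; user_id is bound as a value and never parsed as SQL."""
    row = conn.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    return row[0] if row else None


# A hostile value is compared as a plain string, so it matches no row.
hostile = "123 OR 1=1"
```

The protection covers the bound values only; the query text itself still has to come from a trusted source.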


I see, I should have dug deeper before commenting. Wow, parameterized queries, there's been a lot of thought put into this compact library!


I built this VS Code plugin to convert JSON interactively using JMESPath: https://marketplace.visualstudio.com/items?itemName=octref.v...

Might be useful if you are testing API or playing with JSON data.


Well actually - your plugin has led me to find JMESPath and post the link to HN :D


I'm sad that JSONSelect (https://github.com/lloyd/JSONSelect) never caught on. It uses CSS selectors to query JSON, which has the nice side effect that learning to use it improves your CSS as well!


I remember when xml started down this road too. "We aren't going to be sgml, but just a lightweight markup." Xpath, xslt, etc. And now we have xml today, the modern day sgml.

But this time it's different?



Does anyone know a nice human-friendly search language like the one in Jira, Slack, etc.?


the second I looked at the example on the homepage and saw this:

sort(@)

I'm like nope! What is this "@" symbol? Why can't that be "name"? I'm already passing judgment that this library will be a nightmare to use which isn't good.

Now I know I can read the docs and eventually learn what I can pass to the sort expression and what it all means. However, this is an issue I come across more and more with new libraries in programming: show simple examples, not "smart" or complicated ones. I shouldn't have to read through docs to try to decipher an introductory example. There is a reason every programming language starts with "Hello World".


This is not that new. And it happens to be the JSON query language built in to the AWS CLI tools.

For the last 3-4 years I have seen lots of AWS CLI examples piping through jq. That’s an extra dependency that’s not necessary when this is built in.

Here’s the author’s idea of intro materials (posted Jan 2015):

http://jamesls.com/how-to-easily-explore-jmespath-on-the-com...

I’ve found him very responsive on bugs and (provably useful) feature ideas.


Each step is a transformation

X | Y | Z

The @ is the result from the prior transform.

I would've preferred _ from scala, but meh.
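If it helps, the same idea as a tiny Python sketch, where each step is a function and its argument plays the role of `@`. The `pipe` helper and the sample data are made up; this is an analogy, not how JMESPath is implemented.

```python
from functools import reduce


def pipe(value, *steps):
    """Feed value through steps left to right; each step gets the previous result."""
    return reduce(lambda current, step: step(current), steps, value)


# Hypothetical data: pick a field, sort it, take the largest element.
data = {"sizes": [12, 5, 9]}
largest = pipe(
    data,
    lambda d: d["sizes"],  # X: select the list
    sorted,                # Y: receives the list
    lambda xs: xs[-1],     # Z: receives the sorted list
)
```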


I'm rather disappointed it's not called JPath.


Unsurprisingly there are already multiple projects called JPath


this, combined with GraphQL would be awesome


So no mention of JSONiq which came before JMESPath?


May also be interested in the `jq` CLI, which on first glance appears to use a similar but not identical query language. https://stedolan.github.io/jq/


Also this tool is pretty interesting: https://github.com/chrisdone/jl


How does this query language compare to jq?


seconded. the jq language is surprisingly powerful and at first glance at jmespath, the syntax is similar.


I think the equivalent jq query would be something like:

    [.locations[] | select(.state == "WA").name] | sort | join(", ") | { WashingtonCities: . }
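For reference, roughly the same transformation written out in plain Python, to make the steps explicit: filter, project, sort, join, wrap. The sample document is my assumption about the input shape (a `locations` list of objects with `name` and `state`).

```python
def washington_cities(doc):
    """Collect the names of locations in WA, sorted and joined into one string."""
    names = [loc["name"] for loc in doc["locations"] if loc["state"] == "WA"]
    return {"WashingtonCities": ", ".join(sorted(names))}


# Assumed input shape.
doc = {
    "locations": [
        {"name": "Seattle", "state": "WA"},
        {"name": "New York", "state": "NY"},
        {"name": "Bellevue", "state": "WA"},
        {"name": "Olympia", "state": "WA"},
    ]
}
result = washington_cities(doc)
```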


I'd rather use App::RecordStream.



