Jsmn, a minimalistic JSON parser in C (zserge.bitbucket.org)
89 points by jasonmoo on Aug 15, 2012 | 35 comments



The code is certainly very short, but what about \u escape sequences in strings, parsing different representations of numbers, etc.? Since those things are part of the JSON standard, you're not a JSON parser if you just leave them to the application to handle.

Since it skimps on half of the work, it won't even be able to tell you with certainty what's valid JSON and what isn't.
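
For a sense of the work involved, here is a minimal sketch (mine, not taken from jsmn or any of the parsers discussed) of decoding a \uXXXX escape to UTF-8. Surrogate pairs are omitted for brevity; a real parser has to handle those too:

  /* Parse 4 hex digits; returns the code point, or -1 on bad input. */
  static int hex4(const char *s) {
      int i, c, cp = 0;
      for (i = 0; i < 4; i++) {
          c = s[i];
          if (c >= '0' && c <= '9')      cp = (cp << 4) | (c - '0');
          else if (c >= 'a' && c <= 'f') cp = (cp << 4) | (c - 'a' + 10);
          else if (c >= 'A' && c <= 'F') cp = (cp << 4) | (c - 'A' + 10);
          else return -1;
      }
      return cp;
  }

  /* Encode a BMP code point as UTF-8; returns bytes written (1-3). */
  static int utf8_encode(int cp, char out[3]) {
      if (cp < 0x80)  { out[0] = (char)cp; return 1; }
      if (cp < 0x800) {
          out[0] = (char)(0xC0 | (cp >> 6));
          out[1] = (char)(0x80 | (cp & 0x3F));
          return 2;
      }
      out[0] = (char)(0xE0 | (cp >> 12));
      out[1] = (char)(0x80 | ((cp >> 6) & 0x3F));
      out[2] = (char)(0x80 | (cp & 0x3F));
      return 3;
  }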

(disclaimer: I also wrote a popular ANSI C JSON parser)


I saw json-parser. Looks good. I'm currently using YAJL for a project, but I'm open to switching to something faster/easier.

Have you done any performance tests against other C JSON parsers?

Thanks,


This is not a JSON parser, it's a tokenizer. From a parser I'd expect at least acknowledgement of the basic key-value association.

That's not to say it isn't useful. It's just that for anything non-trivial, you'd need to supplement this library with quite a bit of your own code, e.g. a stack (or recursion) for tracking nesting levels.
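
To make that concrete, here is a rough sketch of the token-walking code you end up writing yourself. It assumes current jsmn conventions (an object token's size is its number of key/value pairs, and a key's size counts its value as a child); older versions counted children differently, so treat this as an illustration only:

  #include <stdio.h>
  #include "jsmn.h"

  /* Skip token i and its whole subtree; returns the next sibling's
   * index. Plain recursion over size covers objects, arrays, and
   * arbitrary nesting under the conventions assumed above. */
  static int skip(const jsmntok_t *t, int i) {
      int j = i + 1, n = t[i].size;
      while (n-- > 0)
          j = skip(t, j);
      return j;
  }

  /* Print the top-level keys of the object token at index i. */
  static void print_keys(const char *js, const jsmntok_t *t, int i) {
      int pairs = t[i].size, j = i + 1;
      while (pairs-- > 0) {
          printf("key: %.*s\n", t[j].end - t[j].start, js + t[j].start);
          j = skip(t, j);   /* a key's subtree includes its value */
      }
  }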


I wrote some simple jsmn examples, since it doesn't ship with any: https://github.com/alisdair/jsmn-example/

And then I wrote about writing the examples: http://alisdair.mcdiarmid.org/2012/08/14/jsmn-example.html


This is really nice and I'm sure it will help new users a lot. I myself have only used the provided test.c file to see how to use the API. While that was certainly doable, your examples (and their explanation) have improved the situation a lot. Thank you!


It fails to parse basic unicode escapes - not a JSON parser.


For those interested in other lightweight implementations, I have not yet had a chance to compare this but have been very happy with cJSON by @dave_gamble for extremely resource-constrained embedded microcontrollers.

http://sourceforge.net/projects/cjson/


Ditto. Found it easy to use and integrate. I needed something lightweight for a highly nested numerical model containing numbers and descriptive strings. I didn't want the JSON bit to have a big footprint, because the numerical computations make the software complex enough as it is.


I've used cJSON for a couple of projects as well. No complaints here.


If we're golfing:

https://github.com/quartzjer/js0n/blob/master/js0n.c

Appears to work the same way, though it doesn't bubble back type information.

(Also, the `goto * go[ * cur];` trick was pretty crazy the first time I saw it)


This is called computed (or assigned) goto and is a GCC extension (that is, not in the C standard): http://gcc.gnu.org/onlinedocs/gcc/Labels-as-Values.html
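
A toy example of the extension (GCC and Clang accept it; it won't compile as standard C): a tiny bytecode dispatcher that jumps through a table of label addresses.

  #include <stdio.h>

  int main(void) {
      static void *dispatch[] = { &&l_zero, &&l_one, &&l_done };
      const unsigned char prog[] = { 0, 1, 1, 0, 2 };
      const unsigned char *pc = prog;

      goto *dispatch[*pc];          /* the computed goto itself */
  l_zero:
      puts("zero");
      goto *dispatch[*++pc];
  l_one:
      puts("one");
      goto *dispatch[*++pc];
  l_done:
      return 0;
  }

Running it prints "zero", "one", "one", "zero", each step dispatched without a switch statement.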


Note that other compilers (such as Oracle Solaris Studio) support computed gotos as well:

  http://docs.oracle.com/cd/E19205-01/820-7598/bjabt/index.html
And LLVM:

  http://blog.llvm.org/2010/01/address-of-label-and-indirect-branches.html


Arg! "[0 ... 255] = &&l_bad" compiles as valid C; totally confused by this code. Is this some weird gcc-specific extension? Hm, probably part of the C99 array extensions. Checking...

Answer: C99, borrowed from a GCC extension. http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/Designated-Inits...


There are two GCC extensions in that snippet that are not in C99: && to take the address of a label, and ... to produce a range of indices. The bracketed form also happens to be a designated initializer (which is C99), but the whole piece is far from standard.
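
Here's my sketch of the two separately, in a js0n-style table-dispatch function (not js0n's actual code):

  /* The bracketed indices are C99 designated initializers; the "..."
   * range and "&&label" are the GCC extensions. Later initializers
   * override the range fill, which is how js0n builds its tables. */
  static int is_brace(unsigned char c) {
      static void *go[256] = {
          [0 ... 255] = &&l_no,     /* GCC range: fill the whole table */
          ['{'] = &&l_yes,          /* plain C99 designated initializers */
          ['}'] = &&l_yes,
      };
      goto *go[c];
  l_yes:
      return 1;
  l_no:
      return 0;
  }

So is_brace('{') returns 1, and anything outside the two overridden slots falls through to the range-filled default.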


Yeah, I was interested in the [0 ... 255] thing; the computed gotos were discussed in the other comment.

As for js0n, it's a great example of the cool things that can be done in gcc, but it isn't really useful since it only handles the first level of depth and has no typing. jsmn, with typing and nesting, is actually usable, and it also compiles a lot smaller than the big tables in js0n.


Looks good. For those of us shopping around, what advantages does Jsmn have over yajl?


It does no runtime memory allocation, and since it doesn't depend on libc and has a very small size, it can be used on highly resource-constrained embedded processors.
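
The usage pattern looks roughly like this (a sketch only; note that the jsmn_parse signature has changed over time, and older versions take no length argument):

  #include <string.h>
  #include "jsmn.h"

  /* All parser state lives on the stack, so malloc is never called. */
  int count_tokens(const char *json) {
      jsmn_parser p;
      jsmntok_t tokens[64];   /* token budget fixed at compile time */

      jsmn_init(&p);
      /* Returns the number of tokens used, or a negative error code,
       * e.g. JSMN_ERROR_NOMEM when 64 tokens isn't enough. */
      return jsmn_parse(&p, json, strlen(json), tokens, 64);
  }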


Sorry, but no. Shuffling around strings is all fine when you're writing JavaScript and have GHz and GB of RAM at your disposal, but that stuff just doesn't fly when you need to conserve RAM and especially need to have easily determined time and space constraints.

JSON just doesn't fit that profile. Neat implementation, but don't give the impression that any of this would help with embedded.


"embedded" is a big market and isn't always constrained to be hard-realtime. I spent years as an "embedded" developer at chumby industries and ended up having to use JSON fairly frequently.

Sometimes you don't get to choose what the source format is because you're consuming someone else's data feed, and if all they offer is JSON, you parse JSON. Also, on a 400-ish MHz ARM with 64 MB of RAM, parsing reasonably sized JSON data is no big deal in native code.

You need to further qualify your comments beyond just "embedded", because what you're now talking about (very low MHz, KB of RAM) is a niche within the larger embedded world.


Even with GHz and GB of RAM, I've seen enough C JSON parsers which leak memory over time. There are other issues too, like how efficient the memory manager (malloc or otherwise) is, etc. IMO, getting rid of memory allocation is definitely a shift toward making a reliable JSON parser.


Are you willing to categorically claim that there is not now, and never will be, a situation in which someone must process JSON and would prefer to do so with an extremely wimpy chip? And by "extremely wimpy" I mean the same thing you do: clock speeds measured in single- or double-digit MHz, code on something like NOR flash, and maybe a kilobyte or two of RAM if you're lucky.

Sure, it sounds like a bad idea, but it wouldn't be the first time someone has had to do something crazy for compatibility with someone else's stuff.


Why wouldn't this help with embedded? Are you saying it's never a good idea to use JSON for data transport within any embedded system? That would be a bold claim.


"highly resource-constrained embedded processors"

For me, that means code space and RAM measured in kilobytes, cycles in MHz. Most people don't realize that serializing data (most importantly, floating-point numbers) to strings is a complicated and time-intensive matter. Even a limited printf implementation can easily cost you many kilobytes. Not to mention it introduces you to C's most special hell: variable-length memory blocks containing strings.

The one thing that embedded gives you is a lot of control over your computing environment. Just sending around binary data is a very viable thing to do under these conditions. Not so for JSON; its single biggest selling point is that you can use it on every platform out there. It's the oldest tradeoff: giving up flexibility allows you to use more constrained processors (and save money).
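
To make the tradeoff concrete, a sketch (with a hypothetical record layout, not from any project in this thread):

  #include <stdint.h>
  #include <string.h>

  /* Hypothetical sensor record, just to illustrate the contrast. */
  struct reading {
      uint32_t timestamp;
      int16_t  temp_centi_c;   /* fixed point: 2345 means 23.45 C */
  };

  /* Binary: copy six bytes, constant time, no formatting code pulled
   * in. (A real protocol would also pin down byte order.) The JSON
   * version of the same record would need snprintf plus number-to-
   * string conversion, and a buffer whose size you can only estimate. */
  static int encode_binary(const struct reading *r, uint8_t out[6]) {
      memcpy(out,     &r->timestamp,    4);
      memcpy(out + 4, &r->temp_centi_c, 2);
      return 6;
  }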


Excellent points. Thank you for transmitting your wisdom.


Embedded machines these days often run at a gigahertz (or most of it), and have enough RAM to run Linux. Shuffling around strings is not a problem.


What embedded machines are we talking about? The processor in your smartphone is not resource-constrained, it's just power-constrained. That's a very different tradeoff.


We have successfully used Lua on a Blackfin processor in one embedded project. It worked just fine as long as you did not use it for real-time tasks.


I worked on a team that created embedded flight data recorders for U.S. Army helicopters that used JSON as a configuration format.


What is the purpose of JSON?

Does it have to do with efficiency?

Because if so, now we find ourselves discussing the resource requirements just to scan/tokenise and parse it to get it back into a human-readable form. Why did we translate it to a non-readable form in the first place? What were we trying to achieve?

Maybe we should let JSON be something the receiver translates text to (if they want that sort of format), not the sender. The receiver knows what resources she has to work with; the sender has to guess. The same principle applies to XML. By all means, play around with these machine-readable formats to your heart's content. But do it on the receiver side. No need to impose some particular format on everyone.

The "universal format" is plain text. The UNIX people realised this long ago. People read data as plain text, not JSON and not XML, not even HTML. No matter how many times you translate it into something else, using a machine to help you, it will, if humans are to read it, be translated back to plain text.

As for the "plain text haters", let us be reminded that UNIX can do typesetting. Professional quality typesetting. But that's the receiver's job.[1] There's a learning curve, sure, but what the receiver can produce using typesetting utilities on her own machine is worlds better than what a silly web browser can produce from markup.

1. I am so tired of dumping PDFs to text and images. PDF makes it seemingly impossible to scan through a large number of documents quickly. Ever been tasked with reading through 100 documents all in PDF format (i.e., scanned images from a photocopier)? What could be accomplished in minutes with BRE takes hours or even days to accomplish. This is a problem that persists year after year. OCR is a hack. In most cases, the text should never have been scanned to an image in the first place. The documents are being created on computers, not typewriters!

So, as I see it, if you were a plain text hater, and you were really sincere about making things look nice, then you would be a proponent of educating people how to do typesetting and sending them plain text, the universal format, that they can easily work with.

My solution to JSON and XML is sed. It works in all resource conditions and most times is just as fast as any RAM-hungry parser. If I need to do complex things, that's what lex and yacc are there for. Pipes and filters; small buffers. 'Nuf said.


Out of curiosity, what's the fastest JSON parser written in C out there?

Is it still YAJL?


Oj[1] benchmarks significantly faster, but I'm not clear if it's usable outside of a Ruby environment, as I am not aware of any non-gem distributions.

[1] http://www.ohler.com/oj/


Direct link to the source: https://bitbucket.org/zserge/jsmn/src/1caee52d37e3/jsmn.c

This looks good to me. It isn't going to be the fastest or the shortest (no, we aren't golfing), but it's simple and easy to understand.


How is this different from jansson? I have used jansson in the past, and it has served me very well.


jsmn is only one .h and one .c file; jansson is more.

jsmn only parses the JSON into tokens; you handle all the rest.

jansson provides its own hashmap etc.; jsmn doesn't.

I've just used jsmn for a current project of mine where I had already implemented e.g. a custom hashmap, and I didn't want to link two into my code. So the choice of the lean (not to say minimalist) jsmn came naturally. And I don't regret it :)


Hey. What do you think about this?

https://github.com/popee/libjason/blob/master/jason.rl

It is implemented with the Ragel state machine compiler, with no other dependencies. It's a modified version of libejson, simplified to handle only standard JSON. Also small. Btw, Ragel is a great utility ;-)



