URL Object Notation: A better JSON for URLs (vjeux.com)
54 points by vjeux on Sept 22, 2011 | 57 comments



  _foo_bar=Something&baz=Else
could parse as either of:

  {foo:{bar:"Something",baz:"Else"}}
  {foo:{bar:"Something"}, baz:"Else"}
which seems to ruin things. You need a character to end an object, maybe ; or & would work.

Also, I don't like using : in query strings as a separator since that makes it a bit ugly and not quite a query string. How about using = for both and using say ! to identify non-strings, and @item@item@item for arrays?

So your example becomes:

  _user_name=Bob%20Smith&age=!47&sex=M&dob=5/12/1956&pastimes=@golf@opera@poker@rap&;children_bobby_age=!12&sex=M&;sally_age=!8&sex=F
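A minimal sketch of how single values might be decoded under this proposal, assuming `!` marks a number and `@` prefixes each array item. `decode_value` is a hypothetical helper for illustration, not part of URLON:

```python
from urllib.parse import unquote


def decode_value(raw: str):
    """Decode one value under the proposed '!' / '@' markers (hypothetical rules)."""
    if raw.startswith("!"):
        # '!' flags a non-string; only numbers are handled in this sketch.
        text = raw[1:]
        return float(text) if "." in text else int(text)
    if raw.startswith("@"):
        # '@item@item@item' is read as an array of strings.
        return [unquote(item) for item in raw[1:].split("@")]
    # Anything else is a percent-encoded string.
    return unquote(raw)


print(decode_value("!47"))
print(decode_value("@golf@opera@poker@rap"))
print(decode_value("Bob%20Smith"))
```

A string that itself begins with `!` or `@` would need some escaping rule, which this sketch leaves out.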


> _foo_bar=Something&baz=Else

> could parse as either of:

> {foo:{bar:"Something",baz:"Else"}}

> {foo:{bar:"Something"}, baz:"Else"}

I don't think this is right, or at least, it shouldn't be right. I think _foo_bar=Something&baz=Else should only parse as {foo:{bar:"Something"}, baz:"Else"}.

{foo:{bar:"Something",baz:"Else"}} should be _foo_bar=Something&_foo_baz=Else

It should be possible for the writer of the article to fix this without major changes.
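
A rough sketch of that reading, assuming every key spells out its full path and a leading `_` marks a nested key. `parse_qualified` is a made-up name, and this is not how the URLON library itself parses:

```python
def parse_qualified(query: str) -> dict:
    """Build nested dicts from keys such as '_foo_bar' (each key carries its full path)."""
    result: dict = {}
    for pair in query.split("&"):
        key, _, value = pair.partition("=")
        # A leading '_' marks a nested path; without it the key is top-level.
        path = key[1:].split("_") if key.startswith("_") else [key]
        node = result
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return result


print(parse_qualified("_foo_bar=Something&baz=Else"))
print(parse_qualified("_foo_bar=Something&_foo_baz=Else"))
```

With this rule each query string has exactly one parse, at the cost of repeating the prefix. Key names that contain underscores cannot be represented in this sketch.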


> {foo:{bar:"Something",baz:"Else"}} should be _foo_bar=Something&_foo_baz=Else

This makes sense, but is not what I or the author was suggesting. Note that it is somewhat repetitive (foo appears twice). In contrast, the new suggestion simply has an optional ; to disambiguate these possibilities.


I thought that I could get away without terminations for objects & arrays but I was wrong. I've added ; as you suggested. I made it safe to remove ; at the end in order to keep size small for flat structures.

@item1@item2@item3 is a really good idea. It's added :)


Unreadable, ambiguous and just plain ugly.

First of all, in my understanding URLON is primarily made for URI fragments. No browser would send URLON-encoded data, and if you're doing AJAX, then URLs are barely a concern.

While the JSON is overly-verbose, the URLON is way too human-unfriendly:

    JSON: {"table":{"achievement":{"ascending":true,"column":"instance"}}}
    URLON: _table_achievement_ascending:true&column=instance
Humans suck at parsing grammars. We even suck with deeply nested parentheses (that's why editors highlight matching parens), and counting underscores is way more counter-intuitive.

Oh, and key names with underscores will look ugly.

If one wanted to be "URLish" and concise, one could write something like

    table[achievement[ascending=true&column=instance]]
While not perfect, at least, a human could understand what's going on from a first glance.


To me, this:

    _table_achievement_column=instance&ascending:true
is less clear than:

    table[achievement][column]=instance&table[achievement][ascending]=true
which also has the benefit of being parsed properly by most web frameworks.
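
For illustration, a small sketch of how the bracket form can be turned into a nested structure. Real frameworks ship their own parsers; `parse_bracketed` here is only a hypothetical stand-in built on the standard library:

```python
import re
from urllib.parse import parse_qsl


def parse_bracketed(query: str) -> dict:
    """Turn 'a[b][c]=v' style parameters into nested dicts (values stay strings)."""
    result: dict = {}
    for key, value in parse_qsl(query):
        # 'table[achievement][column]' -> ['table', 'achievement', 'column']
        path = re.findall(r"[^\[\]]+", key)
        node = result
        for part in path[:-1]:
            node = node.setdefault(part, {})
        node[path[-1]] = value
    return result


query = "table[achievement][column]=instance&table[achievement][ascending]=true"
print(parse_bracketed(query))
```

Note that `ascending` comes back as the string `"true"`: this notation carries no type information.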


> table[achievement][column]=instance&table[achievement][ascending]=true

Which is far more verbose than something like

    table[achievement[column=instance&ascending=true]]


The URL strings are mostly read only. Once generated, they are not going to be edited by the users anymore.

I find it better to have less readable output as long as there is a win in terms of length, because no one likes big URLs.

Also, what is less visible in URLON is the object structure. But usually when you want to edit something, you are only interested in changing the values, not the structure itself.


If you want to optimize length over readability, then how about:

    col=instance&asc=1
Or even better:

    c=instance&a=1
:)


It is still possible to make such a small structure with URLON:

  URLON.stringify({c: 'instance', a: 1}) == '_c=instance&a:1'
We also benefit from the typing: 1 is a number and not a string :)


@adeelk: 1, the number, is converted to :1, but "1", the string, is converted to =1. This way I can tell whether it's a string or number upon reconstruction.

You can actually run this on the blog page: URLON.stringify({c: 'instance', a: '1'}) == '_c=instance&a:1'. It will return false.
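
A sketch of the type marker for flat objects, written to mimic the output quoted in this thread (`:` before numbers and booleans, `=` before strings). `encode_flat` is a hypothetical helper and the real URLON.stringify may differ in details:

```python
from urllib.parse import quote


def encode_flat(obj: dict) -> str:
    """Encode a flat dict, using ':' for numbers/booleans and '=' for strings."""
    parts = []
    for key, value in obj.items():
        if isinstance(value, bool):
            # bool is checked first because bool is a subclass of int in Python.
            parts.append(f"{key}:{'true' if value else 'false'}")
        elif isinstance(value, (int, float)):
            parts.append(f"{key}:{value}")
        else:
            parts.append(f"{key}={quote(str(value), safe='')}")
    return "_" + "&".join(parts)


print(encode_flat({"c": "instance", "a": 1}))    # number
print(encode_flat({"c": "instance", "a": "1"}))  # string
```

The two calls produce different strings, which is what lets a parser restore the original type.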


But

    URLON.stringify({c: 'instance', a: '1'}) == '_c=instance&a:1'
as well, right?


Most web frameworks will never parse this because it's intended to be used within the fragment part of the URL which is never sent to the server.


The standard URL query-string syntax, supported by libraries in every programming language worth using, uses & and = to construct “key1=value1&key2=value2”-style parameters. URLON completely breaks that syntax: if I see something like

  _table_achievement_column=instance&ascending:true
my immediate reaction is “so there are two parameters, one called ‘table_achievement_column’, and one called ‘ascending’... wait... why isn’t there an equals sign between ‘ascending’ and ‘true’?” The syntax is just similar enough to query-string syntax that it can mislead someone trying to parse it by eye. And if you’re trying to pass complicated recursive structures (the kind of structures that JSON was invented to describe) through the URL, they’re not going to be parseable by eye in any format.


Right, cause no one ever uses an underscore in parameter identifiers.


In cases where I really need to have data structures in URLs, I rather like Rison: http://mjtemplate.org/examples/rison.html which has existed for quite some time now and has parsers/generators in Python (https://github.com/stdbrouw/python-rison), Ruby and JavaScript.


Thanks! I didn't know about Rison, it is trying to solve the same issue :)

Edit: I've added Rison to the list of translations in the article. It is far more readable than URLON but I find that it doesn't feel like it's a part of a URL.


If you know what kind of structure each argument will have, the variants O-Rison and A-Rison further cut down on the parentheses. For example:

    http://example.com/service?query=q:'*',start:10,count:10&pretty=false
Looks pretty natural to me.


> It is far more readable than URLON but I find that it doesn't feel like it's a part of an url.

Should it?


I like the concept. However JSON works because it can be directly converted into a JavaScript object and (without too much trouble) a JavaScript object can be converted to JSON.

I thus think that to succeed this approach needs some support on the server side to convert the notation into an object that can be interrogated from code. OK, that would need to be different for each server runtime but things like this tend to pick up support fairly quickly.

I use JSON to communicate with ASP.NET web services because the .NET runtime provides a great de-serializer that can convert the JSON directly to one of my server side classes and vice versa


I'm not sure I properly understand your second paragraph. Here's an answer: URLON supports both "stringify" and "parse" operations. So you can go both ways.

You can even do fun things like URLON -> Javascript Object -> JSON if you want to. The fact that it is 100% compatible with JSON is a great benefit.

As for my use, the URLON is in the hash part of the URL (after the #) so everything is running in the client. This is useful to give URLs that hold the current state of the page.


I thought the work you did was cool but honestly? Hashes are not the place to store page state. See http://www.webmonkey.com/2011/02/gawker-learns-the-hard-way-... for an example.

If you need to store page state RESTfully, I would suggest you create a page state resource and PUT your state there. If you need a page's state, ask for it by name. This also has the benefit of keeping URLs shorter and cleaner.


I'm not sure I understand what you mean. Could you tell me how I would implement that with a concrete example:

http://db.mmo-champion.com/items/#table__search_results_item...

I want to store: page number, sorted-column, reverse.


Create a new /items/pageState URI that accepts POST, PUT, and GET requests.

POST to /items/pageState to get a handle - this could be a randomly generated small sequence of characters. A response will contain a Location: header with a URL: /items/pageState/aA1 (the aA1 is just an example for this description - each user would get an unused sequence of characters)

Anytime the user changes the state of the page, PUT the page number, sorted-column, and reverse fields to /items/pageState/aA1.

Now, set the URL of the items page to http://db.mmo-champion.com/items/?pageState=aA1. When that page loads, the JS will make an Ajax GET request to /items/pageState/aA1 to fetch the state of the page, and re-render it appropriately.

If you're concerned about speed, well, don't be. Add support for ETags on /items/pageState, and while the state is unchanged, the server can answer with a 304 Not Modified so the data comes from the browser cache instead of being sent over the network again.


One state per user is not what I want. If the user sorts one way and shares a link, then sorts another way and shares another link, I want the two links to be different.

Also, your solution adds a lot of extra server calls. The goal of client-side sorting & pagination is to avoid those. I don't want to get them back just for the sake of being RESTful.


Ah, new constraints on the problem! :) Love it!

My solution does not have to add a lot of extra server calls - it all depends on the caching strategy you choose. For example, if you use maxAge, you would only be adding one extra request per new state created, or per new fetch; all subsequent fetches for the same state would automatically pull from the browser cache.

To satisfy the linking requirement, I would redesign my solution to just POST the state to /items/pageState, and get back a URI that represents only that state (which can be shared across all users). Combine this with the maxAge, and yes, you would have to make more requests, but the extra overhead would still be way less significant than adding a link to an image on every page (especially in terms of bandwidth)
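
A toy sketch of that redesign, with an in-memory dict standing in for the server. Deriving the handle from a hash of the state is my own assumption (it is one way to make equal states share a URI); `post_state` and `get_state` are hypothetical names:

```python
import hashlib
import json

_STATES: dict = {}  # in-memory stand-in for server-side storage


def post_state(state: dict) -> str:
    """Store a state and return its URI; identical states map to the same URI."""
    body = json.dumps(state, sort_keys=True, separators=(",", ":"))
    # Assumption: a truncated SHA-256 digest serves as the short handle.
    handle = hashlib.sha256(body.encode("utf-8")).hexdigest()[:8]
    _STATES[handle] = body
    return f"/items/pageState/{handle}"


def get_state(uri: str) -> dict:
    """Look up the state behind a URI returned by post_state."""
    handle = uri.rsplit("/", 1)[-1]
    return json.loads(_STATES[handle])


uri = post_state({"page": 2, "sort": "name", "reverse": False})
print(uri, get_state(uri))
```

Because a given URI never changes its content, it can be cached for as long as you like. A truncated hash can collide, so a real service would need a longer handle or a collision check.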

But, chacun à son goût! Everyone prefers their own tradeoffs :)


You have to store state in the URL. Otherwise how can people link to particular "states" within your application? There would be no way to address them without being part of the URL.


I outlined a way to do this nearby. :) State can be stored using query parameters. If you have enough state that URIs become unwieldy, then the method I suggested nearby would work very well.


Nobody turns JSON directly into a JavaScript object (I'm assuming you mean eval; otherwise, both are doing the same parsing). Security considerations have long since moved us away from that.

Also, this is really meant for web client apps that are maintaining their state in the URL (that crazy AJAX thing). The app would parse it and make whatever JSON requests are needed with the server.


So having a " in your URL is worse than having an &? Neither can appear literally in HTML, so you're going to have to deal with escaping whether you want to or not. And if you're already escaping & to &amp;, it's pretty easy to escape " to &quot;, which you should already be doing anyway. (Never put a literal [&<>"'] in your (x)HTML document, please.)


This smells bad. Passing in a load of JSON (or URLON) seems to go against RESTful practices and really reduces the hackability of the URL.


I thought that the URL structure is orthogonal to whether the service is RESTful or not. I know that RESTful services tend to have nice URL formats but, as far as I understand it, you really shouldn't be caring about URL formats - you have a single entry point and everything is driven by "hypertext" from there:

http://roy.gbiv.com/untangled/2008/rest-apis-must-be-hyperte...


This has been designed to be used in a hash context (after the #), so it's not used in a REST application.


One more point - `JSON.stringify` throws an error when attempting to stringify a circular structure such as the `window` object...

I tried `URLON.stringify(window)` and nothing happened, no error or return value (bad sign) so hit it a few more times, saw a couple stack overflow errors, and the tab crashed.

If possible it should definitely throw a RangeError instead of attempting to stringify a circular structure.


You can use Douglas Crockford's cycle tools to deal with this.

https://github.com/douglascrockford/JSON-js/blob/master/cycl...

Since the only way to know whether an object has already been visited in JavaScript takes linear time, cycle detection becomes an O(n^2) process. Unfortunately, this is way too costly to add by default.
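
To make the cost concrete, here is a sketch of the check in Python rather than JavaScript. It is not Crockford's code: `has_cycle` is a hypothetical function that compares each container against its chain of ancestors by identity, which is the linear scan described above:

```python
def has_cycle(value, ancestors=None) -> bool:
    """Return True if a dict/list structure refers back to one of its own ancestors."""
    if not isinstance(value, (dict, list)):
        return False
    ancestors = ancestors if ancestors is not None else []
    # Linear identity scan over the ancestor chain: this per-node cost is what
    # adds up to O(n^2) in the worst case (one long, deeply nested chain).
    if any(seen is value for seen in ancestors):
        return True
    ancestors.append(value)
    children = value.values() if isinstance(value, dict) else value
    found = any(has_cycle(child, ancestors) for child in children)
    ancestors.pop()
    return found


loop = {"name": "root"}
loop["self"] = loop
print(has_cycle(loop), has_cycle({"a": [1, {"b": 2}]}))
```

Python could replace the scan with a set of `id()` values; the list is kept here only to mirror the cost being discussed.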


Thanks for the link, I see what you mean about costly.

Perhaps a way to hack-fix this is to set a parameter for maxRecursion, to prevent the errors - seems like a pretty solid way of preventing endless loops (RangeErrors) but is probably something that should be off-by-default and switched on globally via a setting or once via a parameter...
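
Roughly, such a guard could look like the following sketch (in Python for brevity; `check_depth` and `max_depth` are made-up names standing in for the proposed maxRecursion setting):

```python
def check_depth(value, max_depth: int, depth: int = 0) -> None:
    """Raise ValueError if dicts/lists nest deeper than max_depth (hypothetical guard)."""
    if not isinstance(value, (dict, list)):
        return
    if depth > max_depth:
        raise ValueError(f"structure nests deeper than {max_depth} levels")
    children = value.values() if isinstance(value, dict) else value
    for child in children:
        check_depth(child, max_depth, depth + 1)


circular = {}
circular["self"] = circular
try:
    check_depth(circular, max_depth=10)
except ValueError as error:
    print("rejected:", error)
```

A circular structure always exceeds the limit eventually, so it is rejected with a clear error instead of overflowing the stack. The trade-off is that legitimately deep input is rejected too.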

I can add it and submit a pull-request for your consideration if you like?


http://jsperf.com/object-cycle-detection

I've added a jsPerf to see how badly it performs. For small objects it's reasonable, but if the object has 10,000 elements the JavaScript test takes 1 second, which is not acceptable.

I'd rather point people to a cycle detection library if their input may be circular.


A slight mod can make it easier to read/parse. First the rules: everything is a string unless specified like num# date% bool! list* and objects are delimited by : and ;

Here is an example:

    user:name=Bob,age#=47,sex=M,dob%=5-12-1956;pastimes*:golf,opera,poker,rap;children:bobby:age#=12,sex=M;sally:age#=8,sex=F


I think it would be easier to declare that several forms like (true, false, ISO-8601 strings or decimal numbers) are "special" and if there's a string that looks like them it must be specifically noted as so. This complicates parsing, but feels more natural.

In my understanding, most of the time there's an occurrence of "age=47", it actually means {"age": 47}, not {"age": "47"}. But if we'd want to represent the latter, we could easily do it with "age*=47".

So it'll be something like:

    user:name=Bob,age=47,sex=M,dob=1956-12-05;pastimes:golf,opera,poker,rap;children:bobby:age=12,sex=M;sally:age=8,sex=F
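A minimal sketch of that inference rule for a single key/value pair, assuming a trailing `*` on the key forces a string. `infer_value` is a hypothetical helper and only handles the forms listed above:

```python
import re
from datetime import date


def infer_value(key: str, raw: str):
    """Return (key, value), inferring the type unless the key ends with '*'."""
    if key.endswith("*"):
        return key[:-1], raw  # '*' marks "keep this as a plain string"
    if raw in ("true", "false"):
        return key, raw == "true"
    if re.fullmatch(r"-?\d+", raw):
        return key, int(raw)
    if re.fullmatch(r"-?\d+\.\d+", raw):
        return key, float(raw)
    try:
        return key, date.fromisoformat(raw)  # e.g. 1956-12-05
    except ValueError:
        return key, raw


for pair in "name=Bob,age=47,zip*=02134,dob=1956-12-05".split(","):
    print(infer_value(*pair.split("=", 1)))
```

The `zip*` pair (an invented example) shows why the escape hatch matters: without it the leading zero would be lost.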


Btw, people complain about the usability or applicability of such a format, but it is more about the mental challenge of solving a problem cleanly and the steps it takes to reach the best solution. Even if the problem is non-existent or silly, the neural exercise is worth it.


'#' denotes a fragment in a URL. Browsers will only send 'user:name=Bob,age' to the server.
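
This can be checked with Python's standard URL parser. The host below is a placeholder, and the query is the start of the example from the parent comment:

```python
from urllib.parse import urlsplit

# example.com is a placeholder host used only for illustration.
parts = urlsplit("http://example.com/?user:name=Bob,age#=47,sex=M")

print(parts.query)     # the part a browser would send: user:name=Bob,age
print(parts.fragment)  # the part that stays in the client: =47,sex=M
```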


Then let's use $, we still have a few chars available


So the main advantage over Rison is that it should urlify correctly in as many clients as possible? It would be good to start a test matrix to verify this.


Are you planning to make a server-side implementation so that these URLs could be parsed on the server?

Love the idea.


If your server side is written in Javascript, you can use the implementation here:

https://github.com/vjeux/URLON/blob/master/urlon.js

If not, someone needs to write a URLON library for your language.


Why would you want JSON (or HTML) in your URLs? Example?


I want to store the current state of the page. For example, if there's a table in the page, I want to keep track of which page I'm on, which column I sorted by, and so on.

http://db.mmo-champion.com/items/2/#table__search_results_it...


If that's all you're trying to do, then why not store the json using local storage and a randomly generated key, and just put the key into the url's fragment?


The URL has to be shareable. If I give the URL to someone else with your technique, he won't be able to see the same page.


Yeah, that's a valid point. Requirements matter :)


Why wouldn't you just base64-encode your JSON, include that in the URI, and then base64-decode it when you want to de-JSON it?


I've tried many things but I wanted it to be readable, small, and good looking.

JSON:

    {"user":{"name":"Bob Smith","age":47,"sex":"M","dob":"5-12-1956"},"pastimes":["golf","opera","poker","rap"],"children":{"bobby":{"age":12,"sex":"M"},"sally":{"age":8,"sex":"F"}}}
Base64:

    eyJ1c2VyIjp7Im5hbWUiOiJCb2IgU21pdGgiLCJhZ2UiOjQ3LCJzZXgiOiJNIiwiZG9iIjoiNS0xMi0xOTU2In0sInBhc3RpbWVzIjoNClsiZ29sZiIsIm9wZXJhIiwicG9rZXIiLCJyYXAiXSwiY2hpbGRyZW4iOnsiYm9iYnkiOnsiYWdlIjoxMiwic2V4IjoiTSJ9LA0KInNhbGx5Ijp7ImFnZSI6OCwic2V4IjoiRiJ9fX0=
JSON + URIEncode

    %7B%22user%22:%7B%22name%22:%22Bob%20Smith%22,%22age%22:47,%22sex%22:%22M%22,%22dob%22:%225-12-1956%22%7D,%22pastimes%22:%5B%22golf%22,%22opera%22,%22poker%22,%22rap%22%5D,%22children%22:%7B%22bobby%22:%7B%22age%22:12,%22sex%22:%22M%22%7D,%22sally%22:%7B%22age%22:8,%22sex%22:%22F%22%7D%7D%7D
URLON

    _user_name=Bob%20Smith&age:47&sex=M&dob=5-12-1956;&pastimes@=golf@=opera@=poker@=rap;&children_bobby_age:12&sex=M;&sally_age:8&sex=F
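For anyone who wants to reproduce the first three encodings and compare their lengths, here is a small Python sketch. URLON itself is left out because I only know of the JavaScript implementation, and the exact counts depend on whitespace in the JSON:

```python
import base64
import json
from urllib.parse import quote

data = {
    "user": {"name": "Bob Smith", "age": 47, "sex": "M", "dob": "5-12-1956"},
    "pastimes": ["golf", "opera", "poker", "rap"],
    "children": {"bobby": {"age": 12, "sex": "M"}, "sally": {"age": 8, "sex": "F"}},
}

# Compact JSON with no spaces after separators, as in the example above.
compact = json.dumps(data, separators=(",", ":"))
encodings = {
    "JSON": compact,
    "Base64": base64.b64encode(compact.encode("utf-8")).decode("ascii"),
    # ':' and ',' are left unescaped to match the URI-encoded example above.
    "JSON + URIEncode": quote(compact, safe=":,"),
}
for name, text in encodings.items():
    print(f"{name}: {len(text)} characters")
```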


You can translate the base64 version into a readable form from the terminal using widely available libraries. For example, using a Python interpreter you can do this:

  >>> from base64 import b64decode
  >>> from json import loads, dumps
  >>> print dumps(loads(b64decode('eyJ1c2VyIjp7Im5hbWUiOiJCb2IgU21pdGgiLCJhZ2UiOjQ3LCJzZXgiOiJNIiwiZG9iIjoiNS0xMi0xOTU2In0sInBhc3RpbWVzIjoNClsiZ29sZiIsIm9wZXJhIiwicG9rZXIiLCJyYXAiXSwiY2hpbGRyZW4iOnsiYm9iYnkiOnsiYWdlIjoxMiwic2V4IjoiTSJ9LA0KInNhbGx5Ijp7ImFnZSI6OCwic2V4IjoiRiJ9fX0=')), indent=2)
  {
    "pastimes": [
      "golf", 
      "opera", 
      "poker", 
      "rap"
    ], 
    ## [etc.]
  }


The goal is to use it in a URL. I don't want my URLs to look like this:

  http://db.mmo-champion.com/items/#eyJ1c2VyIjp7Im5hbWUiO
  iJCb2IgU21pdGgiLCJhZ2UiOjQ3LCJzZXgiOiJNIiwiZG9iIjoiNS0x
  Mi0xOTU2In0sInBhc3RpbWVzIjoNClsiZ29sZiIsIm9wZXJhIiwicG9
  rZXIiLCJyYXAiXSwiY2hpbGRyZW4iOnsiYm9iYnkiOnsiYWdlIjoxMi
  wic2V4IjoiTSJ9LA0KInNhbGx5Ijp7ImFnZSI6OCwic2V4IjoiRiJ9f
  X0


If you want to pack that much information into a URL, it’s going to look ugly no matter how it’s formatted.

(I assume you have some reason for not just passing a session key in the URL and keeping all the relevant state on the server.)


Fair enough, though json | base64 is the easiest for others to implement

json | gzip | base64 saves another 33 bytes :)

and you already know urlon is the shortest
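
A sketch of the compressed pipeline with a round trip, using Python's standard library. It uses a zlib stream in place of the gzip container (the header is smaller) and the URL-safe base64 alphabet; `pack` and `unpack` are made-up names:

```python
import base64
import json
import zlib


def pack(state: dict) -> str:
    """JSON -> deflate (zlib) -> URL-safe base64 without padding."""
    raw = json.dumps(state, separators=(",", ":")).encode("utf-8")
    compressed = zlib.compress(raw, 9)
    # URL-safe alphabet uses '-' and '_' instead of '+' and '/'.
    return base64.urlsafe_b64encode(compressed).decode("ascii").rstrip("=")


def unpack(token: str) -> dict:
    """Reverse of pack: restore the padding, decode, inflate, parse."""
    padded = token + "=" * (-len(token) % 4)
    return json.loads(zlib.decompress(base64.urlsafe_b64decode(padded)))


state = {"page": 2, "sort": "name", "reverse": False}
token = pack(state)
print(token, unpack(token) == state)
```

For very small states the compression overhead can outweigh the savings, so it is worth measuring on real data.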


That would be the most sane solution, I think. Is there an advantage to having a readable URI? Browsers are in the process of deprecating the address bar already.



