
Arguments against JSON-driven development - afroisalreadyin
http://okigiveup.net/arguments-against-json-driven-development/
======
wtbob
> The fundamental advice on Unicode is decode and encode on system boundaries.
> That is, you should never be working on non-unicode strings within your
> business logic. The same should apply to JSON. Decode it into business logic
> objects on entry into system, rejecting invalid data. Instead of relying on
> key errors and membership lookups, leave the orthogonal business of type
> validity to object instantiation.

This right here is the correct approach. Serialisation formats should be
serialisation formats, whether they be JSON, S-expressions, protobufs, XML,
Thrift or what-have-you; application data should be application data. There
are cases where it makes sense to operate on the serialised data directly, for
performance or because it makes sense in context, but in the general case
operate on typed application values.
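
A minimal Python sketch of that boundary decode; the `Book` class and its fields are invented for illustration:

```python
import json

class Book:
    """Business-logic object; invalid data is rejected at construction."""
    def __init__(self, book_id, title, count):
        if not isinstance(book_id, int) or count < 0:
            raise ValueError("invalid book data")
        self.book_id = book_id
        self.title = title
        self.count = count

def decode_books(payload):
    """System boundary: JSON in, typed objects out."""
    return [Book(d["book_id"], d["title"], d["count"])
            for d in json.loads(payload)]

books = decode_books('[{"book_id": 1, "title": "Dune", "count": 3}]')
```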

~~~
qwertyuiop924
Part of the problem is that JSON and Sexprs AREN'T serialization formats.
They've been pressed into service as such, but they are actually notations for
data structures: in Python, it may not be idiomatic to crawl dicts like this,
but in JS, those aren't dicts, they're objects. If they've been de-serialized
to some degree, they may even have their own methods.

By the same token, in Lisp, Sexprs aren't a serialization format. They're a
notation for the linked cons cells that Lisp data is made of. In Lisp, that
Sexpr will be crawled for data, or maybe even executed.

So while in Python, both may _seem_ to be serialization formats, they aren't.

Either way, if the application programmer has any sense, they'll abstract away
the format of their data. In a lisp app, you won't be cdring down a sexpr,
you'll be calling a function to grab the necessary data for you, usually from
a set of functions that abstract away the underlying sexpr implementation, and
treat whatever it is as a separate datatype.

Of course, the sexpr might have been fed to an object constructor. Heck, it
might _be_ an object constructor, or a struct constructor. All of those types
typically provide O(1) access, and autogenerated access functions, so it's the
same story.

~~~
recursive
From the official source: "JSON (JavaScript Object Notation) is a lightweight
data-interchange format. "

[http://json.org/](http://json.org/)

It says nothing about being pressed into service. This is the authoritative
source.

~~~
dspillett
From the official source: "Democratic People's Republic of Korea"

[http://www.korea-dpr.com/](http://www.korea-dpr.com/)

It's democratic, and for the people. It says nothing about being a
totalitarian dictatorship.

Not everything is/does what it says on the tin. You don't have to agree with
official or otherwise authoritative sources without question.

(not that I wish in any way to compare Mr Crockford or whoever runs json.org
with DPRK or its leadership - I'm just using a deliberately extreme example to
highlight that what is written may not be what is, at least not from
absolutely everyone's point of view)

~~~
oldmanjay
I hope you feel clever, because in context, this is far too inapplicable to
the discussed reality of JSON to be an actual point.

~~~
dspillett
_> I hope you feel clever_

I generally do, thanks, though I'm not any sort of genius by any measure.

 _> because in context, this is far too inapplicable to the discussed reality
of JSON to be an actual point._

You seem to be missing a bit of an intentional context switch. The comment was
more about the logic of the GP's response to "<something> isn't really X"
which was "yes <something> is X, it says so on <something>'s home page", than
it was about <something> or X in particular.

So it was relevant in the context of the discussion and the facts being used
for reference but not, as you call out, in the context of _the subject of the
discussion_ (hence my somewhat defensive clarification of intent in the last
sentence)

------
sidlls
I'm not entirely in agreement.

Use of object-oriented programming paradigms here would merely distribute the
logic that is necessary to achieve the desired mapping over multiple points in
the code.

The example function presented is only marginally too complicated. I'd split
it in two: one to obtain the book list given the same arguments as the example
function, and one taking the result as its only argument to build the mapping.

I find myself shying away from rigorous adherence to encapsulation more and
more these days. I prefer small functions that operate on data explicitly.

Edit: and I'm a bit confused how the example has anything to do with "JSON-
driven development", other than the coincidence that a hash/dictionary is the
core data structure being manipulated here. This example function could exist
and be (mostly) reasonable had JSON never existed. I'd expect to see an
argument that the JSON serialization schemes that abound are problematic,
given the title.

~~~
milesvp
This. I've been programming this way for over a decade. Long before JSON was a
thing. I find I rarely need anything more than a list or a dict for most of the
data manipulation I do. Being on the web has only strengthened my tendency for
this, since everything ends up being stringly typed anyways. Nearly every
function/API I write is: get some data from somewhere (hopefully serialized),
manipulate the data, return data (very possibly serialized). Nearly every time
I've seen coworkers try to improve things with classes, it complicates the
code, and often adds little encapsulation given how much we do is reliant on
external data sources.

Every once in a while I think how nice it would be to be able to use typed
data and smart setters to avoid much of the bounds checking I have to do, but
I find there's never enough code between the boundaries of serialization to
make it worth the added complexity that this introduces (also my problem
domain involves mostly copy so most things are basically strings, ints, or
datetimes anyways).

~~~
jowiar
Lists are lists, and I have no issues there. It's dicts as objects (often
nested) where things get hairy. I often see folks end up relying on internal
implementation details of other libraries, or other data sources, and things
can subtly break (or explode in a ball of fire).

The more systems I've built, the more I've wanted to have very well-defined
seams between "inside" and "outside" -- well-defined interfaces with external
APIs, libraries, databases, systems that may be maintained at a different
speed, etc. Having an explicit translation/serialization/etc. step forces you
to do this. It's not the only way, but pretty much every codebase I've
interacted with that doesn't do this gets careless, and it can get really
messy when things get any longer than a "script".

~~~
sheepmullet
> I often see folks end up relying on internal implementation details of other
> libraries, or other data sources, and things can subtly break

I'm not sure how you can get around this as a consumer of a service? How do
you know what is an internal implementation detail? Why are they exposing
implementation details?

As a producer of a service there are lots of techniques. E.g.

- Only expose data and provide a spec for that data.

- Provide "helper" classes in target languages for consumers to use.

- Publish an API spec + guarantees.

- Always maintain backwards compatibility.

~~~
jowiar
As a consumer, you're not going to get around it. But what you can do is
quarantine it to a specific area within your application. I find dependencies
to be a very natural place to divide up an application: at the very least,
consider "What would have to change if I ripped this thing out entirely?" and
"What are the methods I would need to implement?", then write a translation
layer implementing that interface.

As a hypothetical, let's say you needed to implement a binary persistence
layer in your application. Rather than interacting with S3 in 10 different
places, you define a "BlobStore" interface with the methods that you need,
code against that interface, then implement an S3BlobStore that handles the
calls to Amazon using whatever library necessary.
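
A Python sketch of that seam; the interface and method names are invented, and an `S3BlobStore` would implement the same methods with whatever AWS library is in use:

```python
import abc

class BlobStore(abc.ABC):
    """The interface application code is written against."""

    @abc.abstractmethod
    def put(self, key, data): ...

    @abc.abstractmethod
    def get(self, key): ...

class InMemoryBlobStore(BlobStore):
    """Trivial implementation; handy for tests, swappable for S3."""
    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]
```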

------
Cthulhu_
I disagree with the anemic object argument. If an object is just there to
store data and no behaviour, then that's fine - don't add behaviour if it
doesn't need it. A large portion of back-end services are CRUD and data
wrangling operations anyway - as in, convert data format A to data format B
(which I guess could be a constructor or factory method if you're comfortable
with having the conversion logic in a data class).

~~~
clifanatic
> If an object is just there to store data and no behavior

Then why do you have it at all?

~~~
Bognar
To define a valid shape for related data.

------
lmm
The main reason this happens in Python is that creating actual datatypes is
incredibly clunky (by Python standards) because of the tedious "def
__init__(self, x): self.x = x". The solution here is to have a very
lightweight syntax for more specific types, e.g. Scala's "case class".
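
For illustration, here's the tedium in question next to the stdlib's lighter alternative (attrs, discussed elsewhere in this thread, goes further still):

```python
import collections

# The clunky way: every field is spelled out three times.
class PointVerbose:
    def __init__(self, x, y):
        self.x = x
        self.y = y

# The stdlib's lightweight answer, closer in spirit to a Scala case class:
Point = collections.namedtuple("Point", ["x", "y"])

p = Point(x=1, y=2)
```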

I'd also argue for using thrift, protobuf or even WS-* to put a little more
strong typing into what goes over the network. Such schemata won't catch
everything (they have to have a lowest-common-denominator notion of type) but
distributed bugs are the hardest bugs to track down; anything that helps you
spot a bad network request earlier is well worth having.

~~~
aeruder
An article about the "attrs" library was posted here a couple weeks ago.
Really highlighted the tedium of Python objects while offering a neat
solution.

[https://glyph.twistedmatrix.com/2016/08/attrs.html](https://glyph.twistedmatrix.com/2016/08/attrs.html)

Regarding protobuf, I'm a bit disappointed with the direction of version 3.
Fields can no longer be marked as required - everything is optional; i.e.
almost every protobuf needs to be wrapped with some sort of validator to
ensure that necessary fields are present. I understand the arguments, but I
did enjoy letting protobuf do the bulk of the work making sure fields were
present.
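
A sketch of the kind of validator wrapper this implies; `msg` here is assumed to be any parsed message object with the named attributes, and the field names are hypothetical:

```python
# With proto3, every field is optional, so presence has to be checked by hand.
def require_fields(msg, fields):
    missing = [name for name in fields if not getattr(msg, name, None)]
    if missing:
        raise ValueError("missing required fields: " + ", ".join(missing))
    return msg
```

(Note this treats falsy values such as 0 or "" as missing, which matches how proto3 defines field presence for scalars.)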

~~~
tantalor
Required fields are bad; don't use them.

 _You should be very careful about marking fields as required. If at some
point you wish to stop writing or sending a required field, it will be
problematic to change the field to an optional field – old readers will
consider messages without this field to be incomplete and may reject or drop
them unintentionally. You should consider writing application-specific custom
validation routines for your buffers instead._

[https://developers.google.com/protocol-
buffers/docs/proto#sp...](https://developers.google.com/protocol-
buffers/docs/proto#specifying-field-rules)

~~~
lmm
They're a tradeoff. Sometimes you really are confident enough that this
attribute will be required forever that the saving of not having to write
custom validation is worth it.

------
mhd
This basically repeats the ORM arguments/counter-arguments, but now it's a
slightly more complex data structure instead of the DB-row-as-hash/array you
get there. "row-driven" in this context often leads to barely wrapped DAO
Objects.

On the other hand, sometimes (surprisingly often) a hash is good enough and
the effort spent in modeling the database (...) doesn't need to be replicated.

And as with ORMs/SQL generators/DAOs/etc., there's a whole spectrum of
solutions and you really have to look at the task to see what's appropriate...

------
mythz
This isn't JSON-driven development, it's just choosing to apply logic over
loose-typed data structures instead of named constructs. It's more awkward in
Python because it doesn't have sugar syntax to index an object like JavaScript
has.

But using clean built-in data structures instead of named types has its
benefits, especially if you need to serialize for persistence or
communication, as it doesn't require any additional knowledge of Types in
order to access serialized data. You can happily consume data structures in
separate processes without the additional dependency of an external type
system that's coupled to, and needs to be carried along with, your data.

This is why Redux uses vanilla data structures in its store, and why JSON has
become popular for data interchange: any valid JSON can be converted into a
JavaScript object with just `JSON.parse()`, which saves a tonne of ceremony
and manual effort compared to the old-school way of having to extract data
from formats with poor programmatic fit, like an XML document, into concrete
types.

If your data objects don't need to be serialized or accessed outside of the
process boundary then there's little benefit to using loose-typed data
structures, in which case my preference would be using classes in a static
type system to benefit from the static analysis feedback of using Types.

~~~
Nullabillity
> as it doesn't require any additional knowledge of Types in order to access
> serialized data

You still need to know the shape of the data you're working with, or you won't
get anything useful done. So you can't skip defining types or a format, you're
just skipping the tools that help you follow said format.

~~~
mythz
You only need to know how to access the data you need, not the entire class
structure that's coupled to the monolith that created it.

------
mcms
Anemic objects, and whether they are harmful or harmless, have been debated in
software engineering for a long time.

I find over-relying on encapsulation more harmful than useful nowadays,
especially if you are writing scalable software that is inherently
distributed. For example, hiding database access behind a simple getter
function makes another programmer ignore the performance implications and
other issues that may arise.

~~~
qwertyuiop924
Yes, but OTOH, it lessens the likelihood of errors, and means you'll have to
rewrite a minimal amount of code when you, say, switch from MySQL to Postgres.

Abstraction always lessens awareness of that which is abstracted. Decide where
to draw the line for your app.

------
Millennium
It sounds to me like these arguments aren't so much against JSON, per se.
They're against using JSON.parse() (or json.loads() in Python, json_decode()
in PHP, or whatever) as your entire data-import process.

Instead, the argument goes, one should load the JSON, walk the resulting
structure, and use it to build your native data structures/objects/whatever.
Similarly, when the time comes to save, you crawl through your native
structure to build a dict/array/primitive structure, then call
JSON.stringify() (or the analogous function) to serialize that.
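
In Python, that round trip might look like this (the `Shop` class and its fields are invented for illustration):

```python
import json

class Shop:
    def __init__(self, label, books):
        self.label = label
        self.books = books  # list of (book_id, count) pairs

    @classmethod
    def from_dict(cls, d):
        # Walk the parsed structure and build the native object.
        return cls(d["label"], [(b["id"], b["count"]) for b in d["books"]])

    def to_dict(self):
        # Crawl the native object back into primitives before serializing.
        return {"label": self.label,
                "books": [{"id": i, "count": c} for i, c in self.books]}

raw = '{"label": "A", "books": [{"id": 1, "count": 2}]}'
shop = Shop.from_dict(json.loads(raw))
out = json.dumps(shop.to_dict())
```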

Uncoupling your data structure from the serialization format, though, is
really just basic good software design anyway, is it not? Does anyone argue in
favor what this article calls "JSON-driven development" as a design principle?
Or is it just a shortcut that developers -and I am no less guilty of this than
anyone else- sometimes take in the interest of getting a quick-and-dirty
solution out the door?

Yes, working directly on the output of JSON.parse() is a code smell. But I'm
not sure that claiming there's a rising trend of "JSON-driven development" is
entirely founded. It's just people taking shortcuts.

~~~
ramblenode
This. "${PRACTICE}-driven development" suggests a practice that someone
actively pursues because of perceived merit rather than a shortcut taken
because of time/resource constraints.

------
jowiar
With dynamic languages (certainly Python, Ruby, and JS), there's a definite
lack of nudge from the tooling to translate from the "interchange" format into
an internal "smart" format (procedural code + everything is a hash = easy
hacks). Whereas with something like Scala, the tools make it very clear that
if you don't serialize/deserialize on the periphery, you're in for a bag of
hurt (or, at the very least, fighting with one hand tied behind your back).

This is not to say that one tool is better than the others, but tools do have
opinions, and while being more permissive/ambivalent makes throwing together a
quick script easier, a tool that nudges you in the direction of building
something in a more-sustainable way is useful when building nontrivial
systems.

------
micimize
While I see the point Ulaş is getting at, I wouldn't call this JSON-driven
development. I think JSON-driven development would use abstraction layers that
are based on JSON, like JSON Schema, and perhaps an OOP library that leverages
it.

What I'd actually call this problem is a lack of abstraction. In functional
programming, simple data structures are often preferred, and composable
functions are used to manage complexity. A functional programmer might declare
a function `to_structured_dict(enumerable, path)` and call it with
`to_structured_dict(book_list, path=('shop_label', 'cell_label', 'book_id',
'count'))`.
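
One possible sketch of such a function, guessing at the semantics micimize intends (every path element but the last is a nesting key; the last names the leaf value):

```python
def to_structured_dict(enumerable, path):
    """Fold flat records into a nested dict keyed by path[:-1];
    the final path element names the leaf value."""
    result = {}
    *keys, leaf = path
    for record in enumerable:
        node = result
        for key in keys[:-1]:
            node = node.setdefault(record[key], {})
        node[record[keys[-1]]] = record[leaf]
    return result

book_list = [
    {"shop_label": "A", "cell_label": "c1", "book_id": 1, "count": 3},
    {"shop_label": "A", "cell_label": "c1", "book_id": 2, "count": 5},
]
nested = to_structured_dict(
    book_list, path=("shop_label", "cell_label", "book_id", "count"))
```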

~~~
qwertyuiop924
And then EVERY programmer would further abstract that, as you really want the
bare minimum of your code depending on your data structure internals.

------
falcolas
If you're in Python, and are afraid of "anemic" objects, I would recommend
checking out collections.namedtuple. It's a fantastic lightweight and
performant object-like data structure.

You also get a few additional features: in-order iteration, fields that are
fixed at creation, and a method for turning it into an ordered dictionary
(which is serializable in, wait for it, JSON).
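
A small illustration of those features (the fields are invented):

```python
import collections
import json

Book = collections.namedtuple("Book", ["shop", "book_id", "count"])
b = Book(shop="A", book_id=42, count=3)

count = b.count                    # attribute access instead of key lookups
values = list(b)                   # in-order iteration over the fields
payload = json.dumps(b._asdict())  # ordered dict straight to JSON
```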

~~~
ajdlinux
If you're not limited to the standard library,
[https://github.com/hynek/attrs](https://github.com/hynek/attrs) is also worth
taking a look at.

~~~
StavrosK
I just found out about it in this article, and already posted a new story (it
looks that good):

[https://news.ycombinator.com/item?id=12359522](https://news.ycombinator.com/item?id=12359522)

------
Singletoned
> Once you go dict, you won't go back. This style of development is too easy,
> since dictionaries are baked into Python, and there are many facilities for
> working effectively with them.

How is this an argument against using dictionaries?

After 10 years of Python development, I do find myself using dictionaries
rather than objects, in just the way that the author proscribes, but I'm
finding it to be a genuine pleasure.

~~~
st3v3r
And what happens when one of those keys changes?

~~~
niftich
You change your code. Loose coupling is a nice goal to aim for, but at the end
of the day, somewhere deep down inside the code, you have to couple tightly to
actually get anything done. Where that transition occurs is entirely at the
programmer's discretion.

~~~
st3v3r
But there's a difference between changing it once when you
serialize/deserialize it, and changing it every time you try to access the
key.

~~~
sidlls
Writing the code with the assumption that a key will change is on the same
level with premature optimization, in my opinion.

It might be reasonable to make the assumption for some keys. In these cases a
function taking the data and key as arguments and returning the value is
sufficient and appropriate.

Frankly, if the code base is so littered with references to that specific data
and key combination it might actually indicate a poor design.

~~~
st3v3r
"Writing the code with the assumption that a key will change is on the same
level with premature optimization, in my opinion."

Considering how easy it is, and how often I've had keys change on me, I have
to strongly disagree.

"Frankly, if the code base is so littered with references to that specific
data and key combination it might actually indicate a poor design."

That's kinda the point of the article.

------
beat
I think this article is somewhat off-base. The problem isn't JSON, it's lack
of respect for separation of duties. JSON is just a data exchange format.

Want to program in an OO way using JSON? Easy. Just build a factory to
generate objects from JSON input. Put your validation and error handling right
there. Now you can get a known valid object from the JSON, a class instance
with all the encapsulation and business logic your heart desires. Need to
share it with the outside world? Provide a JSON output method.

Translating data formats is at the heart of day-to-day programming. It ain't
rocket surgery. Fix the problem, not the blame.

(And if you think JSON sucks, believe me, you never dealt with data file
formats from the pre-XML days!)
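
A minimal sketch of such a factory; the `Order` class, its fields, and the validation rule are all invented for illustration:

```python
import json

class Order:
    def __init__(self, order_id, total):
        # Validation and error handling live here, at construction.
        if total < 0:
            raise ValueError("total must be non-negative")
        self.order_id = order_id
        self.total = total

    @classmethod
    def from_json(cls, text):
        # The factory: JSON in, known-valid object out.
        data = json.loads(text)
        return cls(order_id=data["id"], total=data["total"])

    def to_json(self):
        # JSON output for sharing with the outside world.
        return json.dumps({"id": self.order_id, "total": self.total})

order = Order.from_json('{"id": 7, "total": 19.5}')
```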

~~~
wvenable
The author isn't saying the problem is JSON.

The same programming style exists quite a bit in older PHP code as well. This
is because one of its primary data types is a list/hashtable hybrid. And JSON
is similar -- it promotes those same structural types: the list and the
hashtable (array and object, respectively). So programmers are using them not
just for building structures for data interchange, but for actual programming
logic.

The fix for the problem is just education.

------
nbevans
There is so much wrong with this blog post that I don't even know where to
begin. He appears to have included Python-specific details in his list of why
he hates lists and dictionaries. Apparently Python throws exceptions if a key
doesn't exist and seemingly has no Maybe/Option alternative? I don't know if
that is true or not.

He claims using lists and dictionaries means you lose encapsulation - does it?
A smarter programmer would realise that actually it entirely depends on the
_types_ you are storing in those data structures.

------
digisth
The rule of thumb I've always used for when to use OO is "will there be more
than one extant object at once or not?" If yes, and especially if these
objects need real behavior, then use OO.

If you're essentially going through one object at a time, then discarding
them, you may just be doing conduit data processing, and so there's little
advantage to using objects. I think what's missing in this (well-written)
analysis is this distinction: if you're slurping data from one place, making a
few changes (or especially if you're not making any), then sticking it into a
DB or vice versa, OO may be the wrong choice.

Ask yourself while writing the code: "are these active, behavior-driven
objects that need encapsulation and relatively sophisticated behaviors, or is
this just data I'm doing some relatively simple processing on?"

------
corysama
The author has lots of good points. Because I write most of my Python in the
style he is advising against, I recognize that style has issues. The main
issue for me is that a dict of dicts of dicts is not an interface. It doesn't
have any constraints. It doesn't communicate expectations for use for the
actual intent of the code. The best you can do is a comment explaining what to
expect and a lot of error checking.

That said, almost all of the python I write these days is in the form of
functional transforms on built-in data structures. And I love it!

There was a great PyCon 2012 talk titled "Stop Writing Classes". You can find
it linked and discussed here:
[https://news.ycombinator.com/item?id=3717715](https://news.ycombinator.com/item?id=3717715)

~~~
davidism
Start Writing More Classes: [http://lucumr.pocoo.org/2013/2/13/moar-
classes/](http://lucumr.pocoo.org/2013/2/13/moar-classes/)

HN discussion:
[https://news.ycombinator.com/item?id=5204967](https://news.ycombinator.com/item?id=5204967)

------
agentgt
On the one hand I agree with OP that directly interacting with JSON is not
really a good idea, but on the other hand I completely disagree that behavior
should be shoved into data objects. Also I think part of the problem is that
Python doesn't have much typing (I know they recently added optional typing in
Python, but I don't think many use it).

As more of an FP guy, I'm a firm believer in the separation of behavior and
data. Clojure's Hickey sort of has a valid point: it's freaking data... stop
making it complicated to access it.

------
Robin_Message
I'm surprised no-one has linked Steve Yegge's Universal Design Pattern –
[http://steve-yegge.blogspot.co.uk/2008/10/universal-
design-p...](http://steve-yegge.blogspot.co.uk/2008/10/universal-design-
pattern.html)

It argues that loosely defined objects are an excellent design pattern, but
I'm too tired to decide if it is directly relevant to this.

------
thesmallestcat
It's called a hash. Or a dict. Or a map. Not JavaScript Object Notation, FFS.

------
spullara
At this point everyone should be using an evolvable schema format (thrift,
protocol buffers, avro, etc.) when they are storing or transmitting their data
if they want to run an always-on service - there is no downtime for migrations
in the real world. Trying to do this ad-hoc with JSON is a lost cause and will
eventually lead to failure at runtime or, worse, data loss.

~~~
crucini
JSON isn't un-evolvable. In fact, thrift can serialize to JSON.

What makes thrift evolvable in practice is that we don't remove fields and
don't add mandatory fields. The same discipline can be applied to JSON
definitions.

Well, thrift also tags all fields with integers, so a consumer with an older
schema can parse a record written with a newer schema, skipping the new
fields. Of course JSON trivially has this property.

Maybe the key here is "ad-hoc"; something like JSON-schema is needed.
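
A sketch of that discipline applied to plain JSON (the field names and version history are hypothetical): never remove fields, give new fields defaults, and skip what you don't recognize:

```python
import json

def parse_user(text):
    data = json.loads(text)
    return {
        "id": data["id"],              # present since v1, never removed
        "name": data.get("name", ""),  # added in v2, optional with a default
    }                                  # unknown (newer) fields are ignored

old = parse_user('{"id": 1}')
new = parse_user('{"id": 2, "name": "Ada", "added_in_v3": true}')
```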

~~~
spullara
Yep, I mentioned below that using JSON as a serialization format is fine but
you still need to specify a schema and understand what happens when you read
data written by newer/older code.

------
jjzieve
At least lists and dictionaries map relatively well to a tabular (SQL) format.
Objects don't map well at all! Anyone who's spent enough time with "mature"
ORMs knows this. Especially when there's a deadline and you have to write
"native" SQL just to get whatever the hell you needed in the first place.
"Well maybe you should have read everything and understood the ORM to its most
minute detail..." NO! That's the whole point of abstraction! If I understood
everything about that code, I'd be better off re-writing it to better suit MY
specific problem. Look, I don't want to be another OO basher. OO definitely
has a place in complex systems like game development, where the lives of the
objects are longer than a page refresh. But in web dev, it's becoming
increasingly obvious to me that the OO paradigm is a huge time suck. /rant

~~~
okreallywtf
I feel like we have this discussion at work daily involving nhibernate. It is
an abstraction that makes 80% of work quicker and easier, but what it makes
easier and cleaner would have been trivial anyways.

------
keithnz
Lots of religion in this thread!

I think the point is, if json is your data exchange format it _could_ be bad
if you let that structure propagate into your application.

So in general you should prefer:

json => <chosen language's best form for dealing with data>

over:

json => <chosen language's tools for dealing with json>

Different languages are going to have different mechanisms. In some languages
you may abstract away from json completely, some languages may deal with json
natively, and your persistence layer may deal with json also.

So what you need is a well-considered design that takes advantage of your
chosen language's philosophy/mechanics, whatever that may be. There is no one
way to design anything. The thing to avoid is not working out how to structure
your code to make things easy/appropriate for the task at hand. That path
leads to messy code.

------
lgunsch
I find that code that uses dictionaries a lot ends up with mysterious unnamed
types disguised as dictionaries, used in various places throughout the system.
They have required fields, and must be interacted with using business logic
that is not obvious. This becomes a real problem when the original authors of
the system are gone, and new maintainers have taken over and have to implement
new features.

By using objects, or lightweight objects like namedtuple, which have already
been mentioned in other comments a bunch, you formally document the data-
structure. You give it a name, expected fields, and required behaviours when
interacting with it. The code becomes much easier to follow and understand
clearly. Bugs don't creep in when a new maintainer forgets about the
mysterious undocumented required business logic.

------
qwertyuiop924
For crying out loud, you don't have to build an object hierarchy around the
thing (although it would make sense to in this case), but at least have the
common sense, or the sense of shame, to abstract away data lookups into
separate functions. That's data structure abstraction 101.
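
That is, something as simple as this (names illustrative) keeps callers off the raw nesting:

```python
# Callers depend on the accessor, not on how the dicts happen to be nested.
def book_count(inventory, shop, cell, book_id):
    return inventory[shop][cell][book_id]

inventory = {"A": {"c1": {42: 3}}}
```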

------
lanestp
I take a couple of issues with the author here. I don't personally worry much
about breaking away from strict OOP. When a pattern like this develops it is
usually because the data is too dynamic for a static property list. An obvious
example is for creating reports. No one is going to sit down and hand design
every single multi column report in a large project (I tell a lie, people do,
it just makes the code base a horror show). By letting the data be more
dynamic (Usually with JSON) it is trivial to create generic report structures
and populate them.

Additionally, if you are using NoSQL as a backing store then solutions like
class serialization don't make any sense since you will need to communicate in
JSON anyway.

------
grandalf
The code in the example has poor encapsulation, but I do not think it's the
"JSON" style that causes that.

Much OOP code includes a hodgepodge of exposed internal state and methods that
offer a combination of derived state and behavior to mutate that state.

Often, using data literals (like JSON) can make code clearer by making it
explicit what is going on with state (when/if it is being mutated), and making
the system easier to snapshot, test, etc.

While much code that uses JSON-like constructs is overly verbose and error
prone, adding a bit of structural typing (with Flow) or creating schemas to
ensure system invariants (jsonschema) can lead to a system that is easy to
reason about and maintain.

------
weatherlight
Use GraphQL, or really stick with RESTful routes. The more predictable the
schema of these dictionaries/hashes/JSON is, the less likely you are to see
the mess above. This is true whether you are using an FP approach or an OO
approach. Using an imperative coding style when doing ETL will always be
hairy.

That function also violates the single responsibility principle. I wouldn't
even know where to begin writing a unit test for it, other than breaking it
down into smaller parts. There are design patterns other than OO that could be
followed in dynamically typed languages and would avoid that mess altogether.

------
smizell
Coupling your code to the JSON you receive over the web can lead to some
interesting problems. If the system on the other end decides to make some
change you are not expecting, it can lead to errors.

In JavaScript, a simple thing that helps is to use lodash.get and provide a
path to the property you are wanting.

    
    
      lodash.get(someObject, 'path.to.a.property')
    

If the path isn't there, the lodash.get returns undefined. This is much nicer
than getting the error "Cannot read property 'to' of undefined" when "path"
isn't there.

~~~
StavrosK
Yeah, this is currently pretty bad in Python, which led me to create the jsane
library:

[https://pypi.python.org/pypi/jsane](https://pypi.python.org/pypi/jsane)

    
    
      >>> j = jsane.loads('{"foo": {"bar": {"baz": ["well", "hello", "there"]}}}')
      >>> j.foo.bar.baz[1].r()
      u'hello'
    

------
bluetwo
Yes, any useful technology will get over-applied, to the detriment of better
ways to do things.

Not the fault of the technology, but of the developer who failed to consider
alternate ways of accomplishing the same task.

------
tscs37
JSON is best when it's solely used for serialization (or config files).

Using it deep inside the project makes no sense; the first step in handling
JSON should always be to decode it into native data structures.

~~~
rocqua
The point made here seems to be that often, these native data structures are
dicts and lists, because that is JSON. Meanwhile, you'd often want something
else when looking at only the internals.

------
ajmurmann
I think the "anemic objects[domain model]" is a red herring in this case. It
would be much cleaner to create separate serializers and deserializers that
convert your actual domain models to JSON and back. By the time you are doing
something like building a book inventory, as shown in his example, it should
all be proper objects, with no primitives dictated by what should be the
serialization layer.

Edit: Fixing phone auto correct typo - "property objects" -> "proper objects"

------
mpweiher
I actually saw this coding style _way_ before JSON, for example at Apple. I
even jokingly created DUKE: Developers United against Keyed Everything.

Another place I see this is in the eternal dynamic/typing debate. A lot of the
criticism of dynamic typing will be with examples from JavaScript, Ruby,
Python and maybe even PHP. Hardly ever from Smalltalk (or Objective-C),
because Smalltalk code tends to not have the types of problems cited. This
puzzled me for a while, because I also find these languages somewhat less
"solid", yet couldn't quite put my finger on why.

That is, until I realised that all of these languages use hashes as their
basic object representation. Coincidence? I think not. So I coined the term
"hash language" for these languages, both because they are hash-based and it
appears to be easy to make a hash of things in them, possibly for precisely
that reason.

That said, I think it's also a mistake to disregard the power this sort of
very generic programming brings, especially once you consider objects composed
of multiple facets that are interpreted in different contexts.

IMNSHO, the way to combat hash-programming is to provide powerful and
convenient metaprogramming facilities for object representation, so dealing
with objects generically is just as easy and obvious as dealing with
dictionaries.

Not entirely surprisingly, my own language (
[http://objective.st](http://objective.st) ) has some facilities for this,
mostly by making identifiers into first class entities. More research needed
;-)

------
iamleppert
Is the OP familiar with Object.keys()?

You don't have to hard-code explicit dot notation into your code when you're
processing JSON or any other hierarchical object serialization format, which
is what JSON is.

If you want to make your code more robust, you should process the structure of
the JSON document and infer meaning from its keys, based on your position in
the tree and on the values of the key names that are meaningful to your
application.

This makes it possible to accept any kind of JSON, even if the original format
changes, and you won't get uncaught exceptions and your application can decide
what to do in a more graceful manner.
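A minimal sketch of that key-driven traversal, in Python rather than JS to match the article (`find_values` is an illustrative helper, not a library function):

```python
def find_values(node, key):
    """Collect every value stored under `key`, at any depth of a decoded JSON tree."""
    found = []
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key:
                found.append(v)
            found.extend(find_values(v, key))  # keep descending past a match
    elif isinstance(node, list):
        for item in node:
            found.extend(find_values(item, key))
    return found
```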

You should also centralize the code that is responsible for serializing and
deserializing your JSON wire format and creating objects. There's no reason to
have ad-hoc code in each object constructor like his example. A good example
of such a thing is dnode
[https://www.npmjs.com/package/dnode](https://www.npmjs.com/package/dnode). It
handles all the JSON abstraction (in this case for RPC) and you don't even
need to worry about the JSON ever again.

This has nothing to do with JSON and more to do with poor design and tight
coupling of interfaces.

------
spdustin
I may be missing something, so I'd appreciate a correction, but why all that
effort when you can use collections.namedtuple and a custom object_hook for
json.loads?

    
    
        import json
        from collections import namedtuple
        
        data = '{JSON string goes here}'
        fancy_data = json.loads(data, object_hook=lambda d: namedtuple('X', d.keys())(*d.values()))
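
For illustration, the same trick with a concrete (made-up) payload; one caveat is that the keys must be valid Python identifiers for `namedtuple` to accept them:

```python
import json
from collections import namedtuple

def loads_as_tuples(raw):
    # Every JSON object becomes a namedtuple; nested objects are converted
    # first, so dotted attribute access works at any depth. Keys must be
    # valid identifiers (namedtuple(..., rename=True) would mangle bad ones).
    return json.loads(raw, object_hook=lambda d: namedtuple("X", d.keys())(*d.values()))

record = loads_as_tuples('{"title": "Dune", "author": {"name": "Herbert"}}')
```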

------
K0nserv
I agree with this, and I've raised it several times in Objective-C codebases
that can sometimes end up littered with NSDictionarys everywhere, taking no
advantage of Objective-C's type mechanics. I don't think it's necessarily as
bad in Python or JavaScript, because those languages are dynamically typed,
which diminishes the benefits of deserializing JSON to a native model. It's
still valuable, though, because you can guard against bad data by rejecting it
at the boundary of your program.

In statically typed languages however the added bonus is a lot more
significant because the type system increases the benefits of deserializing
JSON to native models. Take this Swift example

    
    
        import Foundation
        
        enum SerializationError: ErrorType {
            case InvalidData
        }
        
        struct Thing {
            let a: Int
            let b: String
            
            static func deserialize(fromDictionary data: [String:AnyObject]) throws -> Thing {
                guard let a = data["a"] as? Int, let b = data["b"] as? String else {
                    throw SerializationError.InvalidData
                }
                
                return Thing(a: a, b: b)
            }
            
            static func deserialize(fromArray data: [[String: AnyObject]]) -> [Thing] {
                return data.flatMap {
                    try? deserialize(fromDictionary: $0)
                }
            }
        }
        
        let data: [[String: AnyObject]] = [
            [
                "a": 10 as NSNumber,
                "b": "Hello" as NSString
            ],
            [
                "a": "10" as NSString,
                "b": 10 as NSNumber
            ]
        ]
        
        let models = Thing.deserialize(fromArray: data)
    

Not only do you end up with a native array of models you can also be certain
that any type information is correct because invalid results have been thrown
away during parsing.

~~~
xyzzy4
It's good to use lots of NSDictionarys in Objective-C, in my opinion. The
alternative is to create lots of different object types, which takes much more
code for very little benefit. If you're just shifting data around, there's no
need to define objects for it.

------
moosey
This is what keeps dragging me back to Moose & Perl 5. You describe the
attributes of a class, the constructor for that class is created for you, and
you can pass in hashes and it will automatically instantiate (and fail if the
rules you have set for attributes are not met).

I've found that you can kinda sorta do the same in other languages
(Python/Ruby/JavaScript) by writing static factory builders inside the class
that do this checking for you and either raise an exception or return an
object, but to me it still doesn't compare to
Moose/Moose::Util::TypeConstraints::coerce/subtype and attributes with the
coerce option set. It makes it so easy to coerce a deep JSON object into a
deep class structure.

I always try to hunt down similar things in other languages (Python allows
named arguments from a dict, IIRC, which enables something similar, but you
still have to write the constructor yourself), yet I've found nothing that
makes it as simple.

------
sleek
If he had used tuples as keys to the dict, he wouldn't have this absurd code
and the article wouldn't have been written.

------
zdw
And this is where we end up without easy to use and well supported schemas...

If this was XML, you'd write a very simple RELAX NG grammar (use the compact
syntax: [http://relaxng.org/compact-
tutorial-20030326.html](http://relaxng.org/compact-tutorial-20030326.html) )
that describes the structure of the incoming data, then use it to validate the
input data before processing it.

After that, you know data is valid and in the right structure, so you can
throw away most of the "is this in the right place?" checks.

JSON and YAML's various schema implementations can't hold a candle to this,
and it's been around for over a decade.

The XML ecosystem does have some very bad parts, but it's not all bad, so it's
worth learning from places where it actually works well.

~~~
sk5t
In Java land, swagger + dropwizard validation do a rather good job of this
with json. I wouldn't be excited to return to using soap/xml all the time.

------
michaelfeathers
This is space that the Clojure community has already visited.

~~~
davexunit
And every other Lisp community.

~~~
collyw
And what was their outcome?

~~~
golemotron
heat death.

------
crucini
I agree that the supplied code is improvable. Consider this:

    
    
      def set_r(adict, keypath, val):
        key = keypath[0]
        if len(keypath) == 1:
          adict[key] = val
          return
        if key not in adict:
          adict[key] = {}
        set_r(adict[key], keypath[1:], val)
    
      def build_book_inventory(book_ids, shops):
        shop_labels = [shop['label'] for shop in shops]
        books = Persistency_books_table_read(
          shop_labels=shop_labels,
          book_ids=book_ids)
        inventory = {}
        keys = 'shop_label cell_label book_id'.split()
        for book in books:
          keypath = [book[k] for k in keys]
          set_r(inventory, keypath, book['count'])
        return inventory
    

First, the author clearly needed "autovivification" as supplied by Perl. We
supply a substitute with set_r().
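For reference, `collections.defaultdict` from the stdlib gives the same Perl-style autovivification without a recursive setter (the keys here are made up):

```python
from collections import defaultdict

def autodict():
    # Looking up a missing key materializes another nested dict, Perl-style.
    return defaultdict(autodict)

inventory = autodict()
inventory["shop1"]["cellA"]["book42"] = 3  # intermediate levels appear on demand
```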

Second, I'd avoid creating local variables like "book_id". It creates mess. We
never had the slightest interest in the book_id; it's just part of the wine we
are pouring from one bottle into another.

Third, I've preserved (modulo names) the interface of this function, but I
suspect the surrounding code could also be improved. Also, call a list of
books "books", not "book_list"; the list is the assumed sequence container in
Python. "books=book_ids" is unfortunate; to thrive in a dynamically typed
language we need variable names that distinguish objects from ids.

Larger point: the author wants to create classes for the various business
objects, which is a common enough pattern, but ultimately just makes extra
work and redundant lines of code. A relational database can handle a wide
variety of objects, with some knowledge of their semantics, without any custom
code per-class.

As you know, the difference between dicts and objects in Python is mostly
syntactic sugar. We can easily enough make a class that gives dot-notation
access to values in a dict, if one objects to the noisiness of foo['bar'].
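Such a class is only a few lines; a sketch (`DotDict` is an illustrative name):

```python
class DotDict:
    """Read-only dot-notation access over a plain dict; nested dicts wrap lazily."""

    def __init__(self, data):
        self._data = data

    def __getattr__(self, name):
        # Only invoked when normal attribute lookup fails, so `_data` is safe.
        try:
            value = self._data[name]
        except KeyError:
            raise AttributeError(name)
        return DotDict(value) if isinstance(value, dict) else value
```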

If you want to enforce object schema at system boundaries, there are better
ways (more compact, expressive and maintainable) than writing elaborate
"classes" for each type of object.

------
rm999
More generally, using data structures well-suited to your problem is really
important but often underappreciated in software engineering. Elegant code and
algorithms naturally follow.

In the example in the article, OO seems like a good way to go.

------
dgb23
From an OO perspective I completely agree with this, and the encode/decode
pattern the author suggests seems to be the right way to deal with the
problem, as it will hopefully also translate into having an esperanto data
type that can be used for all sorts of APIs and formats.

But from a functional perspective I would disagree. The code example wouldn't
even make sense in that world. You would query it as is and compare it with
other data or create new structures as with any other data you handle in your
language.

------
tony-allan
I agree 100% with the article.

I've just written a script which makes every one of the mistakes listed in the
article. I am consuming a JSON based API in a long ugly mash of code, exactly
as described. It doesn't look pretty.

In my defence, I wrote the code while I was trying to understand the API. I
had not read ahead, and didn't really know which APIs I would need or how
fiddly it would be to bring it all together.

I am now exposed to pain if the API changes or if anything breaks. Time to go
back and tidy up the code!

~~~
tony-allan
I went back and fixed the script to turn a portion of the JSON API into a set
of useful Python objects. The code looks mostly OK now.

It's no surprise that it took about the same amount of time again to tidy it
up. Apart from making me feel better, and a promise of less work in the future
for updates, it's hard to justify the extra effort.

------
astazangasta
JSON is the equivalent of a UNIX pipe; it is for passing data between
applications on the Internet, just like a pipe is a way for passing data
between applications on a machine.

------
codedokode
In PHP we call it "Array Oriented Programming" (
[http://www.epixa.com/2012/04/array-oriented-
programming.html](http://www.epixa.com/2012/04/array-oriented-
programming.html) ). So it looks like Python developers have finally
discovered this paradigm too. Let's wait for the JS developers now.

By the way, PHP has type hints for functions that help document the argument
types and the return type.

------
joelcorrea
Pretty much the statically typed vs. dynamically typed discussion. It depends
on your particular case: whether there is a sufficiently defined schema or
not.

~~~
ajmurmann
I don't see how this is related to statically typed vs. dynamically typed.
This is more a case of Primitive Obsession
[http://c2.com/cgi/wiki?PrimitiveObsession](http://c2.com/cgi/wiki?PrimitiveObsession)
and a lack of an adapter/serialization layer.

~~~
joelcorrea
I meant typed vs. not typed. Someone mentioned Clojure, and for me that's the
whole point: on one hand, for a whole category of problems the best
abstraction is not to have a strong schema, given the variable parts. On the
other hand, there are scenarios where you already know the structure
sufficiently well. A matter of abstraction, IMO.

------
jroseattle
Just an aside, I'm not sure I agree with the title of "JSON-driven"
development.

The problem being described is about using parts of a language in an
inefficient or ineffective way to solve a problem.

The solution to this particular problem can be summed up in concepts like
encapsulation or DRY. More to the point, let's not blame components like JSON
for basically poor implementation.

------
malodyets
I prefer XML + schema (for which I use RelaxNG + jing) because this makes
validating the input very straightforward and normalizes that process. But I
understand the appeal of JSON and use it for some APIs myself. What
experiences do people have with JSON.net Schema? Does anyone know of a json
schema + validator system that is cross-language?

------
emodendroket
There's no particular reason objects can't be serialized as JSON and
deserialized back into strongly typed objects.
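Indeed; in Python, for example, a dataclass round-trips through JSON in a couple of lines (a sketch with a made-up `Point` type):

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class Point:
    x: int
    y: int

wire = json.dumps(asdict(Point(1, 2)))  # object -> JSON string
restored = Point(**json.loads(wire))    # JSON string -> the same strong type
```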

------
collyw
Thankfully there isn't really JSON _driven_ development as a methodology.
It's just a bit of a crappy pattern (for many tasks) that far too many people
use. Hopefully no one is advocating it as _the_ way to develop software.

------
partycoder
I like to wrap relevant objects in a type that is responsible for creating,
validating, serializing, deserializing and mutating them.

But taking JSON and passing it around, with no ownership and no
predictability... that's the mindset of the tech-debt programmer.

------
qwertyuiop924
I think the general rule is "always validate anything coming in from the
internet."

------
ris
I couldn't agree more. My life these days is 90% marshalling JSON around.
Hooray for microservices.

------
seomis
So we just need references? Let's all switch to XML (just kidding) or YAML
(maybe not kidding)!

------
Goladus
Although I coincidentally mostly agree with the conclusion "decode on entry
and encode on exit," I disagree substantially with the rest of the article.

> I know that it's now en vogue to sneer at OO

I don't _sneer_ at it. OO has been the dominant coding religion for most of my
career, and I rage against the educators and propagandists who spent a decade
smothering everyone with it. I curse them for all the time I wasted trying to
build classes for my data, only to realize that if I had used a simple
dictionary or list my code would have been shorter, simpler, more robust, and
more flexible.

The logic in the article above is all premised on object-oriented religion. OO
for OO's sake because OO. Using the power of dictionaries is bad, using the
power of objects is good.

> It completely defeats object orientation.

You could just as easily say using objects defeats the point of having lists
and dicts.

> It offers nothing of the abstraction powers of object orientation.

Most of the abstraction power of object orientation happens when you create
the methods. If you aren't defining new classes, you write functions instead.
You still have abstractions, what you don't have is the encapsulation of code
with data.

> It doesn't say what it's doing.

Sure it does, and better yet, since you are using the standard data types it
will be said in a language that any other Python developer is likely to
understand immediately.

> The above code is filled with auxiliary logic that has nothing to do with
> what it actually tries to achieve.

The above code is filled with auxiliary logic because the author of it
apparently didn't write any useful functions for operating on dictionaries.

I'm not sure that any of the auxiliary logic in that code has anything to do
with the choice to use dictionaries. It's a standard data munging problem that
comes from having data from different sources. You have the exact same problem
with objects if you didn't write any useful methods for operating on them.

> but done right, it can be very powerful, especially in big and complex
> codebases.

And in my opinion, the right way to do Object-Orientation in Python is to _not
do it_ until you really know you need it: your code is heading towards big and
complex and you need to lock it down and organize the data and methods into
well-encapsulated classes. (Although maybe at that point you realize it's not
a big deal and don't bother)

Designing around lists and dicts from the start is a much more flexible
strategy than trying to get all the encapsulation exactly right on the first
try. If you don't have lots of time to spend up front UML'ing an object
hierarchy for your big, complex application, you're probably better off
sketching and iterating with JSON in mind (and YAML if humans need to edit
it). As your application takes shape, it will become apparent where it makes
sense to lock functions and data down into objects.

This is especially true given that all of Python's standard types are
inheritable classes.

------
drawkbox
Arguments for dict/list driven development:

NOTE: This isn't JSON-driven development. JSON just mimics the base types
(dict, list, string, numeric, etc.) common to all languages, which is a big
reason it is so widespread, especially in Python/JavaScript.

\- Large unknown list/dictionary data structures can be deserialized into
dicts/lists in any language without issue.

Sometimes keys/data are unknown, such as large attribute data sets that keep
gaining new keys. In that case strong typing to an OO object will always
break. Example: a Facebook attribute set, where keys/data that aren't set
don't appear and new ones are added all the time, which creates a cat-and-
mouse serialization/deserialization game. It's the same problem a binary
structure has (offsets) when you really need a flexible keyed structure.

With dicts, one missing key doesn't break your whole
serialization/deserialization system the way it would in one built with
strongly typed OO. Validation can be done on accept, and _if necessary_ the
data converted into an OO system.

\- If needed, it is useful to have classes backed by (extending, inheriting
from, or composing) a dict/list or set, which can load in the JSON/dicts/lists
and expose only the needed values after validation.

I.e. a class that inherits from or composes a Dictionary<string,object>, for
instance in C#, would fill only the keys that are necessary for the view data,
not a bunch of extra null fields for keys/properties it might not have. It
also has the ability to deserialize objects that may have new keys. Not
everything is a perfect world where data structures are known beforehand.

\- It reduces complexity many times, no need for an OO serialize/deserialize
layer when you are passing back as basic dict/list or JSON.

Why add complexity to something simple?

\- Unless you control both the server and the client, real-world data
structures aren't a perfect map of keys/values to OO properties.

Assuming they are produces a system ready to break in the real world. Someone
adds a field to the DB object, and then all clients that use it can't
serialize/deserialize. Real-world serialization/deserialization has to accept
basic dict/list types, validate, and then use them as needed (some convert to
OO objects behind the scenes). I see too many systems where people just take
an EF object, expose it over a web API and expect it to work; that is poor
encapsulation. Some fields don't need to be serialized to public APIs. In
Microsoft land, MVVM was created to help stop this practice, but it still
creates two sets of OO objects and breaks on any new keys/data (though
breaking here may be desired for strong typing).

\- Dict/list data structures can easily be set up with cleaner naming and
keys, without tons of attributes/helpers.

I.e. a first-name key instead of first_name, FirstName, or firstName. This is
more friendly to the naming common in web/URL contexts.

\- As noted in the article, less memory is used in many cases, and basic
lists/dicts are highly optimized for time.

There are many more reasons...

Dicts, lists and basic string/numeric types are the base of all languages and
of computer science. The reason this pattern is common is that it is simple to
work with these types, without the added cruft of OO, when that's all you
need.

OO does add complexity, and if it isn't necessary you are just increasing
complexity (and memory use) for no reason. It is similar to C coders'
complaints about C++: basic structs and sets are sometimes less complex than
C++ OO objects. The same goes for dicts/lists versus some monstrosity of an OO
serialization/deserialization system that breaks on every new key or field,
where you have to update both the server and the client rather than just
increment a version and validate. The longer you code, the more you see this.

OO objects should not be used all the time just like dict/lists shouldn't be
used all the time.

------
hardlianotion
Completely agree.

------
catnaroek
> If an object is just there to store data and no behaviour, then that's fine
> - don't add behaviour if it doesn't need it.

In that case, you want values rather than objects. Alas, Python doesn't have
compound values.

~~~
qwertyuiop924
Please stop it.

Every single time anything even slightly related comes up, you talk about
compound objects. Incessantly. And then you act superior when nobody knows
what you mean, or cares about compound values. They're not really relevant to
this discussion anyway, as storing data as described in the GPP is perfectly
reasonable.

The worst part is, your definition of "value" and "object" in this context is
so far out of the ordinary that people should be expected to not understand
it, and you should provide an explanation pre-emptively.

To quote Randall Munroe, "Communicating badly and then acting smug when you're
misunderstood is not cleverness"

~~~
dang
You've repeatedly become uncivil in this thread. That's not ok, regardless of
how wrong or provocative someone else may be. If you can't remain civil,
please don't comment here.

[https://news.ycombinator.com/newsguidelines.html](https://news.ycombinator.com/newsguidelines.html)

[https://news.ycombinator.com/newswelcome.html](https://news.ycombinator.com/newswelcome.html)

~~~
qwertyuiop924
You are, of course, correct. I apologize, and will go to greater lengths to
remain civil in threads in the future.

~~~
dang
We very much appreciate it. Thank you.

~~~
qwertyuiop924
It's all part of being a good commenter. My irritations, frustrations, and
nitpicks are my problem, not yours, and I'll do my best to ensure it stays
that way in the future.

