Arguments against JSON-driven development (okigiveup.net)
257 points by afroisalreadyin on Aug 25, 2016 | 294 comments



> The fundamental advice on Unicode is decode and encode on system boundaries. That is, you should never be working on non-unicode strings within your business logic. The same should apply to JSON. Decode it into business logic objects on entry into system, rejecting invalid data. Instead of relying on key errors and membership lookups, leave the orthogonal business of type validity to object instantiation.

This right here is the correct approach. Serialisation formats should be serialisation formats, whether they be JSON, S-expressions, protobufs, XML, Thrift or what-have-you; application data should be application data. There are cases where it makes sense to operate on the serialised data directly, for performance or because it makes sense in context, but in the general case operate on typed application values.
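In Python, that boundary step can be quite small. A minimal sketch (Book and its field names are invented for illustration):

```python
from dataclasses import dataclass
import json

# Decode at the boundary: JSON text in, a typed value out, with invalid
# data rejected immediately rather than leaking into business logic.
@dataclass(frozen=True)
class Book:
    title: str
    pages: int

def parse_book(raw: str) -> Book:
    data = json.loads(raw)
    if not isinstance(data.get("title"), str):
        raise ValueError("title must be a string")
    if not isinstance(data.get("pages"), int):
        raise ValueError("pages must be an integer")
    return Book(title=data["title"], pages=data["pages"])
```

Everything past this function works with Book values, never with dicts.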


Part of the problem is that JSON and Sexprs weren't designed as serialization formats. They've been pressed into service as such, but they are actually notations for data structures: in Python, it may not be idiomatic to crawl dicts like this, but in JS, those aren't dicts, they're objects. If they've been de-serialized to some degree, they may even have their own methods.

By the same token, in Lisp, Sexprs aren't a serialization format. They're a notation for the linked cons cells that Lisp data is made of. In Lisp, that Sexpr will be crawled for data, or maybe even executed.

So while in Python, both may seem to be serialization formats, they aren't.

Either way, if the application programmer has any sense, they'll abstract away the format of their data. In a lisp app, you won't be cdring down a sexpr, you'll be calling a function to grab the necessary data for you, usually from a set of functions that abstract away the underlying sexpr implementation, and treat whatever it is as a separate datatype.

Of course, the sexpr might have been fed to an object constructor. Heck, it might be an object constructor, or a struct constructor. All of those types typically provide O(1) access, and autogenerated access functions, so it's the same story.


A notation for a data structure is a serialization format.

> In Lisp, that Sexpr will be crawled for data, or maybe even executed.

An s-expression cannot be executed; it's just text.

The object which it denotes can be walked or executed via eval.

Before that happens, the s-expression must be converted to that object.

In other words, deserialized by the reader.


Good point, kaz. I meant that the notation signifies a specific set of general data structures, unlike XML, which doesn't specify the data structure used in memory afaik, only the actual tree structure of the data itself.


In that s-expressions are a notation for computation, they can be executed by an interpreter; this is what we refer to as execution. Even assembly is like this; there's no other reasonable way for it to work right now.

This feels like you're hairsplitting to no obvious benefit other than increasing confusion. I could as easily say "mathematical notation isn't math, it's just text; you can't evaluate '1 + 2' without a human being, because otherwise those are just marks on the page." This is true(ish) (with the correct escaping), but it's difficult for me to see how it's relevant to the discussion. We could imagine a situation where I've created the Texpr, which has slightly different notation but the same properties. I don't know that we would necessarily classify it differently or treat it differently.

This leads to the alternate conclusion that maybe the s-expression is the underlying set of objects in the interpreter/compiler. (S-expressions are a special type of linked list, in that case.) This rings true(er) to me, because of the way that we talk about s-expression manipulation in Lisps. We most certainly are not using string operations to generate them. In which case the thing with the many parentheses is merely standard Lisp syntax, not s-expressions. This is further justified by the existence of different syntax in Dylan or Clojure, and the availability of reader macro manipulation as its own entity.

Tldr; it most certainly is not text! Nor can it be executed.


> mathematical notation isn't math it's just text

That is correct. It's just text which talks about concepts that don't have text, like transcendental numbers, infinities, infinitesimals and so on.

There are functions in mathematics that can't be written down in symbols at all, like the integrals of certain functions (which themselves can be written down).

Math text has some useful properties in that certain transformations you can think of as typographical (manipulations of the text) actually preserve semantic properties in a useful way. So for instance addition commutes, semantically; and in the text, this lets us swap the left piece of text for the right one, around the plus sign.

There can be a very close correspondence between typography and semantics (like in Douglas Hofstadter's "TNT": typographical number theory, which he uses to explain Gödel's incompleteness theorem).

> This leads to the alternate conclusion that maybe the s-expression is the underlying set of objects in the interpreter/compiler.

I assure you that it isn't; not in any main-stream Lisp interpreter or compiler.

(Where by "main-stream Lisp interpreter or compiler", I intend to rule out cute hacks like this:

https://news.ycombinator.com/item?id=7956246 )


No, kaz has a point. Do not confuse the shadow for that which casts it: Sexprs are a serialization format/notation for sets of conses. Most of the time, saying so is splitting hairs, but it bears mentioning here, as we're discussing serialization formats.


JSON is a serialization format that was based on the data structure notation for Javascript. It is, however, a serialization format. Javascript objects are a superset of JSON, as they can contain arbitrary objects and functions, which JSON cannot, and "true" Javascript object notation can elide quote marks or use apostrophes for keys, whereas JSON strictly specifies double-quotes around keys.

The problem that arises in Python and other dynamically-typed languages is that there exists a default deserialization that is so good it very strongly tempts the programmer to use that exclusively. However, as good as the default serialization may be, it's also quite dangerous, for the reasons you mention and more.

In strongly-typed languages there's a stronger focus on parsing the JSON instead, which has the advantage of producing objects with stronger guarantees (which isn't quite the same as pointing out there's stronger typing here; you could theoretically get the same guarantees in Python with some sort of JSON schema library or something), but has the disadvantage of generally being more challenging, since it's hard to beat the conciseness of "json.loads(s)" in Python.

There generally is a default serialization in strongly-typed languages, but it's far more likely to become inconvenient if you need anything beyond simple numbers and strings, and people generally learn to prefer true deserialization in my experience - unless they, alas, start their program out from day 1 inputting and outputting JSON, accidentally structure their entire program around the default JSON structures, and end up with the exact same problems as you'd get in Python. But as long as JSON isn't the very first thing to go in, you're generally in better shape.

(I have witnessed a Java program primarily written as a map of string to map of string to map of string to string. It was unsalvageable. And for all the cynicism I may occasionally muster, I don't say that often, because refactoring can be pretty powerful in Java, but this was beyond help. It actually had no JSON in sight, but the same fundamental forces were in play.)

Personally, despite preferring the more strongly typed approach in most ways, I must confess that when I'm working in Perl I am generally unable to resist the temptation to just JSON::XS::decode_json, and cover over the differences with unit testing rather than dealing with "true" deserialization. I make myself feel better by also telling myself that if I do anything else, I will confuse my fellow programmers who don't generally expect to see fancy deserialization routines when dealing with JSON, which is true enough, but in my heart I still know guilt.


> In strongly-typed languages there's a stronger focus on parsing the JSON instead

I think you mean _statically_ typed languages here. Python is a strongly typed language.


The definition of "strongly-typed" that Python conforms to is a useless one, because very few weakly-typed languages exist anymore. Even those are only very, very partially "weakly typed", by having operators that are defined to do automatic coercion, generally only between strings and numbers - which isn't even what the original "weakly typed" meant. The only truly "weakly typed" language I know of that is still extant is assembler/machine language, which can never really go away, where all you ever have are numbers, and thus absolutely nothing stops you from adding a string pointer to the first element of some structure. (Modulo some distinctions that still may exist between floats and integers and such... even assembler isn't as weak as it used to be, though it is still by no means a strongly-typed language.)

So I don't use the useless definition. By any useful definition of strong vs. weak typing, that is, one that actually creates two or more non-empty, non-trivial sets of members in the universe of discourse, Python is a weakly-typed language.


You can use whatever definition you want in your own head, but you cannot expect anyone else to accept it.

Furthermore, one of the most popular languages in use today is weakly typed: JavaScript!


Here's a plausible definition: strongly-typed languages disallow programs to escape the type system (e.g. put an integer into memory, then treat it as a float, or vice versa, as in the famous Quake fast-inverse-sqrt hack). Oh, look, Python is strongly-typed and C is weakly-typed.
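To make that concrete: the Quake hack escapes C's type system with a pointer cast (*(long *)&y). The closest Python can get is an explicit, checked re-encoding of the bytes, which is exactly the distinction this definition draws. A sketch:

```python
import struct

# The Quake fast-inverse-sqrt trick, ported to Python. In C the bit
# reinterpretation is an unchecked pointer cast; here it has to go
# through explicit struct packing, so the type system is never escaped.
def fast_inv_sqrt(x: float) -> float:
    i = struct.unpack("<I", struct.pack("<f", x))[0]  # float bits -> int
    i = 0x5F3759DF - (i >> 1)                         # the magic constant
    y = struct.unpack("<f", struct.pack("<I", i))[0]  # int bits -> float
    return y * (1.5 - 0.5 * x * y * y)                # one Newton step
```

Under this definition Python stays "strong" because the reinterpretation is an explicit re-encoding, never a cast.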

Note that I do not endorse using "strongly-typed" to mean this definition, or any other. There are no useful definitions of this phrase, don't use it, except when correcting people.


I guess you need a term for "typing of moderate strength"


I'm not sure I prefer the strongly typed approach: I come from Lisp, so the approach is: "read it, validate it, and then wrap it in functions to hide the implementation in case we change it."

This works well in Lisp, where the line where objects end, and structs, lists and functions begin is hazy at best. Besides with a bit of wrangling, you could probably just pass your validated serialized data to the object constructor as the arguments. Or you could just write a struct, which is simpler than an object, provides O(1) access, and you can still probably easily pass your datastructure, or something close, into the constructor as the arglist.


What are serialization formats, then? What makes them different from notations for data structures?


Not gp, but one difference is you can have a notation that can't capture some state: think of the date class in JavaScript. JSON can't serialize this without resorting to string encoding, whereas something like a protobuf or pickle could.
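To illustrate, with Python's datetime standing in for JavaScript's date class:

```python
import json
import pickle
import datetime

stamp = datetime.datetime(2016, 8, 25, 12, 0)

# JSON has no date type: serializing one fails unless you first encode
# it to a string yourself.
try:
    json.dumps({"when": stamp})
    assert False, "unreachable"
except TypeError:
    pass

# pickle round-trips the object, type and all.
restored = pickle.loads(pickle.dumps(stamp))
assert restored == stamp
```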


What's really being described there, I think, is the notion that you can serialize and unserialize some data and get, to some extent, the "same" data back.

Whilst that's more possible with protocol buffers or pickling (or whatever your language calls it), I can't think of any languages offhand which can round-trip any data. It's generally not possible to serialize objects denoting external resources - such as open file handles or network sockets. It's also often not possible to serialize closures, weak references (without dereferencing them), and not necessarily possible to serialize self-referencing objects: e.g. a list which contains itself - pickle can handle it, but I don't believe protocol buffers can do it.


Some extended sexpr notations (particularly those used in Common Lisp, although many Schemes also support it, as do other lisps) support self-referential data structures.

fds and sockets CAN, IIRC, to some degree, be sent to other processes on the same machine, but it's fairly limited.

As for serializing closures, CHICKEN Scheme's s11n egg is the most prominent, although not the only, example. It's fairly limited, once again, to avoid sending the forest with the banana, as Joe Armstrong would put it.

This has nothing to do with our discussion, I just thought it was cool.


That means JSON is missing support for some kinds of data, that doesn't mean that it's not a serialization format at all. Just one with less descriptive power.


That's fair, I guess it's not really a different class but a different shade of the same thing.


First off, make no mistake, JSON and sexprs ARE serialization formats. I didn't establish this well in my original comment. However, they're designed to be a notation for specific structures: dicts and arrays in JSON's case, and cons cells, specifically Lisp code, in the case of sexprs. This is unlike, say, XML, which defines a tree hierarchy but NOT what the underlying structures are. That's the parser's job.


From the official source: "JSON (JavaScript Object Notation) is a lightweight data-interchange format. "

http://json.org/

It's nothing about pressing into service. This is the authoritative source.


From the official source: "Democratic People's Republic of Korea"

http://www.korea-dpr.com/

It's democratic, and for the people. It says nothing about being a totalitarian dictatorship.

Not everything is/does what it says on the tin. You don't have to agree with official or otherwise authoritative sources without question.

(not that I wish in any way to compare Mr Crockford or whoever runs json.org with the DPRK or its leadership - I'm just using a deliberately extreme example to highlight that what is written may not be what is, at least not from absolutely everyone's point of view)


I hope you feel clever, because in context, this is far too inapplicable to the discussed reality of JSON to be an actual point.


> I hope you feel clever

I generally do, thanks, though I'm not any sort of genius by any measure.

> because in context, this is far too inapplicable to the discussed reality of JSON to be an actual point.

You seem to be missing a bit of an intentional context switch. The comment was more about the logic of the GP's response to "<something> isn't really X" which was "yes <something> is X, it says so on <something>'s home page", than it was about <something> or X in particular.

So it was relevant in the context of the discussion and the facts being used for reference but not, as you call out, in the context of the subject of the discussion (hence my somewhat defensive clarification of intent in the last sentence)


"lightweight data interchange format" is not the same as "serialization format".

For one, serialization usually handles binary data as well. And types, lots of types. Serialization includes class definitions, etc, which are used to create a (in this case) python object. JSON is literally javascript object notation, and has nothing to do with pickling python.


Yes it is.

Serialization doesn't have to handle a particular type system in its entirety to be serialization.

A serialization scheme can dictate its own type system; which can be smaller than that of the programming languages which support that serialization scheme.

JSON has a simple type system; it serializes that system.

(Might you be confusing serialization for other concepts like object store databases, or image saving?)


That's the source that pressed it into service: JSON is exactly what it says it is: Javascript Object Notation. Specifically, it's a subset of javascript's, well, object notation.

The point is, the syntax behind JSON was originally designed for a specific language, as a textual representation of that language's objects. It just happened to make a convenient serialization format.


Just to be technically correct, because that's the best way to be correct...

JSON isn't a strict subset of JS. JSON strings can contain literal line terminators. JS strings cannot.
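The characters in question are U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR); ES2019 later closed this gap by making string literals accept them. A quick Python demonstration:

```python
import json

# U+2028 may appear raw inside a JSON string, but it terminates a line
# in JavaScript, so the same text was a syntax error inside a JS string
# literal until ES2019's "JSON superset" change.
parsed = json.loads('"a\u2028b"')
assert parsed == "a\u2028b"
```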


When using parsers like Jackson or Gson for Java, this process is completely transparent and does not require any active thought from the developer - well, except maybe when there are very specific formats that don't map 1:1 with the class that should be instantiated or generated from the JSON object.

It's a bit more tricky in JS, both client-side and node. You can't work with the json string there, but after that you work directly with the json object. They're not OOP languages, really. I wouldn't want to work with too much untyped / unstructured json in back-end land myself to be fair.


I've never had good experiences with automated serialisation -- even though it sounds like other people do it with success. What's the secret?

To give you a flavour of the kind of problem: in C# (or rather .NET), json.net reads JSON and calls setters on the target class.

That means the setters have to be public, and you don't know what order they will be called in, and you have no real signal about when it is all done. The constructor is no longer enough to guarantee the object's invariants are met.

Most awkward.


It sounds like you're parsing the JSON straight into your business objects, which is the source of the problem. You need an intermediate class which represents a strongly-typed version of the JSON message. So JSON.net goes from a JSON string into this message object, then you write your own code (or, if it works for you, use a tool like automapper), to go from that into your business classes.
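In Python terms, the two-step shape looks roughly like this (OrderMessage/Order are invented names for illustration):

```python
from dataclasses import dataclass
import json

@dataclass
class OrderMessage:          # mirrors the wire format, no behavior
    item: str
    quantity: int

class Order:                 # business object; invariants enforced here
    def __init__(self, item: str, quantity: int):
        if quantity < 1:
            raise ValueError("quantity must be positive")
        self.item = item
        self.quantity = quantity

def order_from_json(raw: str) -> Order:
    msg = OrderMessage(**json.loads(raw))   # step 1: JSON -> message
    return Order(msg.item, msg.quantity)    # step 2: message -> domain
```

The message type can track the wire format freely while the business class keeps its constructor guarantees.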


This is what I settled on -- at least in the hard cases. And if I understand his acronyms, it's also what @mythz is recommending.

Perhaps I should have done it for the easy cases as well (where the business objects are struct-like enough that it doesn't matter) and just lived with the boilerplate.

But I see little advantage in this over just having a dictionary that I can inspect to initialise my real business object. True that is not strongly-typed, but the stage between the message-object and the business object can have validation errors anyway, so why not treat typechecking as part of that?


In Java land with Jackson/Gson, they can use the getters/setters or reflection to find the private fields. The only time it is not completely automatic is when some JSON object mixes cases (myField1 vs. my_field1). Even then, just adding an annotation fixes it. For any special formats, for example ISO 8601 dates, you can quickly define a serializer/deserializer and be done.

Is it really that hard in c#? It is not something I ever think about in Java.


Even beyond that, Jackson can use a private constructor if you use the @JsonCreator annotation on the constructor and @JsonProperty annotations on each parameter.


That's because you should be serializing purpose-specific DTOs or clean POCOs not Business Objects with behavior.


JSON.NET can use private setters. They just have to exist.

You do have to use thin constructors, but in JSON.NET there's a way to call a method post-deserialization.


Yes, the automatic serialization is not a solution for the most pressing problems presented by the article -- it's just the first thing from all the things that has to be done at the boundary.

You have some DTO class that is your system's typed idea of the structure of the JSON - this class is quite useful as implicit documentation, but it really has to stay internal to the boundary. You will use an autodeserializer to produce such a class, and then you will continue by constructing the real object from the deserialized data that can be presented to the rest of the application. During such construction you can validate state and return errors.

This step can be eased by some validating attributes on the boundary DTO properties, but there is always some custom logic that describes what is acceptable and what is not.


Automated serialization has gotten much better than it was in the bad old days of RPC and COM!


I have nothing good to say about COM, but I'm seriously thinking about gRPC [0] to get away from the sloppy json endpoints we code around today, at work. Before I dive in I would love to hear, what it is that makes that architecture a bad one.

[0] http://www.grpc.io/


Automated serialization is the devil. Gson and Jackson require you to write EJB-style objects to get automatic serialization: default constructors, with getters and setters for each field.

The problem with this approach is that you've completely abdicated the power of the type system to ensure that your objects are valid. What happens if a field is missing from the JSON? Well, that field just becomes null. So now you have one of two options:

1) Write highly defensive code with null-checks everywhere. This is a pain to write, a pain to read, and almost impossible to get right and actually prevent null pointer exceptions. This is a nightmare. Switching to a null-safe language like Kotlin doesn't really help you beyond making sure that you actually code in all the null checks - the code is still ugly and a pain to maintain.

2) Call a (potentially) expensive verification method at the beginning of each method call for your object. This is less error prone than having null checks everywhere, but it's not much of an improvement. Because verification happens not at object creation time but rather when it's used, you'll find yourself with a verification exception at the entrance to some business logic where the JSON was passed to your system a week ago, immediately stored in a schema-less ORM, retrieved now, so you kind of have an idea that you have some client which didn't populate the field, but you have no idea which of the many, myriad versions of the client is responsible. So now you're fucked, and you're doubly fucked if you're losing data because of it.

Or you could just take advantage of type safety and write immutable object factories which refuse to instantiate invalid objects. Then you can write clean code using objects which you know must be valid because of type system guarantees. Libraries like immutables.github.io make this a piece of cake.
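A rough Python analog of that approach (User and its checks are invented for illustration):

```python
from dataclasses import dataclass

# Construction either yields a fully valid, immutable object or raises
# immediately - so downstream code never needs null checks.
@dataclass(frozen=True)
class User:
    name: str
    email: str

    def __post_init__(self):
        if not self.name:
            raise ValueError("name is required")
        if "@" not in self.email:
            raise ValueError("email looks invalid")
```

Any User that exists is valid by construction, which is the guarantee the EJB-style beans throw away.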


> Gson and Jackson require you to write EJB-style objects to get automatic serialization

Not the case. I successfully used Jackson combined with Lombok to achieve some really nice DRY class definitions that Just Worked with Jackson. It took a little figuring out and a couple bugfixes to Jackson but it worked. That said, part of the hassle was that I insisted on being able to do this with @Wither so we could have the objects be immutable too.

Then you can write stuff roughly like

    @Value
    public final class Thing {
        String name;
        int age;
        boolean boiling;
    }

Although IIRC I had to use some other random set of lombok annotations instead of @Value to get it to work right with Jackson (this was a while ago, don't recall details)

> The problem with this approach is that you've completely abdicated the power of the type system to ensure that your objects are valid.

Yeah, this was a big problem with my approach. The other choice would have been to use @Builders instead of @Withers. Then you get a little more boilerplate (having to type .build()), but you can guarantee the built objects meet consistency requirements. (In retrospect, I doubt I chose the right tradeoff there)


Automatic serialization is not the devil, overly forgiving automatic serialization is the devil. I use JSON serialization libs all the time in Scala which properly support optional vs required fields. For Java devs, Gson has bad required/optional support [0] but Jackson does have it for creator properties [1]. It is important to qualify statements like your initial one to include the specific situation which it is bad instead of using a broad brush.

0 - https://github.com/google/gson/issues/61

1 - http://static.javadoc.io/com.fasterxml.jackson.core/jackson-...


It's a language/framework issue. C# supports first class properties with get/set semantics. In your method (actions) in the controller you would write something like this:

    public List<CustomerModel> Get(SearchRequest request)
    {
        ...
        return customers;
    }

Somewhere else in the (configurable) pipeline the framework can decide how to deserialize the SearchRequest and how to serialize the List<CustomerModel> based on the Accept header.

(CustomerModel/Request would not be business objects. They would only be used on the API layer.)

As for validation: you could just put attributes on the properties of the Request, like [Required], and they would automatically be validated before your Get method is called. Of course if the types don't match, the framework would send the appropriate error.


> Automated serialization is the devil. Gson and Jackson require you to write EJB-style objects to get automatic serialization: default constructors, with getters and setters for each field.

Gson does not require this. The following class will serialize and deserialze fine with Gson:

    class Example {
      private final int foo;
      private final String bar;
    
      private Example(final int foo, final String bar) {
        this.foo = foo;
        this.bar = bar;
      }
    }
More complicated cases will require custom serializers and deserializers, but any class that defines only basic data types (including collections) works just fine.


> Automated serialization is the devil. Gson and Jackson require you to write EJB-style objects to get automatic serialization: default constructors, with getters and setters for each field.

I've used Gson fine with Scala case classes.

Although now I just use Scala JSON libraries, which do not suffer from the two problems you list at all.


Jackson allows you to create immutable objects without any problems, with final fields initialized in the constructor. It also supports numerous annotations that will throw an exception when a field is missing, etc. You just have to know your tool and use it properly, that's all.


if you're just using python for simple scripting and a random failure now and again isn't going to ruin your day, it's fine to just use json.loads, IMO. I've written quite a few scripts where the time it would take to do it 'right' wouldn't be worth the effort.


I think this might be where tools like Flow or TypeScript can become useful, since you can have typed javascript objects.

On the other hand, the fact that there's no runtime typecheck renders static analysis somewhat impotent when it comes to the result of a network call.


...unless your application is doing an in-place edit.

For instance, if your image compression application throws out my EXIF data that it doesn't understand, I'm going to be pissed. (Unless you give me an option to preserve it.)


This right here is the correct approach. Serialisation formats should be serialisation formats, ... application data should be application data.

True, although the OP seems to be advocating having your app pretty much ignore serialization altogether in favor of object-oriented design. In particular the author objects to use of dictionaries and lists instead of objects.

It is true that if you're designing an application with a json api in mind, you're likely to stick with the data structures that are easiest to serialize.

Personally, I started writing programs that way before json became so common. I did it simply to take full advantage of the native data structures and to avoid prematurely confining myself into an object hierarchy that wasn't a good fit for the problem domain. It also winds up making code more generic and easier to rewrite in a different language if necessary (for example, moving server-side code to client javascript).


That's the approach I took with my DNS library (https://github.com/spc476/SPCDNS): extract the DNS packet into a C structure that's easier to deal with (for instance, the A RR structure: https://github.com/spc476/SPCDNS/blob/master/src/dns.h#L270).


I'm not entirely in agreement.

Use of object-oriented programming paradigms here would merely distribute the logic that is necessary to achieve the desired mapping over multiple points in the code.

The example function presented is only marginally too complicated. I'd split it in two: one to obtain the book list given the same arguments as the example function, and one taking the result as its only argument to build the mapping.

I find myself shying away from rigorous adherence to encapsulation more and more these days. I prefer small functions that operate on data explicitly.

Edit: and I'm a bit confused how the example has anything to do with "JSON-driven development", other than the coincidence that a hash/dictionary is the core data structure being manipulated here. This example function could exist and be (mostly) reasonable had JSON never existed. I'd expect to see an argument that the JSON serialization schemes that abound are problematic, given the title.


This. I've been programming this way for over a decade, long before JSON was a thing. I find I rarely need anything more than a list or a dict for most of the data manipulation I do. Being on the web has only strengthened my tendency for this, since everything ends up being stringly typed anyways. Nearly every function/API I write is: get some data from somewhere (hopefully serialized), manipulate the data, return data (very possibly serialized). Nearly every time I've seen coworkers try to improve things with classes, it complicates the code, and often adds little encapsulation given how much of what we do is reliant on external data sources.

Every once in a while I think how nice it would be to be able to use typed data and smart setters to avoid much of the bounds checking I have to do, but I find there's never enough code between the boundaries of serialization to make it worth the added complexity that this introduces (also my problem domain involves mostly copy so most things are basically strings, ints, or datetimes anyways).


I can't stand the development style that takes the associative array values from a web action (POST, JSON parameter...), creates a custom class, dutifully copies stuff into it (with varying degrees of library support and manual make-work), then immediately passes it somewhere else to be copied, and throws it away thereafter.

Such a waste of typing and perpetual reading.

The alternative deserves a "top level" comment...


Lists are lists, and I have no issues there. It's dicts as objects (often nested) where things get hairy. I often see folks end up relying on internal implementation details of other libraries, or other data sources, and things can subtly break (or explode in a ball of fire).

The more systems I've built, the more I've wanted to have very well-defined seams between "inside" and "outside" -- well-defined interfaces with external APIs, libraries, databases, systems that may be maintained at a different speed, etc. Having an explicit translation/serialization/etc. step forces you to do this. It's not the only way, but pretty much every codebase I've interacted with that doesn't do this gets careless, and it can get really messy once things get any longer than a "script".


> I often see folks end up relying on internal implementation details of other libraries, or other data sources, and things can subtly break

I'm not sure how you can get around this as a consumer of a service? How do you know what is an internal implementation detail? Why are they exposing implementation details?

As a producer of a service there are lots of techniques. E.g.

- Only expose data and provide a spec for that data.

- Provide "helper" classes in target languages for consumers to use

- Publish an API spec + guarantees

- Always maintain backwards compatibility


As a consumer, you're not going to get around it. But what you can do is quarantine it to a specific area within your application. I find dependencies to be a very natural place to divide up an application -- at the very least, to consider "What would have to change if I ripped this thing out entirely?". "What are the methods I would need to implement?", etc, and writing a translation layer implementing that interface.

As a hypothetical, let's say you needed to implement a binary persistence layer in your application. Rather than interacting with S3 in 10 different places, you define a "BlobStore" interface with the methods that you need, code against that interface, then implement an S3BlobStore that handles the calls to Amazon using whatever library necessary.
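A minimal Python sketch of that pattern (the `BlobStore`/`S3BlobStore` names are the hypothetical ones from the comment; the in-memory variant is my addition for testing, and the S3 calls use the standard boto3 client API):

```python
from abc import ABC, abstractmethod


class BlobStore(ABC):
    """The interface the rest of the application codes against."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class InMemoryBlobStore(BlobStore):
    """Trivial implementation, handy for tests."""

    def __init__(self):
        self._blobs = {}

    def put(self, key, data):
        self._blobs[key] = data

    def get(self, key):
        return self._blobs[key]


class S3BlobStore(BlobStore):
    """The one real implementation, quarantining all Amazon-specific code."""

    def __init__(self, bucket):
        import boto3  # deferred so the rest of the app doesn't depend on it
        self._s3 = boto3.client("s3")
        self._bucket = bucket

    def put(self, key, data):
        self._s3.put_object(Bucket=self._bucket, Key=key, Body=data)

    def get(self, key):
        return self._s3.get_object(Bucket=self._bucket, Key=key)["Body"].read()
```

The rest of the application only ever sees `BlobStore`, so ripping out S3 means writing one new class, not touching 10 call sites.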


With the added benefit that you can reuse these functions for new data shapes more often than I expected when switching to this style.


And better testability.


I think I failed to represent the scale in the example. That's a heavily modified function from a code base I'm working on, and the inventory, book, cell etc. dictionaries and lists containing them are all over the place, with similar looping logic (e.g. find the item with a given label) and combinations repeated throughout. Adding the objects to the above function would of course complicate it in the sense that it would get longer, but it would improve the actual example considerably. I will try to come up with better sample code that represents my worries better.


Still, if there's way too much looping logic strewn around the code, that's probably because the code lacks good abstractions for the most routine data access tasks in this code.

If the code used more objects, each data type would be packaged up with its own methods for looping over the data. For code that doesn't wrap every bit of data in a specialized type, you may find that a few small utility abstractions will clean up your code substantially.
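As a sketch of one such small utility abstraction (the helper name and the data shape are made up, matching the "find item with given label" loop mentioned above):

```python
# A hypothetical helper: the "find item with given label" loop, written once.
def find_by(items, key, value):
    """Return the first dict in `items` whose `key` equals `value`, or None."""
    return next((item for item in items if item.get(key) == value), None)

# e.g. shelf = find_by(shop["cells"], "label", "A3")
```

One three-line function like this can replace dozens of scattered for-loops without wrapping the data in a class.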

The tradeoff is that you must import these functions in every module that needs to use them, since they aren't passed to your code with the object.


I'm also confused about this example. I've often seen similar code in C using structs. What's really missing for me is some context about why he wants these data structures and what he's going to do with them. Essentially he's doing a join on 2 tables and you are left with the thought, "Why do you need to do that join?"

I think what he's really trying to get at is that he dislikes the style of programming espoused by one of the child posters: Make everything a dict/hash and write filters that manipulate those dicts/hashes. I think the reason he dislikes it is for exactly the reason I'm confused about his example: you can lose track of why you need the types in the first place.

One thing you often see in Javascript (and I presume Python, although I don't have much experience in that ecosystem) is the idea that types don't matter. You have an object (essentially a hash) and you can transform it any way you want. If it is slightly more convenient to access your data in a different way, then transform, transform, transform.

Now all your functions have different signatures: "No, in this function we use the store inventory, which is exactly the same as a book list, but grouped by store". And then you have 25 different functions all doing slightly different versions of the same thing to keep track of all the weird mutations of types along the way.

Again, this isn't new stuff. We've been writing crappy code like this for decades. One of the nice things about languages like C++ is that it's such a PITA to define arbitrary data structures that you avoid doing it, but you still see variations of that theme even there.

As for OO or not OO, I think it's a red herring. If I have functions: make_foo(bar, baz), print_foo(foo), manipulate_foo(foo), or if I have a class called Foo with a constructor(bar, baz) and 2 methods called print() and manipulate(), it's exactly the same thing. Even if you write the equivalent code functionally, mostly all you are doing is moving the context (bar and baz) out of the heap and putting it on the stack (yeah... I know... lack of mutability is a pretty important bit too ;-) ).

This is almost as long as the original rant, but I'll jam one more thing in. Serialization, I think, has little to do with the problem except that people don't know how to separate their concerns at layer boundaries. The main bad idea that perpetuates is that I should have the same data structure in my database as is in my business logic as is in my UI views as is in my UI widgets as is in my communication protocols. Back in my day, we even thought that it was a good idea to serialize entire objects (with executable code!) from one end to the other, so I guess it's getting slightly better ;-)

To sum up: you can't ignore types even when it is easy to morph types in your language. At your layer boundaries you also need to transform your data from one set of types to the other set of types (and you should never expect that a 1:1 mapping is automatically going to be a good idea). Within your layers you should never mutate your types and you should write functions with clear signatures. OO helps you do this. Non-mutating state is also a really good idea and functional helps you do this.


I disagree with the anemic object argument. If an object is just there to store data and no behaviour, then that's fine - don't add behaviour if it doesn't need it. A large portion of back-end services are CRUD and data wrangling operations anyway - as in, convert data format A to data format B (which I guess could be a constructor or factory method if you're comfortable with having the conversion logic in a data class).


Especially true if your business objects are generated code, e.g., protocol buffers.

Combining business logic with business objects is a mistake. That's a textbook example of tight coupling.


> Combining business logic with business objects is a mistake. That's a textbook example of tight coupling.

Isn't that a textbook example of object oriented programming? Whether OOP is a mistake in and of itself is another question ...


Textbook examples of object oriented programming are notoriously bad. Stuff like,

  Dog dog = new Dog("Spot");
  dog.bark();
These are toy examples meant to quickly introduce the mechanics of objects, not teach good software engineering patterns.


Where else are the business rules of an object supposed to go? Textbook? It's what you're supposed to do!

And if you say in a bloody "service", which just has the object passed as the first variable into every bloody method, I swear I'll come down there and strangle you with your own keyboard cord.

That's the actual textbook definition of tight coupling.

I'm working on a project which has 5 layers, business objects with no methods, a dto layer, a dal layer, a service layer and finally the website. All incredibly tightly coupled and utterly pointless.

It's an utter nightmare. Every time I touch the code half of it disappears, right now I've still deleted more lines than I've added while adding a ton of functionality.

That idea is so broken because it violates KISS, YAGNI and DRY all in one in a futile attempt to "decouple" things which are obviously coupled because the data is needed to perform the business rules.

There's a malaise in modern programming and it's the dogmatic pursuit of decoupling over simple and clear code.


I agree, but the problem is that usually the representation of the data comes before the logic you need around it, which can accumulate over a period of months or years. Depending on the application, depending on the programmer(s), that logic can turn into a real mess since there's no obvious place for it to live. This reduces code reuse, which leads to bugs.

It's not always appropriate, but building some language-idiomatic encapsulation around data from the very start makes it much less likely that the inevitable addition of hundreds or thousands of lines of logic will descend into incomprehensible spaghetti hell. This doesn't have to be OOP; it could just as easily be e.g. a module in a purely functional language.


Very good point. I would say that if your object is doing e.g. validation, or if/then/else'ing on field values to normalize them somehow, it's already far from anemic. But the key point is that you should not put data in objects, and then put the business logic, as in the small code sample, into some routine that simply accesses fields. That's the anti-pattern.


> If an object is just there to store data and no behavior

Then why do you have it at all?


To define a valid shape for related data.


I suppose because the language doesn't have typed records. You have to simulate them with objects.


Sometimes many different values are related to each other and it is useful to keep them close together for explicitness (and other reasons).


... to store data?


The main reason this happens in Python is that creating actual datatypes is incredibly clunky (by Python standards) because of the tedious "def __init__(self, x): self.x = x". The solution here is to have a very lightweight syntax for more specific types, e.g. Scala's "case class".
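To illustrate the clunkiness being described next to the closest stdlib shortcut (the field names here are made up; `namedtuple` is roughly what a Scala `case class` costs in syntax):

```python
# What the comment calls "incredibly clunky":
class BookEntry:
    def __init__(self, shop_label, cell_label, book_id, count):
        self.shop_label = shop_label
        self.cell_label = cell_label
        self.book_id = book_id
        self.count = count


# The closest thing the Python stdlib has to Scala's `case class`:
from collections import namedtuple

Book = namedtuple("Book", "shop_label cell_label book_id count")
```

One line versus six, and the namedtuple version also gets equality, repr, and immutability for free.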

I'd also argue for using thrift, protobuf or even WS-* to put a little more strong typing into what goes over the network. Such schemata won't catch everything (they have to have a lowest-common-denominator notion of type) but distributed bugs are the hardest bugs to track down; anything that helps you spot a bad network request earlier is well worth having.


An article about the "attrs" library was posted here a couple weeks ago. Really highlighted the tedium of Python objects while offering a neat solution.

https://glyph.twistedmatrix.com/2016/08/attrs.html

Regarding protobuf, I'm a bit disappointed with the direction of version 3. Fields can no longer be marked as required - everything is optional; i.e. almost every protobuf needs to be wrapped with some sort of validator to ensure that necessary fields are present. I understand the arguments, but I did enjoy letting protobuf do the bulk of the work making sure fields were present.


Required fields are bad; don't use them.

You should be very careful about marking fields as required. If at some point you wish to stop writing or sending a required field, it will be problematic to change the field to an optional field – old readers will consider messages without this field to be incomplete and may reject or drop them unintentionally. You should consider writing application-specific custom validation routines for your buffers instead.

https://developers.google.com/protocol-buffers/docs/proto#sp...


They're a tradeoff. Sometimes you really are confident enough that this attribute will be required forever that the saving of not having to write custom validation is worth it.


That's a really interesting article, thank you!

EDIT: I liked it so much that I posted it here:

https://news.ycombinator.com/item?id=12359522


How, if at all, does attrs interact with serialization and deserialization?


Named tuples assign meaning to each position in a tuple and allow for more readable, self-documenting code. They can be used wherever regular tuples are used, and they add the ability to access fields by name instead of position index.

https://docs.python.org/2/library/collections.html#collectio...


Named tuples are pretty great! I used to think their immutability was a drawback, but I'm starting to come around to the opposite point of view.


I've not yet made use of it, but using the `type` keyword to create new classes quickly looks promising.

https://docs.python.org/3.5/library/functions.html#type
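A minimal sketch of the three-argument form of `type()` (class name and attributes are made up):

```python
# type(name, bases, namespace) builds a class on the fly.
Point = type("Point", (object,), {"x": 0, "y": 0})

p = Point()
p.x = 3  # instance attribute shadows the class default
```

It's terse, though arguably harder to read than an ordinary `class` statement once the namespace dict grows.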


At one company I worked at, we used Avro to transfer data over the network. It's strongly typed with schemas, and it has both a compact binary form for transfer over the network and a text-based form for storage on disk that looks like JSON except field order matters (the schema and data are stored in separate files).


aeruder already posted the awesome glyphobet post on attrs; I agree with everything in there. The Python object protocol is great, but difficult to use for small classes. If you are not doing some kind of schema validation on REST endpoints, you're doing it wrong, I would say. But JSONSchema is also really sucky; writing more JSON to validate JSON is not my idea of simplicity. Will have to look at the alternatives at some point.


> The main reason this happens in Python is that creating actual datatypes is incredibly clunky

It's not clunky, it's outright impossible. Datatypes are inhabited by compound values (data constructors applied to arguments), but Python simply doesn't have compound values. All it has is object identities, which are primitive and indecomposable values no matter how compound the object is.

Sadly, the same is true in Scala.


This basically repeats the ORM arguments/counter-arguments, but now it's a slightly more complex data structure instead of the DB-row-as-hash/array you get there. "row-driven" in this context often leads to barely wrapped DAO Objects.

On the other hand, sometimes (surprisingly often) a hash is good enough and the effort spent in modeling the database (...) doesn't need to be replicated.

And as with ORMs/SQL generators/DAOs/etc., there's a whole spectrum of solutions and you really have to look at the task to see what's appropriate...


This isn't JSON-driven development, it's just choosing to apply logic over loose-typed data structures instead of named constructs. It's more awkward in Python because it doesn't have JavaScript's dot-notation sugar for accessing keys on an object.

But using clean built-in data structures instead of named types has its benefits, especially if you need to serialize for persistence or communication, as it doesn't require any additional knowledge of Types in order to access serialized data, so you can happily consume data structures in separate processes without the additional dependency of an external type system that's coupled and needs to be carried along with your data.

This is why Redux uses vanilla data structures in its store, and why JSON has become popular for data interchange: any valid JSON can be converted into a JavaScript object with just `JSON.parse()`, which saves a tonne of ceremony and manual effort compared to the old-school way of having to extract data from formats with poor programmatic fit, like an XML document, into concrete types.

If your data objects don't need to be serialized or accessed outside of the process boundary, then there's little benefit to using loose-typed data structures, in which case my preference would be using classes in a static type system to benefit from the static analysis feedback of using Types.


> as it doesn't require any additional knowledge of Types in order to access serialized data

You still need to know the shape of the data you're working with, or you won't get anything useful done. So you can't skip defining types or a format, you're just skipping the tools that help you follow said format.


You only need to know how to access the data you need, not the entire class structure that's coupled to the monolith that created it.


Maybe they use untyped hashes and arrays just because there are no other data structures in JS?


Anemic objects and whether they are harmful or harmless has been debated in software engineering for long.

I find over-relying on encapsulation more harmful than useful nowadays, especially if you are writing scalable software that is inherently distributed. For example, hiding database access behind a simple getter function makes another programmer ignore the performance implications and other issues that may arise.


Yes, but OTOH, it lessens the likelihood of errors, and means you'll have to rewrite a minimum of code when you, say, switch from MySQL to Postgres.

Abstraction always lessens awareness of that which is abstracted. Decide where to draw the line for your app.


It sounds to me like these arguments aren't so much against JSON, per se. They're against using JSON.parse() (or json.loads() in Python, json_decode() in PHP, or whatever) as your entire data-import process.

Instead, the argument goes, one should load the JSON, walk the resulting structure, and use it to build your native data structures/objects/whatever. Similarly, when the time comes to save, you crawl through your native structure to build a dict/array/primitive structure, then call JSON.stringify() (or the analogous function) to serialize that.
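That load/walk/build round trip might look like this in Python (hypothetical `Book` type and field names):

```python
import json


class Book:
    def __init__(self, book_id, title):
        self.book_id = book_id
        self.title = title

    @classmethod
    def from_dict(cls, d):
        # Walk the parsed structure; a missing key fails here,
        # at the boundary, not deep inside business logic.
        return cls(book_id=d["id"], title=d["title"])

    def to_dict(self):
        return {"id": self.book_id, "title": self.title}


book = Book.from_dict(json.loads('{"id": "b1", "title": "Dune"}'))
wire = json.dumps(book.to_dict())
```

Everything between `from_dict` and `to_dict` works on a `Book`, never on a raw dict.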

Uncoupling your data structure from the serialization format, though, is really just basic good software design anyway, is it not? Does anyone argue in favor of what this article calls "JSON-driven development" as a design principle? Or is it just a shortcut that developers -and I am no less guilty of this than anyone else- sometimes take in the interest of getting a quick-and-dirty solution out the door?

Yes, working directly on the output of JSON.parse() is a code smell. But I'm not sure that claiming there's a rising trend of "JSON-driven development" is entirely founded. It's just people taking shortcuts.


This. "${PRACTICE}-driven development" suggests a practice that someone actively pursues because of perceived merit rather than a shortcut taken because of time/resource constraints.


With dynamic languages (certainly Python, Ruby, and JS), there's a definite lack of nudge from the tooling to translate from "interchange" format into an internal "smart" format (procedural code + everything is a hash = easy hacks). Whereas with something like Scala, the tools make it very clear that if you serialize/deserialize on the periphery, you're in for a bag of hurt (or, at the very least, fighting with one hand tied behind your back).

This is not to say that one tool is better than the others, but tools do have opinions, and while being more permissive/ambivalent makes throwing together a quick script easier, a tool that nudges you in the direction of building something in a more-sustainable way is useful when building nontrivial systems.


While I see the point Ulaş is getting at, I wouldn't call this JSON-driven development. I think JSON-driven development would use abstraction layers that are based on JSON, like JSON schema, and perhaps an OOP library that leverages it.

What I'd actually call this problem is a lack of abstraction. In functional programming, simple data structures are often preferred, and composable functions are used to manage complexity. A functional programmer might declare a function `to_structured_dict(enumerable, path)` and call it with `to_structured_dict(book_list, path=('shop_label', 'cell_label', 'book_id', 'count'))`
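One possible reading of that hypothetical `to_structured_dict` (my sketch, not the commenter's actual code):

```python
def to_structured_dict(rows, path):
    """Nest flat dicts: successive keys in `path` become nesting levels,
    and the final key supplies the leaf value."""
    *group_keys, leaf_key = path
    result = {}
    for row in rows:
        node = result
        for key in group_keys[:-1]:
            node = node.setdefault(row[key], {})
        node[row[group_keys[-1]]] = row[leaf_key]
    return result
```

So the example call would turn a flat book list into `{shop_label: {cell_label: {book_id: count}}}`, replacing the nested loops from the article with one reusable transform.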


And then EVERY programmer would further abstract that, as you really want the bare minimum of your code depending on your data structure internals.


Yeah, I was seeing code like that in Java long before the term "JSON" was coined.


If you're in Python, and are afraid of "anemic" objects, I would recommend checking out collections.namedtuple. It's a fantastic lightweight and performant object-like data structure.

You also get a few additional features, such as in-order iteration, the parameters are fixed at run time, and there's a method for turning it into an ordered dictionary (which is serializable in, wait for it, JSON).
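For example (hypothetical `Book` record; `_asdict()` is the method referred to above):

```python
import json
from collections import namedtuple

# Fields are fixed when the class is created.
Book = namedtuple("Book", ["id", "title", "count"])

b = Book(id="b1", title="Dune", count=3)
fields = list(b)                 # in-order iteration over the values
wire = json.dumps(b._asdict())   # dict form serializes straight to JSON
```

So you get the self-documenting field access of an object with a one-line path back to JSON.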


If you're not limited to the standard library, https://github.com/hynek/attrs is also worth taking a look at.


I just found out about it in this article, and already posted a new story (it looks that good):

https://news.ycombinator.com/item?id=12359522


> Once you go dict, you won't go back. This style of development is too easy, since dictionaries are baked into Python, and there are many facilities for working effectively with them.

How is this an argument against using dictionaries?

After 10 years of Python development, I do find myself using dictionaries rather than objects, in just the way that the author proscribes, but I'm finding it to be a genuine pleasure.


REPENT!!! :-)

Yeah, get shit done, YAGNI, and all that.

If it needs a wrapper, make one for the repeated access patterns. If it needs one or more bonafide objects for passing around, updating, and general good behavior, then make them as needed. Otherwise, there are 15 other little jobs that need coding, and we gotta move on.


And what happens when one of those keys changes?


You change your code. Loose coupling is a nice goal to aim for, but at the end of the day, somewhere deep down inside the code, you have to tightly couple to actually get anything done. Where that transition occurs is entirely at the programmer's discretion.


But there's a difference between changing it once when you serialize/deserialize it, and changing it every time you try to access the key.


Writing the code with the assumption that a key will change is on the same level as premature optimization, in my opinion.

It might be reasonable to make the assumption for some keys. In these cases a function taking the data and key as arguments and returning the value is sufficient and appropriate.

Frankly, if the code base is so littered with references to that specific data and key combination it might actually indicate a poor design.


"Writing the code with the assumption that a key will change is on the same level with premature optimization, in my opinion."

Considering how easy it is, and how often I've had keys change on me, I have to strongly disagree.

"Frankly, if the code base is so littered with references to that specific data and key combination it might actually indicate a poor design."

That's kinda the point of the article.


A single constant will fix that problem.


Or even just a search and replace...


I think this article is somewhat off-base. The problem isn't JSON, it's lack of respect for separation of duties. JSON is just a data exchange format.

Want to program in an OO way using JSON? Easy. Just build a factory to generate objects from JSON input. Put your validation and error handling right there. Now you can get a known valid object from the JSON, a class instance with all the encapsulation and business logic your heart desires. Need to share it with the outside world? Provide a JSON output method.
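A sketch of such a factory in Python (hypothetical `Order` type; the validation rules are made up for illustration):

```python
import json


class Order:
    def __init__(self, order_id, quantity):
        # Business-rule validation lives with the object itself.
        if quantity < 1:
            raise ValueError("quantity must be positive")
        self.order_id = order_id
        self.quantity = quantity

    @classmethod
    def from_json(cls, raw):
        """Factory: JSON in, known-valid object out (or a clear error)."""
        d = json.loads(raw)
        missing = {"order_id", "quantity"} - d.keys()
        if missing:
            raise ValueError("missing fields: %s" % sorted(missing))
        return cls(d["order_id"], int(d["quantity"]))

    def to_json(self):
        """Share it with the outside world again."""
        return json.dumps({"order_id": self.order_id,
                           "quantity": self.quantity})
```

Everything past `from_json` can trust the data; everything before it is just bytes.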

Translating data formats is at the heart of day-to-day programming. It ain't rocket surgery. Fix the problem, not the blame.

(And if you think JSON sucks, believe me, you never dealt with data file formats from the pre-XML days!)


The author isn't saying the problem is JSON.

The same programming style exists quite a bit in older PHP code as well. This is because one of its primary data types is a list/hashtable hybrid. And JSON is similar -- it promotes those same structural types; the list and the hashtable (array and object, respectively). So programmers are using them, not just for building structures for data interchange, but for actual programming logic.

The fix for the problem is just education.


There is so much wrong with this blog post that I don't even know where to begin. He appears to have included Python-specific details in his list of why he hates lists and dictionaries. Apparently Python throws exceptions if a key doesn't exist and seemingly has no Maybe/Option alternative? I don't know if that is true or not.

He claims using lists and dictionaries means you lose encapsulation - does it? A smarter programmer would realise that actually it entirely depends on the _types_ you are storing in those data structures.


The rule of thumb I've always used for when to use OO is "will there be more than one extant object at once or not?" If yes, and especially if these objects need real behavior, then use OO.

If you're essentially going through one object at a time, then discarding them, you may just be doing conduit data processing, and so there's little advantage to using objects. I think what's missing in this (well-written) analysis is this distinction; if you're slurping data from one place, making a few changes (or especially if you're not making any), then sticking into a DB or vice versa, OO may be the wrong choice.

Ask yourself while writing the code: "are these active, behavior-driven objects that need encapsulation and relatively sophisticated behaviors, or is this just data I'm doing some relatively simple processing on?"


The author has lots of good points. Because I write most of my Python in the style he is advising against, I recognize that style has issues. The main issue for me is that a dict of dicts of dicts is not an interface. It doesn't have any constraints. It doesn't communicate expectations for use for the actual intent of the code. The best you can do is a comment explaining what to expect and a lot of error checking.

That said, almost all of the python I write these days is in the form of functional transforms on built-in data structures. And I love it!

There was a great Pycon2012 talk titled "Stop Writing Classes". You can find it linked and discussed here https://news.ycombinator.com/item?id=3717715



On the one hand I agree with OP on directly interacting with JSON is not really a good idea but on the other hand I completely disagree with that behavior should be shoved into data objects. Also I think part of the problem is Python doesn't have much typing (I know they recently added optional typing in python but I don't think many use it).

As more of an FP guy I'm a firm believer in the separation of behavior and data. Clojure's Hickey sort of has a valid point... it's freaking data... stop making it complicated to access it.


I'm surprised no-one has linked Steve Yegge's Universal Design Pattern – http://steve-yegge.blogspot.co.uk/2008/10/universal-design-p...

It argues that loosely defined objects are an excellent design pattern, but I'm too tired to decide if it is directly relevant to this.


It's called a hash. Or a dict. Or a map. Not JavaScript Object Notation, FFS.


At this point everyone should be using an evolvable (thrift, protocol buffers, avro, etc) schema format when they are storing or transmitting their data if they want to run an always on service - there is no downtime for migrations in the real world. Trying to do this ad-hoc with JSON is a lost cause and will eventually lead you to failure at runtime or worse, data loss situations.


JSON isn't un-evolvable. In fact, thrift can serialize to JSON.

What makes thrift evolvable in practice is that we don't remove fields and don't add mandatory fields. The same discipline can be applied to JSON definitions.

Well thrift also tags all fields with integers, so a consumer with an older schema can parse a record with a newer schema, skipping the new fields. Of course JSON trivially has this property.

Maybe the key here is "ad-hoc"; something like JSON-schema is needed.


Yep, I mentioned below that using JSON as a serialization format is fine but you still need to specify a schema and understand what happens when you read data written by newer/older code.


Anyone have a good blog post handy on this?


This is a related post that talks about the entity store that I built for my startup that was ultimately acquired by Twitter. They internally had their own similar store called ThriftStore that worked similarly. Google also builds their systems like this using protocol buffers. It is a pretty easy pattern and could theoretically be done with JSON if you provide some kind of schema and evolution strategy on read.

https://javarants.com/havrobase-a-searchable-evolvable-entit...


At least lists and dictionaries map relatively well to a tabular (SQL) format. Objects don't map well at all! Anyone who's spent enough time with "mature" ORMs knows this. Especially when there's a deadline and you have to write "native" SQL just to get whatever the hell you needed in the first place. "Well maybe you should have read everything and understood the ORM to its most minute detail..." NO! That's the whole point of abstraction! If I understood everything about that code, I'd be better off re-writing it to better suit MY specific problem. Look, I don't want to be another OO basher. OO definitely has a place in complex systems like game development, where the lives of the objects are longer than a page refresh. But in web dev, it's becoming increasingly obvious to me that the OO paradigm is a huge time suck. /rant


I feel like we have this discussion at work daily involving nhibernate. It is an abstraction that makes 80% of work quicker and easier, but what it makes easier and cleaner would have been trivial anyways.


Lots of religion in this thread!

I think the point is: if JSON is your data exchange format, it's bad to let that structure propagate into your application.

so in general you should prefer :-

json => <chosen language's best form for dealing with data>

over json => <chosen language's tools for dealing with json>

Different languages are going to have different mechanisms. Some languages you may abstract from json completely, some languages may natively deal with json, and your persistence layer may deal with json also.

So what you need is a well considered design that takes advantage of your chosen language's philosophy / mechanics, whatever that may be. There is no one way to design anything. The thing to avoid is not working out how to structure your code to make things easy / appropriate for the task at hand. That path leads to messy code.


I find that code that uses dictionaries a lot ends up with mysterious unnamed types used in various places throughout the system disguised as dictionaries. They have required fields, and must be interacted with using business logic that is not obvious. This becomes a real problem when the original authors of the system are gone, and new maintainers have taken over and have to implement new features.

By using objects, or lightweight objects like namedtuple, which have already been mentioned in other comments a bunch, you formally document the data-structure. You give it a name, expected fields, and required behaviours when interacting with it. The code becomes much easier to follow and understand clearly. Bugs don't creep in when a new maintainer forgets about the mysterious undocumented required business logic.


For crying out loud, you don't have to build an object hierarchy around the thing (although it would make sense to in this case), but at least have the common sense, or the sense of shame, to abstract away data lookups into separate functions. That's data structure abstraction 101.


I take a couple of issues with the author here. I don't personally worry much about breaking away from strict OOP. When a pattern like this develops it is usually because the data is too dynamic for a static property list. An obvious example is for creating reports. No one is going to sit down and hand design every single multi column report in a large project (I tell a lie, people do, it just makes the code base a horror show). By letting the data be more dynamic (Usually with JSON) it is trivial to create generic report structures and populate them.

Additionally, if you are using NoSQL as a backing store then solutions like class serialization don't make any sense since you will need to communicate in JSON anyway.


The code in the example has poor encapsulation, but I do not think it's the "JSON" style that causes that.

Much OOP code includes a hodgepodge of exposed internal state and methods that offer a combination of derived state and behavior to mutate that state.

Often, using data literals (like JSON) can make code clearer by making it explicit what is going on with state (when/if it is being mutated), and making the system easier to snapshot, test, etc.

While much code that uses JSON-like constructs is overly verbose and error prone, adding a bit of structural typing (with Flow) or creating schemas to ensure system invariants (jsonschema) can lead to a system that is easy to reason about and maintain.


Use GraphQL, or really stick with RESTful routes. The more predictable the schema of these dictionaries/hashes/JSON is, the less likely you are to see a mess like the one above. This is true whether you are using an FP approach or an OO approach. Using an imperative coding style when doing ETL will always be hairy.

That function also violates the Single responsibility principle. I wouldn't even know where to begin to write a unit test for that other than breaking it down into smaller parts. There are design patterns that could be followed in dynamically typed languages that would avoid that mess altogether other than just OO.


Coupling your code to the JSON you receive over the web can lead to some interesting problems. If the system on the other end decides to make some change you are not expecting, it can lead to errors.

In JavaScript, a simple thing that helps is to use lodash.get and provide a path to the property you are wanting.

  lodash.get(someObject, 'path.to.a.property')
If the path isn't there, the lodash.get returns undefined. This is much nicer than getting the error "Cannot read property 'to' of undefined" when "path" isn't there.
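A hypothetical Python analogue of the same idea (the `deep_get` helper name is mine, not a standard library function) might look like:

```python
from functools import reduce

def deep_get(obj, path, default=None):
    """Walk a dot-separated path through nested dicts, returning
    `default` instead of raising when any segment is missing."""
    def step(current, key):
        if isinstance(current, dict):
            return current.get(key, default)
        return default
    return reduce(step, path.split('.'), obj)

print(deep_get({'path': {'to': {'a': {'property': 42}}}}, 'path.to.a.property'))  # 42
print(deep_get({'other': 1}, 'path.to.a.property'))  # None
```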


Yeah, this is currently pretty bad in Python, which led me to create the jsane library:

https://pypi.python.org/pypi/jsane

>>> j = jsane.loads('{"foo": {"bar": {"baz": ["well", "hello", "there"]}}}')

>>> j.foo.bar.baz[1].r()

u'hello'


Simple rule of thumb: Don't Repeat Yourself. If part of the data structure is accessed in multiple places, create a wrapper routine (or even an object/class) around it. Otherwise, if it's in one place, perhaps "You Aren't Gonna Need It".
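A minimal sketch of such a wrapper routine (the payload shape and helper name are invented for illustration): it becomes the one place in the codebase that knows the layout of the raw data.

```python
def shipping_address(order):
    """Return (street, city) from a raw order payload.
    Callers never index into the nested dicts themselves."""
    addr = order['customer']['shipping_address']
    return addr['street'], addr['city']

order = {'customer': {'shipping_address':
                      {'street': '1 Main St', 'city': 'Springfield'}}}
print(shipping_address(order))  # ('1 Main St', 'Springfield')
```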


Yes, any useful technology will get over-applied, at the detriment of better ways to do things.

Not the fault of the technology, but of the developer who failed to consider alternate ways of accomplishing the same task.


JSON is best when it's solely used for serialization (or config files).

Using it deep into the project makes no sense, the first step in handling JSON should always be to code it into native data structures.


The point made here seems to be that often, these native data structures are dicts and lists, because that is JSON. Meanwhile, you'd often want something else when looking at only the internals.


First step is validation, then conversion, then logic. This keeps the json structure errors and changes from affecting the business logic.
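A sketch of that validate-then-convert pipeline, borrowing field names from the article's book-inventory example (the `Book` type and `parse_book` helper are illustrative, not from the article):

```python
from collections import namedtuple

Book = namedtuple('Book', ['shop_label', 'cell_label', 'book_id', 'count'])

def parse_book(raw):
    # Step 1, validation: reject bad shapes at the boundary.
    required = {'shop_label': str, 'cell_label': str, 'book_id': int, 'count': int}
    for field, ftype in required.items():
        if not isinstance(raw.get(field), ftype):
            raise ValueError('bad or missing field: %s' % field)
    # Step 2, conversion: hand the rest of the program a typed value.
    return Book(raw['shop_label'], raw['cell_label'], raw['book_id'], raw['count'])
```

The business logic downstream then only ever sees `Book` values, never raw dicts.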


I think the "anemic objects[domain model]" is a red herring in this case. It would be much cleaner to create separate serializers and deserializers that convert your actual domain models to JSON and back. By the time you are doing something like building a book inventory, as shown in his example, it should be all proper objects, with no primitives dictated by what should be the serialization layer.

Edit: Fixing phone auto correct typo - "property objects" -> "proper objects"


I actually saw this coding style way before JSON, for example at Apple. I even jokingly created DUKE: Developers United against Keyed Everything.

Another place I see this is in the eternal dynamic/typing debate. A lot of the criticism of dynamic typing will be with examples from JavaScript, Ruby, Python and maybe even PHP. Hardly ever from Smalltalk (or Objective-C), because Smalltalk code tends to not have the types of problems cited. This puzzled me for a while, because I also find these languages somewhat less "solid", yet couldn't quite put my finger on why.

That is, until I realised that all of these languages use hashes as their basic object representation. Coincidence? I think not. So I coined the term "hash language" for these languages, both because they are hash-based and it appears to be easy to make a hash of things in them, possibly for precisely that reason.

That said, I think it's also a mistake to disregard the power this sort of very generic programming brings, especially once you consider objects composed of multiple facets that are interpreted in different contexts.

IMNSHO, the way to combat hash-programming is to provide powerful and convenient metaprogramming facilities for object representation, so dealing with objects generically is just as easy and obvious as dealing with dictionaries.

Not entirely surprisingly, my own language ( http://objective.st ) has some facilities for this, mostly by making identifiers into first class entities. More research needed ;-)


Is the OP familiar with Object.keys()?

You don't have to hard-code explicit dot notation into your code when you're processing JSON or any other hierarchical object serialization format, which is what JSON is.

If you want to make your code more robust, you should process the structure of the JSON document and infer meaning from its keys based upon your position in the tree and of the values of the key names that are meaningful to your application.

This makes it possible to accept any kind of JSON, even if the original format changes, and you won't get uncaught exceptions and your application can decide what to do in a more graceful manner.

You should also centralize the code that is responsible for serializing and deserializing your JSON wire format and creating objects. There's no reason to have ad-hoc code in each object constructor like his example. A good example of such a thing is dnode https://www.npmjs.com/package/dnode. It handles all the JSON abstraction (in this case for RPC) and you don't even need to worry about the JSON ever again.

This has nothing to do with JSON and more to do with poor design and tight coupling of interfaces.


I may be missing something, so I'd appreciate a correction, but why all that effort when you can use collections.namedtuple and a custom object_hook for json.loads?

    import json
    from collections import namedtuple
    
    data = '{JSON string goes here}'
    fancy_data = json.loads(data, object_hook=lambda d: namedtuple('X', d.keys())(*d.values()))


I agree with this, and I've raised it several times in Objective-C codebases that can sometimes end up littered with NSDictionary:s everywhere, taking no advantage of the Objective-C type mechanics. I don't think it's necessarily as bad in Python or JavaScript, because these languages are dynamically typed, and that diminishes the benefits of deserializing JSON to a native model. It's still valuable, though, because you can guard against bad data by rejecting it at the boundary of your program.

In statically typed languages however the added bonus is a lot more significant because the type system increases the benefits of deserializing JSON to native models. Take this Swift example

    import Foundation
    
    enum SerializationError: ErrorType {
        case InvalidData
    }
    
    struct Thing {
        let a: Int
        let b: String
        
        static func deserialize(fromDictionary data: [String:AnyObject]) throws -> Thing {
            guard let a = data["a"] as? Int, let b = data["b"] as? String else {
                throw SerializationError.InvalidData
            }
            
            return Thing(a: a, b: b)
        }
        
        static func deserialize(fromArray data: [[String: AnyObject]]) -> [Thing] {
            return data.flatMap {
                try? deserialize(fromDictionary: $0)
            }
        }
    }
    
    let data: [[String: AnyObject]] = [
        [
            "a": 10 as NSNumber,
            "b": "Hello" as NSString
        ],
        [
            "a": "10" as NSString,
            "b": 10 as NSNumber
        ]
    ]
    
    let models = Thing.deserialize(fromArray: data)
Not only do you end up with a native array of models you can also be certain that any type information is correct because invalid results have been thrown away during parsing.


It's good to use lots of NSDictionary's in Objective C, in my opinion. The alternative is to create lots of different object types that takes much more code, and very little benefit for that extra code. If you're just shifting data around then there's no need to define objects for them.


This is what keeps dragging me back to moose & perl5. You describe the attributes of a class, the constructor for that class is created for you, and you can pass in hashes and it will automatically instantiate (and fail if the rules you have set for attributes are not met).

I've found that you can kinda sorta do the same in other languages (Python/Ruby/JavaScript) by writing static factory builders inside the class that do this checking for you and raise an exception or return an object, but for me it still doesn't compare to Moose/Moose::Util::TypeConstraints::coerce/subtype and attributes with the coerce option set. It makes it so easy to coerce a deep JSON object into a deep class structure.

I always try to hunt down things that are similar in other languages (python allows named arguments from a dict, IIRC, and that allows similar things, but you still have to write the constructor yourself), but I've yet to find something that makes it as simple.
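For illustration, a minimal hand-rolled version of that "static factory builder" pattern in Python (class names and fields are hypothetical, and this covers only a fraction of what Moose coercion does):

```python
class Address:
    def __init__(self, street, city):
        self.street = street
        self.city = city

    @classmethod
    def from_dict(cls, d):
        # Fail loudly if the incoming dict doesn't meet the rules.
        if not isinstance(d.get('street'), str) or not isinstance(d.get('city'), str):
            raise ValueError('invalid address: %r' % (d,))
        return cls(d['street'], d['city'])

class Person:
    def __init__(self, name, address):
        self.name = name
        self.address = address

    @classmethod
    def from_dict(cls, d):
        # Coerce the nested dict into a nested object, Moose-style.
        return cls(d['name'], Address.from_dict(d['address']))

p = Person.from_dict({'name': 'Ada', 'address': {'street': '1 Main', 'city': 'London'}})
```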


If he had used tuples as keys to the dict, he wouldn't have this absurd code and the article wouldn't have been written.
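For example, a minimal sketch with made-up sample data:

```python
books = [
    {'shop_label': 's1', 'cell_label': 'c1', 'book_id': 1, 'count': 3},
    {'shop_label': 's1', 'cell_label': 'c2', 'book_id': 2, 'count': 5},
]

# One flat dict keyed by a tuple replaces three levels of nesting;
# no "has this intermediate dict been created yet?" checks are needed.
inventory = {(b['shop_label'], b['cell_label'], b['book_id']): b['count']
             for b in books}

print(inventory[('s1', 'c2', 2)])  # 5
```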


And this is where we end up without easy to use and well supported schemas...

If this was XML, you'd write a very simple RELAX NG grammar (use the compact syntax: http://relaxng.org/compact-tutorial-20030326.html ) that describes the structure of the incoming data, then use it to validate the input data before processing it.

After that, you know data is valid and in the right structure, so you can throw away most of the "is this in the right place?" checks.

JSON and YAML's various schema implementations can't hold a candle to this, and it's been around for over a decade.

The XML ecosystem does have some very bad parts, but it's not all bad, so it's worth learning from places where it actually works well.


In Java land, swagger + dropwizard validation do a rather good job of this with json. I wouldn't be excited to return to using soap/xml all the time.


This is space that the Clojure community has already visited.


And, somewhat, tamed using tools like get-in and update-in.

Clojure.spec is looking promising as a route for making this general style more robust without losing the ease and flexibility.


And every other Lisp community.


And what was their outcome?


heat death.


I agree that the supplied code is improvable. Consider this:

  def set_r(adict, keypath, val):
    key = keypath[0]
    if len(keypath) == 1:
      adict[key] = val
      return
    if key not in adict:
      adict[key] = {}
    set_r(adict[key], keypath[1:], val)

  def build_book_inventory(book_ids, shops):
    shop_labels = [shop['label'] for shop in shops]
    books = Persistency_books_table_read(
      shop_labels=shop_labels,
      book_ids=book_ids)
    inventory = {}
    keys = 'shop_label cell_label book_id'.split()
    for book in books:
      keypath = [book[k] for k in keys]
      set_r(inventory, keypath, book['count'])
    return inventory
First, the author clearly needed "autovivification" as supplied by Perl. We supply a substitute with set_r().

Second, I'd avoid creating local variables like "book_id". It creates mess. We never had the slightest interest in the book_id; it's just part of the wine we are pouring from one bottle into another.

Third, I've preserved (modulo names) the interface of this function but I suspect the surrounding code could also be improved. Also call a list of books "books", not book_list; list is the assumed sequence container in Python. "books=book_ids" is unfortunate; to thrive in a weakly typed language we need variable names that distinguish objects from ids.

Larger point: the author wants to create classes for the various business objects, which is a common enough pattern, but ultimately just makes extra work and redundant lines of code. A relational database can handle a wide variety of objects, with some knowledge of their semantics, without any custom code per-class.

As you know, the difference between dicts and objects in Python is mostly syntactic sugar. We can easily enough make a class that gives dot-notation access to values in a dict, if one objects to the noisiness of foo['bar'].

If you want to enforce object schema at system boundaries, there are better ways (more compact, expressive and maintainable) than writing elaborate "classes" for each type of object.
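A minimal sketch of the dot-notation wrapper mentioned above (not production-hardened; it only covers attribute reads):

```python
class AttrDict(dict):
    """A dict that also allows dot-notation reads: d.bar == d['bar']."""
    def __getattr__(self, name):
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name)

d = AttrDict({'bar': 1})
print(d.bar, d['bar'])  # 1 1
```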


More generally, using data structures well-suited to your problem is really important but often underappreciated in software engineering. Elegant code and algorithms naturally follow.

In the example in the article, OO seems like a good way to go.


From an OO perspective I can completely agree with this, and the encode/decode pattern the author suggests seems to be the right way to deal with this problem, as it will hopefully also translate into having an Esperanto data type which can be used for all sorts of APIs and formats.

But from a functional perspective I would disagree. The code example wouldn't even make sense in that world. You would query it as is and compare it with other data or create new structures as with any other data you handle in your language.


I agree 100% with the article.

I've just written a script which makes every one of the mistakes listed in the article. I am consuming a JSON based API in a long ugly mash of code, exactly as described. It doesn't look pretty.

In my defence, I wrote the code as I was trying to understand the API. I had not read ahead and didn't really know which API's I would need or how fiddly it would be to bring it all together.

I am now exposed to pain if the API changes or if anything breaks. Time to go back and tidy up the code!


I went back and fixed the script to turn a portion of the JSON API into a set of useful Python objects. The code looks mostly OK now.

It's no surprise that it took about the same amount of time again to tidy it up. Apart from making me feel better, and a promise of less work in the future for updates, it's hard to justify the extra effort.


JSON is the equivalent of a UNIX pipe; it is for passing data between applications on the Internet, just like a pipe is a way for passing data between applications on a machine.


In PHP we call it "Array Oriented Programming" ( http://www.epixa.com/2012/04/array-oriented-programming.html ). So it looks like Python developers have finally discovered this paradigm too. Let's wait for JS developers now.

By the way, PHP has type hints for functions that help to understand what are the types of arguments and the return type.


Pretty much the statically typed vs dynamically typed discussion. It depends on your particular case: whether there is a sufficiently defined schema or not.


I don't see how this is related to statically typed vs dynamically typed. This is more a case of Primitive Obsession http://c2.com/cgi/wiki?PrimitiveObsession and a lack of an adapter/serialization layer.


I meant typed vs not-typed. Someone mentioned Clojure, and for me that's the whole point: on one hand, for a whole category of problems the best abstraction is not to have a strong schema, given the variable parts. On the other hand, there are scenarios where you already know the structure sufficiently well. A matter of abstraction, IMO.


Just an aside, I'm not sure I agree with the title of "JSON-driven" development.

The problem being described is about using parts of a language in an inefficient or ineffective way of solving a problem.

The solution to this particular problem can be summed up in concepts like encapsulation or DRY. More to the point, let's not blame components like JSON for what is basically a poor implementation.


I prefer XML + schema (for which I use RelaxNG + jing) because this makes validating the input very straightforward and normalizes that process. But I understand the appeal of JSON and use it for some APIs myself. What experiences do people have with JSON.net Schema? Does anyone know of a json schema + validator system that is cross-language?


There's no particular reason objects can't be serialized as JSON and deserialized back into strongly typed objects.


Thankfully there isn't really a "JSON-driven development" methodology. It's just a bit of a crappy pattern (for many tasks) that far too many people use. Hopefully no one is advocating it as the way to develop software.


I like to wrap relevant objects around a type, that is responsible for creating, validating, serializing, deserializing and mutating that object.

But taking a JSON and passing it around, with no ownership, no predictability... it's the mindset of the tech debt programmer.


I think the general rule is "validate anything coming in from the internet, ever."


I couldn't agree more. My life these days consists 90% of marshalling json around. Hooray for microservices.


So we just need references? Let's all switch to XML (just kidding) or YAML (maybe not kidding)!


Although I coincidentally mostly agree with the conclusion "decode on entry and encode on exit", I disagree substantially with the rest of the article.

> I know that it's now en vogue to sneer at OO

I don't sneer at it. OO has been the dominant coding religion for most of my career and I rage against the educators and propagandists who spend a decade smothering everyone with it. I curse them for all the time I wasted trying to build classes for my data only to realize that if I had used a simple dictionary or list my code would be shorter, simpler, more robust, and more flexible.

The logic in the article above is all premised on object-oriented religion. OO for OO's sake because OO. Using the power of dictionaries is bad, using the power of objects is good.

> It completely defeats object orientation.

You could just as easily say using objects defeats the point of having lists and dicts.

> It offers nothing of the abstraction powers of object orientation.

Most of the abstraction power of object orientation happens when you create the methods. If you aren't defining new classes, you write functions instead. You still have abstractions, what you don't have is the encapsulation of code with data.

> It doesn't say what it's doing.

Sure it does, and better yet, since you are using the standard data types, it will be said in a language that any other Python developer is likely to understand immediately.

> The above code is filled with auxiliary logic that has nothing to do with what it actually tries to achieve.

The above code is filled with auxiliary logic because the author of it apparently didn't write any useful functions for operating on dictionaries.

I'm not sure that any of the auxiliary logic in that code has anything to do with the choice to use dictionaries. It's a standard data munging problem that comes from having data from different sources. You have the exact same problem with objects if you didn't write any useful methods for operating on them.

> but done right, it can be very powerful, especially in big and complex codebases.

And in my opinion, the right way to do Object-Orientation in Python is to not do it until you really know you need it: your code is heading towards big and complex and you need to lock it down and organize the data and methods into well-encapsulated classes. (Although maybe at that point you realize it's not a big deal and don't bother)

Designing around lists and dicts from the start is a much more flexible strategy than trying to get all the encapsulation exactly right on the first try. If you don't have lots of time to spend up front UML'ing an object hierarchy for your big, complex application, you're probably better off sketching and iterating with JSON in mind (and YAML if humans need to edit it). As your application takes shape, it will become apparent where it makes sense to lock down functions and data into objects.

This is especially true given that all of Python's standard types are inheritable classes.


Arguments for dict/list driven development:

NOTE: This isn't JSON-driven development; JSON just mimics base types (dict, list, string, numeric, etc.) common to all languages, which is a big reason it is so common, especially in Python/JavaScript.

- Large unknown list/dictionary data structures can be deserialized into dicts/lists in any language without issue.

Sometimes keys/data are unknown, such as large attribute data sets that may always gain new keys. In that case strong typing to an OO object will always break. Example: a Facebook attribute set, where keys/data that aren't set do not appear and new ones are added all the time, which would create a cat-and-mouse serialization/deserialization game. Same problem a binary structure has (offsets) when you really need a flexible keyed structure.

With dicts, one missing key doesn't break your whole system, as it would with a serialization/deserialization layer built on strongly typed OO. Validation can be done on accept and, if necessary, the data converted into an OO system.

- If needed, classes that are backed/extended/inherited by a dict/list or set (or composition) that can load in the JSON/dict/lists and only expose the needed values after validation are useful.

i.e. a class that inherits from or composes Dictionary&lt;string,object&gt; (for instance, in C#) would only fill the keys necessary for the view data, not a bunch of extra null fields because it might not have that key/property. It also has the ability to deserialize objects that may have new keys. Not everything is a perfect world where data structures are known beforehand.

- It reduces complexity many times, no need for an OO serialize/deserialize layer when you are passing back as basic dict/list or JSON.

Why add complexity to something simple?

- Unless you control the server and the client, real-world data structures aren't a perfect map of keys/values to OO properties.

Assuming otherwise gives you a system ready to break in the real world. Someone adds a field to the DB object, and then all clients that use it can't serialize/deserialize. Real-world serialization/deserialization has to accept basic dict/list types, validate, and then use the data as needed (some convert to OO objects behind the scenes). I see too many systems where people just have an EF object, expose it over a web API, and expect it to work; that is an example of poor encapsulation. Some fields don't need to be serialized to public APIs. In Microsoft land, MVVM was created to help stop this practice, but it still creates two sets of OO objects and breaks on any new keys/data (though breaking here may be desired for strong typing).

- Dict/list data structures can be easily setup to have cleaner naming and keys without tons of attribute/helpers

i.e. first-name key instead of first_name, FirstName, or firstName. This is more friendly to web/url naming that is common.

- Noted in the article: less memory used in many cases, and basic lists/dicts are highly optimized for speed.

There are many more reasons...

Dicts, lists and basic string/numeric types are the foundation of all languages and of computer science. The reason this style is common is that it is simple to work with these types without the added cruft of OO when it isn't needed.

OO does add complexity, and if it isn't necessary you are just upping complexity (and memory use) for no reason. It is similar to the complaints of C coders about C++: basic structs and sets are sometimes less complex than C++ OO objects. Same thing with dict/list versus some monstrosity of an OO serialization/deserialization system that breaks on every new key or field, forcing you to update the server and client rather than just increment a version and validate. The longer you code, the more you see this.

OO objects should not be used all the time just like dict/lists shouldn't be used all the time.


Completely agree.


> If an object is just there to store data and no behaviour, then that's fine - don't add behaviour if it doesn't need it.

In that case, you want values rather than objects. Alas, Python doesn't have compound values.


I'm not sure what the dynamic is behind this weird flamewar, but it's definitely not the sort of discussion we want on HN, and your comments seem, presumably unintentionally, to have trolling effects. Please don't do this here.

We detached this subthread from https://news.ycombinator.com/item?id=12358875 and marked it off-topic.


> and your comments seem, presumably unintentionally, to have trolling effects.

I have no idea why stating facts would constitute “trolling”. But, anyway.


It's entirely possible to communicate other things while - first order - only stating uncontroversial facts. What, you don't think your shop is nice? You don't think it would be a shame if something happened to it?

Given that, it's also entirely possible for people to perceive other things as communicated in cases where there's maybe not the intent.


I agree that it's strange, but the discussion has gone badly, and it has something to do with how you engage people. That's what I mean by unintentional trolling.


As tantalor reminds us in a sibling comment, the named tuple works nicely in this role.


All you need to do is use the `is` operator to see how even tuples (named or otherwise) are objects, not values.


> All you need to do is use the `is` operator to see how even tuples (named or otherwise) are objects, not values.

That is not a language feature of Python but an implementation detail. Python implementations are permitted (but not required) to intern any immutable value (which tuples, contrary to your description, are), and "is" is a mechanism for revealing what the implementation has done.

You seem to be equivocating between talk of values as a logical thing (where Python absolutely has compound values, including tuples) and as an implementation feature (where individual Python implementations may or may not implement certain logical values through interning.)


That sounds like basically the same newbie pitfall that exists with code-literal strings in Java: just because a certain comparison operation can work in some circumstances (where the underlying platform makes an optimization) doesn't mean it is a safe/sane choice in general.


> That is not language feature of Python but an implementation detail.

So you're saying that the behavior of the `is` operator is implementation-defined?

> values as a logical thing (where Python absolutely has compound values, including tuples)

The (informal, unspecified) metalanguage that you're using to reason about Python programs has compound values. Python itself doesn't.


> So you're saying that the behavior of the `is` operator is implementation-defined?

No, the behavior has a standard definition: it reveals whether the operands refer to the same in-memory construct.

Whether immutable values are stored in the same in-memory construct is, however, AIUI, implementation dependent.


> Whether immutable values are stored in the same in-memory construct is, however, AIUI, implementation dependent.

If I can't bind it to a variable, it isn't a value. You can't bind the list [1,2,3] to a variable, because Python has no such thing as the list [1,2,3].


I don't know why we are talking about lists, here, since (implementation details aside) lists in Python aren't conceptually values; for one thing, they aren't even immutable.

If you meant not to change the subject from tuples, yes, its true that whether the tuple (1,2,3) -- which is logically a value -- has a unique in-memory representation is not guaranteed at the language level in Python (and, in fact, it does not in the most common implementation.)


Err, sorry, yes, pretend I said “the tuple (1,2,3)”.

Regarding in-memory representations, the whole point to using values is that you don't care about the representation. A value may be represented in a myriad different ways, but from within the language (as opposed to, say, using a memory debugger), you can't observe the difference. If you're allowed to probe differences between two representations of the same value, the value abstraction is leaky.


That's a useless distinction. Everything is an object in Python. Does that mean Python doesn't have values?


It has primitive values:

(0) small enough numbers

(1) True, False, None, etc.

(2) object references

But it doesn't have compound values.

And the distinction isn't useless. Values have a richer equational theory than objects, enabling lots of automatic optimizations.


...And outside FP, nobody cares, or uses that definition of value. As noted in my above post.


It's useful when writing Prolog programs too.


Logic programming's actually pretty close to FP.


Not really. Prolog is first-order. Functional languages, just like object-oriented ones, are higher-order.


"Close," not "is."


[flagged]


Don't mock me. And yes, I did mean the notion of mathematical value and variable. That's one of the core tenets of both.


[flagged]


They are somewhat close in paradigm: They both favor declarativism, and have mathematical values. Given, they're pretty far apart in paradigm, but those are some strong similarities.


> declarativism

I'm not familiar with that term. Could you give a rigorous definition?

Anyway, after some googling, I found a very plausible definition that makes functional programming not a declarative paradigm: http://semantic-domain.blogspot.com/2013/07/what-declarative...


While that's a good point, the namedtuple is value-like enough to address the "anemic objects" problem.

Could you expand on the advantages of this:

    (a, b, c) is (a, b, c)
being True instead of False?


The runtime system's memory manager could automatically hash-cons equal compound values, reducing their memory footprint. It's like the flyweight pattern, except the runtime system gives it to you for free.
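A user-level sketch of the idea in Python (a real runtime could do this transparently in the memory manager; here it is an explicit cache, and `hc_tuple` is a made-up helper name):

```python
# Hash-consing by hand: hand back the same tuple object for equal
# contents, flyweight-style, so equal values share one representation.
_cache = {}

def hc_tuple(*items):
    # setdefault returns the previously stored tuple if an equal one exists.
    return _cache.setdefault(items, items)

a = hc_tuple(1, 2, 3)
b = hc_tuple(1, 2, 3)
assert a is b          # one shared in-memory representation
assert a == (1, 2, 3)  # still an ordinary tuple value
```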


I agree that's nice for memory use. It's totally consistent with the immutability of tuples, too. I suppose Python has elected to take the memory hit to reduce complexity in the interpreter?


As things stand now, the main reason why Python can't do it is because it could potentially break programs.


> As things stand now, the main reason why Python can't do it is because it could potentially break programs.

Since the documented behavior of Python is compatible with what is suggested as a change, any program that relies on the behavior not reflecting as described in the "change" is asking to be broken (and quite possibly already broken across different implementations -- including different versions of the same implementation.) Immutable types in python already are defined with the semantics of values, in that "for immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed." [0]

[0] https://docs.python.org/2/reference/datamodel.html


So, which is the case?

(0) Python provides a compound value abstraction, but this abstraction leaks, because we can distinguish between multiple representations of the same compound value.

(1) Python doesn't provide a compound value abstraction.

With a few exceptions (e.g., garbage collection, which pretends memory is infinite, even though it's not), I tend not to consider so-called “leaky abstractions” actual abstractions, but if you know of a good reason to do otherwise, please tell me.


Python provides a value abstraction that includes (but is not distinct for) compound values.

It also provides an independent mechanism for examining physical (rather than logical) identity -- which is object identity; for all compound values (and all but a very narrow subset of simple values) there is no guarantee that the logical identity of values is equivalent to physical/object identity.

I don't see that this makes the value abstraction "leaky", though; for values, value identity is tested by equality. For objects, equality tests object equality but not object identity. Because Python is an "everything is an object" language, you can also test object identity for values, but for values in general there is no defined relationship between object identity and value identity, so this has no defined general logical (as opposed to physical) meaning for value types. (There are some values which are defined to also be singleton objects so that value and object identity are equivalent for these values; but that's a feature not of the value abstraction in Python but of the value-object mapping in Python.)

So I don't see the value abstraction itself as leaky.

OTOH, I'd probably be happier if Python had (whether it was "is" or something else) a clean logical identity test operator.


> I'd probably be happier if Python had (whether it was "is" or something else) a clean logical identity test operator.

Isn't that what == is?


> > I'd probably be happier if Python had (whether it was "is" or something else) a clean logical identity test operator.

> Isn't that what == is?

Not really.

"is" is logical identity for mutable objects (where logical and storage identity are equivalent), and "==" is logical identity for values (where that is equivalent to structural equivalence) but (only) structural equivalence for mutable objects.

What I was thinking of is an operator that would be logical identity for both values and mutable objects.
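The distinction the parent draws can be seen in a few lines (a quick sketch; `xs`/`ys`/`zs` are my names):

```python
xs = [1, 2, 3]
ys = xs           # a second name for the same object
zs = list(xs)     # a copy: equal, but a distinct object

assert xs is ys        # "is": storage identity
assert xs == zs        # "==": structural equivalence...
assert xs is not zs    # ...without storage identity
zs.append(4)
assert xs == [1, 2, 3] # mutating zs left xs alone: they were never one object
```

For mutable objects, `is` answers the question "will mutating one affect the other?", which `==` cannot.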


Somehow Python lets you distinguish “logically identical” things, though.


If you're interested in properties other than logical identity, yes.


By definition identity checking is the finest-grained distinction you can make between things. If two things are identical, they can't possibly be different in any way. This is what philosophers call the “indiscernibility of identicals”, and, if it sounds like a tautology, it's because it is!


> By definition identity checking is the finest-grained distinction you can make between things.

Yes, but there are different identities. "is" implements storage identity checking between python objects (a term broader than the sense in which you use "objects", which includes both what you call "objects" and representations of values).

Logical identity checking between Python values (as well as object equivalence, but not identity, checking between Python "objects" in the sense you use the term) is done by "==".

For most python value representations, the relationship between storage identity of the representation and logical identity of the value is undefined.


> which includes both what you call "objects" and representations of values

This is exactly what I'm saying is wrong: branching on the representations of values.

> For most python value representations, the relationship between storage identity of the representation and logical identity of the value is undefined.

If it's undefined, then how come I can query it?


> This is exactly what I'm saying is wrong: branching on the representations of values.

Well, yes, it's wrong in that (except in the cases where value identity and representation storage identity are guaranteed to be equivalent) you generally shouldn't do it (the exception being code that does something omphaloskeptic, where the purpose of the code is to answer questions about what its own implementation is doing.)

But the fact that doing that is possible in Python does not change the fact that Python does, in fact, support values (including compound values) with value-oriented semantics.

> If it's undefined, then how come I can query it?

The relation between physical identity of the storage representations and logical identity of the values they represent is undefined by the language specification.

At runtime, every representation of a value has some storage identity, and you can query the relationship between that and the storage identity of another representation of a value. But the answer you get has no guaranteed correlation to whether the values represented by those representations are the same value (which you can also query.)


> Well, yes, its wrong in that (except in the cases where value identity and representation storage identity are guaranteed to be equivalent) you generally shouldn't do it

A language's semantics doesn't tell me what I “should” do. It tells me what I can do, and what other people who write code that interacts with mine can do.

> At runtime, every representation of a value has some storage identity

Not in the semantics of the source language. Value representations are purely an implementation artifact.


> By definition identity checking is the finest-grained distinction you can make between things

By your definition, perhaps. The rest of us don't always use the word "identity" with this meaning. And since you don't own the English language, you don't get to tell the rest of us how to use words.

> If two things are identical, they can't possibly be different in any way.

Then by your definition, two Python tuples (1, 2, 3) are not identical. Which says absolutely nothing about how I can write programs using them, or what concepts of "identity" the operators in those programs can implement.

> This is what philosophers call...

We're talking about programming here, not philosophy.


> Which says absolutely nothing about how I can write programs using them,

I can write a program that treats all Python objects identically, by simply doing nothing. That doesn't make all Python objects actually identical.

> We're talking about programming here, not philosophy.

Programming is applied logic, which is a branch of philosophy.


> I can write a program that treats all Python objects identically, by simply doing nothing. That doesn't make all Python objects actually identical.

Yes, you can. Thank you for proving my point that Python programs don't have to use your definition of identity.

But that's a silly example. Here's an example that's not silly: in an earlier discussion you said dictionary keys have to be values--which would imply that dictionary key lookup must use your concept of "identity", so only two objects that are identical by your definition will behave as identical keys in a dictionary. But two distinct Python tuples (1, 2, 3), which are not identical by your definition, do behave as identical keys in a Python dictionary. In other words, Python dictionary key lookup does not use your concept of "identity".
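That behavior is easy to demonstrate (dict keys are matched via `__hash__` and `__eq__`, never via `is`):

```python
d = {}
k1 = tuple([1, 2, 3])
k2 = tuple([1, 2, 3])   # equal to k1, but a distinct object in CPython
d[k1] = "stored"

assert k1 is not k2        # not "identical" in the storage sense...
assert d[k2] == "stored"   # ...yet k2 finds the entry k1 stored
```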

To everyone else, this means your concept of identity is simply not relevant. To you, it appears to mean that Python is violating Western logic.

> Programming is applied logic, which is a branch of philosophy.

To you, perhaps. Not to me. And not, I suspect, to most of the other programmers in this discussion.


> But two distinct Python tuples (1, 2, 3), which are not identical by your definition, do behave as identical keys in a Python dictionary.

The numbers 2 and 4 behave identically when passed to a function that tests whether its argument is an even number. Are 2 and 4 identical now?

> To you, it appears to mean that Python is violating Western logic.

This is the third time I'm saying I never said Python is violating Western logic. It's like saying I made a machine that violates gravity - it's literally impossible! Python just doesn't have compound values.


> The numbers 2 and 4 behave identically when passed to a function that tests whether its argument is an even number. Are 2 and 4 identical now?

They are with respect to the even/odd property. So if that's the property I'm interested in, they're identical. They might not be if I'm interested in some other property.

In other words, as dragonwriter has already pointed out, there is more than one concept of identity.


> there is more than one concept of identity.

There's only one: Two entities are equal if nothing can distinguish them. This is the “identity of indiscernibles”.

In a language with abstract data types, such as Standard ML, you could define a new type whose internal representation is an integer, but which provides no operations that would distinguish between two even or two odd numbers. But Python doesn't have this.


>There's only one: Two entities are equal if nothing can distinguish them. This is the “identity of indiscernibles”.

This doesn't exist. If you can distinguish past and present, you can distinguish anything by time. An object in one moment is not identical to an object in another moment because the moment changed? No. You have to decide on what invariants you care about to have a consistent notion of identity. Yours is inadequate to do anything, as you demand the whole universe be invariant in all aspects; such a thing is trivial and vacuous. And it surely is not the structure encoded in any usage of the term "identity", as that word has non-trivial structure.


> If you can distinguish past and present, you can distinguish anything by time.

That's the thing: Objects exist in time. Values don't. Values exist in the language's semantics, which is a timeless mathematical object. Does it make sense to ask when the number 2 suddenly came into existence?

> An object in one moment is not identical to an object in another moment because the moment changed? No. You have to decide on what invariants you care about to have a consistent notion of identity.

You're confusing “identity of indiscernibles” with “indiscernibility of identicals”.

And, obviously, you can't use the temporal properties of objects to determine whether their atemporal identities are equal.

> Yours is inadequate to do anything, as you demand the whole universe be invariant in all aspects; such a thing is trivial and vacuous.

Nope. It just requires you to distinguish between things that exist in time and things that exist independently of time.


> There's only one

Maybe to you. Not to me. And not, I suspect, to most of the programmers in this discussion.

> Python doesn't have this.

Sure it does:

  class NerfedInteger(object):
      
      def __init__(self, i):
          self.__i = i
      
      def __repr__(self):
          return "<NerfedInteger({})>".format(self.__i)
      
      def __str__(self):
          return str(self.__i)
      
      @property
      def i(self):
          return self.__i


What you've implemented isn't an abstract type. You can inspect the internal representation anytime, distinguishing between things that were meant never to be distinguished.


This simply doesn't make sense. It violates the indiscernibility of identicals, which is one of the cornerstones of Western logic. I can accept different opinions on several matters (e.g., the extent to which using objects is a good idea), but throwing logic out of the window is just too much.


> It violates the indiscernibility of identicals, which is one of the cornerstones of Western logic.

Oh, please. Python isn't violating any principles of Western logic. It's only violating your idiosyncratic insistence that there can only be one in-memory representation of any immutable value. Python does this for some types but not for others. Yet computers manage to run Python just fine, Western logic notwithstanding.

If you want to argue that Python ought to change its implementation to guarantee that, for example, any two references to the tuple (1, 2, 3) must refer to the same in-memory representation (so the 'is' operator would always return True), because that would save memory, or make the runtime faster, or whatever, that's fine. Then we can talk about the tradeoffs involved, such as increasing complexity in the interpreter code. But trying to claim that Python is violating "one of the cornerstones of Western logic" is just too much.


> It's only violating your idiosyncratic insistence that there can only be one in-memory representation of any immutable value.

I never said this! There can be multiple representations of the same value. For example, the same ordered set may be represented as two distinct red-black trees, balanced differently.

Values are different from their memory representations. Different memory representations of the same value are okay. Branching on the difference is not. Of course, to hide representation differences from users, you need abstract data types [0], and, sadly, Python doesn't have these.

[0] https://www.cs.cmu.edu/~rwh/introsml/modules/sigstruct.htm

> If you want to argue that Python ought to change its implementation to guarantee that, for example, any two references to the tuple (1, 2, 3) must refer to the same in-memory representation (so the 'is' operator would always return True), because that would save memory, or make the runtime faster, or whatever, that's fine.

I never said this either. I said that it should be a valid optimization, not that implementations absolutely have to do it. As things stand now, it's not a valid optimization, because it would break existing programs.

In actuality, I don't care much about the optimization itself. However, it's a good benchmark for assessing whether Python has compound values. Values can be deduplicated, because they may be represented more than once. Objects can't be deduplicated, because by definition an object exists exactly once in memory.

> But trying to claim that Python is violating "one of the cornerstones of Western logic" is just too much.

It seems appropriate to me. But, to be clear, what I was claiming violates the indiscernibility of identicals is dragonwriter's explanation, not Python itself. A simple explanation that doesn't violate the indiscernibility of identicals is to admit that Python doesn't have compound values. Which takes us back to square one [1].

[1] https://news.ycombinator.com/item?id=12358968


> I never said this!

As Rhett Butler said to Scarlett O'Hara, you gave a very good imitation.

> Different memory representations of the same value are okay. Branching on the difference is not.

Branching on a difference other than a difference in logical identity, if you are intending to test for logical identity, is obviously an error. The fix for that in Python is simple: if you want to test for logical identity, use the == operator.

Branching on a difference other than a difference in logical identity, if you're interested in a difference other than a difference in logical identity, is perfectly reasonable. Maybe you've never had occasion to do that in programs you write, but others might.

> I said that it should be a valid optimization

Consider my comment suitably modified. The rest of what I said still stands: there are tradeoffs involved, and reasonable people can differ on where the tradeoff ends up. You have one opinion; the Python developers have another. That doesn't mean the Python developers are violating Western logic.

> what I was claiming violates the indiscernibility of identicals is dragonwriter's explanation, not Python itself. A simple explanation that doesn't violate the indiscernibility of identicals is to admit that Python doesn't have compound values.

And to me this is a distinction without a difference. Nothing is stopping you from defining "compound values" so that a Python tuple isn't a compound value. But nothing is stopping the rest of us from not caring about your definition. Which, as you say, takes us back to square one.

What would be helpful is if you could give some reason why your definition of compound values is so important, other than talking about identity of indiscernibles and violating Western logic. What makes a language that allows compound values, by your definition, better than a language that doesn't? And if your only answer (other than preserving Western logic) is "better runtime efficiency" (less memory, faster operations), then it seems to me you would do much better to make that argument directly, instead of cloaking it with all this talk about "compound values". But maybe that's just me.


> Branching on a difference other than a difference in logical identity, if you're interested in a difference other than a difference in logical identity, is perfectly reasonable.

It's unreasonable to have a finer-grained distinction than logical identity. If two things are the same, they are the same. If they are not, well, they are not. Again, tautologies!

> That doesn't mean the Python developers are violating Western logic.

I never said Python developers are violating Western logic. I said dragonwriter's explanation of how so-called “compound values” work in Python is inconsistent with Western logic. I have an explanation of how Python works that's consistent with it, but it requires rejecting the assumption that Python has compound values.

> Nothing is stopping you from defining "compound values" so that a Python tuple isn't a compound value.

The definition of compound value is simple: a value that you can tear apart into its constituent parts, then reassemble, getting the original value. You can't get the original tuple object by creating a new tuple object with the original tuple's components. The `is` operator will rub the difference in your face.

> What makes a language that allows compound values, by your definition, better than a language that doesn't?

The easiest way to reason about programs that's completely rigorous (i.e., not just testing on a couple of sample inputs) is equational reasoning, which is basically showing that two syntactically different expressions evaluate to the same value. (For example, one expression might be an obviously correct but unacceptably inefficient program, and the other expression might be the program you actually intend to deliver.) In practice, realistic problem domains have to be modeled using compound entities (values or objects), so if you want to use equational reasoning, you need compound values.
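Equational reasoning in miniature (my own toy example): an obviously correct but slow specification, and an efficient replacement shown to denote the same value.

```python
def sum_slow(n):
    # Obviously correct specification: sum the integers 0..n.
    total = 0
    for i in range(n + 1):
        total += i
    return total

def sum_fast(n):
    # Derived closed form: sum(0..n) = n*(n+1)/2.
    return n * (n + 1) // 2

# A proof (e.g. by induction on n) shows the equation holds for ALL n;
# a test can only sample it.
assert all(sum_slow(n) == sum_fast(n) for n in range(100))
```

The substitution step — replacing `sum_slow` with `sum_fast` anywhere in a program — is only sound if both expressions really do denote the same value, which is exactly what compound values buy you for structured data.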


> It's unreasonable to have a finer-grained distinction than logical identity.

Maybe to you. Not to me. And not, I suspect, to most of the other programmers in this discussion.

> The definition of compound value is simple

Your definition. Why should I care? See below.

> if you want to use equational reasoning, you need compound values.

Ok, so this would mean that, according to you, it is impossible to use equational reasoning with Python programs. Whereas, according to me, it is only impossible to do this with Python programs that use mutable objects (where "mutable" here includes tuples whose slots point to mutable objects). And of course this wouldn't just apply to Python; the general distinction here could be drawn, in principle, in any language. Or even independently of choosing a language.

So I have another tradeoff here: I can use mutable objects, which can make programs easier to write, but then I can't reason rigorously about them; or I can restrict myself to immutable objects (which means that any time I would have mutated an object writing programs the other way, I instead have to construct a new immutable object with the same logical properties that the mutated object would have had), which makes programs harder to write, but allows me to reason rigorously about them.

Can you show me any real-world examples where using the latter programming style has paid dividends?


> Ok, so this would mean that, according to you, it is impossible to use equational reasoning with Python programs.

You can use equational reasoning on the values that Python actually gives you: small numbers, special constants and object references.

> Whereas, according to me, it is only impossible to do this with Python programs that use mutable objects (where "mutable" here includes tuples whose slots point to mutable objects).

You can in principle use equational reasoning on programs that manipulate mutable objects. What you can't do is treat two distinct objects as equal. But, of course, in practice, you can do whatever you want. Whether your reasoning is sound is a-whole-nother matter.

> I instead have to construct a new immutable object with the same logical properties that the mutated object would have had), which makes programs harder to write, but allows me to reason rigorously about them.

No, this would be wrong. You can't use equational reasoning on objects themselves. You can only use equational reasoning on values. The value itself might be a program that manipulates objects, but you can't equate two distinct objects.


> You can't use equational reasoning on objects themselves.

I didn't say you could. I said you could use equational reasoning on programs that only manipulate immutable objects. "Immutable" means that the object's value can never change, so you can substitute the object's value for the object itself everywhere it appears in the program. Then you can apply equational reasoning to the values so obtained.

In the particular case I described, you would end up using equational reasoning on the value transformation implemented by the code that constructs the new immutable object. You would obtain that transformation, as above, by substituting object values for objects in the syntactic statement of the code.

> You can in principle use equational reasoning on programs that manipulate mutable objects.

How do you reason using an object's value if that value can change?


> "Immutable" means that the object's value can never change, so you can substitute the object's value for the object itself everywhere it appears in the program.

No, you can't. Immutable objects still have physical identities. If you want to do equational reasoning, you need to get a hold of the value itself.

> How do you reason using an object's value if that value can change?

Objects don't have “values”, they have “states”. And I said you can reason about programs that manipulate objects, not about the states of these objects. (Though maybe in a few special cases you can do the latter too.)


Is there some kind of reference that explains the terminology and theory you're using? Because it makes no sense to me as you're explaining it. It would be nice if such a reference also gave some real world examples where the terminology and theory you're using actually pays dividends, as I asked before.


> Is there some kind of reference that explains the terminology and theory you're using?

On values:

(0) “In call by value, the argument expression is evaluated, and the resulting value is bound to the corresponding variable in the function” (https://en.wikipedia.org/wiki/Evaluation_strategy#Call_by_va...)

(1) “Most languages use a call by value [evaluation] strategy, in which only outermost redexes are reduced and where a redex is reduced only when its right-hand side has already been reduced to a value - a term that has finished computing and cannot be reduced any further.” (Types and Programming Languages, p. 57)

For the definition of redex, see: https://en.wikipedia.org/wiki/Reduction_strategy_(code_optim...

On objects:

(0) “In computer science, an object can be a variable, a data structure, or a function or a method, and as such, is a location in memory having a value and possibly referenced by an identifier.” ( https://en.wikipedia.org/wiki/Object_(computer_science) ) In some languages, object states aren't values, though.

(1) “object: a cell (unless otherwise explicitly stated)” (p. 325), “cell: a number of contiguous memory fields forming a single logical structure” (Garbage Collection: Algorithms for Automatic Dynamic Memory Management, p. 322)

> It would be nice if such a reference also gave some real world examples where the terminology and theory you're using actually pays dividends, as I asked before.

Compiler authors take advantage of the notion of value all the time. For example, if two expressions are guaranteed to evaluate to the same value, a compiler may emit code that evaluates the expression once, then reuses the result twice.
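That optimization (common subexpression elimination) can be sketched by hand (a toy illustration; the function names are mine):

```python
# Because `expensive` is pure, both occurrences of expensive(x) denote
# the same value, so it is sound to evaluate once and reuse the result.
calls = 0

def expensive(x):
    global calls
    calls += 1        # count evaluations to make the sharing observable
    return x * x

def naive(x):
    return expensive(x) + expensive(x)   # evaluated twice

def optimized(x):
    v = expensive(x)                     # evaluated once...
    return v + v                         # ...result reused

assert naive(3) == optimized(3) == 18
assert calls == 3   # two calls from naive, one from optimized
```

If `expensive` had observable identity effects (say, returning a fresh mutable object each time), the rewrite would no longer be valid — which is the parent's point about values versus objects.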

---

Sorry, I can't reply to you directly because I'm “submitting too fast”, but:

These blog posts show how to do equational reasoning on Haskell programs. (Technically, the subset of Haskell that doesn't contain nonproductive infinite loops.)

http://www.haskellforall.com/2013/12/equational-reasoning.ht...

http://www.haskellforall.com/2014/07/equational-reasoning-at...

http://www.haskellforall.com/2013/10/manual-proofs-for-pipes...

Although Haskell is particularly well suited for using equational reasoning, it can also be used in other languages, provided your program primarily manipulates values, rather than objects.


I'm talking about a reference that explains this whole "equational reasoning" thing and the terminology and theory specific to that.


> > It's only violating your idiosyncratic insistence that there can only be one in-memory representation of any immutable value.

> I never said this!

The standards you have set require either

(1) That there is only one in-memory representation of a given value, or

(2) The language has no facilities that allow you to inquire about the in-memory representation used by a particular reference to a value.


> That there is only one in-memory representation of a given value, or

No.

> The language has no facilities that allow you to inquire about the in-memory representation used by a particular reference to a value.

s/reference to// , otherwise yes.


I would argue that, because Python requires its users to reason about storage and interpreter state, we shouldn't consider (a,b,c) and (a,b,c) identical under is unless they're aliases for the same structure in memory. I'm not sure I understand dragonwriter's idea for a third equality operator, however. Isn't == adequate for that purpose?

By the way, I'd like to thank both catnaroek and dragonwriter for an excellent discussion; this sort of thing is why I come to HN.


> we shouldn't consider (a,b,c) and (a,b,c) identical under is unless they're aliases for the same structure in memory.

Which is the entirety of my point.


> In that case, you want values rather than objects.

In dynamic OOP languages, value/object distinctions are often not exposed to the language user (they may actually exist in the underlying implementation, but from the PoV of the programmer using the language, there may be no discernible distinction between an "immutable object" and "value".)

Conceptually (and ignoring the implementation details, which may have performance implications), immutable objects are equivalent to values, anyhow.


> Conceptually ... immutable objects are equivalent to values, anyhow.

Nope. Since values don't reside in computer memory other than through their representations, the language implementation (compiler, runtime system, etc.) is free to apply optimizations such as:

(0) Determine whether a value is represented more than once in memory, and eliminate the redundant representations.

(1) Store multiple values in a single dynamically allocated memory block.

If object identities matter, all of this is unsound.

> ... (and ignoring the implementation details, which may have performance implications), ...

The performance implications can be constrained by equipping the language with a cost semantics.


Please stop it.

Every single time anything so much as slightly related comes up, you talk about compound values. Incessantly. And then you act superior when nobody knows what you mean, or cares about compound values. They're not really relevant to this discussion anyway, as storing data as described in GPP is perfectly reasonable.

The worst part is, your definition of "value" and "object" in this context is so far out of the ordinary that people should be expected to not understand it, and you should provide an explanation pre-emptively.

To quote Randall Munroe, "Communicating badly and then acting smug when you're misunderstood is not cleverness"


You've repeatedly become uncivil in this thread. That's not ok, regardless of how wrong or provocative someone else may be. If you can't remain civil, please don't comment here.

https://news.ycombinator.com/newsguidelines.html

https://news.ycombinator.com/newswelcome.html


You are, of course, correct. I apologize, and will go to greater lengths to remain civil in threads in the future.


We very much appreciate it. Thank you.


It's all part of being a good commenter. My irritations, frustrations, and nitpicks are my problem, not yours, and I'll do my best to ensure it stays that way in the future.


It's relevant. If it weren't, there wouldn't be a debate regarding the usefulness of behaviorless objects.


Well then FFS express yourself clearly.


[flagged]


Actually, OOP, particularly statically-typed OOP, seems to be a major domain in which the distinction between "value types" which contain "values" which do not evolve over time and "object types" which contain "objects" which may evolve over time is used frequently [0].

There are some OOP languages that take an "everything is an object" approach and provide no clear object/value distinction (this seems to be more the case with dynamic OOP languages, and even there, while there may be little ergonomic distinction beyond the lack of mutability, there often are immutable "objects" that are stored without indirection and are for all intents and purposes values. Though in some cases the distinction between these and simple immutable objects is obscured from the programmer.)

[0] Though in most such languages, the distinction between object and value involves more than just mutability, and is more deeply associated with indirection of storage; it's quite possible to have immutable objects which might be values from a technology-neutral conceptual perspective, but which are objects from an in-language implementation perspective.


C# is an interesting case, in relation to your comment. It has `struct`, which defines value-semantic types instead of reference-semantic types. However, `struct` types still inherit from `System.Object` and may still encapsulate and provide behavior. So you don't necessarily need a dynamic type system to blur these lines.


> Though in most such languages, the distinction between object and value involves more than just mutability, and is more deeply associated with indirection of storage

Values, unlike objects, don't reside in memory, but rather in the semantics of the language. Of course, representations of values reside in memory. But that's an implementation detail. What matters is the abstraction the programmer is exposed to.


So your assertion is that objects are not an abstraction or defined by language semantics?


Objects are a different abstraction: they're defined by the language's semantics to have a unique identity and be stored in memory exactly once at runtime. I won't blame objects for not being values.


I think we might have just found the Ken M. of hacker news.


>Nobody other than an object-oriented programmer would think a value is something that evolves over time.

I don't think you fully read qwertyuiop924's post to get to the Randall Munroe quote, which makes this comment priceless.


I'm not an object oriented programmer. The word "value" is used in many contexts. Functions take values and return the same, sometimes mutating the values they took. rvalues, which are almost anything, are assigned to lvalues, which are locations. You can probably think of more.

OTOH, your definition of object is almost entirely unique outside FP, AFAIK.


> assigned to lvalues, which are locations.

lvalues aren't values.


...And here you are, completely demonstrating my point: Your defintion of "value" is not the same as the one in common usage: if you're going to go around using strange definitions of words we all know, at least tell us beforehand.


> lvalues aren't values.

lvalues are values in the same way that object references (not objects themselves) are values; perhaps more precisely, lvalues are values the same way that pointer values are values.


An object reference is totally an rvalue: you can't mutate the object reference, only the object it refers to. AFAICT, lvalues are more akin to objects themselves, in that they can be mutated.


Here, look, I can do overly-broad, potentially-insulting generalizations too:

Only a functional programmer would think that recreating the entire universe is necessary to change a value.

See? Not very helpful or particularly insightful, is it?


> Only a functional programmer would think that recreating the entire universe is necessary to change a value.

That's wrong. You can't change a value at all. What you can change is the state of an object. And functional languages have mutable objects too:

https://docs.microsoft.com/en-us/dotnet/articles/fsharp/lang...

http://hackage.haskell.org/package/base-4.9.0.0/docs/Data-IO...
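The same point can be sketched in Python (a hypothetical illustration, not taken from the linked docs): rebinding a name to a new value is not the same as mutating an object's state.

```python
s = "abc"            # strings are immutable values
t = s
s = s + "d"          # rebinds s to a *new* value; the old one is unchanged
print(t)             # abc

box = {"state": "abc"}   # a mutable object, akin to a ref cell
alias = box
box["state"] = "abcd"    # mutates the object's state
print(alias["state"])    # abcd: every reference sees the change
```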


It wasn't meant to be correct... That's the whole point.

"What we've got here, is a failure to communicate."

Your definition of a "value" seems to be a strictly immutable, functional-oriented definition. Which is fine; that's a valid definition and there's nothing wrong with it. The issues come from the fact that you seem to refuse to accept that that is one definition of many, and continue to push it without compromise.


[flagged]


It's correct only using your definitions and axioms. Other definitions and axioms can and do come to other conclusions. Your refusal to acknowledge that other definitions and axioms even exist is what is earning you the ire you are experiencing.


[flagged]


> I'm deliberately provoking it.

I'll just start flagging you for trolling then. It's one thing to attempt to have a meaningful discussion and unintentionally provoke ire. Meaningfully doing it by being obtuse and smug... I guess qwerty was spot on with the Randall Monroe quote.


> Meaningfully doing it by being obtuse and smug...

I don't think I was being obtuse. I just expected (in the statistical sense of the term “expectation”) the reaction I got, and decided that I don't mind it.


Why do you not mind being poorly understood? The goal of effective communication is to have others understand you. Since you're not trying to communicate effectively, what exactly is it that you think you are doing?


> Why do you not mind being poorly understood?

What I said I didn't mind is the “ire you are experiencing”. It somewhat saddens me that programmers, supposedly logical thinkers, can't see the distinction between a value and an object.

> Since you're not trying to communicate effectively, what exactly is it that you think you are doing?

I'm pointing to the distinction between values and objects pretty clearly:

(0) Objects exist in memory. Values exist in the language's semantics, not in computer memory, but representations of values exist in computer memory. Furthermore, objects exist in memory exactly once, but a value may be represented in memory any number of times.

(1) It doesn't make sense to distinguish between representations of the same value. If a program treats two memory blobs differently, their contents represent different values, period. (The converse is not true, of course.)

(2) The language implementation has absolute freedom to represent values as it wishes, as long as (1) isn't violated.

But this isn't the first time I've said all of this in this thread.
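Point (2) is observable in CPython, for what it's worth: small integers are cached (one shared representation), while larger ones parsed at runtime get a fresh representation each time. A sketch that relies on CPython implementation details:

```python
# CPython caches small ints (-5..256): one shared representation.
a = int("7")
b = int("7")
print(a is b)    # True in CPython: same cached object

# Larger ints parsed at runtime: one value, separate representations.
c = int("1000")
d = int("1000")
print(c == d)    # True: same value
print(c is d)    # False in CPython: two representations
```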


>It somewhat saddens me that programmers, supposedly logical thinkers, can't see the distinction between a value and an object.

We can see the distinction, we just don't use the same terminology you use. We use standard terminology by which what you call an object is a value. Your definition of the terms only shows up in an obscure part of FP, which is still a bit obscure in itself. Use alternate terminology, or explain yourself the first time you use it in a thread - otherwise, we'll all be confused.

I've tried to point this out to you at least three times, and I'm losing my patience.

This brings to mind another Munroe quote: "You're like the religious zealots who are burdened by their superiority with the sad duty of decrying the obvious moral decay of each new generation. And you're just as wrong."


> We can see the distinction, we just don't use the same terminology you use.

No, most of the people I've talked to here clearly couldn't see the distinction between a value and its representation. (FWIW, I'm not fully convinced you can see it either.) They're not used to thinking about values without thinking about how they're represented. They're abstraction-challenged.

> We use standard terminology by which what you call an object is a value.

That's not consistent with some of the things other people have said here. But let's ignore that. What do you call what I call a value? Not that I'm a nominalist, but in practice, I've seen that people don't understand concepts they don't have a name for. (Though I probably have the causality backwards. It would be more accurate to say that, as soon as people understand a new concept, they rush to give it a new name.)

> Your definition of the terms only shows up in an obscure part of FP, which is still a bit obscure in itself.

It's not obscure. It's in the operational semantics of any call-by-value language, which for practical purposes means any language other than Haskell.

> I've tried to point this out to you at least three times, and I'm losing my patience.

Well, it's not like you have to deal with me if you don't want to. You do so of your own volition. shrug


There's a difference between being abstraction-challenged and wishing to understand the abstraction.

We don't have a name for what you call values, as they're either irrelevant or nonexistent in most contexts, and barely mentioned. However, we can't call them values, as that name is taken. Most CBV languages call them atoms, but that doesn't fit because they aren't necessarily atomic.

Let's try... Symbolic value? It seems to work: a given symbolic value represents all other equivalent symbolic values, and it's a value that behaves like the symbol type in most languages. So that works, unless it's already taken.

The point is, your definition of value needs a name that doesn't collide with the names of similar concepts and ideas. It doesn't really matter what it is. Unfortunately, this is one name clash that gensym can't handle for us. :-)


> We don't have a name for what you call values, as they're either irrelevant, or nonexistant in most contexts, and barely mentioned.

They do exist. Python has va... errr... the-thing-for-which-we-don't-yet-have-a-name: small enough numbers, special constants and object references. And they are relevant, because these are the things that you can bind to variables, pass to and return from functions, etc. If you think they are irrelevant, you don't understand them well.
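For instance, the reference itself is what gets copied on assignment and passed to functions; a sketch:

```python
xs = [1, 2]
ys = xs            # copies the reference (the value), not the object
print(xs is ys)    # True: both names hold the same reference

def push(lst):     # the reference is passed by value...
    lst.append(3)  # ...but it still reaches the one shared object

push(xs)
print(ys)          # [1, 2, 3]
```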

> However, we can't call them values, as that name is taken.

I won't quibble about names.

> Most CBV languages call them atoms,

C is call-by-value. No atoms. Java is call-by-value. No atoms. ML is a call-by-value language. No atoms. Maybe by “call-by-value”, you mean “inspired by Common Lisp”? That's not what call-by-value means, though.

> but that doesn't fit because they aren't necessarily atomic.

Right. In fact, the whole point is to use compound ones whenever we can!

> Let's try... Symbolic value? It seems to work: a given symbolic value represents all other equivalent symbolic values, and it's value that behaves like the symbol type in most languages. So that works unless it's already taken.

A symbol is supposed to represent something else, but a va... errr... the-thing-for-which-we-don't-yet-have-a-name doesn't represent anything else. If anything, what I call representations are the ones representing something else.

I propose “mathematical value”. Not everything you call a value can be plugged into an equation, but mathematical values by definition can.


Why do you reference call-by-value? Call-by-value vs. call-by-reference (and the related value- and reference-type semantics) is entirely about the in-memory representation of parameters, and you have been very adamant that your definition of value does not depend on in-memory semantics.


I was comparing call-by-value with non-strict evaluation strategies like call-by-name and call-by-need.
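The contrast can be sketched in Python, which is call-by-value (of object references): arguments are evaluated before the call, whereas a thunk can simulate call-by-name.

```python
calls = []

def loud(x):
    calls.append(x)       # record each time this is evaluated
    return x

def ignore(v):            # call-by-value: the argument is evaluated
    return 42             # before the call, even though it's unused

ignore(loud(1))           # loud runs: calls == [1]

def ignore_thunk(thunk):  # simulating call-by-name with a thunk
    return 42             # the thunk is never forced

ignore_thunk(lambda: loud(2))  # loud does not run: calls still == [1]
```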


...And that should have taught you something: it should have taught you that the language you're using isn't being readily understood, and that you need to either explain it, or change it.


> I just expected (in the statistical sense of the term “expectation”)

If you're going to argue here about programming language terminology, it behooves you to get terminology from other fields correct.


I got the terminology right. The term “value” means what I mean by it, not what qwerty means by it. Check TAPL, pages 34 and 57.


Both definitions of "value" are accurate. That's what makes this whole thing so confusing.

Officially, yes, cat is right, but in common usage, value leans more towards my definition.


It doesn't make sense to talk of value as a location. A value is a piece of data, plain and simple. So lvalues and objects aren't values.

OTOH, I can agree with the imperative programmer's intuition of a variable as a location where you can store a value (rather than a symbol that can be consistently substituted with a value). It's not a mathematical variable, but it's a sufficiently established meaning to be taken into consideration in serious discussion. (Furthermore, the connection between imperative variables and mathematical variables can be restored using Hoare logic.)


>It doesn't make sense to talk of value as a location. A value is a piece of data, plain and simple. So lvalues and objects aren't values.

It doesn't have to make sense (I think it makes perfect sense, but that's neither here nor there): people do it, and the default assumed definition of a value is broad enough that it allows for it, IME.

I don't object to your definition, but can you please just tell everybody what you mean by value in your comment if it's not what people expect, so that people like me don't have to build a deeply nested discussion thread to establish what you mean?

If I were sure of its legality by the rules of HN, I'd be of half a mind to actually write a bot to insert the definition below your posts, and save people a lot of time trying to ascertain what you mean, so that we could all have a more interesting discussion about the ideas, rather than the terminology.


> I think it makes perfect sense, but that's neither here nor there

Would you conflate a word with the piece of paper in which it's written?


I'm talking about "expectation".


Oh, sorry. I was implicitly making the following assumptions:

(0) Reactions can be quantified - assigned numerical values, roughly corresponding to our intuition of a “positive”, “neutral” or “negative” reaction.

(1) The possible reactions can be meaningfully averaged, and the result can be interpreted as a reaction value as well.

So by “expectation”, I meant “expected value”, in the usual sense. If your objection is that “expectation” can't be used this way, I have evidence that suggests otherwise:

(0) http://ocw.mit.edu/courses/mathematics/18-05-introduction-to...

(1) https://www3.nd.edu/~rwilliam/stats1/x12.pdf
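Concretely, under assumption (0), the computation being alluded to (with a made-up distribution of reactions):

```python
# Hypothetical distribution over quantified reactions:
# +1 positive, 0 neutral, -1 negative.
reactions = {+1: 0.2, 0: 0.3, -1: 0.5}

# Expected value: sum of outcome * probability.
expectation = sum(v * p for v, p in reactions.items())
print(expectation)  # -0.3
```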


It's rather non-standard to say "I expected" in this sense but since you've gone to the trouble to define your terminology and back up your claim, fair enough!


Yeah, to clarify, my initial gripe was that he didn't clarify his terminology to begin with. His definition is correct, it's just uncommon, and confusing as a result. He really should have clarified this in the head comment.


Given that "object-oriented programmer" probably constitutes most of the people here you ought to know your audience and explain what you mean to say.


It seems that you're really saying that your values are immutable.

For the rest of us, it seems we follow the line of thinking that, if our mental model (or "struct", if you will) has values (or types with fields/properties, if you will), that our mental model can be changed to reflect lessons learned (or, our values are inherently mutable, if you will).


Mutable objects are totally fine. They're just not values.


It's funny how catnaroek was completely vendetta-downvoted in this thread for speaking valid things. For those who don't understand the difference between objects and values, this video may be of help: https://www.youtube.com/watch?v=-6BsiVyC1kM



