Hacker News new | past | comments | ask | show | jobs | submit login

This is awesome, because the ordered map is the best data structure out there for easy & predictable programming, possibly only barring the array.

There's so many cases where it's a benefit for map entries to retain order (and none where it's a problem). PHP really got this one right (and immediately messed it up by mixing ordered maps with arrays into a big soup, but hey, PHP). And so did, JS, sorta-kinda-by-accident.¹

When I went from PHP to Ruby back in the day, Hashes not being ordered definitely was my biggest gotcha. Everything about Ruby (except the deployment) was nicer, but the hashes... I've spent serious amounts of extra thinking just to make code work that would've worked out of the box in PHP.

Yay for ordered maps! Every language should have them, they're the best! There's just so much ergonomics packed in such a simple API.

1) JS objects are ordered maps, iff the keys are strings that cannot be parsed as an integer (really) (yeah it's nuts) (but still better than unordered maps!!)




> JS objects are ordered maps

So currently (now that there are Symbols in JS), the order is: array-indexed properties first (integers), then string-key properties, in insertion order, then Symbol-keyed properties in insertion order.

The reason for the funny behavior in treating integer keys differently is that property keys are always treated as strings, so obj["3"] and obj[3] can't be distinguished, and arrays are also ordinary objects, so setting obj[3] was made to do the same thing on any object rather than special-casing arrays and non-array plain objects...

Today, JavaScript is a small, reasonably elegant scripting language with reasonable semantics, buried in a medium-sized, reasonably expressive scripting language with reasonable but different semantics, added 20 years later. The kitchen-sink disease is far enough along that it'll probably never recover the appeal it once had as a beginners' language. It still has the advantage of backwards compatibility, and it's become more practical as a compilation target.


> The reason for the funny behavior in treating integer keys differently is that property keys are always treated as strings, so obj["3"] and obj[3] can't be distinguished, and arrays are also ordinary objects, so setting obj[3] was made to do the same thing on any object rather than special-casing arrays and non-array plain objects...

Note that special-casing arrays is precisely what implementations used to do (i.e., for non-Array objects, the order in most VMs was simply "properties in insertion order").

V8 was the first JS VM to stop special-casing Array (and this was done before Chrome went public, and was the behaviour in the first Chrome beta). It turned out that virtually no websites relied on "array index" properties appearing in enumeration order (there was some breakage, but it was relatively few and far between, and believed to be worthwhile for the gains in cache-hits when accessing properties), and this also allowed the more compact array representations to be used for them.


I forgot about that when writing my comment but yes, V8 made that change along with "shadow classes" or whatever the object specialization stuff was called. Heady days for JS performance.

In retrospect if something was going to be standardized I'd have preferred it to be the older behavior which was simpler to explain, but so it goes.


SpiderMonkey actually had inline caches before V8 shipped if I'm not mistaken; the motivating factor for the behaviour change in V8 was just to improve the cache-hit ratio.


JavaScript is a small, reasonably elegant scripting language consisting entirely of edge cases.

/s


I did not even know that integer is a good key for an object. Is this being used anywhere outside arrays?


You can use it anywhere you want. Whether it's a good idea or not, like most anything in JS, is a matter of opinion.


> PHP really got this one right (and immediately messed it up by mixing it with arrays, but hey, PHP).

No, it didnt. What PHP calls "array" is actually an Ordered Map, as you alluded to:

https://yaml.org/type/omap

it literally says that in the documentation, paragraph one, sentence one:

> An array in PHP is actually an ordered map.

https://php.net/types.array

The issue, if one exists, is that PHP doesnt have a sequence type:

https://yaml.org/type/seq

but one is provided as a package:

https://php.net/class.ds-sequence

https://github.com/php-ds/extension


Sure, PHP arrays are actually ordered maps, but it's super confusing when used as arrays. I mean, look at code like this:

    $arr = [];
    $arr[2] = "two";
    $arr[0] = "zero";
    $arr[1] = "one";
    echo join($arr, ", ");
    // two, zero, one
In any other language, the result of similar code would be "zero, one, two". You'd need to `ksort` this thing to get it to behave like a normal bog-standard boring array.

Everybody's writing PHP code pretending that it has arrays when really, it does not. The gotchas are way bigger than the gotchas in Python <3.7's unordered maps. It's a mess.


Why wouldn‘t you just push/append to the end of the array with '$arr[]'? There is no need for explicit keys in PHP. The gotcha is writing code in PHP pretending you are using a different language.


I literally just learned "bog-standard" 10 minutes ago and now I see it here. And no, don't Baader-Meinhof me, I would have noticed this weird term in random text in the past.


Would you though? Appears at least once a week: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...


>I would have noticed this weird term in random text in the past.

Well, no so weird, I'm not a native english speaker, and I've seen the term hundreds of times over the years...

It's also quite common on HN:

https://www.google.com/search?q=site%3Anews.ycombinator.com+...


I'm aware of the terms "bog" and "bog-standard", but not being from the UK, I'm not sure what is so standard about bogs.


Baader-Meinhof denial!


Baader-Meinhof Dunning-Kruger?


I have to disagree also.

The order inserted makes most sense in most cases. Hand crafting integer based array indexes out of order is rare. And as you say a simple ksort call guarantees sort order.

    ksort($arr);
    echo join($arr, ", ");
    // one, two, three


Exactly. Of you hand craft an array you probably need it in that order.


Then why not put it in order while you're hand crafting it?


I disagree. The behavior you exhibit makes more sense than the behavior you imagine.

You start with an _empty_ array, then (somehow) set the third element of that array? That makes very little sense.

Had you _initialized_ your example as an array with a length of 3, your desired behavior would, indeed, manifest.


The problem is PHP calling an ordered map an array. Array in pretty much every other language means a sequence indexed by integers.

Ideally the maintainers of PHP would rename it and deprecate the use of `array()` over a long period of time.


The PHP documentation refers to them as "associative arrays", which is technically correct, but I agree I'm not sure how they could have conflated these concepts so badly.


I believe the "associative array" terminology came from Perl. Though Perl's associative arrays don't have any guaranteed order. And Perl also has normal arrays.


>You start with an _empty_ array, then (somehow) set the third element of that array? That makes very little sense.

Well, there's no reason a language could not initialize an empty array to some initial capacity, either on declaration or when you add an indexed element.


I guess my reasoning is that:

$arr = []

doesn't appear to be initializing to any capacity. So that after:

$arr[2] = "two"

I would expect:

count($arr) === 1

And honestly, sure. A language could be designed to pad the array with empty values if you specify an index greater than the upper bound (or lesser than the lower bound). I believe a sibling mentioned matlab did something like that.

But is that really better behavior or just different behavior? I don't really see an important difference. My point was that the criticism was fairly weak. PHP, indeed, has a very useful abstraction in its concept of "array". Most of the hate should be directed at the name, not the construct.


In that situation, Matlab allocates an array of length 3, fills it with “empty” values (depending on the type), and then sets the third element to the value. That’s what I’d have expected to happen here too...


Would you expect it to do that if somebody wrote the following?

    $id = 12835151;
    $arr[$id] = Get_thing_with_id( $id );


If $arr were an array I would, though I'd cringe at the wasted space (unless it were a sparse array).

Obviously, I'd agree that a map/associative array/dict/etc is a better choice here; I'm just annoyed about the name.


In my bog standard PHP I would do $thing = Get_thing_with_id( $id ); if(!empty($thing)){process($thing);}


yes.


So how can you access, say, the third element of your ordered map in PHP ?


    array_values($arr)[2];


I mean youd probably have to iterate. Either that or use a package with proper array support.


That's messed up


In practice, it's almost never an issue. The standard lib has push/pop/shift/pad methods available.

So example, if you had numeric indices that you wanted to guarantee that index order is equal to insertion order, you'd create an array using $my_array = array_pad([], 20, null); If it's an array that you didn't create, the ksort(...) method will order the array for you as expected.

Edit: after reading through the documentation on array_pad, it looks like it will even correct key order in existing arrays. https://www.php.net/manual/en/function.array-pad.php#87735


>There's so many cases where it's a benefit for map entries to retain order (and none where it's a problem)'

Don't know if this could actually be the case in practice, but theoretically the ordering could allow for a timing attack to glean some bit of information when performing a linear scan of the map (size of the map, relative location of the data, etc).

Just a contrarian thought given the definitive statement of "none where it's a problem". I'm generally of the same opinion as you though; mostly if not entirely harmless, potentially helpful in certain applications.


>Don't know if this could actually be the case in practice, but theoretically the ordering could allow for a timing attack to glean some bit of information when performing a linear scan of the map (size of the map, relative location of the data, etc).

An attack where the attacker has access to your ...program code and can run instructions there? In that case, leaks from a "timing attack" would be the least of your worries...

If they just provide an input to your program somehow externally, then whether you put that input into a map or an ordered map or not is an implementation detail. You could make your program rid of the "timing attack" in 100s of ways... (or have one, in 100s of ways). That doesn't make an ordered map more unsafe than any of the 1000s of ways to have a timing attack.


I think it's also worth noting that requiring an ordering is preventing a performance optimization

This doesn't impact most high level python code, and an OrderedDict is a very reasonable default. But there's a reason why Google's c++ map intentionally randomizes iteration order

(Hint: it allows the hash map and hash function to be extremely high performance while allowing themselves the flexibility to change the hash function)


Python dicts are not ordered maps, they are dicts preserving insertion order.


The title of the post is misleading: it means "ordered by insertion order" (not sorted).

Previously it was unpredictable, and that's why collections have OrderedDict; which makes sense instead of trusting an implementation detail of CPython from 3.6.

Is not mentioned in the docs: https://docs.python.org/3/library/stdtypes.html#mapping-type...

EDIT: but the behaviour is documented in OrderedDict itself! https://docs.python.org/3/library/collections.html?highlight...

Interesting!


I wonder why OrderedDict isn't outright deprecated (or reimplemented as a trivial wrapper over dict).

You can get the first/last key of a 3.8 dict with dict.keys() and reversed(dict.keys()) which you can then delete to reimplement OrderedDict.popitem.

Deleting a key and reinserting it will let you reimplement move_to_end.

The only cool thing that OrderedDict's implementation might be useful for is to move to front (or anywhere else not the back) but it doesn't expose that api.


When OrderedDict is compared with another OrderedDict, it also checks whether the order of items is the same (this way also violating LSP, but that's another story), dict instances don't do that.


Thanks, I just learned something.


...thats the same thing. Maybe youre thinking about "sorted maps"?


It's not the same thing. Compare two OrderedDicts that weren't constructed in the same order and the comparison will fail. Compare two dicts that weren't constructed in the same order and the comparison will pass.

The difference is whether order is part of its identity. A dict is still just a set of pairs, not a list of them. It just happens that it also guarantees now that if you _iterate_ over the set you'll walk the keys in insertion order.


Good point, but close enough! My points about how it makes the data structure more awesome stands.

FWIW I don't think people agree on whether "element order is part of the value's identity" is part of the definition of "ordered map". There's plenty other languages & libs with ordered maps where the order isn't part of its identity but they still use the term "ordered map" in the name or the docs. Eg PHP's arrays or ImmutableJS's OrderedMap.


Could you explain some examples of what this is useful for? What sort of algorithms or operations do you have in mind where you both want insertion order, and also key-based lookup in the same data structure?

I've made heavy use of all kinds of maps, and of queues and channels and arrays, but I don't recall ever noticing a situation where I wanted the properties of both mixed into the same data structure.

I'd love to learn more about useful tools to add to my toolbox!


For "algorithms", I mean, some really do care about insertion order. Like idk, if you have a priority queue, then you generally want FIFO ordering when the priority is the same? It like a pretty obvious desire in most cases... imagine a thread scheduler or a packet scheduler or what have you.

But generally it's about determinism and avoiding loss of information, not just whether a particular algorithm needs it. For example, you'd want serialize(obj) and serialize(deserialize(serialize(obj))) to produce the same output, otherwise you e.g. might not be able to cache stuf. But for a data structure like a hashtable, it's pretty tough (not logically impossible, but rather pointlessly difficult) to make that happen without preserving insertion order.

As another example, it's incredibly handy for a user to see items in the order in which they were inserted. Like say you're parsing command-line arguments, and the command is ./foo --x=y --w. If the user sees {x: y, w: None} then that tells them w was passed after x. That can be extremely useful for debugging; e.g. maybe you expected the caller to specify w earlier, in a different context and for an entirely different reason. Seeing that it came afterward immediately tells you something is wrong. But when such information is lost it's harder to debug code.


If --w can be specified in two places for "an entirely different reason", then an unordered data structure is simply inappropriate, full stop. That's not a question of debugging, that's a question of correctness.


  console.log(object)

  {
    keys
    in
    headache
    less
    order
  }
I find this use case severely undervalued in this thread and have no idea why. It helps so much in logging and/or debugging.


I wrote a program that read in a proprietary data format, ran a few transformations, then outputted CSVs. Using an ordered map was super-useful, because it gave me deterministic output between runs. Otherwise I ended up with CSVs with differently ordered rows...


I think there is a lot of counters you might want both.

For example, supposed you have a list of ids accessing some system, you may want to count the raw number, but then also have some transformations thereof that are still ordered by count.

Example

  l = getListOfAccessIds()
  counts = Counter(l)
  total = np.sum(counts.values())
  fractions = {k:v/total for k,v in  counts.most_common()}

  #use the fact new dictionary is still ordered by most common
  plt.semilogy(fractions.values())
  plt.plot(np.cumsum(fractions.values()))

  #still works like dict
  print(fractions[id_of_interest])


Very few algorithms really care, rather, it's when you're composing them that maintaining insertion order avoids a lot of boilerplate in having to cart that information around.

For the same reason, though, it's potentially a bad idea. All these algorithms that generate dicts generally won't promise to maintain insertion order, they just happen to by chance. Then consumers come to depend on it and be surprised when it inevitably changes.


I use ordered maps to keep track of state snapshots/patches of redux/mobx-state-tree stores.

It allows me to both apply them sequentially as generated, or to "jump to" a particular point in time.


Say, a simple LRU cache:

* Inserting into the cache is just normal insertion.

* To delete excessive items, simply iterate to get the first few items' keys, then delete.

* To lookup, simply do a key-based lookup, then delete and re-insert.


Working with GraphQL.


Point of order: Ruby’s hashes do preserve insertion order. Python dicts are now the same.


Good for Ruby! Nice to hear.

I'm pretty sure they didn't in Ruby ~1.8 though. It's been awhile :-) Can't remember exactly which version, but it bit me more than once so I'm pretty sure :)


Yup, I think Ruby 1.9 or so made hashes ordered.


Awesome! Thanks for the heads-up both of you :-)


Ruby hashes are currently ordered and have been for quite a while now. Ruby 1.9 maybe?

But back in the day they indeed were not, for appropriate values of back in the day. :)

it did make a LOT of things more convenient and less buggy when they made ruby hashes ordered, I recall. I didn't expect it would matter much, by found myself loving it. I believe the ruby maintainers investigated and determined they could do it with very little performance hit. It turns out that having a repeatable and predictable order, that also matches insertion order, is what a lot of people end up assuming whether they realize it or not, just makes everything smoother when it is.


> There's so many cases where it's a benefit for map entries to retain order (and none where it's a problem)

Of course there’s a trade off. There has to be.

In this case it prioritizes usage patterns oriented around insertion and enumeration over usage patterns involving repeated removal Or modification of items.

as items are removed, the insertion-ordered index will either fragment, require removal operations to not be constant time, or be implemented using a data structure that has per-item storage overhead (such as a linked list).

also there are two possible, reasonable things for an ordered list to do when you write a new value to an existing key - leave the key in original insertion position, or move it to the end. Whichever one the implementation chooses, people who want the other will need to implement their own key ordering state alongside that which the collection is doing for them.


> and immediately messed it up by mixing ordered maps with arrays into a big soup, but hey, PHP

And here I thought Lua was the only language insane enough to do something like that.

Just realized Lua tables aren't ordered, but it still mixes arrays and maps / tables into a single data structure.


In JS's defense, it really tried as well. Fortunately though they failed harder at it, which is why arrays and objects are probably more different from one another than Eich had intended there in that mad 2 week language design frenzy. Good for us!

Back when JS got popular and the majority of web programmers knew PHP, there were dozens of tutorials out there trying to convince you not to do stuff like this:

    var a = new Array();
    a["moo"] = "five";
    a["foo"] = "six";
After all, well, it works! Like in PHP! So what's the problem? :-) Most people felt the same way about these tutorials as people feel about monad tutorials nowadays.

Good times!


Also don’t forget about JS Map which is ordered

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Refe...



That's not about JS Map objects, that's about regular objects which don't keep their properties in insertion order.


You missed the point.





Applications are open for YC Summer 2020

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact

Search: