Hacker News new | comments | show | ask | jobs | submit login

This is probably not a popular opinion, but I believe that PHP's associative array is one of the best-designed data structures in programming languages.

Its main distinguishing property, as mentioned in this article, is that values can be indexed by key, but are still iterated in the order they were set. This is "do what I want" in so many cases that it's just nuts.

Sure, just as often it's just needless overhead, but as a programmer who prefers to reason about domain and not performance, I often don't care about that. I hate that many other languages, including C#, Ruby and Python, force me to choose between either an unordered map or a list of (key, value) tuples. EDIT: clearly, i'm behind the times with that remark. thanks commenters :-)

I wish more languages had a native data type like this. It scares me that in practice JS objects have the same property, but officially the iteration order is not specified.

(that said, PHP's choice to mix regular arrays and associative arrays into a single type strikes me as a bit odd. i've also never seen a good use case of arrays with mixed string/int keys)




> I hate that many other languages, including C#, Ruby and Python, force me to choose between either an unordered map or a list of (key, value) tuples

Another commenter mentioned Python's OrderedDict, and Ruby's hashtables are ordered (from 1.9+): https://www.igvita.com/2009/02/04/ruby-19-internals-ordered-... It looks like C# also has an OrderedDictionary class: http://msdn.microsoft.com/en-us/library/system.collections.s...

I never had to use any of these classes even if I do use hashmaps all the time, so IMHO the Python way is best (default to a 'normal' unordered hashmap, offer an ordered alternative)


Thanks for the pointers!

I find myself needing this surprisingly often:

An array of elements in a certain order which I also want to lookup by Id.

Sometimes the order is defined by configuration, sometimes by some other criteria that is not accessible for this particular component, so I can't resort to a SortedHashMap.

Still, I need a 1) fast and 2) convenient way of lookup by some key. Convenience is often more important to me than performance in those cases, since the data size is not huge, but I hate it when I have to write array.find(e => e.Key == myKey) instead of orderedLookup[myKey].


I think there is one reason to default to something ordered, which is that iterating through unordered hashmaps can be non-deterministic from one run to the other. For instance, if you hash by id, the same objects may be assigned different ids from a run to the next, meaning that the map will be iterated in a different order. If there is a bug that's sensitive to iteration order, it will not be reproducible and that can be very annoying to debug.


If you've got bugs due to iteration order you have a whole other issue. If you know that iteration order is important, it should be explicit in the way you prepare/store/handle the data. If your code only works when you iterate in a specific order and you didn't deliberately define that order, it's only really working by accident.

Don't get me wrong, it's actually a bug I've run into in the past, and you're right, the non-deterministic issue makes it harder to debug. But when I discovered it I blamed myself for not being explicit with my constraints.


> values can be indexed by key, but are still iterated in the order they were set

Java's LinkedHashMap provides this as well ( http://docs.oracle.com/javase/7/docs/api/java/util/LinkedHas... )


And it doesn't needlessly restrict you to integer or string keys like this "best designed data structure."


Ruby 1.9.x and later (at least in MRI) also guarantee to iterate over hash elements in the order in which keys were inserted.

http://www.ruby-doc.org/core-1.9.3/Hash.html


Yes, and I think I can safely say I've never used this feature (in either Ruby or PHP) for anything more serious than a code golf challenge.

This might just be me, but I don't think an ordered hash is a particularly easy data structure to reason about. Almost all the code that I've seen depend on it in either language has been too clever by half.


I've used it when I want to parse query params from a URL into a hash, and then possibly modify them, and then write them back out to query params -- in the same order they came in, so the query params will be identical if I didn't end up modifying them, and identical but for what I modified otherwise, etc.

There are probably other analagous circumstances. Maybe even some I could think of that I've encountered.


Python has https://docs.python.org/2/library/collections.html#collectio...

I've never wanted this though. I just discovered OrderedDict when I was looking for something like std::map.


> I've never wanted this though.

When it's not the default normal thing, you don't build solutions around it, so you never see what you're missing :)

In PHP I make lots of tiny uses of it in many places. I really missed it when I switched to Python (sure, there's OrderedDict, but it's a second-class citizen: there's no syntax for literals and standard APIs don't explicitly take advantage of it).

* It's very useful for deduplicating things without losing order (especially when you have a bigger algorithm that collects data from multiple sources or a tree structure into one array).

* It's neat for sorting objects without having to mutate them to add a key or wrap them in a key/value object.

* It's great for configuration with JSON-like structures, but key order gives extra flexibility in the design, e.g. instead of [{id:"foo"},{id:"bar"}] you can use ["foo"=>[],"bar"=>[]].

It's not a major feature, but it makes things nicer. I've never had arrays accidentally randomized in PHP, but had bugs due to careless list->dict->list conversions in Python.


What I read is "if all you have is a hammer, everything looks like a nail."

> In PHP I make lots of tiny uses of it in many places. I really missed it when I switched to Python (sure, there's OrderedDict, but it's a second-class citizen: there's no syntax for literals and standard APIs don't explicitly take advantage of it).

Missing syntax sugar makes it a second-class citizien? How? Also what advantages could standard API (<- what does that even mean?) take?

> It's neat for sorting objects without having to mutate them to add a key or wrap them in a key/value object.

It's called set()

> It's great for configuration with JSON-like structures, but key order gives extra flexibility in the design, e.g. instead of [{id:"foo"},{id:"bar"}] you can use ["foo"=>[],"bar"=>[]].

...and the reason you cant use OrderedDict here is you dont like it.

About ordering stuff: after years I can still remember the problems I had with ordering in PHP. There's more than a dozen of sorting methods, which is mess. No one can remember if they all behave the same way and what's the order of its parameters. In Python you've sorted() and that's it. You cant do any advanced stuff with these array, because sometime they act as lists and sometime they act as hashmaps. I'll take this example from "Fractal...": $first = array("foo" => 123, "bar" => 456); $second = array("foo" => 456, "bar" => 123); array_diff($first, $second);


Cool! I didn't know that!


JS objects don't have the same property, whatever order you see from iterating JS objects is only a side effect of the standard hidden class optimization, not because there was intention of having a certain order. In fact if you mix integer keys (which are represented differently) you will not get the "expected" iteration order:

    var o = {
        key: 3,
        1: 4,
        value: 10,
        0: 2
    };
    Object.keys(o)
    ["0", "1", "key", "value"]
If iteration order was specified you couldn't do those optimizations, note how even PHP7 is still paying a lot for the iteration order: the 100k element array only takes roughly 25% of the memory in JavaScript. And what is it even for? How often do you even need to iterate integer keys in insertion order over value order?


> only a side effect of the standard hidden class optimization, not because there was intention of having a certain order

Do you have a reference for this? I clearly recall a Lars Bak interview in which he says that adding a property .x and then .y results in an object of different hidden class than adding .y and then .x exactly because people want to rely on iteration order.

(Might not apply to numeric keys, though.)


It results in different hidden class because the whole point is to be able to reference named fields by fixed offsets from the object location in memory (same as for example reading struct fields in C). If the order changes, so will the offsets too, so same names with different order must have different hidden classes.

Integer keys are not practical to treat as fixed because they are used as array indices 99% of the time which are dynamic. So they should be optimized differently, and they are. They are backed by a dynamically resized array (so it's implemented like Java's ArrayList). And it's not possible to track insertion order in this representation, so integer keys have different iteration order from named keys.

Note that doing hash-tabley things with the objects (like changing their property order constantly, deleting named keys and so on) will change the backing representation to ordered hash table. The ordered hash table emulates the same order that results from the nature of the above optimizations to make it less surprising for user when the representation changes. If it wasn't ordered, it would be very surprising when the iteration order suddenly changed from ascending integer keys and insertion ordered named keys to something completely random.


I don't think your explanation makes much sense. Hidden classes describe the layout of objects: if two insertion orders produced the same hidden class, it would mean they have the same layout, with the same fixed field offsets.

The only difficulty I can see with doing that is that is insertion order.


I am not sure what you are even saying.

My point is that if the most optimal way would result in some other iteration order, that would be the iteration order experienced by users and that it is just a coincidence that the most optimal way results in insertion order.


I'm saying that argument is wrong, because there is no efficiency loss due to different insertion orders mapping to the same hidden class. In fact there are efficiency advantages because more objects would pass through class-based guard checks into favourable paths. The only reason I can see to not do that is because it would lose iteration order.

Sketch out an example of hidden class transitions and it should be clear:

In the order preserved case (what JS engines actually do):

  start with {} of hidden class <empty>
  add .x, transition to hidden class <A> with field .x at 0
  add .y, transition to hidden class <B> with field .x at 0 and field .y at 1

  start with {} of hidden class <empty>
  add .y, transition to hidden class <C> with field .y at 0
  add .x, transition to hidden class <D> with field .y at 0 and field .x at 1
In the order discarded case:

  start with {} of hidden class <empty>
  add .x, transition to hidden class <A> with field .x at 0
  add .y, transition to hidden class <B> with field .x at 0 and field .y at 1

  start with {} of hidden class <empty>
  add .y, transition to hidden class <C> with field .y at 0
  add .x, transition to hidden class <B> with field .x at 0 and field .y at 1
In the second case the field .y changes offset, but that's fine because the hidden class also changes to indicate the difference in structure. The only problem is that the order of the fields changes.


By the way, JS runtimes used to have insertion order for enumerating properties long before the hidden class optimization or V8 was a thing. V8 was the first to break the insertion-order enumeration for some kinds of objects [1] and eventually other browsers followed suit, even those that don't do hidden classes.

[1] http://code.google.com/p/v8/issues/detail?id=164


Who doesn't do "hidden classes" (or maps, or inferred-classes, or whatever else we're calling them today)? All major JS engines certainly do.


My point was that hidden classes is not a reason to keep nor break insertion-order enumeration. The example I gave was V8 breaking insertion-order enumeration, but for a reason entirely unrelated to its use of hidden classes - it implements Arrays as Objects with consecutive numeric keys, so Objects with consecutive numeric keys get enumerated like Arrays.

So yes, while all major browsers use hidden classes today, that itself is not sufficient to explain why insertion-order enumeration is not maintained.


Yes, the requirement to preserve order predates V8 by years (I believe all the way back to SpiderMonkey, though maybe it was JScript?). Certainly it's long been the case the web de-facto relies on insertion order being preserved (except V8 found that it didn't really for numeric keys, where there was far more variety between implementations, and Carakan and Chakra followed by similarly dropping order for numeric keys).


I've found one use-case for mixed-keys: parsing complex headers. For example:

    Link: <http://cdn.example.com/stylesheet.css>; rel=stylesheet; type=text/css

    [
      [0] => <http://cdn.example.com/stylesheet.css>
      [rel] => stylesheet
      [type] => text/css
    ]
It doesn't come up very often.


This actually reminds me of what you get if you use named capture groups in PHP:

    php > $link = 'http://example.com/';
    php > preg_match('#http://(?P<domain>[^/]+)/#', $link, $matches);
    php > print_r($matches);
    Array
    (
        [0] => http://example.com/
        [domain] => example.com
        [1] => example.com
    )
There's some utility to it, but it can provide unexpected results if you blindly iterating through the match array (though I can't see any reason to do so if you know what offsets you want).


Maybe it's just me but I rarely need the ordering, seems a big waste to carry this unneeded overhead.


I finally found a use for mixed string/int keys.

I have a routine that plucks key/value pairs out of one associative array and builds another, and allows you to rename the resulting key names at the same time. Using mixed keys allows you to be more terse if you don't want to change the name:

$json = $data->valuesForKeyPaths(array( 'foo' => 'path.to.foo', 'bar', 'baz' => 'path.to.baz', ));


Good find. But you could easily do it with lists and tuples in many other languages, for example in Python:

json = data.valuesForKeyPaths([('foo', 'path.to.foo'), 'bar', ('baz', 'path.to.baz')])


Perl used to let you do this, btw, but they moved away because apparently some denial of service issues were found regarding hashmaps, and so there are security reasons not to. I think you can tell Perl still you don't want that security and it will behave in this way. THere are other ways to do this only on some hashes but they have something of a performance penalty.


IIRC a Perl hash has never been in the order of key add order, but rather that two hashes, with its keys added in the same order, returned their keys in the same order, even in separate executions. Now hash key order is always "random".

Tie::IxHash [1] is available on Cpan.

[1] https://metacpan.org/pod/Tie::IxHash


.Net has OrderedDictionary as well now.


Given no other information, I'm also in the "disagree with your unique opinion" camp.

What are the use cases that you would need an ordered dictionary?


I just used one to make a sorted map between names of colors and their RGB values. The order isn't necessary, but it is far more convenient than pulling out the keys and values as a list and then sorting them.


When I have a dictionary of dictionaries, I want them in a particular order.




Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | DMCA | Apply to YC | Contact

Search: