Hacker News new | past | comments | ask | show | jobs | submit login

I mean, its a hashmap we're talking about. I'm not sure if iterators even make sense in any regard. Things inside of a hashmap have no order at all.

And determining if a cell is full or empty in a hash-map isn't exactly efficient either (You pretty much have to visit those locations and see if a tombstone or empty symbol is there).

-------

I recognize that people want to foreach() over a HashMap, but I'm just not convinced its an efficient operation, nor a good fit for the iterator interface (especially with all this discussion of invalidations and movement...)




> I'm not sure if iterators even make sense in any regard. Things inside of a hashmap have no order at all.

Iteration doesn't necessarily imply a specified order.

> but I'm just not convinced its an efficient operation

Why does iteration have to be efficient?

> nor a good fit for the iterator interface

What would you use instead?

Iterator invalidation only comes into play when you erase elements. If you just iterate over a hashtable and, say, print the elements, you wouldn't have to worry about it at all.


In both Robin Hood and Cuckoo hashing, insertions will move elements around. That means the iterator guarantees (or: that you'd visit every element) could be broken).

So a foreach(cuckoo_table) operation may not in fact visit all elements if the table has an insert somewhere in the loop.


Well, inserting elements while iterating is never a good idea :-) This will cause problems even with std::vector.

BTW, inserting elements can invalidate iterators with any hashtable if it might cause a rehash.


That's what I'm talking about though.

    foreach(datastructure){
        datastructure.what-functions-are-legal() ??
    }
Vector: Inserts are allowed as long as element fits in vector.capacity(). Erase (which "shifts" the elements down) are allowed at the end of the vector (specifically: all locations "after" the erase are invalidated. So pop_back() is always legal)

List: All operations allowed, except for reading from an erased element.

map: All operations allowed, except for reading from an erased element.

unordered_map: ??? Under debate right now.

-------

So even if you had assurances that unordered_map.insert would work (ex: Robinhood hash, you check size() and compare to capacity(), much like a vector), the movement of data means that your iterator is basically invalidated on any insert.

So unordered_map(Robin-hood): Nothing. No erases, no insertions allowed during the foreach loop at all.

-------

This comes up in practice. The minute you have 2 threads, and a 2nd thread _might_ insert while the 1st thread _might_ be in a foreach loop, you suddenly have to have assurances for what is or isn't legal from an iterator-invalidation perspective.

I guess you could just have the entire for-each loop inside of a lock, but that seems a bit inefficient?


> So unordered_map(Robin-hood): Nothing. No erases, no insertions allowed during the foreach loop at all.

Yes, but this applies to any hashtable that might rehash on inseration (= most). In practice this is not a big problem. For example, you can just put the new elements / to be deleted keys on a std::vector and only insert/erase after the for-loop.

I don't see how this makes iteration itself problematic. You can still iterate over a std::unordered_map perfectly fine. Just don't erase/insert elements while iterating.

> I guess you could just have the entire for-each loop inside of a lock, but that seems a bit inefficient?

Yes, you would have to lock the whole for-loop. It's the same with any other data structures where the iterators can get invalidated on inseration/deletion, e.g. std::vector.

Even for containers where iterators are not invalidated, you have to be careful with fain-grained locks. For example, you can only increment the iterator while holding the lock!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: