

Maps are Broken, for Some Definition of Broken - andreyf
http://sigusr2.net/2010/Jun/22/maps-are-broken-for-some-definition-of-broken.html

======
Strilanc
Interesting fact: you can choose a random element from a sequence of unknown
size in a single pass. The cost is having to always iterate over the entire
sequence, and generating O(n) random numbers instead of O(1).

The algorithm is quite simple:

    
    
        var choice = elements.First()
        var n = 1
        for e in elements.Skip(1):
           n += 1
           if rand(n) == 0:
              choice = e
    

The algorithm is correct because, at each stage, each of the preceding
elements is chosen with probability 1/n. The invariant is maintained because
the next element is chosen with probability 1/(n+1) and therefore scales the
preceding probabilities by n/(n+1) giving 1/(n+1).

------
sprout
HashMap is implemented as a dynamic array, and you could probably get "good
enough" random performance from accessing the underlying array, picking a
random element, then iterating over that bucket and picking a random element.
In the average case this will be pretty much constant time.

I'd be a little worried about the randomness of the distribution, but if you
need a proper random distribution, you shouldn't be using a map.

------
billnapier
HashMap (unlike treeset) itself doesn't provide an order guarantee. So can't
you just grab the first element from the entrySet and use that?

But I think the bigger problem is trying to use a Map in the first place when
maybe a different data structure will solve the problem better.

~~~
apgwoz
I of course didn't talk about why I'm using a map, but as an example: consider
redis. It's a giant hash table, and it allows you to pick a random key.

------
gojomo
In his hypothetical "using imaginary iterator() on AbstractMap" O(n) solution,
he's overlooking that the Set returned by keySet() has an iterator, and
doesn't involve any reallocation/copying. So randomKey() can be:

    
    
      int index = random.nextInt(map.size());
      for(String key : map.keySet()) {
        if(index==0) return key;
        index--;
      }
    

Abstractly, maps aren't the right structure for quick/constant-time random-
choice. He could possibly choose an internal implementation that was better at
offering random-choice, but that's unlikely to be the optimal implementation
for other map uses. He could maintain a parallel structure during inserts that
allows constant-draws later. Or do one array shuffle after all inserts, before
all draws (if that fits the usage pattern).

 _(edit: code corrected per replies below)_

~~~
apgwoz
I did overlook that, so yes, this should work properly. Thanks!

One thing to note is that Random.next() is protected, so it's gotta be
random.nextInt(map.size()).

~~~
gojomo
Sorry to have referred to you in third-person; wasn't sure you were reading
this thread. Thanks for the correction; I've changed my code fragment.

------
nervechannel
Maps have an entrySet() method that returns a set of key-value Entry objects
you can iterate over. Is this what you were after?

~~~
apgwoz
keySet() is more appropriate. I just want the key.

------
lutorm
If you want random elements, why don't just store it in a simple array?

------
apgwoz
Thanks everyone for the responses. I'm a Java n00b.

------
InclinedPlane
If you want random access to a map you should write your own implementation.
This is a bit like complaining that random access to a linked list is too
slow.

Choose appropriate data structures for the task at hand, no one data structure
will fit every use.

