Hacker News new | past | comments | ask | show | jobs | submit login
Solving Problems Using Data Structures (variadic.me)
20 points by variadic on Mar 6, 2013 | hide | past | web | favorite | 14 comments

There is actually an LRU map implemented in Java, though it is sort of hidden in the LinkedHashMap using a special constructor.


One issue with that is that it doesn't expel older entries. The capacity specified in the constructor is just the initial capacity.

from collections import deque

class LRUDict(dict):

    """ LRU queue with dictionary lookups based on the Python Cookbook RingBuffer recipe.
     A deque-based ringbuffer expires entries past size_max.
    Doesn't allow deletion for obvious reasons.

    def __init__(self, size_max):
        self.queue = deque()
        self.size_max = size_max-1

    def append(self, value):
        if len(self) >= self.size_max:
            self.append = self._full_append

    def _full_append(self, value):
        return super(LRUDict, self).__delitem__(self.queue.popleft())

    def __repr__(self):
        return "%s(%s)" % (self.__class__.__name__, dict.__repr__(self))

    def __setitem__(self, key, value):
        if key not in self:
        return super(LRUDict, self).__setitem__(key, value)

    def __delitem__(self, key):

Wouldn't this be better implemented using a standard priority queue where the score is related to the timestamp of insertion?

My thought exactly. I don't get his example. But maybe it's a speed issue. Adding items to a Dictionary might be faster than updating a timestamp on an object.

That doesn't make sense. The fundamental operations on a queue is push and pop. If you pop from the queue, you remove it from the queue. So on page reload, the 5 latest viewed entries would be gone. You popped them.

A queue is fundamentally wrong for this type of use case.

I think they are implying you only pop() when you are doing the replacement (Trim). The rest of the time you Sink and Swim depending on how the LRU was done. IE: every time you get() an item you set the access time to now() and then call sink/swim to fix the ordering. This would give logn time to get and set.

See here for a very good writeup of priority queues and variants. http://algs4.cs.princeton.edu/24pq/

(not hugely familiar with Java so this stuff might be wrong) The implementation presented has some bad timing issues (List.Contains is a linear search). I think List.Remove is also a linear search. As well this implementation doesn't seem to do Point (1).

The OP is talking about a list with at most five items in it. Linear search is not an issue. Using a dict may even be slower than just searching a 5 element array linearly...

I wonder if this is just a naming thing... the terms "min-heap" (or max-heap) and "priority queue", while not identical, are often used interchangeably. So if you replace priority queue in the comments above with "min-heap", it makes perfect sense, and would be the right data structure.

From my point of view, the min-heap is appropriate to pop/peek the min value (O(1)), but how do you 'peek' the first 5 values for instance?

This is a valid min-heap: (5 ((7 (9, 10)), (8 (11, 12)))). Where in (a (b, c)) b and c are children of a. If you want to peek(5) you have to:

1. peek 5

2. peek the min of 7 and 8 (7)

3. now you have to peek the min of 9, 10 and 8.

4. it's 8, now you have to peek the min of 9,10,11 and 12

The thing is, the only guarantee you have is that a root node of a tree or subtree is smaller than its children. You know nothing of a sibling subtree.

I think maintaining a simple list is the simplest way of doing that, and to speed up lookups for existing items, an index (map item->index) can be maintained. Or the other way around, keep the objects in a map and the keys in the order list. Considering a size n=5 it might even be reasonable to do everything with a simple array and iterate over it.

A queue implies the items leave the data structure, that doesn't fit the use case.

Can't items leave when getting kicked out by a new item?

Why not use a Set and a doubly linked list?

I'd use a sorted set.

Guidelines | FAQ | Support | API | Security | Lists | Bookmarklet | Legal | Apply to YC | Contact