

Practical NoSQL - Solving a Real Problem with MongoDB and Redis - latch
http://openmymind.net/2011/5/8/Practical-NoSQL-Solving-a-Real-Problem-w-Mongo-Red

======
pkteison
I really like seeing people's thought process as they decide to switch
something major, but if I were doing this same task, I think I would have
tried more on speeding up the sql solution before scrapping the db.

There are several ways you can calculate a rank, and some (e.g. self join) are
inherently slow. Maybe the problem is as simple as not having the right
clustered index? Or maybe sql dbs can do more optimization if you tell them to
use their rank function instead of selecting out a rownum? Postgres 8.4+, sql
server 2005+, and Oracle all have rank functions (one of the 'window
functions' in the SQL:2003 standard
[http://en.wikipedia.org/wiki/Window_function_(SQL)#Window_fu...](http://en.wikipedia.org/wiki/Window_function_\(SQL\)#Window_function)
)

~~~
teh
The critical functionality in this application seems to be online calculation,
i.e. he wants to know a player's exact rank immediately after insert. A sorted
set is good for that.

In general I agree though. If you relax the consistency requirement slightly
you can get away with regenerating the rank table every minute or so. It takes
~500ms on my machine to generate the ranked table for 500k rows on PG9.
Unlogged tables in 9.1 would probably make it faster.

~~~
latch
as sbov mentioned in another thread, the pro-RDBMS/SQL argument is that _if i
only reduce my expectation (admittedly slightly), SQL is a great choice_
doesn't seem too convincing. It's true that approach would be slightly less
complicated (only 1 store, but now you have transformation jobs).

------
ameasure
I'm all for expanding our toolsets, but this seems like the wrong tool for the
job. Relational databases are excellent at this sort of thing.

~~~
teh
I am intrigued. I've been thinking about this problem today, but I can't think
of an efficient relational implementation in current DBs. How would you do
this? I came up with the following:

1) On get_rank() do a count(*) where score < users_score;

2) Store row_number() over() [...] order by score in a lookup table and
refresh every so often. Use this for get_rank() lookups.

The problem I have is that you either have to calculate the rank on read using
a linear-time count() or on write which means updating n/2 rows on average
with their new position.

edited formatting

~~~
ameasure
Recalculating ranks for everyone every time the score changes seems like a
difficult task no matter how you approach it. I don't think you really need
that for most use cases though. Some reasonable simplifications I've seen are
don't rank anyone below a certain cutoff score, only calculate rankings
periodically, or only calculate rankings for a subset of the data.

~~~
sbov
But if there is a straightforward solution that allows you to not make these
concessions and it can do so with great performance, I'm not sure how you can
continue to claim that relational databases are the right tool for this job.
Sure, it can do it, but it seems like something else does it better.

~~~
ameasure
You're right, I don't have much of an argument for relational databases here.

I am concerned about the complexity of this approach though and I don't think
the concessions we're doing away with are that terrible. How meaningful is it
to be rank 5623 vs. rank 5249, or even rank 2519? I don't think people can
make much sense of ranks beyond 100, but maybe I'm wrong. I would rather break
rankings into smaller groups. Which is more interesting, being ranked 23,528
worldwide, or being the town champion? I think the latter is much more
meaningful to a person.

~~~
latch
limiting ranks (to 5000 for most games and 1000 for a few particular ones) is
the concession we ended up doing in v1. When we introduced the change, no one
complained (or has since) - our developers have been great.

As for complexity. The complexity of learning is a noop as far as I'm
concerned. Duplicate data? Not concerned. Managing 2 stores...slightly
concerning. Agreed. Maintenance? The entire change is largely encapsulated in
a new ~30 line Rank class. Sure, I'd rather not have that extra code.

I hadn't considered rank-groups (thanks for the idea though!). I'll definitely
give that more thought. Nonetheless, I think I rather have the raw data
available and accessible (in addition to any additional abstraction). I don't
like making decision for the developers. Maybe 100 ranks isn't enough for them
(I've seen them do some pretty creative stuff with the bit of data we're
collecting already).

------
__david__
Interesting. For <http://greenfelt.net> we have a single PostgreSQL DB and
store all our games in one big table. For the score based games (like tetris)
we do everything with Postgres's great expression indices to segregate things
into daily, weekly, and monthly high scores ("DATE_TRUNC('YEAR',date at time
zone 'UTC')"). We use Postgres's windowing functions to get the ranking
numbers.

We don't have very high traffic though--we manage roughly 3000 games/hour at
our peak times with our one DB server.

------
ericHosick
Maybe a funny question but why have the word device included as part of the
unique key? Wouldn't 1-leto, 1-paul, 3-duke and 2-jess been ok entries for the
unique key?

~~~
latch
we don't use the actual word "device"..it's just a placeholder for the sample.
Our actual unique key is a hash of the real device id (which is a UUID) + the
name. This means 2 users with the same name on different devices have, or 2
users with different names on the same device, are treated differently.

The approach has worked well.

