

Ask: How to use memcached to handle a large many-to-many relationship? - angus-t

Ask: How to use memcached to handle a large many-to-many relationship?

We are using MySQL and memcached. We are going to implement a "user subscription" feature, which lets our users subscribe to different categories of news.

There are two main data queries:

1) Get the list of users subscribed to / unsubscribed from a given category (mainly used in a cron job).
2) Get all categories a given user is subscribed to.

The problem is query 1): the data can get very large for memcached, say 10 million users x 500 categories. If we simply use the category id as the cache key, the value may contain millions of items, and fetching such a large value from the memcached servers causes a performance problem.

We came up with a scheme (sketched in code below):

- group every X users into pages, e.g. categoryA_page1=[user1, user2, ...] ; categoryA_page2=[...]
- maintain an index in memcached, e.g. user1_categoryA=page1 ; user1000_categoryA=page2

Please suggest if there is a better solution.
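
A minimal sketch of that paging scheme in Python, assuming the pymemcache client; the key naming, PAGE_SIZE, and JSON encoding are illustrative choices, and the code ignores concurrency (a real version would need CAS or locking):

    import json
    from pymemcache.client.base import Client

    mc = Client(("127.0.0.1", 11211))
    PAGE_SIZE = 1000  # "every X users" -- tune to memcached's value-size limits

    def subscribe(user_id, category):
        # Number of the last page for this category (1 if none exists yet).
        last_page = int(mc.get("%s_pages" % category) or 1)
        page_key = "%s_page%d" % (category, last_page)
        page = json.loads(mc.get(page_key) or b"[]")
        if len(page) >= PAGE_SIZE:          # current page is full, start a new one
            last_page += 1
            page_key = "%s_page%d" % (category, last_page)
            page = []
        page.append(user_id)
        mc.set(page_key, json.dumps(page).encode())
        mc.set("%s_pages" % category, str(last_page).encode())
        # Per-user index, so an unsubscribe can jump straight to the right page.
        mc.set("%s_%s" % (user_id, category), str(last_page).encode())

    def users_in_category(category):
        last_page = int(mc.get("%s_pages" % category) or 0)
        users = []
        for n in range(1, last_page + 1):
            users.extend(json.loads(mc.get("%s_page%d" % (category, n)) or b"[]"))
        return users

A real implementation would probably batch the page reads (e.g. with get_many) and keep MySQL as the source of truth so pages can be rebuilt on a cache miss.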
======
maxpert
If I were you, I would take inspiration from the inode structure found in the
ext filesystems. Here is what I would do:

Category -> array of tuples (imagine a nested array), where each tuple
contains (key of a group, number of elements in the group).

You can use something like JSON (I wouldn't recommend it) or MsgPack (my guess
is this will work best for you) to serialise the array. The nifty part is
deciding how many elements a group holds: you can use a fixed size, or work
out the average number of members per group, combine it with the standard
deviation, and make an intelligent decision. Adding a user means looking at
the last group in the category; if it has reached the cap, create a new group
and append it to the category container. If value size becomes a problem, do
yourself a favour and compress the values (I have two blog posts to convince
you:
http://blog.creapptives.com/post/27331500697/msgpack-vs-json-in-the-jaws-of-compression
and
http://blog.creapptives.com/post/25026698783/speeding-up-your-datastore-with-compression);
use something fast like libsnappy or lz4 to compress your values. Hope it
helps!
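
A rough sketch of that inode-style layout in Python; the key scheme, GROUP_CAP, helper names, and the choice of the pymemcache, msgpack and lz4 packages are assumptions on my part rather than anything specified above, and concurrency is again ignored for brevity:

    import msgpack            # assumes msgpack >= 1.0, so strings round-trip as str
    import lz4.frame
    from pymemcache.client.base import Client

    mc = Client(("127.0.0.1", 11211))
    GROUP_CAP = 1000  # fixed cap; could instead be sized from mean/stddev of group sizes

    def _load(key, default):
        raw = mc.get(key)
        return msgpack.unpackb(lz4.frame.decompress(raw)) if raw else default

    def _store(key, value):
        mc.set(key, lz4.frame.compress(msgpack.packb(value)))

    def add_subscriber(category, user_id):
        index_key = "cat:%s" % category
        # The category value is the "inode": a list of [group_key, count] pairs.
        index = _load(index_key, [])
        if not index or index[-1][1] >= GROUP_CAP:
            # No group yet, or the last group hit the cap: open a new one.
            index.append(["cat:%s:grp:%d" % (category, len(index)), 0])
        group_key, count = index[-1]
        members = _load(group_key, [])
        members.append(user_id)
        _store(group_key, members)
        index[-1] = [group_key, count + 1]
        _store(index_key, index)

    def subscribers(category):
        users = []
        for group_key, _count in _load("cat:%s" % category, []):
            users.extend(_load(group_key, []))
        return users

Snappy would slot in the same way (snappy.compress / snappy.decompress); the point is that the small index stays cheap to read while the big member lists are split across bounded, compressed values.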

------
seunosewa
In your testing, was MySQL too slow after proper indexing? I don't think you
need to cache that sort of query. Personalized queries are usually not the
best candidates for caching.
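
For context, a minimal sketch of what "proper indexing" could look like for this many-to-many relation; the schema, names, and the mysql-connector-python client are my own assumptions, not something from the thread:

    # The primary key answers "categories for a user"; the secondary index
    # answers "users for a category", so both queries become index range scans.
    import mysql.connector  # assumes the mysql-connector-python package

    conn = mysql.connector.connect(host="127.0.0.1", user="app",
                                   password="secret", database="news")
    cur = conn.cursor()
    cur.execute("""
        CREATE TABLE IF NOT EXISTS subscription (
            user_id     BIGINT UNSIGNED NOT NULL,
            category_id INT UNSIGNED    NOT NULL,
            PRIMARY KEY (user_id, category_id),
            KEY idx_category_user (category_id, user_id)
        )
    """)

    # Query 1: users subscribed to one category (the cron-job case).
    cur.execute("SELECT user_id FROM subscription WHERE category_id = %s", (42,))
    category_users = [row[0] for row in cur.fetchall()]

    # Query 2: categories a given user is subscribed to.
    cur.execute("SELECT category_id FROM subscription WHERE user_id = %s", (12345,))
    user_categories = [row[0] for row in cur.fetchall()]

With a covering composite index for each column order, the per-category scan may well be fast enough straight from MySQL, as the comment suggests.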

