
Twitter says they fixed the Memcache calcification problem. Dormando disagrees. - diwup
https://github.com/twitter/twemcache/issues/2
======
zxypoo
Just to clarify things, since I helped write the initial blog post about
twemcache... in our original post, we had an error regarding the slab
calcification problem we mentioned: that problem applied ONLY to our fork of
memcached v1.4.4. After speaking with the upstream maintainers, we learned
that recent memcached versions have addressed some of these problems. These
are the kind of conversations we want to have.
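
For context, "slab calcification" refers to memcached's slab allocator: items
are binned into size classes, and memory pages, once assigned to a class,
historically stayed there. Here's a minimal sketch of the binning; the base
size and growth factor are assumptions for illustration, not any particular
version's defaults:

```python
def slab_classes(base=96, factor=1.25, max_size=1024 * 1024):
    """Chunk sizes grow geometrically, memcached-style; each class
    stores items in fixed-size chunks of its own size."""
    sizes = []
    size = base
    while size < max_size:
        sizes.append(size)
        size = int(size * factor)
    return sizes

def class_for(item_size, classes):
    """An item lands in the smallest class whose chunk can hold it."""
    for chunk in classes:
        if item_size <= chunk:
            return chunk
    raise ValueError("item too large for any slab class")

# A 100-byte item and a 500-byte item go to different classes. If pages
# were handed out while the workload was mostly 100-byte items, a later
# shift toward 500-byte items can't reuse those pages without some
# reassignment mechanism; that stuck memory is the "calcification".
```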

At the time we adopted memcached, that's the version we went with, and we
made sure it worked well in our production environment as we scaled as a
company. We also open sourced twemproxy
[<https://github.com/twitter/twemproxy>], a lightweight proxy for memcached
that has worked well for us in combination with twemcache and may work well
for others too.

We just want to reiterate that twemcache has worked well for our unique
environment, and any teams evaluating memcached should try all their
options, just like with any other piece of software you adopt in your stack.

One of the reasons for open sourcing our work was to share our ideas with the
memcached community, show what worked well for us, and help everyone. For
example, this is also how we treat our MySQL fork
[<https://github.com/twitter/mysql>], which we maintain in the open; we have
signed an OCA with Oracle to help get work pushed upstream so everyone
benefits in the long run.

------
alttab
I enjoyed the jab at twitter's "we are the only website on the planet to have
scaling issues" holier-than-thou attitude. Rails is still trying to get over
the character assassination by twitter when they failed to scale it. I know
firsthand that rails can scale very well. Do bad carpenters blame their
tools?

~~~
timaelliott
Twitter's attitude is laughable. Yes, you have a decent amount of traffic, but
you're also only dealing with ~160 uncompressed bytes (plus whatever overhead)
per event. The hurdles you've overcome aren't particularly amazing or
challenging.

~~~
InclinedPlane
What does content size matter? The challenge is that every single page of
content except for each individual tweet is utterly unique for every user.
That defeats the vast majority of straightforward caching implementations. You
can't cache fully rendered pages ever because the chance that one random
timeline view at a given time will be identical to any other view (even by the
same person at a different time) is pretty much as close to zero as possible.
Every view is dynamically generated content from up to several hundred or
thousand different streams of data and needs to be put in order and have all
of the per-user metadata set correctly.
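
The fan-in described above — each view assembled on read from many per-author
streams — can be sketched as a k-way merge. Everything here (the tuple shape,
the names) is illustrative, not Twitter's actual data model:

```python
import heapq
from operator import itemgetter

# Hypothetical per-author streams, each already sorted newest-first
# as (timestamp, author, text) tuples.
streams = [
    [(105, "alice", "t3"), (101, "alice", "t1")],
    [(104, "bob", "t2"), (100, "bob", "t0")],
    [(103, "carol", "t4")],
]

def timeline(followed_streams, limit=10):
    """Assemble one viewer's timeline on read by merging the sorted
    streams of everyone they follow, newest first."""
    merged = heapq.merge(*followed_streams, key=itemgetter(0), reverse=True)
    return [tweet for _, tweet in zip(range(limit), merged)]
```

Per-user metadata (retweet/favorite state, etc.) would still have to be joined
onto each entry afterward, which is part of why a fully rendered page can't be
shared across viewers.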

Once you start looking into the actual mathematical constraints of the problem
of twitter you realize that it's a scaling nightmare. Hundreds of millions of
updates per day and tens of thousands of views _per second_ (billions per
day). There are only a few people in the world who have the right to look
down on stats like that.
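
A back-of-envelope check of those rates (30,000 views/s is an assumed
stand-in for "tens of thousands"):

```python
views_per_second = 30_000       # assumed: "tens of thousands of views per second"
seconds_per_day = 24 * 60 * 60  # 86,400
views_per_day = views_per_second * seconds_per_day
# 30k/s works out to roughly 2.6 billion views per day, consistent
# with "billions per day".
```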

~~~
burningout
Again, as the parent poster said, I think you have never worked on large
data. Twitter is like a big mailbox, except that every mail is only 160
bytes. This was solved 10 years ago.

~~~
InclinedPlane
If you treat twitter like a big mailbox, things will work "ok". It's not the
worst approach ever, that's for certain. But end-user perceptible performance
would be a fraction of what twitter has today.

P.S. How many images does twitter serve up per day at present? That's a tad
more than 160 characters of data.

~~~
jasonwatkinspdx
Another key difference is that email users generally contribute directly to
their provider's infrastructure costs in providing email as a service. Email
infrastructure (and the user experience) is fragmented, and global funding
generally scales with global load.

Instead twitter must monetize via advertising of some form, and so the
percentage of folks who do not respond to ads acts as a really strong factor
in your cost calculations. In this sense, email software has it easy, and can
be extremely wasteful in the resources it consumes.

It's not just that the availability expectations of twitter are higher than
email, it's also that the economic base of the infrastructure is far more
sparse.

~~~
InclinedPlane
And yet another key difference is that there are few email server
installations that support half a billion users. Saying that the scaling
problem is "solved" because all you have to do is copy, say, gmail, is kind of
silly.

------
forgotusername
tl;dr your memcache branch fragments my consulting clients, you should use my
memcache branch instead

------
khangtoh
Apparently another Twitter engineer says it's doing 23M/s

"23 million queries per second with zero fucks given"

<https://twitter.com/timtrueman/status/222793786345013248>

------
krakensden
That response from manjuraj was kind of evil

~~~
floydprice
Has open source changed? I remember when people didn't fork projects, rebadge
them as their own, and then promote them over the original.

Now don't get me wrong, I think it's amazing that Twitter are opening up these
enhancements to the community, but it feels like a kick in the teeth to the
memcached folks to slap a twitter badge on it. Why isn't this a collaboration
that benefits the whole community? You know, like open source used to work.

I know at 34 I'm a dinosaur in this industry but I do try to keep up with the
new way of doing things... This just feels wrong to me.

~~~
zxypoo
I think open source has changed a bit with the rise of the GitHub era. IMHO,
@mikeal did a good post regarding this change, "Apache considered harmful"
[<http://www.mikealrogers.com/posts/apache-considered-harmful.html>].

Even in the old days, it was easier to fork than to work with upstream. These
days it's just easier to share those forks with services like GitHub. That
should help spread ideas and improved solutions, IMHO, so downstream
consumers actually benefit.

In Twitter's case, they are planning to do what works for them at the moment:
"While we initially focused on the challenging goal of making Memcached work
extremely well within the Twitter infrastructure, we look forward to sharing
our code and ideas with the Memcached community in the long term."

