Hacker News

Thank you for asking, but I probably shouldn't get into that :) I'm inherently conflicted / biased due to designing a solid chunk of Tumblr's backend, and also previously worked for another Automattic competitor (Six Apart, the defunct company behind Movable Type).

Edit to avoid anyone misconstruing, I'm not trying to imply one thing or another, just that I can't approach this impartially. And in any case, I wish everyone well on both sides of this acquisition. I'm just genuinely curious how they plan to proceed from a technical standpoint, as it's a really interesting challenge.

Just curious about your just curious; what is the architecture of the current Tumblr back end?

This appears to be 6 years old. Is it still relevant?


Ehhh, parts of that article were never accurate, especially the stuff about having an HBase-powered dashboard feed.

Primarily the product backend is monolithic PHP (a custom framework) + services in various languages + sharded MySQL + Memcached + Gearman. Lots of other technologies are in use too, but I'll defer to current employees if they want to answer.
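"Sharded MySQL" here generally means application-level routing: the app maps a key to one of many database servers before issuing the query. A minimal sketch of that idea in Python, with a fixed shard count and modulo routing; the shard count, DSN format, and function names are illustrative, not Tumblr's actual scheme:

```python
SHARD_COUNT = 16  # illustrative; real deployments pick a count and keep it stable

def shard_for(user_id: int) -> int:
    """Map a user ID to a shard number. Modulo on a stable key keeps
    all of one user's rows on the same MySQL server."""
    return user_id % SHARD_COUNT

def dsn_for(user_id: int) -> str:
    """Build a hypothetical DSN for the shard owning this user's data."""
    return f"mysql://app@db-shard-{shard_for(user_id):02d}/blog"
```

The important property is that routing is deterministic, so every part of the app agrees on where a given user's data lives.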

Fantasy big data: let's use Hadoop and Kafka!

Reality big data: let's shard it across MySQL.

Not exactly. Tumblr has a pretty huge Hadoop fleet and decently large Kafka setup too. It's just a question of OLTP vs OLAP use-cases being powered by different tech stacks.

My answer above was limited to the product backend, i.e. technologies used in serving user requests in real-time. And even then I missed a bunch of large technologies in use there, especially around search and algorithmic ranking.

That's kind of the point though. Everyone has a Hadoop/Kafka, but when it comes to actually getting things done, good ol' MySQL comes to the rescue.

I honestly don't see the draw for Kafka. And by all means I get it, I just don't buy it. Maybe I'm just holding it wrong or something.

It really depends on the task at hand. I'm one of the most vocally pro-MySQL commenters on HN, and have literally built my career around scaling and automating MySQL, but I still wouldn't recommend it for OLAP-heavy workloads. The query planner just isn't great at monstrous analytics queries, and ditto for the feature set (especially pre-8.0).

For high-volume OLTP though MySQL is an excellent choice.
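The OLTP/OLAP distinction above mostly comes down to query shape: indexed point lookups versus scan-heavy aggregations. A toy illustration using Python's sqlite3 as a stand-in for any relational database (the table and columns are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, likes INTEGER)")
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(i, i % 3, i * 10) for i in range(9)])

# OLTP-shaped: fetch one row by primary key; the index makes this cheap.
post = conn.execute("SELECT likes FROM posts WHERE id = ?", (4,)).fetchone()

# OLAP-shaped: aggregate over the whole table; no index avoids the full scan.
totals = conn.execute(
    "SELECT author_id, SUM(likes) FROM posts GROUP BY author_id ORDER BY author_id"
).fetchall()
```

MySQL excels at millions of the first kind per second; the second kind is where analytics-oriented planners and columnar storage pull ahead.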

Regarding Kafka: in many situations I agree. Personally I prefer Facebook's approach of just using the MySQL replication stream as the canonical sharded multi-region ordered event stream. But it depends a lot on the situation, i.e. a company's specific use-case, existing infrastructure and ecosystem in general.
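The Facebook-style approach described above treats each shard's replication log as an ordered, replayable event stream, so a consumer tracks a single position instead of acking individual messages. A minimal in-memory sketch of that contract (real systems read the MySQL binlog via a replication client; everything here is an illustrative stand-in):

```python
from dataclasses import dataclass, field

@dataclass
class ShardLog:
    """An append-only, totally ordered event log for one shard,
    mimicking what a MySQL binlog gives you for free."""
    events: list = field(default_factory=list)

    def append(self, event) -> int:
        self.events.append(event)
        return len(self.events) - 1  # position, analogous to a binlog offset

    def read_from(self, position: int):
        """Replay everything at or after `position`; a crashed consumer
        just resumes from its last durably saved position."""
        return list(enumerate(self.events[position:], start=position))

log = ShardLog()
log.append({"op": "insert", "row": 1})
log.append({"op": "update", "row": 1})
pos = log.append({"op": "insert", "row": 2})

# A consumer that durably saved position 1 replays the tail deterministically.
tail = log.read_from(1)
```

Because ordering and durability come from the database's own write path, there's no second system that can disagree with the source of truth.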

I don't think you're quite getting my point.

Kafka is not going to replace MySQL specifically because it depends on the task at hand.

If you can't replace MySQL with Kafka, then why not just stick with whatever queue/jobs/stream infra you had before Kafka? At least those solutions are quite limited in scope and easily replaceable.

At this point Kafka is a solution looking for a problem.

My feeling about Kafka is that it's a useful tool to solve the "we MUST get this data to reliable storage IMMEDIATELY" problem. And to greatly mitigate the "each item must be processed and shown to be processed, exactly once" problem.

But there are relatively few situations where that's absolutely vital. And you can solve it with good ol' SQL.
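The "processed exactly once" property the parent describes can indeed be built on plain SQL: record each item's ID in a table with a uniqueness constraint, inside the same transaction as the actual work, so redeliveries become no-ops. A sketch with sqlite3 standing in for MySQL (schema and names are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE processed (item_id TEXT PRIMARY KEY)")
conn.execute("CREATE TABLE results (item_id TEXT, value INTEGER)")

def process_once(item_id: str, value: int) -> bool:
    """Apply the side effect at most once per item_id. On a redelivery,
    the PRIMARY KEY violation rolls back the whole transaction."""
    try:
        with conn:  # one transaction: dedup marker + side effect commit together
            conn.execute("INSERT INTO processed VALUES (?)", (item_id,))
            conn.execute("INSERT INTO results VALUES (?, ?)", (item_id, value))
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate delivery: already processed, skip

process_once("msg-1", 42)
process_once("msg-1", 42)  # redelivery is a no-op
count = conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]
```

The key design point is that the dedup marker and the side effect share one transaction, so a crash between them can never leave the item half-processed.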

You’re being generous. Most of that article was a pipe dream that never came to fruition.

That's a really underrated point: a lot of "scale" blog posts get cited as fact. I'll have to reconsider a lot of those in hindsight.

Cool, thanks for the answer.

Applications are open for YC Winter 2020
