Production Twitter on one machine? 100Gbps NICs and NVMe are fast

BeefWellington · on Jan 7, 2023

I'm going to preface this criticism by saying that I think exercises like this are fun in an architectural/prototyping code-golf kinda way.

However, I think the author critically under-guesses the sizes of things (even just for storage) by a reasonably substantial amount. e.g.: Quote tweets do not go against the size limit of the tweet field at Twitter. Likely they are embedding a tweet reference in some manner or other in place of the text of the quoted tweet itself but regardless a tweet takes up more than 280 unicode characters.

Also, nowhere in the article are hashtags mentioned. For a system like this to work you need some indexing of hashtags so you aren't doing a full scan of the entire tweet text of every tweet anytime someone decides to search for #YOLO. The system as proposed is missing a highly critical feature of the platform it purports to emulate. I have no insider knowledge but I suspect that index is maybe the second largest thing on disk on the entire platform, apart from the tweets themselves.

trishume · on Jan 7, 2023

Quote tweets I'd do as a reference and they'd basically have the cost of loading 2 tweets instead of one, so increasing the delivery rate by the fraction of tweets that are quote tweets.

Hashtags are a search feature and basically need the same posting lists as for search, but if you only support hashtags the posting lists are smaller. I already have an estimate saying probably search wouldn't fit. But I think hashtag-only search might fit, mainly because my impression is people doing hashtag searches are a small fraction of traffic nowadays so the main cost is disk, not sure though.

I did run the post by 5 ex-Twitter engineers and none of them said any of my estimates were super wrong, mainly just brought up additional features and things I didn't discuss (which I edited into the post before publishing). Still possible that they just didn't divulge or didn't know some number they knew that I estimated very wrong.

BeefWellington · on Jan 8, 2023

I think the difficult part would be that tagging and indexing the relationship between a single tweet and all of its component hashtags (which you would then likely want metrics on to avoid needing to count indexes on, etc.) is where it would really start to inflate.

Another poster dug into some implementation details that I'm not going to go into. I think you could shoehorn it into an extremely large server alongside the rest of your project but then you're looking at processing overhead and capacity management around the indexes themselves starting to become a more substantial part of processing power. Consider that for each tweet you need to break out what hashtags are in it, create records, update indexes, and many times there's several hashtags in a given tweet.

When I last ran analytics on the firehose data (ca. 2015/16) I saw something like 20% of all tweets had 3 or more hashtags. I only remember this fact because I built a demo around doing that kind of analytics. That may have changed over time obviously, however without that kind of information we don't have a good guesstimate even of what storage and index management there looks like. I'd be curious if the former Twitter engineers you polled worked on the data storage side of things. Coming at it from the other end of things, I've met more than a few application engineers who genuinely have no clue how much work a DBA (or equivalent) does to get things stored and indexed well and responsively.

panarky · on Jan 8, 2023

Twitter has full-text search, not just hashtags.

Also, the big data storage isn't text, it's images and videos.

sulam · on Jan 8, 2023

You’re missing metadata in your size estimates.

mr90210 · on Jan 7, 2023

[flagged]

Aeolun · on Jan 7, 2023

I’m not sure why you are asking him to do the thing you’ve literally quoted him doing?

literallyroy · on Jan 7, 2023

I believed he was saying the author should run his idea of indexes on disk taking up a lot of space by the engineers.

lelandfe · on Jan 7, 2023

If an inspector reviews your house and finds no issues, that is indeed evidence of absence.

ninth_ant · on Jan 8, 2023

This is critically wrong, and misses the point of the cliché entirely.

Absence of evidence, in your case via a clean building inspection, does not mean the building is safe. It just means the checklist of known items was considered and nothing bad found.

Ask a building inspector if their clean report proves nothing is wrong with the building.

They will be firm and quick to inform you that it’s not a warranty — anything not checked was not covered. Items not covered could still be significant problems.

That’s the whole point of the saying. Absence of evidence is not evidence of absence.

lelandfe · on Jan 8, 2023

I believe you have conflated "proving a negative" with "evidence of absence".

saagarjha · on Jan 8, 2023

Sure, but it’s not definitive evidence.

quickthrower2 · on Jan 7, 2023

But evidence is not necessarily proof.

lelandfe · on Jan 7, 2023

Sure, but if someone accuses your house of having issues, and you retort that you've had it inspected by professionals, a reply of "Hah! That's evidence, not proof!" is just a bit smarmy.

pdpi · on Jan 7, 2023

A few weeks ago there was in incident[0] in Jersey, where some people called fire fighters one evening because they could smell gas, the fire fighters didn’t find any leaks, and the building literally blew up the next morning. Experts make mistakes, and failing to understand that evidence != proof can literally kill people. Sometimes, making the distinction is smarmy; other times, it’s just being sensible.

0. https://news.sky.com/story/amp/jersey-tower-explosion-questi...

yjftsjthsd-h · on Jan 8, 2023

Okay, but... we're spit balling database sizes. None of this is safety critical, or even in the general neighborhood of things where it's important enough to go and mathematically prove that our numbers are perfect.

icedchai · on Jan 7, 2023

Not necessarily. The inspector could be corrupt, bad at their job, whatever.

867-5309 · on Jan 7, 2023

>Absence of evidence is not evidence of absence.

>possible that they just didn't divulge

busymom0 · on Jan 8, 2023

I don’t think that hashtags are a search only feature. In the posts themselves, the hashtags are clickable to view other tweets. I don’t think that qualifies as a search.

mattbillenstein · on Jan 8, 2023

It does strike me as a feature you'd typically serve out of some sort of search index since if you had to build search, you'd essentially get indexing of hashtags "for free"

busymom0 · on Jan 8, 2023

You are probably right and I am wrong. I just looked at a tweet and clicking the hashtag takes to the search page with that hashtag typed in. Probably implemented similarly behind the scenes. Though hashtag most likely does an exact match search instead of fuzzy searching for regular words and phrases.

herval · on Jan 8, 2023

it does case matching (#hashTag === #hashtag === #HashTag) too

vidarh · on Jan 8, 2023

Biggest problem with this is the lack of considering analytics.twitter.com and ads.twitter.com. Twitter stores event data about everything that happens to a tweet, and lets you target ads with a lot of precision.

While some of those writes may well be acceptable to lose, letting you write to caches, effectively you need to assume there are more analytics events triggering writes to something than there are tweet views.

wtallis · on Jan 8, 2023

A Twitter-like service that fits on a single server could probably get by with the reduced revenue that comes with not offering obsessively fine-grained analytics and ad targeting.

poslathian · on Jan 8, 2023

Best comment. And one you don’t hear nearly enough from typical product and eng managers.

tinyspacewizard · on Jan 8, 2023

Ehhh a large company would much prefer:

    $$$$$$ income on $$$ costs

to

    $$$ income on $ costs

gonzo · on Jan 8, 2023

Perhaps. But VCs prefer the first.

icedchai · on Jan 8, 2023

True. You'd also save a ton of less operations and engineering staff.

Running anything on a single server, however, is really a non starter for anything remotely serious. What do you do if you need to do an OS update? I suppose you could just never do those, like a former employer (1000+ day uptimes...)

kijin · on Jan 8, 2023

Compare the cost of operating multiple servers, on one hand, with the lost revenue from having weekly or monthly maintenance windows during which you just put up a Fail Whale page. Most people overestimate the latter by a huge margin.

IshKebab · on Jan 8, 2023

That's fine if your service is really local - you can do it at night. Not really an option for a global site. Imagine if Twitter went down for a few hours every month. People are addicted to Twitter. It might be at a critical time for an entire country (e.g. the Queen dies). Even worse you can't guarantee how long the upgrade will take.

You'd definitely need at least two servers. But I think you could surely just have simple master/slave replication and switch between them.

kijin · on Jan 8, 2023

Yeah, I was just responding to the "how about OS updates" part of the parent comment, for which scheduled downtime is a reasonable option. To protect your service from unscheduled downtime though, like a failing RAID array, you would need at least two servers.

Personally I wouldn't run a critical service on only one server, but two servers? Definitely doable. I actually have a service running on two servers in different DCs 700 miles apart. Zero downtime in 9 years. :)

femiagbabiaka · on Jan 8, 2023

Twitter actually used to have downtime during certain updates IIRC. The state department asked them to skip one such maintenance in service of some Iranian protests in 2009. I highly doubt that the era of US ops in twitter has ended, so downtime is probably a nonstarter.

Source: https://foreignpolicy.com/2009/06/16/state-department-interv...

Sebb767 · on Jan 8, 2023

> You'd also save a ton of less operations and engineering staff.

You absolutely would not. The cost of having developers put out extremely optimized code (due to the scaling limits) and cuddle that single server to never, ever fail easily eclipses the cost of a having a multiple servers by a few orders of magnitude.

EDIT: To the downvoters, I'd really love to see the calculation on how engineering time would be cheaper than buying a second server in any reasonable timeframe.

icedchai · on Jan 8, 2023

I was comparing the costs to building/operating on a single server to the real Twitter. Twitter is a massive, distributed, hugely complex system, which requires a large team to build, maintain, and operate. That costs $$$. Both for people and servers.

(You are right, servers are cheap compared to employee costs though.)

glass3 · on Jan 8, 2023

Twitter is not serious enough that there cannot be a timeout.

With a second OS partition, the server can alternate between the working copy and a copy that is updated in a VM. For a free service, customers cannot complain even if there is a reboot every day and the service is down for a couple of minutes.

Realistically, there would be a mirroring server to be prepared for hardware failures. One server can be restarted while the other is the main server.

Sebb767 · on Jan 8, 2023

> With a second OS partition, the server can alternate between the working copy and a copy that is updated in a VM. For a free service, customers cannot complain even if there is a reboot every day and the service is down for a couple of minutes.

Are you seriously suggesting that a service (the size of Twitter, no less) has an acceptable downtime of a few minutes a day?

> Realistically, there would be a mirroring server to be prepared for hardware failures. One server can be restarted while the other is the main server.

But for that mirroring, you need to replicate disk writes, databases, backups etc.. This additional load would easily bring the server to a point where a single server would no longer suffice, even an insanely spec'ed one.

vidarh · on Jan 8, 2023

> has an acceptable downtime of a few minutes a day?

We kinda know the answer to this: Twitter was struggling with harm to its reputation for a long time because of regular Fail Whales. It absolutely was a huge problem for them at the time.

glass3 · on Jan 8, 2023

There is a difference between open-ended, unscheduled downtime and a known 5 minute reboot window. If I can choose between an ad-free service that reboots at noon and a downtime-free service with ads, I would choose the ad-free service.

Who needs Twitter to be a service without any downtime?

serverholic · on Jan 8, 2023

I think about this often. Specifically, how much bloat exists in the world because individuals in our society are forced to justify their existence on a daily basis.

Everyone has to be employed so it's better to keep adding more crap to products and make those products disposable in order to give people a job.

serial_dev · on Jan 8, 2023

If your team is full of engineers whose specialty is running clusters with 1K+ machines, you'll get solutions that need clusters with 1K+ machines.

anticristi · on Jan 8, 2023

Not to mention that your business would end up being more profitable by avoiding GDPR fines. Most users in the EU would click "Reject all" anyway, so basically that code is not needed.

throwayyy479087 · on Jan 8, 2023

This is true from the POV of capitalism but not institutions. This would be a hyper profitable but politically unsustainable business.

teaearlgraycold · on Jan 8, 2023

Analytics data can easily be 90% of your data.

social_quotient · on Jan 8, 2023

Phrased differently for effect. Analytics data can be 9x larger than core data.

mrtksn · on Jan 7, 2023

Twitter's "linked" tweets seems to be strangely unattached from context.

What I mean is, Twitter seems to be processing data based on whatever it is in the tweet and doesn't maintain some grand coherent database.

So I changed my Twitter handle and opened a new account with my original Twitter handle and to my surprise, I was receiving notifications of engagement with tweets my old account sent previously.

I also heard that a method for spamming Twitter trending topics is to send tweets and delete them quickly.

My impression is that Twitter is big on real time processing. They definitely don't search the entire database for #YOLO tweets, instead they seem to be searching the almost-live stuff and some archived stuff(probably ranked and saved as noteworthy tweets or something).

madeofpalk · on Jan 8, 2023

Old stuff on twitter is weird. Tweets seem to eventually forget that You specifically liked a tweet, and will allow you to like a tweet again, and the like count will reflect you liking it twice.

kaba0 · on Jan 8, 2023

Like count is almost surely just a locally incremented number, your like request will eventually get processed and will be dropped if it was already liked. It only has to be an eventually consistent value.

madeofpalk · on Jan 8, 2023

This is beyond optimistic updates (which twitter also has plenty of issues with - you'll reply to your tweet and it'll jump immediately to two replies, but on refresh just show your one).

Xeoncross · on Jan 7, 2023

> I think the author critically under-guesses the sizes of things (even just for storage) by a reasonably substantial amount.

While true, and not to take away from the parent comment, I've noticed that the size of things is often partially the result of scaling out horizontally. Most companies I've worked at end up with a lot of duplicate records as each subsystem might want a copy or to cache a copy.

vidarh · on Jan 8, 2023

This is indeed a problem, and one of the reasons to be careful about knowing you can fit things in one machine before taking that approach, because the moment you're forced to move to a distributed model it's rarely one machine to two but one machine to a dozen coupled with a major rearchitecturing effort at just the wrong moment.

It's often fine to start without a fully decoupled system (net present value of the time and money needed to scale out might be far too high), but you need to know whether or not it's likely to come and what to look for so you can start preparing in time.

zxcvbn4038 · on Jan 8, 2023

This is absolutely the sort of thing I wish more developers did - and I think the good ones already do. Most of what you find in blogs will work just fine at 1 request per second (OMG! 1M Hits!!) or 10 requests per second (and I think someone did post their “how I scaled to 10 million hits per month” blog to Hacker News once which sounds impressive until you do the math) but when you get into thousands of requests per second you really do need to understand the network stack, the different storage tiers, your choice of algorithms, how to interact with CDNs, etc. a lot more then any blog will tell you.

When interviewing developers I always ask them what is the largest public web site they ever worked on and then probe about performance issues they encountered and how they resolved them in order gague how far along they are in their skill development.

I would never plan to run a production service on a single server just because coordinating changes in the active dataset among two or more production servers often changes your design significantly, and you want to plan for that because the consumer grade hardware we all use has a nasty habit of not working after power cycles (which still tends to be the most strain a system goes through, even in a world of SSD storage).

api · on Jan 7, 2023

I didn't get the impression that this would duplicate the entire functionality of Twitter, just what amounts to the MVP functionality. If you are only talking about the MVP it's at least somewhat plausible with a lot of careful engineering and highly efficient data manipulation.

Adding images, videos, other large attachments, rich search, and all the advertising and billing and analytics stuff would blow this out of the water, but... maybe not by as much as people think...? I would not be surprised if a very performance-engineered version of Twitter could run on a few dozen racks full of beefy machines like this with HPC-grade super-fast fabric interconnects.

I have a strong sense that most large scale systems are way less efficient than what's possible. They trade development ease, modularity, and velocity for performance by using a lot of microservices, flabby but easy and flexible protocols (e.g. JSON over HTTP), slow dynamic languages, and layers of abstraction that could be integrated a lot more tightly.

Of course that may be a rational trade-off if velocity, flexibility, and labor costs and logistics matter more than hardware, power, or data center floor space costs.

BeefWellington · on Jan 8, 2023

    I didn't get the impression that this would duplicate the entire functionality of Twitter, just what amounts to the MVP functionality. If you are only talking about the MVP it's at least somewhat plausible with a lot of careful engineering and highly efficient data manipulation.

I agree mostly. Where I differ in that I would argue that hashtags were THE thing that Twitter is most known for but that could be a perspective from having been on the platform for forever and a day and I recognize not everyone may make that same association anymore.

saagarjha · on Jan 8, 2023

FWIW my opinion of hashtags (relatively new to Twitter) is that they’re only used by brands and mostly cringe people

jwmoz · on Jan 9, 2023

Originally they were the main feature I'd say. Now they've been deprecated by the algorithm that surfaces and recommends stuff.

dspillett · on Jan 8, 2023

The hashtag index is not going to be any bigger than the tweets storage though, and may be significantly smaller, so this part is not if by an order of magnitude (even a binary one). Assuming something like a common SQL database is used for storage there would be a tags table (one row per unique tag, tag string plus a numeric identifier, indexed by both which bloats the size a bit but it'll still be small) and a link table (one row per tag in a message, containing the tag and message ids). Even if using 64-bit IDs because you don't want to fall over at 2 thousand million messages (or 4, if your DB supports unsigned types or you start at MININT instead of zero or one) then that structure is going to consume about 32 bytes per tag per message (plus some static overheads and a little more for non-leaf index pages of course). In theory this could be the same size as the messages table or even larger (if most messages contain many small tags), but in practise it is going to be significantly smaller.

Yes, this would be big enough to need specifically factoring into a real implementation design. But it would not be big enough to invalidate the proposed idea so I understand leaving it off, at least initially, to simplify the thought experiment.

Similarly to support a message responding to, or quoting, a single other you only need one or two NULLable ID references, 16 bytes per message, which will likely be dwarfed by the message text in the average case. Given it likely makes sense to use something like SQL Server's compression options for data like this the average load imposed will be much smaller than 16 bytes/message.

We are fiddling, fairly insignificantly, measurable but to massively, with constants a & b in O(a+bN) here, so the storage problem is still essentially of order O(N) [where N is total length of the stored messages].

chippiewill · on Jan 8, 2023

> I have no insider knowledge but I suspect that index is maybe the second largest thing on disk on the entire platform, apart from the tweets themselves.

I'd probably go as far to say that the indexes _generally_ at twitter could be larger than the tweets

gorgoiler · on Jan 8, 2023

Isn’t a hashtag just another kind of user account — an account from which anyone can post?

The data structures for the @BeefWellington timeline of tweets and the one for the #BeefWellington timeline of tweets could look roughly the same.

bithaze · on Jan 8, 2023

Hashtags aren't like user accounts, no - they're strings that are part of a tweet. In theory, a separate data structure shouldn't be needed since you can just search the full text of tweets, but in practice, I don't know how that scales for the number of all-time tweets.

HyperSane · on Jan 8, 2023

I think he was talking about how hashtags are implemented.

walrus01 · on Jan 7, 2023

> I have no insider knowledge but I suspect that index is maybe the second largest thing on disk on the entire platform

I really wonder how much of a challenge this is and how much it occupies, not even talking about disk, but continuing the theoretical exercise in the linked URL, you can get 1U size servers with 2TB of RAM these days.

knorker · on Jan 8, 2023

I would be surprised if indexes were not larger than the raw tweets.

Text search, hashtag index, some structured data for popular tweets, etc...

In order to deliver search results I would not be surprised if tweets are duplicated/denormalized, for quick search/lookup.

redbell · on Jan 8, 2023

> I think the author critically under-guesses the sizes of things (even just for storage) by a reasonably substantial amount.

I want to add another concept that may impact, considerably, the storage, which is "threads". I'm not sure what is the percentage of threads/tweets but what I consider an important factor is that threads do not have a maximum number of characters.

aetimmes · on Jan 8, 2023

(Disclaimer: ex-Twitter SRE)

> There’s a bunch of other basic features of Twitter like user timelines, DMs, likes and replies to a tweet, which I’m not investigating because I’m guessing they won’t be the bottlenecks.

Each of these can, in fact, become their own bottlenecks. Likes in particular are tricky because they change the nature of the tweet struct (at least in the manner OP has implemented it) from WORM to write-many, read-many, and once you do that, locking (even with futexes or fast atomics) becomes the constraining performance factor. Even with atomic increment instructions and a multi-threaded process model, many concurrent requests for the same piece of mutable data will begin to resemble serial accesses - and while your threads are waiting for their turn to increment the like counter by 1, traffic is piling up behind them in your network queues, which causes your throughput to plummet and your latency to skyrocket.

OP also overly focuses on throughput in his benchmarks, IMO. I'd be interested to see the p50/p99 latency of the requests graphed against throughput - as you approach the throughput limit of an RPC system, average and tail latency begin to increase sharply. Clients are going to have timeout thresholds, and if you can't serve the vast majority of traffic in under that threshold consistently (while accounting for the traffic patterns of viral tweets I mentioned above) then you're going to create your own thundering herd - except you won't have other machines to offload the traffic to.

MichaelZuo · on Jan 8, 2023

What do you think about his interesting comment on the possibility of a mainframe?

"I also didn’t try to investigate configuring an IBM mainframe, which stands a chance of being the one type of “machine” where you might be able to attach enough storage to fit historical images."

It seems theoretically possible it could accomodate the entirety of Twitter in 'one machine'.

aetimmes · on Jan 8, 2023

It depends on what you (or OP) mean by "one machine".

There was a HPC cluster at Princeton when I worked there (which, looking at their website, has since been retired) that was assembled by SGI and outfitted with a customized Linux unikernel that presented itself as a single OS image, despite being comprised several disparate racks of individual 2-4u servers. You might be able to metaphorically duct-tape enough machines together with a similar technique to be able to run the author's pared-down scope within a single OS image.

With respect to the IBM z-series specifically - if the goal of the exercise is to save money on hardware costs, I'm imagining purchasing an IBM mainframe is in direct opposition to that goal. :) I'm not familiar enough with its capabilities to say one way or the other.

MichaelZuo · on Jan 8, 2023

Perhaps the hardware cost would be higher with one big mainframe, but there could be many subtle advantages that would combine to reduce the overall cost, since personnel costs make up a huge chunk of total opex.

I don't have enough experience to say whether having the entirety of Twitter sit in one really big metal box would be perceived to be sufficiently advantageous or not.

aetimmes · on Jan 9, 2023

I think there are several TCO issues you'd run into here:

- vendor lock-in: anyone who has worked at a shop running Sun SPARC machines when they got purchased by Oracle can speak to the pain involved with negotiating software licenses or hardware support contracts with the Only Game In Town.

- the price/scarcity of mainframe talent: you're going to have to pry IBM z-series experts away from banks who are paying 50-100% over market rate, oftentimes in straight cash, to maintain systems that are propping up the United States economy in its entirety. Not to mention - my first job out of college >10 years ago had a mainframe, and I was incredulous that _anyone_ still had or needed one in the 21st century. Now I can appreciate the specific tradeoffs being made that caused the business to choose a mainframe, but attracting top-tier cost-effective junior dev talent out of college becomes several orders of magnitude more difficult once the word "mainframe" leaves your recruiters' lips.

- scalability: in the event that you ever decide to add features or functionality (or, say, increase your tweet character limit by an order of magnitude), you have now committed yourself to scaling your systems in units of mainframes costing millions per unit, as opposed to servers costing five figures per unit (not to mention, you probably need a dev environment that's airgapped from your prod environment, which means yet _another_ mainframe...)

- build vs. buy: using the same commodity x86_64/ARM hardware and Linux kernel that everyone else is using allows you to take advantage of all of the open-source datacenter software being built for that happy-path profile. The minute you stray from that path, the engineering-hour cost of everything you do has the potential to skyrocket, because you can't use anything off-the-shelf and need to recompile everything for z/Architecture. In fact, based on some cursory web searches, it doesn't appear that you can compile the Rust toolchain to even _run_ on z/OS as of today, so at minimum, OP would be committing to implementing that.

But at the end of the day, the constraining resource in every software organization I've encountered has been engineering hours, and by choosing a mainframe you're drastically limiting the potential number of engineering hours available to you in the employee market.

MichaelZuo · on Jan 9, 2023

That's the tradeoff I was referring to.

Whether or not removing several intermediary layers of abstraction, and the commensurate 100x (?) boost in efficiency, between users tweeting a hashtag and the actual electrons vibrating, is worth taking on the significant constraints you've enumerated.

deterministic · on Jan 9, 2023

What are your thoughts on the fact that Twitter is still functioning fine after thousands of engineers have left?

aetimmes · on Jan 9, 2023

It sounds like those engineers did good work.

_zachs · on Jan 9, 2023

Thanks for the insight! At a high-level, how did Likes work when you were at Twitter? Were a certain amount of Like requests batched then applied at the DB level at the same time to ease writes?

HAL3000 · on Jan 8, 2023

> OP also overly focuses on throughput in his benchmarks

Because OP is a junior developer, he reads a lot of theory and blog posts, does a lot of research, but doesn't have much practical experience. Just look at his resume and what he wrote. As a result, most of what he write about is based on what he have read about senior developers doing in the companies he have worked for, perhaps he created some supporting software for core services but did not design or implemented the core, so he doesn't have firsthand experience. This is evident to anyone who has actually used DPDK (which is ridiculous proposal for Twitter like service in 2023 where you have XDP and io_uring, it's not HFT), designed and implemented high volume, low latency web services and knows where the bottleneck is in that kind of services from experience, theory will not give you that intuition and knowledge.

jameshart · on Jan 8, 2023

Getting everything onto one machine works great until... it no longer fits on one machine.

You add another feature and it requires a little bit more RAM, and another feature that needs a little bit more, and.. eventually it doesn't all fit.

Now you have to go distributed.

And your entire system architecture and all your development approaches are built around the assumptions of locality and cache line optimization and all of a sudden none of that matters any more.

Or you accept that there's a hard ceiling on what your system will ever be able to do.

This is like building a video game that pushes a specific generation of console hardware to its limit - fantastic! You got it to do realtime shadows and 100 simultaneous NPCs on screen! But when the level designer asks if they can have water in one level you have to say 'no', there's no room to add screenspace reflections, the console can't handle that as well. And that's just a compromise you have to make, and ship the game with the best set of features you can cram into that specific hardware.

You certainly could build server applications that way. But it feels like there's something fundamental to how service businesses operate that pushes away from that kind of hyperoptimized model.

seanhunter · on Jan 8, 2023

It's sort of strange you have to make these points, but as an industry we seem to have an extremely short memory.

Vertical scaling was absolutely the way most big applications were built up until well into the 90s. Companies like Oracle were really built on the fact that getting performance and reliability out of a single highly-contested massive server is hard but important if that's the way you're going. Linux became dominant primarily because horizontal scaling won that argument and it won it pretty much exactly because of:

1)what you said - you hit a hard cap on how big you can make your main server at which point you are really screwed. Scalability pain points become a hard wall.

2) when I say "server" I mean "servers" of course because you'd need an H/A failover, at which point you've eaten the cost of replication, handling failover etc and you may as well distribute

3) cost. Because hardware cost vs capability is exponential, as your requirements become bigger you pretty rapidly hit a point where lots of commodity hardware becomes cheaper for a given performance point than few big servers

So there's a reason that distributed systems on commodity hardware became the dominant architectural paradigm. It's not the only way to do it, but it's a reasonable default for many use cases. For a very high-throughput system like twitter it seems a very obvious choice.

Clearly there are costs to distribution, so if you can get away with a simpler architecture then as always Occam's razor applies. Also if you can easily distribute later then it probably makes sense to leave that option open and explore it when you need it rather than overcomplicate too early.

Joeri · on Jan 8, 2023

The thing is that hardware scales faster than humanity. When the internet boom happened there was no choice except to scale horizontally to reach a global audience, but as this article points out that assumption might no longer hold true for many services. It might make sense to return to vertically scaled highly reliable servers to achieve software simplicity and a lower overall cost.

I’m always reminded of how stackoverflow essentially runs off a single database server. If they can do it, most web properties can do it.

jameshart · on Jan 8, 2023

When the hardware scales, the tricks to wring maximum performance out of it change.

When you come back to build the new version of your game on the next gen console sure you can now add all those features but the processing pipelines are different now and the disk performance and memory to cache ratios have changed - getting your hyper optimized code to work in the new platform takes a ton of effort - so you either run it in some kind of emulation mode, sacrificing some of the performance for productivity, or you rewrite it.

Same happens with new generations of server hardware. Your clever hack to maximize NUMA locality of data to each core becomes a liability when the next hardware gen comes out and on-die caches are bigger. Decisions about what should use RAM get invalidated by faster SSDs.

Maybe you can build a service this way - hardware first. Pick a server platform for a couple of years; build to that capacity; ship; then start designing the next gen service to run on a new set of hardware?

To some extent this is how database or virtualization systems software is written. And if I’m not mistaken Twitter actually did develop their own database stack to optimally handle their particular data storage model, and I assume that was done pretty close to the metal.

saagarjha · on Jan 8, 2023

Stack Overflow happens to go down fair maintenance a fair bit. Now, it’s not really a service that might need reliability like Twitter does*, but it’s important to keep in mind.

*Some engineers may disagree

brazzledazzle · on Jan 8, 2023

Also worth pointing out that Stack Overflow's Microsoft-centric architecture may also incentivize them to maximize vertical scaling to save on the licensing cost overhead horizontal scaling would incur for at least part of their stack.

kaba0 · on Jan 8, 2023

Most, sure. Twitter? No way on Earth.

popcorncowboy · on Jan 8, 2023

It's less "short memory" than the fact that you can be a "senior software engineer" after just 5 years or so experience. There is a significant cohort of (particularly web-tech) developers who were young children in the 90s, and whose professional careers started in the 2010s and have only ever known "the cloud", big-tech and big-tech tech (k8s, etc).

It's a similar phenomenon to the observation that tech "innovations" tend to recapitulate research that had its roots in the 50-60-70s.

"The industry" doesn't seem to put much stock in generational knowledge transfer.

britannio · on Jan 8, 2023

I'm a CS undergrad, do you have any recommendations for effectively combating this?

wrigby · on Jan 8, 2023

No approach is a silver bullet here, but what I've found effective is to seek out friendships with / mentorship from senior and staff-level engineers. They'll have tons of war stories from 10-20 years ago, and may even have some snarky opinions on what "new technologies" are just re-inventions of something that the industry had already solved decades ago.

As a current undergrad, you can also look to your professors for this (especially those with industry experience before they went into teaching).

After graduation, this may mean working at a company _with_ those older engineers, as opposed to a 5-20 person startup with a homogeneous group of 20-somethings.

jameshart · on Jan 8, 2023

It’s gotta be tough though for young devs to tell the difference between old fogeys just dismissing new tech because it’s new, vs because it’s a recapitulation of an old mistake.

And the thing is that what was a bad idea in 2000 might now be an idea whose time has come, because the surrounding context has changed - be it browser technology or the size of machine memory or the capabilities of programming languages.

So, like, when I point out that kubernetes is just DCOM all over again I’m not actually dismissing kubernetes (just because we don’t use DCOM any more doesn’t mean it wasn’t a good choice then); less still suggesting that we should go back to using DCOM; I’m just saying ‘maybe there are some lessons we can learn from how people used DCOM back in the day about what cases kubernetes is suited for and what the pitfalls might be’. And, also, maybe raising the possibility that in a few years time we will look back at kubernetes as a bloated outdated approach and be glad to see the back of it - even though right now it might be a great technology to use.

But I’m not sure how a new junior dev can possibly pick up all that nuance from just listening to old farts like me talking about how this reminds us of how we used to do things back in the old days.

switchbak · on Jan 8, 2023

Wouldn't DCOM be more akin to Corba or RMI? I'm failing to see the analogy to Kubernetes.

jameshart · on Jan 8, 2023

Independently deployed components, connected up to discoverable queues and data providers, relying on a registry for discovery and load balancing… there’s a lot more in common than you’d think.

switchbak · on Jan 8, 2023

Not the OP but the courses I took on operating systems (writing a simple one, in C) have yielded lessons that I've used throughout my career. If you can find a course too challenge you, and it's taught well, that should provide a ton of the learning that us old neckbeards couldn't avoid back in the day.

Forcing yourself to use barebones languages and environments is good too. Hacking on ancient machines or targeting embedded hardware is another good way to get a better intuition on the order of magnitude performance differences of various approaches.

Reading about promising tech of the past is also useful: prolog, Smalltalk, etc. Lots of inspiring and fruitful lessons to mine there.

vidarh · on Jan 8, 2023

> And your entire system architecture and all your development approaches are built around the assumptions of locality and cache line optimization and all of a sudden none of that matters any more.

Indeed, with the hyperoptimized version here, the moment you tip over into two machines each machine will need two copies of every tweet from anyone who has followers sharded to both machines, so the capacity of two machines is going to be far less than twice the capacity of one as a large proportion of tweets will cause writes on both shards. This inefficiency will now always be with you - the average number of writes per user per tweet will go up until your number of shards approaches average follower counts.

This is why it's common to model this with fan-out on write, because the moment you accept that there is a risk you'll tip over into a sharded model you need to account for that. If asked the question of such a design, it's worth pointing out that if you can guarantee it fits on one machine, and this is true for many more problems than people expect, then you can save a lot, but then I'd set out the more complex model and contrast it to the single-machine model.

You don't need to fan-out to every account even in such a distributed system, certainly. You can fan-out to every shard/instance, and keeping that cached in RAM would still allow you to be far more efficient than e.g. Mastodon (which does fan-out to every instance for the actual post data, but relies on a Postgres database)

p-e-w · on Jan 8, 2023

> You certainly could build server applications that way. But it feels like there's something fundamental to how service businesses operate that pushes away from that kind of hyperoptimized model.

That "fundamental" thing is the cultural expectation that SaaS offerings constantly grow in features, rather than in reliability or performance. As your example from the world of video games demonstrates, there is no industry-wide belief that things must be able to do ever-more, forever. It's really mostly SaaS and desktop software that has this weird and unreasonable culture around it. That's why your word processor can now send emails, and your email provider now does translations as well.

nkellenicki · on Jan 8, 2023

You're not taking into account data, you're only talking about features. What about when the data no longer fits on the one machine? Or processing the data exceeds the capacity of the machine?

Data growth through user growth or just normal day-to-day usage is expected.

p-e-w · on Jan 8, 2023

If Twitter's data can fit on one machine, then the data of 99.99% of companies can. Not every product needs a billion users with Gigabytes of storage each. The assumption that if your startup's tech isn't scalable enough to become the next Google then it's the wrong tech is hilarious nonsense driven largely by ego fantasies.

nkellenicki · on Jan 8, 2023

It does not fit on one machine. Tweets alone generate petabytes of data a year, and other events are petabytes per day.

https://blog.twitter.com/engineering/en_us/topics/infrastruc...

https://ankush-chavan.medium.com/twitter-data-storage-and-pr...

dmitriid · on Jan 8, 2023

> Tweets alone generate petabytes of data a year

Nope. It's not Tweets that generate that data. It's the insane amount of (mostly unnecessary) noise that gets thrown into the mix: analytics, logs, metrics, you name it.

Every time you scroll Twitter sends multiple events to the server. That alone will generate a large chunk of those petabytes.

nkellenicki · on Jan 8, 2023

No, that's the second link - generated data, separate from tweets.

Tweets alone generate petabytes of data a year.

https://ankush-chavan.medium.com/twitter-data-storage-and-pr...

Also, many people would disagree that stuff required to run a business is "mostly unnecessary".

jrk · on Jan 8, 2023

No, they don't. In spite of the confusing wording in the post you cite, its petabytes/year claim is not derived from the 500m tweets/day claim – it must include metadata and/or multimedia.

This was all already derived (correctly) in the original post. Recapitulating:

500m tweets/day * (conservatively) 512B/tweet * 365 days/yr ~= 90 TiB/yr

Assuming compression and variable-length encoding of this long tail in colder storage, it's more likely <20 TiB/yr (<=115B/tweet on average)

Yes, this excludes analytics metadata, which as you suggest would not support Twitter's current ad products. But your core repeated claim about tweets alone is two orders of magnitude off.

Sohcahtoa82 · on Jan 9, 2023

> 500m tweets/day * (conservatively) 512B/tweet * 365 days/yr ~= 90 TiB/yr

I wonder if the "Petabytes" figure being claimed includes pictures/videos that can be attached to a Tweet. In that case, I could easily see "Petabytes/year" be accurate.

mapme · on Jan 8, 2023

Twitters data cannot fit on one machine. In 2015 their Hadoop cluster was 30 PB per earlier comments/their blog. How do you fit that on one machine?

vidarh · on Jan 8, 2023

Many of us will remember that Twitter in fact did start out with a monolithic database and had to rewrite a bunch of stuff when they couldn't make that work anymore.

Of course they could fit a much larger dataset on one machine today.

(But I will note the article is also assuming a chronological timeline by default, but that of course hasn't been true for years - the ranking Twitter does now is far more complex)

gnuvince · on Jan 8, 2023

There's a story by Bryan Cantrill [1] about how he went to Twitter to help them understand why it would take 400 milliseconds of compute to process a request (I'll leave the reveal to Bryan). Scaling horizontally is probably necessary for something the size of Twitter, but that doesn't mean that we can half-ass the code and just throw more machines at the problem. If we write code with a bit more mechanical sympathy and avoid the latest non-proven fads, we can surely write software that is 10x faster and requires much less scaling.

[1] https://www.youtube.com/watch?v=LjFM8vw3pbU&t=3240s

vidarh · on Jan 8, 2023

Oh, absolutely. I think in terms of making people think about putting things in-process in RAM and being a bit more imaginative at looking at machines sizes as worth exploring is a good thing, and the article is interesting as a look at what you can do.

Many things can plausibly fit on a single machine irrespective of uptake (e.g. I worked on a system not long ago where even if we cornered the entire global market and the market expanded several magnitudes in size, our entire working set would still fit comfortable in memory).

But even when you need to scale, it absolutely can scale better if you're willing to not automatically resort to a standard database stack for everything for example.

kaba0 · on Jan 8, 2023

Twitter used to (not sure how much did that idiot fck it up) be quite great from this perspective from what I gathered. They had technical blog posts and are even featured in the famous data intensive book as an example for many scaling problems. Sure, they didn’t write it in assembly but instead use Java/Scala with Graal, but architecturally they had a sound system (plus what people routinely forget, the system should handle the general load, but it worths nothing if it fails at peaks, and twitter can easily hit a billion daily users during big global events)z

gnuvince · on Jan 8, 2023

Fixed timestamp: https://youtu.be/LjFM8vw3pbU?t=1613

joshspankit · on Jan 8, 2023

Weird criticism but o.k.

Edit: Unless I missed something, the author never argued that Twitter should be hosted on one machine and therefore criticizing the “fun stunt” like this makes no sense to me

jameshart · on Jan 8, 2023

There are certainly people reading this and nodding and thinking ‘yeah, this makes sense! Why don't we build services like this?’ And adding it to their mental list of arguments against microservices or whatever - and I wanted to make sure people hear the reasons why this kind of performance maximization tends not to be the norm.

eismcc · on Jan 8, 2023

You’d end up synchronizing feature releases to Moores law. Which while it sounds untenable there are large corporations that continue to use monolithic approach and vertical scaling.

Ingaz · on Jan 8, 2023

I think that the main point of OP is that it's possible to serve production load using just one server.

I did not looked into source code yet but I suppose that OP if not implemented already than there should ideas for implementation.

In addition: from my POV implementation of scaling for such service should be trivial: - sharding of data between instances by a criteria (e.g. regional) or by hash - configure network routing

I think it should work

Winsaucerer · on Jan 8, 2023

How about smaller projects that might be hugely successful, but will never be remotely close to a Twitter level of success?

It’s interesting to see how much can be done with a single machine, because most projects will never be this big.

Though there will still be other concerns like redundancy to deal with.

Ingaz · on Jan 8, 2023

Can you explain me what exactly meant by "success of Twitter"?

It's not sarcasm, I have twitter account but I never understood hype about twitter.

I see nothing in twitter from technical POW, closed twitter protocol looks very strange, they banned Trump, they were profitable in 2021, Elon Musk bought them for ~44 bln.

Maybe sellout of company with problems for such price is success.

nyanpasu64 · on Jan 8, 2023

Video games have had 3D water for decades before screen-space reflections, and many look serviceable decades later (Super Mario Sunshine looks great at 480p though dated at higher resolutions).

jameshart · on Jan 8, 2023

Curses. That undermines my argument completely. It wasn’t merely a random example of how you have to compromise on features to fit in the available hardware budget, but actually the premise upon which my entire argument rested.

TacticalCoder · on Jan 8, 2023

TFA, to me, touches about something I've wondered about a very long time ago: what are the implications of CPU and storage growing at much faster rates than human population?

Back in the 486 days you wouldn't be keeping, in RAM, data about every single human on earth (let's take "every single human on earth" as the maximum number of humans we'll offer our services to with on our hypothetical server). Nowadays keeping in RAM, say, the GPS coordinates of every single human on earth (if we had a mean to fetch the data) is doable. On my desktop. In RAM.

I still don't know what the implications are.

But I can keep the coordinates of every single humans on earth in my desktop's RAM.

Let that sink in.

P.S: no need to nitpick if it's actually doable on my desktop today. That's not the point. If it's not doable today, it'll be doable tomorrow.

Waterluvian · on Jan 8, 2023

In undergrad I had a bonus GIS lab assignment to complete the task using the prof’s instructions from the 80s.. maybe 90s? (a lot of FORTRAN) and then complete it again using modern GIS software. Such an eye opener. The thing that stuck out the most to me was how many hoops we jumped through to batch the job and commit the incremental results to disk because, bahahahha, fitting even 1% of it in RAM was out of the question.

pornel · on Jan 8, 2023

Thanks to Snowden's leaks we know one of those is surveillance.

From scanning every message of every person, it's going to expand to recognizing every face from every camera, and transcribing and analyzing every spoken word recorded.

narag · on Jan 8, 2023

TFA, to me, touches about something I've wondered about a very long time ago...

It was a little more than ten years ago for me. I realized that a hard disk could store a database of every human alive, including basic information, links (family relations) and maybe a photo.

I still don't know what the implications are.

Maybe we don't want to know, but it's not really that difficult to think about.

doublepg23 · on Jan 8, 2023

Is the storage really the complex part? Isn't gathering the actual information and avoiding errors (ex: I have a co-worker who's name is incorrectly spelled three different ways in prod services) the hard part?

withinboredom · on Jan 8, 2023

Data hygiene is a problem everywhere. Most companies I worked at just throw out 'bad data' and file a bug. Occasionally, if the data is a secondary source, it will be recreated from a primary source (assuming the primary source is still available).

In the particular case, your coworker would be stored by some identifier (like an SSN or similar) and their name would be stored as "aliases" and allow multiple names. I have two nicknames that I answer to, depending on when in my life you met me, and my family calls me by my proper name. Online, I go by several handles depending on whether I want the reader to be able to figure out my real name. I even used to work somewhere where I was called by this handle (withinboredom) more than my real name.

narag · on Jan 8, 2023

Is the storage really the complex part? Isn't gathering the actual information and avoiding errors (...) the hard part?

Not for someone that works in consulting, or at least it wasn't. I remember that I had Access access to the production database that stored all the customers, present and pass. They wouldn't give me a password to that, but apparently they thought it was safe to enter the password without me looking so I could try some queries and test my last fixes.

Not sure if it still works the same, but I did some dumb query, left the computer on and, next day morning, a temporal file was in my %TEMP% with a lot of data of millions of persons worldwide. Had I be so inclined, with an external hard disk I could have started my homegrown NSA project.

Now think of this: how many times have you heard that the data of millions of customers were on sale after a data breach? Do you doubt that, let's say, China has every single person in the West on file?

Our own governments have us legally or semi-legally (exchange) anyway.

kaba0 · on Jan 8, 2023

Hold it in memory? Sure. But actually working with it besides copying it around and doing some operation on a “row-by-row” basis? Not really.

adam_arthur · on Jan 9, 2023

The implication is that scalability problems in software will get easier and easier over time, and far fewer developers will be needed to maintain these systems.

Which is already largely true today with the advent of serverless. Most maintenance work can center around application logic rather than scaling physical machines/maintaining versioning.

It's clear that many modern applications would take an order of magnitude more people to run even just 20 years ago. That trend will only continue

saalweachter · on Jan 8, 2023

A bacteria has order of 50 billion atoms. (Eukaryotic human cells, 100 trillion.)

That's getting to the point you could store 20 bytes per atom in a terabyte.

(The big bottleneck is that you need picosecond resolution simulation steps and to cover minutes to see a protein fold.)

hacker_9 · on Jan 8, 2023

Wait until you hear about DNA

moeny · on Jan 8, 2023

's/(human|desktop)/smartphone/g'

untech · on Jan 8, 2023

Using two 32 bit numbers for coordinates, each record would take 8 bytes, which is 64 gigabytes for 8 billion population. Don’t think many smartphones have this RAM today.

Dylan16807 · on Jan 8, 2023

The planet has 2^47 square meters of surface, so more like 6 bytes.

Plus you can group together people in the same area and/or sort positions as integers and store only the deltas between them, so you can probably get down to 2-3 bytes per person.

And you can get dozens of models of smartphone with 16GB of RAM right now. So there might be a gap there but it's a very small gap. The phone of tomorrow will have the RAM.

Edit: Thinking about it more, with 2^33 people and 2^47 locations the average delta would be 2^14, and it's pretty easy to guarantee that fits into 2 bytes per person. And with a more accurate world population count you'd free up at least a gigabyte for your phone to actually operate with.

renonce · on Jan 8, 2023

You can build that phone. Phones made an engineering decision to put more battery rather than more RAM, and it's just a matter of putting more LPDDR5 chips onto the circuit board.

cellis · on Jan 8, 2023

Don't you need 3 numbers? Unless you believe in a flat earth ;). Also you need some slack space for metadata. Let's call it 100GB all in.

withinboredom · on Jan 8, 2023

GPS altitude is notoriously bad via consumer devices. It's better to store altitude as less than 2 bytes with the range from being a few km below sea level to the maximum altitude consumer devices will report (3,000 km -- though maybe less because there is also a speed limit at which they will stop reporting too so you can't just buy a GPS device and build a guided missile).

toast0 · on Jan 8, 2023

Lat long and assuming at ground gets you most of the way there.

Dylan16807 · on Jan 8, 2023

"Add it all up, and the US has around 340 billion square feet of building stock[3]. This is about 12,200 square miles, or 0.00032% of US land area"

Judging by that, you need a negligible increase in the number of locations you can represent to handle everywhere stable and off the ground someone could be. Much less than one bit per person.

If you want to deal with people currently in airplanes then you could give them an extra couple bytes. It's less than a million people so it won't affect your total storage at all.

ilyt · on Jan 8, 2023

Most people don't live in caves or in the sky city, 2 is enough.

And in places you care (multi-floor buildings) you aren't getting GPS signal inside anyway...

sethev · on Jan 7, 2023

John Carmack tweeted something that made me noodle on this too:

>It is amusing to consider how much of the world you could serve something like Twitter to from a single beefy server if it really was just shuffling tweet sized buffers to network offload cards. Smart clients instead of web pages could make a very large difference. [1]

Very interesting to see the idea worked out in more detail.

[1] https://twitter.com/id_aa_carmack/status/1350672098029694998

threeseed · on Jan 7, 2023

> just shuffling tweet sized buffers to network offload cards

Except that's not what it is doing at all.

It assembles all the Tweets internally, applies an ML model to produce a finalised response to the user.

nimish · on Jan 8, 2023

Great, staple a few ML accelerators to your NIC. Nvidia sells them! You could build an entire supercomputer style setup 100% optimized for Twitter data movement and computation with COTS hardware IMO.

I strongly doubt that entire datacenters need to be used if and only if Twitter obsessively optimized for hardware usage efficiency over everything else. In reality they don't and make some pretty big compromises to actually get stuff built. Hardware is cheap, people are not.

threeseed · on Jan 8, 2023

a) No one has said that Twitter doesn't use ML accelerators.

b) No one has said Twitter operates entire data centres.

c) You need more than just NICs and ML accelerators to built a Twitter timeline. You need to rank the content, determine appropriate ads and combine them together. You can't do that in your network card.

pg314 · on Jan 8, 2023

> No one has said Twitter operates entire data centres.

They do though. 3 in the US alone (https://www.datacenterdynamics.com/en/news/report-elon-musk-...).

nimish · on Jan 9, 2023

Good news, Nvidia will now sell you a GPU+NIC+CPU with competitive performance: https://developer.nvidia.com/blog/accelerating-data-center-a...

Twitter operates a handful of datacenters because their scale is such that it makes sense.

seritools · on Jan 7, 2023

> if it really was [which it isn't]

threeseed · on Jan 7, 2023

But many people including the OP think it is.

It’s like me running a web crawler on my phone and saying I can replace Google.

Dylan16807 · on Jan 8, 2023

If you were able to index the same amount of content you'd have a damn good alternative. And that's the level of this experiment.

threeseed · on Jan 8, 2023

It is trivial to index the same amount of content as Google as the web is mostly static.

The hard part is in being able to translate a search query into a list of pages.

And that requires a level of sophistication that far exceeds a laptop.

Dylan16807 · on Jan 8, 2023

It is not trivial at all. That is a huge amount of scraping.

Any machine that can do that will be a similar spec to what you need for serving queries. Not as fast as google does it, but a good amount of them.

threeseed · on Jan 8, 2023

It is trivial compared to the task of translating a user query into search results.

It's akin to saying the magic behind OpenGPT is the dataset.

Dylan16807 · on Jan 8, 2023

Keyword search is not that hard. Pagerank is not that hard.

The comparable goal to the article is to be a search engine, not to fight google for best results.

threeseed · on Jan 8, 2023

> Keyword search is not that hard. Pagerank is not that hard.

George Hotz, is that you ?

Dylan16807 · on Jan 8, 2023

Keyword search and pagerank had working solutions decades ago.

Hotz was trying to make a car controller that had never been done before, by himself, and then he wanted to """improve""" search with no explanation of what that meant that I saw.

I think if he was tasked with taking twitter from no search to "has a search" he probably could have managed it. A team of five people definitely could have managed it.

ilyt · on Jan 8, 2023

I think you severely underestimate the effort search engines have to put to filter out all the SEO garbage and come up with some sensible results

Dylan16807 · on Jan 8, 2023

I don't underestimate the effort, I claim that it's not strictly necessary to have more than a couple people working on it for a minimum viable product, and less for a beta.

seritools · on Jan 8, 2023

What? The OP covers much more than "just shuffling tweet sized buffers to network offload cards"

to11mtm · on Jan 8, 2023

Maybe it should be, though?

I sometimes wonder how much value ML provides vs a proper sort function for anything but advertising.

threeseed · on Jan 8, 2023

It's an easy thought experiment.

Do you think that the most successful web companies in the world with arguably the best people i.e. Amazon, Facebook, Instagram, TikTok, LinkedIn, Pinterest, Youtube, Netflix, Snapchat etc. have no idea what they are doing. That the highly complex, expensive and latency impacting recommendation systems could be replaced by trivial sorting.

Or maybe they do work, do translate to increased usage and do significantly impact revenue.

fzingle · on Jan 8, 2023

Maybe they are considering that we shouldn't build and optimize our society solely for the purposes of maximizing revenue.

Would it be better to live in a world where Twitter (for example) existed because it is a useful thing and not because it might make lots of money?

afrodc_ · on Jan 8, 2023

Doesn't it lose a lot of money and its usefulness is directly correlated with its current massive usage.

saagarjha · on Jan 8, 2023

It ran about break even until very recently.

brazzledazzle · on Jan 8, 2023

A lot of people don't realize that Dunning-Kruger can catch any of us unaware. It's easy to look at a problem, think about the surface level challenges you'd have building it and come to the conclusion that you could do it better or simpler.

vidarh · on Jan 8, 2023

Try changing your ordering on Twitter to chronological and see how much you miss.

It's not just ads, it means the set of people you follow becomes extremely critical for your experience in a way that makes it far less engaging.

That may be good or bad for you as a user depending on what you want, but for Twitter having most people stick to the ML augmented timeline is essential to keep you hooked.

PaulDavisThe1st · on Jan 8, 2023

which arrives at a browser running Tweak New Twitter as a browser extension, and that strips the response to just what the user actually wants to see. Effficiency!

varjag · on Jan 7, 2023

Isn't that what an OPA sorta kinda does.

drewg123 · on Jan 7, 2023

How much bandwidth does Twitter use for images and videos? Less than 1.4Tb/s globally? If so, we could probably fit that onto a second machine. We can currently serve over 700Gb/s from a dual-socket Milan based server[1]. I'm still waiting for hardware, but assuming there are no new bottlenecks, that should directly scale up to 1.4Tb/s with Genoa and ConnectX-7, given the IO pathways are all at least twice the bandwidth of the previous generation.

There are storage size issues (like how big is their long tail; quite large I'd imagine), but its a fun thing to think about.

[1] https://people.freebsd.org/~gallatin/talks/euro2022.pdf

cortesoft · on Jan 7, 2023

It is way more than 1.4TBs a second globally.

_zoltan_ · on Jan 7, 2023

in this specific discussion it's very important to use Tb and TB correctly.

Gigachad · on Jan 7, 2023

This is never going to happen. If I want to indicate it's correct I'd write Tbit/s or Tbyte/s. Otherwise its a coin flip if TB and Tb has been used correctly.

jrockway · on Jan 8, 2023

I like /s for bytes and ps for bits. 100Mbps = 100 million bits per second. 100MB/s = 100 million bytes per second. (The capitalization is important of course. I tried writing some examples where it's not preserved and it's too weird. I tend to not use capital letters on things like Slack, but for bytes, you just have to. The difference between milli and Mega can also be important, but since nobody talks about the negative powers with bandwidth, you are probably OK if your shift key breaks.)

xyzzy123 · on Jan 7, 2023

I wonder how much is api traffic and how much is assets & images.

koolba · on Jan 7, 2023

I wonder how much of that is crypto spam bots replying to each other.

skywhopper · on Jan 7, 2023

Can you detect and block those spam bots with less effort than it would take to process them?

koolba · on Jan 8, 2023

That’s a not so simple calculation as comparing the raw request process time isn’t the complete picture. Spam bot content must be persistently stored, indexed, archived, etc so the long term cost is much more than the one off POST to create the entity.

You’d also have to quantify the improved user experience from seeing less spam v.s. inflated ad revenue for garbage views / content.

JosephRedfern · on Jan 7, 2023

I suppose that in practice you'd need to consider burst bandwidth and not just 95/99 percentiles.

quickthrower2 · on Jan 7, 2023

… HN thread that reinvents CDN …

thorncorona · on Jan 8, 2023

Twitter is big on real time though. Each user gets served their own feed in real time.

icedchai · on Jan 8, 2023

This doesn't seem like an issue. You can store images / other assets on the CDN, and serve the real time stuff from the origin without caching.

habibur · on Jan 7, 2023

He will be up for surprise.

HTTP with connection: keep-open can serve 100k req/sec. But that's for one client being served repeatedly over 1 connection. And this is the inflated number that's published in webserver benchmark tests.

For more practical down to earth test, you need to measure performance w/o keep-alive. Request per second will drop to 12k / sec then.

And that's for HTTP without encryption or ssl handshake. Use HTTPS and watch it fall down to only 400 req / sec under load test [ without connection: keep-alive ].

That's what I observer.

trishume · on Jan 7, 2023

I agree most HTTP server benchmarks are highly misleading in that way, and mention in my post how disappointed I am at the lack of good benchmarks. I also agree that typical HTTP servers would fall over at much lower new connection loads.

I'm talking about a hypothetical HTTPS server that used optimized kernel-bypass networking. Here's a kernel-bypass HTTP server benchmarked doing 50k new connections per core second while re-using nginx code: https://github.com/F-Stack/f-stack. But I don't know of anyone who's done something similar with HTTPS support.

jck · on Jan 8, 2023

I once built a quick and dirty load testing tool for a public facing service we built. The tool was pretty simple - something like https://github.com/bojand/ghz but with traffic and data patterns closer to what we expected to see in the real world. We used argo-workflows to generate scale.

One thing which we noticed was that there was a considerable difference in performance characteristics based on how we parallelized the load testing tool (multiple threads, multiple processes, multiple kubernetes pods, pods forced to be distributed across nodes).

I think that when you run non-distrubuted load tests you benefit from bunch of cool things which happen with http2 and Linux (multiplexing, resource sharing etc) which might make applications seem much faster than they would be in the real world.

lossolo · on Jan 7, 2023

TLS handling would dominate your performance, kernel bypassing would not help here unless you would also do TLS NIC offloading, you still need to process new TLS sessions from OP example and they would dominate your http processing time (excluding application business logic processing).

sayrer · on Jan 7, 2023

Userspace networking is pretty common. The chair of the IETF even wrote one: https://github.com/NTAP/quant

"Quant uses the warpcore zero-copy userspace UDP/IP stack, which in addition to running on on top of the standard Socket API has support for the netmap fast packet I/O framework, as well as the Particle and RIOT IoT stacks. Quant hence supports traditional POSIX platforms (Linux, MacOS, FreeBSD, etc.) as well as embedded systems."

pixl97 · on Jan 7, 2023

And I would say real life Twitter involves mostly cell phone use where we see companies like Google try to push HTTP/3 to deal with head of line issues on lossy connections. Serving at the millions of hits per day on lossy networks is going to leave you with massive numbers of connections that have been abandoned but you don't know it yet. Or connections that are behaving like they are tar pitted and running at bits per second.

hinkley · on Jan 8, 2023

Vertical scaling doesn't have to be a single machine. You can do a lot with a half dozen machines split for different responsibilities, like we did in the 90's and 00's. Database, web servers, reverse proxy.

ilyt · on Jan 8, 2023

That's so low overhead compared to everything else needed that it is near-irrelevant

saagarjha · on Jan 8, 2023

I don’t believe Twitter ever got around to rolling out HTTP/3 to their clients.

lossolo · on Jan 7, 2023

> Use HTTPS and watch it fall down to only 400 req / sec under load test [ without connection: keep-alive ].

I'm running about 2000 requests/s in one of my real-world production systems. All of the requests are without keep-alive and use TLS. They use about one core for TLS and HTTP processing.

habibur · on Jan 8, 2023

Fascinating. Any special optimization you are using, or is it from off the shelf software and with standard configuration?

kijin · on Jan 8, 2023

Sounds totally off-the-shelf.

I have a basic LAMP server running on a 4-core VM on a laptop. I just threw ApacheBench at it (not the fastest benchmarking tool, either -- it eats up 1 core all by itself), and it handles 1200 req/s TLS with no keepalive, and 3400 req/s with keepalive. This stuff scales linearly with core count, so I wouldn't be surprised to see much higher numbers in real servers.

tpetry · on Jan 8, 2023

Are these all new TLS connections? Because most benchmarks use TLS resumption which means the TLS handshake was only done once!

lossolo · on Jan 8, 2023

In my case, all of TLS connections are new and from real clients (not a benchmark/test)

kijin · on Jan 8, 2023

AFAIK apachebench is so old it doesn't support TLS resumption, but I might be wrong.

brazzledazzle · on Jan 8, 2023

Is AB running on the same machine during the tests?

kijin · on Jan 8, 2023

Same machine, pegging a CPU core the whole time. So the server side only has 3 cores left to use.

lossolo · on Jan 8, 2023

For offloading SSL I use haproxy with some custom settings and a few non standard kernel settings.

brazzledazzle · on Jan 8, 2023

Is this static or dynamic content? Are the simulated load test clients requesting the exact same pages/resources?

lossolo · on Jan 8, 2023

All dynamic content, all hitting data storage. There are no simulated clients, this is all real traffic from real clients, a lot of requests do writes, some do only reads.

JanisErdmanis · on Jan 8, 2023

I guess that the biggest chunk of a slowdown from TLS comes due to group operations alone. So wouldn't it be practical to configure TLS for session resumption and limit the number of handshakes per second it could do?

ilyt · on Jan 8, 2023

using what ? That numbers are on low side even for my old desktop

summerlight · on Jan 8, 2023

I think many people in this thread are making the mistake of ignoring evolutionary factors in system engineering. If a system doesn't need to adopt or change, lots of things can be much more efficient, easier and simpler, likely the order of 10x~100x. But you gotta appreciate that we're all paid because we need to swap wheels on running trains (or even engines in flying airplanes). A large fraction of demand for redundancy, introspection, abstraction and generalization comes from this.

Why do we want to apply ML at the cost of a significant fleet cost increase? Because it can make the overall system consistently perform against external changes via generalization, thus the system can evolve more cheaply. Why do we want to implement a complex logging layer although it doesn't bring direct gains on system performance? Because you need to inspect the system to understand its behavior and find out where it needs to change. The list can go on and I can give you hundreds of reasons why we need all these apparently unnecessary complexities and overheads can be important for systems' longevity.

I don't deny the existence of accidental complexities (probably Twitter can become 2~3x simpler and cheaper given sufficient eng resource and time), but in many cases you probably won't be able to confidently say if some overheads are accidental or essential since system engineering is essentially a highly predictive/speculative activity. To make this happen, you gotta have a precise understanding of how the system "currently works" to make a good bet rather than re-imagination of the system with your own wish list of how the system "should work". There's a certain value on the latter option, but it's usually more constructive to build an alternative rather than complaining about the existing system. This post is great since the author actually tried to build something to prove its possibility, this knowledge could turn out to be valuable for other Twitter alternatives later on.

ilyt · on Jan 8, 2023

> A large fraction of demand for redundancy, introspection, abstraction and generalization comes from this.

Sure, you need to invest into it but those are things you can reuse for every app and feature you build.

And those are not the reason why those systems are so complex, those are just ways to keep complex systems running and manageable. In most they also do not stand in the way of making system better but help in it.

They need to exist because the architecture of system grew organically from smaller system over and over again and big restructurization was deemed not worth it. It's "just have a bunch more hardware and engineers" vs "we're not delivering features and we might not get rewrite right".

And every time you throw money at the problem the problem becomes a bigger problem and potential benefits from "getting it right" are also getting bigger. But nobody wants to be herald that tells management "we 're going to spend 6-12 months" for somethinkg that have few years of pay-off

jasonhansel · on Jan 7, 2023

If you really wanted to run Twitter on one machine at any cost, wouldn't an IBM mainframe be much more practical?

You can even run Linux on them now. The specs he cites would actually be fairly small for a mainframe, which can reach up to 40TB of memory.

I'm not saying this is a good idea, but it seems better than what the OP proposes.

trishume · on Jan 7, 2023

My friend mentioned this just before I published and I think that probably is the fastest largest thing you can get which would in some sense count as one machine. I haven't looked into it, but I wouldn't be surprised if they could get around the trickiest constraint, which is how many hard drives you can plug in to a non-mainframe machine for historical image storage. Definitely more expensive than just networking a few standard machines though.

I also bet that mainframes have software solutions to a lot of the multi-tenancy and fault tolerance challenges with running systems on one machine that I mention.

jiggawatts · on Jan 7, 2023

> which is how many hard drives you can plug in to a non-mainframe machine for historical image storage.

You would be surprised. First off, SSDs are denser than hard drives now if you're willing to spend $$$.

Second, "plug in" doesn't necessarily mean "in the chassis". You can expand storage with external disk arrays in all sorts of ways. Everything from external PCI-e cages to SAS disk arrays, fibre channel, NVMe-over-Ethernet, etc...

It's fairly easy to get several petabytes of fast storage directly managed by one box. The only limit is the total usable PCIe bandwidth of the CPUs, which for a current-gen EPYC 9004 series processors in a dual-socket configuration is something crazy like 512 GB/s. This vastly exceeds typical NIC speeds. You'd have to balance available bandwidth between multiple 400 Gbps NICs and disks to be able to saturate the system.

People really overestimate the data volume put out by a service like Twitter while simultaneously underestimating the bandwidth capability of a single server.

ilyt · on Jan 8, 2023

> People really overestimate the data volume put out by a service like Twitter while simultaneously underestimating the bandwidth capability of a single server.

It's outright comical. Above we have people thinking somehow amount of TLS connections single server can handle is a problem, in service where there would be hundreds of thousands lines of code to generate the content served over it, all while using numbers from what seems like 10+ years old server hardware

trishume · on Jan 7, 2023

That's really cool! Each year of historical images I estimate at 2.8PB, so it would need to scale quite far to handle multiple years. How would you actually connect all those external drive chassis, is there some kind of chainable SAS or PCIe that can scale arbitrarily far? I consider NVMe-over-fabrics to be cheating and just using multiple machines and calling it one machine, but "one machine" is kinda an arbitrary stunt metric.

ksec · on Jan 8, 2023

It depends on how you think of "one machines". :) You can fit 1PB in 1U without something like NVMe-over-fabrics. So in a 4U unit gives you plenty of room.

We have Zen4c 128 Core with DDR5 now. We might get a 256 Core Zen6c with PCI-E 6.0 and DDR6 by 2026.

I really like these exercise of trying to shrink the amount of server needed, especially those on Web usage. And the mention of Mainframe. Which dont get enough credit for. I did something similar with Netflix 800Gbps's post. [1] Where they could serve every single user with less than 50 Racks by the end of this decade.

[1] https://news.ycombinator.com/item?id=33451430