
I have a feeling some of the numbers are extremely inflated from their actual values.

0.54 petabytes of “tweet” data a day, with no multimedia, seems extremely high. It also assumes no compression or lookup tables (LUTs); with either, the estimate would be off by multiple orders of magnitude.

The multimedia estimate leaves me with questions as well: how much of it is original content? What’s the rate of deduplication (many people post the same non-original content)?

I’m sure they work with orders of magnitude more data than most of us do, but I highly doubt that Twitter alone accounts for 5-10% of AWS’s total revenue.




It’s a nonsensical, clickbait article that uses trivial techniques to create ridiculous conclusions. Using AWS street prices to estimate TWTR’s on-prem costs, leaving aside their initial assumptions, is not only incompetent, it suggests they don’t have the most basic understanding of their domain.


Seems like the author just completely made up the numbers here, as well as all underlying technical assumptions. In reality, based on previous public information, Twitter runs most of their infra from their own datacenters, and the number of tweets per day hit 500 million back in 2013.

Not to mention, the author also completely ignores read traffic (which is the majority of traffic for a social network), replication/redundancy/HA, and so many other things. It's just a garbage article. It appears the company hosting this blog pays randos to submit posts on any topic? https://www.cohesive.so/write-for-cohesive


Yeah there's a screwup here. The earlier calculation is 0.54 PB of data per five years, not per day, including a tripling of the cost to achieve 3-way replication.

The same mistake is made for video. But one paragraph later the author uses the per five years value as if it's per day.
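For a rough sanity check of the two readings, here is a minimal sketch. It assumes decimal petabytes (10^15 bytes) and borrows the 500 million tweets/day figure quoted elsewhere in this thread; the function and variable names are made up for illustration, and none of this is the article's own calculation.

```python
# Back-of-envelope check: what does 0.54 PB imply per tweet?
# Assumptions (not from the article): 1 PB = 10**15 bytes, 365-day years,
# and 500 million tweets/day (the 2013 figure cited in this thread).
PB = 10**15
TOTAL_BYTES = 0.54 * PB          # figure quoted from the article
REPLICATION = 3                  # 3-way replication is already baked into 0.54 PB
TWEETS_PER_DAY = 500_000_000


def bytes_per_tweet(total_bytes, days, replication, tweets_per_day):
    """Unreplicated bytes stored per tweet if total_bytes covers `days` days."""
    raw_bytes_per_day = total_bytes / replication / days
    return raw_bytes_per_day / tweets_per_day


# Reading 1: 0.54 PB is the five-year total.
five_year_reading = bytes_per_tweet(TOTAL_BYTES, 5 * 365, REPLICATION, TWEETS_PER_DAY)

# Reading 2: 0.54 PB is written every single day.
per_day_reading = bytes_per_tweet(TOTAL_BYTES, 1, REPLICATION, TWEETS_PER_DAY)

print(f"five-year reading: {five_year_reading:,.0f} bytes per tweet")
print(f"per-day reading:   {per_day_reading:,.0f} bytes per tweet")
```

Under these assumptions the five-year reading works out to roughly 200 bytes per tweet, which is plausible for short text, while the per-day reading implies about 360 KB per text-only tweet.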


The estimation for a video's average size seems nuts too

> 1% of tweets contain videos of about 100MB each

I can't imagine the average video on twitter is taking up 100MB of storage.
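To put that figure in perspective, here is a quick sketch of what it would imply per day. It assumes decimal units (1 MB = 10^6 bytes) and the 500 million tweets/day number mentioned elsewhere in this thread; the variable names are illustrative only.

```python
# What daily video volume does "1% of tweets contain ~100 MB of video" imply?
# Assumptions (not from the article): decimal units and 500 million tweets/day.
MB = 10**6
PB = 10**15
TWEETS_PER_DAY = 500_000_000
VIDEO_TWEET_PERCENT = 1          # quoted from the article
VIDEO_SIZE_BYTES = 100 * MB      # quoted from the article

video_tweets_per_day = TWEETS_PER_DAY * VIDEO_TWEET_PERCENT // 100
video_bytes_per_day = video_tweets_per_day * VIDEO_SIZE_BYTES

print(f"video tweets per day: {video_tweets_per_day:,}")
print(f"raw video per day:    {video_bytes_per_day / PB:.2f} PB (before any replication)")
```

With these inputs that is about 5 million video tweets and roughly half a petabyte of new video every day before replication, so the 100 MB average drives almost the entire estimate.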


Do you have to multiply by 3 at all? I thought AWS/Azure already provide that redundant storage for you.


Even if you assume the rest of the numbers are right, nobody in their right mind at that scale pays list prices.



