Processing billions of events in real time (blog.twitter.com)
200 points by 1cvmask 73 days ago | 104 comments



> For the interaction and engagement pipeline, we collect and process data from various real-time streams and server and client logs, to extract Tweet and user interaction data with various levels of aggregations, time granularities, and other metrics dimensions. That aggregated interaction data is particularly important and is the source of truth for Twitter’s ads revenue services and data product services to retrieve information on impression and engagement metrics.

All this engineering talent and effort to track dwell times and clicks for ads. At some point, I think we need to take a critical look at the ad economy and whether continued investments actually help anybody, or rather just give certain parties competitive advantages in selling ads. Smart regulations can perhaps make ads "dumber" and drive a lot of the engineering away, as well as being a boon to privacy, while still leaving sites with a way to generate revenue from content.


"Ads" seems like a generic and misleading term for it. It's less about showing you a flashy clip for an advertised product in the classic sense, and more about building complex personality and interest profiles on users based around their engagement patterns across sites and services. That's the real resource organizations pay money for, especially once it gets aggregated into constantly growing targeted data collections on individual users and an increasingly monopolized share of their digital fingerprints and track records.


>about building complex personality and interest profiles on users based around their engagement patterns across sites and services

Twitter very clearly thinks I'm a doctor. I am not a doctor. They show me lots of ads that are obviously oriented towards medical professionals–many of them include the language "your patients" as well as medical jargon. My wife is not a doctor, my parents aren't doctors, the only doctors I know are casual friends and family I see at most once a year. I do not have any medical conditions, and nobody I know has any of the conditions treated by the products being advertised.

I've been using Twitter at least a few times a week for several years, I follow and engage with lots of other users, none of whom are doctors as far as I know. If the ads I'm seeing are derived from this complex personality and interest profiling, it has totally misfired. This has been the case for at least two or three years.

Several years ago a conversation about a similar topic prompted me to look at the ad targeting data Facebook had on me. At the time I'd had a Facebook account for 12 years with lots of posts, group memberships and ~500 friends. Their cutting edge data collection and complex ad targeting algorithms had identified my "Hobbies and activities" as: "Mosquito", "Hobby", "Leaf" and "Species": https://imgur.com/nWCWn63. Whatever that means.

I've managed millions of dollars in ad spend on these platforms over time, and still regard most targeted ad platforms as dancing on the edge between legitimacy and being blatantly fraudulent. They work well if you're a sophisticated buyer, but if you're not they're pretty much a hole in the internet into which you can pour money.


You are a technically competent person, well versed in the platforms in question. Compared to you, the rest of the citizenry are mere peons, who have little hope of understanding, or even realising, the degree to which they are tracked.

As well, most people in tech are outliers of some sort, with unusual search patterns on these platforms. Simply put, we are not the norm. We aren't 'norms'.

While I'm sure you're correct that the accuracy of such tracking platforms is not 100%, I suspect that for those who do not understand technology, tracking is much better, much more accurate.

My position, for clarity: "being able to use a phone" is not "understanding technology". Some seem to think they are tech-aware because they can navigate a phone's OS, or use a computer for work by using Word.

These people are likely tracked more accurately precisely because of their lack of understanding, which, combined with their heavy usage of computing devices, makes them the most susceptible to tracking.


YouTube thinks I'm a Latino who loves TikTok.


It's* an ecology-shattering waste of resources to satisfy the profit demands of a small number of globally distributed artificial legal personalities. It's a fabulously inefficient yet monetarily useful model of pernicious theft of human attention. It's bad Physics.

* It being the Advertising "Industry"


Sad to say, ad-tech is one of the more accessible spaces if you enjoy creating systems that have to dynamically scale, provide high throughput and/or near real-time streaming data solutions, while also ensuring data correctness (for a given value of correct).

Maybe fintech could offer similar challenges in some areas? I'm thinking HFT etc.

I greatly enjoyed the challenge of building data pipelines in my role at $LAST_COMPANY, an SSP - pipelines that delivered valid transactional and related-entity data to a near-realtime reporting system, and that scaled up and down as needed to maintain data timeliness while minimising costs (ad traffic, like internet traffic as a whole, has high seasonality throughout the day).

I didn't enjoy working for a company in the ad-tech market though - far too many deals that, while legal, felt sleazy, and (from my limited experience) often seemed to begin with handshakes made by sales reps on booze- and cocaine-fueled nights in red light districts.

If I could solve similar business problems without being in a market that makes me want to have a long shower, I'd be keen.


There are probably several software engineers optimizing ads who would otherwise have made breakthroughs in some other technology - nuclear reactors, I don't know. It is a bizarre situation.


I'm pretty surprised that Twitter is moving their homegrown stack to GCP. Twitter has a storied big data platform (Storm, Summingbird, Lambda Platform, Manhattan), and they have completely moved to the cloud. I wonder how the finances worked out; I'm sure Twitter paid a lot for their bespoke platform and Google must've been competitive. It was probably also easier as I imagine a lot of the engineers who considered those projects their children may have already moved on.


For some reason I'm kind of sad seeing Twitter's (very cool) homegrown technologies in the "Old" diagram with the "New" architecture basically Google Cloud. I'm sure it makes sense internally, but it feels like the loss of an innovation center in the streaming space.


Or it's the result of some good wine, dinners and gifts. The spreadsheets justifying it can always be engineered afterwards.


The rationale is likely reminiscent of that motivating the outsourcing craze from 15 years ago.

When all you care about is the next quarterly report, it's real easy to sell the idea of getting rid of all that expensive technical competence from the company payroll, letting someone else know that stuff instead, then hiring a team of cheap Oompa Loompas straight outta college to spend all day writing -X-M-L- YAML instead.


I always wonder how those lucrative corporate sales happen.


Maybe they got a sweetheart deal for being a marquee customer. Although not as sweet as the deal that the CME got.

> Google also makes $1B equity investment in CME Group

https://www.cmegroup.com/media-room/press-releases/2021/11/0...


Well, Google paid for my employer to migrate to GCP from AWS.


Sad to see Heron go


It looks like Kafka is far and away the standard way to handle persistent logs/events at scale. AFAIK a company here in Japan called LINE runs all their messaging through a large Kafka cluster.

Wonder if anyone is running large NATS JetStream[0]/Liftbridge[1] or Pulsar[2] (Yahoo runs those) clusters. I guess Pulsar might be #2 in terms of adoption at large scale?

[0]: https://docs.nats.io/jetstream/jetstream

[1]: https://liftbridge.io/

[2]: https://pulsar.apache.org/


Pulsar is a much better fit when your architecture absolutely requires many queues, e.g. one queue per customer across hundreds of thousands of customers.

That architecture certainly exists, but it is a lot more burdensome and less common than partitioning by customer ID across a Kafka topic.
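
For anyone who hasn't seen it, "partitioning by customer ID" just means keying the records; a minimal sketch with the standard Kafka Java client (topic, key, and broker address are made up):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class PerCustomerEvents {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // Records with the same key hash to the same partition, so each
                // customer's events stay ordered without one queue per customer.
                producer.send(new ProducerRecord<>("customer-events", "customer-42", "{...}"));
            }
        }
    }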


Kafka is a wonderful tool. I built a few systems on top of it and all of them delivered the scale that was promised and more. With surprisingly little hardware.

I'm very hostile to a lot of hipster tech but Kafka is one of the few genuinely good pieces of tech from the whole "Big Data" craze of the past decade.


It seems weird to hear "a company here in Japan called LINE" -- LINE is big enough in Japan that it sounds kind of equivalent to "a company here in America called Discord".


I think that works one way but not the other -- America has the blessing of being the source of lots of new apps and tech companies with global success/ambitions (e.g. Discord has some penetration among gamers anywhere), but Japan less so.

AFAIK LINE has not had such success. I wouldn't be surprised if most people in the US did not know of LINE; unless they're avid readers of TechCrunch or something, it just doesn't come up.

Would be interesting to know what % of people on HN know about LINE though


I’m pretty sure LINE has more than twice the users that Twitter has. Not knowing it is like not knowing about WeChat: it’s because you’re not familiar with things outside of the US, rather than not being up-to-date with the space in general.


Never heard of LINE, but it looks like they're at around 85M monthly active users, Twitter somewhere past 330M.

https://www.statista.com/statistics/560545/number-of-monthly...


Those statistics only count users in Japan. LINE is also used in other Asian countries.


How well does Kafka handle high-density data (e.g. A/V and images)? I'm scouting out systems for our computer vision pipeline, and Kafka would simplify the aggregation/collation step for marshalling to GPUs; it would be simplest if I could just send raw frames vs some alternate transport.


I think the important thing there would be the frame size, no? Clearly Kafka can handle the throughput side of things, but it doesn't seem to be meant for large messages out of the box[0].

I wouldn't be surprised if it was perfectly fine though -- with compression (and all the video/image specific tricks) the file sizes should get pretty small...
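
If it came to it, the relevant knobs are all config; a hedged sketch of the client side (16 MB is an arbitrary example, and the broker's message.max.bytes has to be raised to match):

    import java.util.Properties;

    class KafkaSizeTuning {
        static Properties producerProps() {
            Properties p = new Properties();
            // default max.request.size is ~1 MB; raise it for big frames
            p.put("max.request.size", String.valueOf(16 * 1024 * 1024));
            p.put("compression.type", "zstd"); // helps when frames compress well
            return p;
        }

        static Properties consumerProps() {
            Properties p = new Properties();
            // fetch buffers must be able to hold at least one whole message
            p.put("max.partition.fetch.bytes", String.valueOf(16 * 1024 * 1024));
            return p;
        }
        // broker side (server.properties): message.max.bytes=16777216
        // per-topic override: max.message.bytes
    }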

[0]: https://stackoverflow.com/questions/21020347/how-can-i-send-...


Thanks. That's kinda what I figured, but wanted to bounce it off someone as a sanity check.

The link is a great reference by the way.

> Your API should use cloud storage (for example, AWS S3) and simply push a reference to S3 to Kafka or any other message broker.

This is more or less what I figured. We already archive to S3 anyways so switching to using it as transport would be straightforward.
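
Roughly what I have in mind, for the record; a sketch of that claim-check pattern (names hypothetical, AWS SDK v2 plus the Kafka Java client):

    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import software.amazon.awssdk.core.sync.RequestBody;
    import software.amazon.awssdk.services.s3.S3Client;
    import software.amazon.awssdk.services.s3.model.PutObjectRequest;

    class FramePublisher {
        // Payload goes to S3; Kafka carries only a pointer, so the broker
        // never has to move the raw frame bytes.
        static void publishFrame(S3Client s3, KafkaProducer<String, String> producer,
                                 String bucket, String frameId, byte[] frame) {
            s3.putObject(
                PutObjectRequest.builder().bucket(bucket).key(frameId).build(),
                RequestBody.fromBytes(frame));
            producer.send(new ProducerRecord<>(
                "frames", frameId, "s3://" + bucket + "/" + frameId));
        }
    }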


> Thanks. That's kinda what I figured, but wanted to bounce it off someone as a sanity check.

I'm by no means a Kafka expert or a video expert of course, but glad I could serve as a rubber duck. Maybe there are some lessons to be learned from Encore?[0]

> The link is a great reference by the way.

Yeah, the amount of info in there is pretty good -- feels like Kafka could definitely be tuned to do the job, but maybe it's better to just start with something better suited.

> This is more or less what I figured. We already archive to S3 anyways so switching to using it as transport would be straightforward.

Yeah, I figured this is what you were trying to avoid -- the round trips to S3 to get the data to the processing would be wasteful if the data is, in this case, small enough to flow along the processing route. Guess it really depends on your data. I could have sworn I saw some analysis of how Kafka performs versus the size of the messages it must deliver...

Looks like DZone has some good content[1], LinkedIn of course[2]... Ah, I finally found the one I was looking for, and it's DZone[3]. All those links make mention of message size.

[0]: https://svt.github.io/encore-doc/

[1]: https://dzone.com/articles/processing-large-messages-with-ap...

[2]: https://engineering.linkedin.com/kafka/benchmarking-apache-k...

[3]: https://dzone.com/articles/benchmarking-nats-streaming-and-a...


It can, but this depends on the volume and size of the topic messages. The broker and consumer will need a LOT more memory. I did this at a previous job and the GC on the broker started getting very shitty and performance was crap. Consumers were constantly getting OOM and needed bigger containers, etc. It was a bad idea and we just moved the stuff to S3.


Are there any alternatives that aren't using the Java runtime?


NATS is built in Go, which is one of the reasons I like it…

https://github.com/nats-io/nats-server


I’d love to hear more about why they deployed Google-managed message brokers and event processing protocols downstream from Kafka. If you’re already using Kafka as a broker, why introduce PubSub? For event processing, why Dataflow and not Kafka streams or ksqldb?

If it’s that your company signed a sponsorship with Google, I understand. But then why not replace Kafka with Google-managed services altogether? Are there things that Kafka does that PubSub doesn’t do that you really need (such as unlimited message retention)?


I'd think it is simply about not changing all the moving parts at the same time. It is good to be conservative while making such a large move to the cloud.


This is an interesting blog post, but one thing I really wish more blog posts would do is compare the actual user experience with that of the simpler architecture.

For example, comparing this new architecture, from the end-user's perspective, to 2010 Twitter would be interesting. I'm sure much of the technology is needed for monetization, but it would be a fascinating look anyway.


It's true, the product is much worse than it was 10 years ago (I don't use it anymore since it's such a chore), but I don't think a company that has been obsessed with growth for growth's sake is really capable of that level of introspection. That'd be a difficult pill to swallow. Billions of dollars and thousands of human-hours spent to make something...worse. And since all that we really have at the end of the day is a desire/need to work on something that we feel is valuable, what's the point in asking the question?


This infrastructure work seems very far removed from anything a user would interact with. What kind of user experience would you like it to talk about?


Where I work, they use more and more Google services every day, to my chagrin (since I've spent years learning the quirks of AWS).

Amazon's solutions can be fit into various architectures, but they are more generalist tools than BigQuery, Dataflow, and Bigtable.

Google's solutions are also cheaper and/or easier to work with, for very large data processing.


Cloud service pricing matters a lot. I actively move systems off of AWS to save costs. Take CloudFront as an example: Bunny is half a cent per GB and has no per-request cost. I've seen CloudFront cost over 20x more. A $10,000 bill instead of $200,000 per month is well worth the engineering work.

Btw, Bunny is also $10 flat rate for unlimited image processing... and they only bill the post-processed bandwidth cost. The price of my AWS equivalent for that pipeline was quite the comparison. :)


I misread your comment as "Bunny" being a GCP service competing with AWS CloudFront.

Is Bunny Optimizer (bunny.net) [1] what you are referring to?

[1]: https://bunny.net/optimizer/#pricing-details


Ah yeah, sorry, I should have elaborated. Bunny is a CDN provider. They charge half a cent per gigabyte for their Limited Points of Presence tier and 1 cent (or higher depending on the region) for their Many Points of Presence tier: https://bunny.net/pricing

Plus, as you found, $10 for unlimited image processing :)

I look forward to the day they have a key value store along with functions at edge. I expect their pricing model to be simply irresistible.


What does this mean?

> but they are more generalist tools than BigQuery, Dataflow, and Bigtable

Doesn't AWS provide similar tools?

    BigQuery == Redshift
    DataFlow == Kinesis (Analytics/Flink/EMR-Spark/EMR-BEAM)
    BigTable == Redshift Spectrum


Similar in that they are building blocks that you have to tailor to be efficient for your use case.

"Kinesis (Analytics/Flink/EMR-Spark/EMR-BEAM)" - this is kinda the point. In AMZ you're building out the dataflow process that fits your use-case and optimizes where you would like, rather than submitting to an existing (Google) design that is optimized+priced for very large dataset.


> There are various event sources we consume data from, and they are produced in different platforms and storage systems, such as Hadoop, Vertica, Manhattan distributed databases, Kafka, Twitter Eventbus, GCS, BigQuery, and PubSub.

I'm surprised that they mentioned Twitter EventBus; I thought they were migrating away from that to Apache Kafka entirely [1]. Mind you, they've got a lot of tech going on, so it's not surprising if it's still present in legacy systems.

Fun fact: if you dig into the architecture of Twitter EventBus, it will seem awfully similar to the architecture of Apache Pulsar (storage separated from brokers, storage based on BookKeeper), and that's no coincidence, as Sijie Guo, CEO of StreamNative, developed EventBus at Twitter (he's also a main dev of BookKeeper). StreamNative is to Pulsar as Confluent is to Kafka.

And the reasons that Twitter moved from EventBus to Kafka also apply to Pulsar, which is worth keeping in mind the next time an HN commenter proclaims "Le roi (Kafka) est mort, vive le roi! (Pulsar)".

[1]: https://blog.twitter.com/engineering/en_us/topics/insights/2...


Their whole platform sounds like an unholy mess. A big pile of dung that just accreted over time, where they never cleaned out their technical debt. I just left a job like that a few weeks ago. It was such a nightmare I took a slight paycut.


Aren't all code bases that are continually responding to changing business needs and/or scaling demands fated to become big piles of dung? (Or, as we all politely call it, "legacy".)


They use the term EventBus, but really it's a thin EventBus client library that uses Kafka under the hood. I think all EventBus clients at Twitter have been migrated to use Kafka.


Ah, cheers, that makes sense, I didn't see it mentioned further in the architecture.


There's an HN post from a few years ago with a bunch of interesting reading from the "old days" at Twitter.

https://news.ycombinator.com/item?id=17147404

I think somewhere in there is a link to a story about how only one popular user (was it Ashton Kutcher?) could tweet at a time. I seem to recall it ran on a single MySQL server for quite a while too.


I think it was Justin Bieber. A former engineer who founded Signal went on the Joe Rogan podcast and talked about how when Bieber tweeted, the building would shake or the lights would flicker, or something to that effect.


It's funny, while I was reading the "before" architecture I found myself thinking "my god, why don't they just move to a cloud already? this would be much simpler if they did". I "turned the page" and there was GCP :D.

Also, I wonder why they went with GCP instead of AWS. Does Twitter have a deal with Google that I'm not aware of?


It appears to me they left a lot of the real-time processing in house, and moved the pub/sub to web clients and the processing of that data to GCP, where I'm guessing the rest of their web delivery stack is. I think they're wise to keep the real-time processing in house (e.g. Kafka Streams, etc.).


Agreed! The real-time processing should be bespoke and is where I think Twitter previously shined. The data processing wasn't a point of differentiation for them, so it makes sense to offload it to a cloud provider and let someone else deal with the operations associated with it.


Oh, also: what's next, the iota architecture?


Title is maybe a little misleading. After reading the post, the chart shows the new architecture processing approx. 4 million events/sec. I think they derived the 400 billion events in real time by taking 4MM events/s x 86,400 s/day. The actual number comes out to about 345 billion, but I'm guessing if you're that high, you might as well round it up to 400 billion. I personally would consider 4 million events/s the benchmark for events processed in real time.
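
Written out:

    4,000,000 events/s × 86,400 s/day = 345,600,000,000 ≈ 345.6 billion events/day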


Genuinely curious ... of the 463k events-per-second that are implied here, how many of them are actual events (like a tweet or a like or retweet, etc.) and how many of them are advertising/tracking events?


When I was at Twitter (6yrs ago now) they averaged around 5k tweets/s. There were MANY more notifications and other system type events.


Seems surprisingly low, and I imagine it's definitely at least timezone-dependent. Any indication of what the largest kinds of spike traffic looked like?


I was involved in World Cup features and it definitely would spike around goals. Maybe 2-3x as I recall but it was always surprising to me how “low” the tweet volume was relative to the read volume.


Isn't it 4.63 million eps?


Those could all legitimately be tweets. The amount of spam tweet bots makes that number almost seem small.


Undeniably impressive, but it's funny that it all lands in a UI where you can no longer tie together a thread (of said events) into something readable. Doesn't Twitter get why all those 3rd-party sites that string things back together are popular?


Considering the recent acquisition of Threader, I’d say they are at least aware of this area: https://threader.app/


I won't use the official app or twitter.com for exactly this reason - I want to read things chronologically, not in a disjointed fashion with occasional adverts thrown in too.


I wonder what the monthly cost is of running their pipeline on the new Google-backed system vs the old system they wrote themselves (which could have been running on public cloud before). I bet they got a massive discount, so it would be good to know the unit prices (GB/s etc.) before that discount, so one can relate it to another company's problems and solutions.


Impressive, but I don't get why folks don't normalize it to events per second instead.

It is 463k eps for anyone wondering.


Isn't it 10x that?

Sanity check: say 10^5 seconds/day. Then it would be

400 * 10^9 / 10^5 = 4 * 10^6

https://www.google.com/search?q=400+*+10%5E9+%2F+%2824+*+60+...


You are right. 4.63 million eps. My bad.


And the nameplate capacity - up to 1 GB per second - is easily within the nominal abilities of a single machine.


I love reading these threads and thinking this sort of stuff to myself... How much of Twitter's infrastructure is self-perpetuated or otherwise exists just to serve other infrastructure? You obviously can't fit everything their business does on a single machine, but you could probably get dangerously close with enough determination. Especially if you just focused on the core 280-character tweet abstraction.

An options order consists of more data than an average tweet, and we can certainly process them at a higher rate than Twitter would need in practice. Many financial exchanges operate on a single thread: 1-100 million transactions per second, with jitter measured in tens of microseconds. I don't see why other software products & services can't leverage similar concepts.


There’s certainly lots of room for improvement in this kind of system, but I think it’s a bit reductionist to compare this kind of work to HFT, which hits one hot path that is ultra optimized and doesn’t have to deal with complexities like “this data needs to be replicated and kept consistent”. Your question is a bit like “planes go fast, why can’t my car use some of the same tricks to go fast as well?”: yes, your car can probably go faster, and it could probably steal some aerodynamics and material science from your plane, but reducing the problem to something simple that they should “just” do is absurd.


> doesn’t have to deal with complexities like “this data needs to be replicated and kept consistent”

I'm sure there are other excuses available, but this is not a good one. Non-repudiation is a critical system requirement of any financial exchange. The figures stated include persistence to durable media, and the whole point of running everything on a single thread is to ensure serialized processing of all activities (i.e. consistency).

I would argue the need for replicating tweets is less urgent than ensuring 7-8 figure financial transactions don't go unaccounted for. We could probably make some compromises for the twitter use case to make this even faster.


Yep, probably a single good mainframe would handle all of it with a smile. And provide the uptime that most "cloud" vendors can only dream of.

But that would be nowhere near as cool as building "cloud infra", right?


1 GB/s is on the order of how fast a computer can parse requests. Throw in processing on top of that and you’ll slow down considerably.


Is it just me or are most of the links broken in the blog post?


Oops I meant to comment on this HN post: "Handling five billion sessions a day in real time" https://news.ycombinator.com/item?id=29231932


what DNS server are you using, old sport?


Works for me


Do you remember the Fail Whale? Pepperidge Farm remembers.


Stopped reading at Kafka. Java technology that combines bad networking with bad messaging and bad queueing.

I would have expected one of the big Internet websites to use better technology.


Better tech? Which is?


Well, to begin with, Java is not a serious programming language to do systems programming in, since it has poor control of networking, threads and memory.

Then it's a message bus built on top of TCP. Anyone with a basic understanding of networking can see that if you have a producer that wants to send the same data to multiple consumers efficiently, you should use multicast.

Kafka also lacks proper mechanisms to throttle the speed of producers when consumers are too slow, which is the first thing you should ever be concerned about whenever you introduce a queue.

If you want something somewhat decent within Javaland you could try Aeron.


Kafka can buffer indefinitely (so long as you give it the disk space), and you can throttle to your heart's content within the consumer's loop, or just call consumer.pause().
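
A rough sketch of what that looks like in practice (thresholds made up, error handling and the worker side omitted):

    import java.time.Duration;
    import java.util.Queue;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;

    class BackpressureLoop {
        static void run(KafkaConsumer<String, String> consumer, Queue<String> inflight) {
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
                for (ConsumerRecord<String, String> r : records) inflight.add(r.value());
                if (inflight.size() > 10_000) {
                    // keep heartbeating, but stop fetching until workers drain the backlog
                    consumer.pause(consumer.assignment());
                } else if (inflight.size() < 1_000) {
                    consumer.resume(consumer.paused());
                }
            }
        }
    }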

I don't think you know the basics of Kafka's API or how it works internally.


This person doesn't appear to know the basics of a single thing they're maligning, but certainly isn't letting that slow them down.


The fact that you think this is a proper way to deal with a queue is quite concerning.

You completely missed the point. It's the producer that needs to be paused, and the reason for that is that memory is not infinite. You cannot just keep buffering until the consumer catches up, because it may never catch up.


While it is true that Kafka does not guarantee long-term steady-state behavior, it is modeled as an impedance adapter of effectively infinite capacity. It presents no impedance to the producer, has storage far larger than what the arrival rate requires, and drains to the consumer as available. Any feedback between the consumer and producer has to happen out of band, which is fine.


And that is one of the major defects I pointed out.

It's not "fine".


I'm a pretty serious systems programmer with 30 years in the industry and I would never even consider using IP multicast for any purpose.


The fact that the poster mentioned IP multicast and Aeron implies they are talking about HFT and stock exchange environments, where high-performance switches with explicit support for low-latency multicast are the norm and not the exception.

There is a Signals and Threads podcast episode that goes a bit into the history of it.

https://www.youtube.com/watch?v=triyiLwqWUI (transcript: https://signalsandthreads.com/multicast-and-the-markets/)


Pretty much any L3 switch supports UDP multicast. What can be rarer is PIM support, which is a router feature.

If anything in HFT environments, people use L2 switches for latency reasons. Those operate at the Ethernet level, so they don't really care about IP at all.

Anyway I don't see why that's specific to electronic trading or even just a low-latency concern. Sending the same traffic to hundreds of people with unicast means using hundreds of times the bandwidth, which is a huge problem.
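
For anyone who's never touched it, the receive side really is just joining a group; a bare-bones Java sketch (group address, port, and interface name are placeholders):

    import java.net.DatagramPacket;
    import java.net.InetAddress;
    import java.net.InetSocketAddress;
    import java.net.MulticastSocket;
    import java.net.NetworkInterface;

    class MulticastReceiver {
        public static void main(String[] args) throws Exception {
            InetAddress group = InetAddress.getByName("239.1.1.1"); // admin-scoped group
            try (MulticastSocket sock = new MulticastSocket(5000)) {
                sock.joinGroup(new InetSocketAddress(group, 5000),
                        NetworkInterface.getByName("eth0"));
                byte[] buf = new byte[1500];
                DatagramPacket pkt = new DatagramPacket(buf, buf.length);
                // one datagram from the producer reaches every joined receiver
                sock.receive(pkt);
                System.out.printf("got %d bytes%n", pkt.getLength());
            }
        }
    }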


Oh I'm well aware, I've loaded more than enough Metamako alpha firmware releases for one career.

But there are a lot of professional software devs with zero real networking experience. Sure, they may understand the textbook definition of TCP, or they may even have seen pictures of fibre with labels like 'this is how far light travels in a nanosecond'. But they would have no idea how to calculate the serialization latency of a 10G link, let alone know the duty cycle required to saturate one.
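
For the record, that calculation is a one-liner; for a full 1500-byte frame on a 10G link:

    1500 bytes × 8 bits/byte = 12,000 bits
    12,000 bits ÷ 10×10^9 bit/s = 1.2 µs per frame
    → ≈ 833k full-size frames/s to saturate the link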

But none of that matters in the cloud (which is where Twitter is stacking their Jenga tower in the original blog post). Even if both Google and AWS have custom silicon (or at least FPGAs) doing hardware offload for their internal SDN encapsulation protocols at the server level, and their custom switches all support it, it doesn't matter: they hide all of that from you, the customer, and rarely even acknowledge its existence.


One does wish the cloud network were a little more exposed to the tenant so we can do fancy stuff. Amazon's EFA is as close as they get. On the other hand I suspect Google of having bog-standard merchant silicon switches and maybe custom silicon NICs in their latest and greatest machines but for the most part off-the-shelf stuff at the host as well.


Still stuck with IPv4?? :)


> is not a serious programming language to do systems programming in, since it has poor control of networking, threads and memory.

Tell that to the Netty folks.


Here we go. You must be some HFT demigod. For the rest of us, Java and TCP are just fine.


You still haven't answered the question. What is a better programming language?


Guessing he'll say Erlang


The established languages for systems programming are C and C++.


Yes, please elaborate - what’s better and why?


Elaborate?


Philosophically, isn't it wrong to process billions of events in real time? I mean, the magnitude of the moral hazard is astounding. The alternative isn't to stop sending messages, but rather to not put them through a bottleneck like Twitter. This would have the dual benefit of a) not giving central control over messages and b) not requiring exotic solutions.

I fear that technologists (including myself) are fascinated by the exotic solutions required by extreme centralization, and are more than happy to solve those rather than question the need for them in the first place.


Huh? People are using Twitter to create these events? Who else would you want to process it? It's not a bottleneck to put things through Twitter if people are literally using Twitter


I assume they're suggesting some sort of peer to peer system.


Yes. P2P systems would be the alternative, and would, as a whole, process billions of messages, but in a distributed way where each node only processes ~100.

I'm further suggesting that there is a relationship between scale and ethics, via moral hazard, that is worth exploring. Too bad I'm getting downvoted instead of engaged with.


You haven't really explored what the moral hazard could be, or why, so your current comment really sounds like "Twitter big, ergo Twitter evil". It's sort of on you to surface your idea instead of expecting strangers to engage with you - you seem perfectly pleasant, but this pattern frequently leads to very negative interactions.



