Asahi Linux is essentially in a holding pattern, with support only up to the M2. Linux will likely never be supported beyond the M2, and even the M2 has a lot of rough edges: when my monitor sleeps under M2 Linux, it can never reawaken without a reboot.
A client is a stack of various components like load balancers and stats recorders, and they're all configurable. Only with the expressiveness of Scala could a data structure be created that composes all of these pieces yet still keeps them modifiable and type-safe.
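For anyone who hasn't looked at Finagle, here's a toy model of that idea, my own simplified sketch rather than Finagle's actual types or signatures: each layer is a typed filter over a service, configuration lives in the layer's constructor, and the compiler checks that the whole stack lines up.

    // A toy model of the idea (not Finagle's actual types): a client is a
    // stack of typed layers, each independently configurable.
    import scala.concurrent.{ExecutionContext, Future}

    trait Service[-Req, +Rep] { def apply(req: Req): Future[Rep] }

    trait Filter[Req, Rep] { self =>
      def apply(req: Req, next: Service[Req, Rep]): Future[Rep]
      // Wrapping a service in a filter yields another service of the same type.
      def andThen(next: Service[Req, Rep]): Service[Req, Rep] =
        new Service[Req, Rep] { def apply(req: Req) = self.apply(req, next) }
    }

    // Two configurable layers: a stats recorder and a naive retry policy.
    final class StatsFilter[Req, Rep](record: Long => Unit)(implicit ec: ExecutionContext)
        extends Filter[Req, Rep] {
      def apply(req: Req, next: Service[Req, Rep]): Future[Rep] = {
        val start = System.nanoTime()
        next(req).andThen { case _ => record(System.nanoTime() - start) }
      }
    }

    final class RetryFilter[Req, Rep](retries: Int)(implicit ec: ExecutionContext)
        extends Filter[Req, Rep] {
      def apply(req: Req, next: Service[Req, Rep]): Future[Rep] =
        next(req).recoverWith { case _ if retries > 0 =>
          new RetryFilter[Req, Rep](retries - 1).apply(req, next)
        }
    }

    object ExampleClient {
      // Stacking is just composition, and the types have to line up.
      def build(backend: Service[String, String])(implicit ec: ExecutionContext): Service[String, String] =
        new StatsFilter[String, String](ns => println(s"call took ${ns}ns"))
          .andThen(new RetryFilter[String, String](retries = 3).andThen(backend))
    }

Each layer stays independently swappable and configurable, but a mismatched request or response type simply won't compile.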
Fanout is soft real-time, but yeah, we don't block Lady Gaga's tweet to her 40 million followers until it's been delivered to all of them. Note that she counts as 1 in the TPS calculation when she tweets. BTW, we're hiring if you'd like to help us make Twitter even better.
Not sure what the original comment was, but essentially trading firms build their own ticker plants to get their "curated" view. Obviously, Twitter followers won't be building their own tweet plants. I'd imagine the main challenges with Twitter's fan-out are data persistence and the scale of the data involved.
We could talk about how trading system architecture does not apply, but there are many ways in which it potentially does. The main difference between web architecture and trading-system architecture is the messaging data model. Trading systems achieve low latency by building messaging systems that transmit events rather than fully fleshed-out data structures. For the uninitiated: a typical web app commits data to a DB and then queries it all back. A low-latency trading system does not build an orderbook data structure, commit it to a DB, and then request it from the DB; it merely listens to a stream of events (e.g. a firehose) consisting of buys, sells and executions and builds its own book, so that the orderbook data structure is always available locally in memory instead of being a DB request away. The beauty of this architecture is that special data structures need not be a complex select/join away, with DBs and cache layers taking on a ton of work. Only the events are propagated; each app decides for itself what is and isn't necessary, and the relevant data is built in real time, with low latency and a high degree of determinism.
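As a concrete, if simplified, illustration of that pattern, here's a sketch of a consumer that builds a price-level book purely from an event stream; the event shapes and field names are made up, not any particular feed format.

    import scala.collection.mutable

    // Illustrative event shapes, not a real feed format.
    sealed trait Side
    case object Bid extends Side
    case object Ask extends Side

    sealed trait MarketEvent
    final case class NewOrder(side: Side, price: Long, qty: Long)  extends MarketEvent // resting order
    final case class Execution(side: Side, price: Long, qty: Long) extends MarketEvent // fill against that side

    final class OrderBook {
      // price level (in ticks) -> total resting quantity, sorted so the best level is first
      private val bids = mutable.TreeMap.empty[Long, Long](Ordering[Long].reverse)
      private val asks = mutable.TreeMap.empty[Long, Long]

      def onEvent(ev: MarketEvent): Unit = ev match {
        case NewOrder(side, px, qty)  => adjust(levels(side), px, qty)
        case Execution(side, px, qty) => adjust(levels(side), px, -qty)
      }

      private def levels(side: Side) = if (side == Bid) bids else asks

      private def adjust(side: mutable.TreeMap[Long, Long], px: Long, delta: Long): Unit = {
        val left = side.getOrElse(px, 0L) + delta
        if (left <= 0) side.remove(px) else side(px) = left
      }

      def bestBid: Option[Long] = bids.headOption.map(_._1)
      def bestAsk: Option[Long] = asks.headOption.map(_._1)
    }

    // Each consumer replays the same firehose and keeps its own book in memory:
    //   events.foreach(book.onEvent)
    // so "what's the best bid?" is a local lookup, not a select/join away.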
Modern systems on higher-end commodity hardware that I have dealt with, with the right tuning, can push around 2M 120-byte msgs/sec with an average latency of 6-7 microseconds per host. I'm told that with extra/better tuning you can hit 3M 120-byte messages/sec sustained per host. Of course this is doing everything in memory without touching disk, but even having a disk as a bottleneck doesn't preclude this kind of performance; it just introduces 5-20ms outliers in the message-processing workflow as pages are flushed to disk.
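For a rough sense of scale (my arithmetic, not the parent's numbers), that throughput is nowhere near saturating the wire:

    2,000,000 msg/s × 120 B = 240 MB/s ≈ 1.9 Gbit/s per host

which fits comfortably on a single 10 GbE link, so the hard part is the software path and the tail latencies, not raw bandwidth.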
The way I've envisioned applying trading-system architecture to Twitter: tweet submitters get aggregated into a sequencer, and a bunch of fan-out machines read the sequenced stream and commit the fanned-out data structures to a memcache and/or data persistence cluster of some type. Each fan-out machine can be its own shard, discarding irrelevant tweets. Other apps that want data can either listen directly to the sequenced firehose or subscribe to/join the firehose feeds and process the data however they need.
This should be able to handle the hundreds of millions of fanouts per second that are necessary, in a distributed manner, with great efficiency in the number of hosts involved. If you need to expand capacity, it's just a matter of bringing up extra fan-out and persister hosts.
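A rough sketch of what one of those fan-out shards might look like; the sequencer record, follower store and timeline cache interfaces here are hypothetical stand-ins, not anything Twitter actually runs.

    import scala.concurrent.{ExecutionContext, Future}

    // Hypothetical stand-ins for the pieces described above.
    final case class SequencedTweet(seqNo: Long, authorId: Long, tweetId: Long)
    trait FollowerStore { def followersOf(userId: Long): Seq[Long] }
    trait TimelineCache { def append(userId: Long, tweetId: Long): Unit }

    final class FanoutShard(
        shardId: Int,
        numShards: Int,
        followers: FollowerStore,
        timelines: TimelineCache)(implicit ec: ExecutionContext) {

      // Every shard reads the same sequenced stream and keeps only "its" authors,
      // so expanding capacity is just bringing up more shards with a larger numShards.
      private def mine(t: SequencedTweet): Boolean = (t.authorId % numShards) == shardId

      def onTweet(t: SequencedTweet): Future[Int] =
        if (!mine(t)) Future.successful(0)
        else Future {
          val fs = followers.followersOf(t.authorId)
          fs.foreach(f => timelines.append(f, t.tweetId)) // write into each follower's timeline
          fs.size                                         // number of deliveries fanned out
        }
    }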
People working on distributed tracing systems all tend to eventually come up with a similar architecture. Even going back to the IPS research system made at University of Wisconsin in the late 1980s (that's the earliest distributed tracing system I know of).
They all tend to do tracing via minimalistic, low-overhead logging of RPC calls between machines, through low-level libraries which application developers can mostly ignore. The trace systems seem to be good at uncovering latency bottlenecks. I'm ignorant of what success systems like Zipkin or Google's Dapper may or may not have had in areas outside of latency checks.
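To make "minimalistic logging of RPC calls" concrete, the per-call record is roughly this shape; this is my own simplification, borrowing the Dapper/Zipkin vocabulary for the annotation values.

    final case class Annotation(timestampMicros: Long, value: String) // "cs", "sr", "ss", "cr"
    final case class Span(
        traceId: Long,                // shared by every span in one end-to-end request
        spanId: Long,                 // this particular RPC
        parentSpanId: Option[Long],   // the RPC that caused it
        name: String,                 // e.g. "get_user"
        annotations: Seq[Annotation])

    // The client logs "cs" (client send) and "cr" (client receive) around the call,
    // the server logs "sr"/"ss"; the collector later joins them on (traceId, spanId).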
We're using this at Twitter to better understand usage patterns for services upstream and downstream. For example, Gizmoduck, the user store at Twitter, is backed by memcache, with disk-based storage behind the cache. While we can view individual traces that hit memcache, the aggregate info shows us both the proportion of traffic from services calling Gizmoduck and the proportion of time Gizmoduck spends in memcache versus the backend store.
Furthermore, it can be useful for flagging unusual behavior. If a service's aggregate durations have changed since yesterday, perhaps that's something we want to look at. Or if the ratio of traffic from some upstream service doubles, that's interesting to know.
Latency isn't the only thing: if that's all you wanted, you wouldn't need the tracing, you could just log the latency at each step. The really amazing thing is the structure of requests. Having a record of distributed N+1 problems is pretty amazing, for instance.
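For readers who haven't hit it, a distributed N+1 looks something like this (the service interfaces are made-up stubs); in a trace it shows up as N identical child spans hanging off one parent.

    final case class Tweet(id: Long, authorId: Long)
    final case class User(id: Long, name: String)
    trait TimelineService { def getTimeline(userId: Long): Seq[Tweet] }
    trait UserService {
      def getUser(id: Long): User              // one RPC per call
      def getUsers(ids: Seq[Long]): Seq[User]  // one batched RPC
    }

    object TimelineRendering {
      def render(timelines: TimelineService, users: UserService, userId: Long): Seq[User] = {
        val tweets = timelines.getTimeline(userId)   // 1 RPC
        // N+1: shows up in the trace as N identical child spans under one parent.
        tweets.map(t => users.getUser(t.authorId))   // N RPCs
        // The trace makes the fix obvious:
        //   users.getUsers(tweets.map(_.authorId))  // 1 batched RPC
      }
    }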
As for where you hook into the stack, it's definitely a tradeoff. The lower level you go, the wider your coverage, but the data also becomes more generic and potentially less actionable.
I've worked on building a similar system at AppNeta (it's basically a commercial Google Dapper). Here are some slides from a lightning talk I gave about distributed tracing at Surge last year: http://www.slideshare.net/dkuebrich/distributed-tracing-in-5...
We built something similar last year for our internal services, but ran into problems when dealing with asynchronous code and threading context.
How do you propagate the Zipkin info from thread to thread inside your code? An example: a request comes in, we generate a new request ID and pass this down into the processing code. Part of this code executes an async call to an external service with essentially a callback (in reality, it's a scala.concurrent.Future executing on an arbitrary context or an Akka actor). How do you properly rehydrate the Zipkin info when the response comes back? The only way we could think of is some sort of fiendishly complex custom ExecutionContext that inspects the thread-local state at creation time and recreates it in the thread running the callback, or else having pretty much every method take an implicit context parameter. Neither of those solutions worked well, so we've largely bailed on the concept for the bits of our code that don't execute in a linear/blocking fashion.
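For what it's worth, the "capture at submit time, restore on the worker" ExecutionContext is only a few lines; here's a sketch, where TraceContext and TraceLocal are illustrative names rather than Zipkin's API.

    import scala.concurrent.ExecutionContext

    // Illustrative names; not Zipkin's API.
    final case class TraceContext(traceId: Long, spanId: Long)

    object TraceLocal {
      private val tl = new ThreadLocal[Option[TraceContext]] {
        override def initialValue(): Option[TraceContext] = None
      }
      def get: Option[TraceContext] = tl.get()
      def set(ctx: Option[TraceContext]): Unit = tl.set(ctx)
    }

    final class TracePropagatingEC(underlying: ExecutionContext) extends ExecutionContext {
      def execute(runnable: Runnable): Unit = {
        val captured = TraceLocal.get              // snapshot on the submitting thread
        underlying.execute(new Runnable {
          def run(): Unit = {
            val previous = TraceLocal.get          // remember the worker's own state
            TraceLocal.set(captured)
            try runnable.run()
            finally TraceLocal.set(previous)       // don't leak context across pooled threads
          }
        })
      }
      def reportFailure(cause: Throwable): Unit = underlying.reportFailure(cause)
    }

    // implicit val ec: ExecutionContext = new TracePropagatingEC(ExecutionContext.global)

The catch, as noted, is that every library in the call chain has to actually run its callbacks on this context for the propagation to hold.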
We actually built something quite similar to that, but how do you avoid having to artificially insert save() and restore(context) calls all over the place? It looks to me like you'd have to do something like:
    val context = Local.save()    // capture the caller's Locals
    val eventuallyFoo = someServiceClient.makeCall("data").map { result =>
      Local.restore(context)      // rehydrate them on whatever thread runs the callback
      // do work
    }
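One way to cut that boilerplate down would be a tiny combinator that captures the caller's Locals once and wraps the closure; this is only a sketch, and withSavedLocals is a made-up helper, not part of util-core.

    import com.twitter.util.Local

    object LocalHelpers {
      // Capture the caller's Locals once, run the wrapped closure with them restored,
      // then put the worker thread's own Locals back.
      def withSavedLocals[A, B](f: A => B): A => B = {
        val saved = Local.save()
        a => {
          val current = Local.save()
          Local.restore(saved)
          try f(a) finally Local.restore(current)
        }
      }
    }

    // someServiceClient.makeCall("data").map(LocalHelpers.withSavedLocals { result => /* do work */ })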
Could you please elaborate on that? Does this mean that Twitter sees no danger in exposing these? Which parts of the infrastructure would be considered secret sauce and not open source? Or does it not matter when the company is as big as Twitter, since the core strength lies in user base, not technical infrastructure? What does Twitter primarily seek to achieve when it open sources its stuff? Talent acquisition or brand image or other benefits of open source such as collaboration?
I mean this as politely and constructively as possible, but this looks like it has quite a few easy-to-fall-into failure modes. If I'm not using a Twitter Future, or if I've been given a Future from somebody else, it looks like I need to surround every compositional operation with this save and restore, otherwise my context is permanently lost. Additionally, I have to trust that any code I hand a closure to will do the right thing, otherwise my context is potentially lost, since I have no guarantee about which thread that closure will actually be executed on. It seems like it would be virtually impossible to avoid this happening at least once in even a simple codebase, and when context is lost, it happens silently.
You are right that this fails when you move outside of the model. However, we don't.
You may be surprised (amazed?) to learn that, internally, 100% of composition happens in this manner. We have a massive code base, and we've not seen this be an issue.
Further, we've worked with the Scala community to standardize the idea of an "execution context", which helps make these ideas portable and the particulars of the implementation transparent to arbitrary producers and consumers of futures, so long as they comply with the standard Scala Future API. (Twitter futures will when we migrate to Scala 2.10.)
This is true, and I look forward to the Typesafe guys fixing both the default implicit ExecutionContext and any contexts that Akka creates. I actually implemented an ExecutionContext that does exactly what is described above, but ultimately we had to abandon it, since pretty much any library that deals with standard Scala Futures in 2.10 has places where we could not provide our custom context. I can't wait until you guys get that ported upstream, because until then I've explicitly banned the use of thread locals across our entire stack.
Could you explain a bit more how it works? Looking at the big picture, I fail to see how it could be useful outside Twitter's stack. I mean, it seems to rely heavily on Finagle, right? Please correct me and explain if I'm wrong.
Zipkin depends on the services in the stack being able to capture trace identifiers, record timing information and send it to the collector, and forward trace identifiers on outbound calls. This is easy in Finagle, since all of this logic is already abstracted away for most of the protocols. Note that Finagle is open source, so it's ready for use today.
If you wanted to record and forward traces inside your own services, this is relatively straightforward (though not entirely trivial) to do. There are other implementations beyond the Finagle one in development by the community; I would suggest hopping onto the zipkin-users Google Group.
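To make "capture, record and forward" concrete for a non-Finagle service, here's roughly what the propagation looks like; the X-B3-* header names are the ones Zipkin conventionally uses, but the Request type and helper functions here are made up.

    import scala.util.Random

    object B3Propagation {
      // Request and these helpers are made up; only the header names follow Zipkin convention.
      final case class TraceIds(traceId: String, spanId: String, parentSpanId: Option[String])
      final case class Request(headers: Map[String, String], body: String)

      // Capture: reuse incoming identifiers, or start a new trace if we're the root.
      def extractOrStart(req: Request): TraceIds = {
        val incoming = for {
          t <- req.headers.get("X-B3-TraceId")
          s <- req.headers.get("X-B3-SpanId")
        } yield TraceIds(t, s, req.headers.get("X-B3-ParentSpanId"))
        incoming.getOrElse(TraceIds(newId(), newId(), None))
      }

      // Forward: outbound calls keep the same trace id, get a fresh span id, and name us as parent.
      def childHeaders(current: TraceIds): Map[String, String] = Map(
        "X-B3-TraceId"      -> current.traceId,
        "X-B3-SpanId"       -> newId(),
        "X-B3-ParentSpanId" -> current.spanId)

      private def newId(): String = java.lang.Long.toHexString(Random.nextLong())
    }

    // Record: around each unit of work, log (traceId, spanId, service, start, end)
    // and ship it to the collector asynchronously.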
Zipkin is designed to piece together operations that are distributed across a fleet of systems, e.g. nginx calls your Rails server, which calls memcache, then calls your login server, which hits a MySQL database. It's most useful when your serving architecture is distributed across many, many machines and many, many different services.