the customers spending billions on the big cloud providers are entitled to much more rigorous and detailed reporting (including historical data) for end-user-visible performance... I would much rather see the graphs in the article than a green check box (especially when things aren't actually very "green").
(disclaimer: I am one of the people who started opentracing)
Someone sent me this HN thread – nice to see the questions here!
OpenTracing has had a big end-of-2017 with support landing for various service mesh(es), bindings announced from some major vendors (New Relic and Datadog), many community contributions in the "core" OpenTracing repos as well as integrations with numerous external projects, lots of uptake from companies large and small, etc.
"OpenTracing-compatible" is strict API compatibility in any supported language. The cross-language spec is "terminology-based" since it's, well, cross-language.
As an OpenTracing contributor, the core value prop still seems quite strong in that instrumentation of OSS dependencies is a massive pain point and should not be tracing-system-specific since it doesn't need to be. There is also value in common protocols and formats, and in that spirit there is interest in broadening scope to include those... though from seeing many companies adopt tracing tech, I haven't observed protocol compatibility as the main pain point or blocker.
I can't resist (this is the OP): you are missing the point. It's not just throughput, it's high-percentile latency. Latency is critical if you have 1 billion users or 100 users, and it is difficult to bring the high percentiles of the latency distribution down into reasonable territory on Rails since, by default, all operations are essentially serialized.
I guess the problem of high-percentile latency is not widely understood; I'm not sure I understand it myself. Can you explain in more detail? In particular, are you talking about requests that take a while to complete because they have some complex processing, or requests that take a long time to complete because they can't be processed until some other long-running request finishes? The bit about everything being serialized suggests that the main concern is the latter. Does this apply even when using multiple threads under the C Ruby implementation? Why does running multiple web server processes on the same machine not mitigate the problem?
BTW, I don't use Rails or Ruby, but I do use Python for web apps at work (currently CPython, GIL and all). I'm curious to find out if this problem of high-percentile latency applies to Python as well.
So, for any black-box service endpoint, the latency for any given request is obviously just the time it takes for that operation to complete. Ideally one measures both end-to-end latency from the client and server-side latency in order to understand the impact of the network and, for high-throughput applications, any kernel buffering that takes place.
All of that is obvious, I imagine. By "high-percentile latency", I'm referring to percentiles of a distribution of all latency measurements gathered from a given endpoint over some period of time. If you imagine that distribution as a frequency histogram, the horizontal axis ends up being buckets of latency ranges (e.g., 0-10ms, 10-20ms, 20-30ms, etc), and the bars themselves of course represent the number of samples in each such bucket. What we want to do is determine which bucket contains the 95th percentile (or 99th, or 99.9th) latency value.
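To make that concrete, here's a minimal Go sketch of the nearest-rank method for pulling the high percentiles out of a batch of latency samples (the sample data is made up). Real monitoring systems use streaming histograms rather than sorting every sample, but the idea is the same:

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// percentile returns the p-th percentile (0 < p <= 100) of the samples
// using the nearest-rank method on a sorted copy.
func percentile(samples []float64, p float64) float64 {
	s := append([]float64(nil), samples...)
	sort.Float64s(s)
	rank := int(math.Ceil(p / 100 * float64(len(s)))) // 1-indexed rank
	if rank < 1 {
		rank = 1
	}
	return s[rank-1]
}

func main() {
	// Hypothetical endpoint: 100 requests taking 1ms, 2ms, ..., 100ms.
	var latenciesMs []float64
	for i := 1; i <= 100; i++ {
		latenciesMs = append(latenciesMs, float64(i))
	}
	fmt.Println(percentile(latenciesMs, 50)) // median
	fmt.Println(percentile(latenciesMs, 95)) // p95
	fmt.Println(percentile(latenciesMs, 99)) // p99
}
```

Note how a single slow bucket barely moves the median but shows up immediately in p99; that's why the outliers are what you alert on.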
You can see such a latency distribution on page 10 of this paper which I published while at Google:
http://research.google.com/pubs/pub36356.html
Anyway, it is a mouthful to explain latency percentiles, but in practice it ends up being an extremely useful measurement. Average latency is just not that important in interactive applications (webapps or otherwise): what you should be measuring is outlier latency. Every service you've ever heard of at Google has pagers set to track high-percentile latency over the trailing 1m or 5m or 10m (etc) for user-facing endpoints.
Coming back to Rails: latency is of course a concern through the entire stack. The reason Rails is so problematic (in my experience) is that people writing gems never seem to realize when they can and should be doing things in parallel, with the possible exception of carefully crafted SQL queries that get parallelized in the database. The Node.js community is a little better in that they don't block on all function calls by convention like folks do in Rails, but it's really all just a "cultural" thing. I don't know off the top of my head how things generally work in Django...
One final thing: GC is a nightmare for high-percentile latency, and any dynamic language has to contend with it. Especially if multiple requests are processed concurrently, which is of course necessary to get reasonable throughput.
In my experience, when using Django or one of the other WSGI-based Python web frameworks, the steps to complete a complex request are serialized just as much as in Rails. The single-threaded process-per-request model, based on the hope that requests will finish fast, is also quite common in Python land.
You mention that GC is a nightmare for high-percentile latency. Isn't this just as much of a problem for Go? Would you continue to develop back-end services in C++ if not for the fact that most developers these days aren't comfortable with C++ and manual memory management?
For my own project, the GC tradeoff with Go (or Java) is acceptable given the relative ease of development w.r.t. C++. Since there are better structures in place to explicitly control the layout of memory, you can do things with freepools, etc, that take pressure off the GC.
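A concrete example of what I mean by freepools, using Go's standard sync.Pool (the handler and payload here are hypothetical, just to show the shape):

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool recycles buffers across requests, so the collector sees a few
// long-lived objects instead of one short-lived allocation per request.
var bufPool = sync.Pool{
	New: func() interface{} { return new(bytes.Buffer) },
}

// handle is a stand-in for a request handler that needs scratch space.
func handle(payload string) string {
	buf := bufPool.Get().(*bytes.Buffer)
	defer bufPool.Put(buf) // return the buffer for reuse
	buf.Reset()
	buf.WriteString("resp:")
	buf.WriteString(payload)
	return buf.String()
}

func main() {
	fmt.Println(handle("a")) // resp:a
	fmt.Println(handle("b")) // resp:b
}
```

Under load, this kind of reuse is what keeps allocation rate (and therefore GC pause frequency) from scaling linearly with request rate.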
For high-performance things like the systems I had to build at Google, I don't know how to make things work in the high percentiles without bringing explicit memory management into the picture. Although it makes me feel like a hax0r to talk about doing work like that, the reality is that it adds 50% to dev time, and I think Go/Clojure/Scala/Java are an acceptable compromise in the meantime.
It is possible to build things that minimize GC churn in python/ruby/etc, of course; I don't want to imply that I'm claiming otherwise. But the GC ends up being slower in practice for any number of reasons. I'm not sure if this is true in javascript anymore, actually... it'd be good to get measurements for that, I bet it's improved a lot since javascript VMs have received so much attention in recent years.
Final point: regardless of the language, splitting services out behind clean protobuf/thrift/etc APIs is advantageous for lots of obvious reasons, but one of them is that, when one realizes that sub-service X is the memory hog, one can reimplement that one service in C++ (or similar) without touching anything else. And I guess that's my fantasy for how things will play out for my own stuff. Ask me how it went in a couple of years :)
Just to be clear, do you mean that writing in C++ and doing manual memory management doubles dev time, or makes it 1.5 times as long as it would be in a garbage collected language?
Also, where does most of that extra dev time go? Being careful while coding to make sure you're managing memory right, or debugging when problems come up?
I don't think that doing manual memory management doubles dev time for experienced devs, no... I just mean that, if you're trying to eliminate GC hiccups by, say, writing a custom allocator in C++ (i.e., exactly what we had to do with this project I was on at Google), it just adds up.
I.e., it's not the manual memory management that's expensive per se, it's that manual memory management opens up optimization paths that, while worthwhile given an appropriately latency-sensitive system, take a long time to walk.
JRuby is of course incompatible with C extensions to Rails; the place I used Rails had such dependencies, and so JRuby was not an option. I agree that JRuby is otherwise preferable to cRuby, though.
I'd be curious to hear how Square has found ruby (independent of rails) to be from a maintenance standpoint.
> JRuby is of course incompatible with C extensions to Rails
The fact that old-style native extensions are tied to MRI is one of the motivations for Ruby FFI, which provides a common mechanism for interfacing to native libraries on MRI, JRuby, and Rubinius (and maybe some other implementations as well).
I don't think anyone is arguing that companies fail to grow because of these languages. It's merely that they would grow more quickly once at scale if they didn't have to spend several years rearchitecting while adding few innovations in the meantime (this is what happened at Twitter, for example).
Hmm, I didn't think it took any of them years to rearchitect per se? Regardless, somebody has to do it =)
I for one am grateful for these giants pushing our current platforms and languages to their limits. On top of that, they've each found their own solutions to the problem of scalability.
No?
It really did take twitter many years to rearchitect and break their most problematic dependencies on Rails, yes... my understanding is that it was a 4-year process.
I agree that the post would be more compelling if I wrote a benchmark to demonstrate how much faster an in-memory cache is than an off-process or off-machine cache.
From first principles, though, I believe it should be obvious (yes?) that the ~300 nanoseconds it takes to grab a read lock and read from main memory is going to beat the ~1,000,000 nanoseconds (i.e., ~1ms) it takes to get a response back from a remote cache over the network. Inasmuch as an application blocks on such cache reads, these sorts of things add up to troublesome latency numbers (and Rails – or at least DalliStore – does indeed block on reads like these).
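As a back-of-envelope sketch in Go (the reads-per-request figure is an assumption, and these are rough numbers from the argument above rather than measurements):

```go
package main

import "fmt"

// Rough figures; assumptions for illustration, not benchmarks.
const (
	localReadNs = 300       // read-lock + main-memory read, in-process
	remoteRPCNs = 1_000_000 // ~1ms network roundtrip to a remote cache
	readsPerReq = 20        // hypothetical blocking cache lookups per request
)

func main() {
	fmt.Printf("in-process: %d us per request\n", readsPerReq*localReadNs/1000)
	fmt.Printf("remote:     %d us per request\n", readsPerReq*remoteRPCNs/1000)
}
```

Even at a modest number of blocking reads per request, the remote path costs over three orders of magnitude more wall time, and all of it lands directly in the latency distribution.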
JRuby was off the table for the place I used Rails due to reliance on some C extensions.
And I'm sorry if the argument seemed arrogant: I was being tongue-in-cheek about the "unrelenting negativity" part. My point about objectivity was that I tried not to rely on my opinions as much as demonstrable statements. That said, I didn't take the time to actually demonstrate most of those, and for that, shame on me.
How about the performance difference between an in-process cache and a memcached instance on localhost? Surely anyone who knows what they're doing will run the cache on the same machine as the web server.
(I guess that makes a good argument for not using a PaaS that gives you too little control over locality.)
This is a lazy answer, but it needs to be said: it depends on the context and the number of roundtrips to the cache.
If we're talking about a vanilla object cache, I think that it probably only costs 50% or so to make the extra copies and system calls. However, an in-memory "cache" can be something more than just a key-value store. E.g., for the thing I'm building right now, there are some more structurally complex graph structures that need to be traversed with every request to the "cache", and of course making a localhost RPC call for each entry/exit to the cache is really problematic. Though I admit that most caching is unsophisticated stuff, and in those cases it's just a ~2x difference.
Incidentally, we will be adding a cached test in a later round [1] and leaving the implementation particulars in large part to the best-practices of the platform or framework in question. So on Rails, use of memcached is anticipated. On a JVM platform, by comparison, use of a [distributed] in-process cache is anticipated.
Preliminary results suggest an even wider spread of results than seen in the existing single-query test.
I was hoping that those who already agree with me about dynamic languages would come to understand that Go is different in this respect. I did a lousy (i.e., nonexistent) job making a case to those who don't agree with me [yet! :)] about dynamic languages, though.
I will write a followup post later this week about the long-term maintenance problems associated with languages in the python/ruby/javascript family. I don't think they're "bad" (I was known to advocate for python in certain situations when I was at Google), but they're often inappropriate, and it is my sense that many developers haven't had the requisite large-dynamic-language-project trauma yet to understand that from firsthand experience. (The toughest part about those traumas is that they happen so late in a project's lifecycle that there's no quick way back to safety...)
So I will try to make that case in a future post. Thanks for your thoughts.
> by the time the programmer realizes he's in hell, it's too late to fix it.
From what I've seen, it's more that management doesn't want to take resources from fighting fires to move some gasoline. A disciplined group can even take rat's nest code and whip it into shape: but only if management is clueful enough to make that a priority. Usually, they're making decisions on a short-term basis.
Exactly. With proper discipline and true 100% test coverage, dynamic languages work well. But over time, that ideal is a challenge for most software organizations to actually live by. Or that's been my experience... it's one of those theory vs. practice situations.
Maybe you'll convince the rest of us! If you do write another blog post, try to include the recent indie nightmare that was purportedly caused by Go[1]. I think it's relevant here (since we're discussing large projects, after all).
For what it's worth, "Go" as a language is not really implicated in that; it's more that the `go` command-line suite was causing trouble. I would also contend that the devs were being foolish to do what they did... assuming everything was in a git repo and the toolchain was making proper use of submodules, this sounds to me like a case of developers fundamentally misunderstanding git, not Go per se.
But my "railing on Rails" (and, to a lesser extent, Node, Django, etc) will not focus on Go... it's more of a general critique about the lifecycle of large software projects written in dynamic languages.