Twitter survives election after Ruby-to-JVM move (theregister.co.uk)
131 points by tawman on Nov 8, 2012 | 167 comments



It's interesting that when it comes time to scale to serve enormous loads, you have to be willing to change fundamental parts of your stack which you've made a huge investment in. Ruby holds up well enough on the majority of the sites that use it, but when you have traffic the size that Twitter does, it's just not good enough. And it turns out that Java provides a nice tradeoff with high performance and high-level code.

It's also interesting to see different companies approach this problem differently - Facebook famously created a way to keep running their PHP source code (by compiling it to C++ and running it natively) instead of actually rewriting the source in a different language. I wonder if something similar would have been possible for Twitter, or if they weren't happy with how their existing code was structured in the first place, which may have made the rewrite more attractive.


It's not primarily about the size of traffic, but the ability or inability to cache. At work we serve a ton of traffic with MRI ruby and 3 small VMs. Most requests are served by varnish and never hit the ruby stack. Most people do a terrible job at caching (edit: I'm not saying that twitter is bad at caching).


Twitter is hugely write intensive and needs realtime data, so their caching needs are probably vastly different from yours.


That was my point. Their caching needs are vastly different from almost everyone else's!


Agreed. I've got a page that takes 20 seconds to render on a dedicated fast machine. I could spend some time optimizing it, but even then it would still be slow. However, it's cacheable so I can fix performance problems with a simple post-deploy script. We can't do that for every page, but it's awesome when we can.


No, you don't need to change fundamental parts. You need to get the architecture right. Then you need to determine whether the potential cost savings of switching languages are worth it for you. If things start falling over when you scale, it's an architecture problem, not a language problem.

Why do I say that? Because in Twitter's case, handling tweets is a "trivial" problem to parallelise, and the potential savings of switching language are a rounding error compared to getting the architecture right.

(To scale Twitter: Make trees. Split the following list into suitably wide trees, and your problem has now been reduced to an efficient hierarchical data store + efficiently routing messages. Both are well understood, easy to scale "solved" problems)
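Conceptually, the tree fan-out described above might look something like this toy, single-process Python sketch (all names here, like WIDTH and build_tree, are invented for illustration; in a real system each hop would be an async message to another machine, spreading the work across the cluster):

```python
# Toy sketch of the "split the follower list into trees" idea: a user with
# many followers fans out through intermediate router nodes, so no single
# node ever has to push a tweet to more than WIDTH destinations itself.
# All names (WIDTH, build_tree, deliver) are invented for illustration.

WIDTH = 4  # kept tiny so the tree is visible; real widths would be much larger

def build_tree(followers):
    """Group followers into routers of at most WIDTH children each,
    recursively, until a single root remains."""
    nodes = [("leaf", f) for f in followers]
    while len(nodes) > WIDTH:
        nodes = [("router", nodes[i:i + WIDTH]) for i in range(0, len(nodes), WIDTH)]
    return ("router", nodes)

def deliver(node, tweet, inboxes):
    """Walk the tree; in production each recursion step would be a message
    routed to a different machine rather than a local function call."""
    kind, payload = node
    if kind == "leaf":
        inboxes.setdefault(payload, []).append(tweet)
    else:
        for child in payload:
            deliver(child, tweet, inboxes)

inboxes = {}
root = build_tree(range(1000))
deliver(root, "hello", inboxes)
print(len(inboxes))  # → 1000: every follower received exactly one copy
```

The point of the shape is that both pieces left over - a hierarchical store of follower groups and message routing between nodes - are well-understood problems, as the comment says.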


Would love to see someone at twitter refute this... seems too simple to be correct. :)


Keep in mind I gave it as an example that is conceptually simple to demonstrate that scaling it with pretty much any technology is possible. It is still a lot of work, and expensive, and likely far from optimal.

The approach I suggested would create massive message spikes, and there are undoubtedly optimizations. E.g. 30 million followers for the top users is a lot, but most of them don't tweet all that often. Perhaps it's better to keep the recent tweets of the top 1,000 (or 100,000) users in memory in memcached and "weave" them into the timelines of their followers at read time, instead of trying to immediately store them in the timelines of each user.
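A rough sketch of that "weave at read time" hybrid, with plain dicts standing in for memcached and the stored per-user timelines (all names here are hypothetical, not anything Twitter actually uses):

```python
import heapq

# Hybrid fan-out sketch: ordinary users' tweets are pushed into each
# follower's stored inbox at write time, but high-follower accounts write
# to a single in-memory list, which is merged ("woven") into the timeline
# only when a follower reads it.

# (timestamp, tweet) pairs, newest first; dicts stand in for real storage
inbox = {"alice": [(103, "bob: lunch"), (101, "carol: hi")]}
celebrity_tweets = {"bigstar": [(104, "bigstar: new album!"), (99, "bigstar: hello world")]}
follows_celebrities = {"alice": ["bigstar"]}

def timeline(user):
    """Merge the user's precomputed inbox with the recent tweets of any
    celebrities they follow, newest first."""
    streams = [inbox.get(user, [])]
    for celeb in follows_celebrities.get(user, []):
        streams.append(celebrity_tweets.get(celeb, []))
    # each stream is already sorted newest-first, so a k-way merge suffices
    return [t for _, t in heapq.merge(*streams, key=lambda x: -x[0])]

print(timeline("alice"))
# → ['bigstar: new album!', 'bob: lunch', 'carol: hi', 'bigstar: hello world']
```

The write-time cost for a celebrity tweet drops from millions of inbox writes to one list update, at the price of a small merge on every read.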

The point is scaling is pretty much never a language problem once your needs are large enough that they'll exceed a server or even a rack no matter how efficient the language implementation is.

At that point the language choice is an operational cost vs. productivity trade-off: whether the CPU usage differences are high enough that the extra servers cost more than the productivity advantages of the slower language implementation are worth.

That is a valid concern. It's perfectly possible that Twitter is right in switching in their case. E.g. even a 10% reduction in servers could pay for a lot of developer hours when your server park gets large enough.

If you're on the tipping point where switching to a faster language implementation can keep you from having to scale past a single machine, then it's slightly different. But arguably if you're in that position, you should still plan for growth.


"Python is fast enough for our site and allows us to produce maintainable features in record times, with a minimum of developers," said Cuong Do, Software Architect, YouTube.com.

There are many dynamic user driven sites which have scaled well (far less downtime than Twitter) without switching to static compilation.


You can argue that YouTube is less dynamic than Twitter. When a user posts a message on Twitter, that message has to be pushed to all the users that follow that user and so every user has a personalized stream of incoming tweets that has to be updated, not in realtime, but with small latency nonetheless.

Also, Twitter doesn't have Google's infrastructure.

In regards to "static compilation", that's not the important bit; what matters is the performance of the virtual machine. The JVM, at its core, is not working with static types. The bytecode itself is free of static types, except for when you want to invoke a method, in which case you need a concrete name for the type for which you invoke that method ... this is because the actual method that gets called is not known, being dispatched based on "this", so you need some kind of lookup strategy, and therefore the conventions that Java uses for that lookup are hardcoded in the bytecode. However, invokeDynamic from JDK 7 allows the developer to override that lookup strategy, allowing complete dynamic freedom at the bytecode level, with good performance characteristics.

The real issue is the JVM versus the reference implementations of Ruby/Python. The JVM is the most advanced mainstream VM (for server-side loads at least).

Unfortunately for Facebook, they didn't have a Charles Oliver Nutter to implement a kickass PHP implementation on top of the JVM - not that it would have been feasible, because PHP as a language depends a lot on a multitude of C-based extensions. The purer a language is (in common usage), the easier it is to port to other platforms. Alternative Python implementations (read Jython, IronPython) have failed because if you want to port Python, you also have to port popular libraries such as NumPy. Which is why the PyPy project is allocating good resources towards that; otherwise nobody would use it.


> The JVM, at its core, is not working with static types. The bytecode itself is free of static types, except for when you want to invoke a method, in which case you need a concrete name for the type for which you invoke that method ...

Not sure what gave you this impression, as the majority of Java bytecode instructions are typed. For example, the add instruction comes in typed variants: iadd (for ints), ladd (for longs), dadd (for doubles), fadd (floats), etc.

The same is true for most other instructions: the other arithmetic instructions (div, sub, etc.), the comparison instructions (*cmp), pushing constants on the stack, setting and loading local variables, returning values from methods, etc.

http://en.wikipedia.org/wiki/Java_bytecode_instruction_listi...

InvokeDynamic, as you point out, was added to make implementing dynamic languages on the JVM easier, because the JVM was too statically typed at its core.


Arithmetic operations on numbers are not polymorphic, but polymorphism has nothing to do with static typing per se. You're being confused here by the special treatment the JVM gives to primitives, special treatment that was needed to avoid the boxing/unboxing costs, but that's a separate discussion and note that hidden boxing/unboxing costs can also happen in Scala, which treats numbers as Objects.

Disregarding primitives, the JVM doesn't give a crap about what types you push/pop the stack or what values you return.

invokeDynamic is nothing more than an invokeVirtual or maybe an invokeInterface, with the difference that the actual method lookup logic (specific to the Java language) is overridden by your own logic, otherwise it's subject to the same optimizations that the JVM is able to perform on virtual method calls, like code inlining:

http://cr.openjdk.java.net/~jrose/pres/200910-VMIL.pdf

> ... because the JVM was too statically typed at its core

Nice hand-waving of an argument, by throwing a useless Wikipedia link in there as some kind of appeal to authority.

I can do that too ... the core of the JVM (HotSpot, introduced in 1.4) is actually based on Strongtalk, a Smalltalk implementation that used optional typing for type-safety, but not for performance:

http://strongtalk.org/ + http://en.wikipedia.org/wiki/HotSpot#History


> Nice hand-waving of an argument, by throwing a useless Wikipedia link in there as some kind of appeal to authority

No need to get aggressive over this :) I disagreed with your first comment regarding the dynamic nature of the JVM, and replied trying to explain why.

I posted the wikipedia link not as kind of an "appeal to authority", but to give the readers a full listing of bytecode instructions, so that they can check what I was saying for themselves.

> Disregarding primitives, the JVM doesn't give a crap about what types you push/pop the stack or what values you return.

It depends how you see things: the JVM can't possibly provide instructions for every possible user type, so apart from primitives, other object types are passed around as pointers or references. But whenever you try to do something other than storing/loading them on the stack, type checking kicks in, ensuring that the reference being manipulated has the right type.

For instance, the putfield instruction doesn't just take the field name where the top of the stack is going to get stored. It also takes the type of the field as a parameter, to ensure that the types are compatible.

Contrast this to Python's bytecode, where the equivalent STORE_NAME (or its variants) doesn't ask you to provide any type information.
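Python's standard dis module makes this easy to check for yourself: the store instruction (STORE_FAST inside a function) names the slot but carries no type operand, in contrast to putfield's typed field descriptor:

```python
import dis

def f():
    x = 1
    x = "now a string"  # same slot, different type: no bytecode-level objection
    return x

# Both assignments compile to a plain STORE_FAST on x; the instruction only
# names the local slot and says nothing about the type of the stored value.
dis.dis(f)
```

The exact opcode listing varies by CPython version, but in every version the store instructions are untyped.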

But then again, we might be splitting hairs here: since this type checking happens at runtime (when the JVM is running your code), you could indeed question calling it "static typing", which is usually performed at compile time (and is partially performed by the java compiler, for example).


I don't think pushing tweets to every user is a scalable approach. It would seem more sensible to read rather than push tweets associated with a user. I don't know about Twitter's infrastructure, but storing more than 1 copy of the same data wouldn't make that much sense.


> I don't think pushing tweets to every user is a scalable approach. It would seem more sensible to read rather than push tweets associated with a user. I don't know about Twitter's infrastructure, but storing more than 1 copy of the same data wouldn't make that much sense.

Suppose I have 5000 followers. How do you propose to get the timeline? Since you said "single copy of data", I assume you would do something like this:

https://github.com/mitsuhiko/flask/blob/master/examples/mini...

At twitter scale, that query will be a major bottleneck.
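To make the bottleneck concrete, fan-out-on-read assembles the timeline with roughly the join below (a minimal in-memory sqlite sketch; the schema and names are invented for illustration, not Twitter's actual tables):

```python
import sqlite3

# Fan-out-on-read sketch: one copy of each tweet, and each timeline is
# assembled per-request by joining tweets against the follow graph.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE tweets  (author TEXT, ts INTEGER, body TEXT);
    CREATE TABLE follows (follower TEXT, followee TEXT);
""")
db.executemany("INSERT INTO tweets VALUES (?,?,?)", [
    ("bob", 1, "first!"), ("carol", 2, "hi"), ("dave", 3, "unrelated"),
])
db.executemany("INSERT INTO follows VALUES (?,?)", [
    ("alice", "bob"), ("alice", "carol"),
])

# With thousands of followees and billions of tweets, running this join on
# every page view is the per-request bottleneck being described above.
timeline = db.execute("""
    SELECT t.author, t.body FROM tweets t
    JOIN follows f ON f.followee = t.author
    WHERE f.follower = 'alice'
    ORDER BY t.ts DESC
""").fetchall()
print(timeline)  # → [('carol', 'hi'), ('bob', 'first!')]
```

Precomputing timelines at write time (fan-out on write) trades this read-time join for duplicated storage, which is the tradeoff the thread is debating.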


When a user posts a video, it gets published in the subscription feed of every subscribed user (though maybe not pushed live with Ajax, like twitter does); plus they tell you which videos you have already watched. Plus the search feature is probably used more than on Twitter.


Facebook also has many more features, which probably means more code, which probably means larger switching costs. Take anything: photos, privacy logic, news feed, etc., and Facebook has the more sophisticated feature set.


it's funny that this is even an issue anymore. at the time twitter was started, the ruby vm had huge issues, and you couldn't even do evented io.

what I find even funnier is that people just throw in play and grails without even a little experience in rails. The ecosystem is entirely different:

- you can use java and therefore java libraries (yes, you could do the same in jruby, but nvm)
- bundler/gems are an order of magnitude better than classical java dependency hell
- need something in rails? add a gem. need something in grails? search through the outdated plugins, search for the missing documentation (just take a look at stackoverflow). in general -> write it yourself or pay a consultant to do it
- have a question? fight with incomplete documentation

on top of that, grails is just a stack on top of Spring MVC.

what about play? i like play more than I like grails tbh, because it doesn't want to be the rails of java. yes people compare it to one another, but that's just the familiarity effect.

now, where's the computationally intensive stuff? nowhere to be found. it's a web api. where's the computationally intensive stuff in twitter? I don't know, but chances are there's a native extension for that nowadays.

There actually was a time when you simply could not build a scalable system in ruby without too many hoops, but that's no longer the case. Yes the GIL is bad, but keep in mind that things like ruby fibers didn't even exist at the time.


> i like play more than I like grails tbh, because it doesn't want to be the rails of java. yes people compare it to one another, but that's just the familiarity effect.

My memory might be playing tricks on me, but I remember play developers talking about rails being a huge influence.

If Play isn't rails for Java, what else would a Rails for Java look like? The "familiarity effect" is there because Play is modeled after Rails.

> now, where's the computationally intensive stuff? nowhere to be found. it's a web api. where's the computationally intensive stuff in twitter? I don't know, but chances are there's a native extension for that nowadays.

Search, for one, is computationally intensive. I am pretty sure there is more that the outside world doesn't know about.

> There actually was a time when you simply could not build a scalable system in ruby without too many hoops,

What scale are we talking about? At Twitter scale, ruby or anything else has to jump hoops. For example, network load from users coming online and offline on FB chat will break out of box solutions.


While I can agree that scala/java/jvm deserve some credit for this (typing, concurrency support), I think stories like this do a disservice in that they underestimate the importance of building a system a second time with all the lessons learned the first time around.

This is not unlike those stories where a developer writes a trivial program in a new language that is similar to a trivial program they wrote in another language 1+ years prior and compare the results. "I wrote program foo in 30 lines of code in language X, which is much better than the 120 lines of code it took me in language Y two years ago". It's natural for a developer to write a shorter program two years later since they have 2 more years under their belt. In fact I'd expect the same program to be much better two years later even if written in the same language.


Very true. Twitter may have had a lot of scaling trouble, but that didn't necessarily mean moving everything onto the JVM was necessary to 'make it scale'. It was just sensible to pick a language that would prevent problems in future.


But if the "second design is the better design" theory has any impact, Twitter would have done better to build the second design in Ruby.


What this says to me, a non-RoR user, is that it's harder to build websites with Java than with RoR, but if you pass a certain (very unusual) level of traffic, you'll wish you'd made the extra effort; otherwise, you'll be glad you didn't. (Extra effort isn't free.)

Fine, but what I wish I understood was why RoR is so popular to begin with. The claim is always that Rails is so wonderful that it's worth learning Ruby just to be able to build web apps with Rails. Well, if so, then why isn't there a sequel to RoR: Python on Rails? "The benefit people care most about (Rails, not Ruby) using the language you already know (Python)." Since Python, Perl, and Ruby are so similar except in syntax, and people love the Rails part more than the Ruby part, and so many more people know and love Python, and new Python web app frameworks appear all the time...why isn't one of them Python on Rails after years of Pythonistas being forced to abandon Python and learn Ruby just to be able to use Rails?

Is there some significant difference between Python and Ruby that explains this? What makes Rails so attractive and why isn't the same thing done with Python?


> What this says to me, a non-RoR user, is that it's harder to build websites with Java than with RoR, but if you pass a certain (very unusual) level of traffic, you'll wish you'd made the extra effort; otherwise, you'll be glad you didn't. (Extra effort isn't free.)

If you get to the scale of twitter, and more importantly if you have written a real time messaging server with a web application framework, it doesn't matter which framework or language you started with, as you're going to be completely rewriting your entire stack at some stage as different parts fail under the load (unless you have an incredibly experienced team who has written a twitter equivalent before and scaled it to 15000 messages a second). There's no difference between ruby or python (or perl, or php) in that regard - they are all interpreted and relatively slow, which starts to matter at this scale. And even if you had used java or c in the first place, if you had an architecture not written with massive scale in mind, you probably wouldn't survive that growth without radical changes at every level of your stack.

Re ruby versus python, the popularity of rails is partly historical accident, partly that ruby is a nice language which doesn't get in the way and is ideally suited to this domain, and partly that rails deals with lots of the basics of web development for you without getting in the way too much when you need to adapt it. None of that means ruby is better than python, but I'd disagree that people put up with ruby in order to use rails - it's a really nice language in its own right, though not highly performant (it is getting better). For most websites, many of which can employ caching, that is a non-issue even at large scale - witness the success of Wordpress in php, which would not survive even modest loads without caching.


Yup, and from experience, you're going to end up rewriting big chunks of your stack multiple times. (In a previous life, I worked on a team that grew a Java-based SMS gateway from an initial design limit of 10 msg/s on a single box to 10,000+ msg/s sustained, replicated across a high-availability cluster.) Props to the Twitter team for pulling it off, because at volumes like that you start to get into seriously arcane stuff like JVM garbage collection tweaking, or otherwise a full GC that halts all threads of execution for 30+ seconds will create entire pods of fail whales.

http://www.oracle.com/technetwork/java/javase/gc-tuning-6-14...


python is not interpreted, it is compiled to bytecode just like java.


Yes sorry that was an over-simplification due to ignorance - Ruby 1.9.3 and Python are both compiled to bytecode now, but still tend not to do as well in speed comparisons to the Java VM (which perhaps has just had more effort on optimisation). So in that sense it is like Java, but slower - for most people that really doesn't matter, but for twitter with this level of traffic, it probably would. You can of course run Ruby on the JVM now too, but I think that came too late for twitter.

What I found interesting from another twitter blog post linked from the article was that they are actually running on a modified version of Ruby 1.8.7 REE, not Ruby 1.9.x as you might expect, so that must really skew their comparisons![1] Anyway, there are a lot of variables in a huge system like twitter, so boiling it down to just a problem with Ruby performance seems pretty simplistic (as you'd expect from the register), and I seriously doubt they could have avoided this kind of rewrite and refactor of their entire stack in any language when hitting this sort of scale. To me it doesn't say much at all about Ruby as a language or whether it is suitable for websites.

[1] http://engineering.twitter.com/2011/03/building-faster-ruby-...


> python is not interpreted,it is compiled in bytecode just like java.

Depending on the implementation, so is Ruby. The canonical MRI 1.9 compiles to bytecode and interprets it. But unlike python, it doesn't write the bytecode to disk. The loading of bytecodes has been disabled until a bytecode verifier is implemented.


But the bytecode is then interpreted.


Python has a roughly equivalent MVC web framework called Django.

Ruby and Python as languages are not radically different in kind, but their respective developer communities have had different focuses, and as a consequence the library of tools are not identical.

Ruby on Rails became popular because people were dissatisfied with the way that web development was being done, and DHH is very good at marketing/propaganda. Ruby on Rails has had a substantive effect on the way that web development is done, and there have been numerous attempts in other languages (that didn't already have a framework like Django) to recreate the things that Ruby programmers enjoy with Rails (CakePHP and Grails come to mind immediately).


This is spot on.

To elaborate a bit, Rails tended to have a lot more generated code (the fabled "magic") that really sped up development of your standard CRUD apps. As far as I can tell, this was fairly novel in web development. Django didn't really have a focus on that, you spent a little (lot?) more time configuring. It's better now though.

Of course, at the complete opposite end of the spectrum, you have very minimalist stuff like Flask (Python) or Sinatra (Ruby), which don't include a lot of bells and whistles. You'll have to bring your own ORM, templating, etc...


I'll throw out another factor:

When Rails was introduced, there were no other substantive Ruby web frameworks - Ruby itself was relatively obscure compared to every other web-capable tech. As such, it (Rails) had no competition in the framework space. PHP, Java, Python and other languages all had competing web frameworks to choose from.


Ruby has nothing to do with Python. Python is strongly typed and compiled into bytecode, just like java. It is just not the bloated and verbose language java is, which allows fast and pragmatic development.


> Python is strongly typed

Like Ruby.

> and compiled into bytecode

Like JRuby.

Python and Ruby are comparable languages. They are both high-level dynamic languages, with the same sort of tradeoffs (e.g., you can do more with fewer characters than in most statically typed languages, at the expense of safety and performance). There are, however, significant differences in philosophy, ecosystem and community between the two.


As far as I know, Ruby has proper support for higher-order functions and lexical closures (aka code blocks).

http://stackoverflow.com/questions/4769004/learning-python-f...


Except you don't need a JVM to get performance in python. Using a JVM destroys any prospect of cheap and easy deployment. If I have to use a JVM, I'd certainly not use JRuby when I can get scala.


> Except you don't need a JVM to get performance in python.

Python's performance is only slightly ahead of Ruby's. Both Python and Ruby are a bad match for CPU intensive tasks (provided we are talking pure python/ruby and not native extensions). Network and IO in general are salvaged by using evented IO (or some high-level evented framework).


Exactly. You have the same sort of solutions for squeezing additional performance out of Python and Ruby:

- using an evented framework (network/IO bottleneck)
- using native code (CPU bottleneck)
- using a different implementation (Python has an edge here with PyPy, as long as your dependencies run on it)


Because numerical processing is a major use case, Python has a number of techniques to easily speed up critical sections of code (numpy, cython, pypy).

I am not aware of comparable features in Ruby.


Numpy is a dedicated library where the critical code is in C, but that's a library, hardly a language feature. I don't know if there is any numerical processing library in Ruby.

Cython is not intrinsic to Python either. I see there is a ruby2c gem, but I have no idea how it compares to Cython.

As for pypy, that's about switching to a different runtime, the same way Ruby people would go to JRuby.

None of this turns Ruby into apples and Python into oranges, I'm afraid.


> Using a JVM destroys any prospect of cheap and easy deployment.

I doubt that. The fact that jruby is close to bug-by-bug compatibility means that for most projects you can go and develop on mri or rubinius for fast development and use jenkins/travis/... to run your testsuite against jruby. You can then do integration tests as well as staging on jruby to catch anything your testsuite didn't catch. I've seen more than one large project take that course of action and do just fine.

Deployment on jruby isn't any harder than deployment on any other ruby. It's still "puma start" or whichever server you chose. You can, however, package your app in a war and use a standard servlet container such as tomcat or jetty to deploy, but that's strictly optional.


I deploy SpringMVC apps, using an embedded Jetty instance into much the same environments as I do Python apps based on Tornado or Twisted.

The biggest win for me is dependency management. Once my Maven file is set up right, I only need a JRE on the host. Dependency management is a bit trickier with Python, perhaps for Ruby as well.


> but if you pass a certain (very unusual) level of traffic, you'll wish you'd made the extra effort

I'm not sure that's the case. Perhaps had they started out by trying to scale to their current levels they would never have gotten off the ground. When you start developing a system for a new company your largest obstacle is almost always lack of product/market fit.


Isn't there some famous quote about premature optimisation? :-)


It's called Play! Framework in Java/Scala land. You get the benefits of both: ease of development and performance/scalability.


also don't forget grails!


and you can use Play from clojure too, if you think you are pragmatic enough...


> What this says to me, a non-RoR user, is that it's harder to build websites with Java than with RoR, but if you pass a certain (very unusual) level of traffic, you'll wish you'd made the extra effort; otherwise, you'll be glad you didn't. (Extra effort isn't free.)

Twitter hasn't phased out Rails, it has replaced background services written in Ruby.

> Fine, but what I wish I understood was why RoR is so popular to begin with.

That is hard to explain. You will have to try it out yourself and compare with whatever you are currently using.

> why isn't one of them Python on Rails

As other responses pointed out, Python has Django. And no, Django isn't Rails-inspired; Django was developed independently. Also, the languages and philosophies differ too significantly for there to be a Python on Rails.


Python's culture heavily encourages explicit, consistent code while Ruby's culture is a lot more flexible in that respect. Rails makes many assumptions by default and has an API that is based on clever coding for the sake of maximizing productivity. That stuff would likely not fly in the Python community.


There are some of us Ruby users who find the Rails approach absolutely disgusting... I don't believe it maximizes productivity either - the moment you need to deviate too much from the defaults, you quickly descend into hell, trying to figure out what "magic" is happening behind your back.


You're making it sound like Ruby is a terrible language that's either much worse than Python or completely redundant with it. I simply don't think that's the case.


Maybe it's hard to build something like rails and get the details right? It's like asking why PC makers don't do a better job at copying apple.


Its called Django.


> None of this will be welcome news to the army of fanatical Ruby developers who believe the language's syntax, its high developer productivity, and its overall philosophy far outweigh any performance disadvantage it might have compared to other languages.

I'm a relative newbie to the Ruby world, but as best I can tell, the Ruby and Rails communities both accepted long ago that they weren't made for Twitter levels of traffic.

Fact is, almost no one has Twitter levels of traffic besides Twitter. That's why Ruby and Rails are still so popular, because for ~99% of performance needs, they're more than capable and also extremely pleasant to work with.


Yeah, that line feels like lazy reporting to me. Any involvement at all in the Ruby community over the past few years and they would know that very few Ruby developers are concerned about the performance issues past a certain scale.

Seems like a no-brainer that something a lot more strict and static will outperform something that has to deal with many more possibilities at runtime.


> Yeah, that line feels like lazy reporting to me.

Now, now, it's The Register. The tabloid style is part of its identity.


The novice programmer says: "My language is better than yours."

The wise programmer says: "Use the right tool for the job."

Please HN, we're wiser than trending stories would suggest.


Do you think it's impossible for one language to be better than another?

If so, it's impossible for a language designer to do a good job, because it's impossible to improve a programming language.


It seems to me that he is arguing that languages do not have a total ordering. As such, novices will argue that a language is strictly better than another, when a broader perspective will reveal that this is only true for some use cases.

The "no free lunch" theorem would indicate to me that there is no ultimate language, merely languages that are better for common (to you) use cases.


pg's argument reminds me of Plato's dialogues, where Socrates confronts sophistry and searches for a universal truth. At first sight, it seems hard to make a primary ordering of programming languages, because they focus on different areas (ease of use, speed of development, performance etc) and ordering those disjoint areas against each other seems to be impossible. However, it is still possible for a programming language to focus on and excel at all those areas at the same time. I am still hopeful of a language for the next 100 years; be it Arc or something else.

As a happy user of Ruby, I find that Ruby delivers on many of its promises. I wish Ruby-MRI would focus a bit on performance in coming versions. The ruby community needs someone like the developer of the V8 javascript engine to pull off the same performance leap here. (Maybe Rubinius or JRuby already do.) That said, let us not forget that Twitter's performance requirements are kind of extreme for most startups, and most JVM-based languages, foremost Java, lag too far behind in ease of use, speed of development etc. to be a viable option for a startup. (I exclude here Clojure, Scala, JRuby etc.) For most startups, speed of development is what matters.


> At first sight, it seems hard to make a primary ordering of programming languages, because they focus on different areas (ease of use, speed of development, performance etc) and ordering those disjoint areas against each other seems to be impossible. However, it is still possible for a programming language to focus on and excel at all those areas at the same time.

I disagree. There's more than ease of use, speed of development and performance to determine whether a language is "better" suited to a problem than others. And they're all trade-offs to some point. So to excel at one means that you fall behind on others. Prolog is still used heavily in some areas since it's a natural fit for logic problems. It's just pure logic theory modeled in a programming language. It's a pitiful language for solving most "real world" problems, but it excels at what it was made for. So is it "better" than Ruby? I don't think so. Is it possible to write a language that models logic problems as nicely as Prolog and still keeps the ease of use of Ruby? I very much doubt that.


That's not what he's saying at all! The "wise programmer" uses the right tool for the job because it IS a better tool for the job. In the same way, we'll continue to see an increase of programming languages, as opposed to a consolidation, as computing advances and languages improve.


If all you have is a hammer...

I think all languages have strengths and weaknesses that matter based on the context.

For example, I hate manually managing memory in C/C++. Java, Ruby, etc automate all of that for you, but it comes at a cost of using a lot more memory. That's not a big concern for many applications, but if you're doing embedded software or real-time systems it can be a deal breaker.

So in this example, is C++ better or worse than Java? The answer is that it depends what problem you're trying to solve, what environment you're running in, and what resources you have available on your development team. Personal preference matters too in terms of programmer satisfaction, but just because you like one language's constructs more than another doesn't mean it's well suited to every problem space.

As to the original article, I think it's great to learn about how different companies change their software stack, but not because "w00t Twitter hates Ruby - Java rocks"; rather it's interesting to see how business context changes over time and the implications that has on the technology. Seeing how other businesses have dealt with these hurdles can help you keep an eye out for them in your own business. I just wish that this was the lens that it was written in rather than "OMG - Ruby = Fail Whale."

If you're a language designer then it's good to hear about the complaints and preferences of programmers so that you can design a better language (and thus a better tool). And like any tool, there will be times when it's better to use one language rather than another.


I can't speak for him/her but it seems to me the point is that computer languages, while being all-purpose, are products of countless conscious design decisions of their creators. And thus by the nature of design, tradeoffs are inevitable.

So I think the (even just sometimes) discussion of "good language" should be limited to the context of purpose and goal of the said language, which is similar to "choosing the right language for the task".


I do not think the continuum of languages can be mapped along one dimension when considering individuals. The constant development of languages, patterns, and idioms is a global manifestation of individual frustrations.

Ideally, every person would have their own programming language (or culture), grown from their own personal experiences and desires. In that case, a qualifier for a good language is one that most easily allows expression of new languages or cultures at the individual level.

For a language designer, this could be an impossible problem due to the unique learning models of each individual. Their best hope may be in attracting those whose mental models have some overlap with their own.


> because it's impossible to improve a programming language

Well, since "improve" is relative and a matter of taste, it is impossible to improve a language, isn't it? Make it run faster, and it will either take up more memory or be slower to develop with. Make it faster to develop with, and it will run slower or have some other tradeoff.

Hence, "use the right tool for the job". And when your focus shifts from "quickly building a product" or "quickly adding new features" to "rock-solid stable and fast", then you need to switch tools. Simple as that.


Could you please tell me what Fortran is the right tool for? :)

As much as I agree it's delusional to think one language can be suited to virtually any kind of task, it seems far too consensus-minded to me to claim there is no such thing as a bad or deprecated language.


Fortran is great for FEM. Companies like Siemens are using millions of LOC of Fortran to virtually crash cars and explode bombs.

Also, Nastran (which is basically Fortran) got us to the moon.


The wise programmer rewrites his stack in another language to scale, right?


If your software is Twitter, then you're constantly rewriting the stack anyway just to keep up, and switching to different languages, frameworks, technologies, and tools component-by-component is often the best way to do that.


The wise programmer does not write programs. Therefore, the wise programmer does not exist.


Would have been nice if the title was "Twitter survives election after Ruby-to-Scala move", since that's what actually happened.

(They do use a bit of regular Java, but the majority of the core code is Scala, and it's Scala that should be getting the top line credit here, not Java.)


No. It is the JVM that should be getting the credit.


If that were the case, they could have just run JRuby and avoided rewriting a bunch of code.

The truth is, Scala is actually helpful with the architecture they came up with, AND it's great that it runs on the JVM.


I'm not necessarily disagreeing with you (statically typed languages are going to be faster on the JVM), but JRuby - partially with the help of invokedynamic, added in Java 7 - has come a long way performance-wise since Twitter began their rewrite.

I wonder if they would have done anything different, had JRuby been where it is now?


JRuby, even when using a lot of Java based libraries, doesn't get you the performance boost you need — it is just incremental over Ruby and even then only with some code.


...except that JRuby is 3-100x slower than Java: http://shootout.alioth.debian.org/u64q/benchmark.php?test=al...


No, JRuby is not 3-100x slower than Java.

Those 10 tiny tiny Ruby programs are 3-100x slower than the Java programs written for the tasks.


Magnetised needles and steady hands.


I've heard that Java is a DSL for turning large blocks of XML into stack traces. But it does it fast and reliably.


Hah. I'd put it more like: Java is a platform for building frameworks for implementing interpreters for XML-based DSLs.


This is why I urge people to build on a robust stack from the ground up, so it will be less painful in the long run. This is not to say that you shouldn't build a prototype in Rails to get everything up and running as quickly as possible and worry about scaling later; it is just my opinion that if you invest the time and effort in working with a robust stack (for example Scala+Lift), your investment will pay off in the long run.

I've always admired Rails for its flexibility and its enormous productivity boost, but all my serious applications are coded in Lift. I for one believe in "develop and forget", because I'd rather call myself a business guy than a programmer, though I'm deep into both. I like to spend more time expanding/marketing my business than worrying about scaling it. But that's just my perspective.

The JVM is terribly underestimated, and I realized this when I got started with Lift+Scala. Scala is a very powerful language and requires a totally different mindset (= functional). And Lift is fairly complex for those wishing to get started with it and has poor documentation, despite being a 5-year-old framework. But once you understand it fully (somehow), there's no looking back. Lift provides so many things out of the box, especially related to security (unlike Play!), so it's kind of a trade-off you have to choose between. Even if you compare all the benchmarks, most of the JVM-based languages like Scala outperform even something like Go (OK, that's not fair, since Go is fairly new).

If you're interested in Scala, Coursera has a course on it by the creator of Scala himself (Martin Odersky).


> I like to spend more time expanding/marketing my business than worry about scaling it.

If you're truly more interested in the business side, why don't you build it as quickly as possible in rails/django and then later if it warrants it you can hire some people to build it in lift/something else?


Very good point. Truth is, I want to keep hiring to a minimum, particularly when I plan to bootstrap. Imagine: I could avoid hiring people just to 'scale my app' by not choosing a language/framework that scales poorly. There's some savings in this process. I could be wrong though, because I'm only speculating; I've never hit that traffic level, and probably never will.


By the point you are having scaling issues because of the language, you are really going to be past the worrying-about-bootstrapping/hiring stage.

It sounds like you are spending time and effort worrying about scaling way before it's needed. If you can solve the problems you have today faster, then I would do that, and not hinder yourself today for possible problems way down the track.


Very valid point there, dude. I don't know... maybe I just want things to be efficient right from the ground up. Or it could be the after-effects of falling in love with functional programming in Scala :)


There is plenty of merit using a language you enjoy.

A certain language might be twice as fast to get work done in, but you can be 10 times as fast if you are enjoying yourself and motivated.

I just wouldn't kid yourself that scaling is the reason for the choice. Just enjoying it is plenty of reason.


In addition, startups tend to pivot a few times, and agility is arguably more important early on than raw performance (depending on product of course).


Agreed. But, probably my situation applies when you've come past all the pivoting and you've settled with one idea you decidedly are going to work on.


Pivoting stops when you have found a scalable, repeatable business model, not an idea to work on.


The rumors of Java's death are, indeed, greatly exaggerated.


They used Scala, not Java, on JVM.


They use both.

"Last week, we launched a replacement for our Ruby-on-Rails front-end: a Java server we call Blender. We are pleased to announce that this change has produced a 3x drop in search latencies and will enable us to rapidly iterate on search features in the coming months."

http://engineering.twitter.com/2011/04/twitter-search-is-now...


At the same time they did that, they replaced MySQL with a real-time version of Lucene.

Almost every one of these "we switched from A to B and got a 3x speed increase" articles conflates a lot of different variables. The first version you build when you have no traffic and product/market fit is the most important thing. Performance is a low priority. Eventually it hits a bottleneck and you begin to look at performance. Perhaps there is another language out there that is faster than the one you're using. At this point nobody says "let's do an exact code translation from A to B". As you rewrite, you keep a constant eye on performance. It often involves ripping out abstractions and moving closer to the metal. The system you end up with usually looks nothing like the one you started out with, nor should it since it is the product of all of your experience scaling up to that point.


Actually, TFA states they use a mix of both.


Java isn't sexy any more. You use Java because your problems are, and so it doesn't matter to anyone.


Twitter's better performance comes from better architecture, not from the Java language. Further, the performance comes from the JVM, so you could use any of the JVM languages, including JRuby, to get similar effects.

This article seems more like link-bait to me.


I agree. I think I could write a scalable Twitter in mostly Ruby. And I've never written a line of Ruby in my life.


Have you worked on other sites with a similar scale to Twitter? If not, how would you know?


I work at Google :)


Then you should really know better than to make statements like "I think I could write a scalable Twitter in mostly Ruby. And I've never written a line of Ruby in my life."


This isn't really a comment on Java, Ruby, Twitter, Google, or my programming ability. It's a comment about the fact that once you've designed a scalable distributed system, the constant factor runtime isn't so important.

Twitter has clearly engineered a working distributed system (or you would have read stories like "5% of Twitter users lost their tweets when our servers caught on fire last night"), so the fact that they wrote it in Scala or Java is largely irrelevant. They need fewer computers than they would have if they wrote it in Ruby again, and that's not something to discount, but I think it's technically possible to have a full Ruby stack do what Twitter does. And, they would have needed even fewer computers if they used C or C++. So it's clear that Java was a social choice (it had the right books, libraries, and stack of resumes) rather than an absolute must.

Besides, Ruby and Java both make the same C library calls to do I/O. So it all ends up being the same for everything except search, trending tweets, and other CPU-intensive analysis.


I think I could write a scalable Twitter using mostly low-paid workers who manually write down the tweets and deliver them in person. It's just a matter of hiring enough workers now that I've designed this scalable distributed system.


Compared to the OP, your post is just random, non-funny sarcasm. The OP has highlighted relevant data points.

1. Writing a system the second time is easier since you already know the pitfalls and avoid them.

2. Ruby, Python, Perl, Java... provide thin wrappers for the underlying IO system calls, and hence, Java IO isn't very different from Ruby IO when it comes to performance.

3. There are only a few CPU-intensive tasks. It goes without saying that Java will kick Ruby's ass (pure Ruby; native extensions are a different story) when it comes to CPU-intensive tasks.


Computers are easier (though perhaps less fun) to make than humans. Ruby is slow but it's close enough to Java that the numbers will still work out. Humans: probably not.


30x slower seems a bit of a stretch for "close enough": http://shootout.alioth.debian.org/u64q/benchmark.php?test=al...


You should look at the performance difference in relevant operations. I doubt that they calculate Mandelbrots at Twitter.


> You should look at performance difference in relevant operations

If you have a link to a Twitter-realistic benchmark I'd be interested to see the results.


Well, there's pretty much no response he can give to that then.

Unless of course, you work in the marketing department or clean the buildings?


Based on that I would like to hear how you would do so. Would make an interesting blog post or comment here.


  <!-- insert *evil grin* here -->


I suppose it doesn't matter if you haven't worked on similar scaling problems at Google.


You're right and I know that "appeal to authority" is not a valid argument mechanism.

With that in mind, I just couldn't pass up the opportunity to make an amusing comment.


And..?


Fair enough :)


I don't think so. Java outperforms JRuby, probably because of static typing. See: http://shootout.alioth.debian.org/u64q/benchmark.php?test=al...


Java probably outperforms JRuby primarily because the JVM was designed and tuned to run Java.

But in any case, if Twitter's architecture is truly scalable then any intrinsic slowness of the language shouldn't be a big problem, because they can just toss more hardware in to compensate. What is a problem is a buggy VM that leaks memory. To run thousands of instances in a heavily instrumented way, the VM must be stable and predictable.


The statically vs. dynamically typed performance difference is clear for virtually any static or dynamic language you care to name; it has nothing to do with the JVM being "tuned" for a language: http://shootout.alioth.debian.org/


>> performance difference is clear for virtually any static or dynamic language you care to name <<

JavaScript

http://shootout.alioth.debian.org/u64/which-programs-are-fas...


Only because they removed LuaJIT from the shootout; it is dynamic and fast (it was around where Java is in the benchmarks).


The LuaJIT performance is pretty impressive for a dynamically typed language, but it's hard to deny the fact that mainstream dynamically typed languages are outperformed by their static counterparts.


That's partly because they were not designed for performance. Lua of course was, as it was designed partly for small, slow systems. But PyPy is getting on pretty well too, and JavaScript, showing what you can do even in a language not designed for performance.


How does that explain Julia[0]/Node.js? Both are within 1-2x the speed of C; I think static typing has less to do with it than you think.

[0] http://julialang.org/


You pretty much have to enforce static type invariants (e.g. "this variable will never be other than an int") to JIT stuff performantly. I don't think that's a point against the GP's argument at all.


Probably because there are better Ruby programs to be written for those tasks :-)


...I never get why scaling is seen as such a "hot" problem and why everybody seems inclined to make language and technology choices based on it. ...I bet Twitter could have scaled well by keeping Ruby, rewriting performance-critical parts in C, and maybe tweaking and tuning the Ruby interpreter, much the way the YouTube team did with Python and C (and now some Go, I hear)...

I think their decision was more influenced by the experience of their team of programmers or by employable talent pool - they went the JVM|Java|Scala route because they had people with experience in high level languages and the JVM. If they happened to start with a team with "C hackers" background they would've gone the Ruby|C way and it would've worked as well.

...I think almost any language and technology can (be made to) scale, even to Twitter scale, at least if it's open source and you have people with the required skills to hack on the internals and recode performance-critical parts in lower-level languages (basically C, C++ or Go nowadays).


Did you skip the bit in the article where they mentioned the faster Ruby VM they built, before moving to JVM?

http://engineering.twitter.com/2011/03/building-faster-ruby-...


No, I didn't skip it, and I can only guess that either they considered they wasted too many resources for such a "small" performance gain, or they didn't have people with enough skill in the right areas to keep going in this direction. But no, I don't know all the details and steps of Twitter's switch, and my experience is in the Python ecosystem, but I think lots of things are similar...


Well, there are several problems with MRI-1.8.x/REE that come into play when you try to go for raw performance. The biggest problem is probably the "stop the world" GIL that kicks in around the GC and, AFAIK, around all external C calls. The second biggest problem is the lack of multi-core capabilities. Both can bite you pretty badly when you're aiming for the raw performance of a single process. They're both not as bad when you're doing web-level stuff, where it matters little whether you start one, two or 20 processes to handle your load.
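The GIL effect is easy to see in a micro-benchmark. A minimal sketch (exact timings depend on interpreter and machine; the method name and iteration count are made up):

```ruby
# Sketch: CPU-bound work split across two threads. On MRI the GIL
# serialises the threads, so the threaded run takes roughly as long
# as the serial one; on a GIL-free runtime like JRuby the threads
# can run on separate cores.
require 'benchmark'

def count_down(n)
  n -= 1 while n > 0
end

iterations = 2_000_000

serial = Benchmark.realtime { 2.times { count_down(iterations) } }

threaded = Benchmark.realtime do
  2.times.map { Thread.new { count_down(iterations) } }.each(&:join)
end

puts format('serial:   %.3fs', serial)
puts format('threaded: %.3fs', threaded)
```

On MRI the two timings come out roughly equal; only a runtime without a GIL shows the threaded run approaching half the serial time on a multi-core machine.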


I wonder if Node.js would meet the needs of Twitter. It's marketed as a platform for real-time applications that require a lot of concurrent connections and whatnot, and it's not nearly as complicated as Java/Scala, but would it be stable enough to power a site like Twitter? The inner nerd in me is going haywire with the possibilities.

While most won't ever encounter the issues Twitter encountered using Ruby and Ruby on Rails, these kinds of articles are very damaging to the Ruby language and the Rails framework. Even though I primarily still use PHP, Ruby and Rails are something I have a vested interest in as well, and this will no doubt push potential newcomers away from the language.


JVM is a wonderful thing.


GitHub and the thousands of Rails-powered websites have no problem scaling, so I'm pretty sure neither Ruby nor Rails is the problem.


GitHub is an easier problem: generating pages.

Twitter has to deal with very high fanout and low cacheability.


With enough hardware you can make virtually anything scale, for a price...


Well, GitHub is quite profitable; maybe that's Twitter's problem... what service does Twitter provide that people are willing to pay for? None...


500 million users is a little more demanding than 2 million users.


Scalability is an architecture issue, not a programming language issue. You can certainly build a linearly scalable system in Ruby. It just costs more to scale a Ruby app than an equivalent Java app. In the end, the tradeoff is between engineering cost and operating cost. When you spend $100k to hire an engineer and your traffic can be handled by a few boxes, engineering productivity is your primary focus. As your traffic grows, you need more and more servers, and server efficiency becomes more and more important. At some point, the savings in server cost justify the increased cost in engineering effort, and you make the switch from Ruby to Java.
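That tradeoff fits in a back-of-envelope calculation. A sketch with entirely hypothetical numbers (rewrite cost, per-server cost, and speedup factor are all made up for illustration):

```ruby
# Sketch: years until server savings from a faster runtime pay back
# the one-off engineering cost of a rewrite. Every figure is invented.
REWRITE_COST   = 300_000.0  # dollars of engineer time for the port
SERVER_COST    = 2_000.0    # dollars per server per year
SPEEDUP_FACTOR = 5          # Ruby servers needed per equivalent Java server

def payback_years(ruby_servers)
  servers_saved = ruby_servers - ruby_servers / SPEEDUP_FACTOR
  REWRITE_COST / (servers_saved * SERVER_COST)
end

puts payback_years(50)   # small fleet: ~3.75 years to break even
puts payback_years(500)  # large fleet: ~0.375 years, an easy call
```

The crossover scales with fleet size: at a handful of boxes the rewrite never pays for itself; at Twitter-scale fleets it pays back in months.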


I would think that scaling the web servers to handle 15k requests per second would be relatively easy compared to scaling the database servers.

I would have thought that you could just throw cheap hardware at that problem, whereas the database would be a considerably more complex scaling issue.


Not to trivialise your question, because scaling to tens of thousands of database transactions per second is difficult.

But it is commercially solved. If you turn up with a fistful of money, Oracle, IBM, Sybase and a bunch of other companies would love to handle that for you.


probably projecting my own problems onto that one, hehe.


I wonder: if invokedynamic had been introduced before they started moving their stack and staff to Java, would JRuby have saved them in that case?


This move marks the beginning of the end of a four-year-long effort by Twitter to rid themselves of Ruby.

History will remember the entire Ruby industry as a series of compounding failures.

The de facto formalisation and specifications.

The black-box behaviour of core development.

The broken-linked, un-versioned docs.

The rampant cargo-cult mentality.

The arcane exceptions.

The meta-frameworks.

Gem hell.

1.9/2.0

Rails.


I'm not a Ruby developer, but I still can't help thinking, "WTF".

No technology stack is perfect, but I've yet to meet a stack that was pure evil. There may be some cargo-cult personalities in the Ruby community, but if there are, it's only because there is value in it.

An infinitesimal number of sites have to deal with Twitter's scale problems. The rest can work on getting crap done instead of worrying about Maserati problems.


Allow me to be withering in my criticism of GP.

Cargo-culting is what you accuse others of when they are learning and you do not like them.

The appropriate way to deal with newbies who do not fully understand the consequences of the decisions they have made (perhaps even while they are advocating others join them), is to explain their rhetorical and technological shortcomings in a way that others can learn from. Accusing someone of "cargo-culting" is just unhelpful character assassination.

Also, weak-sauce.


> No technology stack is perfect, but I've yet to meet a stack that was pure evil.

As someone who has seen Ruby cause major problems, I'd have to say it's not too far off "pure evil".


So you must think cars and SQL are also pure evil?


Relative to other tools we've used, Ruby has caused a disproportionate number of problems.


Some context on what those problems were, how you were using Ruby, the expertise of your team members, etc. would be of interest for such an inflammatory accusation. Bold claims require proof.


Aren't they?


argumentum ad populum


> History will remember the entire Ruby industry as a series of compounding failures.

This is a bit dramatic. Despite its faults, the language and community around it have been quite a success story. It's had a huge and positive influence on the web development world.


It's also worth rebutting douchebag trolling with the Ruby community's actual merits.

The Ruby community really, really cares about developer tooling and teaching others. Ruby is also one of the nicer places where object-oriented programming meets metaprogramming.

Don't forget that a lot of sysadmin work is done in Ruby now. Both Puppet and Chef are written in Ruby, and GitHub and Heroku both came out of the Ruby community.


argumentum ad novitatem


It's time to pack up and go home if you are going to straw-man GitHub and Heroku as being popular merely because they are novelties or fads.



You can throw Wikipedia links around to try to justify an argument ("hey, look at how many other projects there are!"), but GitHub and Heroku both leverage RoR heavily and both are incredibly large, successful products. Having competitors does not a failure make.


Q.E.D.


> Gem hell.

Is this really a problem? Every package/dependency manager seems to blow up on occasion. Gems haven't given me the problems that I've had with Pip/CPAN/Autoconf.


It's enough of a problem that bundler was written. Of course Bundler sorts the problem out pretty well so... Hell averted :P


Well, bundler solves a problem that's related, but tangential, to what rubygems does. Rubygems is first and foremost a packaging format for libraries. It handles loading, provides a common format to specify dependencies (the gemspec) and a default code layout for libs. It does not do dependency resolution.

So what people used to call "gem hell" was actually "I need to specify all my dependencies and take care of conflicts myself." That's what bundler does. And it uses rubygems to actually retrieve, install and load the gems.
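The division of labor looks roughly like this (a minimal sketch; the gem name, versions and author are hypothetical):

```ruby
# my_lib.gemspec -- rubygems' side: packaging metadata plus declared
# dependencies, but no resolution of conflicts between gems.
spec = Gem::Specification.new do |s|
  s.name    = 'my_lib'
  s.version = '0.1.0'
  s.summary = 'Hypothetical example library'
  s.authors = ['Example Author']
  s.add_runtime_dependency 'rack', '~> 1.4'
end

# Gemfile -- bundler's side: resolve one consistent version set for
# the whole application and lock it in Gemfile.lock:
#
#   source 'https://rubygems.org'
#   gem 'my_lib', '~> 0.1'
```

Bundler reads the Gemfile, walks each gem's gemspec, resolves a single conflict-free set of versions, and then hands retrieval and loading back to rubygems.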


The only gem hell I know of is when using distro-packaged gems. They tend to do weird crap like changing dependencies or backporting fixes instead of packaging the new version.

There are funnier things to do than solving bugs introduced by a backported patch.


Hell: not really a problem.


Obvious flame bait. Please don't post this garbage on HN.



I call bullshit. This doesn't spell the end of anything. Ruby isn't the right tool for the job and hasn't been at Twitter for a while. This is old news. Ruby has a place in a lot of other scenarios.


sir, troll more please


And I say they should have used C++ directly; C++ is the only thing that scales... what? Oh sorry, I meant assembly. Nothing scales more than assembly, Java is slow... why are we having this discussion again anyway?



