

Twitter: From Ruby on Rails to the JVM [video] - ahmicro
http://ontwik.com/rails/oreilly-oscon-java-2011-raffi-krikorian-twitter-from-ruby-on-rails-to-the-jvm/

======
rrrazdan
I was recently in a quandary over the choice of technology. I started with RoR
and I really like it. However, I was concerned about the long-term implications
of that choice. What I'm taking from this talk is that I shouldn't worry about
that right now. If and when I need to scale, I will have enough resources to
make a better choice, resources that I don't have right now.

~~~
xal
Shopify is still 100% RoR and we serve hundreds of millions of requests. You
will be fine :-)

It's a competitive advantage for us; we move faster than the rest of the
market.

~~~
imack
I'm actually really glad to hear it. Though I wonder about the "Rails doesn't
scale" mantra: is that really more about ActiveRecord? In your experience, is
ActiveRecord the biggest out-of-the-box bottleneck?

~~~
simonw
The "Rails doesn't scale" mantra was discredited 5 years ago, when people
realised that it scales exactly the same way as PHP: by adding more
shared-nothing application servers. Remember, scalability != performance.
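A toy capacity model makes the scalability/performance distinction concrete. It is a sketch under loud assumptions: shared-nothing app servers, each worker serving one request at a time, evenly spread load, and a shared database that never becomes the bottleneck. `capacity_rps` and all the numbers are hypothetical.

```python
def capacity_rps(servers: int, workers_per_server: int, latency_s: float) -> float:
    """Requests per second a pool of shared-nothing app servers can sustain.

    Toy model (assumption): each worker handles one request at a time, load is
    spread evenly, and the shared database is never the bottleneck.
    """
    if servers < 0 or workers_per_server < 0 or latency_s <= 0:
        raise ValueError("counts must be non-negative and latency positive")
    return servers * workers_per_server / latency_s


# Performance is about latency_s (how fast a single request is served).
# Scalability is about what happens to capacity as you add servers.
base = capacity_rps(servers=4, workers_per_server=8, latency_s=0.2)
more_servers = capacity_rps(servers=8, workers_per_server=8, latency_s=0.2)
faster_code = capacity_rps(servers=4, workers_per_server=8, latency_s=0.1)
print(base, more_servers, faster_code)
```

Doubling the servers doubles capacity without making any single request faster, so a slower runtime shows up as more servers for the same load rather than as a hard ceiling. Real deployments eventually hit shared bottlenecks (usually the database), which this model deliberately ignores.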

------
angerman
One thing that wasn't touched on was JRuby[1]; on their site they state high
performance and real threading as advantages.

If Twitter has (some of) the best Ruby developers (mentioned somewhere near the
end of the video), why have they neglected JRuby? Why isn't it an option? For
legacy code with native extensions this makes sense. But is JRuby slower or more
memory-hungry on the JVM than Scala or Clojure? I always thought that JRuby was
one of the more performant languages on the JVM?

Apart from that it was an interesting talk.

[1]: <http://www.jruby.org/>

~~~
cageface
JRuby is fast for a Ruby implementation, but it's still far, far slower than
Scala or Java itself.

[http://shootout.alioth.debian.org/u32/benchmark.php?test=all...](http://shootout.alioth.debian.org/u32/benchmark.php?test=all&lang=jruby&lang2=scala)

~~~
bad_user
I do get that a dynamic language like Ruby will always be slower than a
language like Java, which has primitives and where many things, including
static method calls, are resolved at compile time.

But citing the Alioth.Debian benchmarks? Really?

Dude, take a look at the source code of those benchmarks sometime -- they are
completely useless ;)

~~~
netghost
If you want a fast dynamic language, take a look at Lua. It's surprisingly
fast, especially LuaJIT, and pretty straightforward (almost boring, really).
The main downside is that the community around it isn't as broad.

~~~
cppsnob
Lua's threading is just as broken as Python or Ruby. Maybe even more so.

------
petercooper
Direct YouTube link: <http://www.youtube.com/watch?v=ohHdZXnsNi8>

~~~
rhizome
OP is an ontwik posting bot. Thanks.

~~~
sscheper
Surprised no other person brought this up.

~~~
petercooper
Ontwik clearly provides some sort of service, if only to dredge up YouTube
videos that we've otherwise missed, but I can't help but feel there's a
"better way" for these videos to be discovered than a site that just embeds
and adds no editorial context.

~~~
rhizome
Of course there is: one can post a link to the YouTube to HN with a title that
describes the content. No separate website necessary.

------
gcampbell
If any of this stuff sounds interesting to you, we're hiring for all sorts of
positions: <http://twitter.com/jobs>

------
hkarthik
Part of the problem is that any typical web application framework is ill
suited to building a full real-time system like Twitter.

I doubt their story would have been much rosier if they had gone with Spring
MVC, Hibernate, and Oracle from the start.

The minute you start moving away from CRUD based application design and moving
into SOA, you're already signing up for a significant rewrite, even if you
stick with the same platform.

------
MrMcDowall
No one will ever be in the position of handling as much real-time data as
Twitter. The rest of us can just get on with it and stop trying to pre-empt
situations that will probably never happen to us.

~~~
xentronium
Many companies in the financial sector handle similar workloads. I know a
couple of HFT shops; they're all Java.

~~~
MrMcDowall
For sure, I meant from the perspective of a startup. There are obviously
existing industries where real-time data is big, but they don't tend to be
web-fronted and serving hundreds of millions of users.

------
hello_moto
Many people seem to refer only to the scalability (performance, that is) side
of the argument; only a few have pointed out the developer-productivity side of
choosing Scala and Java in Twitter's situation.

<http://www.infoq.com/articles/twitter-java-use>

~~~
schumihan
Agreed. You can achieve very high productivity if you use Scala and Java
properly.

------
sahglie
With the release of Java 7 (invokedynamic), the performance of dynamic
languages like Ruby and Python on the JVM (JRuby and Jython) may become much
less of a factor. At least that's what the JRuby folks imply:
[http://www.engineyard.com/blog/2011/jruby-1-6-released-now-w...](http://www.engineyard.com/blog/2011/jruby-1-6-released-now-what/)

Punchline from the link:

"There’s a very real chance that invokedynamic could improve JRuby performance
many times, putting us on par with our statically-typed brothers like Java and
Scala. And that means you can write Ruby code without fear. Awesome."
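Part of why invokedynamic matters to JRuby is, roughly, call-site caching: a dynamic call like `obj.speak` can be bound to the method found last time, behind a cheap type check, which the JIT can then inline much like a static call. The sketch below is plain Python, not invokedynamic or JRuby's actual implementation; it only illustrates a monomorphic inline cache. `CallSite` is a hypothetical name, and the sketch skips invalidation when a class is reopened and a method redefined, which a real Ruby implementation has to handle.

```python
from typing import Any, Callable, Optional


class CallSite:
    """A monomorphic inline cache for one dynamic call site (illustrative only)."""

    def __init__(self, method_name: str) -> None:
        self.method_name = method_name
        self.cached_class: Optional[type] = None
        self.cached_fn: Optional[Callable[..., Any]] = None
        self.lookups = 0  # number of slow-path method lookups performed

    def invoke(self, receiver: Any, *args: Any) -> Any:
        cls = type(receiver)
        if cls is not self.cached_class:
            # Guard failed (first call, or a different receiver class):
            # do the full lookup and re-cache the result.
            self.lookups += 1
            self.cached_fn = getattr(cls, self.method_name)
            self.cached_class = cls
        # Fast path: call the cached function directly, no lookup.
        return self.cached_fn(receiver, *args)


class Dog:
    def speak(self) -> str:
        return "woof"


class Cat:
    def speak(self) -> str:
        return "meow"


site = CallSite("speak")
results = [site.invoke(animal) for animal in (Dog(), Dog(), Dog(), Cat())]
print(results, site.lookups)
```

Three calls on `Dog` cost one lookup; the `Cat` receiver fails the guard and triggers a second. Call sites that keep seeing the same class are the common case that makes this kind of caching pay off.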

~~~
mark_l_watson
Right on! I was going to make the same comment until I saw yours. Charles
Nutter has been very enthusiastic about invokedynamic-based speed
improvements; I can't wait.

A little off topic: as a consultant, it seems to me that demand for Clojure has
been tremendous. Clojure is a nice language and very performant. It will be
really interesting to see how much large speed improvements in JRuby cut into
Java's, Clojure's and Scala's share of the developer market.

------
troymc
I noticed that every time Twitter acquired a company, they also accrued a new
language:

* Summize brought Scala

* BackType brought Clojure

Has anyone noticed that pattern elsewhere?

~~~
jorgeortiz85
Twitter started looking at Scala before the Summize acquisition. Also, to my
knowledge, Scala was not being used at Summize.

------
jingweno
I think most people misunderstand the point made by Raffi. Yes, the JVM has
awesome performance, but never try to solve a performance issue up front by
sacrificing the agility offered by RoR. Not everyone is building Twitter; you
don't even know whether you will ever hit the point where the VM is blocking
your way. If you blame every performance issue on the VM, you are simply doing
it wrong. In 90% of cases, MRI is fast enough to meet your requirements
(GitHub, Groupon, LivingSocial and many others are using RoR, BTW). Twitter was
very pragmatic in this respect: they tried every means of scaling the app on
RoR before they moved to the JVM. Never try to solve a requirement that doesn't
even exist in your own app...

~~~
jingweno
The point is that you will probably never need that VM-level performance gain
at all (YAGNI). Things change very fast in software development; your project
could die or fail long before you really need to think about VM-level
optimization. But if you start a project thinking that it will fail because you
use Rails and Rails doesn’t scale, you are doing it all wrong! What you really
need is a tool that helps you iterate faster. Rails falls into this category.
And that’s the productivity gain I am thinking about when starting a company.

As a pragmatic approach, a half century of software engineering says that you
should write the code first and worry about making it faster only if it is too
slow. Donald Knuth is right: premature optimization is the root of all evil.
Don’t let VM performance metrics blind you to this fundamental truth. If you
are chasing raw performance, Java/Scala is not the ultimate solution; C/C++ is,
or even Erlang.
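"Worry about making it faster only if it is too slow" starts with measuring. Here is a minimal sketch of timing each step of a request before deciding what to optimize; the step names are hypothetical and the database call is simulated with `time.sleep`. For real code, a profiler such as Python's `cProfile` gives a per-function breakdown.

```python
import time
from typing import Callable, Dict


def time_steps(steps: Dict[str, Callable[[], object]], repeat: int = 3) -> Dict[str, float]:
    """Return the best-of-`repeat` wall-clock time (seconds) for each named step."""
    timings: Dict[str, float] = {}
    for name, step in steps.items():
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            step()
            best = min(best, time.perf_counter() - start)
        timings[name] = best
    return timings


# Hypothetical request pipeline; the "database" is simulated with sleep().
steps = {
    "parse_params": lambda: {"id": "42"},
    "render_json": lambda: str({"id": 42, "text": "hello"}),
    "db_query": lambda: time.sleep(0.02),
}
timings = time_steps(steps)
slowest = max(timings, key=timings.get)
print(slowest, f"{timings[slowest] * 1000:.1f} ms")
```

In this made-up pipeline the waiting dominates, so speeding up the language-level steps would buy almost nothing; only the measurement tells you that.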

I actually don’t see a problem with “performance” emerging as a requirement
sooner in the Ruby world than in others, say Java/Scala, because this “sooner”
is very contextual and depends a lot on the implementation. To give you more
info, GitHub has been on RoR since it started, and so far they haven’t hit the
so-called “Rails does not scale” point (<http://teachmetocode.com/podca..>).
Neither have many other projects. Besides, think about Twitter: they only
recently started porting everything to the JVM, after Rails had served them for
a couple of years. All of this tells you that this “sooner” may never come for
your own app and, most importantly, that Rails can scale, although it may not
scale as well as others! But once you hit the point where Rails, or Ruby in
general, doesn't meet your performance requirements (assuming you are lucky
enough to build another Twitter), do what Twitter suggests in the video. Is that
too late? Not at all. By then, you have the resources to do whatever you want,
even invent a VM that performs better than the JVM.

To summarize, the Ruby VM was fast yesterday, is still fast today, and will be
faster tomorrow. In 90% of the cases, it's just fast enough. Do I need the
performance gain by switching to JVM? Don't know yet. It'd be better to let
the market drive you. Does Rails provide the agility I want to start a
project? 100% hell yeah!

------
moe
I'm still not getting the fuss they're making over their "scale".

So your inbound load is 7000 tweets/sec, or roughly 250 Mbit/s (assuming 4k per
tweet). Then you fan that out to (assuming) 20 append-only mailboxes on
average.

Perhaps my assumptions are far off, but I'm only arriving at a couple of Gbit/s
here and a low two-digit number of terabytes of storage per year.

This sounds like "a couple racks" to me, not like "a couple datacenters".
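The arithmetic above, written out, taking "4k" as 4096 bytes (an interpretation). Inbound comes to about 229 Mbit/s, and a full-copy fan-out of 20 to about 4.6 Gbit/s, in line with the estimates in the comment. The yearly storage figure hinges on an extra parameter, `stored_bytes_per_tweet`, which is an assumption not stated in the comment: storing the full 4 KiB once per tweet gives roughly 900 TB a year, whereas storing only about 140 bytes of text lands near 31 TB, the "low two-digit" range. `envelope` is a hypothetical helper.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def envelope(tweets_per_s: float, wire_bytes_per_tweet: int, fanout: int,
             stored_bytes_per_tweet: int) -> dict:
    """Back-of-the-envelope bandwidth and storage for a tweet firehose."""
    inbound_bits_per_s = tweets_per_s * wire_bytes_per_tweet * 8
    return {
        "inbound_mbit_s": inbound_bits_per_s / 1e6,
        # Assumption: every tweet is copied in full to `fanout` mailboxes.
        "fanout_gbit_s": inbound_bits_per_s * fanout / 1e9,
        # Storage counts each tweet once, at stored_bytes_per_tweet.
        "stored_tb_per_year": tweets_per_s * stored_bytes_per_tweet * SECONDS_PER_YEAR / 1e12,
    }


full = envelope(7000, 4096, 20, stored_bytes_per_tweet=4096)
text_only = envelope(7000, 4096, 20, stored_bytes_per_tweet=140)
for name, result in (("full 4 KiB stored", full), ("~140 B stored", text_only)):
    print(name, {k: round(v, 1) for k, v in result.items()})
```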

~~~
lenn0x
Here is an old presentation from a year ago.

<http://www.slideshare.net/nkallen/q-con-3770885>

Now, at the time they were doing a peak of 2000 tweets/s. The fan-out was 1.2M
deliveries a second... So if we go with the same 600:1 ratio at 7000/s,
that's about 4.2M/s. I actually know it's much higher now since I work there,
but other things to consider are that we have a large data warehouse, search,
the API, pipelines to external parties for the firehose, logging at terabytes
an hour, in-house metric collection doing 3M writes/s, etc.

<http://www.scribd.com/doc/59830692/Cassandra-at-Twitter>

It adds up very fast.
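The projection above, spelled out: the 600:1 figure is 1.2M deliveries/s divided by 2000 tweets/s, and applying it to 7000 tweets/s gives about 4.2M deliveries/s. Worth noting is that 600 deliveries per tweet is about 30 times the 20-mailbox fan-out assumed earlier in the thread, and that the projection assumes the ratio stays fixed (the real number is said to be higher). `project_deliveries` is a hypothetical helper.

```python
def project_deliveries(old_tweets_per_s: float, old_deliveries_per_s: float,
                       new_tweets_per_s: float) -> tuple:
    """Scale delivery volume by the observed deliveries-per-tweet ratio."""
    ratio = old_deliveries_per_s / old_tweets_per_s  # deliveries per tweet
    return ratio, new_tweets_per_s * ratio


ratio, deliveries_per_s = project_deliveries(2000, 1_200_000, 7000)
print(f"{ratio:.0f} deliveries per tweet, {deliveries_per_s / 1e6:.1f}M deliveries/s")
print(f"{ratio / 20:.0f}x the 20-mailbox fan-out assumed earlier in the thread")
```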

------
SonicSoul
Very interesting talk. I was surprised that there was no mention of using a
lower-level language, i.e. C or C++, to maximize CPU/RAM utilization. While
the JVM is a clear winner over RoR, it does add some overhead. I guess it is a
sweet spot between performance and code manageability.

------
neduma
Any thoughts on solving scaling issues with node.js?

~~~
stock_toaster
Until node gets some type of bind-fork-accept mechanism (built in) to utilize
more than one CPU in a native and simple fashion (cluster/multi-node are
close), I feel it will not gain the same level of traction that Java has.

People also have opinions about Java (Scala/Clojure) vs JavaScript from a
language-preference standpoint. I think it is too early to tell what impact
this will have.

However, many developers I know seem to have a strong distaste for Java, the
JVM, and the ecosystem around both. I think several of those folks would look
to node, erlang, or possibly even golang (if it gets faster) simply to avoid
using java.
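For reference, the bind-fork-accept (pre-forking) pattern mentioned above looks roughly like this: one process binds the listening socket, forks workers that inherit it, and every worker blocks in `accept()`, so the kernel spreads connections across processes and therefore CPUs. This is a minimal Python sketch, not Node, assuming a POSIX system with `os.fork`; `prefork_serve` and `ask` are hypothetical helpers, and a real server would loop forever, handle signals, and restart dead workers.

```python
import os
import socket
from typing import List, Tuple


def prefork_serve(workers: int = 2, conns_per_worker: int = 1) -> Tuple[int, List[int]]:
    """Bind one listening socket, then fork workers that all accept() on it.

    Each child replies with its PID and exits after `conns_per_worker`
    connections, to keep the demo finite. Returns (port, child_pids).
    """
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(128)
    port = listener.getsockname()[1]

    pids: List[int] = []
    for _ in range(workers):
        pid = os.fork()
        if pid == 0:  # child: inherits the already-bound listening socket
            try:
                for _ in range(conns_per_worker):
                    conn, _addr = listener.accept()
                    with conn:
                        conn.sendall(str(os.getpid()).encode())
            finally:
                os._exit(0)  # leave without running the parent's cleanup
        pids.append(pid)

    listener.close()  # the parent stops listening; children keep their copies
    return port, pids


def ask(port: int) -> int:
    """Connect once and return the PID of the worker that answered."""
    with socket.create_connection(("127.0.0.1", port)) as sock:
        data = b""
        while True:
            chunk = sock.recv(64)
            if not chunk:
                break
            data += chunk
    return int(data.decode())


port, pids = prefork_serve(workers=2, conns_per_worker=1)
answered = {ask(port) for _ in range(2)}
for pid in pids:
    os.waitpid(pid, 0)  # reap the finished workers
print(sorted(answered) == sorted(pids))
```

Because each worker here exits after one connection, two requests are necessarily answered by two different processes; with long-lived workers the kernel decides which blocked `accept()` wins.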

~~~
hello_moto
I noticed that the people who have a strong distaste for Java are largely
application developers. In most cases, these developers just work with the
available libraries or APIs to build a website backed by a database (some of
them are consultants who build similar apps over and over again).

Back-end developers seem to (maybe?) prefer to use Java.

~~~
stock_toaster
I don't know if my anecdotal evidence bears this out, but I admit that it is
just that...anecdotal.

People are still using C and C++ to write low-level code. Databases, package
managers, games, etc. Then there are the 'application developers' as you named
them, writing http service endpoints, web apps, and the like.

It seems java still owns the colossal corporate stacks. I hear things like "it
is easier to hire" and "java is faster/better for extreme large scale". If you
think about all the languages and tools available, only the first makes much
sense.

* C/C++/D is faster than java.

* statically compiled code is easier to deploy.

* Erlang is arguably more scalable than java.

* Haskell/Ada is 'safer' than java.

* I think several languages are more fun to write in than java. Ruby, python, golang, coffeescript, etc, etc.

So java may not be the _best_ language for large scale, but maybe _one of the
best_ or _good enough_? When combined with the first point of ease of hiring,
I can certainly see why large companies are attracted to it. If your language
of choice lends itself to your workers being more easily replaceable, then as
a company that is probably better/safer.

Other than that, I can't see why someone would prefer to use java. I don't
work in/at/for huge companies though.

I admit that my own personal 'java bias' is based on dated interactions with
java. Whenever I hear 'java' I think: good performance (vm), eats memory like
candy, painful ecosystem of xml files and outdated/abandoned random libraries.
I have tried dabbling in scala, and while I enjoyed the language to a fair
extent, I still found myself wrestling with the JVM and the ecosystem (library
version incompatibilities, obscure compiler errors, namespace wrangling, etc).

~~~
hello_moto
My opinions are anecdotal at best as well and that's the reality of software
development. There's no research that can state that X is better than Y
whether it is programming languages, methodologies, architectures, patterns,
etc.

I don't deny the reality that people are still writing C/C++ code for embedded
devices, games, and anything else that requires fast performance with very low
memory usage. On the other hand, there are a few NoSQL solutions built using
Java: HBase, Neo4J, Cassandra.

In some cases, the JVM's HotSpot optimizes code on par with C/C++. I don't know
much about D's performance. If the speed improvement is not night-and-day for
projects other than those mentioned above, and if readable code is much easier
to write in Java, I'm not sure we should be comparing C/C++ and Java. On the
other hand, many people seem to come out and say that Ruby is _very_ slow. Is
it heaven-and-earth slow?

There are advantages and disadvantages of compiled vs dynamic code when it
comes to deployment. Sometimes it all depends on the tools and ecosystem, too.

How is Erlang more scalable than Java? In what area? Horizontal vs vertical
scaling? Developer productivity (or team performance) scale? Performance?
Speed? Erlang seems to excel in a niche area (in a positive sense).

What about Haskell/Ada, how are they safer than Java? Do they have better
type systems? Handle NULL better than Java? Are they bulletproof against
developers? Do they detect more bugs?

Keep in mind that the Java ecosystem has grown and matured a lot since 2004.
The tools and libraries are staggering. Most of your concerns are no longer
relevant, except "eats memory like candy" for most Java desktop apps. Having
said that, have you heard of Java ME? It runs on mobile devices, albeit as a
different distribution of the JVM.

Outdated/abandoned random libraries also happen in our neighboring ecosystems:
Ruby (and Rails).

I have to admit that sometimes other languages are more "fun" to dabble with. I
use Python and Ruby. I like Python because I don't have to argue when it comes
to code style. Pythonic (PEP 8) or GTFO. It's not that I hate innovation or
artistic coders, it's just that I'm a disciplined person. Best practices in
most cases, pragmatism when needed, hacks when the world ends tomorrow.

Companies choose Java for a variety of reasons and, yes, one of them is the
available talent pool. I'm sure we have all heard the old phrase "enterprise
developers". Some of them are bad, while others are quite sharp when it comes
to the typical enterprise stack. Some of them can design systems/libraries
quite well. Spring Framework comes to mind. Google Guice, Google Guava and
Android are next (yep, crazybob used to do EJB and enterprise Java stuff, yet
he's one of the sharpest Java dudes I've known).

I noticed that some well-run enterprise systems have better infrastructure
planning, which forces the people involved with them to know better when it
comes to certain technology choices. I'm sure there are web startups out there
that just keep on hacking PHP code and use MySQL without any plans for backups,
recoveries, etc.

Of course these are anecdotal experiences of mine.

~~~
stock_toaster

        > On the other hand, many people seem to come out and say that Ruby is _very_ slow. 
        > Is it heaven-and-earth slow?
    

Comparatively, I would say yes. Granted, most of the time it won't matter
because you are waiting on IO (disk/network), but if you are doing CPU-intensive
work, it is slow and you probably need to drop down to C or offload that work to
another service.
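The "waiting on IO" point can be shown with CPython, which, like MRI, has a global interpreter lock but releases it while a thread blocks. In this sketch, Python stands in for Ruby and `fake_io_call` is a hypothetical stand-in that sleeps 100 ms; ten IO-bound calls take about a second serially and about 100 ms spread over ten threads. A CPU-bound loop would not overlap this way, which is the case where dropping down to C or offloading the work pays off.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_io_call(_: int) -> None:
    """Stand-in for a network or disk wait (assumption: 100 ms each)."""
    time.sleep(0.1)


def run_io_bound(n_calls: int, threads: int) -> float:
    """Run n_calls simulated IO waits on a thread pool; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(fake_io_call, range(n_calls)))
    return time.perf_counter() - start


serial = run_io_bound(10, threads=1)       # the waits happen one after another
overlapped = run_io_bound(10, threads=10)  # the waits overlap despite the GIL
print(f"serial {serial:.2f}s, overlapped {overlapped:.2f}s")
```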

    
    
        > How is Erlang more scalable than Java? In what area? Horizontal vs vertical scaling?
        > Developer productivity (or team performance) scale? Performance? Speed? Erlang seems
        > to excel in a niche area (in a positive sense).
    

My guess would be single-server scalability. Erlang's write-once variables
and actor model, combined with a good VM ("green processes"), make it very
scalable on a single server (vertical). It also has good built-in node-to-node
communication mechanisms (horizontal). Raw performance is probably lower than
Java's, though. And I imagine the developer pool is much more limited than
Java's.
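The actor model, very roughly: each process owns its state and communicates only through a mailbox. The Python sketch below (hypothetical `CounterActor`) uses one OS thread per actor, so it illustrates the programming model only, not Erlang's lightweight processes, preemptive scheduling, or distribution. Four threads send increments concurrently, yet no lock guards `_count`, because only the actor's own thread ever touches it.

```python
import queue
import threading
from typing import Any, Optional


class CounterActor:
    """A tiny actor: private state, a mailbox, and one thread that handles
    messages strictly one at a time. Callers never touch `_count` directly."""

    def __init__(self) -> None:
        self._mailbox: "queue.Queue[tuple]" = queue.Queue()
        self._count = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self) -> None:
        while True:
            msg, reply_to = self._mailbox.get()
            if msg == "incr":
                self._count += 1
            elif msg == "get" and reply_to is not None:
                reply_to.put(self._count)
            elif msg == "stop":
                return

    def send(self, msg: str, reply_to: Optional[queue.Queue] = None) -> None:
        """Fire-and-forget: enqueue a message and return immediately."""
        self._mailbox.put((msg, reply_to))

    def ask(self, msg: str) -> Any:
        """Send a message and block until the actor replies."""
        reply: queue.Queue = queue.Queue()
        self.send(msg, reply)
        return reply.get()


def bump(actor: CounterActor, times: int) -> None:
    for _ in range(times):
        actor.send("incr")


counter = CounterActor()
senders = [threading.Thread(target=bump, args=(counter, 1000)) for _ in range(4)]
for t in senders:
    t.start()
for t in senders:
    t.join()
total = counter.ask("get")  # every "incr" was queued before this "get"
counter.send("stop")
print(total)
```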

    
    
        > What about Haskell/Ada, how are they safer than Java?
        > Do they have better type systems? Handle NULL better than Java?
        > Are they bulletproof against developers? Do they detect more bugs?
    

I meant safer in the type-safety sense, yes. There are also classes of static
analysis tools for both. Granted, my knowledge of these languages is quite
limited.

I certainly see your points (especially about liking the code hygiene of
python), and agree that Java is not going anywhere soon. I guess I don't
understand why a startup, or individual developer, would choose Java over
other languages, even other languages on the JVM, for new projects.

Thanks for the good discussion. :)

~~~
hello_moto
The Java ecosystem has a lot of static analysis tools that integrate with
almost all popular IDEs and continuous integration systems: FindBugs, PMD,
JDepend, Sonar (I recommend checking out Sonar).

Checkstyle is another tool that I use, since I'm kind of the annoying dude when
it comes to code style. (Have you seen the GWT API code? It reads as if it were
written by one person rather than a few developers with different perceptions
of "readable" code. I like that kind of thing.)

There are a few reasons why a startup/individual dev would choose Java:

1) Previous experience in Java

2) Java is a better fit for the type of problem to solve (computationally
intensive work that requires Hadoop-like infrastructure)

3) Emotional attachment to a static/compiled language with a nice IDE, so that
one can navigate the source code easily whether the code base is large or small
(sometimes not all decisions are rational, and I'm okay with that because
developing software requires more than technical skill; it also requires
passion).

4) Marketing (if you're targeting the enterprises). Zimbra, Jive Software,
Compiere, Alfresco, Day software, Liferay, Salesforce used to be startups.

The Java ecosystem seems to learn and grow at a much faster pace thanks to the
following influences:

* Rails (Spring Roo, Spring MVC, JPA 2.0, and possibly an MVC framework in the
upcoming JEE releases)

* C# (Java 7's new features, Java 8 closures/lambdas. Yes, Lisp did this first,
but I think C# forces Java to implement closures more than any of its
competitors).

* REST/JSON/WS (check out the latest JAX-RS, which supports REST, JSON, XML and
Atom feeds, and JAX-WS)

* I/P/SaaS + cloud computing (targeted for Java EE 7: deployment,
infrastructure to support multi-tenancy, etc.)

NB: Just so that I don't sound like a Java fanboy: I use Java by day, but I
also use Python and help promote and organize the Python community overseas
(of course without comparing Python vs Java :)).

------
jsavimbi
Anyone pondering their technology stack should watch this video. It doesn't
matter what you initially employ as a technology/framework/server to get your
app up and running, but if you need to scale, the JVM is where it's at. I say
that as a Rubyist.

~~~
mattdeboard
I was surprised when I got some pushback on this concept at a local Django
meetup last week. A lot of people believe that Python & Ruby-type languages
are the backend languages of the future.

~~~
cageface
To be fair, you're very unlikely to ever have to solve the kinds of scaling
problems Twitter has had to solve. You'll get your app off the ground faster
with Python or Ruby.

~~~
jsavimbi
You will get your app off the ground faster, but you're selling yourself short
if you're a technology-based company thinking that you'll never have scaling
problems. If you're building a social app to compete with Twitter or the like,
you'll probably never encounter that level of scale, but any financial
application will need to be both fast and able to handle the complexities that
only the JVM can address.

Like he states at the end of the video, when describing the 7000+ tps during
the WWC:

"...we do things like Forex spikes upon our standard baseline growth. So right
now the JVM is really the only mechanism that we can build upon that gives us
the flexibility to do something like that."

~~~
gcampbell
I'm pretty sure he said "4x" rather than "Forex".

~~~
jsavimbi
Good catch, and it makes sense given the context. My mind is stuck in HFT mode,
which constantly has me worried.

------
eurohacker
Is there a good site that explains when to use which technology? For example:

when you need many concurrent users, don't use RoR; use the JVM, C++ or Scala
instead;

if you need to build a fast prototype, build on RoR or PHP, etc.

------
lhnn
An ignoramus's question: Aren't there faster languages than Java?

------
swiharta
I'm pretty sure the project I'm working on will be the next Twitter, and this
video's making me second-guess staying with RoR.

