
Java IO is faster than NIO: Old is New Again - rkalla
http://www.thebuzzmedia.com/java-io-faster-than-nio-old-is-new-again/
======
rkalla
For those that want the "Quick WTF?!", it is because:

* With modern operating systems, idle threads are practically free.
* Managing non-contending threads is extremely inexpensive now.
* Multi-core systems.
* The selectors and state-restore used by asynchronous NIO libraries in high-load environments are more expensive than putting threads to sleep and waking them up.

You mix all that together and you get a new appreciation for java.io.
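The thread-per-connection blocking model the comment is praising can be sketched in a few lines of java.io. This is a minimal illustration, not code from the article, and the class name and buffer size are mine:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.ServerSocket;
import java.net.Socket;

// Sketch of the thread-per-connection blocking model: one cheap blocking
// thread per socket, no selector bookkeeping at all.
public class BlockingEchoServer {

    // Bind to an ephemeral loopback port and accept connections forever,
    // spawning one thread per client.
    public static ServerSocket start() throws IOException {
        ServerSocket server = new ServerSocket(0);
        Thread acceptor = new Thread(() -> {
            try {
                while (true) {
                    Socket client = server.accept();
                    new Thread(() -> echo(client)).start();
                }
            } catch (IOException closed) {
                // server socket was closed; shut down quietly
            }
        });
        acceptor.setDaemon(true);
        acceptor.start();
        return server;
    }

    // Plain blocking read/write loop; the thread sleeps inside read() while
    // the connection is idle, which is the cheap part the comment points at.
    private static void echo(Socket client) {
        try (Socket c = client;
             InputStream in = c.getInputStream();
             OutputStream out = c.getOutputStream()) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
                out.flush();
            }
        } catch (IOException ignored) {
            // client went away
        }
    }
}
```

The idle cost per connection is one parked thread and its stack, which is exactly the trade the presentation argues is now cheap.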

~~~
jbooth
The missing detail from all of these, of course, is that transferTo doesn't
work with blocking I/O. You need to use a selector on WRITE to the client
socket to make sure you're actually allowed to transfer bytes. Given that
transferTo involves zero-copy (always faster than copying to buffers), a mix
of the two approaches is usually best.
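A minimal sketch of what jbooth is describing, assuming a non-blocking SocketChannel (class and method names are mine): transferTo() can move fewer bytes than requested -- including zero when the kernel send buffer is full -- so you select on OP_WRITE and resume the transfer when the socket becomes writable again.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySender {

    // Send `file` over `socket` with zero-copy transferTo(), waiting on a
    // selector whenever the socket's send buffer is full.
    public static void send(Path file, SocketChannel socket) throws IOException {
        socket.configureBlocking(false);
        try (FileChannel src = FileChannel.open(file, StandardOpenOption.READ);
             Selector selector = Selector.open()) {
            socket.register(selector, SelectionKey.OP_WRITE);
            long pos = 0, size = src.size();
            while (pos < size) {
                selector.select();               // block until socket is writable
                selector.selectedKeys().clear();
                // zero-copy transfer; may send fewer bytes than requested
                pos += src.transferTo(pos, size - pos, socket);
            }
        }
    }
}
```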

~~~
rkalla
jbooth,

Interesting post -- I don't know if that is something Paul evaluated. What
mix/ratio would you propose?

Something like 1:10 threads:selector/connections?

~~~
jbooth
Well, if you're getting into that, you're far enough down into the rabbit hole
of optimization that you probably have a reason for being there and some
business-specific logic that would drive some of your specific tunings.

I'd generally recommend a selector identifying sockets that are ok for write
and then delegating to a caching thread pool to handle actual transferTos. If
that select loop becomes your bottleneck, you can of course horizontally scale
that and have a set of selectors, striping the register()s across them to get
less lock contention and having them all feed into the caching threadpool.
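That design can be sketched roughly as follows. All names here are mine, and it is deliberately simplified: each socket is handled one-shot and switched back to blocking mode for the transfer, whereas a real server would re-register channels whose transfers can't complete.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.FileChannel;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.SocketChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// A select loop identifies sockets that are OK for write, then delegates the
// actual transferTo() work to a caching thread pool.
public class SelectAndDelegate {
    private final ExecutorService pool = Executors.newCachedThreadPool();

    // Serve `file` to every channel registered with OP_WRITE on `selector`,
    // returning once all the delegated transfers have finished.
    public void dispatchOnce(Selector selector, Path file)
            throws IOException, InterruptedException {
        selector.select();                  // block until a socket is writable
        List<SocketChannel> ready = new ArrayList<>();
        for (SelectionKey key : selector.selectedKeys()) {
            ready.add((SocketChannel) key.channel());
            key.cancel();                   // one-shot: deregister the socket
        }
        selector.selectedKeys().clear();
        selector.selectNow();               // flush cancelled keys so blocking
                                            // mode can be restored below
        CountDownLatch done = new CountDownLatch(ready.size());
        for (SocketChannel ch : ready) {
            pool.submit(() -> {
                try (FileChannel src =
                         FileChannel.open(file, StandardOpenOption.READ)) {
                    ch.configureBlocking(true);  // transfer on a pooled thread
                    long pos = 0, size = src.size();
                    while (pos < size) {
                        pos += src.transferTo(pos, size - pos, ch);
                    }
                    ch.close();
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                } finally {
                    done.countDown();
                }
            });
        }
        done.await();
    }

    public void shutdown() {
        pool.shutdown();
    }
}
```

Scaling this out as described would mean several such selectors, with new channels striped across them via register(), all feeding the same cached pool.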

------
abstractbill
This talks about numbers of threads being as high as 1000.

My job involves writing servers that can scale to at _least_ 20,000 concurrent
connections (I use Twisted for that, but I'm still interested to see what's
happening in JavaLand).

I was disappointed the article didn't look at things at that kind of scale -
it would be a much more impressive result if threaded/blocking io was still
best at that level.

~~~
metachris
This article is just a summary of Paul Tyma's 2008 presentation (posted to HN
yesterday): <http://www.mailinator.com/tymaPaulMultithreaded.pdf>

I am working on a multiplayer framework using Python, where one gameserver can
potentially handle up to 50,000 concurrent socket connections (each
interacting with a couple of other ones): <http://www.flockengine.com>

epoll() is definitely the way to go - it's very fast with high numbers of
sockets.

\- <http://docs.python.org/library/select.html#edge-and-level-trigger-polling-epoll-objects>

\- <http://linux.die.net/man/4/epoll>

------
famousactress
I guess I'm not surprised. My reason for using NIO in the past has been
because I've had systems that supported _many_ long-running, mostly idle open
sockets. Seems logical that there would be a tradeoff to abstracting threads &
state from connections.

~~~
bnoordhuis
You're spot on. NIO has always been slower but it allows you to process more
concurrent connections. It's essentially a time/space trade-off: traditional
IO is faster but has a heavier memory footprint, NIO is slower but uses fewer
resources.

~~~
dylanz
... and I think your synopsis is spot on as well. If someone could counter
this, I'd love to hear it.

------
kls
I did not see the specs for the systems used to perform the test. If they
were using a multicore server, this would surely slant the results in favor of
the threaded model. As well, the JVM/Java was designed from the first release
around blocking-IO-style development, so you have years of development behind
that design philosophy.

I just don't see how, apples to apples on a single core, blocking can be
faster than NIO. I think C would be a better testing ground, where you don't
have layers built up over the years that reinforce one over the other.

~~~
dkarl
_If they were using a multicore server_

Where would they find a single-core server these days?

~~~
frio
Typically, in the non-blocking IO space, you take advantage of the multiple
cores by running multiple instances of your server. For instance, I run 4
Tornado instances on my quad-core server, sitting behind a simple nginx proxy
(this is, of course, Python rather than Java).

Admittedly, I haven't performed any benchmarks (I chose Tornado for reasons
other than performance), but as with the OP, I was hoping the author of the
article would provide more detail on their benchmark setup.

------
therockhead
The Atmosphere team has a good slide deck titled "Scaling the Asynchronous
Web" which has a few pages on threads vs. NIO:
[https://atmosphere.dev.java.net/conferences/2009/JavaOne/200...](https://atmosphere.dev.java.net/conferences/2009/JavaOne/20090602_ScalingPushJ1.pdf)

------
frognibble
Here's the previous discussion on Paul Tyma's presentation:
<http://news.ycombinator.com/item?id=1546711>

------
redrobot5050
Faster is the wrong word -- scalable. It was definitely talking about
scalability and load, not just pure speed.

~~~
rkalla
redrobot5050,

The first 30 slides or so are explicitly about the 25% performance gap between
NIO and IO on a modern system -- I think it's fair to say the article was as
much about performance as it was scalability.

I think he even addresses the misconception that NIO is perceived as "faster"
simply because it's more "scalable" -- see the "myths" slide that he keeps
going back to, crossing elements off as he proves them wrong.

