
Faster Copilot with faster Internet traffic - gus_massa
http://www.joelonsoftware.com/items/2009/02/05.html
======
bprater
I would have liked to see some technical notes in there. It almost comes across
like a glorified sales pitch for two products!

First Joel says that they don't deliver static files and that edge caching
won't help. And then they swap out some IPs, cuddle up to Akamai and bam --
suddenly the Internet delivers.

~~~
wmf
My impression is that Joel has no idea how this Akamai IPA thing works; he is
treating it as magic. I would never buy a service that is advertised as magic,
though.

There is a little technical stuff here:
<http://www.akamai.com/dl/feature_sheets/FS_IPA.pdf> It looks like a fancy TCP
reverse proxy. I wonder if it's based on the MIT RON project.

------
gecko
Hiya. I'm one of the Copilot developers, so I thought I'd answer some
questions.

The Akamai IPA thing is not "magic," and none of us think it is, but dumping a
rambling technical diatribe onto Joel's blog would alienate a lot of its
readership. We're talking on Hacker News now, though, so read on if you're
curious what's going on and how Akamai's helping us.

There are two ways that an Internet connection can be slow: you can have
insufficient bandwidth, and you can have bad lag. With Copilot, we're already
doing just about all we can to minimize bandwidth with our raster-based
format. We've got pretty fast compression algorithms, we dynamically pick
compression levels on-the-fly, and because we're using existing VNC code
under-the-hood, we do a good job sending across just the differences in your
screen. Switching the protocol to be command-based (like X, or RDC) is the
only real way left to improve bandwidth--and with our current developer
resources, that's simply not feasible. The good news is that this discussion
mostly ends up being a non-issue: relatively few customers actually have
bandwidth saturation issues, and internet connection speeds are steadily
increasing, so bandwidth's not generally a big concern.
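
To illustrate the "send only the differences" idea, here's a toy sketch — not
our actual VNC-derived encoding; the tile size and framing are invented for
illustration:

```python
def changed_tiles(prev, curr, tile=16):
    """Compare two framebuffers and return (offset, bytes) pairs for
    each fixed-size tile that changed since the last frame."""
    assert len(prev) == len(curr)
    diffs = []
    for i in range(0, len(curr), tile):
        if curr[i:i + tile] != prev[i:i + tile]:
            diffs.append((i, curr[i:i + tile]))
    return diffs

def apply_tiles(prev, diffs):
    """Viewer side: rebuild the new frame from the old frame plus the
    changed tiles."""
    out = bytearray(prev)
    for offset, chunk in diffs:
        out[offset:offset + len(chunk)] = chunk
    return bytes(out)
```

If only one tile changed, only one tile crosses the wire, which is why a
mostly-idle desktop costs almost no bandwidth.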

That leaves lag. Most of the time, when customers complain Copilot is "slow,"
they're complaining about lag. Copilot gets lag in two places: on the network
connection itself, and on the glorified echo-server (which we call the
Reflector) that we use when none of our firewall bypass tricks work. We try
really hard to ensure you connect directly, but sometimes you simply can't, so
it's very important that the Reflector be very efficient.
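
For the curious, the Reflector's job can be sketched in a few lines: two
clients connect, and it shovels bytes between them. (A toy Python version —
the real Reflector is not this code, and handles pairing and much more.)

```python
import socket
import threading

def pipe(src, dst):
    """Copy bytes from src to dst until src closes: one direction of
    the relay."""
    while True:
        data = src.recv(4096)
        if not data:
            break
        dst.sendall(data)
    dst.close()

def reflect(listener):
    """Accept a pair of clients and relay traffic between them in both
    directions -- a glorified echo server."""
    a, _ = listener.accept()
    b, _ = listener.accept()
    threading.Thread(target=pipe, args=(a, b), daemon=True).start()
    threading.Thread(target=pipe, args=(b, a), daemon=True).start()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))   # ephemeral port for the demo
listener.listen(2)
threading.Thread(target=reflect, args=(listener,), daemon=True).start()
```

Every byte of the session flows through this middleman, which is why its
per-hop overhead matters so much.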

About a year ago, I did some analysis to see how we could speed things up, and
found that our Reflector introduced nearly a quarter second of lag in some
circumstances--not terribly surprising, since it was literally the very first
thing I wrote at Fog Creek, back when I was an intern. The design used one
dedicated thread per connection with blocking I/O. On Windows, that type of
design is very slow. I took a few weeks out and rewrote the Reflector to use
overlapping I/O with thread pools. The resulting Reflector now introduced
virtually no lag, and as an ancillary benefit, used a fraction of the CPU and
proved far easier to maintain. For the time being, our customers were really
happy.
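
There's no portable Python equivalent of Windows overlapped I/O (which is
completion-based), but the readiness-based `selectors` module shows the same
shift away from thread-per-connection blocking I/O. A sketch of the idea, not
the actual Reflector design:

```python
import selectors
import socket

sel = selectors.DefaultSelector()

def accept(listener):
    conn, _ = listener.accept()
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ, echo)

def echo(conn):
    data = conn.recv(4096)
    if data:
        conn.sendall(data)            # bounce it straight back
    else:
        sel.unregister(conn)
        conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen()
listener.setblocking(False)
sel.register(listener, selectors.EVENT_READ, accept)

def serve_once(timeout=0.1):
    """One turn of the event loop: dispatch whatever sockets are ready."""
    for key, _ in sel.select(timeout):
        key.data(key.fileobj)
```

One thread can now service many sockets with no per-connection blocking;
Windows's overlapped I/O goes further by having the kernel hand completed
operations directly to a thread pool.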

Unfortunately, over the last year, customers have complained that our
performance is still sub-optimal. The problem's been compounded because
firewalls are getting smarter, which is making our firewall bypass tricks fail
more and more often. Even if the Reflector truly introduced no lag whatsoever,
you're still gaining a lot of lag when you have a reflected connection, simply
because your data has to make a full round-trip from you to the reflector to
the other party.
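
Back-of-the-envelope, with hypothetical one-way latencies (these numbers are
invented purely for illustration):

```python
# Hypothetical one-way latencies, in milliseconds.
a_to_reflector = 40    # customer A to the Reflector
reflector_to_b = 50    # the Reflector on to customer B
a_to_b_direct = 60     # what a direct connection would see

reflected = a_to_reflector + reflector_to_b    # every packet takes the detour
added_lag_each_way = reflected - a_to_b_direct

print(added_lag_each_way)   # 30 ms extra in each direction
```

A keystroke and the screen update it triggers each pay that penalty, so the
perceived round trip grows by twice the one-way difference.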

That's where Akamai comes in. The main thing that Akamai's doing for us is
dramatically reducing lag between our data center and our customers. That
means your keystrokes get to the other side faster, the machine starts
responding faster, and you get the graphical updates faster. Even though
bandwidth hasn't increased, your Copilot experience suddenly becomes _far_
more pleasant, because the machine feels much more responsive.

We're still monitoring our clients to see how much of a difference Akamai's
making, but in general, what we're seeing is great. The bitrates for Akamai-
enabled connections have gone way up, since the higher response rate allows our
users to get more done in less time, and we've been able to pull off some
stunts--like watching YouTube videos via Copilot--that were previously
impossible over reflected connections.

Akamai's not a panacea, and what they're doing isn't magic, nor is it useful
in all circumstances, but for a real-time application such as Copilot, it can
yield a pretty dramatic improvement in exchange for very little developer
time.

[EDIT: Fixed some snafus I caught when re-reading.]

~~~
blasdel
_The design used one dedicated thread per connection with blocking I/O. On
Windows, that type of design is very slow. I took a few weeks out and rewrote
the Reflector to use overlapping I/O with thread pools._

How'd you become enlightened as to the terribleness of your original design?

From what I can tell "overlapping I/O" appears to be windows-ese for good old
poll(2) -- non-blocking asynchronous I/O with level-triggered notification.
Does Windows not have a way to do edge-triggered notifications like Linux's
epoll(7)?

Also, is there some incentive on Windows for using a thread pool instead of a
handful of independent processes per box? Realistically you'll have the server
running on multiple machines just for sanity's sake, and you'll have a way to
distribute clients across them with failover, so why have two levels of
indirection when one will do?

~~~
barrkel
FWIW, it's not obvious that the one-thread one-connection blocking I/O design
is terrible. How terrible it is depends on address-space exhaustion from too
many thread stacks (a non-issue in 64-bit code) and on the overhead of the OS
thread scheduler. In a green threading model, I/O can be implemented behind
the scenes as non-blocking I/O, but still give user code the ease of use of
blocking I/O.
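
Python's asyncio is a concrete example of that trade: the handler below reads
like blocking code, but each `await` parks the coroutine while the event loop
services other connections underneath. (An illustrative echo server, unrelated
to Copilot's actual code.)

```python
import asyncio

async def handle(reader, writer):
    # Looks like blocking I/O, but each 'await' suspends this coroutine
    # and lets the event loop multiplex every other connection.
    data = await reader.read(4096)
    writer.write(data)
    await writer.drain()
    writer.close()

async def demo():
    # Start the server on an ephemeral port, connect one client,
    # send a message, and collect the echo.
    server = await asyncio.start_server(handle, "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    writer.write(b"hello")
    await writer.drain()
    reply = await reader.read(4096)
    writer.close()
    server.close()
    return reply
```

User code keeps the ease of the blocking style while the runtime does the
non-blocking bookkeeping.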

Back to Windows: overlapped I/O is not equivalent to poll, it's much better
than that. It is edge-triggered, but Windows manages the threads itself, and
tries to ensure 100% CPU utilization in user code, rather than in thread
switching logic.

Windows overlapped I/O is effectively a way to write efficient I/O intensive
applications in continuation-passing style. You don't need to manage the
threads or dispatch loop yourself.

As to why use threads rather than processes, the issue is largely the
awkwardness of duplicating the connection handle for a child thread. In Posix
land, there's the fork function that makes handing off the handle fairly easy.
WSADuplicateSocket can create a new socket handle that can be used by a child
process, but you need to know the child process's ID first, so you need to
perform IPC just to send off the handle.
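
Here's what the POSIX side of that looks like in Python: `fork()` after
`accept()`, and the child simply inherits the connected socket — no
duplication dance. (POSIX-only sketch, for illustration.)

```python
import os
import socket

def serve_one_forking(listener):
    """Accept one client, then fork; the child inherits the connected
    socket for free -- the hand-off Windows makes you work for."""
    conn, _ = listener.accept()
    pid = os.fork()
    if pid == 0:                       # child: owns the connection now
        listener.close()
        data = conn.recv(4096)
        conn.sendall(data.upper())     # trivial stand-in for real work
        conn.close()
        os._exit(0)
    conn.close()                       # parent: drop its copy, reap child
    os.waitpid(pid, 0)
```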

On the other hand, shared memory concurrency has its advantages too.

~~~
blasdel
_FWIW, it's not obvious that the one-thread one-connection blocking I/O design
is terrible._

Any time you're using a 'thread' idiom, even if they are green, something has
to be maintaining their individual execution state (stack frames, thread-local
address space, etc.). I like to think of it as being analogous to unoptimized
tail-recursion, with a purely event-driven callback model being like tail-call
optimization.

Thanks for explaining Windows' overlapped I/O, it's way better than I thought
it was from reading the documentation. I didn't realize that it does all the
thread-pooling stuff all by itself -- I guess it makes the most sense not to
fight it, and just accept the free utilization (even if it does give a slight
latency hit).

~~~
barrkel
No, not tail call: the event-driven callback model _is_ continuation-passing
style (CPS). The two models are equivalent. Code written in the thread idiom
can, in principle, be mechanically transformed into CPS and thereby use the
event-driven callback model.

In the event callback style, the state that would otherwise be on the stack is
held in the callback closure.

In the .NET space, F# does this transformation automatically with what it
calls asynchronous workflows
(<http://blogs.msdn.com/dsyme/archive/2007/10/11/introducing-f-asynchronous-workflows.aspx>).
It uses the 'let!' assignment as a dividing
point between the call to the asynchronous method and converting the remainder
of the method body into a continuation that gets passed to the asynchronous
call.
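
The transformation is easy to do by hand in any language with closures. Here's
the same two-read routine in thread style and after a manual CPS rewrite
(Python, purely illustrative):

```python
# Thread idiom: 'first' lives on the stack across the second call.
def read_twice(read):
    first = read()
    second = read()
    return first + second

# After the CPS transform: each step takes a callback, and 'first'
# survives in a closure instead of a stack frame.
def read_twice_cps(read_async, done):
    def got_first(first):
        def got_second(second):
            done(first + second)
        read_async(got_second)
    read_async(got_first)
```

What F#'s async workflows (and similar compilers) do is perform exactly this
rewrite mechanically, so you can keep writing in the thread idiom.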

------
nixme
Like Joel, I only knew about Akamai's CDN services. Are there any other
companies (CDN or otherwise) that provide a similar service?

~~~
bprater
What media are you trying to deliver? Traditional CDNs are very expensive --
like cell phone minutes, you pay up-front for X number of GB. If you buy
1000GB and burn 20GB, you still pay your $10k/month bill.

Amazon is making a big play in this space with 'pay for what you eat' pricing.
Hopefully this will encourage CDNs to take a look at their pricing schedules.

~~~
nixme
Not CDN service or delivering media. More like this proxy-type service he
describes where you cut out a lot of middle-tier routing and pay for better
backbone transit.

For personal use, this would be a great way to proxy your own traffic when
using the Internet overseas, where accessing US-based sites can be very slow.
Or even for setting up a faster coast-to-coast VPN within the US.

~~~
wmf
It's been tried: <http://en.wikipedia.org/wiki/Gamerail>

------
gojomo
Akamai is violating net neutrality!

~~~
Retric
Nope. They don't own the lines.

~~~
gojomo
Akamai either owns the lines on the "superfast superclean superhighway"
(Joel's words) between the Akamai nodes, or they have purchased preferential
traffic treatment from someone who owns those lines.

~~~
seiji
"superfast superclean superhighway" means "we have a private network." There's
nothing special going on. Call up your telco and say you want a 10 gigabit
connection from new york to LA and they'll be happy to oblige.

Just because two computers in the world are talking doesn't mean they use the
big-i Internet.

~~~
gojomo
Such a purchased private network is what I meant by "owns the lines" (even if
in practice such connections are rented from telcos).

Which leads me back to my main point: why should 'neutrality' regulations care
whether a telco offers a 10 gbps dedicated line, or preferential routing for
10 gbps worth of traffic on their shared lines?

And if the motivation for buying either of those is the exact same value --
snappier service for your end users -- that Akamai sells, why is Akamai's
service celebrated, but alternate services based on traffic shaping considered
a 'neutrality' violation?

~~~
paul
Because when your local ISP does it, they are abusing their monopoly.

If it was actually an open market with real competition, then I wouldn't care
about net neutrality, because there would be plenty of options for non-broken
internet access. However, if I can only get access through one or two
providers, then those providers have an incentive to extort money out of
websites by threatening to degrade access to their sites.

I think the real fix to the net neutrality problem is to open up local access
so that the phone/cable company only provides the wire, and an unlimited
number of ISPs compete to provide the net access. The phone/cable company
probably shouldn't be allowed to provide the internet access, since they will
inevitably abuse their monopoly (and necessary monopolies should be kept as
limited as possible).

~~~
gojomo
Agreed 100% that local competition is the best fix, either by enforced sharing
of the scarce last-hop wires, or introduction of new paths (new wires or
wireless technology).

Enforced 'neutrality' could push real competition further away, by limiting
the kinds of offerings and profits that attract new entrants.

Also, for a sense of proportion: even though local access in the US is usually
a cable/dsl duopoly, and total speeds lag some other national markets, prices
continue to fall and speeds continue to increase. No last-mile ISP has
'extorted' websites with threats of degraded access.

So there's no urgent need for legislators and regulators to start dictating
how service is delivered, as if there will never be competition. If they want
to help, they should adopt policies that spur new entrants, not tie the hands
of current providers.

~~~
kragen
Historically we've had real neutrality without any regulation. In the last few
years the local monopolies have made a number of moves to turn that neutrality
off, and in some cases there have been serious abuses; and they've also
started trying to persuade politicians that neutrality is a bad idea.
Regulation that maintains the status quo of network neutrality is going to be
a lot easier to achieve than regulation repeal to strip local phone and cable
providers of their legal regulated monopolies.

~~~
gojomo
_Regulation that maintains the status quo..._

...is almost always a bad idea. We need new services, new investment, new
entrants.

Openness can defend itself -- it got this far. If any of the alleged 'abuses'
truly became 'serious', that would hand a marketing coup to competitors and a
spur to new entrants.

The idea that regulating the existing monopolies is 'easier to achieve' than
true competition is a self-fulfilling prophecy. New regulations entrench
present-day services, and flatter regulators into thinking only they, and not
competition, can protect customers.

If a shortage of competition is the real problem -- and it is -- policies
should address that, not enshrine the current lack of competition in
regulations.

