
The lost Van Jacobson paper that could save the Internet  - wglb
http://rdist.root.org/2011/12/30/the-lost-van-jacobson-paper-that-could-save-the-internet/
======
Locke1689
Interestingly enough, Van Jacobson has continued this line of research, but
the future of Internet networking looks radically different from what was
proposed in those early days.

A pivotal recognition is that the Internet nowadays is dominated by _content_,
not by endpoints. The modern CDN is really a very complex kludge for dealing
with what we now believe to be the Internet's primary purpose: content
delivery. The Internet was originally designed around point-to-point TCP/IP,
with the most important aspect being the communication of two parties. Now the
most important part is that a given party (or really many parties) receive a
specific piece of content, regardless of where that content comes from. In
this light, it is wise to move the Internet's infrastructure away from
standard TCP/IP and instead build knowledge of content and caching layers
directly into the transport protocols.

The preliminary paper that deals with many of these issues is Jacobson's
Networking Named Content [1]. The general idea is that one requests content
instead of communicating with a specific party. ISPs and other intermediate
layers cache content blocks (because it's in their own financial best
interest, saving money on peering costs and networking hardware), and your
request for content is served by the nearest available hot cache. Many
other papers have been written with their own tweaks, but this is the general
idea of what the next generation of the Internet should look like (from the
distributed-systems researcher's perspective). In my Distributed Systems in
Challenging Environments class, one of the things my partners and I were
working on was building a prototype of CCN (the protocol described in
Jacobson's paper) on top of the current TCP/IP layer to examine its real-world
characteristics and swarm behavior. We never completely finished the prototype
for Contemplating Content Centric Protocols, but it was an interesting project
nonetheless.

[1] http://conferences.sigcomm.org/co-next/2009/papers/Jacobson.pdf

~~~
tptacek
This is such a great thread.

I've been hoping to see something tangible from VJ on this for a while now. He
was doing content-based networking even back when he was at Cisco.

To an extent that I think is recognized by a lot of networking researchers,
including (IIRC) YC's Robert Morris at MIT, the future of the Internet is
probably not IPv6 and a network consisting of operating systems addressed by
scalar integers.

Instead, IP-type protocols are going to assume the same role that Ethernet and
802.11 (and ATM) play today, as a substrate for connectivity on top of which
real applications will be built. The wide area discovery and "session" (OSI
style) management roles played by IP and TCP will be played by overlay
networks. We have primitive overlays now (DHTs, BitTorrent, Skype) and
proprietary ones (CDNs like Akamai, whatever the cable content providers are
using to push VOD to the edge), but eventually we'll get a general-purpose
open one and the game will be on.

I think this follows straightforwardly from Reed's end-to-end argument.
Intelligence belongs to the edges, not the core of the network; intelligence
at the core is necessarily a lowest-common-denominator affair, and is hard to
scale. That's a key reason why we don't have multicast today. Routers are so
bad at scale that tier 1 NSPs filtered BGP announcements by prefix length,
because they couldn't handle /32-granular routing for even the tiny subset of
machines that actually wanted it (which raises the question of how we ever
expect IPv6 to do the things the typical HN reader hopes it will do, like
provide them with their own portable /16-equivalents). Multicast effectively
asked those same overtaxed routers to address _web pages_, _individually_.

That's obviously not going to work when the service model involves random
people demanding that AT&T and Level3 add routing table entries for their
podcasts. But it scales just fine when the thousands of machines interested in
those podcasts share a protocol and an infrastructure for arranging a fan-out
overlay.
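
As a toy illustration of the fan-out idea (invented names, an arbitrary
fan-out factor, and none of the real protocol work involved): the publisher
only ever pushes to a handful of children, and each node relays onward.

    from collections import deque

    def build_fanout_tree(source, peers, fanout=3):
        """Arrange subscribers into a simple breadth-first k-ary relay tree."""
        tree = {source: []}
        frontier = deque([source])
        remaining = list(peers)
        while remaining:
            parent = frontier.popleft()
            children, remaining = remaining[:fanout], remaining[fanout:]
            tree[parent] = children
            for child in children:
                tree[child] = []
                frontier.append(child)
        return tree

    tree = build_fanout_tree("podcast-source", [f"peer{i}" for i in range(10)])
    for node, kids in tree.items():
        print(node, "->", kids)   # the source only ever talks to 3 peers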

So many unsolved problems here, from "what is the most reasonable kernel of
overlay assembly, group management, and routing to deploy to end-systems" to
"how do we avoid congestion collapse in arbitrary group messaging protocols"
(a study of multicast reliability algorithms --- at least, from the late 90s
when I last did this stuff --- will probably horrify you). But the great thing
about it is, nobody has to ask permission to make this stuff work; we can
evolve to the _real_ next Internet without getting Comcast's permission, or
for that matter (more perniciously) the IETF's.

~~~
Locke1689
Yes, there are a huge number of advances to be made in this area. IP multicast
was certainly a first draft of making this kind of networking possible, but
the fundamental problem is its incompatibility with the way the rest of the
Internet is run.

The problem we were trying to address in the graduate class is that Jacobson's
paper simply assumes that the Internet suddenly "switches," like a lightbulb,
to this new method of routing. I think everyone agrees that something that
looks like CCN will become the future, but we were interested in the road to
get there. We were trying to examine the problems associated with porting to
this kind of network without preexisting infrastructure in place.

One thing that you may find interesting is that, on current IP infrastructure,
the network tends to look more like a very layered BitTorrent network, so it
may end up that the most effective research ground for an intermediate
protocol would be adapting BitTorrent. The second area that I think is really
interesting is secure and (perhaps more importantly) authenticated
communication in this protocol. There's an entire other paper, cited at the
bottom of the CCN paper I linked to, that details the cryptographic
authentication used in CCN, but there are a number of things that I think
either need to be worked through in more detail or leave me a little uneasy.

~~~
tptacek
We got funded for this idea, by Sony, during the first bubble -- before
BitTorrent, when FEC-based multicast file transfer was either FLID/DL (an IETF
standard that never went anywhere) or the startup by the guy who invented
Tornado codes.

We had a centralized tree-structured directory for discovery, and then (at
first) deployable nodes running group messaging and a weighted link-state
routing protocol (more or less cribbed from OSPF), then later a small kernel
message-forwarding scheme with a programmable control plane so we could build
arbitrary routing protocols in Tcl.

Our initial application was chat (we overengineered, like, a little) and we
pivoted to streaming video.

We died in part because we hadn't the slightest clue what we were doing, and
in part because the VCs replaced our management with a team that decided our
best bet was to take our superior technology and go head-to-head with Akamai
with our own streaming video CDN.

We'd have been better off just open-sourcing.

Anyhow: I'm obviously biased, but the way I think this is going to happen is,
some open source project to build arbitrary overlays is going to catch on (the
overlay will be a means-to-an-end for some killer app people actually care
about; my bet is, telepresence).

~~~
Locke1689
Very interesting. The main difference in our work is that we really latched on
to VJ's idea of breadcrumbs and caching blocks. When I alluded to BitTorrent I
wasn't really referring to FEC-based distribution. The problem with FEC is, as
you said elsewhere, that it's not very efficient for small files and it
currently isn't very location aware (aside from Ono, my professor's research
lab's BitTorrent plugin), so you can end up going halfway around the world for
relevant pieces of the file.

Instead, we were more interested in the robust swarm implementations already
present in some BitTorrent clients, like fairly efficient DHTs. It is also an
existing protocol with millions of users, so we could piggyback on BitTorrent
clients with plugins to measure the real performance all over the world.

The key difference is the caching. Jacobson proposes that the caching would
happen in the ISPs, but there's no reason peers couldn't take up the slack,
just with different monetary incentives. The protocol here becomes less like
OSPF and more like a distributed k-minimum-spanning-tree problem. If you can
find the "supernodes" then you can implement caching at them and prevent
unnecessary communication.
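
A very rough sketch of the "find the supernodes and cache there" idea -- just
a toy per-AS heuristic, not the distributed k-MST formulation, and all the
numbers are made up:

    from collections import defaultdict

    # (peer_id, AS number, upload capacity in kbit/s) -- invented sample data
    peers = [
        ("p1", 7018, 5000), ("p2", 7018, 800), ("p3", 7018, 12000),
        ("p4", 3356, 2000), ("p5", 3356, 9000),
    ]

    def pick_supernodes(peers, per_as=1):
        """Within each AS, pick the highest-capacity peers as cache points."""
        by_as = defaultdict(list)
        for peer_id, asn, capacity in peers:
            by_as[asn].append((capacity, peer_id))
        return {asn: [pid for _, pid in sorted(members, reverse=True)[:per_as]]
                for asn, members in by_as.items()}

    print(pick_supernodes(peers))   # {7018: ['p3'], 3356: ['p5']}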

Of course, if you think anything like me, your first thought is, "do we
actually know if data requests are reliably concentrated in a specific area
such that local caching will actually provide much of a benefit?" My partners
thought the same thing, which is why they ended up doing a BitTorrent traffic
analysis to attempt to plot an AS distribution curve for data requests [1].
They found that there was huge redundancy in intra-AS requests, so there's at
least potential here.

Of course there are still a number of problems to be worked out -- if the
supernodes don't have an easy incentive to provide caching to other people in
their AS, how do we develop such an incentive? But the overall protocol seems
very interesting. Most notably it would _not_ work like FEC and small files
would probably benefit even more from the caching (which is why you'll also
note that files are split into blocks or datagrams or whatever you wish to
call them in the VJ protocol).

[1] <http://acm.eecs.northwestern.edu/site_media/cccp.pdf>

I really don't deserve to be on this paper -- Zach and John really ran this
show.

------
tptacek
The linked Gettys article is fantastic too:

https://gettys.wordpress.com/2010/12/17/red-in-a-different-light/

VJ is at PARC now.

The old VJ LBL papers are still collected at ICSI/ICIR (Vern Paxson is still
there!):

http://www.icsi.berkeley.edu/cgi-bin/pubs/browse.pl?groupid=000002

There used to be some attention paid to deliberate evasion of congestion
avoidance (Stefan Savage did a paper on it), the idea being that a malicious
user could monopolize a link by ignoring congestion signals (interview
question: when you upload a file via Wi-Fi over your DSL connection, how does
your computer know how fast to send packets?). Maybe part of the reason that
turned out not to matter is that we were in a temporary period of not needing
congestion avoidance.

Finally, as I understand it, at least some of the fundamental ideas behind
VJ's "hydrodynamic" congestion avoidance scheme are derived from Raj Jain's
CUTE, which was done for DEC in '85.

~~~
NateLawson
Yes, that blog post by Jim is what led me to investigate and write about this.

The Stefan Savage paper you refer to wasn't so much about ignoring congestion
signals as about deliberately faking reception of data that hadn't actually
been received. With things like duplicate ACKs or optimistic ACKing, you could
cause the sender to open its window to whatever size you want.
<http://cseweb.ucsd.edu/~savage/papers/CCR99.pdf>

Back in 1999, I was working on integrating these ideas into a router product.
It would be software in your home gateway that would keep a cache of the last
observed window size for each pair of hosts for a few seconds.

If you started another TCP connection to the same host within that short
interval, the router would spoof both sides to increase the window size to
half the previous value and then let additive increase take over from there.
This bypasses slow start, but over such a short timescale the currently
available bandwidth is likely to be very close to the estimate, and hosts
didn't cache window size themselves, even when making multiple connections in
a short period (e.g., HTTP 1.0).
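
In sketch form, the cache logic looked roughly like this (toy Python with
invented names and numbers; the real thing also had to spoof the actual TCP
segments, which isn't shown):

    import time

    CACHE_TTL = 5.0            # seconds to trust an old estimate
    INITIAL_WINDOW = 3 * 1460  # roughly what slow start would begin with

    window_cache = {}          # (src, dst) -> (last observed window, timestamp)

    def record_window(src, dst, window_bytes):
        window_cache[(src, dst)] = (window_bytes, time.time())

    def starting_window(src, dst):
        entry = window_cache.get((src, dst))
        if entry:
            window, seen = entry
            if time.time() - seen < CACHE_TTL:
                # Jump to half the previous estimate; additive increase
                # takes over from there instead of slow start.
                return max(window // 2, INITIAL_WINDOW)
        return INITIAL_WINDOW  # no fresh estimate: normal slow start

    record_window("10.0.0.2", "93.184.216.34", 64 * 1024)
    print(starting_window("10.0.0.2", "93.184.216.34"))  # 32768
    print(starting_window("10.0.0.2", "198.51.100.7"))   # falls back to slow start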

I archived this paper and a bunch more here. Prepare to lose an afternoon if
the history of TCP/IP interests you.

<http://root.org/ip-development/>

~~~
eternalban
I'm interested in your opinion on using a Bayesian or fuzzy approach to this
system-control problem.

~~~
eternalban
<http://privatewww.essex.ac.uk/~fleum/csvt.pdf> for example.

------
alecco
For Internet routing, theory and reality can be very far apart. RED in
particular has a terrible record of making things worse on live systems. But
that was a few years ago, and perhaps newer variants are better.

Also, there's a huge problem with routing algorithms due to hundreds of
patents covering very trivial techniques. For example, there is an awful
patent covering the very important case of not dropping tiny ACK packets. For
congestion control, it's more useful to drop big TCP data packets to make the
endpoints adjust. And the key networking-gear players are fully behind those
crazy software patents.

So my prediction is that things will stay the same and nothing meaningful will
happen for 10 years, until a few key patents expire. Nobody will risk exposure
to a multimillion-dollar IP lawsuit. And anyway, we the users will just pay
the bills either way, so the telcos don't care.

You can thank Big Telcos, Cisco and many others.

Of course, big players with a huge patent war chest (e.g.
Google/Amazon/Microsoft) will advance their WANs.

------
lrizzo
These days, proper packet scheduling is probably a better alternative to
approximate AQM solutions such as RED and the many, many variants that have
appeared in academic papers.

RED was designed at a time when implementing packet schedulers was expensive
(theory was not completely there yet, firewalls were not widespread, etc.).
Now there are fast packet scheduler implementations available. Our own QFQ [1]
is distributed with the official FreeBSD and Linux sources, and runs
reasonably fast even on an OpenWRT device (remember that AQM needs to act at
the bottleneck [2], which is mostly your DSL router for upstream traffic).

You can try it out quite easily, and if you are willing to set up a few
dummynet pipes [3] (FreeBSD/Linux/Windows), you don't even need to touch the
DSL router.

References:

[1] <http://info.iet.unipi.it/~luigi/qfq/> but the code is already part of
recent FreeBSD and Linux distributions (edited: and even on OSX you'll find an
older but similar scheduler as part of ipfw+dummynet), you just need to load
the relevant modules.

[2] there were, long ago, some proposals/products which did ACK pacing on the
reverse path to do congestion control on the forward path. I think it is
tricky to make them work.

[3] <http://info.iet.unipi.it/~luigi/dummynet/> . You can set up two pipes
with a bandwidth slightly below that of the DSL link, so you can control the
bottlenecks. How you can determine the bottleneck bandwidth is a different
story. You could look at the negotiated speed on the modem, often available
through its web pages, but probably you'll have other bandwidth caps
elsewhere. You could try running some of the bandwidth-probing apps against a
nearby host, hoping that the bottleneck is your local DSL, and dynamically
reconfigure the pipes accordingly. I have not tried this yet.

~~~
tptacek
Isn't packet scheduling experimentation also one of the reasons for the Click
Modular Router project at MIT PDOS?

The Click codebase is one of my favorite pieces of C++ code ever.

------
jws
It's worth remembering that there are cheaters on the internet. I have been
one.

I built and sold a product that cheated on TCP in order to minimize negative
customer experiences during periods of congestion.

It was a one-line change in the Linux networking code, and suddenly their
crappy networks were no longer a problem!

(Just so you don't think me completely evil…

All traffic was across their own networks.

Their networking folks considered 5% packet loss to be acceptable.

The product's purpose was to aggregate and tunnel connections. Multiple TCP
inside TCP gets unfairly punished by the congestion algorithms, which is why
most VPNs don't use TCP, but sometimes you find yourself under the thumb of
administrative rules and you have to do it.

I simply capped our backoff time to something like 10 seconds. Short enough
that no one thinks the device broke because "the network was stuck, but now I
can ping, but your device is hung, so I'll power cycle it". If your network
can't take 1 packet per 10 seconds on behalf of a few dozen aggregated
connections, it isn't me that broke your network.)
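
For illustration, the effect of the change was roughly this (a toy Python
rendering, not the actual Linux patch):

    INITIAL_RTO = 0.5   # seconds
    BACKOFF_CAP = 10.0  # never back off longer than this

    def next_rto(current_rto):
        # Normal exponential backoff, but capped so the tunnel is never
        # silent for more than ~10 seconds.
        return min(current_rto * 2, BACKOFF_CAP)

    rto = INITIAL_RTO
    for attempt in range(8):
        print(f"retransmit {attempt}: waiting {rto:.1f}s")
        rto = next_rto(rto)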

~~~
tptacek
I had been thinking of:

<http://cseweb.ucsd.edu/~savage/papers/CCR99.pdf>

(And Nate's right, I drastically oversimplified my summary of it). This is
"cheating TCP" by Stefan Savage from '99. A great, great paper.

------
listening
VJ talks about something called "Evolutionary Pressures" at the end of his
Google Talk from a few years ago. Watch the video if you haven't already.

SOPA, if it passes, may create such pressure. Its main target is DNS.

CDNs like Akamai rely on universal use of one DNS, the one that SOPA aims to
regulate, to accomplish their kludge.

Food for thought.

I prefer "Chicago" to any of the other alternatives.

End-to-end can be realised. Overlays work for small groups. Small groups can
share with each other. Networks connecting to other networks. It's been done.
;)

Evolutionary pressures may help.

~~~
tptacek
I stupidly downvoted you (SOPA! GAH!) but you made a good point, so, sorry.

Generally, one of the reasons I don't freak out about laws regulating the
Internet is that the Internet as we know it today is rickety anyway.

In the last 10 years we've seen the monumental shift (predicted in the late
'90s but widely scoffed at) from ad hoc network protocols and native
client/server implementations to a common web platform. Nerds still recoil a
little at this, thinking about the native/ad-hoc stuff they still use (like
Skype), but if you look at the industry as a whole, particularly in the line-
of-business software development that actually dominates the field, it's all
web now. TCP ports are mostly irrelevant. If you were going to start a new app
today, would it be a native implementation of a custom protocol? Probably not!

One of the things that got us to this state was in fact regulatory: default
firewall configurations that don't allow anything but web out, and disallow
arbitrary inbound connections.

Over the next 10-15 years, I'm hoping we'll get similar nudges away from
first-class addressing only for machines (which are less and less useful as an
abstraction for completing tasks using technology) and towards first-class
addressing for subjects, interests, content, &c. This is an insight _lots_ of
people have had, from David Cheriton & TIBCO in the '90s through the RON work
at MIT through VJ's work at PARC & so on.

I wrote off Lessig for a bunch of years after reading _Code_, but I think he
fundamentally has it right in the sense that implementors have as powerful a
say in how things are going to be regulated as legislators do. America had the
Constitutional Convention after the Articles stopped scaling; the Internet
will have little blips of architectural reconsideration when the impedance
between the technology and what people want to legitimately do with the
technology gets too high.

(I'm trying to make a technical point here, not a political one; I support
copyright, and am ambivalent about SOPA.)

~~~
listening
With the widespread use of anycast, does "first-class addressing for machines"
even matter any more?

In situations where anycast is used, how do you even know what machine a given
address is connecting you to?

RON was a step in the right direction, imo. With small overlays, MAC
addressing comes into play and it becomes a little easier to know what
machines (not simply what "addresses") you are really connected to.

~~~
tptacek
Yes, because conceptually you're still talking to a computer (it just happens
to be a globally load balanced computer). It's still the unicast service
model, and it's still fundamentally about building a channel between two
operating systems.

~~~
listening
Imagine if you dialled a full telephone number, including applicable country
code and local regional code, and depending on where you were dialling from,
you reached a different telephone in a different geographical location.

As long as it's an answering machine and the message is the same at every
telephone reached by this number, it does not matter.

But as soon as you want to reach a live person, and not simply "content", then
what?

Is end-to-end about accessing "content" or is it about communicating with
someone operating another computer?

~~~
tptacek
I wouldn't want to noodle on this too much. I take your point. Anycast
abstracts away some of the notion that you're talking to a specific computer.
But the unicast service model is inherently about computers talking to each
other. Many, maybe most, of the most important Internet applications aren't
about 1-1 conversations, or if they are, they're 1-1 conversations in special
cases of systems that also work 1-many.

~~~
listening
Importance is a subjective concept.

One opinion is that a very important Internet application will inevitably be
1-1.

Who did the FCC just hire as their new CTO? What is happening to POTS?

1-to-many systems, hacked to give an illusion of 1-1 conversations -- e.g.
SMTP middlemen, social networking's HTTP servers, or Twitter-like broadcast
SMS -- are what we settle for today, but, imo, this is a limitation, not a
desired goal.

------
tptacek
Incidentally, regarding the bottom of Nate's post: if you're at all interested
in code analysis, you really should reach out to him about a role on his team.
I know a little about what they're building, and it is possibly the coolest
business idea any HN participant is working on.

------
eps
> _packets are queued for a long time during congestion instead of being
> dropped quickly. This misleads TCP congestion control and leads to even more
> congestion._

The last statement is true only for TCP/Reno. Reno here refers to a congestion
detection algorithm, namely one that looks for _packet loss_ to recognize
congestion. This is the simplest and oldest approach, and it has been
succeeded by TCP/Vegas, which looks at delivery delays instead -- what is
called a congestion _avoidance_ algorithm. Basically, the idea is that if the
route gets congested, there will be buffering happening at some hop, and that
will cause the RTT along the route to climb.

That being said, I don't know the split between Reno and Vegas adoption in the
real world. However, I think newer Windows versions are on Vegas.
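
Roughly, Vegas compares the throughput it would expect at the base
(uncongested) RTT with what it actually measures, and uses the difference to
steer the window before any loss happens. A simplified sketch (the usual
alpha/beta thresholds; units in packets and seconds):

    ALPHA, BETA = 1, 3   # packets of "extra" data tolerated in flight

    def vegas_adjust(cwnd, base_rtt, current_rtt):
        expected = cwnd / base_rtt             # rate if nothing were queued
        actual = cwnd / current_rtt            # rate actually achieved
        diff = (expected - actual) * base_rtt  # packets sitting in queues
        if diff < ALPHA:
            return cwnd + 1   # path looks idle: grow
        if diff > BETA:
            return cwnd - 1   # queues building: back off before loss
        return cwnd           # in the sweet spot: hold steady

    print(vegas_adjust(cwnd=10, base_rtt=0.100, current_rtt=0.105))  # grows to 11
    print(vegas_adjust(cwnd=10, base_rtt=0.100, current_rtt=0.200))  # backs off to 9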

~~~
tptacek
This critique of Vegas by VJ is pretty famous (I love that you still have to
FTP to ftp.ee.lbl.gov to get it):

ftp://ftp.ee.lbl.gov/email/vanj.94mar14.txt

~~~
NateLawson
I've got a copy of that on my mirror, including the PostScript diagram:

<http://root.org/ip-development/e2e/end19940314-diagram.ps>

In summary, RTT variance as a congestion signal is actually quite bad. It's
normal to have variation even without congestion, so utilization would be
worse if you backed off every time RTT increased.

~~~
huhtenberg
From my completely unscientific tinkering with the Vegas idea a few years ago,
I remember that there was an easily detectable pattern in the RTT change
during congestion. I was testing using UDP between my home and a colo server,
and I saw RTT climb to several times its original level within a couple of
round trips.

> It's normal to have variation even without congestion

Certainly, but the question is whether congestion-induced variation is
substantially different from ambient variation. I am guessing that it is.
Reacting to just an RTT increase is pretty dumb; the congestion marker clearly
needs to be a bit more advanced than that.

------
nas
I've experienced congestion at my parents' house. They live in a rural area
and have a connection over a WiMAX wireless link. I did a little poking around,
and the bottleneck is not the wireless link but some link further upstream
(e.g. maybe between the ISP and the wireless tower).

I suspect what is happening is that the uptake of video services like YouTube
and Netflix has resulted in link capacity being exceeded. The result is not
pretty. The link acts like it is dead for about a minute at a time. It
periodically starts working for a few seconds, maybe up to a minute, but then
dies again. I suspect this is the oscillation effect at work.

------
gvb
Buried at the bottom of the writeup is a chilling tale of the effects of
proprietary programs and formats: "The lost paper was never published. [...]
[Van] lost the draft and the FrameMaker software for editing it."

If it were not for a draft that was discovered, this paper would have been
lost forever.

~~~
cpr
There are plenty of MIF-file (FrameMaker's interchange format) translators out
there, so it needn't have been lost as long as the source file was around.

------
thaumaturgy
OpenBSD has supported RED for a long time as an option in the packet filter
configuration (<http://www.openbsd.org/faq/pf/queueing.html#red>). I don't
know how closely it follows the original paper, although they do cite it as a
reference.

I used RED on an OpenBSD firewall years ago at a small ISP that had too small
a pipe for the number of sites they were hosting out of their closet. It
didn't seem to make a remarkable difference, but I wasn't responsible enough
at the time to take careful before-and-after measurements.

------
stcredzero
_Instead of the first excess packet being dropped, it gets queued somewhere._

So to update our detection scheme, we need routers to get feedback about the
_queueing_.

~~~
xtacy
This is what AQM schemes like RED/ECN do!
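
The core of RED, roughly: track a smoothed (EWMA) average queue length and
drop or ECN-mark arrivals with a probability that ramps up between two
thresholds. A simplified sketch with arbitrary parameters (real RED also
scales the probability by the count of packets since the last drop):

    import random

    W_Q = 0.002              # EWMA weight for the average queue size
    MIN_TH, MAX_TH = 5, 15   # thresholds, in packets
    MAX_P = 0.1              # drop/mark probability at MAX_TH

    avg_queue = 0.0

    def red_should_drop(instant_queue_len):
        global avg_queue
        avg_queue = (1 - W_Q) * avg_queue + W_Q * instant_queue_len
        if avg_queue < MIN_TH:
            return False          # queue is short: no early drops
        if avg_queue >= MAX_TH:
            return True           # queue is long: drop (or ECN-mark)
        p = MAX_P * (avg_queue - MIN_TH) / (MAX_TH - MIN_TH)
        return random.random() < p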

------
davidu
I can't actually find a link to the new paper, the "RED in a Different Light"
paper. Just the CiteSeer link. Is it online?

~~~
NateLawson
Right, the CiteSeer link is to the old copy of the paper. The new version has
not been finished yet, so I doubt you'll find it anywhere. It's actually likely
that it will never be finished and will instead be a jumping-off point for new
research.

------
ThaddeusQuay2
"The lost paper was never published. One roadblock was that the diagram of a
toilet offended a reviewer."

Google+ existed as early as 1988?

