

Ask HN: Why isn't a content addressable cache built into the Internet? - mckoss

I have a simple internet protocol in mind. Does it already exist? If not, why not?

The idea is that there should be a content-addressable request protocol. By presenting a (20 byte) hash, the server responds with the chunk of data (say, 2 MB in size) which is associated with that hash. If a publisher (Netflix) were to offer their content using this protocol, they would transmit to each authorized viewer a list of 500 hashes for a 1 GB movie (total size of the hash list: 10 KB). The client then requests each chunk by hash key in order to stream the movie.

Any ISP (Comcast, Level 3) seeing this protocol coming over its network realizes that it can service a chunk request by returning the cached data, instead of forwarding the request upstream to the source to fetch the same data over and over again. Routers could incorporate the protocol into their stack, so they can short-circuit high-volume chunk requests out of a local cache (from several gigabytes to terabytes). Once an ISP has the chunk data, it can serve it 1,000 times without requesting it from the origin server. It can manage its cache purely based on demand for each chunk.

With this protocol widely in place, I think it would be possible for 100M households to simultaneously watch a movie when it becomes available for download. I don't think that's possible using the current Internet protocols and CDNs.

Would it not be far superior for the network to transparently handle caching this way, rather than the messy system of CDNs we have today?

Does copyright law or the DMCA make it impossible to operate a system like this?
======
wmf
I think CCNx is going in this direction. There's also IETF DECADE and the
older Internet Backplane Protocol.

IIRC the DMCA has an explicit exemption for caches.

To get ISP caches deployed, you have to convince ISPs that they'll save money.
CDNs actually work against you here because they are either free or in some
cases paying ISPs, and if an ISP allows CDNs to get close to its POPs then
CDNs are just as efficient as ISP-operated caches.

------
slysf
I worked at a big streaming media company and there are significant DRM
challenges with serving video in a way that protects the content. When you're
trying to get serious content deals from producers they will dive into the
deepest level of how you handle their content to make sure it meets their own
internal standards before agreeing to anything. Companies like BitGravity are
developing solutions that meet these requirements. One idea might be to expire
the hashes in a 5 minute rolling window which would still give a huge caching
advantage to ISPs:

1. ClientA is authorized with the video service.

2. ClientA gets 50 MB of hashes to buffer.

3. ClientA starts downloading hashes sequentially, requesting more hashes from the service as the queue is emptied.

4. ClientB is authorized with the video service.

5. ClientB requests the same resource and gets the same list of hashes.

6. The ISP can cache the hashes for 5 minutes and get some savings for popular content.

I think this would be a very good solution for live broadcast where you're
looking at a huge number of people requesting the same content, but for VOD it
would have limited return.

~~~
mckoss
If you recognize that a hash is _equivalent_ to the content, you realize you
have to wrap DRM around the hash lists, just as you would the original
content.

Perhaps limiting the cache lifetimes would give some legal protection to
network operators. Nobody is sending take-down notices for network caches
today, are they? I would love to see this protocol have some legal safe harbor
- perhaps based on a limited (24 hour) TTL before requiring refreshing the
cache from source (which could just be a "ping" w/o retransmitting the whole
chunk).

------
devicenull
How do you do any sort of security on this? I doubt any copyright holder is
going to be happy when anyone with a list of hashes can download their movie.
Using most types of encryption is out, so there goes most DRM solutions. Using
IP or host based access lists defeats the purpose of the caching system, so
that's out as well.

~~~
invertd
Check out the CCNx (Content Centric Networking) project from PARC/Xerox, led
by Van Jacobson - security seems to be baked into the design. Here is an
intro: <http://www.slideshare.net/PARCInc/contentbased-security>

~~~
mckoss
Why implement something complex, when something simple will do? This cache
should be (in my mind), DRM-agnostic. Just put your DRM around your hash
lists, just as you would around your original content. DRM can be a layer
above the caching layer.

------
tobylane
It's done manually today. When the iPlayer was new there was talk of this
being done - the BBC renting servers in every exchange. The BBC is really
relaxed about piracy (in comparison to others), so this might not be common.

It's too hard to work out what should be cached. Should the BBC send the
latest episode of Eastenders, Hustle, Tudors etc. to every ISP before it's up
for download? (Yes.) Should YouTube try to guess which of its popular videos
will be in demand next week? (No.) Should Steam send out its new games?
(Yes.) You have to pick services where there aren't too many options - the
iPlayer and Steam together have fewer than 2k items, and fairly predictable
future demand.

~~~
mckoss
The point of what I'm trying to describe is an adaptive edge-caching service
for the Internet. Suppose your ISP has a 100 GB cache in its data center
(probably RAM-based - think memcached). It could just keep in the cache
those chunks that are in the most demand from its customers. No one needs to
decide whether one "show" or another is most popular. Perhaps the cache would
contain the first 15 minutes of Eastenders and the last 30 minutes of the
Oscars.

That's the beauty of a system like this: it self-optimizes without a
complex source-to-edge protocol.
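"Keep whatever is in demand" can be as simple as least-recently-used eviction over a byte budget. A sketch (the class and sizes are illustrative, not part of any real ISP cache):

```python
from collections import OrderedDict


class ChunkCache:
    """Fixed-budget chunk cache that evicts the least recently demanded
    chunk first. Nothing decides what is 'popular' - demand does."""

    def __init__(self, capacity_bytes):
        self.capacity = capacity_bytes
        self.used = 0
        self._chunks = OrderedDict()  # hash -> chunk, coldest first

    def get(self, h):
        chunk = self._chunks.get(h)
        if chunk is not None:
            self._chunks.move_to_end(h)  # mark as recently demanded
        return chunk

    def put(self, h, chunk):
        if h in self._chunks:
            self._chunks.move_to_end(h)
            return
        self._chunks[h] = chunk
        self.used += len(chunk)
        while self.used > self.capacity:
            _, evicted = self._chunks.popitem(last=False)  # drop coldest
            self.used -= len(evicted)
```

Because eviction is per-chunk rather than per-title, the cache can naturally end up holding the hot first 15 minutes of one show and none of another.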

------
mckoss
P.S. I also asked this on Quora - as an experiment to see what kind of answers
I'll get there vs. here.

<http://www.quora.com/Why-isnt-a-content-addressable-cache-built-into-the-Internet>

------
alastair
It's called Bittorrent.

~~~
mckoss
Not really. Similar purpose, but it is not embedded in the network (every
intermediate hop could support a layer of cache, from backbone providers down
to consumer routers).

Note that BitTorrent is at least twice as inefficient as this scheme - every
peer request requires the content to go "up" and then "down" to the peer,
through several intermediate hops.

In this scheme, the data travels along the same path as the original content
- it just doesn't ask for the same copy of the content over and over again
from the source.

