
Technical Feasibility of Building Hitchhiker's Guide to the Galaxy - amund
http://blog.memkite.com/2014/04/01/technical-feasibility-of-building-hitchikers-guide-to-the-galaxy-i-e-offline-web-search-part-i/
======
ctdonath
All of Netflix, in HD, on 24TB.

The thought of putting most movies in my pocket for a few hundred bucks is ...
stunning.

Netflix could solve the bandwidth problem by just _mailing the whole library_
to customers. "Never underestimate the bandwidth of a station wagon full of
magnetic tapes."

~~~
DennisP
It's pretty amazing how much better things could be, if we could pry the
content industry away from outdated business models.

~~~
ctdonath
Well, consider what went into assembling that 24TB of content, and follow it
from there.

Each movie cost something like, well, let's just ballpark $10M average (the
>$200M blockbusters are relatively rare, balanced by old & cheap filler
content made down around $1M each). Lead article notes Netflix has 8900
movies. Round both up sensibly, and we're looking at $100B to create the
content.

Netflix has some 44M subscribers. Assuming all that content was made for &
paid by Netflix subscribers, that's $2272 per subscriber. 8-O

Talmand SWAGged the storage hardware cost at $1000. Storage prices being what
they are, that will be $300 in a year or so.

Would you pay $2500 to own, in a box the size of a pack of cards, _the entire
Netflix library_? With updates (package deal, you get 'em all as they're made
to cover production costs) of every movie added thereafter for $0.25 each?
Could we persuade the entire 44M Netflix subscribers to sign up?

Not sure what to do with those numbers, but they're fascinating.
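
For anyone who wants to poke at those numbers, here is a rough sketch of the arithmetic. Every input is the ballpark guess from this comment (not real accounting data), and the function names are made up for illustration.

```python
# Back-of-envelope figures from the comment above; all are rough guesses.
MOVIES = 8_900                      # titles quoted from the lead article
AVG_COST_PER_MOVIE = 10_000_000     # $10M ballpark average
SUBSCRIBERS = 44_000_000            # "some 44M subscribers"
ROUNDED_LIBRARY_COST = 100_000_000_000  # the "round both up" $100B figure


def library_cost(movies, avg_cost):
    """Total production cost if every title cost the same average amount."""
    return movies * avg_cost


def cost_per_subscriber(total_cost, subscribers):
    """Spread a cost evenly across all subscribers."""
    return total_cost / subscribers


if __name__ == "__main__":
    unrounded = library_cost(MOVIES, AVG_COST_PER_MOVIE)
    print(f"Unrounded library cost: ${unrounded:,}")
    print(f"Per subscriber (rounded $100B): "
          f"${cost_per_subscriber(ROUNDED_LIBRARY_COST, SUBSCRIBERS):,.2f}")
    # One new $10M movie split across every subscriber, which is roughly
    # where the $0.25-per-update figure comes from.
    print(f"Per new movie: "
          f"${cost_per_subscriber(AVG_COST_PER_MOVIE, SUBSCRIBERS):.2f}")
```

The per-update price only holds if all 44M subscribers sign up; halve the subscriber count and both per-head figures double.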

~~~
thenmar
That's only tempting because of the high resolution. I'm not sure if it's
Comcast being intentionally awful or just incompetent, but I can't seem to
stream HD anything on Netflix. I turn on manual buffering and the HD options
just aren't there. If that problem didn't exist, there would be basically no
benefit in paying the large up-front cost.

Now, if production costs were completely covered by subscribers, it would be
interesting to see what sort of movies get made.

------
dsr_
Some information has value that remains relatively constant over time:
historical records, literature. Some information has value that decays slowly:
basic materials in the sciences, large bodies of well-established
measurements.

Some information has value that decays fairly quickly: current scientific
progress that invalidates older measurements or has better predictions than
older theories, current events, prices of common items.

And some information has value that decays extremely quickly: weather,
financial markets, prices for things that you want to buy or sell now, casual
interactions with other people.

If you build a system that can connect fast enough to the Internet in terms of
latency and throughput, you don't need much local caching. That's what we have
in current smartphones.

Without a subEtha network, having a large local cache becomes increasingly
important the farther away you are. But without a mission profile to plan for,
estimating things in terms of current storage technology is as useless as
coming up with a security plan without a threat model.

~~~
dwd
How about the mission to Mars? What repository of data should they take with
them and what should they pull across space at analogue modem speed?

------
TeMPOraL
If you plan on using it just on Earth, this becomes relevant:
[https://xkcd.com/548/](https://xkcd.com/548/) ;).

------
ajaimk
The Netflix numbers are definitely wrong. Of the 10434 titles on there 3687
are TV Shows, or more specifically seasons of TV shows.

Assuming 2 hours/movie and 15 hours/TV show season, the number comes to
68,799 hours of content. Let's round that up to 75,000 hours for convenience's
sake.

Assuming an average bitrate of 4Mbps, it comes to about 135 TB.

Which is why the Netflix Open Connect box works so well; Netflix can dump
their entire catalog onto 2 of those boxes (well, it will take a few more
since Netflix will be caching multiple versions of the files at different
bitrates, but I wouldn't be surprised if the entire catalog in all bitrates
can fit into 1 rack).

Source for numbers: [http://instantwatcher.com/](http://instantwatcher.com/)
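
The estimate above can be checked with a short sketch. The title counts, per-title durations, and the 4 Mbps average bitrate are the assumptions stated in this comment; the function names are illustrative, and terabytes are taken as decimal (10^12 bytes).

```python
def total_hours(titles, tv_seasons, hours_per_movie=2, hours_per_season=15):
    """Hours of content, treating every non-TV title as a movie."""
    movies = titles - tv_seasons
    return movies * hours_per_movie + tv_seasons * hours_per_season


def storage_tb(hours, bitrate_mbps):
    """Decimal terabytes needed to hold `hours` of video at a constant bitrate."""
    bits = hours * 3600 * bitrate_mbps * 1_000_000
    return bits / 8 / 1e12


if __name__ == "__main__":
    raw = total_hours(10_434, 3_687)
    print(f"Raw total: {raw:,} hours")
    print(f"At the rounded 75,000 hours: {storage_tb(75_000, 4):.0f} TB")
    print(f"At the raw total: {storage_tb(raw, 4):.1f} TB")
```

With these inputs the raw total is 68,799 hours (roughly 124 TB), and the rounded 75,000 hours gives the 135 TB figure. Either way the result is dominated by the bitrate guess, so doubling or halving it moves the answer by the same factor.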

------
Theodores
The creators of 'Elite' had this problem way back in the 1980s when they had
to squeeze 8 galaxies complete with planet data into 32K of RAM. They solved
the problem by 'procedurally generating' all of the required data from a seed
number. Therefore, it isn't a question of cutting and pasting from Wikipedia
etc., it is more a matter of getting the right 'seed number' for earth,
updating the procedure for generating content and that should be it.

Anyway, here is something cut 'n' pasted from Wikipedia:

 _The Elite universe contains eight galaxies, each with 256 planets to
explore. Due to the limited capabilities of 8-bit computers, these worlds are
procedurally generated. A single seed number is run through a fixed algorithm
the appropriate number of times and creates a sequence of numbers determining
each planet's complete composition (position in the galaxy, prices of
commodities, and even name and local details— text strings are chosen
numerically from a lookup table and assembled to produce unique descriptions
for each planet). This means that no extra memory is needed to store the
characteristics of each planet, yet each is unique and has fixed properties.
Each galaxy is also procedurally generated from the first.

However, the use of procedural generation created a few problems. There are a
number of poorly located systems that can be reached only by galactic
hyperspace— these are more than 7 light years from their nearest neighbour,
thus trapping the traveller. Braben and Bell also checked that none of the
system names were profane - removing an entire galaxy after finding a planet
named "Arse".[9]_

[9] Procedurally generated by unicorns.
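
To make the idea concrete, here is a minimal sketch of seed-based generation. It is not Elite's actual algorithm: Python's `random.Random` stands in for the game's fixed generator, and the syllable table, the planet fields, and the `generate_galaxy` name are all invented for illustration.

```python
import random

# Hypothetical lookup table for assembling names; not Elite's real table.
SYLLABLES = ["la", "ve", "ti", "ce", "on", "ra", "us", "di", "so", "qu"]


def generate_galaxy(seed, planets=256):
    """Rebuild an entire galaxy from one seed; nothing is stored between calls."""
    rng = random.Random(seed)  # same seed -> same sequence of numbers
    galaxy = []
    for _ in range(planets):
        syllable_count = rng.randint(2, 4)
        name = "".join(rng.choice(SYLLABLES) for _ in range(syllable_count))
        galaxy.append({
            "name": name.capitalize(),
            "x": rng.randrange(256),           # position in the galaxy
            "y": rng.randrange(256),
            "price_index": rng.randrange(64),  # stand-in for commodity prices
        })
    return galaxy


if __name__ == "__main__":
    first = generate_galaxy(seed=1984)
    again = generate_galaxy(seed=1984)
    print(first[0])
    print("Reproducible:", first == again)
```

The only thing that needs to be kept around is the seed and the procedure; every planet is recomputed on demand, which is how the storage cost stays tiny.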

~~~
Houshalter
It's not possible to compress more than a certain amount. E.g. you can't store
every possible combination of 12 bits with only 8 bits. It's trivial to expand
8 bits into 20 bits following some procedure, but it's literally impossible to
do the reverse.
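
That counting argument (the pigeonhole principle) can be demonstrated with a small sketch. The "compressor" here is a deliberately naive stand-in that keeps the low 8 bits; the function names are hypothetical, and any other 12-to-8-bit mapping would collide as well.

```python
def find_collision(compress, input_bits):
    """Return two distinct inputs that map to the same output, or None."""
    seen = {}
    for value in range(2 ** input_bits):
        code = compress(value)
        if code in seen:
            return seen[code], value
        seen[code] = value
    return None


def truncate_to_8_bits(value):
    """Hypothetical lossy 'compressor': keep only the low 8 bits."""
    return value & 0xFF


if __name__ == "__main__":
    # 2**12 = 4096 inputs but only 2**8 = 256 possible outputs,
    # so at least two inputs must share an output.
    print(find_collision(truncate_to_8_bits, 12))
    # With 8 input bits and an identity mapping there is no collision.
    print(find_collision(lambda v: v, 8))
```

Procedural generation sidesteps this only because it never tries to represent arbitrary data: a 8-bit seed can name at most 256 distinct worlds, however elaborate each one looks once expanded.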

------
protonfish
So much of what makes the internet useful is the collaborative aspect. Having
a copy of StackOverflow at a static point in time is better than nothing, but
losing the ability to ask or answer new questions or stay up to date is a
significant loss.

If you are a lone traveler (or part of a small group), the thumb drive
Hitchhiker's Guide is probably as good as it gets. If we had a colony on Mars
(or more distant), a better solution would be to have a planetary WAN (the
MWW?). Central servers could update news, movies, apps and other timely but
read-only content over a high-bandwidth, high-latency channel to the WWW
(renamed in 2250 to the Earth Wide Web). Community sites (like HN and SO)
would have to be redesigned for multiple non-interactive WANs. Perhaps it
would default to your home WAN for voting and commenting, but you could click
a tab to view a read-only, slightly older version of different planets'
contributions.

------
analreceiver
That's all the information it would ideally have on Earth, but then again,
there are approximately 8.8 BILLION habitable planets in our galaxy alone, and
the more advanced ones oughta have a lot more information to store.

However, about our planet, the HHGTTG had only this to say: "Mostly Harmless".
It does have an entry for human beings though, so we actually don't know how
much information about us it contains.

~~~
logfromblammo
Humans: A species that still thinks digital timepieces are a pretty neat idea.

That's probably it.

It is already apparent that the guide is curated from two enormous office
buildings, has a host of low-paid field researchers, and culls any information
deemed not relevant to the hoopiest, most towel-aware set of galactic
travelers.

Which is to say that when people are talking about Netflix, they probably
actually mean an extensive catalog of high-definition movies that cater to the
prurient interest. And when they say Wikipedia, they mean expert tips on how
to find temporary companionship and cocktails from the native populations of
unfamiliar planets.

That's a lot harder to put together than a simple catalog of facts and
knowledge.

------
DennisP
A while back I had some thoughts about a user-generated travel guide, in the
spirit of Hitchhikers, built like a wiki or the old IMDB. Has anyone heard of
a project like that?

Maybe the problem is that it'd fill up with so much spam from local businesses
that it'd be useless. Maybe a good reputation and rating system could make it
work, if there were enough participation from neutral parties.

~~~
huskyr
Well, there's Wikivoyage:

[http://www.wikivoyage.org/](http://www.wikivoyage.org/)

------
pyre
1. The Wikipedia estimate probably only includes text. All of the media is
distributed separately, and was ~200GB as of a few years ago. I can only
imagine it's grown since then. Some articles make more sense with images (e.g.
a photo of an animal to go with the description).

2. Is the entirety of Twitter really useful as part of a Hitchhiker's Guide
to the Galaxy?

~~~
ctdonath
One post on Twitter: 140 bytes max. One HD frame on Netflix: 6220800 bytes
max.

If throwing the near-totality of human cinematography on H2G2 is a no-brainer
due to its manageably small size (a mere 24TB), then may as well throw Twitter
on there as well.

~~~
misnome
Don't forget Unicode. 140 Characters =/= 140 bytes.

~~~
ctdonath
Don't forget compression too. And don't forget most tweets don't use close to
the full available tweet data space. A billion tweets takes up a lot less than
140GB.

I'm just looking at rough comparisons. A single 1/24th of a second of Netflix
video approximates 44,000 tweets' worth of data. Anticipating growth, let's
assume a billion tweets per day (was half-billion last October). 171 days of
tweets takes up the same space as the complete Netflix library.
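
A quick sketch of that comparison, for anyone checking the figures. It assumes an uncompressed 1920x1080 frame at 3 bytes per pixel (which is where 6,220,800 bytes comes from), a worst-case 140 bytes per tweet, and the 24TB library size quoted in this thread; the function names are illustrative.

```python
TWEET_BYTES = 140                # worst case, ignoring Unicode and compression
FRAME_BYTES = 1920 * 1080 * 3    # one uncompressed 1080p frame, 3 bytes/pixel
LIBRARY_BYTES = 24 * 10**12      # the 24TB estimate from the thread
TWEETS_PER_DAY = 10**9           # assumed growth to a billion tweets per day


def tweets_per_frame(frame_bytes, tweet_bytes):
    """How many maximum-size tweets fit in one video frame."""
    return frame_bytes // tweet_bytes


def days_to_match_library(library_bytes, tweets_per_day, tweet_bytes):
    """Days of tweets needed to occupy as much space as the video library."""
    return library_bytes / (tweets_per_day * tweet_bytes)


if __name__ == "__main__":
    print(f"Tweets per frame: {tweets_per_frame(FRAME_BYTES, TWEET_BYTES):,}")
    days = days_to_match_library(LIBRARY_BYTES, TWEETS_PER_DAY, TWEET_BYTES)
    print(f"Days of tweets to match the library: {days:.1f}")
```

Since real tweets are shorter and compress well, the true number of days would be larger than this worst-case figure.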

------
Raphael
A Kindle with a copy of Wikipedia on it seems close enough to me.

------
dalek2point3
But what about the legal side of things? Last time I checked, Netflix wasn't
particularly excited about me trying to scrape all of their video data and
making money off of it.

~~~
Zikes
DRM (in concept, but not in practice) means shipping you an impenetrable safe
and selling the key separately.

The proposed idea means Netflix would be partnered with the hypothetical Guide
and in order to access the encrypted movies and shows you would need to buy a
key from Netflix.

------
qwerta
Perhaps we should start with an interstellar propulsion system?

