
The “Million Dollar Homepage” as a Decaying Digital Artifact - sjmurdoch
https://lil.law.harvard.edu/blog/2017/07/21/a-million-squandered-the-million-dollar-homepage-as-a-decaying-digital-artifact/
======
_kst_
I can still access
[http://www.milliondollarhomepage.com/](http://www.milliondollarhomepage.com/)

I can't currently access the article at
[https://lil.law.harvard.edu/blog/2017/07/21/a-million-squandered-the-million-dollar-homepage-as-a-decaying-digital-artifact/](https://lil.law.harvard.edu/blog/2017/07/21/a-million-squandered-the-million-dollar-homepage-as-a-decaying-digital-artifact/)

[Insert joke about irony here.]

~~~
diminish
I think the Harvard link is down due to traffic.

~~~
Pharylon
I'm sure the Million Dollar Homepage is getting a lot of traffic right now too

------
schiffern
>Of the 2,816 links that embedded on the page (accounting for a total of
999,400 pixels), 547 are entirely unreachable at this time. A further 489
redirect to a different domain or to a domain resale portal, leaving 1,780
reachable links

Looking at the million dollar homepage, many of the links _were never
valid_:

http://paid & reserved/

http:// paid and reserved - accent designer clothing/

http://reserved for edna moran/

http://paid & reserved for paul tarquinio/ (1200 pixels)

http://pending order/

These links are all shown in plain red ("link to unreachable or entirely empty
pages") in the "visualization of link rot," so it looks like the authors
didn't account for invalid URLs.

------
Houshalter
Gwern has a good summary of the research in this:
[https://www.gwern.net/Archiving%20URLs](https://www.gwern.net/Archiving%20URLs)

>In a 2003 experiment, Fetterly et al. discovered that about one link out of
every 200 disappeared each week from the Internet. McCown et al 2005
discovered that half of the URLs cited in D-Lib Magazine articles were no
longer accessible 10 years after publication [the irony!], and other studies
have shown link rot in academic literature to be even worse (Spinellis, 2003,
Lawrence et al., 2001). Nelson and Allen (2002) examined link rot in digital
libraries and found that about 3% of the objects were no longer accessible
after one year. Bruce Schneier remarks that one friend experienced 50% linkrot
in one of his pages over less than 9 years (not that the situation was any
better in 1998), and that his own blog posts link to news articles that go
dead in days; Vitorio checks bookmarks from 1997, finding that hand-checking
indicates a total link rot of 91% with only half of the dead available in
sources like the Internet Archive; the Internet Archive itself has estimated
the average lifespan of a Web page at 100 days. A Science study looked at
articles in prestigious journals; they didn’t use many Internet links, but
when they did, 2 years later ~13% were dead. The French company Linterweb
studied external links on the French Wikipedia before setting up their cache
of French external links, and found - back in 2008 - already 5% were dead.
(The English Wikipedia has seen a 2010-2011 spike from a few thousand dead
links to ~110,000 out of ~17.5m live links.) The dismal studies just go on and
on and on (and on). Even in a highly stable, funded, curated environment, link
rot happens anyway. For example, about 11% of Arab Spring-related tweets were
gone within a year (even though Twitter is - currently - still around).
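
The buckets these studies count (dead, redirected off-domain, still
reachable) are easy to reproduce. Here's a minimal sketch, assuming a
pre-collected list of fetch results rather than live HTTP calls; the
function names are mine, not from any of the studies:

```python
# Sketch of a link-rot survey like the ones quoted above. The category
# names mirror the article's "unreachable" / "redirected" / "reachable"
# buckets; the input format is a hypothetical (url, status, final_url)
# tuple list produced by some earlier crawling step.
from urllib.parse import urlparse

def classify(status_code, final_url, original_url):
    """Bucket one fetch result the way link-rot studies typically do."""
    if status_code is None or status_code >= 400:
        return "unreachable"
    orig_host = urlparse(original_url).netloc
    final_host = urlparse(final_url).netloc
    if orig_host != final_host:
        return "redirected"  # landed on a different domain (e.g. resale portal)
    return "reachable"

def survey(results):
    """results: iterable of (original_url, status_code, final_url)."""
    counts = {"unreachable": 0, "redirected": 0, "reachable": 0}
    for original, status, final in results:
        counts[classify(status, final, original)] += 1
    return counts
```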

~~~
Animats
There's an automatic system on Wikipedia now which attempts to rescue dead
links by finding the page in the Internet Archive and updating the Wikipedia
page accordingly.

~~~
zargon
It should also do the reverse -- find links in Wikipedia that aren't in
archive.org and initiate an archival task.
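
That round trip can be sketched against the Wayback Machine's public
availability API and its "Save Page Now" endpoint. A minimal sketch --
the JSON shape here is an assumption to verify against the current API
docs before relying on it:

```python
# Sketch: for each external link, ask the Wayback Machine whether a
# snapshot exists, and request one if not. The two endpoints are the
# public availability API and "Save Page Now".
import json
from urllib.parse import quote
from urllib.request import urlopen

AVAILABILITY_API = "https://archive.org/wayback/available?url="
SAVE_ENDPOINT = "https://web.archive.org/save/"

def has_snapshot(payload):
    """True if an availability-API JSON payload reports a snapshot."""
    closest = payload.get("archived_snapshots", {}).get("closest")
    return bool(closest and closest.get("available"))

def archive_if_missing(url):
    with urlopen(AVAILABILITY_API + quote(url, safe="")) as resp:
        payload = json.load(resp)
    if has_snapshot(payload):
        return "already archived"
    urlopen(SAVE_ENDPOINT + url)  # kicks off an archival task
    return "archival requested"
```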

~~~
Figs
A few years ago, there was a bot automatically submitting all links to
archive.is and adding the archive links to Wikipedia. It got blocked and the
site banned for spam. There was another discussion about it last year, and the
consensus was to remove the site from the spam list so that links would be
allowed again. (Not sure if that actually happened or not though.)

If you're curious, take a look at the discussions at the following links:

- [https://en.wikipedia.org/wiki/Wikipedia:Archive.is_RFC](https://en.wikipedia.org/wiki/Wikipedia:Archive.is_RFC)

- [https://en.wikipedia.org/wiki/Wikipedia:Archive.is_RFC_2](https://en.wikipedia.org/wiki/Wikipedia:Archive.is_RFC_2)

- [https://en.wikipedia.org/wiki/Wikipedia:Archive.is_RFC_3](https://en.wikipedia.org/wiki/Wikipedia:Archive.is_RFC_3)

- [https://en.wikipedia.org/wiki/Wikipedia:Archive.is_RFC_4](https://en.wikipedia.org/wiki/Wikipedia:Archive.is_RFC_4)

I'm not sure what the current status of automatically archiving links is,
but as you can see, the idea has been attempted before.

~~~
zargon
Those RFCs seem to have nothing to do with my suggestion.

In Wikipedia's usual frustrating manner, it's unclear to me what was even
going on to trigger those RFCs or why people thought it was a problem. For
some reason they were upset with links to archive.is. But why? Was archive.is
replacing working links with archive links, or something?

Edit: From what I can tell, the archive.is bot was doing the same thing as
the archive.org bot Animats mentioned. It's just that archive.is didn't
follow Wikipedia's policies and procedures.

------
resf
Decaying in more than one way. The JS files on milliondollarhomepage.com start
with:

    
    
        /*
             FILE ARCHIVED ON 5:47:20 Aug 6, 2015 AND RETRIEVED FROM THE
             INTERNET ARCHIVE ON 20:45:17 Aug 24, 2015.
             JAVASCRIPT APPENDED BY WAYBACK MACHINE, COPYRIGHT INTERNET ARCHIVE.
        
             ALL OTHER CONTENT MAY ALSO BE PROTECTED BY COPYRIGHT (17 U.S.C.
             SECTION 108(a)(3)).
        */
    

I guess someone didn't keep backups?

~~~
haser_au
Either that, or the homepage owner doesn't want to pay the hosting
(bandwidth) costs, so he/she just references the Wayback Machine. This is
kinda brilliant.

~~~
cornstalks
That doesn't seem to be the case:

    
    
      curl -v http://www.milliondollarhomepage.com/index_files/widgets.js
    

returns an HTTP 200 (not a 301/302). Additionally, the comment says:

    
    
      /*
         FILE ARCHIVED ON 5:47:20 Aug 6, 2015 AND RETRIEVED FROM THE
         INTERNET ARCHIVE ON 20:45:19 Aug 24, 2015.
         JAVASCRIPT APPENDED BY WAYBACK MACHINE, COPYRIGHT INTERNET ARCHIVE.
    
         ALL OTHER CONTENT MAY ALSO BE PROTECTED BY COPYRIGHT (17 U.S.C.
         SECTION 108(a)(3)).
      */
    

Note the dates. It was archived on Aug 6, 2015 and retrieved just 18 days
later on Aug 24, 2015. Given it's 2017, there's no way this file is being
served directly from archive.org's servers.
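
That check can be automated: a 200 status (no redirect) combined with the
Wayback banner comment indicates a file that was copied out of the archive
and is now served first-hand. A minimal sketch -- the banner string is
taken from the comment quoted above, and the function names are mine:

```python
# Sketch of the check above: a direct 200 response whose body still
# carries the Wayback Machine's appended banner means the asset was
# copied out of the archive and rehosted, not proxied from archive.org.
from urllib.request import urlopen

WAYBACK_BANNER = "JAVASCRIPT APPENDED BY WAYBACK MACHINE"

def is_rehosted_copy(status_code, body):
    """Served directly (no redirect) but carrying the archive banner."""
    return status_code == 200 and WAYBACK_BANNER in body

def check(url):
    with urlopen(url) as resp:
        body = resp.read().decode("utf-8", "replace")
        return is_rehosted_copy(resp.status, body)
```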

~~~
Jaruzel
The '/index_files/' bit seems to imply someone went to the Internet
Archive, pulled up milliondollarhomepage, and then in a browser did 'Save
As Webpage Complete', which saves the main page (in this case index.htm)
and all its assets in a subfolder called 'index_files'.

They then just copied it all as-is back into some web hosting.

------
krallja
The Million Dollar Homepage is not decaying (it is still serving its million
dollar purpose) - it is the Web itself that has decayed. The brittleness of
URIs is on full display. "Cool URLs don't change," but most of these URLs were
never cool: they had to rent coolness from Internet cool kid Alex Tew.

~~~
URSpider94
It's a frame of reference problem. Note that this article is published by a
library research group. One of the hottest topics in library science is how to
archive and reference the WWW. While it's entirely valid to view the Million
Dollar Web Page as a decaying artifact (let's call this the Art Critic view),
it is also possible to view it as an archaeological record that needs its
links to be complete (let's call this the archaeologist view).

~~~
justadeveloper2
Librarians are always worried about preserving garbage, which is what the
Million Dollar Web Page is. Now before you downvote me, bear in mind I did
Information Science in grad school, so I hung around with a lot of librarians.
I know the librarian hoarder mindset pretty well.

The point I'm trying to make is that the Internet is a different kind of thing
than a library, it is more dynamic (which is good and bad) and generally a
more living and lived-in type of medium. MDH was a money making scheme.
Clever? Maybe. Intended to be long-lasting? Doubtful.

It's a cool article, nonetheless.

------
glenstein
The article seems to be suggesting that the Million Dollar Home Page has in
some sense failed to fulfill its promise because many of the links are now
dead. I don't follow that logic at all. To me it seems that the MDHP's job was
to be an iconic piece of internet history, and they've entirely fulfilled
their end of the bargain.

~~~
twblalock
I don't see how the article suggests that. The article is using the Million
Dollar Homepage as an example to draw attention to interesting things about
internet history, including the complexity of archiving internet content. The
page itself isn't being attacked, but rather used to exemplify a broader
concern.

The article uses the Million Dollar Home Page as an example of an interesting
historical artifact that tells us a lot about the internet of 2005 in terms of
design, the context in which the page was created, etc.

However, Million Dollar Home Page is also used as an example of the complexity
of archiving internet content, i.e. archiving a page in a complete way must
also involve archiving the pages it links to, otherwise the functionality of
the archived page will decay over the years. This has important implications
for its usefulness as a historical artifact.

Most of us remember what the internet was like when the Million Dollar Home
Page was created, but many years from now, it will be challenging for people
studying the history of the internet to really know what it was like back then
unless we archive things in a way that preserves functionality.

~~~
gcb0
that's beautiful and all, but nothing works like that. a page is lots of
things. the html and js can be preserved like an artifact in a museum, ok.
but the rest -- the server hosting it, its connection to the internet, it
being http only, etc. -- has to be preserved like architectural works. yes,
some building facade might still look like it was built in the 1800s, but
everything in it is updated to conform to fire code, exit signaling, etc.

so, in the end, even if we managed to keep that page up, it would still be
of very little interest to the people of the future.

the proper way to do things is to document everything. by good historians.

~~~
twblalock
The article does not purport to have a solution to this problem. It is a
contribution to an ongoing effort to describe the problem and understand it so
that solutions may be pursued.

~~~
amygdyl
I feel like I'm going to get a proverbial reprimand for asking the obvious,
but why, in all this time, has no solution been proposed?

~~~
a_t48
"No perfect solution has been proposed." might be more accurate.

~~~
amygdyl
That sounds like an excellent opportunity for corporate IT departments to
make a good haul of PR and general kudos, by making an effort to release
their archived caches, wherever they may be stored due to data retention
policies.

I have a huge soft spot for projects where you can get the most happening
because you are not required to jump a known hurdle to usefully contribute.

Overnight I was fretting over the necessity of sorting out any residual
legal issues that might attach to digging out old cache dumps.

But don't forget that the very same companies are commonly invested in
just the tech to sort out the problem.

The problem can present itself in all kinds of ways. Do you have any
chance of finding adult content in your cache? Do you care about how much
you are seen surfing the competition's websites? Will URLs reveal that
anonymous forum login from 2002, slagging off your rival's benchmarking?
Did you put Squid on your intranet or webmaster, without https, because
your predecessor thought it was on the non-routeing private range? Did you
use DNS in any way to point to document resources that are accessible to
users via the proxy server and Squid?

Anyhow lots of data protection suites have been used to purge archives and
remove any trace of activity or files best kept private.

That's how the latest HDS kit is sold: clustered FS with hyperconverged
local nodes crunching security, audit and search results.

I'm sure I'm only wishing on a prayer that you can find great troves for
redecorating the empty space that the WWW once pointed to.

But imagine that we really could fill enough gaps in the dead link forests!

Even just attempting could be a superb way to promote your storage products
and bless your customers who offer up the raw stores with lots of great ways
to engage regular journalists with the subject.

I totally love best-effort projects, especially the ambitious, culturally
interesting ones.

So does Joe Public.

I'm in.

Where do we start?

(If I have the time, I'm quite serious about this. My profession is
advertising, not online but the traditional way of doing it. I have just
got my head back from exploding with the multitude of ways to sell,
promote, demo, boast, reminisce, predict, forecast, warn people about how
society will crumble without personal all-flash arrays... That last one
might be pushing it a bit, but I see a fantastic deal of business on the
back of the idea. It's beautiful because everyone is invited to
participate; no vendor or company is locked out. So the public is not
getting the boring official line and canned quotes. This is real people
showing the technology, but also showing the extent to which we discard
valuable culture. For thousands of years mankind grew as our means of
recording documentation and thought and expression grew. Look now how
easily we throw it all away! I know we can do better.)

------
sixQuarks
I actually purchased a $300 spot on this. I did get quite a few clicks, but
very low-quality traffic. Mostly, I got lots of offers from copycat sites to
join their "billion dollar" homepage or whatnot.

It's crazy how many copycats came out, very unoriginal thinking going on.

~~~
jlangenauer
I remember that time - there were even people trying to sell **guides**
purporting to teach people how to set up their own million-dollar homepage!

~~~
vanviegen
s/million-dollar homepage/altcoin/g

~~~
nebabyte
> Any successful thing on the internet that hits the media will sooner or
> later be copied (usually sooner)

Definitely not copied from another commenter ;)

I keep a list of things that follow this pattern, and yeah, altcoins/tokens
are another. It really just becomes a catalogue of what's been
popular/happening these past couple decades.

------
ChuckMcM
I think in many ways it is not so much a 'decaying digital artifact' as an
excellent representation of the fallacy upon which a lot of the Internet
hangs. In the Library of Alexandria you didn't have scrolls disappear
because the kingdom where they originated had been crushed under the boot
of an invader. But the Internet is no great library, no repository of
knowledge, no oasis of independent thought. The Internet is a conversation
in a crowded room with amplified shotgun microphones pointed at all who
walk through it.

~~~
mathattack
Yes - and a different economic model for maintenance than the Library of
Alexandria.

In this case, the library didn't fail. The ecosystem around it did.

~~~
justadeveloper2
The library burned down, IIRC. Largely due to societal rot, as you
suggest -- invasions, pestilence, and probably because the original
founders died and their vision did not translate to the following
generations.

------
AdmiralAsshat
I'm not sure why the article considers it "squandered": it did its job as long
as the advertisers cared to maintain their links.

It hardly seems fair to blame a billboard for being in disrepair if the
company it advertised no longer exists.

------
narrator
I think all the broken links just go to show that failure in business is
the norm, or that someone who thought it would be a good idea to promote
their company on this service is probably not good at running a business.

------
aidos
Would be interesting to know how many people on the million dollar homepage
are on HN. I imagine there's a wonderful crossover between the two groups.

Even though it's with a business we're not doing now, my business partner
and I are on there.

Edit: don't think it deserves a downvote - is it not an interesting question?
I bet there are loads of serial entrepreneurs on both

~~~
petepete
A company I used to work for are right up in the top left, Cartridge Save.

~~~
hambast
Ha. Also worked for Cartridge Save. Remember them buying those pixels.

------
ernsheong
FWIW, I'm building [https://PageDash.com](https://PageDash.com) as a private
web archive to address the problem of link rot, beginning from a personal
level. Launching in late August. Think of it as a private version of perma.cc.

~~~
Danihan
lol, you found your domain on impossibility, didn't you.

~~~
always_good
lushhat, zinghelm, usedgrue.

------
brosky117
I just heard about the "Million Dollar Homepage" for the first time last week.
Would this idea (or one like it) work today? Making a million dollars for
something so bizarre, fun, and straightforward sounds amazing. Can anyone
reference other attempts at similar ideas?

~~~
graphitezepp
It's just a random example of something silly going viral. Stuff like this
definitely could work today (the potato salad Kickstarter was a thing),
but any individual's attempt probably wouldn't work; it mostly comes down
to luck.

~~~
tyingq
Not a million bucks, but $71k for something very recent...that was even more
upfront about how useless it was. [https://uetoken.com](https://uetoken.com)

------
hellbanner
A more modern variant,
[https://catbillboard.wordpress.com/](https://catbillboard.wordpress.com/)

"Million Dollar Cat Billboard project sells 10 000 “squares” (places on a
billboard) $100 dollars each to make world’s first ever cat billboard and put
it up in 10 cities around the globe for a month. To proudly show your cat to
the world you need to buy at least one square. But of course you can buy as
many of them as you wish as long as they are available."

------
tejtm
As good a time as any to trot out my hobby horse with suggestions on how to
mitigate data rot. Aimed at science, but more broadly applicable.

"Identifiers for the 21st century"
[https://doi.org/10.1371/journal.pbio.2001414](https://doi.org/10.1371/journal.pbio.2001414)

note/claimer/disclaimer: Although I am included as an author I do not write
that well.

------
smegel
It's amazing how well designed the ads within the image are...it's a big
jumble but many of them stand out quite strongly with just a single word. I
wonder if they designed ads with the surrounding color context taken into
account.

------
amelius
This homepage demonstrates what an average city would look like without any
regulation.

------
cdevs
My first web page ever is in there. I'm not sure how special a thing that
is; I don't know how many icons are involved.

Also, I wonder how word got around to me about things like this in the days
of MySpace and Yahoo as my internet.

~~~
com2kid
Fark, Slashdot, TheRegister, etc.

News aggregation sites have been around for awhile.

------
Gargoyle
Do this with an ICO, with your space verified via smart contract.

It's all in the marketing!

------
rxlim
I wonder how he got everything to fit as more and more space was sold, and
whether it was a manual process. It must have been like playing Tetris on
expert mode.

~~~
ajosh
The way it worked was that as you bought, you could select the region that you
wanted. You would be charged based on the number of pixels in the region.

I actually have seen the server that it was on. The hosting company provided
free hosting to it for several years (at least 8 IIRC). I think that at some
point the server died and Alex Tew decided to move elsewhere.

At one point, he had the million dollar pixel lotto which was the same idea
except at two dollars per pixel and one person who clicked on one of the
pixels would get half the money that came in.

~~~
rxlim
So it was like buying tickets online? If multiple people were interested
in the same region, did the one whose payment went through first get it?
Still, this must have been chaotic once interest started to rise, as space
would seem empty until the payment was confirmed, and others would try to
buy it in the meantime. I guess you would also just have to make a rough
estimate of the design of your banner and buy the space, with no time to
design it first.

------
philip4534
Xanadu lost.

~~~
zandorg
I know Ted Nelson of Xanadu.

For the record, in 2005 I told him about Million Dollar Homepage, and he was
angry that someone could get rich off such a stupid idea when so many others
fail.

~~~
johnchristopher
Mandatory posting of one of the best Wired articles ever:
[https://www.wired.com/1995/06/xanadu/](https://www.wired.com/1995/06/xanadu/)

------
pul
Worst of all, only 8 of the 3306 links use https. 11 years really is an
eternity in internet years.

------
johnbowers112
Here's an archive of the article for those having trouble accessing it:
[https://perma.cc/A6ZZ-79X6](https://perma.cc/A6ZZ-79X6)

------
pishpash
Whatever happened to DOI? (Or leveraging Google's knowledge of redirects?) A
lot of rot is hosting changes; the documents, if the author cared, could well
be hosted somewhere else.
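
The indirection DOI provides can be sketched in a couple of lines: the
persistent identifier lives at doi.org, which redirects to wherever the
document is hosted today, so a citation survives hosting moves as long as
the publisher updates the registry. The resolver prefix is the real one;
the helper name is mine:

```python
# Sketch of DOI resolution: build the doi.org resolver URL for a DOI.
# Fetching that URL follows an HTTP redirect to the current host.
from urllib.parse import quote

RESOLVER = "https://doi.org/"

def doi_to_url(doi):
    """Resolver URL for a DOI, e.g. 10.1371/journal.pbio.2001414."""
    return RESOLVER + quote(doi, safe="/.")
```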

~~~
raattgift
DOI is still there; in fact one might say that it's sci-hub and not google
that has made DOI actually useful.

------
mathattack
1780/2816 links being reachable is actually much higher than I'd expect over
12 years. I'm not sure if that's what I would have predicted from the outset.

------
Shorel
Every time I find something interesting, it goes to Pocket.

That provides me with a digital copy, and it is automatically synced with
my Kobo reader.

------
Nursie
Oh wow, I remember that.

1 million pixels for only a dollar each!

That guy made a nice bundle off the idea, it got picked up and hyped by the
media so much I'm sure the companies that bought in got some ROI, or at least
some publicity. Such was the extent of the dot com bubble that this sort of
nonsense could happen and everyone cheered...

~~~
tdurden
I don't think 2005 would be considered part of the dot com bubble.

~~~
Nursie
Wow, didn't realise it was as late as '05...

It still rode a wave of media hype and naivety.

------
5_minutes
An interview with the creator would've been a nice addition to the story.

------
peter303
I wonder what the "rot factor" is for scientific citations? Some professional
societies I am in mandate URLs for bibliographical references. Most of the
time these are peer-reviewed articles. But they can be softer references like
Wiki reviews, data repositories, etc.

~~~
Deimorz
There are some studies linked here that show it's fairly high:
[https://perma.cc/#landing-problem](https://perma.cc/#landing-problem)

Perma.cc was created by the Harvard Law School Library specifically to try to
solve that issue, since so many citation links were becoming useless.

------
chenster
I'm just jealous.

------
mavhc
All the links on the homepage except Twitter are broken

------
keyboardmonkey
it was always destined to decay; it was always going to be a one-off
success. interesting in its success juxtaposed with its immediate
pointlessness.

------
malthazzar
the left of the yellow coupons ad in the right middle

------
fatjokes
I didn't realize you bought the pixels permanently. How did the owner keep up
with serving costs?

~~~
Dylan16807
It's a _million dollars_. Even if it cost a thousand dollars a month to host,
you could pay that with ~1% interest on the money.

And realistically it's something that after the initial craze cools, can run
on quite cheap hosting.
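
A quick sanity check of the arithmetic above, using nothing beyond the
figures just given:

```python
# What annual interest rate would cover $1,000/month in hosting from a
# $1,000,000 principal?
principal = 1_000_000
monthly_cost = 1_000
required_rate = monthly_cost * 12 / principal
print(f"{required_rate:.1%}")  # prints 1.2% -- so "~1%" is about right
```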

