
Stephen Hawking's Ph.D. Thesis Crashes Cambridge Site After It's Posted Online - endswapper
http://www.npr.org/sections/thetwo-way/2017/10/23/559582380/stephen-hawkings-ph-d-thesis-crashes-cambridge-site-after-it-s-posted-online
======
maxpert
Ok I am confused, 60k hits in a day? What brought down the website? The 72MB
file size, network congestion, or 60k hits? Even with authentication for
download, what can bring the system down? I have handled more traffic on an RPi
with a 100MBps connection. I really don’t get it.
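
For a rough sense of scale, here is a back-of-envelope sketch in Python. It
assumes (this is not stated in the article) that every one of the 60k hits is a
full download of the 72MB file and that the load is spread evenly over the day.
Real traffic is burstier, and many hits would be page views rather than
downloads, so treat it as an upper-bound illustration only.

```python
# Back-of-envelope estimate. Assumption: every hit is a full download of the
# file, spread evenly over 24 hours (real traffic would be burstier).
HITS_PER_DAY = 60_000      # figure quoted in the comment above
FILE_SIZE_MB = 72          # figure quoted in the comment above
SECONDS_PER_DAY = 24 * 60 * 60


def average_throughput_mb_per_s(hits: int, size_mb: float,
                                seconds: int = SECONDS_PER_DAY) -> float:
    """Average megabytes per second if the load were spread evenly."""
    return hits * size_mb / seconds


total_tb = HITS_PER_DAY * FILE_SIZE_MB / 1_000_000   # decimal terabytes
avg_mb_s = average_throughput_mb_per_s(HITS_PER_DAY, FILE_SIZE_MB)
avg_mbit_s = avg_mb_s * 8                            # megabytes -> megabits

print(f"total: {total_tb:.2f} TB/day")
print(f"average: {avg_mb_s:.0f} MB/s ({avg_mbit_s:.0f} Mbit/s)")
```

Under those assumptions the total is 4.32 TB per day and the average is about
50 MB/s, roughly 400 Mbit/s sustained, so bandwidth alone could matter if most
hits really were full downloads.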

~~~
maxpert
Here I made a quick copy and a quick domain, almost 10 LoC and it will be more
reliable than what Cambridge had
[http://hawking.sibte.ml/](http://hawking.sibte.ml/)

~~~
fnwx17
Love the irony on that! Which makes me wonder how much valuable content and
research is locked away on technologically ancient servers in Cambridge and
Oxford.

~~~
qurashee
More modern and up to date than you'd think :) Some departments here in Oxford
have a server life-cycle of 3-5 years. It's just that nobody bothers to plan
for a higher than expected volume of traffic, unfortunately (a practice that
can be extrapolated to many things in academia).

------
retox
What percent of downloads will actually be read I wonder.

I'm guilty of downloading and hoarding things that seem interesting and never
getting round to even opening them. "When I'm retired", I tell myself.

~~~
callinyouin
Guilty as charged, as well. I download just about any programming manual/book
that's free in PDF or similar form and I've probably read maybe 10% of them.

------
netsharc
I'm glad NPR used this headline; another news site said it "broke the
internet". Journalism!(tm)

------
ianopolous
Here is an ipfs link to it:
[https://ipfs.io/ipfs/QmNwcSE8BYQmHSS99dtg2VdAez4uswVkvCssj4R...](https://ipfs.io/ipfs/QmNwcSE8BYQmHSS99dtg2VdAez4uswVkvCssj4Rgce4rLp)

~~~
imrehg
Thanks a lot, and pinning it to help to seed :) `ipfs pin add
/ipfs/QmNwcSE8BYQmHSS99dtg2VdAez4uswVkvCssj4Rgce4rLp`

------
PhilWright
The PDF is just a scan of actual printed book pages. No wonder it is a
monstrous size.

~~~
ccvannorman
So a man who uses a machine to synthesize text into speech wrote a book
which was then printed, scanned, and uploaded.

I wonder if they faxed it at some point too?

~~~
ktta
In 1966?

HN's guidelines say that one shouldn't ask if one has read the posted link,
but I'm tempted to all the time.

~~~
vidarh
It would seem to be a joke.

But even so, while 1966 was indeed early for "regular" use of fax - the first
"user friendly" Xerox fax machines hit the market around then -, the first
transmission of facsimiles of images dates to the 1840's, and the first fax
that used similar methods to "modern" fax machines of scanning line by line
(the "scanning phototelegraph") dates to 1880. Commercial fax machines have
been around since around 1900.

So it would indeed be possible.

One weird and wonderful product of early faxes (fax over radio predates
"wired" fax machines): Finch Facsimiles [1] were used to transmit
"newspapers" via AM radio in the '30s, which were then printed on thermal
paper at the home of the subscriber.

From [1]: "Six hours overnight was enough time to print a six page two column
news bulletin, delivered in time for breakfast."

[1]:
[http://www.theradiohistorian.org/Radiofax/newspaper_of_the_a...](http://www.theradiohistorian.org/Radiofax/newspaper_of_the_air1.htm)

------
colinbartlett
Seems like a perfect use for a torrent. Is there a tracker link?

~~~
logicallee
"I've watched everything"

"What do you mean everything?"

"TV shows. Movies. Even the japanese ones."

"How about older stuff"

"I've gone through early Chaplin work. I've seen Metropolis 17 times."

"I'm sure there's something"

(grabs his friend)

"You don't understand, Paul. I've been reading Shannon. A Mathematical Theory
of Communication. I've run _out_. I've taken to begging strangers for a fix."

"That bad?"

(Guilty), "I just... " (resigned) "I just asked someone to put up a torrent of
Stephen Hawking's Ph.D. thesis..."

~~~
B1FF_PSUVM
> Movies. Even the japanese ones.

Even the Eastern European ones. And all of Ingmar Bergman.

(shudders)

~~~
scandinavegan
That's the good stuff! I recommend Satantango, and all of Bergman.

------
endswapper
Direct link to thesis: [http://schema.lib.cam.ac.uk/PR-
PHD-05437_CUDL2017-reduced.pd...](http://schema.lib.cam.ac.uk/PR-
PHD-05437_CUDL2017-reduced.pdf)

~~~
DrScump
It would have been more helpful to post _mirror site_ addresses rather than
exacerbate the problem.

Anybody know of mirror sites? A basic web search doesn't show any, and
archive.org doesn't show it.

~~~
matthewbadeau
Mirror:
[http://web.archive.org/web/20171024005153/http://schema.lib....](http://web.archive.org/web/20171024005153/http://schema.lib.cam.ac.uk/PR-
PHD-05437_CUDL2017-reduced.pdf)

------
eighthnate
I just have to ask why this is news? Is this really something newsworthy?

Cambridge's network probably isn't as hardened to spikes in traffic since they
don't get much traffic. But still, it isn't 1995. They should have some form
of load balancing or distributed/clustered web/data/file systems to handle
temporary spikes in traffic and data requests. Serving simple static data
isn't something that should "crash the site".

~~~
Insanity
The technology behind the repository itself (DSpace [1]) is not _great_. Add
to that the fact that it is not actually built to handle this many requests,
and scaling quickly is out of the question too because of the server setup.

Even without issues, it often felt a bit sluggish when serving locally. The
pages are quite large, and the whole pipeline from content -> webpage is
rather tedious (Java, XSLT -> HTML).

It shouldn't have happened - but I assumed it would.

Disclaimer: I am a former contributor to the project.

[1]:
[https://github.com/DSpace/DSpace](https://github.com/DSpace/DSpace)

------
zamber
Suggestions on getting this in audio form? I guess it requires transcribing
the handwritten parts. The Chrome OCR fails there. Is there a better one?

Sample:

This implies that the universe is spatially homogeneous and isotropic since
there is no direction defined in the 3- space orthogonal to Ua. In this
universe we consider small perturbations of the motion of tl1e fluid and of
the '.ifeyl tensore 1 Ne neglect products of small quantities and perform
derivatives with respect to the undisturbed metric. Since all the quantities
we are interested in with the exception of the scalars, µ, ~' e have
unperturbed value zero, we avoid perturbations that merely represent
coordinate transformation and have no physical significance. To the first
order the equations (1) - (4) and (7) - (9) are

------
accurrent
The article has a typo in the quote: it should be Olbers, not Older, who
described what's now known as Olbers' paradox.

------
dredmorbius
This points to challenges of digital information.

Stephen Hawking and his dissertation are high-profile as these things go. The
NPR article mentions other _popular_ items generating 100s of requests per
month. I've frequently run across items with _lifetime_ request counts in the
double or triple digits (and suspect I doubled the count on one particular
item).

More often, though, the truth is that this material _simply isn't available
online._ There are several thesis repositories (either Michigan State or
University of Michigan is one, as I recall), and I can _frequently_ turn up a
shelf reference via WorldCat ... somewhere.

But there's work from surprisingly prominent names in numerous fields that
simply isn't available in electronic format. The worst case is for materials
from roughly 1924 - 1980: too late to be out of copyright, and too early to
have been composed in, or converted to, digital formats (and 1980 is an early
cut-off date for that, though it's when material seems to start appearing in
bulk).

This includes PhD dissertations, Masters theses, and numerous academic or
other writings, _often including government documents not under copyright._
Thankfully with Sci-Hub, actual published academic journal articles can be
found, freely, with a very high success rate. Particularly painful for me are
popular magazine and newspaper items, _for which even the indices are very
frequently locked behind site-restricted or affiliate-only access._

The time-and-effort differential of being able to look something up online,
vs. travelling many miles to a facility for access, is tremendous. And it
absolutely stops a great many incidental queries dead.

See Rick Falkvinge's excellent rant about how the KRACK vulnerability was
blocked behind corporate-only paywalls for over a decade:

[https://www.privateinternetaccess.com/blog/2017/10/the-
recen...](https://www.privateinternetaccess.com/blog/2017/10/the-recent-
catastrophic-wi-fi-vulnerability-was-in-plain-sight-for-13-years-behind-a-
corporate-paywall/)

Note that the issues here are twofold. One element is the task of scanning and
making available documents, and organising the results in a manner useful for
search.

But much of the harm is the direct consequence of the present regime of
copyright and paid access to information, AS WELL AS the perverse incentives
of advertising-backed media and media manipulation, which have created a media
regime that is actively harmful to society.

I'd really like to see the elements of this addressed.

------
tbrock
99% chance someone used Apache’s default configuration.

~~~
Bromskloss
What is the limitation in that default configuration?

~~~
jerf
KeepAlive, probably: [http://www.kalzumeus.com/2010/06/19/running-apache-on-a-
memo...](http://www.kalzumeus.com/2010/06/19/running-apache-on-a-memory-
constrained-vps/)

