
A 14-year-old could build 1998's Google using her Dad's credit card - enki
http://paulbohm.com/2012/01/16/a-14-year-old-could-build-1998s-google-using-her-dads-credit-card/
======
wickedchicken
'And if you'd wanted to use a hash table, if you even knew what a hash table
was, you'd have to write your own.'

BSD's hash table code has been around for probably longer than the author
has been alive.

Here is the FreeBSD version; it's very compact and works quite well:
[http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/db/hash/h...](http://www.freebsd.org/cgi/cvsweb.cgi/src/lib/libc/db/hash/hash.c?rev=1.23.2.1.2.1;content-
type=text%2Fplain)

~~~
kroger
'And if you'd wanted to use a hash table, if you even knew what a hash table
was, you'd have to write your own.'

And just for the record, Common Lisp has had hash tables since 1984 (and I
guess Maclisp had them before that), but earlier Lisp dialects had things like
plists and alists.

~~~
Vivtek
Unless you were working in academia, you weren't using Lisp. (Probably.) I
know I wasn't.

------
leif
This post seems to miss the point that the major hurdle faced by a 14-year old
trying to learn how to program and find the right libraries etc. to use is
solved by Google itself.

~~~
enki
good point! google definitely is part of what makes us so much more productive
programmers today! (despite google codesearch going away)

~~~
Stormbringer
Obligatory StackOverflow Reference

~~~
lelele
?

~~~
Stormbringer
Redundant explanation that Google has itself been surpassed by $Site as a
resource for programmers.

~~~
jfarmer
I'd wager SO's main source of traffic is Google

~~~
cas
Since he has already written about it (88%), I don't think many people will
take that wager :D

[http://www.codinghorror.com/blog/2011/01/trouble-in-the-
hous...](http://www.codinghorror.com/blog/2011/01/trouble-in-the-house-of-
google.html)

------
notJim
I thought Google's real innovation was their technique of using the
interconnectedness of the web to determine the true value of content. So
rather than only looking at the content of a page, they also look at the
content from incoming links to that page. What package out there implements
the algorithms for this, and is well-documented and trivial enough to use that
a 14-year-old can understand them?

As far as I can tell, this article says 1) Shucks, hardware sure is cheap
these days! and 2) There sure is a lot of software out there that you can mash
together! Those things make it easier to start a company, but they don't
provide the essential insights that make that company truly revolutionary.

~~~
nl
_What package out there implements the algorithms for this, and is well-
documented and trivial enough to use that a 14-year-old can understand them?_

Nutch[1].

Nutch doesn't deal with modern web spam particularly well, but I'd say it
matches early Google pretty well. Specifically, it implements PageRank, and it
has a reliable web crawler and a web-scale data store.

[1] <http://nutch.apache.org/about.html>

~~~
notJim
Wow yeah, that actually looks like it would do the job. There's a part of me
now that wants to implement a spam classifier on top of Nutch to see how good
of a web crawler I can create… thanks for the link!

------
ctdonath
_By the end of 1998, Google had an index of about 60 million pages_

Sounds like a marvelous challenge. Anyone have other similar "technological
frontier then, high-school science fair project now" type challenges? The OP
notes BioCurious as one. A major factor in education is walking kids through a
subject from basic principles to state-of-the-art, recreating historical
milestones along the way.

~~~
jaylevitt
AOL in the late 1990s, minus the dialup itself.

Content publishing: Weekend project. Rails, memcached and CloudFront and
you're done.

IM and Buddy Lists: 1.5 million simultaneous users doing n^2 pub/sub-type
distributed transactions.

Mail: 4,000 emails per second with live unsend and recipient read/unread
status. I think PostgreSQL tops out in the _millions_ of rows per second
nowadays.

Web caching/acceleration: pick your favorite proxy solution and configure it.

Single sign-on: Form strategic partn-- Hey, you said _technical_ challenge,
not political.

------
InclinedPlane
For extremely contrived definitions of "1998's Google," yes. But if all it took
was a pile of servers and hard drives for 1998's Google to succeed, then a lot
of other companies would have succeeded as well. It takes more than that to
build a company.

~~~
enki
(author here)

I was writing this more in the sense that kids at BioCurious (and the DIY Bio
movement in general) are doing electroporation to transfer DNA from glowing
jellyfish into bacteria. This is just a few (two?) years after someone got a
Nobel Prize for that.

That's progress. If stuff that used to be hard falls into kids' hands, you're
gonna see impressive stuff happening.

However, I fully agree that it takes more than that to build a company. (Also,
I wouldn't try to compete with 2012 Google using 1998 technology.)

~~~
InclinedPlane
Fair enough. The title seems a bit link-baity; I think something along the
lines of "the infrastructure of 1998's Google" would have been better.

~~~
billpatrianakos
I half disagree. If you're blogging then the point is to get that blog some
eyeballs on it. Otherwise you write in a journal or don't make it publicly
accessible or at the very least don't help it get indexed and never link to
it.

I think there's link-bait and then there's _LINK BAIT! (TM)_. It's a fine line
between the two. You have to have a catchy, preferably keyword-splattered,
title or you become yet another blog no one cares about. I also think there's
too much focus on the title when it comes to real link-bait. The really awful
kind of link-bait is the kind that links to an article with little to no
content having anything to do with the title. In this case I think the article
corresponded with the title enough for it not to be link-bait-style
misleading. But that's me, and there is no real answer. Just interpretations.

~~~
InclinedPlane
I wholeheartedly disagree. If you are blogging, ideally you are doing so
because you are injecting valuable insights or information into the world at
large. The value of eyeballs on your blog is not to you but to the eyeballs
themselves.

------
vecter
I think the heart of Google (at least at the get-go) was PageRank. Sure, you
had to write a web crawler, but that wasn't the magic sauce that made Google's
search so good. I don't think most 14-year-olds could understand the math
behind PageRank, much less derive it from scratch.
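
(To be fair, the core iteration itself is short; the hard part is the
linear-algebra intuition, i.e. that you're computing an eigenvector of the
link matrix. Here's a toy sketch in Python on a made-up four-page web,
using the textbook damped power-iteration formulation, not anything Google
actually shipped:)

```python
# Toy PageRank via power iteration. links[i] lists the pages that page i
# links to; d is the usual damping factor.
def pagerank(links, d=0.85, iters=50):
    n = len(links)
    rank = [1.0 / n] * n
    for _ in range(iters):
        new = [(1 - d) / n] * n          # "teleportation" floor
        for i, outs in enumerate(links):
            if outs:
                share = d * rank[i] / len(outs)
                for j in outs:           # each page splits its rank
                    new[j] += share      # evenly among its outlinks
            else:                        # dangling page: spread evenly
                for j in range(n):
                    new[j] += d * rank[i] / n
        rank = new
    return rank

links = [[1, 2], [2], [0], [0, 2]]       # made-up 4-page link graph
print([round(r, 3) for r in pagerank(links)])
```

Page 3, which nothing links to, ends up with only the teleportation floor,
and the ranks always sum to 1.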

------
gghootch
I'm not sure whether this is applicable, but my main objection to this
article is that the numbers don't add up. How many Ph.D. candidates do you
know who are granted a budget of $10k+ to do their research? Surely something
else must have been going on to shrink the expenses to a more acceptable
amount.

Then again, according to the Wikipedia page the original BackRub was conceived
when the web was only 10 million pages large, so $2,000 is considerably more
acceptable for a Ph.D. project.

~~~
enki
"The SDLP is notable in the history of Google as a primary source of funding
for Lawrence Page and Sergey Brin (Brin was also supported by an NSF Graduate
Research Fellowship) during the period they developed the precursors and
initial versions of the Google search engine, prior to the incorporation of
Google as a private entity."

This included a $4,516,573 NSF grant (that didn't go to Larry & Sergey in
full, but probably helped their project's infrastructure quite a bit).

[http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=9411...](http://www.nsf.gov/awardsearch/showAward.do?AwardNumber=9411306)
[http://en.wikipedia.org/wiki/Stanford_Digital_Library_Projec...](http://en.wikipedia.org/wiki/Stanford_Digital_Library_Project)

On the expense side I've probably actually underestimated the expenses by
orders of magnitude. Bandwidth wasn't cheap back then and the storage
requirements probably were significantly higher.

------
ChuckMcM
tl;dr version: Computers and disks are a lot cheaper now.

Basically the article boils down to this: what counted as a 'cluster' in 1998
is a single system in 2008, and what used to take hundreds of disk drives to
store, you can store on one today.

Not particularly deep, but useful to think about from time to time. There is a
quote, perhaps apocryphal, which says

"There are two ways to solve a problem that would take 1000 computers 10 years
to solve. One is to buy 1000 computers and start crunching the numbers; the
other is to party for 9 years, use as much of the money as you need to buy the
best computer you can at the end of the 9th year, and compute the answer in
one day."

The idea is that computers get more powerful every year, and that in 10 years
they will be more than 1000x more powerful than the ones you would have
started with, so you can solve the same problem in a fraction of the time.
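
That folk theorem can be made concrete with some toy arithmetic. Assuming
(generously) that single-machine speed doubles every year and is frozen at
purchase time, a job needing `work` machine-years of year-0 computation
finishes after waiting w years in w + work/2^w years total, which has a sweet
spot well short of "start immediately":

```python
# Toy "party vs. compute" arithmetic: speed doubles yearly and is frozen
# once you buy the machine. work = machine-years needed at year-0 speed.
def total_time(wait_years, work=100.0):
    return wait_years + work / 2 ** wait_years

best = min(range(21), key=total_time)    # best whole-year wait
for w in (0, best, 20):
    print(f"wait {w:2d} yr -> finished after {total_time(w):7.2f} yr")
```

With these assumptions, waiting about six years beats starting a 100-machine-
year job on one machine immediately; waiting too long loses again.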

Of course, computers haven't been getting more powerful as quickly as they
once did, but the amount of data you can store per disk has continued to
outpace expectations.

The point is that if you are designing for the long haul (say, 10 years from
now), you can probably assume a much more powerful compute base and a lot more
data storage.

~~~
Vivtek
That's not even close to what he's saying - I thought that was actually a
rhetorical weakness, to tell the truth.

What he's saying is that the existence of the cloud and library advances such
as MapReduce and APIs mean that the bar for writing new software is lowered to
an extent that's hard even to comprehend.

Every time I get a module from CPAN I still get a shiver down my spine,
remembering trying to do new and interesting things in the 80's and early 90's
and _every single time_ ending up trying to build a lathe to build a grinder
to grind a chisel to hack out my reinvented wheel.

~~~
pork
A bit off-topic, but CPAN really hasn't changed all that much. I tried
installing a module the other day, something simple like a word stemmer, and
got so disgusted that I quit Perl.

~~~
Vivtek
Try writing it from scratch, whippersnapper. In C.

I guarantee you'll end up having to write a damn string library and garbage
collection - and you'll get it wrong.

~~~
pork
I'm 45.

~~~
Vivtek
Which explains your dismay at using new stuff. I'm 45, too. Fight it.

My point - which, as a 45-year-old programmer, you should have understood -
was that modern languages and library repositories make a whole lot of basic
work go away, so that we're working at a higher level than was possible in
1985.

------
jpzeni
This is an excellent example of link bait

------
bborud
Where does the 200GB figure come from? I was quite busy building a web crawler
too at the time, and I can distinctly remember that our crawlers had about 17TB
of storage. So let's say we had crawled something like 15TB of data to get a
meaningful sample of the web.

I agree with the gist of the blog posting though.

~~~
enki
In <http://www.salon.com/1998/12/21/straight_44/> it says "Page says the
current version of Google, which has indexed about 60 million pages, will
continue to be improved as the company expands," and
[http://en.wikipedia.org/wiki/History_of_Google#cite_note-
sal...](http://en.wikipedia.org/wiki/History_of_Google#cite_note-salon98-20)
gives: total indexable HTML URLs: 75.2306 million; total content downloaded:
207.022 gigabytes.
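
Those two figures are also roughly self-consistent. A back-of-the-envelope
check in Python (assuming "gigabytes" means decimal GB here):

```python
# Sanity-check the quoted crawl figures: ~75 million URLs, ~207 GB fetched.
pages = 75.2306e6            # total indexable HTML URLs
total_bytes = 207.022e9      # total content downloaded (decimal GB assumed)
avg = total_bytes / pages    # average bytes per page
print(f"average page size: {avg:.0f} bytes (~{avg / 1024:.1f} KiB)")
```

A couple of kilobytes per page is plausible for the 1998 web, which fits a
~200GB total far better than a multi-terabyte one.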

------
agscala
If you think a 14-year-old could build something as complicated as 1998's
google.com, think of what an adult with training could do at the same time
with the same resources. As technology advances, so do our expectations.

------
robot
Comparing 1998's problem set with today's tools is not a good comparison. The
tools are cheaper but problem sets are also much bigger.

~~~
ahi
The problem sets are bigger only because our tools allow them to be.

~~~
rachitgupta
There is also simply more information in the world now to index, because the
internet has been around for longer.

------
rudiger
This would require a 1998 Internet!

------
billpatrianakos
The author makes a great point about technology advancing so quickly that the
bleeding edge of just yesterday looks cute compared to what we have now, and
about how server hardware has become a cheap commodity.

Unfortunately he had to use the 14-year-old girl analogy and exaggerate the
ease with which we could build Google circa '98 today. Now his whole point is
lost to the click-clacking of a thousand pedants' keyboards. Guys, this isn't
about 14-year-old girls, nor is it about Google per se, as much as it is about
the fast pace of tech innovation, the ease and costs associated with acquiring
infrastructure, and, to a lesser extent, a tiny bit about how we're totally
spoiled compared to what we had to work with 14 years ago.

The stuff about Google and 14-year-old girls is just a literary tool (along
with some mild hyperbole) to help illustrate his point, which so far is getting
completely missed. Come on guys, is this Hacker News or Pedantic Literary
Scholar News? Focus on the point, not little Google girls. PLSN does have a
nice ring to it, but no, we're not on PLSN. At least not yet.

------
joejohnson
A 14-year-old could probably do it using her mom's credit card too.

------
SODaniel
I don't even understand the point of this post. I could have started
Amazon.com at 22, but I didn't.

------
dmoy
"Google" + "bleeding edge hard drives"

hehehe

------
angersock
So, just a gripe about your startup plug at the end of the article.

Look, I don't care whether your product cures cancer, dispenses oral sexual
favors, and mints pure gold dubloons-- I will not give you my email address
without a damned good reason.

Every single goddamn link on your page brings me to a "Enter your email here"
prompt, except for the company tab, which brings me instead to a pile of vapid
marketing bullshit.

 _Flotype Inc. is a venture-backed company building a suite of enterprise
technology for real-time messaging. Flotype takes a unique approach by
building developer-friendly technologies focused on ease-of-use and
simplicity, while still exceeding enterprise-grade performance expectations.

Flotype licenses enterprise-grade middleware, Bridge, to customers ranging
from social web and software enterprises to financial and fleet management
groups._

What does that even mean? You using carrier pigeons? Dwarves? Cyborgs? UDP?
ZeroMQ? Smoke signals?

You don't even tell me how my email is going to be used.

Fix your shit.

~~~
ramanujan
^ This kind of post just drags HN down, and is the kind of thing that jacquesm
was talking about. Seeing something this rude at the top of HN for a post this
guy worked hard on is probably not what he expected, and made his lunch taste
a little worse today.

There's a time and place for profanity/verbal hostility. Feedback to a
stranger on website UX isn't it; the perceived intensity and level of anger is
just dialed wrong. I wish pg would implement a filter for this kind of
comment.

~~~
angersock
^This kind of post clutters up discussions with metabullshit already accounted
for by the karma system.

More seriously, this is not a mere UX problem. This isn't a problem with
colors not matching, with poor navigation, or with anything else.

Absent any other information, this site appears to be a way of fishing for
email addresses. That's the long and the short of it.

I am not just a string to send messages to. I am not just a networking
opportunity. I am not just an entry in your preferred database.

I am a developer, and I don't like it when sites treat me otherwise.

I thanked the author for his (very fast) response.

I'm sorry about the tone of the post, but frankly we can't let this
dehumanization and arrogance towards users (and worse in this case, fellow
developers) slide.

EDIT: Note also that, had he simply posted a good article (which it was!)
without the shameless plug, I would've said nothing. If the plug had linked to
a page that had anything other than email scraping, I wouldn't have
complained. But the linked page was so offensive that it deserved calling out.
Let this be a lesson for you startupy folks: don't cheapen a good thing with a
bad plug.

~~~
nostrademons
Then don't give him your e-mail address.

