
Google’s Jeff Dean’s undergrad senior thesis on neural networks (1990) [pdf] - russtrpkovski
https://drive.google.com/file/d/1I1fs4sczbCaACzA9XwxR3DiuXVtqmejL/view
======
halflings
As always, Jeff Dean doesn't fail to inspire respect.

Tackling a complex problem (still relevant today) at an early age, getting
great results _and_ describing the solution clearly/concisely.

My master's thesis was ~60 pages long, and was probably about 1/1000 as useful
as this one.

~~~
GuiA
I put a lot of effort into my undergraduate thesis, but none of the professors
on my committee had much interest in advising me; and after my defense, the
only professor who really gave me his undivided attention came to me and said
“I’m glad you’re not staying here for grad school; you’re way too good for
this place”.

¯\_(ツ)_/¯

Comparing your journey to others’ is pointless.

~~~
nerdponx
Sorry you're being downvoted. I don't think you're trying to detract from the
accomplishment here, but you are raising an important point: at an age like
this, good mentorship, leadership, and guidance are essential.

There is a very small number of truly gifted people who end up discovering
things on their own at a young age. Most gifted people who discover something
do so with the benefit of a mentor who can work with them to refine their
talent into skill.

My master's thesis sucked largely because I tried to do it on my own and didn't
even pick an advisor until I was almost done. At that point all they could do
with my mess was to say "well, this is a decent descriptive paper and we need
more descriptive papers in the field," and then give me proofreading comments.
I didn't have a damn clue what I was doing, the end product was mediocre, and
I didn't learn nearly as much as I could have. I'm not an outstanding talent
by any means, but not seeking out mentorship in school is one of my only
career-related regrets.

The fact of the matter is that some people are deprived of mentorship,
either through bad personal decision-making or through bad academic
infrastructure. These people have a much harder road to expertise and success
than the people who were mentored.

~~~
GuiA
Thanks for your comment. This was not to detract from Jeff Dean's work in any
way - obviously he is a talented, hard working engineer.

This was more to share with 'halflings that comparing one's achievements to
someone else's is rarely something that bears much fruit.

------
mlthoughts2018
An underappreciated aspect of this is finding an academic department that
would allow you to submit something this concise as a senior thesis.

My experience, mostly in grad school, was that anyone editing my work wanted
more verbiage. If you only needed a short, one-sentence paragraph to say
something, it just wasn’t accepted. There had to be more.

Jeff Dean is an uncommonly good communicator. But he also benefited from being
allowed, perhaps even encouraged, to prioritize effective and concise
communication.

Most people aren’t so lucky, and end up learning that this type of concision
will not go over well. People presume you’re writing like a know-it-all, or
that you didn’t do due diligence on prior work.

~~~
Certhas
It's kind of remarkable. There really is no literature review in this paper.
As a supervisor I would have no problem with a content part of this length,
but I would also insist on doing the scholarly work that is not demonstrated
here. Don't just throw out some code and describe it but put it into the
context of what exists. Give credit to where ideas originate.

That shouldn't add too much. No more than a few pages. It would still be concise
but then also a scientific work.

~~~
phkahler
That seems to be a key difference between science and engineering. One likes
to survey the field and insist that their paper offers something novel - no
matter how big or small. The other just wants to do some solid work and get
the results.

~~~
Certhas
If you want to be snarky: The one is research the other is documentation.

~~~
phkahler
I'm not really trying to be snarky, just point out the difference in approach.
Research is in fact documentation - of what others have done.

Papers that are entirely surveys or comparisons of different approaches can be
excellent and would make good citations in any practical work.

~~~
Certhas
Not really. The point is that in research we want to generate and further
knowledge. This is distinct from generating and documenting facts. If you
don't link into the web of knowledge there is (implicitly: leave that task up
to the reader) you are just documenting facts.

This is not academic. What did reading this senior thesis teach me? That two
approaches perform reasonably (by what standard?) with a size trade-off.
That's an excellent start but also leaves open many questions: Why these two
approaches? Are there reasons to expect they are better suited than other
approaches in the literature? Were these results expected? Can I expect them
to generalize? Do they paint a coherent picture on the performance of
different designs in various contexts or are they surprising?

A lot of this is about generality of the knowledge gained. As a mere fact
("Two implementations of two algorithms that solve one problem perform
slightly differently") it's not very interesting unless I have that exact
specific problem myself. If I do, I would still need to find the paper. But if
it is linked into a wider web of knowledge ("In paper [X] it was found that
this algorithm performs well on tasks that have something in common with our
problem, paper [Y] and [Z] suggest that we should expect a trade off for small
sizes. Generally nothing is known about what should be algorithms well suited
to the problem at hand.") it allows me to reason about situations.

~~~
phkahler
>> The point is that in research we want to generate and further knowledge.
This is distinct from generating and documenting facts.

Hence the desire to constantly look for novelty in academic work. Engineers
don't necessarily care about novelty, they need to solve a problem at hand for
practical reasons. Documenting what they've done, how it performed, and what
they learned (if anything) is still important to write down for others who may
want to solve similar problems.

I personally find the quest for novelty often reads like some kind of
desperate need to justify the work or to get it funded. Solid work can stand
on its own even if there's nothing new about it, while mediocre work seems to
stand so long as it's got some element of novelty.

If I've already decided what method I want to use to solve a problem, finding
a well-done implementation and documentation on it is all I really want. If I
don't know what solution to apply to a problem, a survey that documents the
various approaches and makes some comparisons is what I want.

------
dekhn
I guess it's not totally surprising that Dean's undergrad thesis was on
training neural networks and the main choice was between-graph or in-graph
replication. This is still one of the big issues with TensorFlow today.

One thing most people don't get is that Dean is basically a computer scientist
with expertise in compiler optimizations, and TF is basically an attempt at
turning neural network speedups into problems related to compiler
optimization.

I'd like to thank my undergrad university for hosting my undergrad thesis for
25 years with only 1-2 URL changes. Some interesting details include:
Latex2Html held up, mostly, for 25 years and several URL changes. The
underlying topic is still relevant (training the weight coefficients of a
binary classifier to maximize performance) to my work today, even if I didn't
understand gradient descent or softmax at the time.

------
mi_lk
Wonder who was his advisor back then, because I don't think it's mentioned in
the thesis. Or he did this on his own, which is not surprising by the way.

~~~
russtrpkovski
[https://twitter.com/jeffdean/status/1033874204548984833?s=21](https://twitter.com/jeffdean/status/1033874204548984833?s=21)

~~~
mi_lk
Professor Vipin Kumar for the lazy:
[https://www-users.cs.umn.edu/~kumar001/](https://www-users.cs.umn.edu/~kumar001/).

~~~
paganel
Pretty interesting. Reading his short bio, I learned for the first time about
AHPCRC ([https://ahpcrc.stanford.edu](https://ahpcrc.stanford.edu)), which
Prof. Kumar headed for about 7 years. The US military is indeed involved
almost everywhere in SV.

~~~
defen
Read Steve Blank's "The Secret History of Silicon Valley". The US military
_created_ Silicon Valley.

~~~
tntn
I've always wondered what the "secret" in the title refers to. This is pretty
common knowledge among people who are from here, is it only secret from the
people rushing in to work at FAANG?

------
mcilai
Quite incredible that he was interested in NNs back in 1990. He closed this
thread very well.

~~~
coldsauce
Weren't neural nets popular back then?

~~~
2sk21
My PhD, which I completed in 1992, was about improving back propagation in
neural networks. Neural networks were going through an initial phase of
excitement caused by the Rumelhart and McClelland book. My dissertation was on
modularizing NNs.
[https://surface.syr.edu/cgi/viewcontent.cgi?article=1130&con...](https://surface.syr.edu/cgi/viewcontent.cgi?article=1130&context=eecs_techreports)

------
scottlegrand2
Really interesting and innovative early work, and I think it also explains why
TensorFlow does not support within-layer model parallelism. It's amazing how
much our early experiences shape us down the road.

My entire career has consisted of reimplementing bits and pieces of things
I've previously built all the way back to high school and then reimplementing
whatever was new on the previous round in the next one.

------
slyrus
Does anyone else miss enscript -2G?

~~~
aquamo
enscript -2Gr yeah -). used to be my default mode as well!

------
yuhong
As a side note, I already have a draft of my essay (not published yet) that
replaces the mention of storage costs with a mention of Ruth Porat. The point
is why Ruth Porat was hired in the first place.

------
elvinyung
I don't know anything, but does this work directly inspire DistBelief?

------
pknerd
Interesting coding style with too much whitespace. Is it some standardized
pattern? I found something similar in the code written by John Carmack.

~~~
sigjuice
This is the usual amount of whitespace. A couple of examples

[https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...](https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/kernel/pid.c?h=v4.19-rc1)

[https://github.com/apple/darwin-xnu/blob/master/libsyscall/m...](https://github.com/apple/darwin-xnu/blob/master/libsyscall/mach/mach_vm.c)

------
akhilcacharya
It's really impressive that Jeff accomplished so much despite going to UMN for
undergrad.

~~~
wenc
In the U.S., one's undergraduate institution does not correlate with success as
much as it does in certain countries like France or Japan, where universities
are a pipeline for elite selection and grooming.

Also, not all intelligent American kids can or want to go to elite schools,
even if they are academically qualified. In the U.S., you often hear stories
of kids turning down really good schools for ones they felt were a better
"fit" (financially, culturally, etc.). And unlike the rest of the world, elite
colleges in the U.S. are often private and expensive. Despite need-blind
admissions, not everyone can afford them without going into heavy debt. (Many
middle-class parents make just enough money for their kid to not qualify for
substantial financial aid.)

So kids go to schools they can afford.

One of my college professors (who attended Princeton and MIT) once told me
that in his observation, the top-5-percentile students in (good) state schools
aren't that different from the kids who went to Princeton or MIT. I didn't
believe him at the time, but having worked with different folks over the
years, my experience inclines me to believe that there's some truth in that
observation.

Owing to its population and economy, the U.S. has a large enough talent pool
that the top percentile students at large, well-funded state schools (of which
UMN is an example) are plenty smart. If you were to meet the really smart
top-5-percentile kids from such state colleges (I have), you'd have no doubt
that many of them could have attended MIT or CMU.

To be sure, good colleges can give you a headstart in life -- but it's what
you do with that advantage that counts.

--

Examples of smart computer folk who went to decent, but non-elite schools for
undergrad:

Doug Crockford (JavaScript), SFSU

JJ Allaire (ColdFusion, RStudio, etc.), Macalester College

Ward Cunningham (Wikis), Purdue

Rich Hickey (Clojure), SUNY Empire State (though he did go to Berklee College
of Music)

John Carmack (Doom, Quake), U. Missouri Kansas City

Sergey Brin (Google), U. Maryland College Park (before Stanford)

Larry Page (Google), U. Michigan (before Stanford)

Dave Cutler (VMS, Windows NT), Olivet College

Bram Cohen (BitTorrent), U at Buffalo

Ryan Dahl (Node.js), UCSD, then U Rochester

Larry Wall (Perl), Seattle Pacific U (before Berkeley)

Alan Kay (Smalltalk, windowing GUIs), U Colorado, then U Utah.

Brendan Eich (JavaScript, Mozilla), Santa Clara U (before UIUC)

~~~
godelmachine
Thanks for the list but I am not entirely sure about Sergey Brin and Larry
Page’s alma mater.

~~~
freyr
Interesting fact: those two went on to create the search engine Google. You
should check it out.

~~~
godelmachine
I meant I am not sure about their alma maters not being among the top reputed
universities in the USA, meaning I implied they were indeed from top reputed
universities in the USA.

Apologies for not making myself clear enough :(

