

Stuff Harvard people Like - ekm2
http://blog.echen.me/tag/mit/

======
rflrob
I'm blown away that UC Riverside is the second most predictive interest for
Berkeley. I wonder if, somehow, UCR students are also showing up in the
dataset, also bringing down the predictive power in the already "too large and
diverse for an overall characterization" data.

~~~
dexy
I'm curious whether the info about Harvard students is accurate as well. If
there's one university I would imagine gets followed by a lot of non-students
it's Harvard.

For a clear illustration, look at their respective Facebook place pages:
Harvard: 918,295 likes; Stanford: 248,574; UC Berkeley: 86,214; MIT: 65,339;
Caltech: 2,686;

This clearly isn't about the schools' relative sizes or their popularity among
past students but the global reach of their respective brand names.

Some of the things listed for Harvard rang very true from my time there
(particularly the interests in consulting, New York, private equity, and
famous Harvard grads like Conan) while others made less sense to me (Jimmy
Fallon?)

Still a very interesting data set though, if only to see what kinds of people
are influenced by each school's brand.

------
antics
Oooooh man, Latent Dirichlet Allocation is cool stuff, especially in the
context of topic modeling (which is what Chen is doing here). OP actually
wrote a pretty accessible blog post about how he does this sort of thing which
you can see at [1].

If you don't want to read it, Mike Jordan has a pretty neat presentation about
it at [2]. If you're statistically trained and don't want to view the video,
you will probably understand this synopsis: you can view each document
in a set of documents as a discrete admixture of some number of topics. If you
imagine that words can be modeled with a discrete exchangeable random
variable, and you choose some number of topics to model (let's say k topics),
then you can use a hierarchical Bayesian model, specifically with an
underlying Dirichlet distribution over the base measure of each Dirichlet
process that forms every particular admixture document. This allows topics to
share some amount of information, which then allows you to generate some
pretty useful topics, like the ones from the OP.

If you don't understand that synopsis, then go look at Jordan's talk. It makes
it all pretty clear. :)
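
For anyone who prefers code to prose, the "each document is an admixture of k topics" idea can be sketched as a tiny generative simulation. This is a minimal sketch of the plain LDA generative story only, not the Dirichlet-process variant from Jordan's talk and not necessarily what the OP actually ran. The vocabulary, the function names, and the `alpha`/`beta` values are all made up for illustration.

```python
import random

# Hypothetical vocabulary of "interests"; purely illustrative.
VOCAB = ["consulting", "private equity", "ruby on rails",
         "startups", "robotics", "new york"]

def sample_dirichlet(alphas, rng):
    """Draw one Dirichlet sample by normalizing independent Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]

def generate_corpus(vocab, k, n_docs, doc_len, alpha=0.5, beta=0.5, seed=0):
    """Simulate documents from the LDA generative model (assumed settings)."""
    rng = random.Random(seed)
    # Each topic is a probability distribution over the vocabulary.
    topics = [sample_dirichlet([beta] * len(vocab), rng) for _ in range(k)]
    mixtures, docs = [], []
    for _ in range(n_docs):
        # Each document gets its own mixture over the k topics.
        theta = sample_dirichlet([alpha] * k, rng)
        words = []
        for _ in range(doc_len):
            z = rng.choices(range(k), weights=theta)[0]     # pick a topic
            w = rng.choices(vocab, weights=topics[z])[0]    # pick a word from it
            words.append(w)
        mixtures.append(theta)
        docs.append(words)
    return topics, mixtures, docs

topics, mixtures, docs = generate_corpus(VOCAB, k=2, n_docs=3, doc_len=8)
for theta, words in zip(mixtures, docs):
    print([round(p, 2) for p in theta], words)
```

Fitting a topic model means running this story in reverse: given only the words, infer the topics and the per-document mixtures.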

[1] <http://blog.echen.me/2011/08/22/introduction-to-latent-dirichlet-allocation/>

[2] <http://videolectures.net/icml05_jordan_dpcrp/>

------
tikhonj
Another interesting thing to think about is what sort of student at each
university is the most likely to use Quora.

For example, I noticed that Berkeley seemed to have a bit of bias towards web
technologies (Ruby on Rails and the like). While there are certainly plenty of
people around that like web apps (and it seems almost everyone has a pet
project on the web these days), I suspect that they are overrepresented simply
because people making web apps are much more likely to be on Quora than
hippies (Berkeley has those too, believe it or not).

------
spartango
Following a school doesn't necessarily mean that a person actually attends
that school; particularly for prestigious schools, there are plenty of people
who aspire to or admire the school, but actually aren't part of it. It's a
distinction worth making.

~~~
maestri
It sounds like there was some filtering to try to prevent that, for example,
filtering out people who follow both Harvard and Stanford. Interesting
anyways, even just to see what people interested in attending certain schools
are interested in.

------
chollida1
> Berkeley, sadly, is perhaps too large and diverse for an overall
> characterization.

I'm hoping that the author means "sadly" in the sense of "I wish I could
properly characterize the school but can't" rather than "sadly, the school is
unfocused."

~~~
geebee
Berkeley's definitely the odd man out. Here are some rough approximations on
the number of undergraduates at each institution...

Stanford: ~6,800; Harvard: ~6,600; MIT: ~4,200; Berkeley: ~25,000

Keep in mind also that UCB has about twice as high a percentage of low-income
students (so the class size at UCB is 4-5 times the size of these elite
undergraduate schools, with 8-10 times the number of low-income students, many
of whom are transfer students). Also, UCB is probably more "local" among
undergrad populations, since about 70% come from California. So I'd expect a
very different profile for undergrads.
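
The back-of-the-envelope ratios are easy to check. Here is a small sketch that uses only the rough enrollment figures quoted above and takes the "about twice the low-income percentage" claim as a flat multiplier of 2, which is an assumption made for illustration rather than a measured number.

```python
# Rough undergraduate counts as quoted in the comment above.
undergrads = {"Stanford": 6800, "Harvard": 6600, "MIT": 4200, "Berkeley": 25000}

def size_ratios(counts, reference="Berkeley"):
    """How many times larger the reference school is than each other school."""
    ref = counts[reference]
    return {school: ref / n for school, n in counts.items() if school != reference}

ratios = size_ratios(undergrads)

# Assumption: Berkeley's low-income share is about 2x that of the other schools,
# so the ratio of low-income head counts is the size ratio times 2.
LOW_INCOME_SHARE_MULTIPLIER = 2.0
low_income_ratios = {s: r * LOW_INCOME_SHARE_MULTIPLIER for s, r in ratios.items()}

for school, r in ratios.items():
    print(f"Berkeley vs {school}: {r:.1f}x undergrads, "
          f"~{low_income_ratios[school]:.1f}x low-income students")
```

With these figures the size ratio ranges from about 3.7 (Stanford) to about 6.0 (MIT), so "4-5 times" and "8-10 times" are reasonable midpoints rather than exact values.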

Interestingly, the profile of the graduate student population at all four
schools is probably fairly similar, as the elite private colleges admit far
more grad students than undergrads, whereas the opposite is true for UCB - and
PhD programs at UCs aren't subject to the same geographic restrictions as the
undergrad program.

~~~
mturmon
Caltech is the odd one out in the other direction:

Caltech undergrad enrollment: 967

------
jarin
Well, at least SOMEONE likes Klout.

(I do like getting free stuff though)

