

LinkedIn's Data Infrastructure - timf
http://www.infoq.com/news/2010/08/linkedin-data-infrastructure

======
swah
I have a strong negative reaction every time I view the InfoQ site. I can't
say exactly what the problem is, though... perhaps they should have somewhat
bigger margins to separate content from menus and ads?

------
tzs
I have one contact on LinkedIn whose profile takes significantly longer to
load than that of anyone else I'm linked to. On days when the others are
taking under a second to load, his takes anywhere from 5 seconds to half a
minute.

There are two ways his differs from the others. (1) he has way more contacts
than anyone else I know, and (2) he has a paid LinkedIn membership, whereas I
and all my other contacts have free memberships.

Has anyone else noticed particular profiles that always load slowly, and if
so, does this correlate with either them having a large number of contacts or
with them having a paid membership?

------
bartwe
I expected more than two engineers to work on that aspect of their backend.

~~~
timf
I was similarly surprised to read "three people work on [Facebook] photos, the
largest photo site on the internet." (<http://ow.ly/2k7ka>)

My theory is that they have good managers that take care of all the BS and
free their minds to get tons of work done :-\

~~~
bartwe
In some way it worries me that there are apparently so few of these jobs;
it's a bit of a lottery ticket to get such a challenge.

------
yarapavan
Link to slide deck:
<http://www.slideshare.net/ydn/6-data-applicationlinkedinhadoopsummmit2010>

------
KrisJordan
Surprised to read LinkedIn's "People You May Know" MapReduce pipeline takes 82
jobs.

From what I've read, Google's index runs in around 20 MapReduce jobs.

~~~
rjurney
PageRank is like a Markov chain - you iterate over the same dataset again and
again until you're happy with the result. If 20 iterations are good enough,
they're good enough. Good explanation here:
<http://www.iterativemapreduce.org/samples.html#Pagerank>
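To make the "iterate until you're happy" idea concrete, here is a minimal
single-machine sketch of PageRank by power iteration. It is not Google's or
LinkedIn's code; the damping factor 0.85, the tolerance, and the toy graph are
assumptions chosen for illustration, and it assumes every link target also
appears as a key in the graph. In a MapReduce setting, each pass of the loop
would roughly be one job: map spreads each page's rank over its out-links,
reduce sums the shares per page.

```python
# Illustrative sketch: PageRank via power iteration on an in-memory graph.
# Assumptions: damping=0.85 and tol=1e-8 are conventional/arbitrary choices,
# and every link target is also a key in `links`.
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             tol: float = 1e-8, max_iter: int = 100) -> Dict[str, float]:
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(max_iter):
        # "Map" step: each page spreads its rank evenly over its out-links.
        contrib = {p: 0.0 for p in pages}
        dangling = 0.0
        for page, outs in links.items():
            if outs:
                share = rank[page] / len(outs)
                for target in outs:
                    contrib[target] += share
            else:
                dangling += rank[page]  # rank of pages with no out-links
        # "Reduce" step: sum contributions, add the random-jump term, and
        # redistribute dangling rank uniformly so the total stays 1.
        new_rank = {
            p: (1 - damping) / n + damping * (contrib[p] + dangling / n)
            for p in pages
        }
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:  # "happy with the result": ranks stopped changing
            break
    return rank
```

For example, `pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"]})` ranks `c`
highest, since it receives links from both other pages. The number of passes
is not fixed in advance; the loop stops once the ranks change by less than the
tolerance, which is why a fixed count like 20 can be "good enough".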

Whereas if you include signals from multiple sources, each join is its own MR
job, never mind the calculations.
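For readers wondering why each join costs a whole job, here is a minimal
plain-Python simulation of a reduce-side join, a common way to join two
datasets in a single MapReduce pass. It is a sketch under assumptions, not
LinkedIn's pipeline: the function names and the toy "connections" and
"employers" data are hypothetical. Map tags each record with its source,
the shuffle groups records by join key, and reduce pairs up the two sides.

```python
# Illustrative sketch: simulating one MapReduce reduce-side (inner) join.
# Function names and sample data are hypothetical, for illustration only.
from collections import defaultdict
from itertools import product
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(tag: str, records: Iterable[Tuple[str, str]]
              ) -> Iterator[Tuple[str, Tuple[str, str]]]:
    # Map: emit (join_key, (source_tag, value)) for every record.
    for key, value in records:
        yield key, (tag, value)


def reduce_side_join(left: Iterable[Tuple[str, str]],
                     right: Iterable[Tuple[str, str]]
                     ) -> List[Tuple[str, str, str]]:
    # Shuffle: group tagged records by key (Hadoop does this between phases).
    groups: Dict[str, Dict[str, List[str]]] = defaultdict(
        lambda: {"L": [], "R": []})
    for key, (tag, value) in list(map_phase("L", left)) + list(
            map_phase("R", right)):
        groups[key][tag].append(value)
    # Reduce: per key, emit every left/right pairing (keys missing a side
    # produce nothing, i.e. an inner join).
    joined = []
    for key in sorted(groups):
        for lv, rv in product(groups[key]["L"], groups[key]["R"]):
            joined.append((key, lv, rv))
    return joined
```

Joining a third source means feeding this output into another full
map/shuffle/reduce round, which is how signal-heavy pipelines end up with
many jobs.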

------
badave
Does anyone know where I could find out how Facebook does their suggestions?
It'd be interesting to compare LinkedIn's method to Facebook's.

------
xiiiiiiiiii
Great team; nice to read about them.

