Facebook's Scribe technology now open source

qhoxie · on Oct 24, 2008

Overview:

- Scribe aggregates log messages from all of Facebook's different server applications.

- The design is simple: each log entry is just 2 strings, a category and a message.

- There are no predefined categories; new logs are created with new categories.

- Servers are arranged as a directed graph in which each server only knows about the next server downstream.

- To tolerate network and hardware failures, messages are spooled to local disk and forwarded once things are working again.

- Uses Thrift to accept log data from applications written in many different languages.
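To make the category/message model and the spool-to-disk behaviour concrete, here is a minimal sketch. It is not Scribe's code or API: `SpoolingLogger`, `flush_spool` and the `send` callable are hypothetical names, and `send` simply stands in for the Thrift client that would talk to the next server in the graph.

```python
import json
import os
import tempfile
from typing import Callable, List, Tuple

# Hypothetical transport to the next server downstream: takes
# (category, message) and raises ConnectionError when it is unreachable.
Sender = Callable[[str, str], None]


class SpoolingLogger:
    """Forward (category, message) pairs downstream; spool to disk on failure."""

    def __init__(self, send: Sender, spool_path: str) -> None:
        self.send = send
        self.spool_path = spool_path

    def log(self, category: str, message: str) -> None:
        try:
            self.send(category, message)
        except ConnectionError:
            # Downstream is down: append the entry to the local spool file.
            with open(self.spool_path, "a", encoding="utf-8") as f:
                f.write(json.dumps([category, message]) + "\n")

    def flush_spool(self) -> int:
        """Resend spooled entries in order; return how many were delivered."""
        if not os.path.exists(self.spool_path):
            return 0
        with open(self.spool_path, encoding="utf-8") as f:
            pending = [tuple(json.loads(line)) for line in f if line.strip()]
        delivered = 0
        for category, message in pending:
            try:
                self.send(category, message)
            except ConnectionError:
                break  # still down; keep the rest for the next attempt
            delivered += 1
        # Rewrite the spool with whatever could not be delivered yet.
        with open(self.spool_path, "w", encoding="utf-8") as f:
            for category, message in pending[delivered:]:
                f.write(json.dumps([category, message]) + "\n")
        return delivered


# Usage: a fake downstream server that stays "down" until the flag flips.
received: List[Tuple[str, str]] = []
up = False


def fake_send(category: str, message: str) -> None:
    if not up:
        raise ConnectionError("downstream unavailable")
    received.append((category, message))


with tempfile.TemporaryDirectory() as tmp:
    logger = SpoolingLogger(fake_send, os.path.join(tmp, "spool.log"))
    logger.log("www", "GET /home.php")     # server down, so this is spooled
    logger.log("ads", "impression 42")     # server down, so this is spooled
    up = True
    delivered = logger.flush_spool()       # replays both spooled entries
    logger.log("www", "GET /profile.php")  # delivered directly
```

Note that new categories need no setup here either: any string works as a category, matching the "no predefined categories" point.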

Hosted on SourceForge: http://sourceforge.net/projects/scribeserver/

amix · on Oct 25, 2008

Facebook are pretty bad at driving open-source projects. For example, their Cassandra project has had no code commits for the last several months (http://code.google.com/p/the-cassandra-project/source/list).

While this sharing of code and ideas is good, it seems like they mostly use it for publicity stunts (i.e., release something as open source, generate press, and then keep developing it internally).

aschobel · on Oct 25, 2008

Thrift seems to have a fair amount of activity:

http://svn.apache.org/viewvc/incubator/thrift/trunk/

What I would fault them for is poor documentation. Google does a much better job on this front.

mtw · on Oct 25, 2008

Maybe Cassandra was mature and stable when they released it?

amix · on Oct 25, 2008

It's unlikely that such a complex system has no bugs left to fix and no improvements left to make. Documentation for the project is also _very_ sparse.

wayne · on Oct 24, 2008

Anyone know the advantages/disadvantages of Scribe vs. Syslog (http://en.wikipedia.org/wiki/Syslog)? Is it just that Syslog dies at scale?

aschobel · on Oct 24, 2008

I remember hearing about Scribe last year in a FB Thrift lecture and am really excited that they finally got it open sourced.

This definitely solves some design problems we have been having with news feeds.

Facebook's Thrift: Scalable Cross-Language Development http://blip.tv/file/446822

Thank you Facebook!

mtw · on Oct 25, 2008

This is just the first part of the equation. I've built news feeds, and one of the most difficult things to do is aggregating similar actions from the network (such as "X people from your network changed their profile photo"). I've got something working, but would love to know how they implemented this.
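One simple way to get the "X people from your network changed their profile photo" behaviour is to group recent friend events by action type and collapse each group into a single story. This is a minimal sketch under assumed inputs, not how Facebook implemented it: the `(actor, action, timestamp)` event tuple and `aggregate_feed` are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical event record: (actor, action, unix_timestamp).
Event = Tuple[str, str, int]


def aggregate_feed(events: Iterable[Event], friends: Set[str],
                   since: int) -> List[str]:
    """Collapse recent friend actions of the same type into one story each."""
    actors_by_action: Dict[str, List[str]] = defaultdict(list)
    for actor, action, ts in events:
        # Keep only friends' events inside the time window, each actor once.
        if actor in friends and ts >= since and actor not in actors_by_action[action]:
            actors_by_action[action].append(actor)
    stories = []
    for action, actors in sorted(actors_by_action.items()):
        if len(actors) == 1:
            stories.append(f"{actors[0]} {action}")
        else:
            stories.append(f"{len(actors)} people from your network {action}")
    return stories
```

Grouping only on the action string is the crudest possible notion of "similar"; a real feed would presumably also decide which groups are worth showing and how to rank them.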

aschobel · on Oct 25, 2008

My understanding is that each module (Photos, Notes, etc.) has its own feed, and these get aggregated to create the main News Feed.

If each module is responsible for producing its own feed, it seems like this problem isn't too difficult to solve.
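If each module already produces its own feed sorted newest-first, building the main feed can be as simple as a k-way merge. A minimal sketch, assuming a hypothetical `(timestamp, module, text)` item format; the merge strategy is an illustration, not Facebook's documented design. It relies on `heapq.merge` from the Python standard library, whose `reverse=True` option expects each input to be sorted in descending order.

```python
import heapq
from itertools import islice
from typing import Dict, List, Tuple

# Hypothetical feed item: (unix_timestamp, module, text). Each module's
# list is assumed to be sorted newest-first already.
Item = Tuple[int, str, str]


def main_feed(module_feeds: Dict[str, List[Item]], limit: int) -> List[Item]:
    """Merge per-module feeds into one newest-first feed of at most `limit` items."""
    merged = heapq.merge(*module_feeds.values(),
                         key=lambda item: item[0], reverse=True)
    # The merge is lazy, so only the first `limit` items are ever compared.
    return list(islice(merged, limit))
```

Because the merge is lazy, each module only has to supply its head items until the page is full.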

Couldn't you just cast this as a search engine problem?

You create a term vector with your social graph, and the corpus you are "searching" is people who updated their profile photo in the last 24 hours. You could compute the inverted index offline once an hour.

http://en.wikipedia.org/wiki/Inverted_index
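A rough sketch of the search-engine framing above: an offline job builds an inverted index from each action type (the "term") to the users who did it in the last 24 hours, and at request time each posting list is intersected with the viewer's friend set. This simplifies the term-vector idea to plain boolean retrieval (set intersection, no scoring), and `build_index`, `query` and the `(user, action, timestamp)` record are hypothetical names.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

# Hypothetical activity record: (user, action, unix_timestamp).
Activity = Tuple[str, str, int]


def build_index(activities: Iterable[Activity], now: int,
                window: int = 24 * 3600) -> Dict[str, Set[str]]:
    """Offline step (e.g. hourly): map each action to users who did it recently."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for user, action, ts in activities:
        if now - window <= ts <= now:
            index[action].add(user)
    return dict(index)


def query(index: Dict[str, Set[str]], friends: Set[str]) -> Dict[str, Set[str]]:
    """Online step: intersect each posting list with the viewer's friend set."""
    hits = {action: users & friends for action, users in index.items()}
    # Drop actions none of the viewer's friends performed.
    return {action: users for action, users in hits.items() if users}
```

The size of each result set then gives the "X people from your network" count directly.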

hendler · on Oct 24, 2008

Looks useful. I'd like to understand more... but not in the same way as the only response I saw to the post announcing Scribe's release. I love Diane's tender response: "Do you have basic information on how to utilize facebook ?? I'm old and don't know all the different things I can do on it. Do you have some sort of instruction or expanation of all the different functions and wat they are for and how they work ?? Thank you for your paitence and help, Diane Cheek "

dhbradshaw · on Oct 24, 2008

"We were collecting a few billion messages a day (which seemed like a lot at the time)"