

Facebook's Scribe technology now open source - qhoxie
http://www.facebook.com/notes.php?id=9445547199#/note.php?note_id=32008268919&id=9445547199&index=0

======
qhoxie
Overview:

\- Scribe aggregates log messages from every type of server application at
Facebook.

\- Its interface is simple: each log call takes two string arguments, a
category and a message.

\- There are no predefined categories; a new log is created whenever a
message arrives with a new category.

\- Servers are arranged as a directed graph; each server has one outbound
edge, i.e., it only knows about the next server downstream.

\- To compensate for network and hardware failures, messages are spooled to
disk and forwarded once things are working again.

\- It uses Thrift to accept log data from applications written in many
different languages.

Hosted on SourceForge: <http://sourceforge.net/projects/scribeserver/>
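To make the overview concrete, here is a minimal in-process sketch of the store-and-forward idea. It is not Scribe's code and does not use its Thrift interface: the `LogNode` class, its `log(category, message)` and `flush_spool()` methods, the `up` flag used to simulate failures, and the JSON-lines spool file are all hypothetical. It only illustrates three points from the list: categories created on demand, one outbound edge per server, and spooling to disk while the next hop is down.

```python
import json
import os
import tempfile
from collections import defaultdict
from typing import Optional


class LogNode:
    """Hypothetical store-and-forward log server; an illustration, not Scribe's API."""

    def __init__(self, name: str, downstream: Optional["LogNode"] = None,
                 spool_dir: Optional[str] = None) -> None:
        self.name = name
        self.downstream = downstream   # single outbound edge; None means final sink
        self.up = True                 # flip to False to simulate a failure
        self.logs = defaultdict(list)  # category -> messages; categories are never predefined
        directory = spool_dir or tempfile.mkdtemp()
        self.spool_path = os.path.join(directory, f"{name}.spool")

    def log(self, category: str, message: str) -> bool:
        """Accept one (category, message) pair; return False only if this node is down."""
        if not self.up:
            return False
        if self.downstream is None:
            # Final sink: an unseen category simply starts a new log.
            self.logs[category].append(message)
            return True
        if not self.downstream.log(category, message):
            # Next hop is unavailable: spool to local disk instead of dropping.
            with open(self.spool_path, "a", encoding="utf-8") as f:
                f.write(json.dumps([category, message]) + "\n")
        return True

    def flush_spool(self) -> int:
        """Resend spooled messages, keeping any that still fail. Returns the number sent."""
        if self.downstream is None or not os.path.exists(self.spool_path):
            return 0
        with open(self.spool_path, encoding="utf-8") as f:
            pending = [json.loads(line) for line in f]
        remaining = [entry for entry in pending if not self.downstream.log(*entry)]
        if remaining:
            with open(self.spool_path, "w", encoding="utf-8") as f:
                f.writelines(json.dumps(entry) + "\n" for entry in remaining)
        else:
            os.remove(self.spool_path)
        return len(pending) - len(remaining)


if __name__ == "__main__":
    central = LogNode("central")
    web = LogNode("web1", downstream=central)
    web.log("search", "query=scribe")   # forwarded immediately
    central.up = False                  # simulate a network/hardware failure
    web.log("search", "query=thrift")   # spooled to web1's disk
    central.up = True
    print(web.flush_spool())            # 1
    print(dict(central.logs))           # {'search': ['query=scribe', 'query=thrift']}
```

One design caveat of this sketch: messages that arrive while a spool is pending can overtake the spooled ones, so a real system would have to decide whether per-category ordering matters.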

------
amix
Facebook are pretty bad at driving open-source projects. For example, their
Cassandra project has had no code commits in the last several months
(<http://code.google.com/p/the-cassandra-project/source/list>).

While this code and idea sharing is good, it seems like they mostly use it for
publicity stunts (i.e., release something as open source, generate press, and
keep developing it internally).

~~~
mtw
Maybe Cassandra was mature and stable when they released it?

~~~
amix
It's unlikely that such a complex system has no bugs left to fix and no room
for improvement.
Documentation for the project is also _very_ sparse.

------
wayne
Does anyone know the advantages/disadvantages of Scribe vs. Syslog
(<http://en.wikipedia.org/wiki/Syslog>)? Is it just that Syslog dies at scale?

------
aschobel
I remember hearing about Scribe last year in a Facebook Thrift lecture and am
really excited that they finally got it open sourced.

This definitely solves some design problems we have been having with news
feeds.

Facebook's Thrift: Scalable Cross-Language Development
<http://blip.tv/file/446822>

Thank you Facebook!

~~~
mtw
This is just the first part of the equation. I've built news feeds, and one of
the most difficult things to do is aggregate similar actions from the network
(such as "X people from your network changed their profile photo"). I've got
something working, but would love to know how they implemented this.

~~~
aschobel
My understanding is that each module (Photos, Notes, etc.) has its own feed,
and these feeds are aggregated to create the main News Feed.

If each module is responsible for producing its own feed, it seems like this
problem isn't too difficult to solve.
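If each module already emits its own time-ordered feed, building the main feed reduces to a k-way merge. A minimal sketch, assuming each module feed is a list sorted newest-first; the `(timestamp, module, text)` tuple shape, the module names, and `merge_feeds` are hypothetical, and this is not a claim about how Facebook actually built News Feed.

```python
import heapq
from itertools import islice
from typing import List, Sequence, Tuple

# Hypothetical feed item: (unix_timestamp, module, text)
Item = Tuple[int, str, str]


def merge_feeds(module_feeds: Sequence[List[Item]], limit: int) -> List[Item]:
    """Lazily k-way merge per-module feeds, each already sorted newest-first."""
    merged = heapq.merge(*module_feeds, key=lambda item: item[0], reverse=True)
    return list(islice(merged, limit))  # stop after the top `limit` items


if __name__ == "__main__":
    photos = [(1700, "photos", "Ann added 3 photos"),
              (1200, "photos", "Bob changed his profile photo")]
    notes = [(1500, "notes", "Cat wrote a note")]
    for ts, module, text in merge_feeds([photos, notes], limit=3):
        print(ts, module, text)
```

Because `heapq.merge` is lazy, only about `limit` items are pulled from the module feeds, which matters when each module's feed is long.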

Couldn't you just cast this as a search engine problem?

You create a term vector with your social graph, and the corpus you are
"searching" is people who updated their profile photo in the last 24 hours.
You could compute the inverted index offline once an hour.

<http://en.wikipedia.org/wiki/Inverted_index>
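A rough sketch of the search-engine framing, under stated assumptions: the event list, user IDs, 24-hour window, and the hypothetical helpers `build_index`/`aggregate_for` are illustrative, and a plain set intersection stands in for full term-vector matching. The offline step builds an inverted index from each action to the users who performed it recently; the online step uses the viewer's friend set as the "query" and counts the hits per action.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple

DAY = 24 * 60 * 60

# Hypothetical event shape: (user_id, action, unix_timestamp)
Event = Tuple[str, str, int]


def build_index(events: Iterable[Event], now: int,
                window: int = DAY) -> Dict[str, Set[str]]:
    """Offline step (e.g. rebuilt hourly): action -> users who did it within the window."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for user, action, ts in events:
        if now - ts <= window:
            index[action].add(user)
    return dict(index)


def aggregate_for(friends: Set[str], index: Dict[str, Set[str]]) -> List[str]:
    """Online step: intersect each posting set with the viewer's friend set."""
    lines = []
    for action, users in sorted(index.items()):
        hits = users & friends
        if hits:
            noun = "person" if len(hits) == 1 else "people"
            lines.append(f"{len(hits)} {noun} from your network {action}")
    return lines


if __name__ == "__main__":
    now = 1_000_000
    events = [
        ("ann", "changed their profile photo", now - 3600),
        ("bob", "changed their profile photo", now - 7200),
        ("cat", "changed their profile photo", now - 2 * DAY),  # outside the window
        ("dan", "wrote a note", now - 60),
    ]
    index = build_index(events, now)
    print(aggregate_for({"ann", "bob", "cat"}, index))
    # ['2 people from your network changed their profile photo']
```

The expensive part (scanning a day of events) runs offline, so each page view only pays for a few set intersections against small posting sets.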

------
hendler
Looks useful - I'd like to understand more... But not in the same way as the
only response I saw to the post announcing Scribe's release. I love tender
Diane's response: "Do you have basic information on how to utilize facebook ??
I'm old and don't know all the different things I can do on it. Do you have
some sort of instruction or expanation of all the different functions and wat
they are for and how they work ?? Thank you for your paitence and help, Diane
Cheek "

------
dhbradshaw
"We were collecting a few billion messages a day (which seemed like a lot at
the time)"

