
Facebook PHP Source Code from August 2007 - scapbi
https://gist.github.com/nikcub/3833406
======
nilsjuenemann
I remember these days. At the time I worked for studiVZ - a German social
network (it still exists, but nobody uses it anymore) and this leak was
caused by a misconfiguration in their Apache setup.

In 2008 studiVZ was sued by Facebook. They said we stole their PHP, CSS/JS
"code". ([http://techcrunch.com/2008/07/18/facebook-sues-german-
social...](http://techcrunch.com/2008/07/18/facebook-sues-german-social-
network-studivz/)).

Indeed, the first version of studiVZ was inspired by Facebook. Our founder
first saw FB in 2005 in the US, but it wasn't possible to use the service
without a .edu email account. That led to the decision to build a local
clone.

But the real story behind the lawsuit is a bit longer. In December 2006
Facebook tried to acquire studiVZ for the first time.

This picture with two of the studiVZ founders and some people from FB's
management team is from this time:

[http://img-a4.pe.imagevz.net/photo3/f3/9a/ce37499d84437cb744...](http://img-a4.pe.imagevz.net/photo3/f3/9a/ce37499d84437cb744c1dbce2a78/1-cfd8a99fbccfa5cc.jpg)

But FB didn't have enough cash to acquire the company. The Holtzbrinck
Publishing Group later acquired studiVZ for €85 million.

In 2008 (before the lawsuit) FB tried a second time to acquire studiVZ -
this time for 4.8% of shares! But this deal didn't happen, unfortunately.

~~~
janulrich
Just curious, what social networks are popular in Germany now? And what led to
the declining usage of studiVZ?

~~~
nilsjuenemann
I think the most popular social network is Facebook. But as others said,
people are looking for small services to solve a small problem.

studiVZ wasn't cool anymore. We missed the moment to rebrand and open the
platform to other people. "studi" is an abbreviation for student. Our answer
was a new brand called meinVZ (just another UI, same database). Combined with
the ability to ship cool new features (e.g. Apps, Activity stream, i18n), it
was just a question of time until FB took the market lead.

------
m0th87
Does anyone else remember the friend graph visualization feature Facebook used
to have? It was an oddity, both because it seemed like a strange but useful
feature and because it was a Perl script.

One time I clicked it and got the source code to the file instead of the
graph. It's somewhere on one of my hard drives, but it seems wrong to leak,
especially since it has database credentials hard-coded into it.

~~~
minifig
managed to save a copy of several files that appeared at the time - home.php,
album, group, friends, photo, profile, readmessage ... code was an interesting
read, as were the comments in it:

    // You fucking link h4x0rs just got pwned
    // Shortcut out of this CRAZY expensive garbonzle
    // What did we come up with?
    // don't display the Tuna album (aid=-1)
    // is this group a dummy meant to populate newsfeeds?
    // do not list the creator as harvard
    // keep track of tuna contexts
    // Merman's Admin profile always links to the Merman's profile

    // NOTE: ok, at this point we know we are going to display the full
    // page, so it is time to do a PHATTY PHATTY MULTIGET of all the shit that
    // we are going to need to make this page, or at least the most common things

    // Clear fire if desired

~~~
micheljansen
Can anyone explain the strange obsession with tuna? Even in this index.php:

    
    
        // make sure big tunas haven't moved around

~~~
vinhboy
I am going to guess one of their test environments has an album with pictures
of tunas.

------
javis
When I see code like this, I'm always amazed that I can actually read it and
(kinda) understand what it's doing. I always expect code like Facebook's to be
so finely tuned and advanced that it'd be completely unintelligible to anyone
outside the company who isn't an expert in the language.

~~~
Kiro
In my experience code becomes unintelligible not because it's finely tuned and
advanced but because it's messy and rushed.

~~~
moconnor
Or over-generalized with the logic split up and hidden in the interactions
between a dozen (sub)classes.

~~~
oneeyedpigeon
Or under-generalised with the logic duplicated in a dozen places, ever-so-
slightly differently

------
a1a
I think there is a valuable lesson to be learned from this piece of spaghetti.
I can't quite formulate it off the top of my head. But it's something like:
if you wanna be rich, don't waste your time being pedantic - your users
couldn't care less.

~~~
sergiotapia
My mantra:

Shipped code > Well architected incomplete features.

Your user does not care in the slightest if you're using a design pattern, or
if you are using dependency injection, or if there is 100% code coverage. Just
make it work! Then make it faster! Then make it more readable! In that order.

~~~
kaolinite
You might ship faster but this can easily lead to poorly written, hard to
maintain and insecure spaghetti code. In fact, rushing to ship / meet
deadlines is probably responsible for most of the vulnerabilities in software.

~~~
g8oz
Ship too late and none of it will matter.

~~~
kaolinite
I guess shipping is more important to you than the possibility of losing user
details (or worse). Christ, I hope I never give my details to a company you
found.

Shipping quickly is important but it's also important to write quality code.
Small bugs that can easily be fixed are fine but security problems or bugs
related to payments, for example, are not.

~~~
lmm
How many companies have failed because of security flaws in their code?

~~~
objclxt
Companies don't normally _fail_ because of security flaws, in the same way
Boeing doesn't go bust when it has to ground 787s. But in both cases you end
up potentially taking a _huge_ hit in costs. Off the top of my head, Sony had
to write down $170 million in costs when PSN was compromised, and TJ Maxx
ended up paying out $800 million in costs, damages, and compensation after
their payment terminals leaked credit card details.

These are not figures you want to see on your bottom line.

------
whalesalad
The code looks pretty clean. I dig the two-tab spacing as well, but perhaps
that was done after the fact.

Anyhow, not sure if this makes any difference or not, but I'm curious as to
why true PHP constants are not used, and instead regular variables in all caps
like $PARAM_INT are used. Anyone know why this might be?

I ask because one of you PHP gurus might inform me that there are certain use
cases where a true constant is not wise.

An example below:

    
    
      param_get_slashed(array(
        'feeduser' => $PARAM_INT, //debug: gets feed for user here
        'err' => $PARAM_STRING, // returning from a failed entry on an orientation form
        'error' => $PARAM_STRING, // an error can also be here because the profile photo upload code is crazy
        'ret' => $PARAM_INT, 'success' => $PARAM_INT, // successful profile picture upload
        'jn' => $PARAM_INT, // joined a network for orientation
        'np' => $PARAM_INT, // network pending (for work/address network)
        'me' => $PARAM_STRING, // mobile error
        'mr' => $PARAM_EXISTS, // force mobile reg view
        'mobile' => $PARAM_EXISTS, // mobile confirmation code sent
        'jif' => $PARAM_EXISTS, // just imported friends
        'ied' => $PARAM_STRING, // import email domain
        'o' => $PARAM_EXISTS, // first time orientation, passed on confirm
        'verified' => $PARAM_EXISTS)); // verified mobile phone

~~~
dools
They might be simple input validation filters created using the
create_function() function which allowed you to create anonymous functions of
sorts in PHP prior to 5.3. You can't assign an anonymous function to a
constant. An alternative would have been to include a big switch statement in
the param_get_slashed() function but since the same validators are used in a
bunch of places it seems cleaner to use anonymous functions and have each
function that uses them loop through and call each function passing the
parameter named in the array key in as an argument
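
A minimal sketch of that guess, assuming the $PARAM_* values really are callables. The variable names come from the quoted code, but their definitions here and the helper param_get_sketch() are hypothetical and not taken from the leaked source. It uses closures (PHP 5.3+) rather than create_function(), which was removed in PHP 8.0.

```php
<?php
// Hypothetical validators; the leaked code does not show how $PARAM_* are defined.
$PARAM_INT = function ($value) { return (int) $value; };
$PARAM_STRING = function ($value) { return (string) $value; };
// Presence flag: receives null when the key is missing from the input.
$PARAM_EXISTS = function ($value) { return $value !== null; };

// Hypothetical stand-in for param_get_slashed(): loops over the spec and
// calls each validator with the input value named by the array key.
function param_get_sketch(array $spec, array $input)
{
    $clean = array();
    foreach ($spec as $name => $validator) {
        $raw = array_key_exists($name, $input) ? $input[$name] : null;
        $clean[$name] = $validator($raw);
    }
    return $clean;
}

// Illustrative input standing in for $_GET.
$params = param_get_sketch(
    array('feeduser' => $PARAM_INT, 'err' => $PARAM_STRING, 'mr' => $PARAM_EXISTS),
    array('feeduser' => '42abc', 'mr' => '')
);
```

Here 'feeduser' is cast to an integer, the missing 'err' becomes an empty string, and 'mr' is true simply because the key is present.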

------
jwdunne
Hehe, these lines made me chuckle.

    
    
      // Holy shit, is this the cleanest fucking frontend file you've ever seen?!
      ubersearch($_GET, $embedded = false, $template = true);
    

In all seriousness though, I wonder how much of this was written by Mark
Zuckerberg?

~~~
jonknee
With the header "@author Mark Slee" on that file, I doubt much if any.

------
triton
This is quite neat compared to the average shitfest people leave inside
WordPress templates. I had to unpick a whole 200 page intranet written with
WordPress and BuddyPress (which stinks) recently and rewrite it as a
non-WordPress site as it's hit the inevitable hack brick wall.

It's now an ASP.NET MVC app on Azure. There's hardly any code in it now and
the page response times are down at 50ms rather than 1500ms! One query per
page vs 400 as well.

~~~
FireBeyond
Exaggeration is the mother of ... well, something.

An entire intranet functionality with one SQL query per page. Right, okay...

------
Kiro
I don't think it looks that bad. It's clean and easy to understand.

------
mkhalil
Anyone who calls this bad/spaghetti code hasn't seen bad code.

------
fingerprinter
The comments on GitHub are akin to 'my eyes! they bleed', but tbh, this is
actually quite readable and I imagine wouldn't be all that hard to maintain.

I haven't touched PHP since about 2007 and it seems like it would be pretty
easy to jump in and start making changes and edits. It is well templated,
seems to use variables well and not have too many (if any, didn't look all
that closely) hard-coded values. I've seen much, much worse in "cleaner"
languages than this.

As a side note, I learned a lesson in my early 20s that has served me well to
this day.

Shipped code always wins.

I was a huge proponent (and still am) of good code, well architected blah blah
blah. But, in the end, if your code never sees the light of day, it doesn't
matter. I saw this when I took over a team responsible for care and feeding of
a product originally written by the founder of a company. This company had
just secured a series B on the original code (still). The product was ugly,
the code was terrible, it was slow, it wouldn't scale, couldn't be configured
and really couldn't be maintained. However, it got the company moving, got a
series A & B, secured the first customer, and a second. It basically created a
runway to build the thing right that wouldn't have been available had it not
gone live.

------
joshaidan
I like the variable naming in the code. It's good that they didn't use one/two
letter variables that span more than five lines of code. Short, meaningless
variable names, used for things other than index values, are one of my big pet
peeves. I had to use somebody else's code as the base for a project and it was
full of two letter variables that didn't mean anything. Gave me a big headache
working on it.

~~~
tannerc
Seriously, some of these variables are named so elegantly it's actually a
little surprising.

Beginning developers could learn a thing or two looking over code like this
(even though it's now clearly outdated and likely defunct). I've been
programming for half a decade and my naming conventions are still horrid.

------
conradfr
I don't remember from the site at the time what "monetization_box.php" would
have been doing.

Also, my style of comments seems so tame and lame now :)

------
alecsmart1
This looks weird in search.php (line 89):

    
    
      if($user 0 && !is_unregistered($user)) { return $user; }

~~~
RossM
This is either a syntax error or a HipHop idiosyncrasy (was HHVM in use in
2007?)

    $ php -r '$user = null; if ($user 0) { echo $user; }'
    PHP Parse error: syntax error, unexpected '0' (T_LNUMBER) in Command line code on line 1

~~~
vezzy-fnord
I don't think even HPHPc was around in 2007. Development didn't begin until
2008, IIRC.

------
fela
line 72: // Holy shit, is this the cleanest fucking frontend file you've ever
seen?!

~~~
mproud
My favorite line, too.

------
jmadsen
There seems to be, here & on the "interverse", a battle on whether it will be
more hip to call this spaghetti code or to call it clean code. I don't
see very much debate on whether or not it is GOOD code.

It isn't good code. It is cleanly written, especially for the time. But all
the noise about how bad it is isn't "hipster" talk - this is 400+ lines of
completely unmaintainable code; which is why it isn't part of the current code
base.

As soon as they got some money & started trying to make it do more, they wised
up and dumped it

------
lucasnemeth
People just call this spaghetti for superficial reasons: they don't see
objects and they don't see current PHP practices used in 2013. But it is
readable and maintainable code. We're out of context, and there is nothing
there to assume lack of quality. Some people mistake code for literature;
objectively speaking, this is not bad code.

------
elwell
From the linked article: "It seems that the cause was apache and mod_php
sending back un-interpreted source code as opposed to output, due to either a
server misconfiguration or high load (this is a known issue)."

Does anyone know what he is referring to when he says this can happen via high
load?

~~~
smsm42
It is a myth. High load has nothing to do with it. However, if you configure
Apache wrongly, it will serve .php files as text. The only connection to load
is that if you have one broken server among N proper ones, the number of times
the broken one is hit depends on load, i.e. under low load the setup may be
such that it is never hit at all.

I guess the story about Apache serving PHP source under high load came from
the idea that Facebook is a high-load site (true) and they couldn't just have
made such an obvious screwup as misconfiguring a production server (false), so
it must be an Apache/PHP bug that happens only on high-load sites (false).

~~~
nikcub
this was a real issue, here is the bug:

[https://bugs.php.net/bug.php?id=26810](https://bugs.php.net/bug.php?id=26810)

there were many related bugs as well. it didn't get fixed for a while, and
when it was fixed it was over a number of revisions and wasn't tagged.

------
lifeformed
How did someone "expose the PHP source code"? Did they actually find a way to
make the code show up client-side, or was it just someone who managed to get
access to the backend stuff? The way it's worded makes it sound like the
former, but that seems unlikely...

~~~
0x0
It can happen if you upgrade Apache and/or mod_php, and for some reason
mod_php fails to register for .php files without that being a fatal startup
error. Then Apache will serve the .php files as plaintext.

It's one of the things that make one-off .php scripts easy to deploy (just
chuck it into the webroot). With a little effort, this can be mostly
prevented: let the webroot .php file just do an
' include("../outside-webroot/actual-script.php"); '
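
A runnable sketch of that stub pattern, built in a temporary directory so it can be tried without a web server. The directory names public/ and private/ and the file contents are hypothetical; the `require` at the end only simulates a request hitting the stub.

```php
<?php
// Hypothetical layout, created under the system temp dir:
//   <base>/public/index.php           inside the webroot, contains no logic
//   <base>/private/actual-script.php  outside the webroot, the real code
$base = sys_get_temp_dir() . '/webroot-demo-' . uniqid();
mkdir($base . '/public', 0777, true);
mkdir($base . '/private', 0777, true);

// The real logic lives where the web server never serves files from.
file_put_contents(
    $base . '/private/actual-script.php',
    "<?php\necho 'hello from outside the webroot';\n"
);

// The stub is the only PHP file a misconfigured server could serve as text.
file_put_contents(
    $base . '/public/index.php',
    "<?php\nrequire __DIR__ . '/../private/actual-script.php';\n"
);

// Simulate a request to the stub and capture what it prints.
ob_start();
require $base . '/public/index.php';
$output = ob_get_clean();

// What would leak if the stub were served uninterpreted: just the require line.
$leakable = file_get_contents($base . '/public/index.php');
```

If the handler breaks, a visitor sees only the one-line stub, although that still reveals the path of the real script.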

------
ErikAugust
Pre-PHP 5.3 -> Right off the bat: no namespace support, no closures, no
surprises there

I am a bit surprised about the lack of OO though...

------
EGreg
This is why I was saying evented webserver code is better.

Look at how much I/O is happening one after the other. The latency could be
greatly reduced by doing things in parallel and waiting until all the promises
resolve.
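
As a rough illustration of the arithmetic, assuming three independent backend calls: run one after another, the page waits for the sum of the latencies; started together, it waits only for the slowest. The call names and numbers below are made up, and the sketch performs no real I/O.

```php
<?php
// Hypothetical per-call latencies in milliseconds (not measured values).
$latencies_ms = array('profile' => 30, 'friends' => 45, 'newsfeed' => 80);

// Sequential: each call starts only after the previous one finishes.
$sequential_ms = array_sum($latencies_ms);

// Concurrent: all calls start together; the page waits for the slowest one.
$concurrent_ms = max($latencies_ms);

printf("sequential: %d ms, concurrent: %d ms\n", $sequential_ms, $concurrent_ms);
```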

------
elwell
index.php line 130: foreach($upcoming_events as $event_id = > $data) {

Space between "=>"...

------
rohanpai
A startup with spaghetti code of this stature will NEVER be successful.

------
pearjuice
I really hope they cleaned that mess up or I feel very sorry for all the
developers at Facebook.

>ini_set('memory_limit', '100M'); // to be safe we are increasing the memory
limit for search

>tpl_set('simple_orientation_first_login', $get_o); // unused right now

>// We special case the network not recognized error here, because
affil_retval_msg is retarded.

>all those undocumented(?) random error codes

>mix between ternary operators and regular if/else statements with no logical
choice between one or the other

>no auto loading whatsoever

Seeing as this is from 2007, there is hope.

~~~
Kiro
What's wrong with increasing the memory limit?

~~~
iand
100MB per request is going to seriously limit scalability (or inflate
hardware costs at least)

~~~
Kiro
It doesn't say it's 100MB per request, just that it's raised for safety in the
rare case where it's actually needed. Would you prefer serving a 500 to that
user instead?
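
A small sketch of that distinction: ini_set('memory_limit', ...) changes the ceiling a script is allowed to reach, it does not allocate that memory. Variable names are illustrative.

```php
<?php
// Memory in use before touching the limit.
$used_before = memory_get_usage();

// Set the ceiling, as the quoted search code does.
ini_set('memory_limit', '100M');

$used_after = memory_get_usage();
$limit = ini_get('memory_limit');

// The limit changed, but actual usage did not jump by anything like 100 MB.
$delta_bytes = $used_after - $used_before;
printf("limit=%s, usage delta=%d bytes\n", $limit, $delta_bytes);
```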

~~~
nwh
I can't think of a single reason a PHP script like this would need that much
memory to serve a single user. Everything big would be handled by the
database, what could 500MB possibly be used for?

~~~
robryan
They would be applying some search logic in memory in PHP. Depending on what
you are doing it does make sense to load in a heap of data from the database
then narrow it down in PHP.
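
A minimal sketch of that approach, with an in-memory array standing in for rows already fetched from the database. The row shape, the helper filter_by_name_prefix() and the matching rule are hypothetical, not taken from the leaked search code.

```php
<?php
// Stand-in for the result of a broad, cheap query.
$rows = array(
    array('id' => 1, 'name' => 'Alice Example'),
    array('id' => 2, 'name' => 'Bob Example'),
    array('id' => 3, 'name' => 'Alicia Sample'),
);

// Narrow the candidates in PHP instead of asking the database to do it.
function filter_by_name_prefix(array $rows, $prefix)
{
    $matches = array_filter($rows, function ($row) use ($prefix) {
        // Case-insensitive "starts with" check.
        return stripos($row['name'], $prefix) === 0;
    });
    // array_filter preserves keys; reindex so callers get a plain list.
    return array_values($matches);
}

$hits = filter_by_name_prefix($rows, 'ali');
$hit_ids = array_column($hits, 'id');
```

The trade-off is the one argued in this subthread: memory on each frontend grows with the size of the candidate set, while the database does less work.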

~~~
nwh
That sounds grossly ineffective.

Reminds me of some production PHP I saw once, returned the entire database
(`SELECT * from 'db'`) and manually walked through it with a for() loop. It
wasn't very efficient at scale.

~~~
encoderer
Yeah, you're right... It's way better to do expensive joins on the database
where horizontal scalability is just a fantasy than in the shared-nothing www
frontends that can be duplicated in clusters ad infinitum.

If you're building SomeSimpleBasicSite.com and you do this, you're probably a
bad programmer. If you're truly facing scalability problems, it's table
stakes.

Based on your comments in this thread you're out of your depth, man.

~~~
nwh
I still can't imagine an example where anybody would want to return 500MB of
results to PHP for processing.

~~~
AaronIG
I think Kiro meant 500, as in the HTTP status code, due to PHP running out of
memory, not 500MB.

