4chan source code leaked (2010) (pastebin.com)
126 points by NotUncivil on Apr 24, 2014 | 102 comments

Not much to see here, folks. Someone took an old leak of the source code and commented out a few lines for a giggle. Only moot and the developers know what the site looks like now, but given the significant addition of functionality in the past few years, it's pretty much impossible that that would be the only difference in the source.

The original leak, from 2010 at least, possibly older: http://pastebin.com/4JVjS02b

4chan was hacked the other day, so the current source code could have been leaked, but if it was, this sure isn't it.

Noted. I have added "(2010)" to the title.

A diff that compares the old leaked source to the "new" one: http://pastebin.com/KkeLzb6q.

Wouldn't the ideal solution for 4Chan be an external data/processing server that dials in to the hosting server to dump out static files? That way the location of the external server remains, at least partially, a mystery, even after the main box is hacked?

Why? 4chan is not a high-value target. There is no reason to over-engineer it. Mostly it will be prepubescent teenagers throwing a fit and bombarding the server with bandwidth.

The hack of earlier today was due to an obsession over a female 4chan moderator. That should say enough.

CloudFlare was hacked with the sole intention of taking over 4chan.org's domain. They're a huge target.

Okay, let's say they are a massive target. There is still a monetary issue.

4chan is a cultural and ideological landmark on the American internet. Not only are there clones, but "Cloning 4chan" is almost a business in and of itself. And they fail. 4chan's month-to-month profits are barely to not-at-all existent. In a purely dollars-and-cents way, 4chan is a failure.

So there is very little monetary motivation for discovering the secrets of 4chan's operation.

Security is a trade-off of Financial Risk vs Financial Investment. There is no Financial Risk in 4chan being hacked. They have no user accounts, they have no financial data. They have no overly complex-secret-sauce-search algorithm.

The only thing to 'steal' is a collection of Japanese/American Pop cultural referential gif, jpg, and webm files.

They have tens of thousands of "passes" (basically accounts) and the payment information associated with them.

There's nothing static about 4Chan. My guess is that the performance issues would be terrifying.

I mean,

Yes I hate PHP more than the next guy,

Yes this code is terrible,

But you know what? I can read it, and follow along. And that's actually more than can be said for other "beautiful" code that was obfuscated behind 3 or 4 unnecessary levels of abstraction or indirection.

  if ($sectrip != "") {
    $salt = "LOLLOLOLOLOLOLOLOLOLOLOLOLOLOLOL"; #this is ONLY used if the host doesn't have openssl
                                                #I don't know a better way to get random data

a few lines down:

  system("openssl rand 448 > '".SALTFILE."'",$err);
                if ($err === 0) {
                    $salt = file_get_contents(SALTFILE);

I saw that. "LOLLOLOLOLOLOLOLOLOLOLOLOLOLOLOL" was my reaction, too.
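
For what it's worth, newer PHP versions can produce random data without shelling out to openssl. A minimal sketch, assuming PHP 7 or later (which did not exist when this code was written); `make_salt` is a hypothetical helper name, not something from the leaked source, and 448 simply mirrors the byte count in the `openssl rand` call quoted above:

```php
<?php
// Hypothetical helper, not from the leaked source.
// random_bytes() (PHP 7+) reads from the operating system's CSPRNG
// and throws an exception if no secure source is available.
function make_salt(int $length = 448): string
{
    return random_bytes($length);
}

$salt = make_salt();
echo strlen($salt), " bytes of salt\n";
```

Because failure raises an exception instead of returning silently, there is no need for a hard-coded fallback string in this version.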

And yet, despite the horrible code, it's still powering an Alexa Top 500 page without any huge problems I've heard of.

Why would bad code hold them back? The technical debt of quick and dirty code lets you get things out faster now at the cost of development time later on (and uptime if you're very slapdash).

I don't blame people for buying into the TDD and other perfectionist bandwagons; until very recently the zeal around the topics meant that you couldn't question the fervent push for very narrow and specific types of software quality. I mean, people were saying "tests are documentation" and I had to nod my head and smile just so I wouldn't get trounced by folks who lacked the development experience to know why that wouldn't work, but had read a blog post saying it does.

4chan doesn't need good code. Of course it wouldn't hurt, but if the main point of good code is testability and organization for complicated projects... Well, 4chan just isn't a very complicated project. The simplicity is a design decision, from what I've read moot say.

Crazy. I knew PHP was bad, but this is just terrible.

I know hating on PHP is en vogue but you could probably write the same ugly code with another language too.

Yes, you could, but it's harder. For example, extract($_GET) does not look as bad as eval(request.GET), but it's almost the same thing.

Not possible. Other languages have features to prevent this.

I highly doubt you have anything else to add because I'm sure you're just another person jumping on the "hate php" bandwagon - but go on, entertain me.

Please elaborate.

Most languages don't have extract($_POST) and, hop, everything's overwritten... PHP has a lot of shit like this. Yeah, you don't have to use them, but they shouldn't be there in the first place, if PHP core devs cared about a sane API. PHP doesn't have a sane API. PHP core devs don't give a damn. That's why Facebook developed Hack and HHVM.

Sorry to anyone who down voted me. I didn't mean PHP is a bad language. It has done amazing things for the web. I just didn't like the look of the many snippets of HTML in the code which is more common in PHP than in other languages in web development.

What if I told you PHP does not require you to write ugly code? You can do the same in any other language.

Oh, I'm completely aware of that. I was just really talking about this document, didn't mean to spark the debate.

But it makes it incredibly easy to, which is the reason why people hate it. If I'm a good programmer, of course I'll write good code. If I have to fix someone's shit code, of course I'm going to be incredibly angry if it's only there because the language encourages it.

Of course, if the language makes it very easy to write bad code, even as a good programmer I could easily end up writing stuff that's incomprehensible etc.

I think this just goes to show that you can have a lot of popularity even if your code is just sorta glued together.

Don't they get a few million users? I'd say it's definitely nothing to scoff at.

It makes me wonder how many big profile websites might look like this or worse.

Having worked at a couple, I think I wouldn't be too far off to say all of them.

I still remember, a week into my first job, fresh-from-college me marching into the VP's office to tell him the source code was terrible and they were only still running due to luck. It was not well received (or right).

I almost did the same thing. But then calmed down and said maybe I have no idea what I'm talking about. I was right. I had no idea what I was talking about.

Users don't care what your code looks like. Early Facebook code was no better and look where they are now... it's about the product. WordPress is a piece of shit from an engineering perspective, yet it's the first blog engine in the world, because its features are not that bad.

Things are different today though: people tend to use native apps and users want realtime features, which are hard to do in pure PHP at scale. You often need 3rd-party tech, mostly Java-based...

>I think this just goes to show that you can have a lot of popularity even if your code is just sorta glued together.

As if OpenSSL didn't prove this already.

imgur might be a candidate.

I doubt it. At their scale (a million uploads daily, three billion monthly pageviews), crappy code is unlikely.

Ask HN: Would you rather have a beautiful source code with 1000 pageviews/month or an ugly source code with millions of pageviews/month?

>millions of pageviews/month

That is technically correct but does not convey the scale at which 4chan operates. According to http://www.4chan.org/advertise,

    Page impressions per month: 575,000,000
    Unique visitors per month: 25,000,000
    Posts per day: 1,000,000
    Alexa Traffic Rank: 836 (Global) & 371 (US)
    Quantcast Rank: 305 (US)
    Google PageRank: 6

Makes me wonder if WebM will increase or reduce 4chan's total traffic (when measured in bytes, not clicks).

I can't imagine WebM impacting 4chan anytime soon. It will probably reduce 4chan's load when (if) WebM takes off, but I highly doubt more than a small fraction will choose WebM over a .gif in the immediate future.

WebM has caused longer animated content to be posted that wouldn't have been so possible as gifs (the limit is at 2 minutes iirc), so there will still be people posting large files.

Ability to pay my bills

If the beautiful code with 1000 pageviews/month does that, good

If the ugly code does that, good as well. It may be harder to maintain (depending on the circumstances, some "beautiful" code is dreadful as well) which means less money in the bank

Code is usually beautiful until it meets reality with all the exceptions, corner cases, input sanitation, etc

If my business relies on it, option 1, because ugly code tends to be less maintainable and a business should not depend on magic numbers and "LOLOLO...".

If it is for personal satisfaction, option 2. I guess I don't need to explain this one.

This was not leaked recently; it was spread today, which caused people to believe it was looted during the 4chan hack earlier today. The 4chan administration has been awkwardly silent about the compromised 4chan website, but this isn't one of the reasons.

http://9ch.in/overscript/ http://9ch.in/overscript/files/yotsuba.txt

Yes, I actually added that leaked code to overscript in 2012; previously there was another leak in 2010.

It seems that it's too terrible to be the true code. "if($_COOKIE['4chan_auser']",

"extract($_POST); extract($_GET); extract($_COOKIE);"

This makes it even more likely that it's the real code. Let's face it, no one is expecting a shining example of software design and architectural brilliance here.

The original 4chan code was in Japanese and moot used Babelfish to try to figure out what did what. From what I remember, the original Futaba code is just as bad (http://www.2chan.net/script/). It's no secret that 4chan is cobbled together with glue and string; moot has said this several times before.

The comments are in Japanese but if you know PHP you should be able to understand it without the comments anyway. It's a single file of less than 1K lines and written in a rather straightforward style. It's also extremely easy to set up: just edit the config parameters at the top and drop it on a webserver, and it's ready to use. Not even a database is needed.

I don't think it's "bad"; it's damn simple and works well for what it is, and contains no unnecessary complexity. No dependencies on some other huge framework, multilayered overengineering, or excessive generality. The same can't be said of the many other clones of it that were attempting to be "better designed" rewrites.

As for "maintainability" or all that other software engineering stuff: this board script doesn't really need to be maintained, because it works, and if anything needs to be changed, it's so simple that the changes can be made easily. Along the same ideas as http://suckless.org/philosophy

Also, moot was a teenager at the time.

Yep. IIRC the code hasn't really been updated or redone since then.

4chan has grown to be quite obviously more sophisticated than Futaba, which really has stayed pretty much the same for over a decade. You can see for yourself:


(that's their board for cat pictures, the most SFW one I could find, but the banner ads are probably still NSFW, so keep that in mind)

It's kind of surprising though. I wonder what the results of a 4chan re-write by 4chan users would look like. Or even splitting out some functions for re-writing.

Well, others have written their own scripts - KusabaX springs to mind. There's another one called TinyIB or something as well.

http://9ch.in/overscript/ is a fairly exhaustive list

I've played with Kusaba X... it's exactly the kind of software that people who hate PHP, hate PHP for.

I think you're thinking of tinyboard: https://github.com/savetheinternet/Tinyboard

Well I would not expect this big pile of shit, either.

What's wrong with the if cookie check? Don't the following conditionals for the mode make sure they still cannot post, which is the point? And the rest of the code is still run so any other security checks are still performed.

extract is one of those moronic things though that only exists to create security holes and other bugs.


OMG, it's horrible :D

Why there isn't a flashing red warning label at the very top of that page, I have no idea. This is one instance where I would actually sanction using the <blink> tag.

To be fair, there is a big red warning box further down that tells you using it with user-supplied data is a very bad idea, and there are flags available to prevent overwriting existing symbols.
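
To make the flags concrete, here is a small sketch of the two documented options that avoid clobbering existing variables, `EXTR_SKIP` and `EXTR_PREFIX_ALL`. The function names, the `$input` array and the `in` prefix are made up for illustration and have nothing to do with the leaked source:

```php
<?php
// Hypothetical example; $input stands in for user-supplied data.
$input = ['is_admin' => true, 'comment' => 'hello'];

function with_skip(array $input): bool
{
    $is_admin = false;
    // EXTR_SKIP: keys that collide with an existing variable are ignored.
    extract($input, EXTR_SKIP);
    return $is_admin;
}

function with_prefix(array $input): array
{
    $is_admin = false;
    // EXTR_PREFIX_ALL: every key is imported as "in_<key>",
    // so the local $is_admin cannot be replaced.
    extract($input, EXTR_PREFIX_ALL, 'in');
    return [$is_admin, $in_is_admin, $in_comment];
}

var_dump(with_skip($input));
var_dump(with_prefix($input));
```

Note that `EXTR_SKIP` only protects variables that already exist at the time of the call; anything defined later can still be pre-seeded by the input.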

None of which the 4chan code actually uses.

I've never seen the utility of extract -- it's the recommended way of getting wordpress plugin parameters, but to me, just using whatever array you're extracting from is always a better solution.
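
A rough sketch of what "just use the array" could look like in practice. `str_param` is a hypothetical helper, and the `name`/`comment` keys are invented for the example:

```php
<?php
// Hypothetical helper: read one named string parameter from a request
// array, falling back to a default when it is missing or not a string.
function str_param(array $source, string $key, string $default = ''): string
{
    $value = $source[$key] ?? $default;
    return is_string($value) ? $value : $default;
}

// $request stands in for $_POST here.
$request = ['name' => 'anon', 'comment' => 'hello'];

$name    = str_param($request, 'name', 'Anonymous');
$comment = str_param($request, 'comment');
echo $name, ': ', $comment, "\n";
```

Only the keys that are asked for by name ever become variables, so unexpected fields in the request have no effect.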

Thanks everyone for your insightful comments: now I've learned that the leaked code is actually from a past leak when 4chan was in a more primitive state, but you're right that it is truly ingenious even then.

> It seems that it's too terrible to be the true code.

1. It's written in PHP. Finding a good PHP developer is nigh impossible (there are exceptions, like always). 2. I expected worse, to be honest.

In my experience great PHP developers tend to find a way out from developing anything in PHP.

Luckily, JavaScript is always allowed in PHP projects and it can do a lot more today than it could a decade ago. Also, using it will often lead to having NodeJS on the server even if it's just for compiling assets initially.

And NodeJS is bad ass rock star tech: http://www.youtube.com/watch?v=bzkRVzciAZg

No serious, modern PHP developer writes code like this. If it were a code sample for any respectable PHP job, it would be a massive "do not hire" flag.

No True Scotsman. I've seen a few commercial PHP codebases running worse stuff. You can say they aren't "serious" or "modern", yet they have people doing this full time shipping commercial appliances and services.

I think you live in Lala land. I too hope this, but it's wishful thinking; by far most I encounter in the wild write like this and worse (this actually works for instance).

Confirmation bias. 99% of PHP developers out there are in fact absolute shit, and they're happy with it because they're developing "websites" instead of "applications."

Right tool for the right job. You can use qualifiers like "serious" and "modern" but you're deluding yourself if you think they mean anything when the pool of PHP developers is so staggeringly high.

I'm under no delusion. Admittedly this is the wrong place to be debating anything PHP, but I wouldn't suggest that the average skill level of "everybody who writes PHP" is anything better than incompetent. The people I'm sitting next to now and have worked with in the past are as real as I am - the 1% you recognise are the serious and modern PHP developers. We exist and we're the pool you hire from.

Of course, right tool for the right job. PHP has specific use cases but that's another discussion entirely.

That's just not true. In my experience, most PHP devs don't use a framework or even Composer. When the forefront of your language is WordPress, it doesn't help spread best practices. Agreed, you can write crap in every language, you can write JSP pages, but it's unlikely you'll get a Java job if you don't know OO or Spring.

Come on now, I hate PHP more than anyone else, but seriously, WordPress is the best that PHP has to offer. It doesn't suck. It has quite a nice API; it's the plugins around WordPress that have quality similar to this 4chan source code.

I've also seen much worse in the Java world, with JSPs and all kinds of taglibs and action handlers mashed up to create a soup that would make you crave PHP.


People here love to poke fun at WordPress. I'd love to take any of them out of startup-land and make them work for a full week on a 10-year-old corporate piece of software, written in pretty much any language.

> (...) serious, modern PHP developer (...)

A "serious PHP developer" is an oxymoron.

Huh, it isn't hard to find a good PHP developer at all.. (easy to find someone who can code better than that example)

>Finding a good PHP developer is nigh impossible

Yep. Worked as a sysadmin in a company that had a product in PHP before. That was not fun. The bug count grew with each release in my time there.

Ha - I noticed that as well, wowza!

I thought the *chan code was open.

Or is this some critical bit? (I noticed it handles cookies, but I'm too inexperienced with web, PHP or web security to explore this wall of code)

4chan's code (Yotsuba) has always been a closed source fork of Futaba, though there are several open source Futaba clones, like Kusaba X.

I wonder what is the site's infrastructure_cost/ad_revenue ratio, because I have long had a feeling that it could be greatly improved. Moot has always been skeptical about innovating the board, even the iOS layout is still incomprehensible since the CSS shim has been added.

An imageboard is dead easy in its essence, so why not rebuild it from scratch, instead of feeding new bells and whistles to the existing spaghetti monster?

Well at least it's neatly organized into functions :P

My eyes, the goggles do nothing!!!

How can code be "leaked"? Wouldn't this imply someone was able to terminal into one of their servers?

Any number of things could happen; maybe someone got access to a code repository, or a stray flash drive, or the web server was mis-configured and served the file as plain text (happened to fb once) etc.

Or they found a file read vulnerability, or a server was misconfigured for a period and allowed files to be read, or a multitude of other options.

4chan (or just moot's account) was hacked yesterday.

  if(isset($_COOKIE['4chan_auser'])&&isset($_COOKIE['4chan_apass'])){
    $user = mysql_real_escape_string($_COOKIE['4chan_auser']);
    $pass = mysql_real_escape_string($_COOKIE['4chan_apass']);
  }


Steal a cookie, gain access.. WTF

Aren't you able to hijack sessions on most webpages if you stole session cookies?

The real problem is: "extract($_POST); extract($_GET); extract($_COOKIE);"

For more information on extract: http://www.php.net/extract

Docu on extract():

       Import variables from an array into the current symbol table.
       If flags is not specified, it is assumed to be EXTR_OVERWRITE.

          If there is a collision, overwrite the existing variable.

The danger is that any state variable set before the extract($_...) calls can be overwritten arbitrarily. This also makes it essential that every variable is initialized before it is used.
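
A minimal sketch of that overwrite, using a made-up handler and a plain array in place of `$_POST`. `handle_request` and `$is_admin` are hypothetical names for illustration, not taken from the leaked source:

```php
<?php
// Hypothetical handler, for illustration only.
// $request stands in for attacker-controlled input such as $_POST.
function handle_request(array $request): bool
{
    $is_admin = false;   // state set before the extract() call

    // With no flags, extract() behaves as EXTR_OVERWRITE: any key in
    // $request replaces a local variable of the same name.
    extract($request);

    return $is_admin;
}

var_dump(handle_request(['comment' => 'hello']));                 // ordinary post
var_dump(handle_request(['comment' => 'x', 'is_admin' => true])); // forged field
```

The second call returns true even though nothing in the function ever grants admin rights; the value came straight from the request.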

Does it mean that the password is stored in the cookie, or am I missing something?

It's probably a hashed version. It would be horrendous if it was actually stored in plain text.
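
For reference, this is roughly what salted password hashing looks like with PHP's built-in functions (available since PHP 5.5). It is a generic sketch, not a claim about how 4chan stores or checks anything, and the password strings are made up:

```php
<?php
// Generic sketch of salted password hashing; not how the leaked code works.
// password_hash() generates a random salt and embeds it in the result.
$hash = password_hash('correct horse', PASSWORD_DEFAULT);

var_dump(password_verify('correct horse', $hash)); // matching password
var_dump(password_verify('wrong guess', $hash));   // non-matching password
```

Even so, a hash kept in a cookie still works as a login token for anyone who copies the cookie, so hashing alone does not address the theft scenario discussed here.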

How do you "steal" a cookie?

The best bet is generally an XSS attack, though there are other ways: you could sniff one on a wireless network if no encryption is in use.

Get on the same WiFi as your target, open up Wireshark and grab their HTTP communications.

To make this easier, there was/is a tool called Firesheep that can be used to hijack session cookies. The popularity of Firesheep caused many sites to enable HTTPS by default (e.g. Facebook did so).
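
The usual mitigations for both routes (sniffing and XSS) are cookie attributes. A sketch that builds the header by hand so the attributes are visible; `session_cookie_header` is a hypothetical helper, the cookie name and value are made up, and real code would normally go through `setcookie()` instead:

```php
<?php
// Hypothetical helper that builds a Set-Cookie header line by hand.
// Secure   : the browser only sends the cookie over HTTPS (blocks sniffing).
// HttpOnly : the cookie is hidden from JavaScript (limits theft via XSS).
function session_cookie_header(string $name, string $value): string
{
    return sprintf(
        'Set-Cookie: %s=%s; Path=/; Secure; HttpOnly; SameSite=Lax',
        rawurlencode($name),
        rawurlencode($value)
    );
}

echo session_cookie_header('session_id', 'abc123'), "\n";
```

None of this helps if the site is served over plain HTTP, since a `Secure` cookie would then never be sent at all.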

If you need to be on the same WiFi as your target I really don't see the big problem, realistically speaking.

Common, shared wired LANs at offices and workplaces are a problem. Home LANs, where family members need privacy from one another, are also a problem.

<font>? <table>?

Man, 4chan is worse than I thought.

The HTML was totally redone a couple of years ago. Check out the source on a page now.

What's wrong with <table>?

<table> is for tables of information, not for layout.

But seeing as how I was downvoted, the important thing to remember is that any hint of levity is strictly forbidden on Hacker News.

F* me. No wonder PHP has a bad rap..

I think the fact that you can drive a multi-million-user website that was once valued at $1.2b (by a VC, admittedly) on 2600 lines of pretty bad PHP code, when most of the users are exactly the sort of people who'd try to hack it, is actually a testament to how good PHP is.

Redeveloping the site in Go, Dart, Python or Node, or whatever language you like best, wouldn't increase 4Chan's value in any discernible way.

At the end of the day, it works.

I wouldn't say it's "pretty bad PHP code" if it did what it needed to, and as you say, it was rather resistant to attack.

Please, please tell me you're joking about 4chan having been valued at $1.2 billion by anyone.

This itched me -- I've never gotten the impression that Christopher Poole was particularly stupid, and I couldn't imagine him reacting to a genuine offer of $1.2 billion for 4chan in any other fashion than by demanding cash on the nail and then taking it -- so I scratched it.

The only thing I could find was a year-old thread from 4chan itself [1], in which the supposed VC never identifies himself, and in which (someone who is probably) Poole had the following to say:

>>this thread

>>my sides

>>the stratosphere


>If this is actually your profession, you should probably find a new job.

The advice never to believe everything you read is good advice in general; with regard to anything you read on, from, or about 4chan, it's indispensable.

[1] http://4chandata.org/q/VC-estimates-4chan-worth-1-2-billion-...
