

Moving from PHP to C saved my web startup. - BuckToBid

A few months ago I took the journey into startup land. I quit my job and decided to finish the project I had been thinking about for a long time: a penny auction app where users are linked to Facebook identities.

One major worry right from the start was that the project required a real-time, accurately synchronized countdown across multiple browsers. I was using Ajax to call back to the server for an update every second to accomplish this. I knew it was going to get interesting if a lot of users joined, because they would all be calling the server once every second.

Initially I thought we would run into trouble with bandwidth, but I was wrong. After 4 days we had about 150 users up and bidding when the server decided to crash (right in the middle of an iPad auction, just because my luck is that good). Being on Slicehost, they had us upgraded within 20 minutes, but we were still working the server hard. The timer was skipping, we were getting all kinds of strange errors in the logs, and things were looking pretty bad. It wasn't bandwidth but memory and processor usage that were the problem.

I knew from reading articles on HN regularly that PHP is terrible, but I built the app in PHP anyway, along with the 1-second callback. It was the fastest way to get things up and running for me.

That night I had an idea: if I built the 1-second callback portion of the app in C, the server wouldn't have to load Apache with all its extensions every second. So I built a very simple fork server to send updates back for the real-time update.

Result: the processor is 99% idle even during heavy use, and memory usage stays basically the same whether or not more people are watching the auctions.

We now have a few thousand users and things have not changed at all; it's still running great. Of course this system will have its limits, but so far we have not even dented it.

If you want to see it in action you can go to http://apps.facebook.com/bucktobid

Edit: For those who want a little more detail, I have a lighttpd server listening on port 80 that redirects to Apache for PHP calls. If a call comes in for .btb (a made-up extension), lighttpd redirects to the C app, which listens on another port locally and serves the needed info to the browser. The updater is 100% C/C++, not an Apache module.
======
dugmartin
So it really wasn't PHP but Apache, right? In an hour you could have switched
your front-end server to Nginx, had it serve responses from memcached, and
kept the Apache/PHP backend, changing it to update memcached on bid changes.
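Concretely, the suggested setup looks roughly like this in nginx configuration. All location names, the key scheme, and the ports are my invention; the memcached-with-PHP-fallback shape is the standard pattern from the first link below:

```nginx
# nginx answers the per-second poll straight from memcached;
# Apache/PHP only runs on a cache miss (and on bids, to update the key).
location /auction_status {
    set $memcached_key "auction:$arg_id";   # key the PHP backend writes
    memcached_pass 127.0.0.1:11211;
    default_type  application/json;
    error_page 404 = @php_backend;          # cache miss falls through
}
location @php_backend {
    proxy_pass http://127.0.0.1:8080;       # Apache/PHP upstream
}
```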

Here are some links:

[http://www.igvita.com/2008/02/11/nginx-and-memcached-a-400-b...](http://www.igvita.com/2008/02/11/nginx-and-memcached-a-400-boost/)

[http://lserinol.blogspot.com/2009/03/speeding-up-your-nginx-...](http://lserinol.blogspot.com/2009/03/speeding-up-your-nginx-server-with.html)

~~~
sedachv
So instead of understanding the problem and implementing the simplest possible
solution, you're recommending throwing more middleware caching crap at it?
Brilliant engineering. I think I understand "web scale" now.

~~~
wanderr
How is placing the blame on the wrong tool understanding the problem?

~~~
sedachv
You're right, I should have read the OP more carefully.

------
poink
The more generalized takeaway from this is that you shouldn't use a
heavyweight listener to handle polling (or WebSockets, in the near future) if
you can avoid it.

PHP was the culprit here, but I can't help but think you'd have had the same
problem if you were trying to do the same thing with RoR, any Java app server,
or any of the Python frameworks.

Likewise, node.js or Twisted probably would have been an equally effective
replacement.

~~~
rarestblog
I'd probably go with node.js for this task - more manageable. Twisted has a
huge learning curve.

~~~
jemfinch
If you want to stick with Python, Tornado is a really good async server
without all the learning curve of Twisted.

~~~
rarestblog
I wrote my own async servers, and having done that (and used Tornado), I'm
back on threads :) Async is kind of cool in some cases (long polling, for
one), but for most cases it's a pain.

Right now mostly working on CherryPy's WSGI server.

------
scumola
Good for you! I read all of the other guys asking, "but why didn't you do
this? ..." I myself am a throwback from the C/C++ days and I frequently
rewrite stuff in C. That's how it's supposed to be done: write as much as you
can as fast as you can up-front, then optimize your bottlenecks using more
efficient methods. I'm happy that C improved your situation that drastically.

I agree that the overhead of lighttpd, then Apache, then PHP might have been
the real killer in your situation, and using memcache might have helped too,
but making a simple C server that uses fork() to handle connections and isn't
opened up to the world is a very good solution in my book.

People think that if enough people start writing things in C, they'll have to
start doing it too - I think that's the reason for all of the backlash.
Remember, there are a _bunch_ of C and C++ programmers out there doing things
that perform well and scale, but your audience on HN is mostly PHP/Python/Java
programmers and web startup people who go the route of optimizing with more
trendy technologies instead of down-shifting into a language like C. More
power to you, fellow C programmer! :)

------
rarestblog
Why do you use Ajax to update the countdown?

Using Ajax for this gives you MUCH less accuracy than plain old JavaScript
with a time delta between user time and server time:

If you just need to sync time, you could fetch the server time once
(remembering the user time at which you sent the request), then just
compensate the user time with that delta.

In JavaScript (client-side), with a hypothetical `ajax.call(url, callback)`
helper:

    
    
      // do this once:
      var delta_time = 0;
      var sent_at = (new Date()).getTime();
      ajax.call('/server-time-in-ms-since-epoch', function (server_time) {
        var received_at = (new Date()).getTime();
        // the server stamped its reply mid round-trip, so compare it
        // against the midpoint of the send and receive times:
        delta_time = server_time - (sent_at + received_at) / 2;
      });
      
      // then at any given moment the real server time is:
      var current_server_time = (new Date()).getTime() + delta_time;
      // no need for further ajax calls
    

For more accuracy, subtract half the round-trip time when computing
delta_time: the server's timestamp is about half an RTT old by the time it
arrives.

~~~
BuckToBid
The reason it has to continually sync is that any user could place a bid at
any moment. That increases the timer and changes the top bidder, so every user
must be notified of the change. It's asking the server how much time is left
in each auction, not what time it is in the real world. Sorry for the
confusion.

~~~
toolate
Why don't you use long polling rather than sending a request every second?

You could even incorporate the timer into a single streamed response:

    
    
        // fetch_auction_time_from_memcache() stands in for however you
        // look up the remaining time
        do {
            sleep(1);
            $seconds_remaining = fetch_auction_time_from_memcache();
            echo "<script>updateAuctionTime($seconds_remaining);</script>";
            flush();
        } while ($seconds_remaining > 0);

~~~
BuckToBid
This is interesting. Would this work if I was on the site for say an hour? Or
is there some limit to the amount of time you can send data like this? I have
never tried anything like this. What is the overhead like?

~~~
toolate
I've only used it on smaller projects. Facebook uses it on their front page
for progressive loading (check the source for
`<script>big_pipe.onPageletArrive(..);</script>`).

Longer-running connections are a little more problematic, but some client-side
code should be able to handle the connection being closed. You just need to
make sure your web server can handle the number of connections you are
expecting. Apache is particularly bad at this. Something like nginx should
perform better.

------
DrJosiah
So... what you built was a Facebook version of <https://www.wavee.com/> (or
any one of dozens of other sites), which is basically a way of taking foolish
people's money.

Replacing PHP/Apache fork/threads with a C daemon is a good migration, though
almost any language with an async sockets library worth a damn should be able
to handle thousands of simple requests every second.

~~~
BuckToBid
Yes, there are many penny auction sites out there. The whole reason to link it
to Facebook is so that you know it's a real person you are bidding against.
How do you know Wavee has all real people bidding? You don't; some of their
users could just as easily be bots bidding items up automatically.

~~~
DrJosiah
The thing is, Wavee is not an auction site. An auction site is one in which
bids are free and the winner pays what their bid says they pay. What Wavee
built, and what you have built, is effectively a method of taking money from
people for the opportunity to pay money for something.

Don't get me wrong, it's an amazing racket; Wavee makes 75 cents for every bid
that comes in to increment the value by 1 cent. I saw an iPad at $150. At that
price the iPad has already earned Wavee's owners $11,250 (15,000 bids at 75
cents each) without even being sold! And using Facebook to gain people's trust
is a good marketing tactic, but it doesn't change the fact that creating fake
Facebook users with a bunch of friends is easy (get some pictures, feed in
content from any one of the millions of open Twitter accounts, etc.), or that
you have a real incentive to perpetrate fraud.

Really though, being the most honest crook among crooks still leaves you being
a crook. That's why some countries have outlawed this particular kind of scam.

~~~
BuckToBid
So now you have jumped straight to calling me a crook. I'm sorry that Wavee or
whatever site you went to took your money.

I honestly didn't know (and still don't) whether you can make any money doing
an honest penny auction site. We are just breaking even with this one so far.

The fact is that every time an auction goes up, we have more risk than anyone
else involved. If I put up a $500 iPad, it could go for extremely cheap. The
lowest one has ever gone for is 64 cents. That means we made less than $64 and
still had to buy and ship that $500 iPad. And we have never sold anything for
more than $26, which still isn't the $2,600 you think it is, because some
packages give you more bids per dollar.

The legitimate complaint is when sites cheat, which I would guess a lot of
them do. Using Facebook is not some trick to get people to trust us. It's a
way for people to verify for themselves that they are in fact bidding against
other real people.

How can you say that it's a scam? Do you think we are just out there creating
all these fake accounts with fake profiles, personal photos, relationships,
etc.? I could see it if we had 5 or 6 or even 20 users, but we have hundreds
of winners. That's a stretch even for the most paranoid and cynical of people.

Not everybody wins every time. But the data we have so far says that the
majority of users who buy more than just a few bids are actually the ones
getting all the deals. It's the people who come in, spend $24 expecting to win
a $1500 item, and then leave when they don't - those are the people who are
losing. The users who are logical enough to see how the system works are the
ones who get far more than they put in.

And some countries outlaw all kinds of crazy things. Some countries are
considering legislation to ban homosexuality, so I guess by your theory we
should all agree that's bad now too?

~~~
DrJosiah
I'm not so foolish as to have spent money at any of those sites. But I do
stand by my statement calling you a crook. I don't believe I will be able to
convince you, but I do hope I'll be able to convince others.

Let's give you the benefit of the doubt for a moment and say that you are
actually honest - that you aren't running bots, fake identities, etc. That's
fine. My basis for calling you (and those who run businesses like yours) a
crook is not founded on that (though I have no doubt that other companies are
doing as much, if only to boost the value and bidding on an item, but I
digress).

Try to remember that the fundamental operating principle of your business
being profitable is that you sell the vast majority of your customers
absolutely nothing. They aren't getting a good or service for any bid that
doesn't win (which still costs them money). They get nothing.

The money to purchase those items must come from somewhere. If you are
breaking even (as you say you are), it's not coming from the people who are
"winning"; it's coming from the people who are losing. If/when you make a
profit, it's not because the "winners" are paying that much more for an item
(their bids plus the final price don't come close to covering the item
itself); it's because you've got more losers putting money into something
without getting anything in return.

Your business is breaking even, and may eventually be profitable, because of
all of those who come in buying $24 worth of bids, failing, and leaving. Any
business that requires its customers to be ignorant in order to make money is
fundamentally a scam.

Also, conflating countries that make homosexuality illegal with countries that
outlaw a business built on exploiting the ignorant is a fundamentally flawed
argument. One is about the basic human right to be who you are and behave in
ways that cause no harm to others. The other profits from people who don't
know any better. One is a human rights travesty; the other is the outlawing of
an enterprise with margins organized crime wishes it could have (in the case
of a "successful" site like Wavee and others). Trying to claim they are
equivalent, or that they are equivalent "by my reasoning", is dishonest and,
really, troll-like behavior. That may fly in some forums, but it doesn't fly
here. Try again.

------
lkrubner
Polling in real time always needs to be done in some compiled language. The
good folks at 37signals ran into this when they launched their Campfire app.

For instance, consider what David Heinemeier Hansson says about Campfire, the
chat software he helped develop. It was first written in Ruby on Rails, but it
soon became clear that the code that polls to see who is in the chat room
needed to be as fast as possible:

"We rewrote the 100 lines of Ruby that handled the poll action in 300 lines of
C. Jamis Buck did that in a couple of hours. Now each poll just does two super
cheap db calls and polling is no longer a bottleneck. Campfire and a shared
todo list is different because they’re not working on a shared resource.
There’s no concept of locking. Or two people dragging the same item. So a 3
second delay between posting and showing up doesn’t matter. It does when
you’re working on a shared resource."

<http://www.ruby-forum.com/topic/62907>

Later they tore out the C code and re-wrote it in Erlang.

~~~
sedachv
...which [Erlang] is not compiled.

~~~
silentbicycle
False. Erlang is compiled natively via HiPE (the "High Performance Erlang"
compiler - nice acronym!) on many platforms, and compiled to BEAM bytecode on
the rest.

Running Erlang has some overhead, sure, but that's because it's designed for
distributed systems where you can _pull a plug out of the wall_ without
interrupting service. I wouldn't use Erlang for number crunching, but using it
as a glue language for a networked system hits all its strong points.

~~~
sedachv
I didn't know that; I thought everything ran on BEAM. I guess the point is
that the canonical Ruby implementation is really slow when it doesn't need to
be. It's not like it's a hard problem, or even that it hasn't already been
solved (look at GemStone's MagLev: <http://ruby.gemstone.com/>).

~~~
silentbicycle
I've stopped commenting about MRI entirely, it just makes people mad, and it's
not even fun anymore. _It's too easy._ Still, I have to give Matz credit for
making a language a lot of people sincerely love.

I know it's splitting hairs whether bytecode + a VM counts as compiled or
interpreted (it's both, really), but compiling to bytecode rather than a pure
interpreter usually makes enough of a difference performance-wise that it's
worth giving some credit.

~~~
chc
If people would just stop talking about Ruby 1.8, it would allow for more
meaningful discussion. 1.9 is a lot faster and has better support for
concurrency.

~~~
silentbicycle
Indeed, it's too bad that the transition has been taking so long.

------
bl4k
Keep-alive.

It doesn't make sense to make a new connection every second, especially to
Apache. That is where your problem was, not PHP.

------
adatta02
Not to be the unpopular one, but couldn't you also have used Flash to open a
socket to a server written in [name your favorite language] and had the server
send out the current tick at a fixed interval (every second, or every 500 ms)?

------
justin_vanw
Every time I read a story like this it reminds me of a very important lesson:

The world is built from baling wire and duct tape. There are probably a
million better, smarter, less technology-illiterate ways to solve this
problem, but that really doesn't matter. What matters is getting out there and
doing it. Being able to build this sort of app 'correctly' would be an edge,
but only to the person who can do it 'correctly' and is out there doing it.

I wish you the best of luck with your duct-tape!

------
Udo
> _I knew from reading articles on HN regularly that PHP is terrible, but I
> built the app in PHP anyway along with the 1 second callback_

You always have to use the right tool for the job. It requires a deep
understanding of what is actually going on inside the server when you write a
line of code. PHP doesn't magically absolve you of that.

It really has nothing to do with PHP, C, Ruby or [insert your most
reviled/loved technology here]. Calling a complex runtime for hundreds of
near-contentless requests per second on a single machine is a really bad idea,
no matter what environment you use.

Also, I'm sorry if I'm sniping from the cheap seats here, but 1 request per
second per user doesn't seem like a great solution to your problem either. It
might be more appropriate to just leave the HTTP connection open and push new
data out through it when it becomes available, e.g. when something about the
bidding process changes.

------
rograndom
Instead of loading Apache for each PHP call, why not have a few PHP FastCGI
instances running? They're lighter-weight than Apache+mod_php, and you don't
have to wait for them to load on each call.

~~~
BuckToBid
Mostly just because I've never used FastCGI before, and I have played around
with making C/C++ servers for fun before. It seemed like the fastest solution,
and so far it's worked out better than expected.

~~~
silentbicycle
This is a bit of an oversimplification, but where CGI spawns a new process for
each connection, FastCGI* starts the process _once_, then runs a loop to
handle each connection, so the process startup, database connection, etc.
costs amortize to essentially nothing -- many of the constant factors for
working in a higher-level language are eliminated.

FastCGI is worth looking into - a lot of popular web servers support it, and
it's less of a complete model change than switching to an event-based system
(e.g. node.js) or an MVC framework.

* Or SCGI, which is a newer, simpler design with similar goals.

------
RealGeek
Node.js worked great for me for a similar application. I built the web
application in PHP and the timers with Node.js.

~~~
BuckToBid
Was it an auction site as well? If so which one if you don't mind me asking?

~~~
RealGeek
Yes, it was a penny auction site. I can't give the URL as it is not my
website. I assisted them with development.

------
AdamN
Why not use a JS NTP library?

<http://jehiah.cz/a/ntp-for-javascript>

Then have a dedicated ntpd daemon running. That would be way more extensible,
maintainable, and scalable than a custom C program.

~~~
BuckToBid
The timers are all set differently; we can have nine separate auctions going
at once, all ending at different times. Also, when someone bids, the timers
increase. So they are not only all different but all changing constantly. It's
not just a single countdown that we could sync.

------
unshift
can't you use a comet connection with something like orbited instead of
polling? i wouldn't trust polling via HTTP GET with a "real-time accurately
synchronized countdown" -- a couple of small delays can skew your entire
countdown, and delays are easy to come by across the internet, especially when
creating multiple connections.

as much as i dislike it, i doubt the problem has much of anything to do with
php. a simple fastcgi server hooked into lighttpd would probably have had the
same outcome of better performance, or even apache with mod_php.

------
jeffreymcmanus
So you saved this app by using fork(), not (just) by using C. You can call
fork() from a lot of languages, including shell scripts and PHP.

------
bluesmoon
php isn't terrible in general, but it can be the wrong tool for certain use
cases, this being one of them. In the same way, apache is also not the right
tool for some use cases, this being one of them. your solution works well for
you, so stick with it. if you're thinking of scaling up further, also consider
an event-driven server architecture (nginx, node.js, etc).

~~~
mikegreenberg
Agreed that PHP isn't terrible. Typically it's the programmer's fault before
it's the language's. I also agree that PHP isn't the right solution for every
problem, but if you're set on using PHP for the wrong problems and need to
bitch about it, see the previous sentence.

A nice discussion on why PHP _doesn't_ suck:
[http://stackoverflow.com/questions/309300/defend-php-convinc...](http://stackoverflow.com/questions/309300/defend-php-convince-me-it-isnt-horrible/)

~~~
mkramlich
It's not that PHP is terrible. You can definitely write code in it that gets
shit done.

It's that it exists in the same universe as Python, Ruby, Java, Clojure,
Erlang, etc.

If you have the freedom to choose your implementation language, and those are
your alternatives (to name just a few), that's when PHP is harder to justify -
not that it's terrible in isolation, but in relative terms.

------
msie
Lots of really good comments here (except mine ;-)). I'm glad I read this. Is
there a "Best of Hacker News" out there?

~~~
gcr
<http://news.ycombinator.com/lists>

~~~
AbyBeats
Any idea on how to subscribe to Best of HN?

------
earlyriser
Could you give more info about how you did it? I need to implement something
like this for a frequent Ajax refresh.

~~~
BuckToBid
You could google how to do a simple fork server in C/C++, and after you have
that working you just need to do something like this:

    
    
      stringstream response;
      response << "HTTP/1.1 200 OK\r\n"
               << "Server: BTB Auction Updater 1.0\r\n"
               << "X-Powered-By: BTB Update Engine\r\n"
               << "Content-Length: " << msg.length() << "\r\n"
               << "Content-Type: text/html\r\n\r\n"
               << msg;
      
      int n = write(s, response.str().c_str(), response.str().length());
      if (n < 0) error("Error writing to socket");
    

to write back a valid header that a browser will understand.

~~~
chrisaycock
You could store the static (non-changing) text in a couple of C strings and
then call writev() to communicate everything. That would save you the step of
continuously reconstructing the header.

~~~
BuckToBid
Thanks for the suggestion!

------
codesink
You can embed your own C code without the hassle of handling HTTP/S (or fork
bombs) by using the KLone web server; it can easily handle thousands of
requests per second.

P.S. I'm part of the company that made it.

------
jamespitts
Did you create an apache module? If not, what libs did you use?

~~~
BuckToBid
I completely bypassed Apache altogether; it is not used for the 1-second Ajax
call at all. The only libs I used were jansson for easy JSON manipulation and
mysql++ for database access.

------
TomK32
Couldn't it just be that you suck at writing PHP?

------
kqueue
Have you considered using RabbitMQ/0MQ? It sounds like they are perfect
candidates for what you are trying to achieve.

~~~
BuckToBid
I had never heard of those before. Are they capable of working inside any web
browser? Looking it up right now.

~~~
kqueue
Yes, you can reach a queue from a web browser using whatever API access you
provide.

If you have 100 users, each one waiting for a message in their queue, then you
can "broadcast" a single message to all of the queues (or a group) with one
command. All the users waiting on their queues will get a copy.

Another approach, which is hack-ish IMO, is to use something like jabberd to
broadcast messages.

