

Ask HN: Javascript widget scaling issues - bastian

Our startup offers a JavaScript widget that you can embed on your website, similar to the Twitter widget that displays your most recent tweets. The widget automatically polls for new content (every 30 seconds), and recently we have been embedded on several high-traffic websites. This is resulting in incredibly high server load.

Our widgets are receiving around 50-60 requests every second. These include both widgets loaded for the first time and subsequent requests where the widget asks for new content. We are running our database on a large Amazon EC2 instance. The database is not the problem, however. We recently moved the code that serves widget content onto a dedicated small instance; traffic there has immediately been phenomenal, and CPU usage is very high right now.

We operate in a LAMP environment, with PHP as our server-side language and MySQL as our database platform. We take advantage of memcached by caching the content for 30 seconds at a time (this makes sense because our widget refreshes every 30 seconds). This means that on most occasions the content is served directly from memory, with 0 queries being executed. After a request comes in, we deliver back HTML content. We know that it would be smarter to return simple JSON containing only the newest updates; this is something we will implement.

It seems that with 50-60 requests coming in every second, even though the majority of requests never touch the database, the sheer volume of traffic is what is killing the CPU.

Do you have any suggestions on how to handle this smartly?

Move to a large instance on Amazon to serve widget content?
Scale horizontally with small instances and load-balance them on Amazon?
Could PubSubHubbub be used to push new content to the widgets? I would love more information.
Is memcached having any effect on CPU, given such high traffic?
Should the widget be served using Java?

Thank you very much!
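
The "return simple JSON containing only the newest updates" idea could look something like this on the client side; a minimal sketch, where the endpoint, field names, and helper functions are all illustrative assumptions, not the actual widget code:

```javascript
// Sketch of delta polling: the widget remembers the newest timestamp it
// has seen and asks the server only for items after it, so most polls
// return a tiny (often empty) JSON array instead of full HTML.

// Merge freshly polled items (assumed newer, sorted newest-first) into
// the existing list, newest first.
function mergeUpdates(existing, fresh) {
  return fresh.concat(existing);
}

// Newest timestamp seen so far, used as the ?since= value of the next poll.
function newestTimestamp(items, fallback) {
  return items.length > 0 ? items[0].ts : fallback;
}

// Browser-side polling loop (commented out so the sketch stays self-contained):
// let items = [], since = 0;
// setInterval(async () => {
//   const res = await fetch("https://example.com/widget.json?since=" + since);
//   items = mergeUpdates(items, await res.json());
//   since = newestTimestamp(items, since);
// }, 30000); // matches the widget's 30-second refresh
```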
======
ichverstehe
1. Small instances on EC2 are _really_ slow. Really, really slow. You get
much more value with the medium instances.

2. Use an HTTP cache (e.g. Varnish) in front of your app. Even though you are
using memcached, every request is still (I assume) hitting your application.
With a proper cache, your application layer would only be hit when the cache
expires.
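
For the Varnish route, the key is that the application marks each response as cacheable for one poll interval; a sketch (Node-flavoured JavaScript purely for illustration; in the OP's PHP stack the equivalent is a single header() call, shown in the comment):

```javascript
// Build response headers that let an upstream HTTP cache (e.g. Varnish)
// serve repeats of the same widget payload for one poll interval without
// hitting the application layer at all.
function cacheHeaders(ttlSeconds) {
  return {
    // "public" allows shared caches to store the response; max-age matches
    // the widget's 30-second refresh, so clients are never more than one
    // poll interval stale.
    "Cache-Control": "public, max-age=" + ttlSeconds,
    "Content-Type": "application/json",
  };
}

// In PHP the same effect is:
//   header('Cache-Control: public, max-age=30');
```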

~~~
bastian
Thanks, good to know that a medium instance is better value. We will have a
look at Varnish now.

------
ecaron
(In addition to the great advice that you need a medium instance:
Velocity+nginx would certainly become your friends.)

Could you provide links and examples? I would be interested to know what
headers your server is delivering (such as are you sending 304s on refreshes).

Is it vmstat or top that's showing "the sheer volume of traffic is what is
killing the CPU"? You didn't mention using APC; I assume you have that
configured. Also, have you run inclued to see what your codebase is including?

------
kls
I would write the content to a file on a lightweight web server (not an app
server) or a CDN. Point all the clients at the file and have your application
rewrite it, say, every 28 seconds. Get the high load off the application
server and cache the data at the edge. It will be far more performant, and it
is much simpler to scale the edge.

