

Ask HN: How do you do realtime - sid

Hi all, I was wondering about something. There is a lot of talk about realtime and up-to-the-minute data, but here is what I don't understand.

Twitter, for example, is considered realtime because the information is up to the minute and discusses topics that are happening _now_.

However, with Twitter, if you don't refresh the page, the information goes stale. In FriendFeed, for example, new information shows up automatically as it is created.

So my question is: what is realtime? Is it the ability for information to appear on the website as it is created, or is it more about the data covering realtime events and topics?

My next question: if it is about the page being updated in realtime, like FriendFeed and Gmail, how do you implement that short of polling? Polling is very easy but also very inefficient. Is there actually something out there that allows realtime updates without polling, or is polling how FriendFeed and Gmail do it?

Thanks a lot, guys
======
jacquesm
Realtime has two meanings. What you call 'realtime' is not really realtime,
which strictly means 'when it happens', but a softer version of it.

To make the distinction, consider 'hard' realtime, where events are followed
very closely, usually with a guaranteed upper bound on the latency between
something occurring and the response.

Twitter and all kinds of other 'realtime' applications are probably better
described as 'near realtime'.

To make a website that displays 'live' data, you can use AJAX to make
background calls that check whether the data on the page has changed since
the last time you loaded it.
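A minimal sketch of such a check (has the data changed since the client last fetched it?), assuming the server can hash the current data and uses the standard HTTP `ETag`/`If-None-Match` headers, which the comment itself does not mention. When nothing has changed, the server answers `304 Not Modified` with an empty body, so each poll stays cheap. The names `make_etag` and `handle_poll` are hypothetical.

```python
import hashlib
import json
from typing import Optional


def make_etag(data: object) -> str:
    """Derive an ETag from a JSON-serializable snapshot of the page data."""
    encoded = json.dumps(data, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha256(encoded).hexdigest()[:16] + '"'


def handle_poll(if_none_match: Optional[str], current_data: object):
    """Hypothetical poll handler returning (status, headers, body)."""
    etag = make_etag(current_data)
    if if_none_match == etag:
        # Nothing changed since the client's last fetch: send no body at all.
        return 304, {"ETag": etag}, b""
    body = json.dumps(current_data).encode("utf-8")
    return 200, {"ETag": etag, "Content-Type": "application/json"}, body


# First poll: the client has no ETag yet, so it gets the full data.
status, headers, body = handle_poll(None, {"items": ["a", "b"]})
print(status)  # 200
# Next poll: the client echoes the ETag back and nothing has changed.
status2, _, body2 = handle_poll(headers["ETag"], {"items": ["a", "b"]})
print(status2, body2)  # 304 b''
```

The browser still makes a request every interval, so this reduces bandwidth and rendering work, not the number of round trips.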

Plenty of sites now use this technique, and while it gives the impression
that the page really is 'realtime', the only real difference is that you no
longer poll for new information yourself; your browser does it for you.

The web is 'pull': you contact a server to retrieve information. Short of
keeping an HTTP connection open all the time and streaming data through it as
it becomes available, you have to perform some kind of poll. And, as you
already noticed, that is pretty inefficient.

If you use server push, which is technically possible (I've been using it in
one form or another since the mid-90s to stream 'video', read: a sequence of
still images), the upper limit is usually how many concurrent connections
your server architecture can handle.
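The comment doesn't say which mechanism was used, but one classic way to server-push a sequence of still images over a single open HTTP response is the `multipart/x-mixed-replace` content type, where each new part replaces the previous one in the browser. A minimal sketch of generating that stream, assuming JPEG frames and a hypothetical boundary string `frame`:

```python
from typing import Iterable, Iterator

BOUNDARY = b"frame"  # hypothetical boundary; any token that never appears in the data works

# Header the server sends once, before streaming the parts.
CONTENT_TYPE = "multipart/x-mixed-replace; boundary=" + BOUNDARY.decode("ascii")


def mixed_replace_stream(frames: Iterable[bytes]) -> Iterator[bytes]:
    """Yield one multipart chunk per frame; the connection stays open between frames."""
    for frame in frames:
        yield (
            b"--" + BOUNDARY + b"\r\n"
            + b"Content-Type: image/jpeg\r\n"
            + b"Content-Length: " + str(len(frame)).encode("ascii") + b"\r\n\r\n"
            + frame + b"\r\n"
        )


# Usage: write each chunk to the client socket as soon as a new frame exists.
for chunk in mixed_replace_stream([b"<jpeg bytes 1>", b"<jpeg bytes 2>"]):
    print(chunk[:20])
```

Every viewer holds one of these responses open for as long as it watches, which is exactly why the concurrent-connection limit becomes the bottleneck.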

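For a single machine with a single IP, a commonly quoted upper limit is about
60,000 connections, and a multi-homed machine can do a multiple of that by
binding to more than one IP address. The reasoning is that every connection
requires a socket with a port, and the TCP port field is only 16 bits (65,536
values). Strictly speaking, that limit applies to connections between one
pair of addresses (for example, a proxy opening outbound connections to a
single backend), because a TCP connection is identified by source and
destination address and port together; for incoming connections to one
listening port, open file descriptors and per-connection memory are usually
the real ceiling.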

By using a poll system the overall latency goes up but you can handle many
more concurrent users with the same machine.
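The trade-off can be put in rough numbers. Assume N users each poll every T seconds, each poll holds a connection for about d seconds, and updates arrive at uniformly random moments. Little's law then gives roughly N·d/T connections open at any instant, and an update waits on average T/2 before a client sees it. Held push connections cost one open connection per user but add almost no delay. A back-of-the-envelope sketch, with hypothetical numbers:

```python
def polling_load(users: int, interval_s: float, request_s: float) -> dict:
    """Rough steady-state cost of polling (Little's law: L = arrival rate * time held)."""
    requests_per_s = users / interval_s
    return {
        "requests_per_s": requests_per_s,
        "open_connections": requests_per_s * request_s,
        "avg_extra_latency_s": interval_s / 2,  # assumes updates land uniformly in time
    }


def push_load(users: int) -> dict:
    """Server push / held connections: one open connection per connected user."""
    # Extra latency ignores network and processing time.
    return {"open_connections": users, "avg_extra_latency_s": 0.0}


# Hypothetical numbers: 100,000 users, polling every 30 s, each poll taking 50 ms.
print(polling_load(100_000, 30.0, 0.05))
print(push_load(100_000))
```

With these numbers polling means about 3,333 requests per second but only about 167 connections open at once, at the cost of a 15-second average delay, versus 100,000 simultaneously open connections for push.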

------
hedgehog
You can push data to the browser over a transport called BOSH:

<http://xmpp.org/extensions/xep-0124.html>

<http://code.stanziq.com/strophe/>

Strophe + ejabberd & Twisted is working well for me.

~~~
sid
Great, thank you for this, I will give it a try. Our product is almost done,
but in future releases I would like to incorporate some _realtime update_
functionality.

I will go through this information.

------
asimjalis
The alternative to polling would be to have the server notify the browser that
the data has changed. Here is one way to do it: The browser can do an HTTP GET
and have a thread block on it -- later when the server has new data it sends
the notification as a response to this GET. This way you are not polling over
the network.
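This is usually called long polling (one form of Comet). A minimal in-process sketch using Python's `threading.Condition`, assuming one server thread per pending request; `UpdateChannel`, `publish` and `wait_for_update` are hypothetical names. A real server would call `wait_for_update` inside the GET handler and send its result as the response.

```python
import threading
from typing import List, Optional, Tuple


class UpdateChannel:
    """Holds a version counter and wakes blocked requests when new data arrives."""

    def __init__(self) -> None:
        self._cond = threading.Condition()
        self._version = 0  # always equals len(self._items)
        self._items: List[str] = []

    def publish(self, item: str) -> None:
        with self._cond:
            self._items.append(item)
            self._version += 1
            self._cond.notify_all()  # wake every request blocked in wait_for_update

    def wait_for_update(
        self, seen_version: int, timeout: float
    ) -> Optional[Tuple[int, List[str]]]:
        """Block until version > seen_version; return None on timeout so the client re-requests."""
        with self._cond:
            ready = self._cond.wait_for(lambda: self._version > seen_version, timeout=timeout)
            if not ready:
                return None
            return self._version, self._items[seen_version:]


# Usage: one "request" thread blocks until another thread publishes.
channel = UpdateChannel()
result = []
t = threading.Thread(target=lambda: result.append(channel.wait_for_update(0, timeout=5.0)))
t.start()
channel.publish("hello")
t.join()
print(result)  # [(1, ['hello'])]
```

The `timeout` lets each held request finish before any proxy or server request timeout fires; the browser then simply issues the next GET with the version it last saw.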

~~~
sid
I had thought about this, but the problem is that I have set the timeouts for
my requests to only a few minutes, because I wanted to prevent DoS attacks as
best I could without timing out too early (as there is some upload
functionality).

I also had two other problems I thought might arise, which kind of moved me
away from trying to do it this way.

The first was that if we start to get significant hits, I think there would
be a limit to the number of FDs (file descriptors) I could have open on a
single server, and keeping the connection open continuously while it waits
would slowly eat away at the number of FDs available. Some people may leave
the browser logged in and walk away (though I guess after a certain amount of
inactivity you could just log them out).
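The idea of logging out inactive users can be sketched as a small registry of held connections keyed by session, with a reaper that drops sessions idle longer than a cutoff so their file descriptors are released. `HeldConnections`, its methods and the 15-minute cutoff are hypothetical; in a real server `expire_idle` would also close each dropped connection's socket.

```python
import time
from typing import Dict, List, Optional


class HeldConnections:
    """Tracks last user activity per session so idle long-poll connections can be dropped."""

    def __init__(self, max_idle_s: float = 15 * 60) -> None:
        self.max_idle_s = max_idle_s
        self._last_seen: Dict[str, float] = {}

    def touch(self, session_id: str, now: Optional[float] = None) -> None:
        """Record real user activity (not the automatic poll itself, or nobody ever goes idle)."""
        self._last_seen[session_id] = time.monotonic() if now is None else now

    def expire_idle(self, now: Optional[float] = None) -> List[str]:
        """Forget sessions idle longer than max_idle_s and return their ids."""
        now = time.monotonic() if now is None else now
        expired = [s for s, t in self._last_seen.items() if now - t > self.max_idle_s]
        for s in expired:
            del self._last_seen[s]  # real code would also close the held socket here
        return expired

    def __len__(self) -> int:
        return len(self._last_seen)


conns = HeldConnections(max_idle_s=900)
conns.touch("alice", now=0.0)
conns.touch("bob", now=800.0)
print(conns.expire_idle(now=1000.0))  # ['alice']
```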

The second is that I have noticed that with CherryPy the AJAX requests are
processed in sequence, so I'm not sure whether a blocked request would hold
up other requests for that session; if the blocking were in another thread, I
think it should work.

Maybe when I finally _release_ my product and start working on this
functionality for V2, I will look more closely and actually try it instead of
just thinking it through.

Hmmm, some things to try there... Thanks for your input, mate.

------
marram
Technically, the simplest way to do this is via polling + AJAX. In our project
we implement it as follows:

Every n minutes, the client polls for updates. It sends along a marker token.
The token is either a timestamp it has previously received from the server, or
a serialized/JSON version of the "client model". A while ago, I posted a quick
write-up on why you'd need a serialized model here:
[http://stackoverflow.com/questions/602322/differential-ajax-...](http://stackoverflow.com/questions/602322/differential-ajax-updates-for-html-table/603701#603701)
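A minimal server-side sketch of the timestamp variant, assuming each item carries a creation time and the client sends back the last token it received. `poll_updates` and the in-memory `ITEMS` list are hypothetical stand-ins for a real handler and datastore query.

```python
from typing import Dict, List, Tuple

# Hypothetical datastore: (created_at, text) pairs, oldest first.
ITEMS: List[Tuple[float, str]] = [
    (100.0, "first post"),
    (105.0, "second post"),
    (112.0, "third post"),
]


def poll_updates(since: float) -> Dict[str, object]:
    """Return only items newer than the client's token, plus the next token to send back."""
    new = [text for created, text in ITEMS if created > since]
    latest = max((created for created, _ in ITEMS), default=since)
    return {"token": max(latest, since), "items": new}


print(poll_updates(0.0))    # {'token': 112.0, 'items': ['first post', 'second post', 'third post']}
print(poll_updates(105.0))  # {'token': 112.0, 'items': ['third post']}
```

With a strict `>` comparison, an item written later but stamped with the same time as the token can be missed; a monotonically increasing id as the token avoids that.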

I believe Gmail uses a Comet approach (keeping HTTP connections open for a
while and sending updates as they happen), but this can be tricky to
implement, especially on something like Google App Engine, which places
strict deadlines on request handlers.

~~~
sid
I also tried it this way and it works well. I was sending back the timestamp
and the current state, and it was quite easy to implement using setTimeout,
calling it again x minutes later to check whether the server state now differs
from the client state.

I guess the issue at the moment is that we are a pretty poor startup, and I'm
worried it would put stress on our already limited resources by making the
servers handle requests that do nothing except compare the state of the
client and the server.

I was also kind of worried that some corporate networks might block my
IP/domain name because they might detect the polling as some sort of
malicious data being sent back.

I think after V1 is released, if we can become at least ramen profitable, I
might start looking at implementing this functionality using Gmail's approach,
as we may then be able to afford more servers.

------
tsestrich
Well, the idea behind "realtime" as it pertains to Twitter is that it is about
events happening that minute. While your page might not be refreshed
immediately to see new information, you COULD refresh it yourself to view it.
That's real time, versus something like a blog (except a live blog of course)
where the information is not time sensitive or about something that happened
that second.

Pages like Twitter or Gmail use AJAX to retrieve new information from a server
without refreshing the whole page. Twitter does do this (or did, at least when
viewing search results: it would tell you how many new results there had been
since the last page refresh). I assume the polling here happens in a simple
loop that runs every set number of seconds.
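Such a loop takes only a few lines. In a browser it would be `setTimeout` or `setInterval` in JavaScript; this Python sketch keeps the same shape, with `fetch` standing in for the AJAX request and `sleep` injectable so it can be tested. The names and the fixed interval are assumptions, not how Twitter actually did it.

```python
import time
from typing import Callable, List, Optional


def poll_loop(
    fetch: Callable[[int], List[str]],
    on_new: Callable[[List[str]], None],
    interval_s: float = 30.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Call fetch every interval_s seconds and hand any new items to on_new."""
    seen = 0  # how many items the page already shows
    polls = 0
    while max_polls is None or polls < max_polls:
        new_items = fetch(seen)
        if new_items:
            on_new(new_items)  # e.g. show "3 new results" or insert them into the page
            seen += len(new_items)
        polls += 1
        sleep(interval_s)


# Usage with a fake feed and no real waiting.
feed = ["a", "b", "c"]
shown = []
poll_loop(lambda seen: feed[seen:], shown.extend, interval_s=0.0, max_polls=2, sleep=lambda s: None)
print(shown)  # ['a', 'b', 'c']
```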

------
kallistec
Right. "Realtime" is a bit overloaded of a term, and you've identified the two
main interpretations.

The realtime data concept has a bunch of buzz right now and is predicted by
some to replace "web 2.0" as the most overused buzzword. The optimistic view
of this trend is that always-on internet and smart phones are reaching a
critical mass so that there's a demand to know what's happening right now.

In terms of web technologies, the standard implementation for "realtime"
updates is to use timers to make periodic AJAX requests. Inefficient, sure,
but you can minimize the cost: send JSON and use JavaScript in the browser to
add the data to the page as HTML, optimize the database/datastore queries used
to serve update requests, and use "lite" versions of your framework if
possible (Rails, for example, provides "Rails Metal" for this type of thing).
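To illustrate the JSON suggestion: the update handler returns only the new records as compact JSON and leaves rendering to JavaScript in the page, instead of sending back a rendered HTML fragment. A minimal server-side sketch; `encode_update`, the short field names `u`/`t` and the HTML template used for comparison are all hypothetical.

```python
import json
from html import escape
from typing import Dict, List

Record = Dict[str, object]


def encode_update(records: List[Record]) -> str:
    """Compact JSON: no whitespace after separators, only the fields the page needs."""
    return json.dumps(records, separators=(",", ":"))


def render_html(records: List[Record]) -> str:
    """What the same update would cost if the server rendered it as HTML instead."""
    return "".join(
        '<li class="update"><span class="user">{}</span> {}</li>'.format(
            escape(str(r["u"])), escape(str(r["t"]))
        )
        for r in records
    )


updates = [{"u": "sid", "t": "V1 shipped"}, {"u": "marram", "t": "polling works"}]
payload = encode_update(updates)
print(payload)  # [{"u":"sid","t":"V1 shipped"},{"u":"marram","t":"polling works"}]
print(len(payload), len(render_html(updates)))
```

The saving per request is small, but it is paid on every poll from every client, which is where it adds up.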

If you're interested in alternatives to polling, look at ReverseHTTP or Comet.
PubSubHubbub looks promising in the web resource space.

