April 24, 2008

Facebook Chat

Today Facebook released a chat feature that looks similar to Gmail's. I have written an article about how chat works in Gmail. Facebook chat is also based on HTTP, but it does not use any hacks the way Gmail chat does. Here is how it works.

Check out my previous article on why chat over HTTP is challenging. Facebook's approach to this is straightforward. The client always keeps one connection open to the Facebook server. Whenever you receive a chat message from a friend, it is shown in the chat window; in case of a timeout, the client immediately opens one more connection and waits for a response.

Client A -----> Server -----> Client B

Fig 1.

For example, suppose A sends a message to B. Since A can't send the message directly to B, it has to post the message to the Facebook server, which in turn sends it to B. Here both clients A and B open a connection (an Ajax request) to the server and wait for the server's response. The server responds only if it has an outstanding message pending for that client (in our example, the message from A to B). Since B already has a connection open to the server, it immediately gets the message A sent. If there is no message, the open connection times out (this is 300 sec for Firefox; you can change the default value by changing network.http.keep-alive.timeout), Facebook opens a new connection (Ajax request), and the above process repeats.
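The wait-and-reconnect loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: ChannelServer, post, poll and long_poll are hypothetical names, an in-memory queue stands in for the real Facebook server, and the timeouts are tiny so the example runs quickly. It is not Facebook's actual code.

```python
import queue
import threading


class ChannelServer:
    """Hypothetical in-memory stand-in for the chat server."""

    def __init__(self):
        self._inboxes = {}
        self._lock = threading.Lock()

    def _inbox(self, client):
        # One pending-message queue per client, created on first use.
        with self._lock:
            return self._inboxes.setdefault(client, queue.Queue())

    def post(self, sender, recipient, text):
        """A posts a message to the server, addressed to B."""
        self._inbox(recipient).put((sender, text))

    def poll(self, client, timeout):
        """Hold the request open until a message is pending or it times out."""
        try:
            return self._inbox(client).get(timeout=timeout)
        except queue.Empty:
            return None  # the request timed out with nothing to deliver


def long_poll(server, client, max_requests, timeout=0.05):
    """Re-open the request after every timeout until a message arrives."""
    requests_made = 0
    while requests_made < max_requests:
        requests_made += 1
        message = server.poll(client, timeout)
        if message is not None:
            return message, requests_made
    return None, requests_made


if __name__ == "__main__":
    server = ChannelServer()
    # A sends its message a little later; B is already waiting by then.
    threading.Timer(0.12, server.post, args=("A", "B", "hello")).start()
    print(long_poll(server, "B", max_requests=100))
```

A real client would keep looping after each delivered message as well; the sketch returns after the first one only to keep the example short.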

The URL that Facebook opens to the server has the following format.

http://0.channel10.facebook.com/x//true/p_=803&

Check out my previous article, How Gmail Works !.

April 21, 2008

World's Most Innovative Companies

BusinessWeek released its World's Most Innovative Companies list and, interestingly, two Indian companies, Tata and Reliance Industries, are on it. Tata is in 6th position and Reliance Industries is in 19th position.

April 20, 2008

How Gmail Works !

This article is not about exactly how Gmail works, but I find some features of Gmail interesting and I tried to understand how they work.

Gmail Chat.

Gmail has nicely integrated chat functionality into the browser itself. Technically, this is challenging to implement. At first glance it is not obvious why. Let's see why this is tricky.

In HTTP, the client initiates a connection to the server, and the server sends the data the client requested. It is a pull mechanism, i.e. the client always initiates the connection and the server responds with the appropriate data. In chat, however, the communication is bidirectional. For example, let's assume Client A and Client B are chatting in Gmail. If Client A sends a message to Client B, the message goes via intermediate Google servers, since Client A can't connect to Client B directly (both are HTTP clients). So the flow is,

Client A -----> Server -----> Client B

Fig 1.

For the above communication to happen, the server has to initiate a connection to B. This is where the problem lies: how can a server initiate a connection to a client?

HTTP 1.1 introduced chunked transfer encoding. This is usually used to transfer a file of unknown length or, more specifically, to stream data from the server to the client (remember, this is after the client initiates the connection). Gmail chat nicely exploits this feature. The page contains a hidden iframe (in Client B, for example) whose source URL points to a Google server. Whenever data is available at the server end (in our example, Client A has sent the server data that it is supposed to transfer to B), the server sends the data to the client (Client B). Since this connection never ends, the server can send updates to the client (Client B) whenever it receives a message from Client A.
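To make the framing concrete, here is a minimal Python sketch of how a chunked body is put together and taken apart: each chunk is its length in hexadecimal, a CRLF, the data, and another CRLF, and a zero-length chunk ends the stream. The function names are my own, the decoder ignores trailers, and it assumes the whole body is already in memory, whereas a real streaming client would process each chunk as it arrives.

```python
from typing import List

# A zero-size chunk followed by an empty line marks the end of the body.
LAST_CHUNK = b"0\r\n\r\n"


def encode_chunk(data: bytes) -> bytes:
    """Frame one piece of data as a chunk: hex size, CRLF, data, CRLF."""
    size_line = format(len(data), "x").encode("ascii")
    return size_line + b"\r\n" + data + b"\r\n"


def decode_chunked(body: bytes) -> List[bytes]:
    """Split a complete chunked body back into its chunks (trailers ignored)."""
    chunks = []
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # Drop optional chunk extensions that may follow a ';'.
        size_field = body[pos:line_end].split(b";", 1)[0]
        size = int(size_field.decode("ascii"), 16)
        if size == 0:
            break  # last chunk reached
        start = line_end + 2
        chunks.append(body[start:start + size])
        pos = start + size + 2  # skip the CRLF that follows the data
    return chunks


if __name__ == "__main__":
    # The server can keep appending chunks for as long as the connection is open.
    updates = [b'[[61,["noop"]]]', b"message from A"]
    wire = b"".join(encode_chunk(u) for u in updates) + LAST_CHUNK
    print(decode_chunked(wire))
```

Because each chunk announces its own length, the server never has to state a total Content-Length up front, which is what lets the response stay open indefinitely.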

The following HTTP request and response, captured using Wireshark, are part of the chat conversation.

HTTP Request
GET /mail/channel/bind?at=xn3j322i8pev1stfx4irewnfnlbt1l&VER=6&it=141&RID=rpc&SID=14AEDGD25D064E14&CI=0&AID=60&TYPE=xmlhttp&zx=en6uu1u8grtt&t=1 HTTP/1.1
Host: mail.google.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: close
Referer: http://mail.google.com/mail/?ui=2&view=js&name=js&ids=vsemtnrembkr
Cookie:

HTTP Response
HTTP/1.1 200 OK
Cache-control: no-cache
Pragma: no-cache
Content-Type: text/plain; charset=utf-8
ETag:
Transfer-Encoding: chunked
Server: GFE/1.3
Date: Thu, 03 Apr 2008 03:39:58 GMT
Connection: Close

15

18
[[61,["noop"]
]
]

There are several server push technologies available. Detailed explanations can be found in the following articles.

http://cometdaily.com/2007/12/11/the-future-of-comet-part-1-comet-today/
http://cometdaily.com/2008/02/22/comet-and-http-pipelining/
http://cometdaily.com/2007/12/18/latency-long-polling-vs-forever-frame/
http://en.wikipedia.org/wiki/Chunked_transfer_encoding
http://en.wikipedia.org/wiki/HTTP

File upload.

Here they again use the hidden iframe hack. Check this blog for more information.

Mail Threading.

I initially thought this was a fairly complex algorithm. My reasoning was: when someone sends a reply from a different email server such as Yahoo, how does Gmail know it is a reply to my previous mail? Although the headers seem the obvious place to look, if you inspect the SMTP headers you will not necessarily find a reference to the previous mail. It turns out to be a very dumb algorithm: it is mostly based on the subject of the mail. If you send a mail to A and he replies with a different subject, Gmail will not show it as part of the same conversation. If A sends a completely different mail (not a reply to yours) and uses a subject similar to one you sent him earlier, Gmail shows it as part of the previous mail conversation.
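The behaviour I observed can be imitated with a few lines of Python. This is only a guess at the idea, not Gmail's real algorithm: the function names are hypothetical, and stripping "Re:" and "Fwd:" prefixes and ignoring case are my own assumptions about what counts as a "similar" subject.

```python
import re

# Assumption: reply/forward prefixes such as "Re:" or "Fwd:" are ignored.
_PREFIXES = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Strip reply/forward prefixes, collapse whitespace, and lowercase."""
    stripped = _PREFIXES.sub("", subject)
    return " ".join(stripped.split()).lower()


def group_by_subject(messages):
    """Group (sender, subject) pairs into conversations keyed by subject."""
    conversations = {}
    for sender, subject in messages:
        key = normalize_subject(subject)
        conversations.setdefault(key, []).append((sender, subject))
    return conversations


if __name__ == "__main__":
    mailbox = [
        ("me", "Lunch on Friday"),
        ("A", "Re: Lunch on Friday"),   # reply, same subject: same conversation
        ("A", "New plan"),              # reply with a changed subject: split off
        ("A", "Lunch on Friday"),       # unrelated mail, same subject: merged
    ]
    for key, thread in group_by_subject(mailbox).items():
        print(key, len(thread))
```

Both effects described above fall out of this directly: a reply with a changed subject lands in a new conversation, and an unrelated mail that reuses an old subject is merged into the old one.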

Apart from the Gmail implementation, if you want to group a mail archive by thread, i.e. as Gmail-style conversations, you can refer to the threading algorithm that was used in Netscape Mail.