April 20, 2008

How Gmail Works!

This article is not about exactly how Gmail works, but I find some features of Gmail interesting and tried to understand how they work.

Gmail Chat.

Gmail has nicely integrated chat into the browser itself. Technically, this is challenging to implement, although at first glance it is not obvious why. Let's see why it is tricky.

In HTTP, the client initiates a connection to the server, and the server sends back the data the client requested. It is a pull mechanism, i.e. the client always initiates the connection and the server responds with the appropriate data. Chat, however, is bidirectional. For example, let's assume Client A and Client B are chatting in Gmail. When Client A sends a message to Client B, it cannot connect to Client B directly (both are HTTP clients), so the message goes via intermediate Google servers. So the flow is,

Client A -----> Server -----> Client B

Fig 1.

For the above communication to happen, the server has to initiate a connection to B. This is where the problem lies: how can a server initiate a connection to a client?

HTTP 1.1 introduced chunked transfer encoding. It is usually used to transfer a file of unknown length or, more specifically, to stream data from the server to the client (remember, this happens after the client has initiated the connection). Gmail chat nicely exploits this feature. The page (say, on Client B) contains a hidden iframe whose source URL points to a Google server. Whenever data is available at the server end (in our example, a message that Client A sent to the server for delivery to B), the server sends it down this connection to Client B. Since the connection stays open, the server can keep sending updates to Client B whenever it receives a message from Client A.
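
To make the idea concrete, here is a minimal sketch in Python of a server that keeps one chunked response open and pushes messages down it, and a client that reads them as they arrive. This is not Gmail's code: the `PushHandler` class, the `/channel` path, the newline-per-message framing and the sample messages are all assumptions made for illustration.

```python
import http.client
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer
from typing import List

# Made-up chat messages that the "server" will push to the client.
MESSAGES = ["hello from A", "how are you?", "bye"]


class PushHandler(BaseHTTPRequestHandler):
    """Hypothetical handler that streams messages over one open response."""

    protocol_version = "HTTP/1.1"  # chunked encoding requires HTTP/1.1

    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Transfer-Encoding", "chunked")
        self.send_header("Connection", "close")
        self.end_headers()
        for text in MESSAGES:
            data = (text + "\n").encode("utf-8")
            # One chunk = size in hex, CRLF, payload, CRLF.
            self.wfile.write(b"%x\r\n%s\r\n" % (len(data), data))
            self.wfile.flush()
            time.sleep(0.05)  # pretend we are waiting for the next message
        self.wfile.write(b"0\r\n\r\n")  # a zero-size chunk ends the stream

    def log_message(self, format, *args):
        pass  # keep the demo quiet


def receive_pushed_messages() -> List[str]:
    """Start the demo server, make ONE request, and collect what is pushed."""
    server = ThreadingHTTPServer(("127.0.0.1", 0), PushHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        conn = http.client.HTTPConnection(
            "127.0.0.1", server.server_address[1], timeout=5
        )
        conn.request("GET", "/channel")
        response = conn.getresponse()
        received = []
        while True:
            line = response.readline()  # returns as soon as a message arrives
            if not line:
                break  # server sent the terminating chunk
            received.append(line.decode("utf-8").rstrip("\n"))
        conn.close()
        return received
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    for message in receive_pushed_messages():
        print("pushed:", message)
```

The client makes only one request, yet it receives several separate updates over time, which is the essence of the trick.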

The following HTTP request and response, captured using Wireshark, are part of the chat conversation.

HTTP Request
GET /mail/channel/bind?at=xn3j322i8pev1stfx4irewnfnlbt1l&VER=6&it=141&RID=rpc&SID=14AEDGD25D064E14&CI=0&AID=60&TYPE=xmlhttp&zx=en6uu1u8grtt&t=1 HTTP/1.1
Host: mail.google.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: close
Referer: http://mail.google.com/mail/?ui=2&view=js&name=js&ids=vsemtnrembkr
Cookie:

HTTP Response
HTTP/1.1 200 OK
Cache-control: no-cache
Pragma: no-cache
Content-Type: text/plain; charset=utf-8
ETag:
Transfer-Encoding: chunked
Server: GFE/1.3
Date: Thu, 03 Apr 2008 03:39:58 GMT
Connection: Close

15

18
[[61,["noop"]
]
]
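
The `Transfer-Encoding: chunked` header in the response means the body is sent as a series of chunks, each introduced by its size in hexadecimal, with a zero-size chunk marking the end. Here is a small sketch of decoding such a body in Python. The `decode_chunked` name and the sample bytes are made up for illustration and are not taken from the capture; trailers after the last chunk are ignored.

```python
from typing import List


def decode_chunked(body: bytes) -> List[bytes]:
    """Split a chunked HTTP body into its chunk payloads."""
    chunks = []
    pos = 0
    while True:
        line_end = body.index(b"\r\n", pos)
        # The size line is hexadecimal; anything after ";" is a chunk
        # extension, which this sketch simply ignores.
        size = int(body[pos:line_end].split(b";")[0], 16)
        if size == 0:
            break  # the zero-size chunk terminates the body
        start = line_end + 2
        chunks.append(body[start:start + size])
        pos = start + size + 2  # skip the CRLF that follows the payload
    return chunks


if __name__ == "__main__":
    sample = b"5\r\nhello\r\n6\r\n world\r\n0\r\n\r\n"
    print(decode_chunked(sample))
```

Because each chunk announces its own length, the server does not need to know the total size of the response in advance, which is what lets it keep adding data for as long as the connection is open.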

There are several server push technologies available. Detailed explanations can be found in the following articles.

http://cometdaily.com/2007/12/11/the-future-of-comet-part-1-comet-today/
http://cometdaily.com/2008/02/22/comet-and-http-pipelining/
http://cometdaily.com/2007/12/18/latency-long-polling-vs-forever-frame/
http://en.wikipedia.org/wiki/Chunked_transfer_encoding
http://en.wikipedia.org/wiki/HTTP

File upload.

File upload again uses the hidden iframe hack. Check this blog for more information.

Mail Threading.

I initially thought this was a fairly complex algorithm. My reasoning was: when someone replies from a different email server such as Yahoo, how does Gmail know it is a reply to my previous mail? The headers seem like the obvious place to look, but in the SMTP headers I did not find any reference to the previous mail. It turns out to be a very dumb algorithm: it is mostly based on the subject of the mail. If you send a mail to A and he replies with a different subject, Gmail will not show it as part of the same conversation. If A sends a completely different mail (not a reply to yours) with a subject similar to one you sent him earlier, Gmail shows it as part of the previous conversation.
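
To illustrate what subject-based grouping looks like, here is a rough sketch in Python. It only mimics the behaviour described above and is not Gmail's actual algorithm; the function names, the rule for stripping "Re:" and "Fwd:" prefixes, and the sample subjects are all assumptions.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Leading reply/forward markers such as "Re:", "RE: Fwd:" or "Fw:".
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Drop reply/forward prefixes and ignore case and outer whitespace."""
    return _PREFIX.sub("", subject).strip().lower()


def group_by_subject(subjects: List[str]) -> Dict[str, List[str]]:
    """Put every mail whose normalized subject matches into one conversation."""
    conversations = defaultdict(list)
    for subject in subjects:
        conversations[normalize_subject(subject)].append(subject)
    return dict(conversations)


if __name__ == "__main__":
    inbox = ["Lunch plan", "Re: Lunch plan", "RE: Fwd: lunch plan", "Budget"]
    for key, mails in group_by_subject(inbox).items():
        print(key, "->", mails)
```

With a rule like this, an unrelated mail that happens to reuse an old subject lands in the old conversation, and a genuine reply with an edited subject starts a new one, which matches both behaviours described above.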

Apart from Gmail's implementation, if you want to group a mail archive thread-wise, i.e. as Gmail-style conversations, you can refer to the threading algorithm that was used in Netscape Mail.
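
Header-based threading works differently from subject matching: a reply normally carries an In-Reply-To header naming the Message-ID of the mail it answers. The sketch below links messages that way in Python. It is a heavily simplified illustration, not the Netscape algorithm itself (which also handles the References header, missing messages and subject fallbacks); the `thread_roots` name and the sample messages are assumptions.

```python
from email import message_from_string
from typing import Dict, List, Optional


def thread_roots(raw_messages: List[str]) -> Dict[str, str]:
    """Map each Message-ID to the Message-ID of the first mail in its thread.

    Assumes every message has a Message-ID header.
    """
    parent: Dict[str, Optional[str]] = {}
    for raw in raw_messages:
        msg = message_from_string(raw)
        reply_to = (msg["In-Reply-To"] or "").strip()
        parent[msg["Message-ID"].strip()] = reply_to or None

    roots = {}
    for message_id in parent:
        current, seen = message_id, set()
        # Walk up the reply chain while the parent is a message we have;
        # "seen" guards against malformed, cyclic headers.
        while parent.get(current) in parent and current not in seen:
            seen.add(current)
            current = parent[current]
        roots[message_id] = current
    return roots


if __name__ == "__main__":
    mails = [
        "Message-ID: <1@example.com>\nSubject: Lunch\n\nhi",
        "Message-ID: <2@example.org>\nIn-Reply-To: <1@example.com>\n"
        "Subject: Something else\n\nok",
        "Message-ID: <3@example.com>\nSubject: Lunch\n\nunrelated",
    ]
    for message_id, root in thread_roots(mails).items():
        print(message_id, "belongs to thread", root)
```

Here the second mail joins the first one's thread even though its subject changed, while the third stays separate despite sharing a subject, the opposite of what pure subject matching would do.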

Comments:

  1. Anonymous, 7:57 PM

    Nice Sai! Enjoyed reading it...

  2. Sai, are you sure about threading?

    Here's what I did. I copied the subject line from an ongoing thread in my inbox and sent myself a new mail with that copied subject line. According to what you have written, this new mail should attach itself to the existing thread. But I got it as a new mail in my inbox.

  3. Hi Srikar,

    If you use Gmail to both send and receive the mail, i.e. the receiver is also a Gmail user, then it won't work. Gmail internally uses the Message-ID (open the menu at the top right of the message and click "Show original" to see the Message-ID; reply mails will also have an "In-Reply-To" header).

  4. parthasaradhi reddy, 7:46 PM

    read it and enjoyed
