This article is not about exactly how Gmail works. I find some features of Gmail interesting, and I tried to understand how they work.
Gmail Chat.
Gmail integrates chat neatly into the browser itself. Technically, this is challenging to implement, although at first glance it is not obvious why. Let's see why it is tricky.
In HTTP, the client initiates a connection to the server, and the server sends back the data the client requested. It is a pull mechanism, i.e. the client always initiates the connection and the server responds with the appropriate data. Chat, however, is bidirectional. For example, let's assume Client A and Client B are chatting in Gmail. When Client A sends a message to Client B, it cannot connect to Client B directly (both are HTTP clients), so the message goes via intermediate Google servers. So the flow is:
Client A -----> Server -----> Client B
Fig 1.
For the above communication to happen, the server has to initiate a connection to Client B. This is where the problem lies: how can a server initiate a connection to a client?
HTTP 1.1 introduced chunked transfer encoding. It is usually used to transfer a file of unknown length or, more specifically, to stream data from the server to the client (remember, this happens only after the client has initiated the connection). Gmail chat exploits this feature nicely. The page on the client (Client B in our example) contains a hidden iframe whose source URL points to a Google server. Whenever data is available at the server end (in our example, the message from Client A that the server is supposed to transfer to B), the server sends it to the client (Client B) as a new chunk. Since the response is never finished, the server can keep sending updates to Client B whenever it receives a message from Client A.
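To make the framing concrete, here is a minimal Python sketch of how a server could wrap each pending update as one HTTP/1.1 chunk (size in hexadecimal, CRLF, data, CRLF). The function name `encode_chunked` and the sample messages are my own illustration, not anything taken from Gmail.

```python
from typing import Iterable, Iterator


def encode_chunked(messages: Iterable[bytes]) -> Iterator[bytes]:
    """Frame each message as one HTTP/1.1 chunk: hex size, CRLF, data, CRLF."""
    for message in messages:
        if not message:
            # A zero-length chunk would signal the end of the response, so skip it.
            continue
        yield b"%x\r\n%s\r\n" % (len(message), message)
    # Terminating zero-length chunk. A push-style server simply delays this
    # for as long as it wants to keep the channel open.
    yield b"0\r\n\r\n"


if __name__ == "__main__":
    # Hypothetical chat updates queued for Client B.
    updates = [b"hello from A", b"are you there?"]
    for frame in encode_chunked(updates):
        print(frame)
```

The point of the sketch is that the total length is never announced up front, so the server is free to emit the next chunk whenever a new message arrives.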
The following HTTP request and response, captured using Wireshark, are part of a chat conversation.
HTTP Request
GET /mail/channel/bind?at=xn3j322i8pev1stfx4irewnfnlbt1l&VER=6&it=141&RID=rpc&SID=14AEDGD25D064E14&CI=0&AID=60&TYPE=xmlhttp&zx=en6uu1u8grtt&t=1 HTTP/1.1
Host: mail.google.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: close
Referer: http://mail.google.com/mail/?ui=2&view=js&name=js&ids=vsemtnrembkr
Cookie:
HTTP Response
HTTP/1.1 200 OK
Cache-control: no-cache
Pragma: no-cache
Content-Type: text/plain; charset=utf-8
ETag:
Transfer-Encoding: chunked
Server: GFE/1.3
Date: Thu, 03 Apr 2008 03:39:58 GMT
Connection: Close
15
18
[[61,["noop"]
]
]
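The response body above can be read as a single chunk: a size line of `15` (hexadecimal, i.e. 21 bytes) followed by the chunk data. The Python sketch below splits such a body into its chunks. The helper name `decode_chunked` is mine, and the exact whitespace in `captured` (LF line breaks inside the chunk, including a trailing one) is an assumption chosen so that the data adds up to 21 bytes. The inner `18` looks like an application-level length prefix, but that is only a guess.

```python
from typing import List


def decode_chunked(body: bytes) -> List[bytes]:
    """Split a chunked HTTP/1.1 body into the payloads of its complete chunks."""
    chunks = []
    pos = 0
    while pos < len(body):
        line_end = body.find(b"\r\n", pos)
        if line_end == -1:
            break  # incomplete size line, more data still to come
        # The size is hexadecimal; drop optional chunk extensions after ';'.
        size = int(body[pos:line_end].split(b";")[0], 16)
        if size == 0:
            break  # terminating chunk
        start = line_end + 2
        if start + size > len(body):
            break  # chunk data not fully received yet
        chunks.append(body[start:start + size])
        pos = start + size + 2  # skip the CRLF that follows the chunk data
    return chunks


if __name__ == "__main__":
    # Body of the captured response, with assumed whitespace (see above).
    captured = b'15\r\n18\n[[61,["noop"]\n]\n]\n\r\n'
    for chunk in decode_chunked(captured):
        print(chunk)
```

Because the stream stays open, the decoder stops quietly when it runs out of data instead of insisting on the final zero-length chunk.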
There are several server push techniques available. Detailed explanations can be found in the following articles.
http://cometdaily.com/2007/12/11/the-future-of-comet-part-1-comet-today/
http://cometdaily.com/2008/02/22/comet-and-http-pipelining/
http://cometdaily.com/2007/12/18/latency-long-polling-vs-forever-frame/
http://en.wikipedia.org/wiki/Chunked_transfer_encoding
http://en.wikipedia.org/wiki/HTTP
File upload.
Here again they use the hidden iframe hack.
Check this blog for more information.
Mail Threading.
I initially thought this used a fairly complex algorithm. My reasoning was: when someone replies from a different email server such as Yahoo, how does Gmail know that it is a reply to my previous mail? Although it seems obvious, I did not find a usable reference to the previous mail in the headers I looked at (replies can carry In-Reply-To and References header fields, but not every mail client sets them). It turns out to be a very simple algorithm: it is mostly based on the subject of the mail. If you send a mail to A and A replies with a different subject, Gmail will not show the reply as part of the same conversation. If A sends a completely new mail (not a reply to yours) and uses a subject similar to one you sent earlier, Gmail shows it as part of the previous conversation.
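The behaviour described above can be approximated with a few lines of Python. This is only a sketch of subject-based grouping as I understand it, not Gmail's actual algorithm. The names `normalize_subject` and `group_by_subject`, the prefix rules, and the sample inbox are all my own assumptions.

```python
import re
from collections import defaultdict
from typing import Dict, List, Tuple

# Leading reply/forward markers such as "Re:", "RE:", "Fwd:" or "Fw:", possibly repeated.
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject: str) -> str:
    """Strip reply/forward prefixes, collapse whitespace and lowercase."""
    stripped = _PREFIX.sub("", subject)
    return " ".join(stripped.split()).lower()


def group_by_subject(messages: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (message_id, subject) pairs into conversations keyed by normalized subject."""
    conversations = defaultdict(list)
    for message_id, subject in messages:
        conversations[normalize_subject(subject)].append(message_id)
    return dict(conversations)


if __name__ == "__main__":
    # Hypothetical inbox: m2 and m4 are replies, m3 has a changed subject.
    inbox = [
        ("m1", "Lunch on Friday"),
        ("m2", "Re: Lunch on Friday"),
        ("m3", "Lunch plans changed"),
        ("m4", "RE: re: lunch on  friday"),
    ]
    print(group_by_subject(inbox))
```

Here m1, m2 and m4 land in one conversation, while m3 starts a new one because its subject differs. That matches both observations: a reply with a changed subject falls out of the thread, and an unrelated mail with the same subject falls into it.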
Apart from the Gmail implementation, if you want to group a mail archive thread-wise, i.e. as Gmail-style conversations, you can refer to the threading algorithm that was used in Netscape Mail.