11 June, 2009

Startup City

Last Saturday i attended Startup city at Nimhans convention centre. It was second time i attended. These were few companies that i found interesting.

MedFlo Telematics: They Develop telemedicine kiosks. This project is already started and they deployed few kiosks in rural areas of Kerala.
Vidteq: They show driving directions. Instead of showing just maps they show actual video directions. Among the companies that participated in conference, this is the only service i used before. Though this is a very good thought, i find streaming a bit sluggish. They should provide even more options like controlling the stream speed etc to make it more usable. This is simplified version of Google street maps.
Iken Solutions: They developed recommendation engine for different domains.
Latlong: Its a location based service. You can get the results by SMSing or by installing their custom software. Since GPS devices are not prevalent in India they maped tower information to location.
Ennovasys: Provides supply chain management solutions. Companies can directly track their inventory directly using their software and it is integrated with Google maps.
Panini: They developed Indian language keypad for mobiles. They did statical language modeling to predict the word you are typing. Accordingly they change the characters in the keypad. So you can type faster. You can send the text throughSMS and they even compress the text while sending. So you can send many characters in one SMS. The main disadvantage with this software is, the same software should present on other mobile to read.

29 May, 2009

Market oriented programming

Apart from CS , the other subject I like the most is Economics . Why economics ? The interesting aspect of economics is, we convert each and every problem into problem of 2 variables namely supply and demand . Thus the complexity of analyzing more variables is reduced to that of 2. I liked the simplicity of this kind of analysis.

Basic principles of economics can be applied to any subject that deals with analyzing the interaction between limited set of resources and players. Economics' analysis explains why certain behavior is happening and how it ought to have happened etc. If the behavior is not what it ought to have been then we change the rules to bring it back to intended one.

Examples of where economics have been applied ( from Wiki ), Economic analysis is applied throughout society, in business, finance and government, but also in crime, education, the family, health, law, politics, religion, social institutions, war, and science.

When I was doing my masters I had given talk on Market Oriented Programming. I am sharing the slides. In this talk I had taken few examples where economic analysis had been applied to computer science problems. As I said earlier, economic analysis is best suited for analyzing Resource allocation problems. We face this problem quite often in computer science. For example,

1. How to allocate the storage space to users. (You can say fixed, but is it really good ?. There are many who are not using any allocated spaces and others who always wants more space. How to draw the line here ?)
2. Load balancing
3. Adsense. Perhaps this is the best example of how the theory of economics can be applied to a CS problem.
4. Facebook is planning a feature where you will buy some credits and give that credits to friends notifications and posts. More on it http://news.cnet.com/8301-17852_3-10212452-71.html

Adsense.

I haven't explained about it in my slides. But I wanted to give overview of it. The problem for Google is ( Here we are taking Google as an example and its true with any other search engine), what kind of Ads it has to show on a page ?. What ever Google do should satisfy three entities, 1. User, 2. Advertiser and 3. Google themselves. So, it has to achieve the following objectives.
  1. It should show relevant Ads to user. If it is not showing relevant ads then users find it less interesting and nobody look at them and it leads to Ad Blindness. Eventually users won't follow ads.
  2. It should satisfy the advertiser. Advertiser should get as many users as possible to their site through the Ads. This is possible only when it's Ad is shown at right place and at right time. For this, each advertiser bids for certain ad terms (Its like describing what Ad they are going to show ) and pays amount to Google based on users visiting the advertisers page. (i.e how many users actually clicked the Ad) Potentially there will be many advertisers, to maximize their chance of showing the ad they have to bid more.
  3. Google should maximize its revenue from above transactions.

How Google achieved these objectives ?
  • 1 and 2 are of related and assumed Google did its best to show relevant Ads. :)
  • 3. Google should maximize its revenue from advertisers. This is done by introducing competition among advertisers. Google uses a variant of vickrey auction (In vickrey auction, highest bidder wins, but the price paid by him is the second-highest bid). Since there will be few Ad places Google should find a way to choose few from these advertisers. ( For example, if you search in google, in your right hand side at the most google shows 8 Ads. So here we have only 8 Ad places or slots )

Whatever mechanism Google chooses,it should not have any side effects. For example, lets assume that Google chooses the advertisers based on the bid amount. What is the problem with this approach ?. Lets see whether it achieves our objectives.
  1. Users should get relevant Ads: This objective may or may not be achieved and it depends on honesty of advertisers. For example, when a user is searching for BMW cars, he should not get advertisements about Ferrari, Toyota cars. This is possible if Ferrari and Toyota bids for BMW keyword. So advertisers may lie and Google should consider it. Second reason why advertisers can lie is, remember they pay only on number of clicks, so advertisers may bid for many irrelevant terms for high price. They do this to increase the brand awareness. Since they are bidding high, their Ads will always show up and users will not click these Ads because these are irrelevant Ads and eventually advertisers will not pay money to Google. The Advantage advertisers get is they show their Ads on lot of pages by paying little money. If Google is not showing relevant Ads, as discussed, it will increase the Ad revenue.
  2. Advertisers should get users to their page: Again this may or may not be achieved for the same above reasons.
  3. Google should maximize its revenue: At the first glance it appears that google got maximum revenue but its not true. Since it may not be showing relevant Ads, it increases the Ad Blindness and looses its revenue. It may be good for short term and definitely not good for long term. Yahoo used the above approach for long time and eBay exploited it. (I got this info from Prabhakar Raghavan's Talk)
Actually Google uses combination of Quality score and Bid amount. Quality score is again combination of CTR (Click through ratio and it is ratio of number of clicks of Ad to number of times they showed the ad), Relevance and Landing page quality. See below YouTube video to know more about Google Ad auction. By using Quality score google eliminates the above problems and still making lot of money :) .

In economics Mechanism design is the study of designing rules of a game or system to achieve a specific outcome, even though each agent may be self-interested. These systems are formally analyzed using Game theory.


References.

Introduction to the Google Ad Auction ( http://www.youtube.com/watch?v=K7l0a2PVhPQ )
http://adsense.blogspot.com/
Example of how bittorrent client uses mechanism design to improve file sharing. http://torrentfreak.com/bittorrent-to-be-pimped-by-nobel-prize-071019/
http://en.wikipedia.org/wiki/Economics
http://en.wikipedia.org/wiki/Resource_Allocation
http://en.wikipedia.org/wiki/Supply_%26_Demand
http://en.wikipedia.org/wiki/Ad_blindness
http://en.wikipedia.org/wiki/Vickrey_auction
http://en.wikipedia.org/wiki/Mechanism_design
http://en.wikipedia.org/wiki/Game_theory
http://plato.stanford.edu/entries/game-theory/

Slides

7 April, 2009

Appsd

As we know, one of the great innovations of IPhone is Appstore . It really made finding and installing apps easier. As of now there are more than 30,000 (Its 30444 while i am writing this article) iphone apps available in appstore. Installing an app is just a click away. Problem is finding apps. They solved this problem partially, by grouping the apps into categories or you can search for an app (IPhone search is getting improved greatly. It is definitely far better than couple of months back.) Both of these have some difficult. Lets say, some how you find an app and before buying the app you wanted to compare the similar apps. Now there is no straight forward way to do it. With appsd we tried to solve this problem.

The problem I faced frequently is, whenever I wanted to buy an app, I wanted to know the feedback and alternative applications to try out. Also whether there is any free version of that app available and other such information. I can read the comments for feedback but i find misleading many times.

Appsd tries to solve these problems. From last couple of months i have been working on the this project. (I work mostly an hour or two hours on this. That's the maximum time i can devote :) ) Appsd is a tool that lets you find related iphone applications and related articles. It achieves this by grouping all the similar applications together. The similarity between the apps is found by comparing the title and description. (designed as vector space model)

Related articles are found by using Yahoo boss API. We do normal search for that application and then we filter the results based on website. We maintain list of 45 websites and we show reveiws only from these websites. We do filter some of the results from these sites if we think they are not appropriate.

Site is hosted in Google Appengine. The main drawbacks is search. Appengine doesn't have sophisticated support for search. Now only games are present. I will upload rest of the apps in a week time. The name appsd came from apps + deamon.


http://www.appsd.com/images/defaultpic.gif


References

http://www.amazon.com/Programming-Collective-Intelligence-Building-Applications/dp/0596529325 (Excellent book)
http://numpy.scipy.org/ (It can easily increase python speed by 7x)
http://code.google.com/p/scipy-cluster/
http://jquery.com/
http://www.blueprintcss.org/
http://plugins.jquery.com/project/corners
http://zooie.wordpress.com/2009/01/15/twitter-boss-real-time-search/
http://developer.yahoo.com/search/boss/boss_guide/Web_Search.html
http://www.anujgakhar.com/2009/03/16/using-jquery-to-load-alternate-images

13 February, 2009

1234567890 .....

Today unix epoch time is going to cross 1234567890 !!. Track it in http://www.coolepochcountdown.com/

24 April, 2008

Facebook Chat

Today Facebook released chat that looks similar to Gmail. I have written an article about how chat works in Gmail. Facebook chat is also based on HTTP but they are not using any hacks like Gmail chat. Here is how it works.

Check it out my previous article on why chat in HTTP is challenging. Facebook's approach for this straight forward. They always opens one connection to Facebook server and when ever you receive a chat message from your friend they show it in chat window or in case of timeout they open one more connection immediately and waits for response.

Client A -----> Server -----> Client B

Fig 1.

For example, suppose A sends a message to B. Since A can't direct sent the message to B it has to post that message to Facebook Server and which inturn should send the same to B. Here Both clients A and B opens a connection (This is an Ajax request) to Server and waits for Server response. Server will respond only if it has any outstanding message pending to that client. (In our example message from A to B). Since there is already a connection B opens to server it will immediately get the message A had sent. For instance if there is no message, the opened connection will be timed out (this is 300sec for Firefox. You can change the default value by changing network.http.keep-alive.timeout ) and Facebook again opens a new connection (Ajax request) and above process repeats.

The URL that Facebook opens to Server will looks in follaowing format.

http://0.channel10.facebook.com/x//true/p_=803&

Check it out my previous article. How Gmail Works !.

21 April, 2008

World's Most Innovative Companies

BusinessWeek released World's Most Innovative Companies list and interestingly two Indian companies Tata and Reliance Industries are there. Tata is in 6th position and Reliance Industries is in 19th position.

20 April, 2008

How Gmail Works !

This article is not about how Gmail works exactly but i find some features of Gmail interesting and i tried to understand how they work.

Gmail Chat.

Gmail nicely integrated the chat functionality in browser itself. Technically its challenging to implement it. At a first glance we will not understand why it is ?. Lets see why this is tricky.

In HTTP, client initiates connection to the server and server sends the data client had requested. Its pull mechanism i.e client always initiates the connection and server responds with appropriate data. In case of chatting the communication is bidirectional. For example, lets assume Client A and Client B are chatting in Gmail. If Client A sends a message to Client B, Since Client A can't connect to Client B directly (because both are HTTP clients) it goes via intermediate Google servers. So the flow is,

Client A -----> Server -----> Client B

Fig 1.

To happen above communication, Server have to initiate connection to B. Here is where the problem lies. How can server initiate a connection to client ?.

HTTP 1.1 introduced chunked tranfer encoding. This is usually used to transfer a file of unknown length or more specifically stream data to client from server (Remember this is after the client initiates the connection). Gmail chat nicely exploits this feature. What they do is, they put a hidden iframe (Ex. Client B) and sets its source URL to Google server. Whenever data is available at the server end (In our example Client A had sent the data to server which it is supposed to transfer to B) it sends the data to client (Client B) . Since this connection will never end the server can send the updates (Whenever it receives a message from Client A ) to client (Client B).

Following is the HTTP Request and Response captured using Wireshark that is part of the chat conversation.

HTTP Request
GET /mail/channel/bind?at=xn3j322i8pev1stfx4irewnfnlbt1l&VER=6&it=141&RID=rpc&SID=14AEDGD25D064E14&CI=0&AID=60&TYPE=xmlhttp&zx=en6uu1u8grtt&t=1 HTTP/1.1
Host: mail.google.com
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.12) Gecko/20080201 Firefox/2.0.0.12
Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5
Accept-Language: en-us,en;q=0.5
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 300
Connection: close
Referer: http://mail.google.com/mail/?ui=2&view=js&name=js&ids=vsemtnrembkr
Cookie:

HTTP Response
HTTP/1.1 200 OK
Cache-control: no-cache
Pragma: no-cache
Content-Type: text/plain; charset=utf-8
ETag:
Transfer-Encoding: chunked
Server: GFE/1.3
Date: Thu, 03 Apr 2008 03:39:58 GMT
Connection: Close

15

18
[[61,["noop"]
]
]

There are several server push technologies available. Detail explanation about this can be found in the following articles.

http://cometdaily.com/2007/12/11/the-future-of-comet-part-1-comet-today/
http://cometdaily.com/2008/02/22/comet-and-http-pipelining/
http://cometdaily.com/2007/12/18/latency-long-polling-vs-forever-frame/
http://en.wikipedia.org/wiki/Chunked_transfer_encoding
http://en.wikipedia.org/wiki/HTTP

File upload.

They use again hidden iframe hack. Check this blog for more information.

Mail Threading.

I initially thought it is a bit complex algorithm. Why i thought was, when some one sends reply from different email server like yahoo how they say it is reply to my previous mail. Although it seems obvious if we see the SMTP headers you will not find any reference to previous mail. But it turns out to be very dumb algorithm. I mean it is mostly based on the subject of the mail. If you send a mail to A and if he replied with different subject Gmail will not show it as a part of the same conversation. If A sends a completely different mail (not to reply to your mail) and if he uses a subject that is similar to one you send to him earlier, gmail shows it as a part of the previous mail conversion.

Apart from Gmail implementation. If you want to group the mail archive thread wise i.e like Gmail style conversation you can refer the Threading algorithm which was used in Netscape mail.