Wednesday, August 31, 2011

Above the Clouds: A Berkeley View of Cloud Computing

With all the media buzz about cloud computing, it seems no one really knows what it is about, and this paper aims to fix that. It serves as a great introduction to the cloud computing research space. The paper lists the advantages of cloud computing over simple SaaS, defines terms for the various parts of the cloud computing model, and lays out some preliminary challenges that need to be solved for cloud computing to be truly successful.

The authors do a great job of detailing the economics of cloud computing. The paper surveys the costs, benefits, and pricing models of current cloud providers. It also discusses the problems that come with owning your own datacenter: underprovisioning and overprovisioning. Clear, concrete examples involving rapid peaks and troughs in demand show that cloud computing is a much better model for most web services, thanks to the elasticity of utility computing.
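To make the over/underprovisioning trade-off concrete, here is a small hypothetical sketch. The hourly demand numbers are invented for illustration (they are not from the paper), and the "elastic" case is idealized: it assumes servers can be rented and released instantly, billed by the hour. The function names are my own.

```python
# Hypothetical illustration: fixed (owned) vs. elastic (rented) provisioning.
# Demand numbers are invented for this example, not taken from the paper,
# and the elastic case is idealized (instant scaling, per-hour billing).


def fixed_provisioning(demand, capacity):
    """Own `capacity` servers every hour.

    Returns (server_hours_paid, idle_server_hours, unserved_server_hours).
    """
    paid = capacity * len(demand)
    idle = sum(max(capacity - d, 0) for d in demand)      # overprovisioning waste
    unserved = sum(max(d - capacity, 0) for d in demand)  # underprovisioning loss
    return paid, idle, unserved


def elastic_provisioning(demand):
    """Rent exactly the servers each hour needs: nothing idle, nothing unserved."""
    return sum(demand), 0, 0


# Servers needed per hour over one hypothetical day:
# quiet night, busier day, short evening spike, then tapering off.
demand = [2] * 8 + [5] * 8 + [10] * 4 + [4] * 4

if __name__ == "__main__":
    for label, (paid, idle, unserved) in [
        ("provision for peak (10)", fixed_provisioning(demand, 10)),
        ("provision for typical load (5)", fixed_provisioning(demand, 5)),
        ("elastic", elastic_provisioning(demand)),
    ]:
        print(f"{label:32} paid={paid:3} idle={idle:3} unserved={unserved:3}")
```

With these made-up numbers, provisioning for the peak pays for 240 server-hours to serve 112, while provisioning for typical load pays for 120 and still turns away 20 server-hours of demand. The sketch ignores that a rented server-hour usually costs more than an owned one, which is the trade-off the paper's cost comparison is meant to weigh.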


The paper also provides an equation that is supposed to tell you whether or not you should switch to cloud computing. Although the equation does its job, I don't think it captures one of the key economic reasons why cloud computing is superior: it doesn't show that cloud computing can easily handle spikes in traffic thanks to its elasticity.
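For reference, here is the equation as I understand it from the paper (notation paraphrased). It compares expected profit in the cloud against a self-owned datacenter, where the datacenter's cost is divided by its average utilization to account for paying for idle capacity:

```latex
\text{UserHours}_{\text{cloud}} \times \left(\text{revenue} - \text{Cost}_{\text{cloud}}\right)
\;\ge\;
\text{UserHours}_{\text{datacenter}} \times \left(\text{revenue} - \frac{\text{Cost}_{\text{datacenter}}}{\text{Utilization}}\right)
```

As written, a traffic spike only shows up indirectly on the right-hand side: either as a lower average Utilization (if you provision for the peak) or as fewer UserHours served by the datacenter (if you don't and turn users away).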

In addition, the paper does mention data confidentiality, one of the big reasons many companies refuse to use the cloud, but it more or less dismisses the problem by suggesting that you simply encrypt data before sending it to the cloud you want to run in. However, the data is still exposed at execution time, while the service is running in the cloud. Because their code runs on another company's machines, some of these companies want their data kept secret at runtime as well. Trusted Platform Modules (TPMs) partially solve this problem, but they're not foolproof and they raise the cost of the commodity servers that make up the cloud, so cloud providers may refuse to adopt them.

The paper also suggests using flash for memory and storage to reduce I/O interference on shared commodity machines. However, flash is a relatively new and expensive technology. Because cloud providers have a strong incentive to keep their servers cheap, it's unlikely they will buy flash for every server.

The paper suggests that the way to ensure availability of service is to use multiple cloud providers. However, there aren't many cloud providers, and each one offers a different environment. This makes it hard for developers to write services that span multiple providers, so availability of service is not quite a solved problem.
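The arithmetic behind the multi-provider idea is simple, which is part of why it is appealing. The sketch below is a hypothetical illustration: it assumes provider outages are independent and that the service can fail over instantly between providers. Those are exactly the assumptions that differing provider environments make hard to satisfy. The 99% figure and the function names are mine, not the paper's.

```python
# Hypothetical sketch: availability gained by running on several providers.
# Assumes provider outages are independent and failover is instant, which is
# exactly what differing provider environments make hard in practice.

HOURS_PER_YEAR = 24 * 365


def combined_availability(availabilities):
    """Probability that at least one provider is up at a given moment."""
    p_all_down = 1.0
    for a in availabilities:
        p_all_down *= 1.0 - a  # independence assumption: multiply outage probabilities
    return 1.0 - p_all_down


def downtime_hours_per_year(availability):
    """Expected hours per year during which no provider is available."""
    return (1.0 - availability) * HOURS_PER_YEAR


if __name__ == "__main__":
    for providers in ([0.99], [0.99, 0.99]):
        a = combined_availability(providers)
        print(f"{len(providers)} provider(s): availability={a:.4f}, "
              f"downtime ~ {downtime_hours_per_year(a):.1f} h/year")
```

Under these assumptions, two providers at a hypothetical 99% each would cut expected downtime from about 87.6 hours a year to under one hour. If outages are correlated, or the service can't actually run on the second provider, that benefit shrinks or disappears.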

The paper also puts little emphasis on the legal issues surrounding cloud computing. For example, if I use EC2 instances to provide a SaaS that does people's taxes and EC2 goes down right before tax day, who is liable? Shouldn't I, as the application developer, be able to redirect any lawsuits toward Amazon, since I was not at fault? This becomes an even bigger mess when I'm in a different country from the cloud provider.