Wednesday, December 14, 2011

The Google File System

The Google File System departs from traditional file system design in that it cares more about sustained throughput than about low latency. Its design assumes a workload in which most writes are appends and most reads are sequential; random writes are supported but not optimized. It's actually amazing how far you can get with just these operations. A single master node holds all the metadata and tells clients which chunkserver holds the chunk they want. From then on, the client talks to the chunkserver directly to get the actual data.
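To make the master/chunkserver split concrete, here is a toy, in-memory sketch of that flow in Python. It is not Google's implementation: the class and method names (`Master`, `ChunkServer`, `Client`, `locate`, `allocate_chunk`) are made up for illustration, there is no replication or networking, and the chunk size is 8 bytes instead of the 64 MB the paper uses. It also splits an append across chunk boundaries, which is a simplification of how GFS record append behaves.

```python
CHUNK_SIZE = 8  # toy value for illustration; the GFS paper uses 64 MB chunks


class ChunkServer:
    """Stores the actual bytes, keyed by chunk handle."""

    def __init__(self):
        self.chunks = {}

    def append(self, handle, data):
        self.chunks.setdefault(handle, bytearray()).extend(data)

    def read(self, handle, start, length):
        return bytes(self.chunks[handle][start:start + length])


class Master:
    """Stores metadata only: which chunks make up a file and where they live."""

    def __init__(self, num_servers):
        self.num_servers = num_servers
        self.files = {}  # filename -> list of (chunk handle, server index)
        self.next_handle = 0

    def allocate_chunk(self, filename):
        handle = self.next_handle
        self.next_handle += 1
        # Hypothetical placement policy: spread chunks round-robin.
        location = (handle, handle % self.num_servers)
        self.files.setdefault(filename, []).append(location)
        return location

    def locate(self, filename, chunk_index):
        return self.files[filename][chunk_index]


class Client:
    """Asks the master where a chunk is, then talks to the chunkserver."""

    def __init__(self, master, servers):
        self.master = master
        self.servers = servers
        self.sizes = {}  # filename -> bytes appended so far

    def append(self, filename, data):
        size = self.sizes.get(filename, 0)
        while data:
            if size % CHUNK_SIZE == 0:
                # File is empty or the last chunk is full: get a new chunk.
                handle, idx = self.master.allocate_chunk(filename)
            else:
                handle, idx = self.master.locate(filename, size // CHUNK_SIZE)
            room = CHUNK_SIZE - size % CHUNK_SIZE
            piece, data = data[:room], data[room:]
            self.servers[idx].append(handle, piece)
            size += len(piece)
        self.sizes[filename] = size

    def read(self, filename, offset, length):
        out = bytearray()
        while length > 0:
            # Metadata lookup goes to the master...
            handle, idx = self.master.locate(filename, offset // CHUNK_SIZE)
            start = offset % CHUNK_SIZE
            n = min(length, CHUNK_SIZE - start)
            # ...but the data itself comes straight from the chunkserver.
            out += self.servers[idx].read(handle, start, n)
            offset += n
            length -= n
        return bytes(out)


servers = [ChunkServer(), ChunkServer()]
master = Master(len(servers))
client = Client(master, servers)
client.append("log", b"hello world, this is gfs")
print(client.read("log", 6, 5))
```

The thing to notice is that the master never touches file data. It only answers "where is chunk N of this file?", which is a large part of why a single master can keep up.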

The system is clearly designed for big data, and I think it fits the Google workload very well. For archiving purposes, append-only writes are perfectly fine, and that's the tradeoff Google makes to get a distributed, scalable file system. Given how much GFS has influenced HDFS, it has clearly had an impact on the field, and as more file systems are built to target specific workloads, I expect that impact to last.
