The Google File System (GFS) was developed by Google to meet the rapidly growing demands of Google's data processing needs. The Hadoop Distributed File System (HDFS), originally developed at Yahoo! and now maintained by Apache, is an open-source distributed file system intended to serve many different clients with different needs. Although GFS and HDFS were developed by different vendors, they were designed to meet the same broad objectives: they should run on inexpensive commodity hardware and tolerate component failures, handle large files efficiently, be scalable, reliable, and high-performing, and support large streaming reads as well as large concurrent appends to the same file.

The common and distinctive features of GFS and HDFS are as follows. In GFS, the contents of a file are divided into 64 MB chunks, and each chunk is further divided into 64 KB blocks. A chunk is identified by a globally unique 64-bit identifier called the chunk handle, and each chunk is replicated three times by default; each 64 KB block within a chunk is protected by a 32-bit checksum. In HDFS, the contents of a file are divided into 128 MB blocks. Each block replica is stored on a datanode as two local files, one holding the data itself and the other holding the block's checksums and generation stamp, while the namenode keeps the file system namespace and the mapping of blocks to datanodes.

For a read operation, the GFS client sends its request to the master; the master returns the chunk handle and the locations of the replicas, and the client uses this information to fetch the requested data directly from one of the replicas. For a write operation, the replicas of a chunk are divided into a primary and secondaries ...... [middle of paper] ......

GFS and HDFS both provide a snapshot feature, which makes it possible to create a copy of a file or directory tree almost instantaneously at any time; this is similar to the copy-on-write functionality of the Andrew File System (AFS). Looking ahead, GFS aims to reduce real-time operations to large batch operations, while HDFS aims to provide a true secondary namenode in the form of Facebook's AvatarNode.

REFERENCES
1. Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler, "The Hadoop Distributed File System," http://storageconference.org/2010/Papers/MSST/Shvachko.pdf
2. Sanjay Ghemawat, Howard Gobioff, and Shun-Tak Leung, "The Google File System," http://static.googleusercontent.com/external_content/untrusted_dlcp/research.google.com/en/us/archive/gfs-sosp2003.pdf
3. Dhruba Borthakur, "The Hadoop Distributed File System: Architecture and Design," http://hadoop.apache.org/docs/r0.18.0/hdfs_design.pdf
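To make the chunk and block layout described above concrete, the following short Python sketch works out the arithmetic for the figures quoted in the comparison (64 MB GFS chunks, 64 KB checksummed blocks, 128 MB HDFS blocks, three replicas by default). It is an illustration only, not code from either system; the function names and the example file size are assumptions made here for the example.

```python
# Illustrative sketch (not GFS/HDFS code): how a file of a given size maps onto
# GFS-style 64 MB chunks with 64 KB checksummed blocks, and HDFS-style 128 MB blocks.
import math

GFS_CHUNK_SIZE = 64 * 1024 * 1024      # 64 MB chunk, as described above
GFS_BLOCK_SIZE = 64 * 1024             # 64 KB block, each carrying a 32-bit checksum
HDFS_BLOCK_SIZE = 128 * 1024 * 1024    # 128 MB HDFS block
CHECKSUM_BYTES = 4                     # 32-bit checksum per 64 KB block

def gfs_layout(file_size: int, replication: int = 3) -> dict:
    """Chunk, replica, and checksum counts for a file stored the GFS way."""
    chunks = math.ceil(file_size / GFS_CHUNK_SIZE)
    blocks_per_chunk = GFS_CHUNK_SIZE // GFS_BLOCK_SIZE     # 1024 blocks per full chunk
    return {
        "chunks": chunks,
        "chunk_replicas": chunks * replication,             # 3 replicas by default
        "checksum_bytes_per_chunk": blocks_per_chunk * CHECKSUM_BYTES,  # 4 KB per chunk
    }

def hdfs_layout(file_size: int, replication: int = 3) -> dict:
    """Block and replica counts for a file stored the HDFS way."""
    blocks = math.ceil(file_size / HDFS_BLOCK_SIZE)
    return {"blocks": blocks, "block_replicas": blocks * replication}

if __name__ == "__main__":
    one_gb = 1024 * 1024 * 1024
    print(gfs_layout(one_gb))   # {'chunks': 16, 'chunk_replicas': 48, 'checksum_bytes_per_chunk': 4096}
    print(hdfs_layout(one_gb))  # {'blocks': 8, 'block_replicas': 24}
```

A 1 GB file therefore becomes 16 GFS chunks (48 chunk replicas) with 4 KB of checksum data per chunk, but only 8 HDFS blocks (24 block replicas), which is one way to see the effect of the larger HDFS block size.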
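The read path summarized in the comparison can also be sketched in a few lines of Python. The Master, ChunkInfo, and chunkserver objects below are hypothetical stand-ins, not real GFS interfaces; the sketch only shows the flow described above: the client maps a byte offset to a chunk index, asks the master once for the chunk handle and replica locations, and then reads the bytes directly from a replica rather than through the master.

```python
# Minimal sketch of the GFS read path described above, assuming toy in-memory
# stand-ins for the master and the chunkservers (not real GFS interfaces).
from dataclasses import dataclass
from typing import Dict, List, Tuple

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB

@dataclass
class ChunkInfo:
    handle: int                     # the chunk handle returned by the master
    replica_locations: List[str]    # chunkserver addresses holding replicas

class Master:
    """Holds only metadata: (file name, chunk index) -> chunk handle + replica locations."""
    def __init__(self, table: Dict[Tuple[str, int], ChunkInfo]):
        self.table = table
    def lookup(self, filename: str, chunk_index: int) -> ChunkInfo:
        return self.table[(filename, chunk_index)]

def gfs_read(master: Master, chunkservers: dict, filename: str, offset: int, length: int) -> bytes:
    # 1. Translate the byte offset into a chunk index.
    chunk_index = offset // CHUNK_SIZE
    # 2. Ask the master for the chunk handle and replica locations (metadata only).
    info = master.lookup(filename, chunk_index)
    # 3. Fetch the data directly from one of the replicas, not through the master.
    replica = chunkservers[info.replica_locations[0]]
    start = offset % CHUNK_SIZE
    return replica[info.handle][start:start + length]

# Toy usage: one chunk, three replicas holding the same bytes.
master = Master({("/logs/web.log", 0): ChunkInfo(handle=42, replica_locations=["cs1", "cs2", "cs3"])})
chunkservers = {name: {42: b"hello, gfs"} for name in ("cs1", "cs2", "cs3")}
print(gfs_read(master, chunkservers, "/logs/web.log", offset=0, length=5))  # b'hello'
```

The design point the sketch captures is that the master is consulted only for metadata, so bulk data traffic flows between the client and the chunkservers; the write path with its primary and secondary replicas follows the same pattern but adds ordering through the primary.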