Thursday, 18 June 2015

DIFFERENCES BETWEEN DISTRIBUTED COMPUTING AND HADOOP


1) Distributed computing: In grid computing, data moves towards the business logic, i.e. data stored in a storage area network (SAN) is pulled across the network to the node that holds the business logic to process it. If the volume of data is very large, that node sits idle for a long time until it receives all the data, and if it fails while processing, the whole time-consuming retrieval process has to start again.

   Hadoop: Hadoop has a concept called Data Locality: the business logic moves towards the data. If a particular DataNode fails during processing, a copy of its data is available on some other node in the cluster, and the same business logic operates on that copy, which results in proper network optimization.

2) Distributed computing: The programmer has to handle the data flow along with the data analysis, for example through socket programming.

   Hadoop: The programmer needs to take care of only the data analysis; the data flow is handled by the Hadoop framework (see the WordCount sketch after this comparison).

3) Distributed computing: The programmer has to write the code for handling node failure.

   Hadoop: The programmer need not handle node failure; the NameNode, i.e. the master node, handles it.

4) Distributed computing: When a node completes its computational task, it returns the output to the master node but does not keep a copy of that output in its LFS (local file system). If the output is lost in transit, the node has to redo the task for the same request.

   Hadoop: Hadoop has a concept called Data Localization: every DataNode keeps a copy of the output it emits in its LFS, so even if the output is lost in transit, the same output can be sent again for the same request.
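The second and third differences are visible directly in code. Below is a minimal sketch of the classic WordCount job written against the standard Hadoop MapReduce API (org.apache.hadoop.mapreduce). Note what is missing: the programmer writes only the analysis logic (a Mapper and a Reducer), while the framework ships that logic to the nodes holding the data, shuffles intermediate output between nodes, and re-executes tasks when a node fails. The class names and path arguments are illustrative.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Analysis logic only: Hadoop ships this class to the DataNodes that
  // hold the input blocks (Data Locality) and feeds it one record at a time.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);   // emit (word, 1)
      }
    }
  }

  // The shuffle that moves these pairs across the network to the reducers
  // is done by the framework; no socket programming appears anywhere.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);   // emit (word, total count)
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input batch
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output batch
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a jar, this runs as, for example, hadoop jar wc.jar WordCount /input /output. Nowhere does the code open a socket or check for failed nodes, which is exactly what differences 2 and 3 above are about.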
                                                                      

To understand the first difference, you can see the images given below:

[Image 1: Distributed Computing]
[Image 2: Hadoop Distributed System]

Thus, Hadoop is an open-source, distributed, batch-processing, and fault-tolerant system used for storing and processing big data.

Batch processing systems execute a series of programs, called jobs, without any human intervention. These systems are termed batch processing because they collect input data in batches, i.e. sets of records, and each batch is treated as a single unit of input data. The output is another batch, which can be used for further computation.
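To make that last point concrete, here is a small sketch (the paths /data/raw, /data/intermediate, and /data/final are hypothetical) of two Hadoop jobs chained so that the output batch of the first job is consumed as the input batch of the second, with no human intervention in between:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BatchPipeline {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // First batch job: consumes the raw input batch. No mapper/reducer is
    // set, so Hadoop's identity defaults pass records through unchanged;
    // a real pipeline would plug in its own analysis classes here.
    Job first = Job.getInstance(conf, "first batch job");
    first.setJarByClass(BatchPipeline.class);
    FileInputFormat.addInputPath(first, new Path("/data/raw"));
    FileOutputFormat.setOutputPath(first, new Path("/data/intermediate"));
    if (!first.waitForCompletion(true)) System.exit(1);

    // Second batch job: the output batch of the first job becomes its
    // input batch, i.e. "another batch used for further computation".
    Job second = Job.getInstance(conf, "second batch job");
    second.setJarByClass(BatchPipeline.class);
    FileInputFormat.addInputPath(second, new Path("/data/intermediate"));
    FileOutputFormat.setOutputPath(second, new Path("/data/final"));
    System.exit(second.waitForCompletion(true) ? 0 : 1);
  }
}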
