Thursday, 18 June 2015

DIFFERENCES BETWEEN DISTRIBUTED COMPUTING AND HADOOP


1) Distributed computing: In grid computing, data moves towards the business logic, i.e. data stored in a storage area network (SAN) is pulled across the network to the node that holds the business logic to process it. If the volume of data is very large, that node sits idle for a long time until it receives all the data, and if it fails while processing, the whole time-consuming retrieval process has to start again.

   Hadoop: Hadoop has a concept called Data Locality: the business logic moves towards the data. If a particular DataNode fails during processing, a copy of its data is available on some other node in the cluster, and the same business logic operates on that copy, which results in proper network optimization.

2) Distributed computing: The programmer has to handle the data flow along with the data analysis, for example through socket programming.

   Hadoop: The programmer needs to take care of only the data analysis; the data flow is handled by the Hadoop framework (see the WordCount sketch after this comparison).

3) Distributed computing: The programmer has to write the code for handling node failure.

   Hadoop: The programmer need not handle node failure; the NameNode, i.e. the master node, handles it.

4) Distributed computing: When a node completes its computational task, it returns the output to the master node but does not keep a copy of that output in its LFS (local file system). If the output is lost in transit, the node has to redo the task for the same request.

   Hadoop: Hadoop has a concept called Data Localization: every DataNode keeps a copy of the output it emits in its LFS, so even if the output is lost in transit, the same output can be sent again for the same request.
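The second and third differences are visible directly in code. Below is a minimal sketch of the classic WordCount job written against the standard Hadoop MapReduce API (org.apache.hadoop.mapreduce). Note what is missing: the programmer writes only the analysis logic (a Mapper and a Reducer), while the framework ships that logic to the nodes holding the data, shuffles intermediate output between nodes, and re-executes tasks when a node fails. The class names and path arguments are illustrative.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Analysis logic only: Hadoop ships this class to the DataNodes that
  // hold the input blocks (Data Locality) and feeds it one record at a time.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);   // emit (word, 1)
      }
    }
  }

  // The shuffle that moves these pairs across the network to the reducers
  // is done by the framework; no socket programming appears anywhere.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {

    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);   // emit (word, total count)
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));    // input batch
    FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output batch
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}

Packaged into a jar, this runs as, for example, hadoop jar wc.jar WordCount /input /output. Nowhere does the code open a socket or check for failed nodes, which is exactly what differences 2 and 3 above are about.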
                                                                      

To understand the first difference, you can see the images given below:

[Image 1: Distributed Computing]
[Image 2: Hadoop Distributed System]

Thus, Hadoop is an open-source, distributed, batch-processing, and fault-tolerant system used for storing and processing big data.

Batch processing systems execute a series of programs, called jobs, without any human intervention. These systems are termed batch processing because they collect input data in batches, i.e. sets of records, and each batch is treated as a single unit of input data. The output is another batch, which can be used for further computation.
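To make that last point concrete, here is a small sketch (the paths /data/raw, /data/intermediate, and /data/final are hypothetical) of two Hadoop jobs chained so that the output batch of the first job is consumed as the input batch of the second, with no human intervention in between:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BatchPipeline {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // First batch job: consumes the raw input batch. No mapper/reducer is
    // set, so Hadoop's identity defaults pass records through unchanged;
    // a real pipeline would plug in its own analysis classes here.
    Job first = Job.getInstance(conf, "first batch job");
    first.setJarByClass(BatchPipeline.class);
    FileInputFormat.addInputPath(first, new Path("/data/raw"));
    FileOutputFormat.setOutputPath(first, new Path("/data/intermediate"));
    if (!first.waitForCompletion(true)) System.exit(1);

    // Second batch job: the output batch of the first job becomes its
    // input batch, i.e. "another batch used for further computation".
    Job second = Job.getInstance(conf, "second batch job");
    second.setJarByClass(BatchPipeline.class);
    FileInputFormat.addInputPath(second, new Path("/data/intermediate"));
    FileOutputFormat.setOutputPath(second, new Path("/data/final"));
    System.exit(second.waitForCompletion(true) ? 0 : 1);
  }
}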
