Wednesday, 17 June 2015

Hadoop Basic Terms

Before learning about Hadoop, we need to be familiar with some basic terminology:

1) Node: A computational resource that participates in a computational job by performing computational tasks within a network is called a node.

2) Cluster: A group of nodes connected to each other through a common, dedicated network to form a distributed system is called a cluster.

3) Hadoop node: Any computational node that has both the Hadoop Distributed File System (HDFS) and MapReduce (MR) components is called a Hadoop node.

4) Hadoop cluster: A group of Hadoop nodes connected to each other through a common, dedicated network is called a Hadoop cluster.

5) File System: A file system is the underlying structure a computer uses to organize data on a storage device such as a hard disk.
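To make the definitions above concrete, here is a minimal Python sketch that models them as simple types. It is only an illustration under assumptions: the class names `Node` and `Cluster`, the component labels "HDFS" and "MapReduce", and the network name are hypothetical and are not part of any Hadoop API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    """A computational resource that performs tasks within a network."""
    name: str
    # Software components installed on this node, e.g. {"HDFS", "MapReduce"}.
    components: set[str] = field(default_factory=set)

    def is_hadoop_node(self) -> bool:
        # Definition 3: a Hadoop node has both HDFS and MapReduce.
        return {"HDFS", "MapReduce"} <= self.components


@dataclass
class Cluster:
    """A group of nodes joined by a common, dedicated network."""
    network: str
    nodes: list[Node] = field(default_factory=list)

    def is_hadoop_cluster(self) -> bool:
        # Definition 4: a non-empty cluster in which every node is a Hadoop node.
        return bool(self.nodes) and all(n.is_hadoop_node() for n in self.nodes)


hadoop = {"HDFS", "MapReduce"}
cluster = Cluster("private-net", [Node("master", set(hadoop)),
                                  Node("slave1", set(hadoop))])
print(cluster.is_hadoop_cluster())  # True
```

A node with only one of the two components (for example, only HDFS) is still a node, but by the definitions above it is not a Hadoop node.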

Hadoop has two major components:

1) HDFS

2) MAPREDUCE

1) HDFS

HDFS stands for Hadoop Distributed File System. It is a shared file system used by all the slave nodes of a Hadoop cluster, and it is meant for storage.
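HDFS stores a file by splitting it into fixed-size blocks and keeping several copies (replicas) of each block on different data nodes. The Python sketch below illustrates that idea under stated assumptions: it uses the Hadoop 2.x defaults of a 128 MB block size and a replication factor of 3, and it places replicas round-robin, whereas real HDFS uses a rack-aware placement policy. The function names and node names are hypothetical; this is not the HDFS API.

```python
from __future__ import annotations

MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumed: Hadoop 2.x default block size (dfs.blocksize)
REPLICATION = 3        # assumed: default replication factor (dfs.replication)


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the size of each block a file of `file_size` bytes occupies."""
    full, rest = divmod(file_size, block_size)
    # The last block only holds the leftover bytes; it is not padded.
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks: int, data_nodes: list[str],
                   replication: int = REPLICATION) -> dict[int, list[str]]:
    """Put each block on `replication` distinct data nodes, round-robin.

    Simplification: real HDFS uses a rack-aware placement policy.
    """
    if replication > len(data_nodes):
        raise ValueError("replication factor exceeds the number of data nodes")
    n = len(data_nodes)
    return {b: [data_nodes[(b + i) % n] for i in range(replication)]
            for b in range(num_blocks)}


blocks = split_into_blocks(300 * MB)    # a hypothetical 300 MB file
print([size // MB for size in blocks])  # [128, 128, 44]
print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
# {0: ['dn1', 'dn2', 'dn3'], 1: ['dn2', 'dn3', 'dn4'], 2: ['dn3', 'dn4', 'dn1']}
```

Because every block lives on three different data nodes, losing any single slave node still leaves at least two copies of each block available.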

2) MAPREDUCE

This component is used for data analysis, and it is implemented in the Java programming language.
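The classic way to show what MapReduce does is a word count. Real Hadoop jobs are written in Java by extending the `Mapper` and `Reducer` classes in `org.apache.hadoop.mapreduce`; the sketch below instead simulates the same map, shuffle and reduce flow in plain Python on a single machine, so the function names are hypothetical and nothing here is distributed.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    # Mapper: emit (word, 1) for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle/sort: group all values that share the same key, sorted by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    # Reducer: combine all the counts for one word.
    return key, sum(values)


def word_count(lines: list[str]) -> dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())


print(word_count(["Hadoop stores data", "Hadoop analyses data"]))
# {'analyses': 1, 'data': 2, 'hadoop': 2, 'stores': 1}
```

In a real cluster, the map calls run in parallel on the slave nodes that hold the input blocks, and the framework performs the shuffle over the network before the reducers run.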

3) HADOOP CLUSTER


As seen in the cluster diagram above, every computational node, including the master node, has two components:
1) HDFS
2) MAPREDUCE

The main computational job is divided into a number of computational tasks, and each task is handled by a slave node. In a Hadoop cluster, the master node is called the name node and the slave nodes are called data nodes. The name node is connected to all of its data nodes through a common, dedicated network such as a VPN.
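The division of work described above can be sketched in a few lines of Python. This is a simplified, single-process illustration under assumptions: the helper names and data node names are hypothetical, tasks are handed out round-robin, and each task just sums its chunk of numbers. In real Hadoop, task scheduling is handled by the JobTracker (Hadoop 1) or by YARN (Hadoop 2), which tries to run each task on a node that already stores the task's input data.

```python
from __future__ import annotations


def split_job(data: list[int], num_tasks: int) -> list[list[int]]:
    """Divide the job's input into `num_tasks` contiguous, near-equal chunks."""
    size, extra = divmod(len(data), num_tasks)
    chunks, start = [], 0
    for i in range(num_tasks):
        # The first `extra` chunks get one additional element.
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def assign_tasks(num_tasks: int, data_nodes: list[str]) -> list[str]:
    """Master node: hand task i to data node i mod N (round-robin)."""
    return [data_nodes[i % len(data_nodes)] for i in range(num_tasks)]


def run_job(data: list[int], data_nodes: list[str], num_tasks: int) -> int:
    tasks = split_job(data, num_tasks)
    owners = assign_tasks(len(tasks), data_nodes)
    partials = []
    for task_id, (chunk, node) in enumerate(zip(tasks, owners)):
        # Each slave node computes a partial result for its own task.
        result = sum(chunk)
        print(f"task {task_id} on {node}: {chunk} -> {result}")
        partials.append(result)
    # The master combines the partial results into the job's final answer.
    return sum(partials)


print(run_job(list(range(1, 11)), ["dn1", "dn2", "dn3"], num_tasks=4))  # 55
```

With four tasks and three data nodes, task 3 lands on dn1 again, which shows that a slave node can handle more than one task of the same job.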
