Wednesday, 13 May 2015

Logs Generated by Hadoop Daemons

Apache Hadoop’s jobtracker, namenode, secondary namenode, datanode, and tasktracker daemons all generate logs. That includes logs from each daemon under normal operation, as well as job configuration logs, job statistics, standard error, standard out, and internal diagnostic information. Many users aren’t entirely sure how these logs differ, how to analyze them, or even how to handle simple administrative tasks like log rotation. This blog post describes each category of log and then details where to find each one for each Hadoop component.
The log categories are:
·         Hadoop Daemon Logs
These logs are created by the Hadoop daemons and exist on every machine running at least one Hadoop daemon. Some of the files end with .log, and others end with .out. The .out files are only written to when daemons are starting; after the daemons have started successfully, the .out files are truncated. By contrast, all log messages can be found in the .log files, including the daemon start-up messages that are sent to the .out files. There is a .log and a .out file for each daemon running on a machine. When the namenode, jobtracker, and secondary namenode are running on the same machine, there are six daemon log files: a .log and a .out file for each of the three daemons.
The .log and .out file names are constructed as follows:
hadoop-<user-running-hadoop>-<daemon>-<hostname>.log
where <user-running-hadoop> is the user running the Hadoop daemons (this is always ‘hadoop’ with Cloudera’s distribution), <daemon> is the daemon these logs are associated with (for example, namenode or jobtracker), and <hostname> is the hostname of the machine on which the daemons are running.
For example:
hadoop-hadoop-datanode-ip-10-251-30-53.log
By default, the .log files are rotated daily by log4j. This is configurable in /etc/hadoop/conf/log4j.properties. Administrators of a Hadoop cluster should review these logs regularly for cluster-specific errors and warnings that might indicate a daemon is running incorrectly. Note that the namenode and secondary namenode logs should not be deleted more frequently than fs.checkpoint.period, so that if a secondary namenode edits-log compaction fails, the namenode and secondary namenode logs are still available for diagnostics.
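For reference, the daily rotation is driven by the DailyRollingFileAppender entries in log4j.properties. The snippet below is a minimal sketch of what those entries typically look like; exact names and values can differ between Hadoop versions and distributions, so treat it as illustrative rather than authoritative.

# Daily rolling file appender used for the daemon .log files
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
# Roll over at midnight; a different DatePattern gives hourly or monthly rotation
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n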
These logs grow slowly when the cluster is idle; when jobs are running, they grow very rapidly. Some problems create considerably more log entries than others, while some problems produce only a few infrequent messages. For example, if the jobtracker can’t connect to the namenode, the jobtracker daemon logs explode with the same error repeated over and over (something like “Retrying connecting to namenode [..]”). A large number of log entries does not necessarily mean that there is a problem: you still have to search through these logs to find one.
·         Job Configuration XML
The job configuration XML logs are created by the jobtracker, which writes an .xml file describing the configuration of every job that runs on the cluster. These logs are stored in two places: /var/log/hadoop and /var/log/hadoop/history.
The file names under /var/log/hadoop are constructed as follows:
job_<job_ID>_conf.xml
For example:
job_200908190029_0001_conf.xml
The file names under /var/log/hadoop/history are constructed as follows:
<hostname>_<epoch-of-jobtracker-start>_<job-id>_conf.xml
where <hostname> is the hostname of the machine on which the jobtracker is running, <epoch-of-jobtracker-start> is the number of milliseconds that had elapsed since Unix Epoch when the jobtracker daemon was started, and <job-id> is the job ID. For example:
ec2-72-44-61-184.compute-1.amazonaws.com_1250641772616_job_200908190029_0001_conf.xml
These logs are not rotated. These files may be more interesting to developers than system administrators, because their contents are job-specific. You can clear these logs periodically without affecting Hadoop. However, consider archiving the logs if they are of interest in the job-development process. Make sure you do not move or delete a file that is currently being written to by a running job.
An individual job configuration log is created for each job submitted to the cluster, and each file is roughly the same size from job to job.
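If you want to inspect one of these files programmatically rather than read the raw XML, Hadoop’s own Configuration class can load it. The sketch below is only an illustration: the file path is hypothetical, and the properties queried are common MR1 settings that may or may not be present in a given job’s configuration.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;

public class PrintJobConf {
    public static void main(String[] args) {
        // Start from an empty Configuration so only the job's own settings are loaded
        Configuration conf = new Configuration(false);
        // Hypothetical local copy of a job configuration XML taken from the jobtracker
        conf.addResource(new Path("/tmp/job_200908190029_0001_conf.xml"));

        System.out.println("Input dir:    " + conf.get("mapred.input.dir"));
        System.out.println("Output dir:   " + conf.get("mapred.output.dir"));
        System.out.println("Reduce tasks: " + conf.get("mapred.reduce.tasks"));
    }
}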
·         Job Statistics
These logs are created by the jobtracker. The jobtracker writes runtime statistics from jobs to these files. Those statistics include task attempts, time spent shuffling, input splits given to task attempts, start times of task attempts, and other information.
The statistics files are named:
<hostname>_<epoch-of-jobtracker-start>_<job-id>_<job-name>
where <hostname> is the hostname of the machine creating these logs, <epoch-of-jobtracker-start> is the number of milliseconds that had elapsed since Unix Epoch when the jobtracker daemon was started, <job-id> is the job ID, and <job-name> is the name of the job.
For example:
ec2-72-44-61-184.compute-1.amazonaws.com_1250641772616_job_200908190029_0002_hadoop_test-mini-mr
These logs are not rotated.  You can clear these logs periodically without affecting Hadoop. However, consider archiving the logs if they are of interest in the job development process. Make sure you do not move or delete a file that is being written to by a running job.
Individual statistics logs are created for each job that is submitted to the cluster. The size of each log file varies. Jobs with more tasks produce larger files.
·         Standard Error for a particular task attempt
These logs are created by each tasktracker. They contain the output written to standard error (stderr) while a task attempt runs, which makes them useful for debugging. For example, a developer can include System.err.println("some useful information") calls in the job code, and the output will appear in the standard error files.
The parent directory name for these logs is constructed as follows:
/var/log/hadoop/userlogs/attempt_<job-id>_<map-or-reduce>_<attempt-id>
where <job-id> is the ID of the job that this attempt is doing work for, <map-or-reduce> is either “m” if the task attempt was a mapper, or “r” if the task attempt was a reducer, and <attempt-id> is the ID of the task attempt.
For example:
/var/log/hadoop/userlogs/attempt_200908190029_0001_m_000001_0
These logs are rotated according to the mapred.userlog.retain.hours property. You can clear these logs periodically without affecting Hadoop. However, consider archiving the logs if they are of interest in the job development process. Make sure you do not move or delete a file that is being written to by a running job.
The size of these log files depends on how much the job code writes to stderr.
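If the default retention window for mapred.userlog.retain.hours (typically 24 hours) doesn’t suit your cluster, the property can be overridden in mapred-site.xml. The value below is only an example, not a recommendation:

<!-- In /etc/hadoop/conf/mapred-site.xml: keep task attempt logs for 72 hours -->
<property>
  <name>mapred.userlog.retain.hours</name>
  <value>72</value>
</property>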
·         Standard Out for a particular task attempt
These logs are very similar to the standard error logs, except they capture stdout instead of stderr. File size depends entirely on how much data the task writes to stdout.
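To make this concrete, here is a minimal, purely illustrative mapper (written against the org.apache.hadoop.mapreduce API) that writes to both streams; anything it prints ends up in the stderr and stdout files of its task attempt directory.

import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class DebugMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Goes to the task attempt's stdout file
        System.out.println("Processing record at offset " + key.get());
        // Goes to the task attempt's stderr file
        System.err.println("Record length: " + value.getLength());
        context.write(value, new LongWritable(1));
    }
}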
·         log4j informational messages from within the task process
These logs contain anything that log4j writes from within the task process, which includes some of Hadoop’s internal diagnostic information. If the job’s mapper or reducer implementations include calls such as LOG.info(), that output also gets written here. Messages can include information about the task, such as how big its record buffer was or how many reduce tasks there are. The size of these log files depends on the number of log4j calls made by the job code.
These logs are rotated according to the mapred.userlog.retain.hours property. You can clear these logs periodically without affecting Hadoop. However, consider archiving the logs if they are of interest in the job development process. Make sure you do not move or delete a file that is being written to by a running job.
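And this is the kind of logging call whose output lands in the syslog file of a task attempt. The sketch uses Apache Commons Logging, the same logging facade Hadoop itself uses; the reducer and its messages are made up purely for illustration.

import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class CountReducer extends Reducer<Text, LongWritable, Text, LongWritable> {
    private static final Log LOG = LogFactory.getLog(CountReducer.class);

    @Override
    protected void reduce(Text key, Iterable<LongWritable> values, Context context)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable v : values) {
            sum += v.get();
        }
        // Written by log4j into the task attempt's syslog file
        LOG.info("Key " + key + " occurred " + sum + " times");
        context.write(key, new LongWritable(sum));
    }
}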
When Hadoop is installed from CDH RPM or DEB packages, the Hadoop components produce the following logs:
·         On the jobtracker:
/var/log/hadoop
              /hadoop-* => daemon logs
              /job_*.xml => job configuration XML logs
              /history
                     /*_conf.xml => job configuration logs
                     <everything else> => job statistics logs
·         On the namenode:
/var/log/hadoop
              /hadoop-* => daemon logs
·         On the secondary namenode:
/var/log/hadoop
              /hadoop-* => daemon logs
·         On the datanode:
/var/log/hadoop
              /hadoop-* => daemon logs
·         On the tasktracker:
/var/log/hadoop
              /hadoop-* => daemon logs
              /userlogs
                      /attempt_*
                               /stderr => standard error logs
                               /stdout => standard out logs
                               /syslog => log4j logs

It’s probably clear now that Hadoop generates plenty of logs, each with a different purpose and audience.  I hope this post will help you debug your MapReduce jobs and Hadoop cluster.
