Apache Hadoop’s jobtracker, namenode, secondary namenode,
datanode, and tasktracker all generate logs. That includes logs from each of
the daemons under normal operation, as well as configuration logs, statistics,
standard error, standard out, and internal diagnostic information. Many
users aren’t entirely sure what the differences are among these logs, how to
analyze them, or even how to handle simple administrative tasks like log
rotation. This blog post describes each category of log, and then details
where they can be found for each Hadoop component.
The log categories are:
· Hadoop Daemon Logs
These logs are created by the Hadoop daemons, and exist on all machines running at least one Hadoop daemon. Some of the files end with .log, and others end with .out. The .out files are only written to when daemons are starting. After daemons have started successfully, the .out files are truncated. By contrast, all log messages can be found in the .log files, including the daemon start-up messages that are sent to the .out files. There is a .log and a .out file for each daemon running on a machine. When the namenode, jobtracker, and secondary namenode are running on the same machine, there are six daemon log files: a .log and a .out for each of the three daemons.
The
.log and .out file names are constructed as follows:
hadoop-<user-running-hadoop>-<daemon>-<hostname>.log
where <user-running-hadoop> is the user running the Hadoop daemons (this is always ‘hadoop’ with Cloudera’s distribution), <daemon> is the daemon these logs are associated with (for example, namenode or jobtracker), and <hostname> is the hostname of the machine on which the daemons are running.
For
example:
hadoop-hadoop-datanode-ip-10-251-30-53.log
By default, the .log files are rotated daily by log4j. This is configurable in /etc/hadoop/conf/log4j.properties. Administrators of a Hadoop cluster should review these logs regularly for cluster-specific errors and warnings that indicate a daemon is not running correctly. Note that the namenode and secondary namenode logs should be retained for at least fs.checkpoint.period, so that if a secondary namenode edits-log compaction fails, the namenode and secondary namenode logs are still available for diagnosis.
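For reference, the daily rotation is handled by log4j’s DailyRollingFileAppender. In a stock Hadoop log4j.properties the relevant section looks roughly like the following sketch; treat it as illustrative rather than authoritative, since property names and defaults can differ slightly between versions:

# Daily Rolling File Appender used for the daemon .log files
log4j.appender.DRFA=org.apache.log4j.DailyRollingFileAppender
log4j.appender.DRFA.File=${hadoop.log.dir}/${hadoop.log.file}
# Roll the log over at midnight
log4j.appender.DRFA.DatePattern=.yyyy-MM-dd
log4j.appender.DRFA.layout=org.apache.log4j.PatternLayout
log4j.appender.DRFA.layout.ConversionPattern=%d{ISO8601} %p %c: %m%n

Changing the DatePattern (or swapping in a different appender) changes how and when the .log files roll.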
These logs grow slowly when the cluster is idle. When jobs are running, they grow very rapidly. Some problems create considerably more log entries than others; some produce only a few infrequent messages. For example, if the jobtracker can’t connect to the namenode, the jobtracker daemon logs explode with the same error repeated over and over (something like “Retrying connecting to namenode [..]”). A large number of log entries does not necessarily mean there is a problem: you have to search through these logs to find one.
· Job Configuration XML
The
job configuration XML logs are created by the jobtracker. The jobtracker
creates a .xml file for every job that runs on the cluster. These
logs are stored in two
places: /var/log/hadoop and /var/log/hadoop/history. The XML file
describes the job configuration.
The /var/log/hadoop file names are constructed as follows:
job_<job_ID>_conf.xml
For
example:
job_200908190029_0001_conf.xml
The /var/log/hadoop/history file names are constructed as follows:
<hostname>_<epoch-of-jobtracker-start>_<job-id>_conf.xml
where
<hostname> is the hostname of the machine on which the jobtracker is
running, <epoch-of-jobtracker-start> is the number of milliseconds that
had elapsed since Unix Epoch when the jobtracker daemon was started, and
<job-id> is the job ID. For example:
ec2-72-44-61-184.compute-1.amazonaws.com_1250641772616_job_200908190029_0001_conf.xml
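For context, each of these files is simply the job’s Hadoop configuration dumped as XML: one <property> element per configuration key the job ran with. An abridged, illustrative sketch (the values shown here are made up):

<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property>
    <name>mapred.job.name</name>
    <value>test-mini-mr</value>
  </property>
  <property>
    <name>mapred.reduce.tasks</name>
    <value>1</value>
  </property>
  <!-- ...many more property elements, one per configuration key... -->
</configuration>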
These
logs are not rotated. These files may be more interesting to developers than
system administrators, because their contents are job-specific. You can clear
these logs periodically without affecting Hadoop. However, consider archiving
the logs if they are of interest in the job-development process. Make sure you
do not move or delete a file that is currently being written to by a running
job.
Individual
job configuration logs are created for each job that is submitted to the
cluster. Each log file will be more or less the same size for each job.
· Job Statistics
These logs are created by the jobtracker, which writes runtime statistics from jobs to these files. Those statistics include task attempts, time spent shuffling, input splits given to task attempts, start times of task attempts, and other information.
The
statistics files are named:
<hostname>_<epoch-of-jobtracker-start>_<job-id>_<job-name>
where
<hostname> is the hostname of the machine creating these logs,
<epoch-of-jobtracker-start> is the number of milliseconds that had
elapsed since Unix Epoch when the jobtracker daemon was started, <job-id>
is the job ID, and <job-name> is the name of the job.
For
example:
ec2-72-44-61-184.compute-1.amazonaws.com_1250641772616_job_200908190029_0002_hadoop_test-mini-mr
These
logs are not rotated. You can clear these logs periodically without
affecting Hadoop. However, consider archiving the logs if they are of interest
in the job development process. Make sure you do not move or delete a file that
is being written to by a running job.
Individual
statistics logs are created for each job that is submitted to the cluster. The
size of each log file varies. Jobs with more tasks produce larger files.
· Standard Error for a particular task attempt
These
logs are created by each tasktracker. They contain information written to
standard error (stderr) captured when a task attempt is run. These logs can be
used for debugging. For example, a developer can
include System.err.println("some useful information") calls in the
job code. The output will appear in the standard error files.
The
parent directory name for these logs is constructed as follows:
/var/log/hadoop/userlogs/attempt_<job-id>_<map-or-reduce>_<attempt-id>
where
<job-id> is the ID of the job that this attempt is doing work for,
<map-or-reduce> is either “m” if the task attempt was a mapper, or “r” if
the task attempt was a reducer, and <attempt-id> is the ID of the task
attempt.
For
example:
/var/log/hadoop/userlogs/attempt_200908190029_0001_m_000001_0
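As a concrete illustration, here is a minimal sketch of a mapper (the class name DebuggingMapper is hypothetical, and this assumes the 0.20 org.apache.hadoop.mapreduce API) whose stderr output lands in the attempt directory shown above:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class DebuggingMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
  private static final IntWritable ONE = new IntWritable(1);

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    // Captured by the tasktracker in .../userlogs/attempt_*/stderr
    System.err.println("mapping record at byte offset " + key.get());
    // Anything printed to System.out would land in .../userlogs/attempt_*/stdout instead
    context.write(value, ONE);
  }
}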
These
logs are rotated according to the mapred.userlog.retain.hours property. You can
clear these logs periodically without affecting Hadoop. However, consider
archiving the logs if they are of interest in the job development process. Make
sure you do not move or delete a file that is being written to by a running
job.
The size of these log files depends on how much the job code writes to stderr.
· Standard Out for a particular task attempt
These
logs are very similar to the standard error logs, except they capture stdout
instead of stderr. File size depends entirely on how much data the task writes
to stdout.
· log4j informational messages from within the task process
These logs contain anything that log4j writes from within the task process. This includes some of Hadoop’s internal diagnostic information. If the job’s mapper or reducer implementations include calls such as LOG.info(), that output also gets written here. Messages can include information about the task, such as how big its record buffer was or how many reduce tasks there are. The size of these log files depends on the number of log4j calls used in the job code.
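As a sketch (the class name LoggingReducer is hypothetical; this assumes the 0.20 org.apache.hadoop.mapreduce API and commons-logging, which Hadoop itself logs through), this is the kind of reducer-side logging that ends up in the syslog file:

import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class LoggingReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
  private static final Log LOG = LogFactory.getLog(LoggingReducer.class);

  @Override
  protected void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    int sum = 0;
    for (IntWritable v : values) {
      sum += v.get();
    }
    // Routed through log4j to .../userlogs/attempt_*_r_*/syslog,
    // alongside Hadoop's own task-level messages
    LOG.info("key " + key + " occurred " + sum + " times");
    context.write(key, new IntWritable(sum));
  }
}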
These
logs are rotated according to the mapred.userlog.retain.hours property.
You can clear these logs periodically without affecting Hadoop. However,
consider archiving the logs if they are of interest in the job development
process. Make sure you do not move or delete a file that is being written to by
a running job.
When Hadoop is installed from CDH RPM or DEB packages, the Hadoop components produce the following logs:
· On the jobtracker:
/var/log/hadoop
    /hadoop-*             => daemon logs
    /job_*.xml            => job configuration XML logs
    /history
        /*_conf.xml       => job configuration logs
        <everything else> => job statistics logs
· On the namenode:
/var/log/hadoop
    /hadoop-*             => daemon logs
· On the secondary namenode:
/var/log/hadoop
    /hadoop-*             => daemon logs
· On the datanode:
/var/log/hadoop
    /hadoop-*             => daemon logs
· On the tasktracker:
/var/log/hadoop
    /hadoop-*             => daemon logs
    /userlogs
        /attempt_*
            /stderr       => standard error logs
            /stdout       => standard out logs
            /syslog       => log4j logs
It’s probably clear now that Hadoop generates plenty of
logs, each with a different purpose and audience. I hope this post will
help you debug your MapReduce jobs and Hadoop cluster.