Contributions to this documentation are welcome on GitHub.
The Hadoop 2.6 release introduced a feature that launches Docker containers directly as YARN containers: the Docker Container Executor (DCE). Running a Hadoop job is not easy; users have to solve many problems, such as complex dependency management. DCE lets developers package their applications and all of their dependencies into a Docker container, which provides a consistent execution environment and isolates the job from other applications installed on the host. Because the official reference documentation is brief and omits many details, this tutorial describes DCE configuration in detail and points out some problems to watch for.
Prerequisite
Your Hadoop version must be 2.7.i at minimum. The distribution and version of Linux in your Docker image can be quite different from those of your NodeManager. However, if you are using the MapReduce framework, your Docker image must be configured to run Hadoop: Java must be installed, and the following environment variables must be defined in the image: JAVA_HOME, HADOOP_COMMON_PATH, HADOOP_HDFS_HOME, HADOOP_MAPRED_HOME, HADOOP_YARN_HOME, and HADOOP_CONF_DIR.
Before running DCE, make sure that Hadoop and Docker are installed on your host.
Experimental Environment
You will need 3 machines running Ubuntu Server 14.04, each running a Docker daemon and Hadoop 2.7.3. We use the sequenceiq/hadoop-docker:2.4.1 image developed by SequenceIQ, which already contains all required environment dependencies. Of course, you can also use your own Docker image.
Experimental Procedure
Step 1: Pull the Docker image
Use the following command to pull the image:
docker pull sequenceiq/hadoop-docker:2.4.1
You can then run the following command to inspect the runtime environment inside the Docker container:
docker run -it sequenceiq/hadoop-docker:2.4.1 /etc/bootstrap.sh -bash
If nothing has gone wrong, you should now find the Hadoop directory under /usr/local inside the container.
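To check the environment variables listed in the prerequisites without reading through the container by hand, a small filter like the one below may help. It is only a sketch: `missing_hadoop_vars` is a hypothetical helper name, and it assumes the image exposes its variables to a plain `env` call.

```shell
# Variable names taken from the prerequisite list in this tutorial.
REQUIRED_VARS="JAVA_HOME HADOOP_COMMON_PATH HADOOP_HDFS_HOME HADOOP_MAPRED_HOME HADOOP_YARN_HOME HADOOP_CONF_DIR"

# Hypothetical helper: reads env-style NAME=value lines on stdin and
# prints every required variable that is absent, one per line.
missing_hadoop_vars() {
  env_dump=$(cat)
  for var in $REQUIRED_VARS; do
    printf '%s\n' "$env_dump" | grep -q "^${var}=" || printf '%s\n' "$var"
  done
}

# Example (needs Docker and the pulled image):
#   docker run --rm sequenceiq/hadoop-docker:2.4.1 env | missing_hadoop_vars
```

Empty output means every listed variable is defined; any name that is printed still has to be set in the image.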
Step 2: Configure yarn-site.xml on the host
Add the following to yarn-site.xml:
<property>
  <name>yarn.nodemanager.docker-container-executor.exec-name</name>
  <value>/usr/bin/docker</value>
</property>
<property>
  <name>yarn.nodemanager.container-executor.class</name>
  <value>org.apache.hadoop.yarn.server.nodemanager.DockerContainerExecutor</value>
</property>
The first property, yarn.nodemanager.docker-container-executor.exec-name, sets the path of the Docker executable. The second property, yarn.nodemanager.container-executor.class, makes YARN use DockerContainerExecutor as the container executor instead of the default one.
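Before restarting the cluster, it can be worth confirming that both properties are really present in the file. The following is a rough sketch, not a real XML parser: `get_site_property` is a hypothetical helper that assumes each `<name>` is followed directly by its `<value>`, and the `$HADOOP_HOME/etc/hadoop` location in the example is an assumption about your installation.

```shell
# Hypothetical helper: print the value of one property from a Hadoop
# *-site.xml file. $1 = file path, $2 = property name.
# Assumption: <value> directly follows <name> (only whitespace between).
# Note: dots in the name are treated as regex wildcards, which is loose
# but good enough for a quick check.
get_site_property() {
  tr -d '\n' < "$1" \
    | grep -o "<name>[[:space:]]*$2[[:space:]]*</name>[[:space:]]*<value>[^<]*</value>" \
    | sed -e 's/.*<value>//' -e 's/<\/value>.*//'
}

# Example (path is an assumption):
#   get_site_property "$HADOOP_HOME/etc/hadoop/yarn-site.xml" \
#     yarn.nodemanager.container-executor.class
```

If the call prints nothing, the property is missing or laid out differently from what the helper expects.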
Step 3: Restart YARN and HDFS on the host
./sbin/stop-all.sh
./sbin/start-all.sh
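After the restart, you may want to confirm that the daemons came back. The sketch below filters the output of `jps` (the JDK process lister, which prints lines such as `12345 NodeManager`). `missing_daemons` is a hypothetical helper, and which daemons you expect on a given machine depends on its role in your cluster.

```shell
# Hypothetical helper: reads `jps` output on stdin and prints each
# expected daemon (given as arguments) that is not listed.
missing_daemons() {
  jps_dump=$(cat)
  for daemon in "$@"; do
    printf '%s\n' "$jps_dump" | grep -q "[0-9][0-9]* ${daemon}\$" || printf '%s\n' "$daemon"
  done
}

# Example on a worker node (the daemon list is an assumption):
#   jps | missing_daemons NodeManager DataNode
```

Empty output means every daemon you named is running.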
Step 4: Submit a MapReduce job
Here we use the pi estimation example from hadoop-mapreduce-examples-2.7.3.jar.
Run the following command:
bin/hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi \
-D mapreduce.map.env="yarn.nodemanager.docker-container-executor.image-name=sequenceiq/hadoop-docker:2.4.1" \
-D mapreduce.reduce.env="yarn.nodemanager.docker-container-executor.image-name=sequenceiq/hadoop-docker:2.4.1" \
-D yarn.app.mapreduce.am.env="yarn.nodemanager.docker-container-executor.image-name=sequenceiq/hadoop-docker:2.4.1" 5 10
Currently you cannot configure any other Docker settings through the job configuration. You can only override the Docker image for the Mapper, Reducer, and ApplicationMaster environments, using the following 3 properties respectively (MR jobs only):
- mapreduce.map.env: Override the mapper’s image by passing yarn.nodemanager.docker-container-executor.image-name=your_image_name to this property.
- mapreduce.reduce.env: Override the reducer’s image by passing yarn.nodemanager.docker-container-executor.image-name=your_image_name to this property.
- yarn.app.mapreduce.am.env: Override the ApplicationMaster’s image by passing yarn.nodemanager.docker-container-executor.image-name=your_image_name to this property.
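Since the same image name is repeated in all three properties, the arguments can be generated from a single value. This is a minimal sketch; `dce_image_opts` is a hypothetical helper, and it assumes the image name contains no whitespace.

```shell
# Hypothetical helper: print one -D option per MapReduce role, all
# pointing at the Docker image given as $1.
dce_image_opts() {
  image="$1"
  key="yarn.nodemanager.docker-container-executor.image-name"
  for prop in mapreduce.map.env mapreduce.reduce.env yarn.app.mapreduce.am.env; do
    printf -- '-D%s=%s=%s\n' "$prop" "$key" "$image"
  done
}

# Example, equivalent to the command in Step 4:
#   bin/hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi \
#     $(dce_image_opts sequenceiq/hadoop-docker:2.4.1) 5 10
```

The command substitution is left unquoted on purpose so that each generated option becomes its own argument.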
If nothing has gone wrong, you can use the docker ps command to confirm that the Docker containers are running. You will see that each Docker container has the same name as the YARN container shown in the terminal.
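On a host that also runs unrelated containers, the YARN-launched ones can be picked out by name, because YARN container IDs start with `container_`. The filter below is a small sketch; `yarn_container_names` is a hypothetical helper name.

```shell
# Hypothetical helper: keep only names that look like YARN container IDs.
# `|| true` keeps the exit status at 0 when nothing matches.
yarn_container_names() {
  grep '^container_' || true
}

# Example (needs a running Docker daemon):
#   docker ps --format '{{.Names}}' | yarn_container_names
```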
Problems to watch for during the experiment
- Don't use Hadoop 3.0.0 or higher; it produces some puzzling errors.
- Error: No such image or container
This error may occur because the Docker image version does not match the Hadoop version on the host. Check the NodeManager log file for details.
- Diagnostics: Container image must not be null
Two causes can lead to this error:
A. The Hadoop version running on the hosts does not support using DCE as the YARN container executor.
B. The YARN configuration is wrong, or you forgot to set the DCE properties when submitting the job.
Note: DCE and the default YARN container executor can’t be used at the same time in a cluster.
- Error: No such image, container or task
This error is less straightforward. Check the error log file (stderr) of the MapReduce job. If you find the message "Error: Could not find or load main class org.apache.hadoop.mapreduce.v2.app.MRAppMaster", add the library path inside the Docker container to the MapReduce job classpath in mapred-site.xml. If you find the message "Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/service/CompositeService", also add the library path inside the Docker container to the YARN classpath in yarn-site.xml.
- The MR job is stuck in the ACCEPTED state.
Check the NodeManager log file. This problem may occur because YARN does not have enough resources to schedule; in that case, increase the resources configured in yarn-site.xml. It may also occur because the Docker container can’t get the PID; in that case, please read this article carefully.
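For the two class-loading errors described above, the sketch below prints property blocks that could be pasted into mapred-site.xml and yarn-site.xml. It rests on several assumptions: Hadoop is installed at /usr/local/hadoop inside the image, the standard share/hadoop directory layout is used, and the helper names are hypothetical. Setting these properties replaces their defaults, so append the paths to whatever entries your cluster already uses.

```shell
# Assumption: Hadoop home inside the Docker image; adjust to your image.
CONTAINER_HADOOP=/usr/local/hadoop

# Hypothetical helper: print a Hadoop <property> block.
print_property() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# MapReduce jars (MRAppMaster lives here), for mapred-site.xml.
mr_classpath() {
  printf '%s/share/hadoop/mapreduce/*,%s/share/hadoop/mapreduce/lib/*' "$1" "$1"
}

# Common jars (CompositeService lives here), for yarn-site.xml.
common_classpath() {
  printf '%s/share/hadoop/common/*,%s/share/hadoop/common/lib/*' "$1" "$1"
}

print_property mapreduce.application.classpath "$(mr_classpath "$CONTAINER_HADOOP")"
print_property yarn.application.classpath "$(common_classpath "$CONTAINER_HADOOP")"
```

Restart YARN after editing either file so that the NodeManagers pick up the new classpath.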