Building a Hadoop Yarn cluster with Kerberos authentication

SuperMap iServer distributed analysis supports Hadoop Yarn clusters, and you can build one yourself by following the process below. This chapter describes how to set up a Hadoop Yarn cluster that uses Kerberos authentication.

Software requirements

To set up a Hadoop Yarn cluster environment with Kerberos authentication, you need to configure the Java environment (JDK download address: http://www.oracle.com/technetwork/java/javase/downloads/index-jsp-138363.html#javasejdk; JDK 8 or above is recommended), SSH, and Hadoop.

The software used in this example is:

Hadoop installation package: hadoop-2.7.3.tar.gz, stored in /home/iserver

JDK installation package: jdk-8u131-linux-x64.tar.gz

Jsvc installation package: commons-daemon-1.0.15-src.tar.gz

Kerberos client installation package (Windows): kfw-4.1-amd64.msi

Setup process

This example shows how to build a Hadoop Yarn cluster with one master and one worker on two CentOS 7 virtual machines (each with 12 GB of memory).

  1. Configure Hadoop

Machine Name    IP                 Memory    Processes

master          192.168.112.162    10G       namenode, resourcemanager, kerberos server

worker          192.168.112.163    10G       datanode, nodemanager

Note:

    1. The hostnames must be lowercase; otherwise errors will occur when integrating Kerberos.
    2. Turn off the firewall on the master and each worker.

  2. Create the following user groups and users on the master and each node.
    1. Create a new user group with the following command:

groupadd hadoop

    2. Create new users hdfs and yarn, whose user IDs must be less than 1000, with the following commands:

useradd -u 501 hdfs -g hadoop

useradd -u 502 yarn -g hadoop

      Then use the passwd command to set a password for each new user:

passwd hdfs (enter a new password when prompted)

passwd yarn (enter a new password when prompted)

    3. After the users are created, use the id command (e.g. id hdfs) to view the user information, as shown below:

    4. Configure the DNS and gateway. If you are using virtual machines, the DNS and gateway need to be the same as those of the physical machine.
    5. Add the IP of the master node to the DNS service and set the domain name, such as master162.ispeco.com.
  3. Configure passwordless login between the nodes for the root, hdfs and yarn users.
    1. Take the hdfs user as an example to show how to configure passwordless login.
      • Execute ssh-keygen -t rsa -P '' on the master and the worker respectively (-P specifies the passphrase; an empty value is equivalent to pressing Enter at each prompt). After executing the command, the private key file (id_rsa) and the public key file (id_rsa.pub) are generated in the /home/hdfs/.ssh directory.

id_rsa      private key file

id_rsa.pub  public key file

      • Copy the public key to the other node. When executing on the master, write the IP of the worker node; when executing on the worker, write the IP of the master node:

ssh-copy-id -i /home/hdfs/.ssh/id_rsa.pub ip

    2. Perform the same operations shown above to configure passwordless login for the root and yarn users.
  4. Install the JDK
    1. Unzip jdk-8u131-linux-x64.tar.gz and add its path to the environment variables, as shown in the sketch below.
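A minimal sketch, assuming the JDK is unpacked to /home/supermap/java/jdk1.8.0_131 (the path used later in hadoop-env.sh); append the following lines to /etc/profile and then run source /etc/profile:

export JAVA_HOME=/home/supermap/java/jdk1.8.0_131

export PATH=$JAVA_HOME/bin:$PATH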
  5. Install jsvc
    1. Unzip commons-daemon-1.0.15-src.tar.gz.
    2. Enter the commons-daemon-1.0.15-src/src/native/unix directory and execute:
      • execute: support/buildconf.sh
      • execute: ./configure
      • execute: make
    3. After the compilation succeeds, you can see the jsvc file in the commons-daemon-1.0.15-src/src/native/unix directory. Copy it to the [hadoop-2.7.3 installation directory]/libexec directory.

Note: If your commons-daemon-1.0.15-src.tar.gz already contains a compiled jsvc file, you do not need to perform the compilation steps above; just copy the jsvc file to the [hadoop-2.7.3 installation directory]/libexec directory as described in step 3.

  6. Install and configure the Kerberos service
    1. Install the Kerberos server and client on the master node; install only the Kerberos client on the worker nodes.

yum install krb5-server (server)

yum install krb5-workstation krb5-libs krb5-auth-dialog (client)

    2. Configure the Kerberos service related files.
      1. /etc/krb5.conf: modify the realm, changing the default EXAMPLE.COM to the value you want to define, such as SUPERMAP.COM. The following parameters need to be modified:
        • default_realm: the default realm. Set it to your realm, such as SUPERMAP.COM.
        • kdc: the location of the KDC. The format is the machine name.
        • admin_server: the location of the admin server. The format is the machine name.
        • default_domain: the default domain name. Set it to the domain name corresponding to the master host, such as master162.ispeco.com.

Note: /etc/krb5.conf must be present on both the Master and Worker nodes, and the file content must be the same.

The modified file is shown below:
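(A minimal sketch, assuming the realm SUPERMAP.COM, the master machine name master and the domain name master162.ispeco.com described above; adjust the values to your own environment.)

[libdefaults]

 default_realm = SUPERMAP.COM

[realms]

 SUPERMAP.COM = {

  kdc = master

  admin_server = master

  default_domain = master162.ispeco.com

 }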

      2. vi /var/kerberos/krb5kdc/kdc.conf (configure on the master node; if this file does not exist, create it yourself):
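(A minimal sketch, assuming the realm SUPERMAP.COM and the default CentOS file locations; adjust as needed.)

[kdcdefaults]

 kdc_ports = 88

 kdc_tcp_ports = 88

[realms]

 SUPERMAP.COM = {

  acl_file = /var/kerberos/krb5kdc/kadm5.acl

  admin_keytab = /var/kerberos/krb5kdc/kadm5.keytab

  supported_enctypes = aes256-cts:normal aes128-cts:normal des3-hmac-sha1:normal arcfour-hmac:normal

 }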

      3. vi /var/kerberos/krb5kdc/kadm5.acl (configure on the master node; if this file does not exist, create it yourself):
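(A minimal sketch: it grants all privileges to principals of the form */admin in the SUPERMAP.COM realm, which matches the admin/admin principal created later.)

*/admin@SUPERMAP.COM     *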

    3. Create the Kerberos database.

To create the Kerberos database, you need to set an administrator password. After the creation succeeds, a series of files is generated under /var/kerberos/krb5kdc/. If you want to recreate the database, you must first delete the related principal files in /var/kerberos/krb5kdc.

Execute the following command as the root user on the master node:

kdb5_util create -s -r SUPERMAP.COM

After successful execution, the result is shown in the following figure:

Note: After the database is created successfully, you need to restart the krb5 service.

systemctl restart krb5kdc

    4. Create an administrator for Kerberos.

Execute the following commands as the root user on the master node:

kadmin.local

addprinc admin/admin@SUPERMAP.COM

As shown below:

    5. Create common users and key files for Kerberos so that, when the YARN Kerberos security settings are configured, the nodes can access each other.
      1. Execute the following commands as the root user on the master node:

kadmin.local

#create users

addprinc -randkey yarn/master162.ispeco.com@SUPERMAP.COM

addprinc -randkey yarn/worker163@SUPERMAP.COM

addprinc -randkey hdfs/master162.ispeco.com@SUPERMAP.COM

addprinc -randkey hdfs/worker163@SUPERMAP.COM

#generate key files (generated to the current path)

xst  -k yarn.keytab  yarn/master162.ispeco.com@SUPERMAP.COM

xst  -k yarn.keytab  yarn/worker163@SUPERMAP.COM

xst  -k hdfs.keytab  hdfs/master162.ispeco.com@SUPERMAP.COM

xst  -k hdfs.keytab  hdfs/worker163@SUPERMAP.COM

      2. Copy the yarn.keytab and hdfs.keytab files to the /var/kerberos/krb5kdc/ directory on both the Master and Worker nodes, set the corresponding owners and groups (yarn:hadoop and hdfs:hadoop), and set the file permissions to 400, as shown below:
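For example (a sketch, assuming the keytabs were generated in the current directory on the master; the worker IP is taken from the table above):

# on the master, place the keytabs under /var/kerberos/krb5kdc/

cp yarn.keytab hdfs.keytab /var/kerberos/krb5kdc/

# copy them to the worker as well

scp yarn.keytab hdfs.keytab root@192.168.112.163:/var/kerberos/krb5kdc/

# on each node, set the owner, group and permissions

chown yarn:hadoop /var/kerberos/krb5kdc/yarn.keytab

chown hdfs:hadoop /var/kerberos/krb5kdc/hdfs.keytab

chmod 400 /var/kerberos/krb5kdc/yarn.keytab /var/kerberos/krb5kdc/hdfs.keytab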

 

  7. Modify the Hadoop cluster configuration files

    Under the {HADOOP_HOME}/etc/hadoop directory:

      1. The contents of the core-site.xml file are as follows:

<configuration>

<property>

  <name>fs.defaultFS</name>

  <value>hdfs://192.168.112.162:9000</value>

 </property>

 <property>

  <name>hadoop.tmp.dir</name>

  <value>file:/home/iserver/hadoop-2.7.3/tmp</value>

  <description>A base for other temporary directories.</description>

 </property>

 <property>

  <name>hadoop.security.authentication</name>

  <value>kerberos</value>

 </property>

 <property>

  <name>hadoop.security.authorization</name>

  <value>true</value>

 </property>

 <property>

  <name>hadoop.rpc.protection</name>

  <value>authentication</value>

 </property>

 <property>

  <name>hadoop.security.auth_to_local</name>

  <value>DEFAULT</value>

 </property>

</configuration>

      2. The contents of the yarn-site.xml file are as follows:

<configuration>

<property>

  <name>yarn.nodemanager.aux-services</name>

  <value>mapreduce_shuffle</value>

 </property>

 <property>

  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>

  <value>org.apache.hadoop.mapred.ShuffleHandler</value>

 </property>

<property>

   <name>yarn.resourcemanager.address</name>

   <value>192.168.112.162:8032</value>

</property>

<property>

   <name>yarn.resourcemanager.scheduler.address</name>

   <value>192.168.112.162:8030</value>

</property>

<property>

   <name>yarn.resourcemanager.resource-tracker.address</name>

   <value>192.168.112.162:8031</value>

</property>

<property>

   <name>yarn.resourcemanager.admin.address</name>

   <value>192.168.112.162:8033</value>

</property>

<property>

   <name>yarn.resourcemanager.webapp.address</name>

   <value>192.168.112.162:8088</value>

</property>

<!--YARN kerberos security-->

 <property>

  <name>yarn.resourcemanager.keytab</name>

  <value>/var/kerberos/krb5kdc/yarn.keytab</value>

 </property>

  <property>

  <name>yarn.resourcemanager.principal</name>

  <value>yarn/master162.ispeco.com@SUPERMAP.COM</value>

 </property>

  <property>

  <name>yarn.nodemanager.keytab</name>

  <value>/var/kerberos/krb5kdc/yarn.keytab</value>

 </property>

<property>

  <name>yarn.nodemanager.principal</name>

  <value>yarn/master162.ispeco.com@SUPERMAP.COM</value>

 </property>

  <property>

    <name>yarn.nodemanager.container-executor.class</name>

    <value>org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor</value>

  </property>

    <property>

    <name>yarn.nodemanager.linux-container-executor.group</name>

    <value>hadoop</value>

  </property>

  <property>

   <name>yarn.nodemanager.local-dirs</name>

   <value>/home/iserver/hadoop-2.7.3/local</value>

 </property>

 <property>

   <name>yarn.nodemanager.log-dirs</name>

   <value>/home/iserver/hadoop-2.7.3/log</value>

 </property>

</configuration>

      3. The contents of the hdfs-site.xml file are as follows:

<configuration>

<property>

   <name>dfs.https.enable</name>

   <value>false</value>

 </property>

<property>

   <name>dfs.namenode.http-address</name>

   <value>192.168.112.162:50070</value>

 </property>

 <property>

   <name>dfs.https.port</name>

   <value>50470</value>

 </property>

<property>

  <name>dfs.namenode.secondary.http-address</name>

  <value>192.168.112.162:9001</value>

 </property>

  <property>

   <name>dfs.namenode.name.dir</name>

   <value>file:/home/iserver/hdfs/name</value>

 </property>

 <property>

  <name>dfs.datanode.data.dir</name>

  <value>file:/home/iserver/hdfs/data</value>

  </property>

 <property>

  <name>dfs.replication</name>

  <value>2</value>

 </property>

 <property>

  <name>dfs.webhdfs.enabled</name>

  <value>true</value>

 </property>

<property>

<name>dfs.permissions</name>

<value>false</value>

</property>

<property>

 <name>dfs.encrypt.data.transfer</name>

 <value>true</value>

</property>

<property>

 <name>dfs.block.access.token.enable</name>

 <value>true</value>

</property>

<property>

 <name>dfs.namenode.kerberos.principal</name>

 <value>hdfs/master162.ispeco.com@SUPERMAP.COM</value>

</property>

<property>

 <name>dfs.namenode.keytab.file</name>

 <value>/var/kerberos/krb5kdc/hdfs.keytab</value>

</property>

<property>

 <name>dfs.namenode.kerberos.internal.spnego.principal</name>

 <value>HTTP/master162.ispeco.com@SUPERMAP.COM</value>

</property>

<property>

 <name>dfs.web.authentication.kerberos.keytab</name>

 <value>/var/kerberos/krb5kdc/hdfs.keytab</value>

</property>

<property>

 <name>dfs.web.authentication.kerberos.principal</name>

 <value>HTTP/master162.ispeco.com@SUPERMAP.COM</value>

</property>

<property>

 <name>dfs.namenode.kerberos.https.principal</name>

 <value>host/master162.ispeco.com@SUPERMAP.COM</value>

</property>

<property>

 <name>dfs.namenode.secondary.http-address</name>

 <value>192.168.112.162:50090</value>

</property>

<property>

 <name>dfs.secondary.namenode.keytab.file</name>

 <value>/var/kerberos/krb5kdc/hdfs.keytab</value>

</property>

<property>

 <name>dfs.secondary.namenode.kerberos.principal</name>

 <value>hdfs/master162.ispeco.com@SUPERMAP.COM</value>

</property>

<property>

 <name>dfs.secondary.namenode.kerberos.internal.spnego.principal</name>

 <value>HTTP/master162.ispeco.com@SUPERMAP.COM</value>

</property>

<property>

 <name>dfs.datanode.kerberos.principal</name>

 <value>hdfs/master162.ispeco.com@SUPERMAP.COM</value>

</property>

<property>

 <name>dfs.datanode.keytab.file</name>

 <value>/var/kerberos/krb5kdc/hdfs.keytab</value>

</property>

<property>

 <name>dfs.encrypt.data.transfer</name>

 <value>false</value>

</property>

<property>

 <name>dfs.datanode.data.dir.perm</name>

 <value>700</value>

</property>

<property>

 <name>dfs.datanode.address</name>

 <value>192.168.112.163:1004</value>

</property>

<property>

 <name>dfs.datanode.http.address</name>

 <value>192.168.112.163:1006</value>

</property>

<property>

 <name>dfs.datanode.https.address</name>

 <value>192.168.112.163:50470</value>

</property>

</configuration>

      4. The contents of the container-executor.cfg file are as follows:

yarn.nodemanager.linux-container-executor.group=hadoop

#configured value of yarn.nodemanager.linux-container-executor.group

banned.users=hdfs

#comma separated list of users who can not run applications

min.user.id=0

#Prevent other super-users

allowed.system.users=root,yarn,hdfs,mapred,nobody

#comma separated list of system users who CAN run applications

 

      5. In the hadoop-env.sh file, add:

export JAVA_HOME=/home/supermap/java/jdk1.8.0_131

export JSVC_HOME=/home/supermap/hadoop/hadoop-2.7.3/libexec

If you need to debug, add:

export HADOOP_OPTS="$HADOOP_OPTS -Dsun.security.krb5.debug=true"

      6. In the yarn-env.sh file, add:

export JAVA_HOME=/home/supermap/java/jdk1.8.0_131

      7. In the masters file, add:

master (the host name of the master)

      8. In the slaves file, add:

worker (the host name of the Worker)

      9. Copy the configured Hadoop installation package to the worker node and place it in the same path as on the master node.
      10. Set permissions (both the Master and Worker nodes need to be set).

The Hadoop installation directory should be owned by root, with directory permissions set to 755.

The following files and directories need to be configured with the corresponding owners, groups and permissions:

Corresponding parameter/file    Defined in / location                        User:group     Permission

dfs.namenode.name.dir           hdfs-site.xml                                hdfs:hadoop    drwx------ (700)

dfs.datanode.data.dir           hdfs-site.xml                                hdfs:hadoop    drwx------ (700)

$HADOOP_LOG_DIR                 hadoop-env.sh                                hdfs:hadoop    drwxrwxr-x (775)

$HADOOP_YARN_HOME/logs          yarn-env.sh                                  hdfs:hadoop    drwxrwxr-x (775)

yarn.nodemanager.local-dirs     yarn-site.xml                                yarn:hadoop    drwxr-xr-x (755)

yarn.nodemanager.log-dirs       yarn-site.xml                                yarn:hadoop    drwxr-xr-x (755)

container-executor              hadoop installation directory/bin/           root:hadoop    --Sr-s--* (6050)

container-executor.cfg          hadoop installation directory/etc/hadoop/    root:hadoop    r-------* (400)

tmp                             hadoop installation directory/               hdfs:hadoop    -

start-dfs.sh                    hadoop installation directory/sbin/          hdfs:master    -

start-secure-dns.sh             hadoop installation directory/sbin/          root:master    -

start-yarn.sh                   hadoop installation directory/sbin/          yarn:master    -
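For example, the container-executor related permissions could be set as follows (a sketch, assuming the Hadoop installation directory /home/iserver/hadoop-2.7.3 used in this example):

chown root:hadoop /home/iserver/hadoop-2.7.3/bin/container-executor

chmod 6050 /home/iserver/hadoop-2.7.3/bin/container-executor

chown root:hadoop /home/iserver/hadoop-2.7.3/etc/hadoop/container-executor.cfg

chmod 400 /home/iserver/hadoop-2.7.3/etc/hadoop/container-executor.cfg

chown -R yarn:hadoop /home/iserver/hadoop-2.7.3/local /home/iserver/hadoop-2.7.3/log

chmod -R 755 /home/iserver/hadoop-2.7.3/local /home/iserver/hadoop-2.7.3/log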

 

  8. Start the YARN cluster and the Hadoop cluster with Kerberos authentication enabled.

Execute the following commands in the Hadoop installation directory on the master node:

    1. Start the Kerberos service (as the root user):

systemctl start krb5kdc

    2. Format the cluster as the hdfs user (this step is required only for the first installation or after modifying Hadoop-related settings):

[hdfs@master bin]$ ./hadoop namenode -format

    3. Start the cluster:

[hdfs@master sbin]$ ./start-dfs.sh

[root@master sbin]# ./start-secure-dns.sh

[yarn@master sbin]$ ./start-yarn.sh

    4. To shut down the cluster, execute:

[yarn@master sbin]$ ./stop-yarn.sh

[root@master sbin]# ./stop-secure-dns.sh

[hdfs@master sbin]$ ./stop-dfs.sh

  9. Verify that the YARN cluster and the Hadoop cluster have started successfully.

For the YARN cluster, access MasterIP:8088.

For the Hadoop cluster (HDFS), access MasterIP:50070. A command-line check using Kerberos is sketched below.
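Because Kerberos authentication is enabled, command-line access to HDFS also requires a valid ticket. For example (a sketch, using the hdfs principal and keytab created above):

kinit -kt /var/kerberos/krb5kdc/hdfs.keytab hdfs/master162.ispeco.com@SUPERMAP.COM

klist

./bin/hadoop fs -ls /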

  10. To perform distributed analysis, you also need to configure UGO permissions for each sub-node of the Hadoop Yarn cluster. For details, see: UGO configuration.