Cloudera (CDP 7.1) Setup & Deployment with a Spark 3.0 Cluster

Goal: Set up CDP 7.1 on a 3-node cluster for benchmark publications (marketing collateral) that support the sale of servers in domains such as Retail, Banking, and Insurance.

A) The first step is to download CDP 7.1 (the license is applied as a .txt file). There are a few prerequisites to complete before installing CDP 7.1.

B) Get the hardware required. In our case I chose three homogeneous servers with the configuration below. However, it is better to make the choice based on the application or workload resource characteristics and the price-performance of the server (price performance {PP} is like a car's mileage: a single metric of how much performance you get for the cost). If the application is multithreaded, processors with more cores and sufficient memory help. If the application is single threaded, SKUs/CPUs with higher frequency and a larger L1 cache help, and for memory-intensive workloads, DIMMs with higher memory speeds benefit application performance. SSDs are also preferred over HDDs for Big Data processing, due to better hardware reliability and faster data processing.

Each server configuration: a 2-socket server with 22 cores per CPU, so with HT enabled we have 88 threads in total, 384 GB RAM per CPU, and 2 x 3.2 TB SSDs.
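Before going further, it is worth confirming that each node actually reports this configuration. A quick check with standard CentOS 7 tools (run on every node):

lscpu | grep -E 'Socket|Core|Thread|Model name'   # sockets, cores per socket, threads per core
free -h                                           # installed memory
lsblk -d -o NAME,SIZE,ROTA                        # disks; ROTA=0 indicates an SSD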

C) Download the trial version of CDP 7.1 from the link and follow the system security and network requirements from this link. Key requirements and commands are listed below.

1. On CentOS 7, disable the firewall and set SELinux with the options below, as some of the ports need to be opened by the CDP Manager installer.

systemctl disable firewalld
systemctl stop firewalld

vim /etc/selinux/config

SELINUX=disabled

setenforce 0
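To confirm these settings took effect, a quick check (standard CentOS 7 commands):

getenforce                    # should report Permissive now (Disabled after a reboot)
sestatus                      # shows the current and configured SELinux mode
systemctl status firewalld    # should show inactive (dead) and disabled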

2. Network configuration involves setting up DHCP, hostnames, DNS, and NTP (so that every node of the cluster has identical time), plus passwordless SSH.

DHCP is set up by the lab admins, with a central server assigning IPs to the nodes on the network via the DHCP protocol. However, depending on the operating system, one can also configure a system or VM as a DHCP server.
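If you do have to stand up your own DHCP server, a minimal sketch with dnsmasq on CentOS 7 is shown below; the interface name and address range are placeholders for your lab network, not values from this setup.

yum install -y dnsmasq
# append to /etc/dnsmasq.conf: serve leases only on the lab-facing interface
echo "interface=eth0" >> /etc/dnsmasq.conf
echo "dhcp-range=10.0.0.100,10.0.0.200,12h" >> /etc/dnsmasq.conf
systemctl enable dnsmasq
systemctl start dnsmasq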

Setting hostnames:

a)

[root@localhost ~]# hostnamectl
Static hostname: localhost.localdomain
Icon name: computer-server
Chassis: server
Machine ID: ****
Boot ID: **
Operating System: CentOS Linux 7 (Core)
CPE OS Name: cpe:/o:centos:centos:7
Kernel: Linux 3.10.0-1160.el7.x86_64
Architecture: x86-64
[root@localhost ~]#

b) Edit and update /etc/sysconfig/network (vim /etc/sysconfig/network) on the respective servers as shown below.

HOSTNAME=master.local.com

HOSTNAME=slave1.local.com

HOSTNAME=slave2.local.com

HOSTNAME=slave3.local.com
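On CentOS 7 the hostname can also be set persistently with hostnamectl (the same command used later in this post while fixing FQDN issues); for example, on the master:

hostnamectl set-hostname master.local.com
hostnamectl status    # verify the static hostname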

c) Set up the domain name as below:

[root@localhost ~]# domainname
(none)
[root@localhost ~]# domainname master.local.com
[root@localhost ~]# domainname
master.local.com
[root@localhost ~]#

d) On Linux application servers that use DNS, edit the /etc/hosts file and add the IP addresses of all cluster nodes so that each node can reach the others by name:
10.xx.x.xx master.local.com master
10.xx.x.xx slave1.local.com slave1
10.xx.x.xx slave2.local.com slave2
10.xx.x.xx slave3.local.com slave3

e) Verify by pinging the hostnames above using "ping master", "ping slave1", etc., on each node.
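A small loop saves repeating the check by hand; this sketch uses the short names defined in /etc/hosts above:

for h in master slave1 slave2 slave3; do ping -c 2 $h > /dev/null && echo "$h reachable" || echo "$h FAILED"; done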

f) Set up the NTP service with chronyd: reference link
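A minimal chronyd setup on CentOS 7 looks roughly like the below; the default pool servers in /etc/chrony.conf are usually fine, or point them at your lab's time source:

yum install -y chrony
systemctl enable chronyd
systemctl start chronyd
chronyc sources -v    # verify the node is synchronizing
timedatectl           # confirm "NTP synchronized: yes"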

g) Set up passwordless SSH on the master node and copy the public key to the other slave nodes.

[root@master-spark ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:***************************** root@master-spark
The key's randomart image is:
+---[RSA 2048]----+
|   (randomart    |
|  image masked)  |
+----[SHA256]-----+

[root@master-spark ~]# ssh-copy-id root@slave1
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'slave1 (10.xx.xx.xx)' can't be established.
ECDSA key fingerprint is SHA256:*******************************************
ECDSA key fingerprint is MD5:***************************************
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@slave1's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'root@slave1'"
and check to make sure that only the key(s) you wanted were added.

[root@master-spark ~]# ssh-copy-id root@slave2
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
The authenticity of host 'slave2 (10.xx.x.xx)' can't be established.
ECDSA key fingerprint is SHA256:*******************************************
ECDSA key fingerprint is MD5:***************************************
Are you sure you want to continue connecting (yes/no)? yes
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@slave2's password:

Number of key(s) added: 1

Now try logging into the machine, with: "ssh 'root@slave2'"
and check to make sure that only the key(s) you wanted were added.

[root@master-spark ~]#

Similarly, do the same for slave3.
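To confirm key-based login works everywhere, a quick check from the master (each command should print the slave's hostname without prompting for a password):

for h in slave1 slave2 slave3; do ssh -o BatchMode=yes root@$h hostname; done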

3. Other dependencies to be installed on each node are shown below. One can also use Ansible for these system security and network configurations, and for the CDP installation itself, by cloning the GitHub repository and editing some of the YAML files. Please check the Ansible reference link.

yum install -y java-1.8.0-openjdk-devel (java)

yum install postgresql-server

Verify, flush, and disable iptables as below:

rpm -qa iptables-services
iptables-services-1.4.21-34.el7.x86_64
systemctl status iptables
iptables -L -n -v
iptables -F
iptables -L -n -v
systemctl stop iptables
systemctl disable iptables

chkconfig iptables off
Note: Forwarding request to ‘systemctl disable iptables.service’.

service iptables status -> output
Redirecting to /bin/systemctl status iptables.service
● iptables.service – IPv4 firewall with iptables
Loaded: loaded (/usr/lib/systemd/system/iptables.service; disabled; vendor preset: disabled)
Active: inactive (dead)

Set vm.swappiness=10 (Cloudera recommends a maximum of 10; see the host inspector warning and sysctl commands later in this post).

D) Install CDP 7.1 by following the steps in the link.

On the master node, run the commands below to install the Cloudera Manager server:

wget https://archive.cloudera.com/cm7/7.4.4/cloudera-manager-installer.bin

$ chmod u+x cloudera-manager-installer.bin

$ sudo ./cloudera-manager-installer.bin

Installer first page and Cloudera Manager server page (screenshots).
Continue the installation by following the directions in the screenshots above.

If the domain name of the master server doesn't work, use http://<master node IP>:7180 with the credentials admin/admin.

Pardon the screenshots, which aren't very clear. I have tried to mask the IPs of our systems for security purposes.

License applied

TLS (Transport Layer Security) page: Ignore & Continue.
I had tried this installation a couple of times due to some misconfigurations, and so got the above error.

I tried multiple things: checking the configurations, checking the logs (under /var/log/cloudera-scm-manager/) for errors, but nothing was obvious. I also tried re-running the installer many times, but it was hard to get past the above page. This link from the Cloudera community helped me get over it: I had to reboot my master node, and that resolved the issue.

Finally got the login page again:

Login page (admin/admin)
Name your CDP cluster as you wish.
Read carefully when adding the slave1 to slave3 nodes to the cluster. Using a hostname pattern such as slave[1-3].local.com helped add all the nodes easily in my cluster setup.
Due to some hardware issue I had to add my slave3 later.
I chose Kafka too.
I chose the Cloudera-provided JDK as shown above.

Install agents.
Follow the inspections for network performance.
Follow the further few steps as directed.
Once the inspection completes successfully you get the above page.
I got a couple of warnings to set some kernel parameters for the proper functioning of the Cloudera cluster services, roles, etc.:
Cloudera recommends setting /proc/sys/vm/swappiness to a maximum of 10. Current setting is 60. Use the sysctl command to change this setting at run time and edit /etc/sysctl.conf for this setting to be saved after a reboot. You can continue with installation, but Cloudera Manager might report that your hosts are unhealthy because they are swapping. The following hosts are affected: slave[1-2].local.com

Transparent Huge Page Compaction is enabled and can cause significant performance problems. Run "echo never > /sys/kernel/mm/transparent_hugepage/defrag" and "echo never > /sys/kernel/mm/transparent_hugepage/enabled" to disable this, and then add the same command to an init script such as /etc/rc.local so it will be set on system reboot. The following hosts are affected: slave[1-2].local.com

From <http://XXXXXX:7180/cmf/inspector?commandId=15####$$87>
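To apply the THP recommendation on the affected nodes from the master, here is a sketch that simply runs the commands quoted in the warning over SSH and appends them to /etc/rc.local so they survive a reboot:

for h in slave1 slave2; do
  ssh root@$h 'echo never > /sys/kernel/mm/transparent_hugepage/defrag;
    echo never > /sys/kernel/mm/transparent_hugepage/enabled;
    echo "echo never > /sys/kernel/mm/transparent_hugepage/defrag" >> /etc/rc.local;
    echo "echo never > /sys/kernel/mm/transparent_hugepage/enabled" >> /etc/rc.local;
    chmod +x /etc/rc.local'
done

On CentOS 7, /etc/rc.local must be executable for rc-local.service to run it at boot, hence the chmod.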

[root@master home]# ssh root@slave1 "sudo sysctl vm.swappiness=10"

vm.swappiness = 10

[root@master home]# ssh root@slave2 "sudo sysctl vm.swappiness=10"

vm.swappiness = 10

[root@master home]#
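The sysctl call above only changes the running value; to keep vm.swappiness=10 after a reboot (as the warning suggests), append it to /etc/sysctl.conf on each affected node, for example:

for h in slave1 slave2; do ssh root@$h 'echo "vm.swappiness=10" >> /etc/sysctl.conf && sysctl -p'; done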

[root@master ~]# tail -f /var/log/cloudera-manager-installer/3.install-cloudera-manager-server.log

Total download size: 1.6 G

Installed size: 1.9 G

Downloading packages:

——————————————————————————–

Total                                               11 MB/s | 1.6 GB  02:35

Running transaction check

Running transaction test

Transaction test succeeded

Running transaction

Warning: RPMDB altered outside of yum.

  Installing : cloudera-manager-daemons-7.4.4-15850731.el7.x86_64           1/2

  Installing : cloudera-manager-server-7.4.4-15850731.el7.x86_64            2/2

  Verifying  : cloudera-manager-daemons-7.4.4-15850731.el7.x86_64           1/2

  Verifying  : cloudera-manager-server-7.4.4-15850731.el7.x86_64            2/2

Installed:

  cloudera-manager-server.x86_64 0:7.4.4-15850731.el7

Dependency Installed:

  cloudera-manager-daemons.x86_64 0:7.4.4-15850731.el7

Complete!

Inspection of network performance and hosts is completed. These steps can take a while; be patient!

Depending on requirements, one may choose specific services. I chose "Custom Services" so that I could focus on the batch programs to be deployed on the Spark cluster.

Make your choice of services based on your requirements; this does require some level of Big Data expertise and requirements gathering, with the end goal of this Spark cluster deployment in mind.

For our cluster I chose HDFS, HBase, YARN, Spark, Hive, Kafka, ZooKeeper and Sqoop, along with the Zeppelin service.

I had missed updating the database passwords in the previous step.

Another error:

As per the stderr log, there seemed to be some pre-existing data in /dfs/nn/:

[root@master home]# ls -lrt /dfs/nn
total 4
drwx——. 2 hdfs hdfs 4096 Nov 18 01:42 current
[root@master home]# date
Thu Nov 18 01:49:05 IST 2021
[root@master home]# rm -rf /dfs/nn/current/
edits_0000000000000000001-0000000000000004325 edits_0000000000000009076-0000000000000009077 fsimage_0000000000000000000
edits_0000000000000004326-0000000000000009059 edits_0000000000000009078-0000000000000009085 fsimage_0000000000000000000.md5
edits_0000000000000009060-0000000000000009067 edits_0000000000000009086-0000000000000009093 seen_txid
edits_0000000000000009068-0000000000000009075 edits_inprogress_0000000000000009094 VERSION
[root@master home]# rm -rf /dfs/nn/current/*
[root@master home]#

Solution:

Continued errors:

Solution:

[root@slave3 process]# find . -name webapp.properties
./1546334388-queuemanager-QUEUEMANAGER_STORE/conf/webapp.properties
./1546334346-queuemanager-QUEUEMANAGER_STORE/conf/webapp.properties
./1546334344-queuemanager-QUEUEMANAGER_WEBAPP/conf/webapp.properties
./1546333931-queuemanager-QUEUEMANAGER_STORE/conf/webapp.properties
./1546333929-queuemanager-QUEUEMANAGER_WEBAPP/conf/webapp.properties
[root@slave3 process]# pwd
/var/run/cloudera-scm-agent/process
[root@slave3 process]#

Final database assignments:

Hive
Type: MySQL / PostgreSQL / Oracle Database (selector)
Database Hostname: master.hpelocal.com:7432
Database Name: hive1
Username: hive1
Password: iyxJSXOqQV

Reports Manager (currently assigned to run on master.hpelocal.com)
Type: MySQL / PostgreSQL / Oracle Database (selector)
Database Hostname: master.hpelocal.com:7432
Database Name: rman
Username: rman
Password: pcELjCr8J4

Oozie Server (currently assigned to run on master.hpelocal.com)
Type: MySQL / PostgreSQL / Oracle Database (selector)
Database Hostname: master.hpelocal.com:7432
Database Name: oozie_oozie_server1
Username: oozie_oozie_server1
Password: k8mKDQ322k

I removed the YARN Queue Manager service:

Updated /etc/hosts on all nodes as below:

[root@slave3 process]# cat /etc/hosts
10.1x.xx.xx master.hpelocal.com
10.1x.xx.xx slave1.hpelocal.com
10.1x.xx.xx slave2.hpelocal.com
10.1x.xx.xx slave3.hpelocal.com
[root@slave3 process]#

[root@slave1-spark process]# python -c "import socket; print socket.getfqdn(); print socket.gethostbyname(socket.getfqdn())"
slave1.hpelocal.com
10.1x.xx.xx
[root@slave1-spark process]#
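The same FQDN/IP check can be pushed to every node from the master in one go (a sketch reusing the Python one-liner above):

for h in master slave1 slave2 slave3; do
  echo "== $h =="
  ssh root@$h 'python -c "import socket; print socket.getfqdn(); print socket.gethostbyname(socket.getfqdn())"'
done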

https://community.cloudera.com/t5/Support-Questions/Bad-health-issue-DNS-resolve/m-p/37209

The below solved the DNS resolution problem.

Issue: configuration problems; DNS resolution was failing, leading to YARN and host health failures.

Solution (do it on all nodes):

hostname --fqdn

master.hpelocal.com

 sudo hostnamectl set-hostname slave1.hpelocal.com

[root@slave1-spark process]# hostname --fqdn

slave1.hpelocal.com

 sudo hostnamectl set-hostname slave2.hpelocal.com

[root@slave2-spark process]# hostname --fqdn

slave2.hpelocal.com

[root@slave2-spark process]#

 sudo hostnamectl set-hostname slave3.hpelocal.com

[root@slave3 process]# hostname --fqdn

slave3.hpelocal.com

[root@slave3 process]#

And then restart the Cloudera SCM agent:

 service cloudera-scm-agent restart

Redirecting to /bin/systemctl restart cloudera-scm-agent.service

[root@master process]#
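Since the hostname fix was applied on every node, the agent restart can also be pushed from the master (a sketch; the master runs an agent too, so restart it locally as well):

for h in slave1 slave2 slave3; do ssh root@$h 'systemctl restart cloudera-scm-agent'; done
systemctl restart cloudera-scm-agent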

References:

https://stackoverflow.com/questions/35472852/hadoop-dns-resolution/35517749 — helped

https://stackoverflow.com/questions/19639561/cloudera-cdh4-cant-add-a-host-to-my-cluster-because-canonical-name-is-not-cons — good to know

https://docs.cloudera.com/documentation/enterprise/6/6.3/topics/configure_network_names.html

The Spark and other Big Data services on the CDP 7.1 cluster are finally up and running!!

CDP 7.1 cluster

Uninstallation of CDP:

https://docs.cloudera.com/cdp-private-cloud-base/7.1.6/installation/topics/cdpdc-uninstallation.html

1) Stop all services and the cluster.

2) Remove user data:

Record the location of the user data paths by checking the configuration in each service.

The user data paths listed in the topic Remove User Data, /var/lib/flume-ng /var/lib/hadoop* /var/lib/hue /var/lib/navigator /var/lib/oozie /var/lib/solr /var/lib/sqoop* /var/lib/zookeeper data_drive_path/dfs data_drive_path/mapred data_drive_path/yarn, are the default settings.

3) Deactivate and Remove Parcels

If you installed using packages, skip this step and go to Uninstall the Cloudera Manager Server; you will remove packages in Uninstall Cloudera Manager Agent and Managed Software. If you installed using parcels remove them as follows:

  1. Click the parcel indicator  in the left-hand navigation bar.
  2. In the Location selector on the left, select All Clusters.
  3. For each activated parcel, select Actions > Deactivate. When this action has completed, the parcel button changes to Activate.
  4. For each activated parcel, select Actions > Remove from Hosts. When this action has completed, the parcel button changes to Distribute.
  5. For each activated parcel, select Actions > Delete. This removes the parcel from the local parcel repository.

There might be multiple parcels that have been downloaded and distributed, but that are not active. If this is the case, you should also remove those parcels from any hosts onto which they have been distributed, and delete the parcels from the local repository.

Click on Deactivate.

4) Delete the Cluster

5) Uninstall the Cloudera Manager Server

6) Uninstall Cloudera Manager Agent and Managed Software

Do the same on all slave nodes.

7) Remove Cloudera Manager, User Data, and Databases

Permanently remove Cloudera Manager data, the Cloudera Manager lock file, and user data. Then stop and remove the databases.

a) On all Agent hosts, kill any running Cloudera Manager and managed processes:

for u in cloudera-scm flume hadoop hdfs hbase hive httpfs hue impala llama mapred oozie solr spark sqoop sqoop2 yarn zookeeper; do sudo kill $(ps -u $u -o pid=); done

Note: this step should not be necessary if you stopped all the services and the Cloudera Manager Agent correctly.

b) If you are uninstalling on RHEL, run the following commands on all Agent hosts to permanently remove Cloudera Manager data. If you want to be able to access any of this data in the future, you must back it up before removing it. If you used an embedded PostgreSQL database, that data is stored in /var/lib/cloudera-scm-server-db.

sudo umount cm_processes
sudo rm -Rf /usr/share/cmf /var/lib/cloudera* /var/cache/yum/cloudera* /var/log/cloudera* /var/run/cloudera*

c) On all Agent hosts, run this command to remove the Cloudera Manager lock file:

sudo rm /tmp/.scm_prepare_node.lock

This step permanently removes all user data. To preserve the data, copy it to another cluster using the distcp command before starting the uninstall process.

d) On all Agent hosts, run the following commands:

sudo rm -Rf /var/lib/flume-ng /var/lib/hadoop* /var/lib/hue /var/lib/navigator /var/lib/oozie /var/lib/solr /var/lib/sqoop* /var/lib/zookeeper

e) Run the following command on each data drive on all Agent hosts (adjust the paths for the data drives on each host):

sudo rm -Rf data_drive_path/dfs data_drive_path/mapred data_drive_path/yarn

Stop and remove the databases. If you chose to store Cloudera Manager or user data in an external database, see the database vendor documentation for details on how to remove the databases.
