Monday 22 November 2021

CDH Troubleshooting Upgrades

  •  Cluster hosts do not appear

Some cluster hosts do not appear when you click Find Hosts in install or update wizard.

Possible Reasons

You might have network connectivity problems.

Possible Solutions

Make sure all cluster hosts have SSH port 22 open.

Check other common causes of loss of connectivity such as firewalls and interference from SELinux.

  • Cannot start services after upgrade

You have upgraded the Cloudera Manager Server, but now cannot start services.

Possible Reasons

You might have mismatched versions of the Cloudera Manager Server and Agents.

Possible Solutions

Make sure you have upgraded the Cloudera Manager Agents on all hosts. (The previous version of the Agents will heartbeat with the new version of the Server, but you cannot start HDFS and MapReduce with this combination.)

  • HDFS DataNodes fail to start

After upgrading, HDFS DataNodes fail to start with exception:

Exception in secureMainjava.lang.RuntimeException: Cannot start datanode because the configured max locked memory size (dfs.datanode.max.locked.memory) of 4294967296 bytes is more than the datanode's available RLIMIT_MEMLOCK ulimit of 65536 bytes.   

Possible Reasons

HDFS caching, which is enabled by default in CDH 5 and higher, requires new memlock functionality from Cloudera Manager Agents.

Possible Solutions:

Do the following:

Stop all CDH and managed services.

On all hosts with Cloudera Manager Agents, hard-restart the Agents. Before performing this step, ensure you understand the semantics of the hard_restart command by reading Cloudera Manager Agents.

RHEL 7, SLES 12, Ubuntu 18.04 and higher

sudo systemctl stop supervisord

sudo systemctl start cloudera-scm-agent

RHEL 5 or 6, SLES 11, Debian 6 or 7, Ubuntu 12.04 or 14.04

sudo service cloudera-scm-agent hard_restart

Start all services.

  • Cloudera services fail to start

Possible Reasons

Java might not be installed or might be installed at a custom location.

Possible Solutions

See Configuring a Custom Java Home Location for more information on resolving this issue.

  • Host Inspector Fails

If you see the following message in the Host Inspector:

There are mismatched versions across the system, which will cause failures. See below for details on which hosts are running what versions of components.

When looking at the results, some hosts report Supervisord vX.X.X, while others report X.X.X-cmY.Y.Y (where X and Y are version numbers). During the upgrade, an old file on the hosts may cause the Host Inspector to indicate mismatched Supervisord versions.

This issue occurs because these hosts have a file on them at /var/run/cloudera-scm-agent/supervisor/__STARTING_CM_VERSION__ that contains a string for the older version of Cloudera Manager.

To resolve this issue:

Remove or rename the /var/run/cloudera-scm-agent/supervisor/__STARTING_CM_VERSION__ file

Perform a hard restart of the agents:

sudo systemctl stop cloudera-scm-supervisord.service

sudo systemctl start cloudera-scm-agent

Run the Host inspector again. It should pass without the warning.