NameNode Common Errors and Solutions
1. NameNode Out of Memory (OOM)
Error Message:
java.lang.OutOfMemoryError: Java heap space
Cause:
- Heap size allocated to the NameNode is too small.
- A large number of small files: the NameNode keeps an in-heap object for every file, directory, and block (roughly 150 bytes each), so millions of small files quickly exhaust the heap.
Solution:
- Increase heap memory in hadoop-env.sh (confirm the heap is actually under pressure first; see the check below):
export HADOOP_NAMENODE_OPTS="-Xms4G -Xmx8G"
(On Hadoop 3.x the variable is named HDFS_NAMENODE_OPTS.)
- Enable HDFS Federation for very large namespaces (configured via dfs.nameservices) so metadata is split across multiple NameNodes.
- Use HDFS Erasure Coding instead of replication to reduce storage overhead.
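Before resizing the heap, it helps to confirm actual usage. One way is the NameNode's built-in JMX servlet; a minimal sketch, assuming the Hadoop 3.x default web UI port 9870 (50070 on 2.x) and a placeholder hostname namenode-host:

# Dump current heap usage from the NameNode JMX servlet.
# "namenode-host" and port 9870 are assumptions; substitute your
# NameNode's address (Hadoop 2.x uses port 50070).
curl -s 'http://namenode-host:9870/jmx?qry=java.lang:type=Memory'

If the reported HeapMemoryUsage "used" value sits near the -Xmx ceiling during normal operation, the heap is genuinely undersized rather than leaking.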
2. NameNode Safe Mode Stuck
Error Message:
org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot leave safe mode.
Cause:
- DataNodes not reporting correctly.
- Corrupt blocks preventing NameNode from exiting safe mode.
Solution:
- Check DataNode health:
hdfs dfsadmin -report
- Force NameNode out of safe mode (if healthy):
hdfs dfsadmin -safemode leave
- Run a block check and delete corrupt blocks (this permanently removes the affected files; review the non-destructive checks below first):
hdfs fsck / -delete
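Before forcing the NameNode out of safe mode or deleting anything, check the current state and list the affected files non-destructively. A minimal sketch using standard hdfs commands:

# Show whether safe mode is still on
hdfs dfsadmin -safemode get
# List corrupt files without deleting them
hdfs fsck / -list-corruptfileblocks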
3. NameNode Fails to Start Due to Corrupt Edit Logs
Error Message:
Startup failures with stack traces referencing org.apache.hadoop.hdfs.server.namenode.EditLogInputStream while loading the edit log.
Cause:
- Corrupt edit logs due to improper shutdown.
Solution:
- Try recovering the logs (back up the metadata directory first; see the sketch below):
hdfs namenode -recover
- If recovery fails, format the NameNode metadata (last resort):
hdfs namenode -format
(⚠️ This erases all HDFS metadata! Use only if absolutely necessary.)
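A minimal backup sketch before attempting recovery, so a failed attempt cannot make things worse. The path /data/hadoop/dfs/name is an assumption; read the real location from dfs.namenode.name.dir in hdfs-site.xml:

# Stop the NameNode, then copy the metadata directory aside.
# /data/hadoop/dfs/name is an assumed path; check dfs.namenode.name.dir.
hdfs --daemon stop namenode
cp -a /data/hadoop/dfs/name /data/hadoop/dfs/name.bak-$(date +%F)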
4. NameNode Connection Refused
Error Message:
java.net.ConnectException: Connection refused
Cause:
- NameNode service is not running.
- Firewall or incorrect network configuration.
Solution:
- Restart NameNode:
hdfs --daemon start namenode
- Check firewall settings:
iptables -L
- Verify correct hostnames in core-site.xml (fs.defaultFS must resolve to the NameNode's actual address), and confirm the RPC port is listening (see the port check below).
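A quick way to tell a stopped service from a network or firewall problem. A minimal sketch, assuming the common default RPC port 8020 (confirm against fs.defaultFS in core-site.xml) and a placeholder hostname namenode-host:

# On the NameNode host: is anything listening on the RPC port?
ss -tlnp | grep 8020
# From a client host: can we reach it through the network/firewall?
nc -zv namenode-host 8020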
5. NameNode High CPU Usage
Cause:
- Too many open file handles.
- Insufficient NameNode heap, causing frequent garbage-collection pauses.
Solution:
- Increase the file descriptor limit (see below for making this persistent):
ulimit -n 100000
- Increase the RPC handler count in hdfs-site.xml for large deployments:
<property>
  <name>dfs.namenode.handler.count</name>
  <value>100</value>
</property>
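The ulimit command above only affects the current shell session. A minimal sketch for making the limit persistent, assuming the NameNode runs as the hdfs user:

# Persist the file-descriptor limit across reboots.
# "hdfs" is an assumption; use the account that actually runs the NameNode.
echo 'hdfs - nofile 100000' >> /etc/security/limits.conf

For dfs.namenode.handler.count, a commonly cited rule of thumb is roughly 20 * ln(number of DataNodes); the value 100 above is a reasonable starting point for mid-sized clusters.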