
Saturday, 10 July 2021

Hadoop Tutorial: What exactly is Hadoop? What is Hadoop used for?

What exactly is Hadoop?

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware.

What is Hadoop and Big Data?

Big Data means data that is huge in size; the term is used to describe a collection of data that is too large to be stored and processed efficiently by traditional tools. Hadoop is one of the most widely used frameworks for storing and processing such data.

Is Hadoop a programming language?

No. Hadoop is a framework, mostly written in the Java programming language, with some native code in C and command-line utilities written as shell scripts.

Is Hadoop a database?

Hadoop is not a traditional database; it is a software ecosystem that allows for massively parallel computing. The ecosystem does, however, include NoSQL distributed databases such as HBase.

Is Hadoop Dead Now?

No, Hadoop is not dead. A number of core projects from the Hadoop ecosystem continue to live on in the Cloudera Data Platform, a product that is very much alive.

All your questions will be answered and discussed here in detail:

Introduction

Apache Hadoop is an open-source software framework that lets you store and process big data in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from a single server to thousands of servers, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failure. This allows Hadoop to handle large volumes of data with minimal impact from failures.

History

Initially, Hadoop was conceived to fix a scalability issue in Apache Nutch, an open source crawler and search engine. At that time Google had published papers about the Google File System (GFS) and MapReduce, a computational framework for parallel processing. Development started in the Apache Nutch project with a successful implementation of these ideas. In 2006 the work was moved out of Apache Nutch into the new Hadoop subproject, and Doug Cutting, who was working at Yahoo! at the time, named it after his son's toy elephant.


Hadoop

Hadoop has a distributed master-slave architecture that consists of the following primary components:

  • Hadoop Distributed File System (HDFS) for data storage (master: NameNode, slaves: DataNodes).
  • Yet Another Resource Negotiator (YARN), a general-purpose scheduler and resource manager (master: ResourceManager, slaves: NodeManagers).
  • MapReduce, a batch-based computational engine, implemented as a YARN application.

HDFS

HDFS is the storage component of Hadoop. It’s a distributed filesystem that’s modeled after the Google File System (GFS) paper. HDFS is optimized for high throughput and works best when reading and writing large files (gigabytes and larger). To support this throughput, HDFS uses unusually large (for a filesystem) block sizes and data locality optimizations to reduce network input/output (I/O).
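To see these values on a live cluster, the configured block size can be read back and a file's block layout inspected. A minimal sketch, assuming an example path /user/hadoop/file1.txt (the path is made up for illustration):

hdfs getconf -confKey dfs.blocksize                            # prints the default block size in bytes (e.g. 134217728 for 128 MB)
hdfs fsck /user/hadoop/file1.txt -files -blocks -locations     # shows how the file is split into blocks and where the replicas live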

Scalability and availability are also key traits of HDFS, achieved in part due to data replication and fault tolerance. Hadoop 2 introduced two significant new features for HDFS—Federation and High Availability (HA):

  • NameNode: The NameNode is the master node in the distributed environment; it maintains the metadata for the blocks of data stored in HDFS, such as block locations, replication factors, etc.
  • DataNode: DataNodes are the slave nodes responsible for storing data in HDFS. The NameNode manages all the DataNodes.
  • Federation allows HDFS metadata to be shared across multiple NameNode hosts, which aids HDFS scalability and also provides data isolation, allowing different applications or teams to run their own NameNodes without fear of impacting other NameNodes on the same cluster.
  • High Availability in HDFS removes the single point of failure that existed in Hadoop 1, where a NameNode disaster would result in a cluster outage. HDFS HA also offers the ability to automate failover (the process by which a standby NameNode takes over work from a failed active NameNode); a small command sketch follows this list.
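The HA state can be checked from the command line with the haadmin tool. A hedged sketch, assuming the NameNode IDs are configured as nn1 and nn2 in dfs.ha.namenodes:

hdfs haadmin -getServiceState nn1      # prints "active" or "standby"
hdfs haadmin -getServiceState nn2
hdfs haadmin -failover nn1 nn2         # manually fail over from nn1 to nn2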

HDFS Commands

        Click this link: HDFS Commands

YARN

Apache YARN (Yet Another Resource Negotiator) is Hadoop’s cluster resource management system. YARN was introduced in Hadoop 2 to improve the MapReduce implementation, but it is general enough to support other distributed computing paradigms as well.

YARN’s architecture is simple because its primary role is to schedule and manage resources in a Hadoop cluster. The core components of YARN are the ResourceManager and the NodeManager. YARN separates resource management from the processing components.

Cluster resource management means managing the resources of the Hadoop cluster, where resources means memory, CPU, and so on. YARN took over this task of cluster management from MapReduce, and MapReduce is now streamlined to perform only data processing, which is what it does best.

YARN has a central ResourceManager component that manages resources and allocates them to applications. Multiple applications can run on Hadoop via YARN, and all applications share common resource management.

  • ResourceManager: It receives processing requests and passes parts of the requests to the corresponding NodeManagers, where the actual processing takes place. It allocates resources to applications based on their needs.
  • NodeManager: A NodeManager is installed on every DataNode and is responsible for executing tasks on that node (see the command sketch after this list).
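The same ResourceManager/NodeManager view is available from the yarn command-line client. A small sketch; the application ID shown is a hypothetical placeholder:

yarn node -list -all                                         # lists NodeManagers and their state
yarn application -list -appStates RUNNING                    # lists applications currently running on YARN
yarn application -status application_1623662453275_0001      # detailed status of a single application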

MAPREDUCE

MapReduce is a batch-based, distributed computing framework modeled after Google’s paper on MapReduce. It allows you to parallelize work over a large amount of raw data. The MapReduce model simplifies parallel processing by abstracting away the complexities involved in working with distributed systems, such as computational parallelization, work distribution, and dealing with unreliable hardware and software. With this abstraction, MapReduce allows the programmer to focus on addressing business needs rather than getting tangled up in distributed system complications.

  • MapReduce consists of two distinct tasks – Map and Reduce.
  • As the name MapReduce suggests, the reducer phase takes place after the mapper phase has been completed.
  • So, the first is the map job, where a block of data is read and processed to produce key-value pairs as intermediate outputs.
  • The output of a Mapper or map job (key-value pairs) is the input to the Reducer.
  • The reducer receives key-value pairs from multiple map jobs.
  • Then, the reducer aggregates those intermediate data tuples (intermediate key-value pairs) into a smaller set of tuples or key-value pairs, which is the final output (a runnable example follows this list).
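A minimal way to see the map and reduce phases in action is to run the WordCount example job that ships with Hadoop. A sketch, assuming input and output paths /user/hadoop/input and /user/hadoop/output (the output directory must not already exist):

hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar wordcount /user/hadoop/input /user/hadoop/output
hdfs dfs -cat /user/hadoop/output/part-r-00000      # the reducer's final key-value output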

Hadoop distributions

Hadoop is an Apache open source project, and regular releases of the software are available for download directly from the Apache project’s website (http://hadoop.apache.org/releases.html#Download). You can either download and install Hadoop from the website or use a commercial distribution of Hadoop, which will give you the added benefits of enterprise administration software and a support team to consult.

Apache

Apache is the organization that maintains the core Hadoop code and distribution. The challenge with the Apache distributions has been that support is limited to the goodwill of the open source community, and there’s no guarantee that your issue will be investigated and fixed. Having said that, the Hadoop community is a very supportive one, and responses to problems are usually rapid.

Cloudera

CDH (Cloudera Distribution Including Apache Hadoop) is the most tenured Hadoop distribution, and it employs a large number of Hadoop (and Hadoop ecosystem) committers. Doug Cutting, who along with Mike Cafarella originally created Hadoop, is the chief architect at Cloudera. In aggregate, this means that bug fixes and feature requests have a better chance of being addressed in Cloudera compared to Hadoop distributions with fewer committers.

Hortonworks

Hortonworks Data Platform (HDP) is also made up of a large number of Hadoop committers, and it offers the same advantages as Cloudera in terms of the ability to quickly address problems and feature requests in core Hadoop and its ecosystem projects. Hortonworks is also the main driver behind the next-generation YARN platform, which is a key strategic piece keeping Hadoop relevant.

Cloudera Hortonworks Merger
On January 3, 2019, Cloudera, the enterprise data cloud company, announced completion of its merger with Hortonworks. ... Knowing this, there must have been a strong driver that brought Cloudera and Hortonworks together.

Thursday, 24 June 2021

HBase errors, issues, and solutions

1) ERROR: KeeperErrorCode = NoNode for /hbase/master

Check:

hbase(main):001:0> list

TABLE

ERROR: KeeperErrorCode = NoNode for /hbase/master

For usage try 'help "list"'

Took 8.2629 seconds

hbase(main):002:0> list

TABLE

ERROR: Call id=15, waitTime=60008, rpcTimeout=60000

For usage try 'help "list"'

Took 488.5749 seconds

Dead region servers

hbase(main):002:0> status

1 active master, 2 backup masters, x servers, x dead, 0.0000 average load

Took 0.0711 seconds

HMASTER UI SHOWING DEAD REGION SERVER


hbase:meta,,1 is not online on 

Solution

In progress
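While the full fix is in progress, a first diagnostic step is to confirm whether the master znode actually exists in ZooKeeper. A hedged sketch, assuming the default parent znode /hbase (zookeeper.znode.parent):

hbase zkcli
  ls /hbase          # 'master' should appear in the list when an active HMaster is registered
  get /hbase/master  # shows which host currently holds the active master lock

If the master znode is missing, the usual next step is to verify the ZooKeeper quorum is healthy and restart the HMaster.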

How to delete a directory from a Hadoop cluster when its name contains commas?

># hdfs dfs -rm -r  /hbase/WALs/wrker-02.xyz.com,16020,1623662453275-splitting

Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/hadoop/tmp

{"type":"log","host":"host_name","category":"YARN-yarn-GATEWAY-BASE","level":"WARN","system":"n05dlkcluster","time": "21/06/23 06:13:57","logger":"util.NativeCodeLoader","timezone":"UTC","log":{"message":"Unable to load native-hadoop library for your platform... using builtin-java classes where applicable"}}

Deleted /hbase/WALs/wrker-02.xyz.com,16020,1623662453275-splitting

How to clear Dead Region Servers in HBase UI?
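A hedged sketch of one way to do this from the HBase shell. The server name shown reuses the host,port,startcode format from the WAL directory above and is only an example; note that clear_deadservers removes the entry from the Master's dead-server list, it does not bring the RegionServer back:

hbase shell
hbase(main):001:0> list_deadservers
hbase(main):002:0> clear_deadservers 'wrker-02.xyz.com,16020,1623662453275'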

Wednesday, 23 June 2021

HBase commands cheat sheet

HBase commands cheat sheet, organized by command group

 COMMAND GROUPS

  Group name: general

  Commands: processlist, status, table_help, version, whoami

  Group name: ddl

  Commands: alter, alter_async, alter_status, clone_table_schema, create, describe, disable, disable_all, drop, drop_all, enable, enable_all, exists, get_table, is_disabled, is_enabled, list, list_regions, locate_region, show_filters

  Group name: namespace

  Commands: alter_namespace, create_namespace, describe_namespace, drop_namespace, list_namespace, list_namespace_tables

  Group name: dml

  Commands: append, count, delete, deleteall, get, get_counter, get_splits, incr, put, scan, truncate, truncate_preserve

  Group name: tools

  Commands: assign, balance_switch, balancer, balancer_enabled, catalogjanitor_enabled, catalogjanitor_run, catalogjanitor_switch, cleaner_chore_enabled, cleaner_chore_run, cleaner_chore_switch, clear_block_cache, clear_compaction_queues, clear_deadservers, close_region, compact, compact_rs, compaction_state, flush, is_in_maintenance_mode, list_deadservers, major_compact, merge_region, move, normalize, normalizer_enabled, normalizer_switch, split, splitormerge_enabled, splitormerge_switch, stop_master, stop_regionserver, trace, unassign, wal_roll, zk_dump

  Group name: replication

  Commands: add_peer, append_peer_exclude_namespaces, append_peer_exclude_tableCFs, append_peer_namespaces, append_peer_tableCFs, disable_peer, disable_table_replication, enable_peer, enable_table_replication, get_peer_config, list_peer_configs, list_peers, list_replicated_tables, remove_peer, remove_peer_exclude_namespaces, remove_peer_exclude_tableCFs, remove_peer_namespaces, remove_peer_tableCFs, set_peer_bandwidth, set_peer_exclude_namespaces, set_peer_exclude_tableCFs, set_peer_namespaces, set_peer_replicate_all, set_peer_serial, set_peer_tableCFs, show_peer_tableCFs, update_peer_config

  Group name: snapshots

  Commands: clone_snapshot, delete_all_snapshot, delete_snapshot, delete_table_snapshots, list_snapshots, list_table_snapshots, restore_snapshot, snapshot

  Group name: configuration

  Commands: update_all_config, update_config

  Group name: quotas

  Commands: list_quota_snapshots, list_quota_table_sizes, list_quotas, list_snapshot_sizes, set_quota

  Group name: security

  Commands: grant, list_security_capabilities, revoke, user_permission

  Group name: procedures

  Commands: list_locks, list_procedures

  Group name: visibility labels

  Commands: add_labels, clear_auths, get_auths, list_labels, set_auths, set_visibility

  Group name: rsgroup

  Commands: add_rsgroup, balance_rsgroup, get_rsgroup, get_server_rsgroup, get_table_rsgroup, list_rsgroups, move_namespaces_rsgroup, move_servers_namespaces_rsgroup, move_servers_rsgroup, move_servers_tables_rsgroup, move_tables_rsgroup, remove_rsgroup, remove_servers_rsgroup
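As a quick sanity check that the shell is working, a minimal session that touches the general, ddl and dml groups (the table and column family names are made up for the example):

hbase shell
hbase(main):001:0> status
hbase(main):002:0> create 'test_table', 'cf'
hbase(main):003:0> put 'test_table', 'row1', 'cf:col1', 'value1'
hbase(main):004:0> scan 'test_table'
hbase(main):005:0> get 'test_table', 'row1'
hbase(main):006:0> disable 'test_table'
hbase(main):007:0> drop 'test_table'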


Saturday, 19 June 2021

HBase: quickly count the number of rows

There are two ways to quickly get a row count from an HBase table.

Scenario #1

If the HBase table is small, log in to the HBase shell with a valid user and execute:

hbase shell
>count '<tablename>'

Example

>count 'employee'

6 row(s) in 0.1110 seconds
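For somewhat larger tables that still fit the shell approach, count accepts optional CACHE and INTERVAL parameters to speed up scanning and control progress reporting; a hedged example against the same table:

>count 'employee', INTERVAL => 10000, CACHE => 1000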
Use RowCounter in HBase: RowCounter is an inbuilt MapReduce job that counts all the rows of a table. It is a good utility to use as a sanity check to ensure that HBase can read all the blocks of a table if there are any concerns about metadata inconsistency. It runs the MapReduce job in a single process, but it will run faster if you have a MapReduce cluster in place for it to exploit. It is very helpful when the HBase table stores a huge amount of data.

Scenario #2

If the HBase table is large, run the inbuilt RowCounter MapReduce job. Log in to a Hadoop machine with a valid user and execute:

$HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter '<tablename>'

Example:

 $HBASE_HOME/bin/hbase org.apache.hadoop.hbase.mapreduce.RowCounter 'employee'

     ....
     ....
     ....
     Virtual memory (bytes) snapshot=22594633728
                Total committed heap usage (bytes)=5093457920
        org.apache.hadoop.hbase.mapreduce.RowCounter$RowCounterMapper$Counters
                ROWS=6
        File Input Format Counters
                Bytes Read=0
        File Output Format Counters
                Bytes Written=0

Saturday, 12 June 2021

hadoop commands cheat sheet

 Hadoop commands cheat sheet | HDFS commands cheat sheet

There are many more commands in "$HADOOP_HOME/bin/hadoop fs" than are demonstrated here; the commands can be invoked with either hadoop fs or hdfs dfs.

hadoop fs -ls <path> list files in the path of the file system

hadoop fs -chmod <arg> <file-or-dir> alters the permissions of a file where <arg> is the binary argument e.g. 777

hadoop fs -chown <owner>:<group> <file-or-dir> change the owner of a file

hadoop fs -mkdir <path> make a directory on the file system

hadoop fs -put <local-origin> <destination> copy a file from the local storage onto file system

hadoop fs -get <origin> <local-destination> copy a file to the local storage from the file system

hadoop fs -copyFromLocal <local-origin> <destination> similar to the put command but the source is restricted to a local file reference

hadoop fs -copyToLocal <origin> <local-destination> similar to the get command but the destination is restricted to a local file reference

hadoop fs -touchz <path> create an empty file on the file system

hadoop fs -cat <file> copy files to stdout

-------------------------------------------------------------------------

"<path>" means any file or directory name. 

"<path>..." means one or more file or directory names. 

"<file>" means any filename. 

----------------------------------------------------------------


-ls <path>

Lists the contents of the directory specified by path, showing the names, permissions, owner, size and modification date for each entry.

-lsr <path>

Behaves like -ls, but recursively displays entries in all subdirectories of path.

-du <path>

Shows disk usage, in bytes, for all the files which match path; filenames are reported with the full HDFS protocol prefix.

-dus <path>

Like -du, but prints a summary of disk usage of all files/directories in the path.

-mv <src><dest>

Moves the file or directory indicated by src to dest, within HDFS.

-cp <src> <dest>

Copies the file or directory identified by src to dest, within HDFS.

-rm <path>

Removes the file or empty directory identified by path.

-rmr <path>

Removes the file or directory identified by path. Recursively deletes any child entries (i.e., files or subdirectories of path).

-put <localSrc> <dest>

Copies the file or directory from the local file system identified by localSrc to dest within the DFS.

-copyFromLocal <localSrc> <dest>

Identical to -put; copies the file or directory from the local file system identified by localSrc to dest within HDFS.

-moveFromLocal <localSrc> <dest>

Copies the file or directory from the local file system identified by localSrc to dest within HDFS, and then deletes the local copy on success.

-cat <filen-ame>

Displays the contents of filename on stdout.

-mkdir <path>

Creates a directory named path in HDFS.

Creates any parent directories in path that are missing (e.g., mkdir -p in Linux).

-setrep [-R] [-w] rep <path>

Sets the target replication factor for files identified by path to rep. (The actual replication factor will move toward the target over time)

-touchz <path>

Creates a file at path containing the current time as a timestamp. Fails if a file already exists at path, unless the file is already size 0.

-test -[ezd] <path>

Tests whether path exists (-e), has zero length (-z), or is a directory (-d); the result is reported via the command's exit status.

-stat [format] <path>

Prints information about path. Format is a string which accepts file size in blocks (%b), filename (%n), block size (%o), replication (%r), and modification date (%y, %Y).

-chmod [-R] mode,mode,... <path>...

Changes the file permissions associated with one or more objects identified by path.... Performs changes recursively with -R. mode is a 3-digit octal mode, or {augo}+/-{rwxX}. Assumes a (all) if no scope is specified and does not apply an umask.

-chown [-R] [owner][:[group]] <path>...

Sets the owning user and/or group for files or directories identified by path.... Sets owner recursively if -R is specified.

-chgrp [-R] group <path>...

Sets the owning group for files or directories identified by path.... Sets group recursively if -R is specified.


LIST FILES

hdfs dfs -ls /  ==> List all the files/directories for the given hdfs destination path.

hdfs dfs -ls -d /hadoop  ==> Directories are listed as plain files. In this case, this command will list the details of the hadoop folder.

hdfs dfs -ls -h /data  ==> Format file sizes in a human-readable fashion (e.g. 64.0m instead of 67108864).

hdfs dfs -ls -R /hadoop  ==> Recursively list all files in the hadoop directory and all subdirectories of the hadoop directory.

hdfs dfs -ls /hadoop/dat*  ==> List all the files matching the pattern. In this case, it will list all the files inside the hadoop directory which start with 'dat'.

OWNERSHIP

hdfs dfs -checksum /hadoop/file1  ==> Dump checksum information for files that match the file pattern <src> to stdout.

hdfs dfs -chmod 755 /hadoop/file1  ==> Changes permissions of the file.

hdfs dfs -chmod -R 755 /hadoop  ==> Changes permissions of the files recursively.

hdfs dfs -chown myuser:mygroup /hadoop  ==> Changes owner of the file; the first value (myuser) is the owner and the second (mygroup) is the group.

hdfs dfs -chown -R hadoop:hadoop /hadoop  ==> Changes owner of the files recursively.

hdfs dfs -chgrp ubuntu /hadoop  ==> Changes group association of the file.

hdfs dfs -chgrp -R ubuntu /hadoop  ==> Changes group association of the files recursively.
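Putting a few of these commands together, a short end-to-end sketch (the paths and file names are made up for the example):

hdfs dfs -mkdir -p /user/hadoop/demo                          # create a working directory
hdfs dfs -put ./sample.txt /user/hadoop/demo/                 # upload a local file
hdfs dfs -ls -h /user/hadoop/demo                             # verify it landed, with human-readable sizes
hdfs dfs -get /user/hadoop/demo/sample.txt ./sample_copy.txt  # download it back
hdfs dfs -rm -r /user/hadoop/demo                             # clean up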

Monday, 26 November 2018

java.lang.StringIndexOutOfBoundsException: String index out of range: -12 in CDH Upgrade

scheduledTime=2018-11-21T10:30:00.000Z}.
2018-11-21 16:00:00,054 INFO 336151100@scm-web-19:com.cloudera.server.web.common.ExceptionReport: (63 skipped) Exception report generated accessing https://10.43.230.133:7183/cmf/services/landingPageStatusContent
java.lang.StringIndexOutOfBoundsException: String index out of range: -12
at java.lang.String.substring(String.java:1967)
at com.cloudera.cmf.model.DbHostHeartbeat.checkJavaComponent(DbHostHeartbeat.java:177)
at com.cloudera.cmf.model.DbHostHeartbeat.getActiveComponentInfo(DbHostHeartbeat.java:207)
at com.cloudera.cmf.service.HostUtils.hasComponent(HostUtils.java:291)
at com.cloudera.server.web.common.Util.isCDHCluster(Util.java:144)
at com.cloudera.server.web.cmf.menu.ClusterActionsMenuHelper.addClusterCommandMenu(ClusterActionsMenuHelper.java:81)
at com.cloudera.server.web.cmf.menu.ClusterActionsMenuHelper.<init>(ClusterActionsMenuHelper.java:36)
at com.cloudera.server.web.cmf.AggregateStatusController.getClusterModel(AggregateStatusController.java:1039)
at com.cloudera.server.web.cmf.AggregateStatusController.servicesTable(AggregateStatusController.java:575)
at sun.reflect.GeneratedMethodAccessor1361.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:176)
at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:436)
at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:424)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:790)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:719)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:669)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:585)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:595)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:78)
at org.mortbay.servlet.GzipFilter.doFilter(GzipFilter.java:131)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at com.jamonapi.http.JAMonServletFilter.doFilter(JAMonServletFilter.java:48)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at com.cloudera.enterprise.JavaMelodyFacade$MonitoringFilter.doFilter(JavaMelodyFacade.java:109)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:311)
at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:116)
at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:83)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
at org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:101)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:113)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
at org.springframework.security.web.authentication.rememberme.RememberMeAuthenticationFilter.doFilter(RememberMeAuthenticationFilter.java:146)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:54)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
at org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:45)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:182)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
at org.springframework.security.web.authentication.logout.LogoutFilter.doFilter(LogoutFilter.java:105)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
at org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:87)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
at org.springframework.security.web.session.ConcurrentSessionFilter.doFilter(ConcurrentSessionFilter.java:125)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:173)
at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:237)
at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:167)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:88)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.handler.StatisticsHandler.handle(StatisticsHandler.java:53)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
2018-11-21 16:00:00,732 INFO com.cloudera.cmf.scheduler-1_Worker-1:com.cloudera.cmf.scheduler.CommandDispatcherJob: Skipping scheduled command 'GlobalPoolsRefresh' since it is a noop.
2018-11-21 16:00:30,569 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=19ms, min=1ms, max=156ms.
2018-11-21 16:01:30,614 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=19ms, min=1ms, max=156ms.
2018-11-21 16:02:30,646 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=19ms, min=1ms, max=156ms.
2018-11-21 16:03:30,684 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=18ms, min=1ms, max=156ms.
2018-11-21 16:04:28,478 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Reaped total of 0 deleted commands
2018-11-21 16:04:28,480 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Found no commands older than 2016-11-21T10:34:28.478Z to reap.
2018-11-21 16:04:28,481 INFO StaleEntityEviction:com.cloudera.server.cmf.node.NodeScannerService: Reaped 0 requests.
2018-11-21 16:04:28,482 INFO StaleEntityEviction:com.cloudera.server.cmf.node.NodeConfiguratorService: Reaped 0 requests.
2018-11-21 16:04:29,353 INFO 1495743394@agentServer-2:com.cloudera.server.common.MonitoringThreadPool: agentServer: execution stats: average=20ms, min=3ms, max=181ms.
2018-11-21 16:04:29,353 INFO 1495743394@agentServer-2:com.cloudera.server.common.MonitoringThreadPool: agentServer: waiting in queue stats: average=0ms, min=0ms, max=15ms.
2018-11-21 16:04:30,716 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=18ms, min=1ms, max=156ms.
2018-11-21 16:05:30,745 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=18ms, min=1ms, max=156ms.
2018-11-21 16:06:30,772 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=18ms, min=1ms, max=156ms.
2018-11-21 16:07:30,811 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=18ms, min=1ms, max=156ms.
2018-11-21 16:08:30,854 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=18ms, min=1ms, max=156ms.
2018-11-21 16:09:30,916 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=18ms, min=1ms, max=156ms.
2018-11-21 16:10:30,950 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=18ms, min=1ms, max=156ms.
2018-11-21 16:11:30,986 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=18ms, min=1ms, max=156ms.
2018-11-21 16:12:31,037 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=17ms, min=9ms, max=48ms.
2018-11-21 16:13:31,085 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=17ms, min=9ms, max=48ms.
2018-11-21 16:14:28,501 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Reaped total of 0 deleted commands
2018-11-21 16:14:28,503 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Found no commands older than 2016-11-21T10:44:28.502Z to reap.
2018-11-21 16:14:28,505 INFO StaleEntityEviction:com.cloudera.server.cmf.node.NodeScannerService: Reaped 0 requests.
2018-11-21 16:14:28,505 INFO StaleEntityEviction:com.cloudera.server.cmf.node.NodeConfiguratorService: Reaped 0 requests.
2018-11-21 16:14:29,754 INFO 1495743394@agentServer-2:com.cloudera.server.common.MonitoringThreadPool: agentServer: execution stats: average=18ms, min=10ms, max=73ms.
2018-11-21 16:14:29,754 INFO 1495743394@agentServer-2:com.cloudera.server.common.MonitoringThreadPool: agentServer: waiting in queue stats: average=0ms, min=0ms, max=1ms.
2018-11-21 16:14:31,127 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=17ms, min=9ms, max=48ms.
2018-11-21 16:14:43,120 INFO ScmActive-0:com.cloudera.server.cmf.components.ScmActive: (119 skipped) ScmActive completed successfully.
2018-11-21 16:14:50,326 WARN pool-180-thread-1:com.cloudera.server.web.cmf.StatusProvider: (144 skipped) Failed to submit task for getting status from SERVICE_MONITORING
com.cloudera.cmon.MgmtServiceNotRunningException: SERVICE_MONITORING is not running
at com.cloudera.cmon.MgmtServiceLocator.getNozzleIPC(MgmtServiceLocator.java:145)
at com.cloudera.server.web.cmf.StatusProvider$SubjectStatusCustomFuture.<init>(StatusProvider.java:618)
at com.cloudera.server.web.cmf.StatusProvider.getStatus(StatusProvider.java:1043)
at com.cloudera.server.web.cmf.AggregateStatusController$HealthInfo.<init>(AggregateStatusController.java:887)
at com.cloudera.server.web.cmf.AggregateStatusController$HealthInfo.<init>(AggregateStatusController.java:855)
at com.cloudera.server.web.cmf.AggregateStatusController$1.load(AggregateStatusController.java:176)
at com.cloudera.server.web.cmf.AggregateStatusController$1$1.call(AggregateStatusController.java:188)
at com.cloudera.server.web.cmf.AggregateStatusController$1$1.call(AggregateStatusController.java:185)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-11-21 16:14:55,759 WARN EventStorePublisherWithRetry-0:com.cloudera.cmf.event.publish.EventStorePublisherWithRetry: Failed to publish event: SimpleEvent{attributes={CATEGORY=[AUDIT_EVENT], EVENTCODE=[EV_LOGIN_SUCCESS], SERVICE=[ClouderaManager], USER=[admin], SERVICE_TYPE=[ManagerServer], MESSAGE_CODES=[LOGIN_SUCCESS], SEVERITY=[INFORMATIONAL]}, content=User admin logged in successfully., timestamp=1542795294547} - 1 of 343 failure(s) in last 1801s
2018-11-21 16:15:01,328 WARN pool-180-thread-1:com.cloudera.server.web.cmf.StatusProvider: (123 skipped) Failed to submit task for getting status from HOST_MONITORING
com.cloudera.cmon.MgmtServiceNotRunningException: HOST_MONITORING is not running
at com.cloudera.cmon.MgmtServiceLocator.getNozzleIPC(MgmtServiceLocator.java:145)
at com.cloudera.server.web.cmf.StatusProvider$SubjectStatusCustomFuture.<init>(StatusProvider.java:618)
at com.cloudera.server.web.cmf.StatusProvider.getStatus(StatusProvider.java:1034)
at com.cloudera.server.web.cmf.AggregateStatusController$HealthInfo.<init>(AggregateStatusController.java:887)
at com.cloudera.server.web.cmf.AggregateStatusController$HealthInfo.<init>(AggregateStatusController.java:855)
at com.cloudera.server.web.cmf.AggregateStatusController$1.load(AggregateStatusController.java:176)
at com.cloudera.server.web.cmf.AggregateStatusController$1$1.call(AggregateStatusController.java:188)
at com.cloudera.server.web.cmf.AggregateStatusController$1$1.call(AggregateStatusController.java:185)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
2018-11-21 16:15:01,337 INFO 1270484339@scm-web-32:com.cloudera.server.web.common.ExceptionReport: (82 skipped) Exception report generated accessing https://10.43.230.133:7183/cmf/services/landingPageStatusContent
java.lang.StringIndexOutOfBoundsException: String index out of range: -12
at java.lang.String.substring(String.java:1967)
at com.cloudera.cmf.model.DbHostHeartbeat.checkJavaComponent(DbHostHeartbeat.java:177)
at com.cloudera.cmf.model.DbHostHeartbeat.getActiveComponentInfo(DbHostHeartbeat.java:207)
at com.cloudera.cmf.service.HostUtils.hasComponent(HostUtils.java:291)
at com.cloudera.server.web.common.Util.isCDHCluster(Util.java:144)
at com.cloudera.server.web.cmf.menu.ClusterActionsMenuHelper.addClusterCommandMenu(ClusterActionsMenuHelper.java:81)
at com.cloudera.server.web.cmf.menu.ClusterActionsMenuHelper.<init>(ClusterActionsMenuHelper.java:36)
at com.cloudera.server.web.cmf.AggregateStatusController.getClusterModel(AggregateStatusController.java:1039)
at com.cloudera.server.web.cmf.AggregateStatusController.servicesTable(AggregateStatusController.java:575)
at sun.reflect.GeneratedMethodAccessor1361.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:176)
at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:436)
at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:424)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:790)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:719)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:669)
at org.springframework.web.servlet.FrameworkServlet.doPost(FrameworkServlet.java:585)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:595)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:668)
at org.mortbay.jetty.servlet.ServletHolder.handle(ServletHolder.java:511)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1221)
at org.mortbay.servlet.UserAgentFilter.doFilter(UserAgentFilter.java:78)
at org.mortbay.servlet.GzipFilter.doFilter(GzipFilter.java:131)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at com.jamonapi.http.JAMonServletFilter.doFilter(JAMonServletFilter.java:48)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at com.cloudera.enterprise.JavaMelodyFacade$MonitoringFilter.doFilter(JavaMelodyFacade.java:109)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:311)
at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:116)
at org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:83)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
at org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:113)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
at org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:101)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
at org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:113)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
at org.springframework.security.web.authentication.rememberme.RememberMeAuthenticationFilter.doFilter(RememberMeAuthenticationFilter.java:146)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
at org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:54)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
at org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:45)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
at org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:182)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
at org.springframework.security.web.authentication.logout.LogoutFilter.doFilter(LogoutFilter.java:105)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
at org.springframework.security.web.context.SecurityContextPersistenceFilter.doFilter(SecurityContextPersistenceFilter.java:87)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
at org.springframework.security.web.session.ConcurrentSessionFilter.doFilter(ConcurrentSessionFilter.java:125)
at org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:323)
at org.springframework.security.web.FilterChainProxy.doFilter(FilterChainProxy.java:173)
at org.springframework.web.filter.DelegatingFilterProxy.invokeDelegate(DelegatingFilterProxy.java:237)
at org.springframework.web.filter.DelegatingFilterProxy.doFilter(DelegatingFilterProxy.java:167)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:88)
at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:76)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:767)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.handler.StatisticsHandler.handle(StatisticsHandler.java:53)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:928)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:549)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:410)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
2018-11-21 16:15:28,270 INFO CMMetricsForwarder-0:com.cloudera.server.cmf.components.ClouderaManagerMetricsForwarder: (29 skipped) Failed to send metrics.
java.lang.reflect.UndeclaredThrowableException
at com.sun.proxy.$Proxy133.writeMetrics(Unknown Source)
at com.cloudera.server.cmf.components.ClouderaManagerMetricsForwarder.sendWithAvro(ClouderaManagerMetricsForwarder.java:325)
at com.cloudera.server.cmf.components.ClouderaManagerMetricsForwarder.sendMetrics(ClouderaManagerMetricsForwarder.java:312)
at com.cloudera.server.cmf.components.ClouderaManagerMetricsForwarder.run(ClouderaManagerMetricsForwarder.java:146)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180)
at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.avro.AvroRemoteException: java.net.ConnectException: Connection refused (Connection refused)
at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:88)
... 11 more
Caused by: java.net.ConnectException: Connection refused (Connection refused)
at java.net.PlainSocketImpl.socketConnect(Native Method)
at java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:350)
at java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:206)
at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:188)
at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
at java.net.Socket.connect(Socket.java:589)
at sun.net.NetworkClient.doConnect(NetworkClient.java:175)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:463)
at sun.net.www.http.HttpClient.openServer(HttpClient.java:558)
at sun.net.www.http.HttpClient.<init>(HttpClient.java:242)
at sun.net.www.http.HttpClient.New(HttpClient.java:339)
at sun.net.www.http.HttpClient.New(HttpClient.java:357)
at sun.net.www.protocol.http.HttpURLConnection.getNewHttpClient(HttpURLConnection.java:1220)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect0(HttpURLConnection.java:1156)
at sun.net.www.protocol.http.HttpURLConnection.plainConnect(HttpURLConnection.java:1050)
at sun.net.www.protocol.http.HttpURLConnection.connect(HttpURLConnection.java:984)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream0(HttpURLConnection.java:1334)
at sun.net.www.protocol.http.HttpURLConnection.getOutputStream(HttpURLConnection.java:1309)
at org.apache.avro.ipc.HttpTransceiver.writeBuffers(HttpTransceiver.java:71)
at org.apache.avro.ipc.Transceiver.transceive(Transceiver.java:58)
at org.apache.avro.ipc.Transceiver.transceive(Transceiver.java:72)
at org.apache.avro.ipc.Requestor.request(Requestor.java:147)
at org.apache.avro.ipc.Requestor.request(Requestor.java:101)
at org.apache.avro.ipc.specific.SpecificRequestor.invoke(SpecificRequestor.java:72)
... 11 more
2018-11-21 16:15:31,171 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=17ms, min=9ms, max=48ms.
2018-11-21 16:16:31,215 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=17ms, min=9ms, max=48ms.
2018-11-21 16:17:31,253 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=17ms, min=9ms, max=48ms.
2018-11-21 16:18:31,276 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=17ms, min=9ms, max=48ms.
2018-11-21 16:19:31,322 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=17ms, min=9ms, max=38ms.
2018-11-21 16:20:31,364 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=17ms, min=9ms, max=38ms.
2018-11-21 16:21:31,396 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=17ms, min=9ms, max=38ms.
2018-11-21 16:22:31,423 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=17ms, min=9ms, max=39ms.
2018-11-21 16:23:31,473 INFO avro-servlet-hb-processor-3:com.cloudera.server.common.AgentAvroServlet: (35 skipped) AgentAvroServlet: heartbeat processing stats: average=16ms, min=8ms, max=39ms.
2018-11-21 16:24:28,524 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Reaped total of 0 deleted commands
2018-11-21 16:24:28,526 INFO StaleEntityEviction:com.cloudera.server.cmf.StaleEntityEvictionThread: Found no commands older than 2016-11-21T10:54:28.524Z to reap.
2018-11-21 16:24:28,527 INFO StaleEntityEviction:com.cloudera.server.cmf.node.NodeScannerService: Reaped 0 requests.
2018-11-21 16:24:28,527 INFO StaleEntityEviction:com.cloudera.server.cmf.node.NodeConfiguratorService: Reaped 0 requests.
2018-11-21 16:24:30,179 INFO 1495743394@agentServer-2:com.cloudera.server.common.MonitoringThreadPool: agentServer: execution stats: average=17ms, min=8ms, max=73ms.
2018-11-21 16:24:30,179 INFO 1495743394@agentServer-2:com.cloudera.server.common.MonitoringThreadPool

Thursday, 21 December 2017

Hadoop 3 works with HDFS Erasure Coding

Hadoop 3 adds HDFS Erasure Coding

Hadoop 3 is now available, with improvements including support for HDFS erasure coding, a preview of v2 of the YARN Timeline Service, and improvements to YARN/HDFS federation.

Today Hadoop is a framework used to process large data sets across clusters of computers using simple programming models.

The addition of HDFS erasure coding should make data more durable and reduce the amount of storage needed by HDFS.

The default three-times replication scheme in HDFS has a 200 per cent overhead in storage space and other resources such as network bandwidth.

For many datasets with relatively low I/O activity, additional block replicas are rarely accessed during normal operations, but still consume the same amount of resources as the first replica. If erasure coding is used in place of replication, the storage overhead is no more than 50 per cent. HDFS erasure coding uses RAID-like striping: data is logically stored in the form of blocks, and the blocks are stored across different disks. For each group of blocks, parity is calculated and stored. This is the encoding, and any error can be recovered by back-calculating using the parity.

The new release also includes a preview of the YARN Timeline Service v.2, which offers better scalability, reliability, and usability of the Timeline Service. The service is responsible for persisting application specific information, and for persisting generic information about completed applications.

Understand HDFS Erasure Coding 

Erasure Coding improves Capacity Utilization & Performance for Data Storage Systems

HDFS by default replicates each block three times. Replication provides a simple and robust form of redundancy to shield against most failure scenarios. It also eases scheduling compute tasks on locally stored data blocks by providing multiple replicas of each block to choose from.

However, replication is expensive: the default 3x replication scheme incurs a 200% overhead in storage space and other resources (e.g., network bandwidth when writing the data). For datasets with relatively low I/O activity, the additional block replicas are rarely accessed during normal operations, but still consume the same amount of storage space.

Therefore, a natural improvement is to use erasure coding (EC) in place of replication, which uses far less storage space while still providing the same level of fault tolerance. Under typical configurations, EC reduces the storage cost by ~50% compared with 3x replication
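In Hadoop 3, erasure coding is managed per directory with the hdfs ec subcommand. A hedged sketch, assuming a directory /data/cold for infrequently accessed data and the built-in RS-6-3-1024k policy:

hdfs ec -listPolicies                                        # show the available erasure coding policies
hdfs ec -enablePolicy -policy RS-6-3-1024k                   # enable the policy on the cluster
hdfs ec -setPolicy -path /data/cold -policy RS-6-3-1024k     # new files under /data/cold will be erasure coded
hdfs ec -getPolicy -path /data/cold                          # confirm the policy set on the directory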


Advantages of Erasure Coding in Hadoop

  • Saving substantial space – Blocks are initially triplicated; once they are sealed (no longer modified), a background task encodes them and deletes the extra replicas.
  • Flexible policy – Users and admins can flag files as hot or cold; hot files are kept replicated even after they are sealed.
  • Fast recovery – HDFS block errors are discovered and recovered both actively (in the background) and passively (on the read path).
  • Low overhead – The parity overhead is at most 50%.
  • Transparency/compatibility – HDFS user should be able to use all basic and advanced features on erasure coded data, including snapshot, encryption, appending, caching and so forth.

YARN Improvements

YARN is a framework for job scheduling and cluster resource management.

YARN federation is used to scale a single YARN cluster to tens of thousands of nodes by federating multiple YARN sub-clusters.

Support for YARN resource types has also been added, making it possible to schedule additional resources such as disks and GPUs for better integration with machine learning and container workloads.


Other improvements include the ability to federate YARN and HDFS subclusters transparently; and opportunistic container execution to improve resource utilization and increase task throughput for short-lived containers. Support for cloud storage systems such as Amazon S3  and Azure Data Lake has also been improved.

Monday, 18 December 2017

Apache Software Foundation announce Apache Hadoop 3.0.0 GA!

Wow Apache Hadoop reaches 3.0

!!Hadoop was born around 2007, and by 2017 it's part of life!!


The Apache Software Foundation has announced version three of the open source software framework for distributed computing. 



It incorporates over 6,000 changes to the open source software framework for distributed computing since Hadoop 2.7.0.
Apache Hadoop 3.0 is the first major release since Hadoop 2 was released in 2013.

“Hadoop 3 is a major milestone for the project, and our biggest release ever,” said Andrew Wang, Apache Hadoop 3 release manager. “It represents the combined efforts of hundreds of contributors over the five years since Hadoop 2. I’m looking forward to how our users will benefit from new features in the release that improve the efficiency, scalability, and reliability of the platform.”

Apache Hadoop has become known for its ability to run and manage data applications on large hardware clusters in the Big Data ecosystem. The latest release features HDFS erasure coding, a preview of YARN Timeline Service version 2, YARN resource types, and improved capabilities and performance enhancements around cloud storage systems. It includes Hadoop Common for supporting other Hadoop modules, the Hadoop Distributed File System, Hadoop YARN and Hadoop MapReduce.

“This latest release unlocks several years of development from the Apache community,” said Chris Douglas, vice president of Apache Hadoop. “The platform continues to evolve with hardware trends and to accommodate new workloads beyond batch analytics, particularly real-time queries and long-running services. At the same time, our Open Source contributors have adapted Apache Hadoop to a wide range of deployment environments, including the Cloud.”

Apache Hadoop is widely deployed in enterprises and companies like Adobe, AWS, Apple, Cloudera, eBay, Facebook, Google, Hortonworks, IBM, Intel, LinkedIn, Microsoft, Netflix and Teradata. 

In addition, it has inspired other Hadoop related projects such as: Apache Cassandra, HBase, Hive, Spark and ZooKeeper.


All you want To Know About Apache Hadoop 3.0.0


!!!Apache Hadoop 3.0: Major changes!!

HDFS erasure coding


Halves the storage cost of HDFS while also improving data durability.

Minimum required Java version increased to Java 8

All Hadoop JARs are now compiled targeting a runtime version of Java 8, which means that those of you who are still using Java 7 or below should upgrade to Java 8.


Early preview of YARN Timeline Service major revision


Hadoop 3.0 also brings an early preview (alpha 2) of a major revision of YARN Timeline Service: v.2, which addresses two major challenges:
  • improving scalability and reliability of Timeline Service
  • enhancing usability by introducing flows and aggregation


Shell script rewrite

The Hadoop shell scripts have been rewritten to fix many long-standing bugs and include some new features. However, keep in mind that some changes could break existing installations. You’ll find the incompatible changes in the release notes, with related discussion on HADOOP-9902.


MapReduce task-level native optimization


MapReduce has added support for a native implementation of the map output collector. For shuffle-intensive jobs, this can lead to a performance improvement of 30% or more.


Shaded client jars


The hadoop-client Maven artifact available in 2.x releases pulls Hadoop’s transitive dependencies onto a Hadoop application’s classpath. This can be problematic if the versions of these transitive dependencies conflict with the versions used by the application. Hadoop 3 adds new hadoop-client-api and hadoop-client-runtime artifacts that shade these dependencies into a single jar, keeping them off the application’s classpath.


Over the past decade, Apache Hadoop has become ubiquitous within the greater Big Data ecosystem by enabling firms to run and manage data applications on large hardware clusters in a distributed computing environment.

Default ports of multiple services have been changed.

Previously, the default ports of multiple Hadoop services were in the Linux ephemeral port range (32768-61000). This meant that at startup, services would sometimes fail to bind to the port due to a conflict with another application. In Hadoop 3 these default ports have been moved out of the ephemeral range.

Apache Hadoop 3.0.0 highlights:

  • YARN Timeline Service v.2 —improves the scalability, reliability, and usability of the Timeline Service;
  • YARN resource types —enables scheduling of additional resources, such as disks and GPUs, for better integration with machine learning and container workloads;
  • Federation of YARN and HDFS subclusters transparently scales Hadoop to tens of thousands of machines;
  • Opportunistic container execution improves resource utilization and increases task throughput for short-lived containers. In addition to its traditional, central scheduler, YARN also supports distributed scheduling of opportunistic containers; and 
  • Improved capabilities and performance improvements for cloud storage systems such as Amazon S3 (S3Guard), Microsoft Azure Data Lake, and Aliyun Object Storage System.

Hadoop 3.0.0 release Details:


After four alpha releases and one beta release, 3.0.0 is generally available. 3.0.0 consists of 302 bug fixes, improvements, and other enhancements since 3.0.0-beta1. Altogether, 6242 issues were fixed as part of the 3.0.0 release series since 2.7.0.

Here is the series of alpha and beta releases that led up to the Hadoop 3.0.0 GA.

3.0.0-alpha1   2016-09-03   (released)
3.0.0-alpha2   2017-01-25   (released)
3.0.0-alpha3   2017-05-26   (released)
3.0.0-alpha4   2017-07-07   (released)
3.0.0-beta1    2017-10-03   (released)
3.0.0 GA       2017-12-13   (released)


"Hadoop Is Here To Stay" —Forrester