Friday, 9 December 2022

Is Hadoop Still in Demand in 2023

 When we look at predictions about the Big Data industry, the trend doesn’t seem to be slowing down any time soon. Learning skills such as Hadoop, Spark, and Kafka can land you promising Big Data jobs. The global Hadoop market is projected to grow at a CAGR of 33% between 2019 and 2024.

Does Hadoop have a future?

Since it all started, the number of open-source projects and startups in the Big Data world has kept increasing year after year (just take a look at the 2021 landscape to see how huge it has become). I remember that around 2012 some people were predicting that the SQL wars would end and true victors would eventually emerge. That has not happened yet. How all of this will evolve is very difficult to predict; it will take a few more years for the dust to settle. But if I had to take some wild guesses, I would make the following predictions.

What is the future scope of Hadoop?

What is the future of Hadoop?

Is Hadoop outdated?

Hadoop Market is expected to reach $340.35 billion by 2027, growing at a CAGR of 37.5% from 2020 to 2027

Hadoop in 2023

As others have already noted, the main existing data platforms (Databricks, Snowflake, BigQuery, Azure Synapse) will keep improving and adding new features to close the gaps between one another. I expect to see more and more connectivity between every component, and also between data languages like SQL and Python.

We might see a slowdown of the number of new projects and companies in the next couple of years, although this would be more from a lack of funding after the burst of a new dotcom bubble (if this ever happens) than from a lack of will or ideas.

Since the beginning, the scarcest resource has been a skilled workforce. This means that for most companies it has been simpler to throw more money at performance problems, or to migrate to more cost-effective solutions, than to spend more time optimizing them, especially now that storage costs in the main distributed warehouses have become so cheap. But perhaps at some point the price competition between vendors will become harder to sustain, and prices will go up. Even if prices don’t go up, the volume of data stored by businesses keeps increasing year after year, and the cost of inefficiency grows with it. Perhaps at some point we will see a new trend where people start looking for new, cheaper open-source alternatives, and a new Hadoop-like cycle will start again.

In the long term, I believe the real winners will be the cloud providers: Google, Amazon and Microsoft. All they have to do is wait and see which way the wind blows, bide their time, then acquire (or simply reproduce) the technologies that work best. Each tool that gets integrated into their clouds makes things much easier and more seamless for users, especially when it comes to security, governance, access control, and cost management.

Is Hadoop still in demand in 2023?

Is Hadoop worth learning in 2023?

Yes

What will replace Hadoop?

Possible Top 10 Alternatives to Hadoop HDFS

Top 10 Hadoop HDFS Alternatives 2022

  • Google Cloud BigQuery
  • Databricks Lakehouse Platform
  • Cloudera
  • Hortonworks Data Platform
  • Snowflake
  • Google Cloud Dataproc
  • Microsoft SQL Server
  • Vertica

Saturday, 21 May 2022

What is Hadoop as a service (HaaS)?

 Hadoop as a service (HaaS), also known as Hadoop in the cloud, is a big data analytics framework that stores and analyzes data in the cloud using Hadoop. Users do not have to invest in or install additional infrastructure on premises when using the technology, as HaaS is provided and managed by a third-party vendor.

Hadoop or Spark?

  1. Performance: Spark is generally faster because it uses random access memory (RAM) instead of reading and writing intermediate data to disk (see the sketch after this list). Hadoop stores data on multiple sources and processes it in batches via MapReduce.
  2. Cost: Hadoop runs at a lower cost since it relies on any disk storage type for data processing. Spark runs at a higher cost because it relies on in-memory computations for real-time data processing, which requires it to use high quantities of RAM to spin up nodes.
  3. Processing: Though both platforms process data in a distributed environment, Hadoop is ideal for batch processing and linear data processing. Spark is ideal for real-time processing and processing live unstructured data streams.
  4. Scalability: When data volume grows rapidly, Hadoop quickly scales to accommodate the demand via the Hadoop Distributed File System (HDFS). Spark, in turn, relies on the fault-tolerant HDFS for large volumes of data.
  5. Security: Spark enhances security with authentication via shared secret or event logging, whereas Hadoop uses multiple authentication and access control methods. Though, overall, Hadoop is more secure, Spark can integrate with Hadoop to reach a higher security level.
  6. Machine learning (ML): Spark is the superior platform in this category because it includes MLlib, which performs iterative in-memory ML computations. It also includes tools that perform regression, classification, persistence, pipeline construction, evaluation, etc.
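
To make the performance point above concrete, here is a minimal, hypothetical sketch in Spark's Java API showing how cache() keeps an intermediate dataset in memory so later actions reuse it instead of recomputing it. The class name, sample data and local master setting are illustrative assumptions, not something from this post.

import java.util.Arrays;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SparkCacheDemo {
    public static void main(String[] args) {
        // local[*] runs Spark in-process; on a real cluster you would submit via spark-submit.
        SparkConf conf = new SparkConf().setAppName("cache-demo").setMaster("local[*]");
        try (JavaSparkContext sc = new JavaSparkContext(conf)) {
            JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

            // cache() keeps the computed RDD in RAM, so the second action below
            // reuses it instead of recomputing -- the in-memory processing that
            // makes Spark faster than disk-based MapReduce for iterative work.
            JavaRDD<Integer> squares = numbers.map(x -> x * x).cache();

            System.out.println("sum   = " + squares.reduce(Integer::sum));
            System.out.println("count = " + squares.count());
        }
    }
}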

Wednesday, 20 April 2022

UnsupportedFileSystemException No FileSystem for scheme "hdfs"

org.apache.hadoop.fs.UnsupportedFileSystemException: No FileSystem for scheme "hdfs"
	at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:)
	at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:)
	at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:)
	at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:)
	at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:)
	at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:)

Solution 

This error is due to the unavailability of a required library during FileSystem object creation.

Add the hadoop-hdfs and hadoop-hdfs-client jars as runtime dependencies to your project:

POM:

.....
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs-client</artifactId>
  <version>3.0.0</version>
</dependency>
<dependency>
  <groupId>org.apache.hadoop</groupId>
  <artifactId>hadoop-hdfs</artifactId>
  <version>3.0.0</version>
</dependency>
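
If the jars are on the classpath and the error still appears (a common case with shaded/uber jars, where the META-INF/services file that registers the hdfs scheme is lost during merging), you can bind the scheme to its implementation class explicitly. Below is a minimal sketch under that assumption; the namenode URI is a placeholder, not a value from this post.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsSchemeDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Explicitly map the "hdfs" scheme to its implementation so FileSystem.get()
        // no longer depends on ServiceLoader metadata that a shaded jar may have dropped.
        conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());

        // "hdfs://namenode:8020" is a placeholder; use your cluster's fs.defaultFS value.
        try (FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf)) {
            System.out.println("Root exists: " + fs.exists(new Path("/")));
        }
    }
}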

-------------------------------------------------------------------------------------------------------

Saturday, 9 April 2022

Caused by: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.

Error while running Hadoop on Windows

 Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems

at org.apache.hadoop.util.Shell.getWinUtilsPath(Shell.java:737)

at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:272)

at org.apache.hadoop.util.Shell.getSetPermissionCommand(Shell.java:288)

at org.apache.hadoop.fs.RawLocalFileSystem.setPermission(RawLocalFileSystem.java:777)

at org.apache.hadoop.fs.RawLocalFileSystem.mkOneDirWithMode(RawLocalFileSystem.java:522)

at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:562)

at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:534)

at org.apache.hadoop.fs.RawLocalFileSystem.mkdirsWithOptionalPermission(RawLocalFileSystem.java:561)

at org.apache.hadoop.fs.RawLocalFileSystem.mkdirs(RawLocalFileSystem.java:534)

at org.apache.hadoop.fs.ChecksumFileSystem.mkdirs(ChecksumFileSystem.java:705)

at com.nokia.cemod.ice.rest.controller.HDFSDemo.createDir(HDFSDemo.java:42)

at com.nokia.cemod.ice.rest.controller.HDFSDemo.main(HDFSDemo.java:24)

Caused by: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems

at org.apache.hadoop.util.Shell.fileNotFoundException(Shell.java:549)

Solution

Hadoop requires native libraries on Windows to work properly; that includes accessing the file:// filesystem, where Hadoop uses some Windows APIs to implement POSIX-like file access permissions.

This is implemented in HADOOP.DLL and WINUTILS.EXE.

In particular, %HADOOP_HOME%\BIN\WINUTILS.EXE must be locatable.

If it is not, Hadoop or an application built on top of Hadoop will fail.

How to fix a missing WINUTILS.EXE

You can fix this problem in two ways:

  1. Install a full native windows Hadoop version. The ASF does not currently (September 2015) release such a version; releases are available externally.
  2. Or: get the WINUTILS.EXE binary from a Hadoop redistribution. There is a repository of this for some Hadoop versions on github.

Then

  1. Set the environment variable %HADOOP_HOME% to point to the directory above the BIN dir containing WINUTILS.EXE (for example C:\hadoop when winutils.exe lives in C:\hadoop\bin).
  2. Or: run the Java process with the system property hadoop.home.dir set to that home directory (see the sketch after this list).
  3. In the Eclipse/Studio job configuration, open the Run > Advanced settings tab. In the JVM Settings section, select the Use specific JVM arguments check box, click the New button, and add an argument like -Dhadoop.home.dir=C:\hadoop.
  4. Also set HADOOP_HOME in your development environment, pointing to the same directory.
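
If you cannot (or prefer not to) change environment variables or JVM arguments, the same system property can be set programmatically before any Hadoop class touches the local filesystem. This is only a small sketch; C:\hadoop is an assumed location and must contain bin\winutils.exe (and hadoop.dll).

public class WinutilsSetup {
    public static void main(String[] args) {
        // Assumed path: the directory ABOVE bin, where bin\winutils.exe lives.
        System.setProperty("hadoop.home.dir", "C:\\hadoop");

        // ... create FileSystem objects / run Hadoop jobs only after this point,
        // otherwise Shell.getWinUtilsPath() will still fail with the error above.
    }
}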

Saturday, 18 December 2021

What is Big data as a service (BDaaS)

What is Big Data as a Service 

BDaaS encompasses the software, data warehousing, infrastructure and platform service models in order to deliver advanced analysis of large data sets, generally through a cloud-based network.

Big data as a service is the delivery of data platforms and tools by a cloud provider to help organizations process, manage and analyze large data sets so they can generate insights in order to improve business operations and gain a competitive advantage.

BDaaS = DaaS + HaaS + data analytics as a service.

Benefits of BDaaS

Initially, most big data systems were installed in on-premises data centers, primarily by large enterprises that combined various open source technologies to fit their particular big data applications and use cases. But deployments have shifted more to the cloud because of its potential advantages. In particular, big data as a service offers the following benefits to users:

  • Reduced complexity. Because of their customized nature, big data environments are complicated to design, deploy and manage. Using cloud infrastructure and managed services can simplify the process by eliminating much of the hands-on work that organizations need to do.
  • Easier scalability. In many environments, data processing workloads aren't consistent. For example, big data analytics applications often run intermittently or just once. BDaaS makes it easy to scale up systems when processing needs increase and to scale them down again after jobs are completed.
  • Increased flexibility. In addition to scaling systems up or down as needed, BDaaS users can more easily add or remove platforms, technologies and tools to meet evolving business requirements than typically is possible in on-premises big data architectures.
  • Potential cost savings. Using the cloud may reduce IT costs by enabling businesses to avoid the need to buy new hardware and software and to hire workers with big data management skills. But pay-as-you-go cloud services must be monitored to prevent unnecessary processing expenses from driving up their cost.
  • Stronger security. Concerns about data security kept many organizations from adopting the cloud at first, particularly in regulated industries. In many cases, though, cloud vendors and service providers are able to invest in better security protections than individual companies can.

Large enterprises lead big data as a service investment

As mentioned, the SMB market doesn’t account for the largest share of the Big-Data-as-a-Service market. Small- and medium-sized businesses only accounted for around a quarter of the USD 5,356.8 million value of the BDaaS market in 2018. However, during the forecast period, the small and medium-sized business segment is expected to grow fastest.

What is Data as a Service (DaaS)?

Data as a Service (DaaS) 

Data as a service, or DaaS, is a term used to describe cloud-based software tools used for working with data, such as managing data in a data warehouse or analyzing data with business intelligence.

Data as a Service (DaaS) is one of the most ambiguous offerings in the "as a service" family. Yet, in today's world, data and analytics are the keys to building a competitive advantage. We're clearing up the confusion around DaaS and helping your company understand when and how to tap into this service.

Data as a service (DaaS) is a data management strategy that uses the cloud to deliver data storage, integration, processing, and/or analytics services via a network connection.

What are the benefits of data as a service?

DaaS increases the speed of access to the necessary data by exposing the data in a flexible but simple way. Users can quickly take action without needing a comprehensive understanding of where the data is stored or how it is indexed.

Compared to on-premises data storage and management, DaaS provides several key advantages with regard to speed, reliability, and performance. They include:

  • Minimal setup time: Organizations can begin storing and processing data almost immediately using a DaaS solution.
  • Improved functionality: Cloud infrastructure is less likely to fail, making DaaS workloads less prone to downtime or disruptions.
  • Greater flexibility: DaaS is more scalable and flexible than the on-premises alternative, since more resources can be allocated to cloud workloads instantaneously.
  • Cost savings: Data management and processing costs are easier to optimize with a DaaS solution. Companies can allocate just the right amount of resources to their data workloads in the cloud and increase or decrease those allocations as needs change.
  • Automated maintenance: The tools and services on DaaS platforms are automatically managed and kept up-to-date by the DaaS provider, eliminating the need for end-users to manage the tools themselves.
  • Smaller staff requirements: When using a DaaS platform, organizations do not need to maintain in-house staff who specialize in data tool set up and management. These tasks are handled by the DaaS provider.

Data as a Service is one of 3 categories of big data business models based on their value propositions and customers:
  • Answers as a Service;
  • Information as a Service;
  • Data as a Service.



Friday, 17 December 2021

Hadoop as a Service (HaaS)

 

 What is Hadoop as a Service (HaaS) ?

While the world is busy with SaaS, PaaS and CaaS, a new term, HaaS, is now also gaining curiosity.

Hadoop as a service (HaaS), also known as Hadoop in the cloud, is a big data analytics framework that stores and analyzes data in the cloud using Hadoop. Users do not have to invest in or install additional infrastructure on premises when using the technology, as HaaS is provided and managed by a third-party vendor.

Definition of HaaS

HaaS (commonly referred to as Hadoop in the cloud) is a Big Data analytics framework. This framework analyzes and stores data in the cloud using Hadoop. To use HaaS, there is no need to install or invest in extra infrastructure on premises; the technology is offered and managed by a third party. In other words, HaaS is a term which describes virtual data analysis and storage in the cloud. It arises as an alternative to on-premise Hadoop.

Features                                                         

HaaS providers offer a variety of features and support, including:
  • Hadoop framework deployment support.
  • Hadoop cluster management.
  • Alternative programming languages.
  • Data transfer between clusters.
  • Customizable and user-friendly dashboards and data manipulation.
  • Security features.

Why HaaS As A Cloud Computing Solution?


Apache Hadoop as a Service, when provided as a cloud computing solution, is aimed at making medium- and large-scale data processing easier, faster, more accessible and more cost effective. To help a business focus on growth, HaaS removes the operational challenges that emerge while running Hadoop.

With outstanding features like unlimited scalability and on-demand access to storage capacity and computing, cloud computing blends well with this Big Data processing technology. Compared with on-premise solutions, Hadoop as a Service providers offer various distinct advantages, as given below:

1. Fully Integrated Big Data Software

Hadoop as a Service comes fully powered with the Hadoop ecosystem, comprising Hive, Pig, MapReduce, Presto, Oozie, Spark and Sqoop. HaaS also offers connectors for integrating data and for creating data pipelines that coordinate with existing data pipelines.

2. On-Demand Elastic Cluster

As data processing requirements change, Hadoop clusters in the cloud scale up and down, providing more operational efficiency than static clusters deployed on-premises. Performance also improves because nodes are automatically added to or removed from the clusters depending on the size of the data.

3. Cluster Management Made Easier

Opting for cloud-based HaaS provides a fully configured Hadoop cluster, relieving you of the need to invest extra time and resources in setting up clusters, scaling infrastructure and managing nodes.

4. Cost Economical 

One of the major reasons Hadoop in the cloud has become so popular is its cost effectiveness. Businesses are not required to invest in on-site infrastructure or IT support, on-demand instances can deliver up to 90 percent savings, and with auto-scaling clusters you pay only for the capacity you actually use.

Monday, 22 November 2021

CDH Troubleshooting Upgrades

  •  Cluster hosts do not appear

Some cluster hosts do not appear when you click Find Hosts in the install or update wizard.

Possible Reasons

You might have network connectivity problems.

Possible Solutions

Make sure all cluster hosts have SSH port 22 open.

Check other common causes of loss of connectivity such as firewalls and interference from SELinux.

  • Cannot start services after upgrade

You have upgraded the Cloudera Manager Server, but now cannot start services.

Possible Reasons

You might have mismatched versions of the Cloudera Manager Server and Agents.

Possible Solutions

Make sure you have upgraded the Cloudera Manager Agents on all hosts. (The previous version of the Agents will heartbeat with the new version of the Server, but you cannot start HDFS and MapReduce with this combination.)

  • HDFS DataNodes fail to start

After upgrading, HDFS DataNodes fail to start with exception:

Exception in secureMain
java.lang.RuntimeException: Cannot start datanode because the configured max locked memory size (dfs.datanode.max.locked.memory) of 4294967296 bytes is more than the datanode's available RLIMIT_MEMLOCK ulimit of 65536 bytes.

Possible Reasons

HDFS caching, which is enabled by default in CDH 5 and higher, requires new memlock functionality from Cloudera Manager Agents.

Possible Solutions:

Do the following:

Stop all CDH and managed services.

On all hosts with Cloudera Manager Agents, hard-restart the Agents. Before performing this step, ensure you understand the semantics of the hard_restart command by reading Cloudera Manager Agents.

RHEL 7, SLES 12, Ubuntu 18.04 and higher

sudo systemctl stop supervisord

sudo systemctl start cloudera-scm-agent

RHEL 5 or 6, SLES 11, Debian 6 or 7, Ubuntu 12.04 or 14.04

sudo service cloudera-scm-agent hard_restart

Start all services.

  • Cloudera services fail to start

Possible Reasons

Java might not be installed or might be installed at a custom location.

Possible Solutions

See Configuring a Custom Java Home Location for more information on resolving this issue.

  • Host Inspector Fails

If you see the following message in the Host Inspector:

There are mismatched versions across the system, which will cause failures. See below for details on which hosts are running what versions of components.

When looking at the results, some hosts report Supervisord vX.X.X, while others report X.X.X-cmY.Y.Y (where X and Y are version numbers). During the upgrade, an old file on the hosts may cause the Host Inspector to indicate mismatched Supervisord versions.

This issue occurs because these hosts have a file on them at /var/run/cloudera-scm-agent/supervisor/__STARTING_CM_VERSION__ that contains a string for the older version of Cloudera Manager.

To resolve this issue:

Remove or rename the /var/run/cloudera-scm-agent/supervisor/__STARTING_CM_VERSION__ file

Perform a hard restart of the agents:

sudo systemctl stop cloudera-scm-supervisord.service

sudo systemctl start cloudera-scm-agent

Run the Host inspector again. It should pass without the warning.

Saturday, 10 July 2021

Hadoop Tutorial! What exactly is Hadoop? What is Hadoop used for?

 What is Hadoop?

What exactly is Hadoop?

Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware.

What is Hadoop and Big Data?

Big Data is a term used to describe a collection of data that is huge in size.

Is Hadoop a programming language?

No. Hadoop is a framework, mostly written in the Java programming language, with some native code in C and command-line utilities written as shell scripts.

Is Hadoop a database?

Hadoop is not a traditional database; it is a software ecosystem that allows for massively parallel computing. The ecosystem also includes NoSQL distributed databases, such as HBase.

Is Hadoop Dead Now?

No, Hadoop is not dead. A number of core projects from the Hadoop ecosystem continue to live on in the Cloudera Data Platform, a product that is very much alive for the foreseeable future.

All your questions will be answered and discussed here in detail:

Introduction

Apache Hadoop is an open-source software framework that allows you to store and process big data in a distributed environment across clusters of computers using simple programming models. It is designed to scale up from a single server to thousands of servers, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, delivering a highly available service on top of a cluster of computers, each of which may be prone to failure. It helps in handling large volumes of data with minimal failure.

History

Initially, Hadoop was conceived to fix a scalability issue in Apache Nutch, an open source crawler and search engine. At that time Google had published papers about the Google File System (GFS) and MapReduce, a computational framework for parallel processing. Development started in the Apache Nutch project with a successful implementation of those papers. In 2006 the work was moved into the new Hadoop subproject, and Doug Cutting, who was working at Yahoo! at the time, named it after his son's toy elephant.


Hadoop

Hadoop is a distributed master-slave architecture that consists of the following primary components:

  • Hadoop Distributed File System (HDFS) for data storage (NameNode, DataNode).
  • Yet Another Resource Negotiator (YARN), a general-purpose scheduler and resource manager (ResourceManager, NodeManager).
  • MapReduce, a batch-based computational engine. MapReduce is implemented as a YARN application.

HDFS

HDFS is the storage component of Hadoop. It’s a distributed filesystem that’s modeled after the Google File System (GFS) paper. HDFS is optimized for high throughput and works best when reading and writing large files (gigabytes and larger). To support this throughput, HDFS uses unusually large (for a filesystem) block sizes and data locality optimizations to reduce network input/output (I/O).

Scalability and availability are also key traits of HDFS, achieved in part due to data replication and fault tolerance. Hadoop 2 introduced two significant new features for HDFS—Federation and High Availability (HA):

  • NameNode: NameNode is the master node in the distributed environment and it maintains the metadata information for the blocks of data stored in HDFS like block location, replication factors etc.
  • DataNode: DataNodes are the slave nodes, which are responsible for storing data in the HDFS. NameNode manages all the DataNodes.
  • Federation allows HDFS metadata to be shared across multiple NameNode hosts, which aids HDFS scalability and also provides data isolation, allowing different applications or teams to run their own NameNodes without fear of impacting other NameNodes on the same cluster.
  • High Availability in HDFS removes the single point of failure that existed in Hadoop 1, wherein a NameNode disaster would result in a cluster outage. HDFS HA also offers the ability for failover (the process by which a standby Name-Node takes over work from a failed primary NameNode) to be automated.

HDFS Commands

        Click this Link:  HDFS Commands                                 
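
For readers who do not have the linked post handy, here is a minimal sketch of the same kinds of operations through Hadoop's Java FileSystem API. The namenode URI and the /tmp/demo paths are placeholder assumptions, not values from this post.

import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsOpsDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Placeholder namenode URI -- replace with your cluster's fs.defaultFS.
        try (FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf)) {
            Path dir = new Path("/tmp/demo");              // roughly: hdfs dfs -mkdir -p /tmp/demo
            fs.mkdirs(dir);

            Path file = new Path(dir, "hello.txt");        // roughly: hdfs dfs -put hello.txt /tmp/demo
            try (FSDataOutputStream out = fs.create(file, true)) {
                out.writeUTF("hello hdfs");
            }

            for (FileStatus status : fs.listStatus(dir)) { // roughly: hdfs dfs -ls /tmp/demo
                System.out.println(status.getPath() + "  " + status.getLen());
            }
        }
    }
}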

YARN

Apache YARN (Yet Another Resource Negotiator) is Hadoop’s cluster resource management system. YARN was introduced in Hadoop 2 to improve the MapReduce implementation, but it is general enough to support other distributed computing paradigms as well.

YARN’s architecture is simple because its primary role is to schedule and manage resources in a Hadoop cluster. The core components in YARN are the ResourceManager and the NodeManager. YARN separates resource management from processing components.

Cluster resource management means managing the resources of the Hadoop clusters, and by resources we mean memory, CPU, etc. YARN took over this cluster management task from MapReduce, and MapReduce is now streamlined to perform data processing only, which is what it does best.

YARN has central resource manager component which manages resources and allocates the resources to the application. Multiple applications can run on Hadoop via YARN and all application could share common resource management.

  • ResourceManager: It receives the processing requests and passes parts of the requests to the corresponding NodeManagers, where the actual processing takes place. It allocates resources to applications based on their needs.
  • NodeManager: NodeManager is installed on every DataNode and is responsible for the execution of tasks on that DataNode.

MAPREDUCE

MapReduce is a batch-based, distributed computing framework modeled after Google’s paper on MapReduce. It allows you to parallelize work over a large amount of raw data. The MapReduce model simplifies parallel processing by abstracting away the complexities involved in working with distributed systems, such as computational parallelization, work distribution, and dealing with unreliable hardware and software. With this abstraction, MapReduce allows the programmer to focus on addressing business needs rather than getting tangled up in distributed system complications.

  • MapReduce consists of two distinct tasks – Map and Reduce.
  • As the name MapReduce suggests, the reducer phase takes place after the mapper phase has been completed.
  • So, the first is the map job, where a block of data is read and processed to produce key-value pairs as intermediate outputs.
  • The output of a Mapper or map job (key-value pairs) is input to the Reducer.
  • The reducer receives the key-value pair from multiple map jobs.
  • Then, the reducer aggregates those intermediate data tuples (intermediate key-value pairs) into a smaller set of tuples or key-value pairs, which is the final output (see the word-count sketch after this list).
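
To make the map and reduce phases above concrete, here is a minimal sketch of the classic word-count job in Hadoop's MapReduce Java API. It is the standard textbook example, lightly commented; input and output paths come from the command line.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Map phase: read a block of text and emit (word, 1) key-value pairs.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);
            }
        }
    }

    // Reduce phase: aggregate the intermediate (word, 1) pairs into (word, count).
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);   // combiner runs the reduce logic map-side
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // input directory
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // output directory (must not exist)
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}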

Hadoop distributions

Hadoop is an Apache open source project, and regular releases of the software are available for download directly from the Apache project’s website (http://hadoop.apache.org/releases.html#Download). You can either download and install Hadoop from the website or use a commercial distribution of Hadoop, which will give you the added benefits of enterprise administration software and a support team to consult.

Apache

Apache is the organization that maintains the core Hadoop code and distribution. The challenge with the Apache distributions has been that support is limited to the goodwill of the open source community, and there’s no guarantee that your issue will be investigated and fixed. Having said that, the Hadoop community is a very supportive one, and responses to problems are usually rapid.

Cloudera

CDH (Cloudera Distribution Including Apache Hadoop) is the most tenured Hadoop distribution, and it employs a large number of Hadoop (and Hadoop ecosystem) committers. Doug Cutting, who along with Mike Cafarella originally created Hadoop, is the chief architect at Cloudera. In aggregate, this means that bug fixes and feature requests have a better chance of being addressed in Cloudera compared to Hadoop distributions with fewer committers.

Hortonworks

Hortonworks Data Platform (HDP) is also made up of a large number of Hadoop committers, and it offers the same advantages as Cloudera in terms of the ability to quickly address problems and feature requests in core Hadoop and its ecosystem projects. Hortonworks is also the main driver behind the next-generation YARN platform, which is a key strategic piece keeping Hadoop relevant.

Cloudera Hortonworks Merger 
On January 3, 2019, Cloudera, the enterprise data cloud company, announced the completion of its merger with Hortonworks. Knowing this, there must have been a strong driver that forced Cloudera and Hortonworks together.

Thursday, 24 June 2021

Hbase errors issues and solutions

1)ERROR: KeeperErrorCode = NoNode for /hbase/master 

check

hbase(main):001:0> list

TABLE

ERROR: KeeperErrorCode = NoNode for /hbase/master

For usage try 'help "list"'

Took 8.2629 seconds

hbase(main):002:0> list

TABLE

ERROR: Call id=15, waitTime=60008, rpcTimeout=60000

For usage try 'help "list"'

Took 488.5749 seconds

Dead region server

hbase(main):002:0> status

1 active master, 2 backup masters, x servers, x dead, 0.0000 average load

Took 0.0711 seconds

HMASTER UI SHOWING DEAD REGION SERVER


hbase:meta,,1 is not online on 

Solution

In progress

How to delete a directory whose name contains commas from a Hadoop cluster?

># hdfs dfs -rm -r  /hbase/WALs/wrker-02.xyz.com,16020,1623662453275-splitting

Picked up _JAVA_OPTIONS: -Djava.io.tmpdir=/hadoop/tmp

{"type":"log","host":"host_name","category":"YARN-yarn-GATEWAY-BASE","level":"WARN","system":"n05dlkcluster","time": "21/06/23 06:13:57","logger":"util.NativeCodeLoader","timezone":"UTC","log":{"message":"Unable to load native-hadoop library for your platform... using builtin-java classes where applicable"}}

Deleted /hbase/WALs/wrker-02.xyz.com,16020,1623662453275-splitting

How to clear Dead Region Servers in HBase UI?

Microsoft’s Windows 11 Feature: Download Android apps from the Amazon Appstore

Microsoft’s Windows 11 Launched....
Android apps coming to Windows 11 as well 

The next version of Windows 11 is here with a complete design overhaul.

Teams integration is being added to Windows 11, which brings a fresh interface and a centrally placed Start menu. Called the “next generation” of Windows, it comes with a massive redesign over its predecessor, from an all-new boot screen and startup sound to the new Start menu and upgraded widgets.

Windows 11 also removes elements including the annoying “Hi Cortana” welcome screen and Live Tiles.

Windows 11 is a major release of the Windows NT operating system, announced on June 24, 2021, and developed by Microsoft.

Developer: Microsoft
Written in: C, C++, C#, assembly language
OS family: Microsoft Windows
Source model: Closed-source; source-available (through Shared Source Initiative); some components open source[1][2][3][4]
Marketing target: Personal computing
Available in: 110 languages[5][6]
Update methods: Windows Update, Microsoft Store, Windows Server Update Services (WSUS)
Platforms: x86-64, ARM64
Kernel type: Hybrid (Windows NT kernel)
Userland: Windows API, .NET Framework, Universal Windows Platform, Windows Subsystem for Linux, Android
Default user interface: Windows shell (graphical)
Preceded by: Windows 10 (2015)
Official website: windows.com