NextGen Big Data Trends: Apache Software Foundation Announces Apache Hadoop 3.0.0 GA!

Wow, Apache Hadoop reaches 3.0!

!!Hadoop was born around 2007, and by 2017 it's part of life!!

The Apache Software Foundation has announced version three of the open source software framework for distributed computing.

The release incorporates over 6,000 changes made since work on it started more than a year ago.
Apache Hadoop 3.0 is the first major release since Hadoop 2 was released in 2013.

“Hadoop 3 is a major milestone for the project, and our biggest release ever,” said Andrew Wang, Apache Hadoop 3 release manager. “It represents the combined efforts of hundreds of contributors over the five years since Hadoop 2. I’m looking forward to how our users will benefit from new features in the release that improve the efficiency, scalability, and reliability of the platform.”

Apache Hadoop has become known for its ability to run and manage data applications on large hardware clusters in the Big Data ecosystem. The latest release features HDFS erasure coding, a preview of YARN Timeline Service version 2, YARN resource types, and improved capabilities and performance enhancements around cloud storage systems. It includes Hadoop Common for supporting other Hadoop modules, the Hadoop Distributed File System, Hadoop YARN and Hadoop MapReduce.

“This latest release unlocks several years of development from the Apache community,” said Chris Douglas, vice president of Apache Hadoop. “The platform continues to evolve with hardware trends and to accommodate new workloads beyond batch analytics, particularly real-time queries and long-running services. At the same time, our Open Source contributors have adapted Apache Hadoop to a wide range of deployment environments, including the Cloud.”

Apache Hadoop is widely deployed in enterprises and companies like Adobe, AWS, Apple, Cloudera, eBay, Facebook, Google, Hortonworks, IBM, Intel, LinkedIn, Microsoft, Netflix and Teradata.

In addition, it has inspired other Hadoop-related projects such as Apache Cassandra, HBase, Hive, Spark and ZooKeeper.

All You Want to Know About Apache Hadoop 3.0.0

!!Apache Hadoop 3.0: Major changes!!

HDFS erasure coding

Erasure coding halves the storage cost of HDFS compared with the default three-way replication, while also improving data durability.
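The "halves the storage cost" claim can be checked with a little arithmetic. The sketch below assumes a Reed-Solomon layout with 6 data units and 3 parity units per stripe (the RS-6-3 scheme) and compares it with three-way replication; the function names are made up for illustration and are not part of any Hadoop API.

```python
from fractions import Fraction


def replication_overhead(replicas: int) -> Fraction:
    """Raw bytes stored per byte of user data with n-way replication."""
    return Fraction(replicas, 1)


def erasure_coding_overhead(data_units: int, parity_units: int) -> Fraction:
    """Raw bytes stored per byte of user data with a (data, parity) stripe."""
    return Fraction(data_units + parity_units, data_units)


# Assumption: three replicas versus a 6 data + 3 parity Reed-Solomon stripe.
replicated = replication_overhead(3)
erasure_coded = erasure_coding_overhead(6, 3)

print(f"3x replication stores {float(replicated)} bytes per byte of data")
print(f"RS(6,3) stores {float(erasure_coded)} bytes per byte of data")
print(f"relative cost: {float(erasure_coded / replicated)}")
```

Under these assumptions the erasure-coded layout needs 1.5 bytes of raw storage per byte of data instead of 3, which is where the factor of one half comes from. The durability gain follows from the same numbers: three replicas survive the loss of two copies, while a 6+3 stripe survives the loss of any three units.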

Minimum required Java version increased to Java 8

All Hadoop JARs are now compiled targeting a runtime version of Java 8, which means that those of you who are still using Java 7 or below should upgrade to Java 8.

Early preview of YARN Timeline Service major revision

Hadoop 3.0 also brings an early preview (alpha 2) of a major revision of YARN Timeline Service: v.2, which addresses two major challenges:

improving scalability and reliability of Timeline Service
enhancing usability by introducing flows and aggregation

Shell script rewrite

The Hadoop shell scripts have been rewritten to fix many long-standing bugs and include some new features. However, keep in mind that some changes could break existing installations. You’ll find the incompatible changes in the release notes, with related discussion on HADOOP-9902.

MapReduce task-level native optimization

MapReduce has added support for a native implementation of the map output collector. For shuffle-intensive jobs, this can lead to a performance improvement of 30% or more.

Shaded client jars

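The hadoop-client Maven artifact available in 2.x releases pulls Hadoop’s transitive dependencies onto a Hadoop application’s classpath. This can be problematic if the versions of these transitive dependencies conflict with the versions used by the application. Hadoop 3.0 adds new hadoop-client-api and hadoop-client-runtime artifacts that shade Hadoop’s dependencies into a single jar, so they no longer leak onto the application’s classpath.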

Over the past decade, Apache Hadoop has become ubiquitous within the greater Big Data ecosystem by enabling firms to run and manage data applications on large hardware clusters in a distributed computing environment.

Default ports of multiple services changed

Previously, the default ports of multiple Hadoop services were in the Linux ephemeral port range (32768-61000). This meant that at startup, services would sometimes fail to bind to the port due to a conflict with another application. In Hadoop 3.0 these ports have been moved out of the ephemeral range; the change affects the NameNode, Secondary NameNode, DataNode, and KMS.
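To see why the old defaults were at risk, here is a minimal sketch that checks port numbers against the ephemeral range quoted above. The old and new port pairs are the HDFS defaults as best I recall them, so treat them as assumptions and verify them against the hdfs-default.xml of your release. The helper and dictionary names are hypothetical.

```python
# Linux ephemeral port range as quoted in the text (inclusive on both ends).
EPHEMERAL_RANGE = range(32768, 61001)


def in_ephemeral_range(port: int) -> bool:
    """Return True if the OS may hand this port to an outgoing connection."""
    return port in EPHEMERAL_RANGE


# Assumed (old 2.x default, new 3.0 default) pairs; check your own configuration.
PORT_CHANGES = {
    "NameNode web UI": (50070, 9870),
    "Secondary NameNode web UI": (50090, 9868),
    "DataNode data transfer": (50010, 9866),
    "DataNode web UI": (50075, 9864),
}

for service, (old_port, new_port) in PORT_CHANGES.items():
    print(
        f"{service}: {old_port} ephemeral={in_ephemeral_range(old_port)}, "
        f"{new_port} ephemeral={in_ephemeral_range(new_port)}"
    )
```

Every old port in this table falls inside the ephemeral range and every new one falls outside it, so a client connection that happens to be assigned the same number can no longer block a daemon from binding at startup.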

Apache Hadoop 3.0.0 highlights:

YARN Timeline Service v.2 improves the scalability, reliability, and usability of the Timeline Service;
YARN resource types enable scheduling of additional resources, such as disks and GPUs, for better integration with machine learning and container workloads;
Federation of YARN and HDFS subclusters transparently scales Hadoop to tens of thousands of machines;
Opportunistic container execution improves resource utilization and increases task throughput for short-lived containers. In addition to its traditional, central scheduler, YARN also supports distributed scheduling of opportunistic containers; and
Improved capabilities and performance for cloud storage systems such as Amazon S3 (S3Guard), Microsoft Azure Data Lake, and Aliyun Object Storage System.

Hadoop 3.0.0 release details:

After four alpha releases and one beta release, 3.0.0 is generally available. It includes 302 bug fixes, improvements, and other enhancements since 3.0.0-beta1. Altogether, 6242 issues were fixed as part of the 3.0.0 release series since 2.7.0.

Here is the series of alpha and beta releases that led up to Hadoop 3.0.0 GA.

3.0.0-alpha1	2016-09-03
3.0.0-alpha2	2017-01-25
3.0.0-alpha3	2017-05-26
3.0.0-alpha4	2017-07-07
3.0.0-beta1	2017-10-03
3.0.0 GA	2017-12-13

"Hadoop Is Here To Stay" —Forrester

Monday, 18 December 2017