Apache HAWQ


Overview

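Apache HAWQ (incubating) provides native SQL on Apache Hadoop, built on an advanced, elastic MPP query engine. HAWQ represents a new generation of high-performance advanced analytics and can turn Hadoop into an enterprise analytic database. It runs and analyzes all of your workloads while simplifying administration and broadening data access and analytics, and it does all of this natively inside Hadoop.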

What HAWQ Does

HAWQ is an elastic SQL query engine that combines exceptional MPP-based analytics performance with robust ANSI SQL compliance, enabling you to run fast ad hoc queries. Hortonworks HDB, powered by Apache HAWQ, includes integrated Apache MADlib (incubating) machine learning, enabling SQL-based predictive analytics.

The advantages of HAWQ and MADlib are summarized in the following diagram:

[Diagram: HAWQ and MADlib advantages]

Evolved from more than a decade of intellectual property from Pivotal Greenplum™ and open source PostgreSQL, HAWQ runs natively in Hadoop, which simplifies the overall management of cluster resources.

How HAWQ Works

Flow

The flow for setting up, loading, managing, and using HAWQ and MADlib is shown below:

[Diagram: setup, loading, management, and usage flow for HAWQ and MADlib]

Technical Architecture

The high-level architecture of Apache HAWQ is shown below. In a typical deployment, each slave node runs a physical HAWQ segment, an HDFS DataNode, and a YARN NodeManager. The HAWQ master, the HDFS NameNode, and the YARN ResourceManager run on separate nodes.

HAWQ is tightly integrated with YARN for query resource management. HAWQ caches containers obtained from YARN in a resource pool, then manages those resources locally, applying its own finer-grained resource management to users and groups.
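The following is a minimal, hypothetical Python sketch of the caching idea described above; it is not HAWQ's actual resource manager. `ResourcePool`, `ResourceQueue`, `acquire_from_yarn`, `allocate`, and `release` are illustrative names invented for this example, and the YARN request itself is simulated by a simple counter. It shows containers being obtained in bulk, cached, and then granted locally to per-queue requests, each capped by its queue's limit.

```python
from dataclasses import dataclass, field


@dataclass
class ResourceQueue:
    """Hypothetical resource queue: one user/group's share of the cached pool."""
    name: str
    max_containers: int  # illustrative cap on containers this queue may hold
    in_use: int = 0


@dataclass
class ResourcePool:
    """Hypothetical pool of containers already obtained from YARN.

    Containers are requested in bulk and cached here, so per-query
    decisions are made locally without a YARN round trip.
    """
    cached: int = 0
    queues: dict = field(default_factory=dict)

    def acquire_from_yarn(self, count: int) -> None:
        # Stand-in for a real YARN container request (not modeled here).
        self.cached += count

    def add_queue(self, queue: ResourceQueue) -> None:
        self.queues[queue.name] = queue

    def allocate(self, queue_name: str, wanted: int) -> int:
        """Grant up to `wanted` containers, bounded by the queue cap and the pool."""
        queue = self.queues[queue_name]
        headroom = queue.max_containers - queue.in_use
        granted = max(0, min(wanted, headroom, self.cached))
        queue.in_use += granted
        self.cached -= granted
        return granted

    def release(self, queue_name: str, count: int) -> None:
        # Returned containers go back to the local cache, not to YARN.
        queue = self.queues[queue_name]
        count = min(count, queue.in_use)  # never release more than is held
        queue.in_use -= count
        self.cached += count
```

For example, after caching 10 containers, a queue capped at 4 that asks for 6 receives 4, leaving 6 in the pool. In this sketch, released containers return to the local cache rather than to YARN, so a later query can be served without another YARN request.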

To execute a query, HAWQ allocates a set of virtual segments based on the query's cost, the resource queue definitions, data locality, and current resource usage in the system. The query is then dispatched to the corresponding physical hosts, which can be a subset of the cluster's nodes. On each node, the HAWQ resource enforcer monitors and controls the resources the query uses in real time to prevent resource usage violations.
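To make the allocation step more concrete, here is a hypothetical Python sketch; it is not HAWQ's algorithm. The function name `allocate_virtual_segments`, the linear cost-per-segment heuristic, and the two-pass locality strategy are all assumptions made for illustration. It caps the number of virtual segments by the query's estimated cost, the resource queue limit, and free capacity, then places segments on hosts that store the query's data blocks before spilling onto other hosts.

```python
from __future__ import annotations

import math
from collections import Counter


def allocate_virtual_segments(
    estimated_cost: float,
    cost_per_segment: float,
    queue_limit: int,
    free_slots: dict[str, int],
    block_hosts: list[str],
) -> dict[str, int]:
    """Hypothetical sketch: decide how many virtual segments a query gets and where.

    estimated_cost / cost_per_segment -> how many segments the query "wants"
    queue_limit                        -> cap from the resource queue definition
    free_slots                         -> host -> currently unused segment slots
    block_hosts                        -> host storing each data block (locality)
    """
    wanted = max(1, math.ceil(estimated_cost / cost_per_segment))
    target = min(wanted, queue_limit, sum(free_slots.values()))

    # Rank hosts by how many of the query's data blocks they store locally.
    locality = Counter(block_hosts)
    ranked = sorted(free_slots, key=lambda h: (-locality[h], h))

    plan: dict[str, int] = {}
    remaining = target
    # First pass: at most one segment per local block on data-holding hosts.
    for host in ranked:
        take = min(free_slots[host], locality[host], remaining)
        if take:
            plan[host] = take
            remaining -= take
    # Second pass: fill any leftover demand with free slots on any host.
    for host in ranked:
        if remaining == 0:
            break
        take = min(free_slots[host] - plan.get(host, 0), remaining)
        if take:
            plan[host] = plan.get(host, 0) + take
            remaining -= take
    return plan
```

With an estimated cost of 500 at 100 per segment, a queue limit of 4, two free slots on each of hosts h1, h2, and h3, and data blocks on h1 (two) and h2 (one), the sketch returns `{"h1": 2, "h2": 2}`: the query runs on a subset of the hosts, and h3 is left unused.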

[Diagram: Apache HAWQ high-level architecture]

Nodes can be added dynamically without redistributing data, and expansion takes only seconds. A newly added node automatically contacts the HAWQ master, which immediately makes that node's resources available to subsequent queries.
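Below is a toy Python model of node registration. It assumes, as the text implies, that table data lives in HDFS rather than on the segment host, so adding a node is only a capacity bookkeeping update. `HypotheticalMaster`, `register_node`, and `capacity` are invented names; the model does not reflect HAWQ's real registration protocol.

```python
from __future__ import annotations


class HypotheticalMaster:
    """Toy model of a master that tracks per-host segment capacity.

    Assumption: data lives in HDFS, so registering a node only adds
    compute capacity; nothing is copied or redistributed.
    """

    def __init__(self) -> None:
        self.free_slots: dict[str, int] = {}

    def register_node(self, host: str, slots: int) -> None:
        # Called when a new segment host contacts the master. Its slots are
        # usable by the very next allocation, with no data movement.
        self.free_slots[host] = self.free_slots.get(host, 0) + slots

    def capacity(self) -> int:
        # Total slots available for future queries across all registered hosts.
        return sum(self.free_slots.values())
```

Because no data is copied, registering a node in this model is just a dictionary update, which is consistent with the text's claim that expansion takes seconds.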

Hortonworks HDB, Powered by Apache HAWQ

How HDB Complements Apache Hive

Hortonworks HDB is a support subscription offering that combines Apache HAWQ and Apache MADlib, runs on the Hortonworks Data Platform (HDP), and is fully supported by Hortonworks. Apache Hive is the de facto standard for SQL queries over petabytes of data in Hadoop.

Hortonworks HDB complements Hive by adding the following capabilities:

Capability    Details
Interactive query performance
  • Query responses in seconds
  • Compatible with any ANSI SQL-compliant BI tool
  • Support for a larger number of concurrent users
MADlib big data machine learning in SQL
  • Classification, e.g. predicting loan default
  • Regression, e.g. predicting the value of a sale
  • Clustering, e.g. marketing campaign segmentation, and more
Data federation using the HAWQ Extension Framework (PXF)
  • SQL queries against other data sources, such as JSON files in HDFS

When to use HDB vs. Hive

Choose the right SQL engine based on your application’s needs:

Component    Best Fit
Apache Hive
  • Multiple subject areas
  • Very detailed information
  • Scale: multiple petabytes
  • Integrates all data sources
  • ETL, reporting, and BI
  • Low-to-medium query latency
Hortonworks HDB, powered by Apache HAWQ
  • Single-subject data mart
  • Summarized information
  • Scale: hundreds of terabytes
  • Ad hoc analytics and visualization
  • Machine learning
  • Low query latency
