Apache Atlas

Overview

Agile enterprise compliance through metadata

Atlas is designed to exchange metadata with other tools and processes within and outside of the Hadoop stack, thereby enabling platform-agnostic governance controls that effectively address compliance requirements.

 

What Atlas Does

Apache Atlas provides scalable, metadata-driven governance for Enterprise Hadoop. At its core, Atlas is designed to model new business processes and data assets with agility. Its flexible type system allows metadata to be exchanged with other tools and processes within and outside of the Hadoop stack, thereby enabling platform-agnostic governance controls that effectively address compliance requirements.

Apache Atlas is developed around two guiding principles:

  • Metadata Truth in Hadoop: Atlas provides true visibility in Hadoop. By using native connectors to Hadoop components, Atlas provides technical and operational tracking enriched by business taxonomical metadata. Atlas facilitates easy exchange of metadata by enabling any metadata consumer to share a common metadata store, which supports interoperability across many metadata producers.
  • Developed in the Open: Engineers from Aetna, Merck, SAS, Schlumberger, and Target are working together to help ensure Atlas is purposely built to solve real data governance problems across a wide range of industries that use Hadoop. This approach is an example of open source community innovation that helps accelerate product maturity and time-to-value for the data-first enterprise.

Apache Atlas empowers enterprises to effectively and efficiently address their compliance requirements through a scalable set of core governance services. These services include:

  • Data Lineage: Captures lineage across Hadoop components at the platform level
  • Agile Data Modeling: Type system allows custom metadata structures in a hierarchical taxonomy
  • REST API: Modern, flexible access to Atlas services, HDP components, UI & external tools
  • Metadata Exchange: Leverage existing metadata and models by importing them from current tools, and export metadata to downstream systems
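To make the "Agile Data Modeling" item more concrete, here is a minimal sketch of a type hierarchy in which a custom type inherits attributes from its supertypes. The `TypeDef` class and the type and attribute names are hypothetical and exist only for illustration; this is not the Atlas type system or its API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TypeDef:
    """A hypothetical metadata type (illustrative only, not an Atlas class)."""
    name: str
    attributes: List[str] = field(default_factory=list)
    supertypes: List["TypeDef"] = field(default_factory=list)

    def all_attributes(self) -> List[str]:
        """Return inherited attributes first, then this type's own, without duplicates."""
        seen: List[str] = []
        for parent in self.supertypes:
            for attr in parent.all_attributes():
                if attr not in seen:
                    seen.append(attr)
        for attr in self.attributes:
            if attr not in seen:
                seen.append(attr)
        return seen


# Illustrative hierarchy: a generic asset, a data set, and a custom business type.
asset = TypeDef("Asset", ["name", "owner"])
data_set = TypeDef("DataSet", ["description"], [asset])
claims_table = TypeDef("claims_table", ["retention_days"], [data_set])

print(claims_table.all_attributes())
```

Under these assumptions, a new business concept only needs to declare what is specific to it; everything else comes from the hierarchy above it.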

 

 

 

How Atlas Works

Apache Atlas is designed to effectively exchange metadata within Hadoop and the broader data ecosystem. Atlas’s adaptive model reduces enterprise time to compliance by leveraging existing metadata and industry-specific taxonomy. With Atlas, data administrators and stewards also have the ability to define, annotate and automate the capture of relationships between data sets and underlying elements including source, target and derivation processes.
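The relationships described above (source, target and derivation process) can be pictured as a small graph. The sketch below is a toy model under stated assumptions: the process records, the data set names, and the `upstream` helper are all hypothetical, and this is not how Atlas stores or queries lineage.

```python
from collections import defaultdict

# Each hypothetical process records the data sets it reads and writes.
processes = [
    {"name": "load_raw", "inputs": ["hdfs:/landing/claims"], "outputs": ["hive:raw.claims"]},
    {"name": "clean", "inputs": ["hive:raw.claims"], "outputs": ["hive:curated.claims"]},
    {
        "name": "report",
        "inputs": ["hive:curated.claims", "hive:ref.members"],
        "outputs": ["hive:mart.claims_by_member"],
    },
]


def upstream(target, processes):
    """Return every data set the target is derived from, directly or indirectly."""
    # Index processes by the data sets they produce.
    producers = defaultdict(list)
    for proc in processes:
        for out in proc["outputs"]:
            producers[out].append(proc)

    found = set()
    stack = [target]
    while stack:
        current = stack.pop()
        for proc in producers.get(current, []):
            for src in proc["inputs"]:
                if src not in found:
                    found.add(src)
                    stack.append(src)
    return found


print(sorted(upstream("hive:mart.claims_by_member", processes)))
```

Walking the graph in the opposite direction (from a source to everything derived from it) would answer impact-analysis questions in the same way.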

Atlas also ensures downstream metadata consistency across the ecosystem by enabling enterprises to easily export metadata to third-party systems.

 

[Figure: Atlas architecture]

Technical Preview

Business Taxonomy (Catalog)

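Big data democratizes access to information and makes it easier to share information across the enterprise. However, growing data without a plan can create a "data swamp" of content that is not properly tagged or cataloged. A business taxonomy can bridge this gap. Derived from the Greek word "taxis," meaning "order" or "arrangement," a taxonomy uses a hierarchy of terms to classify and organize concepts and physical or logical objects, which makes it an ideal medium for understanding the structure of enterprise content across domains.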

Consistent classification and tagging across the enterprise using taxonomies supports system/platform interoperability and value generation from structured and unstructured data sources by mapping them to a common shared vocabulary. This authoritative reference taxonomy improves both data confidence and time to insight.

Requirements for a Big Data Business Catalog

  • Purpose-Built Platform Solution: In order to make sense of big data and provide users with the ability to find the right information, enterprises need a data governance solution that is designed for Hadoop and operates at the platform level, so that it consistently classifies data across all the engines used by the organization to move and analyze data.
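  • A purpose-built platform solution acts as the single source of metadata for Hadoop by using native connectors to automatically track multi-user, multi-application activity across Hadoop components. In contrast, data governance solutions that operate at the application level require a single proprietary solution path, which leads to a proliferation of data silos.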
  • Faster Data Discovery: The business catalog enables data officers and stewards to search for data and metadata quickly and in a number of different ways to reduce time to value. This includes the ability to search by:
    • Asset Type: Search for a Hive table, Storm topology or any connected component.
    • Tags: Search for all columns or tables that have a specific tag, such as PII.
    • Business Language: Search using terms aligned with compliance standards & policies.

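By combining these search capabilities, data stewards can build a model of the organization and of how it runs its business. This includes the ability to combine logical and physical data entities to model the business and gain a more complete understanding.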
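As a rough illustration of combining search criteria, the sketch below filters a tiny in-memory catalog by asset type and by tag. The catalog entries, the type labels, and the `search` function are hypothetical; this is a plain Python model of the idea, not the Atlas search API.

```python
# Hypothetical catalog entries; names, type labels and tags are illustrative only.
catalog = [
    {"name": "customers", "type": "hive_table", "tags": {"PII"}},
    {"name": "customers.ssn", "type": "hive_column", "tags": {"PII", "Sensitive"}},
    {"name": "clickstream", "type": "storm_topology", "tags": set()},
    {"name": "orders", "type": "hive_table", "tags": set()},
]


def search(catalog, asset_type=None, tag=None):
    """Return names of entries matching every criterion given (None means no filter)."""
    return [
        entry["name"]
        for entry in catalog
        if (asset_type is None or entry["type"] == asset_type)
        and (tag is None or tag in entry["tags"])
    ]


print(search(catalog, tag="PII"))
print(search(catalog, asset_type="hive_table", tag="PII"))
```

Each criterion narrows the result independently, so a steward could start broad (everything tagged PII) and then restrict by asset type.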

What's New in HDP 2.6

Cloud

  • Shared enterprise services for governance

Component Coverage

  • Tag-based policy support for HDFS, Kafka and HBase
  • Knox SSO for the Atlas UI

Ease of Use

  • API revamp
  • Simplified UI for basic search
  • Manual entity creation – support for HDFS, HBase, Kafka & custom entity types
  • Performance and scalability improvements
  • SmartSense metrics

Recent Progress with Atlas

The Atlas/Ranger integration represents a paradigm shift for big data governance and security. By integrating Atlas with Ranger, enterprises can now implement dynamic classification-based security policies in addition to role-based security. Ranger’s centralized platform empowers data administrators to define security policy based on Atlas metadata tags or attributes and apply this policy in real time to the entire hierarchy of data assets, including databases, tables and columns.
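The idea of classification-based policy can be sketched as follows: policies are attached to tags rather than to individual resources, so retagging an asset changes who may read it without editing any policy. Everything here (the asset names, tags, groups, and the `can_read` function) is hypothetical, and the default-allow behavior for untagged assets is a simplification; this is not Ranger's policy engine.

```python
# Hypothetical tags on assets at different levels (database, table, column).
asset_tags = {
    "sales": set(),
    "sales.customers": {"PII"},
    "sales.customers.ssn": {"PII", "Restricted"},
    "sales.orders": set(),
}

# Hypothetical tag policies: tag -> groups allowed to read assets carrying it.
tag_policies = {
    "PII": {"compliance", "support"},
    "Restricted": {"compliance"},
}


def can_read(user_groups, asset):
    """Allow a read only if every policy-bearing tag on the asset admits one of the user's groups."""
    for tag in asset_tags.get(asset, set()):
        allowed = tag_policies.get(tag)
        if allowed is not None and not (allowed & user_groups):
            return False
    # Simplification: assets with no policy-bearing tags are readable by anyone.
    return True


print(can_read({"support"}, "sales.customers"))      # support may read PII
print(can_read({"support"}, "sales.customers.ssn"))  # but not Restricted data

# Tagging an asset changes the outcome without touching any policy.
asset_tags["sales.orders"].add("PII")
print(can_read({"analyst"}, "sales.orders"))
```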

The latest release of Apache Atlas focuses on delivering scalable metadata services to model any business process enhanced with industry-specific terminology, as well as the ability to import and export metadata from other systems and tools.

Apache Atlas Version Progress
Apache Atlas 0.7
  • Enterprise deployment
    • Performance enhancements
    • HA, DR and BC support
    • AD integration
  • Component lineage
    • Kafka/Storm
    • Sqoop
    • Falcon
  • Security
    • Support for Kerberos
    • Atlas/Ranger integration for dynamic tag-based security
  • User interface
    • Improved GUI
    • Business catalog (Technical Preview)
  • Governance-ready partner ecosystem
Apache Atlas 0.6
  • Built-in types for HDFS
  • Metadata tag management
  • Expanded support for Apache Hive
Apache Atlas 0.5
  • Scalable metadata service
    • Enterprise/Business unit level modeling with industry-specific vocabulary
    • Extend visibility into HDFS Path, Hive DB, table, columns
    • Flexible access to Atlas services
  • Hive integration leverages existing metadata
    • Leverage existing metadata with import / export capability
    • Capture SQL runtime metrics directly
  • UI driven Hive table lineage and domain-specific search
    • Support for keyword, faceted and free text searches

Governance Ready Certification

To address enterprise requirements for Hadoop application integration, Atlas strives to foster a vibrant ecosystem based on a centralized metadata store. The Governance Ready program aims to create a curated group of partners that contribute a rich set of data management features focusing on data preparation, integration, cleansing, tagging, ETL visualization and collaboration areas.

 

Certified partners will help define a set of standards to exchange metadata and contribute conforming data integration features to the metadata store. Customers can then subscribe to desired features with low switching costs and faster deployment time.

Forums

Atlas Tutorials

Atlas in the Press

Webinars and Presentations