Comprehensive security for Enterprise Hadoop
Apache Ranger delivers a comprehensive approach to security for a Hadoop cluster. It provides a centralized platform to define, administer and manage security policies consistently across Hadoop components.
Apache Ranger offers a centralized security framework to manage fine-grained access control across:
Using the Apache Ranger console, security administrators can easily manage policies for access to files, folders, databases, tables, or column. These policies can be set for individual users or groups and then enforced consistently across HDP stack.
The Ranger Key Management Service (Ranger KMS) provides a scalable cryptographic key management service for HDFS “data at rest” encryption. Ranger KMS is based on the Hadoop KMS originally developed by the Apache community and extends the native Hadoop KMS functionality by allowing system administrators to store keys in a secure database.
Ranger also provides security administrators with deep visibility into their Hadoop environment through a centralized audit location that tracks all the access requests in real time and support multiple destination sources including HDFS and Solr.
Apache Ranger has a decentralized architecture with the following internal components:
|Ranger admin portal||The Ranger Admin portal is the central interface for security administration. Users can create and update policies, which are then stored in a policy database. Plugins within each component poll these policies at regular intervals. The portal also consists of an audit server that sends audit data collected from the plugins for storage in HDFS or in a relational database.|
|Ranger plugins||Plugins are lightweight Java programs which embed within processes of each cluster component. For example, the Apache Ranger plugin for Apache Hive is embedded within Hiveserver2. These plugins pull in policies from a central server and store them locally in a file. When a user request comes through the component, these plugins intercept the request and evaluate it against the security policy. Plugins also collect data from the user request and follow a separate thread to send this data back to the audit server.|
|User group sync||Apache Ranger provides a user synchronization utility to pull users and groups from Unix or from LDAP or Active Directory. The user or group information is stored within Ranger portal and used for policy definition.|
|Extension of support||Additional investments extend administration of authorization and auditing to more Hadoop components:|
|Encryption||Production-ready KMS to support HDFS Transparent Data Encryption|
Export/Import of Ranger Security Policies
Incremental User Sync
Support for $username and other Macro Variables in Ranger Policies
The Atlas/ Ranger integration represents a paradigm shift for big data governance and security. By integrating Atlas with Ranger enterprises can now implement dynamic classification-based security policies, in addition to role-based security. Ranger’s centralized platform empowers data administrators to define security policy based on Atlas metadata tags or attributes and apply this policy in real-time to the entire hierarchy of data assets including databases, tables and columns.
|Apache Ranger 0.7.0||
|Apache Ranger 0.6.0||
|Apache Ranger 0.5.0||
|Apache Ranger 0.4.0||
|HDP Advanced Security 3.5||
Ambari 2.0 helps provision, manage and monitor Hadoop security in two ways. First, Ambari now simplifies the setup, configuration and maintenance of Kerberos for strong authentication in the cluster. Secondly, Ambari now includes support for installing and configuring Apache Ranger for centralized security administration, authorization and audit. For additional details view the Apache Ambari page.
Introduction R is a popular tool for statistics and data analysis. It has rich visualization capabilities and a large collection of libraries that have been developed and maintained by the R developer community. One drawback to R is that it’s designed to run on in-memory data, which makes it unsuitable for large datasets. Spark is […]
多くのお客様から非常によくいただくリクエストは、たとえばスキャンした PNG ファイルのテキストなど、画像ファイル中でテキストをインデックスすることです。このチュートリアルでは、それを SOLR を使って行う方法を段階的に説明します。前提条件：Hortonworks Sandbox がダウンロードされていること、「HDP Sandbox のコツを学ぶ」のチュートリアルを完了していること。ステップバイステップ・ガイド […]
Introduction In this tutorial, you will learn about the different features available in the HDF sandbox. HDF stands for Hortonworks DataFlow. HDF was built to make processing data-in-motion an easier task while also directing the data from source to the destination. You will learn about quick links to access these tools that way when you […]
はじめに：JReport は、Apache Hive の JDBC ドライバを使用して Hortonworks Data Platform 2.3 からデータを簡単に抽出し可視化することができる、組み込み BI レポーティングツールです。レポート、ダッシュボード、データ分析を作成することが可能で、後で自分のアプリケーションに組み込むこともできます。このチュートリアルでは、次のステップをご説明します[...]
Apache Zeppelin on HDP 2.4.2 Author: Vinay Shukla In March 2016 we delivered the second technical preview of Apache Zeppelin, on HDP 2.4. Meanwhile we and the Zeppelin community have continued to add new features to Zeppelin. These features are now available in the final technical preview of Apache Zeppelin. This technical preview works with […]
Introduction Hadoop has always been associated with BigData, yet the perception is it’s only suitable for high latency, high throughput queries. With the contribution of the community, you can use Hadoop interactively for data exploration and visualization. In this tutorial you’ll learn how to analyze large datasets using Apache Hive LLAP on Amazon Web Services […]
Introduction The Azure cloud infrastructure has become a common place for users to deploy virtual machines on the cloud due to its flexibility, ease of deployment, and cost benefits. Microsoft has expanded Azure to include a marketplace with thousands of certified, open source, and community software applications and developer services, pre-configured for Microsoft Azure. This […]
The Hortonworks Sandbox is delivered as a Dockerized container with the most common ports already opened and forwarded for you. If you would like to open even more ports, check out this tutorial.
Apache, Hadoop, Falcon, Atlas, Tez, Sqoop, Flume, Kafka, Pig, Hive, HBase, Accumulo, Storm, Solr, Spark, Ranger, Knox, Ambari, ZooKeeper, Oozie, Phoenix, NiFi, HAWQ, Zeppelin, Atlas, Slider, Mahout, MapReduce, HDFS, YARN, Metron and the Hadoop elephant and Apache project logos are either registered trademarks or trademarks of the Apache Software Foundation in the United States or other countries.