
Hadoop Tutorial – Getting Started with HDP

Loading Sensor Data into HDFS


Introduction

In this section, you will download the sensor data and load it into HDFS using Ambari User Views. You will be introduced to the Ambari Files User View for managing files: you can create directories, navigate the file system, and upload files to HDFS, along with a few other file-related tasks. Once you have the basics down, you will create two directories and then load two files into HDFS using the Ambari Files User View.

Prerequisites

This tutorial is part of a series of hands-on tutorials to get you started on HDP using the Hortonworks Sandbox. Please ensure you complete the prerequisites before proceeding with this tutorial.

Overview

HDFS backdrop

A single physical machine becomes saturated as the data it stores grows. This growth drives the need to partition data across separate machines. A file system that manages the storage of data across a network of machines is called a distributed file system. HDFS is a core component of Apache Hadoop and is designed to store large files with streaming data access patterns, running on clusters of commodity hardware. With Hortonworks Data Platform (HDP), HDFS is expanded to support heterogeneous storage media within the HDFS cluster.
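Although this tutorial works through the Ambari Files View, the same namespace is exposed over the WebHDFS REST API, which makes the distributed file system idea concrete: one HTTP endpoint presents files that are physically spread across the cluster. Below is a minimal Python sketch that lists the HDFS root on the Sandbox; the host sandbox-hdp.hortonworks.com and port 50070 are assumptions based on common HDP Sandbox defaults, so adjust them to your environment.

    import requests

    # Assumed WebHDFS endpoint for the HDP Sandbox; adjust host/port as needed.
    WEBHDFS = "http://sandbox-hdp.hortonworks.com:50070/webhdfs/v1"
    USER = "maria_dev"

    resp = requests.get(f"{WEBHDFS}/?op=LISTSTATUS&user.name={USER}")
    resp.raise_for_status()

    # WebHDFS answers with JSON; FileStatus is the list of directory entries.
    for entry in resp.json()["FileStatuses"]["FileStatus"]:
        print(entry["type"], entry["permission"], entry["pathSuffix"])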

Step 1 – Download and Extract Sensor Data Files

  1. Download the sample sensor data contained in a compressed (.zip) folder here:  Geolocation.zip
  2. Save the Geolocation.zip file to your computer, then extract the files. You should see a Geolocation folder that contains the following files:
    • geolocation.csv – This is the collected geolocation data from the trucks. It contains records showing truck location, date, time, type of event, speed, etc.
    • trucks.csv – This data was exported from a relational database and shows information on truck models, driverid, truckid, and aggregated mileage.
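If you would rather script this step, the sketch below downloads and extracts the archive using Python's standard library plus requests. The URL is a placeholder, since the download link above is given only as Geolocation.zip; substitute the actual link from this page.

    import io
    import zipfile
    import requests

    # Placeholder URL: replace with the actual Geolocation.zip download link.
    URL = "https://example.com/Geolocation.zip"

    resp = requests.get(URL)
    resp.raise_for_status()

    # The archive contains a Geolocation folder with the two CSV files;
    # extract it into the current directory.
    with zipfile.ZipFile(io.BytesIO(resp.content)) as archive:
        archive.extractall(".")
        print(archive.namelist())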

Step 2 – Load the Sensor Data into HDFS

1. Log on to Ambari using the credentials maria_dev/maria_dev.

2. Go to the Ambari Dashboard and open Files View.

[Image: files_view_lab1]

3. Starting from the root of the HDFS file system, you will see all the files the logged-in user (maria_dev in this case) has permission to view:

[Image: root_files_view_folder_lab1]

4. Navigate to the /user/maria_dev directory by clicking on the directory links.

5. Let's create a data directory for the data we are going to use in this use case. Click the new folder button to create the data directory inside the maria_dev directory, then navigate into it (a scripted equivalent is sketched below).

[Image: add_new_folder_data_lab1]
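Under the hood, the new folder button issues an HDFS mkdirs call. For reference, here is a minimal sketch of the same operation over the WebHDFS REST API; the endpoint host and port are the same assumptions as in the earlier listing sketch.

    import requests

    # Assumed WebHDFS endpoint for the HDP Sandbox; adjust host/port as needed.
    WEBHDFS = "http://sandbox-hdp.hortonworks.com:50070/webhdfs/v1"
    USER = "maria_dev"

    # Create /user/maria_dev/data, like the new folder button in Files View.
    resp = requests.put(f"{WEBHDFS}/user/maria_dev/data?op=MKDIRS&user.name={USER}")
    resp.raise_for_status()
    print(resp.json())  # {'boolean': True} on success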

Upload Geolocation and Trucks CSV Files to data Folder

1. If you’re not already in your newly created directory path /user/maria_dev/data, go to the data folder. Then click the upload button to upload the geolocation.csv and trucks.csv files into it.

2. An Upload file window will appear; click on the cloud symbol.

[Image: upload_file_lab1]

3. Another window will appear; navigate to the folder where the two CSV files were downloaded. Select one file at a time and click Open to complete the upload. Repeat the process until both files are uploaded.

[Image: upload_file_window_lab1]

Both files are now uploaded to HDFS, as shown in the Files View UI:

[Image: uploaded_files_lab1]
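The upload dialog, likewise, maps to an HDFS create-and-write. A minimal WebHDFS sketch follows: CREATE first returns a redirect to a DataNode, and the file bytes are then sent to that location. The endpoint host and port are the same assumptions as above, and when running outside the Sandbox the redirected DataNode hostname may need an entry in your hosts file.

    import requests

    # Assumed WebHDFS endpoint for the HDP Sandbox; adjust host/port as needed.
    WEBHDFS = "http://sandbox-hdp.hortonworks.com:50070/webhdfs/v1"
    USER = "maria_dev"

    def upload(local_path, hdfs_path):
        # Step 1: ask the NameNode where to write; it replies with a redirect.
        url = f"{WEBHDFS}{hdfs_path}?op=CREATE&overwrite=true&user.name={USER}"
        redirect = requests.put(url, allow_redirects=False)
        datanode_url = redirect.headers["Location"]
        # Step 2: send the file contents to the DataNode we were pointed at.
        with open(local_path, "rb") as f:
            resp = requests.put(datanode_url, data=f)
        resp.raise_for_status()  # 201 Created on success

    for name in ("geolocation.csv", "trucks.csv"):
        upload(f"Geolocation/{name}", f"/user/maria_dev/data/{name}")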

You can also perform the following operations on a file or folder by clicking on the entity’s row: Open, Rename, Permissions, Delete, Copy, Move, Download and Concatenate.
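Most of these row actions also correspond to single WebHDFS operations. As one illustration (same endpoint assumptions as above), Rename moves a file to a new absolute HDFS path:

    import requests

    # Assumed WebHDFS endpoint for the HDP Sandbox; adjust host/port as needed.
    WEBHDFS = "http://sandbox-hdp.hortonworks.com:50070/webhdfs/v1"
    USER = "maria_dev"

    # Illustration only: later tutorial steps expect the original file names.
    resp = requests.put(
        f"{WEBHDFS}/user/maria_dev/data/trucks.csv"
        f"?op=RENAME&destination=/user/maria_dev/data/trucks_raw.csv&user.name={USER}"
    )
    resp.raise_for_status()
    print(resp.json())  # {'boolean': True} on success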

Set Write Permissions to Write to data Folder

  1. Click on the data folder's row, which is contained within the directory path /user/maria_dev.
  2. Click Permissions.
  3. Make sure all of the Write checkboxes are checked (their backgrounds turn blue).

Refer to the image below for a visual explanation.

[Image: edit_permissions_lab1]
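The equivalent permission change over WebHDFS sets an octal mode on the directory. With the Write boxes checked for owner, group, and others as in the screenshot (and read/execute left as set), the mode works out to 777; the endpoint host and port remain the same assumptions as above.

    import requests

    # Assumed WebHDFS endpoint for the HDP Sandbox; adjust host/port as needed.
    WEBHDFS = "http://sandbox-hdp.hortonworks.com:50070/webhdfs/v1"
    USER = "maria_dev"

    # Grant read/write/execute to owner, group, and others on the data folder,
    # mirroring checking every box in the Files View Permissions dialog.
    resp = requests.put(
        f"{WEBHDFS}/user/maria_dev/data?op=SETPERMISSION&permission=777&user.name={USER}"
    )
    resp.raise_for_status()  # SETPERMISSION returns an empty 200 response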

Summary

Congratulations! Let's summarize the skills and knowledge we acquired in this tutorial. We learned that the Hadoop Distributed File System (HDFS) was built to manage the storage of data across multiple machines, and we can now upload data into HDFS using Ambari's Files View.

User Reviews

Rated 5 out of 5 stars (4 reviews).

If you have questions or are looking for answers, visit the Hortonworks Community Connection.


Easy to understand
by Dennis Suhari on October 19, 2018 at 12:27 am

Informative and good practical description of the steps


Great Tutorial
by scott payne on July 24, 2018 at 8:55 pm

Tutorial was an excellent introduction to HDP data processing using a realistic data set. Each concept is presented succinctly with suggestions to explore the concept further. My only suggestion is that not enough emphasis is placed on how much faster it is to run your queries using a shell than it is to use the sandbox.


Outstanding
by Christian Lopez on May 8, 2018 at 8:29 pm


This review is written from the perspective of a new HDP user interested in understanding this environment and the tools included in the Sandbox.

First you will be introduced to the technologies involved in the tutorial namely Hadoop, Ambari, Hive, Pig Latin, SPARK, HDFS, and most importantly HDP. Next, you will use IoT data to calculate the risk factor for truck drivers by using the truck’s information and their geo-location, you will accomplish this goal by uploading the needed data to your VM and storing the data as Hive tables. Additionally, you will learn to use PIG Latin and SPARK to extrapolate the data needed to find the risk factor for all drivers in the set and storing the information you found back into the database. Accomplishing the same task using two different tools (SPARK, and PIG) highlights the robustness and flexibility of HDP as all the operations happen flawlessly.

I highly recommend this tutorial as it is highly informative, shows a realistic use-case, and as a new user of HDP I learned about all the cool technologies enabled to work through the Hortonworks platform, most importantly I was left with a great sense of accomplishment and that’s reason alone to try the tutorial.


Excellent Tutorial!
by Ana Castro on May 8, 2018 at 4:05 pm

The tutorial was very informative and had an excellent flow. It had just the right amount of detail per concept. Great introduction to Hadoop and other Apache projects.
