CSC 4740

Open Source

Documentation is online

hdfs dfs -get shakespeare/poems ~/shakepoems.txt && less ~/shakepoems.txt

Download the poems file to the local filesystem and view it

Scalable

Easily add a machine to the cluster

Fault-tolerant, scalable, open source, distributed

Features of Hadoop

HMaster

HBase component that assigns regions and creates and deletes tables

RegionServers

HBase component that serves data for reads and writes

hdfs dfs -ls /

HDFS command to print the contents of the root directory

hdfs dfs -mkdir weblog

HDFS command to create a directory 'weblog' in HDFS

gunzip -c access_log.gz | hdfs dfs -put - weblog/access_log

Command to unzip access_log.gz and upload it to the weblog directory in HDFS in one step

tar zxvf shakespeare.tar.gz

Local shell command (not an HDFS command) to extract shakespeare.tar.gz

hdfs dfs -ls shakespeare

List the contents of the /user/training/shakespeare directory

hdfs dfs -cat shakespeare/* | grep 'tiger'

Print all lines containing the word 'tiger' in all files stored in the HDFS shakespeare directory, without storing any file on your local file system.

hdfs dfs -ls /user/training (or, equivalently, hdfs dfs -ls)

Print the contents of your HDFS home directory

hdfs dfs -cat shakespeare/histories | tail -n 50

Print the last 50 lines of the histories file

hdfs dfs -cat wordcounts/part-r-00000 | less

View the contents of the output in wordcounts/part-r-00000

HDFS

What is the storage layer of Hadoop?

YARN

What manages computing resources and schedules submitted jobs?

Hadoop MapReduce

What processes data on the cluster for Hadoop?

job

a 'full program'

Fault Tolerant

capable of continuing operation even if a component fails

MapReduce

is a component for distributing a job across multiple nodes

HBase

is a distributed column-oriented data store built on top of HDFS

cluster

is a group of computers working together; provides data storage, data processing, and resource management

task attempt

is a particular instance of an attempt to execute a task.

daemon

is a program running on a node, performs a specific function in the cluster

Node

is an individual computer in the cluster

task

is the execution of a single Mapper or Reducer over a slice of data.
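The relationship between a task and its task attempts can be sketched in a few lines of Python. This is a toy model, not Hadoop code: `run_task`, `flaky_mapper`, and the limit of 4 attempts are hypothetical names and values chosen for illustration.

```python
def run_task(task, max_attempts=4):
    """Run one task, retrying on failure; returns (result, attempts_used).

    max_attempts=4 is an illustrative assumption, not read from any config.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return task(attempt), attempt
        except RuntimeError:
            # This attempt failed; the next loop iteration is a new attempt
            continue
    raise RuntimeError(f"task failed after {max_attempts} attempts")


def flaky_mapper(attempt):
    # Hypothetical task: fails on its first two attempts, then succeeds
    if attempt < 3:
        raise RuntimeError("simulated node failure")
    return "mapped slice 0"


result, attempts = run_task(flaky_mapper)
print(result, attempts)
```

Here one task (the mapper over slice 0) produces three task attempts: two that fail and one that succeeds.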

Master node

manages the distribution of work and data to worker nodes

YARN

the Hadoop processing layer that contains a resource manager and a job scheduler

Distributed

Can work on multiple machines at the same time

Apache Hadoop

A software framework for storing, processing, and analyzing "big data"

hdfs dfs -cat shakespeare/* | grep 'tiger' | sort | hdfs dfs -put - tigers.txt

After finding all lines containing the word 'tiger' in all files stored in the HDFS shakespeare directory, store them in lexicographically sorted order in an HDFS file called tigers.txt without storing any file on your local file system.

HBase is a distributed _____ _____ data store built on top of ______.

column-oriented; HDFS

hadoop version

Command line to print the installed Hadoop version

hadoop

Command line to print a help message

job.setNumReduceTasks(0);

Driver code call that creates a Map-only job by setting the number of Reducers to 0
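To see what setting the number of Reducers to 0 changes, here is a minimal single-process sketch in Python. It is a toy model rather than the Hadoop API: `run_job`, `tokenize`, `sum_counts`, and the `num_reduce_tasks` parameter are hypothetical names used only for illustration.

```python
from collections import defaultdict


def run_job(records, mapper, reducer=None, num_reduce_tasks=1):
    """Toy single-process model of a MapReduce job (not the Hadoop API)."""
    mapped = [pair for record in records for pair in mapper(record)]
    if num_reduce_tasks == 0:
        # Map-only job: mapper output is the final output, no shuffle or reduce
        return mapped
    groups = defaultdict(list)
    for key, value in mapped:
        groups[key].append(value)
    # Reduce phase: one reducer call per key, in sorted key order
    return [reducer(key, values) for key, values in sorted(groups.items())]


def tokenize(line):
    # Hypothetical mapper: emit (word, 1) for every word in the line
    return [(word.lower(), 1) for word in line.split()]


def sum_counts(word, counts):
    # Hypothetical reducer: total the counts for one word
    return (word, sum(counts))


lines = ["Tiger tiger", "burning bright"]
map_only = run_job(lines, tokenize, num_reduce_tasks=0)
full = run_job(lines, tokenize, sum_counts)
print(map_only)
print(full)
```

With zero reduce tasks the output is the raw, ungrouped mapper output; with a reducer, the pairs are grouped by key and aggregated.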

Storage, processing, resource management

Core Hadoop Components

hdfs dfs -cat shakespeare/* | wc -l

Count the number of lines of all files stored in HDFS shakespeare directory without storing any file on your local file system.

hdfs dfs -mkdir testlog && gunzip -c access_log.gz | head -n 5000 | hdfs dfs -put - testlog/test_access_log

Create a smaller version of the log file (e.g., the first 5,000 lines) and store it in HDFS as test_access_log in a new testlog directory

hdfs dfs -ls /user

HDFS command to print the contents of the /user directory

hdfs dfs

HDFS to print a help message

ZooKeeper

HBase component that maintains cluster status

128 MB

How big is each HDFS block by default?
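As a worked example of the block size above, the following Python sketch computes how many blocks a file of a given size would occupy. The function name `block_count` and the sample file sizes are illustrative assumptions.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the block size from the card above


def block_count(file_size_bytes, block_size=BLOCK_SIZE):
    """Number of blocks needed to hold a file of the given size."""
    # Ceiling division with integers only, so large sizes stay exact
    return -(-file_size_bytes // block_size)


MB = 1024 * 1024
print(block_count(300 * MB))  # two full blocks plus one partial block
print(block_count(128 * MB))  # exactly one full block
```

A 300 MB file needs three blocks: two full 128 MB blocks and a final block holding the remaining 44 MB.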

speculative execution

If a Mapper appears to be running significantly more slowly than the others, a new instance of the Mapper will be started on another machine, operating on the same data
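The idea of spotting a Mapper that is "significantly more slowly" running can be sketched as a simple heuristic in Python. This is not Hadoop's actual algorithm: `pick_stragglers`, the median comparison, and the `slowdown` threshold are assumptions made purely for illustration.

```python
from statistics import median


def pick_stragglers(progress, slowdown=0.5):
    """Return task IDs whose progress is well below the median progress.

    progress maps a task ID to the fraction of its input processed so far.
    slowdown is a hypothetical threshold, not a Hadoop setting.
    """
    typical = median(progress.values())
    return sorted(t for t, p in progress.items() if p < typical * slowdown)


# Hypothetical snapshot of four Mappers partway through a job
progress = {"m_0": 0.90, "m_1": 0.85, "m_2": 0.20, "m_3": 0.95}
stragglers = pick_stragglers(progress)
print(stragglers)
```

In this snapshot only `m_2` falls below half the median progress, so it is the one Mapper for which a second instance would be started on another machine over the same data.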

hdfs dfs -put shakespeare shakespeare

Upload the local shakespeare directory to HDFS

NameNode

The one machine that is selected to store the metadata

hdfs dfs -rm shakespeare/glossary

Remove the glossary file from shakespeare

