CSC 4740

Open Source

Documentation is online

hdfs dfs -get shakespeare/poems ~/shakepoems.txt
less ~/shakepoems.txt

Download the poems file from HDFS into the local filesystem, then view it

Scalable

Easily add a machine to the cluster

Fault-tolerant, scalable, open source, distributed

Features of Hadoop

HMaster

HBase component that assigns regions and creates and deletes tables

RegionServers

HBase component that serves data for reads and writes

hdfs dfs -ls /

HDFS command to print the contents of the root directory

hdfs dfs -mkdir weblog

HDFS command to create a directory named 'weblog' in HDFS

gunzip -c access_log.gz | hdfs dfs -put - weblog/access_log

Command to decompress access_log.gz and upload the result to HDFS as weblog/access_log in one step

tar zxvf shakespeare.tar.gz

Local shell command (not an HDFS command) to extract shakespeare.tar.gz

hdfs dfs -ls shakespeare

List the contents of the /user/training/shakespeare directory

hdfs dfs -cat shakespeare/* | grep 'tiger'

Print all lines containing 'tiger' from every file in the HDFS shakespeare directory, without storing any file on the local filesystem.

hdfs dfs -ls /user/training
hdfs dfs -ls

Print the contents of your HDFS home directory (two equivalent forms when the home directory is /user/training)

hdfs dfs -cat shakespeare/histories | tail -n 50

Print the last 50 lines of the histories file

hdfs dfs -cat wordcounts/part-r-00000 | less

View the contents of the output in wordcounts/part-r-00000

HDFS

What is the storage layer of Hadoop

YARN

What manages computing resources and schedules submitted jobs

Hadoop MapReduce

What processes data on the cluster in Hadoop

job

a 'full program'

Fault Tolerant

capable of continuing operation even if a component fails

MapReduce

is a component for distributing a job across multiple nodes

HBase

is a distributed column-oriented data store built on top of HDFS

cluster

is a group of computers working together; it provides data storage, data processing, and resource management

task attempt

is a particular instance of an attempt to execute a task.

daemon

is a program running on a node, performs a specific function in the cluster

Node

is an individual computer in the cluster

task

is the execution of a single Mapper or Reducer over a slice of data.
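
To make the job / task vocabulary concrete, here is a minimal sketch in plain Python (not the Hadoop API) of a word-count job. Each input slice stands in for the data handled by one map task; the `shuffle` helper and the sample data are assumptions made purely for illustration.

```python
from collections import defaultdict

def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group values by key and sort the keys (hypothetical stand-in
    for the grouping the framework does between map and reduce)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))

def reducer(word, counts):
    """Sum the counts collected for one word."""
    return word, sum(counts)

def word_count(slices):
    # Conceptually, one map task runs per slice of the input.
    pairs = [pair for s in slices for line in s for pair in mapper(line)]
    return [reducer(word, counts) for word, counts in shuffle(pairs).items()]

# Illustrative input: two slices, so two map tasks in this toy model.
slices = [["the tiger sleeps"], ["the tiger hunts", "the end"]]
result = word_count(slices)
print(result)
```

In this toy model the whole `word_count` call plays the role of the job, and each call of `mapper` over a slice (or `reducer` over a key group) plays the role of a task.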

Master node

manages the distribution of work and data to worker nodes

YARN

the Hadoop processing layer, which contains a resource manager and a job scheduler

Distributed

Can work on multiple machines at the same time

Apache Hadoop

A software framework for storing, processing, and analyzing "big data"

hdfs dfs -cat shakespeare/* | grep 'tiger' | sort | hdfs dfs -put - tigers.txt

Find all lines containing 'tiger' in every file in the HDFS shakespeare directory and store them, in lexicographically sorted order, in an HDFS file called tigers.txt, without storing any file on the local filesystem.

HBase is a distributed _____ _____ data store built on top of ______.

column-oriented; HDFS

hadoop version

Command to print the installed Hadoop version

hadoop

Command to print a help message

job.setNumReduceTasks(0);

Driver code to create a Map-only job by setting the number of Reducers to 0
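
As a rough illustration of what a Map-only job does, the sketch below (plain Python, not the Hadoop API) runs a filtering mapper over each input slice and keeps the mapper output as the final output, with no shuffle or reduce step. The function names and sample lines are hypothetical; the `part-m-NNNNN` keys mirror the naming Hadoop uses for map output files.

```python
def tiger_mapper(line):
    """Emit the line unchanged if it mentions 'tiger'; otherwise emit nothing."""
    if "tiger" in line:
        yield line

def run_map_only(slices, mapper):
    """Run the mapper over each slice and keep each slice's output separate.

    With zero reducers there is no shuffle or reduce phase, so the
    mapper output is the job output.
    """
    return {
        f"part-m-{i:05d}": [out for line in s for out in mapper(line)]
        for i, s in enumerate(slices)
    }

# Illustrative input: two slices, so two map tasks in this toy model.
slices = [["a tiger", "a lamb"], ["no cats", "tiger, tiger"]]
output = run_map_only(slices, tiger_mapper)
print(output)
```

A filter like this is a natural fit for a Map-only job because no record needs to be combined with any other record.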

Storage, processing, resource management

Core Hadoop Components

hdfs dfs -cat shakespeare/* | wc -l

Count the number of lines of all files stored in HDFS shakespeare directory without storing any file on your local file system.

hdfs dfs -mkdir testlog
gunzip -c access_log.gz | head -n 5000 | hdfs dfs -put - testlog/test_access_log

Create a smaller version of the log file (e.g., the first 5,000 lines) and store it in HDFS as testlog/test_access_log

hdfs dfs -ls /user

HDFS command to print the contents of the /user directory

hdfs dfs

HDFS command to print a help message

ZooKeeper

HBase component for maintaining cluster status

128MB

How big is each HDFS block by default
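
A small worked example of the arithmetic, assuming the 128 MB block size above: a file is split into fixed-size blocks, and the final block holds only whatever data is left over. The helper names are hypothetical.

```python
import math

BLOCK_SIZE_MB = 128  # block size from the card above

def blocks_needed(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Number of blocks a file of the given size is split into."""
    if file_size_mb <= 0:
        return 0
    return math.ceil(file_size_mb / block_size_mb)

def last_block_mb(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Amount of data held by the final block of the file."""
    remainder = file_size_mb % block_size_mb
    # A remainder of 0 means the last block is full (or the file is empty).
    return remainder or min(file_size_mb, block_size_mb)

# A 300 MB file: two full 128 MB blocks plus one block holding 44 MB.
print(blocks_needed(300), last_block_mb(300))
```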

speculative execution

If a Mapper appears to be running significantly more slowly than the others, a new instance of the Mapper will be started on another machine, operating on the same data
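
The effect on job completion time can be sketched with a toy model in plain Python. Everything numeric here is an assumption chosen for illustration, not Hadoop's actual policy: a Mapper counts as "significantly slower" when it takes more than `slow_factor` times the median duration, and the backup attempt starts once a typical Mapper has finished and then runs at typical speed.

```python
from statistics import median

def job_time(durations, speculative=True, slow_factor=2.0):
    """Time until every Mapper has finished, in the toy model described above."""
    typical = median(durations)
    finish_times = []
    for d in durations:
        if speculative and d > slow_factor * typical:
            # Backup attempt: starts at `typical`, takes `typical` to run.
            # Whichever attempt finishes first supplies the result.
            d = min(d, typical + typical)
        finish_times.append(d)
    return max(finish_times)

# Illustrative durations in seconds; the last Mapper is a straggler.
durations = [10, 11, 9, 40]
print(job_time(durations, speculative=False))  # 40
print(job_time(durations, speculative=True))   # 21.0
```

The job can only finish when its slowest task does, which is why re-running a single straggler on another machine can shorten the whole job.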

hdfs dfs -put shakespeare shakespeare

Upload the local shakespeare directory into HDFS

NameNode

The one machine selected to store the HDFS metadata

hdfs dfs -rm shakespeare/glossary

Remove the glossary file from shakespeare
