Hadoop Developer Practice Questions set I 1-250


78. You are writing a CREATE TABLE command in Hive and you want two columns: ID (integer) and Name (string). What goes inside the parentheses in that statement?

(ID integer, name string)

6. True/False: The "sort" that happens before the reduce phase is a true sort operation

False: This is really a "merge" operation.
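
To picture why this is a merge rather than a full sort: each mapper's output is already sorted, so the reduce side only has to interleave sorted runs. The following is a minimal sketch in plain Java, not Hadoop's actual implementation; `MergeSketch` and `mergeSortedRuns` are hypothetical names chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

class MergeSketch {
    // Merges runs that are each already sorted, without re-sorting the whole data set.
    // Illustration only; Hadoop's real merge works on spill files, not in-memory lists.
    static List<String> mergeSortedRuns(List<List<String>> runs) {
        // Each heap entry is {runIndex, positionInRun}, ordered by the key it points at.
        PriorityQueue<int[]> heap = new PriorityQueue<>(
            (a, b) -> runs.get(a[0]).get(a[1]).compareTo(runs.get(b[0]).get(b[1])));
        for (int i = 0; i < runs.size(); i++) {
            if (!runs.get(i).isEmpty()) {
                heap.add(new int[] {i, 0});
            }
        }
        List<String> merged = new ArrayList<>();
        while (!heap.isEmpty()) {
            int[] head = heap.poll();
            List<String> run = runs.get(head[0]);
            merged.add(run.get(head[1]));
            // Advance within the same run if it still has keys left.
            if (head[1] + 1 < run.size()) {
                heap.add(new int[] {head[0], head[1] + 1});
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        List<List<String>> runs = List.of(
            List.of("apple", "kiwi"), List.of("banana", "pear"), List.of("cherry"));
        // prints [apple, banana, cherry, kiwi, pear]
        System.out.println(mergeSortedRuns(runs));
    }
}
```

Each key is compared only against the current heads of the other runs, which is what makes this cheaper than sorting everything from scratch.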

27. Can mapreduce be written in any other language than Java?

Yes. Through Hadoop Streaming, most scripting languages such as Python, Ruby, and PHP work fine.

13. What is the difference between class and interface?

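An interface only declares methods; traditionally it cannot implement them (Java 8 and later allow default methods as an exception). A class can contain implemented methods.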

54. Generally speaking, one map task per ONE ______?

input split

73. True or false: you can edit system properties via API

False

28. T/F: HDFS allows random writes

False!

83. True or false: when reading HDFS data, the NameNode is the bottleneck

False!

88. True or false: in YARN, the application manager and the application master are really the same thing

False!

89. True or false: the scheduler in YARN guarantees a restart in case of application failure or hardware failure.

False!

94. True or false: you have to have hdfs to run yarn

False!

15. True/False: Reducer sorts data by keys once it receives the output of mappers

False. Data is guaranteed to be sorted by key by the time it gets to the reducers.

70. True or false: to install Hive, you need to compile the source code.

False. Extracting the release tarball with tar xvf is adequate.

31. Flume: Nodes send heartbeats to master every ____ seconds

5

24. You have 5 mappers and 4 reducers. How many copy operations will take place?

5x4=20

3. Author of these questions

@mamunr7

82. HCatalog

An abstraction layer that hides where and how data is physically stored from users and scripts

95. _____ is essentially an OS for distributed applications

YARN

85. Apache Slider allows what?

It lets YARN-unaware distributed applications run on YARN

100. Does hive have "use database" command?

Yes

16. Can you have 0 reducers?

Yes

25. Is PIG "C-like" ?

Yes

39. Does Oozie allow forks?

Yes

36. Does HBase support "transactions" ?

Yes (short answer)

56. Which mapper method should be used for setup so that a file gets copied to the distributed cache before the tasks start?

Configure

93. How to set a system property using CLI

-Dproperty1=100

66. Once you have fed the Configuration class an XML file, which Java method can you use to check whether the values are correct?

assertThat, e.g.: assertThat(Conf1.getYear(), is("1978"));

18. A combiner reduces: a. load on the network b. load on the datanodes running reducers c. both d. neither

C

23. Where does hive interpreter run?

Client machine! (It turns HiveQL into MapReduce jobs.)

65. You have defined a new instance Conf1 and have a resource file named R1.xml ready. How do you add the resources from this XML file to Conf1?

Conf1.addResource("R1.xml");

47. How do you create a new object called myconf of the Configuration class?

Configuration myconf = new Configuration();

1. Which method to call to associate an array in mapper

Configure

99. Command to create a database in hive CLI

Create database db1;

84. DAG

Directed Acyclic Graph

80. In hive, in the create table statement, how do you define comma as the field separator?

FIELDS TERMINATED BY ',' (as part of the ROW FORMAT DELIMITED clause)

41. T/F: jobconf defines # of mappers

False

49. Which Java class represents a directory or file and exposes its metadata, e.g. ownership, replication factor, block size, and permissions?

FileStatus

67. What is a foreign key?

From Wikipedia: In the context of relational databases, a foreign key is a field (or collection of fields) in one table that uniquely identifies a row of another table. In simpler words, the foreign key is defined in a second table, but it refers to the primary key in the first table. Imagine a "user id" stored in some important table that is used later to make joins.

8. Which apache project provides random access to "planet size data"?

HBASE

97. Hoya

HBase on yarn

38. Apache project that allows SQL type queries on HDFS data

HIVE

71. The main shell variable you should set after installing Hive

HIVE_HOME

42. Who defines # of mappers

Hadoop Framework

46. How the mapper figures out which key goes to which partition

Hashing
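
As an illustration of the hashing answer, the sketch below reproduces the arithmetic used by Hadoop's default HashPartitioner (mask off the sign bit of the key's hash code, then take the remainder by the number of reducers) in plain Java. The class name `HashPartitionSketch` and the method `partitionFor` are hypothetical; no Hadoop classes are used.

```java
class HashPartitionSketch {
    // Hypothetical helper: maps a key to a partition number in [0, numReduceTasks).
    // The bit mask clears the sign bit so negative hash codes cannot yield a negative partition.
    static int partitionFor(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        String[] keys = {"hadoop", "hive", "hbase", "hadoop"};
        for (String key : keys) {
            // The same key always lands in the same partition, so one reducer sees all its values.
            System.out.println(key + " -> partition " + partitionFor(key, 4));
        }
    }
}
```

Because the partition depends only on the key, every mapper sends a given key to the same reducer without any coordination between mappers.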

22. How do Hive jobs become mapreduce tasks?

Hive Interpreter

75. You are creating a table in Hive. Which clause can you use to ensure that you are not overriding an existing table ?

IF NOT EXISTS

12. Is Writable a class or interface?

Interface

81. Which hivecli command reads data from a text file into a created table?

LOAD DATA INPATH followed by more options

57. In left outer join, unmatched entries from which table is included?

Left

17. 3 types of outer joins:

Left Outer, Right Outer, Full Outer

9. How in the world can HBase provide random access to petabytes of data?

Lots of indexing and partitioning, plus caching in RAM

64. If the left table has M rows and the right table has N rows, how many rows will there be in a Cartesian join of those two tables?

MxN

21. Can reduce tasks talk to each other?

NO

26. When it comes to reading files, is Namenode a bottleneck?

NO

37. Does HBase support multi-row transactions?

No

74. You defined a system property using the configuration class. Will the value still be there when your job is finished?

No. One will have to set it again via the Configuration class.

34. 2 requirements (for the operation at hand) for using Combiners

Operations have to be both associative and commutative
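
A small plain-Java sketch of why this matters, using made-up numbers: summing partial sums gives the same result as summing everything at once, while averaging partial averages does not, so a mean cannot be used as a combiner as-is. `CombinerSketch`, `sum`, and `mean` are hypothetical names, and the two lists stand in for the outputs of two mappers.

```java
import java.util.List;

class CombinerSketch {
    static double sum(List<Double> values) {
        double total = 0.0;
        for (double v : values) {
            total += v;
        }
        return total;
    }

    static double mean(List<Double> values) {
        return sum(values) / values.size();
    }

    public static void main(String[] args) {
        // Hypothetical outputs of two mappers for the same key.
        List<Double> mapper1 = List.of(1.0, 2.0, 3.0);
        List<Double> mapper2 = List.of(10.0);
        List<Double> all = List.of(1.0, 2.0, 3.0, 10.0);

        // Sum: combining per mapper first does not change the answer (16.0 both ways).
        System.out.println(sum(all) + " vs " + sum(List.of(sum(mapper1), sum(mapper2))));

        // Mean: combining per mapper first changes the answer (4.0 vs 6.0).
        System.out.println(mean(all) + " vs " + mean(List.of(mean(mapper1), mean(mapper2))));
    }
}
```

A common workaround for the mean is to have the combiner emit partial sums and counts and let the reducer do the final division.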

69. 3 ways to interact with data in HBase

Put, get, scan

48. A ____________ uses the data within the boundaries created by the input split to generate key/value pairs

RecordReader

87. Resource Manager's two main components

Scheduler and application manager

43. Who does housekeeping of Primary Namenode?

Secondary Namenode

77. When writing hive statements spanning multiple lines which character actually ends the statement ?

Semi colon

68. If you are worried about another developer setting a configuration property to something else after you modify, what can you do?

Set it to final

98. If you are importing data from MySQL to hive using sqoop, what would be the first 3 words of the command?

sqoop import --connect

92. Which class and method to use to set a system property?

System.setProperty("property1", "100");
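
A minimal sketch of setting and reading a system property with the standard `java.lang.System` API. Both the name and the value are passed as strings, so a numeric value has to be parsed after reading it back. `SystemPropertySketch` and its methods are hypothetical names; `property1` is the example name from the card.

```java
class SystemPropertySketch {
    // Sets the property and returns what System.getProperty now reports for it.
    static String setAndRead(String name, String value) {
        System.setProperty(name, value);
        return System.getProperty(name);
    }

    // Reads a property that is expected to hold an integer.
    static int readAsInt(String name) {
        return Integer.parseInt(System.getProperty(name));
    }

    public static void main(String[] args) {
        System.out.println(setAndRead("property1", "100")); // prints 100
        System.out.println(readAsInt("property1") + 1);     // prints 101
    }
}
```

Running the same class with `java -Dproperty1=100 SystemPropertySketch` would make the value available through `System.getProperty` without the explicit `setProperty` call.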

2. THE DRIVER METHOD

There is one final component of a Hadoop MapReduce program, called the Driver. The driver initializes the job, instructs the Hadoop platform to execute your code on a set of input files, and controls where the output files are placed. A cleaned-up version of the driver from the example Java implementation that comes with Hadoop is presented below:

    public void run(String inputPath, String outputPath) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        // the keys are words (strings)
        conf.setOutputKeyClass(Text.class);
        // the values are counts (ints)
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(MapClass.class);
        conf.setReducerClass(Reduce.class);

        FileInputFormat.addInputPath(conf, new Path(inputPath));
        FileOutputFormat.setOutputPath(conf, new Path(outputPath));

        JobClient.runJob(conf);
    }

30. In Flume, there are "master" nodes? T/F

True

32. Combiner reduces SHUFFLE load , as well. T/F

True

33. T/F: In general, same reducer method can be used as mini-combiner.

True

44. Secondary namenode is considered a "master" T/F

True

5. T/F: HDFS was designed for write-once, streaming access for relatively large files.

True

59. T/F: Generally speaking, a reduce-side join can take long

True

61. T/F: reduce side join supports all of the following: inner join, outer join (both), full outer join, anti-join and Cartesian join

True

62. True/False: Reduce side join supports all of the following: inner join, outer join (both), full outer join, anti-join and Cartesian join

True

79. True or false: in the create table statement in hive, you can insert a comment for the table.

True

86. True or False: Storm already takes advantage of YARN APIs

True

90. True or false: the YARN scheduler has a policy plug-in that divides up the cluster into different queues etc.

True

91. Capacity scheduler and fair scheduler are examples of policy plugins of yarn scheduler. True or false

True

96. True or false: yarn supports windows

True

4. If allowed and configured, when does a combiner run? (which phase)

after mapper, but before partitioner and shuffle

7. "copy" phase of mapreduce happens when?

after the mapper and before the "sort" phase which is really a merge phase.

19. Easy example to understand "serialization"

converting "tables" to data on files so that they can be "written" to disk

63. Cartesian join

each row of left table is crossed with each row of right table
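
A short plain-Java sketch of the definition above, with each "table" reduced to a list of strings. `CartesianJoinSketch` and `cartesianJoin` are hypothetical names. The nested loop makes the M x N row count visible: every left row is paired once with every right row.

```java
import java.util.ArrayList;
import java.util.List;

class CartesianJoinSketch {
    // Pairs every left row with every right row; the result has left.size() * right.size() rows.
    static List<String> cartesianJoin(List<String> left, List<String> right) {
        List<String> joined = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                joined.add(l + "," + r);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<String> left = List.of("a", "b", "c"); // M = 3
        List<String> right = List.of("1", "2");     // N = 2
        List<String> rows = cartesianJoin(left, right);
        System.out.println(rows.size()); // prints 6
        System.out.println(rows);        // prints [a,1, a,2, b,1, b,2, c,1, c,2]
    }
}
```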

55. Distributed Cache is used for what?

files needed by all data nodes for processing (e.g. jar files or data files)

58. ANTI-JOIN =

full outer join - inner join
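
The formula can be checked on join keys alone. The sketch below, in plain Java with hypothetical names (`AntiJoinSketch`, `antiJoinKeys`), treats each table as a set of keys: the full outer join covers the union of keys, the inner join covers the intersection, and removing one from the other leaves the keys that appear in only one table.

```java
import java.util.Set;
import java.util.TreeSet;

class AntiJoinSketch {
    // Returns the keys that have no match in the other table.
    static Set<String> antiJoinKeys(Set<String> left, Set<String> right) {
        Set<String> fullOuter = new TreeSet<>(left);
        fullOuter.addAll(right);       // union of keys = keys of the full outer join
        Set<String> inner = new TreeSet<>(left);
        inner.retainAll(right);        // intersection of keys = keys of the inner join
        fullOuter.removeAll(inner);    // full outer join minus inner join
        return fullOuter;
    }

    public static void main(String[] args) {
        Set<String> left = Set.of("a", "b", "c");
        Set<String> right = Set.of("b", "c", "d");
        System.out.println(antiJoinKeys(left, right)); // prints [a, d]
    }
}
```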

45. One fundamental way Hadoop differs from an RDBMS, which enables it to scale so much better:

Hadoop takes code to the data. An RDBMS takes data to the code.

35. Which "class" represents a mapreduce job?

Job

53. parameter that defines how many times a failed task will be attempted

mapred.map.max.attempts (for map tasks; mapred.reduce.max.attempts for reduce tasks)

52. Parent ABSTRACT CLASS of HDFS file systems

org.apache.hadoop.fs.FileSystem

50. Which package has the conf (configuration) class?

org.apache.hadoop.conf

29. Common problem with map-side join

out of memory on slave nodes

20. Easy example to understand "deserialization"

read from data files on disk and converting them to "table" data.
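
The serialization and deserialization cards can be tied together with a round trip through bytes. This is a minimal sketch using only `java.io`, with a made-up record of an integer id and a string name. Hadoop's Writable interface follows the same pattern with `write(DataOutput)` and `readFields(DataInput)`, but `SerializationSketch` and its methods are hypothetical and not part of Hadoop.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

class SerializationSketch {
    // Serialization: turn the in-memory fields into a flat sequence of bytes.
    static byte[] serialize(int id, String name) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeInt(id);
            out.writeUTF(name);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Deserialization: read the fields back in the same order they were written.
    static Object[] deserialize(byte[] data) {
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
            int id = in.readInt();
            String name = in.readUTF();
            return new Object[] {id, name};
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] data = serialize(7, "Ada");
        Object[] fields = deserialize(data);
        System.out.println(data.length + " bytes -> " + fields[0] + ", " + fields[1]);
    }
}
```

The byte array could just as well be written to a file or sent over the network; the reader only needs to know the field order.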

51. Where does the Configuration class in org.apache.hadoop.conf get its data from?

reads from an xml file

60. Which one is easier to implement: map side join or reduce-side join?

reduce-side

11. What does identity reducer do?

It passes its input through unchanged: shuffle and sort still happen after the mapper, but no reducing takes place

40. map files are _________________ sequence files

sorted

10. 2 ways to have files in input directory that are "ignored" during processing

start with DOT or underscore
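
The rule can be written as a one-line predicate. The sketch below is plain Java with hypothetical names (`HiddenFileSketch`, `isProcessed`); it mirrors the prefix check described in the answer rather than calling any Hadoop class.

```java
import java.util.List;

class HiddenFileSketch {
    // A file is picked up as input only if its name starts with neither "_" nor ".".
    static boolean isProcessed(String fileName) {
        return !fileName.startsWith("_") && !fileName.startsWith(".");
    }

    public static void main(String[] args) {
        // Hypothetical directory listing.
        for (String name : List.of("part-00000", "_SUCCESS", ".part-00000.crc", "data.txt")) {
            System.out.println(name + " -> " + (isProcessed(name) ? "processed" : "ignored"));
        }
    }
}
```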

72. Which takes priority over which: system property or property set via resource file?

system property

14. What is the difference between "extending" and "implementing" (Java)

you extend a "class". You implement an "interface"
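
A compact Java illustration of the two keywords side by side. All names here (`ExtendVsImplementSketch`, `Greeter`, `BaseGreeter`, `PoliteGreeter`) are made up for the example: the subclass inherits working code from the class it extends and supplies the body for the method the interface only declares.

```java
class ExtendVsImplementSketch {
    // An interface declares what must be provided, without a body.
    interface Greeter {
        String greet(String name);
    }

    // A class carries implemented behavior that subclasses inherit.
    static class BaseGreeter {
        String prefix() {
            return "Hello, ";
        }
    }

    // "extends" reuses BaseGreeter's code; "implements" promises to fulfil Greeter.
    static class PoliteGreeter extends BaseGreeter implements Greeter {
        @Override
        public String greet(String name) {
            return prefix() + name;
        }
    }

    public static void main(String[] args) {
        Greeter greeter = new PoliteGreeter();
        System.out.println(greeter.greet("Hadoop")); // prints Hello, Hadoop
    }
}
```

A Java class can extend only one class but implement several interfaces, which is why Hadoop types such as Writable are interfaces.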

