Hadoop Developer Practice Questions set I 1-250
78. You are writing a CREATE TABLE command in Hive and you will have two columns: ID integer and Name string. What goes inside the parentheses in that statement?
(ID INT, Name STRING)
6. True/False: The "sort" that happens before the reduce phase is a true sort operation.
False. It is really a "merge" of map outputs that are already sorted.
27. Can MapReduce jobs be written in languages other than Java?
Yes. Through Hadoop Streaming, most scripting languages such as Python, Ruby and PHP work fine.
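13. What is the difference between a class and an interface?
A class can hold state and provide method implementations; an interface declares methods for classes to implement and cannot hold instance fields. Before Java 8 an interface could not contain any method bodies at all; since Java 8 it may also define default and static methods.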
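A minimal sketch of the difference, using hypothetical type names (Describable, Shape, Square) that are not part of Hadoop. The interface carries an abstract method plus a Java 8 default method, while the abstract class holds a field and a constructor. Square also shows the related rule that a class extends one class but may implement many interfaces.

```java
// Hypothetical example types, not part of Hadoop.
interface Describable {
    // Abstract method: implementing classes must supply the body.
    String name();

    // Default method (Java 8+): the interface supplies a body,
    // but it still cannot keep per-instance fields.
    default String describe() {
        return "I am a " + name();
    }
}

// An (abstract) class can hold state and define constructors.
abstract class Shape {
    protected final double side;

    Shape(double side) {
        this.side = side;
    }

    abstract double area();
}

// A class extends exactly one class but may implement many interfaces.
class Square extends Shape implements Describable {
    Square(double side) {
        super(side);
    }

    @Override
    double area() {
        return side * side;
    }

    @Override
    public String name() {
        return "square";
    }
}

class ClassVsInterfaceDemo {
    public static void main(String[] args) {
        Square sq = new Square(3);
        System.out.println(sq.describe()); // I am a square
        System.out.println(sq.area());     // 9.0
    }
}
```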
54. Generally speaking, one map task per ONE ______?
input split
73. True or false: you can edit system properties via API
False
28. T/F: HDFS allows random writes
False!
83. True or false: in reading HDFS data, the NameNode is the bottleneck.
False! The NameNode only supplies block locations; clients read the data directly from the DataNodes.
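88. True or false: in YARN, the ApplicationsManager and the ApplicationMaster are really the same thing.
False! The ApplicationsManager is part of the ResourceManager, while each application gets its own ApplicationMaster.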
89. True or false: the scheduler in YARN guarantees restarting tasks in case of application failure or hardware failure.
False!
94. True or false: you have to have HDFS to run YARN
False!
15. True/False: The reducer sorts data by key once it receives the output of the mappers.
False. Data is guaranteed to be sorted by key by the time it reaches the reducers.
70. True or false: to install Hive, you need to compile the source code.
False. Extracting the release tarball (tar xvf) is adequate.
31. Flume: nodes send heartbeats to the master every ____ seconds
5
24. You have 5 mappers and 4 reducers. How many copy operations will take place?
5 x 4 = 20
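The counting argument can be sketched as a nested loop, assuming (as the answer does) that every reducer fetches exactly one partition from every map task. The class name ShuffleCopyCount is hypothetical.

```java
// Simplified model of the shuffle: each reducer copies its partition
// from every map task, so copies = mappers x reducers.
class ShuffleCopyCount {
    static int copies(int mappers, int reducers) {
        int count = 0;
        for (int m = 0; m < mappers; m++) {
            for (int r = 0; r < reducers; r++) {
                count++; // reducer r copies its partition from map task m
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(copies(5, 4)); // 20
    }
}
```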
3. Author of these questions
@mamunr7
82. HCatalog
An abstraction layer that hides where and how data is physically stored from users and scripts
95. _____ is essentially an OS for distributed applications
YARN
85. What does Apache Slider allow?
Distributed applications that are not YARN-aware can run on YARN.
100. Does Hive have a "USE database" command?
Yes
16. Can you have 0 reducers?
Yes
25. Is Pig "C-like"?
Yes
39. Does Oozie allow forks?
Yes
36. Does HBase support "transactions"?
Yes (short answer): HBase guarantees atomic operations on a single row.
56. Which Mapper method should be used for setup so that a file gets copied to the distributed cache before the tasks start?
configure()
93. How do you set a system property from the command line?
-Dproperty1=100
66. Once you have fed the Configuration class an XML file, which Java method can you use to check that the values are correct?
assertThat (JUnit, with Hamcrest's is() matcher), e.g. assertThat(conf1.get("year"), is("1978"));
18. Combiner reduces: a. load on the network b. load on the DataNodes running reducers c. both d. neither
C
23. Where does the Hive interpreter run?
On the client machine! (It turns HiveQL into MapReduce jobs.)
65. You have defined a new Configuration instance, conf1, and have a resource file named R1.xml ready. How do you add the resources from this XML file to conf1?
conf1.addResource("R1.xml");
47. How do you create a new object called myconf that belongs to the Configuration class?
Configuration myconf = new Configuration();
1. Which method to call to associate an array in mapper
Configure
99. Command to create a database in the Hive CLI
CREATE DATABASE db1;
84. DAG
Directed Acyclic Graph
80. In Hive, in the CREATE TABLE statement, how do you define a comma as the field separator?
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
41. T/F: JobConf defines # of mappers
False
49. Which Java class represents a directory or file and gives its metadata, e.g. ownership, replication factor, block size and permissions?
FileStatus
67. What is a foreign key?
From Wikipedia: in the context of relational databases, a foreign key is a field (or collection of fields) in one table that uniquely identifies a row of another table. In simpler words, the foreign key is defined in a second table, but it refers to the primary key in the first table. For example, a "user id" stored in some other table and later used to make joins.
8. Which Apache project provides random access to "planet size data"?
HBase
97. Hoya
HBase on YARN
38. Apache project that allows SQL-type queries on HDFS data
Hive
71. The main shell variable you should set after installing Hive
HIVE_HOME
42. Who defines # of mappers?
The Hadoop framework (one map task per input split)
46. How the mapper figures out which key goes to which partition
Hashing
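The hashing step can be sketched in plain Java. The formula mirrors the one used by Hadoop's default HashPartitioner (mask the sign bit, then take the remainder by the number of reduce tasks); the class and method names here are hypothetical stand-ins so the example runs without Hadoop on the classpath.

```java
class HashPartitionDemo {
    // Mask off the sign bit so the result is never negative,
    // then take the remainder modulo the number of reduce tasks.
    static int getPartition(Object key, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        int reducers = 4;
        for (String key : new String[] {"apple", "banana", "cherry", "apple"}) {
            // Equal keys always land in the same partition, so all values
            // for one key reach the same reducer.
            System.out.println(key + " -> reducer " + getPartition(key, reducers));
        }
    }
}
```

Masking instead of calling Math.abs matters because Math.abs(Integer.MIN_VALUE) is still negative.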
22. How do Hive queries become MapReduce jobs?
Hive Interpreter
75. You are creating a table in Hive. Which clause can you use to ensure that you are not overwriting an existing table?
IF NOT EXISTS
12. Is Writable a class or interface?
Interface
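A sketch of what implementing a Writable-style interface looks like, which also illustrates serialization and deserialization. To keep it runnable without Hadoop, MyWritable is a hypothetical local interface with the same two method shapes as Hadoop's Writable (write(DataOutput) and readFields(DataInput)); PersonWritable is likewise a made-up record type.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical local stand-in for Hadoop's Writable contract.
interface MyWritable {
    void write(DataOutput out) throws IOException;    // serialize
    void readFields(DataInput in) throws IOException; // deserialize
}

// Hypothetical record: one row with an id and a name.
class PersonWritable implements MyWritable {
    int id;
    String name;

    PersonWritable() { }

    PersonWritable(int id, String name) {
        this.id = id;
        this.name = name;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(id);
        out.writeUTF(name);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Fields must be read back in the same order they were written.
        id = in.readInt();
        name = in.readUTF();
    }
}

class WritableDemo {
    // Writes a record to bytes and reads it back into a fresh object.
    static PersonWritable roundTrip(PersonWritable original) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            original.write(new DataOutputStream(buffer));
            PersonWritable copy = new PersonWritable();
            copy.readFields(new DataInputStream(new ByteArrayInputStream(buffer.toByteArray())));
            return copy;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        PersonWritable copy = roundTrip(new PersonWritable(7, "Ada"));
        System.out.println(copy.id + " " + copy.name); // 7 Ada
    }
}
```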
81. Which Hive CLI command reads data from a text file into an existing table?
LOAD DATA [LOCAL] INPATH '<path>' INTO TABLE <table>, plus optional clauses such as OVERWRITE
57. In a left outer join, unmatched entries from which table are included?
The left table
17. 3 types of outer joins:
Left Outer, Right Outer, Full Outer
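The three outer joins differ only in which unmatched rows they keep. A small in-memory sketch, assuming each table is a map from a unique join key to one value and using "null" for the missing side; the class name and sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class OuterJoinDemo {
    // keepLeft / keepRight decide whether unmatched rows from that side survive.
    static List<String> join(Map<Integer, String> left, Map<Integer, String> right,
                             boolean keepLeft, boolean keepRight) {
        TreeSet<Integer> keys = new TreeSet<>(left.keySet());
        keys.addAll(right.keySet());
        List<String> rows = new ArrayList<>();
        for (Integer k : keys) {
            String l = left.get(k);
            String r = right.get(k);
            boolean matched = l != null && r != null;
            if (matched || (keepLeft && l != null) || (keepRight && r != null)) {
                rows.add(k + ":" + l + "," + r);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        Map<Integer, String> users = new TreeMap<>();
        users.put(1, "ann");
        users.put(2, "bob");
        Map<Integer, String> orders = new TreeMap<>();
        orders.put(2, "book");
        orders.put(3, "pen");
        System.out.println("left:  " + join(users, orders, true, false)); // [1:ann,null, 2:bob,book]
        System.out.println("right: " + join(users, orders, false, true)); // [2:bob,book, 3:null,pen]
        System.out.println("full:  " + join(users, orders, true, true));  // [1:ann,null, 2:bob,book, 3:null,pen]
    }
}
```

Passing false for both flags gives the inner join.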
9. How in the world can HBase provide random access to petabytes of data?
Lots of indexing and partitioning, plus caching in RAM.
64. If the left table has M rows and the right table has N rows, how many rows will there be in a Cartesian join of those two tables?
M x N
21. Can reduce tasks talk to each other?
No
26. When it comes to reading files, is the NameNode a bottleneck?
No
37. Does HBase support multi-row transactions?
No
74. You defined a system property using the Configuration class. Will the value still be there when your job is finished?
No. You will have to set it again via the Configuration class.
34. 2 requirements (for the operation at hand) for using combiners
The operation has to be both associative and commutative.
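Why these properties matter can be checked with plain arithmetic. In this sketch (hypothetical class name and values), one key's values are split unevenly across two map tasks. Summing per map task and then summing the partial results gives the true total, so a sum reducer can double as its combiner. Averaging per map task and then averaging the partial results does not give the true mean.

```java
import java.util.Arrays;
import java.util.List;

class CombinerSafetyDemo {
    static double sum(List<Double> xs) {
        double s = 0;
        for (double x : xs) {
            s += x;
        }
        return s;
    }

    static double mean(List<Double> xs) {
        return sum(xs) / xs.size();
    }

    public static void main(String[] args) {
        // Values for one key, split unevenly across two map tasks.
        List<Double> mapA = Arrays.asList(1.0, 2.0, 3.0);
        List<Double> mapB = Arrays.asList(10.0);
        List<Double> all = Arrays.asList(1.0, 2.0, 3.0, 10.0);

        // Sum: combining per map task first does not change the answer.
        double combinedSum = sum(Arrays.asList(sum(mapA), sum(mapB)));
        System.out.println(combinedSum + " vs " + sum(all));   // 16.0 vs 16.0

        // Mean: a mean of per-map means is not the overall mean.
        double combinedMean = mean(Arrays.asList(mean(mapA), mean(mapB)));
        System.out.println(combinedMean + " vs " + mean(all)); // 6.0 vs 4.0
    }
}
```

A common workaround for averages is to have the combiner emit (sum, count) pairs and divide only in the reducer.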
69. 3 ways to interact with data in HBase
Put, Get, Scan
48. A ____________ uses the data within the boundaries created by the input split to generate key/value pairs
RecordReader
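A simplified sketch of what a line-oriented RecordReader produces: it walks the bytes of one split and emits (byte offset, line) pairs, the key/value shape that TextInputFormat hands to a mapper. The class name is hypothetical, and unlike Hadoop's real LineRecordReader this sketch ignores lines that straddle split boundaries.

```java
import java.nio.charset.StandardCharsets;
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class LineRecordReaderSketch {
    // Key = byte offset where the line starts; value = the line's text.
    static List<Map.Entry<Long, String>> read(byte[] split) {
        List<Map.Entry<Long, String>> records = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < split.length; i++) {
            if (split[i] == '\n') {
                records.add(new AbstractMap.SimpleEntry<>((long) start,
                        new String(split, start, i - start, StandardCharsets.UTF_8)));
                start = i + 1;
            }
        }
        if (start < split.length) { // last line without a trailing newline
            records.add(new AbstractMap.SimpleEntry<>((long) start,
                    new String(split, start, split.length - start, StandardCharsets.UTF_8)));
        }
        return records;
    }

    public static void main(String[] args) {
        byte[] split = "hello world\nfoo\n".getBytes(StandardCharsets.UTF_8);
        for (Map.Entry<Long, String> record : read(split)) {
            System.out.println(record.getKey() + "\t" + record.getValue());
        }
    }
}
```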
87. The ResourceManager's two main components
Scheduler and ApplicationsManager
43. Who does housekeeping for the primary NameNode?
The Secondary NameNode (it periodically checkpoints the namespace by merging the fsimage with the edit log).
77. When writing Hive statements spanning multiple lines, which character actually ends the statement?
Semicolon (;)
68. If you are worried about another developer setting a configuration property to something else after you have set it, what can you do?
Mark it as final (<final>true</final> in the resource file).
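The effect of a final property can be modeled with a toy configuration store. This is a hypothetical, simplified model (not Hadoop's Configuration class): load() stands for reading one property from a resource file, and once a property has been loaded as final, later resources cannot override it.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of final properties across resource files.
class FinalPropertySketch {
    private final Map<String, String> values = new HashMap<>();
    private final Set<String> finals = new HashSet<>();

    // Simulates reading one <property> entry from a resource file.
    void load(String name, String value, boolean isFinal) {
        if (finals.contains(name)) {
            return; // an earlier resource marked it final: ignore the override
        }
        values.put(name, value);
        if (isFinal) {
            finals.add(name);
        }
    }

    String get(String name) {
        return values.get(name);
    }

    public static void main(String[] args) {
        FinalPropertySketch conf = new FinalPropertySketch();
        conf.load("dfs.replication", "3", true);  // first resource marks it final
        conf.load("dfs.replication", "1", false); // later resource tries to change it
        System.out.println(conf.get("dfs.replication")); // 3
    }
}
```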
98. If you are importing data from MySQL to Hive using Sqoop, what would be the first 3 words of the command?
sqoop import --connect
92. Which class and method do you use to set a system property?
System.setProperty("property1", "100");
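A short runnable check of the call above. Both arguments to System.setProperty are Strings, so a numeric value has to be passed as text and parsed back when read. Setting it in code has the same effect on System.getProperty as launching the JVM with -Dproperty1=100. The class name is hypothetical.

```java
class SystemPropertyDemo {
    public static void main(String[] args) {
        // Key and value must both be Strings.
        System.setProperty("property1", "100");

        String raw = System.getProperty("property1"); // "100"
        int value = Integer.parseInt(raw);            // convert when a number is needed
        System.out.println(value + 1);                // 101
    }
}
```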
2. THE DRIVER METHOD
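There is one final component of a Hadoop MapReduce program, called the Driver. The driver initializes the job, instructs the Hadoop platform to execute your code on a set of input files, and controls where the output files are placed. A cleaned-up version of the driver from the example Java implementation that comes with Hadoop is presented below. It uses the older org.apache.hadoop.mapred API; WordCount, MapClass and Reduce are classes defined elsewhere in that example.

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public void run(String inputPath, String outputPath) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        // the keys are words (strings)
        conf.setOutputKeyClass(Text.class);
        // the values are counts (ints)
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(MapClass.class);
        conf.setReducerClass(Reduce.class);

        FileInputFormat.addInputPath(conf, new Path(inputPath));
        FileOutputFormat.setOutputPath(conf, new Path(outputPath));

        // submit the job and block until it finishes
        JobClient.runJob(conf);
    }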
30. T/F: In Flume, there are "master" nodes.
True
32. T/F: The combiner reduces SHUFFLE load as well.
True
33. T/F: In general, the same reducer method can be used as a mini-combiner.
True
44. T/F: The Secondary NameNode is considered a "master".
True
5. T/F: HDFS was designed for write-once, streaming access for relatively large files.
True
59. T/F: Generally speaking, a reduce-side join can take "long".
True
61. T/F: A reduce-side join supports all of the following: inner join, outer joins (left and right), full outer join, anti-join and Cartesian join
True
79. True or false: in the CREATE TABLE statement in Hive, you can add a comment for the table.
True
86. True or False: Storm already takes advantage of YARN APIs
True
90. True or false: the YARN scheduler has a policy plug-in that divides the cluster's resources among different queues, etc.
True
91. Capacity Scheduler and Fair Scheduler are examples of policy plug-ins of the YARN scheduler. True or false
True
96. True or false: YARN supports Windows
True
4. If allowed and configured, when does a combiner run? (which phase)
After the mapper, on the map side, before the map output is shuffled to the reducers.
7. When does the "copy" phase of MapReduce happen?
After the mapper and before the "sort" phase, which is really a merge phase.
19. Easy example to understand "serialization"
Converting in-memory "table" rows into bytes so that they can be "written" to files on disk (or sent over the network).
63. Cartesian join
Each row of the left table is paired with each row of the right table.
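The definition maps directly onto a nested loop, which also makes the M x N row count visible. A minimal in-memory sketch with hypothetical names and data:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CartesianJoinDemo {
    // Pairs every row of the left table with every row of the right table.
    static List<String> cross(List<String> left, List<String> right) {
        List<String> rows = new ArrayList<>();
        for (String l : left) {
            for (String r : right) {
                rows.add(l + "|" + r);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> left = Arrays.asList("a", "b", "c"); // M = 3
        List<String> right = Arrays.asList("x", "y");     // N = 2
        List<String> result = cross(left, right);
        System.out.println(result);        // [a|x, a|y, b|x, b|y, c|x, c|y]
        System.out.println(result.size()); // 6 = M x N
    }
}
```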
55. What is the Distributed Cache used for?
Shipping read-only files needed by all nodes for processing (e.g. JAR files or data files) to each node before the tasks run.
58. ANTI-JOIN =
full outer join - inner join
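Following the definition above, the anti-join keys are the full-outer-join keys minus the inner-join keys, i.e. keys present on only one side. A set-based sketch with hypothetical names; note that some systems use "anti-join" for the one-sided (left) variant only.

```java
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

class AntiJoinDemo {
    // (full outer join keys) minus (inner join keys).
    static Set<Integer> antiJoinKeys(Set<Integer> left, Set<Integer> right) {
        Set<Integer> fullOuter = new TreeSet<>(left);
        fullOuter.addAll(right);    // every key from either table
        Set<Integer> inner = new TreeSet<>(left);
        inner.retainAll(right);     // keys present in both tables
        fullOuter.removeAll(inner); // drop the matched keys
        return fullOuter;
    }

    public static void main(String[] args) {
        Set<Integer> left = new TreeSet<>(Arrays.asList(1, 2, 3));
        Set<Integer> right = new TreeSet<>(Arrays.asList(2, 3, 4));
        System.out.println(antiJoinKeys(left, right)); // [1, 4]
    }
}
```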
45. One way Hadoop is fundamentally different from an RDBMS, which enables it to scale so much better:
Hadoop takes the code to the data; an RDBMS takes the data to the code.
35. Which "class" represents a MapReduce job?
Job (org.apache.hadoop.mapreduce.Job)
53. Parameter that defines how many times a failed map task will be attempted
mapred.map.max.attempts (mapreduce.map.maxattempts in newer releases)
52. Parent ABSTRACT CLASS of HDFS file systems
org.apache.hadoop.fs.FileSystem
50. Which package has the conf (configuration) class?
org.apache.hadoop.conf
29. Common problem with a map-side join
Out of memory on the slave nodes (the smaller table has to fit in each task's memory).
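Why memory is the weak point becomes clear from the shape of a map-side (in-memory hash) join: every map task loads the whole smaller table into a hash map, then probes it once per record of the large table, with no shuffle. This is a hypothetical plain-Java sketch, not Hadoop API code; in Hadoop the small table would typically reach each node via the distributed cache.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class MapSideJoinSketch {
    // The entire small table lives in memory on every map task.
    // If it does not fit in the task's heap, this is where it fails.
    private final Map<Integer, String> smallTable = new HashMap<>();

    MapSideJoinSketch(List<String[]> smallRows) {
        for (String[] row : smallRows) {
            smallTable.put(Integer.parseInt(row[0]), row[1]);
        }
    }

    // Called once per record of the large table, like map(); inner join semantics.
    String map(int key, String value) {
        String match = smallTable.get(key);
        return match == null ? null : key + "," + value + "," + match;
    }

    public static void main(String[] args) {
        MapSideJoinSketch joiner = new MapSideJoinSketch(Arrays.asList(
                new String[] {"1", "ann"}, new String[] {"2", "bob"}));
        List<String> out = new ArrayList<>();
        for (String rec : new String[] {"1:book", "2:pen", "3:cup"}) {
            String[] kv = rec.split(":");
            String joined = joiner.map(Integer.parseInt(kv[0]), kv[1]);
            if (joined != null) {
                out.add(joined);
            }
        }
        System.out.println(out); // [1,book,ann, 2,pen,bob]
    }
}
```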
20. Easy example to understand "deserialization"
Reading bytes from data files on disk and converting them back into in-memory "table" data.
51. Where does the Configuration class (org.apache.hadoop.conf.Configuration) get its data from?
It reads from XML resource files.
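The resource files follow a simple <configuration><property><name>…</name><value>…</value></property></configuration> layout. As a rough illustration of reading that layout, here is a hypothetical JDK-only sketch (not Hadoop's own parser, and it ignores extras such as <final> and variable expansion):

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class ConfigXmlSketch {
    // Collects name -> value from every <property> element.
    // Assumes each <property> has both a <name> and a <value> child.
    static Map<String, String> parse(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element p = (Element) properties.item(i);
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse configuration XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<configuration>"
                + "<property><name>color</name><value>yellow</value></property>"
                + "<property><name>size</name><value>10</value></property>"
                + "</configuration>";
        System.out.println(parse(xml)); // {color=yellow, size=10}
    }
}
```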
60. Which one is easier to implement: map-side join or reduce-side join?
reduce-side
11. What does the identity reducer do?
It writes out every key/value pair unchanged, so you still get the shuffle and sort after the mapper, but no actual reducing.
40. map files are _________________ sequence files
sorted
10. 2 ways to make files in the input directory get "ignored" during processing
Start the file name with a dot (.) or an underscore (_)
72. Which takes priority over which: system property or property set via resource file?
system property
14. What is the difference between "extending" and "implementing" (Java)?
You extend a class; you implement an interface.