MIS final
MapReduce
Distributes the processing of very large multi-structured data files across a large cluster of ordinary machines/processors, achieving high performance with "simple" computers
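A minimal sketch of the map and reduce phases, run in a single Python process rather than on a cluster. The word-count task and the names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are illustrative assumptions, not part of any Hadoop API; on a real cluster each chunk would be handled by a different machine.

```python
from collections import defaultdict


def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of text."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(pairs):
    """Group the emitted values by key, as happens between the two phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Sum the values collected for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(chunks):
    """Run map over every chunk, then shuffle and reduce the combined output."""
    pairs = []
    for chunk in chunks:  # each chunk could be mapped on a separate machine
        pairs.extend(map_phase(chunk))
    return reduce_phase(shuffle(pairs))


counts = word_count(["big data big cluster", "data on a cluster"])
```

Because each chunk is mapped independently, the map work can be spread over many ordinary machines; only the grouped results need to be brought together for the reduce step.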
Which of the following statements contains valid syntax?
do increase=5 to 10 while (temperature lt 102);
Big data uses commodity hardware, which is expensive, specialized hardware that is custom built for a client or application
false
Current NoSQL tools are mature management and monitoring tools, and are used extensively
false
In the classification of location-based analytic applications, examining geographic site locations falls in the consumer-oriented category
false
While cloud services are useful for small and midsize analytic applications, they are still limited in their ability to handle big data applications
false
Which of the following statements about using the BY statement with the SET statement is false?
The FIRST. and LAST. variables are stored in the table
To perform text mining
first, impose structure on the data, then mine the structured data
Which big data approach promotes efficiency, lower cost, and better performance by processing jobs in a shared, centrally managed pool of IT resources?
grid computing
Today, most smartphones are equipped with various instruments to measure jerk and orientation and to sense motion. One of these instruments is an accelerometer, and the other is a
gyroscope
Which data step statement indicates to continue processing the last row of a BY group?
if last.jobtitle;
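The effect of the subsetting IF on `last.jobtitle` can be imitated outside SAS. The sketch below is a Python analogy, not SAS code: the sample rows and the helper name `last_rows` are made up for illustration, and, as with BY-group processing, the rows are assumed to be sorted by the grouping column already.

```python
from itertools import groupby

# Hypothetical sample rows, already sorted by jobtitle
rows = [
    {"jobtitle": "Analyst", "salary": 50},
    {"jobtitle": "Analyst", "salary": 60},
    {"jobtitle": "Manager", "salary": 90},
]


def last_rows(sorted_rows, key):
    """Keep only the final row of each group of consecutive equal keys."""
    kept = []
    for _group_key, group in groupby(sorted_rows, key=lambda r: r[key]):
        kept.append(list(group)[-1])  # the row for which "last" would be 1
    return kept


result = last_rows(rows, "jobtitle")
```

Only the second Analyst row and the single Manager row survive, which mirrors processing continuing only for the last row of each BY group.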
NLP is
an important concept in text mining; a subfield of artificial intelligence and computational linguistics that studies the problem of "understanding" natural human language
Allowing big data to be processed in memory and distributed across a dedicated set of nodes can solve complex problems in near-real time with highly accurate insights. What is this process called?
in-memory analytics
Which function calculates the average of the columns week1, week2, week3, and week4?
mean(of Week1-Week4)
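The SAS MEAN function averages only the nonmissing arguments. The sketch below imitates that behavior in Python as an analogy; the function name `mean_ignoring_missing` is hypothetical, and `None` stands in for a SAS missing value.

```python
def mean_ignoring_missing(*values):
    """Average the non-missing values; return None if every value is missing."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return sum(present) / len(present)


# Four weekly values with week2 missing
average = mean_ignoring_missing(10, None, 20, 30)
```

Here the average is taken over three values rather than four, because the missing week is left out of both the sum and the count.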
What kind of location-based analytics is a real-time marketing promotion?
organization-oriented location-based dynamic approach
Enablers of big data analytics: in-database analytics
placing analytic procedures close to where data is stored
Which of the following items cannot be accomplished with the PRINT procedure?
produce summary reports
Which of the following functions can convert the values of the numeric variable level to character values?
put(level, 3.)
Which statement contains valid syntax for the RETAIN statement?
retain year 2018;
The portion of the IoT technology infrastructure that focuses on how to manage incoming data and analyze it is
software backend
Enablers of big data analytics: in-memory analytics
storing and processing the complete data set in RAM
Natural language processing
syntax versus semantics-based text mining
Traditional data warehouses have not been able to keep up with
the variety and complexity of data
From massive amounts of high-dimensional location data, algorithms that reduce the dimensionality of the data can be used to uncover trends, meaning, and relationship to eventually produce human-understandable representations
true
Hadoop was designed to handle petabytes and exabytes of data distributed over multiple nodes in parallel.
true
Internet of things (IoT) is the phenomenon of connecting the physical world to the internet
true
What is the Hadoop Distributed File System designed to handle?
unstructured and semistructured non-relational data
Data flows can be highly inconsistent, with periodic peaks, making data loads hard to manage. What is this feature of big data called?
variability
The three Vs that define big data
volume, variety, velocity
Which statement can be used to subset the rows read in a PROC step?
WHERE
Text mining
A semi-automated process of extracting knowledge from unstructured data sources
Data integration
A challenge of big data analytics, the ability to combine data quickly and at a reasonable cost
How does Hadoop work?
Access unstructured and semi-structured data, break it into "parts," replicate each part multiple times, and load it into the file system.
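The break-up-and-replicate idea can be sketched in a few lines of Python. This is a simplified illustration, not HDFS code: the helper names, the node names, the tiny block size, and the round-robin placement are all assumptions made for the example, and a real cluster uses its own placement policy.

```python
def split_into_blocks(data, block_size):
    """Cut the data into consecutive fixed-size parts (the last may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


blocks = split_into_blocks("abcdefghij", 4)
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
```

Because every part lives on several nodes, losing one machine does not lose the data, and work on a part can be scheduled on any node that holds a copy.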
What does web content mining involve?
Analyzing the unstructured content of web pages
What statement is true concerning the execution phase of the DATA step?
Data is processed in the program data vector (PDV)
Regional accents do not present challenges for natural language processing
False
Text analytics is the subset of text mining that handles information retrieval and extraction, plus data mining
False
Text analytics
Information retrieval + information extraction + data mining + web mining
How does Hadoop work?
It breaks up Big Data into multiple parts so each part can be processed and analyzed at the same time on multiple computers.
What does advanced analytics for social media do?
It examines the content of online conversations.
Hadoop
Open-source framework for storing and analyzing massive amounts of distributed, unstructured data.
What does a PROC PRINT report display by default?
PROC PRINT displays all rows and columns in the table, a column for row numbers on the far left, and columns in the order in which they occur in the table
Which of the following statements are considered step boundaries (the end of one step and the beginning of another)?
RUN, QUIT, DATA, and PROC
This model allows consumers to use applications and software that run on distant computers in the cloud infrastructure
SaaS
Natural language processing (NLP) is associated with which of the following areas?
Text mining, AI, computational linguistics
Which statement is false?
The DO WHILE loop always executes at least one time
Which statement is false?
The KEEP statement names the columns to include from the input table
Which of the following is not created during the compilation phase of the data step?
The first row
Which statement is false concerning the sum statement?
The sum statement initially sets the accumulator column to missing
What do voice of the market (VOM) applications of sentiment analysis do?
They examine customer sentiment at the aggregate level.
Articles and auxiliary verbs are assigned little value in text mining and are usually filtered out
True
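Filtering out such low-value words (often called stop words) can be sketched as follows. The short `STOP_WORDS` set and the function name `remove_stop_words` are illustrative assumptions; real text mining tools ship much longer lists.

```python
# A tiny illustrative stop list: articles and auxiliary verbs
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "were", "has", "have", "will"}


def remove_stop_words(text):
    """Lowercase the text, split on whitespace, and drop stop words."""
    tokens = text.lower().split()
    return [token for token in tokens if token not in STOP_WORDS]


terms = remove_stop_words("The cluster is processing a large file")
```

What remains are the content-bearing terms that later mining steps would actually count and compare.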
Big data is being driven by the exponential growth, availability and use of information
True
Categorization and clustering of documents during text mining differ only in the pre-selection of categories.
True
Which statement about merging with the data step is true?
Two or more input tables can be merged in a data step
Which statement is false concerning the TRANSPOSE procedure?
Use a BY statement to sort the data while transposing
Enablers of big data analytics: Grid computing & MPP
Use of many machines and processors in parallel (MPP - Massively Parallel Processing)
Search engine optimization (SEO) is a means by which
Web site developers can increase web site search rankings
In text analysis, what is a lexicon?
a catalog of words, their synonyms, and their meanings
WordNet
a major resource for NLP, a hand-coded database of English words, their definitions, sets of synonyms, and various semantic relations between synonym sets... need automation to be completed
Sentiment Analysis
a technique used to detect favorable and unfavorable opinions toward specific products and services
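One simple way to detect favorable and unfavorable opinions is to count words from a small opinion lexicon. The sketch below assumes that approach; the word lists and the function name `sentiment` are made up for illustration and are far smaller than any real lexicon.

```python
# Tiny illustrative opinion lexicons
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}


def sentiment(text):
    """Label text by comparing counts of positive and negative lexicon words."""
    tokens = [token.strip(".,!?").lower() for token in text.split()]
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    if score > 0:
        return "favorable"
    if score < 0:
        return "unfavorable"
    return "neutral"


label = sentiment("I love this phone, great battery!")
```

Word counting of this kind ignores negation and sarcasm, which is one reason practical sentiment analysis relies on richer language processing.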
The portion of the IoT technology infrastructure that focuses on controlling what and how information is captured is
applications
By default, PROC FREQ creates a table of frequencies and percentages for which column types?
both character and numeric columns
Which of the following can be a way that the length of a new column is set in the data step?
by the first time that the column is referenced in the data step, in an assignment statement, or using a length statement
Which of the following files is a permanent SAS file?
cerxl.quarter1
Enablers of big data analytics: Appliances
combining hardware, software, and storage in a single unit for performance and scalability
GPS navigation is an example of which kind of location-based analytics?
consumer-oriented geospatial static approach