Exam 3 (Ch 7 & 8) BIV
What are the 5 critical success factors for Big Data analytics? a. A clear _______ _______ b. Strong, committed ______ c. ________ between the business and IT strategy d. ____-____ decision-making culture e. _______ people with the right skills
business need, sponsorship, alignment, fact-based, right
Fog computing addresses IoT issues by: 1) Proposing fog nodes to process the data ____ to IoT 2) _____ ______ - any device including routers or switches
close, fog nodes
What is "a model for enabling convenient, on-demand network access to a shared pool of configurable computing resources that can be rapidly provisioned and released with minimal management effort or service-provider interaction"?
cloud computing
Hadoop's _____ run on inexpensive commodity hardware so projects can scale-out inexpensively
clusters
physical devices, sensors, and actuators where data is produced and recorded
hardware
Social network vs. IoT are like ________ to ______ vs _________ to _______
human to human, machine to machine
which cloud moves between the private/public clouds and is more flexible
hybrid
(T/F): Fog computing is critical when data needs to be analyzed in less than a second
true
(T/F): Rockwell Automation wanted to use technology to monitor equipment status ahead of time to prevent costly repairs
true
Stream analytics can be used in the ___________ industry for analytics on crunching the numbers behind the scenes to understand what we are really interested in to provide creative offerings
e-commerce
Where does big data come from?
everywhere
(T/F) In the Salesforce case study, streaming data is used to identify services that customers use most.
false
(T/F) Internet of Things (IoT) is the phenomenon of connecting the virtual world to the Internet
false
(T/F) Social networking Web sites like Facebook, Twitter, and LinkedIn, are not examples of cloud computing
false
___ _______ is the middleman between physical device and data center
fog computing
Allowing Big Data to be processed in memory and distributed across a dedicated set of nodes can solve complex problems in near-real time with highly accurate insights. What is this process called?
in-memory analytics
(T/F) Big data by itself is worthless
true
(T/F) FitBit and Ring are IoT startups
true
The analytics revolution → _________ _________
Cultural transformation
What are the 7 keys to succeed with Big Data? SCVEIGE a. Simplify b. Coexist c. Visualize d. Empower e. _____________ f. Govern g. Evangelize
Integrate
______ ______ ________ is connecting the physical world to the Internet in contrast to the Internet of the people that connects us humans to each other through technology
Internet of Things
The skills of a data scientist are to ___________ Big Data
investigate
_____________ ______ _________ are used to capture, store, analyze, and manage the data linked to a location
geographic information systems
What are 4 building blocks to IoT? a. ________: physical devices, sensors, and actuators where data is produced and recorded b. Connectivity: connected to a network to communicate with each other or applications c. Software backend: manages connected networks with devices and provides data integration d. Applications: data is turned into meaningful information
hardware
What are 4 building blocks to IoT?
hardware, connectivity, software backend, applications
IoT has grown because of a. ________ - smaller, affordable, more powerful b. Creativity - new ___________ and use cases uncovered (asking what if we put sensors here....) c. Availability of ____ tools - maybe?
hardware, innovations, BI
RFID is a generic technology that refers to the use of radio-frequency waves to _________ objects
identify
Main Hadoop Components: ________ ________ initiates and coordinates MapReduce jobs or the processing of the data
job tracker
Top 3 security threats in the cloud? a. Data loss and _________ b. Hardware _________ of equipment c. ________ interfaces
leakage, failure, insecure
What are the 7 keys to succeed with Big Data? SCVEIGE a. Simplify b. Coexist c. ______________ d. Empower e. Integrate f. Govern g. Evangelize
Visualize
Which of Big Data's V is "conformity to facts: accuracy, quality, truthfulness, or trustworthiness"?
veracity
Main Hadoop Components: ________ _______ backup to name node
secondary node
What are the Enablers (or high-performing computing) of Big Data Analytics? a. In-memory analytics: solve problems in near real-time b. In-database analytics: speed times to insights c. ______-_______& _____: processing jobs in a shared, centrally managed pool of IT resources d. Appliances: bring together hardware and software
Grid computing & MPP
________ is an open-source framework for storing and analyzing massive amounts of distributed, unstructured data
Hadoop
Main Hadoop Components: ______ _______ _______ _______ (HDFS): default storage layer in any Hadoop cluster
Hadoop distributed file system
____ is an open-source data warehouse framework originally developed by Facebook that allows analytics modeling within Hadoop
Hive
What are the 3 main issues managers have to keep in mind when exploring IoT? a. __________ ________: everyone needs to be receptive to link up their systems b. __________ Challenges: connect all the applications seamlessly c. __________: entry points for malicious hackers
Organizational Alignment, Interoperability, security
What was the earliest sensor technology
RFID
Why is separating the impact of analytics from that of other computerized systems a difficult task? A) Businesses do not typically track the sources of successful projects. B) The trend is toward integrating systems. C) Software tools are not sophisticated enough. D) It is not an organizational priority.
The trend is toward integrating systems
Which of the following is true of data-as-a-Service (DaaS) platforms? A) Knowing where the data resides is critical to the functioning of the platform. B) There are standardized processes for accessing data wherever it is located. C) Business processes can access local data only. D) Data quality happens on each individual platform.
There are standardized processes for accessing data wherever it is located
_____ tags are larger, more expensive, HAVE a power source
active
3 major cloud services providers? Amazon elastic B________ Microsoft A________ Google A_____ E_______
beanstalk, azure, app engine
Big data + "_____" analytics = value
big
____ ______ has become a popular term to describe the exponential growth, availability, and use of information, both structured and unstructured
big data
Cloud-computing and service-oriented thinking is a service centered around _______ _______ (data, information, and analytics) capabilities as _________
building agile, service
Facebook data generation in a day
500 terabytes
Key players in the IoT ecosystem? a. A__________ b. M______ A_______ c. IBM _______ d. T_________
Amazon AWS, Microsoft Azure, Watson, Teradata
What are 4 building blocks to IoT? a. Hardware: physical devices, sensors, and actuators where data is produced and recorded b. Connectivity: connected to a network to communicate with each other or applications c. Software backend: manages connected networks with devices and provides data integration d. _________: data is turned into meaningful information
Applications
What are 4 building blocks to IoT? a. Hardware: physical devices, sensors, and actuators where data is produced and recorded b. __________: connected to a network to communicate with each other or applications c. Software backend: manages connected networks with devices and provides data integration d. Applications: data is turned into meaningful information
Connectivity
What are the 7 keys to succeed with Big Data? SCVEIGE a. Simplify b. Coexist c. Visualize d. Empower e. Integrate f. Govern g. ____________
Evangelize
Pitney Bowes wanted to analyze data generated from its mailing machines in advance, to prevent outages and fix machines before they break down, and used _____ _______
GE Predix
"NoSQL" = ____ _____ SQL
Not only
SilverHook powerboats used IBM Bluemix Platform as a Service _____ to employ IBM SPSS analytics solutions and deliver insights in an understandable way to users and fans
PaaS
Using this model, companies can deploy their software and applications in the cloud so that their customers can use them. A) SaaS B) PaaS C) IaaS D) DaaS
PaaS
________ analytics evaluates every incoming observation against all prior observations, where there is no window size
Perpetual
Which of the following sources is likely to produce Big Data the fastest? a. Order entry clerks b. Cashiers c. RFID tags d. Online customers
RFID tags
In the opening vignette, ________ believes Big Data/IoT can help forecast component faults weeks in advance
Siemens
What are 4 building blocks to IoT? a. Hardware: physical devices, sensors, and actuators where data is produced and recorded b. Connectivity: connected to a network to communicate with each other or applications c. ______ ________: manages connected networks with devices and provides data integration d. Applications: data is turned into meaningful information
Software backend
data is turned into meaningful information for IoT
applications
analytics allows for more ________ ________ to be done by humans and decrease costs for organization & quality of output
cognitive tasks
connected to a network to communicate with each other or applications
connectivity
Which of the Big Data challenges are specific to Big Data Analytics success? a. strong _____ _______ b. right ________ __________
data infrastructure, analytics tools
Which of these is NOT a part of the IoT technology infrastructure? A) hardware B) connectivity C) electrical access D) software
electrical access
_____ _____, like routers or switches, process data and analyze it
fog nodes
Main Hadoop Components: ______ ______ (primary facilitator) provides client info on where the cluster data is stored if a node fails
name node
"Big" depends on the _________'s size
organization
______ tags are small, inexpensive, NO power source
passive
Retail uses _______ (small, inexpensive, no power source) RFID tags with an ________ (electronic product code)
passive, EPC
which cloud is more secure and operated for single organization
private
which cloud has users subscribe to resources offered by service provider over internet
public
The in-memory analytics of Big Data analytics allows problems to be solved in near ______ _______
real time
2 enablers of IoT are: 1) _____ 2) _______ devices
sensors, sensing
Main Hadoop Components: ______ ________ are the grunts of any Hadoop cluster
slave nodes
manages connected networks with devices and provides data integration
software backend
3 companies that are AaaS? T_______ IBM ________ _________ S_________
tableau, watson analytics, snowflake
(T/F) AaaS is part of SaaS, PaaS, and IaaS - which means costs and compliance risks are reduced while increasing productivity of users
true
(T/F) Dartmouth-Hitchcock Medical Center wanted to proactively determine the health of people who are likely to fall sick and prevent them from falling ill
true
(T/F) Gulf Air developed a sentiment analysis tool called "Arabic Sentiment Analysis" that analyzed English and Arabic social media posts (based on Cloudera and Hadoop)
true
(T/F) Mankind Pharma used IBM to reduce application implementation time by 98% through the IBM Cloud platform called SoftLayer
true
(T/F) Pay-as-you-go and pay-per-use are cloud computing business models
true
(T/F) Public School in Tacoma, WA used Microsoft Azure Machine Learning to Predict School Dropouts and boost graduation rates
true
(T/F): Chime used Snowflake to connect FB, Google, JSON sources to learn more about customer engagement across mobile, web, and backend platforms
true
(T/F): The order of bytes from small to large: kilobyte, megabyte, gigabyte, terabyte, petabyte, exabyte, zettabyte, yottabyte, brontobyte, gegobyte, googol
true
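A minimal sketch of the ordering in the card above, assuming the decimal convention in which each unit is 1000 times the previous one. Only kilobyte through yottabyte are listed, since the later names on the card are not standard prefixes. `UNITS`, `to_bytes`, and `next_larger` are hypothetical names for illustration.

```python
# Units from smallest to largest. Assumption: decimal (power-of-1000) steps.
UNITS = ["kilobyte", "megabyte", "gigabyte", "terabyte",
         "petabyte", "exabyte", "zettabyte", "yottabyte"]


def to_bytes(amount, unit):
    """Convert an amount expressed in the named unit to bytes."""
    exponent = 3 * (UNITS.index(unit) + 1)  # kilobyte -> 10**3, megabyte -> 10**6, ...
    return amount * 10 ** exponent


def next_larger(unit):
    """Return the unit one step above the given one, or None at the top of the list."""
    i = UNITS.index(unit)
    return UNITS[i + 1] if i + 1 < len(UNITS) else None
```

For example, `to_bytes(500, "terabyte")` expresses the Facebook figure from another card in bytes.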
(T/F) MD Anderson Cancer Center used all the collected clinical oncology data to provide better treatment to patients through IBM Watson
true
Big data is important for: 1. Finding _____ within and outside conventional data sources 2. Serves as basis of innovation, growth, and differentiation
value
Which of Big Data's V is "patterns can be detected in the data for insights and better decisions"
value
Cloud computing is a style of computing which is dynamically scalable and often __________ resources are provided over the Internet
virtualized
______ _______ allows turning millions of data records into informational graphics in just seconds for Big Data analytics
visual analytics
The addition of location components based on lat/long to traditional analytical techniques enables organizations to add a new dimension of "_______" to their traditional business analysis (which before only answered who/what/when/how much)
where
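As a hedged illustration of adding "where" to an analysis, the sketch below finds the store nearest to a customer from lat/long pairs using the haversine great-circle distance. The function names, the store dictionary layout, and the mean Earth radius of 6371 km are assumptions made for the example, not something the cards specify.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumption: mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/long points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_store(customer, stores):
    """customer: (lat, lon); stores: dict of name -> (lat, lon).

    Returns (name, distance_km) for the closest store.
    """
    name = min(stores, key=lambda s: haversine_km(*customer, *stores[s]))
    return name, haversine_km(*customer, *stores[name])
```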
How much information does a googol hold?
10^100
YouTube's data generation in a day
360 terabytes
What are the Enablers (or high-performing computing) of Big Data Analytics? a. In-memory analytics: solve problems in near real-time b. In-database analytics: speed times to insights c. Grid computing & MPP: processing jobs in a shared, centrally managed pool of IT resources d. _________: bring together hardware and software
Appliances
Who are the big players in the Big Data vendor landscape? C___ M____ H_______ O______ G_____ A_______
Cloudera, Microsoft, Hortonworks, Oracle, Google, Amazon
What are the 7 keys to succeed with Big Data? SCVEIGE a. Simplify b. _______________ c. Visualize d. Empower e. Integrate f. Govern g. Evangelize
Coexist
What are the 7 keys to succeed with Big Data? SCVEIGE a. Simplify b. Coexist c. Visualize d. Empower e. Integrate f. ______ g. Evangelize
Govern
In this model, infrastructure resources like networks, storage, servers, and other computing resources are provided to client companies. A) SaaS B) PaaS C) IaaS D) DaaS
IaaS
What are the Enablers (or high-performing computing) of Big Data Analytics? a. In-memory analytics: solve problems in near real-time b. ___________: speed times to insights c. Grid computing & MPP: processing jobs in a shared, centrally managed pool of IT resources d. Appliances: bring together hardware and software
In-database analytics
What are the Enablers (or high-performing computing) of Big Data Analytics? a. ________: solve problems in near real-time b. In-database analytics: speed times to insights c. Grid computing & MPP: processing jobs in a shared, centrally managed pool of IT resources d. Appliances: bring together hardware and software
In-memory analytics
__________ is a technique popularized by Google that distributes the processing of very large multi-structured data files across a large cluster of machines
MapReduce
Demystifying Facts about Hadoop a. Hadoop consists of multiple products b. Hadoop is open source but available from vendors, too c. Hadoop is an ecosystem, not a single product d. HDFS is a file system, not a DBMS e. Hive resembles _____ but is not standard SQL f. Hadoop and MapReduce are related but not the same g. MapReduce provides control for analytics, not analytics h. Hadoop is about data diversity, not just data volume
SQL
What new geometric data type in Teradata's data warehouse captures geospatial features? A) NAVTEQ B) ST_GEOMETRY C) GIS D) SQL/MM
ST_GEOMETRY
This model allows consumers to use applications and software that run on distant computers in the cloud infrastructure: A) SaaS B) IaaS C) PaaS D) AaaS
SaaS
What are the 7 keys to succeed with Big Data? SCVEIGE a. ____________ b. Coexist c. Visualize d. Empower e. Integrate f. Govern g. Evangelize
Simplify
Which of the following is true about the furtherance of homeland security? A) There is a lessening of privacy issues. B) There is a greater need for oversight. C) The impetus was the need to harvest information related to financial fraud after 2001. D) Most people regard analytic tools as mostly ineffective in increasing security.
There is a greater need for oversight
The grid computing & MPP of Big Data analytics is processing jobs in a shared, _________ managed pool of IT resources
centrally
In the opening vignette, Access Telecom (AT) explored how the problem of customer _______ could be reduced based on an analysis of the customers' communication problems
churn
Streaming is the analytic process of extracting actionable information from __________ flowing data
continuously
Demystifying Facts about Hadoop a. Hadoop consists of multiple products b. Hadoop is open source but available from vendors, too c. Hadoop is an ecosystem, not a single product d. HDFS is a file system, not a DBMS e. Hive resembles SQL but is not standard SQL f. Hadoop and MapReduce are related but not the same g. MapReduce provides _____ for analytics, not analytics h. Hadoop is about data diversity, not just data volume
control
_____ ______ ________ is a method of capturing, tracking, and analyzing streams of data to detect events (out of normal happenings) of certain types that are worthy of the effort
critical event processing
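One simple way to picture detecting "out of normal happenings" in a stream is a deviation check against earlier readings. This is only a sketch under assumptions: the rule (more than `k` standard deviations from the mean of all earlier readings), the `warmup` count, and the name `detect_events` are illustrative choices, not the method the card's source describes.

```python
import statistics


def detect_events(readings, k=3.0, warmup=5):
    """Yield (index, value) for readings far from the mean of all earlier readings.

    Assumption: "out of normal" means more than k standard deviations away.
    Simplification: flagged readings are still added to the history.
    """
    history = []
    for i, value in enumerate(readings):
        if len(history) >= warmup:
            mean = statistics.fmean(history)
            spread = statistics.pstdev(history)
            if spread > 0 and abs(value - mean) > k * spread:
                yield i, value
        history.append(value)
```

Calling `list(detect_events(sensor_values))` on a list of numeric readings returns the flagged positions and values.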
This process is enabling technology for stream analytics and extracting novel patterns/knowledge structures from continuous, rapid data records
data stream mining
In the Analyzing Disease Patterns from an Electronic Medical Records Data Warehouse case study, what was the analytic goal? a. determine differences in rates of disease in urban and rural populations b. determine if diseases are accurately diagnosed c. determine probabilities of diseases that are comorbid d. determine differences in rates of disease in males v. females
determine differences in rates of disease in urban and rural populations
Demystifying Facts about Hadoop a. Hadoop consists of multiple products b. Hadoop is open source but available from vendors, too c. Hadoop is an ecosystem, not a single product d. HDFS is a file system, not a DBMS e. Hive resembles SQL but is not standard SQL f. Hadoop and MapReduce are related but not the same g. MapReduce provides control for analytics, not analytics h. Hadoop is about data ________, not just data volume
diversity
Demystifying Facts about Hadoop a. Hadoop consists of multiple products b. Hadoop is open source but available from vendors, too c. Hadoop is an _______, not a single product d. HDFS is a file system, not a DBMS e. Hive resembles SQL but is not standard SQL f. Hadoop and MapReduce are related but not the same g. MapReduce provides control for analytics, not analytics h. Hadoop is about data diversity, not just data volume
ecosystem
Demystifying Facts about Hadoop a. Hadoop consists of multiple products b. Hadoop is open source but available from vendors, too c. Hadoop is an ecosystem, not a single product d. HDFS is a ____ ______, not a DBMS e. Hive resembles SQL but is not standard SQL f. Hadoop and MapReduce are related but not the same g. MapReduce provides control for analytics, not analytics h. Hadoop is about data diversity, not just data volume
file system
The appliances of Big Data analytics bring together ______ and software
hardware
Big data is important for: 1. Finding value within and outside conventional data sources 2. Serves as basis of ___________ , growth, and differentiation
innovation
MapReduce's _______ is colored squares & counting number of squares of each color
input
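The colored-squares picture can be sketched as three small steps: map each square to a `(color, 1)` pair, group the pairs by color, then sum each group. This is a single-process illustration of the idea, not Hadoop's actual API; all function names are hypothetical.

```python
from collections import defaultdict


def map_phase(chunk):
    """Map: emit a (color, 1) pair for every square in one input chunk."""
    return [(color, 1) for color in chunk]


def shuffle(pairs):
    """Shuffle: group the emitted values by key (color)."""
    groups = defaultdict(list)
    for color, count in pairs:
        groups[color].append(count)
    return groups


def reduce_phase(groups):
    """Reduce: sum the values collected for each color."""
    return {color: sum(counts) for color, counts in groups.items()}


def count_colors(chunks):
    """Run map over every chunk, then shuffle and reduce the combined pairs."""
    pairs = [pair for chunk in chunks for pair in map_phase(chunk)]
    return reduce_phase(shuffle(pairs))
```

In a real cluster each chunk would be mapped on a different node; here the chunks are just separate lists.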
The in-database analytics of Big Data analytics allows for speed times to _____
insights
In the Alternative Data for Market Analysis or Forecasts case study, satellite data was NOT used for which of the following: a. tracking agricultural estimates b. monitoring activity at factories c. monitoring individual customer patterns d. evaluating retail traffic
monitoring individual customer patterns
Demystifying Facts about Hadoop a. Hadoop consists of _______ products b. Hadoop is open source but available from vendors, too c. Hadoop is an ecosystem, not a single product d. HDFS is a file system, not a DBMS e. Hive resembles SQL but is not standard SQL f. Hadoop and MapReduce are related but not the same g. MapReduce provides control for analytics, not analytics h. Hadoop is about data diversity, not just data volume
multiple
Demystifying Facts about Hadoop a. Hadoop consists of multiple products b. Hadoop is open source but available from vendors, too c. Hadoop is an ecosystem, not a single product d. HDFS is a file system, not a DBMS e. Hive resembles SQL but is not standard SQL f. Hadoop and MapReduce are related but are ____ the same g. MapReduce provides control for analytics, not analytics h. Hadoop is about data diversity, not just data volume
not
Demystifying Facts about Hadoop a. Hadoop consists of multiple products b. Hadoop is ____ _______ but available from vendors, too c. Hadoop is an ecosystem, not a single product d. HDFS is a file system, not a DBMS e. Hive resembles SQL but is not standard SQL f. Hadoop and MapReduce are related but not the same g. MapReduce provides control for analytics, not analytics h. Hadoop is about data diversity, not just data volume
open source
The 3 use cases for data warehousing and RDBMS are: a. Data warehouse ________ b. Integrating data that provides business _____ c. ___________ BI tools
performance, value, interactive
Streaming analytics applies transaction-level logic to ____-_____ observations (last 5 seconds)
real-time
The 2 use cases for Big Data and Hadoop are: a. Hadoop as the _____ and refinery b. Hadoop as the ______ _________
repository, active archive
Grouping a string of events together involving a particular customer into a defined time period (5 days over all the channels of communication) is called __________
sessionizing
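A minimal sketch of sessionizing, assuming each event is a `(customer_id, timestamp, channel)` tuple and that a session spans a fixed window (5 days here, matching the card) measured from the session's first event. The function name and the event layout are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def sessionize(events, window=timedelta(days=5)):
    """Group each customer's events, across all channels, into time-bounded sessions.

    events: iterable of (customer_id, timestamp, channel).
    Returns {customer_id: [session, ...]}, each session a list of (timestamp, channel).
    """
    by_customer = defaultdict(list)
    for customer, ts, channel in events:
        by_customer[customer].append((ts, channel))

    sessions = {}
    for customer, items in by_customer.items():
        items.sort(key=lambda item: item[0])
        grouped = []
        for ts, channel in items:
            # Start a new session when the event falls outside the window
            # that opened with the current session's first event.
            if not grouped or ts - grouped[-1][0][0] > window:
                grouped.append([])
            grouped[-1].append((ts, channel))
        sessions[customer] = grouped
    return sessions
```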
What are the 7 keys to succeed with Big Data? SCVEIGE
simplify, coexist, visualize, empower, integrate, govern, evangelize
The continuous sequence of data elements relates to a ________
stream
(T/F) Current total storage capacity lags behind the digital information being generated in the world
true
(T/F) From massive amounts of high-dimensional location data, algorithms that reduce the dimensionality of the data can be used to uncover trends, meaning, and relationships to eventually produce human-understandable representations
true
(T/F) In Application Case 7.6, Analyzing Disease Patterns from an Electronic Medical Records Data Warehouse, it was found that urban individuals have a higher number of diagnosed disease conditions.
true
(T/F) In the Great Clips case study, the company uses geospatial data to analyze, among other things, the types of haircuts most popular in different geographic locations
true
(T/F) In the Quiznos case, the company employed location-based behavioral targeting to narrow the characteristics of users who were most likely to eat at a quick-service restaurant.
true
(T/F) In the opening vignette, the Access Telecom (AT), built a system to better visualize customers who were unhappy before they canceled their service.
true
(T/F) Process efficiency and cost reduction is the top business problem addressed by Big Data Analytics
true
(T/F) Stream analytics is also called data-in-motion analytics and real-time data analytics
true
(T/F) The term "Big Data" is relative as it depends on the size of the using organization.
true
(T/F): Any industry that requires quickly staying on top of business events as they unfold and allowing organizations to address before they become a problem can benefit from stream analytics
true
Data elements in a stream are called ________
tuples
Data flows can be highly inconsistent, with periodic peaks, making data loads hard to manage. What is this feature of Big Data called?
variability
Which of Big Data's V is "in addition to increasing velocities and varieties of data, data flows can be highly inconsistent with periodic peaks"
variability
Which of Big Data's V is "data collected in all types of formats"?
variety
Stream is also referred to as ________, or the rapid and continuous streaming of data
velocity
What is the most overlooked characteristic of Big Data?
velocity
Which of Big Data's V is "how fast data is produced and how fast the data must be processed to meet the need or demand"?
velocity
What is the most common of Big Data's 3 Vs?
volume
_________ might be considered the most important because although size is relative to the organization, the growth of more and more data defines the need for Big Data
volume
What are the main challenges of Business Analytics? All the V's a. ___________ b. Data ___________ c. __________ capabilities d. Data ___________ e. _________ availability f. Solution _______
volume, integration, processing, governance, skill, costs
______, ______, and ______ are the 3 V's of Big Data
volume, variety, velocity
The difference between streaming and perpetual analytics is the ______ _______
window size
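To make the window-size distinction concrete, the sketch below computes the same statistic (a mean) two ways: over only the last few seconds of observations, and over every observation seen so far. The class names, the choice of a mean, and the assumption that timestamps arrive in non-decreasing order are illustrative only.

```python
from collections import deque


class WindowedMean:
    """Streaming view: mean of observations from the last `window_seconds`."""

    def __init__(self, window_seconds=5.0):
        self.window_seconds = window_seconds
        self.items = deque()  # (timestamp, value) pairs still inside the window
        self.total = 0.0

    def update(self, timestamp, value):
        self.items.append((timestamp, value))
        self.total += value
        # Evict observations that have aged out of the window.
        while self.items[0][0] <= timestamp - self.window_seconds:
            _, old = self.items.popleft()
            self.total -= old
        return self.total / len(self.items)


class PerpetualMean:
    """Perpetual view: mean of every observation seen so far (no window)."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, timestamp, value):
        self.count += 1
        self.total += value
        return self.total / self.count
```

Feeding both objects the same observations shows them agreeing at first and then diverging once older observations fall out of the window.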
Why are companies like IBM shifting to provide more services and consulting? A) Customers see that significant value can be created with the application of analytics, and need help completing these tasks. B) They can no longer compete in the software market. C) New regulations forced them into this market. D) None of these.
Customers see that significant value can be created with the application of analytics, and need help completing these tasks
This model began with the notion that data quality could happen in a centralized place, cleansing and enriching data and offering it to different systems, applications, or users, irrespective of where they were in the organization, computers, or on the network. A) SaaS B) PaaS C) IaaS D) DaaS
DaaS
What are the 7 keys to succeed with Big Data? SCVEIGE a. Simplify b. Coexist c. Visualize d. _______________________ e. Integrate f. Govern g. Evangelize
Empower