IMB midterm
The evolution of Data Science is best studied with a three-pronged approach: Statistical science, compute power and ______________.
-a. Digital data p14 b. Optimization c. AIs
The Venn diagram that depicts the intersection of Science, Technology, and Data has highlighted a cross-section known as the 'danger zone.' Which of the following is an accurate depiction of this overlap in the Venn diagram?
-a. Has technology and data experience but no science (analytics) background p26 b. It is called the "danger zone" because the individual is a "unicorn," one who is an expert in all areas of data science. c. Is an expert in technology and science but has no domain expertise in the data collected. d. Is an expert in science and data but is not well versed in technology and programming.
Which of the following factors contributed to AI and Machine Learning becoming more powerful and efficient in recent years?
-a. The development of more advanced algorithms b. Nanotechnology in semiconductor manufacturing c. The exponential growth of computer storage space
Which of the following best defines the Internet of Things (IoT)?
-a. A giant network of physical objects with sensors, processing ability, software, and other technologies that connect and exchange data with other devices and systems over the Internet or other communications networks p18 b. A giant network of people connected with social media, software, and other technologies exchange data with other people and systems over the Internet or other communications networks c. A giant network of computer applications with sensors, processing ability, software, and other technologies that connect and exchange data with other devices and systems over the Internet or other communications networks
Because Data Analytics is a specific and precise application of Data Science (Read the class material entitled "The Core Differences between Data Analytics vs Data Science" on Moodle), there are major differences between these two fields. Which of the following INCORRECTLY states the differences?
-a. All of the above are incorrect. b. Data analytics is a concept that continues to expand and evolve, but this particular field of digital information expertise or technology is often used within the healthcare, retail, gaming, and travel industries for immediate responses to challenges and business goals, whereas Data Science is used in major fields of corporate analytics, search engine engineering, and autonomous fields such as artificial intelligence (AI) and machine learning (ML). c. The primary goal of Data Science is to use the wealth of available digital metrics and insights to discover the questions that we need to ask to drive innovation, growth, progress, and evolution. Data Analytics, by contrast, is geared toward sourcing actionable data based on specific aims, operations, and KPIs, with the main aim of using existing information to uncover patterns and visualize insights in specific areas. d. In the field of Data Science, a comprehensive understanding of SQL databases and coding is required, in addition to a firm grasp of working with large sets of unstructured metrics and insights; on the other hand, in the field of Data Analytics, a solid understanding of mathematics and statistics is essential, as well as programming skills, a working knowledge of online data visualization tools, and intermediate statistics. e. Data Science is a multidisciplinary MACRO field, covering a wider field of data exploration and working with enormous sets of structured and unstructured data. On the other hand, Data Analytics is a MICRO field, drilling down into specific elements of business operations with a view to documenting departmental trends and streamlining processes either over specific time periods or in real time; it therefore concentrates mostly on structured data.
Which of the following is an example of big data management and analysis tools?
-a. Apache Hadoop p31 b. DataBase II c. Python
Which of the following computational techniques can be used in carrying out predictive analytics?
-a. All of the above b. Machine learning c. Data mining d. Neural networks
Descriptive tables share which of the following characteristics?
-a. All the above b. Measures of Central Tendency c. Measures of Dispersion d. Measures of Distribution
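A descriptive table typically reports all three groups of measures side by side. The sketch below builds one with Python's standard `statistics` module; the revenue figures and the `describe` helper are hypothetical, invented only to show which statistic belongs to which group.

```python
import statistics

# Hypothetical monthly revenue figures (in $ thousands), invented for illustration.
revenue = [120, 130, 130, 150, 170]

def describe(values):
    """Build a small descriptive table for one numeric column."""
    return {
        # Measures of central tendency
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        # Measures of dispersion
        "range": max(values) - min(values),
        "variance": statistics.variance(values),  # sample variance
        "stdev": statistics.stdev(values),        # sample standard deviation
        # Measure of distribution: quartile cut points (Q1, Q2, Q3)
        "quartiles": statistics.quantiles(values, n=4),
    }

for name, value in describe(revenue).items():
    print(f"{name:>9}: {value}")
```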
Diagnostics Analytics explains anomalies in Key Performance Indicators by identifying _____________ and causations among KPIs in enterprise data. Note that Key Performance Indicators (KPIs) include Revenue growth, Revenue per client, Profit margin, Client retention rate, and Customer satisfaction among others.
-a. Correlations b. Arbitrages c. Ensembles
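Correlation between two KPIs is usually measured with the Pearson coefficient r, which ranges from -1 (perfect inverse relationship) to +1 (perfect direct relationship). A minimal sketch follows; the KPI series are hypothetical, and a strong r only flags a relationship worth investigating, since correlation alone does not establish causation.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant series has no defined correlation")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical quarterly KPIs, invented for illustration.
customer_satisfaction = [7.1, 7.4, 6.8, 8.0, 8.3]
client_retention_rate = [0.82, 0.85, 0.79, 0.90, 0.93]
print(round(pearson_r(customer_satisfaction, client_retention_rate), 3))
```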
Once an enterprise has identified its plan through planning analytics, it moves on to figuring out what was happening to its business by carrying out descriptive analytics. To determine patterns and trends in business and measure performance against Key Performance Indicators (KPIs), the Descriptive Analytics stage of Data Analytics follows five major steps: State business metrics, identify data required, ____________________, analyze data, and present data.
-a. Extract and prepare data b. Modify data c. Predict data
CRISP-DM, KDD, and SEMMA are all data analytics methodologies. Which of the following is common to all these methods?
-a. They address both structured and unstructured data b. These are older methodologies that are not deployed any longer c. They are all about unstructured data d. These are distinct methodologies and there is no commonality amongst them
Select all that apply to the characteristics of data.
3 Vs: Volume, Variety, Velocity
Which of the following tasks is a typical role of a Data Engineer?
Collect and clean data
The traditional field of statistical analysis is the overlap of ________________ and ____________ among the three domains involved in a data science project.
Science and Data
Key Performance Indicators (KPIs) along with descriptive analytics establish a common, data-driven language within the enterprise and identify trends, thresholds, anomalies, and other metrics that lead to the subsequent question of why something happened. Given the volume, velocity, and variety of current enterprise data, Diagnostic Analytics that is based on the combination of AI-infused software and the domain expertise of people will be the most effective means for answering the question: ___________.
a. What happened to business performance? -b. Why did a particular business outcome happen? c. To what degree is a particular business outcome predictable?
Which of the following is the role of a data scientist?
Which of the following is the role of a data analyst (or a business analyst or a citizen analyst or a journal analyst)?
a. Extracts insights from patterns using Machine Learning Models b. Tests and deploys Machine Learning Model -c. Explains and visualizes results
Which of the following is NOT the role of a data engineer?
a. Tests and deploys Machine Learning Model -b. Extracts insights from patterns using Machine Learning Models p32 c. Transforms raw data into usable pipelines d. Manages the data infrastructure
Which of the following is one of the most fundamental characteristics of a data scientist?
a. Using open-source software libraries and packages b. Being proficient in R or Python c. Having a strong background in high-performance computing (HPC) -d. Having a sense of curiosity about all things
Which of the following belongs to structured data?
a. XML/JSON b. Audio c. Documents -d. Delimited ASCII data
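Delimited ASCII data counts as structured because every record has the same fixed set of fields, so it can be loaded straight into rows and columns. A short sketch with Python's standard `csv` module; the pipe-delimited sample and the `parse_delimited` helper are hypothetical.

```python
import csv
import io

# Hypothetical pipe-delimited ASCII extract, invented for illustration.
raw = """client_id|region|revenue
101|East|2500
102|West|3100
103|East|1800
"""

def parse_delimited(text, delimiter=","):
    """Parse delimited ASCII text into records keyed by the header fields."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return list(reader)

rows = parse_delimited(raw, delimiter="|")
# Every record shares the same named fields, so column-wise aggregation is direct.
total_east = sum(int(r["revenue"]) for r in rows if r["region"] == "East")
print(rows[0]["client_id"], total_east)
```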
Which of the following is NOT an example of Technology in a data science project?
a. Big Data -b. Algorithms p26 Venn Diagram c. Programming d. Databases
Which of the following is more likely to serve as a liaison between the IT department and C-Suites?
a. Data Engineer b. Data Analyst -c. Data Scientist
Hadoop is an open-source software framework for storing data and running applications on clusters of commodity hardware. It provides massive storage for any kind of data, enormous processing power, and the ability to handle virtually limitless concurrent tasks or jobs. Which of the following is NOT a Hadoop-based technology?
a. Hive -b. MATLAB c. MapReduce
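MapReduce, the Hadoop processing model named above, splits a job into a map phase that emits key-value pairs, a shuffle that groups pairs by key, and a reduce phase that aggregates each group. The plain-Python word count below only imitates that model on one machine; it does not use any Hadoop API, and all function names are hypothetical.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input document."""
    return [(word, 1) for word in document.lower().split()]

def shuffle_phase(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Aggregate each key's values into a single count."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(documents):
    pairs = []
    for doc in documents:  # on a real cluster, map tasks would run in parallel across nodes
        pairs.extend(map_phase(doc))
    return reduce_phase(shuffle_phase(pairs))

print(word_count(["big data", "big compute"]))
```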
Which of the following are examples of unstructured data? Select all that apply.
a. Records in IBM DB2 database b. CSV files c. Twitter feeds -d. Facebook images
Domain knowledge is often described as the general discipline or field to which data science is applied. An expert or specialist in a field such as finance is said to possess domain knowledge of the banking industry. Therefore, in a data science project, domain knowledge belongs to ______________.
a. Technology b. Science -c. Data
Which of the following statements is true?
a. A Data Analyst captures domain knowledge for successful business alignment -b. All the above c. A Data Scientist transforms data into knowledge to solve business problems d. A Data Engineer architects how data is organized and ensures operability
Predictive analytics predicts future business outcomes. Then, prescriptive analytics uses these predicted outcomes as inputs to answer what an enterprise should do to achieve enterprise goals, such as maximizing profits, optimizing resource allocations, and minimizing losses. Specifically, combined with rules and constraint-based optimization, prescriptive analytics enables better decisions about what to do. In essence, the intent of prescriptive analytics is to ________ the decision-making process.
a. Analyze b. Quantify c. Automate
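To make "constraint-based optimization" concrete, the sketch below takes predicted per-unit profits (the output of a predictive step) and searches for the production plan that maximizes profit under capacity constraints, returning a decision without manual intervention. The products, numbers, constraints, and the `best_production_plan` name are all hypothetical, and brute-force search stands in for the linear or integer programming solvers a real system would use.

```python
def best_production_plan(profit_a, profit_b, machine_hours, labor_hours):
    """Exhaustively search integer plans (units of A, units of B).

    Hypothetical constraints:
      machine: 2 hours per unit of A, 1 hour per unit of B, at most machine_hours
      labor:   1 hour per unit of A or B, at most labor_hours
    """
    best = (0, 0, 0)  # (units_a, units_b, profit)
    for a in range(machine_hours // 2 + 1):
        for b in range(labor_hours + 1):
            if 2 * a + b <= machine_hours and a + b <= labor_hours:
                profit = profit_a * a + profit_b * b
                if profit > best[2]:
                    best = (a, b, profit)
    return best

# Hypothetical predicted profits per unit: $40 for product A, $30 for product B.
print(best_production_plan(40, 30, machine_hours=100, labor_hours=80))
```

The search returns the plan itself, which is the sense in which prescriptive analytics moves from forecasting toward automating the decision.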
Data analytics is defined as a systematic process of transforming "existing" data in any form into actionable insights to provide practical decision support. These decision support capabilities of data analytics are well described in the Data Analytics Life Cycle: Planning Analytics, Descriptive Analytics, Diagnostic Analytics, ____________, and Prescriptive Analytics.
a. Automatic Analytics -b. Predictive Analytics c. Selective Analytics
The Planning Analytics of Data Analytics automates time-consuming manual budgeting, forecasting, and reporting processes and embeds advanced analytics into everyday decision-making. It can easily adjust planning models, deliver these adjusted planning models to downstream applications, and provide insight to upstream ______________.
a. Consumers b. Data scientist -c. Decision makers
A methodology is a general strategy that guides the processes and activities within a given domain. Thus, a methodology provides the data scientist with a framework for how to proceed with whatever methods, processes, and heuristics will be used to obtain answers or ______________.
a. Data b. Questions -c. Results
To provide a framework for proceeding with the methods and processes in Data Science, IBM popularizes the eight sequential steps of Data Science Methodology that can be repeated constantly for data science teams to provide the best decision support. These steps include (1) Business Understanding, (2) Data Exploration and Preparation, (3) Data Representation and Transformation, (4) Data Visualization and Presentation, (5)_______________, (6) Validating Data Models, (7) Deployment of Data Models, and (8) Environment Feedback.
a. Data Refinement -b. Training Data Models c. Predicting Data Models
Predictive analytics is the domain of _______________, who are tasked with steps in the analytic workflow represented by the five categories: Identify business outcomes; Determine data required to train the model; Determine types of analysis; Validate model prediction results; and Test predictions on performance.
a. Data analysts b. Data engineers -c. Data scientists
Lecture 1 taught us that Data Science is an interdisciplinary field that uses Science, Data, and ____________ to extract knowledge and actionable insights from data in any form (structured, semi-structured, and unstructured data).
a. Entrepreneurship b. Capital -c. Technology
As an enterprise is managed to a set plan, and descriptive and diagnostic analytics are used to measure execution against the plan and understand reasons for deviations from it, there is an opportunity to start _________________ to answer what is likely to happen next.
a. Ex-post Analytics -b. Predictive Analytics c. Predetermined Analytics
Which of the following best describes the key difference between descriptive and predictive analytics in the data analytics life cycle?
a. In descriptive analytics, you strive to find the best models, whereas in predictive analytics you visualize the trend of the data. b. The difference is that in predictive analytics you decide what you should do to overcome operational obstacles found in your analysis. c. The key difference is that in descriptive analytics you diagnose why problems happened and identify key patterns. -d. In descriptive analytics, you identify what happened and determine how you're performing against the plan, whereas in predictive analytics you use patterns to predict future trends.
Which of the following is NOT a function that planning analytics (i.e., IBM Planning Analytics) of data analytics will provide, if adopted?
a. Planning applications that are integrated with transactional applications have access to financial budgeting and forecasting data on demand. -b. Planning applications use the disconnected, siloed, and ungoverned planning platform in order to foster efficient planning processes. c. Driver-based plans and rolling forecasts enable this process to adapt quickly to changing business conditions. d. Continuous planning is deployed and plans are adjusted on demand.
A survey by KDNuggets showed that the top methodology used by Data Analytics professionals to extract value from data follows a six-step approach: [1] Business understanding (What does the business need?); [2] Data understanding (What data do we have/need? Is it clean?); [3] Data preparation (How do we organize the data for modeling?); [4] Modeling (What modeling techniques should we apply?); [5] Evaluation (Which model best meets the business objectives?); [6] Deployment (How do stakeholders access the results?). This data analytics methodology is known as ______________________.
a. SEMMA -b. CRISP-DM c. KDD
In analyzing data at the descriptive analytics cycle of business analytics, data analysts create models (simple statistical and quantitative ones) and run analyses, such as summary statistics, clustering, and regression analysis to extract trends and ____________ in business data (mostly structured data) and measure business performance, whereas data scientists use more advanced modeling to programmatically analyze and visualize data to compute trends and patterns and measure business performance.
a. Techniques -b. Patterns c. Values
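Regression is one of the simple statistical models named above for extracting a trend from structured business data. The sketch below fits an ordinary least-squares line to a hypothetical sales series indexed by period 1..n; the data and the `fit_trend` helper are illustrative assumptions, not part of the course material.

```python
def fit_trend(y):
    """Least-squares line y = intercept + slope * t over periods t = 1..n."""
    n = len(y)
    if n < 2:
        raise ValueError("need at least two periods to fit a trend")
    t = list(range(1, n + 1))
    mean_t = sum(t) / n
    mean_y = sum(y) / n
    sxy = sum((ti - mean_t) * (yi - mean_y) for ti, yi in zip(t, y))
    stt = sum((ti - mean_t) ** 2 for ti in t)
    slope = sxy / stt
    intercept = mean_y - slope * mean_t
    return slope, intercept

# Hypothetical quarterly sales (in $ millions), invented for illustration.
sales = [3, 5, 4, 6, 7]
slope, intercept = fit_trend(sales)
print(f"trend: {slope:.2f} per quarter")

# Extending the fitted line one period ahead is a naive step into predictive analytics.
next_quarter = intercept + slope * (len(sales) + 1)
```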
The data science methodology (framework) includes the following 8 stages: Business Understanding, Data Exploration and Preparation, Data Representation and Transformation, Data Visualization and Presentation, ________________, Validate Data Models, ______________, and Environment Feedback.
a. Visualize Data Models/ Select Appropriate Models b. Categorize Data/ Deploy Models -c. Train Data Models/ Deploy Data Models d. Transform Unstructured Data into Structured Data/ Normalize Data
When you are trying to identify business causes and you are in the diagnostics phase of the data analytics life cycle, which of the following questions is most important to you at this stage?
a. What's your plan? -b. Why did it happen? c. What happens next? d. What happened?
Cognitive computing is a system that:
can handle massive amounts of unstructured data.