MATH144: Introduction to Data Science Mod 1 Reviewer
Based on the following results of logistic regression, what is the likelihood of churning when Age = 40 and Churned_contacts = 5? (Note: Round the coefficients to 2 decimal places.)
                  Estimate
(Intercept)       3.415201  ***
Age              -0.1566643 ***
Churned_contacts  0.382324  ***
* 0.714 * 0.623 * 0.357 * 0.269
0.269
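A minimal sketch of the computation, assuming the usual logistic function f(y) = 1 / (1 + e^(-y)) and the coefficients rounded to 2 decimal places as the note asks. The variable names are illustrative. With these rounded values the result comes out near 0.25, and the closest listed choice is 0.269.

```python
import math


def logistic(y: float) -> float:
    """Logistic function f(y) = 1 / (1 + e^(-y))."""
    return 1.0 / (1.0 + math.exp(-y))


# Coefficients from the question, rounded to 2 decimal places.
intercept, b_age, b_contacts = 3.42, -0.16, 0.38

# Linear predictor for Age = 40 and Churned_contacts = 5.
y = intercept + b_age * 40 + b_contacts * 5
probability = logistic(y)

# Pick the listed choice nearest to the computed probability.
options = [0.714, 0.623, 0.357, 0.269]
closest = min(options, key=lambda p: abs(p - probability))

print(f"y = {y:.2f}, f(y) = {probability:.3f}, closest option = {closest}")
```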
In predicting Sales Revenue using Newspaper Ads Expenses, we have the following regression results: Predicted sales (y) = 12.3514 + 0.0547(newspaper ads expenses). Estimate the predicted sales if newspaper ads expenses is 60 units. * 15.6 * 17.4 * 19.2 * 20.8
15.6
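The arithmetic can be written out as follows, assuming the single predictor x in the fitted equation is the newspaper ads expense the question asks about:

```latex
\begin{align*}
\hat{y} &= 12.3514 + 0.0547\,x, \qquad x = 60 \\
        &= 12.3514 + 3.282 \\
        &= 15.6334 \approx 15.6
\end{align*}
```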
In predicting Sales Revenue using TV and Radio Ads Expenses, we have the following regression results: Predicted sales (y) = 2.921 + 0.046(# of TV ads) + 0.188(# of radio ads). Estimate the predicted sales if TV and radio ads expenses are 200 and 50, respectively. * 19.3 * 21.5 * 23.7 * 25.9
21.5
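As a quick check, the fitted equation can be evaluated directly. This is only a sketch; the function name `predict_sales` is made up for illustration.

```python
def predict_sales(tv: float, radio: float) -> float:
    """Fitted equation from the question: 2.921 + 0.046*TV + 0.188*radio."""
    return 2.921 + 0.046 * tv + 0.188 * radio


# TV = 200 and radio = 50, as given in the question.
predicted = predict_sales(tv=200, radio=50)
print(round(predicted, 1))
```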
Which of the following is TRUE about the logistic function? I. As the value of y increases, the likelihood of the event f(y) also increases. II. The values of y are not directly observed but rather, only the value of f(y) in terms of success or failure is observed. * I only * II only * both I and II * neither I nor II
both I and II
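Statement I can be illustrated with a small sketch, assuming the standard logistic function f(y) = 1 / (1 + e^(-y)). The sample values of y are arbitrary.

```python
import math


def logistic(y: float) -> float:
    """Logistic function f(y) = 1 / (1 + e^(-y))."""
    return 1.0 / (1.0 + math.exp(-y))


# Evaluate f(y) at a few increasing values of y (chosen arbitrarily).
ys = [-4, -2, 0, 2, 4]
probs = [logistic(y) for y in ys]

for y, p in zip(ys, probs):
    print(f"y = {y:+d}  f(y) = {p:.3f}")

# f(y) rises as y rises and always stays strictly between 0 and 1.
is_increasing = all(a < b for a, b in zip(probs, probs[1:]))
```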
Which of the following is a deliverable under the operationalize phase? * Presentation for project sponsors * Presentation for analysts * Technical specifications of implementing the code * All of the Above
All of the Above
Which of the following key questions are helpful to ask during the discovery phase when interviewing the project sponsor? * What is the desired outcome of the project? * What data sources are available? * What industry issues may impact the analysis? * All of the Above
All of the Above
Which of these attributes stand out as defining Big Data characteristics? * Huge volume of data * Complexity of data types and structures * Speed of new data creation and growth * All of the Above
All of the Above
The following are recurring sets of activities that data scientists perform EXCEPT * Reframe business challenges as analytics challenges. * Design, implement, and deploy statistical models and data mining techniques on Big Data. * Provide technical expertise to support analytical projects such as provisioning and administrating analytical sandboxes. * Develop insights that lead to actionable recommendations.
Provide technical expertise to support analytical projects such as provisioning and administrating analytical sandboxes
The following are examples of applications for logistic regression EXCEPT: * A model on patient's successful response to a specific medical treatment with variables including age, weight, blood pressure, and cholesterol levels. * A churn model for a customer switching to a new network given age and number of contacts who churned. * A model to determine the relationship of amount of income given age, education, number years working and gender. * A model to determine the likelihood of a person buying a new automobile given age, income and gender.
A model to determine the relationship of amount of income given age, education, number years working and gender.
The following activities are involved under the model planning phase EXCEPT: * Assess the structure of the datasets. * Ensure that the analytical techniques enable the team to meet the business objectives and accept or reject the working hypotheses. * Evaluate whether similar, existing approaches are available or if the team will need to create something new. * Assess the validity of the model and its results.
Assess the validity of the model and its results
Which of the following groups of players in the data value chain makes sense of the data collected from various entities?
Data Aggregators
This refers to the process of cleaning data, normalizing datasets, and performing transformations on the data.
Data Conditioning
Examples that fall under this group include financial analysts, market research analysts, life scientists, operations managers, and business and functional managers.
Data Savvy Professionals
Which of the following key roles in the new big data ecosystem has members who possess a combination of skills to handle raw, unstructured data and to apply complex analytical techniques at massive scales?
Deep Analytical Talent
The following are part of the data preparation phase EXCEPT: * Performing ETLT * Survey and Visualize * Developing Initial Hypothesis * Preparing the Analytic Sandbox
Developing Initial Hypothesis
In this phase of the data analytics life cycle, the team assesses the resources available to support the project in terms of people, technology, time, and data.
Discovery
Prior to any regression modelling, the data should always be inspected for the following EXCEPT: * Data-entry errors * Expected pattern * Outliers * Missing values
Expected pattern
In creating robust models, the following questions need to be considered EXCEPT: * Does the model avoid intolerable mistakes? * How consistent are the contents and files? * Do any of the inputs need to be transformed or eliminated? * Will the kind of model chosen support the runtime requirements?
How consistent are the contents and files?
Which of the following describe the decade beyond 2010 with regard to big data? I. In this era, everyone and everything is leaving a digital footprint. II. Data volumes in this decade are measured in terms of petabytes. * I only * II only * both I and II * neither I nor II
I only
Which of the following is TRUE about logistic regression? I. When the outcome variable is categorical in nature, logistic regression can be used to predict the likelihood of an outcome based on the input variables. II. Logistic regression can only be applied to an outcome variable with two values such as true/false, pass/fail, or yes/no. * I only * II only * both I and II * neither I nor II
I only
Which of the following is TRUE about the final phase of the data analytics life cycle? I. In the final phase, the team communicates the benefits of the project more broadly and sets up a pilot project to deploy the work in a controlled way before broadening the work to a full enterprise or ecosystem of users. II. Under this phase, the team reflects on the project and considers what obstacles were in the project and what can be improved in the future, as well as makes recommendations for future work or improvements to existing processes. * I only * II only * both I and II * neither I nor II
I only
Which of the following is always TRUE about Big Data? I. Due to its size or structure, Big Data cannot be efficiently analyzed using only traditional databases or methods. II. Although the variety of Big Data tends to attract the most attention, generally the volume and velocity of the data provide a more apt definition of Big Data. * I only * II only * both I and II * neither I nor II
I only
Which of the following is/are ALWAYS TRUE about regression analysis? I. It's the technique used most frequently to analyze the relationship between two or more variables. II. Predictor variables could either be discrete or continuous. * I only * II only * both I and II * neither I nor II
I only
Among the business drivers that push businesses to become more analytical and data-driven, this one involves customer churn, fraud, and default.
Identify Business Risk
Which of the following is a free or open source tool available to data analytics practitioners? * SAS Enterprise Miner * SPSS Modeler * Octave * Alpine Miner
Octave
In this phase of the data analytics life cycle, the team delivers final reports, briefings, code, and technical documents.
Operationalize
Which of the following activities is NOT involved in identifying potential data sources? * Capture aggregate data sources * Evaluate the data structures and tools needed * Perform extract, transform, load processes to data * Scope the sort of data infrastructure needed
Perform extract, transform, load processes to data
The following characterize inferential statistics EXCEPT: * Draw conclusions for a larger group/data * Determine relationships * Present data * Make predictions
Present data
Which of the following persons provides the funding and gauges the degree of value from the final outputs of the working team in a data analytics project? * Project Manager * Project Sponsor * Business Intelligence Analyst * Business User
Project Sponsor
The following activities are part of the discovery phase EXCEPT * The team determine how much business or domain knowledge the data scientist needs to develop models. * The team catalog the data sources that the team has access to and identify additional data sources that the team can leverage. * The team identify the main objectives of the project, identify what needs to be achieved in business terms, and identify what needs to be done to meet the needs. * The team identify the key stakeholders and their interests in the project.
The team catalog the data sources that the team has access to and identify additional data sources that the team can leverage.
This type of data has no inherent structure, which may include text documents, PDFs, images, and video.
Unstructured Data
Which of the following is TRUE about data analytics life cycle? I. A common mistake made in data science projects is rushing into data collection and analysis, which precludes spending sufficient time to plan and scope the amount of work involved, understanding requirements, or even framing the business problem properly. II. Having a good data analytics process ensures a comprehensive and repeatable method for conducting analysis and helps focus time and energy. * I only * II only * both I and II * neither I nor II
both I and II
The data now is said to come from many sources including * Photos and video footage uploaded to the World Wide Web * Nontraditional IT devices, including the use of radio-frequency identification (RFID) readers, GPS navigation systems, and seismic processing * Medical information, such as genomic sequencing and diagnostic imaging * All of the Above
All of the Above
Which of the following are activities done under phase 5 of the data analytics life cycle? * The team determine if it succeeded or failed in its objectives. * The team reflect on the implications of these findings and measure the business value. * The team record all the findings and then select the three most significant ones that can be shared with the stakeholders. * All of the Above
All of the Above
Which of the following are problems encountered in traditional data architecture? * High-value data is hard to reach and leverage, and predictive analytics and data mining activities are last in line for data. * Data scientists are limited to performing in-memory analytics which will restrict the size of the datasets they can use. * Data Science projects will remain isolated and ad hoc, rather than centrally managed. * All of the Above
All of the Above
These are centralized data containers in a purpose-built space that supports business intelligence and reporting but restricts robust analyses.
Data Warehouses
Based on the following results of logistic regression, which of the following statements is/are TRUE?
                  Estimate
(Intercept)       3.415201  ***
Age              -0.1566643 ***
Married           0.066432
Cust_years        0.017857
Churned_contacts  0.382324  ***
Signif. codes: ***
I. For every 1-unit increase in Age, the value of the logistic function increases by 0.16. II. The regression coefficient for the Married variable is not significant.
* I only * II only * both I and II * neither I nor II
II only
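A short sketch of why statement I fails, assuming the standard reading of a logistic regression coefficient as the change in the linear predictor y (the log-odds) per one-unit increase in the input. The Age estimate is negative, so y and therefore f(y) decrease as Age increases. The variable names are illustrative.

```python
import math

# Estimate for Age, taken from the regression output in the question.
age_coefficient = -0.1566643

# A one-unit increase in Age changes the log-odds by the coefficient itself.
change_in_log_odds = age_coefficient

# Equivalently, the odds are multiplied by e^(coefficient), which is below 1 here.
odds_multiplier = math.exp(age_coefficient)

print(f"change in log-odds per unit of Age: {change_in_log_odds:.2f}")
print(f"odds multiplier per unit of Age: {odds_multiplier:.3f}")
```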
Which of the following is TRUE about the differences between Business Intelligence (BI) and Data Science? I. Where Data Science problems tend to require highly structured data organized in rows and columns for accurate reporting, BI projects tend to use many types of data sources, including large or unconventional datasets. II. Data Science tends to be more exploratory in nature and may use scenario optimization to deal with more open-ended questions. * I only * II only * both I and II * neither I nor II
II only
Which of the following is TRUE about model planning? I. Under this phase, the team develop datasets for training, testing, and production purposes. II. Data Exploration, Variable and Model selection characterize this phase. * I only * II only * both I and II * neither I nor II
II only
The following are the skillsets and behavioral characteristics a data scientist must possess EXCEPT * Qualitative skill * Curious and creative * Skeptical mindset and critical thinking * Communicative and collaborative
Qualitative skill
Which of the following is TRUE about model building? I. The phases of model planning and model building can overlap quite a bit, and in practice one can iterate back and forth between the two phases for a while before settling on a final model. II. Although the modeling techniques and logic required to develop models can be highly complex, the actual duration of this phase can be short compared to the time spent preparing the data and defining the approaches. * I only * II only * both I and II * neither I nor II
both I and II
Which of the following is true about the current analytical architecture? I. Data sources are first loaded into the data warehouse where data needs to be well understood, structured, and normalized with the appropriate data type definitions. This kind of centralization enables security, backup, and failover of highly critical data. II. Once in the data warehouse, data is read by additional applications across the enterprise for BI and reporting purposes. These are high-priority operational processes getting critical data feeds from the data warehouses and repositories. * I only * II only * both I and II * neither I nor II
both I and II
Which of the following statements is/are ALWAYS TRUE? I. Inferential statistics consists of Estimation and Hypothesis Testing. II. The link between inferential and descriptive statistics is probability. * I only * II only * both I and II * neither I nor II
both I and II
Which of the following describes the key role of the Data Engineer? * provides access to key databases or tables and ensures the appropriate security levels are in place related to the data repositories. * executes the actual data extractions and performs substantial data manipulation to facilitate the analytics. * provides subject matter expertise for analytical techniques, data modeling, and applying valid analytical techniques to given business problems. * gives business domain expertise based on a deep understanding of the data, key performance indicators (KPIs), key metrics, and business intelligence from a reporting perspective.
executes the actual data extractions and performs substantial data manipulation to facilitate the analytics.
Which of the following is/are ALWAYS TRUE about simple regression? I. Simple regression attempts to predict the dependent variable using more than one independent variable. II. Simple regression consists of one regression coefficient for each explanatory variable. * I only * II only * both I and II * neither I nor II
neither I nor II
