ITSS 4353 Exam 2

Ace your homework & exams now with Quizwiz!

Patsy is _____

a Python library for describing linear statistical models

In the Data Visualization in Python by Examples video, this code resulted in producing what kind of visualization? (Choose the best answer).

a bar plot

IPython is:

a better interactive python interpreter

When there are no specific outcomes, such as in a discovery process, which method can best help you understand patterns of customers or entities in terms of their characteristics?

a clustering model

SciPy is:

a collection of scientific computing packages

What is a data aggregation?

a data transformation that produces scalar values from arrays

The first/best way of discerning patterns with known outcomes is done with which kind of a model?

a decision tree model

What is the DOE methodology?

a design of experiments (DOE) to understand the relationship between factors affecting a process and the output of that process

What is a ndarray

a fast, flexible container for large datasets

What is a tuple?

a fixed-length, immutable sequence of Python objects

What is KNIME?

a free & open-source data analytics, reporting and integration platform

scikit-learn is:

a machine learning toolkit

What does a regex describe?

a pattern to locate in text

BI visualization is:

a recommended final input into driving business decision-making

Anaconda is:

a recommended installer for python

What is a tuple?

a sequence or list

Choose all that apply. Examples of structured data include:

multidimensional matrices unevenly spaced time series multiple tables of data interrelated by key columns spreadsheet or CSV

Syntactic sugar is:

programming syntax that simplifies

Exploratory Data Analysis (EDA) is:

required and a pre-requisite for building more effective analytics models

Data Normalization is to:

rescale & arrive at values relative to some size variable

Data security, such as what we saw in the Target and JP Morgan Chase case studies, needs to cover ___________________________________________.

the entire value chain of customer data

In [13]: time_zones = [rec['tz'] for rec in records if 'tz' in rec] In [14]: time_zones[:10] Out[14]: ['America/New_York', 'America/Denver', 'America/New_York', 'America/Sao_Paulo', 'America/New_York', 'America/New_York', 'Europe/Warsaw', '', '', '']

the first 10 timezones we want to look at

What is the Medici effect?

the integration of concepts from varying cultures & industries/ideas

Which indicates the best grouping method?

the lowest degree of impurity

The 3 V's of Big Data are:

Variety Velocity Volume

When cleaning up data for analysis, it is often important to do analysis on the missing data itself to identify data collection problems or potential biases in the data caused by missing data.

True

The analytics maturity of any company can be measured along five dimensions, as the delta model. Apply the correct label to each of the five dimensions in the Analytics Maturity Model.

1. C. Data Access 2. D. Enterprise Focus 3. B. Analytical Leadership 4. A. Strategic Targets 5. E. Analytical Talents

The power of real insights come when _______1______ from diverse sources can be ________2_________.

1. Multiple Data 2. Integrated Together

Put the steps of the Analytics Business Value Chain in the right sequence:

1. New Business Challenges & Questions 2. Data Audit, Preparation & Exploration 3. Analytics Knowledge Discovery 4. Test & Learn Knowledge Management 5. Execution & Continuous Optimization

Apply the correct label to each of the five levels in the Analytics Maturity Model.

1.E. Analytically Impaired 2.D. Analytically Aware 3.A. Analytically Inspired 4.C. Analytically Proficient 5.B. Analytical Leaders

Match up the components of this privacy hierarchy and level of intrusiveness:

A. Level of Privacy Intrusiveness B. Access C. Identity D. Pooling

Classification and regression are examples of _________A_________, while hidden Markov and principal component analysis and clustering are examples of __________B________.

A. Supervised Learning B. Unsupervised Learning

Choose the correct statement.

Business analytics is focused on the process of discovery of actionable knowledge and the creation of new business opportunities from such knowledge.

Predicted Lifetime Value (LTV) of a customer can be predicted using:

Buyer Differential Characteristics (BDC) except substituting transactions with net present value (NPV) of all revenue/profit streams through the customer life cycle with the business

All of the descriptive statistics on pandas objects ___________________ missing data by default.

Exclude

Which of these is a sentinel value in pandas, that can be easily detected?

For numeric data, pandas uses the floating-point value NaN (Not a Number) to represent missing data.

What is JSON?

JavaScript Object Notation

Common causes for senior management's lack of attention to how analytics is used are:

Lack of understanding and trust Lost in translation Over reliance on what they know Lack of strategic vision of analytics' uses

The point of contact between pandas and other analysis libraries is usually ____

NumPy Arrays

NumPy stands for:

Numerical Python

Choose the correct statement about leverage in analytics testing.

Once the possible levers are identified, the incremental effects of different levers can then be isolated and validated with tests (with properly designed multicell control groups) design of experiments (DOE).

What is PII?

Personally Identifiable Information

What does this formula represent? P(B|A)= P(A&B)/P(A) where A is a price point, and B is product purchase

Predicted pricing sensitivity

When we think of workable data quality, we understand that data quality is _________________, especially for business data.

Relative

Consider a scenario where a customer marketing campaign failed, attributed to analytics. Choose the correct analysis:

This is likely due to a mismatch between marketing messages, offers, and target customers. A better way is to create an intersection between the marketing team responsible for the contact collaterals and the analytics team.

A common workflow for model development is to use pandas for data loading and cleaning before switching over to a modeling library to build the model itself.

True

Adopt consistent data strategy and invest in acquiring, cleansing and searching, and integrating with new data sources. Establish a data council to oversee data governance and formulate progressive privacy and security policies.

True

In Python, mixing functions with arrays, dicts, or Series is not a problem as everything gets converted to arrays internally.

True

Leverage past histories to predict future outcomes and trends. Identify and optimize levers (promotions, sales, new products) in scenario simulations and predict levels of investments and ROIs.

True

NumPy is designed for efficiency on large arrays of data.

True

Pandas provides many built-in time series tools and data algorithms. You can efficiently work with very large time series and easily slice and dice, aggregate, and resample irregular- and fixed-frequency time series.

True

Python has both built-in and third-party libraries for converting a JSON string into a Python dictionary object

True

Python is an interpreted language.

True

In python, bisect means to:

binary search and insert into a sorted list

In the Data Visualization in Python by Examples video, you learned what syntax/command to print off the first ten rows?

data.head()

Select the six steps in the BAP (business analytics process):

deployment business objectives test and learn modeling data preparation data audit

Select the four generally agreed on types of activities / solutions for analytics:

descriptive predictive diagnostic prescriptive

In the Data Visualization in Python by Examples video, you learned what syntax/command to read in the data?

df = pd.read_csv("data-disasters.csv")

Select the most effective digital marketing tactics for customer retention.

email marketing

Business needs to give value to consumers to be permitted access to their data. Research indicates the three greatest incentives are:

exclusive deals loyalty points cash rewards

In python, pandas are:

high-level data structures and functions designed to make working with structured or tabular data fast, easy, and expressive

A lift chart shows what?

how a predictive model performs

Fill in the blank. The simplest and most widely used kind of time series are those _______________.

indexed by timestamp

Given the choice between using an INSERT or an APPEND in python, which is more "computationally expensive"?

insert

What does the pip or conda install command do?

installs packages into your python environment

What kind of language is python?

interpreted

How are object references in python typed?

no type associated (strongly)

Given this code, what does the set of three single quotes ' ' convey, in the output? In [13]: time_zones = [rec['tz'] for rec in records if 'tz' in rec] In [14]: time_zones[:10] Out[14]: ['America/New_York', 'America/Denver', 'America/New_York', 'America/Sao_Paulo', 'America/New_York', 'America/New_York', 'Europe/Warsaw', '', '', '']

unknown/empty strings

How is code structured in python?

using whitespace (tabs/spaces)

Select the channels that are used the most for customer acquisition:

website and email

Select typical/possible components of an ERP system:

Service Management Supply Chain Management Financial and Managerial Accounting Human Resources

Select the correct statement.

Significantly more companies have a greater focus on customer acquisition over those that focus on retention.

To optimize ROI, management may test and find ways to minimize costs and increase conversions among various tiers of customers.

Some customers may need to be fired for having small wallets and are not worth pursuing, whereas others with large wallets and low shares may not be worth the high costs to pursue them, also indicating that the fundamental value proposition and focus may not appeal to this group of customers.

It is best to always focus on the simple descriptive and diagnostic analytics over needlessly advanced predictive and prescriptive analytics.

False

NumPy internally stores data in non-contiguous blocks of memory, dependent on other built-in Python objects.

False

Python runs a lot faster than Java or C++ compiled code.

False

What the customer purchased before, or the customer's web browsing history are examples of ________A__________. The likelihood of a customer being the target is an example of a _________B__________.

A. independent variable B. dependent variable

Select the correct statement.

Anything that is observed or measured at many points in time forms a time series.

With a robust analytics strategy and the right tools and processes, it is not necessary to focus on hiring or training analytics talent to recognize patterns in customers and product purchases.

False

Choose all valid scenarios related to predicting outcomes and testing potential levers.

Given everything we know, could we have predicted this year's results from last year's data. How can we baseline predictions and test effective levers to determine what we could do differently. Taking validated predictions, can we confidently predict next year's results with this year's data. What confidence do we have in business outcomes, before major investments are made.

Choose the correct statement.

While pandas adopts many coding idioms from NumPy, the biggest difference is that pandas is designed for working with tabular or heterogeneous data. NumPy, by contrast, is best suited for working with homogeneous numerical array data.

Select the correct statement.

With a DataFrame, you can sort by index on either axis.

Given these two lines of code, select the 3rd line of code necessary to produce the bar plot shown above. In [36]: import seaborn as sns In [37]: subset = tz_counts[:10]

In [38]: sns.barplot(y=subset.index, x=subset.values)

What does the read_csv function do?

load a comma separated value file into a dataframe load a comma separated value file or PDF into a dataframe

______ is a desktop plotting package designed for creating (mostly two-dimensional) publication-quality plots.

matplotlib

Select all examples of optimized groupby methods (data aggregation).

min sum max mean count

When data contains significant missing data, there are a few ways to alleviate the impacts or assess the model effectiveness under such imperfect conditions. Choose all the correct options.

use data imputation eliminate the columns of independent variables with questionable quality additional data cleansing and augmentation eliminate the rows (of customers) with questionable reliability

A big benefit of using NumPy is the ability to write math functions for fast operations on entire arrays of data _______________________.

without having to write for-loops

What are the three stages of a simple group aggregation?

split-apply-combine

Select all examples of time series that could be modeled in business:

stock prices inventory levels profit & loss

Given this code, what is the expected output? In [10]: from datetime import datetime In [11]: now = datetime.now() In [12]: now Out[12]: datetime.datetime(2017, 10, 17, 13, 34, 33, 597499) In [13]: now.year, now.month, now.day

(2017, 10, 17)

To predict a customer's propensity to buy certain product, the workflow would be:

(Spend + Demographics Data) + (P(Buy) model) + (DoE)

Given this code, what is the expected graphical output? Look at the graphs, and then select A, B, C or D. In [238]: close_px.AAPL.plot( ) In [239]: close_px.AAPL.rolling(250).mean( ).plot( )

A

Select the correct statement.

A DataFrame represents a rectangular table of data and contains an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.)

Select the correct statement(s).

A data analyst/business analytics professional works with structured/unstructured data and utilizes programming and visualization to recommend business strategies. A business analyst focuses on processes and requirements such as the software development lifecycle and utilizes Excel over programming as a key tool.

In the Business Analytics Workflow reviewed via in-class lecture/video, select the correct combination to replace the red and purple ? (question) marks.

Data Mining and Predictive Analytics

Choose the two work horse data structures used in pandas

Dataframe Series

In the different types of Analytics reviewed via in-class lecture/video, select what does NOT belong.

Disruptive Analytics

If you execute the following code, what will be the result? In [12]: import numpy as np In [13]: data = np.arange(10) In [14]: data Out[14]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]) In [15]: plt.plot(data)

a simple 45-degree line plot starting from (0,0)

What is Seaborn

a statistical graphics library

What is a ufunc

a universal function

In python, dict is:

a very important built-in data structure, also known as hash map or associative array

What is a tag cloud?

a visual depiction of user-generated tags attached to online content, typically using color and font size to represent the prominence or frequency of the tags depicted.

What is a lambda function?

an anonymous function

What is a dot used for in python?

an array method & function for matrix multiplication

ANOVA is ______

analysis of variance methods

Generic time series in pandas are ____

assumed to be irregular; that is, they have no fixed frequency

GroupBy operations can be significantly faster with __________________ because the underlying algorithms use the integer-based codes array instead of an array of strings.

categorical

If you do a lot of analytics on a particular dataset, converting to ______________ can yield substantial overall performance gains. A ________________ version of a DataFrame column will often use significantly less memory.

categorical

Pandas has a special ___________ type for holding data that uses the integer-based _________ representation or encoding. You can achieve better performance and memory use in some pandas operations.

categorical

What are the four key types of analytics reviewed in Ch.9 (Applied Business Analytics)?

descriptive, diagnostic, predictive, prescriptive

Built-in aggregate functions like 'mean' or 'sum' are typically ____________________ than/as a general applyfunction.

faster

What is the pandas method to use to fill in holes (missing) in data?

fillna

What is NaN?

not a number

Scikit-learn is ______

one of the most widely used and trusted general-purpose Python machine learning toolkits

The cost of changing or adapting operational systems to embed advanced analytics is __________________.

prohibitively high

What does the plt.savefig method do?

save a plot figure to file

Select two popular python modeling toolkits:

scikit-learn statsmodels


Related study sets

Unit 1: The Revelation of Jesus Christ, Quiz 1: Glorious Christ and His People

View Set

Anatomy and physiology 2: Respiratory system

View Set

Epidemiology Lecture 3 - Observational Studies: Cohort Studies

View Set

IB Design Technology Topic 1-6 (except 4)

View Set

Chapter 53: Assessment and Management of Patients with Male Reproductive Disorders

View Set