BA Analytics Exam 2 Part 1
What is a data aggregation?
A data transformation that produces scalar values from arrays, such as a Series or the columns of a DataFrame
What are the three stages of a simple group aggregation?
split-apply-combine
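Example (a minimal sketch, assuming pandas is installed; the column names and values are made up for illustration). It shows the three stages on a tiny table: rows are split by key, a sum is applied to each group, and the results are combined into one Series.

```python
import pandas as pd

# Hypothetical data: the column names and values are made up for illustration.
df = pd.DataFrame({
    "region": ["East", "West", "East", "West"],
    "sales": [100, 200, 300, 400],
})

# Split rows by region, apply sum to each group, combine into one Series.
totals = df.groupby("region")["sales"].sum()
print(totals)
```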
What is seaborn?
statistical graphics library
Select the correct statement:
- A DataFrame represents a parallel table of data and contains an ordered collection of columns, each of which can be the same value type (numeric, string, boolean, etc.)
- A DataFrame represents a rectangular table of data and contains an unordered collection of columns, each of which must be the same value type (numeric, string, boolean, etc.)
- A DataFrame represents a rectangular table of data and contains an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.)
- A DataFrame represents a rectangular table of data and contains an ordered collection of columns, each of which must be the same value type (numeric, string, boolean, etc.)
A DataFrame represents a rectangular table of data and contains an ordered collection of columns, each of which can be a different value type (numeric, string, boolean, etc.)
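Example (a minimal sketch, assuming pandas is installed; the column names are hypothetical). It builds a small DataFrame whose columns keep their order and hold different value types.

```python
import pandas as pd

# Hypothetical columns: one of strings, one of floats, one of booleans.
df = pd.DataFrame({
    "customer": ["Ann", "Bob"],
    "spend": [12.5, 30.0],
    "is_member": [True, False],
})

# Columns stay in the order they were defined.
print(list(df.columns))
# Each column reports its own dtype.
print(df.dtypes)
```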
True or False: When cleaning up data for analysis, it is often important to do analysis on the missing data itself to identify data collection problems or potential biases in the data caused by missing data.
True
To predict a customer's propensity to buy a certain product, the workflow would be:
(Spend + Demographics Data) + (P(Buy) model) + (DOE)
Choose all valid scenarios related to predicting outcomes and testing potential levers
- Taking validated predictions, can we confidently predict next year's results with this year's data?
- Given everything we know, could we have predicted this year's results from last year's data?
- How can we baseline predictions and test effective levers to determine what we could do differently?
- What confidence do we have in business outcomes, before major investments are made?
What does this formula represent? P(B|A) = P(A&B) / P(A), where A is a price point and B is a product purchase
Predicted pricing sensitivity
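Example (a worked sketch of the conditional probability formula; all counts are invented for illustration and do not come from the course material). It estimates P(B|A), the share of customers who purchased among those shown price point A.

```python
# Hypothetical counts, made up for illustration only.
total_customers = 1000
shown_price = 200        # customers shown price point A
shown_and_bought = 50    # customers shown price point A who also purchased (A and B)

p_a = shown_price / total_customers             # P(A)
p_a_and_b = shown_and_bought / total_customers  # P(A&B)

# P(B|A) = P(A&B) / P(A)
p_b_given_a = p_a_and_b / p_a
print(p_b_given_a)
```

Repeating the calculation at several price points gives one purchase probability per price, which is one way to read the answer's "pricing sensitivity".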
Apply the correct label to each of the five levels in the Analytics Maturity Model.
1. Analytically Impaired 2. Analytically Aware 3. Analytically Inspired 4. Analytically Proficient 5. Analytical Leaders
The analytics maturity of any company can be measured along five dimensions, known as the DELTA model. Apply the correct label to each of the five dimensions in the Analytics Maturity Model.
1. Data Access 2. Enterprise Focus 3. Analytical Leadership 4. Strategic Targets 5. Analytical Talents
Select the correct statement:
- A data analyst professional works with Python, R and R studio to provide visualization throughout the software development lifecycle. He/she utilizes Python/R or Excel interchangeably as a key tool.
- A data analyst/business analytics professional and business analyst are all different titles for essentially the same job function. He/she works with structured/unstructured data and utilizes programming and visualization to recommend business strategies.
- A data analyst focuses on processes and requirements such as the software development lifecycle and utilizes Excel over programming as a key tool. A business analyst is also known as a business analytics professional and utilizes programming and visualization to recommend business strategies.
- A business analytics professional focuses on processes and requirements such as the software development lifecycle and utilizes Excel over programming as a key tool.
- A data analyst/business analytics professional works with structured/unstructured data and utilizes programming and visualization to recommend business strategies. A business analyst focuses on processes and requirements such as the software development lifecycle and utilizes Excel over programming as a key tool.
A data analyst/business analytics professional works with structured/unstructured data and utilizes programming and visualization to recommend business strategies. A business analyst focuses on processes and requirements such as the software development lifecycle and utilizes Excel over programming as a key tool.
What does a regex describe?
A pattern to locate in text
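Example (a minimal sketch using Python's built-in re module; the sample text and the deliberately simple email pattern are made up for illustration and are not a full email validator).

```python
import re

# Hypothetical sample text.
text = "Contact: ann@example.com, bob@example.org"

# A simplified pattern: word characters or dots, "@", then a dotted domain.
pattern = re.compile(r"[\w.]+@[\w.]+\.\w+")

# findall returns every non-overlapping match of the pattern in the text.
matches = pattern.findall(text)
print(matches)
```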
What is a tuple?
A fixed-length, immutable sequence of Python objects (similar to a list, but it cannot be changed after creation)
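Example (a minimal sketch in plain Python; the variable names are arbitrary). It shows that a tuple can be unpacked like any sequence but rejects item assignment.

```python
# Create a tuple and unpack it into two variables.
point = (3, 4)
x, y = point

# Tuples are immutable: assigning to an element raises TypeError.
try:
    point[0] = 10
except TypeError as err:
    immutable_error = str(err)

print(x, y)
print(immutable_error)
```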
Predicted Lifetime Value of a customer can be predicted using:
Buyer Differential Characteristics (BDC) except substituting transactions with net present value (NPV) of all revenue/profit streams through the customer life cycle with the business
True or False: It is best to always focus on the simple descriptive and diagnostic analytics over needlessly advanced predictive and prescriptive analytics?
False
True or False: With a robust analytics strategy and the right tools and processes, it is not necessary to focus on hiring or training analytics talent to recognize patterns in customers and product purchases
False
Which of these is a sentinel value in pandas, that can be easily detected?
For numeric data, pandas uses the floating-point value NaN (Not a Number) to represent missing data.
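Example (a minimal sketch, assuming pandas and NumPy are installed; the values are made up). It shows how the NaN sentinel is detected with isna.

```python
import numpy as np
import pandas as pd

# A numeric Series with one missing value.
s = pd.Series([1.0, np.nan, 3.0])

# isna flags each missing entry with True.
mask = s.isna()
missing_count = int(mask.sum())

print(mask.tolist())
print(missing_count)
```

Note that NaN never compares equal to itself, so a check such as `s == np.nan` does not find missing values; isna is the reliable test.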
What is JSON?
JavaScript Object Notation
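Example (a minimal sketch using Python's built-in json module; the record is hypothetical). It converts a Python dict to JSON text and back.

```python
import json

# Hypothetical record, made up for illustration.
record = {"customer": "Ann", "orders": [101, 102], "member": True}

# Serialize the dict to a JSON string, then parse it back.
text = json.dumps(record)
restored = json.loads(text)

print(text)
print(restored == record)
```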
What is NaN?
Not a Number
Choose the correct statement about leverage in analytics testing
Once the possible levers are identified, the incremental effects of different levers can then be isolated and validated with tests, using design of experiments (DOE) with properly designed multicell control groups.
What does the plt.savefig method do?
Save a plot figure to file
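Example (a minimal sketch, assuming matplotlib and NumPy are installed; the file name and the temporary directory are arbitrary choices for illustration). The Agg backend is selected so the code runs without a display.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no window is needed
import matplotlib.pyplot as plt
import numpy as np

# Draw a simple line plot on a new figure.
fig, ax = plt.subplots()
ax.plot(np.arange(10))

# Save the current figure to a PNG file in a temporary directory.
out_path = os.path.join(tempfile.mkdtemp(), "line_plot.png")
plt.savefig(out_path, dpi=100)
plt.close(fig)

print(os.path.exists(out_path))
```

The file type is inferred from the extension, so a path ending in .pdf or .svg would produce that format instead.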
Select the correct statement.
- Significantly more companies have a greater focus on customer acquisition over those that focus on retention.
- Companies tend to have about the same focus on acquisition as they do on customer retention.
- Customer retention tends to be a much higher priority among companies than acquiring new customers.
Significantly more companies have a greater focus on customer acquisition over those that focus on retention.
To optimize ROI, management may test and find ways to minimize costs and increase conversions among various tiers of customers.
Some customers may need to be fired for having small wallets and are not worth pursuing, whereas others with large wallets and low shares may not be worth the high costs to pursue them, also indicating that the fundamental value proposition and focus may not appeal to this group of customers.
True or False: Leverage past histories to predict future outcomes and trends. Identify and optimize levers (promotions, sales, new products) in scenario simulations and predict levels of investments and ROIs
True
Consider a scenario where a customer marketing campaign failed, attributed to analytics
This is likely due to a mismatch between marketing messages, offers, and target customers. A better way is to create an intersection between the marketing team responsible for the contact collaterals and the analytics team.
True or False: In a pandas groupby, mixing functions with arrays, dicts, or Series as grouping keys is not a problem, as everything gets converted to arrays internally.
True
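Example (a minimal sketch, assuming pandas is installed; the names, scores, and key list are made up for illustration). It groups by a function and a plain list at the same time: the function len is applied to each index label, and the list supplies a second key per row.

```python
import pandas as pd

# Hypothetical data indexed by name.
df = pd.DataFrame(
    {"score": [1, 2, 3, 4]},
    index=["Joe", "Steve", "Wes", "Jim"],
)

# A list with one grouping label per row.
key_list = ["one", "one", "two", "two"]

# Group by name length (a function) and by key_list (a list) together.
result = df.groupby([len, key_list])["score"].sum()
print(result)
```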
True or False: Adopt consistent data strategy and invest in acquiring, cleansing and searching, and integrating with new data sources. Establish a data council to oversee data governance and formulate progressive privacy and security policies.
True
Select the correct statement.
- While NumPy adopts many coding idioms from pandas, the biggest difference is that NumPy is designed for working with tabular or heterogeneous data. Pandas, by contrast, is best suited for working with homogeneous numerical array data.
- While pandas adopts many coding idioms from NumPy, the biggest difference is that pandas is designed for working with tabular or heterogeneous data. NumPy, by contrast, is best suited for working with homogeneous numerical array data.
While pandas adopts many coding idioms from NumPy, the biggest difference is that pandas is designed for working with tabular or heterogeneous data. NumPy, by contrast, is best suited for working with homogeneous numerical array data.
Select the correct statement.
- With a DataFrame, you can sort by index on either axis.
- With a DataFrame, you can sort by index on only one axis.
- With a DataFrame, you cannot sort.
With a DataFrame, you can sort by index on either axis
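Example (a minimal sketch, assuming pandas is installed; the labels are arbitrary). It sorts the same DataFrame by its row labels and then by its column labels.

```python
import pandas as pd

# Hypothetical frame with unsorted row and column labels.
df = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=["b", "a"],
    columns=["z", "x", "y"],
)

# Sort by row labels (axis=0 is the default).
by_rows = df.sort_index()

# Sort by column labels.
by_cols = df.sort_index(axis=1)

print(by_rows)
print(by_cols)
```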
If you execute the following code (with matplotlib.pyplot already imported as plt), what will be the result?
In [12]: import numpy as np
In [13]: data = np.arange(10)
In [14]: data
Out[14]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
In [15]: plt.plot(data)
a simple 45-degree line plot starting from (0,0)
Select all examples of optimized groupby methods (data aggregation): multiply, count, sum, divide, groupby(), subtract, min, aggregate, max, pd.DataFrame, mean
count, sum, min, max, mean
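Example (a minimal sketch, assuming pandas is installed; the key and value columns are hypothetical). It applies the five aggregations named in the answer to a grouped column in one call.

```python
import pandas as pd

# Hypothetical data with two groups.
df = pd.DataFrame({
    "key": ["a", "a", "b"],
    "value": [1.0, 3.0, 5.0],
})

# Compute several built-in aggregations per group at once.
summary = df.groupby("key")["value"].agg(["count", "sum", "min", "max", "mean"])
print(summary)
```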
Choose the two work horse data structures used in pandas
DataFrame and Series
What are the four key types of analytics reviewed in Ch. 9?
descriptive, diagnostic, predictive, prescriptive
Select the most effective digital marketing tactics for customer retention
email marketing
All of the descriptive statistics on pandas objects ___________ missing data by default
exclude
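Example (a minimal sketch, assuming pandas and NumPy are installed; the values are made up). It compares the default behavior with skipna=False, which keeps the missing value in the calculation.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# By default the missing value is skipped: mean of 1.0 and 3.0.
mean_default = s.mean()

# With skipna=False the NaN propagates into the result.
mean_strict = s.mean(skipna=False)

print(mean_default)
print(mean_strict)
```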
What is the pandas method to use to fill in holes (missing) in data?
fillna
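Example (a minimal sketch, assuming pandas and NumPy are installed; the values and the choice of fill values are for illustration only). It fills the hole once with a constant and once with the mean of the observed values.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, 3.0])

# Replace missing values with a constant.
filled_zero = s.fillna(0)

# Replace missing values with the mean of the non-missing values.
filled_mean = s.fillna(s.mean())

print(filled_zero.tolist())
print(filled_mean.tolist())
```

fillna returns a new object, so the original Series still contains the NaN.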
What does the pip or conda install command do?
installs packages into your python environment
What does the read_csv function do?
Loads delimited data (comma-separated by default) from a file, URL, or file-like object into a DataFrame. It does not read PDF files.
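Example (a minimal sketch, assuming pandas is installed; the CSV text is invented and passed through io.StringIO so that no file on disk is needed).

```python
import io

import pandas as pd

# Hypothetical CSV content; in practice this would usually be a file path.
csv_text = "customer,spend\nAnn,12.5\nBob,30.0\n"

# read_csv accepts a path, a URL, or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))
print(df)
```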
___________ is a desktop plotting package designed for creating (mostly two-dimensional) publication-quality plots
matplotlib
When data contains significant missing data, there are a few ways to alleviate the impacts or assess the model effectiveness under such imperfect conditions. Choose all the correct options.
- revert to gut-based decision making as analytics are not possible
- eliminate the columns of independent variables with questionable quality
- find some other alternative to analytics-based decision making
- use data imputation
- eliminate the rows (of customers) with questionable reliability
- additional data cleansing and augmentation
use data imputation; eliminate the rows with questionable reliability; additional data cleansing and augmentation; eliminate the columns of independent variables with questionable quality
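Example (a minimal sketch of three of these options, assuming pandas and NumPy are installed). The data, the column names, the 50% missing-data threshold, and the use of the median for imputation are all assumptions chosen for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical customer data with missing values.
df = pd.DataFrame({
    "spend": [10.0, np.nan, 30.0, 40.0],
    "age": [25.0, 31.0, np.nan, 40.0],
    "mostly_missing": [np.nan, np.nan, np.nan, 1.0],
})

# Eliminate columns where more than half of the values are missing.
keep = df.columns[df.isna().mean() <= 0.5]
fewer_cols = df[keep]

# Eliminate rows that still contain any missing value.
complete_rows = fewer_cols.dropna()

# Alternatively, impute missing values with each column's median.
imputed = fewer_cols.fillna(fewer_cols.median())

print(list(fewer_cols.columns))
print(len(complete_rows))
print(imputed)
```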
Select the channels that are used the most for customer acquisition
website and email