ABA Final Exam Prep
After using K-NN, you find out that the predicted class is the majority class for all the observations. What value of K would lead to this scenario?
1 y n** 100 Explanation: When K = n, every point in the sample is in the neighborhood, so the predicted class is always the majority class.
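A hedged sketch of the K = n case with scikit-learn (toy data, not from the exam): when the neighborhood is the whole sample, every prediction is the majority class.

    import numpy as np
    from sklearn.neighbors import KNeighborsClassifier

    X = np.array([[0.0], [1.0], [2.0], [3.0], [4.0]])
    y = np.array([0, 0, 0, 1, 1])            # majority class is 0

    knn = KNeighborsClassifier(n_neighbors=len(X)).fit(X, y)   # K = n
    print(knn.predict([[3.9], [0.1]]))       # [0 0]: always the majority class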
Python 3 has been in use since
2017. 2008.** 2000. 2023.
What is the k value for AICc?
2n/(n−df−1)** log(n) df 2
What is the k value for BIC (i.e., BIC(βˆ)=dev(βˆ)+k×df ) ?
2n/(n−df−1) 2 df log(n)**
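A minimal worked sketch of how these k values plug into IC(β̂) = dev(β̂) + k × df (the deviance, df, and n below are made-up numbers for illustration):

    import math

    dev, df, n = 120.0, 5, 200               # hypothetical fit

    aic  = dev + 2 * df                      # AIC:  k = 2
    aicc = dev + (2 * n / (n - df - 1)) * df # AICc: k = 2n/(n − df − 1)
    bic  = dev + math.log(n) * df            # BIC:  k = log(n)

    print(aic, aicc, bic)  # log(200) ≈ 5.3 > 2, so BIC penalizes df hardest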
What is the log penalty based on?
A diminishing bias that encourages many zeros while allowing large signals to show** A squared term that gives little penalty to small β but increasingly larger penalties as β increases The absolute value penalty A combination of the absolute value penalty and the Lasso
Which of the following best describes a document term matrix?
A high-dimensional sparse model matrix** A high-dimensional dense model matrix A low-dimensional sparse model matrix A low-dimensional dense model matrix
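As a sketch, scikit-learn's CountVectorizer builds exactly such a matrix (the two documents are invented; stop-word removal is covered in the questions below):

    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["the cat sat on the mat", "the dog sat on the log"]

    vec = CountVectorizer(stop_words="english")  # tokenize, drop stop words
    dtm = vec.fit_transform(docs)                # stored as a sparse matrix

    print(vec.get_feature_names_out())  # columns: tokens
    print(dtm.toarray())                # rows: documents; entries: counts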
Consider the following statement: A sampling distribution is a sample of data displayed on a histogram. What is wrong with this statement?
A histogram contains actual data, while a sampling distribution contains theoretical data. A histogram of sample data shows one iteration of the hypothetical samples. A sampling distribution shows a sample statistic from many hypothetical samples.** Each bar on a histogram becomes one datapoint on a sampling distribution. A histogram contains theoretical data, while a sampling distribution contains actual data.
How does the choice of K affect K-NN?
A larger value of K results in a larger neighborhood of data on which the predicted class is based.** A larger value of K uses a numeric matrix for training data, while a smaller value uses a binary coordinate system. A smaller value of K results in "smoother," more accurate predictions. A smaller value of K assumes a larger Euclidean distance between x and its nearest neighbors.
What is bag of words?
A list of common words removed during tokenization Mapping raw text to either counts or sequences of words or phrases A representation where tokens are words and the vectorization maps to word counts** A matrix comprising the vectorization of various documents
What is the implication of increasing the λ parameter in the Lasso?
A noisier model A less sparse model A sparser model** A greedier model
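A hedged scikit-learn sketch (synthetic data; sklearn's alpha plays the role of λ): raising the penalty zeroes out more coefficients, i.e., a sparser model. The nonzero count is also the Lasso's degrees of freedom, per the question further down.

    import numpy as np
    from sklearn.linear_model import Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 20))
    y = 3 * X[:, 0] + rng.normal(size=100)    # only one true signal

    for alpha in (0.01, 0.1, 1.0):            # alpha ~ lambda
        fit = Lasso(alpha=alpha).fit(X, y)
        print(alpha, np.sum(fit.coef_ != 0))  # nonzero count shrinks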
What is tokenization?
A numerical measure assigned to each token and representing the document as a vector A preprocessing technique that strips words down to the stem A representation where tokens are words and the vectorization maps to word counts Mapping raw text to either counts or sequences of words or phrases**
What is vectorization?
A preprocessing technique that strips words down to the stem A representation where tokens are words and the vectorization maps to word counts Mapping raw text to either counts or sequences of words or phrases A numerical measure assigned to each token and representing the document as a vector**
What is the Lasso based on?
A squared term that gives little penalty to small β but increasingly larger penalties as β increases A combination of the absolute value penalty and the log penalty The absolute value penalty** A diminishing bias that encourages many zeros while allowing large signals to show
Why is scaling the data essential prior to the distance calculation in K-NN?
A value for K cannot be selected until the data is scaled. It supports model selection based on the complete dataset. It weights distant data points more heavily so they are not discriminated against. It allows for relative comparison of distance along differing scales.**
Suppose a company is seeking to identify consumers who will buy their product. Within this context, what is the definition of true positive?
Any customer who buys the product A customer predicted to buy the product who actually buys it** The probability someone who bought the product was predicted to buy it A customer predicted to not buy the product who doesn't buy it
Suppose a company is seeking to identify consumers who will buy their product. Within this context, what is the definition of false positive?
Any customer who does not buy the product A customer predicted to not buy the product who doesn't buy it The probability someone who bought the product was predicted to buy it A customer predicted to buy the product who doesn't buy it**
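A small sketch of these definitions (labels and predictions invented; 1 = buys the product): scikit-learn's confusion matrix tabulates all four counts at once.

    from sklearn.metrics import confusion_matrix

    actual    = [1, 1, 0, 0, 1, 0, 0, 1]
    predicted = [1, 0, 0, 1, 1, 0, 0, 1]

    tn, fp, fn, tp = confusion_matrix(actual, predicted).ravel()
    print(tp, fp)  # tp: predicted buy and bought; fp: predicted buy, didn't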
Visual Basic for ________ (VBA) is included in Excel and offers the capability to create custom macros and functions that are designed to accomplish organizational specific tasks.
Applications** Accountants Analytics Amortization
How do sensitivity and specificity relate to the cutoff in a classification problem?
As the cutoff increases, more predictions fall into the 0 class, which leads to a higher specificity and lower sensitivity.** Specificity and sensitivity both decrease as the cutoff increases. Only specificity is affected as the cutoff increases because it is concerned with cases where y = 0. More predictions fall into the 1 class as the cutoff increases, which leads to higher specificity and higher sensitivity.
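A minimal hand-rolled sketch of the trade-off (probabilities and labels invented): as the cutoff rises, sensitivity falls and specificity rises.

    import numpy as np

    p = np.array([0.1, 0.3, 0.45, 0.6, 0.8, 0.9])   # predicted probabilities
    y = np.array([0,   0,   1,    0,   1,   1])     # true classes

    for cutoff in (0.25, 0.5, 0.75):
        yhat = (p >= cutoff).astype(int)
        sens = np.mean(yhat[y == 1] == 1)           # true positive rate
        spec = np.mean(yhat[y == 0] == 0)           # true negative rate
        print(cutoff, sens, spec)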
If you prefer a simpler model, which information criteria are best?
BIC** AIC AICc All are equally simple
For a given set of predictor variable values, why is the 95% prediction interval wider than the 95% confidence interval (CI) for the mean?
Because a prediction interval is always more accurate than a confidence interval Because the standard error of predictions is always less certain than that of the confidence interval Because the CI includes the error in observations via the residuals Because the uncertainty (standard error) of the prediction includes both the standard error for the fit (from the 95% CI) and the error in the observations via the residuals**
Why are Lasso techniques particularly useful for text regression?
Because they work well with high-dimensional sparse model matrices** Because they increase the dimension of the text data before trying to use it as input Because they work well with latent, low-dimensional model matrices Because they enable you to increase the term frequency of the document
How are degrees of freedom measured in the Lasso?
By the mean of the in-sample correlation between y and yˆ By the number of nonzero estimated parameters in the model** By multiplying the number of parameters in the model by 2 By the total number of coefficients in the model
Which of the following are dangers of focusing exclusively on statistically significant p-values?
Check All That Apply: In large sample sizes, tiny effects may be statistically significant but practically insignificant.** In small sample sizes, tiny effects may become statistically and practically insignificant. Ignoring p-values below a certain threshold (p-hacking) can lead to bias.** p-values cannot help you determine whether to reject the null hypothesis. A false positive rate can be much higher than stated p-values.**
How is the line of best fit determined in logistic regression?
Choose parameters that minimize the deviance of the logistic model.** Choose parameters that maximize the deviance of the logistic model. Choose parameters that maximize the residuals of the logistic model. Choose parameters that minimize the residuals and the linear distance.
How is the line of best fit determined in linear regression?
Choose parameters that minimize the deviance of the logistic model. Choose parameters that minimize the sum of squared errors of the predictions to the data, and minimize the linear deviance.** Choose parameters that minimize the logistic regression and maximize sum of the residuals. Choose parameters that maximize the residuals and the linear deviance.
How is the line of best fit determined in logistic regression?
Choose parameters that minimize the residuals and the linear distance. Choose parameters that maximize the deviance of the logistic model. Choose parameters that minimize the deviance of the logistic model.** Choose parameters that maximize the residuals of the logistic model.
What is the name of the algorithm that uses the nearest labeled points to classify an unlabeled point x?
Confusion matrix K nearest neighbor (K-NN)** Poisson regression Sample identifier
The central limit theorem provides a theoretical framework for the sampling distribution of the sample mean. How is the sampling distribution discovered for other sample statistics that may not have a defined theoretical sampling distribution?
Conjugate models The Gaussian distribution The bootstrap** The glm function
Suppose a regression is run with 100 independent variables. If the significance level is set at 0.05 for each test, what would the false discovery rate (FDR) be if none of the variables were truly significant?
FDR = 100(1 − (1 − 0.05)) = 5 FDR = 1 ÷ (100 − 0.05) = 0.010 FDR = (100 × 0.05) ÷ 100 = 0.05 FDR = 1 − (1 − 0.05)^100 = 0.994**
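A quick numeric check of the marked answer (plain arithmetic, nothing assumed beyond the question):

    # Chance of at least one false discovery across 100 independent tests
    # at alpha = 0.05 when no variable is truly significant.
    p = 1 - (1 - 0.05) ** 100
    print(round(p, 4))   # 0.9941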
What preprocessing steps are used to transform raw unstructured text to document term matrix?
Form a matrix with rows as documents and columns as sentiment, prune the text by removing words with unusual sentiments, and tokenize the remaining words. Prune the text by removing very common and very rare words, split the text into tokens, form a matrix with rows as documents and columns as tokens. Split the text into tokens, form a matrix with rows as tokens and columns as documents, prune them by removing very common and very rare words, and count the tokens in each part of the matrix. Split the text into tokens, prune them by removing very common and very rare words, count the tokens within each document, and form a matrix with rows as documents and columns as tokens.**
How does frequentist uncertainty quantification differ from Bayesian?
Frequentist approaches are based on considering how estimates would change if a new sample of data were generated by the same processes and scenarios; the Bayesian framework is based on a probabilistic construct based on the variance estimate from the data obtained.** Frequentist approaches rely on a probabilistic construct for the distribution based on the data obtained; the Bayesian approach asks how estimates would change if a new sample of data were generated using the same processes and scenarios. Frequentist approaches are based on the assumption that the data generating process will change with each sample; the Bayesian approach assumes the data generating process is consistent. Frequentist approaches update the variance estimate based on the data obtained from samples using different data generating processes; the Bayesian framework introduces a probabilistic construct for the distribution of the parameter and does not change the variance estimate.
Suppose you are using K-NN to determine the category of the point marked with a question mark based on the nearest neighbors that appear in the circle. What is the value of K?
K equals the number of labeled points inside the circle, since however many points fall in the neighborhood are the ones used for classification.
A(n) ________ provides programmers the ability to work with Python and often consists of at least a source code editor, build automation tools, and a debugger.
IDE** text editor WordPad programming script
How is the response variable y different in multiclass logistic regression than in binary logistic regression?
In binary regression, the log-odds interpretation applies on the difference between coefficients for any pair of classes, compared across M categories. In multivariate logistic regression, classification is modeled with a logit link. In multiclass logistic regression, there is a simple interpretation for the βj values as linear effects on the log-odds of success, while binary regression compares across M categories. In binary logistic regression, there are m vectors of regression coefficients (i.e., one for each class), while in multiclass logistic regression there are y vectors of regression coefficients. In multiclass logistic regression, the response variable is a vector of binary variables of length M, where M is the number of classes. The ith entry in the vector is a 1 if the item belongs to class i, and a 0 otherwise.**
What type of data validation provides the best estimate of "true fit" without incurring overfitting?
In-sample data Out-of-sample data** Gaussian distribution Linear regression panel
API stands for Application Programming ________ and allows for different software programs the ability to connect and interface with each other.
Interface** Interaction Interactivity Induction
How does using fewer latent variables aid in better prediction?
It allows the model to use only the most common tokens, which reduces noise. It focuses the model on the bigger picture by reducing noise and overfit.** It removes unidentified confounders to reduce noise and underfit. It simplifies the matrix by including zeros, which makes the model more flexible.
What is often an advantage of using topics rather than tokens as predictors?
It allows you to build a nonparametric model like a random forest. It allows you to use more predictors which leads to better predictions. It allows you to more easily identify latent predictors. It allows you to achieve a tighter fit and lower deviance.**
Which of the following is true about the bootstrap?
It eliminates the need to make modeling assumptions. It cannot be used in combination with Monte Carlo. It is most practical with high-dimensional statistics. It is most practical with low-dimensional statistics.**
How does topic modeling use supervised techniques for text analysis?
It generates latent variables underlying the text data using a large corpus of labeled and unlabeled data. It makes predictions in a supervised fashion using labeled data discovered by unsupervised techniques.** It removes latent variables underlying the text data that could create an incorrect interpretation of the corpus. It applies a squared error loss to estimate factors for observations that are mostly zeros.
Why is out-of-sample (OOS) validation a preferred estimate of "true fit"?
It helps avoid overfit.** It doesn't allow for Gaussian errors. It helps avoid underfit. It misses the curvature in the underlying true mean function.
Which of the following is a drawback of the Bayesian approach?
It is based on a thought experiment rather than on real data. It requires the assumption that the data generating process does not change. It is more dependent on the initial distribution assumptions than is the frequentist approach.** It cannot update the variance estimate based on new data that is obtained.
Which of the following is a drawback of the frequentist approach?
It is more dependent on the initial distribution assumptions, which makes it less accurate at predicting how estimates would change. It assumes that the data generating process changes with each sample, so variance estimates are inconsistent. It requires the variance estimate to be updated based on new data obtained. It requires the assumption that the data generating process is consistent, but this is not always the case.**
Which of the following is a drawback of the frequentist approach?
It is more dependent on the initial distribution assumptions, which makes it less accurate at predicting how estimates would change. It requires the variance estimate to be updated based on new data obtained. It assumes that the data generating process changes with each sample, so variance estimates are inconsistent. It requires the assumption that the data generating process is consistent, but this is not always the case.**
Which of the following best describes the approach of latent Dirichlet allocation (LDA)?
It models the proportion of tokens in a document as a factor model.** It stores all data, including zeros, in a large sparse text matrix to improve efficiency. It uses Lasso regularization methods to build regression models using text inputs. It reduces the text data into a small set of factors that can be used as inputs to a random forest regression.
What is the major downside of using PCA on document term matrices?
It removes the zeros from the document term matrix, which makes the data more efficient, but difficult to interpret. It stores the dense matrix, including all the zeros, which can exhaust the working memory and produce factors that are difficult to interpret.** It requires running the algorithm twice because it uses latent semantic analysis to solve a nonconvex optimization problem. It cannot estimate latent low-dimensional factors that summarize high-dimensional observations.
What is most likely to happen when a topic is very common in all documents?
It will have a higher probability score, thus failing to differentiate** It will have a higher probability score and a higher differentiation It will result in more zeros in the document term matrix, reducing efficiency It will be more difficult to apply sentiment analysis to documents containing that word
What are the implications of setting a binary class cutoff too low?
It would yield a model that isn't very sensitive (low true positives) but is quite specific (high true negatives). It would yield a model with low negative predictive value and low precision. It would yield a model with a high number of true negatives, despite a specificity of 0. It would yield a model with high sensitivity (high true positives) but low specificity (low true negatives).**
What are the implications of setting a binary class cutoff too high?
It would yield a model that isn't very sensitive (low true positives) but is quite specific (high true negatives).** It would yield a model with low negative predictive value and low precision. It would yield a model with high sensitivity (high true positives) but low specificity (low true negatives). It would yield a model with a high number of true negatives, despite a specificity of 0.
When using K-NN, what are discriminators?
K values that provide good information about nearby datapoints. Datapoints that are x's nearest neighbors. Classification rules based on symmetric utilities for each asset class. Inputs that provide good information about a datapoint's classification.**
What is one limitation on the scalability of K-NN algorithms?
K-NN does not allow inputs to be measured in standard deviations. K-NN models can only be used through subsampling. K-NN algorithms can be slow and inefficient.** K-NN does not work when using more than two categories for classification.
Poisson regression is commonly used to model count data as a function of input variables. The rate of occurrence (λ) is modeled via the log link function (log(λ) = x′β). The probability density of the Poisson distribution is given below. f(yᵢ | xᵢ) = λ^yᵢ e^(−λ) / yᵢ! What is the log likelihood function for β given n independent observations?
L(y | λ) = (yᵢx′ᵢβ − exp[x′ᵢβ] − ln(yᵢ!)) L(y | λ) = −2[∑ᵢ₌₁ⁿ (yᵢx′ᵢβ − exp[x′ᵢβ] − ln(yᵢ!))] Deviance(y | λ) = −2[∑ᵢ₌₁ⁿ (yᵢx′ᵢβ − exp[x′ᵢβ] − ln(yᵢ!))] L(y | λ) = ∑ᵢ₌₁ⁿ (yᵢx′ᵢβ − exp[x′ᵢβ] − ln(yᵢ!))**
Poisson regression is commonly used to model count data as a function of input variables. The rate of occurrence (λ) is modeled via the log link function (log(λ) = x′β). The probability density of the Poisson distribution is given below. f(yᵢ | xᵢ) = λ^yᵢ e^(−λ) / yᵢ! What is the deviance for Poisson regression?
L(y | λ) = (yᵢx′ᵢβ − exp[x′ᵢβ] − ln(yᵢ!)) Deviance(y | λ) = 2[∑ᵢ₌₁ⁿ (yᵢx′ᵢβ − exp[x′ᵢβ] − ln(yᵢ!))] Deviance(y | λ) = −2[∑ᵢ₌₁ⁿ (yᵢx′ᵢβ − exp[x′ᵢβ] − ln(yᵢ!))]** Deviance(y | λ) = ∑ᵢ₌₁ⁿ (yᵢx′ᵢβ − exp[x′ᵢβ] − ln(yᵢ!))
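A hedged numeric sketch of both formulas (hand-rolled on made-up data; the deviance here is −2 times the log likelihood, matching the marked answer rather than a saturated-model comparison):

    import numpy as np
    from scipy.special import gammaln          # ln(y!) = gammaln(y + 1)

    def poisson_loglik(beta, X, y):
        eta = X @ beta                         # x'beta, the log link
        return np.sum(y * eta - np.exp(eta) - gammaln(y + 1))

    def poisson_deviance(beta, X, y):
        return -2 * poisson_loglik(beta, X, y)

    X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]])   # made-up inputs
    y = np.array([1, 3, 6])                              # made-up counts
    print(poisson_deviance(np.array([0.1, 0.7]), X, y))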
How do LDA and PCA fundamentally differ?
LDA minimizes a multinomial deviance, whereas PCA minimizes the sum of squared errors.** LDA and PCA both minimize a multinomial deviance, but only PCA results in weights that can be easily interpreted. PCA minimizes a multinomial deviance, whereas LDA minimizes the sum of squared errors. The weights in PCA must sum to 1, whereas the weights in LDA must sum to 0.
Suppose there are many predictors in the data and you believe your predictors all have small but non-zero coefficients and none of them dominate the model. Which penalty function would be most appropriate?
Lasso Ridge penalty** log penalty Elastic net
What is stemming?
Mapping raw text to either counts or sequences of words or phrases A representation where tokens are words and the vectorization maps to word counts A preprocessing technique that strips words down to the stem** Assigning a numerical measure to each token and representing the document as a vector
Why is overfit a problem?
Overfit models have the highest in-sample deviance, which makes it harder to apply the models to new out-of-sample observations. Overfit models fit closely to in-sample noise that will not appear in future observations.** Overfit models have the lowest deviance when evaluated against new data. Overfit models fit the out-of-sample data better than in-sample data.
Derived from the term "panel data" that is used in econometrics, ________ includes a variety of modules designed to work with data.
Pandas** NumPy scikit-learn TensorFlow
What is collaborative filtering?
Predicting a person's future choices based on their and others' past choices.** Identifying latent factors that influence probability and choice. Using topics instead of individual tokens to lower deviance in the model. Breaking data into relevant and irrelevant topics and removing irrelevant ones.
What tool is used to illustrate classification success rates?
Predictive value Confusion matrix** Probability threshold Cutoff
What are stop words?
Raw text that is mapped to sequences of words or phrases Common words removed during tokenization** Numerical measures to map the document as a vector Words that are stripped down to the stem
What measure of distance does K-NN use?
Standard deviation Poisson distribution Euclidean distance** GLM distance
When constructing a 90% confidence interval using the nonparametric bootstrap, what are the appropriate percentiles to choose?
The 2.5th and 97.5th percentiles The 10th and 90th percentiles The 5th and 95th percentiles** The 15th and 85th percentiles
To classify a new point, K-NN can be slow. Why does it take so long?
The K-NN algorithm assigns each data point its own category. The K-NN algorithm computes the distances from the test point(s) to all data points.** The K-NN algorithm fits a model to the data, which takes more time than simply measuring the distances between points. The K-NN algorithm creates a matrix for each data point, then applies that to the test point(s).
What is topic-token lift?
The application of random initialization to determine a token's sentiment The weight that is applied to each token based on how common it is in the overall corpus The location of each token on the ranked list of all tokens' probability The probability for each token within a topic, divided by the overall probability for that token**
What does the central limit theorem state?
The average of independent random variables increases if the bootstrap is utilized. The standard error increases as the sample size increases. The average of independent random variables becomes normally distributed if your sample size is large enough.** The variance of the sum of independent variables equals the standard error of the normal distribution.
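A quick simulation sketch of the theorem (the exponential is an arbitrary skewed choice): means of many large samples pile up in a bell shape.

    import numpy as np

    rng = np.random.default_rng(1)
    # 5,000 sample means, each from n = 100 skewed (exponential) draws.
    means = rng.exponential(scale=1.0, size=(5000, 100)).mean(axis=1)

    print(means.mean(), means.std())  # near 1 and 1/sqrt(100) = 0.1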
Which of the following is true about using gamlr before modeling?
The covariates are penalized via the same λ, so you must scale the penalties so they are relatively equal.** Covariates that are part of the same sample always have a mean penalty of one standard deviation. Applying standardization scaling means the model fit will change if units of measurement change (e.g., from meters to feet). The covariates cannot be penalized equally because their size doesn't matter in Lasso regression.
What is a sampling distribution?
The distribution of any statistic from actual and theoretical samples using various data generating processes The hypothetical distribution of a sample statistic computed from many theoretical random samples from the same data generating process** The actual distribution of the changes to the variance estimate from actual samples The hypothetical distribution of any statistic computed from many nonrandom samples selected using a changing data generating process
What assumptions do the frequentist and Bayesian approaches rely on?
The frequentist approach assumes that the data generating process is consistent; the Bayesian approach is dependent on the initial distribution assumptions.** The frequentist approach and Bayesian approaches both assume that the data generating process is consistent. The frequentist approach is dependent on the initial distribution assumptions; the Bayesian approach assumes that the data generating process is consistent. The frequentist and Bayesian approaches both assume that the initial distributions will remain unchanged.
What is Euclidean distance?
The length of a straight line between two points** A measure that increases exponentially as it gets further from a data point The distance from a specific data point to its nearest neighbor One standard deviation from the mean
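In code form (values invented):

    import numpy as np

    a, b = np.array([1.0, 2.0]), np.array([4.0, 6.0])
    # Straight line between two points: sqrt((4-1)^2 + (6-2)^2) = 5.0
    print(np.sqrt(np.sum((a - b) ** 2)))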
What is the residual deviance?
The mean linear deviance The parameters used in linear regression The sum of squared errors** The parameters used in logistic regression
What is a disadvantage of the min CV model compared to the CV-1se method of model selection?
The min CV finds the model that minimizes out-of-sample error, but is slightly overfit.** The min CV model generalizes more effectively but does not address out-of-sample error. The min CV model maximizes out-of-sample error, but is slightly underfit. The min CV model reduces error that the CV-1se method does not detect.
What does P(data | H0 is true) describe?
The p-value The probability that the null hypothesis is true, given the data The probability of observing the data (or more extreme data), given the null** The sampling distribution, assuming the null hypothesis is true
Suppose a company is seeking to identify consumers who will buy their product. Within this context, what is the definition of negative predictive value?
The probability that any prediction about buying behavior is incorrect Precision, or the probability that all estimates are correct The probability that someone predicted to buy the product and did not buy it The proportion of those predicted to not buy the product who do not buy it**
Suppose a company is seeking to identify consumers who will buy their product. Within this context, what is the definition of sensitivity?
The probability that any prediction about buying behavior is incorrect The predictive value of all estimates The proportion of consumers who bought the product who were predicted to do so** The proportion of those predicted to not buy the product who do not buy it
Define p-value.
The probability that the elasticity is statistically different from 0 The probability of obtaining an actual statistic that is more extreme than what was observed in the sample, assuming the null hypothesis is true The probability of obtaining a sample statistic as extreme, or more extreme than what was observed in the sample, assuming the null hypothesis is true** The probability that the null hypothesis is true
Suppose a company is seeking to identify consumers who will buy their product. Within this context, what is the definition of specificity?
The proportion of those who do not buy it who were predicted to not buy it** The negative predictive value The predictive value of all estimates The probability that any prediction about buying behavior is incorrect
What does topic-token lift represent?
The relative probability that a particular token will appear in a document compared to the probability that any token will appear. The value between 0 and 1 used to rank tokens by a weighted combination of probability and sentiment. The increase in a token's probability in a given topic relative to its overall probability in the corpus.** How each token must be weighted to reflect how common or rare it is in the overall corpus.
What is the fundamental difference between linear and logistic regression?
The response variable in linear regression is continuous, while in logistic regression it is binary.** The accuracy of linear regression is high, while that of logistic regression is low. Residuals in linear regression are quantitative, while in logistic regression they are categorical. The response variable in linear regression is binary, while in logistic regression it is quantitative.
What best describes the response variable in logistic regression?
The response variable is continuous. The response variable is binary.** The response variable is quantitative. The response variable is numeric.
What best describes the response variable in logistic regression?
The response variable is numeric. The response variable is quantitative. The response variable is continuous. The response variable is binary.**
What best describes the response variable in linear regression?
The response variable is qualitative. The response variable is continuous.** The response variable is categorical. The response variable is binary.
Why is the bootstrap not practically useful for approximating high-dimensional statistics?
There is no way to observe the variability that occurs across resamples. It cannot be used with parallel computing. It requires an enormous observed sample to get enough information to summarize the covariances between the variables.** It has a high level of precision that cannot be changed to fit what is needed for a specific application.
In linear regression, how should the chosen parameters affect the linear deviance?
They should maximize it. They should minimize it.** They should reflect and make it equal the logistic regression. They should make it equal the residuals.
Why should stop words be removed from most text analyses?
To improve computational and statistical efficiency** Because they cannot be tokenized To remove potential bias in the sample Because they can lead to a failure to differentiate
What is overfit?
Tuning the model to ignore random errors in your current sample even though they will persist in future samples Tuning the model to fit a simple linear regression for the current sample and the future sample Tuning the model to eliminate Gaussian errors in the current sample by applying a true quadratic model to future samples Tuning the model to predict both signal and noise in your current sample rather than predicting the signal that will persist in future samples**
How is the bootstrap utilized to generate a sampling distribution?
Use the same sample to calculate various statistics of interest and determine which best supports your hypothesis. Divide your current sample into many smaller subsamples. Create a histogram for each subsample. Resample with replacement from your current sample many times. Create a histogram for each subsample and use only the dataset that best fits your hypothesis. Resample with replacement from your current sample many times. Calculate the statistic of interest from each of the resamples and present them in a single distribution.**
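A minimal sketch of that recipe (synthetic sample; the 5th/95th percentile cut matches the 90% interval question above):

    import numpy as np

    rng = np.random.default_rng(2)
    sample = rng.normal(loc=10, scale=3, size=200)   # the one observed sample

    # Resample with replacement many times; keep the statistic of interest.
    boot_means = np.array([
        rng.choice(sample, size=sample.size, replace=True).mean()
        for _ in range(2000)
    ])

    print(np.percentile(boot_means, [5, 95]))  # nonparametric 90% CI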
Imagine you are using K-NN with numeric data matrix x with label vector y and unlabeled data xtest. What is a good first step to take?
Use the table function to calculate the confusion matrix. Give the heaviest weight to the points that are furthest from the unlabeled observation. Scale both xtrain and xtest by dividing by the variable standard deviations.** Set the largest K value possible.
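A sketch of that first step (arrays invented): divide both matrices by the training-data standard deviations so every column is on the same scale before computing distances.

    import numpy as np

    xtrain = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 600.0]])
    xtest  = np.array([[1.5, 300.0]])

    sd = xtrain.std(axis=0)                      # per-column sd
    xtrain_s, xtest_s = xtrain / sd, xtest / sd  # both now in sd units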
Which of the following describes the best use of logistic regression and threshold?
Use thresholding and logistic regression to estimate probabilities. Use threshold only for binary classification problems. Thresholding is often the best choice to estimate probabilities and to make classifications. Logistic regression is typically used for only multivariate classification. Use logistic regression to estimate probabilities for binary classification problems. Use thresholding on probabilities to make classifications.** Use logistic regression for ROC curves. Use thresholding with the Lasso when inputs interact with one another.
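A hedged scikit-learn sketch of that workflow (toy data): fit probabilities first, then threshold them to classify.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(3)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] + rng.normal(size=200) > 0).astype(int)  # toy binary labels

    model = LogisticRegression().fit(X, y)
    probs = model.predict_proba(X)[:, 1]   # step 1: estimated probabilities
    yhat = (probs >= 0.5).astype(int)      # step 2: threshold to classify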
In Microsoft Excel, ________ is used to create custom macros and functions that are designed to accomplish organizational specific tasks.
VBA** VCS VBB VDA
What types of words are removed during tokenization?
Very short and very long words Words that have an unclear or unexpected sentiment Very common words and very rare words** Words with unknown meanings
The 95% confidence interval for the coefficient representing the difference in elasticity of Minute Maid orange juice from Dominick's was (−0.056, 0.169). Based on this information alone, what can we conclude about elasticity at the α = 0.05 level?
We can conclude that the elasticity is not statistically different from 1. We can conclude that the elasticity is not statistically different from 0.** We can conclude that the elasticity is 0.05. We can conclude that the elasticity is 0.113.
When is a zero imputation method preferable to mean imputation?
When all data is close to zero When you are not missing values When data is dense When data is sparse**
Which of the following is an example of a recommender system that uses collaborative filtering with LDA?
YouTube's "Up Next" playlist** Waze's navigation function Delta's on-time flight probability Amazon's "Buy It Again" list
In Python, a list is
a data file. imported from a database. a column in a database. a group of related items.**
In order to process data, a computer program needs
a file. user input. variables with values and instructions.** Python.
A variable is
a number associated with a name. a name for a storage location in the computer's memory.** a user's name.
What is each item in a list called?
an array an element** a variable a tuple
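For instance, in Python:

    colors = ["red", "green", "blue"]   # a list of related items
    print(colors[1])                    # each item is an element: "green"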
One way to visualize a data ecosystem is to apply it to the data project life cycle. The Data Science Ready project outlines the 5 steps of the data project life cycle. The final stage in this life cycle is
analysis wrangling storage** data fabric
Search Engine Optimization (SEO) involves the design (and improvement) of websites to increase non-paid visibility on search engines including Google and Bing. There is a Python script called the SEO ________ that is a web crawler designed to collect data from websites.
analyzer** collector compiler investigator
Similar to how mining recovers small amounts of precious metals from large amounts of ore or earth, ________ is the process of extracting information from large data sets.
data mining** data extraction data analytics data crunching
What is the k value for AIC (i.e., AIC(βˆ)=dev(βˆ)+k×df )?
df 2** 2n/(n−df−1) log(n)
What is the k value for AICc?
df log(n) 2 2n/(n−df−1)**
In a program a ________ tells the user what input is needed.
file prompt** variable number
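In Python, for example, the prompt is the string passed to input():

    name = input("Enter your name: ")   # the prompt tells the user what to type
    print("Hello,", name)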
In Microsoft Excel, a predefined formula that is built into the software for ease of use and deployment is referred to as a
function** formula program script
The work done by a computer program is called
input. Python. the results. processing.**
Computers process instructions
internally.** randomly. on a monitor. with a calculator.
In Python, a collection of related programming modules that include precompiled bundles of code that can be used repetitively in different programs and applications are referred to as a Python ________. They often contain configuration data, templates, classes, values, and documentation.
library** repository database book
In the following program, which line is the processing line?
1. X = 2
2. Y = 3
3. Z = X + Y
4. print(Z)
line 4 line 3** line 2 line 1
What is the k value for BIC (i.e., BIC(βˆ)=dev(βˆ)+k×df ) ?
log(n)** df 2n/(n−df−1) 2
The IDLE Shell
makes it easy to enter and test Python statements.** can be used like a word processing program. is also called a text editor. is where you write whole Python programs.
The process of collecting, cleansing, transforming, and classifying data is referred to as data processing and
modeling** structuring forming interpreting
Python is ________ software based on community-based development that runs optimally on Windows and Linux operating systems.
open-source** freeware fee-based proprietary
Values that a function may need and values sent to a function by the code that calls the function are known as
operations. variables. expressions. arguments.**
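A one-function Python illustration:

    def area(width, height):   # parameters that receive the values
        return width * height

    print(area(3, 4))          # 3 and 4 are the arguments sent by the caller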
What is the term that describes the probability of obtaining a sample statistic as extreme compared to what was observed in the sample, assuming the null hypothesis is true?
p-value** Variance Confidence interval Elasticity
Known as numerical Python, ________ is well-suited for scientific computations and complex array operations and is widely used in data mining.
pandas NumPy** Keras PyTorch
Trying to answer questions such as, "what happened?" or, "what is occurring?" or "what was the ROI?", is the role of
prescriptive analytics. predictive analytics. descriptive analytics.** tracking analytics.
In Python, a prompt takes the form of a(n)
screenshot. value. syntax statement. input request.**
A computer is content to process programming instructions internally but output is needed by
the monitor. humans.** the prompts. mobile devices.
Which of the following equations describes a quadratic model with Gaussian errors?
y = β₀ + β₁x + β₂x² + ε** 𝔼[y] = βx + β₁ε y = mx + β₁x + ε 𝔼[y] = y + β₁x
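A simulation sketch of that model (all coefficient values invented):

    import numpy as np

    rng = np.random.default_rng(4)
    b0, b1, b2 = 1.0, 2.0, -0.5              # made-up coefficients
    x = rng.uniform(-2, 2, size=100)
    eps = rng.normal(scale=0.3, size=100)    # Gaussian errors
    y = b0 + b1 * x + b2 * x ** 2 + eps      # quadratic mean plus noise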