DATS FINAL EXAM QUESTIONS

Briefly explain the difference between a sampling error and a non-sampling error.

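Sampling error is the difference between a sample statistic (such as the sample mean) and the corresponding population parameter; it arises by chance because only part of the population is observed. Non-sampling error, on the other hand, comes from how the data is collected or processed, for example nonresponse, measurement mistakes, or a biased sampling frame.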

What is the formula for Accuracy? a. (TP + TN) / Total b. TP / (TP + FP) c. TP / (TP + FN) d. TN / (TN + FP)

a. (TP + TN) / Total
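As a quick check on the four options, here is a minimal Python sketch that evaluates each ratio on one confusion matrix. The function name `confusion_metrics` and the counts are hypothetical, chosen only for illustration; the labels precision, recall, and specificity are the standard names for options b, c, and d.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Return the four ratios from options a-d for one confusion matrix."""
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,   # option a
        "precision": tp / (tp + fp),     # option b
        "recall": tp / (tp + fn),        # option c
        "specificity": tn / (tn + fp),   # option d
    }


# Hypothetical counts, not taken from any real classifier.
metrics = confusion_metrics(tp=40, fp=10, fn=5, tn=45)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Only option a counts both kinds of correct prediction over everything, which is why it is the accuracy formula.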

Peter Parker got stranded and is now living alone on an island without any technology. To make his life more interesting, he has been writing daily journals. To find out whether there is a connection between the locations he travels to and the fruit he gathers each day, he conducts a test. The location is recorded as North, East, South, West, NorthEast, NorthWest, SouthEast, or SouthWest, and the fruit is recorded in categories such as 'color', 'taste', 'texture', 'edible', and 'poisonous'. What test should he run? a. Chi-square b. ANOVA test c. Pearson test d. Spearman test

a. Chi-square
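Chi-square fits because both variables are categorical. The following is a minimal sketch of the test statistic for a contingency table, written in plain Python under stated assumptions: the function name `chi_square_statistic` and the 2x2 table of counts are hypothetical, and the sketch stops at the statistic and degrees of freedom rather than computing a p-value.

```python
def chi_square_statistic(table):
    """Chi-square statistic and degrees of freedom for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows are two locations, columns are 'edible' / 'poisonous'.
table = [[30, 10], [20, 40]]
stat, dof = chi_square_statistic(table)
print(f"chi-square = {stat:.3f}, degrees of freedom = {dof}")
```

With one degree of freedom the 0.05 critical value is about 3.84, so a statistic well above that would suggest location and fruit category are not independent in this made-up table.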

For regular Linear Regression models, what do we use to evaluate the model? Choose all that apply: a. Coefficients' p-values b. RMSE (root-mean-squared error) c. P-statistics for overall model significance d. Coefficient of Determination R², for percentage of variance explained e. Chi-Squared tests f. all of the above

a. Coefficients' p-values, b. RMSE (root-mean-squared error), d. Coefficient of Determination R², for percentage of variance explained
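Two of these measures, RMSE and R², are easy to compute by hand. Below is a minimal sketch for a one-predictor least-squares fit; the helper names `fit_line` and `rmse_and_r2` and the data points are hypothetical. Coefficient p-values are left out because they need a t distribution, which the Python standard library does not provide.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def rmse_and_r2(xs, ys, slope, intercept):
    """RMSE (dividing by n) and coefficient of determination."""
    preds = [intercept + slope * x for x in xs]
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    rmse = math.sqrt(ss_res / len(ys))
    r2 = 1.0 - ss_res / ss_tot
    return rmse, r2


# Hypothetical data, roughly following y = 2x.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]
slope, intercept = fit_line(xs, ys)
rmse, r2 = rmse_and_r2(xs, ys, slope, intercept)
print(f"slope={slope:.3f} intercept={intercept:.3f} RMSE={rmse:.3f} R2={r2:.4f}")
```

RMSE is in the units of the response, while R² is a unitless share of variance explained, so the two answer different questions about fit.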

Which of the following statements best describes the purpose of the ANOVA test? A) ANOVA is used to compare means of two independent samples. B) ANOVA is used to determine if two independent samples are correlated. C) ANOVA is used to determine if there are significant differences among the means of three or more groups. D) ANOVA is used to compare means of two dependent samples.

C) ANOVA is used to determine if there are significant differences among the means of three or more groups.
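To make the comparison of group means concrete, here is a minimal sketch of the one-way ANOVA F statistic in plain Python. The function name `one_way_anova_f` and the three groups are hypothetical, and the sketch assumes the usual one-way setup (independent groups) without computing a p-value.

```python
def one_way_anova_f(groups):
    """F statistic with its between-group and within-group degrees of freedom."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of observations around their own group mean.
    ss_within = sum(
        (x - m) ** 2 for g, m in zip(groups, group_means) for x in g
    )

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical measurements for three groups.
groups = [[4, 5, 6], [7, 8, 9], [10, 11, 12]]
f_stat, df_between, df_within = one_way_anova_f(groups)
print(f"F = {f_stat:.2f} on ({df_between}, {df_within}) degrees of freedom")
```

A large F means the group means differ by much more than the scatter inside each group would explain.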

The normal distribution is mathematically defined as a ______ distribution with exponential tails exp(−x²/(2σ²)) symmetric on both sides, with total area = unity.

Gaussian distribution
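The two properties named in the card, symmetry and unit area, can be checked numerically. This is a minimal sketch: the function names are hypothetical, the normalizing constant 1/(σ√(2π)) is the standard one for this density, and the mean parameter `mu` is an addition (the card's formula is the zero-mean case).

```python
import math


def normal_pdf(x, mu=0.0, sigma=1.0):
    """Gaussian density: exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi))."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def area_under_pdf(sigma=1.0, half_width=10.0, steps=20000):
    """Trapezoidal estimate of the area over [-half_width*sigma, +half_width*sigma]."""
    a = -half_width * sigma
    b = half_width * sigma
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, 0.0, sigma) + normal_pdf(b, 0.0, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * h, 0.0, sigma)
    return total * h


print(normal_pdf(1.5), normal_pdf(-1.5))  # equal values: the curve is symmetric
print(area_under_pdf(sigma=2.0))          # close to 1: total area is unity
```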

When would you use a logistic regression model, and what is its primary purpose? a. To predict continuous outcomes, such as house prices or stock prices. b. To classify data into two or more discrete categories based on predictor variables. c. To estimate the mean response of a continuous variable at different levels of predictor variables. d. To identify clusters within data points based on their similarity.

b. To classify data into two or more discrete categories based on predictor variables.
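A minimal sketch of how a fitted binary logistic model turns a predictor into a class label: the linear score is passed through the sigmoid to get a probability, which is then compared with a threshold. The intercept, coefficient, and 0.5 threshold below are hypothetical values chosen for illustration, not fitted from data.

```python
import math


def sigmoid(z):
    """Map any real-valued score to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_class(x, intercept, coef, threshold=0.5):
    """Return (label, probability) for a one-predictor logistic model."""
    prob = sigmoid(intercept + coef * x)
    label = 1 if prob >= threshold else 0
    return label, prob


# Hypothetical coefficients; a real model would estimate them from data.
for x in (1.0, 3.0):
    label, prob = predict_class(x, intercept=-4.0, coef=2.0)
    print(f"x={x}: P(class 1)={prob:.3f} -> class {label}")
```

The output is a category, not a continuous value, which is what separates this from option a.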

Which measure of central tendency is most affected by outliers in a data set? A. Mean B. Median C. Mode D. Range

A. Mean

The __________ is the value that occurs most frequently in a data set

mode
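The mean, median, and mode cards can be checked with Python's standard `statistics` module. The data set below is hypothetical; the point of the sketch is to show what happens to each measure when a single outlier is appended.

```python
import statistics

# Hypothetical data set, then the same data with one extreme value added.
data = [2, 3, 3, 4, 5]
with_outlier = data + [100]

for label, values in (("original", data), ("with outlier", with_outlier)):
    print(
        label,
        "mean =", statistics.mean(values),
        "median =", statistics.median(values),
        "mode =", statistics.mode(values),
    )
```

The mean jumps from 3.4 to 19.5, while the median only moves from 3 to 3.5 and the mode stays at 3.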

If a data set has a bell-shaped distribution, which of the following statements is true regarding the mean and median? A. The mean is greater than the median. B. The mean is less than the median. C. The mean and median are approximately equal. D. There is no relationship between the mean and median.

C. The mean and median are approximately equal.

True or False: The standard error is the standard deviation of the sampling distribution of a statistic.

True
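A small simulation can illustrate this for the sample mean. The sketch below draws many samples from a normal population, takes the standard deviation of the resulting sample means, and compares it with the textbook formula σ/√n. The function name, sample size, number of trials, and seed are all hypothetical choices.

```python
import random
import statistics


def simulate_standard_error(n=25, trials=5000, seed=0):
    """Compare the spread of simulated sample means with sigma / sqrt(n)."""
    rng = random.Random(seed)
    population_sd = 1.0

    sample_means = []
    for _ in range(trials):
        sample = [rng.gauss(0.0, population_sd) for _ in range(n)]
        sample_means.append(statistics.fmean(sample))

    empirical = statistics.stdev(sample_means)   # SD of the sampling distribution
    theoretical = population_sd / n ** 0.5       # standard error formula
    return empirical, theoretical


empirical, theoretical = simulate_standard_error()
print(f"empirical SD of sample means: {empirical:.4f}")
print(f"sigma / sqrt(n):              {theoretical:.4f}")
```

The two numbers should agree closely, and the gap shrinks as the number of trials grows.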

List 2 common metrics used to analyze a confusion matrix

Any 2 of the following metrics: accuracy, precision, recall (sensitivity or true positive rate), specificity, F1 score
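Of the metrics listed, the F1 score is the only one built from two others: it is the harmonic mean of precision and recall. A minimal sketch, with a hypothetical function name and hypothetical counts:

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts: 40 true positives, 10 false positives, 5 false negatives.
f1 = f1_score(tp=40, fp=10, fn=5)
print(f"F1 = {f1:.3f}")
```

True negatives do not appear in the formula, so F1 is often preferred when the negative class is very large.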

What does a boxplot show and what library do you have to load to create a boxplot?

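A boxplot shows how the data is distributed (median, quartiles, and whiskers) and flags any outliers. In R, load library(ggplot2) to draw one with geom_boxplot(); base R's boxplot() function works without loading an extra package.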
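The numbers a boxplot draws can be computed directly. The sketch below uses Python rather than R, with a hypothetical function name and data set, and assumes the common 1.5 × IQR rule for outliers. Quartile conventions differ slightly between tools, so the values may not match a given plotting library exactly.

```python
import statistics


def boxplot_summary(values):
    """Quartiles, whisker ends, and outliers under the 1.5 * IQR rule."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr

    inside = [v for v in values if low_fence <= v <= high_fence]
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "whisker_low": min(inside),    # most extreme points still inside the fences
        "whisker_high": max(inside),
        "outliers": outliers,
    }


# Hypothetical data with one obvious outlier.
summary = boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```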

Binomial distributions come from a binary choice (Y/N, Head/Tail, 0/1, T/F).

True
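For repeated independent binary trials, the probability of exactly k successes in n trials follows the binomial formula C(n, k) p^k (1 − p)^(n − k). A minimal sketch using a fair coin as a hypothetical example:

```python
import math


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


# Hypothetical example: 10 tosses of a fair coin.
n, p = 10, 0.5
print(f"P(exactly 5 heads) = {binomial_pmf(5, n, p):.4f}")
print(f"sum over all k     = {sum(binomial_pmf(k, n, p) for k in range(n + 1)):.4f}")
```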

True or False: If the p-value is greater than 0.05, you fail to reject the null hypothesis at the 0.05 significance level.

True

When would you use a Pearson test? A Spearman test?

Use Pearson correlation when you have two continuous variables with a linear relationship and approximately normally distributed data. Use Spearman correlation when you have ordinal data, or when the relationship is monotonic but non-linear or the data is not normally distributed.
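The difference shows up clearly on data that is monotonic but not linear. The sketch below implements both coefficients in plain Python, computing Spearman as the Pearson correlation of the ranks (ties get their average rank). The function names and the cubic data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Hypothetical data: y grows with x, but along a curve rather than a line.
xs = [1, 2, 3, 4, 5]
ys = [x ** 3 for x in xs]
print(f"Pearson  = {pearson(xs, ys):.4f}")
print(f"Spearman = {spearman(xs, ys):.4f}")
```

Spearman reports a perfect monotonic relationship, while Pearson falls short of 1 because the points do not lie on a straight line.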
