Data analytics

An agronomist is studying three different varieties of tomato to determine whether a difference exists in the proportion of seeds that germinate. Random samples of 100 seeds of each of three varieties are subjected to the same starting conditions. How should the data be analyzed? (a) Chi-square test for differences in proportions (b) One-way ANOVA F test (c) t test for the differences in means (d) t test for the mean difference

A
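
As a rough illustration of the chi-square test for differences in proportions, the sketch below uses hypothetical germination counts (the card gives none) with 100 seeds per variety. The function name and data are assumptions for illustration only. With three groups there are 2 degrees of freedom, and for that case the upper-tail chi-square probability has the closed form exp(-x/2).

```python
import math

# Hypothetical counts (not from the source): seeds that germinated out of 100 per variety.
germinated = [80, 70, 90]
sample_size = 100

def chi_square_proportions(successes, n):
    """Chi-square statistic for equal proportions across groups of size n."""
    groups = len(successes)
    pooled = sum(successes) / (groups * n)   # pooled proportion under H0
    expected_yes = n * pooled                # expected germinated per group
    expected_no = n * (1 - pooled)           # expected not germinated per group
    stat = 0.0
    for s in successes:
        stat += (s - expected_yes) ** 2 / expected_yes
        stat += ((n - s) - expected_no) ** 2 / expected_no
    return stat, groups - 1                  # (statistic, degrees of freedom)

chi2_stat, df = chi_square_proportions(germinated, sample_size)

# Closed-form upper-tail probability, valid only when df == 2.
p_value = math.exp(-chi2_stat / 2)
print(chi2_stat, df, p_value)
```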

A cluster is ____________________ A. a group of similar objects that differ significantly from other objects B. operations on a database to transform or simplify data in order to prepare it for an ML algorithm C. a symbolic representation of facts or ideas from which information can potentially be extracted D. none of these

A

Data mining is best described as the process of ____________ A. identifying patterns in data. B. deducing relationships in data. C. representing data. D. simulating trends in data.

A

If the assumption of normally distributed populations in the t test has been violated, you should use an alternative procedure such as the nonparametric Wilcoxon rank sum test. A. True B. False

A

In testing a hypothesis using the chi-square test, the theoretical frequencies are based on the: (a) null hypothesis (b) alternative hypothesis (c) normal distribution (d) t distribution

A

Methods that summarize large collections of data and present constantly updated status information are called: A. Descriptive analytics B. Predictive analytics C. Prescriptive analytics D. All of the above

A

The F test statistic in a one-way ANOVA is: A. F = MStrt/MSe B. F = MSe/MStrt C. F = SStrt/SSe D. F = SSe/SStrt

A

The Y intercept (b0) represents the: (a) predicted value of Y when X = 0 (b) change in Y per unit change in X (c) predicted value of Y (d) variation around the regression line

A

What is meant by discrete data? A. Data that allows only a finite set of values B. Data that allows real numbers only C. Data that allows float values only D. Both a and b

A

Which of the following assumptions concerning the distribution of the variation around the line of regression (the residuals) is correct? (a) The distribution is normal. (b) All of the variations are positive. (c) The variation increases as X increases. (d) Each residual is dependent on the previous residual.

A

Which of the following is NOT an example of an ordinal attribute? A. Zip codes B. Ordered Numbers C. Movie ratings D. Military Ranks

A

Which of the following is not a data pre-processing method? A. Data Visualization B. Data Discretization C. Data Cleaning D. Data Reduction

A

Which of the following measures of variability is dependent on every value in a set of data? A. standard deviation B. range C. each of these D. neither of these

A

The probability you reject the null hypothesis when in fact the null hypothesis is true is called: A. Type 1 error B. Type 2 error C. The power D. The margin

A.

A car rental company wants to select a computer software package for its reservation system. Three software packages (A, B, and C) are commercially available. The car rental company will choose the package that has the lowest mean number of renters for whom a car is not available at the time of pickup. An experiment is set up in which each package is used to make reservations for five randomly selected weeks. How should the data be analyzed? (a) Chi-square test for differences in proportions (b) One-way ANOVA F test (c) t test for the differences in means (d) t test for the mean difference

B

A statement made about a population for testing purposes is called: A. statistics B. hypothesis C. level of significance D. Test-Statistic

B

A statistics professor wanted to test whether the grades on a statistics test were the same for her morning class and her afternoon class. For this situation, the professor should use the: (a) Z test for the difference between two proportions (b) Pooled-variance t test (c) Paired t test

B

Assuming a straight line (linear) relationship between X and Y, if the coefficient of correlation (r) equals -0.30: (a) there is no correlation (b) the slope is negative (c) variable X is larger than variable Y (d) the variance of X is negative

B

If the coefficient of correlation (r) = -1.00, then: (a) All the data points must fall exactly on a straight line with a slope that equals 1.00. (b) All the data points must fall exactly on a straight line with a negative slope. (c) All the data points must fall exactly on a straight line with a positive slope. (d) All the data points must fall exactly on a horizontal straight line with a zero slope.

B

In a one-way ANOVA, if the F test statistic is greater than the critical F value, you: (a) reject H0 because there is evidence all the means differ (b) reject H0 because there is evidence at least one of the means differs from the others (c) do not reject H0 because there is no evidence of a difference in the means (d) do not reject H0 because one mean is different from the others

B

In a simple linear regression model, the coefficient of correlation and the slope: A. may have the opposite signs B. must have the same sign C. must have opposite signs D. are equal

B

In performing a regression analysis involving two numerical variables, you assume: (a) the variances of X and Y are equal (b) the variation around the line of regression is the same for each X value (c) that X and Y are independent (d) All of the above

B

In testing a hypothesis about the difference between two proportions, the p-value is computed to be 0.043. The null hypothesis should be rejected if the chosen level of significance is 0.01. A. True B. False

B

In testing for differences between the means of two related populations, the null hypothesis is: (a) H0 : µD = 2 (b) H0 : µD = 0 (c) H0 : µD < 0 (d) H0 : µD > 0

B

Researchers claim that 60 tissues is the average number of tissues a person uses during the course of a cold. The company that makes Kleenex brand tissues thinks that fewer tissues are needed. What are the null and alternative hypotheses? A. H0: µ = 60, H1: µ > 60 B. H0: µ = 60, H1: µ < 60 C. H0: X = 60, H1: X < 60 D. H0: µ < 60, H1: µ = 60

B

Statistical inference occurs when you: A. compute descriptive statistics from a sample B. take the results of a sample and reach conclusions about a population C. take a complete census of a population D. present a graph of data

B

The bronze, silver, or gold medal awarded at the Olympics is what kind of attribute? A. Nominal B. Ordinal C. Binary D. Numeric

B

The residuals represent: (a) the difference between the actual Y values and the mean of Y (b) the difference between the actual Y values and the predicted Y values (c) the square root of the slope (d) the predicted value of Y when X = 0

B

The slope (b1) represents: (a) predicted value of Y when X = 0 (b) change in Y per unit change in X (c) predicted value of Y (d) variation around the regression line

B

The standard error of the estimate is a measure of: (a) total variation of the Y variable (b) the variation around the regression line (c) explained variation (d) the variation of the X variable

B

The strength of the linear relationship between two numerical variables is measured by the: (a) predicted value of Y (b) coefficient of determination (c) total sum of squares (d) Y intercept

B

When testing for independence in a contingency table with three rows and four columns, there are ________ degrees of freedom. (a) 5 (b) 6 (c) 7 (d) 12

B
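
The answer follows from the usual degrees-of-freedom rule for a chi-square test of independence, (rows - 1) x (columns - 1). A minimal sketch (the function name is just illustrative):

```python
def contingency_df(rows: int, cols: int) -> int:
    """Degrees of freedom for a chi-square test of independence."""
    return (rows - 1) * (cols - 1)

# Three rows and four columns: (3 - 1) * (4 - 1) = 6
df = contingency_df(3, 4)
print(df)
```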

Which of the following is true regarding the sampling distribution of the mean for a large sample size? A. it has a normal distribution with a different mean from the population B. it has a normal distribution with the same mean as the population C. it has the same shape and mean as the population

B

Which of the following graphical presentations is not appropriate for numerical data? A. histogram B. pie chart C. time-series plot D. scatter plot

B

Which of the following would best show that the total of all the categories sums to 100%? A. histogram B. pie chart C. time-series plot D. scatter plot

B

The probability of rejecting the null hypothesis when it is true is called the: A. Level of confidence B. Level of significance C. Level of Margin D. Level of Rejection

B.

A researcher is curious about the effect of sleep on students' test performances. He chooses 100 students and gives each student two exams. One is given after four hours' sleep and one after eight hours' sleep. The statistical test the researcher should use is the: (a) Z test for the difference between two proportions (b) Pooled-variance t test (c) Paired t test

C
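
A paired t test works on the per-student differences rather than on the two score lists separately. The sketch below uses hypothetical scores for six students (the card gives no data), and the function name is an assumption. It computes only the t statistic and its degrees of freedom, not a p-value.

```python
import math
from statistics import mean, stdev

# Hypothetical exam scores (not from the source) for the same six students.
four_hours = [70, 65, 80, 72, 68, 75]
eight_hours = [72, 68, 81, 76, 70, 78]

def paired_t_statistic(first, second):
    """t statistic for H0: mean difference = 0, using paired observations."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = mean(diffs)                 # mean of the differences
    s_d = stdev(diffs)                  # sample standard deviation of the differences
    t = d_bar / (s_d / math.sqrt(n))
    return t, n - 1                     # (t statistic, degrees of freedom)

t_stat, df = paired_t_statistic(four_hours, eight_hours)
print(t_stat, df)
```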

If the coefficient of determination (r²) = 1.00, then: (a) the Y intercept must equal 0 (b) the regression sum of squares (SSR) equals the error sum of squares (SSE) (c) the error sum of squares (SSE) equals 0 (d) the regression sum of squares (SSR) equals 0

C

In a one-way ANOVA, if the p-value is greater than the level of significance, you: (a) reject H0 because there is evidence all the means differ (b) reject H0 because there is evidence at least one of the means differs from the others. (c) do not reject H0 because there is insufficient evidence of a difference in the means (d) do not reject H0 because one mean is different from the others

C

In a one-way ANOVA, the null hypothesis is always: A. all the population means are different B. some of the population means are different C. all the population means are the same D. some of the population means are the same

C

The F test statistic in a one-way ANOVA is: (a) MSW/MSA (b) SSW/SSA (c) MSA/MSW (d) SSA/SSW

C
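
To make the ratio MSA/MSW concrete, here is a small sketch on hypothetical data (three groups of five observations, not taken from the source). It also checks that the among-group and within-group sums of squares add up to the total sum of squares. The function name is illustrative.

```python
from statistics import mean

# Hypothetical data (not from the source): three groups of five observations each.
groups = [
    [4, 5, 6, 5, 5],
    [7, 8, 6, 7, 7],
    [5, 6, 5, 4, 5],
]

def one_way_anova(samples):
    """Return (F, SSA, SSW, SST) for a one-way ANOVA."""
    all_values = [x for g in samples for x in g]
    grand_mean = mean(all_values)
    c = len(samples)        # number of groups
    n = len(all_values)     # total number of observations
    ssa = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in samples)  # among groups
    ssw = sum((x - mean(g)) ** 2 for g in samples for x in g)         # within groups
    sst = sum((x - grand_mean) ** 2 for x in all_values)              # total
    msa = ssa / (c - 1)
    msw = ssw / (n - c)
    return msa / msw, ssa, ssw, sst

f_stat, ssa, ssw, sst = one_way_anova(groups)
print(f_stat, ssa, ssw, sst)
```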

The sampling distribution of the mean can be approximated by the normal distribution: A. as the size of the sample standard deviation increases B. as the size of the population standard deviation increases C. as the sample size (number of observations in each sample) gets large enough D. as the number of samples gets large enough

C

The strength of the linear relationship between two numerical variables is measured by the: A. Predicted value of Y B. total sum of squares C. coefficient of determination D. Y intercept

C

The t test for the difference between the means of two independent populations assumes that the two: (a) Sample sizes are equal. (b) Sample medians are equal. (c) Populations are approximately normally distributed. (d) all of the above

C

In a five-number summary, which of the following is not included? A. third quartile B. median C. mean D. minimum

C
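
A five-number summary consists of the minimum, first quartile, median, third quartile, and maximum, so the mean is not part of it. The sketch below uses hypothetical data and the standard library's `statistics.quantiles` with the "inclusive" method. Quartile conventions differ between textbooks and tools, so the Q1 and Q3 values may not match another method exactly.

```python
from statistics import quantiles

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    data = sorted(values)
    q1, q2, q3 = quantiles(data, n=4, method="inclusive")
    return min(data), q1, q2, q3, max(data)

# Hypothetical data (not from the source).
sample = [2, 4, 4, 5, 7, 8, 9, 11, 12]
summary = five_number_summary(sample)
print(summary)
```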

Which of the following is NOT a characteristic of Big Data? A. Variety B. Velocity C. Viscous D. Volume

C

The t test for the difference between the means of two independent populations assumes that the two: A. Sample sizes are equal B. Sample medians are equal C. Populations are approximately normally distributed D. All of the above

C.

The ratio of the regression sum of squares (SSR) to the total sum of squares (SST) is called the _______________.

Coefficient of Determination
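
The ratio SSR/SST can be worked through on a small least-squares fit. The data below are hypothetical, and the helper name is an assumption. The sketch also shows the residuals (observed minus predicted Y) and that r takes the sign of the slope.

```python
import math
from statistics import mean

# Hypothetical data (not from the source).
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

def fit_line(xs, ys):
    """Least-squares intercept b0 and slope b1 for simple linear regression."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(xs, ys))
    sxx = sum((a - x_bar) ** 2 for a in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1

b0, b1 = fit_line(x, y)
predicted = [b0 + b1 * a for a in x]
residuals = [obs - pred for obs, pred in zip(y, predicted)]

y_bar = mean(y)
sst = sum((obs - y_bar) ** 2 for obs in y)          # total sum of squares
ssr = sum((pred - y_bar) ** 2 for pred in predicted)  # regression sum of squares
sse = sum(e ** 2 for e in residuals)                # error sum of squares

r_squared = ssr / sst                               # coefficient of determination
r = math.copysign(math.sqrt(r_squared), b1)         # r has the same sign as the slope
print(b0, b1, r_squared, r)
```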

Big Data Usually involves: A. High-volume data B. Data that is generated at a fast rate C. Data that is stored in a variety of ways D. All of the above

D

Dimensionality reduction reduces the data set size by removing_________________. A. Composite attributes B. Derived Attributes C. Relevant Attributes D. Irrelevant Attributes

D

Identify the example of Nominal attribute: A. Temperature B. Salary C. Mass D. Gender

D

In a one-way ANOVA, if the p-value is greater than the level of significance, you: A. reject the null hypothesis because there is evidence all the means differ B. reject the null hypothesis because there is evidence at least one of the means differs from the others C. do not reject the null hypothesis because one mean is different from the others D. do not reject the null hypothesis because there is insufficient evidence of a difference in the means

D

In a one-way ANOVA, the null hypothesis is always: (a) all the population means are different (b) some of the population means are different (c) some of the population means are the same (d) all of the population means are the same

D

In a right-skewed distribution: A. the median equals the mode B. the median equals the mean C. the mean is less than the median D. the mean is greater than the median

D
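
A quick check on a hypothetical right-skewed sample (a few large values in the upper tail) shows the mean being pulled above the median:

```python
from statistics import mean, median

# Hypothetical right-skewed sample (not from the source): a long tail of large values.
values = [1, 2, 2, 3, 3, 3, 4, 5, 9, 28]

sample_mean = mean(values)      # pulled upward by the large values
sample_median = median(values)  # resistant to the extreme values
print(sample_mean, sample_median)
```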

The coefficient of determination (r²) tells you: (a) that the coefficient of correlation (r) is larger than 1 (b) whether the slope has any significance (c) whether the regression sum of squares is greater than the total sum of squares (d) the proportion of total variation that is explained

D

The point where the Null Hypothesis gets rejected is called: A. Significant Value B. Rejection Value C. Acceptance Value D. Critical Value

D

The residuals represent: A. the difference between the actual Y values and the mean of Y B. the square root of the slope C. the predicted value of Y when X = 0 D. the difference between the actual Y values and the predicted Y values

D

Those methods that involve collecting, presenting and computing characteristics of a set of data in order to properly describe the various features of the data are called: A. Inferential statistics B. the scientific method C. sampling method D. descriptive statistics

D

Which of the following graphical presentations is not appropriate for categorical data? A. bar chart B. pie chart C. pareto plot D. scatter plot

D

Which of the following is NOT a data quality related issue? A. Missing values B. Outlier records C. Duplicate records D. Attribute value range

D

Which of the following statements about the mean is not true? A. It is more affected by extreme values than the median B. It is equal to the median in a symmetric distribution C. It is a measure of central tendency D. It is equal to the median in a skewed distribution

D

The shape of a distribution is given by the: A. variance B. mean C. first quartile D. skewness

D.

The conversion of quantitative values to nominal or ordinal values is called _____________________

Data Transformation

If the assumption of normally distributed populations in the pooled-variance t test has been violated, you should use an alternative procedure such as the nonparametric Wilcoxon rank sum test.

True

If the p-value for a t test for the slope is 0.021, the results are significant at the 0.01 level of significance.

False

In testing a hypothesis about the difference between two proportions, the Z test statistic is computed to be 2.04. The null hypothesis should be rejected if the chosen level of significance is 0.01 and a two-tail test is used.

False

In testing a hypothesis about the difference between two proportions, the p-value is computed to be 0.034. The null hypothesis should be rejected if the chosen level of significance is 0.01.

False

In testing a null hypothesis about the difference between two proportions, the Z test statistic is computed to be 2.04. The p-value is 0.0207.

False
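
The value 0.0207 is the one-tail area beyond Z = 2.04. Assuming the test is two-tailed (the card does not say so explicitly, but that would explain the "False" answer), the p-value is twice that area. A sketch using the standard library's `statistics.NormalDist`:

```python
from statistics import NormalDist

z = 2.04
std_normal = NormalDist()            # mean 0, standard deviation 1

upper_tail = 1 - std_normal.cdf(z)   # one-tail area beyond Z = 2.04
two_tail_p = 2 * upper_tail          # p-value under the two-tail assumption

# At a 0.01 level of significance, H0 would not be rejected with this p-value.
reject_at_01 = two_tail_p < 0.01
print(round(upper_tail, 4), round(two_tail_p, 4), reject_at_01)
```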

The one-way analysis-of-variance (ANOVA) tests hypotheses about the difference between population proportions.

False

The one-way analysis-of-variance (ANOVA) tests hypotheses about the difference between population variances.

False

The sample size in each independent sample must be the same in order to test for differences between the means of two independent populations.

False

The sample size in each independent sample must be the same in order to test for differences between the proportions of two independent populations.

False

The value of r is always positive.

False

You can use a pie chart to evaluate whether the assumption of normally distributed populations in the pooled-variance t test has been violated.

False

Incorrect or invalid data is known as __________

Noisy data

One of the assumptions of regression is that the residuals around the line of regression follow the ____________ distribution.

Normal

An _____________ may be defined as a data object that does not comply with the general behavior or model of the available data.

Outlier

In simple linear regression, if the slope is positive, then the coefficient of correlation must also be ________.

Positive

The residual represents the difference between the observed value of Y and the _________ value of Y.

Predicted

Methods that can suggest the best future decision making for specific case situations are called:

Prescriptive Analytics

The change in Y per unit change in X is called the __________.

Slope

A test for the difference between two proportions can be performed using the chi-square distribution.

True

If no apparent pattern exists in the residual plot, the regression model fit is appropriate for the data.

True

If the range of the X variable is between 100 and 300, you should not make a prediction for X = 400.

True

If you use the chi-square method of analysis to test for independence in a contingency table with more than two rows and more than two columns, you must assume that the theoretical (expected) frequency in each cell of the contingency table is at least one.

True

If you use the chi-square method of analysis to test for the difference between two proportions, you must assume that there are at least five observed frequencies in each cell of the contingency table.

True

In a one-factor ANOVA, the Among sum of squares and Within sum of squares must add up to the total sum of squares.

True

In testing a hypothesis about the difference between two proportions, the p-value is computed to be 0.043. The null hypothesis should be rejected if the chosen level of significance is 0.05.

True

Regression analysis is used for prediction, while correlation analysis is used to measure the strength of the association between two numerical variables.

True

Repeated measurements from the same individuals are an example of data collected from two related populations.

True

The Mean Squares in an ANOVA can never be negative.

True

The coefficient of determination represents the ratio of SSR to SST.

True

The one-way analysis-of-variance (ANOVA) tests hypotheses about the difference between population means.

True

The pooled-variance t test assumes that the population variances in the two independent groups are equal.

True

The regression sum of squares (SSR) can never be greater than the total sum of squares (SST).

True

When the coefficient of correlation r = -1, a perfect relationship exists between X and Y.

True

When you are sampling the same individuals and taking a measurement before treatment and after treatment, you should use the paired t test.

True

The distribution of data involving two or more attributes is called ___________.

Multivariate

Nominal and ordinal attributes can be collectively referred to as _________________ attributes.

qualitative

