MAT 202 Statistics Super Study Guide

At the beginning of the semester, an Intro to Statistics instructor asked the 225 students enrolled in the class to complete a survey. For each student, the instructor collects information about the following: Sport (favorite sport: Football, Baseball, Basketball, Hockey, Other); Exercise (how many minutes do you spend exercising per week); Personality (on a 0-25 scale, how would you describe your personality, 0=total introvert, 25=total extrovert); Death penalty (Strongly agree, Agree, Neutral, Disagree, Strongly Disagree). How many variables are in this example?

The histogram below displays the distribution of 50 ages at death due to trauma (accidents and homicides) that were observed in a certain hospital during a week. What percentage of deaths were individuals younger than 35?

68%

Use the Standard Deviation Rule to calculate the intervals within 1, 2 and 3 SDs of the mean if the mean= 70.5 and the SD= 3

68% of data: 70.5-3= 67.5 to 70.5+3= 73.5
95% of data: 70.5-(2)3= 64.5 to 70.5+(2)3= 76.5
All or nearly all (99.7%) of data: 70.5-(3)3= 61.5 to 70.5+(3)3= 79.5
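
The same arithmetic can be sketched in Python. This is only an illustration; the function name `sd_rule_intervals` is made up for this guide and is not from the course.

```python
def sd_rule_intervals(mean, sd):
    """Return the 68%, 95% and 99.7% intervals as (low, high) pairs."""
    return {
        label: (mean - k * sd, mean + k * sd)
        for k, label in ((1, "68%"), (2, "95%"), (3, "99.7%"))
    }

# Values from the card above: mean = 70.5, SD = 3
intervals = sd_rule_intervals(70.5, 3)
for label, (low, high) in intervals.items():
    print(label, low, high)
```

The rule only applies when the distribution is approximately normal in shape.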

This histogram shows the distribution of times, in minutes, required for 25 rats in an animal behavior experiment to navigate a maze successfully. What percentage of the rats navigated the maze in less than 5.5 minutes?

84%

stemplot

Also called a stem-and-leaf plot. Data are separated into a stem and leaf by place value and organized in the form of a histogram.

Least Squares Criterion

Among all the lines that look good on your data, choose the one that has the smallest sum of squared vertical deviations (smallest total area). That line is the least squares regression line

Calculating standard deviation

1. Calculate each score's deviation (distance from the mean)
2. Square each deviation
3. Add the squared deviations and divide by n-1 (this is the variance)
4. Take the square root of the variance (this is the standard deviation)
ie. data set (7, 9, 5, 13, 3, 11, 15, 9), mean= 9
deviations: (7-9), (9-9), (5-9), (13-9)... = -2, 0, -4, 4, -6, 2, 6, 0
square the deviations and add: 4+0+16+16+36+4+36+0= 112
divide by n-1 (8-1=7): 112/7= 16 (the variance)
square root the variance for the SD: 4
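
A minimal Python sketch of these steps. It uses the data set whose deviations are listed above (-2, 0, -4, 4, -6, 2, 6, 0 around a mean of 9), that is 7, 9, 5, 13, 3, 11, 15, 9. The function name `sample_sd` is an assumption made for illustration.

```python
import math

def sample_sd(data):
    """Return (mean, variance, standard deviation) using the n-1 divisor."""
    n = len(data)
    mean = sum(data) / n
    # Steps 1 and 2: deviation of each score from the mean, squared
    squared_deviations = [(x - mean) ** 2 for x in data]
    # Step 3: sum of squared deviations divided by n-1 is the variance
    variance = sum(squared_deviations) / (n - 1)
    # Step 4: the square root of the variance is the standard deviation
    return mean, variance, math.sqrt(variance)

scores = [7, 9, 5, 13, 3, 11, 15, 9]
mean, variance, sd = sample_sd(scores)
print(mean, variance, sd)  # 9.0 16.0 4.0
```

Python's built-in `statistics.stdev` uses the same n-1 divisor and should agree with this result.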

Here again is the histogram showing the distribution of 50 ages at death due to trauma (accidents and homicides) that occurred in a certain hospital during a week. A possible value of the median in this example is:

Death penalty

Which of the tables is the appropriate table of conditional percents to discover if the region where one lives affects whether or not one has health insurance?

Table A

Which of the following variables is not a ratio variable? Temperature (outside temperature); Charity (how much money do you donate to charity in a year); Internet (how many minutes per day do you spend on the Internet); Text messages (how many text messages do you send a day)

Temperature

What does it mean to have a SD of 4 with a mean of 9?

The average is 9, give or take 4

The number of hours students study is compared with the day's highest temperature. It is found that the coefficient of determination r2 = 0.481. About 48% of the variation in hours studied can be explained by the linear regression model of the relationship between the two variables. What is r?

The square root of 0.481 is approximately equal to 0.694, so r is about 0.694 in magnitude. The sign of r matches the direction of the relationship.

A local cafe kept track of the number of servings of the soup of the day it sold each day, and the temperature that day, for two months during the summer. The data are displayed in the scatterplot below:

Negative linear relationship with outlier(s)

a= y bar - b(x bar) ie a= 423 - (-3 * 51)= 576

SD=0 when

all observations are the same value

outlier

an observation is considered an outlier if it is less than Q1-1.5(IQR) or more than Q3+1.5(IQR), ie with Q1=32, Q3=41.5 and IQR=9.5: 32-(1.5)(9.5)= 17.75 and 41.5+(1.5)(9.5)= 55.75. Observations of 62, 74 and 80 should therefore be flagged as outliers
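
The 1.5(IQR) rule can be sketched in Python as follows. The function names are made up for this guide, and the values 21 and 45 in the sample list are hypothetical non-outliers added only for contrast; 62, 74 and 80 come from the card above.

```python
def outlier_fences(q1, q3):
    """Return the (lower, upper) cutoffs of the 1.5(IQR) rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def flag_outliers(values, q1, q3):
    """Return the values that fall outside the fences."""
    low, high = outlier_fences(q1, q3)
    return [v for v in values if v < low or v > high]

low, high = outlier_fences(32, 41.5)
print(low, high)  # 17.75 55.75

flagged = flag_outliers([21, 45, 62, 74, 80], 32, 41.5)
print(flagged)  # [62, 74, 80]
```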

to add an observation and keep M the same, the observation must be

at the exact same place as where the median already is

b= r(Sy/Sx) ie b= (-0.793) x (82.8/21.78), which is approximately -3. For every year a driver gets older, the maximum distance at which they can read a sign decreases on average by 3 feet
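
A short Python sketch of the slope and intercept formulas. The slope inputs come from the card above. For the intercept, x bar = 51 and y bar = 423 are assumed values, chosen because they are consistent with the intercept of 576 quoted in this guide. The function names are made up for illustration.

```python
def slope(r, s_y, s_x):
    """b = r * (Sy / Sx)"""
    return r * (s_y / s_x)

def intercept(y_bar, b, x_bar):
    """a = y bar - b * (x bar)"""
    return y_bar - b * x_bar

b = slope(-0.793, 82.8, 21.78)
print(round(b))  # -3

# Assumed means (see the note above); the rounded slope is used, as on the card
a = intercept(423, round(b), 51)
print(a)  # 576
```

With these values the regression line would be: predicted distance = 576 - 3(age).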

This histogram shows the distribution of times, in minutes, required for 25 rats in an animal behavior experiment to navigate a maze successfully. Which of the following best describes the shape of the histogram?

Right skewed with a possible outlier

SD of explanatory variable

SD of response variable

The boxplots below show amount spent for vehicles in two neighboring locations (in thousands of dollars). Which city has a greater percentage of vehicles which cost between $30,000 and $50,000?

Suburbia

The distribution of the amount of money spent by students for textbooks in a semester is approximately normal in shape with a mean of $235 and a standard deviation of $20. According to the Standard Deviation Rule, in a semester, almost all (99.7%) of the students spent on textbooks:

between 175 and 295 dollars.

The boxplots below show amount spent for vehicles in two neighboring locations (in thousands of dollars). Which city has the greater percentage of vehicles which cost below $30,000?

Both locations have the same percentage of vehicles which cost below $30,000.

standard deviation rule

a normal distribution contains 68% of the data within one standard deviation above and below the mean, 95% of the data within two standard deviations above and below the mean, and 99.7% of the data within three standard deviations above and below the mean

correlation coefficient (r)

a numerical measure of the strength and direction of a linear relationship between two quantitative variables. Can fall between -1 and 1

A survey was conducted to study the relationship between the annual income of a family and the amount of money the family spends on entertainment. Data were collected from a random sample of 280 families from a certain metropolitan area. A meaningful graphical display of these data would be:

a scatterplot

dataset

a set of data identified with particular circumstances. Typically displayed in tables with rows as the individuals and columns as the variables

A survey was conducted to study the relationship between whether the family is buying or renting their home and the marital status of the parents. Data were collected from a random sample of 280 families from a certain metropolitan area. A meaningful graphical display of these data would be:

a two way table

ordinal variable

categorical variable where there is a natural order among the categories ie socioeconomic status (high, medium, low)

nominal variable

categorical variables where there is no natural order among the categories (eye color)

median

center of distribution (M); if n is even, it is the mean of the 2 observations at the center ie for 3, 5 M=4
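
A minimal Python sketch of finding the median for odd and even n. The function is written out by hand for illustration; Python's `statistics.median` does the same job.

```python
def median(values):
    """Return the middle value, or the mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median([3, 5]))     # 4.0
print(median([7, 3, 5]))  # 5
```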

correlation coefficient

form of a scatterplot means its

general shape

boxplot

graphically represents the distribution of a quantitative variable, displaying the 5 number summary and any observations classified as outliers using the 1.5(IQR) rule. Outliers are marked with *. The box spans the IQR, with its top line at Q3, its bottom line at Q1, and M drawn as a line inside the box. The top whisker ends at the max (largest non outlier) and the bottom whisker at the min (smallest non outlier)

strength of a scatterplot means

how closely the data follow the form of the relationship (strong v weak)... requires numerical measure

a bar graph is used to show

how the different categories compare to each other

a pie chart is used to show

how the different categories relate to the whole

interval

what are the two types of quantitative variables?

interval and ratio

if data is skewed right, the mean will be

larger than the median

x bar

mean

x bar

mean of explanatory variables

y bar

mean of response variable

_______ are sensitive to outliers, while _______ are resistant

means, medians

median

mode

most commonly occurring value in a distribution

nominal

what are the two types of categorical variables?

nominal and ordinal

sample size

number of individuals

box plots do not show

number of observations

variable

particular characteristic of the individual, ie

individual

particular person/object, unit, ie marathon runners

standard deviation

quantifies the spread of a distribution by measuring how far the observations are from their mean (x bar); gives a typical distance from a data point to the mean. Represented by SD, s, Sd, and StDev

Interquartile Range (IQR)

quantifies variability of a distribution by giving us the range covered by the middle 50% of data

ratio variable

quantitative variable for which both the difference and the ratio between values have intrinsic meaning ie income, weight, time

interval variable

represents a measure/count for which it makes sense to talk about the DIFFERENCE between values but it does not make sense to talk about the ratio ie temperature

categorical variable

represents labels or ranks and classifies an individual into one of several groups ie eye color, social status, right/left handed, "strongly agree" to "strongly disagree". Can be represented numerically (1, 2)

quantitative variable

represents a measurement or count, answering "how much" or "how many" ie. time waiting in line, temperature, income, height

In order to study the relationship between IQ level and GPA, data were collected from a sample of 540 students. The data collected in this study would best be displayed using:

scatterplot

The data display and numerical summary you should use to analyze QQ study are

scatterplot; correlation coefficient (negative, positive, outliers?)

what is used to interpret a histogram?

shape, center, spread (the pattern) and outliers (deviation from the pattern)

In order to study whether there is a relationship between IQ level and birth order, data were collected from a sample of 540 students on their birth order (Oldest/In Between/Youngest) and their score on an IQ test. The data collected in this study would best be displayed using:

side by side box plots

The data display and numerical summary you should use to analyze CQ study are

side by side boxplots; descriptive statistics

A store asked 250 of its customers how much they spend on groceries each week. The responses were also classified according to the gender of the customers. We want to study whether there is a relationship between amount spent on groceries and gender. A meaningful display of the data from this study would be:

side by side boxplots

Here again is the histogram showing the distribution of 50 ages at death due to trauma (accidents and homicides) that were observed in a certain hospital during a week. For the data described by the above histogram, the median will be _________ than the mean

smaller

This histogram shows the distribution of times, in minutes, required for 25 rats in an animal behavior experiment to navigate a maze successfully. For the data described by the above histogram, the median will be

smaller than the mean

if the data is skewed left, the mean will be

smaller than the median

regression

specifies the dependence of the response variable on the explanatory variable

means can be used as a measure of center over a median only for

symmetric distributions without outliers, otherwise medians are better

SD should only be used for

symmetrical distributions as it is strongly influenced by outliers

shape

symmetry/skewness of the distribution; peakedness (modality), meaning the number of peaks (modes)

linear regression

technique of finding the line that best fits the pattern of the linear relationship of the response and explanatory variable

for symmetric distributions with no outliers, the mean (x bar) is approximately equal to

the median (M)

A store asked 250 of its customers whether they were satisfied with the service or not. The responses were also classified according to the gender of the customers. We want to study whether there is a relationship between satisfaction and gender. A meaningful display of the data from this study would be:

two way table

types of symmetrical distributions

unimodal (one peak), bimodal (two peaks), multimodal (more than 2 peaks), uniform (no peaks)

distribution

what values the variable takes and how often the variable takes those values

This histogram shows the distribution of times, in minutes, required for 25 rats in an animal behavior experiment to navigate a maze successfully. A possible value of the median in this example is:

3.9

the sum of deviations from the mean is always

zero

The data display and numerical summary you should use to analyze CC study are

two way table; conditional percentages
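
Conditional percentages can be sketched in Python as row percentages of a two way table. The counts below are hypothetical (250 customers split by gender and satisfaction) and do not come from any table in this guide; the function name is also made up.

```python
def row_percentages(table):
    """Convert each row of counts into percentages of that row's total."""
    result = {}
    for row, counts in table.items():
        total = sum(counts.values())
        result[row] = {col: 100 * c / total for col, c in counts.items()}
    return result

# Hypothetical counts: rows are the explanatory variable (gender)
counts = {
    "Female": {"Satisfied": 90, "Not satisfied": 30},
    "Male": {"Satisfied": 78, "Not satisfied": 52},
}
percents = row_percentages(counts)
print(percents["Female"]["Satisfied"])  # 75.0
print(percents["Male"]["Satisfied"])    # 60.0
```

Percentages are computed within each category of the explanatory variable so that groups of different sizes can be compared.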

left skewed distribution

A density curve where the left side of the distribution extends in a long tail

right skewed distribution

A density curve where the right side of the distribution extends in a long tail; (mean > median)

What determines which numerical measures of center and spread are appropriate for describing a given distribution of a quantitative variable?

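The shape of the distribution: median and IQR are better for skewed distributions or distributions with outliers; mean and SD suit symmetric distributions without outliers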

The boxplots below show amount spent for vehicles in two neighboring locations (in thousands of dollars). Which location has more vehicles?

It is impossible to tell from the boxplots.

The histogram below displays the distribution of exam scores for 40 students in an elementary statistics course. To describe the center and spread of the above distribution, the appropriate numerical measures are:

M and IQR

The boxplots below show amount spent for vehicles in two neighboring locations (in thousands of dollars). Which city has greater variability in the cost of the vehicles?

Metropolis

5 Number Summary

Min, Q1, M, Q3, Max, with Q1 to Q3= IQR

Coefficient of Determination r2

The proportion of the variation in the y data set that can be predicted by the linear regression model of the relationship between x and y. The value of r2 can fall between 0 and 1. r2 is never negative.

A student survey was conducted in a major university, where data were collected from a random sample of 750 undergraduate students. One variable that was recorded for each student was the student's answer to the question: What region of the country did you live in just prior to enrolling in this university? Northeast/Southeast/Northwest/Southwest/Midwest/Outside the U.S. These data would be best displayed using which of the following?

Pie chart

Here again is the histogram showing the distribution of 50 ages at death due to trauma (accidents and homicides) that were observed in a certain hospital during a week. Which of the following best describes the shape of the histogram?

Right skewed with a possible outlier

In IQR, M is essentially

Calculating IQR

Q3-Q1, with Q1 as the median of the bottom 50% of data and Q3 as the median of the upper 50% ie Q1=32 and Q3=41.5, IQR= 41.5-32= 9.5
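
A Python sketch of the "median of each half" method. The data set is hypothetical, chosen so that the quartiles match the example above. Leaving the median out of both halves when n is odd is an assumption; textbooks and software use slightly different quartile conventions.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n, skip the middle observation (assumed convention)
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(upper)

data = [30, 31, 33, 35, 37, 40, 43, 46]  # hypothetical data
q1, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q3, iqr)  # 32.0 41.5 9.5
```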

The number of people at a park each day is compared with the day's highest temperature. It is found that the coefficient of correlation r = 0.76. What is r2?

The coefficient of determination r2 = (0.76)^2 = 0.578. About 58% of the variation in park attendance can be explained by the linear regression model of the relationship between the two variables.
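
Converting between r and r2 is a one-line calculation. The sketch below uses r = 0.76 from this card and r2 = 0.481 from the studying example elsewhere in this guide. The variable names are made up for illustration.

```python
import math

r = 0.76
r_squared = r ** 2
print(round(r_squared, 3))  # 0.578

# Going the other way only recovers the magnitude of r;
# the sign has to come from the direction of the relationship.
r_magnitude = math.sqrt(0.481)
print(round(r_magnitude, 3))  # 0.694
```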

Here again is the histogram showing the distribution of 50 ages at death due to trauma (accidents and homicides) that were observed in a certain hospital during a week. Assume that the largest observation in this dataset is 90. If this observation were wrongly recorded as 900, then:

The mean will increase, but the median won't change.

This histogram shows the distribution of times, in minutes, required for 25 rats in an animal behavior experiment to navigate a maze successfully. Assume that the largest observation in this dataset is 8.6 minutes. If this observation were wrongly recorded as 86, then the mean will ___________ and the median will ___________

The mean will increase, but the median won't change.
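
A quick Python check of this behavior. The five times below are hypothetical stand-ins, since the 25 observations behind the histogram are not given; only the largest value, 8.6, comes from the question.

```python
from statistics import mean, median

times = [2.1, 3.0, 3.9, 4.4, 8.6]  # hypothetical data, largest value 8.6
misrecorded = times[:-1] + [86]     # 8.6 wrongly entered as 86

print(mean(times), median(times))
print(mean(misrecorded), median(misrecorded))
```

The mean is pulled upward by the bad value, while the median stays at the middle observation.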

The students enrolled in Intro to Statistics

The histogram below displays the distribution of 50 ages at death due to trauma (accidents and homicides) that were observed in a certain hospital during a week. What is the largest age of death due to trauma in this dataset?

This information is not provided by the histogram.

This histogram shows the distribution of times, in minutes, required for 25 rats in an animal behavior experiment to navigate a maze successfully. What is the largest time recorded in this dataset?
