Stats Exam 1

Median

The middle number of the ordered data. Use (n+1)/2 to find the position of the median. If the position falls between two numbers, take their average.

5 number summary of a boxplot

Minimum, Q1, Median, Q3, Maximum

A marketing researcher is testing out new names for a hair product. She asks 40 women to pick the name of the product that they would be most likely to buy out of a list of 4 names. Could she use a dot plot to describe her results?

No. Dot plots are used for quantitative data and this is a categorical variable.

mean formula

"The sum of observations / n"

In the fall STA2023 beginning of the semester survey, students were asked how many parties they attended every week and how many text messages they sent per day. The researcher decided to make the number of parties attended per week the explanatory variable and the number of text messages sent per day the response variable. The least squares regression line for this relationship is y-hat=64.96+25.41x. One student attended 2 parties that week and sent 20 text messages per day. What is the residual?

-95.78
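
A minimal Python sketch of this residual calculation, assuming the line is written as y-hat = a + bx. The function name `residual` and its parameter names are made up for illustration.

```python
def residual(observed_y, intercept, slope, x):
    """Return observed y minus predicted y for the line y-hat = intercept + slope * x."""
    predicted_y = intercept + slope * x
    return observed_y - predicted_y


# Values from the card above: y-hat = 64.96 + 25.41x, x = 2 parties, y = 20 texts
party_residual = residual(observed_y=20, intercept=64.96, slope=25.41, x=2)
print(round(party_residual, 2))
```

A negative residual means the observed value lies below the regression line.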

Decimals and stem and leaf plots

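With stem 2 and leaf 0, the value depends on the leaf unit: leaf unit 0.01 = 0.20, leaf unit 0.10 = 2.0, leaf unit 1.0 = 20, leaf unit 10.0 = 200.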

Suppose that we were going to conduct a study to determine if there is a decrease in risk of getting lung cancer based on whether the person eats strawberries. In the group that eats strawberries, we find 5/2000 get lung cancer. In the group that doesn't eat strawberries, 5/1000 get lung cancer. What is the sample relative risk?

0.5. Relative risk = phat1/phat2 = (5/2000)/(5/1000).
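
The ratio of the two sample proportions can be sketched in Python as follows. The function and parameter names are hypothetical, and group 1 is assumed to be the strawberry eaters.

```python
def relative_risk(cases_1, n_1, cases_2, n_2):
    """Return p-hat1 / p-hat2, the ratio of the two sample proportions."""
    p_hat_1 = cases_1 / n_1
    p_hat_2 = cases_2 / n_2
    return p_hat_1 / p_hat_2


# Group 1: 5 of 2000 strawberry eaters; group 2: 5 of 1000 non-eaters
strawberry_rr = relative_risk(5, 2000, 5, 1000)
print(round(strawberry_rr, 2))
```

A relative risk below 1 means the sample risk in group 1 is lower than in group 2.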

A least squares regression line was created to predict the exam 3 score of STA2023 students based on their exam 1 score. The study found that the value of R squared was 28.8% and the least squares regression line was yhat=50.57+0.4845x. What is the correlation coefficient R?

0.54

a political scientist was interested in studying America's voting habits. So, he decided to make a least squares regression equation to predict the percentage of people that would vote for Obama 2012 based on the percentage of people that voted for Obama in 2008. The least squares equation is y-hat=-4.599+ 1.04x. The value of R-squared was 96.69%. What should be the value of r?

0.9833. To find the value of r from r-squared, take the square root and give it the same sign as the slope (positive here).
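
A small Python sketch of recovering r from R-squared. It assumes R-squared is given as a proportion (0.9669, not 96.69) and that r takes the sign of the slope; the function name is made up for illustration.

```python
import math


def r_from_r_squared(r_squared, slope):
    """Return r: the square root of R-squared, carrying the sign of the slope."""
    magnitude = math.sqrt(r_squared)
    return math.copysign(magnitude, slope)


# Obama 2008 vs. 2012 example: R-squared = 96.69%, slope = 1.04
print(round(r_from_r_squared(0.9669, 1.04), 4))
# Exam 1 vs. exam 3 example: R-squared = 28.8%, slope = 0.4845
print(round(r_from_r_squared(0.288, 0.4845), 2))
```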

in 1996 the general social survey included a question that asked participants if they had ever volunteered for the environment. this information is provided below divided by a political party. what is the conditional proportion of democrats who volunteered for the environment?

116/142

A confidential and voluntary survey conducted in STA3024 in the spring of 1999 asked the students questions about their sex-life. A simple linear regression analysis was conducted to predict the number of lifetime sexual partners a student has had based on the student's current age. The analysis yielded an R-squared of 14.2 %. Interpret R- squared.

14.2% of the variability in the number of lifetime sexual partners students have had is explained by the student's age.

below is a histogram of heights of students in a summer 2010 class of STA2023. How many students were less than 56 inches tall?

2 students (add together the heights of all of the bars less than 56 inches)

find the sample standard deviation of the following data set, using the statistical functions on your calculator. 1.23,3.45,5.76,2.58,9.45,4.21,1.63,3.22,6.33,4.76

2.46
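
For anyone without a calculator handy, the same value can be checked with Python's standard `statistics` module. This is only a sketch; `statistics.stdev` uses the n-1 denominator, which is what the sample standard deviation requires.

```python
import statistics

observations = [1.23, 3.45, 5.76, 2.58, 9.45, 4.21, 1.63, 3.22, 6.33, 4.76]

# stdev() is the sample standard deviation (divides by n - 1)
sample_sd = statistics.stdev(observations)
print(round(sample_sd, 2))
```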

a political scientist was interested in studying America's voting habits. so he decided to make a least squares regression equation to predict the percentage of people in a state that would vote for obama in 2012 based on the percentage of people in a state that voted for Obama in 2008. the least squares equation is y-hat=-4.599+1.04x. In the state of florida, 50.09% voted for Obama in 2008; where as 50.01% voted for him in 2012. What is the value of the residual?

2.52%

Below is a stem and leaf plot of the amount of money spent on their last haircut. The stem and leaf plot was made in Minitab. Find the median.

20

Q1

25th percentile. median of the lower half of the data

a study was done to compare people's religion with how many days that they have said they have been calm in the past three days. what proportion of protestants were calm 1 out of the 3 days.

27/99

Below is a stem and leaf plot of the amount of money spent on their last haircut. The stem and leaf plot was made in minitab. Find the third quartile.

30

A company is trying to determine if they should accept a shipment of toy parts. The shipment has over 1 million parts. They decide to randomly select 100 parts out of the shipment. They will only accept the shipment if 1% or less of all the parts are defective. Out of the 100, 4% are defective. Identify the numbers "1%" and "4%" as statistics or parameters.

4% is a statistic, 1% is a parameter.

in 1996 the general social survey included a question that asked participants if they had ever volunteered for the environment. this information is provided below divided by political party. what is the conditional proportion of independents who volunteered for the environment?

58/62

in 1996 the general social survey included a question that asked participants if they had ever volunteered for the environment. this information is provided below divided by a political party. what is the proportion of people in the study who were independents?

62/362

Empirical Rule

For bell-shaped data: 68% of all data is within one standard deviation of the mean, 95% is within two standard deviations of the mean, and 99.7% is within three standard deviations of the mean.

Q3

75th percentile. Median of the upper half of the data.

scores on an exam follow an approximately bell-shaped distribution with a mean of 76.4 and a standard deviation of 6.1 points. Approximately what percentage of the data is between 58.1 points and 94.7 points?

99.7%
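
One way to check this kind of question is to express each endpoint as a number of standard deviations from the mean. The Python sketch below assumes bell-shaped data and an interval of exactly mean ± 1, 2, or 3 standard deviations; the function and constant names are hypothetical.

```python
# Empirical rule percentages for mean ± k standard deviations
EMPIRICAL_RULE = {1: 68.0, 2: 95.0, 3: 99.7}


def empirical_rule_percent(low, high, mean, sd):
    """Return the empirical-rule percent for the interval [low, high]."""
    k_low = (mean - low) / sd    # SDs below the mean
    k_high = (high - mean) / sd  # SDs above the mean
    k = round(k_high)
    if k not in EMPIRICAL_RULE or abs(k_low - k) > 1e-6 or abs(k_high - k) > 1e-6:
        raise ValueError("interval is not mean ± 1, 2, or 3 standard deviations")
    return EMPIRICAL_RULE[k]


# Exam example: mean 76.4, SD 6.1, interval 58.1 to 94.7 (3 SDs each way)
print(empirical_rule_percent(58.1, 94.7, mean=76.4, sd=6.1))
```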

histogram

A graph of vertical bars representing the frequency distribution of a set of data.

scatter plots

A graph with points plotted to show a possible relationship between two sets of data (Y vs. X). Uses two quantitative variables measured on the same individual. X = explanatory, Y = response.

stem plot (also called stem and leaf plot)

A graphical representation of a quantitative data set. Leading values of each data point are presented as stems and second digits are given as leaves.

continuous quantitative variable

A quantitative variable whose measurements can assume any one of a countless number of values in a line interval. It is usually a measurable quantity or something that is calculated, such as rates, averages, proportions, and percentages.

significance test

A statistical technique used in inferential statistics to test the probability of an observed difference.

A STA 2023 class in the summer of 2010 was asked to complete a survey. One of the questions asked how much the students spent on their last haircut. Below is a histogram of the data. What is the best description of the spread of the data?

All students surveyed spent between 0-110 dollars on haircuts.

A researcher carries out a random comparative experiment with young rats to investigate the effects of toxic compound in food. she feeds the control group a normal diet. the experimental group received a diet with 2,500 parts per million of the toxic material. after 8 weeks the mean weight gain is 335 grams for the control group and 289 for the experimental group. which of the following is true regarding the two numbers "335" and "289"

Both are statistics

a car enthusiast is interested in studying the relationship between the asking price of used cars and their age. which is more naturally the explanatory variable?

Car's age. Age is used to explain or predict the asking price, so price is the response variable.

the class list below has all the information for each student in the class at the end of the semester, including their year in school, major, exam grades, project grades, number of absences, their avg in the class and their final letter grade in the class.

Categorical: major, letter grades. Discrete quantitative: years in school, number of absences. Continuous quantitative: exam scores, project scores, average.

treatments

Combinations of factor levels

What are the two types of statistical inference?

Confidence intervals and significance tests

Interpreting Scatterplots

DOTS: Direction (positive or negative), Outliers (any?), Trend (linear?), Strength (how strong?)

Stacy Wood at the Moore School of Business in South Carolina is studying what people crave, familiar food or new food, during periods of change in their lives. In her samples, those who self-identified as being in a period of change chose the new item 70% of the time, whereas students who self-identified as not being in a period of change chose the new item 40% of the time. What part of the field of statistics is this?

Description

A news reporter surveyed 20 New Orleans residents and asked if they planned to evacuate due to an impending hurricane. She randomly selected 20 people out of the phone book. What part of the field of statistics is this?

Design

statistical methods

Design, Description, Inference

Suppose you had the following data set: 100, 200, 250, 275, 300. Suppose that the value 100 was a typo, and it was supposed to be 500 instead. How would the value of the standard deviation change?

It would increase
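
The claim can be checked numerically. This Python sketch computes the sample standard deviation of both versions of the data set with the standard `statistics` module; the variable names are made up for illustration.

```python
import statistics

with_typo = [100, 200, 250, 275, 300]
corrected = [500, 200, 250, 275, 300]

sd_with_typo = statistics.stdev(with_typo)  # about 79.1
sd_corrected = statistics.stdev(corrected)  # about 115.1

print(sd_corrected > sd_with_typo)
```

The value 500 sits farther from the other observations than 100 does, so the typical distance from the mean grows.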

influential outliers

Points that have an x value far away from the rest tend to pull the line towards them. Deleting them can have a huge effect on the regression line and could change the slope.

Extrapolation

Predicting a Y value using a value of X that is outside of the range of X values used to obtain the regression equation. This prediction could be very far off.

Describing a Histogram

SOCS: Shape, Outliers, Center, Spread

Box Plot (Box and Whisker Plot)

Shows distribution of data values on a number line. Divided into quartiles.

During the summer of 2010, a class of STA2023 students answered a student survey. One of the questions asked the students to report the cost of their last haircut. A histogram was made of their responses. What is the best description of the shape of the graph?

Skewed to the right. For skewed distributions, look at the direction of the tail: a tail toward higher values is skewed right, and a tail toward lower values is skewed left.

population proportion example

Students at UF with a pet

suppose that someone sampled 100 palm trees of the same species, all planted at the same time. they recorded the height for each tree. What shape would you expect the histogram of this data to be.

Symmetric: the heights of trees are similar to the heights of females. Most trees would probably have similar heights, but a few would be a little taller or a little shorter.

below is a boxplot of 4.0 magnitude earthquakes or higher for the philippine islands region and south of java, indonesia region on Tuesday, September 4th, 2012. This data was recorded by the USGS. which of the following statements below is true.

The first quartiles of both regions are fairly similar.

Quartiles

Values that divide a data set into four equal parts

Internet sites report that about 13% of Americans are left-handed. is this true for students at your university? during a statistics exam, the instructor walks around the room and counts 15 left-handed students out of 98 students in the class

Variable: handedness (left or right)
Population: all the students at the university
Sample: the 98 students in the class
Parameter: population proportion of students at the university who are left-handed (unknown)
Statistic: 15/98, about 15.3%
Was random sampling used? No, a convenience sample was used.

relative risk

[a/(a+b)]/[c/(c+d)]

Dot Plot

a graphical device that summarizes data by the number of dots above each data value on the horizontal axis

a set of data has a mean that is much larger than the median. which of the following statements is most consistent with this information?

a histogram of the data is skewed right

contingency table

a method of presenting the relationship between two categorical variables in the form of a table

discrete quantitative variable

a variable whose measurements can assume only a countable number of possible values. ex) number of pets in a household.

y hat

a+bx

Suppose the regression equation to predict y=weight from x=height for female college students is yhat=-200+5x. A. One woman was 5'6" and 128 lbs; find her residual. B. The correlation coefficient is 0.75; interpret it. C. Compute r^2 and interpret it.

a. Residual = observed - predicted. yhat=-200+5(66)=130, so 128-130=-2. She weighs 2 lbs less than predicted. b. There is a fairly strong positive linear relationship between weight and height. c. (0.75)^2=0.5625, or about 56%. About 56% of the variation in weight is explained by the linear regression on height.

lurking variable/confounding variable

affect the relationship between x and y

Population

all subjects of interest

a simple linear regression analysis was conducted to predict the Exam 3 score of students in STA2023 based on their exam 1 score. The analysis yielded the following results: yhat=50.57+0.4845x which of the following is the best description of the slope of the line?

As the exam 1 score increases by 1 point, the student's exam 3 score is predicted to increase, on average, by 0.4845 points.

population average example

average GPA

interpreting the regression line

b is the slope. a is the y-intercept, which corresponds to the predicted value of y when x=0. Only interpret a if x=0 makes sense.

suppose the regression equation to predict y=weight from x=height for female college students is yhat=-200+5x

b = +5 is the slope: as a female college student's height increases by 1 in, weight is predicted to increase by 5 lbs. The y-intercept is -200 for a height of 0, which does not make sense because it is outside the range of the data.

r formula

beyond the scope of this course

factors

categorical explanatory variables in an experiment are called factors.

When to use pie charts and bar charts

categorical variables

lurking variables

characteristics of the individuals in a sample that are not measured during the study, but do affect the result. affects relationship between X and Y

what to do if your data has outliers

Check the data and correct any typos. Try to find out more: do they belong? If the point is valid, conduct the regression analysis.

comparing two population proportions

compare proportions of people in 2010 and 2020 who think we spend too much on the environment

Comparing two population averages example

compare salary between men and women

The IQR

contains the central 50% of the data.

r^2 (coefficient of determination)

The correlation squared. Easier to interpret than r: the percent of variability in y explained by the linear regression on x.

a student is interested in studying the amount spent on rent for apartments in the gainesville area. she takes a random sample of 100 local apartment complexes. she is interested in studying the center, shape, and spread of the data. what type of graph does she make?

dot plot

random sampling

each member of the population has an equal chance of being selected. tend to be representative of the population.

Who uses statistics?

Everyone. Ex) Business: analyze the results of a marketing campaign

uniform

everything is equally likely ex) rolling a fair die

matched pair designs

experimental designs that use either two matched individuals or the same individual to receive each of two treatments

newspaper report: decaf drinkers have higher blood pressure levels than regular coffee or non-coffee drinkers. is decaf bad for your health?

Explanatory variable: type of drink
Response variable: blood pressure
Graphical summary: side-by-side boxplots (categorical and quantitative data)
Potential lurking variable: blood pressure at the start of the study
Conclusion: cannot say that decaf causes high blood pressure

If 5 mg of a medicine improves the patient's condition by 10%, then 50 mg must improve it by 100%. This is an example of

extrapolation

the following appeared in the magazine financial times, March 23,1995: "when Elvis Presley died in 1977, there were 48 professional Elvis impersonators. today there are an estimated 7328. if that growth is projected by the year 2016 one person in four on the face of the globe will be an Elvis impersonator." this is an example of

extrapolation

If you change the units of one of the variables, the value of the correlation will increase or decrease. T/F

false, the value of correlation is not dependent on the units
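
A quick way to see this is to compute r twice on the same data with one variable converted to different units. The sketch below uses made-up heights and weights (hypothetical values, not from the course) and a hand-written `correlation` helper.

```python
import math


def correlation(xs, ys):
    """Pearson correlation coefficient r for paired data."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data, invented for illustration only
heights_in = [62, 64, 66, 68, 70]
weights_lb = [115, 120, 128, 140, 150]
heights_cm = [h * 2.54 for h in heights_in]  # same heights in centimeters

r_inches = correlation(heights_in, weights_lb)
r_cm = correlation(heights_cm, weights_lb)
print(math.isclose(r_inches, r_cm))
```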

a sample of 9 female and 10 male college students tracked their spending for a month and reported their results, rounded to the nearest dollar. Find the median,range, and quartiles. Female: 121,184,191,254,288,329,404,476,505 Male: 46,65,106,153,172,244,342,345,458,810

Females: median = 288, range = 384, Q1 = 187.5, Q3 = 440
Males: median = 208, range = 764, Q1 = 106, Q3 = 345
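
These values follow the "median of each half" convention for quartiles described on the Q1 and Q3 cards. A Python sketch of that convention is below; the function name is hypothetical, and it assumes the middle value is left out of both halves when n is odd. Other software may use a different quartile rule and give slightly different numbers.

```python
import statistics


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using medians of the lower and upper halves."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]
    # Skip the middle value when n is odd
    upper = values[half + 1:] if n % 2 else values[half:]
    return (
        values[0],
        statistics.median(lower),
        statistics.median(values),
        statistics.median(upper),
        values[-1],
    )


female = [121, 184, 191, 254, 288, 329, 404, 476, 505]
male = [46, 65, 106, 153, 172, 244, 342, 345, 458, 810]

for label, data in (("Female", female), ("Male", male)):
    low, q1, med, q3, high = five_number_summary(data)
    print(label, "median:", med, "range:", high - low, "Q1:", q1, "Q3:", q3)
```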

conditional proportions

find percentages by dividing each cell count by the total number of observations in their group

very high positive correlation is found between the size of the head of school aged children and their reading skills. Are big headed kids smarter? explanatory variable response variable graphical summary potential lurking variables conclusion

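Explanatory variable: head size
Response variable: reading skill
Graphical summary: scatter plot (both variables are quantitative)
Potential lurking variable: age of the child
Conclusion: cannot say that head size leads to better reading skill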

someone is interested in studying property tax rates in the US. he randomly selects 25 homes and looks up their property tax rate. he is interested in studying the center, shape and spread of the data. what type of graph does he make?

histogram

Design

How you plan to obtain the data: survey, experiment, question

a small study on sleep and income found a very strong negative correlation between the two variables. However, one of the participants in the study reported only sleeping four hours per night which was much less than other participants, and this person has by far the highest income in the study. This is an example of:

influential outlier

how to find the center of distribution in a histogram

look at the mode

Range

max-min

which of the following is not part of the 5 number summary

mean

skewed right

mean is greater than median

skewed left

mean is less than median

How do outliers and skewness affect each measure of center and spread?

Mean: not resistant to skewness and outliers; it is pulled in the direction of the tail. Use the mean for symmetric distributions.
Median: resistant to skewness and outliers. Use the median for skewed distributions.
Range: not resistant to skewness and outliers; it is the most affected.
Standard deviation: not resistant to skewness and outliers. Use with the mean.
IQR: resistant because it uses Q1 and Q3. Use with the median.

Measures of center

mean, median, mode

Data set 1: 1,1,1,4,7,7,7. Find the mean, median, range, standard deviation, and variance.

Mean: (1+1+1+4+7+7+7)/7 = 4
Median: position (7+1)/2 = 4th value = 4
Range: 7-1 = 6
Standard deviation: 3
Variance: 9
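
The same summaries can be reproduced with Python's standard `statistics` module, as a rough check. `variance` and `stdev` are the sample versions (n-1 denominator); the variable names are made up for illustration.

```python
import statistics

data_set_1 = [1, 1, 1, 4, 7, 7, 7]

mean = statistics.mean(data_set_1)
median = statistics.median(data_set_1)
data_range = max(data_set_1) - min(data_set_1)
variance = statistics.variance(data_set_1)  # sum of squared deviations / (n - 1)
sd = statistics.stdev(data_set_1)           # square root of the variance

print(mean, median, data_range, variance, sd)
```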

Match each term on the left column with the appropriate description from the right column description. each answer choice will only be used one time.

Mean: distances from the data points to this measure of center always add up to zero.
Median: this measure of center always has 50% of observations on either side.
Mode: this measure of center represents the most common observation, or class of observations.
Range: this measure of spread is affected the most by outliers.
Variance: measure of spread around the mean, but its units are not the same as those of the data points.
Standard deviation: is smaller for distributions where the points are clustered around the middle.

Q2

median, 50th percentile

common shapes in statistics

mound or bell-shaped, uniform or rectangular, bimodal, skewed left, skewed right

when taking an exam, some students quickly answer all the questions they know how to do. if they go back and revise their answers, they start second guessing themselves and end up changing their choice from correct to incorrect. for these students the correlation between the amount of time spend on an exam and their grade should be

negative and fairly strong

suppose your whole grade in a class is based on four exams worth 100 points each. if you know someone's avg score in the class can you determine all 4 exam grades? If you know the avg and one exam grade can you determine the other 3? If you know the avg and 2 exams can you determine the other 2? If you know the avg and 3 exam grades can you determine the last one?

no, no, no, yes
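
The last case works because the average fixes the total, so only one score is left to solve for. A short Python sketch with made-up scores (the function name and the numbers are hypothetical):

```python
def missing_exam_score(average, known_scores, n_exams=4):
    """Solve for the one unknown score given the average and the other scores."""
    total = average * n_exams
    return total - sum(known_scores)


# Hypothetical student: average of 85 with three known scores
print(missing_exam_score(85, [80, 90, 95]))
```

With fewer than three known scores, many different combinations give the same average, so the remaining grades cannot be pinned down.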

bell-shaped

normal distribution ex) IQ

parameters

numerical summary of a population

statistics

numerical summary of the sample

unusual observations

outliers, usually far from the other data.

a sociologist is interested in seeing if people's satisfaction with their married life (very satisfied, somewhat, fairly, not satisfied) is associated with their parent's marital status (divorced, widowed, still married) which is more naturally the explanatory variable

parent's marital status

subjects

persons, animals, or objects in our study/experiment

categorical variables

place each observation into groups and they are usually summarized by the percentage of observations in each group. Doesn't make sense to find the average.

least-squares regression method

Points above and below the line cancel each other, which means the sum of the residuals is zero. The method minimizes the sum of squared residuals (prediction errors). The least squares regression line passes through (xbar, ybar). The formulas for the slope and intercept are discussed below; they are not hard to derive.
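
These two properties can be checked numerically. The Python sketch below fits a line to made-up data (hypothetical values) using the usual least squares formulas for slope and intercept, then confirms that the residuals sum to about zero and that the line passes through (xbar, ybar).

```python
def least_squares(xs, ys):
    """Return (a, b) for the least squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    a = y_bar - b * x_bar
    return a, b


# Hypothetical data, invented for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a, b = least_squares(xs, ys)
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
x_bar = sum(xs) / len(xs)
y_bar = sum(ys) / len(ys)

print(abs(sum(residuals)) < 1e-9)            # residuals cancel out
print(abs((a + b * x_bar) - y_bar) < 1e-9)   # line passes through (xbar, ybar)
```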

review of straight lines

Points are exactly on the line. The line extends forever in both directions. Equation: y=mx+b

regression line

Points are scattered, so we want the equation of the line that fits best through the middle of the points. Use it to predict the response variable y for a particular value of x. Predicted values are called y hat. Regression equation: yhat=a+bx, where a = y-intercept and b = slope.

when taking an exam, some students think things over very thoroughly, revise their answers several times, and carefully make sure they have not missed anything. For these students, the correlation between the amount of time spent on an exam and their grade should be:

positive and fairly strong.

residuals

Prediction errors for each observation. Graphically, the vertical distance from the point to the line. The difference between the observed and predicted values of y: residual = y-yhat.

extrapolation

predictions made using the regression equation can only be trusted for values of x within the observed range. predicting outside the range is called extrapolation. can get ridiculous predictions.

A college newspaper interviews a psychologist about student ratings of the teaching of faculty members. The psychologist says, "The evidence indicates that the correlation between the research productivity and teaching rating of faculty members is close to zero." Which of the following would be a correct interpretation of the statement?

Professor Mcdaniel said that, among good researchers, you can find both good and bad teachers, and the same thing goes for bad researchers. No correlation means that low values of x could have high or low values of y. Additionally, high values of x could have high or low values of y.

b

r(sy/sx), the slope of the least squares regression line

how to find the spread or variability of a histogram

range

measures of variability (spread)

range, IQR, variance, standard deviation

Nine people with headaches and nine people with muscular pain are given Advil, Tylenol, or Excedrin. Time until they report the pain is gone is recorded for each person.

Response variable: time until pain is gone
Experimental units: 18 people
Factors (categorical explanatory variables): type of medication, type of pain
Levels: Advil, Tylenol, Excedrin; headache, muscular
Treatments: 3*2 = 6
Replications: number of experimental units / number of treatments = 18/6 = 3
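
Since treatments are combinations of factor levels, they can be listed by crossing the two factors. A Python sketch using `itertools.product` (variable names are made up for illustration):

```python
from itertools import product

medications = ["Advil", "Tylenol", "Excedrin"]
pain_types = ["headache", "muscular"]

# Every medication paired with every pain type
treatments = list(product(medications, pain_types))
n_units = 18  # 9 people with headaches + 9 with muscular pain
replications = n_units // len(treatments)

for treatment in treatments:
    print(treatment)
print(len(treatments), replications)
```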

from 1975 to 1986 the general social survey asked its participants if they favored or opposed capital punishment. A least squares regression line was calculated to predict the percent that favored the death penalty based on the year. the least squares regression equation is yhat=-1912.49+1.003x. interpret the y intercept for the equation if appropriate

should not be interpreted

a political scientist was interested in studying America's voting habits. so, he decided to make a least squares regression equation to predict the percentage of people in a state that would vote for Obama in 2012 based on the percentage of people in a state that voted for Obama in 2008. The least squares equation is y-hat=-4.699+1.04x. The value of R-squared was 96.70% interpret the y-intercept if applicable.

should not be interpreted. when you interpret the value of the y-intercept, first check to make sure that there is data around x=0. In this case look at the scatterplot. the lowest percentage of votes is about 30% so you would not interpret the y intercept.

Variance

The standard deviation squared (s^2). The average squared deviation from the mean. Because of the units, take the square root before interpreting.

Data sets consist of

subjects and variables

sample

subjects for whom we have data

description

summarize the data with graphs and numerical summaries. Percent, standard deviation

correlation

Summarizes the direction and strength of the straight-line relationship between x and y. The two variables have the same correlation regardless of which one is called the explanatory or the response variable. We use the symbol r to represent the correlation coefficient. r is always between -1 and +1 and has no units. Interpretation: positive/negative, strong/weak. Outliers can have a strong effect on r.

Quantitative variables

take on numerical value. Key features are the center (avg) and spread (variability) of the data. Can be split into discrete and continuous variables.

mean

the average of all the observations. Xbar

variables

the characteristics that we measure on each subject. they can take on different values for each individual.

data was collected on the amount of food in grams served at two restaurants and the price paid for the meal. the regression line for the data appears on the first plot. the second plot shows the regression analysis conducted when the data is analyzed separately for each of the two restaurants. which of the following statements best describes the relationship between the variables?

the correlation between amount of food served and price is very strong within each restaurant but seems weak when both restaurants are combined

Why is N-1 in the denominator?

The denominator in the formula for the variance is n-1, which represents the number of degrees of freedom. This is the number of independent quantities we are adding up in the numerator. Since X bar is computed from those n observations, only n-1 distances from the observations to the mean are independent of each other.

interpreting the standard deviation s

The larger the SD, the more spread out the data set is. s can never be negative. s can only be zero if there is no variability in the data set, that is, all observations are identical, ex) 8,8,8,8,8,8,8,8. s is very much affected by outliers. s works best for bell-shaped and symmetric distributions.

mode

the most frequently occurring score(s) in a distribution

confidence interval

the range of values within which a population parameter is estimated to lie

blocked design

the same idea as matched pairs, extended to three or more treatments, each set of matched experimental units is then called a block.

standard deviation formula

the square root of the variance

standard deviation

The square root of the variance. Its units of measurement are the same as those of the original data.

Boxplots cannot be used

to determine if a data set is bell shaped

Bimodal

two modes ex) height of males and females

Inference

use data from a random and representative sample to draw conclusions about the population of interest. Make decisions or predictions based on data.

determine which variable should be X and Y for father's height and adult son's height.

use father's height to predict son's height. Father=X Son=Y

Consider the relationship between father's height and mother's height. Would that pair of variables have a stronger or weaker linear relationship than father and son?

weaker, no genetic relationship

Simpson's Paradox

when averages are taken across different groups, they can appear to contradict the overall averages

this is an example of simpson's paradox because:

when the lurking variable (size of the stone) is introduced the conclusions are reversed (turns out to be less successful at removing them)

a

ybar - b(xbar), the y-intercept of the least squares regression line

predict the weight of a female college student whose height is 65 in

yhat=-200+5(65)= 125 lbs

find the equation of the least squares regression line xbar=15 sx=2 ybar=17.1 sy=3 r=.2

yhat=12.6+0.3x
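
This answer comes from the two formulas on the "b" and "a" cards: b = r(sy/sx) and a = ybar - b(xbar). A minimal Python sketch, with a hypothetical function name:

```python
def regression_from_summary(x_bar, s_x, y_bar, s_y, r):
    """Return (a, b) for y-hat = a + b*x from summary statistics."""
    b = r * s_y / s_x      # slope
    a = y_bar - b * x_bar  # intercept
    return a, b


# xbar=15, sx=2, ybar=17.1, sy=3, r=0.2
a, b = regression_from_summary(15, 2, 17.1, 3, 0.2)
print(f"yhat = {a:.1f} + {b:.1f}x")
```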

Find the equation of the least squares regression line

yhat=49.98-0.998x

Variance formula

s^2 = ∑(x - xbar)²/(n-1)

