Stats Chapter One Quiz, Statistics Math

What words to use when talking about correlation

related to, linked to, predicted

which is not a graph/chart that can be used for organizing qualitative data?

relative frequency histogram

normal distribution

•a mathematical probability distribution that follows a particular function; in standard form (mean 0, SD 1), f(x) = e^(-x²/2) / √(2π)

biased sample

•some members of the population are not as likely to be included in the sample as are others

sample

•the part of the population about which you actually have information •e.g., a handful of pennies •e.g., 100 men and women who are HIV+ and for whom you know their T4 count and HAART medication status

Sample standard deviation

s

modified box plot

whiskers extend only to values not considered outliers

In an experiment, the variable that is manipulated is called the... a) Independent variable. b) Dependent variable. c) Independant variable. d) Confounding variable.

Answer = A - Independent variable.

The data in the graph is a) Positively skewed b) Normally distributed c) Negatively skewed d) Bimodal

Answer = C - Negatively Skewed (tail to the left, peak to the right)

z score formula

z=(x-mean)/standard deviation
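
The formula above can be sketched in a few lines of Python. The function name `z_score` and the example numbers (a score of 85, mean 70, SD 10) are illustrative assumptions, not values from this card set.

```python
def z_score(x, mean, sd):
    """Return how many standard deviations x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

# Hypothetical example: a score of 85 where the mean is 70 and the SD is 10.
print(z_score(85, 70, 10))  # 1.5
```

A positive result means the observation is above the mean; a negative result means it is below.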

On average, if the samples are coming from the same population what will M1-M2 equal

zero, so therefore the graph centers over zero

LSRL equation

ŷ = a +bx ensure you use context for y-hat and the x variables. DON'T FORGET THE HAT

Population standard deviation

σ

median

•For an even number of scores, the median is the average of the two middle scores.

Interpreting computer regression output

Constant coefficient: y intercept Variable coefficient: slope Variable: explanatory variable name S: standard deviation of residuals R-sq: determination coefficient

Participant variables

Differing individual characteristics of participants in an experiment i.e. age, gender, IQ

Leptokurtic distribution

Distribution curve is very tall, thin and peaked. (Memory: Leptokurtic leaps tall buildings in a single bound.)

How to verify if a point is influential

Find the regression line both with and without the unusual point. If the line moves more than a small amount when point is deleted, the point is influential

Use of correlation for prediction

If two variables are known to be related in some systematic way, it is possible to use one variable to make accurate predictions about the other.

The term condition matches which of the following ANOVA terms

Level

Symmetrical distribution

Mirror image of the data set

Determine the type of relationship shown in (picture)

Positive

.001 < P-value < .01

Strong evidence in favor of Ha

Goal of hypothesis testing for one sample t-statistic

To see whether the treatment had an effect; you are testing whether the treatment influences the scores and causes the mean (which is known for the population before treatment in this case) to change. The unknown population is the one that exists after treatment is administered. The null hypothesis says that the treatment does not change the mean.

Differences in scores produced by the independent variable corresponds with which of the following ANOVA terms?

Treatment effect

A positive relationship exists when both variables increase or decrease at the same time

True

np(1-p)≥10

What condition must hold to be able to use Normal Approximation for a Binomial Probability Distribution

nominal categorical variable

a label (eye color, athlete jersey number)

joint probability

a probability that corresponds to an event represented in the intersection of a row and column of a contingency table

mean

arithmetic average x̄ = sum of observations / n

descriptive statistics

consists of methods for organizing, displaying, and describing data by using tables, graphs, and summary measures

Correlation formula relationship to z scores

Correlation looks at how far each score is from the mean of its group compared to how far its companion score is from the mean of its own group. The formula pairs each X z-score with the corresponding Y z-score, multiplies each pair, and averages the products by adding them together and dividing by n.
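
One way to see the z-score view of correlation is to compute r as the mean product of paired z-scores. This is a minimal sketch that assumes population standard deviations (dividing by n); the function name `pearson_r` and the data are hypothetical.

```python
from statistics import mean, pstdev

def pearson_r(xs, ys):
    """Pearson r as the average product of paired z-scores (population SDs)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 pairs")
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when a variable has no spread")
    # Multiply each X z-score by its companion Y z-score, then average.
    products = [((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)]
    return sum(products) / len(xs)

# Hypothetical data: a perfect positive linear relationship.
print(round(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]), 3))  # 1.0
```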

experiment

deliberately imposes some treatment on individuals in order to observe their responses

sample variance s2 effect on estimated standard error

directly related to the sample variance; the larger the variance the larger the error

Use of correlation for theory verification

done through hypothesis testing

treatment

each experimental condition

positive z-score

indicates that the observation is above the mean

A correlation coefficient r was calculated to be 0.610 the coefficient of determination would be approximately

0.372

what are the mean and standard deviation equal to for a standardized variable?

mean: 0 standard deviation: 1

what is the mean and standard deviation of Z always?

mean: 0 standard deviation: 1

How could a restricted range reverse a correlation

meaning could go from a positive to a negative or negative to a positive correlation; when you widen the range, what, for example, might have looked positive is actually part of a negative correlation

Numerator of estimated d formula for one sample t statistic

measures the magnitude of the treatment effect by finding the difference between the mean of the treated sample and the mean of the untreated population

what will you not see in research articles

medians, or modes in research articles.

median

midpoint of a distribution, where half of the observations are smaller and the other half are larger

What is the value of χ² left for a 95% confidence interval when n=18

7.564

If the null hypothesis H0: μ=16 is not rejected at α=0.05 when a mean of 15 is obtained from a random sample, one could say that the

95% confidence interval for the population mean contains the value 16

df for two sample (independent) t test

n1 + n2 - 2 (the total number of scores in both samples, minus 2)

interval

numbers have order but there are also equal intervals between adjacent categories (e.g., temperature)

nonresponse

occurs when an individual chosen for the sample can't be contacted or refuses to cooperate

If you are using a computer printout for APA Pearson r (or t test) what would p be

p = (the exact value from the printout), not p ≤ or p >

Sample Correlation

r

reliability

repeated testings give you about the same result. When individuals are measured 2 times or more under the same conditions, they will produce the same, or nearly the same, results

Formula for r2 for independent sample t test

same as for one sample t test

the mean and variance of a poisson random variable are both equal to the value of

the rate parameter lambda

complement of event E

the set of all outcomes in a sample space that are not included in event E

What does a scatter plot allow you to predict about a correlation

the sign of the correlation and get a feeling for what the number might be (close to 0, -1.00, +1.00)

Frequency Distribution, when is it symmetric?

A frequency distribution is symmetric when a vertical line can be drawn through the middle of a graph of the distribution and the resulting halves are approximately mirror images.

inferential statistics

uses methods that take a result from a sample, extend it to the population, and measure the reliability of the result

Why does a large sample variance mean a large estimated standard error?

means that you are less likely to find a significant effect. Scores are widely scattered, which makes it difficult to see a consistent pattern or trend in the data and reduces the likelihood of rejecting the null hypothesis.

Another way to think about reliability

measured score=true score+error

Determine if the survey question is biased. If the question is​ biased, suggest a better wording. How often do you exercise during an average month?

No, because it does not lead the respondent to any particular answer.

nonsampling error

Nonsampling errors result from undercoverage, nonresponse bias, response bias, or data-entry error. Such errors could also be present in a complete census of the population.

Facts about correlation: #3

The correlation r itself has no units of measurement. It is just a number.

continuous variable

a quantitative variable that has an infinite number of possible values that are not countable

normally distributed variable

if the probability function is given by

discrete quantitative variable

one whose possible values can be listed

The critical value for a left tailed t test for dependent samples is ... when there are 7 degrees of freedom and α=0.025

-2.365

adding or subtracting a constant to a spread of data

adds "a" to measures of center, location (mean median quartiles, percentiles) does not change shape or measures of spread (Range, IQR, standard deviation)

A complete factorial design occurs when

all levels of one factor are combined with all levels of the other factor

An equation of a regression line is y1 =4.6+3.2x what is the intercept of this line.

4.6

law of large numbers

-in a large number of independent observations of a random variable X, the average value of those observations will approximately equal the mean of X -the larger the number of observations, the closer the average tends to be to the mean

descriptive statistics

-methods for organizing and summarizing information -can be applied to populations and samples -are graphical and numerical methods

Determining SS from s

1) square s, 2) multiply by (n-1)
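
The two steps above amount to SS = s² × (n − 1). A minimal sketch, with a hypothetical function name and made-up inputs (s = 2.0, n = 10):

```python
def ss_from_s(s, n):
    """Recover the sum of squared deviations from a sample SD and sample size."""
    variance = s ** 2          # step 1: square s to get the sample variance
    return variance * (n - 1)  # step 2: multiply by the degrees of freedom

# Hypothetical example: s = 2.0 with n = 10 scores.
print(ss_from_s(2.0, 10))  # 36.0
```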

Match the frequency distribution of 180 rolls of a dodecahedron​ (a 12-sided​ die) with one of the histograms shown below. (steps)

1)What are the possible outcomes: 12 2) Are any outcomes more likely than the other?: No. 3) Therefore, the graph should be uniform

What is the value of χ² right for a 98% confidence interval when n=12

24.725

The winning team's score in 11 high school basketball games was recorded. If the sample mean is 72.3 points and the sample standard deviation is 12.0 points, find the 98% confidence interval of the true mean

62.3<μ<82.3 (df = 10, t = 2.764; margin of error = 2.764 × 12.0/√11 ≈ 10.0)
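
A t-based interval for a mean is sample mean ± t* × s/√n. The sketch below applies that to the basketball numbers in the question, assuming the critical value t* = 2.764 is read from a t table (df = 10, 98% confidence); the function name `t_interval` is hypothetical.

```python
from math import sqrt

def t_interval(sample_mean, s, n, t_star):
    """Confidence interval for a mean, given a t critical value from a table."""
    margin = t_star * s / sqrt(n)  # margin of error
    return sample_mean - margin, sample_mean + margin

# Basketball question: n = 11, mean = 72.3, s = 12.0; t* = 2.764 is a table value.
low, high = t_interval(72.3, 12.0, 11, 2.764)
print(round(low, 1), round(high, 1))  # 62.3 82.3
```

The interval is centered on the sample mean, so any proposed answer that does not contain 72.3 can be ruled out immediately.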

There are 56 runners in a race. How many ways can the runners finish​ first, second, and​ third?

56!/(56-3)!= 56x55x54=166320
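
The same count can be checked with Python's standard library, where `math.perm(n, r)` computes n!/(n−r)! (available in Python 3.8 and later):

```python
import math

# Ordered finishes (first, second, third) among 56 runners.
ways = math.perm(56, 3)
print(ways)  # 166320
```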

Given a frequency distribution that has positive skewness, a) the mode will be greater than the median. b) the mean will be greater than the median. c) the mode will be greater than the mean. d) the distribution will have more large values than small values. e) none of the above are necessarily true

Answer = B - the mean will be greater than the median.

A​ double-blind experiment is used to increase the placebo effect.

The statement is false. Double blinding is used to decrease the placebo effect

You toss a fair coin nine times and it lands tails up each time. The probability it will land heads up on the tenth flip is greater than 0.5.

The statement is false. The correct statement is​ "You toss a fair coin nine times and it lands tails up each time. The probability it will land heads up on the tenth flip is exactly​ 0.5."

mutually exclusive

Two events that cannot occur at the same time

What words not to use when talking about correlation

caused by, leads to, influenced

explanatory variable

attempts to explain outcomes, it may influence or change the response variable (independent variable)

In order to conduct an​ experiment, 4 subjects are randomly selected from a group of 51 subjects. How many different groups of 4 subjects are​ possible?

n=51, r=4: (51×50×49×48)/(4×3×2×1) = 249900
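
Because the order of the 4 subjects does not matter, this is a combination. A quick check with `math.comb` (Python 3.8 and later):

```python
import math

# Unordered groups of 4 chosen from 51 subjects.
groups = math.comb(51, 4)
print(groups)  # 249900
```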

subjective probability

result from intuition, educated guesses, and estimates

linear transformation

µ(new) = a + bµ; σ(new) = |b|σ. Adding and multiplying both change the mean; only multiplying changes the standard deviation

mean of a binomial random variable

µ=np

figures used in research articles / how probability

is discussed in the context of reporting statistical significance of study results. is usually mentioned in the methods section of a research article.

The major problem with point estimation is that it

is extremely vulnerable to sampling error

permutation

ordered arrangement of objects

simple random sampling

subjects are chosen in such a way that every subject and every possible group has an equal chance of being selected - pulling names out of a hat, picking random ID numbers

voluntary response sample

subjects choose to participate in a study. - call in poll, mail poll, text poll

What does a z-score require?

that you have the population mean and the SD

What determines the weight of each sample in pooled variance

the number of degrees of freedom. The larger the df=more weight

odds

the number of successful outcomes : the number of unsuccessful outcomes

median

the number that divides the bottom 50% of the sorted data from the top 50%

standard deviation

for a population data set of N entries, the square root of the population variance: σ = √(σ²)

Conditional Probability

the probability of an event (B), given that another event (A) has already occurred. Denoted by P(B|A) and read as "probability of B given A".

sample space

the set of all possible outcomes of a chance process

N represents _____, whereas n represents _____.

the total number of scores in the study; the number of scores in each sample

Nonparametric procedures are usually not our first choice among statistical procedures because

they are less powerful than parametric procedures

T-distribution table

two rows at the top of the table show proportions of the t distribution contained in either one or two tails, depending on which row is used; first column of the table lists the degrees of freedom for the t statistic; the numbers in the body of the table are the t-values that mark the boundary between the tails and the rest of the distribution

What do you do if the exact number of df is not found in the t-distribution table

you use the degrees of freedom above (smaller).

Standard Error (SE) of the Estimator

- The estimated standard deviation of a sampling distribution of any estimator (such as p-hat)

What is a medium effect for estimated cohen's d

.5

What is a large effect for estimated cohen's d

.8

2 tools to describe the relationship between variables

1.) Correlation 2.) Regression

If the correlation coefficient is 0.930 what is the unexplained variation?

13.5%
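
Explained variation is r² and unexplained variation is 1 − r². A small sketch with hypothetical function names, using the r = 0.930 from this card:

```python
def explained_variation(r):
    """Coefficient of determination: proportion of variation explained."""
    return r ** 2

def unexplained_variation(r):
    """Proportion of variation not explained by the linear relationship."""
    return 1 - r ** 2

print(round(explained_variation(0.930), 3))    # 0.865
print(round(unexplained_variation(0.930), 3))  # 0.135
```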

statistic

A ___ is a numerical value based on a sample

ratio

A variable is at the ___ level of measurement if there is a true zero that is meaningful.

Numbers of errors on a test of English comprehension for ten individuals = 5; 10; 2; 0; 8; 12; 7; 6; 0; 9. What is the 50th percentile for the above data? a) 6 b) 6% c) 50% d) none of these.

Answer = D - none of these (sorted data: 0, 0, 2, 5, 6, 7, 8, 9, 10, 12; the median is (6+7)/2 = 6.5)

What does SS stand for? a) Sum of Squares b) The sum of the squared deviations. c) Σ(X-µ) 2 d) All of the above

Answer = D - All of the above

Within subjects design V's Between subjects design

Answer =

Which of the following assumptions are relevant in mixed ANOVA designs? Answer choices a) Homogeneity of regression slopes b) Sphericity only c) Homogeneity of variance and sphericity d) Multicollinearity

Answer = C - Homogeneity of variance and sphericity

Which of the following is the research methods term that describes the researcher's ability to measure what is intended? a) reliability b) dependent variable c) validity d) concept

Answer = C - Validity

Strength of effect of a Pearson correlation

Coefficient of determination (r2)

Who proposed the parameters for interpretation of r2

Cohen

response

In an experiment, the ___ variable is the one that we measure the outcome of the study.

Least-squares regression line

Least-squares regression line of y on x is the line that makes the sum of the squared residuals as small as possible (Squares b/c positive and negative cancel out)
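
The line that minimizes the sum of squared residuals has slope b = SP/SSx and intercept a = ȳ − b·x̄. The sketch below uses those formulas; the function name and the data are hypothetical.

```python
from statistics import mean

def least_squares_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mx, my = mean(xs), mean(ys)
    sp = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # sum of products
    ss_x = sum((x - mx) ** 2 for x in xs)                  # sum of squares for x
    if ss_x == 0:
        raise ValueError("x values must not all be equal")
    b = sp / ss_x
    a = my - b * mx
    return a, b

# Hypothetical data lying exactly on y = 1 + 2x.
a, b = least_squares_line([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # 1.0 2.0
```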

Response variable

Measures an outcome of a study

Can two events with nonzero probabilities be both independent and mutually​ exclusive?

No, two events with nonzero probabilities cannot be independent and mutually exclusive because if two events are mutually​ exclusive, then when one of them​ occurs, the probability of the other must be zero.

What levels of measurement can data be quantitative?

Ordinal, Ratio, Interval

What are some benefits of representing data sets using frequency​ distributions?

Organizing the data into a frequency distribution can make patterns within the data more evident.

outcome

The result of an experiment

dependent

Two events E and F are ___ if the occurrence of one event affects the probability of the other.

Percentile

When a score is identified by its percentile rank

Restricted range

When you are only given data for a portion of the possible X value range. The problem with this is that you cannot make a general statement about the entire range of X values if you only have a portion of it.

to save time and money

Why are samples used instead of entire populations?

statistics

___ is the science of collecting, organizing, summarizing, analyzing, and interpreting data.

f(x) called

a probability density function

discrete random variable

a random variable whose possible values can be represented in some type of list

Hypotheses for one sample t-test for two tailed test

a. H0: μ=(population mean) b. H1: μ≠ (population mean)

ceiling effect

all the scores are squeezed together at the high end •Scores pile up toward the upper end of the distribution •because it is not technically possible to have a higher score (the measuring instrument does not go that high) even though conceptually the construct might

probability model

description of some chance process that consists of two parts: a sample space S and a probability for each outcome

Confidence level: 90%

1.645

single-blind

A ___ experiment is one in which the experimental unit (or subject) does not know which treatment he/she is receiving.

Standard Error

An estimate of the standard deviation of the sampling distribution of any estimator

A list of 5 pulse rates is: 70, 64, 80, 74, 92. What is the median for this list? a) 74 b) 76 c) 77 d) 80

Answer = A - 74

P-value > .10

Little or no evidence in favor of Ha

random variable

A(n) ___ is a numerical measure of the outcomes of a probability experiment, so its value is determined by chance. Usually denoted with capital letters such as X.

bivariate data

data from 2 variables of a population

95

Empirical Rule: The distribution is roughly bell shaped. Approx ___% of the data lie within 2 standard deviations

Confidence interval for a population mean

estimates a likely interval for a population mean

99 percentiles divide the data set into

hundredths

P is the symbol for

probability. •For example, p < .05 •the probability is less than .05 •more specifically, is less than .05 (i.e., less than 5%)

condition to use a binomial distribution model (check if independent)

n ≤ (1/10)N. As n increases, the shape of the probability distribution gets more and more normal

df for one sample t test

n-1

In a two-way ANOVA, a cell represents

one level of the independent variable called Factor A and one level of the independent variable called Factor B

categorical variable

places an individual into one of several groups or categories Ex: gender

mode

the value that occurs most frequently

The null hypothesis in the chi-squared test for independence is

the variables are independent

What is the df for the pearson r

n-2 (number of pairs -2)

What is the critical value for a right tailed t test when α=0.025 and n=13?

2.179

r2 name

The coefficient of determination

variable

any characteristic of an individual

Frequency distribution shapes - skewed

positively vs negatively

control

two or more treatments should be compared

lack of realism

if experiment is not conducted in a realistic setting, the results may not be generalized to another setting

Non directional (two tailed) hypotheses for Pearson r

i. H0:⍴=0 ii. H1:⍴≠0

Directional (one tailed) hypotheses for pearson r

i. H0: ⍴≤0 (or ⍴≥0) ii. H1: ⍴>0 (or ⍴<0)

Sums of products of deviations (SP)

Numerator of the formula for Pearson r; Is similar to the sums of squares because if you were to replace the Y value with a second X value then you would have the SS; Used to measure the amount of variability between two variables

simple random sample

sample chosen in such a way that each possible sample of a given size is equally likely to be the one obtained

Denominator of estimated d formula for one sample t statistic

sample standard deviation to standardize the mean difference into standard deviation units.

Interpreting the one sample t test SPSS printout

the crucial number is the sig value

residual

the difference between the observed value of the response variable and the value predicted by the LSRL

sample distribution

the distribution of sample data

in a skewed frequency table what type of representative value is furthest out in the tail?

the mean

treatment

the specific experimental condition applied to the units. a treatment can have several factors and levels

Only way to get a negative correlation using the pearson r formula

to have a negative in the numerator (SP), you cannot have a negative in the denominator (SS)

When an experiment design has two factors and both factors are tested using related samples, we should perform a

two-way within-subjects ANOVA

Evaluate the given expression and express the result using the usual format for writing numbers (instead of scientific notation): 59P2. (n! = product of whole numbers from 1 to n)

nPr = n!/(n-r)!, so 59P2 = 59!/57! = 59 × 58 = 3422, because 59 and 58 are the only factors of 59! not cancelled by 57!

What is the value of α used in describing the confidence interval shown below (98%)

0.02

Find the P-value for the test value t=1.61, n=15, right tailed

0.0649

The independent variable matches which of the following ANOVA terms?

Factor

Time related variables

Fatigue, practise

General idea of regression lines

Model for the data: the equation of a regression line gives compact mathematical description of what the model tells us about the relationship between the response variable y and the explanatory variable x

.01 < P-value <.05

Moderate evidence in favor of Ha

How many degrees of freedom do we have in significance testing of r?

N - 2, where N equals the total number of pairs of scores

Problems w/ positive and negative association

Not all relationships have a clear direction that we can describe as a positive association or negative association

rank-order (ordinal) variable

Values have an inherent order Military rank, race finish position

continuous variable

Variable where there is an uncountable number of possible outcomes, represented by an interval on a number line.

Spurious relationship

Variables are not related to each other (False relationship)

∑P(x)=1 0≤P(x)≤1

What are the two rules for the probability distribution table?

standard deviation

a computed measure of how much scores vary around the mean score. Allows us to compare measurements of 2 completely different things (e.g., SAT vs ACT)

statistic

a descriptive measure of a sample

experiment

an action whose outcome cannot be predicted with certainty

simple event

an event that consists of a single outcome

Ordinal

assign numbers to objects but these numbers have meaningful order (ie 1st place, 2nd place, 3rd place)

Example of significance determined by computer printout for two tailed one sample t-test

if had .08 two tailed would not have significance at .05 level for two tailed

A​ motorcycle's fuel efficiency represents the ninth decile of vehicles in its class. Make an observation about the​ motorcycle's fuel efficiency.

The​ motorcycle's fuel efficiency is greater than the fuel efficiency for​ 90% of vehicles in its class

percentages of normal curve

This graph shows the standardized normal graph with the percentage of results (data) that will fall between standard deviations on that graph. For example, 68.27 percent of results will fall within one standard deviation of the mean. On this graph, it's represented by two z-scores from the z table: the area between z = -1 and z = 1. 2%, 13.5%, 34%, 34%, 13.5%, 2%

Ellipse on correlation scatter plot

if you draw a circle around all the points on the graph

How could a restricted range hide a correlation that actually exists

if you look at a restricted range and it appears that there is no correlation based on the scatter plot, but if you widen the range to the full possible values, could actually really be a correlation

event "A and B"

intersection of A and B A∩B

•Figures used in research articles / how Sample selection

is usually mentioned in the methods section of a research article

Odds ratio example

lung cancer present/absent: Smokers: 40/60 = 0.67; Nonsmokers: 10/90 = 0.11; 0.67/0.11 ≈ 6, so the odds of developing lung cancer are about 6 times higher for smokers than for nonsmokers
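
The arithmetic in this example can be sketched as follows. The function names are hypothetical; the counts are the ones given on the card.

```python
def odds(present, absent):
    """Odds = outcomes with the condition : outcomes without it."""
    return present / absent

def odds_ratio(a_present, a_absent, b_present, b_absent):
    """Ratio of the odds in group A to the odds in group B."""
    return odds(a_present, a_absent) / odds(b_present, b_absent)

# Smokers: 40 with lung cancer, 60 without; nonsmokers: 10 with, 90 without.
print(round(odds_ratio(40, 60, 10, 90), 1))  # 6.0
```

Note that an odds ratio compares odds, not probabilities: the probabilities here are 0.40 and 0.10, a ratio of 4.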

What does smaller sample sizes do to critical values of df

makes them larger and larger meaning that you would have to have huge critical values for significance

The normal distribution provides a basis for the understanding of: a) Using Probability in statistics b) Significance Testing c) Confidence levels d) All of the above

Answer = D - All of the above

How does the estimated standard error of the differences between means measure the distance between (M1-M2 and μ1-μ2)

By definition the standard distance between the sample statistic (M1-M2) and the corresponding population parameter (μ1 - μ2)

curved pattern in residual plot

shows that a straight line model is not appropriate for the data

Critical value

t*(n-1): degrees of freedom = n-1; t* = critical value for the t-distribution

z-score

tells you how many standard deviations a point is from the mean

How does small sample size (small df) influence the shape of the family of t-curves

tend to be flatter and more spread out (more variable)

relative frequency table

the PERCENT of individuals who fall into each category

nominal level of measurement

variable is at the nominal level of measurement if the values of the variable name, label, or categorize. In addition, the naming scheme does not allow for the values of the variable to be arranged in a ranked or specific order.

LSRL slope interpretation

where ŷ is the predicted ____, the slope b predicts that for every one-unit increase in x, the predicted y value changes by b (an increase if b is positive, a decrease if b is negative)

Post hoc comparisons are used in one-way ANOVA to compare all the possible

pairs of means from a factor, one pair at a time, to determine which are significantly different from each other.

sample

part of the population we actually study to gather information

inflection points

points at which the curvature changes, located at a distance σ on either side of the mean

How does the APA write up change if you are using a computer printout (?)

report p= instead of p< or p≥; You are giving the exact p value, the exact alpha level (exact probability) associated with the t value (?)

What is the first step to calculating a Pearson r

set up a five column table with x2, x, xy, y, and y2 and calculate the ∑ for each column before you start the problem

If the correlation coefficient is 0.790 what is the explained variation?

62.4%

The data in the graph is a) Positively skewed b) Normally distributed c) Negatively skewed d) Bimodal

Answer = D - Bimodal (Twin Peaks)

How to determine if relationship b/w explanatory and response variable

- Make scatterplot and look for overall pattern; if linear, find regression line and plot it - Look at size of residuals - Look at residual plot - Find r2 and s to determine how well the line describes data and how large our prediction errors will be

P-value

- Measures strength of the evidence against the null hypothesis in favor of the alternative hypothesis

Use the diagram to the right to answer the question. What is the probability that a registered voter did not vote in the​ election?

.536

A physics class has 50 students. Of​ these, 17 students are physics majors and 18 students are female. Of the physics​ majors, seven are female. Find the probability that a randomly selected student is female or a physics major.

.560. Subtract the 7 female physics majors from the 17 physics majors so they are not counted twice: (18/50) + (10/50) = 28/50

A physics class has 40 students. Of​ these, 16 students are physics majors and 17 students are female. Of the physics​ majors, seven are female. Find the probability that a randomly selected student is female or a physics major.

.650
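
Both physics-class problems use the addition rule, P(A or B) = P(A) + P(B) − P(A and B). A minimal sketch with a hypothetical function name, using the counts from the two questions above:

```python
from fractions import Fraction

def p_a_or_b(n_a, n_b, n_both, n_total):
    """Addition rule using counts: subtract the overlap so it is counted once."""
    return Fraction(n_a + n_b - n_both, n_total)

# Class of 50: 18 female, 17 physics majors, 7 both.
print(float(p_a_or_b(18, 17, 7, 50)))  # 0.56
# Class of 40: 17 female, 16 physics majors, 7 both.
print(float(p_a_or_b(17, 16, 7, 40)))  # 0.65
```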

A random sample of 70 printers discovered that 20 of them were being used in small businesses. Find the 99% limit for the population proportion of printers that are used in small businesses.

0.146<p<0.425
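
The interval is p-hat ± z* × √(p-hat(1 − p-hat)/n). The sketch below assumes z* = 2.58, the rounded table value for 99% confidence (2.576 is more precise and shifts the lower limit slightly); the function name is hypothetical.

```python
from math import sqrt

def proportion_interval(successes, n, z_star):
    """Confidence interval for a population proportion."""
    p_hat = successes / n
    se = sqrt(p_hat * (1 - p_hat) / n)  # standard error of p-hat
    return p_hat - z_star * se, p_hat + z_star * se

# 20 of 70 printers; z* = 2.58 is an assumed rounded table value for 99%.
low, high = proportion_interval(20, 70, 2.58)
print(round(low, 3), round(high, 3))  # 0.146 0.425
```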

Steps to calculating t for independent sample t test

1) calculate the pooled variance 2) calculate the estimated standard error of the differences between means 3) calculate t
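
The three steps can be sketched as one function. This assumes each sample is summarized by its mean, SS, and n; the function name and the example numbers are hypothetical.

```python
from math import sqrt

def independent_t(m1, ss1, n1, m2, ss2, n2):
    """Independent-samples t statistic from sample means, SS values, and sizes."""
    df1, df2 = n1 - 1, n2 - 1
    # Step 1: pooled variance weights each sample by its degrees of freedom.
    pooled_var = (ss1 + ss2) / (df1 + df2)
    # Step 2: estimated standard error of the difference between means.
    se = sqrt(pooled_var / n1 + pooled_var / n2)
    # Step 3: t statistic, with df = n1 + n2 - 2.
    t = (m1 - m2) / se
    return t, df1 + df2

# Hypothetical samples: M1 = 14, SS1 = 40, n1 = 5; M2 = 10, SS2 = 40, n2 = 5.
t, df = independent_t(14, 40, 5, 10, 40, 5)
print(t, df)  # 2.0 8
```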

The prices (in dollars) for a graphing calculator are shown below for 8 online vendors. Estimate the true mean price for this particular calculator with 95% confidence (124, 129,130, 155, 134, 158, 150, 142)

129.5<μ<151.0 (sample mean = 140.25, s ≈ 12.90, df = 7, t = 2.365; margin of error ≈ 10.8)

APA formatting one sample t test example

College students' average hours of sleep per night (M=6.417, SD=1.021) was significantly less than the population's average of 7.5, t(5)=-2.597, p≤.05, one tailed, Cohen's d=1.061
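
The effect size in this write-up follows from estimated d = mean difference / sample standard deviation. A minimal sketch with a hypothetical function name, using the values reported in the example:

```python
def estimated_cohens_d(sample_mean, population_mean, sample_sd):
    """Estimated Cohen's d for a one-sample t test (reported as a magnitude)."""
    return abs(sample_mean - population_mean) / sample_sd

# Values from the sleep example: M = 6.417, population mean = 7.5, SD = 1.021.
print(round(estimated_cohens_d(6.417, 7.5, 1.021), 3))  # 1.061
```

By the benchmarks elsewhere in this set (.5 medium, .8 large), a d of about 1.06 is a large effect.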

event

a subset of the sample space.

statistically significant

an observed effect so large, that it would rarely occur with chance

How to use SPSS to conduct a hypothesis test using Pearson r

Analyze > Correlate > Bivariate; pull the variables you want to measure into the box; check one- or two-tailed; check whether you want to flag significant correlations

experimental unit

individual on which the experiment is done

Determine whether the data set is a population or a sample. Explain your reasoning. The number of cars for 10 households in a neighborhood of 30 households

Sample, because the collection of the number of cars for 10 households is a subset of all households in the neighborhood.

Population Variance

σ²

confounding

occurs when two variables are associated in such a way that their effects on a response variable cannot be distinguished from each other, so no conclusion can be drawn (you can't tell which one is better).

When scores for responses to a number of items are added together to provide a composite score, this is known within SPSS for Windows as: a) COMPOSITE b) RECODE c) COMPUTE d) VARIABLE

Answer = C - COMPUTE

variable

a characteristic that varies from one person or thing to another

random selection

Every member of the population has an equal chance of being chosen to be a member of the sample

The median of the numbers 3, 4, 6, 8, 9 is: a) 8 b) 7 c) 5 d) 6

Answer = D - 6

Assignment Bias

Factors that skew the results of a study

Statistical Hypotheses

Statements about population parameters

mean of difference of random variables

µ(X−Y) = µX − µY

Limitations of regression and correlation: 3

Correlation and least-squares regression lines are not resistant. They are affected by outliers.

Randomisation (to groups)

Helps remove Holding constant and matching across groups (extraneous variables)

explanatory

In an experiment, the ___ variable is the one that is manipulated.

interquartile range

Q3-Q1

Graph for displaying relationship between two quantitative variables

Scatterplot

Tail of the distribution

The section where the scores taper off toward one end of a distribution

What parametric test is analogous to the two-way chi-square?

There is no analogous parametric test

expected value/expectation

"mean" (of a random variable)

percentile value

% of scores

Calculating Probability

(# favorable outcomes)/(total # possible outcomes)

A recent study of business travelers claims they spend an average of $41.00 per day on meals. As a test of this claim, a random sampling of 16 business travelers found they had spent an average of $45.00 per day with a standard deviation of $3.65. What are the critical values for a two tailed t test of this claim with α=0.05?

+ or - 2.131

At a certain university, the average attendance at basketball games has been 3225. This year the attendance for the first 12 games has been 2815 with a standard deviation of 635. The athletic director claims that the attendance is the same as last year. If α=0.05 what are the critical values for this two tailed t test?

+ or - 2.201

How can we calculate the Margin of Error (ME) more accurately?

- By using the standard normal table to find the critical value (z*) such that a certain percent of the area beneath the standard normal curve is between -z* and z*

Sampling distribution for a sample mean

- If the standard deviation of the sampling distribution of y-bar is small, then the sampling distribution is closely concentrated about μ and y-bar will likely be close to μ - If the standard deviation of y-bar is large, then y-bar is more likely to be far from μ

Confidence interval for the population mean

- Important to keep in mind that, unlike with the population proportion for binary variables, the mean does NOT completely describe the population - Just as with proportions, inferences about a population mean will be based on a simple random sample from the population - We will estimate the population mean μ (a parameter) with the sample mean, y-bar (a statistic)

Two very important properties of simple random sampling:

- It yields unbiased estimates of population means and proportions - The variability of an estimator (p-hat or y-bar) can be estimated from the data (this is what standard error does)

Will a 90% confidence interval for p be wider or narrower than the 99% interval?

- Narrower - 90% confidence interval ---> z* = 1.645 - 99% confidence interval ---> z* = 2.576
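
The critical values on this card can be checked without a table. The following is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `z_star` is made up for illustration.

```python
from statistics import NormalDist


def z_star(confidence: float) -> float:
    """Return z* such that the central `confidence` area of the
    standard normal curve lies between -z* and z*."""
    tail = (1 - confidence) / 2          # area left in each tail
    return NormalDist().inv_cdf(1 - tail)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_star(level):.3f}")
# 90% confidence: z* = 1.645
# 95% confidence: z* = 1.960
# 99% confidence: z* = 2.576
```

A smaller z* gives a smaller margin of error, which is why the 90% interval is narrower than the 99% interval.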

HYPOTHESIS TEST FOR A MEAN

- P-values have the same meaning - n-1 = degrees of freedom

What should you always do to the degrees of freedom if the desired line is not found in the T table?

- Round down: We want to be conservative - going up in degrees of freedom would make the interval narrower

a. Randomization Condition

- The Sample should be a simple random sample from the population. - If an experiment, the subjects should have been randomly assigned to treatments.

1. Independence Assumption

- The individuals in the sample must be selected independently of each other. Often the assumption is checked considering the (a) Randomization condition and the (b) 10% condition.

Large P-value

- The sample result would not be unusual if the subject guessed and we would not be convinced the subject has ESP - Fail to reject Null Hypothesis

b. 10% Condition

- The sample size n should be no more than 10% of the population size. (sampling a large proportion of the population reduces sampling variability (for random samples))

Appropriate interpretation of the confidence interval for sample means

- We don't know whether or not this particular interval contains the true mean net weight μ, but 95 percent of intervals constructed in this way (from random samples) do contain μ

how do you define an outlier?

- an observation more than 2 sd away. -call an observation an outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile
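
As a rough illustration of the 1.5 x IQR rule, here is a small Python sketch. The data set is invented, and the quartiles come from `statistics.quantiles` with its default "exclusive" method; textbooks and calculators use slightly different quartile conventions, so fences can differ a little from hand calculations.

```python
from statistics import quantiles


def iqr_outliers(data):
    """Return the values lying more than 1.5 * IQR below Q1 or above Q3."""
    q1, _median, q3 = quantiles(data, n=4)   # three cut points: Q1, Q2, Q3
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 50]   # hypothetical scores
print(iqr_outliers(scores))  # [50]
```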

properties of a probability density function

- f(x) is continuous - f(x) is greater than or equal to 0 - the total area under f(x) is equal to 1 - it is not necessarily the case that f(x) is less than or equal to 1

basic rules of probability

- for any event A, 0 ≤ P(A) ≤ 1 - all possible outcomes together must have probabilities whose sum is 1 - if all outcomes in the sample space are equally likely, the probability that event A occurs can be found using P(A) = number of outcomes in A / total number of outcomes

At a certain university, the average attendance at basketball games has been 3225. Due to the dismal showing of the team this year, the attendance for the first 8 games has averaged only 2915 with a standard deviation of 535. The athletic director claims that the attendance is the same as last year. What is the test value needed to evaluate the claim?

-1.64

Using the z test, find the critical value for an α = 0.015 left-tailed test

-2.17

If the equation for the regression line is y' = 6x + 5, then the value x = -2 will result in a predicted value for y of

-7

compare a frequency histogram to a relative-frequency histogram

-almost identical -scale on vertical axis is different (frequency: count, relative frequency: proportion)

density curve

-always on or above the horizontal axis -has an area of one underneath the curve -describes the pattern of the distribution. area under the curve and above any interval of values on the horizontal axis is the proportion of all observations that fall in that interval median=equal-areas point mean=point at which the curve would balance

impossible event

-an event that cannot occur -has probability 0

certain event

-an event that must occur -has probability 1

what are the sampling schemes that match each of these: simple random sample, stratified random sample, and cluster sample?

-completely randomized design -randomized complete block design -no experimental design taught in this course

steps of taking a cluster sample

-divide the population into groups (clusters) -obtain a simple random sample of the clusters -use all members of the clusters obtained in the previous step

order the steps of stratified random sampling with proportional allocation

-divide the population into subpopulations called strata -from each stratum, obtain a simple random sample of size proportional to the size of the stratum -use all members obtained as the sample
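
The three steps can be sketched in Python as follows. The strata, member labels, and function name are all hypothetical; the sketch also assumes each stratum's share of the sample rounds cleanly, since in general rounding can make the total differ slightly from the requested size.

```python
import random


def stratified_sample(strata, total_size, seed=None):
    """Proportional-allocation stratified sample.

    strata: dict mapping stratum name -> list of members.
    """
    rng = random.Random(seed)
    population_size = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        # Step 2: sample size proportional to the stratum's size
        k = round(total_size * len(members) / population_size)
        sample.extend(rng.sample(members, k))
    # Step 3: everything drawn, taken together, is the sample
    return sample


# Step 1: population already divided into strata (made-up class years)
strata = {
    "freshman": [f"F{i}" for i in range(60)],
    "sophomore": [f"S{i}" for i in range(30)],
    "senior": [f"R{i}" for i in range(10)],
}
sample = stratified_sample(strata, total_size=10, seed=1)
print(len(sample))  # 10 (6 freshmen, 3 sophomores, 1 senior)
```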

other things to look at in residual plot

-individual points with large residuals are outliers because they lie outside the straight-line pattern -individual points that are extreme in the x direction may not have large residuals, but can be very important

inferential statistics

-methods for drawing and measuring the reliability of conclusions about a population based on information obtained from a sample of the population -only performed on populations -conclusions from the sample are inferred to the population -typically random and include at least one inference

the relationship between individuals randomly assigned to groups and individuals who are randomly selected

-randomly assigned and randomly selected: inference about pop: YES. Inference about cause and effect: YES - randomly assigned, but not randomly selected. inference about pop: NO. inference about cause and effect: YES - not randomly assigned, but randomly selected. Inferences about pop: YES. Inference about cause and effect: NO -not randomly assigned nor selected. Inference about pop: NO. inference about cause and effect: NO randomly assigned is with cause and effect. randomly selected is with population.

Bernoulli

-repeated trials of an experiment -each trial has two possible outcomes (s for success and f for failure) -trials are independent -the probability of a success (the success probability), denoted p, remains the same from trial to trial

designed experiment

-researchers impose treatments and controls and then observe characteristics and take measurements -can help establish causation

observational study

-researchers simply observe characteristics and take measurements -can reveal only association

representative sample

-sample that reflects as closely as possible the relevant characteristics of the population

binomial distribution

-the distribution that provides a formula for finding the probabilities associated with the number of successes in a sequence of n independent Bernoulli trials, each having the same probability of success, p
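
For reference, the formula the card alludes to is P(X = k) = C(n, k) p^k (1 - p)^(n - k). A minimal Python sketch, with a made-up example of 10 fair coin tosses:

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent Bernoulli
    trials, each with success probability p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


# Hypothetical example: exactly 5 heads in 10 tosses of a fair coin
print(binomial_pmf(5, 10, 0.5))  # 0.24609375
```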

binomial random variable

-the random variable X that has a binomial distribution with parameters n and p

A probability experiment consists of rolling a sixteen​-sided die and spinning the spinner shown at the right. The spinner is equally likely to land on each color. Use a tree diagram to find the probability of the given event. Then tell whether the event can be considered unusual. ​Event: rolling a 9 and the spinner landing on green

.016. Yes, because the probability is 0.05 or less.

Use the bar graph below, which shows the highest level of education received by employees of a company, to find the probability that the highest level of education for an employee chosen at random is E.

.061

A probability experiment consists of rolling an eight-sided die and spinning the spinner shown at the right. The spinner is equally likely to land on each color. Use a tree diagram to find the probability of the given event. Then tell whether the event can be considered unusual. Event: rolling a number less than 3 and the spinner landing on green

.063. No, because the probability is not close enough to 0.

Two cards are selected from a standard deck of 52 playing cards. The first card is not replaced before the second card is selected. Find the probability of selecting a spade and then selecting a diamond.

.0637
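
The answer follows from the multiplication rule for dependent events: P(spade) times P(diamond given a spade was removed). A short check with exact fractions:

```python
from fractions import Fraction

# First card: 13 spades among 52 cards.
p_spade = Fraction(13, 52)
# Second card: 13 diamonds remain among the 51 cards left (no replacement).
p_diamond_given_spade = Fraction(13, 51)

p_both = p_spade * p_diamond_given_spade
print(p_both, round(float(p_both), 4))  # 13/204 0.0637
```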

A coin is tossed and a six​-sided die numbered 1 through 6 is rolled. Find the probability of tossing a head and then rolling a number greater than 5.

.083

What is a medium effect for r²

.09

A probability experiment consists of rolling a fair 10​-sided die. Find the probability of the event below. rolling a 4

.1

A company is conducting a survey to determine how prepared people are for a​ long-term power​ outage, natural​ disaster, or terrorist attack. The frequency distribution on the right shows the results. Use the table to answer the following question. What is the probability that the next person surveyed is very​ prepared?

.115

The probability that an event will not happen is P(E') = 0.84. Find the probability that the event will happen.

.16

A probability experiment consists of rolling a fair 10​-sided die. Find the probability of the event below. rolling a number greater than 8

.2

What is a small effect for estimated Cohen's d

.2

Use the pie chart at the​ right, which shows the number of tulips purchased from a nursery. Find the probability that a tulip bulb chosen at random is yellow.

.25

What is a large effect for r²

.25

A doctor gives a patient an 80% chance of surviving bypass surgery after a heart attack. If the patient survives the surgery, then the patient has a 55% chance that the heart damage will heal. Find the probability that the patient survives the surgery and the heart damage heals.

.44

What is the probability that a registered voter voted in the​ election?

.459

A probability experiment consists of rolling a fair 15​-sided die. Find the probability of the event below. rolling a number divisible by 2

.467

A physics class has 40 students. Of​ these, 18 students are physics majors and 17 students are female. Of the physics​ majors, two are female. Find the probability that a randomly selected student is female or a physics major.

.825
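
This is the addition rule for events that are not mutually exclusive: P(A or B) = P(A) + P(B) - P(A and B). A quick check of the arithmetic (variable names are illustrative):

```python
from fractions import Fraction

students = 40
physics_majors = 18
females = 17
female_physics_majors = 2   # counted in both groups, so subtract once

p_female_or_major = Fraction(
    physics_majors + females - female_physics_majors, students
)
print(p_female_or_major, float(p_female_or_major))  # 33/40 0.825
```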

Use the pie chart at the​ right, which shows the number of workers​ (in thousands) by industry for a certain country. Find the probability that a worker chosen at random was not employed in the manufacturing industry.

.891

Use the frequency distribution to the​ right, which shows the number of voters​ (in millions) according to​ age, to find the probability that a voter chosen at random is in the given age range. not between 18 to 20 years old

.942

A random number generator is used to select a number from 1 to 200 ​(inclusively). What is the probability of selecting the number 238​?

0

If a sample mean has a value equal to µ, the corresponding value of t will be equal to

0.0

According to Beautiful Bride magazine the average age of a groom is now 26.2 years. A sample of 16 prospective grooms in Chicago revealed that their average age was 26.6 years with a standard deviation of 5.3 years. What is the test value for a t test of claim?

0.30

If the probability of a type II error in a hypothesis test is β = 0.30 and α = 0.05, the power of this test is

0.70

An educational psychologist is interested in determining whether attitudes toward school change with age. She randomly selects 30 seventh-graders, 30 ninth-graders, and 30 eleventh-graders and administers a "Do you like school?" test. How many factors does this experiment have?

1

How many degrees of freedom do you lose for each sample

1

Determine the number of outcomes in the event. Decide whether the event is a simple event or not. A computer is used to randomly select a number between 1 and 9, inclusive. Event A is selecting a number greater than 8.

1 outcome. Yes, because event A has exactly one outcome.

Steps to calculate midpoint

1) Add lower and upper limits together 2) Divide by 2

Assumptions for independent sample t test

1) Random sampling if possible, but definitely random assignment for a true experiment 2) independent observations 3) assume that the scores are coming from a normally distributed population 4) homogeneity of variance

Steps to calculating the t-statistic

1) calculate the mean for the sample 2) calculate the SD (s) for the sample 3) calculate the estimated standard error of M for the sample 4) calculate t
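
The four steps map directly onto a few lines of Python. This is a sketch with an invented sample and an assumed hypothesized mean of 4; `stdev` is the sample standard deviation (n - 1 in the denominator).

```python
from math import sqrt
from statistics import mean, stdev


def one_sample_t(sample, mu):
    """One-sample t statistic following the four steps on the card."""
    m = mean(sample)                 # 1) sample mean M
    s = stdev(sample)                # 2) sample SD s (divides by n - 1)
    se = s / sqrt(len(sample))       # 3) estimated standard error of M
    return (m - mu) / se             # 4) t = (M - mu) / SE


scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical scores
t = one_sample_t(scores, mu=4)
print(round(t, 3))  # 1.323
```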

Steps to SPSS for one sample t-test

1) click analyze, compare means, one sample t test 2) highlight the variable that you want to test and move it into the test variable box 3) change the test value to the population mean 4) press okay 5) interpreting

Decision process of determining significance for independent sample t test using SPSS

1) determine equality of variances 2) find row 3) after you determine what row to look in, it is no different than reading a one sample t-test

Three things that you need to know about z scores and correlation

1) if you were to figure out the z-scores for each number in each variable, when the z-scores match perfectly in terms of sign and value you have a perfect positive (+1.00) correlation 2) if the z-scores match perfectly in terms of value but have opposite signs, you have a perfect negative (-1.00) correlation 3) to the extent that the signs and values do not match, the correlation will fall towards zero. If they are mostly matching up in terms of signs, then you have a pretty positive correlation. If very few of them are matching up in terms of sign and value, then you have a correlation that is approaching zero.
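
The link between z-scores and r can be made concrete: with sample standard deviations, Pearson r equals the sum of the paired z-score products divided by n - 1. A small sketch with made-up data:

```python
from statistics import mean, stdev


def z_scores(values):
    """Convert each value to a z-score using the sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def pearson_r(x, y):
    """Pearson r as the sum of z-score products divided by n - 1."""
    zx, zy = z_scores(x), z_scores(y)
    return sum(a * b for a, b in zip(zx, zy)) / (len(x) - 1)


x = [1, 2, 3, 4, 5]
print(round(pearson_r(x, [2, 4, 6, 8, 10]), 2))   # 1.0  (z-scores match in sign and value)
print(round(pearson_r(x, [10, 8, 6, 4, 2]), 2))   # -1.0 (same values, opposite signs)
```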

What are the assumptions that are similar in one sample and independent sample t test

1) independent observations, 2) normal distribution

What two criteria must you meet to perform a correlation

1) interval or ratio data 2) relationship has to be linear

Three issues to be aware of with the Pearson r correlation

1) issue of causality 2) Outliers 3) Restricted range

Function of the estimated standard error of the difference between means

1) it measures the distance between (M1-M2) and (μ1 - μ2) 2) measures the standard, or average, size of the difference (M1-M2) if the null hypothesis is true. That is, it measures how much difference is reasonable to expect between two sample means.

Assumptions for the one sample t test

1) population sampled must be normally distributed 2) random sampling 3) independent observations

Factors that influence the SE

1) sample variance, 2) sample size

Steps to t-statistic hypothesis test

1) state the hypotheses, 2) locate the critical region, 3) compute t-statistic, 4) make a decision

Three characteristics of a correlation

1) the direction of the relationship (sign), 2) the form of the relationship (we will be using Pearson r which measures linear), 3) the strength or consistency of the relationship (the number)

how to make a residual plot

1. Determine LSRL using LinReg(a+bx), L1 L2 and Y1. Store equation into Y1. 2. In L3, calculate the residuals using the formula L2 - Y1(L1) 3. in the Y= window, deactivate the LSRL and activate the line y=0 4. in a statplot, create a scatter plot with L1 and L3 5. graph: zoomstat 6. check for any obvious patterns that would suggest the linear model is not appropriate.
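
The calculator steps above have a direct Python analogue: fit the least-squares line, then compute residual = observed y minus predicted y for each point. This sketch uses invented data and stops short of plotting; the (x, residual) pairs are what would go on the residual plot.

```python
from statistics import mean


def least_squares(x, y):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


def residuals(x, y):
    """Observed minus predicted y (the analogue of L2 - Y1(L1))."""
    a, b = least_squares(x, y)
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


x = [1, 2, 3, 4, 5]          # hypothetical explanatory values (L1)
y = [2, 4, 5, 4, 5]          # hypothetical response values (L2)
a, b = least_squares(x, y)
res = residuals(x, y)        # plot these against x and look for patterns
print(round(a, 2), round(b, 2))        # 2.2 0.6
print([round(r, 2) for r in res])      # [-0.8, 0.6, 1.0, -0.6, -0.2]
```

Residuals from a least-squares line always sum to zero, so a useful residual plot scatters around the line y = 0 with no visible curve or fan shape.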

Notes about simulated sampling

1. Distributions become centered near the true proportion 2. As the number of simulations increases, the variability in the distribution decreases 3. As sample size increases, the sampling distribution becomes more "normal"

Rules for grouped frequency tables

1. Have a total of about 5-15 intervals 2. Try to avoid having intervals with no scores, but if an interval does have no scores it still must be shown 3. All intervals MUST be the same width 4. Interval width is usually an integer 5. The lower number of each interval MUST be evenly divisible by the interval width

Procedure for hypothesis testing

1. Hypothesis - identify population parameter - state null and alternative hypotheses 2. Data collection - collect data from a random sample - see whether random sample data tend to support or refute null hypothesis 3. Model and Test Statistic - use appropriate z/t test - check conditions (nearly normal, mean p0, std) - compute test statistic - how many standard deviations our observed p-hat is from what we would expect if null was true 4. Check assumptions - independence assumption - sample size assumption - success/failure condition 5. Mechanics - assumptions satisfied, proceed with calculation of test statistic and p value 6. Conclusion - reject or fail to reject the null hypothesis

How large is sufficiently large for the normal approximation to be accurate? It depends on the population distribution (means)

1. If the population distribution is fairly symmetric, then the sampling distribution of y-bar is approximately normal even for samples of size 5 or so 2. If the population distribution is extremely skewed or there are large outliers, then the sampling distribution of y-bar may not be accurately approximated by a normal model unless the sample size is large, around 30 or more. - The more skewed the population distribution, the larger the sample size needed for the normal model to be a good approximation

Assumptions and Conditions of the normal-based confidence interval:

1. Independence Assumption a. Randomization Condition b. 10% Condition 2. Sample Size Assumption

Assumptions and Conditions for using the normal model to approximate the sampling distributions for p hat

1. Independence assumption a. Randomization condition b. 10% condition 2. Sample size assumption

Properties of t-distributions

1. They are symmetric about 0 and mound-shaped (like the standard normal distribution) 2. The t-distribution has wider tails than the normal distribution to account for the extra variability in estimating sigma 3. The degrees of freedom completely specify which t-distribution to use 4. As degrees of freedom increase, the t-distribution gets closer to the standard normal curve

Geometric Distribution (Conditions)

1. The trial is repeated until a success occurs. 2. Repeated trials are independent of each other. 3. The probability of success p is the same for each trial. 4. The random variable x represents the number of the trial in which the first success occurs. P(x) = p·q^(x−1), where q = 1 − p
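
A minimal Python sketch of the geometric formula on this card, with q = 1 - p. The example probability 0.25 is made up.

```python
def geometric_pmf(x: int, p: float) -> float:
    """Probability that the first success occurs on trial x."""
    q = 1 - p                    # probability of failure on any one trial
    return p * q ** (x - 1)      # x - 1 failures, then one success


# Hypothetical example: success probability 0.25, first success on trial 3
print(geometric_pmf(3, 0.25))  # 0.140625
```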

finding a normal proportion/probability

1. state the problem in terms of the variable x. write a probability statement 2. standardize the variable into a z-score 3. restate the problem in terms of z. 4. draw the standard normal curve and label. shade the area in question 5. find the required area using table A or our calculator. 6. answer the question in context
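
Steps 2 and 5 can be done in Python instead of with Table A. The sketch below uses `statistics.NormalDist` from the standard library; the scenario (scores with mean 100 and standard deviation 15, asking for the proportion below 130) is an assumed example, not from the card.

```python
from statistics import NormalDist


def normal_area_below(x: float, mu: float, sigma: float):
    """Standardize x, then return (z, area to the left of z)."""
    z = (x - mu) / sigma                 # step 2: z-score
    area = NormalDist().cdf(z)           # step 5: area under the standard normal curve
    return z, area


z, area = normal_area_below(130, mu=100, sigma=15)
print(z, round(area, 4))  # 2.0 0.9772
```

For an area to the right of x, subtract the result from 1; for an area between two values, subtract the two left-hand areas.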

How to make a scatterplot

1.) Decide which variable should go on each axis 2.) Label and scale your axes - Don't start at (0,0) - Start scale to highlight main body of points 3.) Title your plot 4.) Plot individual data values

Procedure with 2 variable statistics

1.) Plot data and calculate numerical summaries 2.) Look for overall patterns and deviations from those patterns 3.) When there's a regular overall pattern, use a simplified model to describe it

the area under each chi square distribution is equal to

1.00

What value of z(α/2) is used in the confidence interval shown below (90%=μ)

1.65

At a certain university, the average cost of books was $320 per student last semester and the population standard deviation was $85. This semester a sample of 40 students revealed an average cost of books of $345 per student. The dean of students believes that the costs are greater this semester. What is the test value for this hypothesis?

1.86
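
The test value here is a z statistic because the population standard deviation is given: z = (sample mean - hypothesized mean) / (σ / √n). A quick check in Python (the function name is illustrative):

```python
from math import sqrt


def z_test_value(sample_mean, mu0, sigma, n):
    """z test value for a mean when the population SD is known."""
    return (sample_mean - mu0) / (sigma / sqrt(n))


print(round(z_test_value(345, 320, 85, 40), 2))  # 1.86
```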

Confidence level:95%

1.96

How many different 10-letter words (real or imaginary) can be formed from the following letters? T, N, A, A, D, E, N, D, T, N

10!/(2! × 3! × 2! × 2! × 1!) = 75,600
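
The count is n! divided by the factorial of each letter's repeat count (here T twice, N three times, A twice, D twice, E once). A short Python sketch that does the counting automatically:

```python
from collections import Counter
from math import factorial, prod


def distinct_arrangements(letters: str) -> int:
    """Number of distinguishable orderings of letters with repeats."""
    counts = Counter(letters)                        # how often each letter occurs
    repeats = prod(factorial(c) for c in counts.values())
    return factorial(len(letters)) // repeats


print(distinct_arrangements("TNAADENDTN"))  # 75600
```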

What is the value of y' when x = 3 if the equation of the regression line is y' = 23.1 - 3.8x?

11.7

example 2

A farmer wishes to test the effects of a new fertilizer on her tomato yield. She has four equal-sized plots of land: one with sandy soil, one with rocky soil, one with clay-rich soil, and one with average soil. She divides each of the four plots into three equal-sized portions and randomly labels them A, B, and C. The four A portions of land are treated with her old fertilizer. The four B portions are treated with the new fertilizer, and the four C portions are treated with no fertilizer. At harvest time, the tomato yield is recorded for each section of land. a) Identify the experimental units. Dirt (the portions of land) b) What is the treatment in this experiment? Fertilizer c) What is the response variable in this experiment? Tomato yield d) How many levels does the treatment in this experiment have? 3: old fertilizer, new fertilizer, and no fertilizer e) What type of experimental design is this? (random, block, matched-pairs, or single-blind) Block: four different soil types

Confidence level: 99%

2.58

Identify the sample space of the probability experiment and determine the number of outcomes in the sample space. Randomly choosing a multiple of 4 between 20 and 40 comma inclusive

20,24,28,32,36,40 there are 6 outcomes

find percentage

Given the following table, where people were asked which of 3 paintings they liked the best, create a relative frequency distribution for men and for women (round to the hundredths place):
             Men   Women
Painting A    38    15
Painting B    25    31
Painting C    10    12
Total         73    58
Relative frequency for men (total 73): A = 38/73 = 0.52, B = 25/73 = 0.34, C = 10/73 = 0.14
Relative frequency for women (total 58): A = 15/58 = 0.26, B = 31/58 = 0.53, C = 12/58 = 0.21
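
The same relative frequencies can be reproduced with a few lines of Python, dividing each count by its column total and rounding to two places (the dictionary layout is just one convenient way to hold the counts):

```python
def relative_frequencies(counts):
    """Divide each count by the total and round to the hundredths place."""
    total = sum(counts.values())
    return {category: round(n / total, 2) for category, n in counts.items()}


men = {"A": 38, "B": 25, "C": 10}
women = {"A": 15, "B": 31, "C": 12}

print(relative_frequencies(men))    # {'A': 0.52, 'B': 0.34, 'C': 0.14}
print(relative_frequencies(women))  # {'A': 0.26, 'B': 0.53, 'C': 0.21}
```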

a researcher reports a one-sample t-test with df = 24, how many individuals participated in this study?

25

What are the degrees of freedom for an independent samples t-test that uses one sample with n = 13 and one sample with n = 15?

26

A report states that 42% of home owners has a vegetable garden. How large a sample is needed to estimate the true proportion of home owners who have vegetable gardens to within 6% with 95% confidence?

260
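
The usual sample-size formula for a proportion is n = p̂ q̂ (z / E)², rounded up to the next whole person. A sketch in Python, assuming the table values z = 1.96 for 95% confidence and z = 2.58 for 99% confidence:

```python
from math import ceil


def sample_size_for_proportion(p_hat, margin, z):
    """Minimum n to estimate a proportion to within `margin`,
    given a prior estimate p_hat and critical value z."""
    q_hat = 1 - p_hat
    return ceil(p_hat * q_hat * (z / margin) ** 2)   # always round up


print(sample_size_for_proportion(0.42, 0.06, 1.96))  # 260
```

The same function with p̂ = 0.26, E = 0.06, and z = 2.58 gives 356, matching the remarried-parents card.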

A restaurant offers a​ $12 dinner special that has 6 choices for an​ appetizer, 12 choices for an​ entrée, and 4 choices for a dessert. How many different meals are available when you select an​ appetizer, an​ entrée, and a​ dessert?

288

A college believes that 26% of applicants to that school have parents who have remarried. How large a sample is needed to estimate the true proportion of students who have parents who have remarried to within 6 percentage points with 99% confidence?

356

A study of 35 white mice showed that their average weight was 4.2 ounces. The standard deviation of the population is 0.6 ounces. Which of the following is the 98% confidence interval for the mean weight per white mouse.

3.96<μ<4.44

Identify the degree of confidence displayed in the confidence interval shown below (α/2 = 0.05 and α/2 = 0.05)

90%

Explain how to use random number assignment. Use the row of numbers to generate 12 random numbers between 01-99: 78086 85201 etc.

99 is a two digit number. Break up the numbers in twos: 78 | 08 | 68 | 52 | 01 | Numbers that start with zero are singular. Numbers that end with a single digit pick up the second digit from the next number.
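
The splitting rule amounts to ignoring the spaces in the table row and reading the digits two at a time. A minimal sketch (the function name is made up, and it only covers the two groups shown on the card):

```python
def two_digit_labels(row: str):
    """Read a row of a random digit table as consecutive two-digit labels."""
    digits = "".join(ch for ch in row if ch.isdigit())   # drop the spaces
    # A leftover single digit at the end is dropped; in a real table it
    # would pair with the first digit of the next group.
    return [digits[i:i + 2] for i in range(0, len(digits) - 1, 2)]


print(two_digit_labels("78086 85201"))  # ['78', '08', '68', '52', '01']
```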

If the null hypothesis H0: μ = 14.0 is rejected at α = 0.01 when a mean of 19 is obtained from a random sample, one could also say that the

99% confidence interval for the population mean does not contain the value 14.

For Factor A, you have timed the rate at which participants can solve puzzles under three conditions of noise: high, medium, and low. In addition, for Factor B the participants have received either no caffeine or a high level of caffeine (equivalent to 4 cups of coffee). What kind of design do you have?

A 3 × 2 between-subjects, factorial design

Is it a simple event? tossing heads and rolling a 3

A = {H3} Event A has one outcome, it is simple

Patricia has collected data on 30 individuals. She measured the pulse rate of 15 individuals after they had watched a violent film clip. She measured the pulse rate of the other 15 people after they had watched a non-violent film clip. Because she knows her study does not meet the assumptions for a two-tailed t-test, what nonparametric test should she run?

A Mann-Whitney U

pie chart

A ___ is a circle divided into sectors. Each sector represents a category of data. The area of each sector is proportional to the frequency of the category.

census

A ___ is a list of all individuals in a population along with certain characteristics of each individual.

sample

A ___ is a subset of the population that is being studied.

placebo

A ___ is an innocuous medication, such as a sugar tablet, that looks, tastes, and smells like the experimental medication.

bar graph

A ___ is constructed by labeling each category of data on either the horizontal or vertical axis and the frequency or relative frequency on the other axis. The rectangles all have the same width and do not touch each other.

frequency distribution

A ___ lists each category of data and the number of occurrences for each category of data.

relative frequency distribution

A ___ lists each category of data with the relative frequency.

cluster

A ___ sample is obtained by selecting all individuals within a randomly selected collection or group of individuals.

stratified

A ___ sample is obtained by separating the population into homogeneous, non overlapping groups, and then obtaining a simple random sample from within each group.

convenience

A ___ sample is one in which the individuals in the sample are easily obtained. (These results should be looked upon with skepticism.)

control group

A ___ serves as a baseline treatment that can be used to compare to other treatments.

confounding

A ___ variable in a study occurs when the effect of two or more explanatory variables is not separated. Therefore, any relation that may exist between an explanatory variable and the response variable may be due to some other variable or variables not accounted for in the study. When the effects of the explanatory variable upon the response variable cannot be determined, then ___ has occurred.

continuous

A ___ variable is a quantitative variable that has an infinite number of possible values it can take on and can be measured to any desired level of accuracy.

discrete

A ___ variable is a quantitative variable that has either a finite number of possible values or a countable number of possible values.

confounding

A ___ variable is an explanatory variable that was considered in a study whose effect cannot be distinguished from other explanatory variable(s) in the study.

lurking

A ___ variable is an explanatory variable that was not considered in a study, but affects the value of the response variable in the study.

census

A census is a list of all individuals in a population along with certain characteristics of each individual.

Negative correlation

A correlation where the two variables tend to go in opposite directions: as the X variable increases, the Y variable decreases. Also known as an inverse relationship. The sign is separate from the strength, so (-1.00) indicates a perfectly consistent relationship just like (+1.00). On a scatterplot, the points run from upper left to lower right.

Frequency Distribution, when is it uniform?

A frequency distribution is uniform when all​ entries, or​ classes, in the distribution have equal or approximately equal frequencies

Inverse correlation

A higher level of one variable is associated with a lower level of the other variable Variables are moving in differing directions

Histogram

A histogram is constructed by drawing rectangles for each class of data. The height of each rectangle is the frequency or relative frequency of the class. The width of each rectangle is the same and the rectangles touch each other

normal z-scores

A normal probability plot is a graph that plots the observed values on the x-axis and the ___ on the y-axis.

resistant

A numerical summary of data is said to be ___ if extreme values (very large or small) relative to the data do not affect its value.

If you are interested in how well students perform on a standardized math achievement test after they have completed a six-week math unit in either a computer-assisted class, a videotaped course, or a regular classroom, what kind of design do you have?

A one-way design

pie chart

A pie chart is a circle divided into sectors. Each sector represents a category of data. The area of each sector is proportional to the frequency of the category

Buffering effect

A process in which a psychological resource reduces the impact of life stress on psychological wellbeing. Having a resource contributes to adjustment because persons are less affected by negative life events

Residual plot

A residual plot is a scatterplot of the residuals against the explanatory variable. Residual plots help us assess how well a regression line fits the data.

Independent sample t test

A statistical test of whether two groups differ from each other. Each sample is a different group of people. Instead of comparing a sample against the population, you are comparing samples to other samples. Used when you do not have the population means, but instead are comparing sample mean to sample mean. The data are analyzed by taking the difference between the sample means and determining whether that difference is large enough to be significant.

Correlation matrix

A table that is used to report multiple correlations when you have several variables and the correlations between all possible pairings. Footnotes are used to indicate which correlations are significant

unusual event

A(n) ___ is an event that has a low probability of occurring.

Quasi - experiment

An empirical interventional study used to estimate the causal impact of an intervention on a target population without random assignment - No random allocation, control group

What is the difference between an outcome and an​ event?

An outcome is the result of a single probability experiment. An event is a set of one or more possible outcomes.

Outlier (definition)

An outlier is an observation that lies outside the overall pattern of other observations. Points that are outliers in the y direction but not the x direction of a scatterplot have large residuals. Other outliers (large in x direction but not y direction) may not have large residuals.

Pooled variance

An unbiased measure. Used to calculate the standard error of the difference between means. It is the method for correcting the bias that would happen if two unequal-sized samples were treated as equal in the final calculations. It allows the larger sample to carry more weight in determining the final value, thus eliminating the bias.
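
The weighting described here is usually written as sp² = ((n1 - 1)s1² + (n2 - 1)s2²) / (n1 + n2 - 2), with the estimated standard error of the difference equal to √(sp²/n1 + sp²/n2). A Python sketch with made-up sample sizes and variances:

```python
from math import sqrt


def pooled_variance(n1, var1, n2, var2):
    """Average of two sample variances, weighted by degrees of freedom."""
    return ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)


def se_of_difference(n1, var1, n2, var2):
    """Estimated standard error of (M1 - M2) using the pooled variance."""
    sp2 = pooled_variance(n1, var1, n2, var2)
    return sqrt(sp2 / n1 + sp2 / n2)


# Hypothetical samples: n = 10 with variance 4, n = 20 with variance 10
print(round(pooled_variance(10, 4, 20, 10), 2))   # 8.07 (pulled toward the larger sample)
print(round(se_of_difference(10, 4, 20, 10), 2))  # 1.1
```

With equal sample sizes the pooled variance is simply the ordinary average of the two variances; the weighting only matters when the samples differ in size.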

A market researcher obtains a sample of 50 people by standing outside a store and asking every 20th person who enters the store to fill out a survey until she has 50 people. What sampling method is being used here? Will the resulting sample be a random sample? Will it be a simple random sample? Explain your thinking.

Answer =

A researcher randomly assigned 50 students to take a test in a hot room and 50 students to take the same test in a cold room. The researcher recorded the percent correct on the test. The researcher gave each group 50 minutes to take the test. Which of the following is a correct statement? a) The independent variable is the temperature of the room -- hot vs. cold. b) The independent variable is the percent correct on the test. c) The independent variable is that each group had 50 minutes to take the test. d) There are no independent variables in this study. There are only subject variables.

Answer =

Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate. 1) The subjects in which college students major. A) Ratio B) Ordinal C) Nominal D) Interval 2) Amount of fat (in grams) in cookies. A) Nominal B) Interval C) Ordinal D) Ratio

Answer =

Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate. The sample of spheres categorised from softest to hardest. A) Ordinal B) Nominal C) Ratio D) Interval 5) Temperatures of the ocean at various depths. A) Interval B) Ordinal C) Nominal D) Ratio

Answer =

Identify the number as either continuous or discrete. 1) The total number of phone calls a sales representative makes in a month is 425. A) Continuous B) Discrete 2) The number of limbs on a 2-year-old oak tree is 21. A) Continuous B) Discrete

Answer =

Identify the sample and population. Also, determine whether the sample is likely to be representative of the population. In a poll of 50,000 randomly selected college students, 74% answered "yes" when asked "Do you have a television in your dorm room?".

Answer =

If distribution A is mesokurtic, which of the distributions is platykurtic? a) Distribution B b) Distribution C c) Distribution D d) None of the distributions are platykurtic.

Answer =

If we want to generalise what we know about a sample to a population, we must ensure that a) the sample is sufficiently large. b) the sample is selected in a random manner from the population. c) the sample consists of the entire population. d) Answers A and B.

Answer =

In a choice reaction time experiment, one of four lights comes on. If the first light comes on, the participant is to press the first of four keys. If the second light comes on, the participant is to press the second of four keys, and so on. The time between when the light comes on and when the key is pressed is recorded. What is the level of measurement for this study? a) Nominal b) Ordinal c) Interval d) Ratio

Answer =

In a personality inventory, people are asked to rate how much they agree with a given statement. If they strongly agree with the statement, they circle the number 1. If they agree with the statement, they circle the number 2. If they disagree with the statement, they circle the number 3. If they strongly disagree with the statement, they circle the number 4. What is the level of measurement for this inventory? a) Nominal b) Ordinal c) Interval d) Ratio

Answer =

The median is another name for a) the difference of the first and third quartiles. b) the 50th percentile. c) the mean. d) the square root of variance.

Answer =

Use critical thinking to address the key issue. A researcher wished to gauge public opinion on gun control. He randomly selected 1000 people from among registered voters and asked them the following question: "Do you believe that gun control laws which restrict the ability of Americans to protect their families should be eliminated?". Identify the abuse of statistics and suggest a way the researcher's methods could be improved.

Answer =

Use critical thinking to develop an alternative conclusion. A study shows that adults who work at their desk all day weigh more than those who do not. Conclusion: Desk jobs cause people to gain weight.

Answer =

Validity a) is the extent to which a measurement is repeatable. b) is the extent to which a measurement measures what it claims to. c) can only be high if reliability is also high. d) Both answers B and C.

Answer =

What are examples of counterbalancing?

Answer =

What is the difference between an extraneous variable and a confounding variable?

Answer =

What is the median for the following data: 3 4 6 7 8 8 a) 8 b) 6.5 c) 6 d) 6 and 7

Answer =

Which distribution has a mean that is larger than the median? a) Distribution A b) Distribution B c) Distribution C d) Distribution D

Answer =

Which of the distributions is positively skewed? a) Distribution A b) Distribution B c) Distribution C d) Distribution D

Answer =

Which of the following does NOT fit the specifications of a within-subject design? a) Research subjects are tested only once. b) Research subjects are given more than one treatment. c) The performance of research subjects is compared across treatments. d) Research subjects complete an experiment multiple times.

Answer =

Which of the following experiments would warrant a within-subject design? a) An experiment where the research subjects' past experiences may influence the results. b) An experiment where many people of comparable backgrounds are participating. c) An experiment where the researcher has a very large pool of research subjects. d) An experiment where a skill that will improve with practice is being tested.

Answer =

Which of the following scenarios is a within-subject design? a) Research subjects are randomly assigned to one of two treatment conditions. b) All of the research subjects are exposed to every treatment condition. c) Research subjects are randomly assigned to an experimental or control group in a double-blind study. d) Two separate groups of research subjects are exposed to two different treatment conditions.

Answer =

You should use the range as a measure of dispersion when a) the variable has a nominal scale. b) the variable has a ratio scale. c) you are presenting the results to people with little to no knowledge of statistics. d) the distribution is skewed.

Answer =

This scenario applies to Questions 1 to 3: A randomised experiment was done by randomly assigning each participant either to walk for half an hour three times a week or to sit quietly reading a book for half an hour three times a week. At the end of a year the change in participants' blood pressure over the year was measured, and the change was compared for the two groups. 1. This is a randomised experiment rather than an observational study because: a) Blood pressure was measured at the beginning and end of the study. b) The two groups were compared at the end of the study. c) The participants were randomly assigned to either walk or read, rather than choosing their own activity. d) A random sample of participants was used. 2. The two treatments in this study were: a) Walking for half an hour three times a week and reading a book for half an hour three times a week. b) Having blood pressure measured at the beginning of the study and having blood pressure measured at the end of the study. c) Walking or reading a book for half an hour three times a week and having blood pressure measured. d) Walking or reading a book for half an hour three times a week and doing nothing. 3. If a statistically significant difference in blood pressure change at the end of a year for the two activities was found, then: a) It cannot be concluded that the difference in activity caused a difference in the change in blood pressure because in the course of a year there are lots of possible confounding variables. b) Whether or not the difference was caused by the difference in activity depends on what else the participants did during the year. c) It cannot be concluded that the difference in activity caused a difference in the change in blood pressure because it might be the opposite, that people with high blood pressure were more likely to read a book than to walk. d) It can be concluded that the difference in activity caused a difference in the change in blood pressure because of the way the study was done.

Answer = 1. C 2. A 3. D

What are examples of non-experimental designs?

Answer = Cross-sectional - Single snapshot in time (one observation), e.g. IQ; used to compare different cohorts. Longitudinal - Multiple observations over time; helps with generational differences, but is costly and prone to dropouts and repeated-measures problems (carry-over effects, practice effects).

What are the types of observation?

Answer = Naturalistic: Real life (anthropology). Participant observation: Researcher is part of the situation. Contrived: Situation set up by an experimenter.

What are some descriptive research strategies?

Answer = Observations, surveys, case studies (descriptions)

What are the pros and cons of correlational research?

Answer = Pros: High external validity, no manipulation. Cons: Low internal validity (don't know cause and effect); can't say which variable is the IV or the DV; third-variable problem; direction of the association is unknown; the association may be spurious (false).

Between group designs Pros & Cons

Answer = Pros: No bleed-through or carry-over effects, no sequence effects. Cons: Assignment bias

Within group designs (repeated measures) Pros & Cons

Answer = Pros: Reduces error variance (controls for individual differences). Cons: Fatigue, carry-over effects, practice effects (participants get better at the test); counterbalancing is used to address these.

What are the observation scoring methods?

Answer = Frequency, duration, interval

What are some examples of when a quasi-experiment is used?

Answer = Comparing genders, smokers vs. non-smokers, fathers vs. mothers, coffee drinkers vs. non-coffee drinkers

Numbers of errors on a test of English comprehension for ten individuals = 5; 10; 2; 0; 8; 12; 7; 6; 0; 9. The most frequent score is: a) 10 b) 2 c) 0 d) 12

Answer = C - 0 Notes: 0 is the only score that occurs twice; every other score occurs once.
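
A quick tally of the ten scores shows which value occurs most often. This is a small sketch using Python's standard library; the variable names are chosen just for the illustration.

```python
from collections import Counter

errors = [5, 10, 2, 0, 8, 12, 7, 6, 0, 9]

# Count how many times each score appears, then take the most frequent one
counts = Counter(errors)
most_common_score, frequency = counts.most_common(1)[0]

print(most_common_score, frequency)  # 0 2
```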

The ages of all patients in the isolation ward of the hospital are 38, 26, 13, 41 and 22. What is the population variance? a) 106.8 b) 91.4 c) 240.3 d) 42.4 e) None of the above

Answer = A - 106.8
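
The calculation can be checked with a short Python sketch. Because the five patients are the whole population, the sum of squared deviations is divided by N rather than N - 1; the sample version is shown only for contrast.

```python
from statistics import pvariance, variance

ages = [38, 26, 13, 41, 22]

# Mean is 28; squared deviations are 100, 4, 225, 169, 36 (sum 534)
population_var = pvariance(ages)  # 534 / 5
sample_var = variance(ages)       # 534 / 4, for comparison only

print(population_var)  # 106.8
print(sample_var)      # 133.5
```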

The following items are from a scale that measures reasons for doing Voluntary Work, for which responses are scored Disagree=1, Not Certain=2, and Agree=3. Higher scores on the voluntary work scale are designed to measure unselfish reasons for carrying out voluntary work. Using this information, what would be the total score given to someone who agreed with question 1 and disagreed with question 2? 1. I (would) do voluntary work to put something back into society 2. My doing voluntary work will (would) look good on my Curriculum Vitae [CV] a) 6 b) 2 c) 4 d) 3

Answer = A - 6 Notes: It's 6 because question 2 is a self-interested reason and so is reverse-scored: agreeing with question 1 scores 3, and disagreeing with question 2 also scores 3.
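
One way to see where the 6 comes from is to score the two responses in Python. This is a sketch that assumes question 2 is reverse-scored (because it describes a self-interested reason); the helper name `item_score` is made up for the example.

```python
SCORES = {"Disagree": 1, "Not Certain": 2, "Agree": 3}


def item_score(response, reverse=False):
    """Score one response; reverse-scored items flip the 1-3 scale (1 <-> 3)."""
    raw = SCORES[response]
    return 4 - raw if reverse else raw


# Question 1 is scored as written; question 2 is assumed to be reverse-scored
total = item_score("Agree") + item_score("Disagree", reverse=True)

print(total)  # 6
```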

A variable that measures the effect that manipulating another variable has is known as: a) A dependent variable b) A confounding variable c) A predictor variable d) An independent variable

Answer = A - A dependent variable

A researcher tested 40 adults. Each adult had to rate their mood after listening to a tape of people being sick, and then again after a tape of people laughing. Half of the participants were asked to visualise the people that they could hear on the tape, whereas the other half just listened. What experimental design has been used? Answer choices a) A mixed design b) A repeated-measures design c) A matched design d) A between-subjects design

Answer = A - A mixed design

Levels within a variable are best described as: a) A way of distinguishing between the different elements that exist within the measurement of a variable. b) All the variables included in a study c) A number of variables that can be categorised together. d) How researchers can categorise different variables

Answer = A - A way of distinguishing between the different elements that exist within the measurement of a variable.

Which one of the following variables is not categorical? a) Age of a person. b) Gender of a person: male or female. c) Choice on a test item: true or false. d) Marital status of a person (single, married, divorced, other)

Answer = A - Age of a person.

Which of the following is true? a) An experiment can have more than one dependent variable. b) An experiment can only have one dependent variable. c) There must be the same number of independent variables as there are dependent variables. d) Having more than one dependent variable allows the examination of interactions between them.

Answer = A - An experiment can have more than one dependent variable.

What is meant when the term co-vary is used in research methods? a) As one variable changes, so does the other variable. b) Research results depend on the varying qualities of the researcher's skills. c) If two researchers achieve the same research results, they are said to co-vary. d) None of the above

Answer = A - As one variable changes, so does the other variable.

Quasi-experimental designs have which of the following characteristics? a) Because participants are not randomly allocated to the various conditions, we cannot be certain that our pseudo-manipulation of the independent variable is responsible for any differences between conditions. b) Because participants are randomly allocated it is easy to infer causation from quasi-experimental designs. c) Participants are randomly allocated to the various conditions that make up the independent variable. d) The data can be analysed using Spearman's Rho.

Answer = A - Because participants are not randomly allocated to the various conditions, we cannot be certain that our pseudo-manipulation of the independent variable is responsible for any differences between conditions.

In an experiment, the variable that is being measured is referred to as the.... a) Dependent variable. b) Independent variable c) Dependant variable. d) Measurement variable.

Answer = A - Dependent variable.

How does a field experiment differ from a natural experiment? a) In a field experiment, a variable is manipulated. In a natural experiment, the researcher makes use of pre-existing 'conditions'. b) A field experiment has low ecological validity, but a natural experiment has high ecological validity. c) A field experiment is as well controlled as a laboratory experiment, while a natural experiment has less control than a laboratory experiment. d) Because a natural experiment uses pre-existing conditions, there is less room for error than in a field experiment.

Answer = A - In a field experiment, a variable is manipulated. In a natural experiment, the researcher makes use of pre-existing 'conditions'.

Match the type of research design and their description. Quasi Experimental designs a) Investigates groups of individuals and does not use random allocation of participants to conditions. b) Involves random allocation of participants to conditions of the independent variable. c) Examines relationships between variables and cannot infer causation.

Answer = A - Investigates groups of individuals and does not use random allocation of participants to conditions.

Which of the following is correct? a) It is important to distinguish between the major sorts of research design because they use different methods of analysis. b) The within-participants design needs a greater number of participants. c) The between-participants design has more control over confounding variables between conditions. d) Cause and effect is more likely to be implied from the correlational design.

Answer = A - It is important to distinguish between the major sorts of research design because they use different methods of analysis.

A researcher testing the value of money to individuals obtains a probability value of .50. Is it likely that the researcher will conclude the findings are: a) Not significant b) Significant at the .01 criterion level c) Significant at the .05 criterion level d) None of the above

Answer = A - Not significant

The data in the graph is a) Positively skewed b) Normally distributed c) Negatively skewed d) Bimodal

Answer = A - Positively Skewed (scores bunched to the left, tail to the right)

Which of the following is the most accurate? a) Quite substantial deviations from the normal distribution generally make relatively little difference to your statistical analysis. b) If your scores do not follow the normal distribution then your statistical analysis will not be very accurate. c) If you have more than thirty scores then it is essential that they are normally distributed. d) There are many statistical tests which will tell you whether your scores are normally distributed.

Answer = A - Quite substantial deviations from the normal distribution generally make relatively little difference to your statistical analysis.

The "consistency" or "repeatability" of research results over time is best described by the term __________. a) reliability b) dependent variable c) validity d) concept

Answer = A - Reliability

A researcher testing the value of money to individuals obtains a probability value of .03. Is it likely that the researcher will conclude the findings are: a) Significant at the .05 criterion level b) Not significant c) Significant at the .01 criterion level d) None of the above

Answer = A - Significant at the .05 criterion level
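
The comparison behind this card (and the earlier .50 card) can be written out as a small Python sketch. It assumes the common convention that a result is significant when p is less than the criterion level; the function name and return strings are illustrative only.

```python
def significance(p_value, alphas=(0.01, 0.05)):
    """Return the strictest criterion level that the p-value beats, if any."""
    for alpha in sorted(alphas):
        if p_value < alpha:
            return f"significant at the {alpha} level"
    return "not significant"


print(significance(0.03))  # significant at the 0.05 level
print(significance(0.50))  # not significant
```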

What is the correct definition of an interaction? a) The effect of one independent variable is different at different levels of another independent variable. b) The effect of the independent variable depends on the level of the dependent variable. c) One independent variable has an effect on another independent variable. d) The combined effect of two or more independent variables on the dependent variable.

Answer = A - The effect of one independent variable is different at different levels of another independent variable.

The reasons why the independent variable might not have caused an effect on the dependent variable are called threats to internal validity. a) True b) False

Answer = A - True

Mortality refers to the fact that participants may drop out of experiments. a) True b) False

Answer = A - True Notes: Mortality is another name for attrition.

In regard to symbols used in research methods, which of the following represents the independent variable? a) X b) Y c) Z d) none of the above

Answer = A - X

The normal curve is: a) a bell-shaped distribution. b) the standard against which all other distributions are assessed. c) a convenient fiction. d) the typical frequency curve for psychological data.

Answer = A - a bell-shaped distribution.

A researcher studied the effect of defendant physical attractiveness on juror decisions. The attractive person was 20 years old, and the unattractive person was 45 years old. The problem here is that: a) age is confounded with attractiveness. b) it is very difficult to operationally define physical attractiveness. c) attractiveness is not related to perceptions of guilt.

Answer = A - age is confounded with attractiveness.

A solution typically used for dealing with the effects that can occur in a within-participants design as a result of participants doing the conditions in a particular order is called: a) counterbalancing. b) demand effects. c) spurious effects. d) order effects.

Answer = A - counterbalancing.

The research term indicating that unknown variables may be influencing the relationship between the independent and dependent variable is known as ____________. a) exogenous variables b) outlier variables c) predictor variables d) none of the above

Answer = A - exogenous variables

Which of the following is an appropriate statistical test for a within-subjects experiment with two experimental conditions and a dependent variable that produces score data? a) repeated measures ANOVA b) t-test for independent groups c) chi-square d) ANOVA for independent groups

Answer = A - repeated measures ANOVA

You identified the 15 employees in a large organisation who were absent from work the most days during the previous month. You require these employees to attend a one-day program on time and stress management in an attempt to reduce absenteeism. In the following month, all of the employees improved their attendance. The improvement could be caused by the program or it might be due to: a) statistical regression b) mortality c) instrument decay

Answer = A - statistical regression Notes: look at further

If you have nominal data: a) it is impossible to calculate the percentiles. b) it is impossible to calculate the percentage frequencies. c) it is impossible to calculate the frequencies. d) none of these.

Answer = A - it is impossible to calculate the percentiles.

If strong carry-over effects are expected in an experiment, a) the within-subjects design is not recommended. b) the within-subjects design is recommended. c) controls for attrition are crucial. d) the problem statement for that particular study cannot be researched.

Answer = A - the within-subjects design is not recommended.

What are the two types of designs used to introduce the correlation in correlated-groups designs? a) within-subjects designs and matched-subjects designs b) between-subjects designs and matched-subjects designs c) randomised designs and matched-subjects designs d) between-subjects designs and within-subjects designs

Answer = A - within-subjects designs and matched-subjects designs Notes: Not sure if this is important to the 202 course

An advantage of a repeated measures design is that it requires fewer participants. a) True b) False

Answer = A - True

When scores on a variable are demonstrated by a histogram, and the majority of scores are concentrated around high scores on the variable, the best way to describe the distribution of scores is as a: a) Positive Skew b) Bi-nominal Distribution c) Negative Skew d) Normal Distribution

Answer = C - Negative Skew

For the variable 'Colour of respondents' eyes,' which has the available answers Brown, Blue, Green, and Other, how many levels of this variable would the researcher identify for coding the data a) 6 b) 4 c) 3 d) 7

Answer = B - 4

Numbers of errors on a test of English comprehension for ten individuals = 5; 10; 2; 0; 8; 12; 7; 6; 0; 9. The cumulative percentage frequency of the score 5 is: a) 0% b) 40% c) 10% d) none of these.

Answer = B - 40%
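
The percentage and cumulative percentage frequencies for this data set can be checked with a few lines of Python. This is a sketch with made-up helper names; the cumulative figure is taken as the share of scores at or below the given value.

```python
errors = [5, 10, 2, 0, 8, 12, 7, 6, 0, 9]


def percentage_frequency(scores, value):
    """Percentage of scores exactly equal to the given value."""
    return 100 * scores.count(value) / len(scores)


def cumulative_percentage(scores, value):
    """Percentage of scores at or below the given value."""
    at_or_below = sum(1 for s in scores if s <= value)
    return 100 * at_or_below / len(scores)


print(percentage_frequency(errors, 0))   # 20.0
print(cumulative_percentage(errors, 5))  # 40.0 (the scores 0, 0, 2 and 5)
```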

If there are 20 participants in each experimental condition in a 5 x 4 between-groups design, how many participants would be needed in total? a) 200 b) 400 c) 20 d) 40

Answer = B - 400
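
The arithmetic for factorial designs can be sketched in Python. Each number in the design label is the number of levels of one independent variable, so multiplying them gives the number of conditions; the function name and dictionary keys are illustrative, and the participant count assumes a fully between-groups design.

```python
from math import prod


def design_summary(levels, per_condition):
    """Summarise a between-groups factorial design given the levels of each IV."""
    conditions = prod(levels)
    return {
        "independent_variables": len(levels),
        "conditions": conditions,
        "participants": conditions * per_condition,
    }


# 5 x 4 design with 20 participants per condition
print(design_summary((5, 4), 20))
# {'independent_variables': 2, 'conditions': 20, 'participants': 400}

# 2 x 2 x 2 design: three independent variables, eight conditions
print(design_summary((2, 2, 2), 20)["independent_variables"])  # 3
```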

For a normal curve, the mean, median and mode are: a) the mean is always the largest and the mode always the smallest. b) all equal. c) all zero. d) all different.

Answer = B - All equal

Which of the following is NOT one of the key characteristics of a true experiment? a) Holding everything constant apart from the variable being manipulated. b) All participants experience all experimental conditions. c) The measurement of changes caused by the manipulation of a variable. d) The manipulation of a variable.

Answer = B - All participants experience all experimental conditions.

A predictor variable is another name for: a) A dependent variable b) An independent variable c) A confounding variable d) A discrete variable

Answer = B - An independent variable Notes: is this because you are predicting that it will have an effect on the DV?

A researcher is looking at the effect of drinking alcohol on the ability to play darts. Half of the participants drink a pint of beer, while the other half drink a pint of water. All participants throw three darts at a dartboard and have the score recorded. How is this experiment best summarised? a) Between-groups design. Independent variable is the three dart score. Dependent variable is the amount of alcohol drunk. b) Between-groups design. Independent variable is the amount of alcohol drunk. Dependent variable is the three dart score. c) Within-groups design. Independent variable is the three dart score. Dependent variable is the amount of alcohol drunk. d) Within-groups design. Independent variable is the amount of alcohol drunk. Dependent variable is the three dart score.

Answer = B - Between-groups design. Independent variable is the amount of alcohol drunk. Dependent variable is the three dart score.

What kind of variable is IQ, measured by a standard IQ test? a) Categorical b) Continuous c) Discrete d) Nominal

Answer = B - Continuous Notes: IQ is usually treated as interval rather than ratio data, because the scale has no true zero point.

Which of these statements relating to the experimental design or true experiment is false? a) Random allocation of participants to conditions is a major feature of experiments. b) Experiments cannot reveal causal relationships as well as other research designs. c) Experiments involve the manipulation of one variable systematically to see what effect it has on other variables. d) The laboratory experiment is more artificial compared to more naturalistic research settings.

Answer = B - Experiments cannot reveal causal relationships as well as other research designs.

The use of existing natural groups of participants usually results in equivalent groups for the experiment. a) True b) False

Answer = B - False Notes: There is no random assignment, so the groups may already differ in systematic ways before the experiment starts.

Practice and fatigue effects are both problems with independent groups designs. a) True b) False

Answer = B - False Notes: Participants are only exposed to one condition, so practice and fatigue effects do not build up across conditions.

A researcher wants to visually present some data that is best described as 'categorical'. Which chart should the researcher use? a) Categorical Chart b) Histogram c) Bar-chart d) Bi-nominal Chart

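Answer = C - Bar-chart Notes: Histograms are for continuous (quantitative) data; categorical data is shown with a bar chart.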

Match the type of research design and their description. Experimental designs a) Investigates groups of individuals and does not use random allocation of participants to conditions. b) Involves random allocation of participants to conditions of the independent variable. c) Examines relationships between variables and cannot infer causation.

Answer = B - Involves random allocation of participants to conditions of the independent variable.

Which of the following is NOT a strength of experimental studies? a) Establishment of causal links between variables. b) Narrow definition of concepts. c) Control of variables. d) Replicability

Answer = B - Narrow definition of concepts.

The data in the graph is a) Positively skewed b) Normally distributed c) Negatively skewed d) Bimodal

Answer = B - Normally distributed

A bank identifies the current rating of a loan with regard to the likelihood of repayment. Classifications are either: Good standing--all payments on schedule Problem loan--several instalments not made Unsatisfactory--a loan written off as not collectible This is an example of which level of measurement? a) Nominal b) Ordinal c) Sample d) Discrete e) Continuous

Answer = B - Ordinal

What are field experiments and natural experiments collectively known as? a) Pseudo-experiments b) Quasi-experiments c) Qualitative studies d) False experiments

Answer = B - Quasi-experiments

All things being equal, which design is more likely to result in a statistically significant effect? a) Independent groups b) Repeated measures

Answer = B - Repeated measures

Which of the following is not a measure of central tendency a) Mean b) Standard Deviation c) Mode d) Median

Answer = B - Standard Deviation

A parameter is: a) a sample characteristic b) a population characteristic c) unknown d) normally distributed

Answer = B - a population characteristic

A magazine printed a survey in its monthly issue and asked readers to fill it out and send it in. Over 1000 readers did so. This type of sample is called a) a cluster sample. b) a self-selected sample. c) a stratified sample. d) a simple random sample.

Answer = B - a self-selected sample.

Randomly assigning treatment to experimental units allows: a) population inference b) causal inference c) both types of inference d) neither type of inference

Answer = B - causal inference

Complete the following sentence: Sometimes the difference a researcher has observed in a dependent variable as a result of manipulating the independent variable may not be due to the manipulation but due to: a) categorical variables. b) confounding variables. c) spurious variables. d) dichotomous variables.

Answer = B - confounding variables

Which of the following is a major control for sequence effects? a) random assignment of participants b) counterbalancing c) holding the variable constant d) including the factor as a research variable

Answer = B - counterbalancing

Single-subject experiments a) are the same as case studies. b) differ from case studies because independent variables are manipulated. c) cannot support causal inferences. d) have extremely high external validity.

Answer = B - differ from case studies because independent variables are manipulated.

A goal of some researchers is to be able to generalise their findings to the larger population. This is referred to as __________. a) internal validity b) external validity c) operationalisation d) Cronbach's alpha

Answer = B - external validity Notes: Therefore generalisability to other settings

Which of the following is NOT used in correlated-groups designs? a) matching of participants b) free random assignment to conditions c) within-subjects procedures d) careful measurement of the dependent variable

Answer = B - free random assignment to conditions

Complete counterbalancing means that: a) there were no practice effects b) all possible orders of the IV were used c) all Latin squares were constructed

Answer = B - all possible orders of the IV were used
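
Complete counterbalancing can be illustrated by listing every possible order of the conditions. This is a sketch with three hypothetical conditions labelled A, B and C; with n conditions there are n! orders, which is why complete counterbalancing quickly becomes impractical as conditions are added.

```python
from itertools import permutations

conditions = ["A", "B", "C"]  # hypothetical condition labels

# Every possible ordering of the conditions; each order is used equally often
orders = list(permutations(conditions))

for number, order in enumerate(orders, start=1):
    print(number, " -> ".join(order))

print(len(orders))  # 6 orders for 3 conditions (3! = 6)
```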

In regard to symbols used in research methods, which of the following represents the dependent variable? a) X b) Y c) Z d) none of the above

Answer = B - Y

If in SPSSFW there is a grouping variable in the data screen this tells you that it is a: a) correlational design. b) between-participants design. c) repeated measures design. d) within-participants design.

Answer = B - between-participants design. Notes: Revisit in week 4

The following items are from a scale that measures reasons for doing Voluntary Work, for which responses are scored Disagree=1, Not Certain=2, and Agree=3. Higher scores on the voluntary work scale are designed to measure unselfish reasons for carrying out voluntary work. Using this information, which scores given to responses of two of the following four items, need to be recoded to ensure totals for scores on the scale are accurate? 1. I (would) do voluntary work to put something back into society 2. My doing voluntary work will (would) look good on my Curriculum Vitae [CV] 3. I (would) do voluntary work because it is good to be able to help the less fortunate 4. I (would) do voluntary work because it impresses my friends and family a) 1 and 3 b) 3 and 4 c) 1 and 2 d) 2 and 4

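Answer = D - 2 and 4 Notes: Higher scores are meant to reflect unselfish reasons. Items 2 and 4 describe self-interested reasons (CV, impressing others), so their scores are reversed; items 1 and 3 are already scored in the right direction.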

Numbers of errors on a test of English comprehension for ten individuals = 5; 10; 2; 0; 8; 12; 7; 6; 0; 9. The percentage frequency of the score 0 is: a) 2 b) 0 c) 20% d) none of these.

Answer = C - 20%

How many independent variables are there in a 2 x 2 x 2 design? a) 8 b) None of the other answers is correct. c) 3 d) 6

Answer = C - 3

Numbers of errors on a test of English comprehension for ten individuals = 5; 10; 2; 0; 8; 12; 7; 6; 0; 9. What percentage of the scores fall at or below the 30th percentile? a) 95% b) 3% c) 30% d) none of these.

Answer = C - 30%

If there is a probability of 5% in how many cases would a result arise solely due to chance? a) 0.0005 b) 50/50 c) 5/100 d) 1/50 e) None of these f) 10/100

Answer = C - 5/100

Which of the following best describes a confounding variable? a) A variable that is manipulated by the experimenter. b) A variable that has been measured using an unreliable scale. c) A variable that affects the outcome being measured as well as, or instead of, the independent variable. d) A variable that is made up only of categories.

Answer = C - A variable that affects the outcome being measured as well as, or instead of, the independent variable.

A variable that changes in a systematic way with the independent variable and may also affect the dependent variable is known as a.... a) Error variable b) Problem variable c) Confounding variable d) Intruder variable

Answer = C - Confounding variable

Which of the following is not a category that is used to distinguish between different types of variables. a) Continuous b) Categorical c) Continuum d) Discrete

Answer = C - Continuum

____________ allow a researcher to examine the degree and direction of the relationship between two characteristics or variables. a) Confounding variables b) Experimental designs c) Correlational designs d) Quasi-experimental designs

Answer = C - Correlational designs

Which of the following is true for within-subjects designs? a) There must be at least three conditions manipulated. b) participants are all randomly assigned to conditions. c) Each participant serves as his or her own control. d) They require more participants than between-subjects designs do.

Answer = C - Each participant serves as his or her own control.

"Professor Logan reported, on his study on student behaviour to a group of university academics that students are more likely to pass exams if they revise for them". From the extract above, the variables being identified are: a) Student behaviour and whether student revise for exams or not. b) Student behaviour and exam success. c) Exam success and whether students revise for exams or not. d) Student behaviour and academic judgement.

Answer = C - Exam success and whether students revise for exams or not.

Match the type of research design and their description. Correlational designs a) Investigates groups of individuals and does not use random allocation of participants to conditions. b) Involves random allocation of participants to conditions of the independent variable. c) Examines relationships between variables and cannot infer causation.

Answer = C - Examines relationships between variables and cannot infer causation.

Which of the following is true? a) Percentiles are more accurate than cumulative frequencies. b) If a distribution is skewed, it is not possible to calculate percentiles. c) If a distribution is normally distributed, the 50th percentile is the same as the mean, median and mode. d) None of these.

Answer = C - If a distribution is normally distributed, the 50th percentile is the same as the mean, median and mode.

Which of the following is true about a quasi-experimental design? a) It enables researchers to make claims of causality to the same extent as full experimental designs. b) It is distinct from correlational designs because the independent variable is always manipulated. c) It does not contain random assignment of subjects to conditions. d) It is considered to be in the same category as observational methods and case study methods since it does not fit into other experimental design category.

Answer = C - It does not contain random assignment of subjects to conditions.

The effect that an individual independent variable has on the dependent variable is called what? a) Simple effect. b) Interaction c) Main effect d) Significant effect

Answer = C - Main effect

Do you believe the Government should introduce a carbon tax? Yes No The data produced is an example of: a) quantitative data b) time series data c) nominal data d) ordinal data e) continuous data

Answer = C - Nominal data

If an index of kurtosis is -2.89, then: a) there is a mistake in the calculation. b) the curve is relatively steep. c) the curve is relatively flat. d) none of these is true.

Answer = C - The curve is relatively flat.

A study is carried out to examine whether senior consultants have more positive coping skills than junior consultants. Which of the following statements is true of this study? a) The independent variable is coping and the dependent variable is seniority. b) Both variables are independent as the researcher cannot manipulate them. c) The independent variable is seniority and the dependent variable is coping skills. d) The study is a correlational design.

Answer = C - The independent variable is seniority and the dependent variable is coping skills.

A study is carried out to compare offenders with non-offenders on their levels of coping. Which of the following statements is true of this study? a) The independent variable is coping and the dependent variable is type of person. b) There are two independent variables, offender and non-offender, and one dependent variable, which is level of coping. c) The independent variable is type of person and the dependent variable is their level of coping. d) Both variables are dependent as the researcher cannot manipulate them.

Answer = C - The independent variable is type of person and the dependent variable is their level of coping.

You have found that men who went into care at a young age commit more crimes. Which of the following could you conclude? a) There is a causal relationship between being in care and committing crime. b) Men who go into care develop criminal attitudes. c) There is not necessarily a causal relationship between going into care and the amount of crimes committed. d) Going into care is the cause of crime.

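Answer = C - There is not necessarily a causal relationship between going into care and the amount of crimes committed. Notes: An association between two variables does not show that one causes the other; other variables may account for both.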

Field and Lawson (2003) reported the effects of giving children aged 7-9 years positive, negative or no information about novel animals (Australian marsupials). This variable was called 'Infotype'. Each child received all three types of information about different animals. The gender of the child was also examined. The outcome was the time taken for the children to put their hand in a box in which they believed either the positive, negative, or no information animal was housed. Which of the following is the most appropriate method to analyse their data? a) One-way independent ANOVA b) One-way repeated-measures ANOVA c) Two-way mixed ANOVA d) Two-way independent ANOVA

Answer = C - Two-way mixed ANOVA

In which design are all participants exposed to all experimental conditions? a) randomised, post test-only b) randomised, pretest-posttest c) within-subjects d) between-subjects

Answer = C - Within subjects design

Julie's study investigates the effect of gender on snack choice in college students. Subjects are individually brought into a lab at the same time each day and presented with eight snack options commonly found in grocery stores. Julie records which snack each subject eats, if any. Her study has: a) a full experiment design. b) a correlational design. c) a quasi-experimental design. d) a single-case experimental design.

Answer = C - a quasi-experimental design. Notes: Gender cannot be randomly assigned, so there is no random assignment to conditions.

Matching participants a) becomes easier as the number of matching variables increases. b) increases error variance. c) is best when there are a small number of matching variables. d) requires equal numbers of males and females.

Answer = C - is best when there are a small number of matching variables.

Which of the following is NOT a true statement about an experiment with a between-groups design? a) It needs greater care to ensure that groups are equivalent than a within-groups design. b) It reduces demand characteristics compared to a within-groups design. c) It has greater ecological validity than a within-groups experiment. d) It requires more participants than an equivalent within-groups design.

Answer = C - It has greater ecological validity than a within-groups experiment.

The standard deviation of the numbers 2, 4, 6 is: [Hint] a) 4 b) 8 c) 2.66 d) 1.63

Answer = D - 1.63
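
A quick check of this answer, as a sketch using Python's standard `statistics` module. The 1.63 figure is the population standard deviation (dividing by N); the sample formula (dividing by N - 1) gives a different value for the same numbers.

```python
import statistics

data = [2, 4, 6]

population_sd = statistics.pstdev(data)  # divides the sum of squares by N
sample_sd = statistics.stdev(data)       # divides the sum of squares by N - 1

print(round(population_sd, 2), round(sample_sd, 2))  # 1.63 2.0
```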

A researcher tested 40 children aged 6 years. Each child engaged in a task where they had to use two dolls (one representing themselves and one representing a teacher) and they had to enact a time when their teacher had been angry with them. All children were videotaped and 20 children were told that their teacher would see the tape and 20 were not. What experimental design has been used? a) A repeated-measures design b) A matched design c) A mixed design d) A between-subjects design

Answer = D - A between-subjects design

Reversal designs are also called a) pre-post designs. b) randomised time-series designs. c) multiple baseline designs. d) ABA designs.

Answer = D - ABA designs.

Within statistics, a variable can be described as: a) A concept that helps researchers to conceptualise and plan their research b) Different types of objects, events, feelings, attitudes............ c) The measurement of anything that varies d) All of the above

Answer = D - All of the above

Which of the following is not a category that is used to distinguish between different types of variables. a) Nominal b) Interval c) Ordinal d) Bi-nominal

Answer = D - Bi-nominal

If neither the experimenter nor the participant knows which experimental condition the participant has been assigned to, this is known as... a) Single-blind b) Standardisation c) Experimental conditions d) Double-blind e) Demand characteristics.

Answer = D - Double-blind

What sort of variable is manipulated by the researcher? a) Co-dependent. b) Dependent. c) Independent. d) All variables are manipulated by the researcher.

Answer = C - Independent.

What is the major strength of the within-subjects design? a) More participants can be used in a single study. b) Interactive effects can be identified. c) Carry-over effects are eliminated. d) It guarantees that the participants in the various conditions are equivalent at the start of the study.

Answer = D - It guarantees that the participants in the various conditions are equivalent at the start of the study.

The discrepancy between the numbers used to represent something that we are trying to measure and the actual value of what we are measuring is called: a) Variance b) The 'fit' of the model c) Reliability d) Measurement error

Answer = D - Measurement error Notes: Investigate more

What is the correct term for an experiment that has at least one independent variable that is manipulated between-groups, and at least one independent variable manipulated within groups? a) Within-groups design b) Matched-pairs design c) Between-groups design d) Mixed design

Answer = D - Mixed design

Which of the following is true? a) With a normally distributed set of scores, the value of the percentiles equals the frequencies. b) Percentiles are not possible with bimodal distributions. c) The standard deviation of cumulative frequencies ranges from +1 to -1. d) None of these are true.

Answer = D - None of these are true

"Government researchers yesterday identified a growing trend of the relationship between spending and the amount of debt among UK residents." From the extract above, the variables being identified are: a) Debt and trends in the UK economy. b) Trends in the UK economy and where respondents reside. c) Spending and where respondents reside. d) Spending and Debt.

Answer = D - Spending and Debt.

Which one of these problems associated with the within-participants design is true? a) It is a less sensitive design as you cannot control for participants individual differences. b) You are not able to use counterbalancing. c) You can use them in many quasi-experimental designs. d) There can be effects of a participant serving in more than one condition of the study.

Answer = D - There can be effects of a participant serving in more than one condition of the study.

Is it possible to calculate the skewness of a set of numerical scores? a) This is possible in some circumstances. b) False. c) This is never possible because you need nominal data. d) True.

Answer = D - True

When several questions are used to measure a variable (social concept) this is referred to as which of the following? a) a scale b) a composite c) an index d) all of the above

Answer = D - all of the above

Complete the following statement. Between-participants designs should be considered when: a) the independent variable doesn't lend itself well to repeated measure, e.g. gender. b) order effects are likely. c) participants might be affected by demand effects. d) all of the above.

Answer = D - all of the above.

Variables are: a) the main focus of research in science. b) something that can vary in terms of precision. c) something that we can measure. d) all of the above.

Answer = D - all of the above.

When research subjects are given slightly different questions or measures of the same concept to help increase the reliability of the research results, this is referred to as ___________________. a) intraobserver reliability b) interitem reliability c) intercoder reliability d) alternate forms reliability

Answer = D - alternate forms reliability

Why is a final A to B (i.e., back to the treatment) reversal sometimes carried out in reversal designs? a) to demonstrate control b) to reduce the number of participants needed c) for ethical reasons d) both a and c

Answer = D - both a and c

A flat curve is technically: a) a platypoid. b) leptokurtic. c) a platypus. d) none of these.

Answer = D - none of these. Notes: The technical term for a flat curve is platykurtic.

A dependent variable refers to: a) the variable being manipulated or varied in some way by the researcher. b) the experimental condition. c) a variable with a single value which remains constant in a particular context. d) the variable which shows us the effect of the manipulation.

Answer = D - the variable which shows us the effect of the manipulation.

Which one of these statistics is unaffected by outliers? a) Mean b) Interquartile range c) Standard deviation d) Range

Answer = B - Interquartile range

Which of the following constitute discrete variables? a) Number of reported crimes in one week. b) A student's top typing speed. c) Favourite animal. d) Type of offender, e.g. rapist, burglar, thief.

Answer = A - Number of reported crimes in one week. Notes: Discrete variables take only separate, countable values, e.g. 1, 2, 3, 4.

What name is given to data which is on a continuous scale with a neutral zero? a) Ratio data b) Skewed data c) Interval data d) Ordinal data e) Categorical data f) Ranked data

Answer = A - Ratio data

Which of the following statements is false? Using a within-participants design means that: a) different people are tested in each condition of the IV. b) the same people can be measured twice on the dependent variable. c) it provides for a more sensitive test of the differences between conditions because it controls for differences between individuals. d) you get participants to complete all the various experimental conditions but in different orders.

Answer = A - different people are tested in each condition of the IV. Notes: A within-participants design means that the same people can be measured twice on the dependent variable. Participants complete all the experimental conditions but in different orders, known as counterbalancing, to reduce order effects. This also provides a more sensitive test of the differences between conditions because it controls for differences between individuals.

The same people participate in each condition of an experiment. What type of design is this? a) repeated measures (within-subjects) b) independent groups (between-subjects) c) matched pairs

Answer = A - repeated measures (within-subjects)

Numbers of errors on a test of English comprehension for ten individuals = 5; 10; 2; 0; 8; 12; 7; 6; 0; 9. The cumulative frequency of the score 8 is: a) 7 b) 8 c) 1 d) none of these.

Answer = A - 7
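
The cumulative frequency can be verified with a small sketch; the function name is an illustrative assumption. It counts every score at or below the target value.

```python
errors = [5, 10, 2, 0, 8, 12, 7, 6, 0, 9]

def cumulative_frequency(scores, value):
    """Count the scores that are less than or equal to value."""
    return sum(1 for s in scores if s <= value)

print(cumulative_frequency(errors, 8))  # 7
```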

A researcher tested 40 adults. Each adult had to rate their mood after listening to a tape of people being sick, and then again after a tape of people laughing. What experimental design has been used? a) A matched design b) A repeated-measures design c) A mixed design d) A between-subjects design

Answer = B - A repeated-measures design

This scenario applies to Questions 1 and 2: A study was done to compare the lung capacity of coal miners to the lung capacity of farm workers. The researcher studied 200 workers of each type. Other factors that might affect lung capacity are smoking habits and exercise habits. The smoking habits of the two worker types are similar, but the coal miners generally exercise less than the farm workers. 1. Which of the following is the explanatory variable in this study? a. Exercise b. Lung capacity c. Smoking or not d. Occupation 2. Which of the following is a confounding variable in this study? a. Exercise b. Lung capacity c. Smoking or not d. Occupation

Answers = 1. D - Occupation 2. A - Exercise

When scores on a variable are demonstrated by a histogram, and the majority of scores are concentrated around low scores on the variable, the best way to describe the distribution of scores is as a: a) Normal Distribution b) Negative Skew c) Bi-nominal Distribution d) Positive Skew

Answer = D - Positive Skew

APA formatting changes for SPSS

Whenever you report one of these tests from SPSS output (one-sample t, independent-samples t, or correlation), give the exact significance value, halving it first for a one-tailed test, instead of writing p ≤ or p >.

Use the Empirical Rule. The mean speed of a sample of vehicles along a stretch of highway is 63 miles per​ hour, with a standard deviation of 4 miles per hour. Estimate the percent of vehicles whose speeds are between 59 miles per hour and 67 miles per hour.​ (Assume the data set has a​ bell-shaped distribution.)

Approximately 68​% of vehicles travel between 59 miles per hour and 67 miles per hour.
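
As a rough check on the 68% figure, the sketch below assumes the speeds follow an exactly normal distribution with the stated mean and standard deviation, and uses `statistics.NormalDist` from the standard library. The interval 59 to 67 is one standard deviation either side of the mean.

```python
from statistics import NormalDist

speeds = NormalDist(mu=63, sigma=4)  # assumed bell-shaped model of the speeds

# Share of vehicles between mean - 1 SD (59) and mean + 1 SD (67).
share = speeds.cdf(67) - speeds.cdf(59)
print(round(100 * share, 1))  # 68.3
```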

measured score=true score+error

As long as the error is small, the scores will be consistent from one measurement to the next, so the measure can be considered reliable. When the error component is large, scores differ greatly from one measurement to the next, so the measure is not reliable.

Steps for determining level of measurement

Ask: 1) Can the data be put into categories? 2) Can the data be arranged in order? 3) Can one value be subtracted from another? 4) Can one value be considered a multiple of another?

Caution w/ scatterplots

Association does not imply causation because there may be other variables lurking in the background that contribute to the relationship between two variables

Facts about correlation: #2

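Because r uses the standardized values of the observations, r does not change when we change the units of measurement of x, y, or both. Changes of units (linear rescalings such as inches to centimetres) do not affect r; nonlinear transformations such as logarithms can.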
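
A minimal sketch of this fact, using made-up heights and weights (the data and variable names are assumptions for illustration). Pearson's r is computed by hand, first in inches and pounds, then in centimetres and kilograms.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical data, not taken from the study set.
heights_in = [60, 62, 65, 68, 72]
weights_lb = [110, 120, 140, 155, 180]

# Same observations after a change of units.
heights_cm = [h * 2.54 for h in heights_in]
weights_kg = [w * 0.45359237 for w in weights_lb]

r_imperial = pearson_r(heights_in, weights_lb)
r_metric = pearson_r(heights_cm, weights_kg)
print(r_imperial, r_metric)
```

The two printed values agree up to floating-point rounding.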

2 outcomes (success/failure); set # of observations; each trial independent of the others; P(success) remains the same for each trial

Binomial Probability Distribution B N I S

# of outcomes (success/failure)

Binomial Probability Distribution - What does the 2 represent in the following? 2 N I S

each observation is independent of the other

Binomial Probability Distribution - What does the I represent in the following? 2 N I S

the probability of each success remains constant

Binomial Probability Distribution - What does the S represent in the following? 2 N I S
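
When all four conditions hold, the probability of exactly k successes in n trials is C(n, k) p^k (1 - p)^(n - k). The sketch below applies that formula to a hypothetical example of 10 fair coin flips; the function name is illustrative.

```python
from math import comb

def binomial_pmf(k, n, p):
    """P(exactly k successes in n independent trials with success probability p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# Hypothetical example: number of heads in 10 fair coin flips.
probabilities = [binomial_pmf(k, 10, 0.5) for k in range(11)]

print(binomial_pmf(3, 10, 0.5))  # 0.1171875
print(sum(probabilities))        # the eleven probabilities sum to 1 (up to rounding)
```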

What is a similarity between the Empirical Rule and​ Chebychev's Theorem?

Both estimate proportions of the data contained within k standard deviations of the mean.
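
To see how the two estimates compare, the sketch below sets Chebychev's minimum, 1 - 1/k², beside the share of an exactly normal distribution within k standard deviations, which is the figure the Empirical Rule approximates. The function names are illustrative.

```python
from statistics import NormalDist

def chebyshev_minimum(k):
    """Minimum share of any data set within k standard deviations (k > 1)."""
    return 1 - 1 / k**2

def normal_share(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    z = NormalDist()
    return z.cdf(k) - z.cdf(-k)

for k in (2, 3):
    print(k, round(chebyshev_minimum(k), 3), round(normal_share(k), 3))
```

Chebychev's bound holds for any distribution, so it is always the lower of the two figures.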

In terms of displaying​ data, how is a​ stem-and-leaf plot similar to a dot​ plot?

Both plots can be used to determine specific data entries. Both plots can be used to identify unusual data values. Both plots show how data are distributed.

For which of the three F-tests in a two-way ANOVA do you collapse across the levels of the other factor(s) in computing the means?

Both the A main effect and the B main effect

After a hurricane​, a disaster area is divided into 200 equal grids. Forty of the grids are​ selected, and every occupied household in the grid is interviewed to help focus relief efforts on what residents require the most. What type of sampling is used?

Cluster sampling is​ used, since the disaster area is divided into​ grids, and some of those grids are selected and everyone in those grids is interviewed.

Statistics

Collection of methods for planning experiments, obtaining data, organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on data.

Facts about correlation: #1

Correlation makes no distinction between explanatory and response variables. It makes no difference which variable you call x and which you call y in calculating correlation (you can multiply in any order)

How do you determine how well a t-distribution approximates a normal distribution?

Determined by df.

Residual

Difference between an observed value of the response variable and the value predicted by the regression line. Residual = observed y - predicted y = y - yhat *Represents the leftover variation in the response variable after fitting the regression line
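
A small sketch of the residual calculation, fitting ŷ = a + bx by least squares to made-up data (the five points are an assumption for illustration) and then subtracting each predicted value from the observed one.

```python
# Hypothetical data, chosen only to keep the arithmetic simple.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
mean_x, mean_y = sum(xs) / n, sum(ys) / n

# Least-squares slope b and intercept a for y-hat = a + b*x.
b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
    (x - mean_x) ** 2 for x in xs
)
a = mean_y - b * mean_x

# Residual = observed y - predicted y.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
print(a, b)
print(residuals)
```

For a least-squares line the residuals sum to zero, apart from floating-point rounding.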

Negatively skewed

The tail of the distribution extends to the left (toward low scores); most scores are concentrated at the high end

Systematic

Each member of a population is assigned a number. The members of the population are ordered in some way, a starting number randomly selected, and then sample members are selected at regular intervals from starting number.

99.7

Empirical Rule: The distribution is roughly bell shaped. Approx ___% of the data lie within 3 standard deviations

A correlation coefficient r was calculated to be 0.830; the coefficient of nondetermination would be 0.170.

False
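
The statement is false because the coefficient of nondetermination is 1 - r², not 1 - r. A short check:

```python
r = 0.830

r_squared = r**2                  # coefficient of determination
nondetermination = 1 - r_squared  # coefficient of nondetermination

print(round(r_squared, 3), round(nondetermination, 3))  # 0.689 0.311
```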

A population is the collection of some​ outcomes, responses,​ measurements, or counts that are of interest.

False. A population is the collection of all​ outcomes, responses,​ measurements, or counts that are of interest.

In a frequency​ distribution, the class width is the distance between the lower and upper limits of a class. T/F

False. In a frequency​ distribution, the class width is the distance between the lower or upper limits of consecutive classes.

More types of calculations can be performed with data at the nominal level than with data at the interval level

False. More types of calculations can be performed with data at the interval level than with data at the nominal level.

The method for selecting a stratified sample is to order a population in some way and then select members of the population at regular intervals.

False. The method for selecting a systematic sample is to order a population in some way and then select members of the population at regular intervals.

Using a systematic sample guarantees that members of each group within a population will be sampled.

False. Using a stratified sample guarantees that members of each group within a population will be sampled

The accompanying table shows the results of a survey in which 250 male and 250 female workers ages 25 to 64 were asked if they contribute to a retirement savings plan at work. Complete parts​ (a) and​ (b) below.

Find the probability that a randomly selected worker contributes to a retirement savings plan at​ work, given that the worker is male: .48 The probability that a randomly selected worker is​ female, given that the worker contributes to a retirement savings plan at​ work: .544

The table below shows the results of a survey that asked 1052 adults from a certain country if they favored or opposed a tax to fund education. A person is selected at random. Complete parts​ (a) through​ (c).

Find the probability that the person opposed the tax or is female: .826 REMEMBER TO SUBTRACT THE FEMALES WHO OPPOSED THE TAX Find the probability that the person supports the tax or is male: .692 Find the probability that the person is not unsure or is female: .986

Output from SPSS independent sample t test

First box is descriptive statistics, second box is independent samples test; 2 rows in the second box and in order to accurately use the printout you have to decide which row to look in

The estimated standard error of the differences between means (SM1-M2)

For the independent measures t formula, the standard error measures the amount of error that is expected when you use the sample mean difference (M1-M2) to represent a population mean difference (μ1 - μ2). Measures the difference that is expected typically between two sample means. Measures how accurately the difference between sample means represents the difference between two population means.

Outline for independent sample t-test APA formatting

Group 1 Independent variable Dependent variable (M=, SD=) WAS OR WAS NOT significantly different/higher/lower than group 2 Independent variable dependent variable (M=, SD=), t(df)=, p> or ≤ (.05 or .01), two or one tailed, d= or r2=

Which type of alternative hypothesis is used in the figure below

H1: μ > k

Flags Pearson r SPSS hypothesis test

If SPSS is set to flag significance, then stars will come up next to the significant values for significant correlations. If the correlation is significant at the .05 level, then there will be one * next to the significant value, and if it is significant at the .01 level (and thus also at the .05 level) there will be two **. MIGHT NOT BE FLAGGED ON THE EXAM

A​ stem-and-leaf plot for the number of touchdowns scored by all Division 1A football teams is shown below. Complete parts​ (a) through​ (c).

If a team is selected at​ random, find the probability the team scored at least 33 touchdowns: .619 If a team is selected at​ random, find the probability the team scored between 40 and 49 touchdowns inclusive: .254 If a team is selected at​ random, find the probability the team scored more than 79 touchdowns: Are any of these events​ unusual: Scoring more than 79 touchdowns is unusual.

certainty

If an event is a ___, the probability of the event is 1.

Two tailed one sample t- test sig value

If it is less than or equal to alpha then you have significance. If greater than your alpha you do not have significance

Guessed Value for p-hat

If no guess is available, use the "worst case value" of p-hat = 0.5
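
The worst case is p-hat = 0.5 because p(1 - p) is largest there, which gives the largest required sample size. The sketch below assumes the usual sample-size formula for a proportion, n = p(1 - p)(z / E)², where E is the margin of error; the function and parameter names are illustrative.

```python
from math import ceil
from statistics import NormalDist

def sample_size(margin, confidence=0.95, p_guess=0.5):
    """Smallest n giving the requested margin of error for a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # critical z value
    return ceil(p_guess * (1 - p_guess) * (z / margin) ** 2)

print(sample_size(0.03))               # worst case, p-hat = 0.5
print(sample_size(0.03, p_guess=0.2))  # a smaller n when a guess is available
```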

Graph sig vs critical value

If the obtained t falls exactly at the critical value on the curve, then at the .05 level the exact probability of that result is 5%. As the obtained t moves farther into the tail, it becomes more extreme and the probability of obtaining it decreases, so the significance value gets smaller.

After constructing a relative frequency distribution summarizing IQ scores of college​ students, what should be the sum of the relative​ frequencies?

If percentages are​ used, the sum should be​ 100%. If proportions are​ used, the sum should be 1.

linear

If sample data are taken from a population that is normally distributed, a normal probability plot will be approximately ___

mean

If the data is normal, which central tendency best describes the data?

Why do you always use the smaller df value if the exact df is not found

Using more degrees of freedom than you actually have lowers the critical value, which makes it easier to claim significance and overstates the evidence. The smaller df gives a larger critical value, so it is the conservative choice.

Percentile explanation

If your 3-month-old daughter is in the 40th percentile for weight, that means 40 percent of 3-month-old girls weigh the same as or less than your baby, and 60 percent weigh more. The higher the percentile number, the bigger your baby is compared to other babies her same age.

Classes

In summarizing quantitative data, we first determine whether the data are discrete or continuous. If the data are discrete with relatively few different values of the variable, then the categories of data (called classes) will be the observations (as in qualitative data). If the data are discrete, but with many different values of the variable or if the data are continuous, then the categories of data (the classes) must be created using intervals of numbers. We will first present the techniques for organizing discrete quantitative data when there are relatively few different values and then proceed to organizing continuous quantitative data.

Between subjects design

Two or more separate groups of participants are each tested under a different level (condition) of the independent variable

Explanatory variable

In the study in Example 2, the researchers obtained 480 rats and divided the rats into three groups. Each group was intentionally exposed to various levels of radiation. The researchers then compared the number of rats that had brain tumors. Clearly, there was an attempt to influence the individuals in this study because the value of the explanatory variable (exposure to radio frequency) was influenced. Because the researchers controlled the value of the explanatory variable, we call the study a designed experiment.

Assumptions and conditions of using the t-interval for a mean:

Independence Assumption: - Individuals selected independently of one another - Randomization condition: SRS from the population - Sample size n should be no more than 10% of the population Normal Population Assumption - Nearly normal condition - the distribution is nearly normal and symmetric

For Joan's doctoral dissertation experiment, 15 wheelchair users were randomly assigned to three groups with 5 in each group. These participants navigated in virtual-reality settings. Group 1 participants were in the virtual-reality setting (a building) as wheelchair users. Group 2 participants were in the virtual-reality setting in a wheelchair pushed by a walking person. Group 3 participants walked without aid in the virtual-reality setting. Joan measured the time each participant needed to complete the navigation of the virtual-reality setting. What are the dependent and independent variables?

Independent = virtual-reality condition; dependent = time needed

In​ 1965, researchers used random digit dialing to call 1200 people and ask what obstacles kept them from voting. What potential sources of bias were​ present, if​ any? Select all that apply.

Individuals may have refused to participate in the sample. This may have made the sample less representative of the population. Individuals may have not been available when the researchers were calling. Those individuals that were available may have not been representative of the population. Telephone sampling only includes people who had telephones. People who owned telephones may have been older or wealthier on​ average, and may not have been representative of the entire population.

Tautology

Is an expression or phrase that says something twice, just in two different ways

A state lottery randomly chooses 8 balls numbered from 1 through 36 without replacement. You choose 8 numbers and purchase a lottery ticket. The random variable represents the number of matches on your ticket to the numbers drawn in the lottery. Determine whether this experiment is binomial. If​ so, identify a​ success, specify the values​ n, p, and q and list the possible values of the random variable x.

Is the experiment​ binomial? ​No, because the probability of success is different for each trial.

What are some benefits of using graphs of frequency​ distributions?

It can be easier to identify patterns of a data set by looking at a graph of the frequency distribution.

Which of the following is not true of the analysis of variance

It has a higher rate of Type I error than the two-sample t-tests

Correlation vs. association

It only makes sense to talk about correlation between two quantitative variables. If one or both variables are categorical, you should refer to the association b/w them. To be safe, use "association" when describing relationship b/w 2 variables.

How to examine scatterplots

Look for the overall pattern and striking departures from that pattern. 1.) To describe the OVERALL PATTERN of a scatterplot, discuss the direction/trend, the form/shape, clusters and the strength of the relationship 2.) To describe DEPARTURES from the OVERALL PATTERN discuss outliers (an individual that falls outside the overall pattern of the relationship)

mean formula

M = ∑X / N, where M = mean, ∑ = sum (add up all of the scores following this symbol), X = scores in the distribution of the variable X, and N = number of scores in the distribution

A nonparametric procedure that corresponds to the independent samples t-test is the

Mann-Whitney U

Explanatory variable

May help explain or influence changes in a response variable

Previously, we've referred to the result of dividing a sum of squares by degrees of freedom as variance. In an ANOVA, this is referred to as which of the following terms?

Mean square

One tailed one sample t-test sig value

You must divide the sig value in half before you compare it to your alpha (divide the sig value, never t, in half). If the halved value is equal to or less than your alpha, then you have significance. If the halved value is greater than your alpha, then you do not have significance.

In a normal​ distribution, which is​ greater, the mean or the​ median? Explain.

Neither; in a normal​ distribution, the mean and median are equal

Are the following statements Ho:=9 and H1: ≠9 valid null and alternative hypothesis Ho:λ=9 and H1:λ<9 a valid pair of null and alternative hypothesis

No, there are no parameters contained in these statements

Are most correlations perfect

No. There may be some tendency for the value of Y to increase whenever X increases, but the amount that Y changes is not always the same, and occasionally Y decreases when X increases.

mean

Normal density curve is symmetric about its ___

independent event (probability)

Occurrence of one event does not affect subsequent events P(B|A) = P(B) or P(A|B)=P(A).
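
The condition P(B|A) = P(B) can be checked by counting outcomes. The sketch below is only an illustration: the two-dice sample space and the events A and B are assumptions made up for the example, not part of the card.

```python
from fractions import Fraction
from itertools import product

# Hypothetical sample space: two rolls of a fair six-sided die.
outcomes = list(product(range(1, 7), repeat=2))


def prob(event):
    """Classical probability: favorable outcomes / total outcomes."""
    favorable = sum(1 for o in outcomes if event(o))
    return Fraction(favorable, len(outcomes))


def event_a(o):
    return o[0] == 6          # A: the first roll is a 6


def event_b(o):
    return o[1] % 2 == 0      # B: the second roll is even


p_b = prob(event_b)
# Conditional probability P(B|A) = P(A and B) / P(A).
p_b_given_a = prob(lambda o: event_a(o) and event_b(o)) / prob(event_a)

print(p_b, p_b_given_a)  # equal values mean A and B are independent
```

Both values come out to 1/2 here, so knowing that A occurred does not change the probability of B.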

Why do you use a t statistic instead of a z-score?

Oftentimes you do not have the population mean and SD. If you have the population mean but not the population SD then you use this test which allows you to estimate the population SD

Which of the following statements about the correlation coefficient is true?

One should not accept that a correlation coefficient represents a relationship unless it is significant

uniform distribution

One way that a variable is described is through the shape of its distribution. Distribution shapes are typically classified as symmetric, skewed left, or skewed right. Figure 15 on the following page displays various histograms and the shape of the distribution. Figures 15(a) and (b) show symmetric distributions. They are symmetric because, if we split the histogram down the middle, the right and left sides are mirror images. Figure 15(a) is a uniform distribution because the frequency of each value of the variable is evenly spread out across the values of the variable.

The top five books on the best seller list last year are shown below. 1. The Racketeer 2. Gone Girl 3. Spring Fever 4. Threat Vector 5. Private London Identify the level of measurement of the data set. Explain your reasoning

Ordinal. The data can be arranged in order, but the differences between data entries are not meaningful.

Lower and upper class limits and class width

Organizing continuous data in tables: Classes are categories into which data are grouped. When a data set consists of a large number of different discrete data values or when a data set consists of continuous data, we create classes by using intervals of numbers. Table 10 is a typical frequency distribution created from continuous data. The data represent the number of U.S. residents, ages 25-74, who had earned a bachelor's degree or higher in 2013. Notice that the data are categorized, or grouped, by intervals of numbers. Each interval represents a class. For example, the first class is 25- to 34-year-old U.S. residents who had a bachelor's degree or higher. We read this interval as follows: "The number of U.S. residents, ages 25-34, with a bachelor's degree or higher was 14,481,000 in 2013." There are five classes in the table, each with a lower class limit (the smallest value within the class) and an upper class limit (the largest value within the class). The lower class limit for the first class in Table 10 is 25; the upper class limit is 34. The class width is the difference between consecutive lower class limits. In Table 10 the class width is 35-25=10. The data in Table 10 are continuous. So the class 25-34 actually represents 25-34.999 . . . , or 25 up to every value less than 35. Notice that the classes in Table 10 do not overlap. This is necessary to avoid confusion as to which class a data value belongs. Notice also that the class widths are equal for all classes. One exception to the requirement of equal class widths occurs in open-ended tables. A table is open ended if the first class has no lower class limit or the last class has no upper class limit. The data in Table 11 represent the number of births to unmarried mothers in 2012 in the United States. The last class in the table, "40 and over," is open-ended.

What are some benefits of representing data sets using frequency​ distributions? What are some benefits of using graphs of frequency​ distributions?

Organizing the data into a frequency distribution can make patterns within the data more evident.

Problem with judging linear relationships

Our eyes are not a good judge of the strength of a linear relationship. It is easy to be fooled by different scales or by the amount of space around the cloud of points. We need to use a numerical measure to supplement the graph. Correlation is the measure we use.

Which of the following is the desired outcome of an ANOVA?

Our group means are significantly different

complement rule

P(A^c) = 1 - P(A)

Correlation represents z-score comparisons

Pearson correlation measures the relationship between an individual's location in the X distribution and location in the Y distribution. Z-scores identify the exact location of each individual score within the distribution. Each X value can be transformed into a z-score, zX, using the mean and standard deviation of the set of Xs, and each Y score can be transformed into zY.

Decide if the situation involves​ permutations, combinations, or neither. Explain your reasoning. The number of ways 12 people can line up in a row for concert tickets.

Permutations. The order of the 12 people in line matters.
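
As a quick check of the count, the number of ordered arrangements of 12 distinct people is 12!. The sketch below uses Python's standard `math` module; the "choose 3 of the 12" comparison is an added illustration, not part of the card.

```python
import math

# Ordered arrangements of all 12 people in a line: 12!
line_orders = math.factorial(12)

# Illustration only: picking 3 of the 12, with and without regard to order.
ordered_picks = math.perm(12, 3)    # permutations: order matters
unordered_picks = math.comb(12, 3)  # combinations: order ignored

print(line_orders, ordered_picks, unordered_picks)
```

The permutation count is always at least as large as the combination count, because each unordered selection can be arranged in several orders.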

Which kind of estimation is performed when we claim that a population mean is equal to the sample mean

Point estimation

Determine whether the data set is a population or a sample. Explain your reasoning. The salary of each baseball player in a league.

Population, because it is a collection of salaries for all baseball players in the league.

Determine whether the data set is a population or a sample. Explain your reasoning. The number of floors in each home in a town.

Population, because it is a collection of the number of floors for all homes in the town.

Positive association

Two variables have a positive association when above-average values of one tend to accompany above-average values of the other, and when below-average values also tend to occur together.

Another name for residual

Prediction error

Continuous Probability Distribution

Probability distribution for continuous random variables on an infinite number line.

The responses of 1405 voters to a survey about the way the media conducted themselves in a recent political campaign are shown in the accompanying Pareto chart. Find the probability of each event listed in parts​ (a) through​ (d) below.

Randomly selecting a person from the sample who did not give the media an A or a B: .759 Randomly selecting a person from the sample who gave the media a grade better than a D: .421 Randomly selecting a person from the sample who gave the media a D or an F: .579 Randomly selecting a person from the sample who gave the media a C or a D: .374

If the test value for the difference between the means of two large samples is 2.57 when the critical value is 1.96 what decision would be made?

Reject the null hypothesis, because the test value 2.57 exceeds the critical value 1.96.

What is replication in an​ experiment? Why is replication​ important?

Replication is repetition of an experiment under the same or similar conditions. Replication is important because it enhances the validity of the results.

4 methods of sampling

Sampling = method of choosing people from the population to be in the sample •Random selection = every member of the population has an equal chance of being chosen to be a member of the sample •Stratified random selection = the population is divided into relevant categories, then sampled randomly within each category •Haphazard selection = an attempt to be random that in fact is clearly not random •Biased sample = some members of the population are not as likely to be included in the sample as are others •Sample of convenience = members of the sample are those who are easy to get to be in the study

One way to increase power is to maximize the difference produced by the two conditions in the experiment. How might this be accomplished?

Select two very different levels of the independent variable that are likely to produce a relatively large difference between the means

Scatterplot

Shows the relationship between two quantitative variables measured on the same individuals. The values of one variable (explanatory variable) appear on the horizontal axis and the values of the other variable (response variable) appear on the vertical axis. Each individual in the data appears as a point in the graph.

In​ 1965, researchers used random digit dialing to call 1400 people and ask what obstacles kept them from exercising. What type of sampling was​ used? What potential sources of bias were​ present, if​ any? Select all that apply.

Simple random sampling was​ used, since each number had an equal chance of being​ dialed, so all samples of 1400 phone numbers had an equal chance of being selected. Telephone sampling only includes people who had telephones. People who owned telephones may have been older or wealthier on​ average, and may not have been representative of the entire population. Individuals may have refused to participate in the sample. This may have made the sample less representative of the population. Individuals may have not been available when the researchers were calling. Those individuals that were available may have not been representative of the population.

SSE > SST

Since the least-squares line yields the smallest possible sum of squared prediction errors, SSE can never be more than SST, which is based on the line y = ybar. In the worst case, the least-squares line does no better at predicting y than y = ybar does. Then SSE = SST and r2 = 0.

A study found that people who suffer from obstructive sleep apnea are at increased risk of having heart disease. Identify the two events described in the study. Do the results indicate that the events are independent or​ dependent?

The two events are having obstructive sleep apnea and having heart disease. The events are dependent.

The size of slope

Small slope does not mean there is no relationship. The size of the slope depends on units in which we measure the two variables. You can't say how important a relationship is by looking at the size of the slope of the regression line (unlike correlation).

What happens to the least squares regression line if we standardize both variables?

Standardizing a variable converts its mean to 0 and its standard deviation to 1. So (xbar, ybar) is transformed to (0,0), and the least-squares line for the standardized values passes through (0,0). Since sx = sy = 1, the slope is equal to the correlation.
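
The claim can be checked numerically. The following is a minimal sketch with made-up data; the lists `x` and `y` and the helper names are assumptions for illustration only.

```python
from math import sqrt
from statistics import mean, stdev

# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.0, 4.0, 5.0, 4.0, 5.0]


def standardize(values):
    """Convert values to z-scores using the sample mean and sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def least_squares(xs, ys):
    """Return (intercept a, slope b) of the line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(xs, ys))
    sxx = sum((xi - x_bar) ** 2 for xi in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Correlation of the raw data, from sums of squares and cross-products.
x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
r = sxy / sqrt(sxx * syy)

# Regression line fitted to the standardized values.
a_std, b_std = least_squares(standardize(x), standardize(y))

print(a_std, b_std, r)  # intercept near 0, slope matches r
```

Up to floating-point rounding, the intercept is 0 and the slope equals r, as the card states.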

Determine whether the value is a parameter or a statistic. A study of 6,076 adults in public rest rooms found that 23% (underlined) did not wash their hands before exiting.

Statistic

Determine whether the given value is a statistic or a parameter. A sample of professors is selected and it is found that 55% own a vehicle.

Statistic because the value is a numerical measurement describing a characteristic of a sample.

Determine whether the underlined numerical value is a parameter or a statistic. Explain your reasoning. The average annual salary of 50 of a company's 800 employees is $54,000.

Statistic​, because the data set of salaries of 50 employees is a sample.

What is an advantage of using a​ stem-and-leaf plot instead of a​ histogram?

Stem-and-leaf plots contain original data values where histograms do not.

Example of independent sample t-test APA formatting

Students' alertness on Monday (M=4.500, SD=3.317) was not significantly less than students' alertness on Friday (M=8.750, SD=4.272), t(6)= -1.572, p>.05, one tailed, d=1.111.

0.05 < P-value < 0.10

Suggestive evidence in favor of Ha

How to calculate the correlation

Suppose that we have data on variables x and y for n individuals. The means and the standard deviations of the two variables are xbar and sx for the x-values and ybar and sy for the y values. The correlation r between x and y is 1/(n-1) times the sum of the products of zx and zy (Use calculator: 6 1 4)
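
The recipe above (standardize, multiply, sum, divide by n - 1) can be sketched directly. The function name and the sample data below are hypothetical and only meant to illustrate the formula.

```python
from statistics import mean, stdev


def correlation(xs, ys):
    """r = 1/(n-1) * sum of the products of the z-scores zx and zy."""
    n = len(xs)
    x_bar, s_x = mean(xs), stdev(xs)   # sample mean and sample SD of x
    y_bar, s_y = mean(ys), stdev(ys)   # sample mean and sample SD of y
    products = (
        ((x - x_bar) / s_x) * ((y - y_bar) / s_y)
        for x, y in zip(xs, ys)
    )
    return sum(products) / (n - 1)


# Hypothetical data: a perfectly linear increasing relationship.
heights = [60, 62, 64, 66, 68]
weights = [120, 130, 140, 150, 160]
r_up = correlation(heights, weights)

# Reversing one list gives a perfectly linear decreasing relationship.
r_down = correlation(heights, weights[::-1])

print(r_up, r_down)
```

Perfectly linear data give r of +1 or -1 (up to rounding); real data usually fall somewhere in between.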

standard deviation of n observations

Sx measures the average distance of the observations from their mean.

Estimated Cohen's d for one-sample t

Tells how many standard deviations of change you have. Not influenced by the number of scores in the sample because it uses the standard deviation (in this case sample SD) instead of the sample standard error of M. Was defined as a measure of effect in terms of the population mean difference and the population standard deviation. However, in most situations the population values are not known so have to substitute the corresponding sample values in their place.

r2 (also called the percentage of variance accounted for by the treatment)

Tells the proportion of the variability in the scores that is accounted for by the independent variable (treatment); an alternative method for measuring effect size. The treatment causes scores to increase or decrease, which means that the treatment is causing the scores to vary. By measuring how much variability is explained by the treatment, you obtain a measure of the size of the treatment effect. Never negative and never larger than 1.

interval vs ratio

Temperature (0 does not indicate an absence of the property) Vs weight

interval vs ordinal

Temperature vs. place in a race (1st, 2nd, 3rd). Interval data must have equal spacing between values.

The following appear on a physician's intake form. Identify the level of measurement of the data. Temperature; Age; Allergies; Change in health (scale of -5 to 5)

Temperature: interval Age: Ratio Allergies: nominal Change in health: Ordinal.

z scores tips & info

The SND (i.e. z-distribution) is always the same shape as the raw score distribution. For example, if the distribution of raw scores is normally distributed, so is the distribution of z-scores. The mean of any SND always = 0. The standard deviation of any SND always = 1. Therefore, one standard deviation of the raw score (whatever raw value this is) converts into 1 z-score unit. The SND allows researchers to calculate the probability of randomly obtaining a score from the distribution (i.e. sample). For example, there is a 68% probability of randomly selecting a score between -1 and +1 standard deviations from the mean (see Fig. 4).

probability distribution

The ___ of a discrete random variable X provides the possible values of the random variable and their corresponding probabilities.

mean

The ___ of a variable is computed by adding all the values of the variable in the data set and dividing by the number of observations.

range

The ___ of a variable is the difference between the largest data value and the smallest data values.

median

The ___ of a variable is the value that lies in the middle of the data when arranged in ascending order.

independent

The ___ variable is also known as the explanatory variable.

dependent

The ___ variable is also known as the response variable.

Accuracy of predictions

The accuracy of predictions from a regression line depends on how much the data scatter about the line.

Nonsense correlations

The correlation is real but the conclusion that changing one variable causes a change in the other variable is nonsense.

Differential attrition

The difference in rate of attrition between the program and control groups.

The coefficient of determination

The fraction of the variation in the values of y that is accounted for by the least-squares regression line of y on x. We can calculate r2 as r2 = 1 - SSE/SST, where SSE is the sum of the squared residuals and SST is the sum of the squared deviations of the observations from their mean.
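
A small worked example of r2 = 1 - SSE/SST, using made-up data (the lists `x` and `y` are assumptions for illustration). It also shows that SSE does not exceed SST.

```python
from statistics import mean

# Hypothetical data, chosen only for illustration.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

x_bar, y_bar = mean(x), mean(y)

# Least-squares slope b and intercept a of y-hat = a + b*x.
b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
a = y_bar - b * x_bar

# SSE: squared residuals about the regression line.
sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
# SST: squared deviations about the horizontal line y = ybar.
sst = sum((yi - y_bar) ** 2 for yi in y)

r_squared = 1 - sse / sst
print(round(sse, 3), round(sst, 3), round(r_squared, 3))
```

For these numbers SSE is 2.4 and SST is 6, so the regression line accounts for 60% of the variation in y.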

Statistical Inference

The idea of drawing inferences (or conclusions) about a population parameter based on a random sample from the population

In a recent​ year, about 37​% of all infants born in a country were conceived through​ in-vitro fertilization​ (IVF). Of the IVF​ deliveries, about ​twenty-six percent resulted in multiple births. ​(a) Find the probability that a randomly selected infant was conceived through IVF and was part of a multiple birth. ​(b) Find the probability that a randomly selected infant conceived through IVF was not part of a multiple birth. ​(c) Would it be unusual for a randomly selected infant to have been conceived through IVF and to have been part of a multiple​ birth? Explain.

The probability that a randomly selected infant was conceived through IVF and was part of a multiple birth is: .096 The probability that a randomly selected infant conceived through IVF was not part of a multiple birth is: .74 No, this is not unusual because the probability is not less than or equal to 0.05.
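
The three answers follow from the multiplication rule and the complement rule. A minimal sketch, using the figures given in the question (the variable names are made up for the example):

```python
p_ivf = 0.37                  # P(conceived through IVF), from the question
p_multiple_given_ivf = 0.26   # P(multiple birth | IVF), from the question

# (a) Multiplication rule: P(IVF and multiple) = P(IVF) * P(multiple | IVF)
p_ivf_and_multiple = p_ivf * p_multiple_given_ivf

# (b) Complement rule applied within the IVF deliveries
p_not_multiple_given_ivf = 1 - p_multiple_given_ivf

# (c) An event is treated as unusual when its probability is 0.05 or less
is_unusual = p_ivf_and_multiple <= 0.05

print(round(p_ivf_and_multiple, 3), round(p_not_multiple_given_ivf, 2), is_unusual)
```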

relative frequency

The relative frequency is the proportion (or percent) of observations within a category and is found using the formula Relative frequency= frequency/ sum of all frequencies

Important things to look for when you examine a residual plot: #1

The residual plot should show no obvious patterns. Ideally, the graph shows an unstructured (Random) scatter of points in a horizontal band centered at zero. A curved pattern in a residual plot shows that the relationship is not linear. If the spread about the regression line increases for larger/smaller values of x, predictions of y using this line will be less accurate for these values of x.

Interpret a residual

The residual says ___ than predicted by the least squares regression line

Suppose you tested two age groups on the number of details they could recall from a paragraph. The mean for the older group is 16, and the mean for the younger group is 14. Further suppose that you fail to reject the null hypothesis for this independent samples t-test. Which of the following best accounts for the difference between these sample means?

The sample means probably came from the same population, and the difference is due to sampling error

Questioning students as they leave an athletic facility​, a researcher asks 363 students about their dating habits What potential sources of bias are​ present, if​ any? Select all that apply.

The sample only consists of members of the population that are easy to get. These members may not be representative of the population. Because of the personal nature of the​ question, students may not answer honestly.

The importance of slope vs. y-intercept

The slope of a regression line is an important numerical description of the relationship between the two variables. Although we need the value of the y intercept to draw the line, it is statistically meaningful only when the explanatory variable can actually take values close to zero.

Standard error of the mean (σM)

The spread of the means, symbolized by σM. The "standard deviation" of the distribution of sample means. Provides a measure of how much distance, on average, is expected between a sample mean (M) and the population mean (μ) As sample size increases, standard error decreases.
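
The card does not spell out the formula; the sketch below assumes the usual one, σM = σ/√n, and uses a made-up σ of 10 to show how the standard error shrinks as the sample size grows.

```python
from math import sqrt


def standard_error(sigma, n):
    """Standard error of the mean, assuming sigma_M = sigma / sqrt(n)."""
    return sigma / sqrt(n)


sigma = 10  # hypothetical population standard deviation
errors = {n: standard_error(sigma, n) for n in (4, 25, 100)}

for n, se in errors.items():
    print(n, se)
```

With σ = 10, samples of size 4, 25, and 100 give standard errors of 5, 2, and 1.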

Estimated standard error of the mean

The standard error in a single sample t formula measures the amount of error that is expected for a sample mean and is represented by sM. The value that is used to estimate the real standard error σM when the value of σ is unknown. It is computed from the sample variance or sample standard deviation and provides an estimate of the standard distance between the sample mean (M) and the population mean μ

True or False A sample statistic will not change from sample to sample

The statement is false. A sample statistic can change from sample to sample.

T/F Data at the ratio level cannot be put in order.

The statement is false. A true statement is​ "Data at the ratio level can be placed in a meaningful​ order."

Determine whether the statement below is true or false. If it is​ false, rewrite it as a true statement. A combination is an ordered arrangement of objects.

The statement is false. A true statement would be​ "A permutation is an ordered arrangement of​ objects."

T/F Some quantitative data sets do not have medians.

The statement is false. All quantitative data sets have medians.

An ogive is a graph that displays relative frequencies. T/F

The statement is false. An ogive is a graph that displays cumulative frequencies.

Determine whether the statement is true or false. If it is​ false, rewrite it as a true statement. You toss a fair coin nine times and it lands tails up each time. The probability it will land heads up on the tenth flip is greater than 0.5.

The statement is false. The correct statement is​ "You toss a fair coin nine times and it lands tails up each time. The probability it will land heads up on the tenth flip is exactly​ 0.5."

The mean is the measure of central tendency most likely to be affected by an outlier.

The statement is true

The accompanying table shows the numbers of male and female students in a certain region who received​ bachelor's degrees in a certain field in a recent year. A student is selected at random. Find the probability of each event listed in parts​ (a) through​ (c) below.

The student is male or received a degree in the field: .519 REMEMBER TO SUBTRACT THE MALES IN FIELD The student is female or received a degree outside of the field: .904 The student is not female or received a degree outside of the field: .925

Determine whether the study is an observational study or an experiment. Explain. To study the effects of social media on​ teenagers' brains, researchers showed a few dozen teenagers photographs that had varying numbers of​ "likes" while scanning the reactions in their brains.

The study is an experiment, because it applies a treatment to the teenagers

Determine whether the study is an observational study or an experiment. Explain. In a survey of 1291 adults in a​ country, 54​% said the​ country's leader should release all medical information that might affect their ability to serve.

The study is observational, because it does not apply a treatment to the adults.

Determine whether you would take a census or use a sampling to collect data for the study described below. If you would use a​ sampling, determine which sampling technique you would use. Explain. The most popular chain restaurant among the 65 employees of a company.

The study is a census, because the population is small enough for it to be practical to record all of the responses.

(c) How could this experiment be designed to be a​ double-blind? Choose the correct answer below.

The study would be a​ double-blind study if both the researcher and the patient did not know which patient received the real drug or the placebo.

Diffusion

The treatment spreads from treatment group to the control group

For the given pair of​ events, classify the two events as independent or dependent. Wearing no shoes or shirt Getting kicked out of a convenience store

The two events are dependent because the occurrence of one affects the probability of the occurrence of the other.

For the given pair of​ events, classify the two events as independent or dependent. Driving 30 mph over the speed limit Getting a speeding ticket

The two events are dependent because the occurrence of one affects the probability of the occurrence of the other.

For the given pair of​ events, classify the two events as independent or dependent. Flipping a fair coin and getting tails Flipping the same coin again and getting heads

The two events are independent because the occurrence of one does not affect the probability of the occurrence of the other.

For the given pair of​ events, classify the two events as independent or dependent. Winning $ 100 on your first trip to the casino Winning $ 100 on your second trip to the casino

The two events are independent because the occurrence of one does not affect the probability of the occurrence of the other.

Extrapolation

The use of a regression line for prediction far outside the interval of values of the explanatory variable x used to obtain the line. Such predictions are often not accurate. *Few relationships are linear for all values of the explanatory variable. Don't make predictions using values of x that are much larger or much smaller than those that actually appear in your data.

interpreting z-scores

The value of the z-score tells you how many standard deviations you are away from the mean. If a z-score is equal to 0, it is on the mean. A positive z-score indicates the raw score is higher than the mean average. For example, if a z-score is equal to +1, it is 1 standard deviation above the mean. A negative z-score reveals the raw score is below the mean average. For example, if a z-score is equal to -2, it is 2 standard deviations below the mean.
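
The three cases described above (on the mean, above it, below it) can be reproduced with the z-score formula z = (x - mean) / standard deviation. The mean of 100 and standard deviation of 15 are hypothetical values chosen for the example.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd


mean, sd = 100, 15  # hypothetical distribution

z_on_mean = z_score(100, mean, sd)   # raw score equal to the mean
z_above = z_score(115, mean, sd)     # one SD above the mean
z_below = z_score(70, mean, sd)      # two SDs below the mean

print(z_on_mean, z_above, z_below)
```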

Determine whether the variable is qualitative or quantitative. Explain your reasoning. Parcel tracking numbers

The variable is qualitative because tracking numbers are attributes or labels.

1.1

The weights (in pounds) of babies born at St Mary's hospital last month are summarized in the table. Find the class width. Class (frequency): 5.0-6.0 (7); 6.1-7.1 (11); 7.2-8.2 (20); 8.3-9.3 (10); 9.4-10.4 (3). Class width = 6.1 - 5.0 = 1.1

Cluster

There are a bunch of data points together - Name ranges of each variable where cluster appears

How is the null hypothesis of the independent samples t-test verbalized?

There is no relationship between the independent variable and the dependent variable

Quartiles

Three values represented by Q1, Q2, and Q3 that divide the distribution into four subsets. About one - half of the data falls on or below Q2 (the second quartile is the median). About 3/4 of data fall on or below Q3.

design

To ___ an experiment means to describe the overall plan in conducting the experiment.

An interval estimate may or may not contain the true value of the parameter being estimated

True

Consider the null hypothesis H0: μ1- μ2 =0. If the CI for μ1- μ2 does not contain 0, the null hypothesis should be rejected.

True

Determine whether the statement is true or false. If it is​ false, rewrite it as a true statement. If two events are mutually​ exclusive, they have no outcomes in common.

True

In performing a hypothesis test, one should decide whether to reject or not reject the null H0 before summarizing the results.

True

It is impossible for the Census Bureau to obtain all the census data about the population of the United States.

True

T/F Class boundaries ensure that consecutive bars of a histogram touch.

True

T/F The midpoint of a class is the sum of its lower and upper limits divided by two.

True

T/F When each data class has the same​ frequency, the distribution is symmetric.

True

The confidence level of an interval estimate of a parameter is the probability that the interval estimate will contain the parameter.

True

The number of different ordered arrangements of n distinct objects is​ n!.

True

The second quartile is the median of an ordered data set.

True

The t-distribution has a variance that is greater than one

True

When conducting a two-tailed z test with α = 0.01, the test value was computed to be 2.07. The decision would be to not reject the null hypothesis.

True
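
The decision can be checked against the standard normal distribution. This sketch uses `statistics.NormalDist` from the Python standard library; the variable names are made up for the example.

```python
from statistics import NormalDist

alpha = 0.01
test_value = 2.07

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Two-tailed test: alpha is split between the two tails.
critical_value = standard_normal.inv_cdf(1 - alpha / 2)

# Two-tailed p-value for the observed test value.
p_value = 2 * (1 - standard_normal.cdf(abs(test_value)))

reject_null = abs(test_value) > critical_value

print(round(critical_value, 3), round(p_value, 4), reject_null)
```

The critical value is about 2.576, which is larger than 2.07, so the null hypothesis is not rejected at the 0.01 level.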

Test value for the difference between the means of two large samples is 1.43 the CV is 1.96 the null hypo. shouldn't be rejected.

True

independent

Two events E and F are ___ if the occurrence of one event does not affect the probability of the other.

disjoint; mutually exclusive

Two events are ___ if they have no outcomes in common. Another name for these events is ___ events.

What is the difference between independent and dependent​ events?

Two events are independent when the occurrence of one event does not affect the probability of the occurrence of the other event. Two events are dependent when the occurrence of one event affects the probability of the occurrence of the other event.

Negative association

Two variables have a negative association when above-average values of one tend to accompany below-average values of the other.

Extraneous variables

Variables that you are not intentionally studying in your experiment or test but that have an (undesirable) effect. These can turn into confounding variables (third variables).

Prediction

We can use a regression line to help predict the response (y hat) for a specific value of the explanatory variable x

When are nonparametric procedures used instead of parametric procedures?

When our data do not meet the assumptions of parametric procedures

When is a t-test used instead of a z-test?

When the population standard deviation is unknown

When are two samples considered to be related?

When we pair each score in one sample with a particular score in the other sample

The mean value of land and buildings per acre from a sample of farms is ​$1200​, with a standard deviation of ​$100. The data set has a​ bell-shaped distribution. Using the empirical​ rule, determine which of the following​ farms, whose land and building values per acre are​ given, are unusual​ (more than two standard deviations from the​ mean). Are any of the data values very unusual​ (more than three standard deviations from the​ mean)? ​$1034 ​$1445 ​$1043 ​$844 ​$1280 ​$1348

Which of the farms are unusual​ (more than two standard deviations from the​ mean)? 1445 844 Which of the farms are very unusual​ (more than three standard deviations from the​ mean)? 844
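
The same screening can be done by comparing each value's distance from the mean with two and three standard deviations. A minimal sketch using the figures from the question (variable names are made up for the example):

```python
mean_value = 1200   # sample mean, in dollars per acre
sd = 100            # sample standard deviation, in dollars per acre
farms = [1034, 1445, 1043, 844, 1280, 1348]

# Unusual: more than two standard deviations from the mean.
unusual = [v for v in farms if abs(v - mean_value) > 2 * sd]

# Very unusual: more than three standard deviations from the mean.
very_unusual = [v for v in farms if abs(v - mean_value) > 3 * sd]

print(unusual, very_unusual)
```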

The nonparametric procedure that corresponds to the related samples t-test is the

Wilcoxon T test

What is the difference between a random sample and a simple random​ sample?

With a random​ sample, each individual has the same chance of being selected. With a simple random​ sample, all samples of the same size have the same chance of being selected.

When you calculate the number of combinations of r objects taken from a group of n objects what are you​ counting? Give an example.

You are counting the number of ways to select r of the n objects without regard to order. An example of a combination is the number of ways a group of teams can be selected for a tournament.

Issue of causality

You cannot make causal statements from a correlation. One of the most common errors in interpreting correlations is to assume that a correlation necessarily implies a cause-effect relationship when it does not.

How is the estimated standard error of the differences between means different than standard error

You have gone one step further and are using a different distribution. Because you have two samples, you are using both samples in the t formula. Two samples taken from a population gives you a better estimate of the population parameter

How should you interpret correlation from a restricted range?

You should be careful. Only make correlation statements about the correlation inside the restricted range and do not generalize the correlation to the full x range

Output of SPSS to conduct a hypothesis test using Pearson r

You will see a matrix that contains correlation values, sig values, and n values. There will be repetition if you use the entire matrix because each correlation will be listed twice.

nonresponse

___ bias exists when individuals selected to be in the sample who do not respond to the survey have different opinions from those who do respond.

sampling

___ bias means that the technique used to obtain the individuals to be in the sample tends to favor one part of the population over another.

empirical

___ probabilities rely on the relative frequency with which an event happens.

classical

___ probabilities require the outcomes in the experiment to be equally likely.

blinding

___ refers to nondisclosure of the treatment an experimental unit is receiving

descriptive

___ statistics deals with the organization and summarization of collected information.

inferential

___ statistics makes conclusions about populations using data drawn from a sample.

quantitative

___ variables classify individuals in a sample according to numerical values.

LSRL intercept equation

a = ȳ - bx̄

Completely Randomized Design

a Completely Randomized Design Problem A farmer wishes to determine the optimal level of a new fertilizer on his soybean crop. Design an experiment that will assist him. Approach Follow the steps for designing an experiment. Solution Step 1 The farmer wants to identify the optimal level of fertilizer for growing soybeans. We define optimal as the level that maximizes yield. So the response variable will be crop yield. Step 2 Some factors that affect crop yield are fertilizer, precipitation, sunlight, method of tilling the soil, type of soil, plant, and temperature. Step 3 In this experiment, we will plant 60 soybean plants (experimental units). Step 4 List the factors and their levels. • Fertilizer. This factor will be controlled and set at three levels. We wish to measure the effect of varying the level of this variable on the response variable, yield. We will set the treatments (level of fertilizer) as follows: Treatment A: 20 soybean plants receive no fertilizer. Treatment B: 20 soybean plants receive 2 teaspoons of fertilizer per gallon of water every 2 weeks. Treatment C: 20 soybean plants receive 4 teaspoons of fertilizer per gallon of water every 2 weeks. See Figure 6. • Precipitation. The amount of rainfall cannot be controlled, but the amount of watering done can be controlled. Each plant will receive the same amount of precipitation. • Sunlight. This uncontrollable factor will be roughly the same for each plant. • Method of tilling. Control this factor by using round-up ready method of tilling for each plant. • Type of soil. Certain aspects of the soil, such as level of acidity, can be controlled. In addition, each plant will be planted within a 1-acre area, so it is reasonable to assume that the soil conditions for each plant are equivalent. • Plant. There may be variation from plant to plant. To account for this, randomly assign the plants to a treatment. • Temperature. This factor is uncontrollable, but will be the same for each plant. 
Step 5 (a) Randomly assign each plant to a treatment group. First, number the plants from 1 to 60 and randomly generate 20 numbers. The plants corresponding to these numbers get treatment A. Next number the remaining plants 1 to 40 and randomly generate 20 numbers. The plants corresponding to these numbers get treatment B. The remaining plants get treatment C. Now till the soil, plant the soybean plants, and fertilize according to the schedule prescribed. (b) At the end of the growing season, determine the crop yield for each plant. Step 6 Determine any differences in yield among the three treatment groups. Figure 7 on the following page illustrates the experimental design
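
Step 5(a) can be sketched in code. This is a minimal illustration, not the textbook's procedure verbatim: instead of drawing 20 numbers twice, it shuffles the 60 plant numbers once and cuts the shuffled list into three groups of 20, which is an equivalent way to assign plants at random. The function name `assign_treatments` and the `seed` parameter are assumptions made for the example.

```python
import random

def assign_treatments(n_units=60, treatments=("A", "B", "C"), seed=None):
    """Randomly split units 1..n_units into equal-sized treatment groups."""
    rng = random.Random(seed)          # seed only makes the example repeatable
    units = list(range(1, n_units + 1))
    rng.shuffle(units)                 # random order of plant numbers
    size = n_units // len(treatments)  # 20 plants per treatment here
    return {
        t: sorted(units[i * size:(i + 1) * size])
        for i, t in enumerate(treatments)
    }

groups = assign_treatments(seed=1)
for treatment, plants in groups.items():
    print(treatment, plants)
```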

normal distribution

a continuous probability distribution for a random variable x. The graph of a normal distribution is called the normal curve. It has these properties. 1. The mean, median and mode are equal. 2. The normal curve is bell-shaped and symmetric about the mean. 3. The total area under the normal curve is equal to 1. 4. The normal curve approaches, but never touches, the x-axis as it extends farther and farther away from the mean. 5. Between µ - sigma and µ + sigma (in the center of the curve), the graph curves downward. The graph curves upward to the left of µ - sigma and to the right of µ + sigma. The points at which the curve changes from curving upward to downward are called inflection points.

The alternative hypothesis in a two-tailed significance test of correlation states that

a correlation exists in the population

parameter

a descriptive measure of a population

Histogram

a diagram consisting of rectangles whose area is proportional to the frequency of a variable and whose width is equal to the class interval. Block shaped no internal lines

probability histogram

a graph of the probability distribution that displays the possible values of the discrete random variable on the horizontal axis and the probabilities of those values on the vertical axis

block design

a group of experimental units that are similar in ways that are expected to affect the response to the treatments. For example, gender may influence the effectiveness of a drug under study, so the subjects are divided into male and female groups and then randomly assigned to treatments within each block.

In a two-way ANOVA, an F involving a comparison among the level means of a factor is referred to as a test for the significance of

a main effect

matched-pairs design example

a matched-Pairs Design Problem An educational psychologist wants to determine whether listening to music has an effect on a student's ability to learn. Design an experiment to help the psychologist answer the question. Approach We will use a matched-pairs design by matching students according to IQ and gender (just in case gender plays a role in learning with music). Solution Match students according to IQ and gender. For example, match two females with IQs in the 110 to 115 range. For each pair of students, flip a coin to determine which student is assigned the treatment of a quiet room or a room with music playing in the background. Each student will be given a statistics textbook and asked to study Section 1.1. After 2 hours, the students will enter a testing center and take a short quiz on material in the section. Compute the difference in the scores of each matched pair. Any differences in scores will be attributed to the treatment. Figure 8 illustrates the design.

normal probability plot

a plot of the observed values of the variable vs the normal scores, which are the observations expected for a variable having standard normal distribution

marginal probability

a probability that corresponds to an event represented in the margin of a contingency table

sig value

a probability value (p value): the probability of obtaining a result at least as extreme as the one observed by chance alone, that is, if the null hypothesis is true

probability density function

a probability density function has two requirements: 1) the total area under the curve is equal to 1 2) the function can never be negative

discrete

a random variable is discrete when it has a finite or countable number of possible outcomes

continuous random variable

a random variable whose possible values are represented by some type of interval

The confidence interval for a single μ is

a range of values of μ that our sample mean is likely to represent

systematic sample

a sample drawn by selecting individuals systematically from a sampling frame

Random:

a sample in which every member of a population has an equal chance of being selected.

simple random sample

a sample in which every possible sample of the same size has the same chance of being selected.

Correlation

a statistical technique that is used to measure and describe the relationship between two variables. Usually the two variables are simply observed as they occur naturally without an attempt to control or manipulate the variables. Requires two scores for each individual (one from each of the two variables). Scores are normally defined as X and Y. Not trying to figure out causation, but only if there is a relationship between two variables. Correlation coefficient must fall between -1.00 and +1.00. Two parts to correlation are the sign, where (+) tells you you have a positive correlation and (-) tells you that you have a negative correlation. The number of the correlation tells you the strength of the relationship.

Least squares regression line LSRL

a straight line that describes the relationship between an explanatory variable, x, and a response variable, y. Used to predict values in a response variable for given values of x. Is the line of y on x that makes the sum of the squares of the vertical distances of the data points from the line as small as possible.

replication

a sufficient number of experimental units should be used to ensure that randomization creates groups that resemble each other closely and to increase the chances of detecting any differences among the treatments

distribution of a data set is:

a table, graph, or formula that provides the values of the observations and how often they occur

ratio variable

a variable that meets the criteria for an interval variable but also has a meaningful zero point (equal intervals plus a meaningful absolute zero), e.g., temperature in kelvins, height

Check for these 3 conditions for the "Nearly normal Condition":

a) For small samples (n < 15 or so) the data should follow a normal model pretty closely - look for outliers or strong skewness b) For moderate sample sizes ( n between 15 and 40 or so), the t-method will work pretty well as long as the data are unimodal and reasonably symmetric c) For large sample sizes (n > 40 or so), the t-method will work pretty well unless the data are extremely skewed or have many outliers

You roll a​ six-sided die. Find the probability of each of the following scenarios. ​(a) Rolling a 6 or a number greater than 3 ​(b) Rolling a number less than 5 or an even number ​(c) Rolling a 6 or an odd number

a. 0.5: (1/6) + (3/6) - (1/6) = 3/6 b. 0.833 (5/6) c. 0.667 (4/6)
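
As a check on the three answers, each "or" event can be written as a union of sets of outcomes. This is a small sketch; the helper `prob` and the set names are illustrative only.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Classical probability: favorable outcomes / total outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

six = {6}
greater_than_3 = {4, 5, 6}
less_than_5 = {1, 2, 3, 4}
even = {2, 4, 6}
odd = {1, 3, 5}

p_a = prob(six | greater_than_3)  # 6 or a number greater than 3
p_b = prob(less_than_5 | even)    # less than 5 or even
p_c = prob(six | odd)             # 6 or odd

# Addition rule: P(A or B) = P(A) + P(B) - P(A and B)
rule_a = prob(six) + prob(greater_than_3) - prob(six & greater_than_3)

print(p_a, p_b, p_c)  # 1/2 5/6 2/3
```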

One tailed (directional) independent sample t test hypotheses

a. H0: μ1 ≤ μ2 or H0: μ1 ≥ μ2 b. H1: μ1 > μ2 or H1: μ1 < μ2, respectively (H1 always points in the predicted direction)

An individual stock is selected at random from the portfolio represented by the​ box-and-whisker plot shown to the right. Find the probability that the stock price is​ (a) less than ​$23​, ​(b) between ​$23 and ​$57​, and​ (c) ​$32 or more.

a.) .25 b.) .50 c.) .50

systematic random sampling

after placing subjects into a random order, an interval k is determined and every kth subject is selected for the sample, e.g., a random starting point is selected in a telephone directory and every 50th entry is selected for a survey; the sampling interval is based on the population size / sample size
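
A rough sketch of the procedure: compute the interval k as population size divided by sample size, pick a random start within the first k entries, then take every kth entry. The function name and the `seed` parameter are assumptions for the example, and the population is assumed to divide evenly enough that k is at least 1.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Select every kth member after a random start, k = N // n."""
    rng = random.Random(seed)
    k = len(population) // sample_size  # sampling interval
    start = rng.randrange(k)            # random start among the first k entries
    return [population[start + i * k] for i in range(sample_size)]

directory = list(range(1, 1001))        # 1,000 hypothetical directory entries
sample = systematic_sample(directory, 20, seed=3)
print(sample)
```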

floor effect

all the scores cluster at the low end •Scores pile up toward the lower end of the distribution •because it is not technically possible to have a lower score (the measuring instrument does not go that low) even though conceptually the construct might

qualitative or categorical variables

allow for classification of individuals based on some attribute or characteristic

conditions of "r"

always between -1 and 1; r > 0 is positive and r < 0 is negative; values near zero indicate a very weak linear relationship, and the strength increases as r moves away from 0 and toward -1 or 1; has no units; does not describe curved relationships; not resistant to outliers

probability experiment

an action, or trial, through which specific results (counts, measurements, or responses) are obtained.

Margin of Error

the maximum likely difference between a sample estimate and the true population parameter at a given confidence level; half the width of the confidence interval

negative association

as the explanatory variable increases, the response variable tends to decrease

Outliers

an individual with X and/or Y values that are substantially different (larger or smaller) from the values obtained for the other individuals in the data set

How to compute SPSS independent sample t test

Analyze > Compare Means > Independent-Samples T Test. Place the dependent variable in Test Variable(s) and the independent variable in Grouping Variable, then click Define Groups and enter the two values that code the groups in your data (for example, 1 and 2). Press OK.

lurking variable

another variable that may influence the response variable

The normal curve is: a) symmetrical. b) asymmetrical. c) multimodal. d) bimodal.

answer = A - symmetrical.

Sample size and sample variance influence on rejecting the null in one sample t-test

any factor that influences the standard error influences the likelihood of rejecting the null hypothesis and finding a significant treatment effect.

class boundaries

are the numbers that separate classes without forming gaps between them. For data that are integers, subtract 0.5 from each lower limit to find the lower class boundaries. To find the upper class boundaries, add 0.5 to each upper limit. The upper boundary of a class will equal the lower boundary of the next higher class.

nominal

assign numbers to objects but the numbers have no significance or meaning- just assigned for identification purposes (ie jersey numbers)

Homogeneity of variance assumption for independent sample t test

assumes that the two group's variances are roughly the same; most important when there is a large discrepancy between sample sizes; violating it can negate any interpretation of data

In a two-way analysis of variance where Factor A has 3 levels and Factor B has 4 levels, what are the appropriate df if 240 participants are evenly distributed among the treatments?

b
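
The answer choices for this card are not reproduced here, but the degrees of freedom themselves follow from the standard formulas. The sketch below assumes a fully between-subjects design; the function name is illustrative.

```python
def two_way_anova_df(levels_a, levels_b, n_total):
    """Degrees of freedom for a between-subjects two-way ANOVA (assumed design)."""
    df_a = levels_a - 1                          # main effect of A
    df_b = levels_b - 1                          # main effect of B
    df_ab = df_a * df_b                          # A x B interaction
    df_within = n_total - levels_a * levels_b    # N minus number of cells
    return {"A": df_a, "B": df_b, "AxB": df_ab,
            "within": df_within, "total": n_total - 1}

df = two_way_anova_df(3, 4, 240)
print(df)  # {'A': 2, 'B': 3, 'AxB': 6, 'within': 228, 'total': 239}
```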

LSRL slope equation

b = r ( Sy / Sx)
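
A small worked sketch of the least-squares line: the slope is b = r(s_y / s_x), the intercept is a = ȳ - bx̄, and each residual is observed minus predicted (y - ŷ). The data values are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev

x = [1, 2, 3, 4, 5]   # hypothetical explanatory values
y = [2, 4, 5, 4, 5]   # hypothetical response values

def pearson_r(x, y):
    """Pearson correlation from sums of squares and cross-products."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

r = pearson_r(x, y)
b = r * stdev(y) / stdev(x)   # slope
a = mean(y) - b * mean(x)     # intercept

# Residual = observed - predicted
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
print(round(b, 3), round(a, 3))  # 0.6 2.2
```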

bar graph

bar graph is constructed by labeling each category of data on either the horizontal or vertical axis and the frequency or relative frequency of the category on the other axis. Rectangles of equal width are drawn for each category. The height of each rectangle represents the category's frequency or relative frequency.

We perform ANOVA instead of multiple t-tests because with ANOVA the experimentwise error rate will

be equal to α

Assume that a 95% confidence interval for the mean is 11.5 < μ < 16. The null hypothesis H0: μ = 13.0 at α = 0.05 would

not be rejected, because 13.0 lies between 11.5 and 16

Assume that a 99% confidence interval for the mean is 14.5 < μ < 17.5. The null hypothesis H0: μ = 13.0 at α = 0.01 would

be rejected, because 13.0 lies outside the interval (it is less than 14.5)
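
The link between a confidence interval and a two-tailed test can be sketched as follows: at level α, H0: μ = μ0 is rejected exactly when μ0 falls outside the (1 - α) confidence interval. The function name is illustrative.

```python
def two_sided_decision(ci_low, ci_high, mu0):
    """Decide a two-tailed test of H0: mu = mu0 from the matching confidence interval."""
    if ci_low < mu0 < ci_high:
        return "do not reject H0"
    return "reject H0"

# 95% interval 11.5 < mu < 16, tested at alpha = 0.05
first = two_sided_decision(11.5, 16, 13.0)
# 99% interval 14.5 < mu < 17.5, tested at alpha = 0.01
second = two_sided_decision(14.5, 17.5, 13.0)

print(first)   # do not reject H0
print(second)  # reject H0
```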

why can you not have a negative in the denominator of a pearson r formula

because it is the SS which is based on squared values and that therefore cannot be negative

In a one-way chi-square test, the null hypothesis states that if the observed frequencies do not equal the expected frequencies, it is

because of sampling error

normal distribution

bell shaped density curve defined by its mean and standard deviation

In ANOVA, an independent variable that is studied using independent samples in all conditions is called a

between-subjects factor.

How to use scatter plot to determine the strength of the relationship

can get a feeling based on how close the dots fall to a straight line. As the dots get farther and farther away from being a straight line you are getting close to a zero correlation and as they get closer to a straight line you are getting closer to a +1.00 or -1.00 correlation

How can a correlation be used to measure validity

can test it by taking a known test and testing it against a new test to see if they are correlated

nominal variables

categorical variables for which the categories do not have a natural ordering

Classify the following statement as an example of classical probability, empirical probability, or subjective probability. Explain your reasoning. The probability of choosing 6 numbers from 1 to 43 that match the 6 numbers drawn by a certain lottery is 1/6,096,454 ≈ 0.00000016.

Classical: every combination of 6 numbers has an equal chance of being drawn.
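
The lottery figure is a count of combinations, so it can be checked directly. This sketch uses only the standard library's `math.comb`.

```python
from math import comb

ways = comb(43, 6)   # ways to choose 6 numbers from 43, order ignored
p_match = 1 / ways   # each combination is equally likely

print(ways)  # 6096454
```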

Use of ellipse to tell strength of correlation

close to a circle then the correlation is zero and if it is close to a straight line the correlation is nearing +1.00 or -1.00.

population

collection of all individuals or items under consideration in a statistical study

the three principles of experimental design

control - eliminates confounding variables; randomization - creates groups that are similar so that comparisons can be made; replication - many trials to reduce chance variation

central tendency/measures of center

descriptive measures that indicate where the center, or most typical value, of a data set lies

range

difference between the max and minimum data entries

systematic​ sample

each member of the population is assigned a​ number; the population is ordered using these​ numbers, a starting number is randomly​ selected, and then sample members are selected at regular intervals from the starting number.

In an experiment, the "proportion of variance accounted for" goes by another name. It is called the

effect size

Random sampling

evaluated. The researcher analyzed the results to determine whether there was an association between economic status and happiness.

intersection of two events A and B

every outcome that occurs in both A and B

In a two-way ANOVA, the interaction effect is the

extent to which the influence one factor has on scores depends on the level of the other factor.

bell-shaped distribution

Figure 15(b) displays a bell-shaped distribution because the highest frequency occurs in the middle and frequencies tail off to the left and right of the middle. That is, the graph looks like the profile of a bell.

finding percentage

Suppose the payroll amounts for 26 major-league baseball teams are given, and 10 of those are in the $20-$30 million range. Calculate approximately what percentage of the payrolls were in the $20-$30 million range. Round to the nearest whole percent. 10/26 = 0.3846, or about 38%

Platykurtic distribution

flat curve distribution. Light tailed

How are t-statistic distribution shape different than z-distribution (normal curve) shape

flatter and wider

In statistical terminology, when we "collapse across a factor," we average together all the scores

from all levels of that factor

frequency polygon

graph of a frequency distribution that shows the number of instances of obtained scores, usually with the data points connected by straight lines

frequency polygon

graph of a frequency distribution that shows the number of instances of obtained scores, usually with the data points connected by straight lines. Emphasizes the continuous change in frequencies.

Degrees of freedom and sample variance

greater the df, the better that the sample variance represents the population variance and the better the t statistic approximates a z-score. Associated with sample variance and therefore describes how well t represents z

population

group of individuals we want information about

blocks

groups in which experimental units are placed when they are similar in ways that are expected to affect the response

clusters

groups of the population that are selected via a simple random sample

statistically independent events

if knowing that one event occurs does not change the probability that the other occurs

Determining significance for independent sample t test once you are in the correct row

if the sig for the correct row- NOT the equality of variances sig but the sig for the t test, is greater than alpha you do not have significance and if it is less than alpha you do have significance. remember that you have to divide sig by 2 for a one tailed test before you make your judgment

Example of restricted range correlation

if you take a sample to see if there is a correlation between SAT score and college GPA, and your SAT scores fall between 480 and 660 then you could say that there is a correlation "for SAT scores between 480 and 660" but could not say for SAT and college GPA overall

Example of using correlation for reliability

if you wanted to find out if an IQ test was reliable, then if there is a high correlation, then the people who score high the first time would score high the second time

experimental units

in a designed experiment, the individuals or items on which the experiment is performed

How is homogeneity of variance similar to assumption that the population SD stays the same in z-score hypothesis test?

in z-score test assumed that the population SD was staying the same because treatment effect was adding or subtracting a constant to every score; in t-test assumption is made based on the fact that the t-statistic formula is obtained by averaging together sample variances so it only makes sense to average these two values if they are estimating the same population variance

What to do in independent sample t test if you are unsure if your scores are coming from a normally distributed population

increase the sample size to 30 or more

fanning in residual plot

increasing or decreasing spread about the line as x increases indicates the model may not be appropriate for values beyond the domain of the explanatory variable - good at beginning, not at end.

The first 10 students who arrived for the Friday lecture filled out a questionnaire on their attitudes toward the instructor. The first 10 who were late for the lecture were spotted, and afterward filled out the same questionnaire. The appropriate design for testing the significance of the difference between the means is

independent samples t-test

The two variables in a scatter plot are called the

independent variable and dependent variable

The primary interpretation of a two-way ANOVA rests on the interpretation of the

interaction effect, if it is significant

The process of specifying a range of values within which the population parameter is estimated to fall is known as

interval estimation

Sample size effect on estimated standard error

inverse relationship with the larger the sample, the smaller the error

raw score

is a regular score before it has been converted into a Z score

complement of an event E

is the set of all outcomes in the sample space that are not in E

union of two events A and B

is every outcome that occurs in A or in B or in both A and B

What does the null curve for Pearson r look like

it is truncated, meaning that you cut it off at +1.00 and -1.00 because that is where the curve ends

how to describe an association

form (linear?); strength (strong, moderate, weak); direction (positive or negative); outliers?

How to determine equality of variances assumed for independent sample t-test using SPSS

look at Levene's Test for Equality of Variances (in the second output box). 1) If the significance value for the test is greater than .05, we can assume equality of variances, so you stay on the top row (use the .05 criterion for Levene's test whether your t test uses .05 or .01, one- or two-tailed). 2) If the significance value is less than .05, you reject the null hypothesis of equal variances, so you drop down to the lower row (equal variances not assumed).

confidence interval for a single μ

describes a range of values of μ

There are two ways in which samples can be related. In a _____ design, each participant in one condition is paired with a participant in the other condition. In a _____ design, each participant is tested under both conditions of the independent variable.

matched-samples; repeated-measures

The term z_(α/2) · (σ/√n) describes

maximum error of estimate
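
A minimal sketch of the calculation E = z_(α/2) · (σ/√n), using the standard library's `NormalDist` to look up the critical z value. The numbers σ = 10 and n = 25 are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist

def margin_of_error(sigma, n, confidence=0.95):
    """Maximum error of estimate for a mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # critical value z_(alpha/2)
    return z * sigma / sqrt(n)

E = margin_of_error(sigma=10, n=25)  # hypothetical sigma and sample size
print(round(E, 2))  # 3.92
```

The confidence interval is then the sample mean plus or minus E.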

When r is not significantly different from 0, the best predictor of y is the .... of the data values of y

mean

response variable

measures outcome of study (dependent variable)

sample standard deviation

measures variation by indicating how far, on average, the observations are from the sample mean

sample of convenience

members of the sample are those who are easy to get to be in the study

which is a better distribution, box plots or modified box plots?

modified, because it is resistant to outliers.

the effect of multiplying by a constant to data

multiplies measures of CENTER LOCATION and SPREAD. does not change the shape of the distribution.

multiplying by a constant

multiplies measures of center and location multiplies measures of spread does not change shape

You have 5 different video games. How many different ways can you arrange the games side by side on a​ shelf?

n! = n(n-1)(n-2)···1, so 5! = 5·4·3·2·1 = 120 ways

At a blood drive, 5 donors with type O+ blood, 4 donors with type A+ blood, and 2 donors with type B+ blood are in line. In how many distinguishable ways can the donors be in line?

n = 11, n1 = 5, n2 = 4, n3 = 2: 11!/(5! · 4! · 2!) = 6930
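
Both counting answers (the 5 video games and the 11 donors) can be verified with factorials. The helper name `distinguishable` is illustrative.

```python
from math import factorial

# Arrangements of 5 different video games on a shelf: 5!
arrangements = factorial(5)

def distinguishable(counts):
    """Distinguishable orderings: n! / (n1! * n2! * ... * nk!)."""
    total = factorial(sum(counts))
    for c in counts:
        total //= factorial(c)
    return total

# 5 donors with O+, 4 with A+, 2 with B+
donor_orders = distinguishable([5, 4, 2])

print(arrangements)  # 120
print(donor_orders)  # 6930
```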

binomial probability equations

P(X = r) = nCr · p^r · (1-p)^(n-r)
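
A short sketch of the binomial formula P(X = r) = nCr · p^r · (1-p)^(n-r), where n is the number of trials, r the number of successes, and p the probability of success on one trial. The example values (4 fair coin flips) are made up for illustration.

```python
from math import comb

def binomial_pmf(r, n, p):
    """Probability of exactly r successes in n independent trials."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

# Exactly 2 heads in 4 fair coin flips
p_two_heads = binomial_pmf(2, 4, 0.5)

# The probabilities over r = 0..n should add up to 1
total = sum(binomial_pmf(r, 4, 0.5) for r in range(5))

print(p_two_heads)  # 0.375
```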

Probability Formula

Probability = number of favorable outcomes / total number of possible outcomes

calculate odds ratios

Sometimes odds are given as a fraction or decimal, e.g., odds of 1:4 may be given as ¼ = .25. It is important to note that this fraction or decimal is odds and not probability! To show how much of an effect an intervention has, odds ratios (OR) are sometimes given. OR is the odds of success in one circumstance divided by the odds of success in another circumstance (both odds expressed as a decimal)

Assumption of normal distribution for one sample t-test

necessary part of the math underlying the development of the t statistic and t distribution table; violating this assumption has little practical effect on the results obtained for a t statistic, especially when the sample is relatively large; With small samples, you need this assumption with larger samples this assumption can be violated without affecting the validity of the hypothesis test

double blind

neither the subject nor the person in contact with them knows which treatment the subject received

Can r² ever be greater than 1

no

are mutually exclusive events always independent?

no

when looking at a frequency distribution bar chart for a qualitative variable, should the bars touch?

no

Can r² ever be negative

no because you are squaring.

The null hypothesis in a two-tailed significance test of correlation states that

no correlation exists in the population

nominal vs ordinal

nominal: categories with no meaningful measure, ratio, or order; ordinal: categories arranged according to some meaningful order

Does correlation equal causation

no!

In order to find confidence interval for variances and standard deviation, one must assume that the variable is

normally distributed

Fractiles

numbers that partition, or divide, an ordered data set into equal parts.

inferential statistics

numerical data that allow one to generalize- to infer from sample data the probability of something being true of a population

descriptive statistics

numerical data used to measure and describe characteristics of groups. Includes measures of central tendency and measures of variation.

A major reason for conducting a study with two factors is to

observe the interaction between the factors

how to calculate a residual

observed - predicted = y - ŷ

undercoverage

occurs when some groups in the population are left out of the process of choosing a sample.

Assignment bias

occurs when the process used to assign different participants to different treatments produces groups of individuals with noticeably different characteristics (Failure)

unimodal distribution

one peak

continuous quantitative variable

one whose possible values form some interval of numbers

convenience​ sample

only members of the population that are easy to get are sampled.

When a Mann-Whitney U test is significant, we accept

our data represent the predicted difference between our conditions in the population.

How to calculate pooled SD

pool the SS and divide by the combined degrees of freedom; this is pooled variance, and if you take the square root of that, then you have pooled SD
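
The recipe can be sketched as: add the two sums of squares, divide by the combined degrees of freedom, then take the square root. The two samples are made up for illustration, and the function names are not from any particular package.

```python
from math import sqrt

def sum_of_squares(sample):
    """SS: sum of squared deviations from the sample mean."""
    m = sum(sample) / len(sample)
    return sum((x - m) ** 2 for x in sample)

def pooled_variance(s1, s2):
    """Pooled variance = (SS1 + SS2) / (df1 + df2)."""
    ss = sum_of_squares(s1) + sum_of_squares(s2)
    df = (len(s1) - 1) + (len(s2) - 1)
    return ss / df

group1 = [2, 4, 6]      # SS = 8,  df = 2
group2 = [1, 3, 5, 7]   # SS = 20, df = 3

var_p = pooled_variance(group1, group2)  # 28 / 5
sd_p = sqrt(var_p)                       # pooled SD
print(var_p)  # 5.6
```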

When we construct a 95% confidence interval, we are 95% sure that the

population mean falls within the interval

In a poll, 1,005 men in a country were asked whether they favor or oppose the use of "federal tax dollars to fund medical research using stem cells obtained from human embryos." Among the respondents, 46% said that they were in favor. Identify the population and the sample.

Population: all men in the country. Sample: the 1,005 men surveyed.

continuous probability distribution f(x)

provides a measure of how dense the values are in a tiny neighborhood around x

In a study of 100 new cars, 27 are white. Find p̂ and q̂ where p̂ is the proportion of new cars that are white.

p̂ = 27/100 = 0.27, q̂ = 1 - p̂ = 0.73

poisson random variable

random variable X that is said to have the Poisson distribution with parameter lambda

Random assignment (assumption for independent sample t-test)

randomly put people into groups, where one person is placed does not determine where another is placed

Control

refers to controlling the extraneous variable

When one has the option, a related-samples design should be chosen over an independent samples design because

related samples result in less variability, and therefore the design is more powerful

In a chi-square procedure we test whether, "the frequencies in each category in the sample data

represent specific frequencies in the population."

How to interpret r² for independent sample t test

same as one sample t test; .01 is small, .09 is medium, .25 is large

Example of significance determined by computer printout for one tailed one sample t-test

if the sig value is .048, divide it in half to get .024; because .024 is less than an alpha of .05, the result is significant (provided the effect is in the predicted direction)

standard deviation of a discrete random variable is denoted

sigma

A study found that people who suffer from obstructive sleep apnea are at increased risk of having heart disease. Identify the two events described in the study. Do the results indicate that the events are independent or​ dependent?

sleep apnea and heart disease. dependent.

Do you want a large or small sig value

small because that means that there is less of a probability of that happening by chance

event

some specified result that may or may not occur when an experiment is performed

steps in designing an experiment

Step 1: Identify the Problem to Be Solved. The statement of the problem should be as explicit as possible and should provide the experimenter with direction. The statement must also identify the response variable and the population to be studied. Often, the statement is referred to as the claim.
Step 2: Determine the Factors That Affect the Response Variable. The factors are usually identified by an expert in the field of study. In identifying the factors, ask, "What things affect the value of the response variable?" After the factors are identified, determine which factors to fix at some predetermined level, which to manipulate, and which to leave uncontrolled.
Step 3: Determine the Number of Experimental Units. As a general rule, choose as many experimental units as time and money allow. Techniques (such as those in Sections 9.1 and 9.2) exist for determining sample size, provided certain information is available.
Step 4: Determine the Level of Each Factor. There are two ways to deal with the factors: control or randomize.
1. Control: There are two ways to control the factors. (a) Set the level of a factor at one value throughout the experiment (if you are not interested in its effect on the response variable). (b) Set the level of a factor at various levels (if you are interested in its effect on the response variable). The combinations of the levels of all varied factors constitute the treatments in the experiment.
2. Randomize: Randomly assign the experimental units to treatment groups. Because it is difficult, if not impossible, to identify all factors in an experiment, randomly assigning experimental units to treatment groups mutes the effect of variation attributable to factors (explanatory variables) not controlled.
Step 5: Conduct the Experiment. (a) Replication occurs when each treatment is applied to more than one experimental unit. Using more than one experimental unit for each treatment ensures the effect of a treatment is not due to some characteristic of a single experimental unit. It is a good idea to assign an equal number of experimental units to each treatment. (b) Collect and process the data. Measure the value of the response variable for each replication. Then organize the results. The idea is that the value of the response variable for each treatment group is the same before the experiment because of randomization. Then any difference in the value of the response variable among the different treatment groups is a result of differences in the level of the treatment.
Step 6: Test the Claim. This is the subject of inferential statistics. Inferential statistics is a process in which generalizations about a population are made on the basis of results obtained from a sample. Provide a statement regarding the level of confidence in the generalization. Methods of inferential statistics are presented in Chapters 9 through 1

continuous random variable

takes all values in an interval of numbers. The probability distribution of X is described by a density curve. The probability of any event is the area under the density curve and above the values of X that make up the event. Probabilities can be calculated only for intervals of values, since any single value has probability 0.

Example of using correlation to measure validity

test a newly developed IQ test against the WISC or WAIS. If there is good validity the person's score on the new test would need to be similar to where they placed in WISC or WAIS

The numerical value obtained from a statistical test is called the

test value

which is a more appropriate measure of spread: IQR or Standard deviation?

the IQR, because it is resistant to outliers.

population variance

the average of the squared deviations from the population mean: σ² = Σ(x - μ)² / N
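
A minimal sketch of the population variance formula above, σ² = Σ(x - μ)² / N. The function name and the data set are hypothetical, chosen only for illustration.

```python
def population_variance(values):
    """Mean of the squared deviations from the population mean (divide by N)."""
    n = len(values)
    mu = sum(values) / n                      # population mean
    squared_deviations = [(x - mu) ** 2 for x in values]
    return sum(squared_deviations) / n        # divide by N, not N - 1

scores = [2, 4, 4, 4, 5, 5, 7, 9]             # hypothetical population
print(population_variance(scores))            # 4.0
```

Dividing by N is what makes this the population version; a sample variance would divide by n - 1 instead.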

For the conjecture " the average rent of an apartment is more than $950 per month," the alternative hypothesis is

the average rent of an apartment is greater than $950 per month.

response variable

the characteristic of the experimental outcome that is to be measured or observed

data set

the collection of all observations

t-distribution

the complete set of t values computed for every possible random sample for a specific size sample (n) or a specific degrees of freedom (df). It approximates the shape of a normal distribution just as the t-statistic approximates a z-score.

What ratio is pearson r comparing?

the covariability of X and Y in the numerator with the variability of x and y separately in the denominator. Measures how the two scores are varying together versus how they are varying separately

The null hypothesis in the chi-squared test for goodness of fit is

the data fit the expected frequencies

range

the difference between the highest and lowest scores in a distribution

range

the difference between the maximum and minimum observations

Treatment variance is defined as

the differences between the populations produced by a factor

population distribution/distribution of the variable

the distribution of population data

Sample means:

the distribution of y-bar converges to a normal distribution

poisson distribution

the distribution used to model the frequency of a specified event occurring at rate lambda during a particular period of time

To determine the extent to which the conditions of the independent variable determine dependent scores, we should compute

the effect size

Population

the entire group of individuals about which we want information

mean of geometric random variable

the expected number of trials required to get the first success is 1/p

randomization

the experimental units should be randomly divided into groups to avoid unintentional selection bias in constituting the groups

Calculating estimated standard error of the differences between means

the first part of the formula (the first set of parentheses) is the pooled variance; multiply it by (1/n1 + 1/n2) and then take the square root of the whole thing
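
The calculation described above can be sketched as follows. The card does not show the pooled variance formula itself, so this sketch assumes the common textbook form, pooled variance = (SS1 + SS2) / (df1 + df2); the function names and the numbers are hypothetical.

```python
import math

def pooled_variance(ss1, n1, ss2, n2):
    """Assumed form: (SS1 + SS2) / (df1 + df2), with df = n - 1 per sample."""
    return (ss1 + ss2) / ((n1 - 1) + (n2 - 1))

def estimated_standard_error(ss1, n1, ss2, n2):
    """sqrt(pooled variance * (1/n1 + 1/n2))."""
    sp2 = pooled_variance(ss1, n1, ss2, n2)
    return math.sqrt(sp2 * (1 / n1 + 1 / n2))

# Hypothetical samples: SS1 = 50 with n1 = 6, SS2 = 30 with n2 = 6
print(pooled_variance(50, 6, 30, 6))                       # 8.0
print(round(estimated_standard_error(50, 6, 30, 6), 3))    # 1.633
```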

Sample size and normal t-distribution

the greater the sample size (n), the larger the df and the better the t-distribution approximates a normal distribution

A significant interaction effect indicates that

the influence of one factor is not the same for each level of the other factor.

An interaction effect that is not significant indicates that

the influence of one factor is the same for each level of the other factor

first quartile

the median of the part of the entire data set that lies at or below the median of the entire data set

"r²" coefficient of determination

the percentage of variation in the values (y variable) that is explained by the LSRL.

rounding error

the percents in a relative frequency table do not add up to exactly 100

cluster random sampling

the population is divided into clusters, usually based on geography. A few clusters are randomly chosen, and every subject in the chosen clusters is polled: ALL members are polled from SOME groups. Example: NCHS is divided into 2nd-period classrooms; 8 classrooms are chosen at random and every student present is polled.

Cluster

the population is divided into groups (or clusters) and all of the members in one or more (but not all) of the clusters are selected. To avoid a biased sample, care must be taken to ensure that all clusters have similar characteristics.

levels

the possible values of a factor

probability

the probability of any outcome of a chance process is a number between 0 and 1 that describes the proportion of times the outcome would occur in a very long series of repetitions.

multiplication rule

the probability that two or more events will occur together is the product of their probabilities. Dependent events: P(A and B) = P(A) × P(B|A). Independent events: P(A and B) = P(A) × P(B).

k factorial

the product of the first k positive integers (counting numbers)

percentile

the pth percentile of a distribution is the value with p percent of the observations less than it. Ex: Since 21 of 25 observations (84%) are below her score, Jenny is at the 84th percentile in her class's distribution of scores.

when using the sample standard deviation as measure of variation, what is the reference point?

the sample mean

sample space

the set of all possible outcomes of a probability experiment

The larger the sM

the smaller the value of t and the less likelihood of rejecting the null hypothesis

Sum of squares is defined as the sum of

the squared deviations between the mean and each score

What does the number value of a correlation measure

the strength and consistency of the relationship

midpoint of a class

the sum of the lower and upper limits of the class divided by 2. sometimes called the class mark.

mean

the sum of the observations divided by the number of observations

As degrees of freedom increase

the t-distribution gets closer to the standard normal curve

subject

the term for when the experimental units are human

Validity

the test is measuring what it claims to be measuring

An unconfounded comparison occurs in comparing two cell means when

the two cells differ along one factor

percentile point

the value on the measurement scale below which a specified percentage of the scores in the distribution fall

sample data

the values of a variable for a sample of the population

data

the values of a variable for one or more people or things

population data/census data

the values of a variable for the entire population

Analysis of variance is the most common inferential statistical procedure used to analyze experiments because

there are several different versions of it, and so it can be used with many different experimental designs

Weak correlation indicates what about reliability

there is not a consistent relationship between the first score and the second score; that is a weak correlation indicates poor reliability

sum of squared deviations

total of each score's squared difference from the mean

Determine whether the statement below is true or false. If it is false, rewrite it as a true statement. ₇C₅ = ₇C₂

true

Determine whether the statement below is true or false. If it is​ false, rewrite it as a true statement. The number of different ordered arrangements of n distinct objects is​ n!.

true

Determine whether the statement below is true or false. If it is​ false, rewrite it as a true statement. When you divide the number of permutations of 11 objects taken 3 at a time by​ 3!, you will get the number of combinations of 11 objects taken 3 at a time.

true
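
The three counting statements above can be checked numerically. This is a small sketch using Python's standard `math` module (`math.comb` and `math.perm` require Python 3.8 or later); the variable names are only illustrative.

```python
import math

# 7C5 equals 7C2, because choosing 5 to keep is the same as choosing 2 to leave out
c_7_5 = math.comb(7, 5)
c_7_2 = math.comb(7, 2)
print(c_7_5, c_7_2)                    # 21 21

# The number of ordered arrangements of n distinct objects is n!
arrangements_of_4 = math.factorial(4)
print(arrangements_of_4)               # 24

# Dividing 11P3 by 3! removes the orderings and leaves 11C3
p_11_3 = math.perm(11, 3)
c_11_3 = math.comb(11, 3)
print(p_11_3 // math.factorial(3), c_11_3)   # 165 165
```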

true or false: the larger the sample size, the better the approximation to the population tends to be

true

true or false: two data sets that have identical frequency distributions have identical relative-frequency distributions

true

independent events

two events A and B are independent if the occurrence of one event has no effect on the chance that the other event will happen. In other words, events A and B are independent if P(A | B) = P(A) and P(B | A) = P(B)

Independent observations

two observations are independent if there is no consistent, predictable relationship between the first and the second meaning that the occurrence for the first event has no effect on the probability of the second event

bimodal distribution

two peaks

event "A or B"

union of A and B A ∪ B

combination

unordered arrangement of objects

Random sampling assumption for independent sample t test

up until this point, we were comparing one sample against a population, so random sampling was needed. Now we are comparing sample with sample, so random assignment alone can be enough. You could run the study without random sampling, but if you are trying to generalize the results to a population, you need random sampling.

T-statistic

used to test hypotheses about an unknown population mean μ when the value of σ is unknown. The formula for the t-statistic has the same structure as the z-score formula, except that it uses the estimated standard error in the denominator

classical (or theoretical) probability

used when each outcome in a sample space is equally likely to occur. The classical probability for an event E is given by: P(E) = (number of outcomes in event E) / (total number of outcomes in sample space)

equal-interval variable

variable in which the numbers stand for approximately equal amounts of what is being measured: ordinal, plus the space between gradations is the same. E.g., temperature in degrees Fahrenheit

Ordinal level of measurement

a variable is at the ordinal level of measurement if it has the properties of the nominal level of measurement and, in addition, the naming scheme allows the values of the variable to be arranged in a ranked or specific order.

P-value < .0001

very strong evidence in favor of Hₐ

tree diagram

visual display of the outcomes of a probability experiment by using branches that originate from a starting point.

r2 for independent sample t test

way to figure out the strength of the effect. Determines the portion of variability that is due to the independent variable (the portion of variability that is explained by treatment)

bias

when a sample does not accurately reflect the population because of the sampling method employed; when a question is worded in such a way to elicit a certain response.

when would it not be appropriate to make a pie chart of categorical data?

when the percents of each categories are parts to different wholes. They don't add up to 100.

For our purposes, what does μ1-μ2 equal in the independent measures t test

will equal zero. We are trying to find if the differences between the sample means is significantly different from zero

how is a sig value different than a critical value

with a critical value, the test statistic has to exceed it in order to be significant; with a sig value, you have to be below the cutoff for there to be significance. With a critical value you want a larger test statistic, because that pushes it out into the tail and thus decreases the probability of the result happening by chance

Why do you not have the fourth requirement (same SD) in the t-test that you have in the z-scores

with the z-test you use the stated SD in calculations but in a t-test the SD is not stated and you are estimating the SD

How can an outlier make it look like there is a correlation when it actually does not exist

without the outlier, there might not be a correlation but when you add in the outlier it will seem like there is a correlation

the formula for the confidence interval of the mean for a specific α is

x̄ - z_(α/2) · (σ/√n) < μ < x̄ + z_(α/2) · (σ/√n)
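
A short sketch of this interval, x̄ ± z_(α/2) · (σ/√n), using `statistics.NormalDist` from the Python standard library to look up the critical value. The function name and the sample figures (x̄ = 100, σ = 15, n = 36) are hypothetical.

```python
import math
from statistics import NormalDist

def z_confidence_interval(xbar, sigma, n, alpha=0.05):
    """Confidence interval for the mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - alpha / 2)   # critical value z_(alpha/2)
    margin = z * sigma / math.sqrt(n)         # margin of error
    return xbar - margin, xbar + margin

low, high = z_confidence_interval(xbar=100, sigma=15, n=36, alpha=0.05)
print(round(low, 2), round(high, 2))          # 95.1 104.9
```

Lowering α (for example to 0.01) raises the critical value, so the interval gets wider.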

what can you conclude about associations?

you CANNOT imply causation since it is just an association or best fit model

if the plot is roughly linear, then,

you can assume that the variable is approximately normally distributed

what can you not assume about an association?

you can't assume that there is causation even when they have a strong association.

Why can you not make causal statements from correlation?

you do not know that X causes Y, Y causes X or a third variable Z causes both X and Y

How do you eliminate repetition in correlation matrix in SPSS

you draw a diagonal line through all the 1 correlations and you only deal with the numbers that are left on the top right-hand corner after you eliminate the rest of it.

In APA formatting, how do you report t

you need to put the number of degrees of freedom underneath

z-score

z = (x - mean) / standard deviation. A type of standard score that tells us how many standard deviation units a given score is above or below the mean for that group

mean of the sum of random variables

µt = µx + µy

Population mean

μ

standard deviation of binomial random variable

σ = √(np(1 - p))
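
A minimal sketch of the formula σ = √(np(1 - p)); note that the square root covers the whole product. The function name and the values n = 100, p = 0.5 are hypothetical.

```python
import math

def binomial_sd(n, p):
    """Standard deviation of a binomial random variable with n trials."""
    return math.sqrt(n * p * (1 - p))

print(binomial_sd(100, 0.5))   # 5.0
```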

Determine whether the statement is true or false. You toss a coin and roll a die. The event​ "tossing tails and rolling a 4 or 6​" is a simple event.

​False, the event is not simple because it consists of two possible outcomes.

Determine whether the underlined numerical value is a parameter or a statistic. Explain your reasoning. The average grade on the midterm exam in a certain math class of 50 students was an 88 <--

​Parameter, because the data set of all 50 midterm exams in the math class is a population.

What technology format could be used to generate six random numbers between 1 and 800?

​RandInt(1,800,6​)

Determine whether the statement is true or false. When an event is almost certain to​ happen, its complement will be an unusual event.

​True, the complement would be an unusual event.

What technology format could be used to generate ten random numbers between 1 and 950​?

​​RandInt(1,950​,10​)

Determine whether the data set is a population or a sample. Explain your reasoning. The ages of one person per row in a cinema

​​Sample, because the collection of ages of one person per row is a subset of all people in the cinema.

Central Limit Theorem

•If each score is due to very many random influences and there are very many scores then their frequency distribution will be normal.

Percentages of Normal Distribution

•A normal curve table shows the percentages of scores associated with different segments of the normal curve. •See Table A-1, pages 671 - 674 in the textbook •The first column of this table lists the Z score •The second column is labeled "% Mean to Z" and gives the percentage of scores between the mean and that Z score. •The third column is labeled "% in Tail" and gives the percentage of scores more extreme than that Z score.

why is normal distribution so common

•Any one score is due to many influences most of which are random. •Usually these influences cancel out, so most scores are in the middle. The chances of several influences going toward the same one direction become progressively less as the influences become more extreme.

mean

•Arithmetic average of a group of scores

population parameters

•Calculated characteristics of a population (mean, variance, standard deviation, etc.) Usually unknown and estimated from information obtained from a sample of the population

Variability

•Distributions with the same mean can have very different amounts of spread. •Distributions with different means can have the same amount of spread.

steps converting z-score to raw score

•Figure the deviation score. •Multiply the Z score by the standard deviation. •Figure the raw score. •Add the mean to the deviation score. •Formula for changing a Z score to a raw score: X= (Z)(SD)+M

standard deviation formula

•Figure the variance. •Take the square root of the variance. •SD = √SD² •SD = √6.60 •SD = 2.57 (P - O) / 6

Calculating Z Score

•Formula for changing a raw score to a Z score: Z = (X - M) / SD •To change a raw score to a Z score: •Figure the deviation score. •Subtract the mean from the raw score. •Figure the Z score. •Divide the deviation score by the standard deviation. (raw score - mean) / standard deviation
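
The two conversions, raw score to Z score and Z score back to raw score, can be sketched as a pair of small functions. The function names and the example values (M = 100, SD = 15) are hypothetical.

```python
def z_score(raw, mean, sd):
    """Raw score to Z score: Z = (X - M) / SD."""
    deviation = raw - mean        # figure the deviation score
    return deviation / sd         # divide by the standard deviation

def raw_score(z, mean, sd):
    """Z score to raw score: X = (Z)(SD) + M."""
    deviation = z * sd            # figure the deviation score
    return deviation + mean       # add the mean

print(z_score(130, 100, 15))      # 2.0
print(raw_score(-1.5, 100, 15))   # 77.5
```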

calculating odds

•If probability = .20, => proportion of successes is .20 and, therefore, (1.00 - .20 = .80) the proportion of unsuccessful outcomes is .80 •So odds would be .20:.80 or 1:4 (always use smallest integers) •This is read as, "one to four"

what do you see in research articles

•Means and standard deviations are often seen in research articles. •Means and standard deviations can be displayed in tables or directly in the text of the articles.

why study samples

•Most research is conducted by evaluating a sample of subjects/events who are representative of a population of interest. •It is usually more practical to obtain information from a sample than from the entire population. •

Figures used in research articles / how normal curve

•Normal curve is sometimes discussed in terms of the distribution of scores of a particular variable.

4 basic pieces of information necessary for a data set

•Number of cases •Shape of the distribution •Central tendency •Variability

range of probabilities

•Probability cannot be less than 0 nor greater than 1. •Something with a probability of 0 has no chance of happening. •Something with a probability of 1 is absolutely certain to happen.

Probability can be expressed (3 ways)

•Proportion a number between 0 and 1; i.e., .20 •Percentage; i.e., 20% •Fraction; i.e., 1/5

To find the median

•To find the median: •Line up all the scores in order from lowest to highest (or highest to lowest). •Figure how many scores there are to the middle score by adding 1 to the number of scores and dividing by 2. •Count up to the middle score or scores. •The data must be at least ordinal.
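
The median steps above can be sketched in a few lines. The function name and the example scores are hypothetical; for an even number of scores the sketch averages the two middle scores.

```python
def median(scores):
    """Median by sorting and counting up to the middle score or scores."""
    ordered = sorted(scores)                 # line up scores lowest to highest
    n = len(ordered)
    if n % 2 == 1:
        position = (n + 1) // 2              # 1-based position of the middle score
        return ordered[position - 1]
    lower = ordered[n // 2 - 1]              # two middle scores when n is even
    upper = ordered[n // 2]
    return (lower + upper) / 2

print(median([7, 3, 9, 1, 5]))   # 5
print(median([7, 3, 9, 1]))      # 5.0
```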

population

•entire set of things of interest •e.g., the entire piggy bank of pennies e.g., the entire population of HIV+ adults in the US

sum of squares formula

∑(X - M)² = Sum of Squares = SS. So, SD² = SS / N. The Sum of Squares comes up very often in statistics

standard deviation of a discrete random variable

σx = √(∑(xi - µx)² · pi). Interpretation: a randomly selected ____ is on average ____ away from the mean of _____
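
A minimal sketch of this formula, where each squared deviation from the mean is weighted by its probability before taking the square root. The function names and the small distribution are hypothetical.

```python
import math

def discrete_mean(values, probs):
    """Mean of a discrete random variable: sum of x_i * p_i."""
    return sum(x * p for x, p in zip(values, probs))

def discrete_sd(values, probs):
    """Square root of the probability-weighted squared deviations."""
    mu = discrete_mean(values, probs)
    variance = sum((x - mu) ** 2 * p for x, p in zip(values, probs))
    return math.sqrt(variance)

values = [0, 1, 2]               # hypothetical outcomes
probs = [0.25, 0.5, 0.25]        # their probabilities (sum to 1)
print(discrete_mean(values, probs))          # 1.0
print(round(discrete_sd(values, probs), 4))  # 0.7071
```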

The probability of a type I error is represented by which of the following symbols?

adjacent

the most extreme observations that still lie within the lower and upper limits

A confounded comparison occurs in comparing two cell means when

the two cells differ along more than one factor

Cluster​ sample

The population is divided into​ subgroups, called​ clusters, and all of the members of one or more​ (but not​ all) clusters are selected.

Decide whether the random variable x is discrete or continuous. Explain your reasoning. Let x represent the time it takes to run a mile.

​Continuous, because x is a random variable that cannot be counted.

Critical values for each confidence level: - 90% - 95% - 99%

- 1.645 - 1.96 - 2.576
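
These critical values can be reproduced from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `critical_z` is hypothetical.

```python
from statistics import NormalDist

def critical_z(confidence):
    """Two-sided critical value z_(alpha/2) for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.99):
    print(level, round(critical_z(level), 3))
# 0.9 1.645
# 0.95 1.96
# 0.99 2.576
```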

68-95-99.7 Rule

- 68% of the samples will have a p-hat that's within 1 standard deviation of p - 95% of the samples will have a p-hat that's within 2 standard deviations of p - 99.7% of the samples will have a p-hat that's within 3 standard deviations of p

The assumptions of the t-test for related samples are the same as those for the t-test for independent samples except for requiring

that each score in one sample be paired with a particular score in the other sample.

frequency table

the COUNT of individuals that falls into each category

What is the total area under the normal​ curve?

1

for any discrete random variable X, the sum of the probabilities across all values X is always equal to:

1

2

Find the class width for the frequency table below.
Class    Frequency
35-36    3
37-38    1
39-40    3
41-42    6
43-44    2
Take the lower limits of two successive classes and subtract: 37 - 35 = 2

A researcher wanted to know which type of toy 3-year-olds like to play with. The researcher placed a tricycle, blocks, a train set, and puzzles in a room and observed how many of 20 children played with each type of toy. The k for this study is

4

In within-subjects designs, the unwanted effects due to the influence of one condition on the following conditions is called a) positive practice effects. b) negative practice effects. c) carry-over effects. d) attrition effects.

Answer = C - carry-over effects.

If two researchers study the same subject yet record (enter into the data analysis software) two different responses, this is referred to as ___________________. a) intraobserver reliability b) interitem reliability c) intercoder reliability d) alternate forms reliability

Answer = C - intercoder reliability

Kurtosis refers to: a) the relative steepness or shallowness of a curve compared to the rest of the data. b) the number of steps in a histogram. c) the relative steepness or shallowness of a curve compared to the normal distribution. d) the narrowness of the range of scores.

Answer = C - the relative steepness or shallowness of a curve compared to the normal distribution.

In a study using 13 samples, and in which the population variance is unknown, the distribution that should be used to calculate confidence intervals is

a t-distribution with 12 degrees of freedom

frequency distribution

a table that shows classes or intervals of data entries with a count of the number of entries in each class

factor

a variable whose effect on the response variable is of interest in the experiment

Two tailed (non-directional) independent sample t test hypotheses (?)

a. H0: μ1=μ2 b. H1: μ1≠μ2

the effect of adding or subtracting a constant to data

adding a constant changes the measures of CENTER and LOCATION, but not the position of a value in relation to all of the other values. It does not change the spread or shape of the distribution.

positive association

an increase in the explanatory variable is associated with an increase in the response variable

What happens to family of t-curves as df increases

gets closer and closer to the shape of a normal distribution. An infinite-sized sample would give a perfectly normal curve.

What happens to the family of t-curves as df decreases

gets flatter and more spread out. critical values get pushed out and get more and more extreme.

standard deviation of the residuals

gives the approximate size of a typical or average prediction error (residual)

z-score table - why is "z" value incorrect

gives the percentage of cases falling between a given z score and the mean.

In a two-way ANOVA, the values of n and k

may be different for each factor

mean of a discrete random variable is denoted

μ (mu)

At a blood drive, 4 donors with type O+ blood, 3 donors with type A+ blood, and 2 donors with type B+ blood are in line. In how many distinguishable ways can the donors be in line?

n1 = 4, n2 = 3, n3 = 2, so n = 9. With n! = n × (n - 1) × (n - 2) × ... × 1, the number of distinguishable arrangements is n! / (n1! × n2! × n3!) = 9! / (4! × 3! × 2!) = 1260
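
The blood-drive count can be checked with a short sketch that divides n! by the factorial of each group size. The function name is hypothetical.

```python
import math

def distinguishable_arrangements(group_sizes):
    """n! / (n1! * n2! * ...) where n is the total of all group sizes."""
    total = math.factorial(sum(group_sizes))
    for size in group_sizes:
        total //= math.factorial(size)   # remove orderings within each group
    return total

print(distinguishable_arrangements([4, 3, 2]))   # 1260
```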

Is a negative correlation good, bad or indifferent

not bad; it is measuring the inverse relationship. A perfect correlation can be -1.00 meaning that as one variable increases, the other perfectly predictably decreases.

Sample Variance

quantitative variable

takes numerical values for which it makes sense to find an average. Ex: height

outcome

the result of a single trial in a probability experiment

placebo effect

when the patient's response is due to some other reason than the treatment imposed - a control group (placebo) is used to offset the effects of lurking variables. Without accounting for lurking variables, results of an experiment could show bias.

When is homogeneity of variances most important

when there is a large discrepancy between sample sizes; when the sample sizes are equal or nearly equal, the assumption is less important but still matters

mutually exclusive or disjoint

when two events have no outcomes in common. the probability that one or the other occurs is the sum of their individual probabilities. P( A or B) = P(A) + P(B)

Matching across groups (Extraneous variable)

Same gender in each group, same average IQ This can become problematic

If the sample mean is 9 the hypothesized population mean is 10 and the population standard deviation is 3 compute the test value needed for the z test.

- 0.33

How do we decide whether a certain proportion of correctly identified cards is unusual?

- First, compute the probability of getting a sample result as unusual or more unusual than the one we got if the Null Hypothesis were true

Notes on Hypothesis Testing

- Hypotheses are about parameters, not statistics Ho: p = 0.5 (not p-hat!!) - The sample proportion is not part of the hypotheses. The hypotheses are formed before we collect data. Hypotheses motivate how the data are collected and which data are collected

Central Limit Theorem for Sample Proportions

- If n is large enough, then the sampling distribution of p-hat can be approximated by a *normal model* with *mean* p and *standard deviation* sqrt(pq/n), where q = 1 - p

What is the difference between a frequency polygon and an​ ogive?

A frequency polygon displays class frequencies while an ogive displays cumulative frequencies.

One-proportion z-interval

- A confidence interval for the true value of a proportion

CLT for sample Means

- CLT says that the sampling distribution of a sample mean can be approximated by a normal distribution if n is large. - The larger the sample size, the better the normal approximation will be FOR SRS ONLY!!

If r = 0.72, what proportion of the variance in Y is accounted for by its relationship with X?

0.52

How to determine if a line is an appropriate model to use for the data

1.) Residual plot (scattered/no pattern) 2.) Small residuals 3.) Find S 4.) Find r2

Histogram

A graph of vertical bars representing the frequency distribution of a set of data.

bias

A random sample must be used to not bring ___ to the study or experiment.

Dot Plot

One more graph! We draw a dot plot by placing each observation horizontally in increasing order and placing a dot above the observation each time it is observed.

Describe the relationship between quartiles and percentiles.

Quartiles are special cases of percentiles. Q1 is the 25th​ percentile, Q2 is the 50th ​percentile, and Q3 is the 75th percentile.

variance

The ___ of a variable is the mean of the squared deviations about the population mean.

multimodal distribution

three or more peaks

coefficient of determination (r2) parameters

Same as r2 parameters for t test

85

Z₁₅ means the score is above ___% of the population

Interpret regression line

_ % of the variation in the (response variable) is accounted for by the regression line

variables

___ are the characteristics of the individuals within the population.

In Tanisha's study, each person's respiration rate was tested for baseline (no exercise), after 5 minutes of strenuous exercise, and 5 minutes after strenuous exercise had stopped. Because her study does not meet one of the assumptions for a parametric test, what nonparametric test should she run?

a Friedman

frequency histogram

a bar graph that represents the frequency distribution of a data set. 1. The horizontal scale is quantitative and measures data entries. 2. The vertical scale measures the frequencies of the classes. 3. Consecutive bars must touch.

Hypotheses for one sample t-test for one tailed test

a. H0: μ≤ or ≥(population mean) b. H1: μ< or >(population mean)

​(a) List an example of two events that are independent. ​(b) List an example of two events that are dependent.

a. Rolling a die twice b. Drawing one card from a standard​ deck, not replacing​ it, and then selecting another card

A standard deck of cards contains 52 cards. One card is selected from the deck. ​(a) Compute the probability of randomly selecting a heart or spade. ​(b) Compute the probability of randomly selecting a heart or spade or club. ​(c) Compute the probability of randomly selecting a ten or club.

a.) .5 = (13/52)+(13/52) b.) .75 c.) .308
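
The three card probabilities can be verified by counting outcomes in a full deck. This is a sketch with hypothetical names (`DECK`, `probability`); it builds the 52 cards as (rank, suit) pairs and counts the favorable ones.

```python
from fractions import Fraction
from itertools import product

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["hearts", "spades", "clubs", "diamonds"]
DECK = list(product(RANKS, SUITS))           # 52 (rank, suit) pairs

def probability(event):
    """Classical probability: favorable outcomes / total outcomes."""
    favorable = sum(1 for card in DECK if event(card))
    return Fraction(favorable, len(DECK))

p_a = probability(lambda c: c[1] in ("hearts", "spades"))
p_b = probability(lambda c: c[1] in ("hearts", "spades", "clubs"))
p_c = probability(lambda c: c[0] == "10" or c[1] == "clubs")
print(p_a, p_b, p_c, round(float(p_c), 3))   # 1/2 3/4 4/13 0.308
```

In part (c) the ten of clubs is counted only once, which is why the result is 16/52 rather than 17/52.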

Population Correlation

ρ

If we INCREASE the confidence level:

- The width of the confidence interval INCREASES

complement

1 - P(event)

What is the difference between a parameter and a statistic?

A parameter is a numerical description of a population characteristic. A statistic is a numerical description of a sample

What is / are the modes for the following data: 8 8 9 7 6 9 8 9 7 7 a) 6 b) 7.8 c) 8 d) 7, 8, and 9

Answer = D - 7, 8, and 9

True or False. A statistic is a measure that describes a population characteristic

False. A statistic is a measure that describes a sample characteristic.

Hypothesis Testing

Having a theory or hypothesis about a population proportion that we would like to test

When is r2 = 1

If all the points fall directly on the least-squares line, SSE = 0 and r2 = 1. Then all of the variation in y is accounted for by the linear relationship with x.

impossible

If an event is ___, the probability of the event is 0.

conditional probability formula

P( B | A) = P(A and B) / P(A)

15

P₁₅ means the score is above ___% of the population

Comparing the means of samples from two normally distributed pop., if the samples are independent and the pop. variances are known, the z test can be used.

True

If the equation for the regression line is y1 =-6x+7 then the intercept of this line is

7

Outside a home, there is an 8-key keypad with letters A, B, C, D, E, F, G, and H that can be used to open the garage if the correct eight-letter code is entered. Each key may be used only once. How many codes are possible?

8x7x6x5x4x3x2x1= 40320

Example of confidence interval interpretation: "We are 99% confident that this interval contains the proportion of ALL U.S. adults who do not want to see Roe v. Wade overturned." What does this mean?

- In repeated sampling, 99% of all intervals constructed in the same way, would contain the true proportion of all U.S. adults who do not want to see Roe v. Wade overturned by the supreme court.

Small P-value

- It would be unusual to do so well by guessing, and we might be convinced that the subject has some special ability - Reject Null Hypothesis

If the equation for the regression line is y1 =-8x+7 then the slope of this line is

-8

Influences on effect measures

1) sample size does not influence them, 2) sample variance does influence them

Sampling Distribution notes

1. As n gets bigger, the histogram becomes more bell-shaped and symmetric 2. As n gets bigger, there is less variability (less spread) 3. As n gets bigger, the histograms become centered about the mean

Rules for Grouped Frequency Distributions

1. Have a total of about 5 to 15 intervals. 2. Try to avoid having intervals with no scores, but if an interval does have no scores it still must be shown. 3. All intervals must be the same width. 4. Interval width is usually an integer. 5. The lower number of each interval should be evenly divisible by the interval width.

Convert probability to odds

1. Subtract the probability from 1; i.e., if probability = .25, 1.00 - .25 = .75 2. Form a ratio; i.e., .25:.75 3. Convert to smallest integers; i.e., 1:3
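
The three conversion steps can be sketched with `fractions.Fraction`, which reduces the ratio to smallest integers automatically. The function name is hypothetical, and the sketch assumes the probability is less than 1 (a probability of exactly 1 has no finite odds).

```python
from fractions import Fraction

def probability_to_odds(p):
    """Return odds as (successes, failures) in smallest integers."""
    p = Fraction(str(p))            # exact value, e.g. "0.25" -> 1/4
    failure = 1 - p                 # step 1: subtract the probability from 1
    ratio = p / failure             # step 2: form the ratio p : (1 - p)
    return ratio.numerator, ratio.denominator   # step 3: smallest integers

print(probability_to_odds(0.25))   # (1, 3)
print(probability_to_odds(0.20))   # (1, 4)
```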

Of the cartons produced by a​ company, 3​% have a​ puncture, 4​% have a smashed​ corner, and 0.5​% have both a puncture and a smashed corner. Find the probability that a randomly selected carton has a puncture or a smashed corner.

6.5%: (3/100) + (4/100) - (0.5/100) = 0.065 = 6.5%

Find the values for χ² left and χ² right when α = 0.05 and n = 17

6.908 and 28.845

completely randomized design

A completely randomized design is one in which each experimental unit is randomly assigned to a treatment.

nominal

A variable is at the ___ level of measurement if the values of the variable name, label, or categorize. In addition, the naming scheme does not allow for the values of the variable to be arranged in a ranked, or specific order.

continuous

A(n) ___ random variable has infinitely many values.

mutually exclusive

Another name for disjoint events is ___ events.

Define sampling error and nonsampling error. Give examples of nonsampling error.

Answer =

The mean of the numbers 2, 5, 6, 7, 8 is: a) 5.6 b) 6.5 c) 6.4 d) 6

Answer = a - 5.6

law of large numbers

As an experiment is repeated over and over, the empirical probability of an event approaches the theoretical (actual) probability of the event.

About 90​% of babies born with a certain ailment recover fully. A hospital is caring for seven babies born with this ailment. The random variable represents the number of babies that recover fully. Decide whether the experiment is a binomial experiment. If it​ is, identify a​ success, specify the values of​ n, p, and​ q, and list the possible values of the random variable x.

Binomial experiment. Success = baby recovers. n = 7, p = 0.90, q = 0.10, x = 0, 1, 2, ..., 7

Ratio

Determine the level of measurement of the variable (choose nominal, ordinal, interval, or ratio): weight of rice bought by a customer (a measurement).

Data

Facts and statistics collected together for reference or analysis

A correlation coefficient of 0.961 would mean that the values of x increase as the values of y decrease

False

The nonparametric procedure that corresponds to the within-subjects ANOVA is the

Friedman

Chebychev's Theorem

Gives a rule for the portion of any data set lying within k standard deviations ​(k > ​1) of the mean.​ Does not assume symmetric or bell shaped.
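
The card does not spell out the rule itself. In its standard form, the theorem says that at least 1 - 1/k² of the data lie within k standard deviations of the mean. A minimal sketch (the function name is hypothetical):

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the theorem is stated for k > 1")
    return 1 - 1 / k ** 2

for k in (2, 3):
    print(k, chebyshev_lower_bound(k))
```

For k = 2 the bound is 75%, which holds for any data set, whatever its shape.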

Why should the number of classes in a frequency distribution be between 5 and​ 20?

If the number of classes in a frequency distribution is not between 5 and​ 20, it may be difficult to detect any patterns.

bias

If the results of the sample are not representative of the population, then the sample has bias

James has conducted a study involving a total of 60 participants. These individuals were randomly assigned to one of three different treatment conditions--No Sound, White Noise, or Conversation. Each was then tested on a reading recall task. Because James knows his study does not meet one of the assumptions for a parametric test, what nonparametric test should he run?

Kruskal-Wallis H

Sample size effect on effect size

Little to no effect, unlike in hypothesis testing. Does not influence estimated Cohen's d at all and the measures of r2 are only slightly affected (barely any influence at all)

A political strategist claims that 55% of voters in Madison County support his candidate. In a poll of 300 randomly selected voters, 147 of them support the strategist's candidate. At α = 0.05, is the political strategist's claim warranted?

No, because the test value -2.09 is in the critical region
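
The test value can be reproduced with a short sketch. It assumes a two-tailed one-proportion z test (the card does not state the tails explicitly), and the function name is made up for illustration:

```python
import math
from statistics import NormalDist

def one_proportion_z(successes, n, p0):
    """Test value for H0: p = p0, using the standard error under H0."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / se

z = one_proportion_z(147, 300, 0.55)

# Assumption: two-tailed test at alpha = 0.05, so alpha/2 in each tail.
critical = NormalDist().inv_cdf(1 - 0.05 / 2)
reject = abs(z) > critical
print(round(z, 2), round(critical, 2), reject)
```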

multiplication rule

P(A and B) = P(A) × P(B | A)

If two events are mutually exclusive, why is P(A and B) = 0?

P(A and B)=0 because A and B cannot occur at the same time.

addition rule for two events

P(A or B) = P(A) + P(B) - P(A and B)
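
As a small illustration of the addition rule, the sketch below applies it to the carton question from earlier in this set (3% punctured, 4% smashed corner, 0.5% both). The function name is hypothetical.

```python
def p_a_or_b(p_a, p_b, p_a_and_b):
    """Addition rule: subtract the overlap so it is not counted twice."""
    return p_a + p_b - p_a_and_b

p_damaged = p_a_or_b(0.03, 0.04, 0.005)
print(round(p_damaged, 3))
```

If the events were mutually exclusive, the third argument would be 0 and the rule would reduce to P(A) + P(B).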

The regions of a country with the six highest per capita incomes last year are shown below. 1. Northeast 2. Eastern 3. Southeast 4. Western 5. Southwest 6. Northwest Determine whether the data are qualitative or quantitative and identify the data​ set's level of measurement.

Qualitative. Ordinal.

What is another way to see how well a least squares line fits our data

R² (the coefficient of determination) tells us how well the least-squares line predicts the values of the response variable

Benefits of residuals

Residuals show how far data fall from the regression line and thus help us assess how well the line fits/describes the data. Residuals can be calculated from any model fitted to data. However, residuals from the least-squares line have a special property: the mean of the least-squares residuals is always zero.
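
The zero-mean property can be checked numerically. The sketch below fits a least-squares line by hand to a small made-up data set (the x and y values are hypothetical, chosen only for illustration) and averages the residuals.

```python
x = [1, 2, 3, 4, 5]              # hypothetical explanatory values
y = [2.1, 3.9, 6.2, 7.8, 10.1]   # hypothetical response values

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Least-squares slope b and intercept a for y-hat = a + b*x
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)
b = s_xy / s_xx
a = y_bar - b * x_bar

# Residual = observed y minus predicted y-hat
residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
mean_residual = sum(residuals) / n
print(round(b, 3), round(a, 3), round(abs(mean_residual), 10))
```

Any residual mean that differs from zero here is floating-point noise, not a property of the fit.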

side-by-side bar graph

Suppose we want to know whether more people are finishing college today than in 1990. We could draw a side-by-side bar graph to compare the data for the two different years. Data sets should be compared by using relative frequencies, because different sample or population sizes make comparisons using frequencies difficult or misleading.

The probability that a person in the United States has type B​+ blood is 7​%. Four unrelated people in the United States are selected at random. Complete parts​ (a) through​ (d).

(a) The probability that all four have type B+ blood is 0.000024. (b) The probability that none of the four have type B+ blood is 0.748. (c) The probability that at least one of the four has type B+ blood is 0.252. (d) The event in part (a) is unusual because its probability is less than or equal to 0.05.
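
These answers follow from treating the four people as independent, so probabilities multiply. A minimal sketch of the arithmetic (variable names are illustrative):

```python
p = 0.07   # probability one person has type B+ blood
n = 4      # number of unrelated people selected

p_all = p ** n                   # all four have B+
p_none = (1 - p) ** n            # none of the four have B+
p_at_least_one = 1 - p_none      # complement of "none"

print(round(p_all, 6), round(p_none, 3), round(p_at_least_one, 3))
```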

In a sample of 1200 U.S.​ adults, 208 think that most celebrities are good role models. Two U.S. adults are selected at random from the population of all U.S. adults without replacement. Assuming the sample is representative of all U.S.​ adults, complete parts​ (a) through​ (c).

(a) The probability that both adults think most celebrities are good role models is 0.030. (b) The probability that neither adult thinks most celebrities are good role models is 0.683. (c) The probability that at least one of the two adults thinks most celebrities are good role models is 1 - 0.683 = 0.317.

Determine if the survey question is biased. If the question is​ biased, suggest a better wording. Why is eating ice cream bad for​ you?

The question is biased. The wording​ "How do you think eating ice cream affects your​ health?" would be better.

Identify the sample space of the probability experiment and determine the number of outcomes in the sample space. Playing the game of roulette, where the wheel consists of slots numbered 00, 0, 1, 2, ..., 43. To play the game, a metal ball is spun around the wheel and is allowed to fall into one of the numbered slots. Identify the sample space.

The sample space is​ {00, 0,​ 1, 2,​ ..., 43​}. 45 outcomes. Starts at 00.

The 50th percentile is equivalent to Q1

The statement is false. The 50th percentile is equivalent to Q2.

Draw two normal curves that have the same mean but different standard deviations. Describe the similarities and differences.

The two curves will have the same line of symmetry. The curve with the larger standard deviation will be more spread out than the curve with the smaller standard deviation.

Why is the standard deviation used more frequently than the​ variance?

The units of variance are squared. Its units are meaningless.

Example of impact of stating that there is causation when there is none

There is a positive correlation between the number of years working a night shift and the rate of breast cancer. A news reporter claimed that the fluorescent lighting in the hospital was causing the breast cancer. This caused people to panic, and it is not true. There are many third variables other than lights that could be related to breast cancer, such as sleeping and eating habits. We do not know if working nights is causing breast cancer or if there are other factors involved.

Daniel Wiseman for Gres Trans Corp. wants to determine if the flow rate of particular material changes with different changes in temperature. The data plotted in (picture) what type of relationship exists between the flow rate and the change in temperature.

There is no relationship

Which of the following is one of the assumptions for hypothesis testing of the Pearson correlation coefficient?

There is random sampling of X-Y pairs.

A pharmaceutical company wants to test the effectiveness of a new allergy drug. The company identifies 250 females​ 30-35 years old who suffer from severe allergies. The subjects are randomly assigned into two groups. One group is given the new allergy drug and the other is given a placebo that looks exactly like the new allergy drug. After six​ months, the​ subjects' symptoms are studied and compared. (b) Identify a potential problem with the experiment design being used and suggest a way to improve it.

There may be a bias on the part of the researcher if the researcher knows which patients were given the real drug.

Outlier

There's a lot of white space around the data point - Outlier in response variable (y) - Outlier in explanatory variable - Outlier in both - Outlier b/c doesn't follow the overall pattern/trend

Determine whether the following events are mutually exclusive. Explain your reasoning. Event​ A: Randomly select a voter who legally voted for the President in California. Event​ B: Randomly select a voter who legally voted for the President in Iowa.

These events are mutually​ exclusive, since it is not possible for a voter to both have legally voted for the President in California and have legally voted for the President in Iowa.

Determine whether the following events are mutually exclusive. Explain your reasoning. Event A: Randomly select a female economics major. Event B: Randomly select an economics major who is 20 years old.

These events are not mutually​ exclusive, since it is possible to select a female economics major who is 20 years old.

How to use t-distribution table

To use it, you locate the number of degrees of freedom on the left column, and then you move over to the correct column, based on whether you are doing a 1 or 2 tailed test, and whether it is .05 or .01 critical value--do not use n in the table AT ALL, use the df instead

A regression line was calculated as y′ = 9.7 - 3.2x. The slope of this line is -3.2

True

A type I error occurs if one rejects the null hypothesis when it is

True

Every thirtieth person entering a library is asked to choose his or her favorite author from a list of five different authors that includes a description of each.

Type of sampling: Systematic sampling is​ used, because every thirtieth person is selected Bias: The wording of the question may direct respondents towards a particular author. If there is a regular pattern to the people entering the library​, the sample may not be representative

Shapes of Frequency Distributions

Unimodal Bimodal Rectangular Multimodal

What do you do if you are concerned that a one sample t-distribution is not normally distributed

Use a larger sample (n=30 or more). Concepts of central limit theorem still hold

Hypotheses of pearson r

Use the Greek letter ⍴ instead of μ because you are not comparing means, you are comparing correlations; You are always testing ⍴ against zero

How can we predict y if we don't know x

Use the mean of the response variable

Why do you use Cohen's d

Use this because if you have a huge sample and are using the t (or z) formula then a minuscule difference could have a huge effect. This eliminates this possibility.

Example of Confidence Interval Interpretation for sample means

We are 95 percent confident that the true mean net weight, μ, of all M&M's packages is between 48.438 and 49.66 g

Finding t-values

We can find the t-critical values (t* with n - 1 degrees of freedom) using table T in the back of the book (cannot get from a TI-83 calculator). Find the desired confidence level in the last row of the table, then find the row corresponding to the df, which is n - 1

D

What is the difference between a bar chart and a histogram? A) The bars in a bar chart are all the same width while the bars of a histogram may be of various widths. B) There is no difference between these two graphical displays. C) The bars in a bar chart may be of various widths while the bars of a histogram are all the same width. D) The bars on a bar chart do not touch while the bars of a histogram do touch.

(x-µ)/σ

What is the formula for calculating the z-score?

µ

What is the symbol for the mean of the population?

σ

What is the symbol for the population standard deviation?

s sub x

What is the symbol for the sample standard deviation?

x bar

What is the symbol for the mean of the sample?

0≤P(x)≤1

What must be true about each of the probabilities in a probability distribution table?

1

What must the sum of the probabilities in a probability distribution table equal?

mean<median<mode

When a distribution is skewed left: ___<___<___

mean>median>mode

When a distribution is skewed right: ___>___>___

contingency or two-way table

a frequency distribution for bivariate data

CLT for Sample Means

When a random sample is drawn from any population with mean μ and standard deviation σ, the sampling distribution of the sample mean ȳ is approximately normal with mean μ and standard deviation σ/√n, if the sample size is large enough. That is, the sampling distribution of the sample mean is approximately N(μ, σ/√n) for large n
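
A simulation is one way to see this. The sketch below draws repeated samples from a skewed population and compares the mean and spread of the sample means with μ and σ/√n. The choice of an exponential population, the sample size, the number of repetitions, and the seed are all assumptions made for illustration.

```python
import random
import statistics

random.seed(0)                      # fixed seed so the run is repeatable

n = 40                              # size of each sample
reps = 2000                         # number of samples drawn
mu, sigma = 1.0, 1.0                # exponential(1) has mean 1 and sd 1

# One sample mean per repetition, from a clearly non-normal population
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

mean_of_means = statistics.fmean(sample_means)
sd_of_means = statistics.stdev(sample_means)
print(mean_of_means, sd_of_means, sigma / n ** 0.5)
```

The first value should land near μ and the second near σ/√n, even though the population itself is strongly skewed.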

Let N be the number of data entries in a population and n be the number of data entries in a sample data set. Choose the correct answer below.

When calculating the population standard deviation, the sum of the squared deviations is divided by N, then the square root of the result is taken. When calculating the sample standard deviation, the sum of the squared deviations is divided by n - 1, then the square root of the result is taken.
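
The difference between the two divisors can be sketched directly. The data below reuse the numbers 2, 5, 6, 7, 8 from the mean question earlier in this set. The function names are illustrative, and the standard library's `statistics.pstdev` and `statistics.stdev` are used only as a cross-check.

```python
import math
import statistics

def population_sd(data):
    """Divide the sum of squared deviations by N."""
    mu = sum(data) / len(data)
    ss = sum((x - mu) ** 2 for x in data)
    return math.sqrt(ss / len(data))

def sample_sd(data):
    """Divide the sum of squared deviations by n - 1."""
    x_bar = sum(data) / len(data)
    ss = sum((x - x_bar) ** 2 for x in data)
    return math.sqrt(ss / (len(data) - 1))

data = [2, 5, 6, 7, 8]
print(population_sd(data), statistics.pstdev(data))
print(sample_sd(data), statistics.stdev(data))
```

The sample version is always the larger of the two for the same data, because it divides by a smaller number.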

Rounding

When doing calculations, don't round until the end of the problem. Use as many decimal places as your calculator stores to get accurate values of the slope and y intercept.

How does violating homogeneity of variances negate any interpretation of data?

When you compute the t-statistic, the only unknown value is the population mean difference. When you violate homogeneity of variance, you have two unknown values: the population mean difference and the average of the two variances. Therefore, you do not know which one is responsible for an extreme t-statistic. You cannot reject the hypothesis because it might be that the pooled variance, and not the population mean difference, produced the extreme t-statistic. Without satisfying homogeneity of variance, you cannot accurately interpret a test statistic, and the hypothesis test is meaningless.

Predicted value

Y hat is the predicted value of the response variable y for a given value of the explanatory variable x *Approximation

The average greyhound can reach a top speed of 18.9 meters per second. A particular greyhound breeder claims her dogs are faster than the average greyhound. A sample of 40 of her dogs ran, on average, 19.5 meters per second with a population standard deviation of 1.5 meters per second. With α = 0.05, is her claim correct?

Yes, because the test value 2.53 falls in the critical region
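
The test value comes from the z test for a mean with known population standard deviation. Because the claim is "faster than," the sketch below assumes a right-tailed test. Variable names are illustrative.

```python
import math
from statistics import NormalDist

x_bar = 19.5    # sample mean (m/s)
mu_0 = 18.9     # hypothesized population mean (m/s)
sigma = 1.5     # population standard deviation (m/s)
n = 40          # sample size

z = (x_bar - mu_0) / (sigma / math.sqrt(n))

# Right-tailed test at alpha = 0.05: all of alpha sits in the upper tail.
critical = NormalDist().inv_cdf(1 - 0.05)
reject = z > critical
print(round(z, 2), round(critical, 3), reject)
```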

Variable

a characteristic or attribute that can assume different values

event

a collection of outcomes for the experiment, that is, any subset of the sample space

measure of variation/measure of spread

a descriptive measure that indicates the amount of variation, or spread, in a data set

haphazard selection

a method of selecting sample items in an unstructured manner but without any intentional bias

qualitative variable

a nonnumerically valued characteristic that varies from one person or thing to another

central tendency

a number that describes something about the "average" score of a distribution A single number that characterizes a group of scores

quantitative variable

a numerically valued characteristic that varies from one person or thing to another

random variable

a quantitative variable whose value depends on chance

frequency distribution

an arrangement of data that indicates how often a particular score or observation occurs

The sum of the observed frequencies for all the rows of a χ² test for independence must equal

All of the above: the total N for the experiment, the sum of the expected frequencies for all the rows, and the sum of the observed frequencies for all the columns.

deviation of an entry

The deviation of an entry x in a population data set is the difference between the entry and the mean μ of the data set: deviation of x = x - μ

idea of probability

chance behavior is unpredictable in the short run but has a regular and predictable pattern in the long run

convenience sampling

choosing subjects that are easiest to reach - friends, neighborhood

What is necessary to make a causal statement

conduct a true experiment in which one variable is manipulated by a researcher and the other variables are rigorously controlled

x represents the number of dependent children in a household. Is the random variable x discrete or​ continuous?

discrete

class width

distance between lower(or upper) limits of consecutive classes

Grouped frequency table

frequency table in which the number of individuals (frequency) is given for intervals of values

Blocking

Grouping together similar (homogeneous) experimental units and then randomly assigning the experimental units within each group to a treatment is called blocking. Each group of homogeneous individuals is called a block.

The freshmen competed with the sophomores to see who could raise the most money by recycling. The appropriate design for testing the significance of the difference between the means is

independent samples t-test

At a card players' club, the poker players had a contest with the blackjack players to see who could win the most money. The appropriate design for testing the significance of the difference between the means is

independent samples t-test.

Strong positive correlation between two data points indicates what about reliability

indicates good level of reliability: people who scored high on the first measurement also scored high on the second

negative z-score

indicates that the observation is below the mean

what type of representative value is the balance point of distribution

mean

conditions for normal approximation

np≥10 and n(1-p)≥10

individuals

objects described by a set of data

influential outliers

observation is influential if removing it would markedly change the position of the regression line. no rule-just observe where it lies in relation to pattern.

outliers in residual plots

observation that lies outside the overall pattern. observation can be in both x and y directions

observational study

observes individuals and measures variables of interest without attempting to influence the responses

marginal distribution

the distribution of one of the categorical variables in a two-way table of counts.

Ordinal

ranking (first place, second place, etc.) of contestants in a singing competition (like the medals)

frequency polygon (line graph)

Raw scores on the x-axis, frequency on the y-axis. Start the x-axis one interval below the lowest score and end it one interval above the highest.

random variable x

represents a value associated with each outcome of a probability experiment.

standard score (z-score)

represents the number of standard deviations a value x lies from the mean, μ. To find the z score for a value use the formula z = (value - mean) / standard deviation = (x - μ) / σ
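
The formula translates directly into code. In this minimal sketch the function name and the example values (a score of 85 with mean 70 and standard deviation 10) are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd

print(z_score(85, 70, 10))   # above the mean, so the z-score is positive
print(z_score(55, 70, 10))   # below the mean, so the z-score is negative
```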

sample space

the collection of all possible outcomes for an experiment

Two samples are said to be independent when

we randomly select participants for a sample, without regard to who has been selected for either sample

randomized block design

within each of the blocks, experimental units are randomly assigned to treatments

What are the degrees of freedom for an independent samples t-test that uses two samples with n = 12 in each sample?

22

If the sample mean is 7, the hypothesized population mean is 6 and the population standard deviation is 2, compute the test value needed for the z test

0.50

response variable

A variable that measures an outcome of a study.

event

A(n) ___ is any collection of outcomes from a probability experiment.

sample space

A(n) ___ of a probability experiment is the collection of all outcomes possible.

What is the best way to examine the idea of a sampling distribution of a statistic?

- Perform a simulation

Computing p-value for a t-test

- tcdf (lower, upper, degrees of freedom)

what can you do with a LSRL

-calculate the means, x̄ and ȳ, and the standard deviations Sx and Sy, and their correlation r

probability sample

-selected by a procedure that uses a random device to decide which members of the population will constitute the sample -eliminates unintentional selection bias -guarantees that inferential statistics can be applied

lower and upper class limits

-smallest value within the class -largest value within the class

What using a restricted range for correlation could do

1) reverse a correlation 2) hide a correlation that actually exists 3) show a correlation that really doesn't exist if you go beyond that restricted range

Hypothesis testing for independent sample t-test

1) state the hypotheses, 2) determine critical region, 3) run independent sample t test, 4) make a decision

Why is correlation useful

1) tells us how two variables relate to each other 2) measures the relationship between the z-scores

When would you use a correlation

1) theory verification, 2) prediction, 3) to help determine the reliability of a test, 4) measure validity

Poisson Distribution

1. The experiment consists of counting the number of times x an event occurs in a given interval. The interval can be an interval of time, area, or volume. 2. The probability of the event occurring is the same for each interval. 3. The number of occurrences in one interval is independent of the number of occurrences in other intervals. The probability of exactly x occurrences in an interval is P(x) = µ^x * e^(-µ) / x!
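
The probability formula can be sketched as a small function. The name `poisson_pmf` and the example mean of 2 occurrences per interval are assumptions made for illustration.

```python
import math

def poisson_pmf(x, mu):
    """P(x) = mu^x * e^(-mu) / x! for x = 0, 1, 2, ..."""
    if x < 0 or mu <= 0:
        raise ValueError("need x >= 0 and mu > 0")
    return mu ** x * math.exp(-mu) / math.factorial(x)

mu = 2   # hypothetical mean number of occurrences per interval
for x in range(5):
    print(x, round(poisson_pmf(x, mu), 4))

# The probabilities over all possible counts add up to 1.
total = sum(poisson_pmf(x, mu) for x in range(60))
```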

Find tα/2 when n = 12 for the 90% confidence interval for the mean

1.80

A sample of 35 different payroll departments found that employees worked an average of 240.6 days a year. If the population standard deviation is 18.8 days, find the 90% confidence interval for the average number of days μ worked by all employees who are paid through payroll departments

235.4<μ<245.8
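
This interval uses the z critical value because the population standard deviation is known. A minimal sketch of the calculation (variable names are illustrative):

```python
import math
from statistics import NormalDist

x_bar = 240.6        # sample mean (days)
sigma = 18.8         # population standard deviation (days)
n = 35               # sample size
confidence = 0.90

# Critical value that leaves (1 - confidence)/2 in each tail
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z_star * sigma / math.sqrt(n)

lower, upper = x_bar - margin, x_bar + margin
print(round(lower, 1), round(upper, 1))
```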

The probability that an event will happen is P(E) = 28/31. Find the probability that the event will not happen.

3/31

A random sample of 100 voters found that 44% were going to vote for a certain candidate. Find the 99% limit for the population proportion of voters who will vote for the candidate.

31.2%<p<56.8%
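
The limits follow from the usual large-sample interval for a proportion, p-hat plus or minus z times the square root of p-hat(1 - p-hat)/n. A minimal sketch (variable names are illustrative):

```python
import math
from statistics import NormalDist

p_hat = 0.44         # sample proportion
n = 100              # sample size
confidence = 0.99

# Critical value that leaves (1 - confidence)/2 in each tail
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)

lower, upper = p_hat - margin, p_hat + margin
print(round(lower * 100, 1), round(upper * 100, 1))
```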

How many participants would be required for a completely randomized 4 × 5 between-subjects design with three observations per cell?

60

Determine which numbers could NOT be used to represent the probability of an event.

64/25 because probabilities cannot be greater than 1. -1.5 because probability cannot be less than 0.

68-95-99.7

68% of the observations fall within 1 standard deviation of the mean; 95% of the observations fall within 2 standard deviations of the mean; 99.7% of the observations fall within 3 standard deviations of the mean
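
These percentages are rounded areas under a normal curve. A short sketch using the standard library's `NormalDist` shows where they come from (the dictionary name `within` is illustrative):

```python
from statistics import NormalDist

std_normal = NormalDist()   # mean 0, standard deviation 1

# Area between -k and +k standard deviations, for k = 1, 2, 3
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, area in within.items():
    print(k, round(area * 100, 1))
```

The exact areas are slightly different from the memorized 68, 95, and 99.7, which are convenient roundings.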

Pareto Chart

A Pareto chart is a bar graph whose bars are drawn in decreasing order of frequency or relative frequency

One-proportion z-interval

A confidence interval for the true value of a proportion. The confidence interval is p̂ ± z* × √(p̂(1 - p̂)/n), where z* is a critical value from the Standard Normal model corresponding to the specified confidence level

confounding variable

A confounding variable is an explanatory variable that was considered in a study whose effect cannot be distinguished from a second explanatory variable in the study.

positively skewed distribution

A distribution where the scores pile up at the low (left) end and the tail extends to the right

matched pairs design

A matched-pairs design is an experimental design in which the experimental units are paired up. The pairs are selected so that they are related in some way (that is, the same person before and after a treatment, twins, husband and wife, same geographical location, and so on). There are only two levels of treatment in a matched-pairs design.

standard normal distribution

A normal distribution with a mean of 0 and a standard deviation of 1.

observed data

A normal probability plot is a graph that plots the ___ on the x-axis and the normal z-scores on the y-axis.

Value

A possible outcome that a score can have

randomized block design

A randomized block design is used when the experimental units are divided into homogeneous groups called blocks. Within each block, the experimental units are randomly assigned to treatments

Examining residual plots

A residual plot turns the regression line horizontal. It magnifies deviations of the points from the line, making it easier to see unusual observations and patterns. If the regression line captures the overall pattern of the data, there should be no pattern in the residuals.

How is a sample related to a population?

A sample is a subset of a population.

simple random

A subset of a statistical population in which each member of the subset has an equal probability of being chosen. It is meant to be an unbiased representation of the group.

Outlier

A value that "lies outside" (is much smaller or larger than) most of the other values in a set of data.

constant

A value that does not change

ordinal

A variable is at the ___ level of measurement if the values of the variable can be arranged in a ranked, or specific, order.

interval

A variable is at the ___ level of measurement if the differences in the values of the variable have meaning.

probability model

A(n) ___ lists the possible outcomes of a probability experiment and each outcome's probability.

Influential observation

An observation is influential for a statistical calculation if removing it would markedly change the result of the calculation; points that are outliers in the x direction of a scatterplot are often influential for the least-squares regression line

the Law of Large Numbers

As the number of repetitions of a probability experiment increases, the proportion with which a certain outcome is observed gets closer to the probability of the outcome.

Is it simple? Tossing heads and rolling an even number

B = {H2, H4, H6} Event B has more than one outcome, so it is not simple.

binomial distribution conditions

BINS - Binary: only two outcomes, success or failure - Independent: the result of one trial does not have any effect on the result of any other trial - Number: there is a fixed number of observations "n" - Success: the probability of success is the same for each observation

How to find the critical region for one-sample t test

Because you are estimating the population mean you cannot use the values that are found in the unit normal table because you cannot say that you have a perfectly normal curve. Instead you use this new table

set number of observations

Binomial Probability Distribution - What does the N represent in the following? 2 N I S

Discuss the similarities and the differences between the Empirical Rule and​ Chebychev's Theorem

Both estimate proportions of the data contained within k standard deviations of the mean. The Empirical Rule assumes the distribution is approximately symmetric and​ bell-shaped and​ Chebychev's Theorem makes no assumptions.

After a hurricane​, a disaster area is divided into 200 equal grids. Forty of the grids are​ selected, and every occupied household in the grid is interviewed to help focus relief efforts on what residents require the most. What is the potential bias?

Certain grids may have been much more severely damaged than others. The grids that are selected may not be representative in terms of damage. Severely damaged grids may have fewer occupied households.

Suppose a survey of 505 women in the United States found that more than 55% are the primary investor in their household. Which part of the survey represents the descriptive branch of statistics? Make an inference based on the results of the survey.

Choose the best statement of the descriptive statistic in the problem. 55% of women in the sample are the primary investor in their household. Choose the best inference from the given information. There is an association between U.S. women and being the primary investor in their household.

What is the difference between class limits and class​ boundaries?

Class limits are the least and greatest numbers that can belong to the class. Class boundaries are the numbers that separate classes without forming gaps between them. For integer​ data, the corresponding class limits and class boundaries differ by 0.5.

Counterbalancing

Controlling order effects in a repeated measures experiment (receive treatments in differing orders than others)

Questioning students as they leave an athletic facility​, a researcher asks 363 students about their dating habits. What type of sampling is this?

Convenience sampling is used, because students are chosen due to convenience of location.

Limitations of regression and correlation: 2

Correlation and regression lines describe only linear relationships. You can calculate correlation and the least-squares line for any relationship b/w 2 quantitative variables, but the results are only useful if the scatterplot shows a linear pattern (always plot your data!)

What are the two main branches of statistics?

Descriptive and inferential

Interval

Determine the level of measurement of the variable (choose nominal, ordinal, interval, or ratio): The day of the month (the 0th day does not mean the absence of a day, and the 9th day is not twice as much as the 4th day)

Interval

Determine the level of measurement of the variable (choose nominal, ordinal, interval, or ratio): The year of manufacture of a car (there is no meaning in doubling the year of manufacture; if this had been the age of the car in years, then doubling would have meaning and it would be a ratio measure.)

Positively skewed

Distribution in which the tail of scores extends to the right-hand side of a data set

Leptokurtic distribution

Distribution with positive excess kurtosis that has more returns clustered around the mean and fatter tails. Heavy tailed

68

Empirical Rule: The distribution is roughly bell shaped. Approx ___% of the data lie within 1 standard deviation

A type II error occurs if one does not reject the null hypothesis when it is

False

The t-distribution must be used when the sample size is greater than 30 and the variable is normally or approximately normally distributed.

False

Distance and standard deviations

For an increase of one standard deviation (sx) in the value of the explanatory variable x, the least-squares regression line predicts an increase of r standard deviations (r × sy) in the response variable y

Limitations of regression and correlation: 1

For regression, the distinction b/w explanatory and response variables is important. Least-squares regression makes distance of data points from line small only in y direction. If we reverse role of two variables we get a different line. This is not true for correlation.

What is a disadvantage of using a​ stem-and-leaf plot instead of a​ histogram?

Histograms easily organize data of all sizes where​ stem-and-leaf plots do not.

Strength (heteroscedasticity)

How scattered the data are (based on the oval); how close the points in a scatterplot lie to a simple form such as a line - Thin hot dog shape = strong - Football shape = moderate - Basketball shape = weak - Fan out = differs for different values of explanatory variable *Correlation coefficient

how to make a histogram in statcrunch

How to make a histogram in StatCrunch: Open a new spreadsheet Enter data in a single column Graph - histogram. Select Column (var 1), enter start value and width, title on x-axis, compute. Save using the right click on the graph.

Which of the following represent dependent samples? i. Life spans of pairs of siblings ii. Life spans of randomly selected pairs of people iii. Life spans of pairs of mothers and daughters

I and III

experiment

If a researcher assigns the individuals in a study to a certain group, intentionally changes the value of the explanatory variable, and then records the value of the response variable for each group, the researcher is conducting a designed ___.

What is different about the SS in independent sample t test versus Pearson r

In independent sample t test you are adding them together and in Pearson r you are multiplying them together

Which of the following is not true of the analysis of variance?

It has a higher rate of Type I error than the two-sample t-tests.

Specific values of variables

It is easiest to identify explanatory and response variables when we actually specify values of one variable to see how it affects another variable.

Why is a sample used more often than a population?

It is usually impossible to count the entire population

converting between probability & odds

Let E be an event. • If P(E) = a/b, then the odds in favor of E are a to (b − a). • If the odds in favor of E are a to b, then P(E) = a/(a + b).
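
The two conversion rules can be sketched in Python with exact fractions. The function names below are illustrative assumptions, not from the original card.

```python
from fractions import Fraction

def odds_in_favor(probability):
    """Given P(E) = a/b, return the odds in favor of E as the pair (a, b - a).

    Fraction reduces a/b to lowest terms, so the odds come back reduced too.
    """
    p = Fraction(probability)
    return p.numerator, p.denominator - p.numerator

def probability_from_odds(a, b):
    """Given odds in favor of E of a to b, return P(E) = a / (a + b)."""
    return Fraction(a, a + b)
```

For example, a probability of 1/4 corresponds to odds of 1 to 3, and odds of 1 to 3 convert back to a probability of 1/4.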

The term condition matches which of the following ANOVA terms?

Level

Zero correlation

Means that there is no consistency at all, data points are scattered randomly with no clear trend. If you put a circle around all the data points on the scatter plot it will look like a circle.

Perfect correlation and prediction

Means that you can make predictions. Based on one data point you could make a perfect prediction about another dot on the scatter plot. If you were the top person in the first variable, then you would be the top person in the second variable (or the bottom if you are doing a negative correlation)

how to find correlation coefficient r, and explain what the value means

State what the value of r means in context, using words such as: the association is positive (or negative), strong (or weak), and linear.

Properties of Sx

Measures spread. Only use when the mean is used as the measure of center. S=0 when all measurements are the same. Strongly affected by outliers.

Coefficient of determination (r2)

Measures the strength of the effect of a Pearson correlation. It is the proportion of the variability in the Y scores that can be predicted from Y's relationship with X. A correlation of r = .80 or -.80 gives r2 = .64, meaning that 64% of the variability in the Y scores can be predicted from the relationship with X. Be careful about the wording: correlation is not causation, so you cannot say "caused"; say "predicted" instead.

would the mean or median be a more appropriate summary of the center of a distribution?

Median, because it is resistant to outliers, whereas the mean is not.

Are larger variances desired?

No because they increase estimated standard error of M and therefore decrease likelihood of rejecting the null

The class levels of 31 students in a physics course are shown below. Find the​ mean, median, and mode of the​ data, if possible. If any measure cannot be​ found, explain why. ​Freshman: 6 ​Junior: 10 ​Sophomore: 12 ​Senior: 3

No median or mean. Data is nominal. Mode is sophomore. Typical data entry.

What levels of measurement can be qualitative?

Nominal, Ordinal

Which of the following statements about nonparametric procedures is false?

Nonparametric procedures do not assume that samples come from population distributions.

Shapes of Frequency Distributions

Normal, Leptokurtic, Platykurtic

Causation

Often we want to know whether changes in the explanatory variable cause a change in the response variable. Remember, correlation does NOT imply causation.

An experimenter studies the effect of type of music during practice on recall of word lists using rock background music for 20 subjects, classical music for 22 others, and country/western for another 25. What kind of experimental design is this?

One-way, between-subjects

independent or dependent? Selecting a king from a standard deck of 52 cards, not replacing it and then selecting a queen.

P(B) = 4/52 and P(B|A) = 4/51. The occurrence of A changes the probability of B, so the events are dependent.

How does t-statistic permit hypothesis testing?

Permits hypothesis testing in situations for which you do not have a known population mean to serve as the standard. All you need is the hypothesis and a sample from an unknown population. Can be used in situations where the null hypothesis is obtained from a theory, a logical prediction, or wishful thinking.

The heights (in inches) of a sample of a species of tree two years after being planted are shown below. 25.6 22.6 25.5 23.3 22.4 21.6 25.7 25.5 24.3 Determine the level of measurement of the data set. Explain your reasoning.

Ratio. The data can be ordered and differences between data entries are​ meaningful, and a zero entry is an inherent zero.

what acronym do you use to describe a distribution

SOCS. Shape: symmetric, skewed, bell-shaped, more than one peak. Outliers: any unusual observations. Center: mean, median, mode. Spread: range (large range or clustered), shows variation.

Interpreting Cohen's d for independent sample t test

Same parameters; .2 is small effect, .5 is medium effect, .8 is large effect

Outline for APA formatting one sample t test

Sample group + dependent variable(M=, SD=), WAS OR WAS NOT significantly different/higher/lower than the population's average of (state the population average), t(# of degrees of freedom)=, p (> if not significantly different or ≤ if significantly different)= .05 or .01, ONE or TWO tailed, Cohen's d= (Instead of Cohen's d can report r2)

In​ 1965, researchers used random digit dialing to call 1200 people and ask what obstacles kept them from voting.

Simple random sampling was​ used, since each number had an equal chance of being​ dialed, so all samples of 1200 phone numbers had an equal chance of being selected.

List some ways that a graph can be Misleading:

Statistics: The only science that enables different experts using the same figures to draw different conclusions.—Evan Esar Statistics often gets a bad rap for having the ability to manipulate data to support any position. One method of distorting the truth is through graphics. We mentioned in Section 2.1 how visual displays send more powerful messages than raw data or even tables of data. Since graphics are so powerful, care must be taken in constructing graphics and in interpreting their messages. Graphics may mislead or deceive. We will call graphs misleading if they unintentionally create an incorrect impression. We consider graphs deceptive if they purposely create an incorrect impression. In either case, a reader's incorrect impression can have serious consequences. Therefore, it is important to be able to recognize misleading and deceptive graphs. The most common graphical misrepresentations of data involve the scale of the graph, an inconsistent scale, or a misplaced origin. Increments between tick marks should be constant, and scales for comparative graphs should be the same. Also, because readers usually assume that the baseline, or zero point, is at the bottom of the graph, a graph that begins at a higher or lower value can be misleading.

Regression line

Suppose that y is a response variable (plotted on the vertical axis) and x is an explanatory variable (plotted on the horizontal axis) A regression line relating y to x has an equation of the form: y (hat) = a + bx

How to generate a distribution between sample means

Take all the possible pairs of samples and subtract M1-M2. Usually the means will be close together if they are coming from the same population. If the treatment did not do anything, the samples are really coming from the same population; if the treatment did have an effect, then the population means are different and you would not expect an average difference of zero.

The ratio SSE/SST

Tells us what proportion of the total variation in y still remains after using the regression line to predict values of the response variable. (Interpret: ___ of the variation in [response variable] is unaccounted for by the linear model relating y to x.)

Another meaning of correlation

The average of the products of the standardized scores
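
A minimal Python sketch of this definition follows. It assumes sample standard deviations and a divisor of n - 1 (a text that standardizes with the population standard deviation divides by n instead, and gets the same r). The function name is an illustrative assumption.

```python
import statistics

def correlation_from_z_scores(xs, ys):
    """Pearson r as the (n - 1)-averaged product of standardized scores."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)   # sample SDs
    z_products = [
        ((x - mean_x) / s_x) * ((y - mean_y) / s_y)
        for x, y in zip(xs, ys)
    ]
    return sum(z_products) / (n - 1)
```

Points that lie exactly on an upward-sloping line, such as x = 1, 2, 3 and y = 2, 4, 6, give r = 1 up to floating-point rounding.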

Interpret s in context

The average residual/prediction error for predicting the response variable is __ using the least squares line

How is a Pareto chart different from a standard vertical bar​ graph?

The bars are positioned in order of decreasing height with the tallest bar on the left

One vs. two tailed Pearson r SPSS

The computer does run one-tailed tests for you and lists whether it has run a one-tailed or a two-tailed test in parentheses next to the sig value. Check this, because sometimes you will be given a one-tailed correlation matrix and need a two-tailed test (multiply the sig value by 2 to find the correct value), or a two-tailed matrix and need a one-tailed test (divide by 2 to find the correct sig value). Never divide the correlation in half: the correlation stays the same whether the test is one-tailed or two-tailed; only the sig value changes.

Correlation (coefficient)

The correlation r measures the direction and strength of the linear relationship between two quantitative variables. - The correlation r is always a number b/w -1 and 1. - Correlation indicates the direction of a linear relationship by its sign: r > 0 for a positive association and r <0 for a negative association - Values of r near 0 indicate a very weak linear relationship. The strength of the linear relationship increases as r moves away from 0 toward -1 or 1. - The extreme values r = -1 and r = 1 occur only in the case of a perfect linear relationship, when the points lie exactly along a straight line

If a z-score is zero, which of the following must be true? Explain your reasoning. • The mean is zero. • The corresponding x-value is zero. • The corresponding x-value is equal to the mean.

The corresponding​ x-value is equal to the​ mean, because the​ z-score is equal to the difference between the​ x-value and the​ mean, divided by the standard deviation.

Correlation coefficient and determination coefficient

The determination coefficient is the correlation coefficient squared! There is a relationship between correlation and regression. When reporting regression, find r to note strength of linear relationship. When reporting correlation, find r2 to note how successful the regression was in explaining the response.

Cohen's d for independent sample t test

The difference between means over the pooled standard deviation.
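
A minimal Python sketch of this effect size, assuming the pooled variance weights each sample variance by its degrees of freedom (n - 1); the function name is an illustrative assumption.

```python
import math
import statistics

def cohens_d(sample1, sample2):
    """Cohen's d: difference between means over the pooled standard deviation."""
    n1, n2 = len(sample1), len(sample2)
    var1 = statistics.variance(sample1)   # sample variance, divisor n - 1
    var2 = statistics.variance(sample2)
    pooled_variance = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    mean_difference = statistics.mean(sample1) - statistics.mean(sample2)
    return mean_difference / math.sqrt(pooled_variance)
```

With the hypothetical samples [2, 4, 6] and [1, 3, 5], the means differ by 1 and the pooled standard deviation is 2, so d = 0.5, a medium effect by the usual guidelines.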

Skewed left/skewed right

The distribution in Figure 15(c) is skewed right. Notice that the tail to the right of the peak is longer than the tail to the left of the peak. Finally, Figure 15(d) illustrates a distribution that is skewed left, because the tail to the left of the peak is longer than the tail to the right of the peak

What is the standard error of the difference?

The estimated standard deviation of the sampling distribution of differences between the means

Determine whether the events are independent or dependent. Explain your reasoning. Returning a rented movie after the due date and receiving a late fee

The events are dependent because the outcome of returning a rented movie after the due date affects the probability of the outcome of receiving a late fee.

A pharmaceutical company wants to test the effectiveness of a new allergy drug. The company identifies 250 females​ 30-35 years old who suffer from severe allergies. The subjects are randomly assigned into two groups. One group is given the new allergy drug and the other is given a placebo that looks exactly like the new allergy drug. After six​ months, the​ subjects' symptoms are studied and compared. a) Identify the experimental units and treatments used in this experiment.

The experimental units are the 30- to 35-year-old females being given the treatment. The treatments are the new allergy drug and the placebo.

Form/shape

The general shape of the graph Ex: linear relationships/curved relationships/outliers/clusters

Explain how the interquartile range of a data set can be used to identify outliers

The interquartile range (IQR) of a data set can be used to identify outliers because data values that are greater than Q3 + 1.5(IQR) or less than Q1 − 1.5(IQR) are considered outliers.
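
The fence rule can be sketched in Python as follows. Because textbooks and software compute quartiles in slightly different ways, this sketch takes Q1 and Q3 as inputs rather than computing them; the function names are illustrative assumptions.

```python
def iqr_fences(q1, q3):
    """Return the (lower, upper) fences Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(data, q1, q3):
    """Return the values in data that fall outside the fences."""
    lower, upper = iqr_fences(q1, q3)
    return [x for x in data if x < lower or x > upper]
```

For example, with Q1 = 10 and Q3 = 20 the fences are -5 and 35, so a value of 40 would be flagged while a value of 1 would not.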

When you calculate the number of permutations of n distinct objects taken r at a​ time, what are you​ counting?

The number of ordered arrangements of n objects taken r at a time.

Degrees of freedom

The number of scores that are independent and free to vary before all remaining scores are determined. The shape of the family of t-curves is linked directly to the degrees of freedom.

Percentile rank

The percentage of individuals in the distribution with scores at or below the particular value

Stratified Sampling

The population is divided into​ subgroups, called​ strata, based on some​ characteristic, and then a random sample is taken from each stratum

Graphing the critical values on the null curve for t-statistic

The same as graphing the critical values in a z-test. You locate the critical region (on both sides for a two-tailed test and on one side for a one-tailed test) and shade the tail beyond the critical value.

For data at the interval​ level, you cannot calculate meaningful differences between data entries

The statement is false. A true statement is​ "For data at the interval​ level, you CAN calculate meaningful differences between data​ entries."

What is a t-distribution?

The t-distribution is an entire family of distributions with a single parameter called the "Degrees of freedom" (df)

Estimated standard error of M (sM)

The value that is used to estimate the real standard error σM when the value of σ is unknown. It is computed from the sample variance or sample standard deviation and provides an estimate of the standard distance between the sample mean (M) and the population mean μ

Normalcdf(left z, right z)

What is the calculator button(s) required to find the area if you know the z-scores?

Invnorm(left area)

What is the calculator button(s) required to find the z-score if you know the left area?

Q₃-Q₁

What is the formula for calculating the IQR?

A scientist claims that only 64% of geese in his area fly south for the winter. He tags 55 random geese in the summer and finds that 17 of them do not fly south in the winter. If α = 0.05, is the scientist's belief warranted?

Yes because the test value 0.79 is in the noncritical region
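
The test value of 0.79 can be reproduced with a one-proportion z-test. This Python sketch assumes a two-tailed test of H0: p = 0.64 with critical values of ±1.96 at α = 0.05; the function name is an illustrative assumption.

```python
import math

def one_proportion_z(successes, n, p0):
    """z test value for a sample proportion against a claimed proportion p0."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)   # uses p0, not p_hat
    return (p_hat - p0) / standard_error

# 55 geese tagged, 17 did not fly south, so 38 did
z = one_proportion_z(55 - 17, 55, 0.64)

critical_value = 1.96                    # assumed two-tailed, alpha = 0.05
reject_null = abs(z) > critical_value
```

The test value rounds to 0.79, which falls inside the noncritical region, so the null hypothesis is not rejected.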

response

___ bias exists when the answers on a survey do not reflect the true feelings/answers of the respondent.

A basket contains 18 ​eggs, 7 of which are cracked. If we randomly select 8 of the eggs for hard​ boiling, what is the probability of the following​ events? a. All of the cracked eggs are selected. b. None of the cracked eggs are selected. c. Two of the cracked eggs are selected.

nCr = n!/((n − r)! r!); P(A) = (number of ways A can occur)/(number of simple events) = s/n. Total selections: 18C8 = 43758. a. 7C7 = 1, 11C1 = 11, 1 × 11 = 11, 11/43758 = .0003 b. 7C0 = 1, 11C8 = 165, 1 × 165 = 165, 165/43758 = .0038 c. 7C2 = 21, 11C6 = 462, 21 × 462 = 9702, 9702/43758 = .2217
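
The three answers can be checked with Python's `math.comb`. This is a small sketch; the function name and its default arguments (7 cracked eggs, 11 good eggs, 8 chosen) are illustrative assumptions taken from the numbers in the question.

```python
from math import comb

def prob_cracked_selected(k, cracked=7, good=11, chosen=8):
    """Probability that exactly k of the cracked eggs are among the chosen eggs."""
    ways = comb(cracked, k) * comb(good, chosen - k)   # favorable selections
    total = comb(cracked + good, chosen)               # all selections: 18C8
    return ways / total

p_all_cracked = prob_cracked_selected(7)    # part a
p_none_cracked = prob_cracked_selected(0)   # part b
p_two_cracked = prob_cracked_selected(2)    # part c
```

Rounded to four decimal places, these give .0003, .0038, and .2217.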

Ordinal

determine the level of measurement of the variable. choose nominal,ordinal, interval or ratio The medal received (gold,silver,bronze) by an Olympic gymnast. (medals are named but an order is implied)

Ratio

Differences are meaningful, but ratios are also meaningful and there is a true zero point (e.g., weight in pounds).

observation

each individual piece of data

In a two-way ANOVA, the main effect of a factor is the

effect of changing the levels of that factor on the dependent variable scores, ignoring all other factors in the study.

Classify the following statement as an example of classical​ probability, empirical​ probability, or subjective probability. Explain your reasoning. According to a​ survey, the probability that an adult chosen at random is in favor of police body cameras is about 0.43.

Empirical; the stated probability is calculated based on observations of adults' opinions.

Why are t-statistics more variable than z-scores

In a z-score distribution, the bottom of the formula does not change from one sample to the next, provided that all the samples are the same size and from the same distribution. In a t-statistic, the value of the sample variance changes from one sample to the next, which makes the estimated standard error of M also change, so both the numerator and the denominator change from one sample to the next.

proportional allocation

In stratified random sampling, when the strata are sampled in proportion to their size.

Evaluate the given expression and express the result using the usual format for writing numbers​ (instead of scientific​ notation).

nCr = n!/((n − r)! r!) = (43 · 42 · 41 · ...)/((41 · 40 · ...)(2 · 1)) = (43 · 42)/2 = 903

In order to conduct an​ experiment, 5 subjects are randomly selected from a group of 32 subjects. How many different groups of 5 subjects are​ possible?

nCr = n!/(r!(n − r)!) with n = 32 and r = 5: 32!/(5!(32 − 5)!) = (32 × 31 × 30 × 29 × 28)/5! = 201376
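
The count can be confirmed in Python, both from the factorial formula and with the built-in `math.comb` (available in Python 3.8 and later). The helper name is an illustrative assumption.

```python
from math import comb, factorial

def combinations_by_formula(n, r):
    """nCr computed directly from n! / (r! (n - r)!)."""
    return factorial(n) // (factorial(r) * factorial(n - r))

groups = combinations_by_formula(32, 5)   # groups of 5 from 32 subjects
same_groups = comb(32, 5)                 # built-in equivalent
```

Both approaches give 201376 groups.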

Probability

Expected relative frequency of a particular outcome

Example of how correlation does not equal causation

positive correlation between number of churches in an area and a crime rate; Cannot say that number of churches causes/influences crime;Third variable: high population density (number of people in an area more liable to have more crime)

Regression

process of using relationships to make a prediction

quantitative variable

provide numerical measures of individuals. Arithmetic operations such as addition and subtraction can be performed on the values and provide meaningful results.

How can you find the percentage of the result that is accounted for by treatment

r2 * 100

Critical value for independent sample t test

still use the t-table; df used is the combined degrees of freedom (df=n1+n2-2); in a one tailed test you need to figure out sign- if you are using < in the alternative then you have a negative critical value and if you are using > in the alternative then you have a positive critical value

Classify the following statement as an example of classical​ probability, empirical​ probability, or subjective probability. Explain your reasoning. An analyst feels that a certain​ stock's probability of increasing in price over the next month is 0.57.

Subjective; the stated probability is most likely based on intuition, an educated guess, or an estimate.

discrete random variable

takes a fixed set of possible values with gaps between. has a countable number of possible values

sample

that part of the population from which information is obtained

response bias

the interviewer can behave in such a way as to elicit a certain response, or those surveyed respond in a way they are expected to.

comparing mean and median

the mean and median of a roughly symmetric distribution are close together. If the distribution is exactly symmetric, the mean and median are exactly the same. In a skewed distribution, the mean is usually farther out in the long tail than is the median.

second quartile

the median of the entire data set

third quartile

the median of the part of the data set that lies at or above the median of the entire data set

which measure of center is appropriate for qualitative data?

the mode

When an experiment design has two factors but one factor involves related samples while the other factor involves independent samples, we should perform a

two-way mixed-design ANOVA

Sample mean

Determine whether the following statement is true or false. If it is false, explain why. The probability that event A or event B will occur is P(A or B) = P(A) + P(B) − P(A or B).

False, the probability that A or B will occur is P(A or B) = P(A) + P(B) − P(A and B).

sample statistics

•Calculated characteristics of a sample •Calculated from known information

A right tailed test is used when H0: μ≥k

False

experiment

In terms of probability, a(n) ___ is any process with uncertain results that can be repeated.

A regression line can be used to show trends in data

True

mode median

Which two central tendencies are resistant?

parameter

A ___ is a numerical value based on a population

score

A particular person's value

The mode of the numbers 2, 5, 5, 6, 7, 8, 9 is: a) 7 b) 6 c) 5 d) 8

Answer = C - 5

SSE

Measures the sum of squared errors

geometric probability model

P(y = n) = (1 − p)^(n − 1) · p
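
A one-function Python sketch of this formula, where n is the trial on which the first success occurs and p is the success probability on each trial; the function name is an illustrative assumption.

```python
def geometric_pmf(n, p):
    """P(first success occurs on trial n) = (1 - p)**(n - 1) * p."""
    return (1 - p) ** (n - 1) * p
```

For example, with p = 0.5 the probability that the first success comes on the third trial is 0.5 × 0.5 × 0.5 = 0.125.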

formula for variance

• Find the deviation score for each score: subtract the mean from each score. • Find the squared deviation score for each score: square each of these deviation scores. • Find the sum of squared deviations: add up the squared deviation scores. • Find the average of the squared deviations: divide the sum of squared deviations by the number of scores. • SD² = ∑(X − M)²/N = 66/10 = 6.60
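
The steps above can be sketched in Python. This version divides by N (the number of scores), as the card does; the function name is an illustrative assumption, and the original scores behind the 66/10 example are not given in the card.

```python
def population_variance(scores):
    """Variance as the average squared deviation from the mean (divisor N)."""
    n = len(scores)
    mean = sum(scores) / n
    deviations = [x - mean for x in scores]          # deviation scores
    squared_deviations = [d ** 2 for d in deviations]
    sum_of_squares = sum(squared_deviations)         # sum of squared deviations
    return sum_of_squares / n                        # average squared deviation
```

For the hypothetical scores 1, 2, 3, 4, 5 the mean is 3, the sum of squared deviations is 10, and the variance is 10/5 = 2.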

mean (expected value) of a discrete random variable

∑ xi pi

Why is it correct to say​ "a" normal distribution and​ "the" standard normal​ distribution? Describe the cases in which the different terms are used. Choose the correct answer below.

"The" standard normal distribution is used to describe one specific normal distribution left parenthesis mu equals 0 comma sigma equals 1 right parenthesis . ​"A" normal distribution is used to describe a normal distribution with any mean and standard deviation.

When graphed, a significant interaction effect produces two or more lines that

are not parallel

Confidence Interval

- A confidence interval is a type of interval estimate, computed from the statistics of the observed data, that might contain the true value of an unknown population parameter.

All symmetric confidence intervals based on the normal distribution have the same basic form:

(estimate) +/- (critical value) x (standard error of the estimate)

The Empirical Rule​

(or 68-95-99.7 ​Rule) indicates percentages of data that lie within​ one, two, and three standard deviations of the mean for data sets with distributions that are approximately symmetric and​ bell-shaped. Assumes approx symmetric and bell shaped.

Using the z table, find the critical value for an α = 0.09 two-tailed test

+ or - 1.69

Using the z-table, determine the critical values for a two-tailed test when α = 0.03

+ or - 2.17

Sampling Distribution (of the estimator/statistic)

- The distribution of the values of the estimates across all possible samples *of a given size*

Margin of Error (ME)

- The margin of error (ME) of a confidence interval (CI) is the amount we add or subtract from p-hat to get the interval

Sample Size Assumption

- The sample size n is considered "large enough" if the Success/Failure condition is satisfied. - The Success/Failure condition is that there are at least 10 successes and at least 10 failures in the sample.

If we INCREASE the sample size:

- The width of the confidence interval DECREASES

If p̂ is equal to 0.85 then q^ is equal to

0.15

Determine the number of outcomes in the event. Decide whether the event is a simple event or not. A computer is used to randomly select a number between 1 and 9, inclusive. Event C is selecting a number greater than 8.

1 outcome. Yes, the event is a simple event because it has exactly one outcome.

Steps for Relative frequency

1) Add the sample size/frequencies together. 2) Divide individual frequency by the total amount. This should be a decimal/percentage
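
A short Python sketch of these two steps, applied to the class-level counts from the physics-course card earlier in this deck (Freshman 6, Sophomore 12, Junior 10, Senior 3); the function name is an illustrative assumption.

```python
def relative_frequencies(counts):
    """Divide each frequency by the total of all frequencies."""
    total = sum(counts.values())                      # step 1: add frequencies
    return {category: count / total                   # step 2: divide by total
            for category, count in counts.items()}

class_levels = {"Freshman": 6, "Sophomore": 12, "Junior": 10, "Senior": 3}
relative = relative_frequencies(class_levels)
```

The relative frequencies are decimals that sum to 1; multiplying by 100 expresses them as percentages.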

Ways to measure effect

1) Cohen's d, 2) r2

How do you determine the critical region for pearson r

1) df=n-2; 2) use the correlation table, not the statistic table, 3) graph is truncated

Which of the following is a potential confounding factor in within-subjects designs, but not in a between-subjects design? a) regression to the mean b) sequence effects c) attrition d) history

Answer = B - sequence effects

converting odds to probability

1. Add the two numbers together, e.g., for 1:3, 1 + 3 = 4. 2. Divide the number that was on the left by this sum, e.g., 1/4 = .25.

Steps for Making a Frequency Table

1. Make a list down the page of each possible value, from highest to lowest. 2. Go one by one through the scores, making a mark for each next to its value on the list. 3. Make a table showing how many times each value on your list is used. Be sure the highest value is at the top of the table and the lowest value at the bottom. 4. Figure the percentage of scores for each value.

Your height and weight are examples of which level of measurement? a) Nominal b) Ordinal c) Qualitative d) Discrete e) Continuous

Answer = E - Continuous

steps for making extended frequency table

1. Make a list down the page of each possible value, from highest to lowest. 2. Go one by one through the scores, making a mark for each next to its value on the list. 3. Make a table showing how many times each value on your list is used. Be sure the highest value is at the top of the table and the lowest value at the bottom. 4. Figure the percentage of scores for each value.

how to assess normality

1. The shape of the histogram should suggest normality. 2. The box plot should suggest symmetry. 3. The normal probability plot should be approximately linear. 4. The mean and median should be approximately equal, but this doesn't always mean symmetry. 5. Most of the data should be within 3 standard deviations of the mean. 6. About 68% of the data should be within 1 standard deviation of the mean. 7. The lower and upper quartiles should have z-scores near -.67 and .67.

Cautions: describing distribution two variables is more complex than describing the distribution of one variable

1.) Correlation requires that both variables be quantitative so that it makes sense to do the arithmetic indicated by the formula for r. 2.) Correlation measures the strength of only the linear relationship between two variables. Correlation does not describe curved relationships between variables no matter how strong the relationship is. A correlation of 0 doesn't guarantee that there's no relationship between two variables, just that there's no linear relationship. 3.) Like the mean and standard deviation, the correlation is not resistant: r is strongly affected by a few outlying observations. Use r with caution when outliers appear in the scatterplot. 4.) Correlation is not a complete summary of two-variable data, even when the relationship b/w the variables is linear. You should give the means and standard deviations of both x and y along with the correlation.

Confidence level: 80%

1.28

A student looked up the number of years served by 35 of the more than 100 supreme court justices. The average number of years served by those 35 justices was 13.8. If the standard deviation of the entire population is 7.3 year, find the 95% confidence interval for the average number of years served by all supreme court justices.

11.4<μ<16.2
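
This interval follows the form (estimate) +/- (critical value) x (standard error) for a mean with known population standard deviation. A minimal Python sketch using the standard library's `statistics.NormalDist` (Python 3.8 and later); the function name is an illustrative assumption.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence):
    """Confidence interval for a mean when the population SD (sigma) is known."""
    # Critical value z* leaves (1 - confidence)/2 in each tail
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin_of_error = z_star * sigma / math.sqrt(n)
    return sample_mean - margin_of_error, sample_mean + margin_of_error

# Supreme Court justices: mean 13.8 years, sigma 7.3, n = 35, 95% confidence
low, high = z_confidence_interval(13.8, 7.3, 35, 0.95)
```

Rounded to one decimal place, the interval runs from 11.4 to 16.2 years.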

A medical researcher is interested in whether patients left arms or right arms are longer. If 14 patients participate in this study, how many degrees of freedom should the researcher use when finding the critical value for a t test?

13

If the equation for the regression line is y′ = 7x − 5, then the value x = 3 will result in a predicted value for y of

16

Determine the number of outcomes in the event. Decide whether the event is a simple event or not. You randomly select one card from a standard deck of 52 playing cards. Event B is selecting a red four.

2 outcomes. No, the event is not a simple event because it has more than one outcome.

Identify the sample space of the probability experiment and determine the number of outcomes in the sample space. Randomly choosing an even number between 1 and 10, inclusive

2,4,6,8,10 There are 5 outcomes

What is the critical value for a two-tailed t test when α = 0.02 and n = 19?

2.552

correlation r

measures the direction and strength of the linear relationship between two quantitative variables

An archaeology club has 29 members. How many different ways can the club select a​ president, vice​ president, treasurer, and​ secretary?

29!/(29-4)! = 29x28x27x26= 570024
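
Because the four offices are distinct, order matters and this is a permutation count. It can be checked with Python's `math.perm` (available in Python 3.8 and later); the variable names are illustrative.

```python
from math import factorial, perm

members, offices = 29, 4

# Ordered selections of 4 officers from 29 members: 29!/(29 - 4)!
ways_by_formula = factorial(members) // factorial(members - offices)
ways_builtin = perm(members, offices)
```

Both expressions evaluate to 570024.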

A realtor uses a lock box to store the keys to a house that is for sale. The access code for the lock box consists of five digits. The first digit cannot be 3 and the last digit must be even. How many different codes are​ available? (Note that 0 is considered an even​ number.)

45000
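
The count comes from the fundamental counting principle: 9 choices for the first digit, 10 for each of the three middle digits, and 5 for the last digit. The Python sketch below checks this by brute force over all five-digit codes; the function name is an illustrative assumption.

```python
from itertools import product

def count_valid_codes():
    """Count 5-digit codes whose first digit is not 3 and whose last digit is even."""
    digits = range(10)
    return sum(
        1
        for code in product(digits, repeat=5)
        if code[0] != 3 and code[4] % 2 == 0
    )

by_enumeration = count_valid_codes()
by_counting_principle = 9 * 10 * 10 * 10 * 5
```

Both approaches give 45000 codes.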

A study of 55 apple trees showed that the average number of apples per tree was 625 the standard deviation of the population is 100. Which of the following is the 90% confidence interval for the mean number of apples per tree for all trees.

603 < μ < 647 (625 ± 1.645 × 100/√55 ≈ 625 ± 22.2)

The measure of central tendency, which by definition separates the lower 50% of a set of values from the upper 50% of the set, is the a) mode b) mean c) inter quartile range d) halfway between the first and third quartiles e) median

Answer = E - Median

Of the cartons produced by a​ company, 6​% have a​ puncture, 4​% have a smashed​ corner, and 0.9​% have both a puncture and a smashed corner. Find the probability that a randomly selected carton has a puncture or a smashed corner.

9.1%

expected value

= E(x) = µ = Σx ⋅ P(x) The expected value of a discrete random variable is equal to the mean of the random variable. Represents the break-even point in profit and loss analysis. Can be negative.
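
A minimal Python sketch of E(x) = Σx · P(x). The function name and the game in the example are hypothetical, chosen only to show that an expected value is a probability-weighted average.

```python
def expected_value(values, probabilities):
    """Expected value of a discrete random variable: sum of x * P(x)."""
    return sum(x * p for x, p in zip(values, probabilities))

# Hypothetical game: win $4 with probability 0.25, lose $1 with probability 0.75
average_gain = expected_value([4, -1], [0.25, 0.75])
```

Here the expected gain is 4(0.25) + (−1)(0.75) = 0.25 dollars per play; a game with an expected value of zero would be the break-even case.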

Problems with correlation calculation

A value of r close to 1 or -1 does not guarantee a linear relationship between two variables. A scatterplot with a clear curved form can have a correlation near 1 or -1. Always plot your data.

A summary measure calculated from population is called: a) a sample b) a statistic c) an inference d) a census e) a parameter

Answer = E - Parameter

Example APA write up Pearson r

A Pearson correlation revealed that hours of sleep and quiz performance were not significantly correlated, r=+.059, n=5, p>.05, one tailed, r2=.0035.

You have treadmill endurance test scores for 14 college students taken just before final exam period and a second set of scores for the same students obtained on the afternoon of the last day of finals. You are pretty sure that the populations these scores are sampled from are highly skewed, and so you plan to test for differences in physical endurance using a nonparametric procedure. Which one should you use?

A Wilcoxon T

double-blind

A ___ experiment is one in which neither the experimental unit nor the researcher in contact with the experimental unit knows which treatment that experimental unit is receiving.

Pareto chart

A ___ is a bar graph where the bars are drawn in decreasing order of frequency or relative frequency.

histogram

A ___ is constructed by drawing rectangles of the same width for each class of data. The rectangles touch each other. And the height of each rectangle is the frequency or relative frequency of the classes.

What is the difference between a census and a​ sampling?

A census includes the entire population. A sampling includes only part of the population

control group

A control group serves as a baseline treatment that can be used to compare it to other treatments. For example, a researcher in education might want to determine if students who do their homework using an online homework system do better on an exam than those who do their homework from the text. The students doing the text homework might serve as the control group (since this is the currently accepted practice). The factor is the type of homework. There are two treatments: online homework and text homework.

convenience sample

A convenience sample is a sample in which the individuals are easily obtained and not based on randomness

Pearson r (Pearson correlation)

A correlation that measures the degree and direction of the linear relationship between two variables. The most common kind of correlation. For a sample it is identified by r; the corresponding correlation for the entire population is identified by the Greek letter rho (ρ).

Positive correlation

A correlation where the two variables tend to change in the same direction; as the value of the X variable increases from one individual to another, the Y variable also tends to increase; when the X variable decreases, the Y variable also decreases. Also known as a direct correlation. On a graph, the points go from the bottom left to the upper right.

example

A drug company wanted to test a new depression medication. The researchers found 200 adults aged 25-35 and randomly assigned them to two groups. The first group received the new drug, while the second received a placebo. After one month of treatment, the percentage of each group whose depression symptoms decreased was recorded and compared. a) Identify the experimental units: the 200 adults aged 25-35. b) What is the treatment in this experiment? The new drug. c) What is the response variable in this experiment? Whether depression symptoms decreased. d) How many levels does the treatment in this experiment have? 2: the new drug and the placebo. e) What type of experimental design is this (random, block, matched-pairs, or single-blind)? Random, single-blind.

Environmental variables

A dynamic variable that can affect how the running processes will behave

Frequency Distribution, when is it skewed left?

A frequency distribution is skewed left when a tail of the graph elongates more to the left than to the right

Frequency Distribution, when is it skewed right?

A frequency distribution is skewed right when its tail extends to the right instead of to the left.

frequency distribution

A frequency distribution lists each category of data and the number of occurrences for each category of data.

Y-intercept

a is the y-intercept, the predicted value of y when x = 0

Independent measures research design

A research design that uses a separate group of participants for each treatment condition (or each population). Also called a between-subjects design.

Simple random

A sample in which every possible sample of the same size has the same chance of being selected from a population.

simple random sampling

A sample of size n from a population of size N is obtained through simple random sampling if every possible sample of size n has an equally likely chance of occurring. The sample is then called a simple random sample

Alternative Hypothesis (Ha)

A statement about what the population parameter would be if there were something going on

Null Hypothesis (Ho)

A statement about what the population parameter would be if there was nothing special going on

discrete

A(n) ___ random variable has either a finite or countable number of values.

The following appear on a​ physician's intake form. Identify the level of measurement of the data. A)Family history of illness B)Happiness level scale of 0 to 10 C) Height D)Temperature

A) Nominal B) Ordinal C) Ratio D) Interval

examples of misleading data

A. We do not know if this study used a representative sample. B. The measure of central tendency should have been the mean instead of the median. C. There is not much difference between the medians. D. The statement implies all controls have more trust than all survivors. •Requires equal-interval or ratio data

Which of the following is one of the assumptions we make when we do a one-way, between-subjects ANOVA?

All conditions of the single independent variable contain independent samples.

frequency distribution

All the values of a variable and the number of scores that fit each value

Explanatory-response relationship

Always plot the explanatory variable, if there is one, on the horizontal axis (x-axis) of the scatterplot. We usually call the explanatory variable x and the response variable y. If there is no explanatory-response distinction, either variable can go on the horizontal axis.

experiment

An ___ is a controlled study conducted to determine the effect that varying one or more explanatory variables or factors has on a response variable.

observational

An ___ study measures the values of the response variable without attempting to influence the value of either the response or explanatory variables.

Association does not imply causation

An association between an explanatory variable x and a response variable y, even if it is very strong, is not by itself good evidence that changes in x actually cause changes in y. *Sometimes association is due to cause and effect but other times it is due to lurking variables

experiment, factors and treatment

An experiment is a controlled study conducted to determine the effect varying one or more explanatory variables or factors has on a response variable. Any combination of the values of the factors is called a treatment

Which of the following is true about single-subject designs? a) They are variations of within-subjects designs. b) They have poor internal validity. c) They have good internal and external validity. d) They have poor internal validity but good external validity.

Answer = They are variations of within-subjects designs. Notes: is this because they go into all treatments but there is only one participant?

Can correlation be used as a statistical test?

Answer = Yes. No variation, don't know the IV or DV specifically, no cause and effect, e.g. height and weight

Slope

b is the slope; the amount by which y is predicted to change when x increases by one unit *Coefficient of x is always the slope no matter what symbol is used

geometric random variables

BITS -Binary - only two outcomes, success or failure -Independent - the result of one trial does not have any effect on the result of any other trial -Trials - the goal is to count the number of trials until the first success is observed -Success - the probability of success is the same for each observation

Which of the following would increase the power of a t-test?

Changing the sample size from N = 25 to N = 100

(1-1/k²)100%

Chebyshev's Thm: For any data set, no matter what shape or distribution At least ___ of the observations lie within k standard deviations of the mean, where k is any number greater than 1.
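
A small sketch of the bound on this card. The function name `chebyshev_lower_bound` is just an illustrative label; it evaluates (1 - 1/k²)100% for a chosen k:

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return (1 - 1 / k ** 2) * 100

# The bound holds for any data set, whatever its shape.
for k in (2, 3):
    print(k, round(chebyshev_lower_bound(k), 1))
```

For k = 2 the bound is 75% and for k = 3 it is about 88.9%, both weaker than the 95% and 99.7% the Empirical Rule gives for bell-shaped data.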

Classify the following statement as an example of classical​ probability, empirical​ probability, or subjective probability. Explain your reasoning. According to company​ records, the probability that a washing machine will need repairs during a nine​-year period is 0.19.

Classical (or theoretical) probability is used when each outcome in a sample space is equally likely to occur. Empirical (or statistical) probability is based on observations obtained from probability experiments. Subjective probability results from intuition, educated guesses, and estimates. This statement is an example of empirical probability: the stated probability is calculated based on observations from the company records.

making a stem-and leaf plot

DON'T FORGET A KEY

Caveat to notation of SM1-M2

Do not let the notation confuse you. Standard error is a statistic that is measuring the discrepancy that is reasonable to expect between the sample statistic and the corresponding population parameter. You are not subtracting the two means.

Formula for coefficient of determination (r2)

Do not need to use the r2 formula that you use in t tests, you can just square the r that you find using the correlation formula

Generalisability

Describes the extent to which research findings can be applied to settings other than those in which they were originally tested

cross-sectional

Determine what type of observational study is described. Choose retrospective, cross-sectional or cohort. Can money buy happiness? A researcher wanted to determine whether there was any association between economic status and happiness. She selected a sample of 1000 adults and interviewed them. Each person was asked about their financial situation and their level of happiness was evaluated. The researcher analyzed the results to determine whether there was an association between economic status and happiness. (happened at a specific point in time.)

Sample variance effect on effect size

Does influence, just like it does in hypothesis testing. High variance reduces the likelihood of rejecting the null hypothesis and reduces the measure of effect.

Direction/trend

Draw an oval around the data and find the slope of the major axis: a negative slope means a negative trend, while a positive slope means a positive trend. If the relationship has a clear direction, we speak of positive association (high values of the two variables tend to occur together) or negative association (high values of one variable tend to occur with low values of the other variable).

68 95 99.7

Empirical Rule: The distribution is roughly bell shaped. Approx ___% of the data lie within 1 standard deviation Approx ___% of the data lie within 2 standard deviations Approx ___% of the data lie within 3 standard deviations

You randomly select one card from a standard deck of 52 playing cards. Event C is selecting a club.

Event C has 13 outcomes. (There are 13 clubs in a deck of cards) Not a simple event, more than one outcome

The accompanying table shows the numbers of male and female students in a particular country who received​ bachelor's degrees in business in a recent year. Complete parts​ (a) and​ (b) below.

Find the probability that a randomly selected student is male, given that the student received a business degree: .515. Business and female: .159

Platykurtic distribution

Flatter and more spread out than a normal curve. (Memory: 'Plat' sounds like 'flat')

Mauricio Cruz, a wine merchant for Cruz's Spirits Emporium, wants to determine if the average price of imported wine is less than the average price of domestic wine. He obtained the data shown in the table below. Imported wine: x̄1 = $7.03, s1 = $2.31, n1 = 15. Domestic wine: x̄2 = $9.78, s2 = $3.62, n2 = 16. The null hypothesis is

H0: μ1= μ2

For conjecture "the average weight of a cuckoo bird is less than 2.2 pounds", the null and alternative hypothesis are

H0: the average weight of a cuckoo bird is equal to 2.2 pounds H1: the average weight of a cuckoo bird is less than 2.2 pounds

Perfect correlation

Identified by a correlation of +1.000 or -1.000 and indicates a perfectly consistent relationship. Each change in X is accompanied by a perfectly predictable change in Y. The points fall on a perfectly straight line. Can be either positive or negative.

no mode

If no observation occurs more than once, we say the data have ___.

median

If the data are not normal, which measure of central tendency best describes the data?

When two pink flowers ​(RW​) are​ crossed, there are four equally likely possible outcomes for the genetic makeup of the​ offspring: red ​(RR​), pink​(RW​), pink​(WR​), and white ​(WW​). If two pink snapdragons are​ crossed, what is the probability that the offspring will be​ (a) pink​, ​(b) red​, and​ (c) white​?

If two pink snapdragons are​ crossed, the probability that the offspring will be pink is: .5 red: .25 white: .25

Key Idea of Hypothesis testing

In a hypothesis test, we always assume that the null hypothesis is true, and then see how unusual the sample result would be. If it would be unusual, then we doubt the null hypothesis

experimental group and subject

In an experiment, the experimental unit is a person, object, or some other well-defined item upon which a treatment is applied. We often refer to the experimental unit as a subject when he or she is a person. The subject is analogous to the individual in a survey. The goal in an experiment is to determine the effect various treatments have on the response variable. For example, we might want to determine whether a new treatment is superior to an existing treatment (or no treatment at all). To make this determination, experiments require a control group.

What is the difference between an observational study and an​ experiment?

In an​ experiment, a treatment is applied to part of a population and responses are observed. In an observational​ study, a researcher measures characteristics of interest of a part of a population but does not change existing conditions.

Problems with regression lines

In most cases, no line passes exactly through all the points in a scatterplot. Because we use the line to predict y from x, the prediction errors we make are errors in y, the vertical direction in the scatterplot. A good regression line makes the vertical distances (residuals) of the points from the line as small as possible.

single-blind experiment & double-blind experiment

In single-blind experiments, the experimental unit (or subject) does not know which treatment he or she is receiving. In double-blind experiments, neither the experimental unit nor the researcher in contact with the experimental unit knows which treatment the experimental unit is receiving.

Which of the following would not result in an increase in the power of ANOVA?

Increasing the variability of the scores within each condition

T-distribution shape

Like a normal distribution, the t distributions are bell shaped, symmetrical and have a mean of zero. However, t distributions are more variable than a normal distribution as indicated by the flatter and more spread out shape. The larger the value of df is, the more closely the t-distribution approximates a normal distribution

Linear relationships

Linear relationships are important because a straight line is a simple pattern that is quite common; a linear relationship is strong if the points lie close to a straight line and weak if they are widely scattered about a line.

SST

Measures the total variation in the y values: the sum of squared deviations of y from ybar. It is a constant multiple, (n - 1), of the variance of y.

Stratified

Members of a population are divided into two or more subsets, called strata, that share similar characteristics. A sample is then randomly selected from each of the strata. Using a stratified sample ensures that each segment of the population is represented.

A basket contains 15 ​eggs, 4 of which are cracked. If we randomly select 9 of the eggs for hard​ boiling, what is the probability of the following​ events? a. All of the cracked eggs are selected. b. None of the cracked eggs are selected. c. Two of the cracked eggs are selected.

NOT THE CORRECT WORK FOR THIS PROBLEM!! USE AS GUIDE a. nCr = n!/((n-r)! r!) P(A) = (# ways the event can occur)/(# simple events) = s/n 15C9 = 4c4= 1 9c5= 126 1x126=126 126/11440 = .0110 b. 7C0=1 11c8= 165 1x165= 165 165/43758 c. 7C2=21 11c6 = 462 21x462=9702 9702/43758 =.2217
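
Since the work on this card is flagged as belonging to a different problem, here is a hedged sketch of the same counting method applied to the egg problem as stated (15 eggs, 4 cracked, 9 selected). The helper `prob_cracked` and its parameter names are illustrative, not from the card:

```python
from math import comb

def prob_cracked(k, total=15, cracked=4, chosen=9):
    """P(exactly k cracked eggs among the chosen ones), by counting combinations."""
    good = total - cracked
    # ways to pick k cracked eggs times ways to pick the remaining good eggs,
    # divided by the number of ways to pick any `chosen` eggs
    return comb(cracked, k) * comb(good, chosen - k) / comb(total, chosen)

p_all = prob_cracked(4)   # a. all of the cracked eggs are selected
p_none = prob_cracked(0)  # b. none of the cracked eggs are selected
p_two = prob_cracked(2)   # c. two of the cracked eggs are selected

print(round(p_all, 4), round(p_none, 4), round(p_two, 4))
```

Under these assumptions the three probabilities come out to about .0923, .0110, and .3956.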

Can the denominator ever be negative in the t-formula for independent sample t test

No because it is not possible to have a SS less than zero. The only way to get a negative t value is if you have a negative value in the numerator

nonresponse bias

Nonresponse bias exists when individuals selected to be in the sample who do not respond to the survey have different opinions from those who do. Nonresponse can occur because individuals selected for the sample do not wish to respond or the interviewer was unable to contact them.

X = 0 as an extrapolation

Often using the regression line to make a prediction for x = 0 is an extrapolation. That's why the y-intercept isn't always statistically meaningful.

Overall pattern vs. departures

One common method of data analysis is looking for an overall pattern and for striking departures from the pattern. A regression line describes the overall pattern of a linear relationship between an explanatory variable and a response variable. We see departures from this pattern by looking at the residuals.

Coefficient of determination (r2) and prediction

One of the most common uses for correlation is prediction. If two variables are correlated, you can use the value of one variable to predict the other. Squared correlation measures the gain in accuracy that is obtained from using the correlation for prediction.

Choose the three formulas that can be used to describe complementary events.

P(E)+P(E')=1 P(E)=1-P(E') P(E')=1-P(E)

Sixteen of the 100 digital video recorders​ (DVRs) in an inventory are known to be defective. What is the probability you randomly select a DVR that is not​ defective?

P(E) = (# outcomes in event E)/(total number of outcomes in sample space) = 84/100 = .84

the probability that it takes more than n trials to see the first success

P(x>n) = (1-p)^n
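
A quick sketch of this formula, with illustrative function names. The second function adds up the geometric probabilities (1-p)^(k-1)·p for k > n (truncated after many terms) as a sanity check that both routes agree:

```python
def prob_more_than_n_trials(p, n):
    """P(X > n): the first n trials are all failures."""
    return (1 - p) ** n

def prob_more_than_n_by_summing(p, n, terms=2000):
    """Same probability, summing P(X = k) for k = n+1, n+2, ... (truncated)."""
    return sum((1 - p) ** (k - 1) * p for k in range(n + 1, n + 1 + terms))

# Example values chosen for illustration: p = 0.2, n = 3.
direct = prob_more_than_n_trials(0.2, 3)
summed = prob_more_than_n_by_summing(0.2, 3)
print(round(direct, 3), round(summed, 3))
```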

"We failed to reject the null hypothesis"

P-value was large

"We rejected the null hypothesis"

P-value was small

Within subjects design

Participants are tested under every condition (level) of the independent variable

Which kind of estimation is performed when we claim that a population mean is equal to the sample mean?

Point estimation

stratified random sampling

Population is divided into relevant categories, then sampled randomly within each category. A form of probability sampling; a random sampling technique in which the researcher identifies particular demographic categories of interest and then randomly selects individuals within each category.

The estimated percent distribution of a certain​ country's population for 2025 is shown in the accompanying pie chart. Find the probability of each event listed in parts​ (a) through​ (d) below. read the % on the chart

Randomly selecting someone who is under 5 years old: 4.3% Randomly selecting someone who is 45 years old or over: 44.4% Randomly selecting someone who is not 65 years old or over: 80.6% Randomly selecting someone who is between 20 and 34 years old: 19%

numeric (quantitative) variables

Rank-order variables, equal-interval variables, ratio variables

The masses (in grams) of a sample of a species of fish caught in waters of a region are shown below 25.3 25.2 23.4 25.4 24.3 27.7 23.5 23.4 22.5 Determine the level of measurement of the data set. Explain your reasoning

Ratio. The data can be ordered and differences between data entries are​ meaningful, and a zero entry is an inherent zero.

Family of t-curves (?)

Refers to the shape of the curve that has different proportions in it. The shape differs indirectly based on the sample size but directly on the degrees of freedom. There is a family of t-distributions, not just a single t-distribution. Each sampling distribution of t (distribution of all the possible t values) depends on the number of degrees of freedom.

What is the difference between relative frequency and cumulative​ frequency?

Relative frequency of a class is the percentage of the data that falls in that class, while cumulative frequency of a class is the sum of the frequencies of that class and all previous classes.

APA pearson r

Report should include sample size, calculated value for the correlation, whether it is a statistically significant relationship, the probability level, and the type of test used (one or two tailed).

APA formatting for independent sample t test

Report the descriptive statistics followed by the results of the hypothesis test and the measure of effect size (the inferential statistics); You report the mean and SD of each group, statement of significance, and the symbolic presentation of what happened; The formatting for the inferential statistics is the same as in the single sample t-test

Important things to look for when you examine a residual plot: #2

Residuals should be relatively small in size. A regression line that fits the data well should come "close" to most of the points. How do we decide whether residuals are small enough? We consider the size of a "typical" prediction error.

What can an outlier do to the correlation?

Same three things as restricted range: 1) reverse the correlation (reverse the sign), 2) hide a correlation that actually exists, 3) make it look like there is a correlation when it actually does not exist

Holding constant (extraneous Variable)

Same time of day, same researcher, same room, same equipment, same lighting, same noise; only participants with the same age, same gender, same education, same income, etc. Could this make the generalisability inaccurate? (Value judgement)

Skewed distribution

Scores of a data set taper off to one side (either negatively or positively skewed)

How to use correlation to determine reliability?

See if there is a strong correlation between two sets of data

Percentile

Specific point in a distribution of data that has a given percentage of cases below it.

Alternative hypothesis for one-sample t test

States that treatment did have an effect, that the population mean is changed.

Null hypothesis for one sample t-test

States that treatment had no effect, that the population mean is unchanged.

CLT NOTES

The CLT holds for the sample mean of ANY quantitative variable from virtually ANY population - The CLT is arguably the MOST IMPORTANT RESULT IN STATISTICS

Notes on P-values

The P-value IS NOT the probability that the null hypothesis is true. It is the probability of getting a sample result as extreme or more extreme in the direction of the alternative hypothesis, given that the null hypothesis is true. If the P-value is small, our sample result would be unusual if the null hypothesis were true. This tends to make us doubt the null hypothesis. The smaller the P-value, the more evidence there is against the null hypothesis and in favor of the alternative hypothesis. We can NEVER prove the null hypothesis is wrong, because no matter how small the P-value is, it is always possible that the sample results just occurred by chance. A large P-value does not "prove" the null hypothesis; it only means the data are not inconsistent with the null hypothesis. P-values are probabilities, so they are always between 0 and 1.

mode

The ___ of a variable is the most frequent observation of the variable that occurs in the data set.

standard deviation

The ___ of a variable is the square root of the mean of the squared deviations about the population mean.

1

The area under the normal density curve is ___

Two successive lower class limits

The class width is the difference between A) The upper class limit and the lower class limit of a class B) The largest frequency and the smallest frequency C) Two successive lower class limits D) The high and the low data values

sampling with replacement and without replacement

Step 1: The clients must be listed (the frame) and numbered from 01 to 30. Step 2: Five unique numbers will be randomly selected. The clients corresponding to the numbers are sent a survey. This process is called sampling without replacement. In a sample without replacement, an individual who is selected is removed from the population and cannot be chosen again. In a sample with replacement, a selected individual is placed back into the population and could be chosen a second time. We use sampling without replacement so that we don't select the same client twice.
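
The two sampling schemes described above can be sketched with Python's standard `random` module. The seed value is arbitrary and only there so the run is repeatable; the client numbering follows the card:

```python
import random

rng = random.Random(0)        # fixed seed, an assumption made only for repeatability
clients = list(range(1, 31))  # the frame: clients numbered 01 to 30

# Sampling without replacement: a selected client cannot be chosen again.
without_replacement = rng.sample(clients, k=5)

# Sampling with replacement: a selected client goes back and may repeat.
with_replacement = rng.choices(clients, k=5)

print(without_replacement)
print(with_replacement)
```

`random.sample` always returns five distinct clients, while `random.choices` may return the same client more than once.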

Explain how the complement can be used to find the probability of getting at least one item of a particular type.

The complement of "at least one" is "none." So, the probability of getting at least one item is equal to 1 minus P(none of the items).

population

The entire group of individuals to be studied is called the ___.

Decide whether the events shown in the accompanying Venn diagram are mutually exclusive. Explain your reasoning.

The events are mutually​ exclusive, since there are no movies that are rated PG and are rated G.

Decide whether the events shown in the accompanying Venn diagram are mutually exclusive. Explain your reasoning. The events _______ mutually exclusive, since there _____________________ and _______________ presidential election

The events are not mutually exclusive, since there is at least 1 presidential candidate who lost the last pre-election poll and lost the election

Hawthorne Effect

The extent to which subjects change their behaviour simply because they know that that behaviour is being studied. Habituation?

Are all outliers influential?

The least-squares line is most likely to be heavily influenced by observations that are outliers in x. Influential points often have small residuals because they pull the regression line toward themselves. The scatterplot alerts you to these (don't rely on the residual plot alone, because it may miss influential points)

(xbar, y bar)

The least-squares regression line for any data set passes through the point (x bar, y bar)

What requirements are necessary for a normal probability distribution to be a standard normal probability​ distribution?

The mean and standard deviation have the values μ = 0 and σ = 1.

comparing representative values

The median is better than the mean or mode as a representative value

Determine whether the number describes a population parameter or a sample statistic. Explain your reasoning. ​Sixty-three of the 97 passengers aboard an airship survived an explosion.

The number is a population parameter because it is a numerical description of all of the passengers that survived.

The access code for a gym locker consists of three digits. Each digit can be any number from 0 through 7​, and each digit can be repeated. Complete parts​ (a) through​ (c). ​(a) Find the number of possible access codes. ​(b) What is the probability of randomly selecting the correct access code on the first​ try? ​(c) What is the probability of not selecting the correct access code on the first​ try?

The number of different codes available is: 512 The probability of randomly selecting the correct access code is: .002 not selecting: .998
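
A minimal sketch that checks these answers by listing every possible code. The variable names are illustrative only:

```python
from itertools import product

digits = range(0, 8)                     # each digit can be 0 through 7
codes = list(product(digits, repeat=3))  # three digits, repetition allowed

n_codes = len(codes)         # 8 * 8 * 8 by the Fundamental Counting Principle
p_correct = 1 / n_codes      # one correct code, all equally likely
p_wrong = 1 - p_correct      # complement rule

print(n_codes, round(p_correct, 3), round(p_wrong, 3))
```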

Determine whether the following problem involves a permutation or a combination and explain your answer. How many different 7-letter passwords can be formed from the letters S, T, U, W, X, Y, and Z if no repetition of letters is allowed?

The problem involves a permutation because the order in which the letters are selected does matter.

Determine whether the random variable x is discrete or continuous. Explain. Let x represent the distance a baseball travels in the air after being hit.

The random variable is continuous​, because it has an uncountable number of possible outcomes.

Empirical Rule

The rule gives the approximate % of observations w/in 1 standard deviation (68%), 2 standard deviations (95%) and 3 standard deviations (99.7%) of the mean when the histogram is well approx. by a normal curve 68 - 95 - 99.7
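
The 68 - 95 - 99.7 figures can be checked against an exact normal curve with `statistics.NormalDist` from the standard library. The helper name `within` is just an illustrative label:

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

def within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return z.cdf(k) - z.cdf(-k)

for k in (1, 2, 3):
    print(k, round(within(k) * 100, 1))
```

The exact values are roughly 68.3%, 95.4%, and 99.7%, which the rule rounds for easy recall.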

If you are interested in how well students perform on a standardized math achievement test after they have completed a six-week math unit in either a computer-assisted class, a videotaped course, or a regular classroom, and you also want to include a factor for sex (boys vs. girls), what is the dependent variable?

The scores on the math achievement test

There is a close connection between correlation and the slope of the least-squares regression line

The slope equation says that along the regression line, a change of one standard deviation in x corresponds to a change of r standard deviations in y. When the variables are perfectly correlated (r = 1 or r = -1) the change in the predicted response is the same (in standard deviations units) as the change in x. Otherwise, because -1 ≤ r ≤ 1, the change in y hat is less than the change in x. As the correlation grows less strong, the prediction moves less in response to changes in x.

Explain the relationship between variance and standard deviation. Can either of these measures be​ negative? Explain.

The standard deviation is the positive square root of the variance. The standard deviation and variance can never be negative. Squared deviations can never be negative.

A​ student's IQ score is in the 91st percentile on an intelligence scale. Make an observation about the​ student's IQ score

The student has a higher IQ score than 91% of the students in the same age group

A​ student's score on an actuarial exam is in the 78th percentile. What can you conclude about the​ student's exam​ score?

The student scored higher than​ 78% of the students who took the actuarial exam.

Qualitative or quantitative? Species of fish in a lake?

The variable is qualitative because species are attributes or labels.

qualitative or quantitative. Distances between plants

The variable is quantitative because distances are numerical measurements.

Qualitative or quantitative? Favorite color

The variable is qualitative because color describes an attribute or characteristic.

What does the homogeneity of variance assumption state?

The variance in one population is equal to the variance in the other population

Determine whether the following events are mutually exclusive. Explain your reasoning. Event​ A: Randomly select a voter who legally voted for the President in South Carolina. Event​ B: Randomly select a voter who legally voted for the President in California.

These events are mutually exclusive, since it is not possible for a voter to both legally vote for the President in South Carolina and legally vote for the President in California.

T/F A data set can have the same​ mean, median, and mode.

True

Equation for the least-squares regression line

We have data on the explanatory variable x and a response variable y for n individuals. From the data we calculate the means xbar and ybar and the standard deviations sx and sy of the two variables and their correlation r. The least-squares regression line is the line yhat = a + bx with slope b = r(sy/sx) and y-intercept a = ybar - b(xbar)
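
A short sketch of these formulas on made-up data. The function `least_squares_line` and the five data points are illustrative assumptions, not part of the card:

```python
from statistics import mean, stdev

def least_squares_line(xs, ys):
    """Return (a, b, r) for yhat = a + b*x using b = r*(sy/sx), a = ybar - b*xbar."""
    n = len(xs)
    x_bar, y_bar = mean(xs), mean(ys)
    s_x, s_y = stdev(xs), stdev(ys)  # sample standard deviations
    r = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    b = r * s_y / s_x
    a = y_bar - b * x_bar
    return a, b, r

# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b, r = least_squares_line(xs, ys)
print(round(a, 2), round(b, 2), round(r, 3))

# The line passes through (xbar, ybar).
print(abs((a + b * mean(xs)) - mean(ys)) < 1e-9)
```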

S (standard deviation of the residuals)

We know the average prediction error (mean of residuals) is 0 when using a least-squares regression line, since positive and negative residuals cancel. That's why we use the standard deviation to find the approximate size of a "typical" or "average" prediction error (residual). If we use a least-squares line to predict the values of a response variable y from an explanatory variable x, the standard deviation of the residuals (s) is given by s = square root of [(sum of squared residuals)/(n-2)] = square root of [(sum of (y - yhat)²)/(n-2)]
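
A sketch of this calculation on made-up data (the five points and variable names are assumptions for illustration). It fits the least-squares line, confirms that the residuals sum to zero, and then computes s with n - 2 in the denominator:

```python
import math

# Made-up data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept.
b = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum((x - x_bar) ** 2 for x in xs)
a = y_bar - b * x_bar

# Residual = observed y minus predicted y.
residuals = [y - (a + b * x) for x, y in zip(xs, ys)]

# Standard deviation of the residuals: square the residuals before summing.
s = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2))

print(round(sum(residuals), 10), round(s, 4))
```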

binomial experiment

a probability experiment that satisfies these conditions. 1. The experiment has a fixed number of trials, where each trial is independent of the other trials. 2. There are only two possible outcomes of interest for each trial. Each outcome is classified as a success or a failure. 3. The probability of a success is the same for each trial. 4. The random variable x counts the number of successful trials.
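
When these four conditions hold, the probability of exactly x successes in n trials follows the standard binomial formula nCx · p^x · (1-p)^(n-x). A minimal sketch, with an illustrative function name and example values:

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(exactly x successes in n independent trials with success probability p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

# Example values chosen for illustration: 10 fair coin flips, exactly 3 heads.
p_three_heads = binomial_pmf(3, 10, 0.5)
print(round(p_three_heads, 4))

# The probabilities over all possible counts add up to 1.
total = sum(binomial_pmf(x, 10, 0.5) for x in range(11))
print(round(total, 10))
```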

Given a data​ set, how do you know whether to calculate sigma or​ s?

When given a data set, one would have to determine if it represented the population or if it was a sample taken from the population. If the data are a population, then sigma is calculated. If the data are a sample, then s is calculated.

Could you have a large effect and still fail to reject the null?

Yes. The size of the effect (d or r2) measures the number of SDs of change, not how statistically likely the result is. With small samples you could find no significant difference and still have a large effect, since measures of effect size are largely unaffected by sample size (d not at all and r2 barely)

Are the following statements Ho:λ=9 and H1:λ<9 a valid pair of null and alternative hypothesis

Yes, the null hypothesis specifies an equality and alternative specifies a difference

undercoverage

___ occurs when the proportion of one segment of the population is lower in a sample than it is in the population.

subjective

___ probabilities are based on personal judgement.

qualitative

___ variables classify individuals in a sample according to traits or attributes that are not numeric.

A standard deck of cards contains 52 cards. One card is selected from the deck. ​(a) Compute the probability of randomly selecting a jack or four. ​(b) Compute the probability of randomly selecting a jack or four or five. ​(c) Compute the probability of randomly selecting a nine or diamond.

a. .154 b. .231 c. .308
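
These answers can be checked by enumerating the deck, as in the sketch below. The helper name `prob` and the rank and suit labels are assumptions for illustration.

```python
from fractions import Fraction

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]  # 52 cards

def prob(event):
    """Probability that one randomly selected card satisfies event(card)."""
    favorable = sum(1 for card in deck if event(card))
    return Fraction(favorable, len(deck))

p_a = prob(lambda c: c[0] in {"J", "4"})                  # jack or four
p_b = prob(lambda c: c[0] in {"J", "4", "5"})             # jack, four, or five
p_c = prob(lambda c: c[0] == "9" or c[1] == "diamonds")   # nine or diamond

print(round(float(p_a), 3), round(float(p_b), 3), round(float(p_c), 3))
# prints: 0.154 0.231 0.308
```

In part (c) the nine of diamonds satisfies both conditions but is counted once, which is why the answer is 16/52 rather than 17/52.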

how to make a relative frequency table

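Divide the values into classes of equal width (0 to <5, 5 to <10, and so on), count the frequency in each class, and divide each frequency by the total number of values. The table can then be displayed as a relative frequency histogram.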
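
One way to build such a table, sketched in Python with made-up data: count how many values fall in each class, then divide each count by the total number of values. The data and the class width of 5 are assumptions.

```python
from collections import Counter

values = [1, 3, 4, 6, 7, 7, 8, 9, 2, 5]  # hypothetical data
width = 5  # class width, giving classes 0 to <5 and 5 to <10

# Map each value to the lower limit of its class, then count per class.
counts = Counter((v // width) * width for v in values)
n = len(values)

# Relative frequency = class frequency / total number of values.
table = {f"{lo} to <{lo + width}": counts[lo] / n for lo in sorted(counts)}

for label, rel_freq in table.items():
    print(label, rel_freq)
```

The relative frequencies in a complete table should sum to 1.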

stratified random sampling

Dividing the population into groups based on some characteristic, then randomly selecting SOME SUBJECTS from ALL GROUPS (some people, all groups). Example: divide the population into different ethnic categories and randomly select subjects from each group.

When reporting the correlation using SPSS, be careful about repetition

Do not state the correlation twice (there is a correlation between X and Y, and between Y and X). State the correlation only once.

Why is pooled variance needed?

When sample sizes are different, the two sample variances are not equally good and should not be treated equally. The law of large numbers indicates that the variance obtained from a larger sample is a more accurate estimate of the population variance than a variance that is obtained from a small sample.
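
The card does not state the formula. The sketch below uses the standard pooled variance, which weights each sample variance by its degrees of freedom: ((n1 − 1)s1² + (n2 − 1)s2²) / (n1 + n2 − 2). The two samples are made up for illustration.

```python
import statistics

# Hypothetical samples of unequal size.
sample1 = [10, 12, 9, 11]
sample2 = [8, 14, 10, 12, 9, 13, 11, 15]

n1, n2 = len(sample1), len(sample2)
var1 = statistics.variance(sample1)  # sample variance, divides by n1 - 1
var2 = statistics.variance(sample2)  # sample variance, divides by n2 - 1

# Pooled variance: each sample variance is weighted by its degrees of freedom.
pooled = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)

print(var1, var2, pooled)
```

Because the second sample is larger, the pooled value lands closer to its variance than to the first sample's.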

mutually exclusive events

Two or more events are mutually exclusive if no two of them have outcomes in common.

Fundamental Counting Principle

If one event can occur in m ways and a second event can occur in n ways, then the number of ways the two events can occur in sequence is m × n. This rule can be extended to any number of events occurring in sequence.
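
A small sketch of the principle, using a hypothetical pairing of three sports with three skill levels (m = 3, n = 3). `itertools.product` lists every sequence, and the count matches m × n.

```python
from itertools import product

sports = ["B", "S", "F"]  # hypothetical first event: baseball, soccer, football
skills = ["L", "M", "H"]  # hypothetical second event: low, medium, high

# Every way the two events can occur in sequence.
sample_space = [sport + skill for sport, skill in product(sports, skills)]

print(sample_space)
print(len(sample_space), len(sports) * len(skills))  # prints: 9 9
```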

How to use scatter plot to determine the sign of the relationship

If the points run from bottom left to upper right, the relationship is positive; if the points run from upper left to bottom right, it is negative.

law of large numbers

If we observe more and more repetitions of any chance process, the proportion of times that a specific outcome occurs approaches a single value. This value is called the probability.
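
A quick simulation sketch of this idea: simulated fair coin flips, with the running proportion of heads printed at a few checkpoints. The seed value and checkpoints are arbitrary assumptions.

```python
import random

random.seed(1)  # arbitrary seed so the run is repeatable

total_flips = 100_000
checkpoints = (10, 100, 1_000, 10_000, 100_000)

heads = 0
for flips in range(1, total_flips + 1):
    heads += random.random() < 0.5  # one simulated fair coin flip
    if flips in checkpoints:
        print(flips, heads / flips)

proportion = heads / total_flips
```

The early proportions can wander noticeably, while the later ones settle near 0.5.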

The logic behind computing a confidence interval is to compute the highest and lowest values of a _____ mean that are not significantly different from the _____.

population; the current sample mean

inferential statistics

procedures used to draw conclusions about larger populations from small samples of data

in stratified random sampling, the population is first divided into subpopulations called what?

strata

descriptive statistics

summarize and describe data

other term for r2

the percentage of variance accounted for by the treatment

How does the estimated standard error of the difference between means (SM1-M2) measure the average size of the difference M1-M2 if the null hypothesis is true?

If the null hypothesis is true, the population mean difference is zero. Therefore, the standard error of the difference between means measures how far the sample mean difference (M1-M2) typically falls from zero, and thus how large a difference to expect by chance.

relative frequency of a class

The portion or percentage of the data that falls in that class. Divide the class frequency f by the sample size n.

conditional probability

The probability that event B happens given that A has happened: P(B | A)

conditional probability

The probability that event B occurs if event A has already occurred

What happens to predicting as the dots of a scatter plot get farther and farther away from a straight line (correlation coefficient getting closer and closer to zero)

your ability to predict drops. Based on one variable, you cannot tell exactly where you would be on the second variable.

standard deviation of the difference of random variables

σd = √(σ²x + σ²y), for independent random variables X and Y

standard deviation of the sum of random variables

σt = √(σ²x + σ²y), for independent random variables X and Y
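
A brief numeric sketch, assuming X and Y are independent and using made-up standard deviations of 3 and 4. The point is that variances add for both the sum and the difference, while standard deviations do not.

```python
import math

sigma_x, sigma_y = 3.0, 4.0  # hypothetical standard deviations of independent X and Y

# Variances add for both X + Y and X - Y.
sigma_sum = math.sqrt(sigma_x ** 2 + sigma_y ** 2)
sigma_diff = math.sqrt(sigma_x ** 2 + sigma_y ** 2)

print(sigma_sum, sigma_diff)  # both are 5.0, not 7.0 or 1.0
```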

Determine which numbers could not be used to represent the probability of an event.

-1.5, because probability values cannot be less than 0, and 64/25, because probability values cannot be greater than 1.

Determine whether the following statement is true or false. If it is false, rewrite it as a true statement. If two events are independent, P(A|B) equals P(B).

False; if events A and B are independent, then P(A and B) equals P(A) times P(B). Equivalently, P(A|B) equals P(A).

Identify the sample space of the probability experiment and determine the number of outcomes in the sample space. Draw a tree diagram. Determining an athlete's sport (baseball (B), soccer (S), football (F)) and skill (low (L), medium (M), high (H))

{BL, BM, BH, SL, SM, SH, FL, FM, FH}; 9 outcomes. Tree diagram: three first-level branches B, S, F, each splitting into L, M, H.

P-value

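The probability, computed assuming the null hypothesis is true, of obtaining results at least as extreme as those observed. It forms the basis for deciding whether results are statistically significant (unlikely to be due to chance alone).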

Range of probabilities rule

The probability of an event E is between 0 and 1, inclusive: 0 ≤ P(E) ≤ 1

What does the notation​ P(B|A) mean?

The probability of event B​ occurring, given that event A has occurred

When an event is almost certain to​ happen, its complement will be an unusual event

True

systematic

A ___ sample is obtained by selecting every kth individual from the population.

The midpoint of a class is the sum of its lower and upper limits divided by two.

True. midpoint = ((lower class limit) + (upper class limit)) / 2

A two-way ANOVA contains

a main effect for each factor and an interaction

matching

Putting together subjects with similar traits as a control for a lurking variable, even though that variable is not part of the study.

Individual

a person or object that is a member of the population being studied

Frequency symmetry

Symmetrical, positively skewed, negatively skewed

probability distribution

a listing of the possible values and corresponding probabilities of a discrete random variable

Statistic

is a numerical measurement describing some characteristic of a sample

The goals scored per game by a soccer team represent the first quartile for all teams in a league. What can you conclude about the​ team's goals scored per​ game?

The team scored fewer goals per game than 75% of the teams in the league.

CHAPTER 14 - Confidence Intervals for Means

...

CHAPTER 15: HYPOTHESIS TESTING

...

CHAPTER 16: MORE ABOUT TESTS AND INTERVALS

...

What is a small effect for r2

.01

Linear relationship requirement for correlation

When you look at the plot of the data, the points need to fall roughly along a straight line, not a curve. There are other correlations that look at other shapes of relationship (like curved), but the Pearson r requires a linear one.

cluster

Identify the type of sampling used. (cluster, stratified, simple random, convenience, systematic) At a local technical school, five auto repair classes are randomly selected and all of the students from each class are interviewed. (all from each cluster (class) are interviewed)

Process of Statistics

1. Identify the research objective. A researcher must determine the question(s) he or she wants to be answered. The question(s) must clearly identify the population that is to be studied. 2. Collect the data needed to answer the question(s) posed in (1). Conducting research on an entire population is often difficult and expensive, so we typically look at a sample. This step is vital to the statistical process, because if the data are not collected correctly, the conclusions drawn are meaningless. Do not overlook the importance of appropriate data collection. We discuss this step in detail in Sections 1.2 through 1.6. 3. Describe the data. Descriptive statistics allow the researcher to obtain an overview of the data and can help determine the type of statistical methods the researcher should use. We discuss this step in detail in Chapters 2 through 4. 4. Perform inference. Apply the appropriate techniques to extend the results obtained from the sample to the population and report a level of reliability of the results. We discuss techniques for measuring reliability in Chapters 5 through 8 and inferential techniques in Chapters 9 through 15

lurking variable

A lurking variable is an explanatory variable that was not considered in a study, but that affects the value of the response variable in the study. In addition, lurking variables are typically related to explanatory variables considered in the study

Sample

A relatively small proportion of people who are chosen in a survey so as to be representative of the whole.

stratified sample

A stratified sample is obtained by separating the population into nonoverlapping groups called strata and then obtaining a simple random sample from each stratum. The individuals within each stratum should be homogeneous (or similar) in some way
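
A minimal sketch of this procedure in Python. The population of 100 numbered individuals, the two age-group strata, and the per-stratum sample size of 5 are all assumptions for illustration.

```python
import random

random.seed(0)  # arbitrary seed for repeatability

# Hypothetical population: (person_id, stratum) pairs with two nonoverlapping strata.
population = [(i, "under 65") for i in range(1, 61)]
population += [(i, "65 and over") for i in range(61, 101)]

# Separate the population into strata.
strata = {}
for person_id, stratum in population:
    strata.setdefault(stratum, []).append(person_id)

# Obtain a simple random sample of 5 individuals from each stratum.
sample = {name: random.sample(members, 5) for name, members in strata.items()}

print(sample)
```

Every stratum contributes to the sample, which is what distinguishes this design from cluster sampling.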

ratio measurement

A variable is at the ratio level of measurement if it has the properties of the interval level of measurement and the ratios of the values of the variable have meaning. A value of zero means the absence of the quantity. Arithmetic operations such as multiplication and division can be performed on the values of the variable.

interval level of measurement

A variable is at the interval level of measurement if it has the properties of the ordinal level of measurement and the differences in the values of the variable have meaning.

systematic

Identify the type of sampling used. (cluster, stratified, simple random, convenience, systematic) Every fifth adult entering an airport is checked for extra security screening. (every 5th)

observational study

An observational study measures the value of the response variable without attempting to influence the value of either the response or explanatory variables. That is, in an observational study, the researcher observes the behavior of the individuals without trying to influence the outcome of the study. Observational studies do not allow a researcher to claim causation, only association.

Open questions

An open question allows the respondent to choose his or her response, for example: "What is the most important problem facing America's youth today?" A closed question requires the respondent to choose from a list of predetermined responses, for example: "What is the most important problem facing America's youth today? (a) Drugs (b) Violence (c) Single-parent homes (d) Promiscuity (e) Peer pressure" In closed questions, the possible responses should be rearranged because respondents are likely to choose early choices in a list rather than later choices. An open question should be phrased so that the responses are similar. (You don't want a wide variety of responses.) This allows for easy analysis of the responses.

The temperatures (in Celsius) of air samples taken simultaneously over a large city are shown below 16.3 19.7 18.9 18.7 20.9 17.5 18.3 19.8 18.4 Determine whether the data are qualitative or quantitative and identify the data​ set's level of measurement.

Are the data qualitative or​ quantitative? Quantitative What is the data​ set's level of​ measurement? Interval

case-control study

Case-control studies are retrospective, meaning that they require individuals to look back in time or require the researcher to look at existing records. In case-control studies, individuals who have a certain characteristic may be matched with those who do not. For example, we might match individuals who smoke with those who do not. When we say "match" individuals, we mean that we would like the individuals in the study to be as similar (homogeneous) as possible in terms of demographics and other variables that may affect the response variable. Once homogeneous groups are established, we would ask the individuals in each group how much they smoked over the past 25 years. The rate of lung cancer between the two groups would then be compared. A disadvantage to this type of study is that it requires individuals to recall information from the past. It also requires the individuals to be truthful in their responses. An advantage of case-control studies is that they can be done relatively quickly and inexpensively.

Closed questions

Closed questions limit the number of respondent choices and, therefore, the results are much easier to analyze. The limited choices, however, do not always include a respondent's desired choice. In that case, the respondent will have to choose a secondary answer or skip the question. Survey designers recommend conducting pretest surveys with open questions and then using the most popular answers as the choices on closed-question surveys. Another issue to consider in the closed-question design is the number of possible responses. The option "no opinion" should be omitted, because this option does not allow for meaningful analysis. The goal is to limit the number of choices in a closed question without forcing respondents to choose an option they do not prefer, which would make the survey have response bias.

cohort

A cohort study first identifies a group of individuals to participate in the study (the cohort). The cohort is then observed over a long period of time. During this period, characteristics about the individuals are recorded and some individuals will be exposed to certain factors (not intentionally) and others will not. At the end of the study the value of the response variable is recorded for the individuals. Typically, cohort studies require many individuals to participate over long periods of time. Because the data are collected over time, cohort studies are prospective. Another problem with cohort studies is that individuals tend to drop out due to the long time frame. This could lead to misleading results. That said, cohort studies are the most powerful of the observational studies. One of the largest cohort studies is the Framingham Heart Study. In this study, more than 10,000 individuals have been monitored since 1948. The study continues to this day, with the grandchildren of the original participants taking part in the study. This cohort study is responsible for many of the breakthroughs in understanding heart disease. Its cost is in excess of $10 million.

Confounding

Confounding in a study occurs when the effects of two or more explanatory variables are not separated. Therefore, any relation that may exist between an explanatory variable and the response variable may be due to some other variable or variables not accounted for in the study

cross-sectional study

Cross-sectional studies are observational studies that collect information about individuals at a specific point in time or over a very short period of time. For example, a researcher might want to assess the risk associated with smoking by looking at a group of people, determining how many are smokers, and comparing the rate of lung cancer of the smokers to the nonsmokers. An advantage of cross-sectional studies is that they are cheap and quick to do. However, they have limitations. For our lung cancer study, individuals might develop cancer after the data are collected, so our study will not give the full picture.

Nominal

Determine the level of measurement of the variable. choose nominal, ordinal, interval or ratio The musical instrument played by a music student. (instruments are named not numerical)

cohort study

Determine what type of observational study is described. Choose retrospective, cross-sectional or cohort. Researchers wanted to determine whether there was an association between city driving and stomach ulcers. They selected a sample of 900 young adults and followed them for a twenty-year period. At the start of the study none of the participants was suffering from a stomach ulcer. Each person kept track of the number of hours per week they spent driving in city traffic. At the end of the study each participant underwent tests to determine whether they were suffering from a stomach ulcer. The researchers analyzed the results to determine whether there was an association between city driving and stomach ulcers. (happened over a 20 year period.)

cross-sectional study

Determine what type of observational study is described. Choose retrospective, cross-sectional or cohort. Researchers wanted to determine whether there was an association between high blood pressure and the suppression of emotions. The researchers looked at 1800 adults enrolled in a Health Initiative Observational Study. Each person was interviewed and asked about their response to emotions. In particular they were asked whether their tendency was to express or to hold in anger and other emotions. The degree of suppression of emotions was rated on a scale of 1 to 10. Each person's blood pressure was also measured. The researchers analyzed the results to determine whether there was an association between high blood pressure and the suppression of emotions. (happened at a specific point in time.)

designed experiment

If a researcher assigns the individuals in a study to a certain group, intentionally changes the value of an explanatory variable, and then records the value of the response variable for each group, the study is a designed experiment.

Retrospective (case-control) study

Determine what type of observational study is described. Choose retrospective, cross-sectional or cohort. Vitamin D is important for the metabolism of calcium and exposure to sunshine is an important source of vitamin D. A researcher wanted to determine whether osteoporosis was associated with a lack of exposure to sunshine. He selected a sample of 250 women with osteoporosis and an equal number of women without osteoporosis. The two groups were matched - in other words they were similar in terms of age, diet, occupation, and exercise levels. Histories on exposure to sunshine over the previous twenty years were obtained for all women. The total number of hours that each woman had been exposed to sunshine in the previous twenty years was estimated. The amount of exposure to sunshine was compared for the two groups. (subjects were asked to look back over the last 20 years and estimate.)

Determine whether the statement is true or false. If it is​ false, rewrite it as a true statement. A statistic is a measure that describes a population characteristic.

False. A statistic is a measure that describes a sample characteristic.

Determine whether the statement is true or false. If it is​ false, rewrite it as a true statement. More types of calculations can be performed with data at the nominal level than with data at the interval level.

False. More types of calculations can be performed with data at the interval level than with data at the nominal level.

random sampling example

For the results of a survey to be reliable, the characteristics of the individuals in the sample must be representative of the characteristics of the individuals in the population. The key to obtaining a sample representative of a population is to let chance or randomness play a role in dictating which individuals are in the sample, rather than convenience. If convenience is used to obtain a sample, the results of the survey are meaningless

A survey of 2003 third- to​ twelfth-grade students found that they devoted an average of 1 hour and 26 minutes per day to studying for exams. Identify the population and the sample.

Population: the time spent per day on studying for exams by all third- through twelfth-graders. Sample: the time spent per day on studying for exams by the 2003 third- through twelfth-graders sampled.

simple random

Identify the type of sampling used. (cluster, stratified, simple random, convenience, systematic) A lobbyist for the oil industry assigns a number to each senator and then uses a computer to randomly generate ten numbers. The lobbyist contacts the senators corresponding to these numbers. (used random number technique)

stratified

Identify the type of sampling used. (cluster, stratified, simple random, convenience, systematic) A market researcher randomly selects 200 homeowners under 65 years of age and 200 homeowners over 65 years of age. (used some from each stratum (age group))

systematic

Identify the type of sampling used. (cluster, stratified, simple random, convenience, systematic) A sample consists of every 30th worker from a group of 1000 workers. (every 30th)

convenience

Identify the type of sampling used. (cluster, stratified, simple random, convenience, systematic) A statistics student interviews everyone in his apartment building to determine who owns a cell phone.

Blinding

In an experiment, it is important that each group be treated the same way. It is also important that individuals do not adjust their behavior because of the treatment they are receiving. For this reason, many experiments use a technique called blinding. Blinding refers to nondisclosure of the treatment an experimental unit is receiving. There are two types of blinding: single blinding and double blinding

Observational studies are sometimes referred to as natural experiments. Explain what this means.

In an observational​ study, a researcher measures characteristics of interest of a part of a population but does not change existing conditions.

obtaining a cluster sample

Problem: A sociologist wants to gather data regarding household income within the city of Boston. Obtain a sample using cluster sampling. Approach: The city of Boston can be set up so that each city block is a cluster. Once the city blocks have been identified, obtain a simple random sample of the city blocks and survey all households on the blocks selected. Solution: Suppose there are 10,493 city blocks in Boston. First, the sociologist must number the blocks from 1 to 10,493. Suppose the sociologist has enough time and money to survey 20 clusters (city blocks). The sociologist should obtain a simple random sample of 20 numbers between 1 and 10,493 and survey all households from the clusters selected. Cluster sampling is a good choice in this example because it reduces the travel time to households that is likely to occur with both simple random sampling and stratified sampling. In addition, there is no need to obtain a frame of all the households with cluster sampling. The only frame needed is one that provides information regarding city blocks.
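
The selection step in this example can be sketched in Python as follows. The block count and number of clusters come from the example; the seed value and variable names are assumptions.

```python
import random

random.seed(2024)  # arbitrary seed for repeatability

num_blocks = 10_493   # city blocks supposed in the example
num_clusters = 20     # clusters the sociologist can afford to survey

# Simple random sample of 20 distinct block numbers between 1 and 10,493.
selected_blocks = sorted(random.sample(range(1, num_blocks + 1), num_clusters))

print(selected_blocks)
```

Every household on each selected block would then be surveyed, so only a frame of blocks is needed, not a frame of households.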

Determine whether the value is a parameter or a statistic. The average age of men who have walked on the moon was 39 years, 11 months, 15 days

Parameter

Determine whether the data set is a population or a sample. Explain your reasoning. The age of each resident in an apartment building.

Population, because it is a collection of ages for all people in the apartment building.

difference between quantitative and qualitative variables

Problem: Determine whether the following variables are qualitative or quantitative. (a) Gender (b) Temperature (c) Number of days during the past week that a college student studied (d) Zip code Approach: Quantitative variables are numerical measures such that meaningful arithmetic operations can be performed on the values of the variable. Qualitative variables describe an attribute or characteristic of the individual that allows researchers to categorize the individual. Solution: (a) Gender is a qualitative variable because it allows a researcher to categorize the individual as male or female. Notice that arithmetic operations cannot be performed on these attributes. (b) Temperature is a quantitative variable because it is numeric, and operations such as addition and subtraction provide meaningful results. For example, 70°F is 10°F warmer than 60°F. (c) Number of days during the past week that a college student studied is a quantitative variable because it is numeric, and operations such as addition and subtraction provide meaningful results. (d) Zip code is a qualitative variable because it categorizes a location. Notice that, even though zip codes are numeric, adding or subtracting zip codes does not provide meaningful results.

Difference between discrete and continuous variables

Problem: Determine whether the quantitative variables are discrete or continuous. (a) The number of heads obtained after flipping a coin five times. (b) The number of cars that arrive at a McDonald's drive-thru between 12:00 p.m. and 1:00 p.m. (c) The distance a 2014 Toyota Prius can travel in city driving conditions with a full tank of gas. Approach: A variable is discrete if its value results from counting. A variable is continuous if its value is measured. Solution: (a) The number of heads obtained by flipping a coin five times is a discrete variable because we can count the number of heads obtained. The possible values of this discrete variable are 0, 1, 2, 3, 4, 5. (b) The number of cars that arrive at a McDonald's drive-thru between 12:00 p.m. and 1:00 p.m. is a discrete variable because we find its value by counting the cars. The possible values of this discrete variable are 0, 1, 2, 3, 4, and so on. Notice that this number has no upper limit. (c) The distance traveled is a continuous variable because we measure the distance (miles, feet, inches, and so on).

seed

Problem: Find a simple random sample of five clients for the problem presented in Example 2. Approach: The approach is similar to that given in Example 2. Step 1: Obtain the frame and assign the clients numbers from 01 to 30. Step 2: Randomly select five numbers using a random number generator. To do this, we must first set the seed. The seed is an initial point for the generator to start creating random numbers, like selecting the initial point in the table of random numbers. The seed can be any nonzero number. Statistical software such as StatCrunch, Minitab, or Excel can be used to generate random numbers, but we will use a TI-84 Plus C graphing calculator.
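
As a rough stand-in for the calculator steps, the sketch below does the same thing with Python's standard `random` module. The seed value 34 is an arbitrary assumption, and Python's generator will not reproduce the numbers a TI-84 would give.

```python
import random

random.seed(34)  # hypothetical seed; any fixed value makes the run repeatable

clients = list(range(1, 31))          # clients numbered 01 to 30
sample = random.sample(clients, 5)    # five distinct client numbers
print([f"{c:02d}" for c in sample])

random.seed(34)                       # resetting the same seed...
repeat = random.sample(clients, 5)    # ...reproduces the same sample
print(repeat == sample)  # prints: True
```

Setting the seed is what makes a "random" selection repeatable, which is useful when a result needs to be checked by someone else.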

response bias

Response bias exists when the answers on a survey do not reflect the true feelings of the respondent. Response bias can occur in a number of ways.

Undercoverage

Sampling bias also results due to undercoverage, which occurs when the proportion of one segment of the population is lower in a sample than it is in the population. Undercoverage can result if the frame used to obtain the sample is incomplete or not representative of the population. Some frames, such as the list of all registered voters,

sampling bias

Sampling bias means that the technique used to obtain the sample's individuals tends to favor one part of the population over another. Any convenience sample has sampling bias because the individuals are not chosen through a random sample.

sampling error

Sampling error results from using a sample to estimate information about a population. This type of error occurs because a sample gives incomplete information about a population.

Steps in Systematic Sampling

1. If possible, approximate the population size, N. 2. Determine the sample size desired, n. 3. Compute N/n and round down to the nearest integer. This value is k. 4. Randomly select a number between 1 and k. Call this number p. 5. The sample will consist of the following individuals: p, p + k, p + 2k, ..., p + (n − 1)k
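
The steps can be sketched in Python as below. The function name `systematic_sample`, the seed, and the values N = 1000 and n = 30 are assumptions for illustration.

```python
import random

def systematic_sample(N, n, rng=random):
    """Return the positions p, p + k, ..., p + (n - 1)k for a systematic sample."""
    k = N // n              # step 3: N / n rounded down to the nearest integer
    p = rng.randint(1, k)   # step 4: random start between 1 and k, inclusive
    return [p + i * k for i in range(n)]  # step 5

random.seed(7)  # arbitrary seed for repeatability
sample = systematic_sample(N=1000, n=30)

print(sample)
```

Rounding k down guarantees that the last selected position, p + (n − 1)k, does not exceed N.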

characteristics of an experiment

Problem: Lipitor is a cholesterol-lowering drug made by Pfizer. In the Collaborative Atorvastatin Diabetes Study (CARDS), the effect of Lipitor on cardiovascular disease was assessed in 2838 subjects, ages 40 to 75, with type 2 diabetes, without prior history of cardiovascular disease. In this placebo-controlled, double-blind experiment, subjects were randomly allocated to either Lipitor 10 mg daily (1428) or placebo (1410) and were followed for 4 years. The response variable was the occurrence of any major cardiovascular event. Lipitor significantly reduced the rate of major cardiovascular events (83 events in the Lipitor group versus 127 events in the placebo group). There were 61 deaths in the Lipitor group versus 82 deaths in the placebo group. (a) What does it mean for the experiment to be placebo-controlled? (b) What does it mean for the experiment to be double-blind? (c) What is the population for which this study applies? What is the sample? (d) What are the treatments? (e) What is the response variable? Is it qualitative or quantitative? Approach: Apply the definitions just presented. Solution: (a) The placebo is a medication that looks, smells, and tastes like Lipitor. The placebo control group serves as a baseline against which to compare the results from the group receiving Lipitor. The placebo is also used because people tend to behave differently when they are in a study. By having a placebo control group, the effect of this is neutralized. (b) Since the experiment is double-blind, the subjects, as well as the individual monitoring the subjects, do not know whether the subjects are receiving Lipitor or the placebo. The experiment is double-blind so that the subjects receiving the medication do not behave differently from those receiving the placebo and so the individual monitoring the subjects does not treat those in the Lipitor group differently from those in the placebo group. (c) The population is individuals from 40 to 75 years of age with type 2 diabetes without a prior history of cardiovascular disease. The sample is the 2838 subjects in the study. (d) The treatments are 10 mg of Lipitor or a placebo daily. (e) The response variable is whether the subject had any major cardiovascular event, such as a stroke, or not. It is a qualitative variable.

voluntary response

The most popular of the many types of convenience samples are those in which the individuals in the sample are self-selected (the individuals themselves decide to participate in a survey). These are also called voluntary response samples. One example of self-selected sampling is phone-in polling; a radio personality will ask his or her listeners to phone the station to submit their opinions. Another example is the use of the Internet to conduct surveys. For example, a television news show will present a story regarding a certain topic and ask its viewers to "tell us what you think" by completing a questionnaire online or phoning in an opinion. Both of these samples are poor designs because the individuals who decide to be in the sample generally have strong opinions about the topic. A more typical individual in the population will not bother phoning or logging on to a computer to complete a survey. Any inference made regarding the population from this type of sample should be made with extreme caution. Convenience samples yield unreliable results because the individuals participating in the survey are not chosen using random sampling. Instead, the interviewer or

frame

The results of Example 1 leave one question unanswered: How do we select the individuals in a simple random sample? We could write the names of the individuals in the population on different sheets of paper and then select names from a hat. Often, however, the size of the population is so large that performing simple random sampling in this fashion is not practical. Instead, each individual in the population is assigned a unique number between 1 and N, where N is the size of the population. Then n distinct random numbers from this list are selected, where n represents the size of the sample. To number the individuals in the population, we need a frame: a list of all the individuals within the population.

Determine whether you would take a census or use a sampling to collect data for the study described below. If you would use a​ sampling, determine which sampling technique you would use. Explain. The average age of 105,000 online movie rental subscribers.

The study would use a sampling. The study would use simple random sampling because it would be easy for the company to randomly select a portion of its subscribers.

Determine whether the variable is qualitative or quantitative. Explain your reasoning. Favorite film

The variable is qualitative because a favorite film describes an attribute or characteristic.

Determine whether the variable is qualitative or quantitative. Explain your reasoning. Hair color

The variable is qualitative because color describes an attribute or characteristic.

design

To design an experiment means to describe the overall plan for conducting the experiment. Conducting an experiment requires a series of steps.

Variable

Variables are the characteristics of the individuals within the population. For example, recently, my son and I planted a tomato plant in our backyard. We collected information about the tomatoes harvested from the plant. The individuals we studied were the tomatoes. The variable that interested us was the weight of a tomato. My son noted that the tomatoes had different weights even though they came from the same plant. He discovered that variables such as weight may vary. If variables did not vary, they would be constants, and statistical inference would not be necessary. Think about it this way: If each tomato had the same weight, then knowing the weight of one tomato would allow us to determine the weights of all tomatoes. However, the weights of the tomatoes vary. One goal of research is to learn the causes of the variability.

Identify the sampling technique used, and discuss potential sources of bias (if any). Explain. After a tsunami, a disaster area is divided into 150 equal grids. Thirty of the grids are selected, and every occupied household in each selected grid is interviewed to help focus relief efforts on what residents require the most.

What type of sampling is used? Cluster sampling is used, since the disaster area is divided into grids, some of those grids are selected, and everyone in the selected grids is interviewed. What potential sources of bias are present, if any? Certain grids may have been much more severely damaged than others, so the grids that are selected may not be representative in terms of damage. Severely damaged grids may also have fewer occupied households.

cluster sample

A cluster sample is obtained by selecting all individuals within a randomly selected collection or group of individuals.
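A minimal Python sketch of cluster sampling, loosely modeled on the tsunami example (150 grids, 30 selected). The household counts are made up for illustration, and `cluster_sample` is a hypothetical helper name.

```python
import random

def cluster_sample(clusters, k, seed=None):
    """Randomly choose k clusters, then take every member of each one."""
    rng = random.Random(seed)
    chosen_ids = rng.sample(sorted(clusters), k)   # k distinct clusters
    # Every individual in a chosen cluster is included in the sample.
    members = [m for cid in chosen_ids for m in clusters[cid]]
    return chosen_ids, members

# Hypothetical data: 150 grids, each with 0 to 5 occupied households.
data_rng = random.Random(0)
grids = {
    g: [f"grid{g}-house{h}" for h in range(data_rng.randint(0, 5))]
    for g in range(1, 151)
}

chosen_grids, interviewed = cluster_sample(grids, k=30, seed=42)
print(len(chosen_grids), "grids selected,", len(interviewed), "households interviewed")
```

The randomness applies to the grids, not to the households, so the number of households interviewed depends on which grids happen to be chosen.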

Ratio

Determine the level of measurement of the variable (nominal, ordinal, interval, or ratio): height of a tree. (A tree can have zero height, and an 8 ft tree is twice as tall as a 4 ft tree.)

Discrete Variable

A quantitative variable that has either a finite number of possible values or a countable number of possible values.

Placebo

One method for defining the control group is through the use of a placebo. A placebo is an innocuous medication, such as a sugar tablet, that looks, tastes, and smells like the experimental medication.

Parameter

A numerical summary of a population.

A pharmaceutical company wants to test the effectiveness of a new allergy drug. The company identifies 250 females 30-35 years old who suffer from severe allergies. The subjects are randomly assigned into two groups. One group is given the new allergy drug and the other is given a placebo that looks exactly like the new allergy drug. After six months, the subjects' symptoms are studied and compared. Answer parts (a) through (c) below.

(a) Identify the experimental units and treatments used in this experiment. The experimental units are the 30- to 35-year-old females being given the treatment. The treatment is the new allergy drug. (b) Identify a potential problem with the experiment design being used and suggest a way to improve it. There may be a bias on the part of the researcher if the researcher knows which patients were given the real drug. (c) How could this experiment be designed to be double-blind? The study would be double-blind if neither the researcher nor the patient knew which patients received the real drug and which received the placebo.
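The random assignment step in this experiment could be sketched in Python as follows. The subject labels and the helper name `randomly_assign` are assumptions for illustration. The even 125/125 split is also an assumption, since the problem only says the subjects are randomly assigned into two groups.

```python
import random

def randomly_assign(subjects, seed=None):
    """Shuffle the subjects and split them into two groups of equal size."""
    rng = random.Random(seed)
    shuffled = list(subjects)      # copy so the original list is not changed
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical labels for the 250 subjects in the study.
subjects = [f"subject-{i}" for i in range(1, 251)]
drug_group, placebo_group = randomly_assign(subjects, seed=7)
print(len(drug_group), len(placebo_group))
```

In a double-blind version, the group lists would be held by someone other than the researchers who evaluate symptoms, so neither they nor the subjects know who received the drug.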

