test #2 - stats and methods - ch. 3 & 4

What is the real lower limit of the interval 60 - 69 in the table below? Please use 2 decimal places in your answer. 50 - 59 60 - 69 70 - 79 80 - 89

59.5

Suppose you wanted to construct a box plot of the following data. What is the first quartile? We will use this same data for the rest of the questions in this practice test. 50, 60, 75, 79, 80, 81, 82, 83, 84, 85, 87, 88, 92, 95, 100, 106, 120

80

The mode of the numbers 1 3 4 5 6 6 7 8 9 9 9 is

9

Sample

A portion of the population selected for a study

Real limits

An interval denoted as "60-64" actually has real limits of 59.5-64.5.

What is a major problem with the interquartile range?

It deletes so many observations that it eliminates much of the interesting variability in addition to the variability due to plain old extreme scores.

How would you describe the interquartile range with respect to trimmed samples?

It is a 25% trimmed sample

Sample size

Number of units in a sample

Relative Frequency and Cumulative Relative Frequency Distributions

RF: frequency divided by N. CRF: cumulative frequency divided by N; can be used to determine the fraction of scores at or below your score.
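
A minimal sketch of how RF and CRF could be tabulated. The function name and the example scores are made up for illustration; they do not come from the course material.

```python
from collections import Counter

def relative_frequencies(scores):
    """Return (score, f, RF, CRF) rows, sorted by score."""
    n = len(scores)
    counts = Counter(scores)
    rows = []
    cumulative = 0
    for score in sorted(counts):
        cumulative += counts[score]          # cumulative frequency so far
        rf = counts[score] / n               # RF = frequency / N
        crf = cumulative / n                 # CRF = cumulative frequency / N
        rows.append((score, counts[score], rf, crf))
    return rows

# Hypothetical scores, chosen only for illustration.
example_scores = [1, 2, 2, 3, 3, 3, 4, 4, 5, 5]
for score, f, rf, crf in relative_frequencies(example_scores):
    print(score, f, round(rf, 2), round(crf, 2))
```

In this example the CRF next to a score of 3 is the fraction of scores at or below 3.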

Median location

The location of the median in an ordered series

Mean, in plain arithmetic

The mean is calculated by summing all the scores in a data set, then dividing the sum by the total number of scores.

Independent

The one thing you change in the experiment; the variable you are observing and manipulating

Our ability to draw meaningful conclusions based on a sample statistic depends, in part, on the _____ of our sample.

Variability

Bimodal

When a distribution has two modes.

when is the median most useful?

When we don't want extreme scores to influence the result.

variance

a measure of variability based on the squared deviations of the data values about the mean

experiment

a process that generates well defined outcomes

sample point

an element of the sample space. a sample point represents an experimental outcome

outlier

an unusually small or large data value

Mode cannot

be calculated algebraically

mutually exclusive events

events that have no sample points in common

Probable limits are used to _____.

give us limits within which we have a specified probability (e.g., .95) that a randomly chosen observation will fall.

Skewness

measure of the degree to which a distribution is asymmetrical

If a store manager wanted to stock the men's clothing department with shirts fitting the most men, which measure of central tendency of men's shirt sizes should be employed?

mode

To get an accurate idea about the shape of a distribution:

relatively large samples of data are needed.

s (sample statistics)

standard deviation

σ (population parameter)

standard deviation

We are most likely to randomly pick which score from an actual data set?

the mode

point estimator

the sample statistic when used to estimate the corresponding population parameter

Upper real limits

60 + 0.5 = 60.5/ 64 + 0.5 = 64.5

Mean, a visual representation

A visual point that perfectly balances two sides of a distribution.

An outlier is

An unusually extreme score

Histogram

Bar graph; bars do touch [real and apparent limits]

Name three types of definitions of probability.

Frequentistic, analytic, and subjective

Given the following data, what is the median location? .95, 1.06, 1.13, 1.40, 1.41, 1.56, 1.63, 1.73, 1.73, 2.03

ML = (n+1) / 2, which equals (10 + 1) / 2 = 5.5

Mean

Sum of the observations divided by the number of observations = average. The average result of a test, survey, or experiment.

Dependent

The change that happens because of the independent variable

Range

The difference between the largest and smallest data value in a data set

What do we mean by density?

The height of the curve representing the distribution of events measured on a continuous scale.

Mode (Mo)

The most commonly occurring score

Why do we use trimmed samples?

To eliminate the influence of extreme scores.

Unimodal

When a distribution of scores has only one mode.

leaf

horizontal axis of display containing the trailing digits

"μ" (mu) is the

population mean

Xbar is the

sample mean

Given the following data, what is the quartile location? 50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100

QL = (ML + 1) / 2 = (8+1) / 2 = 9/2 = 4.5 but you drop the fraction so it is 4

Given the following data, what is the quartile location? .95, 1.06, 1.13, 1.40, 1.41, 1.56, 1.63, 1.73, 1.73, 2.03

QL = (ML + 1) / 2, which equals (5 + 1) / 2 = 6/2 = 3. Note that we dropped the fraction in the ML, from 5.5 to 5.

A major characteristic of a good graphic is _____.

Simplicity

Calculate the Mean

Formula: M = ΣX/N. Step 1. Add up all the scores in a sample. Step 2. Divide the total of all the scores by the total number of scores.
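
The two steps can be sketched in a few lines. The scores 2, 4, 6, 7, 8 are borrowed from another card in this set; the function name is only illustrative.

```python
def mean(scores):
    """M = ΣX / N."""
    total = sum(scores)            # Step 1: add up all the scores
    return total / len(scores)     # Step 2: divide by the number of scores

print(mean([2, 4, 6, 7, 8]))
```

This should reproduce the 5.4 given as the answer for that distribution.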

Given the following data, what is the end of the whisker on the left? .95, 1.06, 1.13, 1.40, 1.41, 1.56, 1.63, 1.73, 1.73, 2.03

From the 1st quartile, go .90 to the left: 1.13 - .90 = 0.23. There is no 0.23 in the data, so move to the right until you reach the first data value, 0.95.

Given the following data, what is the interquartile range? .95, 1.06, 1.13, 1.40, 1.41, 1.56, 1.63, 1.73, 1.73, 2.03

IQR = 3rd quartile minus 1st quartile = 1.73 - 1.13 = .60

Bar Graph

Height of bar indicates frequency of occurrence; bars don't touch

What is the difference between an interval and a ratio scale?

In the latter we can meaningfully speak of the ratio between numbers (e.g., "twice as big"). But we will generally apply the same statistical procedures to both kinds of data.

The primary purpose of plotting data is to make them _____.

Interpretable

An advantage of the mean is

It can be manipulated algebraically

List three important things about a stem-and-leaf display.

It can be used to present both the shape of a distribution and the actual values of the scores; it can be used back-to-back to compare two related distributions; it can be adjusted to handle different sized values for the dependent variable

What is the 32nd percentile?

It is the point below which 32% of a distribution falls.

What is a risk ratio, also known as "relative risk"?

It is the ratio of one risk over another. Put differently, it is the probability of depression for females divided by the probability of depression for males.

What is special about a standard normal distribution?

Its mean is 0 and its standard deviation is 1.0.

Which measurement of Central Tendency to Use

Nominal data: mode. Ordinal data: median response. Interval or ratio data: mean for a symmetrical distribution (no outliers); median for a skewed distribution (outliers).

Name the four common scales of measurement.

Nominal, ordinal, interval, ratio

Scale of Measurement of Scores

Nominal: mode. Ordinal: mode, median. Interval: mode, median, mean. Ratio: mode, median, mean.

Scales of measurement

Nominal; Ordinal; Interval; Ratio

SIQR

Not affected by outliers. Interquartile range: Q3 - Q1, where Q1 is the median of the lower half of the distribution and Q3 is the median of the upper half. Semi-interquartile range: the interquartile range divided by 2, (Q3 - Q1)/2. Q1 = 25% (upper real limit); Q2 = median; Q3 = 75% (upper real limit).

Interval

Numbers have orders, but there are also equal intervals between adjacent values [temperature]

Measures of Variability

Numbers that indicate how much scores differ from each other and the measure of central tendency in a set of scores. -Range, Variance, Standard Deviation

Measures of Central Tendency

Numbers that represent the average or typical score obtained from measurements of a sample. -Indicate typical score obtained -Mean, Median, Mode

Measures of central tendency

Numerical values that refer to the center of the distribution

What is the general rule about what to do with parentheses in an equation?

Perform the operation within the parentheses before you perform the operation outside of the parentheses.

MAD

The mean absolute deviation is the mean (average) of the absolute value of the difference between the individual values in the dataset and the mean. The method tries to measure the average distances between the values in the dataset and the mean.
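
A short sketch of the definition above, assuming deviations are taken from the mean (as the card states). The function name is illustrative, and the scores are borrowed from another card in this set.

```python
def mean_absolute_deviation(values):
    """Average absolute distance between each value and the mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

# Scores 2, 4, 6, 7, 8 have a mean of 5.4.
print(round(mean_absolute_deviation([2, 4, 6, 7, 8]), 2))
```

For these scores the absolute deviations are 3.4, 1.4, 0.6, 1.6, and 2.6, which sum to 9.6, so the MAD is 9.6 / 5 = 1.92.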

Give two advantages of the mean relative to the other measures.

The mean gives a more stable estimate of the central tendency of a population over repeated sampling. The mean can be used algebraically.

Mean, the arithmetic average

The mean is simple to calculate and a gateway to understanding statistical formulas. It is an important concept in statistics with four ways to think about it: verbally, arithmetically, visually, and symbolically (using statistical notation).

Properties of z-scores

The mean of a complete set of z-scores is 0. The standard deviation is 1. Converting a set of raw scores into z-scores will not change the shape of the distribution. You can use z-scores to find the location of one group with respect to all other groups of the same size.

What is the 20% trimmed mean of the following distribution? Round your answer to the nearest 2 decimal places. X 10 13 14 17 19 22 24 24 26 27 27 27 28 29 30 37 47 77 88 100

The mean of the 20 numbers is 34.3 and the 20% trimmed mean is 26.67.
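
A sketch of the trimming arithmetic, assuming "20% trimmed" means dropping 20% of the scores from each end (4 of the 20 scores per end here). The function name is illustrative.

```python
def trimmed_mean(scores, proportion):
    """Mean after dropping `proportion` of the scores from EACH end."""
    xs = sorted(scores)
    # Number of scores to drop per end; the small epsilon guards against
    # floating-point products such as 3.9999999 being truncated to 3.
    k = int(len(xs) * proportion + 1e-9)
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

X = [10, 13, 14, 17, 19, 22, 24, 24, 26, 27,
     27, 27, 28, 29, 30, 37, 47, 77, 88, 100]

print(round(sum(X) / len(X), 2))        # ordinary mean
print(round(trimmed_mean(X, 0.20), 2))  # 20% trimmed mean
```

The 12 retained scores (19 through 37) sum to 320, so the trimmed mean is 320 / 12 = 26.67, well below the ordinary mean of 34.3 that the extreme scores pull upward.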

What are mutually exclusive events?

The occurrence of one event precludes the occurrence of the other. You have one outcome or the other, but not both.

What is the difference between odds and risks?

The odds are the number of occurrences of event divided by the number of occurrences of the other event. Risk is the number of occurrences of one event divided by the total number of occurrences of any event.
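
The distinction can be sketched with counts. All numbers below are hypothetical and chosen only to make the arithmetic easy; the function names are illustrative.

```python
def risk(events, non_events):
    """Occurrences of the event divided by ALL occurrences."""
    return events / (events + non_events)

def odds(events, non_events):
    """Occurrences of the event divided by occurrences of the other event."""
    return events / non_events

def risk_ratio(risk_a, risk_b):
    """Relative risk: one risk divided by another."""
    return risk_a / risk_b

# Hypothetical groups: 20 of 100 people in group A and 10 of 100 in group B
# experience the event.
risk_a = risk(20, 80)
risk_b = risk(10, 90)
print(risk_a, odds(20, 80), round(risk_ratio(risk_a, risk_b), 2))
```

With these made-up counts the risk in group A is 20/100 = 0.2, the odds are 20/80 = 0.25, and the risk ratio of A to B is 0.2 / 0.1 = 2.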

Mode

The value or values that occur most frequently in a data set

Why is the standard deviation a better measure than the variance when we are trying to describe data?

The variance is a measure presented in terms of squared units, whereas the standard deviation is presented in terms of the units of measurement themselves.

How data is measured in Central Tendency

The way data clusters around central tendency is measured in three different ways: mean, median and mode

tree diagram

a graphical representation that helps in visualizing a multiple step experiment.

box plot

a graphical summary of data based on a five number summary

A figure that plots various values of the dependent variable on the X axis and the frequencies on the Y axis is called____

a histogram or frequency distribution to some

multiplication law

a probability law used to compute the probability of the intersection of two events

addition law

a probability law used to compute the probability of the union of two events

empirical rule

a rule that can be used to compute the percentage of data values that must be within one, two, and three standard deviations of the mean for data that exhibit a bell shaped distribution

Statistics

a set of concepts, rules, and procedures that help us to: o organize numerical information in the form of tables, graphs, and charts; o understand statistical techniques underlying decisions that affect our lives and well-being; and o make informed decisions.

Standard deviation

a statistic that tells you how tightly all the various examples are clustered around the mean in a set of data. When the examples are pretty tightly bunched together and the bell-shaped curve is steep, the standard deviation is small. When the examples are spread apart and the bell curve is relatively flat, that tells you that you have a relatively large standard deviation. About 68% of the data will fall within one standard deviation of the mean, 95% of the data will fall within two standard deviations of the mean and 99.7% of the data will fall within three standard deviations of the mean.
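
The 68%, 95%, and 99.7% figures can be checked against a normal curve. This sketch uses `NormalDist` from Python's standard `statistics` module; the helper name `share_within` is illustrative.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

The printed proportions should be close to 0.68, 0.95, and 0.997. These figures apply to bell-shaped (normal) data, not to every distribution.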

Sample

a subset of a population which is too large to measure; the representative of the population

When we refer to Xi we are referring to ____.

any specific value of the variable X

Given the following data, what is the median? .95, 1.06, 1.13, 1.40, 1.41, 1.56, 1.63, 1.73, 1.73, 2.03

Count over 5.5 positions from the left; the median location falls between 1.41 and 1.56. The mean of these is 1.485.
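
A sketch of the median-location approach, using the data from this card. The function names are illustrative.

```python
import math

def median_location(n):
    """ML = (n + 1) / 2."""
    return (n + 1) / 2

def median(scores):
    xs = sorted(scores)
    ml = median_location(len(xs))
    lower = xs[math.floor(ml) - 1]   # score at or just below the median location
    upper = xs[math.ceil(ml) - 1]    # score at or just above the median location
    return (lower + upper) / 2       # the two coincide when ML is a whole number

data = [.95, 1.06, 1.13, 1.40, 1.41, 1.56, 1.63, 1.73, 1.73, 2.03]
print(median_location(len(data)), round(median(data), 3))
```

With 10 scores the median location is 5.5, so the median is the average of the 5th and 6th scores.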

The onset of eating disorders was shown to occur most often during puberty and during the late teen years in girls. A distribution of the frequencies of onset of eating disorders by age would most likely be:

bimodal.

Which of the following can be defined algebraically?

both mean and median location

A random sample is one in which ____.

every member of the population has an equal chance of being included.

Kurtosis

Like skewness, kurtosis has a specific mathematical definition, but generally it refers to how scores are concentrated in the center of the distribution, the upper and lower tails (ends), and the shoulders (between the center and tails) of a distribution.

In using ordinal data, which measure of central tendency is probably least useful?

mean

When the distribution is symmetric, which of the following are always equal?

mean and median

When the distribution is symmetric and unimodal, which of the following are always equal?

mean, median, and mode

The measure of central tendency that is most useful in estimating population characteristics because it is less variable from sample to sample is the:

mean.

Alison received a score of 480 on the verbal portion of her SAT. If she scored at the 50th percentile, her score represents the ________ of the distribution of all verbal SAT scores.

median

On a histogram, which always refers to the highest point on the distribution?

mode

Which of the following is useful with data collected with nominal scales?

mode

μ (population parameter)

population mean/ mean of population values

Which of the following is an advantage of the median?

relatively unaffected by extreme scores it does not depend on the assumption of interval or ratio level data

posterior probabilities

revised probabilities of events based on additional information

trailing digit

rightmost numeral of a number

X̄ (sample statistic)

sample mean

advantages of mode

it is a score that actually occurred; represents the largest number of people having the same score; the probability that an observation drawn at random (Xi) will be equal to the mode is greater than the probability that it will be equal to any other specific score; applicable to nominal data

Someone asks you if you have seen the movie Titanic. Before you answer, you look back into your memory for all of the movies you have ever seen and review the titles one at a time. This is an example of

sequential processing

exploratory data analysis (EDA)

set of techniques developed by Tukey for presenting data in visually meaningful ways

Nonparametric

not looking at the distributions of those parameters

modality

number of major peaks in a distribution

less significant digit

numeral to the right of the leading digit

Standard deviation

The standard deviation (s or σ) is defined as the positive square root of the variance. The variance is a measure in squared units and has little meaning with respect to the data. Thus, the standard deviation is a measure of variability expressed in the same units as the data. The standard deviation is very much like a mean or an "average" of these deviations. In a normal (symmetric and mound-shaped) distribution, about two-thirds of the scores fall between +1 and -1 standard deviations from the mean, and the standard deviation is approximately 1/4 of the range in small samples (N < 30) and 1/5 to 1/6 of the range in large samples (N > 100).

Symmetric

Distributions that have the same shape on both sides of the center are called symmetric. A symmetric, bell-shaped distribution with only one peak is referred to as a normal distribution.

Measurement data

sometimes called quantitative data -- the result of using some instrument to measure something (e.g., test score, weight);

quartiles

the 25th, 50th, and 75th percentiles, referred to as the first, second, and third quartiles, respectively. The quartiles can be used to divide a data set into four parts, with each containing approximately 25% of the data.

Mean, most common measure of Central Tendency

the arithmetic average of a group of scores. Often called the average and used to represent the typical score in a distribution as a precise calculation.

We refer to quantities like N - 1 as _____.

the degrees of freedom.

Range

the difference between the largest value and the smallest value in the data set; unreliable because it is only based on two scores in the dataset

Population

the entire group that you are interested in; usually large

complement of A

the event consisting of all sample points that are not in A

union of A and B

the event containing all sample points belonging to A or B or both. The union is denoted by A ∪ B.

intersection of A and B

the event containing the sample points belonging to both A and B. the intersection is denoted as A ∩ B.
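
The complement, union, and intersection can be illustrated with Python sets. The die-roll sample space and the events A and B below are assumptions made up for this example.

```python
from fractions import Fraction

# Illustrative sample space: one roll of a fair six-sided die.
sample_space = {1, 2, 3, 4, 5, 6}
A = {2, 4, 6}   # event: the roll is even
B = {4, 5, 6}   # event: the roll is greater than 3

def prob(event):
    """Probability of an event when all sample points are equally likely."""
    return Fraction(len(event), len(sample_space))

complement_A = sample_space - A   # all sample points not in A
union = A | B                     # sample points in A or B or both
intersection = A & B              # sample points in both A and B

# Addition law: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
addition_law_holds = prob(union) == prob(A) + prob(B) - prob(intersection)

# A and its complement share no sample points, so they are mutually exclusive.
mutually_exclusive = len(A & complement_A) == 0

print(prob(union), prob(intersection), addition_law_holds, mutually_exclusive)
```

Using `Fraction` keeps the probabilities exact, so the addition law can be checked with a plain equality.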

The "real lower limit" of an interval in a histogram is

the lowest continuous value that would be rounded up into that interval.

Which of the measures of central tendency are you most likely to see reported in the popular press?

the mean

Positive Skew

The mean is greater than the median. Often associated with a floor effect: scores bunch up against the lower end (minimum score) of the scale.

Negative Skew

The mean is less than the median. Often associated with a ceiling effect: scores bunch up against the top end (maximum score) of the scale.

weighted mean

the mean obtained by assigning each observation a weight that reflects its importance

If we were interested in studying salaries in the National Basketball Association, the least useful measure of the typical salary would be

the mean.

An advantage of the mode is

the mode can be used with nominal data

mode

the most common score; score obtained from the largest number of participants; the value of X, the dependent variable that corresponds to the highest point on the distribution; if two adjacent are most frequent can take the average;

independent events

two events that have no influence on each other

bimodal

if two nonadjacent scores occur with equal frequency, the distribution is bimodal; report both modes

advantages of median

unaffected by extreme scores; useful in studies in which extreme scores occasionally occur, but have no particular significance; does not require any assumptions about the interval properties of the scale;

σ² (population parameter)

variance - variance measures how far a set of numbers are spread out. A variance of zero indicates that all the values are identical.

stem

vertical axis of display containing the leading digits

When we make implicit assumptions about a scale having interval properties,

we are assuming the distance between 4 and 6 is the same as the distance between 6 and 8.

Measures of Shape

For distributions summarizing data from continuous measurement scales, statistics can be used to describe how the distribution rises and drops.

Measures of Center

Plotting data in a frequency distribution shows the general shape of the distribution and gives a general sense of how the numbers are bunched. Several statistics can be used to represent the "center" of the distribution. These statistics are commonly referred to as measures of central tendency.

Graphs

Visual display of data used to present frequency distributions so that the shape of the distribution can easily be seen.

Variance

Step 1. Subtract the mean from each of the values in the data set. Step 2. Square each result. Step 3. Add all of these squares. Step 4. Divide by the number of values in the data set.
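
The steps can be sketched as follows. Dividing by the number of values, as this card describes, gives the population form; the N - 1 version is included because another card in this set refers to N - 1 as the degrees of freedom. Function names are illustrative.

```python
import math

def sum_of_squares(values):
    """Sum of squared deviations of each value from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)   # subtract, square, add

def population_variance(values):
    """Divide the sum of squares by N, as in the steps on this card."""
    return sum_of_squares(values) / len(values)

def sample_variance(values):
    """Divide the sum of squares by N - 1 (the degrees of freedom)."""
    return sum_of_squares(values) / (len(values) - 1)

scores = [2, 4, 6, 7, 8]   # scores borrowed from another card in this set
print(round(sum_of_squares(scores), 2))
print(round(population_variance(scores), 2))
print(round(sample_variance(scores), 2))
print(round(math.sqrt(sample_variance(scores)), 2))   # standard deviation
```

For these scores the sum of squares is 23.2, so the variance is 23.2 / 5 = 4.64 with N in the denominator and 23.2 / 4 = 5.8 with N - 1. Taking the square root returns the measure to the original units.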

Stem-and-Leaf

Most information about individual scores

What is a good percentage to trim from a sample?

10% or 20% from each end

Given the numbers 6,7,9,11,15,71,86, how many numbers fall below the median?

3

The real lower limit and the real upper limit of the interval 40-49 are:

39.5 and 49.5
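
A sketch of the real-limits rule: subtract half a unit of measurement from the apparent lower limit and add half a unit to the apparent upper limit. The `unit` parameter and the function name are assumptions introduced for illustration.

```python
def real_limits(apparent_lower, apparent_upper, unit=1):
    """Return (real lower limit, real upper limit) of an interval.

    `unit` is the smallest step of measurement: 1 for whole-number
    scores, 0.01 for scores recorded to two decimal places.
    """
    half_unit = unit / 2
    return apparent_lower - half_unit, apparent_upper + half_unit

print(real_limits(40, 49))
low, high = real_limits(0.60, 0.69, unit=0.01)
print(round(low, 3), round(high, 3))
```

The first call reproduces the 39.5 and 49.5 on this card. The second applies the same rule to the .60 - .69 interval used elsewhere in this set, where the unit of measurement is .01.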

For the following data set [1, 7, 9, 15, 33, 76, 103, 118], what is the median location?

4.5

For the following set of data , the mean is:

5

What is the mean of the following distribution? Round your answer to the nearest 2 decimal places. 2 4 6 7 8

5.4

What are the mean and standard deviation of T scores?

50 and 10

Given the following data, list the outliers. 50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100

50, 60, 100. These are the values that go beyond the ends of the whiskers.
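
The box plot steps used across these cards can be sketched in one function. It assumes the convention these cards follow: drop any fraction from the median location, compute QL = (ML + 1) / 2, and drop any fraction from QL as well. Other textbooks and software compute quartiles differently, so treat this as an illustration of the cards' arithmetic only. All names are illustrative.

```python
import math

def box_plot_summary(data):
    xs = sorted(data)
    n = len(xs)
    median_location = (n + 1) / 2
    # Drop fractions, following the convention used in these cards.
    quartile_location = math.floor((math.floor(median_location) + 1) / 2)
    q1 = xs[quartile_location - 1]       # count in from the left
    q3 = xs[n - quartile_location]       # count in from the right
    iqr = q3 - q1
    max_whisker = 1.5 * iqr              # maximum whisker length
    # Whiskers end at the most extreme data values inside the limits.
    lower_end = min(x for x in xs if x >= q1 - max_whisker)
    upper_end = max(x for x in xs if x <= q3 + max_whisker)
    outliers = [x for x in xs if x < lower_end or x > upper_end]
    return {
        "quartile_location": quartile_location,
        "q1": q1,
        "q3": q3,
        "iqr": iqr,
        "max_whisker": max_whisker,
        "lower_whisker_end": lower_end,
        "upper_whisker_end": upper_end,
        "outliers": outliers,
    }

scores = [50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100]
summary = box_plot_summary(scores)
print(summary["q1"], summary["q3"], summary["iqr"], summary["outliers"])
```

For this data set the quartiles are 77 and 85, the IQR is 8, and the maximum whisker length is 12. The whiskers therefore end at 73 and 95, leaving 50, 60, and 100 beyond them.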

"5s" represents what numbers on a stem-and-leaf display according to Tukey?

56-57

What is the median of the following distribution? Round your answer to the nearest 2 decimal places. 2 4 6 7 8

6

What is the median of the following distribution? Round your answer to the nearest 2 decimal places. 2 4 6 6 6 10 20 25

6.17 -- the real lower limit of 6 plus 2/3 of the interval: 5.5 + .67 = 6.17
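
The interpolation in this answer can be sketched as follows. The sketch assumes whole-number scores (unit of 1) and the usual grouped-data formula: real lower limit plus the fraction of the tied scores needed to reach N/2. The function name is illustrative.

```python
from collections import Counter

def interpolated_median(scores, unit=1):
    """Median found by interpolating within the interval of tied scores."""
    counts = Counter(scores)
    half = len(scores) / 2            # number of scores that must fall below the median
    below = 0                         # scores counted so far
    for value in sorted(counts):
        f = counts[value]             # how many scores share this value
        if below + f >= half:
            real_lower_limit = value - unit / 2
            return real_lower_limit + (half - below) / f * unit
        below += f

print(round(interpolated_median([2, 4, 6, 6, 6, 10, 20, 25]), 2))
```

With 8 scores, 4 must fall below the median. Two scores (2 and 4) lie below the interval 5.5 to 6.5, so 2 of the 3 sixes are needed, giving 5.5 + 2/3 = 6.17.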

What is the median of the following distribution? Round your answer to the nearest 2 decimal places. 2 4 6 7 8 10

6.5

Lower real limits

60 - 0.5 = 59.5/ 64 - 0.5 = 63.5

What does it mean to "sample with replacement?"

After we draw an observation we replace it before the next draw.

population parameter

a numerical value used as a summary measure for a population

sample statistic

a numerical value used as a summary measure for a sample

To oversimplify, random selection is useful to _______ while random assignment is useful to _____.

"assure that we can generalize to the population from which we sampled"; "assure that differences between groups are not due to extraneous variables"

Center of distribution

(1) Mode (2) Median (3) Mean

Measure of dispersion

(1) Range (2) Mean Absolute Deviation (MAD) (3) Variance (4) Standard Deviation (5) SIQR - Semi-interquartile range

Shape of distribution

(1) Symmetric [one peak - unimodal] (2) Bimodal [two peaks] (3) Skew [skew left - more data to the right - left tailed] (4) Uniform

Dealing with Outliers

(1) Trimming - Winsorizing Trimmed mean - robust estimator of location (2) Data transformation - reduce the impact of outliers and make distribution more symmetric

Statistic

(Italic) M is a statistic, where numbers are based on a sample taken from a population

Sum of Squares

(SS) A numerical value obtained by subtracting the mean of a distribution from each score in the distribution, squaring each difference, and then summing the squared differences.

Mean

(X̄) The sum of a set of scores divided by the number of scores summed. -Arithmetic average -Most common measurement of central tendency -Influenced by extreme scores -Data should have interval properties; cannot be used with nominal or ordinal data -Sample mean is the best estimator of population mean. -Can be manipulated algebraically. -Any change of a score in the distribution affects the mean

Parameter

(mu, μ) A number based on the whole population; parameters are usually symbolized by Greek letters.

Three important things about stem and leaf displays:

*They can be used to present both the shape of a distribution and the actual values of the scores. *they can be used back to back to compare two related distributions * They can be adjusted to handle different sized values for the dependent variables.

Positively skewed

- A distribution is positively skewed when it has a tail extending out to the right (larger numbers). When a distribution is positively skewed, the mean is greater than the median, reflecting the fact that the mean is sensitive to each score in the distribution and is subject to large shifts when the sample is small and contains extreme scores.

Negatively skewed

- A negatively skewed distribution has an extended tail pointing to the left (smaller numbers) and reflects bunching of numbers in the upper part of the distribution with fewer scores at the lower end of the measurement scale.

Mesokurtic

- A normal distribution is called mesokurtic. The tails of a mesokurtic distribution are neither too thin nor too thick, and there are neither too many nor too few scores in the center of the distribution.

Leptokurtic

- If you move scores from shoulders of a mesokurtic distribution into the center and tails of a distribution, the result is a peaked distribution with thick tails. This shape is referred to as leptokurtic.

Platykurtic

- Starting with a mesokurtic distribution and moving scores from both the center and tails into the shoulders, the distribution flattens out and is referred to as platykurtic.

Outliers

-An extreme score that is not typical of the rest of the distribution -It may be larger than the other numbers or smaller than the other numbers. -Distorts the mean To find an outlier -Organize your data -Look for extreme scores -If the mean and median differ by a large amount, you have an outlier

What is the real lower limit of the interval .60 - .69 in the table below? Please use 3 decimal places in your answer. .50 - .59 .60 - .69 .70 - .79 .80 - .89

0.595

A measurement instrument was used at Mercy Hospital in a sample of 175 patients. There were 35 true positives, 40 false positives, 10 false negatives and 90 true negatives. What is the specificity of the measurement tool?

0.69

A measurement instrument was used at Mercy Hospital in a sample of 175 patients. There were 35 true positives, 40 false positives, 10 false negatives and 90 true negatives. What is the sensitivity of the measurement tool?

0.78

What is the real upper limit of the interval .80 - .89 in the table below? Please use 3 decimal places in your answer. .50 - .59 .60 - .69 .70 - .79 .80 - .89

0.895

A measurement instrument was used at Mercy Hospital in a sample of 175 patients. There were 35 true positives, 40 false positives, 10 false negatives and 90 true negatives. What is the Prevalence of disease in the sample?

.26

A measurement instrument was used at Mercy Hospital in a sample of 175 patients. There were 35 true positives, 40 false positives, 10 false negatives and 90 true negatives. What is the positive predictive value of the measurement tool?

.47

A measurement instrument was used at Mercy Hospital in a sample of 175 patients. There were 35 true positives, 40 false positives, 10 false negatives and 90 true negatives. What is the efficiency of the screening instrument?

.71 -- Efficiency is all true tests divided by the total sample. That is, 125/175 = .71428, or 71.43% when multiplied by 100.

A measurement instrument was used at Mercy Hospital in a sample of 175 patients. There were 35 true positives, 40 false positives, 10 false negatives and 90 true negatives. What is the negative predictive value of the measurement tool?

.9
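
All six Mercy Hospital answers come from the same four counts. A sketch, using the standard definitions of these screening measures; the function and key names are illustrative.

```python
def screening_metrics(tp, fp, fn, tn):
    """Screening measures from true/false positive/negative counts."""
    total = tp + fp + fn + tn
    return {
        "sensitivity": tp / (tp + fn),                # true positives among the diseased
        "specificity": tn / (tn + fp),                # true negatives among the healthy
        "prevalence": (tp + fn) / total,              # diseased share of the sample
        "positive_predictive_value": tp / (tp + fp),  # diseased among positive tests
        "negative_predictive_value": tn / (tn + fn),  # healthy among negative tests
        "efficiency": (tp + tn) / total,              # all true tests over total sample
    }

mercy = screening_metrics(tp=35, fp=40, fn=10, tn=90)
for name, value in mercy.items():
    print(name, round(value, 2))
```

Rounded to two decimals, the results should match the cards: sensitivity .78, specificity .69, prevalence .26, positive predictive value .47, negative predictive value .9, and efficiency .71.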

Assume that you have a set of data with 70 values spread fairly evenly between 0 and 100. The optimal number of categories for a histogram of these data would be approximately:

10

What is the mode of the following distribution? Round your answer to the nearest 2 decimal places. 2 2 2 2 3 5 7 9 10 10 10

2

Calculate the 20% winsorized trimmed mean of the following distribution. Round your answer to the nearest 2 decimal places. x 10 13 14 17 19 22 24 24 26 27 27 27 28 29 30 37 47 77 88 100

27.2 -- replace the 4 lowest scores and the 4 highest scores with the adjacent retained values (19 and 37), then take the mean of all 20 scores.
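
A sketch of the Winsorizing arithmetic, assuming 20% means replacing 20% of the scores at each end (4 of 20 per end). The function name is illustrative.

```python
def winsorized_mean(scores, proportion):
    """Mean after replacing the extreme scores at each end with the nearest kept score."""
    xs = sorted(scores)
    n = len(xs)
    # Number of scores replaced per end; epsilon guards against float truncation.
    k = int(n * proportion + 1e-9)
    low = xs[k]            # lowest retained score
    high = xs[n - k - 1]   # highest retained score
    winsorized = [low] * k + xs[k:n - k] + [high] * k
    return sum(winsorized) / n

X = [10, 13, 14, 17, 19, 22, 24, 24, 26, 27,
     27, 27, 28, 29, 30, 37, 47, 77, 88, 100]
print(round(winsorized_mean(X, 0.20), 2))
```

The four lowest scores become 19 and the four highest become 37, so the sum is 320 + 4(19) + 4(37) = 544 and the mean is 544 / 20 = 27.2. Unlike trimming, the sample size stays at 20.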

Suppose you wanted to construct a box plot of the following data. What is the end of the lower whisker? This is the end of the lower whisker, not the maximum lower whisker. We will use this same data for the rest of the questions in this practice test. 50, 60, 75, 79, 80, 81, 82, 83, 84, 85, 87, 88, 92, 95, 100, 106, 120

75

The mode of the numbers 1,3,4,5,6,6,7,8,9,9,9 is

9

correlation

A measure of the extent to which two factors vary together, and thus of how well either factor predicts the other. a reciprocal connection between two or more things

What are important ways that the field of statistics has changed over time?

A movement toward the meaningfulness of a result, a movement toward combining the results of multiple studies, and a movement away from hand calculation.

Normal distribution

A normal distribution of data means that most of the examples in a set of data are close to the "average," while relatively few examples tend to one extreme or the other

What does N(μ, σ²) represent?

A normal distribution with a mean of μ and a variance of σ².

The standard normal distribution (z-score distribution)

A normal distribution with μ = 0 and σ = 1. The distribution you get when you transform all the scores from any normal distribution into z-scores. Area under the curve equals 1. Area under the curve is the probability of an event.

Random sample

A sample drawn in such a way that each element of the population has the same chance of being included in the sample

What is the difference between a statistic and a parameter?

A statistic refers to a measure (e.g., average) based on a sample of data and a parameter refers to that measure on a whole population of objects.

What do we mean by an "unbiased" estimate of a parameter?

An unbiased estimate is one whose long-range average is equal to the parameter it is estimating.

Nominal

Assign numbers to objects where different numbers indicate different objects [1=male; 2=female]

Ordinal

Assign numbers to objects, but here the numbers also have meaningful order. The spaces between adjacent values are not necessarily equal [class ranking, small, medium, large, IQ]

Normal Distribution

Can have different means or standard deviations. All normal distributions have the same shape. This means that a z-score will fall in the same relative location for different distributions. The STANDARD normal distribution always has mean 0, SD 1.

Apparent limits

Class intervals; the values denoting the interval as 60-64

Converting a raw score into a z-score (Linear Transformation)

Converting a set of raw scores into z scores will NOT change the shape of the distribution Ex: If I convert all the scores in a positively skewed distribution to z-scores, my resulting distribution will have a mean of 0, a standard deviation of 1, and it will be POSITIVELY SKEWED (the shape will not change)
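
A sketch of the z-score transformation. The raw scores are hypothetical and deliberately positively skewed. The standard deviation used here is the population form (`pstdev`), an assumption that makes the resulting z-scores have a standard deviation of exactly 1.

```python
from statistics import mean, median, pstdev

def z_scores(raw):
    """z = (X - mean) / standard deviation, applied to every score."""
    m = mean(raw)
    s = pstdev(raw)   # population SD (divides by N); assumed for this sketch
    return [(x - m) / s for x in raw]

# Hypothetical, positively skewed scores (one long tail to the right).
raw = [1, 2, 2, 3, 3, 3, 4, 10]
z = z_scores(raw)

print(round(mean(z), 6), round(pstdev(z), 6))
print(mean(raw) > median(raw), mean(z) > median(z))
```

The z-scores have a mean of 0 and a standard deviation of 1, and the mean still sits above the median after the transformation. Subtracting a constant and dividing by a constant moves and rescales every score equally, so the shape cannot change.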

Given the following data, what is the first quartile? .95, 1.06, 1.13, 1.40, 1.41, 1.56, 1.63, 1.73, 1.73, 2.03

Count over 3 from the left (3 is the QL) and you get 1.13

Given the following data, what is the whisker end on the right? .95, 1.06, 1.13, 1.40, 1.41, 1.56, 1.63, 1.73, 1.73, 2.03

From the 3rd quartile, go the maximum whisker length to the right: 1.73 + .90 = 2.63. There is no 2.63 in the data, so move to the left until you reach the last data value, 2.03.

Outliers

Data skewed by one or a few outliers that are extreme scores, either very high or very low in comparison to other scores. When the outlier is omitted from the density data, the mean becomes more representative of the actual scores in the sample.

What is the practical distinction between discrete and continuous variables? Howell, David C. (2016-02-22). Fundamental Statistics for the Behavioral Sciences (MindTap for Psychology) (Page 31). Wadsworth Publishing. Kindle Edition.

Discrete variables take on only a few different values, but continuous variables can take on any value between the lowest and highest score.

Symmetrical unimodal

In a perfectly symmetrical unimodal distribution, mean, median, and mode are identical.

Given the following data, what is the median location? 50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100

ML = (n+1) / 2 = (15 + 1) / 2 = 16 / 2 = 8
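
The same calculation can be sketched in code. This is a minimal illustration with hypothetical function names; when the median location ends in .5, it averages the two scores on either side of that position.

```python
import math

def median_location(n):
    """Position of the median in an ordered list of n scores (1-indexed)."""
    return (n + 1) / 2

def median(scores):
    ordered = sorted(scores)
    loc = median_location(len(ordered))
    lower = ordered[math.floor(loc) - 1]   # score at or just below the location
    upper = ordered[math.ceil(loc) - 1]    # score at or just above the location
    return (lower + upper) / 2

data = [50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100]
print(median_location(len(data)))  # 8.0
print(median(data))                # 83.0
```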

Given the following data, what is the maximum length of the whisker? .95, 1.06, 1.13, 1.40, 1.41, 1.56, 1.63, 1.73, 1.73, 2.03

Max length of whisker = 1.5 * IQR = 1.5 * .60 = .90

Given the following data, what is the end of the maximum whisker length? 50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100

Maximum whisker length is 1.5 * IQR = 1.5 * 8 = 12

median

Mdn; middle score in an ordered set of data; the 50th percentile; if N is even, take the average of the two middle scores; if it falls inside an interval, the midpoint of the interval is the median; not necessarily stable from sample to sample; does not require any assumptions about the interval properties of the scale

Which of the measures of central tendency are you most likely to see reported in the popular press?

Mean

Central Tendency of a skewed distribution

Mean Median Mode SIQR; find values most typical of a group

Median, the middle score

Median is the middle score of all the scores in a sample when the scores are arranged in ascending order. Step 1. Line up all the scores in ascending order. Step 2. Find the middle score. If there is an even number of scores, calculate the mean of the two middle scores.

Median

Middle score -The score that has an equal number of scores above and below it (the 50th percentile). -It cuts the distribution into two equal parts. 50% split of data. -Not affected by extreme scores (desirable for skewed distributions). -Can be used with ordinal and interval data, but not with nominal data. -Does not take into account all scores. -Not a stable measure of central tendency.

Mode

Most frequent score. Finding the Mode: -Put the data in order -Choose the most frequently occurring score in the data set. UNIMODAL: distribution has only one mode. BIMODAL: distribution has two modes. MULTIMODAL: distribution has more than 2 modes. -Mode may not appear in all data sets. -Data set may contain multiple modes. -Not a stable measure of central tendency. -Not affected by extreme scores. -Can be used with nominal, ordinal, interval, or ratio data.

What is a Winsorized sample?

One in which the trimmed values are replaced by the largest and smallest values that remain.

Ratio

Order matters, intervals are equal and true zero point (absence of the property)

Given the following data, what is the interquartile range (IQR)? 50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100

Quartile on the right is 4 digits over = 85. Quartile on the left is 4 digits over = 77. IQR = 85 - 77 = 8

In the ideal situation our sample should be a _____ sample from some population.

Random

Central Tendency

Refers to the descriptive statistic that best represents the center of a data set, the particular value that all the other data seem to be gathering around. p. 79 The central tendency is usually at the highest point in the histogram.

Mean, symbolic notation

Several symbols represent the mean: M or x-bar. For samples from a population, M is a statistic. For a population, the Greek letter μ (mu) is used. Sigma, or Σ, is the summation symbol. N is the total number of scores in a data set; n is the number of sample scores.

Standard deviation

Standard deviation is the square root of the variance ○ If a constant is added to (or subtracted from) every score in a distribution, the standard deviation will not be affected ○ If every score is multiplied (or divided) by a constant, the standard deviation will be multiplied (or divided) by that constant ○ The standard deviation from the mean will be smaller than the standard deviation from any other point ○ Measure of dispersion / measure of spread
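
These properties can be checked numerically. The sketch below uses made-up scores, a hypothetical helper `rms_deviation`, and the population form of the standard deviation (dividing by N) as an assumption.

```python
from statistics import mean, pstdev

scores = [2, 4, 4, 6, 9]                     # made-up scores with mean 5

base = pstdev(scores)
shifted = pstdev([x + 10 for x in scores])   # add a constant to every score
scaled = pstdev([x * 3 for x in scores])     # multiply every score by a constant

def rms_deviation(values, point):
    """Root-mean-square deviation of the values around any chosen point."""
    return (sum((x - point) ** 2 for x in values) / len(values)) ** 0.5

print(abs(shifted - base) < 1e-9)        # adding 10 leaves the SD unchanged
print(abs(scaled - 3 * base) < 1e-9)     # multiplying by 3 triples the SD
print(rms_deviation(scores, mean(scores)) < rms_deviation(scores, 4))
```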

Given the following data, what is end of the whisker on the left side of the distribution? 50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100

Start from the 1st quartile, which is 77 Subtract the maximum whisker length = 77 - 12 = 65. This is the maximum possible location of the whisker. There is not a 65 in the data so move to the right until you find the data that represents the end of the whisker, which is 73.

Given the following data, what is end of the whisker on the right side of the distribution? 50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100

Start from the 3rd quartile, which is 85 Add the maximum whisker length = 85 + 12 = 97. This is the maximum possible location of the whisker. There is not a 97 in the data so move to the left until you find the data that represents the end of the whisker, which is 95.
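
The whisker steps on these cards can be sketched as a small function. This is only an illustration: `box_plot_summary` is a hypothetical name, and the quartile location (4 scores in from each end for this data set) is passed in as given on the cards, since other quartile conventions can produce slightly different hinges.

```python
def box_plot_summary(scores, quartile_location):
    """Hinges, maximum whisker length, whisker ends, and outliers.

    quartile_location is how many scores to count in from each end
    to reach the quartiles (an input here, taken from the cards).
    """
    ordered = sorted(scores)
    q1 = ordered[quartile_location - 1]    # count in from the left
    q3 = ordered[-quartile_location]       # count in from the right
    iqr = q3 - q1
    max_whisker = 1.5 * iqr
    low_fence, high_fence = q1 - max_whisker, q3 + max_whisker
    # Whiskers end at the most extreme scores that are still inside the fences.
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    return {
        "q1": q1, "q3": q3, "iqr": iqr, "max_whisker": max_whisker,
        "whisker_low": inside[0], "whisker_high": inside[-1],
        "outliers": [x for x in ordered if x < low_fence or x > high_fence],
    }

data = [50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100]
summary = box_plot_summary(data, quartile_location=4)
print(summary["whisker_low"], summary["whisker_high"])  # 73 95
print(summary["outliers"])                              # [50, 60, 100]
```

Scores beyond the whisker ends (here 50, 60, and 100) are the ones a boxplot would flag as outliers.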

Descriptive Statistics

Statistical procedures used to summarize and describe the data from a sample. -Describe raw data with a single number -Way of capturing trends in data -Two Types of Descriptive Statistics

Why was Stevens more concerned about scales of measurement than we are?

Stevens was trying to make psychology respectable to people in the physical sciences who complained that the measurements we make are sloppy. We no longer have that same perspective and have learned to live with less precise measurement. But we continue to refer to such scales by name.

Three different terms for describing the shape of a distribution:

Symmetry, modality and skewness

List three different terms for describing the shape of a distribution.

Symmetry, modality, and skewness

Positively skewed distribution has a tail stretching out to the right

TRUE

On a recent fundraising drive, most of the 30 volunteers raised between $10 and $50 each. However, Brian and Karen each raised over $100. Which of the following is true

The amounts of money raised by Brian and Karen are outliers.

What is wrong with the average deviation from the mean?

The average deviation will always be 0.0.

Which measure of Central tendency is best?

The choice is usually between the mean and the median. The mean usually wins, but when distributions are skewed by outliers, the median may provide a better sense of a distribution's central tendency.

Deviation

The difference of a score in a set of scores from the mean of that set of scores.

Given the following data, list the outliers. .95, 1.06, 1.13, 1.40, 1.41, 1.56, 1.63, 1.73, 1.73, 2.03

The ends of the whiskers are the highest and lowest numbers in the data. That means there are no outliers.

Sampling Distribution of the Mean

The mean of the sampling distribution of the mean is the same as the population mean. The standard deviation of the sampling distribution of the mean is the standard error of the mean, which is smaller than the standard deviation of the population distribution. Extreme scores are more likely than extreme means, so the distribution of means will be less variable than the population. As N increases, sample means are clustered more closely and the standard error gets smaller.
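
For a tiny population these statements can be verified by listing every possible sample. The sketch below is an illustration under stated assumptions: the population is made up, samples are drawn with replacement, and the standard error is compared against the population SD divided by the square root of the sample size.

```python
from itertools import product
from math import sqrt
from statistics import mean, pstdev

population = [2, 4, 6, 8]     # made-up population
n = 2                         # sample size

# Every possible sample of size n, drawn with replacement.
sample_means = [mean(sample) for sample in product(population, repeat=n)]

standard_error = pstdev(sample_means)
print(mean(sample_means), mean(population))            # the two means agree
print(standard_error, pstdev(population) / sqrt(n))    # SE = SD / sqrt(n)
```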

Trimmed mean

The mean that results from trimming away (or discarding) a fixed percentage of the extreme observations

Suppose you wanted to construct a box plot of the following data. What would be the median? We will use this same data for the rest of the questions in this practice test. 50, 60, 75, 79, 80, 81, 82, 83, 84, 85, 87, 88, 92, 95, 100, 106, 120

The median location is (17 + 1) / 2 = 9, and the ninth number is 84.

Median

The middle number or center value of a set of data in which all the data are arranged in sequence

Skewed Distribution

The mode is at the peak of the curve, mean is closest to the tail and median is positioned between the mode and the mean. The median is the best measure of central tendency for skewed distributions.

Mode

The mode is used in three situations: 1. When one particular score dominates a distribution. 2. When the distribution is bimodal or multimodal. 3. When data is nominal. When not sure which is best, report all three.

Mode, most common score

The most common score of a sample, read from a frequency table, a histogram, or a frequency polygon. When there is more than one common score, or when scores have several decimal places, it may be reported as a common interval, e.g., 60-70.

Degrees of Freedom (df)

The number of scores that are free to vary (N-1) Example: Out of the five scores, the first 4 scores can be anything. The 5th one's determined.

What is the multiplicative rule?

The probability of one event followed by another is the probability of the first event times the probability of the second event, assuming that the events are independent.

What is the additive rule?

The probability of the occurrence of one or another of two mutually exclusive events is the sum of the two probabilities.
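
Both rules can be illustrated with a fair six-sided die. The die example is an assumption added for illustration and is not taken from the cards.

```python
from fractions import Fraction

p_face = Fraction(1, 6)     # probability of any one face of a fair die

# Additive rule: a single roll cannot be both a 1 and a 2 (mutually exclusive).
p_one_or_two = p_face + p_face

# Multiplicative rule: two separate rolls are independent events.
p_six_then_six = p_face * p_face

print(p_one_or_two)      # 1/3
print(p_six_then_six)    # 1/36
```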

Give an example of a conditional probability.

The probability that it will snow given that the temperature is below 32 degrees is .20.

What do we mean by "standardization?"

The process of transforming a raw score to a scale with a specified mean and variance, usually 0 and 1.

Sampling Distribution vs. Population Distributions

The sampling distribution is normal if the population distribution is normal. The sampling distribution will approach normal even if the population distribution is not normal (if N is large enough: Central Limit Theorem). The mean will be the same as that of the population distribution, but the variability is less (smaller standard error). Must use the sampling distribution of the mean for groups.

Median (Mdn)

The score corresponding to the point having 50% of the observations below it when the observations are arranged in numerical order

Mean (X-bar)

The sum of the scores divided by the number of scores

If the distribution of the ages of people were positively skewed, which of the following is most likely correct?

There are more young people than old people.

What is the "quartile location?"

They are the points that cut off the first and third quartiles.

How do we determine the values that will be the end of the whiskers in a boxplot?

They are the values that are no more than 1.5 times the interquartile range from the top and bottom of the box.

Why do we divide by N - 1 instead of N when we are computing the variance and the standard deviation?

This gives us an unbiased estimate of the population variance or standard deviation.
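
Python's standard library exposes both divisors, which makes the difference easy to see. The scores below are made up for illustration.

```python
from statistics import pvariance, variance

scores = [2, 4, 4, 6, 9]   # made-up sample; squared deviations sum to 28

print(pvariance(scores))   # population form, divides by N:     28 / 5
print(variance(scores))    # sample form, divides by N - 1:     28 / 4
```

The sample form is always the larger of the two, which compensates for the sample's tendency to underestimate the population's variability.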

What is the independent variable?

This is the variable that we are trying to study, as opposed to the score that we obtain.

Why would we sample with replacement?

This keeps the probabilities constant over trials.

A positively skewed distribution has a tail stretching out to the right. (T or F)

True

Inferential statistics are used to draw conclusions about a whole population (T or F).

True

When we are engaged in drawing conclusions about a population we are using inferential statistics. (T or F)

True

List three things that partly determine the specific analysis that we will use to analyze a data set.

Type of data, number of groups or variables, differences versus relationships.

Shape of Distribution of Scores

Unimodal and perfectly symmetrical distribution: Mean = Median = Mode. Skewed distribution: Mode > Median > Mean when negatively skewed; Mode < Median < Mean when positively skewed.

Which of the following distributions can be symmetric?

Unimodal, normal, bimodal-all of these choices

A linear transformation is one in which _____.

We only multiply or divide by a constant and add or subtract a constant. It does not change the shape of the distribution in the slightest.

How do we signify conditional probabilities?

We place a vertical bar between the two events. For example, p(snow|below 32°).

The most important characteristics behind using different scales is to keep in mind the numbers themselves. (T or F)

What is important is the underlying variable that we hope we are measuring.

Multimodal

When a distribution of scores has more than two modes.

When is the median most useful?

When we don't want extreme scores to influence the result.

Discrete

a finite number of values and there are gaps between adjacent values [# of students] Ordinal + Nominal

mean

Xbar; the sum of the scores divided by the number of scores; or M; xbar=sigmaX/N (N=number of X values); Ybar or Xbar= mean of that variable; it is easily influenced by extreme scores; generally very stable from sample to sample

The "ordinate" is what we have previously called the _____ axis.

Y

Do the data from the Seeing Statistics example support what perceptual psychology would expect us to see?

Yes

What do we report when a distribution has two distinct, and non-adjacent, modes?

You should report both. Similarly, you should report the mode of non-zero scores if zero more appropriately means "non-applicable."

event

a collection of sample points

mean

a measure of central location computed by summing the data values and dividing by the number of observations

median

a measure of central location provided by the value in the middle when the data are arranged in ascending order

correlation coefficient

a measure of linear association between two variables that takes on values between -1 and +1. Values near +1 indicate a strong positive linear relationship; and values near -1 indicate a strong negative relationship, values near zero indicate a lack of relationship

covariance

a measure of linear association between two variables. Positive values indicate a positive relationship; negative values indicate a negative relationship
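
The link between covariance and the correlation coefficient can be sketched as follows. The paired scores are made up, the function names are hypothetical, and the sample form (dividing by N - 1) is an assumption.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance: sum of deviation products divided by N - 1."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)

def correlation(xs, ys):
    """Pearson r: covariance divided by the product of the standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

x = [1, 2, 3, 4, 5]     # made-up paired scores
y = [2, 4, 5, 4, 5]
print(covariance(x, y))             # 1.5
print(round(correlation(x, y), 3))  # 0.775
```

Covariance depends on the units of measurement, while the correlation coefficient is rescaled to stay between -1 and +1.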

mode

a measure of location, defined as the value that occurs with the greatest frequency

coefficient of variation

a measure of relative variability computed by dividing the standard deviation by the mean and multiplying by 100

Skewness

a measure of the shape of a data distribution. data skewed to the left result in negative skewness; a symmetric data distribution results in zero skewness; and data skewed to the right results in a positive skewness.

standard deviation

a measure of variability computed by taking the positive square root of the variance

interquartile range

a measure of variability, defined to be the difference between the third and first quartiles

range

a measure of variability, defined to be the largest value minus the smallest value

classical method

a method of assigning probabilities that is appropriate when all the experimental outcomes are equally likely

relative frequency method

a method of assigning probabilities that is appropriate when data are available to estimate the proportion of the time the experimental outcome will occur if the experiment is repeated a large number of times

subjective method

a method of assigning probability on the basis of judgement

Bayes theorem

a method used to compute posterior probabilities

probability

a numerical measure of the likelihood that an event will occur

Chebyshev's theorem

a theorem that can be used to make statements about the proportion of data values that must be within a specified number of standard deviations of the mean
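
The theorem guarantees that at least 1 - 1/k² of the values lie within k standard deviations of the mean, for any k greater than 1. The sketch below compares that bound with the practice data set used on other cards; the function names are hypothetical and the population SD is assumed.

```python
from statistics import mean, pstdev

def chebyshev_bound(k):
    """Minimum proportion of values within k SDs of the mean (k > 1)."""
    return 1 - 1 / k ** 2

def proportion_within(scores, k):
    """Observed proportion of scores within k SDs of their mean."""
    m, sd = mean(scores), pstdev(scores)
    return sum(abs(x - m) <= k * sd for x in scores) / len(scores)

data = [50, 60, 73, 77, 80, 81, 82, 83, 84, 84, 84, 85, 88, 95, 100]
print(chebyshev_bound(2))                                  # 0.75
print(proportion_within(data, 2) >= chebyshev_bound(2))    # bound is respected
```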

z-score

a value computed by dividing the deviation about the mean by the standard deviation s. A z-score is referred to as a standardized value and denotes the number of standard deviations x is from the mean.

percentile

a value such that at least "p" percent of the observations are less than or equal to this value and at least (100-p) percent of the observations are greater than or equal to this value. The 50th percentile is the median.

Qualitative Variable

a variable based on categorical data.

Continuous Variable

a variable that can take on many different values, in theory, any value between the lowest and highest points on the measurement scale.

Independent Variable

a variable that is manipulated, measured, or selected by the researcher as an antecedent condition to an observed behavior. In a hypothesized cause-and-effect relationship, the independent variable is the cause and the dependent variable is the outcome or effect.

Dependent Variable

a variable that is not under the experimenter's control -- the data. It is the variable that is observed and measured in response to the independent variable.

Discrete Variable

a variable with a limited number of values (e.g., gender (male/female), college class (freshman/sophomore/junior/senior)).

Mean

affected by outliers ○ If a constant is added to (or subtracted) from every score in a distribution, the mean is increased (or decreased) by that constant ○ If every score is multiplied (or divided) by a constant, the mean will be multiplied (or divided) by that constant ○ The sum of the deviations from the mean will always equal zero ○ The sum of the squared deviations from the mean will be less than the sum of squared deviations around any other point in the distribution Symbolized in population by μ ("mu") Sample: M or "X bar"

For the data set [1, 3, 3, 5, 5, 5, 7, 7, 9], the value "5" is:

all of these choices: mode, median, mean
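
This answer can be confirmed with Python's standard `statistics` module, shown here as a quick check rather than as part of the original card.

```python
from statistics import mean, median, mode

data = [1, 3, 3, 5, 5, 5, 7, 7, 9]

# All three measures of central tendency come out to 5 for this data set.
print(mean(data), median(data), mode(data))
```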

Categorical data

also referred to as frequency or qualitative data. Things are grouped according to some common property(ies) and the number of members of the group are recorded (e.g., males/females, vehicle type).

five number summary

an exploratory data analysis technique that uses five numbers to summarize the data: smallest value, first quartile, median, third quartile, and largest value.

Continuous

an infinite number of values and there are no gaps between adjacent values [time] Ratio and Interval

diagram in which occurrence frequency of different values of X is represented by height

bar graph

A normal distribution must

be symmetric.

Which of the following is the least important characteristic of graphics?

beauty

real lower limit

boundary halfway between the bottom of one interval and the top of the next

real upper limit

boundary halfway between the top of one interval and the bottom of the next

The "real lower limit" of an interval in a histogram is

c. the lowest continuous value that would be rounded up into that interval.

advantages of mean

can be manipulated algebraically, can use in an equation and manipulate it through the normal rules of algebra; sample means that resulted would be more stable; the sample mean is in general a better estimate of the population mean than is the mode or the median

unimodal

characteristic of distribution having one distinct peak

symmetry

characteristic of having the same shape on both sides of the center

Frequency Polygon

connects the dots from the histogram

grouped data

data available in class intervals as summarized by a frequency distribution. Individual values of the original data are not available.

A scheme for distinguishing and choosing among statistical procedures is called a

decision tree.

disadvantages of mode

depends on how we group our data; may not represent entire collection of numbers especially when modal value is zero;

Descriptive Statistics

describe the data that you are looking at; for a sample and calculate inferential statistics to make inferences of the population

Histogram

diagram in which rectangles are used to represent recurrence of observations within each interval

line graph

diagram in which the Y values corresponding to different values of X are connected

Mode

distinguish multimodal from unimodal distribution Unreliable, but the only measure of central tendency for nominal scales

relationships of mean, median, and mode

if the distribution is symmetric: mean = median; if symmetric and unimodal: all three are the same

Sample form

divided by (n-1)

Population form

divided by n

disadvantages of median

does not enter readily into equations; not as stable from sample to sample as is the mean

Outliers are

extreme or unusual values.

Data

facts, observations, and information that come from investigations.

SPSS

for calculating central tendencies on a larger scale; analyze/descriptive statistics/explore to obtain the mean median and to produce a histogram and look for the most frequently appearing interval; for the histogram select graphs/legacy dialogs/ histogram;

stem-and-leaf display

graphic presenting original data arranged into a histogram

Correlational research

groups selected by the researcher; be cautious about making causal conclusions ○ The difference between the two - in correlational research, you cannot talk about causality

A negatively skewed distribution

has a tail pointing to the left.

A figure that plots various values of the dependent variable on the X axis and the frequencies on the Y axis is called a _____.

histogram—though some people also refer to it as a frequency distribution

disadvantages of mean

influenced by extreme scores; its value may not actually exist in the data, its interpretation in terms of the underlying variable being measured requires at least some faith in the interval properties of the data;

prior probabilities

initial estimates of the probabilities of events

The primary purpose of plotting data is to make them___

interpretable

Which of the following is not an advantage of the median?

it can be manipulated algebraically

leading digit

leftmost numeral of a number

Interquartile Range (IQR)

Provides a measure of the spread of the middle 50% of the scores. The IQR is defined as the 75th percentile minus the 25th percentile. The interquartile range plays an important role in the graphical method known as the boxplot. The advantage of using the IQR is that it is easy to compute and extreme scores in the distribution have much less impact, but its strength is also a weakness: it suffers as a measure of variability because it discards too much data. Researchers want to study variability while eliminating scores that are likely to be accidents. The boxplot allows for this distinction and is an important tool for exploring data.

Skewness

Refers to the degree of asymmetry in a distribution. Asymmetry often reflects extreme scores in a distribution.

Variance

The variance is a measure based on the deviations of individual scores from the mean. As noted in the definition of the mean, however, simply summing the deviations will result in a value of 0. To get around this problem, the variance is based on squared deviations of scores about the mean. When the deviations are squared, the rank order and relative distance of scores in the distribution are preserved while negative values are eliminated. Then, to control for the number of subjects in the distribution, the sum of the squared deviations, Σ(X - Xbar)², is divided by N (population) or by N - 1 (sample). The result is the average of the squared deviations, and it is called the variance.

Histogram

a form of bar graph used with interval or ratio-scaled data. Unlike the bar graph, bars in a histogram touch, with the width of the bars defined by the upper and lower limits of the interval. The measurement scale is continuous, so the lower limit of any one interval is also the upper limit of the previous interval.

Scatterplot

a form of graph that presents information from a bivariate distribution. In a scatterplot, each subject in an experimental study is represented by a single point in two-dimensional space. The underlying scale of measurement for both variables is continuous (measurement data). This is one of the most useful techniques for gaining insight into the relationship between two variables.

Bar graph

a form of graph that uses bars separated by an arbitrary amount of space to represent how often elements within a category occur. The higher the bar, the higher the frequency of occurrence. The underlying measurement scale is discrete (nominal or ordinal-scale data), not continuous.

Boxplot

a graphical representation of dispersion and extreme scores. Represented in this graphic are minimum, maximum, and quartile scores in the form of a box with "whiskers." The box includes the range of scores falling into the middle 50% of the distribution (interquartile range = 75th percentile - 25th percentile), and the whiskers are lines extended to the minimum and maximum scores in the distribution or to mathematically defined (+/- 1.5 * IQR) upper and lower fences.

Quantitative Variable

a variable based on quantitative data.

frequency distribution

distribution in which the values of the dependent variable are tabled or plotted against their frequency of occurrence

Cumulative Percentage polygon

one can estimate the percentile rank by simply looking at the graph

Percentile

percentiles are upper real limits of a category. Thus, the 95th percentile is X=4.5, the upper real limit of the X=4 category.

Variable

property of an object or event that can take on different values. For example, college major is a variable that takes on values like mathematics, computer science, English, psychology, etc.

Parametric

statistical techniques that deal with looking at population parameters and their distributions of values

If the mean score of test #1 was 80.00 in section 01 with 20 students, 70.00 in section 02 with 15 students and 50.00 in section 03 with 40 students, what is the mean score of all students in all three sections? Round your answer to the nearest 2 decimal places.

sum of mean*n = 4650; sum of n = 75; 4650 / 75 = 62.00. Weighted mean = sum of (n * mean) divided by sum of n, which is 4650 / 75 in this example. Below is how I answered the question with Excel.
Section       n     mean    n * mean
Section 01    20    80      1600
Section 02    15    70      1050
Section 03    40    50      2000
Sums          75    200     4650
Weighted mean = 4650 / 75 = 62

What is the mean of the following frequency distribution? Round your answer to the nearest 2 decimal places. X = 2, f = 5; X = 3, f = 6; X = 4, f = 4

sum of xf = 44 sum of f = 15 44 / 15 = 2.93
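
The mean of a frequency distribution and the weighted mean of the section scores on the previous card are the same calculation. A minimal sketch, with `weighted_mean` as a hypothetical helper name:

```python
def weighted_mean(values, weights):
    """Sum of (value * weight) divided by the sum of the weights."""
    total = sum(v * w for v, w in zip(values, weights))
    return total / sum(weights)

# Frequency distribution from this card: each X is weighted by its frequency f.
print(round(weighted_mean([2, 3, 4], [5, 6, 4]), 2))        # 2.93

# Section means from the previous card: each mean is weighted by its n.
print(round(weighted_mean([80, 70, 50], [20, 15, 40]), 2))  # 62.0
```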

The notation "Sigma" refers to _____.

summation

The Cumulative Frequency Distribution

sums the frequencies at and below a particular value; determine the number of scores at or below your score

trimmed mean

take one or more of the largest and smallest values in the sample, set them aside, and take the mean of what remains; always discard the same percentage of the scores from each end of the distribution (if 10% trimmed then 10% from both sides);
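
The trimming procedure, together with the Winsorized sample described on an earlier card, can be sketched like this. The scores are made up and the function names are hypothetical.

```python
def trimmed(scores, proportion):
    """Drop the same proportion of scores from each end of the ordered data."""
    ordered = sorted(scores)
    k = int(len(ordered) * proportion)      # number of scores removed per end
    return ordered[k:len(ordered) - k]

def winsorized(scores, proportion):
    """Replace the trimmed scores with the nearest values that remain."""
    kept = trimmed(scores, proportion)
    k = (len(scores) - len(kept)) // 2
    return [kept[0]] * k + kept + [kept[-1]] * k

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]     # made-up scores with one extreme value
kept = trimmed(data, 0.10)
print(sum(data) / len(data))                # ordinary mean: 14.5
print(sum(kept) / len(kept))                # 10% trimmed mean: 5.5
print(winsorized(data, 0.10))               # [2, 2, 3, 4, 5, 6, 7, 8, 9, 9]
```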

Inferential Statistics

takes the numbers to make inferences to a big population

If you created a stem-and-leaf display of the math SAT scores of all entering students in a large Midwestern state university, the stem would best be:

the numbers 2 through 8.

Parameters

the numbers that we calculate or measure from a population; noted as Greek letter; characteristic of population example: population mean is a type of parameter

Percentile rank

the percentile rank of X=3.5 is 70% (note that 3.5 is the upper real limit of the category where X=3)

median location

the position in an ordered distribution occupied by the median; median location of N numbers= (N+1)/2

conditional probability

the probability of an event given that another event already occurred
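
A conditional probability is the joint probability divided by the probability of the given event. The numbers below are made up for illustration, chosen so that the result matches the .20 snow example on an earlier card.

```python
from fractions import Fraction

# Made-up probabilities for one day's weather.
p_snow_and_cold = Fraction(1, 10)    # joint probability: snow AND below 32 degrees
p_cold = Fraction(1, 2)              # marginal probability: below 32 degrees

# p(snow | below 32) = p(snow and below 32) / p(below 32)
p_snow_given_cold = p_snow_and_cold / p_cold
print(p_snow_given_cold)             # 1/5
```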

joint probability

the probability of two events both occurring, that is, the probability of the intersection of two events

The endpoints of an interval are called _____.

the real upper (and lower) limits

Central limit theorem

the sampling distribution of the mean of any independent, random variable will be normal or nearly normal, if the sample size is large enough

Median

the second quartile (the quartiles divide the distribution into 4 parts); the 50th percentile; not affected by outliers because it does not rely on extreme scores

sample space

the set of all experimental outcomes

Cumulative Frequency Polygon

the shape of the graph is a common shape called an ogive

marginal probability

the values in the margins of a joint probability table that provide the probability of each event separately

The optimal number of intervals for a histogram (and for a stem-and-leaf display) is _____.

whatever makes the figure show the most useful description of the data without creating too many or too few intervals

In deciding on the number of stems to use in a stem and leaf display,

you should normally make all of the stems the same width.

A _____ represents the number of standard deviations above or below the mean.

z score

Skewed distribution

§ The mean is pulled in the direction of the skew, so it is the most misleading measure § Use the median and SIQR as the central tendency

