Stat 250 Exam #1
Confusing Questions
"How many times a week do you exercise?"
Random Sample
"Picking from a hat" 1. Each member of the population is equally likely to be chosen (probability) 2. The members of the sample are chosen independently of one another
Spread
"how far" apart are the data? Smallest - largest data point
We say that the design of a study is biased if which of the following is true? A. Certain outcomes are systematically favored B. The correlation is greater than 1 or less than -1 C. The results are different than we expected D. Random placebos have been used
A
White blood count of every 4th person entering an emergency room A. Sample B. Population
A
Mode
The value that occurs most often. One value that has the greatest frequency: UNIMODAL Two values: BIMODAL All data only occurs once: No Mode
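The mode rules above can be sketched in Python. This is only an illustration: the function name `modes` and the sample lists are made up for this example.

```python
from collections import Counter

def modes(values):
    """Return the most frequent value(s); an empty list means no mode."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        # Every value occurs exactly once, so there is no mode.
        return []
    return sorted(v for v, c in counts.items() if c == top)

print(modes([1, 2, 2, 3]))     # one value with the greatest frequency: unimodal
print(modes([1, 1, 2, 2, 3]))  # two values tie: bimodal
print(modes([1, 2, 3]))        # all values occur once: no mode
```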
Dependent/Response Variable
The variable that measures the effect of the value of the explanatory variable (the outcome): the dependent variable
Independent/Explanatory Variable
The variable to which the researcher assigns values in the study: the independent variable (factors)
Median (MD)
The ½ way point (equal areas) Step 1: Put data points in order Step 2: Count how many data points there are. Step 3: If n is odd the middle is the median If n is even take the average of the middle 2
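The three steps can be written as a short Python function. This is a minimal sketch; the name `median_of` and the example values are hypothetical.

```python
def median_of(values):
    """Median by the three steps: order, count, take the middle."""
    ordered = sorted(values)          # Step 1: put the data in order
    n = len(ordered)                  # Step 2: count the data points
    mid = n // 2
    if n % 2 == 1:                    # Step 3a: n odd, the middle value
        return ordered[mid]
    # Step 3b: n even, average the middle two values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([7, 1, 4]))      # odd n
print(median_of([7, 1, 4, 10]))  # even n
```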
Convenience Sample
Uses results that are readily available
Box & Whisker Plot
5 Number Summary •Low, Quartile 1, Median, Quartile 3, Highest •Modified (Always Use!) •Test for outliers (1.5 IQR) •Measure of Spread: InterQuartile Range (Q3-Q1)
Average salary for 35 of a pharmaceutical company's 1000 employees is $88,000 A. Statistic B. Parameter
A
During the past four years the average enrollment in Statistics at PSU was 269.5 per year. A. Descriptive B. Inferential
A
Identify the variable Blood Type (i.e. A, B, AB...) as: A. qualitative B. quantitative, discrete C. quantitative, continuous D. None of these
A
Identify the variable zip codes as: A. qualitative B. quantitative, discrete C. quantitative, continuous D. None of these
A
John waited in the rain outside Franco and spoke with every 5th student entering the building. What kind of sampling method is he using? A. Systematic B. Stratified C. Cluster D. Simple Random
A
The cholesterol levels of 20 patients in a hospital with 100 patients A. Sample B. Population
A
Non-Response Bias
When an individual chosen for the sample can't be contacted or does not want to be in the sample Example: In a phone interview, even with several call backs, often 30% or more are not reached!
Recall Ability
When asked to recall past events one will often underestimate the # of occurrences Example: "Have you visited a dentist in the last 6 months" - often will say yes who last visited 8 months ago
Frequency Polygon (Grouped)
X-axis is based on midpoints of each class A line connects the points based on frequencies Always connect the graph back to the x-axis
A researcher followed the diet and health habits of 500,000 Americans ages 50-71 over a 10 year period and found that those who ate the most red meat had about a 20% higher death rate from cancer and heart disease than those who consumed the least red meat. This is said to be A. An observational study B. An experiment with a control group C. An experiment without a control group D. A census
a
Identify the variable time it takes for a drug to be washed out of the body as: A. Qualitative Continuous B. Qualitative Discrete C. Quantitative Continuous D. Quantitative Discrete
c
If our quantitative data set had many repeated values and a small range we would most likely use A. Histogram B. Ogive C. Stem and Leaf Plot D. Pie Graph
a
In which case would a stratified random sample be preferred? A. A researcher wishes to examine the effect that a drug has on both males and females B. A researcher wishes to examine the effectiveness of a certain pregnancy test C. A researcher wishes to test the effects of alcohol on 21 year old males D. A researcher wishes to select various physicians and interview all patients of those physicians
a
It is not important to keep the width of each class the same in a frequency distribution. A. True B. False
b
The type of graph used to represent data is determined by the type of data collected and by the researcher's purpose. A. True B. False
a
Which of the following indicates how many times every value in a distribution appears? A. Frequency B. Relative Frequency C. Cumulative Frequency D. Relative Cumulative Frequency
a
Lengths of fish (in inches): 8, 9, 9, 9, 10, 10, 11, 11, 12, 12, 12, 12, 13, 13, 13, 14, 14, 15, 15, 16, 24. The data above list the lengths, to the nearest inch, of a random sample of 21 brown bullhead fish. The outlier measurement of 24 inches is an error. Of the mean, median, and range of the values listed, which will change the most if the 24-inch measurement is removed from the data? A. Range B. Mean C. Median D. Mode
a
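The answer to the fish question can be checked numerically. The sketch below uses the 21 lengths from the question and Python's standard `statistics` module; the helper `value_range` is a name introduced here for illustration.

```python
from statistics import mean, median

lengths = [8, 9, 9, 9, 10, 10, 11, 11, 12, 12, 12, 12,
           13, 13, 13, 14, 14, 15, 15, 16, 24]
without_outlier = [x for x in lengths if x != 24]

def value_range(values):
    """Range = highest value minus lowest value."""
    return max(values) - min(values)

# Absolute change in each measure when the 24-inch value is removed.
changes = {
    "mean": abs(mean(lengths) - mean(without_outlier)),
    "median": abs(median(lengths) - median(without_outlier)),
    "range": abs(value_range(lengths) - value_range(without_outlier)),
}
for name, change in changes.items():
    print(f"{name} changes by {change:.3f}")
```

The range drops from 16 to 8, the mean moves by a little over half an inch, and the median stays at 12, so the range changes the most.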
Biased
A study design is biased if it systematically favors certain outcomes (it skews all of the data points). This occurs if the study targets specific people who will answer a certain way because of how they were chosen.
Placebo
a dummy pill that looked and tasted like the aspirin but had no active ingredients
A graph that retains the actual data while showing it in graphical form is a: A. Pareto Chart B. Ogive C. Histogram D. Stem and Leaf Plot
d
A random sample of 10 patients in a certain hospital reported that the proportion of individuals with Blood Type A was 0.34. The true unknown proportion of patients with Blood Type A is a A. Sample B. Statistic C. Population D. Parameter
d
Understanding z scores
z scores have a mean of 0 and a standard deviation of 1. •A z score is the number of standard deviations a value is away from the mean for a specific distribution. Whenever a value is less than the mean, its corresponding z score is negative
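As a small illustration, a z score is computed as (value - mean) / standard deviation. The function name and the numbers below (mean 75, standard deviation 5) are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / sd

print(z_score(85, 75, 5))  # above the mean: positive z score
print(z_score(70, 75, 5))  # below the mean: negative z score
```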
Levels
many experiments have several factors but vary values (levels) of each factor
If skewed or have outliers do not use mean, use ____
median
Non-sampling Error
occurs when the data are obtained erroneously or the sample is biased
SRS
of size "n" consists of "n" individuals from the population chosen in such a way that every set of "n" individuals has an equal chance to be the sample actually selected. Each individual AND each sample has equal chance Example: We have ___ people in our class (the entire population). We want to determine the most common major of those students in Stat 250
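A simple random sample can be sketched with Python's standard `random` module, whose `sample` method draws without replacement so that every set of n individuals is equally likely. The class list of 30 students and the function name are assumptions made for this example.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n individuals without replacement, all sets of size n equally likely."""
    rng = random.Random(seed)  # seed is optional, used only for repeatability
    return rng.sample(population, n)

# Hypothetical class roster standing in for the entire population.
students = [f"student_{i}" for i in range(1, 31)]
print(simple_random_sample(students, 5, seed=250))
```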
class boundaries
one more decimal place than the original data
population
parameter
Exploring Data
patterns and departures from patterns
Sampling & Experimentation
planning and conducting a study
identify the variable of average yearly rainfall (in inches) for reading Pa as
quantitative continuous
frequency distribution
raw data organized into a table using classes and frequencies
Symmetric Distribution
right and left sides of the graph are approximately mirror images of each other. Uniform
Skewed Right/Positive Distribution
right side of the graph extends much further out than the left side
sample
statistics
Placebo Effect
subjects who know they are part of a study actually changed their behavior (not from the treatment) in ways that affected the results of the study (aka trust in the doctor)
lower class boundaries
subtracting ½ unit from the lower class limit of each class
Center
the "middle" of the data set Mode Median Mean
Mean
the arithmetic average of the data points (the balance point): the sum of the values divided by the number of values
Descriptive
the branch of statistics involving organizing, summarizing and displaying data
Inferential
the branch of statistics that involves using a sample to draw conclusions about a population
Accuracy
The center: how closely a measurement comes to measuring a "true value," since measurements are always subject to error
Population
the collection of ALL possible outcomes, responses, or measurements of interest
class width
the difference between the lower and upper boundaries for the same class; it is constant throughout the frequency distribution
Sampling Error
the difference between the results obtained from a sample and the results from the population
Population
the entire group of individuals who you want to know something about
Factor
the explanatory variables; independent
Treatment Group
the group that gets an active ingredient
Control Group
the group that gets no active ingredient/treatment
Sampling Frame
the list of possible subjects who could be selected in a sample (bias if not equal to the population) Example: 1936 Literary Digest Magazine conducted a huge sample of voters to determine if Roosevelt or Landon would win the presidential election.
Design
the method used to choose the sample from the population
z score
the number of standard deviations a value is away from the mean for a specific distribution
class limits
the same number of decimal places as the original data
Precision
The spread: how closely repeated measurements come to duplicating measured values
standard deviation
the square root of the variance. Sample: s. Population: σ
Response Variables
the study looked for heart attacks, several kinds of cancer, and other medical outcomes
Statistic
a numerical value that describes a sample (it changes from sample to sample and is used to estimate a parameter)
parameter
a numerical value that describes the population (its actual value is usually unknown)
Confounding (Lurking) Variables
variables that influence the dependent variable but can't be separated from the independent variable
Shape
what the data set "looks" like
Distribution
what values the variable takes and how often it takes these values. "Pattern of variation" is the distribution
The Scientific Method
1. Collect Data 2. Analyze Data 3. Probability (Chance) 4. Inference (Conclusions)
3 Principles of Experimental Design
1. Comparison (Control) 2. Randomization 3. Replication
Finding Quartiles
1. Order the data from lowest to highest 2. Find the median or Q2 3. Q1 is the median of the data values less than Q2 4. Q3 is the median of the data values more than Q2
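The quartile steps can be sketched as follows. One assumption is made loudly here: "values less than Q2" is read by position, so for an odd number of data points the median itself is left out of both halves. The function name `quartiles` and the sample data are hypothetical, and the sketch needs at least two data points.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves method."""
    ordered = sorted(values)                 # Step 1: order the data
    n = len(ordered)
    half = n // 2
    q2 = median(ordered)                     # Step 2: the median is Q2
    lower = ordered[:half]                   # values below the median position
    # For odd n, skip the middle value so it is in neither half.
    upper = ordered[half + 1:] if n % 2 == 1 else ordered[half:]
    return median(lower), q2, median(upper)  # Steps 3 and 4

print(quartiles([1, 3, 5, 7, 9, 11, 13]))
print(quartiles([1, 2, 3, 4, 5, 6, 7, 8]))
```

Other textbooks and software use slightly different quartile conventions, so results can differ a little from this method.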
Statistics
A branch of mathematics dealing with the collection, analysis, interpretation, and presentation of masses of numerical data "the science of Data"
Sample
A certain number of people surveyed out of the entire population
Dot Plot(ungrouped)
A listing of each data value on the x-axis with a dot placed above each value for each frequency.
Cluster Sampling
A more economical procedure (logistical) Divide the population into similar groups. Takes an SRS of those groups and samples all individuals in that group Example: Urban Areas: City Blocks
Leading Questions
A question that is worded such that the response may not truly represent the respondent's opinion. Example: How do Americans feel about government spending too much on...
What is the main distinction between a population parameter and sample statistic?
A sample statistic changes each time you try to measure it, but a population parameter remains fixed
Controlled Experiment
A study in which groups receive different treatments whose effects can be compared (often to the status quo)
Survey
A study in which the researcher gathers data by asking for responses from subjects. Controlled by design (variables and levels of variables are chosen). Random selection of sampling units. More useful, but it is hard to establish cause-and-effect. Example: The U.S. Bureau of Labor Statistics (BLS) routinely conducts surveys. Monthly they conduct the Current Population Survey, which establishes basic information on the labor force, employment and unemployment (Economics). Example: Decisions on which products to market, where to market them, and how to advertise them are often made on the basis of sample survey data (Marketing).
Experiment
A study in which the researcher imposes treatment(s) on the subjects (not just recording natural observations).
Designed Experiment
A study in which the researcher imposes treatments on the subjects Variables and levels are controlled Random assignments of treatments to experimental units Very useful and highest quality of data since Cause-and-Effect can be established Example: I want to know what type of plant feed will make bean plants grow the tallest (Horticulture).
Observational Study
A study in which the researcher observes behaviors of the subjects Future Data Controlled in the sense of what data (variables) are collected Example: I want to know if individuals who eat salad are more or less likely to drink soda while eating (Health Sciences)
Stem & Leaf Plot (ungrouped)
A vertically ordered list of the left part of the data digits (or stem) and the right most digit of the data digits (called the leaf) listed horizontally and sequentially to the right. Advantage: It retains actual data while showing it in graphic form Remember a Key!
Biostatistics
Application of statistics to a wide range of topics in biology. The science of biostatistics encompasses the design of biological experiments, especially in medicine & agriculture
Empirical Rule
Approximately 68% of the data values fall within one standard deviation of the mean. Approximately 95% of the data values fall within two standard deviations of the mean. Approximately 99.7% of the data values fall within three standard deviations of the mean.
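To make the empirical rule concrete, the sketch below builds the one-, two-, and three-standard-deviation intervals. The mean of 100 and standard deviation of 15 are made-up values, and the function name is hypothetical.

```python
def empirical_rule_intervals(mean, sd):
    """Intervals that hold roughly 68%, 95%, and 99.7% of the data values."""
    return {
        label: (mean - k * sd, mean + k * sd)
        for k, label in ((1, "68%"), (2, "95%"), (3, "99.7%"))
    }

for label, (low, high) in empirical_rule_intervals(100, 15).items():
    print(f"about {label} of values fall between {low} and {high}")
```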
Variance
Averaged squared deviation. Sample: s^2. Population: σ^2
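A minimal sketch of both versions of the variance follows. It assumes the usual convention that the population variance divides the sum of squared deviations by N while the sample variance divides by n - 1; the function names and the data are made up for illustration.

```python
from math import sqrt

def squared_deviations(values):
    """Sum of squared deviations of each data point from the mean."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values)

def population_variance(values):
    return squared_deviations(values) / len(values)        # divide by N

def sample_variance(values):
    return squared_deviations(values) / (len(values) - 1)  # divide by n - 1

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean 5
print(population_variance(data), sqrt(population_variance(data)))
print(sample_variance(data), sqrt(sample_variance(data)))
```

The standard deviation in each case is the square root of the corresponding variance.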
62 of 97 people in a clinical trial had an adverse reaction to the prescription. A. Statistic B. Parameter
B
Age of each patient in a particular study A. Sample B. Population
B
Annual salary for each Biologist at a specific company A. Sample B. Population
B
Identify the variable number of patients seen by a physician in a single day as: A. qualitative B. quantitative, discrete C. quantitative, continuous D. None of these
B
In a recent year, the average MCAT score for all who took the test was 25.2 A. Statistic B. Parameter
B
PSU Berks can expect an enrollment of 270-290 students next year. A. Descriptive B. Inferential
B
PSU Berks will never have more than 295 students take a Statistics class. A. Descriptive B. Inferential
B
Sam used a random selection of six dorm students, 12 commuter students, and 3 off campus students. What kind of sampling method is he using? A. Simple Random B. Systematic C. Stratified D. Cluster
C
The enrollment in this course in 2013-2014 was low because the course was too difficult. A. Descriptive B. Inferential
B
Which of the following statements is correct regarding observational studies? A. In an observational study, a researcher can control but not observe the explanatory variables. B. In an observational study, a researcher can observe but not control the explanatory variables. C. In an observational study, a researcher can minimize but not eliminate the explanatory variables. D. In an observational study, a researcher can define but not observe the explanatory variables.
B
A new headache remedy is given to a group of 55 patients who suffer severe headaches. Of these, 40 report that the remedy is very helpful in treating their headaches. From this information you can correctly conclude: A. The remedy is effective for the treatment of headaches B. Nothing, because the sample size is too small C. Nothing, because there is no control group for comparison D. The new treatment is better than Aspirin
C
Identify the variable length of new born babies as: A. qualitative B. quantitative, discrete C. quantitative, continuous D. None of these
C
Which of the following are true statements? I. In an experiment some treatment is intentionally forced on one group to note the response. II. In an observational study information is gathered on an already existing situation. III. Sample surveys are observational studies, not experiments. A. I and II B. I and III C. II and III D. I, II and III
D
Sally was doing a survey for BioStat 250. She spoke with everyone in the second floor of Ivy. What kind of sample method is she using? A. Simple Random B. Systematic C. Stratified D. Cluster
D
Two studies are run to compare the effects of two different weight loss programs. The first study interviews 25 people who have been using each of the programs for at least 3 months. The second study randomly assigns 25 people to each program and interviews them after 1 year. Which of the following is true? A. Both studies are observational studies because of the time period involved B. Both studies are observational studies because there are no control groups C. The first study is an experiment, while the second is an observational study D. The first study is an observational study, while the second is an experiment
D
Which of the following are true statements? I. If bias is present in a sampling procedure, it can be overcome by dramatically increasing the sample size. II. There is no such thing as a "bad sample". III. Sampling techniques that use probability techniques effectively eliminate bias. A. I only B. II only C. III only D. None of the statements are true
D
Which of the following is a false statement? A. Non-response can cause bias in surveys because non-respondents often tend to behave differently from people who respond. B. Slight changes in the wording of questions can make a measurable difference to survey results. C. People will sometimes answer a question differently for different interviewers. D. Sophisticated statistical methods can always correct the results if the population you are sampling from is different from the population of interest, e.g. due to undercoverage.
D
Retrospective Studies
Data has already been recorded (vs. prospective) Uncontrolled Generally a lot of confounding Very limited usefulness Example: Body temperature for all patients entering a doctor's office last year (Medical)
Interviewer Flaws
Demeanor and/or appearance of interviewer - Race or sex could influence responses - Attitude/presentation of interviewer Example: 1960's interview about integration. Four interviewers went into an affluent white neighborhood. White Business Man, White Scruffy Man, Black Scruffy Man, Black Business Man
Construct a Grouped Frequency Distribution
Determine Classes High/Low ▫Range ▫How many classes? ▫Width= Range/#classes (round up) ▫Find lower/upper class limits ▫Find class boundaries •Tally •Frequencies
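The construction steps can be sketched in Python. Two assumptions are made for this sketch: the data are whole numbers (so one unit is 1 and the boundaries sit half a unit outside the limits), and the width is always rounded up to the next whole number, even when range / #classes divides evenly, so that the highest value is covered. The function name and the scores are hypothetical.

```python
def grouped_frequency(data, num_classes):
    """Build classes for whole-number data and tally frequencies."""
    low, high = min(data), max(data)
    # Width = range / #classes, rounded up to the next whole number.
    width = (high - low) // num_classes + 1
    table = []
    for i in range(num_classes):
        lower = low + i * width                 # lower class limit
        upper = lower + width - 1               # upper class limit
        freq = sum(lower <= x <= upper for x in data)  # tally
        # Boundaries are half a unit beyond the limits.
        table.append((lower, upper, lower - 0.5, upper + 0.5, freq))
    return width, table

scores = [71, 74, 78, 80, 83, 85, 88, 90, 92, 95, 98]  # made-up scores
width, table = grouped_frequency(scores, 6)
print("class width:", width)
for lower, upper, lb, ub, freq in table:
    print(f"{lower}-{upper}  boundaries {lb}-{ub}  frequency {freq}")
```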
Stratified Random Sample
Divides population into similar groups called "strata" and then chooses a SRS in each stratum and combines these SRS's for the full sample The data collector has prior knowledge and uses this to select the sample Proportional allocations (guarantees representation) Example: Divide class into 2 Strata Males and Females
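A stratified random sample with proportional allocation might look like the sketch below. The strata sizes (40 males, 60 females), the 10% sampling fraction, and the function name are all assumptions made for illustration.

```python
import random

def stratified_sample(strata, fraction, seed=None):
    """Take an SRS of the same fraction from each stratum and combine them."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        k = round(len(members) * fraction)      # proportional allocation
        sample.extend(rng.sample(members, k))   # SRS within the stratum
    return sample

# Hypothetical class split into two strata.
strata = {
    "male": [("male", i) for i in range(40)],
    "female": [("female", i) for i in range(60)],
}
chosen = stratified_sample(strata, 0.10, seed=1)
print(len(chosen), "students chosen")
```

Because each stratum contributes in proportion to its size, both groups are guaranteed to be represented.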
Measure of position
Fractiles- divides ordered data into equal parts •Median divides into 2 equal parts •Quartiles- divides into 4 equal parts •Calculate •Quartile 1 (Low) •Quartile 2 (Median) •Quartile 3 (Upper) •Percentiles- divides into 100 equal parts
Ogives (Grouped)
First find cumulative frequencies. X-axis is based on Class Boundaries. Plot Cumulative frequencies and connect. Always will be monotonically increasing
Compare grouped vs ungrouped standard deviations and means
Grouped standard deviations and means are approximations and do not give the true mean or standard deviation. Ungrouped standard deviations and means are exact and accurate representations of these
Range (R)
Highest - Lowest
Histograms
The horizontal axis lists the group ranges. The vertical axis displays the frequency for each group or class. The x-axis uses class boundaries as cutoff points. The choice of the number of classes or bins influences the final graph
Data
Information with a "CONTEXT"...coming from measurements or responses
Standard Deviation
Most Common Measure of Spread
InterQuartile Range/5 # Summary
Outliers are: any data value larger than Q3 + 1.5 (IQR), or any data value smaller than Q1 - 1.5 (IQR)
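The 1.5 IQR test can be sketched as a pair of small functions. The quartiles Q1 = 10 and Q3 = 14 and the data values are assumed example inputs, and the function names are hypothetical.

```python
def outlier_fences(q1, q3):
    """Lower and upper cutoffs: Q1 - 1.5(IQR) and Q3 + 1.5(IQR)."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """Values that fall outside the fences."""
    low, high = outlier_fences(q1, q3)
    return [x for x in values if x < low or x > high]

print(outlier_fences(10, 14))
print(find_outliers([8, 12, 16, 24], 10, 14))
```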
Voluntary Response Sample
People who choose themselves by responding to an appeal or an advertisement Example: Last month, Ann Landers asked the following question in her column. "If you had to do it over again, would you have children?"
Cumulative Relative frequency
Percentage of the data values in a class plus the percentage for all lower classes
Systematic Sampling
Population of interest is available in a list. Draw one data point from the beginning of a list and then select every nth data point thereafter Examples: Voter Registration Select every 3rd from a list of patients
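Systematic sampling can be sketched with list slicing. This version assumes the starting point is chosen at random from the first k entries of the list; the function name and the numbered patient list are made up.

```python
import random

def systematic_sample(items, k, seed=None):
    """Pick a random start among the first k items, then every kth item."""
    rng = random.Random(seed)
    start = rng.randrange(k)   # index 0 .. k-1
    return items[start::k]

patients = list(range(1, 21))  # hypothetical list of 20 numbered patients
print(systematic_sample(patients, 5, seed=3))
```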
Response Bias
Respondents may lie if asked something they are embarrassed to admit Example: stealing something or report weight
Potential Sources of Bias
Sampling Frame / Undercoverage Response Bias Non-Response Bias Interviewer Flaws
SOCS
Shape Outliers Center (Accuracy) Spread (Precision)
Undercoverage
Some group of the population is left out of the process when choosing the sample. Example: an opinion poll conducted by phone will miss...Amish, College Kids, Homeless, Inmates, 50-60% homes without phones
Coefficient of Variation
Standard deviation divided by the mean times 100
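As a quick illustration of the formula, with a made-up standard deviation of 2 and mean of 5 (the function name is hypothetical):

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation as a percentage of the mean."""
    return 100 * sd / mean

print(coefficient_of_variation(2, 5))  # spread is 40% of the mean
```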
Census
Studying ALL subjects of the population of interest
Sampling
Studying a part in order to gain information about the whole
Although a research study is typically conducted with a relatively small group of _______ known as a _______, most researchers hope to generalize their results to the much larger group known as the _______.
Subjects sample population
Cumulative frequency
Tally or count of the number of data values in a class plus the frequencies for all lower classes
Frequency ( f )
Tally or count of the number of data values in each class
Relative frequency ( f/n ):
Tally or count of the number of data values in each class divided by the total number of data values
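Frequency, relative frequency, and their cumulative versions can be computed together. The class frequencies 3, 5, 2 and the function name below are made up for this sketch.

```python
from itertools import accumulate

def frequency_table(frequencies):
    """Relative, cumulative, and cumulative relative frequencies per class."""
    n = sum(frequencies)                          # total number of data values
    relative = [f / n for f in frequencies]       # f / n
    cumulative = list(accumulate(frequencies))    # class plus all lower classes
    cumulative_relative = [c / n for c in cumulative]
    return relative, cumulative, cumulative_relative

relative, cumulative, cumulative_relative = frequency_table([3, 5, 2])
print(relative)
print(cumulative)
print(cumulative_relative)
```

The last cumulative frequency always equals n, and the last cumulative relative frequency equals 1.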
Randomization
The use of chance to divide experimental subjects into groups
Midrange (MR)
The sum of the lowest and highest divided by 2.
Probability Sample
a sample chosen by chance
Sample
a subset (part) of the whole population
When using a statistic to estimate a parameter, a measure of center is used to describe how _______ our statistic is, while a measure of spread is used to describe how _____ that estimate is.
accurate precise
upper class boundaries
adding ½ unit to the upper class limit of each class
Treatment
an experimental condition; something actually applied/given to the units/subjects
Outliers
an individual observation that falls outside of the overall pattern of the graph
Statistical Significance
an observed value so large that it would rarely occur by chance (variation so large that it can't be due to chance)
Inference
answer a specific question with a known degree of confidence
Discrete variables
assume values that can be counted.
__________ statistics is the branch of statistics involving organizing, summarizing and displaying data. _____________ statistics is the branch of statistics that involves using sample ___________ to draw conclusions about ____________ parameters. A. Inferential, descriptive, populations, statistic B. Descriptive, inferential, statistics, population C. Inferential, descriptive, statistics, population D. Descriptive, inferential, population, statistics
b
A statistics professor gives a 100 point test, with the highest score being 98 and the lowest score being 71. We want to divide this data into categories. Then, a reasonable width of categories could be A) 1 B) 5 C) 10 D) Do not know
b
Below are side by side plots for a sample of 25 cities in the US. The average high temperature in January and average high temperature in July are illustrated in the plots. Which of the following statements about the box plots is true? A. the median of July is less than the median of January B. The cities in this sample have more similar July high temperatures than January high temperatures. C. January high is strongly skewed left, where as July is approximately symmetrical D. Both January and July include clear outliers
b
Data such as blood types (A, B, AB and O) can be organized into a ______________ frequency distribution A. Grouped B. Categorical C. Ungrouped D. Not Sure
b
The purpose of stratified random sampling is to make certain that: A. every member of the population has an equal chance of being selected for the sample B. the sample proportionately represents individuals from different categories of the population C. The participants chosen for the study are the ones most likely to react to the treatment. D. The sample is more representative of the target population than the accessible population
b
The two dot plots below display frequency distributions of the height of a random selection of men and women. Which statement best describes the standard deviation of the two data sets. A. Men and women have very similar standard deviations B. Men have a larger standard deviation than women C. Women have a larger standard deviation than men D. Based on just looking at the graphs, we are unable to tell which gender has a larger standard deviation
b
Using the frequency distribution given below, which value will have the shortest rectangle if a histogram were created from the given data set? A) 5 Stops B) 10 Stops C) 15 Stops D) 20 Stops
b
We wish to draw a sample of size 5 without replacement from a population of 83 patients. Suppose the patients are numbered "01", "02"..."83". Suppose that the relevant line of the random number table is: 11369 25622 36237 90842 46843 62719 64049 17823 The 5 patients that would be selected are: A. 11, 13, 25, 62, 36 B. 11, 36, 56, 22, 23 C. 11, 36, 92, 56, 22 D. 11, 36, 56, 22, 36
b
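The random number table question can be traced step by step: read the digits two at a time, skip labels outside 01 to 83, and skip repeats because the sample is drawn without replacement. The function name below is hypothetical; the digits come from the question.

```python
def select_from_table(digits, population_size, sample_size, label_width=2):
    """Read fixed-width labels from a random digit line, skipping invalid ones."""
    digits = digits.replace(" ", "")
    chosen = []
    for i in range(0, len(digits) - label_width + 1, label_width):
        label = int(digits[i:i + label_width])
        # Keep labels that name a patient and have not been picked yet.
        if 1 <= label <= population_size and label not in chosen:
            chosen.append(label)
        if len(chosen) == sample_size:
            break
    return chosen

line = "11369 25622 36237 90842 46843 62719 64049 17823"
print(select_from_table(line, 83, 5))
```

Reading pairs gives 11, 36, 92 (skipped, above 83), 56, 22, 36 (skipped, repeat), 23, which matches choice B.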
What does the vertical scale on a histogram represent? A) It represents the possible values of the variable. B) It represents the frequencies for the corresponding value or range of values of the variable. C) It represents the title. D) There is no vertical scale on a histogram.
b
What graph should be used to show the relationship between the parts the whole? A. Histogram B. Pie/Circle Chart C. Pareto Chart D. Ogive
b
Which of the following is the MOST accurate definition of standard deviation A. the average of a set of scores B. the typical amount by which scores differ from the mean of a set of scores C. the average of the sum of squared deviations from the mean D. the mean absolute deviation of the sum of the squared deviation from the average
b
Which of the following statements is TRUE? A. Non-response can never cause bias in surveys because non-respondents often tend to behave similarly to the people who do respond B. Sophisticated statistical methods cannot correct the results if the population you are sampling from is different from the population of interest, e.g. due to undercoverage. C. Slight changes in the wording of questions can never make a difference in survey results. D. As long as interviewers ask the same question, it is not necessary for them to behave/dress in the same manner as people will never answer a question differently for different interviewers.
b
You record the height, weight, gender, blood type and body temperature of 100 subjects in a clinical trial. The number of qualitative variables that you have recorded is A. 100 B. Two- gender, blood type C. Three- height, weight, body temperature D. Four- height, weight, blood type and body temperature
b
The five-number summary for scores on a statistics exam is 16, 34, 56, 76, 99. In all, 100 students took the test. About how many students had scores between 56 and 99? A. 25 B. 50 C. 75 D. 100
b
In measuring the spread of the data from a skewed distribution, the interquartile range (IQR) would be preferred over the standard deviation for most purposes because: A. The IQR is the most frequent number while the standard deviation is most likely B. The standard deviation measures the center of the data C. The standard deviation may be too heavily influenced by the larger observations and this gives too high an indication of the spread D. The IQR is less than the standard deviation and smaller numbers are always appropriate for the spread
c
Which of the following is true regarding standard deviations? A. The numbers 6, 6, and 7 have a standard deviation of 0 B. The standard deviation is the average of the sum of squared deviations from the mean C. The numbers 3, 4, 5 have the same standard deviation as 29, 30, and 31 D. The standard deviation is a measure of spread around the median of the data
c
Focusing on describing or explaining data versus going beyond immediate data analysis in order to make inferences is the difference between ________________. A. Central tendency and spread B. Mutually exclusive and mutually exhaustive properties C. Descriptive and inferential statistics D. Skewed right and skewed left distributions
c
Scientists believe a drug can improve memory in the elderly. Five hundred patients are divided into two groups. All patients are over 80, male, in good health, with severe memory loss. Only one group will receive the drug, the other group will receive a placebo. In this study, the dependent variable is: A. The placebo group B. The drug C. Memory D. The 500 elderly men
c
Scientists believe a drug can improve memory in the elderly. Five hundred patients are divided into two groups. All patients are over 80, male, in good health, with severe memory loss. Only one group will receive the drug, the other group will receive a placebo. In this study, the group that receives the placebo is the _____. A. Experimental Group B. Dependent Group C. Control Group D. Independent Group
c
Suppose the category "bachelors degree" of a given data set has a relative frequency of .35. What is the "slice" size of the category "bachelors degree" for this data set? A. 90 degrees B. 35 degrees C. 126 degrees D. Cannot be determined
c
The five-number summary for scores on a Statistics exam is 11, 59, 75, 79, 93. In all, 80 students took the test. About how many students had scores between 11 and 79? A. 20 B. 40 C. 60 D. 75
c
Twenty-six samples of Romano-British pottery were found at different kiln sites in Wales, Gwent and the New Forest. The percentage of oxides of two metals, iron and magnesium (measured by atomic absorption spectrophotometry) are displayed in the boxplots below. The box plot for Iron: A. Has several outliers B. Skewed right C. Skewed left D. Symmetric
c
What are the boundaries for 8.6-8.8? A) 8-9 B) 8.5-9.5 C) 8.55-8.85 D) 8.65-8.75
c
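The boundary rule (subtract half a unit from the lower class limit, add half a unit to the upper class limit) can be checked on this question. The sketch uses Python's `decimal` module to avoid floating-point rounding; the function name is hypothetical, and the unit of 0.1 is inferred from the one-decimal class limits.

```python
from decimal import Decimal

def class_boundaries(lower_limit, upper_limit, unit):
    """Boundaries sit half a unit outside the class limits."""
    half = Decimal(unit) / 2
    return Decimal(lower_limit) - half, Decimal(upper_limit) + half

lower, upper = class_boundaries("8.6", "8.8", "0.1")
print(lower, upper)
```

The boundaries carry one more decimal place than the limits, as the class boundaries card states.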
Which of the following indicates the percent of times that each variable occurs in a distribution? A. Bar Chart B. Frequency C. Relative Frequency D. Cumulative Frequency
c
Which of the following is NOT correct about constructing frequency histograms? A) All class intervals should be of equal width. B) The bars of the histogram are centered over the class mark (midpoint). C) The first and last classes should be open-ended to account for extreme points. D) There should be no spaces between bars.
c
You record the height, weight, gender, blood type and body temperature of 100 subjects in a clinical trial. The number of quantitative variables that you have recorded is A. 100 B. Two- gender, blood type C. Three- height, weight, body temperature D. Four- height, weight, blood type and body temperature
c
Which of the following is a false statement about a simple random sample? A. Every element of the population has an equal chance of being selected B. Every sample of the desired size has an equal chance of being selected C. A sample must be large to be considered a simple random sample D. Attributes of a simple random sample are usually very similar to the attributes of the population
c
Continuous variables
can assume an infinite number of values between any two specific values. They often include fractions and decimals
Variable
characteristic or attribute that can assume different values (our point of interest)
Pie chart (qualitative)
circle divided into sections proportional to the percentage in each category. •Note: the degrees for a segment is the relative frequency for the segment times 360 degrees
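The degrees rule is a one-line calculation. The sketch below uses a relative frequency of 0.35 as an example input; the function name is hypothetical, and the result is rounded because of floating-point arithmetic.

```python
def slice_degrees(relative_frequency):
    """Central angle of a pie slice: relative frequency times 360 degrees."""
    return relative_frequency * 360

print(round(slice_degrees(0.35), 1))  # degrees for a category with f/n = 0.35
print(round(slice_degrees(0.25), 1))  # a quarter of the data
```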
Qualitative
consists of measurements for which mathematical operations do not make sense (categories).
Quantitative
consists of measurements that are numerical and mathematical operations can be performed.
Below are side by side plots for a sample of 25 cities in the US. The average high temperature in January and average high temperature in July are illustrated in the plots. Based on the box and whisker plot for july high, the best statistical measurements for center and spread are A. mean/SD B. Mean/IQR C. Median/ SD D. Median/ IQR
d
The median of 6 people in a meeting is 25 years. One of the people, whose age is 34 years, leaves the room. The new median is A. 25 B. 30 C. 40 D. It cannot be determined by the information provided
d
To avoid experimenter bias, neither the experimenter nor the participant is aware of which group the participant is in. This is known as: A. Placebo Effect B. Random Assignment C. Variable Manipulation D. Double Blind Study
d
Twenty-six samples of Romano-British pottery were found at different kiln sites in Wales, Gwent and the New Forest. The percentage of oxides of two metals, iron and magnesium (measured by atomic absorption spectrophotometry) are displayed in the boxplots below. Which of the following statements about the box plots is true? A. The median of Iron is less than the median of Magnesium B. The range of Iron is much larger than the range of Magnesium C. Both data sets have clear outliers D. More than 75% of Magnesium is smaller than the median of Iron
d
When analyzing qualitative data (ie. hair color, blood type), the best measure of central tendency to describe the data is A. mean B. midrange C. median D. mode
d
Which of the following techniques yields a simple random sample? A. Choosing volunteers from an introductory Biology class to participate. B. Listing the individuals by blood types and choosing a proportion from within each blood type at random. C. Randomly selecting hospitals, and then sampling everyone within those hospitals. D. Numbering all the elements of a sampling frame and then using a random number table to pick cases from the table.
d
Raw Data
data in the original form (right after being gathered and before being sorted)
Deviation
difference between data point and mean
Statistical Inference
estimating population parameters & testing hypotheses (conclusions)
Anticipating Patterns
exploring random phenomena using probability (chance) & simulation
Variability
how far "spread" out data is
Experimental Units
individuals on which experiment is done
Skewed Left/Negative Distribution
left side of the graph extends much further out than the right side
Measures of Spread
•Range •Variance/Standard Deviation •Coefficient of Variation •Empirical Rule •Z-scores •InterQuartile Range/5 # Summary