Bio Stats midterm
Pearson's measurement of skew
3(mean - median)/σ, where σ is the standard deviation
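A minimal sketch of the formula above, assuming the sample standard deviation (n - 1 denominator) is the one intended; the card does not say which. The function name and the data are made up for illustration.

```python
import statistics

def pearson_skew(values):
    """Pearson's measure of skew: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    sd = statistics.stdev(values)  # assumption: sample SD (n - 1 denominator)
    return 3 * (mean - median) / sd

# Made-up scores with one very large atypical value (right skew).
scores = [1, 2, 2, 3, 3, 3, 4, 20]
skew = pearson_skew(scores)

# A symmetric set has mean == median, so the measure is 0.
symmetric_skew = pearson_skew([1, 2, 3, 4, 5])
```

A positive result points to right skew (mean pulled above the median); a negative result points to left skew.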
Variance sum law
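-Variance of the sum or difference of two uncorrelated (independent) variables -The variance of x plus or minus y equals the variance of x plus the variance of y: σ²(x ± y) = σ²(x) + σ²(y) -Add the variances of the two different sets -If x and y are correlated, a third term is added or subtracted: σ²(x ± y) = σ²(x) + σ²(y) ± 2ρσ(x)σ(y), where ρ is the correlation (a decimal) and σ(x), σ(y) are the square roots of the two variances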
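The variance sum law can be checked numerically. The sketch below uses made-up data and population variances; the helper `pcov` is hypothetical and computes the covariance, which equals the correlation times the two standard deviations. When the variables are uncorrelated that term is zero and the variances simply add.

```python
import math
import statistics

def pcov(a, b):
    """Population covariance of two equal-length lists (hypothetical helper)."""
    mean_a = statistics.mean(a)
    mean_b = statistics.mean(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / len(a)

# Made-up paired data.
x = [2, 4, 6, 8]
y = [1, 3, 2, 6]

var_x = statistics.pvariance(x)
var_y = statistics.pvariance(y)
cov_xy = pcov(x, y)  # same as correlation * sd(x) * sd(y)

# Variance of the sum and of the difference, computed directly.
var_sum = statistics.pvariance([u + v for u, v in zip(x, y)])
var_diff = statistics.pvariance([u - v for u, v in zip(x, y)])

# The law: var(x +/- y) = var(x) + var(y) +/- 2 * cov(x, y)
sum_matches = math.isclose(var_sum, var_x + var_y + 2 * cov_xy)
diff_matches = math.isclose(var_diff, var_x + var_y - 2 * cov_xy)
```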
trimmed mean
-the mean computed after removing some of the higher and lower scores -remove a set of scores from the top and bottom and take the mean of the remaining scores -multiply total number of scores by the percent and then subtract from top and bottom
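A rough sketch of the trimming steps above. It assumes the percent is removed from each end of the ordered scores and that fractional counts are rounded down; conventions differ between textbooks, so treat both choices as assumptions. The function name and data are illustrative.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent` of the scores from EACH end (assumption)."""
    ordered = sorted(values)
    # Multiply the number of scores by the percent to get how many to drop.
    k = int(len(ordered) * percent / 100)
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("percent is too large: no scores left after trimming")
    return sum(kept) / len(kept)

# Made-up scores with one extreme value.
scores = [1, 2, 3, 4, 100, 5, 6, 7, 8, 9]
plain_mean = sum(scores) / len(scores)
trimmed = trimmed_mean(scores, 10)  # drops the lowest and highest score
```

Here the extreme value 100 is dropped, so the trimmed mean sits much closer to the bulk of the scores than the plain mean does.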
frequency polygon
-used for understanding the shapes of distributions -serve the same purpose as histograms but are more helpful for comparing sets of data -they're a good choice for displaying cumulative frequency distributions
(d) Right-skewed
1. A distribution that is elongated because of some very large atypical values is called (a) Symmetric (b) Normal (c) Left-skewed (d) Right-skewed
c. Left-skewed
1. A distribution that is elongated because of some very small atypical values is called a. Symmetric b. Normal c. Left-skewed d. Right-skewed
d. Right-skewed
1. A distribution that is elongated to the right because of some very large atypical values is called a. Symmetric b. Normal c. Left-skewed d. Right-skewed
(b) statistic
1. A number summarizing some feature of a sample is called a ____? (a) parameter (b) statistic (c) percentile (d) sample standard error (e) sample standard deviation
a. parameter
1. A summary value that represents a population characteristic is called a ____? a. parameter b. statistic c. percentile d. sample standard error e. sample standard deviation
a. parameter
1. An unknown value that represents a population value is called a ____? a. parameter b. statistic c. percentile d. sample standard error e. sample standard deviation
b) No
1. For two events, A and B, if Pr(A)=0.5, Pr(B)=0.5, and Pr(A ∪ B)=0.5, are A and B independent events? a) Yes b) No
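The reasoning behind "No" can be worked through in a few lines. This sketch uses the addition rule, Pr(A ∩ B) = Pr(A) + Pr(B) - Pr(A ∪ B), and then checks the independence condition Pr(A ∩ B) = Pr(A) × Pr(B). The variable names are illustrative.

```python
import math

p_a = 0.5
p_b = 0.5
p_a_or_b = 0.5

# Addition rule rearranged to get the intersection.
p_a_and_b = p_a + p_b - p_a_or_b

# Independence requires Pr(A and B) == Pr(A) * Pr(B).
product = p_a * p_b
independent = math.isclose(p_a_and_b, product)
```

The intersection works out to 0.5 while the product is 0.25, so the events are not independent.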
(c) Mode
1. In an ordered set (largest to smallest) of observations the most frequently occurring value is called the _______ ? (a) Mean (b) Median (c) Mode (d) Percentile (e) Geometric Mean
d. Interval
1. Temperature in degrees Fahrenheit is an example of a variable on the___ scale? a. Nominal b. Likert c. Ordinal d. Interval e. Ratio
b. Median
1. The 50th percentile is also called the? a. Mean b. Median c. Mode d. Range e. Geometric Mean
(c) Third quartile
1. The 75th percentile is also called the? (a) First quartile (b) Second quartile (c) Third quartile (d) Fourth quartile (e) Median
(a) individual to individual variability
1. The sample standard deviation and sample variance are measures of (a) individual to individual variability (b) sampling error (c) bias (d) confidence level
e. Intersection
1. The symbol ∩ is used to represent which operation related to combinations of events?. a. Conditional probability b. Independence c. Complement d. Union e. Intersection
d. coefficient of variability
1. The unitless measure of variability formed as the ratio of the standard deviation to the mean is called the, a. range b. quartiles c. standard error of the mean d. coefficient of variability
d. Percentile
1. The value in an ordered set (largest to smallest/ smallest to largest) of observations such that a specified percentage of the measurements lie below that value is called the a. Mean b. Median c. Mode d. Percentile e. Geometric Mean
a. Conditional probability
1. This notation, Pr(A|B), is used to describe____ ? a. Conditional probability b. Independence c. Complement d. Union e. Intersection
d. Range
1. Which of the following is not a measure of location? a. Mean b. Median c. Mode d. Range e. Geometric Mean
b. mode
1. Which of the following is not a measure of spread? a. Range b. mode c. coefficient of variation d. Standard deviation e. Variance
(c) Continuous
1. Which of the following is not one of the four scales of measurement? (a) Nominal (b) Ordinal (c) Continuous (d) Interval (e) Ratio
b. Likert
1. Which of the following is not one of the four scales of measurement? a. Nominal b. Likert c. Ordinal d. Interval e. Ratio
b. Box plot
1. Which type of graph best displays information on symmetry, variability, and presence of outliers. a. Venn diagram b. Box plot c. scatter plot d. histogram e. barchart
c. 40
8. The following histogram represents the distribution of acceptance rates (percent accepted) among 25 business schools in 2004. In each class interval, the left endpoint but not the right is included, so the class intervals are 10 ≤ rate < 15, 15 ≤ rate < 20, etc. What is the approximate spread of the data? a. 25 b. 30 c. 40 d. 50
To describe features (characteristics) of the two study groups. For example, if the psychologist also measured/recorded demographic (race, sex, age, etc) and other related information, then the two groups could be compared to assess if the two groups are otherwise comparable.
A cognitive psychologist is interested in comparing two ways of presenting stimuli on subsequent memory. Twelve subjects are presented with each method and a memory test is given. What would be the roles of descriptive statistics in the analysis of these data?
To extrapolate the findings regarding the comparison of subsequent memory between the two methods of presenting stimuli to the appropriate target population from which the sample of study subjects were recruited. Note, since we do not know if subjects in the sample were taken randomly from a specified population, the samples may be non-representative.
A cognitive psychologist is interested in comparing two ways of presenting stimuli on subsequent memory. Twelve subjects are presented with each method and a memory test is given. What would be the roles of inferential statistics in the analysis of these data?
c. the length of time for the meeting
A company has three divisions and three conference rooms for meetings. To keep track of the use of their facilities, for each meeting held in the company, the division holding the meeting is recorded, the room for the meeting is recorded, and the length of time of the meeting is recorded. Which of the variables is quantitative? a. the division holding the meeting b. the conference room for the meeting c. the length of time for the meeting d. All of the answer options are correct.
b. whether or not the house has a finished basement
A description of different houses for sale includes the following variables. Which of the variables is categorical? a. the square footage of the house b. whether or not the house has a finished basement c. the monthly electric bill d. All of the answer options are correct.
c. approximately 50%.
A large university is divided into six colleges, with most students graduating from four of these colleges. The following bar chart gives the distribution of the percent graduating from the four most popular colleges in 2003. The percent of students graduating from either engineering or business is: a. approximately 30%. b. approximately 40%. c. approximately 50%. d. over 60%.
a) Sample b) statistic
A researcher is interested in how watching a reality television show featuring fashion models influences the eating behavior of 13-year-old girls. a. A group of 30 13-year-old girls is selected to participate in a research study. The group of 30 13-year-old girls is an example of a ___________. b. In the same study, the amount of food eaten in one day is measured for each girl and the researcher computes the average score for the 30 13-year-old girls. The average score is an example of a __________.
Parameter- since average computed from entire population
A researcher is interested in the texting habits of high school students in the United States. If the researcher measures the number of text messages that each individual sends each day and calculates the average number for the entire group of high school students, the average number would be an example of a ___________.
Discrete
A researcher studies the factors that determine the number of children that couples decide to have. The variable, number of children, is a ______________ (discrete/continuous) variable.
d. 25%.
A sample of 40 employees from the local Honda plant was obtained, and the length of time, in months, each employee worked at the plant was recorded. A stem & leaf plot of these data follows. In the plot, 5|2 represents 52 months. The percentage of employees in the sample that have worked at the plant for less than 5 years is: a. approximately zero. b. 10%. c. 15%. d. 25%.
A line graph is not appropriate. A bar chart is a better choice
A student has decided to display the results of his project on the number of hours people in various countries slept per night. He compared the sleeping patterns of people from the US, Brazil, France, Turkey, China, Egypt, Canada, Norway, and Spain. He was planning on using a line graph to display this data. Is a line graph appropriate? What might be a better choice for a graph?
Subjects are not randomly sampled from a specified population.
A study is conducted to determine whether people learn better with spaced or massed practice. Subjects volunteer from an introductory psychology class. At the beginning of the semester 12 subjects volunteer and are assigned to the massed-practice condition. At the end of the semester 12 subjects volunteer and are assigned to the spaced-practice condition. This experiment involves two kinds of non-random sampling: (1) Subjects are not randomly sampled from some specified population and (2) subjects are not randomly assigned to conditions. a) Which of the problems relates to the generality of the results?
Subjects are not randomly assigned to a condition.
A study is conducted to determine whether people learn better with spaced or massed practice. Subjects volunteer from an introductory psychology class. At the beginning of the semester 12 subjects volunteer and are assigned to the massed-practice condition. At the end of the semester 12 subjects volunteer and are assigned to the spaced-practice condition. This experiment involves two kinds of non-random sampling: (1) Subjects are not randomly sampled from some specified population and (2) subjects are not randomly assigned to conditions. b) Which of the problems relates to the validity of the results?
Subjects are not randomly assigned to a condition is more serious as it can invalidate the experimental findings.
A study is conducted to determine whether people learn better with spaced or massed practice. Subjects volunteer from an introductory psychology class. At the beginning of the semester 12 subjects volunteer and are assigned to the massed-practice condition. At the end of the semester 12 subjects volunteer and are assigned to the spaced-practice condition. This experiment involves two kinds of non-random sampling: (1) Subjects are not randomly sampled from some specified population and (2) subjects are not randomly assigned to conditions. c) Which problem is more serious?
1. Age - ratio 2. Annual income - ratio 3. Marital status - nominal
A survey asks people to identify their age, annual income, and marital status (single, married, divorced, etc.). For each of these three variables, identify the scale of measurement that probably is used and identify whether the variable is continuous or discrete.
Descriptive, since the teacher is only interested in comparing males and females in the class.
A teacher wishes to know whether the males in his/her class have more conservative attitudes than the females. A questionnaire is distributed assessing attitudes and the males and the females are compared. Is this an example of descriptive or inferential statistics? Why?
ordinal
An English professor uses letter grades (A, B, C, D, and F) to evaluate a set of student essays. What kind of scale is being used to measure the quality of the essays?
Overall, the number of pieces remembered seems to increase with experience. The tournament players have the greatest variability.
An experiment compared the ability of three groups of participants to remember briefly- presented chess positions. The data are shown below. The numbers represent the number of pieces correctly remembered from three chess positions. Create side-by-side box plots for these three groups. What can you say about the differences between these groups from the box plots?
d. All of the answer options are correct.
As part of a data base of new births at a hospital, some variables recorded are the age of the mother, marital status of the mother (e.g., single, married, divorced), weight of the baby, and sex of the baby. Of these variables: a. the individuals described are mothers and babies involved in births at a hospital. b. age of mother and weight of baby are quantitative variables. c. sex and marital status are categorical variables. d. All of the answer options are correct.
Around 85, since this is the highest point of the polygon
Based on the frequency polygon displayed below, the most common test grade was around what score? Explain.
b. more than half of the cars in the study were from the United States.
Consumers' Union measured the gas mileage per gallon of 38 1998-99 model automobiles on a special test track. The following pie chart provides information about the country of manufacture of the model cars that Consumers' Union used. Based on this pie chart, we may conclude that: a. Japanese cars get significantly lower gas mileage than cars of other countries. This is because their slice of the pie is at the bottom of the chart. b. more than half of the cars in the study were from the United States. c. Swedish cars get gas mileages that are between those of Japanese and U.S. cars. d. Mercedes Benz, Audi, Porsche, and BMW represent approximately one-quarter of the cars tested.
b. pie chart.
Enteroliths are calcifications that form in the gut of horses. The stones can cause considerable morbidity and mortality. A study was conducted to investigate factors such as diet and environment that may be related to the formation of enteroliths. Housing is a variable that is coded 1 for horses that live in a stall, 2 for horses that have access to a small paddock, 3 for horses that have a large paddock, 4 for horses that live in pasture, and 5 for other. An appropriate graphical way to display housing (stall, small paddock, large paddock, pasture, or other) for horses is given by: a. histogram. b. pie chart. c. stemplot. d. All of the answer options are correct.
a. Positive
For a set of X values and Y values, if X increases, Y increases. Thus, the correlation between X and Y is: a. Positive b. Negative c. No correlation
b. x and y are negatively and strongly correlated
For a set of values of x and y, if Pearson's Correlation r is equal to - 0.97, it means: a. x and y are positively and strongly correlated b. x and y are negatively and strongly correlated c. x and y have no relationship d. x and y are perfectly correlated
23 (1 + 2 + 4 + 16)
For the values of the variable X: 1, 2, 4, 16, compute the following: a) ∑X
277 (1^2 + 2^2 + 4^2 + 16^2 = 1 + 4 + 16 + 256) - square each value of X, then add the squares
For the values of the variable X: 1, 2, 4, 16, compute the following: b) ∑X^2
529 (23^2) - sum the values of X first, then square the sum
For the values of the variable X: 1, 2, 4, 16, compute the following: c) (∑X)^2
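The three summation cards above differ only in the order of squaring and summing. A short sketch with the same values of X makes the difference concrete; the variable names are illustrative.

```python
x = [1, 2, 4, 16]

# a) Sum of X: add the values.
sum_x = sum(x)

# b) Sum of X squared: square each value first, then add.
sum_of_squares = sum(value ** 2 for value in x)

# c) (Sum of X) squared: add first, then square the total.
square_of_sum = sum(x) ** 2
```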
time to relief (in minutes)
Give an example of a dependent variable. A study is conducted to assess if a newly developed oral analgesic will relieve headaches faster than aspirin. Subjects who frequently experience headaches are recruited into the study and are randomly assigned to receive either aspirin or the new analgesic. Time to relief in minutes is recorded after taking the assigned medication.
treatment group assignment (aspirin or new analgesic).
Give an example of an independent variable. A study is conducted to assess if a newly developed oral analgesic will relieve headaches faster than aspirin. Subjects who frequently experience headaches are recruited into the study and are randomly assigned to receive either aspirin or the new analgesic. Time to relief in minutes is recorded after taking the assigned medication.
Half of the scores lie between the upper and lower hinges, the area represented by the box. The upper hinge is the 75th percentile and the lower hinge is the 25th percentile, so the box spans the middle 50% of the scores (75% - 25% = 50%).
In a box plot, what percent of the scores are between the lower and upper hinges?
1. Order the data 2. Find the median of ALL the data (this is the 50th percentile) 3. Find the median of the first half of the numbers (this is the 25th percentile) 4. Find the median of the second half of the numbers (this is the 75th percentile) 5. Plug into the formula: trimean = (25th percentile + 2 × median + 75th percentile) / 4
How do you find the trimean?
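The steps on this card can be sketched as follows, using the standard trimean formula (25th percentile + 2 × median + 75th percentile) / 4. One assumption: with an odd number of scores, the middle score is left out of both halves. Other quartile conventions exist and can give slightly different answers.

```python
import statistics

def trimean(values):
    """Trimean using medians of the lower and upper halves as quartiles."""
    ordered = sorted(values)            # step 1: order the data
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # first half of the numbers
    upper = ordered[half + (n % 2):]    # second half (skips the middle if n is odd)
    q2 = statistics.median(ordered)     # step 2: 50th percentile
    q1 = statistics.median(lower)       # step 3: 25th percentile
    q3 = statistics.median(upper)       # step 4: 75th percentile
    return (q1 + 2 * q2 + q3) / 4       # step 5: plug into the formula

even_example = trimean([1, 2, 3, 4, 5, 6, 7, 8])
odd_example = trimean([5, 1, 4, 2, 3])
```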
No, since there is no universally accepted definition of a percentile. Taking the 65th percentile as an example, it can be defined as the lowest score that is greater than 65% of the scores; call this "Definition 1". It can also be defined as the smallest score that is greater than or equal to 65% of the scores; call this "Definition 2". There are other definitions as well.
If you are told only that you scored in the 80th percentile, do you know from that description exactly how it was calculated? Explain.
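To see that the two definitions can disagree, here is a small sketch. It reads Definition 1 as "at least p% of the scores are strictly below the value" and Definition 2 as "at least p% of the scores are at or below the value"; that reading, the function names, and the data are assumptions made for illustration.

```python
def percentile_def1(scores, p):
    """Lowest score that is greater than p% of the scores."""
    ordered = sorted(scores)
    n = len(ordered)
    for value in ordered:
        below = sum(1 for s in ordered if s < value)
        if below * 100 >= p * n:
            return value
    return None  # no score is greater than p% of the scores

def percentile_def2(scores, p):
    """Smallest score that is greater than or equal to p% of the scores."""
    ordered = sorted(scores)
    n = len(ordered)
    for value in ordered:
        at_or_below = sum(1 for s in ordered if s <= value)
        if at_or_below * 100 >= p * n:
            return value
    return None

scores = list(range(1, 11))           # made-up scores 1 through 10
by_def1 = percentile_def1(scores, 50)
by_def2 = percentile_def2(scores, 50)
```

With these ten scores the two definitions return different values for the 50th percentile, which is why a percentile alone does not tell you how it was calculated.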
a. Variance, mean
In a bell-shaped distribution, changing ___can make the distribution flatter (i.e. more compressed); and changing ___ can shift the distribution to the left or to the right. a. Variance, mean b. Mean, variance c. Range, standard deviation d. Mode, range e. Mean, range f. Mode, variance
b. Mostly negative but above about gdp of 3000 it appears to be flat
In the figure shown above, the correlation between the percentage of children in each country who are underweight and country's gdp is: a. Mostly positive b. Mostly negative but above about gdp of 3000 it appears to be flat c. Zero d. Appears to be positive when the gdp is between 0 and 3000 and then appears to be negative when the gdp is between gdp of 3000 and 8000
1. Range 2. Interquartile range 3. Variance 4. Standard deviation
Measures of variability or dispersion
d. Scatter Plot
Some researchers believe that consuming fish high in omega-3 fatty acids can help prevent memory loss in the elderly. Imagine that a researcher designed such a study on 300 seniors in assisted living facilities. He recorded the amount of fish high in omega-3 fatty acids they consumed in the previous month and their memory test scores. He hypothesized that elders who consumed more of this kind of fish would have better memory test scores. Which kind of graph can the best portray this relationship: a. histogram of the amount of fish consumed b. Frequency distribution graph of the memory test scores c. Stem and leaf graph d. Scatter Plot
b. The distribution is extremely right-skewed
Sometimes we need to use median instead of mean to better measure central tendency in a sample if a. The sample has a large variance b. The distribution is extremely right-skewed c. We have measurement error when collecting the sample d. The sample mean is too small
1. Descriptive: describe the data at hand 2. Inferential: generalize about features/characteristics of a population of interest (target population)
Statistical techniques are classified into two general categories. What are the two categories called, and what is the general purpose for the techniques in each category?
d. It is impossible to produce a histogram because the counts are in terms of a categorical variable.
The bar graph below gives the distribution of the most popular colors for cars and light trucks sold globally in 2010. How could you make a histogram of these data? a. It already is a histogram. b. Recode the data so that each of the observations falls into a defined bin. c. It is impossible to produce a histogram because the counts are in terms of a quantitative variable. d. It is impossible to produce a histogram because the counts are in terms of a categorical variable.
d. percentage of observations on the vertical (y) axis, whereas a frequency histogram indicates counts.
The difference between a frequency histogram and a relative frequency histogram is that the relative frequency histogram indicates: a. counts on the vertical (y) axis, whereas a frequency histogram indicates percentages. b. counts on the horizontal (x) axis, whereas a frequency histogram indicates percentages. c. percentage of observations on the horizontal (x) axis, whereas a frequency histogram indicates counts. d. percentage of observations on the vertical (y) axis, whereas a frequency histogram indicates counts.
a) Yes b) 76 (g = 16 + 3(20))
The formula for finding each student's test grade (g) from his or her raw score(s) on a test is as follows: g = 16 + 3s a) Is this a linear transformation? b) If a student got a raw score of 20, what is his test grade?
d. All of the answer options are correct.
The histogram below shows the time spent on a Saturday by visitors to a museum browsing an exhibit. There were 300 visitors that day. The histogram: a. is skewed right. b. has an outlier. c. is asymmetric. d. All of the answer options are correct.
b. 40.
The histogram below shows the time spent on a Saturday by visitors to a museum browsing an exhibit. There were 300 visitors that day. The number of visitors that spent less than 25 minutes at the museum that day is closest to: a. 25. b. 40. c. 60. d. 80.
c. 65%.
The stemplot below displays midterm exam scores for 34 students taking a calculus course. The highest possible test score was 100. The teacher declared that an exam grade of 65 or higher was good enough for a grade of "C" or better. The percent of students earning a grade of "C" or higher (as declared by the teacher) is closest to: a. 35%. b. 50%. c. 65%. d. 80%.
nominal
The teacher in a communications class asks students to identify their favorite reality television show. The different television shows make up a ______ scale of measurement.
scatter plots
used to show the relationship between two variables
Population and sample
Variance sum law 2 -Same as variance sum law 1 but instead of the decimal/percentage, it would be the population or the sample
a. mean b. median c. mode
Which of the following are measures of central tendency? Select all that apply a. mean b. median c. mode d. Variance e. standard deviation f. linear transformation
d. variance e. standard deviation
Which of the following are measures of variability? Select all that apply a. mean b. median c. mode d. variance e. standard deviation f. linear transformation
1. ACT 2. GRE 3. IQ 4. temp
What are examples of interval levels of measurement?
1. gender 2. blood type 3. HIV status (pos or neg) 4. favorite color 5. Country you were born in
What are examples of nominal levels of measurement?
1. Pain rated as mild, moderate, or severe 2. Rating of the quality of a movie on a 7-point scale. Summaries commonly used with ordinal data: median, range, frequency, percentage, mode
What are examples of ordinal levels of measurement?
a) Country you were born in b) favorite Color
What are examples of qualitative variables?
1. BP 2. body weight 3. Time to respond to a question
What are examples of ratio levels of measurement?
a) Rating of the quality of a movie on a 7-point scale b) Age c) Time to respond to a question
What are examples of quantitative variables?
1. Pie chart 2. Bar chart 3. Frequency table
What are some ways to graph qualitative variables?
1. Line graph 2. Histogram 3. Box plot 4. Stem and Leaf Displays 5. Frequency Polygons 6. Bar Charts 7. Dot Plots
What are some ways to graph quantitative variables?
1. nominal 2. ordinal 3. interval 4. ratio
What are the 4 levels of measurement?
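Measures of central tendency and variability change predictably under a linear transformation Y = bX + A: the mean becomes b(mean of X) + A, and the variance is multiplied by b² (adding the constant A does not change the variance).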
What are the fundamentals of linear transformations and effects on mean of variable and the variance of a variable?
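A quick numerical check of how a linear transformation affects the mean and variance, reusing the test-grade formula g = 16 + 3s from an earlier card. The raw scores are made up, and population variance is used for simplicity.

```python
import math
import statistics

raw = [10, 20, 30]                      # made-up raw scores
grades = [16 + 3 * s for s in raw]      # linear transformation g = 16 + 3s

mean_raw = statistics.mean(raw)
mean_grade = statistics.mean(grades)
var_raw = statistics.pvariance(raw)
var_grade = statistics.pvariance(grades)

# The mean is transformed like any single score: 16 + 3 * mean.
mean_follows_rule = mean_grade == 16 + 3 * mean_raw

# The variance is multiplied by the slope squared (3 ** 2 = 9);
# the added constant 16 has no effect on spread.
var_follows_rule = math.isclose(var_grade, 9 * var_raw)
```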
1. mean 2. median 3. mode 4. trimmed mean 5. geometric mean 6. percentiles
What are the measures of central tendency
tail is longer to the left bc of small atypical values
What does left skew look like?
tail is longer to the right bc of large atypical values
What does right skew look like?
- ordered sequence of equal sized categories - Identify the direction and magnitude of a difference
What is interval level of measurement?
- unordered set of categories identified only by name -Simplest level of measurement
What is nominal level of measurement?
categories are organized in an order of sequence
What is ordinal level of measurement?
- interval scale where a value of 0 indicates none of the variable - Highest form of measurement -Has a true zero (0 means the quantity is absent)
What is ratio level of measurement?
a) Converting from meters to kilometers c) Converting from ounces to pounds e) Multiplying all numbers by 2 and then adding 5 f) Converting temperature from Fahrenheit to Centigrade
Which of the following are linear transformations? a) Converting from meters to kilometers b) Squaring each side to find the area c) Converting from ounces to pounds d) Taking the square root of each person's height. e) Multiplying all numbers by 2 and then adding 5 f) Converting temperature from Fahrenheit to Centigrade
c. 1.6 e. -1.01
Which of the following is (are) not the possible value(s) of Pearson's correlations? Choose all that apply a. 1.0 b. 0.0 c. 1.6 d. -0.99 e. -1.01
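Pearson's r always falls between -1 and 1, which is why 1.6 and -1.01 are impossible. The sketch below computes r from its definition on made-up data; the function name is illustrative.

```python
import math

def pearson_r(x, y):
    """Pearson's correlation computed from deviations about the means."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [1, 2, 3, 4, 5]
y_down = [10, 8, 6, 4, 1]               # falls as x rises: strong negative r
y_line = [2 * v + 1 for v in x]         # exact straight line: r is 1

r_negative = pearson_r(x, y_down)
r_perfect = pearson_r(x, y_line)
```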
(a) Venn diagram
Which type of graph best displays information regarding the relationships between sets of objects? (a) Venn diagram (b) Box plot (c) scatter plot (d) histogram (e) barchart
The amount of data would be the deciding factor in this case. With more data, a histogram can be very useful since it shows the overall shape of the distribution. It, therefore, can be used to view a large set of data efficiently. A stem and leaf display, on the other hand, is better for smaller sets of data. Therefore, if choosing between these two methods, the main factor will be the amount of data we are dealing with.
You have to decide between displaying your data with a histogram or with a stem and leaf display. What factor(s) would affect your choice?
a) 40 -iii b) 50 -ii c) 60-i
a. In scrambled order, the averages are 40, 50, and 60. Match the histograms with the averages: a) 40 ____ b) 50 ____ c) 60____
a) iii b) i c) ii
a. Match the histograms with the following descriptions: a) The median is less than the average b) The median is bigger than the average c) The median is about equal to the average
histogram
best suited for large amounts of data
stem and leaf
best suited for small to moderate amounts of data
Geometric mean
computed by multiplying all the numbers together and taking the Nth root of the product
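A minimal sketch of the definition above: multiply the N numbers together and take the Nth root. It assumes all values are positive; the function name and data are illustrative. Python's standard library also offers `statistics.geometric_mean` for the same purpose.

```python
import math

def geometric_mean(values):
    """Nth root of the product of N positive numbers."""
    n = len(values)
    product = math.prod(values)     # multiply all the numbers together
    return product ** (1 / n)       # take the Nth root of the product

gm_two = geometric_mean([2, 8])         # square root of 16
gm_three = geometric_mean([1, 3, 9])    # cube root of 27
```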
box plots
good at depicting differences between distributions -best displays information on symmetry, variability and presence of outliers