Statistics Chapter 1
Levels of Measurement
nominal, ordinal, interval, ratio
Which of the following is not a level of measurement? A.) Quantitative B.) Ordinal C.) Nominal D.) Ratio
Quantitative
Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate. Years in which a recession occurred.
The interval level of measurement is most appropriate because the data can be ordered, differences (obtained by subtraction) can be found and are meaningful, and there is no natural zero starting point.
Which level of measurement consists of categories only where data cannot be arranged in an ordering scheme?
The nominal level of measurement is characterized by data that consist of names, labels, or categories only. The data cannot be arranged in an ordering scheme (such as low to high).
Determine whether the given value is a statistic or a parameter. A study was conducted of all 1411 college students in a certain country.
The value is a Parameter because it is a numerical measurement describing some characteristic of a population.
Completely Randomized Design
For a completely randomized design, subjects are assigned to different treatment groups through a process of random selection.
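A minimal Python sketch of a completely randomized design, assuming a hypothetical list of subject IDs and two treatments. The function name `completely_randomized` and the round-robin dealing of shuffled subjects are illustrative choices, not a standard procedure from the text; the key idea is that chance alone decides each subject's group.

```python
import random

def completely_randomized(subjects, treatments, seed=None):
    """Assign every subject to one treatment group purely by chance (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)  # random order removes any systematic pattern in who goes where
    groups = {t: [] for t in treatments}
    # Deal the shuffled subjects round-robin so group sizes differ by at most one
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups

groups = completely_randomized(range(1, 13), ["vaccine", "placebo"], seed=1)
print(groups)
```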
Which of the following consists of discrete data? A.) Number of suitcases on a plane B.) Hair color C.) Amount of rainfall D.) Tree height
Number of suitcases on a plane. The number of suitcases on a plane is a count, so these are discrete data.
Experiment
We apply some treatment and then proceed to observe its effects on the subject.
nominal level of measurement
characterized by data that consist of names, labels, or categories only, and the data cannot be arranged in an ordering scheme (such as low to high)
Observational Study
we observe and measure specific characteristics, but we don't attempt to modify the subjects being studied.
Probability Sample
a sample in which each member of the population has some known (but not necessarily the same) chance of being included
Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate. Years of elections: 1988, 1992, 1996, 2000, and 2004.
Interval
Randomization
Collect the individual sample items by using a random procedure that avoids bias.
Control Effects of Variables
Don't let other variables interfere with the effects you want to see.
Qualitative (or categorical/attribute) Data
Can be separated into different categories (defined categories or groups) that are distinguished by some non-numerical characteristic. Examples: marital status, whether a person is registered to vote, eye color.
Which of the following would be classified as categorical data? A.) Amount of rainfall B.) Tree height C.) Number of suitcases on a plane D.) Hair color
Hair color. Hair color would be classified as categorical data. Categorical data consist of names or labels that are not numbers representing counts or measurements.
Randomized Block Design
In a randomized block design, a block is a group of subjects that are similar, but blocks differ in ways that might affect the outcome of the experiment. Treatments are assigned randomly to the subjects within each block.
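The key difference from a completely randomized design is that randomization happens separately inside each block. A minimal sketch, assuming hypothetical blocks of subjects grouped by sex and two treatments; `randomized_block` is an illustrative helper, not an established API.

```python
import random

def randomized_block(blocks, treatments, seed=None):
    """Randomly assign treatments separately within each block (hypothetical helper)."""
    rng = random.Random(seed)
    assignment = {}
    for block, members in blocks.items():
        shuffled = list(members)
        rng.shuffle(shuffled)  # randomization happens inside this block only
        for i, subject in enumerate(shuffled):
            assignment[subject] = (block, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocks: subjects grouped by a trait that might affect the outcome
blocks = {"women": ["W1", "W2", "W3", "W4"], "men": ["M1", "M2", "M3", "M4"]}
plan = randomized_block(blocks, ["drug", "placebo"], seed=7)
print(plan)
```

Because each block is balanced on its own, a difference between blocks (here, sex) cannot end up concentrated in one treatment group.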
A study is conducted to measure children's growth rates without any treatment applied to the children. What best classifies this study?
Observational Study
Determine whether the description corresponds to an observational study or an experiment. Research is conducted to determine if there is a relation between colon cancer and fat consumption.
Observational study
Convenience Sampling
Simply use results that are the easiest to get.
Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate. Volume of planets in cubic meters.
The ratio level of measurement is most appropriate because the data can be ordered, differences (obtained by subtraction) can be found and there is a natural starting zero point.
The exact lengths (in kilometers) of the ocean coastlines of different countries.
The data are continuous because the data can take on any value in an interval.
Interval level of measurement
The interval level of measurement is like the ordinal level, with the additional property that the difference between any two data values is meaningful. However, data at this level do not have a natural zero starting point (where none of the quantity is present).
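Temperature in degrees Fahrenheit is a standard interval-level example. The sketch below, using two hypothetical temperatures, shows that differences convert consistently between scales while ratios do not, because 0 °F and 0 °C are arbitrary points rather than "none of the quantity". Kelvin, which has a natural zero, is included for contrast with the ratio level.

```python
def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9

def fahrenheit_to_kelvin(f):
    return fahrenheit_to_celsius(f) + 273.15

hot_f, cool_f = 80.0, 40.0  # hypothetical temperatures

# In Fahrenheit the ratio suggests "twice as hot"...
ratio_f = hot_f / cool_f
# ...but the same two temperatures give a very different ratio in Celsius
ratio_c = fahrenheit_to_celsius(hot_f) / fahrenheit_to_celsius(cool_f)
# Kelvin starts at absolute zero, so its ratio is the physically meaningful one
ratio_k = fahrenheit_to_kelvin(hot_f) / fahrenheit_to_kelvin(cool_f)

# Differences stay consistent: a 40 F gap is always a 40 * 5/9 C gap
diff_f = hot_f - cool_f
diff_c = fahrenheit_to_celsius(hot_f) - fahrenheit_to_celsius(cool_f)

print(ratio_f, ratio_c, ratio_k)
print(diff_f, diff_c)
```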
Determine whether the given value is a statistic or a parameter. A particular country has 45 total states. If the areas of 35 states are added and the sum is divided by 35, the result is 188,512 square kilometers. Determine whether this result is a statistic or a parameter.
The result is a statistic because it describes some characteristic of a sample.
Determine whether the given value is a statistic or a parameter. In a study of all 2966 employees at a college, it is found that 45% own a vehicle.
The value is a Parameter because it is a numerical measurement describing some characteristic of a population.
Nonsampling Error
when sample data are incorrectly collected, recorded, or analyzed (such as by selecting a biased sample, using a defective measuring instrument, or copying the data incorrectly).
Quantitative Data
Consists of numbers representing counts or measurements.
Matched Pairs Design
Matched pairs design compares two treatment groups (such as treatment and placebo) by using subjects matched in pairs that are somehow related or have similar characteristics.
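One common form of matched pairs is measuring the same subject before and after a treatment, so each subject is paired with themselves and the analysis works on the per-pair differences. A minimal sketch with hypothetical sleep data (the numbers are invented for illustration only):

```python
import statistics

# Hypothetical hours of sleep for the same five subjects before and after treatment
before = [5.2, 6.0, 4.8, 5.5, 6.1]
after = [6.0, 6.4, 5.9, 5.6, 6.8]

# Each "after" value is matched to the same subject's "before" value
differences = [a - b for a, b in zip(after, before)]
mean_difference = statistics.mean(differences)
print([round(d, 2) for d in differences], round(mean_difference, 2))
```

Working with differences removes subject-to-subject variation (some people simply sleep more than others), which is the point of pairing.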
Identify the type of sampling used (random, systematic, convenience, stratified, or cluster sampling) in the situation described below. A large company wants to administer a satisfaction survey to its customers. Using their customer database, the company randomly selects 30 customers and asks them about their level of satisfaction with the company.
Random. A random sample is selected in such a way that each individual member of the population has the same chance of being selected.
retrospective (or case-controlled) study
data are collected from the past by going back in time (through examination of records, interviews, and so on.)
Prospective (or longitudinal/cohort) study
data are collected in the future from groups sharing common factors (such groups are called cohorts).
Cross-sectional study
data are observed, measured, and collected at one point in time.
Random Sample
members from the population are selected in such a way that each individual member in the population has an equal chance of being selected
A study of an association between which ear is used for cell phone calls and whether the subject is left-handed or right-handed began with a survey e-mailed to 5000 people belonging to an otology online group, and 717 surveys were returned. (Otology relates to the ear and hearing.) What percentage of the 5000 surveys were returned? Does that response rate appear to be low? In general, what is a problem with a very low response rate?
717 / 5000 = 0.1434, and 0.1434 × 100 = 14.34%, or about 14.3%. Of the 5000 surveys, about 14.3% were returned. This response rate APPEARS to be low. What is a problem with a very low response rate? It creates a serious potential for getting a biased sample that consists of those with a special interest in the topic.
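The response-rate arithmetic above as a short Python check, using only the counts from the question:

```python
surveys_sent = 5000
surveys_returned = 717

response_rate = surveys_returned / surveys_sent
print(f"{response_rate:.4f}")  # 0.1434
print(f"{response_rate:.1%}")  # 14.3%
```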
Determine whether the given value is from a discrete or continuous data set. When a car is randomly selected, it is found to have 4 doors.
A discrete data set, because the number of possible values is finite.
Statistic
A numerical measurement describing some characteristic of a sample.
Determine if the following statement represents a meaningful ratio, so the ratio level of measurement applies. A movie with a 4-star rating is twice as good as one with a 2-star rating.
The ratio level of measurement does not apply. The ratio is not meaningful because the stars don't measure or count anything. Differences between star values are not meaningful either.
Which sampling method divides the population up into sections, randomly selects some of those sections, then chooses all the members from the selected sections to study?
Cluster sampling involves subdividing the population and using all members from a randomly selected group of subdivisions.
Identify which of these designs is most appropriate for the given experiment: completely randomized design, randomized block design, or matched pairs design. Currently, there is no approved vaccine for the prevention of infection by a certain virus. A clinical trial of a possible vaccine is being planned to include subjects treated with the vaccine while other subjects are given a placebo.
Completely Randomized
Determine whether the value is from a discrete or continuous data set. Amount of fabric needed for a dress is 2.5 yards.
Continuous because the value for the "amount of fabric needed for a dress" is one of infinitely many possible values and those values cannot be counted.
Identify the type of sampling used: random, systematic, convenience, stratified, or cluster. A radio station asks its listeners to call in their opinion regarding the use of pesticides in residential areas.
Convenience
Identify the type of observational study. A researcher plans to obtain data by interviewing relatives of victims who perished in a tornado to see how they're coping now.
Cross-sectional study. In a cross-sectional study, data are observed, measured, and collected at one point in time.
Identify the type of observational study (cross-sectional, retrospective, or prospective) described below. A research company uses a device to record the viewing habits of about 5000 households, and the data collected today will be used to determine the proportions of households tuned to a particular educational program.
Cross-sectional study. In a cross-sectional study, data are observed, measured, and collected at one point in time.
Ordinal level of measurement
Data are at the ordinal level of measurement if they can be arranged in some order, but differences (obtained by subtraction) between data values either cannot be determined or are meaningless. Example: Consumer Reports magazine rates products as "Best Buy", "Recommended", and "Not Recommended". This is the ordinal level of measurement because the ratings can be ordered from most to least recommended, but the difference between two ratings has no numerical meaning.
Determine whether the given value is a statistic or a parameter. A homeowner measured the voltage supplied to his home on 48 random days, and the average (mean) value is 129.5 volts.
The given value is a statistic because the data collected represent a sample.
Stratified Sampling
we subdivide the population into at least two different subgroups (or strata) so that subjects within the same subgroup share the same characteristics (such as gender or age bracket). Then we draw a sample from each subgroup (or stratum).
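A minimal sketch of stratified sampling, assuming a hypothetical population of residents tagged with an employment status (mirroring the market-research question later in these notes). `stratified_sample` is an illustrative helper; it draws a simple random sample of the same size from every stratum.

```python
import random

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Group members by stratum, then draw a simple random sample from each (hypothetical helper)."""
    rng = random.Random(seed)
    strata = {}
    for member in population:
        strata.setdefault(stratum_of(member), []).append(member)
    sample = []
    for name in sorted(strata):
        sample.extend(rng.sample(strata[name], n_per_stratum))
    return sample

# Hypothetical residents: (id, employment status), 100 in each status
statuses = ["unemployed", "full time", "part time"]
residents = [(f"resident_{i}", statuses[i % 3]) for i in range(300)]
chosen = stratified_sample(residents, stratum_of=lambda r: r[1], n_per_stratum=40, seed=3)
print(len(chosen))
```

Unlike cluster sampling, every subgroup is represented, and only some members of each subgroup are chosen.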
Discrete Data
Results when the number of possible values is either a finite number or a "countable" number. (That is, the number of possible values is 0, 1, or 2, and so on.) Example: In a classroom of 30 children, ages range from 9 to 11. What are the discrete data in this situation? A specific whole-number count of something is discrete. Therefore the number of children (30) is the discrete data.
State whether the data described below are discrete or continuous, and explain why. The number of languages spoken by different people.
The data are discrete because the data can only take on specific values.
Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate. Blood lead levels of low, medium, and high.
The ordinal level of measurement is most appropriate because the data can be ordered, but differences (obtained by subtraction) cannot be found or are meaningless.
State whether the data described below are discrete or continuous, and explain why. The percentage of houses that have a colonial style in different cities.
The data are continuous because the data can take on any value in an interval.
Determine whether the data described below are discrete or continuous, and explain why. The heights of different refrigerators offered by manufacturers.
The data are continuous because the data can take on any value in an interval.
Fill in the blank. _______ is used when subjects are assigned to different groups through a process of random selection.
Randomization is used when subjects are assigned to different groups through a process of random selection. The logic behind randomization is to use chance as a way to create two groups that are similar. Although it might seem that we should not leave anything to chance in experiments, randomization has been found to be an extremely effective method for assigning subjects to groups.
Identify the type of sampling used: random, systematic, convenience, stratified, or cluster. To estimate the percentage of defects in a recent manufacturing batch, a quality control manager selects every 17th soup can that comes off the assembly line, starting with the third, until she obtains a sample of 60 soup cans.
Systematic A systematic sample is obtained by selecting every kth individual from the population (the first individual selected is a random number from 1 to k).
State whether the data described below are discrete or continuous, and explain why. The numbers of children in families A.) The data are discrete because the data can take on any value in an interval. B.) The data are discrete because the data can only take on specific values C.) The data are continuous because the data can take on any value in an interval. D.) The data are continuous because the data can only take on specific values.
The data are discrete because the data can only take on specific values
Determine whether the data described below are qualitative or quantitative and explain why. The styles of shoes of clients entering a certain store (sneaker, boot, sandal, etc.) A.) The data are qualitative because they don't measure or count anything. B.) The data are quantitative because they consist of counts or measures. C.) The data are qualitative because they consist of counts or measures. D.) The data are quantitative because they don't measure or count anything.
The data are qualitative because they don't measure or count anything.
Determine whether the study is an experiment or an observational study, and then identify a major problem with the study. A study involved 22,071 male physicians. Based on random selections, 11,037 of them were treated with aspirin and the other 11,034 were given placebos. The study was stopped early because it became clear that aspirin reduced the risk of myocardial infarctions by a substantial amount. What is the major problem with this study?
This is an EXPERIMENT because the researchers APPLY A TREATMENT TO the individuals. The major problem with this study is that the results may apply only to male physicians and cannot safely be generalized to women or to the general population.
Identify the type of sampling used (random, systematic, convenience, stratified, or cluster sampling) in the situation described below. A man is selected by a marketing company to participate in a paid focus group. The company says that the man was selected because he was randomly chosen from all adults.
Random. In a random sample, each individual member in the population has an equal chance of being selected.
Identify the type of sampling used (random, systematic, convenience, stratified, or cluster sampling) in the situation described below. In a poll conducted by a certain research center, 1208 adults were called after their telephone numbers were randomly generated by a computer, and 89% were able to correctly identify the president.
Random. In a random sample, each individual member in the population has an equal chance of being selected.
Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate for the data below. Body temperatures in degrees Fahrenheit.
The interval level of measurement is most appropriate because the data can be ordered, differences can be found and are meaningful, and there is NO natural starting zero point.
Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate. Favorite Colors
Nominal
Identify the level of measurement of the data, and explain what is wrong with the given calculation. In a set of data, course grades are represented as 10 for 'A', 20 for 'B', and 30 for 'C'. The average (mean) of the 711 course grades is 25.4. What is wrong with the given calculation?
The data are at the ordinal level of measurement. Such data should not be used for calculations such as an average, because the codes 10, 20, and 30 only rank the grades; the differences between them do not measure anything.
Determine whether the given description corresponds to an observational study or an experiment. In a study of 411 women with a particular disease, the subjects were monitored with an EEG while sleeping.
The given description corresponds to an observational study.
Determine whether the given value is a statistic or a parameter. A sample of professors is selected and it is found that 55 % own a vehicle. A.) Statistic because the value is a numerical measurement describing a characteristic of a population. B.) Parameter because the value is a numerical measurement describing a characteristic of a sample. C.) Statistic because the value is a numerical measurement describing a characteristic of a sample. D.) Parameter because the value is a numerical measurement describing a characteristic of a population.
Statistic because the value is a numerical measurement describing a characteristic of a sample.
Identify which of these types of sampling is used: random, systematic, convenience, stratified, or cluster. To determine her heart rate, Denise divides up her day into three parts: morning, afternoon, and evening. She then measures her heart rate at 2 randomly selected times during each part of the day.
Stratified A stratified sample is obtained by subdividing the population into at least two different subgroups (or strata) so that subjects within the same subgroup share the same characteristics, and then a sample is drawn from each subgroup (or stratum).
Identify the type of sampling used (random, systematic, convenience, stratified, or cluster sampling) in the situation described below. A researcher selects every 661st social security number and surveys the corresponding person.
Systematic Sampling A systematic sample is obtained by selecting every kth individual from the population (the first individual selected is a random number from 1 to k).
Identify the type of sampling used: random, systematic, convenience, stratified, or cluster. A man experienced a tax audit. The tax department claimed that the man was audited because every 1000th person on the taxpayer list was audited.
Systematic Sampling A systematic sample is obtained by selecting every kth individual from the population (the first individual selected is a random number from 1 to k).
Identify the type of sampling used: random, systematic, convenience, stratified, or cluster. A market researcher has partitioned all residents of a certain region into categories of unemployed, employed full time, and employed part-time. She is surveying 40 people from each category.
Stratified A stratified sample is obtained by subdividing the population into at least two different subgroups (or strata) so that subjects within the same subgroup share the same characteristics, and then a sample is drawn from each subgroup (or stratum).
Which sampling method subdivides the population into categories sharing similar characteristics and then selects a sample from each subdivision?
The stratified sampling method subdivides the population and draws a sample of members from each subdivision of the population.
Identify which of these designs is most appropriate for the given experiment: completely randomized design, randomized block design, or matched pairs design. A drug is designed to treat insomnia. In a clinical trial of the drug, amounts of sleep each night are measured before and after subjects have been treated with the drug.
Matched pairs. Each subject's sleep before and after treatment forms a matched pair. Matched pairs design compares two treatment groups (such as treatment and placebo) by using subjects matched in pairs that are somehow related or have similar characteristics.
Determine whether the study is an experiment or an observational study, and then identify a major problem with the study. A medical researcher tested for a difference in systolic blood pressure levels between male and female students who are 12 years of age. She randomly selected four males and four females for her study.
This is an OBSERVATIONAL STUDY because researchers DO NOT ATTEMPT TO MODIFY the individuals. The major problem with this study is: The sample is too small.
Parameter
A numerical measurement describing some characteristic of a population
Simple random Sample
A sample of size 'n' selected from the population in such a way that each possible sample of size 'n' has an equal chance of being selected. You're selecting a group of elements such that each group of that size has an equal chance of being selected.
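In Python, `random.sample` draws without replacement and gives every subset of size k the same chance of being chosen, so it produces a simple random sample. A minimal sketch with a hypothetical roster of 36 students (like the index-card method in the class example later in these notes):

```python
import random

# Hypothetical class roster of 36 students
students = [f"student_{i}" for i in range(1, 37)]

rng = random.Random(2024)
# Draw without replacement; every 6-student subset is equally likely
srs = rng.sample(students, k=6)
print(srs)
```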
Which of the following corresponds to the case when every sample of size n has the same chance of being chosen?
A simple random sample of n subjects is selected in such a way that every possible sample of the same size n has the same chance of being chosen.
Which of the following is associated with a parameter? A.) A numerical measurement describing some characteristic of a sample. B.) Data that were obtained from an entire population. C.) Data that were obtained from a voluntary poll at the end of a service call. D.) Data that were obtained from a sample.
Data that were obtained from an entire population. A parameter is a numerical measurement describing some characteristic of a population. So, a parameter is associated with data that were obtained from an entire population.
Sampling Error
The difference between a sample result and the true result if the entire population had been interviewed. Such an error results from chance sample fluctuations.
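A minimal simulation of sampling error, assuming an invented population of 10,000 normally distributed measurements. The population mean is a parameter, the sample mean is a statistic, and their difference is sampling error caused only by which members happened to be drawn.

```python
import random
import statistics

rng = random.Random(0)
# Hypothetical population of 10,000 measurements
population = [rng.gauss(170, 10) for _ in range(10_000)]
mu = statistics.mean(population)   # parameter: describes the whole population

sample = rng.sample(population, 50)
x_bar = statistics.mean(sample)    # statistic: describes only the sample

# Chance fluctuation only; no data were miscollected or miscopied
sampling_error = x_bar - mu
print(round(sampling_error, 3))
```

Re-running with a different seed gives a different sampling error, which is exactly the "chance sample fluctuation" described above. A nonsampling error, by contrast, would come from something like copying values incorrectly.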
Determine whether the study is an experiment or an observational study, and then identify a major problem with the study. A sociologist has created a brief survey to be given to 2000 adults randomly selected from the U.S. population. Here are her first two questions: (1) Have you ever been the victim of a felony crime? (2) Have you ever been convicted of a felony?
This is an OBSERVATIONAL STUDY because the researcher DOES NOT ATTEMPT TO MODIFY the individuals. The major problem with this study is: Individuals who have been convicted of a felony are unlikely to answer the second question honestly.
Cluster Sampling
We first divide the population area into sections (or clusters), then randomly select some of those clusters, and then choose ALL the members from those selected clusters.
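A minimal sketch of cluster sampling, assuming hypothetical concerts as clusters and their attendees as members (similar to the concert survey question later in these notes). `cluster_sample` is an illustrative helper: whole clusters are chosen at random, and every member of each chosen cluster is kept.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick whole clusters, then keep ALL members of each picked cluster (hypothetical helper)."""
    rng = random.Random(seed)
    picked = rng.sample(sorted(clusters), n_clusters)
    return [member for name in picked for member in clusters[name]]

# Hypothetical concerts (clusters), each with 25 attendees
concerts = {f"concert_{c}": [f"c{c}_fan{i}" for i in range(25)] for c in range(1, 11)}
surveyed = cluster_sample(concerts, n_clusters=3, seed=5)
print(len(surveyed))
```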
Systematic Sampling
select some starting point and then select every 'kth' element (such as every 50th) in the population
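A minimal sketch of systematic sampling, assuming a hypothetical population numbered 1 to 100. Following the definition used in these notes, the starting position is a random number from 1 to k, and every kth member after it is taken.

```python
import random

def systematic_sample(population, k, seed=None):
    """Random start from 1 to k, then every kth member (hypothetical helper)."""
    rng = random.Random(seed)
    start = rng.randint(1, k)  # randint includes both endpoints
    return population[start - 1::k]

items = list(range(1, 101))  # hypothetical population numbered 1..100
picked = systematic_sample(items, k=10, seed=11)
print(picked)
```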
Explain the difference between a single-blind and a double-blind experiment.
In a single-blind experiment, the subject does not know which treatment is received. This helps ensure that individuals do not adjust their behavior because of the treatment they are receiving. In a double-blind experiment, neither the subject nor the researcher in contact with the subject knows which treatment is received. When the researcher also does not know which treatment is received, it helps ensure that they observe each group the same way.
Identify the level of measurement of the data, and explain what is wrong with the given calculation. In a survey, the responses of respondents are identified as 1 for a "yes", 2 for a "no", 3 for a "maybe" and 4 for anything else. The average (mean) is calculated for 704 respondents and the result is 1.5 What is wrong with the given calculation?
The data are at the nominal level of measurement. Such data are not counts or measures of anything, so it makes no sense to compute their average (mean).
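To see why averaging nominal codes is meaningless, the sketch below uses a handful of invented responses and two equally valid ways of numbering the same categories. The "average response" changes just because the labels changed, while the most common category (the mode) does not.

```python
import statistics

# Hypothetical survey responses; the numeric codes are only labels
responses = ["yes", "yes", "no", "maybe", "yes", "other"]

coding_a = {"yes": 1, "no": 2, "maybe": 3, "other": 4}
coding_b = {"yes": 4, "no": 3, "maybe": 2, "other": 1}  # an equally valid relabeling

mean_a = statistics.mean(coding_a[r] for r in responses)
mean_b = statistics.mean(coding_b[r] for r in responses)
mode = statistics.mode(responses)  # the most common category is still meaningful

print(mean_a, mean_b, mode)
```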
Ratio Level of Measurement
The ratio level of measurement is like the interval level with the additional property that there is also a natural zero starting point (where zero indicates that none of the quantity is present). For values at this level, differences and ratios are both meaningful.
Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate for the data below. Acres of land
The ratio level of measurement is most appropriate because the data can be ordered, differences can be found and are meaningful, and there is a natural starting zero point.
Determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate. Number of houses people own.
The ratio level of measurement is most appropriate because the data can be ordered, differences (obtained by subtraction) can be found and are meaningful and there is a natural starting zero point.
Determine whether the given value is a statistic or a parameter. A sample of 1507 construction workers showed that 6% voted in the past election.
The value is a statistic because it is a numerical measurement describing some characteristic of a sample.
Determine whether the study is an experiment or an observational study, and then identify a major problem with the study. In a survey, 1465 Internet users chose to respond to this question posted on a newspaper's electronic edition: "Is news online as satisfying as print and TV news?" 52% of the respondents said "yes." What is the Major Problem with this study?
This is an OBSERVATIONAL STUDY because researchers DO NOT ATTEMPT TO MODIFY the individuals. This is a convenience sample with voluntary response, which has a high chance of leading to bias.
In a study designed to test the effectiveness of a medication as a treatment for lower back pain, 1643 patients were randomly assigned to one of three groups: (1) the 547 subjects in the placebo group were given pills containing no medication; (2) 550 subjects were in a group given pills with the medication taken at regular intervals; (3) 546 subjects were in a group given pills with the medication to be taken when needed for pain relief. In what specific way was replication applied in the study?
The group sample sizes are all large so the researchers could see the effects of the treatment.
Replication and Sample Size
Use a sample size that is large enough so that we can see the true nature of any effects, and obtain the sample using an appropriate method, such as one based on randomness.
State whether the data described below are discrete or continuous, and explain why. The volume of cola in a can is 11.1 oz.
A continuous data set because there are infinitely many possible values and those values cannot be counted.
Determine whether the value given below is from a discrete or continuous data set. In a test of a method of gender selection, 674 couples used the XSORT method and 540 of them had baby girls.
A discrete data set, because the number of possible values is finite.
Determine whether each of the following is a simple random sample and a random sample. a. A statistics class with 36 students is arranged so that there are 6 rows with 6 students in each row, and the rows are numbered from 1 through 6. A die is rolled and a sample consists of all students in the row corresponding to the outcome of the die. b. For the same class described in part (a), the 36 student names are written on 36 individual index cards. The cards are shuffled and six names are drawn from the top. c. For the same class described in part (a), the six youngest students are selected.
A simple random sample of n subjects is selected in such a way that every possible sample of the same size n has the same chance of being chosen. (A simple random sample is often called a random sample, but strictly speaking, a random sample has the weaker requirement that all members of the population have the same chance of being selected.) a. This sample IS NOT a simple random sample, but it IS a random sample. b. This sample IS a simple random sample, and it IS a random sample. c. This sample IS NOT a simple random sample, and it IS NOT a random sample.
Identify the type of sampling used (random, systematic, convenience, stratified, or cluster sampling) in the situation described below. To determine customer opinion of their musical variety, Sony randomly selects 140 concerts during a certain week and surveys all concert goers. Which type of sampling is used?
Cluster. A cluster sample is obtained by randomly selecting groups (clusters) of individuals, here the concerts, and then surveying every individual in each selected group.
Would this give us a Random Sample, Simple Random Sample, or a Probability Sample? Why? Picture a classroom with 60 students arranged in 6 rows of 10 students each. Assume that the professor selects a sample of 10 students by rolling a die and selecting the row corresponding to the outcome.
This IS a Random Sample because every single student has an equal chance (1/6) of being selected. This would NOT be a Simple Random Sample because using the die to select a row makes it impossible to select ten students who are in different rows, so not every possible sample of 10 students has the same chance of being chosen. This IS a Probability Sample because each student has a known chance (1/6) of being selected.
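The counting argument behind this answer, as a short Python sketch using the numbers from the question (6 rows of 10 students). Each student is chosen with probability 1/6, but the die method can only ever produce 6 different samples, while a simple random sample of 10 would allow every one of the C(60, 10) possible groups.

```python
from fractions import Fraction
from math import comb

rows, students_per_row = 6, 10
class_size = rows * students_per_row

# Die method: a student is chosen exactly when their row is rolled
p_each_student = Fraction(1, rows)
die_samples_possible = rows  # only the 6 whole rows can ever be drawn

# Simple random sample of 10: every 10-student subset is a possible sample
srs_samples_possible = comb(class_size, students_per_row)

print(p_each_student, die_samples_possible, srs_samples_possible)
```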
Continuous (numerical) data
Result from infinitely many possible values that correspond to some continuous scale that covers a range of values without gaps, interruptions, or jumps. Example: In a classroom of 30 children, ages range from 9 to 11. What are the continuous data in this situation? The children's ages, because an exact age can be any value between 9 and 11, not just whole numbers. Therefore the ages (9 to 11) are the continuous data.