Biostats

Procedure for constructing a frequency distribution

1. Select the number of classes, usually between 5 and 20.
2. Calculate the class width.
3. Choose the value for the first lower class limit by using either the minimum value or a convenient value below the minimum.
4. Using the first lower class limit and class width, list the other lower class limits.
5. List the lower class limits in a vertical column and then determine and enter the upper class limits.
6. Take each individual data value and put a tally mark in the appropriate class. Add the tally marks to get the frequency.
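As a rough illustration of the six steps above, here is a minimal Python sketch. It assumes whole-number data and uses the minimum value as the first lower class limit. The function name `frequency_distribution`, the width rule (floor of range divided by the number of classes, plus 1, so that every value lands in some class), and the `ages` data are made up for this example.

```python
def frequency_distribution(values, num_classes):
    """Return (lower limit, upper limit, frequency) rows for integer data."""
    lowest, highest = min(values), max(values)
    # Step 2: class width, chosen so num_classes classes cover every value
    width = (highest - lowest) // num_classes + 1
    # Steps 3-4: first lower class limit is the minimum; add the width repeatedly
    lower_limits = [lowest + i * width for i in range(num_classes)]
    rows = []
    for lower in lower_limits:
        # Step 5: the upper limit sits just below the next lower limit
        upper = lower + width - 1
        # Step 6: count (tally) the values that fall in this class
        frequency = sum(1 for v in values if lower <= v <= upper)
        rows.append((lower, upper, frequency))
    return rows


# Hypothetical sample data
ages = [21, 22, 25, 27, 30, 31, 33, 38, 40, 44, 45, 49]
table = frequency_distribution(ages, 5)
for lower, upper, frequency in table:
    print(f"{lower}-{upper}: {frequency}")
```

The frequencies should always add up to the number of data values, which is a quick check that no value was missed.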

Parameter

A numerical measurement describing some characteristic of a population

Statistic

A numerical measurement describing some characteristic of a sample

Standard Deviation

The standard deviation of a set of sample values, denoted by s, is a measure of how much data values deviate away from the mean. The value of the standard deviation s is never negative. It is zero only when all of the data values are exactly the same. Larger values of s indicate greater amounts of variation.
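A short sketch of these properties using Python's standard `statistics` module. The data values are invented for illustration. `stdev` divides by n − 1 (sample), while `pstdev` divides by N (population).

```python
import statistics

# Hypothetical sample values
sample = [2, 4, 4, 4, 5, 5, 7, 9]

s = statistics.stdev(sample)        # sample standard deviation (divides by n - 1)
sigma = statistics.pstdev(sample)   # population standard deviation (divides by N)

# When every value is identical, the standard deviation is zero
identical = statistics.stdev([3, 3, 3, 3])

print(round(s, 2), sigma, identical)
```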

Which of the following is not included in the five-point summary? A) Mode B) Median C) 1st Quartile D) 3rd Quartile

A) Mode

Determine which of the four levels of measurement is most appropriate. Ornithologists classify hummingbirds in the United States using 17 different species: Allen, Anna, Berylline, Black-chinned, Blue-throated, Broad-billed, Broad-tailed, Buff-bellied, Calliope, Costa, Lucifer, Magnificent, Ruby-throated, Rufous, Violet-crowned, White-eared, and Xantus. A) Nominal B) Ordinal C) Interval D) Ratio

A) Nominal

Determine the "best" type of sampling (systematic, convenience, stratified, or cluster) based on the description. A county water quality control officer obtains a list of all residential addresses in the county and constructs a sample of homes to monitor by selecting every 200th home on the list. A) Systematic B) Convenience C) Stratified D) Cluster

A) Systematic

For the given description of​ data, determine which of the four levels of measurement​ (nominal, ordinal,​ interval, ratio) is most appropriate. A research project on the effectiveness of appendectomies begins with a compilation of the hospitals that provide appendectomies. A. The nominal level of measurement is most appropriate because the data cannot be ordered. B. The ratio level of measurement is most appropriate because the data can be ordered, differences (obtained by subtraction) can be found and are meaningful, and there is a natural zero starting point. C. The ordinal level of measurement is most appropriate because the data can be ordered, but differences (obtained by subtraction) cannot be found or are meaningless. D. The interval level of measurement is most appropriate because the data can be ordered, differences (obtained by subtraction) can be found and are meaningful, but there is no natural zero starting point. (taken from MLM)

A. The nominal level of measurement is most appropriate because the data cannot be ordered.

Simple event

An outcome or an event that cannot be further broken down into simpler components.

If the standard deviation for a set of data is 0, which of the following must be true? A) All the data values equal 0. B) All of the data values are identical. C) All of the data values are negative. D) None of the above must be true since standard deviation cannot be equal to 0.

B) All of the data values are identical.

Determine the "best" type of sampling (systematic, convenience, stratified, or cluster) based on the description. To obtain a sample of pregnant women, a researcher contacts her son's preschool teacher for a list of names. A) Systematic B) Convenience C) Stratified D) Cluster

B) Convenience

Which of the following statistics is most resistant to the inclusion of outliers in a dataset? A) Mean B) Median C) Variance D) Standard deviation

B) Median

Identify the type of observational study: cross-sectional, retrospective, or prospective. Scientists studying the migration habits of Aleutian geese collect data from reports measuring the number of geese during "fly-offs" over the past 15 years. A) Cross-sectional B) Retrospective C) Prospective

B) Retrospective

For the given description of​ data, determine which of the four levels of measurement​ (nominal, ordinal,​ interval, ratio) is most appropriate. Blood lead level (μg/dL) A. The ordinal level of measurement is most appropriate because the data can be ordered, but differences (obtained by subtraction) cannot be found or are meaningless. B. The ratio level of measurement is most appropriate because the data can be ordered, differences (obtained by subtraction) can be found and are meaningful, and there is a natural zero starting point. C. The interval level of measurement is most appropriate because the data can be ordered, differences (obtained by subtraction) can be found and are meaningful, but there is no natural zero starting point. D. The nominal level of measurement is most appropriate because the data cannot be ordered. (taken from MLM)

B. The ratio level of measurement is most appropriate because the data can be ordered, differences (obtained by subtraction) can be found and are meaningful, and there is a natural zero starting point.

Identify the type of observational study: cross-sectional, retrospective, or prospective. A researcher from the department of defense is studying the psychology of trauma. He plans to follow the children of service members who died in Afghanistan for the next 20 years. A) Cross-sectional B) Retrospective C) Prospective

C) Prospective

Several studies showed that after regular exercise on a treadmill​, subjects had lowered blood pressure. High blood pressure has been associated with increased risk of heart disease and stroke. A fitness equipment company financed this research. Identify what is wrong. A. The data used in the studies is not reliable because it was not measured by the administrator. B. Since the research is composed of voluntary response​ samples, there may be key data points missing. C. It is questionable that the sponsor is a fitness equipment company because this sponsor can be greatly affected by the conclusion. D. It is not possible to take accurate measurements. (taken from MLM)

C. It is questionable that the sponsor is a fitness equipment company because this sponsor can be greatly affected by the conclusion.

Nominal level of measurement

Characterized by data that consist of names, labels, or categories only, and the data cannot be arranged in some order (such as low to high). Example: Survey responses of yes, no, and undecided

Sample space

Consists of all possible simple events. That is, the sample space consists of all outcomes that cannot be broken down any further.

Categorical (or qualitative or attribute) data

Consists of names or labels (not numbers that represent counts or measurements). Example: ~The gender (male/female) of professional athletes ~Shirt numbers on professional athletes uniforms (substitutes for names)

Range Rule of Thumb

Crude but simple tool for understanding and interpreting standard deviation. The vast majority (such as 95%) of sample values lie within 2 standard deviations of the mean.

If your score on your next statistics test is converted to a z score, which of these z scores would you prefer? A) -2.00 B) -1.00 C) 0 D) 2.00

D) 2.00

How will a high outlier in a data set affect the mean and median? A) Distribution skewed to the left, mean pulled to the left, median not affected B) Distribution skewed to the right, mean not affected, median not affected C) Distribution skewed to the right, mean not affected, median skewed to the left D) Distribution skewed to the right, mean pulled to the right, median not affected

D) Distribution skewed to the right, mean pulled to the right, median not affected (the mean is pulled to the right by the high outlier)

When the mean and median are equal, what can be said about the shape of the distribution? A) More information should be presented if assumptions pertaining to the shape of the distribution are to be made. B) When the mean is equal to the median, the shape of the distribution tends to be slightly skewed right. C) When the mean is equal to the median, the shape of the distribution tends to be slightly skewed left. D) When the mean is equal to the median, the shape of the distribution can be assumed relatively symmetrical.

D) When the mean is equal to the median, the shape of the distribution can be assumed relatively symmetrical.

Ratio level of measurement

Data can be arranged in order, differences can be found and are meaningful, and there is a natural zero starting point (where zero indicates that none of the quantity is present). Differences and ratios are both meaningful. Example: Class times of 50 minutes and 100 minutes

Researchers are interested in the effectiveness of a new drug meant to treat HIV. Suppose a new study is conducted to compare the new study drug to the current best standard of care, highly active antiretroviral therapy (HAART). The study is mandated to be double blind. Describe the meaning of double blind as it relates to this particular study.

Double blind: neither the participants nor the physicians involved are aware of which treatment the participants will receive.

A new anti-hypertensive drug is designed to reduce blood pressure in a highly at-risk population of adults aged 55 to 70. Suppose researchers are interested in conducting an assessment of the drug's ability to reduce blood pressure compared to a control group. Researchers decide to randomly select a group of 250 participants from their eligible population, and randomly assign 125 to receive the new study drug and 125 to receive saline solution intravenously. The results would be compared, and neither the patient, the study doctor, nor the monitoring board would be aware of the random assignment. What kind of study is this? What level of blinding is present?

Experimental; triple blind

Boxplot (or box-and-whisker diagram)

Graph of a data set that consists of a line extending from the minimum value to the maximum value, and a box with lines drawn at the first quartile Q1, the median, and the third quartile Q3

Standard Deviation of a Population

Computed like the sample standard deviation, except that we divide by the population size N instead of n − 1. It is denoted by σ.

Interval level of measurement

Involves data that can be arranged in order, and the differences between data values can be found and are meaningful. However, there is no natural zero starting point at which none of the quantity is present. Example: Years 1000, 2000, 1776, and 1492

Ordinal level of measurement

Involves data that can be arranged in some order, but differences (obtained by subtraction) between data values either cannot be determined or are meaningless. Example: Course grades A, B, C, D, or F

Odds ratio

Measure of risk found by evaluating the ratio of the odds in favor of the treatment group to the odds in favor of the control group

Calculating the mean from a frequency distribution

Multiply each class midpoint by its class frequency. Do this for all classes and add the products. Divide that sum by the sum of the frequencies.
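The calculation can be sketched in Python as follows. The class limits and frequencies are hypothetical, and the function name `mean_from_frequency_table` is only illustrative.

```python
def mean_from_frequency_table(rows):
    """rows: (lower class limit, upper class limit, frequency) tuples."""
    weighted_sum = 0.0
    total_frequency = 0
    for lower, upper, frequency in rows:
        midpoint = (lower + upper) / 2          # class midpoint
        weighted_sum += midpoint * frequency    # midpoint times frequency
        total_frequency += frequency
    return weighted_sum / total_frequency


# Hypothetical frequency distribution
classes = [(21, 26, 3), (27, 32, 3), (33, 38, 2), (39, 44, 2), (45, 50, 2)]
approx_mean = mean_from_frequency_table(classes)
print(approx_mean)
```

The result is an approximation of the mean, because every value in a class is treated as if it were equal to the class midpoint.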

CAT scans are becoming incredibly common in children and adolescents. Suppose a research team is interested in the effects of early and repeated CAT scanning on blood and bone cancers in Sweden. The research team decided to collect a list of all children aged 5-15 with blood or bone cancers and retrospectively look at the number of CAT scans they have received over the course of their lives. What type of study is this?

Observational; retrospective

Notation for Probabilities

P denotes a probability. A, B, and C denote specific events. P(A) denotes the "probability of event A occurring."

Relative risk

Probability of disease given exposure divided by the probability of disease given non-exposure
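A small worked sketch of relative risk, with the odds ratio computed from the same table for comparison. The 2×2 counts are invented for illustration, and the variable names are assumptions rather than standard notation.

```python
# Hypothetical 2x2 table of counts
exposed_disease, exposed_healthy = 30, 70
unexposed_disease, unexposed_healthy = 10, 90

# Relative risk: P(disease | exposed) / P(disease | not exposed)
risk_exposed = exposed_disease / (exposed_disease + exposed_healthy)
risk_unexposed = unexposed_disease / (unexposed_disease + unexposed_healthy)
relative_risk = risk_exposed / risk_unexposed

# Odds ratio: odds of disease in the exposed group / odds in the unexposed group
odds_exposed = exposed_disease / exposed_healthy
odds_unexposed = unexposed_disease / unexposed_healthy
odds_ratio = odds_exposed / odds_unexposed

print(round(relative_risk, 2), round(odds_ratio, 2))
```

With these counts the exposed group has three times the risk of the unexposed group, while the odds ratio is somewhat larger; the two measures are close only when the disease is rare.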

Significant Z scores

Significant values are those with z scores ≤ −2.00 or ≥ 2.00.

Determine whether the data are from a discrete or continuous data set. A sample of married couples is randomly selected and the difference in heights for each couple is recorded. Since the differences in heights are ___ ​countable, the data are from a _______ data set. (taken from MLM)

Since the differences in heights are not ​countable, the data are from a continuous data set.

Determine whether the data are from a discrete or continuous data set. In a study of weight gains by college students in their freshman year, researchers record the numbers of pizzas ordered by randomly selected students. Since the numbers of pizzas ordered ___ countable, the data are from a __________ data set. (taken from MLM)

Since the numbers of pizzas ordered are ​countable, the data are from a discrete data set.

Class width

The difference between two consecutive lower class limits in a frequency distribution

Empirical Rule for Data with a Bell-Shaped Distribution

The empirical rule states that for data sets having a distribution that is approximately bell-shaped, the following properties apply. • About 68% of all values fall within 1 standard deviation of the mean. • About 95% of all values fall within 2 standard deviations of the mean. • About 99.7% of all values fall within 3 standard deviations of the mean.
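To make the rule concrete, the sketch below computes the three intervals for a bell-shaped distribution. The mean of 100 and standard deviation of 15 are hypothetical values, and `empirical_rule_intervals` is an illustrative name.

```python
def empirical_rule_intervals(mean, sd):
    """Map each approximate percentage to its (low, high) interval."""
    return {
        percent: (mean - k * sd, mean + k * sd)
        for k, percent in ((1, 68), (2, 95), (3, 99.7))
    }


# Hypothetical bell-shaped data: mean 100, standard deviation 15
intervals = empirical_rule_intervals(100, 15)
for percent, (low, high) in intervals.items():
    print(f"About {percent}% of values fall between {low} and {high}")
```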

Complementary Events

The events of one outcome happening and that outcome not happening

Identify whether the given value is a statistic or a parameter. A research team has received a collection of brains, donated for research, from those deceased in a natural disaster. The research team found that the average (mean) volume of the brains they had received was 1152.8 cm³. They came to this conclusion after measuring the brains they had received. (taken from MLM)

The given value is a statistic because the numerical measurement describes a characteristic of a sample.

Identify whether the given value is a statistic or a parameter. In a study of 500 of the babies born at five different hospitals in Kentucky​, it was found that 48% of the babies were boys. (taken from MLM)

The given value is a statistic because the numerical measurement describes a characteristic of a sample.

For the given description of data, determine which of the four levels of measurement (nominal, ordinal, interval, ratio) is most appropriate. Year of birth of subject A. The nominal level of measurement is most appropriate because the data cannot be ordered. B. The ratio level of measurement is most appropriate because the data can be ordered, differences (obtained by subtraction) can be found and are meaningful, and there is a natural zero starting point. C. The interval level of measurement is most appropriate because the data can be ordered, differences (obtained by subtraction) can be found and are meaningful, but there is no natural zero starting point. D. The ordinal level of measurement is most appropriate because the data can be ordered, but differences (obtained by subtraction) cannot be found or are meaningless. (taken from MLM)

The interval level of measurement is most appropriate because the data can be ordered, differences (obtained by subtraction) can be found and are meaningful, but there is no natural zero starting point.

Midrange

The midrange of a data set is the measure of center that is the value midway between the maximum and minimum values in the original data set. It is found by adding the maximum data value to the minimum data value and then dividing the sum by 2. Because the midrange uses only the maximum and minimum values, it is very sensitive to those extremes, so the midrange is not resistant.

z Score

The number of standard deviations that a given value x is above or below the mean. Round z scores to two decimal places (such as 2.31). A data value is significantly low if its z score is less than or equal to −2 or the value is significantly high if its z score is greater than or equal to +2. If an individual data value is less than the mean, its corresponding z score is a negative number.
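A minimal sketch of the z score calculation, z = (x − mean) / standard deviation, together with the ±2 cutoffs described above. The test scores, mean, and standard deviation are hypothetical, and the function names are illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above or below the mean."""
    return round((x - mean) / sd, 2)   # rounded to two decimal places


def classify(z):
    if z <= -2:
        return "significantly low"
    if z >= 2:
        return "significantly high"
    return "not significant"


# Hypothetical test scores with mean 70 and standard deviation 6
z_high = z_score(85, 70, 6)
z_low = z_score(64, 70, 6)
print(z_high, classify(z_high))
print(z_low, classify(z_low))
```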

Class boundaries

The numbers used to separate the classes, but without the gaps created by class limits

Range

The range of a set of data values is the difference between the maximum data value and the minimum data value. Range = (maximum data value) − (minimum data value) Important note: the range uses only the maximum and the minimum data values, so it is very sensitive to extreme values. The range is not resistant.

Important Properties of Variance

The units of the variance are the squares of the units of the original data values. The value of the variance can increase dramatically with the inclusion of outliers. (The variance is not resistant.) The value of the variance is never negative. It is zero only when all of the data values are the same number. The sample variance s² is an unbiased estimator of the population variance σ².

Class midpoints

The values in the middle of the classes. Each class midpoint can be found by adding the lower class limit to the upper class limit and dividing the sum by 2.

Range Rule of Thumb for Estimating a Value of the Standard Deviation s

To roughly estimate the standard deviation from a collection of known sample data, use s ≈ range / 4, where range = (maximum data value) − (minimum data value). (The professor said he "personally has not seen this used very often.")
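The usual textbook form of this rule is s ≈ range / 4. The sketch below compares that rough estimate with the standard deviation computed by Python's `statistics.stdev`; the sample values are made up for illustration.

```python
import statistics

# Hypothetical sample values
sample = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(sample) - min(sample)
estimated_s = data_range / 4            # range rule of thumb
actual_s = statistics.stdev(sample)     # sample standard deviation

print(estimated_s, round(actual_s, 2))
```

The two numbers differ noticeably here, which is consistent with the rule being a crude estimate rather than a substitute for the actual calculation.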

Specificity

True negative test: the probability that the test is negative given that the condition is absent.

Sensitivity

True positive test
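Both measures can be computed from the four counts of a screening test. This sketch uses invented counts. Sensitivity is the true positive rate, and specificity is the true negative rate.

```python
# Hypothetical screening results
true_positives = 90     # condition present, test positive
false_negatives = 10    # condition present, test negative
true_negatives = 160    # condition absent, test negative
false_positives = 40    # condition absent, test positive

# Sensitivity: proportion of people with the condition who test positive
sensitivity = true_positives / (true_positives + false_negatives)

# Specificity: proportion of people without the condition who test negative
specificity = true_negatives / (true_negatives + false_positives)

print(sensitivity, specificity)
```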

Variance of a Sample and a Population

Variance- The variance of a set of values is a measure of variation equal to the square of the standard deviation. Sample variance: s² = square of the standard deviation s. Population variance: σ² = square of the population standard deviation σ.

Rounding Probabilities

When expressing the value of a probability, either give the exact fraction or decimal or round off final decimal results to three significant digits.

Round-off Rule for Measures of Variation

When rounding the value of a measure of variation, carry one more decimal place than is present in the original set of data.

Identify the type of sampling used​ (random, systematic,​ convenience, stratified, or cluster​ sampling) in the situation described below. A man is selected by a marketing company to participate in a paid focus group. The company says that the man was selected because his name is among the first 200 in the phone number listings. a) Convenience sampling b) Clusters sampling c) Systematic sampling d) Stratified sampling e) Random sampling (taken from MLM)

a) Convenience sampling

For the following​ scenario, identify which of these types of sampling is​ used: random,​ systematic, convenience,​ stratified, or cluster. A researcher collects sample data by dividing a floorplan of the hospital into sections and choosing all employees in three of them. Choose the correct answer below. a) Convenience sampling b) Clusters sampling c) Systematic sampling d) Stratified sampling e) Random sampling (taken from MLM)

b) Clusters sampling

Which of the following describes an experimental, placebo controlled study? a) a non-experimental clinical trial is organized to test the efficacy of a new drug intended to treat high cholesterol against the leading drug on the market b) a clinical trial is organized to test a new chemotherapy on its ability to increase progression free survival (PGA) and is randomly offered to half of the participants, while the other half receive best standard care c) Epidemiologists are interested in the association between mold in homes and wheezing in young adults so they conduct an investigation looking at the exposure and outcome relationship d.) Researchers are interested in a causal relationship between periodontal disease and hypertension so they observe the results from 10 different studies looking at the relationship and compile the results into one report in a meta analysis

b) a clinical trial is organized to test a new chemotherapy on its ability to increase progression free survival (PGA) and is randomly offered to half of the participants, while the other half receive best standard care

Identify the type of sampling used​ (random, systematic,​ convenience, stratified, or cluster​ sampling) in the situation described below. A researcher selects every 278th social security number and surveys the corresponding person. Which type of sampling did the researcher ​use? a) Convenience sampling b) Clusters sampling c) Systematic sampling d) Stratified sampling e) Random sampling (taken from MLM)

c) Systematic sampling

Identify the type of sampling used​ (random, systematic,​ convenience, stratified, or cluster​ sampling) in the situation described below. A researcher selects every 810th social security number and surveys the corresponding person. Which type of sampling did the researcher ​use? a) Convenience sampling b) Clusters sampling c) Systematic sampling d) Stratified sampling e) Random sampling (taken from MLM)

c) Systematic sampling

Identify the type of sampling used​ (random, systematic,​ convenience, stratified, or cluster​ sampling) in the situation described below. A woman is selected by a marketing company to participate in a paid focus group. The company says that the woman was selected because every 1000th person in the phone number listings was being selected. Which type of sampling did the researcher ​use? a) Convenience sampling b) Clusters sampling c) Systematic sampling d) Stratified sampling e) Random sampling (taken from MLM)

c) Systematic sampling

Determine which of the four levels of measurement is most appropriate. Biologists measure the water temperature of the Merrimack River in New Hampshire. a) nominal b) ordinal c) interval d) ratio

c) interval

Which of the following could best be considered an ordinal variable? a). height in feet b.) BMI c). Ranks of Colleges in U.S. News and World Report d) Annual Income

c). Ranks of Colleges in U.S. News and World Report

Which of the following is not a categorical or nominal variable? a) Gender b) HIV Status (+/-) c) Diabetes Type II Status (Y/N) d) Height in cm

d) Height in cm (this would be a ratio)

Determine which of the four levels of measurement is most appropriate. Doctors measure the weights (in pounds) of preterm babies. a.) nominal b) ordinal c) interval d) ratio

d) ratio

∑

denotes the sum of a set of data values

The two types of quantitative data:

discrete and continuous

Relative frequency for a class

frequency for a class divided by sum of all frequencies

Percentage for a class (frequency)

(frequency for a class / sum of all frequencies) × 100%
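A brief sketch of both calculations, using hypothetical class frequencies:

```python
# Hypothetical class frequencies
frequencies = [3, 3, 2, 2, 2]
total = sum(frequencies)

# Relative frequency: class frequency divided by the sum of all frequencies
relative = [f / total for f in frequencies]

# Percentage: relative frequency times 100%
percentages = [100 * r for r in relative]

print([round(p, 1) for p in percentages])
```

The relative frequencies should add up to 1 and the percentages to 100%, apart from small rounding differences.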

x bar

mean of a sample

Percentile notations

n = total number of values in a data set
k = percentile being used
L = locator that gives the position of a value
Pk = kth percentile
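The source gives only the notation, so the following is a sketch of one common textbook procedure for using it, stated here as an assumption: compute L = (k / 100) · n on the sorted data; if L is a whole number, Pk is midway between the Lth value and the next one; otherwise round L up and take the value in that position. The function name and data are hypothetical, and k is assumed to be a whole number from 1 to 99.

```python
def percentile_value(values, k):
    """Find the kth percentile Pk using the locator L = (k / 100) * n."""
    data = sorted(values)
    n = len(data)
    if (k * n) % 100 == 0:
        # L is a whole number: average the Lth value and the next value
        position = (k * n) // 100
        return (data[position - 1] + data[position]) / 2
    # L is not a whole number: round up to the next whole position
    position = (k * n) // 100 + 1
    return data[position - 1]


# Hypothetical data set with n = 8 values
scores = [2, 4, 4, 4, 5, 5, 7, 9]
p25 = percentile_value(scores, 25)   # L = 2, a whole number
p50 = percentile_value(scores, 50)   # L = 4, a whole number
p90 = percentile_value(scores, 90)   # L = 7.2, rounded up to 8
print(p25, p50, p90)
```

Other software and textbooks use different interpolation rules, so results can differ slightly between tools.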

Determine whether the given value is a statistic or a parameter. In studying the loggerhead turtle on Ana Maria Island Florida, scientists observe the average (mean) number of hatchlings in all 253 nests.

parameter

mu

population mean

N represents the number of data values in a...

population.

5 most basic properties (rules) of Probability are:

refer to image

x is the variable usually used to...

represent the individual data values.

Notation Summary

s = sample standard deviation
s² = sample variance
σ = population standard deviation
σ² = population variance
Square the standard deviation to get the variance.

n represents the number of data values in a...

sample.

Determine whether the given value is a statistic or a parameter. A sample of Medtronic's rechargeable implanted spine stimulator batteries lasts an average (mean) of nine years.

statistic
