MATH 1530 - Statistics; Chapter 1 & 2
"Statistic" instead of "____________ _______________" "Parameter" instead of "______________ ______________"
"Sample Statistic"; "Population Parameter"
Selection Bias
(or a selection effect) occurs whenever researchers select their sample in a biased way.
Qualitative Data
(or categorical) data that consist of values that can be placed into nonnumerical categories.
Self-Selected Survey
(or voluntary response survey) is one in which people decide for themselves whether to participate.
Examples of Qualitative Data
1. Brand names of running shoes in a consumer survey; Brand names are categories and therefore represent qualitative data. 2. Letter grades on an essay assignment; Letter grades on an essay assignment are qualitative because they represent different categories of performance (failing through excellent). 3. Numbers on uniforms that identify players on a basketball team; The players' uniform numbers are qualitative because they do not represent a count or measurement; they are used solely as substitutes for names. You can tell that these numbers are qualitative rather than quantitative because they don't measure or count anything, so it would make no sense to add or subtract the uniform numbers of different players.
Eight Guidelines for Critically Evaluating a Statistical Study
1. Get a Big Picture of the Study; You should understand the goal of the study, the population that was under study, and whether the study was observational or an experiment. 2. Consider the Source; Look for potential sources of bias on the part of the researchers. 3. Look for Bias in the Sample; Decide whether the sampling method was likely to produce a representative sample. 4. Look for Problems in Defining or Measuring the Variables of Interest; Ambiguity in the variables can make it difficult to interpret reported results. 5. Beware of Confounding Variables; If the study neglected potential confounding variables, its results may not be valid. 6. Consider the Setting and Wording in Surveys; Look for anything that might tend to produce inaccurate or dishonest responses. 7. Check That Results Are Presented Fairly; Check whether the study really supports the conclusions that are presented in the media. 8. Stand Back and Consider the Conclusions; Evaluate whether the study achieved its goals. If so, do the conclusions make sense and have practical significance?
Examples of Continuous Data
1. Measurements of the time it takes to walk a mile; Time can take on any value, so measurements of time are continuous. 2. The amounts of milk produced by dairy cows on a farm; The amounts of milk that a cow produces can take on any value in some range, so milk production data are continuous.
Data Type: Qualitative
1. Nominal 2. Ordinal
Examples of Quantitative Data
1. Scores on a multiple-choice exam; Scores on a multiple-choice exam are quantitative because they represent a count of the number of questions answered correctly.
Basic Steps in a Statistical Study
1. State the goal of your study precisely 2. Choose a representative sample from the population 3. Collect raw data from the sample and summarize these data by finding sample statistics of interest 4. Use the sample statistics to infer the population parameters 5. Draw conclusions
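The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population is simulated (hypothetical weekly streaming hours for 100,000 households), the sample size of 500 is arbitrary, and the "population parameter" is computed here only because the data are made up. In a real study it would be unknown.

```python
import random
import statistics

# Assumption: a simulated population of weekly streaming hours (hypothetical data).
rng = random.Random(1530)
population = [rng.uniform(0, 20) for _ in range(100_000)]

# Step 2: choose a sample at random from the population.
sample = rng.sample(population, 500)

# Step 3: summarize the raw data with a sample statistic (here, the mean).
sample_mean = statistics.mean(sample)

# Step 4: use the sample statistic to estimate the population parameter.
# The true parameter is normally unknown; it is computed here only for comparison.
population_mean = statistics.mean(population)

# Step 5: draw conclusions by comparing the estimate with the goal of the study.
print(f"sample statistic:     {sample_mean:.2f} hours")
print(f"population parameter: {population_mean:.2f} hours")
```

With a random sample of this size, the two printed values should land close together, which is the reason a sample statistic can stand in for the parameter.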
Examples of Discrete Data
1. The numbers of calendar years (such as 2017, 2018, 2019); The numbers of calendar years are discrete because they cannot have fractional values. For example, at midnight on New Year's Eve of 2020, the year will change from 2020 to 2021; we'll never say the year is 2020 1/2. 2. The numbers of dairy cows on different farms; Each farm has a whole number of cows that we can count, so these data are discrete. 3. What kind of coffee students at UofM like. (As worded, this is a category, so it is qualitative rather than discrete; a count, such as how many students like each kind, would be discrete.)
Census
A collection of data from EVERY member of the population.
What does a placebo ensure?
A placebo ensures that any effect arising from the psychological factor of undergoing a treatment process affects the treatment and control groups equally.
Peer Review
A process in which several experts in a field evaluate a research report before the report is published.
Simple Random Sample
A sample that is chosen randomly, so that every member of the population has an equal chance of being selected. Random samples are used to avoid bias and other unwanted effects. EVERYONE HAS AN EQUAL CHANCE.
Identify the level of measurement (nominal, ordinal, interval, or ratio) Student rankings of cafeteria food as excellent, good, fair, or poor.
A set of rankings represents data at the ordinal level of measurement because the categories (excellent, good, fair, or poor) have a definite order.
Bias
A statistical study suffers from BIAS if its design or conduct tends to favor certain results. A sample is biased if the members of the sample differ in some specific way from the members of the general population.
Confounding
A study suffers from confounding if the effects of different variables are mixed so a researcher cannot determine the specific effects of the variables of interest. The variables that lead to the confusion are called confounding variables.
Calibration Error
A systematic error in which the scale's measurements differ consistently from the true value.
Identify the level of measurement (nominal, ordinal, interval, or ratio) Calendar years of historic events, such as 1776, 1945, and 2001.
An interval of one calendar year always has the same meaning. But ratios of calendar years do not make sense because the choice of the year 0 is arbitrary and does not mean "the beginning of time." Calendar years are therefore at the interval level of measurement.
Why is blinding important in an experiment that is testing the effectiveness of a drug?
Blinding is the practice whereby participants and/or experimenters do not know who belongs to the treatment group and who belongs to the control group. It is important to use blinding for participants so that they are not affected by the knowledge that they are receiving the real treatment, and it is important to use it for experimenters so that they can evaluate results objectively instead of being influenced by knowledge about who is getting the real treatment. Blinding the experimenter is necessary so that the experimenter does not influence the subjects.
Population Parameter
Characteristics of a population.
Continuous Data
Data that can take on any value in a given interval. ex: someone's weight, time, someone's actual foot length, temperature (it's constantly changing)
Discrete Data
Data that can take on only particular, distinct values and not other values in between. ex: shoe sizes (such as 7 1/2), how many kids are in a classroom (5, 7), number of pages in a book, number of people in a race
Quantitative Data
Data that consist of values representing counts or measurements.
Data Type: Quantitative
Discrete: 1. Interval - things that can go below 0 (like the temperature) 2. Ratio - ex: how many students in our math class; there is a specific number that can be counted. Continuous: 1. Interval 2. Ratio
Helpful "Representative Sample" Note
If a sample doesn't represent the population, its statistics don't mean much.
Examples of Variables of Interest
Imagine trying to conduct a study of how exercise affects resting heart rates. The variables of interest would be "amount of exercise" and "resting heart rate."
Response Variable
In cases where cause and effect may be involved, the variables of interest may be subdivided into two categories. A response variable is a variable that responds to changes in the explanatory variable.
Explanatory Variable
In cases where cause and effect may be involved, the variables of interest may be subdivided into two categories. An explanatory variable is a variable that may explain or cause the effect.
What is blinding?
In statistical terminology, the practice of keeping people in the dark about who is in the treatment group and who is in the control group is called blinding. A single-blind experiment is one in which the participants don't know which group they belong to, but the experimenters do know. If neither the participants nor the experimenters know who belongs to each group, the study is said to be double-blind. Of course, someone has to keep track of the two groups in order to evaluate the results at the end. In a double-blind experiment, the researchers conducting the study typically hire experimenters to make any necessary contact with the participants.
In an Observational Study there was no attempt to...
Influence the results.
If intervals are meaningful but ratios are not, it is a...
Interval level of measurement (there is no true zero)
Define Representative Sample
Is a sample in which the relevant characteristics of the sample members are generally the same as the characteristics of the population. This type of sample is important because only a representative sample can be used to make trustworthy inferences about the population: in order to study a population with a sample, the sample must be representative of the population to give trustworthy, and therefore useful, results.
Statistics
Is the act of collecting information, organizing it into usable form, and understanding what the data means. Statistics are also the raw data that is collected. This raw data can be in the form of numbers or other pieces of information that describe or summarize something.
Population
Is the complete set of people or things being studied. A population parameter is a number that describes a characteristic of the population. A sample is the set of people or things from which the data are obtained. A sample statistic is a number that describes a characteristic of a sample. Raw data are the individual measurements collected.
The Second Step
Is to choose a representative sample from the population. In this case, we are choosing to survey 50,000 households.
Margin of Error
Is used to describe the range of values likely to contain a population parameter and is added to and subtracted from a sample statistic to establish a confidence interval. The confidence interval is used to estimate a population parameter.
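The add-and-subtract step described above can be written as a one-line function. This is a minimal sketch: the function name `confidence_interval` and the poll numbers are hypothetical, chosen only for illustration.

```python
def confidence_interval(sample_statistic, margin_of_error):
    """Return (low, high): the statistic minus and plus the margin of error."""
    return (sample_statistic - margin_of_error, sample_statistic + margin_of_error)

# Hypothetical poll: 52% support with a margin of error of 3 percentage points.
low, high = confidence_interval(52, 3)
print(f"The population parameter is likely between {low}% and {high}%.")
```

For this made-up poll the interval runs from 49% to 55%.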
Identify the level of measurement (nominal, ordinal, interval, or ratio) Temperatures on the Celsius scale.
Like Fahrenheit temperatures, Celsius temperatures are at the interval level of measurement. An interval of 1 degree C always has the same meaning, but the zero point (0 degree C = freezing point of water) is arbitrary and does not mean "no heat".
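One way to check that temperatures are at the interval level is to convert the same readings to another scale and see what survives. The sketch below uses the standard Celsius-to-Fahrenheit formula; the specific temperatures (10, 20, 30) are arbitrary illustrations.

```python
def c_to_f(celsius):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32

temps_c = [10.0, 20.0, 30.0]
temps_f = [c_to_f(t) for t in temps_c]

# Intervals are meaningful: equal gaps in Celsius remain equal gaps in Fahrenheit.
gaps_c = [temps_c[1] - temps_c[0], temps_c[2] - temps_c[1]]
gaps_f = [temps_f[1] - temps_f[0], temps_f[2] - temps_f[1]]

# Ratios are not meaningful: "twice as hot" in Celsius is not twice in Fahrenheit.
ratio_c = temps_c[1] / temps_c[0]
ratio_f = temps_f[1] / temps_f[0]

print("gaps in C:", gaps_c, "gaps in F:", gaps_f)
print("ratio in C:", ratio_c, "ratio in F:", round(ratio_f, 2))
```

The gaps stay equal to each other on both scales, but the ratio depends on which arbitrary zero point the scale uses.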
Identify the level of measurement (nominal, ordinal, interval, or ratio) Runners' times in the Boston Marathon.
Marathon times are at the ratio level of measurement because they have meaningful ratios (for example, a time of 6 hours really is twice as long as a time of 3 hours) and a true zero point at a time of 0 hours.
Identify the level of measurement (nominal, ordinal, interval, or ratio) Numbers on uniforms that identify players on a soccer team.
Numbers on uniforms don't count or measure anything. They are at the nominal level of measurement because they are labels and do not imply any kind of ordering.
Is this an example of systematic or random error? An FDA agent inspects shipment weights of cereal boxes to identify the following: (1) incorrectly recorded weights of shipments and (2) incorrect entries that were intentionally made to increase the shipment weights. Discuss whether each problem involves random or systematic errors.
Incorrectly recorded weights are random errors, because they represent unpredictable events in the measurement process.
If intervals and ratios are meaningful, it is a...
Ratio level of measurement (there is a true zero)
Accuracy is usually defined by
Relative error rather than absolute error.
If a sample is too small, it is unlikely to be a...
Representative Sample
Common Sampling Methods
Simple random sampling: Choose a sample of items in such a way that every sample of the same size has an equal chance of being selected. Systematic sampling: Result of using a simple system to choose the sample, such as selecting every 10th member of the population. Convenience sampling: A sample that happens to be convenient to select. Cluster sampling: Divide the population into groups, or clusters, and select some of these clusters at random. The sample contains all members of the selected clusters. Stratified sampling: Identify the subgroups, or strata, and then draw a random sample within each stratum. The total sample consists of all the samples from the individual strata.
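The first three methods above can be sketched with Python's standard `random` module. The population here is hypothetical (members simply numbered 1 to 100), and the sample size of 10 is an arbitrary choice for illustration.

```python
import random

# Assumption: a hypothetical population whose members are numbered 1..100.
population = list(range(1, 101))
rng = random.Random(0)

# Simple random sampling: every possible sample of size 10 is equally likely.
simple_random = rng.sample(population, 10)

# Systematic sampling: every 10th member, beginning at a random starting point.
start = rng.randrange(10)
systematic = population[start::10]

# Convenience sampling: whoever is easiest to reach (here, the first 10 members).
convenience = population[:10]

print("simple random:", sorted(simple_random))
print("systematic:   ", systematic)
print("convenience:  ", convenience)
```

The convenience sample only ever sees members 1 through 10, which is why it is the most likely of the three to be biased.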
Variable of Interest
Statistical studies, whether observations or experiments, generally are attempts to measure variables of interest. The term variable refers to an item or quantity that can vary or take on different values, and variables of interest are those that the study seeks to learn about.
Stratified Random Sampling
Stratified random sampling is a sampling method that involves taking samples of a population subdivided into smaller homogeneous groups called strata. SEPARATE THE POPULATION INTO AT LEAST TWO STRATA AND DRAW A SAMPLE FROM EACH GROUP.
Differentiations Between Stratified Sample and Cluster Sample
Stratified random sampling is a sampling method that involves taking samples of a population subdivided into smaller homogeneous groups called strata. SEPARATE THE POPULATION INTO AT LEAST TWO STRATA AND DRAW A SAMPLE FROM EACH GROUP. A cluster sample involves the selection of ALL members in randomly selected groups, or clusters. DIVIDE THE POPULATION INTO CLUSTERS, RANDOMLY SELECT SOME CLUSTERS, AND CHOOSE ALL THE MEMBERS OF THE SELECTED CLUSTERS.
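The difference is easiest to see side by side. In this sketch the population is a hypothetical set of four classrooms with five students each, and the function names `stratified_sample` and `cluster_sample` are illustrative, not from any library.

```python
import random

rng = random.Random(0)

# Assumption: a hypothetical population of 4 rooms (A-D) with 5 students each.
groups = {f"room_{r}": [f"{r}{i}" for i in range(1, 6)] for r in "ABCD"}

def stratified_sample(groups, per_group, rng):
    """Draw a random sample from EVERY group (stratum)."""
    sample = []
    for members in groups.values():
        sample.extend(rng.sample(members, per_group))
    return sample

def cluster_sample(groups, n_clusters, rng):
    """Pick some groups (clusters) at random and keep ALL of their members."""
    chosen = rng.sample(list(groups), n_clusters)
    sample = []
    for name in chosen:
        sample.extend(groups[name])
    return sample

strat = stratified_sample(groups, 2, rng)   # 2 students from each of the 4 rooms
clus = cluster_sample(groups, 2, rng)       # all 5 students from 2 of the rooms

print("stratified:", strat)
print("cluster:   ", clus)
```

The stratified sample touches every room but only some students in each; the cluster sample touches only some rooms but every student in them.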
Do incorrect entries that were intentionally made to increase the shipment weights involve random or systematic errors?
Systematic, because the cause of the error affects all measurements in the same way.
How are the Explanatory and Response Variable Related
The explanatory variable may cause a change and the response variable may respond to a change. Each explanatory variable may affect none, some, or all of the response variables; so each response variable may respond to none, some, or all of the explanatory variables.
The Fifth Step of Basic Steps in a Statistical Study
The fifth step is to draw conclusions to determine what you learned and whether you achieved your goal. In this case, we see that we have an average for time spent streaming internet video and conclude that we achieved our goal.
The First Step
The first step is to state the goal of your study.
Reference Value
The number that we are using as the basis for a comparison.
Compared Value
The other number, which we compare to the reference value.
The Third Step of Basic Steps in a Statistical Study
The third step is to collect raw data from the sample and summarize these data by finding sample statistics of interest.
Imagine a study that seeks to determine whether radon gas causes lung cancer by comparing the lung cancer rate in Colorado, where radon is fairly common, with the lung cancer rate in Hong Kong, where radon gas is less common. Suppose the study finds that lung cancer rates are nearly the same. Would it be reasonable to conclude that radon is not a significant cause of lung cancer? Is there a confounding variable, and if so, what is it? Are there explanatory and response variables, and if so, what are they?
The variables of interest are "amount of radon" and "lung cancer rate". "Amount of radon" is an explanatory variable and "lung cancer rate" is a response variable. Smoking can also cause lung cancer, so "smoking rate" may be a confounding variable in the study, especially since smoking is more common in Hong Kong. Because of this possible confounding, it would not be reasonable to conclude from the similar rates that radon is not a significant cause of lung cancer.
Define Statistics
There are two different definitions. Singular: the science of conducting studies to collect, organize, summarize, analyze, and draw conclusions from data. Plural: the data (numbers or other pieces of information) that describe or summarize something. Ex: Sports - the number of hits a baseball player gets in a season.
In an Experiment there is
A treatment group and a control group.
The Fourth Step of Basic Steps in a Statistical Study
Use the sample statistics to infer the population (parameters).
Representative Sample
a subset of a population that seeks to accurately reflect the characteristics of the larger group. For example, a classroom of 30 students with 15 males and 15 females could generate a representative sample that might include six students: three males and three females.
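The classroom example above amounts to keeping each subgroup's share of the sample equal to its share of the population. A minimal sketch follows; the function name `proportional_allocation` is hypothetical, and simple rounding is an assumption that may not add up exactly to the requested sample size for less tidy numbers.

```python
def proportional_allocation(group_sizes, sample_size):
    """Split sample_size across groups in proportion to each group's size."""
    total = sum(group_sizes.values())
    return {name: round(sample_size * size / total)
            for name, size in group_sizes.items()}

# The classroom from the card: 15 males and 15 females, sample of six students.
classroom = {"male": 15, "female": 15}
allocation = proportional_allocation(classroom, 6)
print(allocation)  # {'male': 3, 'female': 3}
```

Each group is half of the classroom, so each gets half of the six sample slots.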
Variable
any item or quantity that can vary or take on different values. Statistical studies, whether observations or experiments, generally are attempts to measure variables of interest.
Interval Level of Measurement (quantitative)
applies to quantitative data for which intervals are meaningful, but ratios are not. Data at this level have an arbitrary zero point (no true zero). ex: temperature
Ordinal Level of Measurement (qualitative)
applies to qualitative data that can be arranged in some order (such as low to high). It generally does not make sense to do computations with data at the ordinal level of measurement.
Ratio Level of Measurement (quantitative)
applies to quantitative data for which both intervals and ratios are meaningful. Data at this level have a true zero point. ex: distance, weight, incomes, and speed
SAMPLE STATISTICS
are numbers describing characteristics of the sample found by consolidating or summarizing the raw data.
We can minimize the effects of random errors by __________________ ________________ and we can account for the effect of a systematic error by _______________ _________________
averaging multiple measurements; adjusting the affected measurements. The unpredictable nature of random errors makes it impossible to correct for them. However, one can minimize the effects of random errors by making many measurements and averaging them.
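A small simulation can illustrate both ideas. Everything here is an assumption made for the sketch: the true weight (500 g), the size of the systematic offset (12 g), and the normally distributed random noise are all hypothetical.

```python
import random
import statistics

rng = random.Random(42)
true_value = 500.0   # hypothetical true weight in grams
offset = 12.0        # hypothetical systematic (calibration) error in grams

# Each reading = true value + the same systematic offset + fresh random noise.
readings = [true_value + offset + rng.gauss(0, 5) for _ in range(1000)]

# Averaging many readings shrinks the random error but keeps the offset.
averaged = statistics.mean(readings)

# Once the systematic error is known, the measurement can be adjusted for it.
corrected = averaged - offset

print(f"one reading:        {readings[0]:.1f}")
print(f"average of 1000:    {averaged:.1f}")
print(f"after adjusting:    {corrected:.1f}")
```

The average settles near 512 g rather than 500 g, showing that averaging alone cannot remove a systematic error; only the adjustment does.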
Nominal Level of Measurement (qualitative)
characterized by data that consist of names, labels, or categories only. The data are qualitative and cannot be arranged in a ranked or ordered way (such as low to high)
Relative Error
compares the size of the absolute error to the true value. Often expressed as a percentage: relative error = (absolute error / true value) x 100%
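The formula can be checked with a short function. This sketch takes absolute error to be the measured value minus the true value, as these notes define it; the scale readings are hypothetical.

```python
def absolute_error(measured, true_value):
    """Absolute error = claimed or measured value - true value."""
    return measured - true_value

def relative_error(measured, true_value):
    """Relative error as a percentage of the true value."""
    return absolute_error(measured, true_value) / true_value * 100

# Hypothetical: the same 2 kg absolute error on two very different true weights.
small = relative_error(52, 50)       # 2 kg off on a 50 kg object
large = relative_error(1002, 1000)   # 2 kg off on a 1000 kg object

print(f"{small:.1f}% vs {large:.1f}%")
```

The same absolute error is a much smaller relative error on the heavier object, which is why accuracy is usually judged by relative error.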
Accuracy
describes how closely a measurement approximates the true value.
Absolute Error
describes how far a claimed or measured value lies from the true value: absolute error = claimed or measured value - true value
Descriptive Statistics -
describes raw data in the form of graphics and sample statistics. Descriptive statistics uses the data to provide descriptions of the population, either through numerical calculations or graphs or tables.
Precision
describes the amount of detail in a measurement.
Relative Difference (of = multiple)
describes the size of the absolute difference in comparison to the reference value and can be expressed as a percentage: relative difference = (compared value - reference value)/ reference value x 100%
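A short sketch shows why the choice of reference value matters. The function names and the two quantities (75 and 60) are hypothetical, and absolute difference is taken as compared value minus reference value.

```python
def absolute_difference(compared, reference):
    """Absolute difference = compared value - reference value."""
    return compared - reference

def relative_difference(compared, reference):
    """Relative difference as a percentage of the reference value."""
    return (compared - reference) / reference * 100

# Hypothetical quantities: A = 75 and B = 60.
a_vs_b = relative_difference(75, 60)   # A compared with B as the reference
b_vs_a = relative_difference(60, 75)   # B compared with A as the reference

print(f"A is {a_vs_b:.0f}% more than B")
print(f"B is {abs(b_vs_a):.0f}% less than A")
```

The absolute difference is 15 either way (up to sign), but the percentage changes because it is always measured against the reference value.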
Placebo
does not have the active ingredients of a treatment being tested in a study, but is identical in appearance to the treatment.
Systematic Errors
occur when there is a problem in the measurement system that affects all measurements in the same way.
Qualitative data consist of values that can be placed into nonnumerical categories, such as ...
gender, color, or rating
Quantitative data consist of values representing counts or measurements, such as ...
incomes, temperatures, or heights
Inferential Statistics -
infers or estimates population parameters from sample data. Inferential statistics makes inferences and predictions about a population based on a sample of data taken from the population in question.
Cluster Sample
involves the selection of ALL members in randomly selected groups, or clusters. DIVIDE THE POPULATION INTO CLUSTERS, RANDOMLY SELECT SOME CLUSTERS, AND CHOOSE ALL THE MEMBERS OF THE SELECTED CLUSTERS.
A SAMPLE
is a subset of the population from which data are actually obtained.
Convenience Sample
is a type of non-probability sampling method where the sample is taken from a group of people who are easy to contact or reach. It is prone to being the most biased type of sample. For example, standing at a mall or a grocery store and asking people to answer questions would be a convenience sample. USES RESULTS THAT ARE READILY AVAILABLE.
Systematic Sampling
is a type of probability sampling method in which sample members from a larger population are selected according to a random starting point but with a fixed, periodic interval. SELECTING EVERY (OTHER) __?__ MEMBER/NUMBER.
Absolute Change
is the actual increase or decrease from a reference value to a new value: absolute change = new value - reference value
RAW DATA
are the actual measurements or observations collected from the sample.
Absolute Difference (subtract)
is the difference between the compared value and the reference value: absolute difference = compared value - reference value
Relative Change
is the size of the absolute change in comparison to the reference value and can be expressed as a percentage. relative change = (new value - reference value)/ reference value x 100%
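Both kinds of change can be computed from the same two numbers. In this sketch the function names and the enrollment figures are hypothetical; the reference value is the starting (older) value.

```python
def absolute_change(new_value, reference_value):
    """Absolute change = new value - reference value."""
    return new_value - reference_value

def relative_change(new_value, reference_value):
    """Relative change as a percentage of the reference value."""
    return absolute_change(new_value, reference_value) / reference_value * 100

# Hypothetical enrollment: grows from 8,000 to 10,000, then drops to 9,000.
growth_abs = absolute_change(10_000, 8_000)
growth_rel = relative_change(10_000, 8_000)
drop_abs = absolute_change(9_000, 10_000)
drop_rel = relative_change(9_000, 10_000)

print(f"growth: {growth_abs:+d} students ({growth_rel:+.0f}%)")
print(f"drop:   {drop_abs:+d} students ({drop_rel:+.0f}%)")
```

A negative result signals a decrease, so the sign carries the direction of the change.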
Random Errors
occur because of random and inherently unpredictable events in the measurement process.
Participation Bias
occurs any time participation in a study is voluntary. ex: mailed surveys
Random Errors occur because
of random and inherently unpredictable events in the measurement process.
Two Types of Measurement Error
Systematic and random
Purpose of Statistics
to help us make good decisions about issues that involve uncertainty
A random error occurs because of _______________ ______________ and a systematic error occurs because of _______________________ ___________________.
unpredictable events; some problem in the measurement system. A systematic error affects all measurements in the same way, such as making them all too high or all too low. If a systematic error is discovered, one can go back and adjust the affected measurements.
Systematic Errors Occur
when there is a problem in the measurement system that affects all measurements in the same way.