Statistics Math 146


Process of Statistics

1. Identify the research objective. A researcher must determine the question(s) he or she wants to be answered. The question(s) must clearly identify the population that is to be studied.
2. Collect the data needed to answer the question(s) posed in (1). Conducting research on an entire population is often difficult and expensive, so we typically look at a sample. This step is vital to the statistical process, because if the data are not collected correctly, the conclusions drawn are meaningless. Do not overlook the importance of appropriate data collection. We discuss this step in detail in Sections 1.2 through 1.6.
3. Describe the data. Descriptive statistics allow the researcher to obtain an overview of the data and can help determine the type of statistical methods the researcher should use. We discuss this step in detail in Chapters 2 through 4.
4. Perform inference. Apply the appropriate techniques to extend the results obtained from the sample to the population and report a level of reliability of the results. We discuss techniques for measuring reliability in Chapters 5 through 8 and inferential techniques in Chapters 9 through 15.

example 2

19) A farmer wishes to test the effects of a new fertilizer on her tomato yield. She has four equal-sized plots of land: one with sandy soil, one with rocky soil, one with clay-rich soil, and one with average soil. She divides each of the four plots into three equal-sized portions and randomly labels them A, B, and C. The four A portions of land are treated with her old fertilizer, the four B portions are treated with the new fertilizer, and the four C portions are treated with no fertilizer. At harvest time, the tomato yield is recorded for each section of land.
a) Identify the experimental units. The 12 portions of land (the dirt).
b) What is the treatment in this experiment? Fertilizer.
c) What is the response variable in this experiment? Tomato yield.
d) How many levels does the treatment in this experiment have? 3: old fertilizer, new fertilizer, and no fertilizer.
e) What type of experimental design is this? (random, block, matched-pairs, or single-blind) Block: the four different soil types are the blocks.

find percentage

21) Given the following table, where people were asked which of 3 paintings they liked the best, create a relative frequency distribution for men and for women. (Round to the hundredths place.)
              Men    Women
Painting A     38       15
Painting B     25       31
Painting C     10       12
Total          73       58
Divide each count by its column total:
              Relative Frequency for Men    Relative Frequency for Women
Painting A    38/73 = 0.52                  15/58 = 0.26
Painting B    25/73 = 0.34                  31/58 = 0.53
Painting C    10/73 = 0.14                  12/58 = 0.21
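The division above can be checked with a few lines of Python. This is only a sketch: the dictionary layout and the function name relative_frequencies are illustrative choices, not part of the original problem.

```python
# Counts taken from the painting-preference table above.
counts = {
    "Men": {"Painting A": 38, "Painting B": 25, "Painting C": 10},
    "Women": {"Painting A": 15, "Painting B": 31, "Painting C": 12},
}

def relative_frequencies(group_counts):
    """Return each category's share of the group total, rounded to hundredths."""
    total = sum(group_counts.values())
    return {category: round(n / total, 2) for category, n in group_counts.items()}

rel_freq = {group: relative_frequencies(c) for group, c in counts.items()}

for group, freqs in rel_freq.items():
    print(group, freqs)
```

Using relative frequencies rather than raw counts is what makes the two groups comparable, since 73 men and 58 women were surveyed.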

2

23) Find the class width for the frequency table below.
Class    Frequency
35-36    3
37-38    1
39-40    3
41-42    6
43-44    2
Take the lower limits of two successive classes and subtract: 37 - 35 = 2
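As a quick check of the subtraction, the sketch below computes the difference between every pair of consecutive lower class limits. The variable names are illustrative, and the check that all widths agree is an added safeguard rather than part of the original answer.

```python
# Lower and upper class limits from the frequency table above.
classes = [(35, 36), (37, 38), (39, 40), (41, 42), (43, 44)]

lower_limits = [lower for lower, _ in classes]

# Class width = difference between consecutive lower class limits.
widths = {b - a for a, b in zip(lower_limits, lower_limits[1:])}

# A table with equal class widths yields exactly one distinct value.
if len(widths) != 1:
    raise ValueError("classes do not share a single width")

class_width = next(iter(widths))
print(class_width)  # 2
```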

Pareto Chart

A Pareto chart is a bar graph whose bars are drawn in decreasing order of frequency or relative frequency.

census

A census is a list of all individuals in a population along with certain characteristics of each individual.

completely randomized design

A completely randomized design is one in which each experimental unit is randomly assigned to a treatment.

confounding variable

A confounding variable is an explanatory variable that was considered in a study whose effect cannot be distinguished from a second explanatory variable in the study.

control group

A control group serves as a baseline treatment against which other treatments can be compared. For example, a researcher in education might want to determine if students who do their homework using an online homework system do better on an exam than those who do their homework from the text. The students doing the text homework might serve as the control group (since this is the currently accepted practice). The factor is the type of homework. There are two treatments: online homework and text homework.

convenience sample

A convenience sample is a sample in which the individuals are easily obtained and are not chosen based on randomness.

example

A drug company wanted to test a new depression medication. The researchers found 200 adults aged 25-35 and randomly assigned them to two groups. The first group received the new drug, while the second received a placebo. After one month of treatment, the percentage of each group whose depression symptoms decreased was recorded and compared.
a) Identify the experimental units. The 200 adults aged 25-35.
b) What is the treatment in this experiment? The new drug.
c) What is the response variable in this experiment? Whether depression symptoms decreased.
d) How many levels does the treatment in this experiment have? 2: the drug and the placebo.
e) What type of experimental design is this? (random, block, matched-pairs, or single-blind) Random, single-blind.

frequency distribution

A frequency distribution lists each category of data and the number of occurrences for each category of data.

Histogram

A histogram is constructed by drawing rectangles for each class of data. The height of each rectangle is the frequency or relative frequency of the class. The width of each rectangle is the same, and the rectangles touch each other.

lurking variable

A lurking variable is an explanatory variable that was not considered in a study, but that affects the value of the response variable in the study. In addition, lurking variables are typically related to explanatory variables considered in the study.

matched pairs design

A matched-pairs design is an experimental design in which the experimental units are paired up. The pairs are selected so that they are related in some way (that is, the same person before and after a treatment, twins, husband and wife, same geographical location, and so on). There are only two levels of treatment in a matched-pairs design.

pie chart

A pie chart is a circle divided into sectors. Each sector represents a category of data. The area of each sector is proportional to the frequency of the category.

randomized block design

A randomized block design is used when the experimental units are divided into homogeneous groups called blocks. Within each block, the experimental units are randomly assigned to treatments.

Sample

A relatively small proportion of people who are chosen in a survey so as to be representative of the whole.

simple random sampling

A sample of size n from a population of size N is obtained through simple random sampling if every possible sample of size n has an equally likely chance of occurring. The sample is then called a simple random sample.
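A minimal Python sketch of this definition follows. The frame of 30 numbered individuals, the seed value, and the function name simple_random_sample are assumptions made for illustration; the standard library's random.sample draws without replacement, which is what makes every possible sample of size n equally likely.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals so that every size-n subset is equally likely."""
    rng = random.Random(seed)  # the seed is optional; fix it only for reproducibility
    return rng.sample(population, n)

# Hypothetical frame: 30 individuals numbered 1 through 30.
frame = list(range(1, 31))
sample = simple_random_sample(frame, 5, seed=146)
print(sample)
```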

stratified sample

A stratified sample is obtained by separating the population into nonoverlapping groups called strata and then obtaining a simple random sample from each stratum. The individuals within each stratum should be homogeneous (or similar) in some way.
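The two stages of the definition (split into strata, then take a simple random sample within each stratum) can be sketched in Python. The population of (id, age) pairs is made up for illustration, and stratified_sample and age_group are hypothetical helper names.

```python
import random
from collections import defaultdict

def stratified_sample(individuals, stratum_of, n_per_stratum, seed=None):
    """Take a simple random sample of n_per_stratum from every stratum."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for individual in individuals:
        # Each individual lands in exactly one stratum, so strata never overlap.
        strata[stratum_of(individual)].append(individual)
    return {
        name: rng.sample(members, n_per_stratum)
        for name, members in strata.items()
    }

# Hypothetical population of (id, age) pairs; the ages are invented.
homeowners = [(i, 30 + (i * 7) % 60) for i in range(1000)]

def age_group(homeowner):
    _, age = homeowner
    return "under 65" if age < 65 else "65 and over"

sample = stratified_sample(homeowners, age_group, n_per_stratum=200, seed=1)
print({name: len(members) for name, members in sample.items()})
```

Sampling the same number from each stratum is one possible choice; sampling in proportion to stratum size is another.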

ratio measurement

A variable is at the ratio level of measurement if it has the properties of the interval level of measurement and the ratios of the values of the variable have meaning. A value of zero means the absence of the quantity. Arithmetic operations such as multiplication and division can be performed on the values of the variable.

interval level of measurement

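A variable is at the interval level of measurement if it has the properties of the ordinal level of measurement and the differences in the values of the variable have meaning. A value of zero does not mean the absence of the quantity. Arithmetic operations such as addition and subtraction can be performed on values of the variable.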

response variable

A variable that measures an outcome of a study.

experiment,factors and treatment

An experiment is a controlled study conducted to determine the effect that varying one or more explanatory variables, or factors, has on a response variable. Any combination of the values of the factors is called a treatment.

observational study

An observational study measures the value of the response variable without attempting to influence the value of either the response or explanatory variables. That is, in an observational study, the researcher observes the behavior of the individuals without trying to influence the outcome of the study. Observational studies do not allow a researcher to claim causation, only association.

Open questions

An open question allows the respondent to choose his or her response:
What is the most important problem facing America's youth today?
A closed question requires the respondent to choose from a list of predetermined responses:
What is the most important problem facing America's youth today? (a) Drugs (b) Violence (c) Single-parent homes (d) Promiscuity (e) Peer pressure
In closed questions, the possible responses should be rearranged because respondents are likely to choose early choices in a list rather than later choices. An open question should be phrased so that the responses are similar. (You don't want a wide variety of responses.) This allows for easy analysis of the responses.

case-control study

Case-control studies are retrospective, meaning that they require individuals to look back in time or require the researcher to look at existing records. In case-control studies, individuals who have a certain characteristic may be matched with those who do not. For example, we might match individuals who smoke with those who do not. When we say "match" individuals, we mean that we would like the individuals in the study to be as similar (homogeneous) as possible in terms of demographics and other variables that may affect the response variable. Once homogeneous groups are established, we would ask the individuals in each group how much they smoked over the past 25 years. The rate of lung cancer between the two groups would then be compared. A disadvantage to this type of study is that it requires individuals to recall information from the past. It also requires the individuals to be truthful in their responses. An advantage of case-control studies is that they can be done relatively quickly and inexpensively.

Closed questions

Closed questions limit the number of respondent choices and, therefore, the results are much easier to analyze. The limited choices, however, do not always include a respondent's desired choice. In that case, the respondent will have to choose a secondary answer or skip the question. Survey designers recommend conducting pretest surveys with open questions and then using the most popular answers as the choices on closed-question surveys. Another issue to consider in the closed-question design is the number of possible responses. The option "no opinion" should be omitted, because this option does not allow for meaningful analysis. The goal is to limit the number of choices in a closed question without forcing respondents to choose an option they do not prefer, which would give the survey response bias.

cohort

A cohort study first identifies a group of individuals to participate in the study (the cohort). The cohort is then observed over a long period of time. During this period, characteristics about the individuals are recorded and some individuals will be exposed to certain factors (not intentionally) and others will not. At the end of the study, the value of the response variable is recorded for the individuals. Typically, cohort studies require many individuals to participate over long periods of time. Because the data are collected over time, cohort studies are prospective. Another problem with cohort studies is that individuals tend to drop out due to the long time frame. This could lead to misleading results. That said, cohort studies are the most powerful of the observational studies. One of the largest cohort studies is the Framingham Heart Study. In this study, more than 10,000 individuals have been monitored since 1948. The study continues to this day, with the grandchildren of the original participants taking part in the study. This cohort study is responsible for many of the breakthroughs in understanding heart disease. Its cost is in excess of $10 million.

Statistics

Collection of methods for planning experiments, obtaining data, organizing, summarizing, presenting, analyzing, interpreting, and drawing conclusions based on data.

Confounding

Confounding in a study occurs when the effects of two or more explanatory variables are not separated. Therefore, any relation that may exist between an explanatory variable and the response variable may be due to some other variable or variables not accounted for in the study.

cross-sectional study

Cross-sectional studies are observational studies that collect information about individuals at a specific point in time or over a very short period of time. For example, a researcher might want to assess the risk associated with smoking by looking at a group of people, determining how many are smokers, and comparing the rate of lung cancer of the smokers to the nonsmokers. An advantage of cross-sectional studies is that they are cheap and quick to do. However, they have limitations. For our lung cancer study, individuals might develop cancer after the data are collected, so our study will not give the full picture.

Interval

Determine the level of measurement of the variable. Choose nominal, ordinal, interval, or ratio. The day of the month. (The 0th day does not mean the absence of a day, and the 9th day is not twice as much as the 4th day.)

Ratio

Determine the level of measurement of the variable. Choose nominal, ordinal, interval, or ratio. Weight of rice bought by a customer. (A measurement.)

Interval

Determine the level of measurement of the variable. Choose nominal, ordinal, interval, or ratio. The year of manufacture of a car. (There is no meaning in doubling the year of manufacture. If this had been the age of the car in years, then doubling would have meaning and it would be a ratio measure.)

Nominal

Determine the level of measurement of the variable. Choose nominal, ordinal, interval, or ratio. The musical instrument played by a music student. (Instruments are named, not numerical.)

cross-sectional

Determine what type of observational study is described. Choose retrospective, cross-sectional or cohort. Can money buy happiness? A researcher wanted to determine whether there was any association between economic status and happiness. She selected a sample of 1000 adults and interviewed them. Each person was asked about their financial situation and their level of happiness was evaluated. The researcher analyzed the results to determine whether there was an association between economic status and happiness. (happened at a specific point in time.)

cohort study

Determine what type of observational study is described. Choose retrospective, cross-sectional or cohort. Researchers wanted to determine whether there was an association between city driving and stomach ulcers. They selected a sample of 900 young adults and followed them for a twenty-year period. At the start of the study none of the participants was suffering from a stomach ulcer. Each person kept track of the number of hours per week they spent driving in city traffic. At the end of the study each participant underwent tests to determine whether they were suffering from a stomach ulcer. The researchers analyzed the results to determine whether there was an association between city driving and stomach ulcers. (happened over a 20 year period.)

cross-sectional study

Determine what type of observational study is described. Choose retrospective, cross-sectional or cohort. Researchers wanted to determine whether there was an association between high blood pressure and the suppression of emotions. The researchers looked at 1800 adults enrolled in a Health Initiative Observational Study. Each person was interviewed and asked about their response to emotions. In particular they were asked whether their tendency was to express or to hold in anger and other emotions. The degree of suppression of emotions was rated on a scale of 1 to 10. Each person's blood pressure was also measured. The researchers analyzed the results to determine whether there was an association between high blood pressure and the suppression of emotions. (happened at a specific point in time.)

Retrospective? case-control study

Determine what type of observational study is described. Choose retrospective, cross-sectional or cohort. Vitamin D is important for the metabolism of calcium and exposure to sunshine is an important source of vitamin D. A researcher wanted to determine whether osteoporosis was associated with a lack of exposure to sunshine. He selected a sample of 250 women with osteoporosis and an equal number of women without osteoporosis. The two groups were matched - in other words they were similar in terms of age, diet, occupation, and exercise levels. Histories on exposure to sunshine over the previous twenty years were obtained for all women. The total number of hours that each woman had been exposed to sunshine in the previous twenty years was estimated. The amount of exposure to sunshine was compared for the two groups. (subjects were asked to look back over the last 20 years and estimate.)

Data

Facts and statistics collected together for reference or analysis

random sampling example

For the results of a survey to be reliable, the characteristics of the individuals in the sample must be representative of the characteristics of the individuals in the population. The key to obtaining a sample representative of a population is to let chance or randomness play a role in dictating which individuals are in the sample, rather than convenience. If convenience is used to obtain a sample, the results of the survey are meaningless.

how to make a histogram in statcrunch

How to make a histogram in StatCrunch:
1. Open a new spreadsheet.
2. Enter the data in a single column.
3. Choose Graph, then Histogram.
4. Select the column (var 1), enter the start value and width, enter a title for the x-axis, and click Compute.
5. Save the graph by right-clicking on it.

simple random

Identify the type of sampling used. (cluster, stratified, simple random, convenience, systematic) A lobbyist for the oil industry assigns a number to each senator and then uses a computer to randomly generate ten numbers. The lobbyist contacts the senators corresponding to these numbers. (used random number technique)

stratified

Identify the type of sampling used. (cluster, stratified, simple random, convenience, systematic) A market researcher randomly selects 200 homeowners under 65 years of age and 200 homeowners over 65 years of age. (some individuals are sampled from each stratum, here an age group)

systematic

Identify the type of sampling used. (cluster, stratified, simple random, convenience, systematic) A sample consists of every 30th worker from a group of 1000 workers. (every 30th)

convenience

Identify the type of sampling used. (cluster, stratified, simple random, convenience, systematic) A statistics student interviews everyone in his apartment building to determine who owns a cell phone.

cluster

Identify the type of sampling used. (cluster, stratified, simple random, convenience, systematic) At a local technical school, five auto repair classes are randomly selected and all of the students from each class are interviewed. (all from each cluster (class) are interviewed)

systematic

Identify the type of sampling used. (cluster, stratified, simple random, convenience, systematic) Every fifth adult entering an airport is checked for extra security screening. (every 5th)

designed experiment

If a researcher assigns the individuals in a study to a certain group, intentionally changes the value of an explanatory variable, and then records the value of the response variable for each group, the study is a designed experiment.

bias

If the results of the sample are not representative of the population, then the sample has bias.

Blinding

In an experiment, it is important that each group be treated the same way. It is also important that individuals do not adjust their behavior because of the treatment they are receiving. For this reason, many experiments use a technique called blinding. Blinding refers to nondisclosure of the treatment an experimental unit is receiving. There are two types of blinding: single blinding and double blinding.

experimental group and subject

In an experiment, the experimental unit is a person, object, or some other well-defined item upon which a treatment is applied. We often refer to the experimental unit as a subject when he or she is a person. The subject is analogous to the individual in a survey. The goal in an experiment is to determine the effect various treatments have on the response variable. For example, we might want to determine whether a new treatment is superior to an existing treatment (or no treatment at all). To make this determination, experiments require a control group.

single-blind experiment & double-blind experiment

In single-blind experiments, the experimental unit (or subject) does not know which treatment he or she is receiving. In double-blind experiments, neither the experimental unit nor the researcher in contact with the experimental unit knows which treatment the experimental unit is receiving.

Classes

In summarizing quantitative data, we first determine whether the data are discrete or continuous. If the data are discrete with relatively few different values of the variable, then the categories of data (called classes) will be the observations (as in qualitative data). If the data are discrete, but with many different values of the variable or if the data are continuous, then the categories of data (the classes) must be created using intervals of numbers. We will first present the techniques for organizing discrete quantitative data when there are relatively few different values and then proceed to organizing continuous quantitative data.

Explanatory variable

In the study in Example 2, the researchers obtained 480 rats and divided the rats into three groups. Each group was intentionally exposed to various levels of radiation. The researchers then compared the number of rats that had brain tumors. Clearly, there was an attempt to influence the individuals in this study because the value of the explanatory variable (exposure to radio frequency) was influenced. Because the researchers controlled the value of the explanatory variable, we call the study a designed experiment.

nonresponse bias

Nonresponse bias exists when individuals selected to be in the sample who do not respond to the survey have different opinions from those who do. Nonresponse can occur because individuals selected for the sample do not wish to respond or the interviewer was unable to contact them.

nonsampling error

Nonsampling errors result from undercoverage, nonresponse bias, response bias, or data-entry error. Such errors could also be present in a complete census of the population.

obtaining a cluster sample

Obtaining a Cluster Sample
Problem: A sociologist wants to gather data regarding household income within the city of Boston. Obtain a sample using cluster sampling.
Approach: The city of Boston can be set up so that each city block is a cluster. Once the city blocks have been identified, obtain a simple random sample of the city blocks and survey all households on the blocks selected.
Solution: Suppose there are 10,493 city blocks in Boston. First, the sociologist must number the blocks from 1 to 10,493. Suppose the sociologist has enough time and money to survey 20 clusters (city blocks). The sociologist should obtain a simple random sample of 20 numbers between 1 and 10,493 and survey all households from the clusters selected. Cluster sampling is a good choice in this example because it reduces the travel time to households that is likely to occur with both simple random sampling and stratified sampling. In addition, there is no need to obtain a frame of all the households with cluster sampling. The only frame needed is one that provides information regarding city blocks.
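The selection step in this solution could be sketched in Python as follows. The block count and number of clusters come from the example; the seed value, the households_by_block mapping, and the helper name households_to_survey are hypothetical and only illustrate that every household on a chosen block is surveyed.

```python
import random

NUM_BLOCKS = 10_493   # number of city blocks supposed in the example
NUM_CLUSTERS = 20     # blocks the sociologist can afford to survey

rng = random.Random(2024)  # arbitrary seed, fixed only for reproducibility

# Simple random sample of 20 block numbers between 1 and 10,493.
chosen_blocks = sorted(rng.sample(range(1, NUM_BLOCKS + 1), NUM_CLUSTERS))
print(chosen_blocks)

def households_to_survey(households_by_block, blocks):
    """Return every household on each selected block (hypothetical data layout)."""
    return [h for block in blocks for h in households_by_block.get(block, [])]
```

Note that the only frame the code needs is the list of block numbers, which mirrors the point made in the solution.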

Dot Plot

One more graph! We draw a dot plot by placing each observation horizontally in increasing order and placing a dot above the observation each time it is observed.

uniform distribution

One way that a variable is described is through the shape of its distribution. Distribution shapes are typically classified as symmetric, skewed left, or skewed right. Figure 15 on the following page displays various histograms and the shape of the distribution. Figures 15(a) and (b) show symmetric distributions. They are symmetric because, if we split the histogram down the middle, the right and left sides are mirror images. Figure 15(a) is a uniform distribution because the frequency of each value of the variable is evenly spread out across the values of the variable.

Lower and upper class limits and class width

Organize Continuous Data in Tables
Classes are categories into which data are grouped. When a data set consists of a large number of different discrete data values or when a data set consists of continuous data, we create classes by using intervals of numbers. Table 10 is a typical frequency distribution created from continuous data. The data represent the number of U.S. residents, ages 25-74, who had earned a bachelor's degree or higher in 2013. Notice that the data are categorized, or grouped, by intervals of numbers. Each interval represents a class. For example, the first class is 25- to 34-year-old U.S. residents who had a bachelor's degree or higher. We read this interval as follows: "The number of U.S. residents, ages 25-34, with a bachelor's degree or higher was 14,481,000 in 2013."
There are five classes in the table, each with a lower class limit (the smallest value within the class) and an upper class limit (the largest value within the class). The lower class limit for the first class in Table 10 is 25; the upper class limit is 34. The class width is the difference between consecutive lower class limits. In Table 10 the class width is 35 - 25 = 10. The data in Table 10 are continuous, so the class 25-34 actually represents 25-34.999..., or 25 up to every value less than 35.
Notice that the classes in Table 10 do not overlap. This is necessary to avoid confusion as to which class a data value belongs. Notice also that the class widths are equal for all classes. One exception to the requirement of equal class widths occurs in open-ended tables. A table is open ended if the first class has no lower class limit or the last class has no upper class limit. The data in Table 11 represent the number of births to unmarried mothers in 2012 in the United States. The last class in the table, "40 and over," is open-ended.

difference between quantitative and qualitative variables

Problem: Determine whether the following variables are qualitative or quantitative. (a) Gender (b) Temperature (c) Number of days during the past week that a college student studied (d) Zip code
Approach: Quantitative variables are numerical measures such that meaningful arithmetic operations can be performed on the values of the variable. Qualitative variables describe an attribute or characteristic of the individual that allows researchers to categorize the individual.
Solution:
(a) Gender is a qualitative variable because it allows a researcher to categorize the individual as male or female. Notice that arithmetic operations cannot be performed on these attributes.
(b) Temperature is a quantitative variable because it is numeric, and operations such as addition and subtraction provide meaningful results. For example, 70°F is 10°F warmer than 60°F.
(c) Number of days during the past week that a college student studied is a quantitative variable because it is numeric, and operations such as addition and subtraction provide meaningful results.
(d) Zip code is a qualitative variable because it categorizes a location. Notice that, even though zip codes are numeric, adding or subtracting zip codes does not provide meaningful results.

Difference between discrete and continuous variables

Problem: Determine whether the quantitative variables are discrete or continuous. (a) The number of heads obtained after flipping a coin five times. (b) The number of cars that arrive at a McDonald's drive-thru between 12:00 p.m. and 1:00 p.m. (c) The distance a 2014 Toyota Prius can travel in city driving conditions with a full tank of gas.
Approach: A variable is discrete if its value results from counting. A variable is continuous if its value is measured.
Solution:
(a) The number of heads obtained by flipping a coin five times is a discrete variable because we can count the number of heads obtained. The possible values of this discrete variable are 0, 1, 2, 3, 4, 5.
(b) The number of cars that arrive at a McDonald's drive-thru between 12:00 p.m. and 1:00 p.m. is a discrete variable because we find its value by counting the cars. The possible values of this discrete variable are 0, 1, 2, 3, 4, and so on. Notice that this number has no upper limit.
(c) The distance traveled is a continuous variable because we measure the distance (miles, feet, inches, and so on).

seed

Problem: Find a simple random sample of five clients for the problem presented in Example 2.
Approach: The approach is similar to that given in Example 2.
Step 1: Obtain the frame and assign the clients numbers from 01 to 30.
Step 2: Randomly select five numbers using a random number generator. To do this, we must first set the seed. The seed is an initial point for the generator to start creating random numbers, like selecting the initial point in the table of random numbers. The seed can be any nonzero number. Statistical software such as StatCrunch, Minitab, or Excel can be used to generate random numbers, but we will use a TI-84 Plus C graphing calculator.
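The role of the seed can be illustrated in Python rather than on the calculator. This is a sketch under stated assumptions: the seed value 34 and the function name draw_clients are arbitrary, and Python's generator is not the TI-84's, so the numbers drawn will not match the textbook's.

```python
import random

def draw_clients(seed, n_clients=30, sample_size=5):
    """Select sample_size distinct client numbers from 1..n_clients."""
    rng = random.Random(seed)  # the seed fixes the generator's starting point
    return rng.sample(range(1, n_clients + 1), sample_size)

first = draw_clients(seed=34)
second = draw_clients(seed=34)

print(first)
print(first == second)  # True: the same seed reproduces the same sample
```

Setting the seed makes a random selection repeatable, which is useful when someone else needs to verify how the sample was obtained.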

response bias

Response bias exists when the answers on a survey do not reflect the true feelings of the respondent. Response bias can occur in a number of ways.

Undercoverage

Sampling bias also results due to undercoverage, which occurs when the proportion of one segment of the population is lower in a sample than it is in the population. Undercoverage can result if the frame used to obtain the sample is incomplete or not representative of the population. Some frames, such as the list of all registered voters,

sampling bias

Sampling bias means that the technique used to obtain the sample's individuals tends to favor one part of the population over another. Any convenience sample has sampling bias because the individuals are not chosen through a random sample.

sampling error

Sampling error results from using a sample to estimate information about a population. This type of error occurs because a sample gives incomplete information about a population.

List some ways that a graph can be Misleading:

"Statistics: The only science that enables different experts using the same figures to draw different conclusions." (Evan Esar)
Statistics often gets a bad rap for having the ability to manipulate data to support any position. One method of distorting the truth is through graphics. We mentioned in Section 2.1 how visual displays send more powerful messages than raw data or even tables of data. Since graphics are so powerful, care must be taken in constructing graphics and in interpreting their messages. Graphics may mislead or deceive. We will call graphs misleading if they unintentionally create an incorrect impression. We consider graphs deceptive if they purposely create an incorrect impression. In either case, a reader's incorrect impression can have serious consequences. Therefore, it is important to be able to recognize misleading and deceptive graphs.
The most common graphical misrepresentations of data involve the scale of the graph, an inconsistent scale, or a misplaced origin. Increments between tick marks should be constant, and scales for comparative graphs should be the same. Also, because readers usually assume that the baseline, or zero point, is at the bottom of the graph, a graph that begins at a higher or lower value can be misleading.

Steps in Systematic Sampling

1. If possible, approximate the population size, N. 2. Determine the sample size desired, n. 3. Compute N/n and round down to the nearest integer. This value is k. 4. Randomly select a number between 1 and k. Call this number p. 5. The sample will consist of the following individuals: p, p + k, p + 2k, ..., p + (n - 1)k
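The five steps can be sketched in Python. This is a minimal illustration, not part of the original card: the function name `systematic_sample`, the population size 1000, the sample size 40, and the seed are all assumptions chosen for the example.

```python
import random

def systematic_sample(N, n, rng=random):
    """Return the 1-based positions picked by systematic sampling."""
    if not 0 < n <= N:
        raise ValueError("need 0 < n <= N")
    k = N // n                 # Step 3: N/n rounded down to an integer
    p = rng.randint(1, k)      # Step 4: random start between 1 and k, inclusive
    # Step 5: p, p + k, p + 2k, ..., p + (n - 1)k
    return [p + i * k for i in range(n)]

# Hypothetical numbers: about 1000 individuals, sample of 40, so k = 25.
sample = systematic_sample(N=1000, n=40, rng=random.Random(146))
print(sample[:5])
```

Because k is rounded down, the last selected position p + (n - 1)k never exceeds N.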

side-by-side bar graph

Suppose we want to know whether more people are finishing college today than in 1990. We could draw a side-by-side bar graph to compare the data for the two different years. Data sets should be compared by using relative frequencies, because different sample or population sizes make comparisons using frequencies difficult or misleading.

characteristics of an experiment

Problem: Lipitor is a cholesterol-lowering drug made by Pfizer. In the Collaborative Atorvastatin Diabetes Study (CARDS), the effect of Lipitor on cardiovascular disease was assessed in 2838 subjects, ages 40 to 75, with type 2 diabetes, without prior history of cardiovascular disease. In this placebo-controlled, double-blind experiment, subjects were randomly allocated to either Lipitor 10 mg daily (1428) or placebo (1410) and were followed for 4 years. The response variable was the occurrence of any major cardiovascular event. Lipitor significantly reduced the rate of major cardiovascular events (83 events in the Lipitor group versus 127 events in the placebo group). There were 61 deaths in the Lipitor group versus 82 deaths in the placebo group. (a) What does it mean for the experiment to be placebo-controlled? (b) What does it mean for the experiment to be double-blind? (c) What is the population for which this study applies? What is the sample? (d) What are the treatments? (e) What is the response variable? Is it qualitative or quantitative? Approach: Apply the definitions just presented. Solution: (a) The placebo is a medication that looks, smells, and tastes like Lipitor. The placebo control group serves as a baseline against which to compare the results from the group receiving Lipitor. The placebo is also used because people tend to behave differently when they are in a study. By having a placebo control group, the effect of this is neutralized. (b) Since the experiment is double-blind, the subjects, as well as the individual monitoring the subjects, do not know whether the subjects are receiving Lipitor or the placebo. The experiment is double-blind so that the subjects receiving the medication do not behave differently from those receiving the placebo and so that the individual monitoring the subjects does not treat those in the Lipitor group differently from those in the placebo group.
(c) The population is individuals from 40 to 75 years of age with type 2 diabetes without a prior history of cardiovascular disease. The sample is the 2838 subjects in the study. (d) The treatments are 10 mg of Lipitor or a placebo daily. (e) The response variable is whether the subject had any major cardiovascular event, such as a stroke, or not. It is a qualitative variable.

Two successive lower class limits

The class width is the difference between A) The upper class limit and the lower class limit of a class B) The largest frequency and the smallest frequency C) Two successive lower class limits D) The high and the low data values

sampling with replacement and without replacement

Step 1 The clients must be listed (the frame) and numbered from 01 to 30. Step 2 Five unique numbers will be randomly selected. The clients corresponding to the numbers are sent a survey. This process is called sampling without replacement. In a sample without replacement, an individual who is selected is removed from the population and cannot be chosen again. In a sample with replacement, a selected individual is placed back into the population and could be chosen a second time. We use sampling without replacement so that we don't select the same client twice.
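The difference between the two schemes can be seen with Python's standard `random` module. This is a sketch under assumptions: the seed and the variable names are invented for illustration, and the frame is simply the labels 01 to 30.

```python
import random

rng = random.Random(30)  # hypothetical seed, only to make the run repeatable

# The frame: clients numbered 01 to 30.
clients = [f"{i:02d}" for i in range(1, 31)]

# Without replacement: a selected client cannot be chosen again.
without_replacement = rng.sample(clients, k=5)

# With replacement: a selected client goes back into the pool and may repeat.
with_replacement = rng.choices(clients, k=5)

print(without_replacement)
print(with_replacement)
```

`sample` always returns five distinct clients, while `choices` can return the same client more than once.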

Skewed left/skewed right

The distribution in Figure 15(c) is skewed right. Notice that the tail to the right of the peak is longer than the tail to the left of the peak. Finally, Figure 15(d) illustrates a distribution that is skewed left, because the tail to the left of the peak is longer than the tail to the right of the peak.

voluntary response

The most popular of the many types of convenience samples are those in which the individuals in the sample are self-selected (the individuals themselves decide to participate in a survey). These are also called voluntary response samples. One example of self-selected sampling is phone-in polling; a radio personality will ask his or her listeners to phone the station to submit their opinions. Another example is the use of the Internet to conduct surveys. For example, a television news show will present a story regarding a certain topic and ask its viewers to "tell us what you think" by completing a questionnaire online or phoning in an opinion. Both of these samples are poor designs because the individuals who decide to be in the sample generally have strong opinions about the topic. A more typical individual in the population will not bother phoning or logging on to a computer to complete a survey. Any inference made regarding the population from this type of sample should be made with extreme caution. Convenience samples yield unreliable results because the individuals participating in the survey are not chosen using random sampling. Instead, the interviewer or

relative frequency

The relative frequency is the proportion (or percent) of observations within a category and is found using the formula: relative frequency = frequency / sum of all frequencies
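A small sketch of the formula in Python. The observations below are hypothetical, made up only to show the arithmetic, and `relative_frequencies` is an illustrative helper name.

```python
from collections import Counter

def relative_frequencies(observations):
    """Map each category to frequency / sum of all frequencies."""
    counts = Counter(observations)
    total = sum(counts.values())
    return {category: count / total for category, count in counts.items()}

# Hypothetical data: 10 observations in three categories.
observations = ["A", "B", "A", "C", "A", "B", "A", "B", "C", "A"]
rel = relative_frequencies(observations)
print(rel)
```

The relative frequencies of all categories always add up to 1 (or 100%).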

frame

The results of Example 1 leave one question unanswered: How do we select the individuals in a simple random sample? We could write the names of the individuals in the population on different sheets of paper and then select names from a hat. Often, however, the size of the population is so large that performing simple random sampling in this fashion is not practical. Instead, each individual in the population is assigned a unique number between 1 and N, where N is the size of the population. Then n distinct random numbers from this list are selected, where n represents the size of the sample. To number the individuals in the population, we need a frame—a list of all the individuals within the population. Note

1.1

The weights (in pounds) of babies born at St Mary's hospital last month are summarized in the table. Find the class width.
Class      Frequency
5.0-6.0    7
6.1-7.1    11
7.2-8.2    20
8.3-9.3    10
9.4-10.4   3
Class width = 6.1 - 5.0 = 1.1
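The same calculation can be checked for every pair of successive classes. This is an illustrative sketch: the variable names are assumptions, and rounding to one decimal place is used only to hide floating-point noise.

```python
# Lower class limits taken from the table above.
lower_limits = [5.0, 6.1, 7.2, 8.3, 9.4]

# Class width = difference between two successive lower class limits.
widths = [round(b - a, 1) for a, b in zip(lower_limits, lower_limits[1:])]

class_width = widths[0]
print(widths, class_width)
```

Every pair of neighboring classes gives the same width, which is what we expect from a properly built frequency table.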

design

To design an experiment means to describe the overall plan in conducting the experiment. Conducting an experiment requires a series of steps.

Variable

Variables are the characteristics of the individuals within the population. For example, recently, my son and I planted a tomato plant in our backyard. We collected information about the tomatoes harvested from the plant. The individuals we studied were the tomatoes. The variable that interested us was the weight of a tomato. My son noted that the tomatoes had different weights even though they came from the same plant. He discovered that variables such as weight may vary. If variables did not vary, they would be constants, and statistical inference would not be necessary. Think about it this way: If each tomato had the same weight, then knowing the weight of one tomato would allow us to determine the weights of all tomatoes. However, the weights of the tomatoes vary. One goal of research is to learn the causes of the variability.

D

What is the difference between a bar chart and a histogram? A) The bars in a bar chart are all the same width while the bars of a histogram may be of various widths. B) There is no difference between these two graphical displays. C) The bars in a bar chart may be of various widths while the bars of a histogram are all the same width. D) The bars on a bar chart do not touch while the bars of a histogram do touch.

Completely Randomized Design

Problem: A farmer wishes to determine the optimal level of a new fertilizer on his soybean crop. Design an experiment that will assist him. Approach: Follow the steps for designing an experiment. Solution: Step 1 The farmer wants to identify the optimal level of fertilizer for growing soybeans. We define optimal as the level that maximizes yield. So the response variable will be crop yield. Step 2 Some factors that affect crop yield are fertilizer, precipitation, sunlight, method of tilling the soil, type of soil, plant, and temperature. Step 3 In this experiment, we will plant 60 soybean plants (experimental units). Step 4 List the factors and their levels. • Fertilizer. This factor will be controlled and set at three levels. We wish to measure the effect of varying the level of this variable on the response variable, yield. We will set the treatments (level of fertilizer) as follows: Treatment A: 20 soybean plants receive no fertilizer. Treatment B: 20 soybean plants receive 2 teaspoons of fertilizer per gallon of water every 2 weeks. Treatment C: 20 soybean plants receive 4 teaspoons of fertilizer per gallon of water every 2 weeks. See Figure 6. • Precipitation. The amount of rainfall cannot be controlled, but the amount of watering done can be controlled. Each plant will receive the same amount of precipitation. • Sunlight. This uncontrollable factor will be roughly the same for each plant. • Method of tilling. Control this factor by using round-up ready method of tilling for each plant. • Type of soil. Certain aspects of the soil, such as level of acidity, can be controlled. In addition, each plant will be planted within a 1-acre area, so it is reasonable to assume that the soil conditions for each plant are equivalent. • Plant. There may be variation from plant to plant. To account for this, randomly assign the plants to a treatment. • Temperature. This factor is uncontrollable, but will be the same for each plant.
Step 5 (a) Randomly assign each plant to a treatment group. First, number the plants from 1 to 60 and randomly generate 20 numbers. The plants corresponding to these numbers get treatment A. Next number the remaining plants 1 to 40 and randomly generate 20 numbers. The plants corresponding to these numbers get treatment B. The remaining plants get treatment C. Now till the soil, plant the soybean plants, and fertilize according to the schedule prescribed. (b) At the end of the growing season, determine the crop yield for each plant. Step 6 Determine any differences in yield among the three treatment groups. Figure 7 on the following page illustrates the experimental design
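The random assignment in Step 5(a) can be sketched in Python. Note the swap in technique: instead of drawing 20 numbers, renumbering, and drawing 20 more, this sketch shuffles all 60 plant numbers once and cuts the shuffled list into three groups of 20, which gives every plant the same chance of landing in each treatment. The function name and the seed are assumptions for illustration.

```python
import random

def completely_randomized(units, treatments, rng):
    """Shuffle the units and split them evenly among the treatments."""
    shuffled = list(units)
    rng.shuffle(shuffled)
    size = len(shuffled) // len(treatments)   # 60 plants / 3 treatments = 20
    return {
        treatment: sorted(shuffled[i * size:(i + 1) * size])
        for i, treatment in enumerate(treatments)
    }

plants = range(1, 61)                         # plants numbered 1 to 60
groups = completely_randomized(plants, ["A", "B", "C"], random.Random(60))
for treatment, members in groups.items():
    print(treatment, members)
```

Each plant appears in exactly one group, and each treatment gets 20 plants, so the treatment is replicated equally.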

matched-pairs design example

Problem: An educational psychologist wants to determine whether listening to music has an effect on a student's ability to learn. Design an experiment to help the psychologist answer the question. Approach: We will use a matched-pairs design by matching students according to IQ and gender (just in case gender plays a role in learning with music). Solution: Match students according to IQ and gender. For example, match two females with IQs in the 110 to 115 range. For each pair of students, flip a coin to determine which student is assigned the treatment of a quiet room or a room with music playing in the background. Each student will be given a statistics textbook and asked to study Section 1.1. After 2 hours, the students will enter a testing center and take a short quiz on material in the section. Compute the difference in the scores of each matched pair. Any differences in scores will be attributed to the treatment. Figure 8 illustrates the design.
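The coin flip for each matched pair might look like this in Python. The student IDs, the seed, and the function name `assign_matched_pairs` are hypothetical; only the idea of one random flip per pair comes from the card.

```python
import random

def assign_matched_pairs(pairs, rng):
    """For each matched pair, flip a coin to decide who studies with music."""
    assignment = []
    for first, second in pairs:
        if rng.random() < 0.5:     # "heads": first student gets the quiet room
            assignment.append({"quiet": first, "music": second})
        else:                      # "tails": first student gets the music room
            assignment.append({"quiet": second, "music": first})
    return assignment

# Hypothetical student IDs, already matched by IQ range and gender.
pairs = [("S1", "S2"), ("S3", "S4"), ("S5", "S6")]
assignment = assign_matched_pairs(pairs, random.Random(8))
print(assignment)
```

After the quiz, the difference music score minus quiet score would be computed within each pair.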

Individual

a person or object that is a member of the population being studied

continuous variable

a quantitative variable that has an infinite number of possible values that are not countable

systematic sample

a sample drawn by selecting individuals systematically from a sampling frame

qualitative or categorical variables

allow for classification of individuals based on some attribute or characteristic

bar graph

A bar graph is constructed by labeling each category of data on either the horizontal or vertical axis and the frequency or relative frequency of the category on the other axis. Rectangles of equal width are drawn for each category. The height of each rectangle represents the category's frequency or relative frequency.

cluster sample

A cluster sample is obtained by selecting all individuals within a randomly selected collection or group of individuals.

descriptive statistics

consists of methods for organizing, displaying, and describing data by using tables, graphs, and summary measures

Ordinal

Determine the level of measurement of the variable. Choose nominal, ordinal, interval, or ratio. The medal received (gold, silver, bronze) by an Olympic gymnast. (Medals are named, but an order is implied.)

Ratio

Determine the level of measurement of the variable. Choose nominal, ordinal, interval, or ratio. Height of a tree. (A tree can have zero height, and an 8 ft tree is twice as tall as a 4 ft tree.)

Random sampling

evaluated. The researcher analyzed the results to determine whether there was an association between economic status and happiness.

bell-shaped distribution

Figure 15(b) displays a bell-shaped distribution because the highest frequency occurs in the middle and frequencies tail off to the left and right of the middle. That is, the graph looks like the profile of a bell.

finding percentage

Suppose the payroll amounts for 26 major-league baseball teams are given, and 10 of those are in the $20-$30 million range. Calculate approximately what percentage of the payrolls were in the $20-$30 million range. Round to the nearest whole percent. 10/26 = 0.3846, or about 38%

Blocking

Grouping together similar (homogeneous) experimental units and then randomly assigning the experimental units within each group to a treatment is called blocking. Each group of homogeneous individuals is called a block.
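A sketch of blocking in Python: the randomization happens separately inside each block. The block names, unit labels, treatment names, and seed below are hypothetical, loosely modeled on the fertilizer example with four soil types.

```python
import random

def randomized_block(blocks, treatments, rng):
    """Within each block, shuffle the units and deal out the treatments."""
    plan = {}
    for block, units in blocks.items():
        shuffled = list(units)
        rng.shuffle(shuffled)      # randomize only inside this block
        plan[block] = {
            unit: treatments[i % len(treatments)]
            for i, unit in enumerate(shuffled)
        }
    return plan

# Hypothetical blocks: each soil type is split into three portions.
blocks = {
    "sandy": ["sandy-1", "sandy-2", "sandy-3"],
    "rocky": ["rocky-1", "rocky-2", "rocky-3"],
    "clay": ["clay-1", "clay-2", "clay-3"],
    "average": ["average-1", "average-2", "average-3"],
}
treatments = ["old fertilizer", "new fertilizer", "no fertilizer"]
plan = randomized_block(blocks, treatments, random.Random(4))
for block, assignment in plan.items():
    print(block, assignment)
```

Every treatment shows up once in every block, so soil type cannot be confused with the effect of the fertilizer.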

Statistic

is a numerical measurement describing some characteristic of a sample

Discrete Variable

a quantitative variable that has either a finite number of possible values or a countable number of possible values

Placebo

One method for defining the control group is through the use of a placebo. A placebo is an innocuous medication, such as a sugar tablet, that looks, tastes, and smells like the experimental medication.

Parameter

numerical summary of a population

quantitative variable

provide numerical measures of individuals. Arithmetic operations such as addition and subtraction can be performed on the values and provide meaningful results.

Ordinal

Ranking (first place, second place, etc.) of contestants in a singing competition. (Like the medals, the values are named and an order is implied.)

steps in designing an experiment

Step 1 Identify the Problem to Be Solved. The statement of the problem should be as explicit as possible and should provide the experimenter with direction. The statement must also identify the response variable and the population to be studied. Often, the statement is referred to as the claim. Step 2 Determine the Factors That Affect the Response Variable. The factors are usually identified by an expert in the field of study. In identifying the factors, ask, "What things affect the value of the response variable?" After the factors are identified, determine which factors to fix at some predetermined level, which to manipulate, and which to leave uncontrolled. Step 3 Determine the Number of Experimental Units. As a general rule, choose as many experimental units as time and money allow. Techniques (such as those in Sections 9.1 and 9.2) exist for determining sample size, provided certain information is available. Step 4 Determine the Level of Each Factor. There are two ways to deal with the factors: control or randomize. 1. Control: There are two ways to control the factors. (a) Set the level of a factor at one value throughout the experiment (if you are not interested in its effect on the response variable). (b) Set the level of a factor at various levels (if you are interested in its effect on the response variable). The combinations of the levels of all varied factors constitute the treatments in the experiment. 2. Randomize: Randomly assign the experimental units to treatment groups. Because it is difficult, if not impossible, to identify all factors in an experiment, randomly assigning experimental units to treatment groups mutes the effect of variation attributable to factors (explanatory variables) not controlled. Step 5 Conduct the Experiment. (a) Replication occurs when each treatment is applied to more than one experimental unit.
Using more than one experimental unit for each treatment ensures the effect of a treatment is not due to some characteristic of a single experimental unit. It is a good idea to assign an equal number of experimental units to each treatment. (b) Collect and process the data. Measure the value of the response variable for each replication. Then organize the results. The idea is that the value of the response variable for each treatment group is the same before the experiment because of randomization. Then any difference in the value of the response variable among the different treatment groups is a result of differences in the level of the treatment. Step 6 Test the Claim. This is the subject of inferential statistics. Inferential statistics is a process in which generalizations about a population are made on the basis of results obtained from a sample. Provide a statement regarding the level of confidence in the generalization. Methods of inferential statistics are presented in Chapters 9 through 1

Population

the entire group of individuals about which we want information

inferential statistics

uses methods that take a result from a sample, extend it to the population, and measure the reliability of the result

nominal level of measurement

A variable is at the nominal level of measurement if the values of the variable name, label, or categorize. In addition, the naming scheme does not allow for the values of the variable to be arranged in a ranked or specific order.

ordinal level of measurement

A variable is at the ordinal level of measurement if it has the properties of the nominal level of measurement; however, the naming scheme allows for the values of the variable to be arranged in a ranked or specific order.

