Statistics Final Exam


A graph of the information in a frequency distribution for a numerical data set. A rectangle is drawn above each possible value (discrete data) or class interval.

Histogram

4. Data summarization and preliminary analysis.

After the data are collected, the next step is usually a preliminary analysis that includes summarizing the data graphically and numerically. This initial analysis provides insight into important characteristics of the data and can provide guidance in selecting appropriate methods for further analysis.

extreme and mild outlier

An observation is an outlier if it is more than 1.5(iqr) from the nearest quartile. An outlier is extreme if it is more than 3(iqr) from the nearest quartile, and mild otherwise.
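The rule can be sketched in Python as follows. This is a minimal illustration, assuming the usual 1.5 × iqr cutoff for counting as an outlier at all; the function name `classify_outlier` and the quartile values are hypothetical, chosen only for the example.

```python
def classify_outlier(x, q1, q3):
    """Label x as 'not an outlier', 'mild outlier', or 'extreme outlier'."""
    iqr = q3 - q1
    # Distance from the nearest quartile, measured outward from the box.
    if x < q1:
        distance = q1 - x
    elif x > q3:
        distance = x - q3
    else:
        return "not an outlier"  # between the quartiles
    if distance > 3 * iqr:
        return "extreme outlier"
    if distance > 1.5 * iqr:  # assumed conventional outlier cutoff
        return "mild outlier"
    return "not an outlier"


# Hypothetical quartiles: q1 = 10, q3 = 20, so iqr = 10.
labels = [classify_outlier(x, 10, 20) for x in (15, 30, 40, 55)]
print(labels)
```

With these made-up quartiles, 40 lies 20 units above the upper quartile (between 1.5 and 3 iqr) and 55 lies 35 units above it (more than 3 iqr).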

States that when the sample size n is sufficiently large, the sampling distribution of the sample mean x̄ will be approximately normal.

Central Limit Theorem
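A small simulation can make the theorem concrete. The sketch below is only an illustration under assumed settings: the population is exponential with mean 1 and standard deviation 1 (strongly skewed), the sample size is 50, and 5000 samples are drawn with a fixed seed. None of these choices come from the card itself.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

n = 50              # sample size, assumed "sufficiently large" here
num_samples = 5000  # number of repeated samples

# Draw repeated samples from a skewed population and record each sample mean.
sample_means = [
    statistics.fmean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)  # should be near 1
sd_of_means = statistics.stdev(sample_means)    # should be near 1 / sqrt(n)

# For a normal distribution, about 95% of values lie within 1.96 sd of the mean.
theoretical_sd = 1.0 / n ** 0.5
share_within = sum(
    abs(m - 1.0) <= 1.96 * theoretical_sd for m in sample_means
) / num_samples

print(mean_of_means, sd_of_means, share_within)
```

Even though individual observations are far from normal, the share of sample means within 1.96 standard deviations of the population mean should come out close to 0.95.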

Involves dividing the population of interest into nonoverlapping subgroups

Cluster sampling

Cluster Sampling

Cluster sampling involves dividing the population of interest into nonoverlapping subgroups, called clusters. Clusters are then selected at random, and then all individuals in the selected clusters are included in the sample.
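As a rough sketch of the procedure just described, the Python below selects whole clusters at random and then keeps everyone in them. The classroom data and the function name `cluster_sample` are hypothetical, invented for illustration.

```python
import random


def cluster_sample(clusters, num_clusters, rng):
    """Pick whole clusters at random, then keep every individual in them."""
    chosen = rng.sample(sorted(clusters), num_clusters)
    sample = [person for name in chosen for person in clusters[name]]
    return chosen, sample


# Hypothetical population: students grouped into nonoverlapping classrooms.
classrooms = {
    "room_a": ["Ana", "Ben", "Cy"],
    "room_b": ["Dee", "Eli"],
    "room_c": ["Fay", "Gus", "Hal"],
    "room_d": ["Ivy", "Jon"],
}

rng = random.Random(1)  # seeded generator so the run is repeatable
chosen, sample = cluster_sample(classrooms, 2, rng)
print(chosen, sample)
```

Note that randomness enters only at the cluster level: once a cluster is chosen, all of its members are included.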

Selection bias

(sometimes also called undercoverage) is introduced when the way the sample is selected systematically excludes some part of the population of interest.

Dot Plot

A dot plot is a graph of numerical data in which each observation is represented by a dot on or above a horizontal measurement scale.

Bar Chart

A graph of a frequency distribution for a categorical data set. Each category is represented by a bar, and the area of the bar is proportional to the corresponding frequency or relative frequency.

hypothesis

A hypothesis is a claim about the value of a population characteristic.

Experiment

A study in which the investigator observes how a response variable behaves when one or more explanatory variables, also called factors, are manipulated. The usual goal of an experiment is to determine the effect of the manipulated explanatory variables (factors) on the response variable. In a well-designed experiment, the composition of the groups that will be exposed to different experimental conditions is determined by random assignment.

population characteristic

A population characteristic is estimated by three different statistics of sampling distributions.

statistic

A quantity computed from values in a sample is called a statistic.

statement

A statement of the problem consists of describing the population characteristic about which hypotheses are to be tested, stating the null hypothesis, stating the alternative hypothesis, and selecting the significance level for the test.

Observational study

A study in which the investigator observes characteristics of a sample selected from one or more existing populations. The goal of an observational study is usually to draw conclusions about the corresponding population or about differences between two or more populations. In a well-designed observational study, the sample is selected in a way that is designed to produce a sample that is representative of the population.

A table that displays frequencies, and sometimes relative frequencies, for each of the possible values of a categorical variable.

Frequency distribution for categorical data

Numerical, graphical, and tabular methods for organizing and summarizing data.

Descriptive statistics

A graph of a frequency distribution for a categorical data set. Each category is represented by a bar, and the area of the bar is proportional to the corresponding frequency or relative frequency.

Bar chart

A histogram that has a double peak.

Bimodal

The effects of some extraneous variables can be filtered out by a process known as

Blocking

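If the sampling distribution of a statistic is (at least approximately) normal, the bound on error of estimation associated with a 95% confidence level is 1.96 times the standard error of the statistic.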

Bound on error
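For an approximately normal sampling distribution and a 95% confidence level, the bound on error of estimation is commonly computed as 1.96 times the standard error. The sketch below applies that to a sample proportion. The standard error formula sqrt(p̂(1 − p̂)/n) and the inputs p̂ = 0.5 and n = 100 are assumptions for illustration, not values from the card.

```python
import math


def proportion_standard_error(p_hat, n):
    """Estimated standard error of a sample proportion (assumed formula)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


def bound_on_error(standard_error, z=1.96):
    """Bound on error of estimation; z = 1.96 corresponds to 95% confidence."""
    return z * standard_error


# Hypothetical sample: 50% "yes" among n = 100 respondents.
se = proportion_standard_error(0.5, 100)
b = bound_on_error(se)
print(se, b)
```

Here the standard error is 0.05, so the estimate is expected to be within about 0.098 of the true proportion for roughly 95% of samples.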

A picture that conveys information about the most important features of a numerical data set: center, spread, extent of skewness, and presence of outliers.

Boxplot

Individual observations are categorical responses (nonnumerical).

Categorical data

Two or more bar charts that use the same set of horizontal and vertical axes.

Comparative bar chart

An interval computed from sample data that provides a range of plausible values for a population characteristic.

Confidence interval

A number that provides information on how much "confidence" we can have in the method used to construct a confidence interval estimate.

Confidence level

Possible values form an entire interval along the number line.

Continuous numerical data

A graph of a cumulative relative frequency distribution.

Cumulative relative frequency plot

A summary of a data set that includes the minimum, lower quartile, median, upper quartile, and maximum.

Five-number summary
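A five-number summary can be computed with a few lines of Python. Quartile conventions differ between textbooks and software; this sketch assumes the quartiles are the medians of the lower and upper halves of the sorted data, with the overall median left out of both halves when n is odd. The function name and the data are hypothetical.

```python
import statistics


def five_number_summary(data):
    """Return (minimum, lower quartile, median, upper quartile, maximum)."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    # Skip the middle value when n is odd so it belongs to neither half.
    upper = xs[half + 1:] if n % 2 else xs[half:]
    return (
        xs[0],
        statistics.median(lower),
        statistics.median(xs),
        statistics.median(upper),
        xs[-1],
    )


summary = five_number_summary([7, 1, 3, 9, 5, 11, 13])
print(summary)
```

These five values are exactly what a boxplot draws: the box spans the two quartiles, with a line at the median.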

A table that displays frequencies, and sometimes relative and cumulative relative frequencies, for categories (categorical data), possible values (discrete numerical data), or class intervals (continuous data).

Frequency distribution

A table that displays frequencies, and sometimes relative frequencies, for each of the possible values of a categorical variable.

Frequency distribution

Possible values are isolated points along the number line.

Discrete numerical data

A graph of numerical data in which each observation is represented by a dot on or above a horizontal measurement scale.

Dotplot

1. Understanding the nature of the problem.

Effective data analysis requires an understanding of the research problem. We must know the goal of the research and what questions we hope to answer. It is important to have a clear direction before gathering data to ensure that we will be able to answer the questions of interest using the data collected.

The independent samples t test should be used in which of the following:

Evaluating differences in perceived social support between males and females.

A study in which the investigator observes how a response variable behaves when one or more explanatory variables, also called factors, are manipulated.

Experiment

Dotplots are often used to compare groups.

False

A study in which the investigator observes characteristics of a sample selected from one or more existing populations.

Observational Study

Stratified Sampling

In stratified random sampling, separate simple random samples are independently selected from each subgroup.
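A minimal sketch of stratified random sampling, assuming the strata are already known: one simple random sample is drawn independently within each subgroup. The strata, the per-stratum sample sizes, and the function name `stratified_sample` are hypothetical.

```python
import random


def stratified_sample(strata, sizes, rng):
    """Draw a separate simple random sample from each stratum."""
    return {
        name: rng.sample(members, sizes[name])
        for name, members in strata.items()
    }


# Hypothetical population: ID numbers split into two nonoverlapping strata.
strata = {
    "freshmen": list(range(0, 100)),
    "seniors": list(range(100, 160)),
}
sizes = {"freshmen": 5, "seniors": 3}

rng = random.Random(2)  # seeded generator so the run is repeatable
sample = stratified_sample(strata, sizes, rng)
print(sample)
```

Unlike cluster sampling, every stratum contributes to the sample, and only some individuals within each stratum are selected.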

Methods for generalizing from a sample to a population.

Inferential statistics

A skewed histogram in which the lower tail of the histogram stretches out much farther than the upper tail

Negatively skewed

Occurs when responses are not obtained from all individuals selected for inclusion in the sample.

Nonresponse bias

Individual observations are numerical (quantitative) in nature.

Numerical data

A graph of a frequency distribution for a categorical data set. Each category is represented by a slice of the pie, and the area of the slice is proportional to the corresponding frequency or relative frequency.

Pie chart

Something that is identical (in appearance, taste, feel, etc.) to the treatment received by the treatment group, except that it contains no active ingredients.

Placebo

A single number, based on sample data, that represents a plausible value of a population characteristic.

Point estimate

The entire collection of individuals or measurements about which information is desired.

Population

A department store reports that 84% of all customers who use the store's credit plan pay their bills on time.

Population characteristic

The Department of Motor Vehicles reports that 22% of all vehicles registered in a particular state are imports.

Population characteristic

A skewed histogram in which the upper tail of the histogram stretches out much farther than the lower tail

Positively skewed

Is the design strategy of making multiple observations for each experimental condition.

Replication

A part of the population selected for study.

Sample

The middle value in the ordered list of sample observations. (For n even, the median is the average of the two middle values.) It is very insensitive to outliers.

Sample median
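The median's insensitivity to outliers is easy to check numerically. In the sketch below, the data values are made up for illustration. Replacing the largest value with an extreme one moves the mean a great deal but leaves the median unchanged.

```python
import statistics

data = [2, 4, 6, 8, 10]
with_outlier = [2, 4, 6, 8, 100]  # largest value replaced by an outlier

median_before = statistics.median(data)
median_after = statistics.median(with_outlier)
mean_before = statistics.mean(data)
mean_after = statistics.mean(with_outlier)

# For an even number of observations, the median averages the two middle values.
even_median = statistics.median([1, 3, 5, 7])

print(median_before, median_after, mean_before, mean_after, even_median)
```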

Describes the long-run behavior of the statistic.

Sampling distribution

A quantity computed from values in a sample.

Statistic

The observed value of a statistic depends on the particular sample selected from the population and it will vary from sample to sample.

Sampling variability

A graph of bivariate numerical data in which each observation (x, y) is represented as a point with respect to a horizontal x-axis and a vertical y-axis.

Scatterplot

A graph of a frequency distribution for a categorical data set. Each category is represented by a segment of the bar, and the area of the segment is proportional to the corresponding frequency or relative frequency.

Segmented bar graph

A unimodal histogram that is not symmetric

Skewed

The estimated standard deviation of a statistic.

Standard Error

A consumer group, after testing 100 batteries of a certain brand, reported an average life of 63 hours of use.

Statistic

A hospital reports that based on the 10 most recent cases, the mean length of stay for surgical patients is 6.4 days.

Statistic

A sample of 100 students at a large university had a mean age of 24.1 years.

Statistic

A method of organizing numerical data in which the stem values (leading digit(s) of the observations) are listed in a column, and the leaf (trailing digit(s)) for each observation is then listed beside the corresponding stem.

Stem-and-leaf display
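One way to build such a display in Python is sketched below. It assumes nonnegative whole numbers, with the tens part as the stem and the ones digit as the leaf. The data values and the function name are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Return the lines of a stem-and-leaf display (stem = tens, leaf = ones)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    # Include empty stems so gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}".rstrip())
    return lines


display = stem_and_leaf([63, 58, 71, 65, 90, 67, 58])
print("\n".join(display))
```

Keeping the empty stem for the 80s shows the gap between 71 and 90, which is part of what the display is meant to reveal.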

5. Formal data analysis

The data analysis step requires the researcher to select and apply statistical methods. Much of this textbook is devoted to methods that can be used to carry out this step.

2. Deciding what to measure and how to measure it.

The next step in the process is deciding what information is needed to answer the questions of interest.

Reliability of a study

Can be assessed, for example, by having a sample of data collectors review a set of standardized medical charts, then reviewing the range of their answers and checking those answers for consistency.

3. Data collection

The researcher must first decide whether an existing data source is adequate or whether new data must be collected. If a decision is made to use existing data, it is important to understand how the data were collected and for what purpose, so that any resulting limitations are also fully understood. If new data are to be collected, a careful plan must be developed, because the type of analysis that is appropriate and the subsequent conclusions that can be drawn depend on how the data are collected.

A graphical display of numerical data collected over time.

Time series plot

A modified boxplot represents mild outliers by solid circles and extreme outliers by open circles, and the whiskers extend on each end to the most extreme observations that are not outliers.

True

A primary goal of statistical studies is to collect data that can then be used to make informed decisions.

True

All probability distributions can be classified as discrete probability distributions or as continuous probability distributions, depending on whether they define probabilities associated with discrete variables or continuous variables.

True

Both the type of analysis that is appropriate and the conclusions that can be drawn depend on how the data are collected.

True

Generally speaking (all other things being held constant, such as the sample error, alpha, etc.), the larger the t value, the more likely the test will be statistically significant.

True

One potential drawback to the mean as a measure of center for a data set is that its value can be greatly affected by the presence of even a single outlier (an unusually large or small observation) in the data set.

True

The probability of an event refers to the likelihood that the event will occur.

True

There is no way to tell just by looking at a sample whether it is representative of the population from which it was drawn. Our only assurance comes from the method used to select the sample.

True

Two events are mutually exclusive or disjoint if they cannot occur at the same time.

True

A statistic that has a sampling distribution with a mean equal to the value of the population characteristic to be estimated.

Unbiased statistic

A histogram that has a single peak.

Unimodal

Each observation consists of one (univariate), two (bivariate), or two or more (multivariate) responses or values.

Univariate, bivariate, and multivariate data

external validity

extent to which we can generalize findings to real-world settings

Nonresponse bias

occurs when responses are not obtained from all individuals selected for inclusion in the sample. As with selection bias, nonresponse bias can distort results if those who respond differ in important ways from those who do not respond. Although some level of nonresponse is unavoidable in most surveys, the biasing effect on the resulting sample is lowest when the response rate is high.

Measurement or response bias

occurs when the method of observation tends to produce values that systematically differ from the true value in some way. This might happen if an improperly calibrated scale is used to weigh items or if questions on a survey are worded in a way that tends to influence the response.

Each person in a random sample of 20 students at a particular university was asked whether he or she is registered to vote. The responses (R = registered, N = not registered) are given here: R R N R N N R R R N R R R R R N R R R N. Use these data to estimate p, the proportion of all students at the university who are registered to vote.

p̂ = 14/20 = 7/10 = 0.70

Consumption of fast food is a topic of interest to researchers in the field of nutrition. The article "Effects of Fast-Food Consumption on Energy Intake and Diet Quality Among Children" (Pediatrics [2004]: 112-118) reported that 1720 of those in a random sample of 6212 U.S. children indicated that on a typical day, they ate fast food. Estimate p, the proportion of children in the United States who eat fast food on a typical day.

p̂ = 1720/6212 = 430/1553 ≈ 0.277
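Both proportion estimates can be checked with a short Python sketch. It uses the voter responses and the fast-food counts exactly as given in the two questions. The variable names are illustrative.

```python
from fractions import Fraction

# Voter registration: count the "R" responses among the 20 students.
responses = "R R N R N N R R R N R R R R R N R R R N".split()
registered = responses.count("R")
p_hat_voters = Fraction(registered, len(responses))

# Fast food: 1720 of the 6212 sampled children.
p_hat_fast_food = Fraction(1720, 6212)

print(p_hat_voters, float(p_hat_voters))
print(p_hat_fast_food, round(float(p_hat_fast_food), 3))
```

`Fraction` reduces each ratio to lowest terms automatically, which matches the simplified answers on the cards.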

The value such that r % of the observations in the data set fall at or below that value.

rth percentile
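The definition can be turned into a small Python sketch. Percentile conventions vary, and this version assumes one simple rule: take the smallest data value with at least r% of the observations at or below it. The data set and function names are hypothetical.

```python
import math


def percent_at_or_below(data, value):
    """Percentage of observations that fall at or below value."""
    return 100 * sum(x <= value for x in data) / len(data)


def percentile(data, r):
    """Smallest observation with at least r% of the data at or below it."""
    xs = sorted(data)
    k = math.ceil(r * len(xs) / 100)  # observations needed at or below
    return xs[max(k, 1) - 1]


data = [12, 15, 17, 20, 22, 25, 28, 30, 35, 40]
p90 = percentile(data, 90)
p50 = percentile(data, 50)
print(p90, percent_at_or_below(data, p90))
print(p50, percent_at_or_below(data, p50))
```

With ten observations, nine of them are at or below 35, so 35 serves as the 90th percentile under this rule.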

internal validity

the degree to which changes in the dependent variable are due to the manipulation of the independent variable

Validity of a study

the degree to which the inference drawn from a study is warranted when account is taken of the study methods, the representativeness of the study sample, and the nature of the population from which it is drawn

6. Interpretation of results

Several questions should be addressed in this final step. Some examples are: What can we learn from the data? What conclusions can be drawn from the analysis? How can our results guide future research? The interpretation step often leads to the formulation of new research questions.

