Chapter 1-5 Test (Stat)

Variable

A variable holds information about the same characteristic for many cases.

Quantitative variable

A variable in which the numbers act as numerical values is called quantitative. Quantitative variables always have units.

Categorical variable

A variable that names categories (whether with words or numerals) is called categorical

Population

All the cases we wish we knew about

Data table

An arrangement of data in which each row represents a case and each column represents a variable

Experimental unit

An individual in a study for whom or for which data values are recorded. Human experimental units are usually called subjects or participants.

re-express (transform)

Applying a simple function (such as a logarithm or square root) to the data can make a skewed distribution more symmetric or equalize spread across groups.
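A minimal Python sketch of re-expression, using made-up, strongly right-skewed values (hypothetical, for illustration only). Taking base-10 logarithms pulls in the long right tail: before the transform the mean sits far above the median, and afterward the values are evenly spaced, so the mean and median agree.

```python
import math
import statistics

# Hypothetical right-skewed data: each value is ten times the previous one.
values = [1, 10, 100, 1_000, 10_000]

# Re-express (transform) each value with a base-10 logarithm.
logged = [math.log10(v) for v in values]

# In skewed data the mean is pulled toward the long tail, away from the median.
print(statistics.mean(values), statistics.median(values))  # 2222.2 100

# After the log transform the values are evenly spaced (0, 1, 2, 3, 4).
print(statistics.mean(logged), statistics.median(logged))
```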

Bar chart

Bar charts show a bar whose area represents the count of observations for each category of a categorical variable

Unimodal/Bimodal

Having one mode. This is a useful term for describing the shape of a histogram when it's generally mound-shaped. Distributions with two modes are called bimodal. Those with more than two are multimodal.

Units

A quantity or amount adopted as a standard of measurement

Simulation

A random re-enactment of data collection under one or more assumptions. If real data look very different from simulated data, then the assumptions are called into question
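A small Python sketch of a simulation, under the assumption that a coin is fair. The observed result (16 heads in 20 flips) is hypothetical. If very few simulated runs are as extreme as the observed one, the fair-coin assumption is called into question; for this case the exact binomial probability of 16 or more heads is about 0.006.

```python
import random

def simulate_heads(n_flips, n_trials, seed=0):
    """Repeat n_flips fair-coin flips n_trials times; return the heads count of each run."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    return [sum(rng.random() < 0.5 for _ in range(n_flips)) for _ in range(n_trials)]

# Hypothetical observed result: 16 heads in 20 flips.
observed = 16
sims = simulate_heads(n_flips=20, n_trials=10_000)

# Fraction of simulated runs at least as extreme as what was observed.
share = sum(h >= observed for h in sims) / len(sims)
print(f"{share:.4f}")
```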

Relative frequency table

A relative frequency table lists the categories of a categorical variable and gives the fraction or percent of observations in each category.

Context

The context ideally tells Who was measured, What was measured, How the data were collected, Where the data were collected, and Why the study was performed.

Distribution

The distribution of a variable gives:
- the possible values of the variable, and
- the relative frequency of each value.
The distribution of a quantitative variable slices up all the possible values of the variable into equal-width bins and gives the number of values falling into each bin.
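A minimal Python sketch of slicing quantitative values into equal-width bins, as a histogram does. The scores are hypothetical, and `bin_counts` is an illustrative helper, not a library function; values outside the chosen bins are simply skipped.

```python
def bin_counts(data, start, width, n_bins):
    """Count values in equal-width bins [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for x in data:
        k = int((x - start) // width)  # index of the bin containing x
        if 0 <= k < n_bins:            # ignore values outside the bins
            counts[k] += 1
    return counts

# Hypothetical exam scores, binned 50-59, 60-69, 70-79, 80-89, 90-99.
scores = [52, 61, 67, 70, 73, 75, 78, 84, 88, 95]
print(bin_counts(scores, start=50, width=10, n_bins=5))  # [1, 2, 4, 2, 1]
```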

Conditional distribution

The distribution of a variable restricting the Who to consider only a smaller group of individuals

Quartile

The lower quartile (Q1) is the value with a quarter of the data below it. The upper quartile (Q3) has three quarters of the data below it. The median and quartiles divide data into four parts with equal numbers of data values.

Categorical Data Condition

The methods in this chapter are appropriate for displaying and describing categorical data. Be careful not to use them with quantitative data.

Shape

To describe the shape of a distribution, look for:
- single vs. multiple modes
- symmetry vs. skewness
- outliers and gaps

Histogram (relative frequency histogram)

Uses adjacent bars to show the distribution of a quantitative variable. Each bar represents the frequency (or relative frequency) of values falling in each bin.

Independence

Variables are said to be independent if the conditional distribution of one variable is the same for each category of the other. If variables are not independent, we say there is an association.
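A minimal Python sketch of checking independence by comparing conditional distributions. The contingency table is hypothetical, and its counts were chosen so the two groups match exactly; `conditional_distribution` is an illustrative helper.

```python
import math

def conditional_distribution(table):
    """Convert {group: {category: count}} into {group: {category: proportion}}."""
    return {
        group: {cat: n / sum(counts.values()) for cat, n in counts.items()}
        for group, counts in table.items()
    }

# Hypothetical contingency table of counts.
table = {
    "Group A": {"Yes": 30, "No": 70},
    "Group B": {"Yes": 15, "No": 35},
}
cond = conditional_distribution(table)
print(cond)

# Same conditional distribution in every group: consistent with independence.
independent = all(
    math.isclose(cond["Group A"][c], cond["Group B"][c]) for c in cond["Group A"]
)
print(independent)  # True
```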

Simpson's paradox

When averages (or proportions) are computed within separate groups, they can point in one direction while the overall average, taken with the groups combined, points in the opposite direction.
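A small Python illustration of Simpson's paradox with hypothetical success counts for two methods, A and B, split by case difficulty. Method A has the higher success rate in both groups, yet B looks better overall, because A handled mostly hard cases and B mostly easy ones.

```python
# Hypothetical (successes, attempts) for two methods, split by case difficulty.
data = {
    "A": {"easy": (8, 10), "hard": (30, 90)},
    "B": {"easy": (70, 90), "hard": (3, 10)},
}

def group_rates(groups):
    """Success rate within each difficulty group."""
    return {g: s / a for g, (s, a) in groups.items()}

def overall_rate(groups):
    """Success rate after pooling all groups together."""
    successes = sum(s for s, _ in groups.values())
    attempts = sum(a for _, a in groups.values())
    return successes / attempts

for method, groups in data.items():
    within = {g: round(r, 3) for g, r in group_rates(groups).items()}
    print(method, within, "overall:", round(overall_rate(groups), 3))
# A {'easy': 0.8, 'hard': 0.333} overall: 0.38
# B {'easy': 0.778, 'hard': 0.3} overall: 0.73
```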

Timeplot

displays data that change over time to show long-term patterns and trends

Record

Information about an individual in a database

Outliers

Extreme values that don't appear to belong with the rest of the data. They may be unusual values that deserve further investigation, or they may just be mistakes.

5 Number Summary

min, Q1, median, Q3, max
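A minimal Python sketch of the 5-number summary. It assumes quartiles are found as the medians of the lower and upper halves of the sorted data (leaving out the overall median when the count is odd); statistics software uses several slightly different quartile rules, so results can differ a little. It needs at least two values.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves quartile rule."""
    xs = sorted(data)
    n = len(xs)
    lower = xs[: n // 2]          # values below the median position
    upper = xs[(n + 1) // 2 :]    # values above the median position
    return (xs[0], statistics.median(lower), statistics.median(xs),
            statistics.median(upper), xs[-1])

print(five_number_summary([7, 1, 9, 3, 5, 2, 8, 4, 6]))  # (1, 2.5, 5, 7.5, 9)
```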

Parameter

numerically valued attribute of a model

standard deviation

the square root of the variance

Variance

the sum of squared deviations from the mean, divided by the count minus one
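A short Python sketch that computes the variance straight from the definition above (divide by the count minus one) and takes its square root to get the standard deviation. The data are hypothetical; the results should match Python's `statistics.variance` and `statistics.stdev`, which also use n - 1.

```python
import math

def sample_variance(data):
    """Sum of squared deviations from the mean, divided by n - 1."""
    m = sum(data) / len(data)
    return sum((x - m) ** 2 for x in data) / (len(data) - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical values, mean 5
var = sample_variance(data)
sd = math.sqrt(var)               # standard deviation = square root of variance
print(round(var, 4), round(sd, 4))  # 4.5714 2.1381
```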

Interquartile Range (IQR)

the difference between the first and third quartiles

Range

the difference between the highest and lowest scores in a data set

Subject

A human experimental unit. Also called a participant.

Segmented Bar Chart

A bar chart in which each bar is divided into segments, stacked on top of one another in a vertical chart or placed end to end in a horizontal chart. A segmented bar chart usually shows relative frequencies so that the distribution of the categorical variable can be more easily compared between different groups.

Case

A case is an individual about whom or which we have data

Identifier variable

A categorical variable that assigns a unique value for each case, used to name or identify it

Contingency Table

A contingency table displays counts and, sometimes, percentages of individuals falling into named categories on two or more variables. The table categorizes the individuals on all variables at once to reveal possible patterns in one variable that may be contingent on the category of the other.

Nearly Normal Condition

A distribution is nearly Normal if it is unimodal and symmetric. We can check by looking at a histogram or a Normal probability plot.

Symmetric

A distribution is symmetric if the two halves on either side of the center look approximately like mirror images of each other.

Frequency table

A frequency table lists the categories of a categorical variable and gives the number of observations of each category
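A minimal Python sketch that builds a frequency table and the matching relative frequency table from hypothetical categorical data, using `collections.Counter` from the standard library.

```python
from collections import Counter

# Hypothetical categorical data: eye color of 8 respondents.
colors = ["brown", "blue", "brown", "green", "brown", "blue", "hazel", "brown"]

freq = Counter(colors)                               # frequency table: counts
rel = {c: n / len(colors) for c, n in freq.items()}  # relative frequency table

for c, n in freq.most_common():
    print(f"{c:6} {n:2} {rel[c]:6.1%}")
```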

Marginal distribution

In a contingency table, the distribution of either variable alone is called the marginal distribution. The counts or percentages are the totals found in the margins (last row or column) of the table.
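A minimal Python sketch of marginal distributions: the row and column totals of a hypothetical two-way table, plus the marginal distribution of the column variable as proportions of the grand total.

```python
# Hypothetical contingency table: rows = class year, columns = preferred format.
table = {
    "First-year": {"Online": 20, "In person": 30},
    "Senior":     {"Online": 35, "In person": 15},
}

# Row margins: total for each row category.
row_totals = {r: sum(cols.values()) for r, cols in table.items()}

# Column margins: total for each column category, summed down the rows.
col_totals = {}
for cols in table.values():
    for c, n in cols.items():
        col_totals[c] = col_totals.get(c, 0) + n

grand_total = sum(row_totals.values())

print(row_totals)  # {'First-year': 50, 'Senior': 50}
print(col_totals)  # {'Online': 55, 'In person': 45}
# Marginal distribution of format as proportions of the grand total.
print({c: n / grand_total for c, n in col_totals.items()})  # {'Online': 0.55, 'In person': 0.45}
```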

Area principle

In a statistical display, each data value should be represented by the same amount of area

Rescale

Multiplying each data value by a constant multiplies both the measures of position (mean, median, and quartiles) and the measures of spread (standard deviation and IQR) by that constant.
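A short Python check of rescaling, using hypothetical heights converted from inches to centimeters (multiply by 2.54). Each summary of the rescaled data equals 2.54 times the same summary of the original data, up to floating-point rounding.

```python
import statistics

heights_in = [60, 64, 66, 70, 75]            # hypothetical heights, inches
heights_cm = [2.54 * h for h in heights_in]  # rescale every value by 2.54

summaries = {"mean": statistics.mean, "median": statistics.median, "SD": statistics.stdev}
for name, f in summaries.items():
    # Left: summary of the rescaled data. Right: 2.54 times the original summary.
    print(name, round(f(heights_cm), 3), round(2.54 * f(heights_in), 3))
```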

Pie chart

Pie charts show how a "whole" divides into categories by showing a wedge of a circle whose area corresponds to the proportion in each category.

Respondent

Someone who answers, or responds to, a survey

Data

Systematically recorded information, whether numbers or labels, together with its context

normal percentile

The Normal percentile corresponding to a z-score gives the percentage of values in a standard Normal distribution found at that z-score or below.
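A one-step Python sketch of a Normal percentile, assuming Python 3.8 or later for `statistics.NormalDist`. The cumulative distribution function of the standard Normal model gives the fraction of values at or below a z-score.

```python
from statistics import NormalDist

z = 1.0
pct = NormalDist().cdf(z)  # standard Normal model: mean 0, SD 1
print(f"{pct:.2%}")        # 84.13%
```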

Sample

The cases we actually examine in seeking to understand the larger population

Center

The place in the distribution of a variable that you'd point to if you wanted to attempt the impossible by summarizing the entire distribution with a single number. Measures of center include the mean and median.

Tails

The tails of a distribution are the parts that typically trail off on either side. Distributions can be characterized as having long tails (if they straggle off for some distance) or short tails (if they don't)

Boxplot

A boxplot displays the 5-number summary as a central box, with whiskers that extend to the most extreme data values that are not outliers; any outliers are plotted individually.
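A minimal Python sketch of the pieces of a boxplot. It assumes the common convention of fences at 1.5 IQR beyond the quartiles to flag outliers, and the median-of-halves quartile rule (software varies); the data are hypothetical.

```python
import statistics

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves (median excluded when n is odd)."""
    xs = sorted(data)
    n = len(xs)
    return statistics.median(xs[: n // 2]), statistics.median(xs[(n + 1) // 2 :])

def boxplot_parts(data):
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if lo_fence <= x <= hi_fence]
    outliers = sorted(x for x in data if x < lo_fence or x > hi_fence)
    # Whiskers reach the most extreme values that are not outliers.
    return {"whiskers": (min(inside), max(inside)),
            "box": (q1, statistics.median(data), q3),
            "outliers": outliers}

print(boxplot_parts([3, 5, 6, 7, 7, 8, 9, 10, 25]))
# {'whiskers': (3, 10), 'box': (5.5, 7, 9.5), 'outliers': [25]}
```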

normal probability plot

a display to help assess whether a distribution of data is approximately normal; if it is nearly straight, the data satisfy the nearly normal condition

Skewed

A distribution is skewed if it's not symmetric and one tail stretches out farther than the other.

Uniform

a distribution that's roughly flat

Dot plot

a dot for each case against a single axis

Mode

A hump or local high point in the shape of the distribution of a variable. The apparent location of modes can change as the scale of a histogram is changed.

Spread

A numerical summary of how tightly the values are clustered around the "center." Measures of spread include the IQR and standard deviation.

Gap

a region of the distribution where there are no values

Shifting

adding a constant to each data value adds the same constant to the mean, the median, and the quartiles, but does not change the standard deviation or IQR
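A short Python check of shifting, using hypothetical temperatures with 10 added to every value. The mean and median move up by 10, while the standard deviation stays the same.

```python
import statistics

temps = [12, 15, 15, 18, 20]       # hypothetical temperatures
shifted = [t + 10 for t in temps]  # add the same constant to every value

print(statistics.mean(temps), statistics.mean(shifted))                      # 16 26
print(statistics.median(temps), statistics.median(shifted))                  # 15 25
print(round(statistics.stdev(temps), 4), round(statistics.stdev(shifted), 4))  # 3.0822 3.0822
```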

Comparing Distributions

compare shape, center, spread

Mean

found by summing all the data values and dividing by the count

68-95-99.7 rule

in a normal model, about 68% of values fall within 1 standard deviation of the mean, about 95% fall within 2 standard deviations of the mean, and about 99.7% fall within 3 standard deviations of the mean
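A minimal Python check of the 68-95-99.7 rule against the standard Normal model, assuming Python 3.8 or later for `statistics.NormalDist`. The rule's percentages are rounded versions of these values.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1
for k in (1, 2, 3):
    share = std_normal.cdf(k) - std_normal.cdf(-k)  # fraction within k SDs of the mean
    print(k, f"{share:.1%}")
# 1 68.3%
# 2 95.4%
# 3 99.7%
```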

Stem and leaf display

shows quantitative data values in a way that sketches the distribution of the data

Z-score

Tells how many standard deviations a value is from the mean and in which direction. A set of z-scores has a mean of zero and a standard deviation of one.
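A minimal Python sketch of standardizing hypothetical data into z-scores (subtract the mean, divide by the standard deviation). The resulting z-scores have mean 0 and standard deviation 1, up to floating-point rounding; `z_scores` is an illustrative helper.

```python
import statistics

def z_scores(data):
    """Standardize: subtract the mean, then divide by the standard deviation."""
    m = statistics.mean(data)
    s = statistics.stdev(data)
    return [(x - m) / s for x in data]

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values
z = z_scores(data)
print([round(v, 2) for v in z])
```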

Quantitative Data Condition

the data are values of a quantitative variable whose units are known

Percentile

the ith percentile is the number that falls above i% of the data

Median

the middle value with half of the data above and half below it
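A short Python comparison of the mean and median on hypothetical data with one extreme value. The single large value drags the mean far above the typical values, while the median stays in the middle of the data.

```python
import statistics

incomes = [30, 32, 35, 38, 400]  # hypothetical incomes in $1000s, one extreme value
print(statistics.mean(incomes), statistics.median(incomes))  # 107 35
```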

Normal Model

useful family of models for unimodal, symmetric distributions

Statistic

value calculated from data to summarize aspects of the data

Standardized value

value found by subtracting the mean and dividing by the standard deviation

normality assumption

we must have a reason to believe a variable's distribution is normal before applying a normal model

