Introduction to Statistics, Exam 1 Ch.1,2,3
Discrete Data
A finite or countable number of values; can be counted (e.g. # of kids, # of classes taken). Used for single-value grouping.
Histogram
A graph representing the distribution of a quantitative variable. (displays cross-sectional data).
Parameter
A numerical measurement describing some characteristic of a population (Greek letters, e.g. μ = mean (average))
Statistic
A numerical measurement describing some characteristic of a sample (Roman letters, e.g. x̄ = mean (average))
Categorical variable
A variable that places an individual into one of several categories (signs of categorical variables appear in question form).
Splitting stems
A way to double the number of stems when all the leaves would otherwise fall on just a few stems.
AKA Qualitative Data
AKA Categorical Data
Stem
The part of an observation consisting of all but the final (rightmost) digit.
Designed Experiment
Applying some treatment to the subjects (experimental units), then observing its effects and taking measurements.
Variable(s)
Any characteristics of an individual (can take different values for different individuals). The type of data is determined by the type of variable.
Normal distributions
Bell-shaped, symmetric curves. The mean and standard deviation are good descriptions of such symmetric distributions without outliers.
Boxplots
Based on the five-number summary. Used to compare several distributions. The box spans the quartiles and shows the variability of the central half of the distribution; the median is marked within the box. Lines extend from the box to the extremes (smallest and largest observations), showing the full variability of the data.
Qualitative Data
Can be separated into different categories that are distinguished by some nonnumerical characteristic. Used for pie and bar charts.
Ratio (Quantitative)
Data that can be arranged in an order where the differences are meaningful and there is a natural starting point of zero. (Always a number) (e.g. cost of a book, distance traveled, weight, age)
Interval (Quantitative)
Data that can be arranged in an order where the difference is meaningful, but there is no natural starting point of zero. (Always a number) (e.g. body temperature, years as in dates)
Ordinal (Qualitative)
Data that can be arranged in an order, but the differences between values are not meaningful. (Some order) (e.g. grades: A, B, C, D, F; drink sizes: small, medium, large)
Nominal (Qualitative)
Data that consist of names, labels, or categories. Cannot be arranged in an order. (Category only) (e.g. colors: red, blue, green, yellow; survey responses: yes, no, undecided)
Median
Describes the center of a distribution. (mid-point of the values).
Quartiles
Describe the variability (spread) of a distribution by marking off its middle half around the median (e.g. whether the values are tightly packed).
2 Types of Statistics
Descriptive & Inferential
Quantitative Data can be broken down into...
Discrete & Continuous
Stemplot
Display of distributions for small data sets that presents more detailed info than histograms. Consists of stems and leaves.
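A stemplot can be built by hand or sketched in a few lines of code. The example below is a minimal sketch, assuming non-negative whole-number observations and made-up exam scores; the function name `stemplot` is hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot lines for non-negative integers.

    Stem = all but the final digit, leaf = the final digit.
    """
    groups = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)  # split off the rightmost digit
        groups[stem].append(leaf)
    lines = []
    # Include every stem between the smallest and largest, even empty ones.
    for stem in range(min(groups), max(groups) + 1):
        leaves = "".join(str(leaf) for leaf in groups.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

scores = [62, 67, 71, 74, 75, 78, 83, 85, 85, 98]  # made-up data
for line in stemplot(scores):
    print(line)
```

Each printed row is one stem followed by its leaves in increasing order, so the rows form a sideways picture of the distribution.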
Stratified Sampling
Divide the population into at least 2 subgroups that share characteristics then draw a sample from each subgroup.
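As a rough illustration of stratified sampling, the sketch below groups a made-up class roster by year and draws the same number of students at random from each group. The function name `stratified_sample`, the roster, and the equal-size-per-stratum rule are all assumptions made for the example.

```python
import random

def stratified_sample(population, key, per_stratum, seed=None):
    """Draw up to `per_stratum` items at random from each subgroup (stratum)."""
    rng = random.Random(seed)
    strata = {}
    for item in population:
        strata.setdefault(key(item), []).append(item)  # group by shared characteristic
    sample = []
    for members in strata.values():
        k = min(per_stratum, len(members))
        sample.extend(rng.sample(members, k))  # random draw within the stratum
    return sample

# Made-up roster: (name, class year)
students = [("Ana", "freshman"), ("Ben", "freshman"), ("Cal", "freshman"),
            ("Dee", "senior"), ("Eli", "senior"), ("Fay", "senior")]
picked = stratified_sample(students, key=lambda s: s[1], per_stratum=2, seed=1)
print(picked)
```

Every subgroup is guaranteed to appear in the sample, which is the point of stratifying.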
Cluster Sampling
Divide the population into sections (clusters), randomly select some clusters, choose all members from selected clusters.
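Cluster sampling can be sketched the same way: pick whole clusters at random, then keep everyone in them. The dorm names, members, and the function name `cluster_sample` below are made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose `n_clusters` clusters and keep ALL members of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)  # random choice of cluster names
    return {name: list(clusters[name]) for name in chosen}

# Made-up population already divided into clusters (dorms)
dorms = {
    "dorm A": ["Ana", "Ben"],
    "dorm B": ["Cal", "Dee", "Eli"],
    "dorm C": ["Fay"],
    "dorm D": ["Gus", "Hal"],
}
picked_dorms = cluster_sample(dorms, n_clusters=2, seed=3)
print(picked_dorms)
```

Unlike stratified sampling, only some groups are represented, but each selected group is included in full.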
Random Sample
Each individual has an equal chance at being selected (picture putting all the names in a hat and then picking a name)
Simple Random Sample
Every possible sample of the same size (n) has the same chance of being chosen (picture voting districts and choosing a district at random)
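In code, a simple random sample corresponds to drawing n items without replacement so that every group of n is equally likely. The sketch below uses Python's standard `random.sample`; the population of 20 numbered individuals and the seed are assumptions for the example.

```python
import random

rng = random.Random(42)          # fixed seed only so the example is repeatable
population = list(range(1, 21))  # made-up population: individuals numbered 1 to 20
srs = rng.sample(population, 5)  # n = 5, drawn without replacement
print(srs)
```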
Skewed distributions
Have a tail that extends to the left or right of the bulk of the data. The five-number summary is well suited to describing skewed distributions, but it does not always fully describe the shape of a particular distribution.
Both start with P
How to remember Parameter is for Population
Both start with S
How to remember Statistic is for a Sample
Inferential
Involves using data collected from a sample to draw conclusions about the population. A conclusion is made.
Descriptive
Involves collecting, organizing, summarizing, and presenting data with graphs, charts, and tables. The tables use averages, measures of variation, and percentages. No conclusions are ever made.
Outlier
An individual value that falls outside the overall pattern.
Symmetric distribution
Left and right sides of the histogram are approximately mirror images of each other.
Trend
Long-term upward or downward movement over time.
Mu (μ)
Mean of a density curve (population mean), which is the true mean of all the individuals in the population, including those outside your sample data. Also the balance point of the curve.
Continuous Data
Measured; containing no gaps, interruptions, or jumps (e.g. temperature, weight, time, distance)
Variance 's²' and standard deviation 's'
Measures variability about the mean as center.
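A short worked example may help here. The sketch below computes the sample variance and standard deviation by hand, assuming the usual n - 1 divisor for sample data (the card itself does not state the divisor), and the data values are made up.

```python
import math
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations
mean = statistics.mean(data)

# Variance: sum of squared deviations from the mean, divided by n - 1.
s2 = sum((x - mean) ** 2 for x in data) / (len(data) - 1)
# Standard deviation: square root of the variance, in the data's own units.
s = math.sqrt(s2)

print(round(s2, 3), round(s, 3))
```

The standard library functions `statistics.variance` and `statistics.stdev` use the same n - 1 divisor and can serve as a check on the hand calculation.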
4 Categories of Data
Nominal, Ordinal, Interval, Ratio
Sampling Bias
Non-Response, Voluntary Response, Convenience Sample
Quantitative Data
Numbers, representing counts or measures
Quantitative variable
Numerical values for which arithmetic operations such as adding and averaging make sense (recorded with a unit of measurement).
Observational Study
Observing and measuring specific characteristics WITHOUT attempting to modify the subject being studied. Can reveal association.
First quartile, Q1
One-fourth of the observations fall below it, and 3/4 above it.
Symbols Representing Measurements
Parameter & Statistic
Five-number summary
Provides a quick overall description of a distribution by looking at the median, the quartiles, and the smallest and largest individual observations.
Types of Data
Quantitative & Qualitative
Bar graphs
Represent each category as a bar.
Sigma (σ)
Standard deviation of a density curve (population standard deviation)
Systematic Sampling
Selecting every nth element
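A minimal sketch of systematic sampling follows. Choosing the starting position at random is an assumption added for the example (the card only says "every nth element"), and the roster and function name are made up.

```python
import random

def systematic_sample(items, step, seed=None):
    """Take every `step`-th item, beginning at a random position in the first `step` items."""
    rng = random.Random(seed)
    start = rng.randrange(step)  # random start: 0, 1, ..., step - 1
    return items[start::step]

roster = list(range(100))  # made-up population of 100 numbered individuals
every_tenth = systematic_sample(roster, step=10, seed=0)
print(every_tenth)
```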
Pie charts
Show the distribution of a categorical variable as a "pie" whose slices are sized by the counts or percents for the categories.
Mean
Sometimes denoted as x bar (x̄); describes the center of a distribution. (Arithmetic average of the observations.)
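To see how the mean differs from the median as a measure of center, the sketch below computes both for a small made-up data set, with and without one outlier. The salary figures are invented for illustration.

```python
import statistics

salaries = [30, 32, 35, 38, 40]  # made-up values, in thousands
with_outlier = salaries + [400]  # add one extreme observation

print(statistics.mean(salaries), statistics.median(salaries))
print(statistics.mean(with_outlier), statistics.median(with_outlier))
```

The single outlier pulls the mean far upward while the median barely moves, which is why the median is often preferred for skewed data.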
Variability
Sometimes referred to as 'spread'; how spread out the data are.
Sampling Strategies (Other Than Simple Random)
Systematic, Convenience, Stratified, Cluster
Distribution
Tells us what values a variable takes and how often it takes these values.
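For a small discrete data set, the distribution can be written out as a frequency table. The sketch below uses made-up counts of classes taken and Python's standard `collections.Counter`.

```python
from collections import Counter

classes_taken = [4, 5, 4, 3, 5, 4, 6, 4]  # made-up data: classes taken by 8 students
counts = Counter(classes_taken)           # value -> how often it occurs

for value in sorted(counts):
    share = counts[value] / len(classes_taken)
    print(value, counts[value], f"{share:.0%}")
```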
Leaf
The final digit of an observation (right-most).
Third quartile, Q3
Three-fourths of the observations fall below it, and 1/4 above it.
Convenience Sampling
Using results that are easy to get
Skewed to the left distribution
When the left side of the histogram extends much farther out than the right side.
Skewed to the right distribution
When the right side of the histogram extends much farther out than the left side.
Z-score
z = (x - μ) / σ; tells how many standard deviations x lies from the distribution mean (standardized value).
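The formula translates directly into code. In this sketch the function name `z_score` and the example values (a score of 130 on a scale with mean 100 and standard deviation 15) are made up for illustration.

```python
def z_score(x, mu, sigma):
    """Standardize x: how many standard deviations it lies from the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

z = z_score(130, mu=100, sigma=15)
print(z)  # 2.0
```

A positive result means x is above the mean and a negative result means it is below.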
Probability sample
A random device (such as tossing a coin or referring to a table of random numbers) is used to decide which members of the population will be in the sample, instead of leaving the decisions to humans.
Representative sample
A sample that reflects as closely as possible the relevant characteristics of the population under consideration.