Econ 201
Center
A representative or average value that indicates where the middle of the data set is located.
Blocking
A block is a group of subjects that are similar in some way that is expected to affect the response to the treatments. -In a block design, the random assignment of units to treatments is carried out separately within each block.
Variation
A measure of the amount that the values vary among themselves.
Data
Are collections of observations, such as measurements, genders, or survey responses.
Which of the following is not a measure of center
C. Census
Nominal
Categories only. Ex: names, labels -Yes/No/Undecided -Political party -Social Security number -Jersey numbers on a football team (substitute for names)
Ordinal
Categories with SOME ORDER. Examples: -Course grades: can be arranged in order, but we can't determine the differences between them. -Ranks: US college reports rank colleges; ranks 1 and 2 determine an ordering.
Multistage Sampling
Collect data by using some combination of the basic sampling methods.
Dotplot
Consists of a graph in which each data value is plotted as a point (or dot) along a horizontal scale of values. Dots representing equal values are stacked
Qualitative (categorical) data (Quality)
Consists of names or labels that are not numbers representing counts or measurements Ex: Political parties: Democrat/Republican -Numbers on Jerseys representing names
Quantitative Data (Quantity)
Consists of numbers representing counts or measurements. *Important to use appropriate units of measurement. Ex: ages (in years) of survey respondents, dollars, hours, feet, meters
A __________ is considered ________ when it lies far from the mean.
Data Value, Unusual
Ratio Level of Measurement:
Differences are meaningful and there is a natural zero starting point. Ex: car lengths, class times
Cluster Sampling
Divide the population into sections (or clusters). Then randomly select some of those clusters. Now choose all members from selected clusters.
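A minimal Python sketch of the cluster sampling steps described above, assuming the population is already grouped into named clusters. The classroom data and the helper name `cluster_sample` are hypothetical, made up for illustration.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly pick n_clusters clusters, then keep every member of each."""
    rng = random.Random(seed)
    # Step 1: randomly select some of the clusters.
    chosen = rng.sample(sorted(clusters), n_clusters)
    # Step 2: take ALL members from the selected clusters.
    sample = [member for name in chosen for member in clusters[name]]
    return chosen, sample

# Hypothetical population: four classrooms (clusters) of students.
classrooms = {
    "room_a": ["a1", "a2", "a3"],
    "room_b": ["b1", "b2"],
    "room_c": ["c1", "c2", "c3", "c4"],
    "room_d": ["d1", "d2"],
}

chosen, sample = cluster_sample(classrooms, n_clusters=2, seed=1)
print(chosen, sample)
```

Note that the randomness applies only to which clusters are picked; once a cluster is picked, nobody in it is left out.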
Relative Frequency Histogram
Has the same shape and horizontal scale as a histogram, but vertical scale uses relative frequencies
Multiple bar graphs
Has two or more sets of bars and is used to compare two or more data sets.
Which of the following is always true?
In a symmetric and bell-shaped distribution, the mean, median, and mode are the same. (A distribution of data is symmetric if the left half of its histogram is roughly a mirror image of its right half. In this case the mean, median, and mode are the same.)
Continuous (Numerical Data)
Infinitely many possible quantitative values where the collection of values is not countable.
Pie Chart
Is a graph that depicts categorical data as slices of a circle, in which the size of each slice is proportional to the frequency count for the category.
Statistic
Is a measurement describing some characteristic of a sample
Scatterplot (Scatter diagram)
Is a plot of (x,y) quantitative data with a horizontal x-axis and a vertical y-axis. -The horizontal axis is used for the first (x) variable, and the vertical axis is used for the second (y) variable.
Which of the following is NOT a value in the 5 number summary
Mean
Standard Deviation
Measure of how much data values deviate from the mean. -Usually positive (zero only when all data values are equal; never negative)
The measure of center that is the value that occurs with the greatest frequency is the __________
Mode
In modified box plots, a data value is a(n) __________ if it is above Q3 + (1.5)(IQR) or below Q1 - (1.5)(IQR)
Outlier
O-Outliers
Outliers are any unusual parts of the data set which do not fit the pattern of the data set. Outliers can be found by visually looking at the data or graph, but when using this method to describe the data you would have to say, "the outliers appear to be...". Outliers can also be found arithmetically. (First Quartile) - 1.5(IQR): an outlier would include anything less than this value ~AND~ (Third Quartile) + 1.5(IQR): an outlier would include anything greater than this value
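The arithmetic check for outliers can be sketched in Python as follows. This is only an illustration: the data values are made up, the helper name `outlier_fences` is hypothetical, and the quartiles come from the standard library's `statistics.quantiles` with `method="inclusive"`. Textbooks and calculators use slightly different quartile conventions, so the fences from a course's method may differ a little.

```python
from statistics import quantiles

def outlier_fences(data):
    """Return (lower, upper) fences: Q1 - 1.5*IQR and Q3 + 1.5*IQR."""
    # quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

# Hypothetical data set with one suspiciously large value.
data = [2, 4, 5, 7, 8, 9, 10, 12, 40]

lower, upper = outlier_fences(data)
# A value is flagged if it falls below the lower fence or above the upper fence.
outliers = [x for x in data if x < lower or x > upper]
print(lower, upper, outliers)
```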
Data skewed to the right
Positively skewed; the data have a longer right tail.
Which measure of variation is very sensitive to extreme values
Range --> extreme values will affect the value of the range
Systematic Sampling
Select some starting point, then select every kth element in the population.
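A short Python sketch of systematic sampling. The card only says to "select some starting point"; choosing that start at random among the first k elements is an assumption made here, and the helper name `systematic_sample` and the ID list are hypothetical.

```python
import random

def systematic_sample(population, k, seed=None):
    """Pick a start among the first k elements, then take every kth element."""
    rng = random.Random(seed)
    start = rng.randrange(k)      # assumption: the starting point is random
    return population[start::k]   # every kth element from the start onward

# Hypothetical population: ID numbers 1 through 100.
population = list(range(1, 101))

sample = systematic_sample(population, k=10, seed=3)
print(sample)
```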
SOCS
Shape, Outliers, Center, Spread
Stratified Sampling
Subdivide the population into at least two different subgroups (strata) so that subjects within each subgroup share the same characteristics, then draw a sample from each subgroup (or stratum).
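A minimal Python sketch of stratified sampling. It assumes a simple random sample of the same size is drawn from every stratum, which is one common choice and not something the card specifies. The strata, member labels, and the helper name `stratified_sample` are hypothetical.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of n_per_stratum members from each stratum."""
    rng = random.Random(seed)
    return {
        name: rng.sample(members, n_per_stratum)
        for name, members in strata.items()
    }

# Hypothetical population split into two strata by class year.
strata = {
    "freshman": [f"f{i}" for i in range(20)],
    "senior": [f"s{i}" for i in range(15)],
}

sample = stratified_sample(strata, n_per_stratum=3, seed=5)
print(sample)
```

Unlike cluster sampling, every subgroup contributes to the sample, but only some members of each subgroup are chosen.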
C-Center
The center of a data set is described by either the mean or the median of the set of values. Unless the data is symmetric, the median should be used to describe the center rather than the mean, because the median is a resistant measure while the mean is not.
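To see why the median is called resistant, here is a small Python illustration with made-up values (the variable names and numbers are hypothetical). Replacing one value with an extreme one moves the mean a great deal but leaves the median where it was.

```python
from statistics import mean, median

# Hypothetical values, for example incomes in thousands of dollars.
incomes = [30, 32, 35, 38, 40]
# Same data, but the largest value is replaced by an extreme one.
with_extreme = [30, 32, 35, 38, 400]

print("mean:  ", mean(incomes), "->", mean(with_extreme))
print("median:", median(incomes), "->", median(with_extreme))
```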
Which of the following is NOT a characteristic of the mean?
The mean is called the average by statisticians
Distribution
The nature or shape of the spread of the data over the range of values
S-Shape
The shape that the graph takes (which includes a bar graph, histogram, stem-and-leaf plot, dot plot, or box plot) Possible answers would include: 1. Symmetric (only if the graph is exactly symmetric) or Roughly Symmetric - this would be describing a graph where the left and right sides are mirror images 2. Skewed Right - this is when the graph has a long tail on the right side of the data set 3. Skewed Left - this is when the graph has a long tail on the left side of the data set
S-Spread
The spread of a data set is used to describe the range of values that are represented by the data. One way to find spread would be to find the range of the data by subtracting the smallest data value from the largest data value. Another way to find the spread of a data set would be to find the interquartile range (IQR), which unlike the range is resistant, because it ignores the top 25% and lowest 25% of the data. IQR = (Third Quartile) - (First Quartile)
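The two measures of spread can be compared with a short Python sketch. The data are made up, the helper names `data_range` and `iqr` are hypothetical, and the quartiles use `statistics.quantiles` with `method="inclusive"`, which is only one of several quartile conventions.

```python
from statistics import quantiles

def data_range(data):
    """Range = largest value minus smallest value."""
    return max(data) - min(data)

def iqr(data):
    """IQR = (Third Quartile) - (First Quartile)."""
    q1, _median, q3 = quantiles(data, n=4, method="inclusive")
    return q3 - q1

# Hypothetical data, then the same data with one extreme value.
data = [10, 12, 13, 15, 16, 18, 20]
with_extreme = [10, 12, 13, 15, 16, 18, 200]

print(data_range(data), data_range(with_extreme))  # the range changes a lot
print(iqr(data), iqr(with_extreme))                # the IQR does not change
```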
Empirical Rule for data with a bell-shaped distribution
This rule states that for data sets having a distribution that is approximately bell-shaped, the following properties apply. -About 68% of all values fall within 1 standard deviation of the mean -About 95% of all values fall within 2 standard deviations of the mean -About 99.7% of all values fall within 3 standard deviations of the mean.
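As a quick worked example of the rule, the Python sketch below turns a mean and standard deviation into the three intervals. The helper name `empirical_rule_intervals` and the values 100 and 15 are hypothetical, and the percentages only apply if the data are roughly bell-shaped.

```python
def empirical_rule_intervals(mean, sd):
    """Intervals within 1, 2, and 3 standard deviations of the mean."""
    return {
        "about 68%": (mean - 1 * sd, mean + 1 * sd),
        "about 95%": (mean - 2 * sd, mean + 2 * sd),
        "about 99.7%": (mean - 3 * sd, mean + 3 * sd),
    }

# Hypothetical bell-shaped data with mean 100 and standard deviation 15.
for share, (low, high) in empirical_rule_intervals(100, 15).items():
    print(f"{share} of values fall between {low} and {high}")
```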
Convenience Sampling
Use results that are very easy to get
Boxplot
Useful for revealing the center, spread, distribution, and outliers of the data
Bar graph
Uses bars of equal width to show frequencies of categories of categorical (qualitative) data.
The Square of the standard deviation is called the ______
Variance
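The relationship can be checked numerically with Python's standard library. The data set is made up. Both the population versions (`pstdev`, `pvariance`) and the sample versions (`stdev`, `variance`, which divide by n - 1) are shown, since the card does not say which one the course uses.

```python
import math
from statistics import pstdev, pvariance, stdev, variance

# Hypothetical data set.
data = [2, 4, 4, 4, 5, 5, 7, 9]

# Population versions: variance is the square of the standard deviation.
print(pstdev(data), pvariance(data))

# Sample versions: the same relationship holds.
print(stdev(data), variance(data))
print(math.isclose(stdev(data) ** 2, variance(data)))
```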
Experiment
We apply some treatment and then proceed to observe its effects on the subjects. -Subjects in experiments are always called experimental units.
Observational Study
We observe and measure specific characteristics, but we don't attempt to modify the subjects being studied.
Which of the following is NOT a property of the standard deviation?
When comparing variation in samples with very different means, it is good practice to compare the two standard deviations.
Discrete Data
When data values are quantitative and number of values is finite or "countable"
When a data value is converted to a standardized scale representing the number of standard deviations the data value lies from the mean, we call the new value a ________
Z score
Whenever the data value is less than the mean, the corresponding _____________
Z-score is negative
Pareto Chart
A bar graph for categorical data, with the added stipulation that the bars are arranged in descending order according to frequencies.
Interval
Differences are meaningful, but there is no natural zero starting point -Zero is just another value on the scale, not the absence of the quantity -Data can be arranged in order Ex: temperature
Data skewed to the right
Have a longer right tail than left tail.
Standard box-and-whisker plot
includes ALL data points, including what are called outliers. Outliers are points that are far left or far right of the data set and may detract from a representative picture of the data.
Time-series graph
is a graph of time-series data, which are quantitative data that have been collected at different points in time, such as monthly or yearly
Parameter
Is a numerical measurement describing some characteristic of a population
Voluntary Response Sample (Self Selected Sample)
Is one in which the respondents themselves decide whether to be included. -Internet polls, mail-in polls, telephone call-in polls
Data skewed to the left
Negatively skewed; the data have a longer left tail.
Z score
Represents a standardized value and is the number of standard deviations that a given x-value is above or below the mean.
Stemplot (stem-and-leaf plot)
Represents quantitative data by separating each value into two parts: the stem (such as the leftmost digit) and the leaf (such as the rightmost digit). -Advantage: we can see the distribution of the data while keeping the original data values.
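A small Python sketch of building a stemplot. It assumes non-negative whole numbers, with the last digit as the leaf and the remaining leading digits as the stem. The scores and the helper name `stemplot` are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Map each stem (leading digits) to its sorted list of leaves (last digit)."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)   # e.g. 83 -> stem 8, leaf 3
    return dict(stems)

# Hypothetical test scores.
scores = [67, 72, 75, 78, 81, 83, 83, 90]

for stem, leaves in stemplot(scores).items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

Each original value can be read back off the plot by rejoining a stem with one of its leaves, which is the advantage the card mentions.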
A data value is considered ____________ if its z-score is less than -2 or greater than 2
unusual
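The z-score and the "unusual" cutoff can be sketched in Python. The formula z = (x - mean) / standard deviation is the usual definition of a standardized value. The helper names `z_score` and `is_unusual` and the numbers used are hypothetical.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

def is_unusual(z):
    """Unusual means a z-score less than -2 or greater than 2."""
    return z < -2 or z > 2

# Hypothetical data with mean 100 and standard deviation 15.
for x in (90, 130, 135):
    z = z_score(x, mean=100, sd=15)
    print(x, round(z, 2), is_unusual(z))
```

A value below the mean, such as 90 here, gets a negative z-score, and a value exactly 2 standard deviations away is not flagged because the cutoff is strict.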
Modified box-and-whisker plot
will not plot outliers as part of the box-and-whisker. The outliers are plotted as individual points beyond the whisker in an attempt to give a more accurate picture of the dispersion of the data.