Chapter 1-4 Test
steps to frequency table
2. Determine the class interval or width. The formula is i ≥ (H - L)/k, where i is the class interval, H is the highest observed value, L is the lowest observed value, and k is the number of classes. In practice, i is usually rounded up to a convenient number.
Steps to frequency table cont.
3. Set the individual class limits.
steps to frequency table cont.
4. Tally the observations (in the textbook example, the vehicle profits) into the classes.
steps to frequency table cont
5. Count the number of items in each class.
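Steps 3 to 5 (set the class limits, tally, count) can be sketched in Python. This is a minimal illustration, assuming equal-width classes where each class includes its lower limit and excludes its upper limit; the function name `frequency_table` and the sample data are hypothetical.

```python
def frequency_table(data, first_lower, width, k):
    """Return (lower, upper, count) for k equal-width classes.

    Assumption: each class includes its lower limit and excludes its upper
    limit, so every observation falls into exactly one class.
    """
    table = []
    for j in range(k):
        lower = first_lower + j * width  # step 3: set the class limits
        upper = lower + width
        count = sum(1 for x in data if lower <= x < upper)  # steps 4-5: tally and count
        table.append((lower, upper, count))
    return table


# Hypothetical data, for illustration only.
data = [3, 7, 12, 15, 18, 22, 25, 29]
for lower, upper, count in frequency_table(data, first_lower=0, width=10, k=3):
    print(f"{lower} up to {upper}: {count}")
```

Because the classes must be collectively exhaustive, the counts should add up to the number of observations; if they do not, the first lower limit or the width needs adjusting.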
Relative class frequencies
Class frequencies can be converted to relative class frequencies to show the fraction of the total number of observations in each class. A relative frequency captures the relationship between a class total and the total number of observations.
Steps to the frequency table
Step 1. Decide on the number of classes. A useful recipe to determine the number of classes (k) is the "2 to the k rule": select the smallest k such that 2^k is greater than the number of observations (n).
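Steps 1 and 2 (the 2 to the k rule and the class interval formula i ≥ (H - L)/k) can be sketched as two small functions. Rounding the interval up to a whole number is an assumption here; the textbook only says to round up to a convenient number. Function names and sample values are hypothetical.

```python
import math


def number_of_classes(n):
    """2 to the k rule: the smallest k such that 2**k > n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def class_interval(high, low, k):
    """Minimum interval from i >= (H - L) / k, rounded up to a whole number (assumption)."""
    return math.ceil((high - low) / k)


# Hypothetical data set: 100 observations ranging from 3 to 97.
k = number_of_classes(100)    # 2**6 = 64 <= 100 < 128 = 2**7, so k = 7
i = class_interval(97, 3, k)  # (97 - 3) / 7 = 13.43, rounded up to 14
print(k, i)
```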
class midpoint
a point that divides a class into two equal parts. This is the average of the upper and lower class limits.
Advantage of the frequency polygon
allows us to compare directly two or more frequency distributions.
Displays of frequency distributions
Common graphics are: 1. Histograms 2. Frequency polygons 3. Cumulative frequency distributions
Advantages of the histogram
depicts each class as a rectangle, with the height of the rectangular bar representing the number in each class.
Frequency polygon
Shows the shape of a distribution; it consists of line segments connecting the points formed by the class midpoints and the class frequencies.
class interval
the class interval is obtained by subtracting the lower limit of a class from the lower limit of the next class
Weighted Mean
the mean obtained by assigning each observation a weight that reflects its importance
class frequency
the number of observations in each class
Relative Frequency Distribution
to convert a frequency distribution to a relative frequency distribution, each of the class frequencies is divided by the total number of observations.
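The conversion described above is a single division per class. A minimal sketch with hypothetical class frequencies:

```python
def relative_frequencies(frequencies):
    """Divide each class frequency by the total number of observations."""
    total = sum(frequencies)
    return [f / total for f in frequencies]


# Hypothetical class frequencies for three classes.
print(relative_frequencies([2, 3, 5]))  # [0.2, 0.3, 0.5]
```

The relative frequencies always sum to 1 (apart from floating-point rounding), which is a quick check that no class was missed.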
Population Mean Formula
μ = ∑X / N
Frequency distribution
A grouping of quantitative data into mutually exclusive and collectively exhaustive classes showing the number of observations in each class.
Sample
A portion, or part, of the population of interest.
Contingency Table
A table used to classify observations according to two identifiable characteristics. It is a table in matrix format that displays the (multivariate) frequency distribution of the variables. Examples: 1- students at a university are classified by gender and class (freshman, junior, and senior); 2- a product is classified as acceptable or unacceptable and by the shift (day, afternoon, or night) on which it is manufactured. Look at Example 2.6
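Building a contingency table amounts to counting each (row category, column category) pair. A sketch using `collections.Counter`; the function name and the product/shift data are hypothetical, and categories are simply sorted alphabetically.

```python
from collections import Counter


def contingency_table(observations):
    """Cross-classify (row_category, column_category) pairs into a count matrix."""
    counts = Counter(observations)
    rows = sorted({r for r, _ in counts})
    cols = sorted({c for _, c in counts})
    # Counter returns 0 for pairs that never occur.
    matrix = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, matrix


# Hypothetical products classified by quality and by shift.
products = [
    ("Acceptable", "Day"), ("Acceptable", "Night"),
    ("Unacceptable", "Day"), ("Acceptable", "Day"),
]
rows, cols, matrix = contingency_table(products)
print(cols)
for r, row in zip(rows, matrix):
    print(r, row)
```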
Type of Scatter Diagram - type of correlation
According to the type of correlation, scatter diagrams can be divided into the following categories: •Scatter Diagram with No Correlation •Scatter Diagram with Moderate Correlation •Scatter Diagram with Strong Correlation
What are 3 examples of a Quantitative Variable?
Age, bank balance, and number of children in a family.
What are 2 examples of a Continuous variable?
Air pressure in a tire, and the weight of a shipment of tomatoes.
What is Skewness?
An important characteristic of a distribution is its shape. There are 4 shapes: 1- Symmetric 2- Positively Skewed 3- Negatively Skewed 4- Bi-modal Distribution
What is an Outlier in Box Plots
An outlier is a value that is inconsistent with the rest of the data. It is defined as a value that is either 1- more than 1.5 times the IQR smaller than Q1, or 2- more than 1.5 times the IQR larger than Q3. An outlier is usually identified in a box plot by an asterisk (*).
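The 1.5 × IQR rule can be sketched as two fences. Q1 and Q3 are taken as inputs because they can be computed in more than one way (location formula or Excel Method); the function names and data are hypothetical.

```python
def outlier_fences(q1, q3):
    """Lower and upper cutoffs: 1.5 * IQR below Q1 and 1.5 * IQR above Q3."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


def find_outliers(data, q1, q3):
    """Values beyond either fence; these would be marked with * on a box plot."""
    low, high = outlier_fences(q1, q3)
    return [x for x in data if x < low or x > high]


# Hypothetical quartiles and data: IQR = 10, fences at -5 and 35.
print(find_outliers([1, 12, 18, 40, -6], q1=10, q3=20))  # [40, -6]
```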
Measures of Location
Averages; used to pinpoint the center of a distribution of data
Population Standard Deviation Formula
σ, the square root of the population variance
Population Variance Formula
σ^2 = ∑(X - μ)^2 / N, where σ^2 is the population variance, X is the value of a particular observation in the population, μ is the arithmetic mean of the population, and N is the number of observations in the population.
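The population variance formula and its square root (the population standard deviation) translate directly into code. A sketch with a hypothetical population:

```python
import math


def population_variance(values):
    """sigma^2 = sum((X - mu)^2) / N, treating values as the whole population."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n


def population_std(values):
    """sigma: the square root of the population variance."""
    return math.sqrt(population_variance(values))


population = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical population, mu = 5
print(population_variance(population), population_std(population))  # 4.0 2.0
```

Python's `statistics.pvariance` and `statistics.pstdev` compute the same population measures.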
Limitations of the Scatter Diagram
- Scatter diagrams cannot give you the exact extent of correlation. - A scatter diagram does not give a quantitative measure of the relationship between the variables; it only gives a visual (qualitative) impression of how one variable changes as the other changes. - This chart cannot show the relationship among more than two variables. - A scatter diagram requires that both variables be at least interval scale, so when one or both variables are nominal or ordinal scale, the scatter diagram cannot be applied. We resolve this limitation by using a contingency table.
Four Properties of the Mean
1. Every set of interval- or ratio-level data has a mean 2. All the values are included in computing the mean 3. The mean is unique 4. The sum of the deviations of each value from the mean is zero, expressed as ∑(X - X̄) = 0
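Property 4 can be checked numerically. A sketch with hypothetical data; with non-integer means, floating-point rounding can leave a tiny nonzero residue instead of an exact 0.

```python
def deviations_from_mean(values):
    """Return X - X-bar for every value."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]


values = [3, 8, 10]  # hypothetical data, mean = 7
devs = deviations_from_mean(values)
print(devs, sum(devs))  # [-4.0, 1.0, 3.0] 0.0
```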
What are the interval level properties?
1: Data classifications are ordered according to the amount of the characteristic they possess. 2: Equal differences in the characteristic are represented by equal differences in the measurements.
What are ratio level properties?
1: Data classifications are ordered according to the amount of the characteristic they possess. 2: Equal differences in the characteristic are represented by equal differences in the measurements. 3: The zero point represents the absence of the characteristic, and the ratio between two numbers is meaningful.
Skewness - Bi-modal Distribution
A bi-modal distribution has two peaks (a distribution with more than two peaks is called multimodal). This is usually the case when the values come from two or more populations.
Parameter
A characteristic of a population
Statistic
A characteristic of a sample
Skewness - Negatively Skewed
Also called skewed to the left: a distribution with a single peak in which the values extend much further to the left of the peak than to the right. In this distribution, the mean is smaller than the median.
Skewness - Positively Skewed
Also called skewed to the right: a distribution with a single peak in which the values extend much further to the right of the peak than to the left. In this distribution, the mean is larger than the median. Positively skewed distributions are more common than negatively skewed distributions. Salaries, for instance, often follow this pattern: the salaries of top executives are much larger than those of the rest of the employees, so the distribution of salaries exhibits positive skewness (a long tail to the right).
Scatter Diagram with Weakest (or no) Correlation
In this type of chart, you are not able to see any kind of relationship between the two variables. It might be a series of points with no visible trend, or it might simply be a straight, flat row of points. In either case, the second variable does not appear to depend on the first (independent) variable.
Interquartile Range or IQR (Box Plots)
The interquartile range (IQR), also called the midspread or middle 50%, is a measure of statistical dispersion equal to the difference between the 75th and 25th percentiles, that is, between the upper and lower quartiles: IQR = Q3 - Q1
What does it mean when the median, percentile, decile or Quartile is in a decimal position?
Like the median, a percentile, decile, or quartile does not need to be one of the actual values in the data set. For example, if the 80th percentile was calculated to be at position 13.12, we follow these steps: 1. Locate the 13th value in the ordered array of data. 2. Move 0.12 of the distance between the 13th and 14th values: a. determine the distance between the 14th and 13th values (value 14 - value 13); b. multiply that distance by 0.12; c. add the result to the 13th value. 3. Report the outcome of step 2 as the 80th percentile. The value at position 13.12 does not necessarily have to be a value in the data set. The same steps are used to find a median, percentile, decile, or quartile at a decimal position.
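The interpolation steps above can be sketched as one function. It assumes 1-based positions of at least 1 in data that are already ordered; the function name is hypothetical.

```python
import math


def value_at_position(ordered, position):
    """Value at a (possibly decimal) 1-based position in ordered data.

    Moves the decimal part of the way from the value at the whole-number
    position to the next value.
    """
    whole = math.floor(position)
    fraction = position - whole
    lower = ordered[whole - 1]  # e.g. the 13th value for position 13.12
    if fraction == 0 or whole >= len(ordered):
        return lower
    upper = ordered[whole]  # the next (14th) value
    return lower + fraction * (upper - lower)


ordered = [10, 20, 30, 40]  # hypothetical ordered data
print(value_at_position(ordered, 2.25))  # 20 + 0.25 * (30 - 20) = 22.5
```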
What do Continuous variables result from?
Measuring.
Population Mean
Sum of all values in the population divided by the number of values in the population
Sample Mean
Sum of all values in the sample divided by the number of values in the sample
How do you infer something about a population?
Take a sample from the population.
What are 2 examples of nominal level data?
Taking a sample of candies and classifying them by color. And taking a sample of students at a football game and classifying them by gender.
Variance
The arithmetic mean of the squared deviations from the mean
Range
The difference between the largest and smallest values in a data set
Population
The entire set of individuals or objects of interest or the measurements obtained from all individuals or objects of interest.
What equation is used to locate Quartiles, Deciles and Percentiles?
The formal computation of quartiles, deciles, and percentiles uses Lp = (n+1)(P/100). This formula is called the location formula because it locates the position of these measures among the observations in an ordered data set, where: 1- n is the number of observations 2- Lp is the location of the desired quartile, decile, or percentile, and P is the desired percentile. For example, to find the 92nd percentile, Lp is L92 and the formula reads L92 = (n+1)(92/100). The formula can also be used to locate the median: since the median is the 50th percentile, use L50 = (n+1)(50/100). Look at Example 4.2
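The location formula combined with interpolation gives a percentile function. A sketch; clamping positions below 1 or above n to the smallest or largest observation is an assumption not stated in the text, and the names and data are hypothetical.

```python
import math


def location(n, p):
    """Location formula: L_p = (n + 1) * (P / 100)."""
    return (n + 1) * p / 100


def percentile(data, p):
    """P-th percentile using the location formula with interpolation.

    Assumption: positions below 1 or above n are clamped to the smallest or
    largest observation.
    """
    ordered = sorted(data)
    n = len(ordered)
    position = min(max(location(n, p), 1), n)
    whole = math.floor(position)
    fraction = position - whole
    if whole >= n:
        return ordered[-1]
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])


data = [14, 2, 10, 6, 12, 4, 8]  # hypothetical, n = 7
print(location(7, 50), percentile(data, 50))  # 4.0 8.0
```

For comparison, Python's `statistics.quantiles(data, n=4)` (default `method="exclusive"`) uses the same (n + 1) positions and returns [4.0, 8.0, 12.0] for this data.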
Inferential Statistics (Statistical inference)
The methods used to estimate a property of a population on the basis of a sample.
Value of the Scatter Diagram
The scatter diagram is used to find the correlation between two variables. It helps us determine how closely the two variables are related. After determining the correlation between the variables, we can then predict the behavior of the dependent variable based on the measure of the independent variable. This chart is very useful when one variable is easy to measure and the other is not.
Statistics
The science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions.
Standard Deviation
The square root of the variance
Mode
The value of the observation that appears most frequently
Other tools (other than the location formula) to determine Quartiles
The location formula is not the only way to determine quartile values; another is called the Excel Method. 1- Determining Q1: the Excel Method uses 0.25n + 0.75 to locate Q1. 2- Determining Q3: the Excel Method uses 0.75n + 0.25 to locate Q3. Look at Example 4.3
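The Excel Method positions can be combined with the same interpolation used for decimal positions. A sketch with hypothetical names and data:

```python
import math


def value_at_position(ordered, position):
    """Interpolate the value at a decimal 1-based position in ordered data."""
    whole = math.floor(position)
    fraction = position - whole
    if whole >= len(ordered):
        return ordered[-1]
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])


def excel_quartiles(data):
    """Q1 and Q3 located at 0.25n + 0.75 and 0.75n + 0.25 (Excel Method)."""
    ordered = sorted(data)
    n = len(ordered)
    return (value_at_position(ordered, 0.25 * n + 0.75),
            value_at_position(ordered, 0.75 * n + 0.25))


data = [14, 2, 10, 6, 12, 4, 8]  # hypothetical, n = 7
print(excel_quartiles(data))  # (5.0, 11.0)
```

For the same data, the location formula places Q1 and Q3 at positions 2 and 6 (values 4 and 12), so the Excel Method pulls the quartiles toward the median. Python's `statistics.quantiles(data, n=4, method="inclusive")` returns [5.0, 8.0, 11.0] here.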
Dot Plots
When we organize data in a frequency distribution into classes, we lose the exact value of the observations. A dot plot groups the data as little as possible, so we do not lose the identity of each individual observation. Dot plots are useful for smaller data sets, whereas histograms tend to be more useful for large data sets.
Difference between Frequency Distribution and the Stem and Leaf Display
With the stem-and-leaf display, we can quickly observe that 94 people attended two performances and that the number attending ranged from 93 to 97. A stem-and-leaf display is similar to a frequency distribution but provides more information: the identity of each observation is preserved.
Mean of Grouped Data
X̄ = (∑fM)/n, where M is the midpoint of each class, f is the frequency in each class, and n is the total number of frequencies.
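The grouped-data mean weights each class midpoint by its frequency. A sketch, assuming classes are given as hypothetical (lower limit, upper limit, frequency) tuples:

```python
def grouped_mean(classes):
    """Mean of grouped data: X-bar = sum(f * M) / n.

    classes: list of (lower_limit, upper_limit, frequency) tuples;
    M is each class midpoint, the average of its two limits.
    """
    n = sum(f for _, _, f in classes)
    return sum(f * (lower + upper) / 2 for lower, upper, f in classes) / n


# Hypothetical frequency distribution: midpoints 5, 15, 25.
classes = [(0, 10, 2), (10, 20, 3), (20, 30, 5)]
print(grouped_mean(classes))  # (2*5 + 3*15 + 5*25) / 10 = 18.0
```

The result is an estimate: it assumes the observations in each class are centered on the class midpoint.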
Sample Mean Formula
X̄ = ∑X / n
What are 2 examples of ordinal level data?
1: A student's rating of a professor. 2: A state's business climate.
What are the ordinal level properties?
1: Data are represented by an attribute (qualitative variable). 2: The data can only be ranked or ordered, because the ordinal level of measurement assigns relative values, such as low, medium, and high.
What are 2 examples of interval level data?
1: Dress sizes 2: The Fahrenheit temperature scale.
What are the nominal level properties?
1: The variable of interest is divided into categories or outcomes. 2: There is no natural order to the outcomes.
What are 2 examples of ratio level data?
1: Wages (zero dollars = no money). 2: Weight (zero on a weight scale = absence of weight).
What is a Box Plot?
A Box Plot is a graphical display, based on quartiles, that helps us picture a set of data. To construct a Box Plot, we need 5 statistics: 1- The Minimum Value 2- Q1 or the first Quartile 3- The Median 4- Q3 or the third Quartile 5- The Maximum Value Look at Graph 4.7 and Example 4.4
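The five statistics a box plot needs can be computed with Python's `statistics.quantiles` (Python 3.8+), whose default `method="exclusive"` places the cut points at (n + 1)P/100, matching the location formula. The function name and data are hypothetical.

```python
import statistics


def five_number_summary(data):
    """Minimum, Q1, median, Q3, maximum: the five statistics a box plot needs."""
    ordered = sorted(data)
    # Default method="exclusive" places the cut points at (n + 1) * P / 100.
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return ordered[0], q1, median, q3, ordered[-1]


data = [14, 2, 10, 6, 12, 4, 8]  # hypothetical
print(five_number_summary(data))  # (2, 4.0, 8.0, 12.0, 14)
```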
Continuous variable
Can assume any value within a specific range (can be decimal numbers).
Discrete variable
Can assume only certain values, and there are "gaps" between the values (for example, counts are whole numbers).
What are the types of quantitative variables?
Continuous variables, and Discrete variables.
What do Discrete variables result from?
Counting
Measure of Position - Deciles
Deciles divide a set of observations into 10 equal parts. If your GPA was in the 8th decile, this means that 80% of the students had a GPA lower than yours, while 20% had a GPA that is higher. Decile 5 is the median: the middle value of the data arranged from minimum to maximum, so 50% of the observations are larger than Decile 5 and 50% are smaller. Look at Graph 4.3 and 4.4
What are the types of Statistics?
Descriptive statistics and Inferential Statistics.
The Empirical Rule
For a symmetric, bell-shaped distribution: - about 68% of values lie within ±1 standard deviation of the mean - about 95% of values lie within ±2 standard deviations of the mean - about 99.7% of values lie within ±3 standard deviations of the mean
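Applying the Empirical Rule is simple arithmetic: mean ± k standard deviations for k = 1, 2, 3. A sketch with a hypothetical mean and standard deviation:

```python
def empirical_rule_intervals(mean, std):
    """Approximate share of values within mean +/- k standard deviations."""
    shares = {1: "about 68%", 2: "about 95%", 3: "about 99.7%"}
    return {k: (mean - k * std, mean + k * std, share) for k, share in shares.items()}


# Hypothetical symmetric, bell-shaped distribution with mean 100 and std 15.
for k, (low, high, share) in empirical_rule_intervals(100, 15).items():
    print(f"±{k} std: {low} to {high} ({share})")
```

For these values, the intervals are 85 to 115, 70 to 130, and 55 to 145.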
What are 3 examples of a Qualitative Variable?
Gender, State of birth, and eye color.
Scatter Diagram with Weak Positive Correlation
Here as the value of x increases the value of y will also tend to increase, but the pattern will not closely resemble a straight line.
Scatter Diagram with Weak Negative Correlation
Here, as the value of x increases, the value of y will tend to decrease, but the pattern will not closely resemble a straight line.
Measure of Position - Percentiles
Percentiles divide a set of observations into 100 equal parts. If your GPA was in the 95th percentile, this means that 95% of the students had a GPA lower than yours, while 5% had a GPA that is higher. The median is the 50th percentile of a set of data arranged from minimum to maximum. Percentile scores are often used to report results on national standardized tests such as the SAT and GMAT. Look at Graph 4.5 and 4.6
Skewness - Symmetric
In a symmetric distribution, the mean and the median are equal and the data values are evenly spread around them. The shape of the distribution below the mean and median is a mirror image of the distribution above them.
Measure of Position
In addition to the standard deviation, there are other ways of describing the variation or spread in a set of data. These methods determine the location of the values that divide a set of observations into equal parts. These measures are called 1- Quartiles 2- Deciles 3- Percentiles
Descriptive Statistics
Methods of organizing, summarizing, and presenting data in an informative way.
Advantage of the Stem-and-Leaf Display
One technique used to display quantitative data in a condensed form and provide more information about the frequency distribution is the stem-and-leaf display. Unlike a frequency distribution, it does not lose the identity of each observation. Each numerical value is divided into two parts: the leading digit or digits become the stem, and the trailing digit or digits become the leaf. The stems are listed along the vertical axis, and the leaf values are stacked against each other along the horizontal axis. How do we construct a stem-and-leaf display? Look at Example 4.1
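The stem/leaf split can be sketched with `divmod`. This assumes non-negative integers with the units digit as the leaf; unlike a printed display, stems with no leaves are simply omitted. The function name and data are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Map each stem (leading digits) to its sorted leaves (last digit).

    Assumption: data are non-negative integers and the leaf is the units digit.
    """
    display = defaultdict(list)
    for x in sorted(data):
        stem, leaf = divmod(x, 10)
        display[stem].append(leaf)
    return dict(display)


# Hypothetical attendance counts.
for stem, leaves in stem_and_leaf([96, 93, 88, 117, 127, 95, 113, 96, 108, 94]).items():
    print(f"{stem:>3} | {' '.join(map(str, leaves))}")
```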
Pearson Coefficient of Skewness
Pearson's coefficient of skewness (Sk) is based on the arithmetic mean, mode, median, and standard deviation. Usually Sk ranges from -3 up to 3. A value near -3, such as -2.75, indicates considerable negative skewness; a value near 3, such as 2.36, indicates considerable positive skewness. Pearson's mode (first) skewness coefficient: Sk = (mean - mode) / standard deviation. Pearson's median (second) skewness coefficient: Sk = 3(mean - median) / standard deviation. - If Sk = 0, the frequency distribution is symmetrical: the median, mode, and mean are equal. - If Sk > 0, the frequency distribution is positively skewed. - If Sk < 0, the frequency distribution is negatively skewed. Look at Example 4.5
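Both Pearson coefficients are direct formulas on summary statistics. A sketch with hypothetical values:

```python
def pearson_mode_skewness(mean, mode, std):
    """Pearson's first (mode) coefficient: (mean - mode) / std."""
    return (mean - mode) / std


def pearson_median_skewness(mean, median, std):
    """Pearson's second (median) coefficient: 3 * (mean - median) / std."""
    return 3 * (mean - median) / std


# Hypothetical summary statistics for a right-skewed data set.
sk = pearson_median_skewness(mean=50, median=45, std=10)
print(sk)  # 1.5 -> positive, so the distribution is positively skewed
```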
Types of variables
Qualitative and Quantitative variables
Measure of Position - Quartiles
Quartiles divide a set of observations into 4 equal parts. 1- The 1st quartile, usually labeled Q1, is the value below which 25% of the observations occur. 2- The 3rd quartile, usually labeled Q3, is the value below which 75% of the observations occur. 3- The 2nd quartile, usually labeled Q2, is the middle value of the data arranged from minimum to maximum; it is the median, so 50% of the observations are larger than Q2 and 50% are smaller. Look at Graph 4.2
Median
The midpoint of the values after they have been ordered from the smallest to the largest, or the largest to the smallest
What are 2 examples of a Discrete variable?
The number of rooms in a house, and the number of students in each section of a statistics class.
Type of Scatter Diagram
The scatter diagram can be categorized into several types based on 1- the type of correlation between the variables 2- the slope of trend
Equations to calculate Skewness
There are several formulas to calculate skewness; the most familiar one is the Pearson Coefficient of Skewness.
Software Coefficient of Skewness
This is a coefficient of skewness calculated by software packages such as Minitab and Excel. The formula these packages use is based on the cubed deviations from the mean. Look at Example 4.5
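The card does not give the formula. A commonly used version, the one behind Excel's SKEW function, is sk = n / ((n - 1)(n - 2)) × ∑((X - X̄)/s)^3, where s is the sample standard deviation. Assuming that is the formula meant, a sketch:

```python
import math


def software_skewness(values):
    """sk = n / ((n - 1)(n - 2)) * sum(((x - mean) / s) ** 3), s = sample std dev.

    Assumption: this is the cubed-deviation formula used by Excel's SKEW.
    """
    n = len(values)
    if n < 3:
        raise ValueError("need at least 3 observations")
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((x - mean) / s) ** 3 for x in values)


print(software_skewness([1, 2, 3, 4, 5]))   # about 0: symmetric
print(software_skewness([1, 1, 1, 2, 10]))  # positive: long right tail
```

Cubing keeps the sign of each deviation, so a long right tail pushes the sum positive and a long left tail pushes it negative. The function fails if all values are equal, since s is then zero.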
Scatter Diagram with Strong Correlation
This type of diagram is also known as "Scatter Diagram with High Degree of Correlation". In this diagram, data points are grouped very close to each other such that you can draw a line by following their pattern. In this case you will say that the variables are closely related to each other.
Scatter Diagram with Moderate Correlation
This type of diagram is also known as "Scatter Diagram with Low Degree of Correlation". Here, the data points are a little closer together, and you can sense that some kind of relationship exists between the two variables.
Scatter Diagram with No Correlation
This type of diagram is also known as "Scatter Diagram with Zero Degree of Correlation". In this type of scatter diagram, data points are spread so randomly that you cannot draw any line through them. In this case you can say that there is no relation between these two variables.
Scatter Diagram with Strong Negative Correlation
This type of diagram is also known as Scatter Diagram with Negative Slant. In negative slant, the correlation will be negative, i.e. as the value of x increases, the value of y will decrease. The slope of a straight line drawn along the data points will go down.
Scatter Diagram with Strong Positive Correlation
This type of diagram is also known as Scatter Diagram with Positive Slant. In positive slant, the correlation will be positive, i.e. as the value of x increases, the value of y will also increase. You can say that the slope of straight line drawn along the data points will go up. The pattern will resemble the straight line.
How do we develop a Dot Plot?
To develop a dot plot, we display a dot for each observation along a horizontal line indicating the possible values of the data. If there are identical observations, or the observations are too close to be shown individually, the dots are "piled" on top of each other. This allows us to see the shape of the distribution, the value about which the data tend to cluster, and the largest and smallest observations. Look at Graph 4.1
Relationship between two variables
Univariate Data: a type of data consisting of observations on only a single characteristic or attribute. A simple example of univariate data would be the salaries of workers in an industry. Bivariate Data: a type of data on each of two variables, where each value of one variable is paired with a value of the other variable. The pairs of values are often represented as individual points in a plane using a scatter plot or scatter diagram, so that the relationship (if any) between the variables is easily seen. For example, bivariate data on a scatter plot could be used to study the relationship between stride length and leg length.
Measures of Dispersion
Measures of the variation, or spread, in a set of data (for example, the range, variance, and standard deviation).
Type of Scatter Diagram - the slope of trend
We can also divide the scatter diagram according to the slope, or trend, of the data points: •Scatter Diagram with Strong Positive Correlation •Scatter Diagram with Weak Positive Correlation •Scatter Diagram with Strong Negative Correlation •Scatter Diagram with Weak Negative Correlation •Scatter Diagram with Weakest (or no) Correlation
Symmetric Distribution
When mode, median, and mean are equal, thus located centrally in a histogram
Qualitative Variable (attribute)
When the characteristic being studied is non-numeric.
Positively Skewed Distribution
When the mean is the largest value (mode < median < mean), thus the histogram is skewed to the right
Negatively Skewed Distribution
When the mean is the smallest value (mode > median > mean), thus the histogram is skewed to the left
Quantitative Variable
When the variable studied can be reported numerically.
Stem-and-Leaf Display
When we organize data in a frequency distribution into classes, we are not sure how the values within each class are distributed. Look at Table 4.1
Weighted Mean Formula
X̄_{w} = ∑ (w • x) / ∑w
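The weighted mean formula multiplies each value by its weight before dividing by the total weight. A sketch with hypothetical prices weighted by units sold:

```python
def weighted_mean(values, weights):
    """X-bar_w = sum(w * x) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


# Hypothetical: three prices, with the number of units sold at each price as weights.
print(weighted_mean([3, 4, 5], [1, 2, 1]))  # (3 + 8 + 5) / 4 = 4.0
```

With equal weights, the weighted mean reduces to the ordinary arithmetic mean.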
Pie Chart
a chart that shows the proportion or percent that each class represents of the total number of frequencies.
Bar Chart
a graph that shows qualitative classes on the horizontal axis and the class frequencies are proportional to heights of the bars.
Scatter Plot or Scatter Diagram
a graphical technique used to show the relationship between 2 variables (dependent and independent variables). A scatter diagram requires that both variables be at least interval scale. Look at Graph 4.8
Frequency table
a grouping of qualitative data into mutually exclusive and collectively exhaustive classes showing the number of observations in each class.
What are the 4 levels of measurement from least to greatest?
Nominal, ordinal, interval, and ratio.
Standard Deviation of Grouped Data
s = √((∑f(M - X̄)^2)/(n-1))
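The grouped standard deviation uses the class midpoints M and the grouped mean X̄. A sketch, assuming hypothetical (lower limit, upper limit, frequency) tuples:

```python
import math


def grouped_std(classes):
    """s = sqrt(sum(f * (M - X-bar)^2) / (n - 1)) for (lower, upper, f) classes."""
    n = sum(f for _, _, f in classes)
    midpoints = [((lower + upper) / 2, f) for lower, upper, f in classes]
    mean = sum(f * m for m, f in midpoints) / n  # grouped mean, X-bar = sum(fM) / n
    return math.sqrt(sum(f * (m - mean) ** 2 for m, f in midpoints) / (n - 1))


classes = [(0, 10, 2), (10, 20, 3), (20, 30, 5)]  # hypothetical; grouped mean = 18
print(round(grouped_std(classes), 4))  # sqrt(610 / 9), about 8.2327
```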
Sample Standard Deviation
s, the square root of the sample variance
Sample Variance Formula
s^2 = (∑(X-X̄)^2)/(n-1)
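The sample variance divides by n - 1 rather than N; that is the only difference from the population formula. A sketch with a hypothetical sample:

```python
import math


def sample_variance(values):
    """s^2 = sum((X - X-bar)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least 2 observations")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


def sample_std(values):
    """s: the square root of the sample variance."""
    return math.sqrt(sample_variance(values))


sample = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample, X-bar = 5
print(sample_variance(sample))  # 32 / 7, about 4.571
```

Python's `statistics.variance` and `statistics.stdev` compute the same sample measures.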