Chapter 1-4 Test

What is a Box Plot?

A Box Plot is a graphical display, based on quartiles, that helps us picture a set of data. To construct a Box Plot, we need 5 statistics:
1- The minimum value
2- Q1, the first quartile
3- The median
4- Q3, the third quartile
5- The maximum value
Look at Graph 4.7 and Example 4.4.
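
As a rough illustration of the five statistics, the sketch below computes them with Python's standard `statistics` module. The data values are hypothetical, and the `exclusive` method is chosen on the assumption that the course locates quartiles with the (n + 1) location formula.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # method="exclusive" places quartile i at position i * (n + 1) / 4
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="exclusive")
    return ordered[0], q1, median, q3, ordered[-1]

# Hypothetical data set, used only for illustration
data = [13, 15, 15, 16, 18, 19, 20, 22, 23, 25, 30]
print(five_number_summary(data))
```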

Limitations of the Scatter Diagram

- Scatter diagrams cannot give you the exact extent of correlation.
- A scatter diagram does not show a quantitative measure of the relationship between the variables. It only gives a qualitative picture of the quantitative change.
- A scatter diagram does not show the relationship among more than two variables.
- A scatter diagram requires that both variables be at least interval scale. So, when one or both variables are nominal or ordinal scale, the scatter diagram cannot be applied. We resolve this limitation by using a Contingency Table.

Four Properties of the Mean

1. Every set of interval- or ratio-level data has a mean 2. All the values are included in computing the mean 3. The mean is unique 4. The sum of the deviations of each value from the mean is zero, expressed as ∑(X - X̄) = 0
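
The fourth property can be checked numerically. This is a minimal sketch with made-up values; with other data, floating-point rounding may leave a tiny non-zero remainder, so the comparison uses a tolerance.

```python
import math

values = [3, 8, 4, 5]                      # hypothetical observations
mean = sum(values) / len(values)           # arithmetic mean
deviations = [x - mean for x in values]    # X - X̄ for each value
total_deviation = sum(deviations)          # ∑(X - X̄)

print(mean, deviations, total_deviation)
print(math.isclose(total_deviation, 0.0, abs_tol=1e-9))
```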

What are 2 examples of ordinal level data?

1: A student's rating of a professor. 2: A state's business climate.

What are the interval level properties?

1: Data classifications are ordered according to the amount of the characteristic they possess. 2: Equal differences in the characteristic are represented by equal differences in the measurements.

What are ratio level properties?

1: Data classifications are ordered according to the amount of the characteristic they possess. 2: Equal differences in the characteristic are represented by equal differences in the measurements. 3: The zero point is the absence of the characteristic, and the ratio between two numbers is meaningful.

What are the ordinal level properties?

1: Data is represented by an attribute (qualitative variable). 2: The data can only be ranked or ordered, because the ordinal level of measurement assigns relative values, such as low, medium, and high.

What are 2 examples of interval level data?

1: Dress sizes 2: The Fahrenheit temperature scale.

What are the nominal level properties?

1: The variable of interest is divided into categories or outcomes. 2: There is no natural order to the outcomes.

What are 2 examples of ratio level data?

1: Wages (zero dollars = no money). 2: Weight (zero on a weight scale = absence of weight).

steps to frequency table

2. Determine the class interval or width. The formula is i ≥ (H - L)/k, where i is the class interval, H is the highest observed value, L is the lowest observed value, and k is the number of classes.

Steps to frequency table cont.

3. set the individual class limits

steps to frequency table cont.

4. Tally the observations (in the textbook example, the vehicle profits) into the classes.

steps to frequency table cont

5. count the number of items in each class
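
The tallying and counting steps can be sketched as follows. The function name, the data, and the choice of classes that include their lower limit but exclude their upper limit are all assumptions made for illustration.

```python
def frequency_table(values, first_lower_limit, class_width, number_of_classes):
    """Tally values into equal-width classes [lower, upper) and count each class."""
    counts = [0] * number_of_classes
    for value in values:
        index = int((value - first_lower_limit) // class_width)
        if not 0 <= index < number_of_classes:
            raise ValueError(f"{value} falls outside the classes")
        counts[index] += 1
    table = []
    for i, count in enumerate(counts):
        lower = first_lower_limit + i * class_width
        table.append((lower, lower + class_width, count))
    return table

# Hypothetical observations
observations = [12, 25, 31, 18, 44, 39, 27, 22, 35, 48]
for lower, upper, count in frequency_table(observations, 10, 10, 4):
    print(f"{lower} up to {upper}: {count}")
```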

Skewness - Bi-modal Distribution

A Bi-modal Distribution will have 2 or more peaks. This is mostly the case when we have values from 2 or more populations.

Parameter

A characteristic of a population

Statistic

A characteristic of a sample

Frequency distribution

A grouping of quantitative data into mutually exclusive and collectively exhaustive classes showing the number of observations in each class.

Sample

A portion, or part, of the population of interest.

Contingency Table

A table used to classify observations according to 2 identifiable characteristics. A Contingency Table is a table in matrix format that displays the (multivariate) frequency distribution of the variables. Examples: 1- Students at a university are classified by gender and class (freshman, junior, and senior). 2- A product is classified as acceptable or unacceptable and by the shift (day, afternoon, or night) on which it is manufactured. Look at Example 2.6.

Type of Scatter Diagram - type of correlation

According to the type of correlation, scatter diagrams can be divided into following categories: •Scatter Diagram with No Correlation •Scatter Diagram with Moderate Correlation •Scatter Diagram with Strong Correlation

What are 3 examples of a Quantitative Variable?

Age, bank balance, and number of children in a family.

What are 2 examples of a Continuous variable?

Air pressure in a tire, and the weight of a shipment of tomatoes.

Skewness - Negatively Skewed

Also called skewed to the left, is a distribution of values where there is a single peak and the values extend much further to the left of the peak rather than to the right of the peak. In the distribution, the mean is smaller than the median.

Skewness - Positively Skewed

Also called skewed to the right, this is a distribution of values where there is a single peak and the values extend much further to the right of the peak than to the left of the peak. In this distribution, the mean is larger than the median. Positively skewed distributions are more common than negatively skewed distributions. Salaries, for instance, follow this pattern: the salaries of top executives are much larger than the salaries of the rest of the employees, so the distribution of salaries exhibits positive skewness (a long tail to the right).

What is Skewness

An important characteristic of a distribution is its shape. There are 4 shapes: 1- Symmetric 2- Positively Skewed 3- Negatively Skewed 4- Bi-modal Distribution

What is an Outlier in Box Plots

An outlier is a value that is inconsistent with the rest of the data. It is defined as a value that is more than 1.5 times the IQR smaller than Q1 or more than 1.5 times the IQR larger than Q3, that is, below Q1 - 1.5(IQR) or above Q3 + 1.5(IQR). An outlier is usually identified in a Box Plot by an asterisk (*).
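
A minimal sketch of the outlier rule, assuming Q1 and Q3 have already been found by whichever quartile method the course uses. The function names and data are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return the lower and upper limits beyond which a value is an outlier."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

def find_outliers(values, q1, q3):
    """List the values lying more than 1.5 * IQR below Q1 or above Q3."""
    lower, upper = outlier_fences(q1, q3)
    return [v for v in values if v < lower or v > upper]

# Hypothetical data with Q1 = 15 and Q3 = 23 supplied by the caller
data = [13, 15, 15, 16, 18, 19, 20, 22, 23, 25, 40]
print(outlier_fences(15, 23))
print(find_outliers(data, 15, 23))
```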

Measures of Location

Averages; used to pinpoint the center of a distribution of data

Continuous variable

Can assume any value within a specific range. (Can be decimal numbers.)

Discrete variable

Can assume only certain values, and there are "gaps" between the values. (Typically whole numbers that result from counting.)

Relative class frequencies

Class frequencies can be converted to relative class frequencies to show the fraction of the total number of observations in each class. Captures relationship between a class total and the total number of observations.

What are the types of quantitative variables?

Continuous variables, and Discrete variables.

What do Discrete variables result from?

Counting

Measure of Position - Deciles

Deciles divide a set of observations into 10 equal parts. If your GPA was in the 8th decile, this means that 80% of the students had a GPA lower than yours while 20% had a GPA that is higher. Decile 5 is the median: it is the middle value of the data arranged from minimum to maximum, so 50% of the observations are larger than Decile 5 and 50% of the observations are smaller than Decile 5. Look at Graph 4.3 and 4.4.

What are the types of Statistics?

Descriptive statistics and Inferential Statistics.

The Empirical Rule

For a symmetrical, bell-shaped distribution: - ~68% of values lie within ±1 standard deviation of the mean - ~95% of values lie within ±2 standard deviations of the mean - ~99.7% of values lie within ±3 standard deviations of the mean
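
These percentages can be reproduced from the normal curve. The sketch below assumes an exactly normal distribution and uses `statistics.NormalDist` from the Python standard library; `share_within` is a hypothetical helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def share_within(k):
    """Fraction of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```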

What are 3 examples of a Qualitative Variable?

Gender, State of birth, and eye color.

Scatter Diagram with Weak Positive Correlation

Here as the value of x increases the value of y will also tend to increase, but the pattern will not closely resemble a straight line.

Scatter Diagram with Weak Negative Correlation

Here as the value of x increases the value of y will tend to decrease, but the pattern will not be as well defined.

Skewness - Symmetric

In a Symmetric distribution, the mean and the median are equal and the data values are evenly spread around. The shape of the distribution below the mean and median is a mirror image of distribution above the median and mean.

Measure of Position

In addition to the standard deviation, there are other ways of describing the variation or spread in a set of data. One method determines the location of the values that divide a set of observations into equal parts. These measures are called 1- Quartiles 2- Deciles 3- Percentiles

Scatter Diagram with Weakest (or no) Correlation

In this type of chart, you are not able to see any kind of relationship between the two variables. It might just be a series of points with no visible trend, or it might simply be a straight, flat row of points. In either case, the independent variable has no effect on the second variable (it is not dependent).

Interquartile Range or IQR (Box Plots)

The interquartile range (IQR), also called the midspread or middle 50%, is a measure of statistical dispersion equal to the difference between the 75th and 25th percentiles, that is, between the upper and lower quartiles: IQR = Q3 - Q1

What does it mean when the median, percentile, decile or Quartile is in a decimal position?

Like the median, a percentile, decile, or quartile does not need to be one of the actual values in the data set. For example, if the 80th percentile was calculated to be in position 13.12, we move 0.12 of the distance between the 13th and 14th values:
1- Locate the 13th and 14th values in the ordered array of data.
2- Determine the distance between them (value 14 - value 13).
3- Multiply the distance from step 2 by 0.12.
4- Add the outcome of step 3 to the 13th value.
5- Report the outcome of step 4 as the 80th percentile.
The same steps are used to determine a decimal-positioned median, percentile, decile, or quartile.

What do Continuous variables result from?

Measuring.

Descriptive Statistics

Methods of organizing, summarizing, and presenting data in an informative way.

Advantage of the Stem-and-Leaf Display

One technique used to display quantitative data in a condensed form and provide more information about the frequency distribution is the Stem-and-Leaf Display. It allows us not to lose the identity of each observation. Each numerical value is divided into two parts: the stem and the leaf. The leading digit or digits become the stem and the trailing digit or digits become the leaf. The stems are located along the vertical axis and the leaf values are stacked against each other along the horizontal axis. To see how to construct a Stem-and-Leaf Display, look at Example 4.1.
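
A stem-and-leaf display can be sketched in a few lines. This version assumes non-negative whole numbers and takes the last digit as the leaf. The function name and the data are made up for illustration.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group each value's last digit (leaf) under its leading digits (stem)."""
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)   # e.g. 117 -> stem 11, leaf 7
        rows[stem].append(leaf)
    return dict(sorted(rows.items()))

# Hypothetical observations
data = [96, 93, 88, 117, 127, 95, 113, 96, 108, 94]
for stem, leaves in stem_and_leaf(data).items():
    print(f"{stem:>3} | {' '.join(str(leaf) for leaf in leaves)}")
```

Because every leaf is kept, each original observation can be read back from the display, which a frequency distribution does not allow.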

Pearson Coefficient of Skewness

The Pearson Coefficient of Skewness (SK) is based on the arithmetic mean, mode, median, and standard deviation. SK usually ranges from -3 to 3. A value near -3, such as -2.75, indicates considerable negative skewness. A value near 3, such as 2.36, indicates considerable positive skewness.
Pearson's mode (first) skewness coefficient: SK = (mean - mode) / standard deviation
Pearson's median (second) skewness coefficient: SK = 3(mean - median) / standard deviation
- If SK = 0, the frequency distribution is symmetrical; in a symmetrical, bell-shaped distribution the mean, median, and mode are equal.
- If SK > 0, the frequency distribution is positively skewed.
- If SK < 0, the frequency distribution is negatively skewed.
Look at Example 4.5.
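
The median-based coefficient can be computed as follows. This sketch assumes the sample standard deviation (n - 1 in the denominator), which the card does not specify, and the data sets are hypothetical.

```python
import statistics

def pearson_skewness(values):
    """Pearson's second coefficient: 3 * (mean - median) / standard deviation."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    s = statistics.stdev(values)          # sample standard deviation (assumption)
    return 3 * (mean - median) / s

print(pearson_skewness([1, 2, 3, 4, 5]))    # symmetric data
print(pearson_skewness([1, 2, 3, 4, 20]))   # long tail to the right
print(pearson_skewness([1, 17, 18, 19, 20]))  # long tail to the left
```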

Measure of Position - Percentiles

Percentiles divide a set of observations into 100 equal parts. If your GPA was in the 95th percentile, this means that 95% of the students had a GPA lower than yours while 5% had a GPA that is higher. The median is the 50th percentile of a set of data arranged from minimum to maximum. Percentile scores are usually used to report results on National Standardized tests such as SAT and GMAT. Look at Graph 4.5 and 4.6

Types of variables

Qualitative and Quantitative variables

Measure of Position - Quartiles

Quartiles divide a set of observations into 4 equal parts. 1- The 1st Quartile is usually labeled Q1: the value below which 25% of the observations occur. 2- The 2nd Quartile is usually labeled Q2: the middle value of the data arranged from minimum to maximum, which is the median; 50% of the observations are larger than Q2 and 50% are smaller. 3- The 3rd Quartile is usually labeled Q3: the value below which 75% of the observations occur. Look at Graph 4.2.

Steps to the frequency table

Step 1. Decide on the number of classes. A useful recipe to determine the number of classes (k) is the "2 to the k rule": select the smallest k such that 2^k is greater than the number of observations (n).
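
A small sketch of the "2 to the k rule", taken here to mean the smallest k for which 2^k exceeds the number of observations n. It also shows the minimum class width (H - L)/k that follows once k is chosen. The function names and numbers are hypothetical.

```python
def number_of_classes(n):
    """Smallest k such that 2 ** k is greater than n observations."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def minimum_class_width(highest, lowest, k):
    """Lower bound for the class interval: i >= (H - L) / k."""
    return (highest - lowest) / k

n, highest, lowest = 50, 95, 15        # hypothetical data set summary
k = number_of_classes(n)
print(k, minimum_class_width(highest, lowest, k))
```

In practice the width is usually rounded up to a convenient number, such as a multiple of 5 or 10.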

Population Mean

Sum of all values in the population divided by the number of values in the population

Sample Mean

Sum of all values in the sample divided by the number of values in the sample

How do you infer something about a population?

Take a sample from the population.

What are 2 examples of nominal level data?

Taking a sample of candies and classifying them by color. And taking a sample of students at a football game and classifying them by gender.

Variance

The arithmetic mean of the squared deviations from the mean

Range

The difference between the largest and smallest values in a data set

Population

The entire set of individuals or objects of interest or the measurements obtained from all individuals or objects of interest.

What equation is used to locate Quartiles, Deciles, and Percentiles?

The formal computation of quartiles, deciles, and percentiles uses Lp = (n + 1)(P/100). This formula is called the location formula because it allows us to locate the position of these measures among the observations in an ordered data set.
1- n is the number of observations
2- P is the desired percentile
3- Lp is the location of the desired quartile, decile, or percentile
So, if we are finding the 92nd percentile, Lp would be L92 and the formula would read L92 = (n + 1)(92/100). The formula can be used to locate the median as well: since the median is the 50th percentile, Lp would be L50 = (n + 1)(50/100). Look at Example 4.2.
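
The location formula, together with interpolation for decimal positions, can be sketched as below. The function name and the data are hypothetical, and clamping to the smallest or largest value when the location falls outside 1..n is an added assumption.

```python
def value_at_percentile(values, percent):
    """Find the value at location Lp = (n + 1) * (P / 100) in the ordered data."""
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) * percent / 100
    whole = int(location)            # position of the value just below Lp
    fraction = location - whole      # decimal part of the position
    if whole < 1:
        return ordered[0]
    if whole >= n:
        return ordered[-1]
    below, above = ordered[whole - 1], ordered[whole]
    # Move `fraction` of the distance from the lower value toward the next one
    return below + fraction * (above - below)

data = list(range(10, 160, 10))          # 15 hypothetical values: 10, 20, ..., 150
print(value_at_percentile(data, 50))     # median, location 8
print(value_at_percentile(data, 80))     # location 12.8
```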

Inferential Statistics (Statistical inference)

The methods used to estimate a property of a population on the basis of a sample.

Median

The midpoint of the values after they have been ordered from the smallest to the largest, or the largest to the smallest

What are 2 examples of a Discrete variable?

The number of rooms in a house, and the number of students in each section of a statistics class.

Type of Scatter Diagram

The scatter diagram can be categorized into several types based on 1- the type of correlation between the variables 2- the slope of trend

Value of the Scatter Diagram

The scatter diagram is used to find the correlation between two variables. This diagram helps us determine how closely the two variables are related. After determining the correlation between the variables, we can then predict the behavior of the dependent variable based on the measure of the independent variable. This chart is very useful when one variable is easy to measure and the other is not

Statistics

The science of collecting, organizing, presenting, analyzing, and interpreting data to assist in making more effective decisions.

Standard Deviation

The square root of the variance

Mode

The value of the observation that appears most frequently

Other tools (other than the location formula) to determine Quartile

Besides the location formula, quartile values can be determined with the Excel Method. 1- Determining Q1: the Excel Method uses 0.25n + 0.75 to locate Q1. 2- Determining Q3: the Excel Method uses 0.75n + 0.25 to locate Q3. Look at Example 4.3.
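
For comparison with the location formula, the Excel Method positions can be sketched as follows. The helper names and data are hypothetical, and decimal positions are resolved by the same linear interpolation used for other measures of position.

```python
def value_at_position(ordered, position):
    """Interpolate between neighbors for a 1-based, possibly decimal, position."""
    whole = int(position)
    fraction = position - whole
    if whole >= len(ordered):
        return ordered[-1]
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])

def excel_quartiles(values):
    """Return (Q1, Q3) located at 0.25n + 0.75 and 0.75n + 0.25."""
    ordered = sorted(values)
    n = len(ordered)
    q1 = value_at_position(ordered, 0.25 * n + 0.75)
    q3 = value_at_position(ordered, 0.75 * n + 0.25)
    return q1, q3

data = list(range(10, 160, 10))   # 15 hypothetical values: 10, 20, ..., 150
print(excel_quartiles(data))      # positions 4.5 and 11.5
```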

Equations to calculate Skewness

There are several formulas to calculate skewness; the most familiar one is the Pearson Coefficient of Skewness.

Software Coefficient of Skewness

This is a Coefficient of Skewness that is calculated using software packages like Minitab and Excel. The formula these systems use is based on the cubed deviations from the mean. Look at Example 4.5.
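
A commonly used form of this coefficient is sk = [n / ((n - 1)(n - 2))] ∑((X - X̄)/s)^3, where s is the sample standard deviation. The sketch below assumes that form; the function name and data are hypothetical.

```python
import statistics

def software_skewness(values):
    """Skewness from cubed standardized deviations, with a small-sample factor."""
    n = len(values)
    if n < 3:
        raise ValueError("at least 3 observations are needed")
    mean = statistics.mean(values)
    s = statistics.stdev(values)      # sample standard deviation
    cubed_sum = sum(((x - mean) / s) ** 3 for x in values)
    return n / ((n - 1) * (n - 2)) * cubed_sum

print(software_skewness([1, 2, 3, 4, 5]))    # symmetric data, close to 0
print(software_skewness([1, 2, 3, 4, 20]))   # positive: long right tail
```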

Scatter Diagram with Strong Correlation

This type of diagram is also known as "Scatter Diagram with High Degree of Correlation". In this diagram, data points are grouped very close to each other such that you can draw a line by following their pattern. In this case you will say that the variables are closely related to each other.

Scatter Diagram with Moderate Correlation

This type of diagram is also known as "Scatter Diagram with Low Degree of Correlation". Here, the data points are a little closer together and you can sense that some kind of relationship exists between the two variables.

Scatter Diagram with No Correlation

This type of diagram is also known as "Scatter Diagram with Zero Degree of Correlation". In this type of scatter diagram, data points are spread so randomly that you cannot draw any line through them. In this case you can say that there is no relation between these two variables.

Scatter Diagram with Strong Negative Correlation

This type of diagram is also known as Scatter Diagram with Negative Slant. In negative slant, the correlation will be negative, i.e. as the value of x increases, the value of y will decrease. The slope of a straight line drawn along the data points will go down.

Scatter Diagram with Strong Positive Correlation

This type of diagram is also known as Scatter Diagram with Positive Slant. In positive slant, the correlation will be positive, i.e. as the value of x increases, the value of y will also increase. You can say that the slope of straight line drawn along the data points will go up. The pattern will resemble the straight line.

How do we develop a Dot Plot?

To develop a Dot Plot, we display a dot for each observation along a horizontal number line indicating the possible values of the data. If there are identical observations or the observations are too close to be shown individually, the dots are "piled" on top of each other. This allows us to see the shape of the distribution, the value where the data tend to cluster, and the largest and smallest observations. Look at Graph 4.1.

Relationship between two variables

Univariate Data: a type of data which consists of observations on only a single characteristic or attribute. A simple example of univariate data would be the salaries of workers in industry.
Bivariate Data: a type of data on each of two variables, where each value of one of the variables is paired with a value of the other variable. The pairs of values of these two variables are often represented as individual points in a plane using a scatter plot or scatter diagram. This is done so that the relationship (if any) between the variables is easily seen. For example, bivariate data on a scatter plot could be used to study the relationship between stride length and length of legs.

Measures of Dispersion

Measures of the variation (spread) in data, such as the range, variance, and standard deviation.

Type of Scatter Diagram - the slope of trend

We can also divide the scatter diagram according to the slope, or trend, of the data points: •Scatter Diagram with Strong Positive Correlation •Scatter Diagram with Weak Positive Correlation •Scatter Diagram with Strong Negative Correlation •Scatter Diagram with Weak Negative Correlation •Scatter Diagram with Weakest (or no) Correlation

Symmetric Distribution

When mode, median, and mean are equal, thus located centrally in a histogram

Qualitative Variable (attribute)

When the characteristic being studied is non-numeric.

Positively Skewed Distribution

When the mean is the largest value (mode < median < mean), thus the histogram is skewed to the right

Negatively Skewed Distribution

When the mean is the smallest value (mode > median > mean), thus the histogram is skewed to the left

Quantitative Variable

When the variable studied can be reported numerically.

Stem-and-Leaf Display

When we organize data in a frequency distribution into classes, we are not sure how the values within each class are distributed. Look at Table 4.1

Dot Plots

When we organize data in a frequency distribution into classes, we lose the exact value of the observations. A dot plot groups the data as little as possible so we do not lose the identity of each individual observation. Dot plots are useful for smaller data sets, whereas histograms tend to be more useful for large data sets.

Difference between Frequency Distribution and the Stem and Leaf Display

With the Stem and Leaf Display, we can quickly observe that 94 people attended two performances and the number attending ranged from 93 to 97. A stem and leaf display is similar to a frequency distribution but carries more information, because the identity of the observations is preserved.

Mean of Grouped Data

X̄ = (∑fM)/n, where M is the midpoint of each class, f is the frequency in each class, and n is the total number of observations (the sum of the frequencies).
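
A minimal sketch of the grouped-data mean, using hypothetical class midpoints and frequencies:

```python
def grouped_mean(classes):
    """Mean of grouped data: sum of f * M divided by the total frequency n."""
    n = sum(frequency for _, frequency in classes)
    weighted_sum = sum(midpoint * frequency for midpoint, frequency in classes)
    return weighted_sum / n

# (midpoint M, frequency f) pairs for four hypothetical classes
classes = [(15, 2), (25, 3), (35, 3), (45, 2)]
print(grouped_mean(classes))
```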

Sample Mean Formula

X̄ = ∑X / n

Weighted Mean Formula

X̄_{w} = ∑ (w • x) / ∑w
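
The weighted mean formula translates directly into code. The values and weights below are made up for illustration.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of w * x divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)

# Hypothetical observations and their weights
print(weighted_mean([2.0, 3.0, 4.0], [1, 2, 1]))
```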

Pie Chart

a chart that shows the proportion or percent that each class represents of the total number of frequencies.

Bar Chart

a graph that shows qualitative classes on the horizontal axis and the class frequencies are proportional to heights of the bars.

Scatter Plot or Scatter Diagram

a graphical technique that is used to show the relationship between 2 variables (dependent and independent variables). A scatter diagram requires that both variables be at least interval scale. Look at Graph 4.8.

Frequency table

a grouping of qualitative data into mutually exclusive and collectively exhaustive classes showing the number of observations in each class.

class midpoint

a point that divides a class into two equal parts. This is the average of the upper and lower class limits.

Advantage of the frequency polygon

allows us to compare directly two or more frequency distributions.

Displays of frequency distributions

common graphics are: 1. Histograms 2.Frequency polygons 3. Cumulative frequency distributions

Advantages of the histogram

depicts each class as a rectangle, with the height of the rectangular bar representing the number in each class.

What are the 4 levels of measurement from least to greatest?

Nominal, ordinal, interval, and ratio.

Standard Deviation of Grouped Data

s = √((∑f(M - X̄)^2)/(n-1))
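
The grouped-data standard deviation can be sketched the same way as the grouped mean. Here M is the class midpoint, f the class frequency, and X̄ the grouped mean. The class data are hypothetical.

```python
import math

def grouped_standard_deviation(classes):
    """s = sqrt( sum of f * (M - mean)^2 / (n - 1) ) for (midpoint, frequency) pairs."""
    n = sum(frequency for _, frequency in classes)
    mean = sum(midpoint * frequency for midpoint, frequency in classes) / n
    squared = sum(frequency * (midpoint - mean) ** 2 for midpoint, frequency in classes)
    return math.sqrt(squared / (n - 1))

# (midpoint M, frequency f) pairs for four hypothetical classes
classes = [(15, 2), (25, 3), (35, 3), (45, 2)]
print(grouped_standard_deviation(classes))
```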

Sample Standard Deviation

s, the square root of the sample variance

Sample Variance Formula

s^2 = (∑(X-X̄)^2)/(n-1)

Frequency polygon

shows the shape of a distribution; line segments connecting the class midpoints of the class frequencies.

class interval

the class interval is obtained by subtracting the lower limit of a class from the lower limit of the next class

Weighted Mean

the mean obtained by assigning each observation a weight that reflects its importance

class frequency

the number of observations in each class

Relative Frequency Distribution

to convert a frequency distribution to a relative frequency distribution, each of the class frequencies is divided by the total number of observations.

Population Mean Formula

μ = ∑X / N

Population Standard Deviation Formula

σ, the square root of the population variance

Population Variance Formula

σ^2 = ∑(X - μ)^2 / N, where σ^2 is the population variance, X is the value of a particular observation in the population, μ is the arithmetic mean of the population, and N is the number of observations in the population.
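
To contrast the population and sample formulas, the sketch below computes both on the same hypothetical values. The only difference is the divisor: N for a population, n - 1 for a sample.

```python
import math

def population_variance(values):
    """σ^2: mean of the squared deviations from μ, dividing by N."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def sample_variance(values):
    """s^2: sum of squared deviations from X̄, dividing by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

data = [2, 4, 4, 4, 5, 5, 7, 9]                 # hypothetical observations
print(population_variance(data), math.sqrt(population_variance(data)))
print(sample_variance(data), math.sqrt(sample_variance(data)))
```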
