Stats, Chapter 2


Determine whether the statement is true or false. If it is​ false, rewrite it as a true statement. The second quartile is the median of an ordered data set.

TRUE

Deviation

The difference between a data entry, x, and the mean of the data set: deviation = data entry − mean of data set. (Example on slide 9.)

Weighted Mean

The mean of a data set whose entries have varying weights: weighted mean = Σ(x · w) / Σw, where w is the weight of each entry x. (Formula on slide 26 of Section 2.3.)
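
A minimal sketch of the weighted mean, assuming the standard formula Σ(x · w) / Σw. The scores and weights are made-up illustration values, not data from the slides.

```python
def weighted_mean(values, weights):
    """Return sum(x * w) / sum(w) for paired values and weights."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight


# Hypothetical course grade: three scores with category weights.
scores = [86, 96, 82]
weights = [0.50, 0.30, 0.20]
course_grade = weighted_mean(scores, weights)
print(round(course_grade, 1))
```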

Use the box-and-whisker plot to identify the five-number summary. (The plot is saved as a screenshot named Question 6 of 2.5.)

Min = 11, Q1 = 13, Q2 = 15, Q3 = 17, Max = 20

The mean value of land and buildings per acre from a sample of farms is ​$1400​, with a standard deviation of ​$200. The data set has a​ bell-shaped distribution. Assume the number of farms in the sample is 74.

The interval in question spans one standard deviation on either side of the mean ($1200 to $1600), so by the empirical rule about 68% of the 74 farms fall in it: 74 × 0.68 ≈ 50 farms.

The mean value of land and buildings per acre from a sample of farms is ​$1300​, with a standard deviation of ​$100. The data set has a​ bell-shaped distribution. Using the empirical​ rule, determine which of the following​ farms, whose land and building values per acre are​ given, are unusual​ (more than two standard deviations from the​ mean). Are any of the data values very unusual​ (more than three standard deviations from the​ mean)?

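Check whether each listed value (the list is not reproduced here) is more than two standard deviations from the mean, that is, below $1100 or above $1500. Any such value is unusual; in this example the unusual values were $985 and $1529. A value more than three standard deviations from the mean (below $1000 or above $1600) is very unusual, which applies to $985.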
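
The check can be sketched as a small function, assuming the thresholds stated in the question (more than two standard deviations is unusual, more than three is very unusual). The function name `classify` is only for illustration.

```python
def classify(value, mean, std_dev):
    """Label a value by how many standard deviations it lies from the mean."""
    z = (value - mean) / std_dev
    if abs(z) > 3:
        return "very unusual"
    if abs(z) > 2:
        return "unusual"
    return "not unusual"


# Farm example from the card: mean $1300, standard deviation $100.
for farm_value in (985, 1529):
    print(farm_value, classify(farm_value, 1300, 100))
```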

Graphs of Frequency Distributions

1. Frequency Histogram: A bar graph that represents the frequency distribution (described on its own card).
2. Frequency Polygon: A line graph that emphasizes the continuous change in frequencies. (Slide 26.)
3. Relative Frequency Histogram: Has the same shape and the same horizontal scale as the corresponding frequency histogram. The vertical scale measures the relative frequencies, not frequencies.
4. Cumulative Frequency Graph, or Ogive: A line graph that displays the cumulative frequency of each class at its upper class boundary. The upper boundaries are marked on the horizontal axis. The cumulative frequencies are marked on the vertical axis. (The next slide shows how to construct one.)

Graphing QUANTITATIVE Data Sets

1. Stem-and-leaf plot: Each number is separated into a stem and a leaf. Similar to a histogram, but still contains the original data values. Example data: 21, 25, 25, 26, 27, 28, 30, 36, 36, 45. For instance, the entries 78, 79, and 72 share the stem 7 and are written 7 | 2 8 9.
2. Dot plot: Each data entry is plotted, using a point, above a horizontal axis.

The Shape of Distributions

1. Symmetric Distribution: A vertical line can be drawn through the middle of a graph of the distribution and the resulting halves are approximately mirror images.
2. Uniform Distribution (rectangular): All entries or classes in the distribution have equal or approximately equal frequencies. Symmetric.
3. Skewed Left Distribution (negatively skewed): The "tail" of the graph elongates more to the left. The mean is to the left of the median.
4. Skewed Right Distribution (positively skewed): The "tail" of the graph elongates more to the right. The mean is to the right of the median.

Heights of men on a baseball team have a bell-shaped distribution with a mean of 183 cm and a standard deviation of 7 cm. Using the empirical rule, what is the approximate percentage of the men between the following values? a. 162 cm and 204 cm b. 176 cm and 190 cm

a. 99.7%, because 162 and 204 are each three standard deviations (21 cm) from the mean. b. 68%, because 176 and 190 are each one standard deviation (7 cm) from the mean. By the empirical rule for bell-shaped data: about 68% of the data lie within one standard deviation of the mean (if the standard deviation is 5, that is ±5); about 95% lie within two standard deviations (±10); about 99.7% lie within three standard deviations (±15).
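
A rough sketch of the lookup behind both answers, assuming the interval is symmetric about the mean and spans exactly 1, 2, or 3 standard deviations. The function name `empirical_percentage` is illustrative.

```python
import math

# Percent of bell-shaped data within k standard deviations of the mean.
EMPIRICAL_RULE = {1: 68.0, 2: 95.0, 3: 99.7}


def empirical_percentage(low, high, mean, std_dev):
    """Approximate percent of data between low and high (mean ± k·sd only)."""
    k_low = (mean - low) / std_dev
    k_high = (high - mean) / std_dev
    k = round(k_low)
    symmetric = math.isclose(k_low, k_high) and math.isclose(k_low, k)
    if not symmetric or k not in EMPIRICAL_RULE:
        raise ValueError("interval must be mean ± 1, 2, or 3 standard deviations")
    return EMPIRICAL_RULE[k]


# Heights example: mean 183 cm, standard deviation 7 cm.
print(empirical_percentage(162, 204, 183, 7))
print(empirical_percentage(176, 190, 183, 7))
```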

Population Variance

1. Find the mean of the data set (41.5 in the class example).
2. Find the deviation of each entry (entry − 41.5).
3. Square each deviation.
4. Add the squares to get the sum of squares.
5. Divide the sum of squares by N (the number of entries) to get the population variance, or divide by n − 1 to get the sample variance.
6. Take the square root of the variance to get the standard deviation (population or sample, matching the divisor used in step 5).
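
The steps can be sketched in Python as follows. The data list is a made-up example chosen so that its mean is 41.5, matching the card; it is not taken from the slides.

```python
import math


def variance(data, sample=False):
    """Sum of squared deviations divided by N (population) or n - 1 (sample)."""
    n = len(data)
    mean = sum(data) / n                                 # step 1
    sum_of_squares = sum((x - mean) ** 2 for x in data)  # steps 2-4
    divisor = n - 1 if sample else n                     # step 5
    return sum_of_squares / divisor


# Hypothetical data set with mean 41.5.
data = [41, 38, 39, 45, 47, 41, 44, 41, 37, 42]
population_variance = variance(data)
sample_variance = variance(data, sample=True)
population_std = math.sqrt(population_variance)          # step 6
print(population_variance, sample_variance, population_std)
```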

Comparing the Mean, Median, and Mode

All three measures describe a typical entry of a data set. Advantage of using the mean: The mean is a reliable measure because it takes into account every entry of a data set. Disadvantage of using the mean: Greatly affected by outliers (a data entry that is far removed from the other entries in the data set).

What is an advantage of using the range as a measure of​ variation? What is a​ disadvantage?

It is easy to compute. Using only two entries from the data set does not always reflect an accurate measure of variation of the data.

Range

The difference between the maximum and minimum data entries in the set. The data must be quantitative. Range = (Max. data entry) - (Min. data entry)

A​ student's score on an actuarial exam is in the 78th percentile. What can you conclude about the​ student's exam​ score?

The student scored higher than​ 78% of the students who took the actuarial exam. Since the student is in the 78th​ percentile, that means that​ 78% of the students who took the actuarial exam fall below that score.​ Therefore, the student scored higher than​ 78% of the total number of students.

Why is the standard deviation used more frequently than the​ variance?

The units of variance are squared. Its units are meaningless.

Mean of Grouped Data

Mean of a frequency distribution: multiply each class midpoint x by its class frequency f, add the products, and divide by the number of entries n (the sum of the frequencies): mean = Σ(x · f) / n.
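
A minimal sketch of the grouped mean, assuming the usual formula Σ(x · f) / n with x the class midpoint and f the class frequency. The midpoints and frequencies are invented for illustration.

```python
def grouped_mean(midpoints, frequencies):
    """Estimate the mean of grouped data from class midpoints and frequencies."""
    n = sum(frequencies)  # number of entries
    return sum(x * f for x, f in zip(midpoints, frequencies)) / n


# Hypothetical classes 0-10, 10-20, 20-30 with midpoints 5, 15, 25.
midpoints = [5, 15, 25]
frequencies = [2, 5, 3]
print(grouped_mean(midpoints, frequencies))
```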

Interquartile Range

The difference between the third and first quartiles: IQR = Q3 − Q1.

Percentiles and Other Fractiles

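Quartiles divide a data set into 4 equal parts, deciles into 10 equal parts, and percentiles into 100 equal parts.

Finding the mean of a frequency distribution, in words and in symbols (equations)

In words: 1. Find the midpoint of each class. 2. Multiply each midpoint by its class frequency and add the products. 3. Find the sum of the frequencies. 4. Divide the sum of the products by the sum of the frequencies. In symbols: x = (lower limit + upper limit) / 2; Σ(x · f); n = Σf; mean = Σ(x · f) / n. (See slide 28 of Section 2.3.)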

Both data sets have a mean of 185. One has a standard deviation of​ 16, and the other has a standard deviation of 24.

(a) has a standard deviation of 24 and (b) has a standard deviation of 16, because the data in (a) have more variability. The entries in (a) were scattered, while those in (b) were clustered around the mean in a bell shape. (Question 4 of Section 2.4; the tables are saved as a screenshot.)

Constructing an Ogive

1. Construct a frequency distribution that includes cumulative frequencies as one of the columns.
2. Specify the horizontal and vertical scales. The horizontal scale consists of the upper class boundaries. The vertical scale measures cumulative frequencies.
3. Plot points that represent the upper class boundaries and their corresponding cumulative frequencies.
4. Connect the points in order from left to right.
5. The graph should start at the lower boundary of the first class (cumulative frequency is zero) and should end at the upper boundary of the last class (cumulative frequency is equal to the sample size).
Note: because the graph ends where the cumulative frequency equals the sample size, the last point of an ogive is an easy way to read off the sample size.

Constructing a Frequency Distribution

1. Decide on the number of classes. Usually between 5 and 20; otherwise, it may be difficult to detect any patterns.
2. Find the class width. Determine the range of the data, divide the range by the number of classes, and round up to the next convenient number.
3. Find the class limits. You can use the minimum data entry as the lower limit of the first class. Find the remaining lower limits (add the class width to the lower limit of the preceding class). Find the upper limit of the first class. Remember that classes cannot overlap. Find the remaining upper class limits.
4. Make a tally mark for each data entry in the row of the appropriate class.
5. Count the tally marks to find the total frequency f for each class.
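
The procedure can be sketched for whole-number data as below. Two assumptions are made for the sketch: "round up to the next convenient number" is taken to mean the next whole number above range / classes, and the data set is a made-up example rather than the one from the slides.

```python
def frequency_distribution(data, num_classes):
    """Return (lower limit, upper limit, frequency) for each class."""
    lowest, highest = min(data), max(data)
    # Step 2: class width is the next whole number above range / classes.
    width = (highest - lowest) // num_classes + 1
    table = []
    for i in range(num_classes):
        lower = lowest + i * width      # step 3: lower class limit
        upper = lower + width - 1       # classes cannot overlap
        # Steps 4-5: count the entries that fall in this class.
        frequency = sum(1 for x in data if lower <= x <= upper)
        table.append((lower, upper, frequency))
    return table


# Hypothetical data set of 10 entries split into 4 classes.
data = [12, 15, 21, 22, 27, 30, 34, 35, 41, 48]
table = frequency_distribution(data, 4)
for lower, upper, frequency in table:
    print(f"{lower}-{upper}: {frequency}")
```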

Drawing a Box-and-Whisker Plot

1. Find the five-number summary of the data set.
2. Construct a horizontal scale that spans the range of the data.
3. Plot the five numbers above the horizontal scale.
4. Draw a box above the horizontal scale from Q1 to Q3 and draw a vertical line in the box at Q2.
5. Draw whiskers from the box to the minimum and maximum entries.
(Slide 11 of Section 2.5.)

Graphing QUALITATIVE Data Sets

1. Pie Chart: A circle is divided into sectors that represent categories. The area of each sector is proportional to the frequency of each category. To find the central angle of a sector, multiply 360 degrees by the category's relative frequency. For example, the central angle for cars is 360(0.24) ≈ 86º.
2. Pareto Chart: A vertical bar graph in which the height of each bar represents frequency or relative frequency. The bars are positioned in order of decreasing height, with the tallest bar positioned at the left.

Graphing Paired Data Sets

1. Scatter Plot: The ordered pairs are graphed as points in a coordinate plane. Used to show the relationship between two quantitative variables.
2. Time Series: The data set is composed of quantitative entries taken at regular intervals over a period of time, e.g., the amount of precipitation measured each day for one month. Use a time series chart to graph it, which is an ordinary line chart with time on the horizontal axis. (Slide 25 of Section 2.2.)

Example: Constructing a Frequency Distribution. The following sample data set lists the prices (in dollars) of 30 portable global positioning system (GPS) navigators. Construct a frequency distribution that has seven classes. (Slide 10 of PowerPoint 2.1.)

1. The number of classes is given (seven).
2. Find the class width (see the card "How to find class width").
3. Use the minimum value as the lower limit of the first class, then add the class width to get the lower limit of each following class.
4. Tally each data entry in the appropriate class.
5. Count the tallies to find the frequency of each class.

Frequency Distribution

A table that shows classes or intervals of data with a count of the number of entries in each class. The frequency, f, of a class is the number of data entries in the class. Benefit: graphing a frequency distribution makes it easier to see where the observations are concentrated, so patterns are easier to identify.

Measures of Central Tendency

A value that represents a typical, or central, entry of a data set. Most common measures of central tendency: mean (average), median, and mode.

A certain brand of automobile tire has a mean life span of 39,000 miles and a standard deviation of 2,200 miles. (Assume the life spans of the tires have a bell-shaped distribution.) (a) The life spans of three randomly selected tires are 33,000 miles, 38,000 miles, and 31,000 miles. Find the z-score that corresponds to each life span. (b) The life spans of three randomly selected tires are 34,600 miles, 43,400 miles, and 39,000 miles. Using the empirical rule, find the percentile that corresponds to each life span.

(a) 33,000 miles: z ≈ −2.73. 38,000 miles: z ≈ −0.45. 31,000 miles: z ≈ −3.64. (b) By the empirical rule, 95% of the data lie within two standard deviations of the mean. 34,600 miles has z = −2, so it is at about the 50 − 95/2 = 2.5th percentile. 43,400 miles has z = +2, so it is at about the 50 + 95/2 = 97.5th percentile. 39,000 miles is the mean (z = 0), so it is at the 50th percentile. In short: z = 0 gives 50%, z = +2 gives 97.5%, and z = −2 gives 2.5%.
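
Both parts can be reproduced with a short sketch. It assumes z = (value − mean) / standard deviation and, for part (b), handles only whole-number z-scores from −3 to 3, since those are the only ones the empirical rule covers. The helper names are illustrative.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies from the mean."""
    return (value - mean) / std_dev


# Percent of bell-shaped data within k standard deviations (empirical rule).
WITHIN = {0: 0.0, 1: 68.0, 2: 95.0, 3: 99.7}


def empirical_percentile(z):
    """Approximate percentile for a whole-number z-score between -3 and 3."""
    k = abs(z)
    if k not in WITHIN:
        raise ValueError("empirical rule covers only z = 0, ±1, ±2, ±3")
    half = WITHIN[k] / 2
    return 50 + half if z > 0 else 50 - half


MEAN_LIFE, SD_LIFE = 39_000, 2_200
part_a = [round(z_score(x, MEAN_LIFE, SD_LIFE), 2) for x in (33_000, 38_000, 31_000)]
part_b = [empirical_percentile(z_score(x, MEAN_LIFE, SD_LIFE)) for x in (34_600, 43_400, 39_000)]
print(part_a)
print(part_b)
```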

Class Boundaries

The numbers that separate classes without forming gaps between them. Example: if the first class is 59-114 and the second class starts at 115, the distance from the upper limit of the first class to the lower limit of the second class is 115 − 114 = 1. Half this distance is 0.5, so the class boundaries of the first class are 58.5 and 114.5.

Coefficient of Variation

Describes the standard deviation of a data set as a percent of the mean: CV = (standard deviation / mean) × 100%.
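
A small sketch, assuming CV = (standard deviation / mean) × 100%. The height and weight figures are made up; they illustrate how the coefficient of variation compares spread across data sets with different units.

```python
def coefficient_of_variation(std_dev, mean):
    """Standard deviation expressed as a percent of the mean."""
    return std_dev / mean * 100


# Hypothetical team: heights in inches and weights in pounds.
cv_heights = coefficient_of_variation(3.5, 70)
cv_weights = coefficient_of_variation(18, 180)
print(round(cv_heights, 1), round(cv_weights, 1))
# The weights vary more relative to their mean than the heights do.
```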

Determining the Cumulative Frequency. Definition: the cumulative frequency of a class is the sum of the frequency for that class and all previous classes.

Example: if class 1 has a frequency of 5, its cumulative frequency is 5. If class 2 then has a frequency of 8, its cumulative frequency is 5 + 8 = 13.
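
The running total can be sketched with `itertools.accumulate` from the Python standard library. The first two frequencies come from the example; the third is an assumed value added for illustration.

```python
from itertools import accumulate

# Class frequencies: 5 and 8 are from the example, 6 is hypothetical.
frequencies = [5, 8, 6]

# accumulate yields the running sums: each class plus all previous classes.
cumulative = list(accumulate(frequencies))
print(cumulative)  # [5, 13, 19]
```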

Box-and-Whisker Plot

Exploratory data analysis tool. Highlights important features of a data set. Requires the five-number summary:
1. Minimum entry
2. First quartile Q1
3. Median Q2
4. Third quartile Q3
5. Maximum entry
(See the card "Drawing a Box-and-Whisker Plot" for the steps.)

Quartiles

Fractiles are numbers that partition (divide) an ordered data set into equal parts. Quartiles approximately divide an ordered data set into four equal parts.
First quartile, Q1: About one quarter of the data fall on or below Q1.
Second quartile, Q2: About one half of the data fall on or below Q2 (the median).
Third quartile, Q3: About three quarters of the data fall on or below Q3.
How to find them: Q2 is the median of the whole data set; Q1 and Q3 are the medians of the lower and upper halves that remain on either side of Q2. (Slide 6 of Section 2.5 has an example.)
What they mean: about 1/4 of the sample is at or below Q1, about 1/2 is at or below Q2, and about 3/4 is at or below Q3.

Interpreting Standard Deviation: Empirical Rule (68-95-99.7 Rule)

For data with a (symmetric) bell-shaped distribution, the standard deviation has the following characteristics: About 68% of the data lie within one standard deviation of the mean. About 95% of the data lie within two standard deviations of the mean. About 99.7% of the data lie within three standard deviations of the mean. (Picture on slide 27 of Section 2.4.)

Graphs of Frequency Distributions: Frequency Histogram

Frequency Histogram (data values on the x-axis, frequency on the y-axis): A bar graph that represents the frequency distribution. The horizontal scale is quantitative and measures the data values. The vertical scale measures the frequencies of the classes. Consecutive bars must touch.

Chebychev's Theorem

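For any data set, the portion of the data lying within k standard deviations of the mean (k > 1) is at least 1 − 1/k². For k = 2, at least 75% of the data lie within two standard deviations; for k = 3, at least about 88.9% lie within three. Unlike the empirical rule, this holds for any distribution shape. (Slide 30.)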
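
A minimal sketch of the bound from Chebychev's theorem, which states that at least 1 − 1/k² of any data set lies within k standard deviations of the mean, for k > 1. The function name is illustrative.

```python
def chebychev_lower_bound(k):
    """Minimum fraction of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebychev's theorem requires k > 1")
    return 1 - 1 / k**2


for k in (2, 3):
    print(k, round(chebychev_lower_bound(k) * 100, 1))
```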

IMPORTANT QUESTIONS After constructing a relative frequency distribution summarizing IQ scores of college​ students, what should be the sum of the relative​ frequencies?

If percentages are​ used, the sum should be​ 100%. If proportions are​ used, the sum should be 1.

Use the​ box-and-whisker plot to determine if the shape of the distribution represented is​ symmetric, skewed​ left, skewed​ right, or none of these.

If the median is near the center of the box and the whiskers are of approximately equal length, the distribution is roughly symmetric. If the median is to the left of the center of the box and the right whisker is substantially longer than the left whisker, the distribution is skewed right. If the median is to the right of the center of the box and the left whisker is substantially longer than the right whisker, the distribution is skewed left. If none of the previous conditions exists, the distribution does not have a defined name. (Example saved as a screenshot named Question 9 of 2.5.)

Determining the Midpoint of a Class

Midpoint = (lower class limit + upper class limit) / 2. See slide 14 of PowerPoint 2.1; this is trickier than it looks. An upper limit is the greatest number that can belong to the class. The upper limit of the first class is one less than the lower limit of the second class. Find the upper limit of the first class; then, remembering that classes cannot overlap, find the remaining upper class limits.

Measure of Central Tendency: Mean

Mean (average): The sum of all the data entries divided by the number of entries.
1. Population mean: the sum of all the data entries in the population divided by the population size.
2. Sample mean: the sum of all the data entries in the sample divided by the sample size.
The mean is the measure of central tendency most likely to be affected by an extreme value (outlier), because the outlier affects the sum of the data values.

Measure of Central Tendency: Median

Median: The value that lies in the middle of the data when the data set is ordered. Measures the center of an ordered data set by dividing it into two equal parts. If the data set has an odd number of entries, the median is the middle data entry; if it has an even number of entries, the median is the mean of the two middle data entries. The median is the best measure when the data are skewed.

Measure of Central Tendency: Mode

Mode: The data entry that occurs with the greatest frequency. If no entry is repeated, the data set has no mode. If two entries occur with the same greatest frequency, each entry is a mode (bimodal). The mode is the best measure when the data are at the nominal level of measurement.

(a) Find the five-number summary, and (b) draw a box-and-whisker plot of the data. 4 8 8 6 2 9 8 7 9 6 9 5 1 6 2 9 8 7 7 9

Ordered data (20 entries): 1 2 2 4 5 6 6 6 7 7 7 8 8 8 8 9 9 9 9 9. Min = 1, Q1 = 5.5, Q2 = 7, Q3 = 8.5, Max = 9. (The plot is drawn on the first page of the notebook.)
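
The summary can be checked with a short sketch. It assumes the median-of-halves method for quartiles (Q1 and Q3 are the medians of the lower and upper halves, excluding the middle entry when the count is odd); other software may use a different quartile convention and give slightly different values.

```python
from statistics import median


def five_number_summary(data):
    """Return (min, Q1, Q2, Q3, max) using the median-of-halves method."""
    ordered = sorted(data)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]
    # Skip the middle entry when the number of entries is odd.
    upper_half = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])


data = [4, 8, 8, 6, 2, 9, 8, 7, 9, 6, 9, 5, 1, 6, 2, 9, 8, 7, 7, 9]
summary = five_number_summary(data)
print(summary)
```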

The Standard Score (z score)

Standard score (z-score): Represents the number of standard deviations a given value x falls from the mean µ: z = (value − mean) / standard deviation. Example (slide 16 of Section 2.5): In 2009, Heath Ledger won the Oscar for Best Supporting Actor at age 29 for his role in the movie The Dark Knight. Penelope Cruz won the Oscar for Best Supporting Actress at age 34 for her role in Vicky Cristina Barcelona. The mean age of all Best Supporting Actor winners is 49.5, with a standard deviation of 13.8. The mean age of all Best Supporting Actress winners is 39.9, with a standard deviation of 14.0. Find the z-score that corresponds to the ages of Ledger and Cruz. Then compare your results. Ledger: z = (29 − 49.5) / 13.8 ≈ −1.49. Cruz: z = (34 − 39.9) / 14.0 ≈ −0.42. Both ages are below the mean for their category, and both z-scores are between −2 and 2, so neither is unusual. A z-score is considered unusual when it falls outside the range from −2 to 2.

Interpreting Standard Deviation

Standard deviation is a measure of the typical amount an entry deviates from the mean. The more the entries are spread out, the greater the standard deviation.

Determine whether the statement is true or false. If it is​ false, rewrite it as a true statement. It is impossible to have a​ z-score of 0.

The statement is false. A​ z-score of 0 is a standardized value that is equal to the mean.

Determine whether the statement is true or false. If it is false, rewrite it as a true statement. The 50th percentile is equivalent to Q1.

The statement is false. The 50th percentile is equivalent to Q2.

The goals scored per game by a soccer team represent the first quartile for all teams in a league. What can you conclude about the​ team's goals scored per​ game?

The team scored fewer goals per game than​ 75% of the teams in the league. About one quarter of the data will fall below the first​ quartile, and about three quarters will fall above the first quartile. ​ Therefore, the team scored fewer goals per game than​ 75% of the teams in the league.

Standard Deviation for Grouped Data

When a frequency distribution has classes, estimate the sample mean and standard deviation by using the midpoint of each class.
1. Build the frequency distribution and find the midpoint x of each class.
2. Find the mean of the frequency distribution: mean = Σ(x · f) / n.
3. Determine the sum of squares: Σ(x − mean)² · f.
4. Find the sample standard deviation: s = √( Σ(x − mean)² · f / (n − 1) ).

Given a data set, how do you know whether to calculate σ or s?

When given a data set, one would have to determine if it represented the population or if it was a sample taken from the population. If the data are a population, then σ is calculated. If the data are a sample, then s is calculated. In short: population means σ, sample means s.
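
A rough sketch of the grouped-data estimate, assuming the sample formula s = √( Σ(x − mean)² · f / (n − 1) ) with x the class midpoint and f the class frequency. The midpoints and frequencies are invented for illustration.

```python
import math


def grouped_sample_std(midpoints, frequencies):
    """Estimate the sample standard deviation from class midpoints and frequencies."""
    n = sum(frequencies)
    mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n
    sum_of_squares = sum((x - mean) ** 2 * f
                         for x, f in zip(midpoints, frequencies))
    return math.sqrt(sum_of_squares / (n - 1))


# Hypothetical classes with midpoints 5, 15, 25 and frequencies 2, 5, 3.
s = grouped_sample_std([5, 15, 25], [2, 5, 3])
print(round(s, 2))
```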

How to find class width

Class width = (maximum − minimum) / number of classes, then round up. In a frequency distribution, the class width is the distance between the lower or upper limits of consecutive classes.

Determining the Relative Frequency. Definition: the relative frequency of a class is the portion or percentage of the data that falls in that class.

Relative frequency = class frequency / sample size. Example: if the class frequency is 5 and the sample in the original problem has 30 entries, the relative frequency is 5 / 30 ≈ 0.17.
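
A minimal sketch of the calculation. The first class frequency (5) and the sample size (30) come from the example; the remaining class frequencies are assumed values chosen to total 30.

```python
# Class frequencies: the first is from the example, the rest are hypothetical.
frequencies = [5, 8, 6, 8, 3]
sample_size = sum(frequencies)  # 30

# Relative frequency of each class = class frequency / sample size.
relative = [f / sample_size for f in frequencies]
print([round(r, 3) for r in relative])

# As proportions, the relative frequencies should add up to 1.
print(round(sum(relative), 10))
```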

