MATH-11-SP18: Module 1 Notes

Ordinal

Data measured on an ordinal scale is similar to nominal scale data, with one important difference: ordinal scale data can be ordered.

Interval

Data measured on the interval scale is similar to ordinal level data in that it has a definite ordering, but in addition the differences between values can be measured. Interval scale data does not have a true zero starting point.

Ratio

Data measured on the ratio scale gives you the most information. Ratio scale data is like interval scale data, but it has a true 0 point, so ratios can be calculated.

5 number summary

For a set of data, the minimum, first quartile, median, third quartile, and maximum. A boxplot is a visual display of the five-number summary. *Min, Q1, Median, Q3, Max
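
The five values can be computed directly. This is a minimal sketch that assumes the "median of each half" rule for quartiles (textbooks and software differ on the exact rule); the function name and the data are made up for illustration.

```python
import statistics

def five_number_summary(data):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    values = sorted(data)
    n = len(values)
    half = n // 2
    lower = values[:half]           # values below the median position
    upper = values[half + n % 2:]   # values above the median position
    return (
        values[0],
        statistics.median(lower),
        statistics.median(values),
        statistics.median(upper),
        values[-1],
    )

scores = [1, 3, 4, 6, 7, 8, 10, 12, 15]  # made-up data
print(five_number_summary(scores))  # (1, 3.5, 7, 11.0, 15)
```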

Population

all individuals, objects, or measurements whose properties are being studied *The complete collection of elements to be studied

Non-sampling errors

an issue that affects the reliability of sampling data other than natural variation; it includes a variety of human errors including poor study design, biased sampling methods, inaccurate information provided by study participants, data entry errors, and poor analysis. *Don't want - mistakes in data collection (faulty scale, survey, incorrect answers)

Line graph

a graph that is useful for displaying specific data values. The frequency points are connected using line segments.

Sampling bias

not all members of the population are equally likely to be selected

Sampling errors

the natural variation that results from selecting a sample to represent a larger population; this variation decreases as the sample size increases, so selecting larger samples reduces sampling error. *We expect statistics (sample) to differ from parameters (population) - that difference is called sampling error.

Proportion

the number of successes divided by the total number in the sample

Frequency

the number of times a value of the data occurs

Quartiles

the numbers that separate the data into quarters; quartiles may or may not be part of the data. The second quartile is the median of the data.

Relative Frequency

the ratio of the number of times a value of the data occurs in the set of all outcomes to the total number of outcomes
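
As a quick illustration, frequency and relative frequency can be tallied like this. The data and the function name are made up; this is a sketch, not a prescribed method.

```python
from collections import Counter

def relative_frequencies(data):
    """Map each value to (times it occurs) / (total number of outcomes)."""
    counts = Counter(data)   # frequency of each value
    total = len(data)        # total number of outcomes
    return {value: count / total for value, count in counts.items()}

# Made-up categorical data: 5 red, 3 blue, 2 green
colors = ["red", "blue", "red", "green", "red",
          "blue", "red", "green", "blue", "red"]
print(relative_frequencies(colors))  # {'red': 0.5, 'blue': 0.3, 'green': 0.2}
```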

Standard error

the standard deviation of the distribution of the sample means.
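
In practice the standard error of the mean is usually estimated from one sample as s / sqrt(n), where s is the sample standard deviation and n is the sample size. A small sketch under that assumption (the function name and data are illustrative only):

```python
import math
import statistics

def standard_error(sample):
    """Estimate the standard error of the mean as s / sqrt(n)."""
    s = statistics.stdev(sample)   # sample standard deviation (divides by n - 1)
    return s / math.sqrt(len(sample))

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up data
print(round(standard_error(sample), 3))  # 0.756
```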

Mode

the value that appears most frequently in a set of data *the value that occurs most often *can be used with categorical (qualitative) data

Skewed

used to describe data that is not symmetrical; when the right side of a graph looks "chopped off" compared to the left side, we say it is "skewed to the left." When the left side of the graph looks "chopped off" compared to the right side, we say the data is "skewed to the right." Alternatively: when the lower values of the data are more spread out, we say the data are skewed to the left. When the greater values are more spread out, the data are skewed to the right. *Skewed - scores are piled up on one side & spread out on the other. *Skewed positive (right) - tail is on the right *Skewed negative (left) - tail is on the left

Numerical (or quantitative) variables

variables that take on values that are indicated by numbers

Categorical variables

variables that take on values that are names or labels

Outlier

An outlier is an observation of data that does not fit the rest of the data. It is sometimes called an extreme value. When you graph an outlier, it will appear not to fit the pattern of the graph. Some outliers are due to mistakes (for example, writing down 50 instead of 500) while others may indicate that something unusual is happening.

Continuous

Data is continuous if it is the result of measuring (such as distance traveled or weight of luggage). *Infinite - not countable (e.g., foot length measured in cm)

Discrete

Data is discrete if it is the result of counting (such as the number of students of a given ethnic group in a class or the number of books on a shelf). *Finite - countable (shoe size)

Nominal

Data that is measured using a nominal scale is qualitative (categorical). Categories, colors, names, labels and favorite foods along with yes or no responses are examples of nominal level data. Nominal scale data are not ordered.

Research questions

*50% of marriages end in divorce *Cats are 33% more likely to get cancer in a smoking household *75% of people who shout out statistics are wrong *1 in 3 children are overweight *Men are more likely to be struck by lightning *Attending pre-school increases chance of college graduation *Self-driving v. people (bullies)

Random sample

*Any individual is as likely as any other individual (to be selected)

Factors which influence sample size

*Population size (N) *Resources *The amount of error tolerated *The amount of variation in the population

Skewed left

*Skewed negative (left) - tail is on the left

Skewed right

*Skewed positive (right) - tail is on the right

Shapes of frequency distributions (histograms)

*Unimodal - freq. distribution where one value occurs more often. *Bimodal - two values with approx. equal larger freq. *Multimodal - 2 or more values with high freq. *Uniform - all values have approx. the same freq. *Symmetrical - equal on both sides *Skewed - scores are piled up on one side & spread out on the other. *Skewed positive (right) - tail is on the right *Skewed negative (left) - tail is on the left

Measures of Center

*population mean (parameter) *sample mean (statistic)

Rules for a Frequency Distribution Table for a Quantitative Variable

1. Use the following rule of thumb for determining the number of class intervals: (however do not use more than 10 class intervals!) 2. Size of each interval is approximately: (max - min)/(# of classes), but should be a simple integer. 3. Bottom of the interval should be multiples of the interval width. 4. All intervals should be the same width
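
Rules 2 through 4 can be sketched in code. The number of classes is passed in as a parameter because the rule of thumb itself is not given here; rounding the width up to the next integer is an assumption, and the function name and data are hypothetical.

```python
import math

def class_intervals(data, num_classes):
    """Build equal-width classes [lower, upper) with their frequencies."""
    # Rule 2: width is about (max - min) / (# of classes), rounded up to an integer
    width = max(1, math.ceil((max(data) - min(data)) / num_classes))
    # Rule 3: the bottom of the first interval is a multiple of the width
    lower = (min(data) // width) * width
    table = []
    while lower <= max(data):
        upper = lower + width   # Rule 4: every interval has the same width
        count = sum(1 for x in data if lower <= x < upper)
        table.append((lower, upper, count))
        lower = upper
    return table

values = [12, 15, 21, 22, 27, 30, 34, 38, 41, 49]  # made-up data
for low, high, freq in class_intervals(values, 4):
    print(f"{low}-{high}: {freq}")
```

Because the first interval is aligned to a multiple of the width, the table can occasionally end up with one more class than requested.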

Range

=Max - Min

Histogram

A histogram is a graphic version of a frequency distribution. The graph consists of bars of equal width drawn adjacent to each other. The horizontal scale represents classes of quantitative data values and the vertical scale represents frequencies. The heights of the bars correspond to frequency values. Histograms are typically used for large, continuous, quantitative data sets.

Parameter

A parameter is a numerical characteristic of the whole population that can be estimated by a statistic. *A number that describes a population *Get a parameter by taking a census

Bar graph

Bar graphs consist of bars that are separated from each other. The bars can be rectangles or they can be rectangular boxes (used in three-dimensional plots), and they can be vertical or horizontal.

Inferential statistics

Formal methods for drawing conclusions from "good" data. The formal methods are called inferential statistics. Statistical inference uses probability to determine how confident we can be that our conclusions are correct. *Techniques which allow us to study samples & make generalizations about the populations from which they came

IQR

Interquartile Range or IQR, is the range of the middle 50 percent of the data values; the IQR is found by subtracting the first quartile from the third quartile. *IQR = Q3 - Q1

Normal curve

Relative frequency histograms that are symmetric and bell-shaped are said to have the shape of a normal curve

Descriptive statistics

Organizing and summarizing data is called descriptive statistics. Two ways to summarize data are by graphing and by using numbers (for example, finding an average). *Statistical procedures used to summarize, organize & simplify data.

Qualitative data

Qualitative data are the result of categorizing or describing attributes of a population. Qualitative data are also often called categorical data. Hair color, blood type, ethnic group, the car a person drives, and the street a person lives on are examples of qualitative data. *Nominal-eye color/major/name/category that cannot be ordered *Ordinal (rank-order)- movie ratings/military rank/categories that can be ordered

Quantitative data

Quantitative data are always numbers. Quantitative data are the result of counting or measuring attributes of a population. Amount of money, pulse rate, weight, number of people living in your town, and number of students who take statistics are examples of quantitative data. *Interval-year/temp/time of day/can be compared by looking at differences (no inherent zero) *Ratio-weight/age/# of children/counts or measurements where 2 numbers can be compared using ratios

Symmetric distribution

Symmetrical distribution is a situation in which the values of variables occur at regular frequencies, and the mean, median and mode occur at the same point. *Symmetrical - equal on both sides

The law of large numbers

The law of large numbers is a principle of probability according to which the frequencies of events with the same likelihood of occurrence even out, given enough trials or instances. As the number of experiments increases, the actual ratio of outcomes will converge on the theoretical, or expected, ratio of outcomes.
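
A small simulation can make this concrete. The sketch below assumes a fair coin (theoretical proportion of heads 0.5) and records the observed proportion at a few checkpoints; the function name, checkpoints, and seed are arbitrary choices for illustration.

```python
import random

def running_proportion_of_heads(num_flips, seed=0):
    """Flip a simulated fair coin and record the proportion of heads so far."""
    rng = random.Random(seed)   # seeded so the run is repeatable
    heads = 0
    checkpoints = {}
    for flip in range(1, num_flips + 1):
        heads += rng.random() < 0.5   # True counts as 1
        if flip in (10, 100, 1_000, 10_000, 100_000):
            checkpoints[flip] = heads / flip
    return checkpoints

for flips, proportion in running_proportion_of_heads(100_000).items():
    print(f"{flips:>7} flips: proportion of heads = {proportion:.4f}")
```

The early proportions can be far from 0.5, while the later ones should settle close to it.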

Median

The median is a number that measures the "center" of the data. You can think of the median as the "middle value," but it does not actually have to be one of the observed values. It is a number that separates ordered data into halves. Half the values are the same number or smaller than the median, and half the values are the same number or larger. *the value that lies in the middle of data arranged in order.
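
A short sketch of the usual procedure (sort, then take the middle value or the average of the two middle values) shows why the median need not be an observed value. The data are made up.

```python
def median(data):
    """Return the value that separates the ordered data into halves."""
    values = sorted(data)
    n = len(values)
    mid = n // 2
    if n % 2 == 1:
        return values[mid]                       # odd count: the middle value
    return (values[mid - 1] + values[mid]) / 2   # even count: average the two middle values

print(median([3, 1, 7]))      # 3
print(median([1, 3, 7, 10]))  # 5.0, which is not one of the observed values
```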

Sample variance

The sample variance is equal to the sum of the squares of the deviations divided by the difference of the sample size and one.
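
Written out step by step, with made-up data and a hypothetical function name:

```python
def sample_variance(sample):
    """Sum of squared deviations from the mean, divided by (n - 1)."""
    n = len(sample)
    mean = sum(sample) / n
    squared_deviations = [(x - mean) ** 2 for x in sample]
    return sum(squared_deviations) / (n - 1)

sample = [2, 4, 4, 4, 5, 5, 7, 9]  # mean is 5, squared deviations sum to 32
print(sample_variance(sample))     # 32 / 7, about 4.571
```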

Population variance

The symbol σ² represents the population variance; the population standard deviation σ is the square root of the population variance.
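
For comparison with the sample version, the population variance divides by the population size N rather than by n - 1. A minimal sketch that treats a made-up list as the whole population:

```python
import math

def population_variance(population):
    """Sum of squared deviations from the mean, divided by N."""
    n = len(population)
    mean = sum(population) / n
    return sum((x - mean) ** 2 for x in population) / n

population = [2, 4, 4, 4, 5, 5, 7, 9]   # treated as the entire population
variance = population_variance(population)
print(variance)             # 4.0
print(math.sqrt(variance))  # 2.0, the population standard deviation
```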

Variability in samples

The term "sampling variability" refers to the fact that the statistical information from a sample (called a statistic) will vary as the random sampling is repeated.

Cumulative Relative Frequency

The term applies to an ordered set of observations from smallest to largest. The cumulative relative frequency is the sum of the relative frequencies for all values that are less than or equal to the given value. The Cumulative Relative Frequency is defined by the frequency at or below that value divided by the sample size. For example, in a sample of 50 students, to find the cumulative relative frequency for 2 courses, first add up the number of students who are taking 2 or fewer courses, then divide by the sample size, 50.
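
The calculation can be sketched as a running total. The frequency table below (number of courses taken by 50 students) is invented for illustration, as is the function name.

```python
def cumulative_relative_frequencies(frequency_table):
    """Map each value to (frequency at or below that value) / (sample size)."""
    total = sum(frequency_table.values())
    running = 0
    result = {}
    for value in sorted(frequency_table):   # smallest to largest
        running += frequency_table[value]
        result[value] = running / total
    return result

# Invented data: courses taken -> number of students (50 students in all)
courses = {1: 5, 2: 15, 3: 20, 4: 8, 5: 2}
print(cumulative_relative_frequencies(courses))
# {1: 0.1, 2: 0.4, 3: 0.8, 4: 0.96, 5: 1.0}
```

Here 20 of the 50 students take 2 or fewer courses, so the cumulative relative frequency for 2 courses is 20/50 = 0.4.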

Levels of Measurement

The way a set of data is measured is called its level of measurement. Correct statistical procedures depend on a researcher being familiar with levels of measurement. Not every statistical operation can be used with every set of data. Data can be classified into four levels of measurement. They are (from lowest to highest level): Nominal scale level Ordinal scale level Interval scale level Ratio scale level *Categorical (Qualitative) -nominal -ordinal (rank-order) *Quantitative -interval -ratio

Mean

The words "mean" and "average" are often used interchangeably. The substitution of one word for the other is common practice. The technical term is "arithmetic mean," and "average" is technically a center location. However, in practice among non-statisticians, "average" is commonly accepted for "arithmetic mean."

Raw Scores & Z-Scores

The z-score represents the number of standard deviations away from the mean to the value (x).
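
In formula form, z = (x - mean) / (standard deviation). A tiny sketch with made-up numbers:

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies from the mean."""
    return (x - mean) / std_dev

# Made-up example: raw score 85, mean 70, standard deviation 10
print(z_score(85, 70, 10))  # 1.5 (1.5 standard deviations above the mean)
print(z_score(60, 70, 10))  # -1.0 (1 standard deviation below the mean)
```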

Variables

a characteristic of interest for each person or object in a population *Variable - characteristic or condition that changes or has different values for individuals *Values - examples: numerical (16-100) or categorical (child, adult, senior) *Individual score - 39 or adult

Relative frequency table

a data representation in which grouped data is displayed along with the corresponding frequencies

Box plots

a graph that gives a quick picture of the middle 50% of the data

Cluster sample

a method for selecting a random sample; divide the population into groups (clusters), then use simple random sampling to select a set of clusters. Every individual in the chosen clusters is included in the sample. *Divide into sectors & randomly choose some sectors <--- All members (classes)
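
A minimal sketch of the procedure, assuming the clusters are already given as a dictionary of group name to members (the names and data are hypothetical):

```python
import random

def cluster_sample(clusters, num_clusters, seed=None):
    """Randomly choose whole clusters and include every member of each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)   # simple random sample of cluster names
    return [member for name in chosen for member in clusters[name]]

# Hypothetical classes, each with three students
classes = {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9], "D": [10, 11, 12]}
print(cluster_sample(classes, 2, seed=0))
```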

Stratified sample

a method for selecting a random sample used to ensure that subgroups of the population are represented adequately; divide the population into groups (strata). Use simple random sampling to identify a proportionate number of individuals from each stratum. *Ethnicities/Majors/etc. (groups - have quotas to have proportionate representation)
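
A sketch of proportionate allocation, assuming the strata are given as a dictionary of group name to members. Rounding each quota is a simplification, so the quotas may not always add up exactly to the requested sample size; all names and data are hypothetical.

```python
import random

def stratified_sample(strata, sample_size, seed=None):
    """Draw a simple random sample from each stratum, in proportion to its size."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        quota = round(sample_size * len(members) / total)   # proportionate share
        sample[name] = rng.sample(members, quota)
    return sample

# Hypothetical population of 100 students split by class rank
strata = {
    "freshman": [f"F{i}" for i in range(60)],
    "sophomore": [f"S{i}" for i in range(30)],
    "junior": [f"J{i}" for i in range(10)],
}
picked = stratified_sample(strata, 10, seed=1)
print({name: len(members) for name, members in picked.items()})
# {'freshman': 6, 'sophomore': 3, 'junior': 1}
```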

Systematic sample

a method for selecting a random sample; list the members of the population. Use simple random sampling to select a starting point in the population. Let k = (number of individuals in the population)/(number of individuals needed in the sample). Choose every kth individual in the list starting with the one that was randomly selected. If necessary, return to the beginning of the population list to complete your sample. *Every 10th person (every 10th person is selected)
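
The steps above can be sketched as follows. The sketch assumes the population size is at least the sample size and uses integer division for k; the function name and data are illustrative.

```python
import random

def systematic_sample(population, sample_size, seed=None):
    """Pick a random start, then take every kth individual, wrapping around."""
    rng = random.Random(seed)
    k = len(population) // sample_size        # k = population size / sample size
    start = rng.randrange(len(population))    # random starting point in the list
    return [population[(start + i * k) % len(population)]
            for i in range(sample_size)]

people = list(range(100))   # hypothetical list of 100 individuals, labeled 0 to 99
print(systematic_sample(people, 10, seed=1))   # every 10th person from a random start
```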

Convenience sample

a nonrandom method of selecting a sample; this method selects individuals that are easily accessible and may result in biased data. *Easiest (front row)

Percentiles

a number that divides ordered data into hundredths; percentiles may or may not be part of the data. The median of the data is the second quartile and the 50th percentile. The first and third quartiles are the 25th and the 75th percentiles, respectively.
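
The relationship between percentiles, quartiles, and the median can be checked with Python's `statistics.quantiles`. The `method="inclusive"` choice is an assumption; other interpolation rules can give slightly different values. The data are made up.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7, 8, 9]  # made-up ordered data

# 99 cut points that divide the data into hundredths
percentiles = statistics.quantiles(data, n=100, method="inclusive")
# 3 cut points that divide the data into quarters
quartiles = statistics.quantiles(data, n=4, method="inclusive")

print(percentiles[24], percentiles[49], percentiles[74])  # 25th, 50th, 75th: 3.0 5.0 7.0
print(quartiles)                                          # [3.0, 5.0, 7.0]
print(statistics.median(data))                            # 5
```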

Population standard deviation

a number that is equal to the square root of the variance and measures how far data values are from their mean; notation: s for sample standard deviation and σ for population standard deviation.

Sample standard deviation

a number that is equal to the square root of the variance and measures how far data values are from their mean; notation: s for sample standard deviation and σ for population standard deviation.

Statistic

a numerical characteristic of the sample; a statistic estimates the corresponding population parameter. *A number that describes a sample.

Data

a set of observations (a set of possible outcomes); most data can be put into two groups: qualitative (an attribute whose value is indicated by a label) or quantitative (an attribute whose value is indicated by a number). Quantitative data can be separated into two subgroups: discrete and continuous. Data is discrete if it is the result of counting (such as the number of students of a given ethnic group in a class or the number of books on a shelf). Data is continuous if it is the result of measuring (such as distance traveled or weight of luggage)

Simple random sample

a straightforward method for selecting a random sample; give each member of the population a number. Use a random number generator to select a set of labels. These randomly selected labels identify the members of your sample. *Every 'N' individuals is as likely as any other 'N' individuals
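
A minimal sketch of the labeling procedure, using list positions as the numeric labels (the function name and roster are hypothetical):

```python
import random

def simple_random_sample(population, sample_size, seed=None):
    """Number the members, draw random labels, and return the labeled members."""
    rng = random.Random(seed)
    labels = rng.sample(range(len(population)), sample_size)   # labels drawn without repeats
    return [population[label] for label in labels]

roster = ["Ana", "Ben", "Cai", "Dee", "Eli", "Fay", "Gus", "Hal"]  # hypothetical roster
print(simple_random_sample(roster, 3, seed=2))
```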

Sample

a subset of the population studied *A piece of the population (subcollection)

Researcher Elisabeth Kvaavik and others studied factors that affect the eating habits of adults in their mid-thirties. Classify each of the following variables considered in the study as qualitative or quantitative.

a. Nationality QUALITATIVE b. Number of children QUANTITATIVE c. Household income in the previous year QUANTITATIVE d. Level of education QUALITATIVE e. Daily intake of whole grains (measured in grams per day) QUANTITATIVE

Researcher Elisabeth Kvaavik and others studied factors that affect the eating habits of adults in their mid-thirties. Classify each of the following quantitative variables considered in the study as discrete or continuous.

a. Number of children DISCRETE b. Weight (in theory) CONTINUOUS c. Daily intake of whole grains (measured in grams per day) Scale - CONTINUOUS Facts - DISCRETE

Determine whether the variables are qualitative (at the Nominal or Ordinal level) or quantitative (discrete or continuous).

a. Number of snack and soft drink vending machines in the school QUANT - discrete b. Whether or not the school has a closed campus policy during lunch QUAL - Nominal c. Class rank (Freshman, Sophomore, Junior, Senior) QUAL - Ordinal d. Distance to the closest elementary school QUANT - Continuous e. Number of days per week a student eats school lunch QUANT - discrete f. Nationality of a student QUAL - nominal g. Time in line to buy groceries QUANT - continuous h. A student's place on the waiting list (first, second, ...) QUAL - Ordinal

