Agresti Statistics 1a

Shapes of Distributions

1. symmetric 2. asymmetric/ non-symmetric: => skewed right (positive skew) => skewed left (negative skew)

data

- Observations gathered

Parameters

- Parameters are characteristics of populations, i.e. numerical summaries of the population => They are usually not known (in fact, they are often what you want to know) => Examples include the population mean, the population variance, and the population median

Variance

- the average squared deviation of the observations from the mean (= the square of the standard deviation) - Question: How far, on average, are observations from the mean?
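
As a minimal sketch of this definition, the variance can be computed by hand and compared with Python's `statistics` module. The scores are made up for illustration and are not part of the card set; the split between dividing by n (population) and by n - 1 (sample) is the usual convention and an assumption here, since the card does not state a divisor.

```python
from statistics import mean, pstdev, pvariance, variance

scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, chosen so the mean is 5

m = mean(scores)
squared_deviations = [(x - m) ** 2 for x in scores]

# Dividing by n treats the scores as the whole population.
population_variance = sum(squared_deviations) / len(scores)
# Dividing by n - 1 gives the usual sample variance s^2.
sample_variance = sum(squared_deviations) / (len(scores) - 1)

print(population_variance, pvariance(scores))  # hand computation vs. library
print(sample_variance, variance(scores))
print(pstdev(scores))  # standard deviation = square root of the variance
```

With these scores the population variance is 4, so the population standard deviation is 2.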

standard normal distribution

Properties: - mean: 0 - variance: 1 - standard deviation: 1 - median: 0 - mode: 0

Population

- larger set of data from which the sample is drawn => Actual population: inferences apply to this population => Conceptual population: generalization (hypothetical)

descriptive statistics

- no generalization beyond the data at hand

How do you represent the different scales of measurement?

- nominal & ordinal scales (!are qualitative/ categorical!) => Plot; e.g. bar graph, pie chart - interval & ratio scales (!are quantitative!) => Plot; e.g. histogram, stemplot

What to do about outliers?

- remove them - examine whether the outliers signal a problem with your sampling - obtain more data (maybe they are not really outliers)

Sample

- small subset of a larger set of data (=population) - one score in this subset is called a "sample point" - sampling has to be random

range

- the difference between the highest and lowest scores in a distribution - Range = max - min

Lower Quartile (Q1)

- the median of the lower half of the data

Upper Quartile (Q3)

- the median of the upper half of the data

stratified sampling

- random sampling from each subgroup in a population => used if the population consists of a number of distinct "strata" or groups - sizes of the subgroups in the sample = proportional to their sizes in the population - subgroups are often randomly divided into a treatment group and a control group (e.g. taking a test without sleep (treatment) vs. with sleep (control))

How to identify outliers?

- use histograms and boxplots - the 1.5 × IQR rule => an observation that falls more than 1.5 × IQR below the 1st quartile or above the 3rd quartile is a suspected outlier
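
The 1.5 × IQR rule can be sketched in a few lines. The function name and the data are hypothetical. The quartiles come from `statistics.quantiles`, whose default "exclusive" method interpolates at rank P/100 × (N + 1); other quartile conventions can give slightly different fences.

```python
from statistics import quantiles

def suspected_outliers(data):
    """Return the observations outside the 1.5 x IQR fences."""
    q1, _, q3 = quantiles(data, n=4)  # Q1, median, Q3
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    return [x for x in data if x < lower_fence or x > upper_fence]

values = [3, 5, 7, 8, 9, 11, 13, 40]  # hypothetical sample
print(suspected_outliers(values))  # [40]
```

Here Q1 = 5.5 and Q3 = 12.5, so the IQR is 7 and the fences lie at -5 and 23; only 40 falls outside them.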

sampling bias

- when the sample over-represents one kind of group at the expense of others

Match the question with the suitable summary measure. 1. Where is the "center" of the data? 2. Where does the data tend to cluster? 3. How spread out is the data? How different are observations from one another? 4. Is the distribution symmetric?

1. central tendency: median, mean 2. mode 3. spread: range, variance, standard deviation, interquartile range 4. shape: skewness, outliers

Sampling distribution: 3 very important facts in statistics!

1. On average, the sample mean x̅ will not be too far from the population mean μ 2. Statement: 95% of the time, the sample mean is within X units of the population mean => how small X is depends on the sample size n; larger n = smaller X 3. We can use this knowledge to create "confidence intervals" to learn about population parameters
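
A rough sketch of fact 2, assuming the sample mean is approximately normally distributed and the population standard deviation σ is known: in that case X ≈ 1.96 × σ / √n. The value of σ and the function name below are hypothetical.

```python
from math import sqrt

def margin_95(sigma, n):
    """Approximate X such that 95% of sample means fall within X of mu."""
    return 1.96 * sigma / sqrt(n)

sigma = 10.0  # hypothetical population standard deviation
for n in (25, 100, 400):
    print(n, round(margin_95(sigma, n), 2))
```

Quadrupling the sample size halves X: roughly 3.92 for n = 25, 1.96 for n = 100, and 0.98 for n = 400.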

variables

= properties or characteristics that can vary in value among subjects in a sample or population

Histogram

=> A graph of vertical bars representing the frequency distribution of a set of data. - Each bar is a 'bin'. - There is a bin for every range of numbers in the data - All bins have the same width - The height of a bin is the number of observations in the bin - There are no gaps between bins (Note: Difference with bar graph!)
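
The binning behind a histogram can be sketched as follows. The function name, the bin edges, and the data are hypothetical; the choice to put a value that sits exactly on the top edge into the last bin is one common convention, not something the card specifies.

```python
def bin_counts(data, low, width, n_bins):
    """Count observations in n_bins adjacent bins of equal width, starting at low."""
    counts = [0] * n_bins
    for x in data:
        index = int((x - low) // width)
        if x == low + width * n_bins:
            index = n_bins - 1  # a value on the top edge goes into the last bin
        if 0 <= index < n_bins:
            counts[index] += 1
    return counts

scores = [1, 2, 2, 3, 5, 6, 6, 7, 9, 10]  # hypothetical sample
counts = bin_counts(scores, low=0, width=5, n_bins=2)
for i, count in enumerate(counts):
    print(f"bin {i}: " + "#" * count)  # bar height = number of observations
```

With two bins of width 5, the counts are 4 and 6.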

qualitative (categorical) variable

A variable that cannot assume a numerical value but can be classified into two or more nonnumeric categories

Which four scales of measurement are there? Are they qualitative/ categorical OR quantitative?

Nominal => qualitative/ categorical (= no scores) Ordinal => qualitative/ categorical Interval => quantitative (= uses scores/ numerical) Ratio => quantitative

Normal distribution

Properties: - mean: μ - variance: σ² - standard deviation: σ - median: μ - mode: μ => If X is a value that comes from a Normal distribution, we say: X ∼ N (μ,σ)

What are the two kinds of distribution?

discrete and continuous distributions

Data

evidence; information gathered from observations

Discrete data

- "between" numbers are meaningless - Example: How many siblings do you have? (2 and 3 are possible answers, but 2.5 is not!)

Continuous data

- "between" numbers have meaning -Example: How tall are you? (all positive real numbers are meaningful answers)

What is an outlier?

- An outlier is a value that appears to be unusually large or small, given the rest of the data. - Look around the room. Would a 175cm tall person be unusual? A 100cm tall person?

treatment

- Condition in an experiment

Reliability

- Consistent in the sense that a subject will give same response when asked again

Continuous distribution (probability density!)

- Continuous distribution functions are described by probability density

Validity

- Measuring what is intended to be measured and accurately reflecting the concept

discrete distribution (probability!)

- Discrete distribution functions are described by probability

Leptokurtic distribution

- Distribution curve is very tall, thin and peaked compared with a normal curve. => More scores in its tails

Distributions

- Distributions = models of populations => It is possible to observe single samples with absolute precision, but in science we cannot observe whole populations => therefore we create models (= distributions)

independent vs. dependent variable

- Effects of independent variable on dependent variable are measured

Platykurtic distribution

- Flatter and more spread out than a normal curve. => (Memory: 'Plat' sounds like 'flat') => Fewer scores in its tails

Confidence Interval (CI)

A range of values, calculated from the sample observations, that is believed, with a particular probability, to contain the true value of a population parameter. A 95% confidence interval, for example, implies that were the estimation process repeated again and again, then 95% of the calculated intervals would be expected to contain the true parameter value. Note that the stated probability level refers to properties of the interval and not to the parameter itself which is not considered a random variable.
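
The repeated-sampling interpretation can be illustrated with a small simulation. This is only a sketch under stated assumptions: a normal population with known σ, the interval x̅ ± 1.96 × σ / √n, and hypothetical parameter values and function name.

```python
import random
from math import sqrt

def coverage(mu, sigma, n, trials, seed=0):
    """Fraction of simulated 95% intervals that contain the true mean mu."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    margin = 1.96 * sigma / sqrt(n)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        # The interval changes from sample to sample; mu stays fixed.
        if x_bar - margin <= mu <= x_bar + margin:
            hits += 1
    return hits / trials

print(coverage(mu=100, sigma=15, n=25, trials=2000))
```

The printed fraction should land close to 0.95. It describes how often the procedure captures μ, not the probability that μ lies in any single computed interval.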

Ordinal scale

- assigns observations to ordered categories - categorical; e.g. "How good are you at sports?" Choose from very poor, satisfactory, very good; service quality ratings

Nominal scale

- assigns observations to unordered categories - identity / labels; e.g. gender, marital status, car owned

Interval scale

- assigns scores on a scale with equal intervals - e.g. thermometer: the difference between 40°C and 50°C equals the difference between 20°C and 30°C - However, one can't say 10°C is twice as hot as 5°C => which implies that 0°C is not the absolute minimum - e.g. standardized exam score

Ratio scale

- assigns scores on a scale with equal intervals and a true zero point. - e.g. weight, height, age, weekly food spending

What are 3 types of distribution shape?

- bimodal distribution - leptokurtic distribution - platykurtic distribution

Database

- existing archived collection of data

Inferential Statistics

- generalizing from the data to a larger set of cases - based on the idea that sampling is random

standard deviation

the square root of the variance

mean of a sample

x̅ (x bar)

population mean

μ

population variance

σ²

Computing upper & lower quartile - Example:

- Formula: R = P/100 × (N + 1) => R = rank, P = desired percentile, N = number of sample points
1. Order the sample points by size and compute R.
2. Define IR as the integer portion of R (the digits to the left of the decimal point), e.g. R = 2.25 => IR = 2.
3. Define FR as the fractional portion of R (the digits to the right of the decimal point) => FR = 0.25.
4. Find the scores with rank IR and rank IR + 1 in the ordered sample => e.g. the scores at the 2nd and 3rd positions in the ordering.
5. Interpolate: multiply the difference between the two scores by FR and add the result to the lower score => if the scores at ranks 2 and 3 are 5 and 7: 0.25 × (7 - 5) + 5 = 5.5 => Therefore, the 25th percentile (= lower quartile) is 5.5.
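
The steps above can be sketched as a short function. The sample is made up so that the scores at ranks 2 and 3 are 5 and 7 and N = 8, which gives R = 2.25; the function name and the handling of ranks that fall outside 1..N are assumptions, since the card does not cover those cases.

```python
def percentile(scores, p):
    """Percentile by rank interpolation: R = P/100 * (N + 1)."""
    ordered = sorted(scores)            # step 1: order the sample points
    n = len(ordered)
    r = p / 100 * (n + 1)
    ir = int(r)                         # step 2: integer portion of R
    fr = r - ir                         # step 3: fractional portion of R
    if ir < 1:                          # assumption: clamp to the minimum
        return ordered[0]
    if ir >= n:                         # assumption: clamp to the maximum
        return ordered[-1]
    lower = ordered[ir - 1]             # step 4: score with rank IR
    upper = ordered[ir]                 #         score with rank IR + 1
    return lower + fr * (upper - lower) # step 5: interpolate

sample = [3, 5, 7, 8, 9, 11, 13, 15]    # hypothetical sample
print(percentile(sample, 25))  # 5.5
print(percentile(sample, 75))  # 12.5
```

Ranks are 1-based in the card, so the score with rank IR sits at index IR - 1 of the ordered list.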

population

- is defined with respect to the psychological question - can be abstract - is usually large - Computing: settle for a sample and draw inferences about the population => populations are not random! (Populations stay the same across samples)

Example: population vs. sample

- Males in Germany average 182cm in height. => statement about the population - The males in this class average 180cm. => statement about the sample

Interquartile Range (IQR)

- Q3 - Q1, the range of the middle 50% of the data - Finding quartiles is similar to finding the median. => First quartile (Q1): observation such that 25% of observations are less than or equal to it. => Second quartile (Q2): observation such that 50% of observations are less than or equal to it. (median!) => Third quartile (Q3): observation such that 75% of observations are less than or equal to it. - Computing: 1. find Q1 and Q3 2. subtract Q1 from Q3 to get the interquartile range

Mean

- Question: What is the "average" number? - Computing: Add up all the values and divide the sum by n (n = total number of observations in the sample) - The sample mean is the most important measure of central tendency - It is the point where the SUM of all deviations from it is 0. It can also be thought of as the "balancing point" of the sample. - The mean of the observations is often abbreviated x̅ (pronounced "x bar") => an observation is often written as x_j

Median

- Question: What is the "middle" number? - Computing: Odd number of values: median = middle value; even number of values: add the two "middle" values and divide the sum by 2 - median = (also) the middle quartile (Q2) or 50th percentile

Mode

- Question: What is the most common number? Where do the data tend to cluster? - Computing: Look at the sample; the mode is the value that occurs most often.
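
As a quick illustration of the three measures of central tendency, here is a sketch using Python's `statistics` module on made-up scores. It also checks the "balancing point" property of the mean.

```python
from statistics import mean, median, mode

scores = [2, 3, 3, 5, 7]  # hypothetical sample

x_bar = mean(scores)                       # (2 + 3 + 3 + 5 + 7) / 5
deviations = [x - x_bar for x in scores]   # distances from the mean

print(x_bar, median(scores), mode(scores))  # mean, middle value, most common value
print(sum(deviations))                      # deviations from the mean sum to 0
print(median([2, 3, 5, 7]))                 # even n: (3 + 5) / 2
```

For these scores the mean is 4, while the median and the mode are both 3.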

simple random sampling

- Sampling is random => equal chance of being selected for every member of the population; independent selection of members

Skewness

- Skew describes the symmetry of a sample => If a histogram is symmetric, it has no skew =>If it is not symmetric, it can have left skew (negative skewness) or right skew (positive skewness).

Statistics

- Statistics are computed from a sample => They are known with absolute precision. => Examples include x̅ (mean of the sample), s² (variance of the sample), and the sample median

Leaf display

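- In a stem-and-leaf display (stemplot), each observation is split into a stem (the leading digits) and a leaf (the final digit) - The leaf is written next to its stem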

box plot

- The graphical representation of the five-number-summary - Functions: 1. They give a quick representation of the important properties of a sample. 2. You can fit many of them in a small area (histograms are not good for this!)

dependent variable

- The outcome factor; the variable that may change in response to manipulations of the independent variable.

What is measurement?

- The process of applying numbers to objects according to a set of rules

back-to-back stemplot

- Used to compare two sets of data. - The leaves for one set of data are on one side of the stem, and the leaves for the other set of data are on the other side. - Good for displaying distributions

Five-number-summary

- Using the range and the quartiles, we can describe any distribution in five numbers. => i.e.: minimum, Q1, median, Q3, maximum
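
A minimal sketch of the five-number summary, using the "median of the lower half / upper half" definition of Q1 and Q3 from the quartile cards. Leaving the middle value out of both halves when n is odd is an assumption (conventions differ), and the function name and data are hypothetical.

```python
from statistics import median

def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum); needs at least two values."""
    ordered = sorted(data)
    half = len(ordered) // 2
    lower_half = ordered[:half]    # values below the middle position
    upper_half = ordered[-half:]   # values above the middle position
    return (
        ordered[0],
        median(lower_half),
        median(ordered),
        median(upper_half),
        ordered[-1],
    )

values = [7, 1, 12, 4, 9, 3, 6]  # hypothetical sample
print(five_number_summary(values))  # (1, 3, 6, 9, 12)
```

These five numbers are exactly what a box plot draws: the box spans Q1 to Q3 with a line at the median.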

independent variable

- Variable manipulated by experimenter - Number of levels of an independent variable = the number of experimental conditions

What is a density curve?

- With a density curve, probabilities correspond to the AREA under the curve => total area under the curve is 1
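
To make "probability = area" concrete, here is a sketch using the standard normal curve through `statistics.NormalDist`. The cut-off 1.96 is chosen for illustration because it marks the central 95% of that curve.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# cdf(x) is the area under the density curve to the left of x.
area_left_of_zero = z.cdf(0)
area_within = z.cdf(1.96) - z.cdf(-1.96)
area_outside = z.cdf(-1.96) + (1 - z.cdf(1.96))

print(area_left_of_zero)           # half of the area lies below the mean
print(round(area_within, 3))       # about 0.95
print(area_within + area_outside)  # the pieces add up to the total area of 1
```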

quantitative variable

- a characteristic that can be measured numerically => e.g. numerical values on a measurement scale - divided into: 1. discrete variables: => variables whose possible values are separate numbers (e.g. 0, 1, 2, 3, ...) => e.g. number of children 2. continuous variables: => variables that can take any value within an interval (infinitely many possible values) => e.g. height

bimodal distribution

- a distribution with two modes => two distinct peaks

Samples

- are observed, are the object of analysis - are drawn from a particular population to answer a psychological question - provide data so that we can learn about the population without examining the whole population => Samples vary and are random

Why are outliers important? Give 3 reasons.

1. An outlier may represent a sample from outside the population of interest 2. Outliers have a large effect on many statistics (like mean, standard deviation) 3. Outliers may represent errors in the data (Example: a height of 100cm)

What is the right measurement scale? 1. How many siblings do you have? 2. How many cigarettes do you smoke per day? 3. How many degrees Celsius is it in the room? 4. What color do you prefer? 5. On a scale from extremely dissatisfied (0) to extremely satisfied (9), how satisfied are you with your life?

1. Ratio scale (true zero point/ equal intervals) 2. Ratio scale (true zero point/ equal intervals) 3. Interval (no true zero point) 4. Nominal scale (no order) 5. Ordinal (ordered/ no equal intervals)

What are the 3 types of distribution?

1. Standard normal distribution 2. Normal distribution 3. Binomial distribution

What are the two (complementary) approaches to data analysis?

1. Summarize data graphically 2. Summarize characteristics of data with numbers (numerically/ summary measures => i.e. center, spread, shape)

