Chapter 11: Analyzing and Interpreting Data: Descriptive Analysis


What is linear correlation used for? What is the degree of the relationship expressed by?

-the method used most frequently to describe the relationship between two or more variables or between two or more sets of data. -The degree of relationship is expressed by the *coefficient of correlation (r).*

What can SD be used to define?

SD can be used to define extreme values and to characterize a distribution

Bi-modal distribution

ex: Height distribution of Men & Women

Positive or right skew

ex: income distribution in US -Non-symmetric around the mean -Disproportionate number of observations below the mean -Mean, median and mode are unequal Mode < Median < Mean -Few outliers at the very "high" end of the distribution (long "right" tail)

What is equal in a normal distribution?

mean = median = mode

Nominal measurement

variables are simply placed into different categories. (ex: race/ethnicity)

What is the median usually reserved for?

when a quick measure of central tendency is needed or when distributions are markedly skewed

How is standard deviation computed?

*The standard deviation is computed by obtaining the square root of the variance.* This takes away the squaring of deviations, thereby making the standard deviation the same unit of measurement as the mean.
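A minimal sketch of the relationship between variance and standard deviation, using Python's standard `statistics` module. The scores are hypothetical, invented only for illustration, and the population form (dividing by N) is an assumption chosen to match the "average them" wording used elsewhere in this set.

```python
import math
import statistics

# Hypothetical scores, invented for illustration only.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(scores)            # same unit as the scores
variance = statistics.pvariance(scores)   # population variance, in squared units
sd = statistics.pstdev(scores)            # population SD, back in the unit of the mean

print("mean:", mean)
print("variance:", variance)
print("standard deviation:", sd)

# The SD is the square root of the variance.
print(math.isclose(sd, math.sqrt(variance)))
```

For a sample rather than a whole population, `statistics.variance` and `statistics.stdev` divide by N - 1 instead.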

What does a correlation coefficient of zero indicate?

*no relationship* between the variables in question.

Like the Pearson's r, the Spearman rank rs ranges in value from ______ to ______

-1 to 1

Define population

-Can be defined as the set of elements we are planning to study. -In the health sciences it usually refers to *a group of people.* -Can be something other than a group of people, such as all daily maximum temperatures or all automobiles produced in a given time frame.

Nonparametric data

-Data that are either counted (nominal scale) or ranked (ordinal scale) -Nonparametric statistical tests, often referred to as distribution-free tests, *do not require the more restrictive assumption of a normally distributed population*. -Generally, have wider applications and are less difficult to compute.

Parametric data

-Either interval or ratio data -Parametric statistical tests assume that the data are *normally or near normally distributed*. -Are frequently considered the more powerful of the two.

What is inferential statistical analysis? Can generalizations be made? What is it used for?

-Involves observation of a sample taken from a given population. -Conclusions about the population are inferred from the information obtained about the sample. -Unlike descriptive data analysis, *generalizations can be made* from the sample to the respective population. -Can be used for *estimation and prediction.*

What are two unique characteristics of the correlation coefficient?

-It is a pure number. -It may take on values between -1.00 and +1.00

Negative or left skew

-Non-symmetric around the mean -Disproportionate number of observations are above the mean -Mean, median and mode are unequal Mean < Median < Mode -Few outliers at the very "low" end of the distribution (long "left" tail)
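The orderings given for skewed distributions (Mode < Median < Mean for a right skew, Mean < Median < Mode for a left skew) can be checked on small data sets. This is a sketch only: both lists are hypothetical values invented for illustration.

```python
import statistics

def central_tendency(values):
    """Return (mode, median, mean) for a list of numbers."""
    return (
        statistics.mode(values),
        statistics.median(values),
        statistics.mean(values),
    )

# Hypothetical right-skewed data: one very high value forms the long right tail.
right_skewed = [20, 20, 20, 25, 30, 35, 40, 60, 200]

# Hypothetical left-skewed data: one very low value forms the long left tail.
left_skewed = [10, 60, 70, 75, 80, 85, 90, 90, 90]

r_mode, r_median, r_mean = central_tendency(right_skewed)
l_mode, l_median, l_mean = central_tendency(left_skewed)

print("right skew:", r_mode, "<", r_median, "<", r_mean)
print("left skew: ", l_mean, "<", l_median, "<", l_mode)
```

In both cases the mean is pulled furthest toward the tail, because it is the only one of the three measures affected by every score.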

Parameter vs. Statistic

-Parameter: a characteristic of a population. -Statistic: a characteristic of a sample.

Examples of computer programs that can be employed by the health science researcher include:

-Statistical Analysis System (SAS) -Statistical Package for the Social Sciences (SPSS for Windows) -Stata

Pearson Product-Moment Correlation

-The most often used and most precise coefficient of correlation -Used with parametric data -Has a raw score equation that is convenient for both calculator and computer use

Spearman Rank Order Correlation

-Used to determine the relationship between *two ranked variables* (rather than interval or ratio variables) -Designed for nonparametric data -May be employed to compare judgments by a group of judges on two objects or the scores of a group of subjects on two measures. -A less frequent but valuable use is to compare judgments by two judges on a group of objects or items.

Why is the mean one of the most useful statistical measures? (3 reasons)

1) Provides much information. 2) Is affected by all the scores. 3) Serves as a basis for the computation of other important measures, such as variability.

What are two measures of relationship?

1) Spearman rank order correlation 2) Pearson product-moment correlation

What are the 3 measures of central tendency?

1) mean 2) median 3) mode

What are the four levels of measurement?

1) nominal 2) ordinal 3) interval 4) ratio

What are the two types of data that are recognized in the application of statistics?

1) parametric data 2) nonparametric data

What are 3 measures of spread or variation?

1) range 2) standard deviation (sq rt of variance) 3) variance

What is the 68-95-99.7 rule?

-68% of the area under the normal curve lies in the region µ ± 1σ -95% of the area under the normal curve lies in the region µ ± 2σ -99.7% of the area under the normal curve lies in the region µ ± 3σ -AKA the "empirical rule"
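The three percentages can be recovered from the normal curve itself. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `area_within` is made up for this example.

```python
from statistics import NormalDist

# Standard normal distribution: mu = 0, sigma = 1.
standard_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the normal curve between mu - k*sigma and mu + k*sigma."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma: {area_within(k):.1%}")
```

The 68, 95, and 99.7 figures are rounded values of these areas; the area within 2σ is slightly above 95%.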

What does a correlation of -1.00 mean?

A perfect negative correlation of -1.00 means that for every unit increase in one variable, there is a proportional decrease in the other variable.

What does a correlation of +1.00 mean?

A perfect positive correlation of +1.00 specifies that for every unit increase in one variable, there is a proportional increase in the other variable.
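A short sketch of what perfect correlation looks like numerically. The function `pearson_r` is a hand-written helper for this example (the deviation-score form of Pearson's r), and the data are hypothetical. Note that the change in y per unit of x does not have to be exactly one unit; it only has to be constant.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation, deviation-score form."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: y rises by 3 for every unit increase in x.
x = [1, 2, 3, 4, 5]
y_rising = [3 * a + 2 for a in x]

# Hypothetical data: y falls by 0.5 for every unit increase in x.
y_falling = [10 - 0.5 * a for a in x]

r_up = pearson_r(x, y_rising)
r_down = pearson_r(x, y_falling)

print(round(r_up, 6), round(r_down, 6))
```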

Define sample

A subset of the population

What does variability allow us to do?

Allows us to make generalizations from a sample to a larger population

What is kurtosis?

Amount of peakedness in a distribution

What is descriptive statistical analysis?

Can function to describe data: to explain *how the data look*, what the *center point* of the data is, how *spread out* the data may be, and how one aspect of the data may be *related to* one or more other aspect.

Slides 35-37

Example for calculating variance and SD

Definition of standard deviation

Horizontal distance between the mean and the point of inflection or change on the curve in a normal distribution

What is extrapolation a component of: descriptive or inferential statistics?

Inferential statistics

What does a lack of variability limit?

Lack of variability limits our ability to make meaningful statistical associations

Why is range not generally employed?

May be used justifiably as a hasty measure of variability; but, since it takes into account only the extremes and not the bulk of observations, it is not generally employed.

What does variability measure?

Measures the "spread" (width) of a distribution

______ is the quickest estimate of central value and shows the most typical case

Mode

When doing descriptive statistical analysis, can conclusions extend beyond the immediate group?

NO, conclusions cannot be extended beyond this immediate group, and any similarity to those outside the group *cannot be assumed*

Leptokurtic

Narrow, tall peak

Is the median influenced by extreme scores?

No, but the mean is. In some instances, the median may be a more realistic measure of central tendency than the mean.
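A small sketch of this difference, using hypothetical values invented for illustration: replacing one score with an extreme value moves the mean but leaves the median where it was.

```python
import statistics

# Hypothetical scores with no extreme value.
typical = [3, 4, 5, 6, 7]

# Same scores, but the largest one is replaced by an extreme score.
with_extreme = [3, 4, 5, 6, 70]

mean_typical = statistics.mean(typical)
median_typical = statistics.median(typical)
mean_extreme = statistics.mean(with_extreme)
median_extreme = statistics.median(with_extreme)

print("typical:      mean", mean_typical, "median", median_typical)
print("with extreme: mean", mean_extreme, "median", median_extreme)
```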

What is Pearson's r very sensitive to? How is this problem fixed?

Pearson's r is very sensitive to outlying values. One way of alleviating this problem is to "rank" the sets of outcomes x and y separately and calculate a coefficient of rank correlation.
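The ranking approach can be sketched as follows. The helpers `pearson_r` and `ranks` are written for this example, the data are hypothetical, and the simple ranking function assumes there are no tied values (tied values would normally receive averaged ranks).

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation, deviation-score form."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """Rank values from 1 (smallest) to n (largest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

# Hypothetical data: y always increases with x, but the last y is an outlier.
x = [1, 2, 3, 4, 5, 6]
y = [1, 2, 3, 4, 5, 100]

r = pearson_r(x, y)                    # pulled well below 1 by the outlier
r_s = pearson_r(ranks(x), ranks(y))    # rank correlation on the same data

print("Pearson r:", round(r, 2))
print("rank correlation:", round(r_s, 2))
```

Ranking replaces the outlying value 100 with its rank of 6, so its distance from the other values no longer dominates the calculation.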

What is the simplest measure of variation?

Range

Platykurtic

Short, wide peak

Why do you square the values when calculating variance?

Since the sum of the deviations above the mean (positive values) will, by definition, equal in size the sum of the deviations below the mean (negative values), the deviations always add up to zero. You must therefore square the difference between each value and the mean, sum the squares, and then average them to get the variance.
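The reasoning can be followed step by step on a small data set. This is a sketch with hypothetical scores; dividing by N (the population form) is an assumption that follows the "average them" wording of the card.

```python
import math

# Hypothetical scores, invented for illustration only.
scores = [3, 5, 7, 9, 11]

n = len(scores)
mean = sum(scores) / n

# Deviations from the mean: negatives and positives cancel out.
deviations = [value - mean for value in scores]
sum_of_deviations = sum(deviations)

# Squaring removes the signs, so the squared deviations cannot cancel.
squared_deviations = [d ** 2 for d in deviations]
variance = sum(squared_deviations) / n   # average squared deviation
sd = math.sqrt(variance)                 # back in the original unit

print("deviations:", deviations)
print("sum of deviations:", sum_of_deviations)
print("variance:", variance)
print("standard deviation:", round(sd, 3))
```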

What are the most useful measures of variation?

Standard deviation and variance

What is a regression line?

The line drawn through or near the coordinate points is referred to as the line of best fit or regression line
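One common way to obtain such a line is ordinary least squares, which picks the slope and intercept that minimize the squared vertical distances from the points to the line. The card does not name a fitting method, so least squares is an assumption here; the function name `least_squares_line` and the data are hypothetical.

```python
def least_squares_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
    return slope, intercept

# Hypothetical coordinate points from a scatterplot.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

slope, intercept = least_squares_line(x, y)
print(f"y = {slope:.2f} * x + {intercept:.2f}")
```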

What is a scatterplot?

The scatterplot is a means of displaying the relationship between two variables and is developed by graphically plotting each pair of values against the x and y axes.

How is variance obtained?

Variance is obtained from squared deviations from the mean, thereby making the variance a different unit of measurement than the mean.

Ratio measurements

When interval data have a *true zero* point. (Ex: height, weight, blood pressure)

When is more info needed than central tendency?

When you are comparing two groups whose means are identical -In such situations, it is important to know whether the scores or observations for each group tend to be quite similar (homogeneous) or spread apart (heterogeneous).
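A brief sketch of this situation, with two hypothetical groups invented for illustration: the means are identical, so only a measure of spread shows that one group is homogeneous and the other heterogeneous.

```python
import statistics

# Two hypothetical groups with the same mean but very different spread.
homogeneous = [48, 49, 50, 51, 52]
heterogeneous = [10, 30, 50, 70, 90]

mean_a = statistics.mean(homogeneous)
mean_b = statistics.mean(heterogeneous)
sd_a = statistics.pstdev(homogeneous)    # population standard deviation
sd_b = statistics.pstdev(heterogeneous)

print("means:", mean_a, mean_b)
print("standard deviations:", round(sd_a, 2), round(sd_b, 2))
```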

Interval measurements

categorizes, orders and provides a meaningful measure of the differences in ordering. Variables can be separated by how much they differ from one another. (ex: Celsius temperatures)

Ordinal measurement

both groups the data and ranks it through ordering of categories. (ex: dosage levels, degree of education, severity of illness, and social class) Rankings may be relative rather than absolute.

Which is more robust: Spearman rank rs (the s is a subscript) or Pearson's r? Why?

rs is a more robust measure of correlation. It can be used when one or both variables are ordinal.

Standard deviation determines how ______ the data are about the mean

scattered

Define deviation

the distance of the measurements away from the mean.

