CHAPTER 9 Descriptive Statistics, Significance Levels, and Hypothesis Testing
mode
1. score that appears most often in a dataset 2. a distribution can be bimodal or multimodal, which means that more than one score has the largest frequency of occurrence 3. when a distribution is bimodal or multimodal, the researcher cannot use the mode to represent the average in later calculations
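A minimal sketch of finding the mode, including the bimodal case. The helper name `modes` and the sample scores are made up for illustration; `multimode` comes from Python's standard `statistics` module.

```python
from statistics import multimode

def modes(scores):
    """Return every score that shares the highest frequency, sorted."""
    return sorted(multimode(scores))

unimodal = [3, 4, 4, 5, 6]     # 4 occurs most often
bimodal = [2, 2, 3, 5, 5, 6]   # 2 and 5 tie for most frequent

print(modes(unimodal))
print(modes(bimodal))
```

When the returned list has more than one entry, no single score can stand in as "the" mode.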
probability level
significance level - established for each statistical test before the test is computed - the level of error the researcher is willing to accept - symbolized in written research reports as the letter p, or referred to as the alpha level
Two techniques hyp testing relies on
1. significance testing 2. sampling
range
simplest measure of dispersion 1. calculate by subtracting the lowest score from the highest score 2. used to report the high and low scores on questionnaires 3. crude measure of dispersion, because changing any values between the highest and lowest scores has no effect on it
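A short sketch of why the range is a crude measure: two illustrative (made-up) sets of scores share the same highest and lowest values, so they share the same range even though the middle scores differ. The function name `score_range` is hypothetical.

```python
def score_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)

original = [18, 25, 31, 40, 67]
changed_middle = [18, 19, 20, 21, 67]  # same extremes, different middle scores

print(score_range(original))
print(score_range(changed_middle))
```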
medium or large datasets
spreadsheet program or a statistics program
Frequencies
the number of times a particular value of a variable occurs - often used to report on the occurrence of communication events - data is at the nominal level, because the researcher is making a decision for each occurrence: did this communication phenomenon occur or did it not? - ex: presidential elections
operationalized
the researcher must specify exactly what data were collected and how
If probability level of the statistical test is acceptable, or within the traditions of the discipline
then the findings are believed to be real, not random - the inference can be presumed to be valid
descriptive statistics definition
1. those numbers that supply information about the sample or those that supply information about the variables 2. simply describe what is found
population inference
accepting the conclusions derived from the sample and assuming that those conclusions are also applicable to the population
If sig level computed for the stat test is .05 or less
alternative hypothesis is accepted
descriptive statistics
1. another set of numbers computed from the dataset 2. convey essential basic information about each variable and the dataset as a whole 3. mean, standard deviation, range, and number of cases are commonly used to provide a summary interpretation of each variable 4. functions of numbers: summarize the data (descriptive), provide information about the relationships between or among variables, and help researchers draw conclusions about a population by examining the data of the sample; this last use of numbers is known as inferential statistics
Significance levels
- a criterion for accepting or rejecting hypotheses and is based on probability.
social significance
- achieving statistical significance does not guarantee social significance of the result 1. using a very large sample can create statistically significant differences that have little relevance in application 2. stat significance must always be interpreted with respect to the social and practical significance- or how the results might actually be applied or used in everyday life.
alternative hypothesis
- an assertion that states how researcher believes the variables are related or are different
skewed distributions
1. when a distribution of scores is not normal 2. one side is not a mirror image of the other 3. asymmetrical
positively skewed curve
1. represents a distribution in which there are very few scores on the right side of the distribution 2. very few very high scores 3. most of the scores are lumped together on the left side of the curve, below the mean
small dataset
calculator with sq root function
process inference
claim that the theory would likely work in similar situations - are the data consistent with the predictions the researcher drew from the theory? - inference based on the probability level computed for each statistical test
dataset
collection of the raw data
how to know which one is appropriate to use
compute the variability in the scores
skewness
1. degree to which the distribution of data is bunched to one side or the other 2. direct reflection of the variability, or dispersion, of the scores
Description of data
describe data for each quantitative variable in three ways: 1. the number of cases or data points 2. central tendency 3. dispersion or variability - each of these descriptions provides information about the frequency of scores
first step whenever collecting data
develop a frequency distribution for each variable in the dataset for which you have collected quantitative data
positively Skewed distribution mean
mean will typically be the largest value of the three measures of central tendency in a positively skewed distribution, because the few very high scores pull it upward
negatively skewed distribution mean
mean will typically be the smallest value of the three measures of central tendency in a negatively skewed distribution, because the few very low scores pull it downward
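A sketch comparing the three measures of central tendency on two small made-up datasets, one positively skewed and one negatively skewed. The helper name `central_tendency` is hypothetical; `mean`, `median`, and `mode` come from Python's standard `statistics` module.

```python
from statistics import mean, median, mode

def central_tendency(scores):
    """Return (mean, median, mode) for a list of scores."""
    return mean(scores), median(scores), mode(scores)

# A few very high scores stretch the right tail.
positive = [1, 2, 2, 2, 3, 3, 4, 10]
# A few very low scores stretch the left tail.
negative = [1, 7, 8, 8, 9, 9, 9, 10]

pos_mean, pos_median, pos_mode = central_tendency(positive)
neg_mean, neg_median, neg_mode = central_tendency(negative)

print(pos_mean, pos_median, pos_mode)
print(neg_mean, neg_median, neg_mode)
```

In these examples the mean sits farthest toward the long tail, the mode sits at the hump, and the median falls between them.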
if probability level is unacceptable
no conclusion can be drawn
raw data
numerical data collected from each participant compiled into a dataset for the same variables for a sample of participants
probability
scientific term to identify how much error the researcher finds acceptable in a particular statistical test 1. in scientific research, probability is an estimate of "what you think would happen if the study were actually repeated many times, telling the researcher how wrong the results can be" 2. a calculation about the validity of the results. 3. provides an estimate of the degree to which data from a sample would reflect data from the population the sample was drawn from
how normal curve is used
scientists look for the normality of their data and the degree to which the distribution of their data deviates from the normal curve
Measures of Dispersion
1. to fully describe a distribution of data, a measure of dispersion, or variability, is also needed 2. two distributions can have the same mean but different spreads of scores 3. when a measure of central tendency is used, a measure of dispersion should also be reported
disadvantages of using a spreadsheet or statistical software
- can create a false sense of security - 5 issues to consider if you use these programs for data entry and statistical comparison 1. computers can fail, programs can stall: never trust all your data to one file on one storage device 2. results can only be as good as the data entered 3. researchers tend to limit their thinking to statistical procedures they know they can do on the computer 4. the power of computing makes it possible to create an abundance of analyses 5. as the researcher you are the person responsible for the results and their interpretations
data
- information about communication phenomena -capture quality, intensity, value, or degree of the variables used in quantitative communication studies
deciding on null hypothesis
-belief in null hypothesis continues until there is sufficient evidence to make the assertion of the null hypothesis unreasonable. - decision is based on a comparison between the significance level established by the researcher prior to conducting the study and the significance level produced by the calculation of the statistical test.
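The comparison described above can be sketched as a simple decision rule. The function name `decide` and its return strings are illustrative, and the default of .05 is only the conventional level, not a fixed requirement.

```python
def decide(p_value, alpha=0.05):
    """Compare the p value from a test with the preset significance level."""
    if p_value <= alpha:
        return "reject the null hypothesis (statistically significant)"
    return "retain the null hypothesis (nonsignificant)"

print(decide(0.03))               # at or below .05
print(decide(0.15))               # above .05
print(decide(0.03, alpha=0.01))   # same p value, more rigorous level
```

The same p value can lead to different decisions depending on the level chosen before the study, which is why the level is set first.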
Type I error
-occurs when the null hypothesis is rejected even when it is true. - error is set or controlled by the researcher when he or she chooses the significance level for the statistical test - thus if set at .05, there is a 5% chance the null will be rejected even though it is true
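A rough simulation sketch of the 5% idea: draw many samples from a population where the null hypothesis is true and count how often a test rejects it anyway. Everything here is an assumption made for illustration: the one-sample z test with a known standard deviation, the sample size, the number of trials, and the seed.

```python
import random
from statistics import NormalDist, mean

def z_test_p_value(sample, mu0=0.0, sigma=1.0):
    """Two-tailed p value for a one-sample z test with known sigma."""
    z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
    return 2 * (1 - NormalDist().cdf(abs(z)))

def type_i_error_rate(alpha=0.05, trials=2000, n=30, seed=1):
    """Share of trials that reject a null hypothesis that is actually true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Every sample comes from a population where the null is true (mean 0).
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        if z_test_p_value(sample) <= alpha:
            rejections += 1
    return rejections / trials

rate = type_i_error_rate()
print(f"observed Type I error rate: {rate:.3f}")
```

The observed rate should land near the chosen significance level; lowering `alpha` lowers the share of false rejections.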
Type II error
- occurs when the alternative hypothesis is rejected even though it is true; that is, the null hypothesis is retained even though it is false
Normal curve
1. "bell curve" 2. a theoretical distribution of scores or other numerical values. 3. majority of scores are distributed around the peak in the middle, with progressively fewer cases as one moves away from the middle of the distribution. 4. more responses are average or near-average than extremely high or extremely low.
Caution with using a software program
1. although programs compute the statistical tests, the program relies on you to request the appropriate test and to make the appropriate interpretations of the outcome 2. relies on you to indicate which data should be included in the test: if you specify the wrong test or indicate the wrong data to be used for the test, the program will provide a result, but it will be wrong or noninterpretable
negatively skewed curve
1. distribution in which there are very few scores on the left side of the distribution 2. very few very low scores 3. most of the scores are lumped together on the right side of the curve
Causes of high probability level (greater than .05)
1. items on a survey or questionnaire intended to measure a construct may be so poorly written that participants respond inconsistently to them 2. researcher can also create bias that generates unacceptable levels of probability 3. theory that was the foundation of the study is not accurate, or the theory was not adequately or appropriately tested
Median
1. middle of all the scores on one variable 2. compute: arrange data in order from smallest to largest, then find the middle score 3. may or may not be the same as the mean for a set of scores 4. scores in the dataset can change without the median being affected 5. better to use if scores are skewed, because it better reflects the middle of the distribution
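A minimal sketch of computing the median by hand. The function name `median_of` and the scores are illustrative; averaging the two middle scores when the number of cases is even is the usual convention, stated here as an assumption.

```python
def median_of(scores):
    """Sort the scores and return the middle one (or the mean of the two middle ones)."""
    ordered = sorted(scores)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_of([7, 1, 3]))      # odd number of cases
print(median_of([1, 3, 7, 9]))   # even number of cases
print(median_of([1, 3, 700]))    # extreme score changes, median does not
```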
Mean
1. most common measure of central tendency 2. "average" computed by adding up all the scores on one variable and then dividing by the number of cases for that variable 3. most sensitive to extremely high or extremely low values of a distribution 4. most commonly reported measure of central tendency.
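A sketch of the mean's sensitivity to extreme values, using made-up scores. Replacing one score with a very high value moves the mean a long way while the median stays put.

```python
from statistics import mean, median

typical = [2, 3, 4, 5, 6]
with_outlier = [2, 3, 4, 5, 96]  # one extremely high score

# Mean = sum of the scores divided by the number of cases.
print(sum(typical) / len(typical), mean(typical), median(typical))
print(mean(with_outlier), median(with_outlier))
```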
Application of descriptive statistic
1. reported in the method section of a written research report 2. reader can assess the normalcy of the data 3. help reader interpret the conclusions drawn from data 4. frequencies and percentages also commonly used to provide summaries of nominal data
Number of cases
1. simply indicates the number of sources from which data were collected 2. data points 3. the more data points (number of cases) the more reliable the data 4. found in the methods or results section of the written research report 5. represented by n or N 6. N= total number in a sample, n= subsample, a group of cases drawn from the sample. 7. may not always be the number of people, rather, number of speaking turns, arguments, conflict episodes, commercials, etc.
standard deviation
1. tells how close or far apart the scores are from one another 2. standard calculation and representation of the variability of a dataset 3. both mean and sd reported because mean alone is not interpretable 4. if sd is small, scores were very similar or close to one another 5. the larger the sd, the greater the degree the scores differ from the mean
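A sketch showing why the mean alone is not enough: two made-up datasets share the same mean but have very different standard deviations. This assumes the sample standard deviation (dividing by n - 1), which is what `statistics.stdev` computes; `statistics.pstdev` would give the population version.

```python
from statistics import mean, stdev

tight = [4, 5, 5, 5, 6]    # scores close to one another
spread = [1, 3, 5, 7, 9]   # scores far apart

print(mean(tight), round(stdev(tight), 2))
print(mean(spread), round(stdev(spread), 2))
```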
another use of range
1. to describe demographic characteristics of the research participants 2. simply report the highest and lowest values 3. Ex: range 18 to 67
ways to control Type I and Type II errors
1. Type I control: set a significance level that is appropriate 2. Type II control: increase the sample size
Percentages
a comparison between the base, which can be any number, and a second number that is compared to the base. - frequently used to describe attributes of participants or characteristics of their communication behavior. - make it easy for the reader to get an idea of the degree to which the sample is relevant and appropriate for the study.
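A small sketch of a percentage as a comparison against a base. The function name `percentage` and the participant counts are invented for illustration.

```python
def percentage(part, base):
    """Express `part` as a percentage of `base`."""
    if base == 0:
        raise ValueError("base must not be zero")
    return 100 * part / base

# Hypothetical sample: 45 of 120 participants are first-year students.
print(f"{percentage(45, 120):.1f}%")
```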
if probability level of statistical test is greater than .05
ex: p=.15 - finding is labeled nonsignificant - means that difference could easily be caused by chance or random error
when the probability level of a statistical test is .05 or less,
finding is presumed to be real, not due to chance - labeled as statistically significant
Hypothesis Testing
hypotheses state the expected relationship or difference between two or more variables. - it is the null hypothesis that is statistically tested
creating a frequency distribution
1. list the scores in order, from highest to lowest, and then identify the number of times each score occurs 2. create a polygon to get a good sense of the normality of the distribution 3. range of possible scores for the variable is displayed on the horizontal axis 4. frequencies with which those scores occur are listed on the vertical axis 5. plot each data point according to its frequency of occurrence
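The first step above can be sketched with a counter. The function name `frequency_distribution` and the scores are made up for illustration, and the row of asterisks is only a rough text stand-in for a plotted polygon.

```python
from collections import Counter

def frequency_distribution(scores):
    """List each distinct score, highest to lowest, with how often it occurs."""
    counts = Counter(scores)
    return [(score, counts[score]) for score in sorted(counts, reverse=True)]

scores = [5, 3, 4, 4, 2, 4, 3, 5, 1, 3, 4]
distribution = frequency_distribution(scores)

for score, freq in distribution:
    # One asterisk per occurrence gives a quick look at the shape.
    print(f"{score}: {'*' * freq}")
```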
Measures of Central Tendency
primary summary form for data - research reports do not report the data collected for every case on every variable, but report summary statistics for each variable 1. mean 2. median 3. mode
two most common measures of dispersion
1. range 2. standard deviation - both provide information about the variability of the dataset
horizontal axis
represents all possible values of a variable
vertical axis
represents the relative frequency with which those values occur
reason to make significance level more rigorous (.01, .001)
results of the study have direct implications for people whose health is at risk - have to create greater certainty about the results achieved