Chapter 5 - Basic Stats Concepts, and Descriptive Stats

Factors that influence reliability

- Administration errors
- Test length - the more questions, the better
- Item homogeneity: the more similar the questions, the better
- Item difficulty
- Interval between tests - error effects
- Objectivity: subjective responses and responses that involve observation leave more room for error
- Testing environment/student factors: fatigue or illness, temperature of the environment, etc.

Classical test theory subsumed 3 primary types of evidence that support the validity of a measure

1. Construct 2. Content 3. Criterion

To compute variance use the following steps

1. Find the mean.
2. Find the difference between each observation and the mean.
3. Square each difference score.
4. Sum the squared differences.
5. Because the data are a sample, divide the sum of the squared differences (step 4) by the number of observations minus one (n - 1).
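The five steps above can be sketched in Python as follows. The function name `sample_variance` and the scores are made up for illustration.

```python
def sample_variance(scores):
    """Sample variance, following the five steps (divides by n - 1)."""
    n = len(scores)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(scores) / n                       # step 1: find the mean
    deviations = [x - mean for x in scores]      # step 2: difference from the mean
    squared = [d ** 2 for d in deviations]       # step 3: square each difference
    total = sum(squared)                         # step 4: sum the squared differences
    return total / (n - 1)                       # step 5: divide by n - 1


scores = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical sample
variance = sample_variance(scores)
```

For these scores the mean is 5 and the squared differences sum to 32, so the variance is 32 / 7.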

Stanley Smith Stevens, Scales of measurement

1. Nominal 2. Ordinal 3. Interval 4. Ratio

Histogram

A bar graph depicting a frequency distribution. Bars are contiguous and represent increasing magnitude of the variable. Can also be used to show percentage and/or frequency

Symmetrical Distribution

A distribution in which the patterns of frequencies on the left and right sides are mirror images of each other. The normal curve (bell curve) is the best-known example: it has one peak (mode) and is always bell shaped.

Positively Skewed distribution

A distribution where the scores pile up on the left side and the line tapers off to the right. The tail is pulled to the right along the x-axis. The mean is higher than the median in this distribution.

Kuder-Richardson Formula 20

A measure of homogeneity for dichotomous responses (yes/no, correct/incorrect). It assumes that all items on a test measure the same thing or are of the same difficulty level. It is a type of split-half estimate: a formula that averages over all the possible splits to yield an overall reliability estimate.
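The card does not give the formula itself. A commonly cited form is KR-20 = (k / (k - 1)) * (1 - Σpq / variance of total scores), where k is the number of items, p is the proportion scoring 1 on an item, and q = 1 - p. The sketch below assumes that form and uses the population variance (dividing by n); some texts divide by n - 1 instead. The function name and data are hypothetical.

```python
def kr20(responses):
    """KR-20 for a list of respondents, each a list of 0/1 item scores."""
    n = len(responses)       # number of respondents
    k = len(responses[0])    # number of items
    # Sum of p * q over items, where p is the proportion scoring 1 on the item.
    sum_pq = 0.0
    for i in range(k):
        p = sum(row[i] for row in responses) / n
        sum_pq += p * (1 - p)
    # Variance of total scores (population form, an assumption of this sketch).
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical data: 5 respondents, 6 dichotomous items.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
]
reliability = kr20(responses)
```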

Standard Deviation

A measure of variability that describes an average distance of every individual score from the group mean. We interpret this as indicating how far something is from the mean and how many points in our sample fall within that distance

Standard normal distribution

A normal distribution with a mean of 0 and a standard deviation of 1.

T-Scores

A test score that is converted to a normal distribution that has a mean of 50 and a standard deviation of 10. Always whole numbers and always positive.

Criterion Validity

The extent to which a test predicts an outcome (the criterion) that is measured by other means

Negatively Skewed Distribution

Asymmetric distribution in which the majority of the data is concentrated to the right of the mean. The tail is pulled to the left along the x-axis. The mean is lower than the median in a negatively skewed distribution.

Nominal Scale of measurement

Categorical or grouping data. Assigning observations into various independent categories and then counting the frequency of occurrence within each of the categories. Chi-square tests can be used with this type of data.

Ways to calculate inter-rater reliability

Cohen's Kappa and variations of kappa
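As a rough illustration, Cohen's kappa for two raters is usually written as (observed agreement - chance agreement) / (1 - chance agreement). The formula is not stated on the card, and the function name and ratings below are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same cases."""
    if len(rater_a) != len(rater_b):
        raise ValueError("both raters must rate the same cases")
    n = len(rater_a)
    # Proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)


# Hypothetical yes/no ratings of ten cases by two scorers.
rater_a = ["y", "y", "n", "y", "n", "n", "y", "n", "y", "y"]
rater_b = ["y", "n", "n", "y", "n", "y", "y", "n", "y", "y"]
kappa = cohens_kappa(rater_a, rater_b)
```

Here the raters agree on 8 of 10 cases and chance agreement is 0.52, so kappa is 0.28 / 0.48, roughly 0.58.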

Evidence of Criterion Validity

Concurrent & Predictive evidence

Standard Scores in a normal distribution

Converting sample/raw scores to a standard metric helps to compare across a sample and have a standard scale of reference. Allows us to determine probability of a particular score occurring within our sample.

Mean

the arithmetic average of a distribution, obtained by adding the scores and then dividing by the number of scores

Median

the exact middle score in a distribution; half the scores are above it and half are below it. List all scores in numerical order and locate the center score. With an odd number of values, the position of the median is Md = (N + 1)/2
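A minimal sketch of that procedure is shown below. The card only covers an odd number of values; averaging the two center scores for an even count is the usual convention and is an assumption here.

```python
def median(scores):
    """Middle score of a list after sorting."""
    if not scores:
        raise ValueError("no scores given")
    ordered = sorted(scores)          # list the scores in numerical order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the score at 1-based position (n + 1) / 2.
        return ordered[mid]
    # Even count (assumed convention): average the two center scores.
    return (ordered[mid - 1] + ordered[mid]) / 2
```

For example, `median([7, 1, 5])` sorts to 1, 5, 7 and picks position (3 + 1)/2 = 2, which is 5.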

Validity

the extent to which a test measures or predicts what it is supposed to. The applicability, meaningfulness, and usefulness of the specific inferences made from scores

Reliability Coefficient

the measure of the degree of reliability of an assessment. Ranges from 0.00 to 1.00; higher numbers mean greater reliability. .90 or higher is preferred

Standard Deviation formula

the square root of the variance; converts the variance back into the same units as the raw data. Subtracting the SD from the mean gives the lower limit; adding the SD to the mean gives the upper limit
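A short sketch of this calculation follows, using the sample form of the variance (n - 1). The scores are made up.

```python
import math


def sample_sd(scores):
    """Standard deviation as the square root of the sample variance."""
    n = len(scores)
    mean = sum(scores) / n
    variance = sum((x - mean) ** 2 for x in scores) / (n - 1)
    return math.sqrt(variance)


scores = [2, 4, 4, 4, 5, 5, 7, 9]   # hypothetical sample
mean = sum(scores) / len(scores)
sd = sample_sd(scores)
lower_limit = mean - sd             # one SD below the mean
upper_limit = mean + sd             # one SD above the mean
```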

Z-Score formula

z = (x - mean) / standard deviation, or in symbols, z = (x - μ)/σ
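The formula translates directly into code. The mean of 100 and standard deviation of 15 in the usage note are illustrative values only.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd
```

With a mean of 100 and a standard deviation of 15, a raw score of 115 gives z = 1.0 and a raw score of 85 gives z = -1.0.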

Internal Consistency

Correlating the individual items of a test to each other High reliability means that the test has homogenous items, measures a single construct, and correlates highly

68-95-99.7 Rule

Designates the approximate percentages of the data in any normal curve that fall within 1, 2, and 3 standard deviations of the mean. About 68% of values are within 1 standard deviation of the mean, about 95% are within 2 standard deviations, and about 99.7% are within 3 standard deviations.
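These percentages can be checked numerically. For a normal distribution, the share of values within k standard deviations of the mean equals erf(k / √2); the helper name `share_within` is hypothetical.

```python
import math


def share_within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.1%}")
```

This prints about 68.3%, 95.4%, and 99.7%, which the rule rounds to 68, 95, and 99.7.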

Range of variability

Difference between the highest and lowest scores. Does not take all observations/data points into account.

Split-half reliability

Divide a test into halves (e.g., odd vs. even questions), correlate the scores on the two halves, and then correct for length. The longer the test, the more reliable it is.
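A sketch of the odd/even procedure is given below. The card does not name the length correction; the Spearman-Brown formula, 2r / (1 + r), is the usual choice and is assumed here. Function names and data are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_half_reliability(responses):
    """Odd/even split-half reliability with the Spearman-Brown correction."""
    odd_half = [sum(row[0::2]) for row in responses]    # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in responses]   # items 2, 4, 6, ...
    r_half = pearson_r(odd_half, even_half)
    # Spearman-Brown: estimate reliability of the full-length test.
    return 2 * r_half / (1 + r_half)


# Hypothetical data: 5 respondents, 6 items scored 0/1.
responses = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
]
reliability = split_half_reliability(responses)
```

For these data the half-test correlation is 8/17 (about 0.47), and the correction raises the estimate to 0.64, which illustrates why the longer test is treated as more reliable.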

Construct Validity; Discriminant Evidence

Extent to which scores on a test do not correlate with, or negatively correlate with, scores on another test that was intended to measure a different construct. Ex. If this validity is high, scores on a test designed to assess X should NOT be highly correlated with scores from tests designed to assess Y. Correlations between theoretically dissimilar tests should be "low".

Variability of Distribution

Extent to which the scores in a distribution differ from one another. Three measures are range, variance, and standard deviation.

Outliers

Extreme values which are on the tails of the distribution and create the skew

When is reliability generally more important than validity?

For the purposes of estimating consistency and stability of the scale

When is validity generally more important than reliability?

For the purposes of predicting future behavior, scores, success, and performance, and diagnostic purposes

Frequency polygon

Graph of a frequency distribution that shows the number of instances of obtained scores, usually with the data points connected by straight lines. Appropriate for quantitative data, especially continuous data (height or age).

If a distribution has variability it is called

Heterogeneous

Ratio Scale of measurement

Highest form of measurement; meets all of the rules of the other forms of measurement. Has a true zero point. Can use all statistical procedures in data analysis.

Most common methods of graphing a distribution?

Histogram (bar graph) and Frequency polygon (percentage)

If a distribution is lacking in variability it is called

Homogeneous

Kurtosis

How "flat" or "peaked" a distribution is; indicates how much variability there is in the distribution of scores. Indicates the likelihood of extreme outcomes.

Variance of variability

How close the scores in the distribution are to the mean. This is the average of the squared deviations from the mean

OS = TS + error

In classical test theory, any Observed Score (OS) consists of the True Score (TS) plus some amount of error.

Systematic errors

Increase or decrease a response by a predictable amount on each measurement

Chi-Square tests

Nonparametric tests used to compare frequency data from two samples, or to compare observed and expected frequencies.
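For the observed-versus-expected case, the test statistic is the sum of (observed - expected)² / expected over the categories. The sketch below computes only that statistic, not a p-value; the function name and counts are hypothetical.

```python
def chi_square_statistic(observed, expected):
    """Chi-square statistic comparing observed and expected frequencies."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected need the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical example: 50 responses expected to split evenly across two categories.
statistic = chi_square_statistic([30, 20], [25, 25])
```

Each category contributes 25 / 25 = 1, so the statistic here is 2.0.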

Interval Scale of measurement

Ranking items, but with equivalent and meaningful distances between scale points. Does NOT have a true zero point. This level of measurement supports: mean and standard deviation, correlation and regression, analysis of variance (ANOVA), and factor analysis.

Construct Validity; Convergent Evidence

Refers to the degree to which scores on a test correlate highly and positively with scores on other tests that are designed to assess the same construct or trait. Correlations between theoretically similar tests should be "high" when they assess the same construct/trait.

Stanine

Standard nine scale. A method of scaling test scores on a nine-point standard scale with a mean of five (5) and a standard deviation of two (2). These are useful in comparing scores across different content areas
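One common approximation, consistent with a mean of 5 and a standard deviation of 2, is to compute 2z + 5, round to the nearest whole number, and clip the result to the range 1 to 9. That rule is an assumption of this sketch (stanines are formally assigned from percentile bands), and the band labels follow the Stanine Averages card in this set.

```python
import math


def stanine(z):
    """Approximate stanine for a z-score: round(2z + 5), clipped to 1..9."""
    raw = math.floor(2 * z + 5 + 0.5)   # round half up
    return max(1, min(9, raw))


def stanine_band(s):
    """Descriptive band for a stanine: 7-9 above average, 4-6 average, 1-3 below."""
    if s >= 7:
        return "above average"
    if s >= 4:
        return "average"
    return "below average"
```

Under this rule a z-score of 0 maps to stanine 5 and a z-score of 1 maps to stanine 7.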

Types of distributions of visual data (graphs)

Symmetrical, Skewed, Multimodal

T-Score Formula

T = 50 + 10z. The z-score is needed to find the T-score.
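A small sketch of the conversion, going from a raw score to z and then to T. The helper names and the mean of 100 and standard deviation of 15 in the usage note are illustrative.

```python
def t_score(z):
    """Convert a z-score to a T-score (mean 50, standard deviation 10)."""
    return 50 + 10 * z


def raw_to_t(x, mean, sd):
    """Convert a raw score to a T-score by way of its z-score."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    z = (x - mean) / sd
    return t_score(z)
```

A z-score of 1.5 gives T = 65, and a raw score of 85 on a scale with mean 100 and SD 15 (z = -1) gives T = 40.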

Criterion Validity; Concurrent Evidence

Test scores and criterion measures are obtained at the same time. Ex. IQ test scores compared to student's most recent school grades would be assessing the concurrent evidence of the IQ scores

Reliability

The extent to which a test yields consistent results. Free from error and provides consistent results. Focuses only on degree of nonsystematic or random error in assessment.

Content Validity

The extent to which the measurement accurately samples the content domain. Extent to which a test-taker's responses to a given test reflect that test-taker's knowledge of the content area

Construct Validity

The extent to which the test is an accurate measure of a particular construct or variable. Is the test measuring what we think it should be measuring?

Mode

The most frequently occurring score(s) in a distribution. In a frequency polygon, this score is the peak of the distribution. Useful to find the most common category when studying nominal variables

Measures of central tendency

These are intended to describe the most average or "typical" score in the distribution. Mean, Median, and Mode are most common.
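Python's standard `statistics` module computes all three. The scores below are made up and include one extreme value, so the mean lands above the median, as described for a positively skewed distribution.

```python
import statistics

scores = [1, 2, 2, 3, 12]               # hypothetical scores with one extreme value

mean = statistics.mean(scores)          # arithmetic average
median = statistics.median(scores)      # middle score after sorting
modes = statistics.multimode(scores)    # list of the most frequent score(s)
```

Here the mean is 4, the median is 2, and the only mode is 2.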

Descriptive Statistics

These are used to explain the basic characteristics of study data, including describing the numbers and what they show. They help us to simplify large amounts of data by providing a summary that may enable comparisons across people or other units. A bridge between measurement and understanding.

Skewed Distribution

When a variable does not fall within a normal distribution. An asymmetrical but generally bell-shaped distribution; its mode, or most frequent response, lies off to one side

Can you have reliability without validity?

Yes, but you CANNOT have validity without reliability

Criterion Validity; Predictive Evidence

The criterion is measured at a later time than the test is administered. Test scores are kept on record and compared with a criterion measure obtained sometime in the future. Ex. A high school math test yields predictive evidence of validity if it can predict some aspect of college performance.

Ordinal Scale of measurement

Divide observations into categories and provide measurement by order and rank. Does NOT take intervals into account; a higher number is a higher value, but intervals between numbers are not necessarily equal. Allows calculation of the mode and median but not the mean. Ex: Likert scale

Random errors

error that results from chance alone and influences measures arbitrarily.

Evidence of internal structure

evidence that the items on the measure relate to one another (one factor or multiple components of construct) in a way that reflects the theoretical basis of the construct Encapsulates how well the structure and relationship of the variables in the assessment correspond with the theoretical understanding of the construct

Inter-rater reliability

for tests with subjective responses: this assesses the consistency or agreement between two or more scorers when making independent ratings of a particular construct

Multimodal Distribution

frequency distribution with two or more high frequencies separated by a lower frequency; a bimodal distribution is the special case of two high frequencies

Z-Scores

indicates how many standard deviations a score is above or below the mean: 0 = the mean, 1 = one standard deviation above the mean, -1 = one standard deviation below the mean

Level of measurement of a variable

is a classification that describes the nature of information contained within the numbers that are assigned to that particular variable.

Cronbach's alpha

An indicator of internal consistency reliability assessed by examining the average correlation of each item (question) in a measure with every other question.
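In practice alpha is usually computed from variances rather than from each pairwise correlation: alpha = (k / (k - 1)) * (1 - sum of item variances / variance of total scores), with k items. That formula is not stated on the card and is assumed here, along with sample variances (n - 1). The function names and ratings are hypothetical.

```python
def sample_variance(values):
    """Sample variance (divides by n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(rows[0])   # number of items
    item_variances = [sample_variance([row[i] for row in rows]) for i in range(k)]
    total_variance = sample_variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: 4 respondents answering 3 items on a 1-5 scale.
ratings = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [1, 2, 2],
]
alpha = cronbach_alpha(ratings)
```

For these ratings alpha works out to 228/235, about 0.97, reflecting items that rise and fall together across respondents.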

Evidence of Response processes

Analysis of responses to individual test items, rationale of responses, performance strategies, even possibly eye movement and response times

Stanine Averages

Above average = 9, 8, 7; Average = 6, 5, 4; Below average = 3, 2, 1

Parallel Form Reliability

Administer two similar but not identical tests (Form A and Form B) and correlate the scores. The most expensive and demanding method of estimating reliability.

test-retest reliability

Administer the same test twice and correlate the scores. The closer the scores, the greater the reliability.

Frequency distribution

An arrangement of data into a table that indicates how often a particular score or observation occurs. The most common way to describe a single variable and display the chaos of numbers in an organized manner.

Evidence of consequences of testing

An examination of the possible consequence of using a particular assessment. Ensures that no harm comes from using the assessment

