chapter 3 standardized assessment
reliability subtypes -split-half reliability:
it is a measure of internal consistency. here the examiner correlates the scores from one half of the test with those from the other half of the test. both halves of the test should be equivalent in content and difficulty level.
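As a rough illustration of the correlation step, here is a minimal Python sketch. It assumes an odd/even item split and a hand-written Pearson correlation; the function names and the score data are hypothetical and are not taken from any particular test manual.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)

def split_half_r(item_scores):
    """item_scores holds one list of 0/1 item scores per examinee.
    The odd/even split used here is an assumption; any split into
    two halves of equivalent content and difficulty would do."""
    odd_totals = [sum(person[0::2]) for person in item_scores]
    even_totals = [sum(person[1::2]) for person in item_scores]
    return pearson_r(odd_totals, even_totals)

# Hypothetical data: 5 examinees, 6 items each (1 = correct, 0 = incorrect)
scores = [
    [1, 1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
print(round(split_half_r(scores), 2))
```

A coefficient closer to 1 suggests the two halves rank examinees similarly, which is what internal consistency is meant to capture.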
Standardized and norm-referenced tests -the normal bell-shaped curve has
the majority of scores in the middle (around the mean or median score), with a rapid decrease in the number of people as scores move away from the middle in either direction
standardized and norm-referenced tests -percentiles represent
percentage of individuals in the standardization sample scoring at or below a given raw score. The 50th percentile represents the mean or median score. People who score above the 50th percentile are "above average" and those who score below the 50th percentile are considered "below average". Percentiles are based on the number of individuals who score below or above a particular raw score.
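The "at or below" definition can be sketched in a few lines of Python. The function name and the sample scores are made up for illustration; a real standardization sample would be far larger.

```python
def percentile_rank(raw_score, sample_scores):
    """Percentage of the sample scoring at or below raw_score."""
    at_or_below = sum(1 for s in sample_scores if s <= raw_score)
    return 100 * at_or_below / len(sample_scores)

# Hypothetical standardization sample of 10 raw scores
sample = [12, 15, 15, 18, 20, 22, 25, 27, 30, 33]

# 5 of the 10 scores are at or below 20
print(percentile_rank(20, sample))
```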
standardized and norm-referenced tests -most standardized test scores are statistically transformed into other kinds of scores:
percentiles, standard scores (Z) and stanines
Standardized and norm-referenced tests -the term norm refers to
a performance measure of a normative group on a tested skill. A normative group typically represents a sample of same-aged individuals, possibly of the same sex as the client
statistical measures -validity:
refers to accuracy of measurements. can be classified into different subtypes
standardized and norm-referenced tests -raw score
refers to the initial score based on correct responses to test items. Typically, raw scores are not compared with each other
validity subtypes -construct validity:
related to the degree to which a test measures a predetermined theoretical construct. it is expressed in correlation coefficients, which may be positive or negative in direction
validity subtypes -content validity:
related to whether the test includes items that are relevant to assessing the purported skill. a test should include items that help sample the purported skill, facilitate a comprehensive assessment of the skill being measured, and fit in with the content.
Standardized and norm-referenced tests -Normal distribution of a norm-referenced test is
represented as a bell-shaped curve
Questionnaires and developmental inventories -in addition to standardized tests,
SLPs include questionnaires and developmental inventories to gather information about the children's speech and language skills.
validity subtypes -criterion validity:
degree to which a particular test measure is correlated with another meaningful variable. this further includes how a new test correlates with an established test measuring the same skills (concurrent validity) and how a test predicts future performance on a related task (predictive validity).
background -SLPs use standardized tests to
diagnose communication disorders and to determine children's eligibility for intervention services
test construction -statistical analysis:
different statistical analyses are completed to define the test scores
basal and ceiling levels -basal score:
entry level in a test
background -Standardized testing of human participants stems from
"mental tests"
standardized and norm-referenced test -z-score=
(individual score - mean) / SD
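A worked example of the formula in Python. The mean of 100 and SD of 15 are assumed values chosen only for illustration.

```python
def z_score(individual_score, mean, sd):
    """Number of standard deviations a score lies above (+) or below (-) the mean."""
    return (individual_score - mean) / sd

# Hypothetical test with mean 100 and SD 15; a score of 85 is 1 SD below the mean
print(z_score(85, 100, 15))  # -1.0
```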
standardized and norm-referenced tests -in a normal bell-shaped curve
-68.26% of all scores fall within 1 S.D. of the mean (34.13% above and 34.13% below the mean)
-95.44% of all scores fall within 2 S.D. of the mean (47.72% above and 47.72% below the mean)
-99.74% of all scores fall within 3 S.D. of the mean (49.87% above and 49.87% below the mean)
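These proportions can be checked against the standard normal distribution. The sketch below uses Python's `math.erf`; the helper name is hypothetical. Textbook figures are usually obtained by doubling a rounded one-sided value, so the computed totals may differ from them in the last decimal place.

```python
from math import erf, sqrt

def proportion_within(k):
    """Proportion of a normal distribution lying within k SDs of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    total = 100 * proportion_within(k)
    print(f"within {k} SD: {total:.2f}% ({total / 2:.2f}% on each side of the mean)")
```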
limitations of standardized tests
-highly structured and standardized tests often do not allow a sampling of natural social interactions
-provide limited opportunities for the child to initiate interaction
-many tests artificially isolate individual aspects of communication
-often include verbal language skills and overlook preverbal or nonverbal communication
-allow for little individual variation in children's communication
-sometimes norms may not accurately reflect the general population
-sometimes all the relevant behaviors are not adequately sampled
-sometimes results from standardized tests may not be helpful in formulating specific treatment goals
-stimulus items may sometimes be discriminatory or inappropriate for children of diverse socioeconomic or cultural backgrounds
-may not be appropriate for ethnocultural minority groups
Advantages of parent-based questionnaires:
-provide unique perspectives of parents about how their child communicates on a daily basis
-offer information about changes in the child's skills over an extended time period
-give insights about parents' actions and reactions to any limitations or deficiencies
-gather information about young children's communication skills in a relatively inexpensive manner
-provide information in addition to the standardized assessment which may be used for assessment and treatment planning
-provide a valid and efficient means of assessing a child's naturalistic communication skills
-offer a means of assessing the reliability of information obtained during clinical assessment
strengths of standardized tests
-relatively inexpensive
-little to no systemic training required before use
-part of policy requirements that define need for intervention services
-convenient and easy to administer
-variety of speech and language skills can be assessed in relatively shorter durations
-results across various test administrations and examiners can be compared
-can be effective in assessing cultural minorities
standardized and norm-referenced tests -information provided by norm-referenced tests:
-the sample size should not be less than 100, as smaller samples provide extremely limited generalizability
-greater variety in the ethnocultural and socioeconomic levels of individuals provides higher applicability of the test
-more varied geographical distribution of the samples leads to wider applicability of the test
-the range of education, IQ, medical status, and occupation of the sampled individuals should be mentioned in the test manual
-the test manual should include descriptive statistics (means and standard deviations for all groups to whom the test items were administered). Sometimes the test manual can also include additional statistical information about the raw scores.
standardized and norm-referenced test -T-scores are based on
a normal probability distribution. here the mean is 50 and SD is 10.
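Given a mean of 50 and an SD of 10, a T-score can be obtained by rescaling a z-score. A short Python sketch follows; the function name and the test's mean and SD are assumptions for illustration.

```python
def t_score(individual_score, mean, sd):
    """Convert a raw score to a T-score (mean 50, SD 10) via its z-score."""
    z = (individual_score - mean) / sd
    return 50 + 10 * z

# Hypothetical test with mean 100 and SD 15: a score 1 SD below the mean
print(t_score(85, 100, 15))  # 40.0
```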
standardized and norm-referenced tests -The mean represents
a statistical average of individual performance and may or may not exactly correspond to the median. It may deviate or be different from the median
test construction -test purpose:
a test must be designed with a clear statement of purpose
background -at the beginning of 20th century, psychological tests included tests to measure
aptitude, personality, and education performance
standardized and norm-referenced tests -percentages of raw scores are calculated
based on number of correct and incorrect responses out of the total number of test items.
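A minimal sketch of the percentage calculation in Python; the item counts are hypothetical.

```python
def percent_correct(num_correct, total_items):
    """Percentage of test items answered correctly."""
    return 100 * num_correct / total_items

# 18 correct responses out of 24 items
print(percent_correct(18, 24))  # 75.0
```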
standardized and Norm-referenced tests -process of standardization includes
careful selection of test items, administration of items to a representative sample, statistical analysis of results, establishment of age-based norms, development of instructions, and response scoring procedures
Standardized and Norm-referenced tests -Norm-referenced tests help in
comparing a client's performance with other individuals of a normative group
statistical measures -reliability:
consistency across measurements. can be classified into different subtypes
reliability subtypes -intraobserver reliability (AKA test-retest reliability):
consistency in test scores of an individual when the same examiner readministers the test or repeats a naturalistic observation. It is one of the easiest ways to establish reliability of standardized tests. same clinician gives the test two different times.
reliability subtypes -alternate or parallel form reliability:
consistency in test scores when two forms of the same test are administered to the same person or to a group of participants.
reliability subtypes -interobserver reliability (AKA interjudge reliability):
consistency of test scores when recorded by two or more examiners administering the same test to the same individual
test construction -test administration and scoring procedures:
for effective test administration and scoring, written directions must outline exactly what the examiner does and says to give test instructions and administer the test
Standardized and Norm-referenced tests -A standardized assessment should be
self-sufficient in terms of stimulus materials and recording forms
basal and ceiling levels -in some of the longer tests
sometimes the clinician starts from an arbitrary level judged appropriate for the participant instead of at the very beginning of the test
Standardized and Norm-referenced tests -all norm-referenced tests are
standardized, whereas most, but not all, standardized tests are norm-referenced
basal and ceiling levels -ceiling score:
terminating score in a test
test construction -normative sample:
test developers typically select a normative sample that is much smaller than the population but is expected to represent it
basal and ceiling levels -if the judged entry proves wrong,
the clinician may move down to find the basal level. once a basal level is established, all test items prior to it are considered correct and the testing continues forward.
standardized and norm-referenced tests -standard scores represent
the degree to which a participant's score deviates from the mean. Two common types of standard scores are the z-score and the T-score
standardized and norm-referenced tests -Standard deviation (S.D.) refers to
the extent to which scores deviate from the mean. An individual's score can be described by how many standard deviations it lies above or below the mean
basal and ceiling levels -the ceiling refers to
the highest number or level of test items at which the test was stopped as the remaining higher level items are all considered failed
test construction -stimulus items:
the test items should be assessed for their relevance, validity, and reliability
basal and ceiling levels -raw scores are calculated
with basal and ceiling scores in mind
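One common way to apply this is sketched below in Python. It assumes every item before the basal is credited as correct and every item after the ceiling is treated as failed; the function name and the response data are hypothetical, and individual test manuals may define the basal and ceiling rules differently.

```python
def raw_score(basal_item, administered):
    """basal_item: 1-based number of the item where the basal was established.
    administered: True/False results for the items from basal_item up to
    and including the ceiling item.
    Items before the basal are credited as correct; items after the
    ceiling are treated as failed and add nothing to the score."""
    credited_below_basal = basal_item - 1
    return credited_below_basal + sum(administered)

# Hypothetical session: basal at item 21, 15 items administered, 10 correct
responses = [True] * 8 + [False, True, False, True, False, False, False]
print(raw_score(21, responses))  # 20 credited items + 10 correct = 30
```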