Chapter 1
A good assessment is
1. A good assessment is thorough. It should incorporate as much relevant information as possible so that an accurate diagnosis and appropriate recommendations can be made.
2. A good assessment uses a variety of assessment modalities. It should include a combination of interview and case history information, formal and informal testing, and client observations.
3. A good assessment is valid. It should truly evaluate the intended skills.
4. A good assessment is reliable. It should accurately reflect the client's communicative abilities and disabilities. Repeated evaluations of the same client should yield similar findings, provided there has been no change in the client's status.
5. A good assessment is tailored to the individual client. Assessment materials that are appropriate for the client's age, gender, skill levels, and ethnocultural background should be used.
Seven steps to completing an assessment
1. Obtain historical information about the client, the client's family or caregivers, developmental and medical history, and the nature of the disorder.
2. Interview the client, the client's family or caregivers, or both.
3. Evaluate the structural and functional integrity of the orofacial mechanism.
4. Evaluate the client's functional abilities in the areas of articulation, language, fluency, voice, resonance, and/or cognition. In the case of a dysphagia assessment, assess the client's chewing and swallowing abilities.
5. Screen the client's hearing or obtain evaluative information about hearing abilities.
6. Analyze assessment information to determine diagnosis or conclusions, prognosis, and recommendations.
7. Share clinical findings through formal written records (usually a report) and a meeting with the client or caregiver and other professionals (such as a physician or members of an interdisciplinary team).
To calculate chronological age:
1. Record the test administration date as year, month, day.
2. Record the client's birth date as year, month, day.
3. Subtract the birth date from the test date. If necessary, borrow 12 months from the year column and add them to the month column, reducing the year by one, and/or borrow 30 or 31 days (based on the number of days in the month borrowed from) from the month column and add them to the day column, reducing the month by one.
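The column subtraction above can be sketched in Python. This is a minimal sketch, not a clinical tool: it borrows days before months (the usual order for column subtraction), and it assumes that the "month borrowed from" is the calendar month preceding the test month, so its true length (28 to 31 days) is added. The function name `chronological_age` is hypothetical.

```python
import calendar
from datetime import date


def chronological_age(test_date: date, birth_date: date) -> tuple[int, int, int]:
    """Return (years, months, days) by column subtraction: test date minus birth date."""
    if birth_date > test_date:
        raise ValueError("birth date must not be after the test date")
    year, month, day = test_date.year, test_date.month, test_date.day
    # Borrow from the month column while the day column is too small.
    # Assumption: each borrowed month is the calendar month before (year, month).
    while day < birth_date.day:
        month -= 1
        if month == 0:  # borrowing past January rolls back into the previous year
            month, year = 12, year - 1
        day += calendar.monthrange(year, month)[1]  # number of days in the borrowed month
    # Borrow 12 months from the year column if the month column is too small.
    if month < birth_date.month:
        month += 12
        year -= 1
    return year - birth_date.year, month - birth_date.month, day - birth_date.day


# Example: tested 5 March 2024, born 20 May 2020.
print(chronological_age(date(2024, 3, 5), date(2020, 5, 20)))  # (3, 9, 14)
```

In this example February 2024 (29 days) is the month borrowed from, which gives 3 years, 9 months, 14 days. Clinics that always borrow 30 days may get a result a day or two different.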
The Empirical Rule for a normal curve states that:
68% of all outcomes will fall within one standard deviation of the mean (34% on each side). 95% of all outcomes will fall within two standard deviations of the mean (47.5% on each side). 99.7% of all outcomes will fall within three standard deviations of the mean (49.85% on each side).
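The percentages in the Empirical Rule are rounded areas under the standard normal curve. A short sketch using Python's standard-library `statistics.NormalDist` recomputes them; the helper name `proportion_within` is hypothetical.

```python
from statistics import NormalDist


def proportion_within(k: float) -> float:
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    share = proportion_within(k)
    # By symmetry, half of the share lies on each side of the mean.
    print(f"within {k} SD: {share:.2%} total, {share / 2:.2%} on each side")
```

The loop prints about 68.27%, 95.45%, and 99.73%, which round to the figures above.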
Disadvantages of norm-referenced tests
Norm-referenced tests do not allow for individualization. Tests are generally static; they tell what a person knows, not how a person learns. The testing situation may be unnatural and not representative of real life. The approach evaluates isolated skills without considering other contributing factors. Norm-referenced tests must be administered exactly as instructed for the results to be considered valid and reliable. Test materials may not be appropriate for certain clients, such as culturally and linguistically diverse clients.
norm-referenced test
Norm-referenced tests help answer the question, "How does my client compare to the average?" It is the responsibility of test developers to determine normative standards that will identify averages for a given test. Test developers accomplish this by administering the test to a representative sample group. The results of this sample are analyzed to establish the normal distribution. This normal distribution then provides a range of scores by which others are judged when they take the same test.
concurrent criterion validity
One type of criterion validity; it means the test's results correspond to those of an established standard. For example, the Stanford-Binet Intelligence Scale is widely accepted as a valid assessment of intelligence. Newer intelligence tests are compared to the Stanford-Binet, which serves as the criterion.
predictive criterion validity
One type of criterion validity; it means the test predicts performance (the criterion) in another situation or in the future. There is a relationship between the behaviors the test measures and the criterion behavior or skill. An example is an admissions exam such as the Graduate Record Examination (GRE), which is expected to predict future academic performance.
ASHA Code of Ethics
Speech-language pathologists have an obligation to provide services with professional integrity, achieve the highest possible level of clinical competence, and serve the needs of the public. Clinicians need to be aware of biases and prejudices that may be personally held or prevalent in society. Such biases and prejudices should not affect the client-clinician relationship or the assessment process. All clients should be treated with the utmost respect. It is the clinician's responsibility to determine whether a communicative disorder exists and, if so, recommend a treatment plan that is in the best interests of the client. Negative feelings or attitudes should never affect clinical impressions or decisions. Principles of professional ethics and conduct are outlined in the American Speech-Language-Hearing Association (ASHA) Code of Ethics.
authentic assessment strategies
Systematic observations; real-life simulations; language sampling; structured symbolic play; short-answer and extended-answer responses; self-monitoring and self-assessment; use of anecdotal notes and checklists; videotaping; audiotaping; involvement of caregivers and other professionals.
Administering and Scoring Tests.
Test users should administer and score tests correctly and fairly.
1. Follow established procedures for administering tests in a standardized manner.
2. Provide and document appropriate procedures for test takers with disabilities who need special accommodations or those with diverse linguistic backgrounds. Some accommodations may be required by law or regulation.
3. Provide test takers with an opportunity to become familiar with test question formats and any materials or equipment that may be used during testing.
4. Protect the security of test materials, including respecting copyrights and eliminating opportunities for test takers to obtain scores by fraudulent means.
5. If test scoring is the responsibility of the test user, provide adequate training to scorers and ensure and monitor the accuracy of the scoring process.
6. Correct errors that affect the interpretation of the scores and communicate the corrected results promptly.
7. Develop and implement procedures for ensuring the confidentiality of scores.
Informing Test Takers.
Test users should inform test takers about the nature of the test, test taker rights and responsibilities, the appropriate use of scores, and procedures for resolving challenges to scores.
1. Inform test takers in advance of the test administration about the coverage of the test, the types of question formats, the directions, and appropriate test-taking strategies. Make such information available to all test takers.
2. When a test is optional, provide test takers or their parents/guardians with information to help them judge whether a test should be taken—including indications of any consequences that may result from not taking the test (e.g., not being eligible to compete for a particular scholarship)—and whether there is an available alternative to the test.
3. Provide test takers or their parents/guardians with information about rights test takers may have to obtain copies of tests and completed answer sheets, to retake tests, to have tests rescored, or to have scores declared invalid.
4. Provide test takers or their parents/guardians with information about responsibilities test takers have, such as being aware of the intended purpose and uses of the test, performing at capacity, following directions, and not disclosing test items or interfering with other test takers.
5. Inform test takers or their parents/guardians how long scores will be kept on file and indicate to whom, under what circumstances, and in what manner test scores and related information will or will not be released. Protect test scores from unauthorized release and access.
6. Describe procedures for investigating and resolving circumstances that might result in canceling or withholding scores, such as failure to adhere to specified testing procedures.
7. Describe procedures that test takers, parents/guardians, and other interested parties may use to obtain more information about the test, register complaints, and have problems resolved.
Reporting and Interpreting Test Results.
Test users should report and interpret test results accurately and clearly.
1. Interpret the meaning of the test results, taking into account the nature of the content, norms or comparison groups, other technical evidence, and benefits and limitations of test results.
2. Interpret test results from modified test or test administration procedures in view of the impact those modifications may have had on test results.
3. Avoid using tests for purposes other than those recommended by the test developer unless there is evidence to support the intended use or interpretation.
4. Review the procedures for setting performance standards or passing scores. Avoid using stigmatizing labels.
5. Avoid using a single test score as the sole determinant of decisions about test takers. Interpret test scores in conjunction with other information about individuals.
6. State the intended interpretation and use of test results for groups of test takers. Avoid grouping test results for purposes not specifically recommended by the test developer unless evidence is obtained to support the intended use. Report procedures that were followed in determining who were and who were not included in the groups being compared, and describe factors that might influence the interpretation of results.
7. Communicate test results in a timely fashion and in a manner that is understood by the test taker.
8. Develop and implement procedures for monitoring test use, including consistency with the intended purposes of the test.
Selecting Appropriate Tests.
Test users should select tests that meet the intended purpose and that are appropriate for the intended test takers.
1. Define the purpose for testing, the content and skills to be tested, and the intended test takers. Select and use the most appropriate test based on a thorough review of available information.
2. Review and select tests based on the appropriateness of test content, skills tested, and content coverage for the intended purpose of testing.
3. Review materials provided by test developers and select tests for which clear, accurate, and complete information is provided.
4. Select tests through a process that includes persons with appropriate knowledge, skills, and training.
5. Evaluate evidence of the technical quality of the test provided by the test developer and any independent reviewers.
6. Evaluate representative samples of test questions or practice tests, directions, answer sheets, manuals, and score reports before selecting a test.
7. Evaluate procedures and materials used by test developers, as well as the resulting test, to ensure that potentially offensive content or language is avoided.
8. Select tests with appropriately modified forms or administration procedures for test takers with disabilities who need special accommodations.
9. Evaluate the available evidence on the performance of test takers of diverse subgroups. Determine, to the extent feasible, which performance differences may have been caused by factors unrelated to the skills being assessed.
Advantages of Authentic Assessment
The approach is natural and most like the real world. Clients participate in self-evaluation and self-monitoring. The approach allows for individualization. This is particularly beneficial with culturally diverse clients or special needs clients. The approach offers flexibility.
disadvantages of authentic assessment
The approach may lack objectivity. Procedures are not usually standardized; thus, reliability and validity are less assured. Implementation requires a high level of clinical experience and skill. The approach is not efficient, requiring a lot of planning time. Authentic assessment may be impractical in some situations. Insurance companies and school districts prefer known test entities for third-party payment and qualification for services.
Each test's manual should include information about:
The purpose(s) of the test; the age range for which the test is designed and standardized; test construction and development; administration and scoring procedures; the normative sample group and statistical information derived from it; test reliability; test validity.
Disadvantages of Criterion-Referenced Tests
The testing situation may be unnatural and not representative of real life. The approach evaluates isolated skills without considering other contributing factors. Standardized criterion-referenced tests do not allow for individualization. Standardized criterion-referenced tests must be administered exactly as instructed for the results to be considered valid and reliable.
Advantages of norm-referenced tests
The tests are objective. The skills of an individual can be compared to those of a large group of similar individuals. Test administration is usually efficient. Many norm-referenced tests are widely recognized, allowing for a common ground of discussion when other professionals are involved with the same client. Clinicians are not required to have a high level of clinical experience and skill to administer and score tests (administration and interpretation guidelines are clearly specified in the accompanying manual). Insurance companies and school districts prefer known test entities for third-party payment and qualification for services.
Advantages of Criterion-Referenced Tests
The tests are usually objective. Test administration is usually efficient. Many criterion-referenced tests are widely recognized, allowing for a common ground of discussion when other professionals are involved with the same client. Insurance companies and school districts prefer known test entities for third-party payment and for qualification for services. With nonstandardized criterion-referenced tests, there is some opportunity for individualization.
Dynamic assessment (DA)
a form of authentic assessment. The purpose of dynamic assessment is to evaluate a client's learning potential based on his or her ability to modify responses after the clinician provides teaching or other assistance. It is an especially appropriate strategy when assessing clients with cognitive communication disorders or those from culturally and linguistically diverse backgrounds (enabling clinicians to distinguish between a language disorder and language difference).
Standardized tests
also called formal tests, are those that provide standard procedures for the administration and scoring of the test. Standardization is accomplished so that test-giver bias and other extraneous influences do not affect the client's performance and so that results from different people are comparable. There are many commercially available speech and language assessment tools that are standardized. Most of the standardized tests clinicians use are norm-referenced, but the terms standardized and norm-referenced are not synonymous. Any type of test can be standardized as long as uniform test administration and scoring are used.
Alternate form reliability
also called parallel form reliability, refers to a test's correlation coefficient with a similar test. It is determined by administering a test (Test A) to a group of people and then administering a parallel form of the test (Test B) to the same group of people. The two sets of test results are compared to determine the test's alternate form reliability.
Authentic assessment
also known as alternative assessment or nontraditional assessment. Like criterion-referenced assessment, authentic assessment identifies what a client can and cannot do. The differentiating aspect of authentic assessment is its emphasis on contextualized testing. The test environment is more realistic and natural. For example, when assessing a client with a social language disorder, it may not be meaningful to use contrived test materials administered in a clinic. It may be more valid and useful to observe the client in real-life situations, such as interacting with peers at school or talking with family members at home. Another feature of authentic assessment is that it is ongoing. The authentic assessment approach evaluates the client's performance during diagnostic and treatment phases. Assessment information is maintained in a client portfolio, which offers a broad portrait of the client's skills across time and in different settings. When appropriate, the client actively participates in reviewing the portfolio and adding new materials. This provides an opportunity for the client to practice self-monitoring and self-evaluation. Artifacts of the client's performance on standardized tests, nonstandardized tests, and treatment tasks are items that are included in the client's portfolio. Using an authentic assessment approach requires more clinical skill, experience, and creativity than formal assessment does because skills are assessed qualitatively. Testing environments are manipulated to the point of eliciting desired behavior, yet not so much that the authentic aspect of the client's responses is negated.
adjusted age
also referred to as corrected age. Adjusted age considers the gestational development that was missed due to premature delivery. For example, a normal 10-month-old baby born 8 weeks premature would be more similar, developmentally, to a normal 8-month-old. This is important when considering milestones that have or have not been achieved and when applying standardized norms. Adjusted age is determined by using the child's due date, rather than actual birth date, when calculating chronological age. Adjusted age becomes less relevant as a child grows and is generally not a consideration for children over age 3.
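Because adjusted age is simply chronological age calculated from the due date, the same column subtraction can be reused. In this minimal sketch, the function names `age_ymd` and `adjusted_age` are hypothetical, and borrowed days are taken from the calendar month preceding the test month (an assumption). The dates are illustrative and mirror the example above: a baby born 8 weeks early and tested at 10 months.

```python
import calendar
from datetime import date, timedelta


def age_ymd(test_date: date, start_date: date) -> tuple[int, int, int]:
    """Column-subtraction age (years, months, days) from start_date to test_date."""
    year, month, day = test_date.year, test_date.month, test_date.day
    while day < start_date.day:  # borrow days from the preceding calendar month
        month -= 1
        if month == 0:
            month, year = 12, year - 1
        day += calendar.monthrange(year, month)[1]
    if month < start_date.month:  # borrow 12 months from the year column
        month, year = month + 12, year - 1
    return year - start_date.year, month - start_date.month, day - start_date.day


def adjusted_age(test_date: date, due_date: date) -> tuple[int, int, int]:
    """Adjusted (corrected) age: the same calculation, started from the due date."""
    if due_date > test_date:
        raise ValueError("test date precedes the due date")
    return age_ymd(test_date, due_date)


birth = date(2023, 1, 1)
due = birth + timedelta(weeks=8)  # hypothetical: born 8 weeks before the due date
test = date(2023, 11, 1)
print(age_ymd(test, birth))     # chronological age: (0, 10, 0)
print(adjusted_age(test, due))  # adjusted age: (0, 8, 6)
```

The adjusted age of about 8 months matches the developmental comparison described above.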
The scaled score
also reflects performance compared to the normative sample. However, scaled scores do not necessarily follow a normal distribution, meaning 50% of the people in the sample group do not necessarily fall above or below the average. Scaled scores allow the tester to compare the abilities of the test taker to the appropriate normative sample (as defined by the test designer in terms of age, gender, ethnicity, etc.).
Regardless of the approach used,
always use the most recent edition of a published test. Insurance and legal agencies require this, and it is best practice not to use outdated or obsolete materials.
Modifications
are changes to the test's standardized administration protocol. For example, a test giver might reword or simplify instructions, allow extra time on timed tests, repeat prompts, offer verbal or visual cues, skip test items, allow the test taker to explain or correct responses, and so forth. Any such instance of altering the standardized manner of administration invalidates the norm-referenced scores. Findings may still have value from a diagnostic point of view, but the test can no longer be considered a standardized administration. If variance from the standardized procedure is required, always include that information in the written report.
Accommodations
are minor adjustments to a testing situation that do not compromise a test's standardized procedure. For example, large-print versions of visual stimuli may be used, or an aide may assist with recording responses. As long as the content is not altered and administration procedures are consistent with the manual's instructions, the findings are still considered valid and norm-referenced scores can still be applied.
The Code of Fair Testing Practices in Education
developed by the Joint Committee on Testing Practices, which is sponsored by professional associations such as the American Psychological Association, the National Council on Measurement in Education, the American Association for Counseling and Development, and the American Speech-Language-Hearing Association. The guidelines in the code were developed primarily for use with commercially available and standardized tests, although many of the principles also apply to informal testing situations. There are two parts to the code: One is for test developers and publishers, and the other is for those who administer and use the tests.
Criterion-referenced tests
do not attempt to compare an individual's performance to anyone else's (as opposed to norm-referenced tests); rather, they identify what a client can and cannot do compared to a predefined criterion. These tests help answer the question, "How does my client's performance compare to an expected level of performance?" Criterion-referenced tests assume that there is a level of performance that must be met for a behavior to be acceptable. Any performance below that level is considered deviant. For example, when evaluating an aphasic client, it is not helpful to compare the client's speech and language skills to a normative group. It is much more meaningful to compare the client's abilities to a clinical expectation—in this example, intelligible and functional speech and language. Criterion-referenced tests are used most often when assessing clients for neurogenic disorders, fluency disorders, and voice disorders. They may also be used for evaluating some aspects of articulation or language. Criterion-referenced tests may or may not be standardized.
The Health Insurance Portability and Accountability Act (HIPAA)
a federal law designed to improve the health care system by allowing consumers to continue and transfer health insurance coverage after a job change or job loss; reducing health care fraud; mandating industry-wide standards for electronic transmission of health care information and billing; and protecting the privacy and confidentiality of health information.
The stanine (standard nine)
is an additional method of ranking an individual's test performance. A stanine is a score based on a 9-unit scale, where a score of 5 describes average performance. Each stanine unit (except for 1 and 9) is equally distributed across the curve. Most people (54%) score stanines of 4, 5, or 6; few people (8%) score a stanine of 1 or 9.
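Assuming the conventional stanine definition (each band is half a standard deviation wide, stanine 5 is centered on the mean, and stanines 1 and 9 are open-ended), a z-score can be mapped to a stanine, and the expected share of people in each band can be recovered from the normal curve. The helper names below are hypothetical.

```python
import math
from statistics import NormalDist


def stanine(z: float) -> int:
    """Map a z-score to a stanine; bands are 0.5 SD wide and stanine 5 spans z = -0.25 to 0.25."""
    return max(1, min(9, math.floor(2 * z + 5.5)))


def stanine_share(s: int) -> float:
    """Expected proportion of a normal distribution falling in stanine s (1 to 9)."""
    nd = NormalDist()
    lower = 0.0 if s == 1 else nd.cdf((s - 5.5) / 2)  # stanine 1 has no lower bound
    upper = 1.0 if s == 9 else nd.cdf((s - 4.5) / 2)  # stanine 9 has no upper bound
    return upper - lower


print([round(100 * stanine_share(s)) for s in range(1, 10)])
print(f"stanines 4-6: {sum(stanine_share(s) for s in (4, 5, 6)):.1%}")
print(f"stanines 1 and 9: {stanine_share(1) + stanine_share(9):.1%}")
```

The band shares round to 4, 7, 12, 17, 20, 17, 12, 7, and 4 percent. The 54% figure above is the sum of the rounded values 17 + 20 + 17; the unrounded share for stanines 4 through 6 is about 54.7%.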
The z-score
is another expression of test-taker performance compared to the normative sample. The z-score tells how many standard deviations the raw score is from the mean. The z-score is useful because it shows where an individual score lies along the continuum of the bell-shaped curve and thus tells how different the test taker's score is from the average.
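In formula terms, the z-score is the raw score minus the normative mean, divided by the normative standard deviation. A minimal Python sketch follows; the norms used (mean 40, SD 8) are hypothetical stand-ins for values a test manual would supply.

```python
def z_score(raw_score: float, norm_mean: float, norm_sd: float) -> float:
    """Number of standard deviations a raw score lies from the normative mean."""
    if norm_sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (raw_score - norm_mean) / norm_sd


# Hypothetical norms for the client's age group: mean raw score 40, SD 8.
print(z_score(28, norm_mean=40, norm_sd=8))  # -1.5
```

A negative z-score falls below the mean and a positive one falls above it. Here, a raw score of 28 lies 1.5 standard deviations below average.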
Chronological age
is the exact age of a person in years, months, and days. It is important for analyzing findings from standardized tests, as it allows the clinician to convert raw data into meaningful scores.
The raw score
is the initial score obtained based on the number of correct or incorrect responses. Some tests award more than one point for a correct response. Incorrect calculation of raw score will skew all findings and make test results inaccurate. Raw scores are not meaningful until converted to other scores or ratings.
Assessment
is the process of collecting valid and reliable information, and then integrating and interpreting it to make a judgment or a decision about something. The outcome of an assessment is usually a diagnosis, which is the clinical decision regarding the presence or absence of a disorder and, often, the assignment of a diagnostic label. Speech-language pathologists use assessment information to make professional diagnoses and conclusions, identify the need for referral to other professionals, identify the need for treatment, determine the focus of treatment, determine the frequency and length of treatment, and make decisions about the structure of treatment (e.g., individual versus group sessions, treatment with or without caregiver involvement). Ultimately, clinical decisions are based on information derived from an assessment process.
Criterion validity
means a test is related to an external criterion in a predictive or concurrent way. There are two types: concurrent criterion validity and predictive criterion validity.
Face validity
means that a test looks like it assesses the skill it claims to assess. A layperson can make this judgment. Face validity alone is not a valuable measure of validity because it is based merely on appearance, not on content or outcomes.
Construct validity
means that a test measures the theoretical construct it claims to measure. A construct is an explanation of a behavior or attribute based on empirical observation and knowledge. For example, the construct that children's language skills improve with age is based on language development studies. Therefore, a valid test of childhood language skills will demonstrate this known language-growth construct and show improved language skills when administered to normally developing children of progressively increasing ages.
Content validity
means that a test's contents are representative of the content domain of the skill being assessed. For example, an articulation test with good content validity will elicit all phonemes in their range of contexts, thereby assessing the spectrum of articulation. Content validity is related to face validity. Content validity, however, judges the actual content of the test (rather than superficial appearance) and is judged by individuals with expert knowledge.
Psychometrics refers to the
measurement of human traits, abilities, and certain processes. The key psychometric properties of a test are its validity and reliability.
Interrater reliability
one type of rater reliability, established if results are consistent when more than one person rates the test.
Intrarater reliability
one type of rater reliability, established if results are consistent when the same person rates the test on more than one occasion.
Split-half reliability
refers to a test's internal consistency. Scores from one-half of the test correlate with results from the other half of the test. The halves must be comparable in style and scope, and all items should assess the same skill. This is often achieved by dividing the test into odd-numbered questions and even-numbered questions.
Test-retest reliability
refers to a test's stability over time. It is determined by administering the same test multiple times to the same group and then comparing the scores. If the scores from the different administrations are the same or very similar, the test is considered stable and reliable.
Rater reliability
refers to the level of agreement among individuals rating a test. It is determined by administering a single test and audio- or videotaping it so it can be scored multiple times. There are two types of rater reliability: intrarater reliability and interrater reliability.
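Test manuals report rater reliability in several ways. One simple index is point-by-point percent agreement between two scorings of the same recording. In the sketch below, the helper name and item scores are hypothetical. Passing two different raters' scores gives an interrater figure, and passing one rater's scores from two occasions gives an intrarater figure.

```python
def percent_agreement(ratings_a: list[int], ratings_b: list[int]) -> float:
    """Share of items on which two scorings of the same recording agree."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equal-length, non-empty lists of item scores")
    matches = sum(a == b for a, b in zip(ratings_a, ratings_b))
    return matches / len(ratings_a)


# Hypothetical item scores (1 = correct, 0 = incorrect) from one videotaped administration.
clinician_1 = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
clinician_2 = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]
print(percent_agreement(clinician_1, clinician_2))  # 0.9
```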
Basals and ceilings
The basal refers to the starting point for test administration and scoring; the ceiling refers to the ending point. Basals and ceilings allow the tester to focus on only the most relevant testing material. It would not be worthwhile or efficient, for example, to spend time assessing prespeech babbling skills in a client who communicates in sentences, or vice versa. Review test manuals to determine basals and ceilings. Typically, a starting point is suggested according to a client's age. The basal is then established by eliciting a certain number of consecutively correct responses. If the basal cannot be established from the recommended starting point, administer items that occur before the suggested starting point until the predetermined number of consecutively correct responses is elicited. For example, if a test's basal is three consecutively correct responses and the recommended starting point is Item 20, the tester will start test administration on Item 20. If, however, the client does not answer three consecutive questions correctly, the tester will work backward from Item 20 until the basal is established (i.e., administer Items 19, 18, 17, etc.). The test ceiling is also predetermined and stated in the test manual. A ceiling is typically determined by a requisite number of consecutively incorrect responses. It is imperative to review the manual before administering a test. Basals and ceilings vary with every test. Many tests do not have a basal or ceiling and are designed to be administered in their entirety. In some cases, certain subtests of an individual test require a basal and ceiling, whereas other subtests of the same test do not.
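The basal and ceiling procedure can be simulated on hypothetical responses. This is a sketch of one generic rule set, not any particular test's rules. Each test manual defines its own rules, so the following are assumptions: the basal window may include the start item and the items just after it; items below the basal are credited as correct; and if no basal is found, scoring begins at Item 1. The function name `administer` is hypothetical.

```python
def administer(responses: list[bool], start: int, basal_run: int = 3,
               ceiling_run: int = 3) -> tuple[int, int, int]:
    """Simulate basal/ceiling administration on hypothetical responses.

    responses[i] is True if the client would answer item i + 1 correctly.
    Returns (basal_item, last_item_administered, raw_score).
    """
    n_items = len(responses)

    def correct(item: int) -> bool:
        return responses[item - 1]

    # Basal: from the suggested start, work backward (start, start - 1, ...) until
    # `basal_run` consecutive items beginning at that point are all correct.
    basal = 1  # assumption: with no basal found, scoring starts at Item 1
    for first in range(start, 0, -1):
        window = range(first, first + basal_run)
        if window[-1] <= n_items and all(correct(i) for i in window):
            basal = first
            break

    # Ceiling: continue forward from the basal until `ceiling_run` consecutive errors.
    misses, earned, last = 0, 0, basal - 1
    for item in range(basal, n_items + 1):
        last = item
        if correct(item):
            earned += 1
            misses = 0
        else:
            misses += 1
            if misses == ceiling_run:
                break

    # Assumption: items below the basal are credited as correct.
    raw_score = (basal - 1) + earned
    return basal, last, raw_score


# Hypothetical client: correct on Items 1-20 and 22, incorrect on 21 and 23-30.
answers = [True] * 20 + [False, True] + [False] * 8
print(administer(answers, start=20))  # (18, 25, 21)
```

In this example, Items 20 to 22 contain an error, so the tester works backward and finds the basal at Items 18 to 20. The ceiling is reached after three consecutive errors on Items 23 to 25.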
The standard score
reflects performance compared to average and the normal distribution. Standard scores are used to determine whether a test taker's performance is average, above average, or below average. Test developers calculate the statistical average of the normative sample and assign it a value. The most common standard score average value is 100 (with a standard deviation of 15), although not every test assigns this value.
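On a scale with a mean of 100 and a standard deviation of 15, a standard score equals 100 + 15z. Published tests convert raw scores with the norm tables in their manuals, and those tables need not be a simple linear rescaling. The sketch below therefore only illustrates the idea under a linear assumption. The norms and the function name are hypothetical.

```python
def standard_score(raw_score: float, norm_mean: float, norm_sd: float,
                   scale_mean: float = 100.0, scale_sd: float = 15.0) -> float:
    """Re-express a raw score on a standard-score scale (default mean 100, SD 15)."""
    z = (raw_score - norm_mean) / norm_sd  # distance from the normative mean in SD units
    return scale_mean + scale_sd * z


# Hypothetical norms: mean raw score 40, SD 8.
print(standard_score(28, norm_mean=40, norm_sd=8))  # 77.5
```

A raw score 1.5 standard deviations below the normative mean becomes a standard score of 77.5.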
Age equivalence (or sometimes grade equivalence)
reflects the average raw score for a particular age (or grade). Be aware that these scores are the least useful and most misleading scores obtained from a standardized test. Although it seems logical that raw scores transfer easily to age equivalence, age-equivalent scores do not take into account the normal distribution of scores within a population. It would be incorrect to conclude that a 10-year-old child with an age-equivalent score of 8 years is performing below expectations based on age equivalence alone. It could very well be true that the 10-year-old's score is within the range of normal variation. Age-equivalent and grade-equivalent scores are not considered a reliable measure and should generally not be used.
The standard deviation
reflects the variation within the normal distribution. It determines what is considered average, above average, or below average. Performance 1.5 to 2 standard deviations below the mean (−1.5 to −2 SD) is usually considered significantly below average and is clinically a cause for concern.
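Test developers derive the mean and standard deviation from the normative sample's scores. A minimal sketch with Python's `statistics` module uses a small hypothetical sample to show where the −1, −1.5, and −2 SD cut points fall in raw-score terms. Using the sample (n − 1) standard deviation rather than the population one is an assumption here, and the helper name is hypothetical.

```python
from statistics import mean, stdev


def cut_points(sample: list[float], sds_below=(1.0, 1.5, 2.0)) -> dict[float, float]:
    """Raw scores lying the given numbers of standard deviations below the sample mean."""
    m, s = mean(sample), stdev(sample)  # stdev() is the sample (n - 1) standard deviation
    return {k: m - k * s for k in sds_below}


# Hypothetical normative raw scores; real normative samples are far larger.
norm_sample = [31, 35, 38, 40, 40, 42, 44, 46, 49, 35]
for k, raw in cut_points(norm_sample).items():
    print(f"{k} SD below the mean: raw score {raw:.1f}")
```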
The confidence interval
represents the degree of certainty on the part of the test developer that the statistical values obtained are true. Confidence intervals allow for natural human variability to be taken into consideration. Many test manuals provide statistical data for a confidence interval of 95% (some lower, but the higher the better when considering test reliability). This allows the clinician to obtain a range of possible scores in which the true value of the score exists 95% of the time. In other words, a 95% confidence interval provides a range of reliable scores, not just a single reliable score.
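One common way to build such a range is obtained score ± z × SEM. In this formula, SEM is the test's standard error of measurement, and z is about 1.96 for 95% confidence. SEM is not defined in this chapter. The sketch assumes the manual reports it, and the obtained score and SEM values are hypothetical.

```python
from statistics import NormalDist


def confidence_interval(obtained: float, sem: float, level: float = 0.95) -> tuple[float, float]:
    """Range around an obtained score: obtained +/- z * SEM (SEM assumed to come from the manual)."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for a 95% interval
    return obtained - z * sem, obtained + z * sem


low, high = confidence_interval(85, sem=4.0)
print(f"95% CI: {low:.0f} to {high:.0f}")  # 95% CI: 77 to 93
```

A larger SEM (a less reliable test) produces a wider range. This is why higher reliability makes the interval more useful.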
Test developers
are responsible for clearly outlining the standardization and psychometric aspects of a test.
The percentile rank
tells the percentage of people scoring at or below a given score. For example, scoring at the 75th percentile indicates that the test taker scored as well as or better than 75% of the people taking the same test. The 50th percentile is the median; half of the test takers scored at or below it. Clinically, there is usually cause for concern if a client performs near the bottom 7% of the normal distribution.
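If scores are assumed to be normally distributed, a percentile rank can be read from the normal cumulative distribution function. The sketch below uses a hypothetical helper name. It also shows that the bottom 7% corresponds to roughly 1.5 standard deviations below the mean. Actual percentile ranks should come from the test manual's tables.

```python
from statistics import NormalDist


def percentile_from_z(z: float) -> float:
    """Percentage of a normal distribution falling at or below z (normality assumed)."""
    return 100 * NormalDist().cdf(z)


print(round(percentile_from_z(0.0)))    # 50: the mean is also the median
print(round(percentile_from_z(0.674)))  # 75
print(round(percentile_from_z(-1.5)))   # 7: about 1.5 SD below the mean
```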
The dynamic assessment approach follows a
test-teach-retest method.
1. A test is administered without prompts or cues to determine current performance.
2. The clinician teaches strategies specific to the skills being evaluated, observing the client's response to instruction and adjusting teaching accordingly. This is referred to as a mediated learning experience, or MLE.
3. The test is readministered and results from the pre- and posttest are compared.
HIPAA is regulated by
the U.S. Department of Health and Human Services (DHHS).
The overall emphasis of each assessment differs depending on
the client, the type of disorder, the setting, the client's history, the involvement of the caregivers, and other factors. Some disorders have extensive histories; others do not. Clients have different primary communicative problems. Some exhibit problems of articulation, others of social language, others of fluency, and so forth. Some cases involve extensive interviewing; others do not. Some cases require detailed written reports, whereas others do not. Even though assessment emphases differ across clients, some consideration of each of the seven general areas listed above is necessary with most clients.
During the dynamic assessment,
the clinician pays particular attention to teaching strategies that were effective at improving the client's success. These may include use of cuing (e.g., verbal, visual, tactile, or auditory), graduated prompting, making environmental adjustments, conversational teaching (e.g., asking questions such as "Why did you . . . ?" and then instructing "Ah, I see. . . ."), or other strategies. The dynamic assessment allows the clinician, as part of the diagnostic process, to determine baseline ability and identify appropriate goals and strategies for intervention. If one of the clinician's purposes is to discern a language difference versus a language impairment, it is helpful to note that clients who do not demonstrate improvement following teaching likely have a language impairment, whereas clients who are able to make positive changes following brief teaching experiences are likely to have a language difference.
