FINAL STUDY GUIDE!!!!!


Avoid double-barreled questions

questions that are incorporating more than one topic

Avoid unbalanced response categories.

"0-1, 2-5, 6-12, 13-18"

Avoid overlapping response categories

"0-5, 5-10, 10-15"

Avoid asking respondents about their future intentions/behaviors.

"How often do you plan to exercise in the next 6 months?"

Avoid jargon and slang. *"made out" is a slang, do not confuse with confusing, vague, or ambiguous terminology, read the sentence carefully.If it says a slang then its a slang/jargon question.

"How often have you made out with your girlfriend in the last week?" • "Made out" may have different connotations to different people. A researcher may want to use a word with fewer meanings (e.g., kiss)

Avoid asking respondents about their future intentions/behaviors.

"How often will you praise your child in the next 3 months?"

Avoid double negatives.

"Infants usually do not misbehave for no reason."

Avoid confusing, vague, and/or ambiguous terminology.

"To help avoid and solve workplace conflicts, it is usually a good idea for each person to stay pretty firm in pursuing their goals." • "Pretty firm" may be interpreted very differently by different people, and may not be understood by those with limited English abilities or literacy.

Avoid double negatives. *Do not confuse with dichotomous questions, which are based on response choices.

"True or False. It is not good not to exercise."

Avoid loaded questions

"What do you see as the disadvantages of eliminating welfare?"

Avoid loaded questions. *There are many possible answers to this kind of question.

"What do you think is wrong with spanking?"

Needs assessment

(1) West Palm Beach parenting needs assessment

Avoid leading questions

(i.e., questions which lead respondents to answer in a certain way).

Avoid emotional, biased, and/or stigmatizing language. Examples of stigmatizing language:

- "True or False. Social workers need training in working with special populations such as lesbians, convicted child abusers, and convicted rapists." This sentence implies that lesbians are in the same group as those who have been convicted for harming others.

Follow-up evaluation

- Occurs after a pre-set amount of time to determine the lasting effects of the program. • One way to do this is through a pretest, posttest, follow-up design.

Construct validity

- The most rigorous validity test you can put your attitude survey through. Do the scores your survey produces correlate with other related constructs in the anticipated manner? In other words, it is a measure of how well a test assesses some underlying construct.

Avoid emotional, biased, and/or stigmatizing language. Examples of stigmatizing language

-"Are their identifiable differences between gay men and normal men? Yes No" This sentence portrays gay men as not normal.

Filter / Contingency question

1. Have you had sex? If "yes" 1a. At what age did you first have sexual intercourse?____ 1b. How many sexual partners have you had?____ If "no" 1c. What is the #1 reason you have not had sex? 2. What do you think is the average age people first have sex?____

Observed score

= true score + error score (method error and trait error)
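A minimal Python sketch (all numbers made up) illustrating this decomposition: two parallel measurements share the same true score but have independent error, and the correlation between them drops as the error component grows, which is why error lowers reliability.

```python
import numpy as np

rng = np.random.default_rng(0)
true_scores = rng.normal(50, 10, size=1000)        # hypothetical "true" scores

for error_sd in (2, 10):                           # small vs. large error component
    # observed score = true score + error (two parallel measurements)
    obs1 = true_scores + rng.normal(0, error_sd, size=1000)
    obs2 = true_scores + rng.normal(0, error_sd, size=1000)
    r = np.corrcoef(obs1, obs2)[0, 1]              # reliability estimate
    print(f"error SD = {error_sd}: correlation between measurements = {r:.2f}")
```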

Content validity

A researcher measures depression (or anxiety) by using a scale that has one item for each of the domains of depression (or anxiety) as outlined in the DSM-R.

Reliability and validity

A test can be reliable, but not valid; but a test cannot be valid without being reliable. Reliability is a characteristic of the measurement itself, while validity describes the appropriateness of the use to which the measure is put.

Parallel or Alternative form reliability

A given construct is tested more than once over time, so it is important to have more than one version of a given instrument and to show that these different versions are reasonably equivalent.

Needs assessment (pre-implementation assessment)

A purposeful, planned, and systematic strategy for acquiring information to guide and justify program development. • Needs assessments are necessary to determine the type(s) of programs required within a specific geographic area. • Valid and credible needs assessments provide the foundation of successful programs.

Parallel or alternative form

Administer two different tests of the same construct to the same group of people so as to eliminate the practice effects on participants' scores. The degree to which two or more versions of the same test correlate with one another. -From online

Content validity

An indicator of how well the items represent the entire universe of items from which they are drawn. In other words, it assesses whether or not the instrument reflects the content the researcher is trying to measure, i.e., whether the scale includes enough items that tap into the construct.

Ask about one topic at a time

Another questionnaire design suggestion/issue.

Cost benefit and effectiveness

Are programs delivered in a way that reduces costs to society? -Cost benefit analysis -Cost effectiveness analysis

Ratings

Asking respondents to assign a value to their preferences

Ranking

Asking respondents to assign an order to their preferences.

Construct validity cont.

Assess the underlying construct upon which the test is based and correlate these scores with the test scores. Concerned with the extent to which a particular measure relates to other measures consistent with theoretically derived hypotheses concerning the concepts that are being measured. •The theoretical relationship between the concepts must be specified. • The empirical relationship between the measures of the concepts must be examined. • The empirical evidence must be interpreted in terms of how it clarifies the construct validity of the measure.

Face validity definition

Assesses whether the measurement instrument appears relevant to the construct by an innocent bystander. It can be established by asking people if they think the survey could adequately and completely assess someone's attitude/belief. If they say 'yes', it is established.

Split-halves

Attempts to assess the reliability of a measure, administered only once, by focusing on the equivalency of two halves of a test. Each split will result in a slightly different reliability estimate.

Needs assessment

BookEnds

•Checked boxes are often coded as a '1' while unchecked boxes are coded "0". Things to consider when developing checklists • Are all of the alternatives covered? •Is the list too long / reasonable length? • Is the structure of the responses easy and uniform? • Is an "other" category needed?

Checklists continued

TYPE OF QUESTIONS BASED ON RESPONSE CHOICES

Dichotomous questions, ranking, ratings, Likert response scales, and semantic differential scales. Others that are not on the exam: open-ended questions, close-ended questions, simple fill-in-the-blank, categorical (nominal level), Visual Analogue Scale (VAS), and cumulative or Guttman scales.

Program evaluations and source documents

Evaluating by examining written documentation on the program's effectiveness, such as a conference or workshop report, internal report, published non-academic article, or newsletter.

Multiple site replication studies

Examining whether the program has been successfully replicated and evaluated in multiple settings, preferably across multiple target populations, using appropriate scientific methods. The results should be published in more than one scientific, peer reviewed, academic journal.

Expert review / Peer consensus

Experts/Peers in the field review and rate the programs for effectiveness.

Cost effectiveness analysis

Family preservation versus treatment example

Refers to two-part questions where the answer to the first part of the question determines which of two different questions a respondent next receives. • Example - "Are you married? If 'yes' answer 4a, if 'no' answer 4b." • Filter questions can get very complex. Sometimes, multiple filter questions are necessary in order to direct the respondents to the correct subsequent questions. • General suggestions - Avoid more than three levels (two jumps) for any question. Too many jumps will confuse the respondents and may discourage them from continuing with the survey. - If there are only two levels, use a graphic (e.g., arrow and box) to help direct the respondent to the correct subsequent question. -example on next slide • If you can't fit the response to a filter question on a single page, it's probably best to be able to say something like "If NO, please turn to page 3" rather than "If NO, please go to Question 29" because the respondent will generally have an easier time finding a page than a specific question.

Filter / Contingency question

Types of questions

Filter or contingency questions and screening questions. Others that are not on the exam: threatening questions and knowledge questions.

Accountability

For example, in a parent education program, _______ evaluation would identify how parents tried to enroll, how many classes were taught, how many parents attended, how many parents dropped out, and descriptions of the parents (e.g., gender, age, education level, ethnicity).

Internal consistency reliability example

Hence, it answers, "How much do the items in the survey relate to each other?"

Discriminant validity

If I have two scales that tap into similar constructs (e.g., positive esteem and negative esteem), but they predict different outcomes, then we could say they have _____. For example, positive esteem is more highly correlated to happiness while negative esteem is more highly correlated to depression

Construct validity

If a researcher is measuring empathy as the outcome variable, can the operational definition of empathy in the study be generalized to the rest of the world's concept of empathy?

Convergent validity

If a researcher is trying to establish the validity of a new self-esteem scale, the researcher may correlate the new scale with a previously established self-esteem scale (e.g., Rosenberg Self-Esteem Scale).

Face validity

If a researcher tells people that s/he is measuring their attitudes about alcohol, but the survey asks them how much money they spend on alcohol, they may think the researcher has lied to them about the study. Or, if the survey only asks how they feel about negative things (e.g., if their car was stolen, if they were beat up), they may think that the research is going to find that these people all have negative attitudes, when that may not be true. Hence, it is important to establish ______ with the population of interest.

Construct validity *Correlates with other things it is supposed to be correlated with, establishing construct validity.

If an attitude survey has ________, lower attitude scores (indicating negative attitude) should correlate negatively with life satisfaction survey scores, and positively with life stress scores.

Concurrent validity

If an eating disorders test is able to distinguish those who have an eating disorder from those who do not, then it has ______

Implementation evaluation

Implementation evaluation (also called "process evaluation") is an early check by the project personnel to assess whether the program and its components are operating according to plan. • Were appropriate recruitment strategies used? • Were appropriate participants selected? • Do the activities/services match the plan?

What reading level for surveys?

In general, questions/items should be written at an 8th-grade reading vocabulary. • Many word processors (e.g., Word) will give you an estimate of the reading level of the document, and many will identify words that are above certain reading levels.
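Beyond word processors, a quick way to estimate reading level is a readability formula. A minimal sketch, assuming the third-party Python package textstat (pip install textstat), which is not part of the course materials:

```python
import textstat  # assumed third-party package: pip install textstat

item = ("To help avoid and solve workplace conflicts, it is usually a good idea "
        "for each person to stay pretty firm in pursuing their goals.")

# Flesch-Kincaid grade level; aim for roughly 8th grade or below
print(textstat.flesch_kincaid_grade(item))
```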

If I have multiple raters coding qualitative data, then I could have them code part of the data and then compare their results. If they are coding it consistently, then we can say they have _____.

Inter-rater/Inter-observer reliability

Why Reliability?

It is important for researchers to establish that their instruments are reliable because without reliability the results using the instrument are not replicable.

Cont.

It is most researchers' belief that validity is more important than reliability because if an instrument does not accurately measure what it is supposed to, there is no reason to use it; even if it measures consistently (reliably).

If I have a scale that has dichotomous items (e.g., yes/no), then the ______ would be an appropriate internal consistency reliability

Kuder-Richardson reliability

If I have a scale that has dichotomous items (e.g., true/false), then the ______ would be an appropriate internal consistency reliability

Kuder-Richardson reliability *there are two response choices -> dichotomous

Convergent validity

Means that scales are measuring the same underlying construct. The amount of agreement between these two scales tells us the extent to which they are measuring the same thing (i.e., the amount of shared or common variance).

Establishing program efficacy - from best to worst

Multiple site replication studies, Meta-analyses, Expert review / Peer consensus, Single trial effectiveness, Program evaluations and source documents, and Testimonials, newspaper reports, or non-refereed publications. (Program efficacy -> STEM PM)

Realities of program evaluation Cont.

Myth = Program evaluation is a waste of time. Reality = Evaluation is necessary. • To continually improve the program to make it more successful, efficient, and cost-effective. • To show the value of the program. • To be accountable - What gets measured gets done. - If you don't measure results, you can't tell success from failure. - If you can't see success, you can't reward it. - If you can't reward success, you're probably rewarding failure. - If you can't see success, you can't learn from it. - If you can't recognize failure, you can't correct it. - If you can demonstrate results, you can win public support. • To clarify the purpose and desired outcomes of the program. • To produce data that can be used for public relations purposes. • To produce valid comparisons between programs. • To provide unanticipated insights or information (i.e., unanticipated consequences). • To produce data to enhance others' knowledge (e.g., practitioners, scholars, lay audiences). • To justify the funds expended - Great programs may not get funded unless there is good evaluation data. • To serve as a basis for informed decisions regarding future funding/sponsorship.

Realities of program evaluation Cont.

Myth = Program evaluation is too complex. Reality = Program evaluation is within the capabilities of the staff of most agencies/programs. • It is not necessary for all programs to conduct intensive outcome evaluations using experimental designs. • Most program coordinators conduct ongoing evaluation (although not scientifically or systematically) to make the program better or meet participants' needs. These informal methods may not provide enough information to make informed decisions. Thus, all programs should collect information that is most relevant and useful to their needs, history, stage of development, resources, and intended audience for the evaluation results. • Many evaluators think they must use standardized methods and instruments (with established reliability and validity) to perform program evaluation. This is not always true, especially if someone is trying to make an established method/instrument work at the expense of the program or participants. • Researcher/Practitioner collaboration can help overcome the complexity of the evaluation.

Realities of program evaluation Cont.

Myth = Program evaluation must be done by outside experts. Reality = Outside experts are not always necessary. • Outside experts can legitimize the evaluation (and give the perception of less bias), yet resources are often not available to contract outside evaluators.

Summative (impact) evaluation

Occurs at the end of the program (or at the end of a part of the program) to assess how well the participants/program are meeting the program goals/objectives. •Outcome evaluation *sum -> like the end of a summary -> happens at the end of the program

Parallel or alternative form cont.

Often the Spearman-Brown split-half reliability coefficient is used to compare the two halves

Formative evaluation

Ongoing evaluation to determine how a program is functioning, how well the goals are being implemented, and whether the perceived needs of the participants are being met. • Formative evaluation starts during the project planning and continues throughout the project. • Good formative evaluation can lead to program improvement and hopefully greater impact if and when an effectiveness/impact evaluation is conducted.

If I have two versions of a scale (e.g., GRE, depression), then I could correlate the two versions to get an internal consistency reliability; this would be a ________

Parallel/Alternative Forms *If it's two versions, then it's parallel/alternative form reliability.

Likert response scale

Please indicate your level of agreement. I am a person who has made wise investments Strongly Disagree 1 Disagree 2 Neither Agree Nor Disagree 3 Agree 4 Strongly Agree 5

Likert response scale

Please indicate your level of agreement. I am a person who has made wise investments. Strongly Disagree 1 Disagree 2 Slightly Disagree 3 Neither Agree Nor Disagree 4 Slightly Agree 5 Agree 6 Strongly Agree 7

Likert response scale

Please indicate your level of satisfaction. The materials presented by the instructor. Very Unsatisfied 1 Unsatisfied 2 Neutral 3 Satisfied 4 Very Satisfied 5

Ranking

Please rank your preference for the following presidential candidates: 1st choice, 2nd choice, 3rd choice ____ Daffy Duck _____ Bugs Bunny _____ Foghorn Leghorn

Progress evaluation

Progress evaluation is used to assess progress in meeting the goals of the program. •Assesses the impact of the program on the participants throughout the intervention. •This information can help program personnel make adjustments to improve the program, as compared to waiting until the program ends and then finding out it was not effective. -Are participants moving towards expected goals? - How do changes in project participants relate to components of the program? • One way to structure formative evaluation in a report is through documenting the barriers encountered and the modifications made to the program to address the barriers.

Outcome Evaluation

Questions to ask: •Is the program meeting the stated goals for change or impact? • Which components are the most effective? • Which components needed improvement? • Were the results worth the costs? • Is the program replicable?

Ratings

Rate the attractiveness of the following celebrities (1 = Very Unattractive ... 10 = Very Attractive): Jay Leno 1 2 3 4 5 6 7 8 9 10; Channing Tatum 1 2 3 4 5 6 7 8 9 10; Taylor Swift 1 2 3 4 5 6 7 8 9 10; Meg Griffin 1 2 3 4 5 6 7 8 9 10

semantic differential

Rate your spouse on the following items (circle the number) Ugly -2 -1 0 1 2 Pretty Mean -2 -1 0 1 2 Nice Rude -2 -1 0 1 2 Considerate Spendthrift -2 -1 0 1 2 Tightwad

semantic differential scale

Scales assess a respondent's perception of an item on a set of bipolar adjective pairs (usually with a 5-point rating scale).

Screening questions

Sometimes needed to determine whether the respondent is qualified to answer the question of interest.

If I have one version of a scale, then I could correlate the even and odd items to get an internal consistency reliability

Split halves

Avoid false premises.

Starting questions with a premise with which respondents may not agree.

Questionnaires/Survey development

Survey Designs E

Predictive validity

The GRE can predict future success in graduate programs, hence it has _____

Concurrent validity

The ability to distinguish between groups that should be theoretically distinguishable

Predictive validity

The ability to predict something you want to predict.

Construct validity

The degree to which inferences made from the study can be generalized to the broader concepts underlying the study. In other words, it asks whether there is a relationship between how the concepts were operationalized and the actual causal relationship in the study.

Accountability

The following information should be collected once the program begins: • Track program utilization rates, participation rates, and characteristics of the participants. • Also track how many people inquired about the service and how many dropped out.

Internal consistency vs. test/retest

The major difference is that test/retest reliability involves two administrations of the measurement instrument, whereas an internal consistency method involves only one administration of that instrument

Program evaluations and source documents

The program is evaluated by examining written documentation on the program's effectiveness. The program may be published in refereed or non-refereed publications.

Single trial effectiveness

The program is evaluated using appropriate scientific method in a single population or in only one setting. The results should be published in at least one scientific, peer reviewed, academic journal.

Testimonials, newspaper reports, or non-refereed publications

The program is evaluated using only anecdotal evidence

Descriptive study

The researcher describes the goals, objectives, start-up procedures, implementation processes, and anticipated outcomes of a program, presenting the details of each.

Checklists

The respondent is provided with a list from which to choose one or more responses. When a respondent can select more than one option, each option is viewed as a separate variable.

Avoid loaded questions.

They are often developed by people who have a bias in support of or against a particular view.

Instrumental validity

Types of validity: •Face validity •Content validity •Criterion validity -Concurrent validity -Predictive validity •Convergent validity •Discriminant validity •Construct validity

Avoid emotional, biased, and/or stigmatizing language Cont.

Use neutral language. •Examples - Use "an individual who has cerebral palsy" instead of "an individual who is suffering from or afflicted with cerebral palsy." -Do not label individuals with disabilities as "patients" or "invalids." - Avoid potentially offensive and emotionally-charged labels

Meta-analyses

Uses a quantitative method to summarize the results of all available high quality studies on a program. It is used to gain greater objectivity, generalizability, and precision.

QUESTIONNAIRE DESIGN SUGGESTIONS/ISSUES

W

Dichotomous questions

When a question has two possible responses.

Avoiding abbreviations continued:

When surveying a specialized population, then certain abbreviations may be used. - Example: • If a researcher surveys interior design faculty members, then it would probably be acceptable to put the abbreviation "FIDER" on a survey. • If a researcher were surveying psychology faculty members in the USA, then using "APA" would be acceptable

Checklist example

Which of the following methods do you use to study? Please check all that apply. ✅note cards ✅memorization ✅practice tests ✅audio messages ✅other______ • In the example above, checked boxes are often coded as a '1' while unchecked boxes are coded "0".
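A minimal Python sketch (hypothetical respondents) of the "1 if checked, 0 if unchecked" coding described above, with each option treated as its own variable:

```python
options = ["note cards", "memorization", "practice tests", "audio messages", "other"]

# Hypothetical check-all-that-apply responses
respondents = [
    {"note cards", "practice tests"},
    {"memorization"},
    {"audio messages", "other", "note cards"},
]

# Each option becomes a separate 0/1 variable for each respondent
coded = [{opt: int(opt in answers) for opt in options} for answers in respondents]
for row in coded:
    print(row)
```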

Cost effectiveness analysis

Which program has "more bang for the buck"?

Test/retest reliability definition

answers the question, "How stable is the test over time?" Administer the same test at two different points in time to the same group of people. The idea behind test/retest is that respondents should get the same score on the test at time 1 and time 2. *You are testing reliability, which is about consistency, so respondents should get the same score.

Cost benefit analysis

compares estimated cost of implementing a program/policy to the benefits of not implementing the program/policy.

Cost effectiveness analysis

compares the cost efficiency of various competing programs or policies.

Testimonials, newspaper reports, or non-refereed publications

e.g., participant testimonials, quotes, or media coverage

Program efficacy

establishes whether programs can be designed to have the desired impact and result in participant changes in controlled conditions.

Internal consistency reliability

estimates reliability by grouping items in a questionnaire that measure the same concept.

Likert response scales

have respondents rate (usually on a scale of 1-4 to 1-7) their level of (1) agreement or disagreement with various declarative statements or (2) satisfaction/dissatisfaction with services.

Criterion validity

is a more rigorous test than face or content validity. Criterion validity means your attitude assessment can predict or agree with constructs external to attitude. A measure of how well a test estimates a criterion (concurrent) or predicts a criterion (predictive). Select a criterion and see how scores on the test correlate with scores on the criterion. Criterion-related validity lends itself to being used in an atheoretical, empirically dominated manner. For most of the measures used in the social sciences there do not exist any relevant criterion variables. The more abstract the concept, the less likely one is to discover an appropriate criterion for assessing a measure of it. *Not on exam, but concurrent validity and predictive validity are parts of it.

Cronbach's alpha

is a unique estimate of the expected correlation of the actual test with a hypothetical alternative form of the same length. Cronbach's alpha splits all the questions on your instrument every possible way and computes correlation values for them all
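A minimal Python sketch (made-up Likert data) of the usual computational formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores):

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for a respondents-by-items score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)        # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)    # variance of summed scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical 5-item Likert scale answered by 6 respondents
scores = [[4, 5, 4, 4, 5],
          [2, 2, 3, 2, 2],
          [3, 3, 3, 4, 3],
          [5, 5, 4, 5, 5],
          [1, 2, 2, 1, 2],
          [4, 4, 5, 4, 4]]
print(round(cronbach_alpha(scores), 2))
```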

Kuder-Richardson reliability

is designed for measures in which items are dichotomous (e.g., true/false, yes/no). However, Cronbach's alpha can also be used with dichotomous data.
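A minimal Python sketch (made-up yes/no data) of the KR-20 formula, which parallels Cronbach's alpha but uses the item proportions p and q = 1 - p:

```python
import numpy as np

def kr20(items):
    """Kuder-Richardson 20 for a respondents-by-items matrix of 0/1 answers."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    p = items.mean(axis=0)                            # proportion scoring 1 per item
    q = 1 - p
    total_variance = items.sum(axis=1).var(ddof=1)    # variance of total scores
    return (k / (k - 1)) * (1 - (p * q).sum() / total_variance)

# Hypothetical yes(1)/no(0) responses: 5 respondents x 4 items
answers = [[1, 1, 0, 1],
           [0, 0, 0, 1],
           [1, 1, 1, 1],
           [1, 0, 1, 1],
           [0, 0, 0, 0]]
print(round(kr20(answers), 2))
```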

Test/retest

is the most conservative method to estimate reliability. *Cronbach's alpha is a less conservative estimate of reliability.

Summative evaluation

may result in the following: • Disseminate the intervention to other sites or agencies. • Continue and/or increase funding. • Continue on a probationary status. • Modify and try again. • Discontinue.

Discriminant validity

means that measures that should not be related are not. The researcher is looking for a lack of correlation between measures that are supposed to tap basically different constructs.

Inter-rater/Inter-observer reliability *Consistency -> it's reliability, not validity; choose inter-rater/inter-observer reliability.

measures level of agreement between two or more people rating the same object, phenomenon, behavior, or concept; a measure of the consistency from rater to rater.

Avoid double-barreled questions

• "Do you agree that people who are overweight should be encouraged to exercise and eat low carbohydrate diets?" - Individuals may agree with exercise, but not with low carbohydrate diets (or vice versa).

Avoid leading questions

• "Given that research shows a link between exercise and fat reduction, what do you think is the number one thing a person can do to lose fat?"

Avoid questions that ask for categorical data when continuous data are available.

• "How old are you? 21-30, 31-40, 41-50, >51". - In the example above, a 21-year-old and a 30-year-old are the same distance from 31. - Instead ask, "How many years old are you?____"

Avoid leading questions

• "Most doctors say that cigarette smoke increases risk of lung disease for those near a smoker. Do you agree? 1. Strongly disagree, 2. Disagree, 3. Agree, 4. Strongly agree"

Avoid leading questions

• "Now that you have learned the value of parental praise in the parenting class, how often will you praise your child?"

Avoid double-barreled questions

• "True or False. Parents should use lots of praise and avoid physical punishment when interacting with their children." - This statement actually covers two distinct parenting behaviors (i.e., praise and physical punishment)

Avoid questions that ask for categorical data when continuous data are available.

• "What is your weight? 50-99 lbs., 100-149 lbs., 150-200 lbs., ...". - Instead ask, "How many pounds do you weigh?____

Screening question

• A research study examining consumers' opinions about a particular computer program would want to screen participants to see if they have used the program. - Example: "Have you used Dreamweaver to develop a webpage?"

Descriptive study cont. Questions include:

• Are the goals/objectives clearly articulated? • Are the goals/objectives communicated throughout the organization? • Who is responsible for each goal/objective? • Is there a good plan for assessing progress on objectives? • What is the form of accountability (e.g., performance contracts)?

Question order or sequence cont.

• Ask more general questions before specific ones, especially in interviews. • Be careful of the "order effect" (i.e., the order of the questions influences the respondents' answers). - Previous questions can influence later ones in two ways: through their content and through the respondent's response. * Example - A survey that keeps asking about the benefits of praise could teach the respondent the importance of praise by the end of the survey, and hence bias later questions on praise. • Never start a mail survey with an open-ended question. - Beginning with an open-ended question will decrease the response rate. • For historical demographics, follow chronological order. • End a survey on a positive note. - Do not end with highly threatening questions. - Always end with a "thank you".

Avoid asking questions that are beyond respondents' capabilities.

• Asking adolescents to identify their parents' end-of-the-year taxable income.

Avoid asking questions that are beyond respondents' capabilities.

• Asking complex questions on a survey designed for individuals with severe developmental disabilities.

Avoid asking questions that are beyond respondents' capabilities.

• Asking parents to identify which of Baumrind's 'parenting styles' they identify with.

Biases with Likert scales

• Central tendency bias - Respondents avoid using extreme response categories. In other words, they usually respond 'agree' or 'disagree' instead of 'strongly agree' or 'strongly disagree'. • Extreme tendency bias - Respondents tend to answer in the extreme. In other words, they usually respond 'strongly agree' or 'strongly disagree' instead of 'agree' or 'disagree'. • Acquiescence bias - Respondents tend to agree with statements regardless of their opinion. • Social desirability bias- Respondents try to portray themselves or the organization more favorably. C SEA

Reliability

• Concerns the extent to which an experiment, test, or any measuring procedure yields consistent results on repeated trials. In other words, reliability is the consistency of the measurement, or the degree to which an instrument measures the same way each time it is used under the same condition with the same subjects. -Unreliable measures produce results that are meaningless. - Reliability is not measured, it is estimated.

Content validity Cont.

• Content validity is similar to face validity, but experts in the field are asked instead of target members of the population of interest. The theory behind content validity, as opposed to face validity, is that experts are aware of nuances in the construct that may be rare or elusive of which the layperson may not be aware. • Expert opinion is often used to establish content validity. In the social sciences, it is very difficult (if not impossible) to identify a generally accepted universe or domain of content for the concepts to be measured. Also, in the social sciences it is impossible to randomly sample content to create a measure. • A more objective indicator of validity could be determined by a subjective evaluation by a group of judges or experts who examine the measuring technique and decide whether it measures what its name suggests. One could calculate a validity figure by computing the amount of agreement among judges.

Dichotomous questions

• Examples: o Yes/No o True/False o Agree/Disagree

Face validity Cont.

• Generally, the researchers will want to take this at least one step further by asking individuals similar to the sample whether they think the scale measures what it says it does. The reason for asking people is that participants in a study can sometimes become resentful and uncooperative if they think they are being misrepresented to others, or worse, if they think you are misrepresenting yourself to them.

Issues that can impact test/retest reliability include:

• Length of time between tests • Remembering/learning from earlier test • Reactivity to items on test • Developmental changes between tests. (Mnemonic: larry ran rapidly down)

Myths & Realities of program evaluation

• Myth = Program evaluation is an adversarial process focused on proving the success/failure of program. Reality = Good program evaluation is intrinsic to the betterment of the program and its participants. - In the past, program evaluation was often used to determine which programs to keep and which to cut. Hence, program personnel have often considered it to be disruptive, unhelpful, and threatening. -Current views integrate program evaluation from the beginning to identify areas in need of improvement and to highlight strengths of the program through continuous feedback.

Length of Survey

• Phone interviews: Ten minutes is rarely a problem and can usually be extended to twenty minutes. • Mail questionnaires: A short (i.e., 1-4 pages) survey is usually appropriate for the general population. • Face-to-face interviews can last one hour. In special circumstances, they can last 3-5 hours.

Avoid asking respondents about their future intentions/behaviors.

• Responses are poor predictors of future behavior.

Screening question

• Sometimes screening questions may be used to eliminate those people who don't meet the criteria for participating. - Examples: "Are you in 9th-12th grade" or "Do you live in Los Angeles?"

Conducting a needs assessment

• Talk with current program participants and staff to see if there is a gap in one's own services. - Fathering program example - Grandparents raising grandchildren example • Read the local newspaper and/or listen to local news. • Review social indicators in the geographic region (i.e., relevant statistical facts from public reports, documents, and data archives). • Talk with community leaders and experts (e.g., political figures, religious leaders, activists, educators, public policy advocates). • Communicate with other service providers to identify what programs currently exist and any gaps. • Interview or survey community residents and/or potential program participants. • Hold a community forum.

Observed score cont. Sources of error in reliability

• Temporary individual factors - health, fatigue, motivation • Lasting characteristics of the individual - ability level related to the trait being assessed • General characteristics of the individual - ability level, test-taking skills, ability to understand instructions • Factors affecting test administration - conditions of testing, interaction between examiner and test taker, bias in grading • Other factors (e.g., luck) *Know all of them; the exam might ask which ones are not considered a source of error. (Mnemonic: Timed GOLF)

PROCEDURE SUGGESTIONS FOR INCREASING RESPONSE RATE ON SURVEYS

• Thank the respondent at the beginning for participating. • Keep the survey as short as possible; only include what is absolutely necessary. • Be sensitive to the needs of the respondent. • Be alert for any sign that the respondent is uncomfortable. • Thank the respondent at the end for participating. • Assure the respondents that they will receive a copy of the final results (if they desire). will not be on exam

Question order or sequence

• The first few questions can set the tone of the survey and help put the respondent at ease. - Begin with easy, nonthreatening, and/or interesting questions so the respondent will feel comfortable about the questionnaire. * Simple descriptive questions (e.g., gender) are easy and fast to answer • Put more difficult, threatening, and/or sensitive questions near the end. - Before asking such questions, try to develop trust or rapport with the respondent by asking easier, nonthreatening questions at the beginning. - Do not abruptly change to sensitive questions; use a transition statement * Example - "In this section, we would like to ask you about your sexual behaviors. Remember, your answers are anonymous, and you can skip any questions that make you uncomfortable." • If the most important questions are left until the end, the respondents may not give them appropriate attention. If they are introduced too early, the respondents may not be ready to answer if they are difficult questions.

Screening questions cont.

• The more complicated the screening, the less likely it is that the researcher can use a paper-and-pencil instrument without confusing the respondent.

Use transitions when switching topics

• The transition could be a statement, divider, or a page break. • Example - "The questions will ask about your spending habits."

Cronbach's Alpha Cont- W

• The value of alpha depends on the number of items in the scale and the average inter-item correlation. In general, the larger the number of items, the larger the Cronbach's alpha. • It is the most popular measure of internal consistency. • The computer output generates one number for Cronbach's alpha. Alpha can range between 0 and 1. The closer it is to one, the higher the reliability estimate of your instrument. • In an exploratory study, if a scale has an alpha above .70, it is usually considered to be internally consistent. However, some use .60 as a cutoff in exploratory research. • As a general rule, the alpha should be .80 or above for widely used scales. o For important clinical and educational decisions (e.g., standardized test scores, special education placement, college admission tests), a reliability of .90 or better is needed. • However, getting too high of a reliability (approaching 1.0) may be indicative that the scale is actually an index (e.g., life events index) or the items in the scale are redundant. • It is a conservative estimate; hence, the actual reliability is probably higher than the Cronbach's alpha. It is a less conservative estimate of reliability than test/retest. • It is often used for Likert-type scales.

Inter-rater/Inter-observer reliability Cont.

• To get consensus on categorical data, the number of agreements between the raters is divided by the total number of observations. • Types of analyses - Cohen's Kappa can be used when there are only two raters (Note: There is a multi-rater version of Cohen's Kappa, but not in SPSS). - Intra-class correlation can be used for two or more raters when the data are interval level.
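A minimal Python sketch (hypothetical codes) of the two calculations mentioned above: simple percent agreement (agreements divided by total observations) and Cohen's Kappa for two raters, which also corrects for chance agreement.

```python
import numpy as np

def percent_agreement(r1, r2):
    r1, r2 = np.asarray(r1), np.asarray(r2)
    return (r1 == r2).mean()                 # agreements / total observations

def cohens_kappa(r1, r2):
    """Cohen's Kappa for two raters assigning categorical codes to the same cases."""
    r1, r2 = np.asarray(r1), np.asarray(r2)
    p_o = (r1 == r2).mean()                  # observed agreement
    p_e = sum((r1 == c).mean() * (r2 == c).mean()   # agreement expected by chance
              for c in np.union1d(r1, r2))
    return (p_o - p_e) / (1 - p_e)

rater1 = ["praise", "praise", "criticism", "neutral", "praise", "neutral"]
rater2 = ["praise", "neutral", "criticism", "neutral", "praise", "neutral"]
print(round(percent_agreement(rater1, rater2), 2))   # 0.83
print(round(cohens_kappa(rater1, rater2), 2))        # about 0.74
```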

Face validity cont.

• To have a valid measure of a social construct, one should never stop at achieving only face validity, as this is not sufficient. However, one should never skip establishing face validity, because if one does not have face validity, s/he cannot achieve the other components of validity.

Avoid abbreviations.

• Using "CA" on a survey for Latinos could mean California or Central America

Avoid false premises - Again, many respondents may not agree with the premise at the start of this question, and there is controversy over whether this is fact or not.

•"Since man-made global warming is causing a worldwide crisis, how much should sustainable design be encouraged?"

Avoid false premises

•"Since oil companies are gouging the American people, should Congress increase taxes on oil companies? 1. Yes 2. No" - The premise at the start of this statement is not necessarily a verifiable fact, and it may be inconsistent with the belief of the participants. - Alternative: "Congress should increase taxes on oil companies. 1. Strongly Disagree 2. Disagree 3.Agree 4. Strongly Agree"

Avoid loaded questions -The researcher might get a very different perspective of the respondents' view if the researcher asked about the disadvantages of tax cuts. An alternate question would be to ask, "What do you think of the current tax cut initiative?"

•"What do you see as the benefits of the current tax cut initiative?"

Three steps to establish test/retest reliability:

•Administer the measure at two separate times to each subject. • Compute the correlation between the measure at time 1 and time 2 (often the Spearman-Brown coefficient). • Check if there is a significant correlation between times 1 and 2.
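A minimal Python sketch (made-up scores) of the second step; here a plain Pearson correlation between the two administrations is used as the reliability estimate:

```python
import numpy as np

# Hypothetical scores for the same 8 respondents at two points in time
time1 = np.array([12, 18, 25, 30, 22, 15, 28, 20])
time2 = np.array([14, 17, 24, 31, 21, 16, 27, 19])

r = np.corrcoef(time1, time2)[0, 1]   # correlation between time 1 and time 2
print(f"test/retest reliability estimate: r = {r:.2f}")
```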

Avoid confusing, vague, and/or ambiguous terminology.

•Asking a person's "sex" may be misinterpreted as referring to sexual activity. It is generally less confusing to ask, "are you female, male, intersex, or other?"

Issues among ranking

•Coding / data entry issues come up with ranking questions. - Respondents may give two of the same rankings (e.g., assign two 1st choices). - Respondents may not give someone a ranking.

Outcome evaluation

•Does the program have an impact on the participants? - Should only be conducted after the program has been operating successfully for a time period (i.e., long enough to eliminate implementation problems and shortcomings). - One way to do this is through a pretest and posttest design. *Part of summative evaluation; outcome -> impact.

Types of reliability

•Internal consistency reliability -Cronbach's alpha -Omega coefficient -Kuder-Richardson reliability -Median inter-item correlation -Factor loadings -Parallel or alternative form -Split halves •Test/retest •Inter-rater/Inter-observer reliability *In case the exam asks which ones are internal consistency reliability: Mockspf; the only ones that are not are test/retest and inter-rater/inter-observer reliability. (Types of reliability: I IT)

Issues and/or suggestions for program evaluation

•Lack of evaluation or appropriate evaluation for community programs - Most programs only report accountability data and sometimes satisfaction data, but there is a need for more impact studies (i.e., outcome evaluation). • Biased evaluations - Unfortunately, many researchers have measured the impact of a program on various outcomes, and then selectively reported the findings that demonstrated program effectiveness. - Researchers and program developers can selectively report only those findings that demonstrated program effectiveness. A subtler example is when they make decisions about analyzing data or coding variables so the results are more positive, believing that these results properly captured the "true" impact of the program. - Before the analyses, researchers should identify the primary hypotheses to be tested and the exact procedures to test the hypotheses, and then report those findings regardless of whether they are positive or negative. - Benefits * Less biased evaluation * Allows the program to examine potential problems in either the data collection or program components * Enhances the ability of future program developers to avoid the same mistakes/pitfalls of the original program. •Testing is often conducted by untrained people. -Example: program presenters (e.g., parent educator) with no on-site training in the evaluation procedures -Example: researchers with no training in working with the population (e.g., developmental disabilities) Lack of evaluation or appropriate evaluation for community programs, Biased evaluations, Testing is often conducted by untrained people BLT

Realities & Myths of program evaluation

•Myth = Program evaluation is an adversarial process focused on proving the success/failure of program Reality= Good program evaluation is intrinsic to the betterment of the program and its participants. •Myth=Program evaluation is a waste of time Reality=Evaluation is necessary •Myth = Program evaluation must be done by outside experts. Reality=Outside experts are not always necessary. •Myth = Program evaluation is too complex Reality=Program evaluation is within the capabilities of the staff of most agencies/programs.

Split-halves cont.

•Often based on correlating the odd-numbered items with even-numbered items. • Often the Spearman-Brown split-half reliability coefficient is used to compare the two halves. • An internally consistent measure should have a split half reliability of at least .75.
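A minimal Python sketch (made-up data) of the odd/even split described above, with the Spearman-Brown correction r_sb = 2r / (1 + r) applied to the half-test correlation:

```python
import numpy as np

def split_half_reliability(items):
    """Correlate odd-numbered vs. even-numbered item totals, then apply Spearman-Brown."""
    items = np.asarray(items, dtype=float)
    odd_total = items[:, 0::2].sum(axis=1)      # items 1, 3, 5, ...
    even_total = items[:, 1::2].sum(axis=1)     # items 2, 4, 6, ...
    r_half = np.corrcoef(odd_total, even_total)[0, 1]
    return (2 * r_half) / (1 + r_half)          # Spearman-Brown correction

# Hypothetical 6-item scale answered by 5 respondents
scores = [[4, 5, 4, 4, 5, 4],
          [2, 2, 3, 2, 2, 3],
          [3, 3, 3, 4, 3, 3],
          [5, 5, 4, 5, 5, 4],
          [1, 2, 2, 1, 2, 2]]
print(round(split_half_reliability(scores), 2))
```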

Issues and/or suggestions for program evaluation Cont.

•Population studies need to be diverse to increase the ability to generalize (diverse in ethnicity, SES, etc.) • Sample sizes need to be larger - should be prepared for program attrition. •Stronger research designs are needed. • New research/evaluation methods - given the limitations of participants from different groups, creative strategies need to be used to collect useful information. - Programs that target people from different cultural backgrounds may need to modify the program/evaluation - Programs and evaluation efforts need to be culturally sensitive. - Terminology/jargon on questionnaires may need to be modified. - Programs that target people of limited intelligence or literacy may need to make appropriate accommodations. * Response choices on surveys may need to be limited. * Pictures might be used to demonstrate response choices (e.g., :) for agree, :( for disagree) * Games might be used to get data for pretests/posttests (e.g., board games) • Future evaluation needs. - Need to standardize instruments so evaluation results can be compared more easily. - Need to document program characteristics of effective programs. - Need to document barriers to program effectiveness. • Need cost benefit and/or cost effectiveness studies - Need to document the cost of programs versus the cost of not providing these programs or in comparison to other programs.

Avoid emotional, biased, and/or stigmatizing language Cont.

•Put people first, not their disability -Example: Use "person with a developmental disability" instead of "a developmentally disabled person." -An exception would be when the disability is part of a group's identity (e.g., Deaf person)

Difference between reliability and validity

•The real difference between reliability and validity is mostly a matter of definition. - Reliability estimates the consistency of a measure, or more simply the degree to which an instrument measures the same way each time it is used under the same conditions with the same subjects. -Validity, on the other hand, involves the degree to which a researcher is measuring what s/he wants to, more simply, the accuracy of the measurement.

ASSUMPTIONS OF USING A SURVEY

•The variables being measured in the survey are clearly conceptualized and operationalized. •Questionnaires have no wording, question order, or related effects. • Respondents are motivated and willing to answer all the questions honestly. • Respondents can give complete information and recall events accurately. • Respondents understand each question exactly as the researcher intends it. • Respondents give more truthful answers if they do not know the hypotheses. • Respondents give more truthful answers if they receive no hints or suggestions. • The data collection type, process, and situation have no effects on the respondent's beliefs or answers. Respondents' behaviors match their verbal responses in an interview. will not be on exam

Likert response scale cont.

•There is disagreement as to whether a "neutral" or "undecided" response choice should be offered. • Arguments for midpoint responses - In the example above, some would argue it is better to use the following response choices without a "neutral" option (see below): Please indicate your level of agreement. I am a person who has made wise investments. Strongly Disagree 1 Disagree 2 Agree 3 Strongly Agree 4 -Some people may not have an opinion; hence a midpoint denies a respondent the neutral option and artificially creates opinions that correspond to the scale. •Arguments against midpoint responses - "Satisficing" / false negatives - The likelihood of midpoint responses increases when opinions are not firm. Studies show that many neutral respondents actually took a position when questioned further. - Can increase 'regression to the mean' which has a neutralizing effect due to the midpoint. - Neutral can mean a variety of things (e.g., neutral, don't know, no opinion, don't care); which may decrease reliability and validity of the measure.

Practitioners

•Time Orientation -Practitioners tend to deal with clients who have immediate needs and/or who desire quick results (e.g., court-ordered participants in a parenting program, clients in a treatment facility). Also, practitioners and program developers may feel the need for quick results to write quarterly progress reports, to justify future funding. •Ways of Knowing Definition of Success -Practitioners often rely on feelings, intuition, direct experiences and observations, clinical evidence, and testimonials. -Practitioners define success as positively impacting individuals/families. •Communication -Practitioners are concerned with relationship issues (e.g., relationships with clients, community, funders), and hence, use verbal and written communication to facilitate these relationships. •Program Implementation -Practitioners want to be flexible and responsive to the participants' needs, yet they are constrained by program monitoring.

Researchers

•Time Orientation -Researchers generally don't have the same time constraints, and they often view their research projects as part of a much larger body of work that may last years. •Ways of Knowing Definition of Success -Researchers tend to rely on logic, scientifically generated evidence, numbers, and published research articles. Researchers often define success as conducting "good" research, collecting large amounts of data, and finding statistically significant results. •Communication -Researchers are often interested more in information, facts, and figures, and hence, use communication to convey this information. •Program Implementation -Researchers desire control in the program so that data are collected in a specified and predictable method. Control increases the likelihood that changes in measured outcomes are due to real changes in subjects, not to changes in study methodology.

Reasons to conduct a needs assessment:

•To establish the most pressing need(s). • To provide a rationale for funding. • To evaluate existing services as well as connections between various service providers. • To identify the strengths of individuals and communities which can be built upon by program development. • To guide the development of program goals. • To gain community support and attract partners/participants.

