PLS 393 Exam 1


Clarify steps in survey design (matrix)

- evaluation questions - source/target - sampling - mode of survey - survey questions - pretesting - data quality assurance - analysis and communication of results

Recommendations from Langbein

- good start: state the purpose of the survey
- clarify confidentiality and anonymity
- do not go over 20 minutes
- use mainly close-ended (multiple choice or check) questions
- short, simple, clear, specific language
- professional layout and presentation, easy to go through
- look for past questions
- start with easier, factual questions
- have thematic sections
- provide definitions
- use specific time frames (e.g., during a typical day last week, how many hours...)
- make options mutually exclusive
- avoid double-barreled questions and answers
- if multiple responses are allowed, think carefully about the design, including "none of the above"
- think carefully about "Don't know" and/or "No opinion" options (use sparingly)
- ask for ratings of items rather than rankings
- avoid leading or cueing
- avoid agree/disagree items
- use bipolar, balanced scales: 1-3, 1-5, 1-7
- think about where to place the "overall" question -- often towards the end
- demographics at the end; most sensitive demographics such as income towards the very end
- use brackets for income (maybe age)
- open-ended or "any further comments" at the end
- proofread and pretest the survey on a small sample

Objectives

- identify 4 representative counties - collect and analyze data for an implementation evaluation of each county's processes - collect and analyze data for impact evaluation, using treatment and comparison designs - provide recommendations on sentencing procedures that could be applied across the state

The 5 types/domains of evaluations

- need for the program: nature, magnitude, and distribution of a problem
- program theory or design: assessment of program theory and design, valid causal mechanism, feasible approach, logic model (if...then): inputs...activities...outputs...short-term outcomes...medium-term outcomes...long-term outcomes
- program process: assessment of processes (operations, services and delivery; fidelity and quality); the most common form of evaluation; program monitoring, output monitoring
- impact evaluation: effectiveness of the program
- program efficiency: cost-benefit and cost-effectiveness analyses

Evaluating fidelity: how well does the implemented program match the program's theory?

Assess the degree to which the implemented program matches the intended program theory. Check how the program relates to:
- logic model
- program design documents
- program philosophy
Check the quantity and quality of delivered services and how they match the program's intentions.

Report cards

Analyzed based on: - capacity - condition - funding - future need - operation and maintenance - public safety - resilience - innovation

Single interrupted time series

Can we just compare the average outcomes after to the average outcomes before? NO!! We could compare averages only if we were sure there were no trends, intervening events, or regression-to-the-mean issues -- common problems with time series designs.
- trends and intervening effects can (hopefully) be addressed
- often these are aggregate studies (e.g., crime rates) but the design can also be used with individual data (e.g., education and growth models for students)
- the fundamental SITS model is a regression that accounts for both immediate (level) and gradual (trend) effects (see the sketch below):
Y = a + bX + cXT + dT + e
where X is an indicator (1, 0) for the program and T is a counter for time. So b estimates the immediate impact and c the gradual one.
PROBLEMS
- history and multiple intervening treatments
- regression to the mean concerns
- external validity
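A minimal sketch (not from the course materials) of fitting the SITS regression above in Python with statsmodels; the toy data, column names, and effect sizes are invented for illustration.

```python
# Single interrupted time series: Y = a + bX + cXT + dT + e.
# Toy monthly series with 24 pre-program and 24 post-program points.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
df = pd.DataFrame({"T": np.arange(48)})
df["X"] = (df["T"] >= 24).astype(int)  # program turns on at T = 24 (illustrative)
df["y"] = 10 + 0.1 * df["T"] + 2 * df["X"] + 0.05 * df["X"] * df["T"] + rng.normal(0, 0.5, 48)

sits = smf.ols("y ~ X + X:T + T", data=df).fit()
print(sits.params)  # coefficient on X ~ immediate (level) effect b; on X:T ~ gradual (trend) effect c
```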

Implementation studies

Determining both whether the program being evaluated was implemented with fidelity, and what core components of the program are most important for achieving intended outcomes

Relations

Earlier: needs and theory, theory and implementation Later: impact and implementation, impact and cost

Program theory

- A program's theory is the conception of what must be done to bring about the intended changes; it is the rationale behind a program, the foundation on which every program rests
- The program's theory may not be completely articulated in program documents
- The evaluator will need to express/describe the program's theory (as intended)
- The evaluator will need to assess the quality of the program theory
- An assessment of a program's theory could take place during an evaluability assessment: a check that a full evaluation should indeed proceed
Why is it important to understand a program's theory?
- BEFORE: to decide whether to sponsor
- WHILE: to help improve
- AFTER: to address accountability and effectiveness

Why do we do policy research?

- Accountability - Knowledge - To be influential - To empower

Different uses for process evaluations

- Alone: as for a new program or pilot or for an existing program that seeks to improve its performance - Complement: the process evaluation becomes an additional key element in order to understand quality and quantity of services delivered - Monitoring: process evaluations done in a systematic, periodic manner are referred to as process monitoring

What is program evaluation?

- Applying (social) research methods to systematically investigate the effectiveness of (social) intervention programs and inform action - Policies vs. programs: policies are usually higher level statements of intent - Evaluation skills...in your own life...to advance in your career...at your work...for society in general

Research type, design, and methods

- Causal type of research with descriptive requirements as well. Both formative and summative, though summative is the ultimate goal.
- Design: quasi-experiment, before and after with a matched comparison group.
Methods:
- QE design of individually matched DUI offenders with interlock against comparison DUI offenders without interlock. Matched on 5 characteristics: gender, race, age, number of prior DUIs, BAC. What else could they have used from DMV or probation court info?
- T-tests and logistic regression
- Surveys of all judges, interviews of some judges, a select sample of probationers, interviews of manufacturers and others
- Analyses of interlock usage logs from one county (Santa Clara)

Communication and use

- Client concern and discounting of results
- Recommendations on mentors (increased training and follow-up) only partially used
- Recommendation on working conditions for liaisons could not be followed
- Action: the report remained internal, with no dissemination for promotional use
- "It may be that, as in many cases of policy research, the most direct effects on action will be in the altered awareness and expectations of decision makers in designing future programs"

Good policy research will be...

- Credible - Unbiased - Meaningful - Responsible - Doable - Creative

Evaluation lessons learned

- Difficulties of communicating complex, ambiguous results - Anticipate and think through potential problems with sample sizes and variable implementation - The importance of capturing implementation and its issues (e.g., judicial reticence and installation circumvention) - Only 7 states with no interlock - Policy is now evolving at the national level

Why systematic evaluation?

- Difficulty in identifying program effects - We need to assess program effects relative to the outcomes expected without the program intervention - Confirmation bias: the tendency to see things in ways that favor preexisting beliefs (behavioral insights)

What is public policy?

- Dye: whatever government chooses to do or not to do - Wilson: the actions, objectives, and pronouncements of governments on particular matters, the steps they take (or fail to take) to implement them, and the explanations they give for what happens (or does not happen) - Examples: laws, regulations, programs, executive orders, rulings

Process evaluation through the lens of the...

- Evaluator: assessing service and delivery data can help evaluators explain the impact. If a program does not produce the intended results, we can tease out whether it was due to implementation or bad theory.
- Sponsor: sponsors need to understand the level of provision and impact of a program in order to ultimately decide on continuation. Information on implementation helps provide a full picture of a program.
- Manager: managers are concerned with uncovering the problems in the program and correcting them to improve performance.

Policy researchers can be found in

- Government: federal agencies - Consulting firms, private research organizations, or consultants - Think tanks: independent scholars who are thought leaders or advocates for particular issues - Institutions of higher education - Public policy stakeholder organizations

Does policy research improve public decisions?

- Help with improvement of programs - Clarify effect of policies - Clarify what are problems - Clarify what are options - Help with quality of public debate

Evaluations in terms of the evaluator-stakeholder relationship:

- INDEPENDENT evaluation - PARTICIPATORY or collaborative evaluation - EMPOWERMENT evaluation

Independent Redistricting

- IRCs are a voter-centric reform that ensures voters, not politicians, decide how electoral districts are drawn
- States are responsible for drawing their own maps
- The process of partisan gerrymandering undermines the democratic principles of the electoral process and contributes to the rising partisan polarization of state legislatures as well as the U.S. Congress
- An IRC is a body separate from the legislature (which may or may not include members who hold partisan public office) that is responsible for drawing the districts used in congressional and state legislative elections
- Generally, a redistricting commission takes one of two forms: non-politician commission or political commission
- Iowa has one of the nation's first redistricting commission processes, and Iowa's system is unique because legislative leadership and the Governor have authorized the Legislative Services Agency (LSA) to perform many of the essential components of the redistricting process

What does a well-done RFE accomplish?

- In theory, RFEs are likely to achieve internal validity (that is, provide unbiased estimates of program effects). Why? Because if the random assignment worked, "the units assigned to each treatment group are very likely (in the aggregate) to be comparable on all confounding variables (Z), whether they are measured or not"
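A small illustrative simulation (an assumption of this write-up, not from the notes) of why random assignment tends to balance an unmeasured confounder Z across groups in the aggregate:

```python
# With coin-flip assignment, an unmeasured confounder Z ends up with
# nearly identical means in the treatment and control groups.
import numpy as np

rng = np.random.default_rng(0)
z = rng.normal(50, 10, size=10_000)        # unmeasured confounder Z
treat = rng.integers(0, 2, size=10_000)    # random 0/1 assignment
print(z[treat == 1].mean(), z[treat == 0].mean())  # very close in large samples
```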

What is applied policy research?

- Information gathering...processing and analyzing - In order to assist...decision-making

California Ignition Interlock program: policy cycle and stakeholders

- Initiation: CA passed the 1986 Farr-Davis Safety Act authorizing an ignition interlock demonstration pilot (first state to do so). Very detailed mandate about the policy experiment.
- Policy cycle: part of policy legitimation -- evaluation on top of a pre-existing sentencing system
- Stakeholders: many!
- California Department of Alcohol and Drug Programs (DADP) is the nominal client, but the Office of Traffic Safety (OTS) directed the evaluation
- NHTSA (federal) provided funds and technical expertise
- DMV provided records of DUI offenders
- County courts (sentencing) and probation offices (installation, proper use, and annual check)
- Interlock manufacturers (installation, maintenance, quarterly reports)
- Sentenced individuals (install device within 30 days, report annually)

Why policy research is here to stay...

- It is the law: legislative requirements - It is interesting - It can be influential - It improves public decisions: 1. by reducing the uncertainty in public decisions 2. it contributes to decision makers' understanding of policies or programs - It can bring new perspectives - Can improve the quality of public debate - Can fuel continuous improvement of existing policies

Logic models

- Logic models are organizers of everything. The basic components: - Resources/inputs - Activities - Outputs: direct products of a program's activities - Outcomes: expected changes in the target population's conditions as a result of the program. There can be short-, medium-, and long-term outcomes.

SHP Chapter 1

- Policy research can happen at any point in the policy cycle, which has 5 steps:
1. Agenda setting: the problem that is to be addressed by the policy must be defined and balanced against other competing priorities
2. Policy formation: before policies are passed, they must be formulated to address the problem in question
3. Decision making: the actual passage of a policy
4. Implementation: how well the policy is being put into action
5. Evaluation: determining whether policies or programs actually produce their intended effects; evaluation takes many forms and is often required when policies are put into effect

Main characteristics of policy research

- Problem driven: real world problems that need solutions - Actionable: information produced must have clear implications - Policy research can help throughout the policy cycle

How Central Ohio Got People to Eat Their Leftovers

- Problem: food rotting in landfills generates methane and is a lost resource that incurs collection/transportation/disposal costs; 1/3 of food in the US is unsold or uneaten; food waste in the US is responsible for twice the greenhouse emissions of commercial aviation; households account for 39% of food waste in the US
- Policy tools: regulation and mandates, public campaigns, and economic incentives
- Central Ohio SWACO efforts: public awareness campaigns (tips, $1,500 annual savings per household, composting bins) and school grants for compost and recycling sorting
- Results: self-reported drop of 23% in food waste; food waste volume decline of 21%; one school district dropped trash pickups by 30%, recycling pickups by 50%, and saved $22,000

The purposes of evaluations

- Program improvement (formative): to help management improve a program - Accountability (summative): to provide input for decisions - Knowledge: to learn about the program's effects in order to inform more general discussions

Who does policy research?

- Public sector (federal, state, and local) - Private sector - Non-profit sector

Dropout Prevention Program research design

- Quasi-experimental design: before and after with a comparison group
- Implementation and outcome evaluation; takes place in the second year
- 6 participating schools receive the treatment and 6 schools are the comparison
- Comparison schools (another set of 6) are chosen to be similar to treated schools
- Whole classrooms in a school are chosen
- 6 liaisons, 3 program specialists, 530 mentors trained and assigned, about 800 students
- Attitudinal survey conducted in the Fall and Spring
- Attitudes address personal efficacy and school relation items. Measures are checked for reliability.
- Questionnaires to students, mentors, and liaisons
- N=518 completed surveys: 278 from treated and 240 from comparison students. There is attrition from those who do the pre-survey.
- Students are checked for comparability in terms of gender, grade, race, and age

RFEs

- Random selection does not equal random assignment: random selection refers to randomly selecting units to assemble the sample--you then still need to do the assignment
- Random selection helps with external validity
- But RFEs do not need to have random selection (participants could be volunteers)

Why do we need evaluations?

- Social problems can be complex and programs may or may not help - The main tasks of program evaluation are: 1. assess the effectiveness of program or policy interventions 2. identify the factors that drive or undermine their effectiveness

History of RFEs

- Started in agricultural contexts, then education and health. Used since the 1960s across varied policies: education, health, welfare, criminal justice
- Earlier on, larger-scale social program experiments (income security, health insurance)
- Smaller-scale studies from the 80s and 90s onwards
- For some, a concern: targets are often poor people dependent on some social program (in the US or abroad). Few RFEs for regulation or management.

Project Green Light

- The first public-private-community partnership of its kind, blending a mix of real-time crime-fighting and community policing aimed at improving neighborhood safety, promoting the revitalization and growth of local businesses, and strengthening DPD's efforts to deter, identify, and solve crime - Eight gas stations that have installed real-time camera connections with police headquarters - Effective? Yes, incidents of violent crime have been reduced by 23% at all sites and 48% at the original 8 sites compared to

Lessons learned

- The importance of a balanced (implementation and outcome) design - Limitation of many policy studies that need to resort to proximal outcomes (e.g., attitudes) - The importance of capturing "dosage," preferably with objective measures - The importance and challenges from client expectations

Challenges

- The local court's autonomy and discretion (not random) at the time of sentencing - Potential variation in processes from county to county - Variation (by design) in county demographic and socioeconomic characteristics - Large number of agencies that needed to coordinate data and processes - Multiple stakeholders with varied political and economic interests

What is the role of policy research in influencing policy change?

- The role of policy research in affecting policy is frequently subtle; its impacts incremental - Policy research is but one ingredient in a complex decision-making stew

An evaluation entails...

- a DESCRIPTION of the performance of the entity being evaluated, using scientific research methods - an ASSESSMENT of the performance based on some standards or criteria for judging the performance

Measuring program outcomes

- a relevant outcome is chosen...but still a bit more complicated - many program outcomes are multidimensional: 1. a single outcome measure may not be sufficient to capture all the dimensions of changes in a condition (e.g., intensity, frequency, type) 2. multiple measures: help against the possibility that poorly performing measures will underrepresent outcomes

Outcomes

- an outcome is the state of the target population or the conditions that a program is expected to have changed - outcomes are about conditions, not the program - outcomes are then different from outputs (which seek to capture services/activities delivered)

Implications

- bullying behavior that school employees find difficult to clearly identify - state laws and policies differ significantly - an important area for further study is to improve knowledge of these trade-offs - a need for more options (range of consequences for bullying) - effective intervention strategies - the discretion and skill of individual school personnel is critical to implementation

Issues with RFEs

- correct randomization
- problems with contamination
- use of cluster randomization
- cross-overs and non-compliance
- blinded follow-ups
- attrition
- resentful demoralization
- waiting lists
- sample sizes: can estimate required sample sizes for an expected effect size (see the sketch after this list)
- analyses: in principle can be very simple (difference of means test); often still use regression
- generalizability: depends on the RFE and the sample of study
- ethical and political feasibility constraints
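A hedged sketch of the two quantitative points above (required sample size for an expected effect size, and a simple difference-of-means analysis), using statsmodels and scipy; all numbers are illustrative assumptions, not from the course.

```python
# Sample size for a two-group RFE and a simple difference-of-means test.
import numpy as np
from scipy import stats
from statsmodels.stats.power import TTestIndPower

# Required sample size per group for an assumed standardized effect size of 0.3.
n_per_group = TTestIndPower().solve_power(effect_size=0.3, alpha=0.05, power=0.8)
print(round(n_per_group))  # roughly 175 per group

# The analysis itself can be a simple difference-of-means (t) test on simulated data.
rng = np.random.default_rng(1)
treated = rng.normal(1.3, 1.0, 200)   # invented outcome data for the treatment group
control = rng.normal(1.0, 1.0, 200)   # invented outcome data for the control group
print(stats.ttest_ind(treated, control))
```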

Response rates

- declining rates across modes: public fatigue
- low response rates can imply concerns with bias
- rates vary across modes
- 70% and above can be considered a great response rate
- use of weights (but you need to understand well the characteristics of the whole target population)
- strategies to improve response rates may involve: follow-ups, incentives, keeping it short and convenient, proper design, and the request itself
- understand the response rate patterns of the time and the mode you are selecting. Langbein: the response rate to a follow-up is often around half the response rate of the initial effort (mail). Newcomer and Triplett: the average number of call attempts to get an interview is more than 5.

Sample size

- descriptive, population studies seeking to estimate a key population parameter: large samples of N ~1000+. For example, you seek to understand the rate of unemployment in Michigan.
- causal, relational studies seeking a program effect may have smaller sample sizes (presumably the effects are large). The sample can go down to around a hundred (according to Langbein) per group of study. Caveat: if you are part of a large summative evaluation project of an existing but controversial policy/program, you may require a large sample in order to get a precise estimate of the effect.
- descriptive, organization studies: surveys of small samples/populations (say, fewer than 60), often related to the study of an organization, are mainly descriptive (so still useful), but avoid unwarranted claims based on large-sample statistical tests

Trends

- enormous growth in survey industry - technological advances - tiredness, reluctance from the public

The standards by which program performance may be judged in an evaluation include the following:

- needs or wants of the target population
- stated program goals and objectives
- professional standards
- customary practice: norms for other programs
- legal requirements
- ethical or moral values; social justice, equity
- past performance; historical data
- targets set by program managers
- expert opinion
- preintervention baseline levels for the target population
- conditions expected in the absence of the program (counterfactual)
- cost or relative cost
- stakeholders: individuals, groups, or organizations with a significant interest in how well a program is working

Questions in process evaluations

- number of people/recipients receiving services - bias in coverage - amount and quality of service - awareness of program - staffing numbers and competencies - organization of program - variation across sites - satisfaction of those involved with program - implementation fidelity - adequacy of resources, facilities, funding - coordination with other agencies - compliance with standards

Outcomes

- outcome level: the status of an outcome at some point in time - outcome change: the difference between outcome levels at different points in time - program effect: the portion of an outcome change that can be attributed uniquely to a program
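A quick illustration (numbers invented, not from the notes): if a treated group's average test score is 60 at intake (outcome level) and 75 a year later, the outcome change is 15 points; if a comparable untreated group moved from 60 to 70 over the same period, the program effect attributable uniquely to the program is roughly 15 - 10 = 5 points.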

Fields of policy study

- policy process - policy analysis (this semester) - policy evaluation (this semester) - policy implementation - policy design, policy makers, and policy making

Comparison group designs

- post only: treated vs un-treated after intervention (naive design)
- pre-post: treated vs un-treated before and after intervention (difference-in-differences)
- pre-post with covariates: treated vs un-treated before and after with extra controls (regression-adjusted covariate design)...I would call this non-experimental
- single interrupted time series: treated across time
- interrupted time series with comparison: treated vs un-treated across time
- regression discontinuity designs: treated vs un-treated around a cutoff

Basic Comparison Designs

- post-test only comparison - pre-post comparison - single interrupted time series - interrupted time series with comparison - regression discontinuity

What criteria do we use for process evaluations?

- program theory - administrative standards - other programs - professional judgment or common sense

State Bullying Laws: Research design

- purposive sampling design: 4 states from different geographical regions; 11 districts (2 middle schools from each); 22 middle schools
- site visits by 2-person teams, Feb-May 2012
- N=281 in-person interviews with personnel from the schools (could include principals, counselors, teachers, security personnel, bus drivers, etc.)
- structured interviews: first a short fixed questionnaire to be filled out (frequency, type, and level of disruption), then the structured interview
- questions address: 1) definitions 2) awareness 3) procedures 4) communication 5) training 6) prevention 7) supports
- immediately after the visit, the team generates a School Summary Protocol: 155 dichotomous items or ratings

Key components

- service utilization (target reach and recruitment) - service delivery (activities delivered and received) - program support (program needs to function well) - satisfaction (participants and program staff) - fidelity (degree of consistency with design/theory)

Which outcomes do we pick?

- stakeholder perspectives: can sometimes lack specificity and concreteness to identify specific outcome measures - program impact theory: proximal outcomes (direct and immediate), distal outcomes (politically and practically important) - prior research: examine outcomes used before - unintended outcomes: emerge through some process that is not part of the program's design or direct intent, or through the dynamics of the social conditions

Communication and Use

- study was downplayed by new leadership in ED Office concerned about qualitative design. It was not disseminated by ED. - study belatedly receives attention from the National Academy of Sciences which cites and refers to it in one of its reviews on the topic - impact: more incremental and indirect, through being included in the NAS review, through the discussions at the sites themselves, and with the advisory panel

Logic model uses

- to carry out an evaluability assessment or a full evaluation - describes how the program works so it helps if changes are needed - helps in collecting information about the program - clarifies the place of the program in an organization (with multiple goals) - helps in communication and usability of results

Lessons for policy research

- too many meanings

Measurement properties

- use procedures that are already established or develop the measures
- measurement of the outcomes focuses on (see the reliability sketch below):
1. reliability: the extent to which the measure produces the same results when used repeatedly to measure the same thing
2. validity: the extent to which the measure captures what it is intended to measure
3. sensitivity: the extent to which the values on the measure change when there is a change or difference in the thing being measured
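Reliability of a multi-item scale is often summarized with an internal-consistency statistic such as Cronbach's alpha; a minimal sketch follows, with an invented 4-item response matrix (not data from the course).

```python
# Cronbach's alpha for a 4-item attitude scale (invented responses).
import numpy as np

items = np.array([
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 4, 4],
    [3, 3, 2, 3],
    [4, 4, 5, 4],
])  # rows = respondents, columns = scale items

k = items.shape[1]
item_vars = items.var(axis=0, ddof=1).sum()      # sum of item variances
total_var = items.sum(axis=1).var(ddof=1)        # variance of total scores
alpha = (k / (k - 1)) * (1 - item_vars / total_var)
print(round(alpha, 2))  # ~0.92 here; values around 0.7+ are conventionally considered acceptable
```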

Program monitoring - regular performance tracking

- what occurs (or should occur), day in and day out, in programs in order to monitor and improve performance - encourage all programs to add all relevant additional information. For example: - process and utilization info - client characteristics - outcomes by sub-groups - outcomes at intake - relevant context info

Rapid-cycle evaluation

A framework for conducting a rapid sequence of experimental or quasi-experimental research that tests whether operational changes improve results. They tend to focus on incremental changes in operations instead of measuring the effects of entire programs.

Analysis and Evaluation

A: what should we propose? E: did it work? how can we improve?

Things to consider in RFEs

Ethics: assign treatment to one group and deny to the other. Is this feasible? Who is the control? More often need to compare two treatments A and B (status quo) or dosage or now and later, through lottery

Grading scale

Exceptional: fit for the future Good: adequate for now Mediocre: requires attention Poor: at risk Failing/Critical: unfit for purpose

Methods used in policy research

Experiments, quasi-experiments, non-experimental population studies, survey research, qualitative research, meta-analysis, site visits, cost-benefit/cost effectiveness analyses, secondary data analysis, forecasting, systematic reviews, criteria-based policy analyses

State Bullying Laws: Objectives and challenges

First phase objective: document the variability of bullying state laws across the US Second phase objective: How are state laws translated into practice at the school level? Challenges: diversity of perspectives and high level of analytic complexity

Key takeaway

For policy research there are: many careers, many uses, many methods, and (hopefully/eventually) paths to meaningful influences

Bullying: findings

Frequency: 1/2 of respondents reported bullying-like behavior on a weekly basis, but ultimately few incidents are formally confirmed or meet criteria. The perceived reason is often related to appearance, then dating conduct, rather than protected categories (i.e., race, religion, disability, etc.)
- few formally confirmed bullying cases (serious harm, threats)
- general level of awareness about bullying but difficulties in clarifying when a behavior was bullying per the law
- even incidents/behaviors that are not legally deemed bullying are detrimental to both target and bully
- Implication: understand bullying within a social context and a socio-emotional process of maturation. Focus then on context, such as school climate and supportive policies.

Nonexperimental research

Gathering information on intended policy efforts before, during, and/or after an intervention from a single group of participants with no comparison group

Researcher must...

Have a solid ground in research methods, a creative mind, and the ability to communicate research results in compelling ways

When do you do an impact evaluation with an RFE?

IMPACT EVAL
- pilots that serve for extensions
- on-going programs, in order to improve/better understand alternatives
- IDEALLY: when implementation problems are ironed out
RFE
- important problem
- ethical to do it
- have the resources

Is there one QE design that is always better?

IT DEPENDS

Case #4: Universal Basic Income study in Finland

In the Finnish experiment, people on the basic income reported large and statistically significant improvements in key drivers of well-being.

The purpose of this experiment was to test the effects of a $500 per month guaranteed income for 2 years on health and financial outcomes. A mixed-methods randomized controlled trial in Stockton, CA, USA enrolled 131 individuals to the treatment condition and 200 to control to receive a guaranteed income from February 2019 to January 2021. Quantitative data collection began 3 months prior to allocation, at 6-month intervals, concluding 6 months after withdrawal of the intervention. Qualitative data collection included 105 interviews across 3 stages. The primary outcomes were income volatility, physical and mental health, agency, and financial wellbeing. The treatment condition reported lower rates of income volatility than control, lower mental distress, better energy and physical functioning, greater agency to explore new opportunities related to employment and caregiving, and better ability to weather pandemic-related financial volatility. Thus, this study provides causal evidence of positive health and financial outcomes for recipients of guaranteed income. As income volatility is related to poor health outcomes, provision of a guaranteed income is a potentially powerful public health intervention. Volatility means rapid changes.

Difference between outputs and outcomes

OUTPUTS
- direct products of a program's activities/services
- often expressed numerically or quantified in some way
- examples: # attending workshops
OUTCOMES
- changes resulting from a program's activities/services
- quantify changes in knowledge, attitude, behavior, or condition
- examples: knowledge of healthy choices, adoption of healthy practices

Systematic reviews

Policies are much more defensible if they are grounded in evidence-based research. Policy researchers may review evidence across multiple studies to identify whether compelling evidence exists for the adoption of a program, policy, or practice.

Survey research

Policy researchers may survey beneficiaries of a program to determine whether the program had the intended outcomes

Implementation of State Bullying Laws

Problem and context
Problem: varied state-level bullying laws and gaps in knowledge regarding their components and their implementation
Context: process evaluation sponsored by the Department of Education
Stakeholders: Department of Education (Office of Planning, Evaluation and Policy Development), advisory group, enlisted states (districts and schools)

California Ignition Interlock program

Problem: In the mid 1980s, California was experiencing a substantial problem with drinking drivers and high levels of recidivism (DUI reconvictions) - 1982, nearly 1000 DUI arrests a day - 1982, 1 in 3 DUI offenders in CA had a prior DUI - 1986, more than 2500 alcohol-related traffic fatalities - NHTSA considers that the public, legal system, and technology are amenable to experimenting with ignition interlocks as one policy solution

Dropout Prevention Program

Problem: school dropout was a serious problem in the St. Louis Public School District in the early 1960s. At the time, some schools had dropout rates of 50% by the 12th grade. The research was initiated by a school official in the district's Office of Federal Programs, who had sought and obtained a federal demonstration grant for a three-year pilot of a mentoring program based on a prior model.
Context and stakeholders: the main client is the school official in the Office of Federal Programs who got the US Department of Education grant. There are several district personnel involved with the evaluation or potentially interested in results. Immediate stakeholders of the study: mentors, liaisons, program officials, teachers, students and their families.
Policy cycle: the program was funded as a demonstration, with the idea that if successful it could be a model to disseminate.
Objectives: increase the # of students who stay in school, increase positive attitudes toward school, provide trained adult mentors for N=1230 participating students, strengthen home-school-community collaboration with liaisons, pilot and then disseminate the mentoring model.
Type of research: causal study with the goal of uncovering the relation between mentoring activities and positive changes in attitudes and behavior.
Technical challenges: find adequate measures when the outcome is distal and good performance measures are not available (at the time); explain and properly interpret the quasi-experimental design.
Implementation challenges: teacher and personnel reluctance. The first-year study had issues with limited administration.
Client challenge: having a client predisposed towards one-sided results.

Process evaluation

Process evaluation is a form of evaluation designed to describe how a program is operating and to assess how well it performs its intended functions. As with program theory: there is not one approach. - Implementation evaluation: newer programs and accountability purposes - Process evaluation: established programs and formative purposes

Describing program theory

Programs center on transactions between a program's operations and its targets. Program theory can be broken down into components. - Impact theory (theory of change): the causal theory that explains the cause and effect sequence in which program activities result in beneficial outcomes. It is the essence of a program. - Process theory (theory of action): 1. service utilization plan: describes the services from the perspective of the targets, the sequence of contacts, including steps when targets do not engage and the end of services 2. organizational plan: functions, activities, personnel, and resources from the perspective of management

Quasi-experimental research

Quasi experiments overcome ethical, legal, and logistic barriers to randomizing people by forming a comparison group similar to the treatment group. These experiments are far more common in policy research than experimental methods because they allow for adaptation of experimental logic to the realities of conducting studies in the real world of policy making and implementation.

Regression Discontinuity

RD designs exploit the rules of assignment by examining outcomes around a designated cut point: treated on one side, control on the other Examples of programs that have a continuous eligibility index and a cutoff: - poverty score for enrollment in a government health insurance program - test scores for enrollment in a school program - farm size for enrollment in a fertilizer subsidies program
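An illustrative sketch (simulated data, invented cutoff) of the basic RD idea: fit local linear regressions on each side of the cutoff and read the program effect as the jump at the cutoff.

```python
# Regression discontinuity sketch: eligibility index with a cutoff at 50,
# treatment assigned below the cutoff (e.g., a poverty score rule).
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
score = rng.uniform(0, 100, 2_000)                 # continuous eligibility index
treated = (score < 50).astype(int)                 # assignment rule (illustrative)
outcome = 20 + 0.1 * score + 4 * treated + rng.normal(0, 2, 2_000)  # true jump = 4

df = pd.DataFrame({"score": score, "treated": treated, "outcome": outcome})
df["centered"] = df["score"] - 50                  # center the running variable at the cutoff
window = df[df["centered"].abs() <= 10]            # local window around the cutoff

# Separate slopes on each side; the coefficient on `treated` estimates the jump at the cutoff.
rd = smf.ols("outcome ~ treated + centered + treated:centered", data=window).fit()
print(rd.params["treated"])                        # should be close to the simulated effect of 4
```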

Experimental research

Random assignment of individuals or groups to either receive services or benefits enacted by a policy

Types of research designs in impact evaluation

Randomized Field Experiment (RFE) - controlling for threats to internal validity: random assignment
Quasi-experiment (CS/TS) - controlling for threats to internal validity: comparison group
Non-experiment (MOST COMMON DESIGN) - controlling for threats to internal validity: statistical controls
In QE designs the researcher selects the groups to be compared: one group has the program and the other does not. In NEs the researcher selects all or nearly all units.

Things Governments Do

Regulation, service provision, information, bureaucratic reforms, taxes, political reforms, allocate budgets, private rights, contracting, framework economic activity, subsidies and grants

Impact evaluation

Stakeholders seek the knowledge gained from an impact evaluation

Random Field Experiments

Study population...random treatment and control groups...both with follow-ups...then compare results. RFEs are the "gold standard" of program evaluation. Main characteristics: RFEs involve the random assignment of units of analysis to treatment alternatives in the real world. There are different designs in terms of treatments and control: A vs B (status quo); A, B, C, and D; more vs less; now vs later. It is rare to have no treatment. Role of placebo: to rule out "testing effects".

Surveys

Surveys are a tool the evaluator can use in all types of designs: RFEs, QEs, NEs Why or when do we use them? Many circumstances - often to assist with implementation and process evaluations; but they are also used for impact analyses - to supplement administrative data with further information from recipients and implementers - surveys don't have to be based just on national samples, they work as well for local programs and organizations Goals with a good survey design: high response rate, unbiased responses, and responses are informative and useful

Dropout prevention program evaluation components

Surveys to students, mentors and liaisons and reports on contact and different levels/types of contact

Threats to internal validity

The accuracy of the causal claim, or the unbiasedness of your estimate of the program's effect. TS: time series designs. CS: cross-sectional designs (designs with multiple units, which can include a comparison).
- intervening events: TS
- maturation/trends: TS
- regression to the mean: TS
- selection: CS
- contamination: CS
- obtrusive testing: TS and CS
- attrition: TS and CS
- instrumentation: TS and CS
- multiple treatments: TS and CS

History

The evolution of evaluation to a degree parallels the evolution of governmental growth and involvement in the quality of life of citizens. As more programs are created, and as more revenues are obtained from citizens, expectations grow towards improving results. Education and public health are the areas with the first precursors. Growth of government size and roles increased in the 1930s; this brought higher needs for information and a professional civil service. World War II can be linked to evaluation efforts related to the war effort. Post-World War II boom: federal and foundation investment in evaluation in many areas. Scientific management from industry and defense influenced human service agencies. Program evaluation was commonplace by the end of the 1950s, grew in the 60s, and became a distinct field in the 70s. The War on Poverty-Great Society programs in the mid and late 60s gave a critical push to the field of evaluation research, in part due to the scale of the programs and to executive initiatives validating the evaluation efforts. The first federal program to require an associated evaluation was a juvenile delinquency program enacted by Congress in 1962. In earlier years, evaluation was shaped more by researcher interests; now it is shaped more by immediate program stakeholders and by consumers more generally. Evaluation research and activities can be influenced by changing cultural and political climates. Evidence-based practices and outcome-based performance monitoring and evaluation have become common expectations across sectors.

Findings

The final report found that the interlock device produced positive results but that the data did not support definitive conclusions.
- 1/4 of those sentenced to the interlock never had it installed
- The device could often be bypassed or circumvented
- Judges: each judge set their own policy; there was no standard
- The findings of this study support a positive but non-conclusive assessment of the potential of ignition interlock as a DUI countermeasure
- No recommendation on sentencing
- Recommendation of sanctions for non-installation
- More information on the program for judges
- Immediate result: contestation and critiques from all sides. DMV: what about those who did not have the device installed and were omitted?
- Then: the CA legislature reauthorizes and expands the program. The report circulates widely at the national level and impacts the future trajectory of the policy.
- Thereafter: large expansion and adoption of ignition interlocks across states

Evaluating service delivery: what activities were folks involved in?

The focus here is on an accounting of the activities delivered and received. - think of service flow charts - think in terms of units, # of events, time, costs So, need to systematically define, trace, and account for program activities

Assessing program theory

The program theory should be sound, reasonable. In many instances this starts in an informal manner, with common sense judgements. - Assess in relation to social needs - Assess in relation to logic and plausibility - Assess by comparing with research and practice - Assess with preliminary observations Program theory may not have a sound basis

Extracting program theory

The program's theory may often have not been written out (implicit) in detail and requires the evaluator to synthesize (articulate) it Steps: - clarify boundaries of the program - obtain a description of program theory from documents, interviews, observations, literature - identify goals and expected results - identify each component of the program (functions and activities) - understand the linkages between activities and their logic - verify the theory being derived matches stakeholders' understanding of the program and reality

Comparison interrupted time series

To address general problems with single time series designs we want to: - include carefully selected comparison - use regression modeling to address trends, account for intervening events, and regression to the mean issues

Evaluating service utilization: did we reach folks?

Track how and to what extent services are deployed and utilized in relation to targeted recipients Critical for interventions that: - have voluntary participation - participants must learn new procedures, change habits, or take instruction Need to understand coverage and bias - Coverage: the degree of the target's participation in the program - Bias: the degree to which some sub-groups participate more than others To monitor, use records, surveys, pre-existing studies or program experiences, or community surveys

Evaluating organizational support: can the program operate successfully?

Track how well all program support functions operate. Program Support Functions: - fund-raising - public relations - staff training - recruiting and retention of key personnel - relationships with affiliated programs

Evaluating program satisfaction: how satisfied are folks with the program?

Track the degree of satisfaction with a program How satisfied: - services - interactions with personnel - outcomes

Secondary data analyses

Using data already collected by programs or regularly collected for other purposes, policy researchers can assess changes in outcomes over time

Site visits

When trying to understand the inner workings of a program or a policy, there is no substitute for seeing it in action

Cost benefit or cost effectiveness analyses

Whereas cost-benefit analysis focuses on the economic benefit of a program by quantifying the dollar value of all costs and benefits, cost-effectiveness analysis compares the costs of a program with the raw outcomes produced

Can we have a graph that captures overall the rationale and functioning of the program?

Yes, with LMs. Logic models can in a way aggregate all of these components but in a simplified manner, with less detail, by focusing on inputs, activities, outputs, and outcomes.

Post only comparison design (naive)

You compare the mean outcomes between the treated and the untreated groups. Example: test scores of pre-K program participants and non-participants.
Problems:
- can we assume that the two groups are comparable? If no treatment were offered, would the two groups have the same mean outcomes?
- groups can differ in terms of skills, resources, motivation, etc. These factors can affect both the chances of being in a program and the outcome. This introduces selection bias.
- there also could have been other treatments affecting one of the groups

Pre-post comparison design (difference-in-differences)

You compare the mean outcomes between the treated and the untreated groups, before and after. Can we do a QE design with two groups having different starting points? Yes, but be careful: they need to be trending in a similar way.
PROBLEMS WITH THIS DESIGN
- groups could still differ in unobserved ways that affect the outcomes. That is, we may still have selection bias.
To limit selection biases a priori in pre-post designs, evaluators can:
a) comparable selections: choose units for the untreated group that are arguably similar to the treated (e.g., Dropout Prevention case)
b) matching: choose units for the untreated group through matching on exact covariates or propensity scores (e.g., Ignition Interlock case)
c) add covariates in a regression model: add more factors to control for further potential differences (I would call this non-experimental)
(A difference-in-differences regression sketch follows below.)
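A minimal difference-in-differences regression sketch (toy data, illustrative column names, not from the course) for the pre-post comparison design above; the interaction coefficient is the DiD estimate of the program effect.

```python
# Difference-in-differences with a long-format dataset: one row per unit-period,
# treated = 0/1 group indicator, post = 0/1 period indicator.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "treated": [0, 0, 0, 0, 1, 1, 1, 1],
    "post":    [0, 1, 0, 1, 0, 1, 0, 1],
    "outcome": [10, 12, 11, 13, 9, 15, 10, 16],
})

# DiD estimate = (treated post - treated pre) - (control post - control pre)
#              = (15.5 - 9.5) - (12.5 - 10.5) = 4 in this toy example.
did = smf.ols("outcome ~ treated + post + treated:post", data=df).fit()
print(did.params["treated:post"])
```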

