
Many types of evaluation exist, so evaluation methods need to be tailored to what is being evaluated and to the purpose of the evaluation.1,2 It is important to understand the different types of evaluation that can be conducted over a program’s life-cycle and when they should be used. The main types of evaluation are process, impact, outcome and summative evaluation.1

Before you can measure the effectiveness of your project, you need to determine whether it is being run as intended and whether it is reaching the intended audience.3 It is futile to try to determine how effective your program is if you are not certain of the project’s objectives, structure, programming and audience. This is why process evaluation should be done before any other type of evaluation.3

Process evaluation

Process evaluation is used to “measure the activities of the program, program quality and who it is reaching”.3 Process evaluation, as outlined by Hawe and colleagues,3 will help answer questions about your program such as:

  • Has the project reached the target group?
  • Are all project activities reaching all parts of the target group?
  • Are participants and other key stakeholders satisfied with all aspects of the project?
  • Are all activities being implemented as intended? If not, why not?
  • What changes, if any, have been made to the intended activities?
  • Are all materials, information and presentations suitable for the target audience?

Impact evaluation

Impact evaluation is used to measure the immediate effects of the program and is aligned with the program’s objectives. Impact evaluation measures how well the program’s objectives (and sub-objectives) have been achieved.1,3

Impact evaluation will help answer questions such as:

  • How well has the project achieved its objectives (and sub-objectives)?
  • How well have the desired short-term changes been achieved?

For example, one of the objectives of the My-Peer project is to provide a safe space and learning environment for young people, without fear of judgment, misunderstanding, harassment or abuse. Impact evaluation will assess the attitudes of young people towards the learning environment and how they perceive it. It may also assess changes in participants’ self-esteem, confidence and social connectedness.

Impact evaluation measures program effectiveness immediately after the completion of the program and for up to six months afterwards.

Outcome evaluation

Outcome evaluation is concerned with the long-term effects of the program and is generally used to measure the program goal. Consequently, outcome evaluation measures how well the program goal has been achieved.1,3

Outcome evaluation will help answer questions such as:

  • Has the overall program goal been achieved?
  • What factors, if any, outside the program have contributed to or hindered the desired change?
  • What unintended changes, if any, have occurred as a result of the program?

In peer-based youth programs, outcome evaluation may measure changes in mental and physical wellbeing, education and employment, and help-seeking behaviours.

Outcome evaluation measures changes at least six months after the implementation of the program (longer term). Although outcome evaluation measures the main goal of the program, it can also be used to assess program objectives over time. It should be noted that it is not always possible or appropriate to conduct outcome evaluation in peer-based programs.

Summative evaluation

At the completion of the program it may also be valuable to conduct summative evaluation. This considers the entire program cycle and assists in decisions such as:

  • Do you continue the program?
  • If so, do you continue it in its entirety?
  • Is it possible to implement the program in other settings?
  • How sustainable is the program?
  • What elements could have helped or hindered the program?
  • What recommendations have evolved out of the program?3,4

References

  1. Nutbeam & Bauman 2006
  2. Trochim 2006
  3. Hawe, P., Degeling, D., Hall, J. (1990) Evaluating Health Promotion: A Health Worker’s Guide, MacLennan & Petty, Sydney.
  4. South Australian Community Health Research Unit n.d.c

If an impact evaluation fails to systematically undertake causal attribution, there is a greater risk that the evaluation will produce incorrect findings and lead to incorrect decisions. For example, deciding to scale up when the programme is actually ineffective or effective only in certain limited situations, or deciding to exit when a programme could be made to work if limiting factors were addressed.

1 What is impact evaluation?

An impact evaluation provides information about the impacts produced by an intervention.

The intervention might be a small project, a large programme, a collection of activities, or a policy.

Many development agencies use the definition of impacts provided by the Organisation for Economic Co-operation and Development – Development Assistance Committee:

“positive and negative, primary and secondary long-term effects produced by a development intervention, directly or indirectly, intended or unintended.” (OECD-DAC 2010).

This definition implies that impact evaluation:

  • goes beyond describing or measuring impacts that have occurred to seeking to understand the role of the intervention in producing these (causal attribution);
  • can encompass a broad range of methods for causal attribution; and,
  • includes examining unintended impacts.

2 Why do impact evaluation?

An impact evaluation can be undertaken to improve or reorient an intervention (i.e., for formative purposes) or to inform decisions about whether to continue, discontinue, replicate or scale up an intervention (i.e., for summative purposes).

While many formative evaluations focus on processes, impact evaluations can also be used formatively if an intervention is ongoing. For example, the findings of an impact evaluation can be used to improve implementation of a programme for the next intake of participants by identifying critical elements to monitor and tightly manage.

Most often, impact evaluation is used for summative purposes. Ideally, a summative impact evaluation does not only produce findings about ‘what works’ but also provides information about what is needed to make the intervention work for different groups in different settings.

3 When to do impact evaluation?

An impact evaluation should only be undertaken when its intended use can be clearly identified and when it is likely to be able to produce useful findings, taking into account the availability of resources and the timing of decisions about the intervention under investigation. An evaluability assessment might need to be done first to assess these aspects.

Decisions about which interventions to prioritize for impact evaluation should consider: the relevance of the evaluation to the organisational or development strategy; its potential usefulness; the commitment of senior managers or policy makers to using its findings; and/or its potential use for advocacy or accountability requirements.

It is also important to consider the timing of an impact evaluation. If it is conducted too late, the findings come too late to inform decisions. If it is conducted too early, it will provide an inaccurate picture of the impacts (i.e., impacts will be understated if they have had insufficient time to develop, or overstated if they decline over time).

Intended uses and timing
  • Impact evaluation might be appropriate when: there is scope to use the findings to inform decisions about future interventions.
  • Impact evaluation might NOT be appropriate when: there are no clear intended uses or intended users – for example, decisions have already been made on the basis of existing credible evidence, or need to be made before it will be possible to undertake a credible impact evaluation.

Focus
  • Impact evaluation might be appropriate when: there is a need to understand the impacts that have been produced.
  • Impact evaluation might NOT be appropriate when: the priority at this stage is to understand and improve the quality of implementation.

Resources
  • Impact evaluation might be appropriate when: there are adequate resources to undertake a sufficiently comprehensive and rigorous impact evaluation, including the availability of existing, good quality data and additional time and money to collect more.
  • Impact evaluation might NOT be appropriate when: existing data are inadequate and there are insufficient resources to fill gaps.

Relevance
  • Impact evaluation might be appropriate when: it is clearly linked to the strategies and priorities of an organisation, partnership and/or government.
  • Impact evaluation might NOT be appropriate when: it is peripheral to the strategies and priorities of an organisation, partnership and/or government.

4 Who to engage in the evaluation process?

Regardless of the type of evaluation, it is important to think through who should be involved in each step of the evaluation process, why and how, in order to develop an appropriate and context-specific participatory approach. Participation can occur at any stage of the impact evaluation process: in deciding to do an evaluation, in its design, in data collection, in analysis, in reporting and, also, in managing it.

Being clear about the purpose of participatory approaches in an impact evaluation is an essential first step towards managing expectations and guiding implementation. Is the purpose to ensure that the voices of those whose lives should have been improved by the programme or policy are central to the findings? Is it to ensure a relevant evaluation focus? Is it to hear people’s own versions of change rather than obtain an external evaluator’s set of indicators? Is it to build ownership of a donor-funded programme? These, and other considerations, would lead to different forms of participation by different combinations of stakeholders in the impact evaluation.

The underlying rationale for choosing a participatory approach to impact evaluation can be either pragmatic or ethical, or a combination of the two. Pragmatic because better evaluations are achieved (i.e., better data, better understanding of the data, more appropriate recommendations, better uptake of findings); ethical because it is the right thing to do (i.e., people have a right to be involved in informing decisions that will directly or indirectly affect them, as stipulated by the UN human rights-based approach to programming).

Participatory approaches can be used in any impact evaluation design. In other words, they are not exclusive to specific evaluation methods or restricted to quantitative or qualitative data collection and analysis.

The starting point for any impact evaluation intending to use participatory approaches is to clarify what value this will add to the evaluation itself and to the people who would be closely involved, as well as the potential risks their participation might carry. Three questions need to be answered in each situation:

(1) What purpose will stakeholder participation serve in this impact evaluation?;

(2) Whose participation matters, when and why?; and,

(3) When is participation feasible?

Only after addressing these questions can the issue of how to make the impact evaluation more participatory be addressed.


5 How to plan and manage an impact evaluation?

Like any other evaluation, an impact evaluation should be planned formally and managed as a discrete project, with decision-making processes and management arrangements clearly described from the beginning of the process.

Planning and managing include:

  • Describing what needs to be evaluated and developing the evaluation brief
  • Identifying and mobilizing resources
  • Deciding who will conduct the evaluation and engaging the evaluator(s)
  • Deciding and managing the process for developing the evaluation methodology
  • Managing development of the evaluation work plan
  • Managing implementation of the work plan including development of reports
  • Disseminating the report(s) and supporting use

Determining causal attribution is a requirement for calling an evaluation an impact evaluation. The design options (whether experimental, quasi-experimental, or non-experimental) all need significant investment in preparation and early data collection, and cannot be done if an impact evaluation is limited to a short exercise conducted towards the end of intervention implementation. Hence, it is particularly important that impact evaluation is addressed as part of an integrated monitoring, evaluation and research plan and system that generates and makes available a range of evidence to inform decisions. This will also ensure that data from other M&E activities such as performance monitoring and process evaluation can be used, as needed.


6 What methods can be used to do impact evaluation?

Framing the boundaries of the impact evaluation

The evaluation purpose refers to the rationale for conducting an impact evaluation. Evaluations undertaken to support learning should be clear about who is intended to learn from them, how they will be engaged in the evaluation process to ensure it is seen as relevant and credible, and whether there are specific decision points at which this learning is expected to be applied. Evaluations undertaken to support accountability should be clear about who is being held accountable, to whom and for what.

Evaluation relies on a combination of facts and values (i.e., principles, attributes or qualities held to be intrinsically good, desirable, important and of general worth such as ‘being fair to all’) to judge the merit of an intervention (Stufflebeam 2001). Evaluative criteria specify the values that will be used in an evaluation and, as such, help to set boundaries.

Many impact evaluations use the standard OECD-DAC criteria (OECD-DAC accessed 2015):

  • Relevance: The extent to which the objectives of an intervention are consistent with recipients’ requirements, country needs, global priorities and partners’ policies.
  • Effectiveness: The extent to which the intervention’s objectives were achieved, or are expected to be achieved, taking into account their relative importance. 
  • Efficiency: A measure of how economically resources/inputs (funds, expertise, time, equipment, etc.) are converted into results.
  • Impact: Positive and negative, primary and secondary long-term effects produced by the intervention, whether directly or indirectly, intended or unintended.
  • Sustainability: The continuation of benefits from the intervention after major development assistance has ceased. Interventions must be both environmentally and financially sustainable. Where the emphasis is not on external assistance, sustainability can be defined as the ability of key stakeholders to sustain intervention benefits – after the cessation of donor funding – with efforts that use locally available resources.

The OECD-DAC criteria reflect the core principles for evaluating development assistance (OECD-DAC 1991) and have been adopted by most development agencies as standards of good practice in evaluation. Other commonly used evaluative criteria relate to equity, gender equality and human rights. Some criteria are used for particular types of development intervention, such as humanitarian assistance: coverage, coordination, protection and coherence. Not all of these evaluative criteria are used in every evaluation; the choice depends on the type of intervention and/or the type of evaluation (e.g., the criterion of impact is irrelevant to a process evaluation).

Evaluative criteria should be thought of as ‘concepts’ that must be addressed in the evaluation. On their own, however, they are too loosely defined to be applied systematically and in a transparent manner when making evaluative judgements about the intervention. Under each of the ‘generic’ criteria, more specific criteria such as benchmarks and/or standards* – appropriate to the type and context of the intervention – should be defined and agreed with key stakeholders.

The evaluative criteria should be clearly reflected in the evaluation questions the evaluation is intended to address.

*A benchmark or index is a set of related indicators that provides for meaningful, accurate and systematic comparisons regarding performance; a standard or rubric is a set of related benchmarks/indices or indicators that provides socially meaningful information regarding performance.

Defining the key evaluation questions (KEQs) the impact evaluation should address

Impact evaluations should be focused around answering a small number of high-level key evaluation questions (KEQs) that will be answered through a combination of evidence. These questions should be clearly linked to the evaluative criteria. For example:

KEQ1. What was the quality of the intervention design/content? [assessing relevance, equity, gender equality, human rights]
KEQ2. How well was the intervention implemented and adapted as needed? [assessing effectiveness, efficiency]
KEQ3. Did the intervention produce the intended results in the short, medium and long term? If so, for whom, to what extent and in what circumstances? [assessing effectiveness, impact, equity, gender equality]
KEQ4. What unintended results – positive and negative – did the intervention produce? How did these occur? [assessing effectiveness, impact, equity, gender equality, human rights]
KEQ5. What were the barriers and enablers that made the difference between successful and disappointing intervention implementation and results? [assessing relevance, equity, gender equality, human rights]
KEQ6. How valuable were the results to service providers, clients, the community and/or organizations involved? [assessing relevance, equity, gender equality, human rights]
KEQ7. To what extent did the intervention represent the best possible use of available resources to achieve results of the greatest possible value to participants and the community? [assessing efficiency]
KEQ8. Are any positive results likely to be sustained? In what circumstances? [assessing sustainability, equity, gender equality, human rights]

A range of more detailed (mid-level and lower-level) evaluation questions should then be articulated to address each evaluative criterion in detail. All evaluation questions should be linked explicitly to the evaluative criteria to ensure that the criteria are covered in full.

The KEQs also need to reflect the intended uses of the impact evaluation. For example, if an evaluation is intended to inform the scaling up of a pilot programme, then it is not enough to ask ‘Did it work?’ or ‘What were the impacts?’. A good understanding of how these impacts were achieved – in terms of activities and supportive contextual factors – is needed to replicate the achievements of a successful pilot. Equity concerns require that impact evaluations go beyond simple average impact to identify for whom and in what ways the programmes have been successful.

Within the KEQs, it is also useful to identify the different types of questions involved – descriptive, causal and evaluative.

  • Descriptive questions ask about how things are and what has happened, including describing the initial situation and how it has changed, the activities of the intervention and other related programmes or policies, the context in terms of participant characteristics, and the implementation environment.
  • Causal questions ask whether or not, and to what extent, observed changes are due to the intervention being evaluated rather than to other factors, including other programmes and/or policies.
  • Evaluative questions ask about the overall conclusion as to whether a programme or policy can be considered a success, an improvement or the best option.

Impact evaluations must have credible answers to all of these questions.


Defining impacts

Impacts are usually understood to occur later than, and as a result of, intermediate outcomes. For example, achieving the intermediate outcomes of improved access to land and increased levels of participation in community decision-making might occur before, and contribute to, the intended final impact of improved health and well-being for women. The distinction between outcomes and impacts can be relative, and depends on the stated objectives of an intervention. It should also be noted that some impacts may be emergent, and thus, cannot be predicted.


Defining success to make evaluative judgements

Evaluation, by definition, answers evaluative questions, that is, questions about quality and value. This is what makes evaluation so much more useful and relevant than the mere measurement of indicators or summaries of observations and stories.

In any impact evaluation, it is important to define first what is meant by ‘success’ (quality, value). One way of doing so is to use a specific rubric that defines different levels of performance (or standards) for each evaluative criterion, deciding what evidence will be gathered and how it will be synthesized to reach defensible conclusions about the worth of the intervention.
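
To illustrate, the following is a minimal sketch of how a rubric for a single criterion might be represented and applied in Python. The criterion, performance levels, thresholds and evidence values shown are hypothetical examples invented for illustration, not prescribed standards.

```python
# Illustrative only: a hypothetical rubric for an 'effectiveness' criterion.
# Each performance level has a short descriptor; the synthesis step maps the
# evidence gathered against those descriptors to a single judgement.

RUBRIC = {
    "excellent": "All intended outcomes achieved for all priority groups; no significant negative effects.",
    "good": "Most intended outcomes achieved; minor gaps for some priority groups.",
    "adequate": "Some intended outcomes achieved; notable gaps remain.",
    "poor": "Few or no intended outcomes achieved, or significant negative effects.",
}

def judge_effectiveness(share_achieved_overall: float, share_achieved_priority_group: float) -> str:
    """Return the rubric level supported by the evidence (illustrative thresholds only)."""
    if share_achieved_overall >= 0.9 and share_achieved_priority_group >= 0.9:
        return "excellent"
    if share_achieved_overall >= 0.7 and share_achieved_priority_group >= 0.6:
        return "good"
    if share_achieved_overall >= 0.4:
        return "adequate"
    return "poor"

# Hypothetical evidence: 80% of intended outcomes achieved overall,
# 65% achieved for the priority (most disadvantaged) group.
level = judge_effectiveness(0.8, 0.65)
print(level, "-", RUBRIC[level])
```

The point of such a structure is simply that the levels and thresholds are agreed with stakeholders before the evidence is synthesized, so the resulting judgement is transparent and defensible.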

At the very least, it should be clear what trade-offs would be appropriate in balancing multiple impacts or distributional effects. Since development interventions often have multiple impacts, which are distributed unevenly, this is an essential element of an impact evaluation. For example, should an economic development programme be considered a success if it produces increases in household income but also produces hazardous environmental impacts? Should it be considered a success if the average household income increases but the income of the poorest households is reduced?

To answer evaluative questions, what is meant by ‘quality’ and ‘value’ must first be defined and then relevant evidence gathered. Quality refers to how good something is; value refers to how good it is in terms of the specific situation, in particular taking into account the resources used to produce it and the needs it was supposed to address. Evaluative reasoning is required to synthesize these elements to formulate defensible (i.e., well-reasoned and well evidenced) answers to the evaluative questions.

Evaluative reasoning is a requirement of all evaluations, irrespective of the methods or evaluation approach used.

An evaluation should have a limited set of high-level questions which are about performance overall. Each of these KEQs should be further unpacked by asking more detailed questions about performance on specific dimensions of merit and sometimes even lower-level questions. Evaluative reasoning is the process of synthesizing the answers to lower- and mid-level questions into defensible judgements that directly answer the high-level questions.


Using a theory of change

Evaluations produce stronger and more useful findings if they investigate not only the links between activities and impacts but also the links along the causal chain between activities, outputs, intermediate outcomes and impacts. A ‘theory of change’, which explains how activities are understood to produce a series of results that contribute to achieving the ultimate intended impacts, is helpful in guiding causal attribution in an impact evaluation.

A theory of change should be used in some form in every impact evaluation. It can be used with any research design that aims to infer causality, can draw on a range of qualitative and quantitative data, and can support triangulation of the data arising from a mixed methods impact evaluation.

When planning an impact evaluation and developing the terms of reference, any existing theory of change for the programme or policy should be reviewed for appropriateness, comprehensiveness and accuracy, and revised as necessary. It should continue to be revised over the course of the evaluation should either the intervention itself or the understanding of how it works – or is intended to work – change.

Some interventions cannot be fully planned in advance, however – for example, programmes in settings where implementation has to respond to emerging barriers and opportunities, such as supporting the development of legislation in a volatile political environment. In such cases, different strategies will be needed to develop and use a theory of change for impact evaluation (Funnell and Rogers 2012). For some interventions, it may be possible to document the emerging theory of change as different strategies are trialled and adapted or replaced. In other cases, there may be a high-level theory of how change will come about (e.g., through the provision of incentives) together with an emerging theory about what has to be done in a particular setting to bring this about. In still other cases, the intervention may be fundamentally based on adaptive learning, in which case the theory of change should focus on articulating how the various actors gather and use information together to make ongoing improvements and adaptations.

A theory of change can support an impact evaluation in several ways. It can identify:

  • specific evaluation questions, especially in relation to those elements of the theory of change for which there is no substantive evidence yet
  • relevant variables that should be included in data collection 
  • intermediate outcomes that can be used as markers of success in situations where the impacts of interest will not occur during the time frame of the evaluation
  • aspects of implementation that should be examined
  • potentially relevant contextual factors that should be addressed in data collection and in analysis, to look for patterns.

The evaluation may confirm the theory of change or it may suggest refinements based on the analysis of evidence. An impact evaluation can check for success along the causal chain and, if necessary, examine alternative causal paths. For example, failure to achieve intermediate results might indicate implementation failure; failure to achieve the final intended impacts might be due to theory failure rather than implementation failure. This has important implications for the recommendations that come out of an evaluation. In cases of implementation failure, it is reasonable to recommend actions to improve the quality of implementation; in cases of theory failure, it is necessary to rethink the whole strategy for achieving impacts. 
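
The reasoning in the paragraph above can be expressed as a small sketch. The distinction it draws, and the booleans used here, are hypothetical simplifications: the aim is only to show how checking intermediate results helps separate implementation failure from theory failure.

```python
# Illustrative sketch only: distinguishing implementation failure from theory
# failure by checking results along a (hypothetical) causal chain.

def diagnose(intermediate_results_achieved: bool, final_impacts_achieved: bool) -> str:
    """Rough diagnosis based on where the causal chain appears to break down."""
    if final_impacts_achieved:
        return "intended impacts achieved"
    if not intermediate_results_achieved:
        # The chain broke early: activities and outputs did not translate into
        # intermediate outcomes, which points towards implementation problems.
        return "possible implementation failure: intermediate results not achieved"
    # Intermediate outcomes occurred but the impacts did not follow, suggesting
    # the underlying theory of change needs rethinking.
    return "possible theory failure: intermediate results achieved but impacts not"

print(diagnose(intermediate_results_achieved=True, final_impacts_achieved=False))
```

In practice this judgement would rest on the full body of evidence rather than two flags, but the same logic drives the recommendations: improve implementation in the first case, rethink the strategy in the second.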


Deciding the evaluation methodology

The evaluation methodology sets out how the key evaluation questions (KEQs) will be answered. It specifies designs for causal attribution, including whether and how comparison groups will be constructed, and methods for data collection and analysis.

➟ Strategies and designs for determining causal attribution

Causal attribution is defined by OECD-DAC as:

“Ascription of a causal link between observed (or expected to be observed) changes and a specific intervention.” (OECD-DAC 2010)

This definition does not require that changes are produced solely or wholly by the programme or policy under investigation (UNEG 2013). In other words, it takes into consideration that other causes may also have been involved, for example, other programmes/policies in the area of interest or certain contextual factors (often referred to as ‘external factors’).

There are three broad strategies for causal attribution in impact evaluations:

  • estimating the counterfactual (i.e., what would have happened in the absence of the intervention, compared to the observed situation)
  • checking the consistency of evidence for the causal relationships made explicit in the theory of change
  • ruling out alternative explanations, through a logical, evidence-based process.

Using a combination of these strategies can usually help to increase the strength of the conclusions that are drawn.

There are three design options that address causal attribution:

  • Experimental designs – which construct a control group through random assignment.
  • Quasi-experimental designs – which construct a comparison group through matching, regression discontinuity, propensity scores or another means.
  • Non-experimental designs – which look systematically at whether the evidence is consistent with what would be expected if the intervention was producing the impacts, and also whether other factors could provide an alternative explanation.

Some individuals and organisations use a narrower definition of impact evaluation, including only evaluations that contain a counterfactual of some kind. These different definitions are important when deciding which methods or research designs will be considered credible by the intended users of the evaluation or by partners or funders.
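
As a purely illustrative sketch of the counterfactual strategy under an experimental design, the snippet below estimates an average impact as the difference in mean outcomes between a randomly assigned treatment group and a control group. The data are simulated and the outcome values, group sizes and effect size are hypothetical.

```python
import random
import statistics

random.seed(1)

# Simulated outcome data (hypothetical): in a real experimental design these
# would be outcomes measured after the intervention for the randomly assigned
# control and treatment groups.
control = [random.gauss(mu=50, sigma=10) for _ in range(200)]
treatment = [random.gauss(mu=55, sigma=10) for _ in range(200)]

# The control group's mean outcome serves as the estimate of the counterfactual:
# what would have happened, on average, in the absence of the intervention.
estimated_impact = statistics.mean(treatment) - statistics.mean(control)

# A rough standard error for the difference in means (independent samples).
se = (statistics.variance(treatment) / len(treatment)
      + statistics.variance(control) / len(control)) ** 0.5

print(f"estimated average impact: {estimated_impact:.1f} (approx. 95% CI +/- {1.96 * se:.1f})")
```

Quasi-experimental and non-experimental designs answer the same causal question by other means (constructed comparison groups, consistency checks against the theory of change, ruling out alternative explanations) rather than by random assignment.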


➟ Data collection, management and analysis approach

Well-chosen and well-implemented methods for data collection and analysis are essential for all types of evaluations. Impact evaluations need to go beyond assessing the size of the effects (i.e., the average impact) to identify for whom and in what ways a programme or policy has been successful. What constitutes ‘success’ and how the data will be analysed and synthesized to answer the specific key evaluation questions (KEQs) must be considered up front as data collection should be geared towards the mix of evidence needed to make appropriate judgements about the programme or policy. In other words, the analytical framework – the methodology for analysing the ‘meaning’ of the data by looking for patterns in a systematic and transparent manner – should be specified during the evaluation planning stage. The framework includes how data analysis will address assumptions made in the programme theory of change about how the programme was thought to produce the intended results. In a true mixed methods evaluation, this includes using appropriate numerical and textual analysis methods and triangulating multiple data sources and perspectives in order to maximize the credibility of the evaluation findings.

Start the data collection planning by reviewing to what extent existing data can be used. After reviewing currently available information, it is helpful to create an evaluation matrix (see below) showing which data collection and analysis methods will be used to answer each KEQ and then identify and prioritize data gaps that need to be addressed by collecting new data. This will help to confirm that the planned data collection (and collation of existing data) will cover all of the KEQs, determine if there is sufficient triangulation between different data sources and help with the design of data collection tools (such as questionnaires, interview questions, data extraction tools for document review and observation tools) to ensure that they gather the necessary information. 

Evaluation matrix: Matching data collection to key evaluation questions

Data collection methods (columns of the matrix): programme participant survey; key informant interviews; project records; observation of programme implementation.

Examples of key evaluation questions (rows of the matrix):

  • KEQ 1 What was the quality of implementation?
  • KEQ 2 To what extent were the programme objectives met?
  • KEQ 3 What other impacts did the programme have?
  • KEQ 4 How could the programme be improved?
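
An evaluation matrix such as the one above can also be kept as a simple data structure so that coverage of the KEQs and triangulation across data sources can be checked as planning proceeds. The sketch below is illustrative only; the mapping of questions to methods is hypothetical.

```python
# Illustrative sketch: an evaluation matrix as a mapping from each KEQ to the
# data collection methods expected to help answer it (hypothetical entries).
evaluation_matrix = {
    "KEQ 1 What was the quality of implementation?":
        ["programme participant survey", "key informant interviews", "observation"],
    "KEQ 2 To what extent were the programme objectives met?":
        ["programme participant survey", "project records"],
    "KEQ 3 What other impacts did the programme have?":
        ["key informant interviews"],
    "KEQ 4 How could the programme be improved?":
        ["programme participant survey", "key informant interviews"],
}

# Flag KEQs that rely on a single data source (no triangulation yet) or on no
# source at all, so that the data collection plan can be revised.
for keq, sources in evaluation_matrix.items():
    if len(sources) < 2:
        print(f"Needs more sources for triangulation: {keq} (currently: {sources})")
```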

There are many different methods for collecting data. Although many impact evaluations use a variety of methods, what distinguishes a ‘mixed methods evaluation’ is the systematic integration of quantitative and qualitative methodologies and methods at all stages of an evaluation (Bamberger 2012). A key reason for mixing methods is that it helps to overcome the weaknesses inherent in each method when used alone. It also increases the credibility of evaluation findings when information from different data sources converges (i.e., they are consistent about the direction of the findings) and can deepen the understanding of the programme/policy, its effects and context (Bamberger 2012).

Good data management includes developing effective processes for: consistently collecting and recording data, storing data securely, cleaning data, transferring data (e.g., between different types of software used for analysis), effectively presenting data and making data accessible for verification and use by others.

The particular analytic framework and the choice of specific data analysis methods will depend on the purpose of the impact evaluation and the type of KEQs that are intrinsically linked to this.

For answering descriptive KEQs, a range of analysis options is available, which can largely be grouped into two key categories: options for quantitative data (numbers) and options for qualitative data (e.g., text).
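
As a simple illustration of one quantitative option for a descriptive question, the sketch below summarizes a hypothetical outcome indicator by participant subgroup using only the Python standard library; the records, groups and scores are invented for the example.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical survey records: each row holds a participant subgroup and an
# outcome score collected at follow-up.
records = [
    {"group": "female", "score": 62}, {"group": "female", "score": 70},
    {"group": "male", "score": 55},   {"group": "male", "score": 58},
    {"group": "female", "score": 66}, {"group": "male", "score": 61},
]

# Descriptive summary: number of respondents and mean score per subgroup,
# supporting the 'for whom' questions raised above.
by_group = defaultdict(list)
for row in records:
    by_group[row["group"]].append(row["score"])

for group, scores in sorted(by_group.items()):
    print(f"{group}: n={len(scores)}, mean score={mean(scores):.1f}")
```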

For answering causal KEQs, there are essentially three broad approaches to causal attribution analysis: (1) counterfactual approaches; (2) consistency of evidence with causal relationship; and (3) ruling out alternatives (see above). Ideally, a combination of these approaches is used to establish causality.

For answering evaluative KEQs, specific evaluative rubrics linked to the evaluative criteria employed (such as the OECD-DAC criteria) should be applied in order to synthesize the evidence and make judgements about the worth of the intervention (see above).


7 How can the findings be reported and their use supported?

The evaluation report should be structured in a manner that reflects the purpose and KEQs of the evaluation.

In the first instance, evidence to answer the detailed questions linked to the OECD-DAC criteria of relevance, effectiveness, efficiency, impact and sustainability, and considerations of equity, gender equality and human rights should be presented succinctly but with sufficient detail to substantiate the conclusions and recommendations.

The specific evaluative rubrics should be used to ‘interpret’ the evidence and determine which considerations are critically important or urgent. Evidence on multiple dimensions should subsequently be synthesized to generate answers to the high-level evaluative questions.

The structure of an evaluation report can do a great deal to encourage the succinct reporting of direct answers to evaluative questions, backed up by enough detail about the evaluative reasoning and methodology to allow the reader to follow the logic and clearly see the evidence base.

The following recommendations will help to set clear expectations for evaluation reports that are strong on evaluative reasoning:

  1. The executive summary must contain direct and explicitly evaluative answers to the KEQs used to guide the whole evaluation.

  2. Explicitly evaluative language must be used when presenting findings (rather than value-neutral language that merely describes findings). Examples should be provided.

  3. Clear and simple data visualization should be used to present easy-to-understand ‘snapshots’ of how the intervention has performed on the various dimensions of merit.

  4. The findings section should be structured using the KEQs as subheadings (rather than the types and sources of evidence, as is frequently done).

  5. There must be clarity and transparency about the evaluative reasoning used, with the explanations clearly understandable to both non-evaluators and readers without deep content expertise in the subject matter. These explanations should be broad and brief in the main body of the report, with more detail available in annexes.

  6. If evaluative rubrics are relatively small in size, these should be included in the main body of the report. If they are large, a brief summary of at least one or two should be included in the main body of the report, with all rubrics included in full in an annex.


Resources

Overviews / introductions to impact evaluation

UNICEF Impact Evaluation Methodological Briefs and Videos:


Overview briefs (1,6,10) are available in English, French and Spanish and supported by whiteboard animation videos in three languages; Brief 7 (RCTs) also includes a video.

UNICEF-BetterEvaluation Impact Evaluation Webinar Series

Throughout 2015, BetterEvaluation partnered with the UNICEF Office of Research – Innocenti to develop eight impact evaluation webinars for UNICEF staff. The objective was to provide an interactive capacity-building experience, customized to focus on UNICEF’s work and the unique circumstances of conducting impact evaluations of programs and policies in international development. The webinars were based on the Impact Evaluation Series – a user-friendly package of 13 methodological briefs and four animated videos – and presented by the briefs' authors. Each page provides links not only to the eight webinars, but also to the practical questions and their answers which followed each webinar presentation. 

Discussion papers

Guides

InterAction Impact Evaluation Guidance Notes and Webinar Series:

Methods Lab Publications

The Methods Lab was an action-learning collaboration between the Overseas Development Institute (ODI), BetterEvaluation (BE) and the Australian Department of Foreign Affairs and Trade (DFAT) conducted during 2012-2015. The Methods Lab sought to develop, test and institutionalise flexible approaches to impact evaluation. It focused on interventions that are harder to evaluate because of their diversity and complexity, or where traditional impact evaluation approaches may not be feasible or appropriate, with the broader aim of identifying lessons with wider application potential. The Methods Lab produced several guidance documents including:

Realist impact evaluation: an introduction - This guide explains when a realist impact evaluation may be most appropriate or feasible for evaluating a particular programme or policy, and outlines how to design and conduct an impact evaluation based on a realist approach.

Addressing gender in impact evaluation - This paper is a resource for practitioners and evaluators who want to include a genuine focus on gender impact when commissioning or conducting evaluations.

Evaluability assessment for impact evaluation - This guide provides an overview of the utility of, and specific guidance and a tool for implementing, an evaluability assessment before an impact evaluation is undertaken.


The content for this page was compiled by: Greet Peersman 

The content is based on ‘UNICEF Methodological Briefs for Impact Evaluation’, a collaborative project between the UNICEF Office of Research – Innocenti, BetterEvaluation, RMIT University and the International Initiative for Impact Evaluation (3ie). The briefs were written by (in alphabetical order): E. Jane Davidson, Thomas de Hoop, Delwyn Goodrick, Irene Guijt, Bronwen McDonald, Greet Peersman, Patricia Rogers, Shagun Sabarwal, Howard White.

References

Bamberger M (2012). Introduction to Mixed Methods in Impact Evaluation. Guidance Note No. 3. Washington DC: InterAction. See: https://www.interaction.org/blog/impact-evaluation-guidance-note-and-webinar-series/

Funnell S and Rogers P (2012). Purposeful Program Theory: Effective Use of Logic Models and Theories of Change. San Francisco: Jossey-Bass/Wiley.

OECD-DAC (1991). Principles for Evaluation of Development Assistance. Paris: Organisation for Economic Co-operation and Development – Development Assistance Committee (OECD-DAC). See: http://www.oecd.org/dac/evaluation/50584880.pdf

OECD-DAC (2010). Glossary of Key Terms in Evaluation and Results Based Management. Paris: Organisation for Economic Co-operation and Development – Development Assistance Committee (OECD-DAC). See: http://www.oecd.org/development/peer-reviews/2754804.pdf

OECD-DAC (accessed 2015). Evaluation of development programmes. DAC Criteria for Evaluating Development Assistance. Organisation for Economic Co-operation and Development – Development Assistance Committee (OECD-DAC). See: http://www.oecd.org/dac/evaluation/daccriteriaforevaluatingdevelopmentassistance.htm

Stufflebeam D (2001). Evaluation values and criteria checklist. Kalamazoo: Western Michigan University Checklist Project. See: https://www.dmeforpeace.org/resource/evaluation-values-and-criteria-checklist/

UNEG (2013). Impact Evaluation in UN Agency Evaluation Systems: Guidance on Selection, Planning and Management. Guidance Document. New York: United Nations Evaluation Group (UNEG). See: http://www.uneval.org/papersandpubs/documentdetail.jsp?doc_id=1434


Approaches

Approaches (on this site) refer to an integrated package of options (methods or processes). For example, 'Randomized Controlled Trials' (RCTs) use a combination of the options random sampling, control group and standardised indicators and measures.

Approaches described on this site include:

  • A strengths-based approach designed to support ongoing learning and adaptation by identifying and investigating outlier examples of good practice and ways of increasing their frequency.
  • An approach that focuses on assessing the value of an intervention as perceived by the (intended) beneficiaries, thereby aiming to give voice to their priorities and concerns.
  • A research design that focuses on understanding a unit (person, site or project) in its context, which can use a combination of qualitative and quantitative data.
  • An approach designed to support ongoing learning and adaptation, which identifies the processes required to achieve desired results, and then observes whether those processes take place, and how.
  • An impact evaluation approach based on contribution analysis, with the addition of processes for expert review and community review of evidence and conclusions.
  • An impact evaluation approach that iteratively maps available evidence against a theory of change, then identifies and addresses challenges to causal inference.
  • An approach used to surface, elaborate and critically consider the options and implications of boundary judgments, that is, the ways in which people/groups decide what is relevant to what is being evaluated.
  • Various ways of doing evaluation that support democratic decision making, accountability and/or capacity.
  • An approach designed to support ongoing learning and adaptation through iterative, embedded evaluation.
  • A stakeholder involvement approach designed to provide groups with the tools and knowledge they need to monitor and evaluate their own performance and accomplish their goals.
  • A particular type of case study used to jointly develop an agreed narrative of how an innovation was developed, including key contributors and processes, to inform future innovation efforts.
  • A way to jointly develop an agreed narrative of how an innovation was developed, including key contributors and processes, to inform future innovation efforts.
  • A particular type of case study used to create a narrative of how institutional arrangements have evolved over time and have created and contributed to more effective ways to achieve project or program goals.
  • An approach primarily intended to clarify differences in values among stakeholders by collecting and collectively analysing personal accounts of change.
  • An impact evaluation approach suitable for retrospectively identifying emergent impacts by collecting evidence of what has changed and then, working backwards, determining whether and how an intervention has contributed to these changes.
  • An impact evaluation approach which unpacks an initiative’s theory of change, provides a framework to collect data on immediate, basic changes that lead to longer, more transformative change, and allows for the plausible assessment of the initiative’s contribution to results via ‘boundary partners’.
  • A range of approaches that engage stakeholders (especially intended beneficiaries) in conducting the evaluation and/or making decisions about the evaluation.
  • A participatory approach which enables farmers to analyse their own situation and develop a common perspective on natural resource management and agriculture at village level.
  • A strengths-based approach to learning and improvement that involves intended evaluation users in identifying ‘outliers’ – those with exceptionally good outcomes – and understanding how they have achieved these.
  • An impact evaluation approach without a control group that uses narrative causal statements elicited directly from intended project beneficiaries.
  • An impact evaluation approach that compares results between a randomly assigned control group and experimental group or groups to produce an estimate of the mean net impact of an intervention.
  • A rapid evaluation approach that uses multiple evaluation methods and techniques to quickly and systematically collect data when time or resources are limited. Many terms are used to describe such approaches, including real-time evaluations, rapid feedback evaluation, rapid evaluation methods, rapid-cycle evaluation and rapid appraisal; their common feature is an expedited implementation timeframe, generally ranging from 10 days to 6 months.
  • An approach, used especially for impact evaluation, which examines what works for whom in what circumstances and through what causal mechanisms, including changes in the reasoning and resources of participants.
  • A participatory approach to value-for-money evaluation that identifies a broad range of social outcomes, not only the direct outcomes for the intended beneficiaries of an intervention.
  • An approach to decision-making in evaluation that involves identifying the primary intended users and uses of an evaluation and then making all decisions about the evaluation design and plan with reference to these.