Guidelines for Evaluating Programs

The following Indicators of Effectiveness are also included in the position statement as guidelines for evaluating programs for young children (adapted from the NAEYC and NAECS/SDE Position Statement, 2003):

Evaluation is used for continuous improvement. Programs undertake regular evaluation, including self-evaluation, to document the extent to which they are achieving desired results, with the goal of engaging in continuous improvement. Evaluations focus on processes and implementation as well as outcomes. Over time, evidence is gathered that program evaluations do influence specific improvements.
Goals become guides for evaluation. Evaluation designs and measures are guided by goals identified by the program, by families and other stakeholders, and by the developers of a program or curriculum, while also allowing the evaluation to reveal unintended consequences.
Comprehensive goals are used. The program goals used to guide the evaluation are comprehensive, including goals related to families, community, teachers, and other staff, as well as child-oriented goals that address a broad set of developmental and learning outcomes.
Evaluations use valid designs. Programs are evaluated using scientifically valid designs, guided by a “logic model” that describes ways in which the program sees its interventions having both medium- and long-term effects on children and, in some cases, families and communities.
Multiple sources of data are available. An effective evaluation system should draw on multiple measures, including information about staff qualifications, administrative practices, and classroom quality assessments; data on programs, child demographics, and implementation; and other information that provides a context for interpreting the results of child assessments.
Sampling is used when assessing individual children as part of large-scale program evaluation. When individually administered norm-referenced tests of children’s progress are used as part of program evaluation and accountability, matrix sampling is used (that is, the tests are administered only to a systematic sample of children) to diminish the burden of testing on children and to reduce the likelihood that data will be inappropriately used to make judgments about individual children.
Safeguards are in place if standardized tests are used as part of evaluations. When individually administered norm-referenced tests are used as part of program evaluation, they must be developmentally and culturally appropriate for the particular children in the program, conducted in the language with which the children are most comfortable and with other accommodations as appropriate, valid in terms of the curriculum, and technically sound (that is, reliable and valid). Quality checks on data are conducted regularly, and the system includes multiple data sources collected over time.
Children’s gains over time are emphasized. When child assessments are used as part of program evaluation, the primary focus is on children’s gains or progress as documented in observations, samples of classroom work, and other assessments over the duration of the program. The focus is not just on children’s scores upon exit from the program.
Well-trained individuals conduct evaluations. Program evaluations, at whatever level or scope, are conducted by well-trained individuals who are able to evaluate programs in fair and unbiased ways. Self-assessment processes used as part of comprehensive program evaluation follow a valid model. Assessor training goes beyond single workshops and includes ongoing quality checks. Data are analyzed systematically and can be quantified or aggregated to provide evidence of the extent to which the program is meeting its goals.
Evaluation results are publicly shared. Families, policy makers, and other stakeholders have the right to know the results of program evaluations. Data from program monitoring and evaluation, aggregated appropriately and based on reliable measures, should be made available and accessible to the public.