
GenAI for Better Evaluation: EvalAssist


Overview:

How should you assess an AI tool’s usefulness, its ethical development, and, most importantly, its impacts on users? In a white paper for the Data Foundation’s Center for Evidence Capacity, Senior Fellow Lauren Damme, Ph.D. shares the development of EvalAssist, an open-source tool that supports rigorous study design for early-career evaluators. Her journey began with the aim of testing and releasing a simple custom GPT, but it led to the development of a new, human-centered framework for evaluating AI assistance tools. Read on to understand the decisions and trade-offs to consider when building on frontier models.

About:

Data Foundation Senior Fellow Lauren Damme, Ph.D. led a project to develop a simple custom GPT to support early-career evaluators in designing program evaluations. The goal: to create an open-source generative artificial intelligence (GenAI) augmentation tool and thought partner that helps ensure evaluation designs are rigorous and ethical, building on best practices from the social sciences.

However, the nascent field of GenAI evaluation lacks established paradigms for uncovering the human and social impacts of GenAI use. With support from over 45 highly experienced evaluators representing 15 countries, the Data Foundation’s Data Coalition, and graduate students from George Washington University, the development of EvalAssist not only produced a support tool for evaluators but also advanced the way we approach assessments of GenAI deployments. Leaders and policymakers considering GenAI-driven deployments may be interested in the ethical issues and trade-offs encountered in EvalAssist’s development, while evaluators and researchers may be particularly interested in the indicators and rubric developed to understand the human and social impacts of tool use.

Key Resource: 

Read more about the model development and evaluation rubric in our white paper, Integrating Social Science Practices into AI Evaluation.

Access EvalAssist:

Use EvalAssist to help you design program evaluations.

Evaluation Expert Advisory Group*:

● Juliette Berg

● Jacqueline Berman

● Mamoun Besaiso

● Hannah Betesh

● Kerry Bruce

● Amanda Cash

● Aubrey Comperatore

● Lauren Decker-Woodrow

● Danuta Dobosz

● Melanie Dubuis

● Kate Dunham

● Clement Dupont

● Meghan Ecker-Lyster

● Gizelle Gopez

● Gusimerjeet Gursimerjeet

● Sumera Idris

● Susan Jenkins

● Natalie Joseph

● Vicky Kaisidou

● Sharon Lacuyer

● Yuna Liang

● Kris Lugo-Graulich

● Baptiste Millet

● Claudia Mir Cervantes

● Nomsey Okosa

● Carlos Javier Rodriguez Cuellar

● Radha Roy

● Sutirtha Sahariah

● Brandie Sasser

● Deena Schwartz

● Aylin Talgar Pietz

● Ignacio Torrano

● Elias Walsh

● Jessica Weitzel

● Brean Witmer

*Additional members requested not to be publicly acknowledged.

George Washington University Student Testers:

● Sarah Cheney

● Brady Patton

● Azia Richardson-Williams

● Devyn Millensifer

● Rimsha Alam

● Kayla Gwaltney

● Olayemi Fadahunsi

● Kevin Michael Days

● Hannah Mayer

