AEA Conference 2024

AEA Conference 2024 TIG Sessions

TIG Annual Meeting

  • Day/Time: Tuesday, October 8, 2024, 2:00 PM – 3:00 PM ET

Agenda

  • Sneak previews of TIG presentations
  • TIG Year in Review
  • Learn about volunteer opportunities

TIG Lunch Social

  • Day/Time: Thursday 1:35 PM – 2:20 PM PT
  • Purpose: Networking and socializing with current and prospective TIG members (Voodoo Donuts!)
  • Room: C120-122

TIG Fair & Reception

  • Day/Time: Thursday 1:35 PM – 2:20 PM PT
  • Purpose: Meet TIG leadership, learn more about getting involved
  • Room: Exhibit Hall A1/B

AI-Enabled Evaluation Methods (PD Workshop)

  • Day/Time: Tuesday 9:00 AM – 4:00 PM PT
  • Presenters: Zach Tilton, Linda Raftree
  • Room: B115-116

Artificial intelligence has augmented, and will continue to augment, the landscape of knowledge work, including program evaluation. This workshop equips participants with entry-level knowledge and practical skills to conduct evaluations in the age of AI and to practice AI-enabled evaluation. As AI technology advances, evaluators must develop fundamental AI literacy and expand their evaluation toolbox to remain relevant and competitive in the evaluation marketplace. Further, AI can translate to efficiency and effectiveness gains in evaluation processes and products if integrated responsibly and thoughtfully. The workshop will provide participants with basic premises and principles for a baseline level of AI-enabled evaluation capacity. It is intended for evaluation practitioners, managers, commissioners, and other MERL practitioners who have been integrating, or would like to integrate, various AI tools, techniques, and tips into their evaluation practice.

Ghosts in the evaluation machine: ethics, data protection, meta-evaluation, and evaluation quality in the age of artificial intelligence

  • Day/Time: Wednesday 4:15 PM – 5:15 PM PT
  • Presenter(s): Alex Robinson, Michael Osei, Zach Tilton, Shaddrock Roberts, Michael A. Harnar (chair)
  • Room: E144

The integration of generative artificial intelligence (GenAI) technologies, exemplified by OpenAI's ChatGPT, into evaluation practice presents both groundbreaking opportunities and significant ethical challenges. This evolution in machine learning and AI is changing transdisciplinary evaluation, pushing practitioners, researchers, and organizations to reassess the frameworks guiding quality evaluations in the face of such disruptive technologies. The rapid advancement and application of these tools in Monitoring, Evaluation, Research, and Learning (MERL) Tech practices outpace existing guidelines on their responsible use, underscoring a critical need for updated meta-evaluative standards that address the ethical dimensions of AI-enabled evaluations. This session delves into the juxtaposition of AI safety and AI ethics research camps, emphasizing the latter's focus on the societal and ecological risks posed by current AI technologies, including their potential to perpetuate bias and inequality. It probes into how the data pools underpinning AI tools, reflecting a multitude of human voices and values, affect the integrity of evaluation processes and outcomes. By examining the influence of algorithmic bias and value representation on evaluative quality, this session aims to uncover the voices and values amplified or sidelined by AI in evaluation. Featuring insights from evaluators, data scientists, AI ethicists, and privacy professionals, this multi-paper session explores the theoretical, empirical, and practical aspects of GenAI-enabled evaluation practice. It strives to address pressing questions surrounding the quality of AI-enabled evaluations and offers practical recommendations for evaluators looking to harness technology for enhancing evaluation quality, all while maintaining a steadfast commitment to ethical principles and inclusivity in evaluation practice.

How engaging an AI “stakeholder at the table” can help expand and amplify collective knowledge

  • Day/Time: Wednesday 4:15 PM – 5:15 PM PT
  • Presenter(s): Jewlya Lynn
  • Room: Portland Ballroom 252

Ready or not, artificial intelligence (AI) is becoming part of the practice of evaluation and systems change. For many, AI is used primarily as an efficiency tool; yet it can be so much more. This session will explore how to engage AI as a thought partner at the table with other voices when evaluations seek to understand how, why, and under what conditions change is happening in complex, dynamic systems. We will build on the participatory, inclusive approaches highlighted by the Causal Pathways Initiative and explore how AI can be a tool for collectively strengthening contextual and causal knowledge. The demonstration will include examples from projects where ChatGPT and Petal AI were “at the table” with systems conveners, evaluators, and other stakeholders. Each case example focuses on advancing equity, ranging from addressing slavery and forced labor to advancing equitable and liberatory education. The presenter and audience will engage the AI platforms in a discussion in real time. Participants can join on their own laptops (and are encouraged to have their ChatGPT and Petal free accounts set up in advance) or through the presenter's projected screen. Participants will leave the session having explored how different AI platforms respond in real time, learned how to approach prompts during the conversation, explored how to identify and manage bias, and considered how AI can be a stakeholder alongside other voices in their evaluations.

Localizing Large Language Models for Inclusive International Development (Presidential Session)

  • Day/Time: Thursday 10:15 AM – 11:15 AM PT
  • Presenter: Lindsey Moore
  • Room: Portland Ballroom 251

Our proposal introduces a transformative approach in the field of evaluation by leveraging a localized Large Language Model (LLM) to analyze over 400,000 USAID project evaluations. This method surpasses the capabilities of traditional LLMs, which are often built upon a predominantly Western understanding of development concepts and success metrics. By customizing the LLM to incorporate local perspectives and definitions of success, our model offers a nuanced analysis of the same dataset, revealing how local contexts can shape the recommendations that LLMs draw from identical evaluation data.

The core innovation of our localized LLM is its refined ability to adjust model parameters to reflect local concepts and values, a critical advancement over traditional LLMs used in evaluation practices. These conventional models, while powerful, are not inherently designed to account for the vast diversity of cultural and contextual nuances that define success in various settings. Our approach, however, acknowledges and integrates these local perspectives directly into the evaluation analysis process. This adaptation allows the LLM to re-examine the extensive database of USAID evaluations through a lens that is more aligned with the local realities and expectations of the communities involved in or affected by development projects.
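
One common way to adjust model parameters toward locally defined concepts is parameter-efficient fine-tuning on locally authored text. The sketch below uses Hugging Face transformers and peft with LoRA adapters as an illustration only; the base model, example texts, and hyperparameters are assumptions and are not the presenters' actual localized-LLM setup.

```python
# Minimal sketch: adapting a base LLM toward locally defined concepts with LoRA
# (parameter-efficient fine-tuning). Base model, texts, and hyperparameters are
# illustrative assumptions, not the presenters' method.
from datasets import Dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "gpt2"  # small, open stand-in for whatever base model is actually used
tok = AutoTokenizer.from_pretrained(base)
tok.pad_token = tok.eos_token
model = AutoModelForCausalLM.from_pretrained(base)

# Locally authored texts describing what "success" means in context (made-up examples)
local_texts = [
    "In our community, a successful water project is one the village committee can repair itself.",
    "Empowerment means women decide how household income from the cooperative is spent.",
]

def tokenize(batch):
    return tok(batch["text"], truncation=True, max_length=512)

ds = Dataset.from_dict({"text": local_texts}).map(tokenize, batched=True, remove_columns=["text"])

# Attach small trainable adapters instead of updating all model weights
peft_model = get_peft_model(model, LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                                              task_type="CAUSAL_LM"))

trainer = Trainer(
    model=peft_model,
    args=TrainingArguments(output_dir="local-adapter", num_train_epochs=1,
                           per_device_train_batch_size=1),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tok, mlm=False),
)
trainer.train()
peft_model.save_pretrained("local-adapter")
```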

This advancement is crucial for the field of evaluation. It demonstrates that when LLMs are adapted to mirror local definitions of interventions and success, the interpretation of data can shift dramatically, offering insights that are more relevant and actionable for decision-makers. For instance, a women's empowerment intervention deemed successful using USAID's custom LLM was found to be more controversial when analyzed through the localized model. Such discrepancies highlight the importance of contextual sensitivity in the architecture of LLMs and the potential for thoughtful and ethical applications of AI to bridge this gap.

The value of this proposal to the evaluation community is twofold. Firstly, it showcases the potential of AI, specifically LLMs, to enrich the interpretation of evaluations at scale, providing a tool capable of handling complex datasets with a level of cultural and contextual understanding previously unattainable. Secondly, it challenges and expands the existing paradigms of evaluation theory and practice by illustrating the critical role of local perspectives in interpreting evaluation data. By adapting LLMs to reflect diverse local contexts, our proposal not only enhances the accuracy of evaluation findings but also ensures that these findings are meaningful and relevant to the communities they aim to serve.

Our work contributes to the field of evaluation by highlighting the necessity for and benefits of incorporating localized perspectives in the analysis of development interventions. This approach aligns with quality standards in evaluation theory and practice, emphasizing the importance of relevance, accuracy, and utility in evaluation outcomes. By presenting our findings, we aim to inspire further exploration into the use of localized LLMs in evaluation, advocating for a more inclusive, context-aware approach to assessing the impact of international development projects.

We have knowledge gaps on GenAI for Social and Behavior Change (SBC) programming: Let’s develop a research and evaluation agenda together! (Co-sponsored with RoE TIG)

  • Day/Time: Thursday 11:30 AM – 12:30 PM PT
  • Presenters: Linda Raftree, Nicola Harford, Anastasia Mirzoyants, Stephanie Coker
  • Room: E141-143

An emerging suite of Artificial Intelligence (AI) tools and approaches, including Generative AI (GenAI), Natural Language Processing (NLP), and Large Language Models (LLMs), has the potential to transform how organizations encourage social and behavior change (SBC) in critical areas such as sexual and reproductive health, vaccine uptake, and the sharing of mis- and disinformation. In addition to changing how we communicate, emerging AI is being used in efforts to improve monitoring and evaluation of SBC programming. For example, GenAI chatbots can help with collecting data to monitor how well digital services are doing; NLP can be used to improve online audience segmentation and to design and deliver messages for specific audiences; and different kinds of AI can be integrated into qualitative data analysis to enable greater speed and efficiency and to augment the amount of data that can be coded, analyzed, and summarized to aid with evaluation. There is much to learn across the sector, however, regarding these novel approaches. We are working to develop a shared research agenda for the sector, including research on how evaluators can use these tools in their work; how emerging AI tools compare to one another and to traditional approaches; how AI tools can amplify and augment the work done by professional evaluators (while recognizing the value trained evaluators bring to the table); and what ethical, legal, and quality control considerations need to be in place in order to confidently use these tools in ways that uphold human rights and other standards. Since 2021, iMedia and the MERL Tech Initiative have been convening and learning on the theme of evaluating digital SBC with Gates Foundation partners and the wider MERL Tech community. Our goal is to bring diverse perspectives into the research agenda building process, including geographical, thematic, sectoral, age, and skill/experience level. So far we have facilitated sessions with the Natural Language Processing Community of Practice (NLP-CoP)’s SBC Working Group, the ICT4D Conference in Accra, the African Evaluation Association Conference in Kigali, the Global Digital Development Forum, the SBCC Summit in Morocco, the Global Digital Health Forum, and the European Evaluation Society Conference. At this session we will share what we have learned so far and involve participants in reviewing the emerging research agenda, identifying additional critical knowledge gaps, and discussing ways to collaboratively address them. Participants will leave the session with an updated understanding of the emerging AI landscape for evaluation of SBCC and the critical questions that remain to be answered. Post-conference, we welcome participants to join the NLP-CoP's SBC Working Group to continue working on these emerging issues, share case studies, and collaborate to implement this co-created research agenda going forward. This session addresses aspects of Evaluation Foundations and Methodology; Program Development and Design; and International Evaluation, Diversity, and Specific Populations.

Harnessing Natural Language Processing to decode community perspective

  • Day/Time: Thursday 5:00 PM – 6:00 PM PT
  • Presenters: Nael Jean-Baptiste, Meghan Pollak
  • Room: Portland Ballroom 252

In the face of the significant challenges posed by conflict-affected areas, the Save the Children activity in northern Mali, named Albarka, is pioneering efforts to enhance food and nutrition security while increasing community resilience. Funded by USAID’s Bureau for Humanitarian Assistance, this five-year Resilience Food Security Activity (RFSA) aims to fortify local systems and encourage community involvement. At its heart, Albarka seeks to improve feeding practices among the most vulnerable groups, including infants, women, and adolescents, to soften the blow of food and nutrition security crises. Through focus group discussions, the program has sought community insights on promoting beneficial behaviors like dietary diversity and sanitation practices. However, the abundance of qualitative data collected presented a unique challenge: ensuring the integrity of data analysis amidst potential biases stemming from the analysts' subjective perspectives. In response, Albarka has innovatively combined traditional paper-based data collection methods with advanced Natural Language Processing (NLP) technology, a subset of artificial intelligence. This hybrid approach aims to refine the analysis process, reducing biases and enhancing the quality of information gleaned from the discussions. Leveraging these insights, the Albarka Nutrition Team has validated and rolled out a series of small doable actions (SDAs), crafting a detailed workplan. This plan is designed to guide the introduction of new social behavior change (SBC) initiatives through local structures, marking a significant stride towards sustainable community empowerment and resilience in northern Mali.

Harnessing Technology to Elevate Marginalized Voices: Training “Insta-Enumerators” in Inaccessible Environments (Ignite)

  • Day/Time: Thursday 5:35 PM – 5:40 PM PT
  • Presenters: Kate Hamilton, Alyssa Aclan
  • Room: E147-148

From 2020 to 2024, there has been an increase in internet shutdowns and, simultaneously, a decline in online rights globally. In 2021, a record number of countries witnessed internet users being subjected to arrests and physical assaults due to their social media content. Even with this grim picture of where digital freedoms may be headed, social media platforms have emerged as powerful tools for activists and marginalized populations, enabling them to mobilize communities, document injustices, and disseminate crucial and unfiltered information to a broader audience. Despite this shrinking of access and voice, democracy and civil society strengthening programs continue to thrive, notwithstanding the risks implementers face in their operating environments. Implementers have pivoted to meet the demands of their new environment, changing program delivery models, adapting educational materials, and targeting new and underserved populations. With this shift, evaluators seeking to understand the efficacy and outcomes of these programs must adapt and leverage creative methodologies that harness local expertise without introducing unnecessary risk.

While not new to evaluation, the use of social media as a data collection tool shows promise where large, tech-savvy youth populations live in restrictive environments. In particular, it can provide avenues for safer data collection in insecure environments and may provide a voice for marginalized communities. A growing body of research suggests that overt, government-backed media censorship and restrictions often backfire, leading to greater usage, as the affected population develops methods to circumvent the government-imposed controls. In this demonstration, our team will present an inclusive model for capacity building, information transfer, and data collection that leverages social media as the medium to access critical voices and data in restrictive environments while mitigating unnecessary risk. Democracy programs are quick to identify changemakers within societies who can act as a force multiplier for the intended effects of the intervention. Our model leverages those same voices, training them in evaluation methodologies and data collection techniques so that they are not only able to create change in their communities but are also able to dissect it, measure it, and distill best practices for forward momentum. New evaluators are trained through short, targeted capacity building modules on Instagram, and are then tasked with mini projects to practice data collection techniques such as launching a poll on Instagram or starting a conversation soliciting reactions via comments. Seasoned evaluation experts work hand-in-hand with the trainees to understand and interpret the reactions and comments, providing insights on what the data could be saying. These “insta-enumerators” then work in concert with seasoned evaluation experts to craft meaningful measures, collect data, and share results. During the demonstration, the team will present the methodological underpinnings of the approach, provide practical steps for designing similar approaches using social media appropriate to the context, and share initial findings and lessons learned drawn from the employment of this model.

Unveiling Media Narratives: A Deep Dive into Content Analysis Leveraging Technology and Machine Learning

  • Day/Time: Friday 10:15 AM – 11:15 AM PT
  • Presenters: Seth Tucker, Charles Gasper
  • Room: D135-136

Media content analysis looks across different media sources (e.g., newspapers, magazines, journals) to identify which topics are being amplified and which voices are being promoted. It can play a crucial role in understanding how different voices and topics are, or are not, represented and amplified in media narratives, which in turn influence public opinion and discourse on diverse topics such as public health, politics, and education. This session will consist of a step-by-step demonstration of how to perform a media content analysis, using two evaluation projects as examples. These projects include an evaluation of participants in a health leadership program and the extent to which they were featured in the media, how they were featured, and which topics were most frequently amplified; and an evaluation of journalists who participated in a fellowship program and what they wrote about, where the content was featured, and how that changed over time. This session will demonstrate the methods used in these analyses, such as Latent Dirichlet Allocation (LDA) topic modeling, web scraping, sentiment analysis, and qualitative diagramming, and will also include an explanation of tools that can be used, such as R and Nexis. After this session, participants will have a better understanding of how to utilize media content analysis in their projects, will understand the limitations and possibilities, and will have new ideas on how to measure the extent to which different voices and topics are being represented and amplified in the media. No prior knowledge of media content analysis or any specific tools or methods is necessary for attendees!
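
To give a flavor of the LDA topic modeling mentioned above, here is a minimal sketch. The session itself points to R and Nexis; this illustration uses Python and scikit-learn instead, and the sample articles and parameter choices are assumptions for demonstration only.

```python
# Minimal sketch of LDA topic modeling for media content analysis.
# Sample articles, topic count, and tooling (Python/scikit-learn rather than
# the R/Nexis stack named in the session) are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

articles = [
    "The health leadership fellow discussed vaccine access in rural clinics.",
    "A journalist covered school funding debates and teacher shortages.",
    "Opinion piece on public health messaging and community trust.",
]

# Convert articles to a word-count matrix, dropping very common English words
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(articles)

# Fit an LDA topic model; n_components is the assumed number of topics
lda = LatentDirichletAllocation(n_components=2, random_state=0)
doc_topics = lda.fit_transform(counts)

# Print the top words for each discovered topic
terms = vectorizer.get_feature_names_out()
for i, topic in enumerate(lda.components_):
    top = [terms[j] for j in topic.argsort()[-5:][::-1]]
    print(f"Topic {i}: {', '.join(top)}")
```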

Amplifying New Perspectives: Bridging Innovation and Inclusivity In an AI-Enhanced Evaluation

  • Day/Time: Friday 1:00 PM – 2:00 PM PT
  • Presenters: Jennifer P. Villalobos, Zach Tilton, Tarek Azzam, Hanna Camp, Linda Raftree (discussant)
  • Room: Exhibit Hall A

During an era when Artificial Intelligence (AI) is reshaping professional landscapes, this session provides a timely discussion aimed at invigorating participants by integrating fresh perspectives and voices, particularly from emerging evaluators, into a conversation about AI-influenced evaluation practices. Anchored in the theme "Amplifying and Empowering Voices in Evaluation," this interactive session seeks to contribute to an inclusive, adaptive future of evaluation practice in a digital era. Facilitators will guide discussions around leveraging AI to democratize evaluation while ensuring inclusivity and equity in AI-driven methodologies. Participants will explore how AI can amplify underrepresented voices and contribute to ethical, fair, and impactful evaluation practices. By examining AI’s role in enhancing accessibility to evaluation education, lowering barriers to comprehensive and impactful analysis and reporting, and addressing biases in data analysis, the session will also provide a platform for evaluators of all backgrounds to share innovative ideas and creative approaches. This gathering is not just a discussion but a call to action, encouraging participants to envision and articulate the future of evaluation in an AI-influenced world. It's an opportunity to connect, share experiences, and collaboratively forge pathways for a more inclusive and dynamic evaluation community.

Using AI to strengthen the evaluation of complex development programs

  • Day/Time: Friday 2:30 PM – 3:30 PM PT
  • Presenters: Peter York, Geetika Pandya, Michael Bamberger (chair)
  • Room: Portland Ballroom 253

The goal of this session is to demonstrate the value added by AI tools and techniques in strengthening important areas of development evaluation. We have selected the evaluation of complex development programs for two reasons. First, while there is widespread recognition that many (if not most) development programs are “complex”, most current evaluation methodologies do not adequately address complexity. Second, one of the reasons evaluations have been slow to address complexity is that they often lack the tools and techniques for the collection and analysis of the many kinds of data required to model and evaluate complexity. AI, with its access to big data, provides many of these tools in increasingly accessible and affordable ways. Consequently, the evaluation of complexity provides a framework to demonstrate the areas of comparative advantage of AI and how they can be applied in practice. However, our message is that AI builds on, and complements, current evaluation methodologies but does not replace them. The panel will include two presentations and a combined discussant/presentation. The first presentation provides an introduction to complexity and presents a 5-step framework for the evaluation of complex development programs. This draws on the forthcoming publication Dealing with complexity in development evaluation (Bamberger and Zazueta). It also discusses some of the methodological challenges that have slowed the development and use of complexity-responsive evaluation methodologies and identifies some of the ways in which AI (and big data) can address these challenges. The second presentation will expand on how the comparative advantages of AI can be used to address the challenges facing the evaluation of complex programs. Two case studies will illustrate the use of precision causal modeling in the evaluation of complex development programs. The first case illustrates a bottom-up approach (Program to Aid Citizen Enterprise – PACE), while the second illustrates a top-down approach (Gemma Services Residential Behavioral Services for Children). Both cases emphasize the importance of a mixed-methods approach. The discussant will comment on the two presentations and then describe a case from the ongoing work of the International Initiative for Impact Evaluation (3ie) that can complement the approaches discussed in the two presentations. The case will describe the use of machine learning to provide a broader framework for understanding the complex interactions (heterogeneity) among treatment effects, and will illustrate how the approach was applied to a previously completed randomized controlled trial (RCT) of a school-based gender attitude change program in Haryana, India.
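
The discussant's point about using machine learning to explore treatment-effect heterogeneity can be illustrated with a simple "T-learner": fit separate outcome models for treated and control units and compare their predictions. The sketch below uses synthetic data and scikit-learn; it is a generic illustration, not the 3ie analysis or the Haryana RCT data.

```python
# Illustrative sketch of exploring treatment-effect heterogeneity with a
# T-learner. Synthetic data and model choices are assumptions for demonstration.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

rng = np.random.default_rng(0)
n = 2000
X = rng.normal(size=(n, 3))            # covariates (e.g., age, baseline attitude score)
treat = rng.integers(0, 2, size=n)     # randomized treatment assignment
# Outcome whose treatment effect varies with the first covariate
y = X[:, 0] + treat * (0.5 + 0.8 * X[:, 0]) + rng.normal(scale=0.5, size=n)

# Fit separate outcome models for treated and control groups
m_treated = GradientBoostingRegressor().fit(X[treat == 1], y[treat == 1])
m_control = GradientBoostingRegressor().fit(X[treat == 0], y[treat == 0])

# Estimated individual-level effects: difference in predicted outcomes
cate = m_treated.predict(X) - m_control.predict(X)
print("Average estimated effect:", cate.mean())
print("Effect for low vs. high values of covariate 1:",
      cate[X[:, 0] < 0].mean(), cate[X[:, 0] > 0].mean())
```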

Summarizing Evaluation Reports with ChatGPT: A New Tool in Program Analysis

  • Day/Time: Friday 3:45 PM – 4:45 PM PT
  • Presenters: Diego Benitez, Erkin Yalcin
  • Room: F151-152

The advent of AI and machine learning technologies offers unprecedented opportunities for analyzing and disseminating the results of program evaluations. This session, titled "Summarizing Evaluation Reports with ChatGPT: A New Tool in Program Analysis," aims to demonstrate and explore the integration of ChatGPT into summarizing and disseminating evaluation results, enhancing our ability to synthesize data, generate insights, and communicate findings more effectively. The session will include a live demonstration of ChatGPT 4.0 to explore evaluations located within the USAID Development Experience Clearinghouse and to generate one-pagers that summarize methodology, key findings, and recommendations, among other elements. The second half of this session will engage participants directly and explore the quantitative and qualitative analytical power of ChatGPT 4.0 through a live polling exercise. Audience members will be asked to respond to a series of questions, and the data collected will be analyzed in real time to explore how ChatGPT can perform analysis, create summaries, or answer specific questions based on audience-provided content. This session will demonstrate how ChatGPT can facilitate real-time, interactive discussions on evaluation findings, allowing participants to explore data in a group setting and stimulating a collective discussion around data analysis, ethical considerations, and the future of evaluations in the digital age.
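
For readers who want to reproduce this kind of one-pager generation programmatically rather than through the ChatGPT interface the session uses, here is a minimal sketch with the OpenAI Python API. The model identifier, prompt wording, and the local report file are illustrative assumptions.

```python
# Minimal sketch of generating a one-page summary of an evaluation report via
# the OpenAI API. The model name, prompt, and input file are assumptions; the
# session itself demonstrates the ChatGPT interface.
from openai import OpenAI

client = OpenAI()  # expects OPENAI_API_KEY in the environment

report_text = open("evaluation_report.txt").read()  # placeholder source document

response = client.chat.completions.create(
    model="gpt-4o",  # assumed model identifier
    messages=[
        {"role": "system",
         "content": "You summarize program evaluation reports for policy audiences."},
        {"role": "user",
         "content": "Summarize the methodology, key findings, and recommendations "
                    "of this report as a one-page brief:\n\n" + report_text},
    ],
)
print(response.choices[0].message.content)
```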

Building Skills in AI Integration for Culturally Responsive and Equitable Evaluation: A Cross-Disciplinary Approach

  • Day/Time: Saturday 9:15 AM – 10:15 AM PT
  • Presenters: Ashley Love, Chunling Niu, Joan Labay-Marquez, Paula Caffer, Art Hernandez
  • Room: D135-136

This interactive session explores the integration of Artificial Intelligence (AI) into program evaluation as a means to promote culturally responsive and equitable evaluation (CREE) practices, aligning with AEA's Guiding Principles of respect for people, responsibilities for general and public welfare, and integrity/honesty. Drawing from multiple disciplines, including public health, education, organizational leadership, psychology, and law, the session provides a cross-disciplinary perspective on leveraging AI to enhance evaluation methodologies. Through engaging discussions and case study analyses, participants will examine the potential of AI techniques such as Natural Language Processing (NLP) and predictive analytics in qualitative and quantitative evaluations. Specifically, the session will demonstrate how NLP can facilitate the analysis of complex qualitative data, while predictive analytics can anticipate trends and inform data-driven decision-making, enabling more culturally sensitive program adjustments. The session will critically explore the ethical and legal frameworks necessary for employing AI in evaluation, addressing issues such as algorithmic bias, privacy concerns, transparency, and stakeholder engagement. Participants will engage in interactive exercises to develop strategies for mitigating risks, adhering to ethical standards, and ensuring compliance with relevant laws and regulations, thereby protecting and respecting all participants in the evaluation process. Through collaborative activities and knowledge-sharing, attendees will gain practical insights and strategies for incorporating AI into their evaluation practices, tailoring evaluation frameworks to reflect the cultural specificities of diverse communities. The session will foster an environment where participants can learn from each other's experiences across various disciplines, collectively contributing to the advancement of CREE principles. By the end of the session, attendees will have a comprehensive understanding of how AI can be thoughtfully and ethically integrated into program evaluation, enhancing their ability to promote equity, inclusivity, and respect across diverse evaluation contexts. The session contributes to AEA members' professional development and learning, equipping them with the knowledge and skills to harness technology in advancing CREE principles and addressing the unique needs of diverse communities.