Development of Evaluation Techniques for Multi-Agent Systems

About the Activity

A broad class of complex, unsolved problems, spanning fields such as medicine, climate science, engineering, and the natural sciences, has no viable solutions available in the public domain. These problems are non-trivial and often fall outside the knowledge distribution on which standard large language models (LLMs) are pretrained. As a result, current LLMs are ill-suited to addressing them directly, owing to both their epistemic limitations and their lack of domain-specific reasoning.

To address this, we propose evaluating a customized multi-agent system capable of collaboratively reasoning about and attempting solutions to such novel problems. The primary objective is to explore whether a system of coordinated artificial intelligence (AI) agents, each with distinct roles, tools, and knowledge access, can engage in goal-directed problem-solving in uncharted domains. While we anticipate that such a system may generate interesting hypotheses or partial insights, it is also likely to produce hallucinations, factual inaccuracies, and logical errors, especially when operating outside familiar data distributions.

Goals of the Activity

The goal is to build transparent, auditable, and trusted multi-agent systems that not only tackle novel scientific and technical challenges but also produce verifiable, interpretable outputs. This project paves the way for robust AI architectures and software that integrate autonomous decision-making with rigorous evaluation and alignment mechanisms, which are essential for safe deployment in high-stakes, data-scarce environments.

Getting Involved

Who Should Get Involved

  • IT Companies
  • Technology Universities
  • Research Labs
  • Business Schools

How to Get Involved

To learn more about the program and how to join this activity, please express your interest by completing the Development of Evaluation Techniques for Multi-Agent Systems interest form.
