Active PAR

P3419

Standard for Large Language Model Evaluation

This standard establishes a comprehensive set of criteria for evaluating Large Language Models (LLMs), and it extends to multimodal models. It provides a multi-dimensional evaluation framework for the users, teams, and enterprises that evaluate or use LLM technology or models. The framework is anchored in four principles: versatility, intelligence, efficiency, and safety. It is organized along three dimensions, "capability-task-metrics," and covers evaluation at different stages, including foundation models, pre-training algorithms, and fine-tuning algorithms. The standard combines objective and subjective viewpoints to give a broad understanding of LLMs' multifaceted capabilities and potential, which assists researchers, developers, and downstream users. Furthermore, the open-source evaluation platform operations embedded in the framework help stimulate innovation in model and algorithm research and drive forward their industrial application.

Sponsor Committee: C/AISC - Artificial Intelligence Standards Committee
Status: Active PAR
PAR Approval: 2023-09-21

Working Group Details

Society: IEEE Computer Society
Working Group: AI-LME - AI Large Model Evaluation
IEEE Program Manager: Christy Bahn
Working Group Chair: Yonghua Lin

Other Activities From This Working Group

Current projects that have been authorized by the IEEE SA Standards Board to develop a standard.

No Active Projects

Standards approved by the IEEE SA Standards Board that are within the 10-year lifecycle.

No Active Standards

These standards have been replaced with a revised version of the standard, or by a compilation of the original active standard and all its existing amendments, corrigenda, and errata.

No Superseded Standards

These standards have been removed from active status through a ballot where the standard is made inactive as a consensus decision of a balloting group.

No Inactive-Withdrawn Standards

These standards are removed from active status through an administrative process for standards that have not undergone a revision process within 10 years.

No Inactive-Reserved Standards
