A framework for standardizing evaluations of large foundation models, beyond single-score reporting and rankings.
Apache-2.0 License