Rubrics let you define structured evaluation criteria for your agents — turning your domain knowledge into consistent, explainable scoring or validation systems. Instead of relying on a single metric, rubrics help you quantify or verify agent quality across the dimensions you care about.
Score rubrics provide weighted numeric evaluations across several criteria. Each criterion has a defined scoring range (e.g., 1–10), with descriptions of what qualifies as high, medium, or low performance.
Each criterion can also carry an optional weight, which is used to compute a weighted average, as in the sketch below.
The final rubric score is a single float value.
Score rubrics are ideal for comparative analysis, model iteration, and benchmarking performance.
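As a concrete illustration, here is a minimal sketch of how a weighted score rubric could be represented and scored. The `Criterion` and `ScoreRubric` names are hypothetical stand-ins for this example, not part of any particular SDK:

```python
from dataclasses import dataclass, field

# Hypothetical types for illustration only; not part of any specific SDK.
@dataclass
class Criterion:
    name: str
    description: str        # what high / medium / low performance looks like
    min_score: int = 1      # lower bound of the scoring range
    max_score: int = 10     # upper bound of the scoring range
    weight: float = 1.0     # optional weight; defaults to equal weighting

@dataclass
class ScoreRubric:
    name: str
    criteria: list[Criterion] = field(default_factory=list)

    def score(self, ratings: dict[str, float]) -> float:
        """Reduce per-criterion ratings to one float via a weighted average."""
        total_weight = sum(c.weight for c in self.criteria)
        weighted_sum = sum(ratings[c.name] * c.weight for c in self.criteria)
        return weighted_sum / total_weight
```

With weights 2.0 and 1.0 on two criteria rated 9 and 7, the final score would be (9·2 + 7·1) / 3 ≈ 8.33 — a single float, as described above.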
Rubric results are immutable once run on a session; any changes made to the rubric afterward won't retroactively affect past evaluations.
You can use multiple rubrics per session to get both high-level and detailed insights.
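For example, continuing the hypothetical sketch above, you might pair a broad quality rubric with a narrow safety rubric on the same session:

```python
# Two hypothetical rubrics applied to one session's per-criterion ratings.
quality = ScoreRubric(
    name="Overall quality",
    criteria=[
        Criterion("accuracy", "Factually correct and grounded", weight=2.0),
        Criterion("clarity", "Easy to follow and well structured"),
    ],
)
safety = ScoreRubric(
    name="Safety",
    criteria=[Criterion("policy_compliance", "No disallowed content")],
)

ratings = {"accuracy": 9, "clarity": 7, "policy_compliance": 10}
scores = {r.name: r.score(ratings) for r in (quality, safety)}
# {'Overall quality': 8.33..., 'Safety': 10.0}
```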
Whether you need nuanced score-based comparisons or simple go/no-go quality checks, rubrics give you a structured, explainable framework for evaluating AI agents.