Rubrics

Rubrics let you define structured evaluation criteria for your agents — turning your domain knowledge into consistent, explainable scoring or validation systems. Instead of relying on a single metric, rubrics help you quantify or verify agent quality across dimensions you care about.

Why Create Custom Rubrics?

Agents don’t just succeed or fail: they perform along a spectrum, or they meet some expectations and miss others. Rubrics help you:
  • Capture nuanced behaviors
  • Enforce domain-specific quality standards
  • Compare model changes or prompt variations
  • Identify regressions or improvement trends

Use Cases

  • Score Rubric: Rate task completion, efficiency, safety, etc.
  • Pass/Fail Rubric: Validate key requirements, such as compliance, user safety, or response correctness

Rubric Modes

1. Score Rubrics

Score rubrics provide weighted numeric evaluations across several criteria. Each criterion has a defined scoring range (e.g., 1–10), with descriptions for what qualifies as high, medium, or low performance.
  • Each criterion has an optional weight, which is used to compute a weighted average.
  • The final rubric score is a single float value.
  • Ideal for comparative analysis, model iteration, or benchmarking performance.

Example

Rubric: "Browser Navigation Quality"

Criteria:
  - name: "Step Efficiency"
    weight: 2
    scores:
      10: "Task completed in <10 steps"
      5: "Task completed in 11–15 steps"
      1: ">15 steps"

  - name: "Revisiting Sites"
    weight: 1
    scores:
      10: "Never visited a site more than once"
      5: "Occasionally revisited"
      1: "Frequently revisited the same sites"

2. Pass/Fail Rubrics

Pass/Fail rubrics are designed for binary validation of agent behavior.
  • Each criterion has a Pass Definition and Fail Definition
  • If all criteria pass, the rubric evaluates to True
  • If any criterion fails, the rubric evaluates to False
  • The system automatically identifies which step(s) caused each failure

Example

Rubric: "Customer Service QA Check"

Criteria:
  - name: "Polite Communication"
    pass_definition: "Agent maintains professional tone throughout"
    fail_definition: "Agent uses inappropriate or dismissive language"

  - name: "Issue Resolution"
    pass_definition: "Customer issue is fully resolved"
    fail_definition: "Customer problem remains unresolved or unclear"

How to Use Rubrics

Using a rubric involves two steps: 1) creating it in the UI and 2) attaching it in your code.

Step 1: In the UI

  • Navigate to the Rubrics tab on any Agent (located directly under the agent name)
  • Click the Create New Evaluation Rubric button
  • Give your Rubric a name, description, and icon
  • Choose a Rubric Mode:
    • Score Rubric: Numerical scoring system (e.g., 1–10 scale)
    • Pass/Fail Rubric: Binary outcomes (Pass/Fail) per criterion

For Score Rubrics:

  • Add multiple Criteria, each with:
    • A name
    • Optional weight (for weighted average)
    • Score Definitions (what a score of 1, 5, or 10 means)

For Pass/Fail Rubrics:

  • Add multiple Criteria, each with:
    • A name
    • A Pass Definition (what success looks like)
    • A Fail Definition (what failure looks like)
  • Click Save

Pro Tip: Click the Evaluation Rubric Summary to see all criteria at a glance.

Step 2: In Python

Attach rubrics to sessions at runtime:
lai.init(
    ...,  # your other init arguments
    rubrics=["Default", "My Custom Rubric"]
)
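
The names passed to rubrics should match the rubric names you created in Step 1; each listed rubric is then evaluated against the session.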

Final Notes

  • Rubric results are immutable once run on a session; editing a rubric afterward won’t retroactively change past evaluations.
  • You can use multiple rubrics per session to get both high-level and detailed insights.
  • Whether you need nuanced score-based comparisons or simple go/no-go quality checks, rubrics give you a structured, explainable framework to evaluate AI agents.