Sessions

A Session represents a single run of an AI agent as it attempts to complete a task. It captures the full trace of the agent's behavior, from initial state to final outcome, and provides deep visibility into each decision and action along the way.

What is a Session?

A Session is a top-level unit of observability. It begins when an agent starts a task and ends when the task is completed, failed, or interrupted. Learn how to create a session in our Quick Start guide, or see detailed examples. Within each Session UI in Lucidic, which we call the workflow sandbox, you'll find:
  • A timeline of Steps (agent states and transitions)
  • Fine-grained Events (e.g. LLM calls, API responses) inside each Step
  • A workflow graph visualizing the session’s trajectory
  • A GIF/video replay of the agent’s interaction (if available)
  • Evaluation scores, cost metrics, metadata, and more
Sessions are non-deterministic by nature. Re-running the same task may lead to different outcomes due to the agent’s autonomy and stochasticity.
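
For orientation, a session's lifecycle in the Python SDK looks roughly like the sketch below. The session_name parameter passed to lai.init is an assumption for illustration; see the Quick Start guide for the authoritative setup.

import lucidicai as lai  # Lucidic Python SDK

# Start a session: the agent begins its task here.
lai.init(session_name="flight-booking-run")  # parameter name assumed

# ... run your agent: each state transition becomes a Step,
# and each LLM or API call inside it becomes an Event ...

# End the session when the task completes, fails, or is interrupted.
lai.end_session(is_successful=True)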

Session Structure

Step and Event Hierarchy

A Session is composed of ordered Steps, each representing a distinct state and action taken by the agent. Each Step contains one or more Events, granular atomic operations such as LLM calls or API requests.
Session
└── Step
    ├── Event
    ├── Event
    └── ...
Steps and Events are documented in detail on their respective pages.
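
To make the hierarchy concrete, here is a minimal sketch that records one Step with a single Event inside it. The create/end helper names and their parameters are assumptions for illustration; confirm the exact signatures in the Steps and Events documentation.

import lucidicai as lai  # Lucidic Python SDK

# A Step captures one agent state and the action taken there.
# (Helper names and parameters are assumed, not confirmed SDK API.)
lai.create_step(state="flight search page", action="submit search query")

# An Event is an atomic operation inside the current Step,
# such as an LLM call or an API request.
lai.create_event(description="LLM call: pick the next action")
lai.end_event(result="chose 'submit search'")

lai.end_step()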

Session Dashboard or Workflow Sandbox

Each Session in the workflow sandbox includes:
  • Overview metrics: duration, step count, cost, completion status
  • Visual replay: full session video or GIF (if captured)
  • Interactive workflow graph: clickable nodes for each step
  • Step-by-step explorer: deep dive into state/action/event logs
  • Evaluation scores: human or LLM-generated

Evaluation and Scoring

Sessions can be evaluated via multiple systems:

1. User-Provided Eval Score

Passed in at runtime using the Python SDK. Useful for customized or domain-specific scoring.
import lucidicai as lai  # Lucidic Python SDK

lai.end_session(
    is_successful=True,
    session_eval=4.5,  # your custom score (numeric value, not a string)
    session_eval_reason="Agent completed primary objectives but took longer than expected"
)

2. Rubric-Based Evaluation

Define structured, weighted rubrics with multiple criteria and score definitions.
criteria:
  - name: "Repeated Site Visits"
    weight: 0.4
    scores:
      10: "Never visited a site twice"
      1: "Frequently revisited pages"

  - name: "Average Step Time"
    weight: 0.6
    ...
These rubrics produce both per-criterion and composite scores.
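
Concretely, if the agent scores 10 on "Repeated Site Visits" and 7 on "Average Step Time", the composite is the weighted sum of the two. The arithmetic below is our illustration of that weighted average, assuming the weights sum to 1.0:

# Illustrative composite-score arithmetic for the rubric above.
criteria = [
    {"name": "Repeated Site Visits", "weight": 0.4, "score": 10},
    {"name": "Average Step Time",    "weight": 0.6, "score": 7},
]

# Composite = sum of weight * score (weights here sum to 1.0).
composite = sum(c["weight"] * c["score"] for c in criteria)
print(composite)  # 0.4 * 10 + 0.6 * 7 = 8.2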

3. Default Evaluation

Our built-in system outputs:
  • A score (0 or 1) based on task completion
  • A list of “wrong” or suboptimal steps (automatically identified)
  • Graph highlights of failure points
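
As a mental model of what the default evaluation reports, a result might be shaped like the following. This structure is purely illustrative, not the actual API response:

# Hypothetical shape of a default-evaluation result (illustrative only).
default_eval = {
    "score": 1,                       # 0 or 1, based on task completion
    "wrong_steps": [3, 7],            # automatically identified suboptimal steps
    "graph_highlights": ["step-3", "step-7"],  # failure points on the workflow graph
}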

Debugging with Sessions

Sessions give you rich insight into how your agent operates and where it goes wrong:
  • Replay stuck behavior via video
  • Identify high-cost operations
  • Trace decision-making back to specific prompts or events
  • Group failing steps into unified nodes in the graph
  • Track revisited states and looping patterns

Metadata and Tagging

Each Session contains rich metadata including:
  • Task description
  • Agent identity and version
  • Cost
  • Timestamps
  • Link to parent Mass Simulation (if applicable)
  • Replay data and graph reference
Planned: semantic tagging, search, and session filtering based on LLM-labeled themes.