📦 Sessions

A Session represents a single run of an AI agent as it attempts to complete a task. It captures the full trace of the agent's behavior, from initial state to final outcome, and provides deep visibility into every decision, action, and result along the way.


🧠 What is a Session?

A Session is a top-level unit of observability. It begins when an agent starts a task and ends when the task is completed, failed, or interrupted. Learn how to create a Session here.
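
For example, a minimal session lifecycle with the Python SDK might look like the sketch below. Only end_session appears later in this guide, so lai.init and its session_name parameter are assumptions; check the SDK reference for exact signatures.

import lucidicai as lai

# Start a session when the agent begins its task.
# (lai.init and session_name are assumptions; see the SDK reference.)
lai.init(session_name="checkout-agent-run")

# ... the agent works on its task, producing Steps and Events ...

# End the session once the task is completed, failed, or interrupted.
lai.end_session(is_successful=True)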

Within each Session view in Lucidic, which we call the workflow sandbox, you'll find:

  • A timeline of Steps (agent states and transitions)
  • Fine-grained Events (e.g. LLM calls, API responses) inside each Step
  • A workflow graph visualizing the session’s trajectory
  • A GIF/video replay of the agent’s interaction (if available)
  • Evaluation scores, cost metrics, metadata, and more

🧬 Session Structure

Step and Event Hierarchy

A Session is composed of ordered Steps, each representing a distinct state and the action the agent took from it. Each Step contains multiple Events, which capture granular, atomic operations such as LLM calls or API requests.

Session
└── Step
    ├── Event
    ├── Event
    └── ...
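
As a sketch of how this hierarchy maps onto SDK calls: the create_step, end_step, and create_event names and parameters below are assumptions, so verify them against the SDK reference before use.

import lucidicai as lai

lai.init(session_name="example-run")  # assumed initializer

# One Step: a single agent state and the action taken from it.
lai.create_step(state="On search results page", action="Open first result")  # assumed API

# Events inside the Step: granular, atomic operations.
lai.create_event(description="LLM call: decide which result to open")  # assumed API
lai.create_event(description="GET https://example.com/result/1")       # assumed API

lai.end_step()  # assumed API
lai.end_session(is_successful=True)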

📺 Session Dashboard or Workflow Sandbox

Each Session in the workflow sandbox includes:

  • Overview metrics: duration, step count, cost, completion status
  • Visual replay: full session video or GIF (if captured)
  • Interactive workflow graph: clickable nodes for each step
  • Step-by-step explorer: deep dive into state/action/event logs
  • Evaluation scores: human or LLM-generated

📊 Evaluation and Scoring

Sessions can be evaluated via multiple systems:

1. User-Provided Eval Score

Passed in at runtime using the Python SDK. Useful for customized or domain-specific scoring.

import lucidicai as lai

lai.end_session(
    is_successful=True,
    session_eval="4.5",  # your custom score (free-form, not tied to a rubric)
    session_eval_reason="Agent completed primary objectives but took longer than expected"
)

2. Rubric-Based Evaluation

Define structured, weighted rubrics with multiple criteria and score definitions.

criteria:
  - name: "Repeated Site Visits"
    weight: 0.4
    scores:
      10: "Never visited a site twice"
      1: "Frequently revisited pages"

  - name: "Average Step Time"
    weight: 0.6
    ...

These rubrics produce both per-criterion and composite scores.
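
To illustrate the weighting, if the composite is a weighted average of per-criterion scores (an assumption; the exact formula may differ), the rubric above combines like this:

# Hypothetical per-criterion scores on the 1-10 scale defined above
scores  = {"Repeated Site Visits": 9, "Average Step Time": 7}
weights = {"Repeated Site Visits": 0.4, "Average Step Time": 0.6}

# Weighted composite: 0.4 * 9 + 0.6 * 7 = 7.8
composite = sum(weights[name] * scores[name] for name in scores)
print(composite)  # 7.8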

3. Default Evaluation

Our built-in system outputs:

  • A score (0 or 1) based on task completion
  • A list of “wrong” or suboptimal steps (automatically identified)
  • Graph highlights of failure points
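
For intuition, that output can be pictured as a record like the following. This shape is purely illustrative, not the actual response schema:

# Illustrative only -- not the real schema.
default_eval = {
    "score": 1,                      # 0 or 1, based on task completion
    "suboptimal_steps": [3, 7],      # automatically identified "wrong" steps
    "graph_highlights": ["step-7"],  # failure points highlighted in the graph
}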

🕵️‍♂️ Debugging with Sessions

Sessions give you rich insight into how your agent operates and where it goes wrong:

  • Replay stuck behavior via video
  • Identify high-cost operations
  • Trace decision-making back to specific prompts or events
  • Group failing steps into unified nodes in the graph
  • Track revisited states and looping patterns

📁 Metadata and Tagging

Each Session contains rich metadata including:

  • Task description
  • Agent identity and version
  • Cost
  • Timestamps
  • Link to parent Mass Simulation (if applicable)
  • Replay data and graph reference
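
Pictured as a record, that metadata might look like the following. The keys and values here are illustrative placeholders, not the actual schema:

# Illustrative only -- not the real schema.
session_metadata = {
    "task": "Purchase the cheapest USB-C cable",
    "agent": {"name": "shopping-agent", "version": "1.4.2"},
    "cost_usd": 0.37,
    "started_at": "2025-01-15T10:03:22Z",
    "ended_at": "2025-01-15T10:05:48Z",
    "mass_simulation_id": None,  # set when the run is part of a Mass Simulation
    "replay": None,              # replay data, if captured
    "graph": None,               # reference to the workflow graph
}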

Planned: semantic tagging, search, and session filtering based on LLM-labeled themes.