Sessions
Understand the lifecycle, structure, and analysis of a Session – a single run of your AI agent.
📦 Sessions
A Session represents a single run of an AI agent as it attempts to complete a task. It captures the full trace of the agent's behavior, from initial state to final outcome, and provides deep visibility into every decision and action along the way.
🧠 What is a Session?
A Session is a top-level unit of observability. It begins when an agent starts a task and ends when the task is completed, failed, or interrupted. Learn how to create a Session in the Python SDK documentation.
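As a hedged sketch, that lifecycle maps onto the Python SDK roughly as follows. The function and parameter names here (`lai.init`, `lai.end_session`, `is_successful`) are illustrative assumptions, not confirmed SDK signatures; check the SDK reference for the exact API.

```python
# Illustrative sketch of a Session lifecycle; names are assumptions.
import lucidicai as lai

def run_agent():
    ...  # your agent's main loop goes here

# The Session begins when the agent starts its task...
lai.init(session_name="checkout-demo", task="Buy a pair of running shoes")

try:
    run_agent()
    lai.end_session(is_successful=True)   # ...and ends on completion
except Exception:
    lai.end_session(is_successful=False)  # ...or on failure/interruption
    raise
```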
Within each Session's UI in Lucidic, which we call the workflow sandbox, you'll find:
- A timeline of Steps (agent states and transitions)
- Fine-grained Events (e.g. LLM calls, API responses) inside each Step
- A workflow graph visualizing the session’s trajectory
- A GIF/video replay of the agent’s interaction (if available)
- Evaluation scores, cost metrics, metadata, and more
🧬 Session Structure
Step and Event Hierarchy
A Session is composed of ordered Steps, each representing a unique state and the action the agent took from it. Each Step contains one or more Events: granular, atomic operations such as LLM calls or API requests.
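To make the hierarchy concrete, here is a hedged sketch of how Steps and Events nest inside a Session. The `create_step`, `create_event`, and `end_step` calls and their parameters are illustrative and may not match the actual SDK exactly.

```python
# Illustrative only: function and parameter names are assumptions.
import lucidicai as lai

lai.init(session_name="web-qa-demo", task="Find the capital of France")

# One Step = one agent state plus the action taken from it.
lai.create_step(state="Search results page", action="Open the top result")

# Events are the atomic operations recorded inside that Step.
lai.create_event(description="LLM call: summarize page")
lai.create_event(description="HTTP GET: fetch article body")

lai.end_step()
lai.end_session(is_successful=True)
```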
📺 Session Dashboard (Workflow Sandbox)
Each Session in the workflow sandbox includes:
- Overview metrics: duration, step count, cost, completion status
- Visual replay: full session video or GIF (if captured)
- Interactive workflow graph: clickable nodes for each step
- Step-by-step explorer: deep dive into state/action/event logs
- Evaluation scores: human or LLM-generated
📊 Evaluation and Scoring
Sessions can be evaluated via multiple systems:
1. User-Provided Eval Score
Passed in at runtime using the Python SDK. Useful for customized or domain-specific scoring (see the sketch after this list).
2. Rubric-Based Evaluation
Define structured, weighted rubrics with multiple criteria and score definitions.
These rubrics produce both per-criterion and composite scores.
3. Default Evaluation
Our built-in system outputs:
- A score (0 or 1) based on task completion
- A list of “wrong” or suboptimal steps (automatically identified)
- Graph highlights of failure points
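For option 1 above, a hedged sketch of what passing a score at runtime might look like. The parameter names (`session_eval`, `session_eval_reason`) are assumptions and may differ from the actual SDK signature.

```python
# Assumed parameter names; verify against the Python SDK reference.
import lucidicai as lai

lai.end_session(
    is_successful=True,
    session_eval=0.87,  # hypothetical: domain-specific score in [0, 1]
    session_eval_reason="Completed checkout, but needed one retry",
)
```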
🕵️‍♂️ Debugging with Sessions
Sessions give you rich insight into how your agent operates and where it goes wrong:
- Replay stuck behavior via video
- Identify high-cost operations
- Trace decision-making back to specific prompts or events
- Group failing steps into unified nodes in the graph
- Track revisited states and looping patterns
📁 Metadata and Tagging
Each Session contains rich metadata including:
- Task description
- Agent identity and version
- Cost
- Timestamps
- Link to parent Mass Simulation (if applicable)
- Replay data and graph reference
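Much of this metadata is typically attached when the Session is created. A hedged sketch, with illustrative parameter names that may not match the SDK:

```python
# Illustrative parameter names; the real SDK signature may differ.
import lucidicai as lai

lai.init(
    session_name="nightly-regression-42",
    task="Book the cheapest SFO-to-JFK flight",  # task description
    agent_id="flight-booker-v3",                 # agent identity and version
)
```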
Planned: semantic tagging, search, and session filtering based on LLM-labeled themes.