Experiments
An Experiment is an organizational container that groups multiple related sessions together for bulk analysis and evaluation. Experiments enable you to identify patterns, track performance metrics, and detect failure modes across multiple runs of your AI agents.
What is an Experiment?
An Experiment provides a framework for systematically analyzing agent behavior at scale. While a single session shows you how your agent performed once, an experiment reveals how it performs consistently across many executions. Think of experiments as a way to:
- Group related test runs for comparative analysis
- Apply consistent evaluation across all sessions
- Identify failure patterns that only emerge at scale
- Track performance trends over time and configurations
Experiments are created and managed through the Lucidic Dashboard. Sessions can then be added to these experiments programmatically through the SDKs.
Key Concepts
Experiment Structure
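Conceptually, an experiment bundles its identifying metadata (name, description, tags), the rubrics applied to every session, and the sessions added to it. A minimal sketch of that shape, with field names that are illustrative rather than the exact API schema:

```python
from dataclasses import dataclass, field

@dataclass
class Experiment:
    """Illustrative shape only -- not the Lucidic API schema."""
    experiment_id: str                                     # noted from the dashboard for SDK use
    name: str
    description: str = ""
    tags: list[str] = field(default_factory=list)
    rubrics: list[str] = field(default_factory=list)       # evaluation rubrics applied to all sessions
    session_ids: list[str] = field(default_factory=list)   # sessions grouped for bulk analysis
```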
Creating Experiments
Experiments are created exclusively through the Lucidic Dashboard:
- Navigate to your agent’s Session History
- Select sessions you want to analyze together
- Click “Create Experiment”
- Configure name, description, tags, and rubrics
- Note the experiment ID for SDK integration
Adding Sessions
Once created, you can add sessions to experiments in two ways:
Via Dashboard:
- Select existing sessions
- Add to experiment through the UI
Via SDK:
- Attach new sessions programmatically using the experiment ID noted in the dashboard (see the sketch below)
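A minimal sketch of the SDK path, assuming the Python SDK accepts the experiment ID when a session is initialized. The module, function, and parameter names here (`lucidicai`, `lai.init`, `experiment_id`, `lai.end_session`) are assumptions; confirm the exact names in the SDK reference.

```python
# Hedged sketch -- SDK names and parameters are assumptions, not confirmed API.
import lucidicai as lai  # assumed package name

EXPERIMENT_ID = "exp_..."  # copied from the dashboard after creating the experiment

# Assumption: passing the experiment ID at session init attaches the new
# session to the experiment automatically.
lai.init(
    session_name="checkout-flow-run",
    experiment_id=EXPERIMENT_ID,
)

# ... run your agent as usual; its steps and events are recorded on the session ...

lai.end_session()
```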
Core Features
1. Bulk Session Analysis
Experiments aggregate data across all included sessions:
- Success rates - Overall and per-criteria pass rates
- Cost analysis - Average, total, and distribution metrics
- Duration statistics - Performance timing patterns
- Completion funnels - Where agents succeed or fail
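The dashboard computes these aggregates for you; the sketch below only illustrates the metrics involved, using hypothetical session records.

```python
from statistics import mean

# Hypothetical session records -- placeholder values for illustration only.
sessions = [
    {"passed": True,  "cost_usd": 0.042, "duration_s": 18.3},
    {"passed": False, "cost_usd": 0.037, "duration_s": 25.1},
    {"passed": True,  "cost_usd": 0.051, "duration_s": 16.8},
]

success_rate = sum(s["passed"] for s in sessions) / len(sessions)
total_cost = sum(s["cost_usd"] for s in sessions)
avg_cost = mean(s["cost_usd"] for s in sessions)
avg_duration = mean(s["duration_s"] for s in sessions)

print(f"success rate {success_rate:.0%}, total cost ${total_cost:.3f}, "
      f"avg cost ${avg_cost:.3f}, avg duration {avg_duration:.1f}s")
```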
2. Failure Pattern Detection
The platform automatically identifies and groups similar failures:
- AI-powered clustering - Groups similar error patterns
- Named categories - Each group gets descriptive labels
- Affected session tracking - See which runs had each issue
- Root cause hints - Explanations of what went wrong
Failure analysis requires evaluation credits (1 credit per 10 sessions).
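The clustering itself runs on the platform, but conceptually it resembles grouping failed sessions under a shared error label, as in this rough sketch (real clustering is AI-powered and fuzzier than exact string matching):

```python
from collections import defaultdict

# Hypothetical failed-session records -- placeholder data for illustration.
failures = [
    {"session_id": "s1", "error": "Timeout while loading checkout page"},
    {"session_id": "s2", "error": "Timeout while loading checkout page"},
    {"session_id": "s3", "error": "Could not parse JSON tool response"},
]

clusters = defaultdict(list)
for f in failures:
    clusters[f["error"]].append(f["session_id"])  # the platform matches similar, not identical, errors

for label, session_ids in clusters.items():
    print(f"{label}: {len(session_ids)} affected sessions -> {session_ids}")
```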
3. Evaluation Rubrics
Apply consistent evaluation criteria across all sessions:
- Score rubrics - Weighted numerical evaluations (0-10)
- Pass/Fail rubrics - Binary success criteria
- Multi-criteria - Combine multiple evaluation dimensions
- Automatic application - Rubrics run on all experiment sessions
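Rubrics are configured in the dashboard rather than in code; the sketch below only illustrates how weighted score criteria and binary pass/fail criteria combine.

```python
# Illustrative rubric shapes -- criteria names, weights, and scores are examples.
score_rubric = {
    "task_completed": {"weight": 0.6, "score": 8},  # scores use a 0-10 scale
    "answer_quality": {"weight": 0.4, "score": 6},
}
weighted_score = sum(c["weight"] * c["score"] for c in score_rubric.values())  # 7.2

pass_fail_rubric = {
    "no_hallucinated_tools": True,
    "finished_under_60s": False,
}
passed = all(pass_fail_rubric.values())  # False

print(f"weighted score {weighted_score:.1f}/10, pass/fail: {'pass' if passed else 'fail'}")
```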
4. Comparative Analysis
Experiments excel at comparison scenarios:
- A/B testing - Compare different prompts or configurations
- Regression detection - Ensure changes don’t degrade performance
- Version comparison - Track improvements across iterations
- Baseline establishment - Set performance benchmarks
Common Use Cases
A/B Testing
Test different configurations to find optimal settings.
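For example, the hedged sketch below runs the same tasks under two prompt variants and attaches every run to one experiment so the variants can be compared in the aggregate view. SDK names (`lucidicai`, `lai.init`, `experiment_id`) and the `run_agent` helper are assumptions.

```python
# Hedged sketch -- SDK names are assumptions; run_agent stands in for your agent.
import lucidicai as lai

EXPERIMENT_ID = "exp_..."  # A/B test experiment created in the dashboard
PROMPTS = {
    "variant_a": "You are a concise assistant...",
    "variant_b": "You think step by step before answering...",
}

def run_agent(prompt: str, task: str) -> None:
    """Placeholder for your agent logic."""

for variant, prompt in PROMPTS.items():
    for task in ["book a flight", "cancel an order"]:
        lai.init(session_name=f"ab-{variant}-{task}", experiment_id=EXPERIMENT_ID)
        run_agent(prompt, task)
        lai.end_session()

# Compare per-variant success rate, cost, and duration in the experiment view.
```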
Regression Testing
Ensure new changes maintain performance.
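As a rough illustration, a regression check might compare the new experiment's aggregate results against a stored baseline. Metric names, values, and thresholds below are placeholders.

```python
# Illustrative regression gate -- numbers are placeholders, not real results.
baseline  = {"success_rate": 0.92, "avg_cost_usd": 0.045}  # previous experiment
candidate = {"success_rate": 0.89, "avg_cost_usd": 0.044}  # new experiment

MAX_SUCCESS_DROP  = 0.02  # tolerate at most a 2-point drop in success rate
MAX_COST_INCREASE = 0.10  # tolerate at most a 10% increase in average cost

regressions = []
if baseline["success_rate"] - candidate["success_rate"] > MAX_SUCCESS_DROP:
    regressions.append("success rate dropped")
if candidate["avg_cost_usd"] > baseline["avg_cost_usd"] * (1 + MAX_COST_INCREASE):
    regressions.append("average cost increased")

print("regressions:", ", ".join(regressions) if regressions else "none")
```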
Load Testing
Analyze behavior under stress.
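Conceptually, load testing means firing many runs at once and then reviewing the duration distributions and failure clusters the experiment aggregates. A minimal, SDK-free sketch of driving concurrent runs; `run_agent` is a placeholder, and each real run would also be recorded as a session in the experiment.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles

def run_agent(task_id: int) -> float:
    """Placeholder for your agent; returns the run duration in seconds."""
    start = time.monotonic()
    time.sleep(0.01)  # stand-in for real agent work
    return time.monotonic() - start

with ThreadPoolExecutor(max_workers=10) as pool:
    durations = list(pool.map(run_agent, range(100)))

cuts = quantiles(durations, n=100)  # percentile cut points
print(f"p50={cuts[49]:.3f}s  p95={cuts[94]:.3f}s over {len(durations)} runs")
```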
Performance Benchmarking
Establish and track performance baselines.
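A rough sketch of recording experiment-level metrics per release so baselines can be tracked over time; the file name, metric fields, and values are placeholders, not a Lucidic export format.

```python
import json
from pathlib import Path

HISTORY = Path("benchmark_history.json")  # illustrative local store

def record_benchmark(version: str, success_rate: float, avg_cost_usd: float) -> None:
    """Append one experiment's aggregate metrics to a local history file."""
    history = json.loads(HISTORY.read_text()) if HISTORY.exists() else []
    history.append({"version": version,
                    "success_rate": success_rate,
                    "avg_cost_usd": avg_cost_usd})
    HISTORY.write_text(json.dumps(history, indent=2))

# Values would come from the experiment's aggregate analysis in the dashboard.
record_benchmark("v1.4.0", success_rate=0.91, avg_cost_usd=0.043)
```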
Best Practices
Experiment Design
- Clear scope - Group sessions with similar purposes
- Meaningful names - Use descriptive, searchable names
- Consistent tags - Develop a tagging taxonomy
- Appropriate size - 50-100 sessions for reliable patterns
Workflow Recommendations
- Create in dashboard first - Set up experiment before running code
- Note the ID - Copy experiment ID for SDK use
- Apply rubrics early - Configure evaluation criteria upfront
- Run analysis regularly - Re-run analysis after a significant number of new sessions are added
Organization Tips
- One purpose per experiment - Don’t mix different test types
- Version tracking - Include version info in names/tags
- Time boundaries - Consider daily/weekly experiment groups
- Documentation - Use descriptions to explain purpose
Limitations
- Dashboard creation only - Cannot create experiments via SDK
- One experiment per session - Each session belongs to a single experiment
- Credit costs - Failure analysis consumes evaluation credits
- Processing time - Large experiments take time to analyze
Related Concepts
- Sessions - Individual agent execution runs
- Mass Simulations - Running agents at scale
- Evaluations & Rubrics - Scoring mechanisms
- Steps - Execution units within sessions