Overview

Experiments group related sessions for bulk analysis, pattern detection, and performance evaluation. While experiments are created and managed through the Lucidic Dashboard, the TypeScript SDK allows you to programmatically add sessions to existing experiments.
Experiments must be created in the dashboard first. The SDK cannot create new experiments, only add sessions to existing ones.

Prerequisites

Before adding sessions to an experiment:
  1. Create an experiment in the dashboard
    • Navigate to Session History
    • Click “Create Experiment”
    • Configure name, tags, and rubrics
  2. Get the experiment ID
    • Found in the experiment URL: https://dashboard.lucidic.ai/experiments/exp-123abc
    • Copy this ID for use in your code
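    
For anything beyond a quick test, it is common to read the copied ID from configuration rather than hard-coding it. A minimal sketch, assuming a LUCIDIC_EXPERIMENT_ID environment variable (the variable name is illustrative, not an SDK convention):
// Read the experiment ID copied from the dashboard URL out of the environment
const EXPERIMENT_ID = process.env.LUCIDIC_EXPERIMENT_ID;
if (!EXPERIMENT_ID) {
  throw new Error("Set LUCIDIC_EXPERIMENT_ID to the ID copied from the dashboard URL");
}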

Basic Usage

Adding a Session to an Experiment

import * as lai from 'lucidicai';

// Experiment ID from the dashboard
const EXPERIMENT_ID = "exp-123abc-456def-789ghi";

// Initialize a session within the experiment
const sessionId = await lai.init({
  sessionName: "Checkout Flow Test",
  experimentId: EXPERIMENT_ID,  // Links session to experiment
  task: "Test the checkout process with items in cart"
});

// Run your agent workflow
await addItemsToCart(["item-1", "item-2"]);
await proceedToCheckout();
await completePayment();

// End the session with evaluation
await lai.endSession({
  isSuccessful: true,
  sessionEval: 9.0,
  sessionEvalReason: "Checkout completed, minor UI delay"
});

Key Points

  • Sessions can only belong to one experiment
  • The experimentId parameter is optional
  • Sessions without experimentId won’t appear in any experiment
  • Experiment assignment cannot be changed after session creation

Common Patterns

A/B Testing

Compare different configurations or prompts:
import * as lai from 'lucidicai';

// Create two experiments in the dashboard first
const VARIANT_A_EXPERIMENT = "exp-prompt-concise-abc123";
const VARIANT_B_EXPERIMENT = "exp-prompt-detailed-def456";

async function runABTest(testQuery: string) {
  // Test Variant A (Concise prompts)
  await lai.init({
    sessionName: `Query: ${testQuery.substring(0, 50)}`,
    experimentId: VARIANT_A_EXPERIMENT,
    tags: ["ab-test", "variant-a", "concise"]
  });
  
  const responseA = await chatbotWithConcisePrompt(testQuery);
  const scoreA = evaluateResponseQuality(responseA);
  
  await lai.endSession({
    isSuccessful: scoreA > 7,
    sessionEval: scoreA,
    sessionEvalReason: `Concise variant score: ${scoreA}`
  });
  
  // Test Variant B (Detailed prompts)
  await lai.init({
    sessionName: `Query: ${testQuery.substring(0, 50)}`,
    experimentId: VARIANT_B_EXPERIMENT,
    tags: ["ab-test", "variant-b", "detailed"]
  });
  
  const responseB = await chatbotWithDetailedPrompt(testQuery);
  const scoreB = evaluateResponseQuality(responseB);
  
  await lai.endSession({
    isSuccessful: scoreB > 7,
    sessionEval: scoreB,
    sessionEvalReason: `Detailed variant score: ${scoreB}`
  });
}

// Run tests
const testQueries = [
  "How do I reset my password?",
  "What's your refund policy?",
  "I need technical support",
  // ... more queries
];

for (const query of testQueries) {
  await runABTest(query);
}
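
Scores land in the two experiments for comparison in the dashboard. If you also want a quick local summary, one option is to have runABTest return both scores and average them per variant; a minimal sketch, assuming that modified return shape:
interface ABScores { scoreA: number; scoreB: number; }

// Hypothetical aggregation over per-query scores returned by a modified runABTest
function summarizeABScores(all: ABScores[]) {
  const avg = (xs: number[]) => xs.reduce((sum, x) => sum + x, 0) / xs.length;
  return {
    variantA: avg(all.map(s => s.scoreA)),
    variantB: avg(all.map(s => s.scoreB))
  };
}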

Regression Testing

Ensure changes don’t degrade performance:
import * as lai from 'lucidicai';

// Experiments created in dashboard for each version
const BASELINE_EXPERIMENT = "exp-baseline-v1.0";
const CANDIDATE_EXPERIMENT = "exp-candidate-v1.1";

interface TestCase {
  name: string;
  category: string;
  input: any;
  expected: any;
}

class RegressionTester {
  constructor(
    private experimentId: string,
    private version: string
  ) {}
  
  async runTestSuite(testCases: TestCase[]) {
    const results = [];
    
    for (const test of testCases) {
      await lai.init({
        sessionName: `${test.name} - ${this.version}`,
        experimentId: this.experimentId,
        tags: ["regression", this.version, test.category]
      });
      
      try {
        const result = await this.executeTest(test);
        const success = result === test.expected;
        
        await lai.endSession({
          isSuccessful: success,
          sessionEval: success ? 10 : 0,
          sessionEvalReason: `Expected: ${test.expected}, Got: ${result}`
        });
        
        results.push({
          test: test.name,
          success,
          result
        });
        
      } catch (error) {
        await lai.endSession({
          isSuccessful: false,
          sessionEval: 0,
          sessionEvalReason: `Test failed with error: ${error}`
        });
        
        results.push({
          test: test.name,
          success: false,
          error: String(error)
        });
      }
    }
    
    return results;
  }
  
  private async executeTest(test: TestCase) {
    // Your test implementation
    return await runAgentWithInput(test.input);
  }
}

// Run regression tests
const baselineTester = new RegressionTester(BASELINE_EXPERIMENT, "v1.0");
const candidateTester = new RegressionTester(CANDIDATE_EXPERIMENT, "v1.1");

const testCases = await loadRegressionTestSuite();

const baselineResults = await baselineTester.runTestSuite(testCases);
const candidateResults = await candidateTester.runTestSuite(testCases);

console.log("View comparison at: https://dashboard.lucidic.ai/experiments/compare");

Load Testing

Test performance under concurrent load:
import * as lai from 'lucidicai';

// Create load test experiment in dashboard
const LOAD_TEST_EXPERIMENT = "exp-load-test-1000-users";

interface Scenario {
  name: string;
  actions: Action[];
}

interface Action {
  type: string;
  params: any;
  delay?: number;
}

interface UserResult {
  userId: number;
  success: boolean;
  duration?: number;
  results?: any[];
  error?: string;
}

async function simulateUserSession(
  userId: number,
  scenario: Scenario
): Promise<UserResult> {
  const startTime = Date.now();
  
  // Initialize session for this user
  await lai.init({
    sessionName: `User ${userId} - ${scenario.name}`,
    experimentId: LOAD_TEST_EXPERIMENT,
    tags: ["load-test", `scenario:${scenario.name}`, `batch:${Math.floor(userId/100)}`]
  });
  
  try {
    // Simulate user actions
    const results = [];
    for (const action of scenario.actions) {
      const result = await performAction(action);
      results.push(result);
      
      if (action.delay) {
        await new Promise(resolve => setTimeout(resolve, action.delay));
      }
    }
    
    // Calculate metrics
    const duration = (Date.now() - startTime) / 1000;
    const success = results.every(r => r.success);
    
    await lai.endSession({
      isSuccessful: success,
      sessionEval: calculatePerformanceScore(duration, results),
      sessionEvalReason: `Duration: ${duration.toFixed(2)}s, Actions: ${results.length}`
    });
    
    return {
      userId,
      success,
      duration,
      results
    };
    
  } catch (error) {
    await lai.endSession({
      isSuccessful: false,
      sessionEval: 0,
      sessionEvalReason: `Error: ${error}`
    });
    
    return {
      userId,
      success: false,
      error: String(error)
    };
  }
}

async function runLoadTest(numUsers: number, concurrency: number = 10) {
  const scenarios: Scenario[] = [
    { name: 'browse', actions: [/* ... */] },
    { name: 'purchase', actions: [/* ... */] },
    { name: 'support', actions: [/* ... */] }
  ];
  
  const results: UserResult[] = [];
  
  // Process users in batches
  for (let i = 0; i < numUsers; i += concurrency) {
    const batch = [];
    
    for (let j = 0; j < concurrency && i + j < numUsers; j++) {
      const userId = i + j;
      const scenario = scenarios[userId % scenarios.length];
      batch.push(simulateUserSession(userId, scenario));
    }
    
    const batchResults = await Promise.all(batch);
    results.push(...batchResults);
  }
  
  // Summary statistics
  const successful = results.filter(r => r.success).length;
  const avgDuration = results.reduce((sum, r) => sum + (r.duration || 0), 0) / results.length;
  
  console.log("Load Test Complete:");
  console.log(`  Total Users: ${numUsers}`);
  console.log(`  Successful: ${successful} (${(100 * successful / numUsers).toFixed(1)}%)`);
  console.log(`  Avg Duration: ${avgDuration.toFixed(2)}s`);
  console.log(`  View full results at: https://dashboard.lucidic.ai/experiments/${LOAD_TEST_EXPERIMENT}`);
}

// Run the load test
await runLoadTest(1000, 20);
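
performAction and calculatePerformanceScore above are placeholders for your own logic. A minimal sketch of the scoring helper, assuming a 0-10 score that weights action success against session duration (the 70/30 split and 30-second ceiling are arbitrary assumptions):
function calculatePerformanceScore(
  duration: number,
  results: { success: boolean }[]
): number {
  const successRate = results.filter(r => r.success).length / Math.max(results.length, 1);
  const speedFactor = Math.max(0, 1 - duration / 30);  // treat 30s as the acceptable ceiling
  return Math.round(10 * (0.7 * successRate + 0.3 * speedFactor));
}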

Performance Benchmarking

Track performance over time:
import * as lai from 'lucidicai';

interface Benchmark {
  name: string;
  type: 'latency' | 'accuracy' | 'completion';
  input?: any;
  expectedTime?: number;
  scoringRubric?: any;
  steps?: string[];
  timeout?: number;
}

class PerformanceBenchmark {
  private benchmarkSuite: Benchmark[];
  
  constructor(private experimentId: string) {
    this.benchmarkSuite = this.loadBenchmarkSuite();
  }
  
  private loadBenchmarkSuite(): Benchmark[] {
    return [
      {
        name: 'Simple Query Response',
        type: 'latency',
        input: 'What is 2+2?',
        expectedTime: 1.0  // seconds
      },
      {
        name: 'Complex Reasoning',
        type: 'accuracy',
        input: 'Explain quantum computing',
        scoringRubric: { /* ... */ }
      },
      {
        name: 'Multi-step Task',
        type: 'completion',
        steps: ['search', 'analyze', 'summarize'],
        timeout: 30.0
      }
    ];
  }
  
  async runBenchmarks() {
    const results = [];
    
    for (const benchmark of this.benchmarkSuite) {
      await lai.init({
        sessionName: `Benchmark: ${benchmark.name}`,
        experimentId: this.experimentId,
        tags: [
          'benchmark',
          benchmark.type,
          new Date().toISOString().split('T')[0]
        ]
      });
      
      const result = await this.executeBenchmark(benchmark);
      const score = this.scoreBenchmark(benchmark, result);
      
      await lai.endSession({
        isSuccessful: score >= 7,
        sessionEval: score,
        sessionEvalReason: JSON.stringify(result, null, 2)
      });
      
      results.push({
        benchmark: benchmark.name,
        score,
        result
      });
    }
    
    return results;
  }
  
  private async executeBenchmark(benchmark: Benchmark) {
    // Implementation depends on benchmark type
    switch (benchmark.type) {
      case 'latency':
        return await measureLatency(benchmark.input);
      case 'accuracy':
        return await measureAccuracy(benchmark.input, benchmark.scoringRubric);
      case 'completion':
        return await measureCompletion(benchmark.steps, benchmark.timeout);
    }
  }
  
  private scoreBenchmark(benchmark: Benchmark, result: any): number {
    // Scoring logic based on benchmark type
    // Returns 0-10 score
    return calculateScore(benchmark, result);
  }
}

// Run daily benchmarks
async function runDailyBenchmarks() {
  // Assume experiment created in dashboard with naming pattern
  const today = new Date().toISOString().split('T')[0];
  const experimentId = `exp-benchmark-${today}`;  // Created in dashboard
  
  const benchmark = new PerformanceBenchmark(experimentId);
  const results = await benchmark.runBenchmarks();
  
  // Log summary
  const avgScore = results.reduce((sum, r) => sum + r.score, 0) / results.length;
  console.log(`Daily Benchmark Complete: ${today}`);
  console.log(`  Average Score: ${avgScore.toFixed(2)}`);
  console.log(`  View details: https://dashboard.lucidic.ai/experiments/${experimentId}`);
}

await runDailyBenchmarks();
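
measureLatency, measureAccuracy, measureCompletion, and calculateScore above are placeholders for your own measurement code. A minimal sketch of the latency variant, assuming a hypothetical runAgent(input) call into your agent:
async function measureLatency(input: any): Promise<{ latencySeconds: number; output: any }> {
  const start = Date.now();
  const output = await runAgent(input);  // hypothetical call into your agent
  return { latencySeconds: (Date.now() - start) / 1000, output };
}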

Best Practices

Experiment Organization

  1. Naming Sessions Consistently
    // Good: Descriptive and searchable
    const sessionName = `Test Case ${testId}: ${testDescription.substring(0, 50)}`;
    
    // Bad: Generic and unhelpful
    const sessionName = "Test 1";
    
  2. Using Tags Effectively
    await lai.init({
      sessionName: "Performance Test",
      experimentId: EXPERIMENT_ID,
      tags: [
        `version:${VERSION}`,
        `environment:${ENV}`,
        `date:${new Date().toISOString().split('T')[0]}`,
        `type:performance`
      ]
    });
    
  3. Providing Meaningful Evaluations
    // Good: Specific success criteria and scores
    await lai.endSession({
      isSuccessful: responseTime < 2.0 && accuracy > 0.95,
      sessionEval: calculateWeightedScore(responseTime, accuracy),
      sessionEvalReason: `Response: ${responseTime}s, Accuracy: ${(accuracy * 100).toFixed(1)}%`
    });
    
    // Bad: No context
    await lai.endSession({ isSuccessful: true });
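
calculateWeightedScore in the good example above is a placeholder; a minimal sketch, assuming accuracy is in the 0-1 range and a 2-second response-time target (the weights are arbitrary assumptions):
function calculateWeightedScore(responseTime: number, accuracy: number): number {
  const speedScore = Math.max(0, 1 - responseTime / 2.0);  // assumed 2s target
  return Math.round(10 * (0.6 * accuracy + 0.4 * speedScore));
}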
    

Error Handling

Always ensure sessions end properly:
import * as lai from 'lucidicai';

async function safeTestExecution(
  testCase: TestCase,
  experimentId: string
) {
  let sessionId: string | null = null;
  
  try {
    sessionId = await lai.init({
      sessionName: testCase.name,
      experimentId: experimentId
    });
    
    // Your test logic
    const result = await executeTest(testCase);
    
    await lai.endSession({
      isSuccessful: true,
      sessionEval: result.score
    });
    
  } catch (error) {
    // Ensure session ends even on error
    if (sessionId) {
      await lai.endSession({
        isSuccessful: false,
        sessionEval: 0,
        sessionEvalReason: `Error: ${error instanceof Error ? error.stack : error}`
      });
    }
    throw error;
  }
}

Batch Processing

For many sessions, consider batching:
import * as lai from 'lucidicai';

async function processBatch<T extends { id: string }>(
  items: T[],
  experimentId: string,
  batchSize: number = 10
) {
  for (let i = 0; i < items.length; i += batchSize) {
    const batch = items.slice(i, i + batchSize);
    
    // Process batch in parallel
    await Promise.all(
      batch.map(async (item) => {
        await lai.init({
          sessionName: `Batch ${Math.floor(i/batchSize)} - Item ${item.id}`,
          experimentId: experimentId
        });
        
        await processItem(item);
        
        await lai.endSession({ isSuccessful: true });
      })
    );
    
    // Rate limiting between batches
    await new Promise(resolve => setTimeout(resolve, 1000));
  }
}
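
Example usage, assuming each item exposes an id (used above for session names) and a hypothetical loadTestItems() data source:
interface TestItem { id: string; payload: string; }

const items: TestItem[] = await loadTestItems();  // hypothetical data source
await processBatch(items, EXPERIMENT_ID, 10);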

Tips and Tricks

Finding Experiment IDs

// Store experiment IDs in configuration
class ExperimentConfig {
  static readonly PRODUCTION = "exp-prod-baseline";
  static readonly STAGING = "exp-staging-tests";
  static readonly DEVELOPMENT = "exp-dev-testing";
  
  static getCurrent(): string {
    const env = (process.env.ENVIRONMENT || 'development').toUpperCase();
    return this[env as 'PRODUCTION' | 'STAGING' | 'DEVELOPMENT'];
  }
}

// Use in code
await lai.init({
  sessionName: "Test Run",
  experimentId: ExperimentConfig.getCurrent()
});

Conditional Experiment Assignment

// Only add to experiment if specified
const experimentId = process.env.RUN_IN_EXPERIMENT
  ? process.env.EXPERIMENT_ID
  : undefined;

await lai.init({
  sessionName: "Conditional Test",
  experimentId  // undefined means no experiment
});

Experiment Metadata in Sessions

// Include experiment context in session data
const EXPERIMENT_NAME = "Q4 Performance Tests";

await lai.init({
  sessionName: "Contextual Test",
  experimentId: EXPERIMENT_ID,
  task: `Experiment: ${EXPERIMENT_NAME} | Test: ${testName}`
});

Type Safety

// Define experiment IDs with type safety
enum ExperimentIds {
  BASELINE = "exp-baseline-2024",
  CANDIDATE = "exp-candidate-2024",
  PERFORMANCE = "exp-performance-2024"
}

// Use with confidence
await lai.init({
  sessionName: "Type-safe Test",
  experimentId: ExperimentIds.BASELINE
});

Common Issues

Sessions Not Appearing in Experiment

Problem: Sessions created but not showing in the experiment dashboard.

Solutions:
  • Verify experimentId is exactly correct (copy from URL)
  • Ensure session has ended (call await lai.endSession())
  • Refresh the dashboard page
  • Check that API key has correct permissions

Experiment ID Not Found

Problem: Error when using experimentId.

Solutions:
  • Confirm experiment exists in dashboard
  • Check you’re using the correct agent
  • Ensure experiment belongs to your project
  • Verify API key matches the project

Evaluations Not Running

Problem: Rubrics applied but no evaluation scores.

Solutions:
  • Ensure sessions are properly ended
  • Check rubric configuration in dashboard
  • Verify evaluation credits available
  • Wait for async evaluation to complete

Async/Await Issues

Problem: Sessions not completing in the correct order.

Solutions:
// Always await session operations
await lai.init({ /* ... */ });
await lai.endSession({ /* ... */ });

// For parallel execution, use Promise.all
await Promise.all(tests.map(test => runTest(test)));


Next Steps

  1. Create your first experiment in the dashboard
  2. Copy the experiment ID from the URL
  3. Run the examples above with your ID
  4. View results in the experiment analytics
  5. Use failure analysis to identify patterns