Mass Simulations

A Mass Simulation (Mass Sim) is a collection of Sessions in which the same agent runs the same task multiple times. Because autonomous agents are inherently non-deterministic, a single run is often misleading. Mass Sims let you observe what your agent can do, what it usually does, and where it sometimes fails. Learn to create one here.

Why Mass Sims?

Mass Sims help answer:
  • What are the possible paths my agent might take?
  • Where does the agent behave inconsistently?
  • Are there probabilistic failure modes I’m not catching in single sessions?
  • What is the agent’s real-world reliability over many runs?
If your agent succeeds only 80–90% of the time, it’s not ready for production. Mass Sims reveal where and why the remaining 10–20% of runs fail.
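
A raw success rate over a handful of runs says little without an uncertainty estimate. The sketch below is not part of any Mass Sim API. It assumes you have recorded a pass/fail outcome for each Session, and the function name `wilson_interval` is hypothetical. It computes a 95% Wilson score interval for the agent's true success rate.

```python
import math


def wilson_interval(successes: int, runs: int, z: float = 1.96) -> tuple[float, float]:
    """Return a Wilson score interval (low, high) for a success rate.

    `successes` and `runs` are assumed to come from your own per-Session
    pass/fail bookkeeping; z = 1.96 corresponds to roughly 95% confidence.
    """
    if runs <= 0:
        raise ValueError("runs must be positive")
    if not 0 <= successes <= runs:
        raise ValueError("successes must be between 0 and runs")

    p = successes / runs
    denom = 1 + z**2 / runs
    center = (p + z**2 / (2 * runs)) / denom
    half = z * math.sqrt(p * (1 - p) / runs + z**2 / (4 * runs**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half), min(1.0, center + half)


if __name__ == "__main__":
    for successes, runs in [(9, 10), (90, 100)]:
        low, high = wilson_interval(successes, runs)
        print(f"{successes}/{runs}: {low:.3f} to {high:.3f}")
```

With 9 successes in 10 runs the interval spans roughly 0.60 to 0.98, so an observed 90% is still compatible with a much less reliable agent. At 90 successes in 100 runs it narrows to roughly 0.83 to 0.94, which is why many runs are needed before a reliability figure means much.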

Structure

Each Mass Simulation:
  • Groups many Sessions that run the same task
  • Computes a unified Workflow Trajectory graph
  • Clusters Steps across Sessions based on similar states
  • Shows a probabilistic map of your agent’s possible decisions
MassSim
├── Session 1
│   └── Steps → Events
├── Session 2
│   └── Steps → Events
├── ...
└── Workflow Trajectory Graph (merged from all Sessions)
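
One way to picture the merged graph is as a table of transition probabilities between clustered states. The sketch below illustrates the idea only and is not how the Workflow Trajectory graph is actually computed. It assumes each Step has already been assigned a cluster label, and the clustering itself is not shown. The string labels, the `START` and `END` markers, and the function `build_trajectory_graph` are all hypothetical.

```python
from collections import Counter, defaultdict


def build_trajectory_graph(sessions: list[list[str]]) -> dict[str, dict[str, float]]:
    """Merge per-Session state sequences into one probabilistic graph.

    Each inner list is one Session, given as the ordered cluster labels
    of its Steps (hypothetical input format). The result maps each state
    to its outgoing transitions and their observed probabilities.
    """
    counts: dict[str, Counter] = defaultdict(Counter)
    for states in sessions:
        # Add synthetic endpoints so entry and exit points are visible too.
        path = ["START", *states, "END"]
        for src, dst in zip(path, path[1:]):
            counts[src][dst] += 1

    graph: dict[str, dict[str, float]] = {}
    for src, outgoing in counts.items():
        total = sum(outgoing.values())
        graph[src] = {dst: n / total for dst, n in outgoing.items()}
    return graph


if __name__ == "__main__":
    # Three Sessions of the same task; one diverges into an error state.
    sessions = [
        ["search", "open_result", "answer"],
        ["search", "open_result", "answer"],
        ["search", "error"],
    ]
    for src, outgoing in build_trajectory_graph(sessions).items():
        print(src, outgoing)
```

In this toy input, two of the three Sessions move from `search` to `open_result` and one moves to `error`, so the graph shows a low-probability failure branch that a single run would probably have missed.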