Evals UI

This project helps people quickly understand MultiOn agent behavior.

Agents

Easily create and manage intelligent agents for a variety of tasks.

Manage and review all agent sessions efficiently.

Track and manage all test runs across various agents, evaluators, and datasets.

Access detailed evaluation summaries for agent runs and review key performance metrics.

Analyze and review test cases to understand the behavior of agents.

View and manage key metrics for various evaluation processes.

Review lessons learned from evaluating agents.