[Open Beta] Auto Evaluation for Search Quality at Scale

Auto Evaluation introduces an LLM-driven pipeline that automatically measures search quality across hundreds of queries and multiple customer profiles. It replaces manual spot-checking with structured, repeatable evaluations of your search configuration.

What it does

Auto Evaluation runs your most important queries, pulled from analytics or test datasets, through multiple customer profiles to assess how well your search performs under real-world conditions.

Key capabilities

Automated testing across top searches or seeded queries
Multi-profile evaluation to understand personalization impact
Comprehensive scoring covering relevance, intent satisfaction, attributes, brand handling, and result diversity
Run-to-run comparisons to track improvements or regressions over time
Configuration tuning to test different search settings per run
Exportable results (CSV or JSON) for deeper analysis
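
As an illustration of how exported results could feed a run comparison, here is a minimal Python sketch. It assumes each run has already been reduced to a mapping from query text to overall score on a 0 to 1 scale. The function name `compare_runs`, the `threshold` parameter, and the sample queries and scores are all hypothetical and are not part of the Auto Evaluation export format.

```python
from typing import Dict, List, Tuple


def compare_runs(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    threshold: float = 0.05,
) -> Dict[str, List[Tuple[str, float]]]:
    """Classify queries present in both runs by how their score changed.

    `threshold` is a hypothetical cutoff: smaller changes are treated as noise.
    """
    improved: List[Tuple[str, float]] = []
    regressed: List[Tuple[str, float]] = []
    # Only queries evaluated in both runs can be compared.
    for query in sorted(baseline.keys() & candidate.keys()):
        delta = candidate[query] - baseline[query]
        if delta >= threshold:
            improved.append((query, round(delta, 4)))
        elif delta <= -threshold:
            regressed.append((query, round(delta, 4)))
    return {"improved": improved, "regressed": regressed}


# Made-up scores for two runs of the same query set.
baseline = {"red shoes": 0.80, "laptop bag": 0.60, "usb c cable": 0.90}
candidate = {"red shoes": 0.70, "laptop bag": 0.75, "usb c cable": 0.91}

report = compare_runs(baseline, candidate)
print(report)
```

With this sample data, "laptop bag" is reported as improved and "red shoes" as regressed, while the small change for "usb c cable" falls under the threshold and is ignored.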

How results are scored

Each query produces an overall quality score, backed by detailed metrics for relevance, intent, attribute compliance, brand handling, and result diversity.
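
The exact formula behind the overall score is not described here. Purely as an illustration of how one score can be backed by several metrics, the sketch below uses a weighted average. The weights, the 0 to 1 scale, the metric key names, and the function `overall_score` are assumptions for this example, not the product's actual scoring method.

```python
from typing import Dict

# Hypothetical weights for the five metrics named above; the real weighting is not documented here.
ASSUMED_WEIGHTS: Dict[str, float] = {
    "relevance": 0.40,
    "intent": 0.25,
    "attribute_compliance": 0.15,
    "brand_handling": 0.10,
    "result_diversity": 0.10,
}


def overall_score(metrics: Dict[str, float], weights: Dict[str, float] = ASSUMED_WEIGHTS) -> float:
    """Combine per-metric scores (assumed to lie in 0..1) into a weighted average."""
    missing = set(weights) - set(metrics)
    if missing:
        raise ValueError(f"missing metrics: {sorted(missing)}")
    total_weight = sum(weights.values())
    # Dividing by the total keeps the result in 0..1 even if the weights do not sum to 1.
    return sum(metrics[name] * weight for name, weight in weights.items()) / total_weight


# Made-up metric values for a single query.
example_metrics = {
    "relevance": 0.9,
    "intent": 0.8,
    "attribute_compliance": 0.7,
    "brand_handling": 1.0,
    "result_diversity": 0.5,
}

score = overall_score(example_metrics)
print(f"overall score: {score:.3f}")
```

A weighted average is only one possible choice. It makes the contribution of each metric easy to read from the weights, at the cost of letting a strong metric mask a weak one.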

Auto Evaluation helps you systematically identify weak queries, validate tuning changes, and confidently improve search quality over time.
