
[Open Beta] Auto Evaluation for Search Quality at Scale

Auto Evaluation introduces an LLM-driven pipeline that automatically measures search quality across hundreds of queries and multiple customer profiles. It replaces manual spot-checking with structured, repeatable evaluations of your search configuration.

What it does

Auto Evaluation runs your most important queries, pulled from analytics or test datasets, through different customer contexts to assess how well your search performs under real-world conditions.

Key capabilities

  • Automated testing across top searches or seeded queries
  • Multi-profile evaluation to understand personalization impact
  • Comprehensive scoring covering relevance, intent satisfaction, attribute compliance, brand handling, and result diversity
  • Run comparisons to track improvements or regressions over time
  • Configuration tuning to test different search settings per run
  • Exportable results (CSV or JSON) for deeper analysis
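As a rough illustration of what a run comparison could look like on exported results, the sketch below buckets queries into improved, regressed, and unchanged. Everything in it is an assumption made for the example: the function name compare_runs, the 0 to 1 score scale, the tolerance value, and the idea that each run has been reduced to a mapping from query to overall score. It does not describe the product's export schema or its built-in comparison logic.

```python
from typing import Dict, List, Tuple


def compare_runs(
    baseline: Dict[str, float],
    candidate: Dict[str, float],
    tolerance: float = 0.02,  # hypothetical noise band, not a product setting
) -> Dict[str, List[Tuple[str, float]]]:
    """Bucket queries present in both runs by how their overall score moved."""
    buckets: Dict[str, List[Tuple[str, float]]] = {
        "improved": [],
        "regressed": [],
        "unchanged": [],
    }
    # Only queries evaluated in both runs can be compared.
    for query in sorted(baseline.keys() & candidate.keys()):
        delta = round(candidate[query] - baseline[query], 4)
        if delta > tolerance:
            buckets["improved"].append((query, delta))
        elif delta < -tolerance:
            buckets["regressed"].append((query, delta))
        else:
            buckets["unchanged"].append((query, delta))
    return buckets


# Made-up scores for two runs of the same query set.
baseline = {"running shoes": 0.82, "red dress": 0.74, "usb c cable": 0.91}
candidate = {"running shoes": 0.88, "red dress": 0.61, "usb c cable": 0.90}

result = compare_runs(baseline, candidate)
for bucket, rows in result.items():
    print(bucket, rows)
```

A tolerance band keeps small score fluctuations between runs from being reported as regressions; the right width depends on how noisy the scores turn out to be in practice.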

How results are scored

Each query produces an overall quality score, backed by detailed metrics for relevance, intent satisfaction, attribute compliance, brand handling, and result diversity.
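To make the shape of a per-query result concrete, here is a minimal sketch of how such a record might be modeled once results are exported, and how weak queries could be surfaced from it. The QueryScore class, the metric key names, the 0 to 1 scale, and the 0.7 threshold are all hypothetical; the changelog does not specify the export fields or how the overall score is derived from the detailed metrics.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class QueryScore:
    """Hypothetical per-query record: one overall score plus detailed metrics."""
    query: str
    overall: float
    metrics: Dict[str, float]


def weak_queries(scores: List[QueryScore], threshold: float = 0.7) -> List[Tuple[str, str]]:
    """Return (query, weakest metric) for queries below the threshold, worst first."""
    weak = sorted((s for s in scores if s.overall < threshold), key=lambda s: s.overall)
    return [(s.query, min(s.metrics, key=s.metrics.get)) for s in weak]


# Made-up example data using the five metric areas named above.
sample = [
    QueryScore("running shoes", 0.86, {
        "relevance": 0.90, "intent_satisfaction": 0.88, "attribute_compliance": 0.84,
        "brand_handling": 0.85, "result_diversity": 0.80,
    }),
    QueryScore("red dress", 0.58, {
        "relevance": 0.70, "intent_satisfaction": 0.65, "attribute_compliance": 0.40,
        "brand_handling": 0.60, "result_diversity": 0.55,
    }),
    QueryScore("nike hoodie", 0.66, {
        "relevance": 0.75, "intent_satisfaction": 0.70, "attribute_compliance": 0.72,
        "brand_handling": 0.45, "result_diversity": 0.68,
    }),
]

flagged = weak_queries(sample)
for query, metric in flagged:
    print(f"{query}: weakest metric is {metric}")
```

Pairing each low-scoring query with its weakest metric gives a starting point for deciding which search setting to tune first.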

Auto Evaluation helps you systematically identify weak queries, validate tuning changes, and confidently improve search quality over time.


Tags: Feature, AI Search, Open Beta, Evaluation