Measuring Search Relevancy: Increasing App Store Ranking with LLM-Designed Judgments

nimda February 27, 2026


Large-scale commercial search systems optimize for relevance so that sessions succeed and users find what they are looking for. To maximize relevance, we use two complementary objectives: behavioral relevance (results users tend to click on or download) and textual relevance (how well a result semantically matches the query). A persistent challenge is that expert-provided textual relevance labels are scarce compared with abundant behavioral relevance labels. We first address this by systematically evaluating LLM configurations, finding that a specialized, fine-tuned model significantly outperforms a pre-trained one at producing accurate relevance labels. Using this fine-tuned model as a force multiplier, we generate millions of textual relevance labels to overcome the data scarcity. We show that augmenting our production ranker with these textual relevance labels shifts the Pareto frontier significantly outward: offline NDCG improves on behavioral relevance while simultaneously improving on textual relevance. These offline gains were confirmed by a worldwide A/B test on the App Store ranker, which showed a statistically significant +0.24% increase in conversion rate. The largest gains came from tail queries, where the new textual relevance labels provide a strong signal in the absence of reliable behavioral relevance labels.
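To make the offline metric concrete, here is a minimal sketch of how NDCG could be computed for one ranked result list against each of the two label types. The abstract does not say which NDCG variant is used, so the exponential gain (2^label - 1) with a log2 position discount is an assumption, and the function names and example grades are hypothetical.

```python
import math
from typing import Optional, Sequence


def dcg(labels: Sequence[float], k: Optional[int] = None) -> float:
    """Discounted cumulative gain over labels given in ranked order.

    Assumes exponential gain (2**label - 1) and a log2 position discount;
    the source does not specify the exact formulation.
    """
    top = labels if k is None else labels[:k]
    return sum((2 ** g - 1) / math.log2(pos + 2) for pos, g in enumerate(top))


def ndcg(labels: Sequence[float], k: Optional[int] = None) -> float:
    """DCG of the given ranking divided by the DCG of the ideal ordering."""
    ideal = dcg(sorted(labels, reverse=True), k)
    return dcg(labels, k) / ideal if ideal > 0 else 0.0


# Hypothetical grades for one query's ranked results, in ranked order.
behavioral_labels = [2, 3, 0, 1]  # e.g. derived from clicks or downloads
textual_labels = [3, 2, 1, 0]     # e.g. LLM-generated semantic judgments

behavioral_ndcg = ndcg(behavioral_labels)
textual_ndcg = ndcg(textual_labels)
```

Scoring the same ranking against both label sets yields one NDCG value per objective. An outward shift of the Pareto frontier, as described above, means a ranker change raises both values rather than trading one for the other.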
