Harnessing the Power of Interleaving and Counterfactual Evaluation for Airbnb Search Ranking

Abstract

Evaluation plays a crucial role in the development of ranking algorithms for search and recommender systems. It enables online platforms to create user-friendly features that drive commercial success in a steady and effective manner. The online environment is particularly conducive to applying causal inference techniques, such as randomized controlled experiments (known as A/B tests), which are often more challenging to implement in fields like medicine and public policy. However, businesses face unique challenges when it comes to effective A/B testing. Specifically, achieving sufficient statistical power for conversion-based metrics can be time-consuming, especially for significant purchases like booking accommodations. While offline evaluations are quicker and more cost-effective, they often lack accuracy and are inadequate for selecting candidates for A/B tests. To address these challenges, we developed interleaving and counterfactual evaluation methods to facilitate rapid online assessments for identifying the most promising candidates for A/B tests. Our approach not only increased the sensitivity of experiments by a factor of up to 100 (depending on the approach and metrics) compared to traditional A/B testing but also streamlined the experimental process. The practical insights gained from usage in production can also benefit organizations with similar interests.
