Search Beyond Queries: Training Smaller Language Models for Web Interactions via Reinforcement Learning

  • 2024-04-16 21:15:32
  • Moghis Fereidouni, A. B. Siddique

Abstract

Traditional search systems focus on query formulation for effective results but face challenges in scenarios such as product searches where crucial product details (e.g., size, color) remain concealed until users visit specific product pages. This highlights the need for intelligent web navigation agents capable of formulating queries and navigating web pages according to users' high-level intents. In response to this need, this work introduces a Grounded Language Agent for Intelligent Web Interactions, called GLAINTEL. Drawing upon advancements in language modeling and reinforcement learning, GLAINTEL investigates the efficacy of transformer-based models in enhancing the search capabilities of interactive web environments. Given the dynamic action space for each state in web navigation, GLAINTEL employs the Flan-T5 architecture and incorporates language modeling and value estimation heads. This work focuses on training smaller language models as agents across various scenarios, systematically evaluating the impact of human demonstrations on the training process. Specifically, we investigate scenarios where no human demonstrations are available and subsequently assess the effective utilization of such demonstrations. We also explore unsupervised domain adaptation for situations where demonstrations are confined to a specific domain. Experimental evaluations across diverse setups demonstrate the effectiveness of training agents in unsupervised settings, outperforming in-context learning-based approaches that employ larger models with up to 540 billion parameters. Surprisingly, behavioral cloning-based methods that straightforwardly use human demonstrations do not outperform unsupervised learning-based methods. Additionally, combining human demonstrations with Reinforcement Learning-based training yields results comparable to models utilizing GPT-4.

 
