Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation

  • 2024-01-29 18:59:07
  • Zhenyu He, Guhao Feng, Shengjie Luo, Kai Yang, Di He, Jingjing Xu, Zhi Zhang, Hongxia Yang, Liwei Wang

Abstract

In this work, we leverage the intrinsic segmentation of language sequences and design a new positional encoding method called Bilevel Positional Encoding (BiPE). For each position, our BiPE blends an intra-segment encoding and an inter-segment encoding. The intra-segment encoding identifies the locations within a segment and helps the model capture the semantic information therein via absolute positional encoding. The inter-segment encoding specifies the segment index, models the relationships between segments, and aims to improve extrapolation capabilities via relative positional encoding. Theoretical analysis shows this disentanglement of positional information makes learning more effective. The empirical results also show that our BiPE has superior length extrapolation capabilities across a wide range of tasks in diverse text modalities.
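As a rough illustration of the two levels of position described above, the sketch below assigns each token an intra-segment index (its location inside the current segment) and an inter-segment index (the index of the segment it belongs to). It is not the authors' implementation: the function name `bilevel_positions`, the choice of `"."` and `"\n"` as segment delimiters, and the convention that a delimiter closes its own segment are all assumptions made for the example. How these indices are then turned into absolute and relative encodings is left out.

```python
from typing import FrozenSet, List, Sequence, Tuple

# Hypothetical delimiter set: the abstract only says segments are "intrinsic"
# to language sequences, so treating "." and "\n" as boundaries is an assumption.
DEFAULT_DELIMITERS: FrozenSet[str] = frozenset({".", "\n"})


def bilevel_positions(
    tokens: Sequence[str],
    delimiters: FrozenSet[str] = DEFAULT_DELIMITERS,
) -> Tuple[List[int], List[int]]:
    """Return (intra_segment_index, segment_index) for every token."""
    intra: List[int] = []
    inter: List[int] = []
    pos_in_segment = 0
    segment_index = 0
    for tok in tokens:
        intra.append(pos_in_segment)
        inter.append(segment_index)
        if tok in delimiters:
            # Assumed convention: the delimiter belongs to the segment it ends,
            # and the next token starts a new segment at position 0.
            segment_index += 1
            pos_in_segment = 0
        else:
            pos_in_segment += 1
    return intra, inter


tokens = ["The", "cat", "sat", ".", "It", "slept", "."]
intra, inter = bilevel_positions(tokens)
print(intra)  # [0, 1, 2, 3, 0, 1, 2]
print(inter)  # [0, 0, 0, 0, 1, 1, 1]
```

In this toy setup the intra-segment index never exceeds the longest segment length, however long the full sequence is; only the segment index grows with sequence length, which is the part the abstract assigns to relative positional encoding.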

 
