Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation

Abstract

In this work, we leverage the intrinsic segmentation of language sequences and design a new positional encoding method called Bilevel Positional Encoding (BiPE). For each position, our BiPE blends an intra-segment encoding and an inter-segment encoding. The intra-segment encoding identifies the locations within a segment and helps the model capture the semantic information therein via absolute positional encoding. The inter-segment encoding specifies the segment index, models the relationships between segments, and aims to improve extrapolation capabilities via relative positional encoding. Theoretical analysis shows this disentanglement of positional information makes learning more effective. The empirical results also show that our BiPE has superior length extrapolation capabilities across a wide range of tasks in diverse text modalities.
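To make the two levels of position concrete, the following minimal sketch computes, for each token, its location within its segment and the index of the segment it belongs to. It is an illustration only and not the authors' implementation: the function name `bilevel_positions`, the choice of separator tokens, and the convention that a separator belongs to the segment it closes are all assumptions made here. In a model, the first list would feed an absolute positional encoding and the second a relative one, as the abstract describes.

```python
from typing import FrozenSet, List, Sequence, Tuple


def bilevel_positions(
    tokens: Sequence[str],
    separators: FrozenSet[str] = frozenset({".", "\n"}),  # assumed segment delimiters
) -> Tuple[List[int], List[int]]:
    """Return (intra_segment_positions, segment_indices) for a token sequence.

    Hypothetical helper: a separator token is counted as the last token of
    the segment it closes, and the next token starts a new segment.
    """
    intra: List[int] = []
    inter: List[int] = []
    pos_in_segment = 0
    segment_index = 0
    for tok in tokens:
        intra.append(pos_in_segment)
        inter.append(segment_index)
        if tok in separators:
            # Close the current segment and restart the intra-segment counter.
            segment_index += 1
            pos_in_segment = 0
        else:
            pos_in_segment += 1
    return intra, inter


tokens = ["a", "b", ".", "c", "d", "e", ".", "f"]
intra, inter = bilevel_positions(tokens)
print(intra)  # [0, 1, 2, 0, 1, 2, 3, 0]
print(inter)  # [0, 0, 0, 1, 1, 1, 1, 2]
```

Note that the intra-segment positions stay bounded by the longest segment, while only the segment index grows with sequence length. This is consistent with the abstract's point that the inter-segment level is the one aimed at extrapolation.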
