Abstract
Complex scenes present significant challenges for predicting human behaviour due to the abundance of interaction information, such as human-human and human-environment interactions. These factors complicate the analysis and understanding of human behaviour, thereby increasing the uncertainty in forecasting human motions. Existing motion prediction methods thus struggle in these complex scenarios. In this paper, we propose an effective method for human motion forecasting in interactive scenes. To achieve a comprehensive representation of interactions, we design a hierarchical interaction feature representation in which high-level features capture the overall context of the interactions, while low-level features focus on fine-grained details. In addition, we propose a coarse-to-fine interaction reasoning module that leverages both spatial and frequency perspectives to efficiently utilize the hierarchical features, thereby enhancing the accuracy of motion predictions. Our method achieves state-of-the-art performance across four public datasets. Code will be released when this paper is published.