Abstract
Word sense induction (WSI) is the problem of grouping occurrences of an ambiguous word according to the sense expressed in each occurrence. A recently proposed approach to this task uses neural language models to generate possible substitutes for the ambiguous word in a particular context, and then clusters sparse bag-of-words vectors built from these substitutes. In this work, we apply this approach to the Russian language and improve it in two ways. First, we propose methods of combining left and right contexts, which result in better generated substitutes. Second, instead of using a fixed number of clusters for all ambiguous words, we propose a technique for selecting an individual number of clusters for each word. Our approach establishes a new state of the art, improving the current best WSI results for the Russian language on two RUSSE 2018 datasets by a large margin.