Enhancing PyKEEN with Multiple Negative Sampling Solutions for Knowledge Graph Embedding Models

  • 2025-08-07 17:24:34
  • Claudia d'Amato, Ivan Diliso, Nicola Fanizzi, Zafar Saeed

Abstract

Embedding methods have become popular due to their scalability on link prediction and/or triple classification tasks on Knowledge Graphs. Embedding models are trained relying on both positive and negative samples of triples. However, in the absence of negative assertions, these must usually be generated artificially using various negative sampling strategies, ranging from random corruption to more sophisticated techniques, and the choice of strategy has an impact on the overall performance. Most popular libraries for knowledge graph embedding (KGE) support only basic strategies of this kind and lack advanced solutions. To address this gap, we deliver an extension for the popular KGE framework PyKEEN that integrates a suite of advanced negative samplers (including both static and dynamic corruption strategies) within a consistent modular architecture, to generate meaningful negative samples while remaining compatible with existing PyKEEN-based workflows and pipelines. The developed extension not only enhances PyKEEN itself but also allows for easier and more comprehensive development of embedding methods and/or their customization. As a proof of concept, we present a comprehensive empirical study of the developed extensions and their impact on the link prediction performance of different embedding methods, which also provides useful insights for the design of more effective strategies.
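
To make the simplest strategy mentioned above concrete, the following is a minimal, library-independent sketch of random corruption: for each positive triple, the head or the tail is replaced by a uniformly sampled entity, and candidates that coincide with a known positive are discarded. This is an illustration only, not the extension's code or PyKEEN's API; the function name `corrupt_triples`, its parameters, and the toy graph are all hypothetical.

```python
import random


def corrupt_triples(triples, entities, num_negs=1, seed=0, max_attempts=100):
    """Generate negatives by uniform random head/tail corruption.

    Hypothetical helper for illustration; not part of PyKEEN.
    Candidates that are already known positives are rejected.
    """
    rng = random.Random(seed)
    known = set(triples)
    pool = sorted(entities)  # fixed order keeps sampling reproducible
    negatives = []
    for head, relation, tail in triples:
        made, attempts = 0, 0
        while made < num_negs and attempts < max_attempts:
            attempts += 1
            entity = rng.choice(pool)
            # Corrupt the head or the tail with equal probability.
            if rng.random() < 0.5:
                candidate = (entity, relation, tail)
            else:
                candidate = (head, relation, entity)
            if candidate not in known:
                negatives.append(candidate)
                made += 1
    return negatives


# Toy knowledge graph (made-up data).
positives = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "works_at", "acme"),
]
entities = {"alice", "bob", "carol", "acme"}
negatives = corrupt_triples(positives, entities, num_negs=2)
```

Uniform corruption of this kind is cheap but often yields negatives that are trivially false (for example, a person placed in the tail of `works_at`), which is the kind of limitation that more sophisticated static and dynamic samplers aim to address.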

 
