Abstract
Ensuring that content complies with community guidelines is crucial for maintaining healthy online social environments. However, traditional human-based compliance checking struggles to scale because of the growing volume of user-generated content and the limited number of moderators. Recent advances in Natural Language Understanding, demonstrated by Large Language Models (LLMs), open new opportunities for automated content compliance verification. This work evaluates six AI agents built on open LLMs for automated rule compliance checking in Decentralized Social Networks, a challenging environment because of their heterogeneous community scopes and rules. Analyzing over 50,000 posts from hundreds of Mastodon servers, we find that the AI agents effectively detect non-compliant content, grasp linguistic subtleties, and adapt to diverse community contexts. Most agents also show high inter-rater reliability and are consistent in the justifications they give for their scores and in their suggestions for achieving compliance. A human evaluation with domain experts confirmed the agents' reliability and usefulness, making them promising tools for semi-automated or human-in-the-loop content moderation systems.