Evaluating Copyright Takedown Methods for Language Models

Abstract

Language models (LMs) derive their capabilities from extensive training on diverse data, including potentially copyrighted material. These models can memorize and generate content similar to their training data, posing potential concerns. Therefore, model creators are motivated to develop mitigation methods that prevent generating protected content. We term this procedure copyright takedowns for LMs, noting the conceptual similarity to (but legal distinction from) the DMCA takedown. This paper introduces the first evaluation of the feasibility and side effects of copyright takedowns for LMs. We propose CoTaEval, an evaluation framework to assess the effectiveness of copyright takedown methods, the impact on the model's ability to retain uncopyrightable factual knowledge from the training data whose recitation is embargoed, and how well the model maintains its general utility and efficiency. We examine several strategies, including adding system prompts, decoding-time filtering interventions, and unlearning approaches. Our findings indicate that no tested method excels across all metrics, showing significant room for research in this unique problem setting and indicating potential unresolved challenges for live policy proposals.
