NLEBench+NorGLM: A Comprehensive Empirical Analysis and Benchmark Dataset for Generative Language Models in Norwegian

Abstract

Recent advancements in Generative Language Models (GLMs) have transformed Natural Language Processing (NLP) by showcasing the effectiveness of the "pre-train, prompt, and predict" paradigm in utilizing pre-trained GLM knowledge for diverse applications. Despite their potential, these capabilities lack adequate quantitative characterization due to the absence of comprehensive benchmarks, particularly for low-resource languages. Existing low-resource benchmarks focus on discriminative language models like BERT, neglecting the evaluation of generative language models. Moreover, current benchmarks often overlook measuring generalization performance across multiple tasks, a crucial metric for GLMs. To bridge these gaps, we introduce NLEBench, a comprehensive benchmark tailored for evaluating natural language generation capabilities in Norwegian, a low-resource language. We use Norwegian as a case study to explore whether current GLMs and benchmarks in mainstream languages like English can reveal the unique characteristics of underrepresented languages. NLEBench encompasses a suite of real-world NLP tasks ranging from news storytelling, summarization, open-domain conversation, natural language understanding, instruction fine-tuning, toxicity and bias evaluation, to self-curated Chain-of-Thought investigation. It features two high-quality, human-annotated datasets: an instruction dataset covering traditional Norwegian cultures, idioms, slang, and special expressions, and a document-grounded multi-label dataset for topic classification, question answering, and summarization. This paper also introduces foundational Norwegian Generative Language Models (NorGLMs) developed with diverse parameter scales and Transformer-based architectures. Systematic evaluations on the proposed benchmark suite provide insights into the capabilities and scalability of NorGLMs across various downstream tasks.
