Abstract
This study conducts a systematic assessment of the capabilities of 12 machine learning models and model variations in detecting economic ideology. As an evaluation benchmark, I use manifesto data spanning six elections in the United Kingdom and pre-annotated by expert and crowd coders. The analysis assesses the performance of several generative, fine-tuned, and zero-shot models at the granular and aggregate levels. The results show that generative models such as GPT-4o and Gemini 1.5 Flash consistently outperform other models against all benchmarks. However, they pose issues of accessibility and resource availability. Fine-tuning yielded competitive performance and offers a reliable alternative through domain-specific optimization. But its dependency on training data severely limits scalability. Zero-shot models consistently face difficulties with identifying signals of economic ideology, often resulting in negative associations with human coding. Using general knowledge for the domain-specific task of ideology scaling proved to be unreliable. Other key findings include considerable within-party variation, fine-tuning benefiting from larger training data, and zero-shot's sensitivity to prompt content. The assessments include the strengths and limitations of each model and derive best practices for automated analyses of political content.