Automatic Identification of Machine Learning-Specific Code Smells

Abstract

Machine learning (ML) has rapidly grown in popularity, becoming vital to many industries. Currently, the research on code smells in ML applications lacks tools and studies that address the identification and validity of ML-specific code smells. This work investigates suitable methods and tools to design and develop a static code analysis tool (MLpylint) based on code smell criteria. This research employed the Design Science Methodology. In the problem identification phase, a literature review was conducted to identify ML-specific code smells. In solution design, a secondary literature review and consultations with experts were performed to select methods and tools for implementing the tool. We evaluated the tool on data from 160 open-source ML applications sourced from GitHub. We also conducted a static validation through an expert survey involving 15 ML professionals. The results indicate the effectiveness and usefulness of MLpylint. We aim to extend our current approach by investigating ways to introduce MLpylint seamlessly into development workflows, fostering a more productive and innovative developer environment.
