Abstract
Scientific workflow systems are increasingly popular for expressing and executing complex data analysis pipelines over large datasets, as they offer reproducibility, dependability, and scalability of analyses by automatic parallelization on large compute clusters. However, implementing workflows is difficult due to the involvement of many black-box tools and the deep infrastructure stack necessary for their execution. Simultaneously, user-supporting tools are rare, and the number of available examples is much lower than in classical programming languages. To address these challenges, we investigate the efficiency of Large Language Models (LLMs), specifically ChatGPT, to support users when dealing with scientific workflows. We performed three user studies in two scientific domains to evaluate ChatGPT for comprehending, adapting, and extending workflows. Our results indicate that LLMs efficiently interpret workflows but achieve lower performance for exchanging components or purposeful workflow extensions. We characterize their limitations in these challenging scenarios and suggest future research directions.