Helix: Accelerating Human-in-the-loop Machine Learning

Abstract

Data application developers and data scientists spend an inordinate amount of time iterating on machine learning (ML) workflows, modifying the data pre-processing, model training, and post-processing steps by trial and error until the model reaches the desired performance. Existing work on accelerating machine learning focuses on speeding up one-shot execution of workflows and fails to address the incremental and dynamic nature of typical ML development. We propose Helix, a declarative machine learning system that accelerates iterative development by optimizing workflow execution end-to-end and across iterations. Helix minimizes the runtime per iteration via program analysis and intelligent reuse of previous results, which are selectively materialized (trading off the cost of materialization for potential future benefits) to speed up future iterations. Additionally, Helix offers a graphical interface to visualize workflow DAGs and compare versions to facilitate iterative development. Through two ML applications, in classification and in structured prediction, attendees will experience the succinctness of the Helix programming interface and the speed and ease of iterative development using Helix. In our evaluations, Helix achieved up to an order of magnitude reduction in cumulative run time compared to state-of-the-art machine learning tools.
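
The reuse and materialization trade-off described above can be illustrated with a small sketch. This is not Helix's actual algorithm, which the abstract does not specify. The `Step` class, its cost fields, the `expected_reuses` estimate, and both decision rules are hypothetical and serve only to show the kind of cost comparison involved.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One workflow step with hypothetical cost estimates, in seconds."""
    name: str
    compute_cost: float  # time to recompute the result from its inputs
    load_cost: float     # time to read a stored copy back from disk
    write_cost: float    # time to write the result to disk


def plan_step(step: Step, changed: bool, cached: bool) -> str:
    """Choose 'compute' or 'load' for one step in the current iteration.

    Simplification: `changed` is assumed to already account for changes
    in upstream steps, which would invalidate any stored result.
    """
    if changed or not cached:
        return "compute"
    return "load" if step.load_cost < step.compute_cost else "compute"


def should_materialize(step: Step, expected_reuses: int) -> bool:
    """Store a result only if the expected future savings exceed the write cost."""
    saving_per_reuse = step.compute_cost - step.load_cost
    return expected_reuses * saving_per_reuse > step.write_cost


if __name__ == "__main__":
    featurize = Step("featurize", compute_cost=120.0, load_cost=5.0, write_cost=8.0)
    print(plan_step(featurize, changed=False, cached=True))
    print(should_materialize(featurize, expected_reuses=1))
```

Under these assumptions, an expensive step that is unchanged and already stored is loaded rather than recomputed, while a step that is cheaper to recompute than to load is never worth storing.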
