Abstract
Typically, machine learning models are trained and evaluated without making any distinction between users (e.g., using traditional hold-out and cross-validation). However, this produces inaccurate performance metric estimates in multi-user settings, that is, situations where the data were collected by multiple users with different characteristics (e.g., age, gender, height), which is very common in user-computer interaction and medical applications. For these types of scenarios, model evaluation strategies that provide better performance estimates have been proposed, such as mixed, user-independent, user-dependent, and user-adaptive models. Although those strategies are better suited for multi-user systems, they are typically assessed with respect to performance metrics that capture the overall behavior of the models; they do not provide performance guarantees for individual predictions, nor do they provide feedback about the predictions' uncertainty. To overcome those limitations, in this work we evaluated the conformal prediction framework in several multi-user settings. Conformal prediction is a model-agnostic method that provides confidence guarantees on the predictions, thus increasing the trustworthiness and robustness of the models. We conducted extensive experiments using different evaluation strategies and found significant differences in terms of conformal performance measures. We also proposed several visualizations based on matrices, graphs, and charts that capture different aspects of the resulting prediction sets.
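
To make the idea of prediction sets concrete, the following is a minimal sketch of split conformal prediction evaluated under a user-independent split. It is an illustration under stated assumptions, not the experimental setup of this work: the synthetic data, the nearest-centroid classifier, the nonconformity score (one minus the predicted probability of the true class), and all names such as `conformal_quantile` and `prediction_sets` are hypothetical choices made only for this example.

```python
import numpy as np

def conformal_quantile(cal_scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of the calibration scores."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the threshold score
    if k > n:
        return np.inf  # too few calibration points: every label must be included
    return np.sort(cal_scores)[k - 1]

def prediction_sets(probs, qhat):
    """Boolean matrix: entry (i, c) is True if class c is in the set of sample i."""
    return (1.0 - probs) <= qhat

# Synthetic multi-user data (assumption): each user adds a personal offset.
rng = np.random.default_rng(0)
n_users, n_per_user, n_classes = 12, 60, 3
class_means = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0]])
X, y, user = [], [], []
for u in range(n_users):
    shift = rng.normal(scale=0.5, size=2)  # user-specific characteristic
    labels = rng.integers(0, n_classes, size=n_per_user)
    X.append(class_means[labels] + shift + rng.normal(size=(n_per_user, 2)))
    y.append(labels)
    user.append(np.full(n_per_user, u))
X, y, user = np.vstack(X), np.concatenate(y), np.concatenate(user)

# User-independent split: training, calibration, and test users are disjoint.
train = user < 6
cal = (user >= 6) & (user < 9)
test = user >= 9

# Stand-in model (assumption): nearest centroid with softmax over negative distances.
centroids = np.stack([X[train & (y == c)].mean(axis=0) for c in range(n_classes)])

def predict_proba(X):
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    return p / p.sum(axis=1, keepdims=True)

# Calibrate: score = 1 - probability assigned to the true class.
alpha = 0.1
cal_probs = predict_proba(X[cal])
cal_scores = 1.0 - cal_probs[np.arange(cal.sum()), y[cal]]
qhat = conformal_quantile(cal_scores, alpha)

# Evaluate two common conformal measures on the held-out users.
sets = prediction_sets(predict_proba(X[test]), qhat)
coverage = sets[np.arange(test.sum()), y[test]].mean()
avg_size = sets.sum(axis=1).mean()
print(f"coverage={coverage:.3f}, average set size={avg_size:.2f}")
```

The marginal coverage guarantee of split conformal prediction relies on the calibration and test data being exchangeable. When calibration and test users differ, that assumption may not hold, so the empirical coverage and set size can depend on how the users are split.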