Catastrophic forgetting occurs when a neural network loses the information learned in a first task after training on a second task. This problem remains a hurdle for general artificial intelligence systems with sequential learning capabilities. In this paper, we propose a task-based hard attention mechanism that preserves previous tasks' information without substantially affecting the current task's learning. An attention mask is learned concurrently with every task through stochastic gradient descent, and previous masks are exploited to constrain such learning. We show that the proposed mechanism is effective for reducing catastrophic forgetting, cutting current rates by 33 to 84%. We also show that it is robust to different hyperparameter choices and that it offers a number of monitoring capabilities. The approach makes it possible to control both the stability and compactness of the learned knowledge, which we believe also makes it attractive for online learning and network compression applications.
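
To make the idea of constraining learning with previous masks concrete, the following is a minimal sketch and not the implementation evaluated in this paper. It assumes three things the abstract does not state: that a mask is obtained by passing a per-unit task embedding through a sigmoid scaled by a factor `s`, that masks from earlier tasks are accumulated with an element-wise maximum, and that the gradient of each weight is scaled by one minus the smaller of the accumulated masks of the two units it connects. All function and variable names are hypothetical.

```python
import math


def hard_attention_mask(embedding, s):
    """Gate a per-unit task embedding with a scaled sigmoid (assumed form).

    A large scale s pushes every mask value toward 0 or 1.
    """
    return [1.0 / (1.0 + math.exp(-s * e)) for e in embedding]


def accumulate_masks(previous, current):
    """Element-wise maximum (assumed rule): a unit stays reserved once
    any earlier task has attended to it."""
    return [max(p, c) for p, c in zip(previous, current)]


def constrain_gradient(grad, cum_out, cum_in):
    """Scale grad[i][j] by 1 - min(cum_out[i], cum_in[j]) (assumed rule).

    Weights that connect two units reserved by earlier tasks receive
    almost no update; all other weights remain trainable.
    """
    return [
        [(1.0 - min(cum_out[i], cum_in[j])) * grad[i][j]
         for j in range(len(cum_in))]
        for i in range(len(cum_out))
    ]


# Hypothetical toy layer with 2 output units and 3 input units.
# Task 1 attends to output unit 0 and to input units 0 and 1.
mask_out_task1 = hard_attention_mask([5.0, -5.0], s=10.0)
mask_in_task1 = hard_attention_mask([5.0, 5.0, -5.0], s=10.0)

# No task precedes task 1, so the accumulated masks start at zero.
cum_out = accumulate_masks([0.0, 0.0], mask_out_task1)
cum_in = accumulate_masks([0.0, 0.0, 0.0], mask_in_task1)

# A made-up gradient for task 2, constrained by the masks of task 1.
grad_task2 = [[1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]
constrained = constrain_gradient(grad_task2, cum_out, cum_in)
```

In this toy setting, the two weights joining units that task 1 attended to are effectively frozen during task 2, while every weight that touches at least one unattended unit keeps its full gradient. Under the stated assumptions, this is the sense in which earlier masks preserve previous tasks' information without blocking learning of the current task.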