Abstract
A common view of learning in the brain proposes that the three classic learning paradigms (unsupervised, reinforcement, and supervised) take place in the cortex, the basal ganglia, and the cerebellum, respectively. However, dopamine bursts, usually assumed to encode reward, are not limited to the basal ganglia but also reach the prefrontal, motor, and higher sensory cortices. We propose that in the cortex the same reward-based trial-and-error processes might support the acquisition not only of motor representations but also of sensory representations. In particular, reward signals might guide trial-and-error processes that mix with associative learning processes to support the acquisition of representations that better serve downstream action selection. We tested the soundness of this hypothesis with a computational model that integrates unsupervised learning (Contrastive Divergence) and reinforcement learning (REINFORCE). The model was tested on a task requiring different responses to different visual images grouped into categories defined by colour, shape, or size. Results show that a balanced mix of unsupervised and reinforcement learning processes leads to the best performance. Indeed, excessive unsupervised learning tends to under-represent task-relevant features, while excessive reinforcement learning tends to learn slowly at first and then to get stuck in local minima. These results motivate future empirical studies of category learning aimed at investigating similar effects in the extrastriate visual cortices. Moreover, they prompt further computational investigations into the possible advantages of integrating unsupervised and reinforcement learning processes.
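The following minimal sketch shows one way the two learning signals named above could act on the same hidden layer. It assumes a bias-free restricted Boltzmann machine trained with one-step Contrastive Divergence (CD-1), stochastic Bernoulli hidden units whose sampled states feed a softmax readout trained with REINFORCE, a binary reward for choosing the target category, and a running-average reward baseline. The convex mixing coefficient `alpha`, the function names, and all hyperparameters are hypothetical illustrations, not the model or settings used in the study.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_grad(W, v0, rng):
    """One-step Contrastive Divergence (CD-1) gradient for a bias-free RBM."""
    p_h0 = sigmoid(v0 @ W)                               # P(h=1 | data)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)   # sampled hidden states
    p_v1 = sigmoid(h0 @ W.T)                             # reconstruction of the input
    p_h1 = sigmoid(p_v1 @ W)                             # P(h=1 | reconstruction)
    return np.outer(v0, p_h0) - np.outer(p_v1, p_h1)

def reinforce_hidden_grad(v, h, p_h, advantage):
    """REINFORCE gradient for Bernoulli hidden units:
    advantage * d log P(h | v) / dW = advantage * outer(v, h - p_h)."""
    return advantage * np.outer(v, h - p_h)

def mixed_step(W, V, v, target, alpha, lr, baseline, rng):
    """One hypothetical training step mixing CD-1 and REINFORCE on W.

    alpha = 1.0 -> purely unsupervised (CD-1) hidden learning;
    alpha = 0.0 -> purely reward-driven (REINFORCE) hidden learning.
    Returns the updated W, V, the obtained reward, and the updated baseline.
    """
    p_h = sigmoid(v @ W)
    h = (rng.random(p_h.shape) < p_h).astype(float)      # stochastic hidden code
    logits = h @ V
    pi = np.exp(logits - logits.max())
    pi /= pi.sum()                                       # softmax policy over actions
    a = rng.choice(len(pi), p=pi)                        # sampled action
    reward = 1.0 if a == target else 0.0                 # assumed binary reward
    advantage = reward - baseline

    # Readout (actor) update: REINFORCE on the softmax log-probability.
    one_hot = np.zeros_like(pi)
    one_hot[a] = 1.0
    V = V + lr * advantage * np.outer(h, one_hot - pi)

    # Hidden-layer update: convex mix of the unsupervised and reward signals.
    dW = alpha * cd1_grad(W, v, rng) + (1.0 - alpha) * reinforce_hidden_grad(v, h, p_h, advantage)
    W = W + lr * dW

    baseline = baseline + 0.1 * (reward - baseline)      # running-average reward
    return W, V, reward, baseline

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = 0.01 * rng.standard_normal((16, 8))              # 16 visible, 8 hidden units
    V = 0.01 * rng.standard_normal((8, 3))               # 3 possible responses
    v = (rng.random(16) < 0.5).astype(float)             # a toy binary "image"
    b = 0.0
    for _ in range(100):
        W, V, r, b = mixed_step(W, V, v, target=1, alpha=0.5, lr=0.05, baseline=b, rng=rng)
    print("running-average reward:", b)
```

Sweeping `alpha` between 0 and 1 in such a sketch is one way to explore the trade-off the abstract describes; whether a toy setup like this reproduces the reported effects is not implied.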