Abstract
Reinforcement Learning (RL) is used extensively in Autonomous Systems (AS) as it enables learning at runtime without the need for a model of the environment or predefined actions. However, most applications of RL in AS, such as those based on Q-learning, can only optimize one objective, making it necessary in multi-objective systems to combine multiple objectives in a single objective function with predefined weights. A number of Multi-Objective Reinforcement Learning (MORL) techniques exist, but they have mostly been applied in RL benchmarks rather than real-world AS. In this work, we use a MORL technique called Deep W-Learning (DWN) and apply it to the Emergent Web Servers exemplar, a self-adaptive server, to find the optimal configuration for runtime performance optimization. We compare DWN to two single-objective optimization implementations: the ε-greedy algorithm and Deep Q-Networks (DQN). Our initial evaluation shows that DWN optimizes multiple objectives simultaneously with results similar to those of the DQN and ε-greedy approaches, performing better on some metrics, and avoids the issues associated with combining multiple objectives into a single utility function.