Current Reinforcement Learning (RL) approaches to urban Autonomous Driving (AD) focus on decoupling perception training from driving policy training. The main reason is to avoid training a convolutional encoder alongside a policy network, which is known to have issues related to sample efficiency, degenerate feature representations, and catastrophic self-overfitting. However, this paradigm can lead to representations of the environment that are not aligned with the downstream task, which may result in suboptimal performance. To address this limitation, this paper proposes RLAD, the first Reinforcement Learning from Pixels (RLfP) method applied in the urban AD domain. We propose several techniques to enhance the performance of an RLfP algorithm in this domain, including: i) an image encoder that leverages both image augmentations and Adaptive Local Signal Mixing (A-LIX) layers; ii) WayConv1D, a waypoint encoder that harnesses the 2D geometrical information of the waypoints using 1D convolutions; and iii) an auxiliary loss to increase the significance of the traffic lights in the latent representation of the environment. Experimental results show that RLAD significantly outperforms all state-of-the-art RLfP methods on the NoCrash benchmark. We also present an infraction analysis on the NoCrash-regular benchmark, which indicates that RLAD performs better than all other methods in terms of both collision rate and red light infractions.
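
To make the idea behind a 1D-convolutional waypoint encoder concrete, the following is a minimal sketch in plain Python; it is not the WayConv1D implementation. The abstract does not specify the kernel size, the number of channels, or the activation, so everything here is an assumption for illustration: waypoints are treated as a sequence whose two input channels are the x and y coordinates, the kernel spans two consecutive waypoints, a ReLU follows the convolution, and the hand-picked weights (which extract the forward displacement between neighbouring waypoints) stand in for weights that would be learned in practice. The names `conv1d_relu` and `encode_waypoints` are hypothetical.

```python
def conv1d_relu(seq, weights, bias):
    """Valid 1D convolution over a sequence, followed by ReLU.

    seq:     list of N points, each a list/tuple of C_in floats (here x, y).
    weights: nested list indexed as [C_out][K][C_in].
    bias:    list of C_out floats.
    Returns a list of N - K + 1 steps, each a list of C_out floats.
    """
    k = len(weights[0])
    out = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]  # K consecutive waypoints
        feats = []
        for w_out, b_out in zip(weights, bias):
            acc = b_out
            for w_k, point in zip(w_out, window):
                acc += sum(w * p for w, p in zip(w_k, point))
            feats.append(max(0.0, acc))  # ReLU (assumed activation)
        out.append(feats)
    return out


def encode_waypoints(waypoints, weights, bias):
    """Flatten the convolutional feature map into a single feature vector."""
    feature_map = conv1d_relu(waypoints, weights, bias)
    return [value for step in feature_map for value in step]


# Illustrative input: four waypoints in the ego frame (made-up values).
waypoints = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.5), (3.0, 1.5)]

# Hand-picked kernel of size 2 (assumption): channel 0 computes x[t+1] - x[t],
# channel 1 computes y[t+1] - y[t]. A trained encoder would learn its weights.
weights = [
    [[-1.0, 0.0], [1.0, 0.0]],
    [[0.0, -1.0], [0.0, 1.0]],
]
bias = [0.0, 0.0]

features = encode_waypoints(waypoints, weights, bias)
print(features)  # [1.0, 0.0, 1.0, 0.5, 1.0, 1.0]
```

Because each kernel window covers neighbouring waypoints and both coordinates at once, the output features depend on the local shape of the route (for example, how sharply it curves) rather than on each coordinate in isolation, which is one plausible reading of "harnessing the 2D geometrical information of the waypoints".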