The ear, as an important part of the human head, has received much less attention than the human face in computer vision. Inspired by previous work on monocular 3D face reconstruction that uses an autoencoder structure to achieve self-supervised learning, we aim to utilise such a framework to tackle the 3D ear reconstruction task, where more subtle and difficult curves and features are present in the 2D ear input images. Our Human Ear Reconstruction Autoencoder (HERA) system predicts 3D ear poses and shape parameters for 3D ear meshes, without any supervision on these parameters. To make our approach cover the variance of in-the-wild images, even grayscale images, we propose an in-the-wild ear colour model. The constructed end-to-end self-supervised model is then evaluated both on 2D landmark localisation performance and on the appearance of the reconstructed 3D ears.