The remarkable technological advance in well-equipped wearable devices is driving an increasing production of long first-person videos. However, since most of these videos have long and tedious parts, they are forgotten or never watched. Although a large number of techniques have been proposed to fast-forward these videos by highlighting relevant moments, most of them are image-based only and disregard other relevant sensors present in current devices, such as high-definition microphones. In this work, we propose a new approach to fast-forward videos using psychoacoustic metrics extracted from the soundtrack. These metrics can be used to estimate the annoyance of a segment, allowing our method to emphasize moments of sound pleasantness. The efficiency of our method is demonstrated through qualitative results and through quantitative results in terms of speed-up and instability.