Abstract
Although automatic shot transition detection approaches have been investigated for more than two decades, an effective universal human-level model has not yet been proposed. Even for common shot transitions like hard cuts or simple gradual changes, the potential diversity of analyzed video contents may still lead to both false hits and false dismissals. Recently, deep learning-based approaches have significantly improved the accuracy of shot transition detection using 3D convolutional architectures and artificially created training data. Nevertheless, one hundred percent accuracy is still an unreachable ideal. In this paper, we share the current version of our deep network TransNet V2, which reaches state-of-the-art performance on respected benchmarks. A trained instance of the model is provided so that it can be instantly utilized by the community for a highly efficient analysis of large video archives. Furthermore, the network architecture, as well as our experience with the training process, is detailed, including simple code snippets for convenient usage of the proposed model and visualization of results.