In this paper, a comparative experimental assessment of computer vision-basedmethods for sign language recognition is conducted. By implementing the mostrecent deep neural network methods in this field, a thorough evaluation onmultiple publicly available datasets is performed. The aim of the present studyis to provide insights on sign language recognition, focusing on mappingnon-segmented video streams to glosses. For this task, two new sequencetraining criteria, known from the fields of speech and scene text recognition,are introduced. Furthermore, a plethora of pretraining schemes is thoroughlydiscussed. Finally, a new RGB+D dataset for the Greek sign language is created.To the best of our knowledge, this is the first sign language dataset wheresentence and gloss level annotations are provided for a video capture.