Lessons Learned from Applying off-the-shelf BERT: There is no Silver Bullet

Abstract

One of the challenges in the NLP field is training large classification models, a task that is both difficult and tedious. It is even harder when GPU hardware is unavailable. The increased availability of pre-trained and off-the-shelf word embeddings, models, and modules aims to ease the process of training large models and achieving competitive performance. We explore the use of off-the-shelf BERT models, share the results of our experiments, and compare them to those of LSTM networks and simpler baselines. We show that the complexity and computational cost of BERT do not guarantee enhanced predictive performance in the classification tasks at hand.
