Language-brain encoding experiments evaluate the ability of language models to predict brain responses elicited by language stimuli. The evaluation scenarios for this task have not yet been standardized, which makes it difficult to compare and interpret results. We perform a series of evaluation experiments with a consistent encoding setup and compute the results for multiple fMRI datasets. In addition, we test the sensitivity of the evaluation measures to randomized data and analyze the effect of voxel selection methods. Our experimental framework is publicly available to make modelling decisions more transparent and to support reproducibility for future comparisons.