Reka Core, Flash, and Edge: A Series of Powerful Multimodal Language Models

Abstract

We introduce Reka Core, Flash, and Edge, a series of powerful multimodal language models trained from scratch by Reka. Reka models are able to process and reason with text, images, video, and audio inputs. This technical report discusses details of training some of these models and provides comprehensive evaluation results. We show that Reka Edge and Reka Flash are not only state-of-the-art but also outperform many much larger models, delivering outsized value for their respective compute class. Meanwhile, our most capable and largest model, Reka Core, approaches the best frontier models on both automatic evaluations and blind human evaluations. On image question answering benchmarks (e.g. MMMU, VQAv2), Core is competitive with GPT4-V. On multimodal chat, Core ranks as the second most preferred model under a blind third-party human evaluation setup, outperforming other models such as Claude 3 Opus. On text benchmarks, Core is not only competitive with other frontier models on a set of well-established benchmarks (e.g. MMLU, GSM8K) but also outperforms GPT4-0613 on human evaluation. On video question answering (Perception-Test), Core outperforms Gemini Ultra. Models are shipped in production at http://chat.reka.ai . A showcase of non-cherry-picked qualitative examples can also be found at http://showcase.reka.ai .
