SpikingBrain Technical Report: Spiking Brain-inspired Large Models

  • 2025-09-05 17:34:00
  • Yuqi Pan, Yupeng Feng, Jinghao Zhuang, Siyu Ding, Zehao Liu, Bohan Sun, Yuhong Chou, Han Xu, Xuerui Qiu, Anlin Deng, Anjie Hu, Peng Zhou, Man Yao, Jibin Wu, Jian Yang, Guoliang Sun, Bo Xu, Guoqi Li

Abstract

Mainstream Transformer-based large language models face major efficiency bottlenecks: training computation scales quadratically with sequence length, and inference memory grows linearly, limiting long-context processing. Building large models on non-NVIDIA platforms also poses challenges for stable and efficient training. To address this, we introduce SpikingBrain, a family of brain-inspired models designed for efficient long-context training and inference. SpikingBrain leverages the MetaX GPU cluster and focuses on three aspects: (1) Model Architecture: linear and hybrid-linear attention architectures with adaptive spiking neurons; (2) Algorithmic Optimizations: an efficient, conversion-based training pipeline and a dedicated spike coding framework; (3) System Engineering: customized training frameworks, operator libraries, and parallelism strategies tailored to MetaX hardware. Using these techniques, we develop two models: SpikingBrain-7B, a linear LLM, and SpikingBrain-76B, a hybrid-linear MoE LLM. These models demonstrate the feasibility of large-scale LLM development on non-NVIDIA platforms. SpikingBrain achieves performance comparable to open-source Transformer baselines while using only about 150B tokens for continual pre-training. Our models significantly improve long-sequence training efficiency and deliver inference with (partially) constant memory and event-driven spiking behavior. For example, SpikingBrain-7B attains over 100x speedup in Time to First Token for 4M-token sequences. Training remains stable for weeks on hundreds of MetaX C550 GPUs, with the 7B model reaching a Model FLOPs Utilization of 23.4 percent. The proposed spiking scheme achieves 69.15 percent sparsity, enabling low-power operation. Overall, this work demonstrates the potential of brain-inspired mechanisms to drive the next generation of efficient and scalable large model design.
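
To make the memory claim concrete, the following is a minimal sketch of generic kernelized causal linear attention, written in plain Python. It is not the SpikingBrain implementation: the abstract does not give the attention formulation, so the feature map `relu(x) + 1` and every name here (`feature_map`, `LinearAttentionState`, `quadratic_reference`, `kv_cache_floats`) are assumptions chosen for illustration. The sketch shows that a fixed-size recurrent state reproduces the outputs obtained by re-reading the whole prefix at each step, which is why this style of attention can run inference with memory that does not grow with sequence length.

```python
# Illustrative sketch: generic kernelized causal linear attention.
# All names are hypothetical; this is not the SpikingBrain implementation.


def feature_map(x):
    """Assumed positive feature map phi(x) = relu(x) + 1 (keeps the normalizer > 0)."""
    return [max(v, 0.0) + 1.0 for v in x]


def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


class LinearAttentionState:
    """Running sums S = sum_t phi(k_t) v_t^T and z = sum_t phi(k_t)."""

    def __init__(self, d_k, d_v):
        self.S = [[0.0] * d_v for _ in range(d_k)]
        self.z = [0.0] * d_k

    def step(self, q, k, v):
        """Consume one token (q, k, v) and return its attention output."""
        fq, fk = feature_map(q), feature_map(k)
        for i, ki in enumerate(fk):
            self.z[i] += ki
            for j, vj in enumerate(v):
                self.S[i][j] += ki * vj
        denom = dot(fq, self.z)
        return [
            sum(fq[i] * self.S[i][j] for i in range(len(fq))) / denom
            for j in range(len(v))
        ]

    def num_floats(self):
        """State size; independent of how many tokens were consumed."""
        return len(self.S) * len(self.S[0]) + len(self.z)


def quadratic_reference(qs, ks, vs):
    """Same outputs, computed by re-reading the whole prefix at every step."""
    outs = []
    for t, q in enumerate(qs):
        fq = feature_map(q)
        weights = [dot(fq, feature_map(ks[s])) for s in range(t + 1)]
        total = sum(weights)
        outs.append([
            sum(w * vs[s][j] for s, w in enumerate(weights)) / total
            for j in range(len(vs[0]))
        ])
    return outs


def kv_cache_floats(n_tokens, d_k, d_v):
    """Floats a per-head key/value cache would hold after n_tokens."""
    return n_tokens * (d_k + d_v)
```

Under these assumptions, a single head with `d_k = d_v = 128` keeps 16,512 floats of state regardless of context length, whereas `kv_cache_floats(4_000_000, 128, 128)` gives 1,024,000,000 floats for a 4M-token key/value cache. These figures describe the toy sketch only and are not measurements from the report.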

 
