Hunyuan-Large: An Open-Source MoE Model with 52 Billion Activated Parameters by Tencent

  • 2024-11-06 09:15:27
  • Xingwu Sun, Yanfeng Chen, Yiqing Huang, Ruobing Xie, Jiaqi Zhu, Kai Zhang, Shuaipeng Li, Zhen Yang, Jonny Han, Xiaobo Shu, Jiahao Bu, Zhongzhi Chen, Xuemeng Huang, Fengzong Lian, Saiyong Yang, Jianfeng Yan, Yuyuan Zeng, Xiaoqin Ren, Chao Yu, Lulu Wu, Yue Mao, Jun Xia, Tao Yang, Suncong Zheng, Kan Wu, Dian Jiao, Jinbao Xue, Xipeng Zhang, Decheng Wu, Kai Liu, Dengpeng Wu, Guanghui Xu, Shaohua Chen, Shuang Chen, Xiao Feng, Yigeng Hong, Junqiang Zheng, Chengcheng Xu, Zongwei Li, Xiong Kuang, Jianglu Hu, Yiqi Chen, Yuchi Deng, Guiyang Li, Ao Liu, Chenchen Zhang, Shihui Hu, Zilong Zhao, Zifan Wu, Yao Ding, Weichao Wang, Han Liu, Roberts Wang, Hao Fei, Peijie Yu, Ze Zhao, Xun Cao, Hai Wang, Fusheng Xiang, Mengyuan Huang, Zhiyuan Xiong, Bin Hu, Xuebin Hou, Lei Jiang, Jianqiang Ma, Jiajia Wu, Yapin

Abstract

In this paper, we introduce Hunyuan-Large, which is currently the largest open-source Transformer-based mixture of experts model, with a total of 389 billion parameters and 52 billion activated parameters, capable of handling up to 256K tokens. We conduct a thorough evaluation of Hunyuan-Large's performance across various benchmarks, including language understanding and generation, logical reasoning, mathematical problem-solving, coding, long-context, and aggregated tasks, where it outperforms LLama3.1-70B and exhibits comparable performance to the significantly larger LLama3.1-405B model. Key practices of Hunyuan-Large include large-scale synthetic data that is orders of magnitude larger than in previous literature, a mixed expert routing strategy, a key-value cache compression technique, and an expert-specific learning rate strategy. We also investigate the scaling laws and learning rate schedule of mixture of experts models, providing valuable insights and guidance for future model development and optimization. The code and checkpoints of Hunyuan-Large are released to facilitate future innovations and applications. Code: https://github.com/Tencent/Hunyuan-Large Models: https://huggingface.co/tencent/Tencent-Hunyuan-Large
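
To make the gap between total and activated parameters concrete, the following is a minimal sketch in Python. The function names (`activated_fraction`, `moe_param_counts`) and the toy layer sizes are hypothetical and chosen for illustration only; they do not describe Hunyuan-Large's actual configuration or routing strategy, which the abstract does not specify. Only the 389 billion and 52 billion figures come from the abstract.

```python
def activated_fraction(total_params: float, activated_params: float) -> float:
    """Return the share of a model's parameters used to process one token."""
    if total_params <= 0 or not 0 <= activated_params <= total_params:
        raise ValueError("expected 0 <= activated_params <= total_params, total_params > 0")
    return activated_params / total_params


def moe_param_counts(dense: int, per_expert: int, n_experts: int, k: int) -> tuple[int, int]:
    """Toy accounting for a generic top-k mixture of experts model.

    dense      -- parameters every token uses (attention, embeddings, ...)
    per_expert -- parameters in one expert
    n_experts  -- number of experts stored in the model
    k          -- number of experts the router selects per token

    This is a generic illustration (an assumption), not Hunyuan-Large's design.
    """
    if not 0 <= k <= n_experts:
        raise ValueError("k must be between 0 and n_experts")
    total = dense + n_experts * per_expert   # everything that must be stored
    activated = dense + k * per_expert       # what a single token actually touches
    return total, activated


if __name__ == "__main__":
    # Figures from the abstract, in billions of parameters.
    fraction = activated_fraction(389, 52)
    print(f"{fraction:.1%} of parameters are activated per token")

    # Hypothetical toy sizes, in arbitrary units.
    total, activated = moe_param_counts(dense=10, per_expert=5, n_experts=8, k=2)
    print(f"toy model: total={total}, activated={activated}")
```

Under these figures, roughly one parameter in seven or eight is used for any given token, which is why a mixture of experts model can have a per-token compute cost closer to that of a much smaller dense model.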

 
