An Automated Multi-modal Evaluation Framework for Mobile Intelligent Assistants Based on Large Language Models and Multi-Agent Collaboration

  • 2025-10-21 14:26:59
  • Meiping Wang, Jian Zhong, Rongduo Han, Liming Kang, Zhengkun Shi, Xiao Liang, Xing Lin, Nan Gao, Haining Zhang

Abstract

With the rapid development of mobile intelligent assistant technologies, multi-modal AI assistants have become essential interfaces for daily user interactions. However, current evaluation methods face challenges including high manual costs, inconsistent standards, and subjective bias. This paper proposes an automated multi-modal evaluation framework based on large language models and multi-agent collaboration. The framework employs a three-tier agent architecture consisting of interaction evaluation agents, semantic verification agents, and experience decision agents. Through supervised fine-tuning of the Qwen3-8B model, we achieve significant evaluation matching accuracy against human experts. Experimental results on eight major intelligent agents demonstrate the framework's effectiveness in predicting user satisfaction and identifying generation defects.

 
