Abstract
Decentralized inference provides a scalable and resilient paradigm for serving large language models (LLMs), enabling distributed resource utilization and reducing reliance on centralized providers. However, in a permissionless environment without trusted nodes, ensuring the correctness of model outputs remains a core challenge. We introduce VeriLLM, a publicly verifiable protocol for decentralized LLM inference that achieves security under a one-honest-verifier assumption while maintaining practical efficiency. VeriLLM combines lightweight empirical rerunning with cryptographic commitments, allowing verifiers to validate results at approximately 1% of the underlying inference cost. To prevent verification bottlenecks, we design an isomorphic inference-verification architecture that multiplexes both inference and verification roles across the same GPU workers. This design (i) improves GPU utilization and overall throughput, (ii) enlarges the effective validator set, enhancing robustness and liveness, and (iii) enforces task indistinguishability to prevent node-specific optimizations or selective behavior. Through theoretical analysis and system-level evaluation, we show that VeriLLM achieves reliable public verifiability with minimal overhead, offering a practical foundation for trustworthy and scalable decentralized LLM inference.