Abstract
This paper introduces MARCO (Multi-Agent Reinforcement learning with Conformal Optimization), a novel hardware-aware framework for efficient neural architecture search (NAS) targeting resource-constrained edge devices. By significantly reducing search time while maintaining accuracy under strict hardware constraints, MARCO bridges the gap between automated DNN design and CAD for edge AI deployment. MARCO's core technical contribution is its combination of multi-agent reinforcement learning (MARL) with Conformal Prediction (CP) to accelerate the hardware/software co-design process for deploying deep neural networks. Unlike conventional once-for-all (OFA) supernet approaches that require extensive pretraining, MARCO decomposes the NAS task between a Hardware Configuration Agent (HCA) and a Quantization Agent (QA). The HCA optimizes high-level design parameters, while the QA determines per-layer bit-widths under strict memory and latency budgets. Both agents use a shared reward signal within a centralized-training, decentralized-execution (CTDE) paradigm with a centralized critic. A key innovation is the integration of a calibrated CP surrogate model that provides statistical guarantees (with a user-defined miscoverage rate) to prune unpromising candidate architectures before incurring the high costs of partial training or hardware simulation. This early filtering drastically reduces the search space while ensuring that high-quality designs are retained with high probability. Extensive experiments on MNIST, CIFAR-10, and CIFAR-100 demonstrate that MARCO achieves a 3-4x reduction in total search time compared to an OFA baseline while maintaining near-baseline accuracy (within 0.3%). Furthermore, MARCO reduces inference latency. Validation on a MAX78000 evaluation board confirms that simulator trends hold in practice, with simulator estimates deviating from measured values by less than 5%.
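To make the CP-based pruning step concrete, the following is a minimal sketch of a standard split conformal filter, not MARCO's actual implementation. It assumes a surrogate that predicts accuracy, a held-out calibration set of (predicted, measured) pairs, and the signed residual as the nonconformity score; the function names, the calibration values, and the accuracy target are all hypothetical. Under the usual exchangeability assumption, a candidate whose true accuracy meets the target is pruned with probability at most the miscoverage rate alpha.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-indexed order statistic
    if rank > n:
        # Too few calibration points for this alpha: bound is vacuous, never prune.
        return math.inf
    return sorted(scores)[rank - 1]

def should_prune(predicted_acc, q_hat, target_acc):
    """Prune only if even the conformal upper bound falls short of the target."""
    return predicted_acc + q_hat < target_acc

# Hypothetical calibration pairs: (surrogate prediction, measured accuracy).
calibration = [(0.90, 0.91), (0.85, 0.84), (0.88, 0.90), (0.80, 0.79),
               (0.92, 0.92), (0.87, 0.89), (0.83, 0.82), (0.89, 0.90),
               (0.86, 0.87)]

# Signed residuals: how far the surrogate under-predicts the measured accuracy.
scores = [measured - predicted for predicted, measured in calibration]
q_hat = conformal_quantile(scores, alpha=0.1)

# Hypothetical candidates scored against an assumed 0.80 accuracy target.
for predicted in (0.70, 0.79):
    print(predicted, should_prune(predicted, q_hat, target_acc=0.80))
```

A one-sided bound is used here because only under-prediction by the surrogate can cause a good design to be discarded; a smaller alpha widens the bound and prunes fewer candidates.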