Abstract
Near-term quantum devices provide only finite-shot measurements and prepare imperfect, contaminated states. This motivates algorithms that convert samples into reliable low-energy estimates without full tomography or exhaustive measurements. We propose Active Sampling Sample-based Quantum Diagonalization (AS-SQD), which frames SQD as an active learning problem: given measured bitstrings, which additional basis states should be included to efficiently recover the ground-state energy? SQD restricts the Hamiltonian to a selected set of basis states and classically diagonalizes the restricted matrix. However, naive SQD using only sampled states suffers from bias under finite-shot sampling and excited-state contamination, while blind random expansion becomes inefficient as system size grows. We introduce a perturbation-theoretic acquisition function based on Epstein--Nesbet second-order energy corrections to rank candidate basis states connected to the current subspace. At each iteration, AS-SQD diagonalizes the restricted Hamiltonian, generates connected candidates, and adds the highest-scoring ones. We evaluate AS-SQD on disordered Heisenberg and transverse-field Ising model (TFIM) spin chains of up to 16 qubits under a preparation model that mixes 80\% ground state and 20\% first excited state. Furthermore, we validate its robustness against real-world state preparation and measurement (SPAM) errors using samples from an IBM Quantum processor. Across simulated and hardware evaluations, AS-SQD consistently achieves substantially lower absolute energy errors than standard SQD and random expansion. Detailed ablation studies demonstrate that physics-guided basis acquisition concentrates computation on energetically relevant directions, bypassing exponential combinatorial bottlenecks.
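
As a rough illustration of the acquisition score, one common form of the Epstein--Nesbet second-order correction is sketched here. The notation is an assumption introduced for illustration and is not taken from the paper: $\mathcal{S}$ denotes the current set of basis states, $(E_0, |\psi_0\rangle)$ the lowest eigenpair of the Hamiltonian $H$ restricted to $\mathcal{S}$, $c_s$ the amplitudes of $|\psi_0\rangle$ on $s \in \mathcal{S}$, and $x \notin \mathcal{S}$ a candidate basis state connected to $\mathcal{S}$ through a nonzero matrix element of $H$.

\begin{equation}
\Delta E^{(2)}(x) = \frac{\left|\langle x | H | \psi_0 \rangle\right|^2}{E_0 - \langle x | H | x \rangle},
\qquad
\langle x | H | \psi_0 \rangle = \sum_{s \in \mathcal{S}} c_s \, \langle x | H | s \rangle .
\end{equation}

Under this convention, candidates with the largest $|\Delta E^{(2)}(x)|$ would be added to $\mathcal{S}$ first. The exact score and selection rule used by AS-SQD may differ.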