A major challenge in the field of education is providing review schedules that present learned items at appropriate intervals to each student so that memory is retained over time. In recent years, attempts have been made to formulate item reviews as sequential decision-making problems to realize adaptive instruction based on the knowledge state of students. It has been reported previously that, when applied to mathematical models of students, reinforcement learning can learn strategies that maintain a high memory rate. However, optimization using reinforcement learning requires a large number of interactions, and thus it cannot be applied directly to actual students. In this study, we propose a framework for optimizing teaching strategies by constructing a virtual model of the student while minimizing the interaction with the actual teaching target. In addition, we conducted an experiment considering actual instruction using the mathematical model and confirmed that the model performance is comparable to that of conventional teaching methods. Our framework can directly substitute mathematical models used in experiments with human students, and our results can serve as a buffer between theoretical instructional optimization and practical applications in e-learning systems.