Deep Reinforcement Learning (DRL) relies on a simulated environment to optimize its objectives. By extending the conventional interaction scheme, this paper presents gym-ds3, a scalable and reproducible open environment tailored for a high-fidelity Domain-Specific System-on-Chip (DSSoC) application. The simulation schedules hierarchical jobs onto heterogeneous System-on-Chip (SoC) processors and bridges the system to reinforcement learning research. We systematically analyze the representative SoC simulator and discuss the primary challenges, namely that the system (1) continuously generates an indefinite stream of jobs at a rapid injection rate, (2) optimizes complex objectives, and (3) operates in steady-state scheduling. We provide exemplary snippets and experimentally demonstrate the run-time performance of different schedulers, which successfully reproduce results obtained from the standard DS3 framework and real-world embedded systems.