DNN accelerators achieve efficiency by exploiting the reuse of activations, weights, and outputs during DNN computation to reduce data movement from DRAM to the chip. This reuse is captured by the accelerator's dataflow. While significant prior work has explored and compared various dataflows, the strategy for assigning on-chip hardware resources (i.e., compute and memory) for a given dataflow, so as to optimize performance/energy while meeting platform area/power constraints for the DNN(s) of interest, remains relatively unexplored. The design space of choices for balancing compute and memory explodes combinatorially, as we show in this work (e.g., as large as O(10^72) choices for running MobileNet), making manual tuning via exhaustive search infeasible. It is also difficult to devise a specific heuristic, since different DNNs and layer types exhibit different amounts of reuse. In this paper, we propose an autonomous strategy called ConfuciuX to find optimized HW resource assignments for a given model and dataflow style. ConfuciuX uses a reinforcement learning method, REINFORCE, to guide the search process, with a detailed HW performance cost model inside the training loop to estimate rewards. We also augment the RL approach with a genetic algorithm for further fine-tuning. ConfuciuX demonstrates the highest sample efficiency for training compared to other techniques such as Bayesian optimization, genetic algorithm, simulated annealing, and other RL methods. It converges to the optimized hardware configuration 4.7 to 24 times faster than alternate techniques.
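
To make the idea of a cost model inside a REINFORCE loop concrete, the following is a minimal sketch and not the ConfuciuX implementation. It assumes a hypothetical per-layer choice of PE counts (`PE_CHOICES`), a toy latency estimate (`toy_cost_model`) standing in for a detailed HW cost model, and a total PE budget as the only platform constraint. All names, values, and the reward shaping are illustrative assumptions.

```python
import math
import random

# Hypothetical discrete PE counts a layer may be assigned (assumption).
PE_CHOICES = [1, 2, 4, 8, 16]


def toy_cost_model(config, work, pe_budget):
    """Toy stand-in for a HW cost model: latency, or None if over budget."""
    if sum(config) > pe_budget:
        return None
    return sum(w / pe for w, pe in zip(work, config))


def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_search(work, pe_budget, episodes=500, lr=0.1, seed=0):
    """Search per-layer PE assignments with a tabular REINFORCE policy."""
    rng = random.Random(seed)
    n_choices = len(PE_CHOICES)
    # One independent softmax policy (a row of logits) per layer.
    logits = [[0.0] * n_choices for _ in work]
    worst = sum(w / min(PE_CHOICES) for w in work)  # used to normalize rewards
    baseline = 0.0
    best_config, best_latency = None, float("inf")

    for _ in range(episodes):
        probs = [softmax(row) for row in logits]
        actions = [rng.choices(range(n_choices), weights=p)[0] for p in probs]
        config = [PE_CHOICES[a] for a in actions]

        # The cost model is queried inside the training loop to get a reward.
        latency = toy_cost_model(config, work, pe_budget)
        if latency is None:
            reward = -2.0  # constraint violated: worse than any feasible point
        else:
            reward = -latency / worst
            if latency < best_latency:
                best_config, best_latency = config, latency

        # REINFORCE update with a moving-average baseline:
        # d log pi(a) / d logit_k = 1[k == a] - pi(k)
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        for layer, a in enumerate(actions):
            for k in range(n_choices):
                grad = (1.0 if k == a else 0.0) - probs[layer][k]
                logits[layer][k] += lr * advantage * grad

    return best_config, best_latency


if __name__ == "__main__":
    # Three hypothetical layers with different amounts of work.
    print(reinforce_search(work=[64, 16, 32], pe_budget=16))
```

The sketch keeps one set of logits per layer instead of a neural policy, which is enough to show how sampled configurations, cost-model rewards, and policy-gradient updates fit together; a realistic setup would replace `toy_cost_model` with an actual performance model and add memory as a second resource.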