In decision-making for autonomous vehicles, we need to predict other vehicles' behaviors or learn their behavior implicitly using machine learning. However, the predictions and learned models often contain errors or may be wrong altogether, which can lead to dangerous situations. Therefore, decision-making algorithms should employ counterfactual reasoning, such as asking: what would happen if the other agents behaved in a certain way? The approach we present in this paper is two-fold. First, during training, we randomly select behavior models, such as more passive or more aggressive ones, from a behavior model pool and assign them to the other vehicles in the scenario. Second, during application, we derive several virtual worlds from the actual world, all sharing its initial state. In each of these worlds, we again assign behavior models from the behavior model pool to the other vehicles. We then evolve these virtual worlds for a defined time horizon. This enables counterfactual reasoning: we ask what would happen if the actual world evolved as the virtual world does. In uncertain environments, this makes it possible to generate more plausible risk estimates and thus enables safer decision-making. We conduct studies using a lane-change scenario that show the advantages of counterfactual reasoning using learned policies and virtual worlds to estimate their risk and performance.
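
The virtual-world step can be illustrated with a minimal Python sketch. It is not the implementation used in this paper: the names (`Vehicle`, `BEHAVIOR_POOL`, `rollout_collides`, `estimate_risk`), the one-dimensional kinematics, the constant-acceleration behavior models, the 5 m gap threshold, and the ego vehicle simply holding its speed in place of a learned policy are all assumptions made for illustration. Each virtual world starts from the same initial state, receives a behavior model drawn from the pool, and is evolved over a fixed horizon; the risk estimate is the fraction of worlds in which the gap becomes unsafe.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Vehicle:
    position: float  # longitudinal position in metres
    speed: float     # metres per second


# Hypothetical behavior model: maps (other vehicle, ego vehicle) to an
# acceleration in m/s^2. Real models would react to the traffic situation.
BehaviorModel = Callable[[Vehicle, Vehicle], float]


def passive(other: Vehicle, ego: Vehicle) -> float:
    return -1.0  # yields: slows down to open the gap


def constant_velocity(other: Vehicle, ego: Vehicle) -> float:
    return 0.0


def aggressive(other: Vehicle, ego: Vehicle) -> float:
    return 1.5  # accelerates and closes the gap


BEHAVIOR_POOL: List[BehaviorModel] = [passive, constant_velocity, aggressive]


def step(vehicle: Vehicle, accel: float, dt: float) -> Vehicle:
    """Advance one vehicle by dt seconds; speed is clamped at zero."""
    speed = max(0.0, vehicle.speed + accel * dt)
    return Vehicle(vehicle.position + speed * dt, speed)


def rollout_collides(ego: Vehicle, other: Vehicle, model: BehaviorModel,
                     horizon_steps: int, dt: float = 0.2,
                     min_gap: float = 5.0) -> bool:
    """Evolve one virtual world; True if the gap ever drops below min_gap."""
    for _ in range(horizon_steps):
        other = step(other, model(other, ego), dt)
        ego = step(ego, 0.0, dt)  # assumption: ego holds its speed
        if abs(ego.position - other.position) < min_gap:
            return True
    return False


def estimate_risk(ego: Vehicle, other: Vehicle, num_worlds: int,
                  horizon_steps: int, rng: random.Random,
                  pool: Sequence[BehaviorModel] = BEHAVIOR_POOL) -> float:
    """Fraction of virtual worlds, all sharing the same initial state,
    in which the sampled behavior model leads to an unsafe gap."""
    unsafe = 0
    for _ in range(num_worlds):
        model = rng.choice(pool)  # counterfactual: "what if they behaved so?"
        if rollout_collides(ego, other, model, horizon_steps):
            unsafe += 1
    return unsafe / num_worlds


# Ego has just merged 20 m ahead of a vehicle travelling at the same speed.
ego_state = Vehicle(position=20.0, speed=10.0)
other_state = Vehicle(position=0.0, speed=10.0)
risk = estimate_risk(ego_state, other_state, num_worlds=300,
                     horizon_steps=25, rng=random.Random(0))
print(f"estimated risk over 5 s: {risk:.2f}")
```

With these toy parameters, only the aggressive model closes the 20 m gap to below 5 m within the 5 s horizon, so the estimate approximates the share of virtual worlds that drew that model. A decision-making component could compare such estimates across candidate ego actions and reject those whose risk exceeds a chosen threshold.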