Abstract
We consider an extended notion of reinforcement learning in which the environment can simulate the agent and base its outputs on the agent's hypothetical behavior. Since good performance usually requires paying attention to whatever things the environment's outputs are based on, we argue that for an agent to achieve on-average good performance across many such extended environments, it is necessary for the agent to self-reflect. Thus, an agent's self-reflection ability can be numerically estimated by running the agent through a battery of extended environments. We are simultaneously releasing an open-source library of extended environments to serve as proof-of-concept of this technique. As the library is first-of-kind, we have avoided the difficult problem of optimizing it. Instead we have chosen environments with interesting properties. Some seem paradoxical, some lead to interesting thought experiments, some are even suggestive of how self-reflection might have evolved in nature. We give examples and introduce a simple transformation which experimentally seems to increase self-reflection.
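To make the idea concrete, the following is a minimal Python sketch of an extended environment and of scoring an agent across a small battery. It is an illustration under stated assumptions, not the API of the released library: the names `Agent`, `ignore_rewards_env`, `false_memory_env`, `run`, and `score` are hypothetical, agents are modeled as plain functions from a history to an action, and observations are fixed at 0 for simplicity.

```python
from typing import Callable, List, Tuple

# Assumption: a history is a list of (reward, observation, action) steps,
# and an agent is any function from such a history to an action (0 or 1).
Step = Tuple[float, int, int]
Agent = Callable[[List[Step]], int]
Env = Callable[[Agent, List[Step], int], float]


def ignore_rewards_env(agent: Agent, history: List[Step], action: int) -> float:
    """Hypothetical extended environment: simulate the agent on a copy of the
    history with all rewards zeroed, and reward the real action only if it
    matches that hypothetical action."""
    zeroed = [(0.0, obs, act) for (_, obs, act) in history]
    return 1.0 if action == agent(zeroed) else -1.0


def false_memory_env(agent: Agent, history: List[Step], action: int) -> float:
    """Hypothetical extended environment: simulate the agent on the history
    with one fabricated step prepended, and reward agreement with it."""
    fake = [(0.0, 0, 1)] + history
    return 1.0 if action == agent(fake) else -1.0


def run(agent: Agent, env: Env, steps: int = 10) -> float:
    """Run the agent in one environment and return its average reward."""
    history: List[Step] = []
    total = 0.0
    for _ in range(steps):
        action = agent(history)
        reward = env(agent, history, action)  # env may call agent internally
        history.append((reward, 0, action))   # observation fixed at 0 here
        total += reward
    return total / steps


def score(agent: Agent, battery: List[Env], steps: int = 10) -> float:
    """Average performance across a battery of extended environments."""
    return sum(run(agent, env, steps) for env in battery) / len(battery)


def constant_agent(history: List[Step]) -> int:
    """Always takes action 0, regardless of history."""
    return 0


def reward_chaser(history: List[Step]) -> int:
    """Repeats its last action after a positive reward, otherwise switches."""
    if not history:
        return 0
    reward, _, action = history[-1]
    return action if reward > 0 else 1 - action


battery = [ignore_rewards_env, false_memory_env]
constant_score = score(constant_agent, battery)
chaser_score = score(reward_chaser, battery)
```

In this toy battery the environments call the agent on counterfactual histories, so an agent whose behavior depends on past rewards is penalized in `ignore_rewards_env` whenever its real and hypothetical actions diverge. The two toy environments and agents are chosen only to show the mechanics; a meaningful estimate would need a much larger and more varied battery.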