Optimal Decision-Making in Mixed-Agent Partially Observable Stochastic Environments via Reinforcement Learning

Abstract

Optimal decision making with limited or no information in stochastic environments where multiple agents interact is a challenging topic in the realm of artificial intelligence. Reinforcement learning (RL) is a popular approach for arriving at optimal strategies by predicating stimuli, such as the reward for following a strategy, on experience. RL is heavily explored in the single-agent context, but is a nascent concept in multiagent problems. To this end, I propose several principled model-free and partially model-based reinforcement learning approaches for several multiagent settings. In the realm of normative reinforcement learning, I introduce scalable extensions to Monte Carlo exploring starts for partially observable Markov decision processes (POMDPs), dubbed MCES-P, where I expand the theory and algorithm to the multiagent setting. I first examine MCES-P with probably approximately correct (PAC) bounds in the multiagent setting, showing that MCESP+PAC holds in the presence of other agents. I then propose a more sample-efficient methodology for antagonistic settings, MCESIP+PAC. For cooperative settings, I extend MCES-P to the Multiagent POMDP, dubbed MCESMP+PAC. I then explore the use of reinforcement learning as a methodology in searching for optima in realistic and latent model environments. First, I explore a parameterized Q-learning approach in modeling humans learning to reason in an uncertain, multiagent environment. Next, I propose an implementation of MCES-P, along with image segmentation, to create an adaptive team-based reinforcement learning technique to positively identify the presence of phenotypically expressed water and pathogen stress in crop fields.
