### Abstract

We consider the problem of modeling trajectories of drivers in a road network from the perspective of inverse reinforcement learning. Cars are detected by sensors placed at sparsely distributed points on the street network of a city. As rational agents, drivers are trying to maximize some reward function unknown to an external observer. We apply the concept of random utility from econometrics to model the unknown reward function as a function of observed and unobserved features. In contrast to current inverse reinforcement learning approaches, we do not assume that agents act according to a stochastic policy; rather, we assume that agents act according to a deterministic optimal policy and show that randomness in data arises because the exact rewards are not fully observed by an external observer. We introduce the concept of extended state to cope with unobserved features and develop a Markov decision process formulation of drivers' decisions. We present theoretical results which guarantee the existence of solutions and show that maximum entropy inverse reinforcement learning is a particular case of our approach. Finally, we illustrate Bayesian inference on model parameters through a case study with real trajectory data from a large city in Brazil.
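
The claim that a deterministic optimal policy can look stochastic to an observer can be illustrated with a minimal one-step sketch. It is not the paper's Markov decision process formulation. It assumes, as in the standard multinomial logit model from econometrics, that the unobserved part of each reward is additive i.i.d. standard Gumbel noise; the function names and the reward values are hypothetical. Under that assumption, an agent who always picks the action with the highest total reward produces choice frequencies that match a softmax over the observed rewards, which is the one-step analogue of the maximum entropy form.

```python
import math
import random


def softmax(values):
    """Softmax probabilities over a list of observed rewards."""
    m = max(values)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_gumbel(rng):
    """Draw a standard Gumbel variate as -log(E), with E ~ Exponential(1)."""
    e = max(rng.expovariate(1.0), 1e-300)  # guard against log(0)
    return -math.log(e)


def deterministic_choice(observed_rewards, rng):
    """The agent sees observed + unobserved reward and takes the argmax.

    Assumption (not from the paper): the unobserved part is additive
    i.i.d. standard Gumbel noise, drawn afresh for each decision.
    """
    total_rewards = [r + sample_gumbel(rng) for r in observed_rewards]
    return max(range(len(total_rewards)), key=total_rewards.__getitem__)


def empirical_choice_frequencies(observed_rewards, n_draws, seed=0):
    """Frequencies of each action as seen by an external observer."""
    rng = random.Random(seed)
    counts = [0] * len(observed_rewards)
    for _ in range(n_draws):
        counts[deterministic_choice(observed_rewards, rng)] += 1
    return [c / n_draws for c in counts]


# Hypothetical observed rewards for three alternative actions.
observed_rewards = [1.0, 0.5, -0.2]
freqs = empirical_choice_frequencies(observed_rewards, n_draws=100_000)
probs = softmax(observed_rewards)

for action, (f, p) in enumerate(zip(freqs, probs)):
    print(f"action {action}: empirical {f:.3f}  softmax {p:.3f}")
```

The agent never randomizes: every decision is an argmax. The apparent randomness comes only from the reward component the observer cannot see, which is the mechanism the abstract describes.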