Guido Montúfar, Assistant Professor
Departments of Mathematics and Statistics, UCLA
Location: Franz 2258A
Abstract:
We consider the problem of finding the best memoryless stochastic policy for an infinite-horizon partially observable Markov decision process (POMDP) with finite state and action spaces, with respect to either the discounted or the mean reward criterion. We show that the (discounted) state-action frequencies and the expected cumulative reward are rational functions of the policy, with degree determined by the degree of partial observability. We then describe the problem as linear optimization over the space of feasible state-action frequencies, subject to polynomial constraints that we characterize explicitly. This allows us to address the combinatorial and geometric complexity of the optimization problem using recent tools from polynomial optimization. In particular, we demonstrate how the partial observability constraints can lead to multiple smooth and non-smooth local optimizers, and we estimate the number of critical points. This is joint work with Johannes Müller.
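To make the objects in the abstract concrete, here is a minimal sketch in standard notation, none of it taken verbatim from the talk: gamma in [0,1) is the discount factor, mu a fixed initial state distribution, and r the instantaneous reward. The discounted state-action frequencies and the (normalized) expected cumulative reward are

\[
\eta^{\pi}(s,a) \;=\; (1-\gamma)\sum_{t=0}^{\infty}\gamma^{t}\,\mathbb{P}^{\pi}_{\mu}(S_t = s,\, A_t = a),
\qquad
R(\pi) \;=\; \sum_{s,a} r(s,a)\,\eta^{\pi}(s,a) \;=\; \langle r,\eta^{\pi}\rangle,
\]

so maximizing R(pi) amounts to maximizing the linear functional over the set of frequencies achievable by memoryless policies pi(a|o). Under full observability this feasible set is a polytope (as in the dual linear program for MDPs); the polynomial constraints characterized in the talk describe how partial observability deforms it.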
Bio:
Guido Montúfar is an Assistant Professor in the Departments of Mathematics and Statistics at UCLA. He studied mathematics and theoretical physics at TU Berlin and completed his PhD at the Max Planck Institute for Mathematics in the Sciences. Guido is interested in mathematical machine learning, especially the interplay of capacity, optimization, and generalization in deep learning. Since 2018 he has been the PI of the ERC Starting Grant project Deep Learning Theory. His research interfaces with information geometry, optimal transport, and algebraic statistics.