The h-index is an objective and easily calculable measure that can be used to evaluate both the relevance and the amount of an individual author's scientific contributions. We propose an alternative theory based on stochastic optimal feedback control. In this article, we describe a method for optimizing control policies that guarantees monotonic improvement and requires little tuning of hyperparameters. We use value functions to substantially reduce the variance of policy gradient estimates at the cost of some bias, using an exponentially weighted estimator of the advantage function. His research interests include topic models, and he was one of the original developers of latent Dirichlet allocation, along with Andrew Ng and Michael I. Jordan. As of June 18, 2020, his publications have been cited 83,214 times, giving him an h-index of 85. We propose a new Deep Adaptation Network (DAN) architecture, which generalizes deep convolutional neural networks to the domain adaptation scenario.
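
The h-index mentioned above is described as easily calculable, and a short sketch can make that concrete. It assumes the standard definition: an author has index h if h of their papers have at least h citations each. The function name `h_index` and the sample citation counts are hypothetical, chosen only for illustration; they are not taken from any author's actual record.

```python
def h_index(citations):
    """Return the largest h such that at least h papers have >= h citations each."""
    # Rank papers from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            # The top `rank` papers all have at least `rank` citations.
            h = rank
        else:
            # Counts only decrease from here, so no larger h is possible.
            break
    return h


# Hypothetical citation counts for five papers (illustrative data only).
example_counts = [10, 8, 5, 4, 3]
example_h = h_index(example_counts)
```

Sorting dominates the cost, so this runs in O(n log n) time for n papers. Note that the total citation count alone does not determine the index: two authors with the same total can have very different h values depending on how the citations are spread across papers.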