Publications with abstracts:

Gregory Lawrence and Stuart Russell. Improving Gradient Estimation by Incorporating Sensor Data. In Proceedings of the 24th International Conference on Uncertainty in Artificial Intelligence, Helsinki, Finland, 2008.

The task of estimating the gradient of a function in the presence of noise is central to several forms of reinforcement learning, including policy search methods. We present two techniques for reducing gradient estimation errors in the presence of observable input noise applied to the control signal. The first method extends the idea of a reinforcement baseline by fitting a local model to the response function whose gradient is being estimated; we show how to find the response surface model that minimizes the variance of the gradient estimate, and how to estimate the model from data. The second method improves this further by discounting components of the gradient vector that have high variance. These methods are applied to the problem of motor control learning, where actuator noise has a significant influence on behavior. In particular, we apply the techniques to learn locally optimal controllers for a dart-throwing task using a simulated three-link arm; we demonstrate that the proposed methods significantly improve the response function gradient estimate and, consequently, the learning curve, over existing methods.
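
As background for the first method, the following minimal sketch illustrates only the classic constant reinforcement baseline that the abstract says is being extended; it is not the paper's response surface model. The quadratic reward, the Gaussian perturbation of a scalar control parameter, and all function names are assumptions made for illustration.

```python
import random
import statistics


def reward(u):
    # Hypothetical response function with a large constant offset (an assumption).
    return 100.0 - (u - 3.0) ** 2


def gradient_estimate(theta, sigma, num_trials, baseline, rng):
    """Score-function estimate of d/d(theta) E[reward(u)], with u ~ N(theta, sigma^2)."""
    total = 0.0
    for _ in range(num_trials):
        u = rng.gauss(theta, sigma)
        # Subtracting a baseline that does not depend on u keeps the estimate unbiased.
        total += (reward(u) - baseline) * (u - theta) / sigma ** 2
    return total / num_trials


def compare_baselines(theta=0.0, sigma=0.5, num_trials=20, num_runs=500, seed=1):
    """Return the empirical variance of the estimate without and with a baseline."""
    rng = random.Random(seed)
    # Baseline: mean reward measured on a separate pilot batch of trials.
    pilot = statistics.mean([reward(rng.gauss(theta, sigma)) for _ in range(1000)])
    plain = [gradient_estimate(theta, sigma, num_trials, 0.0, rng) for _ in range(num_runs)]
    based = [gradient_estimate(theta, sigma, num_trials, pilot, rng) for _ in range(num_runs)]
    return statistics.variance(plain), statistics.variance(based)
```

Calling compare_baselines() returns the two empirical variances. In this toy setting the baseline removes the large reward offset, so the second value should be much smaller than the first.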

Gregory Lawrence, Noah Cowan, and Stuart Russell. Efficient Gradient Estimation for Motor Control Learning. In Proceedings of the 19th International Conference on Uncertainty in Artificial Intelligence, Acapulco, Mexico, 2003.

An efficient policy search algorithm should estimate the local gradient of the objective function, with respect to the policy parameters, from as few trials as possible. Whereas most policy search methods estimate this gradient by observing the rewards obtained during policy trials, we show, both theoretically and empirically, that taking into account the sensor data as well gives better gradient estimates and hence faster learning. The reason is that rewards obtained during policy execution vary from trial to trial due to noise in the environment; sensor data, which correlates with the noise, can be used to partially correct for this variation, resulting in an estimator with lower variance.
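
The variance argument above can be sketched on a toy problem. This is not the paper's estimator: it assumes a reward that is linear in both the policy perturbation and a hidden noise term, a sensor reading equal to that noise plus a small measurement error, and hypothetical function names. The gradient is estimated once by regressing reward on the perturbation alone, and once by regressing on the perturbation and the sensor reading jointly.

```python
import random
import statistics


def simulate_trials(num_trials, rng, true_gradient=2.0, noise_gain=5.0):
    """Toy trials: reward is linear in the perturbation and in hidden noise."""
    trials = []
    for _ in range(num_trials):
        delta = rng.gauss(0.0, 0.1)            # perturbation of the policy parameter
        noise = rng.gauss(0.0, 1.0)            # environment noise, not observed directly
        sensor = noise + rng.gauss(0.0, 0.2)   # sensor reading correlated with the noise
        trials.append((delta, sensor, true_gradient * delta + noise_gain * noise))
    return trials


def gradient_reward_only(trials):
    """Least-squares slope of reward on delta (through the origin; all variables are zero-mean)."""
    sdd = sum(d * d for d, _, _ in trials)
    sdr = sum(d * r for d, _, r in trials)
    return sdr / sdd


def gradient_with_sensor(trials):
    """Least-squares fit of reward on (delta, sensor); returns the delta coefficient."""
    sdd = sum(d * d for d, _, _ in trials)
    sds = sum(d * s for d, s, _ in trials)
    sss = sum(s * s for _, s, _ in trials)
    sdr = sum(d * r for d, _, r in trials)
    ssr = sum(s * r for _, s, r in trials)
    det = sdd * sss - sds * sds
    # Cramer's rule on the 2x2 normal equations.
    return (sdr * sss - ssr * sds) / det


def compare_estimators(num_runs=200, num_trials=50, seed=0):
    """Return the empirical variance of each estimator over repeated batches of trials."""
    rng = random.Random(seed)
    plain, corrected = [], []
    for _ in range(num_runs):
        trials = simulate_trials(num_trials, rng)
        plain.append(gradient_reward_only(trials))
        corrected.append(gradient_with_sensor(trials))
    return statistics.variance(plain), statistics.variance(corrected)
```

In this setup the sensor term absorbs most of the trial-to-trial reward variation, so the second variance returned by compare_estimators() should be well below the first.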

Mark Paskin and Gregory Lawrence. Junction Tree Algorithms for Solving Sparse Linear Systems. Technical Report UCB/CSD-03-1271, University of California at Berkeley, 2003.

In this technical report we demonstrate how message passing on a junction tree can be used to efficiently solve a linear system Ax = b when A is an n x n sparse matrix. The method requires O(n w^3) time and O(n w^2) space, where w is the treewidth of A's sparsity graph.
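
To make the complexity bound concrete, consider the simplest sparse case: a tridiagonal matrix, whose sparsity graph is a chain with treewidth w = 1, so the bound reduces to linear time and space. The sketch below is the standard Thomas algorithm (forward elimination along the chain, then back substitution), not the report's junction tree method. It assumes no pivoting is needed, for example because A is diagonally dominant, and the function name and argument layout are illustrative.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Solve Ax = b for tridiagonal A in O(n) time.

    lower[i-1] = A[i][i-1], diag[i] = A[i][i], upper[i] = A[i][i+1].
    Assumes no pivoting is required (e.g. A is diagonally dominant).
    """
    n = len(diag)
    c_prime = [0.0] * n
    d_prime = [0.0] * n
    if n > 1:
        c_prime[0] = upper[0] / diag[0]
    d_prime[0] = rhs[0] / diag[0]
    # Forward pass: eliminate the sub-diagonal, one chain node at a time.
    for i in range(1, n):
        denom = diag[i] - lower[i - 1] * c_prime[i - 1]
        if i < n - 1:
            c_prime[i] = upper[i] / denom
        d_prime[i] = (rhs[i] - lower[i - 1] * d_prime[i - 1]) / denom
    # Backward pass: substitute from the last variable to the first.
    x = [0.0] * n
    x[-1] = d_prime[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d_prime[i] - c_prime[i] * x[i + 1]
    return x
```

For example, solve_tridiagonal([-1.0, -1.0], [2.0, 2.0, 2.0], [-1.0, -1.0], [1.0, 0.0, 1.0]) solves a 3 x 3 system whose exact solution is x = (1, 1, 1).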