Significant improvements in robotic perception are critical to advance robotic capabilities in unstructured environments. The current common practice is to try to build a system that can detect chairs, for example, after having seen a relatively small number of example images of chairs. I believe it is more fruitful for robotic perception to work on instance recognition instead. Instance recognition is also a hard problem, especially when considering the large number of instances I envision, but it is intrinsically simpler than category recognition. For example, in the ICRA 2011 Solutions in Perception Challenge the problem was to detect the presence and pose of 35 a priori known object instances in RGB-D images of cluttered scenes. Our system won that contest and currently attains a precision of 98.7% at a recall rate of 90%. In contrast, on the 2011 edition of the Pascal VOC twenty-category RGB detection task (the standard computer vision community challenge dataset), the winning teams (different teams tend to win for different categories) obtained average precisions as low as 16.2% (for the most difficult category) and only up to 58.3% (for the best-solved category).
Our current research is on (i) machine learning methods to improve local RGB and 3-D features; and (ii) machine learning methods for large-scale instance recognition (versus category-level recognition).
Deformable objects present a formidable challenge, as they require the ability to deal with great variety, which is in sharp contrast to the highly structured manufacturing environments in which robots have had their main success stories. Our focus is on new algorithms for perception, motion planning and control for deformable objects. For our experiments we have access to both the state-of-the-art open-source bimanual mobile manipulation platform, the PR2, and the state-of-the-art open-source surgical robotics platform, the Raven II. Our initial work provided the first demonstration of a general-purpose robot folding any article of clothing. It drew significant attention, even beyond the traditional research community, including coverage by BBC, Smart Planet, NSF Innovation Nation, and PBS News Hour, as well as over 1 million YouTube views.
Our current research is on: (i) Development of a hierarchical reasoning framework that uses (fast) simplified geometric models to guide a search over robotic manipulation primitives, which are then instantiated and locally optimized through sequential convex programming to obtain a fast, robust plan. (ii) Machine learning for perception of local features. (iii) A probabilistic reasoning framework that can effectively handle the large state spaces that deformable objects give rise to. (iv) Reinforcement learning for robots to learn to manipulate deformable objects through self-exploration.
Motion Planning and Control under Uncertainty. Uncertainty is unavoidable in most environments. Most existing motion planning and control approaches cannot explicitly account for uncertainty. We developed LQG-MP (linear-quadratic Gaussian motion planning), a new approach to robot motion planning that takes into account the sensors and the controller that will be used during execution of the robot's path. LQG-MP is based on the linear-quadratic controller with Gaussian models of uncertainty, and explicitly characterizes in advance (i.e., before execution) the a-priori probability distributions of the state of the robot along its path. These distributions can be used to assess the quality of the path, for instance by computing the probability of avoiding collisions. In current work we are investigating how to handle non-Gaussian distributions using sample-based representations and sequential convex optimization to find a locally optimal trajectory.
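The idea of characterizing state distributions before execution can be illustrated on a toy problem. The following is a minimal sketch, not the LQG-MP implementation itself: it assumes a scalar linear system, a fixed feedback gain applied to a Kalman filter estimate, and a corridor of fixed half-width standing in for obstacles. All names and parameter values are hypothetical, and multiplying per-step probabilities treats the steps as independent, which is only an approximation.

```python
import math


def propagate_variances(a, b, gain, q, r, p0, steps):
    """A priori variance of the true state deviation after each step.

    Scalar model (all symbols are illustrative assumptions):
      x' = a*x + b*u + m,  m ~ N(0, q)   (motion noise)
      z  = x + n,          n ~ N(0, r)   (sensor noise)
      u  = gain * x_hat                  (feedback on the filter estimate)
    """
    sxx, sxh, shh = p0, 0.0, 0.0  # joint covariance of (x, x_hat)
    p = p0                        # Kalman filter error variance
    variances = []
    for _ in range(steps):
        # Kalman filter variance recursion; it does not depend on the data,
        # so it can be evaluated before the path is executed.
        p_pred = a * a * p + q
        k = p_pred / (p_pred + r)
        p = (1.0 - k) * p_pred
        # Closed-loop dynamics of the stacked vector (x, x_hat).
        f11, f12 = a, b * gain
        f21, f22 = k * a, a + b * gain - k * a
        new_sxx = f11 * f11 * sxx + 2 * f11 * f12 * sxh + f12 * f12 * shh + q
        new_sxh = (f11 * f21 * sxx + (f11 * f22 + f12 * f21) * sxh
                   + f12 * f22 * shh + k * q)
        new_shh = (f21 * f21 * sxx + 2 * f21 * f22 * sxh + f22 * f22 * shh
                   + k * k * (q + r))
        sxx, sxh, shh = new_sxx, new_sxh, new_shh
        variances.append(sxx)
    return variances


def corridor_probability(variances, half_width):
    """Approximate probability that |x| stays below half_width at every step."""
    prob = 1.0
    for var in variances:
        # P(|x| < w) for a zero-mean Gaussian with variance var.
        prob *= math.erf(half_width / math.sqrt(2.0 * var))
    return prob


# Same path and controller, two hypothetical sensors of different quality.
good = corridor_probability(
    propagate_variances(1.0, 1.0, -0.5, 0.01, 1e-4, 0.01, 20), 0.5)
noisy = corridor_probability(
    propagate_variances(1.0, 1.0, -0.5, 0.01, 1.0, 0.01, 20), 0.5)
print(f"accurate sensor: {good:.4f}, noisy sensor: {noisy:.4f}")
```

In this toy setting the same nominal path scores lower when the sensor is noisier, which is the kind of comparison that lets a planner rank candidate paths by their sensing and control conditions rather than by length alone.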
Safe Exploration. When humans learn to control a system, they naturally account for what we think of as safety. For example, when a novice pilot learns how to fly an RC helicopter, they slowly spin up the blades until the helicopter barely lifts off, then quickly put it back down. They repeat this a few times, gradually bringing the helicopter a little further off the ground. While doing so they try out the cyclic (roll and pitch) and rudder (yaw) controls, at all times staying low enough that simply shutting the helicopter down would still have it land safely. When a driver with limited experience on snow wants to become skilled at driving on it, they first drive slowly to a wide open space where they can start pushing their limits. When we ski downhill, we are careful not to go down a slope into a valley where there is no lift (or other transportation) to take us back up.
One would hope that exploration algorithms for physical systems would be able to account for safety and have similar behavior naturally emerge. Unfortunately most existing exploration algorithms completely ignore safety issues. More precisely phrased, most existing algorithms have strong exploration guarantees, but to achieve these guarantees they assume ergodicity of the Markov decision process (MDP) in which the exploration takes place. An MDP is ergodic if any state is reachable from any other state by following a suitable policy. This assumption does not hold true in the exploration examples presented above as each of these systems could break during (non-safe) exploration.
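The ergodicity definition above can be checked directly on a small MDP. The sketch below is illustrative only: it assumes the MDP is summarized by a hypothetical `successors` map listing, for each state, the states some action can reach with positive probability, and it reads "reachable" as "reachable with positive probability". The toy helicopter states are made up for the example.

```python
from collections import deque


def reachable(successors, start):
    """Set of states reachable from start by some sequence of actions."""
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for nxt in successors.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def is_ergodic(successors):
    """True if every state is reachable from every other state."""
    states = set(successors)
    for targets in successors.values():
        states.update(targets)
    return all(reachable(successors, s) == states for s in states)


# Hypothetical toy model: "crashed" is absorbing, so nothing is reachable
# from it except itself and the MDP is not ergodic.
toy = {
    "ground": {"ground", "hover"},
    "hover": {"ground", "hover", "crashed"},
    "crashed": {"crashed"},
}

# Restricting exploration to transitions that avoid "crashed" leaves a
# sub-MDP in which every state can still be reached from every other state.
safe_toy = {s: nxt - {"crashed"} for s, nxt in toy.items() if s != "crashed"}

print(is_ergodic(toy), is_ergodic(safe_toy))
```

The breadth-first search is run from every state for clarity; a strongly-connected-components algorithm would be the usual choice for large state spaces.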
Our current research is on formalizing the notion of safety and developing efficient algorithms for safe exploration.
The ability to obtain complete, albeit static, maps of the neural connectivity in the brain (the brain's wiring diagram) has the potential to dramatically transform neuroscience. Existing experimental methods allow only limited statistical observations regarding the connectivity of neurons involved in certain brain functions, which severely limits the extent to which hypothesized computational models of brain function can be validated empirically. Complete connectivity maps would enable far richer computational models to be developed and validated.
The connectivity map of the complete nervous system of the microscopic roundworm Caenorhabditis elegans, the largest such reconstruction produced to date, with 302 neurons and about 8000 connections, required 15 years of labor to manually cut, image and trace the neuronal processes in 8000 thin sections. C. elegans is widely used as a model organism for biological study, and this reconstruction was a seminal work that proved fundamentally important in understanding the complete function of its nervous system; one notable example is a study of touch sensitivity in C. elegans that was guided significantly by the connectivity diagram.
Two recent studies have combined functional imaging of live cells with subsequent mapping of neural connectivity in the same tissue to study the mammalian visual system, specifically motion direction-selectivity in the retina and orientation selectivity in the primary visual cortex, respectively. Although these studies were limited in scope by the need to manually trace neural processes due to the inadequate accuracy provided by existing automated methods, they were nonetheless insightful regarding the computational models underlying these functions, and pave the way for future larger-scale studies once adequate automated methods for neural connectivity mapping are developed.
Large-scale image data analysis has in recent years increasingly become a key bottleneck in neuroscience and, more generally, in natural science research. Technological advances in automated data acquisition have enabled the collection of terabyte- and petabyte-size datasets. Extracting the rich information contained in these datasets manually would require an inordinate amount of human labor; reconstructing the neural connectivity in a complete fruit fly brain or a cortical column of a mouse from electron microscopy data, both key tasks of interest, would require ten thousand years of human labor using current state-of-the-art manual and semi-automated approaches. The large size of the datasets, the need for high accuracy to avoid incorrect scientific conclusions being drawn from the data, and the need for well-calibrated confidence measures to limit the time spent manually verifying the output of algorithms are all substantial challenges that existing segmentation methods do not address well. We have started to investigate fast nearest neighbor approaches (similar to the ones we pursue for robotic perception), which have given us very promising results.
We have developed a theory of apprenticeship learning: machine learning algorithms that learn autonomous controllers by watching expert demonstrations. These techniques enabled a four-legged robot to climb across challenging terrains, and enabled autonomous helicopter aerobatics well beyond the capabilities of any other helicopter. The maneuvers include flips, rolls, loops, and even auto-rotation landings (an emergency maneuver during which the helicopter lands with the engine off), as well as chaos (pirouetting while flipping in place) and tic-tocs (where the helicopter throws itself back and forth, keeping itself close to vertical), which only exceptional human pilots can perform. This work won the International Conference on Machine Learning (ICML) 2008 Best Application Paper Award.
We have started to adapt apprenticeship learning to make it applicable to minimally invasive robotic surgery. During such surgeries, as performed with the da Vinci robot deployed in over 1,700 hospitals worldwide, surgeons drive the surgical robots in master-slave mode, hence providing demonstrations throughout every surgery they perform. Our initial work towards this goal won the IEEE International Conference on Robotics and Automation (ICRA) 2010 Best Medical Robotics Paper Award.
At their core, the apprenticeship learning algorithms leverage state-of-the-art optimal planning and control algorithms (such as value iteration, A-star, differential dynamic programming, and sequential convex optimization based model-predictive control techniques). These techniques are very powerful, but rely on (and can be very sensitive to the accuracy of) a dynamics model and a task description in the form of a cost function. There are two key technical breakthroughs at the foundation of my apprenticeship learning algorithms: (i) We devised an algorithm that, assuming demonstrations are (perhaps noisy) executions of an optimal plan or control policy with respect to some cost function (a mapping from states to the reals), is able to estimate that cost function from these state-action sequences. (ii) We also devised an algorithm that learns, from data, a non-parametric correction to standard (nonlinear) dynamics models. The key observation it exploits is that for many tasks that require skill (such as flying a helicopter or maneuvering a car) the dynamics, despite being very complex, exhibit a significant amount of repeatability. Humans seem to exploit this repeatability by developing muscle memory through practice. Our approach learns the difference between real-world executions and the predictions of a standard physics-based simulation, and uses that difference to correct the simulation's predictions, making them more accurate for the particular tasks for which data was collected.
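The dynamics-correction idea in (ii) can be sketched on a one-dimensional toy system. This is an illustration under stated assumptions, not the algorithm described above: the simulator, the constant bias in the "real" system, and the use of k-nearest-neighbor averaging over state-action pairs as the non-parametric correction are all hypothetical choices made for the example.

```python
def simulate(state, action):
    """Hypothetical stand-in for a physics-based simulator (1-D state)."""
    return state + 0.1 * action


def real_system(state, action):
    """Hypothetical real dynamics: the simulator plus an unmodeled bias."""
    return state + 0.1 * action + 0.05


class ResidualCorrectedModel:
    """Simulator prediction plus a correction learned from recorded residuals."""

    def __init__(self, simulator, k=3):
        self.simulator = simulator
        self.k = k
        self.data = []  # list of ((state, action), residual)

    def fit(self, transitions):
        # Each transition is (state, action, observed_next_state) taken
        # from real executions of the task.
        self.data = [
            ((s, a), s_next - self.simulator(s, a))
            for s, a, s_next in transitions
        ]

    def predict(self, state, action):
        base = self.simulator(state, action)
        if not self.data:
            return base
        # Average the residuals of the k closest recorded state-action pairs.
        nearest = sorted(
            self.data,
            key=lambda item: (item[0][0] - state) ** 2
            + (item[0][1] - action) ** 2,
        )[: self.k]
        correction = sum(res for _, res in nearest) / len(nearest)
        return base + correction


# Collect a few "real" transitions and fit the correction.
transitions = [
    (s / 10.0, 1.0, real_system(s / 10.0, 1.0)) for s in range(10)
]
model = ResidualCorrectedModel(simulate)
model.fit(transitions)
print(simulate(0.5, 1.0), model.predict(0.5, 1.0), real_system(0.5, 1.0))
```

Because the correction is built only from recorded executions, it improves predictions near the state-action pairs that were visited and says little about regions far from the collected data.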
A key new direction we are about to start pursuing is to extend the apprenticeship learning framework to ground language. Concretely, many verbs relate to task executions. In this line of work I intend the training data to be demonstrations annotated with language. The learned cost functions will now have features that depend on words and phrases and on aspects of the execution (as before). I anticipate this will enable significant progress in building grounded language models, and as an intermediate result will enable natural language commanding of robots.