I'm a research scientist at OpenAI, and was previously a PhD student at UC Berkeley in Pieter Abbeel's group. I study deep reinforcement learning, i.e., reinforcement learning using nonlinear function approximators (such as neural networks) that are optimized by gradient-based algorithms. I strive to develop policy optimization methods that are robust, scalable, and sample-efficient. This research is inspired by my earlier work in robotics, where I mainly investigated two problems: (1) teaching robots to perform manipulation tasks using human demonstrations, work that enabled autonomous knot tying and surgical suturing; (2) using trajectory optimization for motion planning. The software library developed for the second project has been used on a variety of real robots, including one scary humanoid. As a PhD student, I interned at Industrial Perception Inc. and Google DeepMind.
You can contact me at email@example.com.
- Benchmarking Deep Reinforcement Learning for Continuous Control
Yan Duan, Xi Chen, Rein Houthooft, John Schulman, Pieter Abbeel
International Conference on Machine Learning (ICML), 2016
- High-Dimensional Continuous Control Using Generalized Advantage Estimation
John Schulman, Philipp Moritz, Sergey Levine, Michael I. Jordan, Pieter Abbeel
International Conference on Learning Representations (ICLR), 2015
Paper (arXiv) / Videos
- Trust Region Policy Optimization
John Schulman, Sergey Levine, Philipp Moritz, Michael I. Jordan, Pieter Abbeel
International Conference on Machine Learning (ICML), 2015
Paper (arXiv) / Videos
- Spike Sorting for Large, Dense Electrode Arrays
Cyrille Rossant, Shabnam Kadir, Dan F. M. Goodman, John Schulman, Mariano Belluscio, Gyorgy Buzsaki, Kenneth D. Harris
Nature Neuroscience, 2016
- Scaling up Gaussian Belief Space Planning Through Covariance-Free Trajectory Optimization and Automatic Differentiation
Sachin Patil, Greg Kahn, Michael Laskey, John Schulman, Ken Goldberg, Pieter Abbeel
Workshop on the Algorithmic Foundations of Robotics (WAFR), 2014
- Motion Planning with Sequential Convex Optimization and Convex Collision Checking
John Schulman, Yan Duan, Jonathan Ho, Alex Lee, Ibrahim Awwal, Henry Bradlow, Jia Pan, Sachin Patil, Ken Goldberg, Pieter Abbeel
International Journal of Robotics Research (IJRR), 2014
- Planning Locally Optimal, Curvature-Constrained Trajectories in 3D Using Sequential Convex Optimization
Yan Duan, Sachin Patil, John Schulman, Ken Goldberg, Pieter Abbeel
International Conference on Robotics and Automation (ICRA), 2014
- Generalization in Robotic Manipulation Through the Use of Non-Rigid Registration
John Schulman, Jonathan Ho, Cameron Lee, Pieter Abbeel
International Symposium on Robotics Research (ISRR), 2013
Paper / Videos
- A Case Study of Trajectory Transfer Through Non-Rigid Registration for a Simplified Suturing Scenario
John Schulman, Ankush Gupta, Sibi Venkatesan, Mallory Tayson-Frederick, Pieter Abbeel
International Conference on Intelligent Robots and Systems (IROS), 2013
Paper / Videos
- Finding Locally Optimal, Collision-Free Trajectories with Sequential Convex Optimization
John Schulman, Jonathan Ho, Alex Lee, Ibrahim Awwal, Henry Bradlow, Pieter Abbeel
Robotics: Science and Systems (RSS), 2013
Paper / Documentation / GitHub / Videos / Slides (With & Without Notes)
- Tracking Deformable Objects with Point Clouds
John Schulman, Alex Lee, Jonathan Ho, Pieter Abbeel
International Conference on Robotics and Automation (ICRA), 2013, Winner of Best Vision Paper
Paper / Website / Video (YouTube, MP4) / Slides (With & Without Notes)
- Grasping and Fixturing as Submodular Coverage Problems
John Schulman, Ken Goldberg, Pieter Abbeel
International Symposium on Robotics Research (ISRR), 2011
- OpenAI Gym (2015-present): Homepage / GitHub / Blog post / Article on NVIDIA blog
- Computation Graph Toolkit (2015): Announcement / GitHub / Documentation
- TrajOpt (developed 2012-2013) is a software framework for generating robot trajectories by local optimization. It includes the following core capabilities: a sequential quadratic programming solver for generic nonlinear optimization problems, cost and constraint functions for kinematics and collision avoidance, and a JSON-based specification format for trajectory optimization problems. The core libraries are implemented in C++, and a Python API is provided using Boost.Python.
- Caton (developed 2009-2010) is a software package that automates spike sorting, a common task in the analysis of neural data. spikedetekt, developed in the Cortical Processing Lab at University College London, is based in part on this code.