James M. Rehg

Learning Rigidity in Dynamic Scenes with a Moving Camera for 3D Motion Field Estimation

Jul 30, 2018
Zhaoyang Lv, Kihwan Kim, Alejandro Troccoli, Deqing Sun, James M. Rehg, Jan Kautz

Estimation of 3D motion in a dynamic scene from a temporal pair of images is a core task in many scene understanding problems. In real-world applications, a dynamic scene is commonly captured by a moving camera (i.e., panning, tilting or hand-held), increasing the task complexity because the scene is observed from different viewpoints. The main challenge is the disambiguation of the camera motion from scene motion, which becomes more difficult as the amount of rigidity observed decreases, even with successful estimation of 2D image correspondences. In contrast to other state-of-the-art 3D scene flow estimation methods, in this paper we propose to learn the rigidity of a scene in a supervised manner from a large collection of dynamic scene data, and directly infer a rigidity mask from two sequential images with depths. With the learned network, we show how we can effectively estimate camera motion and projected scene flow using computed 2D optical flow and the inferred rigidity mask. For training and testing the rigidity network, we also provide a new semi-synthetic dynamic scene dataset (synthetic foreground objects with a real background) and an evaluation split that accounts for the percentage of observed non-rigid pixels. Through our evaluation we show the proposed framework outperforms current state-of-the-art scene flow estimation methods in challenging dynamic scenes.

* This work was accepted at the European Conference on Computer Vision 2018. Project page (with the video): http://research.nvidia.com/publication/2018-09_Learning-Rigidity-in. The code will be released at https://github.com/NVlabs/learningrigidity

Towards Black-box Iterative Machine Teaching

Jun 05, 2018
Weiyang Liu, Bo Dai, Xingguo Li, Zhen Liu, James M. Rehg, Le Song

In this paper, we make an important step towards black-box machine teaching by considering cross-space machine teaching, where the teacher and the learner use different feature representations and the teacher cannot fully observe the learner's model. In such a scenario, we study how the teacher is still able to teach the learner to achieve a faster convergence rate than traditional passive learning. We propose an active teacher model that can actively query the learner (i.e., make the learner take exams) to estimate the learner's status and provably guide the learner to achieve faster convergence. The sample complexities for both teaching and query are provided. In the experiments, we compare the proposed active teacher with the omniscient teacher and verify the effectiveness of the active teacher model.

* Published in ICML 2018

AutoRally An open platform for aggressive autonomous driving

Jun 02, 2018
Brian Goldfain, Paul Drews, Changxi You, Matthew Barulic, Orlin Velev, Panagiotis Tsiotras, James M. Rehg

This article presents AutoRally, a 1:5 scale robotics testbed for autonomous vehicle research. AutoRally is designed for robustness, ease of use, and reproducibility, so that a team of two people with limited knowledge of mechanical engineering, electrical engineering, and computer science can construct and then operate the testbed to collect real world autonomous driving data in whatever domain they wish to study. Complete documentation to construct and operate the platform is available online along with tutorials, example controllers, and a driving dataset collected at the Georgia Tech Autonomous Racing Facility. Offline estimation algorithms are used to determine parameters for physics-based dynamics models using an adaptive limited memory joint state unscented Kalman filter. Online vehicle state estimation using a factor graph optimization scheme and a convolutional neural network for semantic segmentation of drivable surface are presented. All algorithms are tested with real world data from the fleet of six AutoRally robots at the Georgia Tech Autonomous Racing Facility tracks, and serve as a demonstration of the robot's capabilities.

Decoupled Networks

Apr 22, 2018
Weiyang Liu, Zhen Liu, Zhiding Yu, Bo Dai, Rongmei Lin, Yisen Wang, James M. Rehg, Le Song

Inner product-based convolution has been a central component of convolutional neural networks (CNNs) and the key to learning visual representations. Inspired by the observation that CNN-learned features are naturally decoupled with the norm of features corresponding to the intra-class variation and the angle corresponding to the semantic difference, we propose a generic decoupled learning framework which models the intra-class variation and semantic difference independently. Specifically, we first reparametrize the inner product to a decoupled form and then generalize it to the decoupled convolution operator which serves as the building block of our decoupled networks. We present several effective instances of the decoupled convolution operator. Each decoupled operator is well motivated and has an intuitive geometric interpretation. Based on these decoupled operators, we further propose to directly learn the operator from data. Extensive experiments show that such decoupled reparameterization renders significant performance gain with easier convergence and stronger robustness.
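
The reparametrization mentioned in this abstract starts from the standard identity that an inner product factors into a magnitude term and an angular term, <w, x> = ||w|| ||x|| cos(theta). The following is a minimal sketch of that decomposition for plain vectors, written in Python with NumPy. It is not the paper's decoupled convolution operator; the function name `decoupled_inner_product` and the `eps` guard are illustrative assumptions.

```python
import numpy as np

def decoupled_inner_product(w, x, eps=1e-12):
    """Split <w, x> into a magnitude part and an angular part.

    Illustrative helper (not from the paper): returns the product of
    norms, the angle between the vectors, and their recombination.
    """
    w = np.asarray(w, dtype=float)
    x = np.asarray(x, dtype=float)

    norm_w = np.linalg.norm(w)
    norm_x = np.linalg.norm(x)
    magnitude = norm_w * norm_x

    # Guard against division by zero for all-zero inputs (assumed choice).
    cos_theta = np.dot(w, x) / max(magnitude, eps)
    # Clip to the valid arccos domain to absorb rounding error.
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))

    # Recombining the two parts reproduces the ordinary inner product.
    recombined = magnitude * np.cos(theta)
    return magnitude, theta, recombined
```

Once the two parts are separated, each can in principle be passed through its own function before recombination, which is the general idea the abstract describes; the specific functions used in the paper are not reproduced here.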

* CVPR 2018 (Spotlight)

Fine-Grained Head Pose Estimation Without Keypoints

Apr 13, 2018
Nataniel Ruiz, Eunji Chong, James M. Rehg

Estimating the head pose of a person is a crucial problem with many applications, such as aiding in gaze estimation, modeling attention, fitting 3D models to video and performing face alignment. Traditionally, head pose is computed by estimating some keypoints from the target face and solving the 2D to 3D correspondence problem with a mean human head model. We argue that this is a fragile method because it relies entirely on landmark detection performance, the extraneous head model and an ad-hoc fitting step. We present an elegant and robust way to determine pose by training a multi-loss convolutional neural network on 300W-LP, a large synthetically expanded dataset, to predict intrinsic Euler angles (yaw, pitch and roll) directly from image intensities through joint binned pose classification and regression. We present empirical tests on common in-the-wild pose benchmark datasets which show state-of-the-art results. Additionally, we test our method on a dataset usually used for pose estimation using depth and start to close the gap with state-of-the-art depth pose methods. We open-source our training and testing code as well as release our pre-trained models.
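
One common way to combine binned classification with regression is to treat the classifier's softmax output as a distribution over angle bins and take its expected value as the continuous angle. The sketch below shows that step for a single Euler angle in Python with NumPy. The bin layout (3-degree bins starting at -99 degrees), the function name `expected_angle`, and its parameters are assumptions made for illustration, not details taken from the abstract.

```python
import numpy as np

def expected_angle(logits, bin_width=3.0, min_angle=-99.0):
    """Turn per-bin classification logits into one continuous angle.

    Assumed layout: bin i covers
    [min_angle + i * bin_width, min_angle + (i + 1) * bin_width).
    """
    logits = np.asarray(logits, dtype=float)

    # Numerically stable softmax over the bins.
    shifted = logits - logits.max()
    probs = np.exp(shifted) / np.exp(shifted).sum()

    # Use each bin's center as its representative angle.
    centers = min_angle + bin_width * (np.arange(logits.size) + 0.5)

    # The expectation over bins gives a fine-grained angle estimate.
    return float(np.dot(probs, centers))
```

In a multi-loss setup of this kind, a classification loss on the bin logits and a regression loss on the expected angle can be summed during training; the weighting between the two is a design choice not specified here.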

* The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) Workshops, 2018, pp. 2074-2083

Dockerface: an Easy to Install and Use Faster R-CNN Face Detector in a Docker Container

Apr 05, 2018
Nataniel Ruiz, James M. Rehg

Face detection is a very important task and a necessary pre-processing step for many applications such as facial landmark detection, pose estimation, sentiment analysis and face recognition. Face detection is an important pre-processing step not only in computer vision applications but also in computational psychology, behavioral imaging and other fields where researchers might not be initiated in computer vision frameworks and state-of-the-art detection applications. A large part of existing research that includes face detection as a pre-processing step uses existing out-of-the-box detectors such as the HoG-based dlib and the OpenCV Haar face detector, which are no longer state-of-the-art; they are primarily used because of their ease of use and accessibility. We introduce Dockerface, a very accurate Faster R-CNN face detector in a Docker container which requires no training and is easy to install and use.

Iterative Machine Teaching

Nov 17, 2017
Weiyang Liu, Bo Dai, Ahmad Humayun, Charlene Tay, Chen Yu, Linda B. Smith, James M. Rehg, Le Song

In this paper, we consider the problem of machine teaching, the inverse problem of machine learning. Unlike traditional machine teaching, which views the learners as batch algorithms, we study a new paradigm where the learner uses an iterative algorithm and a teacher can feed examples sequentially and intelligently based on the current performance of the learner. We show that the teaching complexity in the iterative case is very different from that in the batch case. Instead of constructing a minimal training set for learners, our iterative machine teaching focuses on achieving fast convergence in the learner model. Depending on the level of information the teacher has from the learner model, we design teaching algorithms which can provably reduce the number of teaching examples and achieve faster convergence than learning without teachers. We also validate our theoretical findings with extensive experiments on different data distributions and real image datasets.

* Published in ICML 2017

Inferring Object Properties with a Tactile Sensing Array Given Varying Joint Stiffness and Velocity

Nov 04, 2017
Tapomayukh Bhattacharjee, James M. Rehg, Charles C. Kemp

Whole-arm tactile sensing enables a robot to sense contact and infer contact properties across its entire arm. Within this paper, we demonstrate that using data-driven methods, a humanoid robot can infer mechanical properties of objects from contact with its forearm during a simple reaching motion. A key issue is the extent to which the performance of data-driven methods can generalize to robot actions that differ from those used during training. To investigate this, we developed an idealized physics-based lumped element model of a robot with a compliant joint making contact with an object. Using this physics-based model, we performed experiments with varied robot, object and environment parameters. We also collected data from a tactile-sensing forearm on a real robot as it made contact with various objects during a simple reaching motion with varied arm velocities and joint stiffnesses. The robot used one-nearest-neighbor classifiers (1-NN), hidden Markov models (HMMs), and long short-term memory (LSTM) networks to infer two object properties (hard vs. soft and moved vs. unmoved) based on features of time-varying tactile sensor data (maximum force, contact area, and contact motion). We found that, in contrast to 1-NN, the performance of LSTMs (with sufficient data availability) and multivariate HMMs successfully generalized to new robot motions with distinct velocities and joint stiffnesses. Compared to single features, using multiple features gave the best results for both experiments with physics-based models and a real robot.

* This updated version [v2] was accepted to the International Journal of Humanoid Robotics, Special Issue on Tactile Sensing for Manipulation: New Progress and Challenges

Aggressive Deep Driving: Model Predictive Control with a CNN Cost Model

Jul 17, 2017
Paul Drews, Grady Williams, Brian Goldfain, Evangelos A. Theodorou, James M. Rehg

We present a framework for vision-based model predictive control (MPC) for the task of aggressive, high-speed autonomous driving. Our approach uses deep convolutional neural networks to predict cost functions from input video which are directly suitable for online trajectory optimization with MPC. We demonstrate the method in a high speed autonomous driving scenario, where we use a single monocular camera and a deep convolutional neural network to predict a cost map of the track in front of the vehicle. Results are demonstrated on a 1:5 scale autonomous vehicle given the task of high speed, aggressive driving.

* 11 pages, 7 figures

Autonomous Racing with AutoRally Vehicles and Differential Games

Jul 14, 2017
Grady Williams, Brian Goldfain, Paul Drews, James M. Rehg, Evangelos A. Theodorou

Safe autonomous vehicles must be able to predict and react to the drivers around them. Previous control methods rely heavily on pre-computation and are unable to react to dynamic events as they unfold in real-time. In this paper, we extend Model Predictive Path Integral Control (MPPI) using differential game theory and introduce Best-Response MPPI (BR-MPPI) for real-time multi-vehicle interactions. Experimental results are presented using two AutoRally platforms in a racing format with BR-MPPI competing against a skilled human driver at the Georgia Tech Autonomous Racing Facility.

* 8 pages, 7 figures
