Jean-Baptiste Alayrac

Joint Discovery of Object States and Manipulation Actions

Aug 28, 2017

Jean-Baptiste Alayrac, Josef Sivic, Ivan Laptev, Simon Lacoste-Julien

Abstract: Many human activities involve object manipulations aiming to modify the object state. Examples of common state changes include full/empty bottle, open/closed door, and attached/detached car wheel. In this work, we seek to automatically discover the states of objects and the associated manipulation actions. Given a set of videos for a particular task, we propose a joint model that learns to identify object states and to localize state-modifying actions. Our model is formulated as a discriminative clustering cost with constraints. We assume a consistent temporal order for the changes in object states and manipulation actions, and introduce new optimization techniques to learn model parameters without additional supervision. We demonstrate successful discovery of seven manipulation actions and corresponding object states on a new dataset of videos depicting real-life object manipulations. We show that our joint formulation results in an improvement of object state discovery by action recognition and vice versa.

* Appears in: International Conference on Computer Vision 2017 (ICCV 2017). 15 pages
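
To make the temporal-order assumption concrete, here is a minimal sketch, not the paper's model or optimizer. It assumes hypothetical per-frame scores for an initial state and a final state are already available, and it picks the single change point that maximizes the total score while forcing every initial-state frame to precede every final-state frame. The function name `best_state_change` and the score lists are illustrative assumptions.

```python
def best_state_change(scores_before, scores_after):
    """Pick the change point t under a fixed temporal order.

    Frames [0, t) are labeled with the initial state and frames [t, n)
    with the final state. The inputs are hypothetical per-frame scores,
    not outputs of the model described in the abstract.
    """
    n = len(scores_before)
    # suffix[t] = total final-state score of frames t .. n-1
    suffix = [0.0] * (n + 1)
    for i in range(n - 1, -1, -1):
        suffix[i] = suffix[i + 1] + scores_after[i]

    best_t, best_total = 0, suffix[0]
    prefix = 0.0  # total initial-state score of frames 0 .. t-1
    for t in range(1, n + 1):
        prefix += scores_before[t - 1]
        total = prefix + suffix[t]
        if total > best_total:
            best_t, best_total = t, total
    return best_t, best_total


# Illustrative scores: the object looks "full" early and "empty" late.
t, total = best_state_change(
    [0.9, 0.8, 0.7, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.8, 0.9],
)
```

The ordering constraint is what makes the search cheap here: only n + 1 labelings respect the order, so a single pass with running sums is enough.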

Learning from Video and Text via Large-Scale Discriminative Clustering

Jul 27, 2017

Antoine Miech, Jean-Baptiste Alayrac, Piotr Bojanowski, Ivan Laptev, Josef Sivic

Abstract: Discriminative clustering has been successfully applied to a number of weakly-supervised learning tasks. Such applications include person and action recognition, text-to-video alignment, object co-segmentation and colocalization in videos and images. One drawback of discriminative clustering, however, is its limited scalability. We address this issue and propose an online optimization algorithm based on the Block-Coordinate Frank-Wolfe algorithm. We apply the proposed method to the problem of weakly supervised learning of actions and actors from movies together with corresponding movie scripts. The scaling up of the learning problem to 66 feature-length movies enables us to significantly improve weakly supervised action recognition.

* To appear in ICCV 2017

Unsupervised Learning from Narrated Instruction Videos

Jun 28, 2016

Jean-Baptiste Alayrac, Piotr Bojanowski, Nishant Agrawal, Josef Sivic, Ivan Laptev, Simon Lacoste-Julien

Abstract: We address the problem of automatically learning the main steps to complete a certain task, such as changing a car tire, from a set of narrated instruction videos. The contributions of this paper are three-fold. First, we develop a new unsupervised learning approach that takes advantage of the complementary nature of the input video and the associated narration. The method solves two clustering problems, one in text and one in video, applied one after the other and linked by joint constraints to obtain a single coherent sequence of steps in both modalities. Second, we collect and annotate a new challenging dataset of real-world instruction videos from the Internet. The dataset contains about 800,000 frames for five different tasks that include complex interactions between people and objects, and are captured in a variety of indoor and outdoor settings. Third, we experimentally demonstrate that the proposed method can automatically discover, in an unsupervised manner, the main steps to achieve the task and locate the steps in the input videos.

* Appears in: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2016). 21 pages

Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs

May 30, 2016

Anton Osokin, Jean-Baptiste Alayrac, Isabella Lukasewitz, Puneet K. Dokania, Simon Lacoste-Julien

Abstract: In this paper, we propose several improvements on the block-coordinate Frank-Wolfe (BCFW) algorithm from Lacoste-Julien et al. (2013) recently used to optimize the structured support vector machine (SSVM) objective in the context of structured prediction, though it has wider applications. The key intuition behind our improvements is that the estimates of block gaps maintained by BCFW reveal the block suboptimality that can be used as an adaptive criterion. First, we sample objects at each iteration of BCFW in an adaptive non-uniform way via gap-based sampling. Second, we incorporate pairwise and away-step variants of Frank-Wolfe into the block-coordinate setting. Third, we cache oracle calls with a cache-hit criterion based on the block gaps. Fourth, we provide the first method to compute an approximate regularization path for SSVM. Finally, we provide an exhaustive empirical evaluation of all our methods on four structured prediction datasets.

* Appears in Proceedings of the 33rd International Conference on Machine Learning (ICML 2016). 31 pages
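
The gap-based sampling idea can be illustrated on a toy problem. The sketch below is an assumption-laden illustration, not the paper's SSVM solver: it minimizes a separable quadratic, 0.5 * sum_i ||x_i - t_i||^2, with each block x_i constrained to a probability simplex. The function names, the toy objective, and the choice to refresh a block's gap estimate only when that block is visited are all illustrative assumptions.

```python
import random


def objective(x, targets):
    """Toy separable objective: 0.5 * sum_i ||x_i - t_i||^2."""
    return 0.5 * sum(
        (xj - tj) ** 2 for xi, ti in zip(x, targets) for xj, tj in zip(xi, ti)
    )


def block_oracle(xi, ti):
    """Linear minimization over the simplex for one block, plus its FW gap."""
    grad = [xj - tj for xj, tj in zip(xi, ti)]
    k = min(range(len(grad)), key=grad.__getitem__)  # best simplex vertex
    s = [1.0 if j == k else 0.0 for j in range(len(grad))]
    gap = sum(g * (xj - sj) for g, xj, sj in zip(grad, xi, s))
    return s, max(gap, 0.0)  # clamp tiny negative rounding errors


def bcfw_gap_sampling(targets, iters=300, seed=0):
    """Block-coordinate Frank-Wolfe with blocks sampled in proportion to gaps."""
    rng = random.Random(seed)
    d = len(targets[0])
    x = [[1.0 / d] * d for _ in targets]  # start at each simplex center
    # One initial pass so that every block has a gap estimate.
    gaps = [block_oracle(xi, ti)[1] for xi, ti in zip(x, targets)]
    for _ in range(iters):
        if sum(gaps) <= 0.0:  # every estimate is zero: nothing left to sample
            break
        i = rng.choices(range(len(x)), weights=gaps)[0]
        s, gap = block_oracle(x[i], targets[i])
        gaps[i] = gap  # estimates of the other blocks stay stale
        denom = sum((sj - xj) ** 2 for sj, xj in zip(s, x[i]))
        if gap > 0.0 and denom > 0.0:
            gamma = min(1.0, gap / denom)  # exact line search for a quadratic
            x[i] = [xj + gamma * (sj - xj) for xj, sj in zip(x[i], s)]
    return x, gaps


targets = [[0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.2, 0.5, 0.3]]
x, gaps = bcfw_gap_sampling(targets)
```

Blocks with larger estimated gaps are visited more often, so the iterations concentrate where the objective can still improve the most. Because estimates are refreshed only on a visit, they can go stale, which is why a practical implementation would typically recompute all of them from time to time.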
