Abstract: The hypothesis that image datasets gathered online "in the wild" can produce biased object recognizers, e.g. preferring professional photography or certain viewing angles, is studied. A new "in the lab" data collection infrastructure is proposed, consisting of a drone which captures images as it circles around objects. Crucially, the control provided by this setup and the natural camera shake inherent to flight mitigate many biases. Its inexpensive and easily replicable nature may also potentially lead to a scalable data collection effort by the vision community. The procedure's usefulness is demonstrated by creating a dataset of Objects Obtained With fLight (OOWL). Denoted OOWL500, it contains 120,000 images of 500 objects and is the largest "in the lab" image dataset available when both the number of classes and the number of objects per class are considered. Furthermore, it has enabled several new insights on object recognition. First, a novel adversarial attack strategy is proposed, where image perturbations are defined in terms of semantic properties such as camera shake and pose. Experiments with this attack show that ImageNet has considerable amounts of pose and professional photography bias. Second, the dataset is used to show that augmenting in the wild datasets, such as ImageNet, with in the lab data, such as OOWL500, can significantly decrease these biases, leading to object recognizers with improved generalization. Third, the dataset is used to study questions of "best procedure" for dataset collection. It is revealed that data augmentation with synthetic images does not suffice to eliminate the biases of in the wild datasets, and that camera shake and pose diversity play a more important role in object recognition robustness than previously thought.
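As an illustration of the semantic attack idea, the sketch below (PyTorch, with illustrative names; not the paper's exact procedure) searches over real views of one object, captured at different poses, for the view that most increases a classifier's loss, i.e. the perturbation is a change of pose rather than of pixels.

```python
import torch
import torch.nn.functional as F

def most_adversarial_pose(model, pose_images, label):
    """pose_images: (P, 3, H, W) views of one object; label: integer class id."""
    model.eval()
    with torch.no_grad():
        logits = model(pose_images)                                    # (P, num_classes)
        targets = torch.full((pose_images.size(0),), label,
                             device=pose_images.device)
        losses = F.cross_entropy(logits, targets, reduction="none")   # per-view loss
    worst = losses.argmax().item()                                     # most adversarial pose
    return worst, losses[worst].item()
```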
Abstract: Much recent progress has been made in reconstructing the 3D shape of an object from an image of it, i.e. single view 3D reconstruction. However, it has been suggested that current methods simply adopt a "nearest-neighbor" strategy, instead of genuinely understanding the shape behind the input image. In this paper, we rigorously show that for many state-of-the-art methods, this issue manifests as (1) inconsistencies between coarse reconstructions and input images, and (2) inability to generalize across domains. We thus propose REFINE, a postprocessing mesh refinement step that can be easily integrated into the pipeline of any black-box method in the literature. At test time, REFINE optimizes a network per mesh instance, to encourage consistency between the mesh and the given object view. This, along with a novel combination of regularizing losses, reduces the domain gap and achieves state-of-the-art performance. We believe that this novel paradigm is an important step towards robust, accurate reconstructions, remaining relevant as new reconstruction networks are introduced.
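A minimal sketch of the per-instance, test-time refinement idea follows; for brevity it optimizes a vertex displacement field directly (REFINE itself optimizes a small network) and assumes some differentiable silhouette renderer `render_silhouette`, which is a placeholder.

```python
import torch
import torch.nn.functional as F

def refine_mesh(verts, faces, target_mask, render_silhouette, steps=200, lr=1e-3):
    """verts: (V, 3), faces: (F, 3); target_mask: (H, W) float silhouette in [0, 1]."""
    disp = torch.zeros_like(verts, requires_grad=True)       # per-vertex displacement
    opt = torch.optim.Adam([disp], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        refined = verts + disp
        sil = render_silhouette(refined, faces)               # (H, W) soft silhouette in [0, 1]
        loss = F.binary_cross_entropy(sil, target_mask)       # consistency with the input view
        loss = loss + 1e-2 * disp.pow(2).mean()               # regularizer: keep refinement small
        loss.backward()
        opt.step()
    return (verts + disp).detach()
```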
Abstract: Significant effort has recently been devoted to modeling visual relations. This has mostly addressed the design of architectures, typically by adding parameters and increasing model complexity. However, visual relation learning is a long-tailed problem, due to the combinatorial nature of joint reasoning about groups of objects. Increasing model complexity is, in general, ill-suited for long-tailed problems due to their tendency to overfit. In this paper, we explore an alternative hypothesis, denoted the Devil is in the Tails. Under this hypothesis, better performance is achieved by keeping the model simple but improving its ability to cope with long-tailed distributions. To test this hypothesis, we devise a new approach for training visual relationship models, which is inspired by state-of-the-art long-tailed recognition literature. This is based on an iterative decoupled training scheme, denoted Decoupled Training for Devil in the Tails (DT2). DT2 employs a novel sampling approach, Alternating Class-Balanced Sampling (ACBS), to capture the interplay between the long-tailed entity and predicate distributions of visual relations. Results show that, with an extremely simple architecture, DT2-ACBS significantly outperforms much more complex state-of-the-art methods on scene graph generation tasks. This suggests that the development of sophisticated models must be considered in tandem with the long-tailed nature of the problem.
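The sampling component can be pictured with the short sketch below: a standard class-balanced sampler (inverse-frequency weights) built with PyTorch's WeightedRandomSampler, alternated between entity and predicate labels across decoupled training stages. The toy labels and the two-stage schedule are illustrative, not the exact DT2-ACBS recipe.

```python
from collections import Counter
from torch.utils.data import WeightedRandomSampler

def class_balanced_sampler(labels):
    """Draw each example with probability inversely proportional to its class size."""
    counts = Counter(labels)
    weights = [1.0 / counts[y] for y in labels]
    return WeightedRandomSampler(weights, num_samples=len(labels), replacement=True)

# Alternate which distribution is re-balanced in each decoupled stage,
# e.g. entities in one stage and predicates in the next (toy labels).
entity_labels    = ["person", "person", "person", "dog", "car"]
predicate_labels = ["on", "on", "riding", "holding", "near"]
stage_samplers = [class_balanced_sampler(entity_labels),
                  class_balanced_sampler(predicate_labels)]
```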
Abstract: This paper aims at addressing the problem of substantial performance degradation at extremely low computational cost (e.g. 5M FLOPs on ImageNet classification). We find that two factors, sparse connectivity and dynamic activation functions, are effective in improving accuracy. The former avoids the significant reduction of network width, while the latter mitigates the detriment of reduced network depth. Technically, we propose micro-factorized convolution, which factorizes a convolution matrix into low-rank matrices, to integrate sparse connectivity into convolution. We also present a new dynamic activation function, named Dynamic Shift Max, to improve the non-linearity by maxing out multiple dynamic fusions between an input feature map and its circular channel shift. Building upon these two new operators, we arrive at a family of networks, named MicroNet, that achieves significant performance gains over the state of the art in the low-FLOP regime. For instance, under the constraint of 12M FLOPs, MicroNet achieves 59.4\% top-1 accuracy on ImageNet classification, outperforming MobileNetV3 by 9.6\%. Source code is at \href{https://github.com/liyunsheng13/micronet}{https://github.com/liyunsheng13/micronet}.
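The low-rank idea behind micro-factorized pointwise convolution can be sketched as below: a dense 1x1 convolution over C channels is replaced by two grouped 1x1 convolutions through a narrow intermediate width, with a channel shuffle in between. Group counts and widths are illustrative; this is not the exact MicroNet block, and Dynamic Shift Max is omitted.

```python
import torch
import torch.nn as nn

class FactorizedPointwise(nn.Module):
    """Low-rank, grouped replacement for a dense 1x1 convolution (illustrative)."""
    def __init__(self, channels, rank, groups):
        super().__init__()
        self.reduce = nn.Conv2d(channels, rank, kernel_size=1, groups=groups, bias=False)
        self.expand = nn.Conv2d(rank, channels, kernel_size=1, groups=groups, bias=False)

    def forward(self, x):
        y = self.reduce(x)
        # Channel shuffle between the two grouped convolutions mixes information
        # across groups (a common companion to grouped 1x1 convolutions).
        n, c, h, w = y.shape
        g = self.reduce.groups
        y = y.view(n, g, c // g, h, w).transpose(1, 2).reshape(n, c, h, w)
        return self.expand(y)
```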
Abstract: The main challenges in long-tailed recognition come from the imbalanced data distribution and the scarcity of samples in the tail classes. While techniques have been proposed to achieve a more balanced training loss and to improve the variation of tail-class data with synthesized samples, we instead leverage readily available unlabeled data to boost recognition accuracy. This idea leads to a new recognition setting, namely semi-supervised long-tailed recognition. We argue that this setting better resembles the real-world data collection and annotation process and hence can help close the gap to real-world scenarios. To address the semi-supervised long-tailed recognition problem, we present an alternate sampling framework combining the intuitions from successful methods in these two research areas. The classifier and feature embedding are learned separately and updated iteratively. A class-balanced sampling strategy is used to train the classifier in a way that is not affected by the quality of pseudo labels on the unlabeled data. A consistency loss is introduced to limit the impact of the unlabeled data while still leveraging it to update the feature embedding. We demonstrate significant accuracy improvements over other competitive methods on two datasets.
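The consistency loss can be illustrated with the minimal sketch below, which encourages the feature embedding to agree on two augmented views of the same unlabeled image; the exact loss and augmentations used in the paper may differ.

```python
import torch
import torch.nn.functional as F

def consistency_loss(encoder, view1, view2):
    """view1, view2: two augmentations of the same unlabeled batch, shape (B, 3, H, W)."""
    z1 = F.normalize(encoder(view1), dim=1)   # (B, D) unit-norm embeddings
    z2 = F.normalize(encoder(view2), dim=1)
    return (z1 - z2).pow(2).sum(dim=1).mean() # penalize disagreement between views
```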
Abstract: The problem of long-tailed recognition, where the number of examples per class is highly unbalanced, is considered. It is hypothesized that the well-known tendency of standard classifier training to overfit to popular classes can be exploited for effective transfer learning. Rather than eliminating this overfitting, e.g. by adopting popular class-balanced sampling methods, the learning algorithm should instead leverage it to transfer geometric information from popular to low-shot classes. A new classifier architecture, GistNet, is proposed to support this goal, using constellations of classifier parameters to encode the class geometry. A new learning algorithm is then proposed for GeometrIc Structure Transfer (GIST), using a combination of loss functions that mix class-balanced and random sampling to guarantee that, while overfitting to the popular classes is restricted to geometric parameters, it is leveraged to transfer class geometry from popular to few-shot classes. This enables better generalization for few-shot classes without the need for manual specification of class weights, or even the explicit grouping of classes into different types. Experiments on two popular long-tailed recognition datasets show that GistNet outperforms existing solutions to this problem.
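The parameter-sharing idea can be pictured with the sketch below: each class owns a center, while a small set of displacement vectors (the "constellation" encoding class geometry) is shared by all classes and can therefore be learned mostly from the popular classes. This illustrates the sharing principle only, not the exact GistNet architecture or losses.

```python
import torch
import torch.nn as nn

class ConstellationClassifier(nn.Module):
    def __init__(self, feat_dim, num_classes, num_points=4):
        super().__init__()
        self.centers = nn.Parameter(torch.randn(num_classes, feat_dim))   # per-class center
        self.structure = nn.Parameter(torch.zeros(num_points, feat_dim))  # shared geometry

    def forward(self, features):                                          # features: (B, D)
        # Class weights are the center plus each shared displacement; the score
        # for a class is its best match over the constellation.
        weights = self.centers[:, None, :] + self.structure[None, :, :]   # (C, K, D)
        logits = torch.einsum("bd,ckd->bck", features, weights)           # (B, C, K)
        return logits.max(dim=2).values                                   # (B, C)
```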
Abstract: The problem of long-tailed recognition, where the number of examples per class is highly unbalanced, is considered. While training with class-balanced sampling has been shown effective for this problem, it is known to over-fit to few-shot classes. It is hypothesized that this is due to the repeated sampling of examples and can be addressed by feature space augmentation. A new feature augmentation strategy, EMANATE, based on back-tracking of features across epochs during training, is proposed. It is shown that, unlike class-balanced sampling, this is an adversarial augmentation strategy. A new sampling procedure, Breadcrumb, is then introduced to implement adversarial class-balanced sampling without extra computation. Experiments on three popular long-tailed recognition datasets show that Breadcrumb training produces classifiers that outperform existing solutions to the problem.
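The back-tracking idea can be sketched as a feature bank that stores each example's features from earlier epochs and replays them as extra feature-space samples, so repeated sampling of few-shot classes does not reuse identical features. The class and bookkeeping below are illustrative, not the exact Breadcrumb implementation.

```python
from collections import defaultdict
import random

class FeatureBank:
    """Keeps, for each training example, features computed in recent epochs."""
    def __init__(self, max_per_example=5):
        self.bank = defaultdict(list)
        self.max_per_example = max_per_example

    def store(self, indices, feats):
        # indices: (B,) example ids; feats: (B, D) features from the current epoch.
        for i, f in zip(indices.tolist(), feats.detach().cpu()):
            self.bank[i].append(f)
            self.bank[i] = self.bank[i][-self.max_per_example:]   # keep recent epochs only

    def replay(self, indices):
        """Return one stored (past-epoch) feature per example, or None if unseen."""
        return [random.choice(self.bank[i]) if self.bank[i] else None
                for i in indices.tolist()]
```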
Abstract: We propose a method to learn the ability to synthesize a pose trajectory for an arbitrary reference image, even from a dataset where objects appear only in sparsely sampled views (e.g. Pix3D). This is achieved with a cross-modal pose trajectory transfer mechanism. First, a domain transfer function is trained to predict, from an RGB image of the object, its 2D depth map. Then, a set of image views is generated by learning to simulate object rotation in the depth space. Finally, the generated poses are mapped from this latent space into a set of corresponding RGB images using a learned identity-preserving transform. This results in a dense pose trajectory of the object in image space. For each object type (e.g., a specific Ikea chair model), a 3D CAD model is used to render a full pose trajectory of 2D depth maps. In the absence of dense pose sampling in image space, these latent space trajectories provide cross-modal guidance for learning. The learned pose trajectories can be transferred to unseen examples, effectively synthesizing all object views in image space. Our method is evaluated on the Pix3D and ShapeNet datasets, in the setting of novel view synthesis under sparse pose supervision, demonstrating substantial improvements over recent art.
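The three-stage pipeline can be summarized by the sketch below, where the three networks (RGB-to-depth translator, depth-space rotator, and identity-preserving depth-to-RGB generator) are placeholders whose architectures are not specified here.

```python
import torch.nn as nn

class PoseTrajectorySynthesis(nn.Module):
    """Compose three (placeholder) networks into the RGB -> depth -> rotated depth -> RGB pipeline."""
    def __init__(self, rgb_to_depth, depth_rotator, depth_to_rgb):
        super().__init__()
        self.rgb_to_depth = rgb_to_depth      # RGB image -> 2D depth map
        self.depth_rotator = depth_rotator    # depth map + angle -> rotated depth map
        self.depth_to_rgb = depth_to_rgb      # depth map + identity code -> RGB image

    def forward(self, image, angles, identity_code):
        depth = self.rgb_to_depth(image)
        frames = []
        for angle in angles:                  # dense trajectory of target poses
            rotated = self.depth_rotator(depth, angle)
            frames.append(self.depth_to_rgb(rotated, identity_code))
        return frames
```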
Abstract: Recent work has made significant progress on using implicit functions as a continuous representation for 3D rigid object shape reconstruction. However, much less effort has been devoted to modeling general articulated objects. Compared to rigid objects, articulated objects have higher degrees of freedom, which makes it hard to generalize to unseen shapes. To deal with the large shape variance, we introduce Articulated Signed Distance Functions (A-SDF) to represent articulated shapes with a disentangled latent space, where we have separate codes for encoding shape and articulation. We assume no prior knowledge of part geometry, articulation status, joint type, joint axis, or joint location. With this disentangled continuous representation, we demonstrate that we can control the articulation input and animate unseen instances with unseen joint angles. Furthermore, we propose a Test-Time Adaptation inference algorithm to adjust our model during inference. We demonstrate that our model generalizes well to out-of-distribution and unseen data, e.g., partial point clouds and real-world depth images.
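The disentangled representation can be sketched as an MLP that predicts a signed distance for a 3D query point conditioned on two separate codes, one for shape identity and one for articulation (e.g. joint angles); layer sizes and code dimensions below are illustrative, not the exact A-SDF decoder.

```python
import torch
import torch.nn as nn

class ArticulatedSDF(nn.Module):
    def __init__(self, shape_dim=256, art_dim=8, hidden=512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + shape_dim + art_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, points, shape_code, art_code):
        # points: (N, 3) query locations; codes are 1-D vectors broadcast to every point.
        codes = torch.cat([shape_code, art_code], dim=-1).expand(points.size(0), -1)
        return self.net(torch.cat([points, codes], dim=-1))   # (N, 1) signed distances
```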
Abstract: We introduce an inversion-based method, denoted IMAge-Guided model INvErsion (IMAGINE), to generate high-quality and diverse images from only a single training sample. We leverage the knowledge of image semantics from a pre-trained classifier to achieve plausible generations by matching multi-level feature representations in the classifier, together with adversarial training against an external discriminator. IMAGINE enables the synthesis procedure to simultaneously 1) enforce semantic specificity constraints during synthesis, 2) produce realistic images without generator training, and 3) give users intuitive control over the generation process. With extensive experimental results, we demonstrate qualitatively and quantitatively that IMAGINE performs favorably against state-of-the-art GAN-based and inversion-based methods, across three different image domains (i.e., objects, scenes, and textures).
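The core inversion step can be sketched as below: the synthesized image is optimized so that its multi-level features in a frozen pre-trained classifier match those of the single reference image. The adversarial term with the external discriminator and the paper's specific feature layers and weights are omitted; all names are illustrative.

```python
import torch
import torch.nn.functional as F

def invert(classifier_features, reference, steps=500, lr=0.05):
    """classifier_features(x) -> list of feature maps from several layers of a frozen classifier."""
    with torch.no_grad():
        targets = classifier_features(reference)              # multi-level targets from one image
    x = torch.randn_like(reference, requires_grad=True)       # start synthesis from noise
    opt = torch.optim.Adam([x], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        feats = classifier_features(x)
        loss = sum(F.mse_loss(f, t) for f, t in zip(feats, targets))
        loss.backward()
        opt.step()
    return x.detach()
```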