Xinghui Li

SimSC: A Simple Framework for Semantic Correspondence with Temperature Learning

May 03, 2023
Xinghui Li, Kai Han, Xingchen Wan, Victor Adrian Prisacariu

We propose SimSC, a remarkably simple framework that addresses semantic matching using only a feature backbone. We discover that when fine-tuning an ImageNet pre-trained backbone on the semantic matching task, L2 normalization of the feature map, a standard procedure in feature matching, produces an overly smooth matching distribution and significantly hinders fine-tuning. Setting an appropriate temperature for the softmax alleviates this over-smoothness and substantially improves feature quality. We employ a learning module to predict the optimal temperature for fine-tuning feature backbones; this module is trained jointly with the backbone, and the temperature is updated online. We evaluate our method on three public datasets and demonstrate accuracy on par with state-of-the-art methods under the same backbone, without using a learned matching head. Our method is versatile and works with various types of backbones. We show that the accuracy of our framework is easily improved by coupling it with more powerful backbones.
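
As a rough illustration of the idea, the sketch below computes a temperature-scaled matching distribution between two L2-normalized feature maps. The tensor shapes and the way the temperature is parameterised are assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch of temperature-scaled feature matching, assuming flattened
# feature maps and a learned log-temperature (both assumptions, not the
# paper's code).
import torch
import torch.nn.functional as F

def match_distribution(feat_a, feat_b, log_temp):
    """feat_a, feat_b: (B, C, H*W) feature maps; log_temp: learned scalar
    tensor, exponentiated so the temperature stays positive."""
    # L2-normalise along the channel dimension (standard in feature matching).
    feat_a = F.normalize(feat_a, dim=1)
    feat_b = F.normalize(feat_b, dim=1)
    # Cosine-similarity correlation between all position pairs: (B, H*W, H*W).
    corr = torch.einsum('bci,bcj->bij', feat_a, feat_b)
    # Dividing by a small learned temperature sharpens the otherwise overly
    # smooth softmax over cosine similarities confined to [-1, 1].
    temp = torch.exp(log_temp)
    return F.softmax(corr / temp, dim=-1)
```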


Robust Ellipsoid Fitting Using Axial Distance and Combination

Apr 02, 2023
Min Han, Jiangming Kan, Gongping Yang, Xinghui Li

In random sample consensus (RANSAC), ellipsoid fitting can be formulated as minimizing the point-to-model distance, realized by maximizing the model score; the performance of ellipsoid fitting therefore depends on the distance metric. In this paper, we propose a novel distance metric called the axial distance, converted from the algebraic distance by introducing a scaling factor that addresses the non-geometric behaviour of the algebraic distance. The axial distance and the Sampson distance are complementary: their combination is a stricter metric when calculating the model score of sample consensus and the weights of the weighted least squares (WLS) fitting. We therefore propose a novel sample-consensus-based ellipsoid fitting method using the combination of the axial distance and the Sampson distance (CAS). We compare the proposed method with several representative fitting methods on synthetic and real datasets. The results show that the proposed method is more robust to outliers, is consistently accurate, and runs at a speed close to that of methods based on sample consensus.
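
To make the scoring step concrete, here is a hedged sketch of sample-consensus scoring with a combined metric. The exact axial-distance formula is not reproduced here, so `axial_distance` and `sampson_distance` are hypothetical callables, and taking the element-wise maximum is only one plausible way to realise a "stricter" combination, not necessarily the paper's rule.

```python
# Hedged sketch of sample-consensus scoring with a combined distance metric.
# `axial_distance` and `sampson_distance` are assumed callables returning
# per-point residuals; the max-combination and WLS weighting are assumptions.
import numpy as np

def cas_score(points, quadric, axial_distance, sampson_distance, inlier_thresh):
    """points: (N, 3) array; quadric: candidate ellipsoid parameters.
    Returns the consensus score and weights for a WLS refit."""
    d_axial = axial_distance(points, quadric)      # (N,) residuals
    d_sampson = sampson_distance(points, quadric)  # (N,) residuals
    # A point must be close under BOTH metrics, making the combined
    # metric stricter than either distance alone.
    d_combined = np.maximum(d_axial, d_sampson)
    inliers = d_combined < inlier_thresh
    # WLS weights: down-weight points with large combined residuals.
    weights = np.where(inliers, 1.0 / (1.0 + d_combined), 0.0)
    return int(inliers.sum()), weights
```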

* 13 pages 

Refinement for Absolute Pose Regression with Neural Feature Synthesis

Mar 17, 2023
Shuai Chen, Yash Bhalgat, Xinghui Li, Jiawang Bian, Kejie Li, Zirui Wang, Victor Adrian Prisacariu

Absolute Pose Regression (APR) methods use deep neural networks to directly regress camera poses from RGB images. Despite their advantages in inference speed and simplicity, these methods still fall short of the accuracy achieved by geometry-based techniques. To address this issue, we propose a new model called the Neural Feature Synthesizer (NeFeS). Our approach encodes 3D geometric features during training and renders dense novel-view features at test time to refine camera poses estimated by arbitrary APR methods. Unlike previous APR works that require additional unlabeled training data, our method leverages implicit geometric constraints at test time using a robust feature field. To enhance the robustness of our NeFeS network, we introduce a feature fusion module and a progressive training strategy. Our proposed method improves state-of-the-art single-image APR accuracy by as much as 54.9% on indoor and outdoor benchmark datasets, without additional time-consuming training on unlabeled data.
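
The following sketch illustrates the general test-time refinement idea: render dense features at the current pose estimate and optimise a pose correction against the query image's features. `nefes_render` and the additive se(3)-vector update are hypothetical simplifications standing in for the actual renderer and pose parameterisation.

```python
# Illustrative sketch of test-time pose refinement against a feature field.
# `nefes_render` is a hypothetical differentiable renderer; the additive
# se(3) update and L1 feature loss are assumptions for illustration.
import torch

def refine_pose(pose_init, image_feats, nefes_render, steps=100, lr=1e-3):
    """pose_init: (6,) se(3) vector from any APR method; image_feats: dense
    features of the query image to compare rendered features against."""
    delta = torch.zeros(6, requires_grad=True)  # correction on top of the APR pose
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        rendered = nefes_render(pose_init + delta)  # dense novel-view features
        loss = torch.nn.functional.l1_loss(rendered, image_feats)
        loss.backward()
        opt.step()
    return (pose_init + delta).detach()
```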

* Paper Website: http://nefes.active.vision 

Model-based Transfer Learning for Automatic Optical Inspection based on domain discrepancy

Jan 14, 2023
Erik Isai Valle Salgado, Haoxin Yan, Yue Hong, Peiyuan Zhu, Shidong Zhu, Chengwei Liao, Yanxiang Wen, Xiu Li, Xiang Qian, Xiaohao Wang, Xinghui Li

Transfer learning (TL) is a promising method for automatic optical inspection (AOI) applications, since it can significantly shorten sample collection time and improve efficiency in today's smart manufacturing. However, previous work applied TL to enhance network models without considering the domain similarity among datasets or the long-tailed distribution of the source data, and relied mainly on linear transformations to mitigate the lack of samples. This research applies model-based TL via domain similarity to improve overall performance, with data augmentation in both the target and source domains to enrich data quality and reduce imbalance. Given a group of source datasets from similar industrial processes, we determine which is most related to the target through a domain discrepancy score and the number of samples each dataset contains. We then transfer the chosen pre-trained backbone weights to train and fine-tune the target network. Our experiments show improvements of up to 20% in the F1 score and the PR curve compared with TL using benchmark datasets.
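
As an illustration of selecting a source dataset by domain discrepancy and sample count, the sketch below uses a simple distance between mean feature embeddings. The actual discrepancy score and the way the two criteria are combined are assumptions, not the paper's definitions.

```python
# Hedged sketch of source-dataset selection. The mean-embedding distance and
# the log sample-count weighting are illustrative assumptions.
import numpy as np

def select_source(target_feats, source_datasets):
    """target_feats: (N, D) embeddings of the target dataset;
    source_datasets: dict mapping name -> (M_i, D) embeddings."""
    mu_target = target_feats.mean(axis=0)
    best_name, best_score = None, np.inf
    for name, feats in source_datasets.items():
        # Simple discrepancy: distance between mean feature embeddings.
        discrepancy = np.linalg.norm(feats.mean(axis=0) - mu_target)
        # Prefer low discrepancy and a large number of source samples.
        score = discrepancy / np.log1p(len(feats))
        if score < best_score:
            best_name, best_score = name, score
    return best_name
```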

* Proc. SPIE 12317, Optoelectronic Imaging and Multimedia Technology IX, 2023  
* This is a corrected version of the published paper "Relational-based transfer learning for automatic optical inspection based on domain discrepancy" 

Disentangling 3D Attributes from a Single 2D Image: Human Pose, Shape and Garment

Aug 05, 2022
Xue Hu, Xinghui Li, Benjamin Busam, Yiren Zhou, Ales Leonardis, Shanxin Yuan

For visual manipulation tasks, we aim to represent image content with semantically meaningful features. However, implicit representations learned from images often lack interpretability, especially when attributes are intertwined. We focus on the challenging task of extracting disentangled 3D attributes from 2D image data alone. Specifically, we focus on human appearance and learn implicit pose, shape, and garment representations of dressed humans from RGB images. Our method learns an embedding with disentangled latent representations of these three image properties and enables meaningful re-assembly of features and property control through a 2D-to-3D encoder-decoder structure. The 3D model is inferred solely from the feature map in the learned embedding space. To the best of our knowledge, our method is the first to achieve cross-domain disentanglement for this highly under-constrained problem. We qualitatively and quantitatively demonstrate our framework's ability to transfer pose, shape, and garments in 3D reconstruction on virtual data, and show how an implicit shape loss can benefit the model's ability to recover fine-grained reconstruction details.
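
A toy sketch of the disentanglement interface is given below: one encoder head per attribute, with transfer performed by swapping a single latent before decoding. Module names, latent sizes, and the decoder are illustrative only, not the paper's architecture.

```python
# Toy sketch of per-attribute latent heads for disentangled encoding.
# All dimensions and module choices are illustrative assumptions.
import torch
import torch.nn as nn

class Disentangler(nn.Module):
    def __init__(self, feat_dim=512, z_dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.LazyConv2d(feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        # One head per attribute keeps the three latents separable.
        self.heads = nn.ModuleDict({k: nn.Linear(feat_dim, z_dim)
                                    for k in ('pose', 'shape', 'garment')})

    def encode(self, img):
        f = self.backbone(img)
        return {k: head(f) for k, head in self.heads.items()}

# Usage idea: transfer the garment of image B onto the pose/shape of image A:
#   z = model.encode(img_a)
#   z['garment'] = model.encode(img_b)['garment']
#   mesh = decoder(z)   # hypothetical 3D decoder
```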


DFNet: Enhance Absolute Pose Regression with Direct Feature Matching

Apr 04, 2022
Shuai Chen, Xinghui Li, Zirui Wang, Victor Adrian Prisacariu

We introduce a camera relocalization pipeline that combines absolute pose regression (APR) with direct feature matching. Existing photometric-based methods struggle in scenes with large photometric distortions, e.g. outdoor environments; by incorporating exposure-adaptive novel view synthesis, our method successfully addresses this challenge. Moreover, by introducing domain-invariant feature matching, our solution improves pose regression accuracy while using semi-supervised learning on unlabeled data. In particular, the pipeline consists of two components: a Novel View Synthesizer and FeatureNet (DFNet). The former synthesizes novel views that compensate for changes in exposure, and the latter regresses camera poses and extracts robust features that bridge the domain gap between real and synthetic images. We show that domain-invariant feature matching effectively enhances camera pose estimation in both indoor and outdoor scenes. Our method thus achieves state-of-the-art accuracy, outperforming existing single-image APR methods by as much as 56% and approaching the accuracy of 3D structure-based methods.
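
The sketch below illustrates a direct feature-matching loss of the kind described: features of the real query image are compared with features of a synthetic view rendered at the regressed pose. `synthesize_view` and `featurenet` are hypothetical stand-ins for the pipeline's two components, and the cosine loss is an assumption.

```python
# Hedged sketch of a direct feature-matching loss between real and synthetic
# views. `synthesize_view` and `featurenet` are hypothetical interfaces.
import torch
import torch.nn.functional as F

def feature_matching_loss(query_img, pose_pred, synthesize_view, featurenet):
    synth_img = synthesize_view(pose_pred)           # exposure-compensated render
    f_real = F.normalize(featurenet(query_img), dim=1)
    f_synth = F.normalize(featurenet(synth_img), dim=1)
    # Cosine distance between domain-invariant features; needs no pose labels,
    # which is what enables semi-supervised training on unlabeled images.
    return (1.0 - (f_real * f_synth).sum(dim=1)).mean()
```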


Dual-Resolution Correspondence Networks

Jun 16, 2020
Xinghui Li, Kai Han, Shuda Li, Victor Adrian Prisacariu

We tackle the problem of establishing dense pixel-wise correspondences between a pair of images. In this work, we introduce Dual-Resolution Correspondence Networks (DRC-Net) to obtain pixel-wise correspondences in a coarse-to-fine manner. DRC-Net extracts both coarse- and fine-resolution feature maps. The coarse maps are used to produce a full but coarse 4D correlation tensor, which is then refined by a learnable neighbourhood consensus module. The fine-resolution feature maps are used to obtain the final dense correspondences, guided by the refined coarse 4D correlation tensor. The selected coarse-resolution matching scores allow the fine-resolution features to focus on only a limited number of high-confidence candidate matches. In this way, DRC-Net dramatically increases matching reliability and localisation accuracy while avoiding applying expensive 4D convolution kernels to fine-resolution feature maps. We comprehensively evaluate our method on large-scale public benchmarks including HPatches, InLoc, and Aachen Day-Night, and achieve state-of-the-art results on all of them.
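
For concreteness, the following sketch builds the coarse 4D correlation tensor from a pair of coarse feature maps; the shapes are assumptions and the neighbourhood-consensus refinement is omitted.

```python
# Minimal sketch of the coarse 4D correlation tensor; shapes are assumed and
# the learnable neighbourhood consensus refinement is not shown.
import torch
import torch.nn.functional as F

def coarse_correlation(feat_a, feat_b):
    """feat_a, feat_b: (B, C, h, w) coarse feature maps.
    Returns a (B, h, w, h, w) tensor of cosine similarities."""
    b, c, h, w = feat_a.shape
    fa = F.normalize(feat_a.flatten(2), dim=1)   # (B, C, h*w)
    fb = F.normalize(feat_b.flatten(2), dim=1)
    corr = torch.einsum('bci,bcj->bij', fa, fb)  # all pairs of positions
    # Each entry scores a coarse match; high-confidence entries restrict
    # which fine-resolution matches are considered afterwards.
    return corr.view(b, h, w, h, w)
```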
