Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Xiaoming Liu

RePLAy: Remove Projective LiDAR Depthmap Artifacts via Exploiting Epipolar Geometry

Jul 27, 2024

Shengjie Zhu, Girish Chandar Ganesan, Abhinav Kumar, Xiaoming Liu

Figure 1 for RePLAy: Remove Projective LiDAR Depthmap Artifacts via Exploiting Epipolar Geometry

Figure 2 for RePLAy: Remove Projective LiDAR Depthmap Artifacts via Exploiting Epipolar Geometry

Figure 3 for RePLAy: Remove Projective LiDAR Depthmap Artifacts via Exploiting Epipolar Geometry

Figure 4 for RePLAy: Remove Projective LiDAR Depthmap Artifacts via Exploiting Epipolar Geometry

Abstract:3D sensing is a fundamental task for Autonomous Vehicles. Its deployment often relies on aligned RGB cameras and LiDAR. Despite meticulous synchronization and calibration, systematic misalignment persists in LiDAR projected depthmap. This is due to the physical baseline distance between the two sensors. The artifact is often reflected as background LiDAR incorrectly projected onto the foreground, such as cars and pedestrians. The KITTI dataset uses stereo cameras as a heuristic solution to remove artifacts. However most AV datasets, including nuScenes, Waymo, and DDAD, lack stereo images, making the KITTI solution inapplicable. We propose RePLAy, a parameter-free analytical solution to remove the projective artifacts. We construct a binocular vision system between a hypothesized virtual LiDAR camera and the RGB camera. We then remove the projective artifacts by determining the epipolar occlusion with the proposed analytical solution. We show unanimous improvement in the State-of-The-Art (SoTA) monocular depth estimators and 3D object detectors with the artifacts-free depthmaps.

Via

Access Paper or Ask Questions

Revisit Self-supervised Depth Estimation with Local Structure-from-Motion

Jul 27, 2024

Shengjie Zhu, Xiaoming Liu

Figure 1 for Revisit Self-supervised Depth Estimation with Local Structure-from-Motion

Figure 2 for Revisit Self-supervised Depth Estimation with Local Structure-from-Motion

Figure 3 for Revisit Self-supervised Depth Estimation with Local Structure-from-Motion

Figure 4 for Revisit Self-supervised Depth Estimation with Local Structure-from-Motion

Abstract:Both self-supervised depth estimation and Structure-from-Motion (SfM) recover scene depth from RGB videos. Despite sharing a similar objective, the two approaches are disconnected. Prior works of self-supervision backpropagate losses defined within immediate neighboring frames. Instead of learning-through-loss, this work proposes an alternative scheme by performing local SfM. First, with calibrated RGB or RGB-D images, we employ a depth and correspondence estimator to infer depthmaps and pair-wise correspondence maps. Then, a novel bundle-RANSAC-adjustment algorithm jointly optimizes camera poses and one depth adjustment for each depthmap. Finally, we fix camera poses and employ a NeRF, however, without a neural network, for dense triangulation and geometric verification. Poses, depth adjustments, and triangulated sparse depths are our outputs. For the first time, we show self-supervision within $5$ frames already benefits SoTA supervised depth and correspondence models.

Via

Access Paper or Ask Questions

Open-Set Biometrics: Beyond Good Closed-Set Models

Jul 23, 2024

Yiyang Su, Minchul Kim, Feng Liu, Anil Jain, Xiaoming Liu

Abstract:Biometric recognition has primarily addressed closed-set identification, assuming all probe subjects are in the gallery. However, most practical applications involve open-set biometrics, where probe subjects may or may not be present in the gallery. This poses distinct challenges in effectively distinguishing individuals in the gallery while minimizing false detections. While it is commonly believed that powerful biometric models can excel in both closed- and open-set scenarios, existing loss functions are inconsistent with open-set evaluation. They treat genuine (mated) and imposter (non-mated) similarity scores symmetrically and neglect the relative magnitudes of imposter scores. To address these issues, we simulate open-set evaluation using minibatches during training and introduce novel loss functions: (1) the identification-detection loss optimized for open-set performance under selective thresholds and (2) relative threshold minimization to reduce the maximum negative score for each probe. Across diverse biometric tasks, including face recognition, gait recognition, and person re-identification, our experiments demonstrate the effectiveness of the proposed loss functions, significantly enhancing open-set performance while positively impacting closed-set performance. Our code and models are available at https://github.com/prevso1088/open-set-biometrics.

* Published at ECCV 2024

Via

Access Paper or Ask Questions

Concentrate Attention: Towards Domain-Generalizable Prompt Optimization for Language Models

Jun 15, 2024

Chengzhengxu Li, Xiaoming Liu, Zhaohan Zhang, Yichen Wang, Chen Liu, Yu Lan, Chao Shen

Figure 1 for Concentrate Attention: Towards Domain-Generalizable Prompt Optimization for Language Models

Figure 2 for Concentrate Attention: Towards Domain-Generalizable Prompt Optimization for Language Models

Figure 3 for Concentrate Attention: Towards Domain-Generalizable Prompt Optimization for Language Models

Figure 4 for Concentrate Attention: Towards Domain-Generalizable Prompt Optimization for Language Models

Abstract:Recent advances in prompt optimization have notably enhanced the performance of pre-trained language models (PLMs) on downstream tasks. However, the potential of optimized prompts on domain generalization has been under-explored. To explore the nature of prompt generalization on unknown domains, we conduct pilot experiments and find that (i) Prompts gaining more attention weight from PLMs' deep layers are more generalizable and (ii) Prompts with more stable attention distributions in PLMs' deep layers are more generalizable. Thus, we offer a fresh objective towards domain-generalizable prompts optimization named "Concentration", which represents the "lookback" attention from the current decoding token to the prompt tokens, to increase the attention strength on prompts and reduce the fluctuation of attention distribution. We adapt this new objective to popular soft prompt and hard prompt optimization methods, respectively. Extensive experiments demonstrate that our idea improves comparison prompt optimization methods by 1.42% for soft prompt generalization and 2.16% for hard prompt generalization in accuracy on the multi-source domain generalization setting, while maintaining satisfying in-domain performance. The promising results validate the effectiveness of our proposed prompt optimization objective and provide key insights into domain-generalizable prompts.

* Preprint

Via

Access Paper or Ask Questions

StablePT: Towards Stable Prompting for Few-shot Learning via Input Separation

Apr 30, 2024

Xiaoming Liu, Chen Liu, Zhaohan Zhang, Chengzhengxu Li, Longtian Wang, Yu Lan, Chao Shen

Abstract:Large language models have shown their ability to become effective few-shot learners with prompting, revoluting the paradigm of learning with data scarcity. However, this approach largely depends on the quality of prompt initialization, and always exhibits large variability among different runs. Such property makes prompt tuning highly unreliable and vulnerable to poorly constructed prompts, which limits its extension to more real-world applications. To tackle this issue, we propose to treat the hard prompt and soft prompt as separate inputs to mitigate noise brought by the prompt initialization. Furthermore, we optimize soft prompts with contrastive learning for utilizing class-aware information in the training process to maintain model performance. Experimental results demonstrate that \sysname outperforms state-of-the-art methods by 7.20% in accuracy and reduces the standard deviation by 2.02 on average. Furthermore, extensive experiments underscore its robustness and stability across 7 datasets covering various tasks.

* Submitted to ACL 2024

Via

Access Paper or Ask Questions

Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data

Apr 16, 2024

Ivan DeAndres-Tame, Ruben Tolosana, Pietro Melzi, Ruben Vera-Rodriguez, Minchul Kim, Christian Rathgeb, Xiaoming Liu, Aythami Morales, Julian Fierrez, Javier Ortega-Garcia(+48 more)

Figure 1 for Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data

Figure 2 for Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data

Figure 3 for Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data

Figure 4 for Second Edition FRCSyn Challenge at CVPR 2024: Face Recognition Challenge in the Era of Synthetic Data

Abstract:Synthetic data is gaining increasing relevance for training machine learning models. This is mainly motivated due to several factors such as the lack of real data and intra-class variability, time and errors produced in manual labeling, and in some cases privacy concerns, among others. This paper presents an overview of the 2nd edition of the Face Recognition Challenge in the Era of Synthetic Data (FRCSyn) organized at CVPR 2024. FRCSyn aims to investigate the use of synthetic data in face recognition to address current technological limitations, including data privacy concerns, demographic biases, generalization to novel scenarios, and performance constraints in challenging situations such as aging, pose variations, and occlusions. Unlike the 1st edition, in which synthetic data from DCFace and GANDiffFace methods was only allowed to train face recognition systems, in this 2nd edition we propose new sub-tasks that allow participants to explore novel face generative methods. The outcomes of the 2nd FRCSyn Challenge, along with the proposed experimental protocol and benchmarking contribute significantly to the application of synthetic data to face recognition.

* IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRw 2024)
* arXiv admin note: text overlap with arXiv:2311.10476

Via

Access Paper or Ask Questions

SeaBird: Segmentation in Bird's View with Dice Loss Improves Monocular 3D Detection of Large Objects

Mar 29, 2024

Abhinav Kumar, Yuliang Guo, Xinyu Huang, Liu Ren, Xiaoming Liu

Figure 1 for SeaBird: Segmentation in Bird's View with Dice Loss Improves Monocular 3D Detection of Large Objects

Figure 2 for SeaBird: Segmentation in Bird's View with Dice Loss Improves Monocular 3D Detection of Large Objects

Figure 3 for SeaBird: Segmentation in Bird's View with Dice Loss Improves Monocular 3D Detection of Large Objects

Figure 4 for SeaBird: Segmentation in Bird's View with Dice Loss Improves Monocular 3D Detection of Large Objects

Abstract:Monocular 3D detectors achieve remarkable performance on cars and smaller objects. However, their performance drops on larger objects, leading to fatal accidents. Some attribute the failures to training data scarcity or their receptive field requirements of large objects. In this paper, we highlight this understudied problem of generalization to large objects. We find that modern frontal detectors struggle to generalize to large objects even on nearly balanced datasets. We argue that the cause of failure is the sensitivity of depth regression losses to noise of larger objects. To bridge this gap, we comprehensively investigate regression and dice losses, examining their robustness under varying error levels and object sizes. We mathematically prove that the dice loss leads to superior noise-robustness and model convergence for large objects compared to regression losses for a simplified case. Leveraging our theoretical insights, we propose SeaBird (Segmentation in Bird's View) as the first step towards generalizing to large objects. SeaBird effectively integrates BEV segmentation on foreground objects for 3D detection, with the segmentation head trained with the dice loss. SeaBird achieves SoTA results on the KITTI-360 leaderboard and improves existing detectors on the nuScenes leaderboard, particularly for large objects. Code and models at https://github.com/abhi1kumar/SeaBird

* CVPR 2024

Via

Access Paper or Ask Questions

KeyPoint Relative Position Encoding for Face Recognition

Mar 21, 2024

Minchul Kim, Yiyang Su, Feng Liu, Anil Jain, Xiaoming Liu

Figure 1 for KeyPoint Relative Position Encoding for Face Recognition

Figure 2 for KeyPoint Relative Position Encoding for Face Recognition

Figure 3 for KeyPoint Relative Position Encoding for Face Recognition

Figure 4 for KeyPoint Relative Position Encoding for Face Recognition

Abstract:In this paper, we address the challenge of making ViT models more robust to unseen affine transformations. Such robustness becomes useful in various recognition tasks such as face recognition when image alignment failures occur. We propose a novel method called KP-RPE, which leverages key points (e.g.~facial landmarks) to make ViT more resilient to scale, translation, and pose variations. We begin with the observation that Relative Position Encoding (RPE) is a good way to bring affine transform generalization to ViTs. RPE, however, can only inject the model with prior knowledge that nearby pixels are more important than far pixels. Keypoint RPE (KP-RPE) is an extension of this principle, where the significance of pixels is not solely dictated by their proximity but also by their relative positions to specific keypoints within the image. By anchoring the significance of pixels around keypoints, the model can more effectively retain spatial relationships, even when those relationships are disrupted by affine transformations. We show the merit of KP-RPE in face and gait recognition. The experimental results demonstrate the effectiveness in improving face recognition performance from low-quality images, particularly where alignment is prone to failure. Code and pre-trained models are available.

* To appear in CVPR2024

Via

Access Paper or Ask Questions

ProMark: Proactive Diffusion Watermarking for Causal Attribution

Mar 14, 2024

Vishal Asnani, John Collomosse, Tu Bui, Xiaoming Liu, Shruti Agarwal

Figure 1 for ProMark: Proactive Diffusion Watermarking for Causal Attribution

Figure 2 for ProMark: Proactive Diffusion Watermarking for Causal Attribution

Figure 3 for ProMark: Proactive Diffusion Watermarking for Causal Attribution

Figure 4 for ProMark: Proactive Diffusion Watermarking for Causal Attribution

Abstract:Generative AI (GenAI) is transforming creative workflows through the capability to synthesize and manipulate images via high-level prompts. Yet creatives are not well supported to receive recognition or reward for the use of their content in GenAI training. To this end, we propose ProMark, a causal attribution technique to attribute a synthetically generated image to its training data concepts like objects, motifs, templates, artists, or styles. The concept information is proactively embedded into the input training images using imperceptible watermarks, and the diffusion models (unconditional or conditional) are trained to retain the corresponding watermarks in generated images. We show that we can embed as many as $2^{16}$ unique watermarks into the training data, and each training image can contain more than one watermark. ProMark can maintain image quality whilst outperforming correlation-based attribution. Finally, several qualitative examples are presented, providing the confidence that the presence of the watermark conveys a causative relationship between training data and synthetic images.

* Accepted to CVPR 2024

Via

Access Paper or Ask Questions

BigGait: Learning Gait Representation You Want by Large Vision Models

Feb 29, 2024

Dingqiang Ye, Chao Fan, Jingzhe Ma, Xiaoming Liu, Shiqi Yu

Figure 1 for BigGait: Learning Gait Representation You Want by Large Vision Models

Figure 2 for BigGait: Learning Gait Representation You Want by Large Vision Models

Figure 3 for BigGait: Learning Gait Representation You Want by Large Vision Models

Figure 4 for BigGait: Learning Gait Representation You Want by Large Vision Models

Abstract:Gait recognition stands as one of the most pivotal remote identification technologies and progressively expands across research and industrial communities. However, existing gait recognition methods heavily rely on task-specific upstream driven by supervised learning to provide explicit gait representations, which inevitably introduce expensive annotation costs and potentially cause cumulative errors. Escaping from this trend, this work explores effective gait representations based on the all-purpose knowledge produced by task-agnostic Large Vision Models (LVMs) and proposes a simple yet efficient gait framework, termed BigGait. Specifically, the Gait Representation Extractor (GRE) in BigGait effectively transforms all-purpose knowledge into implicit gait features in an unsupervised manner, drawing from design principles of established gait representation construction approaches. Experimental results on CCPG, CAISA-B* and SUSTech1K indicate that BigGait significantly outperforms the previous methods in both self-domain and cross-domain tasks in most cases, and provides a more practical paradigm for learning the next-generation gait representation. Eventually, we delve into prospective challenges and promising directions in LVMs-based gait recognition, aiming to inspire future work in this emerging topic. The source code will be available at https://github.com/ShiqiYu/OpenGait.

Via

Access Paper or Ask Questions