Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Feng Liu

State Key Laboratory of the Control and Simulation of Power Systems and Generation Equipment, Tsinghua University

DynamicLip: Shape-Independent Continuous Authentication via Lip Articulator Dynamics

Jan 02, 2025

Huashan Chen, Yifan Xu, Yue Feng, Ming Jian, Feng Liu, Pengfei Hu, Kebin Peng, Sen He, Zi Wang

Abstract:Biometrics authentication has become increasingly popular due to its security and convenience; however, traditional biometrics are becoming less desirable in scenarios such as new mobile devices, Virtual Reality, and Smart Vehicles. For example, while face authentication is widely used, it suffers from significant privacy concerns. The collection of complete facial data makes it less desirable for privacy-sensitive applications. Lip authentication, on the other hand, has emerged as a promising biometrics method. However, existing lip-based authentication methods heavily depend on static lip shape when the mouth is closed, which can be less robust due to lip shape dynamic motion and can barely work when the user is speaking. In this paper, we revisit the nature of lip biometrics and extract shape-independent features from the lips. We study the dynamic characteristics of lip biometrics based on articulator motion. Building on the knowledge, we propose a system for shape-independent continuous authentication via lip articulator dynamics. This system enables robust, shape-independent and continuous authentication, making it particularly suitable for scenarios with high security and privacy requirements. We conducted comprehensive experiments in different environments and attack scenarios and collected a dataset of 50 subjects. The results indicate that our system achieves an overall accuracy of 99.06% and demonstrates robustness under advanced mimic attacks and AI deepfake attacks, making it a viable solution for continuous biometric authentication in various applications.

Via

Access Paper or Ask Questions

FloNa: Floor Plan Guided Embodied Visual Navigation

Dec 24, 2024

Jiaxin Li, Weiqi Huang, Zan Wang, Wei Liang, Huijun Di, Feng Liu

Abstract:Humans naturally rely on floor plans to navigate in unfamiliar environments, as they are readily available, reliable, and provide rich geometrical guidance. However, existing visual navigation settings overlook this valuable prior knowledge, leading to limited efficiency and accuracy. To eliminate this gap, we introduce a novel navigation task: Floor Plan Visual Navigation (FloNa), the first attempt to incorporate floor plan into embodied visual navigation. While the floor plan offers significant advantages, two key challenges emerge: (1) handling the spatial inconsistency between the floor plan and the actual scene layout for collision-free navigation, and (2) aligning observed images with the floor plan sketch despite their distinct modalities. To address these challenges, we propose FloDiff, a novel diffusion policy framework incorporating a localization module to facilitate alignment between the current observation and the floor plan. We further collect $20k$ navigation episodes across $117$ scenes in the iGibson simulator to support the training and evaluation. Extensive experiments demonstrate the effectiveness and efficiency of our framework in unfamiliar scenes using floor plan knowledge. Project website: https://gauleejx.github.io/flona/.

* Accepted by AAAI 2025

Via

Access Paper or Ask Questions

Move-in-2D: 2D-Conditioned Human Motion Generation

Dec 17, 2024

Hsin-Ping Huang, Yang Zhou, Jui-Hsien Wang, Difan Liu, Feng Liu, Ming-Hsuan Yang, Zhan Xu

Figure 1 for Move-in-2D: 2D-Conditioned Human Motion Generation

Figure 2 for Move-in-2D: 2D-Conditioned Human Motion Generation

Figure 3 for Move-in-2D: 2D-Conditioned Human Motion Generation

Figure 4 for Move-in-2D: 2D-Conditioned Human Motion Generation

Abstract:Generating realistic human videos remains a challenging task, with the most effective methods currently relying on a human motion sequence as a control signal. Existing approaches often use existing motion extracted from other videos, which restricts applications to specific motion types and global scene matching. We propose Move-in-2D, a novel approach to generate human motion sequences conditioned on a scene image, allowing for diverse motion that adapts to different scenes. Our approach utilizes a diffusion model that accepts both a scene image and text prompt as inputs, producing a motion sequence tailored to the scene. To train this model, we collect a large-scale video dataset featuring single-human activities, annotating each video with the corresponding human motion as the target output. Experiments demonstrate that our method effectively predicts human motion that aligns with the scene image after projection. Furthermore, we show that the generated motion sequence improves human motion quality in video synthesis tasks.

* Project page: https://hhsinping.github.io/Move-in-2D/

Via

Access Paper or Ask Questions

Adversarial Purification by Consistency-aware Latent Space Optimization on Data Manifolds

Dec 11, 2024

Shuhai Zhang, Jiahao Yang, Hui Luo, Jie Chen, Li Wang, Feng Liu, Bo Han, Mingkui Tan

Figure 1 for Adversarial Purification by Consistency-aware Latent Space Optimization on Data Manifolds

Figure 2 for Adversarial Purification by Consistency-aware Latent Space Optimization on Data Manifolds

Figure 3 for Adversarial Purification by Consistency-aware Latent Space Optimization on Data Manifolds

Figure 4 for Adversarial Purification by Consistency-aware Latent Space Optimization on Data Manifolds

Abstract:Deep neural networks (DNNs) are vulnerable to adversarial samples crafted by adding imperceptible perturbations to clean data, potentially leading to incorrect and dangerous predictions. Adversarial purification has been an effective means to improve DNNs robustness by removing these perturbations before feeding the data into the model. However, it faces significant challenges in preserving key structural and semantic information of data, as the imperceptible nature of adversarial perturbations makes it hard to avoid over-correcting, which can destroy important information and degrade model performance. In this paper, we break away from traditional adversarial purification methods by focusing on the clean data manifold. To this end, we reveal that samples generated by a well-trained generative model are close to clean ones but far from adversarial ones. Leveraging this insight, we propose Consistency Model-based Adversarial Purification (CMAP), which optimizes vectors within the latent space of a pre-trained consistency model to generate samples for restoring clean data. Specifically, 1) we propose a \textit{Perceptual consistency restoration} mechanism by minimizing the discrepancy between generated samples and input samples in both pixel and perceptual spaces. 2) To maintain the optimized latent vectors within the valid data manifold, we introduce a \textit{Latent distribution consistency constraint} strategy to align generated samples with the clean data distribution. 3) We also apply a \textit{Latent vector consistency prediction} scheme via an ensemble approach to enhance prediction reliability. CMAP fundamentally addresses adversarial perturbations at their source, providing a robust purification. Extensive experiments on CIFAR-10 and ImageNet-100 show that our CMAP significantly enhances robustness against strong adversarial attacks while preserving high natural accuracy.

* 17 pages, 8 figures

Via

Access Paper or Ask Questions

Semantic Retrieval at Walmart

Dec 05, 2024

Alessandro Magnani, Feng Liu, Suthee Chaidaroon, Sachin Yadav, Praveen Reddy Suram, Ajit Puthenputhussery, Sijie Chen, Min Xie, Anirudh Kashi, Tony Lee(+1 more)

Figure 1 for Semantic Retrieval at Walmart

Figure 2 for Semantic Retrieval at Walmart

Figure 3 for Semantic Retrieval at Walmart

Figure 4 for Semantic Retrieval at Walmart

Abstract:In product search, the retrieval of candidate products before re-ranking is more critical and challenging than other search like web search, especially for tail queries, which have a complex and specific search intent. In this paper, we present a hybrid system for e-commerce search deployed at Walmart that combines traditional inverted index and embedding-based neural retrieval to better answer user tail queries. Our system significantly improved the relevance of the search engine, measured by both offline and online evaluations. The improvements were achieved through a combination of different approaches. We present a new technique to train the neural model at scale. and describe how the system was deployed in production with little impact on response time. We highlight multiple learnings and practical tricks that were used in the deployment of this system.

* 9 page, 2 figures, 10 tables, KDD 2022

Via

Access Paper or Ask Questions

NLPrompt: Noise-Label Prompt Learning for Vision-Language Models

Dec 02, 2024

Bikang Pan, Qun Li, Xiaoying Tang, Wei Huang, Zhen Fang, Feng Liu, Jingya Wang, Jingyi Yu, Ye Shi

Figure 1 for NLPrompt: Noise-Label Prompt Learning for Vision-Language Models

Figure 2 for NLPrompt: Noise-Label Prompt Learning for Vision-Language Models

Figure 3 for NLPrompt: Noise-Label Prompt Learning for Vision-Language Models

Figure 4 for NLPrompt: Noise-Label Prompt Learning for Vision-Language Models

Abstract:The emergence of vision-language foundation models, such as CLIP, has revolutionized image-text representation, enabling a broad range of applications via prompt learning. Despite its promise, real-world datasets often contain noisy labels that can degrade prompt learning performance. In this paper, we demonstrate that using mean absolute error (MAE) loss in prompt learning, named PromptMAE, significantly enhances robustness against noisy labels while maintaining high accuracy. Though MAE is straightforward and recognized for its robustness, it is rarely used in noisy-label learning due to its slow convergence and poor performance outside prompt learning scenarios. To elucidate the robustness of PromptMAE, we leverage feature learning theory to show that MAE can suppress the influence of noisy samples, thereby improving the signal-to-noise ratio and enhancing overall robustness. Additionally, we introduce PromptOT, a prompt-based optimal transport data purification method to enhance the robustness further. PromptOT employs text encoder representations in vision-language models as prototypes to construct an optimal transportation matrix. This matrix effectively partitions datasets into clean and noisy subsets, allowing for the application of cross-entropy loss to the clean subset and MAE loss to the noisy subset. Our Noise-Label Prompt Learning method, named NLPrompt, offers a simple and efficient approach that leverages the expressive representation and precise alignment capabilities of vision-language models for robust prompt learning. We validate NLPrompt through extensive experiments across various noise settings, demonstrating significant performance improvements.

Via

Access Paper or Ask Questions

Revisit Non-parametric Two-sample Testing as a Semi-supervised Learning Problem

Nov 30, 2024

Xunye Tian, Liuhua Peng, Zhijian Zhou, Mingming Gong, Feng Liu

Figure 1 for Revisit Non-parametric Two-sample Testing as a Semi-supervised Learning Problem

Figure 2 for Revisit Non-parametric Two-sample Testing as a Semi-supervised Learning Problem

Figure 3 for Revisit Non-parametric Two-sample Testing as a Semi-supervised Learning Problem

Figure 4 for Revisit Non-parametric Two-sample Testing as a Semi-supervised Learning Problem

Abstract:Learning effective data representations is crucial in answering if two samples X and Y are from the same distribution (a.k.a. the non-parametric two-sample testing problem), which can be categorized into: i) learning discriminative representations (DRs) that distinguish between two samples in a supervised-learning paradigm, and ii) learning inherent representations (IRs) focusing on data's inherent features in an unsupervised-learning paradigm. However, both paradigms have issues: learning DRs reduces the data points available for the two-sample testing phase, and learning purely IRs misses discriminative cues. To mitigate both issues, we propose a novel perspective to consider non-parametric two-sample testing as a semi-supervised learning (SSL) problem, introducing the SSL-based Classifier Two-Sample Test (SSL-C2ST) framework. While a straightforward implementation of SSL-C2ST might directly use existing state-of-the-art (SOTA) SSL methods to train a classifier with labeled data (with sample indexes X or Y) and unlabeled data (the remaining ones in the two samples), conventional two-sample testing data often exhibits substantial overlap between samples and violates SSL methods' assumptions, resulting in low test power. Therefore, we propose a two-step approach: first, learn IRs using all data, then fine-tune IRs with only labelled data to learn DRs, which can both utilize information from whole dataset and adapt the discriminative power to the given data. Extensive experiments and theoretical analysis demonstrate that SSL-C2ST outperforms traditional C2ST by effectively leveraging unlabeled data. We also offer a stronger empirically designed test achieving the SOTA performance in many two-sample testing datasets.

Via

Access Paper or Ask Questions

Automatic Differentiation-based Full Waveform Inversion with Flexible Workflows

Nov 30, 2024

Feng Liu, Haipeng Li, Guangyuan Zou, Junlun Li

Figure 1 for Automatic Differentiation-based Full Waveform Inversion with Flexible Workflows

Figure 2 for Automatic Differentiation-based Full Waveform Inversion with Flexible Workflows

Figure 3 for Automatic Differentiation-based Full Waveform Inversion with Flexible Workflows

Figure 4 for Automatic Differentiation-based Full Waveform Inversion with Flexible Workflows

Abstract:Full waveform inversion (FWI) is able to construct high-resolution subsurface models by iteratively minimizing discrepancies between observed and simulated seismic data. However, its implementation can be rather involved for complex wave equations, objective functions, or regularization. Recently, automatic differentiation (AD) has proven to be effective in simplifying solutions of various inverse problems, including FWI. In this study, we present an open-source AD-based FWI framework (ADFWI), which is designed to simplify the design, development, and evaluation of novel approaches in FWI with flexibility. The AD-based framework not only includes forword modeling and associated gradient computations for wave equations in various types of media from isotropic acoustic to vertically or horizontally transverse isotropic elastic, but also incorporates a suite of objective functions, regularization techniques, and optimization algorithms. By leveraging state-of-the-art AD, objective functions such as soft dynamic time warping and Wasserstein distance, which are difficult to apply in traditional FWI are also easily integrated into ADFWI. In addition, ADFWI is integrated with deep learning for implicit model reparameterization via neural networks, which not only introduces learned regularization but also allows rapid estimation of uncertainty through dropout. To manage high memory demands in large-scale inversion associated with AD, the proposed framework adopts strategies such as mini-batch and checkpointing. Through comprehensive evaluations, we demonstrate the novelty, practicality and robustness of ADFWI, which can be used to address challenges in FWI and as a workbench for prompt experiments and the development of new inversion strategies.

* Manuscript including 14 pages supplement. Code link: https://github.com/liufeng2317/ADFWI

Via

Access Paper or Ask Questions

Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model

Nov 28, 2024

Feng Liu, Shiwei Zhang, Xiaofeng Wang, Yujie Wei, Haonan Qiu, Yuzhong Zhao, Yingya Zhang, Qixiang Ye, Fang Wan

Figure 1 for Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model

Figure 2 for Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model

Figure 3 for Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model

Figure 4 for Timestep Embedding Tells: It's Time to Cache for Video Diffusion Model

Abstract:As a fundamental backbone for video generation, diffusion models are challenged by low inference speed due to the sequential nature of denoising. Previous methods speed up the models by caching and reusing model outputs at uniformly selected timesteps. However, such a strategy neglects the fact that differences among model outputs are not uniform across timesteps, which hinders selecting the appropriate model outputs to cache, leading to a poor balance between inference efficiency and visual quality. In this study, we introduce Timestep Embedding Aware Cache (TeaCache), a training-free caching approach that estimates and leverages the fluctuating differences among model outputs across timesteps. Rather than directly using the time-consuming model outputs, TeaCache focuses on model inputs, which have a strong correlation with the modeloutputs while incurring negligible computational cost. TeaCache first modulates the noisy inputs using the timestep embeddings to ensure their differences better approximating those of model outputs. TeaCache then introduces a rescaling strategy to refine the estimated differences and utilizes them to indicate output caching. Experiments show that TeaCache achieves up to 4.41x acceleration over Open-Sora-Plan with negligible (-0.07% Vbench score) degradation of visual quality.

* Project: https://liewfeng.github.io/TeaCache

Via

Access Paper or Ask Questions

DualCast: Disentangling Aperiodic Events from Traffic Series with a Dual-Branch Model

Nov 27, 2024

Xinyu Su, Feng Liu, Yanchuan Chang, Egemen Tanin, Majid Sarvi, Jianzhong Qi

Figure 1 for DualCast: Disentangling Aperiodic Events from Traffic Series with a Dual-Branch Model

Figure 2 for DualCast: Disentangling Aperiodic Events from Traffic Series with a Dual-Branch Model

Figure 3 for DualCast: Disentangling Aperiodic Events from Traffic Series with a Dual-Branch Model

Figure 4 for DualCast: Disentangling Aperiodic Events from Traffic Series with a Dual-Branch Model

Abstract:Traffic forecasting is an important problem in the operation and optimisation of transportation systems. State-of-the-art solutions train machine learning models by minimising the mean forecasting errors on the training data. The trained models often favour periodic events instead of aperiodic ones in their prediction results, as periodic events often prevail in the training data. While offering critical optimisation opportunities, aperiodic events such as traffic incidents may be missed by the existing models. To address this issue, we propose DualCast -- a model framework to enhance the learning capability of traffic forecasting models, especially for aperiodic events. DualCast takes a dual-branch architecture, to disentangle traffic signals into two types, one reflecting intrinsic {spatial-temporal} patterns and the other reflecting external environment contexts including aperiodic events. We further propose a cross-time attention mechanism, to capture high-order spatial-temporal relationships from both periodic and aperiodic patterns. DualCast is versatile. We integrate it with recent traffic forecasting models, consistently reducing their forecasting errors by up to 9.6% on multiple real datasets.

Via

Access Paper or Ask Questions