Xin Lu

InfiniteYou: Flexible Photo Recrafting While Preserving Your Identity

Mar 20, 2025

Liming Jiang, Qing Yan, Yumin Jia, Zichuan Liu, Hao Kang, Xin Lu

Abstract: Achieving flexible and high-fidelity identity-preserved image generation remains formidable, particularly with advanced Diffusion Transformers (DiTs) like FLUX. We introduce InfiniteYou (InfU), one of the earliest robust frameworks leveraging DiTs for this task. InfU addresses significant issues of existing methods, such as insufficient identity similarity, poor text-image alignment, and low generation quality and aesthetics. Central to InfU is InfuseNet, a component that injects identity features into the DiT base model via residual connections, enhancing identity similarity while maintaining generation capabilities. A multi-stage training strategy, including pretraining and supervised fine-tuning (SFT) with synthetic single-person-multiple-sample (SPMS) data, further improves text-image alignment, ameliorates image quality, and alleviates face copy-pasting. Extensive experiments demonstrate that InfU achieves state-of-the-art performance, surpassing existing baselines. In addition, the plug-and-play design of InfU ensures compatibility with various existing methods, offering a valuable contribution to the broader community.

* Project page: https://bytedance.github.io/InfiniteYou/ Code and model: https://github.com/bytedance/InfiniteYou

Prediction Interval Construction Method for Electricity Prices

Jan 14, 2025

Xin Lu

Abstract: Accurate prediction of electricity prices plays an essential role in the electricity market. To reflect the uncertainty of electricity prices, price intervals are predicted. This paper proposes a novel prediction interval construction method. A conditional generative adversarial network is first presented to generate electricity price scenarios, from which the prediction intervals can be constructed. Then, the generated scenarios are stacked to obtain probability densities, which can be applied to accurately reflect the uncertainty of electricity prices. Furthermore, a reinforced prediction mechanism based on the volatility level of weather factors is introduced to address price spikes and volatile prices. A case study is conducted to verify the effectiveness of the proposed method. The method can also provide the probability density of each price scenario within the prediction interval, and its reinforced prediction mechanism makes it better suited to handling volatile prices and price spikes.
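
As a rough illustration of the interval construction step described in this abstract, the sketch below takes a set of generated price scenarios for a single hour, derives a central prediction interval from their empirical quantiles, and stacks the scenarios into a histogram density. The conditional GAN itself is not modeled. The 90% coverage level, the bin count, and all function names are assumptions made for illustration, not details from the paper.

```python
from typing import List, Tuple


def empirical_quantile(sorted_values: List[float], q: float) -> float:
    """Linear-interpolation quantile of an already sorted sample."""
    if not sorted_values:
        raise ValueError("need at least one scenario")
    pos = q * (len(sorted_values) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1 - frac) + sorted_values[upper] * frac


def prediction_interval(scenarios: List[float], coverage: float = 0.9) -> Tuple[float, float]:
    """Central interval that covers `coverage` of the generated scenarios."""
    ordered = sorted(scenarios)
    tail = (1.0 - coverage) / 2.0
    return empirical_quantile(ordered, tail), empirical_quantile(ordered, 1.0 - tail)


def stacked_density(scenarios: List[float], n_bins: int = 5) -> List[Tuple[float, float, float]]:
    """Histogram density: (bin_start, bin_end, density) for each bin."""
    lo, hi = min(scenarios), max(scenarios)
    width = (hi - lo) / n_bins
    if width == 0:
        width = 1.0  # all scenarios identical; use a unit-width bin
    counts = [0] * n_bins
    for price in scenarios:
        idx = min(int((price - lo) / width), n_bins - 1)
        counts[idx] += 1
    total = len(scenarios)
    return [
        (lo + i * width, lo + (i + 1) * width, count / (total * width))
        for i, count in enumerate(counts)
    ]
```

Calling `prediction_interval(scenarios)` on the scenarios for each hour gives one interval per hour, and `stacked_density` shows how probability mass is spread inside that interval.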

Separate the Wheat from the Chaff: A Post-Hoc Approach to Safety Re-Alignment for Fine-Tuned Language Models

Dec 15, 2024

Di Wu, Xin Lu, Yanyan Zhao, Bing Qin

Abstract: Although large language models (LLMs) achieve effective safety alignment at the time of release, they still face various safety challenges. A key issue is that fine-tuning often compromises the safety alignment of LLMs. To address this issue, we propose a method named IRR (Identify, Remove, and Recalibrate for Safety Realignment) that performs safety realignment for LLMs. The core of IRR is to identify and remove unsafe delta parameters from the fine-tuned models, while recalibrating the retained ones. We evaluate the effectiveness of IRR across various datasets, including both full fine-tuning and LoRA methods. Our results demonstrate that IRR significantly enhances the safety performance of fine-tuned models on safety benchmarks, such as harmful queries and jailbreak attacks, while maintaining their performance on downstream tasks. The source code is available at: https://anonymous.4open.science/r/IRR-BD4F.

* 14 pages, 12 figures
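
The idea of removing some delta parameters and recalibrating the rest can be sketched on a flat list of weights. This is only a toy under stated assumptions: the `unsafe_mask` is taken as given (how IRR identifies unsafe parameters is not described in the abstract), and the single `rescale` factor is a hypothetical stand-in for the paper's recalibration step.

```python
from typing import List


def realign(
    base: List[float],
    finetuned: List[float],
    unsafe_mask: List[bool],
    rescale: float = 1.0,
) -> List[float]:
    """Drop flagged delta parameters and rescale the retained ones.

    `unsafe_mask` and `rescale` are illustrative assumptions, not the
    identification or recalibration rules used by IRR.
    """
    if not (len(base) == len(finetuned) == len(unsafe_mask)):
        raise ValueError("all inputs must have the same length")
    realigned = []
    for w_base, w_ft, unsafe in zip(base, finetuned, unsafe_mask):
        delta = w_ft - w_base  # delta parameter introduced by fine-tuning
        kept = 0.0 if unsafe else delta * rescale
        realigned.append(w_base + kept)
    return realigned
```

Parameters flagged as unsafe fall back to their base-model values, while the remaining deltas are kept, optionally rescaled.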

Prompt-Guided Mask Proposal for Two-Stage Open-Vocabulary Segmentation

Dec 13, 2024

Yu-Jhe Li, Xinyang Zhang, Kun Wan, Lantao Yu, Ajinkya Kale, Xin Lu

Abstract: We tackle the challenge of open-vocabulary segmentation, where we need to identify objects from a wide range of categories in different environments, using text prompts as our input. To overcome this challenge, existing methods often use multi-modal models like CLIP, which combine image and text features in a shared embedding space to bridge the gap between limited and extensive vocabulary recognition, resulting in a two-stage approach: in the first stage, a mask generator takes an input image to generate mask proposals, and in the second stage, the target mask is picked based on the query. However, the expected target mask may not exist in the generated mask proposals, which leads to an unexpected output mask. In our work, we propose a novel approach named Prompt-guided Mask Proposal (PMP) where the mask generator takes the input text prompts and generates masks guided by these prompts. Compared with mask proposals generated without input prompts, masks generated by PMP are better aligned with the input prompts. To realize PMP, we designed a cross-attention mechanism between text tokens and query tokens which is capable of generating prompt-guided mask proposals after each decoding. We combined our PMP with several existing works employing a query-based segmentation backbone, and the experiments on five benchmark datasets demonstrate the effectiveness of this approach, showcasing significant improvements over the current two-stage models (1% to 3% absolute performance gain in terms of mIOU). The steady improvement in performance across these benchmarks indicates the effective generalization of our proposed lightweight prompt-aware method.

* 17 pages. Work done during 2023 summer and has been released
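
The cross-attention between query tokens and text tokens mentioned in the abstract can be illustrated with a generic single-head scaled dot-product attention. This is a textbook sketch, not the PMP module: it omits learned projections, and the residual update and the function names are assumptions.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attend(query_tokens: List[Vector], text_tokens: List[Vector]) -> List[Vector]:
    """Each query token attends over the text tokens (keys and values).

    The attended text feature is added back to the query (residual update),
    so the queries become prompt-aware before masks are decoded from them.
    """
    dim = len(text_tokens[0])
    updated = []
    for q in query_tokens:
        scores = [
            sum(a * b for a, b in zip(q, t)) / math.sqrt(dim) for t in text_tokens
        ]
        weights = softmax(scores)
        attended = [
            sum(w * t[i] for w, t in zip(weights, text_tokens)) for i in range(dim)
        ]
        updated.append([qi + ai for qi, ai in zip(q, attended)])
    return updated
```

In a query-based segmentation decoder, an update of this kind would sit between decoding layers so that the mask queries are conditioned on the prompt.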

Multi-object Tracking by Detection and Query: an efficient end-to-end manner

Nov 09, 2024

Shukun Jia, Yichao Cao, Feng Yang, Xin Lu, Xiaobo Lu

Abstract: Multi-object tracking is advancing through two dominant paradigms: traditional tracking by detection and newly emerging tracking by query. In this work, we fuse them and propose the tracking-by-detection-and-query paradigm, which is achieved by a Learnable Associator. Specifically, the basic information interaction module and the content-position alignment module are proposed for thorough information Interaction among object queries. Tracking results are directly Decoded from these queries. Hence, we name the method LAID. Compared to tracking-by-query models, LAID achieves competitive tracking accuracy with notably higher training efficiency. With regard to tracking-by-detection methods, experimental results on DanceTrack show that LAID significantly surpasses the state-of-the-art heuristic method by 3.9% on the HOTA metric and 6.1% on the IDF1 metric. On SportsMOT, LAID also achieves the best score on the HOTA metric. By combining low training cost, strong tracking capabilities, and an elegant end-to-end approach, LAID presents a forward-looking direction for the field.

PEAR: Position-Embedding-Agnostic Attention Re-weighting Enhances Retrieval-Augmented Generation with Zero Inference Overhead

Sep 29, 2024

Tao Tan, Yining Qian, Ang Lv, Hongzhan Lin, Songhao Wu, Yongbo Wang, Feng Wang, Jingtong Wu, Xin Lu, Rui Yan

Abstract: Large language models (LLMs) enhanced with retrieval-augmented generation (RAG) have introduced a new paradigm for web search. However, the limited context awareness of LLMs degrades their performance on RAG tasks. Existing methods to enhance context awareness are often inefficient, incurring time or memory overhead during inference, and many are tailored to specific position embeddings. In this paper, we propose Position-Embedding-Agnostic attention Re-weighting (PEAR), which enhances the context awareness of LLMs with zero inference overhead. Specifically, on a proxy task focused on context copying, we first detect heads that suppress the models' context awareness, thereby diminishing RAG performance. To weaken the impact of these heads, we re-weight their outputs with learnable coefficients. The LLM (with frozen parameters) is optimized by adjusting these coefficients to minimize loss on the proxy task. As a result, the coefficients are optimized to values less than one, thereby reducing their tendency to suppress RAG performance. During inference, the optimized coefficients are fixed to re-weight these heads, regardless of the specific task at hand. Our proposed PEAR offers two major advantages over previous approaches: (1) It introduces zero additional inference overhead in terms of memory usage or inference time, while outperforming competitive baselines in accuracy and efficiency across various RAG tasks. (2) It is independent of position embedding algorithms, ensuring broader applicability.

* preprint
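
The re-weighting step described in the abstract amounts to scaling the outputs of selected attention heads by fixed coefficients. The sketch below shows only that scaling on plain lists. The head indices, the coefficient values, and the function name are hypothetical, and neither head detection nor coefficient training is modeled.

```python
from typing import Dict, List


def reweight_heads(
    head_outputs: List[List[float]],
    coefficients: Dict[int, float],
) -> List[List[float]]:
    """Scale each listed head's output by its coefficient.

    Heads without an entry in `coefficients` are left untouched (factor 1.0).
    """
    return [
        [coefficients.get(head, 1.0) * value for value in output]
        for head, output in enumerate(head_outputs)
    ]
```

For example, `reweight_heads(outputs, {1: 0.5})` halves the contribution of head 1 and leaves every other head unchanged.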

GSpect: Spectral Filtering for Cross-Scale Graph Classification

Aug 31, 2024

Xiaoyu Zhang, Wenchuan Yang, Jiawei Feng, Bitao Dai, Tianci Bu, Xin Lu

Abstract: Identifying common structures forms the basis for networked systems design and optimization. However, real structures represented by graphs are often of varying sizes, leading to the low accuracy of traditional graph classification methods. These graphs are called cross-scale graphs. To overcome this limitation, in this study, we propose GSpect, an advanced spectral graph filtering model for cross-scale graph classification tasks. Unlike other methods, we use graph wavelet neural networks for the convolution layer of the model, which aggregates multi-scale messages to generate graph representations. We design a spectral-pooling layer which aggregates nodes to one node to reduce the cross-scale graphs to the same size. We collect and construct the cross-scale benchmark data set, MSG (Multi Scale Graphs). Experiments reveal that, on open data sets, GSpect improves classification accuracy by 1.62% on average, and by up to 3.33% on PROTEINS. On MSG, GSpect improves classification accuracy by 15.55% on average. GSpect fills the gap in cross-scale graph classification studies and has the potential to assist application research, such as diagnosing brain disease by predicting the brain network's label and developing new drugs with molecular structures learned from their counterparts in other systems.

PriorMapNet: Enhancing Online Vectorized HD Map Construction with Priors

Aug 16, 2024

Rongxuan Wang, Xin Lu, Xiaoyang Liu, Xiaoyi Zou, Tongyi Cao, Ying Li

Abstract: Online vectorized High-Definition (HD) map construction is crucial for subsequent prediction and planning tasks in autonomous driving. Following the MapTR paradigm, recent works have made noteworthy achievements. However, reference points are randomly initialized in mainstream methods, leading to unstable matching between predictions and ground truth. To address this issue, we introduce PriorMapNet to enhance online vectorized HD map construction with priors. We propose the PPS-Decoder, which provides reference points with position and structure priors. Fitted from the map elements in the dataset, prior reference points lower the learning difficulty and achieve stable matching. Furthermore, we propose the PF-Encoder to enhance the image-to-BEV transformation with BEV feature priors. In addition, we propose the DMD cross-attention, which decouples cross-attention along multi-scale and multi-sample respectively to achieve efficiency. Our proposed PriorMapNet achieves state-of-the-art performance in the online vectorized HD map construction task on the nuScenes and Argoverse2 datasets. The code will be released publicly soon.

Intermittent Semi-working Mask: A New Masking Paradigm for LLMs

Aug 01, 2024

Mingcong Lu, Jiangcai Zhu, Wang Hao, Zheng Li, Shusheng Zhang, Kailai Shao, Chao Chen, Nan Li, Feng Wang, Xin Lu

Abstract: Multi-turn dialogues are a key interaction method between humans and Large Language Models (LLMs). As conversations extend over multiple rounds, maintaining LLMs' high generation quality and low latency is a challenge. Mainstream LLMs can be grouped into two categories based on masking strategy: causal LLMs and prefix LLMs. Several works have demonstrated that prefix LLMs tend to outperform causal ones in scenarios that heavily depend on historical context, such as multi-turn dialogues or in-context learning, thanks to their bidirectional attention on prefix sequences. However, prefix LLMs are inherently inefficient to train on multi-turn dialogue datasets. In addition, the attention mechanism of prefix LLMs makes them unable to reuse the Key-Value Cache (KV Cache) across dialogue rounds to reduce generation latency. In this paper, we propose a novel masking scheme called Intermittent Semi-working Mask (ISM) to address these problems. Specifically, we apply alternating bidirectional and unidirectional attention on queries and answers in the dialogue history. In this way, ISM maintains the high quality of prefix LLMs and the low generation latency of causal LLMs simultaneously. Extensive experiments illustrate that ISM achieves significant performance.
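
One way to picture a mask that alternates bidirectional and unidirectional attention is sketched below. It is a reading of the abstract, not the paper's exact definition: tokens inside the same query segment see each other in both directions, while everything else follows the usual causal rule. The segment encoding and the function name are assumptions.

```python
from typing import List, Tuple


def build_ism_like_mask(segments: List[Tuple[str, int]]) -> List[List[bool]]:
    """Build an attention mask where mask[i][j] is True if token i may attend to token j.

    `segments` is the dialogue as ordered (role, length) pairs, with role
    being "query" or "answer".
    """
    segment_of: List[int] = []
    roles: List[str] = []
    for seg_id, (role, length) in enumerate(segments):
        if role not in ("query", "answer"):
            raise ValueError(f"unknown role: {role}")
        segment_of.extend([seg_id] * length)
        roles.extend([role] * length)

    n = len(segment_of)
    mask = [[False] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            causal = j <= i  # unidirectional: only earlier tokens and itself
            same_query = roles[i] == "query" and segment_of[i] == segment_of[j]
            mask[i][j] = causal or same_query  # bidirectional inside a query
    return mask
```

Under this sketch, no token attends to a later segment, so the keys and values computed for earlier rounds stay valid when a new round is appended. That is the property that allows KV cache reuse.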

Enhancing context models for point cloud geometry compression with context feature residuals and multi-loss

Jul 11, 2024

Chang Sun, Hui Yuan, Shuai Li, Xin Lu, Raouf Hamzaoui

Abstract: In point cloud geometry compression, context models usually use the one-hot encoding of node occupancy as the label, and the cross-entropy between the one-hot encoding and the probability distribution predicted by the context model as the loss function. However, this approach has two main weaknesses. First, the differences between contexts of different nodes are not significant, making it difficult for the context model to accurately predict the probability distribution of node occupancy. Second, as the one-hot encoding is not the actual probability distribution of node occupancy, the cross-entropy loss function is inaccurate. To address these problems, we propose a general structure that can enhance existing context models. We introduce the context feature residuals into the context model to amplify the differences between contexts. We also add a multi-layer perceptron branch that uses the mean squared error between its output and node occupancy as a loss function to provide accurate gradients in backpropagation. We validate our method by showing that it can improve the performance of an octree-based model (OctAttention) and a voxel-based model (VoxelDNN) on the object point cloud datasets MPEG 8i and MVUB, as well as the LiDAR point cloud dataset SemanticKITTI.

* IEEE Journal on Emerging and Selected Topics in Circuits and Systems, vol. 14, no. 2, pp. 224-234, Jun. 2024
* 11 pages, 8 figures
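
The multi-loss idea in this abstract, a cross-entropy term on the context model's prediction plus a mean squared error term on an extra branch, can be written down in a few lines. The sketch rests on several assumptions: the occupancy target is represented as a vector of 0/1 bits, the two terms are combined by a weighted sum, and `mse_weight` and all function names are hypothetical.

```python
import math
from typing import List


def cross_entropy(probs: List[float], target_index: int) -> float:
    """Cross-entropy against a one-hot label at `target_index`."""
    return -math.log(probs[target_index])


def mean_squared_error(pred: List[float], target: List[float]) -> float:
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def multi_loss(
    probs: List[float],
    target_index: int,
    branch_output: List[float],
    occupancy: List[float],
    mse_weight: float = 1.0,
) -> float:
    """Weighted sum of the context-model loss and the extra-branch loss."""
    return cross_entropy(probs, target_index) + mse_weight * mean_squared_error(
        branch_output, occupancy
    )
```

Setting `mse_weight` to zero recovers the plain cross-entropy objective that the abstract describes as the usual baseline.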
