Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Bowen Xue

Stand-In: A Lightweight and Plug-and-Play Identity Control for Video Generation

Aug 12, 2025

Bowen Xue, Qixin Yan, Wenjing Wang, Hao Liu, Chen Li

Abstract:Generating high-fidelity human videos that match user-specified identities is important yet challenging in the field of generative AI. Existing methods often rely on an excessive number of training parameters and lack compatibility with other AIGC tools. In this paper, we propose Stand-In, a lightweight and plug-and-play framework for identity preservation in video generation. Specifically, we introduce a conditional image branch into the pre-trained video generation model. Identity control is achieved through restricted self-attentions with conditional position mapping, and can be learned quickly with only 2000 pairs. Despite incorporating and training just $\sim$1% additional parameters, our framework achieves excellent results in video quality and identity preservation, outperforming other full-parameter training methods. Moreover, our framework can be seamlessly integrated for other tasks, such as subject-driven video generation, pose-referenced video generation, stylization, and face swapping.

Via

Access Paper or Ask Questions

An Improved NeuMIP with Better Accuracy

Jul 19, 2023

Bowen Xue, Shuang Zhao, Henrik Wann Jensen, Zahra Montazeri

Figure 1 for An Improved NeuMIP with Better Accuracy

Figure 2 for An Improved NeuMIP with Better Accuracy

Figure 3 for An Improved NeuMIP with Better Accuracy

Figure 4 for An Improved NeuMIP with Better Accuracy

Abstract:Neural reflectance models are capable of accurately reproducing the spatially-varying appearance of many real-world materials at different scales. However, existing methods have difficulties handling highly glossy materials. To address this problem, we introduce a new neural reflectance model which, compared with existing methods, better preserves not only specular highlights but also fine-grained details. To this end, we enhance the neural network performance by encoding input data to frequency space, inspired by NeRF, to better preserve the details. Furthermore, we introduce a gradient-based loss and employ it in multiple stages, adaptive to the progress of the learning phase. Lastly, we utilize an optional extension to the decoder network using the Inception module for more accurate yet costly performance. We demonstrate the effectiveness of our method using a variety of synthetic and real examples.

Via

Access Paper or Ask Questions