Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Li Li

University of Southern California

Partition Map-Based Fast Block Partitioning for VVC Inter Coding

Apr 25, 2025

Xinmin Feng, Zhuoyuan Li, Li Li, Dong Liu, Feng Wu

Abstract:Among the new techniques of Versatile Video Coding (VVC), the quadtree with nested multi-type tree (QT+MTT) block structure yields significant coding gains by providing more flexible block partitioning patterns. However, the recursive partition search in the VVC encoder increases the encoder complexity substantially. To address this issue, we propose a partition map-based algorithm to pursue fast block partitioning in inter coding. Based on our previous work on partition map-based methods for intra coding, we analyze the characteristics of VVC inter coding, and thus improve the partition map by incorporating an MTT mask for early termination. Next, we develop a neural network that uses both spatial and temporal features to predict the partition map. It consists of several special designs including stacked top-down and bottom-up processing, quantization parameter modulation layers, and partitioning-adaptive warping. Furthermore, we present a dual-threshold decision scheme to achieve a fine-grained trade-off between complexity reduction and rate-distortion (RD) performance loss. The experimental results demonstrate that the proposed method achieves an average 51.30% encoding time saving with a 2.12% Bjontegaard Delta Bit Rate (BDBR) under the random access configuration.

* 23 pages, 26 figures. Project page: https://github.com/ustc-ivclab/IPM

Via

Access Paper or Ask Questions

Privacy Protection Against Personalized Text-to-Image Synthesis via Cross-image Consistency Constraints

Apr 17, 2025

Guanyu Wang, Kailong Wang, Yihao Huang, Mingyi Zhou, Zhang Qing cnwatcher, Geguang Pu, Li Li

Abstract:The rapid advancement of diffusion models and personalization techniques has made it possible to recreate individual portraits from just a few publicly available images. While such capabilities empower various creative applications, they also introduce serious privacy concerns, as adversaries can exploit them to generate highly realistic impersonations. To counter these threats, anti-personalization methods have been proposed, which add adversarial perturbations to published images to disrupt the training of personalization models. However, existing approaches largely overlook the intrinsic multi-image nature of personalization and instead adopt a naive strategy of applying perturbations independently, as commonly done in single-image settings. This neglects the opportunity to leverage inter-image relationships for stronger privacy protection. Therefore, we advocate for a group-level perspective on privacy protection against personalization. Specifically, we introduce Cross-image Anti-Personalization (CAP), a novel framework that enhances resistance to personalization by enforcing style consistency across perturbed images. Furthermore, we develop a dynamic ratio adjustment strategy that adaptively balances the impact of the consistency loss throughout the attack iterations. Extensive experiments on the classical CelebHQ and VGGFace2 benchmarks show that CAP substantially improves existing methods.

Via

Access Paper or Ask Questions

DropoutGS: Dropping Out Gaussians for Better Sparse-view Rendering

Apr 13, 2025

Yexing Xu, Longguang Wang, Minglin Chen, Sheng Ao, Li Li, Yulan Guo

Figure 1 for DropoutGS: Dropping Out Gaussians for Better Sparse-view Rendering

Figure 2 for DropoutGS: Dropping Out Gaussians for Better Sparse-view Rendering

Figure 3 for DropoutGS: Dropping Out Gaussians for Better Sparse-view Rendering

Figure 4 for DropoutGS: Dropping Out Gaussians for Better Sparse-view Rendering

Abstract:Although 3D Gaussian Splatting (3DGS) has demonstrated promising results in novel view synthesis, its performance degrades dramatically with sparse inputs and generates undesirable artifacts. As the number of training views decreases, the novel view synthesis task degrades to a highly under-determined problem such that existing methods suffer from the notorious overfitting issue. Interestingly, we observe that models with fewer Gaussian primitives exhibit less overfitting under sparse inputs. Inspired by this observation, we propose a Random Dropout Regularization (RDR) to exploit the advantages of low-complexity models to alleviate overfitting. In addition, to remedy the lack of high-frequency details for these models, an Edge-guided Splitting Strategy (ESS) is developed. With these two techniques, our method (termed DropoutGS) provides a simple yet effective plug-in approach to improve the generalization performance of existing 3DGS methods. Extensive experiments show that our DropoutGS produces state-of-the-art performance under sparse views on benchmark datasets including Blender, LLFF, and DTU. The project page is at: https://xuyx55.github.io/DropoutGS/.

* Accepted by CVPR 2025

Via

Access Paper or Ask Questions

Driving-RAG: Driving Scenarios Embedding, Search, and RAG Applications

Apr 06, 2025

Cheng Chang, Jingwei Ge, Jiazhe Guo, Zelin Guo, Binghong Jiang, Li Li

Figure 1 for Driving-RAG: Driving Scenarios Embedding, Search, and RAG Applications

Figure 2 for Driving-RAG: Driving Scenarios Embedding, Search, and RAG Applications

Figure 3 for Driving-RAG: Driving Scenarios Embedding, Search, and RAG Applications

Figure 4 for Driving-RAG: Driving Scenarios Embedding, Search, and RAG Applications

Abstract:Driving scenario data play an increasingly vital role in the development of intelligent vehicles and autonomous driving. Accurate and efficient scenario data search is critical for both online vehicle decision-making and planning, and offline scenario generation and simulations, as it allows for leveraging the scenario experiences to improve the overall performance. Especially with the application of large language models (LLMs) and Retrieval-Augmented-Generation (RAG) systems in autonomous driving, urgent requirements are put forward. In this paper, we introduce the Driving-RAG framework to address the challenges of efficient scenario data embedding, search, and applications for RAG systems. Our embedding model aligns fundamental scenario information and scenario distance metrics in the vector space. The typical scenario sampling method combined with hierarchical navigable small world can perform efficient scenario vector search to achieve high efficiency without sacrificing accuracy. In addition, the reorganization mechanism by graph knowledge enhances the relevance to the prompt scenarios and augment LLM generation. We demonstrate the effectiveness of the proposed framework on typical trajectory planning task for complex interactive scenarios such as ramps and intersections, showcasing its advantages for RAG applications.

Via

Access Paper or Ask Questions

Towards Understanding the Optimization Mechanisms in Deep Learning

Mar 29, 2025

Binchuan Qi, Wei Gong, Li Li

Abstract:In this paper, we adopt a probability distribution estimation perspective to explore the optimization mechanisms of supervised classification using deep neural networks. We demonstrate that, when employing the Fenchel-Young loss, despite the non-convex nature of the fitting error with respect to the model's parameters, global optimal solutions can be approximated by simultaneously minimizing both the gradient norm and the structural error. The former can be controlled through gradient descent algorithms. For the latter, we prove that it can be managed by increasing the number of parameters and ensuring parameter independence, thereby providing theoretical insights into mechanisms such as over-parameterization and random initialization. Ultimately, the paper validates the key conclusions of the proposed method through empirical results, illustrating its practical effectiveness.

Via

Access Paper or Ask Questions

Multi-Scale Invertible Neural Network for Wide-Range Variable-Rate Learned Image Compression

Mar 27, 2025

Hanyue Tu, Siqi Wu, Li Li, Wengang Zhou, Houqiang Li

Figure 1 for Multi-Scale Invertible Neural Network for Wide-Range Variable-Rate Learned Image Compression

Figure 2 for Multi-Scale Invertible Neural Network for Wide-Range Variable-Rate Learned Image Compression

Figure 3 for Multi-Scale Invertible Neural Network for Wide-Range Variable-Rate Learned Image Compression

Figure 4 for Multi-Scale Invertible Neural Network for Wide-Range Variable-Rate Learned Image Compression

Abstract:Autoencoder-based structures have dominated recent learned image compression methods. However, the inherent information loss associated with autoencoders limits their rate-distortion performance at high bit rates and restricts their flexibility of rate adaptation. In this paper, we present a variable-rate image compression model based on invertible transform to overcome these limitations. Specifically, we design a lightweight multi-scale invertible neural network, which bijectively maps the input image into multi-scale latent representations. To improve the compression efficiency, a multi-scale spatial-channel context model with extended gain units is devised to estimate the entropy of the latent representation from high to low levels. Experimental results demonstrate that the proposed method achieves state-of-the-art performance compared to existing variable-rate methods, and remains competitive with recent multi-model approaches. Notably, our method is the first learned image compression solution that outperforms VVC across a very wide range of bit rates using a single model, especially at high bit rates.The source code is available at \href{https://github.com/hytu99/MSINN-VRLIC}{https://github.com/hytu99/MSINN-VRLIC}.

* Accepted to IEEE Transactions on Multimedia 2025

Via

Access Paper or Ask Questions

Equivariant Image Modeling

Mar 24, 2025

Ruixiao Dong, Mengde Xu, Zigang Geng, Li Li, Han Hu, Shuyang Gu

Abstract:Current generative models, such as autoregressive and diffusion approaches, decompose high-dimensional data distribution learning into a series of simpler subtasks. However, inherent conflicts arise during the joint optimization of these subtasks, and existing solutions fail to resolve such conflicts without sacrificing efficiency or scalability. We propose a novel equivariant image modeling framework that inherently aligns optimization targets across subtasks by leveraging the translation invariance of natural visual signals. Our method introduces (1) column-wise tokenization which enhances translational symmetry along the horizontal axis, and (2) windowed causal attention which enforces consistent contextual relationships across positions. Evaluated on class-conditioned ImageNet generation at 256x256 resolution, our approach achieves performance comparable to state-of-the-art AR models while using fewer computational resources. Systematic analysis demonstrates that enhanced equivariance reduces inter-task conflicts, significantly improving zero-shot generalization and enabling ultra-long image synthesis. This work establishes the first framework for task-aligned decomposition in generative modeling, offering insights into efficient parameter sharing and conflict-free optimization. The code and models are publicly available at https://github.com/drx-code/EquivariantModeling.

Via

Access Paper or Ask Questions

TFDM: Time-Variant Frequency-Based Point Cloud Diffusion with Mamba

Mar 17, 2025

Jiaxu Liu, Li Li, Hubert P. H. Shum, Toby P. Breckon

Figure 1 for TFDM: Time-Variant Frequency-Based Point Cloud Diffusion with Mamba

Figure 2 for TFDM: Time-Variant Frequency-Based Point Cloud Diffusion with Mamba

Figure 3 for TFDM: Time-Variant Frequency-Based Point Cloud Diffusion with Mamba

Figure 4 for TFDM: Time-Variant Frequency-Based Point Cloud Diffusion with Mamba

Abstract:Diffusion models currently demonstrate impressive performance over various generative tasks. Recent work on image diffusion highlights the strong capabilities of Mamba (state space models) due to its efficient handling of long-range dependencies and sequential data modeling. Unfortunately, joint consideration of state space models with 3D point cloud generation remains limited. To harness the powerful capabilities of the Mamba model for 3D point cloud generation, we propose a novel diffusion framework containing dual latent Mamba block (DM-Block) and a time-variant frequency encoder (TF-Encoder). The DM-Block apply a space-filling curve to reorder points into sequences suitable for Mamba state-space modeling, while operating in a latent space to mitigate the computational overhead that arises from direct 3D data processing. Meanwhile, the TF-Encoder takes advantage of the ability of the diffusion model to refine fine details in later recovery stages by prioritizing key points within the U-Net architecture. This frequency-based mechanism ensures enhanced detail quality in the final stages of generation. Experimental results on the ShapeNet-v2 dataset demonstrate that our method achieves state-of-the-art performance (ShapeNet-v2: 0.14\% on 1-NNA-Abs50 EMD and 57.90\% on COV EMD) on certain metrics for specific categories while reducing computational parameters and inference time by up to 10$\times$ and 9$\times$, respectively. Source code is available in Supplementary Materials and will be released upon accpetance.

Via

Access Paper or Ask Questions

A Survey on Federated Fine-tuning of Large Language Models

Mar 15, 2025

Yebo Wu, Chunlin Tian, Jingguang Li, He Sun, Kahou Tam, Li Li, Chengzhong Xu

Figure 1 for A Survey on Federated Fine-tuning of Large Language Models

Figure 2 for A Survey on Federated Fine-tuning of Large Language Models

Figure 3 for A Survey on Federated Fine-tuning of Large Language Models

Figure 4 for A Survey on Federated Fine-tuning of Large Language Models

Abstract:Large Language Models (LLMs) have achieved remarkable success across a wide range of tasks, with fine-tuning playing a pivotal role in adapting them to specific downstream applications. Federated Learning (FL) offers a promising approach that enables collaborative model adaptation while ensuring data privacy, i.e., FedLLM. In this survey, we provide a systematic and thorough review of the integration of LLMs with FL. Specifically, we first trace the historical evolution of both LLMs and FL, while summarizing relevant prior surveys. We then present an in-depth analysis of the fundamental challenges encountered in deploying FedLLM. Following this, we conduct an extensive study of existing parameter-efficient fine-tuning (PEFT) methods and explore their applicability in FL. Furthermore, we introduce a comprehensive evaluation benchmark to rigorously assess FedLLM performance and discuss its diverse real-world applications across multiple domains. Finally, we identify critical open challenges and outline promising research directions to drive future advancements in FedLLM. We maintain an active \href{https://github.com/Clin0212/Awesome-Federated-LLM-Learning}{GitHub repository} tracking cutting-edge advancements. This survey serves as a foundational resource for researchers and practitioners, offering insights into the evolving landscape of federated fine-tuning for LLMs while guiding future innovations in privacy-preserving AI.

Via

Access Paper or Ask Questions

DeFine: A Decomposed and Fine-Grained Annotated Dataset for Long-form Article Generation

Mar 10, 2025

Ming Wang, Fang Wang, Minghao Hu, Li He, Haiyang Wang, Jun Zhang, Tianwei Yan, Li Li, Zhunchen Luo, Wei Luo(+2 more)

Figure 1 for DeFine: A Decomposed and Fine-Grained Annotated Dataset for Long-form Article Generation

Figure 2 for DeFine: A Decomposed and Fine-Grained Annotated Dataset for Long-form Article Generation

Figure 3 for DeFine: A Decomposed and Fine-Grained Annotated Dataset for Long-form Article Generation

Figure 4 for DeFine: A Decomposed and Fine-Grained Annotated Dataset for Long-form Article Generation

Abstract:Long-form article generation (LFAG) presents challenges such as maintaining logical consistency, comprehensive topic coverage, and narrative coherence across extended articles. Existing datasets often lack both the hierarchical structure and fine-grained annotation needed to effectively decompose tasks, resulting in shallow, disorganized article generation. To address these limitations, we introduce DeFine, a Decomposed and Fine-grained annotated dataset for long-form article generation. DeFine is characterized by its hierarchical decomposition strategy and the integration of domain-specific knowledge with multi-level annotations, ensuring granular control and enhanced depth in article generation. To construct the dataset, a multi-agent collaborative pipeline is proposed, which systematically segments the generation process into four parts: Data Miner, Cite Retreiver, Q&A Annotator and Data Cleaner. To validate the effectiveness of DeFine, we designed and tested three LFAG baselines: the web retrieval, the local retrieval, and the grounded reference. We fine-tuned the Qwen2-7b-Instruct model using the DeFine training dataset. The experimental results showed significant improvements in text quality, specifically in topic coverage, depth of information, and content fidelity. Our dataset publicly available to facilitate future research.

Via

Access Paper or Ask Questions