Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Fengqing Zhu

EntropyGS: An Efficient Entropy Coding on 3D Gaussian Splatting

Aug 13, 2025

Yuning Huang, Jiahao Pang, Fengqing Zhu, Dong Tian

Figure 1 for EntropyGS: An Efficient Entropy Coding on 3D Gaussian Splatting

Figure 2 for EntropyGS: An Efficient Entropy Coding on 3D Gaussian Splatting

Figure 3 for EntropyGS: An Efficient Entropy Coding on 3D Gaussian Splatting

Figure 4 for EntropyGS: An Efficient Entropy Coding on 3D Gaussian Splatting

Abstract:As an emerging novel view synthesis approach, 3D Gaussian Splatting (3DGS) demonstrates fast training/rendering with superior visual quality. The two tasks of 3DGS, Gaussian creation and view rendering, are typically separated over time or devices, and thus storage/transmission and finally compression of 3DGS Gaussians become necessary. We begin with a correlation and statistical analysis of 3DGS Gaussian attributes. An inspiring finding in this work reveals that spherical harmonic AC attributes precisely follow Laplace distributions, while mixtures of Gaussian distributions can approximate rotation, scaling, and opacity. Additionally, harmonic AC attributes manifest weak correlations with other attributes except for inherited correlations from a color space. A factorized and parameterized entropy coding method, EntropyGS, is hereinafter proposed. During encoding, distribution parameters of each Gaussian attribute are estimated to assist their entropy coding. The quantization for entropy coding is adaptively performed according to Gaussian attribute types. EntropyGS demonstrates about 30x rate reduction on benchmark datasets while maintaining similar rendering quality compared to input 3DGS data, with a fast encoding and decoding time.

Via

Access Paper or Ask Questions

OpenDCVCs: A PyTorch Open Source Implementation and Performance Evaluation of the DCVC series Video Codecs

Aug 06, 2025

Yichi Zhang, Fengqing Zhu

Abstract:We present OpenDCVCs, an open-source PyTorch implementation designed to advance reproducible research in learned video compression. OpenDCVCs provides unified and training-ready implementations of four representative Deep Contextual Video Compression (DCVC) models--DCVC, DCVC with Temporal Context Modeling (DCVC-TCM), DCVC with Hybrid Entropy Modeling (DCVC-HEM), and DCVC with Diverse Contexts (DCVC-DC). While the DCVC series achieves substantial bitrate reductions over both classical codecs and advanced learned models, previous public code releases have been limited to evaluation codes, presenting significant barriers to reproducibility, benchmarking, and further development. OpenDCVCs bridges this gap by offering a comprehensive, self-contained framework that supports both end-to-end training and evaluation for all included algorithms. The implementation includes detailed documentation, evaluation protocols, and extensive benchmarking results across diverse datasets, providing a transparent and consistent foundation for comparison and extension. All code and experimental tools are publicly available at https://gitlab.com/viper-purdue/opendcvcs, empowering the community to accelerate research and foster collaboration.

Via

Access Paper or Ask Questions

Evaluating Large Multimodal Models for Nutrition Analysis: A Benchmark Enriched with Contextual Metadata

Jul 09, 2025

Bruce Coburn, Jiangpeng He, Megan E. Rollo, Satvinder S. Dhaliwal, Deborah A. Kerr, Fengqing Zhu

Abstract:Large Multimodal Models (LMMs) are increasingly applied to meal images for nutrition analysis. However, existing work primarily evaluates proprietary models, such as GPT-4. This leaves the broad range of LLMs underexplored. Additionally, the influence of integrating contextual metadata and its interaction with various reasoning modifiers remains largely uncharted. This work investigates how interpreting contextual metadata derived from GPS coordinates (converted to location/venue type), timestamps (transformed into meal/day type), and the food items present can enhance LMM performance in estimating key nutritional values. These values include calories, macronutrients (protein, carbohydrates, fat), and portion sizes. We also introduce ACETADA, a new food-image dataset slated for public release. This open dataset provides nutrition information verified by the dietitian and serves as the foundation for our analysis. Our evaluation across eight LMMs (four open-weight and four closed-weight) first establishes the benefit of contextual metadata integration over straightforward prompting with images alone. We then demonstrate how this incorporation of contextual information enhances the efficacy of reasoning modifiers, such as Chain-of-Thought, Multimodal Chain-of-Thought, Scale Hint, Few-Shot, and Expert Persona. Empirical results show that integrating metadata intelligently, when applied through straightforward prompting strategies, can significantly reduce the Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE) in predicted nutritional values. This work highlights the potential of context-aware LMMs for improved nutrition analysis.

Via

Access Paper or Ask Questions

CL-LoRA: Continual Low-Rank Adaptation for Rehearsal-Free Class-Incremental Learning

May 30, 2025

Jiangpeng He, Zhihao Duan, Fengqing Zhu

Abstract:Class-Incremental Learning (CIL) aims to learn new classes sequentially while retaining the knowledge of previously learned classes. Recently, pre-trained models (PTMs) combined with parameter-efficient fine-tuning (PEFT) have shown remarkable performance in rehearsal-free CIL without requiring exemplars from previous tasks. However, existing adapter-based methods, which incorporate lightweight learnable modules into PTMs for CIL, create new adapters for each new task, leading to both parameter redundancy and failure to leverage shared knowledge across tasks. In this work, we propose ContinuaL Low-Rank Adaptation (CL-LoRA), which introduces a novel dual-adapter architecture combining \textbf{task-shared adapters} to learn cross-task knowledge and \textbf{task-specific adapters} to capture unique features of each new task. Specifically, the shared adapters utilize random orthogonal matrices and leverage knowledge distillation with gradient reassignment to preserve essential shared knowledge. In addition, we introduce learnable block-wise weights for task-specific adapters, which mitigate inter-task interference while maintaining the model's plasticity. We demonstrate CL-LoRA consistently achieves promising performance under multiple benchmarks with reduced training and inference computation, establishing a more efficient and scalable paradigm for continual learning with pre-trained models.

* Accepted to CVPR 2025

Via

Access Paper or Ask Questions

PS4PRO: Pixel-to-pixel Supervision for Photorealistic Rendering and Optimization

May 28, 2025

Yezhi Shen, Qiuchen Zhai, Fengqing Zhu

Abstract:Neural rendering methods have gained significant attention for their ability to reconstruct 3D scenes from 2D images. The core idea is to take multiple views as input and optimize the reconstructed scene by minimizing the uncertainty in geometry and appearance across the views. However, the reconstruction quality is limited by the number of input views. This limitation is further pronounced in complex and dynamic scenes, where certain angles of objects are never seen. In this paper, we propose to use video frame interpolation as the data augmentation method for neural rendering. Furthermore, we design a lightweight yet high-quality video frame interpolation model, PS4PRO (Pixel-to-pixel Supervision for Photorealistic Rendering and Optimization). PS4PRO is trained on diverse video datasets, implicitly modeling camera movement as well as real-world 3D geometry. Our model performs as an implicit world prior, enriching the photo supervision for 3D reconstruction. By leveraging the proposed method, we effectively augment existing datasets for neural rendering methods. Our experimental results indicate that our method improves the reconstruction performance on both static and dynamic scenes.

* Accepted to the CVPR 2025 Workshop on Autonomous Driving (WAD)

Via

Access Paper or Ask Questions

Accelerating Learned Image Compression Through Modeling Neural Training Dynamics

May 23, 2025

Yichi Zhang, Zhihao Duan, Yuning Huang, Fengqing Zhu

Abstract:As learned image compression (LIC) methods become increasingly computationally demanding, enhancing their training efficiency is crucial. This paper takes a step forward in accelerating the training of LIC methods by modeling the neural training dynamics. We first propose a Sensitivity-aware True and Dummy Embedding Training mechanism (STDET) that clusters LIC model parameters into few separate modes where parameters are expressed as affine transformations of reference parameters within the same mode. By further utilizing the stable intra-mode correlations throughout training and parameter sensitivities, we gradually embed non-reference parameters, reducing the number of trainable parameters. Additionally, we incorporate a Sampling-then-Moving Average (SMA) technique, interpolating sampled weights from stochastic gradient descent (SGD) training to obtain the moving average weights, ensuring smooth temporal behavior and minimizing training state variances. Overall, our method significantly reduces training space dimensions and the number of trainable parameters without sacrificing model performance, thus accelerating model convergence. We also provide a theoretical analysis on the Noisy quadratic model, showing that the proposed method achieves a lower training variance than standard SGD. Our approach offers valuable insights for further developing efficient training methods for LICs.

* Accepted to TMLR

Via

Access Paper or Ask Questions

Low-Rank Adaptation of Pre-trained Vision Backbones for Energy-Efficient Image Coding for Machine

May 23, 2025

Yichi Zhang, Zhihao Duan, Yuning Huang, Fengqing Zhu

Abstract:Image Coding for Machines (ICM) focuses on optimizing image compression for AI-driven analysis rather than human perception. Existing ICM frameworks often rely on separate codecs for specific tasks, leading to significant storage requirements, training overhead, and computational complexity. To address these challenges, we propose an energy-efficient framework that leverages pre-trained vision backbones to extract robust and versatile latent representations suitable for multiple tasks. We introduce a task-specific low-rank adaptation mechanism, which refines the pre-trained features to be both compressible and tailored to downstream applications. This design minimizes trainable parameters and reduces energy costs for multi-task scenarios. By jointly optimizing task performance and entropy minimization, our method enables efficient adaptation to diverse tasks and datasets without full fine-tuning, achieving high coding efficiency. Extensive experiments demonstrate that our framework significantly outperforms traditional codecs and pre-processors, offering an energy-efficient and effective solution for ICM applications. The code and the supplementary materials will be available at: https://gitlab.com/viper-purdue/efficient-compression.

* 2025 IEEE International Conference on Image Processing (ICIP2025)

Via

Access Paper or Ask Questions

Robust3D-CIL: Robust Class-Incremental Learning for 3D Perception

Mar 18, 2025

Jinge Ma, Jiangpeng He, Fengqing Zhu

Abstract:3D perception plays a crucial role in real-world applications such as autonomous driving, robotics, and AR/VR. In practical scenarios, 3D perception models must continuously adapt to new data and emerging object categories, but retraining from scratch incurs prohibitive costs. Therefore, adopting class-incremental learning (CIL) becomes particularly essential. However, real-world 3D point cloud data often include corrupted samples, which poses significant challenges for existing CIL methods and leads to more severe forgetting on corrupted data. To address these challenges, we consider the scenario in which a CIL model can be updated using point clouds with unknown corruption to better simulate real-world conditions. Inspired by Farthest Point Sampling, we propose a novel exemplar selection strategy that effectively preserves intra-class diversity when selecting replay exemplars, mitigating forgetting induced by data corruption. Furthermore, we introduce a point cloud downsampling-based replay method to utilize the limited replay buffer memory more efficiently, thereby further enhancing the model's continual learning ability. Extensive experiments demonstrate that our method improves the performance of replay-based CIL baselines by 2% to 11%, proving its effectiveness and promising potential for real-world 3D applications.

* 16 pages, 7 figures

Via

Access Paper or Ask Questions

Balanced Rate-Distortion Optimization in Learned Image Compression

Feb 27, 2025

Yichi Zhang, Zhihao Duan, Yuning Huang, Fengqing Zhu

Abstract:Learned image compression (LIC) using deep learning architectures has seen significant advancements, yet standard rate-distortion (R-D) optimization often encounters imbalanced updates due to diverse gradients of the rate and distortion objectives. This imbalance can lead to suboptimal optimization, where one objective dominates, thereby reducing overall compression efficiency. To address this challenge, we reformulate R-D optimization as a multi-objective optimization (MOO) problem and introduce two balanced R-D optimization strategies that adaptively adjust gradient updates to achieve more equitable improvements in both rate and distortion. The first proposed strategy utilizes a coarse-to-fine gradient descent approach along standard R-D optimization trajectories, making it particularly suitable for training LIC models from scratch. The second proposed strategy analytically addresses the reformulated optimization as a quadratic programming problem with an equality constraint, which is ideal for fine-tuning existing models. Experimental results demonstrate that both proposed methods enhance the R-D performance of LIC models, achieving around a 2\% BD-Rate reduction with acceptable additional training cost, leading to a more balanced and efficient optimization process. The code will be made publicly available.

* Preliminary version. Camera ready version and source code will be uploaded later. Accepted to CVPR 2025

Via

Access Paper or Ask Questions

MFP3D: Monocular Food Portion Estimation Leveraging 3D Point Clouds

Nov 14, 2024

Jinge Ma, Xiaoyan Zhang, Gautham Vinod, Siddeshwar Raghavan, Jiangpeng He, Fengqing Zhu

Figure 1 for MFP3D: Monocular Food Portion Estimation Leveraging 3D Point Clouds

Figure 2 for MFP3D: Monocular Food Portion Estimation Leveraging 3D Point Clouds

Figure 3 for MFP3D: Monocular Food Portion Estimation Leveraging 3D Point Clouds

Figure 4 for MFP3D: Monocular Food Portion Estimation Leveraging 3D Point Clouds

Abstract:Food portion estimation is crucial for monitoring health and tracking dietary intake. Image-based dietary assessment, which involves analyzing eating occasion images using computer vision techniques, is increasingly replacing traditional methods such as 24-hour recalls. However, accurately estimating the nutritional content from images remains challenging due to the loss of 3D information when projecting to the 2D image plane. Existing portion estimation methods are challenging to deploy in real-world scenarios due to their reliance on specific requirements, such as physical reference objects, high-quality depth information, or multi-view images and videos. In this paper, we introduce MFP3D, a new framework for accurate food portion estimation using only a single monocular image. Specifically, MFP3D consists of three key modules: (1) a 3D Reconstruction Module that generates a 3D point cloud representation of the food from the 2D image, (2) a Feature Extraction Module that extracts and concatenates features from both the 3D point cloud and the 2D RGB image, and (3) a Portion Regression Module that employs a deep regression model to estimate the food's volume and energy content based on the extracted features. Our MFP3D is evaluated on MetaFood3D dataset, demonstrating its significant improvement in accurate portion estimation over existing methods.

* 9th International Workshop on Multimedia Assisted Dietary Management, in conjunction with the 27th International Conference on Pattern Recognition (ICPR2024)

Via

Access Paper or Ask Questions