Honghai Liu

Neural Parameter Search for Slimmer Fine-Tuned Models and Better Transfer

May 24, 2025

Guodong Du, Zitao Fang, Jing Li, Junlin Li, Runhua Jiang, Shuyang Yu, Yifei Guo, Yangneng Chen, Sim Kuan Goh, Ho-Kin Tang(+3 more)

Abstract:Foundation models and their checkpoints have significantly advanced deep learning, boosting performance across various applications. However, fine-tuned models often struggle outside their specific domains and exhibit considerable redundancy. Recent studies suggest that combining a pruned fine-tuned model with the original pre-trained model can mitigate forgetting, reduce interference when merging model parameters across tasks, and improve compression efficiency. In this context, developing an effective pruning strategy for fine-tuned models is crucial. Leveraging the advantages of the task vector mechanism, we preprocess fine-tuned models by calculating the differences between them and the original model. Recognizing that different task vector subspaces contribute variably to model performance, we introduce a novel method called Neural Parameter Search (NPS-Pruning) for slimming down fine-tuned models. This method enhances pruning efficiency by searching through neural parameters of task vectors within low-rank subspaces. Our method has three key applications: enhancing knowledge transfer through pairwise model interpolation, facilitating effective knowledge fusion via model merging, and enabling the deployment of compressed models that retain near-original performance while significantly reducing storage costs. Extensive experiments across vision, NLP, and multi-modal benchmarks demonstrate the effectiveness and robustness of our approach, resulting in substantial performance gains. The code is publicly available at: https://github.com/duguodong7/NPS-Pruning.

* Accepted by ACL2025 Main
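
The task vector preprocessing described in the abstract (the difference between a fine-tuned model and the original model) can be illustrated with a minimal sketch. This is not NPS-Pruning itself: the low-rank subspace search is omitted, and plain magnitude pruning stands in for it as an assumed placeholder. The function names and toy weights are hypothetical.

```python
from typing import List


def task_vector(finetuned: List[float], pretrained: List[float]) -> List[float]:
    """Task vector: element-wise difference between fine-tuned and pre-trained weights."""
    return [f - p for f, p in zip(finetuned, pretrained)]


def magnitude_prune(vec: List[float], keep_ratio: float) -> List[float]:
    """Keep the largest-magnitude entries and zero the rest.

    Assumed stand-in for the paper's search-based pruning.
    Entries tied with the threshold are all kept.
    """
    k = max(1, round(len(vec) * keep_ratio))
    threshold = sorted((abs(v) for v in vec), reverse=True)[k - 1]
    return [v if abs(v) >= threshold else 0.0 for v in vec]


def reconstruct(pretrained: List[float], pruned_tv: List[float]) -> List[float]:
    """Slim model: pre-trained weights plus the pruned task vector."""
    return [p + t for p, t in zip(pretrained, pruned_tv)]


# Toy weights, purely illustrative.
pretrained = [0.50, -1.20, 0.30, 2.00, -0.70]
finetuned = [0.52, -0.20, 0.29, 2.90, -0.71]

tv = task_vector(finetuned, pretrained)
pruned = magnitude_prune(tv, keep_ratio=0.4)
slim = reconstruct(pretrained, pruned)
```

Only the non-zero entries of `pruned` need to be stored alongside the shared pre-trained weights, which is where the storage saving comes from.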

Unsupervised Cross-Domain 3D Human Pose Estimation via Pseudo-Label-Guided Global Transforms

Apr 17, 2025

Jingjing Liu, Zhiyong Wang, Xinyu Fan, Amirhossein Dadashzadeh, Honghai Liu, Majid Mirmehdi

Abstract:Existing 3D human pose estimation methods often suffer in performance when applied to cross-scenario inference, due to domain shifts in characteristics such as camera viewpoint, position, posture, and body size. Among these factors, camera viewpoints and locations have been shown to contribute significantly to the domain gap by influencing the global positions of human poses. To address this, we propose a novel framework that explicitly conducts global transformations between pose positions in the camera coordinate systems of source and target domains. We start with a Pseudo-Label Generation Module that is applied to the 2D poses of the target dataset to generate pseudo-3D poses. Then, a Global Transformation Module leverages a human-centered coordinate system as a novel bridging mechanism to seamlessly align the positional orientations of poses across disparate domains, ensuring consistent spatial referencing. To further enhance generalization, a Pose Augmentor is incorporated to address variations in human posture and body size. This process is iterative, allowing refined pseudo-labels to progressively improve guidance for domain adaptation. Our method is evaluated on various cross-dataset benchmarks, including Human3.6M, MPI-INF-3DHP, and 3DPW. The proposed method outperforms state-of-the-art approaches and even outperforms the target-trained model.

* 11 pages, 6 figures, including appendix. This work has been submitted to the IEEE for possible publication
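
To make the idea of a human-centered coordinate system concrete, the sketch below re-expresses a camera-space pose relative to a root joint. It assumes translation only; the abstract's Global Transformation Module also aligns orientation, which is not reproduced here. The function name, joint layout, and sample poses are hypothetical.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]


def to_root_centered(pose: List[Joint], root_index: int = 0) -> List[Joint]:
    """Translate a pose so that the chosen root joint sits at the origin."""
    rx, ry, rz = pose[root_index]
    return [(x - rx, y - ry, z - rz) for x, y, z in pose]


# The same body pose seen at two different positions in camera coordinates.
pose_a = [(0.0, 0.0, 4.0), (0.0, -0.5, 4.0), (0.25, -0.5, 4.0)]
pose_b = [(1.0, 2.0, 3.0), (1.0, 1.5, 3.0), (1.25, 1.5, 3.0)]

centered_a = to_root_centered(pose_a)
centered_b = to_root_centered(pose_b)
```

After centering, the two poses coincide, so the difference in global position between the two cameras no longer appears in the pose coordinates.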

Text-Derived Relational Graph-Enhanced Network for Skeleton-Based Action Segmentation

Mar 19, 2025

Haoyu Ji, Bowen Chen, Weihong Ren, Wenze Huang, Zhihao Yang, Zhiyong Wang, Honghai Liu

Abstract:Skeleton-based Temporal Action Segmentation (STAS) aims to segment and recognize various actions from long, untrimmed sequences of human skeletal movements. Current STAS methods typically employ spatio-temporal modeling to establish dependencies among joints as well as frames, and utilize one-hot encoding with cross-entropy loss for frame-wise classification supervision. However, these methods overlook the intrinsic correlations among joints and actions within skeletal features, leading to a limited understanding of human movements. To address this, we propose a Text-Derived Relational Graph-Enhanced Network (TRG-Net) that leverages prior graphs generated by Large Language Models (LLM) to enhance both modeling and supervision. For modeling, the Dynamic Spatio-Temporal Fusion Modeling (DSFM) method incorporates Text-Derived Joint Graphs (TJG) with channel- and frame-level dynamic adaptation to effectively model spatial relations, while integrating spatio-temporal core features during temporal modeling. For supervision, the Absolute-Relative Inter-Class Supervision (ARIS) method employs contrastive learning between action features and text embeddings to regularize the absolute class distributions, and utilizes Text-Derived Action Graphs (TAG) to capture the relative inter-class relationships among action features. Additionally, we propose a Spatial-Aware Enhancement Processing (SAEP) method, which incorporates random joint occlusion and axial rotation to enhance spatial generalization. Performance evaluations on four public datasets demonstrate that TRG-Net achieves state-of-the-art results.
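
The two augmentations named for SAEP, random joint occlusion and axial rotation, can be sketched as follows. The details are assumptions: rotation is taken about the vertical (y) axis, occlusion is modeled by zeroing a joint's coordinates, and all function names and parameters are hypothetical rather than taken from the paper.

```python
import math
import random
from typing import List, Tuple

Joint = Tuple[float, float, float]


def rotate_about_y(pose: List[Joint], angle_rad: float) -> List[Joint]:
    """Rotate every joint about the y axis (assumed to be the vertical axis)."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in pose]


def occlude_joints(pose: List[Joint], num_occluded: int, rng: random.Random) -> List[Joint]:
    """Zero out randomly chosen joints to imitate occlusion (assumed convention)."""
    hidden = set(rng.sample(range(len(pose)), num_occluded))
    return [(0.0, 0.0, 0.0) if i in hidden else joint for i, joint in enumerate(pose)]


def augment(pose: List[Joint], max_angle_rad: float, num_occluded: int,
            rng: random.Random) -> List[Joint]:
    """Apply a random axial rotation followed by random joint occlusion."""
    angle = rng.uniform(-max_angle_rad, max_angle_rad)
    return occlude_joints(rotate_about_y(pose, angle), num_occluded, rng)


rng = random.Random(0)
pose = [(1.0, 1.0, 0.0), (0.0, 2.0, 1.0), (-1.0, 0.5, 2.0), (0.5, 1.5, -1.0)]
augmented = augment(pose, max_angle_rad=math.pi / 6, num_occluded=1, rng=rng)
```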

Language-Assisted Human Part Motion Learning for Skeleton-Based Temporal Action Segmentation

Oct 08, 2024

Bowen Chen, Haoyu Ji, Zhiyong Wang, Benjamin Filtjens, Chunzhuo Wang, Weihong Ren, Bart Vanrumste, Honghai Liu

Abstract:Skeleton-based Temporal Action Segmentation involves the dense action classification of variable-length skeleton sequences. Current approaches primarily apply graph-based networks to extract framewise, whole-body-level motion representations, and use one-hot encoded labels for model optimization. However, whole-body motion representations do not capture fine-grained part-level motion representations and the one-hot encoded labels neglect the intrinsic semantic relationships within the language-based action definitions. To address these limitations, we propose a novel method named Language-assisted Human Part Motion Representation Learning (LPL), which contains a Disentangled Part Motion Encoder (DPE) to extract dual-level (i.e., part and whole-body) motion representations and a Language-assisted Distribution Alignment (LDA) strategy for optimizing spatial relations within representations. Specifically, after part-aware skeleton encoding via DPE, LDA generates dual-level action descriptions to construct a textual embedding space with the help of a large-scale language model. Then, LDA motivates the alignment of the embedding space between text descriptions and motions. This alignment allows LDA not only to enhance intra-class compactness but also to transfer the language-encoded semantic correlations among actions to skeleton-based motion learning. Moreover, we propose a simple yet efficient Semantic Offset Adapter to smooth the cross-domain misalignment. Our experiments indicate that LPL achieves state-of-the-art performance across various datasets (e.g., +4.4% Accuracy, +5.6% F1 on the PKU-MMD dataset). Moreover, LDA is compatible with existing methods and improves their performance (e.g., +4.8% Accuracy, +4.3% F1 on the LARa dataset) without additional inference costs.

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

Online 4D Ultrasound-Guided Robotic Tracking Enables 3D Ultrasound Localisation Microscopy with Large Tissue Displacements

Sep 17, 2024

Jipeng Yan, Shusei Kawara, Qingyuan Tan, Jingwen Zhu, Bingxue Wang, Matthieu Toulemonde, Honghai Liu, Ying Tan, Meng-Xing Tang

Abstract:Super-Resolution Ultrasound (SRUS) imaging through localising and tracking microbubbles, also known as Ultrasound Localisation Microscopy (ULM), has demonstrated significant potential for reconstructing microvasculature and flows with sub-diffraction resolution in clinical diagnostics. However, imaging organs with large tissue movements, such as those caused by respiration, presents substantial challenges. Existing methods often require breath holding to maintain accumulation accuracy, which limits data acquisition time and ULM image saturation. To improve image quality in the presence of large tissue movements, this study introduces an approach integrating high-frame-rate ultrasound with online precise robotic probe control. Tested on a microvasculature phantom with translation motions of up to 20 mm, twice the aperture size of the matrix array used, our method achieved real-time tracking of the moving phantom and an imaging volume rate of 85 Hz, keeping the majority of the target volume in the imaging field of view. ULM images of the moving cross channels in the phantom were successfully reconstructed in post-processing, demonstrating the feasibility of super-resolution imaging under large tissue motions. This represents a significant step towards ULM imaging of organs with large motion.

The Unified Balance Theory of Second-Moment Exponential Scaling Optimizers in Visual Tasks

May 28, 2024

Gongyue Zhang, Honghai Liu

Abstract:We have identified a potential method for unifying first-order optimizers through the use of variable Second-Moment Exponential Scaling (SMES). We begin with backpropagation, addressing classic phenomena such as gradient vanishing and explosion, as well as issues related to dataset sparsity, and introduce the theory of balance in optimization. Through this theory, we suggest that SGD and adaptive optimizers can be unified under a broader inference, employing variable moving exponential scaling to achieve a balanced approach within a generalized formula for first-order optimizers. We conducted tests on some classic datasets and networks to confirm the impact of different balance coefficients on the overall training process.
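
The abstract does not state the generalized formula, so the following is only one plausible reading, offered as a sketch: the gradient is divided by an exponential moving average of squared gradients raised to a tunable exponent. Under this assumed form, an exponent of 0 gives plain SGD and an exponent of 0.5 gives an RMSProp-style adaptive step. The function name, the `exponent` parameter, and the omission of momentum and bias correction are all assumptions.

```python
from typing import List, Tuple


def smes_step(params: List[float], grads: List[float], second_moment: List[float],
              lr: float, beta2: float, exponent: float,
              eps: float = 1e-8) -> Tuple[List[float], List[float]]:
    """One update of an assumed generalized first-order optimizer.

    v     <- beta2 * v + (1 - beta2) * g^2
    theta <- theta - lr * g / (v^exponent + eps)
    """
    new_v = [beta2 * v + (1.0 - beta2) * g * g for v, g in zip(second_moment, grads)]
    new_params = [p - lr * g / (v ** exponent + eps)
                  for p, g, v in zip(params, grads, new_v)]
    return new_params, new_v


params = [1.0, 1.0]
grads = [0.1, 10.0]  # two coordinates with very different gradient scales
zeros = [0.0, 0.0]

# exponent = 0: the step is proportional to the raw gradient (SGD-like).
sgd_like, _ = smes_step(params, grads, zeros, lr=0.1, beta2=0.0, exponent=0.0)

# exponent = 0.5: the step is normalized per coordinate (adaptive-like).
adaptive_like, _ = smes_step(params, grads, zeros, lr=0.1, beta2=0.0, exponent=0.5)
```

With the toy gradients above, the SGD-like step moves the second coordinate a hundred times further than the first, while the adaptive-like step moves both by roughly the same amount. Intermediate exponents would trade off between these two behaviours.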

Exploring Self- and Cross-Triplet Correlations for Human-Object Interaction Detection

Jan 11, 2024

Weibo Jiang, Weihong Ren, Jiandong Tian, Liangqiong Qu, Zhiyong Wang, Honghai Liu

Abstract:Human-Object Interaction (HOI) detection, which aims to predict HOI triplets in the form of <human, object, action>, plays a vital role in scene understanding. Existing methods mainly extract multi-modal features (e.g., appearance, object semantics, human pose) and then fuse them together to directly predict HOI triplets. However, most of these methods focus on self-triplet aggregation but ignore the potential cross-triplet dependencies, resulting in ambiguous action prediction. In this work, we propose to explore Self- and Cross-Triplet Correlations (SCTC) for HOI detection. Specifically, we regard each triplet proposal as a graph in which Human and Object are the nodes and Action is the edge, to aggregate self-triplet correlation. We also explore cross-triplet dependencies by jointly considering instance-level, semantic-level, and layout-level relations. In addition, we leverage the CLIP model to help SCTC obtain interaction-aware features through knowledge distillation, which provides useful action clues for HOI detection. Extensive experiments on the HICO-DET and V-COCO datasets verify the effectiveness of our proposed SCTC.

A Simple Asymmetric Momentum Make SGD Greatest Again

Sep 05, 2023

Gongyue Zhang, Dinghuang Zhang, Shuwen Zhao, Donghan Liu, Carrie M. Toptan, Honghai Liu

Abstract:We propose the simplest SGD enhanced method ever, Loss-Controlled Asymmetric Momentum (LCAM), aimed directly at the Saddle Point problem. Compared to the traditional SGD with Momentum, there's no increase in computational demand, yet it outperforms all current optimizers. We use the concepts of weight conjugation and traction effect to explain this phenomenon. We designed experiments to rapidly reduce the learning rate at specified epochs to trap parameters more easily at saddle points. We selected WRN28-10 as the test network and chose CIFAR-10 and CIFAR-100 as test datasets, the same setup as in the original papers on WRN and Cosine Annealing Scheduling (CAS). We compared the ability to bypass saddle points of Asymmetric Momentum with different priorities. Finally, using WRN28-10 on CIFAR-100, we achieved a peak average test accuracy of 80.78% at around epoch 120. For comparison, the original WRN paper reported 80.75%, while CAS was at 80.42%, both at 200 epochs. This means that while potentially increasing accuracy, we need only about 60% of the training epochs. Our demonstration code is available at https://github.com/hakumaicc/Asymmetric-Momentum-LCAM

Full Resolution Repetition Counting

May 24, 2023

Jianing Li, Bowen Chen, Zhiyong Wang, Honghai Liu

Abstract:Given an untrimmed video, repetitive action counting aims to estimate the number of repetitions of class-agnostic actions. To handle the varying lengths of videos and repetitive actions, as well as optimization challenges in end-to-end video model training, recent state-of-the-art methods commonly use down-sampling, which causes some repetitive samples to be ignored. In this paper, we attempt to understand repetitive actions from a full temporal resolution view, by combining offline feature extraction and temporal convolution networks. The former step enables us to train the repetition counting network without down-sampling while preserving all repetitions regardless of the video length and action frequency, and the latter network models all frames in a flexible and dynamically expanding temporal receptive field to retrieve all repetitions with a global aspect. We experimentally demonstrate that our method achieves better or comparable performance on three public datasets, i.e., TransRAC, UCFRep and QUVA. We expect this work will encourage our community to think about the importance of full temporal resolution.

* 12 pages and 4 figures and 17 conferences
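
How a temporal convolution network obtains an expanding temporal receptive field can be shown with a small sketch. It assumes a common design (kernel size 3 with the dilation doubling at every layer); the abstract does not confirm that this is the configuration used, and the function names are hypothetical.

```python
from typing import List


def receptive_field(num_layers: int, kernel_size: int = 3) -> int:
    """Number of input frames seen by one output frame after stacked dilated convolutions.

    Assumes layer i uses dilation 2**i.
    """
    rf = 1
    for layer in range(num_layers):
        rf += (kernel_size - 1) * (2 ** layer)
    return rf


def dilated_conv1d(signal: List[float], kernel: List[float], dilation: int) -> List[float]:
    """Same-length 1D dilated convolution with zero padding (odd kernel length)."""
    center = len(kernel) // 2
    out = []
    for t in range(len(signal)):
        acc = 0.0
        for k, weight in enumerate(kernel):
            idx = t + (k - center) * dilation
            if 0 <= idx < len(signal):
                acc += weight * signal[idx]
        out.append(acc)
    return out


# Push a single impulse through three layers (dilations 1, 2, 4) and measure its spread.
signal = [0.0] * 31
signal[15] = 1.0
for layer in range(3):
    signal = dilated_conv1d(signal, [1.0, 1.0, 1.0], dilation=2 ** layer)
spread = sum(1 for value in signal if value != 0.0)
```

Because the receptive field roughly doubles with every added layer, a modest number of layers can cover a long video at full frame rate without down-sampling the input.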

Lifelong-MonoDepth: Lifelong Learning for Multi-Domain Monocular Metric Depth Estimation

Mar 09, 2023

Junjie Hu, Chenyou Fan, Liguang Zhou, Qing Gao, Honghai Liu, Tin Lun Lam

Abstract:In recent years, monocular depth estimation (MDE) has made significant progress in a data-driven learning fashion. Previous methods can infer depth maps for specific domains based on the paradigm of single-domain or joint-domain training with mixed data. However, they suffer from low scalability to new domains. In reality, target domains often dynamically change or increase, which calls for incremental multi-domain/task learning. In this paper, we seek to enable lifelong learning for MDE, which performs cross-domain depth learning sequentially, to achieve high plasticity on a new domain and maintain good stability on original domains. To overcome significant domain gaps and enable scale-aware depth prediction, we design a lightweight multi-head framework that consists of a domain-shared encoder for feature extraction and domain-specific predictors for metric depth estimation. Moreover, given an input image, we propose an efficient predictor selection approach that automatically identifies the corresponding predictor for depth inference. Through extensive numerical studies, we show that the proposed method can achieve good efficiency, stability, and plasticity, leading the benchmarks by 8% to 15%.
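
The multi-head layout described above (one shared encoder, one predictor per domain, automatic predictor selection) can be sketched as a small data structure. The selection rule is not given in the abstract; nearest-prototype matching on encoder features is used here purely as an assumed placeholder, and the class, the toy encoder, and the depth scales are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Feature = List[float]


class MultiHeadDepthModel:
    """Shared encoder with one depth predictor per domain."""

    def __init__(self, encoder: Callable[[List[float]], Feature]) -> None:
        self.encoder = encoder
        self.predictors: Dict[str, Callable[[Feature], float]] = {}
        self.prototypes: Dict[str, Feature] = {}

    def add_domain(self, name: str, predictor: Callable[[Feature], float],
                   prototype: Feature) -> None:
        """Register a new domain without touching the existing predictors."""
        self.predictors[name] = predictor
        self.prototypes[name] = prototype

    def select_domain(self, feature: Feature) -> str:
        """Assumed rule: pick the domain whose prototype feature is closest."""
        def sq_dist(a: Feature, b: Feature) -> float:
            return sum((x - y) ** 2 for x, y in zip(a, b))
        return min(self.prototypes, key=lambda name: sq_dist(feature, self.prototypes[name]))

    def predict(self, image: List[float]) -> Tuple[str, float]:
        feature = self.encoder(image)
        domain = self.select_domain(feature)
        return domain, self.predictors[domain](feature)


# Toy setup: the "encoder" is mean brightness; each head maps it to a metric depth scale.
model = MultiHeadDepthModel(encoder=lambda image: [sum(image) / len(image)])
model.add_domain("indoor", predictor=lambda f: 10.0 * f[0], prototype=[0.2])
model.add_domain("outdoor", predictor=lambda f: 80.0 * f[0], prototype=[0.8])

indoor_result = model.predict([0.25, 0.25])
outdoor_result = model.predict([0.75, 0.75])
```

Keeping the predictors separate is what lets each domain retain its own metric scale, while adding a domain only registers a new head.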
