Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Lei Huang

corresponding author

Clustering Properties of Self-Supervised Learning

Jan 30, 2025

Xi Weng, Jianing An, Xudong Ma, Binhang Qi, Jie Luo, Xi Yang, Jin Song Dong, Lei Huang

Figure 1 for Clustering Properties of Self-Supervised Learning

Figure 2 for Clustering Properties of Self-Supervised Learning

Figure 3 for Clustering Properties of Self-Supervised Learning

Figure 4 for Clustering Properties of Self-Supervised Learning

Abstract:Self-supervised learning (SSL) methods via joint embedding architectures have proven remarkably effective at capturing semantically rich representations with strong clustering properties, magically in the absence of label supervision. Despite this, few of them have explored leveraging these untapped properties to improve themselves. In this paper, we provide an evidence through various metrics that the encoder's output $encoding$ exhibits superior and more stable clustering properties compared to other components. Building on this insight, we propose a novel positive-feedback SSL method, termed Representation Soft Assignment (ReSA), which leverages the model's clustering properties to promote learning in a self-guided manner. Extensive experiments on standard SSL benchmarks reveal that models pretrained with ReSA outperform other state-of-the-art SSL methods by a significant margin. Finally, we analyze how ReSA facilitates better clustering properties, demonstrating that it effectively enhances clustering performance at both fine-grained and coarse-grained levels, shaping representations that are inherently more structured and semantically meaningful.

Via

Access Paper or Ask Questions

TinyLLaVA-Video: A Simple Framework of Small-scale Large Multimodal Models for Video Understanding

Jan 26, 2025

Xingjian Zhang, Xi Weng, Yihao Yue, Zhaoxin Fan, Wenjun Wu, Lei Huang

Figure 1 for TinyLLaVA-Video: A Simple Framework of Small-scale Large Multimodal Models for Video Understanding

Figure 2 for TinyLLaVA-Video: A Simple Framework of Small-scale Large Multimodal Models for Video Understanding

Figure 3 for TinyLLaVA-Video: A Simple Framework of Small-scale Large Multimodal Models for Video Understanding

Figure 4 for TinyLLaVA-Video: A Simple Framework of Small-scale Large Multimodal Models for Video Understanding

Abstract:We present the TinyLLaVA-Video, a video understanding model with parameters not exceeding 4B that processes video sequences in a simple manner, without the need for complex architectures, supporting both fps sampling and uniform frame sampling. Our model is characterized by modularity and scalability, allowing training and inference with limited computational resources and enabling users to replace components based on their needs. We validate the effectiveness of this framework through experiments, the best model achieving performance comparable to certain existing 7B models on multiple video understanding benchmarks. The code and training recipes are fully open source, with all components and training data publicly available. We hope this work can serve as a baseline for practitioners exploring small-scale multimodal models for video understanding. It is available at \url{https://github.com/ZhangXJ199/TinyLLaVA-Video}.

* code and training recipes are available at https://github.com/ZhangXJ199/TinyLLaVA-Video

Via

Access Paper or Ask Questions

Joint Antenna Selection and Beamforming Design for Active RIS-aided ISAC Systems

Jan 16, 2025

Wei Ma, Peichang Zhang, Junjie Ye, Rouyang Guan, Xiao-Peng Li, Lei Huang

Figure 1 for Joint Antenna Selection and Beamforming Design for Active RIS-aided ISAC Systems

Figure 2 for Joint Antenna Selection and Beamforming Design for Active RIS-aided ISAC Systems

Figure 3 for Joint Antenna Selection and Beamforming Design for Active RIS-aided ISAC Systems

Figure 4 for Joint Antenna Selection and Beamforming Design for Active RIS-aided ISAC Systems

Abstract:Active reconfigurable intelligent surface (A-RIS) aided integrated sensing and communications (ISAC) system has been considered as a promising paradigm to improve spectrum efficiency. However, massive energy-hungry radio frequency (RF) chains hinder its large-scale deployment. To address this issue, an A-RIS-aided ISAC system with antenna selection (AS) is proposed in this work, where a target is sensed while multiple communication users are served with specifically selected antennas. Specifically, a cuckoo search-based scheme is first utilized to select the antennas associated with high-gain channels. Subsequently, with the properly selected antennas, the weighted sum-rate (WSR) of the system is optimized under the condition of radar probing power level, power budget for the A-RIS and transmitter. To solve the highly non-convex optimization problem, we develop an efficient algorithm based on weighted minimum mean square error (WMMSE) and fractional programming (FP). Simulation results show that the proposed AS scheme and the algorithm are effective, which reduce the number of RF chains without significant performance degradation.

Via

Access Paper or Ask Questions

SLAM: Towards Efficient Multilingual Reasoning via Selective Language Alignment

Jan 07, 2025

Yuchun Fan, Yongyu Mu, Yilin Wang, Lei Huang, Junhao Ruan, Bei Li, Tong Xiao, Shujian Huang, Xiaocheng Feng, Jingbo Zhu

Figure 1 for SLAM: Towards Efficient Multilingual Reasoning via Selective Language Alignment

Figure 2 for SLAM: Towards Efficient Multilingual Reasoning via Selective Language Alignment

Figure 3 for SLAM: Towards Efficient Multilingual Reasoning via Selective Language Alignment

Figure 4 for SLAM: Towards Efficient Multilingual Reasoning via Selective Language Alignment

Abstract:Despite the significant improvements achieved by large language models (LLMs) in English reasoning tasks, these models continue to struggle with multilingual reasoning. Recent studies leverage a full-parameter and two-stage training paradigm to teach models to first understand non-English questions and then reason. However, this method suffers from both substantial computational resource computing and catastrophic forgetting. The fundamental cause is that, with the primary goal of enhancing multilingual comprehension, an excessive number of irrelevant layers and parameters are tuned during the first stage. Given our findings that the representation learning of languages is merely conducted in lower-level layers, we propose an efficient multilingual reasoning alignment approach that precisely identifies and fine-tunes the layers responsible for handling multilingualism. Experimental results show that our method, SLAM, only tunes 6 layers' feed-forward sub-layers including 6.5-8% of all parameters within 7B and 13B LLMs, achieving superior average performance than all strong baselines across 10 languages. Meanwhile, SLAM only involves one training stage, reducing training time by 4.1-11.9 compared to the two-stage method.

* Accepted by COLING 2025 (Oral)

Via

Access Paper or Ask Questions

Length Controlled Generation for Black-box LLMs

Dec 19, 2024

Yuxuan Gu, Wenjie Wang, Xiaocheng Feng, Weihong Zhong, Kun Zhu, Lei Huang, Tat-Seng Chua, Bing Qin

Figure 1 for Length Controlled Generation for Black-box LLMs

Figure 2 for Length Controlled Generation for Black-box LLMs

Figure 3 for Length Controlled Generation for Black-box LLMs

Figure 4 for Length Controlled Generation for Black-box LLMs

Abstract:Large language models (LLMs) have demonstrated impressive instruction following capabilities, while still struggling to accurately manage the length of the generated text, which is a fundamental requirement in many real-world applications. Existing length control methods involve fine-tuning the parameters of LLMs, which is inefficient and suboptimal for practical use. In this paper, we propose a novel iterative sampling framework for text length control, integrating the Metropolis-Hastings algorithm with an importance sampling acceleration strategy. This framework efficiently and reliably regulates LLMs to generate length-constrained text without modifying the underlying parameters, thereby preserving the original capabilities of LLMs. Experimental results demonstrate that our framework achieves almost 100\% success rates of length control on Llama3.1 for tasks such as length-controlled abstractive summarization and length-constrained instruction following, with minimal additional computational overhead. This also highlights the significant potential of our method for precise length control across a broader range of applications, without compromising the versatility of LLMs.

* Preprint

Via

Access Paper or Ask Questions

XTransplant: A Probe into the Upper Bound Performance of Multilingual Capability and Culture Adaptability in LLMs via Mutual Cross-lingual Feed-forward Transplantation

Dec 17, 2024

Yangfan Ye, Xiaocheng Feng, Xiachong Feng, Libo Qin, Yichong Huang, Lei Huang, Weitao Ma, Zhirui Zhang, Yunfei Lu, Xiaohui Yan(+3 more)

Abstract:Current large language models (LLMs) often exhibit imbalances in multilingual capabilities and cultural adaptability, largely due to their English-centric pretraining data. To address this imbalance, we propose a probing method named XTransplant that explores cross-lingual latent interactions via cross-lingual feed-forward transplantation during inference stage, with the hope of enabling the model to leverage the strengths of both English and non-English languages. Through extensive pilot experiments, we empirically prove that both the multilingual capabilities and cultural adaptability of LLMs hold the potential to be significantly improved by XTransplant, respectively from En -> non-En and non-En -> En, highlighting the underutilization of current LLMs' multilingual potential. And the patterns observed in these pilot experiments further motivate an offline scaling inference strategy, which demonstrates consistent performance improvements in multilingual and culture-aware tasks, sometimes even surpassing multilingual supervised fine-tuning. And we do hope our further analysis and discussion could help gain deeper insights into XTransplant mechanism.

Via

Access Paper or Ask Questions

Bench-CoE: a Framework for Collaboration of Experts from Benchmark

Dec 05, 2024

Yuanshuai Wang, Xingjian Zhang, Jinkun Zhao, Siwei Wen, Peilin Feng, Shuhao Liao, Lei Huang, Wenjun Wu

Abstract:Large Language Models (LLMs) are key technologies driving intelligent systems to handle multiple tasks. To meet the demands of various tasks, an increasing number of LLMs-driven experts with diverse capabilities have been developed, accompanied by corresponding benchmarks to evaluate their performance. This paper proposes the Bench-CoE framework, which enables Collaboration of Experts (CoE) by effectively leveraging benchmark evaluations to achieve optimal performance across various tasks. Bench-CoE includes a set of expert models, a router for assigning tasks to corresponding experts, and a benchmark dataset for training the router. Moreover, we formulate Query-Level and Subject-Level approaches based on our framework, and analyze the merits and drawbacks of these two approaches. Finally, we conduct a series of experiments with vary data distributions on both language and multimodal tasks to validate that our proposed Bench-CoE outperforms any single model in terms of overall performance. We hope this method serves as a baseline for further research in this area. The code is available at \url{https://github.com/ZhangXJ199/Bench-CoE}.

* The code is available at \url{https://github.com/ZhangXJ199/Bench-CoE}

Via

Access Paper or Ask Questions

Copy-Move Forgery Detection and Question Answering for Remote Sensing Image

Dec 03, 2024

Ze Zhang, Enyuan Zhao, Ziyi Wan, Jie Nie, Xinyue Liang, Lei Huang

Figure 1 for Copy-Move Forgery Detection and Question Answering for Remote Sensing Image

Figure 2 for Copy-Move Forgery Detection and Question Answering for Remote Sensing Image

Figure 3 for Copy-Move Forgery Detection and Question Answering for Remote Sensing Image

Figure 4 for Copy-Move Forgery Detection and Question Answering for Remote Sensing Image

Abstract:This paper introduces the task of Remote Sensing Copy-Move Question Answering (RSCMQA). Unlike traditional Remote Sensing Visual Question Answering (RSVQA), RSCMQA focuses on interpreting complex tampering scenarios and inferring relationships between objects. Based on the practical needs of national defense security and land resource monitoring, we have developed an accurate and comprehensive global dataset for remote sensing image copy-move question answering, named RS-CMQA-2.1M. These images were collected from 29 different regions across 14 countries. Additionally, we have refined a balanced dataset, RS-CMQA-B, to address the long-standing issue of long-tail data in the remote sensing field. Furthermore, we propose a region-discriminative guided multimodal CMQA model, which enhances the accuracy of answering questions about tampered images by leveraging prompt about the differences and connections between the source and tampered domains. Extensive experiments demonstrate that our method provides a stronger benchmark for RS-CMQA compared to general VQA and RSVQA models. Our dataset and code are available at https://github.com/shenyedepisa/RSCMQA.

* 7 figs, 7 tables

Via

Access Paper or Ask Questions

Discrete Modeling via Boundary Conditional Diffusion Processes

Oct 29, 2024

Yuxuan Gu, Xiaocheng Feng, Lei Huang, Yingsheng Wu, Zekun Zhou, Weihong Zhong, Kun Zhu, Bing Qin

Figure 1 for Discrete Modeling via Boundary Conditional Diffusion Processes

Figure 2 for Discrete Modeling via Boundary Conditional Diffusion Processes

Figure 3 for Discrete Modeling via Boundary Conditional Diffusion Processes

Figure 4 for Discrete Modeling via Boundary Conditional Diffusion Processes

Abstract:We present an novel framework for efficiently and effectively extending the powerful continuous diffusion processes to discrete modeling. Previous approaches have suffered from the discrepancy between discrete data and continuous modeling. Our study reveals that the absence of guidance from discrete boundaries in learning probability contours is one of the main reasons. To address this issue, we propose a two-step forward process that first estimates the boundary as a prior distribution and then rescales the forward trajectory to construct a boundary conditional diffusion model. The reverse process is proportionally adjusted to guarantee that the learned contours yield more precise discrete data. Experimental results indicate that our approach achieves strong performance in both language modeling and discrete image generation tasks. In language modeling, our approach surpasses previous state-of-the-art continuous diffusion language models in three translation tasks and a summarization task, while also demonstrating competitive performance compared to auto-regressive transformers. Moreover, our method achieves comparable results to continuous diffusion models when using discrete ordinal pixels and establishes a new state-of-the-art for categorical image generation on the Cifar-10 dataset.

* NeuraIPS 2024 poster

Via

Access Paper or Ask Questions

Advancing Large Language Model Attribution through Self-Improving

Oct 17, 2024

Lei Huang, Xiaocheng Feng, Weitao Ma, Liang Zhao, Yuchun Fan, Weihong Zhong, Dongliang Xu, Qing Yang, Hongtao Liu, Bing Qin

Figure 1 for Advancing Large Language Model Attribution through Self-Improving

Figure 2 for Advancing Large Language Model Attribution through Self-Improving

Figure 3 for Advancing Large Language Model Attribution through Self-Improving

Figure 4 for Advancing Large Language Model Attribution through Self-Improving

Abstract:Teaching large language models (LLMs) to generate text with citations to evidence sources can mitigate hallucinations and enhance verifiability in information-seeking systems. However, improving this capability requires high-quality attribution data, which is costly and labor-intensive. Inspired by recent advances in self-improvement that enhance LLMs without manual annotation, we present START, a Self-Taught AttRibuTion framework for iteratively improving the attribution capability of LLMs. First, to prevent models from stagnating due to initially insufficient supervision signals, START leverages the model to self-construct synthetic training data for warming up. To further self-improve the model's attribution ability, START iteratively utilizes fine-grained preference supervision signals constructed from its sampled responses to encourage robust, comprehensive, and attributable generation. Experiments on three open-domain question-answering datasets, covering long-form QA and multi-step reasoning, demonstrate significant performance gains of 25.13% on average without relying on human annotations and more advanced models. Further analysis reveals that START excels in aggregating information across multiple sources.

* Accepted by EMNLP 2024 Main Conference

Via

Access Paper or Ask Questions