Lei Huang

corresponding author

TinyLLaVA Factory: A Modularized Codebase for Small-scale Large Multimodal Models

May 20, 2024

Junlong Jia, Ying Hu, Xi Weng, Yiming Shi, Miao Li, Xingjian Zhang, Baichuan Zhou, Ziyu Liu, Jie Luo, Lei Huang(+1 more)

Abstract: We present TinyLLaVA Factory, an open-source modular codebase for small-scale large multimodal models (LMMs) with a focus on simplicity of code implementation, extensibility to new features, and reproducibility of training results. Following the design philosophy of the factory pattern in software engineering, TinyLLaVA Factory modularizes the entire system into interchangeable components, with each component integrating a suite of cutting-edge models and methods while leaving room for extension to more features. In addition to allowing users to customize their own LMMs, TinyLLaVA Factory provides popular training recipes to let users pretrain and finetune their models with less coding effort. Empirical experiments validate the effectiveness of our codebase. The goal of TinyLLaVA Factory is to assist researchers and practitioners in exploring the wide landscape of designing and training small-scale LMMs with affordable computational resources.

* Our codebase is made public at https://github.com/TinyLLaVA/TinyLLaVA_Factory with documentation available at https://tinyllava-factory.readthedocs.io/en/latest/
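
The factory pattern mentioned in the abstract can be illustrated with a small registry that maps a configuration string to an interchangeable component. This is only a minimal sketch: the names `Connector`, `register_connector`, and `build_connector` are hypothetical and are not taken from the TinyLLaVA Factory codebase.

```python
from typing import Callable, Dict, Type


class Connector:
    """Hypothetical base class for a vision-to-language connector."""

    def __init__(self, in_dim: int, out_dim: int) -> None:
        self.in_dim = in_dim
        self.out_dim = out_dim

    def describe(self) -> str:
        raise NotImplementedError


# Hypothetical registry: config name -> component class.
_CONNECTOR_REGISTRY: Dict[str, Type[Connector]] = {}


def register_connector(name: str) -> Callable[[Type[Connector]], Type[Connector]]:
    """Class decorator that records a component under a string key."""

    def decorator(cls: Type[Connector]) -> Type[Connector]:
        if name in _CONNECTOR_REGISTRY:
            raise ValueError(f"connector {name!r} is already registered")
        _CONNECTOR_REGISTRY[name] = cls
        return cls

    return decorator


@register_connector("linear")
class LinearConnector(Connector):
    def describe(self) -> str:
        return f"linear({self.in_dim}->{self.out_dim})"


@register_connector("mlp")
class MlpConnector(Connector):
    def describe(self) -> str:
        return f"mlp({self.in_dim}->{self.out_dim})"


def build_connector(name: str, **kwargs: int) -> Connector:
    """Factory function: look up the class by name and instantiate it."""
    try:
        cls = _CONNECTOR_REGISTRY[name]
    except KeyError:
        known = ", ".join(sorted(_CONNECTOR_REGISTRY))
        raise ValueError(f"unknown connector {name!r}; known: {known}") from None
    return cls(**kwargs)


if __name__ == "__main__":
    connector = build_connector("linear", in_dim=4, out_dim=8)
    print(connector.describe())
```

With a registry like this, a new component is added by defining one decorated class, and swapping components becomes a one-word change in a configuration file.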

Unsupervised Learning for Joint Beamforming Design in RIS-aided ISAC Systems

Mar 26, 2024

Junjie Ye, Lei Huang, Zhen Chen, Peichang Zhang, Mohamed Rihan

Abstract: It is critical to design efficient beamforming in reconfigurable intelligent surface (RIS)-aided integrated sensing and communication (ISAC) systems for enhancing spectrum utilization. However, conventional methods often have limitations, either incurring high computational complexity due to iterative algorithms or sacrificing performance when using heuristic methods. To achieve both low complexity and high spectrum efficiency, an unsupervised learning-based beamforming design is proposed in this work. We tailor image-shaped channel samples and develop an ISAC beamforming neural network (IBF-Net) model for beamforming. By leveraging unsupervised learning, the loss function incorporates key performance metrics such as sensing and communication channel correlation and sensing channel gain, eliminating the need for labeling. Simulations show that the proposed method achieves competitive performance compared to benchmarks while significantly reducing computational complexity.

* 5 pages, 4 figures, references added

Incorporating Graph Attention Mechanism into Geometric Problem Solving Based on Deep Reinforcement Learning

Mar 14, 2024

Xiuqin Zhong, Shengyuan Yan, Gongqi Lin, Hongguang Fu, Liang Xu, Siwen Jiang, Lei Huang, Wei Fang

Abstract: In the context of online education, designing an automatic solver for geometric problems has been considered a crucial step towards general math Artificial Intelligence (AI), empowered by natural language understanding and traditional logical inference. In most instances, problems are addressed by adding auxiliary components such as lines or points. However, adding auxiliary components automatically is challenging due to the complexity of selecting suitable auxiliary components, especially when pivotal decisions have to be made. State-of-the-art performance has been achieved by exhausting all possible strategies from the category library to identify the one with the maximum likelihood. However, an extensive strategy search has to be applied, trading efficiency for accuracy. To add auxiliary components automatically and efficiently, we present a deep reinforcement learning framework based on a language model such as BERT. We first apply the graph attention mechanism to reduce the strategy search space; this component, called AttnStrategy, focuses only on the conclusion-related components. Meanwhile, a novel algorithm, named Automatically Adding Auxiliary Components using Reinforcement Learning framework (A3C-RL), is proposed, in which an agent is forced to select top strategies and AttnStrategy and BERT are incorporated as the memory components. Results from extensive experiments show that the proposed A3C-RL algorithm can substantially enhance the average precision by 32.7% compared to the traditional MCTS. In addition, the A3C-RL algorithm outperforms humans on the geometric questions from the annual University Entrance Mathematical Examination of China.

Robust Synthetic-to-Real Transfer for Stereo Matching

Mar 12, 2024

Jiawei Zhang, Jiahe Li, Lei Huang, Xiaohan Yu, Lin Gu, Jin Zheng, Xiao Bai

Abstract: With advancements in domain generalized stereo matching networks, models pre-trained on synthetic data demonstrate strong robustness to unseen domains. However, few studies have investigated the robustness after fine-tuning them in real-world scenarios, during which the domain generalization ability can be seriously degraded. In this paper, we explore fine-tuning stereo matching networks without compromising their robustness to unseen domains. Our motivation stems from comparing Ground Truth (GT) versus Pseudo Label (PL) for fine-tuning: GT degrades, but PL preserves the domain generalization ability. Empirically, we find the difference between GT and PL implies valuable information that can regularize networks during fine-tuning. We also propose a framework to utilize this difference for fine-tuning, consisting of a frozen Teacher, an exponential moving average (EMA) Teacher, and a Student network. The core idea is to utilize the EMA Teacher to measure what the Student has learned and dynamically improve GT and PL for fine-tuning. We integrate our framework with state-of-the-art networks and evaluate its effectiveness on several real-world datasets. Extensive experiments show that our method effectively preserves the domain generalization ability during fine-tuning.

* Accepted at CVPR 2024
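
The exponential moving average (EMA) Teacher named in the abstract relies on a standard weight-averaging update, which can be sketched generically as follows. This is not the paper's implementation: the function `ema_update`, the dictionary-of-floats parameter format, and the decay value are assumptions chosen to keep the example dependency-free.

```python
from typing import Dict


def ema_update(
    teacher: Dict[str, float],
    student: Dict[str, float],
    decay: float = 0.999,
) -> Dict[str, float]:
    """Update teacher weights in place as an EMA of the student weights.

    Each teacher parameter becomes
        decay * teacher + (1 - decay) * student,
    so the teacher changes slowly and smooths out the student's updates.
    Parameters are plain floats here; a real model would use tensors.
    """
    if not 0.0 <= decay <= 1.0:
        raise ValueError("decay must lie in [0, 1]")
    for name, student_value in student.items():
        teacher[name] = decay * teacher[name] + (1.0 - decay) * student_value
    return teacher


if __name__ == "__main__":
    teacher = {"w": 1.0}
    student = {"w": 0.0}
    ema_update(teacher, student, decay=0.9)
    print(teacher)
```

A decay close to 1 makes the teacher a long-horizon average of the student, whereas a frozen teacher corresponds to never applying the update at all.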

One-Bit Target Detection in Collocated MIMO Radar with Colored Background Noise

Mar 11, 2024

Yu-Hang Xiao, David Ramírez, Lei Huang, Xiao Peng Li, Hing Cheung So

Abstract: One-bit sampling has emerged as a promising technique in multiple-input multiple-output (MIMO) radar systems due to its ability to significantly reduce data volume and processing requirements. Nevertheless, current detection methods have not adequately addressed the impact of colored noise, which is frequently encountered in real scenarios. In this paper, we present a novel detection method that accounts for colored noise in MIMO radar systems. Specifically, we derive Rao's test by computing the derivative of the likelihood function with respect to the target reflectivity parameter and the Fisher information matrix, resulting in a detector that takes the form of a weighted matched filter. To ensure the constant false alarm rate (CFAR) property, we also consider noise covariance uncertainty and examine its effect on the probability of false alarm. The detection probability is also studied analytically. Simulation results demonstrate that the proposed detector provides considerable performance gains in the presence of colored noise.

LoDisc: Learning Global-Local Discriminative Features for Self-Supervised Fine-Grained Visual Recognition

Mar 06, 2024

Jialu Shi, Zhiqiang Wei, Jie Nie, Lei Huang

Abstract: Self-supervised contrastive learning has attracted remarkable attention due to its exceptional ability in representation learning. However, current contrastive learning tends to learn global coarse-grained representations of the image that benefit generic object recognition, whereas such coarse-grained features are insufficient for fine-grained visual recognition. In this paper, we propose to incorporate subtle local fine-grained feature learning into global self-supervised contrastive learning through a purely self-supervised global-local fine-grained contrastive learning framework. Specifically, a novel pretext task called Local Discrimination (LoDisc) is proposed to explicitly supervise the self-supervised model's focus towards pivotal local regions, which are captured by a simple but effective location-wise mask sampling strategy. We show that the Local Discrimination pretext task can effectively enhance fine-grained clues in important local regions, and the global-local framework further refines the fine-grained feature representations of images. Extensive experimental results on different fine-grained object recognition tasks demonstrate that the proposed method can lead to a decent improvement in different evaluation settings. Meanwhile, the proposed method is also effective in general object recognition tasks.

* 11 pages, submitted

TinyLLaVA: A Framework of Small-scale Large Multimodal Models

Feb 22, 2024

Baichuan Zhou, Ying Hu, Xi Weng, Junlong Jia, Jie Luo, Xien Liu, Ji Wu, Lei Huang

Abstract: We present the TinyLLaVA framework that provides a unified perspective in designing and analyzing small-scale Large Multimodal Models (LMMs). We empirically study the effects of different vision encoders, connection modules, language models, training data and training recipes. Our extensive experiments showed that, with better-quality data combined with better training recipes, smaller LMMs can consistently achieve performance on par with bigger LMMs. Under our framework, we train a family of small-scale LMMs. Our best model, TinyLLaVA-3.1B, achieves better overall performance against existing 7B models such as LLaVA-1.5 and Qwen-VL. We hope our findings can serve as baselines for future research in terms of data scaling, training setups and model selections. Our model weights and codes will be made public.

* Our model weights and codes will be made public at https://github.com/DLCV-BUAA/TinyLLaVABench

Energy Efficiency Optimization in Active Reconfigurable Intelligent Surface-Aided Integrated Sensing and Communication Systems

Nov 28, 2023

Junjie Ye, Mohamed Rihan, Peichang Zhang, Lei Huang, Stefano Buzzi, Zhen Chen

Abstract: Improving energy efficiency (EE) is a challenging task in integrated sensing and communication (ISAC) systems, where high spectral efficiency and low energy consumption appear as conflicting requirements. Although the passive reconfigurable intelligent surface (RIS) has emerged as a promising technology for enhancing the EE of the ISAC system, the multiplicative fading feature hinders its effectiveness. This paper proposes the use of active RIS with its amplification gains to assist the ISAC system for EE improvement. Specifically, we formulate an EE optimization problem in an active RIS-aided ISAC system under system power budgets, considering constraints on user communication quality of service and sensing signal-to-noise ratio (SNR). A novel alternating optimization algorithm is developed to address the highly non-convex problem by leveraging a combination of the generalized Rayleigh quotient optimization approach, semidefinite relaxation (SDR), and the majorization-minimization (MM) framework. Furthermore, to accelerate the algorithm and reduce computational complexity, we derive a semi-closed form for eigenvalue determination. Numerical results demonstrate the effectiveness of the proposed approach, showcasing significant improvements in EE compared to both passive RIS and spectrum efficiency optimization cases.
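
For context, the generalized Rayleigh quotient approach named in the abstract rests on a standard linear-algebra result, stated here in generic notation. The symbols w, A, and B are placeholders assumed for illustration and are not the paper's variables; A is taken to be Hermitian and B Hermitian positive definite.

```latex
\max_{\mathbf{w} \neq \mathbf{0}}
\frac{\mathbf{w}^{H}\mathbf{A}\,\mathbf{w}}{\mathbf{w}^{H}\mathbf{B}\,\mathbf{w}}
= \lambda_{\max}\!\left(\mathbf{B}^{-1}\mathbf{A}\right),
\qquad
\mathbf{A}\,\mathbf{w}^{\star} = \lambda_{\max}\,\mathbf{B}\,\mathbf{w}^{\star}.
```

The maximizer is the principal generalized eigenvector of the pair (A, B), so a subproblem of this form reduces to an eigenvalue computation rather than an iterative search.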

Trends in Integration of Knowledge and Large Language Models: A Survey and Taxonomy of Methods, Benchmarks, and Applications

Nov 10, 2023

Zhangyin Feng, Weitao Ma, Weijiang Yu, Lei Huang, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin, Ting Liu

Abstract: Large language models (LLMs) exhibit superior performance on various natural language tasks, but they are susceptible to issues stemming from outdated data and domain-specific limitations. To address these challenges, researchers have pursued two primary strategies, knowledge editing and retrieval augmentation, to enhance LLMs by incorporating external information from different aspects. Nevertheless, there is still a notable absence of a comprehensive survey. In this paper, we present a review that discusses the trends in the integration of knowledge and large language models, including a taxonomy of methods, benchmarks, and applications. In addition, we conduct an in-depth analysis of different methods and point out potential research directions for the future. We hope this survey offers the community quick access and a comprehensive overview of this research area, with the intention of inspiring future research endeavors.

* Work in progress; 22 pages

A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions

Nov 09, 2023

Lei Huang, Weijiang Yu, Weitao Ma, Weihong Zhong, Zhangyin Feng, Haotian Wang, Qianglong Chen, Weihua Peng, Xiaocheng Feng, Bing Qin(+1 more)

Abstract: The emergence of large language models (LLMs) has marked a significant breakthrough in natural language processing (NLP), leading to remarkable advancements in text understanding and generation. Nevertheless, alongside these strides, LLMs exhibit a critical tendency to produce hallucinations, resulting in content that is inconsistent with real-world facts or user inputs. This phenomenon poses substantial challenges to their practical deployment and raises concerns over the reliability of LLMs in real-world scenarios, which has drawn increasing attention to detecting and mitigating these hallucinations. In this survey, we aim to provide a thorough and in-depth overview of recent advances in the field of LLM hallucinations. We begin with an innovative taxonomy of LLM hallucinations, then delve into the factors contributing to hallucinations. Subsequently, we present a comprehensive overview of hallucination detection methods and benchmarks. Additionally, representative approaches designed to mitigate hallucinations are introduced accordingly. Finally, we analyze the challenges that highlight the current limitations and formulate open questions, aiming to delineate pathways for future research on hallucinations in LLMs.

* Work in progress; 49 pages
