Jie Xu

All-in-One Tuning and Structural Pruning for Domain-Specific LLMs

Dec 19, 2024

Lei Lu, Zhepeng Wang, Ruexue Bao, Mengbing Wang, Fangyi Li, Yawen Wu, Weiwen Jiang, Jie Xu, Yanzhi Wang, Shangqian Gao

Abstract: Existing pruning techniques for large language models (LLMs) targeting domain-specific applications typically follow a two-stage process: pruning the pretrained general-purpose LLM and then fine-tuning the pruned LLM on a specific domain. However, the pruning decisions, derived from the pretrained weights, remain unchanged during fine-tuning even though the weights are updated. Such a combination of pruning decisions and fine-tuned weights may therefore be suboptimal, leading to non-negligible performance degradation. To address these limitations, we propose ATP: All-in-One Tuning and Structural Pruning, a unified one-stage structural pruning and fine-tuning approach that dynamically identifies the current optimal substructure throughout the fine-tuning phase via a trainable pruning decision generator. Moreover, given the limited data available for domain-specific applications, Low-Rank Adaptation (LoRA) has become a common technique for fine-tuning LLMs. In ATP, we introduce LoRA-aware forward and sparsity regularization to ensure that the substructures corresponding to the learned pruning decisions can be directly removed after the ATP process. ATP outperforms state-of-the-art two-stage pruning methods on tasks in the legal and healthcare domains. More specifically, ATP recovers up to 88% and 91% of the dense model's performance when pruning 40% of the parameters of LLaMA2-7B and LLaMA3-8B, respectively.
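ATP's key requirement is that a pruned substructure can be deleted outright after LoRA fine-tuning. The minimal sketch below illustrates that property for one linear layer under our own assumptions: pruning acts on output channels, the mask is already binary, and it multiplies the output of the merged weight W + BA, so a pruned channel removes its row of W and its row of the LoRA factor B together. The trainable pruning decision generator and the sparsity regularizer are not modeled, and all matrix values are hypothetical.

```python
# Sketch (assumption): output-channel pruning of y = (W + B A) x, with the binary mask
# applied to the merged output so W rows and LoRA B rows are dropped together.
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(M: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product."""
    return [sum(m * xj for m, xj in zip(row, x)) for row in M]


def lora_masked_forward(W: Matrix, B: Matrix, A: Matrix, x: Vector, mask: List[int]) -> Vector:
    """Forward pass with a binary output-channel mask (1 = keep, 0 = prune)."""
    base = matvec(W, x)              # frozen pretrained path
    delta = matvec(B, matvec(A, x))  # low-rank LoRA path
    return [m * (b + d) for m, b, d in zip(mask, base, delta)]


def prune_rows(W: Matrix, B: Matrix, mask: List[int]):
    """Physically remove masked channels from both W and the LoRA factor B."""
    keep = [i for i, m in enumerate(mask) if m]
    return [W[i] for i in keep], [B[i] for i in keep], keep


def compact_forward(W_kept: Matrix, B_kept: Matrix, A: Matrix, x: Vector) -> Vector:
    """Forward pass of the smaller layer that remains after pruning."""
    return [b + d for b, d in zip(matvec(W_kept, x), matvec(B_kept, matvec(A, x)))]


W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]  # hypothetical pretrained weight (3 outputs)
B = [[0.1], [0.2], [0.3]]                 # hypothetical rank-1 LoRA factors
A = [[1.0, -1.0]]
x = [1.0, 2.0]
mask = [1, 0, 1]                          # prune the middle output channel

full = lora_masked_forward(W, B, A, x, mask)
W_k, B_k, keep = prune_rows(W, B, mask)
print(full, compact_forward(W_k, B_k, A, x))
```

The kept entries of the masked output match the compact layer exactly, which is the property that lets masked channels be deleted without any further fine-tuning.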

Integrated Sensing and Communications for Low-Altitude Economy: A Deep Reinforcement Learning Approach

Dec 05, 2024

Xiaowen Ye, Yuyi Mao, Xianghao Yu, Shu Sun, Liqun Fu, Jie Xu

Abstract: This paper studies an integrated sensing and communications (ISAC) system for the low-altitude economy (LAE), where a ground base station (GBS) provides communication and navigation services for authorized unmanned aerial vehicles (UAVs) while sensing the low-altitude airspace to monitor an unauthorized mobile target. The expected communication sum-rate over a given flight period is maximized by jointly optimizing the beamforming at the GBS and the UAVs' trajectories, subject to constraints on the average signal-to-noise ratio required for sensing, the flight mission and collision avoidance of the UAVs, and the maximum transmit power at the GBS. Given the flight mission, this is a sequential decision-making problem, so we transform it into a specific Markov decision process (MDP) model called an episode task. Based on this model, we propose a novel LAE-oriented ISAC scheme, referred to as Deep LAE-ISAC (DeepLSC), which leverages deep reinforcement learning (DRL). In DeepLSC, a reward function and a new action selection policy, termed the constrained noise-exploration policy, are carefully designed to satisfy the various constraints. To enable efficient learning in episode tasks, we develop a hierarchical experience replay mechanism, whose key idea is to use all experiences generated within each episode to jointly train the neural network. In addition, to accelerate the convergence of DeepLSC, we propose a symmetric experience augmentation mechanism, which simultaneously permutes the indices of all variables to enrich the available experience sets. Simulation results demonstrate that, compared with benchmarks, DeepLSC yields a higher sum-rate while meeting the preset constraints, converges faster, and is more robust to different settings.

* submitted for an IEEE publication
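The symmetric experience augmentation described in the DeepLSC abstract can be illustrated with a small sketch. We assume, as a simplification, that an experience stores per-UAV quantities in tuples indexed by UAV and that the sum-rate reward does not depend on how the UAVs are numbered; relabelling the UAVs consistently in every field then yields additional valid experiences. The `Experience` layout is hypothetical, and any other UAV-indexed variable (for example, per-UAV beamformers) would have to be reordered the same way.

```python
# Sketch (assumption): per-UAV fields are tuples indexed by UAV; the reward is
# invariant to UAV relabelling, so each permutation gives another valid experience.
from itertools import permutations
from typing import List, NamedTuple, Sequence, Tuple


class Experience(NamedTuple):
    state: Tuple        # per-UAV state features
    action: Tuple       # per-UAV actions
    reward: float       # sum-rate reward, invariant to UAV ordering
    next_state: Tuple   # per-UAV next-state features


def permute(exp: Experience, perm: Sequence[int]) -> Experience:
    """Relabel the UAVs with the same permutation in every per-UAV field."""
    def reorder(seq: Tuple) -> Tuple:
        return tuple(seq[p] for p in perm)
    return Experience(reorder(exp.state), reorder(exp.action), exp.reward, reorder(exp.next_state))


def augment(exp: Experience) -> List[Experience]:
    """Return all n! relabelled copies of one experience, the original included."""
    n = len(exp.state)
    return [permute(exp, perm) for perm in permutations(range(n))]


e = Experience(state=("s0", "s1"), action=("a0", "a1"), reward=1.5, next_state=("t0", "t1"))
for copy in augment(e):
    print(copy)
```

Because the number of copies grows as n!, a practical replay buffer would likely sample a few permutations per experience rather than store all of them.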

SAM-MPA: Applying SAM to Few-shot Medical Image Segmentation using Mask Propagation and Auto-prompting

Nov 26, 2024

Jie Xu, Xiaokang Li, Chengyu Yue, Yuanyuan Wang, Yi Guo

Abstract: Medical image segmentation often faces prohibitively expensive annotation costs. While few-shot learning offers a promising way to alleviate this burden, conventional approaches still rely heavily on pre-training with large volumes of labeled data from known categories. To address this issue, we propose leveraging the Segment Anything Model (SAM), pre-trained on over 1 billion masks, thus circumventing the need for extensive domain-specific annotated data. To this end, we developed SAM-MPA, an innovative SAM-based framework for few-shot medical image segmentation using Mask Propagation-based Auto-prompting. First, we employ k-centroid clustering to select the most representative examples for labeling to construct the support set. These annotated examples are registered to the other images, yielding deformation fields that propagate the mask knowledge and produce coarse masks across the dataset. Next, we automatically generate visual prompts, including points, a box, and a coarse mask, based on region and boundary expansion of the coarse mask. Finally, we obtain segmentation predictions by feeding these prompts into SAM and refine the results with a post-refinement module. We validate the proposed framework through extensive experiments on two medical image datasets with different modalities. Our method achieves Dice scores of 74.53% and 94.36% on the Breast US and Chest X-ray datasets, respectively. Experimental results substantiate that SAM-MPA yields high-accuracy segmentations with no more than 10 labeled examples, outperforming other state-of-the-art few-shot auto-segmentation methods. Our method enables the customization of SAM for any medical image dataset with a small number of labeled examples.

* Accepted as an oral presentation at NeurIPS 2024 AIM-FM Workshop
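The auto-prompting step in SAM-MPA turns each propagated coarse mask into visual prompts. A minimal sketch of deriving a box prompt and one positive point prompt from a binary mask follows. The expansion `margin` is our stand-in for the paper's region and boundary expansion, whose exact rule is not stated here, and the (x_min, y_min, x_max, y_max) box and (x, y) point layout is an assumption about how the prompts would be passed on.

```python
# Sketch (assumption): coarse mask is a list of rows with 1 = foreground. The margin
# is a hypothetical stand-in for the paper's region/boundary expansion rule.
from typing import List, Optional, Tuple

Mask = List[List[int]]


def box_prompt(mask: Mask, margin: int = 0) -> Optional[Tuple[int, int, int, int]]:
    """Bounding box (x_min, y_min, x_max, y_max) of the foreground, grown by `margin` pixels."""
    h, w = len(mask), len(mask[0])
    rows = [r for r in range(h) if any(mask[r])]
    cols = [c for c in range(w) if any(mask[r][c] for r in range(h))]
    if not rows:
        return None  # empty coarse mask: no prompt can be derived
    return (
        max(min(cols) - margin, 0),
        max(min(rows) - margin, 0),
        min(max(cols) + margin, w - 1),
        min(max(rows) + margin, h - 1),
    )


def point_prompt(mask: Mask) -> Optional[Tuple[int, int]]:
    """Foreground pixel (x, y) closest to the mask centroid, so the point lies inside the mask."""
    pts = [(c, r) for r, row in enumerate(mask) for c, v in enumerate(row) if v]
    if not pts:
        return None
    cx = sum(p[0] for p in pts) / len(pts)
    cy = sum(p[1] for p in pts) / len(pts)
    return min(pts, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)


coarse = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]
print(box_prompt(coarse), box_prompt(coarse, margin=1), point_prompt(coarse))
```

Snapping the point to the nearest foreground pixel, instead of using the raw centroid, keeps the positive prompt inside the object even when the coarse mask is non-convex.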

LoRA-FAIR: Federated LoRA Fine-Tuning with Aggregation and Initialization Refinement

Nov 22, 2024

Jieming Bian, Lei Wang, Letian Zhang, Jie Xu

Abstract: Foundation models (FMs) achieve strong performance across diverse tasks with task-specific fine-tuning, yet full-parameter fine-tuning is often computationally prohibitive for large models. Parameter-efficient fine-tuning (PEFT) methods such as Low-Rank Adaptation (LoRA) reduce this cost by introducing low-rank matrices so that fewer parameters are tuned. While LoRA allows efficient fine-tuning, it requires significant data for adaptation, making Federated Learning (FL) an appealing solution due to its privacy-preserving collaborative framework. However, combining LoRA with FL introduces two key challenges: Server-Side LoRA Aggregation Bias, where server-side averaging of the LoRA matrices diverges from the ideal global update, and Client-Side LoRA Initialization Drift, which underscores the need for consistent initialization across rounds. Existing approaches address these challenges individually, limiting their effectiveness. We propose LoRA-FAIR, a novel method that tackles both issues by introducing a correction term on the server while keeping the original LoRA modules, enhancing aggregation efficiency and accuracy. LoRA-FAIR maintains computational and communication efficiency and outperforms state-of-the-art methods. Experimental results on ViT and MLP-Mixer models across large-scale datasets demonstrate that LoRA-FAIR consistently improves performance in FL settings.
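The Server-Side LoRA Aggregation Bias can be seen in a two-client toy example. Assuming the server averages the LoRA factors B and A separately (FedAvg-style), the product of the averages generally differs from the average of the client updates B_i A_i, which is the ideal global update. LoRA-FAIR's correction term is not reproduced here, and the matrices are hypothetical.

```python
# Sketch: averaging LoRA factors separately is not the same as averaging the updates.
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(P: Matrix, Q: Matrix) -> Matrix:
    """Plain matrix product P @ Q."""
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
            for i in range(len(P))]


def mean(mats: List[Matrix]) -> Matrix:
    """Element-wise average of equally shaped matrices."""
    n = len(mats)
    return [[sum(m[i][j] for m in mats) / n for j in range(len(mats[0][0]))]
            for i in range(len(mats[0]))]


def aggregation_gap(Bs: List[Matrix], As: List[Matrix]) -> Tuple[Matrix, Matrix, Matrix]:
    ideal = mean([matmul(B, A) for B, A in zip(Bs, As)])  # average of client updates B_i A_i
    naive = matmul(mean(Bs), mean(As))                     # avg(B) @ avg(A)
    gap = [[a - b for a, b in zip(r1, r2)] for r1, r2 in zip(ideal, naive)]
    return ideal, naive, gap


# Two hypothetical clients with rank-1 LoRA factors (B: 2x1, A: 1x2).
Bs = [[[1.0], [0.0]], [[0.0], [1.0]]]
As = [[[1.0, 0.0]], [[0.0, 1.0]]]
ideal, naive, gap = aggregation_gap(Bs, As)
print(ideal, naive, gap)
```

Here the ideal update is diagonal while the naive one spreads mass into the off-diagonal entries, a nonzero gap that a server-side correction would need to account for.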

An Overview on IRS-Enabled Sensing and Communications for 6G: Architectures, Fundamental Limits, and Joint Beamforming Designs

Nov 11, 2024

Xianxin Song, Yuan Fang, Feng Wang, Zixiang Ren, Xianghao Yu, Ye Zhang, Fan Liu, Jie Xu, Derrick Wing Kwan Ng, Rui Zhang (+1 more)

Abstract: This paper presents an overview of intelligent reflecting surface (IRS)-enabled sensing and communication for forthcoming sixth-generation (6G) wireless networks, in which IRSs are strategically deployed to proactively reconfigure wireless environments and improve both sensing and communication (S&C) performance. First, we exploit a single IRS to enable wireless sensing in the non-line-of-sight (NLoS) area of the base station (BS). In particular, we present three IRS-enabled NLoS target sensing architectures with fully-passive, semi-passive, and active IRSs, respectively. We compare their pros and cons by analyzing the fundamental sensing performance limits for target detection and parameter estimation. Next, we consider a single IRS that facilitates integrated sensing and communication (ISAC), in which the transmit signals at the BS are used to achieve both S&C functionalities, aided by the IRS through reflective beamforming. We present joint transmit signal and receiver processing designs for realizing efficient ISAC, and jointly optimize the transmit beamforming at the BS and the reflective beamforming at the IRS to balance the fundamental performance tradeoff between S&C. Furthermore, we discuss multi-IRS networked ISAC, focusing in particular on multi-IRS-enabled multi-link ISAC, multi-region ISAC, and ISAC signal routing. Finally, we highlight various promising research topics in this area to motivate future work.

* 22 pages, 7 figures
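Why IRS reflective beamforming matters can be illustrated with the simplest textbook case, which is not the joint BS/IRS design from the overview: a single-antenna transmitter and receiver, an N-element passive IRS with unit-modulus reflection coefficients, and a direct path. Choosing each phase shift so that every reflected path arrives in phase with the direct path maximizes the received channel gain. The channel values are randomly generated for illustration only.

```python
# Sketch (assumption): single-antenna link plus an N-element passive IRS with reflection
# coefficients exp(j*theta_n). g[n]: transmitter -> element n, h_r[n]: element n -> receiver.
import cmath
import random
from typing import List


def effective_channel(h_d: complex, g: List[complex], h_r: List[complex], theta: List[float]) -> complex:
    """Direct path plus the sum of the reflected paths through each IRS element."""
    return h_d + sum(gn * cmath.exp(1j * t) * hn for gn, hn, t in zip(g, h_r, theta))


def align_phases(h_d: complex, g: List[complex], h_r: List[complex]) -> List[float]:
    """Co-phasing: rotate every reflected path onto the phase of the direct path."""
    return [cmath.phase(h_d) - cmath.phase(gn * hn) for gn, hn in zip(g, h_r)]


def rand_c(rng: random.Random) -> complex:
    """Hypothetical Rayleigh-like channel coefficient."""
    return complex(rng.gauss(0, 1), rng.gauss(0, 1))


rng = random.Random(0)
N = 16
h_d = rand_c(rng)
g = [rand_c(rng) for _ in range(N)]
h_r = [rand_c(rng) for _ in range(N)]

best = abs(effective_channel(h_d, g, h_r, align_phases(h_d, g, h_r)))
bound = abs(h_d) + sum(abs(gn * hn) for gn, hn in zip(g, h_r))  # triangle-inequality limit
random_theta = [rng.uniform(0, 2 * cmath.pi) for _ in range(N)]
print(best, bound, abs(effective_channel(h_d, g, h_r, random_theta)))
```

The co-phased channel reaches the triangle-inequality bound |h_d| + Σ|g_n h_r,n|, while random phase shifts generally fall short of it.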

SPOT: SE(3) Pose Trajectory Diffusion for Object-Centric Manipulation

Nov 01, 2024

Cheng-Chun Hsu, Bowen Wen, Jie Xu, Yashraj Narang, Xiaolong Wang, Yuke Zhu, Joydeep Biswas, Stan Birchfield

Abstract: We introduce SPOT, an object-centric imitation learning framework. The key idea is to capture each task with an object-centric representation, specifically the SE(3) object pose trajectory relative to the target. This approach decouples embodiment actions from sensory inputs, facilitating learning from various demonstration types, including both action-based and action-less human hand demonstrations, as well as cross-embodiment generalization. Additionally, object pose trajectories inherently capture planning constraints from demonstrations without the need for manually crafted rules. To guide the robot in executing the task, the object trajectory is used to condition a diffusion policy. We show improvements over prior work on simulated RLBench tasks. In real-world evaluation, using only eight demonstrations recorded on an iPhone, our approach completed all tasks while fully complying with task constraints. Project page: https://nvlabs.github.io/object_centric_diffusion
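SPOT represents a task by the object's SE(3) pose trajectory relative to the target. A minimal sketch of that conversion follows, assuming world-frame poses given as 4x4 homogeneous transforms; each object pose is expressed in the target frame as T_target^{-1} T_object. The poses are hypothetical, and the diffusion policy that consumes these trajectories is not shown.

```python
# Sketch (assumption): poses are 4x4 homogeneous world-frame transforms as nested lists.
from typing import List

Pose = List[List[float]]


def se3_inverse(T: Pose) -> Pose:
    """Inverse of [R t; 0 1] is [R^T  -R^T t; 0 1]."""
    Rt = [[T[j][i] for j in range(3)] for i in range(3)]  # transpose of the rotation block
    t = [T[i][3] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return [Rt[i] + [t_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def se3_mul(P: Pose, Q: Pose) -> Pose:
    """Compose two homogeneous transforms, P @ Q."""
    return [[sum(P[i][k] * Q[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def relative_trajectory(object_poses: List[Pose], target_pose: Pose) -> List[Pose]:
    """Express each object pose in the target's frame: T_target^{-1} @ T_object."""
    T_inv = se3_inverse(target_pose)
    return [se3_mul(T_inv, T) for T in object_poses]


def translation(x: float, y: float, z: float) -> Pose:
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y], [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


# Hypothetical target: rotated 90 degrees about z and placed at (1, 0, 0).
target = [[0.0, -1.0, 0.0, 1.0], [1.0, 0.0, 0.0, 0.0], [0.0, 0.0, 1.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
# Hypothetical object trajectory approaching the target origin.
object_traj = [translation(1.0, 2.0, 0.0), translation(1.0, 0.0, 0.0)]
for rel in relative_trajectory(object_traj, target):
    print([row[3] for row in rel[:3]])
```

Because the trajectory is expressed relative to the target, the same demonstration remains valid wherever the target is placed in the world frame.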

Exploiting Moving Arrays for Near-Field Sensing

Oct 12, 2024

Yilong Chen, Zixiang Ren, Xianghao Yu, Lei Liu, Jie Xu

Abstract: This letter exploits moving arrays to enable near-field multiple-input multiple-output (MIMO) sensing with a limited number of antenna elements. We consider a scenario where a base station (BS) is equipped with a uniform linear array (ULA) on a moving platform. The objective is to locate a point target in two-dimensional (2D) space by leveraging the near-field channel characteristics created by the movement of the antenna array. Under this setup, we analyze the Cramér-Rao bound (CRB) for estimating the target's 2D coordinates, which provides the fundamental sensing performance limit for localization. It is revealed that the proposed moving-array design achieves a CRB proportional to that obtained by an equivalent extremely large ULA matching the platform's size. This shows that moving the antenna array significantly enlarges its effective aperture, enabling near-field sensing. Numerical results show that the proposed moving-array design substantially enhances target estimation performance compared with the conventional fixed-array benchmark.

* 5 pages, 7 figures
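The effective-aperture argument in this letter can be made concrete with the conventional Rayleigh (Fraunhofer) distance 2D²/λ, which marks the usual near-field/far-field boundary for an aperture of length D. As a simplification of our own, the moving array's effective aperture is taken to be roughly the platform length; the letter's CRB analysis is not reproduced, and the carrier frequency, element count, and platform length are hypothetical.

```python
# Illustrative sketch: Rayleigh distance 2 D^2 / lambda for a small fixed ULA versus the
# aperture swept by a moving platform. All numeric parameters are hypothetical.
import math

C = 299_792_458.0  # speed of light in m/s


def ula_aperture(num_elements: int, wavelength: float, spacing_wl: float = 0.5) -> float:
    """Physical length of a ULA whose element spacing is spacing_wl * wavelength."""
    return (num_elements - 1) * spacing_wl * wavelength


def rayleigh_distance(aperture: float, wavelength: float) -> float:
    """Conventional near-field / far-field boundary 2 D^2 / lambda."""
    return 2.0 * aperture ** 2 / wavelength


wavelength = C / 28e9                 # hypothetical 28 GHz carrier
fixed = ula_aperture(8, wavelength)   # hypothetical 8-element half-wavelength ULA
platform = 1.0                        # hypothetical 1 m platform length (assumed effective aperture)
print(f"fixed ULA: D = {fixed:.4f} m, near field up to {rayleigh_distance(fixed, wavelength):.2f} m")
print(f"moving ULA: D ~ {platform:.1f} m, near field up to {rayleigh_distance(platform, wavelength):.1f} m")
```

Since the boundary grows with D², even modest platform travel pushes the near-field region far beyond what the same few elements cover when fixed.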

Bridging Gaps: Federated Multi-View Clustering in Heterogeneous Hybrid Views

Oct 12, 2024

Xinyue Chen, Yazhou Ren, Jie Xu, Fangfei Lin, Xiaorong Pu, Yang Yang

Abstract: Recently, federated multi-view clustering (FedMVC) has emerged to explore cluster structures in multi-view data distributed across multiple clients. Existing approaches often assume that clients are isomorphic, i.e., that all of them are either single-view clients or multi-view clients. Despite their success, these methods have limitations in practical FedMVC scenarios involving heterogeneous hybrid views, where a mixture of single-view and multi-view clients exhibits varying degrees of heterogeneity. In this paper, we propose a novel FedMVC framework that concurrently addresses two challenges associated with heterogeneous hybrid views, i.e., the client gap and the view gap. To address the client gap, we design a local-synergistic contrastive learning approach that helps single-view and multi-view clients achieve consistency, mitigating heterogeneity among all clients. To address the view gap, we develop a global-specific weighting aggregation method, which encourages global models to learn complementary features from hybrid views. Local-synergistic contrastive learning and global-specific weighting aggregation reinforce each other in exploring the cluster structures of data distributed across multiple clients. Theoretical analysis and extensive experiments demonstrate that our method can handle heterogeneous hybrid views in FedMVC and outperforms state-of-the-art methods. The code is available at https://github.com/5Martina5/FMCSC.
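The global-specific weighting aggregation is not described here in enough detail to reproduce, but the server-side operation it builds on, a weighted average of client models, can be sketched generically. This is a FedAvg-style stand-in, not the paper's method: the client parameter vectors and weights are hypothetical inputs, and how FedMVC actually derives its weights is left open.

```python
# Generic sketch (assumption): the server forms a weighted average of flattened client
# parameter vectors. The weights are hypothetical inputs, not the paper's weighting rule.
from typing import List, Sequence


def weighted_aggregate(client_params: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Return sum_i w_i * p_i / sum_i w_i, computed coordinate by coordinate."""
    if len(client_params) != len(weights):
        raise ValueError("one weight per client is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(client_params[0])
    return [sum(w * p[i] for w, p in zip(weights, client_params)) / total for i in range(dim)]


# Hypothetical example: one single-view client and one multi-view client with different weights.
single_view = [1.0, 2.0]
multi_view = [3.0, 4.0]
print(weighted_aggregate([single_view, multi_view], [1.0, 3.0]))
```

Normalizing by the weight sum keeps the aggregate a convex combination of the client models, so the weights only need to express relative importance.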

Generative Semantic Communication for Text-to-Speech Synthesis

Oct 04, 2024

Jiahao Zheng, Jinke Ren, Peng Xu, Zhihao Yuan, Jie Xu, Fangxin Wang, Gui Gui, Shuguang Cui

Abstract: Semantic communication is a promising technology for improving communication efficiency by transmitting only the semantic information of the source data. However, traditional semantic communication methods primarily focus on data reconstruction tasks, which may not be efficient for emerging generative tasks such as text-to-speech (TTS) synthesis. To address this limitation, this paper develops a novel generative semantic communication framework for TTS synthesis that leverages generative artificial intelligence technologies. First, we use a pre-trained large speech model called WavLM and the residual vector quantization method to construct two semantic knowledge bases (KBs), one at the transmitter and one at the receiver. The KB at the transmitter enables effective semantic extraction, while the KB at the receiver facilitates lifelike speech synthesis. Then, we employ a transformer encoder and a diffusion model to achieve efficient semantic coding without introducing significant communication overhead. Finally, numerical results demonstrate that our framework achieves much higher fidelity for the generated speech than four baselines over both additive white Gaussian noise (AWGN) channels and Rayleigh fading channels.

* The paper has been accepted by an IEEE Globecom Workshop
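The knowledge bases in this framework are built with residual vector quantization (RVQ). A minimal sketch of generic RVQ follows: each stage quantizes the residual left by the previous stages, and the reconstruction is the sum of the selected codewords. The toy codebooks are hypothetical rather than WavLM-derived, and the observation that only integer indices would need to cross the channel is our reading, not a detail stated in the abstract.

```python
# Sketch: residual vector quantization (RVQ) with fixed, hypothetical toy codebooks.
from typing import List, Sequence

Codebook = List[List[float]]


def nearest(codebook: Codebook, v: Sequence[float]) -> int:
    """Index of the codeword with the smallest squared Euclidean distance to v."""
    return min(range(len(codebook)), key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], v)))


def rvq_encode(v: Sequence[float], codebooks: List[Codebook]) -> List[int]:
    """Each stage quantizes the residual left by the previous stages."""
    residual = list(v)
    codes = []
    for cb in codebooks:
        idx = nearest(cb, residual)
        codes.append(idx)
        residual = [r - c for r, c in zip(residual, cb[idx])]
    return codes


def rvq_decode(codes: Sequence[int], codebooks: List[Codebook]) -> List[float]:
    """Reconstruction is the sum of the selected codewords across stages."""
    out = [0.0] * len(codebooks[0][0])
    for cb, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, cb[idx])]
    return out


codebooks = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0]],  # coarse stage (hypothetical)
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25]],  # fine stage (hypothetical)
]
feature = [1.2, 1.0]  # stand-in for one speech feature frame
codes = rvq_encode(feature, codebooks)
print(codes, rvq_decode(codes, codebooks))
```

Adding stages refines the approximation at the cost of one more index per frame, which is the usual rate/fidelity knob of RVQ.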

Learning the Optimal Path and DNN Partition for Collaborative Edge Inference

Oct 02, 2024

Yin Huang, Letian Zhang, Jie Xu

Abstract: Recent advancements in Deep Neural Networks (DNNs) have catalyzed the development of numerous intelligent mobile applications and services. However, they also introduce significant computational challenges for resource-constrained mobile devices. To address this, collaborative edge inference has been proposed, which partitions a DNN inference task into several subtasks and distributes them across multiple network nodes. Despite its potential, most current approaches presume known network parameters, such as node processing speeds and link transmission rates, or rely on a fixed sequence of nodes for processing the DNN subtasks. In this paper, we tackle a more complex scenario in which network parameters are unknown and must be learned, and multiple network paths are available for distributing inference tasks. Specifically, we study the learning problem of selecting the optimal network path and assigning DNN layers to nodes along this path, taking into account potential security threats and the costs of switching paths. We begin by deriving structural insights into DNN layer assignment with complete network information, which narrow down the decision space and provide crucial understanding of optimal assignments. We then cast the learning problem with incomplete network information as a novel adversarial group linear bandits problem with switching costs, in which rewards are generated through a combined stochastic and adversarial process. We introduce a new bandit algorithm, B-EXPUCB, which combines elements of the classical blocked EXP3 and LinUCB algorithms, and show that it achieves sublinear regret. Extensive simulations confirm that B-EXPUCB outperforms existing algorithms in learning for collaborative edge inference.

* 15 pages, 15 figures, submitted to IEEE journals for possible publication
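B-EXPUCB combines blocked EXP3 with LinUCB. The sketch below shows only the LinUCB ingredient, assuming each candidate path has a known feature vector and a linear expected reward; the blocked EXP3 component, switching costs, and adversarial reward generation are not modeled. The path features and rewards are hypothetical, and A^{-1} is maintained with the Sherman-Morrison rank-one update instead of a full matrix inversion.

```python
# Sketch: LinUCB with a shared parameter vector; A^{-1} is kept up to date with
# the Sherman-Morrison rank-one update. Features and rewards are hypothetical.
import math
from typing import List, Sequence


class LinUCB:
    def __init__(self, dim: int, alpha: float = 1.0, reg: float = 1.0):
        self.alpha = alpha
        # A = reg * I initially, so A^{-1} = I / reg.
        self.A_inv = [[(1.0 / reg if i == j else 0.0) for j in range(dim)] for i in range(dim)]
        self.b = [0.0] * dim

    @staticmethod
    def _matvec(M: List[List[float]], x: Sequence[float]) -> List[float]:
        return [sum(m * xi for m, xi in zip(row, x)) for row in M]

    def theta(self) -> List[float]:
        """Ridge-regression estimate A^{-1} b."""
        return self._matvec(self.A_inv, self.b)

    def score(self, x: Sequence[float]) -> float:
        """Upper confidence bound: x^T theta + alpha * sqrt(x^T A^{-1} x)."""
        Ax = self._matvec(self.A_inv, x)
        mean = sum(t * xi for t, xi in zip(self.theta(), x))
        width = math.sqrt(max(sum(xi * a for xi, a in zip(x, Ax)), 0.0))
        return mean + self.alpha * width

    def select(self, features: Sequence[Sequence[float]]) -> int:
        """Index of the arm (path) with the highest upper confidence bound."""
        return max(range(len(features)), key=lambda i: self.score(features[i]))

    def update(self, x: Sequence[float], reward: float) -> None:
        Ax = self._matvec(self.A_inv, x)
        denom = 1.0 + sum(xi * a for xi, a in zip(x, Ax))
        n = len(x)
        # (A + x x^T)^{-1} = A^{-1} - (A^{-1} x)(A^{-1} x)^T / (1 + x^T A^{-1} x), A symmetric.
        self.A_inv = [[self.A_inv[i][j] - Ax[i] * Ax[j] / denom for j in range(n)] for i in range(n)]
        self.b = [bi + reward * xi for bi, xi in zip(self.b, x)]


# Hypothetical example: two candidate paths with one-hot features and noiseless rewards.
paths = [[1.0, 0.0], [0.0, 1.0]]
true_theta = [0.2, 0.8]
bandit = LinUCB(dim=2, alpha=0.5)
for _ in range(50):
    k = bandit.select(paths)
    bandit.update(paths[k], sum(t * f for t, f in zip(true_theta, paths[k])))
print(bandit.select(paths), bandit.theta())
```

The confidence width shrinks for paths that are tried often, so exploration fades once the estimate of the better path is reliable.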
