
Nan Yang


PoSE: Efficient Context Window Extension of LLMs via Positional Skip-wise Training

Sep 19, 2023
Dawei Zhu, Nan Yang, Liang Wang, Yifan Song, Wenhao Wu, Furu Wei, Sujian Li

In this paper, we introduce Positional Skip-wisE (PoSE) training for efficient adaptation of large language models (LLMs) to extremely long context windows. PoSE decouples training length from target context window size by simulating long inputs with a fixed context window and manipulated position indices during training. Concretely, we select several short chunks from a long input sequence and introduce distinct skipping bias terms to modify the position indices of each chunk. These bias terms, along with the length of each chunk, are altered for each training example, allowing the model to adapt to all positions within the target context window without training on full-length inputs. Experiments show that, compared with fine-tuning on the full length, PoSE greatly reduces memory and time overhead with minimal impact on performance. Leveraging this advantage, we have successfully extended the LLaMA model to 128k tokens. Furthermore, we empirically confirm that PoSE is compatible with all RoPE-based LLMs and various position interpolation strategies. Notably, by decoupling fine-tuning length from the target context window, PoSE can theoretically extend the context window infinitely, constrained only by memory usage at inference. With ongoing advances in efficient inference, we believe PoSE holds great promise for scaling the context window even further.
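The position-index manipulation at the heart of PoSE can be sketched in a few lines of Python. This is an illustrative simplification, not the paper's exact scheme: the chunk sampling and skipping-bias choices below are hypothetical stand-ins.

```python
import random

def pose_position_ids(train_len, target_len, num_chunks=2, rng=random):
    """Simulate positions from a window of size `target_len` using only
    `train_len` tokens: split the training window into chunks and add a
    random skipping bias to each chunk's position indices.
    (Illustrative sketch; sampling details follow the paper only loosely.)"""
    # Split the training window into `num_chunks` contiguous chunks.
    cuts = sorted(rng.sample(range(1, train_len), num_chunks - 1))
    bounds = [0] + cuts + [train_len]
    chunk_lens = [b - a for a, b in zip(bounds, bounds[1:])]

    position_ids, offset = [], 0
    budget = target_len - train_len  # total positions we are allowed to skip
    for clen in chunk_lens:
        skip = rng.randint(0, budget)  # random skipping bias for this chunk
        budget -= skip
        offset += skip
        position_ids.extend(range(offset, offset + clen))
        offset += clen
    return position_ids
```

Because the biases are resampled per training example, every position index below `target_len` is eventually visited even though each batch only ever contains `train_len` tokens.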

Secure Short-Packet Communications via UAV-Enabled Mobile Relaying: Joint Resource Optimization and 3D Trajectory Design

Jul 14, 2023
Milad Tatar Mamaghani, Xiangyun Zhou, Nan Yang, A. Lee Swindlehurst

Short-packet communication (SPC) and unmanned aerial vehicles (UAVs) are anticipated to play crucial roles in the development of 5G-and-beyond wireless networks and the Internet of Things (IoT). In this paper, we propose a secure SPC system, where a UAV serves as a mobile decode-and-forward (DF) relay, periodically receiving and relaying small data packets from a remote IoT device to its receiver in two hops with strict latency requirements, in the presence of an eavesdropper. This system requires careful optimization of important design parameters, such as the coding blocklengths of both hops, the transmit powers, and the UAV's trajectory. While the overall optimization problem is nonconvex, we tackle it by applying a block successive convex approximation (BSCA) approach to divide the original problem into three subproblems and solve them separately. An overall iterative algorithm is then proposed to obtain the final design with guaranteed convergence. Our proposed low-complexity algorithm incorporates 3D trajectory design and resource management to optimize the effective average secrecy throughput of the communication system over the course of the UAV relay's mission. Simulation results demonstrate significant performance improvements over various benchmark schemes and provide useful design insights on the coding blocklengths and transmit powers along the trajectory of the UAV.

Learning to Retrieve In-Context Examples for Large Language Models

Jul 14, 2023
Liang Wang, Nan Yang, Furu Wei

Large language models (LLMs) have demonstrated their ability to learn in-context, allowing them to perform various tasks based on a few input-output examples. However, the effectiveness of in-context learning relies heavily on the quality of the selected examples. In this paper, we propose a novel framework to iteratively train dense retrievers that can identify high-quality in-context examples for LLMs. Our framework first trains a reward model based on LLM feedback to evaluate the quality of candidate examples, followed by knowledge distillation to train a bi-encoder-based dense retriever. Our experiments on a suite of 30 tasks demonstrate that our framework significantly enhances in-context learning performance. Furthermore, we show that our framework generalizes to tasks unseen during training. An in-depth analysis reveals that our model improves performance by retrieving examples with similar patterns, and the gains are consistent across LLMs of varying sizes.
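The knowledge-distillation step can be illustrated with a simple KL objective over candidate examples. The temperature parameter and the exact direction of the KL divergence are assumptions made here for illustration, not details taken from the paper.

```python
import math

def softmax(xs, temp=1.0):
    """Numerically stable softmax over a list of raw scores."""
    m = max(x / temp for x in xs)
    exps = [math.exp(x / temp - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def distill_loss(reward_scores, retriever_scores, temp=1.0):
    """KL(teacher || student) over candidate in-context examples: the
    bi-encoder retriever (student) is trained to match the reward model's
    (teacher's) ranking distribution. Illustrative sketch only."""
    p = softmax(reward_scores, temp)     # teacher: reward model
    q = softmax(retriever_scores, temp)  # student: dense retriever
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

When the retriever's scores induce the same distribution as the reward model's, the loss is zero; any disagreement in the ranking distribution yields a positive penalty.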

* 16 pages 
Learning to Rank in Generative Retrieval

Jun 27, 2023
Yongqi Li, Nan Yang, Liang Wang, Furu Wei, Wenjie Li

Generative retrieval is a promising new paradigm in text retrieval that generates identifier strings of relevant passages as the retrieval target. It leverages powerful generation models and is distinct from traditional learning-to-rank approaches. However, despite its rapid development, current generative retrieval methods remain limited. They typically rely on a heuristic function to transform predicted identifiers into a passage rank list, which creates a gap between the learning objective of generative retrieval and the desired passage ranking target. Moreover, the inherent exposure bias of text generation also persists in generative retrieval. To address these issues, we propose a novel framework, called LTRGR, that combines generative retrieval with the classical learning-to-rank paradigm. Our approach trains an autoregressive model with a passage rank loss, which directly optimizes the model toward the optimal passage ranking. This framework requires only an additional training step to enhance current generative retrieval systems and adds no burden to the inference stage. We conducted experiments on three public datasets, and our results demonstrate that LTRGR achieves state-of-the-art performance among generative retrieval methods, indicating its effectiveness and robustness.
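A passage rank loss of the kind described could take the form of a pairwise hinge over the model's generation scores. The margin formulation below is an assumption for illustration, not necessarily the paper's exact loss.

```python
def passage_rank_loss(scores, labels, margin=1.0):
    """Pairwise margin rank loss over autoregressive generation scores
    (illustrative LTRGR-style sketch): for every (relevant, irrelevant)
    passage pair, penalize the model unless the relevant passage's score
    exceeds the irrelevant one's by at least `margin`."""
    loss, pairs = 0.0, 0
    for sp, lp in zip(scores, labels):
        for sn, ln in zip(scores, labels):
            if lp == 1 and ln == 0:
                loss += max(0.0, margin - (sp - sn))  # hinge on score gap
                pairs += 1
    return loss / max(pairs, 1)
```

Unlike a heuristic post-hoc mapping from identifiers to a rank list, this objective makes the ranking target part of training, while inference stays unchanged.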

Multiview Identifiers Enhanced Generative Retrieval

May 26, 2023
Yongqi Li, Nan Yang, Liang Wang, Furu Wei, Wenjie Li

Instead of simply matching a query to pre-existing passages, generative retrieval generates identifier strings of passages as the retrieval target. The cost is that an identifier must be distinctive enough to represent a passage. Current approaches use either a numeric ID or a text piece (such as a title or substrings) as the identifier; however, these identifiers cannot cover a passage's content well. We are therefore motivated to propose a new type of identifier, synthetic identifiers, which are generated from the content of a passage and can integrate contextualized information that text pieces lack. Furthermore, we simultaneously consider multiview identifiers, including synthetic identifiers, titles, and substrings. These views complement each other and facilitate the holistic ranking of passages from multiple perspectives. We conduct a series of experiments on three public datasets, and the results indicate that our proposed approach performs best in generative retrieval, demonstrating its effectiveness and robustness.
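Holistic ranking across identifier views can be sketched as a weighted aggregation of per-view scores. The uniform default weighting below is a hypothetical choice for illustration; the paper's actual aggregation may differ.

```python
def multiview_score(scores_by_view, weights=None):
    """Combine one passage's generation scores across identifier views
    (e.g. synthetic identifier, title, substrings) into a single rank
    score. Illustrative sketch; uniform weights are an assumption."""
    views = sorted(scores_by_view)
    if weights is None:
        # Default: weight each available view equally.
        weights = {v: 1.0 / len(views) for v in views}
    return sum(weights[v] * scores_by_view[v] for v in views)
```

A passage that scores well under only one view (say, its title) is thus ranked below a passage supported by several complementary views.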

* ACL 2023 Main Conference 
Incremental Dense Reconstruction from Monocular Video with Guided Sparse Feature Volume Fusion

May 24, 2023
Xingxing Zuo, Nan Yang, Nathaniel Merrill, Binbin Xu, Stefan Leutenegger

Incrementally recovering 3D dense structures from monocular videos is of paramount importance since it enables various robotics and AR applications. Feature volumes have recently been shown to enable efficient and accurate incremental dense reconstruction without the need to first estimate depth, but they cannot achieve as high a resolution as depth-based methods due to the large memory consumption of high-resolution feature volumes. This letter proposes a real-time feature volume-based dense reconstruction method that predicts TSDF (Truncated Signed Distance Function) values from a novel sparsified deep feature volume, which achieves higher resolutions than previous feature volume-based methods and is favorable in large-scale outdoor scenarios where the majority of voxels are empty. An uncertainty-aware multi-view stereo (MVS) network is leveraged to infer initial voxel locations of the physical surface in a sparse feature volume. Then, to refine the recovered 3D geometry, deep features are attentively aggregated from multi-view images at potential surface locations and fused temporally. Besides achieving higher resolutions than before, our method is shown to produce more complete reconstructions with finer detail in many cases. Extensive evaluations on both public and self-collected datasets demonstrate very competitive real-time reconstruction results for our method compared to state-of-the-art reconstruction methods in both indoor and outdoor settings.

* 8 pages, 5 figures, RA-L 2023 
UAV-assisted IoT Monitoring Network: Adaptive Multiuser Access for Low-Latency and High-Reliability Under Bursty Traffic

Apr 25, 2023
Nilupuli Senadhira, Salman Durrani, Sheeraz A. Alvi, Nan Yang, Xiangyun Zhou

In this work, we propose an adaptive system design for an Internet of Things (IoT) monitoring network with latency and reliability requirements, where IoT devices generate time-critical, event-triggered bursty traffic, and an unmanned aerial vehicle (UAV) aggregates and relays sensed data to the base station. Existing transmission schemes based on overall average traffic rates over-utilize network resources when traffic is smooth and suffer from packet collisions when traffic becomes bursty, as occurs during an event of interest. We address these problems by designing an adaptive transmission scheme employing multiuser shared access (MUSA) based grant-free non-orthogonal multiple access, and we use short-packet communication for low latency of the IoT-to-UAV link. Specifically, to accommodate bursty traffic, we design an analytical framework and formulate an optimization problem to maximize performance by determining the optimal number of transmission time slots, subject to stringent reliability and latency constraints. We compare the performance of the proposed scheme with a non-adaptive power-diversity based scheme using a fixed number of time slots. Our results show that the proposed scheme has superior reliability and stability compared to the state-of-the-art scheme at moderate to high average traffic rates, while satisfying the stringent latency requirements.

* Submitted for possible journal publication 
Combining Adversaries with Anti-adversaries in Training

Apr 25, 2023
Xiaoling Zhou, Nan Yang, Ou Wu

Adversarial training is an effective learning technique for improving the robustness of deep neural networks. In this study, the influence of adversarial training on deep learning models in terms of fairness, robustness, and generalization is theoretically investigated under a more general perturbation scope in which different samples can have different perturbation directions (adversarial and anti-adversarial) and varied perturbation bounds. Our theoretical explorations suggest that combining adversaries and anti-adversaries (samples with anti-adversarial perturbations) in training can achieve better fairness between classes and a better tradeoff between robustness and generalization in some typical learning scenarios (e.g., noisy-label learning and imbalanced learning) compared with standard adversarial training. On the basis of these theoretical findings, we present a more general learning objective that combines adversaries and anti-adversaries with varied bounds for each training sample. Meta-learning is utilized to optimize the combination weights. Experiments on benchmark datasets under different learning scenarios verify our theoretical findings and the effectiveness of the proposed methodology.
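The per-sample signed perturbation can be sketched as follows. The FGSM-style sign update is an illustrative stand-in for the paper's general formulation; the key point is that both the direction and the bound vary per sample.

```python
def perturb(x, grad, direction, bound):
    """Perturb one sample's features along (+1, adversarial) or against
    (-1, anti-adversarial) the loss gradient, within a per-sample
    L-infinity budget `bound`. Illustrative sign-based sketch."""
    sign = lambda g: 1 if g > 0 else -1 if g < 0 else 0
    return [xi + direction * bound * sign(g) for xi, g in zip(x, grad)]
```

In the combined objective, easy or clean samples might receive adversarial perturbations while noisy or minority-class samples receive anti-adversarial ones, with the weights chosen by meta-learning.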

* AAAI 2023 
* 8 pages, 5 figures 
Inference with Reference: Lossless Acceleration of Large Language Models

Apr 10, 2023
Nan Yang, Tao Ge, Liang Wang, Binxing Jiao, Daxin Jiang, Linjun Yang, Rangan Majumder, Furu Wei

We propose LLMA, an LLM accelerator that losslessly speeds up Large Language Model (LLM) inference with references. LLMA is motivated by the observation that abundant identical text spans exist between an LLM's decoding result and the reference available in many real-world scenarios (e.g., retrieved documents). LLMA first selects a text span from the reference and copies its tokens to the decoder, then efficiently checks the tokens' appropriateness as the decoding result in parallel within one decoding step. The improved computational parallelism allows LLMA to achieve over 2x speed-up with generation results identical to greedy decoding in many practical scenarios where significant overlap exists between the in-context reference and the output (e.g., search engines and multi-turn conversations).
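The copy-then-verify step can be sketched as follows. Here `generate_next` is a hypothetical stand-in for one greedy LM step; the real accelerator verifies all drafted tokens within a single parallel decoding step rather than one at a time, which is where the speed-up comes from.

```python
def llma_step(generate_next, context, reference, copy_len=4):
    """One LLMA-style decoding step (illustrative sketch): copy a span
    from `reference` that continues the current context, then check each
    copied token against greedy decoding; keep the longest verified
    prefix plus one corrected token on the first mismatch."""
    last = context[-1]
    for i, tok in enumerate(reference):
        if tok == last and i + 1 < len(reference):
            draft = reference[i + 1 : i + 1 + copy_len]
            accepted = []
            for d in draft:
                nxt = generate_next(context + accepted)
                if nxt != d:
                    # Mismatch: keep the model's own token, discard the rest.
                    return accepted + [nxt]
                accepted.append(d)
            return accepted
    # No usable reference span: fall back to one ordinary greedy step.
    return [generate_next(context)]
```

Because every copied token is checked against what greedy decoding would have produced, the output is guaranteed identical to plain greedy decoding, hence "lossless".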

* 9 pages 
FedMAE: Federated Self-Supervised Learning with One-Block Masked Auto-Encoder

Mar 20, 2023
Nan Yang, Xuanyu Chen, Charles Z. Liu, Dong Yuan, Wei Bao, Lizhen Cui

Recent federated learning (FL) methods have started to focus on how to use unlabeled data in clients for training, due to users' privacy concerns, high labeling costs, or lack of expertise. However, current Federated Semi-Supervised/Self-Supervised Learning (FSSL) approaches fail to learn from large-scale images because of the limited computing resources of local clients. In this paper, we introduce a new framework, FedMAE (Federated Masked AutoEncoder), to address the problem of utilizing unlabeled large-scale images for FL. Specifically, FedMAE pre-trains a one-block Masked AutoEncoder (MAE) on large images in lightweight client devices, and then cascades multiple pre-trained one-block MAEs in the server to build a multi-block ViT backbone for downstream tasks. Theoretical analysis and experimental results on image reconstruction and classification show that FedMAE achieves superior performance compared to state-of-the-art FSSL methods.
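The server-side cascading step can be sketched as simple function composition. The blocks below are hypothetical stand-ins for pre-trained one-block MAE encoders; in practice each would be a transformer block with learned weights.

```python
def cascade_blocks(client_blocks):
    """Compose independently pre-trained one-block encoders into a deeper
    backbone (illustrative sketch of the FedMAE server-side step): the
    output of block i feeds block i+1."""
    def backbone(x):
        for block in client_blocks:
            x = block(x)
        return x
    return backbone
```

This is what lets each lightweight client train only a single block while the server still ends up with a multi-block ViT for downstream tasks.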
