Tianyi Chen

School of Civil and Environmental Engineering, Nanyang Technological University, Singapore

Analog In-memory Training on General Non-ideal Resistive Elements: The Impact of Response Functions

Feb 10, 2025

Zhaoxian Wu, Quan Xiao, Tayfun Gokmen, Omobayode Fagbohungbe, Tianyi Chen

Abstract: As the economic and environmental costs of training and deploying large vision or language models increase dramatically, analog in-memory computing (AIMC) emerges as a promising energy-efficient solution. However, the training perspective, especially its training dynamics, is underexplored. In AIMC hardware, the trainable weights are represented by the conductance of resistive elements and updated using consecutive electrical pulses. Among all the physical properties of resistive elements, the response to the pulses directly affects the training dynamics. This paper first provides a theoretical foundation for gradient-based training on AIMC hardware and studies the impact of response functions. We demonstrate that noisy updates and asymmetric response functions negatively impact Analog SGD by imposing an implicit penalty term on the objective. To overcome the issue, Tiki-Taka, a residual learning algorithm, converges exactly to a critical point by optimizing a main array and a residual array in a bilevel manner. The conclusion is supported by simulations validating our theoretical insights.
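The implicit-penalty effect described in this abstract can be illustrated with a small simulation. The sketch below is not the paper's setup: it assumes a scalar quadratic objective and a hypothetical "soft-bounds" response in which positive pulses shrink as the weight approaches `+tau` and negative pulses shrink as it approaches `-tau`. All names (`soft_bounds_step`, `run_sgd`, `tau`, `w_star`) are illustrative.

```python
import random


def soft_bounds_step(w, delta, tau=1.0):
    """Apply a desired increment `delta` through an asymmetric response.

    Assumed device model: positive pulses are scaled by (1 - w/tau),
    negative pulses by (1 + w/tau), so the weight saturates at +/- tau.
    """
    response = (1.0 - w / tau) if delta > 0 else (1.0 + w / tau)
    return w + delta * response


def run_sgd(analog, w_star=0.5, lr=0.05, noise_std=1.0, steps=20000, seed=0):
    """Noisy SGD on f(w) = 0.5 * (w - w_star)^2; returns the late-iterate mean."""
    rng = random.Random(seed)
    w, tail = 0.0, []
    for k in range(steps):
        grad = (w - w_star) + rng.gauss(0.0, noise_std)  # noisy gradient
        delta = -lr * grad                               # desired increment
        w = soft_bounds_step(w, delta) if analog else w + delta
        if k >= steps // 2:                              # average second half only
            tail.append(w)
    return sum(tail) / len(tail)


digital_mean = run_sgd(analog=False)  # ideal, symmetric update
analog_mean = run_sgd(analog=True)    # asymmetric, device-like update
print(f"digital: {digital_mean:.3f}  analog: {analog_mean:.3f}")
```

Under this assumed model the applied update equals `delta - abs(delta) * w / tau`, so gradient noise produces a persistent pull toward the symmetric point `w = 0`. The analog run therefore settles noticeably short of `w_star`, while the ideal update averages close to it.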

Bilevel Joint Unsupervised and Supervised Training for Automatic Speech Recognition

Dec 11, 2024

Xiaodong Cui, A F M Saif, Songtao Lu, Lisha Chen, Tianyi Chen, Brian Kingsbury, George Saon

Abstract: In this paper, we propose a bilevel joint unsupervised and supervised training (BL-JUST) framework for automatic speech recognition. Compared to the conventional pre-training and fine-tuning strategy, which is a disconnected two-stage process, BL-JUST tries to optimize an acoustic model such that it simultaneously minimizes both the unsupervised and supervised loss functions. Because BL-JUST seeks matched local optima of both loss functions, acoustic representations learned by the acoustic model strike a good balance between being generic and task-specific. We solve the BL-JUST problem using penalty-based bilevel gradient descent and evaluate the trained deep neural network acoustic models on various datasets with a variety of architectures and loss functions. We show that BL-JUST can outperform the widely used pre-training and fine-tuning strategy and some other popular semi-supervised techniques.
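To make "penalty-based bilevel gradient descent" concrete, here is a minimal sketch on a toy problem. It is not BL-JUST itself. The two quadratics stand in for the supervised (upper-level) and unsupervised (lower-level) losses, and the penalty schedule (`gamma_max`, `ramp`) and step-size rule are assumptions chosen for this toy.

```python
def upper_grad(x):
    """Gradient of the stand-in supervised loss f(x) = 0.5*((x0-2)^2 + x1^2)."""
    return [x[0] - 2.0, x[1]]


def lower_value(x):
    """Stand-in unsupervised loss g(x) = 0.5*(x0 + x1 - 1)^2, minimum value 0."""
    return 0.5 * (x[0] + x[1] - 1.0) ** 2


def lower_grad(x):
    """Gradient of g; every point on the line x0 + x1 = 1 is a minimizer."""
    r = x[0] + x[1] - 1.0
    return [r, r]


def penalty_bilevel_gd(steps=2000, gamma_max=50.0, ramp=500):
    """Gradient descent on f(x) + gamma * g(x) with a growing penalty weight."""
    x = [0.0, 0.0]
    for k in range(steps):
        gamma = gamma_max * min(1.0, (k + 1) / ramp)  # ramp the penalty up
        lr = 1.0 / (1.0 + 2.0 * gamma)                # 1/L for this quadratic
        gf, gg = upper_grad(x), lower_grad(x)
        x = [xi - lr * (a + gamma * b) for xi, a, b in zip(x, gf, gg)]
    return x


x_final = penalty_bilevel_gd()
print(x_final, lower_value(x_final))
```

In this toy, the lower-level minimizers form the line `x0 + x1 = 1`, and the best upper-level point on that line is `(1.5, -0.5)`. As the penalty weight grows, the iterate approaches that point while the lower-level loss goes to zero, so a single loop serves both objectives instead of two disconnected stages.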

* Accepted by IEEE/ACM Transactions on Audio, Speech and Language Processing

FERERO: A Flexible Framework for Preference-Guided Multi-Objective Learning

Dec 02, 2024

Lisha Chen, AFM Saif, Yanning Shen, Tianyi Chen

Abstract: Finding specific preference-guided Pareto solutions that represent different trade-offs among multiple objectives is critical yet challenging in multi-objective problems. Existing methods are restrictive in preference definitions and/or their theoretical guarantees. In this work, we introduce a Flexible framEwork for pREfeRence-guided multi-Objective learning (FERERO) by casting it as a constrained vector optimization problem. Specifically, two types of preferences are incorporated into this formulation -- the relative preference defined by the partial ordering induced by a polyhedral cone, and the absolute preference defined by constraints that are linear functions of the objectives. To solve this problem, convergent algorithms are developed with both single-loop and stochastic variants. Notably, to our knowledge, this is the first single-loop primal algorithm for constrained vector optimization. The proposed algorithms adaptively adjust to both constraint and objective values, eliminating the need to solve different subproblems at different stages of constraint satisfaction. Experiments on multiple benchmarks demonstrate that the proposed method is very competitive in finding preference-guided optimal solutions. Code is available at https://github.com/lisha-chen/FERERO/.
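The "partial ordering induced by a polyhedral cone" can be made concrete with a short sketch. It assumes the standard convention from vector optimization, where the cone is `{d : A d >= 0}` and `y1` precedes `y2` when `y2 - y1` lies in the cone. The matrix `A` and the helper names are illustrative and are not taken from the FERERO code base.

```python
def in_cone(A, d, tol=1e-12):
    """Return True if d lies in the polyhedral cone {d : A d >= 0}."""
    return all(sum(a * v for a, v in zip(row, d)) >= -tol for row in A)


def cone_leq(A, y1, y2):
    """Return True if y1 precedes y2 in the ordering induced by the cone."""
    diff = [b - a for a, b in zip(y1, y2)]
    return in_cone(A, diff)


# The identity matrix gives the nonnegative orthant, i.e. the usual Pareto order.
pareto = [[1.0, 0.0], [0.0, 1.0]]
# A wider cone (assumed for illustration) tolerates small losses in one
# objective in exchange for larger gains in the other.
wider = [[2.0, 1.0], [1.0, 2.0]]

y1, y2 = [1.0, 3.0], [2.0, 2.6]
print(cone_leq(pareto, y1, y2))  # incomparable under the Pareto order
print(cone_leq(wider, y1, y2))   # comparable under the wider cone
```

Choosing a different `A` changes which objective vectors are comparable. That is one way a relative preference among objectives can be encoded before any absolute constraints are added.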

Primal-Dual Spectral Representation for Off-policy Evaluation

Oct 23, 2024

Yang Hu, Tianyi Chen, Na Li, Kai Wang, Bo Dai

Abstract: Off-policy evaluation (OPE) is one of the most fundamental problems in reinforcement learning (RL): estimating the expected long-term payoff of a given target policy with only experiences from another behavior policy that is potentially unknown. The distribution correction estimation (DICE) family of estimators has advanced the state of the art in OPE by breaking the curse of horizon. However, the major bottleneck of applying DICE estimators lies in the difficulty of solving the saddle-point optimization involved, especially with neural network implementations. In this paper, we tackle this challenge by establishing a linear representation of the value function and the stationary distribution correction ratio, i.e., the primal and dual variables in the DICE framework, using the spectral decomposition of the transition operator. Such a primal-dual representation not only bypasses the non-convex non-concave optimization in vanilla DICE, thereby enabling a computationally efficient algorithm, but also paves the way for more efficient utilization of historical data. We highlight that our algorithm, SpectralDICE, is the first to leverage a linear representation of primal-dual variables that is both computation- and sample-efficient; its performance is supported by a rigorous theoretical sample complexity guarantee and a thorough empirical evaluation on various benchmarks.

* 29 pages, 5 figures

Mitigating Forgetting in LLM Supervised Fine-Tuning and Preference Learning

Oct 20, 2024

Heshan Fernando, Han Shen, Parikshit Ram, Yi Zhou, Horst Samulowitz, Nathalie Baracaldo, Tianyi Chen

Abstract: Post-training of pre-trained LLMs, which typically consists of the supervised fine-tuning (SFT) stage and the preference learning (RLHF or DPO) stage, is crucial to effective and safe LLM applications. The widely adopted approach in post-training popular open-source LLMs is to sequentially perform SFT and RLHF/DPO. However, sequential training is sub-optimal in terms of the SFT and RLHF/DPO trade-off: the LLM gradually forgets about the first stage's training when undergoing the second stage's training. We theoretically prove the sub-optimality of sequential post-training. Furthermore, we propose a practical joint post-training framework that has theoretical convergence guarantees and empirically outperforms the sequential post-training framework, while having similar computational cost. Our code is available at https://github.com/heshandevaka/XRIGHT.

Pipeline Gradient-based Model Training on Analog In-memory Accelerators

Oct 19, 2024

Zhaoxian Wu, Quan Xiao, Tayfun Gokmen, Hsinyu Tsai, Kaoutar El Maghraoui, Tianyi Chen

Abstract: Aiming to accelerate the training of large deep neural networks (DNNs) in an energy-efficient way, the analog in-memory computing (AIMC) accelerator emerges as a solution with immense potential. In AIMC accelerators, trainable weights are kept in memory without the need to move from memory to processors during training, which removes much of the data-movement overhead. However, although the in-memory feature enables efficient computation, it also constrains the use of data parallelism, since copying weights from one AIMC accelerator to another is expensive. To enable parallel training using AIMC, we propose synchronous and asynchronous pipeline parallelism for AIMC accelerators, inspired by pipelining in the digital domain. This paper provides a theoretical convergence guarantee for both synchronous and asynchronous pipelines in terms of both sampling and clock cycle complexity, which is non-trivial since the physical characteristics of AIMC accelerators lead to analog updates that suffer from asymmetric bias. Simulations of training DNNs on real datasets verify the efficiency of pipeline training.

Generalizable Humanoid Manipulation with Improved 3D Diffusion Policies

Oct 14, 2024

Yanjie Ze, Zixuan Chen, Wenhao Wang, Tianyi Chen, Xialin He, Ying Yuan, Xue Bin Peng, Jiajun Wu

Abstract: Humanoid robots capable of autonomous operation in diverse environments have long been a goal for roboticists. However, autonomous manipulation by humanoid robots has largely been restricted to one specific scene, primarily due to the difficulty of acquiring generalizable skills. Recent advances in 3D visuomotor policies, such as the 3D Diffusion Policy (DP3), have shown promise in extending these capabilities to wilder environments. However, 3D visuomotor policies often rely on camera calibration and point-cloud segmentation, which present challenges for deployment on mobile robots like humanoids. In this work, we introduce the Improved 3D Diffusion Policy (iDP3), a novel 3D visuomotor policy that eliminates these constraints by leveraging egocentric 3D visual representations. We demonstrate that iDP3 enables a full-sized humanoid robot to autonomously perform skills in diverse real-world scenarios, using only data collected in the lab. Videos are available at: https://humanoid-manipulation.github.io

* Project website: https://humanoid-manipulation.github.io

SEAL: Safety-enhanced Aligned LLM Fine-tuning via Bilevel Data Selection

Oct 09, 2024

Han Shen, Pin-Yu Chen, Payel Das, Tianyi Chen

Abstract: Fine-tuning on task-specific data to boost downstream performance is a crucial step for leveraging Large Language Models (LLMs). However, previous studies have demonstrated that fine-tuning the models on several adversarial samples or even benign data can greatly compromise the model's pre-equipped alignment and safety capabilities. In this work, we propose SEAL, a novel framework to enhance safety in LLM fine-tuning. SEAL learns a data ranker based on bilevel optimization to up-rank the safe and high-quality fine-tuning data and down-rank the unsafe or low-quality data. Models trained with SEAL demonstrate superior quality over multiple baselines, with 8.5% and 9.7% win rate increases compared to random selection on the Llama-3-8b-Instruct and Merlinite-7b models, respectively. Our code is available on GitHub at https://github.com/hanshen95/SEAL.

Zero-Shot Text-to-Speech from Continuous Text Streams

Oct 01, 2024

Trung Dang, David Aponte, Dung Tran, Tianyi Chen, Kazuhito Koishida

Abstract: Existing zero-shot text-to-speech (TTS) systems are typically designed to process complete sentences and are constrained by the maximum duration for which they have been trained. However, in many streaming applications, texts arrive continuously in short chunks, necessitating instant responses from the system. We identify the essential capabilities required for chunk-level streaming and introduce LiveSpeech 2, a stream-aware model that supports infinitely long speech generation, text-audio stream synchronization, and seamless transitions between short speech chunks. To achieve these, we propose (1) adopting Mamba, a class of sequence models distinguished by linear-time decoding, which is augmented by cross-attention mechanisms for conditioning, (2) utilizing rotary positional embeddings in the computation of cross-attention, enabling the model to process an infinite text stream by sliding a window, and (3) decoding with semantic guidance, a technique that aligns speech with the transcript during inference with minimal overhead. Experimental results demonstrate that our models are competitive with state-of-the-art language model-based zero-shot TTS models, while also providing flexibility to support a wide range of streaming scenarios.

Leveraging Large Language Models for Wireless Symbol Detection via In-Context Learning

Aug 28, 2024

Momin Abbas, Koushik Kar, Tianyi Chen

Abstract: Deep neural networks (DNNs) have made significant strides in tackling challenging tasks in wireless systems, especially when an accurate wireless model is not available. However, when available data is limited, traditional DNNs often yield subpar results due to underfitting. At the same time, large language models (LLMs), exemplified by GPT-3, have showcased remarkable capabilities across a broad range of natural language processing tasks. But whether and how LLMs can benefit challenging non-language tasks in wireless systems is unexplored. In this work, we propose to leverage the in-context learning (ICL) ability (a.k.a. prompting) of LLMs to solve wireless tasks in the low-data regime without any training or fine-tuning, unlike DNNs, which require training. We further demonstrate that the performance of LLMs varies significantly when employed with different prompt templates. To solve this issue, we employ the latest LLM calibration methods. Our results reveal that using LLMs via ICL methods generally outperforms traditional DNNs on the symbol demodulation task and yields highly confident predictions when coupled with calibration techniques.

* Accepted at IEEE GLOBECOM 2024
