Songtao Lu

Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization

Jan 13, 2024

A F M Saif, Xiaodong Cui, Han Shen, Songtao Lu, Brian Kingsbury, Tianyi Chen

Abstract: In this paper, we present a novel bilevel optimization-based approach to training acoustic models for automatic speech recognition (ASR) tasks, which we term bi-level joint unsupervised and supervised training (BL-JUST). BL-JUST employs a lower-level and an upper-level optimization with an unsupervised loss and a supervised loss, respectively, leveraging recent advances in penalty-based bilevel optimization to solve this challenging ASR problem with affordable complexity and rigorous convergence guarantees. To evaluate BL-JUST, we conducted extensive experiments on the LibriSpeech and TED-LIUM v2 datasets. BL-JUST achieves superior performance over the commonly used strategy of pre-training followed by fine-tuning.

* Accepted at ICASSP 2024

Soft Random Sampling: A Theoretical and Empirical Analysis

Nov 24, 2023

Xiaodong Cui, Ashish Mittal, Songtao Lu, Wei Zhang, George Saon, Brian Kingsbury

Abstract: Soft random sampling (SRS) is a simple yet effective approach for efficient training of large-scale deep neural networks when dealing with massive data. SRS selects a subset uniformly at random with replacement from the full data set in each epoch. In this paper, we conduct a theoretical and empirical analysis of SRS. First, we analyze its sampling dynamics, including data coverage and occupancy. Next, we investigate its convergence with non-convex objective functions and give the convergence rate. Finally, we analyze its generalization performance. We empirically evaluate SRS for image recognition on CIFAR10 and automatic speech recognition on LibriSpeech and an in-house payload dataset to demonstrate its effectiveness. Compared to existing coreset-based data selection methods, SRS offers a better accuracy-efficiency trade-off. On real-world industrial-scale data sets in particular, it is shown to be a powerful training strategy with significant speedup and competitive performance at almost no additional computing cost.
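
The sampling step described in this abstract can be illustrated with a small sketch. It assumes only what the abstract states (a subset drawn uniformly at random with replacement in each epoch); the dataset size, the per-epoch draw count and the function names are hypothetical, and the closed-form coverage is the standard result for independent uniform draws, not a formula quoted from the paper.

```python
import random

def expected_coverage(n: int, total_draws: int) -> float:
    """Expected fraction of n items seen at least once after
    total_draws independent uniform draws with replacement."""
    return 1.0 - (1.0 - 1.0 / n) ** total_draws

def srs_epoch(n: int, m: int, rng: random.Random) -> list[int]:
    """One epoch: draw m indices uniformly at random, with replacement."""
    return [rng.randrange(n) for _ in range(m)]

def empirical_coverage(n: int, m: int, epochs: int, seed: int = 0) -> float:
    """Fraction of the dataset touched at least once over several epochs."""
    rng = random.Random(seed)
    seen: set[int] = set()
    for _ in range(epochs):
        seen.update(srs_epoch(n, m, rng))
    return len(seen) / n

if __name__ == "__main__":
    n, m, epochs = 10_000, 3_000, 5  # illustrative values only
    print("empirical:", empirical_coverage(n, m, epochs))
    print("expected: ", expected_coverage(n, m * epochs))
```

Because each epoch draws a fresh subset, coverage of the full data set grows across epochs even though every single epoch only touches a fraction of it.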

Ontology Revision based on Pre-trained Language Models

Oct 27, 2023

Qiu Ji, Guilin Qi, Yuxin Ye, Jiaye Li, Site Li, Jianjie Ren, Songtao Lu

Abstract: Ontology revision aims to seamlessly incorporate new information into an existing ontology and plays a crucial role in tasks such as ontology evolution, ontology maintenance, and ontology alignment. As with repairing a single ontology, resolving logical incoherence is important in ontology revision, since incoherence is a major potential cause of inconsistency, and reasoning with an inconsistent ontology yields meaningless answers. To deal with this problem, various ontology revision methods have been proposed to define revision operators and design ranking strategies for axioms in an ontology. However, they rarely consider axiom semantics, which provide important information for differentiating axioms. On the other hand, pre-trained models can be used to encode axiom semantics and have been widely applied in many natural language processing tasks and ontology-related ones in recent years. Therefore, in this paper, we define four scoring functions to rank axioms based on a pre-trained model by considering various information from a rebuttal ontology and its corresponding reliable ontology. Based on such a scoring function, we propose an ontology revision algorithm to deal with unsatisfiable concepts at once. If it is hard to resolve all unsatisfiable concepts in a rebuttal ontology together, an adapted revision algorithm is designed to deal with them group by group. We conduct experiments over 19 ontology pairs and compare our algorithms and scoring functions with existing ones. The experiments show that our algorithms achieve promising performance. The adapted revision algorithm improves efficiency considerably, saving up to 96% of the time for some ontology pairs. Some of our scoring functions help a revision algorithm obtain better results in many cases, especially for the challenging pairs.

On the Convergence and Sample Complexity Analysis of Deep Q-Networks with $ε$-Greedy Exploration

Oct 24, 2023

Shuai Zhang, Hongkang Li, Meng Wang, Miao Liu, Pin-Yu Chen, Songtao Lu, Sijia Liu, Keerthiram Murugesan, Subhajit Chaudhury

Abstract: This paper provides a theoretical understanding of the Deep Q-Network (DQN) with $\epsilon$-greedy exploration in deep reinforcement learning. Despite the tremendous empirical achievement of the DQN, its theoretical characterization remains underexplored. First, the exploration strategy is either impractical or ignored in the existing analysis. Second, in contrast to conventional Q-learning algorithms, the DQN employs the target network and experience replay to acquire an unbiased estimation of the mean-square Bellman error (MSBE) utilized in training the Q-network. However, the existing theoretical analysis of DQNs lacks convergence analysis or bypasses the technical challenges by deploying a significantly overparameterized neural network, which is not computationally efficient. This paper provides the first theoretical convergence and sample complexity analysis of the practical setting of DQNs with an $\epsilon$-greedy policy. We prove that an iterative procedure with decaying $\epsilon$ converges to the optimal Q-value function geometrically. Moreover, a higher level of $\epsilon$ values enlarges the region of convergence but slows down the convergence, while the opposite holds for a lower level of $\epsilon$ values. Experiments justify our established theoretical insights on DQNs.
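
As a reminder of the exploration rule this abstract refers to, here is a minimal sketch of $\epsilon$-greedy action selection with a decaying $\epsilon$. The geometric decay schedule, its constants and the function names are assumptions made for illustration; the abstract does not specify the schedule analyzed in the paper.

```python
import random

def epsilon_greedy(q_values: list[float], epsilon: float, rng: random.Random) -> int:
    """With probability epsilon pick a uniformly random action (explore),
    otherwise pick the action with the largest Q-value (exploit)."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])

def decayed_epsilon(step: int, eps_start: float = 1.0,
                    eps_end: float = 0.05, decay: float = 0.995) -> float:
    """Hypothetical geometric decay of epsilon, floored at eps_end."""
    return max(eps_end, eps_start * decay ** step)

if __name__ == "__main__":
    rng = random.Random(0)
    q = [0.1, 0.9, 0.3]  # made-up Q-values for three actions
    for step in (0, 100, 1000):
        eps = decayed_epsilon(step)
        print(step, round(eps, 3), epsilon_greedy(q, eps, rng))
```

Early steps use a large $\epsilon$ and mostly explore; later steps use a small $\epsilon$ and mostly follow the greedy action.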

* NeurIPS 2023

FedLogic: Interpretable Federated Multi-Domain Chain-of-Thought Prompt Selection for Large Language Models

Aug 29, 2023

Pengwei Xing, Songtao Lu, Han Yu

Abstract: Leveraging "chain-of-thought (CoT)" reasoning to elicit rapid and precise responses from large language models (LLMs) is rapidly attracting research interest. A notable challenge here is how to design or select optimal prompts. The process of prompt selection relies on trial and error, involving continuous adjustments and combinations of input prompts by users based on the corresponding new responses generated from LLMs. Furthermore, minimal research has been conducted to explore how LLMs employ the mathematical problem-solving capabilities learned from user interactions to address issues in narrative writing. To improve interpretability and explore the balance principle between generality and personalization under a multi-domain CoT prompt selection scenario, we propose the Federated Logic rule learning approach (FedLogic). We introduce a theoretical formalization and interactive emulation of the multi-domain CoT prompt selection dilemma in the context of federated LLMs. We cast the problem of joint probability modeling as a bilevel program, where the CoT prompt selection intricacy can be likened to a fuzzy score-based rule selection with the LLMs functioning as rule generators. FedLogic solves this problem through variational expectation maximization (V-EM). In addition, we incorporate two KL-divergence constraints within this probabilistic modeling framework to surmount the intricacies of managing extensive search spaces and accomplishing cross-domain personalization of CoTs. To the best of our knowledge, FedLogic is the first interpretable and principled federated multi-domain CoT prompt selection approach for LLMs.

How Can Context Help? Exploring Joint Retrieval of Passage and Personalized Context

Aug 26, 2023

Hui Wan, Hongkang Li, Songtao Lu, Xiaodong Cui, Marina Danilevsky

Abstract: The integration of external personalized context information into document-grounded conversational systems has significant potential business value, but has not been well-studied. Motivated by the concept of personalized context-aware document-grounded conversational systems, we introduce the task of context-aware passage retrieval. We also construct a dataset specifically curated for this purpose. We describe multiple baseline systems to address this task, and propose a novel approach, Personalized Context-Aware Search (PCAS), that effectively harnesses contextual information during passage retrieval. Experimental evaluations conducted on multiple popular dense retrieval systems demonstrate that our proposed approach not only outperforms the baselines in retrieving the most relevant passage but also excels at identifying the pertinent context among all the available contexts. We envision that our contributions will serve as a catalyst for inspiring future research endeavors in this promising direction.

A Generalized Alternating Method for Bilevel Learning under the Polyak-Łojasiewicz Condition

Jun 06, 2023

Quan Xiao, Songtao Lu, Tianyi Chen

Abstract: Bilevel optimization has recently regained interest owing to its applications in emerging machine learning fields such as hyperparameter optimization, meta-learning, and reinforcement learning. Recent results have shown that simple alternating (implicit) gradient-based algorithms can achieve the same convergence rate as single-level gradient descent (GD) for bilevel problems with a strongly convex lower-level objective. However, it remains unclear whether this result can be generalized to bilevel problems beyond this basic setting. In this paper, we propose a Generalized ALternating mEthod for bilevel opTimization (GALET) with a nonconvex lower-level objective that satisfies the Polyak-Łojasiewicz (PL) condition. We first introduce a stationary metric for the considered bilevel problems, which generalizes the existing metric. We then establish that GALET achieves an $\epsilon$-stationary metric for the considered problem within $\tilde{\mathcal{O}}(\epsilon^{-1})$ iterations, which matches the iteration complexity of GD for smooth nonconvex problems.
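
For readers unfamiliar with the condition named in this abstract, the Polyak-Łojasiewicz inequality is usually stated as follows. This is the standard single-function textbook form; the symbols $g$, $\mu$ and $g^*$ are notation chosen here and are not taken from the paper, which may use a bilevel-specific variant. A differentiable function $g$ with minimum value $g^*$ satisfies the PL condition with constant $\mu > 0$ if, for all $y$,

$$\frac{1}{2}\,\|\nabla g(y)\|^2 \;\ge\; \mu\,\bigl(g(y) - g^*\bigr).$$

Every $\mu$-strongly convex function satisfies this inequality, but a function satisfying it need not be convex, which is why the condition covers nonconvex lower-level objectives.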

PRECISION: Decentralized Constrained Min-Max Learning with Low Communication and Sample Complexities

Mar 05, 2023

Zhuqing Liu, Xin Zhang, Songtao Lu, Jia Liu

Abstract: Recently, min-max optimization problems have received increasing attention due to their wide range of applications in machine learning (ML). However, most existing min-max solution techniques are either single-machine or distributed algorithms coordinated by a central server. In this paper, we focus on decentralized min-max optimization for learning with domain constraints, where multiple agents collectively solve a nonconvex-strongly-concave min-max saddle point problem without coordination from any server. Decentralized min-max optimization problems with domain constraints underpin many important ML applications, including multi-agent ML fairness assurance and policy evaluation in multi-agent reinforcement learning. We propose an algorithm called PRECISION (proximal gradient-tracking and stochastic recursive variance reduction) that enjoys a convergence rate of $O(1/T)$, where $T$ is the maximum number of iterations. To further reduce sample complexity, we propose PRECISION$^+$ with an adaptive batch size technique. We show that the fast $O(1/T)$ convergence of PRECISION and PRECISION$^+$ to an $\epsilon$-stationary point implies $O(\epsilon^{-2})$ communication complexity and $O(m\sqrt{n}\epsilon^{-2})$ sample complexity, where $m$ is the number of agents and $n$ is the size of the dataset at each agent. To our knowledge, this is the first work that achieves $O(\epsilon^{-2})$ in both sample and communication complexities in decentralized min-max learning with domain constraints. Our experiments also corroborate the theoretical results.

Joint Edge-Model Sparse Learning is Provably Efficient for Graph Neural Networks

Feb 06, 2023

Shuai Zhang, Meng Wang, Pin-Yu Chen, Sijia Liu, Songtao Lu, Miao Liu

Abstract: Due to the significant computational challenge of training large-scale graph neural networks (GNNs), various sparse learning techniques have been exploited to reduce memory and storage costs. Examples include graph sparsification, which samples a subgraph to reduce the amount of data aggregation, and model sparsification, which prunes the neural network to reduce the number of trainable weights. Despite the empirical successes in reducing the training cost while maintaining the test accuracy, the theoretical generalization analysis of sparse learning for GNNs remains elusive. To the best of our knowledge, this paper provides the first theoretical characterization of joint edge-model sparse learning from the perspective of sample complexity and convergence rate in achieving zero generalization error. It proves analytically that both sampling important nodes and pruning neurons with the lowest magnitude can reduce the sample complexity and improve convergence without compromising the test accuracy. Although the analysis is centered on two-layer GNNs with structural constraints on data, the insights are applicable to more general setups and justified by both synthetic and practical citation datasets.

* The Eleventh International Conference on Learning Representations, 2023

Stochastic Inexact Augmented Lagrangian Method for Nonconvex Expectation Constrained Optimization

Dec 19, 2022

Zichong Li, Pin-Yu Chen, Sijia Liu, Songtao Lu, Yangyang Xu

Abstract: Many real-world problems not only have complicated nonconvex functional constraints but also use a large number of data points. This motivates the design of efficient stochastic methods for finite-sum or expectation-constrained problems. In this paper, we design and analyze stochastic inexact augmented Lagrangian methods (Stoc-iALM) to solve problems involving a nonconvex composite (i.e., smooth + nonsmooth) objective and nonconvex smooth functional constraints. We adopt the standard iALM framework and design a subroutine by using the momentum-based variance-reduced proximal stochastic gradient method (PStorm) and a postprocessing step. Under certain regularity conditions (assumed also in existing works), to reach an $\varepsilon$-KKT point in expectation, we establish an oracle complexity result of $O(\varepsilon^{-5})$, which is better than the best-known $O(\varepsilon^{-6})$ result. Numerical experiments on the fairness-constrained problem and the Neyman-Pearson classification problem with real data demonstrate that our proposed method outperforms an existing method with the previously best-known complexity result.
