
Xiaodong Cui

Training Nonlinear Transformers for Chain-of-Thought Inference: A Theoretical Generalization Analysis

Oct 03, 2024
Via arXiv

Training Nonlinear Transformers for Efficient In-Context Learning: A Theoretical Learning and Generalization Analysis

Feb 23, 2024
Via arXiv

Joint Unsupervised and Supervised Training for Automatic Speech Recognition via Bilevel Optimization

Jan 13, 2024
Via arXiv

Soft Random Sampling: A Theoretical and Empirical Analysis

Nov 24, 2023
Via arXiv

How Can Context Help? Exploring Joint Retrieval of Passage and Personalized Context

Aug 26, 2023
Via arXiv

Diagonal State Space Augmented Transformers for Speech Recognition

Feb 27, 2023
Via arXiv

Accelerating Inference and Language Model Fusion of Recurrent Neural Network Transducers via End-to-End 4-bit Quantization

Jun 16, 2022
Via arXiv

Improving Generalization of Deep Neural Network Acoustic Models with Length Perturbation and N-best Based Label Smoothing

Mar 29, 2022
Via arXiv

Loss Landscape Dependent Self-Adjusting Learning Rates in Decentralized Stochastic Gradient Descent

Dec 02, 2021
Via arXiv

Asynchronous Decentralized Distributed Training of Acoustic Models

Oct 21, 2021
Via arXiv