Tuo Zhao

PLATON: Pruning Large Transformer Models with Upper Confidence Bound of Weight Importance

Jun 25, 2022
Via arXiv

Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint

Jun 09, 2022
Via arXiv

Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks

Jun 06, 2022
Via arXiv

A Manifold Two-Sample Test Study: Integral Probability Metric with Neural Networks

May 04, 2022
Via arXiv

MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation

Apr 28, 2022
Via arXiv

CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing

Apr 18, 2022
Via arXiv

CERES: Pretraining of Graph-Conditioned Transformer for Semi-Structured Session Data

Apr 08, 2022
Via arXiv

No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models

Feb 14, 2022
Via arXiv

Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably

Feb 07, 2022
Via arXiv

Homotopic Policy Mirror Descent: Policy Convergence, Implicit Regularization, and Improved Sample Complexity

Jan 30, 2022
Via arXiv