
Tuo Zhao

Benefits of Overparameterized Convolutional Residual Networks: Function Approximation under Smoothness Constraint

Jun 09, 2022
Hao Liu, Minshuo Chen, Siawpeng Er, Wenjing Liao, Tong Zhang, Tuo Zhao

Sample Complexity of Nonparametric Off-Policy Evaluation on Low-Dimensional Manifolds using Deep Networks

Jun 06, 2022
Xiang Ji, Minshuo Chen, Mengdi Wang, Tuo Zhao

A Manifold Two-Sample Test Study: Integral Probability Metric with Neural Networks

May 04, 2022
Jie Wang, Minshuo Chen, Tuo Zhao, Wenjing Liao, Yao Xie

MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation

Apr 28, 2022
Simiao Zuo, Qingru Zhang, Chen Liang, Pengcheng He, Tuo Zhao, Weizhu Chen

CAMERO: Consistency Regularized Ensemble of Perturbed Language Models with Weight Sharing

Apr 18, 2022
Chen Liang, Pengcheng He, Yelong Shen, Weizhu Chen, Tuo Zhao

CERES: Pretraining of Graph-Conditioned Transformer for Semi-Structured Session Data

Apr 08, 2022
Rui Feng, Chen Luo, Qingyu Yin, Bing Yin, Tuo Zhao, Chao Zhang

No Parameters Left Behind: Sensitivity Guided Adaptive Learning Rate for Training Large Transformer Models

Feb 14, 2022
Chen Liang, Haoming Jiang, Simiao Zuo, Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen, Tuo Zhao

Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably

Feb 07, 2022
Tianyi Liu, Yan Li, Enlu Zhou, Tuo Zhao