Shuangfei Zhai

Stabilizing Transformer Training by Preventing Attention Entropy Collapse

Mar 11, 2023

TRACT: Denoising Diffusion Models with Transitive Closure Time-Distillation

Mar 07, 2023

f-DM: A Multi-stage Diffusion Model via Progressive Signal Transformation

Oct 10, 2022

GAUDI: A Neural Architect for Immersive 3D Scene Generation

Jul 27, 2022

Position Prediction as an Effective Pretraining Strategy

Jul 15, 2022

The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon

Jun 13, 2022

Learning Representation from Neural Fisher Kernel with Low-rank Approximation

Feb 04, 2022

Robust Robotic Control from Pixels using Contrastive Recurrent State-Space Models

Dec 02, 2021

Regularized Training of Nearest Neighbor Language Models

Sep 16, 2021

Implicit Acceleration and Feature Learning in Infinitely Wide Neural Networks with Bottlenecks

Jul 02, 2021