Andi Han

Scalable Parameter and Memory Efficient Pretraining for LLM: Recent Algorithmic Advances and Benchmarking

May 28, 2025

On the Role of Label Noise in the Feature Learning Process

May 25, 2025

Efficient Optimization with Orthogonality Constraint: a Randomized Riemannian Submanifold Method

May 18, 2025

Can Diffusion Models Learn Hidden Inter-Feature Rules Behind Images?

Feb 07, 2025

On the Feature Learning in Diffusion Models

Dec 02, 2024

On the Comparison between Multi-modal and Single-modal Contrastive Learning

Nov 05, 2024

Provably Transformers Harness Multi-Concept Word Semantics for Efficient In-Context Learning

Nov 04, 2024

When Graph Neural Networks Meet Dynamic Mode Decomposition

Oct 08, 2024

Diffusing to the Top: Boost Graph Neural Networks with Minimal Hyperparameter Tuning

Oct 08, 2024

On the Optimization and Generalization of Two-layer Transformers with Sign Gradient Descent

Oct 07, 2024