Nhat Ho

HoRA: Cross-Head Low-Rank Adaptation with Joint Hypernetworks

Oct 05, 2025

DoRAN: Stabilizing Weight-Decomposed Low-Rank Adaptation via Noise Injection and Auxiliary Networks

Oct 05, 2025

On Minimax Estimation of Parameters in Softmax-Contaminated Mixture of Experts

May 24, 2025

Model Selection for Gaussian-gated Gaussian Mixture of Experts Using Dendrograms of Mixing Measures

May 19, 2025

CompeteSMoE -- Statistically Guaranteed Mixture of Experts Training via Competition

May 19, 2025

On DeepSeekMoE: Statistical Benefits of Shared Experts and Normalized Sigmoid Gating

May 16, 2025

Convergence Rates for Softmax Gating Mixture of Experts

Mar 05, 2025

MGPATH: Vision-Language Model with Multi-Granular Prompt Learning for Few-Shot WSI Classification

Feb 11, 2025

On Zero-Initialized Attention: Optimal Prompt and Gating Factor Estimation

Feb 05, 2025

RepLoRA: Reparameterizing Low-Rank Adaptation via the Perspective of Mixture of Experts

Feb 05, 2025