Fan Yu

NVIDIA Corporation

HierarchicalKV: A GPU Hash Table with Cache Semantics for Continuous Online Embedding Storage

Mar 17, 2026
Via arXiv

Joint Optimization of Storage and Loading for High-Performance 3D Point Cloud Data Processing

Mar 16, 2026
Via arXiv

NCCL EP: Towards a Unified Expert Parallel Communication API for NCCL

Mar 13, 2026
Via arXiv

Thinking Traps in Long Chain-of-Thought: A Measurable Study and Trap-Aware Adaptive Restart

Jan 17, 2026
Via arXiv

JoyVoice: Long-Context Conditioning for Anthropomorphic Multi-Speaker Conversational Synthesis

Dec 22, 2025
Via arXiv

SlotPi: Physics-informed Object-centric Reasoning Models

Jun 12, 2025
Via arXiv

CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training

May 23, 2025
Via arXiv

EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting

Apr 22, 2025
Via arXiv

Accurate Expert Predictions in MoE Inference via Cross-Layer Gate

Feb 17, 2025
Via arXiv

Klotski: Efficient Mixture-of-Expert Inference via Expert-Aware Multi-Batch Pipeline

Feb 09, 2025
Via arXiv