Yan Bai

NVIDIA

MMVIAD: Multi-view Multi-task Video Understanding for Industrial Anomaly Detection

May 11, 2026

Beyond the Last Layer: Multi-Layer Representation Fusion for Visual Tokenization

May 11, 2026

Unveiling Fine-Grained Visual Traces: Evaluating Multimodal Interleaved Reasoning Chains in Multimodal STEM Tasks

Apr 21, 2026

Learning to Edit Knowledge via Instruction-based Chain-of-Thought Prompting

Apr 07, 2026

LongCat-Next: Lexicalizing Modalities as Discrete Tokens

Mar 29, 2026

Diagnosing and Repairing Unsafe Channels in Vision-Language Models via Causal Discovery and Dual-Modal Safety Subspace Projection

Mar 28, 2026

VTC-Bench: Evaluating Agentic Multimodal Models via Compositional Visual Tool Chaining

Mar 16, 2026

Scalable Training of Mixture-of-Experts Models with Megatron Core

Mar 10, 2026

GLRD: Global-Local Collaborative Reason and Debate with PSL for 3D Open-Vocabulary Detection

Mar 26, 2025

Training Video Foundation Models with NVIDIA NeMo

Mar 17, 2025