Sicong Leng

RynnVLA-001: Using Human Demonstrations to Improve Robot Manipulation

Sep 18, 2025
Via arXiv

VL-Cogito: Progressive Curriculum Reinforcement Learning for Advanced Multimodal Reasoning

Jul 30, 2025
Via arXiv

Two Is Better Than One: Rotations Scale LoRAs

May 29, 2025
Via arXiv

Advancing Expert Specialization for Better MoE

May 28, 2025
Via arXiv

Refining Positive and Toxic Samples for Dual Safety Self-Alignment of LLMs with Minimal Human Interventions

Feb 08, 2025
Via arXiv

BenchX: A Unified Benchmark Framework for Medical Vision-Language Pretraining on Chest X-Rays

Oct 29, 2024
Via arXiv

Breaking the Memory Barrier: Near Infinite Batch Size Scaling for Contrastive Loss

Oct 22, 2024
Via arXiv

The Curse of Multi-Modalities: Evaluating Hallucinations of Large Multimodal Models across Language, Visual, and Audio

Oct 16, 2024
Via arXiv

AGLA: Mitigating Object Hallucinations in Large Vision-Language Models with Assembly of Global and Local Attention

Jun 18, 2024
Via arXiv

VideoLLaMA 2: Advancing Spatial-Temporal Modeling and Audio Understanding in Video-LLMs

Jun 11, 2024
Via arXiv