Lewei Lu

SenseNova-MARS: Empowering Multimodal Agentic Reasoning and Search via Reinforcement Learning

Dec 30, 2025

Towards Fine-Grained Recognition with Large Visual Language Models: Benchmark and Optimization Strategies

Dec 11, 2025

Scaling Spatial Intelligence with Multimodal Foundation Models

Nov 17, 2025

Enhancing the Outcome Reward-based RL Training of MLLMs with Self-Consistency Sampling

Nov 13, 2025

From Pixels to Words -- Towards Native Vision-Language Primitives at Scale

Oct 16, 2025

Spatial Preference Rewarding for MLLMs Spatial Understanding

Oct 16, 2025

CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving

Oct 09, 2025

ELV-Halluc: Benchmarking Semantic Aggregation Hallucinations in Long Video Understanding

Aug 29, 2025

Has GPT-5 Achieved Spatial Intelligence? An Empirical Study

Aug 18, 2025

GUI-Reflection: Empowering Multimodal GUI Models with Self-Reflection Behavior

Jun 09, 2025