Lin Ma

RoboTron-Sim: Improving Real-World Driving via Simulated Hard-Case

Aug 06, 2025
Via arXiv

DocTron-Formula: Generalized Formula Recognition in Complex and Structured Scenarios

Aug 01, 2025
Via arXiv

Orthogonal Projection Subspace to Aggregate Online Prior-knowledge for Continual Test-time Adaptation

Jun 23, 2025
Via arXiv

ImmerseGen: Agent-Guided Immersive World Generation with Alpha-Textured Proxies

Jun 18, 2025
Via arXiv

Metis-RISE: RL Incentivizes and SFT Enhances Multimodal Reasoning Model Learning

Jun 16, 2025
Via arXiv

M4V: Multi-Modal Mamba for Text-to-Video Generation

Jun 12, 2025
Via arXiv

DisTime: Distribution-based Time Representation for Video Large Language Models

May 30, 2025
Via arXiv

TopoDiT-3D: Topology-Aware Diffusion Transformer with Bottleneck Structure for 3D Point Cloud Generation

May 14, 2025
Via arXiv

Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput

May 14, 2025
Via arXiv

ScaleTrack: Scaling and back-tracking Automated GUI Agents

May 01, 2025
Via arXiv