Picture for Si Liu

Si Liu

Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs

Add code
Mar 26, 2025
Figure 1 for Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
Figure 2 for Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
Figure 3 for Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
Figure 4 for Instruction-Oriented Preference Alignment for Enhancing Multi-Modal Comprehension Capability of MLLMs
Viaarxiv icon

Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning

Add code
Mar 14, 2025
Figure 1 for Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning
Figure 2 for Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning
Figure 3 for Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning
Figure 4 for Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning
Viaarxiv icon

FlexDrive: Toward Trajectory Flexibility in Driving Scene Reconstruction and Rendering

Add code
Feb 28, 2025
Viaarxiv icon

MIH-TCCT: Mitigating Inconsistent Hallucinations in LLMs via Event-Driven Text-Code Cyclic Training

Add code
Feb 13, 2025
Viaarxiv icon

LLaVA-ST: A Multimodal Large Language Model for Fine-Grained Spatial-Temporal Understanding

Add code
Jan 14, 2025
Viaarxiv icon

GaussianPainter: Painting Point Cloud into 3D Gaussians with Normal Guidance

Add code
Dec 23, 2024
Figure 1 for GaussianPainter: Painting Point Cloud into 3D Gaussians with Normal Guidance
Figure 2 for GaussianPainter: Painting Point Cloud into 3D Gaussians with Normal Guidance
Figure 3 for GaussianPainter: Painting Point Cloud into 3D Gaussians with Normal Guidance
Figure 4 for GaussianPainter: Painting Point Cloud into 3D Gaussians with Normal Guidance
Viaarxiv icon

Anchor3DLane++: 3D Lane Detection via Sample-Adaptive Sparse 3D Anchor Regression

Add code
Dec 22, 2024
Viaarxiv icon

Multimodal Music Generation with Explicit Bridges and Retrieval Augmentation

Add code
Dec 12, 2024
Viaarxiv icon

TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation

Add code
Nov 25, 2024
Figure 1 for TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation
Figure 2 for TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation
Figure 3 for TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation
Figure 4 for TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation
Viaarxiv icon

VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection

Add code
Nov 22, 2024
Figure 1 for VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
Figure 2 for VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
Figure 3 for VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
Figure 4 for VideoEspresso: A Large-Scale Chain-of-Thought Dataset for Fine-Grained Video Reasoning via Core Frame Selection
Viaarxiv icon