Shijian Lu

Nanyang Technological University

A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models

Nov 19, 2025

Spatial Preference Rewarding for MLLMs Spatial Understanding

Oct 16, 2025

UniMRSeg: Unified Modality-Relax Segmentation via Hierarchical Self-Supervised Compensation

Sep 19, 2025

H$_{2}$OT: Hierarchical Hourglass Tokenizer for Efficient Video Pose Transformers

Sep 08, 2025

PacGDC: Label-Efficient Generalizable Depth Completion with Projection Ambiguity and Consistency

Jul 10, 2025

UniDet-D: A Unified Dynamic Spectral Attention Model for Object Detection under Adverse Weathers

Jun 14, 2025

ToDRE: Visual Token Pruning via Diversity and Task Awareness for Efficient Large Vision-Language Models

May 24, 2025

MTL-UE: Learning to Learn Nothing for Multi-Task Learning

May 08, 2025

Towards Model Resistant to Transferable Adversarial Examples via Trigger Activation

Apr 20, 2025

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

Mar 17, 2025