Renshan Zhang

CogVLA: Cognition-Aligned Vision-Language-Action Model via Instruction-Driven Routing & Sparsification

Aug 28, 2025

FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers

Jan 27, 2025

Token-level Correlation-guided Compression for Efficient Multimodal Document Understanding

Jul 19, 2024

A Novel Dual Quaternion Based Dynamic Motion Primitives for Acrobatic Flight

Jul 13, 2021