Junyan Lin

Think-as-You-See: Streaming Chain-of-Thought Reasoning for Large Vision-Language Models

Mar 03, 2026
Via arXiv

UTPTrack: Towards Simple and Unified Token Pruning for Visual Tracking

Feb 27, 2026
Via arXiv

Compression Tells Intelligence: Visual Coding, Visual Token Technology, and the Unification

Jan 28, 2026
Via arXiv

Speak While Watching: Unleashing TRUE Real-Time Video Understanding Capability of Multimodal Large Language Models

Jan 11, 2026
Via arXiv

Rethinking Visual Layer Selection in Multimodal LLMs

Apr 30, 2025
Via arXiv

Dynamic Cross-Modal Feature Interaction Network for Hyperspectral and LiDAR Data Classification

Mar 10, 2025
Via arXiv

Multi-Layer Visual Feature Fusion in Multimodal LLMs: Methods, Analysis, and Best Practices

Mar 08, 2025
Via arXiv

Semantics Disentanglement and Composition for Versatile Codec toward both Human-eye Perception and Machine Vision Task

Dec 24, 2024
Via arXiv

To Preserve or To Compress: An In-Depth Study of Connector Selection in Multimodal Large Language Models

Oct 09, 2024
Via arXiv

Tell Codec What Worth Compressing: Semantically Disentangled Image Coding for Machine with LMMs

Aug 16, 2024
Via arXiv