Jungang Li

AndroTMem: From Interaction Trajectories to Anchored Memory in Long-Horizon GUI Agents

Mar 19, 2026, via arXiv

Temporal Gains, Spatial Costs: Revisiting Video Fine-Tuning in Multimodal Large Language Models

Mar 18, 2026, via arXiv

Unlocking Multimodal Document Intelligence: From Current Triumphs to Future Frontiers of Visual Document Retrieval

Feb 23, 2026, via arXiv

BPDQ: Bit-Plane Decomposition Quantization on a Variable Grid for Large Language Models

Feb 04, 2026, via arXiv

OmniSIFT: Modality-Asymmetric Token Compression for Efficient Omni-modal Large Language Models

Feb 04, 2026, via arXiv

A Visual Semantic Adaptive Watermark grounded by Prefix-Tuning for Large Vision-Language Model

Jan 12, 2026, via arXiv

JavisGPT: A Unified Multi-modal LLM for Sounding-Video Comprehension and Generation

Dec 28, 2025, via arXiv

Mind the Third Eye! Benchmarking Privacy Awareness in MLLM-powered Smartphone Agents

Aug 27, 2025, via arXiv

VSI: Visual Subtitle Integration for Keyframe Selection to enhance Long Video Understanding

Aug 09, 2025, via arXiv

Unveiling Instruction-Specific Neurons & Experts: An Analytical Framework for LLM's Instruction-Following Capabilities

May 27, 2025, via arXiv