Hao Tang

Learning Compact Vision Tokens for Efficient Large Multimodal Models

Jun 08, 2025
Via arXiv

Token Transforming: A Unified and Training-Free Token Compression Framework for Vision Transformer Acceleration

Jun 06, 2025
Via arXiv

Effective Context in Neural Speech Models

May 28, 2025
Via arXiv

Enabling Flexible Multi-LLM Integration for Scalable Knowledge Aggregation

May 28, 2025
Via arXiv

SpikeStereoNet: A Brain-Inspired Framework for Stereo Depth Estimation from Spike Streams

May 26, 2025
Via arXiv

Token Reduction Should Go Beyond Efficiency in Generative Models -- From Vision, Language to Multimodality

May 23, 2025
Via arXiv

SAMba-UNet: Synergizing SAM2 and Mamba in UNet with Heterogeneous Aggregation for Cardiac MRI Segmentation

May 22, 2025
Via arXiv

Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models

May 22, 2025
Via arXiv

Programmatic Video Prediction Using Large Language Models

May 20, 2025
Via arXiv

Replace in Translation: Boost Concept Alignment in Counterfactual Text-to-Image

May 20, 2025
Via arXiv