
Liqiang Nie

LION-FS: Fast & Slow Video-Language Thinker as Online Video Assistant

Mar 05, 2025

HAIC: Improving Human Action Understanding and Generation with Better Captions for Multi-modal Large Language Models

Feb 28, 2025

3D-AffordanceLLM: Harnessing Large Language Models for Open-Vocabulary Affordance Detection in 3D Worlds

Feb 27, 2025

Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy

Feb 27, 2025

A Comprehensive Survey on Composed Image Retrieval

Feb 19, 2025

Benchmarking Post-Training Quantization in LLMs: Comprehensive Taxonomy, Unified Evaluation, and Comparative Analysis

Feb 18, 2025

FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers

Jan 27, 2025

Dynamic Multimodal Fusion via Meta-Learning Towards Micro-Video Recommendation

Jan 13, 2025

ReTaKe: Reducing Temporal and Knowledge Redundancy for Long Video Understanding

Dec 29, 2024

Technical Report for ICML 2024 TiFA Workshop MLLM Attack Challenge: Suffix Injection and Projected Gradient Descent Can Easily Fool An MLLM

Dec 20, 2024