Jinfa Huang

SpecEyes: Accelerating Agentic Multimodal LLMs via Speculative Perception and Planning

Mar 24, 2026

SocialOmni: Benchmarking Audio-Visual Social Interactivity in Omni Models

Mar 17, 2026

A Survey on Latent Reasoning

Jul 08, 2025

LeanPO: Lean Preference Optimization for Likelihood Alignment in Video-LLMs

Jun 05, 2025

OpenS2V-Nexus: A Detailed Benchmark and Million-Scale Dataset for Subject-to-Video Generation

May 28, 2025

TACO: Enhancing Multimodal In-context Learning via Task Mapping-Guided Sequence Configuration

May 21, 2025

QuoTA: Query-oriented Token Assignment via CoT Query Decouple for Long Video Comprehension

Mar 11, 2025

Identity-Preserving Text-to-Video Generation by Frequency Decomposition

Nov 26, 2024

Video-RAG: Visually-aligned Retrieval-Augmented Long Video Comprehension

Nov 20, 2024

Autoregressive Models in Vision: A Survey

Nov 08, 2024