Huijia Zhu

VideoVeritas: AI-Generated Video Detection via Perception Pretext Reinforcement Learning

Feb 09, 2026
Via arXiv

Adaptive and Balanced Re-initialization for Long-timescale Continual Test-time Domain Adaptation

Feb 06, 2026
Via arXiv

Up to 36x Speedup: Mask-based Parallel Inference Paradigm for Key Information Extraction in MLLMs

Jan 27, 2026
Via arXiv

EchoingPixels: Cross-Modal Adaptive Token Reduction for Efficient Audio-Visual LLMs

Dec 11, 2025
Via arXiv

Veritas: Generalizable Deepfake Detection via Pattern-Aware Reasoning

Aug 28, 2025
Via arXiv

Interpretable and Reliable Detection of AI-Generated Images via Grounded Reasoning in MLLMs

Jun 08, 2025
Via arXiv

BR-ASR: Efficient and Scalable Bias Retrieval Framework for Contextual Biasing ASR in Speech LLM

May 25, 2025
Via arXiv

Keep the General, Inject the Specific: Structured Dialogue Fine-Tuning for Knowledge Injection without Catastrophic Forgetting

Apr 27, 2025
Via arXiv

COCO-Inpaint: A Benchmark for Image Inpainting Detection and Manipulation Localization

Apr 25, 2025
Via arXiv

Towards Explainable Fake Image Detection with Multi-Modal Large Language Models

Apr 19, 2025
Via arXiv