Haojie Zhang

MSD-Score: Multi-Scale Distributional Scoring for Reference-Free Image Caption Evaluation

May 07, 2026
Via arXiv

DiffCap-Bench: A Comprehensive, Challenging, Robust Benchmark for Image Difference Captioning

May 06, 2026
Via arXiv

MuSS: A Large-Scale Dataset and Cinematic Narrative Benchmark for Multi-Shot Subject-to-Video Generation

Apr 26, 2026
Via arXiv

UniVTAC: A Unified Simulation Platform for Visuo-Tactile Manipulation Data Generation, Learning, and Benchmarking

Feb 10, 2026
Via arXiv

Patch-as-Decodable-Token: Towards Unified Multi-Modal Vision Tasks in MLLMs

Oct 02, 2025
Via arXiv

DropLoRA: Sparse Low-Rank Adaptation for Parameter-Efficient Fine-Tuning

Aug 24, 2025
Via arXiv

MAD-UV: The 1st INTERSPEECH Mice Autism Detection via Ultrasound Vocalization Challenge

Jan 08, 2025
Via arXiv

LetsTalk: Latent Diffusion Transformer for Talking Video Synthesis

Nov 24, 2024
Via arXiv

GenCRF: Generative Clustering and Reformulation Framework for Enhanced Intent-Driven Information Retrieval

Sep 17, 2024
Via arXiv

A Unified Label-Aware Contrastive Learning Framework for Few-Shot Named Entity Recognition

Apr 26, 2024
Via arXiv