Jing Shi

Toward Robust Hyper-Detailed Image Captioning: A Multiagent Approach and Dual Evaluation Metrics for Factuality and Coverage

Dec 24, 2024, via arXiv

GUI Agents: A Survey

Dec 18, 2024, via arXiv

SUGAR: Subject-Driven Video Customization in a Zero-Shot Manner

Dec 13, 2024, via arXiv

FINECAPTION: Compositional Image Captioning Focusing on Wherever You Want at Any Granularity

Nov 23, 2024, via arXiv

GroundingBooth: Grounding Text-to-Image Customization

Sep 13, 2024, via arXiv

Topological GCN for Improving Detection of Hip Landmarks from B-Mode Ultrasound Images

Aug 24, 2024, via arXiv

AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation

Jun 11, 2024, via arXiv

FINEMATCH: Aspect-based Fine-grained Image and Text Mismatch Detection and Correction

Apr 23, 2024, via arXiv

VIXEN: Visual Text Comparison Network for Image Difference Captioning

Mar 14, 2024, via arXiv

Text-to-Audio Generation Synchronized with Videos

Mar 08, 2024, via arXiv