Tengda Han

CountGD: Multi-Modal Open-World Counting

Jul 05, 2024
Via arXiv

AutoAD III: The Prequel -- Back to the Pixels

Apr 22, 2024
Via arXiv

Stale Diffusion: Hyper-realistic 5D Movie Generation Using Old-school Methods

Apr 01, 2024
Via arXiv

A Strong Baseline for Temporal Video-Text Alignment

Dec 21, 2023
Via arXiv

AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description

Oct 10, 2023
Via arXiv

Semantic Counting from Self-Collages

Jul 17, 2023
Via arXiv

Open-world Text-specified Object Counting

Jun 02, 2023
Via arXiv

AutoAD: Movie Description in Context

Mar 29, 2023
Via arXiv

WhisperX: Time-Accurate Speech Transcription of Long-Form Audio

Mar 01, 2023
Via arXiv

Prompt Generation Networks for Efficient Adaptation of Frozen Vision Transformers

Oct 12, 2022
Via arXiv