Difei Gao

VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Jun 14, 2024

LOVA3: Learning to Visual Question Answering, Asking and Assessment

May 23, 2024

Delocate: Detection and Localization for Deepfake Videos with Randomly-Located Tampered Traces

Jan 24, 2024

ASSISTGUI: Task-Oriented Desktop Graphical User Interface Automation

Jan 01, 2024

ViT-Lens-2: Gateway to Omni-modal Intelligence

Nov 27, 2023

CVPR 2023 Text Guided Video Editing Competition

Oct 24, 2023

Show-1: Marrying Pixel and Latent Diffusion Models for Text-to-Video Generation

Sep 27, 2023

Recap: Detecting Deepfake Video with Unpredictable Tampered Traces via Recovering Faces and Mapping Recovered Faces

Aug 19, 2023

UniVTG: Towards Unified Video-Language Temporal Grounding

Aug 18, 2023

AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn

Jun 28, 2023