Mike Zheng Shou

GUI Action Narrator: Where and When Did That Action Take Place?

Jun 19, 2024
Via arXiv

VideoLLM-online: Online Video Large Language Model for Streaming Video

Jun 17, 2024
Via arXiv

VideoGUI: A Benchmark for GUI Automation from Instructional Videos

Jun 14, 2024
Via arXiv

Steganalysis on Digital Watermarking: Is Your Defense Truly Impervious?

Jun 13, 2024
Via arXiv

WMAdapter: Adding WaterMark Control to Latent Diffusion Models

Jun 12, 2024
Via arXiv

ProcessPainter: Learn Painting Process from Sequence Data

Jun 10, 2024
Via arXiv

Leveraging Visual Tokens for Extended Text Contexts in Multi-Modal Learning

Jun 04, 2024
Via arXiv

Visual Perception by Large Language Model's Weights

May 30, 2024
Via arXiv

Multi-Modal Generative Embedding Model

May 29, 2024
Via arXiv

LOVA3: Learning to Visual Question Answering, Asking and Assessment

May 23, 2024
Via arXiv