Qi Dai

Microsoft Research Asia

REDUCIO! Generating 1024$\times$1024 Video within 16 Seconds using Extremely Compressed Motion Latents

Nov 20, 2024

LLM2CLIP: Powerful Language Model Unlock Richer Visual Representation

Nov 07, 2024

Human-Aware Vision-and-Language Navigation: Bridging Simulation to Reality with Dynamic Human Interactions

Jun 27, 2024

Aligning Vision Models with Human Aesthetics in Retrieval: Benchmarks and Algorithms

Jun 13, 2024

AID: Adapting Image2Video Diffusion Models for Instruction-guided Video Prediction

Jun 10, 2024

MotionFollower: Editing Video Motion via Lightweight Score-Guided Diffusion

May 30, 2024

An edge detection-based deep learning approach for tear meniscus height measurement

Mar 23, 2024

MotionEditor: Editing Video Motion via Content-Aware Diffusion

Nov 30, 2023

VIDiff: Translating Videos via Multi-Modal Instructions with Diffusion Models

Nov 30, 2023

ART·V: Auto-Regressive Text-to-Video Generation with Diffusion Models

Nov 30, 2023