Bei Liu

Revisiting Latent Space of GAN Inversion for Real Image Editing

Jul 18, 2023

SINC: Self-Supervised In-Context Learning for Vision-Language Tasks

Jul 15, 2023

Pave the Way to Grasp Anything: Transferring Foundation Models for Universal Pick-Place Robots

Jun 25, 2023

Balancing Reconstruction and Editing Quality of GAN Inversion for Real Image Editing with StyleGAN Prior Latent Space

May 31, 2023

AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation

May 30, 2023

Whisper-KDQ: A Lightweight Whisper via Guided Knowledge Distillation and Quantization for Efficient ASR

May 18, 2023

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

Dec 19, 2022

Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022

Nov 02, 2022

Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning

Oct 12, 2022

CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment

Sep 23, 2022