Bei Liu

Revisiting Latent Space of GAN Inversion for Real Image Editing

Jul 18, 2023

SINC: Self-Supervised In-Context Learning for Vision-Language Tasks

Jul 15, 2023

Pave the Way to Grasp Anything: Transferring Foundation Models for Universal Pick-Place Robots

Jun 25, 2023

Balancing Reconstruction and Editing Quality of GAN Inversion for Real Image Editing with StyleGAN Prior Latent Space

May 31, 2023

AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation

May 30, 2023

Whisper-KDQ: A Lightweight Whisper via Guided Knowledge Distillation and Quantization for Efficient ASR

May 18, 2023

MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation

Dec 19, 2022

Build a SRE Challenge System: Lessons from VoxSRC 2022 and CNSRC 2022

Nov 02, 2022

Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning

Oct 12, 2022

CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment

Sep 23, 2022