Qihang Yu

Alleviating Distortion in Image Generation via Multi-Resolution Diffusion Models

Jun 13, 2024

An Image is Worth 32 Tokens for Reconstruction and Generation

Jun 11, 2024

Enhancing Temporal Consistency in Video Editing by Reconstructing Videos with 3D Gaussian Splatting

Jun 04, 2024

GaussianTalker: Speaker-specific Talking Head Synthesis via 3D Gaussian Splatting

Apr 28, 2024

COCONut: Modernizing COCO Segmentation

Apr 12, 2024

ViTamin: Designing Scalable Vision Models in the Vision-Language Era

Apr 03, 2024

MaXTron: Mask Transformer with Trajectory Attention for Video Panoptic Segmentation

Nov 30, 2023

Towards Open-Ended Visual Recognition with Large Language Model

Nov 14, 2023

3D TransUNet: Advancing Medical Image Segmentation through Vision Transformers

Oct 11, 2023

Convolutions Die Hard: Open-Vocabulary Segmentation with Single Frozen Convolutional CLIP

Aug 04, 2023