"Image": models, code, and papers

Advancing Generative Model Evaluation: A Novel Algorithm for Realistic Image Synthesis and Comparison in OCR System

Feb 28, 2024
Majid Memari, Khaled R. Ahmed, Shahram Rahimi, Noorbakhsh Amiri Golilarz

Adapting Learned Image Codecs to Screen Content via Adjustable Transformations

Feb 27, 2024
H. Burak Dogaroglu, A. Burakhan Koyuncu, Atanas Boev, Elena Alshina, Eckehard Steinbach

SLCF-Net: Sequential LiDAR-Camera Fusion for Semantic Scene Completion using a 3D Recurrent U-Net

Mar 13, 2024
Helin Cao, Sven Behnke

Strengthening Multimodal Large Language Model with Bootstrapped Preference Optimization

Mar 13, 2024
Renjie Pi, Tianyang Han, Wei Xiong, Jipeng Zhang, Runtao Liu, Rui Pan, Tong Zhang

PaddingFlow: Improving Normalizing Flows with Padding-Dimensional Noise

Mar 13, 2024
Qinglong Meng, Chongkun Xia, Xueqian Wang

Spatiotemporal Predictive Pre-training for Robotic Motor Control

Mar 14, 2024
Jiange Yang, Bei Liu, Jianlong Fu, Bocheng Pan, Gangshan Wu, Limin Wang

GiT: Towards Generalist Vision Transformer through Universal Language Interface

Mar 14, 2024
Haiyang Wang, Hao Tang, Li Jiang, Shaoshuai Shi, Muhammad Ferjad Naeem, Hongsheng Li, Bernt Schiele, Liwei Wang

PosSAM: Panoptic Open-vocabulary Segment Anything

Mar 14, 2024
Vibashan VS, Shubhankar Borse, Hyojin Park, Debasmit Das, Vishal Patel, Munawar Hayat, Fatih Porikli

FastVideoEdit: Leveraging Consistency Models for Efficient Text-to-Video Editing

Mar 10, 2024
Youyuan Zhang, Xuan Ju, James J. Clark

CDSE-UNet: Enhancing COVID-19 CT Image Segmentation with Canny Edge Detection and Dual-Path SENet Feature Fusion

Mar 03, 2024
Jiao Ding, Jie Chang, Renrui Han, Li Yang
