Image-to-Image Translation


Image-to-image translation is the process of converting an image from one domain to another using deep learning techniques.

Bangla-Bayanno: A 52K-Pair Bengali Visual Question Answering Dataset with LLM-Assisted Translation Refinement

Aug 27, 2025
Via arXiv

GeoAware-VLA: Implicit Geometry Aware Vision-Language-Action Model

Sep 17, 2025
Via arXiv

CoSwin: Convolution Enhanced Hierarchical Shifted Window Attention For Small-Scale Vision

Sep 10, 2025
Via arXiv

Integrating Anatomical Priors into a Causal Diffusion Model

Sep 10, 2025
Via arXiv

Camera Pose Refinement via 3D Gaussian Splatting

Aug 25, 2025
Via arXiv

Beyond flattening: a geometrically principled positional encoding for vision transformers with Weierstrass elliptic functions

Aug 26, 2025
Via arXiv

HERMES: Human-to-Robot Embodied Learning from Multi-Source Motion Data for Mobile Dexterous Manipulation

Aug 28, 2025
Via arXiv

SATURN: Autoregressive Image Generation Guided by Scene Graphs

Aug 20, 2025
Via arXiv

Designing Practical Models for Isolated Word Visual Speech Recognition

Aug 25, 2025
Via arXiv

Why Stop at Words? Unveiling the Bigger Picture through Line-Level OCR

Aug 29, 2025
Via arXiv