Erkut Erdem

Shammie

Evaluating Linguistic Capabilities of Multimodal LLMs in the Lens of Few-Shot Learning

Jul 17, 2024
Via arXiv

CLIPAway: Harmonizing Focused Embeddings for Removing Objects via Diffusion Models

Jun 13, 2024
Via arXiv

SonicDiffusion: Audio-Driven Image Generation and Editing with Pretrained Diffusion Models

May 01, 2024
Via arXiv

Hippocrates: An Open-Source Framework for Advancing Large Language Models in Healthcare

Apr 25, 2024
Via arXiv

Sequential Compositional Generalization in Multimodal Models

Apr 18, 2024
Via arXiv

ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models

Nov 13, 2023
Via arXiv

Harnessing Dataset Cartography for Improved Compositional Generalization in Transformers

Oct 18, 2023
Via arXiv

Hyperspectral Image Denoising via Self-Modulating Convolutional Neural Networks

Sep 15, 2023
Via arXiv

Spherical Vision Transformer for 360-degree Video Saliency Prediction

Aug 24, 2023
Via arXiv

CLIP-Guided StyleGAN Inversion for Text-Driven Real Image Editing

Jul 18, 2023
Via arXiv