Simon Ging

Computer Vision Group, University of Freiburg, Germany, Adaptive & Agentic AI

Learning to Read Where to Look: Disease-Aware Vision-Language Pretraining for 3D CT

Mar 02, 2026
Via arXiv

Using Knowledge Graphs to harvest datasets for efficient CLIP model training

May 05, 2025
Via arXiv

Open-ended VQA benchmarking of Vision-Language models by exploiting Classification datasets and their semantic hierarchy

Feb 11, 2024
Via arXiv

Open-vocabulary Attribute Detection

Nov 23, 2022
Via arXiv

COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning

Nov 01, 2020
Via arXiv