Mohammed Bennamoun

LatentMove: Towards Complex Human Movement Video Generation
May 28, 2025
Via arXiv

Watch and Listen: Understanding Audio-Visual-Speech Moments with Multimodal LLM
May 23, 2025
Via arXiv

Multi-Resolution Pathology-Language Pre-training Model with Text-Guided Visual Representation
Apr 26, 2025
Via arXiv

Polarisation-Inclusive Spiking Neural Networks for Real-Time RFI Detection in Modern Radio Telescopes
Apr 16, 2025
Via arXiv

Advancing RFI-Detection in Radio Astronomy with Liquid State Machines
Apr 14, 2025
Via arXiv

STING-BEE: Towards Vision-Language Model for Real-World X-ray Baggage Security Inspection
Apr 03, 2025
Via arXiv

Dynamic Neural Surfaces for Elastic 4D Shape Representation and Analysis
Mar 05, 2025
Via arXiv

AquaticCLIP: A Vision-Language Foundation Model for Underwater Scene Analysis
Feb 03, 2025
Via arXiv

GeoPixel: Pixel Grounding Large Multimodal Model in Remote Sensing
Jan 23, 2025
Via arXiv

Admitting Ignorance Helps the Video Question Answering Models to Answer
Jan 15, 2025
Via arXiv