
Jana Kosecka

Toward Phonology-Guided Sign Language Motion Generation: A Diffusion Baseline and Conditioning Analysis

Mar 18, 2026
Via arXiv

Gesture-Aware Pretraining and Token Fusion for 3D Hand Pose Estimation

Mar 18, 2026
Via arXiv

VarSplat: Uncertainty-aware 3D Gaussian Splatting for Robust RGB-D SLAM

Mar 10, 2026
Via arXiv

Multi-temporal Adaptive Red-Green-Blue and Long-Wave Infrared Fusion for You Only Look Once-Based Landmine Detection from Unmanned Aerial Systems

Dec 23, 2025
Via arXiv

TRAVEL: Training-Free Retrieval and Alignment for Vision-and-Language Navigation

Feb 11, 2025
Via arXiv

Structured Spatial Reasoning with Open Vocabulary Object Detectors

Oct 09, 2024
Via arXiv

GSR-BENCH: A Benchmark for Grounded Spatial Reasoning Evaluation via Multimodal LLMs

Jun 19, 2024
Via arXiv

Q-GroundCAM: Quantifying Grounding in Vision Language Models via GradCAM

Apr 29, 2024
Via arXiv

Fingerspelling PoseNet: Enhancing Fingerspelling Translation with Pose-Based Transformer Models

Nov 20, 2023
Via arXiv

Labeling Indoor Scenes with Fusion of Out-of-the-Box Perception Models

Nov 17, 2023
Via arXiv