Scene Recognition


Beyond Logit Adjustment: A Residual Decomposition Framework for Long-Tailed Reranking

Apr 02, 2026
Via arXiv

Unifying UAV Cross-View Geo-Localization via 3D Geometric Perception

Apr 02, 2026
Via arXiv

Riemannian and Symplectic Geometry for Hierarchical Text-Driven Place Recognition

Apr 02, 2026
Via arXiv

PrivHAR-Bench: A Graduated Privacy Benchmark Dataset for Video-Based Action Recognition

Apr 01, 2026
Via arXiv

RegFormer: Transferable Relational Grounding for Efficient Weakly-Supervised Human-Object Interaction Detection

Apr 01, 2026
Via arXiv

JaWildText: A Benchmark for Vision-Language Models on Japanese Scene Text Understanding

Mar 31, 2026
Via arXiv

From Skeletons to Semantics: Design and Deployment of a Hybrid Edge-Based Action Detection System for Public Safety

Mar 31, 2026
Via arXiv

Q-Mask: Query-driven Causal Masks for Text Anchoring in OCR-Oriented Vision-Language Models

Mar 31, 2026
Via arXiv

Learning to See through Illumination Extremes with Event Streaming in Multimodal Large Language Models

Mar 29, 2026
Via arXiv

How Class Ontology and Data Scale Affect Audio Transfer Learning

Mar 26, 2026
Via arXiv