

MEAN-RIR: Multi-Modal Environment-Aware Network for Robust Room Impulse Response Estimation

Sep 05, 2025
Via arXiv

SparkUI-Parser: Enhancing GUI Perception with Robust Grounding and Parsing

Sep 05, 2025
Via arXiv

Towards Ontology-Based Descriptions of Conversations with Qualitatively-Defined Concepts

Sep 05, 2025
Via arXiv

Cloning a Conversational Voice AI Agent from Call Recording Datasets for Telesales

Sep 05, 2025
Via arXiv

HoPE: Hyperbolic Rotary Positional Encoding for Stable Long-Range Dependency Modeling in Large Language Models

Sep 05, 2025
Via arXiv

ToM-SSI: Evaluating Theory of Mind in Situated Social Interactions

Sep 05, 2025
Via arXiv

REMOTE: A Unified Multimodal Relation Extraction Framework with Multilevel Optimal Transport and Mixture-of-Experts

Sep 05, 2025
Via arXiv

PRIM: Towards Practical In-Image Multilingual Machine Translation

Sep 05, 2025
Via arXiv

VLSM-Ensemble: Ensembling CLIP-based Vision-Language Models for Enhanced Medical Image Segmentation

Sep 05, 2025
Via arXiv

Towards an Accurate and Effective Robot Vision (The Problem of Topological Localization for Mobile Robots)

Sep 05, 2025
Via arXiv