Komei Sugiura

MLLM-as-a-Judge Exhibits Model Preference Bias

Apr 13, 2026
Via arXiv

Stitch4D: Sparse Multi-Location 4D Urban Reconstruction via Spatio-Temporal Interpolation

Apr 09, 2026
Via arXiv

ABMAMBA: Multimodal Large Language Model with Aligned Hierarchical Bidirectional Scan for Efficient Video Captioning

Apr 09, 2026
Via arXiv

HiFlow: Tokenization-Free Scale-Wise Autoregressive Policy Learning via Flow Matching

Mar 28, 2026
Via arXiv

LILAC: Language-Conditioned Object-Centric Optical Flow for Open-Loop Trajectory Generation

Mar 26, 2026
Via arXiv

AnoleVLA: Lightweight Vision-Language-Action Model with Deep State Space Models for Mobile Manipulation

Mar 16, 2026
Via arXiv

NaiLIA: Multimodal Nail Design Retrieval Based on Dense Intent Descriptions and Palette Queries

Mar 05, 2026
Via arXiv

ReMoRa: Multimodal Large Language Model based on Refined Motion Representation for Long-Video Understanding

Feb 18, 2026
Via arXiv

LLM-Free Image Captioning Evaluation in Reference-Flexible Settings

Dec 25, 2025
Via arXiv

Affordance RAG: Hierarchical Multimodal Retrieval with Affordance-Aware Embodied Memory for Mobile Manipulation

Dec 22, 2025
Via arXiv