Daichi Yashima

MLLM-as-a-Judge Exhibits Model Preference Bias

Apr 13, 2026

ABMAMBA: Multimodal Large Language Model with Aligned Hierarchical Bidirectional Scan for Efficient Video Captioning

Apr 09, 2026

HiFlow: Tokenization-Free Scale-Wise Autoregressive Policy Learning via Flow Matching

Mar 28, 2026

AnoleVLA: Lightweight Vision-Language-Action Model with Deep State Space Models for Mobile Manipulation

Mar 16, 2026

NaiLIA: Multimodal Nail Design Retrieval Based on Dense Intent Descriptions and Palette Queries

Mar 05, 2026

ReMoRa: Multimodal Large Language Model based on Refined Motion Representation for Long-Video Understanding

Feb 18, 2026

Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement

Jan 28, 2025

Open-Vocabulary Mobile Manipulation Based on Double Relaxed Contrastive Learning with Dense Labeling

Dec 24, 2024