Kashu Yamazaki

Rethinking Progression of Memory State in Robotic Manipulation: An Object-Centric Perspective

Nov 18, 2025
Via arXiv

SlotVLA: Towards Modeling of Object-Relation Representations in Robotic Manipulation

Nov 10, 2025
Via arXiv

HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model

Jun 01, 2024
Via arXiv

Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation

Oct 05, 2023
Via arXiv

AerialFormer: Multi-resolution Transformer for Aerial Image Segmentation

Jun 12, 2023
Via arXiv

Contextual Explainable Video Representation: Human Perception-based Understanding

Dec 17, 2022
Via arXiv

CLIP-TSA: CLIP-Assisted Temporal Self-Attention for Weakly-Supervised Video Anomaly Detection

Dec 09, 2022
Via arXiv

VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning

Nov 28, 2022
Via arXiv

AISFormer: Amodal Instance Segmentation with Transformer

Oct 13, 2022
Via arXiv

AOE-Net: Entities Interactions Modeling with Adaptive Attention Mechanism for Temporal Action Proposals Generation

Oct 05, 2022
Via arXiv