Kashu Yamazaki

Learning to Assist: Physics-Grounded Human-Human Control via Multi-Agent Reinforcement Learning

Mar 11, 2026

ExtremControl: Low-Latency Humanoid Teleoperation with Direct Extremity Control

Feb 11, 2026

Rethinking Progression of Memory State in Robotic Manipulation: An Object-Centric Perspective

Nov 18, 2025

SlotVLA: Towards Modeling of Object-Relation Representations in Robotic Manipulation

Nov 10, 2025

HENASY: Learning to Assemble Scene-Entities for Egocentric Video-Language Model

Jun 01, 2024

Open-Fusion: Real-time Open-Vocabulary 3D Mapping and Queryable Scene Representation

Oct 05, 2023

AerialFormer: Multi-resolution Transformer for Aerial Image Segmentation

Jun 12, 2023

Contextual Explainable Video Representation: Human Perception-based Understanding

Dec 17, 2022

CLIP-TSA: CLIP-Assisted Temporal Self-Attention for Weakly-Supervised Video Anomaly Detection

Dec 09, 2022

VLTinT: Visual-Linguistic Transformer-in-Transformer for Coherent Video Paragraph Captioning

Nov 28, 2022