Toru Tamaki

BFMD: A Full-Match Badminton Dense Dataset for Dense Shot Captioning

Mar 26, 2026
Via arXiv

M3DDM+: An improved video outpainting by a modified masking strategy

Jan 16, 2026
Via arXiv

Action tube generation by person query matching for spatio-temporal action detection

Mar 17, 2025
Via arXiv

Shift and matching queries for video semantic segmentation

Oct 10, 2024
Via arXiv

Query matching for spatio-temporal action detection with query-based object detector

Sep 27, 2024
Via arXiv

Online pre-training with long-form videos

Aug 28, 2024
Via arXiv

Fine-grained length controllable video captioning with ordinal embeddings

Aug 27, 2024
Via arXiv

Multi-model learning by sequential reading of untrimmed videos for action recognition

Jan 26, 2024
Via arXiv

S3Aug: Segmentation, Sampling, and Shift for Action Recognition

Oct 23, 2023
Via arXiv

Joint learning of images and videos with a single Vision Transformer

Aug 21, 2023
Via arXiv