Triantafyllos Afouras

PerceptionLM: Open-Access Data and Models for Detailed Visual Understanding

Apr 17, 2025
Via arXiv

Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation

Jan 02, 2025
Via arXiv

VoiceVector: Multimodal Enrolment Vectors for Speaker Separation

Jan 02, 2025
Via arXiv

MusicFlow: Cascaded Flow Matching for Text Guided Music Generation

Oct 27, 2024
Via arXiv

Ego-Exo4D: Understanding Skilled Human Activity from First- and Third-Person Perspectives

Nov 30, 2023
Via arXiv

Video-Mined Task Graphs for Keystep Recognition in Instructional Videos

Jul 17, 2023
Via arXiv

Learning to Ground Instructional Articles in Videos through Narrations

Jun 06, 2023
Via arXiv

Scaling up sign spotting through sign language dictionaries

May 09, 2022
Via arXiv

Audio-Visual Synchronisation in the wild

Dec 08, 2021
Via arXiv

BBC-Oxford British Sign Language Dataset

Nov 05, 2021
Via arXiv