Nina Shvetsova

HowToCaption: Prompting LLMs to Transform Video Annotations at Scale

Oct 07, 2023
Nina Shvetsova, Anna Kukleva, Xudong Hong, Christian Rupprecht, Bernt Schiele, Hilde Kuehne

Via arXiv

In-Style: Bridging Text and Uncurated Videos with Style Transfer for Text-Video Retrieval

Sep 16, 2023
Nina Shvetsova, Anna Kukleva, Bernt Schiele, Hilde Kuehne

Preserving Modality Structure Improves Multi-Modal Learning

Aug 24, 2023
Swetha Sirnam, Mamshad Nayeem Rizve, Nina Shvetsova, Hilde Kuehne, Mubarak Shah

What, when, and where? -- Self-Supervised Spatio-Temporal Grounding in Untrimmed Multi-Action Videos from Narrated Instructions

Mar 29, 2023
Brian Chen, Nina Shvetsova, Andrew Rouditchenko, Daniel Kondermann, Samuel Thomas, Shih-Fu Chang, Rogerio Feris, James Glass, Hilde Kuehne

MAtch, eXpand and Improve: Unsupervised Finetuning for Zero-Shot Action Recognition with Language Knowledge

Mar 15, 2023
Wei Lin, Leonid Karlinsky, Nina Shvetsova, Horst Possegger, Mateusz Kozinski, Rameswar Panda, Rogerio Feris, Hilde Kuehne, Horst Bischof

Learning by Sorting: Self-supervised Learning with Group Ordering Constraints

Jan 05, 2023
Nina Shvetsova, Felix Petersen, Anna Kukleva, Bernt Schiele, Hilde Kuehne

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

Oct 07, 2022
Andrew Rouditchenko, Yung-Sung Chuang, Nina Shvetsova, Samuel Thomas, Rogerio Feris, Brian Kingsbury, Leonid Karlinsky, David Harwath, Hilde Kuehne, James Glass

VL-Taboo: An Analysis of Attribute-based Zero-shot Capabilities of Vision-Language Models

Sep 12, 2022
Felix Vogel, Nina Shvetsova, Leonid Karlinsky, Hilde Kuehne

Augmentation Learning for Semi-Supervised Classification

Aug 03, 2022
Tim Frommknecht, Pedro Alves Zipf, Quanfu Fan, Nina Shvetsova, Hilde Kuehne

Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval

Dec 08, 2021
Nina Shvetsova, Brian Chen, Andrew Rouditchenko, Samuel Thomas, Brian Kingsbury, Rogerio Feris, David Harwath, James Glass, Hilde Kuehne
