Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Remy Sun

MObyGaze: a film dataset of multimodal objectification densely annotated by experts

May 28, 2025

Julie Tores, Elisa Ancarani, Lucile Sassatelli, Hui-Yin Wu, Clement Bergman, Lea Andolfi, Victor Ecrement, Remy Sun, Frederic Precioso, Thierry Devars(+3 more)

Figure 1 for MObyGaze: a film dataset of multimodal objectification densely annotated by experts

Figure 2 for MObyGaze: a film dataset of multimodal objectification densely annotated by experts

Figure 3 for MObyGaze: a film dataset of multimodal objectification densely annotated by experts

Figure 4 for MObyGaze: a film dataset of multimodal objectification densely annotated by experts

Abstract:Characterizing and quantifying gender representation disparities in audiovisual storytelling contents is necessary to grasp how stereotypes may perpetuate on screen. In this article, we consider the high-level construct of objectification and introduce a new AI task to the ML community: characterize and quantify complex multimodal (visual, speech, audio) temporal patterns producing objectification in films. Building on film studies and psychology, we define the construct of objectification in a structured thesaurus involving 5 sub-constructs manifesting through 11 concepts spanning 3 modalities. We introduce the Multimodal Objectifying Gaze (MObyGaze) dataset, made of 20 movies annotated densely by experts for objectification levels and concepts over freely delimited segments: it amounts to 6072 segments over 43 hours of video with fine-grained localization and categorization. We formulate different learning tasks, propose and investigate best ways to learn from the diversity of labels among a low number of annotators, and benchmark recent vision, text and audio models, showing the feasibility of the task. We make our code and our dataset available to the community and described in the Croissant format: https://anonymous.4open.science/r/MObyGaze-F600/.

Via

Access Paper or Ask Questions

MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks

Mar 18, 2021

Alexandre Rame, Remy Sun, Matthieu Cord

Figure 1 for MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks

Figure 2 for MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks

Figure 3 for MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks

Figure 4 for MixMo: Mixing Multiple Inputs for Multiple Outputs via Deep Subnetworks

Abstract:Recent strategies achieved ensembling "for free" by fitting concurrently diverse subnetworks inside a single base network. The main idea during training is that each subnetwork learns to classify only one of the multiple inputs simultaneously provided. However, the question of how to best mix these multiple inputs has not been studied so far. In this paper, we introduce MixMo, a new generalized framework for learning multi-input multi-output deep subnetworks. Our key motivation is to replace the suboptimal summing operation hidden in previous approaches by a more appropriate mixing mechanism. For that purpose, we draw inspiration from successful mixed sample data augmentations. We show that binary mixing in features - particularly with rectangular patches from CutMix - enhances results by making subnetworks stronger and more diverse. We improve state of the art for image classification on CIFAR-100 and Tiny ImageNet datasets. Our easy to implement models notably outperform data augmented deep ensembles, without the inference and memory overheads. As we operate in features and simply better leverage the expressiveness of large networks, we open a new line of research complementary to previous works.

* 8 pages, 10 figures, 6 tables

Via

Access Paper or Ask Questions