Jefersson A. dos Santos

Attention over Scene Graphs: Indoor Scene Representations Toward CSAI Classification

Sep 30, 2025

Artur Barros, Carlos Caetano, João Macedo, Jefersson A. dos Santos, Sandra Avila

Abstract:Indoor scene classification is a critical task in computer vision, with applications ranging from robotics to sensitive content analysis, such as child sexual abuse imagery (CSAI) classification. The problem is particularly challenging due to the intricate relationships between objects and complex spatial layouts. In this work, we propose Attention over Scene Graphs for Sensitive Content Analysis (ASGRA), a novel framework that operates on structured graph representations instead of raw pixels. By first converting images into scene graphs and then employing a Graph Attention Network for inference, ASGRA directly models the interactions between a scene's components. This approach offers two key benefits: (i) inherent explainability via object and relationship identification, and (ii) privacy preservation, enabling model training without direct access to sensitive images. On Places8, we achieve 81.27% balanced accuracy, surpassing image-based methods. A real-world CSAI evaluation with law enforcement yields 74.27% balanced accuracy. Our results establish structured scene representations as a robust paradigm for indoor scene classification and CSAI classification. Code is publicly available at https://github.com/tutuzeraa/ASGRA.

* British Machine Vision Conference (BMVC 2025), in the From Scene Understanding to Human Modeling Workshop
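
To make the "Graph Attention Network over scene graphs" step concrete, here is a minimal single-head graph-attention layer following the standard GAT formulation (attention scores e_ij = LeakyReLU(a^T [W h_i || W h_j]), softmax over each node's neighbors, then a weighted sum), written in plain Python. This is an illustrative sketch, not ASGRA's code: the toy scene graph, feature vectors, weights, the mean-pooling readout, and the choice to ignore relation labels and treat edges as undirected are all assumptions; the actual architecture is in the linked repository.

```python
import math

def matvec(W, x):
    # Multiply a matrix W (list of rows) by a vector x.
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x

def gat_layer(features, edges, W, a):
    """Single-head graph attention layer (GAT-style).

    features: dict mapping node name -> input feature vector
    edges: (subject, object) pairs; relation labels are ignored and
           edges are treated as undirected (simplifying assumption)
    W: projection matrix (out_dim x in_dim)
    a: attention vector of length 2 * out_dim
    Returns (new_features, attention) with attention[i][j] = alpha_ij.
    """
    neighbors = {n: {n} for n in features}  # include self-loops, as in GAT
    for s, o in edges:
        neighbors[s].add(o)
        neighbors[o].add(s)
    z = {n: matvec(W, h) for n, h in features.items()}  # z_i = W h_i
    new_features, attention = {}, {}
    for i in features:
        # e_ij = LeakyReLU(a^T [z_i || z_j]) for every neighbor j of i
        e = {j: leaky_relu(sum(ak * v for ak, v in zip(a, z[i] + z[j])))
             for j in neighbors[i]}
        m = max(e.values())
        exp_e = {j: math.exp(v - m) for j, v in e.items()}  # numerically stable softmax
        total = sum(exp_e.values())
        alpha = {j: v / total for j, v in exp_e.items()}
        attention[i] = alpha
        # h'_i = ReLU(sum_j alpha_ij z_j)
        agg = [sum(alpha[j] * z[j][k] for j in alpha) for k in range(len(W))]
        new_features[i] = [max(0.0, v) for v in agg]
    return new_features, attention

# Toy scene graph: pillow-on-bed, lamp-near-bed (hypothetical features and weights).
features = {"bed": [1.0, 0.0], "pillow": [0.0, 1.0], "lamp": [0.5, 0.5]}
edges = [("pillow", "bed"), ("lamp", "bed")]
W = [[1.0, 0.0], [0.0, 1.0]]
a = [0.1, 0.2, 0.3, 0.4]
h, att = gat_layer(features, edges, W, a)
# Mean-pool node embeddings into one graph-level vector for a downstream classifier.
graph_embedding = [sum(v[k] for v in h.values()) / len(h) for k in range(len(W))]
```

The attention weights in `att` are one place where the "explainability via object and relationship identification" mentioned above could be read off: they show which neighboring objects each node attended to.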

Minimizing Risk Through Minimizing Model-Data Interaction: A Protocol For Relying on Proxy Tasks When Designing Child Sexual Abuse Imagery Detection Models

May 10, 2025

Thamiris Coelho, Leo S. F. Ribeiro, João Macedo, Jefersson A. dos Santos, Sandra Avila

Abstract:The distribution of child sexual abuse imagery (CSAI) is an ever-growing concern of our modern world; children who suffered from this heinous crime are revictimized, and the growing amount of illegal imagery distributed overwhelms law enforcement agents (LEAs) with the manual labor of categorization. To ease this burden, researchers have explored methods for automating data triage and detection of CSAI, but the sensitive nature of the data imposes restricted access and minimal interaction between real data and learning algorithms, to avoid leaks at all costs. Observing how these restrictions have shaped the literature, we formalize a definition of "Proxy Tasks", i.e., the substitute tasks used to train models for CSAI without using CSA data. Under this new terminology, we review the current literature and present a protocol for making conscious use of Proxy Tasks together with consistent input from LEAs to design better automation in this field. Finally, we apply this protocol to study, for the first time, the task of Few-shot Indoor Scene Classification on CSAI, presenting a final model that achieves promising results on a real-world CSAI dataset while having no weights trained on sensitive data.

* ACM Conference on Fairness, Accountability, and Transparency (FAccT 2025)

Neglected Risks: The Disturbing Reality of Children's Images in Datasets and the Urgent Call for Accountability

Apr 20, 2025

Carlos Caetano, Gabriel O. dos Santos, Caio Petrucci, Artur Barros, Camila Laranjeira, Leo S. F. Ribeiro, Júlia F. de Mendonça, Jefersson A. dos Santos, Sandra Avila

Abstract:Including children's images in datasets has raised ethical concerns, particularly regarding privacy, consent, data protection, and accountability. These datasets, often built by scraping publicly available images from the Internet, can expose children to risks such as exploitation, profiling, and tracking. Despite growing recognition of these issues, approaches for addressing them remain limited. We explore the ethical implications of using children's images in AI datasets and propose a pipeline to detect and remove such images. As a use case, we built the pipeline on a Vision-Language Model applied to the Visual Question Answering task and tested it on the #PraCegoVer dataset. We also evaluate the pipeline on a subset of 100,000 images from the Open Images V7 dataset to assess its effectiveness in detecting and removing images of children. The pipeline serves as a baseline for future research, providing a starting point for more comprehensive tools and methodologies. While we leverage existing models trained on potentially problematic data, our goal is to expose and address this issue. We do not advocate for training or deploying such models, but instead call for urgent community reflection and action to protect children's rights. Ultimately, we aim to encourage the research community to exercise more than ordinary care in creating new datasets and to inspire the development of tools to protect the fundamental rights of vulnerable groups, particularly children.

* ACM Conference on Fairness, Accountability, and Transparency (FAccT 2025)

FairPIVARA: Reducing and Assessing Biases in CLIP-Based Multimodal Models

Sep 28, 2024

Diego A. B. Moreira, Alef Iury Ferreira, Gabriel Oliveira dos Santos, Luiz Pereira, João Medrado Gondim, Gustavo Bonil, Helena Maia, Nádia da Silva, Simone Tiemi Hashiguti, Jefersson A. dos Santos (+2 more)

Abstract:Despite significant advancements and the pervasive use of vision-language models, few studies have addressed their ethical implications. These models typically require extensive training data, often drawn from hastily reviewed text and image datasets, which leads to highly imbalanced datasets and ethical concerns. Additionally, models initially trained in English, such as CLIP, are frequently fine-tuned for other languages; this can expand their capabilities with more data but can also introduce new biases. CAPIVARA, a CLIP-based model adapted to Portuguese, has shown strong performance in zero-shot tasks. In this paper, we evaluate four different types of discriminatory practices within vision-language models and introduce FairPIVARA, a method to reduce them by removing the most affected dimensions of feature embeddings. Applying FairPIVARA led to a significant reduction of up to 98% in observed biases while promoting a more balanced word distribution within the model. Our model and code are available at: https://github.com/hiaac-nlp/FairPIVARA.

* 14 pages, 10 figures. Accepted to 35th British Machine Vision Conference (BMVC 2024), Workshop on Privacy, Fairness, Accountability and Transparency in Computer Vision
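
The general idea of "removing the most affected dimensions of feature embeddings" can be sketched as: score each embedding dimension by how strongly it separates two groups, then drop the top-k dimensions from every embedding. The scoring rule below (absolute difference of group means) is a hypothetical stand-in chosen for simplicity; FairPIVARA's actual selection criterion is described in the paper and repository and may differ. The embeddings are made-up values.

```python
def dimension_bias_scores(group_a, group_b):
    """Hypothetical per-dimension bias score: |mean(group_a) - mean(group_b)|."""
    dim = len(group_a[0])
    mean_a = [sum(v[d] for v in group_a) / len(group_a) for d in range(dim)]
    mean_b = [sum(v[d] for v in group_b) / len(group_b) for d in range(dim)]
    return [abs(x - y) for x, y in zip(mean_a, mean_b)]

def most_affected_dims(scores, k):
    """Indices of the k dimensions with the largest bias scores."""
    return sorted(range(len(scores)), key=lambda d: scores[d], reverse=True)[:k]

def remove_dims(embedding, dims):
    """Drop the given dimensions; apply the same removal to image and text embeddings."""
    drop = set(dims)
    return [x for d, x in enumerate(embedding) if d not in drop]

# Toy embeddings for two groups of concept prompts (made-up values).
group_a = [[0.9, 0.1, 0.3], [0.8, 0.2, 0.3]]
group_b = [[0.1, 0.1, 0.3], [0.2, 0.2, 0.3]]
scores = dimension_bias_scores(group_a, group_b)
dims = most_affected_dims(scores, k=1)          # dimension 0 separates the groups
debiased = remove_dims([0.9, 0.1, 0.3], dims)   # -> [0.1, 0.3]
```

Removing the same dimensions from both image and text embeddings keeps them in a shared space, so cosine-similarity-based zero-shot classification still works on the reduced vectors.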

Leveraging Self-Supervised Learning for Scene Recognition in Child Sexual Abuse Imagery

Mar 02, 2024

Pedro H. V. Valois, João Macedo, Leo S. F. Ribeiro, Jefersson A. dos Santos, Sandra Avila

Abstract:Crime in the 21st century spans both the virtual and the real world, and the former has become a global menace to people's well-being and security in the latter. The challenges it presents must be faced with unified global cooperation, and we must rely more than ever on automated yet trustworthy tools to combat the ever-growing volume of online offenses. Over 10 million child sexual abuse reports are submitted to the US National Center for Missing & Exploited Children every year, and over 80% originate from online sources. As a result, investigation centers and clearinghouses cannot manually process and properly investigate all of this imagery. In light of that, reliable automated tools that can securely and efficiently handle this data are paramount. Scene recognition fits this setting: it looks for contextual cues in the environment, making it possible to group and classify child sexual abuse data without training on sensitive material. The scarcity of child sexual abuse images, and the restrictions on working with them, motivate self-supervised learning, a machine learning methodology that leverages unlabeled data to produce powerful representations that transfer more easily to target tasks. This work shows that self-supervised deep learning models pre-trained on scene-centric data can reach 71.6% balanced accuracy on our indoor scene classification task and perform, on average, 2.2 percentage points better than a fully supervised counterpart. We collaborate with Brazilian Federal Police experts to evaluate our indoor classification model on actual child abuse material. The results demonstrate a notable discrepancy between the features observed in widely used scene datasets and those depicted in sensitive materials.

* 13 pages, 5 figures, 4 tables. Under review
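
Results in this line of work are reported as balanced accuracy: the mean of per-class recall, which keeps a majority class from dominating the score on imbalanced data. A minimal pure-Python version, with made-up labels:

```python
from collections import defaultdict

def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall over the classes present in y_true."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        correct[t] += int(t == p)
    return sum(correct[c] / total[c] for c in total) / len(total)

# Imbalanced toy example: 8 "bedroom" images, 2 "bathroom" images.
y_true = ["bedroom"] * 8 + ["bathroom"] * 2
y_pred = ["bedroom"] * 8 + ["bedroom", "bathroom"]
plain_accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # 0.9
bal_acc = balanced_accuracy(y_true, y_pred)  # (8/8 + 1/2) / 2 = 0.75
```

This matches the definition used by scikit-learn's `balanced_accuracy_score` with its default `adjusted=False`. A gain of "2.2 percentage points" is an absolute difference between two such scores.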

Data-Centric Machine Learning for Geospatial Remote Sensing Data

Dec 08, 2023

Ribana Roscher, Marc Rußwurm, Caroline Gevaert, Michael Kampffmeyer, Jefersson A. dos Santos, Maria Vakalopoulou, Ronny Hänsch, Stine Hansen, Keiller Nogueira, Jonathan Prexl (+1 more)

Abstract:Recent developments and research in modern machine learning have led to substantial improvements in the geospatial field. Although numerous deep learning models have been proposed, the majority of them have been developed on benchmark datasets that lack strong real-world relevance. Furthermore, the performance of many methods has already saturated on these datasets. We argue that shifting the focus towards a complementary data-centric perspective is necessary to achieve further improvements in accuracy, generalization ability, and real impact in end-user applications. This work presents a definition and precise categorization of automated data-centric learning approaches for geospatial data. It highlights the complementary role of data-centric learning with respect to model-centric in the larger machine learning deployment cycle. We review papers across the entire geospatial field and categorize them into different groups. A set of representative experiments shows concrete implementation examples. These examples provide concrete steps to act on geospatial data with data-centric machine learning approaches.

YOLOv7 for Mosquito Breeding Grounds Detection and Tracking

Oct 16, 2023

Camila Laranjeira, Daniel Andrade, Jefersson A. dos Santos

Abstract:With the looming threat of climate change, neglected tropical diseases such as dengue, zika, and chikungunya have the potential to become an even greater global concern. Remote sensing technologies can aid in controlling the spread of Aedes aegypti, the transmission vector of these diseases, by automating the detection and mapping of mosquito breeding sites so that local entities can properly intervene. In this work, we leverage YOLOv7, a state-of-the-art and computationally efficient detection approach, to localize and track mosquito foci in videos captured by unmanned aerial vehicles. We experiment on a dataset released to the public as part of the ICIP 2023 grand challenge entitled Automatic Detection of Mosquito Breeding Grounds. We show that YOLOv7 can be directly applied to detect larger foci categories such as pools, tires, and water tanks, and that a cheap and straightforward aggregation of frame-by-frame detections can incorporate temporal consistency into the tracking process.

* Winning paper of ICIP 2023 Grand Challenge - Automatic Detection of Mosquito Breeding Grounds - https://www02.smt.ufrj.br/~tvdigital/mosquito/challenge/
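
The abstract does not spell out the aggregation rule that adds temporal consistency. One simple possibility, shown here purely as an assumption, is a sliding-window majority vote over per-frame detections: a class counts as present at frame t only if it was detected in a strict majority of the frames around t. The window size and class names are hypothetical, and the sketch works on per-frame class presence only, ignoring box-level association.

```python
from collections import Counter

def smooth_detections(frames, window=5):
    """Sliding-window majority vote over per-frame detections.

    frames: list of sets of class labels detected in each frame.
    A label is kept at frame t only if it was detected in a strict majority
    of the frames in the window centered on t (truncated at the video edges).
    """
    half = window // 2
    smoothed = []
    for t in range(len(frames)):
        neighborhood = frames[max(0, t - half): t + half + 1]
        counts = Counter(label for f in neighborhood for label in f)
        smoothed.append({lbl for lbl, c in counts.items() if 2 * c > len(neighborhood)})
    return smoothed

# A tire missed at frame 2 and a one-off "pool" false positive at frame 3.
frames = [{"tire"}, {"tire"}, set(), {"tire", "pool"}, {"tire"}, {"tire"}]
smoothed = smooth_detections(frames)
# Every frame ends up as {"tire"}: the gap is filled and the spurious "pool" is dropped.
```

The vote costs only a counter per frame on top of the detector, which is in the spirit of a "cheap and straightforward" aggregation; larger windows suppress more noise but react more slowly when an object enters or leaves the view.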

Seeing without Looking: Analysis Pipeline for Child Sexual Abuse Datasets

Apr 29, 2022

Camila Laranjeira, João Macedo, Sandra Avila, Jefersson A. dos Santos

Abstract:The online sharing and viewing of Child Sexual Abuse Material (CSAM) are growing so fast that human experts can no longer handle manual inspection. However, the automatic classification of CSAM is a challenging field of research, largely due to the inaccessibility of target data that is, and should forever be, private and in the sole possession of law enforcement agencies. To help researchers draw insights from unseen data and safely gain further understanding of CSAM images, we propose an analysis template that goes beyond the statistics of the dataset and its labels. It focuses on extracting automatic signals, provided both by pre-trained machine learning models (e.g., object categories and pornography detection) and by image metrics such as luminance and sharpness. Only aggregated statistics of sparse signals are provided, to guarantee the anonymity of the victimized children and adolescents. The pipeline allows filtering the data by applying thresholds to each specified signal and provides the distribution of those signals within the subset, correlations between signals, and a bias evaluation. We demonstrate our proposal on the Region-based annotated Child Pornography Dataset (RCPD), one of the few CSAM benchmarks in the literature, composed of over 2,000 samples of regular and CSAM images and produced in partnership with Brazil's Federal Police. Although automatic signals are noisy and limited in several respects, we argue that they can highlight important aspects of the overall data distribution, which is valuable for databases that cannot be disclosed. Our goal is to safely publicize the characteristics of CSAM datasets, encouraging researchers to join the field and, perhaps, other institutions to provide similar reports on their benchmarks.

* FAccT 2022 - 5th Conference on Fairness, Accountability and Transparency
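
The filter-then-aggregate step described above can be sketched as follows: threshold each signal, then release only aggregate statistics (count, mean, standard deviation, pairwise Pearson correlations) for the resulting subset. The signal names, thresholds, and values are synthetic illustrations, not the paper's configuration or data, and the `min_count` guard that refuses to report very small subsets is a hypothetical safeguard added for this sketch.

```python
import math
import statistics
from itertools import combinations

def filter_by_thresholds(records, thresholds):
    """Keep records whose signals all meet the per-signal minimums."""
    return [r for r in records if all(r[s] >= t for s, t in thresholds.items())]

def pearson(xs, ys):
    """Pearson correlation; assumes neither signal is constant."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def aggregate_report(records, signals, min_count=3):
    """Return only aggregates, never individual records (min_count is hypothetical)."""
    if len(records) < min_count:
        raise ValueError("subset too small to report safely")
    columns = {s: [r[s] for r in records] for s in signals}
    report = {"count": len(records), "signals": {}, "correlations": {}}
    for s, values in columns.items():
        report["signals"][s] = {"mean": statistics.mean(values),
                                "stdev": statistics.stdev(values)}
    for s1, s2 in combinations(signals, 2):
        report["correlations"][(s1, s2)] = pearson(columns[s1], columns[s2])
    return report

# Synthetic per-image signals (illustrative values only).
records = [
    {"luminance": 0.2, "sharpness": 0.5},
    {"luminance": 0.4, "sharpness": 0.6},
    {"luminance": 0.6, "sharpness": 0.7},
    {"luminance": 0.8, "sharpness": 0.9},
    {"luminance": 0.1, "sharpness": 0.2},
]
subset = filter_by_thresholds(records, {"luminance": 0.2})
report = aggregate_report(subset, ["luminance", "sharpness"])
```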

Conditional Reconstruction for Open-set Semantic Segmentation

Mar 02, 2022

Ian Nunes, Matheus B. Pereira, Hugo Oliveira, Jefersson A. dos Santos, Marcus Poggi

Abstract:Open-set segmentation is a relatively new and unexplored task, with just a handful of methods proposed to model such tasks. We propose a novel method called CoReSeg that tackles the issue using class-conditional reconstruction of the input images according to their pixelwise mask. Our method conditions each input pixel on all known classes, expecting higher errors for pixels of unknown classes. We observed that the proposed method produces better semantic consistency in its predictions, resulting in cleaner segmentation maps that better fit object boundaries. CoReSeg outperforms state-of-the-art methods on the Vaihingen and Potsdam ISPRS datasets, while also being competitive on the Houston 2018 IEEE GRSS Data Fusion dataset. The official implementation of CoReSeg is available at: https://github.com/iannunes/CoReSeg.
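
The intuition that unknown-class pixels reconstruct poorly under every known-class condition suggests a simple open-set decision rule: for each pixel, pick the known class with the lowest conditional reconstruction error, and mark the pixel unknown if even that minimum exceeds a threshold. This is one rule consistent with the description above, not necessarily CoReSeg's exact procedure; the per-class error maps are taken as given (the reconstruction network is omitted) and the threshold is an assumed value.

```python
UNKNOWN = -1

def open_set_labels(errors, threshold):
    """errors[c][i][j]: reconstruction error at pixel (i, j) when conditioning on known class c.

    Returns a label map with the argmin class per pixel, or UNKNOWN when the
    smallest error over all known classes is above the threshold.
    """
    n_classes = len(errors)
    height, width = len(errors[0]), len(errors[0][0])
    labels = []
    for i in range(height):
        row = []
        for j in range(width):
            per_class = [errors[c][i][j] for c in range(n_classes)]
            best = min(range(n_classes), key=per_class.__getitem__)
            row.append(UNKNOWN if per_class[best] > threshold else best)
        labels.append(row)
    return labels

# Two known classes on a 2x2 image (made-up error maps).
errors = [
    [[0.1, 0.8], [0.9, 0.7]],  # conditioned on class 0
    [[0.6, 0.2], [0.8, 0.9]],  # conditioned on class 1
]
label_map = open_set_labels(errors, threshold=0.5)
# Bottom row: no known class reconstructs these pixels well, so both become UNKNOWN.
```

In practice the threshold would be calibrated on validation data containing only known classes, for example as a high percentile of their minimum reconstruction errors.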

Weakly Supervised Few-Shot Segmentation Via Meta-Learning

Sep 03, 2021

Pedro H. T. Gama, Hugo Oliveira, José Marcato Junior, Jefersson A. dos Santos

Abstract:Semantic segmentation is a classic computer vision task with multiple applications, including medical and remote sensing image analysis. Despite recent advances with deep-learning approaches, labeling samples (pixels) for training models is laborious and, in some cases, infeasible. In this paper, we present two novel meta-learning methods, named WeaSeL and ProtoSeg, for the few-shot semantic segmentation task with sparse annotations. We conducted an extensive evaluation of the proposed methods in different applications (12 datasets) in medical imaging and agricultural remote sensing, which are very distinct fields of knowledge and usually subject to data scarcity. The results demonstrate the potential of our methods, achieving suitable results for segmenting both coffee/orange crops and anatomical parts of the human body in comparison with full dense annotation.
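
ProtoSeg's name points to prototype-based few-shot learning. The core idea of prototype-based segmentation with sparse annotations can be sketched as: average the features of the few annotated support pixels of each class into a prototype, then label each query pixel with its nearest prototype. This is a generic illustration, not the authors' code: the toy 2-D features stand in for a learned encoder's output, squared Euclidean distance is assumed, and the meta-training loop is omitted.

```python
def class_prototypes(features, sparse_labels):
    """Mean feature per class, using only annotated pixels.

    features: list of per-pixel feature vectors
    sparse_labels: list of class ids, or None for unannotated pixels
    """
    sums, counts = {}, {}
    for f, y in zip(features, sparse_labels):
        if y is None:
            continue  # sparse annotation: skip unlabeled pixels
        acc = sums.setdefault(y, [0.0] * len(f))
        for k, v in enumerate(f):
            acc[k] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in s] for y, s in sums.items()}

def nearest_prototype(feature, prototypes):
    """Class whose prototype is closest in squared Euclidean distance."""
    return min(prototypes,
               key=lambda y: sum((a - b) ** 2 for a, b in zip(feature, prototypes[y])))

# Support pixels: class 0 (e.g., crop), class 1 (e.g., background), one unlabeled.
support_feats = [[1.0, 0.0], [0.9, 0.1], [0.5, 0.5], [0.0, 1.0], [0.1, 0.9]]
support_labels = [0, 0, None, 1, 1]
protos = class_prototypes(support_feats, support_labels)  # {0: [0.95, 0.05], 1: [0.05, 0.95]}
query = [[0.8, 0.3], [0.2, 0.7]]
predictions = [nearest_prototype(q, protos) for q in query]  # [0, 1]
```

Because prototypes are simple averages, a handful of annotated pixels per class is enough to segment a new image once the encoder has been meta-trained to produce well-clustered features.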
