Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Nico Lang

Taxonomy-Aware Evaluation of Vision-Language Models

Apr 07, 2025

Vésteinn Snæbjarnarson, Kevin Du, Niklas Stoehr, Serge Belongie, Ryan Cotterell, Nico Lang, Stella Frank

Abstract:When a vision-language model (VLM) is prompted to identify an entity depicted in an image, it may answer 'I see a conifer,' rather than the specific label 'norway spruce'. This raises two issues for evaluation: First, the unconstrained generated text needs to be mapped to the evaluation label space (i.e., 'conifer'). Second, a useful classification measure should give partial credit to less-specific, but not incorrect, answers ('norway spruce' being a type of 'conifer'). To meet these requirements, we propose a framework for evaluating unconstrained text predictions, such as those generated from a vision-language model, against a taxonomy. Specifically, we propose the use of hierarchical precision and recall measures to assess the level of correctness and specificity of predictions with regard to a taxonomy. Experimentally, we first show that existing text similarity measures do not capture taxonomic similarity well. We then develop and compare different methods to map textual VLM predictions onto a taxonomy. This allows us to compute hierarchical similarity measures between the generated text and the ground truth labels. Finally, we analyze modern VLMs on fine-grained visual classification tasks based on our proposed taxonomic evaluation scheme.

* CVPR 2025

Via

Access Paper or Ask Questions

Open-Set Recognition of Novel Species in Biodiversity Monitoring

Mar 03, 2025

Yuyan Chen, Nico Lang, B. Christian Schmidt, Aditya Jain, Yves Basset, Sara Beery, Maxim Larrivée, David Rolnick

Abstract:Machine learning is increasingly being applied to facilitate long-term, large-scale biodiversity monitoring. With most species on Earth still undiscovered or poorly documented, species-recognition models are expected to encounter new species during deployment. We introduce Open-Insects, a fine-grained image recognition benchmark dataset for open-set recognition and out-of-distribution detection in biodiversity monitoring. Open-Insects makes it possible to evaluate algorithms for new species detection on several geographical open-set splits with varying difficulty. Furthermore, we present a test set recently collected in the wild with 59 species that are likely new to science. We evaluate a variety of open-set recognition algorithms, including post-hoc methods, training-time regularization, and training with auxiliary data, finding that the simple post-hoc approach of utilizing softmax scores remains a strong baseline. We also demonstrate how to leverage auxiliary data to improve the detection performance when the training dataset is limited. Our results provide timely insights to guide the development of computer vision methods for biodiversity monitoring and species discovery.

Via

Access Paper or Ask Questions

Multimodal Fusion Strategies for Mapping Biophysical Landscape Features

Oct 07, 2024

Lucia Gordon, Nico Lang, Catherine Ressijac, Andrew Davies

Figure 1 for Multimodal Fusion Strategies for Mapping Biophysical Landscape Features

Figure 2 for Multimodal Fusion Strategies for Mapping Biophysical Landscape Features

Figure 3 for Multimodal Fusion Strategies for Mapping Biophysical Landscape Features

Figure 4 for Multimodal Fusion Strategies for Mapping Biophysical Landscape Features

Abstract:Multimodal aerial data are used to monitor natural systems, and machine learning can significantly accelerate the classification of landscape features within such imagery to benefit ecology and conservation. It remains under-explored, however, how these multiple modalities ought to be fused in a deep learning model. As a step towards filling this gap, we study three strategies (Early fusion, Late fusion, and Mixture of Experts) for fusing thermal, RGB, and LiDAR imagery using a dataset of spatially-aligned orthomosaics in these three modalities. In particular, we aim to map three ecologically-relevant biophysical landscape features in African savanna ecosystems: rhino middens, termite mounds, and water. The three fusion strategies differ in whether the modalities are fused early or late, and if late, whether the model learns fixed weights per modality for each class or generates weights for each class adaptively, based on the input. Overall, the three methods have similar macro-averaged performance with Late fusion achieving an AUC of 0.698, but their per-class performance varies strongly, with Early fusion achieving the best recall for middens and water and Mixture of Experts achieving the best recall for mounds.

* 9 pages, 4 figures, ECCV 2024 Workshop in CV for Ecology

Via

Access Paper or Ask Questions

Nacala-Roof-Material: Drone Imagery for Roof Detection, Classification, and Segmentation to Support Mosquito-borne Disease Risk Assessment

Jun 07, 2024

Venkanna Babu Guthula, Stefan Oehmcke, Remigio Chilaule, Hui Zhang, Nico Lang, Ankit Kariryaa, Johan Mottelson, Christian Igel

Figure 1 for Nacala-Roof-Material: Drone Imagery for Roof Detection, Classification, and Segmentation to Support Mosquito-borne Disease Risk Assessment

Figure 2 for Nacala-Roof-Material: Drone Imagery for Roof Detection, Classification, and Segmentation to Support Mosquito-borne Disease Risk Assessment

Figure 3 for Nacala-Roof-Material: Drone Imagery for Roof Detection, Classification, and Segmentation to Support Mosquito-borne Disease Risk Assessment

Figure 4 for Nacala-Roof-Material: Drone Imagery for Roof Detection, Classification, and Segmentation to Support Mosquito-borne Disease Risk Assessment

Abstract:As low-quality housing and in particular certain roof characteristics are associated with an increased risk of malaria, classification of roof types based on remote sensing imagery can support the assessment of malaria risk and thereby help prevent the disease. To support research in this area, we release the Nacala-Roof-Material dataset, which contains high-resolution drone images from Mozambique with corresponding labels delineating houses and specifying their roof types. The dataset defines a multi-task computer vision problem, comprising object detection, classification, and segmentation. In addition, we benchmarked various state-of-the-art approaches on the dataset. Canonical U-Nets, YOLOv8, and a custom decoder on pretrained DINOv2 served as baselines. We show that each of the methods has its advantages but none is superior on all tasks, which highlights the potential of our dataset for future research in multi-task learning. While the tasks are closely related, accurate segmentation of objects does not necessarily imply accurate instance separation, and vice versa. We address this general issue by introducing a variant of the deep ordinal watershed (DOW) approach that additionally separates the interior of objects, allowing for improved object delineation and separation. We show that our DOW variant is a generic approach that improves the performance of both U-Net and DINOv2 backbones, leading to a better trade-off between semantic segmentation and instance segmentation.

Via

Access Paper or Ask Questions

Labeled Data Selection for Category Discovery

Jun 07, 2024

Bingchen Zhao, Nico Lang, Serge Belongie, Oisin Mac Aodha

Figure 1 for Labeled Data Selection for Category Discovery

Figure 2 for Labeled Data Selection for Category Discovery

Figure 3 for Labeled Data Selection for Category Discovery

Figure 4 for Labeled Data Selection for Category Discovery

Abstract:Category discovery methods aim to find novel categories in unlabeled visual data. At training time, a set of labeled and unlabeled images are provided, where the labels correspond to the categories present in the images. The labeled data provides guidance during training by indicating what types of visual properties and features are relevant for performing discovery in the unlabeled data. As a result, changing the categories present in the labeled set can have a large impact on what is ultimately discovered in the unlabeled set. Despite its importance, the impact of labeled data selection has not been explored in the category discovery literature to date. We show that changing the labeled data can significantly impact discovery performance. Motivated by this, we propose two new approaches for automatically selecting the most suitable labeled data based on the similarity between the labeled and unlabeled data. Our observation is that, unlike in conventional supervised transfer learning, the best labeled is neither too similar, nor too dissimilar, to the unlabeled categories. Our resulting approaches obtains state-of-the-art discovery performance across a range of challenging fine-grained benchmark datasets.

Via

Access Paper or Ask Questions

MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning

May 04, 2024

Vishal Nedungadi, Ankit Kariryaa, Stefan Oehmcke, Serge Belongie, Christian Igel, Nico Lang

Figure 1 for MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning

Figure 2 for MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning

Figure 3 for MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning

Figure 4 for MMEarth: Exploring Multi-Modal Pretext Tasks For Geospatial Representation Learning

Abstract:The volume of unlabelled Earth observation (EO) data is huge, but many important applications lack labelled training data. However, EO data offers the unique opportunity to pair data from different modalities and sensors automatically based on geographic location and time, at virtually no human labor cost. We seize this opportunity to create a diverse multi-modal pretraining dataset at global scale. Using this new corpus of 1.2 million locations, we propose a Multi-Pretext Masked Autoencoder (MP-MAE) approach to learn general-purpose representations for optical satellite images. Our approach builds on the ConvNeXt V2 architecture, a fully convolutional masked autoencoder (MAE). Drawing upon a suite of multi-modal pretext tasks, we demonstrate that our MP-MAE approach outperforms both MAEs pretrained on ImageNet and MAEs pretrained on domain-specific satellite images. This is shown on several downstream tasks including image classification and semantic segmentation. We find that multi-modal pretraining notably improves the linear probing performance, e.g. 4pp on BigEarthNet and 16pp on So2Sat, compared to pretraining on optical satellite images only. We show that this also leads to better label and parameter efficiency which are crucial aspects in global scale applications.

* Data and code is available on the project page: https://vishalned.github.io/mmearth

Via

Access Paper or Ask Questions

Familiarity-Based Open-Set Recognition Under Adversarial Attacks

Nov 08, 2023

Philip Enevoldsen, Christian Gundersen, Nico Lang, Serge Belongie, Christian Igel

Abstract:Open-set recognition (OSR), the identification of novel categories, can be a critical component when deploying classification models in real-world applications. Recent work has shown that familiarity-based scoring rules such as the Maximum Softmax Probability (MSP) or the Maximum Logit Score (MLS) are strong baselines when the closed-set accuracy is high. However, one of the potential weaknesses of familiarity-based OSR are adversarial attacks. Here, we present gradient-based adversarial attacks on familiarity scores for both types of attacks, False Familiarity and False Novelty attacks, and evaluate their effectiveness in informed and uninformed settings on TinyImageNet.

* Published in: The 2nd Workshop and Challenges for Out-of-Distribution Generalization in Computer Vision, ICCV 2023

Via

Access Paper or Ask Questions

Satellite-based high-resolution maps of cocoa planted area for Côte d'Ivoire and Ghana

Jun 13, 2022

Nikolai Kalischek, Nico Lang, Cécile Renier, Rodrigo Caye Daudt, Thomas Addoah, William Thompson, Wilma J. Blaser-Hart, Rachael Garrett, Konrad Schindler, Jan D. Wegner

Figure 1 for Satellite-based high-resolution maps of cocoa planted area for Côte d'Ivoire and Ghana

Figure 2 for Satellite-based high-resolution maps of cocoa planted area for Côte d'Ivoire and Ghana

Figure 3 for Satellite-based high-resolution maps of cocoa planted area for Côte d'Ivoire and Ghana

Figure 4 for Satellite-based high-resolution maps of cocoa planted area for Côte d'Ivoire and Ghana

Abstract:C\^ote d'Ivoire and Ghana, the world's largest producers of cocoa, account for two thirds of the global cocoa production. In both countries, cocoa is the primary perennial crop, providing income to almost two million farmers. Yet precise maps of cocoa planted area are missing, hindering accurate quantification of expansion in protected areas, production and yields, and limiting information available for improved sustainability governance. Here, we combine cocoa plantation data with publicly available satellite imagery in a deep learning framework and create high-resolution maps of cocoa plantations for both countries, validated in situ. Our results suggest that cocoa cultivation is an underlying driver of over 37% and 13% of forest loss in protected areas in C\^ote d'Ivoire and Ghana, respectively, and that official reports substantially underestimate the planted area, up to 40% in Ghana. These maps serve as a crucial building block to advance understanding of conservation and economic development in cocoa producing regions.

Via

Access Paper or Ask Questions

A high-resolution canopy height model of the Earth

Apr 13, 2022

Nico Lang, Walter Jetz, Konrad Schindler, Jan Dirk Wegner

Figure 1 for A high-resolution canopy height model of the Earth

Figure 2 for A high-resolution canopy height model of the Earth

Figure 3 for A high-resolution canopy height model of the Earth

Figure 4 for A high-resolution canopy height model of the Earth

Abstract:The worldwide variation in vegetation height is fundamental to the global carbon cycle and central to the functioning of ecosystems and their biodiversity. Geospatially explicit and, ideally, highly resolved information is required to manage terrestrial ecosystems, mitigate climate change, and prevent biodiversity loss. Here, we present the first global, wall-to-wall canopy height map at 10 m ground sampling distance for the year 2020. No single data source meets these requirements: dedicated space missions like GEDI deliver sparse height data, with unprecedented coverage, whereas optical satellite images like Sentinel-2 offer dense observations globally, but cannot directly measure vertical structures. By fusing GEDI with Sentinel-2, we have developed a probabilistic deep learning model to retrieve canopy height from Sentinel-2 images anywhere on Earth, and to quantify the uncertainty in these estimates. The presented approach reduces the saturation effect commonly encountered when estimating canopy height from satellite images, allowing to resolve tall canopies with likely high carbon stocks. According to our map, only 5% of the global landmass is covered by trees taller than 30 m. Such data play an important role for conservation, e.g., we find that only 34% of these tall canopies are located within protected areas. Our model enables consistent, uncertainty-informed worldwide mapping and supports an ongoing monitoring to detect change and inform decision making. The approach can serve ongoing efforts in forest conservation, and has the potential to foster advances in climate, carbon, and biodiversity modelling.

Via

Access Paper or Ask Questions

Country-wide Retrieval of Forest Structure From Optical and SAR Satellite Imagery With Bayesian Deep Learning

Nov 25, 2021

Alexander Becker, Stefania Russo, Stefano Puliti, Nico Lang, Konrad Schindler, Jan Dirk Wegner

Figure 1 for Country-wide Retrieval of Forest Structure From Optical and SAR Satellite Imagery With Bayesian Deep Learning

Figure 2 for Country-wide Retrieval of Forest Structure From Optical and SAR Satellite Imagery With Bayesian Deep Learning

Figure 3 for Country-wide Retrieval of Forest Structure From Optical and SAR Satellite Imagery With Bayesian Deep Learning

Figure 4 for Country-wide Retrieval of Forest Structure From Optical and SAR Satellite Imagery With Bayesian Deep Learning

Abstract:Monitoring and managing Earth's forests in an informed manner is an important requirement for addressing challenges like biodiversity loss and climate change. While traditional in situ or aerial campaigns for forest assessments provide accurate data for analysis at regional level, scaling them to entire countries and beyond with high temporal resolution is hardly possible. In this work, we propose a Bayesian deep learning approach to densely estimate forest structure variables at country-scale with 10-meter resolution, using freely available satellite imagery as input. Our method jointly transforms Sentinel-2 optical images and Sentinel-1 synthetic aperture radar images into maps of five different forest structure variables: 95th height percentile, mean height, density, Gini coefficient, and fractional cover. We train and test our model on reference data from 41 airborne laser scanning missions across Norway and demonstrate that it is able to generalize to unseen test regions, achieving normalized mean absolute errors between 11% and 15%, depending on the variable. Our work is also the first to propose a Bayesian deep learning approach so as to predict forest structure variables with well-calibrated uncertainty estimates. These increase the trustworthiness of the model and its suitability for downstream tasks that require reliable confidence estimates, such as informed decision making. We present an extensive set of experiments to validate the accuracy of the predicted maps as well as the quality of the predicted uncertainties. To demonstrate scalability, we provide Norway-wide maps for the five forest structure variables.

* 19 pages, 11 figures

Via

Access Paper or Ask Questions