Abstract:Intra-class variability is given according to the significance in the degree of dissimilarity between images within a class. In that sense, depending on its intensity, intra-class variability can hinder the learning process for DL models, specially when such classes are also underrepresented, which is a very common scenario in Fine-Grained Visual Categorization (FGVC) tasks. This paper proposes a novel method that aims at leveraging classification performance in FGVC tasks by learning fine-grained features via classification of class-wise cluster assignments. Our goal is to apply clustering over each class individually, which can allow to discover pseudo-labels that encodes a latent degree of similarity between images. In turn, those labels can be employed in a hierarchical classification process that allows to learn more fine-grained visual features and thereby mitigating intra-class variability issues. Initial experiments over the PlantNet300k enabled to shed light upon several key points in which future work will have to be developed in order to find more conclusive evidence regarding the effectiveness of our method. Our method still achieves state-of-the-art performance on the PlantNet300k dataset even though some of its components haven't been shown to be fully optimized. Our code is available at \href{https://github.com/ADAM-UEFS/FGDCC}{https://github.com/ADAM-UEFS/FGDCC}.
Abstract:This paper presents an approach developed to address the PlantClef 2025 challenge, which consists of a fine-grained multi-label species identification, over high-resolution images. Our solution focused on employing class prototypes obtained from the training dataset as a proxy guidance for training a segmentation Vision Transformer (ViT) on the test set images. To obtain these representations, the proposed method extracts features from training dataset images and create clusters, by applying K-Means, with $K$ equals to the number of classes in the dataset. The segmentation model is a customized narrow ViT, built by replacing the patch embedding layer with a frozen DinoV2, pre-trained on the training dataset for individual species classification. This model is trained to reconstruct the class prototypes of the training dataset from the test dataset images. We then use this model to obtain attention scores that enable to identify and localize areas of interest and consequently guide the classification process. The proposed approach enabled a domain-adaptation from multi-class identification with individual species, into multi-label classification from high-resolution vegetation plots. Our method achieved fifth place in the PlantCLEF 2025 challenge on the private leaderboard, with an F1 score of 0.33331. Besides that, in absolute terms our method scored 0.03 lower than the top-performing submission, suggesting that it may achieved competitive performance in the benchmark task. Our code is available at \href{https://github.com/ADAM-UEFS/PlantCLEF2025}{https://github.com/ADAM-UEFS/PlantCLEF2025}.
Abstract:This paper proposes a competitive and computationally efficient approach to probabilistic rainfall nowcasting. A video projector (V-JEPA Vision Transformer) associated to a lightweight probabilistic head is attached to a pre-trained satellite vision encoder (DINOv3-SAT493M) to map encoder tokens into a discrete empirical CDF (eCDF) over 4-hour accumulated rainfall. The projector-head is optimized end-to-end over the Ranked Probability Score (RPS). As an alternative, 3D-UNET baselines trained with an aggregate Rank Probability Score and a per-pixel Gamma-Hurdle objective are used. On the Weather4Cast 2025 benchmark, the proposed method achieved a promising performance, with a CRPS of 3.5102, which represents $\approx$ 26% in effectiveness gain against the best 3D-UNET.