Removing shadows requires an understanding of both lighting conditions and object textures in a scene. Existing methods typically learn pixel-level color mappings between shadow and non-shadow images, in which the joint modeling of lighting and object textures is implicit and inadequate. We observe that in a shadow region, the degradation degree of object textures depends on the local illumination, while simply enhancing the local illumination cannot fully recover the attenuated textures. Based on this observation, we propose to condition the restoration of attenuated textures on the corrected local lighting in the shadow region. Specifically, We first design a shadow-aware decomposition network to estimate the illumination and reflectance layers of shadow regions explicitly. We then propose a novel bilateral correction network to recast the lighting of shadow regions in the illumination layer via a novel local lighting correction module, and to restore the textures conditioned on the corrected illumination layer via a novel illumination-guided texture restoration module. We further annotate pixel-wise shadow masks for the public SRD dataset, which originally contains only image pairs. Experiments on three benchmarks show that our method outperforms existing state-of-the-art shadow removal methods.
The robust generalization of deep learning models in the presence of inherent noise remains a significant challenge, especially when labels are subjective and noise is indiscernible in natural settings. This problem is particularly pronounced in many practical applications. In this paper, we address a special and important scenario of monitoring suicidal ideation, where time-series data, such as photoplethysmography (PPG), is susceptible to such noise. Current methods predominantly focus on image and text data or address artificially introduced noise, neglecting the complexities of natural noise in time-series analysis. To tackle this, we introduce a novel neural network model tailored for analyzing noisy physiological time-series data, named TNANet, which merges advanced encoding techniques with confidence learning, enhancing prediction accuracy. Another contribution of our work is the collection of a specialized dataset of PPG signals derived from real-world environments for suicidal ideation prediction. Employing this dataset, our TNANet achieves the prediction accuracy of 63.33% in a binary classification task, outperforming state-of-the-art models. Furthermore, comprehensive evaluations were conducted on three other well-known public datasets with artificially introduced noise to rigorously test the TNANet's capabilities. These tests consistently demonstrated TNANet's superior performance by achieving an accuracy improvement of more than 10% compared to baseline methods.
Real-world data are long-tailed, the lack of tail samples leads to a significant limitation in the generalization ability of the model. Although numerous approaches of class re-balancing perform well for moderate class imbalance problems, additional knowledge needs to be introduced to help the tail class recover the underlying true distribution when the observed distribution from a few tail samples does not represent its true distribution properly, thus allowing the model to learn valuable information outside the observed domain. In this work, we propose to leverage the geometric information of the feature distribution of the well-represented head class to guide the model to learn the underlying distribution of the tail class. Specifically, we first systematically define the geometry of the feature distribution and the similarity measures between the geometries, and discover four phenomena regarding the relationship between the geometries of different feature distributions. Then, based on four phenomena, feature uncertainty representation is proposed to perturb the tail features by utilizing the geometry of the head class feature distribution. It aims to make the perturbed features cover the underlying distribution of the tail class as much as possible, thus improving the model's generalization performance in the test domain. Finally, we design a three-stage training scheme enabling feature uncertainty modeling to be successfully applied. Experiments on CIFAR-10/100-LT, ImageNet-LT, and iNaturalist2018 show that our proposed approach outperforms other similar methods on most metrics. In addition, the experimental phenomena we discovered are able to provide new perspectives and theoretical foundations for subsequent studies.
Earth Mover's Distance (EMD) is an important similarity measure between two distributions, used in computer vision and many other application domains. However, its exact calculation is computationally and memory intensive, which hinders its scalability and applicability for large-scale problems. Various approximate EMD algorithms have been proposed to reduce computational costs, but they suffer lower accuracy and may require additional memory usage or manual parameter tuning. In this paper, we present a novel approach, NNS-EMD, to approximate EMD using Nearest Neighbor Search (NNS), in order to achieve high accuracy, low time complexity, and high memory efficiency. The NNS operation reduces the number of data points compared in each NNS iteration and offers opportunities for parallel processing. We further accelerate NNS-EMD via vectorization on GPU, which is especially beneficial for large datasets. We compare NNS-EMD with both the exact EMD and state-of-the-art approximate EMD algorithms on image classification and retrieval tasks. We also apply NNS-EMD to calculate transport mapping and realize color transfer between images. NNS-EMD can be 44x to 135x faster than the exact EMD implementation, and achieves superior accuracy, speedup, and memory efficiency over existing approximate EMD methods.
Pre-trained large language models (LLMs) have powerful capabilities for generating creative natural text. Evolutionary algorithms (EAs) can discover diverse solutions to complex real-world problems. Motivated by the common collective and directionality of text sequence generation and evolution, this paper illustrates the strong consistency of LLMs and EAs, which includes multiple one-to-one key characteristics: token embedding and genotype-phenotype mapping, position encoding and fitness shaping, position embedding and selection, attention and crossover, feed-forward neural network and mutation, model training and parameter update, and multi-task learning and multi-objective optimization. Based on this consistency perspective, existing coupling studies are analyzed, including evolutionary fine-tuning and LLM-enhanced EAs. Leveraging these insights, we outline a fundamental roadmap for future research in coupling LLMs and EAs, while highlighting key challenges along the way. The consistency not only reveals the evolution mechanism behind LLMs but also facilitates the development of evolved artificial agents that approach or surpass biological organisms.
Applying a randomized algorithm to a subset of a dataset rather than the entire dataset is a common approach to amplify its privacy guarantees in the released information. We propose a class of subsampling methods named MUltistage Sampling Technique (MUST) for privacy amplification (PA) in the context of differential privacy (DP). We conduct comprehensive analyses of the PA effects and utility for several 2-stage MUST procedures, namely, MUST.WO, MUST.OW, and MUST.WW that respectively represent sampling with (W), without (O), with (W) replacement from the original dataset in stage I and then sampling without (O), with (W), with (W) replacement in stage II from the subset drawn in stage I. We also provide the privacy composition analysis over repeated applications of MUST via the Fourier accountant algorithm. Our theoretical and empirical results suggest that MUST.OW and MUST.WW have stronger PA in $\epsilon$ than the common one-stage sampling procedures including Poisson sampling, sampling without replacement, and sampling with replacement, while the results on $\delta$ vary case by case. We also prove that MUST.WO is equivalent to sampling with replacement in PA. Furthermore, the final subset generated by a MUST procedure is a multiset that may contain multiple copies of the same data points due to sampling with replacement involved, which enhances the computational efficiency of algorithms that require complex function calculations on distinct data points (e.g., gradient descent). Our utility experiments show that MUST delivers similar or improved utility and stability in the privacy-preserving outputs compared to one-stage subsampling methods at similar privacy loss. MUST can be seamlessly integrated into stochastic optimization algorithms or procedures that involve parallel or simultaneous subsampling (e.g., bagging and subsampling bootstrap) when DP guarantees are necessary.
Visible-infrared person re-identification (VI-ReID) aims to search the same pedestrian of interest across visible and infrared modalities. Existing models mainly focus on compensating for modality-specific information to reduce modality variation. However, these methods often lead to a higher computational overhead and may introduce interfering information when generating the corresponding images or features. To address this issue, it is critical to leverage pedestrian-attentive features and learn modality-complete and -consistent representation. In this paper, a novel Transferring Modality-Aware Pedestrian Attentive Learning (TMPA) model is proposed, focusing on the pedestrian regions to efficiently compensate for missing modality-specific features. Specifically, we propose a region-based data augmentation module PedMix to enhance pedestrian region coherence by mixing the corresponding regions from different modalities. A lightweight hybrid compensation module, i.e., the Modality Feature Transfer (MFT), is devised to integrate cross attention and convolution networks to fully explore the discriminative modality-complete features with minimal computational overhead. Extensive experiments conducted on the benchmark SYSU-MM01 and RegDB datasets demonstrated the effectiveness of our proposed TMPA model.
In the context of the long-tail scenario, models exhibit a strong demand for high-quality data. Data-centric approaches aim to enhance both the quantity and quality of data to improve model performance. Among these approaches, information augmentation has been progressively introduced as a crucial category. It achieves a balance in model performance by augmenting the richness and quantity of samples in the tail classes. However, there is currently a lack of research into the underlying mechanisms explaining the effectiveness of information augmentation methods. Consequently, the utilization of information augmentation in long-tail recognition tasks relies heavily on empirical and intricate fine-tuning. This work makes two primary contributions. Firstly, we approach the problem from the perspectives of feature diversity and distribution shift, introducing the concept of Feature Diversity Gain (FDG) to elucidate why information augmentation is effective. We find that the performance of information augmentation can be explained by FDG, and its performance peaks when FDG achieves an appropriate balance. Experimental results demonstrate that by using FDG to select augmented data, we can further enhance model performance without the need for any modifications to the model's architecture. Thus, data-centric approaches hold significant potential in the field of long-tail recognition, beyond the development of new model structures. Furthermore, we systematically introduce the core components and fundamental tasks of a data-centric long-tail learning framework for the first time. These core components guide the implementation and deployment of the system, while the corresponding fundamental tasks refine and expand the research area.
In scenarios with long-tailed distributions, the model's ability to identify tail classes is limited due to the under-representation of tail samples. Class rebalancing, information augmentation, and other techniques have been proposed to facilitate models to learn the potential distribution of tail classes. The disadvantage is that these methods generally pursue models with balanced class accuracy on the data manifold, while ignoring the ability of the model to resist interference. By constructing noisy data manifold, we found that the robustness of models trained on unbalanced data has a long-tail phenomenon. That is, even if the class accuracy is balanced on the data domain, it still has bias on the noisy data manifold. However, existing methods cannot effectively mitigate the above phenomenon, which makes the model vulnerable in long-tailed scenarios. In this work, we propose an Orthogonal Uncertainty Representation (OUR) of feature embedding and an end-to-end training strategy to improve the long-tail phenomenon of model robustness. As a general enhancement tool, OUR has excellent compatibility with other methods and does not require additional data generation, ensuring fast and efficient training. Comprehensive evaluations on long-tailed datasets show that our method significantly improves the long-tail phenomenon of robustness, bringing consistent performance gains to other long-tailed learning methods.