Motivated by that DETR-based approaches have established new records on COCO detection and segmentation benchmarks, many recent endeavors show increasing interest in how to further improve DETR-based approaches by pre-training the Transformer in a self-supervised manner while keeping the backbone frozen. Some studies already claimed significant improvements in accuracy. In this paper, we take a closer look at their experimental methodology and check if their approaches are still effective on the very recent state-of-the-art such as $\mathcal{H}$-Deformable-DETR. We conduct thorough experiments on COCO object detection tasks to study the influence of the choice of pre-training datasets, localization, and classification target generation schemes. Unfortunately, we find the previous representative self-supervised approach such as DETReg, fails to boost the performance of the strong DETR-based approaches on full data regimes. We further analyze the reasons and find that simply combining a more accurate box predictor and Objects$365$ benchmark can significantly improve the results in follow-up experiments. We demonstrate the effectiveness of our approach by achieving strong object detection results of AP=$59.3\%$ on COCO val set, which surpasses $\mathcal{H}$-Deformable-DETR + Swin-L by +$1.4\%$. Last, we generate a series of synthetic pre-training datasets by combining the very recent image-to-text captioning models (LLaVA) and text-to-image generative models (SDXL). Notably, pre-training on these synthetic datasets leads to notable improvements in object detection performance. Looking ahead, we anticipate substantial advantages through the future expansion of the synthetic pre-training dataset.
Training a robust policy is critical for policy deployment in real-world systems or dealing with unknown dynamics mismatch in different dynamic systems. Domain Randomization~(DR) is a simple and elegant approach that trains a conservative policy to counter different dynamic systems without expert knowledge about the target system parameters. However, existing works reveal that the policy trained through DR tends to be over-conservative and performs poorly in target domains. Our key insight is that dynamic systems with different parameters provide different levels of difficulty for the policy, and the difficulty of behaving well in a system is constantly changing due to the evolution of the policy. If we can actively sample the systems with proper difficulty for the policy on the fly, it will stabilize the training process and prevent the policy from becoming over-conservative or over-optimistic. To operationalize this idea, we introduce Active Dynamics Preference~(ADP), which quantifies the informativeness and density of sampled system parameters. ADP actively selects system parameters with high informativeness and low density. We validate our approach in four robotic locomotion tasks with various discrepancies between the training and testing environments. Extensive results demonstrate that our approach has superior robustness for system inconsistency compared to several baselines.
While Reinforcement Learning can achieve impressive results for complex tasks, the learned policies are generally prone to fail in downstream tasks with even minor model mismatch or unexpected perturbations. Recent works have demonstrated that a policy population with diverse behavior characteristics can generalize to downstream environments with various discrepancies. However, such policies might result in catastrophic damage during the deployment in practical scenarios like real-world systems due to the unrestricted behaviors of trained policies. Furthermore, training diverse policies without regulation of the behavior can result in inadequate feasible policies for extrapolating to a wide range of test conditions with dynamics shifts. In this work, we aim to train diverse policies under the regularization of the behavior patterns. We motivate our paradigm by observing the inverse dynamics in the environment with partial state information and propose Diversity in Regulation(DiR) training diverse policies with regulated behaviors to discover desired patterns that benefit the generalization. Considerable empirical results on various variations of different environments indicate that our method attains improvements over other diversity-driven counterparts.
A key challenge in Machine Learning is class imbalance, where the sample size of some classes (majority classes) are much higher than that of the other classes (minority classes). If we were to train a classifier directly on imbalanced data, it is more likely for the classifier to predict a new sample as one of the majority classes. In the extreme case, the classifier could completely ignore the minority classes. This could have serious sociological implications in healthcare, as the minority classes are usually the disease classes (e.g., death or positive clinical test result). In this paper, we introduce a software that uses Generative Adversarial Networks to oversample the minority classes so as to improve downstream classification. To the best of our knowledge, this is the first tool that allows multi-class classification (where the target can have an arbitrary number of classes). The code of the tool is publicly available in our github repository (https://github.com/yuxiaohuang/research/tree/master/gwu/working/cigan/code).
Evolutionary Algorithms (EAs) and Deep Reinforcement Learning (DRL) have recently been combined to integrate the advantages of the two solutions for better policy learning. However, in existing hybrid methods, EA is used to directly train the policy network, which will lead to sample inefficiency and unpredictable impact on the policy performance. To better integrate these two approaches and avoid the drawbacks caused by the introduction of EA, we devote ourselves to devising a more efficient and reasonable method of combining EA and DRL. In this paper, we propose Evolutionary Action Selection-Twin Delayed Deep Deterministic Policy Gradient (EAS-TD3), a novel combination of EA and DRL. In EAS, we focus on optimizing the action chosen by the policy network and attempt to obtain high-quality actions to guide policy learning through an evolutionary algorithm. We conduct several experiments on challenging continuous control tasks. The result shows that EAS-TD3 shows superior performance over other state-of-art methods.
Salient object detection in complex scenes and environments is a challenging research topic. Most works focus on RGB-based salient object detection, which limits its performance of real-life applications when confronted with adverse conditions such as dark environments and complex backgrounds. Taking advantage of RGB and thermal infrared images becomes a new research direction for detecting salient object in complex scenes recently, as thermal infrared spectrum imaging provides the complementary information and has been applied to many computer vision tasks. However, current research for RGBT salient object detection is limited by the lack of a large-scale dataset and comprehensive benchmark. This work contributes such a RGBT image dataset named VT5000, including 5000 spatially aligned RGBT image pairs with ground truth annotations. VT5000 has 11 challenges collected in different scenes and environments for exploring the robustness of algorithms. With this dataset, we propose a powerful baseline approach, which extracts multi-level features within each modality and aggregates these features of all modalities with the attention mechanism, for accurate RGBT salient object detection. Extensive experiments show that the proposed baseline approach outperforms the state-of-the-art methods on VT5000 dataset and other two public datasets. In addition, we carry out a comprehensive analysis of different algorithms of RGBT salient object detection on VT5000 dataset, and then make several valuable conclusions and provide some potential research directions for RGBT salient object detection.
Fully Convolutional Neural Network (FCN) has been widely applied to salient object detection recently by virtue of high-level semantic feature extraction, but existing FCN based methods still suffer from continuous striding and pooling operations leading to loss of spatial structure and blurred edges. To maintain the clear edge structure of salient objects, we propose a novel Edge-guided Non-local FCN (ENFNet) to perform edge guided feature learning for accurate salient object detection. In a specific, we extract hierarchical global and local information in FCN to incorporate non-local features for effective feature representations. To preserve good boundaries of salient objects, we propose a guidance block to embed edge prior knowledge into hierarchical feature maps. The guidance block not only performs feature-wise manipulation but also spatial-wise transformation for effective edge embeddings. Our model is trained on the MSRA-B dataset and tested on five popular benchmark datasets. Comparing with the state-of-the-art methods, the proposed method achieves the best performance on all datasets.
Image saliency detection is an active research topic in the community of computer vision and multimedia. Fusing complementary RGB and thermal infrared data has been proven to be effective for image saliency detection. In this paper, we propose an effective approach for RGB-T image saliency detection. Our approach relies on a novel collaborative graph learning algorithm. In particular, we take superpixels as graph nodes, and collaboratively use hierarchical deep features to jointly learn graph affinity and node saliency in a unified optimization framework. Moreover, we contribute a more challenging dataset for the purpose of RGB-T image saliency detection, which contains 1000 spatially aligned RGB-T image pairs and their ground truth annotations. Extensive experiments on the public dataset and the newly created dataset suggest that the proposed approach performs favorably against the state-of-the-art RGB-T saliency detection methods.