Zero-Shot Learning (ZSL) focuses on classifying samples of unseen classes with only their side semantic information presented during training. It cannot handle real-life, open-world scenarios where there are test samples of unknown classes for which neither samples (e.g., images) nor their side semantic information is known during training. Open-Set Recognition (OSR) is dedicated to addressing the unknown class issue, but existing OSR methods are not designed to model the semantic information of the unseen classes. To tackle this combined ZSL and OSR problem, we consider the case of "Zero-Shot Open-Set Recognition" (ZS-OSR), where a model is trained under the ZSL setting but it is required to accurately classify samples from the unseen classes while being able to reject samples from the unknown classes during inference. We perform large experiments on combining existing state-of-the-art ZSL and OSR models for the ZS-OSR task on four widely used datasets adapted from the ZSL task, and reveal that ZS-OSR is a non-trivial task as the simply combined solutions perform badly in distinguishing the unseen-class and unknown-class samples. We further introduce a novel approach specifically designed for ZS-OSR, in which our model learns to generate adversarial semantic embeddings of the unknown classes to train an unknowns-informed ZS-OSR classifier. Extensive empirical results show that our method 1) substantially outperforms the combined solutions in detecting the unknown classes while retaining the classification accuracy on the unseen classes and 2) achieves similar superiority under generalized ZS-OSR settings.
Tracking multiple moving objects of interest (OOI) with multiple robot systems (MRS) has been addressed by active sensing that maintains a shared belief of OOIs and plans the motion of robots to maximize the information quality. Mobility of robots enables the behavior of pursuing better visibility, which is constrained by sensor field of view (FoV) and occlusion objects. We first extend prior work to detect, maintain and share occlusion information explicitly, allowing us to generate occlusion-aware planning even if a priori semantic occlusion information is unavailable. The efficacy of active sensing approaches is often evaluated according to estimation error and information gain metrics. However, these metrics do not directly explain the level of cooperative behavior engendered by the active sensing algorithms. Next, we extract different emergent cooperative behaviors that stem from the same underlying algorithms but manifest differently under differing scenarios. In particular, we highlight and demonstrate three emergent behavior patterns in active sensing MRS: (i) Change of tracking responsibility between agents when tracking trajectories with divergent directions or due to a re-allocation of the resource among heterogeneous agents; (ii) Awareness of occlusions to a trajectory and temporal leave-and-return of the sensing agent; (iii) Sharing of local occlusion objects in MRS that subsequently improves the awareness of occlusion.
In target tracking with mobile multi-sensor systems, sensor deployment impacts the observation capabilities and the resulting state estimation quality. Based on a partially observable Markov decision process (POMDP) formulation comprised of the observable sensor dynamics, unobservable target states, and accompanying observation laws, we present a distributed information-driven solution approach to the multi-agent target tracking problem, namely, sequential multi-agent nominal belief-state optimization (SMA-NBO). SMA-NBO seeks to minimize the expected tracking error via receding horizon control including a heuristic expected cost-to-go (HECTG). SMA-NBO incorporates a computationally efficient approximation of the target belief-state over the horizon. The agent-by-agent decision-making is capable of leveraging on-board (edge) compute for selecting (sub-optimal) target-tracking maneuvers exhibiting non-myopic cooperative fleet behavior. The optimization problem explicitly incorporates semantic information defining target occlusions from a world model. To illustrate the efficacy of our approach, a random occlusion forest environment is simulated. SMA-NBO is compared to other baseline approaches. The simulation results show SMA-NBO 1) maintains tracking performance and reduces the computational cost by replacing the calculation of the expected target trajectory with a single sample trajectory based on maximum a posteriori estimation; 2) generates cooperative fleet decision by sequentially optimizing single-agent policy with efficient usage of other agents' policy of intent; 3) aptly incorporates the multiple weighted trace penalty (MWTP) HECTG, which improves tracking performance with a computationally efficient heuristic.
There has been emerging interest to use transductive learning for adversarial robustness (Goldwasser et al., NeurIPS 2020; Wu et al., ICML 2020). Compared to traditional "test-time" defenses, these defense mechanisms "dynamically retrain" the model based on test time input via transductive learning; and theoretically, attacking these defenses boils down to bilevel optimization, which seems to raise the difficulty for adaptive attacks. In this paper, we first formalize and analyze modeling aspects of transductive robustness. Then, we propose the principle of attacking model space for solving bilevel attack objectives, and present an instantiation of the principle which breaks previous transductive defenses. These attacks thus point to significant difficulties in the use of transductive learning to improve adversarial robustness. To this end, we present new theoretical and empirical evidence in support of the utility of transductive learning.
Significant attention is being paid to multi-person pose estimation methods recently, as there has been rapid progress in the field owing to convolutional neural networks. Especially, recent method which exploits part confidence maps and Part Affinity Fields (PAFs) has achieved accurate real-time prediction of multi-person keypoints. However, human annotated labels are sometimes inappropriate for learning models. For example, if there is a limb that extends outside an image, a keypoint for the limb may not have annotations because it exists outside of the image, and thus the labels for the limb can not be generated. If a model is trained with data including such missing labels, the output of the model for the location, even though it is correct, is penalized as a false positive, which is likely to cause negative effects on the performance of the model. In this paper, we point out the existence of some patterns of inappropriate labels, and propose a novel method for correcting such labels with a teacher model trained on such incomplete data. Experiments on the COCO dataset show that training with the corrected labels improves the performance of the model and also speeds up training.
We propose Progressive Structure-conditional Generative Adversarial Networks (PSGAN), a new framework that can generate full-body and high-resolution character images based on structural information. Recent progress in generative adversarial networks with progressive training has made it possible to generate high-resolution images. However, existing approaches have limitations in achieving both high image quality and structural consistency at the same time. Our method tackles the limitations by progressively increasing the resolution of both generated images and structural conditions during training. In this paper, we empirically demonstrate the effectiveness of this method by showing the comparison with existing approaches and video generation results of diverse anime characters at 1024x1024 based on target pose sequences. We also create a novel dataset containing full-body 1024x1024 high-resolution images and exact 2D pose keypoints using Unity 3D Avatar models.