Domain adaptation for object detection typically entails transferring knowledge from one visible domain to another visible domain. However, there are limited studies on adapting from the visible to the thermal domain, because the domain gap between the visible and thermal domains is much larger than expected, and traditional domain adaptation can not successfully facilitate learning in this situation. To overcome this challenge, we propose a Distinctive Dual-Domain Teacher (D3T) framework that employs distinct training paradigms for each domain. Specifically, we segregate the source and target training sets for building dual-teachers and successively deploy exponential moving average to the student model to individual teachers of each domain. The framework further incorporates a zigzag learning method between dual teachers, facilitating a gradual transition from the visible to thermal domains during training. We validate the superiority of our method through newly designed experimental protocols with well-known thermal datasets, i.e., FLIR and KAIST. Source code is available at https://github.com/EdwardDo69/D3T .
Object detectors are typically trained once and for all on a fixed set of classes. However, this closed-world assumption is unrealistic in practice, as new classes will inevitably emerge after the detector is deployed in the wild. In this work, we look at ways to extend a detector trained for a set of base classes so it can i) spot the presence of novel classes, and ii) automatically enrich its repertoire to be able to detect those newly discovered classes together with the base ones. We propose PANDAS, a method for novel class discovery and detection. It discovers clusters representing novel classes from unlabeled data, and represents old and new classes with prototypes. During inference, a distance-based classifier uses these prototypes to assign a label to each detected object instance. The simplicity of our method makes it widely applicable. We experimentally demonstrate the effectiveness of PANDAS on the VOC 2012 and COCO-to-LVIS benchmarks. It performs favorably against the state of the art for this task while being computationally more affordable.
Deepfakes have rapidly emerged as a profound and serious threat to society, primarily due to their ease of creation and dissemination. This situation has triggered an accelerated development of deepfake detection technologies. However, many existing detectors rely heavily on lab-generated datasets for validation, which may not effectively prepare them for novel, emerging, and real-world deepfake techniques. In this paper, we conduct an extensive and comprehensive review and analysis of the latest state-of-the-art deepfake detectors, evaluating them against several critical criteria. These criteria facilitate the categorization of these detectors into 4 high-level groups and 13 fine-grained sub-groups, all aligned with a unified standard conceptual framework. This classification and framework offer deep and practical insights into the factors that affect detector efficacy. We assess the generalizability of 16 leading detectors across various standard attack scenarios, including black-box, white-box, and gray-box settings. Our systematized analysis and experimentation lay the groundwork for a deeper understanding of deepfake detectors and their generalizability, paving the way for future research focused on creating detectors adept at countering various attack scenarios. Additionally, this work offers insights for developing more proactive defenses against deepfakes.
Recent approaches for semantic correspondence have focused on obtaining high-quality correspondences using a complicated network, refining the ambiguous or noisy matching points. Despite their performance improvements, they remain constrained by the limited training pairs due to costly point-level annotations. This paper proposes a simple yet effective method that performs training with unlabeled pairs to complement both limited image pairs and sparse point pairs, requiring neither extra labeled keypoints nor trainable modules. We fundamentally extend the data quantity and variety by augmenting new unannotated pairs not primitively provided as training pairs in benchmarks. Using a simple teacher-student framework, we offer reliable pseudo correspondences to the student network via machine supervision. Finally, the performance of our network is steadily improved by the proposed iterative training, putting back the student as a teacher to generate refined labels and train a new student repeatedly. Our models outperform the milestone baselines, including state-of-the-art methods on semantic correspondence benchmarks.
Lane detection is a vital task for vehicles to navigate and localize their position on the road. To ensure reliable results, lane detection algorithms must have robust generalization performance in various road environments. However, despite the significant performance improvement of deep learning-based lane detection algorithms, their generalization performance in response to changes in road environments still falls short of expectations. In this paper, we present a novel framework for single-source domain generalization (SSDG) in lane detection. By decomposing data into lane structures and surroundings, we enhance diversity using High-Definition (HD) maps and generative models. Rather than expanding data volume, we strategically select a core subset of data, maximizing diversity and optimizing performance. Our extensive experiments demonstrate that our framework enhances the generalization performance of lane detection, comparable to the domain adaptation-based method.
Understanding the interaction between multiple agents is crucial for realistic vehicle trajectory prediction. Existing methods have attempted to infer the interaction from the observed past trajectories of agents using pooling, attention, or graph-based methods, which rely on a deterministic approach. However, these methods can fail under complex road structures, as they cannot predict various interactions that may occur in the future. In this paper, we propose a novel approach that uses lane information to predict a stochastic future relationship among agents. To obtain a coarse future motion of agents, our method first predicts the probability of lane-level waypoint occupancy of vehicles. We then utilize the temporal probability of passing adjacent lanes for each agent pair, assuming that agents passing adjacent lanes will highly interact. We also model the interaction using a probabilistic distribution, which allows for multiple possible future interactions. The distribution is learned from the posterior distribution of interaction obtained from ground truth future trajectories. We validate our method on popular trajectory prediction datasets: nuScenes and Argoverse. The results show that the proposed method brings remarkable performance gain in prediction accuracy, and achieves state-of-the-art performance in long-term prediction benchmark dataset.
This paper proposes a graph-based approach to representing spatio-temporal trajectory data that allows an effective visualization and characterization of city-wide traffic dynamics. With the advance of sensor, mobile, and Internet of Things (IoT) technologies, vehicle and passenger trajectories are being increasingly collected on a massive scale and are becoming a critical source of insight into traffic pattern and traveller behaviour. To leverage such trajectory data to better understand traffic dynamics in a large-scale urban network, this study develops a trajectory-based network traffic analysis method that converts individual trajectory data into a sequence of graphs that evolve over time (known as dynamic graphs or time-evolving graphs) and analyses network-wide traffic patterns in terms of a compact and informative graph-representation of aggregated traffic flows. First, we partition the entire network into a set of cells based on the spatial distribution of data points in individual trajectories, where the cells represent spatial regions between which aggregated traffic flows can be measured. Next, dynamic flows of moving objects are represented as a time-evolving graph, where regions are graph vertices and flows between them are treated as weighted directed edges. Given a fixed set of vertices, edges can be inserted or removed at every time step depending on the presence of traffic flows between two regions at a given time window. Once a dynamic graph is built, we apply graph mining algorithms to detect change-points in time, which represent time points where the graph exhibits significant changes in its overall structure and, thus, correspond to change-points in city-wide mobility pattern throughout the day (e.g., global transition points between peak and off-peak periods).
Census and Household Travel Survey datasets are regularly collected from households and individuals and provide information on their daily travel behavior with demographic and economic characteristics. These datasets have important applications ranging from travel demand estimation to agent-based modeling. However, they often represent a limited sample of the population due to privacy concerns or are given aggregated. Synthetic data augmentation is a promising avenue in addressing these challenges. In this paper, we propose a framework to generate a synthetic population that includes both socioeconomic features (e.g., age, sex, industry) and trip chains (i.e., activity locations). Our model is tested and compared with other recently proposed models on multiple assessment metrics.
We present a novel semi-supervised learning framework that intelligently leverages the consistency regularization between the model's predictions from two strongly-augmented views of an image, weighted by a confidence of pseudo-label, dubbed ConMatch. While the latest semi-supervised learning methods use weakly- and strongly-augmented views of an image to define a directional consistency loss, how to define such direction for the consistency regularization between two strongly-augmented views remains unexplored. To account for this, we present novel confidence measures for pseudo-labels from strongly-augmented views by means of weakly-augmented view as an anchor in non-parametric and parametric approaches. Especially, in parametric approach, we present, for the first time, to learn the confidence of pseudo-label within the networks, which is learned with backbone model in an end-to-end manner. In addition, we also present a stage-wise training to boost the convergence of training. When incorporated in existing semi-supervised learners, ConMatch consistently boosts the performance. We conduct experiments to demonstrate the effectiveness of our ConMatch over the latest methods and provide extensive ablation studies. Code has been made publicly available at https://github.com/JiwonCocoder/ConMatch.
This paper addresses the problem of estimating link flows in a road network by combining limited traffic volume and vehicle trajectory data. While traffic volume data from loop detectors have been the common data source for link flow estimation, the detectors only cover a subset of links. Vehicle trajectory data collected from vehicle tracking sensors are also incorporated these days. However, trajectory data are often sparse in that the observed trajectories only represent a small subset of the whole population, where the exact sampling rate is unknown and may vary over space and time. This study proposes a novel generative modelling framework, where we formulate the link-to-link movements of a vehicle as a sequential decision-making problem using the Markov Decision Process framework and train an agent to make sequential decisions to generate realistic synthetic vehicle trajectories. We use Reinforcement Learning (RL)-based methods to find the best behaviour of the agent, based on which synthetic population vehicle trajectories can be generated to estimate link flows across the whole network. To ensure the generated population vehicle trajectories are consistent with the observed traffic volume and trajectory data, two methods based on Inverse Reinforcement Learning and Constrained Reinforcement Learning are proposed. The proposed generative modelling framework solved by either of these RL-based methods is validated by solving the link flow estimation problem in a real road network. Additionally, we perform comprehensive experiments to compare the performance with two existing methods. The results show that the proposed framework has higher estimation accuracy and robustness under realistic scenarios where certain behavioural assumptions about drivers are not met or the network coverage and penetration rate of trajectory data are low.