The success of self-supervised contrastive learning hinges on identifying positive data pairs that, when pushed together in embedding space, encode useful information for subsequent downstream tasks. However, in time-series, this is challenging because creating positive pairs via augmentations may break the original semantic meaning. We hypothesize that if we can retrieve information from one subsequence to successfully reconstruct another subsequence, then they should form a positive pair. Harnessing this intuition, we introduce our novel approach: REtrieval-BAsed Reconstruction (REBAR) contrastive learning. First, we utilize a convolutional cross-attention architecture to calculate the REBAR error between two different time-series. Then, through validation experiments, we show that the REBAR error is a predictor of mutual class membership, justifying its usage as a positive/negative labeler. Finally, once integrated into a contrastive learning framework, our REBAR method can learn an embedding that achieves state-of-the-art performance on downstream tasks across various modalities.
Visual emergence is the phenomenon in which the visual system obtains a holistic perception after grouping and reorganizing local signals. The picture Dalmatian dog is known for its use in explaining visual emergence. This type of image, which consists of a set of discrete black speckles (speckles), is called an emerging image. Not everyone can find the dog in Dalmatian dog, and among those who can, the time spent varies greatly. Although Gestalt theory summarizes perceptual organization into several principles, it remains ambiguous how these principles affect the perception of emerging images. This study, therefore, designed three psychological experiments to explore the factors that influence the perception of emerging images. In the first, we found that the density of speckles in the local area and the arrangements of some key speckles played a key role in the perception of an emerging case. We set parameters in the algorithm to characterize these two factors. We then automatically generated diversified emerging-test images (ETIs) through the algorithm and verified their effectiveness in two subsequent experiments.
Artificial objects usually have very stable shape features, which are stable, persistent properties in geometry. They can provide evidence for object recognition. Shape features are more stable and more distinguishing than appearance features, color features, grayscale features, or gradient features. The difficulty with object recognition based on shape features is that objects may differ in color, lighting, size, position, pose, and background interference, and it is not currently possible to predict all possible conditions. The variety of objects and conditions renders object recognition based on geometric features very challenging. This paper provides a method based on shape templates, which involves the selection, collection, and combination discrimination of geometric evidence of the edge segments of images, to find out the target object accurately from background, and it is able to identify the semantic attributes of each line segment of the target object. In essence, the method involves solving a global optimal combinatorial optimization problem. Although the complexity of the global optimal combinatorial optimization problem seems to be very high, there is no need to define the complex feature vector and no need for any expensive training process. It has very good generalization ability and environmental adaptability, and more solid basis for cognitive psychology than other methods. The process of collecting geometric evidence, which is simple and universal, shows considerable prospects for practical use. The experimental results prove that the method has great advantages in response to changes in the environment, invariant recognition, pinpointing the geometry of objects, search efficiency, and efficient calculation. This attempt contributes to understanding of some types of universal processing during the process of object recognition.
Current view planning (VP) systems usually adopt an iterative pipeline with next-best-view (NBV) methods that can autonomously perform 3D reconstruction of unknown objects. However, they are slowed down by local path planning, which is improved by our previously proposed set-covering-based network SCVP using one-shot view planning and global path planning. In this work, we propose a combined pipeline that selects a few NBVs before activating the network to improve model completeness. However, this pipeline will result in more views than expected because the SCVP has not been trained from multiview scenarios. To reduce the overall number of views and paths required, we propose a multiview-activated architecture MA-SCVP and an efficient dataset sampling method for view planning based on a long-tail distribution. Ablation studies confirm the optimal network architecture, the sampling method and the number of samples, the NBV method and the number of NBVs in our combined pipeline. Comparative experiments support the claim that our system achieves faster and more complete reconstruction than state-of-the-art systems. For the reference of the community, we make the source codes public.
Adversarial attacks on thermal infrared imaging expose the risk of related applications. Estimating the security of these systems is essential for safely deploying them in the real world. In many cases, realizing the attacks in the physical space requires elaborate special perturbations. These solutions are often \emph{impractical} and \emph{attention-grabbing}. To address the need for a physically practical and stealthy adversarial attack, we introduce \textsc{HotCold} Block, a novel physical attack for infrared detectors that hide persons utilizing the wearable Warming Paste and Cooling Paste. By attaching these readily available temperature-controlled materials to the body, \textsc{HotCold} Block evades human eyes efficiently. Moreover, unlike existing methods that build adversarial patches with complex texture and structure features, \textsc{HotCold} Block utilizes an SSP-oriented adversarial optimization algorithm that enables attacks with pure color blocks and explores the influence of size, shape, and position on attack performance. Extensive experimental results in both digital and physical environments demonstrate the performance of our proposed \textsc{HotCold} Block. \emph{Code is available: \textcolor{magenta}{https://github.com/weihui1308/HOTCOLDBlock}}.
Although Deep Neural Networks (DNNs) have achieved impressive results in computer vision, their exposed vulnerability to adversarial attacks remains a serious concern. A series of works has shown that by adding elaborate perturbations to images, DNNs could have catastrophic degradation in performance metrics. And this phenomenon does not only exist in the digital space but also in the physical space. Therefore, estimating the security of these DNNs-based systems is critical for safely deploying them in the real world, especially for security-critical applications, e.g., autonomous cars, video surveillance, and medical diagnosis. In this paper, we focus on physical adversarial attacks and provide a comprehensive survey of over 150 existing papers. We first clarify the concept of the physical adversarial attack and analyze its characteristics. Then, we define the adversarial medium, essential to perform attacks in the physical world. Next, we present the physical adversarial attack methods in task order: classification, detection, and re-identification, and introduce their performance in solving the trilemma: effectiveness, stealthiness, and robustness. In the end, we discuss the current challenges and potential future directions.
Trajectory prediction is a fundamental and challenging task for numerous applications, such as autonomous driving and intelligent robots. Currently, most of existing work treat the pedestrian trajectory as a series of fixed two-dimensional coordinates. However, in real scenarios, the trajectory often exhibits randomness, and has its own probability distribution. Inspired by this observed fact, also considering other movement characteristics of pedestrians, we propose one simple and intuitive movement description, probability trajectory, which maps the coordinate points of pedestrian trajectory into two-dimensional Gaussian distribution in images. Based on this unique description, we develop one novel trajectory prediction method, called social probability. The method combines the new probability trajectory and powerful convolution recurrent neural networks together. Both the input and output of our method are probability trajectories, which provide the recurrent neural network with sufficient spatial and random information of moving pedestrians. And the social probability extracts spatio-temporal features directly on the new movement description to generate robust and accurate predicted results. The experiments on public benchmark datasets show the effectiveness of the proposed method.
The neural mechanism of memory has a very close relation with the problem of representation in artificial intelligence. In this paper a computational model was proposed to simulate the network of neurons in brain and how they process information. The model refers to morphological and electrophysiological characteristics of neural information processing, and is based on the assumption that neurons encode their firing sequence. The network structure, functions for neural encoding at different stages, the representation of stimuli in memory, and an algorithm to form a memory were presented. It also analyzed the stability and recall rate for learning and the capacity of memory. Because neural dynamic processes, one succeeding another, achieve a neuron-level and coherent form by which information is represented and processed, it may facilitate examination of various branches of Artificial Intelligence, such as inference, problem solving, pattern recognition, natural language processing and learning. The processes of cognitive manipulation occurring in intelligent behavior have a consistent representation while all being modeled from the perspective of computational neuroscience. Thus, the dynamics of neurons make it possible to explain the inner mechanisms of different intelligent behaviors by a unified model of cognitive architecture at a micro-level.
A conceptual system with rich connotation is key to improving the performance of knowledge-based artificial intelligence systems. While a conceptual system, which has abundant concepts and rich semantic relationships, and is developable, evolvable, and adaptable to multi-task environments, its actual construction is not only one of the major challenges of knowledge engineering, but also the fundamental goal of research on knowledge and conceptualization. Finding a new method to represent concepts and construct a conceptual system will therefore greatly improve the performance of many intelligent systems. Fortunately the core of human cognition is a system with relatively complete concepts and a mechanism that ensures the establishment and development of the system. The human conceptual system can not be achieved immediately, but rather must develop gradually. Developmental psychology carefully observes the process of concept acquisition in humans at the behavioral level, and along with cognitive psychology has proposed some rough explanations of those observations. However, due to the lack of research in aspects such as representation, systematic models, algorithm details and realization, many of the results of developmental psychology have not been applied directly to the building of artificial conceptual systems. For example, Karmiloff-Smith's Representation Redescription (RR) supposition reflects a concept-acquisition process that re-describes a lower level representation of a concept to a higher one. This paper is inspired by this developmental psychology viewpoint. We use an object-oriented approach to re-explain and materialize RR supposition from the formal semantic perspective, because the OO paradigm is a natural way to describe the outside world, and it also has strict grammar regulations.