The question "Can machines think?" and the Turing Test to assess whether machines could achieve human-level intelligence is one of the roots of AI. With the philosophical argument "I think, therefore I am", this paper challenge the idea of a "thinking machine" supported by current AIs since there is no sense of self in them. Current artificial intelligence is only seemingly intelligent information processing and does not truly understand or be subjectively aware of oneself and perceive the world with the self as human intelligence does. In this paper, we introduce a Brain-inspired and Self-based Artificial Intelligence (BriSe AI) paradigm. This BriSe AI paradigm is dedicated to coordinating various cognitive functions and learning strategies in a self-organized manner to build human-level AI models and robotic applications. Specifically, BriSe AI emphasizes the crucial role of the Self in shaping the future AI, rooted with a practical hierarchical Self framework, including Perception and Learning, Bodily Self, Autonomous Self, Social Self, and Conceptual Self. The hierarchical framework of the Self highlights self-based environment perception, self-bodily modeling, autonomous interaction with the environment, social interaction and collaboration with others, and even more abstract understanding of the Self. Furthermore, the positive mutual promotion and support among multiple levels of Self, as well as between Self and learning, enhance the BriSe AI's conscious understanding of information and flexible adaptation to complex environments, serving as a driving force propelling BriSe AI towards real Artificial General Intelligence.
The theory of sampling and recovery of bandlimited graph signals has been extensively studied. However, in many cases, the observation of a signal is quite coarse. For example, users only provide simple comments such as "like" or "dislike" for a product on an e-commerce platform. This is a particular scenario where only the sign information of a graph signal can be measured. In this paper, we are interested in how to sample based on sign information in an online manner, by which the direction of the original graph signal can be estimated. The online signed sampling problem of a graph signal can be formulated as a Markov decision process in a finite horizon. Unfortunately, it is intractable for large size graphs. We propose a low-complexity greedy signed sampling algorithm (GSS) as well as a stopping criterion. Meanwhile, we prove that the objective function is adaptive monotonic and adaptive submodular, so that the performance is close enough to the global optimum with a lower bound. Finally, we demonstrate the effectiveness of the GSS algorithm by both synthesis and realworld data.
Object tracking based on the fusion of visible and thermal im-ages, known as RGB-T tracking, has gained increasing atten-tion from researchers in recent years. How to achieve a more comprehensive fusion of information from the two modalities with fewer computational costs has been a problem that re-searchers have been exploring. Recently, with the rise of prompt learning in computer vision, we can better transfer knowledge from visual large models to downstream tasks. Considering the strong complementarity between visible and thermal modalities, we propose a tracking architecture based on mutual prompt learning between the two modalities. We also design a lightweight prompter that incorporates attention mechanisms in two dimensions to transfer information from one modality to the other with lower computational costs, embedding it into each layer of the backbone. Extensive ex-periments have demonstrated that our proposed tracking ar-chitecture is effective and efficient, achieving state-of-the-art performance while maintaining high running speeds.
Few studies have worked on the effects of tonal coarticulation and prosodic positions on the low rising tone in Xiamen Dialect. This study addressed such an issue. To do so, a new method, the Tonal Contour Analysis in Tonal Triangle, was proposed to measure the subtle curvature of the tonal contour. Findings are as follows: (1) The low rising tone in Xiamen Dialect has a tendency towards the falling-rising tone, which is significantly affected by the tonal coarticulation and prosodic positions. (2) The low rising tone presents as a falling-rising tone when preceded by a tone with a high offset, and as a low rising tone when preceded by a tone that ends up low. (3) The curvature of the low rising tone is greatest in the sentence-initial position, and is positively correlated to its own duration.
When sampling multiple signals, the correlation between the signals can be exploited to reduce the overall number of samples. In this paper, we study the sampling theory of multiple correlated signals, using correlation to sample them at the lowest sampling rate. Based on the correlation between signal sources, we model multiple continuous-time signals as continuous time-vertex graph signals. The graph signals are projected onto orthogonal bases to remove spatial correlation and reduce dimensions by graph Fourier transform. When the bandwidths of the original signals and the reduced dimension signals are given, we prove the minimum sampling rate required for recovery of the original signals, and propose a feasible sampling scheme.
Under the semi-supervised framework, we propose an end-to-end memory-based segmentation network (MemSeg) to detect surface defects on industrial products. Considering the small intra-class variance of products in the same production line, from the perspective of differences and commonalities, MemSeg introduces artificially simulated abnormal samples and memory samples to assist the learning of the network. In the training phase, MemSeg explicitly learns the potential differences between normal and simulated abnormal images to obtain a robust classification hyperplane. At the same time, inspired by the mechanism of human memory, MemSeg uses a memory pool to store the general patterns of normal samples. By comparing the similarities and differences between input samples and memory samples in the memory pool to give effective guesses about abnormal regions; In the inference phase, MemSeg directly determines the abnormal regions of the input image in an end-to-end manner. Through experimental validation, MemSeg achieves the state-of-the-art (SOTA) performance on MVTec AD datasets with AUC scores of 99.56% and 98.84% at the image-level and pixel-level, respectively. In addition, MemSeg also has a significant advantage in inference speed benefiting from the end-to-end and straightforward network structure, which better meets the real-time requirement in industrial scenarios.
Sampling and interpolation have been extensively studied, in order to reconstruct or estimate the entire graph signal from the signal values on a subset of vertexes, of which most achievements are about continuous signals. While in a lot of signal processing tasks, signals are not fully observed, and only the signs of signals are available, for example a rating system may only provide several simple options. In this paper, the reconstruction of band-limited graph signals based on sign sampling is discussed and a greedy sampling strategy is proposed. The simulation experiments are presented, and the greedy sampling algorithm is compared with random sampling algorithm, which verify the validity of the proposed approach.
Capturing complex high-order interactions among data is an important task in many scenarios. A common way to model high-order interactions is to use hypergraphs whose topology can be mathematically represented by tensors. Existing methods use a fixed-order tensor to describe the topology of the whole hypergraph, which ignores the divergence of different-order interactions. In this work, we take this divergence into consideration, and propose a multi-order hypergraph Laplacian and the corresponding total variation. Taking this total variation as a regularization term, we can utilize the topology information contained by it to smooth the hypergraph signal. This can help distinguish different-order interactions and represent high-order interactions accurately.
When sailing at sea, the smart ship will inevitably produce swaying motion due to the action of wind, wave and current, which makes the image collected by the visual sensor appear motion blur. This will have an adverse effect on the object detection algorithm based on the vision sensor, thereby affect the navigation safety of the smart ship. In order to remove the motion blur in the images during the navigation of the smart ship, we propose SharpGAN, a new image deblurring method based on the generative adversarial network. First of all, the Receptive Field Block Net (RFBNet) is introduced to the deblurring network to strengthen the network's ability to extract the features of blurred image. Secondly, we propose a feature loss that combines different levels of image features to guide the network to perform higher-quality deblurring and improve the feature similarity between the restored images and the sharp image. Finally, we propose to use the lightweight RFB-s module to improve the real-time performance of deblurring network. Compared with the existing deblurring methods on large-scale real sea image datasets and large-scale deblurring datasets, the proposed method not only has better deblurring performance in visual perception and quantitative criteria, but also has higher deblurring efficiency.