Integrated sensing and communication (ISAC) has been envisioned as a promising technique to alleviate the spectrum congestion problem. Inspired by the applications of reconfigurable intelligent surfaces (RIS) in dynamically manipulating the wireless propagation environment, in this paper we investigate deploying an RIS in an ISAC system to improve performance. In particular, we consider an RIS-assisted ISAC system in which a multi-antenna base station (BS) performs multi-target detection and multi-user communication with the assistance of an RIS. Our goal is to maximize the weighted sum of target-detection signal-to-noise ratios (SNRs) by jointly optimizing the transmit beamforming and the RIS reflection coefficients, subject to the communication quality-of-service (QoS) requirements, the total transmit power budget, and the restriction on the RIS phase shifts. An efficient alternating optimization algorithm combining majorization-minimization (MM), penalty-based, and manifold optimization methods is developed to solve the resulting complicated non-convex optimization problem. Simulation results illustrate the advantages of deploying an RIS in ISAC systems and the effectiveness of the proposed algorithm.
Integrated sensing and communication (ISAC) is recognized as a promising technology with great potential to save hardware and spectrum resources, since it realizes radar detection and user communication simultaneously on a fully shared platform. Employing a reconfigurable intelligent surface (RIS) in ISAC systems can provide a virtual line-of-sight (LoS) path to overcome blockage, as well as introduce new degrees of freedom (DoFs) to further enhance system performance. Nevertheless, the multiplicative fading effect of a passive RIS limits its applicability in the absence of direct links, which has motivated the development of the active RIS. In this paper, we consider an active RIS-assisted ISAC system and aim to jointly design the transmit beamformer, the active RIS reflection coefficients, and the radar receive filter to maximize the radar output signal-to-noise ratio (SNR) while guaranteeing pre-defined signal-to-interference-plus-noise ratios (SINRs) for the communication users. To solve this non-convex problem, an efficient algorithm is developed by leveraging block coordinate descent (BCD), Dinkelbach's transform, and majorization-minimization (MM). Simulation results verify the significant advantage of deploying an active RIS in ISAC systems, which achieves up to 32 dB of radar SNR enhancement compared with passive RIS-assisted ISAC systems.
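As background for the algorithmic toolkit named above (and not the paper's actual beamforming subproblem), Dinkelbach's transform converts a fractional objective max f(x)/g(x), with g(x) > 0, into a sequence of parameterized subproblems max f(x) - λ g(x), updating λ to the current ratio until it stabilizes. A minimal sketch on a toy scalar problem, with an exhaustive-search inner solver for illustration:

```python
import numpy as np

def dinkelbach(f, g, candidates, tol=1e-9, max_iter=100):
    """Generic Dinkelbach iteration for max f(x)/g(x) with g(x) > 0.
    The inner subproblem max_x f(x) - lam * g(x) is solved by
    exhaustive search over `candidates` purely for illustration."""
    fx, gx = f(candidates), g(candidates)
    lam = 0.0
    for _ in range(max_iter):
        # Solve the parameterized subproblem for the current lambda.
        x = candidates[np.argmax(fx - lam * gx)]
        new_lam = f(x) / g(x)          # update lambda to the achieved ratio
        if abs(new_lam - lam) < tol:   # converged: lambda is a fixed point
            break
        lam = new_lam
    return x, lam

# Toy fractional problem: maximize (x + 2) / (x^2 + 1) over a fine grid.
# The true maximizer is x* = sqrt(5) - 2.
xs = np.linspace(-1.0, 3.0, 40001)
x_star, ratio = dinkelbach(lambda x: x + 2, lambda x: x**2 + 1, xs)
```

The same outer loop applies when the inner subproblem is a beamforming design solved by MM, which is why the transform pairs naturally with BCD-style algorithms.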
How humans infer discrete emotions is a fundamental research question in psychology. While conceptual knowledge about emotions (emotion knowledge) has been suggested to be essential for emotion inference, the evidence to date is mostly indirect and inconclusive. Since large language models (LLMs) have been shown to support effective representations of various kinds of human conceptual knowledge, the present study employed artificial neurons in LLMs to investigate the mechanism of human emotion inference. With artificial neurons activated by prompts, the LLM (RoBERTa) demonstrated a conceptual structure of 27 discrete emotions similar to that derived from human behavior. Furthermore, the LLM-based conceptual structure revealed a human-like reliance on 14 underlying conceptual attributes of emotions for emotion inference. Most importantly, by manipulating attribute-specific neurons, we found that the LLM's emotion inference performance deteriorated, and that the deterioration was correlated with the effectiveness of the representations of the corresponding conceptual attributes on the human side. Our findings provide direct evidence for the emergence of emotion knowledge representation in large language models and suggest its causal support for discrete emotion inference.
In this paper, we investigate the integration of integrated sensing and communication (ISAC) and reconfigurable intelligent surfaces (RIS) to provide wide-coverage, ultra-reliable communication and high-accuracy sensing. In particular, we consider an RIS-assisted ISAC system in which a multi-antenna base station (BS) simultaneously performs multi-user multi-input single-output (MU-MISO) communications and radar sensing with the assistance of an RIS. We focus on both target detection and parameter estimation performance in terms of the signal-to-noise ratio (SNR) and the Cramér-Rao bound (CRB), respectively. Two optimization problems are formulated to maximize the achievable sum-rate of the multi-user communications, subject to an SNR constraint for target detection or a CRB constraint for parameter estimation, the transmit power budget, and the unit-modulus constraint on the RIS reflection coefficients. Efficient algorithms are developed to solve these two complicated non-convex problems. Extensive simulation results demonstrate the advantages of the proposed joint beamforming and reflection designs over other schemes. In addition, it is shown that additional RIS reflection elements bring larger performance gains for direction-of-arrival (DoA) estimation than for target detection.
Recently, privacy-preserving action recognition (PPAR) has become an appealing video understanding problem. Nevertheless, existing works focus on frame-level (spatial) privacy preservation, ignoring privacy leakage across a whole video and destroying the temporal continuity of actions. In this paper, we present a novel PPAR paradigm, i.e., performing privacy preservation from both spatial and temporal perspectives, and propose the STPrivacy framework. For the first time, STPrivacy applies vision Transformers to PPAR, treating a video as a sequence of spatio-temporal tubelets and showing clear advantages over previous convolutional methods. Specifically, STPrivacy adaptively treats privacy-containing tubelets in two different manners. Tubelets irrelevant to actions are directly abandoned, i.e., sparsification, and are not published for subsequent tasks. In contrast, those highly involved in actions are anonymized, i.e., anonymization, to remove private information. These two transformation mechanisms are complementary and are optimized simultaneously in our unified framework. Because no large-scale benchmark exists, we annotate five privacy attributes for two of the most popular action recognition datasets, i.e., HMDB51 and UCF101, and conduct extensive experiments on them. Moreover, to verify the generalization ability of STPrivacy, we further introduce a privacy-preserving facial expression recognition task and conduct experiments on a large-scale video facial attributes dataset, i.e., Celeb-VHQ. Thorough comparisons and visualization analyses demonstrate our significant superiority over existing works. The appendix contains more details and visualizations.
Emerging drone-based aerial surveys offer low cost, high efficiency, and flexible deployment. However, UAVs are often equipped with inexpensive position-and-orientation system (POS) devices and non-metric cameras, and their flight attitudes are easily disturbed. Achieving large-scale, POS-supported UAV mapping without image control points therefore faces many technical problems, the most fundamental of which is how to accurately determine the absolute orientation of images through aerial triangulation. In traditional aerial triangulation, image matching algorithms are constrained to varying degrees by preset prior knowledge. In recent years, deep learning has developed rapidly in photogrammetric computer vision, surpassing traditional handcrafted features in many respects. It has shown stronger stability in image-based navigation and positioning tasks and, in particular, better resistance to unfavorable factors such as blur, illumination changes, and geometric distortion. After reviewing the key technologies of aerial triangulation without image control points, this paper proposes a new drone image registration method based on deep-learning image features to address the high mismatch rate of traditional methods. It adopts SuperPoint as the feature detector, exploiting the strong generalization of CNNs to extract precise feature points from UAV images and thereby achieve high-precision aerial triangulation. Experimental results show that, under the same pre-processing and post-processing conditions, the proposed method achieves suitable precision more efficiently than the traditional SIFT-based approach, meeting the requirements of UAV aerial triangulation without image control points in large-scale surveys.
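As a generic illustration of the matching step that the mismatch rate depends on (this is standard descriptor matching, not the paper's specific SuperPoint pipeline; the function name is ours), mutual nearest-neighbour filtering is one common way to discard ambiguous correspondences between two images before aerial triangulation:

```python
import numpy as np

def mutual_nn_matches(desc_a, desc_b):
    """Match two sets of feature descriptors (one per row) by mutual
    nearest neighbour: keep (i, j) only if j is the closest descriptor
    in B to A[i] *and* i is the closest in A to B[j]. This symmetric
    check is a common filter against one-sided, ambiguous matches."""
    # Pairwise Euclidean distance matrix, shape (len(A), len(B)).
    dists = np.linalg.norm(desc_a[:, None, :] - desc_b[None, :, :], axis=-1)
    ab = np.argmin(dists, axis=1)  # best match in B for each A descriptor
    ba = np.argmin(dists, axis=0)  # best match in A for each B descriptor
    return [(i, j) for i, j in enumerate(ab) if ba[j] == i]
```

Surviving pairs would then feed a geometric verification step (e.g., RANSAC) in a full registration pipeline.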
We aim to bridge the gap between common-sense few-sample human learning and large-data machine learning. We derive a theory of human-like few-shot learning from the von Neumann-Landauer principle. Modeling human learning is difficult because how people learn varies from person to person. Under commonly accepted definitions, we prove that all human or animal few-shot learning, as well as major models of such learning including the Free Energy Principle and Bayesian Program Learning, approximate our theory under the Church-Turing thesis. We find that deep generative models such as the variational autoencoder (VAE) can be used to approximate our theory and perform significantly better than baseline models, including deep neural networks, on image recognition, low-resource language processing, and character recognition.
Recent anti-spoofing systems focus on spoofing detection, where the task is only to determine whether the test audio is fake. However, few studies have addressed identifying the methods by which fake speech is generated. Common spoofing attack algorithms in the logical access (LA) scenario, such as voice conversion and speech synthesis, can be divided into several stages: input processing, conversion, waveform generation, etc. In this work, we propose a system for classifying different spoofing attributes that represent the characteristics of the different modules in the whole pipeline. Classifying attributes of a spoofing attack, rather than identifying the whole spoofing pipeline, makes the system more robust when encountering complex combinations of modules at different stages. In addition, our system can also serve as an auxiliary system for anti-spoofing against unseen spoofing methods. Experiments are conducted on the ASVspoof 2019 LA dataset, and the proposed method achieves a 20\% relative improvement over conventional binary spoof detection methods.
Corals are the primary habitat-building life-form on reefs, which support a quarter of the species in the ocean. A coral reef ecosystem usually consists of many reefs, each of which resembles a tall building in a city. Reef-building corals secrete hard calcareous exoskeletons that give them structural rigidity, which is also a prerequisite for accurate 3D modeling and semantic mapping using advanced photogrammetric computer vision and machine learning. Underwater videography, as a modern underwater remote sensing tool, provides a high-resolution technique for surveying and mapping coral habitats. In this paper, detailed 3D mesh models, digital surface models, and orthophotos of a coral habitat are generated from collected coral images and underwater control points. Meanwhile, a novel pixel-wise semantic segmentation of the orthophotos is performed with advanced deep learning, and the resulting semantic map is projected into 3D space. For the first time, 3D fine-grained semantic modeling and rugosity evaluation of coral reefs have been completed at millimeter (mm) accuracy. This provides a new and powerful method for understanding the processes and characteristics of coral reef change at high spatial and temporal resolution under climate change.
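The abstract does not define rugosity; a common definition in reef ecology is the ratio of 3D surface area to planar area, which can be evaluated directly on a digital surface model. A minimal sketch over a height grid (the function name is ours, not the paper's code):

```python
import numpy as np

def rugosity(z, d=1.0):
    """Surface rugosity of a height grid `z` with cell spacing `d`:
    ratio of 3D surface area to planar footprint area, computed by
    splitting every grid cell into two triangles."""
    def tri_area(p, q, r):
        # Half the norm of the cross product of two edge vectors.
        return 0.5 * np.linalg.norm(np.cross(q - p, r - p), axis=-1)

    ny, nx = z.shape
    xs, ys = np.meshgrid(np.arange(nx) * d, np.arange(ny) * d)
    pts = np.stack([xs, ys, z], axis=-1)          # (ny, nx, 3) vertices
    a, b = pts[:-1, :-1], pts[:-1, 1:]            # cell corners
    c, e = pts[1:, :-1], pts[1:, 1:]
    surface = np.sum(tri_area(a, b, c) + tri_area(b, e, c))
    planar = (nx - 1) * (ny - 1) * d * d
    return surface / planar
```

A perfectly flat grid gives a rugosity of 1, and more structurally complex reef surfaces give larger values.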
With an increasing number of crowdsourced private automatic weather stations (TPAWS) established to fill the gaps in official networks and to obtain local weather information for various purposes, data quality is a major concern in promoting their usage. Proper quality control and assessment are necessary to establish confidence in TPAWS observations. To derive near-real-time assessments for an operational system, we propose a simple, scalable, and interpretable framework based on AI/statistical/ML models. The framework constructs separate models for the individual data from official sources and then provides the final assessment by fusing the individual models. The performance of the proposed framework is evaluated on synthetic data and demonstrated by applying it to a real TPAWS network.