Earth system models (ESMs) are the principal tools used in climate science to generate future climate projections under various atmospheric emissions scenarios on a global or regional scale. Generative deep learning approaches are suitable for emulating these tools due to their computational efficiency and ability, once trained, to generate realizations in a fraction of the time required by ESMs. We extend previous work that used a generative probabilistic diffusion model to emulate ESMs by targeting the joint emulation of multiple variables, temperature and precipitation, with a single diffusion model. Joint generation of multiple variables is critical for producing realistic samples of phenomena that result from the interplay of those variables. The diffusion model emulator takes in the monthly mean maps of temperature and precipitation and produces daily values of each variable that exhibit statistical properties similar to those generated by ESMs. Our results show that the outputs from our extended model closely resemble those from ESMs on various climate metrics, including dry spells and hot streaks, and that the joint distribution of temperature and precipitation in our samples closely matches that of ESMs.
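Metrics such as dry spells and hot streaks can be read as run-length statistics on a daily series. The following is a minimal sketch under stated assumptions, not the paper's evaluation code: the function names, the 1 mm wet-day threshold, and the 30 °C hot-day threshold are illustrative choices that the abstract does not specify.

```python
def longest_run(values, predicate):
    """Length of the longest run of consecutive entries satisfying predicate."""
    best = current = 0
    for value in values:
        if predicate(value):
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


def longest_dry_spell(precip_mm, wet_threshold_mm=1.0):
    """Longest run of days with precipitation below an assumed wet-day threshold."""
    return longest_run(precip_mm, lambda p: p < wet_threshold_mm)


def longest_hot_streak(temp_c, hot_threshold_c):
    """Longest run of days with temperature above an assumed hot-day threshold."""
    return longest_run(temp_c, lambda t: t > hot_threshold_c)


# Made-up daily values, used only to exercise the functions.
daily_precip = [0.0, 0.2, 5.1, 0.0, 0.0, 0.4, 0.0, 3.0]
daily_temp = [29.0, 31.5, 32.0, 29.5, 33.1, 28.0, 31.0, 31.2]

print(longest_dry_spell(daily_precip))         # 4
print(longest_hot_streak(daily_temp, 30.0))    # 2
```

Applying the same functions to emulated and ESM daily series at each grid cell would give two maps of run lengths that can be compared directly.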
The potential for zero-shot generalization in vision-language (V-L) models such as CLIP has spurred their widespread adoption in addressing numerous downstream tasks. Previous methods have employed test-time prompt tuning to adapt the model to unseen domains, but they overlooked the issue of imbalanced class distributions. In this study, we explicitly address this problem by employing class-aware prototype alignment weighted by mean class probabilities obtained for the test sample and filtered augmented views. Additionally, we ensure that the class probabilities are as accurate as possible by performing prototype discrimination using contrastive learning. The combination of alignment and discriminative loss serves as a geometric regularizer, preventing the prompt representation from collapsing onto a single class and effectively bridging the distribution gap between the source and test domains. Our method, named PromptSync, synchronizes the prompts for each test sample on both the text and vision branches of the V-L model. In empirical evaluations on the domain generalization benchmark, our method outperforms previous best methods by 2.33% in overall performance, by 1% in base-to-novel generalization, and by 2.84% in cross-dataset transfer tasks.
Monocular egocentric 3D human motion capture is a challenging and actively researched problem. Existing methods use synchronously operating visual sensors (e.g., RGB cameras) and often fail under low lighting and fast motions, which can be restrictive in many applications involving head-mounted devices. In response to the existing limitations, this paper 1) introduces a new problem, i.e., 3D human motion capture from an egocentric monocular event camera with a fisheye lens, and 2) proposes the first approach to it, called EventEgo3D (EE3D). Event streams have high temporal resolution and provide reliable cues for 3D human motion capture under high-speed human motions and rapidly changing illumination. The proposed EE3D framework is specifically tailored for learning with event streams in the LNES representation, enabling high 3D reconstruction accuracy. We also design a prototype of a mobile head-mounted device with an event camera and record a real dataset with event observations and the ground-truth 3D human poses (in addition to the synthetic dataset). Our EE3D demonstrates robustness and superior 3D accuracy compared to existing solutions across various challenging experiments while supporting real-time 3D pose update rates of 140Hz.
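The abstract does not define the LNES representation. As a rough illustration, the sketch below assumes the commonly described form of a locally normalised event surface: events inside a time window are written into a two-channel image (one channel per polarity), each pixel holding the normalised timestamp of the most recent event there. The function name, the event tuple layout `(t, x, y, polarity)`, and the window length are assumptions made for this example and may differ from the EE3D implementation.

```python
def lnes_frame(events, width, height, t_start, window):
    """Build a height x width x 2 frame of normalised event timestamps.

    events: iterable of (t, x, y, polarity) tuples (assumed layout).
    Channel 0 holds negative-polarity events, channel 1 positive ones.
    """
    frame = [[[0.0, 0.0] for _ in range(width)] for _ in range(height)]
    # Process in time order so that later events overwrite earlier ones.
    for t, x, y, polarity in sorted(events):
        if t_start <= t < t_start + window:
            channel = 1 if polarity > 0 else 0
            frame[y][x][channel] = (t - t_start) / window
    return frame


# Toy event stream on a 2 x 2 sensor with a 10 ms window (made-up values).
events = [
    (0.001, 1, 0, +1),
    (0.004, 1, 0, +1),   # overwrites the earlier event at the same pixel
    (0.002, 0, 1, -1),
    (0.020, 0, 0, +1),   # outside the window, ignored
]
frame = lnes_frame(events, width=2, height=2, t_start=0.0, window=0.01)
print(frame[0][1], frame[1][0], frame[0][0])
```

Because only the latest timestamp per pixel and polarity is kept, the frame has a fixed size regardless of how many events arrive in the window.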
Nanopore sequencing offers the ability for real-time analysis of long DNA sequences at a low cost, enabling new applications such as early detection of cancer. Due to the complex nature of nanopore measurements and the high cost of obtaining ground truth datasets, there is a need for nanopore simulators. Existing simulators rely on handcrafted rules and parameters and do not learn an internal representation that would allow for analysing underlying biological factors of interest. Instead, we propose VADA, a purely data-driven method for simulating nanopores based on an autoregressive latent variable model. We embed subsequences of DNA and introduce a conditional prior to address the challenge of a collapsing conditioning. We introduce an auxiliary regressor on the latent variable to encourage our model to learn an informative latent representation. We empirically demonstrate that our model achieves competitive simulation performance on experimental nanopore data. Moreover, we show we have learned an informative latent representation that is predictive of the DNA labels. We hypothesize that other biological factors of interest, beyond the DNA labels, can potentially be extracted from such a learned latent representation.
In recent years, we have seen many advancements in wood species identification. Methods like DNA analysis, Near Infrared (NIR) spectroscopy, and Direct Analysis in Real Time (DART) mass spectrometry complement the long-established wood anatomical assessment of cell and tissue morphology. However, most of these methods have some limitations such as high costs, the need for skilled experts for data interpretation, and the lack of good datasets for professional reference. Therefore, most of these methods, and certainly the wood anatomical assessment, may benefit from tools based on Artificial Intelligence. In this paper, we apply two transfer learning techniques with Convolutional Neural Networks (CNNs) to a multi-view Congolese wood species dataset including sections from different orientations and viewed at different microscopic magnifications. We explore two feature extraction methods in detail, namely Global Average Pooling (GAP) and Random Encoding of Aggregated Deep Activation Maps (RADAM), for efficient and accurate wood species identification. Our results indicate superior accuracy on diverse datasets and anatomical sections, surpassing the results of other methods. Our proposal represents a significant advancement in wood species identification, offering a robust tool to support the conservation of forest ecosystems and promote sustainable forestry practices.
In recent years, reports of illegal drones threatening public safety have increased. Against intrusions by fully autonomous drones, traditional countermeasures such as radio frequency interference and GPS shielding may fail. This paper proposes a scheme that uses an autonomous multicopter with a strapdown camera to intercept a maneuvering intruder UAV. The interceptor multicopter can autonomously detect and intercept intruders moving at high speed in the air. The strapdown camera avoids the complex mechanical structure of an electro-optical pod, making the interceptor multicopter compact. However, the coupling of the camera and multicopter motion makes interception tasks difficult. To solve this problem, an Image-Based Visual Servoing (IBVS) controller is proposed to make the interception fast and accurate. Then, to compensate for the delay of sensor imaging and image processing relative to attitude changes in high-speed scenarios, a Delayed Kalman Filter (DKF) observer is generalized to predict the current image position and increase the update frequency. Finally, Hardware-in-the-Loop (HITL) simulations and outdoor flight experiments verify that the method achieves high interception accuracy and success rate. In the flight experiments, a high-speed interception is achieved with a terminal speed of 20 m/s.
A probabilistic framework to study the dependence structure induced by deterministic discrete-time state-space systems between input and output processes is introduced. General sufficient conditions are formulated under which output processes exist and are unique once an input process has been fixed, a property that in the deterministic state-space literature is known as the echo state property. When those conditions are satisfied, the given state-space system becomes a generative model for probabilistic dependences between two sequence spaces. Moreover, those conditions guarantee that the output depends continuously on the input when using the Wasserstein metric. The output processes whose existence is proved are shown to be causal in a specific sense and to generalize those studied in purely deterministic situations. The results in this paper constitute a significant stochastic generalization of sufficient conditions for the deterministic echo state property to hold, in the sense that the stochastic echo state property can be satisfied under contractivity conditions that are strictly weaker than those in deterministic situations. This means that state-space systems can induce a purely probabilistic dependence structure between input and output sequence spaces even when there is no functional relation between those two spaces.
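For intuition, the deterministic echo state property that the paper generalizes can be seen in a toy example: when the state map is a contraction in the state variable, two trajectories driven by the same input forget their initial conditions. The sketch below is only an illustration of that deterministic case, not of the stochastic conditions introduced in the paper; the state map `x_t = a * tanh(x_{t-1}) + z_t` and the value `a = 0.5` are assumptions chosen for the example.

```python
import math


def run_state_system(x0, inputs, a=0.5):
    """Iterate x_t = a * tanh(x_{t-1}) + z_t and return the state trajectory.

    Since tanh is 1-Lipschitz, the map is a contraction in x whenever |a| < 1.
    """
    x = x0
    states = []
    for z in inputs:
        x = a * math.tanh(x) + z
        states.append(x)
    return states


# One fixed input sequence, two very different initial states.
inputs = [math.sin(0.3 * t) for t in range(60)]
traj_a = run_state_system(5.0, inputs)
traj_b = run_state_system(-5.0, inputs)

# The gap shrinks at least geometrically: |gap_t| <= a**t * |x0 - x0'|.
gap = [abs(p - q) for p, q in zip(traj_a, traj_b)]
print(gap[0], gap[9], gap[-1])
```

The output is asymptotically a function of the input sequence alone, which is the functional relation that, according to the abstract, need not exist in the stochastic setting.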
This paper proposes to use the cepstrum for artifact detection, recognition, and removal in prefrontal EEG. This work focuses on the artifact caused by eye movement. A database containing artifact-free EEG and eye-movement-contaminated EEG from different subjects is established. A cepstral analysis-based feature extraction with a support vector machine (SVM) based classifier is designed to identify the artifacts in the target EEG signals. The proposed method achieves an accuracy of 99.62% on the artifact detection task and an 82.79% accuracy on the 6-category eye movement classification task. A statistical value-based artifact removal method is proposed and evaluated on a public EEG database, where an accuracy improvement of 3.46% is obtained on the 3-category emotion classification task. To make a confident decision for each 5 s EEG segment, the algorithm requires only 0.66M multiplication operations. Compared to the state-of-the-art approaches in artifact detection and removal, the proposed method features higher detection accuracy and lower computational cost, which makes it a more suitable solution to be integrated into a real-time and artifact-robust Brain-Machine Interface (BMI).
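The abstract does not spell out the feature pipeline, so the following is only a minimal sketch of the standard real cepstrum (inverse DFT of the log magnitude spectrum), written with a naive DFT to stay dependency-free. The function names and the `eps` guard are assumptions for this example; how the paper selects cepstral coefficients as SVM features is not shown here.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive O(N^2) discrete Fourier transform (inverse includes the 1/N factor)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def real_cepstrum(signal, eps=1e-12):
    """Real cepstrum: inverse DFT of log|DFT(signal)|; eps avoids log(0)."""
    log_magnitude = [math.log(abs(s) + eps) for s in dft(signal)]
    return [c.real for c in dft(log_magnitude, inverse=True)]


# Toy signal: a unit impulse plus an echo of amplitude 0.5 delayed by 8 samples.
n_samples, delay = 64, 8
signal = [0.0] * n_samples
signal[0] = 1.0
signal[delay] = 0.5

cepstrum = real_cepstrum(signal)
# A repeated pattern shows up as a cepstral peak at its delay (quefrency).
peak = max(range(1, n_samples // 2), key=lambda q: cepstrum[q])
print(peak)  # 8
```

In a classifier setting, a fixed number of low-order cepstral coefficients per EEG segment would typically serve as the feature vector.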
Mental stress is a prevalent condition that can have negative impacts on one's health. Early detection and treatment are crucial for preventing related illnesses and maintaining overall wellness. This study presents a new method for identifying mental stress using a wearable biosensor worn in the ear. Data was gathered from 14 participants in a controlled environment using stress-inducing tasks such as memory and math tests. The raw photoplethysmography data was then filtered, segmented, and transformed into scalograms using a continuous wavelet transform (CWT) based on two different mother wavelets, namely a generalized Morse wavelet and the analytic Morlet (Gabor) wavelet. The scalograms were then passed through a convolutional neural network classifier, GoogLeNet, to classify the signals as stressed or non-stressed. The method achieved an accuracy of 91.02% and an F1-score of 90.95% using the generalized Morse wavelet. This method demonstrates promise as a reliable tool for early detection and treatment of mental stress by providing real-time monitoring and allowing for preventive measures to be taken before it becomes a serious issue.
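To make the scalogram step concrete, here is a minimal sketch of a CWT magnitude map using a plain complex Morlet wavelet. It is an illustration under assumptions, not the study's pipeline: the generalized Morse wavelet is not implemented, the `1/sqrt(scale)` normalisation, the centre frequency `w0 = 6`, and the truncation of the wavelet at four scale lengths are choices made for this example.

```python
import cmath
import math


def morlet(t, w0=6.0):
    """Complex Morlet mother wavelet (small zero-mean correction term omitted)."""
    return math.pi ** -0.25 * cmath.exp(1j * w0 * t) * math.exp(-0.5 * t * t)


def cwt_scalogram(signal, scales, dt=1.0, w0=6.0):
    """Return |CWT| as a list of rows, one row per scale, by direct summation."""
    n = len(signal)
    rows = []
    for s in scales:
        half = int(math.ceil(4 * s / dt))  # truncate the wavelet support
        kernel = [morlet(k * dt / s, w0).conjugate() / math.sqrt(s)
                  for k in range(-half, half + 1)]
        row = []
        for i in range(n):
            acc = 0j
            for j, w in enumerate(kernel):
                idx = i + j - half
                if 0 <= idx < n:  # samples outside the signal count as zero
                    acc += signal[idx] * w
            row.append(abs(acc * dt))
        rows.append(row)
    return rows


# Toy signal: a sine with a period of 16 samples.
omega = 2 * math.pi / 16
signal = [math.sin(omega * i) for i in range(256)]

# For this wavelet, scale s responds most strongly near frequency w0 / s.
s_match = 6.0 / omega
scales = [0.5 * s_match, s_match, 2.0 * s_match]
scalogram = cwt_scalogram(signal, scales)

mid = len(signal) // 2
best = max(range(len(scales)), key=lambda i: scalogram[i][mid])
print(best)  # 1, the scale matched to the sine's frequency
```

Rendering the rows as an image, with time on one axis and scale on the other, gives the kind of 2D input that an image classifier can consume.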
The Third Generation Partnership Project (3GPP) has successfully introduced standards for global mobility. However, the volume and complexity of these standards have increased over time, complicating access to relevant information for vendors and service providers. The use of Generative Artificial Intelligence (AI), and in particular Large Language Models (LLMs), may provide faster access to relevant information. In this paper, we evaluate the capability of state-of-the-art LLMs to be used as Question Answering (QA) assistants for 3GPP document reference. Our contribution is threefold. First, we provide a benchmark and measuring methods for evaluating the performance of LLMs. Second, we perform data preprocessing and fine-tuning for one of these LLMs and provide guidelines, applicable to all LLMs, for increasing the accuracy of the responses. Third, we provide a model of our own, TeleRoBERTa, that performs on par with foundation LLMs but with an order of magnitude fewer parameters. Results show that LLMs can be used as a credible reference tool on telecom technical documents, and thus have potential for a number of different applications, from troubleshooting and maintenance to network operations and software product development.