In this paper, we overview, evaluate, and demonstrate the sparse processing particle image velocimetry (SPPIV) as a real-time flow field estimation method using the particle image velocimetry (PIV), whereas SPPIV was previously proposed with its feasibility study and its real-time demonstration is conducted for the first time in this study. In the wind tunnel test, the PIV measurement and real-time measurement using SPPIV were conducted for the flow velocity field around the NACA0015 airfoil model. The off-line analysis results of the test show that the flow velocity field can be estimated from a small number of processing points by applying SPPIV, and also illustrates the following characteristics of SPPIV. The estimation accuracy improves as the number of processing points increases, whereas the processing time per step increases in proportion to the number of processing points. Therefore, it is necessary to set an optimal number of processing points. In addition, the application of the Kalman filter significantly improves the estimation accuracy with a small number of processing points while suppressing the processing time. When the flow velocity fields with different angles of attack are used as the training data with that of test data, the estimation using SPPIV is found to be reasonable if the difference in angle of attack between the training and test data is equal to or less than 2 deg and the flow phenomena of the training data are similar to that of the test data. For this reason, training data should be prepared at least every 4 deg. Finally, the demonstration of SPPIV as a real-time flow observation was conducted for the first time. In this demonstration, the real-time measurement is found to be possible at a sampling rate of 2000 Hz at 20 or less processing points in the top 10 modes estimation as expected by the off-line analyses.
As a special infinite-order vector autoregressive (VAR) model, the vector autoregressive moving average (VARMA) model can capture much richer temporal patterns than the widely used finite-order VAR model. However, its practicality has long been hindered by its non-identifiability, computational intractability, and relative difficulty of interpretation. This paper introduces a novel infinite-order VAR model which, with only a little sacrifice of generality, inherits the essential temporal patterns of the VARMA model but avoids all of the above drawbacks. As another attractive feature, the temporal and cross-sectional dependence structures of this model can be interpreted separately, since they are characterized by different sets of parameters. For high-dimensional time series, this separation motivates us to impose sparsity on the parameters determining the cross-sectional dependence. As a result, greater statistical efficiency and interpretability can be achieved, while no loss of temporal information is incurred by the imposed sparsity. We introduce an $\ell_1$-regularized estimator for the proposed model and derive the corresponding nonasymptotic error bounds. An efficient block coordinate descent algorithm and a consistent model order selection method are developed. The merit of the proposed approach is supported by simulation studies and a real-world macroeconomic data analysis.
The goal of this paper is to augment a pre-trained text-to-image diffusion model with the ability of open-vocabulary objects grounding, i.e., simultaneously generating images and segmentation masks for the corresponding visual entities described in the text prompt. We make the following contributions: (i) we insert a grounding module into the existing diffusion model, that can be trained to align the visual and textual embedding space of the diffusion model with only a small number of object categories; (ii) we propose an automatic pipeline for constructing a dataset, that consists of {image, segmentation mask, text prompt} triplets, to train the proposed grounding module; (iii) we evaluate the performance of open-vocabulary grounding on images generated from the text-to-image diffusion model and show that the module can well segment the objects of categories beyond seen ones at training time; (iv) we adopt the guided diffusion model to build a synthetic semantic segmentation dataset, and show that training a standard segmentation model on such dataset demonstrates competitive performance on zero-shot segmentation(ZS3) benchmark, which opens up new opportunities for adopting the powerful diffusion model for discriminative tasks.
This paper investigates the problem of planning a minimum-length tour for a three-dimensional Dubins airplane model to visually inspect a series of targets located on the ground or exterior surface of objects in an urban environment. Objects are 2.5D extruded polygons representing buildings or other structures. A visibility volume defines the set of admissible (occlusion-free) viewing locations for each target that satisfy feasible airspace and imaging constraints. The Dubins traveling salesperson problem with neighborhoods (DTSPN) is extended to three dimensions with visibility volumes that are approximated by triangular meshes. Four sampling algorithms are proposed for sampling vehicle configurations within each visibility volume to define vertices of the underlying DTSPN. Additionally, a heuristic approach is proposed to improve computation time by approximating edge costs of the 3D Dubins airplane with a lower bound that is used to solve for a sequence of viewing locations. The viewing locations are then assigned pitch and heading angles based on their relative geometry. The proposed sampling methods and heuristics are compared through a Monte-Carlo experiment that simulates view planning tours over a realistic urban environment.
Technology advancements in wireless communications and high-performance Extended Reality (XR) have empowered the developments of the Metaverse. The demand for Metaverse applications and hence, real-time digital twinning of real-world scenes is increasing. Nevertheless, the replication of 2D physical world images into 3D virtual world scenes is computationally intensive and requires computation offloading. The disparity in transmitted scene dimension (2D as opposed to 3D) leads to asymmetric data sizes in uplink (UL) and downlink (DL). To ensure the reliability and low latency of the system, we consider an asynchronous joint UL-DL scenario where in the UL stage, the smaller data size of the physical world scenes captured by multiple extended reality users (XUs) will be uploaded to the Metaverse Console (MC) to be construed and rendered. In the DL stage, the larger-size 3D virtual world scenes need to be transmitted back to the XUs. The decisions pertaining to computation offloading and channel assignment are optimized in the UL stage, and the MC will optimize power allocation for users assigned with a channel in the UL transmission stage. Some problems arise therefrom: (i) interactive multi-process chain, specifically Asynchronous Markov Decision Process (AMDP), (ii) joint optimization in multiple processes, and (iii) high-dimensional objective functions, or hybrid reward scenarios. To ensure the reliability and low latency of the system, we design a novel multi-agent reinforcement learning algorithm structure, namely Asynchronous Actors Hybrid Critic (AAHC). Extensive experiments demonstrate that compared to proposed baselines, AAHC obtains better solutions with preferable training time.
Within smart manufacturing, data driven techniques are commonly adopted for condition monitoring and fault diagnosis of rotating machinery. Classical approaches use supervised learning where a classifier is trained on labeled data to predict or classify different operational states of the machine. However, in most industrial applications, labeled data is limited in terms of its size and type. Hence, it cannot serve the training purpose. In this paper, this problem is tackled by addressing the classification task as a similarity measure to a reference sample rather than a supervised classification task. Similarity-based approaches require a limited amount of labeled data and hence, meet the requirements of real-world industrial applications. Accordingly, the paper introduces a similarity-based framework for predictive maintenance (PdM) of rotating machinery. For each operational state of the machine, a reference vibration signal is generated and labeled according to the machine's operational condition. Consequentially, statistical time analysis, fast Fourier transform (FFT), and short-time Fourier transform (STFT) are used to extract features from the captured vibration signals. For each feature type, three similarity metrics, namely structural similarity measure (SSM), cosine similarity, and Euclidean distance are used to measure the similarity between test signals and reference signals in the feature space. Hence, nine settings in terms of feature type-similarity measure combinations are evaluated. Experimental results confirm the effectiveness of similarity-based approaches in achieving very high accuracy with moderate computational requirements compared to machine learning (ML)-based methods. Further, the results indicate that using FFT features with cosine similarity would lead to better performance compared to the other settings.
Text-to-speech (TTS) and voice conversion (VC) are two different tasks both aiming at generating high quality speaking voice according to different input modality. Due to their similarity, this paper proposes UnifySpeech, which brings TTS and VC into a unified framework for the first time. The model is based on the assumption that speech can be decoupled into three independent components: content information, speaker information, prosody information. Both TTS and VC can be regarded as mining these three parts of information from the input and completing the reconstruction of speech. For TTS, the speech content information is derived from the text, while in VC it's derived from the source speech, so all the remaining units are shared except for the speech content extraction module in the two tasks. We applied vector quantization and domain constrain to bridge the gap between the content domains of TTS and VC. Objective and subjective evaluation shows that by combining the two task, TTS obtains better speaker modeling ability while VC gets hold of impressive speech content decoupling capability.
In the Earth's magnetosphere, there are fewer than a dozen dedicated probes beyond low-Earth orbit making in-situ observations at any given time. As a result, we poorly understand its global structure and evolution, the mechanisms of its main activity processes, magnetic storms, and substorms. New Artificial Intelligence (AI) methods, including machine learning, data mining, and data assimilation, as well as new AI-enabled missions will need to be developed to meet this Sparse Data challenge.
Determining causal effects of temporal multi-intervention assists decision-making. Restricted by time-varying bias, selection bias, and interactions of multiple interventions, the disentanglement and estimation of multiple treatment effects from individual temporal data is still rare. To tackle these challenges, we propose a comprehensive framework of temporal counterfactual forecasting from an individual multiple treatment perspective (TCFimt). TCFimt constructs adversarial tasks in a seq2seq framework to alleviate selection and time-varying bias and designs a contrastive learning-based block to decouple a mixed treatment effect into separated main treatment effects and causal interactions which further improves estimation accuracy. Through implementing experiments on two real-world datasets from distinct fields, the proposed method shows satisfactory performance in predicting future outcomes with specific treatments and in choosing optimal treatment type and timing than state-of-the-art methods.
The exponential growth of machine learning (ML) has prompted a great deal of interest in quantifying the uncertainty of each prediction for a user-defined level of confidence. Reliable uncertainty quantification is crucial and is a step towards increased trust in AI results. It becomes especially important in high-stakes decision-making, where the true output must be within the confidence set with high probability. Conformal prediction (CP) is a distribution-free uncertainty quantification framework that works for any black-box model and yields prediction intervals (PIs) that are valid under the mild assumption of exchangeability. CP-type methods are gaining popularity due to being easy to implement and computationally cheap; however, the exchangeability assumption immediately excludes time series forecasting. Although recent papers tackle covariate shift, this is not enough for the general time series forecasting problem of producing H-step ahead valid PIs. To attain such a goal, we propose a new method called AEnbMIMOCQR (Adaptive ensemble batch multiinput multi-output conformalized quantile regression), which produces asymptotic valid PIs and is appropriate for heteroscedastic time series. We compare the proposed method against state-of-the-art competitive methods in the NN5 forecasting competition dataset. All the code and data to reproduce the experiments are made available