The Industrial Internet of Things (IIoT) enables industries to build large interconnected systems utilizing various technologies that require high data rates. Terahertz (THz) communication is envisioned as a candidate technology for achieving data rates of several terabits-per-second (Tbps). Despite this, establishing a reliable communication link at THz frequencies remains a challenge due to high pathloss and molecular absorption. To overcome these limitations, this paper proposes using intelligent reconfigurable surfaces (IRSs) with THz communications to enable future smart factories for the IIoT. In this paper, we formulate the power allocation and joint IIoT device and IRS association (JIIA) problem, which is a mixed-integer nonlinear programming (MINLP) problem. {Furthermore, the JIIA problem aims to maximize the sum rate with imperfect channel state information (CSI).} To address this non-deterministic polynomial-time hard (NP-hard) problem, we decompose the problem into multiple sub-problems, which we solve iteratively. Specifically, we propose a Gale-Shapley algorithm-based JIIA solution to obtain stable matching between uplink and downlink IRSs. {We validate the proposed solution by comparing the Gale-Shapley-based JIIA algorithm with exhaustive search (ES), greedy search (GS), and random association (RA) with imperfect CSI.} The complexity analysis shows that our algorithm is more efficient than the ES.
Digital Twins (DT) have become crucial to achieve sustainable and effective smart urban solutions. However, current DT modelling techniques cannot support the dynamicity of these smart city environments. This is caused by the lack of right-time data capturing in traditional approaches, resulting in inaccurate modelling and high resource and energy consumption challenges. To fill this gap, we explore spatiotemporal graphs and propose the Reinforcement Learning-based Adaptive Twining (RL-AT) mechanism with Deep Q Networks (DQN). By doing so, our study contributes to advancing Green Cities and showcases tangible benefits in accuracy, synchronisation, resource optimization, and energy efficiency. As a result, we note the spatiotemporal graphs are able to offer a consistent accuracy and 55% higher querying performance when implemented using graph databases. In addition, our model demonstrates right-time data capturing with 20% lower overhead and 25% lower energy consumption.
Automatic Speech Recognition (ASR) systems are used in the financial domain to enhance the caller experience by enabling natural language understanding and facilitating efficient and intuitive interactions. Increasing use of ASR systems requires that such systems exhibit very low error rates. The predominant ASR models to collect numeric data are large, general-purpose commercial models -- Google Speech-to-text (STT), or Amazon Transcribe -- or open source (OpenAI's Whisper). Such ASR models are trained on hundreds of thousands of hours of audio data and require considerable resources to run. Despite recent progress large speech recognition models, we highlight the potential of smaller, specialized "micro" models. Such light models can be trained perform well on number recognition specific tasks, competing with general models like Whisper or Google STT while using less than 80 minutes of training time and occupying at least an order of less memory resources. Also, unlike larger speech recognition models, micro-models are trained on carefully selected and curated datasets, which makes them highly accurate, agile, and easy to retrain, while using low compute resources. We present our work on creating micro models for multi-digit number recognition that handle diverse speaking styles reflecting real-world pronunciation patterns. Our work contributes to domain-specific ASR models, improving digit recognition accuracy, and privacy of data. An added advantage, their low resource consumption allows them to be hosted on-premise, keeping private data local instead uploading to an external cloud. Our results indicate that our micro-model makes less errors than the best-of-breed commercial or open-source ASRs in recognizing digits (1.8% error rate of our best micro-model versus 5.8% error rate of Whisper), and has a low memory footprint (0.66 GB VRAM for our model versus 11 GB VRAM for Whisper).
The spatio-temporal interpolation of large geophysical datasets has historically been adressed by Optimal Interpolation (OI) and more sophisticated model-based or data-driven DA techniques. In the last ten years, the link established between Stochastic Partial Differential Equations (SPDE) and Gaussian Markov Random Fields (GMRF) opened a new way of handling both large datasets and physically-induced covariance matrix in Optimal Interpolation. Recent advances in the deep learning community also enables to adress this problem as neural architecture embedding data assimilation variational framework. The reconstruction task is seen as a joint learning problem of the prior involved in the variational inner cost and the gradient-based minimization of the latter: both prior models and solvers are stated as neural networks with automatic differentiation which can be trained by minimizing a loss function, typically stated as the mean squared error between some ground truth and the reconstruction. In this work, we draw from the SPDE-based Gaussian Processes to estimate complex prior models able to handle non-stationary covariances in both space and time and provide a stochastic framework for interpretability and uncertainty quantification. Our neural variational scheme is modified to embed an augmented state formulation with both state and SPDE parametrization to estimate. Instead of a neural prior, we use a stochastic PDE as surrogate model along the data assimilation window. The training involves a loss function for both reconstruction task and SPDE prior model, where the likelihood of the SPDE parameters given the true states is involved in the training. Because the prior is stochastic, we can easily draw samples in the prior distribution before conditioning to provide a flexible way to estimate the posterior distribution based on thousands of members.
Despite their remarkable effectiveness and broad application, the drivers of success underlying ensembles of trees are still not fully understood. In this paper, we highlight how interpreting tree ensembles as adaptive and self-regularizing smoothers can provide new intuition and deeper insight to this topic. We use this perspective to show that, when studied as smoothers, randomized tree ensembles not only make predictions that are quantifiably more smooth than the predictions of the individual trees they consist of, but also further regulate their smoothness at test-time based on the dissimilarity between testing and training inputs. First, we use this insight to revisit, refine and reconcile two recent explanations of forest success by providing a new way of quantifying the conjectured behaviors of tree ensembles objectively by measuring the effective degree of smoothing they imply. Then, we move beyond existing explanations for the mechanisms by which tree ensembles improve upon individual trees and challenge the popular wisdom that the superior performance of forests should be understood as a consequence of variance reduction alone. We argue that the current high-level dichotomy into bias- and variance-reduction prevalent in statistics is insufficient to understand tree ensembles -- because the prevailing definition of bias does not capture differences in the expressivity of the hypothesis classes formed by trees and forests. Instead, we show that forests can improve upon trees by three distinct mechanisms that are usually implicitly entangled. In particular, we demonstrate that the smoothing effect of ensembling can reduce variance in predictions due to noise in outcome generation, reduce variability in the quality of the learned function given fixed input data and reduce potential bias in learnable functions by enriching the available hypothesis space.
The 2024 ICASSP Auditory EEG Signal Processing Grand Challenge concerns the decoding of electroencephalography (EEG) measurements taken from participants who listened to speech material. This work details our solution to the match-mismatch sub-task: given a short temporal segment of EEG recordings and several candidate speech segments, the task is to classify which of the speech segments was time-aligned with the EEG signals. We show that high-frequency gamma-band responses to the speech envelope can be detected with a high accuracy. By jointly assessing gamma-band responses and low-frequency envelope tracking, we develop a match-mismatch decoder which placed first in this task.
The Amazon, the world's largest rainforest, faces a severe historic drought. The Rio Negro River, one of the major Amazon River tributaries, reaches its lowest level in a century in October 2023. Here, we used a U-net deep learning model to map water surfaces in the Rio Negro River basin every 12 days in 2022 and 2023 using 10 m spatial resolution Sentinel-1 satellite radar images. The accuracy of the water surface model was high with an F1-score of 0.93. The 12 days mosaic time series of water surface was generated from the Sentinel-1 prediction. The water surface mask demonstrated relatively consistent agreement with the Global Surface Water (GSW) product from Joint Research Centre (F1-score: 0.708) and with the Brazilian Mapbiomas Water initiative (F1-score: 0.686). The main errors of the map were omission errors in flooded woodland, in flooded shrub and because of clouds. Rio Negro water surfaces reached their lowest level around the 25th of November 2023 and were reduced to 68.1\% (9,559.9 km$^2$) of the maximum water surfaces observed in the period 2022-2023 (14,036.3 km$^2$). Synthetic Aperture Radar (SAR) data, in conjunction with deep learning techniques, can significantly improve near real-time mapping of water surface in tropical regions.
System Verilog Assertion (SVA) formulation, a critical yet complex task, is a pre-requisite in the Formal Property Verification (FPV) process. Traditionally, SVA formulation involves expert-driven interpretation of specifications. This is time consuming and prone to human error. However, recent advances in Large Language Models (LLM), LLM-informed automatic assertion generation is gaining interest. We designed a novel LLM-based pipeline to generate assertions in English Language, Linear Temporal Logic, and SVA from natural language specifications. We developed a custom LLM-based on OpenAI GPT4 for our experiments. Furthermore, we developed testbenches to verify/validate the LLM-generated assertions. Only 43% of LLM-generated raw assertions had errors, including syntax and logical errors. By iteratively prompting the LLMs using carefully crafted prompts derived from test case failures, the pipeline could generate correct SVAs after a maximum of nine iterations of prompting. Our results show that LLMs can streamline the assertion generation workflow, reshaping verification workflows.
3D Gaussian Splatting has emerged as an alternative 3D representation of Neural Radiance Fields (NeRFs), benefiting from its high-quality rendering results and real-time rendering speed. Considering the 3D Gaussian representation remains unparsed, it is necessary first to execute object segmentation within this domain. Subsequently, scene editing and collision detection can be performed, proving vital to a multitude of applications, such as virtual reality (VR), augmented reality (AR), game/movie production, etc. In this paper, we propose a novel approach to achieve object segmentation in 3D Gaussian via an interactive procedure without any training process and learned parameters. We refer to the proposed method as SA-GS, for Segment Anything in 3D Gaussians. Given a set of clicked points in a single input view, SA-GS can generalize SAM to achieve 3D consistent segmentation via the proposed multi-view mask generation and view-wise label assignment methods. We also propose a cross-view label-voting approach to assign labels from different views. In addition, in order to address the boundary roughness issue of segmented objects resulting from the non-negligible spatial sizes of 3D Gaussian located at the boundary, SA-GS incorporates the simple but effective Gaussian Decomposition scheme. Extensive experiments demonstrate that SA-GS achieves high-quality 3D segmentation results, which can also be easily applied for scene editing and collision detection tasks. Codes will be released soon.
Authorship Attribution (AA) and Authorship Obfuscation (AO) are two competing tasks of increasing importance in privacy research. Modern AA leverages an author's consistent writing style to match a text to its author using an AA classifier. AO is the corresponding adversarial task, aiming to modify a text in such a way that its semantics are preserved, yet an AA model cannot correctly infer its authorship. To address privacy concerns raised by state-of-the-art (SOTA) AA methods, new AO methods have been proposed but remain largely impractical to use due to their prohibitively slow training and obfuscation speed, often taking hours. To this challenge, we propose a practical AO method, ALISON, that (1) dramatically reduces training/obfuscation time, demonstrating more than 10x faster obfuscation than SOTA AO methods, (2) achieves better obfuscation success through attacking three transformer-based AA methods on two benchmark datasets, typically performing 15% better than competing methods, (3) does not require direct signals from a target AA classifier during obfuscation, and (4) utilizes unique stylometric features, allowing sound model interpretation for explainable obfuscation. We also demonstrate that ALISON can effectively prevent four SOTA AA methods from accurately determining the authorship of ChatGPT-generated texts, all while minimally changing the original text semantics. To ensure the reproducibility of our findings, our code and data are available at: https://github.com/EricX003/ALISON.