Driven by the development goal of network paradigm and demand for various functions in the sixth-generation (6G) mission-critical Internet-of-Things (MC-IoT), we foresee a goal-oriented integration of sensing, communication, computing, and control (GIS3C) in this paper. We first provide an overview of the tasks, requirements, and challenges of MC-IoT. Then we introduce an end-to-end GIS3C architecture, in which goal-oriented communication is leveraged to bridge and empower sensing, communication, control, and computing functionalities. By revealing the interplay among multiple subsystems in terms of key performance indicators and parameters, this paper introduces unified metrics, i.e., task completion effectiveness and cost, to facilitate S3C co-design in MC-IoT. The preliminary results demonstrate the benefits of GIS3C in improving task completion effectiveness while reducing costs. We also identify and highlight the gaps and challenges in applying GIS3C in the future 6G networks.
We consider a wireless networked control system (WNCS) with bidirectional imperfect links for real-time applications such as smart grids. To maintain the stability of WNCS, captured by the probability that plant state violates preset values, at minimal cost, heterogeneous physical processes are monitored by multiple sensors. This status information, such as dynamic plant state and Markov Process-based context information, is then received/estimated by the controller for remote control. However, scheduling multiple sensors and designing the controller with limited resources is challenging due to their coupling, delay, and transmission loss. We formulate a Constrained Markov Decision Problem (CMDP) to minimize violation probability with cost constraints. We reveal the relationship between the goal and different updating actions by analyzing the significance of information that incorporates goal-related usefulness and contextual importance. Subsequently, a goal-oriented deterministic scheduling policy is proposed. Two sensing-assisted control strategies and a control-aware estimation policy are proposed to improve the violation probability-cost tradeoff, integrated with the scheduling policy to form a goal-oriented co-design framework. Additionally, we explore retransmission in downlink transmission and qualitatively analyze its preference scenario. Simulation results demonstrate that the proposed goal-oriented co-design policy outperforms previous work in simultaneously reducing violation probability and cost
Flow matching as a paradigm of generative model achieves notable success across various domains. However, existing methods use either multi-round training or knowledge within minibatches, posing challenges in finding a favorable coupling strategy for straight trajectories. To address this issue, we propose a novel approach, Straighter trajectories of Flow Matching (StraightFM). It straightens trajectories with the coupling strategy guided by diffusion model from entire distribution level. First, we propose a coupling strategy to straighten trajectories, creating couplings between image and noise samples under diffusion model guidance. Second, StraightFM also integrates real data to enhance training, employing a neural network to parameterize another coupling process from images to noise samples. StraightFM is jointly optimized with couplings from above two mutually complementary directions, resulting in straighter trajectories and enabling both one-step and few-step generation. Extensive experiments demonstrate that StraightFM yields high quality samples with fewer step. StraightFM generates visually appealing images with a lower FID among diffusion and traditional flow matching methods within 5 sampling steps when trained on pixel space. In the latent space (i.e., Latent Diffusion), StraightFM achieves a lower KID value compared to existing methods on the CelebA-HQ 256 dataset in fewer than 10 sampling steps.
Infrared small target detection based on deep learning offers unique advantages in separating small targets from complex and dynamic backgrounds. However, the features of infrared small targets gradually weaken as the depth of convolutional neural network (CNN) increases. To address this issue, we propose a novel method for detecting infrared small targets called improved dense nested attention network (IDNANet), which is based on the transformer architecture. We preserve the dense nested structure of dense nested attention network (DNANet) and introduce the Swin-transformer during feature extraction stage to enhance the continuity of features. Furthermore, we integrate the ACmix attention structure into the dense nested structure to enhance the features of intermediate layers. Additionally, we design a weighted dice binary cross-entropy (WD-BCE) loss function to mitigate the negative impact of foreground-background imbalance in the samples. Moreover, we develop a dataset specifically for infrared small targets, called BIT-SIRST. The dataset comprises a significant amount of real-world targets and manually annotated labels, as well as synthetic data and corresponding labels. We have evaluated the effectiveness of our method through experiments conducted on public datasets. In comparison to other state-of-the-art methods, our approach outperforms in terms of probability of detection (P_d), false-alarm rate (F_a), and mean intersection of union ($mIoU$). The $mIoU$ reaches 90.89 on the NUDT-SIRST dataset and 79.72 on the NUAA-SIRST dataset.
Fourier single-pixel imaging (FSI) is a data-efficient single-pixel imaging (SPI). However, there is still a serious challenge to obtain higher imaging quality using fewer measurements, which limits the development of real-time SPI. In this work, a uniform-sampling foveated FSI (UFFSI) is proposed with three features, uniform sampling, effective sampling and flexible fovea, to achieve under-sampling high-efficiency and high-quality SPI, even in a large-scale scene. First, by flexibly using the three proposed foveated pattern structures, data redundancy is reduced significantly to only require high resolution (HR) on regions of interest (ROIs), which radically reduces the need of total data number. Next, by the non-uniform weight distribution processing, non-uniform spatial sampling is transformed into uniform sampling, then the fast Fourier transform is used accurately and directly to obtain under-sampling high imaging quality with further reduced measurements. At a sampling ratio of 0.0084 referring to HR FSI with 1024*768 pixels, experimentally, by UFFSI with 255*341 cells of 89% reduction in data redundancy, the ROI has a significantly better imaging quality to meet imaging needs. We hope this work can provide a breakthrough for future real-time SPI.
Optimizations premised on open-loop metrics such as Age of Information (AoI) indirectly enhance the system's decision-making utility. We therefore propose a novel closed-loop metric named Goal-oriented Tensor (GoT) to directly quantify the impact of semantic mismatches on goal-oriented decision-making utility. Leveraging the GoT, we consider a sampler & decision-maker pair that works collaboratively and distributively to achieve a shared goal of communications. We formulate a two-agent infinite-horizon Decentralized Partially Observable Markov Decision Process (Dec-POMDP) to conjointly deduce the optimal deterministic sampling policy and decision-making policy. To circumvent the curse of dimensionality in obtaining an optimal deterministic joint policy through Brute-Force-Search, a sub-optimal yet computationally efficient algorithm is developed. This algorithm is predicated on the search for a Nash Equilibrium between the sampler and the decision-maker. Simulation results reveal that the proposed sampler & decision-maker co-design surpasses the current literature on AoI and its variants in terms of both goal achievement utility and sparse sampling rate, signifying progress in the semantics-conscious, goal-driven sparse sampling design.
In reinforcement learning (RL), there are two major settings for interacting with the environment: online and offline. Online methods explore the environment at significant time cost, and offline methods efficiently obtain reward signals by sacrificing exploration capability. We propose semi-offline RL, a novel paradigm that smoothly transits from offline to online settings, balances exploration capability and training cost, and provides a theoretical foundation for comparing different RL settings. Based on the semi-offline formulation, we present the RL setting that is optimal in terms of optimization cost, asymptotic error, and overfitting error bound. Extensive experiments show that our semi-offline approach is efficient and yields comparable or often better performance compared with state-of-the-art methods.
Denoising Diffusion Probabilistic Models have shown extraordinary ability on various generative tasks. However, their slow inference speed renders them impractical in speech synthesis. This paper proposes a linear diffusion model (LinDiff) based on an ordinary differential equation to simultaneously reach fast inference and high sample quality. Firstly, we employ linear interpolation between the target and noise to design a diffusion sequence for training, while previously the diffusion path that links the noise and target is a curved segment. When decreasing the number of sampling steps (i.e., the number of line segments used to fit the path), the ease of fitting straight lines compared to curves allows us to generate higher quality samples from a random noise with fewer iterations. Secondly, to reduce computational complexity and achieve effective global modeling of noisy speech, LinDiff employs a patch-based processing approach that partitions the input signal into small patches. The patch-wise token leverages Transformer architecture for effective modeling of global information. Adversarial training is used to further improve the sample quality with decreased sampling steps. We test proposed method with speech synthesis conditioned on acoustic feature (Mel-spectrograms). Experimental results verify that our model can synthesize high-quality speech even with only one diffusion step. Both subjective and objective evaluations demonstrate that our model can synthesize speech of a quality comparable to that of autoregressive models with faster synthesis speed (3 diffusion steps).