Generative processes that involve solving differential equations, such as diffusion models, frequently necessitate balancing speed and quality. ODE-based samplers are fast but plateau in performance while SDE-based samplers deliver higher sample quality at the cost of increased sampling time. We attribute this difference to sampling errors: ODE-samplers involve smaller discretization errors while stochasticity in SDE contracts accumulated errors. Based on these findings, we propose a novel sampling algorithm called Restart in order to better balance discretization errors and contraction. The sampling method alternates between adding substantial noise in additional forward steps and strictly following a backward ODE. Empirically, Restart sampler surpasses previous SDE and ODE samplers in both speed and accuracy. Restart not only outperforms the previous best SDE results, but also accelerates the sampling speed by 10-fold / 2-fold on CIFAR-10 / ImageNet $64 \times 64$. In addition, it attains significantly better sample quality than ODE samplers within comparable sampling times. Moreover, Restart better balances text-image alignment/visual quality versus diversity than previous samplers in the large-scale text-to-image Stable Diffusion model pre-trained on LAION $512 \times 512$. Code is available at https://github.com/Newbeeer/diffusion_restart_sampling
In the era of sixth-generation (6G) wireless communications, integrated sensing and communications (ISAC) is recognized as a promising solution to upgrading the physical system by endowing wireless communications with sensing capability. Existing ISAC is mainly oriented to static scenarios with radio-frequency sensors being the primary participants, thus lacking a comprehensive environment feature characterization and facing a severe performance bottleneck in dynamic environments. In light of this, we generalize the concept of ISAC by mimicking human synesthesia to support intelligent multi-modal sensing-communication integration. The so-termed Synesthesia of Machines (SoM) is not only oriented to generic scenarios, but also particularly suitable for solving challenges arising from dynamic scenarios. We commence by justifying the necessity and potentials of SoM. Subsequently, we offer the definition of SoM and zoom into the specific operating modes, followed by discussions of the state-of-the-art, corresponding objectives, and challenges. To facilitate SoM research, we overview the prerequisite of SoM research, that is, mixed multi-modal (MMM) datasets, and introduce our work. Built upon the MMM datasets, we introduce the mapping relationships between multi-modal sensing and communications, and discuss how channel modeling can be customized to support the exploration of such relationships. Afterwards, we delve into the current research state and implementing challenges of SoM-enhance-based and SoM-concert-based applications. We first overview the SoM-enhance-based communication system designs and present simulation results related to dual-function waveform and predictive beamforming design. Afterwards, we review the recent advances of SoM-concert for single-agent and multi-agent environment sensing. Finally, we propose some open issues and potential directions.
The sixth generation (6G) of mobile communication system is witnessing a new paradigm shift, i.e., integrated sensing-communication system. A comprehensive dataset is a prerequisite for 6G integrated sensing-communication research. This paper develops a novel simulation dataset, named M3SC, for mixed multi-modal (MMM) sensing-communication integration, and the generation framework of the M3SC dataset is further given. To obtain multi-modal sensory data in physical space and communication data in electromagnetic space, we utilize AirSim and WaveFarer to collect multi-modal sensory data and exploit Wireless InSite to collect communication data. Furthermore, the in-depth integration and precise alignment of AirSim, WaveFarer, and Wireless InSite are achieved. The M3SC dataset covers various weather conditions, various frequency bands, and different times of the day. Currently, the M3SC dataset contains 1500 snapshots, including 80 RGB images, 160 depth maps, 80 LiDAR point clouds, 256 sets of mmWave waveforms with 8 radar point clouds, and 72 channel impulse response (CIR) matrices per snapshot, thus totaling 120,000 RGB images, 240,000 depth maps, 120,000 LiDAR point clouds, 384,000 sets of mmWave waveforms with 12,000 radar point clouds, and 108,000 CIR matrices. The data processing result presents the multi-modal sensory information and communication channel statistical properties. Finally, the MMM sensing-communication application, which can be supported by the M3SC dataset, is discussed.
As a promising technique for high-mobility wireless communications, orthogonal time frequency space (OTFS) has been proved to enjoy excellent advantages with respect to traditional orthogonal frequency division multiplexing (OFDM). Although multiple studies have considered index modulation (IM) based OTFS (IM-OTFS) schemes to further improve system performance, a challenging and open problem is the development of effective IM schemes and efficient receivers for practical OTFS systems that must operate in the presence of channel delays and Doppler shifts. In this paper, we propose two novel block-wise IM schemes for OTFS systems, named delay-IM with OTFS (DeIM-OTFS) and Doppler-IM with OTFS (DoIM-OTFS), where a block of delay/Doppler resource bins are activated simultaneously. Based on a maximum likelihood (ML) detector, we analyze upper bounds on the average bit error rates for the proposed DeIM-OTFS and DoIM-OTFS schemes, and verify their performance advantages over the existing IM-OTFS systems. We also develop a multi-layer joint symbol and activation pattern detection (MLJSAPD) algorithm and a customized message passing detection (CMPD) algorithm for our proposed DeIMOTFS and DoIM-OTFS systems with low complexity. Simulation results demonstrate that our proposed MLJSAPD and CMPD algorithms can achieve desired performance with robustness to the imperfect channel state information (CSI).
MCMC algorithms offer empirically efficient tools for sampling from a target distribution $\pi(x) \propto \exp(-V(x))$. However, on the theory side, MCMC algorithms suffer from slow mixing rate when $\pi(x)$ is non-log-concave. Our work examines this gap and shows that when Poincar\'e-style inequality holds on a subset $\mathcal{X}$ of the state space, the conditional distribution of MCMC iterates over $\mathcal{X}$ mixes fast to the true conditional distribution. This fast mixing guarantee can hold in cases when global mixing is provably slow. We formalize the statement and quantify the conditional mixing rate. We further show that conditional mixing can have interesting implications for sampling from mixtures of Gaussians, parameter estimation for Gaussian mixture models and Gibbs-sampling with well-connected local minima.
Motivated by the striking ability of transformers for in-context learning, several works demonstrate that transformers can implement algorithms like gradient descent. By a careful construction of weights, these works show that multiple layers of transformers are expressive enough to simulate gradient descent iterations. Going beyond the question of expressivity, we ask: Can transformers learn to implement such algorithms by training over random problem instances? To our knowledge, we make the first theoretical progress toward this question via analysis of the loss landscape for linear transformers trained over random instances of linear regression. For a single attention layer, we prove the global minimum of the training objective implements a single iteration of preconditioned gradient descent. Notably, the preconditioning matrix not only adapts to the input distribution but also to the variance induced by data inadequacy. For a transformer with $k$ attention layers, we prove certain critical points of the training objective implement $k$ iterations of preconditioned gradient descent. Our results call for future theoretical studies on learning algorithms by training transformers.
Memory-safety bugs introduce critical software-security issues. Rust provides memory-safe mechanisms to avoid memory-safety bugs in programming, while still allowing unsafe escape hatches via unsafe code. However, the unsafe code that enhances the usability of Rust provides clear spots for finding memory-safety bugs in Rust source code. In this paper, we claim that these unsafe spots can still be identifiable in Rust binary code via machine learning and be leveraged for finding memory-safety bugs. To support our claim, we propose the tool textttrustspot, that enables reverse engineering to learn an unsafe classifier that proposes a list of functions in Rust binaries for downstream analysis. We empirically show that the function proposals by textttrustspot can recall $92.92\%$ of memory-safety bugs, while it covers only $16.79\%$ of the entire binary code. As an application, we demonstrate that the function proposals are used in targeted fuzzing on Rust packages, which contribute to reducing the fuzzing time compared to non-targeted fuzzing.
This article aims at providing insights for a comprehensive understanding of THz propagation channels. Specifically, we discuss essential THz channel characteristics to be well understood for the success of THz communications. The methodology of establishing realistic and 6G-compliant THz channel models based on measurements is then elaborated on, followed by a discussion on existing THz channel measurements in the literature. Finally, future research directions, challenges and measures to enrich the understanding of THz channels are discussed.
With the Deep Neural Networks (DNNs) as a powerful function approximator, Deep Reinforcement Learning (DRL) has been excellently demonstrated on robotic control tasks. Compared to DNNs with vanilla artificial neurons, the biologically plausible Spiking Neural Network (SNN) contains a diverse population of spiking neurons, making it naturally powerful on state representation with spatial and temporal information. Based on a hybrid learning framework, where a spike actor-network infers actions from states and a deep critic network evaluates the actor, we propose a Population-coding and Dynamic-neurons improved Spiking Actor Network (PDSAN) for efficient state representation from two different scales: input coding and neuronal coding. For input coding, we apply population coding with dynamically receptive fields to directly encode each input state component. For neuronal coding, we propose different types of dynamic-neurons (containing 1st-order and 2nd-order neuronal dynamics) to describe much more complex neuronal dynamics. Finally, the PDSAN is trained in conjunction with deep critic networks using the Twin Delayed Deep Deterministic policy gradient algorithm (TD3-PDSAN). Extensive experimental results show that our TD3-PDSAN model achieves better performance than state-of-the-art models on four OpenAI gym benchmark tasks. It is an important attempt to improve RL with SNN towards the effective computation satisfying biological plausibility.