Hybrid analog and digital beamforming has emerged as a key enabling technology for millimeter wave (mmWave) massive multiple-input multiple-output (MIMO) communication systems because it balances system performance against hardware efficiency. Owing to their strong central control, cooperative networks show great potential to enhance the spectral efficiency of mmWave communications. In this paper, we consider cooperative mmWave MIMO systems and propose user association and hybrid beamforming design algorithms for three typical hybrid beamforming architectures. The central processing unit (CPU) of the cooperative network first matches the service pairs of base stations (BSs) and users. Then, an iterative hybrid beamforming design algorithm is proposed to maximize the weighted achievable sum-rate of the mmWave MIMO system with the fully connected hybrid beamforming architecture. Moreover, a heuristic analog beamforming design algorithm is introduced for the fixed subarray hybrid beamforming architecture. To further exploit multiple-antenna diversity, we also consider the dynamic subarray architecture and propose a novel antenna design algorithm for the analog beamforming design. Simulation results show that the proposed hybrid beamforming algorithms achieve a significant performance improvement over existing approaches and that the dynamic subarray architecture has great advantages in improving energy efficiency (EE).
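To make the difference between the fully connected and fixed subarray architectures concrete, the following is a minimal sketch of how an analog beamformer could be constrained in each case. It is not the algorithm proposed in the abstract above: the function names, the phase-only projection, and the equal-size block-diagonal wiring are all illustrative assumptions.

```python
import cmath


def project_unit_modulus(matrix):
    """Keep only the phase of each entry (phase shifters cannot change amplitude)."""
    return [[cmath.exp(1j * cmath.phase(x)) for x in row] for row in matrix]


def fixed_subarray_mask(num_antennas, num_rf_chains):
    """0/1 mask in which each antenna is wired to exactly one RF chain.

    Assumption: antennas are split into equal, contiguous groups.
    """
    if num_antennas % num_rf_chains != 0:
        raise ValueError("antennas must divide evenly among RF chains")
    per_chain = num_antennas // num_rf_chains
    return [[1 if n // per_chain == r else 0 for r in range(num_rf_chains)]
            for n in range(num_antennas)]


def analog_beamformer(unconstrained, architecture="fully_connected"):
    """Map an unconstrained (antennas x RF chains) matrix to a feasible analog beamformer."""
    projected = project_unit_modulus(unconstrained)
    if architecture == "fully_connected":
        # Every antenna is connected to every RF chain.
        return projected
    if architecture == "fixed_subarray":
        mask = fixed_subarray_mask(len(unconstrained), len(unconstrained[0]))
        return [[p * m for p, m in zip(p_row, m_row)]
                for p_row, m_row in zip(projected, mask)]
    raise ValueError("unknown architecture: " + architecture)
```

In a dynamic subarray architecture the antenna-to-RF-chain mask would itself become a design variable rather than a fixed pattern; that selection step is not sketched here.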
Cell-free networks are regarded as a promising technology to meet higher rate requirements for beyond fifth-generation (5G) communications. Most works on cell-free networks focus on either fully centralized beamforming to maximally enhance system performance, or fully distributed beamforming to avoid extensive channel state information (CSI) exchange among access points (APs). To achieve both network capacity improvement and CSI exchange reduction, we propose a partially distributed beamforming design algorithm for reconfigurable intelligent surface (RIS)-aided cell-free networks. We aim to maximize the weighted sum-rate of all users by designing the active and passive beamforming subject to the transmit power constraints of the APs and the unit-modulus constraints of the RIS elements. The weighted sum-rate maximization problem is first transformed into an equivalent weighted sum-mean-square-error (sum-MSE) minimization problem, and an alternating optimization (AO) approach is then adopted to iteratively design the active and passive beamformers. Specifically, the active beamforming vectors are obtained by the local APs and the passive beamforming vector is optimized by the central processing unit (CPU). Numerical results not only show that the proposed partially distributed algorithm achieves a remarkable performance improvement compared with conventional local beamforming methods, but also demonstrate the considerable potential of deploying RIS in cell-free networks.
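As an illustration of the objective named in the abstract above, the sketch below evaluates a weighted sum-rate for single-antenna users. It assumes a simplified model that is not taken from the paper: each user's channel is one effective vector (any RIS reflection already folded in), interference from other users' beams is treated as noise, and the function names are hypothetical.

```python
import math


def inner(h, w):
    """Hermitian inner product h^H w for two equal-length complex vectors."""
    return sum(hc.conjugate() * wc for hc, wc in zip(h, w))


def weighted_sum_rate(channels, beamformers, weights, noise_power):
    """Sum over users of weight_k * log2(1 + SINR_k), in bit/s/Hz."""
    total = 0.0
    for k, h_k in enumerate(channels):
        signal = abs(inner(h_k, beamformers[k])) ** 2
        # Power leaking into user k from beams intended for other users.
        interference = sum(abs(inner(h_k, w_j)) ** 2
                           for j, w_j in enumerate(beamformers) if j != k)
        sinr = signal / (interference + noise_power)
        total += weights[k] * math.log2(1.0 + sinr)
    return total
```

With two orthogonal channels, matched beams, unit weights, and unit noise power, each user sees an SINR of 1, so the function returns 2 bit/s/Hz; steering part of one beam toward the other user lowers the value.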
Dual-function radar-communication (DFRC) systems, which can efficiently utilize the congested spectrum and costly hardware resources by employing one common waveform for both sensing and communication (S&C), have attracted increasing attention. While the orthogonal frequency division multiplexing (OFDM) technique has been widely adopted to support high-quality communications, it also has great potential for improving radar sensing performance and providing flexible S&C. In this paper, we propose to jointly design the dual-functional transmit signals occupying several subcarriers to realize multi-user OFDM communications and detect one moving target in the presence of clutter. Meanwhile, the signals in other frequency subcarriers can be optimized in a similar way to perform other tasks. The transmit beamforming and receive filter are jointly optimized to maximize the radar output signal-to-interference-plus-noise ratio (SINR), while satisfying the communication SINR requirement and the power budget. A majorization-minimization (MM) based algorithm is developed to solve the resulting non-convex optimization problem. Numerical results reveal the significant wideband sensing gain brought by jointly designing the transmit signals in different subcarriers, and demonstrate the advantages of our proposed scheme and the effectiveness of the developed algorithm.
Integrated sensing and communication (ISAC) has been envisioned as a promising technology to tackle the spectrum congestion problem for future networks. In this correspondence, we investigate deploying a reconfigurable intelligent surface (RIS) in an ISAC system to achieve better performance. In particular, a multi-antenna base station (BS) simultaneously serves multiple single-antenna users with the assistance of an RIS and detects potential targets. The active beamforming of the BS and the passive beamforming of the RIS are jointly optimized to maximize the achievable sum-rate of the communication users while satisfying the beampattern similarity constraint for radar sensing, the restriction of the RIS, and the transmit power budget. An efficient alternating algorithm based on fractional programming (FP), majorization-minimization (MM), and manifold optimization methods is developed to convert the resulting non-convex optimization problem into two solvable sub-problems and solve them iteratively. Simulation studies illustrate the advantages of deploying RIS in ISAC systems and the effectiveness of the proposed algorithm.
This paper describes our DKU-OPPO system for the 2022 Spoofing-Aware Speaker Verification (SASV) Challenge. First, we split the joint task into speaker verification (SV) and spoofing countermeasure (CM) tasks, which are optimized separately. For the SV systems, four state-of-the-art methods are employed. For the CM systems, we propose two methods on top of the challenge baseline to further improve the performance, namely Embedding Random Sampling Augmentation (ERSA) and One-Class Confusion Loss (OCCL). Second, we explore whether SV embeddings can help improve CM system performance. We observe a dramatic performance degradation of existing CM systems on the domain-mismatched VoxCeleb2 dataset. Third, we compare different fusion strategies, including parallel score fusion and sequential cascaded systems. Compared to the 1.71% SASV-EER baseline, our submitted cascaded system obtains a 0.21% SASV-EER on the official challenge evaluation set.
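The two fusion strategies compared in the abstract above can be contrasted with a small sketch. Everything here is an illustrative assumption rather than the submitted system: scores are taken to lie in [0, 1] with higher meaning "target speaker" (SV) or "bona fide" (CM), the CM is placed first in the cascade, and the function names, weight, and threshold are hypothetical.

```python
def parallel_score_fusion(sv_score, cm_score, sv_weight=0.5):
    """Combine both subsystem scores into one score with a weighted sum."""
    return sv_weight * sv_score + (1.0 - sv_weight) * cm_score


def cascaded_score(sv_score, cm_score, cm_threshold=0.5):
    """Use the CM as a gate: flagged trials are rejected before SV is consulted."""
    if cm_score < cm_threshold:
        # Lowest possible score, so the trial is rejected at any SV threshold.
        return float("-inf")
    return sv_score
```

For a spoofed trial that fools the SV system (say `sv_score=0.9`, `cm_score=0.1`), the parallel fusion still yields a mid-range score, whereas the cascade rejects the trial outright.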
Automatic speaker verification has achieved remarkable progress in recent years. However, there is little research on cross-age speaker verification (CASV) due to insufficient relevant data. In this paper, we mine cross-age test sets based on the VoxCeleb dataset and propose our age-invariant speaker representation (AISR) learning method. Since VoxCeleb is collected from the YouTube platform, the dataset inherently contains cross-age data. However, the metadata does not contain speaker age labels. Therefore, we adopt a face age estimation method to predict the speaker age from the associated visual data, then label the audio recording with the estimated age. We construct multiple cross-age test sets on VoxCeleb (Vox-CA), which deliberately select positive trials with large age gaps. The effect of nationality and gender is also considered in selecting negative pairs to align with Vox-H cases. The baseline system performance drops from 1.939% EER on the Vox-H test set to 10.419% on the Vox-CA20 test set, which indicates how difficult the cross-age scenario is. Consequently, we propose an age-decoupling adversarial learning (ADAL) method to alleviate the negative effect of the age gap and reduce intra-class variance. Our method outperforms the baseline system by over 10% relative EER reduction on the Vox-CA20 test set. The source code and trial resources are available at https://github.com/qinxiaoyi/Cross-Age_Speaker_Verification
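For readers unfamiliar with the metric, the sketch below shows one simple way to approximate an equal error rate (EER) from trial scores and to compute a relative reduction between two EERs. The threshold sweep and function names are assumptions for illustration; evaluation toolkits typically interpolate the error curves more carefully.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER by trying every observed score as the threshold."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for t in thresholds:
        # A trial is accepted when its score is at least the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2.0
    return eer


def relative_reduction(baseline_eer, new_eer):
    """Fraction by which the new EER improves on the baseline EER."""
    return (baseline_eer - new_eer) / baseline_eer
```

As a worked example of the arithmetic, a 10% relative reduction from the 10.419% baseline quoted above corresponds to an EER of 10.419 × 0.9 = 9.3771%, so `relative_reduction(10.419, 9.3771)` is 0.1 up to rounding.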
This paper proposes an online target speaker voice activity detection system for speaker diarization tasks, which does not require a priori knowledge from a clustering-based diarization system to obtain the target speaker embeddings. First, we employ a ResNet-based front-end model to extract the frame-level speaker embeddings for each incoming block of a signal. Next, we predict the detection state of each speaker based on these frame-level speaker embeddings and the previously estimated target speaker embedding. Then, the target speaker embeddings are updated by aggregating these frame-level speaker embeddings according to the predictions in the current block. We iteratively extract the results for each block and update the target speaker embeddings until reaching the end of the signal. Experimental results show that the proposed method outperforms the offline clustering-based diarization system on the AliMeeting dataset.
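The block-wise loop described above (detect with the previous estimates, then refresh the estimates from the current block) can be sketched as follows. This is a toy stand-in, not the proposed system: a cosine-similarity threshold replaces the neural detection model, a running mean replaces the aggregation step, a frame matching no known speaker simply starts a new one, and all names and the threshold value are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length real vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def process_block(frames, targets, counts, threshold=0.7):
    """Return, per frame, the indices of speakers judged active; update targets in place."""
    decisions = []
    for frame in frames:
        active = [i for i, t in enumerate(targets) if cosine(frame, t) >= threshold]
        if not active:
            # Assumed cold-start rule: an unmatched frame enrolls a new speaker.
            targets.append(list(frame))
            counts.append(0)
            active = [len(targets) - 1]
        decisions.append(active)
    # Aggregate this block's frames into the running-mean target embeddings.
    for frame, active in zip(frames, decisions):
        for i in active:
            n = counts[i]
            targets[i] = [(t * n + x) / (n + 1) for t, x in zip(targets[i], frame)]
            counts[i] = n + 1
    return decisions


def run_online(blocks, threshold=0.7):
    """Process blocks in arrival order, carrying speaker estimates forward."""
    targets, counts, results = [], [], []
    for block in blocks:
        results.append(process_block(block, targets, counts, threshold))
    return results, targets
```

Because each frame may match several targets, the per-frame output is a list of speaker indices, which leaves room for overlapping speech in principle even though this simplified rule handles it crudely.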
Offline imitation learning (IL) is a powerful method to solve decision-making problems from expert demonstrations without reward labels. Existing offline IL methods suffer from severe performance degeneration under limited expert data due to covariate shift. Including a learned dynamics model can potentially improve the state-action space coverage of expert data; however, it also faces challenging issues such as model approximation/generalization errors and the suboptimality of rollout data. In this paper, we propose the Discriminator-guided Model-based offline Imitation Learning (DMIL) framework, which introduces a discriminator to simultaneously distinguish the dynamics correctness and suboptimality of model rollout data against real expert demonstrations. DMIL adopts a novel cooperative-yet-adversarial learning strategy, which uses the discriminator to guide and couple the learning process of the policy and dynamics model, resulting in improved model performance and robustness. Our framework can also be extended to the case when demonstrations contain a large proportion of suboptimal data. Experimental results show that DMIL and its extension achieve superior performance and robustness compared to state-of-the-art offline IL methods on small datasets.
Learning effective reinforcement learning (RL) policies to solve real-world complex tasks can be quite challenging without a high-fidelity simulation environment. In most cases, we are only given imperfect simulators with simplified dynamics, which inevitably lead to severe sim-to-real gaps in RL policy learning. The recently emerged field of offline RL provides another possibility to learn policies directly from pre-collected historical data. However, to achieve reasonable performance, existing offline RL algorithms need impractically large offline data with sufficient state-action space coverage for training. This brings up a new question: is it possible to combine learning from limited real data in offline RL and unrestricted exploration through imperfect simulators in online RL to address the drawbacks of both approaches? In this study, we propose the Dynamics-Aware Hybrid Offline-and-Online Reinforcement Learning (H2O) framework to provide an affirmative answer to this question. H2O introduces a dynamics-aware policy evaluation scheme, which adaptively penalizes the Q function learning on simulated state-action pairs with large dynamics gaps, while also simultaneously allowing learning from a fixed real-world dataset. Through extensive simulation and real-world tasks, as well as theoretical analysis, we demonstrate the superior performance of H2O against other cross-domain online and offline RL algorithms. H2O provides a brand new hybrid offline-and-online RL paradigm, which can potentially shed light on future RL algorithm design for solving practical real-world tasks.
Most real-world problems that machine learning algorithms are expected to solve involve 1) an unknown data distribution; 2) little domain-specific knowledge; and 3) datasets with limited annotation. We propose Non-Parametric learning by Compression with Latent Variables (NPC-LV), a learning framework for any dataset with abundant unlabeled data but very few labeled examples. By only training a generative model in an unsupervised way, the framework utilizes the data distribution to build a compressor. Using a compressor-based distance metric derived from Kolmogorov complexity, together with a few labeled data, NPC-LV classifies without further training. We show that NPC-LV outperforms supervised methods on all three image classification datasets in the low-data regime and even outperforms semi-supervised learning methods on CIFAR-10. We demonstrate how and when the negative evidence lower bound (nELBO) can be used as an approximate compressed length for classification. By revealing the correlation between compression rate and classification accuracy, we illustrate that under NPC-LV, improvements in generative models can enhance downstream classification accuracy.
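The idea of classifying with a compressor-based distance and a handful of labels can be illustrated with the normalized compression distance (NCD) and a nearest-neighbor vote. This sketch swaps in the general-purpose `zlib` compressor for the learned generative-model compressor described above, so it shows the mechanism only; the function names and toy data are hypothetical.

```python
import zlib


def compressed_length(data: bytes) -> int:
    """Length in bytes after compression, a computable proxy for Kolmogorov complexity."""
    return len(zlib.compress(data, 9))


def ncd(x: bytes, y: bytes) -> float:
    """Normalized compression distance: small when x and y share structure."""
    cx, cy = compressed_length(x), compressed_length(y)
    cxy = compressed_length(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


def classify(query: bytes, labeled, k: int = 1) -> str:
    """Majority label among the k labeled samples closest to the query under NCD."""
    ranked = sorted(labeled, key=lambda item: ncd(query, item[0]))
    votes = {}
    for _, label in ranked[:k]:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)
```

No training step is involved: the only "model" is the compressor, and each labeled sample acts directly as a reference point. In the framework above, the compressed length would instead come from the trained generative model, with nELBO serving as the approximate code length.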