In this paper, we investigate network-assisted full-duplex (NAFD) cell-free millimeter-wave (mmWave) massive multiple-input multiple-output (MIMO) systems with digital-to-analog converter (DAC) quantization and fronthaul compression. We propose to maximize the weighted uplink and downlink sum rate by jointly optimizing the power allocation of both the transmitting remote antenna units (T-RAUs) and the uplink users, as well as the variances of the downlink and uplink fronthaul compression noises. To deal with this challenging problem, we apply a successive convex approximation (SCA) method to handle the non-convex bidirectional limited-capacity fronthaul constraints. The simulation results verify the convergence of the proposed SCA-based algorithm and reveal the impact of fronthaul capacity and DAC quantization on the spectral efficiency of NAFD cell-free mmWave massive MIMO systems. Moreover, comparisons of spectral efficiency yield insightful conclusions: NAFD achieves better performance than co-time co-frequency full-duplex cloud radio access network (CCFD C-RAN) with practical limited-resolution DACs. Specifically, the performance gap between the two schemes is larger with 8-bit DAC quantization than with 1-bit DAC quantization, attaining a 5.5-fold improvement.
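As a rough illustration of how a limited fronthaul capacity bounds the compression-noise variance, the sketch below assumes a scalar Gaussian test channel: a signal of power `P` compressed with noise variance `q` needs `log2(1 + P/q)` bits per complex sample, so a fronthaul of `C` bits per sample forces `q >= P / (2**C - 1)`. The single-antenna model, the unit normalizations, and the function names are hypothetical and are not the paper's formulation; with multiple antennas the constraint involves log-determinant terms that are non-convex in the joint variables, which is the kind of structure an SCA step is typically used for.

```python
import math


def min_compression_noise(signal_power: float, fronthaul_bits: float) -> float:
    """Smallest compression-noise variance q with log2(1 + P/q) <= C.

    Assumes a scalar Gaussian test channel (illustrative, not the paper's model).
    """
    if signal_power <= 0 or fronthaul_bits <= 0:
        raise ValueError("signal power and fronthaul capacity must be positive")
    return signal_power / (2.0 ** fronthaul_bits - 1.0)


def fronthaul_rate(signal_power: float, noise_var: float) -> float:
    """Bits per complex sample needed to convey the signal at compression noise noise_var."""
    return math.log2(1.0 + signal_power / noise_var)


def downlink_snr(signal_power: float, channel_gain: float,
                 thermal_noise: float, compression_noise: float) -> float:
    """Received SNR when a T-RAU forwards the signal plus its compression noise.

    The compression noise passes through the same channel as the signal,
    so it is scaled by the channel gain and added to the thermal noise.
    """
    return (channel_gain * signal_power) / (channel_gain * compression_noise + thermal_noise)


if __name__ == "__main__":
    for c in (1, 2, 4, 8):
        q = min_compression_noise(1.0, c)
        print(f"C={c} bits: q_min={q:.4g}, SNR={downlink_snr(1.0, 1.0, 0.1, q):.3g}")
```

As `C` grows, `q_min` shrinks geometrically and the SNR approaches the uncompressed value `g*P/N0`, which is the qualitative trade-off a limited-capacity fronthaul constraint captures.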
This paper proposes a multi-level cooperative architecture to balance the spectral efficiency and scalability of cell-free massive multiple-input multiple-output (MIMO) systems. In the proposed architecture, spatial expansion units (SEUs) are introduced to relieve the access points (APs) of heavy computation and to increase the degree of cooperation among APs. We first derive closed-form expressions for the uplink user achievable rates under the multi-level cooperative architecture with maximum ratio combining (MRC) and zero-forcing (ZF) receivers. Numerical results verify the accuracy of these closed-form expressions and demonstrate that the proposed multi-level cooperative architecture achieves a better trade-off between spectral efficiency and scalability than other cell-free massive MIMO architectures.
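Because MRC is linear, its combining can be split across levels: each AP forms local statistics, an SEU sums those of its APs, and the central processor sums the SEU outputs, which reproduces centralized MRC exactly. The sketch below assumes single-antenna APs, perfect CSI, and a hypothetical grouping of APs into SEUs; it only illustrates this decomposition and is not the paper's rate derivation. ZF does not split this simply, since it needs a matrix inversion over the channels of all cooperating APs.

```python
from typing import List, Sequence


def ap_statistics(g_row: Sequence[complex], y_m: complex) -> List[complex]:
    """Local MRC statistics at one single-antenna AP: conj(g_mk) * y_m for each user k."""
    return [g.conjugate() * y_m for g in g_row]


def aggregate(vectors: Sequence[Sequence[complex]]) -> List[complex]:
    """Element-wise sum of per-user statistic vectors."""
    return [sum(col) for col in zip(*vectors)]


def centralized_mrc(g: Sequence[Sequence[complex]], y: Sequence[complex]) -> List[complex]:
    """The central processor combines the signals of all APs directly."""
    return aggregate([ap_statistics(g[m], y[m]) for m in range(len(y))])


def hierarchical_mrc(g, y, seu_groups: Sequence[Sequence[int]]) -> List[complex]:
    """APs -> SEUs -> central processor; seu_groups lists the AP indices of each SEU."""
    seu_outputs = [aggregate([ap_statistics(g[m], y[m]) for m in group])
                   for group in seu_groups]
    return aggregate(seu_outputs)


def mrc_snr_single_user(g_col: Sequence[complex], power: float, noise_var: float) -> float:
    """Single-user MRC SNR with perfect CSI: power * sum |g_m|^2 / noise_var."""
    return power * sum(abs(g) ** 2 for g in g_col) / noise_var
```

Here `g[m][k]` is the channel from user `k` to AP `m` and `y[m]` is the signal received at AP `m`; the SEU level only changes where the partial sums are computed, not the result.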
The smart Internet of Vehicles (IoV), a promising Internet of Things (IoT) application, has emerged with the development of fifth-generation mobile communication (5G). Nevertheless, meeting the heterogeneous requirements of electric vehicles for sufficient battery capacity, powerful computing ability, and energy efficiency is highly challenging given the explosive data growth in 5G and sixth-generation (6G) networks. To alleviate these deficiencies, this paper proposes a mobile edge computing (MEC) enabled IoV system, in which electric vehicle nodes (eVNs) upload and download data through an anchor node (AN) integrated with an MEC server. Meanwhile, the AN transmits radio signals to the electric vehicles using simultaneous wireless information and power transfer (SWIPT) technology to compensate for their limited battery capacity. Moreover, spectral efficiency is further improved by the multiple-input multiple-output (MIMO) and full-duplex (FD) technologies employed at the AN. Considering these issues, we maximize the average energy efficiency of the electric vehicles by jointly optimizing the CPU frequency, vehicle transmit power, computing tasks, and uplink rate. Since the problem is non-convex, we propose a novel alternate interior-point iterative scheme (AIIS) under the constraints of computing tasks, energy consumption, and time latency. The results verify the effectiveness of the proposed AIIS scheme compared with the benchmark schemes.
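To make the optimization variables concrete, the sketch below evaluates one vehicle's energy efficiency for a given CPU frequency, transmit power, offloaded fraction, and uplink rate under a commonly used computation model: local computing costs `kappa * f**2` joules per CPU cycle, and offloading costs transmit power times airtime. The model, the default `kappa`, the linear SWIPT harvesting term, and all names are assumptions for illustration, not the paper's system model.

```python
from dataclasses import dataclass


@dataclass
class Task:
    bits: float            # task size L in bits
    cycles_per_bit: float  # CPU cycles needed per bit (assumed constant)


def evaluate(task: Task, offload_frac: float, cpu_hz: float, tx_power_w: float,
             uplink_bps: float, kappa: float = 1e-28, eh_efficiency: float = 0.0,
             rx_power_w: float = 0.0, eh_time_s: float = 0.0):
    """Return (energy efficiency in bit/J, latency in s, net energy in J).

    Assumed model (illustrative only):
      local energy   = kappa * cycles * f^2,  local time   = cycles / f
      offload energy = p * bits_off / R,      offload time = bits_off / R
      harvested      = eta * P_rx * T_eh      (linear SWIPT harvesting)
    Local computing and offloading run in parallel, so latency is the larger time.
    """
    local_bits = (1.0 - offload_frac) * task.bits
    off_bits = offload_frac * task.bits
    cycles = local_bits * task.cycles_per_bit
    e_local = kappa * cycles * cpu_hz ** 2
    t_local = cycles / cpu_hz
    e_off = tx_power_w * off_bits / uplink_bps
    t_off = off_bits / uplink_bps
    harvested = eh_efficiency * rx_power_w * eh_time_s
    consumed = e_local + e_off
    return task.bits / consumed, max(t_local, t_off), consumed - harvested


if __name__ == "__main__":
    ee, latency, net = evaluate(Task(1e6, 1000), 0.5, 1e9, 0.1, 1e6)
    print(f"EE={ee:.3g} bit/J, latency={latency:.3g} s, net energy={net:.3g} J")
```

Raising `cpu_hz` shortens local latency but increases energy quadratically, which is the tension a joint optimization over frequency, power, task split, and rate has to resolve under a latency constraint.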
Network-assisted full-duplex (NAFD) distributed massive multiple-input multiple-output (M-MIMO) enables in-band full-duplex operation with existing half-duplex devices at the network level, which significantly improves spectral efficiency. This paper analyzes the impact of low-resolution analog-to-digital converters (ADCs) on NAFD distributed M-MIMO and designs an efficient bit allocation algorithm for low-resolution ADCs. A beamforming training mechanism relieves the heavy pilot overhead of channel estimation and remarkably enhances system performance by guiding interference cancellation and coherent detection. Furthermore, closed-form expressions for spectral and energy efficiency with low-resolution ADCs are derived. The multi-objective optimization problem (MOOP) for spectral and energy efficiency is solved by a deep Q-network and the non-dominated sorting genetic algorithm II (NSGA-II). The simulation results corroborate the theoretical derivation and verify the effectiveness of introducing low-resolution ADCs in NAFD distributed M-MIMO systems. Meanwhile, the resulting set of Pareto-optimal solutions for ADC resolution provides flexible guidelines for deploying practical NAFD distributed M-MIMO systems.
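A common way to model a b-bit ADC is the additive quantization noise model (AQNM), whose distortion factor `rho(b)` takes commonly quoted tabulated values for b <= 5 and is approximately `(pi*sqrt(3)/2) * 2**(-2b)` beyond that. The sketch below uses it to score each resolution with a toy single-link SE and EE and then keeps the non-dominated configurations, i.e., the first front that the non-dominated sorting in the non-dominated sorting genetic algorithm II would produce. The SINR expression, the ADC power model, and all constants are illustrative assumptions, not the paper's closed-form expressions.

```python
import math

# AQNM distortion factors for b = 1..5 bits, as commonly quoted for a
# non-uniform (Lloyd-Max) quantizer in the low-resolution ADC literature.
RHO_TABLE = {1: 0.3634, 2: 0.1175, 3: 0.03454, 4: 0.009497, 5: 0.002499}


def rho(bits: int) -> float:
    """Distortion factor rho(b); high-resolution approximation for b > 5."""
    if bits in RHO_TABLE:
        return RHO_TABLE[bits]
    return math.pi * math.sqrt(3) / 2 * 2.0 ** (-2 * bits)


def se_with_adc(bits: int, snr: float) -> float:
    """Toy single-link SE under AQNM: SINR = (1 - rho) * snr / (1 + rho * snr)."""
    r = rho(bits)
    return math.log2(1.0 + (1.0 - r) * snr / (1.0 + r * snr))


def ee_with_adc(bits: int, snr: float, bandwidth_hz: float = 1e6, fixed_w: float = 0.5,
                fom: float = 494e-15, fs_hz: float = 1e9, n_adc: int = 64) -> float:
    """Toy EE in bit/J: rate over fixed power plus ADC power n_adc * FOM * fs * 2^b."""
    power = fixed_w + n_adc * fom * fs_hz * 2 ** bits
    return bandwidth_hz * se_with_adc(bits, snr) / power


def pareto_front(points):
    """Indices of points not dominated when every coordinate is maximized."""
    def dominates(a, b):
        return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))
    return [i for i, p in enumerate(points)
            if not any(dominates(q, p) for j, q in enumerate(points) if j != i)]


if __name__ == "__main__":
    snr = 10 ** (10 / 10)  # 10 dB
    scores = [(se_with_adc(b, snr), ee_with_adc(b, snr)) for b in range(1, 9)]
    for i in pareto_front(scores):
        print(f"b={i + 1}: SE={scores[i][0]:.3f} bit/s/Hz, EE={scores[i][1]:.3e} bit/J")
```

SE rises monotonically with resolution while ADC power grows as `2**b`, so the Pareto front typically spans from the EE-maximizing resolution up to the highest one considered.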
Voice conversion for highly expressive speech is challenging. Current approaches struggle to balance speaker similarity, intelligibility, and expressiveness. To address this problem, we propose Expressive-VC, a novel end-to-end voice conversion framework that leverages the advantages of both the neural bottleneck feature (BNF) approach and the information perturbation approach. Specifically, we use a BNF encoder and a Perturbed-Wav encoder to form a content extractor that learns linguistic and para-linguistic features, respectively, where the BNFs come from a robust pre-trained ASR model and the perturbed waveform becomes speaker-irrelevant after signal perturbation. We further fuse the linguistic and para-linguistic features through an attention mechanism that uses speaker-dependent prosody features as the attention query; these features are produced by a prosody encoder whose inputs are the target speaker embedding and the normalized pitch and energy of the source speech. Finally, the decoder consumes the integrated features and the speaker-dependent prosody features to generate the converted speech. Experiments demonstrate that Expressive-VC is superior to several state-of-the-art systems, achieving both high expressiveness captured from the source speech and high speaker similarity to the target speaker while maintaining intelligibility.
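The fusion step can be pictured as attention in which, at each frame, the prosody feature is the query and the BNF and Perturbed-Wav features are the keys and values. The sketch below is a minimal, assumption-laden version: single-head scaled dot-product attention over just these two features, with no learned query/key/value projections and with all features sharing one dimension. Expressive-VC's actual attention layer is not specified here, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Sequence[float], keys: Sequence[Sequence[float]],
           values: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """Scaled dot-product attention for one frame; returns (fused vector, weights)."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = softmax(scores)
    fused = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
    return fused, weights


def fuse_sequence(prosody, bnf, perturbed) -> List[List[float]]:
    """Per frame, the prosody feature queries the BNF and Perturbed-Wav features.

    Keys and values are the raw features (no learned projections), so all three
    sequences must have the same frame count and feature dimension.
    """
    return [attend(p, [b, w], [b, w])[0] for p, b, w in zip(prosody, bnf, perturbed)]
```

The attention weights decide, frame by frame, how much of the linguistic (BNF) versus para-linguistic (Perturbed-Wav) content flows into the fused representation, steered by the speaker-dependent prosody query.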
To address the poor performance of distributed operation and the non-scalability of centralized operation in traditional cell-free massive multiple-input multiple-output (MIMO), we propose a cell-free distributed collaborative (CFDC) massive MIMO system based on a novel two-layer model, which takes advantage of the distributed cloud-edge-end collaborative architecture in beyond 5G (B5G) Internet of Things (IoT) environments to provide strong flexibility and scalability. We further utilize the proposed CFDC massive MIMO system to support low-altitude three-dimensional (3-D) coverage with unmanned aerial vehicles (UAVs), accounting for 3-D Rician channel estimation, user-centric association, and different scalable receiving schemes. Since coexisting UAVs and ground users (GUEs) cause stronger interference, we adopt a user-centric association strategy and minimum mean-square error (MMSE) channel state information (CSI) estimation to obtain the estimated CSI of UAVs and GUEs. In the CFDC scenarios, scalable receiving schemes such as maximum ratio combining (MRC), partial zero-forcing (P-ZF), and partial minimum mean-square error (P-MMSE) can be performed at edge servers, and closed-form expressions for the uplink spectral efficiency (SE) are derived. Based on the derived expressions, we propose an efficient power control algorithm that uses a Deep Q-Network (DQN) to solve a multi-objective optimization problem (MOOP) simultaneously maximizing the average SE of UAVs and that of GUEs. Numerical results verify the accuracy of the derived closed-form expressions and the effectiveness of the transmission scheme for coexisting UAVs and GUEs in CFDC massive MIMO systems. The SE analysis under various system parameters offers considerable flexibility for system optimization.
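For a single AP-user link with a known line-of-sight (LoS) component, the MMSE channel estimate has a simple closed form; because the channel and noise are jointly Gaussian here, the MMSE estimator is linear. The sketch below assumes orthogonal pilots of length `tau`, pilot power `p`, unit noise variance, a known LoS term `h_los`, and a scattered component with variance `beta`, and it picks serving APs by the largest large-scale fading. These are illustrative assumptions, not the paper's exact 3-D channel model or association rule, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def lmmse_estimate(y: complex, h_los: complex, beta: float, p: float, tau: int) -> complex:
    """MMSE estimate of h = h_los + g, g ~ CN(0, beta), from y = sqrt(tau*p)*h + n, n ~ CN(0, 1)."""
    a = math.sqrt(tau * p)
    gain = a * beta / (tau * p * beta + 1.0)  # Cov(h, y) / Var(y)
    return h_los + gain * (y - a * h_los)


def estimation_error_variance(beta: float, p: float, tau: int) -> float:
    """Mean-square error of the estimate above: beta / (tau*p*beta + 1)."""
    return beta / (tau * p * beta + 1.0)


def user_centric_aps(large_scale: Sequence[float], n_serving: int) -> List[int]:
    """Indices of the n_serving APs with the strongest large-scale fading to this user."""
    ranked = sorted(range(len(large_scale)), key=lambda m: large_scale[m], reverse=True)
    return sorted(ranked[:n_serving])
```

A longer pilot or higher pilot power shrinks the error variance toward zero, while the LoS term is kept as prior knowledge and only the scattered part is estimated from the pilot observation.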
In this paper, we investigate the content popularity prediction problem in cache-enabled fog radio access networks (F-RANs). To predict content popularity with high accuracy and low complexity, we propose a Gaussian process based regressor to model the content request pattern. First, our proposed model captures the relationship between content features and popularity. Then, we use Bayesian learning, which is robust to overfitting, to train the model parameters. However, the posterior distribution in Bayesian methods usually has no closed-form expression. To tackle this issue, we apply a stochastic variance reduced gradient Hamiltonian Monte Carlo (SVRG-HMC) method to approximate the posterior distribution. To utilize the computing resources of other fog access points (F-APs) and to reduce the communication overhead, we propose a quantized federated learning (FL) framework combined with Bayesian learning. The quantized federated Bayesian learning framework allows each F-AP to send gradients to the cloud server after quantizing and encoding them, which effectively trades off prediction accuracy against communication overhead. Simulation results show that our proposed policy outperforms existing policies.
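The quantize-then-encode step can be illustrated with unbiased stochastic uniform quantization: each gradient entry is scaled by the largest magnitude and randomly rounded to one of `levels` grid points so that its expected value is unchanged. This QSGD-like scheme (using the max-magnitude scale rather than the L2 norm) and the naive fixed-length bit count are assumptions standing in for whatever quantizer and encoder the paper uses; the function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def stochastic_quantize(grad: Sequence[float], levels: int, rng: random.Random) -> List[float]:
    """Unbiased stochastic quantization onto the grid {0, 1/levels, ..., 1} * scale (with sign)."""
    scale = max((abs(g) for g in grad), default=0.0)
    if scale == 0.0:
        return [0.0] * len(grad)
    out = []
    for g in grad:
        r = abs(g) / scale * levels  # position on the grid, in [0, levels]
        low = math.floor(r)
        level = low + (1 if rng.random() < r - low else 0)  # round up w.p. r - low
        out.append(math.copysign(scale * level / levels, g))
    return out


def bits_per_upload(dim: int, levels: int, scale_bits: int = 32) -> int:
    """Naive fixed-length code: one float scale, plus a sign bit and a level index per entry."""
    return scale_bits + dim * (1 + math.ceil(math.log2(levels + 1)))
```

Rounding up with probability `r - floor(r)` makes the expected quantized value equal to the original entry, so averaging many F-AP uploads does not bias the gradient, while fewer `levels` cut the upload size at the cost of higher variance.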
Building a high-quality singing corpus for a person who is not good at singing is non-trivial, which makes it challenging to create a singing voice synthesizer for this person. Learn2Sing is dedicated to synthesizing the singing voice of a speaker without his or her singing data by learning from data recorded by others, i.e., the singing teacher. Inspired by the fact that pitch is the key style factor distinguishing singing from speaking voice, the proposed Learn2Sing 2.0 first generates a preliminary acoustic feature with phone-level averaged pitch values, so that training this stage for different styles, i.e., speaking or singing, shares the same conditions except for the speaker information. Then, conditioned on the specific style, a diffusion decoder, accelerated by a fast sampling algorithm during inference, gradually restores the final acoustic feature. During training, to avoid confusion between the information captured by the speaker embedding and the style embedding, mutual information is employed to constrain the learning of the two embeddings. Experiments show that the proposed approach can synthesize high-quality singing voice for the target speaker without singing data using 10 decoding steps.
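The phone-level pitch averaging described above can be sketched as follows, assuming frame-level F0 with unvoiced frames marked as 0 and phone durations given in frames (e.g., from forced alignment). Averaging only voiced frames, leaving fully unvoiced phones at 0, and the function name are assumptions for illustration, not details stated for Learn2Sing 2.0.

```python
from typing import List, Sequence


def phone_level_average_pitch(f0: Sequence[float], durations: Sequence[int]) -> List[float]:
    """Replace frame-level F0 with the mean voiced F0 of each phone segment.

    f0: per-frame pitch in Hz, 0 for unvoiced frames (assumed convention).
    durations: number of frames per phone; must sum to len(f0).
    """
    if sum(durations) != len(f0):
        raise ValueError("phone durations must cover every frame exactly once")
    out: List[float] = []
    start = 0
    for d in durations:
        segment = f0[start:start + d]
        voiced = [x for x in segment if x > 0]
        avg = sum(voiced) / len(voiced) if voiced else 0.0  # unvoiced phone stays 0
        out.extend([avg] * d)
        start += d
    return out
```

Flattening the pitch contour within each phone removes most of the melodic detail that separates singing from speech, which is why the first stage can share its conditions across the two styles while the style-conditioned diffusion decoder restores the final acoustic feature.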
This paper introduces Opencpop, a publicly available high-quality Mandarin singing corpus designed for singing voice synthesis (SVS). The corpus consists of 100 popular Mandarin songs performed by a female professional singer. Audio files are recorded with studio quality at a sampling rate of 44,100 Hz, and the corresponding lyrics and musical scores are provided. All singing recordings have been phonetically annotated with phoneme boundaries and syllable (note) boundaries. To demonstrate the reliability of the released data and to provide a baseline for future research, we built baseline deep neural network-based SVS models and evaluated them with both objective metrics and a subjective mean opinion score (MOS) measure. Experimental results show that the best SVS model trained on our database achieves 3.70 MOS, indicating the reliability of the provided corpus. Opencpop is released to the open-source community WeNet, and the corpus, as well as synthesized demos, can be found on the project homepage.