Abstract:Pinching-antenna systems (PASS) have recently attracted significant attention as a promising architecture for flexible and reconfigurable wireless communications. Despite notable advancements, research on energy efficiency (EE) maximization for PASS remains limited, as existing studies mainly focus on transmit power minimization or rely on a simple power consumption model. This paper evaluates the impact of pinching antenna (PA) activation power on EE maximization in a downlink non-orthogonal multiple access (NOMA) assisted PASS by jointly optimizing PA activation and user power allocation under quality-of-service and transmit power constraints. To tackle the resulting mixed-integer nonlinear programming problem, we develop a two-layer iterative algorithm, where the outer layer performs matching-based PA selection and the inner layer computes a closed-form optimal power allocation solution. Numerical results demonstrate that the proposed solution achieves substantial EE gains over conventional fixed-antenna systems and the considered benchmark schemes, and that it approaches the exhaustive-search upper bound with significantly reduced complexity while exhibiting fast convergence. They also demonstrate the significance of accounting for PA activation power in the EE maximization problem.
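To make the role of PA activation power concrete, the following is a minimal sketch of an EE metric in which every active PA adds a fixed power cost. The linear power model, the function name `energy_efficiency`, and all parameter values are illustrative assumptions; the abstract does not state the power consumption model used in the paper.

```python
def energy_efficiency(rates, tx_power_w, n_active_pa,
                      pa_activation_w=0.05, circuit_w=1.0, amp_efficiency=0.5):
    """Sum rate divided by total consumed power (bit/s/Hz per watt).

    Assumed (hypothetical) power model:
        P_total = P_tx / amp_efficiency + P_circuit + N_active * P_pa
    """
    total_power = (tx_power_w / amp_efficiency
                   + circuit_w
                   + n_active_pa * pa_activation_w)
    return sum(rates) / total_power


# Toy numbers (assumed): two users, 1 W transmit power, four active PAs.
rates = [2.0, 1.0]
ee_with_pa_cost = energy_efficiency(rates, 1.0, 4)
ee_ignoring_pa_cost = energy_efficiency(rates, 1.0, 4, pa_activation_w=0.0)
print(ee_with_pa_cost, ee_ignoring_pa_cost)
```

Under such a model, ignoring the activation term overstates EE, and the overstatement grows with the number of active PAs, which is why the number of activated antennas becomes part of the trade-off.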
Abstract:As a practical physical implementation of pinching-antenna systems, leaky coaxial cable (LCX) enables distributed radiation in more general wireless environments, particularly for lower-frequency applications. In this paper, a leaky-coaxial pinching-antenna system, referred to as the LCX pinching-antenna system, is investigated, and adjustable slot apertures are introduced, such that the slot size can be continuously adjusted rather than being restricted to binary activation. Specifically, the aperture adjustment is modeled as amplitude scaling of the channels induced by the corresponding slots, or equivalently, as power coefficients associated with different slots. Accordingly, analytical results are derived to quantify the performance gain of continuous aperture adjustment over binary slot activation and to reveal the impact of channel coherence on the achievable data rate improvement. Furthermore, static and dynamic time-division multiple access (TDMA) schemes are considered, and the corresponding sum rate maximization problems are formulated and efficiently solved by quadratic transform based optimization, combined with successive convex approximation and alternating updates. Simulation results demonstrate that the proposed design can significantly outperform conventional fixed-antenna systems, traditional LCX schemes, and binary slot activation in terms of both achievable sum rate and outage probability.
Abstract:By leveraging the distributed leakage radiation of leaky coaxial cables (LCXs), the concept of pinching antennas can be generalized from the conventional high-frequency waveguide based architectures to cable based structures in lower-frequency scenarios. This paper investigates an LCX based generalized pinching-antenna system with dual-port feeding. By enabling bidirectional excitation along each cable, the proposed design significantly enhances spatial degrees of freedom. A comprehensive channel model is developed to characterize intra-cable attenuation, bidirectional phase progression, slot based radiation, and wireless propagation. Based on this model, both analog and hybrid beamforming frameworks are studied with the objective of maximizing the minimum achievable data rate. For analog transmission, slot activation, port selection, and power allocation are jointly optimized using matching theory, coalitional games, and bisection based power control. For hybrid transmission, zero-forcing (ZF) digital precoding is incorporated to eliminate inter-user interference, thereby simplifying slot activation and enabling closed-form optimal power allocation. Simulation results demonstrate that dual-port feeding provides notable performance gains over single-port LCX systems and fixed-antenna benchmarks, validating the effectiveness of the proposed beamforming and resource allocation designs under various transmit power levels and cable parameters.




Abstract:Large language models (LLMs) have demonstrated promising performance in both automatic speech recognition (ASR) and text-to-speech (TTS) systems, gradually becoming the mainstream approach. However, most current approaches address these tasks separately rather than through a unified framework. This work aims to integrate the two tasks into one unified model. Although discrete speech tokenization enables joint modeling, its inherent information loss limits performance in both recognition and generation. In this work, we present UniVoice, a unified LLM framework built on continuous representations that seamlessly integrates speech recognition and synthesis within a single model. Our approach combines the strengths of autoregressive modeling for speech recognition with flow matching for high-quality generation. To mitigate the inherent divergence between autoregressive and flow-matching models, we further design a dual attention mechanism, which switches between a causal mask for recognition and a bidirectional attention mask for synthesis. Furthermore, the proposed text-prefix-conditioned speech infilling method enables high-fidelity zero-shot voice cloning. Experimental results demonstrate that our method matches or exceeds current single-task modeling methods in both ASR and zero-shot TTS tasks. This work explores new possibilities for end-to-end speech understanding and generation.
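The switch between a causal mask and a bidirectional mask can be illustrated with a small sketch. The function name `attention_mask` and the task labels `"asr"` and `"tts"` are hypothetical, and the actual UniVoice masking may treat the text prefix or other segments differently.

```python
def attention_mask(seq_len, task):
    """Return a seq_len x seq_len boolean mask; True means "may attend".

    "asr": causal mask, position i attends only to positions j <= i.
    "tts": bidirectional mask, every position attends to every position.
    """
    if task == "asr":
        return [[j <= i for j in range(seq_len)] for i in range(seq_len)]
    if task == "tts":
        return [[True] * seq_len for _ in range(seq_len)]
    raise ValueError(f"unknown task: {task!r}")


causal = attention_mask(3, "asr")
full = attention_mask(3, "tts")
for row in causal:
    print(row)
```

The same transformer weights can then be run under either mask, so the autoregressive recognition path never sees future positions while the flow-matching synthesis path sees the whole sequence.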
Abstract:Currently, zero-shot voice conversion systems are capable of synthesizing the voice of unseen speakers. However, most existing approaches struggle to accurately replicate the speaking style of the source speaker or mimic the distinctive speaking style of the target speaker, thereby limiting the controllability of voice conversion. In this work, we propose Discl-VC, a novel voice conversion framework that disentangles content and prosody information from self-supervised speech representations and synthesizes the target speaker's voice through in-context learning with a flow matching transformer. To enable precise control over the prosody of generated speech, we introduce a mask generative transformer that predicts discrete prosody tokens in a non-autoregressive manner based on prompts. Experimental results demonstrate the superior performance of Discl-VC in zero-shot voice conversion and its remarkable accuracy in prosody control for synthesized speech.
Abstract:Neural speech codecs are essential for advancing text-to-speech (TTS) systems. With the recent success of large language models in text generation, developing high-quality speech tokenizers has become increasingly important. This paper introduces DS-Codec, a novel neural speech codec featuring a dual-stage training framework that switches between mirrored and non-mirrored architectures and is designed to achieve superior speech reconstruction. We conduct extensive experiments and ablation studies to evaluate the effectiveness of our training strategy and compare the performance of the two architectures. Our results show that the mirrored structure significantly enhances the robustness of the learned codebooks, and that the training strategy balances the advantages of the mirrored and non-mirrored structures, leading to improved high-fidelity speech reconstruction.
Abstract:Pinching antennas, as a novel flexible-antenna technology capable of establishing line-of-sight (LoS) connections and effectively mitigating large-scale path loss, have recently attracted considerable research interest. However, the implementation of ideal pinching-antenna systems involves determining and adjusting pinching antennas to an arbitrary position on waveguides, which presents challenges to both practical deployment and related optimization. This paper investigates a practical pinching-antenna system in multi-waveguide scenarios, where pinching antennas are installed at pre-configured discrete positions to serve downlink users with non-orthogonal multiple access (NOMA). To improve system throughput, an optimization problem is formulated by jointly considering waveguide assignment, antenna activation, successive interference cancellation (SIC) decoding order design, and power allocation. By treating waveguide assignment and antenna activation as two coalition-formation games, a novel game-theoretic algorithm is developed, in which the optimal decoding order is derived and incorporated. For power allocation, monotonic optimization and successive convex approximation (SCA) are employed to construct globally optimal and low-complexity solutions, respectively. Simulation results demonstrate that the NOMA-based pinching-antenna system exhibits superior performance compared to the considered benchmark systems, and the proposed solutions provide significant improvement in terms of sum rate and outage probability.
Abstract:Recently, flow matching based speech synthesis has significantly enhanced the quality of synthesized speech while reducing the number of inference steps. In this paper, we introduce SlimSpeech, a lightweight and efficient speech synthesis system based on rectified flow. We build upon an existing speech synthesis method that uses the rectified flow model, modifying its structure to reduce parameters and serve as a teacher model. By refining the reflow operation, we directly derive a smaller model with a straighter sampling trajectory from the larger model, while utilizing distillation techniques to further enhance the model performance. Experimental results demonstrate that our proposed method, with significantly reduced model parameters, achieves performance comparable to larger models with one-step sampling.
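Why a straighter sampling trajectory permits one-step sampling can be seen with a toy Euler integrator. This is a sketch under stated assumptions: the two hand-written velocity fields stand in for a learned model and are not part of SlimSpeech, and `euler_sample`, `straight_field`, and `curved_field` are hypothetical names.

```python
import math


def euler_sample(x0, velocity, n_steps):
    """Integrate dx/dt = velocity(x, t) from t = 0 to t = 1 with Euler steps."""
    x = list(x0)
    dt = 1.0 / n_steps
    for k in range(n_steps):
        v = velocity(x, k * dt)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Straight trajectory: constant velocity, so a single step is already exact.
def straight_field(x, t):
    return [1.5, 2.0]


# Curved trajectory: a quarter-turn rotation, which needs many small steps.
def curved_field(x, t):
    return [-0.5 * math.pi * x[1], 0.5 * math.pi * x[0]]


one_step_straight = euler_sample([0.5, -1.0], straight_field, 1)
ten_step_straight = euler_sample([0.5, -1.0], straight_field, 10)
one_step_curved = euler_sample([1.0, 0.0], curved_field, 1)
many_step_curved = euler_sample([1.0, 0.0], curved_field, 1000)
print(one_step_straight, one_step_curved, many_step_curved)
```

For the constant field, one step and ten steps land on the same endpoint, whereas for the rotating field a single step misses the true endpoint (0, 1) by a wide margin. Reflow aims to move a learned velocity field toward the first regime.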
Abstract:In this work, we develop a specialized dataset aimed at enhancing the evaluation and fine-tuning of large language models (LLMs) specifically for wireless communication applications. The dataset includes a diverse set of multi-hop questions, including true/false and multiple-choice types, spanning difficulty levels from easy to hard. Advanced language models are used for entity extraction and question generation, and rigorous data curation processes are employed to maintain high quality and relevance. Additionally, we introduce a Pointwise V-Information (PVI) based fine-tuning method and provide a detailed theoretical analysis and justification for its use in quantifying the information content of training data; the method yields performance gains of 2.24% and 1.31% over the baselines for the respective models. To demonstrate the effectiveness of the fine-tuned models with the proposed methodologies on practical tasks, we also consider different tasks, including summarizing optimization problems from technical papers and solving mathematical problems related to non-orthogonal multiple access (NOMA), which are generated by using the proposed multi-agent framework. Simulation results show a significant performance gain of 20.9% in the ROUGE-L metric on summarization tasks. We also study the scaling laws of fine-tuning LLMs and the challenges LLMs face in the field of wireless communications, offering insights into their adaptation to wireless communication tasks. This dataset and fine-tuning methodology aim to enhance the training and evaluation of LLMs, contributing to advancements in LLMs for wireless communication research and applications.




Abstract:In this letter, a non-orthogonal multiple access (NOMA) assisted downlink pinching-antenna system is investigated, where multiple pinching antennas can be activated at pre-configured positions along a dielectric waveguide to serve users via NOMA. In particular, the objective of this letter is to study at what locations and how many pinching antennas should be activated in order to maximize the system throughput. To this end, a sum rate maximization problem with antenna activation is formulated. With the help of matching theory, the formulated problem can be recast as a one-sided one-to-one matching, for which a low-complexity algorithm is developed. Simulation results indicate that the considered NOMA assisted pinching-antenna system can outperform conventional fixed-antenna systems in terms of sum rate, and the proposed matching based antenna activation algorithm yields a significant performance gain over the considered benchmarks.