Mohammad Soleymani

DiTaiListener: Controllable High Fidelity Listener Video Generation with Diffusion

Apr 05, 2025

Maksim Siniukov, Di Chang, Minh Tran, Hongkun Gong, Ashutosh Chaubey, Mohammad Soleymani

Abstract: Generating naturalistic and nuanced listener motions for extended interactions remains an open problem. Existing methods often rely on low-dimensional motion codes for facial behavior generation followed by photorealistic rendering, limiting both visual fidelity and expressive richness. To address these challenges, we introduce DiTaiListener, powered by a video diffusion model with multimodal conditions. Our approach first generates short segments of listener responses conditioned on the speaker's speech and facial motions with DiTaiListener-Gen. It then refines the transitional frames via DiTaiListener-Edit for a seamless transition. Specifically, DiTaiListener-Gen adapts a Diffusion Transformer (DiT) for the task of listener head portrait generation by introducing a Causal Temporal Multimodal Adapter (CTM-Adapter) to process speakers' auditory and visual cues. CTM-Adapter integrates speakers' input in a causal manner into the video generation process to ensure temporally coherent listener responses. For long-form video generation, we introduce DiTaiListener-Edit, a transition refinement video-to-video diffusion model. The model fuses video segments into smooth and continuous videos, ensuring temporal consistency in facial expressions and image quality when merging short video segments produced by DiTaiListener-Gen. Quantitatively, DiTaiListener achieves state-of-the-art performance on benchmark datasets in both the photorealism (+73.8% in FID on RealTalk) and motion representation (+6.1% in FD metric on VICO) spaces. User studies confirm the superior performance of DiTaiListener, with the model being the clear preference in terms of feedback, diversity, and smoothness, outperforming competitors by a significant margin.

* Project page: https://havent-invented.github.io/DiTaiListener

The Wideband Analysis of the Impact of I/Q Imbalance on THz Communication

Feb 07, 2025

Dogus Can Sevdiren, Aydin Sezgin, Mohammad Soleymani

Abstract: The terahertz (THz) band is a promising solution to the increasing data traffic demands of future wireless networks. However, developing transceivers for THz communication is a complex and toilsome task due to the difficulty in designing devices that operate at this frequency and the impact of hardware impairments on performance. This paper investigates the impact of one radio frequency (RF) impairment, in-phase/quadrature imbalance (IQI). To this end, we express an IQI model for the THz-specific array-of-subarrays (AoSA) architecture considering the unique features of THz communication: vast bandwidth, severe power drawdown, and pencil-like beams. We further model the impact of IQI in the power-limited regime in order to investigate the power and ultra-wideband trade-off. To achieve this, we express the spectral efficiency in terms of the wideband slope and the bit energy to noise ratio, which are the two important information-theoretic metrics that reveal the performance of ultra-wideband systems such as THz communication. Our results show that THz systems with IQI have a strict limit in achievable rate although they provide immense spectrum. We also demonstrate with our simulation results that, compared to low frequencies, IQI is a more serious concern in THz links.
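
The rate limit mentioned in the abstract can be illustrated with a common textbook model of receiver IQI, in which the received sample becomes `k1 * x + k2 * conj(x)`. The sketch below is not the paper's AoSA model: the coefficient convention, the function names, and the simplification that noise power stays at 1 are all assumptions made for illustration. It shows that the image term `k2 * conj(x)` grows with the signal, so the SINR saturates at `|k1|^2 / |k2|^2` however large the SNR becomes.

```python
import cmath
import math


def iqi_coefficients(gain, phase_rad):
    """Return (k1, k2) for a textbook receiver IQI model (assumed convention).

    gain = 1 and phase_rad = 0 describe an ideal receiver (k1 = 1, k2 = 0).
    """
    k1 = (1 + gain * cmath.exp(-1j * phase_rad)) / 2
    k2 = (1 - gain * cmath.exp(1j * phase_rad)) / 2
    return k1, k2


def sinr_with_iqi(snr, gain, phase_rad):
    """SINR when the mirrored signal k2 * conj(x) acts as interference.

    Simplifying assumption: noise power is normalized to 1 and is not
    itself reshaped by the imbalance.
    """
    k1, k2 = iqi_coefficients(gain, phase_rad)
    return abs(k1) ** 2 * snr / (abs(k2) ** 2 * snr + 1.0)


def rate_ceiling(gain, phase_rad):
    """Spectral efficiency (bit/s/Hz) that the model approaches as SNR grows."""
    k1, k2 = iqi_coefficients(gain, phase_rad)
    if abs(k2) == 0.0:
        return math.inf  # ideal receiver: no ceiling
    return math.log2(1 + abs(k1) ** 2 / abs(k2) ** 2)


if __name__ == "__main__":
    # Hypothetical imbalance: 5% gain mismatch, 0.05 rad phase mismatch.
    for snr_db in (10, 30, 60, 90):
        snr = 10 ** (snr_db / 10)
        print(snr_db, math.log2(1 + sinr_with_iqi(snr, 1.05, 0.05)))
    print("ceiling", rate_ceiling(1.05, 0.05))
```

Under these assumptions, adding transmit power or bandwidth-normalized SNR beyond a certain point yields almost no extra rate, which is the qualitative behavior the abstract reports.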

A Framework for Fractional Matrix Programming Problems with Applications in FBL MU-MIMO

Feb 03, 2025

Mohammad Soleymani, Eduard Jorswieck, Robert Schober, Lajos Hanzo

Abstract: An efficient framework is conceived for fractional matrix programming (FMP) optimization problems (OPs), namely for minimization and maximization. In each generic OP, either the objective or the constraints are functions of multiple arbitrary continuous-domain fractional functions (FFs). This ensures the framework's versatility, enabling it to solve a broader range of OPs than classical FMP solvers, like Dinkelbach-based algorithms. Specifically, the generalized Dinkelbach algorithm can only solve multiple-ratio FMP problems. By contrast, our framework solves OPs associated with a sum or product of multiple FFs as the objective or constraint functions. Additionally, our framework provides a single-loop solution, while most FMP solvers require twin-loop algorithms. Many popular performance metrics of wireless communications are FFs. For instance, latency has a fractional structure, and minimizing the sum delay leads to an FMP problem. Moreover, the mean square error (MSE) and energy efficiency (EE) metrics have fractional structures. Thus, optimizing EE-related metrics such as the sum or geometric mean of EEs and enhancing the metrics related to the spectral-versus-energy-efficiency tradeoff yield FMP problems. Furthermore, both the signal-to-interference-plus-noise ratio and the channel dispersion are FFs. In this paper, we also develop resource allocation schemes for multi-user multiple-input multiple-output (MU-MIMO) systems, using finite block length (FBL) coding, demonstrating attractive practical applications of FMP by optimizing the aforementioned metrics.
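
For context, the classical Dinkelbach iteration that the abstract uses as its point of comparison can be sketched on a scalar toy problem. This is the baseline technique, not the paper's framework. The problem (maximize the energy efficiency `log2(1 + h*p) / (p + p_circuit)` of a single link over the transmit power `p`) and all names and numbers are assumptions chosen so that the inner problem has a closed-form solution.

```python
import math


def dinkelbach_ee(h, p_circuit, p_max, tol=1e-9, max_iter=100):
    """Classical single-ratio Dinkelbach iteration on a toy EE problem.

    Maximizes log2(1 + h * p) / (p + p_circuit) over 0 <= p <= p_max.
    Returns (p, ee) where ee is the ratio evaluated at p.
    """
    lam = 0.0  # current estimate of the optimal ratio
    p = p_max
    for _ in range(max_iter):
        # Inner problem: maximize log2(1 + h*p) - lam * (p + p_circuit).
        # The objective is concave in p, so the stationary point
        # p = 1 / (lam * ln 2) - 1 / h, clipped to [0, p_max], is optimal.
        if lam <= 0.0:
            p = p_max
        else:
            p = min(max(1.0 / (lam * math.log(2)) - 1.0 / h, 0.0), p_max)
        rate = math.log2(1 + h * p)
        gap = rate - lam * (p + p_circuit)  # value of the inner problem
        lam = rate / (p + p_circuit)        # outer update of the ratio
        if gap < tol:
            break
    return p, lam


if __name__ == "__main__":
    # Hypothetical channel gain, circuit power, and power budget.
    print(dinkelbach_ee(h=10.0, p_circuit=1.0, p_max=10.0))
```

The outer update of `lam` wrapped around an inner maximization is the twin-loop structure the abstract refers to; here the inner loop collapses to one line only because the toy problem is scalar.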

X-Dyna: Expressive Dynamic Human Image Animation

Jan 20, 2025

Di Chang, Hongyi Xu, You Xie, Yipeng Gao, Zhengfei Kuang, Shengqu Cai, Chenxu Zhang, Guoxian Song, Chao Wang, Yichun Shi(+5 more)

Abstract: We introduce X-Dyna, a novel zero-shot, diffusion-based pipeline for animating a single human image using facial expressions and body movements derived from a driving video, that generates realistic, context-aware dynamics for both the subject and the surrounding environment. Building on prior approaches centered on human pose control, X-Dyna addresses key shortcomings causing the loss of dynamic details, enhancing the lifelike qualities of human video animations. At the core of our approach is the Dynamics-Adapter, a lightweight module that effectively integrates reference appearance context into the spatial attentions of the diffusion backbone while preserving the capacity of motion modules in synthesizing fluid and intricate dynamic details. Beyond body pose control, we connect a local control module with our model to capture identity-disentangled facial expressions, facilitating accurate expression transfer for enhanced realism in animated scenes. Together, these components form a unified framework capable of learning physical human motion and natural scene dynamics from a diverse blend of human and scene videos. Comprehensive qualitative and quantitative evaluations demonstrate that X-Dyna outperforms state-of-the-art methods, creating highly lifelike and expressive animations. The code is available at https://github.com/bytedance/X-Dyna.

* Project page: https://x-dyna.github.io/xdyna.github.io/ Code: https://github.com/bytedance/X-Dyna Model: https://huggingface.co/Boese0601/X-Dyna

Towards a Generalizable Speech Marker for Parkinson's Disease Diagnosis

Jan 07, 2025

Maksim Siniukov, Ellie Xing, Sanaz Attaripour Isfahani, Mohammad Soleymani

Abstract: Parkinson's Disease (PD) is a neurodegenerative disorder characterized by motor symptoms, including altered voice production in the early stages. Early diagnosis is crucial not only to improve PD patients' quality of life but also to enhance the efficacy of potential disease-modifying therapies during early neurodegeneration, a window often missed by current diagnostic tools. In this paper, we propose a more generalizable approach to PD recognition through domain adaptation and self-supervised learning. We demonstrate the generalization capabilities of the proposed approach across diverse datasets in different languages. Our approach leverages HuBERT, a large deep neural network originally trained for speech recognition, and further trains it in a self-supervised manner on unlabeled speech data from a population that is similar to the target group, i.e., the elderly. The model is then fine-tuned and adapted for use across different datasets in multiple languages, including English, Italian, and Spanish. Evaluations on four publicly available PD datasets demonstrate the model's efficacy, achieving an average specificity of 92.1% and an average sensitivity of 91.2%. This method offers objective and consistent evaluations across large populations, addressing the variability inherent in human assessments and providing a non-invasive, cost-effective and accessible diagnostic option.
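
As a reminder of how the two reported metrics are defined, the sketch below computes sensitivity and specificity from confusion-matrix counts and averages them across datasets. The counts are invented placeholders, not results from the paper, and the unweighted mean across datasets is an assumption about what "average" means here.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of PD cases that are correctly flagged (true positive rate)."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of healthy controls that are correctly cleared (true negative rate)."""
    return true_neg / (true_neg + false_pos)


def unweighted_mean(values):
    """Plain average that gives every dataset the same weight."""
    values = list(values)
    return sum(values) / len(values)


if __name__ == "__main__":
    # Hypothetical per-dataset counts: (TP, FN, TN, FP).
    datasets = [(45, 5, 48, 2), (27, 3, 28, 2), (18, 2, 19, 1)]
    print(unweighted_mean(sensitivity(tp, fn) for tp, fn, _, _ in datasets))
    print(unweighted_mean(specificity(tn, fp) for _, _, tn, fp in datasets))
```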

Rate Splitting Multiple Access for RIS-aided URLLC MIMO Broadcast Channels

Nov 17, 2024

Mohammad Soleymani, Ignacio Santamaria, Eduard Jorswieck, Marco Di Renzo, Robert Schober, Lajos Hanzo

Abstract: The performance of modern wireless communication systems is typically limited by interference. The impact of interference can be even more severe in ultra-reliable and low-latency communication (URLLC) use cases. A powerful tool for managing interference is rate splitting multiple access (RSMA), which encompasses many multiple-access technologies like non-orthogonal multiple access (NOMA), spatial division multiple access (SDMA), and broadcasting. Another effective technology to enhance the performance of URLLC systems and mitigate interference is constituted by reconfigurable intelligent surfaces (RISs). This paper develops RSMA schemes for multi-user multiple-input multiple-output (MIMO) RIS-aided broadcast channels (BCs) based on finite block length (FBL) coding. We show that RSMA and RISs can substantially improve the spectral efficiency (SE) and energy efficiency (EE) of MIMO RIS-aided URLLC systems. Additionally, the gain of employing RSMA and RISs noticeably increases when the reliability and latency constraints are more stringent. Furthermore, RISs impact RSMA differently, depending on the user load. If the system is underloaded, RISs are able to manage the interference sufficiently well, making the gains of RSMA small. However, when the user load is high, RISs and RSMA become synergetic.
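
To make the role of the reliability and latency constraints concrete, the sketch below evaluates the widely used normal approximation for the achievable rate at finite block length on a scalar Gaussian channel. Whether the paper uses exactly this expression is an assumption, and the function names and example numbers are illustrative. Relative to the Shannon rate, the approximation subtracts a penalty that grows as the block length shrinks (tighter latency) or the target error probability falls (higher reliability).

```python
import math
from statistics import NormalDist


def q_inverse(error_prob):
    """Inverse of the Gaussian Q-function, Q(x) = 1 - Phi(x)."""
    return NormalDist().inv_cdf(1.0 - error_prob)


def fbl_rate(sinr, blocklength, error_prob):
    """Normal approximation of the FBL rate in bits per channel use.

    rate ~ log2(1 + sinr) - sqrt(V / n) * Q^{-1}(eps) * log2(e),
    with channel dispersion V = 1 - 1 / (1 + sinr)^2.
    """
    shannon = math.log2(1 + sinr)
    dispersion = 1 - 1 / (1 + sinr) ** 2
    penalty = (math.sqrt(dispersion / blocklength)
               * q_inverse(error_prob) * math.log2(math.e))
    return max(shannon - penalty, 0.0)


if __name__ == "__main__":
    # Hypothetical operating point: SINR of 10 (linear), varying n and eps.
    for n in (100, 200, 400, 1600):
        print(n, fbl_rate(10.0, n, 1e-5))
    for eps in (1e-3, 1e-5, 1e-7):
        print(eps, fbl_rate(10.0, 200, eps))
```

Both the Shannon term and the dispersion depend on the SINR, so interference management enters the rate twice; this is one way to read the abstract's observation that the benefit of RSMA and RISs grows under stricter constraints.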

URLLC Networks enabled by STAR-RIS, Rate Splitting, and Multiple Antennas

Nov 07, 2024

Eduard Jorswieck, Mohammad Soleymani, Ignacio Santamaria, Jesús Gutiérrez

Abstract: The challenges in dense ultra-reliable low-latency communication networks to deliver the required service to multiple devices are addressed by three main technologies: multiple antennas at the base station (MISO), rate splitting multiple access (RSMA) with private and common message encoding, and simultaneously transmitting and reflecting reconfigurable intelligent surfaces (STAR-RIS). Careful resource allocation, encompassing beamforming and RIS optimization, is required to exploit the synergy between the three. We propose an alternating optimization-based algorithm, relying on minorization-maximization. Numerical results show that the achievable second-order max-min rates of the proposed scheme outperform the baselines significantly. MISO, RSMA, and STAR-RIS all contribute to enabling ultra-reliable low-latency communication (URLLC).

* Accepted at 2025 International Conference on Mobile and Miniaturized Terahertz Systems (ICMMTS)

Rate Region of RIS-Aided URLLC Broadcast Channels: Diagonal versus Beyond Diagonal Globally Passive RIS

Oct 28, 2024

Mohammad Soleymani, Alessio Zappone, Eduard Jorswieck, Marco Di Renzo, Ignacio Santamaria

Abstract: We analyze the finite-block-length rate region of wireless systems aided by reconfigurable intelligent surfaces (RISs), treating interference as noise. We consider three nearly passive RIS architectures: locally passive (LP) diagonal (D), globally passive (GP) D, and GP beyond diagonal (BD) RISs. In a GP RIS, the power constraint is applied globally to the whole surface, while some elements may amplify the incident signal locally. The considered RIS architectures provide substantial performance gains compared with systems operating without RIS. GP BD-RIS outperforms LP and GP D-RIS, at the price of increased complexity, as it enlarges the feasible set of allowed solutions. However, the gain provided by BD-RIS decreases with the number of RIS elements. Additionally, deploying RISs provides higher gains as the reliability/latency requirement becomes more stringent.

Energy Efficiency Comparison of RIS Architectures in MISO Broadcast Channels

Aug 08, 2024

Mohammad Soleymani, Ignacio Santamaria, Eduard Jorswieck, Marco Di Renzo, Jesús Gutiérrez

Abstract: In this paper, we develop energy-efficient schemes for multi-user multiple-input single-output (MISO) broadcast channels (BCs), assisted by reconfigurable intelligent surfaces (RISs). To this end, we consider three architectures of RIS: locally passive diagonal (LP-D), globally passive diagonal (GP-D), and globally passive beyond diagonal (GP-BD). In a globally passive RIS, the power of the output signal of the RIS is not greater than its input power, but some RIS elements can amplify the signal. In a locally passive RIS, no element can amplify the incident signal. We show that these RIS architectures can substantially improve energy efficiency (EE) if the static power of the RIS elements is not too high. Moreover, GP-BD RIS, which has a higher complexity and static power than LP-D RIS and GP-D RIS, provides better spectral efficiency, but its EE performance highly depends on the static power consumption and may be worse than that of its diagonal counterparts.

* Accepted at 25th IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC)
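
The difference between the locally and globally passive definitions in the abstract above can be checked numerically for a diagonal RIS. The sketch is a deliberately simplified reading: it tests the global constraint against one given incident signal, whereas the paper may state it in expectation or in another form, and the function names and numbers are assumptions.

```python
def is_locally_passive(coeffs, tol=1e-12):
    """True if no element amplifies: |theta_i| <= 1 for every element."""
    return all(abs(theta) <= 1.0 + tol for theta in coeffs)


def is_globally_passive(coeffs, incident, tol=1e-12):
    """True if total output power does not exceed total input power.

    coeffs holds the diagonal reflection coefficients and incident holds
    the signal arriving at each element (simplifying assumption: one
    fixed incident signal).
    """
    power_in = sum(abs(x) ** 2 for x in incident)
    power_out = sum(abs(theta * x) ** 2 for theta, x in zip(coeffs, incident))
    return power_out <= power_in + tol


if __name__ == "__main__":
    # Hypothetical two-element surface: one element amplifies, one attenuates.
    coeffs = [1.2 + 0j, 0.5j]
    incident = [1.0 + 0j, 1.0 + 0j]
    print(is_locally_passive(coeffs), is_globally_passive(coeffs, incident))
```

Every locally passive configuration also passes the global check, so the globally passive feasible set contains the locally passive one.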

MIMO Capacity Maximization with Beyond-Diagonal RIS

Jun 04, 2024

Ignacio Santamaria, Mohammad Soleymani, Eduard Jorswieck, Jesús Gutiérrez

Abstract: This paper addresses the problem of maximizing the capacity of a multiple-input multiple-output (MIMO) link assisted by a beyond-diagonal reconfigurable intelligent surface (BD-RIS). We maximize the capacity by alternately optimizing the transmit covariance matrix and the BD-RIS scattering matrix, which, according to network theory, should be unitary and symmetric. These constraints make the optimization of BD-RIS more challenging than that of diagonal RIS. To find a stationary point of the capacity, we maximize a sequence of quadratic problems on the manifold of unitary matrices. This leads to an efficient algorithm that always improves the capacity obtained by a diagonal RIS. Through simulation examples, we study the capacity improvement provided by a passive BD-RIS architecture over the conventional RIS model in which the phase shift matrix is diagonal.

* 5 pages, 4 figures
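
The two constraints on the BD-RIS scattering matrix named in the abstract above, unitary and symmetric, can be illustrated with a small construction. The sketch builds a 2x2 matrix as `Q * D * Q^T`, with `Q` a real rotation and `D` a diagonal matrix of unit-modulus phases; this is one standard way to obtain a symmetric unitary matrix and is not the paper's optimization algorithm. All names are illustrative.

```python
import cmath
import math


def matmul(a, b):
    """Multiply two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def conj_transpose(a):
    """Hermitian transpose of a square matrix."""
    n = len(a)
    return [[complex(a[j][i]).conjugate() for j in range(n)] for i in range(n)]


def symmetric_unitary_2x2(angle, phase1, phase2):
    """Return Q * diag(e^{j phase1}, e^{j phase2}) * Q^T with Q a rotation."""
    c, s = math.cos(angle), math.sin(angle)
    q = [[c, -s], [s, c]]
    q_t = [[c, s], [-s, c]]
    d = [[cmath.exp(1j * phase1), 0], [0, cmath.exp(1j * phase2)]]
    return matmul(matmul(q, d), q_t)


def is_symmetric(m, tol=1e-9):
    n = len(m)
    return all(abs(m[i][j] - m[j][i]) <= tol for i in range(n) for j in range(n))


def is_unitary(m, tol=1e-9):
    n = len(m)
    prod = matmul(m, conj_transpose(m))
    return all(abs(prod[i][j] - (1.0 if i == j else 0.0)) <= tol
               for i in range(n) for j in range(n))


if __name__ == "__main__":
    theta = symmetric_unitary_2x2(0.7, 0.3, -1.1)
    print(is_symmetric(theta), is_unitary(theta), abs(theta[0][1]))
```

Setting `angle` to 0 makes the off-diagonal entries vanish and recovers a conventional diagonal phase-shift matrix, so the diagonal RIS is a special case of this family.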
