Department of Control Science and Engineering, Zhejiang University, China
Abstract: This paper investigates projection-free algorithms for stochastic constrained multi-level optimization. In this context, the objective function is a nested composition of several smooth functions, and the decision set is closed and convex. Existing projection-free algorithms for solving this problem suffer from two limitations: 1) they solely focus on the gradient mapping criterion and fail to match the optimal sample complexities in unconstrained settings; 2) their analysis is exclusively applicable to non-convex functions, without considering convex and strongly convex objectives. To address these issues, we introduce novel projection-free variance reduction algorithms and analyze their complexities under different criteria. For the gradient mapping criterion, our complexities improve existing results and match the optimal rates for unconstrained problems. For the widely used Frank-Wolfe gap criterion, we provide theoretical guarantees that align with those for single-level problems. Additionally, by using a stage-wise adaptation, we further obtain complexities for convex and strongly convex functions. Finally, numerical experiments on different tasks demonstrate the effectiveness of our methods.
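As an illustrative aside to the abstract above: the Frank-Wolfe gap used as a criterion can be computed essentially for free during a projection-free step, since the linear minimization oracle already produces the comparison point. The sketch below runs plain Frank-Wolfe on a toy simplex-constrained quadratic; the objective, step-size schedule, and dimensions are our own assumptions for illustration, not the paper's multi-level algorithm.

```python
import numpy as np

def lmo_simplex(grad):
    # Linear minimization oracle over the probability simplex:
    # the minimizer of <grad, v> over the simplex is the vertex at the
    # smallest gradient coordinate, so no projection is ever needed.
    v = np.zeros_like(grad)
    v[np.argmin(grad)] = 1.0
    return v

def fw_step(x, grad, t):
    v = lmo_simplex(grad)
    gap = grad @ (x - v)       # Frank-Wolfe gap at the current iterate
    gamma = 2.0 / (t + 2.0)    # standard diminishing step size
    return x + gamma * (v - x), gap

# Toy problem: minimize f(x) = 0.5 * ||x - y||^2 over the simplex,
# where y itself lies in the simplex, so the optimum is x = y.
y = np.array([0.2, 0.5, 0.3])
x = np.ones(3) / 3
for t in range(2000):
    x, gap = fw_step(x, x - y, t)
```

Note that the gap is a certificate: it upper-bounds the primal suboptimality at each iterate, which is why it serves as a natural stopping criterion for constrained problems.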
Abstract: This paper explores adaptive variance reduction methods for stochastic optimization based on the STORM technique. Existing adaptive extensions of STORM rely on strong assumptions like bounded gradients and bounded function values, or suffer an additional $\mathcal{O}(\log T)$ term in the convergence rate. To address these limitations, we introduce a novel adaptive STORM method that achieves an optimal convergence rate of $\mathcal{O}(T^{-1/3})$ for non-convex functions with our newly designed learning rate strategy. Compared with existing approaches, our method requires weaker assumptions and attains the optimal convergence rate without the additional $\mathcal{O}(\log T)$ term. We also extend the proposed technique to stochastic compositional optimization, obtaining the same optimal rate of $\mathcal{O}(T^{-1/3})$. Furthermore, we investigate the non-convex finite-sum problem and develop another innovative adaptive variance reduction method that achieves an optimal convergence rate of $\mathcal{O}(n^{1/4} T^{-1/2})$, where $n$ represents the number of component functions. Numerical experiments across various tasks validate the effectiveness of our method.
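For context on the STORM recursion this abstract builds on, here is a minimal sketch of its recursive variance-reduced estimator $d_t = \nabla f(x_t;\xi_t) + (1-a)(d_{t-1} - \nabla f(x_{t-1};\xi_t))$ on a toy quadratic; the noise model, fixed step size, and momentum parameter are illustrative assumptions, not the adaptive learning-rate strategy proposed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def stoch_grad(x, xi):
    # Noisy gradient of f(x) = 0.5 * ||x||^2; xi models the sampled noise.
    return x + xi

def storm(x0, eta=0.1, a=0.1, T=500):
    x = x0.copy()
    d = stoch_grad(x, 0.1 * rng.standard_normal(x.shape))
    for _ in range(T):
        x_new = x - eta * d
        xi = 0.1 * rng.standard_normal(x.shape)
        # STORM recursion: evaluate both iterates on the SAME sample xi,
        # so the correction term cancels most of the gradient noise.
        d = stoch_grad(x_new, xi) + (1.0 - a) * (d - stoch_grad(x, xi))
        x = x_new
    return x

x_final = storm(np.ones(5))
```

The key design point is that, unlike classical momentum, the correction term reuses one sample at two consecutive iterates, which is what yields variance reduction without large batches.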
Abstract: Cell-free massive multi-input multi-output (MIMO) has recently attracted much attention owing to its potential to deliver uniform service quality. However, the adoption of a cell-free architecture raises concerns about the high implementation costs associated with deploying numerous distributed access points (APs) and installing the fronthaul network. To ensure the sustainability of next-generation wireless networks, it is crucial to improve cost-effectiveness alongside achieving high performance. To this end, we conduct a cost analysis of cell-free massive MIMO and build a unified model with varying numbers of antennas per AP. Our objective is to explore whether employing multi-antenna APs can reduce system costs while maintaining performance. The analysis and evaluation identify a cost-effective design for cell-free massive MIMO, providing valuable insights for practical implementation.
Abstract: Sign stochastic gradient descent (signSGD) is a communication-efficient method that transmits only the sign of stochastic gradients for parameter updates. Existing literature has demonstrated that signSGD can achieve a convergence rate of $\mathcal{O}(d^{1/2}T^{-1/4})$, where $d$ represents the dimension and $T$ is the number of iterations. In this paper, we improve this convergence rate to $\mathcal{O}(d^{1/2}T^{-1/3})$ by introducing the Sign-based Stochastic Variance Reduction (SSVR) method, which employs variance reduction estimators to track gradients and uses their signs for the updates. For finite-sum problems, our method can be further enhanced to achieve a convergence rate of $\mathcal{O}(m^{1/4}d^{1/2}T^{-1/2})$, where $m$ denotes the number of component functions. Furthermore, we investigate the heterogeneous majority vote in distributed settings and introduce two novel algorithms that attain improved convergence rates of $\mathcal{O}(d^{1/2}T^{-1/2} + dn^{-1/2})$ and $\mathcal{O}(d^{1/4}T^{-1/4})$ respectively, outperforming the previous results of $\mathcal{O}(dT^{-1/4} + dn^{-1/2})$ and $\mathcal{O}(d^{3/8}T^{-1/8})$, where $n$ represents the number of nodes. Numerical experiments across different tasks validate the effectiveness of our proposed methods.
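To make the signSGD baseline concrete, the following minimal sketch runs sign-based updates on a toy quadratic, showing why only one bit per coordinate needs to be communicated; the noise model and step size are illustrative assumptions and do not reproduce the SSVR estimator or the majority-vote algorithms described above.

```python
import numpy as np

rng = np.random.default_rng(1)

def sign_sgd(x0, eta=0.01, T=1000):
    # signSGD: each update uses only the sign of a noisy gradient, so a
    # distributed worker would need to send just one bit per coordinate.
    x = x0.copy()
    for _ in range(T):
        g = x + 0.1 * rng.standard_normal(x.shape)  # noisy grad of 0.5*||x||^2
        x = x - eta * np.sign(g)
    return x

x_out = sign_sgd(np.ones(4))
```

Because the update magnitude is fixed at eta per coordinate, the iterates settle into a small band around the optimum rather than converging exactly, which is the usual trade-off of sign-based compression.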
Abstract: Cell-free massive multi-input multi-output (MIMO) has recently attracted considerable attention due to its high potential in sixth-generation (6G) wireless systems. The goal of this paper is to first present a unified model for massive MIMO, encompassing both cellular and cell-free architectures with a variable number of antennas per access point. We derive signal transmission models and achievable spectral efficiency in both the downlink and uplink using zero-forcing and maximal-ratio schemes. We also provide performance comparisons in terms of per-user and sum spectral efficiency.
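As a side illustration of the zero-forcing and maximal-ratio schemes compared above, this sketch builds both precoders for a toy Rayleigh-fading downlink; the dimensions and channel model are our own assumptions, not the paper's unified model.

```python
import numpy as np

rng = np.random.default_rng(2)

# Downlink toy setup: K single-antenna users served by one M-antenna AP.
M, K = 8, 4
H = (rng.standard_normal((K, M)) + 1j * rng.standard_normal((K, M))) / np.sqrt(2)

# Maximal-ratio precoder: conjugate transpose of the channel, which
# maximizes the desired signal power but ignores inter-user interference.
W_mr = H.conj().T

# Zero-forcing precoder: right pseudo-inverse of the channel, which
# nulls inter-user interference at the cost of some power efficiency.
W_zf = H.conj().T @ np.linalg.inv(H @ H.conj().T)

# Effective channels seen by the users: ZF yields the identity (no cross
# terms), while MR leaves residual inter-user interference off-diagonal.
E_mr = H @ W_mr
E_zf = H @ W_zf
```

This contrast is the essence of the trade-off the paper evaluates: maximal ratio is simple and distributed-friendly, while zero forcing suppresses interference at the price of channel inversion.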
Abstract: The intelligent reflecting surface (IRS) is envisioned as a technical enabler for sixth-generation (6G) wireless systems. Its potential lies in delivering high performance while maintaining both power efficiency and cost-effectiveness. Previous studies have primarily focused on point-to-point IRS communications involving a single user; nevertheless, a practical system must serve multiple users simultaneously. The unique characteristics of the IRS, such as non-frequency-selective reflection and the necessity for joint active/passive beamforming, create obstacles to the use of conventional multiple access (MA) techniques. This motivates us to review various MA techniques and clarify their functionalities in the presence of an IRS. Through this paper, we aim to provide researchers with a comprehensive understanding of the challenges and available solutions, offering insights to foster the design of efficient multiple access for IRS-aided systems.
Abstract: Terahertz (THz) frequencies have recently garnered considerable attention due to their potential to offer abundant spectral resources for communication, as well as distinct advantages in sensing, positioning, and imaging. Nevertheless, practical implementation faces challenges stemming from the limited signal transmission distances, primarily caused by severe propagation, absorption, and blockage losses. To address this issue, the current strategy employs ultra-massive multi-input multi-output (UMMIMO) to generate high beamforming gains and thereby extend the transmission range. This paper introduces an alternative solution based on the cell-free massive MIMO (CFmMIMO) architecture, wherein the closest access point is actively selected to reduce the distance, rather than relying solely on a large number of antennas. We compare the two techniques through simulations, and the numerical results confirm that CFmMIMO outperforms UMMIMO in both spectral and energy efficiency at THz frequencies.
Abstract: This paper investigates the challenging problem of learned image compression (LIC) at extremely low bitrates. Previous LIC methods that transmit quantized continuous features often yield blurry and noisy reconstructions due to severe quantization loss, while previous LIC methods based on learned codebooks that discretize the visual space usually give low-fidelity reconstructions, because the limited codewords lack the representation power to capture faithful details. We propose a novel dual-stream framework, HybridFlow, which combines a continuous-feature-based stream and a codebook-based stream to achieve both high perceptual quality and high fidelity at extremely low bitrates. The codebook-based stream benefits from high-quality learned codebook priors to provide quality and clarity in the reconstructed images, while the continuous-feature stream aims to maintain fidelity details. To achieve ultra-low bitrates, we further propose a masked token-based transformer: we transmit only a masked portion of the codeword indices and recover the missing indices through token generation guided by information from the continuous-feature stream. We also develop a bridging correction network that merges the two streams during pixel decoding for final image reconstruction, where the continuous-stream features rectify biases of the codebook-based pixel decoder to restore fidelity details. Experimental results demonstrate superior performance across several datasets at extremely low bitrates, compared with existing single-stream codebook-based or continuous-feature-based LIC methods.
Abstract: We propose a framework for learned image and video compression that uses a generative sparse visual representation (SVR) guided by fidelity-preserving controls. By embedding inputs into a discrete latent space spanned by learned visual codebooks, SVR-based compression transmits integer codeword indices, which is efficient and robust across platforms. However, high-quality (HQ) reconstruction in the decoder relies on intermediate feature inputs from the encoder via direct connections. Due to their prohibitively high transmission costs, previous SVR-based compression methods remove such feature links, resulting in largely degraded reconstruction quality. In this work, we treat the intermediate features as fidelity-preserving control signals that guide conditioned generative reconstruction in the decoder. Instead of discarding or directly transferring these signals, we draw them from a low-quality (LQ) fidelity-preserving alternative input that is sent to the decoder at a very low bitrate. These control signals provide complementary fidelity cues to improve reconstruction, and their quality is determined by the compression rate of the LQ alternative, which can be tuned to trade off bitrate, fidelity, and perceptual quality. Our framework can be conveniently used for both learned image compression (LIC) and learned video compression (LVC). Since SVR is robust against input perturbations, a large portion of the codeword indices of adjacent frames can be identical; by transmitting only the indices that differ, SVR-based LIC and LVC can share a similar processing pipeline. Experiments on standard image and video compression benchmarks demonstrate the effectiveness of our approach.
Abstract: The article explores the intersection of computer vision technology and robotic control, highlighting its importance in various fields such as industrial automation, healthcare, and environmental protection. Computer vision technology, which simulates human visual observation, plays a crucial role in enabling robots to perceive and understand their surroundings, leading to advancements in tasks like autonomous navigation, object recognition, and waste management. By integrating computer vision with robot control, robots gain the ability to interact intelligently with their environment, improving efficiency.