Jack
Abstract:Although diffusion-based real-world image restoration (Real-IR) has achieved remarkable progress, efficiently leveraging ultra-large-scale pre-trained text-to-image (T2I) models and fully exploiting their potential remain significant challenges. To address this issue, we propose ResFlow-Tuner, an image restoration framework based on the state-of-the-art flow matching model, FLUX.1-dev, which integrates unified multi-modal fusion (UMMF) with test-time scaling (TTS) to achieve unprecedented restoration performance. Our approach fully leverages the advantages of the Multi-Modal Diffusion Transformer (MM-DiT) architecture by encoding multi-modal conditions into a unified sequence that guides the synthesis of high-quality images. Furthermore, we introduce a training-free test-time scaling paradigm tailored for image restoration. During inference, this technique dynamically steers the denoising direction through feedback from a reward model (RM), thereby achieving significant performance gains with controllable computational overhead. Extensive experiments demonstrate that our method achieves state-of-the-art performance across multiple standard benchmarks. This work not only validates the powerful capabilities of the flow matching model in low-level vision tasks but, more importantly, proposes a novel and efficient inference-time scaling paradigm suitable for large pre-trained models.
Abstract:In mixed near-field and far-field systems, the nonorthogonality between near-field and far-field channels may cause severe inter-user interference and hence degrade rate performance, when the analog beamforming is designed based on the low-complexity full-array maximum ratio transmission (MRT). To tackle this issue, we propose in this paper an antenna selection-based transmission framework to effectively suppress mixed-field interference without mechanically altering antenna structures. To this end, an optimization problem is formulated to maximize the sum-rate of mixed-field systems, by jointly designing antenna selection and power allocation under the MRT-based analog beamforming. As the problem is non-convex and generally difficult to solve optimally, we first consider a typical two-user scenario to obtain useful insights. Interestingly, we analytically show that the strong mixed-field interference can be substantially mitigated by deactivating only a small portion of antennas, yet without compromising array gains too much. Moreover, an inherent tradeoff is revealed in antenna selection between interference suppression and array-gain enhancement, based on which a suboptimal number of deactivated antennas for achieving the maximum sum-rate is obtained. Next, for the general multi-user case, we develop an efficient penalty dual decomposition (PDD)-based two-layer framework to obtain its high quality solution by using the block coordinate descent (BCD) and successive convex approximation (SCA) techniques. To further reduce the computational complexity, a low-complexity antenna deactivation strategy is proposed capitalizing on an interference suppression criterion. Last, numerical results demonstrate that the proposed scheme achieves a favorable trade-off between interference suppression and array gain loss, hence achieving significant performance gains over various baseline schemes.
Abstract:Look-Up Table based methods have emerged as a promising direction for efficient image restoration tasks. Recent LUT-based methods focus on improving their performance by expanding the receptive field. However, they inevitably introduce extra computational and storage overhead, which hinders their deployment in edge devices. To address this issue, we propose ShiftLUT, a novel framework that attains the largest receptive field among all LUT-based methods while maintaining high efficiency. Our key insight lies in three complementary components. First, Learnable Spatial Shift module (LSS) is introduced to expand the receptive field by applying learnable, channel-wise spatial offsets on feature maps. Second, we propose an asymmetric dual-branch architecture that allocates more computation to the information-dense branch, substantially reducing inference latency without compromising restoration quality. Finally, we incorporate a feature-level LUT compression strategy called Error-bounded Adaptive Sampling (EAS) to minimize the storage overhead. Compared to the previous state-of-the-art method TinyLUT, ShiftLUT achieves a 3.8$\times$ larger receptive field and improves an average PSNR by over 0.21 dB across multiple standard benchmarks, while maintaining a small storage size and inference time. The code is available at: https://github.com/Sailor-t/ShiftLUT .
Abstract:How does the choice of optimization algorithm shape a model's ability to learn features? To address this question for steepest descent methods --including sign descent, which is closely related to Adam --we introduce steepest mirror flows as a unifying theoretical framework. This framework reveals how optimization geometry governs learning dynamics, implicit bias, and sparsity and it provides two explanations for why Adam and AdamW often outperform SGD in fine-tuning. Focusing on diagonal linear networks and deep diagonal linear reparameterizations (a simplified proxy for attention), we show that steeper descent facilitates both saddle-point escape and feature learning. In contrast, gradient descent requires unrealistically large learning rates to escape saddles, an uncommon regime in fine-tuning. Empirically, we confirm that saddle-point escape is a central challenge in fine-tuning. Furthermore, we demonstrate that decoupled weight decay, as in AdamW, stabilizes feature learning by enforcing novel balance equations. Together, these results highlight two mechanisms how steepest descent can aid modern optimization.
Abstract:While modern text-to-image models excel at prompt-based generation, they often lack the fine-grained control necessary for specific user requirements like spatial layouts or subject appearances. Multi-condition control addresses this, yet its integration into Diffusion Transformers (DiTs) is bottlenecked by the conventional ``concatenate-and-attend'' strategy, which suffers from quadratic computational and memory overhead as the number of conditions scales. Our analysis reveals that much of this cross-modal interaction is spatially or semantically redundant. To this end, we propose Position-aligned and Keyword-scoped Attention (PKA), a highly efficient framework designed to eliminate these redundancies. Specifically, Position-Aligned Attention (PAA) linearizes spatial control by enforcing localized patch alignment, while Keyword-Scoped Attention (KSA) prunes irrelevant subject-driven interactions via semantic-aware masking. To facilitate efficient learning, we further introduce a Conditional Sensitivity-Aware Sampling (CSAS) strategy that reweights the training objective towards critical denoising phases, drastically accelerating convergence and enhancing conditional fidelity. Empirically, PKA delivers a 10.0$\times$ inference speedup and a 5.1$\times$ VRAM saving, providing a scalable and resource-friendly solution for high-fidelity multi-conditioned generation.
Abstract:In this paper, we study efficient beam coverage design for multi-antenna systems in both far-field and near-field cases. To reduce the computational complexity of existing sampling-based optimization methods, we propose a new low-complexity yet efficient beam coverage design. To this end, we first formulate a general beam coverage optimization problem to maximize the worst-case beamforming gain over a target region. For the far-field case, we show that the beam coverage design can be viewed as a spatial-frequency filtering problem, where angular coverage can be achieved by weight-shaping in the antenna domain via an inverse FT, yielding an infinite-length weighting sequence. Under the constraint of a finite number of antennas, a surrogate scheme is proposed by directly truncating this sequence, which inevitably introduces a roll-off effect at the angular boundaries, yielding degraded worst-case beamforming gain. To address this issue, we characterize the finite-antenna-induced roll-off effect, based on which a roll-off-aware design with a protective zoom is developed to ensure a flat beamforming-gain profile within the target angular region. Next, we extend the proposed method to the near-field case. Specifically, by applying a first-order Taylor approximation to the near-field channel steering vector (CSV), the two-dimensional (2D) beam coverage design (in both angle and inverse-range) can be transformed into a 2D inverse FT, leading to a low-complexity beamforming design. Furthermore, an inherent near-field range defocusing effect is observed, indicating that sufficiently wide angular coverage results in range-insensitive beam steering. Finally, numerical results demonstrate that the proposed FT-based approach achieves a comparable worst-case beamforming performance with that of conventional sampling-based optimization methods while significantly reducing the computational complexity.
Abstract:In this paper, we study robust beamforming design for near-field physical-layer-security (PLS) systems, where a base station (BS) equipped with an extremely large-scale array (XL-array) serves multiple near-field legitimate users (Bobs) in the presence of multiple near-field eavesdroppers (Eves). Unlike existing works that mostly assume perfect channel state information (CSI) or location information of Eves, we consider a more practical and challenging scenario, where the locations of Bobs are perfectly known, while only imperfect location information of Eves is available at the BS. We first formulate a robust optimization problem to maximize the sum-rate of Bobs while guaranteeing a worst-case limit on the eavesdropping rate under location uncertainty. By transforming Cartesian position errors into the polar domain, we reveal an important near-field angular-error amplification effect: for the same location error, the closer the Eve, the larger the angle error, severely degrading the performance of conventional robust beamforming methods based on imperfect channel state information. To address this issue, we first establish the conditions for which the first-order Taylor approximation of the near-field channel steering vector under location uncertainty is largely accurate. Then, we propose a two-stage robust beamforming method, which first partitions the uncertainty region into multiple fan-shaped sub-regions, followed by the second stage to formulate and solve a refined linear-matrix-inequality (LMI)-based robust beamforming optimization problem. In addition, the proposed method is further extended to scenarios with multiple Bobs and multiple Eves. Finally, numerical results validate that the proposed method achieves a superior trade-off between rate performance and secrecy robustness, hence significantly outperforming existing benchmarks under Eve location uncertainty.
Abstract:This document consolidates publicly reported technical details about Metas Llama 4 model family. It summarizes (i) released variants (Scout and Maverick) and the broader herd context including the previewed Behemoth teacher model, (ii) architectural characteristics beyond a high-level MoE description covering routed/shared-expert structure, early-fusion multimodality, and long-context design elements reported for Scout (iRoPE and length generalization strategies), (iii) training disclosures spanning pre-training, mid-training for long-context extension, and post-training methodology (lightweight SFT, online RL, and lightweight DPO) as described in release materials, (iv) developer-reported benchmark results for both base and instruction-tuned checkpoints, and (v) practical deployment constraints observed across major serving environments, including provider-specific context limits and quantization packaging. The manuscript also summarizes licensing obligations relevant to redistribution and derivative naming, and reviews publicly described safeguards and evaluation practices. The goal is to provide a compact technical reference for researchers and practitioners who need precise, source-backed facts about Llama 4.
Abstract:In this paper, we study efficient codebook design for limited feedback in extremely large-scale multiple-input-multiple-output (XL-MIMO) frequency division duplexing (FDD) systems. It is worth noting that existing codebook designs for XL-MIMO, such as polar-domain codebook, have not well taken into account user (location) distribution in practice, thereby incurring excessive feedback overhead. To address this issue, we propose in this paper a novel and efficient feedback codebook tailored to user distribution. To this end, we first consider a typical scenario where users are uniformly distributed within a specific polar-region, based on which a sum-rate maximization problem is formulated to jointly optimize angle-range samples and bit allocation among angle/range feedback. This problem is challenging to solve due to the lack of a closed-form expression for the received power in terms of angle and range samples. By leveraging a Voronoi partitioning approach, we show that uniform angle sampling is optimal for received power maximization. For more challenging range sampling design, we obtain a tight lower-bound on the received power and show that geometric sampling, where the ratio between adjacent samples is constant, can maximize the lower bound and thus serves as a high-quality suboptimal solution. We then extend the proposed framework to accommodate more general non-uniform user distribution via an alternating sampling method. Furthermore, theoretical analysis reveals that as the array size increases, the optimal allocation of feedback bits increasingly favors range samples at the expense of angle samples. Finally, numerical results validate the superior rate performance and robustness of the proposed codebook design under various system setups, achieving significant gains over benchmark schemes, including the widely used polar-domain codebook.




Abstract:The prior works on near-field target localization have mostly assumed ideal hardware models and thus suffer from two limitations in practice. First, extremely large-scale arrays (XL-arrays) usually face a variety of hardware impairments (HIs) that may introduce unknown phase and/or amplitude errors. Second, the existing block coordinate descent (BCD)-based methods for joint estimation of the HI indicator, channel gain, angle, and range may induce considerable target localization error when the target is very close to the XL-array. To address these issues, we propose in this paper a new three-phase HI-aware near-field localization method, by efficiently detecting faulty antennas and estimating the positions of targets. Specifically, we first determine faulty antennas by using compressed sensing (CS) methods and improve detection accuracy based on coarse target localization. Then, a dedicated phase calibration method is designed to correct phase errors induced by detected faulty antennas. Subsequently, an efficient near-field localization method is devised to accurately estimate the positions of targets based on the full XL-array with phase calibration. Additionally, we resort to the misspecified Cramer-Rao bound (MCRB) to quantify the performance loss caused by HIs. Last, numerical results demonstrate that our proposed method significantly reduces the localization errors as compared to various benchmark schemes, especially for the case with a short target range and/or a high fault probability.