Sherman
Abstract:Burst super-resolution aims to reconstruct high-resolution images with higher quality and richer details by fusing the sub-pixel information from multiple burst low-resolution frames. In BusrtSR, the key challenge lies in extracting the base frame's content complementary sub-pixel details while simultaneously suppressing high-frequency noise disturbance. Existing methods attempt to extract sub-pixels by modeling inter-frame relationships frame by frame while overlooking the mutual correlations among multi-current frames and neglecting the intra-frame interactions, leading to inaccurate and noisy sub-pixels for base frame super-resolution. Further, existing methods mainly employ static upsampling with fixed parameters to improve spatial resolution for all scenes, failing to perceive the sub-pixel distribution difference across multiple frames and cannot balance the fusion weights of different frames, resulting in over-smoothed details and artifacts. To address these limitations, we introduce a novel Query Mamba Burst Super-Resolution (QMambaBSR) network, which incorporates a Query State Space Model (QSSM) and Adaptive Up-sampling module (AdaUp). Specifically, based on the observation that sub-pixels have consistent spatial distribution while random noise is inconsistently distributed, a novel QSSM is proposed to efficiently extract sub-pixels through inter-frame querying and intra-frame scanning while mitigating noise interference in a single step. Moreover, AdaUp is designed to dynamically adjust the upsampling kernel based on the spatial distribution of multi-frame sub-pixel information in the different burst scenes, thereby facilitating the reconstruction of the spatial arrangement of high-resolution details. Extensive experiments on four popular synthetic and real-world benchmarks demonstrate that our method achieves a new state-of-the-art performance.
Abstract:First-person hand-object interaction anticipation aims to predict the interaction process over a forthcoming period based on current scenes and prompts. This capability is crucial for embodied intelligence and human-robot collaboration. The complete interaction process involves both pre-contact interaction intention (i.e., hand motion trends and interaction hotspots) and post-contact interaction manipulation (i.e., manipulation trajectories and hand poses with contact). Existing research typically anticipates only interaction intention while neglecting manipulation, resulting in incomplete predictions and an increased likelihood of intention errors due to the lack of manipulation constraints. To address this, we propose a novel model, PEAR (Phrase-Based Hand-Object Interaction Anticipation), which jointly anticipates interaction intention and manipulation. To handle uncertainties in the interaction process, we employ a twofold approach. Firstly, we perform cross-alignment of verbs, nouns, and images to reduce the diversity of hand movement patterns and object functional attributes, thereby mitigating intention uncertainty. Secondly, we establish bidirectional constraints between intention and manipulation using dynamic integration and residual connections, ensuring consistency among elements and thus overcoming manipulation uncertainty. To rigorously evaluate the performance of the proposed model, we collect a new task-relevant dataset, EGO-HOIP, with comprehensive annotations. Extensive experimental results demonstrate the superiority of our method.
Abstract:Moir\'e patterns are commonly seen when taking photos of screens. Camera devices usually have limited hardware performance but take high-resolution photos. However, users are sensitive to the photo processing time, which presents a hardly considered challenge of efficiency for demoir\'eing methods. To balance the network speed and quality of results, we propose a \textbf{F}ully \textbf{C}onnected en\textbf{C}oder-de\textbf{C}oder based \textbf{D}emoir\'eing \textbf{Net}work (FC3DNet). FC3DNet utilizes features with multiple scales in each stage of the decoder for comprehensive information, which contains long-range patterns as well as various local moir\'e styles that both are crucial aspects in demoir\'eing. Besides, to make full use of multiple features, we design a Multi-Feature Multi-Attention Fusion (MFMAF) module to weigh the importance of each feature and compress them for efficiency. These designs enable our network to achieve performance comparable to state-of-the-art (SOTA) methods in real-world datasets while utilizing only a fraction of parameters, FLOPs, and runtime.
Abstract:Watermarking is broadly utilized to protect ownership of shared data while preserving data utility. However, existing watermarking methods for tabular datasets fall short on the desired properties (detectability, non-intrusiveness, and robustness) and only preserve data utility from the perspective of data statistics, ignoring the performance of downstream ML models trained on the datasets. Can we watermark tabular datasets without significantly compromising their utility for training ML models while preventing attackers from training usable ML models on attacked datasets? In this paper, we propose a hypothesis testing-based watermarking scheme, TabularMark. Data noise partitioning is utilized for data perturbation during embedding, which is adaptable for numerical and categorical attributes while preserving the data utility. For detection, a custom-threshold one proportion z-test is employed, which can reliably determine the presence of the watermark. Experiments on real-world and synthetic datasets demonstrate the superiority of TabularMark in detectability, non-intrusiveness, and robustness.
Abstract:Wireless devices can be easily attacked by jammers during transmission, which is a potential security threat for wireless communications. Active reconfigurable intelligent surface (RIS) attracts considerable attention and is expected to be employed in anti-jamming systems for secure transmission to significantly enhance the anti-jamming performance. However, active RIS introduces external power load, which increases the complexity of hardware and restricts the flexible deployment of active RIS. To overcome these drawbacks, we design a innovative self-sustainable structure in this paper, where the active RIS is energized by harvesting energy from base station (BS) signals through the time dividing based simultaneous wireless information and power transfer (TD-SWIPT) scheme. Based on the above structure, we develop the BS harvesting scheme based on joint transmit and reflecting beamforming with the aim of maximizing the achievable rate of active RIS-assisted system, where the alternating optimization (AO) algorithm based on stochastic successive convex approximation (SSCA) tackles the nonconvex optimization problem in the scheme. Simulation results verified the effectiveness of our developed BS harvesting scheme, which can attain higher anti-jamming performance than other schemes when given the same maximum transmit power.
Abstract:Existing image super-resolution (SR) techniques often fail to generalize effectively in complex real-world settings due to the significant divergence between training data and practical scenarios. To address this challenge, previous efforts have either manually simulated intricate physical-based degradations or utilized learning-based techniques, yet these approaches remain inadequate for producing large-scale, realistic, and diverse data simultaneously. In this paper, we introduce a novel Realistic Decoupled Data Generator (RealDGen), an unsupervised learning data generation framework designed for real-world super-resolution. We meticulously develop content and degradation extraction strategies, which are integrated into a novel content-degradation decoupled diffusion model to create realistic low-resolution images from unpaired real LR and HR images. Extensive experiments demonstrate that RealDGen excels in generating large-scale, high-quality paired data that mirrors real-world degradations, significantly advancing the performance of popular SR models on various real-world benchmarks.
Abstract:Although the emerging reconfigurable intelligent surface (RIS) paves a new way for next-generation wireless communications, it suffers from inherent flaws, i.e., double-fading attenuation effects and half-space coverage limitations. The state-of-the-art double-face active (DFA)-RIS architecture is proposed for significantly amplifying and transmitting incident signals in full-space. Despite the efficacy of DFA-RIS in mitigating the aforementioned flaws, its potential drawback is that the complex active hardware also incurs intolerable energy consumption. To overcome this drawback, in this paper we propose a novel dynamic energy-saving design for the DFA-RIS, called the sub-array based DFA-RIS architecture. This architecture divides the DFA-RIS into multiple sub-arrays, where the signal amplification function in each sub-array can be activated/deactivated dynamically and flexibly. Utilizing the above architecture, we develop the joint optimization scheme based on transmit beamforming, DFA-RIS configuration, and reflection amplifier (RA) operating pattern to maximize the energy efficiency (EE) of the DFA-RIS assisted multiuser MISO system considering the perfect/imperfect channel state information (CSI) case. Then, the penalty dual decomposition (PDD) based alternating optimization (AO) algorithm and the constrained stochastic majorization-minimization (CSMM) based AO algorithm address non-convex problems in the perfect/imperfect CSI case, respectively. Simulation results verified that our proposed sub-array based DFA-RIS architecture can benefit the EE of the system more than other RIS architectures.
Abstract:Motivated by the potential of large language models (LLMs) as optimizers for solving combinatorial optimization problems, this paper proposes a novel LLM-assisted optimizer (LLMO) to address adversarial robustness neural architecture search (ARNAS), a specific application of combinatorial optimization. We design the prompt using the standard CRISPE framework (i.e., Capacity and Role, Insight, Statement, Personality, and Experiment). In this study, we employ Gemini, a powerful LLM developed by Google. We iteratively refine the prompt, and the responses from Gemini are adapted as solutions to ARNAS instances. Numerical experiments are conducted on NAS-Bench-201-based ARNAS tasks with CIFAR-10 and CIFAR-100 datasets. Six well-known meta-heuristic algorithms (MHAs) including genetic algorithm (GA), particle swarm optimization (PSO), differential evolution (DE), and its variants serve as baselines. The experimental results confirm the competitiveness of the proposed LLMO and highlight the potential of LLMs as effective combinatorial optimizers. The source code of this research can be downloaded from \url{https://github.com/RuiZhong961230/LLMO}.
Abstract:This paper introduces a novel competitive mechanism into differential evolution (DE), presenting an effective DE variant named competitive DE (CDE). CDE features a simple yet efficient mutation strategy: DE/winner-to-best/1. Essentially, the proposed DE/winner-to-best/1 strategy can be recognized as an intelligent integration of the existing mutation strategies of DE/rand-to-best/1 and DE/cur-to-best/1. The incorporation of DE/winner-to-best/1 and the competitive mechanism provide new avenues for advancing DE techniques. Moreover, in CDE, the scaling factor $F$ and mutation rate $Cr$ are determined by a random number generator following a normal distribution, as suggested by previous research. To investigate the performance of the proposed CDE, comprehensive numerical experiments are conducted on CEC2017 and engineering simulation optimization tasks, with CMA-ES, JADE, and other state-of-the-art optimizers and DE variants employed as competitor algorithms. The experimental results and statistical analyses highlight the promising potential of CDE as an alternative optimizer for addressing diverse optimization challenges.
Abstract:High utility and rigorous data privacy are of the main goals of a federated learning (FL) system, which learns a model from the data distributed among some clients. The latter has been tried to achieve by using differential privacy in FL (DPFL). There is often heterogeneity in clients privacy requirements, and existing DPFL works either assume uniform privacy requirements for clients or are not applicable when server is not fully trusted (our setting). Furthermore, there is often heterogeneity in batch and/or dataset size of clients, which as shown, results in extra variation in the DP noise level across clients model updates. With these sources of heterogeneity, straightforward aggregation strategies, e.g., assigning clients aggregation weights proportional to their privacy parameters will lead to lower utility. We propose Robust-HDP, which efficiently estimates the true noise level in clients model updates and reduces the noise-level in the aggregated model updates considerably. Robust-HDP improves utility and convergence speed, while being safe to the clients that may maliciously send falsified privacy parameter to server. Extensive experimental results on multiple datasets and our theoretical analysis confirm the effectiveness of Robust-HDP. Our code can be found here.