Dataset distillation aims to generate small datasets that retain nearly as much information as large-scale datasets, thereby reducing storage and training costs. Recent state-of-the-art methods mainly constrain the sample generation process by matching synthetic images with the original ones in terms of gradients, embedding distributions, or training trajectories. Although various matching objectives exist, the selection of original images is currently limited to naive random sampling. We argue that random sampling inevitably involves samples near the decision boundaries, which may provide large or noisy matching targets. Besides, random sampling cannot guarantee the evenness and diversity of the sample distribution. Together, these factors lead to large optimization oscillations and degrade the matching efficiency. Accordingly, we propose a novel matching strategy named \textbf{D}ataset distillation by \textbf{RE}present\textbf{A}tive \textbf{M}atching (DREAM), where only representative original images are selected for matching. DREAM can be easily plugged into popular dataset distillation frameworks and reduces the number of matching iterations by a factor of 10 without a performance drop. Given sufficient training time, DREAM further provides significant improvements and achieves state-of-the-art performance.
Well-performing deep neural networks (DNNs) generally require massive labelled data and computational resources for training. Various watermarking techniques have been proposed to protect such intellectual property (IP), wherein DNN providers implant secret information into the model so that they can later claim IP ownership by retrieving the embedded watermarks with dedicated trigger inputs. While promising results are reported in the literature, existing solutions are vulnerable to watermark removal attacks, such as model fine-tuning and model pruning. In this paper, we propose a novel DNN watermarking solution that can effectively defend against these attacks. Our key insight is to strengthen the coupling between the watermark and the model's functionality such that removing the watermark would inevitably degrade the model's performance on normal inputs. To this end, unlike previous methods that rely on secret features learnt from out-of-distribution data, our method only uses features learnt from in-distribution data. Specifically, on the one hand, we propose to sample inputs from the original training dataset and fuse them as watermark triggers. On the other hand, we randomly mask model weights during training so that the information of our embedded watermarks spreads across the network. As a result, fine-tuning or pruning the model does not erase our function-coupled watermarks. Evaluation results on various image classification tasks show a 100\% watermark authentication success rate under aggressive watermark removal attacks, significantly outperforming existing solutions. Code is available at https://github.com/cure-lab/Function-Coupled-Watermark.
In this paper, we consider the inventory management (IM) problem, in which replenishment decisions must be made for a large number of stock keeping units (SKUs) to balance their supply and demand. In our setting, the constraint on shared resources (such as the inventory capacity) couples the otherwise independent control of each SKU. We formulate the problem with this structure as a Shared-Resource Stochastic Game (SRSG) and propose an efficient algorithm called Context-aware Decentralized PPO (CD-PPO). Through extensive experiments, we demonstrate that CD-PPO accelerates the learning procedure compared with standard MARL algorithms.
Recently, learned image compression has achieved remarkable performance. The entropy model, which estimates the distribution of the latent representation, plays an important role in boosting rate-distortion performance. Most entropy models capture correlations in only one dimension. However, the latent representation contains channel-wise, local spatial, and global spatial correlations. To address this issue, we propose the multi-reference entropy models MEM and MEM+ to capture channel, local spatial, and global spatial contexts. We divide the latent representation into slices. When decoding the current slice, we use the previously decoded slices as contexts and use the attention map of the previously decoded slice to predict global correlations in the current slice. To capture local contexts, we propose enhanced checkerboard context capturing, which avoids performance degradation while retaining two-pass decoding. Based on MEM and MEM+, we propose the image compression models MLIC and MLIC+. Extensive experimental evaluations show that MLIC and MLIC+ achieve state-of-the-art performance, reducing BD-rate by 9.77% and 13.09%, respectively, on the Kodak dataset over VVC when measured in PSNR.
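The two-pass decoding mentioned above can be illustrated with a plain checkerboard split of the spatial positions. The following is a minimal sketch only: it shows the basic checkerboard pattern, not the enhanced variant proposed in the abstract, and the function names and the choice of which parity acts as the anchor set are assumptions made for illustration.

```python
def checkerboard_masks(height, width):
    """Split an H x W grid into anchor and non-anchor positions.

    Assumption: positions with even (row + col) are anchors, decoded in
    pass 1; the remaining positions are decoded in pass 2.
    """
    anchor = [[(r + c) % 2 == 0 for c in range(width)] for r in range(height)]
    non_anchor = [[not v for v in row] for row in anchor]
    return anchor, non_anchor


def in_bounds_neighbours(r, c, height, width):
    """Return the in-bounds 4-neighbours (up, down, left, right) of (r, c)."""
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    return [
        (r + dr, c + dc)
        for dr, dc in offsets
        if 0 <= r + dr < height and 0 <= c + dc < width
    ]


if __name__ == "__main__":
    anchor, non_anchor = checkerboard_masks(4, 4)
    for row in anchor:
        # 'A' marks pass-1 (anchor) positions, '.' marks pass-2 positions.
        print("".join("A" if v else "." for v in row))
```

In this layout every 4-neighbour of a non-anchor position is an anchor, so the second pass can condition on local context that is already decoded.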
In the context of cell-free massive multi-input multi-output (CFmMIMO), zero-forcing precoding (ZFP) is superior in terms of spectral efficiency. However, it suffers from channel aging owing to fronthaul and processing delays. In this paper, we propose a robust scheme coined delay-tolerant zero-forcing precoding (DT-ZFP), which exploits deep learning-aided channel prediction to alleviate the effect of outdated channel state information (CSI). A predictor consisting of a bank of user-specific predictive modules is specifically designed for such a multi-user scenario. Leveraging the degree of freedom brought by the prediction horizon, the delivery of CSI and precoded data through a fronthaul network and the transmission of user data and pilots over an air interface can be parallelized. Therefore, DT-ZFP not only effectively combats channel aging but also avoids the inefficient Stop-and-Wait mechanism of the canonical ZFP in CFmMIMO.
This letter aims to clarify the impact of channel aging and phase noise on the performance of intelligent reflecting surface-aided wireless systems. We first mathematically model the outdated channel state information (CSI) caused by Doppler shifts and the phase noise stemming from hardware impairments. Then, a closed-form expression for the achievable spectral efficiency under noisy and aged CSI is derived. Representative simulation results are presented to demonstrate the performance impact numerically.
Latency is inherent in almost all real-world networked applications. In this paper, we propose a distributed allocation strategy over multi-agent networks with delayed communications. The state of each agent (or node) represents its share of assigned resources out of a fixed amount (equal to the overall demand). Every node locally updates its state to optimize a global allocation cost function using information received from its neighbouring nodes, even when the data exchange over the network is heterogeneously delayed at different links. The update is based on the alternating direction method of multipliers (ADMM) formulation subject to both a sum-preserving coupling constraint and local box constraints. The solution is derivative-free and holds for general (not necessarily differentiable) convex cost models. We use the notion of augmented consensus over undirected networks to model the delayed information exchange and to analyze convergence. We simulate our \textit{delay-tolerant} algorithm for
KDD CUP 2022 proposes a time-series forecasting task on a spatial dynamic wind power dataset, in which participants are required to predict future power generation given historical context factors. The evaluation metrics are RMSE and MAE. This paper describes the solution of Team 88VIP, which mainly comprises two types of models: a gradient boosting decision tree to memorize the basic data patterns and a recurrent neural network to capture the deep and latent probabilistic transitions. Ensembling these models helps to handle the fluctuation of wind power, and training separate submodels targets the distinct properties of the heterogeneous forecasting timescales, from minutes to days. In addition, feature engineering, imputation techniques, and the design of the offline evaluation are described in detail. The proposed solution achieves an overall online score of -45.213 in Phase 3.
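For reference, the two metrics named above and a simple way to combine two models' forecasts can be sketched as follows. This is an illustrative sketch, not the team's implementation: the weighted-average ensemble, the function names, and the `weight_a` parameter are assumptions, and the competition's exact scoring formula is not reproduced here.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(
        sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def average_ensemble(pred_a, pred_b, weight_a=0.5):
    """Blend two forecasts point by point (hypothetical weighting scheme)."""
    return [weight_a * a + (1.0 - weight_a) * b for a, b in zip(pred_a, pred_b)]


if __name__ == "__main__":
    # Toy numbers for illustration only; these are not competition data.
    truth = [10.0, 12.0, 9.0]
    tree_forecast = [11.0, 11.0, 10.0]
    rnn_forecast = [9.0, 13.0, 8.0]
    blended = average_ensemble(tree_forecast, rnn_forecast)
    print(rmse(truth, blended), mae(truth, blended))
```

RMSE penalizes large errors more heavily than MAE, which is one reason to report both when the target fluctuates strongly.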
Variance reduction techniques such as SPIDER/SARAH/STORM have been extensively studied to improve the convergence rates of stochastic non-convex optimization. These techniques usually maintain and update a sequence of estimators for a single function across iterations. {\it What if we need to track multiple functional mappings across iterations but only with access to stochastic samples of $\mathcal{O}(1)$ functional mappings at each iteration?} An important application arises in solving an emerging family of coupled compositional optimization problems of the form $\sum_{i=1}^m f_i(g_i(\mathbf{w}))$, where $g_i$ is accessible through a stochastic oracle. The key issue is to track and estimate a sequence of $\mathbf g(\mathbf{w})=(g_1(\mathbf{w}), \ldots, g_m(\mathbf{w}))$ across iterations, where $\mathbf g(\mathbf{w})$ has $m$ blocks and only $\mathcal{O}(1)$ blocks may be probed at each iteration to obtain their stochastic values and Jacobians. To improve the complexity of solving these problems, we propose a novel stochastic method named the Multi-block-Single-probe Variance Reduced (MSVR) estimator to track the sequence of $\mathbf g(\mathbf{w})$. It is inspired by STORM but introduces a customized error correction term to alleviate the noise not only in the stochastic samples for the selected blocks but also in the blocks that are not sampled. With the help of the MSVR estimator, we develop several algorithms for solving the aforementioned compositional problems with improved complexities across a spectrum of settings with non-convex/convex/strongly convex objectives. Our results improve upon prior ones in several aspects, including the order of sample complexities and the dependence on the strong convexity parameter. Empirical studies on multi-task deep AUC maximization demonstrate the better performance of the new estimator.
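To make the multi-block, single-probe setting concrete, the sketch below keeps one estimate per block and refreshes only the sampled block with a plain STORM-style recursion. It is not the MSVR estimator: the customized error correction term described in the abstract is deliberately omitted, and the function name, the linear toy mappings $g_i(w) = a_i w$, and the value of `beta` are assumptions chosen for illustration.

```python
import random


def storm_style_block_update(u, i, g_new, g_old, beta):
    """Return a copy of `u` with only block `i` refreshed.

    u      : list of current estimates, one per block g_i
    g_new  : stochastic value of g_i at the new iterate
    g_old  : value of g_i at the previous iterate, using the same sample
    beta   : momentum-like parameter in (0, 1]

    Plain STORM-style recursion (assumed here, without the MSVR correction):
        u_i <- g_new + (1 - beta) * (u_i - g_old)
    """
    updated = list(u)
    updated[i] = g_new + (1.0 - beta) * (u[i] - g_old)
    return updated


if __name__ == "__main__":
    rng = random.Random(0)
    a = [1.0, 2.0, 3.0]            # toy blocks: g_i(w) = a_i * w (noise-free)
    w_prev = 1.0
    u = [a_i * w_prev for a_i in a]  # start from exact values
    for _ in range(5):
        w = w_prev + 0.1             # stand-in for an optimizer step
        i = rng.randrange(len(a))    # probe a single block per iteration
        u = storm_style_block_update(u, i, a[i] * w, a[i] * w_prev, beta=0.1)
        w_prev = w
    # Blocks that were not probed recently lag behind the true values.
    print(u, [a_i * w_prev for a_i in a])
```

Even in this noise-free toy run, estimates for blocks that were skipped become stale, which is the kind of error in unsampled blocks that the abstract says the MSVR correction term is designed to reduce.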