In Chinese text recognition, to compensate for the insufficient local data and improve the performance of local few-shot character recognition, it is often necessary for one organization to collect a large amount of data from similar organizations. However, due to the natural presence of private information in text data, different organizations are unwilling to share private data, such as addresses and phone numbers. Therefore, it becomes increasingly important to design a privacy-preserving collaborative training framework for the Chinese text recognition task. In this paper, we introduce personalized federated learning (pFL) into the Chinese text recognition task and propose the pFedCR algorithm, which significantly improves the model performance of each client (organization) without sharing private data. Specifically, based on CRNN, to handle the non-iid problem of client data, we add several attention layers to the model and design a two-stage training approach for the client. In addition, we fine-tune the output layer of the model using a virtual dataset on the server, mitigating the problem of character imbalance in Chinese documents. The proposed approach is validated on public benchmarks and two self-built real-world industrial scenario datasets. The experimental results show that the pFedCR algorithm can improve the performance of local personalized models while also improving their generalization performance on other client data domains. Compared to local training within an organization, pFedCR improves model performance by about 20%. Compared to other state-of-the-art personalized federated learning methods, pFedCR improves performance by 6%~8%. Moreover, through federated learning, pFedCR can correct erroneous information in the ground truth.
Federated learning is a privacy-preserving collaborative learning approach. Recently, some studies have proposed the semi-supervised federated learning setting to handle the commonly seen real-world scenarios with labeled data on the server and unlabeled data on the clients. However, existing methods still face challenges such as high communication costs, training pressure on the client devices, and distribution differences among the server and the clients. In this paper, we introduce the powerful pre-trained diffusion models into federated learning and propose FedDISC, a Federated Diffusion Inspired Semi-supervised Co-training method, to address these challenges. Specifically, we first extract prototypes from the labeled data on the server and send them to the clients. The clients then use these prototypes to predict pseudo-labels of the local data, and compute the cluster centroids and domain-specific features to represent their personalized distributions. After adding noise, the clients send these features and their corresponding pseudo-labels back to the server, which uses a pre-trained diffusion model to conditionally generate pseudo-samples complying with the client distributions and train an aggregated model on them. Our method does not require local training and only involves forward inference on the clients. Our extensive experiments on DomainNet, Openimage, and NICO++ demonstrate that the proposed FedDISC method effectively addresses the one-shot semi-supervised problem on Non-IID clients and outperforms the compared SOTA methods. We also demonstrate through visualization that it is of neglectable possibility for FedDISC to leak privacy-sensitive information of the clients.
Domain generalization (DG), aiming at models able to work on multiple unseen domains, is a must-have characteristic of general artificial intelligence. DG based on single source domain training data is more challenging due to the lack of comparable information to help identify domain invariant features. In this paper, it is determined that the domain invariant features could be contained in the single source domain training samples, then the task is to find proper ways to extract such domain invariant features from the single source domain samples. An assumption is made that the domain invariant features are closely related to the frequency. Then, a new method that learns through multiple frequency domains is proposed. The key idea is, dividing the frequency domain of each original image into multiple subdomains, and learning features in the subdomain by a designed two branches network. In this way, the model is enforced to learn features from more samples of the specifically limited spectrum, which increases the possibility of obtaining the domain invariant features that might have previously been defiladed by easily learned features. Extensive experimental investigation reveals that 1) frequency decomposition can help the model learn features that are difficult to learn. 2) the proposed method outperforms the state-of-the-art methods of single-source domain generalization.
Let us rethink the real-world scenarios that require human motion prediction techniques, such as human-robot collaboration. Current works simplify the task of predicting human motions into a one-off process of forecasting a short future sequence (usually no longer than 1 second) based on a historical observed one. However, such simplification may fail to meet practical needs due to the neglect of the fact that motion prediction in real applications is not an isolated ``observe then predict'' unit, but a consecutive process composed of many rounds of such unit, semi-overlapped along the entire sequence. As time goes on, the predicted part of previous round has its corresponding ground truth observable in the new round, but their deviation in-between is neither exploited nor able to be captured by existing isolated learning fashion. In this paper, we propose DeFeeNet, a simple yet effective network that can be added on existing one-off prediction models to realize deviation perception and feedback when applied to consecutive motion prediction task. At each prediction round, the deviation generated by previous unit is first encoded by our DeFeeNet, and then incorporated into the existing predictor to enable a deviation-aware prediction manner, which, for the first time, allows for information transmit across adjacent prediction units. We design two versions of DeFeeNet as MLP-based and GRU-based, respectively. On Human3.6M and more complicated BABEL, experimental results indicate that our proposed network improves consecutive human motion prediction performance regardless of the basic model.
Predicting high-fidelity future human poses, from a historically observed sequence, is decisive for intelligent robots to interact with humans. Deep end-to-end learning approaches, which typically train a generic pre-trained model on external datasets and then directly apply it to all test samples, emerge as the dominant solution to solve this issue. Despite encouraging progress, they remain non-optimal, as the unique properties (e.g., motion style, rhythm) of a specific sequence cannot be adapted. More generally, at test-time, once encountering unseen motion categories (out-of-distribution), the predicted poses tend to be unreliable. Motivated by this observation, we propose a novel test-time adaptation framework that leverages two self-supervised auxiliary tasks to help the primary forecasting network adapt to the test sequence. In the testing phase, our model can adjust the model parameters by several gradient updates to improve the generation quality. However, due to catastrophic forgetting, both auxiliary tasks typically tend to the low ability to automatically present the desired positive incentives for the final prediction performance. For this reason, we also propose a meta-auxiliary learning scheme for better adaptation. In terms of general setup, our approach obtains higher accuracy, and under two new experimental designs for out-of-distribution data (unseen subjects and categories), achieves significant improvements.
Resource sharing between multiple workloads has become a prominent practice among cloud service providers, motivated by demand for improved resource utilization and reduced cost of ownership. Effective resource sharing, however, remains an open challenge due to the adverse effects that resource contention can have on high-priority, user-facing workloads with strict Quality of Service (QoS) requirements. Although recent approaches have demonstrated promising results, those works remain largely impractical in public cloud environments since workloads are not known in advance and may only run for a brief period, thus prohibiting offline learning and significantly hindering online learning. In this paper, we propose RAPID, a novel framework for fast, fully-online resource allocation policy learning in highly dynamic operating environments. RAPID leverages lightweight QoS predictions, enabled by domain-knowledge-inspired techniques for sample efficiency and bias reduction, to decouple control from conventional feedback sources and guide policy learning at a rate orders of magnitude faster than prior work. Evaluation on a real-world server platform with representative cloud workloads confirms that RAPID can learn stable resource allocation policies in minutes, as compared with hours in prior state-of-the-art, while improving QoS by 9.0x and increasing best-effort workload performance by 19-43%.
We present FengWu, an advanced data-driven global medium-range weather forecast system based on Artificial Intelligence (AI). Different from existing data-driven weather forecast methods, FengWu solves the medium-range forecast problem from a multi-modal and multi-task perspective. Specifically, a deep learning architecture equipped with model-specific encoder-decoders and cross-modal fusion Transformer is elaborately designed, which is learned under the supervision of an uncertainty loss to balance the optimization of different predictors in a region-adaptive manner. Besides this, a replay buffer mechanism is introduced to improve medium-range forecast performance. With 39-year data training based on the ERA5 reanalysis, FengWu is able to accurately reproduce the atmospheric dynamics and predict the future land and atmosphere states at 37 vertical levels on a 0.25{\deg} latitude-longitude resolution. Hindcasts of 6-hourly weather in 2018 based on ERA5 demonstrate that FengWu performs better than GraphCast in predicting 80\% of the 880 reported predictands, e.g., reducing the root mean square error (RMSE) of 10-day lead global z500 prediction from 733 to 651 $m^{2}/s^2$. In addition, the inference cost of each iteration is merely 600ms on NVIDIA Tesla A100 hardware. The results suggest that FengWu can significantly improve the forecast skill and extend the skillful global medium-range weather forecast out to 10.75 days lead (with ACC of z500 > 0.6) for the first time.
Language models have achieved impressive results in natural language processing tasks, but their ability to perform symbolic operations and arithmetic operations, remains limited, which attribute to their learn the rules implicitly from data. We explore how to incorporate compiled neural networks (CoNNs) which weight is specially designed, into the architecture of language models to enable the language model trained by gradient to obtain fully rule comprehension ability. The incorporation of compiled neural networks offers a promising direction for improving the performance of language models on compound tasks, particularly in areas that require a deeper comprehension of abstract rules beyond recognizing patterns in training data. Our method, which call "Neural Comprehension", helps language models achieve absolute accuracy in symbolic operations, thereby enhancing their ability for rule reasoning, symbolic reasoning, and arithmetic reasoning. Our code is publicly available at: \url{https://github.com/WENGSYX/Neural-Comprehension}.
In many applications, ads are displayed together with the prices, so as to provide a direct comparison among similar products or services. The price-displaying feature not only influences the consumers' decisions, but also affects the advertisers' bidding behaviors. In this paper, we study ad auctions with display prices from the perspective of mechanism design, in which advertisers are asked to submit both the costs and prices of their products. We provide a characterization for all incentive compatible auctions with display prices, and use it to design auctions under two scenarios. In the former scenario, the display prices are assumed to be exogenously determined. For this setting, we derive the welfare-maximizing and revenue-maximizing auctions for any realization of the price profile. In the latter, advertisers are allowed to strategize display prices in their own interests. We investigate two families of allocation policies within the scenario and identify the equilibrium prices accordingly. Our results reveal that the display prices do affect the design of ad auctions and the platform can leverage such information to optimize the performance of ad delivery.