Driven by the teacher-student paradigm, knowledge distillation has become the de facto approach to language model compression. Recent studies have uncovered that conventional distillation is less effective when there is a large capacity gap between the teacher and the student, and have introduced teacher assistant-based distillation to bridge the gap. As the connecting link, the scale and the performance of the teacher assistant are crucial for transferring knowledge from the teacher to the student. However, existing teacher assistant-based methods select the scale of the teacher assistant manually, which fails to identify the teacher assistant with the optimal scale-performance tradeoff. To this end, we propose an Automatic Distillation Schedule (AutoDisc) for large language model compression. In particular, AutoDisc first specifies a set of teacher assistant candidates at different scales via gridding and pruning, and then optimizes all candidates in a once-for-all optimization with two approximations. The best teacher assistant scale is selected automatically according to the scale-performance tradeoff. AutoDisc is evaluated with an extensive set of experiments on the language understanding benchmark GLUE. Experimental results demonstrate the improved performance and applicability of AutoDisc. We further apply AutoDisc to a language model with over one billion parameters and show its scalability.
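To make the selection step concrete, here is a minimal sketch of choosing a teacher-assistant scale by a scale-performance tradeoff. The candidate grid, the pruning heuristic, the toy `evaluate` function, and the tradeoff score are all illustrative assumptions, not the AutoDisc implementation.

```python
# Sketch of teacher-assistant (TA) scale selection via gridding, pruning,
# and a scale-performance tradeoff. All numbers/heuristics are illustrative.

def evaluate(scale: float) -> float:
    """Placeholder for TA validation performance at a given scale.
    In AutoDisc this would come from the once-for-all optimization."""
    return 1.0 - (1.0 - scale) ** 2  # toy concave performance curve

def select_ta_scale(teacher_scale=1.0, student_scale=0.1, num_points=9):
    # Gridding: enumerate candidate scales between student and teacher.
    step = (teacher_scale - student_scale) / (num_points + 1)
    candidates = [student_scale + step * (i + 1) for i in range(num_points)]
    # Pruning: keep candidates near the geometric mean of the two scales
    # (an assumed heuristic; the paper's pruning criterion may differ).
    mid = (teacher_scale * student_scale) ** 0.5
    candidates = [s for s in candidates if abs(s - mid) < 0.35]
    # Tradeoff: maximize performance minus a penalty on scale.
    return max(candidates, key=lambda s: evaluate(s) - 0.5 * s)

print(f"selected TA scale: {select_ta_scale():.2f}")
```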
With the development of next-generation wireless networks, the Internet of Things (IoT) is evolving towards the intelligent IoT (iIoT), where intelligent applications usually have stringent delay and jitter requirements. To provide low-latency services to heterogeneous users in the emerging iIoT, multi-tier computing has been proposed, effectively combining edge computing and fog computing. More specifically, multi-tier computing systems complement cloud computing by offloading tasks and dispersing computation to multi-tier nodes along the continuum from the cloud to things. In this paper, we investigate key techniques and directions in wireless communications and resource allocation for enabling task offloading in multi-tier computing systems. A multi-tier computing model, with its main functionality and optimization methods, is presented in detail. We hope that this paper will serve as a valuable reference and guide to the theoretical, algorithmic, and systematic opportunities of multi-tier computing in next-generation wireless networks.
Measuring perceptual color differences (CDs) is of great importance in modern smartphone photography. Despite the long history, most CD measures have been constrained by psychophysical data from homogeneous color patches or a limited number of simplistic natural images. It is thus questionable whether existing CD measures generalize in the age of smartphone photography, characterized by greater content complexity and learning-based image signal processors. In this paper, we put together the largest image dataset for perceptual CD assessment to date, in which the natural images are 1) captured by six flagship smartphones, 2) altered by Photoshop, 3) post-processed by built-in filters of the smartphones, and 4) reproduced with incorrect color profiles. We then conduct a large-scale psychophysical experiment to gather perceptual CDs of 30,000 image pairs in a carefully controlled laboratory environment. Based on the newly established dataset, we make one of the first attempts to construct an end-to-end learnable CD formula based on a lightweight neural network, as a generalization of several previous metrics. Extensive experiments demonstrate that the optimized formula outperforms 28 existing CD measures by a large margin, offers reasonable local CD maps without dense supervision, generalizes well to color patch data, and empirically behaves as a proper metric in the mathematical sense.
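A minimal sketch of the general idea, assuming a shared lightweight CNN: embed both images and take the Euclidean distance between embeddings, which yields a per-pixel local CD map as a byproduct. The architecture and sizes below are illustrative assumptions, not the paper's network.

```python
# Sketch of an end-to-end learnable color-difference (CD) formula: a shared
# embedding followed by a Euclidean distance, generalizing Delta-E-style
# patch metrics to full images. Dimensions/architecture are toy choices.
import torch
import torch.nn as nn

class LearnableCD(nn.Module):
    def __init__(self, channels=16):
        super().__init__()
        self.embed = nn.Sequential(
            nn.Conv2d(3, channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, img_a, img_b):
        fa, fb = self.embed(img_a), self.embed(img_b)
        # Per-pixel distance gives a local CD map; its mean is the global CD,
        # so no dense supervision is needed for the map itself.
        cd_map = torch.norm(fa - fb, dim=1)       # (N, H, W)
        return cd_map.mean(dim=(1, 2)), cd_map    # global score, local map

model = LearnableCD()
a, b = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)
score, cd_map = model(a, b)
print(score.item(), cd_map.shape)
```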
In the field of intelligent communication, deep learning (DL) has attracted much attention for its strong fitting ability and data-driven learning capability. Compared with typical feedforward DL network structures, enhancement structures with direct data feedback have been studied and shown to outperform feedforward networks. However, because these simple feedback methods lack sufficient analysis and learning ability over the feedback data, they are inadequate for more complicated nonlinear systems, which limits further performance improvement. In this paper, a novel multi-agent feedback enabled neural network (MAFENN) framework is proposed, which gives the framework stronger feedback learning capability and more intelligence in feature abstraction, denoising, generation, etc. Furthermore, the MAFENN framework is theoretically formulated as a three-player Feedback Stackelberg game, and the game is proven to converge to the Feedback Stackelberg equilibrium. The MAFENN framework and algorithm are designed to enhance the learning capability of feedforward DL networks and their variations with simple data feedback. To verify the feasibility of the MAFENN framework in wireless communications, a MAFENN-based equalizer (MAFENN-E) is developed for wireless fading channels with inter-symbol interference (ISI). Experimental results show that, with quadrature phase-shift keying (QPSK) modulation, the symbol error rate (SER) performance of the proposed method outperforms traditional equalizers by about 2 dB in linear channels. In nonlinear channels, the SER performance of the proposed method surpasses both traditional and DL-based equalizers even more significantly, which shows the effectiveness and robustness of our proposal in complex channel environments.
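The following sketch illustrates only the generic feedback idea behind such equalizers: a feedforward network estimates the symbol, and a separate feedback network learns what to feed back for another pass. It does not reproduce MAFENN's multi-agent Stackelberg training; the module shapes and iteration count are assumptions.

```python
# Sketch of a feedback-enabled equalizer: the estimate from a feedforward
# network is processed by a learned "feedback agent" and fed back as an
# extra input for a refined pass. Purely illustrative of the feedback loop.
import torch
import torch.nn as nn

class FeedbackEqualizer(nn.Module):
    def __init__(self, taps=8, hidden=32, iterations=2):
        super().__init__()
        self.iterations = iterations
        self.forward_net = nn.Sequential(   # received samples -> symbol estimate
            nn.Linear(taps + 1, hidden), nn.ReLU(), nn.Linear(hidden, 1))
        self.feedback_agent = nn.Sequential(  # learns what to feed back
            nn.Linear(1, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, received):               # received: (batch, taps)
        fb = torch.zeros(received.size(0), 1)  # initial feedback is zero
        for _ in range(self.iterations):
            est = self.forward_net(torch.cat([received, fb], dim=1))
            fb = self.feedback_agent(est)      # refined feedback for next pass
        return est

eq = FeedbackEqualizer()
print(eq(torch.randn(4, 8)).shape)  # torch.Size([4, 1])
```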
Modelling prosody variation is critical for synthesizing natural and expressive speech in end-to-end text-to-speech (TTS) systems. In this paper, a cross-utterance conditional VAE (CUC-VAE) is proposed to estimate a posterior probability distribution of the latent prosody features for each phoneme by conditioning on acoustic features, speaker information, and text features obtained from both past and future sentences. At inference time, instead of the standard Gaussian distribution used by a VAE, CUC-VAE samples from an utterance-specific prior distribution conditioned on cross-utterance information, which allows the prosody generated by the TTS system to be related to the context and to more closely resemble how humans naturally produce prosody. The performance of CUC-VAE is evaluated via qualitative listening tests for naturalness and intelligibility, as well as quantitative measurements, including word error rates and the standard deviation of prosody attributes. Experimental results on the LJ-Speech and LibriTTS data show that the proposed CUC-VAE TTS system improves naturalness and prosody diversity by clear margins.
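A minimal sketch of the inference-time idea: rather than sampling the latent prosody from N(0, I), sample from a prior whose mean and variance are predicted from a cross-utterance context vector. The prior network and dimensions below are illustrative assumptions, not the CUC-VAE implementation.

```python
# Sketch of sampling from an utterance-specific conditional prior instead of
# the standard Gaussian used by a plain VAE. Context stands in for features
# of past/future sentences; sizes are toy assumptions.
import torch
import torch.nn as nn

class ConditionalPrior(nn.Module):
    def __init__(self, context_dim=256, latent_dim=16):
        super().__init__()
        self.net = nn.Linear(context_dim, 2 * latent_dim)  # -> [mu, log_var]

    def sample(self, context):
        mu, log_var = self.net(context).chunk(2, dim=-1)
        # Reparameterization: z = mu + sigma * eps, with eps ~ N(0, I).
        return mu + (0.5 * log_var).exp() * torch.randn_like(mu)

prior = ConditionalPrior()
context = torch.randn(1, 256)   # stand-in for cross-utterance features
z = prior.sample(context)       # utterance-specific latent prosody
print(z.shape)                  # torch.Size([1, 16])
```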
The growing inequality in gig work between workers and platforms has become a critical social issue as gig work plays an increasingly prominent role in the future of work. This AI inequality is caused by (1) the technology divide in who has access to AI technologies in gig work, and (2) the data divide in who owns the data in gig work; together, these divides lead to unfair working conditions, a growing pay gap, neglect of workers' diverse preferences, and workers' lack of trust in the platforms. In this position paper, we argue that a bottom-up approach that empowers individual workers to access AI-enabled work-planning support and to share data among groups of workers, through a network of end-user-programmable intelligent assistants, is a practical way to bridge AI inequality in gig work under the current paradigm of privately owned platforms. This position paper articulates a set of research challenges, potential approaches, and community engagement opportunities, seeking to start a dialogue on this important research topic in the interdisciplinary CHIWORK community.
Deep learning has shown human-level performance in various applications. However, current deep learning models suffer from catastrophic forgetting of old knowledge when learning new classes. This poses a particular challenge for intelligent diagnosis systems, where training data are initially available for only a limited number of diseases, and updating the system with data of new diseases would inevitably downgrade its performance on previously learned diseases. Inspired by how human brains learn new knowledge, we propose a Bayesian generative model for continual learning built on a fixed pre-trained feature extractor. In this model, knowledge of each old class can be compactly represented by a collection of statistical distributions, e.g. Gaussian mixture models, and is naturally kept from forgetting during continual learning over time. Unlike existing class-incremental learning methods, the proposed approach is not sensitive to the continual learning process and also applies well to the data-incremental learning scenario. Experiments on multiple medical and natural image classification tasks show that the proposed approach outperforms state-of-the-art approaches, even those that keep some images of old classes during continual learning of new classes.
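To make the mechanism concrete, here is a minimal sketch: each class is represented by its own Gaussian mixture fitted on features from a frozen extractor, so learning a new class never touches the models of old classes. The feature extractor is stubbed out, and the class structure and component count are illustrative assumptions, not the paper's exact model.

```python
# Sketch of generative continual learning: one Gaussian mixture per class
# over fixed features; new classes are added without revisiting old data,
# so old knowledge is kept from forgetting by construction.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

def extract(x):
    return x  # stand-in for a frozen pre-trained feature extractor

class ContinualGMMClassifier:
    def __init__(self, n_components=2):
        self.n_components = n_components
        self.class_models = {}  # class label -> fitted GMM

    def learn_class(self, label, samples):
        # Fitting a per-class GMM touches no other class's data or model.
        gmm = GaussianMixture(n_components=self.n_components, random_state=0)
        self.class_models[label] = gmm.fit(extract(samples))

    def predict(self, samples):
        feats = extract(samples)
        labels = list(self.class_models)
        # Classify by the class whose generative model assigns highest likelihood.
        log_liks = [self.class_models[c].score_samples(feats) for c in labels]
        return [labels[i] for i in np.argmax(log_liks, axis=0)]

clf = ContinualGMMClassifier()
clf.learn_class("disease_A", rng.normal(0.0, 1.0, size=(100, 8)))
clf.learn_class("disease_B", rng.normal(3.0, 1.0, size=(100, 8)))  # later session
print(clf.predict(rng.normal(3.0, 1.0, size=(5, 8))))
```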
Graph Neural Networks (GNNs) are powerful tools for graph representation learning. Despite their rapid development, GNNs also face challenges such as over-fitting, over-smoothing, and non-robustness. Previous works indicate that these problems can be alleviated by random dropping methods, which inject noise into models by randomly masking parts of the input. However, some open problems of random dropping on GNNs remain to be solved. First, it is challenging to find a universal method suitable for all cases, given the divergence of different datasets and models. Second, the random noise introduced into GNNs causes incomplete coverage of parameters and an unstable training process. In this paper, we propose a novel random dropping method called DropMessage, which performs the dropping operation directly on the message matrix and can be applied to any message-passing GNN. Furthermore, we elaborate on the superiority of DropMessage: it stabilizes the training process by reducing sample variance, and it keeps information diversity from the perspective of information theory, which makes it a theoretical upper bound on other methods. We also unify existing random dropping methods into our framework and analyze their effects on GNNs. To evaluate the proposed method, we conduct experiments on multiple tasks over five public datasets and two industrial datasets with various backbone models. The experimental results show that DropMessage enjoys the advantages of both effectiveness and generalization.
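A minimal sketch of the core operation as described: apply random dropping directly to the message matrix (one row per edge), rather than to input features or edges, and then aggregate as usual. The element-wise mask, the rescaling, and the toy graph are assumptions for illustration.

```python
# Sketch of dropping on the message matrix in a message-passing step.
# Scaling by 1/(1-p) keeps the expected message unbiased, as in dropout.
import torch

def drop_message(x, edge_index, p=0.3, training=True):
    src, dst = edge_index                     # edges: src -> dst
    messages = x[src]                         # message matrix: (num_edges, dim)
    if training:
        mask = torch.rand_like(messages) > p  # element-wise, finer than DropEdge
        messages = messages * mask / (1.0 - p)
    # Mean-aggregate the (possibly dropped) messages into destination nodes.
    out = torch.zeros_like(x).index_add_(0, dst, messages)
    deg = torch.zeros(x.size(0)).index_add_(0, dst, torch.ones(dst.size(0)))
    return out / deg.clamp(min=1).unsqueeze(-1)

x = torch.randn(4, 8)                                    # 4 nodes, 8 features
edge_index = torch.tensor([[0, 1, 2, 3], [1, 2, 3, 0]])  # a directed cycle
print(drop_message(x, edge_index).shape)                 # torch.Size([4, 8])
```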
Task-oriented communication is a new paradigm that aims to provide efficient connectivity for accomplishing intelligent tasks rather than for the reception of every transmitted bit. In this paper, a deep learning-based task-oriented communication architecture is proposed in which the user extracts, compresses, and transmits semantics in an end-to-end (E2E) manner. Furthermore, an approach is proposed to compress the semantics according to their importance to the task, namely adaptable semantic compression (ASC). In a delay-intolerant system supporting multiple users, a tradeoff arises: a higher compression ratio requires fewer channel resources but distorts the semantics, while a lower compression ratio preserves the semantics but requires more channel resources and may thus cause transmission failures under the delay constraint. To resolve this tradeoff, both the compression ratio and the resource allocation are optimized to maximize the success probability of tasks in the task-oriented communication system. Specifically, owing to the nonconvexity of the problem, we propose a compression ratio and resource allocation (CRRA) algorithm that separates the problem into two subproblems and solves them iteratively to obtain a convergent solution. Furthermore, for scenarios where users have various service levels, a compression ratio, resource allocation, and user selection (CRRAUS) algorithm is proposed. In CRRAUS, users are adaptively selected to complete the corresponding intelligent tasks based on the branch-and-bound method, at the expense of higher algorithmic complexity compared with CRRA. Simulation results show that the proposed CRRA and CRRAUS algorithms obtain at least 15% and 10% gains in task success probability over baseline algorithms, respectively.
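The following sketch shows only the alternating structure of such an algorithm: fix the resource allocation and update the compression ratios, then fix the ratios and update the resources, iterating to a convergent point. The toy success proxy and both update rules are invented for illustration and are not the CRRA subproblem solutions.

```python
# Sketch of alternating optimization over compression ratios and bandwidth,
# mimicking the two-subproblem structure. All models here are toy stand-ins.
import numpy as np

def success_proxy(ratio, bandwidth, delay_budget=1.0):
    # Higher compression -> less data to send but more semantic distortion.
    latency = (1.0 - ratio) / bandwidth   # toy transmission-delay model
    distortion = ratio ** 2               # toy semantic-loss model
    return (latency <= delay_budget) * (1.0 - distortion)

def crra_like(num_users=3, total_bw=1.5, iters=20):
    ratio = np.full(num_users, 0.5)
    bw = np.full(num_users, total_bw / num_users)
    grid = np.linspace(0.05, 0.95, 19)
    for _ in range(iters):
        # Subproblem 1: best per-user compression ratio under fixed bandwidth.
        ratio = np.array([grid[np.argmax(success_proxy(grid, b))] for b in bw])
        # Subproblem 2: shift bandwidth toward users sending more data (toy rule).
        need = (1.0 - ratio) + 1e-9
        bw = total_bw * need / need.sum()
    return ratio, bw

print(crra_like())
```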
Face privacy preservation is a research hotspot that has attracted considerable interest. However, existing face privacy-preserving methods aim to destroy the semantic information of the face and cannot preserve the reusability of the original facial information. To achieve both the naturalness of the processed face and the recoverability of the original protected face, this paper proposes a face privacy-preserving method based on an Invertible "Mask" Network (IMN). In IMN, we first introduce a Mask-net to generate the "Mask" face. Then, the "Mask" face is put onto the protected face to generate the masked face, which is indistinguishable from the "Mask" face. Finally, the "Mask" face can be taken off the masked face to obtain the recovered face for authorized users, where the recovered face is visually indistinguishable from the protected face. Experimental results show that the proposed method can not only effectively protect the privacy of the protected face, but also almost perfectly recover the protected face from the masked face.
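A minimal sketch of why invertibility enables exact recovery, using a toy additive coupling step in place of IMN's learned invertible blocks: the "put on" operation mixes the protected image with features computed from the "Mask" face, and the authorized side undoes it exactly. This toy does not capture the indistinguishability of the masked face from the "Mask" face; it only demonstrates lossless inversion.

```python
# Toy additive coupling: forward pass hides the face, inverse pass recovers
# it exactly, illustrating the invertible-network principle behind IMN.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64))           # stand-in for a learned transform

def phi(mask_face):                     # features computed only from the mask
    return np.tanh(mask_face @ W)

def put_on(protected, mask_face):       # forward: generate the masked face
    return protected + phi(mask_face)

def put_off(masked, mask_face):         # inverse: exact recovery
    return masked - phi(mask_face)

protected = rng.random((64, 64))        # toy "protected face"
mask_face = rng.random((64, 64))        # toy "Mask" face
masked = put_on(protected, mask_face)
recovered = put_off(masked, mask_face)
print(np.allclose(recovered, protected))  # True: recovery is lossless
```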