Federated learning (FL) is a promising approach to enable the future Internet of vehicles consisting of intelligent connected vehicles (ICVs) with powerful sensing, computing and communication capabilities. We consider a base station (BS) coordinating nearby ICVs to train a neural network in a collaborative yet distributed manner, in order to limit data traffic and privacy leakage. However, due to the mobility of vehicles, the connections between the BS and ICVs are short-lived, which affects the resource utilization of ICVs, and thus, the convergence speed of the training process. In this paper, we propose an accelerated FL-ICV framework, by optimizing the duration of each training round and the number of local iterations, for better convergence performance of FL. We propose a mobility-aware optimization algorithm called MOB-FL, which aims at maximizing the resource utilization of ICVs under short-lived wireless connections, so as to increase the convergence speed. Simulation results based on the beam selection and the trajectory prediction tasks verify the effectiveness of the proposed solution.
As a promising solution for model compression, knowledge distillation (KD) has been applied in recommender systems (RS) to reduce inference latency. Traditional solutions first train a full teacher model from the training data, and then transfer its knowledge (\ie \textit{soft labels}) to supervise the learning of a compact student model. However, we find such a standard distillation paradigm would incur serious bias issue -- popular items are more heavily recommended after the distillation. This effect prevents the student model from making accurate and fair recommendations, decreasing the effectiveness of RS. In this work, we identify the origin of the bias in KD -- it roots in the biased soft labels from the teacher, and is further propagated and intensified during the distillation. To rectify this, we propose a new KD method with a stratified distillation strategy. It first partitions items into multiple groups according to their popularity, and then extracts the ranking knowledge within each group to supervise the learning of the student. Our method is simple and teacher-agnostic -- it works on distillation stage without affecting the training of the teacher model. We conduct extensive theoretical and empirical studies to validate the effectiveness of our proposal. We release our code at: https://github.com/chengang95/UnKD.
3D convolutional neural networks have revealed superior performance in processing volumetric data such as video and medical imaging. However, the competitive performance by leveraging 3D networks results in huge computational costs, which are far beyond that of 2D networks. In this paper, we propose a novel Hilbert curve-based cross-dimensionality distillation approach that facilitates the knowledge of 3D networks to improve the performance of 2D networks. The proposed Hilbert Distillation (HD) method preserves the structural information via the Hilbert curve, which maps high-dimensional (>=2) representations to one-dimensional continuous space-filling curves. Since the distilled 2D networks are supervised by the curves converted from dimensionally heterogeneous 3D features, the 2D networks are given an informative view in terms of learning structural information embedded in well-trained high-dimensional representations. We further propose a Variable-length Hilbert Distillation (VHD) method to dynamically shorten the walking stride of the Hilbert curve in activation feature areas and lengthen the stride in context feature areas, forcing the 2D networks to pay more attention to learning from activation features. The proposed algorithm outperforms the current state-of-the-art distillation techniques adapted to cross-dimensionality distillation on two classification tasks. Moreover, the distilled 2D networks by the proposed method achieve competitive performance with the original 3D networks, indicating the lightweight distilled 2D networks could potentially be the substitution of cumbersome 3D networks in the real-world scenario.
Edge intelligence is an emerging paradigm for real-time training and inference at the wireless edge, thus enabling mission-critical applications. Accordingly, base stations (BSs) and edge servers (ESs) need to be densely deployed, leading to huge deployment and operation costs, in particular the energy costs. In this article, we propose a new framework called Mobility-Enhanced Edge inTelligence (MEET), which exploits the sensing, communication, computing, and self-powering capabilities of intelligent connected vehicles for the smart and green 6G networks. Specifically, the operators can incorporate infrastructural vehicles as movable BSs or ESs, and schedule them in a more flexible way to align with the communication and computation traffic fluctuations. Meanwhile, the remaining compute resources of opportunistic vehicles are exploited for edge training and inference, where mobility can further enhance edge intelligence by bringing more compute resources, communication opportunities, and diverse data. In this way, the deployment and operation costs are spread over the vastly available vehicles, so that the edge intelligence is realized cost-effectively and sustainably. Furthermore, these vehicles can be either powered by renewable energy to reduce carbon emissions, or charged more flexibly during off-peak hours to cut electricity bills.
Data-free knowledge distillation (DFKD) aims at training lightweight student networks from teacher networks without training data. Existing approaches mainly follow the paradigm of generating informative samples and progressively updating student models by targeting data priors, boundary samples or memory samples. However, it is difficult for the previous DFKD methods to dynamically adjust the generation strategy at different training stages, which in turn makes it difficult to achieve efficient and stable training. In this paper, we explore how to teach students the model from a curriculum learning (CL) perspective and propose a new approach, namely "CuDFKD", i.e., "Data-Free Knowledge Distillation with Curriculum". It gradually learns from easy samples to difficult samples, which is similar to the way humans learn. In addition, we provide a theoretical analysis of the majorization minimization (MM) algorithm and explain the convergence of CuDFKD. Experiments conducted on benchmark datasets show that with a simple course design strategy, CuDFKD achieves the best performance over state-of-the-art DFKD methods and different benchmarks, such as 95.28\% top1 accuracy of the ResNet18 model on CIFAR10, which is better than training from scratch with data. The training is fast, reaching the highest accuracy of 90\% within 30 epochs, and the variance during training is stable. Also in this paper, the applicability of CuDFKD is also analyzed and discussed.
Vehicle-to-Everything (V2X) network has enabled collaborative perception in autonomous driving, which is a promising solution to the fundamental defect of stand-alone intelligence including blind zones and long-range perception. However, the lack of datasets has severely blocked the development of collaborative perception algorithms. In this work, we release DOLPHINS: Dataset for cOllaborative Perception enabled Harmonious and INterconnected Self-driving, as a new simulated large-scale various-scenario multi-view multi-modality autonomous driving dataset, which provides a ground-breaking benchmark platform for interconnected autonomous driving. DOLPHINS outperforms current datasets in six dimensions: temporally-aligned images and point clouds from both vehicles and Road Side Units (RSUs) enabling both Vehicle-to-Vehicle (V2V) and Vehicle-to-Infrastructure (V2I) based collaborative perception; 6 typical scenarios with dynamic weather conditions make the most various interconnected autonomous driving dataset; meticulously selected viewpoints providing full coverage of the key areas and every object; 42376 frames and 292549 objects, as well as the corresponding 3D annotations, geo-positions, and calibrations, compose the largest dataset for collaborative perception; Full-HD images and 64-line LiDARs construct high-resolution data with sufficient details; well-organized APIs and open-source codes ensure the extensibility of DOLPHINS. We also construct a benchmark of 2D detection, 3D detection, and multi-view collaborative perception tasks on DOLPHINS. The experiment results show that the raw-level fusion scheme through V2X communication can help to improve the precision as well as to reduce the necessity of expensive LiDAR equipment on vehicles when RSUs exist, which may accelerate the popularity of interconnected self-driving vehicles. DOLPHINS is now available on https://dolphins-dataset.net/.
Clustering is a fundamental machine learning task which has been widely studied in the literature. Classic clustering methods follow the assumption that data are represented as features in a vectorized form through various representation learning techniques. As the data become increasingly complicated and complex, the shallow (traditional) clustering methods can no longer handle the high-dimensional data type. With the huge success of deep learning, especially the deep unsupervised learning, many representation learning techniques with deep architectures have been proposed in the past decade. Recently, the concept of Deep Clustering, i.e., jointly optimizing the representation learning and clustering, has been proposed and hence attracted growing attention in the community. Motivated by the tremendous success of deep learning in clustering, one of the most fundamental machine learning tasks, and the large number of recent advances in this direction, in this paper we conduct a comprehensive survey on deep clustering by proposing a new taxonomy of different state-of-the-art approaches. We summarize the essential components of deep clustering and categorize existing methods by the ways they design interactions between deep representation learning and clustering. Moreover, this survey also provides the popular benchmark datasets, evaluation metrics and open-source implementations to clearly illustrate various experimental settings. Last but not least, we discuss the practical applications of deep clustering and suggest challenging topics deserving further investigations as future directions.
Virtual try-on(VTON) aims at fitting target clothes to reference person images, which is widely adopted in e-commerce.Existing VTON approaches can be narrowly categorized into Parser-Based(PB) and Parser-Free(PF) by whether relying on the parser information to mask the persons' clothes and synthesize try-on images. Although abandoning parser information has improved the applicability of PF methods, the ability of detail synthesizing has also been sacrificed. As a result, the distraction from original cloth may persistin synthesized images, especially in complicated postures and high resolution applications. To address the aforementioned issue, we propose a novel PF method named Regional Mask Guided Network(RMGN). More specifically, a regional mask is proposed to explicitly fuse the features of target clothes and reference persons so that the persisted distraction can be eliminated. A posture awareness loss and a multi-level feature extractor are further proposed to handle the complicated postures and synthesize high resolution images. Extensive experiments demonstrate that our proposed RMGN outperforms both state-of-the-art PB and PF methods.Ablation studies further verify the effectiveness ofmodules in RMGN.
Federated edge learning (FEEL) is a promising distributed machine learning (ML) framework to drive edge intelligence applications. However, due to the dynamic wireless environments and the resource limitations of edge devices, communication becomes a major bottleneck. In this work, we propose time-correlated sparsification with hybrid aggregation (TCS-H) for communication-efficient FEEL, which exploits jointly the power of model compression and over-the-air computation. By exploiting the temporal correlations among model parameters, we construct a global sparsification mask, which is identical across devices, and thus enables efficient model aggregation over-the-air. Each device further constructs a local sparse vector to explore its own important parameters, which are aggregated via digital communication with orthogonal multiple access. We further design device scheduling and power allocation algorithms for TCS-H. Experiment results show that, under limited communication resources, TCS-H can achieve significantly higher accuracy compared to the conventional top-K sparsification with orthogonal model aggregation, with both i.i.d. and non-i.i.d. data distributions.
Cooperative perception of connected vehicles comes to the rescue when the field of view restricts stand-alone intelligence. While raw-level cooperative perception preserves most information to guarantee accuracy, it is demanding in communication bandwidth and computation power. Therefore, it is important to schedule the most beneficial vehicle to share its sensor in terms of supplementary view and stable network connection. In this paper, we present a model of raw-level cooperative perception and formulate the energy minimization problem of sensor sharing scheduling as a variant of the Multi-Armed Bandit (MAB) problem. Specifically, volatility of the neighboring vehicles, heterogeneity of V2X channels, and the time-varying traffic context are taken into consideration. Then we propose an online learning-based algorithm with logarithmic performance loss, achieving a decent trade-off between exploration and exploitation. Simulation results under different scenarios indicate that the proposed algorithm quickly learns to schedule the optimal cooperative vehicle and saves more energy as compared to baseline algorithms.