Federated learning (FL) enables edge devices to collaboratively learn a model in a distributed fashion. Much existing research has focused on improving the communication efficiency of high-dimensional models and addressing the bias caused by local updates. However, most FL algorithms either rely on reliable communications or assume fixed and known unreliability characteristics. In practice, networks may suffer from dynamic channel conditions and non-deterministic disruptions, with time-varying and unknown characteristics. To this end, in this paper we propose a sparsity-enabled FL framework with both communication efficiency and bias reduction, termed SAFARI. It makes novel use of the similarity among client models to rectify and compensate for the bias that results from unreliable communications. More precisely, sparse learning is implemented on local clients to mitigate communication overhead, while, to cope with unreliable communications, a similarity-based compensation method is proposed to provide surrogates for missing model updates. We analyze SAFARI under a bounded-dissimilarity assumption and with respect to sparse models, and demonstrate that SAFARI under unreliable communications is guaranteed to converge at the same rate as standard FedAvg with perfect communications. Implementations and evaluations on the CIFAR-10 dataset validate the effectiveness of SAFARI by showing that it can achieve the same convergence speed and accuracy as FedAvg with perfect communications, with up to 80% of the model weights pruned and a high percentage of client updates missing in each round.
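As a minimal sketch of the similarity-based compensation idea (the function names, cosine-similarity choice, and data layout below are illustrative assumptions, not the paper's exact algorithm), a missing client update can be replaced by the update of the most similar client, judged from the last-known local models:

```python
import numpy as np

def cosine_sim(a, b):
    # Cosine similarity between two flattened model-weight vectors.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def compensate_missing(updates, last_models):
    """For each client whose update was lost (value None), substitute the
    update of the most similar client, where similarity is measured between
    the last-known model weights of the clients."""
    received = {i: u for i, u in updates.items() if u is not None}
    full = dict(received)
    for i, u in updates.items():
        if u is not None:
            continue
        # Pick the received client whose last model is most similar to client i's.
        best = max(received, key=lambda j: cosine_sim(last_models[i], last_models[j]))
        full[i] = received[best]
    return full
```

The server can then aggregate `full` exactly as FedAvg would, so the compensation step slots in before averaging.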
Point normals, as an intrinsic geometric property of 3D objects, not only serve conventional geometric tasks such as surface consolidation and reconstruction, but also facilitate cutting-edge learning-based techniques for shape analysis and generation. In this paper, we propose a normal refinement network, called Refine-Net, to predict accurate normals for noisy point clouds. Traditional normal estimation methods depend heavily on priors such as surface shapes or noise distributions, while learning-based solutions settle for a single type of hand-crafted feature. In contrast, our network is designed to refine the initial normal of each point by extracting additional information from multiple feature representations. To this end, several feature modules are developed and incorporated into Refine-Net by a novel connection module. Beyond the overall network architecture of Refine-Net, we propose a new multi-scale fitting patch selection scheme for the initial normal estimation that absorbs geometry domain knowledge. Refine-Net is also a generic normal estimation framework: 1) point normals obtained from other methods can be further refined, and 2) any feature module related to surface geometric structures can potentially be integrated into the framework. Qualitative and quantitative evaluations demonstrate the clear superiority of Refine-Net over state-of-the-art methods on both synthetic and real-scanned datasets. Our code is available at https://github.com/hrzhou2/refinenet.
Recently, image quality has generally been described by a mean opinion score (MOS). However, we observe that the quality scores of an image given by a group of subjects are highly subjective and diverse, so a MOS alone is not sufficient to describe image quality. In this paper, we propose to describe image quality using a parameterized distribution rather than a MOS, and an objective method is also proposed to predict the image quality score distribution (IQSD). First, the LIVE database is re-recorded: we invited a large group of subjects to evaluate the quality of all images in the LIVE database, and each image was evaluated by a large number of subjects (187 valid subjects), whose scores form a reliable IQSD. By analyzing the obtained subjective quality scores, we find that the IQSD can be well modeled by an alpha stable model, and it reflects much more information than a single MOS, such as the skewness of the opinion scores, the subject diversity, and the maximum-probability score for an image. Therefore, we propose to model the IQSD using the alpha stable model. Moreover, we propose a framework and an algorithm to predict the alpha stable model based IQSD, in which quality features are extracted from each image based on structural and statistical information, and support vector regressors are trained to predict the alpha stable model parameters. Experimental results verify the feasibility of using the alpha stable model to describe the IQSD, and prove the effectiveness of the proposed objective alpha stable model based IQSD prediction method.
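The regression stage can be sketched as one support vector regressor per alpha-stable parameter. The features and targets below are synthetic placeholders (the paper extracts structural and statistical quality features from real images), so this only illustrates the setup, not the authors' pipeline:

```python
import numpy as np
from sklearn.svm import SVR

# Synthetic stand-ins: per-image quality feature vectors and the four
# alpha-stable parameters (alpha, beta, gamma, delta) as regression targets.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))   # 8 quality features per image (placeholder)
Y = rng.normal(size=(100, 4))   # alpha, beta, gamma, delta (placeholder)

# One support vector regressor per distribution parameter.
models = [SVR(kernel="rbf").fit(X, Y[:, k]) for k in range(4)]

def predict_iqsd_params(features):
    """Predict the four alpha-stable parameters for one image's features."""
    f = np.asarray(features).reshape(1, -1)
    return np.array([m.predict(f)[0] for m in models])
```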
Doppler shift is an important measurement for localization and synchronization (LAS) and is available in various practical systems. Existing studies on LAS techniques in a time-division broadcast LAS system (TDBS) use only sequential time-of-arrival (TOA) measurements from the broadcast signals. In this paper, we develop a new optimal LAS method for the TDBS, namely LAS-SDT, that takes advantage of sequential Doppler shift and TOA measurements. It achieves higher accuracy than the conventional TOA-only method for user devices (UDs) with motion and clock drift. Two variant methods, LAS-SDT-v for the case with UD velocity aiding and LAS-SDT-k for the case with UD clock-drift aiding, are also developed. We derive the Cramer-Rao lower bound (CRLB) for these different cases, and show analytically that the accuracies of the estimated UD position, clock offset, velocity, and clock drift are all significantly higher than those of the conventional LAS method using TOAs only. Numerical results corroborate the theoretical analysis and show the optimal estimation performance of LAS-SDT.
Video content is multifaceted, consisting of objects, scenes, interactions, and actions. Existing datasets mostly label only one of these facets for model training, resulting in video representations that are biased toward a single facet depending on the training dataset. There has been no study yet on how to learn a video representation from multifaceted labels, or on whether multifaceted information is helpful for video representation learning. In this paper, we propose a new learning framework, MUlti-Faceted Integration (MUFI), to aggregate facets from different datasets and learn a representation that reflects the full spectrum of video content. Technically, MUFI formulates the problem as visual-semantic embedding learning, which explicitly maps video representations into a rich semantic embedding space and jointly optimizes the video representation from two perspectives. One is to capitalize on the intra-facet supervision between each video and its own label descriptions; the other predicts the "semantic representation" of each video from the facets of other datasets as inter-facet supervision. Extensive experiments demonstrate that learning a 3D CNN via our MUFI framework on a union of four large-scale video datasets plus two image datasets leads to superior video representations. The 3D CNN pre-learnt with MUFI also shows clear improvements over other approaches on several downstream video applications. More remarkably, MUFI achieves 98.1%/80.9% on UCF101/HMDB51 for action recognition and a CIDEr-D score of 101.5% on MSVD for video captioning.
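The two supervision terms can be sketched as cosine-distance objectives in a shared embedding space. This is a simplified stand-in for the paper's visual-semantic embedding losses; the function names and the weighting `lam` are illustrative assumptions:

```python
import numpy as np

def l2norm(x):
    # Normalize embeddings to unit length along the last axis.
    return x / (np.linalg.norm(x, axis=-1, keepdims=True) + 1e-12)

def mufi_loss(video_emb, own_label_emb, other_facet_emb, lam=0.5):
    """Combined objective sketch: the intra-facet term pulls a video embedding
    toward its own label-description embedding; the inter-facet term pulls it
    toward the 'semantic representation' derived from another dataset's facet."""
    v = l2norm(video_emb)
    intra = 1.0 - np.sum(v * l2norm(own_label_emb), axis=-1)   # cosine distance
    inter = 1.0 - np.sum(v * l2norm(other_facet_emb), axis=-1)
    return float(np.mean(intra + lam * inter))
```

Minimizing this jointly over both terms is what lets a single representation absorb supervision from several facets at once.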
A two-way time-of-arrival (TOA) system is composed of anchor nodes (ANs) and user devices (UDs). Two-way TOA measurements between AN-UD pairs are obtained via round-trip communications to achieve localization and synchronization (LAS) for a UD. The existing LAS method for a moving UD with clock drift adopts an iterative algorithm, which requires accurate initialization and has high computational complexity. In this paper, we propose a new closed-form two-way TOA LAS approach, namely CFTWLAS, which requires no initialization, has low complexity, and empirically achieves optimal LAS accuracy. We first linearize the LAS problem by squaring and differencing the two-way TOA equations, and employ two auxiliary variables to reduce the problem to finding the analytical solution of quadratic equations. Due to measurement noise, the solution of the auxiliary variables yields only a raw LAS estimate; a weighted least squares step is then applied to refine it. We analyze the theoretical error of CFTWLAS and show that it empirically reaches the Cramer-Rao lower bound (CRLB) given sufficient ANs, proper geometry, and small noise. Numerical results in a 3D scenario verify the theoretical analysis: the estimation accuracy of CFTWLAS reaches the CRLB in the presented experiments when the number of ANs is large, the geometry is appropriate, and the noise is small. Unlike the iterative method, whose complexity increases with the iteration count, CFTWLAS has constant low complexity.
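The squaring-and-differencing linearization can be illustrated on the plain range-only localization subproblem (clock offset, drift, and the auxiliary variables of the full CFTWLAS are omitted; the function name and setup are ours). Squaring each range equation ||p - a_i||^2 = r_i^2 and subtracting the first one from the rest cancels the quadratic ||p||^2 term, leaving a linear system in the position p:

```python
import numpy as np

def closed_form_toa(anchors, ranges):
    """Closed-form position estimate from anchor positions and ranges.
    Expanding ||p - a_i||^2 = r_i^2 and subtracting the i=0 equation gives
    2 (a_i - a_0) . p = r_0^2 - r_i^2 + ||a_i||^2 - ||a_0||^2,
    a linear system solved here by least squares."""
    a0, r0 = anchors[0], ranges[0]
    A = 2.0 * (anchors[1:] - a0)
    b = (r0**2 - ranges[1:]**2
         + np.sum(anchors[1:]**2, axis=1) - np.sum(a0**2))
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p
```

In the paper's setting this raw estimate would then be refined by the weighted least squares step.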
The plethora of Internet of Things (IoT) devices leads to explosive growth in network traffic. Network traffic classification (NTC) is an essential tool for exploring the behaviour of network flows, and Internet service providers (ISPs) require NTC to manage the performance of IoT networks. We propose a novel network data representation that treats the traffic data as a series of images, so that the network data can be processed as a video stream for time-distributed (TD) feature learning. The intra-temporal information within the network statistical data is learned using convolutional neural networks (CNN) and long short-term memory (LSTM), and the inter pseudo-temporal features among the flows are learned by a TD multi-layer perceptron (MLP). We conduct experiments on a large dataset with a large number of classes. The experimental results show that TD feature learning improves network classification performance by 10%.
Cloud virtual reality (VR) gaming traffic characteristics such as frame size, inter-arrival time, and latency need to be carefully studied as a first step toward scalable VR cloud service provisioning. To this end, in this paper we analyze the behavior of VR gaming traffic and Quality of Service (QoS) when VR rendering is conducted remotely in the cloud. We first build a VR testbed utilizing a cloud server, a commercial VR headset, and an off-the-shelf WiFi router. Using this testbed, we collect and process cloud VR gaming traffic data from different games under a number of network conditions and under both fixed and adaptive video encoding schemes. To analyze application-level characteristics such as video frame size, frame inter-arrival time, frame loss, and frame latency, we develop an interval-threshold-based identification method for video frames. Based on the frame identification results, we present two statistical models that capture the behavior of the VR gaming video traffic. The models can be used by researchers and practitioners to generate VR traffic models for simulations and experiments - and are paramount in designing advanced radio resource management (RRM) and network optimization for cloud VR gaming services. To the best of the authors' knowledge, this is the first measurement study and analysis conducted using a commercial cloud VR gaming platform, and under both fixed and adaptive bitrate streaming. We make our VR traffic data-sets publicly available for further research by the community.
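A minimal sketch of an interval-threshold frame identification step (the packet format, threshold value, and function name are illustrative assumptions, not the paper's exact procedure): packets whose inter-arrival gap exceeds the threshold are taken to start a new video frame, and packets between gaps are aggregated into one frame.

```python
def identify_frames(packets, gap_threshold_ms=2.0):
    """Group packets (timestamp_ms, size_bytes) into video frames: a new
    frame starts whenever the gap to the previous packet exceeds the
    threshold. Returns one dict per identified frame."""
    frames = []
    cur_size, cur_start, last_t = 0, None, None
    for t, size in packets:
        if last_t is not None and t - last_t > gap_threshold_ms:
            frames.append({"start_ms": cur_start, "size_bytes": cur_size})
            cur_size, cur_start = 0, None
        if cur_start is None:
            cur_start = t
        cur_size += size
        last_t = t
    if cur_start is not None:
        frames.append({"start_ms": cur_start, "size_bytes": cur_size})
    return frames
```

Per-frame sizes and start times recovered this way are exactly the quantities the statistical traffic models are fit to.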
For people who ardently love painting but have visual impairments, holding a paintbrush to create a work is very difficult. People in this special group are eager to pick up the paintbrush, like Leonardo da Vinci, to create and make full use of their talents. To help bridge this gap, we propose a painting navigation system to assist blind people in painting and artistic creation. The proposed system is composed of a cognitive system and a guidance system. The system adopts drawing-board positioning based on QR codes, brush navigation based on target detection, and real-time brush positioning. Meanwhile, this paper uses voice-based human-computer interaction and a simple but efficient position-information coding rule. In addition, we design a criterion to efficiently judge whether the brush has reached the target. According to the experimental results, the thermal curves extracted from the testers' faces show that the system is relatively well accepted by blindfolded and even blind testers. With a prompt frequency of 1 s, the painting navigation system performs best, with a completion degree of 89% (SD 8.37%) and an overflow degree of 347% (SD 162.14%). Meanwhile, the excellent and good types of brush-tip trajectory account for 74%, and the relative movement distance is 4.21 (SD 2.51). This work demonstrates that it is practicable for blind people to feel the world through the brush in their hands. In the future, we plan to deploy Angle's Eyes on the phone to make it more portable. A demo video of the proposed painting navigation system is available at: https://doi.org/10.6084/m9.figshare.9760004.v1.
We present a novel direction-aware feature-level frequency-decomposition network for single-image deraining. Compared with existing solutions, the proposed network has three compelling characteristics. First, unlike previous algorithms, we perform frequency decomposition at the feature level instead of the image level, allowing both low-frequency maps containing structures and high-frequency maps containing details to be continuously refined during training. Second, we establish communication channels between low-frequency and high-frequency maps to interactively capture structures from high-frequency maps and add them back to low-frequency maps while simultaneously extracting details from low-frequency maps and sending them back to high-frequency maps, thereby removing rain streaks while preserving more delicate features of the input image. Third, unlike existing algorithms that use convolutional filters consistent in all directions, we propose a direction-aware filter that captures the direction of rain streaks in order to purge the input images of rain streaks more effectively and thoroughly. We extensively evaluate the proposed approach on three representative datasets, and the experimental results corroborate that our approach consistently outperforms state-of-the-art deraining algorithms.
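The feature-level decomposition idea can be sketched with a fixed low-pass filter (the network learns its filters end-to-end; this numpy version is only an illustration under that simplification): a feature map splits into a smoothed low-frequency component and a high-frequency residual, and the two components sum back to the original, which is what lets both be refined independently and recombined without information loss.

```python
import numpy as np

def frequency_decompose(feat, k=5):
    """Split a feature map into a low-frequency part (box-filtered along the
    last spatial axis) and a high-frequency residual. By construction,
    low + high reconstructs the input exactly."""
    kernel = np.ones(k) / k
    low = np.apply_along_axis(
        lambda row: np.convolve(row, kernel, mode="same"), axis=-1, arr=feat)
    high = feat - low   # residual carries edges, details, and rain streaks
    return low, high
```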