Future wireless communications are largely inclined to deploy a massive number of antennas at the base stations (BS) by exploiting energy-efficient and environmentally friendly technologies. An emerging technology called dynamic metasurface antennas (DMAs) is promising to realize such massive antenna arrays with reduced physical size, hardware cost, and power consumption. This paper aims to optimize the energy efficiency (EE) performance of DMAs-assisted massive MIMO uplink communications. We propose an algorithmic framework for designing the transmit precoding of each multi-antenna user and the DMAs tuning strategy at the BS to maximize the EE performance, considering the availability of the instantaneous and statistical channel state information (CSI), respectively. Specifically, the proposed framework includes Dinkelbach's transform, alternating optimization, and deterministic equivalent methods. In addition, we obtain a closed-form solution to the optimal transmit signal directions for the statistical CSI case, which simplifies the corresponding transmission design. The numerical results show good convergence performance of our proposed algorithms as well as considerable EE performance gains of the DMAs-assisted massive MIMO uplink communications over the baseline schemes.
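To make the Dinkelbach step mentioned in the abstract above concrete, the following is a minimal sketch on a toy scalar problem, not the paper's matrix-valued precoding and DMA design. It assumes a single link with channel gain `gain`, transmit power `p` in `[0, p_max]`, and circuit power `p_circuit`; these names, and the EE model `log2(1 + gain * p) / (p + p_circuit)`, are illustrative assumptions. Dinkelbach's method replaces the ratio by a sequence of subtractive problems `rate(p) - lam * (p + p_circuit)`, each of which has a closed-form maximizer here.

```python
import math


def dinkelbach_ee(gain, p_circuit, p_max, tol=1e-9, max_iter=100):
    """Maximize log2(1 + gain*p) / (p + p_circuit) over 0 <= p <= p_max.

    Toy illustration of Dinkelbach's transform; all parameters are
    hypothetical and not taken from the paper.
    """

    def rate(p):
        return math.log2(1.0 + gain * p)

    # Start from full power; lam is the current EE estimate.
    p = p_max
    lam = rate(p) / (p + p_circuit)
    for _ in range(max_iter):
        # Inner problem: maximize rate(p) - lam * (p + p_circuit).
        # Setting the derivative to zero gives p = 1/(lam*ln2) - 1/gain,
        # which is then clipped to the feasible interval.
        p = min(max(1.0 / (lam * math.log(2)) - 1.0 / gain, 0.0), p_max)
        residual = rate(p) - lam * (p + p_circuit)
        # Dinkelbach update: lam becomes the ratio at the new point.
        lam = rate(p) / (p + p_circuit)
        if abs(residual) < tol:
            break
    return p, lam


if __name__ == "__main__":
    power, ee = dinkelbach_ee(gain=10.0, p_circuit=1.0, p_max=5.0)
    print(f"power = {power:.4f}, energy efficiency = {ee:.4f}")
```

In the paper's setting the inner problem is itself solved by alternating optimization over the precoders and the DMA weights; the outer update of `lam` keeps the same form.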
Lesion synthesis has received much attention with the rise of efficient generative models for augmenting training data, drawing lesion evolution scenarios, or aiding expert training. The quality and diversity of synthesized data depend heavily on the annotated data used to train the models, which often struggle to derive samples that are realistic yet very different from the training ones. This adds an inherent bias to lesion segmentation algorithms and limits the efficient synthesis of lesion evolution scenarios. This paper presents a method for decoupling shape and density for liver lesion synthesis, creating a framework in which the synthesis can be driven in a straightforward way. We offer qualitative results that show control over the synthesis by modifying shape and density individually, and quantitative results that demonstrate that embedding the density information in the generator model helps to increase lesion segmentation performance compared to using shape alone.
The nervous system encodes continuous information from the environment in the form of discrete spikes, and then decodes these to produce smooth motor actions. Understanding how spikes integrate, represent, and process information to produce behavior is one of the greatest challenges in neuroscience. Information theory has the potential to help us address this challenge. Informational analyses of deep and feed-forward artificial neural networks solving static input-output tasks have led to the proposal of the \emph{Information Bottleneck} principle, which states that deeper layers encode more relevant yet minimal information about the inputs. Such an analysis of networks that are recurrent, spiking, and perform control tasks is relatively unexplored. Here, we present results from a Mutual Information analysis of a recurrent spiking neural network that was evolved to perform the classic pole-balancing task. Our results show that these networks deviate from the \emph{Information Bottleneck} principle prescribed for feed-forward networks.
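As a reminder of what a Mutual Information analysis computes, here is a minimal sketch of the plug-in (histogram) estimator for two discrete sequences. The abstract above does not say which estimator the authors use, so the estimator choice and the function name `mutual_information` are assumptions; continuous signals such as pole angle or firing rate would first have to be binned into discrete symbols.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of (x, y) pairs
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


if __name__ == "__main__":
    stimulus = [0, 1, 0, 1, 0, 1, 0, 1]
    response = [0, 1, 0, 1, 0, 1, 0, 1]
    print(mutual_information(stimulus, response))
```

The plug-in estimator is biased upward for small sample sizes, so comparisons across network layers or neurons are usually made with equal sample counts or with a bias correction.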
A time series is a special type of sequence data: a set of observations collected at even intervals of time and ordered chronologically. Existing deep learning techniques use generic sequence models (e.g., recurrent neural network, Transformer model, or temporal convolutional network) for time series analysis, which ignore some of its unique properties. For example, downsampling time series data often preserves most of the information in the data, while this is not true for general sequence data such as text sequences and DNA sequences. Motivated by the above, in this paper, we propose a novel neural network architecture and apply it to the time series forecasting problem, wherein we conduct sample convolution and interaction at multiple resolutions for temporal modeling. The proposed architecture, namely SCINet, facilitates extracting features with enhanced predictability. Experimental results show that SCINet achieves significant prediction accuracy improvement over existing solutions across various real-world time series forecasting datasets. In particular, it can achieve high forecasting accuracy for temporal-spatial datasets without using sophisticated spatial modeling techniques. Our code and data are presented in the supplemental material.
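The multi-resolution idea in the abstract above can be illustrated with a minimal sketch that only performs the downsampling part: a series is split into its even-indexed and odd-indexed samples, and the split is applied recursively. This is an assumption-laden illustration, not SCINet itself; the convolution and interaction steps are omitted, and the helper names `split_even_odd` and `multi_resolution` are hypothetical.

```python
def split_even_odd(series):
    """Downsample a sequence by 2 into its even- and odd-indexed samples."""
    return series[0::2], series[1::2]


def multi_resolution(series, levels):
    """Return a list of levels; level k holds 2**k sub-series.

    Level 0 is the original series, and each further level halves the
    temporal resolution of every sub-series from the level before.
    """
    out = [[list(series)]]
    for _ in range(levels):
        next_level = []
        for sub in out[-1]:
            even, odd = split_even_odd(sub)
            next_level.extend([even, odd])
        out.append(next_level)
    return out


if __name__ == "__main__":
    for level, subs in enumerate(multi_resolution(list(range(8)), levels=2)):
        print(level, subs)
```

Each sub-series still follows the overall trend of the original, which is the property of time series that the abstract contrasts with text or DNA sequences.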
Any spatio-temporal movement or reorientation of the hand, done with the intention of conveying a specific meaning, can be considered a hand gesture. Inputs to hand gesture recognition systems can take several forms, such as depth images, monocular RGB, or skeleton joint points. We observe that raw depth images have low contrast in the hand regions of interest (ROI). They do not highlight important details to learn, such as finger bending information (whether a finger is overlapping the palm or another finger). Recently, in deep-learning-based dynamic hand gesture recognition, researchers have been trying to fuse different input modalities (e.g., RGB or depth images and hand skeleton joint points) to improve recognition accuracy. In this paper, we focus on dynamic hand gesture (DHG) recognition using depth-quantized image features and hand skeleton joint points. In particular, we explore the effect of using depth-quantized features in Convolutional Neural Network (CNN) and Recurrent Neural Network (RNN) based multi-modal fusion networks. We find that our method improves on existing results on the SHREC-DHG-14 dataset. Furthermore, using our method, we show that it is possible to reduce the resolution of the input images by more than four times and still obtain accuracy comparable to or better than that of the resolutions used in previous methods.
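To show why quantization can raise contrast inside the hand ROI, here is a minimal sketch of uniform depth quantization. The abstract above does not specify the quantization scheme, so the uniform binning over the ROI's own depth range, the default of 8 levels, and the function name `quantize_depth` are all assumptions made for illustration.

```python
def quantize_depth(roi, levels=8):
    """Map each depth value in a 2-D ROI to an integer bin in [0, levels - 1].

    Bins are spread uniformly over the ROI's own min..max depth range, so
    small depth differences inside the hand use the full output range.
    """
    flat = [d for row in roi for d in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:
        # Flat ROI: every pixel falls into the first bin.
        return [[0 for _ in row] for row in roi]
    span = hi - lo
    return [
        [min(int((d - lo) * levels / span), levels - 1) for d in row]
        for row in roi
    ]


if __name__ == "__main__":
    # Depths in millimetres; the hand spans only 40 mm of the sensor range.
    hand_roi = [[500, 510], [520, 540]]
    print(quantize_depth(hand_roi, levels=4))
```

Because the bins are computed per ROI, a finger that sits a few millimetres in front of the palm lands in a different bin, even though the raw depth values differ by a tiny fraction of the sensor's full range.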
The Domain Name System (DNS) is a crucial part of the Internet, yet it has been widely exploited by cyber attackers. Apart from making static methods like blacklists or sinkholes infeasible, some cunning attackers can even bypass detection systems that use machine-learning-based classifiers. As a solution to this problem, we propose a robust domain detection system named HinDom. Instead of relying on manually selected features, HinDom models the DNS scene as a Heterogeneous Information Network (HIN) consisting of clients, domains, IP addresses, and their diverse relationships. In addition, the metapath-based transductive classification method enables HinDom to detect malicious domains with only a small fraction of labeled samples. To the best of our knowledge, this is the first work to apply HIN to DNS analysis. We build a prototype of HinDom and evaluate it in CERNET2 and TUNET. The results reveal that HinDom is accurate and robust, and that it can identify previously unknown malicious domains.
Speech synthesis, voice cloning, and voice conversion techniques present severe privacy and security threats to users of voice user interfaces (VUIs). These techniques transform one or more elements of a speech signal, e.g., identity and emotion, while preserving linguistic information. Adversaries may use advanced transformation tools to trigger a spoofing attack using fraudulent biometrics for a legitimate speaker. Conversely, such techniques have been used to generate privacy-transformed speech by suppressing personally identifiable attributes in the voice signals, achieving anonymization. Prior works have studied the security and privacy vectors in parallel, which raises the alarm that if a benign user can achieve privacy through a transformation, a malicious user can also break security by bypassing the anti-spoofing mechanism. In this paper, we take a step towards balancing two seemingly conflicting requirements: security and privacy. It remains unclear what the vulnerabilities in one domain imply for the other, and what dynamic interactions exist between them. A better understanding of these aspects is crucial for assessing and mitigating vulnerabilities inherent in VUIs and building effective defenses. In this paper, (i) we investigate the applicability of the current voice anonymization methods by deploying a tandem framework that jointly combines anti-spoofing and authentication models, and evaluate the performance of these methods; (ii) examining analytical and empirical evidence, we reveal a duality between the two mechanisms as they offer different ways to achieve the same objective, and we show that leveraging one vector significantly amplifies the effectiveness of the other; (iii) we demonstrate that to effectively defend against potential attacks on VUIs, it is necessary to investigate the attacks from multiple complementary perspectives (security and privacy).
As the number of IoT devices has increased rapidly, IoT botnets have exploited the vulnerabilities of IoT devices. However, it is still challenging to detect the initial intrusion on IoT devices prior to massive attacks. Recent studies have utilized power side-channel information to characterize this intrusion behavior on IoT devices but still lack real-time detection approaches. This study aimed to design an online intrusion detection system called DeepAuditor for IoT devices via power auditing. To realize the real-time system, we first proposed a lightweight power auditing device called Power Auditor. With the Power Auditor, we developed a distributed CNN classifier for online inference in our laboratory setting. To prevent data leakage and reduce networking redundancy, we also proposed a privacy-preserving inference protocol based on Packed Homomorphic Encryption and a sliding window protocol in our system. The classification accuracy and processing time were measured in our laboratory setting. We also demonstrated that the distributed CNN design is secure against any distributed components. Overall, the measurements showed the feasibility of our real-time distributed system for intrusion detection on IoT devices.
Background noise and room reverberation are regarded as two major factors that degrade subjective speech quality. In this paper, we propose an integrated framework to address simultaneous denoising and dereverberation in complicated scenarios. It adopts a chain optimization strategy and designs four sub-stages accordingly. In the first two stages, we decouple the multi-task learning w.r.t. the complex spectrum into magnitude and phase, and only implement noise and reverberation removal in the magnitude domain. Based on the estimated priors above, we further polish the spectrum in the third stage, where both magnitude and phase information are explicitly repaired with residual learning. Due to data mismatch and the nonlinear effect of DNNs, residual noise often exists in the DNN-processed spectrum. To resolve this problem, we adopt a lightweight algorithm as the post-processing module to capture and suppress the residual noise in the non-active regions. In the Interspeech 2021 Deep Noise Suppression (DNS) Challenge, our submitted system ranked first for the real-time track in terms of Mean Opinion Score (MOS) under the ITU-T P.835 framework.
The homotopy model is an excellent tool exploited by diverse research works in the field of machine learning. However, its flexibility is limited by a lack of adaptiveness, i.e., the appropriate homotopy coefficients must be manually fixed or tuned. To address this problem, we propose a novel adaptive homotopy framework (AH) in which the Maclaurin duality is employed, such that the homotopy parameters can be obtained adaptively. Accordingly, the proposed AH can be widely utilized to enhance homotopy-based algorithms. In particular, in this paper, we apply AH to contrastive learning (AHCL) such that it can be effectively transferred from weakly supervised learning (given label priors) to unsupervised learning, where the soft labels of contrastive learning are directly and adaptively learned. Accordingly, AHCL has the adaptive ability to extract deep features without any sort of prior information. Consequently, the affinity matrix formulated by the related adaptive labels can be constructed as the deep Laplacian graph that incorporates the topology of deep representations for the inputs. Finally, extensive experiments on benchmark datasets validate the superiority of our method.