Yong Zhou

GMM-ResNext: Combining Generative and Discriminative Models for Speaker Verification

Jul 03, 2024

Hui Yan, Zhenchun Lei, Changhong Liu, Yong Zhou

Abstract: With the development of deep learning, many different network architectures have been explored in speaker verification. However, most systems rely on a single deep learning architecture, and hybrid networks combining different architectures have been little studied in automatic speaker verification (ASV) tasks. In this paper, we propose the GMM-ResNext model for speaker verification. A conventional GMM does not consider the score distribution of each frame feature over all Gaussian components and ignores the relationship between neighboring speech frames. We therefore extract log Gaussian probability features from the raw acoustic features and use a ResNext-based network as the backbone to extract the speaker embedding. GMM-ResNext combines generative and discriminative models to improve the generalization ability of deep learning models and allows one to more easily specify meaningful priors on model parameters. A two-path GMM-ResNext model based on two gender-related GMMs is also proposed. Experimental results show that the proposed GMM-ResNext achieves relative improvements of 48.1% and 11.3% in EER compared with ResNet34 and ECAPA-TDNN, respectively, on the VoxCeleb1-O test set.
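The log Gaussian probability features mentioned in the abstract can be illustrated with a minimal sketch. It assumes a diagonal-covariance GMM and defines the feature for each frame as the log of the weighted component density, one value per Gaussian component. The function name, the inclusion of the mixture weight, and the absence of any normalization are assumptions for illustration; the abstract does not state the paper's exact definition.

```python
import math

def log_gaussian_prob_features(frames, weights, means, variances):
    """Map each frame to one log-probability per Gaussian component.

    frames:    list of feature vectors (one per speech frame)
    weights:   mixture weight of each component
    means:     mean vector of each component
    variances: diagonal covariance (variance vector) of each component
    """
    features = []
    for x in frames:
        row = []
        for w, mu, var in zip(weights, means, variances):
            # log N(x; mu, diag(var)) summed over feature dimensions
            log_pdf = 0.0
            for xd, md, vd in zip(x, mu, var):
                log_pdf += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
            # assumed feature: log of the weighted component density
            row.append(math.log(w) + log_pdf)
        features.append(row)
    return features
```

Each frame becomes a vector with as many entries as there are Gaussian components, so a sequence of frames yields a two-dimensional map that a convolutional backbone can consume.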

GMM-ResNet2: Ensemble of Group ResNet Networks for Synthetic Speech Detection

Jul 02, 2024

Zhenchun Lei, Hui Yan, Changhong Liu, Yong Zhou, Minglei Ma

Abstract: Deep learning models are widely used for speaker recognition and spoofing speech detection. We propose GMM-ResNet2 for synthetic speech detection. Compared with the previous GMM-ResNet model, GMM-ResNet2 has four improvements. First, GMMs of different orders have different capabilities to form smooth approximations to the feature distribution, so multiple GMMs are used to extract multi-scale log Gaussian probability features. Second, a grouping technique is used to improve classification accuracy by exposing the group cardinality while reducing both the number of parameters and the training time. The final score is obtained by averaging the outputs of all group classifiers. Third, the residual block is improved by including one activation function and one batch normalization layer. Finally, an ensemble-aware loss function is proposed to integrate the independent loss functions of all ensemble members. On the ASVspoof 2019 LA task, GMM-ResNet2 achieves a minimum t-DCF of 0.0227 and an EER of 0.79%. On the ASVspoof 2021 LA task, GMM-ResNet2 achieves a minimum t-DCF of 0.2362 and an EER of 2.19%, relative reductions of 31.4% and 76.3%, respectively, compared with the LFCC-LCNN baseline.
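The averaging step that produces the final score from the group classifiers can be sketched as follows. This is only an illustration of score averaging; the function name and the per-class score layout are assumptions, and the ensemble-aware loss is not shown because the abstract does not give its form.

```python
def average_ensemble(group_outputs):
    """Average per-class scores across all group classifiers.

    group_outputs: list of score vectors, one per group classifier,
                   each with the same number of classes.
    """
    if not group_outputs:
        raise ValueError("need at least one group output")
    n_classes = len(group_outputs[0])
    if any(len(scores) != n_classes for scores in group_outputs):
        raise ValueError("all group outputs must have the same length")
    n_groups = len(group_outputs)
    # element-wise mean over the ensemble members
    return [sum(scores[c] for scores in group_outputs) / n_groups
            for c in range(n_classes)]
```

For example, two group classifiers scoring a trial as `[0.2, 0.8]` and `[0.4, 0.6]` would yield a final score close to `[0.3, 0.7]`.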

Satellite Federated Edge Learning: Architecture Design and Convergence Analysis

Apr 02, 2024

Yuanming Shi, Li Zeng, Jingyang Zhu, Yong Zhou, Chunxiao Jiang, Khaled B. Letaief

Abstract: The proliferation of low-earth-orbit (LEO) satellite networks leads to the generation of vast volumes of remote sensing data, which are traditionally transferred to a ground server for centralized processing, raising privacy and bandwidth concerns. Federated edge learning (FEEL), as a distributed machine learning approach, has the potential to address these challenges by sharing only model parameters instead of raw data. Although promising, the dynamics of LEO networks, characterized by the high mobility of satellites and the short duration of ground-to-satellite links (GSLs), pose unique challenges for FEEL. Notably, frequent model transmission between the satellites and the ground incurs prolonged waiting time and large transmission latency. This paper introduces a novel FEEL algorithm, named FEDMEGA, tailored to LEO mega-constellation networks. By integrating inter-satellite links (ISLs) for intra-orbit model aggregation, the proposed algorithm significantly reduces the usage of low-data-rate and intermittent GSLs. The proposed method includes a ring all-reduce based intra-orbit aggregation mechanism, coupled with a network flow-based transmission scheme for global model aggregation, which enhances transmission efficiency. Theoretical convergence analysis is provided to characterize the algorithm performance. Extensive simulations show that the FEDMEGA algorithm outperforms existing satellite FEEL algorithms, exhibiting an approximately 30% improvement in convergence rate.

* 16 pages, 15 figures
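The ring all-reduce pattern named in the abstract above can be simulated in a few lines. The sketch below is a generic, unweighted ring all-reduce over the satellites of one orbit, where each satellite only ever sends to its next neighbor on the ring. It is not FEDMEGA itself: the function name, the equal weighting, and the chunking scheme are assumptions for illustration.

```python
def ring_all_reduce_mean(vectors):
    """Simulate ring all-reduce so every node ends with the mean vector.

    vectors: one model vector per satellite in the ring (equal lengths).
    Returns the list of per-node results after aggregation.
    """
    n = len(vectors)
    if n == 0:
        raise ValueError("need at least one node")
    d = len(vectors[0])
    data = [list(v) for v in vectors]
    # split the vector into n contiguous chunks (some may be empty)
    bounds = [(c * d // n, (c + 1) * d // n) for c in range(n)]

    # Phase 1, scatter-reduce: after n-1 steps node i holds the
    # full sum of chunk (i + 1) % n.
    for step in range(n - 1):
        sends = []
        for i in range(n):
            c = (i - step) % n
            lo, hi = bounds[c]
            sends.append((c, data[i][lo:hi]))  # snapshot before applying
        for i, (c, payload) in enumerate(sends):
            dst = (i + 1) % n
            lo, _ = bounds[c]
            for k, val in enumerate(payload):
                data[dst][lo + k] += val

    # Phase 2, all-gather: circulate the fully reduced chunks.
    for step in range(n - 1):
        sends = []
        for i in range(n):
            c = (i + 1 - step) % n
            lo, hi = bounds[c]
            sends.append((c, data[i][lo:hi]))
        for i, (c, payload) in enumerate(sends):
            dst = (i + 1) % n
            lo, hi = bounds[c]
            data[dst][lo:hi] = payload

    # turn the sums into averages
    return [[x / n for x in v] for v in data]
```

Each node transmits only one chunk per step to a single neighbor, which is why the pattern suits inter-satellite links within an orbit.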

Learning from Reduced Labels for Long-Tailed Data

Mar 25, 2024

Meng Wei, Zhongnian Li, Yong Zhou, Xinzheng Xu

Abstract: Long-tailed data is prevalent in real-world classification tasks and heavily relies on supervised information, which makes the annotation process exceptionally labor-intensive and time-consuming. Unfortunately, despite being a common approach to mitigating labeling costs, existing weakly supervised learning methods struggle to adequately preserve supervised information for tail samples, resulting in a decline in accuracy for the tail classes. To alleviate this problem, we introduce a novel weakly supervised labeling setting called Reduced Label. The proposed labeling setting not only avoids the decline of supervised information for the tail samples, but also decreases the labeling costs associated with long-tailed data. Additionally, we propose a straightforward and highly efficient unbiased framework with strong theoretical guarantees to learn from these Reduced Labels. Extensive experiments conducted on benchmark datasets including ImageNet validate the effectiveness of our approach, which surpasses the performance of state-of-the-art weakly supervised methods.

* 12 pages, 3 figures

Determined Multi-Label Learning via Similarity-Based Prompt

Mar 25, 2024

Meng Wei, Zhongnian Li, Peng Ying, Yong Zhou, Xinzheng Xu

Abstract: In multi-label classification, each training instance is associated with multiple class labels simultaneously. Unfortunately, collecting fully precise class labels for each training instance is time- and labor-consuming in real-world applications. To alleviate this problem, a novel labeling setting termed Determined Multi-Label Learning (DMLL) is proposed, aiming to effectively reduce the labeling cost inherent in multi-label tasks. In this labeling setting, each training instance is associated with a determined label (either "Yes" or "No"), which indicates whether the training instance contains the provided class label. The provided class label is selected uniformly at random from the entire candidate label set. Moreover, each training instance only needs to be determined once, which significantly reduces the annotation cost of the labeling task for multi-label datasets. In this paper, we theoretically derive a risk-consistent estimator to learn a multi-label classifier from these determined-labeled training data. Additionally, we introduce a similarity-based prompt learning method for the first time, which minimizes the risk-consistent loss of large-scale pre-trained models to learn a supplemental prompt with richer semantic information. Extensive experimental validation underscores the efficacy of our approach, demonstrating superior performance compared to existing state-of-the-art methods.

* 10 pages, 4 figures
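The determined-label annotation process described in the abstract above can be sketched directly: one class is drawn uniformly at random from the candidate set, and the annotator answers "Yes" or "No" once. The function name and the data layout are hypothetical, and the risk-consistent estimator is not shown because the abstract does not give its form.

```python
import random

def determined_label(true_labels, candidate_labels, rng):
    """Simulate one DMLL annotation for a single training instance.

    true_labels:      set of classes actually present in the instance
    candidate_labels: list of all candidate classes
    rng:              a random.Random instance (for reproducibility)
    Returns (queried_class, answer) where answer is "Yes" or "No".
    """
    # the provided class label is drawn uniformly from the candidate set
    queried = rng.choice(candidate_labels)
    answer = "Yes" if queried in true_labels else "No"
    return queried, answer
```

A call such as `determined_label({"cat", "sofa"}, ["cat", "dog", "sofa", "car"], random.Random(0))` returns a single queried class and its answer, so each instance costs the annotator exactly one binary decision.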

Dual Encoder: Exploiting the Potential of Syntactic and Semantic for Aspect Sentiment Triplet Extraction

Feb 23, 2024

Xiaowei Zhao, Yong Zhou, Xiujuan Xu

Abstract: Aspect Sentiment Triplet Extraction (ASTE) is an emerging task in fine-grained sentiment analysis. Recent studies have employed Graph Neural Networks (GNNs) to model the syntax-semantic relationships inherent in triplet elements. However, they have yet to fully tap into the vast potential of syntactic and semantic information within the ASTE task. In this work, we propose the Dual Encoder: Exploiting the potential of Syntactic and Semantic model (D2E2S), which maximizes the syntactic and semantic relationships among words. Specifically, our model utilizes a dual-channel encoder with a BERT channel to capture semantic information and an enhanced LSTM channel to capture comprehensive syntactic information. Subsequently, we introduce a heterogeneous feature interaction module to capture intricate interactions between dependency syntax and attention semantics, and to dynamically select vital nodes. We leverage the synergy of these modules to harness the significant potential of syntactic and semantic information in ASTE tasks. On public benchmarks, our D2E2S model surpasses the current state of the art (SOTA), demonstrating its effectiveness.

* Accepted by COLING 2024

Extensible Multi-Granularity Fusion Network for Aspect-based Sentiment Analysis

Feb 13, 2024

Xiaowei Zhao, Yong Zhou, Xiujuan Xu, Yu Liu

Abstract: Aspect-based Sentiment Analysis (ABSA) evaluates sentiment expressions within a text to comprehend sentiment information. Previous studies integrated external knowledge, such as knowledge graphs, to enhance the semantic features in ABSA models. Recent research has examined the use of Graph Neural Networks (GNNs) on dependency and constituent trees for syntactic analysis. With the ongoing development of ABSA, more innovative linguistic and structural features are being incorporated (e.g., latent graphs), but this also introduces complexity and confusion. As of now, a scalable framework for integrating diverse linguistic and structural features into ABSA does not exist. This paper presents the Extensible Multi-Granularity Fusion (EMGF) network, which integrates information from dependency and constituent syntax, attention semantics, and external knowledge graphs. EMGF, equipped with multi-anchor triplet learning and orthogonal projection, efficiently harnesses the combined potential of each granularity feature and their synergistic interactions, resulting in a cumulative effect without additional computational expense. Experimental findings on the SemEval 2014 and Twitter datasets confirm EMGF's superiority over existing ABSA methods.

* 8 pages, 4 figures

A Comprehensive Dataset and Automated Pipeline for Nailfold Capillary Analysis

Dec 10, 2023

Linxi Zhao, Jiankai Tang, Dongyu Chen, Xiaohong Liu, Yong Zhou, Guangyu Wang, Yuntao Wang

Abstract: Nailfold capillaroscopy is a well-established method for assessing health conditions, but the potential of automated medical image analysis using machine learning remains untapped despite recent advancements. In this study, we present a pioneering effort in constructing a comprehensive dataset (321 images, 219 videos, and 68 clinic reports, with expert annotations) that serves as a crucial resource for training deep-learning models. Leveraging this dataset, we propose an end-to-end nailfold capillary analysis pipeline capable of automatically detecting and measuring diverse morphological and dynamic features. Experimental results demonstrate sub-pixel measurement accuracy and 90% accuracy in predicting abnormality portions, highlighting its potential for advancing quantitative medical research and enabling pervasive computing in healthcare. We have shared our open-source code and data (available at https://github.com/THU-CS-PI-LAB/ANFC-Automated-Nailfold-Capillary) to contribute to progress in computational medical image analysis.

* Dataset, code, pretrained models: https://github.com/THU-CS-PI-LAB/ANFC-Automated-Nailfold-Capillary

Efficient Decoder for End-to-End Oriented Object Detection in Remote Sensing Images

Dec 02, 2023

Jiaqi Zhao, Zeyu Ding, Yong Zhou, Hancheng Zhu, Wenliang Du, Rui Yao, Abdulmotaleb El Saddik

Abstract: Object instances in remote sensing images often appear with multiple orientations, varying scales, and dense distribution. These properties pose challenges for end-to-end oriented object detectors, including multi-scale feature alignment and a large number of queries. To address these limitations, we propose an end-to-end oriented detector equipped with an efficient decoder, which incorporates two techniques, Rotated RoI attention (RRoI attention) and Selective Distinct Queries (SDQ). Specifically, RRoI attention effectively focuses on oriented regions of interest through a cross-attention mechanism and aligns multi-scale features. SDQ collects queries from intermediate decoder layers and then filters similar queries to obtain distinct queries. The proposed SDQ can facilitate the optimization of one-to-one label assignment without introducing redundant initial queries or extra auxiliary branches. Extensive experiments on five datasets demonstrate the effectiveness of our method. Notably, our method achieves state-of-the-art performance on DIOR-R (67.31% mAP), DOTA-v1.5 (67.43% mAP), and DOTA-v2.0 (53.28% mAP) with the ResNet50 backbone.

* 11 pages, 7 figures, 13 tables
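The "filter similar queries to obtain distinct queries" step of SDQ can be illustrated with a generic greedy filter. The abstract above does not say how similarity is measured, so the cosine-similarity criterion, the threshold value, and the function names below are all assumptions standing in for whatever the paper actually uses.

```python
import math

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def select_distinct(queries, threshold=0.95):
    """Greedily keep queries that are not too similar to any kept query.

    queries:   query vectors collected from intermediate decoder layers,
               in priority order (earlier entries win ties)
    threshold: assumed similarity cutoff above which a query is dropped
    """
    kept = []
    for q in queries:
        if all(cosine(q, k) < threshold for k in kept):
            kept.append(q)
    return kept
```

With this criterion, near-duplicate queries collapse to a single representative while dissimilar queries all survive.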

Over-the-Air Federated Learning and Optimization

Oct 16, 2023

Jingyang Zhu, Yuanming Shi, Yong Zhou, Chunxiao Jiang, Wei Chen, Khaled B. Letaief

Abstract: Federated learning (FL), as an emerging distributed machine learning paradigm, allows a large number of edge devices to collaboratively train a global model while preserving privacy. In this tutorial, we focus on FL via over-the-air computation (AirComp), which is proposed to reduce the communication overhead of FL over wireless networks at the cost of compromised learning performance due to the model aggregation error arising from channel fading and noise. We first provide a comprehensive study on the convergence of AirComp-based FedAvg (AirFedAvg) algorithms under both strongly convex and non-convex settings with constant and diminishing learning rates in the presence of data heterogeneity. Through convergence and asymptotic analysis, we characterize the impact of aggregation error on the convergence bound and provide insights for system design with convergence guarantees. Then we derive convergence rates for AirFedAvg algorithms for strongly convex and non-convex objectives. For different types of local updates that can be transmitted by edge devices (i.e., local model, gradient, and model difference), we reveal that transmitting the local model in AirFedAvg may cause divergence in the training procedure. In addition, we consider more practical signal processing schemes to improve the communication efficiency and further extend the convergence analysis to different forms of model aggregation error caused by these signal processing schemes. Extensive simulation results under different settings of objective functions, transmitted local information, and communication schemes verify the theoretical conclusions.

* 31 pages, 11 figures
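The aggregation error that AirComp introduces can be seen in a toy model of the receive step. The sketch below assumes channel fading has been perfectly compensated, so the server receives the sum of all transmitted updates plus additive Gaussian noise and divides by the number of devices. The function name and this simplified channel model are assumptions for illustration, not the tutorial's system model.

```python
import random

def aircomp_average(local_updates, noise_std, rng):
    """Noisy over-the-air average of the devices' local updates.

    local_updates: one update vector per edge device (equal lengths)
    noise_std:     standard deviation of the receiver noise per entry
    rng:           a random.Random instance (for reproducibility)
    """
    k = len(local_updates)
    if k == 0:
        raise ValueError("need at least one device")
    dim = len(local_updates[0])
    # signals superpose in the air: the server observes sum + noise
    received = [
        sum(update[j] for update in local_updates) + rng.gauss(0.0, noise_std)
        for j in range(dim)
    ]
    # scaling by 1/k gives the average; the noise term scaled by 1/k
    # is the model aggregation error
    return [r / k for r in received]
```

With `noise_std=0.0` the result equals the exact FedAvg average; with a positive value each entry is perturbed by noise of standard deviation `noise_std / k`.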
