Xin Wang

Jeff

A Benchmark for Multi-speaker Anonymization

Jul 08, 2024

Xiaoxiao Miao, Ruijie Tao, Chang Zeng, Xin Wang

Abstract:Privacy-preserving voice protection approaches primarily suppress privacy-related information derived from paralinguistic attributes while preserving the linguistic content. Existing solutions focus on single-speaker scenarios and therefore lack practicality for real-world applications, which typically involve multiple speakers. In this paper, we present an initial attempt to provide a multi-speaker anonymization benchmark by defining the task and evaluation protocol, proposing benchmarking solutions, and discussing the privacy leakage of overlapping conversations. Specifically, ideal multi-speaker anonymization should preserve the number of speakers and the turn-taking structure of the conversation, ensuring accurate context conveyance while maintaining privacy. To achieve that, a cascaded system uses speaker diarization to aggregate the speech of each speaker and speaker anonymization to conceal speaker privacy and preserve speech content. Additionally, we propose two conversation-level speaker vector anonymization methods to improve the utility further. Both methods aim to make the original and corresponding pseudo-speaker identities of each speaker unlinkable while preserving or even improving the distinguishability among pseudo-speakers in a conversation. The first method minimizes the differential similarity across speaker pairs in the original and anonymized conversations to maintain original speaker relationships in the anonymized version. The other method minimizes the aggregated similarity across anonymized speakers to achieve better differentiation between speakers. Experiments conducted on both non-overlap simulated and real-world datasets demonstrate the effectiveness of the multi-speaker anonymization system with the proposed speaker anonymizers. Additionally, we analyze overlapping speech regarding privacy leakage and provide potential solutions.
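
The two conversation-level objectives named in the abstract can be illustrated with a small sketch. The abstract does not give the exact loss definitions, so everything here is an assumption: cosine similarity as the similarity measure, an absolute difference summed over speaker pairs for the "differential similarity", and a plain sum of pairwise similarities for the "aggregated similarity". The function names are hypothetical and are not taken from the paper's code.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two speaker vectors (lists of floats)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def differential_similarity(originals, pseudos):
    """Assumed form of the first objective: for every speaker pair, compare
    the similarity between the original vectors with the similarity between
    the corresponding pseudo-speaker vectors, and sum the absolute gaps."""
    pairs = combinations(range(len(originals)), 2)
    return sum(
        abs(cosine(originals[i], originals[j]) - cosine(pseudos[i], pseudos[j]))
        for i, j in pairs
    )


def aggregated_similarity(pseudos):
    """Assumed form of the second objective: total pairwise similarity among
    pseudo-speakers; lower values mean the pseudo-speakers are easier to
    tell apart."""
    pairs = combinations(range(len(pseudos)), 2)
    return sum(cosine(pseudos[i], pseudos[j]) for i, j in pairs)


# Toy conversation with two speakers and 2-dimensional vectors.
originals = [[1.0, 0.0], [0.0, 1.0]]
well_separated = [[0.0, 1.0], [1.0, 0.0]]   # pseudo-speakers stay distinct
collapsed = [[1.0, 1.0], [1.0, 1.0]]        # pseudo-speakers are identical

print(differential_similarity(originals, well_separated))
print(differential_similarity(originals, collapsed))
print(aggregated_similarity(well_separated))
print(aggregated_similarity(collapsed))
```

Under these assumed definitions, the well-separated pseudo-speakers score lower on both objectives than the collapsed ones, which matches the stated goal of keeping pseudo-speakers distinguishable within a conversation.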

Non-Adversarial Learning: Vector-Quantized Common Latent Space for Multi-Sequence MRI

Jul 03, 2024

Luyi Han, Tao Tan, Tianyu Zhang, Xin Wang, Yuan Gao, Chunyao Lu, Xinglong Liang, Haoran Dou, Yunzhi Huang, Ritse Mann

Abstract:Adversarial learning helps generative models translate MRI from source to target sequence when lacking paired samples. However, implementing MRI synthesis with adversarial learning in clinical settings is challenging due to training instability and mode collapse. To address this issue, we leverage intermediate sequences to estimate the common latent space among multi-sequence MRI, enabling the reconstruction of distinct sequences from the common latent space. We propose a generative model that compresses discrete representations of each sequence to estimate the Gaussian distribution of vector-quantized common (VQC) latent space between multiple sequences. Moreover, we improve the latent space consistency with contrastive learning and increase model stability by domain augmentation. Experiments using the BraTS2021 dataset show that our non-adversarial model outperforms GAN-based methods, and that the VQC latent space helps our model achieve (1) anti-interference ability, which can eliminate the effects of noise, bias fields, and artifacts, and (2) solid semantic representation ability, with the potential of one-shot segmentation. Our code is publicly available.

52B to 1T: Lessons Learned via Tele-FLM Series

Jul 03, 2024

Xiang Li, Yiqun Yao, Xin Jiang, Xuezhi Fang, Chao Wang, Xinzhang Liu, Zihan Wang, Yu Zhao, Xin Wang, Yuyao Huang(+10 more)

Abstract:Large Language Models (LLMs) represent a significant stride toward Artificial General Intelligence. As scaling laws underscore the potential of increasing model sizes, the academic community has intensified its investigations into LLMs with capacities exceeding 50 billion parameters. This technical report builds on our prior work with Tele-FLM (also known as FLM-2), a publicly available 52-billion-parameter model. We delve into two primary areas: we first discuss our observation of Supervised Fine-tuning (SFT) on Tele-FLM-52B, which supports the "less is more" approach for SFT data construction; second, we demonstrate our experiments and analyses on the best practices for progressively growing a model from 52 billion to 102 billion, and subsequently to 1 trillion parameters. We will open-source a 1T model checkpoint, namely Tele-FLM-1T, to advance further training and research.

* For the Tele-FLM-52B tech report, see also 2404.16645

Large Language Model Enhanced Knowledge Representation Learning: A Survey

Jul 01, 2024

Xin Wang, Zirui Chen, Haofen Wang, Leong Hou U, Zhao Li, Wenbin Guo

Abstract:The integration of Large Language Models (LLMs) with Knowledge Representation Learning (KRL) signifies a pivotal advancement in the field of artificial intelligence, enhancing the ability to capture and utilize complex knowledge structures. This synergy leverages the advanced linguistic and contextual understanding capabilities of LLMs to improve the accuracy, adaptability, and efficacy of KRL, thereby expanding its applications and potential. Despite the increasing volume of research focused on embedding LLMs within the domain of knowledge representation, a thorough review that examines the fundamental components and processes of these enhanced models is conspicuously absent. Our survey addresses this by categorizing these models based on three distinct Transformer architectures, and by analyzing experimental data from various KRL downstream tasks to evaluate the strengths and weaknesses of each approach. Finally, we identify and explore potential future research directions in this emerging yet underexplored domain, proposing pathways for continued progress.

PM-VIS+: High-Performance Video Instance Segmentation without Video Annotation

Jun 28, 2024

Zhangjing Yang, Dun Liu, Xin Wang, Zhe Li, Barathwaj Anandan, Yi Wu

Abstract:Video instance segmentation requires detecting, segmenting, and tracking objects in videos, typically relying on costly video annotations. This paper introduces a method that eliminates video annotations by utilizing image datasets. The PM-VIS algorithm is adapted to handle both bounding box and instance-level pixel annotations dynamically. We introduce ImageNet-bbox to supplement missing categories in video datasets and propose the PM-VIS+ algorithm to adjust supervision based on annotation types. To enhance accuracy, we use pseudo masks and semi-supervised optimization techniques on unannotated video data. This method achieves high video instance segmentation performance without manual video annotations, offering a cost-effective solution and new perspectives for video instance segmentation applications. The code will be available at https://github.com/ldknight/PM-VIS-plus

* MIPR 2024

Edge-DIRECT: A Deep Reinforcement Learning-based Method for Solving Heterogeneous Electric Vehicle Routing Problem with Time Window Constraints

Jun 28, 2024

Arash Mozhdehi, Mahdi Mohammadizadeh, Xin Wang

Abstract:In response to carbon-neutral policies in developed countries, electric vehicle route optimization has gained importance for logistics companies. With the increasing focus on customer expectations and the shift towards more customer-oriented business models, the integration of delivery time-windows has become essential in logistics operations. Recognizing the critical nature of these developments, this article studies the heterogeneous electric vehicle routing problem with time-window constraints (HEVRPTW). To solve this variant of the vehicle routing problem (VRP), we propose a DRL-based approach, named Edge-enhanced Dual attentIon encodeR and feature-EnhanCed dual aTtention decoder (Edge-DIRECT). Edge-DIRECT features an extra graph representation, the node connectivity of which is based on the overlap of customer time-windows. Edge-DIRECT's self-attention encoding mechanism is enhanced by exploiting the energy consumption and travel time between the locations. To effectively account for the heterogeneity of the EV fleet, a dual attention decoder has been introduced. Experimental results based on two real-world datasets reveal that Edge-DIRECT outperforms a state-of-the-art DRL-based method and a well-established heuristic approach in solution quality and execution time. Furthermore, it exhibits competitive performance when compared to another leading heuristic method.
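
The extra graph whose connectivity follows the overlap of customer time-windows can be sketched as a simple adjacency matrix. This is only an illustration under stated assumptions: time-windows are closed intervals `(start, end)`, windows that merely touch at an endpoint count as overlapping, and edges are unweighted. The abstract does not say how Edge-DIRECT defines or weights these edges, and the function names below are hypothetical.

```python
def windows_overlap(a, b):
    """True if closed intervals a = (start, end) and b = (start, end) share
    at least one point in time (assumption: touching endpoints count)."""
    return a[0] <= b[1] and b[0] <= a[1]


def overlap_graph(windows):
    """Build a symmetric 0/1 adjacency matrix in which customers i and j are
    connected when their delivery time-windows overlap."""
    n = len(windows)
    adjacency = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if windows_overlap(windows[i], windows[j]):
                adjacency[i][j] = 1
                adjacency[j][i] = 1
    return adjacency


# Three customers with time-windows given in hours of the day.
customer_windows = [(8, 10), (9, 11), (12, 14)]
for row in overlap_graph(customer_windows):
    print(row)
```

In this toy instance only the first two customers are connected, since the third window starts after both of the others have closed.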

Robustly Optimized Deep Feature Decoupling Network for Fatty Liver Diseases Detection

Jun 25, 2024

Peng Huang, Shu Hu, Bo Peng, Jiashu Zhang, Xi Wu, Xin Wang

Abstract:Current medical image classification efforts mainly aim for higher average performance, often neglecting the balance between different classes. This can lead to significant differences in recognition accuracy between classes and obvious recognition weaknesses. Without the support of massive data, deep learning faces challenges in fine-grained classification of fatty liver. In this paper, we propose an innovative deep learning framework that combines feature decoupling and adaptive adversarial training. Firstly, we employ two iteratively compressed decouplers to decouple, under supervision, the common features and the specific features related to fatty liver in abdominal ultrasound images. Subsequently, the decoupled features are concatenated with the original image after transforming the color space and are fed into the classifier. During adversarial training, we adaptively adjust the perturbation and balance the adversarial strength according to the accuracy of each class. The model will eliminate recognition weaknesses by correctly classifying adversarial samples, thus improving recognition robustness. Finally, the accuracy of our method improved by 4.16%, achieving 82.95%. As demonstrated by extensive experiments, our method is a generalized learning framework that can be directly used to eliminate the recognition weaknesses of any classifier while improving its average performance. Code is available at https://github.com/HP-ML/MICCAI2024.

* MICCAI 2024

Q-DiT: Accurate Post-Training Quantization for Diffusion Transformers

Jun 25, 2024

Lei Chen, Yuan Meng, Chen Tang, Xinzhu Ma, Jingyan Jiang, Xin Wang, Zhi Wang, Wenwu Zhu

Abstract:Recent advancements in diffusion models, particularly the trend of architectural transformation from UNet-based Diffusion to Diffusion Transformer (DiT), have significantly improved the quality and scalability of image synthesis. Despite the incredible generative quality, the large computational requirements of these large-scale models significantly hinder deployment in real-world scenarios. Post-training Quantization (PTQ) offers a promising solution by compressing model sizes and speeding up inference for the pretrained models while eliminating model retraining. However, we observe that existing PTQ frameworks designed exclusively for ViT and conventional Diffusion models suffer from biased quantization, resulting in notable performance degradation. In this paper, we find that DiTs typically exhibit considerable variance in both weights and activations, which easily exceeds the limited numerical representation range. To address this issue, we devise Q-DiT, which seamlessly integrates three techniques: fine-grained quantization to manage substantial variance across input channels of weights and activations, an automatic search strategy to optimize the quantization granularity and mitigate redundancies, and dynamic activation quantization to capture the activation changes across timesteps. Extensive experiments on the ImageNet dataset demonstrate the effectiveness of the proposed Q-DiT. Specifically, when quantizing DiT-XL/2 to W8A8 on ImageNet 256x256, Q-DiT achieves a remarkable reduction in FID by 1.26 compared to the baseline. Under a W4A8 setting, it maintains high fidelity in image generation, showcasing only a marginal increase in FID and setting a new benchmark for efficient, high-quality quantization in diffusion transformers. Code is available at https://github.com/Juanerx/Q-DiT.
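
The motivation for fine-grained quantization can be seen in a minimal sketch that compares one shared scale against one scale per group of values. This is a generic symmetric uniform quantizer written for illustration, not Q-DiT's implementation: the grouping scheme, the 4-bit setting, the toy weights, and all function names are assumptions, and the automatic granularity search and dynamic activation quantization from the abstract are not modeled.

```python
def quantize_dequantize(values, bits=8):
    """Symmetric uniform quantization with a single scale for `values`,
    followed by dequantization back to floats."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(v) for v in values)
    if max_abs == 0:
        return list(values)
    scale = max_abs / qmax
    # Round to the nearest integer level, clamp to the representable range.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]


def groupwise_quantize_dequantize(values, group_size, bits=8):
    """Fine-grained variant: each consecutive group of `group_size` values
    gets its own scale, so small-magnitude groups keep their resolution."""
    result = []
    for start in range(0, len(values), group_size):
        result.extend(quantize_dequantize(values[start:start + group_size], bits))
    return result


def mean_squared_error(reference, approximation):
    """Average squared difference between two equally long lists."""
    return sum((r - a) ** 2 for r, a in zip(reference, approximation)) / len(reference)


# Toy weights with high variance: one small-magnitude and one large-magnitude group.
weights = [0.01, -0.02, 0.015, 0.005, 8.0, -6.0, 7.5, 3.0]

per_tensor = quantize_dequantize(weights, bits=4)
per_group = groupwise_quantize_dequantize(weights, group_size=4, bits=4)

print(mean_squared_error(weights, per_tensor))
print(mean_squared_error(weights, per_group))
```

With a single scale, the large values dictate the step size and the small weights all collapse to zero. With per-group scales the small group is quantized on its own range, so the reconstruction error is lower.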

Meta-GCN: A Dynamically Weighted Loss Minimization Method for Dealing with the Data Imbalance in Graph Neural Networks

Jun 24, 2024

Mahdi Mohammadizadeh, Arash Mozhdehi, Yani Ioannou, Xin Wang

Abstract:Although many real-world applications, such as disease prediction and fault detection, suffer from class imbalance, most existing graph-based classification methods ignore the skewness of the class distribution and therefore tend to be biased towards the majority class(es). Conventional methods typically tackle this problem by assigning a weight to each sample based on a function of its loss, which can lead to over-fitting on outliers. In this paper, we propose a meta-learning algorithm, named Meta-GCN, for adaptively learning the example weights by simultaneously minimizing the unbiased meta-data set loss and optimizing the model weights through the use of a small unbiased meta-data set. Through experiments, we show that Meta-GCN outperforms state-of-the-art frameworks and other baselines in terms of accuracy, the area under the receiver operating characteristic (AUC-ROC) curve, and macro F1-Score for classification tasks on two different datasets.

Towards Lightweight Graph Neural Network Search with Curriculum Graph Sparsification

Jun 24, 2024

Beini Xie, Heng Chang, Ziwei Zhang, Zeyang Zhang, Simin Wu, Xin Wang, Yuan Meng, Wenwu Zhu

Abstract:Graph Neural Architecture Search (GNAS) has achieved superior performance on various graph-structured tasks. However, existing GNAS studies overlook the applications of GNAS in resource-constrained scenarios. This paper proposes to design a joint graph data and architecture mechanism, which identifies important sub-architectures via the valuable graph data. To search for optimal lightweight Graph Neural Networks (GNNs), we propose a Lightweight Graph Neural Architecture Search with Graph SparsIfication and Network Pruning (GASSIP) method. In particular, GASSIP comprises an operation-pruned architecture search module to enable efficient lightweight GNN search. Meanwhile, we design a novel curriculum graph data sparsification module with an architecture-aware edge-removing difficulty measurement to help select optimal sub-architectures. With the aid of two differentiable masks, we iteratively optimize these two modules to efficiently search for the optimal lightweight architecture. Extensive experiments on five benchmarks demonstrate the effectiveness of GASSIP. In particular, our method achieves on-par or even higher node classification performance with searched GNNs that have half or fewer model parameters and with a sparser graph.

* Accepted by KDD 2024. The two first authors made equal contributions
