Haitao Wang

AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes

Aug 25, 2023
Zhaohui Li, Haitao Wang, Xinghua Jiang

We propose AudioFormer, a method that learns audio feature representations by acquiring discrete acoustic codes and subsequently fine-tunes them for audio classification tasks. We begin with a novel perspective: treating the audio classification task as a form of natural language understanding (NLU). Leveraging an existing neural audio codec model, we generate discrete acoustic codes and use them to train a masked language model (MLM), thereby obtaining audio feature representations. We further introduce a Multi-Positive sample Contrastive (MPC) learning approach, which enables the learning of joint representations among multiple discrete acoustic codes within the same audio input. In our experiments, we treat the discrete acoustic codes as textual data and train a masked language model with a cloze-like methodology, ultimately deriving high-quality audio representations. Notably, the MPC learning technique effectively captures collaborative representations among distinct positive samples. Our results demonstrate that AudioFormer attains significantly better performance than prevailing monomodal audio classification models across multiple datasets, and even outperforms audio-visual multimodal classification models on some of them. Specifically, our approach achieves performance scores of 53.9, 45.1, and 65.6 on AudioSet (2M and 20K) and FSD50K, respectively. We have openly shared both the code and models: https://github.com/LZH-0225/AudioFormer.git.
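
To make the recipe concrete, here is a minimal sketch of masked language modeling over discrete acoustic codes, in the spirit of the approach described above. The codebook size, model width, depth, and 15% masking ratio are illustrative assumptions, not the paper's configuration, and the MPC objective is omitted.

```python
import torch
import torch.nn as nn

# Illustrative sketch of MLM pretraining over discrete acoustic codes.
# Codebook size, width, depth, and masking ratio are assumptions.
VOCAB, MASK_ID, DIM = 1024, 1024, 256  # codec codebook plus one [MASK] id

class CodeMLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB + 1, DIM)  # +1 for the [MASK] token
        layer = nn.TransformerEncoderLayer(DIM, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.head = nn.Linear(DIM, VOCAB)  # predict the original code ids

    def forward(self, codes):              # codes: (batch, seq_len) int64
        return self.head(self.encoder(self.embed(codes)))

def mask_codes(codes, ratio=0.15):
    """Replace a random subset of positions with the [MASK] id (cloze task)."""
    mask = torch.rand(codes.shape) < ratio
    return codes.masked_fill(mask, MASK_ID), mask

model = CodeMLM()
codes = torch.randint(0, VOCAB, (2, 128))  # stand-in for neural codec output
corrupted, mask = mask_codes(codes)
logits = model(corrupted)                  # (2, 128, VOCAB)
loss = nn.functional.cross_entropy(logits[mask], codes[mask])
loss.backward()
```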

* More detailed experiments to be supplemented 

MSMix: An Interpolation-Based Text Data Augmentation Method Manifold Swap Mixup

May 31, 2023
Mao Ye, Haitao Wang, Zheqian Chen

To address the poor performance of deep neural network models caused by insufficient data, a simple yet effective interpolation-based data augmentation method is proposed: MSMix (Manifold Swap Mixup). The method feeds two different samples to the same deep neural network, randomly selects a specific layer, and partially replaces the hidden features of one sample at that layer with those of the other. The mixed hidden features are then passed through the rest of the network. Two different selection strategies are also proposed to obtain richer hidden representations. Experiments on three Chinese intention recognition datasets show that MSMix achieves better results than other methods in both full-sample and small-sample configurations.
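
The swap itself is easy to picture. Below is a minimal sketch of the core operation, assuming a dimension-wise selection strategy and a swap-ratio hyperparameter; both are illustrative guesses rather than the paper's exact design.

```python
import torch

def msmix_swap(h_a, h_b, swap_ratio=0.3):
    """Swap a random subset of feature dimensions of h_a with those of h_b.

    h_a, h_b: hidden states at the chosen layer, shape (batch, seq, dim).
    The dimension-wise strategy and the 0.3 ratio are illustrative.
    """
    dim = h_a.size(-1)
    idx = torch.randperm(dim)[: int(dim * swap_ratio)]  # dims to replace
    mixed = h_a.clone()
    mixed[..., idx] = h_b[..., idx]
    return mixed

h_a, h_b = torch.randn(4, 16, 768), torch.randn(4, 16, 768)
mixed = msmix_swap(h_a, h_b)  # then fed through the remaining layers
```

In training, one would presumably run both samples to a randomly chosen layer, apply the swap there, and continue the forward pass, with the mixed sample's label interpolated in proportion to the swap ratio as in standard mixup; those details are assumptions here.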

Statistical Dataset Evaluation: Reliability, Difficulty, and Validity

Dec 19, 2022
Chengwen Wang, Qingxiu Dong, Xiaochen Wang, Haitao Wang, Zhifang Sui

Datasets serve both as crucial training resources and as benchmarks for tracking model performance. However, existing datasets have exposed a plethora of problems, inducing biased models and unreliable evaluation results. In this paper, we propose a model-agnostic framework for automatic dataset quality evaluation. Following classical testing theory, we examine the statistical properties of datasets along three fundamental dimensions: reliability, difficulty, and validity. Taking Named Entity Recognition (NER) datasets as a case study, we introduce 9 statistical metrics for the framework. Experimental results and human evaluation confirm that our framework effectively assesses various aspects of dataset quality. Furthermore, we study how a dataset's scores on our statistical metrics affect model performance, and we advocate dataset quality evaluation or targeted dataset improvement before models are trained or tested.
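
To give a flavor of what a statistical metric in such a framework can look like, here is a hypothetical reliability-style check for NER data: how consistently the same surface form receives the same entity label across the dataset. The metric and its name are illustrative inventions, not necessarily one of the paper's nine.

```python
from collections import Counter, defaultdict

def label_consistency(annotated_mentions):
    """Hypothetical reliability-style metric for an NER dataset.

    annotated_mentions: iterable of (mention_text, label) pairs gathered
    over the whole dataset. Returns the average, over mention strings, of
    the fraction of occurrences carrying that mention's majority label.
    1.0 means every surface form is always labeled the same way.
    """
    by_mention = defaultdict(Counter)
    for text, label in annotated_mentions:
        by_mention[text][label] += 1
    scores = [max(c.values()) / sum(c.values()) for c in by_mention.values()]
    return sum(scores) / len(scores)

# Example: "Washington" labeled PER once and LOC twice hurts consistency.
mentions = [("Washington", "LOC"), ("Washington", "LOC"),
            ("Washington", "PER"), ("Paris", "LOC")]
print(label_consistency(mentions))  # (2/3 + 1) / 2 = 0.8333...
```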

Learning Symbolic Operators: A Neurosymbolic Solution for Autonomous Disassembly of Electric Vehicle Battery

Jun 07, 2022
Yidong Du, Wenshuo Wang, Zhigang Wang, Hua Yang, Haitao Wang, Yinghao Cai, Ming Chen

The boom in electric vehicles demands efficient battery disassembly for environmentally friendly recycling. Currently, battery disassembly is still primarily done by humans, possibly assisted by robots, because of the unstructured environment and high uncertainties. Autonomous solutions are highly desirable to improve work efficiency and lower human risk in high-voltage, toxic environments. This paper proposes a novel neurosymbolic method that augments the traditional Variational Autoencoder (VAE) model to learn symbolic operators, and their relationships, from raw sensory inputs. The symbolic operators include a probabilistic state symbol grounding model and a state transition matrix for predicting the state after each execution, enabling autonomous task and motion planning. Finally, the method's feasibility is verified through test results.
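
As a sketch of the transition-matrix idea, the snippet below estimates a per-operator symbolic state transition matrix from execution logs by smoothed counting. The states, operators, and estimation procedure are illustrative assumptions, not the paper's model.

```python
import numpy as np

# Illustrative stand-ins for the learned symbolic vocabulary.
STATES = ["bolt_fixed", "bolt_loosened", "bolt_removed"]
OPS = ["unscrew", "pick"]

def estimate_transitions(logs, n_states=len(STATES), n_ops=len(OPS)):
    """logs: (state, op, next_state) index triples from robot executions.
    Returns T with T[o, s, s2] = P(next state s2 | state s, operator o),
    estimated by add-one-smoothed counting."""
    T = np.ones((n_ops, n_states, n_states))
    for s, o, s2 in logs:
        T[o, s, s2] += 1
    return T / T.sum(axis=2, keepdims=True)

logs = [(0, 0, 1), (0, 0, 1), (1, 1, 2), (1, 1, 1)]  # observed executions
T = estimate_transitions(logs)
print(T[0, 0])  # distribution over next states after 'unscrew' on bolt_fixed
```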

Autonomous Electric Vehicle Battery Disassembly Based on NeuroSymbolic Computing

May 16, 2022
Hengwei Zhang, Hua Yang, Haitao Wang, Zhigang Wang, Shengmin Zhang, Ming Chen

The boom in electric vehicles demands efficient battery disassembly for environmentally friendly recycling. Because of the unstructured environment and high uncertainties, battery disassembly is still primarily done by humans, possibly assisted by robots. Autonomous solutions are highly desirable to improve work efficiency and lower human risk in high-voltage, toxic environments. This paper proposes a novel NeuroSymbolic task and motion planning framework for robots to automatically disassemble batteries in an unstructured environment. It enables robots to independently locate and disassemble battery bolts, with or without obstacles. The study not only provides a solution for intelligently disassembling electric vehicle batteries but also verifies its feasibility through test results in which the robot accomplishes the disassembly tasks in a complex, dynamic environment.
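
For the symbolic task-planning half, the following sketch searches over symbolic states using operators with preconditions and effects. The bolt/obstacle domain and the plain breadth-first search are hypothetical stand-ins for the paper's NeuroSymbolic planner.

```python
from collections import deque

# Hypothetical symbolic operators: name -> (preconditions, add, delete).
OPERATORS = {
    "remove_obstacle": ({"obstacle_present"}, set(), {"obstacle_present"}),
    "locate_bolt":     (set(), {"bolt_located"}, set()),
    "unscrew_bolt":    ({"bolt_located"}, {"bolt_removed"}, {"bolt_located"}),
}

def plan(start, goal):
    """Breadth-first search over symbolic states; returns an operator sequence."""
    queue, seen = deque([(frozenset(start), [])]), set()
    while queue:
        state, steps = queue.popleft()
        if goal <= state:
            return steps
        if state in seen:
            continue
        seen.add(state)
        for name, (pre, add, delete) in OPERATORS.items():
            if pre <= state:
                queue.append((frozenset((state - delete) | add), steps + [name]))
    return None

print(plan({"obstacle_present"}, {"bolt_removed"}))
# ['locate_bolt', 'unscrew_bolt'] -- the obstacle is untouched because this
# toy domain never makes it block another operator
```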

* Accepted to IntelliSys 2022 (Intelligent Systems Conference), 15 pages with 6 figures 

AMMASurv: Asymmetrical Multi-Modal Attention for Accurate Survival Analysis with Whole Slide Images and Gene Expression Data

Aug 28, 2021
Ruoqi Wang, Ziwang Huang, Haitao Wang, Hejun Wu

Using multi-modal data, such as the combination of whole slide images (WSIs) and gene expression data, for survival analysis can lead to more accurate survival predictions. However, previous multi-modal survival models cannot efficiently extract the intrinsic information within each modality. Moreover, although experimental results show that WSIs provide more effective information than gene expression data, previous methods treat the information from different modalities as equally important and thus cannot flexibly exploit the potential connections between modalities. To address these problems, we propose a new asymmetrical multi-modal method, termed AMMASurv. Specifically, we design an asymmetrical multi-modal attention mechanism (AMMA) in a Transformer encoder, enabling more flexible multi-modal information fusion for survival prediction. Unlike previous works, AMMASurv effectively exploits the intrinsic information within each modality and flexibly adapts to modalities of differing importance. Extensive experiments validate the effectiveness of the proposed model, and encouraging results demonstrate its superiority over other state-of-the-art methods.
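
One way to picture the asymmetry is one-directional cross-attention: the dominant modality (WSI) supplies the queries, while the auxiliary modality (gene expression) supplies keys and values but never queries. The sketch below illustrates that idea; the dimensions and the exact form of the asymmetry are assumptions, and the paper's AMMA mechanism may differ in detail.

```python
import torch
import torch.nn as nn

class AsymmetricCrossAttention(nn.Module):
    """Illustrative sketch: WSI tokens attend to gene-expression tokens,
    but not vice versa, so the stronger modality drives the fusion."""
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, wsi, gene):
        # wsi: (batch, n_patches, dim), gene: (batch, n_genes, dim)
        fused, _ = self.attn(query=wsi, key=gene, value=gene)
        return self.norm(wsi + fused)  # residual: the WSI stream is preserved

wsi = torch.randn(2, 100, 256)   # embedded WSI patches
gene = torch.randn(2, 60, 256)   # embedded gene-expression features
out = AsymmetricCrossAttention()(wsi, gene)  # (2, 100, 256)
```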

* 8 pages 

Towards Accurate and Consistent Evaluation: A Dataset for Distantly-Supervised Relation Extraction

Oct 30, 2020
Tong Zhu, Haitao Wang, Junjie Yu, Xiabing Zhou, Wenliang Chen, Wei Zhang, Min Zhang

In recent years, distantly-supervised relation extraction has achieved considerable success using deep neural networks. Distant Supervision (DS) automatically generates large-scale annotated data by aligning entity pairs from Knowledge Bases (KB) with sentences. However, DS-generated datasets inevitably contain wrong labels, which produce incorrect evaluation scores during testing and may mislead researchers. To solve this problem, we build a new dataset, NYT-H, which uses DS-generated data for training and hires annotators to label the test data. Compared with previous datasets, NYT-H has a much larger test set, enabling more accurate and consistent evaluation. Finally, we present experimental results for several widely used systems on NYT-H. They show that the comparison systems rank differently on the DS-labelled and human-annotated test data, indicating that our human-annotated data is necessary for evaluating distantly-supervised relation extraction.
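
The distant-supervision labeling rule whose noise the paper measures is simple to state: if a knowledge base relates an entity pair and both entities appear in a sentence, the sentence inherits that relation label. A minimal sketch with made-up triples and sentences:

```python
# Sketch of distant-supervision labeling. Triples and sentences are made up;
# real pipelines use entity linking rather than substring matching.
KB = {("Steve Jobs", "Apple"): "founder_of",
      ("Paris", "France"): "capital_of"}

def ds_label(sentence):
    """Label a sentence with every KB relation whose entity pair it contains."""
    labels = [(h, rel, t) for (h, t), rel in KB.items()
              if h in sentence and t in sentence]
    return labels or [("NA",)]

print(ds_label("Steve Jobs started Apple in a garage."))
# [('Steve Jobs', 'founder_of', 'Apple')] -- a correct label
print(ds_label("Steve Jobs was fired from Apple in 1985."))
# [('Steve Jobs', 'founder_of', 'Apple')] -- typical DS noise: the pair is in
# the KB, but this sentence does not express the relation
```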

* This paper has been accepted for publication in COLING 2020 