Yi Zhong

SynthTab: Leveraging Synthesized Data for Guitar Tablature Transcription

Sep 22, 2023
Yongyi Zang, Yi Zhong, Frank Cwitkowitz, Zhiyao Duan

Guitar tablature is a form of music notation widely used among guitarists. It captures not only the musical content of a piece, but also its implementation and ornamentation on the instrument. Guitar Tablature Transcription (GTT) is an important task with broad applications in music education and entertainment. Existing datasets are limited in size and scope, causing state-of-the-art GTT models trained on such datasets to suffer from overfitting and to fail to generalize across datasets. To address this issue, we developed a methodology for synthesizing SynthTab, a large-scale guitar tablature transcription dataset, using multiple commercial acoustic and electric guitar plugins. This dataset is built on tablatures from DadaGP, which offers a vast collection and the degree of specificity we wish to transcribe. The proposed synthesis pipeline produces audio that faithfully adheres to the original fingerings, styles, and techniques specified in the tablature, with diverse timbres. Experiments show that pre-training a state-of-the-art GTT model on SynthTab improves transcription accuracy in same-dataset tests. More importantly, it significantly mitigates the overfitting problems of GTT models in cross-dataset evaluation.
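
To make the tablature-to-audio idea concrete, below is a minimal sketch (not the authors' pipeline) of the first step of such a synthesis chain: converting tablature events into MIDI-style note events that a guitar plugin can then render. Standard tuning and the (string, fret, onset, duration) event fields are assumptions for illustration.

```python
# Illustrative sketch only: map tablature events to note events before plugin rendering.
# Assumes standard tuning; event fields (string, fret, onset, duration) are hypothetical.

# Open-string MIDI pitches, string 6 = low E through string 1 = high E.
OPEN_STRING_PITCH = {6: 40, 5: 45, 4: 50, 3: 55, 2: 59, 1: 64}

def tab_to_notes(tab_events):
    """Convert (string, fret, onset_sec, duration_sec) tuples into note events."""
    notes = []
    for string, fret, onset, duration in tab_events:
        pitch = OPEN_STRING_PITCH[string] + fret  # fretted pitch as a MIDI number
        notes.append({"pitch": pitch, "onset": onset, "offset": onset + duration,
                      "string": string, "fret": fret})  # keep fingering for GTT labels
    return notes

# Example: an open-position C major triad fragment (C3, E3, G3).
print(tab_to_notes([(5, 3, 0.0, 0.5), (4, 2, 0.5, 0.5), (3, 0, 1.0, 0.5)]))
```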

* Submitted to ICASSP 2024

Music Generation based on Generative Adversarial Networks with Transformer

Sep 16, 2023
Ziyi Jiang, Yi Zhong, Ruoxue Wu, Zhenghan Chen, Xiaoxuan Liang

Autoregressive models based on Transformers have become the prevailing approach for generating music compositions that exhibit comprehensive musical structure. These models are typically trained by minimizing the negative log-likelihood (NLL) of the observed sequence in an autoregressive manner. However, when generating long sequences, the quality of samples from these models tends to significantly deteriorate due to exposure bias. To address this issue, we leverage classifiers trained to differentiate between real and sampled sequences to identify these failures. This observation motivates our exploration of adversarial losses as a complement to the NLL objective. We employ a pre-trained Span-BERT model as the discriminator in the Generative Adversarial Network (GAN) framework, which enhances training stability in our experiments. To optimize discrete sequences within the GAN framework, we utilize the Gumbel-Softmax trick to obtain a differentiable approximation of the sampling process. Additionally, we partition the sequences into smaller chunks to ensure that memory constraints are met. Through human evaluations and the introduction of a novel discriminative metric, we demonstrate that our approach outperforms a baseline model trained solely on likelihood maximization.
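
The Gumbel-Softmax trick mentioned above is a standard relaxation; a minimal PyTorch sketch is shown below. The generator and discriminator are only stand-ins here, not the paper's actual modules.

```python
# Minimal sketch of the Gumbel-Softmax relaxation: a differentiable approximation of
# sampling discrete tokens, so discriminator gradients can reach the generator.
import torch
import torch.nn.functional as F

def gumbel_softmax_sample(logits, tau=1.0):
    """Soft one-hot sample from `logits`; lower tau -> closer to a hard sample."""
    gumbel = -torch.log(-torch.log(torch.rand_like(logits) + 1e-10) + 1e-10)
    return F.softmax((logits + gumbel) / tau, dim=-1)

# Toy usage: 4 sequence positions over a 128-symbol vocabulary.
logits = torch.randn(4, 128, requires_grad=True)
soft_tokens = gumbel_softmax_sample(logits, tau=0.8)  # (4, 128), rows sum to 1
fake_score = soft_tokens.sum()                        # stand-in for discriminator(soft_tokens)
fake_score.backward()                                 # gradients flow back to the generator logits
```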

* Submitted to ICASSP 2024

Incorporating Neuro-Inspired Adaptability for Continual Learning in Artificial Intelligence

Aug 29, 2023
Liyuan Wang, Xingxing Zhang, Qian Li, Mingtian Zhang, Hang Su, Jun Zhu, Yi Zhong

Continual learning aims to empower artificial intelligence (AI) with strong adaptability to the real world. For this purpose, a desirable solution should properly balance memory stability with learning plasticity, and acquire sufficient compatibility to capture the observed distributions. Existing advances mainly focus on preserving memory stability to overcome catastrophic forgetting, but remain limited in flexibly accommodating incremental changes as biological intelligence (BI) does. By modeling a robust Drosophila learning system that actively regulates forgetting with multiple learning modules, here we propose a generic approach that appropriately attenuates old memories in parameter distributions to improve learning plasticity, and accordingly coordinates a multi-learner architecture to ensure solution compatibility. Through extensive theoretical and empirical validation, our approach not only clearly enhances the performance of continual learning, especially over synaptic regularization methods in task-incremental settings, but also potentially advances the understanding of neurological adaptive mechanisms, serving as a novel paradigm to progress AI and BI together.
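
As a purely illustrative sketch of "attenuating old memories in parameter distributions", one possible reading is to decay the importance assigned to previous-task parameters in a quadratic (EWC-style) penalty; the decay factor and penalty form below are assumptions, not the paper's formulation.

```python
# Illustrative only: decay old-task parameter importances so the model can drift
# ("actively forget") while still being softly anchored to previous solutions.
import torch

def attenuated_penalty(model, old_params, importance, decay=0.9, strength=1.0):
    """Quadratic penalty toward old parameters, with importances decayed each task."""
    loss = 0.0
    for name, p in model.named_parameters():
        importance[name] = importance[name] * decay  # attenuate the old memory
        loss = loss + (importance[name] * (p - old_params[name]) ** 2).sum()
    return strength * loss

net = torch.nn.Linear(4, 2)
old = {n: p.detach().clone() for n, p in net.named_parameters()}
imp = {n: torch.ones_like(p) for n, p in net.named_parameters()}
reg = attenuated_penalty(net, old, imp)  # zero here, since net still equals old
```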


Ariadne's Thread: Using Text Prompts to Improve Segmentation of Infected Areas from Chest X-ray Images

Jul 08, 2023
Yi Zhong, Mengqiu Xu, Kongming Liang, Kaixin Chen, Ming Wu

Segmentation of the infected areas of the lung is essential for quantifying the severity of lung diseases such as pulmonary infections. Existing medical image segmentation methods are almost exclusively uni-modal, relying on images alone. However, these image-only methods tend to produce inaccurate results unless trained with large amounts of annotated data. To overcome this challenge, we propose a language-driven segmentation method that uses text prompts to improve the segmentation results. Experiments on the QaTa-COV19 dataset indicate that our method improves the Dice score by at least 6.09% compared to uni-modal methods. In addition, our extended study reveals the flexibility of multi-modal methods in terms of the information granularity of text and demonstrates that multi-modal methods have a significant advantage over image-only methods in terms of the size of training data required.
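
Below is a rough sketch of how a text prompt might be fused with image features in a language-driven segmentation head; the module names, gating-based fusion, and dimensions are illustrative assumptions, not the paper's architecture.

```python
# Illustrative sketch: a text-prompt embedding gates image-feature channels before
# a 1x1 segmentation classifier. Shapes and fusion scheme are assumptions.
import torch
import torch.nn as nn

class TextGuidedSegHead(nn.Module):
    def __init__(self, img_channels=256, text_dim=512, num_classes=1):
        super().__init__()
        self.text_proj = nn.Linear(text_dim, img_channels)
        self.classifier = nn.Conv2d(img_channels, num_classes, kernel_size=1)

    def forward(self, img_feat, text_emb):
        # img_feat: (B, C, H, W) from an image encoder; text_emb: (B, text_dim) from a prompt encoder.
        gate = torch.sigmoid(self.text_proj(text_emb))[:, :, None, None]
        return self.classifier(img_feat * gate)  # text decides which channels contribute

head = TextGuidedSegHead()
mask_logits = head(torch.randn(2, 256, 56, 56), torch.randn(2, 512))  # (2, 1, 56, 56)
```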

* Provisional Acceptance by MICCAI 2023 

EE-TTS: Emphatic Expressive TTS with Linguistic Information

May 20, 2023
Yi Zhong, Chen Zhang, Xule Liu, Chenxi Sun, Weishan Deng, Haifeng Hu, Zhongqian Sun

While current TTS systems perform well in synthesizing high-quality speech, producing highly expressive speech remains a challenge. Emphasis, as a critical factor in determining the expressiveness of speech, has attracted increasing attention. Previous works usually enhance emphasis by adding intermediate features, but they cannot guarantee the overall expressiveness of the speech. To resolve this matter, we propose Emphatic Expressive TTS (EE-TTS), which leverages multi-level linguistic information from syntax and semantics. EE-TTS contains an emphasis predictor that can identify appropriate emphasis positions from text and a conditioned acoustic model to synthesize expressive speech with emphasis and linguistic information. Experimental results indicate that EE-TTS outperforms the baseline with MOS improvements of 0.49 and 0.67 in expressiveness and naturalness. EE-TTS also shows strong generalization across different datasets according to AB test results.
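
One plausible reading of the emphasis predictor is token-level classification over the input text; the sketch below uses a generic bidirectional encoder whose backbone and sizes are assumptions, not EE-TTS's actual model.

```python
# Illustrative sketch: predict, per token, whether it should be emphasized; the
# predicted emphasis positions would then condition the acoustic model downstream.
import torch
import torch.nn as nn

class EmphasisPredictor(nn.Module):
    def __init__(self, vocab_size=10000, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden)
        self.encoder = nn.GRU(hidden, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2)  # per-token logits: emphasized vs. not

    def forward(self, token_ids):
        h, _ = self.encoder(self.embed(token_ids))
        return self.head(h)  # (B, T, 2)

logits = EmphasisPredictor()(torch.randint(0, 10000, (2, 16)))  # (2, 16, 2)
```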

* Accepted by INTERSPEECH 2023

Hebbian and Gradient-based Plasticity Enables Robust Memory and Rapid Learning in RNNs

Feb 07, 2023
Yu Duan, Zhongfan Jia, Qian Li, Yi Zhong, Kaisheng Ma

Rapidly learning from ongoing experiences and remembering past events with a flexible memory system are two core capacities of biological intelligence. While the underlying neural mechanisms are not fully understood, various evidence supports that synaptic plasticity plays a critical role in memory formation and fast learning. Inspired by these results, we equip Recurrent Neural Networks (RNNs) with plasticity rules to enable them to adapt their parameters according to ongoing experiences. In addition to the traditional local Hebbian plasticity, we propose a global, gradient-based plasticity rule, which allows the model to evolve towards its self-determined target. Our models show promising results on sequential and associative memory tasks, illustrating their ability to robustly form and retain memories. At the same time, these models can cope with many challenging few-shot learning problems. Comparing different plasticity rules under the same framework shows that Hebbian plasticity is well-suited for several memory and associative learning tasks; however, it is outperformed by gradient-based plasticity on few-shot regression tasks, which require the model to infer the underlying mapping. Code is available at https://github.com/yuvenduan/PlasticRNNs.
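
A minimal sketch of the local Hebbian rule on a plastic recurrent weight matrix is shown below; the decay and learning-rate values are illustrative, and the authors' full implementation is in the linked repository.

```python
# Illustrative sketch: Hebbian fast-weight update from the outer product of
# pre- and post-synaptic activity, used alongside the slow recurrent weights.
import torch

def hebbian_step(fast_W, pre, post, eta=0.1, decay=0.95):
    """fast_W: (H, H) plastic weights; pre, post: (B, H) activities."""
    outer = torch.einsum('bi,bj->ij', post, pre) / pre.shape[0]  # batch-averaged Hebb term
    return decay * fast_W + eta * outer

fast_W = torch.zeros(64, 64)
pre, post = torch.rand(8, 64), torch.rand(8, 64)
fast_W = hebbian_step(fast_W, pre, post)  # evolves with ongoing experience
```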

* Published as a conference paper at ICLR 2023 

Interference-Limited Ultra-Reliable and Low-Latency Communications: Graph Neural Networks or Stochastic Geometry?

Jul 19, 2022
Yuhong Liu, Changyang She, Yi Zhong, Wibowo Hardjawana, Fu-Chun Zheng, Branka Vucetic

In this paper, we aim to improve the Quality-of-Service (QoS) of Ultra-Reliable and Low-Latency Communications (URLLC) in interference-limited wireless networks. To obtain time diversity within the channel coherence time, we first put forward a random repetition scheme that randomizes the interference power. Then, we optimize the number of reserved slots and the number of repetitions for each packet to minimize the QoS violation probability, defined as the percentage of users that cannot achieve URLLC. We build a cascaded Random Edge Graph Neural Network (REGNN) to represent the repetition scheme and develop a model-free unsupervised learning method to train it. We analyze the QoS violation probability using stochastic geometry in a symmetric scenario and apply a model-based Exhaustive Search (ES) method to find the optimal solution. Simulation results show that in the symmetric scenario, the QoS violation probabilities achieved by the model-free learning method and the model-based ES method are nearly the same. In more general scenarios, the cascaded REGNN generalizes very well across wireless networks with different scales, network topologies, cell densities, and frequency reuse factors. It outperforms the model-based ES method in the presence of model mismatch.
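
The QoS violation probability defined above (the fraction of users that cannot achieve URLLC) reduces to a one-line estimate; the toy per-user error rates below are placeholders, not the paper's stochastic-geometry model.

```python
# Illustrative sketch of the QoS violation probability from the abstract's definition.
import numpy as np

def qos_violation_prob(per_user_error_prob, target=1e-5):
    """Fraction of users whose packet error probability exceeds the URLLC target."""
    return float(np.mean(per_user_error_prob > target))

rng = np.random.default_rng(0)
errors = 10 ** rng.uniform(-7, -3, size=1000)  # toy spread of per-user error probabilities
print(qos_violation_prob(errors))              # share of users violating URLLC
```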

* Submitted to IEEE journal for possible publication 

CoSCL: Cooperation of Small Continual Learners is Stronger than a Big One

Jul 13, 2022
Liyuan Wang, Xingxing Zhang, Qian Li, Jun Zhu, Yi Zhong

Continual learning requires incremental compatibility with a sequence of tasks. However, the design of model architecture remains an open question: in general, learning all tasks with a shared set of parameters suffers from severe interference between tasks, while learning each task with a dedicated parameter subspace is limited by scalability. In this work, we theoretically analyze the generalization errors for learning plasticity and memory stability in continual learning, which can be uniformly upper-bounded by (1) the discrepancy between task distributions, (2) the flatness of the loss landscape, and (3) the cover of the parameter space. Then, inspired by the robust biological learning system that processes sequential experiences with multiple parallel compartments, we propose Cooperation of Small Continual Learners (CoSCL) as a general strategy for continual learning. Specifically, we present an architecture with a fixed number of narrower sub-networks that learn all incremental tasks in parallel, which naturally reduces the two errors by improving the three components of the upper bound. To strengthen this advantage, we encourage these sub-networks to cooperate by penalizing differences between the predictions made from their feature representations. With a fixed parameter budget, CoSCL can improve a variety of representative continual learning approaches by a large margin (e.g., up to 10.64% on CIFAR-100-SC, 9.33% on CIFAR-100-RS, 11.45% on CUB-200-2011, and 6.72% on Tiny-ImageNet) and achieve new state-of-the-art performance.
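
A small sketch of the cooperation idea, penalizing disagreement among the sub-learners' predictions, is given below; averaging the sub-learners and using a KL-to-mean penalty are illustrative assumptions rather than CoSCL's exact objective.

```python
# Illustrative sketch: each narrow sub-network gets a task loss, plus a penalty on
# how far its predictions deviate from the ensemble mean (encouraging cooperation).
import torch
import torch.nn.functional as F

def coscl_style_loss(logits_list, targets, coop_weight=0.1):
    """logits_list: per-sub-learner (B, C) logits; targets: (B,) class labels."""
    ce = sum(F.cross_entropy(lg, targets) for lg in logits_list) / len(logits_list)
    mean_prob = torch.stack([F.softmax(lg, dim=-1) for lg in logits_list]).mean(0)
    coop = sum(F.kl_div(F.log_softmax(lg, dim=-1), mean_prob, reduction='batchmean')
               for lg in logits_list) / len(logits_list)
    return ce + coop_weight * coop

logits = [torch.randn(8, 10) for _ in range(5)]  # five sub-learners, 10 classes
loss = coscl_style_loss(logits, torch.randint(0, 10, (8,)))
```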

* European Conference on Computer Vision (ECCV) 2022  