Dong Yu

On the Dimensionality of Sentence Embeddings

Oct 23, 2023

Hongwei Wang, Hongming Zhang, Dong Yu

Abstract: Learning sentence embeddings is a fundamental problem in natural language processing. While existing research primarily focuses on enhancing the quality of sentence embeddings, the exploration of sentence embedding dimensions is limited. Here we present a comprehensive empirical analysis of the dimensionality of sentence embeddings. First, we demonstrate that the optimal dimension of sentence embeddings is usually smaller than the default value. Subsequently, to compress the dimension of sentence embeddings with minimal performance degradation, we identify two components contributing to the overall performance loss: the encoder's performance loss and the pooler's performance loss. Therefore, we propose a two-step training method for sentence representation learning models, wherein the encoder and the pooler are optimized separately to mitigate the overall performance loss in low-dimensional scenarios. Experimental results on seven STS tasks and seven sentence classification tasks demonstrate that our method significantly improves the performance of low-dimensional sentence embeddings.

Bridging the Gap between Synthetic and Authentic Images for Multimodal Machine Translation

Oct 20, 2023

Wenyu Guo, Qingkai Fang, Dong Yu, Yang Feng

Abstract: Multimodal machine translation (MMT) simultaneously takes the source sentence and a relevant image as input for translation. Since there is no paired image available for the input sentence in most cases, recent studies suggest utilizing powerful text-to-image generation models to provide image inputs. Nevertheless, synthetic images generated by these models often follow different distributions compared to authentic images. Consequently, using authentic images for training and synthetic images for inference can introduce a distribution shift, resulting in performance degradation during inference. To tackle this challenge, in this paper, we feed synthetic and authentic images to the MMT model, respectively. Then we minimize the gap between the synthetic and authentic images by drawing close the input image representations of the Transformer Encoder and the output distributions of the Transformer Decoder. Therefore, we mitigate the distribution disparity introduced by the synthetic images during inference, thereby freeing the authentic images from the inference process. Experimental results show that our approach achieves state-of-the-art performance on the Multi30K En-De and En-Fr datasets, while remaining independent of authentic images during inference.

* Accepted to EMNLP 2023 main conference
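
The abstract above describes two terms that pull the synthetic-image and authentic-image passes together: one on the encoder's image representations and one on the decoder's output distributions. The following is only a minimal sketch of what such terms could look like, assuming cosine distance for the representations and KL divergence for the output distributions. The abstract does not name the exact losses, and every function name and weight here (`consistency_loss`, `alpha`, `beta`) is hypothetical.

```python
import math


def softmax(logits):
    """Convert raw scores to a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def cosine_distance(u, v):
    """1 - cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def consistency_loss(rep_syn, rep_auth, logits_syn, logits_auth,
                     alpha=1.0, beta=1.0):
    """Assumed form: weighted sum of a representation gap and an output gap.

    rep_*    : image representations from the synthetic / authentic pass
    logits_* : decoder scores over the vocabulary for one target position
    alpha, beta : hypothetical weights, not taken from the paper
    """
    rep_term = cosine_distance(rep_syn, rep_auth)
    out_term = kl_divergence(softmax(logits_auth), softmax(logits_syn))
    return alpha * rep_term + beta * out_term
```

In a real training setup a term like this would be added to the usual translation loss, so that the model behaves similarly whether it sees a synthetic or an authentic image.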

A High Fidelity and Low Complexity Neural Audio Coding

Oct 17, 2023

Wenzhe Liu, Wei Xiao, Meng Wang, Shan Yang, Yupeng Shi, Yuyong Kang, Dan Su, Shidong Shang, Dong Yu

Abstract: Audio coding is an essential module in real-time communication systems. Neural audio codecs can compress audio samples with a low bitrate due to the strong modeling and generative capabilities of deep neural networks. To address the poor high-frequency expression and the high computational cost and storage consumption, we propose an integrated framework that utilizes a neural network to model wide-band components and adopts traditional signal processing to compress high-band components according to psychological hearing knowledge. Inspired by auditory perception theory, a perception-based loss function is designed to improve harmonic modeling. Besides, generative adversarial network (GAN) compression is proposed for the first time for neural audio codecs. Our method is superior to prior advanced neural codecs across subjective and objective metrics and allows real-time inference on desktop and mobile.

uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models

Oct 02, 2023

Muqiao Yang, Chunlei Zhang, Yong Xu, Zhongweiyang Xu, Heming Wang, Bhiksha Raj, Dong Yu

Abstract: Speech enhancement aims to improve speech signals in terms of quality and intelligibility, and speech editing refers to the process of editing the speech according to specific user needs. In this paper, we propose a Unified Speech Enhancement and Editing (uSee) model with conditional diffusion models to handle various tasks at the same time in a generative manner. Specifically, by providing multiple types of conditions, including self-supervised learning embeddings and proper text prompts, to the score-based diffusion model, we can enable controllable generation of the unified speech enhancement and editing model to perform corresponding actions on the source speech. Our experiments show that our proposed uSee model can achieve superior performance in both speech denoising and dereverberation compared to other related generative speech enhancement models, and can perform speech editing given desired environmental sound text descriptions, signal-to-noise ratios (SNR), and room impulse responses (RIR). Demos of the generated speech are available at https://muqiaoy.github.io/usee.

From Language Modeling to Instruction Following: Understanding the Behavior Shift in LLMs after Instruction Tuning

Sep 30, 2023

Xuansheng Wu, Wenlin Yao, Jianshu Chen, Xiaoman Pan, Xiaoyang Wang, Ninghao Liu, Dong Yu

Abstract: Large Language Models (LLMs) have achieved remarkable success, demonstrating powerful instruction-following capabilities across diverse tasks. Instruction fine-tuning is critical in enabling LLMs to align with user intentions and effectively follow instructions. In this work, we investigate how instruction fine-tuning modifies pre-trained models, focusing on two perspectives: instruction recognition and knowledge evolution. To study the behavior shift of LLMs, we employ a suite of local and global explanation methods, including a gradient-based approach for input-output attribution and techniques for interpreting patterns and concepts in self-attention and feed-forward layers. Our findings reveal three significant impacts of instruction fine-tuning: 1) It empowers LLMs to better recognize the instruction parts from user prompts, thereby facilitating high-quality response generation and addressing the "lost-in-the-middle" issue observed in pre-trained models; 2) It aligns the knowledge stored in feed-forward layers with user-oriented tasks, exhibiting minimal shifts across linguistic levels; 3) It facilitates the learning of word-word relations with instruction verbs through the self-attention mechanism, particularly in the lower and middle layers, indicating enhanced recognition of instruction words. These insights contribute to a deeper understanding of the behavior shifts in LLMs after instruction fine-tuning and lay the groundwork for future research aimed at interpreting and optimizing LLMs for various applications. We will release our code and data soon.

* 28 pages, 13 figures, 12 tables

The Trickle-down Impact of Reward consistency on RLHF

Sep 28, 2023

Lingfeng Shen, Sihao Chen, Linfeng Song, Lifeng Jin, Baolin Peng, Haitao Mi, Daniel Khashabi, Dong Yu

Abstract: Standard practice within Reinforcement Learning from Human Feedback (RLHF) involves optimizing against a Reward Model (RM), which itself is trained to reflect human preferences for desirable generations. A notable subject that is understudied is the (in-)consistency of RMs, that is, whether they can recognize the semantic changes to different prompts and appropriately adapt their reward assignments, and their impact on the downstream RLHF model. In this paper, we visit a series of research questions relevant to RM inconsistency: (1) How can we measure the consistency of reward models? (2) How consistent are the existing RMs and how can we improve them? (3) In what ways does reward inconsistency influence the chatbots resulting from the RLHF model training? We propose Contrast Instructions, a benchmarking strategy for the consistency of RMs. Each example in Contrast Instructions features a pair of lexically similar instructions with different ground truth responses. A consistent RM is expected to rank the corresponding instruction and response higher than other combinations. We observe that current RMs trained with the standard ranking objective fail miserably on Contrast Instructions compared to average humans. To show that RM consistency can be improved efficiently without using extra training budget, we propose two techniques, ConvexDA and RewardFusion, which enhance reward consistency through extrapolation during the RM training and inference stage, respectively. We show that RLHF models trained with a more consistent RM yield more useful responses, suggesting that reward inconsistency exhibits a trickle-down effect on the downstream RLHF process.
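
The consistency check described in the abstract (a matching instruction-response pair should outscore the mismatched combinations) can be sketched roughly as follows. This is an illustrative reading of the abstract, not the paper's evaluation code: the function names, the scoring rule, and the `toy_reward` stand-in (plain word overlap instead of a trained reward model) are all assumptions.

```python
def is_consistent(reward, inst_a, resp_a, inst_b, resp_b):
    """True if each instruction scores its own response above the other one.

    `reward` is any callable (instruction, response) -> float.
    """
    return (reward(inst_a, resp_a) > reward(inst_a, resp_b)
            and reward(inst_b, resp_b) > reward(inst_b, resp_a))


def consistency_rate(reward, examples):
    """Fraction of contrast examples on which the reward model is consistent."""
    hits = sum(is_consistent(reward, *example) for example in examples)
    return hits / len(examples)


def toy_reward(instruction, response):
    """Hypothetical stand-in for a reward model: counts shared words."""
    inst_words = set(instruction.lower().split())
    resp_words = set(response.lower().split())
    return float(len(inst_words & resp_words))


# One made-up contrast example: two lexically similar instructions
# with different correct responses.
examples = [
    ("Name the capital of France", "The capital of France is Paris",
     "Name the capital of Spain", "The capital of Spain is Madrid"),
]

print(consistency_rate(toy_reward, examples))
```

A reward model that assigns the same score to every pair would get a rate of 0 under this strict-inequality rule, which is one simple way to expose a model that ignores small but meaningful changes in the instruction.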

Neural Network Augmented Kalman Filter for Robust Acoustic Howling Suppression

Sep 27, 2023

Yixuan Zhang, Hao Zhang, Meng Yu, Dong Yu

Abstract: Acoustic howling suppression (AHS) is a critical challenge in audio communication systems. In this paper, we propose a novel approach that leverages the power of neural networks (NN) to enhance the performance of traditional Kalman filter algorithms for AHS. Specifically, our method integrates NN modules into the Kalman filter to refine the reference signal, a key factor in effective adaptive filtering, and to estimate covariance metrics for the filter, which are crucial for adaptability in dynamic conditions. As a result, the proposed method achieves improved AHS performance compared to both standalone NN and Kalman filter methods. Experimental evaluations validate the effectiveness of our approach.

* Paper in submission
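
For readers unfamiliar with the Kalman filter mentioned above, here is a minimal scalar predict-and-update step. It is a generic textbook sketch, not the paper's AHS system: the random-walk state model, the scalar state, and the names `kalman_step`, `q`, and `r` are assumptions for illustration. In the spirit of the abstract, the noise covariances `q` and `r` are passed in as arguments, so they could be supplied by a separate estimator such as a neural network instead of being fixed by hand.

```python
def kalman_step(x, p, z, h, q, r):
    """One predict + update step of a scalar Kalman filter.

    x, p : previous state estimate and its variance
    z    : new observation, modeled as z = h * state + noise
    h    : observation coefficient
    q, r : process and observation noise variances (assumed to come
           from an external estimator; plain numbers in this sketch)
    Returns the updated (estimate, variance).
    """
    # Predict: random-walk state model, so the estimate carries over
    # and the uncertainty grows by the process noise.
    x_pred = x
    p_pred = p + q

    # Update: blend prediction and observation using the Kalman gain.
    gain = p_pred * h / (h * p_pred * h + r)
    x_new = x_pred + gain * (z - h * x_pred)
    p_new = (1.0 - gain * h) * p_pred
    return x_new, p_new


# Usage: track a constant value of 2.0 from repeated observations.
estimate, variance = 0.0, 1.0
for _ in range(200):
    estimate, variance = kalman_step(estimate, variance, z=2.0, h=1.0,
                                     q=0.01, r=1.0)
print(estimate)
```

A larger `q` makes the filter trust new observations more and adapt faster, while a larger `r` makes it smoother but slower, which is why estimating these quantities well matters in changing conditions.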

Advancing Acoustic Howling Suppression through Recursive Training of Neural Networks

Sep 27, 2023

Hao Zhang, Yixuan Zhang, Meng Yu, Dong Yu

Abstract: In this paper, we introduce a novel training framework designed to comprehensively address the acoustic howling issue by examining its fundamental formation process. This framework integrates a neural network (NN) module into the closed-loop system during training with signals generated recursively on the fly to closely mimic the streaming process of acoustic howling suppression (AHS). The proposed recursive training strategy bridges the gap between training and real-world inference scenarios, marking a departure from previous NN-based methods that typically approach AHS as either noise suppression or acoustic echo cancellation. Within this framework, we explore two methodologies: one exclusively relying on NN and the other combining NN with the traditional Kalman filter. Additionally, we propose strategies, including howling detection and initialization using pre-trained offline models, to bolster trainability and expedite the training process. Experimental results validate that this framework offers a substantial improvement over previous methodologies for acoustic howling suppression.

* Paper in submission

M3-AUDIODEC: Multi-channel multi-speaker multi-spatial audio codec

Sep 23, 2023

Anton Ratnarajah, Shi-Xiong Zhang, Dong Yu

Abstract: We introduce M3-AUDIODEC, an innovative neural spatial audio codec designed for efficient compression of multi-channel (binaural) speech in both single and multi-speaker scenarios, while retaining the spatial location information of each speaker. This model boasts versatility, allowing configuration and training tailored to a predetermined set of multi-channel, multi-speaker, and multi-spatial overlapping speech conditions. Key contributions are as follows: 1) We extend previous neural codecs from single-channel to multi-channel audio. 2) Our proposed model can compress and decode overlapping speech. 3) We present a groundbreaking architecture that compresses speech content and spatial cues separately, ensuring the preservation of each speaker's spatial context after decoding. 4) M3-AUDIODEC reduces the bandwidth for compressing two-channel speech by 48% when compared to individual binaural channel compression. Impressively, operating at 12.6 kbps, it outperforms Opus at 24 kbps and AUDIODEC at 24 kbps by 37% and 52%, respectively. In our assessment, we employed speech enhancement and room acoustic metrics to ascertain the accuracy of clean speech and spatial cue estimates from M3-AUDIODEC. Audio demonstrations and source code are available online at https://github.com/anton-jeran/MULTI-AUDIODEC .

* More results and source code are available at https://anton-jeran.github.io/MAD/

Proposition from the Perspective of Chinese Language: A Chinese Proposition Classification Evaluation Benchmark

Sep 18, 2023

Conghui Niu, Mengyang Hu, Lin Bo, Xiaoli He, Dong Yu, Pengyuan Liu

Abstract: Existing propositions often rely on logical constants for classification. Compared with Western languages that lean towards hypotaxis, such as English, Chinese often relies on semantic or logical understanding rather than logical connectives in daily expressions, exhibiting the characteristics of parataxis. However, existing research has rarely paid attention to this issue, even though accurately classifying these propositions is crucial for natural language understanding and reasoning. In this paper, we put forward the concepts of explicit and implicit propositions and propose a comprehensive multi-level proposition classification system based on linguistics and logic. Correspondingly, we create a large-scale Chinese proposition dataset, PEACE, from multiple domains, covering all categories related to propositions. To evaluate the Chinese proposition classification ability of existing models and explore their limitations, we conduct evaluations on PEACE using several different methods, including a rule-based method, SVM, BERT, RoBERTa, and ChatGPT. Results show the importance of properly modeling the semantic features of propositions. BERT has relatively good proposition classification capability, but lacks cross-domain transferability. ChatGPT performs poorly, but its classification ability can be improved by providing more proposition information. Many issues are still far from being resolved and require further study.
