
Liang Huang


ERNIE-SAT: Speech and Text Joint Pretraining for Cross-Lingual Multi-Speaker Text-to-Speech

Nov 07, 2022
Xiaoran Fan, Chao Pang, Tian Yuan, He Bai, Renjie Zheng, Pengfei Zhu, Shuohuan Wang, Junkun Chen, Zeyu Chen, Liang Huang, Yu Sun, Hua Wu

[4 figures]

Speech representation learning has improved both speech understanding and speech synthesis tasks for a single language. However, its ability in cross-lingual scenarios has not been explored. In this paper, we extend the pretraining method to cross-lingual multi-speaker speech synthesis tasks, including cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing. We propose a speech-text joint pretraining framework in which we randomly mask the spectrogram and the phonemes given a speech example and its transcription. By learning to reconstruct the masked parts of the input in different languages, our model shows great improvements over speaker-embedding-based multi-speaker TTS methods. Moreover, our framework is end-to-end for both training and inference, without any finetuning effort. On cross-lingual multi-speaker voice cloning and cross-lingual multi-speaker speech editing tasks, our experiments show that our model outperforms speaker-embedding-based multi-speaker TTS methods. The code and model are publicly available in PaddleSpeech.
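The joint masking step described above can be sketched in a few lines. This is an illustrative reconstruction, not the released model's code; the mask tokens, ratios, and function names are all hypothetical.

```python
import random

MASK_FRAME = "<mask_frame>"   # placeholder standing in for a masked spectrogram frame
MASK_PHONE = "<mask_phone>"   # placeholder standing in for a masked phoneme

def mask_inputs(spectrogram, phonemes, frame_ratio=0.3, phone_ratio=0.3, seed=0):
    """Randomly mask spectrogram frames and phonemes independently.

    The pretraining objective would then be to reconstruct the masked
    positions from the remaining speech and text context.
    """
    rng = random.Random(seed)
    masked_spec = [MASK_FRAME if rng.random() < frame_ratio else f
                   for f in spectrogram]
    masked_phones = [MASK_PHONE if rng.random() < phone_ratio else p
                     for p in phonemes]
    return masked_spec, masked_phones

spec = [f"frame{i}" for i in range(10)]
phones = ["HH", "AH", "L", "OW"]
m_spec, m_phones = mask_inputs(spec, phones)
```

Because both modalities are masked and reconstructed together, the model must use cross-modal (and, here, cross-lingual) context rather than relying on one input stream alone.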


PaddleSpeech: An Easy-to-Use All-in-One Speech Toolkit

May 20, 2022
Hui Zhang, Tian Yuan, Junkun Chen, Xintong Li, Renjie Zheng, Yuxin Huang, Xiaojie Chen, Enlei Gong, Zeyu Chen, Xiaoguang Hu, Dianhai Yu, Yanjun Ma, Liang Huang

[4 figures]

PaddleSpeech is an open-source all-in-one speech toolkit. It aims to facilitate the development and research of speech processing technologies by providing an easy-to-use command-line interface and a simple code structure. This paper describes the design philosophy and core architecture of PaddleSpeech in support of several essential speech-to-text and text-to-speech tasks. PaddleSpeech achieves competitive or state-of-the-art performance on various speech datasets and implements the most popular methods. It also provides recipes and pretrained models to quickly reproduce the experimental results in this paper. PaddleSpeech is publicly available at https://github.com/PaddlePaddle/PaddleSpeech.


A Fast Attention Network for Joint Intent Detection and Slot Filling on Edge Devices

May 16, 2022
Liang Huang, Senjie Liang, Feiyang Ye, Nan Gao

[4 figures]

Intent detection and slot filling are the two main tasks in natural language understanding and play an essential role in task-oriented dialogue systems. Jointly learning both tasks can improve inference accuracy and is popular in recent work. However, most joint models ignore inference latency and cannot meet the need to deploy dialogue systems at the edge. In this paper, we propose a Fast Attention Network (FAN) for joint intent detection and slot filling that guarantees both accuracy and low latency. Specifically, we introduce a clean and parameter-refined attention module to enhance the information exchange between intent and slot, improving semantic accuracy by more than 2%. FAN can be implemented on different encoders and delivers more accurate models at every speed level. Our experiments on the Jetson Nano platform show that FAN processes fifteen utterances per second with only a small accuracy drop, demonstrating its effectiveness and efficiency on edge devices.
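The abstract does not specify FAN's attention module in detail, but the intent-slot information exchange it describes is built on attention. As a minimal, generic sketch (not the paper's actual module), here is scaled dot-product attention of an intent query over token (slot) representations, in plain Python:

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attend(query, keys, values):
    """Scaled dot-product attention of a single query over a sequence."""
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    weights = softmax(scores)
    # weighted sum of the value vectors
    context = [sum(w * v[i] for w, v in zip(weights, values))
               for i in range(len(values[0]))]
    return context, weights

# A hypothetical intent query attends over per-token slot states.
intent_q = [1.0, 0.0]
token_states = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
context, weights = attend(intent_q, token_states, token_states)
```

The attention weights sum to one, so the intent representation becomes a convex combination of the token states; a symmetric pass in the other direction would let slot predictions condition on the intent.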

* 9 pages, 4 figures 

Data-Driven Adaptive Simultaneous Machine Translation

Apr 27, 2022
Guangxu Xun, Mingbo Ma, Yuchen Bian, Xingyu Cai, Jiaji Huang, Renjie Zheng, Junkun Chen, Jiahong Yuan, Kenneth Church, Liang Huang

[4 figures]

In simultaneous translation (SimulMT), the most widely used strategy is the wait-k policy, thanks to its simplicity and effectiveness in balancing translation quality and latency. However, wait-k suffers from two major limitations: (a) it is a fixed policy that cannot adaptively adjust latency given the context, and (b) its training is much slower than full-sentence translation. To alleviate these issues, we propose a novel and efficient training scheme for adaptive SimulMT that augments the training corpus with adaptive prefix-to-prefix pairs, while the training complexity remains the same as that of full-sentence translation models. Experiments on two language pairs show that our method outperforms all strong baselines in terms of translation quality and latency.
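The fixed wait-k baseline that the paper improves upon has a simple READ/WRITE schedule: read k source tokens, then alternate one WRITE per READ until the source runs out. A sketch of that schedule (the adaptive policy proposed in the paper is precisely what replaces this fixed rule):

```python
def wait_k_actions(k, src_len, tgt_len):
    """Generate the READ/WRITE schedule of a fixed wait-k policy.

    The decoder first READs k source tokens, then alternates
    WRITE/READ; once the source is exhausted it only WRITEs
    the remaining target tokens.
    """
    actions, read, written = [], 0, 0
    while written < tgt_len:
        if read < min(k + written, src_len):
            actions.append("READ")
            read += 1
        else:
            actions.append("WRITE")
            written += 1
    return actions

acts = wait_k_actions(k=2, src_len=4, tgt_len=4)
# -> READ READ WRITE READ WRITE READ WRITE WRITE
```

The lag is constant at k tokens regardless of context, which is exactly limitation (a) above: an adaptive policy would instead decide READ vs. WRITE from the prefix seen so far.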


A$^3$T: Alignment-Aware Acoustic and Text Pretraining for Speech Synthesis and Editing

Mar 18, 2022
He Bai, Renjie Zheng, Junkun Chen, Xintong Li, Mingbo Ma, Liang Huang

[4 figures]

Recently, speech representation learning has improved many speech-related tasks such as speech recognition, speech classification, and speech-to-text translation. However, all of the above tasks are in the direction of speech understanding; for the inverse direction, speech synthesis, the potential of representation learning is yet to be realized, due to the challenging nature of generating high-quality speech. To address this problem, we propose Alignment-Aware Acoustic-Text Pretraining (A$^3$T), a framework that reconstructs masked acoustic signals from text input and acoustic-text alignment during training. In this way, the pretrained model can generate high-quality reconstructed spectrograms, which can be applied directly to speech editing and unseen-speaker TTS. Experiments show that A$^3$T outperforms SOTA models on speech editing and improves multi-speaker speech synthesis without an external speaker verification model.
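"Alignment-aware" masking means the masked acoustic regions respect phoneme boundaries given by the acoustic-text alignment, rather than hitting arbitrary frames. A minimal sketch of that idea, with an invented alignment format (phoneme index to frame span); the real model operates on spectrograms, not strings:

```python
def mask_aligned_spans(frames, alignment, masked_phonemes):
    """Mask all frames aligned to the chosen phonemes.

    `alignment` maps each phoneme index to its [start, end) frame
    span, so masking covers whole phoneme-aligned segments.
    """
    masked = list(frames)
    for p in masked_phonemes:
        start, end = alignment[p]
        for i in range(start, end):
            masked[i] = "<mask>"
    return masked

frames = [f"f{i}" for i in range(8)]
alignment = {0: (0, 3), 1: (3, 5), 2: (5, 8)}  # phoneme -> frame span
out = mask_aligned_spans(frames, alignment, masked_phonemes=[1])
```

Masking whole aligned spans forces the model to resynthesize an entire phoneme from its text and surrounding acoustics, which is the situation speech editing actually presents at inference time.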

* under review, 12 pages, 10 figures 

Computation Rate Maximum for Mobile Terminals in UAV-assisted Wireless Powered MEC Networks with Fairness Constraint

Sep 13, 2021
Xiaoyi Zhou, Liang Huang, Tong Ye, Weiqiang Sun

[4 figures]

This paper investigates an unmanned aerial vehicle (UAV)-assisted wireless powered mobile-edge computing (MEC) system, where the UAV powers the mobile terminals by wireless power transfer (WPT) and provides computation services for them. We aim to maximize the computation rate of the terminals while ensuring fairness among them. Considering the random trajectories of mobile terminals, we propose a soft actor-critic (SAC)-based UAV trajectory planning and resource allocation (SAC-TR) algorithm, which combines off-policy and maximum-entropy reinforcement learning to promote convergence. We design the reward as a heterogeneous function of the computation rate, fairness, and reaching the destination. Simulation results show that SAC-TR can quickly adapt to varying network environments and outperforms representative benchmarks in a variety of situations.
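The abstract does not give the exact reward function, but a common way to score fairness among terminals is Jain's fairness index. The composite reward below is therefore only a plausible sketch: the weights, bonus, and choice of Jain's index are assumptions, not the paper's definition.

```python
def jain_fairness(rates):
    """Jain's fairness index: 1.0 when all rates are equal,
    approaching 1/n when a single terminal dominates."""
    n = len(rates)
    s = sum(rates)
    return s * s / (n * sum(r * r for r in rates)) if s else 0.0

def reward(rates, reached_destination, w_rate=1.0, w_fair=1.0, bonus=10.0):
    """Hypothetical composite reward mixing throughput, fairness,
    and a terminal bonus for reaching the destination."""
    r = w_rate * sum(rates) + w_fair * jain_fairness(rates)
    if reached_destination:
        r += bonus
    return r
```

Mixing the three terms in one scalar is what makes the reward "heterogeneous": the agent must trade total computation rate against how evenly it is distributed, while still being pulled toward its destination.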

* 12 pages 

The Role of Phonetic Units in Speech Emotion Recognition

Aug 02, 2021
Jiahong Yuan, Xingyu Cai, Renjie Zheng, Liang Huang, Kenneth Church

[4 figures]

We propose a method for emotion recognition through emotion-dependent speech recognition using Wav2vec 2.0. Our method achieved a significant improvement over most previously reported results on IEMOCAP, a benchmark emotion dataset. Different types of phonetic units are employed and compared in terms of accuracy and robustness of emotion recognition within and across datasets and languages. Models of phonemes, broad phonetic classes, and syllables all significantly outperform the utterance model, demonstrating that phonetic units are helpful and should be incorporated into speech emotion recognition. The best performance comes from using broad phonetic classes. Further research is needed to investigate the optimal set of broad phonetic classes for the task of emotion recognition. Finally, we found that Wav2vec 2.0 can be fine-tuned to recognize coarser-grained or larger phonetic units than phonemes, such as broad phonetic classes and syllables.
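A "broad phonetic class" model simply collapses fine-grained phonemes into coarse categories before recognition. The mapping below is a standard-looking ARPAbet grouping for illustration only; the paper explicitly leaves the optimal class inventory as an open question.

```python
# Illustrative mapping from ARPAbet phonemes to broad phonetic classes.
BROAD_CLASSES = {
    "vowel":       {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
                    "EY", "IH", "IY", "OW", "OY", "UH", "UW"},
    "stop":        {"B", "D", "G", "K", "P", "T"},
    "fricative":   {"DH", "F", "HH", "S", "SH", "TH", "V", "Z", "ZH"},
    "nasal":       {"M", "N", "NG"},
    "approximant": {"L", "R", "W", "Y"},
}
# Invert to a phoneme -> class lookup table.
PHONE_TO_CLASS = {p: c for c, ps in BROAD_CLASSES.items() for p in ps}

def to_broad_classes(phonemes):
    """Collapse a phoneme sequence into broad phonetic classes."""
    return [PHONE_TO_CLASS.get(p, "other") for p in phonemes]

classes = to_broad_classes(["HH", "AH", "L", "OW"])  # "hello"
```

Coarser units shrink the output vocabulary, which plausibly makes the recognizer more robust across datasets and languages, consistent with the cross-lingual results reported above.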


Decoupling recognition and transcription in Mandarin ASR

Aug 02, 2021
Jiahong Yuan, Xingyu Cai, Dongji Gao, Renjie Zheng, Liang Huang, Kenneth Church

[4 figures]

Much of the recent literature on automatic speech recognition (ASR) takes an end-to-end approach. Unlike English, where the writing system is closely related to sound, Chinese characters (Hanzi) represent meaning, not sound. We propose factoring audio -> Hanzi into two sub-tasks: (1) audio -> Pinyin and (2) Pinyin -> Hanzi, where Pinyin is a system of phonetic transcription for standard Chinese. Factoring the audio -> Hanzi task in this way achieves 3.9% CER (character error rate) on the Aishell-1 corpus, the best result reported on this dataset so far.
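The two-stage factoring and the CER metric can be sketched as follows. The second stage here is a toy lookup table (the paper's stage 2 is a learned model), and the CER definition is the standard edit distance normalized by reference length.

```python
# Hypothetical stage-2 lexicon: Pinyin syllables -> Hanzi.
PINYIN_TO_HANZI = {"ni3": "你", "hao3": "好", "shi4": "世", "jie4": "界"}

def pinyin_to_hanzi(pinyin_seq):
    """Toy Pinyin -> Hanzi stage; in the paper this is a trained model."""
    return "".join(PINYIN_TO_HANZI[p] for p in pinyin_seq)

def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (two-row DP)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution
        prev = cur
    return prev[-1]

def cer(ref, hyp):
    """Character error rate: edit distance over reference length."""
    return edit_distance(ref, hyp) / len(ref)

hyp = pinyin_to_hanzi(["ni3", "hao3", "shi4", "jie4"])
```

Because Pinyin is close to the acoustics, stage 1 behaves like ASR for a phonetic script, while stage 2 becomes a text-only disambiguation problem; CER then scores the final Hanzi output exactly as for an end-to-end system.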

* submitted to ASRU 2021 

Structure-aware Interactive Graph Neural Networks for the Prediction of Protein-Ligand Binding Affinity

Jul 21, 2021
Shuangli Li, Jingbo Zhou, Tong Xu, Liang Huang, Fan Wang, Haoyi Xiong, Weili Huang, Dejing Dou, Hui Xiong

[4 figures]

Drug discovery often relies on the successful prediction of protein-ligand binding affinity. Recent advances have shown great promise in applying graph neural networks (GNNs) to affinity prediction by learning representations of protein-ligand complexes. However, existing solutions usually treat protein-ligand complexes as topological graph data, so the biomolecular structural information is not fully utilized; the essential long-range interactions among atoms are also neglected in GNN models. To this end, we propose a structure-aware interactive graph neural network (SIGN) that consists of two components: polar-inspired graph attention layers (PGAL) and pairwise interactive pooling (PiPool). Specifically, PGAL iteratively performs node-edge aggregation to update the embeddings of nodes and edges while preserving the distance and angle information among atoms. PiPool is then adopted to gather interactive edges, with a subsequent reconstruction loss to reflect the global interactions. An exhaustive experimental study on two benchmarks verifies the superiority of SIGN.

* 11 pages, 8 figures, Accepted by KDD 2021 (Research Track) 

Direct Simultaneous Speech-to-Text Translation Assisted by Synchronized Streaming ASR

Jun 11, 2021
Junkun Chen, Mingbo Ma, Renjie Zheng, Liang Huang

[4 figures]

Simultaneous speech-to-text translation is widely useful in many scenarios. The conventional cascaded approach uses a pipeline of streaming ASR followed by simultaneous MT, but suffers from error propagation and extra latency. To alleviate these issues, recent efforts attempt to directly translate the source speech into target text simultaneously, but this is much harder due to the combination of two separate tasks. We instead propose a new paradigm with the advantages of both the cascaded and end-to-end approaches. The key idea is to use two separate, but synchronized, decoders for streaming ASR and direct speech-to-text translation (ST), respectively; the intermediate results of ASR guide the decoding policy of (but are not fed as input to) ST. During training, we use multitask learning to jointly learn the two tasks with a shared encoder. En-to-De and En-to-Es experiments on the MuST-C dataset demonstrate that our proposed technique achieves substantially better translation quality at similar levels of latency.
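The synchronization idea, where ASR progress gates the ST decoder without feeding it, can be sketched as a simple policy. The gating rule and `lag` parameter below are illustrative assumptions; the paper's actual policy is learned jointly with the models.

```python
def st_schedule(asr_commits, tgt_len, lag=1):
    """Hypothetical synchronization policy: the ST decoder may emit its
    i-th target token only once the streaming ASR decoder has committed
    at least i + lag source tokens (or has consumed all the speech).

    `asr_commits[t]` is the number of source tokens ASR has committed
    after consuming t speech frames.
    """
    schedule = []  # (frames_consumed, target_index) pairs
    t = 0
    for i in range(1, tgt_len + 1):
        while t < len(asr_commits) - 1 and asr_commits[t] < i + lag:
            t += 1  # keep listening until ASR is far enough ahead
        schedule.append((t, i))
    return schedule

# ASR commits source tokens gradually over 6 speech frames.
commits = [0, 1, 2, 2, 3, 4]
sched = st_schedule(commits, tgt_len=3, lag=1)
```

Crucially, only the *count* of committed ASR tokens enters the policy; the ST decoder still translates directly from speech, so ASR transcription errors cannot propagate into the translation itself.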

* accepted by Findings of ACL 2021 