The use of large language models in medical dialogue generation has garnered significant attention, with a focus on improving response quality and fluency. While previous studies have made progress in optimizing model performance for single-round medical Q&A tasks, there is a need to enhance the model's capability for multi-round conversations to avoid logical inconsistencies. To address this, we propose an approach called preference learning from process feedback~(PLPF), which integrates the doctor's diagnostic logic into LLMs. PLPF involves rule modeling, preference data generation, and preference alignment to train the model to adhere to the diagnostic process. Experimental results using Standardized Patient Testing show that PLPF enhances the diagnostic accuracy of the baseline model in medical conversations by 17.6%, outperforming traditional reinforcement learning from human feedback. Additionally, PLPF demonstrates effectiveness in both multi-round and single-round dialogue tasks, showcasing its potential for improving medical dialogue generation.
Modeling the emergence of swarm intelligence is a grand challenge, and many principles and models have been proposed. However, existing models do not capture the nature of swarm intelligence, and they are not generic enough to describe the various types of emergence phenomena. In this work, we propose a contradiction-centric model for the emergence of swarm intelligence, in which each individual's contradictions govern its appearance, while associated individuals interact to update their contradictions. The model hypothesizes that 1) the emergence of swarm intelligence is rooted in the development of contradictions within individuals and in the interactions among associated individuals, and 2) swarm intelligence is essentially a combined reflection of the configurations of contradictions inside individuals and the distributions of contradictions among individuals. To verify the feasibility of the model, we simulate four types of swarm intelligence. The simulations show that our model is generic enough to describe the emergence of a variety of swarm intelligence phenomena, and simple enough to demonstrate that emergence without complicated computations.