Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Chuan Wang

KwaiChat: A Large-Scale Video-Driven Multilingual Mixed-Type Dialogue Corpus

Mar 10, 2025

Xiaoming Shi, Zeming Liu, Yiming Lei, Chenkai Zhang, Haitao Leng, Chuan Wang, Qingjie Liu, Wanxiang Che, Shaoguo Liu, Size Li(+1 more)

Abstract:Video-based dialogue systems, such as education assistants, have compelling application value, thereby garnering growing interest. However, the current video-based dialogue systems are limited by their reliance on a single dialogue type, which hinders their versatility in practical applications across a range of scenarios, including question-answering, emotional dialog, etc. In this paper, we identify this challenge as how to generate video-driven multilingual mixed-type dialogues. To mitigate this challenge, we propose a novel task and create a human-to-human video-driven multilingual mixed-type dialogue corpus, termed KwaiChat, containing a total of 93,209 videos and 246,080 dialogues, across 4 dialogue types, 30 domains, 4 languages, and 13 topics. Additionally, we establish baseline models on KwaiChat. An extensive analysis of 7 distinct LLMs on KwaiChat reveals that GPT-4o achieves the best performance but still cannot perform well in this situation even with the help of in-context learning and fine-tuning, which indicates that the task is not trivial and needs further research.

Via

Access Paper or Ask Questions

STAMPsy: Towards SpatioTemporal-Aware Mixed-Type Dialogues for Psychological Counseling

Dec 21, 2024

Jieyi Wang, Yue Huang, Zeming Liu, Dexuan Xu, Chuan Wang, Xiaoming Shi, Ruiyuan Guan, Hongxing Wang, Weihua Yue, Yu Huang

Figure 1 for STAMPsy: Towards SpatioTemporal-Aware Mixed-Type Dialogues for Psychological Counseling

Figure 2 for STAMPsy: Towards SpatioTemporal-Aware Mixed-Type Dialogues for Psychological Counseling

Figure 3 for STAMPsy: Towards SpatioTemporal-Aware Mixed-Type Dialogues for Psychological Counseling

Figure 4 for STAMPsy: Towards SpatioTemporal-Aware Mixed-Type Dialogues for Psychological Counseling

Abstract:Online psychological counseling dialogue systems are trending, offering a convenient and accessible alternative to traditional in-person therapy. However, existing psychological counseling dialogue systems mainly focus on basic empathetic dialogue or QA with minimal professional knowledge and without goal guidance. In many real-world counseling scenarios, clients often seek multi-type help, such as diagnosis, consultation, therapy, console, and common questions, but existing dialogue systems struggle to combine different dialogue types naturally. In this paper, we identify this challenge as how to construct mixed-type dialogue systems for psychological counseling that enable clients to clarify their goals before proceeding with counseling. To mitigate the challenge, we collect a mixed-type counseling dialogues corpus termed STAMPsy, covering five dialogue types, task-oriented dialogue for diagnosis, knowledge-grounded dialogue, conversational recommendation, empathetic dialogue, and question answering, over 5,000 conversations. Moreover, spatiotemporal-aware knowledge enables systems to have world awareness and has been proven to affect one's mental health. Therefore, we link dialogues in STAMPsy to spatiotemporal state and propose a spatiotemporal-aware mixed-type psychological counseling dataset. Additionally, we build baselines on STAMPsy and develop an iterative self-feedback psychological dialogue generation framework, named Self-STAMPsy. Results indicate that clarifying dialogue goals in advance and utilizing spatiotemporal states are effective.

Via

Access Paper or Ask Questions

2M-NER: Contrastive Learning for Multilingual and Multimodal NER with Language and Modal Fusion

Apr 26, 2024

Dongsheng Wang, Xiaoqin Feng, Zeming Liu, Chuan Wang

Abstract:Named entity recognition (NER) is a fundamental task in natural language processing that involves identifying and classifying entities in sentences into pre-defined types. It plays a crucial role in various research fields, including entity linking, question answering, and online product recommendation. Recent studies have shown that incorporating multilingual and multimodal datasets can enhance the effectiveness of NER. This is due to language transfer learning and the presence of shared implicit features across different modalities. However, the lack of a dataset that combines multilingualism and multimodality has hindered research exploring the combination of these two aspects, as multimodality can help NER in multiple languages simultaneously. In this paper, we aim to address a more challenging task: multilingual and multimodal named entity recognition (MMNER), considering its potential value and influence. Specifically, we construct a large-scale MMNER dataset with four languages (English, French, German and Spanish) and two modalities (text and image). To tackle this challenging MMNER task on the dataset, we introduce a new model called 2M-NER, which aligns the text and image representations using contrastive learning and integrates a multimodal collaboration module to effectively depict the interactions between the two modalities. Extensive experimental results demonstrate that our model achieves the highest F1 score in multilingual and multimodal NER tasks compared to some comparative and representative baselines. Additionally, in a challenging analysis, we discovered that sentence-level alignment interferes a lot with NER models, indicating the higher level of difficulty in our dataset.

* 20 pages

Via

Access Paper or Ask Questions

SwG-former: Sliding-window Graph Convolutional Network Integrated with Conformer for Sound Event Localization and Detection

Oct 21, 2023

Weiming Huang, Qinghua Huang, Liyan Ma, Zhengyu Chen, Chuan Wang

Abstract:Sound event localization and detection (SELD) is a joint task of sound event detection (SED) and direction of arrival (DoA) estimation. SED mainly relies on temporal dependencies to distinguish different sound classes, while DoA estimation depends on spatial correlations to estimate source directions. To jointly optimize two subtasks, the SELD system should extract spatial correlations and model temporal dependencies simultaneously. However, numerous models mainly extract spatial correlations and model temporal dependencies separately. In this paper, the interdependence of spatial-temporal information in audio signals is exploited for simultaneous extraction to enhance the model performance. In response, a novel graph representation leveraging graph convolutional network (GCN) in non-Euclidean space is developed to extract spatial-temporal information concurrently. A sliding-window graph (SwG) module is designed based on the graph representation. It exploits sliding-windows with different sizes to learn temporal context information and dynamically constructs graph vertices in the frequency-channel (F-C) domain to capture spatial correlations. Furthermore, as the cornerstone of message passing, a robust Conv2dAgg function is proposed and embedded into the SwG module to aggregate the features of neighbor vertices. To improve the performance of SELD in a natural spatial acoustic environment, a general and efficient SwG-former model is proposed by integrating the SwG module with the Conformer. It exhibits superior performance in comparison to recent advanced SELD models. To further validate the generality and efficiency of the SwG-former, it is seamlessly integrated into the event-independent network version 2 (EINV2) called SwG-EINV2. The SwG-EINV2 surpasses the state-of-the-art (SOTA) methods under the same acoustic environment.

Via

Access Paper or Ask Questions

MEFLUT: Unsupervised 1D Lookup Tables for Multi-exposure Image Fusion

Sep 21, 2023

Ting Jiang, Chuan Wang, Xinpeng Li, Ru Li, Haoqiang Fan, Shuaicheng Liu

Abstract:In this paper, we introduce a new approach for high-quality multi-exposure image fusion (MEF). We show that the fusion weights of an exposure can be encoded into a 1D lookup table (LUT), which takes pixel intensity value as input and produces fusion weight as output. We learn one 1D LUT for each exposure, then all the pixels from different exposures can query 1D LUT of that exposure independently for high-quality and efficient fusion. Specifically, to learn these 1D LUTs, we involve attention mechanism in various dimensions including frame, channel and spatial ones into the MEF task so as to bring us significant quality improvement over the state-of-the-art (SOTA). In addition, we collect a new MEF dataset consisting of 960 samples, 155 of which are manually tuned by professionals as ground-truth for evaluation. Our network is trained by this dataset in an unsupervised manner. Extensive experiments are conducted to demonstrate the effectiveness of all the newly proposed components, and results show that our approach outperforms the SOTA in our and another representative dataset SICE, both qualitatively and quantitatively. Moreover, our 1D LUT approach takes less than 4ms to run a 4K image on a PC GPU. Given its high quality, efficiency and robustness, our method has been shipped into millions of Android mobiles across multiple brands world-wide. Code is available at: https://github.com/Hedlen/MEFLUT.

Via

Access Paper or Ask Questions

Application of Zone Method based Machine Learning and Physics-Informed Neural Networks in Reheating Furnaces

Aug 30, 2023

Ujjal Kr Dutta, Aldo Lipani, Chuan Wang, Yukun Hu

Abstract:Despite the high economic relevance of Foundation Industries, certain components like Reheating furnaces within their manufacturing chain are energy-intensive. Notable energy consumption reduction could be obtained by reducing the overall heating time in furnaces. Computer-integrated Machine Learning (ML) and Artificial Intelligence (AI) powered control systems in furnaces could be enablers in achieving the Net-Zero goals in Foundation Industries for sustainable manufacturing. In this work, due to the infeasibility of achieving good quality data in scenarios like reheating furnaces, classical Hottel's zone method based computational model has been used to generate data for ML and Deep Learning (DL) based model training via regression. It should be noted that the zone method provides an elegant way to model the physical phenomenon of Radiative Heat Transfer (RHT), the dominating heat transfer mechanism in high-temperature processes inside heating furnaces. Using this data, an extensive comparison among a wide range of state-of-the-art, representative ML and DL methods has been made against their temperature prediction performances in varying furnace environments. Owing to their holistic balance among inference times and model performance, DL stands out among its counterparts. To further enhance the Out-Of-Distribution (OOD) generalization capability of the trained DL models, we propose a Physics-Informed Neural Network (PINN) by incorporating prior physical knowledge using a set of novel Energy-Balance regularizers. Our setup is a generic framework, is geometry-agnostic of the 3D structure of the underlying furnace, and as such could accommodate any standard ML regression model, to serve as a Digital Twin of the underlying physical processes, for transitioning Foundation Industries towards Industry 4.0.

Via

Access Paper or Ask Questions

MidMed: Towards Mixed-Type Dialogues for Medical Consultation

Jun 14, 2023

Xiaoming Shi, Zeming Liu, Chuan Wang, Haitao Leng, Kui Xue, Xiaofan Zhang, Shaoting Zhang

Figure 1 for MidMed: Towards Mixed-Type Dialogues for Medical Consultation

Figure 2 for MidMed: Towards Mixed-Type Dialogues for Medical Consultation

Figure 3 for MidMed: Towards Mixed-Type Dialogues for Medical Consultation

Figure 4 for MidMed: Towards Mixed-Type Dialogues for Medical Consultation

Abstract:Most medical dialogue systems assume that patients have clear goals (medicine querying, surgical operation querying, etc.) before medical consultation. However, in many real scenarios, due to the lack of medical knowledge, it is usually difficult for patients to determine clear goals with all necessary slots. In this paper, we identify this challenge as how to construct medical consultation dialogue systems to help patients clarify their goals. To mitigate this challenge, we propose a novel task and create a human-to-human mixed-type medical consultation dialogue corpus, termed MidMed, covering five dialogue types: task-oriented dialogue for diagnosis, recommendation, knowledge-grounded dialogue, QA, and chitchat. MidMed covers four departments (otorhinolaryngology, ophthalmology, skin, and digestive system), with 8,175 dialogues. Furthermore, we build baselines on MidMed and propose an instruction-guiding medical dialogue generation framework, termed InsMed, to address this task. Experimental results show the effectiveness of InsMed.

* Accepted by ACL 2023 main conference. The first two authors contributed equally to this work

Via

Access Paper or Ask Questions

LSGNN: Towards General Graph Neural Network in Node Classification by Local Similarity

May 07, 2023

Yuhan Chen, Yihong Luo, Jing Tang, Liang Yang, Siya Qiu, Chuan Wang, Xiaochun Cao

Abstract:Heterophily has been considered as an issue that hurts the performance of Graph Neural Networks (GNNs). To address this issue, some existing work uses a graph-level weighted fusion of the information of multi-hop neighbors to include more nodes with homophily. However, the heterophily might differ among nodes, which requires to consider the local topology. Motivated by it, we propose to use the local similarity (LocalSim) to learn node-level weighted fusion, which can also serve as a plug-and-play module. For better fusion, we propose a novel and efficient Initial Residual Difference Connection (IRDC) to extract more informative multi-hop information. Moreover, we provide theoretical analysis on the effectiveness of LocalSim representing node homophily on synthetic graphs. Extensive evaluations over real benchmark datasets show that our proposed method, namely Local Similarity Graph Neural Network (LSGNN), can offer comparable or superior state-of-the-art performance on both homophilic and heterophilic graphs. Meanwhile, the plug-and-play model can significantly boost the performance of existing GNNs. Our code is provided at https://github.com/draym28/LSGNN.

* To be published at IJCAI23

Via

Access Paper or Ask Questions

Combinatorial optimization solving by coherent Ising machines based on spiking neural networks

Aug 16, 2022

Bo Lu, Yong-Pan Gao, Kai Wen, Chuan Wang

Figure 1 for Combinatorial optimization solving by coherent Ising machines based on spiking neural networks

Figure 2 for Combinatorial optimization solving by coherent Ising machines based on spiking neural networks

Figure 3 for Combinatorial optimization solving by coherent Ising machines based on spiking neural networks

Figure 4 for Combinatorial optimization solving by coherent Ising machines based on spiking neural networks

Abstract:Spiking neural network is a kind of neuromorphic computing which is believed to improve on the level of intelligence and provide advabtages for quantum computing. In this work, we address this issue by designing an optical spiking neural network and prove that it can be used to accelerate the speed of computation, especially on the combinatorial optimization problems. Here the spiking neural network is constructed by the antisymmetrically coupled degenerate optical parametric oscillator pulses and dissipative pulses. A nonlinear transfer function is chosen to mitigate amplitude inhomogeneities and destabilize the resulting local minima according to the dynamical behavior of spiking neurons. It is numerically proved that the spiking neural network-coherent Ising machines has excellent performance on combinatorial optimization problems, for which is expected to offer a new applications for neural computing and optical computing.

* 5 pages, 4 figures, comments are welcome

Via

Access Paper or Ask Questions

On the Use of BERT for Automated Essay Scoring: Joint Learning of Multi-Scale Essay Representation

May 21, 2022

Yongjie Wang, Chuan Wang, Ruobing Li, Hui Lin

Figure 1 for On the Use of BERT for Automated Essay Scoring: Joint Learning of Multi-Scale Essay Representation

Figure 2 for On the Use of BERT for Automated Essay Scoring: Joint Learning of Multi-Scale Essay Representation

Figure 3 for On the Use of BERT for Automated Essay Scoring: Joint Learning of Multi-Scale Essay Representation

Figure 4 for On the Use of BERT for Automated Essay Scoring: Joint Learning of Multi-Scale Essay Representation

Abstract:In recent years, pre-trained models have become dominant in most natural language processing (NLP) tasks. However, in the area of Automated Essay Scoring (AES), pre-trained models such as BERT have not been properly used to outperform other deep learning models such as LSTM. In this paper, we introduce a novel multi-scale essay representation for BERT that can be jointly learned. We also employ multiple losses and transfer learning from out-of-domain essays to further improve the performance. Experiment results show that our approach derives much benefit from joint learning of multi-scale essay representation and obtains almost the state-of-the-art result among all deep learning models in the ASAP task. Our multi-scale essay representation also generalizes well to CommonLit Readability Prize data set, which suggests that the novel text representation proposed in this paper may be a new and effective choice for long-text tasks.

* Accepted to NAACL 2022 as a long paper

Via

Access Paper or Ask Questions