Kun Qian

Speech Emotion Recognition under Resource Constraints with Data Distillation

Jun 21, 2024

Yi Chang, Zhao Ren, Zhonghao Zhao, Thanh Tam Nguyen, Kun Qian, Tanja Schultz, Björn W. Schuller

Abstract: Speech emotion recognition (SER) plays a crucial role in human-computer interaction. The emergence of edge devices in the Internet of Things (IoT) presents challenges in constructing intricate deep learning models due to constraints in memory and computational resources. Moreover, emotional speech data often contains private information, raising concerns about privacy leakage during the deployment of SER models. To address these challenges, we propose a data distillation framework to facilitate efficient development of SER models in IoT applications using a synthesised, smaller, and distilled dataset. Our experiments demonstrate that the distilled dataset can be effectively utilised to train SER models with fixed initialisation, achieving performances comparable to those developed using the original full emotional speech dataset.

Optimizing Psychological Counseling with Instruction-Tuned Large Language Models

Jun 19, 2024

Wenjie Li, Tianyu Sun, Kun Qian, Wenhong Wang

Abstract: The advent of large language models (LLMs) has significantly advanced various fields, including natural language processing and automated dialogue systems. This paper explores the application of LLMs in psychological counseling, addressing the increasing demand for mental health services. We present a method for instruction tuning LLMs with specialized prompts to enhance their performance in providing empathetic, relevant, and supportive responses. Our approach involves developing a comprehensive dataset of counseling-specific prompts, refining them through feedback from professional counselors, and conducting rigorous evaluations using both automatic metrics and human assessments. The results demonstrate that our instruction-tuned model outperforms several baseline LLMs, highlighting its potential as a scalable and accessible tool for mental health support.

* 9 pages

Time Sensitive Knowledge Editing through Efficient Finetuning

Jun 06, 2024

Xiou Ge, Ali Mousavi, Edouard Grave, Armand Joulin, Kun Qian, Benjamin Han, Mostafa Arefiyan, Yunyao Li

Abstract: Large Language Models (LLMs) have demonstrated impressive capability in different tasks and are bringing transformative changes to many domains. However, keeping the knowledge in LLMs up to date remains a challenge once pretraining is complete. It is thus essential to design effective methods to both update obsolete knowledge and induce new knowledge into LLMs. Existing locate-and-edit knowledge editing (KE) methods suffer from two limitations. First, LLMs edited by such methods are generally poor at answering complex queries that require multi-hop reasoning. Second, the long run-time such locate-and-edit methods need to perform knowledge edits makes large-scale KE infeasible in practice. In this paper, we explore Parameter-Efficient Fine-Tuning (PEFT) techniques as an alternative for KE. We curate a more comprehensive temporal KE dataset with both knowledge update and knowledge injection examples for KE performance benchmarking. We further probe the effect of fine-tuning on a range of layers in an LLM for the multi-hop QA task. We find that PEFT performs better than locate-and-edit techniques for time-sensitive knowledge edits.

* Accepted to ACL 2024 main conference
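
The abstract above does not say which PEFT technique the authors use. As a rough illustration of why PEFT is cheaper than full fine-tuning, the sketch below shows LoRA, one widely used PEFT method, in plain Python: the pretrained weight W stays frozen and only two small matrices A and B are trained. All names and the toy values (`lora_effective_weight`, `trainable_params`, `W`, `A`, `B_init`, `B_trained`) are hypothetical and are not taken from the paper.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_effective_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, the weight a LoRA-adapted layer applies.

    w: frozen pretrained weight, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    """
    r = len(a)                      # adapter rank
    scale = alpha / r
    delta = matmul(b, a)            # low-rank update, shape (d_out, d_in)
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


def trainable_params(d_out, d_in, r):
    """Count parameters trained by full fine-tuning vs. a rank-r LoRA adapter."""
    return {"full": d_out * d_in, "lora": r * (d_in + d_out)}


# Toy 2x3 layer with a rank-1 adapter (illustrative values only).
W = [[1.0, 0.0, 2.0], [0.0, 1.0, 0.0]]
A = [[0.5, 0.5, 0.5]]          # r = 1, d_in = 3
B_init = [[0.0], [0.0]]        # B starts at zero, so the adapter changes nothing at first
B_trained = [[2.0], [-1.0]]    # stand-in for values learned from the edit examples

print(lora_effective_weight(W, A, B_init, alpha=1.0))
print(lora_effective_weight(W, A, B_trained, alpha=1.0))
print(trainable_params(4096, 4096, 8))
```

For a 4096 x 4096 layer, a rank-8 adapter of this form trains 65,536 parameters instead of roughly 16.8 million, which is the kind of saving that makes repeated knowledge edits practical.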

LLM-based Robot Task Planning with Exceptional Handling for General Purpose Service Robots

May 24, 2024

Ruoyu Wang, Zhipeng Yang, Zinan Zhao, Xinyan Tong, Zhi Hong, Kun Qian

Abstract: The development of a general purpose service robot for daily life necessitates the robot's ability to deploy a myriad of fundamental behaviors judiciously. Recent advancements in training Large Language Models (LLMs) can be used to generate action sequences directly, given an instruction in natural language with no additional domain information. However, while the outputs of LLMs are semantically correct, the generated task plans may not accurately map to acceptable actions and might encompass various linguistic ambiguities. LLM hallucinations, which produce content that is inconsistent with real-world facts or user inputs, pose another challenge for robot task planning. In this paper, we propose a task planning method based on a constrained LLM prompt scheme, which can generate an executable action sequence from a command. An exceptional handling module is further proposed to deal with the LLM hallucination problem. This module can ensure the LLM-generated results are admissible in the current environment. We evaluate our method on the commands generated by the RoboCup@Home Command Generator, observing that the robot demonstrates exceptional performance in both comprehending instructions and executing tasks.
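
The abstract above does not describe how admissibility is checked. One plausible reading, sketched below under that assumption, is to compare each step of the LLM-generated plan against the actions the robot supports and the objects it currently knows about, and to flag anything outside those sets. The `Step` type, `find_inadmissible`, and the example action and object names are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str   # e.g. "pick"
    target: str   # e.g. "cup"


def find_inadmissible(plan, allowed_actions, known_objects):
    """Return (index, step, reason) for every step the environment cannot execute."""
    problems = []
    for i, step in enumerate(plan):
        if step.action not in allowed_actions:
            problems.append((i, step, "unknown action"))
        elif step.target not in known_objects:
            problems.append((i, step, "unknown object"))
    return problems


# Hypothetical environment description and LLM output.
ALLOWED = {"goto", "pick", "place"}
OBJECTS = {"kitchen", "cup", "table"}
plan = [
    Step("goto", "kitchen"),
    Step("pick", "cup"),
    Step("teleport", "table"),   # hallucinated action
    Step("place", "unicorn"),    # hallucinated object
]

problems = find_inadmissible(plan, ALLOWED, OBJECTS)
for index, step, reason in problems:
    print(index, step, reason)
```

A system built this way could feed the flagged steps back into the prompt and ask the LLM to regenerate them, rather than executing a plan the robot cannot carry out.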

Investigating KAN-Based Physics-Informed Neural Networks for EMI/EMC Simulations

May 21, 2024

Kun Qian, Mohamed Kheir

Abstract: The main objective of this paper is to investigate the feasibility of employing Physics-Informed Neural Network (PINN) techniques, in particular Kolmogorov-Arnold Networks (KANs), for facilitating Electromagnetic Interference (EMI) simulations. It introduces some common EM problem formulations and how they can be solved using AI-driven solutions instead of lengthy and complex full-wave numerical simulations. This research may open new horizons for green EMI simulation workflows with less energy consumption and feasible computational capacity.

* 8 pages

FOTS: A Fast Optical Tactile Simulator for Sim2Real Learning of Tactile-motor Robot Manipulation Skills

May 01, 2024

Yongqiang Zhao, Kun Qian, Boyi Duan, Shan Luo

Abstract: Simulation is a widely used tool in robotics to reduce hardware consumption and gather large-scale data. Despite previous efforts to simulate optical tactile sensors, there remain challenges in efficiently synthesizing images and replicating marker motion under different contact loads. In this work, we propose a fast optical tactile simulator, named FOTS, for simulating optical tactile sensors. We utilize multi-layer perceptron mapping and planar shadow generation to simulate the optical response, while employing marker distribution approximation to simulate the motion of surface markers caused by the elastomer deformation. Experimental results demonstrate that FOTS outperforms other methods in terms of image generation quality and rendering speed, achieving 28.6 fps for optical simulation and 326.1 fps for marker motion simulation on a single CPU without GPU acceleration. In addition, we integrate the FOTS simulation model with physics engines like MuJoCo, and the peg-in-hole task demonstrates the effectiveness of our method in achieving zero-shot Sim2Real learning of tactile-motor robot manipulation skills. Our code is available at https://github.com/Rancho-zhao/FOTS.

DECIDER: A Rule-Controllable Decoding Strategy for Language Generation by Imitating Dual-System Cognitive Theory

Mar 04, 2024

Chen Xu, Tian Lan, Changlong Yu, Wei Wang, Jun Gao, Yu Ji, Qunxi Dong, Kun Qian, Piji Li, Wei Bi (+1 more)

Abstract: Lexicon-based constrained decoding approaches aim to control the meaning or style of the generated text through certain target concepts. Existing approaches over-focus on the targets themselves, leading to a lack of high-level reasoning about how to achieve them. However, humans usually tackle tasks by following certain rules that focus not only on the targets but also on semantically relevant concepts that induce the occurrence of targets. In this work, we present DECIDER, a rule-controllable decoding strategy for constrained language generation inspired by dual-system cognitive theory. Specifically, in DECIDER, a pre-trained language model (PLM) is equipped with a logic reasoner that takes high-level rules as input. DECIDER then allows rule signals to flow into the PLM at each decoding step. Extensive experimental results demonstrate that DECIDER can effectively follow given rules to guide generation direction toward the targets in a more human-like manner.

* Submitted to IEEE TKDE, 12 pages, 6 figures

STAA-Net: A Sparse and Transferable Adversarial Attack for Speech Emotion Recognition

Feb 02, 2024

Yi Chang, Zhao Ren, Zixing Zhang, Xin Jing, Kun Qian, Xi Shao, Bin Hu, Tanja Schultz, Björn W. Schuller

Abstract: Speech contains rich information on the emotions of humans, and Speech Emotion Recognition (SER) has been an important topic in the area of human-computer interaction. The robustness of SER models is crucial, particularly in privacy-sensitive and reliability-demanding domains like private healthcare. Recently, the vulnerability of deep neural networks in the audio domain to adversarial attacks has become a popular area of research. However, prior works on adversarial attacks in the audio domain primarily rely on iterative gradient-based techniques, which are time-consuming and prone to overfitting the specific threat model. Furthermore, the exploration of sparse perturbations, which have the potential for better stealthiness, remains limited in the audio domain. To address these challenges, we propose a generator-based attack method to generate sparse and transferable adversarial examples to deceive SER models in an end-to-end and efficient manner. We evaluate our method on two widely used SER datasets, Database of Elicited Mood in Speech (DEMoS) and Interactive Emotional dyadic MOtion CAPture (IEMOCAP), and demonstrate its ability to generate successful sparse adversarial examples in an efficient manner. Moreover, our generated adversarial examples exhibit model-agnostic transferability, enabling effective adversarial attacks on advanced victim models.

Unicron: Economizing Self-Healing LLM Training at Scale

Dec 30, 2023

Tao He, Xue Li, Zhibin Wang, Kun Qian, Jingbo Xu, Wenyuan Yu, Jingren Zhou

Abstract: Training large-scale language models is increasingly critical in various domains, but it is hindered by frequent failures, leading to significant time and economic costs. Current failure recovery methods in cloud-based settings inadequately address the diverse and complex scenarios that arise, focusing narrowly on erasing downtime for individual tasks without considering the overall cost impact on a cluster. We introduce Unicron, a workload manager designed for efficient self-healing in large-scale language model training. Unicron optimizes the training process by minimizing failure-related costs across multiple concurrent tasks within a cluster. Its key features include in-band error detection for real-time error identification without extra overhead, a dynamic cost-aware plan generation mechanism for optimal reconfiguration, and an efficient transition strategy to reduce downtime during state changes. Deployed on a 128-GPU distributed cluster, Unicron demonstrates up to a 1.9x improvement in training efficiency over state-of-the-art methods, significantly reducing failure recovery costs and enhancing the reliability of large-scale language model training.

Enhancing Item-level Bundle Representation for Bundle Recommendation

Nov 28, 2023

Xiaoyu Du, Kun Qian, Yunshan Ma, Xinguang Xiang

Abstract: Bundle recommendation approaches offer users a set of related items on a particular topic. The current state-of-the-art (SOTA) method utilizes contrastive learning to learn representations at both the bundle and item levels. However, due to the inherent difference between the bundle-level and item-level preferences, the item-level representations may not receive sufficient information from the bundle affiliations to make accurate predictions. In this paper, we propose a novel approach EBRec, short for Enhanced Bundle Recommendation, which incorporates two enhanced modules to explore inherent item-level bundle representations. First, we propose to incorporate the bundle-user-item (B-U-I) high-order correlations to explore more collaborative information, thereby enhancing the previous bundle representation that solely relies on the bundle-item affiliation information. Second, we further enhance the B-U-I correlations by augmenting the observed user-item interactions with interactions generated from pre-trained models, thus improving the item-level bundle representations. We conduct extensive experiments on three public datasets, and the results justify the effectiveness of our approach as well as the two core modules. Code and datasets are available at https://github.com/answermycode/EBRec.
