Yifan Hu

Distributionally Robust Model-based Reinforcement Learning with Large State Spaces

Sep 05, 2023
Shyam Sundhar Ramesh, Pier Giuseppe Sessa, Yifan Hu, Andreas Krause, Ilija Bogunovic

Three major challenges in reinforcement learning are complex dynamical systems with large state spaces, costly data acquisition processes, and the deviation of real-world dynamics from the training environment at deployment time. To overcome these issues, we study distributionally robust Markov decision processes with continuous state spaces under the widely used Kullback-Leibler, chi-square, and total variation uncertainty sets. We propose a model-based approach that uses Gaussian processes and the maximum variance reduction algorithm to efficiently learn multi-output nominal transition dynamics, leveraging access to a generative model (i.e., a simulator). We further establish the statistical sample complexity of the proposed method for the different uncertainty sets. These complexity bounds are independent of the number of states and extend beyond linear dynamics, ensuring the effectiveness of our approach in identifying near-optimal distributionally robust policies. The proposed method can further be combined with model-free distributionally robust reinforcement learning methods to obtain a near-optimal robust policy. Experimental results demonstrate our algorithm's robustness to distributional shifts and its superior sample efficiency.
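
The maximum-variance-reduction idea can be illustrated with a toy single-output GP over a finite candidate set of state-action pairs: repeatedly query the generative model where the posterior is most uncertain. This is a minimal sketch under those assumptions (the `simulator` and candidate set are hypothetical; the paper's multi-output dynamics and robustness machinery are not shown):

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def simulator(sa):
    # Hypothetical generative model: one component of the next state for (s, a).
    return np.sin(sa @ np.array([1.0, 0.5])) + 0.05 * np.random.randn()

rng = np.random.default_rng(0)
candidates = rng.uniform(-1, 1, size=(200, 2))   # candidate (state, action) pairs
X, y = [], []
gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5), alpha=1e-2)

for _ in range(30):
    if X:
        gp.fit(np.array(X), np.array(y))
        _, std = gp.predict(candidates, return_std=True)
    else:
        std = np.ones(len(candidates))           # no data yet: everything is uncertain
    sa = candidates[np.argmax(std)]              # query the most uncertain pair
    X.append(sa)
    y.append(simulator(sa))                      # one call to the generative model
```

Because the posterior variance shrinks fastest where the model knows least, the sample budget concentrates on informative transitions rather than being spread uniformly over the state space.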

Deep Directly-Trained Spiking Neural Networks for Object Detection

Jul 27, 2023
Qiaoyi Su, Yuhong Chou, Yifan Hu, Jianing Li, Shijie Mei, Ziyang Zhang, Guoqi Li

Spiking neural networks (SNNs) are brain-inspired, energy-efficient models that encode information in spatiotemporal dynamics. Recently, directly trained deep SNNs have achieved high performance on classification tasks with very few time steps. However, how to design a directly trained SNN for the regression task of object detection remains a challenging problem. To address it, we propose EMS-YOLO, a novel directly trained SNN framework for object detection, which is the first attempt to train a deep SNN for object detection with surrogate gradients rather than ANN-SNN conversion strategies. Specifically, we design a full-spike residual block, EMS-ResNet, which can effectively extend the depth of a directly trained SNN at low power consumption. Furthermore, we theoretically analyze EMS-ResNet and prove that it avoids gradient vanishing and exploding. The results demonstrate that our approach outperforms state-of-the-art ANN-SNN conversion methods (which require at least 500 time steps) with only 4 time steps. Our model achieves performance comparable to an ANN with the same architecture while consuming 5.83 times less energy on the frame-based COCO dataset and the event-based Gen1 dataset.
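
The key ingredient that makes direct training possible is the surrogate gradient: the spike is a hard threshold in the forward pass, but the backward pass substitutes a smooth window. A minimal PyTorch sketch of this generic technique, not of EMS-ResNet itself (the rectangular window and its width of 0.5 are assumptions):

```python
import torch

class SpikeFn(torch.autograd.Function):
    """Heaviside spike forward; rectangular surrogate gradient backward."""
    @staticmethod
    def forward(ctx, v, threshold=1.0):
        ctx.save_for_backward(v)
        ctx.threshold = threshold
        return (v >= threshold).float()

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        # Let gradients pass only near the threshold, where a small change
        # in membrane potential could flip the spike decision.
        surrogate = ((v - ctx.threshold).abs() < 0.5).float()
        return grad_out * surrogate, None

v = torch.randn(8, requires_grad=True)    # membrane potentials
spikes = SpikeFn.apply(v)
spikes.sum().backward()                   # gradients flow despite the hard threshold
```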

* Accepted by ICCV 2023

Deep-Learning Tool for Early Identifying Non-Traumatic Intracranial Hemorrhage Etiology based on CT Scan

Feb 02, 2023
Meng Zhao, Yifan Hu, Ruixuan Jiang, Yuanli Zhao, Dong Zhang, Yan Zhang, Rong Wang, Yong Cao, Qian Zhang, Yonggang Ma, Jiaxi Li, Shaochen Yu, Wenjie Li, Ran Zhang, Yefeng Zheng, Shuo Wang, Jizong Zhao

Background: To develop an artificial intelligence system that can accurately identify the etiology of acute non-traumatic intracranial hemorrhage (ICH) from non-contrast CT (NCCT) scans, and to investigate whether clinicians can benefit from it in a diagnostic setting. Materials and Methods: The deep learning model was developed on 1868 eligible NCCT scans with non-traumatic ICH collected between January 2011 and April 2018. We tested the model on two independent datasets (TT200 and SD98) collected after April 2018, and compared its diagnostic performance with that of clinicians. We further designed a simulated study to compare clinicians' performance with and without deep learning augmentation. Results: On the TT200 dataset, the proposed system achieved areas under the receiver operating characteristic curve of 0.986 (95% CI 0.967-1.000) for aneurysms, 0.952 (0.917-0.987) for hypertensive hemorrhage, 0.950 (0.860-1.000) for arteriovenous malformation (AVM), 0.749 (0.586-0.912) for Moyamoya disease (MMD), 0.837 (0.704-0.969) for cavernous malformation (CM), and 0.839 (0.722-0.959) for other causes. At a 90% specificity level, the model's sensitivities were 97.1% and 90.9% for aneurysm and AVM diagnosis, respectively. The model also generalized well to the independent SD98 dataset. With the proposed system's augmentation, clinicians achieved significant improvements in the sensitivity, specificity, and accuracy of their diagnoses for certain hemorrhage etiologies. Conclusions: The proposed deep learning algorithm can be an effective tool for early identification of hemorrhage etiology from NCCT scans. It may also provide clinicians with additional information for triage and selection of further imaging examinations.
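
The abstract quotes per-etiology AUCs with 95% confidence intervals; a common way to obtain such intervals is a bootstrap over the test set, evaluated one-vs-rest per class. The paper does not state its CI method, so the following is only a plausible sketch:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_with_ci(y_true, y_score, n_boot=1000, seed=0):
    """Point AUC plus a bootstrap 95% CI for one binary (one-vs-rest) task."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    rng = np.random.default_rng(seed)
    auc = roc_auc_score(y_true, y_score)
    boots = []
    while len(boots) < n_boot:
        idx = rng.integers(0, len(y_true), len(y_true))
        if y_true[idx].min() == y_true[idx].max():
            continue                      # a resample must contain both classes
        boots.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(boots, [2.5, 97.5])
    return auc, (lo, hi)
```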

CRYPTEXT: Database and Interactive Toolkit of Human-Written Text Perturbations in the Wild

Jan 16, 2023
Thai Le, Ye Yiran, Yifan Hu, Dongwon Lee

User-generated textual content on the Internet is often noisy, erroneous, and grammatically incorrect. In fact, some online users choose to express their opinions through carefully perturbed texts, especially on controversial topics (e.g., politics, vaccine mandates) or in abusive contexts (e.g., cyberbullying, hate speech). However, to the best of our knowledge, no framework explores these online "human-written" perturbations (as opposed to algorithm-generated ones). Therefore, we introduce an interactive system called CRYPTEXT, a data-intensive application that provides users with a database and several tools to extract and interact with human-written perturbations. Specifically, CRYPTEXT helps look up, perturb, and normalize (i.e., de-perturb) texts. It also provides an interactive interface to monitor and analyze text perturbations online. A short demo video is available at: https://youtu.be/8WT3G8xjIoI
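
The look-up/perturb/normalize workflow reduces, at its core, to a bidirectional mapping between canonical tokens and their human-written variants. A toy sketch with a hypothetical three-entry table (CRYPTEXT's actual database is mined from real online posts and is far larger):

```python
# Hypothetical miniature perturbation table.
CANONICAL = {"v4ccine": "vaccine", "h8": "hate", "pol!tics": "politics"}
PERTURBED = {}
for variant, canon in CANONICAL.items():
    PERTURBED.setdefault(canon, []).append(variant)

def normalize(text):
    """De-perturb: map known perturbed tokens back to canonical forms."""
    return " ".join(CANONICAL.get(tok, tok) for tok in text.split())

def perturb(text):
    """Replace tokens with a known human-written perturbation when one exists."""
    return " ".join(PERTURBED.get(tok, [tok])[0] for tok in text.split())

print(normalize("the v4ccine mandate"))  # -> "the vaccine mandate"
print(perturb("the vaccine mandate"))    # -> "the v4ccine mandate"
```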

* Accepted to IEEE International Conference on Data Engineering 2023 

Deep-Reinforcement-Learning-based Path Planning for Industrial Robots using Distance Sensors as Observation

Jan 14, 2023
Teham Bhuiyan, Linh Kästner, Yifan Hu, Benno Kutschank, Jens Lambrecht

Industrial robots are widely used in manufacturing because of their efficiency at repetitive tasks such as assembly or welding. A common problem in these applications is reaching a destination without colliding with obstacles or other robot arms. Commonly used sampling-based path planning approaches such as RRT require long computation times, especially in complex environments. Furthermore, the environment in which they are employed must be known beforehand, and applying them to new environments requires tedious, time- and cost-intensive hyperparameter tuning. Deep reinforcement learning, on the other hand, has shown remarkable results in dealing with unknown environments, generalizing to new problem instances, and solving motion planning problems efficiently. On that account, this paper proposes a deep-reinforcement-learning-based motion planner for robotic manipulators. We evaluated our model against state-of-the-art sampling-based planners in several experiments. The results show the superiority of our planner in terms of path length and execution time.
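
To make the setup concrete, an observation for such a planner typically stacks normalized distance-sensor readings with proprioception and the goal offset. The layout below is purely illustrative (sensor count, 2 m maximum range, and names are assumptions, not the paper's configuration):

```python
import numpy as np

def observe(distances, joint_angles, goal_delta, max_range=2.0):
    """Build the policy input: normalized ranges + joint state + goal offset."""
    d = np.clip(np.asarray(distances) / max_range, 0.0, 1.0)
    return np.concatenate([d, np.asarray(joint_angles), np.asarray(goal_delta)])

obs = observe(distances=[1.5, 0.3, 2.0],
              joint_angles=[0.1, -0.4, 0.7],
              goal_delta=[0.2, -0.1, 0.3])
# obs would feed an actor network that outputs joint velocity commands.
```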

* 9 pages, 10 figures 

FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis

Oct 27, 2022
Yifan Hu, Rui Liu, Guanglai Gao, Haizhou Li

Conversational text-to-speech (TTS) aims to synthesize an utterance with the right linguistic and affective prosody for its conversational context. Prior work improves the expressiveness of synthesized speech using the correlation between the current utterance and the dialogue history at the utterance level. However, fine-grained, word-level information in the dialogue history also has an important impact on an utterance's prosodic expression, and it has not been well studied. Therefore, we propose a novel expressive conversational TTS model, termed FCTalker, that learns fine- and coarse-grained context dependencies simultaneously during speech generation. Specifically, FCTalker includes fine- and coarse-grained encoders to exploit word- and utterance-level context dependencies. To model the word-level dependencies between an utterance and its dialogue history, the fine-grained dialogue encoder is built on top of a dialogue BERT model. Experimental results show that the proposed method outperforms all baselines and generates more expressive, contextually appropriate speech. We release the source code at: https://github.com/walker-hyf/FCTalker.
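
At a high level, the model must merge a coarse utterance-level history summary with fine word-level states from a dialogue BERT. A toy fusion module under assumed dimensions (mean pooling and concat-then-project are simplifications, not FCTalker's actual design):

```python
import torch
import torch.nn as nn

class ContextFusion(nn.Module):
    """Fuse coarse (utterance-level) and fine (word-level) dialogue context."""
    def __init__(self, d_utt=256, d_word=768, d_out=256):
        super().__init__()
        self.proj = nn.Linear(d_utt + d_word, d_out)

    def forward(self, utt_ctx, word_ctx):
        # utt_ctx: (B, d_utt) pooled history; word_ctx: (B, T, d_word) BERT states.
        word_pooled = word_ctx.mean(dim=1)       # crude word-level summary
        return self.proj(torch.cat([utt_ctx, word_pooled], dim=-1))

fusion = ContextFusion()
out = fusion(torch.randn(2, 256), torch.randn(2, 12, 768))  # -> (2, 256)
```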

* 5 pages, 4 figures, 1 table. Submitted to ICASSP 2023. We release the source code at: https://github.com/walker-hyf/FCTalker 

Attention Spiking Neural Networks

Sep 28, 2022
Man Yao, Guangshe Zhao, Hengyu Zhang, Yifan Hu, Lei Deng, Yonghong Tian, Bo Xu, Guoqi Li

Benefiting from the event-driven and sparse spiking characteristics of the brain, spiking neural networks (SNNs) are becoming an energy-efficient alternative to artificial neural networks (ANNs). However, the performance gap between SNNs and ANNs has long been a major obstacle to deploying SNNs widely. To leverage the full potential of SNNs, we study the effect of attention mechanisms in SNNs. We first present our idea of attention as a plug-and-play kit, termed Multi-dimensional Attention (MA). We then propose a new attention SNN architecture with end-to-end training, called MA-SNN, which infers attention weights along the temporal, channel, and spatial dimensions, separately or simultaneously. Drawing on existing neuroscience theories, we use the attention weights to optimize membrane potentials, which in turn regulate the spiking response in a data-dependent way. At the cost of negligibly few additional parameters, MA enables vanilla SNNs to achieve sparser spiking activity, better performance, and greater energy efficiency concurrently. Experiments are conducted on event-based DVS128 Gesture/Gait action recognition and ImageNet-1K image classification. On Gesture/Gait, spike counts are reduced by 84.9%/81.6%, and task accuracy and energy efficiency are improved by 5.9%/4.7% and 3.4$\times$/3.2$\times$, respectively. On ImageNet-1K, we achieve top-1 accuracies of 75.92% and 77.08% with single-step and 4-step Res-SNN-104, which are state-of-the-art results for SNNs. To the best of our knowledge, this is the first time the SNN community has achieved performance comparable to, or even better than, an ANN counterpart on a large-scale dataset. Our work highlights the potential of SNNs as a general backbone for various applications, with a strong balance between effectiveness and efficiency.
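
The channel-dimension case of this idea can be sketched as squeeze-and-excitation-style weights rescaling membrane potentials before thresholding. All shapes and the reduction ratio are assumptions; the real MA kit also covers the temporal and spatial dimensions:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, membrane):                 # membrane: (B, C, H, W)
        w = self.fc(membrane.mean(dim=(2, 3)))   # squeeze spatial dims -> (B, C)
        return membrane * w[:, :, None, None]    # rescale potentials per channel

membrane = torch.randn(2, 16, 8, 8)
attended = ChannelAttention(16)(membrane)
spikes = (attended >= 1.0).float()               # data-dependent spiking response
```

Down-weighted channels sit further below threshold and spike less, which is how attention can simultaneously sparsify activity and save energy.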

* 18 pages, 8 figures, Under Review 

MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline

Sep 22, 2022
Yifan Hu, Pengkai Yin, Rui Liu, Feilong Bao, Guanglai Gao

This paper introduces a high-quality open-source text-to-speech (TTS) synthesis dataset for Mongolian, a low-resource language spoken by over 10 million people worldwide. The dataset, named MnTTS, consists of about 8 hours of transcribed audio recordings spoken by a 22-year-old professional female Mongolian announcer. It is the first publicly available dataset developed to promote Mongolian TTS applications in both academia and industry. In this paper, we share our experience by describing the dataset development procedures and the challenges we faced. To demonstrate the reliability of our dataset, we built a strong non-autoregressive baseline system based on the FastSpeech2 model and the HiFi-GAN vocoder, and evaluated it using the subjective mean opinion score (MOS) and real-time factor (RTF) metrics. Evaluation results show that the baseline system trained on our dataset achieves a MOS above 4 and an RTF of about $3.30\times10^{-1}$, which makes it applicable for practical use. The dataset, training recipe, and pretrained TTS models are freely available at https://github.com/walker-hyf/MnTTS.
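
The real-time factor quoted above is simply wall-clock synthesis time divided by the duration of the generated audio; RTF < 1 means faster than real time. A sketch (the `synthesize` function is a placeholder for the FastSpeech2 + HiFi-GAN pipeline):

```python
import time

def real_time_factor(synthesize, text, sample_rate):
    """RTF = synthesis time / audio duration; the paper reports about 0.33."""
    start = time.perf_counter()
    waveform = synthesize(text)          # returns a 1-D array of audio samples
    elapsed = time.perf_counter() - start
    return elapsed / (len(waveform) / sample_rate)
```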

* Accepted at the 2022 International Conference on Asian Language Processing (IALP2022) 

BSDGAN: Balancing Sensor Data Generative Adversarial Networks for Human Activity Recognition

Aug 07, 2022
Yifan Hu, Yu Wang

The development of IoT technology enables a variety of sensors to be integrated into mobile devices. Human activity recognition (HAR) based on sensor data has become an active research topic in machine learning and ubiquitous computing. However, because human activities occur with inconsistent frequency, the amount of data for each activity in a human activity dataset is imbalanced. Given limited sensor resources and the high cost of manually labeling sensor data, human activity recognition faces the challenge of highly imbalanced activity datasets. In this paper, we propose Balancing Sensor Data Generative Adversarial Networks (BSDGAN) to generate sensor data for minority human activities. BSDGAN consists of a generator model and a discriminator model. Considering the extreme imbalance of human activity datasets, an autoencoder is employed to initialize the training process of BSDGAN, ensuring that the data features of each activity can be learned. The generated activity data are combined with the original dataset to balance the amount of data across activity classes. We deployed multiple human activity recognition models on two publicly available imbalanced human activity datasets, WISDM and UNIMIB. Experimental results show that BSDGAN effectively captures the features of real human activity sensor data and generates realistic synthetic sensor data. Moreover, the balanced activity dataset helps activity recognition models improve recognition accuracy.
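
The balancing step itself is straightforward once a per-class generator is trained: sample synthetic windows for each minority activity until all classes match the majority count. A schematic sketch with placeholder names (the GAN training and autoencoder initialization are omitted):

```python
import numpy as np

def balance(dataset, generators):
    """dataset: {label: (n_i, ...) array of sensor windows};
    generators: {label: fn(n) -> (n, ...) synthetic windows}."""
    target = max(len(w) for w in dataset.values())
    balanced = {}
    for label, windows in dataset.items():
        deficit = target - len(windows)
        if deficit > 0:                  # top up minority activities with GAN samples
            windows = np.concatenate([windows, generators[label](deficit)], axis=0)
        balanced[label] = windows
    return balanced
```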
