Han Wang

Oracle Complexity Reduction for Model-free LQR: A Stochastic Variance-Reduced Policy Gradient Approach

Sep 19, 2023
Leonardo F. Toso, Han Wang, James Anderson

We investigate the problem of learning an $\epsilon$-approximate solution to the discrete-time Linear Quadratic Regulator (LQR) problem via a Stochastic Variance-Reduced Policy Gradient (SVRPG) approach. Although policy gradient methods have been proven to converge linearly to the optimal solution of the model-free LQR problem, their heavy reliance on two-point cost queries for gradient estimation may be intractable, particularly in applications where evaluating the cost function at two distinct control input configurations is exceptionally costly. To this end, we propose an oracle-efficient approach. Our method combines one-point and two-point estimations in a dual-loop variance-reduced algorithm. It achieves an approximately optimal solution with only $O\left(\log\left(1/\epsilon\right)^{\beta}\right)$ two-point cost queries for $\beta \in (0,1)$.
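
To make the one-point versus two-point distinction concrete, here is a minimal Python sketch of the two zeroth-order gradient estimators for an LQR cost $J(K)$ with policy $u = -Kx$. The plant matrices, the smoothing radius `r`, and the use of a Lyapunov solve to simulate the cost oracle are all assumptions for illustration; this is not the paper's SVRPG algorithm, which combines both estimators inside a variance-reduced double loop.

```python
import numpy as np

# Hypothetical plant (NOT from the paper): a stable 2-state, 1-input system.
A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.eye(1)


def lqr_cost(K):
    """Cost oracle J(K) = trace(P_K) for u = -K x with x0 ~ N(0, I).

    P_K solves P = Q + K^T R K + (A - BK)^T P (A - BK). In the model-free
    setting this value would come from rollouts; the model is used here
    only to simulate the oracle.
    """
    A_cl = A - B @ K
    if np.max(np.abs(np.linalg.eigvals(A_cl))) >= 1.0:
        return np.inf  # destabilizing gain: infinite cost
    n = A.shape[0]
    M = Q + K.T @ R @ K
    # Row-major vec: vec(A_cl^T P A_cl) = kron(A_cl^T, A_cl^T) vec(P)
    lhs = np.eye(n * n) - np.kron(A_cl.T, A_cl.T)
    P = np.linalg.solve(lhs, M.reshape(-1)).reshape(n, n)
    return float(np.trace(P))


def random_direction(shape, rng):
    """Sample U uniformly from the unit sphere in Frobenius norm."""
    U = rng.standard_normal(shape)
    return U / np.linalg.norm(U)


def one_point_grad(K, r, rng):
    """One cost query: (d / r) * J(K + rU) * U, with d = number of entries of K."""
    U = random_direction(K.shape, rng)
    return (K.size / r) * lqr_cost(K + r * U) * U


def two_point_grad(K, r, rng):
    """Two cost queries at K + rU and K - rU; far lower variance."""
    U = random_direction(K.shape, rng)
    diff = lqr_cost(K + r * U) - lqr_cost(K - r * U)
    return (K.size / (2.0 * r)) * diff * U


def finite_difference_grad(K, h=1e-6):
    """Reference gradient by central differences (many exact queries)."""
    G = np.zeros_like(K)
    for idx in np.ndindex(K.shape):
        E = np.zeros_like(K)
        E[idx] = h
        G[idx] = (lqr_cost(K + E) - lqr_cost(K - E)) / (2.0 * h)
    return G
```

For example, at `K = np.zeros((1, 2))` the average of a few thousand `two_point_grad(K, 0.01, rng)` samples should point in nearly the same direction as `finite_difference_grad(K)`, while every `one_point_grad` sample has norm `(d / r) * J(K + rU)`, which is large for small `r`. That large variance is the usual motivation for pairing cheap one-point queries with variance reduction.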

Measuring and Mitigating Interference in Reinforcement Learning

Jul 10, 2023
Vincent Liu, Han Wang, Ruo Yu Tao, Khurram Javed, Adam White, Martha White

Catastrophic interference is common in many network-based learning systems, and many proposals exist for mitigating it. Before overcoming interference, we must understand it better. In this work, we provide a definition and a novel measure of interference for value-based reinforcement learning methods such as Fitted Q-Iteration and DQN. We systematically evaluate our measure of interference across a variety of network architectures, showing that it correlates with instability in control performance. Our new interference measure allows us to ask novel scientific questions about commonly used deep learning architectures and to study learning algorithms that mitigate interference. Lastly, we outline a class of algorithms, which we call online-aware, designed to mitigate interference, and show that they reduce interference according to our measure and improve stability and performance in several classic control environments.

* Published at the Conference on Lifelong Learning Agents (CoLLAs) 2023
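
As a hedged illustration of what "interference" can mean operationally, the sketch below measures how a single semi-gradient TD(0) update on one transition changes the squared TD error on other, held-out transitions, using a linear value function. The function names, step size, and this particular loss-difference formulation are assumptions; the paper's own measure is defined for Fitted Q-Iteration and DQN and may differ.

```python
import numpy as np


def td_errors(w, phi, r, phi_next, gamma):
    """TD errors r + gamma * v(s') - v(s) for a linear value v(s) = phi(s) @ w."""
    return r + gamma * (phi_next @ w) - phi @ w


def td_loss(w, batch, gamma):
    """Halved mean squared TD error over a batch of transitions."""
    phi, r, phi_next = batch
    return 0.5 * np.mean(td_errors(w, phi, r, phi_next, gamma) ** 2)


def td_update(w, transition, gamma, lr):
    """One semi-gradient TD(0) step on a single transition."""
    phi, r, phi_next = transition
    delta = r + gamma * (phi_next @ w) - phi @ w
    return w + lr * delta * phi


def interference(w, transition, held_out, gamma=0.9, lr=0.1):
    """Change in held-out TD loss caused by updating on `transition`.

    Positive values mean the update hurt the other transitions.
    """
    w_new = td_update(w, transition, gamma, lr)
    return td_loss(w_new, held_out, gamma) - td_loss(w, held_out, gamma)
```

With one-hot (tabular) features an update cannot touch other states, so interference is zero; with shared features and conflicting targets it becomes positive, which is the kind of effect a function-approximation interference measure is meant to capture.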

CA-CentripetalNet: A novel anchor-free deep learning framework for hardhat wearing detection

Jul 09, 2023
Zhijian Liu, Nian Cai, Wensheng Ouyang, Chengbin Zhang, Nili Tian, Han Wang

Automatic hardhat wearing detection can strengthen safety management at construction sites, but it remains challenging due to complex video surveillance scenes. To address the poor generalization of previous deep learning-based methods, a novel anchor-free deep learning framework called CA-CentripetalNet is proposed for hardhat wearing detection. Two novel schemes are proposed to improve the feature extraction and utilization ability of CA-CentripetalNet: vertical-horizontal corner pooling and bounding constrained center attention. The former is designed to make comprehensive use of both marginal and internal features. The latter is designed to force the backbone to attend to internal features and is used only during training, not during detection. Experimental results indicate that CA-CentripetalNet outperforms existing deep learning-based methods, achieving 86.63% mAP (mean Average Precision) with lower memory consumption at a reasonable speed, especially for small-scale hardhats and non-worn hardhats.

* Signal, Image and Video Processing, 2023
* Accepted by the journal Signal, Image and Video Processing; this is the complete version. Note that it has been removed pending formal publication.
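
The abstract does not define vertical-horizontal corner pooling. For reference, the sketch below shows standard top-left corner pooling as introduced in CornerNet, which the CentripetalNet line of detectors builds on; treat it as background, not as the paper's variant. Feature maps are assumed to be single-channel 2-D NumPy arrays.

```python
import numpy as np


def top_left_corner_pool(vert_feat, horiz_feat):
    """CornerNet-style top-left corner pooling on 2-D (H, W) feature maps.

    Each output location sums (a) the max of `vert_feat` over the same column
    at or below it and (b) the max of `horiz_feat` over the same row at or to
    its right, so a top-left corner can "see" the object edges extending from it.
    """
    # Suffix max down each column: flip rows, running max, flip back.
    below = np.maximum.accumulate(vert_feat[::-1, :], axis=0)[::-1, :]
    # Suffix max along each row: flip columns, running max, flip back.
    right = np.maximum.accumulate(horiz_feat[:, ::-1], axis=1)[:, ::-1]
    return below + right
```

A corner location itself usually has weak local evidence; pooling along the two directions lets it aggregate the strongest responses on the object's boundary, which is the motivation for corner pooling in anchor-free, keypoint-based detectors.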

A Joint Design for Full-duplex OFDM AF Relay System with Precoded Short Guard Interval

Jul 07, 2023
Pu Yang, Xiang-Gen Xia, Qingyue Qu, Han Wang, Yi Liu

In-band full-duplex relay (FDR) has attracted much attention as an effective solution for improving coverage and spectral efficiency in wireless communication networks. The basic problem for FDR transmission is how to eliminate the inherent self-interference and reuse the residual self-interference (RSI) at the relay to improve end-to-end performance. Considering the RSI at the FDR, the overall equivalent channel can be modeled as an infinite impulse response (IIR) channel. For this IIR channel, a joint design of precoding, power gain control, and equalization for cooperative OFDM relay systems is presented. Compared with traditional OFDM systems, the proposed design can significantly reduce the length of the guard interval, thereby improving spectral efficiency. By analyzing the noise sources, this paper evaluates the signal-to-noise ratio (SNR) of the proposed scheme and presents a power gain control algorithm at the FDR. Compared with existing schemes, the proposed scheme achieves superior bit error rate (BER) performance.

* 16 pages, 5 figures 
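
The claim that RSI turns the end-to-end channel into an IIR channel can be seen in a minimal scalar model. The sketch assumes single-tap source-to-relay (`h_sr`) and self-interference (`h_si`) coefficients, a relay gain `gain`, and a loop delay of `delay >= 1` samples; these names and the single-tap simplification are illustrative assumptions, not the paper's system model.

```python
import numpy as np


def af_relay_with_rsi(s, h_sr, h_si, gain, delay):
    """Scalar amplify-and-forward relay with a residual self-interference loop.

    r[n] = gain * (h_sr * s[n] + h_si * r[n - delay]),  delay >= 1

    The relay re-amplifies its own delayed output, so the response from the
    source to the relay output is recursive, i.e. an IIR filter with feedback
    coefficient gain * h_si. It is stable only if |gain * h_si| < 1.
    """
    r = np.zeros(len(s), dtype=complex)
    for n in range(len(s)):
        feedback = r[n - delay] if n >= delay else 0.0
        r[n] = gain * (h_sr * s[n] + h_si * feedback)
    return r
```

An impulse at the source yields echoes `gain * h_sr * (gain * h_si)**k` every `delay` samples: a geometrically decaying response of unbounded length, which also illustrates why the relay power gain has to be controlled.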

Clickbait Detection via Large Language Models

Jun 16, 2023
Yi Zhu, Han Wang, Ye Wang, Yun Li, Yunhao Yuan, Jipeng Qiang

Clickbait, which lures users with surprising or even thrilling headlines to increase click-through rates, permeates almost all online content publishers, such as news portals and social media. Recently, Large Language Models (LLMs) have emerged as a powerful instrument and achieved tremendous success in a series of downstream NLP tasks. However, it is not yet known whether LLMs can serve as high-quality clickbait detection systems. In this paper, we analyze the performance of LLMs in few-shot scenarios on a number of English and Chinese benchmark datasets. Experimental results show that LLMs cannot match the best results of state-of-the-art deep learning and fine-tuned pre-trained language model (PLM) methods. Contrary to human intuition, the experiments demonstrate that LLMs cannot achieve satisfactory clickbait detection from headlines alone.

MCPI: Integrating Multimodal Data for Enhanced Prediction of Compound Protein Interactions

Jun 15, 2023
Li Zhang, Wenhao Li, Haotian Guan, Zhiquan He, Mingjun Cheng, Han Wang

The identification of compound-protein interactions (CPI) plays a critical role in drug screening, drug repurposing, and combination therapy studies. The effectiveness of CPI prediction relies heavily on the features extracted from both compounds and target proteins. While various prediction methods employ different feature combinations, both molecular-based and network-based models encounter the common obstacle of incomplete feature representations. Thus, a promising solution to this issue is to fully integrate all relevant CPI features. This study proposes a novel model named MCPI, which is designed to improve CPI prediction performance by integrating multiple sources of information, including the protein-protein interaction (PPI) network, the compound-compound interaction (CCI) network, and structural features of CPI. The results indicate that the MCPI model outperforms other existing methods for predicting CPI on public datasets. Furthermore, the study has practical implications for drug development, as the model was applied to search for potential inhibitors among FDA-approved drugs in response to the SARS-CoV-2 pandemic. The prediction results were then validated against the literature, suggesting that the MCPI model could be a useful tool for identifying potential drug candidates. Overall, this study has the potential to advance our understanding of CPI and guide drug development efforts.

* 12 pages, 9 figures 

Look Beneath the Surface: Exploiting Fundamental Symmetry for Sample-Efficient Offline RL

Jun 08, 2023
Peng Cheng, Xianyuan Zhan, Zhihao Wu, Wenjia Zhang, Shoucheng Song, Han Wang, Youfang Lin, Li Jiang

Offline reinforcement learning (RL) offers an appealing approach to real-world tasks by learning policies from pre-collected datasets without interacting with the environment. However, the performance of existing offline RL algorithms depends heavily on the scale and state-action space coverage of the datasets. Real-world data collection is often expensive and uncontrollable, leading to small, narrowly covered datasets and posing significant challenges for the practical deployment of offline RL. In this paper, we provide a new insight: leveraging the fundamental symmetry of system dynamics can substantially enhance offline RL performance with small datasets. Specifically, we propose a Time-reversal symmetry (T-symmetry) enforced Dynamics Model (TDM), which establishes consistency between a pair of forward and reverse latent dynamics. TDM provides both well-behaved representations for small datasets and a new reliability measure for out-of-distribution (OOD) samples based on compliance with T-symmetry. These can be readily used to construct a new offline RL algorithm (TSRL) with less conservative policy constraints and a reliable latent-space data augmentation procedure. Based on extensive experiments, we find that TSRL achieves strong performance on small benchmark datasets with as few as 1% of the original samples, significantly outperforming recent offline RL algorithms in data efficiency and generalizability.

* The first two authors contributed equally 
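
A minimal sketch of the T-symmetry idea, under the assumption that the forward model predicts the latent time derivative $\dot z \approx z' - z$ and the reverse model predicts $-\dot z$ from $z'$; T-symmetry then requires the two predictions to sum to zero. The models are passed in as plain Python functions (hypothetical stand-ins for TDM's learned networks and encoder), and the paper's exact losses and weighting are not reproduced.

```python
import numpy as np


def tdm_losses(z, a, z_next, forward_model, reverse_model):
    """Forward, reverse and T-symmetry consistency losses on latent transitions.

    forward_model(z, a)       should predict  z_dot = z_next - z
    reverse_model(z_next, a)  should predict -z_dot (time-reversed dynamics)
    T-symmetry asks the two predictions to cancel: forward + reverse = 0.
    """
    z_dot = z_next - z
    f = forward_model(z, a)
    g = reverse_model(z_next, a)
    fwd = np.mean(np.sum((f - z_dot) ** 2, axis=-1))
    rev = np.mean(np.sum((g + z_dot) ** 2, axis=-1))
    tsym = np.mean(np.sum((f + g) ** 2, axis=-1))
    return fwd, rev, tsym


def tsym_violation(z, a, z_next, forward_model, reverse_model):
    """Per-sample ||f + g||^2; a large value flags a transition as unreliable."""
    f = forward_model(z, a)
    g = reverse_model(z_next, a)
    return np.sum((f + g) ** 2, axis=-1)
```

`tsym_violation` mirrors the abstract's idea of scoring sample reliability by compliance with T-symmetry: transitions on which the forward and reverse dynamics disagree receive large values and can be treated with more caution.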

Evaluating GPT-3 Generated Explanations for Hateful Content Moderation

May 28, 2023
Han Wang, Ming Shan Hee, Md Rabiul Awal, Kenny Tsu Wei Choo, Roy Ka-Wei Lee

Recent research has focused on using large language models (LLMs) to generate explanations for hate speech through fine-tuning or prompting. Despite the growing interest in this area, the effectiveness and potential limitations of these generated explanations remain poorly understood. A key concern is that these LLM-generated explanations may lead both users and content moderators to erroneous judgments about the nature of flagged content. For instance, an LLM-generated explanation might inaccurately convince a content moderator that a benign piece of content is hateful. In light of this, we propose an analytical framework for examining hate speech explanations and conduct an extensive survey to evaluate such explanations. Specifically, we prompted GPT-3 to generate explanations for both hateful and non-hateful content, and conducted a survey with 2,400 unique respondents to evaluate the generated explanations. Our findings reveal that (1) human evaluators rated the GPT-generated explanations as high quality in terms of linguistic fluency, informativeness, persuasiveness, and logical soundness; (2) the persuasiveness of these explanations, however, varied depending on the prompting strategy employed; and (3) this persuasiveness may result in incorrect judgments about the hatefulness of the content. Our study underscores the need for caution in applying LLM-generated explanations to content moderation. Code and results are available at https://github.com/Social-AI-Studio/GPT3-HateEval.

* 9 pages, 2 figures, accepted by the International Joint Conference on Artificial Intelligence (IJCAI)

MMRDN: Consistent Representation for Multi-View Manipulation Relationship Detection in Object-Stacked Scenes

Apr 25, 2023
Han Wang, Jiayuan Zhang, Lipeng Wan, Xingyu Chen, Xuguang Lan, Nanning Zheng

Manipulation relationship detection (MRD) aims to guide a robot to grasp objects in the right order, which is important for ensuring the safety and reliability of grasping in object-stacked scenes. Previous works infer manipulation relationships with deep neural networks trained on data collected from a predefined view, which suffer from visual dislocation in unstructured environments. Multi-view data provide more comprehensive spatial information, but a key challenge of multi-view MRD is domain shift. In this paper, we propose a novel multi-view fusion framework, the multi-view MRD network (MMRDN), which is trained on 2D and 3D multi-view data. We project the 2D data from different views into a common hidden space and fit the embeddings with a set of von Mises-Fisher distributions to learn consistent representations. In addition, taking advantage of the position information within the 3D data, we select a set of $K$ Maximum Vertical Neighbors (KMVN) points from the point cloud of each object pair, which encodes the relative position of the two objects. Finally, the features of the multi-view 2D and 3D data are concatenated to predict the pairwise relationships of objects. Experimental results on the challenging REGRAD dataset show that MMRDN outperforms state-of-the-art methods in multi-view MRD tasks. The results also demonstrate that our model, trained on synthetic data, is capable of transferring to real-world scenarios.
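
For intuition on the von Mises-Fisher (vMF) step, the sketch below evaluates vMF log-densities for unit-normalized embeddings and assigns each embedding to its most likely component. It assumes 3-D embeddings, where the normalizer has the closed form $C(\kappa) = \kappa / (4\pi \sinh \kappa)$; general dimensions need a modified Bessel function. The component means and concentrations are hypothetical fixed values rather than parameters learned as in MMRDN.

```python
import numpy as np


def log_sinh(k):
    """Numerically stable log(sinh(k)) for k > 0."""
    return k + np.log1p(-np.exp(-2.0 * k)) - np.log(2.0)


def vmf_log_density_3d(x, mu, kappa):
    """log f(x; mu, kappa) on the unit sphere S^2 (3-D embeddings only).

    f = C(kappa) * exp(kappa * mu^T x), with C(kappa) = kappa / (4 pi sinh kappa).
    Inputs are L2-normalized first, i.e. projected onto the sphere.
    """
    x = x / np.linalg.norm(x, axis=-1, keepdims=True)
    mu = mu / np.linalg.norm(mu)
    log_c = np.log(kappa) - np.log(4.0 * np.pi) - log_sinh(kappa)
    return log_c + kappa * (x @ mu)


def assign_component(x, mus, kappas):
    """Index of the vMF component with the highest log-density for each x."""
    scores = np.stack(
        [vmf_log_density_3d(x, mu, k) for mu, k in zip(mus, kappas)], axis=-1
    )
    return np.argmax(scores, axis=-1)
```

Because the density depends on an embedding only through its cosine similarity to `mu`, fitting vMF components encourages embeddings of the same object from different views to cluster by direction on the sphere, which is one way to read "consistent representations" across views.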

Side Channel-Assisted Inference Leakage from Machine Learning-based ECG Classification

Apr 04, 2023
Jialin Liu, Ning Miao, Chongzhou Fang, Houman Homayoun, Han Wang

The electrocardiogram (ECG) measures the electrical activity generated by the heart to detect abnormal heartbeats and heart attacks. However, because such abnormalities occur irregularly, heartbeats must be monitored continuously. Machine learning techniques are leveraged to automate this task and reduce the labor needed for monitoring. In recent years, many companies have launched products with ECG monitoring and irregular heartbeat alerts. Among classification algorithms, the time series-based algorithm dynamic time warping (DTW) is widely adopted for ECG classification. Despite this progress, DTW-based ECG classification also introduces a new attack vector that can leak patients' diagnosis results. This paper shows that the labels of ECG input samples can be stolen via a side-channel attack, Flush+Reload. In particular, we first identify the vulnerability of DTW for ECG classification, i.e., the correlation between the warping path choice and the prediction results. We then implement an attack that leverages Flush+Reload to monitor the warping path selection with known ECG data and build a predictor that relates warping path selection to the labels of input ECG samples. In our experiments, the Flush+Reload-based inference leakage achieves an 84.0% attack success rate in identifying the labels of the two samples in DTW.
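
The leaked quantity is the DTW warping path, i.e., which predecessor cell wins the `min` at each step of the dynamic program. The textbook DTW below (absolute-difference cost, no window constraint) returns both the distance and the backtracked path; it is a sketch of the algorithm under attack, not the ECG classifier implementation studied in the paper, and it contains no attack code.

```python
import numpy as np


def dtw_with_path(x, y):
    """Dynamic time warping distance and warping path between 1-D sequences.

    D[i, j] = |x[i-1] - y[j-1]| + min(D[i-1, j-1], D[i-1, j], D[i, j-1])
    The path lists (i, j) index pairs (0-based) from (0, 0) to (n-1, m-1).
    """
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j - 1], D[i - 1, j], D[i, j - 1])

    # Backtrack from the end, always moving to the cheapest predecessor
    # (ties broken by tuple order: smaller i, then smaller j).
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (D[i - 1, j - 1], i - 1, j - 1),  # diagonal: match
            (D[i - 1, j], i - 1, j),          # vertical: repeat y[j-1]
            (D[i, j - 1], i, j - 1),          # horizontal: repeat x[i-1]
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return float(D[n, m]), path
```

For example, `dtw_with_path([0, 1, 2], [0, 1, 1, 2])` returns distance `0.0` with path `[(0, 0), (1, 1), (1, 2), (2, 3)]`. Because every cell selects one of three predecessors depending on the input values, the path is data-dependent; this is the dependence the paper exploits by monitoring path selection with Flush+Reload.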
