Huan Chen

Spectral-wise Implicit Neural Representation for Hyperspectral Image Reconstruction

Dec 02, 2023
Huan Chen, Wangcai Zhao, Tingfa Xu, Shiyun Zhou, Peifu Liu, Jianan Li

Coded Aperture Snapshot Spectral Imaging (CASSI) reconstruction aims to recover the 3D spatial-spectral signal from a 2D measurement. Existing methods for reconstructing Hyperspectral Images (HSIs) typically learn mappings from a 2D compressed image to a predetermined set of discrete spectral bands, overlooking the inherent continuity of the spectral information. In this study, we propose Spectral-wise Implicit Neural Representation (SINR) as a pioneering step toward addressing this limitation. SINR introduces a continuous spectral amplification process for HSI reconstruction, enabling spectral super-resolution with customizable magnification factors. To achieve this, we leverage the concept of implicit neural representation. Specifically, our approach introduces a spectral-wise attention mechanism that treats individual channels as distinct tokens, thereby capturing global spectral dependencies. It further incorporates two components: a Fourier coordinate encoder, which enhances SINR's ability to emphasize high-frequency components, and a spectral scale factor module, which guides SINR to adapt to a variable number of spectral channels. Notably, the SINR framework enhances the flexibility of CASSI reconstruction by accommodating an unlimited number of spectral bands in the desired output. Extensive experiments demonstrate that SINR outperforms baseline methods. By enabling continuous reconstruction within the CASSI framework, we take the first stride toward integrating implicit neural representation into the field.
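
To illustrate the Fourier coordinate encoder idea, the sketch below (our own minimal example, not the paper's implementation; the name `fourier_encode` and the power-of-two frequency schedule are assumptions) maps a continuous spectral coordinate to sin/cos features so a coordinate-based network can represent sharp spectral detail:

```python
import math

def fourier_encode(coord, num_freqs=4):
    """Map a normalized spectral coordinate in [0, 1] to Fourier features.

    Higher-frequency sin/cos terms help a coordinate-based network
    emphasize high-frequency spectral components.
    """
    feats = []
    for k in range(num_freqs):
        freq = (2.0 ** k) * math.pi
        feats.append(math.sin(freq * coord))
        feats.append(math.cos(freq * coord))
    return feats

# Because coordinates are continuous, the network can be queried at
# arbitrary (non-grid) spectral positions, which is what enables
# super-resolution with a customizable magnification factor.
features = fourier_encode(0.5, num_freqs=3)
```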

* Accepted by IEEE Transactions on Circuits and Systems for Video Technology, to be published 

Spectrum-driven Mixed-frequency Network for Hyperspectral Salient Object Detection

Dec 02, 2023
Peifu Liu, Tingfa Xu, Huan Chen, Shiyun Zhou, Haolin Qin, Jianan Li

Hyperspectral salient object detection (HSOD) aims to detect spectrally salient objects in hyperspectral images (HSIs). However, existing methods make inadequate use of spectral information, either converting HSIs into false-color images or combining neural networks with clustering. We propose a novel approach that fully leverages the spectral characteristics by extracting two distinct frequency components from the spectrum: low-frequency Spectral Saliency and high-frequency Spectral Edge. The Spectral Saliency approximates the region of salient objects, while the Spectral Edge captures their edge information. These two complementary components, crucial for HSOD, are derived from the inter-layer spectral angular distance of a Gaussian pyramid and the intra-neighborhood spectral angular gradients, respectively. To effectively utilize this dual-frequency information, we introduce a novel lightweight Spectrum-driven Mixed-frequency Network (SMN). SMN incorporates two parameter-free plug-and-play operators, the Spectral Saliency Generator and the Spectral Edge Operator, to independently extract the Spectral Saliency and Spectral Edge components from the input HSI. Subsequently, the Mixed-frequency Attention module, comprising two frequency-dependent heads, intelligently combines the embedded edge and saliency features into a mixed-frequency representation. Furthermore, a saliency-edge-aware decoder progressively scales up the mixed-frequency feature while preserving rich detail and saliency information for accurate salient object prediction. Extensive experiments on the HS-SOD benchmark and our custom dataset HSOD-BIT demonstrate that SMN outperforms state-of-the-art methods in HSOD performance. Code and dataset will be available at https://github.com/laprf/SMN.
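
Both the inter-layer and intra-neighborhood comparisons rest on the spectral angular distance between two spectra, i.e. the angle between them viewed as vectors. A minimal stdlib sketch (our own illustration, not the authors' code):

```python
import math

def spectral_angle(a, b):
    """Spectral angular distance (radians) between two spectra."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    # Clamp for numerical safety before acos.
    cos_t = max(-1.0, min(1.0, dot / (na * nb)))
    return math.acos(cos_t)
```

Comparing a pixel's spectrum across two Gaussian-pyramid layers gives a low-frequency saliency cue; comparing it against its spatial neighbors gives a high-frequency edge cue. Note the distance is invariant to spectral magnitude: proportional spectra have angle zero.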

* Accepted by IEEE Transactions on Multimedia, to be published 

Reading Order Matters: Information Extraction from Visually-rich Documents by Token Path Prediction

Oct 17, 2023
Chong Zhang, Ya Guo, Yi Tu, Huan Chen, Jinyang Tang, Huijia Zhu, Qi Zhang, Tao Gui

Recent advances in multimodal pre-trained models have significantly improved information extraction from visually-rich documents (VrDs), in which named entity recognition (NER) is treated as a sequence-labeling task of predicting BIO entity tags for tokens, following the typical NLP setting. However, the BIO-tagging scheme relies on the correct order of model inputs, which is not guaranteed in real-world NER on scanned VrDs, where text is recognized and arranged by OCR systems. This reading-order issue hinders the accurate marking of entities under the BIO-tagging scheme, making it impossible for sequence-labeling methods to predict correct named entities. To address it, we introduce Token Path Prediction (TPP), a simple prediction head that predicts entity mentions as token sequences within documents. As an alternative to token classification, TPP models the document layout as a complete directed graph over tokens and predicts token paths within the graph as entities. For better evaluation of VrD-NER systems, we also propose two revised benchmark datasets of NER on scanned documents that reflect real-world scenarios. Experimental results demonstrate the effectiveness of our method and suggest its potential as a universal solution to various information extraction tasks on documents.
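
To give a flavor of path-style decoding over a token graph, here is a simplified greedy sketch (our own illustration, not the paper's decoder; the thresholding and best-successor rule are assumptions). Given a matrix of learned token-pair link scores, it recovers entities as token paths regardless of the input token order:

```python
def decode_paths(link_scores, threshold=0.5):
    """Greedily follow each token's best-scoring outgoing link to
    recover entity mentions as token paths."""
    n = len(link_scores)
    succ = {}
    has_incoming = set()
    for i in range(n):
        j = max(range(n), key=lambda j: link_scores[i][j])
        if link_scores[i][j] >= threshold and j != i:
            succ[i] = j           # best successor of token i
            has_incoming.add(j)   # j is path-internal, not a path start
    paths, visited = [], set()
    for start in range(n):
        if start in has_incoming or start in visited:
            continue
        path, cur = [start], start
        visited.add(start)
        while cur in succ and succ[cur] not in visited:
            cur = succ[cur]
            visited.add(cur)
            path.append(cur)
        paths.append(path)
    return paths
```

For example, with strong links 0→2 and 2→1 and no confident link from token 3, the decoder returns the paths `[0, 2, 1]` and `[3]` even though tokens 0, 2, 1 are not contiguous in input order.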

* Accepted as a long paper in the main conference of EMNLP 2023 

Empowering recommender systems using automatically generated Knowledge Graphs and Reinforcement Learning

Jul 11, 2023
Ghanshyam Verma, Shovon Sengupta, Simon Simanta, Huan Chen, Janos A. Perge, Devishree Pillai, John P. McCrae, Paul Buitelaar

Personalized recommendations are of growing importance in direct marketing, which motivates research into enhancing customer experiences through knowledge graph (KG) applications. For example, in financial services, companies may benefit from providing relevant financial articles to their customers to cultivate relationships, foster client engagement, and promote informed financial decisions. While several approaches center on KG-based recommender systems for improved content, in this study we focus on interpretable KG-based recommender systems for decision making. To this end, we present two knowledge-graph-based approaches for personalized article recommendations for a set of customers of a large multinational financial services company. The first approach employs Reinforcement Learning and the second uses the XGBoost algorithm for recommending articles to customers. Both approaches make use of a KG generated from both structured (tabular) and unstructured (a large body of text) data. With the Reinforcement Learning-based recommender system, we can leverage the graph traversal path leading to a recommendation to generate interpretations (Path Directed Reasoning, PDR). In the XGBoost-based approach, one can also provide explainable results using post-hoc methods such as SHAP (SHapley Additive exPlanations) and ELI5 (Explain Like I am Five). Importantly, our approach offers explainable results, promoting better decision-making. This study underscores the potential of combining advanced machine learning techniques with KG-driven insights to bolster the customer relationship management experience.
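
For intuition on the SHAP-style explanations mentioned above, the brute-force sketch below computes exact Shapley values for a toy model by enumerating all feature coalitions (our own illustration; this is tractable only for a handful of features, and libraries like SHAP approximate it efficiently for real models such as XGBoost):

```python
from itertools import combinations
from math import factorial

def shapley_values(features, value_fn):
    """Exact Shapley values: each feature's weighted average marginal
    contribution over all coalitions of the other features.

    value_fn maps a (frozen)set of feature names to a model score.
    """
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for r in range(n):
            for coal in combinations(others, r):
                s = frozenset(coal)
                # Standard Shapley weight |S|! (n - |S| - 1)! / n!
                weight = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
                total += weight * (value_fn(s | {f}) - value_fn(s))
        phi[f] = total
    return phi
```

For a purely additive model the Shapley value of each feature is exactly its own contribution, which makes a convenient sanity check.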

* Accepted at KDD (OARS) 2023 [https://oars-workshop.github.io/] 

LayoutMask: Enhance Text-Layout Interaction in Multi-modal Pre-training for Document Understanding

Jun 09, 2023
Yi Tu, Ya Guo, Huan Chen, Jinyang Tang

Visually-rich Document Understanding (VrDU) has attracted much research attention over the past years. Models pre-trained on large numbers of document images with transformer-based backbones have led to significant performance gains in this field. The major challenge is how to fuse the different modalities (text, layout, and image) of documents in a unified model with different pre-training tasks. This paper focuses on improving text-layout interactions and proposes a novel multi-modal pre-training model, LayoutMask. LayoutMask uses local 1D position, instead of global 1D position, as layout input and has two pre-training objectives: (1) Masked Language Modeling, predicting masked tokens with two novel masking strategies; (2) Masked Position Modeling, predicting masked 2D positions to improve layout representation learning. LayoutMask can enhance the interactions between text and layout modalities in a unified model and produce adaptive and robust multi-modal representations for downstream tasks. Experimental results show that our proposed method achieves state-of-the-art results on a wide variety of VrDU problems, including form understanding, receipt understanding, and document image classification.
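
The local-vs-global 1D position distinction can be illustrated with a toy sketch (our own example; the segment granularity and function name are assumptions, not the paper's code):

```python
def position_ids(segments, local=True):
    """Assign 1D position ids to tokens grouped into text segments.

    Global: one running counter across the whole document, which bakes
    in a fixed reading order. Local: the counter restarts in every
    segment, so cross-segment order must come from 2D layout instead.
    """
    ids = []
    counter = 0
    for seg in segments:
        if local:
            counter = 0  # restart per segment
        for _tok in seg:
            ids.append(counter)
            counter += 1
    return ids
```

With segments `[["a", "b"], ["c", "d", "e"]]`, local ids are `[0, 1, 0, 1, 2]` while global ids are `[0, 1, 2, 3, 4]`.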

* Accepted by ACL 2023 main conference 

Cultural-aware Machine Learning based Analysis of COVID-19 Vaccine Hesitancy

Apr 14, 2023
Raed Alharbi, Sylvia Chan-Olmsted, Huan Chen, My T. Thai

Understanding COVID-19 vaccine hesitancy, such as who hesitates and why, is crucial since large-scale vaccine adoption remains one of the most efficient methods of controlling the pandemic. Such an understanding also provides insights into designing successful vaccination campaigns for future pandemics. Unfortunately, many factors are involved in the decision to take the vaccine, especially from a cultural point of view. To this end, we design a novel culture-aware machine learning (ML) model, based on our new data collection, for predicting vaccination willingness. We further analyze the features that contribute most to the ML model's predictions using advanced AI explainers such as the Probabilistic Graphical Model (PGM) and Shapley Additive Explanations (SHAP). These analyses reveal the key factors that most likely impact vaccine adoption decisions. Our findings show that Hispanic and African American communities are most influenced by cultural characteristics such as religion and ethnic affiliation, whereas vaccine trust and approval influence Asian communities the most. Our results also show that cultural characteristics, rumors, and political affiliation are associated with increased vaccine rejection.

* 6 pages, 5 figures 

NEURAL MARIONETTE: A Transformer-based Multi-action Human Motion Synthesis System

Sep 27, 2022
Weiqiang Wang, Xuefei Zhe, Huan Chen, Di Kang, Tingguang Li, Ruizhi Chen, Linchao Bao

We present a neural network-based system for long-term, multi-action human motion synthesis. The system, dubbed NEURAL MARIONETTE, can produce high-quality, meaningful motions with smooth transitions from simple user input: a sequence of action tags with expected action durations, and optionally a hand-drawn moving trajectory if the user specifies one. The core of our system is a novel Transformer-based motion generation model, MARIONET, which can generate diverse motions given action tags. Unlike existing motion generation models, MARIONET utilizes contextual information from the past motion clip and the future action tag, and is dedicated to generating actions that smoothly blend historical and future actions. Specifically, MARIONET first encodes the target action tag and contextual information into an action-level latent code. The code is unfolded into frame-level control signals via a time-unrolling module and can then be combined with other frame-level control signals such as the target trajectory. Motion frames are then generated auto-regressively. By applying MARIONET sequentially, NEURAL MARIONETTE can robustly generate long-term, multi-action motions with the help of two simple schemes, "Shadow Start" and "Action Revision". Along with the novel system, we also present a new dataset dedicated to the multi-action motion synthesis task, which contains both action tags and their contextual information. Extensive experiments are conducted to study the action accuracy, naturalness, and transition smoothness of the motions generated by our system.

Toward An Optimal Selection of Dialogue Strategies: A Target-Driven Approach for Intelligent Outbound Robots

Jun 22, 2022
Ruifeng Qian, Shijie Li, Mengjiao Bao, Huan Chen, Yu Che

With the growth of the economy and society, enterprises, especially in the FinTech industry, have increasing demand for outbound calls to customers, such as debt collection, marketing, and anti-fraud calls. Because a large amount of repetitive, mechanical work occupies most of human agents' time, the cost of equipment and labor for enterprises keeps increasing. At the same time, with the development of artificial intelligence technology over the past few decades, it has become quite common for companies to use technologies such as Big Data and artificial intelligence to empower outbound call businesses. The intelligent outbound robot is a typical application of artificial intelligence in the field of outbound calls. It is mainly used to communicate with customers in order to accomplish a certain target, and its low cost, high reusability, and easy compliance have attracted growing attention from the industry. At present, there are two kinds of intelligent outbound robots in the industry, but both still leave large room for improvement. One kind is based on a finite state machine and relies on manually configured jump conditions and corresponding nodes; it is also called a flow-based robot. For example, the schematic diagram of the working model of a flow-based robot for debt collection is shown in Fig.\ref{fig:label}. In each round, the robot replies to the user with the words corresponding to each node.
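
A flow-based robot of this kind is essentially a finite state machine. The toy sketch below (our own illustration, with invented node names, intents, and wording) shows a minimal debt-collection flow in which each recognized user intent triggers a jump and each node carries scripted words:

```python
# Each node holds scripted reply words plus manually configured jump
# conditions: recognized user intent -> next node.
FLOW = {
    "greet":    {"reply": "Hello, this is a reminder about your payment.",
                 "next": {"will_pay": "confirm", "refuse": "persuade"}},
    "persuade": {"reply": "A timely payment avoids extra fees.",
                 "next": {"will_pay": "confirm", "refuse": "end"}},
    "confirm":  {"reply": "Thank you, we will note your promise to pay.",
                 "next": {}},
    "end":      {"reply": "Goodbye.", "next": {}},
}

def run_dialog(intents, start="greet"):
    """Replay a sequence of recognized user intents through the flow,
    collecting the robot's reply for each round."""
    node = start
    replies = [FLOW[node]["reply"]]
    for intent in intents:
        node = FLOW[node]["next"].get(intent)
        if node is None:  # no configured jump for this intent
            break
        replies.append(FLOW[node]["reply"])
    return replies
```

The limitation the paper targets is visible here: every jump condition must be hand-configured, so the robot cannot adapt its strategy toward the call's target.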

Improving Conversational Recommendation System by Pretraining on Billions Scale of Knowledge Graph

Apr 30, 2021
Chi-Man Wong, Fan Feng, Wen Zhang, Chi-Man Vong, Hui Chen, Yichi Zhang, Peng He, Huan Chen, Kun Zhao, Huajun Chen

Conversational Recommender Systems (CRSs) in e-commerce platforms aim to recommend items to users via multiple conversational interactions. Click-through rate (CTR) prediction models are commonly used to rank candidate items. However, most CRSs suffer from data scarcity and sparseness. To address this issue, we propose a novel knowledge-enhanced deep cross network (K-DCN), a two-step (pretrain and fine-tune) CTR prediction model for recommending items. We first construct a billion-scale conversation knowledge graph (CKG) from information about users, items, and conversations, and then pretrain the CKG using a knowledge graph embedding method and a graph convolutional network to encode semantic and structural information, respectively. To make the CTR prediction model aware of the current state of users and the relationship between dialogues and items, we introduce user-state and dialogue-interaction representations based on the pre-trained CKG and propose K-DCN. In K-DCN, we fuse the user-state representation, the dialogue-interaction representation, and other normal feature representations via a deep cross network, which produces the ranking of candidate items to be recommended. We experimentally show that our proposal significantly outperforms baselines and demonstrate its real-world application in Alime.
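
For intuition, one layer of a deep cross network computes x_{l+1} = x_0 (w . x_l) + b + x_l, crossing every input feature with a learned projection of the current layer. A minimal sketch (our own illustration, not the K-DCN code):

```python
def cross_layer(x0, x, w, b):
    """One cross-network layer: x_{l+1} = x0 * (w . x_l) + b + x_l.

    Explicit feature crossing lets a CTR model cheaply combine, e.g.,
    a user-state feature with a dialogue-interaction feature.
    """
    s = sum(wi * xi for wi, xi in zip(w, x))  # scalar projection w . x_l
    return [x0i * s + bi + xi for x0i, bi, xi in zip(x0, b, x)]
```

Stacking such layers raises the degree of feature interaction by one per layer while adding only O(d) parameters each, which is why cross networks suit large-scale CTR ranking.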

* Paper is accepted by ICDE2021 industry track 

An Emotion-controlled Dialog Response Generation Model with Dynamic Vocabulary

Mar 04, 2021
Shuangyong Song, Kexin Wang, Chao Wang, Haiqing Chen, Huan Chen

In the response generation task, appropriate emotional expressions can clearly make responses more human-like. However, real-world online systems require high QPS (queries per second, an indicator of the flow capacity of online systems), and a dynamic vocabulary mechanism has been shown to improve the speed of generative models. In this paper, we propose an emotion-controlled dialog response generation model based on the dynamic vocabulary mechanism, and the experimental results show the benefit of this model.
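
The speed benefit of a dynamic vocabulary comes from scoring only a small, per-query subset of words instead of the full vocabulary at every decoding step. A minimal sketch (our own illustration, with invented example words):

```python
import math

def dynamic_softmax(logits, dynamic_vocab):
    """Softmax restricted to a per-query dynamic vocabulary.

    logits: dict mapping every word to its score.
    dynamic_vocab: the small subset of words relevant to this query;
    only these are exponentiated and normalized, so the per-step cost
    scales with the subset size, not the full vocabulary size.
    """
    sub = {w: logits[w] for w in dynamic_vocab}
    m = max(sub.values())  # subtract max for numerical stability
    exps = {w: math.exp(v - m) for w, v in sub.items()}
    z = sum(exps.values())
    return {w: e / z for w, e in exps.items()}
```

An emotion-controlled variant could select the dynamic vocabulary conditioned on the target emotion, so that only words consistent with the desired sentiment are candidates at each step.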

* 4 pages 