Jiaqi Wu

Generating Unbiased Pseudo-labels via a Theoretically Guaranteed Chebyshev Constraint to Unify Semi-supervised Classification and Regression

Nov 03, 2023
Jiaqi Wu, Junbiao Pang, Qingming Huang

Both semi-supervised classification and regression are practically challenging tasks for computer vision. However, semi-supervised classification methods are rarely applied to regression tasks, because the threshold-to-pseudo-label (T2L) process in classification uses confidence to determine label quality. This is successful for classification tasks but inefficient for regression tasks. By nature, regression also requires unbiased methods to generate high-quality labels. On the other hand, T2L for classification often fails if the confidence is generated by a biased method. To address this issue, we propose a theoretically guaranteed constraint for generating unbiased labels based on Chebyshev's inequality, combining multiple predictions to generate superior-quality labels from several inferior ones. Because it yields high-quality labels, the unbiased method naturally avoids the drawback of T2L. Specifically, we propose an Unbiased Pseudo-labels network (UBPL network) with multiple branches that combines multiple predictions into pseudo-labels, where a Feature Decorrelation loss (FD loss) is proposed based on the Chebyshev constraint. In principle, our method can be used for both classification and regression and can be easily extended to any semi-supervised framework, e.g., Mean Teacher, FixMatch, or DualPose. Our approach achieves superior performance over SOTA methods on the pose estimation datasets Mouse, FLIC, and LSP, as well as the classification datasets CIFAR10/100 and SVHN.
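
The role Chebyshev's inequality plays when several predictions are combined can be illustrated with a minimal sketch. This is not the authors' UBPL network or FD loss. It assumes K unbiased, pairwise-uncorrelated predictions with a common variance, for which the mean satisfies P(|mean - truth| >= eps) <= variance / (K * eps^2). The function names and the Gaussian noise model are hypothetical choices made for illustration only.

```python
import random
import statistics


def chebyshev_bound(variance: float, k: int, eps: float) -> float:
    """Chebyshev upper bound on P(|mean of k predictions - truth| >= eps).

    Assumption: the k predictions are unbiased and pairwise uncorrelated
    with a common variance, so the variance of their mean is variance / k.
    """
    return min(1.0, variance / (k * eps ** 2))


def combine_predictions(predictions):
    """Combine several predictions into one pseudo-label by averaging."""
    return statistics.fmean(predictions)


def empirical_failure_rate(truth, sigma, k, eps, trials=2000, seed=0):
    """Fraction of trials in which the averaged prediction misses by >= eps.

    Assumption: each prediction is the truth plus independent Gaussian noise.
    """
    rng = random.Random(seed)
    failures = 0
    for _ in range(trials):
        preds = [rng.gauss(truth, sigma) for _ in range(k)]
        if abs(combine_predictions(preds) - truth) >= eps:
            failures += 1
    return failures / trials


if __name__ == "__main__":
    for k in (1, 4, 16):
        bound = chebyshev_bound(variance=1.0, k=k, eps=1.0)
        rate = empirical_failure_rate(truth=0.0, sigma=1.0, k=k, eps=1.0)
        print(f"k={k:2d}  bound={bound:.3f}  empirical={rate:.3f}")
```

Under these assumptions the bound shrinks in proportion to 1/K, which is why decorrelating the branches matters: correlated predictions would not reduce the variance of the mean as quickly.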

Modeling the Uncertainty with Maximum Discrepant Students for Semi-supervised 2D Pose Estimation

Nov 03, 2023
Jiaqi Wu, Junbiao Pang, Qingming Huang

Semi-supervised pose estimation is a practically challenging task for computer vision. Although numerous excellent semi-supervised classification methods have emerged, these methods typically use confidence to evaluate the quality of pseudo-labels, which is difficult to achieve in pose estimation tasks. For example, in pose estimation, confidence represents only the possibility that a position of the heatmap is a keypoint, not the quality of that prediction. In this paper, we propose a simple yet efficient framework to estimate the quality of pseudo-labels in semi-supervised pose estimation tasks from the perspective of modeling the uncertainty of the pseudo-labels. Concretely, under the dual mean-teacher framework, we construct two maximum discrepant students (MDSs) to effectively push the two teachers to generate different decision boundaries for the same sample. Moreover, we create multiple uncertainties to assess the quality of the pseudo-labels. Experimental results demonstrate that our method improves the performance of semi-supervised pose estimation on three datasets.
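
One simple way to turn the disagreement between two teachers into an uncertainty signal is sketched below. It is an illustration only, not the paper's uncertainty measures: it assumes each teacher outputs one (x, y) coordinate per keypoint, uses the Euclidean distance between the two predictions as the uncertainty, and keeps the averaged keypoint only when that distance is below a hypothetical threshold `max_distance`.

```python
import math


def keypoint_disagreement(pred_a, pred_b):
    """Euclidean distance between corresponding keypoints of two teachers."""
    return [math.dist(a, b) for a, b in zip(pred_a, pred_b)]


def select_pseudo_labels(pred_a, pred_b, max_distance):
    """Keep keypoints on which the two teachers agree.

    Returns a dict mapping keypoint index to the averaged (x, y) position.
    Keypoints whose disagreement exceeds max_distance are treated as too
    uncertain and are left out (assumed selection rule).
    """
    kept = {}
    for idx, (a, b) in enumerate(zip(pred_a, pred_b)):
        if math.dist(a, b) <= max_distance:
            kept[idx] = ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)
    return kept


if __name__ == "__main__":
    teacher_a = [(10, 10), (20, 20)]
    teacher_b = [(10, 12), (30, 20)]
    print(keypoint_disagreement(teacher_a, teacher_b))
    print(select_pseudo_labels(teacher_a, teacher_b, max_distance=3.0))
```

Unlike heatmap confidence, this signal depends on how far apart two independently trained models land, so it reflects the quality of the prediction rather than the peak value of a single heatmap.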

Ghost-free High Dynamic Range Imaging via Hybrid CNN-Transformer and Structure Tensor

Dec 01, 2022
Yu Yuan, Jiaqi Wu, Zhongliang Jing, Henry Leung, Han Pan

Eliminating ghosting artifacts due to moving objects is a challenging problem in high dynamic range (HDR) imaging. In this letter, we present a hybrid model consisting of a convolutional encoder and a Transformer decoder to generate ghost-free HDR images. In the encoder, a context aggregation network and a non-local attention block are adopted to optimize multi-scale features and capture both global and local dependencies of multiple low dynamic range (LDR) images. The decoder, based on the Swin Transformer, is used to improve the reconstruction capability of the proposed model. Motivated by the marked difference in the structure tensor (ST) field between regions with and without artifacts, we integrate the ST information of the LDR images as auxiliary inputs to the network and use an ST loss to further constrain artifacts. Unlike previous approaches, our network is capable of processing an arbitrary number of input LDR images. Qualitative and quantitative experiments demonstrate the effectiveness of the proposed method by comparing it with existing state-of-the-art HDR deghosting models. Code is available at https://github.com/pandayuanyu/HSTHdr.
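
For readers unfamiliar with the structure tensor, the sketch below computes it at a single pixel of a grayscale image. It is a generic textbook construction, not the code from the linked repository, and it does not reproduce the paper's ST loss. The choices here are assumptions: central-difference gradients, a square box window instead of Gaussian weighting, and clamped borders.

```python
def structure_tensor(image, row, col, radius=1):
    """Return (Jxx, Jxy, Jyy) accumulated over a window centred at (row, col).

    image is a list of equally long rows of grayscale values. Gradients use
    central differences; at the border the neighbour index is clamped, which
    yields a halved one-sided difference there.
    """
    height, width = len(image), len(image[0])

    def gradient(r, c):
        r0, r1 = max(r - 1, 0), min(r + 1, height - 1)
        c0, c1 = max(c - 1, 0), min(c + 1, width - 1)
        gx = (image[r][c1] - image[r][c0]) / 2.0  # horizontal derivative
        gy = (image[r1][c] - image[r0][c]) / 2.0  # vertical derivative
        return gx, gy

    jxx = jxy = jyy = 0.0
    for r in range(max(row - radius, 0), min(row + radius, height - 1) + 1):
        for c in range(max(col - radius, 0), min(col + radius, width - 1) + 1):
            gx, gy = gradient(r, c)
            jxx += gx * gx
            jxy += gx * gy
            jyy += gy * gy
    return jxx, jxy, jyy


if __name__ == "__main__":
    # A vertical edge: intensity changes only along the horizontal axis.
    edge = [[0, 0, 1, 1] for _ in range(4)]
    print(structure_tensor(edge, 1, 1))
```

For the vertical edge, only the Jxx component is non-zero, reflecting a single dominant gradient direction. Local structure that departs from that of the reference scene changes these components, which is the kind of signal an ST-based input or loss can exploit.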

Learning to Kindle the Starlight

Nov 16, 2022
Yu Yuan, Jiaqi Wu, Lindong Wang, Zhongliang Jing, Henry Leung, Shuyuan Zhu, Han Pan

Capturing high-quality star field images is extremely challenging due to light pollution, the need for specialized hardware, and the high level of photographic skill required. Deep learning-based techniques have achieved remarkable results in low-light image enhancement (LLIE) but have not been widely applied to star field image enhancement due to the lack of training data. To address this problem, we construct the first Star Field Image Enhancement Benchmark (SFIEB), which contains 355 real-shot and 854 semi-synthetic star field images, all with corresponding reference images. Using the presented dataset, we propose the first star field image enhancement approach, namely StarDiffusion, based on conditional denoising diffusion probabilistic models (DDPM). We introduce dynamic stochastic corruptions to the inputs of the conditional DDPM to improve the performance and generalization of the network on our small-scale dataset. Experiments show promising results for our method, which outperforms state-of-the-art low-light image enhancement algorithms. The dataset and code will be open-sourced.

Multimodal Image Fusion based on Hybrid CNN-Transformer and Non-local Cross-modal Attention

Oct 18, 2022
Yu Yuan, Jiaqi Wu, Zhongliang Jing, Henry Leung, Han Pan

The fusion of images taken by heterogeneous sensors helps to enrich the information and improve the quality of imaging. In this article, we present a hybrid model consisting of a convolutional encoder and a Transformer-based decoder to fuse multimodal images. In the encoder, a non-local cross-modal attention block is proposed to capture both local and global dependencies of multiple source images. A branch fusion module is designed to adaptively fuse the features of the two branches. We embed a Transformer module with linear complexity in the decoder to enhance the reconstruction capability of the proposed network. Qualitative and quantitative experiments demonstrate the effectiveness of the proposed method by comparing it with existing state-of-the-art fusion models. The source code of our work is available at https://github.com/pandayuanyu/HCFusion.

Privacy Information Classification: A Hybrid Approach

Jan 27, 2021
Jiaqi Wu, Weihua Li, Quan Bai, Takayuki Ito, Ahmed Moustafa

A large amount of information is published to online social networks (OSNs) every day. Individual privacy-related information may also be disclosed unconsciously by end-users. Identifying privacy-related data and protecting OSN users from privacy leakage have therefore become significant. With this motivation, this study proposes and develops a hybrid privacy classification approach to detect and classify privacy information from OSNs. The proposed hybrid approach employs both deep learning models and ontology-based models for privacy-related information extraction. Extensive experiments are conducted to validate the proposed hybrid approach, and the empirical results demonstrate its superiority in protecting OSN users against privacy leakage.

* IJCAI 2019 Workshop. The 4th International Workshop on Smart Simulation and Modelling for Complex Systems 
Athena: Constructing Dialogues Dynamically with Discourse Constraints

Nov 21, 2020
Vrindavan Harrison, Juraj Juraska, Wen Cui, Lena Reed, Kevin K. Bowden, Jiaqi Wu, Brian Schwarzmann, Abteen Ebrahimi, Rishi Rajasekaran, Nikhil Varghese, Max Wechsler-Azen, Steve Whittaker, Jeffrey Flanigan, Marilyn Walker

This report describes Athena, a dialogue system for spoken conversation on popular topics and current events. We develop a flexible topic-agnostic approach to dialogue management that dynamically configures dialogue based on general principles of entity and topic coherence. Athena's dialogue manager uses a contract-based method where discourse constraints are dispatched to clusters of response generators. This allows Athena to procure responses from dynamic sources, such as knowledge graph traversals and feature-based on-the-fly response retrieval methods. After describing the dialogue system architecture, we perform an analysis of conversations that Athena participated in during the 2019 Alexa Prize Competition. We conclude with a report on several user studies we carried out to better understand how individual user characteristics affect system ratings.

* 3rd Proceedings of Alexa Prize (Alexa Prize 2019) 
Entertaining and Opinionated but Too Controlling: A Large-Scale User Study of an Open Domain Alexa Prize System

Aug 13, 2019
Kevin K. Bowden, Jiaqi Wu, Wen Cui, Juraj Juraska, Vrindavan Harrison, Brian Schwarzmann, Nicholas Santer, Steve Whittaker, Marilyn Walker

Conversational systems typically focus on functional tasks such as scheduling appointments or creating to-do lists. Instead, we design and evaluate SlugBot (SB), one of 8 semifinalists in the 2018 Alexa Prize, whose goal is to support casual open-domain social interaction. This novel application requires both broad topic coverage and engaging interactive skills. We developed a new technical approach to meet this demanding situation by crowd-sourcing novel content and introducing playful conversational strategies based on storytelling and games. We collected over 10,000 conversations during August 2018 as part of the Alexa Prize competition. We also conducted an in-lab follow-up qualitative evaluation. Overall, users found SB moderately engaging; conversations averaged 3.6 minutes and involved 26 user turns. However, users reacted very differently to different conversation subtypes. Storytelling and games were evaluated positively; these were seen as entertaining with predictable interactive structure. They also led users to impute personality and intelligence to SB. In contrast, search and general Chit-Chat induced coverage problems; here users found it hard to infer what topics SB could understand, with these conversations seen as being too system-driven. Theoretical and design implications suggest a move away from conversational systems that simply provide factual information. Future systems should be designed to have their own opinions with personal stories to share, and SB provides an example of how we might achieve this.

* To appear in 1st International Conference on Conversational User Interfaces (CUI 2019) 
SlugBot: Developing a Computational Model and Framework of a Novel Dialogue Genre

Jul 22, 2019
Kevin K. Bowden, Jiaqi Wu, Wen Cui, Juraj Juraska, Vrindavan Harrison, Brian Schwarzmann, Nick Santer, Marilyn Walker

One of the most interesting aspects of the Amazon Alexa Prize competition is that the framing of the competition requires the development of new computational models of dialogue and its structure. Traditional computational models of dialogue are of two types: (1) task-oriented dialogue, supported by AI planning models, or simplified planning models consisting of frames with slots to be filled; or (2) search-oriented dialogue, where every user turn is treated as a search query that may elaborate and extend current search results. Alexa Prize dialogue systems such as SlugBot must support conversational capabilities that go beyond what these traditional models can do. Moreover, while traditional dialogue systems rely on theoretical computational models, there are no existing computational theories that circumscribe the expected system and user behaviors in the intended conversational genre of the Alexa Prize bots. This paper describes how UCSC's SlugBot team has combined the development of a novel computational theoretical model, the Discourse Relation Dialogue Model, with its implementation in a modular system in order to test and refine it. We highlight how our novel dialogue model has led us to create a novel ontological resource, UniSlug, and how the structure of UniSlug determines how we curate and structure content so that our dialogue manager implements and tests our novel computational dialogue model.

* arXiv admin note: text overlap with arXiv:1801.01531 
Implicit Discourse Relation Identification for Open-domain Dialogues

Jul 09, 2019
Mingyu Derek Ma, Kevin K. Bowden, Jiaqi Wu, Wen Cui, Marilyn Walker

Discourse relation identification has been an active area of research for many years, and the challenge of identifying implicit relations remains largely unsolved, especially in the context of an open-domain dialogue system. Previous work primarily relies on corpora of formal text that is inherently non-dialogic, i.e., news and journals. Such data, however, cannot capture the nuances of informal dialogue, nor can it cover the plethora of valid topics present in open-domain dialogue. In this paper, we design a novel discourse relation identification pipeline specifically tuned for open-domain dialogue systems. We first propose a method to automatically extract the implicit discourse relation argument pairs and labels from a dataset of dialogic turns, resulting in a novel corpus of discourse relation pairs, the first of its kind to attempt to identify the discourse relations connecting the dialogic turns in open-domain discourse. Moreover, we take the first steps to leverage the dialogue features unique to our task to further improve the identification of such relations by performing feature ablation and incorporating dialogue features to enhance the state-of-the-art model.

* To appear in Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL2019) 