Yaping Huang

Cost-efficient Crowdsourcing for Span-based Sequence Labeling: Worker Selection and Data Augmentation

May 11, 2023
Yujie Wang, Chao Huang, Liner Yang, Zhixuan Fang, Yaping Huang, Yang Liu, Erhong Yang

This paper introduces a novel worker selection algorithm that enhances annotation quality and reduces costs in challenging span-based sequence labeling tasks in Natural Language Processing (NLP). Unlike previous studies targeting simpler tasks, this study contends with the complexities of label interdependencies in sequence labeling. The proposed algorithm uses a Combinatorial Multi-Armed Bandit (CMAB) approach for worker selection. The challenge of imbalanced and small-scale datasets, which hinders offline simulation of worker selection, is tackled with a data augmentation method termed shifting, expanding, and shrinking (SES), designed specifically for sequence labeling tasks. Rigorous testing on the CoNLL 2003 NER and Chinese OEI datasets demonstrated the algorithm's efficiency, with F1 scores reaching up to 100.04% of the expert-only baseline and cost savings of up to 65.97%. A dataset-independent test emulating annotation evaluation through a Bernoulli distribution still reached 97.56% of the expert baseline's F1 score with 59.88% cost savings. This research addresses and overcomes numerous obstacles in worker selection for complex NLP tasks.
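
A minimal sketch of the combinatorial bandit idea behind the worker selection described above, using a UCB-style score per worker; the worker pool, the per-round quality feedback (e.g., a span-level F1 against an expert), and the exploration constant are assumptions for illustration, not the authors' implementation.

# Sketch of CMAB-style (combinatorial UCB) worker selection over annotation rounds.
# Worker names, the reward signal, and the simulation below are illustrative assumptions.
import math
import random

class CUCBWorkerSelector:
    def __init__(self, workers, k):
        self.workers = list(workers)                   # candidate annotators
        self.k = k                                     # workers selected per task
        self.counts = {w: 0 for w in self.workers}     # times each worker was chosen
        self.rewards = {w: 0.0 for w in self.workers}  # cumulative quality feedback
        self.t = 0

    def select(self):
        """Pick the k workers with the highest upper confidence bounds."""
        self.t += 1
        def ucb(w):
            if self.counts[w] == 0:
                return float("inf")                    # force initial exploration
            mean = self.rewards[w] / self.counts[w]
            bonus = math.sqrt(1.5 * math.log(self.t) / self.counts[w])
            return mean + bonus
        return sorted(self.workers, key=ucb, reverse=True)[:self.k]

    def update(self, worker, quality):
        """quality: observed annotation quality in [0, 1], e.g. a span-level F1."""
        self.counts[worker] += 1
        self.rewards[worker] += quality

# Toy simulation: each worker has a hidden skill; feedback is a noisy Bernoulli draw,
# mirroring the dataset-independent Bernoulli test mentioned in the abstract.
if __name__ == "__main__":
    skills = {"w1": 0.9, "w2": 0.6, "w3": 0.75, "w4": 0.5}
    selector = CUCBWorkerSelector(skills, k=2)
    for _ in range(200):
        for w in selector.select():
            selector.update(w, float(random.random() < skills[w]))
    print(selector.counts)                             # better workers get selected more often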

The Treasure Beneath Multiple Annotations: An Uncertainty-aware Edge Detector

Mar 21, 2023
Caixia Zhou, Yaping Huang, Mengyang Pu, Qingji Guan, Li Huang, Haibin Ling

Deep learning-based edge detectors heavily rely on pixel-wise labels which are often provided by multiple annotators. Existing methods fuse multiple annotations using a simple voting process, ignoring the inherent ambiguity of edges and labeling bias of annotators. In this paper, we propose a novel uncertainty-aware edge detector (UAED), which employs uncertainty to investigate the subjectivity and ambiguity of diverse annotations. Specifically, we first convert the deterministic label space into a learnable Gaussian distribution, whose variance measures the degree of ambiguity among different annotations. Then we regard the learned variance as the estimated uncertainty of the predicted edge maps, and pixels with higher uncertainty are likely to be hard samples for edge detection. Therefore we design an adaptive weighting loss to emphasize the learning from those pixels with high uncertainty, which helps the network to gradually concentrate on the important pixels. UAED can be combined with various encoder-decoder backbones, and the extensive experiments demonstrate that UAED achieves superior performance consistently across multiple edge detection benchmarks. The source code is available at https://github.com/ZhouCX117/UAED.
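
A minimal PyTorch sketch of the mechanism described above: the network predicts a per-pixel mean and log-variance, an edge map is sampled from that Gaussian via the reparameterization trick, and the supervision is re-weighted toward high-uncertainty pixels. The tensor shapes and the weighting form are assumptions, not the released UAED code.

# Sketch: per-pixel Gaussian label distribution with an uncertainty-weighted BCE loss.
# The exact weighting used by UAED may differ; this only illustrates the mechanism.
import torch
import torch.nn.functional as F

def uncertainty_weighted_edge_loss(mean_logits, log_var, fused_label):
    """
    mean_logits : (B, 1, H, W) predicted edge logits (Gaussian mean).
    log_var     : (B, 1, H, W) predicted log-variance (ambiguity across annotators).
    fused_label : (B, 1, H, W) edge probability fused from multiple annotations.
    """
    # Sample an edge map from the learned distribution (reparameterization trick).
    std = torch.exp(0.5 * log_var)
    sampled_logits = mean_logits + std * torch.randn_like(std)

    bce = F.binary_cross_entropy_with_logits(
        sampled_logits, fused_label, reduction="none")

    # Emphasize pixels whose predicted uncertainty is high (hard, ambiguous edges).
    sigma = torch.sigmoid(log_var)               # squash the log-variance to (0, 1)
    weight = 1.0 + sigma                          # assumed weighting form
    return (weight.detach() * bce).mean()

# Toy check with random tensors.
if __name__ == "__main__":
    B, H, W = 2, 32, 32
    mean = torch.randn(B, 1, H, W, requires_grad=True)
    logv = torch.randn(B, 1, H, W, requires_grad=True)
    label = (torch.rand(B, 1, H, W) > 0.9).float()
    loss = uncertainty_weighted_edge_loss(mean, logv, label)
    loss.backward()
    print(float(loss))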

* CVPR2023 

Uncertainty-Driven Action Quality Assessment

Jul 29, 2022
Caixia Zhou, Yaping Huang

Automatic action quality assessment (AQA) has attracted increasing interest due to its wide applications. However, existing AQA methods usually employ multi-branch models to generate multiple scores, which is inflexible when dealing with a variable number of judges. In this paper, we propose a novel Uncertainty-Driven AQA (UD-AQA) model to generate multiple predictions using only a single branch. Specifically, we design a CVAE (Conditional Variational Auto-Encoder) based module to encode the uncertainty, where multiple scores can be produced by sampling from the learned latent space multiple times. Moreover, we estimate the uncertainty and use it to re-weight the AQA regression loss, which reduces the contribution of uncertain samples during training. We further design an uncertainty-guided training strategy to dynamically adjust the learning order of the samples from low uncertainty to high uncertainty. Experiments show that our proposed method achieves new state-of-the-art results on the Olympic-event MTL-AQA and surgical-skill JIGSAWS datasets.
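
A rough sketch of the single-branch idea: a CVAE-style latent head produces several score predictions by sampling the latent multiple times, and the regression loss is re-weighted by the predicted uncertainty. The layer sizes, latent dimension, and weighting form are hypothetical, not the UD-AQA implementation.

# Sketch: one-branch score head that samples a latent multiple times to obtain
# several score predictions, plus an uncertainty-reweighted regression loss.
import torch
import torch.nn as nn

class LatentScoreHead(nn.Module):
    def __init__(self, feat_dim=256, latent_dim=16):
        super().__init__()
        self.to_mu = nn.Linear(feat_dim, latent_dim)
        self.to_logvar = nn.Linear(feat_dim, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim + latent_dim, 128), nn.ReLU(), nn.Linear(128, 1))

    def forward(self, feat, n_samples=1):
        mu, logvar = self.to_mu(feat), self.to_logvar(feat)
        scores = []
        for _ in range(n_samples):                 # one sample per "judge"
            z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
            scores.append(self.decoder(torch.cat([feat, z], dim=-1)))
        return torch.stack(scores, dim=1), mu, logvar

def reweighted_regression_loss(pred, target, logvar):
    # Down-weight samples whose latent uncertainty is large (assumed form).
    uncertainty = logvar.exp().mean(dim=-1, keepdim=True)
    weight = torch.exp(-uncertainty).detach()
    return (weight * (pred.mean(dim=1) - target) ** 2).mean()

if __name__ == "__main__":
    head = LatentScoreHead()
    feat = torch.randn(4, 256)                     # clip features from a backbone
    target = torch.rand(4, 1) * 10
    preds, mu, logvar = head(feat, n_samples=5)
    loss = reweighted_regression_loss(preds, target, logvar)
    loss.backward()
    print(preds.shape, float(loss))                # (4, 5, 1) scores, scalar loss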

BLCU-ICALL at SemEval-2022 Task 1: Cross-Attention Multitasking Framework for Definition Modeling

Apr 16, 2022
Cunliang Kong, Yujie Wang, Ruining Chong, Liner Yang, Hengyuan Zhang, Erhong Yang, Yaping Huang

This paper describes the BLCU-ICALL system used in SemEval-2022 Task 1, Comparing Dictionaries and Word Embeddings, on the Definition Modeling subtrack, where it achieved 1st place on Italian, 2nd on Spanish and Russian, and 3rd on English and French. We propose a transformer-based multitasking framework to explore the task. The framework integrates multiple embedding architectures through a cross-attention mechanism, and captures the structure of glosses through a masked language model objective. We also investigate a simple but effective model ensembling strategy to further improve robustness. The evaluation results show the effectiveness of our solution. We release our code at: https://github.com/blcuicall/SemEval2022-Task1-DM.
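
A small sketch of cross-attention fusion as described above, where decoder states attend over projected embeddings of the target word using torch.nn.MultiheadAttention; the dimensions and the residual fusion layout are assumptions, not the released BLCU-ICALL system.

# Sketch: fusing external word embeddings with gloss decoder states through
# cross-attention. Shapes and layout are illustrative only.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, d_model=512, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, decoder_states, embedding_bank):
        """
        decoder_states : (B, T, d) hidden states generating the gloss.
        embedding_bank : (B, S, d) projected embeddings of the target word
                         from one or more embedding architectures.
        """
        attended, _ = self.attn(query=decoder_states,
                                key=embedding_bank,
                                value=embedding_bank)
        return self.norm(decoder_states + attended)   # residual fusion

if __name__ == "__main__":
    fusion = CrossAttentionFusion()
    dec = torch.randn(2, 20, 512)      # partial gloss being decoded
    emb = torch.randn(2, 3, 512)       # several embedding sources, projected to d_model
    print(fusion(dec, emb).shape)      # torch.Size([2, 20, 512])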

EDTER: Edge Detection with Transformer

Mar 16, 2022
Mengyang Pu, Yaping Huang, Yuming Liu, Qingji Guan, Haibin Ling

Convolutional neural networks have made significant progress in edge detection by progressively exploring the context and semantic features. However, local details are gradually suppressed as receptive fields enlarge. Recently, vision transformers have shown excellent capability in capturing long-range dependencies. Inspired by this, we propose a novel transformer-based edge detector, Edge Detection TransformER (EDTER), to extract clear and crisp object boundaries and meaningful edges by exploiting the full image context information and detailed local cues simultaneously. EDTER works in two stages. In Stage I, a global transformer encoder is used to capture long-range global context on coarse-grained image patches. Then in Stage II, a local transformer encoder works on fine-grained patches to excavate the short-range local cues. Each transformer encoder is followed by an elaborately designed Bi-directional Multi-Level Aggregation decoder to achieve high-resolution features. Finally, the global context and local cues are combined by a Feature Fusion Module and fed into a decision head for edge prediction. Extensive experiments on BSDS500, NYUDv2, and Multicue demonstrate the superiority of EDTER in comparison with state-of-the-art methods.
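
A toy skeleton of the two-stage layout described above: a transformer encoder over coarse patches for global context, another over fine patches for local cues, and a fused pixel-level edge head. The module sizes and the simple concatenation fusion are stand-ins, not the actual EDTER architecture (which uses BiMLA decoders and a Feature Fusion Module).

# Skeleton of a two-stage global/local transformer edge detector (toy sizes).
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchTransformer(nn.Module):
    def __init__(self, patch, dim=64):
        super().__init__()
        self.embed = nn.Conv2d(3, dim, kernel_size=patch, stride=patch)
        layer = nn.TransformerEncoderLayer(dim, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, x):
        tokens = self.embed(x)                      # (B, dim, H/p, W/p)
        B, C, H, W = tokens.shape
        seq = tokens.flatten(2).transpose(1, 2)     # (B, H*W, dim)
        seq = self.encoder(seq)
        return seq.transpose(1, 2).reshape(B, C, H, W)

class TwoStageEdgeDetector(nn.Module):
    def __init__(self):
        super().__init__()
        self.global_enc = PatchTransformer(patch=16)   # Stage I: coarse global context
        self.local_enc = PatchTransformer(patch=8)     # Stage II: fine local cues
        self.head = nn.Conv2d(128, 1, kernel_size=1)   # fusion + decision head

    def forward(self, x):
        g = F.interpolate(self.global_enc(x), size=x.shape[-2:], mode="bilinear",
                          align_corners=False)
        l = F.interpolate(self.local_enc(x), size=x.shape[-2:], mode="bilinear",
                          align_corners=False)
        return torch.sigmoid(self.head(torch.cat([g, l], dim=1)))

if __name__ == "__main__":
    model = TwoStageEdgeDetector()
    print(model(torch.randn(1, 3, 64, 64)).shape)   # torch.Size([1, 1, 64, 64])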

* Accepted by CVPR2022 

RINDNet: Edge Detection for Discontinuity in Reflectance, Illumination, Normal and Depth

Aug 02, 2021
Mengyang Pu, Yaping Huang, Qingji Guan, Haibin Ling

As a fundamental building block in computer vision, edges can be categorised into four types according to the discontinuity in surface-Reflectance, Illumination, surface-Normal or Depth. While great progress has been made in detecting generic or individual types of edges, it remains under-explored to comprehensively study all four edge types together. In this paper, we propose a novel neural network solution, RINDNet, to jointly detect all four types of edges. Taking into consideration the distinct attributes of each type of edge and the relationships between them, RINDNet learns effective representations for each of them and works in three stages. In Stage I, RINDNet uses a common backbone to extract features shared by all edges. Then in Stage II it branches to prepare discriminative features for each edge type with the corresponding decoder. In Stage III, an independent decision head for each type aggregates the features from previous stages to predict the initial results. Additionally, an attention module learns attention maps for all types to capture the underlying relations between them, and these maps are combined with the initial results to generate the final edge detection results. For training and evaluation, we construct the first public benchmark, BSDS-RIND, with all four types of edges carefully annotated. In our experiments, RINDNet yields promising results in comparison with state-of-the-art methods. Additional analysis is presented in the supplementary material.
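
A toy skeleton of the three-stage layout described above: a shared backbone, one decoder and decision head per edge type, and learned attention maps that refine the per-type predictions. All layers are simplified stand-ins for illustration, not the actual RINDNet modules.

# Skeleton: shared backbone + per-type decoders and decision heads + attention maps.
import torch
import torch.nn as nn

EDGE_TYPES = ["reflectance", "illumination", "normal", "depth"]

class MultiTypeEdgeNet(nn.Module):
    def __init__(self, feat=32):
        super().__init__()
        self.backbone = nn.Sequential(                       # Stage I: shared features
            nn.Conv2d(3, feat, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat, feat, 3, padding=1), nn.ReLU())
        self.decoders = nn.ModuleDict({                       # Stage II: per-type decoders
            t: nn.Conv2d(feat, feat, 3, padding=1) for t in EDGE_TYPES})
        self.heads = nn.ModuleDict({                          # Stage III: decision heads
            t: nn.Conv2d(feat, 1, 1) for t in EDGE_TYPES})
        self.attention = nn.Conv2d(feat, len(EDGE_TYPES), 1)  # one attention map per type

    def forward(self, x):
        shared = self.backbone(x)
        attn = torch.sigmoid(self.attention(shared))
        outputs = {}
        for i, t in enumerate(EDGE_TYPES):
            initial = self.heads[t](torch.relu(self.decoders[t](shared)))
            outputs[t] = torch.sigmoid(initial) * attn[:, i:i + 1]  # refine with attention
        return outputs

if __name__ == "__main__":
    out = MultiTypeEdgeNet()(torch.randn(1, 3, 64, 64))
    print({k: tuple(v.shape) for k, v in out.items()})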

* Accepted by ICCV2021 

Learning from Pixel-Level Label Noise: A New Perspective for Semi-Supervised Semantic Segmentation

Mar 26, 2021
Rumeng Yi, Yaping Huang, Qingji Guan, Mengyang Pu, Runsheng Zhang

This paper addresses semi-supervised semantic segmentation by exploiting a small set of images with pixel-level annotations (strong supervision) and a large set of images with only image-level annotations (weak supervision). Most existing approaches aim to generate accurate pixel-level labels from weak supervision. However, we observe that those generated labels still inevitably contain noisy labels. Motivated by this observation, we present a novel perspective and formulate this task as a problem of learning with pixel-level label noise. Existing noisy label methods, nevertheless, mainly target image-level tasks, and cannot capture the relationship between neighboring labels in one image. Therefore, we propose a graph-based label noise detection and correction framework to deal with pixel-level noisy labels. In particular, for the pixel-level noisy labels generated from weak supervision by Class Activation Maps (CAM), we train a clean segmentation model with strong supervision to detect the clean labels among these noisy labels according to the cross-entropy loss. Then, we adopt a superpixel-based graph to represent the relations of spatial adjacency and semantic similarity between pixels in one image. Finally, we correct the noisy labels using a Graph Attention Network (GAT) supervised by the detected clean labels. We conduct comprehensive experiments on the PASCAL VOC 2012, PASCAL-Context and MS-COCO datasets. The experimental results show that our proposed semi-supervised method achieves state-of-the-art performance and, in some cases, even outperforms fully-supervised models on the PASCAL VOC 2012 and MS-COCO datasets.
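
A minimal sketch of the clean-label detection step described above: pixels whose CAM-generated label has a small cross-entropy under a model trained with strong supervision are kept as clean. The quantile-based threshold and tensor shapes are assumptions, and the superpixel graph / GAT correction step is omitted.

# Sketch of clean-pixel detection by the small cross-entropy criterion.
import torch
import torch.nn.functional as F

def detect_clean_pixels(seg_logits, noisy_labels, keep_ratio=0.7, ignore_index=255):
    """
    seg_logits   : (B, C, H, W) predictions of a model trained on pixel-level labels.
    noisy_labels : (B, H, W) pixel labels generated from image-level supervision (CAM).
    Returns a boolean mask of pixels kept as clean supervision.
    """
    loss = F.cross_entropy(seg_logits, noisy_labels,
                           ignore_index=ignore_index, reduction="none")  # (B, H, W)
    valid = noisy_labels != ignore_index
    threshold = torch.quantile(loss[valid], keep_ratio)     # small-loss criterion
    return valid & (loss <= threshold)

if __name__ == "__main__":
    logits = torch.randn(2, 21, 64, 64)                     # e.g. 21 PASCAL VOC classes
    labels = torch.randint(0, 21, (2, 64, 64))
    clean = detect_clean_pixels(logits, labels)
    print(clean.float().mean())                             # roughly the keep ratio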

Transform consistency for learning with noisy labels

Mar 25, 2021
Rumeng Yi, Yaping Huang

It is crucial to distinguish mislabeled samples when dealing with noisy labels. Previous methods such as Co-teaching and JoCoR introduce two different networks to select clean samples out of the noisy ones and only use these clean ones to train the deep models. Different from these methods, which require training two networks simultaneously, we propose a simple and effective method to identify clean samples using only a single network. We observe that clean samples tend to yield consistent predictions for the original and transformed images, while noisy samples usually suffer from inconsistent predictions. Motivated by this observation, we constrain the transform consistency between the original and transformed images during network training, and then select small-loss samples to update the network parameters. Furthermore, in order to mitigate the negative influence of noisy labels, we design a classification loss that uses off-line hard labels and on-line soft labels to provide more reliable supervision for training a robust model. We conduct comprehensive experiments on the CIFAR-10, CIFAR-100 and Clothing1M datasets. Compared with the baselines, we achieve state-of-the-art performance; in most cases, our method outperforms the baselines by a large margin.
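
A minimal sketch combining a transform-consistency term with small-loss sample selection in a single network, as described above; the choice of transform (horizontal flip), the loss weighting, and the selection ratio are assumptions, and the off-line hard / on-line soft label loss is omitted.

# Sketch: prediction consistency between an image and its transformed view,
# plus small-loss selection of presumably clean samples.
import torch
import torch.nn.functional as F

def consistency_and_selection(model, images, labels, keep_ratio=0.7):
    flipped = torch.flip(images, dims=[-1])              # assumed transform: horizontal flip
    logits, logits_t = model(images), model(flipped)

    # Transform-consistency term: predictions for both views should agree.
    consistency = F.kl_div(F.log_softmax(logits, dim=1),
                           F.softmax(logits_t, dim=1),
                           reduction="batchmean")

    # Per-sample classification loss; small-loss samples are treated as clean.
    ce = F.cross_entropy(logits, labels, reduction="none")
    n_keep = max(1, int(keep_ratio * len(ce)))
    keep = torch.topk(-ce, n_keep).indices               # indices of the smallest losses

    return ce[keep].mean() + consistency

if __name__ == "__main__":
    model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
    x = torch.randn(16, 3, 32, 32)
    y = torch.randint(0, 10, (16,))
    loss = consistency_and_selection(model, x, y)
    loss.backward()
    print(float(loss))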

Few-Shot Domain Adaptation for Grammatical Error Correction via Meta-Learning

Jan 29, 2021
Shengsheng Zhang, Yaping Huang, Yun Chen, Liner Yang, Chencheng Wang, Erhong Yang

Most existing sequence-to-sequence Grammatical Error Correction (GEC) methods mainly focus on generating more pseudo data to obtain better performance. Little work addresses few-shot GEC domain adaptation. In this paper, we treat different GEC domains as different GEC tasks and propose to extend meta-learning to few-shot GEC domain adaptation without using any pseudo data. We exploit a set of data-rich source domains to learn an initialization of model parameters that facilitates fast adaptation to new resource-poor target domains. We adapt the GEC model to the first language (L1) of the second-language learner. To evaluate the proposed method, we use nine L1s as source domains and five L1s as target domains. Experimental results on the L1 GEC domain adaptation dataset demonstrate that the proposed approach outperforms the multi-task transfer learning baseline by 0.50 $F_{0.5}$ score on average and enables us to effectively adapt to a new L1 domain with only 200 parallel sentences.
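
A rough sketch of the meta-learning loop over source L1 domains. For brevity it uses a Reptile-style first-order update instead of full MAML, and a toy model stands in for the seq2seq GEC system; the tasks, data, and learning rates are placeholders.

# Sketch: learn an initialization over source domains, then adapt it with few examples.
import copy
import torch
import torch.nn as nn

def meta_train(model, domains, inner_lr=1e-2, meta_lr=1e-3, inner_steps=3):
    """domains: list of (inputs, targets) batches, one per source L1 domain."""
    loss_fn = nn.MSELoss()
    for x, y in domains:
        fast = copy.deepcopy(model)                      # task-specific copy
        opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
        for _ in range(inner_steps):                     # inner loop: adapt to the domain
            opt.zero_grad()
            loss_fn(fast(x), y).backward()
            opt.step()
        # Outer (meta) update: move the initialization toward the adapted weights.
        with torch.no_grad():
            for p, p_fast in zip(model.parameters(), fast.parameters()):
                p += meta_lr * (p_fast - p)
    return model

if __name__ == "__main__":
    model = nn.Linear(8, 1)                              # stand-in for a GEC model
    domains = [(torch.randn(32, 8), torch.randn(32, 1)) for _ in range(9)]  # 9 source L1s
    meta_train(model, domains)
    # Few-shot adaptation to a new target L1 would reuse the inner loop with ~200 pairs.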

Classification-driven Single Image Dehazing

Nov 21, 2019
Yanting Pei, Yaping Huang, Xingyuan Zhang

Most existing dehazing algorithms use hand-crafted features or Convolutional Neural Network (CNN)-based methods to generate clear images with a pixel-level Mean Square Error (MSE) loss. The generated images generally have better visual appeal, but do not always perform better on high-level vision tasks, e.g., image classification. In this paper, we investigate a new point of view in addressing this problem. Instead of focusing only on achieving good quantitative performance on pixel-based metrics such as Peak Signal to Noise Ratio (PSNR), we also ensure that the dehazed image itself does not degrade the performance of high-level vision tasks such as image classification. To this end, we present a unified CNN architecture with three parts: a dehazing sub-network (DNet), a classification-driven Conditional Generative Adversarial Network sub-network (CCGAN) and a classification sub-network (CNet), which achieves better performance in both visual appeal and image classification. We conduct comprehensive experiments on two challenging benchmark datasets for fine-grained and object classification: CUB-200-2011 and Caltech-256. Experimental results demonstrate that the proposed method outperforms many recent state-of-the-art single image dehazing methods in terms of image dehazing metrics and classification accuracy.
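
A minimal sketch of the joint objective implied above: a pixel-level MSE term for the dehazing sub-network plus adversarial and classification terms computed on the dehazed image. The loss weights and the stand-in tensors are illustrative assumptions, not the paper's exact formulation.

# Sketch of a classification-driven dehazing generator loss.
import torch
import torch.nn.functional as F

def dehazing_generator_loss(dehazed, clear, disc_logits, cls_logits, cls_labels,
                            w_adv=0.01, w_cls=0.1):
    """
    dehazed     : (B, 3, H, W) output of the dehazing sub-network (DNet).
    clear       : (B, 3, H, W) ground-truth haze-free image.
    disc_logits : (B, 1) discriminator scores for the dehazed image (CCGAN part).
    cls_logits  : (B, K) classifier predictions on the dehazed image (CNet part).
    """
    pixel = F.mse_loss(dehazed, clear)
    adv = F.binary_cross_entropy_with_logits(disc_logits,
                                             torch.ones_like(disc_logits))
    cls = F.cross_entropy(cls_logits, cls_labels)
    return pixel + w_adv * adv + w_cls * cls

if __name__ == "__main__":
    B, K = 4, 200                                     # e.g. CUB-200-2011 classes
    dehazed = torch.randn(B, 3, 64, 64, requires_grad=True)
    clear = torch.randn(B, 3, 64, 64)
    disc = torch.randn(B, 1)
    cls_logits = torch.randn(B, K, requires_grad=True)
    labels = torch.randint(0, K, (B,))
    loss = dehazing_generator_loss(dehazed, clear, disc, cls_logits, labels)
    loss.backward()
    print(float(loss))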
