Qianren Mao

Neural-Hidden-CRF: A Robust Weakly-Supervised Sequence Labeler

Sep 10, 2023
Zhijun Chen, Hailong Sun, Wanhao Zhang, Chunyi Xu, Qianren Mao, Pengpeng Chen

We propose a neuralized undirected graphical model, Neural-Hidden-CRF, to solve the weakly-supervised sequence labeling problem. Grounded in probabilistic undirected graph theory, Neural-Hidden-CRF embeds a hidden CRF layer that models the word sequence, the latent ground-truth sequence, and the weak label sequence from the global perspective that undirected graphical models enjoy. In Neural-Hidden-CRF, we can capitalize on powerful language models such as BERT, or other deep models, to provide rich contextual semantic knowledge to the latent ground-truth sequence, and use the hidden CRF layer to capture internal label dependencies. Neural-Hidden-CRF is conceptually simple and empirically powerful. It obtains new state-of-the-art results on one crowdsourcing benchmark and three weak-supervision benchmarks, outperforming the recent advanced model CHMM by 2.80 and 2.23 F1 points in average generalization and inference performance, respectively.

* 13 pages, 4 figures, accepted by SIGKDD-2023 
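The following is a minimal sketch of the hidden-CRF idea the abstract describes, not the authors' implementation: unary scores for the latent ground-truth sequence come from a contextual encoder such as BERT (passed in here as token vectors), a linear-chain transition matrix captures label dependencies, and per-source confusion matrices connect latent labels to the observed weak labels. All class and variable names are hypothetical, and the paper's exact potentials may differ:

import torch
import torch.nn as nn

class HiddenCRF(nn.Module):
    def __init__(self, hidden_dim, num_labels, num_sources):
        super().__init__()
        self.emission = nn.Linear(hidden_dim, num_labels)        # encoder state -> latent-label score
        self.transitions = nn.Parameter(torch.zeros(num_labels, num_labels))
        # one latent-label -> weak-label confusion matrix per weak supervision source
        self.confusion = nn.Parameter(torch.zeros(num_sources, num_labels, num_labels))

    def _log_partition(self, unary):
        # forward algorithm in log space; unary: (T, K) scores for one sentence
        alpha = unary[0]
        for t in range(1, unary.size(0)):
            alpha = torch.logsumexp(alpha.unsqueeze(1) + self.transitions, dim=0) + unary[t]
        return torch.logsumexp(alpha, dim=0)

    def neg_log_likelihood(self, token_states, weak_labels):
        # token_states: (T, H) contextual vectors (e.g., from BERT); weak_labels: (S, T)
        base = self.emission(token_states)                       # (T, K) latent-label potentials
        log_conf = torch.log_softmax(self.confusion, dim=-1)     # normalized so weak labels marginalize out
        augmented = base
        for s in range(weak_labels.size(0)):
            augmented = augmented + log_conf[s, :, weak_labels[s]].t()
        # -log P(weak labels | sentence), with the latent ground-truth sequence summed out
        return self._log_partition(base) - self._log_partition(augmented)

Training would minimize this quantity over weakly labeled sentences; at inference one would Viterbi-decode the unary and transition scores to recover the predicted ground-truth sequence.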

DisCo: Distilled Student Models Co-training for Semi-supervised Text Mining

May 20, 2023
Weifeng Jiang, Qianren Mao, Jianxin Li, Chenghua Lin, Weiyi Yang, Ting Deng, Zheng Wang

Many text mining models are built by fine-tuning a large deep pre-trained language model (PLM) on downstream tasks. However, a significant challenge is maintaining performance when using a lightweight model with limited labeled samples. We present DisCo, a semi-supervised learning (SSL) framework for fine-tuning a cohort of small student models generated from a large PLM via knowledge distillation. Our key insight is to share complementary knowledge among the distilled student cohort to promote their SSL effectiveness. DisCo employs a novel co-training technique that optimizes multiple small student models by promoting knowledge sharing among students under diversified views: model views produced by different distillation strategies and data views produced by various input augmentations. We evaluate DisCo on both semi-supervised text classification and extractive summarization tasks. Experimental results show that DisCo can produce student models that are 7.6 times smaller and 4.8 times faster at inference than the baseline PLMs while maintaining comparable performance. We also show that DisCo-generated student models outperform similar-sized models elaborately tuned on the individual tasks.
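As a rough illustration only (the two-student setup, function names, and loss weights are assumptions, not the paper's exact recipe), the co-training idea of exchanging knowledge across data views could look like this:

import torch
import torch.nn.functional as F

def disco_step(students, optimizers, labeled, unlabeled_views, alpha=1.0):
    """One co-training step for a cohort of distilled student classifiers.

    students        : list of small nn.Module classifiers distilled from a PLM
    labeled         : (inputs, targets) batch with gold labels
    unlabeled_views : one augmented view of the unlabeled batch per student
    """
    x_l, y_l = labeled
    # each student produces soft targets on its own data view (no gradient)
    with torch.no_grad():
        soft = [F.softmax(m(v), dim=-1) for m, v in zip(students, unlabeled_views)]
    for i, (model, opt) in enumerate(zip(students, optimizers)):
        sup = F.cross_entropy(model(x_l), y_l)                  # supervised loss
        peer = soft[(i + 1) % len(students)]                    # knowledge shared by a peer's view
        cons = F.kl_div(F.log_softmax(model(unlabeled_views[i]), dim=-1),
                        peer, reduction="batchmean")
        loss = sup + alpha * cons
        opt.zero_grad()
        loss.backward()
        opt.step()

The "model views" of the paper would additionally come from building the students with different distillation strategies; here they are simply assumed to be distinct modules in the cohort.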

Attend and Select: A Segment Attention based Selection Mechanism for Microblog Hashtag Generation

Jun 06, 2021
Qianren Mao, Xi Li, Hao Peng, Bang Liu, Shu Guo, Jianxin Li, Lihong Wang, Philip S. Yu

Automatic microblog hashtag generation can help us understand and process the critical content of microblog posts faster and better. Conventional sequence-to-sequence generation methods can produce phrase-level hashtags and have achieved remarkable performance on this task. However, they cannot filter out secondary information and are not good at capturing discontinuous semantics among crucial tokens. A hashtag is formed by tokens or phrases that may originate from various fragmentary segments of the original text. In this work, we propose an end-to-end Transformer-based generation model that consists of three phases: encoding, segment selection, and decoding. The model transforms discontinuous semantic segments from the source text into a sequence of hashtags. Specifically, we introduce a novel Segments Selection Mechanism (SSM) for the Transformer to obtain segmental representations tailored to phrase-level hashtag generation. Besides, we introduce two large-scale hashtag generation datasets, newly collected from Chinese Weibo and English Twitter. Extensive evaluations on the two datasets show the superiority of our approach, with significant improvements over extraction and generation baselines. The code and datasets are available at https://github.com/OpenSUM/HashtagGen.
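A toy version of the segment-selection step might look as follows; the mean-pooled segment representations, the linear scorer, and the top-k cut-off are illustrative assumptions rather than the paper's SSM definition:

import torch
import torch.nn as nn

class SegmentSelector(nn.Module):
    """Scores candidate source-text segments and keeps the top-k for decoding."""

    def __init__(self, d_model, top_k=4):
        super().__init__()
        self.scorer = nn.Linear(d_model, 1)
        self.top_k = top_k

    def forward(self, token_states, segments):
        # token_states: (T, D) encoder outputs; segments: list of (start, end) spans
        seg_reprs = torch.stack([token_states[s:e].mean(dim=0) for s, e in segments])
        scores = self.scorer(seg_reprs).squeeze(-1)
        keep = scores.topk(min(self.top_k, len(segments))).indices
        # the selected segmental representations would feed the hashtag decoder
        return seg_reprs[keep], scores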

Automated Timeline Length Selection for Flexible Timeline Summarization

May 29, 2021
Xi Li, Qianren Mao, Hao Peng, Hongdong Zhu, Jianxin Li, Zheng Wang

By producing summaries for long-running events, timeline summarization (TLS) underpins many information retrieval tasks. Successful TLS requires identifying an appropriate set of key dates (the timeline length) to cover. However, doing so is challenging because the right length can change from one topic to another. Existing TLS solutions rely either on an event-agnostic fixed length or on an expert-supplied setting; neither strategy is desirable for real-life TLS scenarios. A fixed, event-agnostic setting ignores the diversity of events and their development and hence can lead to low-quality TLS. Relying on expert-crafted settings is neither scalable nor sustainable for processing many dynamically changing events. This paper presents a better TLS approach that automatically and dynamically determines the timeline length. We achieve this by employing the established elbow method from the machine learning community to automatically find the minimum number of dates within the time series needed to generate concise and informative summaries. We applied our approach to four English and Chinese TLS datasets and compared it against three prior methods. Experimental results show that our approach delivers comparable or even better summaries than state-of-the-art TLS methods, without expert involvement.
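To illustrate the elbow idea under assumed inputs (a per-date importance score such as mention frequency, and a distance-to-chord knee heuristic, which may differ from the paper's exact formulation), a sketch:

import numpy as np

def elbow_timeline_length(date_scores):
    """Pick the number of key dates at the knee of the sorted importance curve."""
    y = np.sort(np.asarray(date_scores, dtype=float))[::-1]
    if y.size < 3:
        return int(y.size)
    x = np.arange(y.size, dtype=float)
    # distance of every point from the straight line joining the first and last point
    dx, dy = x[-1] - x[0], y[-1] - y[0]
    dist = np.abs(dx * (y[0] - y) - (x[0] - x) * dy) / np.hypot(dx, dy)
    return int(np.argmax(dist)) + 1   # timeline length = dates kept

# e.g., elbow_timeline_length([120, 95, 60, 12, 9, 7, 5]) returns 4,
# cutting the timeline where the importance curve flattens out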

Noised Consistency Training for Text Summarization

May 28, 2021
Junnan Liu, Qianren Mao, Bang Liu, Hao Peng, Hongdong Zhu, Jianxin Li

Neural abstractive summarization methods often require large quantities of labeled training data. However, labeling large amounts of summarization data is often prohibitive due to time, financial, and expertise constraints, which has limited the usefulness of summarization systems in practical applications. In this paper, we argue that this limitation can be overcome by a semi-supervised approach: consistency training, which leverages large amounts of unlabeled data to improve the performance of supervised learning over a small corpus. Consistency-regularization-based semi-supervised learning constrains model predictions to be invariant to small noise applied to input articles. By adding a noised unlabeled corpus to regularize consistency training, this framework obtains comparable performance without using the full labeled dataset. In particular, we verify that leveraging large amounts of unlabeled data noticeably improves the performance of supervised learning over an insufficient labeled dataset.
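A minimal sketch of such an objective, assuming a Hugging Face-style encoder-decoder summarizer (the .loss/.logits interface), a pre-built noising function applied outside this snippet, and a shared pseudo target for both views; the paper's exact noise operations and weighting are not reproduced here:

import torch
import torch.nn.functional as F

def consistency_objective(model, labeled_batch, clean_src, noised_src, pseudo_tgt, lam=1.0):
    """Supervised seq2seq loss on the small labeled set plus a consistency term
    that keeps predictions stable under input noise on unlabeled articles."""
    sup = model(**labeled_batch).loss                           # standard cross-entropy
    with torch.no_grad():                                       # the clean view acts as the target
        clean = model(input_ids=clean_src, decoder_input_ids=pseudo_tgt).logits
    noised = model(input_ids=noised_src, decoder_input_ids=pseudo_tgt).logits
    cons = F.kl_div(F.log_softmax(noised, dim=-1),
                    F.softmax(clean, dim=-1), reduction="batchmean")
    return sup + lam * cons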
