Feifei Wang

Improved Naive Bayes with Mislabeled Data

Apr 13, 2023
Qianhan Zeng, Yingqiu Zhu, Xuening Zhu, Feifei Wang, Weichen Zhao, Shuning Sun, Meng Su, Hansheng Wang

Labeling mistakes are frequently encountered in real-world applications. If not handled properly, they can seriously degrade a model's classification performance. To address this issue, we propose an improved Naive Bayes method for text classification. It is analytically simple and requires no subjective judgement about which labels are correct or incorrect. By specifying the generating mechanism of incorrect labels, we iteratively optimize the corresponding log-likelihood function with an EM algorithm. Our simulation and experimental results show that the improved Naive Bayes method greatly improves on the performance of the standard Naive Bayes method with mislabeled data.
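
The abstract does not spell out the label-generating mechanism, so the following is only a minimal sketch of the general idea under stated assumptions: a multinomial Naive Bayes model in which the true class is latent and the observed label is drawn from a class-dependent transition matrix `T[z, y] = P(observed = y | true = z)`. The function name `em_noisy_naive_bayes` and its parameters are hypothetical, not taken from the paper.

```python
import numpy as np

def em_noisy_naive_bayes(X, y, n_classes, n_iter=50, alpha=1.0, init_flip=0.2):
    """EM for multinomial Naive Bayes with a latent true label (illustrative).

    X: (n, d) array of non-negative word counts.
    y: (n,) array of observed, possibly incorrect, integer labels.
    Returns class priors, log word probabilities, the estimated
    label-transition matrix T, and posteriors R over the true labels.
    """
    n, d = X.shape
    Y = np.eye(n_classes)[y]                      # one-hot observed labels, (n, K)
    # Initial posterior: mostly trust the observed label (an assumption).
    R = (1.0 - init_flip) * Y + init_flip / n_classes
    R /= R.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: re-estimate parameters from soft assignments R.
        pi = (R.sum(axis=0) + 1e-9) / (n + n_classes * 1e-9)
        counts = R.T @ X + alpha                  # Laplace-smoothed word counts, (K, d)
        log_theta = np.log(counts) - np.log(counts.sum(axis=1, keepdims=True))
        T = R.T @ Y + 1e-6                        # T[z, y], unnormalized
        T /= T.sum(axis=1, keepdims=True)

        # E-step: posterior of the true label given words and observed label.
        log_post = np.log(pi) + X @ log_theta.T + np.log(T[:, y].T)
        log_post -= log_post.max(axis=1, keepdims=True)   # numerical stability
        R = np.exp(log_post)
        R /= R.sum(axis=1, keepdims=True)

    return pi, log_theta, T, R
```

Under these assumptions, a new document with count vector `x` would be classified by the largest entry of `np.log(pi) + x @ log_theta.T`, which ignores the noisy-label channel at prediction time.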

Diversity-Aware Meta Visual Prompting

Mar 14, 2023
Qidong Huang, Xiaoyi Dong, Dongdong Chen, Weiming Zhang, Feifei Wang, Gang Hua, Nenghai Yu

We present Diversity-Aware Meta Visual Prompting (DAM-VP), an efficient and effective prompting method for transferring pre-trained models to downstream tasks with a frozen backbone. A challenging issue in visual prompting is that image datasets sometimes have large data diversity, whereas a generic per-dataset prompt can hardly handle the complex distribution shift toward the original pretraining data distribution. To address this issue, we propose a dataset Diversity-Aware prompting strategy whose initialization is realized by a Meta-prompt. Specifically, we cluster the downstream dataset into small homogeneous subsets in a diversity-adaptive way, with each subset having its own prompt optimized separately. Such a divide-and-conquer design greatly reduces the optimization difficulty and significantly boosts the prompting performance. Furthermore, all the prompts are initialized with a meta-prompt, which is learned across several datasets. It is a bootstrapped paradigm, with the key observation that the prompting knowledge learned from previous datasets can help the prompt converge faster and perform better on a new dataset. During inference, we dynamically select a proper prompt for each input, based on the feature distance between the input and each subset. Through extensive experiments, our DAM-VP demonstrates superior efficiency and effectiveness, clearly surpassing previous prompting methods on a series of downstream datasets for different pretraining models. Our code is available at: https://github.com/shikiw/DAM-VP.

* CVPR2023, code is available at https://github.com/shikiw/DAM-VP 
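
The inference-time selection step described above can be sketched as a nearest-prototype lookup. This is an illustrative assumption rather than the code in the linked repository: each subset is summarized by one prototype feature vector, Euclidean distance is used, and the names `select_prompt`, `prototypes`, and `prompts` are hypothetical.

```python
import numpy as np

def select_prompt(feature, prototypes, prompts):
    """Pick the prompt of the subset whose prototype is closest to the input.

    feature:    (d,) feature vector of the input, e.g. from the frozen backbone.
    prototypes: (k, d) array with one prototype per subset (assumed given).
    prompts:    sequence of k prompts, one per subset.
    Returns the index of the chosen subset and its prompt.
    """
    dists = np.linalg.norm(prototypes - feature, axis=1)  # distance to each subset
    idx = int(np.argmin(dists))
    return idx, prompts[idx]
```

How the prototypes are built and which distance the method actually uses are not stated in the abstract, so both are placeholders here.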

Towards Precise Flood Prediction via Hierachical Terrain Attention and Multi-Scale Rainfall Guidance

Dec 04, 2022
Feifei Wang, Yong Wang, Shaoqing Chen, Bing Li, Qidong Huang

As the climate deteriorates, rain-induced flooding has become more frequent. To mitigate its impact, recent works adopt convolutional neural networks or their variants to predict floods. However, these methods directly force the model to reconstruct the raw pixels of water depth maps by constraining pixel-level differences, ignoring the high-level information contained in terrain features and rainfall patterns. To address this, we present a novel GAN-based framework for precise flood prediction, which incorporates hierarchical terrain spatial attention to help the model focus on spatially salient areas of terrain features and constructs multi-scale rainfall embeddings to integrate rainfall pattern information extensively into generation. To better adapt the model to various rainfall conditions, we leverage a rainfall regression loss for both the generator and the discriminator as additional supervision. Extensive evaluations on real catchment datasets demonstrate the superior performance of our method, which greatly surpasses previous methods under different rainfall conditions.

* Under review 

High-Resolution Boundary Detection for Medical Image Segmentation with Piece-Wise Two-Sample T-Test Augmented Loss

Nov 04, 2022
Yucong Lin, Jinhua Su, Yuhang Li, Yuhao Wei, Hanchao Yan, Saining Zhang, Jiaan Luo, Danni Ai, Hong Song, Jingfan Fan, Tianyu Fu, Deqiang Xiao, Feifei Wang, Jue Hou, Jian Yang

Deep learning methods have contributed substantially to the rapid advancement of medical image segmentation, the quality of which relies on the suitable design of loss functions. Popular loss functions, including the cross-entropy and Dice losses, often fall short in boundary detection, thereby limiting high-resolution downstream applications such as automated diagnoses and procedures. We developed a novel loss function tailored to reflect boundary information and thereby enhance boundary detection. As the contrast between segmentation and background regions along the classification boundary naturally induces heterogeneity over the pixels, we propose the piece-wise two-sample t-test augmented (PTA) loss, which is infused with a statistical test for such heterogeneity. We demonstrate the improved boundary detection power of the PTA loss compared to benchmark losses without a t-test component.
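
The exact form of the PTA loss is not given in the abstract. As a hedged illustration of its statistical ingredient only, the sketch below computes a two-sample t-statistic (Welch's unequal-variance form, chosen here as an assumption) between two groups of pixel values, such as pixels on either side of a predicted boundary. The function name `welch_t_statistic` is hypothetical.

```python
import numpy as np

def welch_t_statistic(a, b):
    """Two-sample t-statistic with unequal variances (Welch's form).

    a, b: 1-D arrays of pixel values from the two regions being compared.
    A large absolute value indicates strong heterogeneity between the regions.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    mean_diff = a.mean() - b.mean()
    # Standard error built from the unbiased sample variances.
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return mean_diff / se
```

How such a statistic is applied piece-wise along the boundary and combined with a base segmentation loss is specific to the paper and is not reproduced here.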

Knowledge Graph Enhanced Relation Extraction Datasets

Oct 19, 2022
Yucong Lin, Hongming Xiao, Jiani Liu, Zichao Lin, Keming Lu, Feifei Wang, Wei Wei

Knowledge-enhanced methods that take advantage of auxiliary knowledge graphs have recently emerged in relation extraction, and they surpass traditional text-based relation extraction methods. However, there are currently no unified public benchmarks that include both evidence sentences and knowledge graphs for knowledge-enhanced relation extraction. To address this gap, we propose KGRED, a knowledge graph enhanced relation extraction dataset with the following features: (1) the benchmarks are based on widely used distantly supervised relation extraction datasets; (2) we refine these existing datasets to improve data quality, and we construct auxiliary knowledge graphs for them through entity linking to support knowledge-enhanced relation extraction tasks; (3) with the new benchmarks we curated, we build baselines in two popular relation extraction settings, sentence-level and bag-level relation extraction, and we compare the latest knowledge-enhanced relation extraction methods. KGRED provides high-quality relation extraction datasets with auxiliary knowledge graphs for evaluating the performance of knowledge-enhanced relation extraction methods. Meanwhile, our experiments on KGRED reveal the influence of knowledge graph information on relation extraction tasks.

* 25 pages, 11 figures, will be submitted to Neurocomputing soon 

Jointly Dynamic Topic Model for Recognition of Lead-lag Relationship in Two Text Corpora

Nov 21, 2021
Yandi Zhu, Xiaoling Lu, Jingya Hong, Feifei Wang

Topic evolution modeling has received significant attention in recent decades. Although various topic evolution models have been proposed, most studies focus on a single document corpus. In practice, however, we can easily access data from multiple sources and observe relationships between them. It is therefore of great interest to recognize the relationship between multiple text corpora and to use this relationship to improve topic modeling. In this work, we focus on a special type of relationship between two text corpora, which we define as the "lead-lag relationship". This relationship characterizes the phenomenon that one text corpus influences the topics to be discussed in the other text corpus in the future. To discover the lead-lag relationship, we propose a jointly dynamic topic model and also develop an embedding extension to address the problem of modeling large-scale text corpora. With the recognized lead-lag relationship, the similarities between the two text corpora can be identified and the quality of topic learning in both corpora can be improved. We numerically investigate the performance of the jointly dynamic topic modeling approach using synthetic data. Finally, we apply the proposed model to two text corpora consisting of statistical papers and graduation theses. Results show that the proposed model recognizes the lead-lag relationship between the two corpora well, and that it also discovers the specific and shared topic patterns in the two corpora.
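
To make the notion of a lead-lag relationship concrete, the sketch below uses a plain lagged correlation between two topic-proportion time series. This is a deliberately simpler stand-in, not the jointly dynamic topic model proposed in the paper, and the name `best_lag` and its arguments are hypothetical.

```python
import numpy as np

def best_lag(lead, follower, max_lag):
    """Find the lag at which `follower` best tracks `lead`.

    lead, follower: 1-D arrays of equal length, e.g. the share of one topic
                    per time step in each corpus.
    max_lag:        largest lag (in time steps) to try.
    Returns the lag k in 0..max_lag that maximizes the correlation between
    lead[t] and follower[t + k], together with all correlation scores.
    """
    lead = np.asarray(lead, dtype=float)
    follower = np.asarray(follower, dtype=float)
    scores = []
    for k in range(max_lag + 1):
        a = lead[: len(lead) - k]     # lead at time t
        b = follower[k:]              # follower at time t + k
        scores.append(np.corrcoef(a, b)[0, 1])
    return int(np.argmax(scores)), scores
```

A positive best lag with a high correlation would suggest that the first corpus leads the second on that topic, under the strong assumption that topic proportions are already available for both corpora.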

Rethinking Architecture Design for Tackling Data Heterogeneity in Federated Learning

Jun 10, 2021
Liangqiong Qu, Yuyin Zhou, Paul Pu Liang, Yingda Xia, Feifei Wang, Li Fei-Fei, Ehsan Adeli, Daniel Rubin

Federated learning is an emerging research paradigm enabling collaborative training of machine learning models among different organizations while keeping data private at each institution. Despite recent progress, there remain fundamental challenges such as lack of convergence and potential for catastrophic forgetting in federated learning across real-world heterogeneous devices. In this paper, we demonstrate that attention-based architectures (e.g., Transformers) are fairly robust to distribution shifts and hence improve federated learning over heterogeneous data. Concretely, we conduct the first rigorous empirical investigation of different neural architectures across a range of federated algorithms, real-world benchmarks, and heterogeneous data splits. Our experiments show that simply replacing convolutional networks with Transformers can greatly reduce catastrophic forgetting of previous devices, accelerate convergence, and reach a better global model, especially when dealing with heterogeneous data. We will release our code and pretrained models at https://github.com/Liangqiong/ViT-FL-main to encourage future exploration in robust architectures as an alternative to current research efforts on the optimization front.
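
The abstract does not name the federated algorithms that were evaluated. As background, the sketch below shows the server-side aggregation step of FedAvg, one widely used federated algorithm, in which client parameters are averaged with weights proportional to local dataset size. The function name `fedavg` and the dictionary-of-arrays representation are assumptions for illustration.

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Aggregate client models by a dataset-size-weighted average.

    client_weights: list of dicts mapping parameter name -> np.ndarray,
                    one dict per client, all with the same keys and shapes.
    client_sizes:   list of local training-set sizes, one per client.
    Returns a dict with the averaged parameters for the global model.
    """
    total = float(sum(client_sizes))
    global_weights = {}
    for name in client_weights[0]:
        # Each client contributes in proportion to its share of the data.
        global_weights[name] = sum(
            w[name] * (n / total) for w, n in zip(client_weights, client_sizes)
        )
    return global_weights
```

With heterogeneous client data, the averaged model can drift away from what any single client learned, which is the setting in which the abstract argues that the choice of architecture matters.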
