Combining Cost-Sensitive Classification with Negative Selection for Protein Function Prediction

May 18, 2018
Marco Frasca, Nicolò Cesa Bianchi

Motivation: Computational methods play a central role in annotating the functions of the large number of proteins delivered by high-throughput technologies. Despite the encouraging results achieved by these methods, many functions still have very few verified protein annotations, leading to a pronounced imbalance between annotated and unannotated proteins. Furthermore, functional taxonomies rarely report negative annotations. This leaves the set of negative examples ill defined, even though it is crucial for training most machine learning methods. In practice, neglecting data imbalance and the problem of selecting negative examples can strongly limit the accuracy of protein function prediction.

Results: We present a novel approach that combines an imbalance-aware classification strategy, which addresses the scarcity of annotated proteins, with an active learning strategy for selecting the most reliable negative examples. When implemented in a Support Vector Machine, this combined approach shows improved accuracy on yeast and human proteomes over a standard SVM and top-performing function prediction tools.
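The two ingredients named in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: the function names, the toy scores, the "lowest score means most reliable negative" rule, and the inverse-frequency weighting are all assumptions chosen only to make the idea concrete. In a real pipeline the resulting weights would be passed to a cost-sensitive classifier such as an SVM with per-class misclassification costs.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Inverse-frequency weights: n_samples / (n_classes * class_count).

    Assumption: a common heuristic for cost-sensitive learning,
    not necessarily the cost scheme used in the paper.
    """
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def select_negatives(scores, annotated, n_neg):
    """Return the n_neg unannotated proteins with the lowest scores.

    Assumption: `scores` comes from some preliminary ranking, and a low
    score is treated as evidence that the protein is a reliable negative.
    """
    candidates = [p for p in scores if p not in annotated]
    return sorted(candidates, key=lambda p: scores[p])[:n_neg]


# Hypothetical toy data: 10 proteins, 2 of them annotated (positives).
raw = [0.9, 0.8, 0.7, 0.1, 0.4, 0.2, 0.6, 0.05, 0.3, 0.5]
scores = {f"P{i}": s for i, s in enumerate(raw)}
annotated = {"P0", "P1"}

# Step 1: choose a subset of unannotated proteins as negatives.
negatives = select_negatives(scores, annotated, n_neg=4)

# Step 2: weight classes so that errors on the rare positives cost more.
labels = [1] * len(annotated) + [0] * len(negatives)
weights = balanced_class_weights(labels)

print(negatives)
print(weights)
```

In this toy setting the rare positive class receives twice the weight of the selected negatives, so a misclassified annotated protein is penalized more heavily during training.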
