R3P-Loc: A compact multi-label predictor using ridge regression and random projection for protein subcellular localization

Author(s): Wan, S; Mak, M-W; Kung, S-Y

To refer to this page use: http://arks.princeton.edu/ark:/88435/pr1z31np1m
Full metadata record
DC Field | Value | Language
dc.contributor.author | Wan, S | -
dc.contributor.author | Mak, M-W | -
dc.contributor.author | Kung, S-Y | -
dc.date.accessioned | 2024-01-20T17:50:22Z | -
dc.date.available | 2024-01-20T17:50:22Z | -
dc.date.issued | 2014 | en_US
dc.identifier.citation | Wan, S, Mak, M-W, Kung, S-Y. (2014). R3P-Loc: A compact multi-label predictor using ridge regression and random projection for protein subcellular localization. Journal of Theoretical Biology, 360, 34 - 45. doi:10.1016/j.jtbi.2014.06.031 | en_US
dc.identifier.uri | http://arks.princeton.edu/ark:/88435/pr1z31np1m | -
dc.description.abstract | Locating proteins within cellular contexts is of paramount significance in elucidating their biological functions. Computational methods based on knowledge databases (such as the gene ontology annotation (GOA) database) are known to be more efficient than sequence-based methods. However, knowledge-based methods typically face three problems: (1) knowledge databases are enormous and are growing exponentially, (2) knowledge databases contain redundant information, and (3) the number of features extracted from knowledge databases is much larger than the number of data samples with ground-truth labels. These properties make the extracted features liable to contain redundant or irrelevant information, causing the prediction systems to suffer from overfitting. To address these problems, this paper proposes an efficient multi-label predictor, namely R3P-Loc, which uses two compact databases for feature extraction and applies random projection (RP) to reduce the feature dimensions of an ensemble ridge regression (RR) classifier. Two new compact databases are created from the Swiss-Prot and GOA databases. These databases possess almost the same amount of information as their full-size counterparts but are much smaller. Experimental results on two recent datasets (eukaryote and plant) suggest that R3P-Loc can reduce the dimensions seven-fold and significantly outperforms state-of-the-art predictors. This paper also demonstrates that the compact databases reduce memory consumption by a factor of 39 without degrading prediction accuracy. For readers’ convenience, the R3P-Loc server is available online at http://bioinfo.eie.polyu.edu.hk/R3PLocServer/. | en_US
dc.format.extent | 34 - 45 | en_US
dc.language.iso | en_US | en_US
dc.relation.ispartof | Journal of Theoretical Biology | en_US
dc.rights | Author's manuscript | en_US
dc.title | R3P-Loc: A compact multi-label predictor using ridge regression and random projection for protein subcellular localization | en_US
dc.type | Journal Article | en_US
dc.identifier.doi | doi:10.1016/j.jtbi.2014.06.031 | -
pu.type.symplectic | http://www.symplectic.co.uk/publications/atom-terms/1.0/journal-article | en_US
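
The abstract describes random projection feeding an ensemble of ridge regression classifiers for multi-label prediction. The following is a minimal illustrative sketch of that general idea, not the authors' implementation. The Gaussian projection matrix, the score averaging across ensemble members, the 0.5 decision threshold, the "at least one label" rule, the synthetic data, and all function and parameter names are assumptions made for this example; the record itself does not specify them.

```python
import numpy as np


def fit_ridge(X, Y, rho=1.0):
    """Closed-form ridge regression: W = (X^T X + rho*I)^{-1} X^T Y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + rho * np.eye(d), X.T @ Y)


def fit_rp_ridge_ensemble(X, Y, proj_dim, n_members=5, rho=1.0, seed=0):
    """Train one ridge model per random projection (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    members = []
    for _ in range(n_members):
        # Assumed Gaussian random projection from X.shape[1] to proj_dim dims.
        R = rng.standard_normal((X.shape[1], proj_dim)) / np.sqrt(proj_dim)
        W = fit_ridge(X @ R, Y, rho)
        members.append((R, W))
    return members


def predict_scores(members, X):
    """Average the per-label scores over all ensemble members."""
    return np.mean([X @ R @ W for R, W in members], axis=0)


def predict_labels(scores, threshold=0.5):
    """Multi-label decision: every label scoring at or above the threshold,
    plus the top-scoring label so each sample gets at least one label."""
    labels = scores >= threshold
    labels[np.arange(len(scores)), scores.argmax(axis=1)] = True
    return labels


# Synthetic stand-in data: far more features than labelled samples.
rng = np.random.default_rng(42)
X_train = rng.standard_normal((60, 700))
Y_train = (rng.random((60, 4)) < 0.3).astype(float)  # 4 hypothetical locations
X_test = rng.standard_normal((10, 700))

# 700 -> 100 dimensions, a seven-fold reduction chosen only for illustration.
ensemble = fit_rp_ridge_ensemble(X_train, Y_train, proj_dim=100)
predicted = predict_labels(predict_scores(ensemble, X_test))
```

Here `predicted` is a boolean matrix with one row per test sample and one column per candidate location. Projecting before fitting keeps each linear system at `proj_dim` unknowns per label rather than one per original feature, which is the motivation for dimension reduction when features greatly outnumber labelled samples.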

Files in This Item:
File | Description | Size | Format
R3P_Loc_A_compact_multi_label_predictor.pdf | | 235.45 kB | Adobe PDF


Items in OAR@Princeton are protected by copyright, with all rights reserved, unless otherwise indicated.