Events

Seminar Series

Presenter:

Ning Zhang

Date:

03-13-2017

Time:

11:00AM-12:00PM

Location:

2206A Student Center

A Deep Neural Network Method for Predicting Mitochondrially Localized Proteins in Plants

Targeting and translocation of proteins to the appropriate subcellular compartments is crucial for cell organization and function. Some newly synthesized proteins are transported to mitochondria with the assistance of complex targeting sequences containing either an N-terminal pre-sequence or a multitude of internal signals. Compared with experimental approaches, computational predictions provide an efficient way to infer subcellular localization for any given protein. However, it is still challenging to predict plant mitochondrially localized proteins accurately due to various limitations. Consequently, the performance of current tools is unsatisfactory. We present a novel computational approach for large-scale prediction of plant mitochondrial proteins. We collected subcellular localization data for plant proteins from databases and the literature, and extracted different types of features from the training data, including amino acid composition, protein sequence profile, and gene co-expression information. We then trained deep neural networks for predicting plant mitochondrial proteins. Benchmarked on an independent dataset, our method achieves considerable improvements over existing tools in predicting mitochondrially localized proteins in plants. We improved the true positive rate by 10-30% over three state-of-the-art tools at similar specificity levels. We also applied our method to predict candidate mitochondrial proteins across the whole proteomes of Arabidopsis and potato.
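As a rough illustration of two ideas mentioned in the abstract, the sketch below computes an amino acid composition feature vector and the true positive rate and specificity at a score threshold. It is not the presenter's implementation: the function names, the choice of the 20 standard residues, and the thresholding rule are assumptions made for this example only.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (assumed feature order).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return the fraction of each standard amino acid in a protein sequence.

    Non-standard symbols (for example X) are ignored when computing fractions.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def tpr_and_specificity(labels, scores, threshold):
    """Compute (true positive rate, specificity) at a score threshold.

    labels: 1 for mitochondrial, 0 for non-mitochondrial.
    scores: predicted scores; a protein is called positive if score >= threshold.
    """
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= threshold)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return tpr, specificity
```

Comparing tools "at similar specificity levels" would then amount to choosing, for each tool, a threshold that yields roughly the same specificity and reading off the corresponding true positive rates.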

Seminar Series

Presenter:

Yang Liu

Date:

03-06-2017

Time:

11:00AM-12:00PM

Location:

2206A Student Center

Genomic Selection Using a Deep Learning Method

Genomic selection is an approach for improving quantitative traits at an early stage of plant and animal breeding programs using whole-genome molecular markers, and it is especially useful for species with long life cycles. It is based on the assumption that all quantitative trait loci (QTL) tend to be in linkage disequilibrium with at least one marker. Statistical methods such as ridge regression best linear unbiased prediction (RR-BLUP) [1], Bayes A [2], and Bayesian LASSO [3] are widely used for the genomic selection problem, which operates on a SNP matrix. Other machine learning methods (random forest, support vector machine, and neural network) [4] have also been applied to this problem. In this work, we are developing a deep learning method using a long short-term memory (LSTM) recurrent network on a public standard dataset of Pinus taeda (loblolly pine) [5]. The stem height trait (HT, cm) was measured across 861 individuals genotyped with 4,853 SNPs derived from 32 parents. The genomic estimated breeding values (GEBV) were calculated using 10-fold cross-validation, and accuracy was measured as the Pearson correlation coefficient between GEBV and observed values.
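The evaluation scheme described above (10-fold cross-validation scored by Pearson correlation) can be sketched as follows. This is a minimal illustration, not the presenter's code: the helper names are hypothetical, out-of-fold predictions are pooled before computing one correlation (the abstract does not say whether correlations are pooled or averaged per fold), and `allele_sum_regression` is a deliberately simple placeholder model standing in for the LSTM.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # correlation is undefined here; 0.0 is a convention (assumption)
    return cov / math.sqrt(vx * vy)


def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and split them into k nearly equal folds."""
    order = list(range(n))
    random.Random(seed).shuffle(order)
    return [order[i::k] for i in range(k)]


def cross_validated_accuracy(genotypes, phenotypes, fit_predict, k=10, seed=0):
    """Correlate pooled out-of-fold predictions (GEBV) with observed values."""
    n = len(phenotypes)
    gebv = [0.0] * n
    for fold in k_fold_indices(n, k, seed):
        held_out = set(fold)
        train = [i for i in range(n) if i not in held_out]
        predictions = fit_predict(
            [genotypes[i] for i in train],
            [phenotypes[i] for i in train],
            [genotypes[i] for i in fold],
        )
        for i, p in zip(fold, predictions):
            gebv[i] = p
    return pearson(gebv, phenotypes)


def allele_sum_regression(train_x, train_y, test_x):
    """Placeholder model: least-squares line on each individual's allele-count sum."""
    s = [sum(row) for row in train_x]
    ms, my = sum(s) / len(s), sum(train_y) / len(train_y)
    var = sum((a - ms) ** 2 for a in s)
    slope = sum((a - ms) * (b - my) for a, b in zip(s, train_y)) / var if var else 0.0
    return [my + slope * (sum(row) - ms) for row in test_x]
```

Any model that maps training genotypes and phenotypes to predictions for held-out genotypes can be passed as `fit_predict`, so the same loop could score RR-BLUP, a random forest, or an LSTM on equal terms.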

 

References:

[1] Hoerl, Arthur E., and Robert W. Kennard. “Ridge regression: biased estimation for nonorthogonal problems.” Technometrics 42.1 (2000): 80-86.

[2] Meuwissen, T. H. E., B. J. Hayes, and M. E. Goddard. “Prediction of total genetic value using genome-wide dense marker maps.” Genetics 157.4 (2001): 1819-1829.

[3] Park, Trevor, and George Casella. “The Bayesian lasso.” Journal of the American Statistical Association 103.482 (2008): 681-686.

[4] Heslot, Nicolas, et al. “Genomic selection in plant breeding: a comparison of models.” Crop Science 52.1 (2012): 146-160.

[5] Resende, Márcio FR, et al. “Accuracy of genomic selection methods in a standard data set of loblolly pine (Pinus taeda L.).” Genetics 190.4 (2012): 1503-1510.
