Published on 03/15/2017
Endometriosis, a complex and common gynecological disorder affecting 5–10% of reproductive-age women, is characterized by the growth of endometrial tissue outside the uterine cavity. Accumulating evidence indicates that various epigenetic aberrations are associated with endometriosis. In our study, we have methylation data and clinical information on 80 participants (36 controls and 44 cases) from a clinical study. Our objectives are to identify genomic regions associated with endometriosis and to identify specific genes associated with the disease after adjusting for potential confounding variables. We developed an in-house bioinformatics pipeline for methylation data analysis using several open-source tools, including FastQC, Bowtie2, and R packages. In this seminar, we will present the informatics approach used for the data analysis.
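The abstract does not describe the statistical model, so the following is only a hedged sketch of one common way to test for case-control methylation differences while adjusting for confounders. It fits an ordinary least-squares model at each CpG site on M-values (log2 of beta / (1 - beta)). The M-value transform, the covariates, and the function names are illustrative assumptions, not the authors' pipeline. A real analysis would also aggregate sites into regions and correct for multiple testing.

```python
import numpy as np


def beta_to_m(beta, eps=1e-6):
    """Convert methylation beta values (0-1) to M-values, log2(beta / (1 - beta))."""
    beta = np.clip(np.asarray(beta, dtype=float), eps, 1 - eps)
    return np.log2(beta / (1 - beta))


def adjusted_case_effects(m_values, is_case, covariates):
    """Per-CpG OLS of M-value on case status plus covariates.

    m_values:   samples x CpGs array of M-values
    is_case:    length-n array, 1 for endometriosis case, 0 for control
    covariates: samples x q array of potential confounders (hypothetical, e.g. age)
    Returns the case-status coefficient and its t-statistic for every CpG.
    """
    n = len(is_case)
    design = np.column_stack([np.ones(n), is_case, covariates])
    coef, _, _, _ = np.linalg.lstsq(design, m_values, rcond=None)
    resid = m_values - design @ coef
    dof = n - design.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / dof       # residual variance per CpG
    xtx_inv = np.linalg.inv(design.T @ design)
    se_case = np.sqrt(sigma2 * xtx_inv[1, 1])      # standard error of the case coefficient
    return coef[1], coef[1] / se_case
```

Because the design matrix is shared by all CpGs, a single `lstsq` call fits every site at once. Covariates such as age enter as extra columns, so the case coefficient is estimated conditional on them.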
Published on 03/13/2017
Targeting and translocation of proteins to the appropriate subcellular compartments are crucial for cell organization and function. Some newly synthesized proteins are transported to mitochondria with the help of complex targeting sequences containing either an N-terminal presequence or multiple internal signals. Compared with experimental approaches, computational prediction offers an efficient way to infer the subcellular localization of any given protein. However, accurately predicting mitochondrially localized proteins in plants remains challenging due to various limitations, and the performance of current tools is unsatisfactory. We present a novel computational approach for large-scale prediction of plant mitochondrial proteins. We collected subcellular localization data for plant proteins from databases and the literature, and extracted several types of features from the training data, including amino acid composition, protein sequence profiles, and gene co-expression information. We then trained deep neural networks to predict plant mitochondrial proteins. Benchmarked on an independent dataset, our method achieves considerable improvements over existing tools in predicting mitochondria-localized proteins in plants, improving the true positive rate by 10–30% over three state-of-the-art tools at similar specificity levels. We also applied our method to predict candidate mitochondrial proteins across the whole proteomes of Arabidopsis and potato.
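Amino acid composition is the simplest of the feature types listed above. The sketch below shows one way it might be computed. The optional N-terminal window (30 residues here) is a hypothetical choice meant to emphasize the region where a presequence would lie. Sequence profiles, co-expression features, and the neural network itself are not shown.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, in a fixed order


def amino_acid_composition(sequence, window=None):
    """Return the 20-dimensional amino acid composition (AAC) of a protein.

    If `window` is given, only the first `window` residues are used (a hypothetical
    way to focus on the N-terminal region). Non-standard letters are ignored.
    """
    seq = sequence.upper()
    if window is not None:
        seq = seq[:window]
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for residue in seq:
        if residue in counts:
            counts[residue] += 1
    total = sum(counts.values())
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def feature_vector(sequence, window=30):
    """Concatenate whole-protein and N-terminal AAC into one 40-dimensional vector."""
    return amino_acid_composition(sequence) + amino_acid_composition(sequence, window)
```

Fixing the residue order in `AMINO_ACIDS` guarantees that every protein maps to a vector with the same layout, which is what a neural network input layer requires.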
Published on 03/05/2017
Genomic selection is an approach for improving quantitative traits at an early stage in plant and animal breeding programs using genome-wide molecular markers, and it is especially useful for species with long life cycles. It is based on the assumption that all quantitative trait loci (QTL) tend to be in linkage disequilibrium with at least one marker. Statistical methods such as ridge regression best linear unbiased prediction (RR-BLUP), Bayes A, and the Bayesian LASSO are widely used for genomic selection on SNP marker matrices. Other machine learning methods (random forest, support vector machines, and neural networks) have also been applied to this problem. In this work, we are developing a deep learning method using a long short-term memory (LSTM) recurrent network on a public standard dataset of Pinus taeda (loblolly pine). Stem height (HT, cm) was measured on 861 individuals derived from 32 parents and genotyped with 4,853 SNPs. Genomic estimated breeding values (GEBVs) were calculated using 10-fold cross-validation, and accuracy was measured as the Pearson correlation coefficient between GEBVs and observed values.
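The LSTM model is not described in enough detail to reproduce, but the evaluation protocol and the RR-BLUP baseline are standard. The sketch below predicts GEBVs by ridge regression on a genotype matrix coded 0/1/2 and scores them with 10-fold cross-validation and Pearson correlation. The shrinkage parameter `lam` is set by hand here. In RR-BLUP it is usually the ratio of residual to marker-effect variance, estimated by REML. Pooling out-of-fold predictions before correlating is one choice; averaging per-fold correlations is another.

```python
import numpy as np


def ridge_gebv(X_train, y_train, X_test, lam):
    """Predict GEBVs with ridge regression on SNP genotypes (RR-BLUP-style).

    Uses the dual (n x n) form, which is cheaper when SNPs outnumber individuals:
    beta = Xc' (Xc Xc' + lam I)^-1 (y - mean(y)).
    """
    mu = y_train.mean()
    x_mean = X_train.mean(axis=0)
    Xc = X_train - x_mean
    n = Xc.shape[0]
    alpha = np.linalg.solve(Xc @ Xc.T + lam * np.eye(n), y_train - mu)
    marker_effects = Xc.T @ alpha
    return mu + (X_test - x_mean) @ marker_effects


def cv_accuracy(X, y, lam, k=10, seed=0):
    """k-fold CV; accuracy = Pearson r between pooled out-of-fold GEBVs and observed y."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    gebv = np.empty(len(y))
    for test_idx in folds:
        train_idx = np.setdiff1d(np.arange(len(y)), test_idx)
        gebv[test_idx] = ridge_gebv(X[train_idx], y[train_idx], X[test_idx], lam)
    return np.corrcoef(gebv, y)[0, 1]
```

With 861 individuals and 4,853 SNPs, the dual form solves an 861 x 861 system rather than a 4,853 x 4,853 one. It yields the same marker effects as the primal ridge solution.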
Hoerl, Arthur E., and Robert W. Kennard. "Ridge regression: biased estimation for nonorthogonal problems." Technometrics 42.1 (2000): 80–86.
Meuwissen, T. H. E., B. J. Hayes, and M. E. Goddard. "Prediction of total genetic value using genome-wide dense marker maps." Genetics 157.4 (2001): 1819–1829.
Park, Trevor, and George Casella. "The Bayesian lasso." Journal of the American Statistical Association 103.482 (2008): 681–686.
Heslot, Nicolas, et al. "Genomic selection in plant breeding: a comparison of models." Crop Science 52.1 (2012): 146–160.
Resende, Márcio F. R., et al. "Accuracy of genomic selection methods in a standard data set of loblolly pine (Pinus taeda L.)." Genetics 190.4 (2012): 1503–1510.
Published on 02/16/2017
The ePHR (electronic Personal Health Record) is a self-service technology (SST) used in health care that can serve as an electronic information source for patients, physicians, and the government. A literature review identified concerns about and barriers to ePHR implementation from industry, physician, patient, and technology perspectives. In a pilot study of the impact of ePHR implementation on physician workflow, we conducted a qualitative analysis of structured physician interviews and a quantitative analysis of physician workflow observations. We aim to develop recommendations for ePHR implementation across a variety of physician practices, and we also discuss other important implementation components: communication, training, and measurement.
Published on 02/10/2017
Auxins are a class of plant hormones (phytohormones) that play an active role in growth and development. The auxin control pathway in the maize meristem is not well studied and offers significant scope for in-depth exploration, since previous studies have revealed little new information about how auxin is regulated. Weighted gene co-expression network analysis takes differentially expressed genes and organizes them into clusters, or modules, that suggest possible interactions and co-regulation. These modules often highlight distinctive interactions that most other methods cannot detect. We present our ongoing work on building gene co-regulatory networks for the maize meristem auxin network using selected gene knockouts.
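The core of weighted gene co-expression network analysis can be sketched in a few lines. Pairwise correlations are raised to a soft-thresholding power to form a weighted adjacency matrix. That matrix is converted to a topological overlap matrix (TOM), which measures how strongly two genes share neighbors. The sketch below assumes an unsigned network and uses β = 6 only as a placeholder. In practice β is chosen with a scale-free topology criterion, and modules come from hierarchical clustering on 1 - TOM (for example with `scipy.cluster.hierarchy.linkage`), which is not shown.

```python
import numpy as np


def soft_threshold_adjacency(expr, beta=6):
    """Unsigned weighted adjacency a_ij = |cor(x_i, x_j)|^beta.

    expr: samples x genes expression matrix. The diagonal is zeroed so that
    self-connections do not count toward connectivity.
    """
    corr = np.corrcoef(expr, rowvar=False)
    adj = np.abs(corr) ** beta
    np.fill_diagonal(adj, 0.0)
    return adj


def topological_overlap(adj):
    """TOM_ij = (l_ij + a_ij) / (min(k_i, k_j) + 1 - a_ij).

    l_ij = sum_u a_iu * a_uj (terms u = i, j vanish because the diagonal is 0),
    k_i  = sum_u a_iu is the connectivity of gene i.
    """
    shared = adj @ adj
    k = adj.sum(axis=1)
    k_min = np.minimum.outer(k, k)
    tom = (shared + adj) / (k_min + 1.0 - adj)
    np.fill_diagonal(tom, 1.0)
    return tom
```

TOM is bounded by 1 because the shared-neighbor term l_ij can never exceed min(k_i, k_j) - a_ij. Genes that share many strong neighbors therefore score near 1 even if their direct correlation is moderate.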
Published on 02/03/2017
Health communication is the process that coordinates health services such as specimen transactions, oral interactions, medical records, and more. Healthcare workflows are based on communication patterns established historically through the practice of healthcare or by the leadership of health institutions. In practice, however, communication does not always flow according to plan: interpersonal miscommunication, technical glitches, information overload, and similar problems can make healthcare services inefficient. We hypothesize that health records contain information about communication that can be retrieved to address communication problems. We present an informatics pipeline that retrieves health communication episodes from unstructured health data. The method uses the Resource Description Framework (RDF), ontological modeling, and description logic inference to uncover and quantify implicit communication episodes. The retrieved communication data could help optimize and improve health communication structures, especially in data-intensive settings such as precision medicine.
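To show how inference can surface communication episodes that are only implicit in the data, here is a deliberately simplified stand-in. It is plain Python over (subject, predicate, object) triples and applies only two RDFS entailment rules: subclass transitivity and type propagation. This is far weaker than the description logic reasoning the abstract describes. The `ex:` class names and events are hypothetical. A real pipeline would use an RDF library and an OWL reasoner.

```python
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"

# Hypothetical ontology fragment plus events extracted from clinical notes.
EXAMPLE_TRIPLES = {
    ("ex:PhoneCall", SUBCLASS_OF, "ex:OralInteraction"),
    ("ex:OralInteraction", SUBCLASS_OF, "ex:CommunicationEpisode"),
    ("ex:SpecimenHandoff", SUBCLASS_OF, "ex:CommunicationEpisode"),
    ("ex:event1", RDF_TYPE, "ex:PhoneCall"),
    ("ex:event2", RDF_TYPE, "ex:SpecimenHandoff"),
    ("ex:event3", RDF_TYPE, "ex:OralInteraction"),
    ("ex:event4", RDF_TYPE, "ex:LabResult"),
}


def infer_types(triples):
    """Forward-chain two RDFS rules until no new triples appear (a fixpoint)."""
    closed = set(triples)
    while True:
        sub = [(s, o) for s, p, o in closed if p == SUBCLASS_OF]
        new = set()
        # rdfs11: A subClassOf B and B subClassOf C  =>  A subClassOf C
        for a, b in sub:
            for b2, c in sub:
                if b == b2:
                    new.add((a, SUBCLASS_OF, c))
        # rdfs9: x type A and A subClassOf B  =>  x type B
        for s, p, o in closed:
            if p == RDF_TYPE:
                for a, b in sub:
                    if o == a:
                        new.add((s, RDF_TYPE, b))
        if new <= closed:
            return closed
        closed |= new


def count_instances(triples, cls):
    """Quantify how many resources are (explicitly or by inference) of class `cls`."""
    return sum(1 for s, p, o in triples if p == RDF_TYPE and o == cls)
```

None of the example events is typed directly as `ex:CommunicationEpisode`. After inference, three of them are (`event1` through `event3`), which is the sense in which reasoning uncovers and quantifies implicit episodes.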
Published on 01/24/2017
With the advent of next-generation sequencing technologies, considerable effort has been put into sequencing the epigenomes of different species. Efforts such as the ENCODE and Roadmap Epigenomics projects provide an opportunity to compare epigenomes across species, especially between human and mouse. This study aims to understand how different histone modifications vary or co-occur between orthologous regions of the two species. We also use several measures of similarity between each pair of orthologous genes and explore how histone modifications are conserved as these similarity measures change. These measures include gene ontology semantic similarity (GOSemsim), codon usage frequency similarity (CUFS), the Ka/Ks ratio, and gene expression similarity. Our simulations indicate that the evolutionary selection pressure on an orthologous pair (its Ka/Ks ratio) correlates more strongly with its histone modifications than any other similarity measure does.
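The abstract does not state which correlation statistic was used. As a sketch under that uncertainty, the block below ranks similarity measures by the absolute Spearman correlation between each measure and a per-pair histone-modification similarity score, with ties given average ranks. How the histone score is computed is assumed to be handled elsewhere, and the measure names in the usage note are placeholders.

```python
import numpy as np


def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their ranks."""
    values = np.asarray(values, dtype=float)
    order = np.argsort(values, kind="mergesort")
    sorted_vals = values[order]
    ranks = np.empty(len(values))
    i = 0
    while i < len(values):
        j = i
        while j + 1 < len(values) and sorted_vals[j + 1] == sorted_vals[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return np.corrcoef(average_ranks(x), average_ranks(y))[0, 1]


def strongest_measure(measures, histone_similarity):
    """Return the measure whose |Spearman rho| with histone similarity is largest."""
    scores = {name: spearman(vals, histone_similarity) for name, vals in measures.items()}
    best = max(scores, key=lambda name: abs(scores[name]))
    return best, scores
```

For example, `strongest_measure({"Ka/Ks": kaks, "GOSemsim": gosem}, histone)` returns the winning name plus every score. A strongly negative rho for Ka/Ks would fit the expectation that pairs under stronger purifying selection (lower Ka/Ks) keep more similar histone marks.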
Published on 01/18/2017
Early childhood home visiting programs date back to the 1880s and deliver a vital public service: providing families with, and connecting them to, health, educational, and economic resources that support optimal development. Continuous quality improvement (CQI) consists of systematic and continuous actions that lead to measurable improvement in services for targeted groups. CQI initiatives (CQII) in home visiting programs have traditionally occurred within a local implementing agency (LIA), parent organization, or funding provision. As a result, the lessons of an LIA's CQII are often lost to external agencies that face similar challenges and could benefit from them. We developed a web-based environment, the Gateway, to virtually connect and engage users in an environment designed to balance CQII training and practice. The environment supports CQII activities that promote LIAs' quality improvement initiatives while aligning stakeholders from seven Missouri home visiting LIAs. The Gateway standardizes quality improvement training, collates overlapping resources, and supports knowledge translation, thereby improving capacity for measurable change in organizational initiatives. It allows LIA personnel to identify program activities in need of quality improvement and guides the planning, implementation, and evaluation of CQII. Before the site launched, pilot and usability testing was conducted with three defined groups, yielding positive results and a combined System Usability Scale score of 71.63. After full launch, we examined performance relative to targets by integrating data, dashboards, and reports. To our knowledge, a virtual environment aimed at creating a culture of quality improvement and fostering CQII for home visiting program LIAs has not been previously reported. Given the broad focus on CQI priorities across disciplines, the Gateway has considerable potential for expansion and for deployment to programs and agencies beyond the population it was built for.
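The System Usability Scale score follows a fixed formula. Each of the ten items is answered on a 1–5 scale. Odd-numbered (positively worded) items contribute response - 1, even-numbered (negatively worded) items contribute 5 - response, and the sum is multiplied by 2.5 to give a 0–100 score. The abstract does not say how the three groups were combined into 71.63. The sketch below simply assumes the mean over all respondents.

```python
def sus_score(responses):
    """Standard SUS score (0-100) for one respondent's ten answers on a 1-5 scale."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = 0
    for item_number, r in enumerate(responses, start=1):
        if not 1 <= r <= 5:
            raise ValueError("SUS responses must be between 1 and 5")
        # Odd items are positively worded, even items negatively worded.
        total += (r - 1) if item_number % 2 == 1 else (5 - r)
    return total * 2.5


def mean_sus(all_responses):
    """Assumed pooling rule: average the per-respondent SUS scores."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)
```

A respondent who answers 3 ("neutral") to every item scores exactly 50. Scores around 68 are commonly treated as average usability, so 71.63 sits somewhat above that benchmark.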
Published on 11/30/2016
The Neosho madtom (Noturus placidus) is a small catfish, generally less than 3 inches long, found only in the Neosho-Spring River system within the Arkansas River Basin. It was federally listed as threatened in 1990, largely due to habitat loss. As part of conservation efforts, we generated whole-genome Illumina paired-end sequence data from ten Neosho madtoms (average 39X coverage) originating from three geographically separated subpopulations to evaluate genetic diversity and population structure. One slender madtom (Noturus exilis) was also sequenced as an outgroup. Although over 1 million variants were found between the Neosho and slender madtoms, only 86,155 SNPs were variable among the Neosho madtoms sequenced, indicating low overall genetic diversity. In addition, principal component analysis of these genotypes indicated weak population structure, suggesting that the subpopulations are genetically compatible for reintroduction among locations. Using only 50X coverage of paired-end and mate-pair data, we assembled the Neosho madtom genome into 68,147 scaffolds with a scaffold N50 of 120 kb. This demonstrates the value of assembling a genome from a population that is closely related to a species of economic interest (here, channel catfish, Ictalurus punctatus) but has lower genetic diversity and is easier to assemble. Ongoing efforts aim to improve the assembly and to develop demographic models and genomic resources that investigate the basic biology of why such a low-diversity species can persist and that assist future conservation efforts.
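The scaffold N50 quoted above is the length L such that scaffolds of length L or longer together contain at least half of the assembled bases. The sketch below computes it from a list of scaffold lengths. Reading the lengths from a FASTA file is left out, and the function name is illustrative.

```python
def n50(lengths):
    """N50: the largest length L such that scaffolds >= L cover at least half the assembly."""
    if not lengths:
        raise ValueError("need at least one scaffold length")
    sorted_lengths = sorted(lengths, reverse=True)
    half = sum(sorted_lengths) / 2
    running = 0
    for length in sorted_lengths:
        running += length
        if running >= half:
            return length
```

For scaffolds of lengths 2 through 10 (54 bases in total), the running sum from the longest scaffold reaches 27 at the 8-unit scaffold, so the N50 is 8. A 120 kb N50 therefore means at least half of the madtom assembly lies in scaffolds of 120 kb or longer.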
Published on 11/08/2016
Polyploidy is an important mechanism in plant evolution. We are interested in how selective pressures change after a lineage experiences a whole genome duplication (WGD) or triplication (WGT). The alpha duplication is the most recent WGD event in Arabidopsis, and a WGT event later occurred in the genus Brassica after its lineage diverged from Arabidopsis thaliana. We examined selection at both the population and species levels by calculating the ratio of nonsynonymous to synonymous polymorphisms (pN/pS) and computing Ka/Ks between species. In both the Arabidopsis and Brassica lineages, pN/pS values are larger than Ka/Ks values, consistent with the expectation that most populations include individuals carrying mildly deleterious mutations that will eventually be removed by purifying selection. Naïve models of evolution after gene duplication would suggest that duplicates should experience a period of relaxed purifying selection because of genetic redundancy. However, we found that alpha duplicates are actually under stronger constraint than other genes. Similarly, triplicated Brassica rapa genes have smaller pN/pS and Ka/Ks than single-copy genes. This indicates that the special classes of genes surviving after polyploidy remain under relatively strong selective pressure. Next, we mapped pN/pS and Ka/Ks onto the Arabidopsis thaliana metabolic network and looked for correlations between selection and network statistics. We found that pN/pS and Ka/Ks are more variable for nodes with lower degree, while nodes with higher degree are more likely to have smaller pN/pS and Ka/Ks. These results suggest that more “important” nodes in the network are generally under stronger selective constraint.
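pN/pS normalizes polymorphism counts by the number of sites at which each kind of change is possible: pN = (nonsynonymous polymorphisms) / (nonsynonymous sites) and pS = (synonymous polymorphisms) / (synonymous sites). The sketch below assumes the site counts have already been computed (for example by a Nei-Gojobori-style site count) and uses raw proportions. Between-species Ka/Ks estimates usually add a multiple-hit correction such as Jukes-Cantor, which is omitted here.

```python
def polymorphism_ratio(nonsyn_polymorphisms, nonsyn_sites, syn_polymorphisms, syn_sites):
    """Uncorrected pN/pS for one gene.

    nonsyn_sites / syn_sites: numbers of nonsynonymous and synonymous sites
    in the coding sequence (assumed precomputed).
    """
    if nonsyn_sites <= 0 or syn_sites <= 0:
        raise ValueError("site counts must be positive")
    p_n = nonsyn_polymorphisms / nonsyn_sites
    p_s = syn_polymorphisms / syn_sites
    if p_s == 0:
        raise ValueError("no synonymous polymorphisms: pN/pS is undefined")
    return p_n / p_s
```

A gene with 10 nonsynonymous polymorphisms over 750 nonsynonymous sites and 20 synonymous polymorphisms over 250 synonymous sites gives pN/pS = (1/75) / (2/25) = 1/6. A value well below 1 like this indicates purifying selection.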