Bio


I am an Assistant Professor of Biomedical Data Science and, by courtesy, of Computer Science and Electrical Engineering at Stanford University. I work on a wide range of problems in machine learning (from proving mathematical properties to building large-scale algorithms) and am especially interested in applications in biotech and health. I received a Ph.D. from Harvard in 2014, and was at one time a member of Microsoft Research, a Gates Scholar at the University of Cambridge and a Simons fellow at U.C. Berkeley. I joined Stanford in 2016 and am excited to be an inaugural Chan-Zuckerberg Investigator and the faculty director of the AI for Health program. My group is also a part of the Stanford AI Lab. My research is supported by the NSF CAREER Award, and the Google and Tencent AI awards.

Academic Appointments


Current Research and Scholarly Interests


My group works on both foundations of statistical machine learning and applications in biomedicine and healthcare. We develop new technologies that make ML more accountable to humans and more reliable and robust, and that reveal core scientific insights.

We want our ML to be impactful and beneficial, and as such, we are deeply motivated by transformative applications in biotech and health. We collaborate with and advise many academic and industry groups.

2019-20 Courses


Stanford Advisees


  • Doctoral Dissertation Reader (AC)
    Louis Blankemeier, Jiaqi Jiang, Michael Kim, Ismael Lemhadri, Greg McInnes, Stephen Pfohl, Meltem Tolunay, Jessica Torres
  • Orals Chair
    Katherine McNamara
  • Postdoctoral Faculty Sponsor
    Roxana Daneshjou
  • Doctoral Dissertation Advisor (AC)
    Abubakar Abid, Amirata Ghorbani, Tony Ginart, Ruishan Liu, Jaime Roquero Gimenez, Zhenqin Wu
  • Orals Evaluator
    Michael Kim, Avanti Shrikumar
  • Master's Program Advisor
    Anthony Carrington, Soham Gadgil, Alexander Verge, Yuhui Zhang
  • Doctoral (Program)
    Lingjiao Chen, Bryan He, John Hughes, Garrett Thomas, Kevin Wu
  • Postdoctoral Research Mentor
    Yongchan Kwon

All Publications


  • Integrating spatial gene expression and breast tumour morphology via deep learning. Nature biomedical engineering He, B., Bergenstrahle, L., Stenbeck, L., Abid, A., Andersson, A., Borg, A., Maaskola, J., Lundeberg, J., Zou, J. 2020

    Abstract

    Spatial transcriptomics allows for the measurement of RNA abundance at a high spatial resolution, making it possible to systematically link the morphology of cellular neighbourhoods and spatially localized gene expression. Here, we report the development of a deep learning algorithm for the prediction of local gene expression from haematoxylin-and-eosin-stained histopathology images using a new dataset of 30,612 spatially resolved gene expression data matched to histopathology images from 23 patients with breast cancer. We identified over 100 genes, including known breast cancer biomarkers of intratumoral heterogeneity and the co-localization of tumour growth and immune activation, the expression of which can be predicted from the histopathology images at a resolution of 100 µm. We also show that the algorithm generalizes well to The Cancer Genome Atlas and to other breast cancer gene expression datasets without the need for re-training. Predicting the spatially resolved transcriptome of a tissue directly from tissue images may enable image-based screening for molecular biomarkers with spatial variation.

    View details for DOI 10.1038/s41551-020-0578-x

    View details for PubMedID 32572199

  • How Much Does Your Data Exploration Overfit? Controlling Bias via Information Usage IEEE TRANSACTIONS ON INFORMATION THEORY Russo, D., Zou, J. 2020; 66 (1): 302–23
  • Video-based AI for beat-to-beat assessment of cardiac function. Nature Ouyang, D., He, B., Ghorbani, A., Yuan, N., Ebinger, J., Langlotz, C. P., Heidenreich, P. A., Harrington, R. A., Liang, D. H., Ashley, E. A., Zou, J. Y. 2020; 580 (7802): 252–56

    Abstract

    Accurate assessment of cardiac function is crucial for the diagnosis of cardiovascular disease, screening for cardiotoxicity and decisions regarding the clinical management of patients with a critical illness. However, human assessment of cardiac function focuses on a limited sampling of cardiac cycles and has considerable inter-observer variability despite years of training. Here, to overcome this challenge, we present a video-based deep learning algorithm, EchoNet-Dynamic, that surpasses the performance of human experts in the critical tasks of segmenting the left ventricle, estimating ejection fraction and assessing cardiomyopathy. Trained on echocardiogram videos, our model accurately segments the left ventricle with a Dice similarity coefficient of 0.92, predicts ejection fraction with a mean absolute error of 4.1% and reliably classifies heart failure with reduced ejection fraction (area under the curve of 0.97). In an external dataset from another healthcare system, EchoNet-Dynamic predicts the ejection fraction with a mean absolute error of 6.0% and classifies heart failure with reduced ejection fraction with an area under the curve of 0.96. Prospective evaluation with repeated human measurements confirms that the model has variance that is comparable to or less than that of human experts. By leveraging information across multiple cardiac cycles, our model can rapidly identify subtle changes in ejection fraction, is more reproducible than human evaluation and lays the foundation for precise diagnosis of cardiovascular disease in real time. As a resource to promote further innovation, we also make publicly available a large dataset of 10,030 annotated echocardiogram videos.

    View details for DOI 10.1038/s41586-020-2145-8

    View details for PubMedID 32269341

  • Fast and covariate-adaptive method amplifies detection power in large-scale multiple hypothesis testing. Nature communications Zhang, M. J., Xia, F., Zou, J. 2019; 10 (1): 3433

    Abstract

    Multiple hypothesis testing is an essential component of modern data science. In many settings, in addition to the p-value, additional covariates for each hypothesis are available, e.g., functional annotation of variants in genome-wide association studies. Such information is ignored by popular multiple testing approaches such as the Benjamini-Hochberg procedure (BH). Here we introduce AdaFDR, a fast and flexible method that adaptively learns the optimal p-value threshold from covariates to significantly improve detection power. On eQTL analysis of the GTEx data, AdaFDR discovers 32% more associations than BH at the same false discovery rate. We prove that AdaFDR controls false discovery proportion and show that it makes substantially more discoveries while controlling false discovery rate (FDR) in extensive experiments. AdaFDR is computationally efficient and allows multi-dimensional covariates with both numeric and categorical values, making it broadly useful across many applications.

    View details for DOI 10.1038/s41467-019-11247-0

    View details for PubMedID 31366926

  • Large dataset enables prediction of repair after CRISPR-Cas9 editing in primary T cells. Nature biotechnology Leenay, R. T., Aghazadeh, A., Hiatt, J., Tse, D., Roth, T. L., Apathy, R., Shifrut, E., Hultquist, J. F., Krogan, N., Wu, Z., Cirolia, G., Canaj, H., Leonetti, M. D., Marson, A., May, A. P., Zou, J. 2019

    Abstract

    Understanding of repair outcomes after Cas9-induced DNA cleavage is still limited, especially in primary human cells. We sequence repair outcomes at 1,656 on-target genomic sites in primary human T cells and use these data to train a machine learning model, which we have called CRISPR Repair Outcome (SPROUT). SPROUT accurately predicts the length, probability and sequence of nucleotide insertions and deletions, and will facilitate design of SpCas9 guide RNAs in therapeutically important primary human cells.

    View details for DOI 10.1038/s41587-019-0203-2

    View details for PubMedID 31359007

  • Clinical Genetics Lacks Standard Definitions and Protocols for the Collection and Use of Diversity Measures. American journal of human genetics Popejoy, A. B., Crooks, K. R., Fullerton, S. M., Hindorff, L. A., Hooker, G. W., Koenig, B. A., Pino, N., Ramos, E. M., Ritter, D. I., Wand, H., Wright, M. W., Yudell, M., Zou, J. Y., Plon, S. E., Bustamante, C. D., Ormond, K. E., Clinical Genome Resource (ClinGen) Ancestry and Diversity Working Group 2020

    Abstract

    Genetics researchers and clinical professionals rely on diversity measures such as race, ethnicity, and ancestry (REA) to stratify study participants and patients for a variety of applications in research and precision medicine. However, there are no comprehensive, widely accepted standards or guidelines for collecting and using such data in clinical genetics practice. Two NIH-funded research consortia, the Clinical Genome Resource (ClinGen) and Clinical Sequencing Evidence-generating Research (CSER), have partnered to address this issue and report how REA are currently collected, conceptualized, and used. Surveying clinical genetics professionals and researchers (n = 448), we found heterogeneity in the way REA are perceived, defined, and measured, with variation in the perceived importance of REA in both clinical and research settings. The majority of respondents (>55%) felt that REA are at least somewhat important for clinical variant interpretation, ordering genetic tests, and communicating results to patients. However, there was no consensus on the relevance of REA, including how each of these measures should be used in different scenarios and what information they can convey in the context of human genetics. A lack of common definitions and applications of REA across the precision medicine pipeline may contribute to inconsistencies in data collection, missing or inaccurate classifications, and misleading or inconclusive results. Thus, our findings support the need for standardization and harmonization of REA data collection and use in clinical genetics and precision health research.

    View details for DOI 10.1016/j.ajhg.2020.05.005

    View details for PubMedID 32504544

  • RNA-GPS predicts high-resolution RNA subcellular localization and highlights the role of splicing. RNA (New York, N.Y.) Wu, K. E., Parker, K. R., Fazal, F. M., Chang, H., Zou, J. 2020

    Abstract

    Subcellular localization is essential to RNA biogenesis, processing, and function across the gene expression life cycle. However, the specific nucleotide sequence motifs that direct RNA localization are incompletely understood. Fortunately, new sequencing technologies have provided transcriptome-wide atlases of RNA localization, creating an opportunity to leverage computational modeling. Here we present RNA-GPS, a new machine learning model that uses nucleotide-level features to predict RNA localization across 8 different subcellular locations - the first to provide such a wide range of predictions. RNA-GPS's design enables high throughput sequence ablation and feature importance analyses to probe the sequence motifs that drive localization prediction. We find localization informative motifs to be concentrated on 3' UTRs and scattered along the coding sequence, and motifs related to splicing to be important drivers of predicted localization, even for cytotopic distinctions for membraneless bodies within the nucleus or for organelles within the cytoplasm. Overall, our results suggest transcript splicing is one of many elements influencing RNA subcellular localization.

    View details for DOI 10.1261/rna.074161.119

    View details for PubMedID 32220894

  • A benchmark of algorithms for the analysis of pooled CRISPR screens. Genome biology Bodapati, S., Daley, T. P., Lin, X., Zou, J., Qi, L. S. 2020; 21 (1): 62

    Abstract

    Genome-wide pooled CRISPR-Cas-mediated knockout, activation, and repression screens are powerful tools for functional genomic investigations. Despite their increasing importance, there is currently little guidance on how to design and analyze CRISPR-pooled screens. Here, we provide a review of the commonly used algorithms in the computational analysis of pooled CRISPR screens. We develop a comprehensive simulation framework to benchmark and compare the performance of these algorithms using both synthetic and real datasets. Our findings inform parameter choices of CRISPR screens and provide guidance to researchers on the design and analysis of pooled CRISPR screens.

    View details for DOI 10.1186/s13059-020-01972-x

    View details for PubMedID 32151271

  • RNA-GPS Predicts SARS-CoV-2 RNA Localization to Host Mitochondria and Nucleolus. bioRxiv : the preprint server for biology Wu, K., Zou, J., Chang, H. Y. 2020

    Abstract

    The SARS-CoV-2 coronavirus is driving a global pandemic, but its biological mechanisms are less well understood. SARS-CoV-2 is an RNA virus whose multiple genomic and subgenomic RNA (sgRNA) transcripts hijack the host cell's machinery, located across distinct cytotopic locations. Subcellular localization of its viral RNA could play important roles in viral replication and host antiviral immune response. Here we perform computational modeling of SARS-CoV-2 viral RNA localization across eight subcellular neighborhoods. We compare hundreds of SARS-CoV-2 genomes to the human transcriptome and other coronaviruses and perform systematic sub-sequence analyses to identify the responsible signals. Using state-of-the-art machine learning models, we predict that the SARS-CoV-2 RNA genome and all sgRNAs are enriched in the host mitochondrial matrix and nucleolus. The 5' and 3' viral untranslated regions possess the strongest and most distinct localization signals. We discuss the mitochondrial localization signal in relation to the formation of double-membrane vesicles, a critical stage in the coronavirus life cycle. Our computational analysis serves as a hypothesis generation tool to suggest models for SARS-CoV-2 biology and inform experimental efforts to combat the virus.

    View details for DOI 10.1101/2020.04.28.065201

    View details for PubMedID 32511373

    View details for PubMedCentralID PMC7263502

  • Deep learning interpretation of echocardiograms. NPJ digital medicine Ghorbani, A., Ouyang, D., Abid, A., He, B., Chen, J. H., Harrington, R. A., Liang, D. H., Ashley, E. A., Zou, J. Y. 2020; 3: 10

    Abstract

    Echocardiography uses ultrasound technology to capture high temporal and spatial resolution images of the heart and surrounding structures, and is the most common imaging modality in cardiovascular medicine. Using convolutional neural networks on a large new dataset, we show that deep learning applied to echocardiography can identify local cardiac structures, estimate cardiac function, and predict systemic phenotypes that modify cardiovascular risk but are not readily identifiable to human interpretation. Our deep learning model, EchoNet, accurately identified the presence of pacemaker leads (AUC = 0.89), enlarged left atrium (AUC = 0.86), left ventricular hypertrophy (AUC = 0.75), left ventricular end systolic and diastolic volumes (R² = 0.74 and R² = 0.70), and ejection fraction (R² = 0.50), as well as predicted systemic phenotypes of age (R² = 0.46), sex (AUC = 0.88), weight (R² = 0.56), and height (R² = 0.33). Interpretation analysis validates that EchoNet shows appropriate attention to key cardiac structures when performing human-explainable tasks and highlights hypothesis-generating regions of interest when predicting systemic phenotypes difficult for human interpretation. Machine learning on echocardiography images can streamline repetitive tasks in the clinical workflow, provide preliminary interpretation in areas with insufficient qualified cardiologists, and predict phenotypes challenging for human evaluation.

    View details for DOI 10.1038/s41746-019-0216-8

    View details for PubMedID 31993508

  • LitGen: Genetic Literature Recommendation Guided by Human Explanations. Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing Nie, A., Pineda, A. L., Wright, M. W., Wand, H., Wulf, B., Costa, H. A., Patel, R. Y., Bustamante, C. D., Zou, J. 2020; 25: 67–78

    Abstract

    As genetic sequencing costs decrease, the lack of clinical interpretation of variants has become the bottleneck in using genetics data. A major rate-limiting step in clinical interpretation is the manual curation of evidence in the genetic literature by highly trained biocurators. What makes curation particularly time-consuming is that the curator needs to identify papers that study variant pathogenicity using different types of approaches and evidence, e.g., biochemical assays or case control analysis. In collaboration with the Clinical Genomic Resource (ClinGen), the flagship NIH program for clinical curation, we propose the first machine learning system, LitGen, that can retrieve papers for a particular variant and filter them by specific evidence types used by curators to assess for pathogenicity. LitGen uses semi-supervised deep learning to predict the type of evidence provided by each paper. It is trained on papers annotated by ClinGen curators and systematically evaluated on new test data collected by ClinGen. LitGen further leverages rich human explanations and unlabeled data to gain 7.9%-12.6% relative performance improvement over models learned only on the annotated papers. It is a useful framework to improve clinical variant curation.

    View details for PubMedID 31797587

  • NCI Workshop on Artificial Intelligence in Radiation Oncology: Training the Next Generation. Practical radiation oncology Kang, J., Thompson, R. F., Aneja, S., Lehman, C., Trister, A., Zou, J., Obcemea, C., El Naqa, I. 2020

    Abstract

    Artificial intelligence (AI) is about to touch every aspect of radiotherapy from consultation, treatment planning, quality assurance, therapy delivery, to outcomes modeling. There is an urgent need to train radiation oncologists and medical physicists in data science to help shepherd AI solutions into clinical practice. Poorly trained personnel may do more harm than good when attempting to apply rapidly developing and complex technologies. As the amount of AI research expands in our field, the radiation oncology community needs to discuss how to educate future generations in this area. The National Cancer Institute (NCI) Workshop on AI in Radiation Oncology (Shady Grove, MD, April 4-5, 2019) was the first (https://dctd.cancer.gov/NewsEvents/20190523_ai_in_radiation_oncology.htm) of two data science workshops in radiation oncology hosted by the NCI in 2019. During this workshop, the Training and Education Working Group was formed by volunteers among the invited attendees. Its members represent radiation oncology, medical physics, radiology, computer science, industry, and the NCI. In this perspective article written by members of the Training and Education Working Group, we provide and discuss Action Points relevant for future trainees interested in radiation oncology AI: (1) creating AI awareness and responsible conduct; (2) implementing a practical didactic curriculum; (3) creating a publicly available database of training resources; and (4) accelerating learning and funding opportunities. Together, these Action Points can facilitate the translation of AI into clinical practice.

    View details for DOI 10.1016/j.prro.2020.06.001

    View details for PubMedID 32544635

  • Predicting target genes of noncoding regulatory variants with ICE. Bioinformatics (Oxford, England) Wu, Z., Ioannidis, N. M., Zou, J. 2020

    Abstract

    Interpreting genetic variants of unknown significance (VUS) is essential in clinical applications of genome sequencing for diagnosis and personalized care. Noncoding variants remain particularly difficult to interpret, despite making up a large majority of trait associations identified in GWAS analyses. Predicting the regulatory effects of noncoding variants on candidate genes is a key step in evaluating their clinical significance. Here we develop a machine learning algorithm, ICE (Inference of Connected eQTLs), to predict the regulatory targets of noncoding variants identified in studies of expression quantitative trait loci (eQTLs). We assemble datasets using eQTL results from the Genotype-Tissue Expression (GTEx) project and learn to separate positive and negative pairs based on annotations characterizing the variant, gene and the intermediate sequence. ICE achieves an area under the receiver operating characteristic curve (ROC-AUC) of 0.799 using random cross-validation, and 0.700 for a more stringent position-based cross-validation. Further evaluation on rare variants and experimentally-validated regulatory variants shows a significant enrichment in ICE identifying the true target genes versus negative controls. In gene ranking experiments, ICE achieves a top-1 accuracy of 50% and top-3 accuracy of 90%. Salient features, including GC content, histone modifications and Hi-C interactions are further analyzed and visualized to illustrate their influences on predictions. ICE can be applied to any VUS of interest and each candidate nearby gene to output a score reflecting the likelihood of regulatory effect on the expression level. These scores can be used to prioritize variants and genes to assist in patient diagnosis and GWAS follow-up studies. Codes and data used in this work are available at https://github.com/miaecle/eQTL_Trees.

    View details for DOI 10.1093/bioinformatics/btaa254

    View details for PubMedID 32330225

  • PB-Net: Automatic peak integration by sequential deep learning for multiple reaction monitoring. Journal of proteomics Wu, Z., Serie, D., Xu, G., Zou, J. 2020: 103820

    Abstract

    Mass spectrometry (MS) based proteomics has become an indispensable component of modern molecular and cellular biochemistry analysis. Multiple reaction monitoring (MRM) is one of the most well-established MS techniques for molecule detection and quantification. Despite its wide usage, there lacks an accurate computational framework to analyze MRM data, and expert annotation is often required, especially to perform peak integration. Here we propose a deep learning method PB-Net (Peak Boundary Neural Network), built upon recent advances in sequential neural networks, for fully automatic chromatographic peak integration. To train PB-Net, we generated a large dataset of over 170,000 expert annotated peaks from MS transitions spanning a wide dynamic range, including both peptides and intact glycopeptides. Our model demonstrated outstanding performances on unseen test samples, reaching near-perfect agreement (Pearson's r 0.997) with human annotated ground truth. Systematic evaluations also show that PB-Net is substantially more robust and accurate compared to previous state-of-the-art peak integration software. PB-Net can benefit the wide community of mass spectrometry data analysis, especially in applications involving high-throughput MS experiments. Codes and test data used in this work are available at https://github.com/miaecle/PB-net. SIGNIFICANCE: Human annotations serve an important role in accurate quantification of multiple reaction monitoring (MRM) experiments, though they are costly to collect and limit analysis throughput. In this work we proposed and developed a novel technique for the peak-integration step in MRM, based on recent innovations in sequential deep learning models. We collected in total 170,000 expert-annotated MRM peaks and trained a set of accurate and robust neural networks for the task. Results demonstrated a substantial improvement over the current state-of-the-art software for mass spectrometry analysis and comparable level of accuracy and precision as human annotators.

    View details for DOI 10.1016/j.jprot.2020.103820

    View details for PubMedID 32416316

  • Sex and gender analysis improves science and engineering. Nature Tannenbaum, C., Ellis, R. P., Eyssel, F., Zou, J., Schiebinger, L. 2019; 575 (7781): 137–46

    Abstract

    The goal of sex and gender analysis is to promote rigorous, reproducible and responsible science. Incorporating sex and gender analysis into experimental design has enabled advancements across many disciplines, such as improved treatment of heart disease and insights into the societal impact of algorithmic bias. Here we discuss the potential for sex and gender analysis to foster scientific discovery, improve experimental efficiency and enable social equality. We provide a roadmap for sex and gender analysis across scientific disciplines and call on researchers, funding agencies, peer-reviewed journals and universities to coordinate efforts to implement robust methods of sex and gender analysis.

    View details for DOI 10.1038/s41586-019-1657-6

    View details for PubMedID 31695204

  • Modeling Spatial Correlation of Transcripts with Application to Developing Pancreas. Scientific reports Liu, R., Mignardi, M., Jones, R., Enge, M., Kim, S. K., Quake, S. R., Zou, J. 2019; 9 (1): 5592

    Abstract

    Recently high-throughput image-based transcriptomic methods were developed and enabled researchers to spatially resolve gene expression variation at the molecular level for the first time. In this work, we develop a general analysis tool to quantitatively study the spatial correlations of gene expression in fixed tissue sections. As an illustration, we analyze the spatial distribution of single mRNA molecules measured by in situ sequencing on human fetal pancreas at three developmental time points: 80, 87 and 117 days post-fertilization. We develop a density profile-based method to capture the spatial relationship between gene expression and other morphological features of the tissue sample such as position of nuclei and endocrine cells of the pancreas. In addition, we build a statistical model to characterize correlations in the spatial distribution of the expression level among different genes. This model enables us to infer the inhibitory and clustering effects throughout different time points. Our analysis framework is applicable to a wide variety of spatially-resolved transcriptomic data to derive biological insights.

    View details for PubMedID 30944357

  • A large CRISPR-induced bystander mutation causes immune dysregulation. Communications biology Simeonov, D. R., Brandt, A. J., Chan, A. Y., Cortez, J. T., Li, Z., Woo, J. M., Lee, Y., Carvalho, C. M., Indart, A. C., Roth, T. L., Zou, J., May, A. P., Lupski, J. R., Anderson, M. S., Buaas, F. W., Rokhsar, D. S., Marson, A. 2019; 2: 70

    Abstract

    A persistent concern with CRISPR-Cas9 gene editing has been the potential to generate mutations at off-target genomic sites. While CRISPR-engineering mice to delete a ~360bp intronic enhancer, here we discovered a founder line that had marked immune dysregulation caused by a 24kb tandem duplication of the sequence adjacent to the on-target deletion. Our results suggest unintended repair of on-target genomic cuts can cause pathogenic "bystander" mutations that escape detection by routine targeted genotyping assays.

    View details for PubMedID 30793048

  • Contrastive Multivariate Singular Spectrum Analysis Dirie, A., Abid, A., Zou, J. IEEE. 2019: 1122–27
  • Contingent Payment Mechanisms for Resource Utilization Ma, H., Meir, R., Parkes, D. C., Zou, J. Association for Computing Machinery. 2019: 422–30
  • Improving the Stability of the Knockoff Procedure: Multiple Simultaneous Knockoffs and Entropy Maximization Gimenez, J., Zou, J., Chaudhuri, K., Sugiyama, M. MICROTOME PUBLISHING. 2019
  • Knockoffs for the Mass: New Feature Importance Statistics with False Discovery Guarantees Gimenez, J., Ghorbani, A., Zou, J., Chaudhuri, K., Sugiyama, M. MICROTOME PUBLISHING. 2019
  • VetTag: improving automated veterinary diagnosis coding via large-scale language modeling. NPJ digital medicine Zhang, Y., Nie, A., Zehnder, A., Page, R. L., Zou, J. 2019; 2: 35

    Abstract

    Unlike human medical records, most of the veterinary records are free text without standard diagnosis coding. The lack of systematic coding is a major barrier to the growing interest in leveraging veterinary records for public health and translational research. Recent machine learning effort is limited to predicting 42 top-level diagnosis categories from veterinary notes. Here we develop a large-scale algorithm to automatically predict all 4577 standard veterinary diagnosis codes from free text. We train our algorithm on a curated dataset of over 100 K expert labeled veterinary notes and over one million unlabeled notes. Our algorithm is based on the adapted Transformer architecture and we demonstrate that large-scale language modeling on the unlabeled notes via pretraining and as an auxiliary objective during supervised learning greatly improves performance. We systematically evaluate the performance of the model and several baselines in challenging settings where algorithms trained on one hospital are evaluated in a different hospital with substantial domain shift. In addition, we show that hierarchical training can address severe data imbalances for fine-grained diagnosis with a few training cases, and we provide interpretation for what is learned by the deep network. Our algorithm addresses an important challenge in veterinary medicine, and our model and experiments add insights into the power of unsupervised learning for clinical natural language processing.

    View details for DOI 10.1038/s41746-019-0113-1

    View details for PubMedID 31304381

    View details for PubMedCentralID PMC6550141

  • A primer on deep learning in genomics. Nature genetics Zou, J., Huss, M., Abid, A., Mohammadi, P., Torkamani, A., Telenti, A. 2018

    Abstract

    Deep learning methods are a class of machine learning techniques capable of identifying highly complex patterns in large datasets. Here, we provide a perspective and primer on deep learning applications for genome analysis. We discuss successful applications in the fields of regulatory genomics, variant calling and pathogenicity scores. We include general guidance for how to effectively use deep learning methods as well as a practical guide to tools and resources. This primer is accompanied by an interactive online tutorial.

    View details for PubMedID 30478442

  • The clinical imperative for inclusivity: Race, ethnicity, and ancestry (REA) in genomics. Human mutation Popejoy, A. B., Ritter, D. I., Crooks, K., Currey, E., Fullerton, S. M., Hindorff, L. A., Koenig, B., Ramos, E. M., Sorokin, E. P., Wand, H., Wright, M. W., Zou, J., Gignoux, C. R., Bonham, V. L., Plon, S. E., Bustamante, C. D., Clinical Genome Resource (ClinGen) Ancestry and Diversity Working Group (ADWG) 2018; 39 (11): 1713–20

    Abstract

    The Clinical Genome Resource (ClinGen) Ancestry and Diversity Working Group highlights the need to develop guidance on race, ethnicity, and ancestry (REA) data collection and use in clinical genomics. We present quantitative and qualitative evidence to characterize: (1) acquisition of REA data via clinical laboratory requisition forms, and (2) information disparity across populations in the Genome Aggregation Database (gnomAD) at clinically relevant sites ascertained from annotations in ClinVar. Our requisition form analysis showed substantial heterogeneity in clinical laboratory ascertainment of REA, as well as marked incongruity among terms used to define REA categories. There was also striking disparity across REA populations in the amount of information available about clinically relevant variants in gnomAD. European ancestral populations constituted the majority of observations (55.8%), allele counts (59.7%), and private alleles (56.1%) in gnomAD at 550 loci with "pathogenic" and "likely pathogenic" expert-reviewed variants in ClinVar. Our findings highlight the importance of implementing and supporting programs to increase diversity in genome sequencing and clinical genomics, as well as measuring uncertainty around population-level datasets that are used in variant interpretation. Finally, we suggest the need for a standardized REA data collection framework to be developed through partnerships and collaborations and adopted across clinical genomics.

    View details for PubMedID 30311373

  • Integrative proteomics and bioinformatic prediction enable a high-confidence apicoplast proteome in malaria parasites. PLoS biology Boucher, M. J., Ghosh, S., Zhang, L., Lal, A., Jang, S. W., Ju, A., Zhang, S., Wang, X., Ralph, S. A., Zou, J., Elias, J. E., Yeh, E. 2018; 16 (9): e2005895

    Abstract

    Malaria parasites (Plasmodium spp.) and related apicomplexan pathogens contain a nonphotosynthetic plastid called the apicoplast. Derived from an unusual secondary eukaryote-eukaryote endosymbiosis, the apicoplast is a fascinating organelle whose function and biogenesis rely on a complex amalgamation of bacterial and algal pathways. Because these pathways are distinct from the human host, the apicoplast is an excellent source of novel antimalarial targets. Despite its biomedical importance and evolutionary significance, the absence of a reliable apicoplast proteome has limited most studies to the handful of pathways identified by homology to bacteria or primary chloroplasts, precluding our ability to study the most novel apicoplast pathways. Here, we combine proximity biotinylation-based proteomics (BioID) and a new machine learning algorithm to generate a high-confidence apicoplast proteome consisting of 346 proteins. Critically, the high accuracy of this proteome significantly outperforms previous prediction-based methods and extends beyond other BioID studies of unique parasite compartments. Half of identified proteins have unknown function, and 77% are predicted to be important for normal blood-stage growth. We validate the apicoplast localization of a subset of novel proteins and show that an ATP-binding cassette protein ABCF1 is essential for blood-stage survival and plays a previously unknown role in apicoplast biogenesis. These findings indicate critical organellar functions for newly discovered apicoplast proteins. The apicoplast proteome will be an important resource for elucidating unique pathways derived from secondary endosymbiosis and prioritizing antimalarial drug targets.

    View details for PubMedID 30212465

  • Design AI so that it's fair NATURE Zou, J., Schiebinger, L. 2018; 559 (7714): 324–26

    View details for DOI 10.1038/d41586-018-05707-8

    View details for Web of Science ID 000439059800025

    View details for PubMedID 30018439

  • Exploring patterns enriched in a dataset with contrastive principal component analysis NATURE COMMUNICATIONS Abid, A., Zhang, M. J., Bagaria, V. K., Zou, J. 2018; 9: 2134

    Abstract

    Visualization and exploration of high-dimensional data is a ubiquitous challenge across disciplines. Widely used techniques such as principal component analysis (PCA) aim to identify dominant trends in one dataset. However, in many settings we have datasets collected under different conditions, e.g., a treatment and a control experiment, and we are interested in visualizing and exploring patterns that are specific to one dataset. This paper proposes a method, contrastive principal component analysis (cPCA), which identifies low-dimensional structures that are enriched in a dataset relative to comparison data. In a wide variety of experiments, we demonstrate that cPCA with a background dataset enables us to visualize dataset-specific patterns missed by PCA and other standard methods. We further provide a geometric interpretation of cPCA and strong mathematical guarantees. An implementation of cPCA is publicly available, and can be used for exploratory data analysis in many applications where PCA is currently used.

    View details for PubMedID 29849030

  • Word embeddings quantify 100 years of gender and ethnic stereotypes PROCEEDINGS OF THE NATIONAL ACADEMY OF SCIENCES OF THE UNITED STATES OF AMERICA Garg, N., Schiebinger, L., Jurafsky, D., Zou, J. 2018; 115 (16): E3635–E3644

    Abstract

    Word embeddings are a powerful machine-learning framework that represents each English word by a vector. The geometric relationship between these vectors captures meaningful semantic relationships between the corresponding words. In this paper, we develop a framework to demonstrate how the temporal dynamics of the embedding help to quantify changes in stereotypes and attitudes toward women and ethnic minorities in the 20th and 21st centuries in the United States. We integrate word embeddings trained on 100 years of text data with the US Census to show that changes in the embedding track closely with demographic and occupation shifts over time. The embedding captures societal shifts (e.g., the women's movement in the 1960s and Asian immigration into the United States) and also illuminates how specific adjectives and occupations became more closely associated with certain populations over time. Our framework for temporal analysis of word embedding opens up a fruitful intersection between machine learning and quantitative social science.

    View details for PubMedID 29615513

  • Autowarp: Learning a Warping Distance from Unlabeled Time Series Using Sequence Autoencoders Abid, A., Zou, J., Bengio, S., Wallach, H., Larochelle, H., Grauman, K., Cesa-Bianchi, N., Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2018
  • Embedding for Informative Missingness: Deep Learning With Incomplete Data Ghorbani, A., Zou, J. Y. IEEE. 2018: 437–45
  • The Effects of Memory Replay in Reinforcement Learning Liu, R., Zou, J. IEEE. 2018: 478–85
  • DeepTag: inferring diagnoses from veterinary clinical notes. NPJ digital medicine Nie, A., Zehnder, A., Page, R. L., Zhang, Y., Pineda, A. L., Rivas, M. A., Bustamante, C. D., Zou, J. 2018; 1: 60

    Abstract

    Large-scale veterinary clinical records can become a powerful resource for patient care and research. However, clinicians lack the time and resources to annotate patient records with standard medical diagnostic codes, and most veterinary visits are captured in free-text notes. The lack of standard coding makes it challenging to use the clinical data to improve patient care. It is also a major impediment to cross-species translational research, which relies on the ability to accurately identify patient cohorts with specific diagnostic criteria in humans and animals. In order to reduce the coding burden for veterinary clinical practice and aid translational research, we have developed a deep learning algorithm, DeepTag, which automatically infers diagnostic codes from veterinary free-text notes. DeepTag is trained on a newly curated dataset of 112,558 veterinary notes manually annotated by experts. DeepTag extends multitask LSTM with an improved hierarchical objective that captures the semantic structures between diseases. To foster human-machine collaboration, DeepTag also learns to abstain in examples when it is uncertain and defers them to human experts, resulting in improved performance. DeepTag accurately infers disease codes from free-text even in challenging cross-hospital settings where the text comes from different clinical settings than the ones used for training. It enables automated disease annotation across a broad range of clinical diagnoses with minimal preprocessing. The technical framework in this work can be applied in other medical domains that currently lack medical coding resources.

    View details for DOI 10.1038/s41746-018-0067-8

    View details for PubMedID 31304339

    View details for PubMedCentralID PMC6550285

  • Diabetes reversal by inhibition of the low-molecular-weight tyrosine phosphatase NATURE CHEMICAL BIOLOGY Stanford, S. M., Aleshin, A. E., Zhang, V., Ardecky, R. J., Hedrick, M. P., Zou, J., Ganji, S. R., Bliss, M. R., Yamamoto, F., Bobkov, A. A., Kiselar, J., Liu, Y., Cadwell, G. W., Khare, S., Yu, J., Barquilla, A., Chung, T. D., Mustelin, T., Schenk, S., Bankston, L. A., Liddington, R. C., Pinkerton, A. B., Bottini, N. 2017; 13 (6): 624-?

    Abstract

    Obesity-associated insulin resistance plays a central role in type 2 diabetes. As such, tyrosine phosphatases that dephosphorylate the insulin receptor (IR) are potential therapeutic targets. The low-molecular-weight protein tyrosine phosphatase (LMPTP) is a proposed IR phosphatase, yet its role in insulin signaling in vivo has not been defined. Here we show that global and liver-specific LMPTP deletion protects mice from high-fat diet-induced diabetes without affecting body weight. To examine the role of the catalytic activity of LMPTP, we developed a small-molecule inhibitor with a novel uncompetitive mechanism, a unique binding site at the opening of the catalytic pocket, and an exquisite selectivity over other phosphatases. This inhibitor is orally bioavailable, and it increases liver IR phosphorylation in vivo and reverses high-fat diet-induced diabetes. Our findings suggest that LMPTP is a key promoter of insulin resistance and that LMPTP inhibitors would be beneficial for treating type 2 diabetes.

    View details for DOI 10.1038/nchembio.2344

    View details for Web of Science ID 000401419300015

    View details for PubMedID 28346406

  • Correcting for cell-type heterogeneity in DNA methylation: a comprehensive evaluation. Nature methods Rahmani, E., Zaitlen, N., Baran, Y., Eng, C., Hu, D., Galanter, J., Oh, S., Burchard, E. G., Eskin, E., Zou, J., Halperin, E. 2017; 14 (3): 218-219

    View details for DOI 10.1038/nmeth.4190

    View details for PubMedID 28245214

  • NeuralFDR: Learning Discovery Thresholds from Hypothesis Features Xia, F., Zhang, M. J., Zou, J., Tse, D., Guyon, Luxburg, U. V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R. NEURAL INFORMATION PROCESSING SYSTEMS (NIPS). 2017
  • Quantifying unobserved protein-coding variants in human populations provides a roadmap for large-scale sequencing projects. Nature communications Zou, J., Valiant, G., Valiant, P., Karczewski, K., Chan, S. O., Samocha, K., Lek, M., Sunyaev, S., Daly, M., MacArthur, D. G. 2016; 7: 13293-?

    Abstract

    As new proposals aim to sequence ever larger collections of humans, it is critical to have a quantitative framework to evaluate the statistical power of these projects. We developed a new algorithm, UnseenEst, and applied it to the exomes of 60,706 individuals to estimate the frequency distribution of all protein-coding variants, including rare variants that have not been observed yet in the current cohorts. Our results quantified the number of new variants that we expect to identify as sequencing cohorts reach hundreds of thousands of individuals. With 500K individuals, we find that we expect to capture 7.5% of all possible loss-of-function variants and 12% of all possible missense variants. We also estimate that 2,900 genes have a loss-of-function frequency of <0.00001 in healthy humans, consistent with very strong intolerance to gene inactivation.

    View details for DOI 10.1038/ncomms13293

    View details for PubMedID 27796292

    View details for PubMedCentralID PMC5095512

  • Limits on Active to Sterile Neutrino Oscillations from Disappearance Searches in the MINOS, Daya Bay, and Bugey-3 Experiments PHYSICAL REVIEW LETTERS Adamson, P., An, F. P., Anghel, I., Aurisano, A., Balantekin, A. B., Band, H. R., Barr, G., Bishai, M., Blake, A., Blyth, S., Bock, G. J., Bogert, D., Cao, D., Cao, G. F., Cao, J., Cao, S. V., Carroll, T. J., Castromonte, C. M., Cen, W. R., Chan, Y. L., Chang, J. F., Chang, L. C., Chang, Y., Chen, H. S., Chen, Q. Y., Chen, R., Chen, S. M., Chen, Y., Chen, Y. X., Cheng, J., Cheng, J., Cheng, Y. P., Cheng, Z. K., Cherwinka, J. J., Childress, S., Chu, M. C., Chukanov, A., Coelho, J. A., Corwin, L., Cronin-Hennessy, D., Cummings, J. P., de Arcos, J., De Rijck, S., Deng, Z. Y., Devan, A. V., Devenish, N. E., Ding, X. F., Ding, Y. Y., Diwan, M. V., Dolgareva, M., Dove, J., Dwyer, D. A., Edwards, W. R., Escobar, C. O., Evans, J. J., Falk, E., Feldman, G. J., Flanagan, W., Frohne, M. V., Gabrielyan, M., Gallagher, H. R., Germani, S., Gill, R., Gomes, R. A., Gonchar, M., Gong, G. H., Gong, H., Goodman, M. C., Gouffon, P., Graf, N., Gran, R., Grassi, M., Grzelak, K., Gu, W. Q., Guan, M. Y., Guo, L., Guo, R. P., Guo, X. H., Guo, Z., Habig, A., Hackenburg, R. W., Hahn, S. R., Han, R., Hans, S., Hartnell, J., Hatcher, R., He, M., Heeger, K. M., Heng, Y. K., Higuera, A., Holin, A., Hor, Y. K., Hsiung, Y. B., Hu, B. Z., Hu, T., Hu, W., Huang, E. C., Huang, H. X., Huang, J., Huang, X. T., Huber, P., Huo, W., Hussain, G., Hylen, J., Irwin, G. M., Isvan, Z., Jaffe, D. E., Jaffke, P., James, C., Jen, K. L., Jensen, D., Jetter, S., Ji, X. L., Ji, X. P., Jiao, J. B., Johnson, R. A., de Jong, J. K., Joshi, J., Kafka, T., Kang, L., Kasahara, S. M., Kettell, S. H., Kohn, S., Koizumi, G., Kordosky, M., Kramer, M., Kreymer, A., Kwan, K. K., Kwok, M. W., Kwok, T., Lang, K., Langford, T. J., Lau, K., Lebanowski, L., Lee, J., Lee, J. H., Lei, R. T., Leitner, R., Leung, J. K., Li, C., Li, D. J., Li, F., Li, G. S., Li, Q. J., Li, S., Li, S. C., Li, W. D., Li, X. N., Li, Y. F., Li, Z. B., Liang, H., Lin, C. J., Lin, G. L., Lin, S., Lin, S. K., Lin, Y., Ling, J. J., Link, J. M., Litchfield, P. J., Littenberg, L., Littlejohn, B. R., Liu, D. W., Liu, J. C., Liu, J. L., Loh, C. W., Lu, C., Lu, H. Q., Lu, J. S., Lucas, P., Luk, K. B., Lv, Z., Ma, Q. M., Ma, X. B., Ma, X. Y., Ma, Y. Q., Malyshkin, Y., Mann, W. A., Marshak, M. L., Caicedo, D. A., Mayer, N., McDonald, K. T., McGivern, C., McKeown, R. D., Medeiros, M. M., Mehdiyev, R., Meier, J. R., Messier, M. D., Miller, W. H., Mishra, S. R., Mitchell, I., Mooney, M., Moore, C. D., Mualem, L., Musser, J., Nakajima, Y., Naples, D., Napolitano, J., Naumov, D., Naumova, E., Nelson, J. K., Newman, H. B., Ngai, H. Y., Nichol, R. J., Ning, Z., Nowak, J. A., O'Connor, J., Ochoa-Ricoux, J. P., Olshevskiy, A., Orchanian, M., Pahlka, R. B., Paley, J., Pan, H., Park, J., Patterson, R. B., Patton, S., Pawloski, G., Pec, V., Peng, J. C., Perch, A., Pfuetzner, M. M., Phan, D. D., Phan-Budd, S., Pinsky, L., Plunkett, R. K., Poonthottathil, N., Pun, C. S., Qi, F. Z., Qi, M., Qian, X., Qiu, X., Radovic, A., Raper, N., Rebel, B., Ren, J., Rosenfeld, C., Rosero, R., Roskovec, B., Ruan, X. C., Rubin, H. A., Sail, P., Sanchez, M. C., Schneps, J., Schreckenberger, A., Schreiner, P., Sharma, R., Sher, S. M., Sousa, A., Steiner, H., Sun, G. X., Sun, J. L., Tagg, N., Talaga, R. L., Tang, W., Taychenachev, D., Thomas, J., Thomson, M. A., Tian, X., Timmons, A., Todd, J., Tognini, S. C., Toner, R., Torretta, D., Treskov, K., Tsang, K. V., Tull, C. E., Tzanakos, G., Urheim, J., Vahle, P., Viaux, N., Viren, B., Vorobel, V., Wang, C. H., Wang, M., Wang, N. Y., Wang, R. G., Wang, W., Wang, X., Wang, Y. F., Wang, Z., Wang, Z. M., Webb, R. C., Weber, A., Wei, H. Y., Wen, L. J., Whisnant, K., White, C., Whitehead, L., Whitehead, L. H., Wise, T., Wojcicki, S. G., Wong, H. L., Wong, S. C., Worcester, E., Wu, C., Wu, Q., Wu, W. J., Xia, D. M., Xia, J. K., Xing, Z. Z., Xu, J. L., Xu, J. Y., Xu, Y., Xue, T., Yang, C. G., Yang, H., Yang, L., Yang, M. S., Yang, M. T., Ye, M., Ye, Z., Yeh, M., Young, B. L., Yu, Z. Y., Zeng, S., Zhan, L., Zhang, C., Zhang, H. H., Zhang, J. W., Zhang, Q. M., Zhang, X. T., Zhang, Y. M., Zhang, Y. X., Zhang, Z. J., Zhang, Z. P., Zhang, Z. Y., Zhao, J., Zhao, Q. W., Zhao, Y. B., Zhong, W. L., Zhou, L., Zhou, N., Zhuang, H. L., Zou, J. H. 2016; 117 (15)

    Abstract

    Searches for a light sterile neutrino have been performed independently by the MINOS and the Daya Bay experiments using the muon (anti)neutrino and electron antineutrino disappearance channels, respectively. In this Letter, results from both experiments are combined with those from the Bugey-3 reactor neutrino experiment to constrain oscillations into light sterile neutrinos. The three experiments are sensitive to complementary regions of parameter space, enabling the combined analysis to probe regions allowed by the Liquid Scintillator Neutrino Detector (LSND) and MiniBooNE experiments in a minimally extended four-neutrino flavor framework. Stringent limits on $\sin^{2} 2\theta_{\mu e}$ are set over 6 orders of magnitude in the sterile mass-squared splitting $\Delta m^{2}_{41}$. The sterile-neutrino mixing phase space allowed by the LSND and MiniBooNE experiments is excluded for $\Delta m^{2}_{41} < 0.8\ \mathrm{eV}^{2}$ at 95% $\mathrm{CL}_{s}$.

    View details for DOI 10.1103/PhysRevLett.117.151801

    View details for PubMedID 27768356

  • Hierarchical Patterning of Multifunctional Conducting Polymer Nanoparticles as a Bionic Platform for Topographic Contact Guidance ACS NANO Ho, D., Zou, J., Chen, X., Munshi, A., Smith, N. M., Agarwal, V., Hodgetts, S. I., Plant, G. W., Bakker, A. J., Harvey, A. R., Luzinov, I., Iyer, K. S. 2015; 9 (2): 1767-1774

    Abstract

    The use of programmed electrical signals to influence biological events has been a widely accepted clinical methodology for neurostimulation. An optimal biocompatible platform for neural activation efficiently transfers electrical signals across the electrode-cell interface and also incorporates large-area neural guidance conduits. Inherently conducting polymers (ICPs) have emerged as frontrunners as soft biocompatible alternatives to traditionally used metal electrodes, which are highly invasive and elicit tissue damage over long-term implantation. However, fabrication techniques for the ICPs suffer a major bottleneck, which limits their usability and medical translation. Herein, we report that these limitations can be overcome using colloidal chemistry to fabricate multimodal conducting polymer nanoparticles. Furthermore, we demonstrate that these polymer nanoparticles can be precisely assembled into large-area linear conduits using surface chemistry. Finally, we validate that this platform can act as guidance conduits for neurostimulation, whereby the presence of electrical current induces remarkable dendritic axonal sprouting of cells.

    View details for DOI 10.1021/nn506607x

    View details for Web of Science ID 000349940500072

    View details for PubMedID 25623615

  • Endovascular Repair With the Chimney Technique for Stanford Type B Aortic Dissection Involving Right-Sided Arch With Mirror Image Branching JOURNAL OF ENDOVASCULAR THERAPY Ma, H., Yang, H., Xu, W., Zou, J., Jiang, J., Jiao, Y., Zhang, X. 2013; 20 (3): 283-288

    Abstract

    To report endovascular repair with the chimney technique of type B aortic dissection involving a right-sided aortic arch (RAA). Two hypertensive men aged 48 and 42 years with symptoms of aortic dissection resistant to medical therapy underwent emergent thoracic endovascular aortic repair with the chimney technique to extend the proximal landing zones. Both patients had right-sided arches with mirror image branching. One patient required a bare metal chimney stent to maintain perfusion to the right subclavian artery, while the other patient had a chimney stent to revascularize the right common carotid artery. Short-term follow-up (1 year and 1 month, respectively) showed that there was positive aortic remodeling, and the chimney stents were patent. Chimney TEVAR seems safe and effective for Stanford type B dissection in patients having RAA with mirror image branching and no sufficient proximal fixation zone.

    View details for Web of Science ID 000320074100005

    View details for PubMedID 23731297

  • Conversion of Human Fibroblasts to Functional Endothelial Cells by Defined Factors ARTERIOSCLEROSIS THROMBOSIS AND VASCULAR BIOLOGY Li, J., Huang, N. F., Zou, J., Laurent, T. J., Lee, J. C., Okogbaa, J., Cooke, J. P., Ding, S. 2013; 33 (6): 1366-?

    Abstract

    Transdifferentiation of fibroblasts to endothelial cells (ECs) may provide a novel therapeutic avenue for diseases, including ischemia and fibrosis. Here, we demonstrate that human fibroblasts can be transdifferentiated into functional ECs by using only 2 factors, Oct4 and Klf4, under inductive signaling conditions. To determine whether human fibroblasts could be converted into ECs by transient expression of pluripotency factors, human neonatal fibroblasts were transduced with lentiviruses encoding Oct4 and Klf4 in the presence of soluble factors that promote the induction of an endothelial program. After 28 days, clusters of induced endothelial (iEnd) cells appeared and were isolated for further propagation and subsequent characterization. The iEnd cells resembled primary human ECs in their transcriptional signature by expressing endothelial phenotypic markers, such as CD31, vascular endothelial-cadherin, and von Willebrand Factor. Furthermore, the iEnd cells could incorporate acetylated low-density lipoprotein and form vascular structures in vitro and in vivo. When injected into the ischemic limb of mice, the iEnd cells engrafted, increased capillary density, and enhanced tissue perfusion. During the transdifferentiation process, the endogenous pluripotency network was not activated, suggesting that this process bypassed a pluripotent intermediate step. Pluripotent factor-induced transdifferentiation can be successfully applied for generating functional autologous ECs for therapeutic applications.

    View details for DOI 10.1161/ATVBAHA.112.301167

    View details for Web of Science ID 000319119500038

    View details for PubMedID 23520160

  • Amino Acid Homeostasis Modulates Salicylic Acid-Associated Redox Status and Defense Responses in Arabidopsis PLANT CELL Liu, G., Ji, Y., Bhuiyan, N. H., Pilot, G., Selvaraj, G., Zou, J., Wei, Y. 2010; 22 (11): 3845-3863

    Abstract

    The tight association between nitrogen status and pathogenesis has been broadly documented in plant-pathogen interactions. However, the interface between primary metabolism and disease responses remains largely unclear. Here, we show that knockout of a single amino acid transporter, LYSINE HISTIDINE TRANSPORTER1 (LHT1), is sufficient for Arabidopsis thaliana plants to confer a broad spectrum of disease resistance in a salicylic acid-dependent manner. We found that redox fine-tuning in photosynthetic cells was causally linked to the lht1 mutant-associated phenotypes. Furthermore, the enhanced resistance in lht1 could be attributed to a specific deficiency of its main physiological substrate, Gln, and not to a general nitrogen deficiency. Thus, by enabling nitrogen metabolism to moderate the cellular redox status, a plant primary metabolite, Gln, plays a crucial role in plant disease resistance.

    View details for DOI 10.1105/tpc.110.079392

    View details for Web of Science ID 000285576500025

    View details for PubMedID 21097712

    View details for PubMedCentralID PMC3015111

  • Alcoholic neurobiology: Changes in dependence and recovery 12th International Congress of the International Society for Biomedical Research on Alcoholism Crews, F. T., Buckley, T., Dodd, P. R., Ende, G., Foley, N., Harper, C., He, J., Innes, D., Loh, E. W., Pfefferbaum, A., Zou, J., Sullivan, E. V. WILEY-BLACKWELL. 2005: 1504–13

    Abstract

    This article presents the proceedings of a symposium held at the meeting of the International Society for Biomedical Research on Alcoholism (ISBRA) in Mannheim, Germany, in October, 2004. Chronic alcoholism follows a fluctuating course, which provides a naturalistic experiment in vulnerability, resilience, and recovery of human neural systems in response to presence, absence, and history of the neurotoxic effects of alcoholism. Alcohol dependence is a progressive chronic disease that is associated with changes in neuroanatomy, neurophysiology, neural gene expression, psychology, and behavior. Specifically, alcohol dependence is characterized by a neuropsychological profile of mild to moderate impairment in executive functions, visuospatial abilities, and postural stability, together with relative sparing of declarative memory, language skills, and primary motor and perceptual abilities. Recovery from alcoholism is associated with a partial reversal of CNS deficits that occur in alcoholism. The reversal of deficits during recovery from alcoholism indicates that brain structure is capable of repair and restructuring in response to insult in adulthood. Indirect support of this repair model derives from studies of selective neuropsychological processes, structural and functional neuroimaging studies, and preclinical studies on degeneration and regeneration during the development of alcohol dependence and recovery from dependence. Genetics and brain regional specificity contribute to unique changes in neuropsychology and neuroanatomy in alcoholism and recovery. This symposium includes state-of-the-art presentations on changes that occur during active alcoholism as well as those that may occur during recovery-abstinence from alcohol dependence. Included are human neuroimaging and neuropsychological assessments, changes in human brain gene expression, allelic combinations of genes associated with alcohol dependence, and preclinical studies investigating mechanisms of alcohol-induced neurotoxicity and neuroprogenitor cell expansion during recovery from alcohol dependence.

    View details for DOI 10.1097/01.alc.0000175013.50644.61

    View details for Web of Science ID 000231767900018

    View details for PubMedID 16156047

  • Anti-IL-6 monoclonal antibodies protect against lethal Escherichia coli infection and lethal tumor necrosis factor-alpha challenge in mice JOURNAL OF IMMUNOLOGY Starnes, H. F., Pearce, M. K., Tewari, A., Yim, J. H., Zou, J. C., Abrams, J. S. 1990; 145 (12): 4185-4191

    Abstract

    Potentially fatal physiologic and metabolic derangements can occur in response to bacterial infection in animals and man. Recently it has been shown that alterations in the levels of circulating cytokines such as IL-6 and TNF-alpha occur shortly after bacterial challenge. To understand better the role of IL-6 in inflammation, we investigated the effects of in vivo anti-mouse IL-6 antibody treatment in a mouse model of septic shock. Rat anti-mouse IL-6 neutralizing mAb was produced from splenocytes of an animal immunized with mouse rIL-6. This mAb, MP5-20F3, was a very potent and specific antagonist of mouse IL-6 in vitro bioactivity, demonstrated using the NFS60 myelomonocytic and KD83 plasmacytoma target cell lines, and also immunoprecipitated radiolabeled IL-6. Anti-IL-6 mAb pretreatment of mice subsequently challenged with lethal doses of i.p. Escherichia coli or i.v. TNF-alpha protected mice from death caused by these treatments. Pretreatment of E. coli-challenged mice with anti-IL-6 led to an increase in serum TNF bioactivity, in comparison to isotype control antibody, implicating IL-6 as a negative modulator of TNF in vivo. Anti-TNF-alpha treatment of mice challenged i.p. with live E. coli resulted in a 70% decrease in serum IL-6 levels, determined by immunoenzymetric assay, compared to control antibody, thereby supporting a role for TNF-alpha as a positive regulator of IL-6 levels. We conclude that IL-6 is a mediator in lethal E. coli infection, and suggest that antagonists of IL-6 may be beneficial therapeutically in life-threatening bacterial infection.

    View details for Web of Science ID A1990EP04100033

    View details for PubMedID 2124237