The Environmental Enteric Dysfunction Biopsy Initiative (EEDBI) aims to combine intestinal biopsy data from Pakistan, Bangladesh, Zambia, and the U.S. in a joint effort to identify the mechanisms underpinning environmental enteric dysfunction (EED) and identify and validate therapeutic targets and biomarkers.
The goal of the transcriptomics data science rally was to determine if children with EED have differential expression of specific genes or pathways. In the process of gene expression, a cell uses information encoded in a gene to produce RNA and synthesize a functional gene product, such as a protein.1 A statistically significant change in the number of RNA molecules between two experimental conditions defines differential expression.1
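To make the definition of differential expression concrete, here is a minimal sketch for a single gene. It is not the rally's analysis pipeline. The expression values are invented, the function names are hypothetical, and the exact permutation test is a stand-in for the count-based models normally used on RNA sequencing data. A genome-wide analysis would also need to correct for testing many genes at once.

```python
from itertools import combinations
from math import log2
from statistics import mean


def log2_fold_change(case, control, pseudocount=1.0):
    """log2 ratio of mean expression (case over control), with a pseudocount."""
    return log2((mean(case) + pseudocount) / (mean(control) + pseudocount))


def permutation_p_value(case, control):
    """Exact two-sided permutation p-value for the difference in group means."""
    pooled = list(case) + list(control)
    n_case = len(case)
    n_rest = len(pooled) - n_case
    total = sum(pooled)
    observed = abs(mean(case) - mean(control))
    hits = 0
    n_splits = 0
    # Enumerate every way of relabelling n_case samples as "case".
    for idx in combinations(range(len(pooled)), n_case):
        group_sum = sum(pooled[i] for i in idx)
        diff = abs(group_sum / n_case - (total - group_sum) / n_rest)
        n_splits += 1
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            hits += 1
    return hits / n_splits


# Invented normalized expression values for two hypothetical genes.
gene_a_eed = [120, 135, 150, 142, 128]
gene_a_control = [60, 72, 65, 58, 70]
gene_b_eed = [50, 55, 48, 52, 51]
gene_b_control = [51, 49, 53, 50, 54]

lfc_a = log2_fold_change(gene_a_eed, gene_a_control)
p_a = permutation_p_value(gene_a_eed, gene_a_control)
p_b = permutation_p_value(gene_b_eed, gene_b_control)
print(f"gene A: log2FC={lfc_a:.2f}, p={p_a:.4f}")
print(f"gene B: p={p_b:.4f}")
```

In this toy data, gene A would be called differentially expressed at a conventional 0.05 threshold, while gene B would not.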
EED is a widespread syndrome among children and adults in low- and middle-income countries (LMICs). The characteristics of EED include inflammation, reduced absorption, and reduced barrier function in the small intestine.2 Histologically, EED is associated with flattened intestinal villi, elongated crypts, and increased infiltration of immune cells.3
While EED can have severe consequences for childhood growth and development, very few studies assessed EED in children until recently. The effects of EED include stunting, which is associated with neurodevelopmental delays and a higher risk of death from infections,4 and an increased risk of non-infectious chronic disorders later in life. Proposed EED mechanisms include nutritional deficiency or infection with microorganisms such as protozoa (Cryptosporidia, Giardia), specific pathovars of Escherichia coli, Helicobacter pylori, or Campylobacter.2,5,6 However, the causes of EED, the mechanism linking EED to stunting, and the risk factors for severe EED are still largely unknown.4,7
A better understanding of how specific host genes and pathways contribute to EED would allow for more straightforward and efficient diagnosis, targeted interventions, and ultimately, prevention of severe malnutrition.
The datasets for these rallies comprised data from five participating sites: samples from 291 children with EED in Bangladesh, Pakistan, and Zambia, and from 147 healthy comparison children at two North American sites. Data types include anthropometrics, demographics, RNA sequence transcripts, and clinical histology. The transcriptomics data were generated at a depth of 30 million reads (short lengths of human RNA) per biopsy.
A confirmed EED diagnosis relies on endoscopy and intestinal biopsy.2 In an endoscopy, a scope is passed through the mouth and esophagus into the stomach and upper small bowel. Biopsy forceps passed through the endoscope safely remove a small tissue sample for histology and transcriptomics studies.
While the ideal healthy comparison group would comprise confirmed healthy biopsies from similarly aged children in the same geographic region, logistical and ethical concerns make such samples unlikely to be obtained. Therefore, we employed methods to ensure that the identified genes were associated with EED and not with differences across geographic locations. First, we adjusted for batch effects. Second, we applied a "leave-two-out" machine learning testing method to the transcriptomic data. This method trains and tests multiple times with different combinations of datasets, each time leaving out one dataset from the EED sites and one dataset from the healthy comparison sites. The "leave-two-out" method helped confirm our findings despite dispersed geography, variable entry criteria, and multiple sequencing venues.
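The leave-two-out scheme described above can be sketched as a split generator. This is an illustration under stated assumptions, not the rally's code. The function name is hypothetical, the North American site labels are placeholders, and the sample list is toy data. Model training and scoring are left out because the source does not specify the classifier.

```python
from itertools import product


def leave_two_out_splits(sample_sites, eed_sites, control_sites):
    """Yield (held_out_pair, train_idx, test_idx) for every EED/control site pair."""
    for eed_site, control_site in product(eed_sites, control_sites):
        held_out = {eed_site, control_site}
        # Samples from the two held-out sites form the test set.
        test_idx = [i for i, s in enumerate(sample_sites) if s in held_out]
        # Samples from all remaining sites form the training set.
        train_idx = [i for i, s in enumerate(sample_sites) if s not in held_out]
        yield (eed_site, control_site), train_idx, test_idx


# EED sites are named in the text; the control site labels are placeholders.
EED_SITES = ["Bangladesh", "Pakistan", "Zambia"]
CONTROL_SITES = ["NA_site_1", "NA_site_2"]

# Toy data: the site of origin for each biopsy, two biopsies per site.
sample_sites = [site for site in EED_SITES + CONTROL_SITES for _ in range(2)]

splits = list(leave_two_out_splits(sample_sites, EED_SITES, CONTROL_SITES))
for (eed_site, control_site), train_idx, test_idx in splits:
    print(f"hold out {eed_site} + {control_site}: "
          f"train n={len(train_idx)}, test n={len(test_idx)}")
```

Because every test set comes from sites the model never saw in training, a gene signature that only encodes site-specific differences should perform poorly on the held-out pair.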
Further, we assessed confounding variables in the data, including clinical dimensions, sex, age, library preparation date, sequencing date, study sequencing site, biopsy site, data collection site, and variables related to sequencing data. A confounding variable influences both the condition being compared and the measured outcome, which can produce distorted associations, so it is essential to determine which relationships among the variables could be confounding.
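One simple first check for a categorical covariate is to cross-tabulate it against disease status and flag levels that occur in only one group, since their effect cannot be separated from the EED effect by adjustment alone. The sketch below is illustrative only. The records, field names, and helper functions are hypothetical and do not describe the rally's actual confounder analysis.

```python
from collections import defaultdict


def covariate_by_group(records, covariate, group_key="status"):
    """Count samples for each (covariate level, group) combination."""
    table = defaultdict(lambda: defaultdict(int))
    for rec in records:
        table[rec[covariate]][rec[group_key]] += 1
    return {level: dict(counts) for level, counts in table.items()}


def fully_confounded_levels(table):
    """Return covariate levels that occur in only one group."""
    return sorted(level for level, counts in table.items() if len(counts) == 1)


# Invented sample annotations; site labels for controls are placeholders.
records = [
    {"status": "EED", "sex": "F", "collection_site": "Pakistan"},
    {"status": "EED", "sex": "M", "collection_site": "Zambia"},
    {"status": "EED", "sex": "F", "collection_site": "Bangladesh"},
    {"status": "control", "sex": "F", "collection_site": "NA_site_1"},
    {"status": "control", "sex": "M", "collection_site": "NA_site_2"},
    {"status": "control", "sex": "M", "collection_site": "NA_site_1"},
]

site_table = covariate_by_group(records, "collection_site")
sex_table = covariate_by_group(records, "sex")
site_flags = fully_confounded_levels(site_table)
sex_flags = fully_confounded_levels(sex_table)
print("collection_site levels seen in one group only:", site_flags)
print("sex levels seen in one group only:", sex_flags)
```

In this toy data every collection site is flagged, mirroring the study design in which EED and comparison samples come from different regions, while sex is represented in both groups.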
The key results from this data science rally were:
This shortlist of differentially expressed genes (DEGs) points to potential proxy measurements for EED. In addition, some of the DEGs provide new insight into mechanisms of treatment. Beyond the knowledge gained about individual DEGs, we learned that DEGs and their networks can direct attention to multiple potentially informative ways of intervening in EED, including interventions that target a pathway or network known to interact with one of the DEGs. The DEGs and their gene networks could also offer a meaningful approach to the diagnosis or treatment of EED, identify additional targets within a pathway, and offer greater insight into the mechanism of disease. Interestingly, the EEDBI transcriptomics analysis identified genes and pathways related to the innate immune response and host regulation of viral replication.
A future direction of this project would be to combine the existing EED data and knowledge with single-cell RNA sequencing databases. Single-cell sequencing allows for a better understanding of the function of individual cells. The cell type analysis and specificity of single-cell sequencing have the potential to inform better drug targets for EED by showing which cell types have expression profiles that resemble our EED rally findings.
The main goal of the EEDBI data science rallies as a whole is an improved understanding of the pathobiology of EED. The EEDBI clinical rallies have provided an improved histology-based scoring system for identifying EED and EED disease severity, and a better understanding of EED mechanisms. The transcriptomics rally produced a prioritized list of DEGs associated with EED and a gene ontology analysis.
The tests that are required to confirm EED, endoscopy and small intestinal biopsy, are invasive and difficult to perform on small children, and no single biomarker reliably diagnoses EED.2,4 The results of the EEDBI transcriptomics data science rally, coupled with those of the EEDBI clinical rally, could inform biomarker discovery and validation, which in turn could aid EED diagnosis and the development of less invasive diagnostic tests.
If their biomarker status is confirmed through additional studies (published or unpublished), these biomarkers could become a means to validate intervention responses. Further, the results hint at prominent pathophysiologic mechanisms, which will help identify potential druggable targets for altering EED.