Case Study

Evaluating the Feasibility of Using Algorithms to Stratify Pregnancy Risk

  • Data Sharing
  • Multidisciplinary Analysis

Pregnant women in lower-middle income countries experience extraordinary levels of maternal mortality due to complications related to pregnancy and childbirth. The underlying identified causes of mortality are often preventable in a high income setting but understanding how to optimally match limited local resources in a region to the woman in an already stressed, under resourced healthcare system is unclear. Ki has been collaborating on identifying pregnant women’s health profiles using data in the Ki repository. The plan is that the profile will be part of data-driven matching that also takes into consideration geographic constraints and facility-specific resources and capabilities.

Toward healthier pregnancies

Approximately 5 percent of pregnant women suffer from incident pre-eclampsia, a sudden spike in blood pressure during the second or third trimester. It is treatable, but if left untreated it can be fatal to both mother and baby. In low- and middle-income countries where it is less likely to be diagnosed, pre-eclampsia is a leading cause of maternal mortality, stillbirths, and neonatal mortality.

One approach to saving mothers’ and newborns’ lives is to identify women who may be at heightened risk of adverse outcomes, including pre-eclampsia, and monitor them closely throughout their pregnancy. This is known as pregnancy risk stratification, and it’s a promising strategy for customizing pre-natal care based on women’s individual needs. However, stratifying risk is hard to do well, especially in relatively basic health facilities.

The Gates Foundation is interested in creating a mobile application
to make it easier. In theory, the app would run data about pregnant women through an algorithm and produce a pregnancy risk score to help health care workers make better clinical decisions. These decisions would optimally identify the needs of the pregnant woman so that she can find a facility with the right resources for her. Ki was brought in to partner its data analysis experts with domain experts to inform the process of turning theory into practice using multiple studies available in the Ki data repository. These studies have included de novo identification and comparison with existing clinical algorithms for pregnant woman risk stratification.


6 studies from a variety of countries measuring clinical and socioeconomic variables on 25 cohorts & ~137,000 mothers and children multiple times during pregnancy and the first month of life.


Pregnancy, Preeclampsia, Neonatal Death, Stillbirth, Machine Learning, Longitudinal Modeling

Simple models with only five or six highly predictive variables tended to perform as well as much more complex models including dozens of variables.

A lifesaving algorithm?

Our first step was combing through the literature and determining how risk stratification is currently done. We found no examples of algorithms that could do what we were looking for: predict many adverse outcomes (instead of just one) throughout a pregnancy (instead of at a single, specific point), including when those outcomes are likely to happen. 

Our second step was creating two predictive models using integrated data sets drawn from various studies in our repository.

The first tested whether it was possible to build a model that generates an initial estimate of a woman’s risk of pre-eclampsia and could be revised over time as additional data is collected. This was an important requirement, because it treats pregnancy realistically as a dynamic process, not a static state. To do this, we constructed a complex joint model combining several statistical techniques; although this worked, it took a lot of computing time to run the model, suggesting that other modelling approaches might be considered.

The second model, based on a machine learning algorithm, predicted risk of stillbirth and newborn death. We wanted to understand the relative importance of the approximately 40 variables included in the model. The most predictive variables were sex of the child (this, it turned out, was because sex typically isn’t recorded for stillbirths), gestational age of the baby, whether the mother had a previous pregnancy ending in stillbirth or neonatal death, the number of children the mother has had, the father’s level of schooling, the mothers’ age and height, and the mother’s blood pressure at the second prenatal care visit. The appearance of father’s level of schooling on this list was a surprise, leading the team to consider how it might factor paternal education into its future work. Another interesting finding was that simple models with only a five or six highly predictive variables tended to perform as well as much more complex models including dozens of variables.

Based in part on the insights from these data analyses, the Gates Foundation is currently working with a range of partners to continue building a risk stratification app that will help clinicians around the world ensure that mothers get the lifesaving care they need.


The figure on the left provides the relative importance of risk factors for predicting neonatal death and stillbirth. Socioeconomic status like father's education and prior history of infant mortality events appear at the top of the figure and are more influential than maternal health and other obstetric history such as number of previous caesarean deliveries when predicting neonatal death and stillbirth. For each participant, a single dot represents the influence of a single risk factor on predicting the outcome.