Investigators at the Rady Children’s Institute for Genomic Medicine (RCIGM) in San Diego have updated their pediatric sequencing program for diagnosing rare diseases, incorporating machine learning methods to generate, analyze, and interpret genetic data and get those insights to clinicians more quickly.
“By harnessing the power of technology, we can quickly and accurately determine the root cause of genetic diseases,” Stephen Kingsmore, RCIGM’s President and CEO, said in a statement. “We rapidly provide this critical information to intensive care physicians so they can focus on personalizing care for babies who are struggling to survive.”
In Science Translational Medicine, Kingsmore and colleagues outlined a whole-genome sequencing and interpretation pipeline that includes streamlined library preparation steps, along with automated machine learning and clinical natural language processing (CNLP). The experimental and analytical workflow was developed at RCIGM with help from collaborators at firms such as Illumina, Fabric Genomics, Diploid, Alexion, and Clinithink.
Several years ago, Kingsmore was part of a team that developed rapid pediatric genome sequencing at Children’s Mercy Kansas City, a Newborn Sequencing in Genomic Medicine and Public Health (NSIGHT) study site, in a program that demonstrated the potential clinical utility of the approach. He moved to Rady Children’s Hospital in 2015 to head up and evaluate its burgeoning rapid pediatric genome sequencing program.
In a paper published in npj Genomic Medicine earlier this month, the team presented data supporting the cost-effectiveness of this approach for undiagnosed infants in the neonatal intensive care unit. But while the approach appears promising for difficult-to-diagnose pediatric cases, authors of the new study noted that it relies on “highly qualified professionals to decipher results,” which “precludes widespread implementation.”
In their latest study, the researchers applied the approach to retrospective samples from more than 100 children with rare genetic conditions, who had already been diagnosed with 105 genetic diseases at Rady. By doing bead-based library prep on DNA extracted directly from dried blood spot and fresh blood samples, they were able to produce paired-end, short-read sequencing on the samples within 15.5 hours, aligning and calling variants with help from Illumina’s Dragen platform.
From there, the team relied on an automatic CNLP approach to glean phenotypes from electronic health records — a strategy that reportedly extracted pediatric patient phenomes with some 80 percent precision and 93 percent recall in the new analysis, making it possible to rank potential disease-related variants based on a child’s symptoms or traits.
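For readers less familiar with the two metrics, the short Python sketch below shows how precision and recall could be computed for one patient by comparing automatically extracted phenotype terms against a manually curated reference list. The function name and the phenotype labels are made up for illustration and are not taken from the study.

```python
def precision_recall(extracted, reference):
    """Compare extracted phenotype terms against a curated reference set.

    precision = share of extracted terms that are in the reference
    recall    = share of reference terms that were extracted
    """
    extracted, reference = set(extracted), set(reference)
    true_positives = len(extracted & reference)
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Hypothetical terms for a single patient (illustrative only).
extracted = {"seizures", "hypotonia", "feeding difficulty", "jaundice", "fever"}
reference = {"seizures", "hypotonia", "feeding difficulty", "jaundice"}

precision, recall = precision_recall(extracted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case one extracted term is not in the reference list, which lowers precision, while every reference term was found, so recall is complete.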
Using this workflow, the investigators successfully picked up 105 genetic conditions in 101 children, identifying an average of more than four phenotypic features that matched each child’s known condition with the automated methods. In contrast, manual interpretation of the patient profiles had uncovered roughly 0.9 phenotypic features matching those conditions, on average.
“We automated provisional diagnoses by combining the ranking of the similarity of a patient’s CNLP phenome with respect to the expected phenotypic features of all genetic diseases, together with the ranking of the pathogenicity of all of the patient’s genomic variants,” the authors wrote, noting that the automated, retrospective diagnoses lined up well with those produced through manual interpretation of the data.
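The idea of combining two rankings can be illustrated with a small Python sketch. It assumes a simple rank-sum rule, which is a stand-in chosen for clarity and not necessarily how the authors combined the rankings. The `Candidate` class, the gene labels, and all scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    gene: str                    # hypothetical label, not a real gene
    phenotype_similarity: float  # higher = closer match to the patient's phenome
    pathogenicity: float         # higher = variant more likely to be damaging


def ranks(values):
    """Return 1-based ranks, where rank 1 is the highest value."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def rank_candidates(candidates):
    """Order candidates by the sum of their two ranks (assumed rule)."""
    pheno = ranks([c.phenotype_similarity for c in candidates])
    patho = ranks([c.pathogenicity for c in candidates])
    combined = [p + q for p, q in zip(pheno, patho)]
    order = sorted(range(len(candidates)), key=lambda i: combined[i])
    return [candidates[i] for i in order]


candidates = [
    Candidate("GENE_A", phenotype_similarity=0.35, pathogenicity=0.90),
    Candidate("GENE_B", phenotype_similarity=0.80, pathogenicity=0.85),
    Candidate("GENE_C", phenotype_similarity=0.60, pathogenicity=0.20),
]

ranked = rank_candidates(candidates)
print([c.gene for c in ranked])
```

Here the candidate that scores well on both axes rises to the top, ahead of one with a highly pathogenic variant but a poor phenotype match.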
In a subset of 95 children affected by 97 genetic conditions, for example, they saw 97 percent recall and 99 percent precision with the automated approach relative to interpretations done manually. In three of seven infants in the intensive care unit, they saw even greater accuracy for the pipeline, which showed 100 percent sensitivity and precision.
When the team applied the speedier, automated pipeline to prospectively assess seriously ill infants in the ICU, it was able to diagnose three of seven cases with 100 percent precision and recall in a span of just over 20 hours, on average — saving more than 22 hours, on average, compared with prior rapid whole-genome sequencing and analysis pipelines.
Moreover, the authors reported that their automated provisional diagnosis pipeline can be scaled up to evaluate as many as 30 patients per week for each sequencing instrument on hand.
Based on such results, they argued that “[s]upervised autonomous systems may provide effective first-tier, provisional diagnoses, allowing valuable cognitive resources to be reserved for unsolved or difficult cases, manual curation of variants, and clinical report generation that includes a summary of medical management literature.”