San Diego-based Illumina Inc. has unveiled PrimateAI-3D, an artificial intelligence algorithm that the company says can predict disease-causing genetic mutations in patients with unprecedented accuracy.
Two papers published in the June 2 issue of Science detail the training of the algorithm and its application to half a million genomes in the UK Biobank cohort.
Two accompanying papers on the primate evolution research that informed the development of the PrimateAI-3D software were also published in the journal in early June.
According to the National Institutes of Health, the amount of genomic data being generated is approaching 40 billion gigabytes each year. The ability to share, analyze and interpret genomic data is critical to unlocking discoveries that will advance understanding of human health and improve precision medicine.
An Algo Trained by Natural Selection
Each person carries millions of genetic variants that underlie individual differences in health and disease risk, but most of these variants are presently of unknown function. By highlighting disease-causing variants with unparalleled accuracy, Illumina says PrimateAI-3D addresses a critical challenge facing the successful implementation of personalized genomic medicine.
To achieve its state-of-the-art performance, PrimateAI-3D uses deep neural network architectures similar to those behind ChatGPT and AlphaFold, but it is trained on genome sequences rather than human language.
However, unlike generative language models such as ChatGPT, where existing texts can be used to inform training, the genetic variants that cause disease in the human genome are largely unknown.
To overcome this, PrimateAI-3D in effect lets natural selection train the parameters of its deep neural network: it learns from millions of benign genetic variants identified through the sequencing of 233 diverse primate species, the largest such sequencing effort of nonhuman primate species to date.
Sequencing nonhuman primates can help scientists infer the pathogenicity of human genetic variants, and thus improve clinical variant interpretation on a genome-wide scale.
The result is a deep neural network that has been shown to identify disease-causing variants with superior accuracy in all six clinical cohorts that were tested, and to provide individualized predictions of genetic disease risk that have been validated in a cohort of nearly half a million people.
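For readers who want a feel for the training idea described above, the following is a minimal toy sketch, not Illumina's implementation. It assumes that variants observed in primates can be labeled "likely benign" and that everything else is left unlabeled, then fits a small logistic regression on one hypothetical feature (a made-up "conservation" score). The variant names, the feature and the data are all synthetic and for illustration only; the real model is a deep neural network trained on far richer inputs.

```python
import math

def build_training_set(candidate_variants, primate_observed):
    """Label a variant 0 (likely benign) if it was seen in a primate, else 1 (unlabeled)."""
    return [
        (features, 0 if variant_id in primate_observed else 1)
        for variant_id, features in candidate_variants.items()
    ]

def score(weights, bias, features):
    """Return a value between 0 and 1; higher means less like the benign set."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(examples, epochs=200, learning_rate=0.5):
    """Fit a logistic regression with plain stochastic gradient descent."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            error = score(weights, bias, features) - label
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias

# Synthetic toy data: one hypothetical "conservation" feature per variant.
candidate_variants = {
    "v1": [0.1], "v2": [0.2], "v3": [0.3],
    "v4": [0.7], "v5": [0.8], "v6": [0.9],
}
# Hypothetical set of variants observed in nonhuman primates (treated as benign).
primate_observed = {"v1", "v2", "v3"}

examples = build_training_set(candidate_variants, primate_observed)
weights, bias = train_logistic(examples)

for name, features in [("low-conservation variant", [0.15]),
                       ("high-conservation variant", [0.85])]:
    print(name, round(score(weights, bias, features), 3))
```

The point of the sketch is only the labeling step: no one has to know in advance which variants cause disease, because variants that persist in primate populations supply the "benign" examples.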
“Because of their closeness to the human genome, nonhuman primate species are uniquely valuable, both for what they can teach us about the genetic basis of human diseases, and in their own right,” said Kyle Farh, vice president of Artificial Intelligence at Illumina and senior author of the publications.
Unlocking Precision Medicine
As described in the accompanying paper published in Science, Illumina scientists, along with academic collaborators, next applied the PrimateAI-3D algorithm to identify rare pathogenic mutations in nearly half a million individuals in the UK Biobank. They found that the genomes of 97% of otherwise healthy members of the general population harbor highly actionable variants for at least one of 90 different clinical conditions that they surveyed.
PrimateAI-3D also greatly improved the accuracy of genetic risk prediction, enabling the first demonstration of polygenic risk scores that were largely unaffected by ancestry bias, a key step toward the equitable implementation of genetic-based precision medicine for diverse, non-European populations.
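As background, a polygenic risk score is conventionally computed as a weighted sum of the risk-allele counts a person carries across many variants. The sketch below shows that basic arithmetic only; the variant names and weights are hypothetical, and how PrimateAI-3D predictions feed into the scores is not described in this article.

```python
def polygenic_risk_score(allele_counts, effect_weights):
    """Sum each variant's effect weight times the number of risk alleles carried (0, 1 or 2)."""
    return sum(
        weight * allele_counts.get(variant_id, 0)
        for variant_id, weight in effect_weights.items()
    )

# Hypothetical effect weights for three variants (illustrative values only).
effect_weights = {"variant_a": 0.5, "variant_b": -0.25, "variant_c": 1.0}
# Hypothetical genotype: two copies of variant_a, one of variant_b, none of variant_c.
allele_counts = {"variant_a": 2, "variant_b": 1}

print(polygenic_risk_score(allele_counts, effect_weights))
```

Because the weights are usually estimated from a reference population, a score can lose accuracy when applied to people of other ancestries, which is the bias the researchers say they largely avoided.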
“The application of the latest advances in AI to genomics opens tremendous opportunities for Illumina in both genetic risk prediction and drug target discovery by decoding the basis of complex genetic diseases such as diabetes, heart disease, and autoimmune diseases,” said Alex Aravanis, chief technology officer of Illumina.
The company said this month that PrimateAI-3D will be made broadly available to the genomics community through integration across Illumina Connected Software.
Illumina
FOUNDED: 1998
CEO: Charles Dadswell (interim)
HEADQUARTERS: San Diego
EMPLOYEES: 7,825
BUSINESS: DNA sequencing
STOCK: ILMN (NASDAQ)
REVENUE: $4.5 billion (2021)
WEBSITE: illumina.com
NOTABLE: Illumina is the world’s leading genomics sequencing company.