To advance genomics in healthcare, we must consider what data we collect and – crucially – who we collect it from. Here, bioinformatician Nana E. Mensah explains why.
Although we share a large amount of our DNA, we all have differences in our genomes. The variations we carry are shaped by many factors, including our ancestry. While professional guidelines recognise that the sensitivity of a genomic test may be influenced by a patient's geographical ancestry, reference datasets are not always diverse enough to account for this.
This week on our blog, we speak to bioinformatician and PhD student Nana E. Mensah, who explains the limitations caused by a lack of genomic diversity in reference data and the benefits of working towards a more representative data culture.
The cost of inequality
Genomic tests can be used to look at a patient’s DNA to understand the cause of a condition. Specifically, we look for the differences between a patient’s DNA profile (their genome) and profiles from people who do not have the condition. These profiles come from public databases of human genomes sequenced in hundreds of research studies.
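To make this comparison more concrete, here is a minimal, hypothetical sketch of one common step: setting aside a patient's variants that are common in the reference data, on the assumption that a variant seen frequently in people without the condition is unlikely to cause it. The variant names, frequencies and the 1% cut-off are invented for illustration and are not real data or a clinical pipeline. The sketch also shows how a reference drawn mostly from one ancestry group can leave a variant looking rare simply because it was never observed there.

```python
# Hypothetical variant IDs and frequencies, invented for illustration only.
RARE_THRESHOLD = 0.01  # assumed cut-off: variants at or above 1% are treated as common


def candidate_variants(patient_variants, reference_freqs, threshold=RARE_THRESHOLD):
    """Return the patient's variants that are rare (or absent) in the reference data."""
    candidates = []
    for variant in patient_variants:
        # A variant missing from the reference is treated as frequency 0.0, i.e. rare.
        freq = reference_freqs.get(variant, 0.0)
        if freq < threshold:
            candidates.append(variant)
    return candidates


patient = ["var_A", "var_B", "var_C"]

# Reference built mostly from one ancestry group: var_B was never observed.
reference_one_group = {"var_A": 0.20}
# More diverse reference: var_B turns out to be common in another group.
reference_diverse = {"var_A": 0.20, "var_B": 0.05}

print(candidate_variants(patient, reference_one_group))  # ['var_B', 'var_C']
print(candidate_variants(patient, reference_diverse))    # ['var_C']
```

With the less diverse reference, var_B stays on the list of candidates even though it is common in another population, which is exactly the kind of misleading result that more representative reference data helps to avoid.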
When we consider that the first human genome sequence cost around $3 billion in 2000, it is no surprise that the first public databases came from high-income countries, and that their diversity suffered as a result. By 2017, 88% of people recruited for major genomic studies had European ancestry, with the majority being from the US, UK or Iceland.
When genomic databases lack diversity, we risk incorrectly applying this data to other populations, and this can have serious and far-reaching consequences.
Lost in translation
DNA variation that causes health complications in one population of shared ancestry may not do so in another. Researchers at the University of Pennsylvania reviewed how a lack of genomic diversity affects the translation of research into clinical practice, and their review provided several eye-opening examples:
- The most common genetic cause of cystic fibrosis in European populations is the F508del variant in the CFTR gene, but this is not the case for people with African ancestry. As cystic fibrosis drugs are increasingly selected based on molecular tests, a lack of genomic diversity limits the potential for treatment across a population with mixed ancestry.
- Complex conditions like heart disease and diabetes are influenced by both our genes and our environment, and polygenic risk scores are increasingly being used to understand the contributions our genomes make. These scores are derived from large databases that pool data from genomic studies. Currently, 78% of this data comes from individuals of European ancestry; consequently, risk scores derived from it predict outcomes more accurately for European populations than for patients of other ancestries.
- Warfarin, a common anticoagulant, can have life-threatening side effects if the incorrect dose is given. A patient’s response to the drug can be partly predicted from their genome, and this information is used to decide the best dose. While we know that, in Europeans, warfarin dosing is affected by variants in the CYP2C9, VKORC1 and CYP4F2 genes, similar studies are lacking for other populations.
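Polygenic risk scores are commonly calculated as a weighted sum: for each variant, the number of copies of the risk allele a person carries (0, 1 or 2) is multiplied by an effect size estimated in a genetic study, and the results are added up. The minimal sketch below assumes that simple form; the variant names and effect sizes are invented, and real scores involve many more variants and extra processing steps. Because the weights come from the study population, they may fit people of other ancestries less well, and variants that were never assessed in that population contribute nothing at all.

```python
def polygenic_risk_score(genotype, effect_sizes):
    """Weighted sum of risk-allele counts.

    genotype: dict mapping variant ID -> number of risk-allele copies (0, 1 or 2)
    effect_sizes: dict mapping variant ID -> per-allele effect estimated in a study
    Variants with no effect-size estimate are skipped.
    """
    score = 0.0
    for variant, dosage in genotype.items():
        if variant in effect_sizes:
            score += dosage * effect_sizes[variant]
    return score


# Invented effect sizes for illustration only.
effect_sizes = {"snp1": 0.25, "snp2": -0.125, "snp3": 0.5}

# snp4 has no effect-size estimate (for example, it was not assessed in the
# source study), so it contributes nothing to the score.
genotype = {"snp1": 2, "snp2": 1, "snp3": 0, "snp4": 1}

print(polygenic_risk_score(genotype, effect_sizes))  # 0.375
```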
Benefits of diversity
We all share 99.9% of our DNA, but these examples show that the smallest difference can have large implications for the way we diagnose, treat and manage health conditions. Alongside the ethical reasons, there is a scientific imperative for improving diversity in genomic research.
Studying diverse populations is the only way to truly understand how genetic variation influences our response to diet, infection and different health conditions. For example, we might never have developed cholesterol-lowering drugs that target PCSK9 if the PCSK9 gene hadn’t been studied in people of African ancestry. Similarly, the under-representation of genomes from Middle Eastern people in research has been highlighted as an oversight, given their potential for novel medical discoveries.
In discussions about genomic diversity, Africa has often been in the spotlight as the continent with the greatest human genomic variation. In 2018, an analysis of 910 genomes from people of African ancestry found 300 million base pairs of human DNA that had not previously been studied by genetics researchers. More recently, a study by the Human Heredity and Health in Africa (H3Africa) consortium uncovered 3 million previously undescribed DNA variants in the genomes of 426 African individuals, many of them in genes linked to a range of conditions.
Fortunately, the diversity of reference genomic data is growing. I was pleased to see that the gnomAD population database increased the number of genomes from African people from 4,359 to 20,744 in its newest version. This progress is promising, but is it enough?
Making the most of it
Even as genomic databases become more diverse, researchers may continue to leave non-European data out of their analyses. A 2020 Nature article found that 45 of 58 studies that used data from the UK Biobank excluded data from specific populations; 31 of these studies gave no reason for the exclusions, and the remainder cited statistical limitations. One explanation could be that datasets from non-European populations are smaller, so researchers have less confidence in findings drawn from them and choose to exclude them. One possible solution, suggested by the authors, is to require researchers to justify any exclusions when publishing genomic research.
So, who is responsible for leading the march towards genomic equity in healthcare? Researchers certainly have their part to play, as do many others – from individual clinicians to high-level decision-makers – yet diverse genomic data is only as useful as a healthcare system allows it to be. Arguably, progress moves only as fast as genomics research is integrated into clinical practice.
Of the countries with national genomic medicine initiatives, the UK stands out because its unified healthcare system eases data sharing between institutions. The NHS Genomic Medicine Service is among the first to offer genome sequencing as part of routine care and plans to analyse 500,000 genomes by 2023/24.
I hope that the future finds all healthcare systems in the same position. Globally, we already face extreme disparities in healthcare. As it graduates from a niche clinical practice to the foundation of precision medicine, genomics has the potential to either reduce or widen these disparities. We must continue to take steps towards genomic equity, because the promise of precision medicine will never truly be realised if we don’t learn from and embrace the full diversity of human genomes.
Suggested further reading
- Bentley AR, Callier SL, Rotimi CN. Evaluating the promise of inclusion of African ancestry populations in genomics. NPJ Genom Med. 2020;5: 5. doi:10.1038/s41525-019-0111-x
- Manrai AK, Funke BH, Rehm HL, Olesen MS, Maron BA, Szolovits P, et al. Genetic Misdiagnoses and the Potential for Health Disparities. N Engl J Med. 2016;375: 655–665. doi:10.1056/NEJMsa1507092
- Choudhury A, Aron S, Botigué LR, Sengupta D, Botha G, Bensellak T, et al. High-depth African genomes inform human migration and health. Nature. 2020;586: 741–748. doi:10.1038/s41586-020-2859-7
- Ben-Eghan C, Sun R, Hleap JS, Diaz-Papkovich A, Munter HM, Grant AV, et al. Don’t ignore genetic data from minority populations. Nature. 2020;585: 184–186. doi:10.1038/d41586-020-02547-3
- Peterson RE, Kuchenbaecker K, Walters RK, Chen C-Y, Popejoy AB, Periyasamy S, et al. Genome-wide Association Studies in Ancestrally Diverse Populations: Opportunities, Methods, Pitfalls, and Recommendations. Cell. 2019;179: 589–603. doi:10.1016/j.cell.2019.08.051