New data diversity initiative launches

This week, we look at a transformative programme that aims to rapidly make genomic datasets more representative

In September, Genomics England announced a new Data Diversity Initiative that will transform genomic datasets, reducing a long-established bias towards populations of European ancestry.

Working out the bias

As the science of genomics developed, the first datasets that became available were primarily composed of genomes from affluent, western populations and were therefore skewed towards people of European ancestry.

This bias is an ongoing problem, and researchers have come to understand that results from genomic studies using those datasets are less relevant and applicable to people from non-European populations.

“Genomics has the potential to transform healthcare,” said Genomics England CEO, Chris Wigley. “However, it has to work for all of us, and those who have non-European ancestry have been underrepresented in research data and therefore risk not getting equal benefit as [genomics] comes into the mainstream of patient care.”

Reference genome: one for all?

When a patient’s genome is sequenced, it is compared with a reference genome – a template. Bioinformaticians use this comparison to see where the patient’s DNA differs from the reference, allowing them to find changes or variations.

A 2019 study published in Nature Genetics found that a pan-genome constructed from the whole genomes of 910 people of African descent contained around 10% more DNA than the current human reference genome.

This is concerning because the comparison can only be as accurate as the reference itself. If the reference lacks a substantial portion of the DNA found in people with African heritage, then variants in those regions cannot be detected, and important findings could be missed.
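
To make the idea concrete, here is a minimal sketch in Python of what comparing a sample against a reference means, and why sequence that is absent from the reference cannot be checked. It is a toy illustration only: the sequences and the function names `find_differences` and `uncompared_bases` are invented for this example, and real analyses align millions of short reads and use dedicated variant-calling software rather than a position-by-position comparison.

```python
# Toy illustration only: sequences and function names are hypothetical.
# Real pipelines align sequencing reads and use dedicated variant callers.

def find_differences(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for each mismatch.

    Only positions that exist in the reference can be compared.
    """
    return [
        (i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


def uncompared_bases(reference: str, sample: str) -> int:
    """Count sample bases that have no counterpart in the reference."""
    return max(0, len(sample) - len(reference))


reference = "ACGTACGTAC"     # invented 10-base reference
sample = "ACGTTCGTACGGA"     # invented sample carrying 3 extra bases

differences = find_differences(reference, sample)
missed = uncompared_bases(reference, sample)

print(differences)  # one mismatch, at position 4 (A in reference, T in sample)
print(missed)       # 3 bases could not be compared at all
```

In this sketch the single change at position 4 is found, but the three extra bases at the end of the sample have nothing to be compared against, so any variation within them would go unseen. That is the concern when a reference genome lacks DNA that is common in a particular population.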

The Data Diversity Initiative

The programme, a collaboration with researchers and the NHS, aims to rapidly increase representation of people from varied ancestral, socioeconomic and geographic backgrounds within genomic data.

The project was launched on 17 September this year, when health and social care secretary Sajid Javid gave a blood sample at Great Ormond Street Hospital in London.

His genome will be sequenced and stored in a de-identified form, joining over 140,000 other genomes from people who participated in the 100,000 Genomes Project and the GenOMICC Covid-19 study.

Mr Javid, who is from a British Pakistani family, said: “I am extremely proud to be taking part in this study which is helping make sure that everyone, no matter their background, can benefit from our world-leading genomic research programmes.”

The initiative has been launched ahead of the new Office for Health Improvement and Disparities (OHID), which will be part of the Department of Health and Social Care. The office will open in October with the aim of tackling health disparities to ‘break the link between people’s background and their prospects for a healthy life.’

Want to learn more? Find out about recent data diversity studies here, or check out our article with bioinformatician Nana E. Mensah, who explores why equity matters.

Please note: This article is for informational or educational purposes, and does not substitute professional medical advice.