New genomic data studies launch to combat Covid-19

How can genomic data help in the fight against coronavirus? We look at the pros and cons of two new approaches

When trying to determine the nature of a virus, both in terms of its onset and the severity of its effects, genomic data can provide a great deal of information. But there are many different datasets available, and various ways in which the data can be collected.

Two new studies, one from Genomics England and the other from the University of Edinburgh, are using two entirely different datasets to better understand the genomic basis of the coronavirus. But what are the pros and cons of these differing approaches, and how might the quality of each dataset affect the results?

Genomics England launches new platform

Genomics England has announced a new platform that researchers can use to work with patients’ genomic data. The platform will first be used in Covid-19 research studies, such as that run by the GenOMICC consortium, which aims to collect 35,000 whole genomes from NHS patients who have been affected by the virus. Subsequently, data from the 100,000 Genomes Project will be incorporated into the platform for research into cancer and rare diseases.

As these projects generate such enormous datasets, the platform is designed to allow researchers to run analyses and retrieve results quickly. To protect the privacy of participants, researchers can do so without seeing the source data.

Using commercial genetic test results

Elsewhere, a study called Coronagenes, based at the University of Edinburgh’s Medical Research Council Human Genetics Unit, is taking a very different approach by using data from people who have already had direct-to-consumer (DTC) DNA testing.

The unit is seeking 30 million participants who have used DNA testing services such as AncestryDNA or 23andMe to share their results and to complete an online questionnaire that includes questions about whether they have experienced Covid-19 symptoms. Participants will be encouraged to update their questionnaires if they subsequently contract the virus, creating a varied dataset that includes responses from before, during and after infection, each linked to a DNA record.

It is hoped that the study of these linked records could help to identify gene variants that affect the severity and duration of Covid-19 illness, as well as the likelihood of becoming infected in the first place.

“Time is of the essence. To identify the genes that explain why some people get very sick from coronavirus and others don’t, we need the solidarity of a large proportion of people from different countries who can share their DNA testing results with us. In this case, size really matters,” said University of Edinburgh professor Albert Tenesa, who is leading the Coronagenes initiative.

Dataset differences: the pros and cons

The approach of Coronagenes is very different from that taken by the GenOMICC consortium, and is likely to have contrasting benefits and limitations. By using data from direct-to-consumer services, Coronagenes will avoid the costs associated with collecting and sequencing the DNA samples. It also means that data can be accepted from participants worldwide, rather than being limited to those in the UK. Together, these factors mean that a much larger number of participants can be enrolled.

However, the data the Coronagenes team will be working with is likely to be of lower quality, as the team will not have control over the testing process. Additionally, most DTC genetic tests do not sequence the whole genome, or even the whole exome (the 2% of the genome that codes for proteins). Instead, they sample SNPs (single nucleotide polymorphisms), which are single-letter changes in the DNA that are useful for predicting common types of variation between people. DTC tests often assay more than 500,000 SNPs, but this represents only a tiny fraction of the 3 billion base pairs in a complete human genome.
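
To put that "tiny fraction" in numbers, the short Python sketch below works through the arithmetic using only the round figures quoted above. The variable and function names are illustrative, and the result is a rough indication rather than a precise measure: actual SNP counts vary between testing services.

```python
# Round figures taken from the article text; real values vary by provider.
SNPS_ASSAYED = 500_000              # SNPs assayed by a typical DTC test
GENOME_BASE_PAIRS = 3_000_000_000   # base pairs in a complete human genome


def coverage_fraction(sites_assayed: int, genome_size: int) -> float:
    """Return the share of genome positions that are directly assayed."""
    return sites_assayed / genome_size


fraction = coverage_fraction(SNPS_ASSAYED, GENOME_BASE_PAIRS)
one_in = GENOME_BASE_PAIRS // SNPS_ASSAYED  # positions per assayed SNP

print(f"Share of the genome assayed: {fraction:.4%}")
print(f"Roughly 1 position in every {one_in:,}")
```

On these figures, a DTC test reads directly about one position in every 6,000, or under 0.02% of the genome.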
