What can genomics tell us about the coronavirus?

As the government declares the coronavirus “a serious and imminent threat”, we examine what genomic sequencing has revealed so far about 2019-nCoV

Advances in genome sequencing and interpretation have been put to the test in the recent coronavirus outbreak originating in Wuhan, China. But how is this technology advancing our understanding of the origins and spread of the virus?

The viral genome

Viruses’ genomes store instructions for making proteins. (In the case of coronaviruses, their genomes are made of RNA rather than DNA.) These viruses hijack the machinery of host cells to manufacture these proteins, which in turn control all parts of the virus’s life cycle: how it infects the host’s cells, replicates and gets passed along to the next host.

Coronavirus genomes comprise around 26,000 to 32,000 bases of single-stranded RNA. That is quite large for a virus, but small compared with the 3 billion base pairs of the human genome.

The first sequenced genome from the Wuhan coronavirus was published by a Chinese team, who sequenced samples from patients who were at the seafood market where the outbreak seems to have started.

Where did it come from?

Coronaviruses are typically found in birds and mammals, and also cause some common colds in humans. Periodically, a mutation allows a new strain to ‘jump’ from animals to humans, causing respiratory infections, as happened in 2002 with the SARS epidemic and again with the current strain, which is known as 2019-nCoV.

In research published in the Lancet, scientists at the National Institute for Viral Disease Control and Prevention in Beijing, China, compared the genomes of virus samples from nine patients at the epicentre of the outbreak with existing coronavirus genomes on file.

Their analysis showed that the closest match was a strain found in bats, and that 2019-nCoV is significantly more similar to the bat strain than to other human coronaviruses, including the one that caused SARS. From this they were able to say that 2019-nCoV most likely evolved from a coronavirus in the bat population, although it is currently unclear if there was an intermediate host.
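
To illustrate the idea of finding a 'closest match', here is a minimal sketch in Python that scores a query sequence against a set of reference sequences by percent identity. It is not the method used in the Lancet study: real comparisons rely on whole-genome alignment and phylogenetic analysis. The function names and the short sequences below are made up for illustration, and the sequences are assumed to be already aligned to the same length.

```python
def percent_identity(a, b):
    """Percentage of positions at which two pre-aligned sequences agree."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        raise ValueError("sequences must not be empty")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def closest_match(query, references):
    """Return (name, percent identity) of the most similar reference."""
    scores = {name: percent_identity(query, seq)
              for name, seq in references.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Toy sequences invented for this example; they are NOT real viral data.
query = "AUGGCUAAGCUU"
references = {
    "bat_strain": "AUGGCUAAGCUA",
    "sars_strain": "AUGGAUCAGCUA",
    "common_cold_strain": "AUGCAUCAGGUA",
}

name, score = closest_match(query, references)
print(f"Closest match: {name} ({score:.1f}% identical)")
```

With these toy inputs the bat-like sequence scores highest, mirroring the kind of ranking the researchers describe, though on a vastly smaller scale than a genome of roughly 30,000 bases.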

It is known that a wide variety of animals were sold at the seafood market in Wuhan, to which many of the earliest cases were connected. SARS was closely related to a different bat coronavirus, but it was later established that civets functioned as the intermediate host.

Routes of transmission

In the same Lancet article, the researchers also looked for differences between the samples taken from the different human patients. They were extremely similar, allowing the team to postulate that the current outbreak is the result of a single coronavirus that made the jump into humans, and has since spread by human-to-human transmission only. It is highly unlikely that animal contact is causing more cases.
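
The reasoning here rests on how few differences separate the patients' samples. As a rough sketch of that check, the Python below counts mismatched positions between every pair of samples. The sample names and sequences are invented, the sequences are assumed to be pre-aligned, and the helper names are hypothetical rather than taken from any tool the researchers used.

```python
from itertools import combinations


def count_differences(a, b):
    """Number of positions at which two pre-aligned sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b))


def pairwise_differences(samples):
    """Map each pair of sample names to the number of differing positions."""
    return {
        (name_a, name_b): count_differences(seq_a, seq_b)
        for (name_a, seq_a), (name_b, seq_b) in combinations(samples.items(), 2)
    }


# Toy sequences invented for this example; they are NOT real patient data.
samples = {
    "patient_1": "AUGGCUAAGCUU",
    "patient_2": "AUGGCUAAGCUU",
    "patient_3": "AUGGCUAAGCUA",
}

diffs = pairwise_differences(samples)
for pair, n in diffs.items():
    print(pair, n)
print("Largest difference between any two samples:", max(diffs.values()))
```

A very small maximum difference across all pairs is the pattern that would be consistent with a single recent origin. If there had been several separate jumps from animals, the samples would be expected to differ more.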

Subsequently, samples have been sequenced from two coronavirus patients in France, and two in the UK. Because all the teams have made their sequences available to other scientists, the genomes have been compared and found to be very similar, supporting the Chinese team’s theory that the cases share a single origin.

“I can announce that Public Health England has sequenced the viral genome from the first two positive cases in the UK, and is today making that sequence available to the scientific community,” health minister Matt Hancock told the UK parliament. “Its findings suggest that the virus has not evolved in the last month.”

The second part of this article will discuss how the data from sequencing the virus’s genome is being used in current clinical management of the disease, as well as in identifying drug targets and developing vaccines.