
Researchers find ‘hidden’ coronavirus gene

A newly discovered gene within the genome of the SARS-CoV-2 virus could offer clues about its origins and a possible target for treatment

As we previously reported on this blog, the genome of SARS-CoV-2 was sequenced early in the pandemic, but researchers have now found an extra gene hidden within its nucleotide sequence.

The virus’s genome first appeared to consist of 15 genes, but a study published in eLife shows that a 16th protein-coding gene is nested inside one of the others.

Elusive overlapping genes

Viruses generally have much smaller genomes than plants, animals and even bacteria. They often contain overlapping genes (OLGs), which allow them to produce multiple different proteins without significantly increasing their genome size. It is possible for a single base in a viral genome to form part of the instructions for two or even three different proteins.
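To make the idea of a shared base concrete, here is a minimal Python sketch. The toy sequence and the helper names (`codons_in_frame`, `codon_containing`) are invented for illustration and do not come from the study. The sketch considers only the three forward reading frames of a single strand.

```python
def codons_in_frame(seq, frame):
    """Split seq into complete codons, starting at offset frame (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


def codon_containing(seq, position, frame):
    """Return the codon in the given frame that covers the base at index
    position, or None if no complete codon in that frame covers it."""
    if position < frame:
        return None
    start = frame + 3 * ((position - frame) // 3)
    codon = seq[start:start + 3]
    return codon if len(codon) == 3 else None


# Hypothetical toy sequence, not real viral data.
toy = "ATGGCATTACGA"

for frame in range(3):
    print(frame, codons_in_frame(toy, frame), codon_containing(toy, 4, frame))
# 0 ['ATG', 'GCA', 'TTA', 'CGA'] GCA
# 1 ['TGG', 'CAT', 'TAC'] CAT
# 2 ['GGC', 'ATT', 'ACG'] GGC
```

The base at index 4 (a 'C') sits in a different codon in each frame: 'GCA', 'CAT' and 'GGC'. If all three frames were translated, that one base would contribute to three different amino acids.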

Although it is well known that OLGs are present in viral genomes, they can be very difficult to spot. Most bioinformatics programmes are good at identifying individual genes but are not optimised for detecting a second set of instructions written into the same stretch of sequence as another gene.

Finding open reading frames

An open reading frame (ORF) is a sequence of nucleotides in DNA (or, in the case of the coronavirus genome, RNA) that contains no stop codons.

Each group of three nucleotides forms a codon, which represents a specific amino acid or instruction when the sequence is translated into a protein. Three codons, ‘TAA’, ‘TAG’ and ‘TGA’ (‘UAA’, ‘UAG’ and ‘UGA’ in RNA), mean stop, and each marks the end of a reading frame. A long stretch of the genome without a stop codon is likely to be part of a gene.
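A simple screen for stop-free stretches might look like the Python sketch below. It is an illustration only, not the software used in the study. The function name `find_stop_free_stretches`, the `min_codons` threshold and the toy sequence are all assumptions. It scans only the three forward frames, does not require a start codon, and uses the standard stop codons TAA, TAG and TGA (with RNA's U read as T).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_stop_free_stretches(seq, min_codons=3):
    """Return (frame, start, end) tuples for stretches of at least
    min_codons codons that contain no stop codon. start is inclusive and
    end is exclusive; both are 0-based indices into seq."""
    seq = seq.upper().replace("U", "T")  # accept RNA input as well as DNA
    results = []
    for frame in range(3):
        run_start = frame
        for pos in range(frame, len(seq) - 2, 3):
            if seq[pos:pos + 3] in STOP_CODONS:
                # A stop codon closes the current stretch.
                if (pos - run_start) // 3 >= min_codons:
                    results.append((frame, run_start, pos))
                run_start = pos + 3
        # Handle a stretch that runs to the last complete codon in this frame.
        end = frame + 3 * ((len(seq) - frame) // 3)
        if (end - run_start) // 3 >= min_codons:
            results.append((frame, run_start, end))
    return results


# Hypothetical toy sequence, not real viral data.
toy = "ATGAAACCCTAAGGG"
print(find_stop_free_stretches(toy))
# [(0, 0, 9), (1, 4, 13), (2, 2, 14)]
```

In this toy example the stop-free stretches in frames 1 and 2 cover some of the same bases as the stretch in frame 0, which is the situation an overlapping gene creates. A real screen would use a much larger length threshold and would also check the opposite strand.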

To look for OLGs, the researchers first screened the SARS-CoV-2 genome for ORFs. They then applied software that recognises patterns of genetic change unique to OLGs. This approach identified one especially long OLG inside the gene ORF3a; the new gene is longer than some of the other genes already identified.

The sequence, named ORF3d, was found to encode a novel protein that is not found in other human disease-causing coronaviruses, such as SARS-CoV – the virus responsible for the 2002-04 Sars outbreak.

New information: new applications

The discovery offers clues about the origins of SARS-CoV-2. After finding the gene, the researchers used genomic databases to compare the SARS-CoV-2 genome with those of other coronaviruses. They found that the gene had been identified once before: in a coronavirus variant isolated from pangolins in Guangxi, China. It was not found in other closely related coronaviruses from either pangolins or bats. Although far from definitive, this may help to identify how the virus crossed over into humans.

A second application is in the area of drug development, and some research has already been done into how the ORF3d protein interacts with the human immune system. Although this work is at an early stage, it appears that the protein is expressed in infected people, and elicits a strong antibody response. However, there is some evidence that T-cells – an important part of the body’s immune response – do not react to the ORF3d protein. Before any drugs can be developed that target ORF3d, work is required to fully understand its effects on the immune response to SARS-CoV-2.
