A newly discovered gene within the genome of the SARS-CoV-2 virus could offer clues about its origins and a possible target for treatment.
Elusive overlapping genes
Viruses generally have much smaller genomes than plants, animals and even bacteria. They often contain overlapping genes (OLGs), which allow them to produce multiple different proteins without significantly increasing their genome size. It is possible for a single base in a viral genome to form part of the instructions for two or even three different proteins.
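To illustrate how one base can belong to more than one set of instructions, the short Python sketch below splits a made-up sequence into three-letter groups starting from three different positions. The sequence and the function name `codons_in_frame` are invented for illustration and are not taken from the SARS-CoV-2 genome or from the researchers' software.

```python
def codons_in_frame(seq, frame):
    """Split seq into complete three-letter groups, starting at offset frame (0, 1 or 2)."""
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]


# Toy sequence, chosen only for illustration.
toy_sequence = "ATGGCATAAC"

for frame in range(3):
    # The same letters are grouped differently in each frame,
    # so each frame could in principle spell out a different protein.
    print(frame, codons_in_frame(toy_sequence, frame))
```

The fourth letter of the toy sequence, for example, starts one group in the first frame, ends a group in the second and sits in the middle of a group in the third.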
Although it is well known that OLGs are present in viral genomes, they can be very difficult to spot. Most bioinformatics programmes are good at identifying individual genes but are not optimised for detecting a second set of instructions hidden within the sequence of another gene.
Finding open reading frames
Each group of three nucleotides forms a codon, which represents a specific amino acid or instruction when the sequence is translated into a protein. The sequence ‘TAA’, for example, is one of three codons that mean stop, and indicates the end of a reading frame. A long stretch of the genome without a stop codon, known as an open reading frame (ORF), is likely to be part of a gene.
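The idea of searching for long stretches without a stop codon can be sketched in a few lines of Python. This is a simplified illustration, not the researchers' method: it assumes the sequence is written in DNA letters, reads only the forward strand, does not require a start codon, and uses a hypothetical function name, `stop_free_stretches`, and a toy sequence.

```python
# The three stop codons of the standard genetic code, written in DNA letters.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def stop_free_stretches(seq, min_codons=3):
    """Return (frame, start, end) for each run of at least min_codons
    codons without a stop codon, in the three forward reading frames."""
    seq = seq.upper()
    found = []
    for frame in range(3):
        run_start = frame
        pos = frame
        while pos + 3 <= len(seq):
            if seq[pos:pos + 3] in STOP_CODONS:
                # A stop codon closes the current run; keep it if long enough.
                if (pos - run_start) // 3 >= min_codons:
                    found.append((frame, run_start, pos))
                run_start = pos + 3
            pos += 3
        # Keep a run that reaches the end of the sequence without a stop.
        if (pos - run_start) // 3 >= min_codons:
            found.append((frame, run_start, pos))
    return found


# Toy sequence, chosen only for illustration.
for frame, start, end in stop_free_stretches("ATGGCAGCATAAGC"):
    print(f"frame {frame}: positions {start} to {end}")
```

In a real genome the threshold would be far higher than three codons, since short stop-free stretches occur by chance in any sequence.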
To look for OLGs, the researchers first screened the SARS-CoV-2 genome for ORFs. Then they applied a software programme that recognises patterns of genetic change that are unique to OLGs. This approach identified one especially long OLG inside the gene ORF3a, which was longer than some of the other genes already identified.
The sequence, named ORF3d, was found to encode a novel protein that is not found in other human disease-causing coronaviruses, such as SARS-CoV – the virus responsible for the 2002-04 Sars outbreak.
New information: new applications
The discovery offers clues about the origins of SARS-CoV-2. After finding the gene, the researchers used genomic databases to compare the SARS-CoV-2 genome to those of other coronaviruses. They found that the gene had been identified once before: in a coronavirus variant isolated from pangolins in Guangxi, China. It was not found in other closely related strains of the virus from either pangolins or bats. Although far from definitive, this may help to identify how the virus crossed over into humans.
A second application is in the area of drug development, and some research has already been done into how the ORF3d protein interacts with the human immune system. Although this work is at an early stage, it appears that the protein is expressed in infected people, and elicits a strong antibody response. However, there is some evidence that T-cells – an important part of the body’s immune response – do not react to the ORF3d protein. Before any drugs can be developed that target ORF3d, work is required to fully understand its effects on the immune response to SARS-CoV-2.