Genome sequencing produces large volumes of valuable data, but what are the challenges in interpreting it so that practitioners get the answers they need?
The process of sequencing any gene will inevitably identify changes in the DNA sequence. Some changes may be known polymorphisms of no clinical significance, while others may be relevant to the phenotype of the individual under investigation, writes Fiona Macdonald.
Interpreting results in whole genome sequencing
The interpretation of sequence changes is a significant part of the work carried out by scientists involved in genetic diagnosis. Until recently the number of changes detected per patient has been small, as labs have been sequencing single genes, panels of genes or just exomes. Interpreting the vast number of variants identified by whole genome sequencing (WGS) as part of the 100,000 Genomes Project, however, is a much greater undertaking.
Some changes in the DNA sequence are easy to interpret. A nonsense mutation, which replaces the codon for an amino acid with a premature stop codon, or a frameshift mutation, which alters the reading frame and usually introduces a premature stop codon downstream, will significantly alter a protein and is highly likely to be pathogenic. Similarly, an alteration that disrupts the correct splicing of a gene is likely to have a deleterious effect. But what about missense changes? Is replacing one amino acid out of thousands in a protein enough to cause disruption so that the protein no longer functions as it should? What do we need to do to prove causation?
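The categories above can be illustrated with a short sketch. It assumes the standard genetic code and looks only at a single codon or at the net length change of an insertion or deletion; the function names are hypothetical, and real annotation tools also consider transcripts, splice sites and much more.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(ref_codon, alt_codon):
    """Classify a change within one codon (hypothetical helper)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # amino acid codon becomes a premature stop
    if ref_aa == "*":
        return "stop-loss"
    return "missense"       # one amino acid replaced by another


def classify_indel(ref_allele, alt_allele):
    """Classify an insertion/deletion by its net change in length."""
    net_change = len(alt_allele) - len(ref_allele)
    # A net change that is not a multiple of three shifts the reading frame.
    return "in-frame indel" if net_change % 3 == 0 else "frameshift"


print(classify_substitution("CGA", "TGA"))  # Arg -> stop
print(classify_substitution("GAG", "GTG"))  # Glu -> Val
print(classify_indel("AT", "A"))            # one base deleted
```

The nonsense and frameshift cases fall out of simple rules, whereas the missense case returns only a label: nothing in the sequence alone says whether the protein still functions, which is exactly the interpretive problem discussed here.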
The literature can be very helpful, especially if functional studies have been carried out to show that an alteration in the protein affects its ability to act. However, the proof sufficient for a research publication differs from that necessary for a clinical diagnosis.
Databases and clinical use
Databases may show that a specific alteration has been seen on more than one occasion. However, some databases suffer from duplication of entries, inconsistent interpretation and poor clinical and experimental evidence leading to discordant data.
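As a rough illustration of the duplication and discordance problem, the sketch below groups submissions by variant and reports those entered more than once with the same classification, and those carrying conflicting classifications. The records, gene names and function name are all hypothetical and do not reflect the format of any real database.

```python
from collections import defaultdict

# Hypothetical submissions: (variant identifier, classification as submitted).
SUBMISSIONS = [
    ("GENE1:c.100A>G", "Pathogenic"),
    ("GENE1:c.100A>G", "pathogenic "),
    ("GENE1:c.250C>T", "Benign"),
    ("GENE1:c.250C>T", "Uncertain significance"),
    ("GENE2:c.42del", "Likely pathogenic"),
]


def summarise(submissions):
    """Return (duplicated variants, discordant variants with their labels)."""
    labels = defaultdict(set)
    counts = defaultdict(int)
    for variant, label in submissions:
        # Normalise case and whitespace so trivial differences are not
        # mistaken for genuine disagreement.
        labels[variant].add(label.strip().lower())
        counts[variant] += 1
    duplicated = sorted(
        v for v, n in counts.items() if n > 1 and len(labels[v]) == 1
    )
    discordant = {v: sorted(c) for v, c in labels.items() if len(c) > 1}
    return duplicated, discordant


duplicated, discordant = summarise(SUBMISSIONS)
print("Duplicated:", duplicated)
print("Discordant:", discordant)
```

A check of this kind only flags entries for review; deciding which classification is correct still requires the clinical and experimental evidence behind each submission.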
A number of reliable databases exist and form an important resource for variant interpretation. The Human Gene Mutation Database (HGMD) [1] is a comprehensive database of published germline mutations, which in 2013 contained more than 141,000 abnormalities in more than 5,700 genes. The database highlights variants where the pathogenicity is questionable and carries out regular reviews as new information becomes available. Careful assessment by practitioners is always required, however, and users must not assume that any change documented is pathogenic in all individuals.
Another example of a well-curated database for clinical use is InSiGHT, the database for inherited colorectal cancer. It contains a large number of genetic variants and is regularly reviewed by an interpretation committee, which classifies variants as they are identified and submitted so that laboratories can have significant confidence in their classification [2].
The Global Alliance for Genomics and Health is attempting to create an environment for collaborative variant curation and will develop an accessible database for the breast cancer genes BRCA1 and BRCA2. At a recent workshop, there was discussion of whether the wealth of data held in diagnostic labs, which is not currently shared, could be uploaded for all to access.
The unique nature of some variants means that, in theory, individual patients could be identified from databases despite anonymisation. This risk should be highlighted to patients alongside discussion of the wider benefits that their participation in research could bring. The topic was hotly debated at a recent meeting held by the PHG Foundation and the Association for Clinical Genetic Science, attended by Dame Fiona Caldicott, and there appears to be a way forward for improving databases so that they can be used for diagnosis.
Other sources for determining pathogenicity
A number of in silico tools, such as SIFT and PolyPhen, can be used to assess amino acid changes or disruption of splicing, but their predictions are not always consistent and they cannot be relied upon in isolation. Conservation of a particular amino acid across multiple species tends to indicate that the residue is important, and co-segregation of a variant with disease status in family members supports its involvement in the disease. The absence of the amino acid change in normal controls is also supportive evidence of a variant's pathogenicity.
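The idea of sequence conservation can be shown with a minimal sketch: given protein sequences that have already been aligned, count how many species share the human residue at a given position. The alignment below is invented for illustration, and the function name is hypothetical; real analyses use curated multiple-sequence alignments and more sophisticated conservation scores.

```python
# Invented, pre-aligned toy sequences (all the same length; '-' marks a gap).
ALIGNMENT = {
    "human":     "MKTAYIAK",
    "mouse":     "MKTAYIAR",
    "chicken":   "MKSAYIAK",
    "zebrafish": "MRTAYLAK",
}


def residue_conservation(alignment, position, reference="human"):
    """Fraction of sequences sharing the reference residue at a 0-based position."""
    ref_residue = alignment[reference][position]
    # A gap in any species counts as a mismatch with the reference residue.
    matches = sum(1 for seq in alignment.values() if seq[position] == ref_residue)
    return matches / len(alignment)


for pos in (2, 3):
    residue = ALIGNMENT["human"][pos]
    score = residue_conservation(ALIGNMENT, pos)
    print(f"Position {pos} ({residue}): conserved in {score:.0%} of species")
```

A residue shared by every species in the alignment is more likely to be functionally important than one that varies freely, but, like the prediction tools, this is one strand of evidence to be weighed alongside the others rather than proof of pathogenicity.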
Despite all these approaches, a definitive answer is not always possible and variants of uncertain significance remain a challenge. Re-analysis over time is necessary so that we may provide answers for patients and their families.
Analysing genomes for the 100,000 Genomes Project
Given the complexity of analysing just a few variants per person, how is the much larger number of variants identified by whole genome sequencing to be analysed, especially when this complexity is compounded by the potential to detect variants not associated with the phenotype?
The clinicians and scientists of the newly established Genomics England Clinical Interpretation Partnerships (GeCIPs) will play a vital role. Drawing on their specialist clinical and research expertise in rare diseases and cancer, they will assist with the interpretation of complex findings, contribute to the validation and feedback of results, and continuously refine the data generated by the project to accelerate the benefit to patients. The project itself will undoubtedly provide invaluable data and cement the NHS's reputation as a world leader in the study of genomic medicine.
Fiona Macdonald is a consultant clinical scientist based primarily in the West Midlands Regional Genetics Laboratory, and is a fellow of the Royal College of Pathologists. She is an author on more than 80 papers and has also written a variety of book chapters as well as co-authoring two books on the molecular basis of cancer.
1. Stenson PD et al. The Human Gene Mutation Database: building a comprehensive mutation repository for clinical and molecular genetics, diagnostic testing and personalized genomic medicine. Hum Genet. 2014;133:1-9.
2. Thompson B et al. Application of a 5-tiered scheme for standardized classification of 2,360 unique mismatch repair gene variants in the InSiGHT locus-specific database. Nat Genet. Advance online publication, 22 December 2013.