Long-read sequencing

Clinical applications

Similar to short-read sequencing, the potential clinical applications of long-read sequencing include:

Long-read sequencing offers improved accuracy for the detection of specific types of genetic variant that are difficult to detect using current short-read sequencing methods (see below). Its use in clinical practice is restricted to a few specific situations, though it may expand in the future.

Long-read sequencing has played an important role in producing the most complete human reference genome sequence, by enabling sequencing of regions that short-read sequencing struggles with.

How does it work?

There are currently two major producers of ‘true’ long-read sequencing technology: Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (Nanopore).

PacBio: Single molecule real-time (SMRT) sequencing

In this method, a new strand of DNA is synthesised using the DNA to be sequenced as a template. Each nucleotide is identified by the fluorescence detected as it is incorporated into the growing strand. The process is briefly explained below.

The DNA to be sequenced is made circular.
The circular DNA is applied to a surface (‘SMRT Cell’) patterned with thousands of tiny wells (‘zero mode waveguides’). Each well contains a single DNA polymerase enzyme working on a single circular DNA molecule.
Fluorescently-labelled nucleotides are used to generate a new strand of DNA from the circular DNA. As each nucleotide is incorporated by the polymerase into the new strand, its fluorescence is measured. This happens in each of the thousands of wells.
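Because the template is circular, the polymerase can travel around the same molecule repeatedly, so each DNA molecule is sequenced multiple times. The repeated passes can be combined into a more accurate consensus sequence.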
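
The benefit of reading the same circular molecule several times can be illustrated with a minimal sketch. It assumes, purely for illustration, that the repeated passes have already been aligned to one another and are of equal length, and it takes a simple majority vote at each position. The function name `consensus` and the example sequences are hypothetical; this is not the manufacturer's consensus algorithm.

```python
from collections import Counter


def consensus(passes):
    """Majority-vote consensus across repeated reads of one molecule.

    Assumption for illustration: the passes are already aligned and have
    the same length, so position i in every pass refers to the same base.
    """
    if not passes:
        raise ValueError("need at least one pass")
    length = len(passes[0])
    if any(len(p) != length for p in passes):
        raise ValueError("passes must be aligned to the same length")
    # For each position, keep the base seen most often across the passes.
    return "".join(
        Counter(column).most_common(1)[0][0] for column in zip(*passes)
    )


# Five hypothetical passes over the same molecule, two containing an error.
passes = [
    "ACGTTAGC",
    "ACGTTAGC",
    "ACCTTAGC",  # error at the third base
    "ACGTTAGA",  # error at the last base
    "ACGTTAGC",
]
print(consensus(passes))
```

Random errors in individual passes are outvoted by the correct base in the other passes, which is why repeated reading of one molecule can raise the accuracy of the final sequence.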

Nanopore

In this method, DNA bases are detected as they pass through a very small hole (known as a nanopore) in a membrane. The process is briefly explained below.

Linear patient DNA is linked to an enzyme that ‘unzips’ the double helix so that a single strand can be fed into the nanopore.
Nanopores are present in a thin membrane submerged in a salt solution. An electrical potential is applied. This causes salt ions to pass through the pore, establishing a current.
As the single-stranded DNA passes through the nanopore, each base (A, C, G or T) disrupts the flow of the current in a different way. The sequence of these disruptions is recorded and translated into the order of bases in the DNA strand.
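
The idea of translating current disruptions into bases can be sketched as a toy example. The current levels in `LEVELS` are invented values in arbitrary units, and the sketch assumes one clean measurement per base. In practice the signal is influenced by several neighbouring bases at once and is decoded by trained statistical models, so this is only an illustration of the principle.

```python
# Hypothetical characteristic current level for each base (arbitrary units).
# These numbers are invented for illustration only.
LEVELS = {"A": 80.0, "C": 95.0, "G": 110.0, "T": 125.0}


def call_base(current):
    """Return the base whose reference level is closest to the measurement."""
    return min(LEVELS, key=lambda base: abs(LEVELS[base] - current))


def decode(signal):
    """Translate a series of current measurements (one per base) into bases."""
    return "".join(call_base(value) for value in signal)


# A hypothetical noisy signal: one measurement per base passing the pore.
signal = [81.2, 96.4, 108.9, 124.1, 79.5]
print(decode(signal))
```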

Synthetic long reads

In addition to the ‘true’ long-read sequencing methods described above, it is also possible to perform short-read sequencing with modifications that allow the assembly of larger fragments after sequencing (known as synthetic long reads). The process is briefly explained below.

Large fragments of DNA are separated from each other – for example, by attaching each fragment to a different tiny bead.
While separated from each other, the large fragments are sheared into shorter fragments and labelled with synthetic DNA barcodes.
Short-read sequencing is performed on all of the short fragments together.
During data analysis, short reads with the same barcode are identified and assembled back into the larger fragment from which they are derived.
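
The barcode-matching step in the data analysis can be sketched as follows. The sketch assumes each short read has already been reduced to a (barcode, sequence) pair; the barcode labels and sequences are hypothetical. It only groups the reads by barcode, and does not attempt the subsequent assembly of each group into the original large fragment.

```python
from collections import defaultdict


def group_by_barcode(reads):
    """Collect short reads that carry the same barcode.

    `reads` is an iterable of (barcode, sequence) pairs. Reads sharing a
    barcode are assumed to derive from the same large fragment.
    """
    groups = defaultdict(list)
    for barcode, sequence in reads:
        groups[barcode].append(sequence)
    return dict(groups)


# Hypothetical short reads from two large fragments, sequenced together.
reads = [
    ("BC01", "ACGT"),
    ("BC02", "TTGA"),
    ("BC01", "GTCA"),
    ("BC02", "GACC"),
]
groups = group_by_barcode(reads)
for barcode, sequences in groups.items():
    print(barcode, sequences)
```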

Advantages and limitations of long-read sequencing

Advantages

Long reads are able to span complex regions, enabling better detection of:

structural variation (such as inversions or insertions);
complex rearrangements;
repeat expansion length variations; and
variants in repetitive or highly polymorphic regions.
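
As an illustration of why spanning a region helps, the sketch below counts the number of repeat units in a single read that covers an entire repeat expansion together with its flanking sequence. The function name `longest_repeat_run`, the flanking sequences and the repeat count are hypothetical, and the sketch assumes an error-free read with an uninterrupted repeat. Real analysis tools must also tolerate sequencing errors and interruptions within the repeat.

```python
import re


def longest_repeat_run(read, unit):
    """Count the units in the longest uninterrupted run of `unit` in a read."""
    if not unit:
        raise ValueError("repeat unit must not be empty")
    pattern = re.compile(f"(?:{re.escape(unit)})+")
    runs = [len(match.group(0)) // len(unit) for match in pattern.finditer(read)]
    return max(runs, default=0)


# A hypothetical long read: flanking sequence, 12 CAG repeats, flanking sequence.
read = "GGTC" + "CAG" * 12 + "TTAC"
print(longest_repeat_run(read, "CAG"))
```

Because the read contains both flanking sequences, the whole repeat is known to be covered and its length can be counted directly, rather than inferred from many shorter reads.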

Additional potential clinical applications include:

sequencing of genes with a pseudogene;
detection of modified DNA bases, including methylation;
determination of which variant is on which copy of a chromosome (haplotype phasing);
sequencing of RNA transcripts to detect alternative splicing; and
applications for which a portable device is an advantage (Nanopore sequencing).
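
Haplotype phasing can be illustrated with a minimal sketch: when a single read covers two variant sites, the alleles it carries at those sites must lie on the same copy of the chromosome. The representation of each read as a mapping from position to observed allele, the positions and the alleles are all hypothetical, and the sketch ignores sequencing errors.

```python
from collections import Counter


def phase_two_sites(reads, site_a, site_b):
    """Count which alleles at two sites are observed together on one read.

    Each read is a dict mapping a genomic position to the allele observed
    there. Only reads that span both sites are informative.
    """
    pairs = Counter()
    for read in reads:
        if site_a in read and site_b in read:
            pairs[(read[site_a], read[site_b])] += 1
    return pairs


# Hypothetical reads over two variant sites about 5 kb apart.
reads = [
    {100: "A", 5000: "T"},
    {100: "A", 5000: "T"},
    {100: "G", 5000: "C"},
    {100: "G", 5000: "C"},
    {100: "A"},  # too short to reach the second site, so not informative
]
pairs = phase_two_sites(reads, 100, 5000)
for alleles, count in pairs.items():
    print(alleles, count)
```

In this example the alleles A and T are always seen together, as are G and C, which suggests two haplotypes: A with T on one chromosome copy and G with C on the other.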

Limitations

Currently, long-read sequencing is not as widely used in clinical applications as short-read sequencing, for reasons laid out below.

The accuracy per read is generally lower than in short-read sequencing.
The cost per base is higher than in short-read sequencing.
Sample throughput is lower than in short-read sequencing.
It is not as established in clinical labs, meaning that training and new workflows are required.
Bioinformatics approaches are less mature than for short-read sequencing.
Input DNA should ideally be in long fragments (high molecular weight DNA), which can be a challenge for standard automated laboratory DNA isolation methods.

Key messages

In long-read sequencing, long DNA and RNA molecules do not have to be broken into short fragments for sequencing and then pieced back together during data analysis.
Long-read sequencing has higher accuracy than short-read sequencing for detecting some types of variant, such as structural variants and variants in highly repetitive regions.
Long-read sequencing is not yet widely used in clinical practice.

Resources

For clinicians

Nature Portfolio: Filling in the gaps telomere to telomere
Oxford Nanopore Technologies: How nanopore sequencing works
PacBio: PacBio sequencing: How it works (video, 1 minute 29 seconds)
PacBio: Sequencing 101: From DNA to discovery — the steps of SMRT sequencing
PHG Foundation: Developments in long-read sequencing

References

Aganezov S, Yan SM, Soto DC and others. ‘A complete reference genome improves analysis of human genetic variation’. Science 2022: volume 376, issue 6588. DOI: 10.1126/science.abl3533
Altemose N, Logsdon GA, Bzikadze AV and others. ‘Complete genomic and epigenetic maps of human centromeres’. Science 2022: volume 376, issue 6588. DOI: 10.1126/science.abl4178
Gershman A, Sauria MEG, Guitart X and others. ‘Epigenetic patterns in a complete human genome’. Science 2022: volume 376, issue 6588. DOI: 10.1126/science.abj5089
Hoyt SJ, Storer JM, Hartley GA and others. ‘From telomere to telomere: The transcriptional and epigenetic state of human repeat elements’. Science 2022: volume 376, issue 6588. DOI: 10.1126/science.abk3112
Nurk S, Koren S, Rhie A and others. ‘The complete sequence of a human genome’. Science 2022: volume 376, issue 6588, pages 44–53. DOI: 10.1126/science.abj6987
Vollger MR, Guitart X, Dishuck PC and others. ‘Segmental duplications and their variation in a complete human genome’. Science 2022: volume 376, issue 6588. DOI: 10.1126/science.abj6965