The base-by-base sequencing of specific regions of the chromosome or of whole genomes would have been impossible on a large scale without the improvements made in DNA sequence determination techniques. Thanks primarily to Sanger's method of enzymatic sequencing (1977), the discovery of the polymerase chain reaction or PCR (1986), and the development of automated sequencing in the early 1990s, productivity has risen from 1 base sequenced over the course of a year's work by a single operator (1 base/year/man) in 1965 to 1 billion bases/year/man in 2000 and up to 10,000 billion bases/year/man in 2021. The actual sequence of the four nitrogenous bases along the double helix constitutes the physical map with the highest degree of resolution and has been publicly available in electronic form since April 14, 2003, for almost all of the euchromatic regions of human chromosomes.
The ultimate goal of any systematic mapping project is to determine the complete nucleotide sequence of the DNA molecule that makes up the chromosome. Due to technical limitations inherent in the enzymatic sequencing method employed up to the late 2000s, a continuous sequence no longer than 500 to 1,000 bases (typically 700) can be obtained in a single experiment. It is, therefore, necessary to proceed by sequencing small stretches of DNA whose respective sequences are finally assembled into a continuous template, termed a "contig." The process works back toward a single final "contig," represented by the entire chromosome, either by exploiting information on the relative position of isolated DNA fragments or by "de novo" assembly of the fragment sequences.
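The idea of merging short overlapping reads into a contig can be illustrated with a minimal sketch. It assumes error-free reads taken from the same strand and uses a simple greedy strategy (always merge the pair with the longest suffix-prefix overlap). The function names `overlap` and `greedy_assemble` and the `min_len` threshold are hypothetical choices for illustration; real assemblers also handle sequencing errors, repeats and both strand orientations.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 when the best overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len: int = 3):
    """Repeatedly merge the two sequences with the longest overlap.

    Stops when no pair overlaps by at least `min_len` bases and returns
    the remaining contigs (a single contig if the reads tile the region).
    """
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while True:
        best_len, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_len:
                        best_len, best_i, best_j = k, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)] + [merged]


# Three toy reads that tile a 16-base region with 4-base overlaps.
reads = ["GGATTCCA", "ATGCGTAC", "GTACGGAT"]
print(greedy_assemble(reads))
```

When the reads do not overlap, the function returns several contigs instead of one; closing such gaps is where positional information on the fragments becomes useful.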
Nucleotide sequencing was performed in the 1960s and 1970s by traditional biochemical methods, which involved detaching (and analyzing) one nucleotide after another from a nucleic acid strand. These methods were very time-consuming and resource-intensive: Robert Holley's group took a year to identify the sequence of the 77 nucleotides that make up the yeast alanine tRNA (Holley et al., 1965).
A fundamental breakthrough occurred in 1977 when two new DNA sequencing methods were described: the chemical method, proposed by Maxam and Gilbert (base-specific chemical cleavage method), and the enzymatic method, devised by Frederick Sanger and also known as the "chain termination method" or "dideoxy method." Sanger's method, for which he received the Nobel Prize in Chemistry in 1980, took over because of its greater simplicity of execution and productivity and formed the basis of commonly used DNA sequencing methods to this day.
The enzyme that makes Sanger's method possible is DNA polymerase, which, in the presence of the four deoxynucleotide monomers (dATP, dGTP, dCTP and dTTP), is capable of synthesizing a complementary copy of a single-stranded DNA molecule, provided there is a short initial double-stranded region. In this region, the second strand, paired with the first strand by base complementarity, serves as a primer for the extension of the new strand. In the laboratory, synthetic oligonucleotides, i.e., small single-stranded DNA chains of about 20 nucleotides whose sequence is complementary to that of the point from which sequencing is to be initiated, are used as primers. The primer anneals to the template and provides a free 3′-OH end, from which DNA polymerase successively adds nucleotides complementary to those on the strand to be sequenced, producing a new strand by a polymerization process that proceeds in the 5′→3′ direction. The sequence to which the primer anneals must be known; in the case of fragments cloned within a vector, the region of the vector bordering the insert of unknown sequence can be used for this purpose.
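The geometry of primer annealing and 5′→3′ extension can be made concrete with a short sketch. It assumes that both the template and the primer are written 5′→3′, that the primer matches its site perfectly, and that extension runs to the end of the template. The names `reverse_complement` and `extend_primer` and the toy sequences are hypothetical and purely illustrative.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complementary strand, also written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def extend_primer(template: str, primer: str) -> str:
    """Return the newly synthesized strand (primer plus extension), 5'->3'.

    The primer pairs antiparallel to its site on the template, so its free
    3'-OH end faces the part of the template that lies 5' of the site.
    The polymerase then copies that part, adding complementary bases.
    """
    site = reverse_complement(primer)      # primer binding site on the template
    pos = template.find(site)
    if pos == -1:
        raise ValueError("primer does not anneal to this template")
    extension = reverse_complement(template[:pos])
    return primer + extension


template = "ATGGCCATTGTCAGC"   # strand to be sequenced, 5'->3'
primer = "GCTGA"               # complementary to the 3' end of the template
print(extend_primer(template, primer))
```

Because this toy primer anneals at the very 3′ end of the template, the product is simply the reverse complement of the whole template.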
Although the method used today still rests on the principle originally described by Sanger, continuous technical improvements since the 1990s have significantly increased its productivity (reviewed in Ciccodicola and D'Urso, 1998). In particular, five improvements allowed a large increase in the throughput of the original method, in which a different reaction was set up for each nucleotide.
1) Cycle Sequencing. The possibility of increasing the amount of terminated chains through repeated cycles of in vitro DNA replication using a DNA polymerase has made possible the direct sequencing of DNA without in vivo amplification in vector-transfected hosts. In particular, cycle sequencing is the repetition of the steps of denaturation, primer annealing and extension, in a way similar to PCR. However, in this case, a single primer is used, and a single strand is obtained at each cycle, so that the amplification of the product is linear rather than exponential. For instance, after 20 cycles, we will obtain 20 times the amount of chains compared to a single run of polymerization, and not 2^20 (about one million) times, because the products of each cycle are not a substrate for the next polymerization reaction, which restarts from the original template bound by the single type of primer available. The final result is a larger quantity of terminated molecules.
2) Better incorporation of ddNTPs. The Sanger reaction itself has been improved by using polymerases specifically designed to incorporate ddNTPs with high efficiency. It should be noted that the incorporation of unusual forms of dNTPs, such as ddNTPs, is not a natural function of DNA polymerases. Some mutated DNA polymerases obtained through genetic engineering incorporate ddNTPs with higher efficiency and are ideal for setting up Sanger reactions. For example, Taq DNA polymerases in which the phenylalanine at position 667 is replaced by a tyrosine (Taq F667Y, also known as "FS Taq Polymerase") can incorporate ddNTPs much more efficiently than the wild-type Taq DNA polymerase.
3) Replacing radioactive labelling with fluorescent labelling. The increased amount of interrupted DNA chains (point 1 above) made it possible to detect the products of the Sanger reactions by attaching a base-specific fluorescent dye (or fluorochrome) to them. Fluorochromes are compounds that, when suitably excited, emit visible light of a specific wavelength and thus of a particular colour. For example, the dideoxynucleotide carrying base A could be labelled with a fluorochrome that emits green light when excited by a laser beam, while different fluorochromes label the dideoxynucleotides carrying bases G, C, and T, so that the light emitted is yellow, blue and red, respectively. The use of fluorochromes thus allows the interrupted chains for all four nucleotides to be run in a single lane.
4) Capillary gel and automation. A substantial technological advance has been the automation of the laboratory procedure, thanks to the spread of automated DNA sequencing machines. These machines are based on a capillary tube that is filled with small amounts of a specially formulated gel. A robotic system then loads the sequencing reaction products into one end of the capillary, which is then subjected to an electric field to separate the molecules. A laser positioned at the opposite end of the capillary induces fluorochrome light emission as DNA molecules of different lengths finish their run, and a detector system identifies and records which dye they bear.
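Two of the points above lend themselves to a small sketch: the linear versus exponential yield of point 1, and the read-out of dye colours in order of fragment length of points 3 and 4. The code below is an idealised illustration, not a model of a real instrument. It assumes that exactly one terminated chain exists for every position after the primer, that separation by length is perfect, and it borrows the example colour scheme from point 3 (A green, G yellow, C blue, T red). All function names are hypothetical.

```python
DYE_COLOUR = {"A": "green", "G": "yellow", "C": "blue", "T": "red"}
COLOUR_TO_BASE = {colour: base for base, colour in DYE_COLOUR.items()}


def cycle_sequencing_yield(cycles: int) -> int:
    """Single primer: each cycle copies only the original template (linear)."""
    return cycles


def pcr_yield(cycles: int) -> int:
    """Two primers: products are templates for later cycles (exponential)."""
    return 2 ** cycles


def terminated_fragments(new_strand: str, primer_len: int):
    """One (length, dye colour) pair per chain ended by a labelled ddNTP."""
    return [(n, DYE_COLOUR[new_strand[n - 1]])
            for n in range(primer_len + 1, len(new_strand) + 1)]


def read_capillary(fragments) -> str:
    """Shorter chains reach the detector first; read colours in that order."""
    return "".join(COLOUR_TO_BASE[colour] for _, colour in sorted(fragments))


print(cycle_sequencing_yield(20), pcr_yield(20))

new_strand = "GCTGACAATGGCCAT"          # toy product: 5-base primer + 10 bases
fragments = terminated_fragments(new_strand, primer_len=5)
fragments.reverse()                      # order in the tube is irrelevant
print(read_capillary(fragments))
```

The second function pair shows why a single lane suffices: the length of each chain gives the position of the base, and the colour of its terminal dye gives its identity.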
Parallelization may further increase the throughput of the technique: with a 384-capillary instrument (96-capillary models also exist), if each run yields a sequence of 700 nucleotides and each capillary performs three runs a day, about 800,000 bases per day can be sequenced. Even in the ideal case of uninterrupted use of such a sequencer, it would still take about ten years to complete the sequencing of the approximately 3 billion base pairs of a single haploid human genome. This time can, of course, be reduced if one has many of these expensive machines operating in parallel, as in the case of the initial map of the human genome in 2001.
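The arithmetic behind these figures can be checked with a few lines of code. This is a back-of-the-envelope sketch that uses only the numbers quoted above (700 bases per run, three runs per capillary per day, a 3-billion-base genome) and assumes that every base is read exactly once, with no redundancy, failed runs or downtime. The function names are hypothetical.

```python
def bases_per_day(capillaries: int, read_length: int = 700,
                  runs_per_day: int = 3) -> int:
    """Daily output of one instrument under the idealised assumptions."""
    return capillaries * read_length * runs_per_day


def years_for_genome(capillaries: int,
                     genome_size: int = 3_000_000_000) -> float:
    """Years of uninterrupted operation to read `genome_size` bases once."""
    return genome_size / bases_per_day(capillaries) / 365


for capillaries in (96, 384):
    print(capillaries,
          bases_per_day(capillaries),
          round(years_for_genome(capillaries), 1))
```

With 384 capillaries the daily output is 806,400 bases, which gives the figure of about ten years; a 96-capillary instrument would need roughly four times as long.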
Several groups of researchers then worked on developing techniques that do not require enzymes and are based on innovative principles. Their success since 2008 ("Next Generation Sequencing", NGS) led to a significant reduction in the time required to obtain the results of an experiment, or test, based on DNA sequencing, with important practical implications for the development of genomic medicine.