perspective

A large genome center’s improvements to
the Illumina sequencing system
Michael A Quail, Iwanka Kozarewa, Frances Smith, Aylwyn Scally, Philip J Stephens,
Richard Durbin, Harold Swerdlow & Daniel J Turner
The Wellcome Trust Sanger Institute is one of the
world’s largest genome centers, and a substantial
amount of our sequencing is performed with
‘next-generation’ massively parallel sequencing
technologies: in June 2008 the quantity of purity-filtered sequence data generated by our Genome
Analyzer (Illumina) platforms reached 1 terabase,
and our average weekly Illumina production output
is currently 64 gigabases. Here we describe a set of
improvements we have made to the standard Illumina
protocols to make the library preparation more reliable
in a high-throughput environment, to reduce bias,
tighten insert size distribution and reliably obtain
high yields of data.

Next-generation DNA sequencers, such as the 454-FLX
(Roche), SOLiD (Applied Biosystems) and Genome
Analyzer (Illumina) have transformed the landscape of
genetics through their ability to produce hundreds of
megabases of sequence information in a single run. This
has enabled us to design genome-wide and ultra-deep
sequencing projects that, because of their enormity, would
not otherwise have been possible (for reviews see refs. 1,
2 and for an evaluation of the performance of these three
platforms see ref. 3).
At the Wellcome Trust Sanger Institute we currently
have all three of these sequencing platforms, though the
Genome Analyzer is the platform we have invested most
heavily in: we have 28 machines on site, all capable of
generating paired-end data. The Illumina data analysis
pipeline performs ‘purity filtering’ of these data to eliminate sequence data from clusters that appear to be mixed
as a consequence of their proximity on the flowcell. We
typically generate 4–5 gigabases (Gb) of filtered sequence
data, with an error rate of <0.9% per seven-day, 36-cycle
paired-end run, making us one of the world’s largest and
most productive users of Illumina sequencers.
Sequencing library preparation involves the production of a random collection of adapter-modified DNA
fragments, with a specific range of fragment sizes, which
are ready to be sequenced. We have found the standard
Illumina sequencing library preparation protocols (Fig. 1)
to be suboptimal in several respects, and we enhanced our
output by developing and implementing many modifications and improvements to these protocols, all with the
aim of obtaining the maximum number of high-quality
sequence reads per run from the lowest mass of starting
DNA, in a robust and reproducible way. The modifications and improvements we describe here can be adopted
en masse as an alternative library preparation pipeline.
However, because some steps are additional rather than
alternative, we tend to select different modifications for
different sequencing projects, depending on the specific
requirements of that project (Supplementary Table 1
online). Here we have attempted to describe each modification in the order in which it would fit in to the standard
library preparation pipeline.
Fragmentation
The first stage in a standard genomic DNA library preparation for the Genome Analyzer is DNA fragmentation
by nebulization (in 30–60% glycerol at 30–35 p.s.i.). This
generates fragments with a typical size range of 0–1200
base pairs (bp) and a peak around 500–600 bp. Nebulization
is a fairly reproducible technique, is sequence-independent, and is rapid and inexpensive (ref. 4). However, the wide
size distribution of generated fragments is uneconomical:
by mass, the 200 ± 20-bp fragments represent only ~10%
of the total DNA after nebulization. Moreover, approximately half of the DNA vaporizes during nebulization,
meaning that only 5% of the original DNA is used for
subsequent library generation. Even under much more
extreme nebulization conditions (for example, 90 p.s.i.
for 18 min) it is not possible to ‘move’ the fragment-size peak below around 400 bp, and doing so still does
not improve the yield at 200 bp (ref. 4 and unpublished
observations). Thus we have evaluated alternative methods of sample fragmentation.
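The 5% figure above follows from multiplying the two losses quoted for nebulization. A minimal sketch of that arithmetic, using only the percentages given in the text; the function name and the 5 µg starting mass are hypothetical and chosen for illustration:

```python
def usable_fraction(in_window: float, recovered: float) -> float:
    """Fraction of the input DNA mass that ends up in the target size window."""
    return in_window * recovered


# Figures quoted in the text for nebulization.
IN_WINDOW = 0.10  # ~10% of nebulized DNA by mass is 200 +/- 20 bp
RECOVERED = 0.50  # roughly half of the DNA is lost as vapor

fraction = usable_fraction(IN_WINDOW, RECOVERED)
input_ug = 5.0  # hypothetical starting mass, for illustration only

print(f"usable fraction of input DNA: {fraction:.0%}")
print(f"usable mass from {input_ug} ug input: {fraction * input_ug:.2f} ug")
```

The same function can be reused for any fragmentation method once its in-window fraction and recovery are known.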

Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire, CB10 1SA, UK. Correspondence should be
addressed to D.J.T. (djt@sanger.ac.uk).
PUBLISHED ONLINE 25 November 2008; DOI:10.1038/NMETH.1270

nature methods | VOL.5 NO.12 | DECEMBER 2008 | 1005

Figure 1 | Illumina sequencing workflow. Stages in the library preparation. Steps accompanied by
numbers are those for which we suggest alternatives to the standard Illumina protocols. Numbers
correspond to those given in Supplementary Protocols 1–13 online.
[Workflow diagram. Stage labels: genomic DNA; fragmentation; fragmented DNA sample (0–1200 bp);
end repair; blunt-ended, 5′-phosphorylated fragments; A-tailing; 3′ A-tailed fragments; ligation;
adapter-ligated fragments; size selection; size-selected fragment library; PCR; PCR-amplified
library; quantification; denaturation; single-stranded DNA library; cycles of cluster
amplification; clonal DNA clusters on flowcell surface; linearization, blocking and
hybridization; single-stranded clusters ready for sequencing; flowcell transfer to Genome
Analyzer; incorporation of fluorescent reversible terminator nucleotides; imaging; cleavage of
fluorophores and blocking groups. Group labels: library preparation, cluster growth, sequencing
by synthesis.]

Adaptive focused acoustics. We now routinely fragment DNA samples using adaptive focused
acoustics technology in a 24-well format (AFA; Covaris). In this process, acoustic energy is
controllably focused into the aqueous DNA sample by a dish-shaped transducer, resulting in
cavitation events within the sample. The collapse of bubbles in the suspension creates multiple,
intense, localized jets of water, which disrupt the DNA molecules in a reproducible and
predictable way.
After disruption, 200-bp fragments comprise 17% of the total fractionated DNA by mass, but in
contrast to nebulization, very little DNA is lost during the fragmentation process, generating a
four- to fivefold higher yield of the intended fragment size range (Supplementary Protocol 1
online and Fig. 2). Additionally, because of its high-throughput capability, AFA has enabled
highly multiplexed sequencing using indexing tags, where each sample needs to be processed
separately until after PCR amplification (Fig. 1). Also, because the size distribution of DNA
fragmented by AFA is narrow, for some applications, such as array enrichment of targeted loci
(refs. 5, 6), we are able to omit the gel electrophoresis–based size selection step altogether
from the library preparation, decreasing the workload and increasing yields further.

Figure 2 | Sample fragmentation. Comparison of fragmentation by
nebulization with AFA technology. We fragmented 4.5 µg of human genomic
DNA by nebulization or AFA, purified the samples using a spin column,
eluted the DNA in 30 µl of 10 mM Tris pH 8.5, and ran 1 µl of each eluate on
an Agilent Bioanalyzer 2100 DNA 100 chip. For a 200-bp (±20 bp) library,
the yield produced by AFA was four- to fivefold greater than that produced
by nebulization.
[Plot: relative fluorescence units versus size (bp) for nebulization, AFA and ladder.]

A-tailing, ligation and size selection
After close scrutiny of paired-end reads obtained from the Genome Analyzer, that is, those in
which each cluster was sequenced in both forward and reverse directions, we discovered a number
of artifacts that could be attributed to the standard library preparation protocol: (i) bias in
the base composition of sequences; (ii) high frequency of chimeric sequences, produced when two
template strands are ligated during the adapter ligation step; and (iii) imperfect distribution
of insert sizes. These have all been overcome by the use of several protocol modifications,
described here.

Paired-end oligonucleotides. We no longer use the Illumina single-end adapters or PCR primers
because paired-end oligos generate sequencing libraries that are compatible with both single and
paired-end flowcells. The adapters themselves are modified to confer protection from digestion
at the 3′ thymine (T) overhang. Though we do not have the details of the modification used by
Illumina, we have obtained comparable results using our own adapters and PCR primers modified
with a phosphorothioate between the two bases at the 3′ end (Supplementary Protocol 2 online).

Gel extraction. During the size selection step of the standard library prep protocol, a gel
slice is selected and the DNA extracted. We identified that melting this gel slice by heating to
50 °C in chaotropic buffer decreased the representation of A+T-rich sequences, possibly
reflecting a higher affinity of spin columns for double-stranded DNA, as strands with a high A+T
content will be most likely to become denatured during this step and may not reanneal. To
improve the representation of these A+T-rich sequences, we modified the gel-extraction protocol,
melting agarose gel slices in the supplied buffer at room temperature (18–22 °C). This reduces
G+C bias considerably (Supplementary Protocol 3 online and Fig. 3a,b).

Double size selection. Partially complementary adapters, which essentially consist of the
sequences to which the sequence primers hybridize during the sequencing reaction, are ligated
onto the A-tailed fragments (ref. 7) via a T overhang (Fig. 1). Their structure ensures that each
template strand receives different sequences at the opposite ends (ref. 8) and works in much the
same way as a vectorette (ref. 9).
Inefficient end-repair or A-tailing reactions will result in a lower concentration of template
to which adapters can be successfully ligated, and so the relative concentration of adapters is
increased, which will promote the formation of adapter dimers. If these dimers are not removed,
they will ultimately be sequenced along with the intended template, wasting the capacity of the
flowcell. Additionally, inefficient A-tailing will result in a high proportion of blunt-ended
template molecules, which can self-ligate, generating chimeric sequences.
It is likely that the efficiency of the A-tailing step will be improved by the use of
alternative polymerases and higher concentrations of magnesium ions. Additionally, the
efficiency of the ligation step appears to be improved by the use of ultrapure ligases, such as
those from Enzymatics, which are virtually free of the contaminating exonuclease activity
generally found in standard commercial preparations of ligases. Using this enzyme we have
achieved a 20–30% increase in yield of successfully ligated fragments (as determined by
quantitative PCR (qPCR); Supplementary Protocol 4 online), presumably because the reduced
exonuclease activity regenerates fewer blunt-ended fragments after A-tailing.
However, although the steps described above may reduce the formation of blunt-ended ligation
concatemers, enzymatic reactions are rarely 100% efficient, and thus a small proportion of
template strands will still be chimeric. In many applications, a low frequency of chimeric
sequences will present no problem and can simply be removed informatically. In other
applications, such as detection of infrequent de novo recombinant molecules, chimeric sequences
will generate false positives, so their frequency needs to be reduced to minimize the amount of
subsequent confirmatory work.
Chimeric templates will be longer than the singletons, which provides a way of preventing them
from contaminating the DNA fraction: for applications in which a low frequency of chimeric
templates is required, we perform an additional size selection after shearing but before
ligation (Supplementary Protocol 5 online). This results in a narrow size range being available
for ligation. Any blunt-ended concatemers are appreciably longer than the singletons, and we
remove these in the post-ligation size-selection step. This additional size selection reduces
the incidence of chimeras to 0.02%, compared to up to 5% with the standard library preparation
protocol, and we have found this step to have the added benefit of reducing the shoulder of
small insert sizes, giving a tighter insert-size distribution of the desired fraction
(Fig. 3c–e), which leads to clusters with more uniform diameter.

Figure 3 | A-tailing, ligation and size selection. (a,b) G+C plots before (a) and after (b)
optimization of gel extraction. The figures show the total area in which reads with a particular
G+C content are distributed, with the mean and s.d. The greater width of the shaded area in plot
a indicates a wider dispersion of coverage for all values of G+C content for which sequences
were obtained. (c–e) Agilent Bioanalyzer 2100 traces for three libraries, a 60-bp insert library
with optimized PCR (c), the same 60-bp library with excess DNA in PCR (d) and a 200-bp insert
library, showing shoulder of small fragments (e). (f,g) Insert size distribution from sequenced
human DNA using the standard (f) and modified (g) paired-end library preparation protocols.
[Plots: (a,b) mapped depth (bin size, 500 bp) and distribution of reads against G+C content (%)
and percentile of unique sequence ordered by G+C content, with mean read depth and s.d.;
(c–e) relative fluorescence units versus size (bp); (f,g) number of reads (millions) versus
size (bp).]

Paired-end size selection. Using standard protocols, we found single-end library preparations,
using single- or paired-end adapters, to be considerably more robust at the step of size
selection by electrophoresis than their paired-end counterparts. With single-end preps, we
excise a band of 50 bp or larger, which generally yields more than
enough DNA to give a high yield of PCR products. However, for the
paired-end protocol, to generate as narrow an insert size range as
possible, a scalpel is inserted into the gel at the desired position, the
blade is washed with Tris buffer, and this buffer acts as the template
for the PCR amplification. We found this practice to yield enough
DNA to give successful amplification only in approximately 30–40%
of attempts. To overcome this, we now excise a 2-mm-wide gel slice
containing DNA of the desired size and extract that following the
Illumina protocol, though with no heating during the melting step,
as discussed above (Supplementary Protocol 3). In our hands this
typically yields 10–20 times more DNA than the standard protocol,
has an almost 100% success rate and generates an acceptably narrow
size distribution of paired-end reads (Fig. 3f,g).

PCR
As well as increasing robustness, extracting more DNA from gel slices enables the DNA to be
quantified more accurately before PCR amplification. The PCR step introduces into the
adapter-ligated template molecules the oligonucleotide sequences required for hybridization to
the flowcell surface.

Figure 4 | PCR. (a) An ~200-bp fragment library was prepared, and 10 ng
was amplified for 18 cycles using standard Illumina PCR conditions or
optimized PCR conditions. (b) A comparison of methods of PCR amplicon
purification. We prepared a paired-end library with phiX DNA using conditions
that would promote the formation of adapter dimers and unextended PCR
primers. After PCR, we divided the library into two: half was purified following
the standard Illumina protocol, through a QiaQuick PCR cleanup column (left),
whereas the other was purified using SPRI technology (right).

[Figure 4 panel labels. (a) Traces for optimal and standard PCR conditions; x-axis, size (bp).
(b) Gel lanes: column cleanup and ladder; band size (bp).]

PCR yield. By the use of alternative high-fidelity polymerases in
a more optimized reaction, we have found it possible to increase
the yield of the enrichment PCR five- to tenfold (Supplementary
Protocol 7 online and Fig. 4a), which allows fewer cycles of amplification to be performed.
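To see roughly how many cycles a five- to tenfold yield gain could be traded for, the gain can be converted into an equivalent number of amplification cycles. This is a back-of-the-envelope sketch, not the authors' calculation: it assumes the reaction is still in its exponential phase and that each cycle multiplies the product by (1 + efficiency), with perfect doubling as the default. The function name is hypothetical.

```python
import math


def cycles_saved(fold_gain: float, efficiency: float = 1.0) -> float:
    """Cycles equivalent to a given fold gain in yield.

    Assumes exponential amplification in which each cycle multiplies the
    product by (1 + efficiency); efficiency=1.0 means perfect doubling.
    """
    if fold_gain <= 0:
        raise ValueError("fold_gain must be positive")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return math.log(fold_gain) / math.log(1.0 + efficiency)


for fold in (5, 10):
    print(f"{fold}-fold gain is worth about {cycles_saved(fold):.1f} cycles")
```

Under these assumptions the quoted gain corresponds to two to three whole cycles; a lower per-cycle efficiency would make the same gain worth more cycles.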

[Figure 4b labels: SPRI cleanup lane; ladder bands from 200 to 1,400 bp; positions of adapter
dimers and PCR primers marked.]

PCR cleanup. Surplus PCR primers may interfere with quantification
and will compete with the amplicon for hybridization to the flowcell
surface. Consequently, it is necessary to remove surplus oligos after
amplification. We have found that solid-phase reversible immobilization (SPRI) technology (ref. 10) can be used to remove a higher proportion
of primers and adapter dimers than spin columns, while producing
a comparable yield of amplicon DNA, and allows elution in a wider
variety of buffers (Supplementary Protocol 8 online and Fig. 4b).

Template quantity. By using optimized quantities of template in the
PCR, we can ensure a clean library, free of adapter-dimer or single-stranded DNA. We routinely analyze our sequencing libraries after
PCR by microfluidic capillary electrophoresis and have noticed that
the quality of the library obtained decreases with increasing concentration of template DNA: too much template DNA often results in
the accumulation of an apparently higher-molecular-weight peak
(Fig. 3d), which may represent a single-stranded template product
that accumulates as primers become depleted. Conversely, the lower
the mass of DNA used in the PCR, the fewer the number of template
strands for the same fragment size, and the greater the incidence of
PCR product duplicates in the resulting sequences: we have observed
libraries from which as many as 60% of sequences were PCR product
duplicates. Thus it is essential to choose the appropriate set of conditions for each PCR (Supplementary Protocol 6 online).
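The link between low template mass and PCR duplicates can be illustrated with a simple sampling model. The sketch below is an assumption-laden illustration rather than the authors' analysis: it supposes that every distinct template molecule is amplified equally and that reads are drawn uniformly at random from the amplified pool, so the expected number of distinct templates seen is N(1 − e^(−R/N)) for N templates and R reads. The function name and the example numbers are hypothetical.

```python
import math


def expected_duplicate_fraction(reads: float, templates: float) -> float:
    """Expected fraction of reads that duplicate an already-seen template.

    Model assumption: reads sample the amplified library uniformly, so the
    expected number of distinct templates observed is
    templates * (1 - exp(-reads / templates)).
    """
    if reads <= 0 or templates <= 0:
        raise ValueError("reads and templates must be positive")
    distinct = templates * -math.expm1(-reads / templates)
    return 1.0 - distinct / reads


# Hypothetical example: the same read count drawn from libraries of
# decreasing complexity (fewer distinct template molecules in the PCR).
reads = 10_000_000
for templates in (1e9, 1e8, 1e7, 4.5e6):
    frac = expected_duplicate_fraction(reads, templates)
    print(f"{templates:>13,.0f} templates: {frac:6.1%} duplicates")
```

In this model the duplicate fraction depends only on the ratio of reads to distinct templates, which is why reducing the input mass at a fixed sequencing depth raises it.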

[Figure 5 plot labels: series for 500-bp and 200-bp fragments; axis labels "Unfiltered cluster
number per tile" and "Number of clusters after purity filtering".]

Sequencing without PCR. We have found that it is unnecessary to retain the PCR step to enrich
for properly ligated fragments, so long as only those fragments with an adapter at either end
can be quantified, as only they will yield clusters that can be sequenced. This can be done by
quantitative PCR, discussed below. Thus we can eliminate the PCR step entirely, simply by
ligating on appropriate adapters after A-tailing (Supplementary Protocol 9 online). For this
purpose we use high-performance liquid chromatography–purified, partially noncomplementary
oligos with a phosphorothioate linkage between the two bases at the 3′ end of one strand. From a
starting amount of 5 µg of DNA and fractionation by AFA, we can obtain sufficient paired-end DNA
for >400 lanes of high-density clusters, or 100 lanes if nebulization is used to fragment the
DNA. The obvious benefits of this are that PCR duplicates are absent: the observed duplication
rate in mapped gorilla DNA sequences prepared without PCR was approximately 0.5%. This rate of
duplication is caused by noise in the cluster detection and sequence analysis software.

Direct sequencing of short amplicons. To avoid unnecessary PCR amplification steps, which would
potentially exacerbate biases, we can perform extremely deep sequencing of short amplicons using
locus-specific primers that possess tails that can hybridize to the oligos tethered to the
flowcell surface. The tailless forward and reverse oligos are then used as primers in the
sequencing steps (Supplementary Protocol 10 online).

Quantification
At close to neutral pH, the concentration of DNA going onto the flowcell governs the number of
clusters produced. Thus, for different fragment sizes undergoing a given number of cycles of
cluster amplification, there is an optimal concentration range of DNA that will yield clusters
in the optimal density range, enabling the maximum amount of data to be obtained. For fragments
with a mean insert size of 500 bp or lower, we aim for 40,000–44,000 clusters per imaged area
(tile) on the Genome Analyzer model 1, giving an average of 20,000–25,000 filtered clusters per
tile, equating to 2.0–2.4 Gb per single-end run (150,000–170,000 clusters per tile for the
Genome Analyzer model 2; Fig. 5a).
Overestimation of DNA concentration results in too few clusters, which may make the flowcell
uneconomical to sequence. Underestimation results in too high a cluster density, which can
greatly reduce the amount of data obtained, owing to cluster overlap. Quantification of DNA
before sequencing is thus one of the key factors in the process.

Electrophoresis. We found the accuracy of spectrophotometry to be inadequate for quantification:
cluster density based on this method tended to be inconsistent, but typically five- to tenfold
lower than anticipated, presumably because spectrophotometry analysis measures not only the
intended amplicon but also adapter dimers and unextended primers, with no way of distinguishing
between them, and also struggles to measure low DNA concentrations accurately. By quantifying
libraries electrophoretically, with an Agilent Bioanalyzer, we have been able to achieve a much
more consistent cluster density. Additionally, because electrophoresis can be used to
distinguish between DNA species on the basis of size, it provides a way to check the quality of
the library preparation. However, for a small proportion of libraries, we obtained far higher
cluster densities, and consequently far less useful data, than anticipated. We assume that this
is a result of single-stranded DNA generated in the PCR that cannot be easily quantified when
mixed with double-stranded DNA.

Quantitative PCR. This led us to develop a qPCR quantification assay (for discussion see
ref. 11) because such an approach should be capable of detecting and quantifying all amplifiable
molecules. We designed amplification primers and a dual-labeled probe to target the Illumina
paired-end adapter sequences (Supplementary Protocol 11 online). We quantify unknown libraries
against standard libraries that have been sequenced previously, and for which we know the
accurate cluster number, and how this relates to the Agilent concentration of that library.
Because amplification in the qPCR with these primers is not perfectly efficient, we use 3
dilutions of standard libraries (100, 10 and 1 pM), and dilute the unknown library to 10 pM,
based on the concentration as measured using an Agilent Bioanalyzer 2100. With this assay, we
take the Agilent-derived concentration values to be arbitrary, allowing us to dilute unknowns to
a concentration that lies within the 1–100 pM range, but Agilent values also provide a useful
double-check. We have found that cluster density can be predicted reliably in this way (Fig. 5b).
We have found that the ability to quantify DNA in the picomolar concentration range also opens
up the opportunity for sequencing much lower DNA concentrations than those permitted by the
standard protocol, such as unamplified array eluates from a sequence-capture experiment
(refs. 5, 6).

Figure 5 | Quantification. (a) Cluster throughput as a function of total clusters for 200- and
500-bp inserts. The 500-bp inserts underwent fewer cycles of cluster amplification (28, compared
to 35 for the 200-bp libraries), resulting in smaller clusters, and so a cluster density of
40–44k per tile (GA1) will produce the maximum yield from either insert size.
(b) Standardization of cluster density with qPCR quantification. Runs were grouped into 25-run
bins, and a boxplot was generated. After some initial problems with degradation of standards,
cluster number has leveled out at ~35,000–40,000 per tile.
[Plots: (a) x-axis, total number of clusters detected. (b) Boxplot by run number showing median
cluster number, upper and lower quartiles, 1.5 interquartile range, outliers, fluctuation in
median cluster number from bin to bin, and the point at which the qPCR assay was introduced.]

Denaturation
Being single-stranded, array eluates require no particular steps to denature the DNA (ref. 12)
before sequencing. However, for low concentrations (<1 nM) of double-stranded DNA it is more
problematic: denaturation by heating has the potential both to damage the DNA and to introduce
anti-(G+C) bias (ref. 13).

Modified hybridization buffers. For all denaturation we prefer the use of 0.1 M NaOH to heating,
though for subnanomolar libraries this requires an alternative hybridization buffer to be used
(Supplementary Protocol 12 online). We have found the addition of Tris to the standard Illumina
Hybridization Buffer to be beneficial to the robustness of the initial hybridization of DNA to
the flowcell for all libraries, because it can counteract pipetting errors during the
denaturation stage that would otherwise raise the pH to a level that would prevent efficient
hybridization (Fig. 6). Additionally, diluting the supplied 2 M NaOH and adding a greater volume
to the 20 µl denaturation reaction helps to reduce fluctuation in cluster number owing to
pipetting errors (Supplementary Protocol 12).

Figure 6 | Denaturation. (a) pH titration of hybridization buffers. Following
denaturation, the concentration of NaOH in DNA templates is 0.1 M
NaOH. Adding more than 8 µl of this denatured template to the 1 ml of
Hybridization Buffer (5× SSC, 0.1% Tween-20), before loading DNA onto the
flowcell, increases the pH to above 10. This prevents efficient hybridization,
and thus the cluster density falls. The addition of Tris-HCl pH 7.3 to the
supplied bottles of Hybridization Buffer dramatically increases buffering
capacity, making template hybridization more robust. (b) The addition
of 5 mM Tris-HCl pH 7.3 to Illumina Hybridization Buffer allows a greater
volume of denatured template to be added before high pH prevents effective
annealing of templates to the oligos on the flowcell surface. This increases
the robustness of cluster generation by counteracting pipetting errors in the
denaturation step.
[Plots: (a) pH versus volume of 0.1 M NaOH added (µl) for water, 5× SSC, and 5× SSC with 2, 10
or 50 mM Tris. (b) Clusters per tile versus volume of denatured template added (µl) for 5× SSC
and 5× SSC + 5 mM Tris.]

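The buffering effect of Tris described for the hybridization step can be approximated with textbook acid-base arithmetic. The sketch below is only a rough model and not a substitute for the measured titration in Fig. 6: it assumes a Tris pKa of about 8.07 (a value not given in the text), treats Tris as the only buffering species, and ignores the SSC components, Tween-20, ionic strength and temperature. The unbuffered case is modeled as NaOH diluted into pure water. All function names are hypothetical.

```python
import math

TRIS_PKA = 8.07  # assumed pKa of Tris near 25 degrees C; not stated in the text


def naoh_final_molar(volume_ul: float, stock_molar: float = 0.1,
                     buffer_ml: float = 1.0) -> float:
    """Final NaOH concentration (M) after adding volume_ul of stock to the buffer."""
    added_ml = volume_ul / 1000.0
    return stock_molar * added_ml / (buffer_ml + added_ml)


def unbuffered_ph(volume_ul: float) -> float:
    """pH if the NaOH were simply diluted into pure water (no buffering at all)."""
    return 14.0 + math.log10(naoh_final_molar(volume_ul))


def tris_ph_after_naoh(volume_ul: float, tris_mm: float = 5.0,
                       start_ph: float = 7.3, stock_molar: float = 0.1,
                       buffer_ml: float = 1.0) -> float:
    """Henderson-Hasselbalch estimate of pH with Tris as the only buffer."""
    tris_umol = tris_mm * buffer_ml          # mM x ml = micromoles of Tris
    ratio = 10 ** (start_ph - TRIS_PKA)      # [base]/[acid] at the starting pH
    acid = tris_umol / (1.0 + ratio)         # protonated Tris, micromoles
    base = tris_umol - acid
    naoh_umol = stock_molar * volume_ul      # M x microliters = micromoles
    if naoh_umol >= acid:
        raise ValueError("NaOH exceeds the protonated Tris; buffer exhausted")
    return TRIS_PKA + math.log10((base + naoh_umol) / (acid - naoh_umol))


for vol in (5, 8, 15):
    print(f"{vol:>2} ul of 0.1 M NaOH: unbuffered pH ~{unbuffered_ph(vol):.1f}, "
          f"with 5 mM Tris pH ~{tris_ph_after_naoh(vol):.2f}")
```

Under these simplifying assumptions, a few microlitres of 0.1 M NaOH push an unbuffered millilitre well above pH 10, whereas 5 mM Tris keeps the estimate below pH 8.5 over the same range, which is the qualitative behavior the text describes.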
Amplification quality control
After cluster amplification, DNA on the flowcell is double-stranded,
which allows clusters to be stained by an intercalating dye and to be
detected using a fluorescence microscope (Supplementary Protocol
13 online). This is a valuable quality control step that we use for all
flowcells before linearization and blocking to confirm that the cluster density is appropriate. We generally do not proceed beyond the
amplification stage with flowcells whose cluster density is too high or too low.
Conclusion
The Genome Analyzer is a powerful sequencing technology, yet still
relatively new, and consequently it has not yet reached its full sequencing potential. Here we have described modifications that allow for
more efficient library preparation and enable a stable workflow in a
production environment.
At the Sanger Institute, in addition to a sequencing research and
development team we have several teams who are responsible for
keeping the production instruments running. A library-making
group processes samples, and generates, quality-controls and quantifies libraries. A production group, working in shifts, prepares and
quality-controls flowcells by SybrGreen staining, prepares reagents for
sequencing and manages washing, priming and loading the instruments seven days per week. Informatics teams are responsible for
facilitating sample tracking, for handling the sequence data and for
performing pipeline analyses. All steps in the process are recorded
using custom-written lab-tracking and run-tracking database software. All Genome Analyzers are networked, and the generated image
data are continually uploaded to a large compute and disk-storage
cluster for image and base-calling analysis, alignment and assembly,
and other informatics tasks. We keep images for about 1 month on a
disk server, but we store the run quality control and other run details
in a database and deposit short-read sequences for permanent storage in a large repository. A team of project managers coordinate and
oversee individual sequencing projects.
We have recently upgraded all of our Genome Analyzers to the
model 2. The wider flowcells used by upgraded machines offer a 40%
greater imaging area, with the potential for increased read lengths
(>70 bases) of a higher quality (below 1% error in a phiX control lane
for 1–50 bases). Combined with improvements to the image analysis
software and a faster run time, both of which we are currently testing,
a conservative prediction is that by the end of 2008, our output will
reach 6–10 terabases of high-quality sequence per year, equivalent to
180 human genomes at 15-fold coverage, or approximately 200,000
bases per second.
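The equivalences quoted above can be checked with simple unit conversions. This sketch assumes a haploid human genome of about 3 Gb (a round figure not stated in the text) and a 365-day year; the function names are hypothetical.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # 31,536,000 seconds in a 365-day year


def bases_per_second(terabases_per_year: float) -> float:
    """Average sustained output in bases per second."""
    return terabases_per_year * 1e12 / SECONDS_PER_YEAR


def genomes_per_year(terabases_per_year: float, coverage: float = 15.0,
                     genome_gb: float = 3.0) -> float:
    """Number of genomes of size genome_gb sequenced at the given coverage."""
    return terabases_per_year * 1e12 / (coverage * genome_gb * 1e9)


for tb in (6, 10):
    print(f"{tb} Tb/year: {bases_per_second(tb):,.0f} bases/s, "
          f"{genomes_per_year(tb):.0f} genomes at 15-fold coverage")
```

With these assumptions, 180 genomes at 15-fold coverage corresponds to about 8.1 terabases, inside the predicted 6–10 terabase range, and roughly 200,000 bases per second sits near the lower end of that range.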
The improved workflow and high yield should maintain the
Genome Analyzer as our next-generation sequencing platform of
choice for the immediate future. How long this remains true depends
upon the performance of existing rival technologies: Roche’s 454,
ABI’s SOLiD, Helicos’ ‘True Single Molecule Sequencing’ and Dover
Systems’ Polonator, and those that are on the horizon, such as nanopore technologies, for example Oxford Nanopore Technologies, the
Harvard Nanopore Group, and Pacific Biosciences’ Single Molecule
Real Time technology, which promise to bring us closer to the eagerly
anticipated $1,000 genome.
Note: Supplementary information is available on the Nature Methods website.
ACKNOWLEDGMENTS
We thank all the staff at Illumina for their support, particularly T. Ost, M. Gibbs, J.
Smith, N. Gormley, V. Smith and K. Hall. We also thank C. Brown, A. Brown,
R. Pettett, T. Skelly, N. Whiteford, L. Mamanova, E. Sheridan and E. Huckle for helpful
discussions and assistance.
COMPETING INTERESTS STATEMENT
The authors declare competing financial interests: details accompany the full-text
HTML version of the paper at http://www.nature.com/naturemethods/.
Published online at http://www.nature.com/naturemethods/
Reprints and permissions information is available online at
http://npg.nature.com/reprintsandpermissions/
1. Bentley, D.R. Whole-genome re-sequencing. Curr. Opin. Genet. Dev. 16, 545–552 (2006).
2. Mardis, E.R. The impact of next-generation sequencing technology on genetics. Trends Genet. 24, 133–141 (2008).
3. Smith, D.R. et al. Rapid whole-genome mutational profiling using next-generation sequencing technologies. Genome Res. 18, 1638–1642 (2008).
4. Surzycki, S. DNA sequencing. in Basic Techniques in Molecular Biology 377–380 (Springer-Verlag, Berlin, 2000).
5. Albert, T.J. et al. Direct selection of human genomic loci by microarray hybridization. Nat. Methods 4, 903–905 (2007).
6. Hodges, E. et al. Genome-wide in situ exon capture for selective resequencing. Nat. Genet. 39, 1522–1527 (2007).
7. Sambrook, J., Fritsch, E. & Maniatis, T. Molecular Cloning: A Laboratory Manual, (Cold Spring Harbor Laboratory Press, Cold Spring Harbor, New York, 1989).
8. Smith, D. & Malek, J. Asymmetrical adapters and uses thereof. US patent 0172839 (2007).
9. Riley, J. et al. A novel, rapid method for the isolation of terminal sequences from yeast artificial chromosome (YAC) clones. Nucleic Acids Res. 18, 2887–2890 (1990).
10. Hawkins, T.L., O’Connor-Morin, T., Roy, A. & Santillan, C. DNA purification and isolation using a solid-phase. Nucleic Acids Res. 22, 4543–4544 (1994).
11. Meyer, M. et al. From micrograms to picograms: quantitative PCR reduces the material demands of high-throughput sequencing. Nucleic Acids Res. 36, e5 (2008).
12. Thomas, R. The denaturation of DNA. Gene 135, 77–79 (1993).
13. Mandel, M. & Marmur, J. Use of ultraviolet absorbance-temperature profile for determining the guanine plus cytosine content of DNA. Methods Enzymol. 12, 195–206 (1968).
