bcl2fastq2 Conversion v2.18
User Guide
For Research Use Only. Not for use in diagnostic procedures.

Introduction
Install bcl2fastq2 Conversion Software
BCL Conversion Input Files
Sample Sheet
Run BCL Conversion and Demultiplexing
BCL Conversion Output Files
Troubleshooting
Appendix: Installation Requirements
Revision History
Technical Assistance

ILLUMINA PROPRIETARY
Document # 15051736 v01
April 2016


This document and its contents are proprietary to Illumina, Inc. and its affiliates ("Illumina"), and are intended solely for the
contractual use of its customer in connection with the use of the product(s) described herein and for no other purpose. This
document and its contents shall not be used or distributed for any other purpose and/or otherwise communicated, disclosed,
or reproduced in any way whatsoever without the prior written consent of Illumina. Illumina does not convey any license
under its patent, trademark, copyright, or common-law rights nor similar rights of any third parties by this document.
The instructions in this document must be strictly and explicitly followed by qualified and properly trained personnel in order
to ensure the proper and safe use of the product(s) described herein. All of the contents of this document must be fully read
and understood prior to using such product(s).
FAILURE TO COMPLETELY READ AND EXPLICITLY FOLLOW ALL OF THE INSTRUCTIONS CONTAINED HEREIN
MAY RESULT IN DAMAGE TO THE PRODUCT(S), INJURY TO PERSONS, INCLUDING TO USERS OR OTHERS, AND
DAMAGE TO OTHER PROPERTY.
ILLUMINA DOES NOT ASSUME ANY LIABILITY ARISING OUT OF THE IMPROPER USE OF THE PRODUCT(S)
DESCRIBED HEREIN (INCLUDING PARTS THEREOF OR SOFTWARE).
© 2016 Illumina, Inc. All rights reserved.
Illumina, 24sure, BaseSpace, BeadArray, BlueFish, BlueFuse, BlueGnome, cBot, CSPro, CytoChip, DesignStudio,
Epicentre, ForenSeq, Genetic Energy, GenomeStudio, GoldenGate, HiScan, HiSeq, HiSeq X, Infinium, iScan, iSelect,
MiniSeq, MiSeq, MiSeqDx, MiSeq FGx, NeoPrep, NextBio, Nextera, NextSeq, Powered by Illumina, SureMDA,
TruGenome, TruSeq, TruSight, Understand Your Genome, UYG, VeraCode, verifi, VeriSeq, the pumpkin orange color,
and the streaming bases design are trademarks of Illumina, Inc. and/or its affiliate(s) in the U.S. and/or other countries. All
other names, logos, and other trademarks are the property of their respective owners.

Introduction
Illumina sequencing instruments generate per-cycle base call (BCL) files at the end of
the sequencing run. Most analysis applications use per-read FASTQ files as
input. You can use the bcl2fastq2 Conversion Software v2.18 to convert the
BCL files from a sequencing run into FASTQ files.
Use this guide to install the bcl2fastq2 Conversion Software and run the BCL conversion
and demultiplexing process.

Supported Instruments
The bcl2fastq2 Conversion Software supports the following instruments:
• MiniSeq
• MiSeq
• NextSeq 500, 550
• HiSeq X
• HiSeq 2000, 2500, 3000, 4000
If your Illumina sequencing system runs a version of Real-Time Analysis
(RTA) earlier than v1.18.54 and you want to convert BCL to FASTQ, install bcl2fastq v1.8.4 and
refer to the bcl2fastq Conversion User Guide Version v1.8.4 (part # 15038058) for instructions.

BCL Conversion and Demultiplexing Directory
The bcl2fastq2 Conversion Software performs BCL conversion and demultiplexing in a
single step. By default, the software puts the resulting demultiplexed compressed FASTQ
files in /Data/Intensities/BaseCalls.
The software puts reads with undetermined indexes in files that begin with
Undetermined_S0_, unless the sample sheet specifies a sample ID or sample name for
reads without an index.
If the Sample_Project column is specified for a sample in the sample sheet, the FASTQ
files for that sample are placed in a project sub-directory of /Data/Intensities/BaseCalls/.
Multiple samples can use the same project directory. If the Sample_ID and
Sample_Name columns are specified but do not match, the FASTQ files are placed in an
additional sub-directory.

BCL to FASTQ Conversion Process
The bcl2fastq2 Conversion Software converts the base calls in the per-cycle BCL files to
the per-read FASTQ format. As an option, the software can trim adapters and remove
Unique Molecular Identifier (UMI) bases from reads.
Adapter Trimming: The bcl2fastq2 Conversion Software checks whether a read extends
past the sample DNA insert and into the adapter sequence. The software uses an
approximate string matching algorithm to identify all or part of the adapter, and treats
each insertion or deletion as a single mismatch. If an adapter sequence is detected,
the base calls matching the adapter and beyond the match are masked or removed in the
FASTQ file.
Unique Molecular Identifier (UMI) Removal: UMIs are random k-mers attached to
the genomic DNA before polymerase chain reaction (PCR) amplification. Because the UMI is
amplified together with the amplicon, it can be used to detect PCR duplicates and correct
amplification errors. The software can place the UMI bases into the read name in
the FASTQ files. When the TrimUMI sample sheet setting is active, the software also
removes these bases from the reads.
Demultiplexing: First, the software reorganizes the FASTQ files based on the index
sequencing information. As a best practice, during sample preparation avoid choosing
indexes that differ by fewer than 3 bases. Then, the software generates the statistics and
reports for the demultiplexed FASTQ files. Also, the software recalculates the base calling
analysis statistics and stores the statistics in the InterOp folder. You can view the
statistics with the Sequencing Analysis Viewer (SAV) software from Illumina.
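The index recommendation above can be checked before a run. The following is a minimal sketch, assuming that "differ by fewer than 3 bases" means a Hamming distance below 3 between equal-length index sequences. The function names and the example indexes are hypothetical and are not part of the bcl2fastq2 software.

```python
from itertools import combinations


def hamming(a: str, b: str) -> int:
    """Count the positions at which two equal-length index sequences differ."""
    if len(a) != len(b):
        raise ValueError("index sequences must have the same length")
    return sum(x != y for x, y in zip(a, b))


def close_index_pairs(indexes, min_distance=3):
    """Return (index_a, index_b, distance) for every pair closer than min_distance."""
    return [
        (a, b, hamming(a, b))
        for a, b in combinations(indexes, 2)
        if hamming(a, b) < min_distance
    ]


# Hypothetical 6-base indexes, chosen only for illustration.
example_indexes = ["ATCACG", "ATCAGG", "TTAGGC"]
print(close_index_pairs(example_indexes))
```

An empty result means that every pair of indexes differs by at least min_distance bases.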
Output Files
• FASTQ Files
• InterOp Files
• ConversionStats File
• DemultiplexingStats File
• Adapter Trimming File
• FastqSummary and DemuxSummary
• HTML Reports
• JSON File

Install bcl2fastq2 Conversion Software

You can download the bcl2fastq2 Conversion Software from the Downloads page on the
Illumina website.
For installation requirements, see Appendix: Installation Requirements on page 29.

Install from RPM Package
You need root access to the system to install.
1. To install the RPM file, use the following command line:
yum install -y

The starting point for the bcl2fastq converter is the binary executable
/usr/local/bin/bcl2fastq.
2. To install the RPM package in a user-specified location, use the following command
line:
rpm --install --prefix

Install from Source
For installation, the directory locations are specified with the following environment
variables:
Variable       Description
SOURCE         Location of the bcl2fastq2 source code
BUILD          Location of the build directory
INSTALL_DIR    Location where the executable is installed

For example, the environment variables can be set as:
export TMP=/tmp
export SOURCE=${TMP}/bcl2fastq
export BUILD=${TMP}/bcl2fastq2-v2.18.x-build
export INSTALL_DIR=/usr/local/bcl2fastq2-v2.18.x
The build directory must be different from the source directory.
Follow these steps to install from source:
1. Decompress and extract the source code.
cd ${TMP}
tar -xvzf path-to-tarball/bcl2fastq2-v2.18.x.tar.gz
This command creates a bcl2fastq sub-directory in the ${TMP} directory.

2. Configure the build using the following commands:
mkdir ${BUILD}
cd ${BUILD}
${SOURCE}/src/configure --prefix=${INSTALL_DIR}
These commands create the build directory (a sub-directory of ${TMP} in the example
above), change to that directory, and run the configuration script from there.
The --prefix parameter provides the absolute path to the installation directory.

3. Build the package using the following command:
make

4. Install the package using the following command:
make install
Depending on the ${INSTALL_DIR} directory, you may need root privileges.

BCL Conversion Input Files

After sequencing, the instruments generate a BaseCalls directory, which contains the base
call (BCL) files for demultiplexing.
For demultiplexing, the bcl2fastq2 Conversion Software requires the following input files:

Instrument                     Input Files
MiSeq and HiSeq 2000/2500      • BCL Files (*.bcl.gz)
                               • STATS Files
                               • FILTER Files
                               • CONTROL Files
                               • Position Files
                               • RunInfo Files
                               • Config Files
                               • Sample Sheet Files (optional)

MiniSeq and NextSeq 500/550    • BCL Files (*.bcl.bgzf)
                               • BCI Files
                               • FILTER Files
                               • Position Files
                               • RunInfo Files
                               • Sample Sheet Files (optional)

HiSeq X and HiSeq 3000/4000    • BCL Files (*.bcl.gz)
                               • FILTER Files
                               • Position Files
                               • RunInfo Files
                               • Sample Sheet Files (optional)

BCL Conversion Input Files Diagram
Figure 1 BCL Conversion Input Files from the MiSeq or HiSeq 2000/2500 System
Figure 2 BCL Conversion Input Files from the MiniSeq or NextSeq System
Figure 3 BCL Conversion Input Files from the HiSeq X System

Folder and File Naming
The software generates the top-level run folder name using 3 fields separated by
underscores.
Example:
YYMMDD_machinename_NNNN
As a best practice, do not deviate from the run folder naming convention because doing
so can cause the software to stop.
• The first field is a six-digit number (YYMMDD) specifying the date of the run.
• The second field specifies the name of the sequencing machine. The field can consist
of any combination of upper or lower case letters, digits, or hyphens, but it cannot
contain underscores or any other characters.
• The third field is a four-digit number that specifies the experiment ID on that instrument. Each
instrument supplies a series of consecutively numbered experiment IDs from the onboard sample tracking database or a LIMS.
As a best practice, create unique names for the experiment or
sample IDs for each instrument to avoid naming conflicts.
For example, a run folder named 150108_instrument1_3147 indicates that the
experiment ID is 3147, the run was performed on instrument 1, and the date of the run is
January 8, 2015 (YYMMDD). The date and instrument name specify a unique run folder for any number
of instruments.
The run folder name can also include the flow cell number.
Example:
YYMMDD_machinename_NNNN_FCYYY
When you publish the data to a public database, we recommend that you use a prefix for
each instrument with the identity of the sequencing center.
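The naming convention above can be checked with a regular expression. The following is a minimal sketch, assuming that the optional fourth field holds the flow cell identifier as free text. The pattern, function name, and dictionary keys are hypothetical and only follow the field rules listed above.

```python
import re
from datetime import datetime

# date (YYMMDD) _ machine name (letters, digits, hyphens) _ experiment ID (4 digits)
# with an optional fourth field for the flow cell.
RUN_FOLDER = re.compile(
    r"^(?P<date>[0-9]{6})_(?P<machine>[A-Za-z0-9-]+)_(?P<experiment>[0-9]{4})"
    r"(?:_(?P<flowcell>.+))?$"
)


def parse_run_folder(name: str) -> dict:
    """Split a run folder name into its fields, or raise ValueError."""
    match = RUN_FOLDER.match(name)
    if match is None:
        raise ValueError(f"not a valid run folder name: {name!r}")
    fields = match.groupdict()
    # Interpret the first field as a two-digit-year date.
    fields["date"] = datetime.strptime(fields["date"], "%y%m%d").date()
    return fields


print(parse_run_folder("150108_instrument1_3147"))
```

A name that breaks the convention, such as a machine name containing an underscore, raises ValueError instead of being silently accepted.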

BCL Files
The BCL files are compressed with the gzip (*.gz) or the blocked GNU zip (*.bgzf) format.
The BaseCalls directory contains the BCL files. You can locate the files from the following
directory:
Data/Intensities/BaseCalls/L/C.1
Table 1 BCL File Format
Bytes                Description                                    Data type
Bytes 0–3            Number N of clusters                           Unsigned 32-bit integer
Bytes 4–(N+3),       Bits 0–1 are the bases, [A, C, G, T] for       Unsigned 8-bit integer
where N is the       [0, 1, 2, 3]; bits 2–7 are shifted by 2
cluster index        bits and contain the quality score.
                     A byte with all bits set to 0 is reserved
                     for no call.
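The layout in Table 1 can be decoded with a few lines of code. The following is a minimal sketch that works on an already decompressed BCL payload and assumes little-endian byte order, which the table does not state. The function name is hypothetical, and a no call is reported here as base N with quality 0.

```python
import struct

BASES = "ACGT"  # base codes 0, 1, 2, 3 from Table 1


def parse_bcl(data: bytes):
    """Decode an uncompressed BCL payload into a list of (base, quality) tuples."""
    # Bytes 0-3: number of clusters (assumed little-endian unsigned 32-bit).
    (n_clusters,) = struct.unpack_from("<I", data, 0)
    body = data[4:4 + n_clusters]
    if len(body) != n_clusters:
        raise ValueError("truncated BCL data")
    calls = []
    for byte in body:
        if byte == 0:
            calls.append(("N", 0))  # all bits 0: no call
        else:
            # Bits 0-1 hold the base, bits 2-7 hold the quality score.
            calls.append((BASES[byte & 0b11], byte >> 2))
    return calls


# Synthetic payload with 3 clusters: no call, C with quality 30, T with quality 40.
payload = struct.pack("<I", 3) + bytes([0, (30 << 2) | 1, (40 << 2) | 3])
print(parse_bcl(payload))
```

For a real *.bcl.gz file, the bytes would first be read through a decompressor such as Python's gzip module.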

BCI Files
The BCI (*.bci) files contain one record per tile for the sequencing run in binary format.
You can locate these files from the following directory:
/Data/Intensities/BaseCalls/L

Table 2 BCI File Format
Bytes        Description
Bytes 0–3    Tile number
Bytes 4–7    Number of clusters in the tile

STATS Files
The STATS file (*.stats) is a binary file that contains base calling statistics. You can locate
these files from the following directory:
Data/Intensities/BaseCalls/L00/C.1
Table 3 Stats File Format
Start      Description                                                      Data Type
Byte 0     Cycle number                                                     integer
Byte 4     Average Cycle Intensity                                          double
Byte 12    Average intensity for A over all clusters with intensity for A   double
Byte 20    Average intensity for C over all clusters with intensity for C   double
Byte 28    Average intensity for G over all clusters with intensity for G   double
Byte 36    Average intensity for T over all clusters with intensity for T   double
Byte 44    Average intensity for A over clusters with base call A           double
Byte 52    Average intensity for C over clusters with base call C           double
Byte 60    Average intensity for G over clusters with base call G           double
Byte 68    Average intensity for T over clusters with base call T           double
Byte 76    Number of clusters with base call A                              integer
Byte 80    Number of clusters with base call C                              integer
Byte 84    Number of clusters with base call G                              integer
Byte 88    Number of clusters with base call T                              integer
Byte 92    Number of clusters with base call X                              integer
Byte 96    Number of clusters with intensity for A                          integer
Byte 100   Number of clusters with intensity for C                          integer
Byte 104   Number of clusters with intensity for G                          integer
Byte 108   Number of clusters with intensity for T                          integer
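The offsets in Table 3 correspond to one 4-byte integer, nine 8-byte doubles, and nine 4-byte integers, 112 bytes in total. The following is a minimal sketch of reading that record, assuming little-endian byte order and signed 32-bit integers, neither of which the table states. The field names are hypothetical labels for the table rows.

```python
import struct
from collections import namedtuple

# 1 integer + 9 doubles + 9 integers, packed without padding: 4 + 72 + 36 = 112 bytes.
STATS_LAYOUT = struct.Struct("<i9d9i")

FIELDS = (
    ["cycle", "avg_cycle_intensity"]
    + [f"avg_intensity_all_{b}" for b in "ACGT"]      # bytes 12-43
    + [f"avg_intensity_called_{b}" for b in "ACGT"]   # bytes 44-75
    + [f"num_called_{b}" for b in "ACGTX"]            # bytes 76-95
    + [f"num_with_intensity_{b}" for b in "ACGT"]     # bytes 96-111
)
StatsRecord = namedtuple("StatsRecord", FIELDS)


def parse_stats(data: bytes) -> StatsRecord:
    """Unpack the first 112 bytes of a *.stats file into named fields."""
    return StatsRecord._make(STATS_LAYOUT.unpack_from(data, 0))


# Synthetic record used only to exercise the parser.
raw = STATS_LAYOUT.pack(7, *[float(i) for i in range(9)], *range(10, 19))
print(parse_stats(raw))
```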

FILTER Files
The FILTER file (*.filter) is a binary file that contains the filter results. You can locate
these files from the following directory:
Data/Intensities/BaseCalls/L
Table 4 Filter File Format
Bytes                   Description
Bytes 0–3               Zero value (for backwards compatibility)
Bytes 4–7               Filter format version number
Bytes 8–11              Number of clusters
Bytes 12–(N+11),        Unsigned 8-bit integer
where N is the          Bit 0 is pass or failed filter
cluster number
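The filter file layout in Table 4 can be read as a 12-byte header followed by one byte per cluster. The following is a minimal sketch, assuming little-endian unsigned 32-bit header fields and that a set bit 0 means the cluster passed the filter. The function name is hypothetical.

```python
import struct


def parse_filter(data: bytes):
    """Return (format_version, [passed_filter, ...]) from *.filter file content."""
    zero, version, n_clusters = struct.unpack_from("<III", data, 0)
    if zero != 0:
        raise ValueError("bytes 0-3 are expected to be zero")
    flags = data[12:12 + n_clusters]
    if len(flags) != n_clusters:
        raise ValueError("truncated filter data")
    # Bit 0 of each byte carries the pass/fail result (assumed 1 = pass).
    return version, [bool(byte & 1) for byte in flags]


# Synthetic content: version 3, four clusters, the second one fails the filter.
content = struct.pack("<III", 0, 3, 4) + bytes([1, 0, 1, 1])
print(parse_filter(content))
```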


CONTROL Files
The CONTROL (*.control) file is a binary file that contains the control results. You can
locate these files from the following directory:
/Data/Intensities/BaseCalls/L00/
Table 5 Control File Format
Bytes                   Description
Bytes 0–3               Zero value (for backwards compatibility)
Bytes 4–7               Format version number
Bytes 8–11              Number of clusters
Bytes 12–(2xN+11),      The bit number indicates the following:
where N is the          • Bit 0: always empty (0)
cluster index           • Bit 1: was the read identified as a control?
                        • Bit 2: was the match ambiguous?
                        • Bit 3: did the read match the PhiX tag?
                        • Bit 4: did the read align to the PhiX tag?
                        • Bit 5: did the read match the control index sequence?
                        • Bits 6,7: reserved for future use
                        • Bits 8..15: the report key for the matched record in the
                          controls.fasta file (specified by the REPORT_KEY metadata)
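The 16-bit value stored per cluster in Table 5 can be unpacked with bit masks. The following is a minimal sketch, assuming little-endian byte order and that bit 0 is the least significant bit. The function names and dictionary keys are hypothetical labels for the bits listed in the table.

```python
import struct


def decode_control(word: int) -> dict:
    """Decode one 16-bit control value into named flags."""
    return {
        "is_control": bool(word & (1 << 1)),
        "ambiguous_match": bool(word & (1 << 2)),
        "matched_phix_tag": bool(word & (1 << 3)),
        "aligned_to_phix_tag": bool(word & (1 << 4)),
        "matched_control_index": bool(word & (1 << 5)),
        "report_key": (word >> 8) & 0xFF,  # bits 8..15
    }


def parse_control(data: bytes):
    """Return (format_version, [decoded flags per cluster]) from *.control content."""
    _zero, version, n_clusters = struct.unpack_from("<III", data, 0)
    words = struct.unpack_from(f"<{n_clusters}H", data, 12)
    return version, [decode_control(word) for word in words]


# Synthetic content: one cluster flagged as a control with report key 5.
content = struct.pack("<III", 0, 2, 1) + struct.pack("<H", 0x0502)
print(parse_control(content))
```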

CONFIG Files
The CONFIG (*config.xml) file records information specific to the generation of the
subfolders. The file contains a tag-value list that describes the cycle-image folders used to
generate each folder of intensity and sequence files. You can locate the file from the
following directory:
/Data/Intensities/
The other CONFIG (*config.xml) file is in the BaseCalls directory, which contains the
meta-information on the base caller runs. You can locate the file from the following
directory:
/Data/Intensities/BaseCalls/

Position Files
The BCL to FASTQ converter can use different types of position files.
The LOCS (*.locs) file is a binary file that contains the cluster positions. Additionally, the
*.clocs files are compressed versions of LOCS files.
The *_pos.txt files are text-based files with 2 columns and a number of rows equal to the
number of clusters. The first column is the X-coordinate and the second column is the
Y-coordinate. Each line ends with a line terminator.
You can locate these files from the following directory:
Data/Intensities/BaseCalls/L

RunInfo File
The RunInfo.xml file is located in the top-level run folder. The file
contains information on the run, flow cell, and instrument IDs, the date, and the read structure.
Also, the file provides the number of reads, the number of cycles per read, and the index
reads.

Sample Sheet

The sample sheet (*SampleSheet.csv) file provides information on the relationship
between samples and indexes during library creation. The sample sheet is optional and
is located in the top-level run folder. When a sample sheet is not provided, all reads are assigned
to the default sample Undetermined_S0, which includes one file per lane per read.

Settings Section
The bcl2fastq2 Conversion Software uses the adapter settings for adapter trimming.
Table 6 Adapter Specifications
Setting                  Description
Adapter or TrimAdapter   The adapter sequence to be trimmed. If an AdapterRead2 is
                         provided, this sequence is only used to trim Read 1.
AdapterRead2 or          The adapter sequence to be trimmed in Read 2. If not
TrimAdapterRead2         provided, the same sequence specified in Adapter is used.
MaskAdapter              The adapter sequence to be masked rather than trimmed. If
                         MaskAdapterRead2 is provided, this sequence is only used to
                         mask Read 1.
MaskAdapterRead2         The adapter sequence to be masked in Read 2. If not provided,
                         the same sequence specified in MaskAdapter is used.
FindAdapterWithIndels    1 (default) or 0. If 1 (true), an approximate string matching
                         algorithm is used to identify the adapter, treating insertions
                         and deletions as a single mismatch (Myers 1999, J.ACM). If 0
                         (false), a sliding window algorithm is used, in which insertions
                         and deletions of bases inside the adapter sequence are not
                         tolerated.
Table 7 Cycle and Tile Specifications
Setting                  Description
Read1EndWithCycle        The last cycle to use for Read 1.
Read2EndWithCycle        The last cycle to use for Read 2.
Read1StartFromCycle      The first cycle to use for Read 1.
Read2StartFromCycle      The first cycle to use for Read 2.
Read1UMILength           The length of the UMI used for Read 1.
Read2UMILength           The length of the UMI used for Read 2.
Read1UMIStartFromCycle   The first cycle to use for UMI in Read 1.
                         The cycle index is absolute and not affected by
                         Read1StartFromCycle. The software currently supports
                         UMIs only at the beginning or end of reads.
Read2UMIStartFromCycle   The first cycle to use for UMI in Read 2.
                         The cycle index is absolute and not affected by
                         Read2StartFromCycle. The software currently supports
                         UMIs only at the beginning or end of reads.
TrimUMI                  0 (default) or 1. When the TrimUMI setting is set to 1 (true), the
                         software trims the UMI bases from Read 1 and Read 2.
ExcludeTiles             Tiles to exclude. Separate tiles with a plus sign [+], or
                         specify a range with a hyphen [-]. For
                         example, ExcludeTiles,1101+2201+1301-1306
                         means skip tiles 1101, 2201, and 1301 through 1306.
ExcludeTilesLaneX        Tiles to exclude for Lane X. For example,
                         ExcludeTilesLane6,1101-1108 means skip tiles 1101
                         through 1108 for lane 6 only.
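The ExcludeTiles value syntax can be expanded into an explicit tile list. The following is a minimal sketch, assuming that a hyphenated range covers every consecutive tile number between its two ends, inclusive. The function name is hypothetical and is not part of the bcl2fastq2 software.

```python
def expand_tiles(spec: str):
    """Expand an ExcludeTiles value such as '1101+2201+1301-1306' into tile numbers."""
    tiles = []
    for part in spec.split("+"):
        if "-" in part:
            # A hyphen marks an inclusive range of tile numbers.
            first, last = part.split("-")
            tiles.extend(range(int(first), int(last) + 1))
        else:
            tiles.append(int(part))
    return tiles


print(expand_tiles("1101+2201+1301-1306"))
```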


Table 8 FASTQ Specifications
Setting                    Description
CreateFastqForIndexReads   0 (default) or 1. If 1 (true), generate FASTQ files for index
                           reads. Normally, these FASTQ files are not needed,
                           because demultiplexing is carried out automatically based
                           on the sample sheet. Also, the index sequence is already
                           placed in the sequence identifiers in the FASTQ files.
                           Generating FASTQ files is based on the following:
                           • The index read masks are specified from the
                             --use-bases-mask option.
                           • The RunInfo.xml file when the --use-bases-mask
                             option is not used.
ReverseComplement          0 (default) or 1. If 1 (true), all reads are reverse
                           complemented as they are written to FASTQ files. This
                           step is necessary in certain unusual cases (for example,
                           processing of mate-pair data using BWA, which expects
                           paired-end data).

Data Section
The bcl2fastq2 Conversion Software uses the information in the columns of the Data
section.
Column           Description
Sample_Project   The sample project name. The software creates a directory with
                 the specified sample project name and stores the FASTQ files
                 there. You can use multiple samples in the same project.
Lane             When specified, the software generates FASTQ files for only the
                 samples with the specified lane number.
Sample_ID        The sample ID.
Sample_Name      The sample name.
index            The index sequence.
index2           The index sequence for index 2.

If the Sample_ID and Sample_Name columns do not match, the FASTQ files are placed
in an additional sub-directory.
You can use alphanumeric characters, hyphens [-], and underscores [_] for the
Sample_Project, Sample_ID, and Sample_Name.

Sample Sheet Demultiplexing Scenarios
The bcl2fastq2 Conversion Software handles BCL conversion and demultiplexing as
follows, depending on the sample sheet:
• All reads are placed in the Undetermined_S0 FASTQ files when there is no sample
sheet.
• All reads are placed in the Undetermined_S0 FASTQ files when there is a sample
sheet but no data section.
• All reads are placed in the sample FASTQ file as defined in the sample sheet when
there is a sample sheet and one sample has no indexes.
• When there is a sample sheet and the samples have indexes, the software performs
the following:
  • Reads without a matching index are placed in the default Undetermined_S0
  FASTQ files.
  • Reads with a valid index are placed in the sample FASTQ file as defined in the
  sample sheet.
For each sample, there is one file per lane per read number when reads exist for that
sample, lane, and read number.
NOTE
When the Lane column of the sample sheet Data section is populated, only those lanes are
converted. When the Lane column is not used, all lanes are converted.

Create a Sample Sheet with IEM
The Illumina Experiment Manager (IEM) software helps you create and edit sample
sheets for Illumina sequencers and analysis software. You can use IEM to create sample
sheets for any Illumina sequencer and for any Nextera or TruSeq libraries.
You can download IEM at
support.illumina.com/sequencing/sequencing_software/experiment_manager/downloads.html.
See the Illumina Experiment Manager User Guide for instructions on creating a sample sheet.

Run BCL Conversion and Demultiplexing
Use the following command to run the bcl2fastq2 Conversion Software:
nohup /usr/local/bin/bcl2fastq [options]
An example of a command with options:
nohup /usr/local/bin/bcl2fastq --runfolder-dir
--output-dir
This command produces a set of FASTQ files in the BaseCalls directory. Reads with an
unresolved or erroneous index are placed in the Undetermined_S0 FASTQ files. By
default, --runfolder-dir is the current directory and --output-dir is the
Data/Intensities/BaseCalls sub-directory of the run folder.
NOTE
To generate a log file for a problematic bcl2fastq run, use the -l or --min-log-level
DEBUG option. By default, bcl2fastq generates a log file with logging level INFO.
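When the conversion is launched from a script, the command line can be assembled as an argument list. The following is a minimal sketch that uses only the --runfolder-dir, --output-dir, and --min-log-level options described in this guide. The function name and the example paths are hypothetical.

```python
import shlex


def build_command(runfolder_dir, output_dir, min_log_level=None,
                  executable="/usr/local/bin/bcl2fastq"):
    """Assemble a bcl2fastq argument list without running it."""
    cmd = [executable, "--runfolder-dir", runfolder_dir, "--output-dir", output_dir]
    if min_log_level is not None:
        cmd += ["--min-log-level", min_log_level]
    return cmd


# Hypothetical paths, used only for illustration.
cmd = build_command("/data/150108_instrument1_3147", "/data/fastq", "DEBUG")
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) on a system where the software is installed.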

BCL2FASTQ Options
The main command line options are --runfolder-dir and --output-dir. For
command line options that have a corresponding sample sheet setting, the value passed
on the command line overrides the value found in the sample sheet.
Table 9 Main Options
Option                  Description
-R, --runfolder-dir     Path to run folder directory
                        Default: ./
-o, --output-dir        Path to demultiplexed output
                        Default: /Data/Intensities/BaseCalls/

You can use the following advanced options for non-default settings or for customized
settings.
Table 10 Directory Options
Option                  Description
-i, --input-dir         Path to input directory
                        Default: /Data/Intensities/BaseCalls/
--intensities-dir       Path to intensities directory
                        If intensities directory is specified, then the input
                        directory must also be specified.
                        Default: /../
--interop-dir           Path to demultiplexing statistics directory
                        Default: /InterOp/
--stats-dir             Path to human-readable demultiplexing statistics
                        directory
                        Default: /Stats/
--reports-dir           Path to reporting directory
                        Default: /Reports/
--sample-sheet          Path to sample sheet, so you can specify the location
                        and name of the sample sheet, if different from
                        default.
                        Default: /SampleSheet.csv


The file I/O threads spend most of their time sleeping, and so take little processing time.
The processing of demultiplexed data is allocated 1 thread per CPU to make sure that
there are no idle CPUs, resulting in more threads than CPUs by default. You can use the
following options to control threading. These options are useful if, for example, you share
your computing resources with colleagues and want to limit your usage.
Table 11 Processing Options
Option                         Description
-r, --loading-threads          Number of threads used for loading BCL data.
                               Default depends on architecture.
-d, --demultiplexing-threads   Number of threads used for demultiplexing.
                               Default depends on architecture.
-p, --processing-threads       Number of threads used for processing demultiplexed data.
                               Default depends on architecture.
-w, --writing-threads          Number of threads used for writing FASTQ data. This number
                               must not be higher than the number of samples.
                               Default depends on architecture.

If you want to use these options to assign multiple threads, consider the following:
• The most CPU demanding stage is the processing step (-p option). Assign this step
the most threads.
• The second most CPU demanding stage is the demultiplexing step (-d option).
Assign this step the second highest number of threads. Tests indicate 20% of
processing time is used for demultiplexing a HiSeq X run.
• Reading and writing stages are lightweight and do not need many threads. This
consideration is especially important for a local hard drive, where too many threads
mean too many parallel read and write actions, giving suboptimal performance.
• Use one thread per CPU core plus a few more to keep the CPUs supplied with work. This
method prevents CPUs being idle due to a thread being blocked while waiting for
another thread.
• The number of threads depends on the data. If you specify more writing threads than
samples, the extra threads do no work but can cost time due to context switching.
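The threading guidance in this section can be turned into a starting point for the four thread options. The following is a minimal sketch based on the defaults listed in this section (4 threads for reading, 4 for writing, 20% for demultiplexing, 100% for processing). It assumes the percentages refer to the number of CPUs and rounds up, and it caps the writing threads at the number of samples. The function name is hypothetical.

```python
def suggest_threads(n_cpus: int, n_samples: int) -> dict:
    """Suggest values for the -r, -d, -p, and -w options."""
    return {
        "--loading-threads": 4,
        # 20% of the CPUs, rounded up with integer arithmetic (assumption).
        "--demultiplexing-threads": max(1, (n_cpus + 4) // 5),
        # One processing thread per CPU.
        "--processing-threads": n_cpus,
        # Writing threads must not exceed the number of samples.
        "--writing-threads": max(1, min(4, n_samples)),
    }


print(suggest_threads(n_cpus=16, n_samples=2))
```

These values are only a baseline. On shared hardware, lower the processing and demultiplexing counts to limit your usage.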
For processing, if your computing platform supports threading, the software manages the
threads by the following defaults:
• 4 threads for reading the data
• 4 threads for writing the data
• 20% for demultiplexing data
• 100% for processing demultiplexed data

Table 12 Behavioral Options
Option                                Description
--adapter-stringency                  The minimum match rate that triggers the masking or
                                      trimming process. This value is calculated as MatchCount /
                                      (MatchCount + MismatchCount) and ranges from 0 to 1. Values
                                      below 0.5 are not recommended because they introduce too
                                      many false positives. The default value for this
                                      parameter is 0.9, meaning that only reads with > 90%
                                      sequence identity with the adapter are trimmed.
                                      Default: 0.9
--aggregated-tiles                    This flag tells the converter about the structure of the
                                      input files.
                                      Accepted values:
                                      AUTO  Automatically detects the tile setting
                                      YES   Tiles are aggregated into single input file
                                      NO    There are separate input files for individual tiles
                                      Default: AUTO
--barcode-mismatches                  Number of allowed mismatches per index.
                                      Multiple comma-delimited entries are allowed. Each entry is
                                      applied to the corresponding index; the last entry applies
                                      to all remaining indexes.
                                      Default: 1. Accepted values: 0, 1 or 2.
--create-fastq-for-index-reads        Create FASTQ files also for Index Reads.
                                      Generating FASTQ files is based on the following:
                                      • The index read masks are specified from the
                                        --use-bases-mask option.
                                      • The RunInfo.xml file when the --use-bases-mask option
                                        is not used.
--ignore-missing-bcls                 Missing or corrupt BCL files are ignored. Assumes 'N'/'#'
                                      for missing calls.
--ignore-missing-filter               Missing or corrupt filter files are ignored. Assumes Passing
                                      Filter for all clusters in tiles where filter files are
                                      missing.
--ignore-missing-positions            Missing or corrupt positions files are ignored. If
                                      corresponding position files are missing, bcl2fastq writes
                                      unique coordinate positions in FASTQ header.
--ignore-missing-controls             Missing or corrupt control files are ignored. Missing
                                      controls: 0
--minimum-trimmed-read-length         Minimum read length after adapter trimming. bcl2fastq trims
                                      the adapter from the read down to the value of this
                                      parameter. If there is more adapter match below this value,
                                      then those bases are masked, not trimmed (replaced by N
                                      rather than removed).
                                      Default: 35
--mask-short-adapter-reads            This option applies when a read is trimmed to below the
                                      length specified by the --minimum-trimmed-read-length
                                      option (default of 35). These parameters specify the
                                      following behavior:
                                      If the number of bases left after adapter trimming is less
                                      than --minimum-trimmed-read-length, force the read length
                                      to be equal to --minimum-trimmed-read-length by masking
                                      adapter bases (replace with Ns) that fall below this length.
                                      If the number of ACGT bases left after this process falls
                                      below --mask-short-adapter-reads, mask all bases, resulting
                                      in a read with --minimum-trimmed-read-length number of Ns.
                                      Default: 22
--tiles                               The --tiles argument takes a regular expression to select
                                      for processing only a subset of the tiles available in the
                                      flow cell. This argument can be specified multiple times,
                                      one time for each regular expression. Examples:
                                      To select all the tiles ending with 5 in all lanes:
                                      --tiles [0-9][0-9][0-9]5
                                      To select tile 2 in lane 1 and all the tiles in the other
                                      lanes:
                                      --tiles s_1_0002 --tiles s_[2-8]
--use-bases-mask                      The --use-bases-mask string specifies how to use each cycle.
                                      An n means ignore the cycle.
                                      A Y (or y) means use the cycle.
                                      An I means use the cycle for the Index Read.
                                      A number means that the previous character is repeated that
                                      many times.
                                      An asterisk [*] means that the previous character is
                                      repeated until the end of this read or index (length
                                      according to the RunInfo.xml).
                                      The read masks are separated with commas: ,
                                      The format for dual indexing is as follows:
                                      --use-bases-mask Y*,I*,I*,Y* or variations thereof as
                                      specified.
                                      You can also specify the --use-bases-mask multiple times
                                      for separate lanes, as follows:
                                      --use-bases-mask 1:y*,i*,i*,y* --use-bases-mask y*,n*,n*,y*
                                      Where the 1: means: Use this setting for lane 1. In this
                                      case, the second --use-bases-mask parameter is used for all
                                      other lanes.
                                      If this option is not specified, the mask is determined
                                      from the RunInfo.xml file in the run directory. If the mask
                                      cannot be determined this way, supply the --use-bases-mask
                                      option.
                                      When the --use-bases-mask option is specified, the number
                                      of index cycles and the length of index in the sample sheet
                                      should match.
--with-failed-reads                   Include all clusters in the output, even clusters that are
                                      non-PF. These clusters would have been excluded by default.
--write-fastq-reverse-complement      Generate FASTQ files containing reverse complements of
                                      actual data.
--no-bgzf-compression                 Turn off BGZF compression, and use GZIP for FASTQ files.
                                      BGZF compression allows downstream applications to
                                      decompress in parallel. This parameter is available in case
                                      a consumer of FASTQ data cannot handle all standard GZIP
                                      formats.
--fastq-compression-level             Zlib compression level (1–9) used for FASTQ files.
                                      Default: 4
--no-lane-splitting                   Do not split FASTQ files by lane.
--find-adapters-with-sliding-window   Find adapters with simple sliding window algorithm.
                                      Insertions and deletions of bases inside the adapter
                                      sequence are not handled.
NOTE
Do not use the --no-lane-splitting option if you want to upload the resulting FASTQ
files to BaseSpace. The FASTQ files generated from the --no-lane-splitting option
are not compatible with the BaseSpace file uploader. Files generated without this option
(the default setting) are compatible for upload to BaseSpace.
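The interaction between the --minimum-trimmed-read-length and --mask-short-adapter-reads options in Table 12 can be sketched in a few lines. This is an illustration of the described behavior under stated assumptions, not the converter's actual implementation. The function name is hypothetical, adapter_start is assumed to be the 0-based position where the adapter match begins, and a read shorter than the minimum length is assumed to keep its original length.

```python
def trim_read(read: str, adapter_start: int,
              min_trimmed: int = 35, mask_short: int = 22) -> str:
    """Apply adapter trimming with the masking rules for short reads."""
    if adapter_start >= len(read):
        return read  # no adapter detected
    kept = read[:adapter_start]
    if len(kept) >= min_trimmed:
        return kept  # plain trimming is enough
    # The trimmed read is too short: pad it back with Ns up to min_trimmed.
    target = min(min_trimmed, len(read))
    acgt_left = sum(base in "ACGT" for base in kept)
    if acgt_left < mask_short:
        return "N" * target  # too few real bases left: mask everything
    return kept + "N" * (target - len(kept))


read = "A" * 50
print(trim_read(read, 40))  # adapter starts at cycle 41: trimmed to 40 bases
print(trim_read(read, 30))  # 30 bases kept, padded with 5 Ns
print(trim_read(read, 10))  # only 10 bases left: fully masked
```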
Table 13 General Options
Option                  Description
-h, --help              Produce help message and exit
-v, --version           Print program version information
-l, --min-log-level     Minimum log level
                        Recognized values: NONE, FATAL, ERROR, WARNING, INFO,
                        DEBUG, TRACE
                        To generate a log file for a problematic bcl2fastq2 run, use
                        the -l or --min-log-level DEBUG option.
                        Default: INFO
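The --use-bases-mask syntax described in Table 12 can be expanded into one character per cycle, which makes a mask easier to check against the run. The following is a minimal sketch that handles a single comma-separated mask without the per-lane prefix (such as 1:). The function names are hypothetical, and the cycle counts are supplied by the caller instead of being read from RunInfo.xml.

```python
import re

# One mask character, optionally followed by a repeat count or an asterisk.
TOKEN = re.compile(r"([YyNnIi])([0-9]+|\*)?")


def expand_read_mask(mask: str, n_cycles: int) -> str:
    """Expand the mask of one read, for example 'I6n*' with 8 cycles -> 'IIIIIInn'."""
    out = ""
    pos = 0
    while pos < len(mask):
        match = TOKEN.match(mask, pos)
        if match is None:
            raise ValueError(f"unexpected character in mask {mask!r}")
        char, count = match.groups()
        if count is None:
            out += char
        elif count == "*":
            out += char * (n_cycles - len(out))  # repeat to the end of the read
        else:
            out += char * int(count)
        pos = match.end()
    if len(out) != n_cycles:
        raise ValueError(f"mask {mask!r} does not cover {n_cycles} cycles")
    return out


def expand_bases_mask(mask: str, cycles_per_read):
    """Expand a full comma-separated mask, one entry per read."""
    parts = mask.split(",")
    if len(parts) != len(cycles_per_read):
        raise ValueError("number of read masks and reads differ")
    return [expand_read_mask(p, n) for p, n in zip(parts, cycles_per_read)]


print(expand_bases_mask("Y*,I6n*,Y3n", [4, 8, 4]))
```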

BCL Conversion Output Files

The bcl2fastq2 Conversion Software generates the following output files:
• FASTQ Files
• InterOp Files
• ConversionStats File
• DemultiplexingStats File
• AdapterTrimming File
• FastqSummary and DemuxSummary
• HTML Reports
• JSON File

FASTQ Files
The bcl2fastq2 Conversion Software converts *.bcl, *.bcl.gz, and *.bcl.bgzf files into
FASTQ files, which can be used as input for secondary analysis. When there is no
sample sheet, the software generates an Undetermined_S0 FASTQ file for each lane and
read number combination.

FASTQ File Names
FASTQ files are named with the sample name and the sample number. The sample
number is a numeric assignment based on the order that the sample is listed for the run.
For example:
Data\Intensities\BaseCalls\samplename_S1_L001_R1_001.fastq.gz
} samplename: The sample name listed for the sample. If a sample name is not
provided, the file name includes the sample ID.
} S1: The sample number based on the order that samples are listed for the run
starting with 1. In this example, S1 indicates that this sample is the first sample
listed for the run.
} L001: The lane number.
} R1: The read. In this example, R1 means Read 1. For a paired-end run, a file from
Read 2 includes R2 in the file name. When generated, the Index Reads are I1 or I2.
} 001: The last segment is always 001.
NOTE
Reads that cannot be assigned to any sample are written to a FASTQ file for sample
number 0, and excluded from downstream analysis.

FASTQ files are compressed in the GNU zip format, as indicated by *.gz in the file name.
FASTQ files can be decompressed using tools such as gzip (command line) or 7-Zip
(GUI).
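The naming convention described above can be split into its parts with a regular expression. The following Python sketch is one way to do that; the names FastqName and parse_fastq_name are hypothetical, and the pattern assumes lane-split file names (with the L001 segment) and a single-digit read number such as R1, R2, I1, or I2.

```python
import re
from typing import NamedTuple


class FastqName(NamedTuple):
    sample_name: str
    sample_number: int
    lane: int
    read: str
    segment: str


# samplename_S1_L001_R1_001.fastq.gz
_FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<number>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>[RI]\d)_(?P<segment>\d{3})\.fastq\.gz$"
)


def parse_fastq_name(file_name: str) -> FastqName:
    """Split a FASTQ file name (without directory) into its components."""
    match = _FASTQ_NAME.match(file_name)
    if match is None:
        raise ValueError("not a recognized FASTQ file name: " + file_name)
    return FastqName(
        sample_name=match["sample"],
        sample_number=int(match["number"]),
        lane=int(match["lane"]),
        read=match["read"],
        segment=match["segment"],
    )


if __name__ == "__main__":
    print(parse_fastq_name("samplename_S1_L001_R1_001.fastq.gz"))
```

A sample number of 0 in the parsed result identifies the file that holds reads not assigned to any sample.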

FASTQ File Format
FASTQ is a text-based file format that contains base calls and quality values per read.
Each record contains 4 lines:
} The identifier
} The sequence
} A plus sign (+)
} The quality scores in an ASCII encoded format

The identifier is formatted as:
@Instrument:RunID:FlowCellID:Lane:Tile:X:Y:UMI ReadNum:FilterFlag:0:SampleNumber
Example:
@SIM:1:FCX:1:15:6329:1045 1:N:0:2
TCGCACTCAACGCCCTGCATATGACAAGACAGAATC
+
<>;##=><9=AAAAAAAAAA9#:<#<;<<
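The four-line record layout and the identifier format can be read programmatically. The following Python sketch is a minimal illustration, not part of the software: the names read_fastq and parse_identifier are hypothetical, the UMI field is treated as optional because the example identifier above does not include one, and all fields are returned as strings.

```python
import gzip
from typing import Dict, Iterator, Tuple

_LEFT_FIELDS = ["instrument", "run_id", "flowcell_id", "lane", "tile", "x", "y"]


def parse_identifier(line: str) -> Dict[str, str]:
    """Split a FASTQ identifier line into named fields."""
    if not line.startswith("@"):
        raise ValueError("identifier must start with '@'")
    left, right = line[1:].strip().split(" ", 1)
    left_fields = left.split(":")
    if len(left_fields) not in (7, 8):
        raise ValueError("unexpected number of fields before the space")
    record = dict(zip(_LEFT_FIELDS, left_fields))
    # The eighth field, when present, is the UMI.
    record["umi"] = left_fields[7] if len(left_fields) == 8 else ""
    read_num, filter_flag, _zero, sample_number = right.split(":")
    record["read_num"] = read_num
    record["filter_flag"] = filter_flag
    record["sample_number"] = sample_number
    return record


def read_fastq(path: str) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, sequence, quality) from a gzip-compressed FASTQ file."""
    with gzip.open(path, "rt") as handle:
        while True:
            identifier = handle.readline().rstrip("\n")
            if not identifier:
                break
            sequence = handle.readline().rstrip("\n")
            handle.readline()  # the plus sign (+) line
            quality = handle.readline().rstrip("\n")
            yield identifier, sequence, quality


if __name__ == "__main__":
    print(parse_identifier("@SIM:1:FCX:1:15:6329:1045 1:N:0:2"))
```

Counting the records yielded by read_fastq for each output file is one simple way to compare the number of reads assigned to each sample with the number left undetermined.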

InterOp Files
You can locate the InterOp files in the directory: /InterOp. The
directory contains binary files used by the Sequencing Analysis Viewer (SAV) software to
summarize various analysis metrics, such as cluster density, intensities, quality scores,
and overall run quality.
The index metrics are stored in the IndexMetricsOut.bin file, which has the following
binary format:
Byte 0: file version (1)
Bytes (variable length): record:
} 2 bytes: lane number (uint16)
} 2 bytes: tile number (uint16)
} 2 bytes: read number (uint16)
} 2 bytes: number of bytes Y for index name (uint16)
} Y bytes: index name string (string in UTF8Encoding)
} 4 bytes: # clusters identified as index (uint32)
} 2 bytes: number of bytes V for sample name (uint16)
} V bytes: sample name string (string in UTF8Encoding)
} 2 bytes: number of bytes W for sample project (uint16)
} W bytes: sample project string (string in UTF8Encoding)
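The record layout above maps directly onto a small binary reader. The following Python sketch is an illustration under stated assumptions: it assumes little-endian byte order, which this guide does not state, and the names IndexMetric and parse_index_metrics are hypothetical.

```python
import struct
from typing import List, NamedTuple, Tuple


class IndexMetric(NamedTuple):
    lane: int
    tile: int
    read: int
    index_name: str
    cluster_count: int
    sample_name: str
    sample_project: str


def _read_string(buf: bytes, offset: int) -> Tuple[str, int]:
    """Read a uint16 length followed by that many UTF-8 bytes."""
    (length,) = struct.unpack_from("<H", buf, offset)
    start = offset + 2
    end = start + length
    if end > len(buf):
        raise ValueError("truncated string field")
    return buf[start:end].decode("utf-8"), end


def parse_index_metrics(buf: bytes) -> List[IndexMetric]:
    """Parse the contents of IndexMetricsOut.bin (assumed little-endian)."""
    if len(buf) < 1:
        raise ValueError("empty buffer")
    if buf[0] != 1:
        raise ValueError("unsupported file version: %d" % buf[0])
    offset = 1
    records = []
    while offset < len(buf):
        lane, tile, read = struct.unpack_from("<HHH", buf, offset)
        offset += 6
        index_name, offset = _read_string(buf, offset)
        (cluster_count,) = struct.unpack_from("<I", buf, offset)
        offset += 4
        sample_name, offset = _read_string(buf, offset)
        sample_project, offset = _read_string(buf, offset)
        records.append(IndexMetric(lane, tile, read, index_name,
                                   cluster_count, sample_name, sample_project))
    return records
```

To use the sketch, read the whole file in binary mode and pass the bytes to parse_index_metrics. Because each record carries three variable-length strings, records cannot be located by a fixed stride and must be read sequentially.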

ConversionStats File
You can locate the ConversionStats.xml file in the directory: /Stats/, or in the directory specified by the --stats-dir option.
The file contains the following information per tile:
} Raw Cluster Count
} Read number
} YieldQ30
} Yield
} QualityScore Sum
The file contains the following information per lane:
} Lane Number
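The per-tile values listed above are enough to derive summary quality figures such as the percentage of Q30 bases and the mean quality score. The following Python sketch shows the arithmetic under assumptions that this guide does not spell out: Yield is taken to be a count of bases, YieldQ30 the count of bases with a quality score of 30 or higher, and QualityScoreSum the sum of quality scores over all bases. The function name summarize_tiles and the dictionary keys are hypothetical; the sketch does not read ConversionStats.xml itself.

```python
from typing import Dict, Iterable


def summarize_tiles(tiles: Iterable[Dict[str, int]]) -> Dict[str, float]:
    """Aggregate per-tile yield values into summary quality figures."""
    total_yield = 0
    total_q30 = 0
    total_quality_sum = 0
    for tile in tiles:
        total_yield += tile["Yield"]
        total_q30 += tile["YieldQ30"]
        total_quality_sum += tile["QualityScoreSum"]
    if total_yield == 0:
        # Avoid dividing by zero when no bases were produced.
        return {"yield": 0, "percent_q30": 0.0, "mean_quality": 0.0}
    return {
        "yield": total_yield,
        "percent_q30": 100.0 * total_q30 / total_yield,
        "mean_quality": total_quality_sum / total_yield,
    }


if __name__ == "__main__":
    # Made-up numbers, used only to show the calculation.
    example = [
        {"Yield": 100, "YieldQ30": 90, "QualityScoreSum": 3600},
        {"Yield": 100, "YieldQ30": 60, "QualityScoreSum": 3000},
    ]
    print(summarize_tiles(example))
```

Summing the raw counts before dividing weights each tile by the number of bases it produced, which differs from averaging per-tile percentages when tiles have unequal yield.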

DemultiplexingStats File
You can locate the DemultiplexingStats.xml file in the directory: /Stats/, or in the directory specified by the --stats-dir option.
The file contains the following information per lane, barcode, sample, and project,
and also for the flow cell:
} Barcode Count
} PerfectBarcode Count
} OneMismatchBarcode Count

AdapterTrimming File
The AdapterTrimming file is a text file that contains a statistical summary of
adapter trimming for the FASTQ file. You can locate the file in the /Stats/ directory, or in the directory specified by the --stats-dir option.
The file contains the following information:
} Lane
} Read
} Project
} Sample ID
} Sample Name
} Sample Number
} TrimmedBases
} PercentageOfBased (being trimmed)
Also, the file contains the fraction of reads with untrimmed bases for each sample, lane,
and read number.

FastqSummaryF1L#
The FastqSummaryF1L#.txt file (the # indicates the lane number) contains the number of
raw and passed filter reads for each sample number and tile.

DemuxSummaryF1L#
The DemuxSummaryF1L#.txt file (the # indicates the lane number) contains the
percentage of each tile that each sample makes up. The file also contains a list of the
1,000 most common unknown barcode sequences.

HTML Report
The HTML reports are generated from data in the DemultiplexingStats.xml and
ConversionStats.xml files. You can locate the reports in the directory: /Reports/html/, or in the directory specified by the --reports-dir
option.
The Flowcell Summary contains the following information:
} Clusters (Raw)
} Clusters (PF)
} Yield (MBases)
NOTE
For HiSeq X, HiSeq 4000, and HiSeq 3000, the number of raw clusters is actually the
number of wells on the flow cell that could potentially be seeded. The value is the same in
all cases.

The Lane Summary provides the following information for each project, sample, and
index sequence specified in the sample sheet:
} Lane #
} Clusters (Raw)
} % of the Lane
} % Perfect Barcode
} % One Mismatch
} Clusters (Filtered)
} Yield
} % PF Clusters
} %Q30 Bases
} Mean Quality Score
The Top Unknown Barcodes table in the HTML report provides the count and sequence
for the 10 most common unmapped barcodes in each lane.

JSON File
The JavaScript Object Notation (JSON) file has the *.json file extension. The
format of the JSON file makes the output data easier to parse. The data in the
JSON file are a combination of all the following files:
} InterOp
} ConversionStats
} DemultiplexingStats
} AdapterTrimming
} FastqSummary and DemuxSummary
} HTML Report

Troubleshooting
} If the bcl2fastq2 Conversion Software fails to complete a run, it could be missing an
input file or have a corrupt file. View the log file for missing or corrupt files. The
exact wording of the file status reported varies depending on the nature of the file
corruption. If the problem is the BCL file, use the --ignore-missing-bcls
option. See BCL Advanced Options.
} If there is a high percentage of reads assigned as undetermined, view the Top
Unknown Barcodes table in the HTML report to check the index sequences.
} If the bcl2fastq2 Conversion Software has problems processing Small RNA samples,
use the --minimum-trim-read-length 20 and --mask-short-adapter-reads 20
command-line options instead of the default settings.

Appendix: Installation Requirements
The bcl2fastq2 Conversion Software requires the following components:

Component                Requirements
Network Infrastructure   1 Gigabit minimum.
Server Infrastructure    Single multiprocessor or multicore computer running Linux.
Analysis Computer        Run the software on Linux operating systems only.
Memory                   32 GB RAM.
Software                 We recommend the RedHat Enterprise Linux 5 platform. The
                         following software is required:
                         • zlib
                         • librt
                         • libpthread
                         The following software is required to build the bcl2fastq2
                         Conversion Software:
                         • gcc 4.7 (with support for C++11)
                         • boost 1.54
                         • CMake 2.8.9
                         • zlib
                         • librt
                         • libpthread

Revision History
Part #      Revision   Date         Description of Change
15051736               July 2015    Updated software requirements (gcc version).
15051736               June 2015    Updated to support bcl2fastq2 v2.17.
15051736               April 2016   • Updated to support bcl2fastq2 v2.18.
                                    • Reformatted the User Guide to Illumina style standards.
                                    • Added JSON file and input files list for MiniSeq.
                                    • Revised BCL2FASTQ options and sample sheet settings.

Technical Assistance
For technical assistance, contact Illumina Technical Support.

Table 16 Illumina General Contact Information
Website   www.illumina.com
Email     techsupport@illumina.com

Table 17 Illumina Customer Support Telephone Numbers
Region          Contact Number    Region            Contact Number
North America   1.800.809.4566    Japan             0800.111.5011
Australia       1.800.775.688     Netherlands       0800.0223859
Austria         0800.296575       New Zealand       0800.451.650
Belgium         0800.81102        Norway            800.16836
China           400.635.9898      Singapore         1.800.579.2745
Denmark         80882346          Spain             900.812168
Finland         0800.918363       Sweden            020790181
France          0800.911850       Switzerland       0800.563118
Germany         0800.180.8994     Taiwan            00806651752
Hong Kong       800960230         United Kingdom    0800.917.0041
Ireland         1.800.812949      Other countries   +44.1799.534000
Italy           800.874909

Safety data sheets (SDSs): Available on the Illumina website at
support.illumina.com/sds.html.
Product documentation: Available for download in PDF from the Illumina website. Go
to support.illumina.com, select a product, then select Documentation & Literature.

Illumina
5200 Illumina Way
San Diego, California 92122 U.S.A.
+1.800.809.ILMN (4566)
+1.858.202.4566 (outside North America)
techsupport@illumina.com
www.illumina.com
