fastqread

Read data from FASTQ file

Syntax

FASTQStruct = fastqread(File)
[Header, Sequence] = fastqread(File)
[Header, Sequence, Qual] = fastqread(File)
fastqread(..., 'Blockread', BlockreadValue, ...)
fastqread(..., 'HeaderOnly', HeaderOnlyValue, ...)
fastqread(..., 'TrimHeaders', TrimHeadersValue, ...)

Description

FASTQStruct = fastqread(File) reads a FASTQ-formatted file and returns the data in a MATLAB® array of structures.

[Header, Sequence] = fastqread(File) returns only the header and sequence data in two separate variables.

[Header, Sequence, Qual] = fastqread(File) returns the header, sequence, and quality data in three separate variables.

fastqread(..., 'PropertyName', PropertyValue, ...) calls fastqread with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Enclose each PropertyName in single quotation marks. Each PropertyName is case insensitive. These property name/property value pairs are as follows:

fastqread(..., 'Blockread', BlockreadValue, ...) reads a single sequence entry or block of sequence entries from a FASTQ-formatted file containing multiple sequences.

fastqread(..., 'HeaderOnly', HeaderOnlyValue, ...) specifies whether to return only the header information.

fastqread(..., 'TrimHeaders', TrimHeadersValue, ...) specifies whether to trim each header at the first white space character.

Input Arguments

File

Either of the following:

  • Character vector or string specifying a file name, or a path and file name, of a FASTQ-formatted file. If you specify only a file name, that file must be on the MATLAB search path or in the MATLAB Current Folder.

  • MATLAB character array that contains the text of a FASTQ-formatted file.

BlockreadValue

Scalar or vector that controls the reading of a single sequence entry or block of sequence entries from a FASTQ-formatted file containing multiple sequences. Enter a scalar N to read the Nth entry in the file. Enter a 1-by-2 vector [M1, M2] to read a block of entries starting at entry M1 and ending at entry M2. To read all remaining entries in the file starting at entry M1, enter a positive value for M1 and enter Inf for M2.
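
The three forms of BlockreadValue can be sketched as follows. This is a minimal illustration that assumes the 50-entry sample file SRR005164_1_50.fastq used in the Examples section is on the MATLAB search path; the variable names are arbitrary.

```matlab
% Assumes SRR005164_1_50.fastq (50 entries) is on the MATLAB search path.
f = 'SRR005164_1_50.fastq';

% Scalar N: read only the 3rd entry.
third = fastqread(f, 'Blockread', 3);

% Vector [M1, M2]: read entries 5 through 10.
block = fastqread(f, 'Blockread', [5 10]);

% Vector [M1, Inf]: read from entry 41 through the end of the file.
tail = fastqread(f, 'Blockread', [41 Inf]);

fprintf('%d, %d, and %d entries read\n', numel(third), numel(block), numel(tail));
```

Reading a file block by block in this way is one option for keeping memory use bounded when a file is too large to load at once.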

HeaderOnlyValue

Specifies whether to return only the header information. Choices are true or false (default).

TrimHeadersValue

Specifies whether to trim the header after the first white space character. White space characters include a space (char(32)) and a tab (char(9)). Choices are true or false (default).
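
A minimal sketch of the effect of TrimHeaders. It writes a hypothetical two-entry FASTQ file to a temporary location (the read names and the sample=A annotations are made up for illustration) and compares the headers returned with and without trimming.

```matlab
% Write a small, made-up FASTQ file to a temporary location.
fname = [tempname '.fastq'];
fid = fopen(fname, 'w');
fprintf(fid, '@read1 sample=A\nACGT\n+read1 sample=A\nII5I\n');
fprintf(fid, '@read2 sample=A\nGGCA\n+read2 sample=A\nIIII\n');
fclose(fid);

full    = fastqread(fname);                       % default: headers kept whole
trimmed = fastqread(fname, 'TrimHeaders', true);  % cut at the first white space

disp(full(1).Header)      % whole header text, including the annotation
disp(trimmed(1).Header)   % only the text before the first space

delete(fname);            % remove the temporary file
```

Trimming can be convenient when the text before the first space is a unique read identifier and the remainder is free-form annotation.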

Output Arguments

FASTQStruct

Array of structures containing information from a FASTQ-formatted file. There is one structure for each sequence read or entry in the file. Each structure contains the following fields.

Field       Description
Header      Header information.
Sequence    Single letter-code representation of a nucleotide sequence.
Quality     ASCII representation of per-base quality scores for a nucleotide sequence.
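
The Quality field holds characters rather than numbers, so a common next step is to convert it to numeric scores. The sketch below assumes Phred+33 (Sanger) encoding, in which a score is the ASCII code minus 33. Files that use a different offset (for example, older Solexa/Illumina files with offset 64) need a different constant. The file contents are made up for illustration.

```matlab
% Write a one-entry, made-up FASTQ file to a temporary location.
fname = [tempname '.fastq'];
fid = fopen(fname, 'w');
fprintf(fid, '@read1\nACGT\n+read1\nII5I\n');
fclose(fid);

reads = fastqread(fname);

% Assumption: Phred+33 encoding, so score = ASCII code - 33.
scores = double(reads(1).Quality) - 33;
disp(scores)    % one numeric score per base: 40 40 20 40

delete(fname);  % remove the temporary file
```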

Header

Variable containing header information or, if the FASTQ-formatted file contains multiple sequences, a cell array containing header information.

Sequence

Variable containing sequence information or, if the FASTQ-formatted file contains multiple sequences, a cell array containing sequence information.

Qual

Variable containing quality information or, if the FASTQ-formatted file contains multiple sequences, a cell array containing quality information.
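
The difference between the single-entry and multiple-entry cases can be checked with a short sketch. It assumes two hypothetical files written to a temporary location, one with a single entry and one with two entries; the read names and sequences are made up.

```matlab
% Made-up file with a single entry.
oneFile = [tempname '.fastq'];
fid = fopen(oneFile, 'w');
fprintf(fid, '@read1\nACGT\n+read1\nIIII\n');
fclose(fid);

% Made-up file with two entries.
twoFile = [tempname '.fastq'];
fid = fopen(twoFile, 'w');
fprintf(fid, '@read1\nACGT\n+read1\nIIII\n');
fprintf(fid, '@read2\nGGCATT\n+read2\nIIIIII\n');
fclose(fid);

[h1, s1, q1] = fastqread(oneFile);   % single entry
[h2, s2, q2] = fastqread(twoFile);   % multiple entries

fprintf('single entry: %s, multiple entries: %s\n', class(s1), class(s2));

% With cell arrays, per-read operations go through cellfun.
readLengths = cellfun(@length, s2);
disp(readLengths)

delete(oneFile);
delete(twoFile);
```

Code that must handle both cases can wrap a non-cell result with cellstr before processing, so that downstream steps always see a cell array.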

Examples

Read a FASTQ file into an array of structures:

% Read the contents of a FASTQ-formatted file into
% an array of structures
reads = fastqread('SRR005164_1_50.fastq')

reads = 

1x50 struct array with fields:
    Header
    Sequence
    Quality

Read a FASTQ file into three separate variables:

% Read the contents of a FASTQ-formatted file into 
% three separate variables
[headers,seqs,quals] = fastqread('SRR005164_1_50.fastq');

Read a block of entries from a FASTQ file:

% Read the contents of reads 5 through 10 into
% an array of structures
reads_5_10 = fastqread('SRR005164_1_50.fastq', 'blockread', [5 10])

reads_5_10 = 

1x6 struct array with fields:
    Header
    Sequence
    Quality

More About

FASTQ-file Format

A FASTQ-formatted file stores the nucleotide sequence and quality information for each entry on four lines:

  • Line 1 — Header information prefixed with an @ symbol

  • Line 2 — Nucleotide sequence

  • Line 3 — Header information prefixed with a + symbol

  • Line 4 — ASCII representation of per-base quality scores for the nucleotide sequence using Phred or Solexa encoding
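
The four-line layout above can be illustrated by writing one entry by hand and reading it back. This is a sketch with made-up contents (the read name read1, the sequence, and the quality characters are hypothetical), written to a temporary file.

```matlab
fname = [tempname '.fastq'];
fid = fopen(fname, 'w');
fprintf(fid, '@read1\n');   % line 1: header prefixed with @
fprintf(fid, 'ACGT\n');     % line 2: nucleotide sequence
fprintf(fid, '+read1\n');   % line 3: header prefixed with +
fprintf(fid, 'II5I\n');     % line 4: per-base quality characters
fclose(fid);

entry = fastqread(fname);   % 1-by-1 structure for the single entry
disp(entry)

delete(fname);              % remove the temporary file
```

In the structure that is returned, the Header field does not include the leading @ symbol, and the Sequence and Quality fields have the same length, with one quality character per base.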

Version History

Introduced in R2009b