SRASAMDumpOptions

Option set for srasamdump

Since R2024a

Description

An SRASAMDumpOptions object contains options for the srasamdump function, which downloads data from the Sequence Read Archive (SRA) [1] in SAM format.

Creation

Description

sraOpt = SRASAMDumpOptions creates an SRASAMDumpOptions object with default property values.

SRASAMDumpOptions requires the SRA Toolkit for Bioinformatics Toolbox™. If this support package is not installed, then the function provides a download link. For details, see Bioinformatics Toolbox Software Support Packages.

sraOpt = SRASAMDumpOptions(Name=Value) sets the object properties using one or more name-value arguments. For example, when creating the SRASAMDumpOptions object, specify BZip2=true to set the value of the BZip2 property to true, so that the output files are compressed using bzip2.

sraOpt = SRASAMDumpOptions(S) specifies optional parameters using a string scalar or character vector S.

Input Arguments

S

srasamdump options, specified as a character vector or string scalar. S must be in the original sam-dump option syntax (prefixed by one or two dashes).

Example: '--aligned-region chr20:2500000-2600000'

Data Types: char | string
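
As a minimal sketch of the string-based syntax, the following passes the example option string above to the constructor and then calls getCommand to inspect how the object translates back to the native syntax. The region is only the illustrative value from this page, and the returned text is not shown here because it may depend on the installed toolkit version.

```matlab
% Create an options object from a native sam-dump option string.
% The region below is the illustrative value used on this page.
sraOpt = SRASAMDumpOptions("--aligned-region chr20:2500000-2600000");

% Translate the object back to the native option syntax for inspection.
cmd = getCommand(sraOpt)
```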

Properties

BZip2

Flag to compress the output files using bzip2, specified as a numeric or logical 1 (true) or 0 (false).

Data Types: double | logical

ExtraCommand

Additional commands, specified as a character vector or string scalar.

The commands must be in the native syntax (prefixed by one or two dashes). Use this option to apply undocumented flags and flags without corresponding MATLAB® properties.

Example: ExtraCommand="--aligned-region chr20:2500000-2600000"

Data Types: char | string
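
A short sketch of how ExtraCommand might be combined with a regular property: GZip is set through its MATLAB property, while the region filter, which has no corresponding property on this page, is passed in native syntax. The region value is again just the illustrative one from the example above.

```matlab
% Set one option through its MATLAB property and pass another,
% which has no dedicated property, in native sam-dump syntax.
opt = SRASAMDumpOptions(GZip=true, ...
    ExtraCommand="--aligned-region chr20:2500000-2600000");

% Display the stored native-syntax string.
opt.ExtraCommand
```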

FastaOutput

Flag to produce FASTA-formatted output files, specified as a numeric or logical 1 (true) or 0 (false).

Data Types: double | logical

FastqOutput

Flag to produce FASTQ-formatted output files, specified as a numeric or logical 1 (true) or 0 (false).

Data Types: double | logical

GZip

Flag to compress the output files using gzip, specified as a numeric or logical 1 (true) or 0 (false).

Data Types: double | logical

HideIdentical

Flag to use '=' in the output if a base is identical to the reference, specified as a numeric or logical 1 (true) or 0 (false).

Data Types: double | logical

IncludeAll

Flag to include all object properties with corresponding default values when converting properties to the original option syntax, specified as a numeric or logical 1 (true) or 0 (false). You can convert properties to the original syntax prefixed by one or two dashes (such as '--aligned-region chr20:2500000-2600000') by using the getCommand function.

When IncludeAll=false and you call getCommand(optionsObject), the software converts only the specified properties. If the value is true, getCommand converts all available properties, using default values for unspecified properties, to the original syntax.

Note

Even when IncludeAll is true, the software does not translate a property whose default value is NaN, Inf, [], '', or "".

Data Types: logical
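
The effect of IncludeAll can be sketched by calling getCommand twice on the same object, first with the default setting and then with IncludeAll set to true. The MinMapQuality value of 20 is an arbitrary choice for illustration, and the returned strings are left for you to inspect rather than reproduced here.

```matlab
% Specify a single property; 20 is an arbitrary illustrative threshold.
opt = SRASAMDumpOptions(MinMapQuality=20);

% With IncludeAll at its default (false), only specified properties
% are converted to the native syntax.
cmdSpecified = getCommand(opt)

% With IncludeAll set to true, properties left at their defaults are
% converted as well (except those with empty, NaN, or Inf defaults).
opt.IncludeAll = true;
cmdAll = getCommand(opt)
```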

MinMapQuality

Minimum mapping quality required for an alignment to be included in the output, specified as a nonnegative scalar.

Data Types: double

OutputFileName

Output file name, specified as a character vector or string scalar.

Data Types: char | string

OutputPrimary

Flag to output primary alignments only, specified as a numeric or logical 1 (true) or 0 (false).

Data Types: double | logical

OutputUnaligned

Flag to output the unaligned reads with the aligned reads, specified as a numeric or logical 1 (true) or 0 (false).

Data Types: double | logical

Version

This property is read-only.

Supported version of the original sam-dump software, returned as a string scalar.

Data Types: string

Object Functions

getCommand: Translate object properties to original options syntax
getOptionsTable: Return table with all properties and equivalent options in original syntax
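
As a brief sketch of the second function, getOptionsTable can be called on an options object to list each property next to its equivalent native option. The choice of OutputPrimary=true is arbitrary, and the table contents are not reproduced here.

```matlab
% Create an options object with one modified property.
opt = SRASAMDumpOptions(OutputPrimary=true);

% List all properties alongside their native-syntax equivalents.
tbl = getOptionsTable(opt)
```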

Examples

Download paired-end sequencing data in FASTQ format using the run accession number SRR11846824, which has two reads per spot and no unaligned reads. Downloading the data may take a few minutes.

tbl = srafasterqdump("SRR11846824")
tbl=1×2 table
                          Reads_1                  Reads_2       
                   _____________________    _____________________

    SRR11846824    "SRR11846824_1.fastq"    "SRR11846824_2.fastq"

By default, the function uses the SplitType="SplitThree" option and downloads only biological reads. Specifically, the function splits spots into reads. For spots having two reads, the function produces *_1.fastq and *_2.fastq, represented by the Reads_1 and Reads_2 columns. If there are any unaligned reads, the function saves unaligned reads in a *.fastq file, which would be represented by the Reads column. Because there are no unaligned reads within this accession, the function did not produce a *.fastq file, and the output table has no Reads column. For details, see SplitType.

You can also specify other download options using SRAFasterqDumpOptions. For instance, use FastaOutput=true to get the FASTA-formatted file.

sraopt = SRAFasterqDumpOptions;
sraopt.FastaOutput = true;
tbl2 = srafasterqdump("SRR11846824",sraopt);

Alternatively, you can specify the options as name-value arguments instead of using the options object.

tbl2 = srafasterqdump("SRR11846824",FastaOutput=true);

You can also download the data in SAM format using srasamdump.

samFile = srasamdump("SRR11846824")
samFile = 
"SRR11846824.sam"

Specify the download options using an SRASAMDumpOptions object. For instance, specify the output file name and compress the output file using bzip2.

samdumpopt = SRASAMDumpOptions;
samdumpopt.OutputFileName = "SRR11846824.sam.bz2";
samdumpopt.BZip2 = true
samdumpopt = 
  SRASAMDumpOptions with properties:

   Default properties:
       ExtraCommand: ""
        FastaOutput: 0
        FastqOutput: 0
               GZip: 0
      HideIdentical: 0
         IncludeAll: 0
      MinMapQuality: 0
      OutputPrimary: 0
    OutputUnaligned: 0
            Version: "3.0.6"

   Modified properties:
     OutputFileName: "SRR11846824.sam.bz2"
              BZip2: 1

bzFile = srasamdump("SRR11846824",samdumpopt)
bzFile = 
"SRR11846824.sam.bz2"

After downloading the data, you can use it for downstream analyses. For instance, you can use bowtie2 to map the downloaded FASTQ reads to a reference sequence.

First, download the C. elegans reference sequence.

celegans_refseq = fastaread("https://s3.amazonaws.com/igv.broadinstitute.org/genomes/seq/ce11/ce11.fa");

Save Chromosome 3 reference data in a FASTA file.

celegans_chr3   = celegans_refseq(3).Sequence;
warnState = warning;
warning('off','Bioinfo:fastawrite:AppendToFile'); 
fastawrite("celegans_chr3.fa",celegans_chr3);
warning(warnState);

Build a set of index files using bowtie2build. A status value of 0 means that the build was successful.

status = bowtie2build("celegans_chr3.fa","celegans_chr3_index");

Align read data to the reference. This may take a few minutes.

bowtie2("celegans_chr3_index","SRR11846824_1.fastq","SRR11846824_2.fastq","SRR11846824_mapped.sam");

Create a quality control plot for the SAM file. Note that, for this particular experiment, most of the reads happen to have the same quality score of 30.

seqqcplot("SRR11846824_mapped.sam");

Figure: seqqcplot output for SRR11846824_mapped.sam, with five panels: Quality Boxplot (Quality Score vs. Base Position), Base Composition (Reads (%) vs. Base Position for A, C, G, T, and Other), Quality Distribution (Reads (%) vs. Average Quality), GC Distribution (Reads (%) vs. % GC-Content), and Length Distribution (Reads (%) vs. Length).

Convert the SAM file to a BAM file. Suppress two informational warnings that are issued while creating a BioMap object.

w = warning;
warning("off","bioinfo:BioMap:BioMap:UnsortedReadsInSAMFile");
warning("off","bioinfo:saminfo:InvalidTagField");
bmObj = BioMap("SRR11846824_mapped.sam");
write(bmObj,"SRR11846824_mapped.bam",Format="BAM");
warning(w);

Visualize the alignment data in the Genomics Viewer app. The corresponding cytoband file is provided with the toolbox.

gv = genomicsViewer(ReferenceFile="celegans_chr3.fa",CytoBand="celegans_cytoBandIdeo.txt.gz");
addTracks(gv,"SRR11846824_mapped.bam");

Use the zoom slider to zoom in and see the features. Or you can enter the following in the search text box: Generated:3,711,861-3,711,940.

You may delete the downloaded files, such as the reference sequence file.

delete celegans_chr3.fa

Close the app.

close(gv);

References

Version History

Introduced in R2024a