
saminfo

Return information about SAM file

Syntax

InfoStruct = saminfo(File)
InfoStruct = saminfo(File,Name,Value)

Description

InfoStruct = saminfo(File) returns a MATLAB® structure containing summary information about a SAM-formatted file.

InfoStruct = saminfo(File,Name,Value) returns the structure using additional options specified by one or more name-value arguments.

Input Arguments

File

Character vector or string specifying a file name or path and file name of a SAM-formatted file. If you specify only a file name, that file must be on the MATLAB search path or in the Current Folder.

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

NumOfReads

Logical that controls whether saminfo counts the sequence reads in the file and populates the NumReads field of InfoStruct, the output structure. If false, the NumReads field is empty.

Note

Setting NumOfReads to true can significantly increase the time to create the output structure.

Default: false

ScanDictionary

Logical that controls the scanning of the SAM-formatted file to determine the reference names and the number of reads aligned to each reference. If true, the ScannedDictionary and ScannedDictionaryCount fields contain this information.

Default: false
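
As a brief sketch of the ScanDictionary option, the following call scans ex1.sam (the sample file used in the Examples section) and prints each reference name next to its read count. The variable names are illustrative. The field table describes ScannedDictionaryCount as a cell array, while the example output displays it as uint64, so the sketch accepts either form and assumes one count per reference name.

```matlab
% Scan the file so the reference names and per-reference read counts are filled in.
info = saminfo('ex1.sam','ScanDictionary',true);

% Accept either a numeric vector or a cell array of counts (assumption:
% one count per entry of ScannedDictionary).
counts = info.ScannedDictionaryCount;
if iscell(counts)
    counts = cell2mat(counts);
end

% Print each reference name with the number of reads aligned to it.
for k = 1:numel(info.ScannedDictionary)
    fprintf('%s: %d reads\n', info.ScannedDictionary{k}, counts(k));
end
```

Scanning requires a pass over the file, so this call can take noticeably longer than the default call on large files.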

Output Arguments

InfoStruct

MATLAB structure containing summary information about a SAM-formatted file. The structure contains these fields.

Field                      Description
Filename                   Name of the SAM-formatted file.
FilePath                   Path to the file.
FileSize                   Size of the file in bytes.
FileModDate                Modification date of the file.
NumReads*                  Number of sequence reads in the file.
ScannedDictionary*         Cell array of character vectors specifying the names of the reference sequences in the SAM-formatted file.
ScannedDictionaryCount*    Cell array specifying the number of reads aligned to each reference sequence.
Header**                   Structure containing the file format version, sort order, and group order.
SequenceDictionary**

Structure containing the:

  • Sequence name

  • Sequence length

  • Genome assembly identifier

  • MD5 checksum of sequence

  • URI of sequence

  • Species

ReadGroup**

Structure containing the:

  • Read group identifier

  • Sample

  • Library

  • Description

  • Platform unit

  • Predicted median insert size

  • Sequencing center

  • Date

  • Platform

Program**

Structure containing the:

  • Program name

  • Version

  • Command line

* — The NumReads field is empty if you do not set the NumOfReads name-value pair argument to true. The ScannedDictionary and ScannedDictionaryCount fields are empty if you do not set the ScanDictionary name-value pair argument to true.

** — These structures and their fields appear in the output structure only if they are in the SAM file. The information in these structures depends on the information in the SAM file.
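
Because some fields are optional and others can be empty, code that consumes InfoStruct may want to check before accessing them. The following is a minimal sketch of such a check, assuming ex1.sam (the sample file used in the Examples section) is available; the variable names are illustrative.

```matlab
info = saminfo('ex1.sam');

% Header-derived structures exist only if the SAM file contains them,
% so test with isfield before accessing them.
optionalFields = {'Header','SequenceDictionary','ReadGroup','Program'};
for k = 1:numel(optionalFields)
    name = optionalFields{k};
    if isfield(info,name)
        fprintf('%s: present (%d element(s))\n', name, numel(info.(name)));
    else
        fprintf('%s: not present in this file\n', name);
    end
end

% NumReads is empty unless NumOfReads was set to true.
if isempty(info.NumReads)
    disp('Read count not computed; call saminfo with NumOfReads set to true.');
end
```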

Examples

Return information about the ex1.sam file included with Bioinformatics Toolbox™:

info = saminfo('ex1.sam')
info = 

                  Filename: 'ex1.sam'
                  FilePath: [1x89 char]
                  FileSize: 254270
               FileModDate: '12-May-2011 14:23:25'
                    Header: [1x1 struct]
        SequenceDictionary: [1x1 struct]
                 ReadGroup: [1x2 struct]
                  NumReads: []
         ScannedDictionary: {0x1 cell}
    ScannedDictionaryCount: [0x1 uint64]

Return information about the ex1.sam file including the number of sequence reads:

info = saminfo('ex1.sam','NumOfReads',true)
info = 

                  Filename: 'ex1.sam'
                  FilePath: [1x89 char]
                  FileSize: 254270
               FileModDate: '12-May-2011 14:23:25'
                    Header: [1x1 struct]
        SequenceDictionary: [1x1 struct]
                 ReadGroup: [1x2 struct]
                  NumReads: 1501
         ScannedDictionary: {0x1 cell}
    ScannedDictionaryCount: [0x1 uint64]

Tips

Use saminfo to investigate the size and content of a SAM file before using the samread function to read the file contents into a MATLAB structure.
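
One way to apply this tip is sketched below: query the file with saminfo, then decide how much to read with samread. The maxReads threshold is a hypothetical value chosen only for illustration, and the sketch assumes ex1.sam (the sample file used in the Examples section) is available.

```matlab
file = 'ex1.sam';

% Inspect the file first. Counting reads requires a pass over the file.
info = saminfo(file,'NumOfReads',true);
fprintf('%s: %d bytes, %d reads\n', info.Filename, info.FileSize, info.NumReads);

% Hypothetical threshold, for illustration only.
maxReads = 1000;

if info.NumReads <= maxReads
    % Small enough: read every entry.
    data = samread(file);
else
    % Otherwise read only the first maxReads entries.
    data = samread(file,'BlockRead',[1 maxReads]);
end
```

Checking FileSize alone avoids the extra pass that NumOfReads requires, at the cost of a less precise estimate of how many entries samread will return.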

Version History

Introduced in R2010a