
baminfo

Return information about BAM file

    Description


    InfoStruct = baminfo(File) returns a MATLAB® structure containing summary information about a BAM-formatted file.

    Use baminfo to investigate the size and content of a BAM-formatted file, including reference sequence names, before using the bamread function to read the file contents into a MATLAB structure.
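    The workflow above can be sketched as follows. This is a minimal sketch that assumes the ex1.bam sample file shipped with the Bioinformatics Toolbox and the three-argument form bamread(File,RefSeq,Range). The position range [100 200] is arbitrary and chosen only for illustration.

```matlab
% Inspect the file first: scan it for reference names.
info = baminfo('ex1.bam', 'ScanDictionary', true);
refNames = info.ScannedDictionary;        % cell array of reference names
fprintf('%d reference(s) found\n', numel(refNames));

% Then read only the alignments for the first reference that fall
% within an illustrative position range (100 to 200).
data = bamread('ex1.bam', refNames{1}, [100 200]);
fprintf('%d alignment(s) read from %s\n', numel(data), refNames{1});
```

    Checking the reference names first lets you request a specific reference and range instead of guessing names that might not exist in the file.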


    InfoStruct = baminfo(File,Name,Value) specifies additional options using one or more name-value arguments. For example, to return the number of alignment records, use InfoStruct = baminfo(File,NumOfReads=true).

    Examples


    This example shows how to retrieve information about the ex1.bam file included with the Bioinformatics Toolbox™.

    info = baminfo('ex1.bam','ScanDictionary',true,'numofreads',true)
    info = struct with fields:
                      Filename: 'ex1.bam'
                      FilePath: '/mathworks/devel/bat/Bdoc23b/build/matlab/toolbox/bioinfo/bioinfodata'
                      FileSize: 126692
                   FileModDate: '07-May-2010 16:12:05'
                        Header: [1x1 struct]
                     ReadGroup: [1x2 struct]
            SequenceDictionary: [1x2 struct]
                      NumReads: 3307
             ScannedDictionary: {2x1 cell}
        ScannedDictionaryCount: [2x1 uint64]
    
    

    Count the references found by scanning the BAM file.

    numel(info.ScannedDictionary)
    ans = 2
    

    Alternatively, you can use the header information in a BAM file to find the number of references, which avoids traversing the entire file.

    info = baminfo('ex1.bam'); 
    NRefs = numel(info.SequenceDictionary)
    NRefs = 2
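    To compare the two approaches side by side, you can count the references both ways. This sketch assumes the ex1.bam sample file. Because SequenceDictionary appears only when the BAM header defines it, the code checks for the field before using it.

```matlab
% Header-only call: does not traverse the alignment records.
hdrInfo = baminfo('ex1.bam');
if isfield(hdrInfo, 'SequenceDictionary')
    nHeader = numel(hdrInfo.SequenceDictionary);
else
    nHeader = NaN;   % header has no @SQ (reference sequence) lines
end

% Scanning call: traverses the file to collect reference names.
scanInfo = baminfo('ex1.bam', 'ScanDictionary', true);
nScanned = numel(scanInfo.ScannedDictionary);

fprintf('Header: %d, scanned: %d\n', nHeader, nScanned);
```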
    

    Input Arguments


    File

    Path to BAM-formatted file, specified as a character vector or string containing a file name or a path and file name. If you specify only a file name, that file must be on the MATLAB search path or in the current folder.

    Example: "C:\Documents\bamfile.bam"

    Data Types: char | string
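    When you pass only a file name, you can check in advance which file MATLAB will use. This is a minimal sketch that assumes ex1.bam is on the search path; it uses the standard which function to resolve the full path and then passes that path to baminfo.

```matlab
% Resolve the full path MATLAB would use for a bare file name.
% which returns '' if the file is not in the current folder or on the path.
fullPath = which('ex1.bam');
if isempty(fullPath)
    error('ex1.bam is not in the current folder or on the MATLAB path.');
end

% Passing the full path works regardless of the current folder.
info = baminfo(fullPath);
disp(info.Filename)
```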

    Name-Value Arguments

    Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

    Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

    Example: InfoStruct = baminfo("ex1.bam",ScanDictionary=true,NumOfReads=true)

    Names are case-insensitive. For example, you can use "numofreads" instead of "NumOfReads".
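    The following sketch shows that both name-value syntaxes and the lowercase name request the same option. It assumes the ex1.bam sample file and R2021a or later, which the Name=Value form requires.

```matlab
% R2021a and later: Name=Value syntax.
a = baminfo("ex1.bam", NumOfReads=true);

% Any release: comma-separated pair with the name in quotes,
% written here in lowercase because names are case-insensitive.
b = baminfo("ex1.bam", "numofreads", true);

% Both calls count the alignment records of the same file.
isequal(a.NumReads, b.NumReads)
```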

    NumOfReads

    Flag indicating whether to determine the number of alignment records in the file, specified as false (do not determine) or true. If true, the NumReads field of InfoStruct contains this information.

    Example: true

    Data Types: logical

    ScanDictionary

    Flag indicating whether to determine the reference names and the number of reads aligned to each reference, specified as false (do not determine) or true. If true, the ScannedDictionary and ScannedDictionaryCount fields of InfoStruct contain this information.

    Example: true

    Data Types: logical
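    The two scanned fields line up element by element, so you can pair them into a table. This sketch assumes the ex1.bam sample file; the table variable names Reference and AlignedReads are illustrative choices, not names defined by baminfo.

```matlab
info = baminfo('ex1.bam', 'ScanDictionary', true);

% Pair each scanned reference name with its aligned-read count.
% ScannedDictionary is a cell array; ScannedDictionaryCount is numeric (uint64).
counts = table(info.ScannedDictionary, info.ScannedDictionaryCount, ...
    'VariableNames', {'Reference', 'AlignedReads'})
```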

    Output Arguments


    Summary information about a BAM-formatted file, returned as a MATLAB structure. The structure contains these fields.

    Field                     Description
    Filename                  Name of the BAM-formatted file.
    FilePath                  Path to the file.
    FileSize                  Size of the file in bytes.
    FileModDate               Modification date of the file.
    Header**                  Structure containing the file format version, sort order, and group order.
    ReadGroup**               Structure containing:
                              • Read group identifier
                              • Sample
                              • Library
                              • Description
                              • Platform unit
                              • Predicted median insert size
                              • Sequencing center
                              • Date
                              • Platform
    SequenceDictionary**      Structure containing:
                              • Sequence name
                              • Sequence length
                              • Genome assembly identifier
                              • MD5 checksum of sequence
                              • URI of sequence
                              • Species
    Program**                 Structure containing:
                              • Program name
                              • Version
                              • Command line
    NumReads                  Number of alignment records (reads) in the BAM-formatted file, determined when NumOfReads is true.
    ScannedDictionary*        Cell array of character vectors specifying the names of the reference sequences in the BAM-formatted file.
    ScannedDictionaryCount*   Array of uint64 values specifying the number of reads aligned to each reference sequence.

    * — The ScannedDictionary and ScannedDictionaryCount fields are empty if you do not set the ScanDictionary name-value pair argument to true.

    ** — These structures and their fields appear in the output structure only if they are in the BAM file. The information in these structures depends on the information in the BAM file.
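    Because the fields marked ** are present only when the BAM file defines them, code that uses them should test for them first. This is a minimal sketch, assuming the ex1.bam sample file, that uses the standard isfield function.

```matlab
info = baminfo('ex1.bam');

% Fields marked ** exist only if the BAM header defines them.
optional = {'Header', 'ReadGroup', 'SequenceDictionary', 'Program'};
for k = 1:numel(optional)
    name = optional{k};
    if isfield(info, name)
        fprintf('%s: present (%d element(s))\n', name, numel(info.(name)));
    else
        fprintf('%s: not in this file\n', name);
    end
end
```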

    Version History

    Introduced in R2010b