getIndex

Class: BioMap

Return indices of read sequences aligned to reference sequence in BioMap object

Syntax

Indices = getIndex(BioObj, StartPos, EndPos)
Indices = getIndex(BioObj, StartPos, EndPos, R)
Indices = getIndex(..., Name,Value)

Description

Indices = getIndex(BioObj, StartPos, EndPos) returns Indices, a column vector of indices specifying the read sequences that align to a range or set of ranges in the reference sequence in BioObj, a BioMap object. StartPos and EndPos define the range or set of ranges. They can be two nonnegative integers such that StartPos is less than EndPos and both are smaller than the length of the reference sequence, or two column vectors representing a set of ranges (overlapping or segmented).

getIndex includes each read only once. Therefore, if a read spans multiple ranges, the index for that read appears only once.
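The once-only behavior can be checked with two overlapping ranges. The following is a minimal sketch, assuming the sample file ex1.sam used in the Examples section is on the MATLAB path; the ranges 1:50 and 30:80 are arbitrary illustrative values, not taken from the original page.

```matlab
% Construct a BioMap object from the sample SAM file (assumed to be on the path)
BMObj1 = BioMap('ex1.sam');
% Query two overlapping ranges, 1:50 and 30:80 (illustrative values)
indices = getIndex(BMObj1, [1;30], [50;80]);
% A read that spans both ranges should still be listed only once,
% so the number of unique indices should equal the number of indices
noRepeats = numel(unique(indices)) == numel(indices)
```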

Indices = getIndex(BioObj, StartPos, EndPos, R) selects the reference associated with the range specified by StartPos and EndPos.

Indices = getIndex(..., Name,Value) returns indices with additional options specified by one or more Name,Value pair arguments.

Input Arguments

BioObj

Object of the BioMap class.

StartPos

Either of the following:

  • Nonnegative integer that defines the start of a range in the reference sequence. StartPos must be less than EndPos, and smaller than the total length of the reference sequence.

  • Column vector of nonnegative integers, each defining the start of a range in the reference sequence.

EndPos

Either of the following:

  • Nonnegative integer that defines the end of a range in the reference sequence. EndPos must be greater than StartPos, and smaller than the total length of the reference sequence.

  • Column vector of nonnegative integers, each defining the end of a range in the reference sequence.

R

Positive integer indexing the SequenceDictionary property of BioObj, or a character vector or string specifying the actual name of the reference.
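As a hedged illustration of the R argument, the sketch below selects a reference by its position in SequenceDictionary. It assumes ex1.sam is on the MATLAB path and that the object holds at least one reference; the choice of index 1 is illustrative.

```matlab
% Construct a BioMap object from the sample SAM file (assumed to be on the path)
BMObj1 = BioMap('ex1.sam');
% Display the reference names stored in the object
refNames = BMObj1.SequenceDictionary
% Select the first reference by its position in SequenceDictionary
% and return the reads that overlap positions 1 through 50 of it
indices = getIndex(BMObj1, 1, 50, 1);
```

A reference name taken from SequenceDictionary can be passed in place of the numeric index.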

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.
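The two calling conventions described above can be compared side by side. This is a sketch that assumes ex1.sam is on the MATLAB path and that the first call runs in R2021a or later.

```matlab
% Construct a BioMap object from the sample SAM file (assumed to be on the path)
BMObj1 = BioMap('ex1.sam');
% R2021a and later: Name=Value syntax
idxNew = getIndex(BMObj1, 1, 50, Overlap='full');
% Any release: quoted name followed by a comma and the value
idxOld = getIndex(BMObj1, 1, 50, 'Overlap', 'full');
% Both forms pass the same option, so the results should match
sameResult = isequal(idxNew, idxOld)
```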

Overlap

Minimum number of base positions by which a read must overlap a range or set of ranges to be included. This value can be any of the following:

  • Positive integer

  • 'full' — A read must be fully contained in a range or set of ranges to be counted.

  • 'start' — A read's start position must lie within a range or set of ranges to be counted.

Default: 1
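To see how the Overlap settings change the selection, the following sketch queries the same range with each kind of value. It assumes ex1.sam is on the MATLAB path; the integer threshold of 10 is an illustrative choice.

```matlab
% Construct a BioMap object from the sample SAM file (assumed to be on the path)
BMObj1 = BioMap('ex1.sam');
% Default: a read qualifies if it overlaps the range by at least 1 base
idxAny = getIndex(BMObj1, 1, 50);
% A read qualifies only if it overlaps the range by at least 10 bases
idxTen = getIndex(BMObj1, 1, 50, 'Overlap', 10);
% A read qualifies only if its start position lies within the range
idxStart = getIndex(BMObj1, 1, 50, 'Overlap', 'start');
% A read qualifies only if it is fully contained in the range
idxFull = getIndex(BMObj1, 1, 50, 'Overlap', 'full');
% Compare how many reads each setting selects
counts = [numel(idxAny) numel(idxTen) numel(idxStart) numel(idxFull)]
```

Each stricter setting should select a subset of the reads returned by the default.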

Depth

Positive integer specifying the maximum coverage depth. getIndex decimates the output indices so that the coverage depth at any base position is less than or equal to Depth.

Default: Inf
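The effect of Depth can be sketched by comparing a capped query with an uncapped one. This example assumes ex1.sam is on the MATLAB path; the range 1:100 and the depth limit of 5 are illustrative values.

```matlab
% Construct a BioMap object from the sample SAM file (assumed to be on the path)
BMObj1 = BioMap('ex1.sam');
% All reads that overlap positions 1 through 100 (Depth defaults to Inf)
idxAll = getIndex(BMObj1, 1, 100);
% Decimated selection: coverage at any base position is at most 5
idxCapped = getIndex(BMObj1, 1, 100, 'Depth', 5);
% Compare the number of reads before and after decimation
counts = [numel(idxAll) numel(idxCapped)]
```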

Spliced

Logical specifying whether short reads are spliced during mapping (as in mRNA-to-genome mapping). N symbols in the Signature property of the object are not counted.

Default: false

Output Arguments

Indices

Column vector of indices specifying the reads that align to a range or set of ranges in the specified reference sequence in BioObj, a BioMap object.

Examples

Construct a BioMap object, and then use the indices of the reads to retrieve the start and stop positions for the reads that are fully contained in the first 50 positions of the reference sequence:

% Construct a BioMap object from a SAM file 
BMObj1 = BioMap('ex1.sam');
% Return the indices of reads that are fully contained in the
% first 50 positions of the reference sequence
indices = getIndex(BMObj1, 1, 50, 'overlap', 'full');
% Use these indices to return the start and stop positions of
% the reads 
starts = getStart(BMObj1, indices)
stops = getStop(BMObj1, indices)
starts =

           1
           3
           5
           6
           9
          13
          13
          15

stops =

          36
          37
          39
          41
          43
          47
          48
          49

Construct a BioMap object, and then use the indices of the reads to retrieve the sequences for the reads whose alignments overlap a segmented range by at least one base pair:

% Construct a BioMap object from a SAM file 
BMObj1 = BioMap('ex1.sam');
% Return the indices of the reads that overlap the
% segmented range 98:100 and 198:200, by at least 1 base pair
indices = getIndex(BMObj1, [98;198], [100;200], 'overlap', 1);
% Use these indices to return the sequences of the reads
sequences = getSequence(BMObj1, indices);

Tips

Use the Indices output from the getIndex method as input to other BioMap methods. Doing so lets you retrieve other information about the reads in the range, such as headers, start positions, mapping qualities, and sequences.
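A minimal sketch of this pattern, assuming ex1.sam is on the MATLAB path, passes one set of indices to several BioMap methods:

```matlab
% Construct a BioMap object from the sample SAM file (assumed to be on the path)
BMObj1 = BioMap('ex1.sam');
% Select the reads fully contained in positions 1 through 50
indices = getIndex(BMObj1, 1, 50, 'Overlap', 'full');
% Reuse the same indices to retrieve different properties of those reads
headers = getHeader(BMObj1, indices);           % read headers
mapQ    = getMappingQuality(BMObj1, indices);   % mapping quality scores
seqs    = getSequence(BMObj1, indices);         % read sequences
```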