seqlogo

Display sequence logo for nucleotide or amino acid sequences

Syntax

seqlogo(Seqs)
seqlogo(Profile)
WgtMatrix = seqlogo(...)
[WgtMatrix, Handle] = seqlogo(...)
seqlogo(..., 'Displaylogo', DisplaylogoValue, ...)
seqlogo(..., 'Alphabet', AlphabetValue, ...)
seqlogo(..., 'Startat', StartatValue, ...)
seqlogo(..., 'Endat', EndatValue, ...)
seqlogo(..., 'SSCorrection', SSCorrectionValue, ...)

Input Arguments

`Seqs`	Set of pairwise or multiply aligned nucleotide or amino acid sequences, represented by any of the following: a character array, a cell array of character vectors, a string vector, or an array of structures containing a `Sequence` field.
`Profile`	Sequence profile distribution matrix with the frequency of nucleotides or amino acids for every column in the multiple alignment, such as returned by the `seqprofile` function. The size of the frequency distribution matrix is `[4 x sequence length]` for nucleotides and `[20 x sequence length]` for amino acids. If gaps were included, `Profile` may have `5` rows (for nucleotides) or `21` rows (for amino acids), but `seqlogo` ignores gaps.
`DisplaylogoValue`	Controls the display of a sequence logo. Choices are `true` (default) or `false`.
`AlphabetValue`	Character vector or string specifying the type of sequence (nucleotide or amino acid). Choices are `'NT'` (default) or `'AA'`.
`StartatValue`	Positive integer that specifies the starting position for the sequences in `Seqs`. Default starting position is `1`.
`EndatValue`	Positive integer that specifies the ending position for the sequences in `Seqs`. Default ending position is the maximum length of the sequences in `Seqs`.
`SSCorrectionValue`	Controls the use of small sample correction in the estimation of the number of bits. Choices are `true` (default) or `false`.

Output Arguments

`WgtMatrix`	Cell array containing the symbol list in `Seqs` or `Profile` and the weight matrix used to graphically display the sequence logo.
`Handle`	Handle to the sequence logo figure.

Description

seqlogo(Seqs) displays a sequence logo for Seqs, a set of aligned sequences. The logo graphically displays the sequence conservation at each position in the alignment, measured in bits. The maximum sequence conservation per site is log2(4) = 2 bits for nucleotide sequences and log2(20) (approximately 4.32) bits for amino acid sequences. If the sequence conservation value at a position is zero or negative, no logo is displayed at that position.
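The conservation measure described above can be illustrated with a short calculation. The following is a minimal sketch in base MATLAB, assuming the usual definition of per-column conservation as log2(alphabet size) minus the Shannon entropy of the column, with no small sample correction. It is not the seqlogo implementation, and the variable names (`freq`, `H`, `bits`) are chosen here for illustration only.

```matlab
% Aligned nucleotide sequences, one per row (all rows have the same length).
S = ['ATTATAGCAAACTA'; ...
     'AACATGCCAAAGTA'; ...
     'ATCATGCAAAAGGA'];
symbols = 'ACGT';
[nSeqs, nCols] = size(S);

% Relative frequency of each symbol in each alignment column.
freq = zeros(numel(symbols), nCols);
for k = 1:numel(symbols)
    freq(k, :) = sum(S == symbols(k), 1) / nSeqs;
end

% Shannon entropy per column, in bits. Symbols that do not occur in a
% column contribute nothing, so their NaN terms (0 * -Inf) are set to 0.
terms = freq .* log2(freq);
terms(freq == 0) = 0;
H = -sum(terms, 1);

% Uncorrected conservation per column: at most log2(4) = 2 bits.
bits = log2(numel(symbols)) - H;
```

A column in which every sequence carries the same nucleotide has zero entropy and therefore reaches the 2-bit maximum; columns with mixed nucleotides score lower.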

seqlogo(Profile) displays a sequence logo for Profile, a sequence profile distribution matrix with the frequency of nucleotides or amino acids for every column in the multiple alignment, such as returned by the seqprofile function.
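As a brief illustration of the profile form, the sketch below builds a nucleotide profile with seqprofile and passes it to seqlogo. It assumes Bioinformatics Toolbox is installed and sets the seqprofile 'Alphabet' option explicitly rather than relying on its default.

```matlab
% Aligned nucleotide sequences (same data as in the Examples section).
S = {'ATTATAGCAAACTA', ...
     'AACATGCCAAAGTA', ...
     'ATCATGCAAAAGGA'};

% Build the frequency profile: one row per nucleotide, one column per
% alignment position.
P = seqprofile(S, 'Alphabet', 'NT');

% Display the logo from the profile instead of the raw sequences.
seqlogo(P)
```

Passing a profile can be convenient when the frequencies have already been computed or come from a source other than a set of sequences.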

Color Code for Nucleotides

Nucleotide	Color
`A`	Green
`C`	Blue
`G`	Yellow
`T`, `U`	Red
Other	Purple

Color Code for Amino Acids

Amino Acid	Chemical Property	Color
`G S T Y C Q N`	Polar	Green
`A V L I P W F M`	Hydrophobic	Orange
`D E`	Acidic	Red
`K R H`	Basic	Blue
Other	—	Tan

WgtMatrix = seqlogo(...) returns a cell array containing the unique symbols in Seqs or Profile and the information weight matrix used to graphically display the logo.

[WgtMatrix, Handle] = seqlogo(...) returns a handle to the sequence logo figure.
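The output can also be retrieved without drawing the figure. The sketch below assumes Bioinformatics Toolbox is installed and assumes that the returned cell array holds the symbol list in its first cell and the weight matrix in its second, following the order given in the description; the variable names are illustrative.

```matlab
S = {'ATTATAGCAAACTA', ...
     'AACATGCCAAAGTA', ...
     'ATCATGCAAAAGGA'};

% Compute the weight matrix only; suppress the logo figure.
W = seqlogo(S, 'Displaylogo', false);

% Assumed layout: symbols first, weights second.
symbolList = W{1};
weights    = W{2};
```

This form is useful when the weights are needed for further computation rather than for display.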

seqlogo(Seqs, ..., 'PropertyName', PropertyValue, ...) calls seqlogo with optional properties that use property name/property value pairs. You can specify one or more properties in any order. Each PropertyName must be enclosed in single quotation marks and is case insensitive. These property name/property value pairs are as follows:

seqlogo(..., 'Displaylogo', DisplaylogoValue, ...) controls the display of a sequence logo. Choices are true (default) or false.

seqlogo(..., 'Alphabet', AlphabetValue, ...) specifies the type of sequence (nucleotide or amino acid). Choices are 'NT' (default) or 'AA'.

Note

If you provide amino acid sequences to seqlogo, you must set Alphabet to 'AA'.

seqlogo(..., 'Startat', StartatValue, ...) specifies the starting position for the sequences in Seqs. Default starting position is 1.

seqlogo(..., 'Endat', EndatValue, ...) specifies the ending position for the sequences in Seqs. Default ending position is the maximum length of the sequences in Seqs.

seqlogo(..., 'SSCorrection', SSCorrectionValue, ...) controls the use of small sample correction in the estimation of the number of bits. Choices are true (default) or false.

Note

A simple calculation of bits tends to overestimate the conservation at a particular location. To compensate for this overestimation, an approximate correction is applied when SSCorrection is set to true. This correction works better when the number of sequences is greater than 50.
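To give a sense of the size of such a correction, the sketch below evaluates an approximation that is commonly used in the sequence logo literature, e(n) = (s - 1) / (2 ln(2) n), where s is the alphabet size and n is the number of sequences. This formula is an assumption made for illustration; the text above does not state which expression seqlogo applies, and the values of `n` are hypothetical.

```matlab
s = 4;                   % alphabet size for nucleotides
n = [3 10 50 200];       % hypothetical numbers of aligned sequences

% Approximate small sample correction, in bits, for each value of n.
% Assumed formula; not taken from the seqlogo source.
e = (s - 1) ./ (2 * log(2) * n);

% Print one line per sample size.
fprintf('n = %3d: correction of about %.3f bits\n', [n; e]);
```

Under this approximation the correction is a large fraction of the 2-bit maximum when only a few sequences are available and becomes small as the number of sequences grows.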

Examples

Display a Sequence Logo for Aligned Nucleotide Sequences

This example shows how to display a sequence logo for a set of aligned nucleotide sequences.

Create a series of aligned nucleotide sequences.

S = {'ATTATAGCAAACTA',...
     'AACATGCCAAAGTA',...
     'ATCATGCAAAAGGA'}

S =

  1x3 cell array

    {'ATTATAGCAAACTA'}    {'AACATGCCAAAGTA'}    {'ATCATGCAAAAGGA'}

Display the sequence logo.

seqlogo(S)

Display a Sequence Logo for Aligned Amino Acid Sequences

This example shows how to display a sequence logo for a set of aligned amino acid sequences.

Create a series of aligned amino acid sequences.

S2 = {'LSGGQRQRVAIARALAL',...
      'LSGGEKQRVAIARALMN',...
      'LSGGQIQRVLLARALAA',...
      'LSGGERRRLEIACVLAL',...
      'FSGGEKKKNELWQMLAL',...
      'LSGGERRRLEIACVLAL'};

Display the sequence logo, specifying an amino acid sequence and limiting the logo to sequence positions 2 through 10.

seqlogo(S2, 'alphabet', 'aa', 'startAt', 2, 'endAt', 10)

References

[1] Schneider, T.D., and Stephens, R.M. (1990). Sequence Logos: A new way to display consensus sequences. Nucleic Acids Research 18, 6097–6100.

Version History

Introduced before R2006a