
genbankread

Read data from GenBank file

Syntax

GenBankData = genbankread(File)
GenBankData = genbankread(File, 'TimeOut', TimeOutValue)

Arguments

File

Either of the following:

  • Character vector or string specifying a file name, a path and file name, or a URL pointing to a file. The referenced file is a GenBank-formatted file (ASCII text file). If you specify only a file name, that file must be on the MATLAB® search path or in the MATLAB Current Folder.

  • MATLAB character array or string vector that contains the text of a GenBank-formatted file.
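A minimal sketch of the second input form follows: the text of a GenBank file is passed directly instead of a file name. It assumes that TaySachs_Gene.txt already exists in the Current Folder (for example, created with getgenbank as in the Examples section) and uses the standard MATLAB fileread function to load its contents.

```matlab
% Sketch: parse GenBank text that is already in memory.
% Assumes 'TaySachs_Gene.txt' exists in the Current Folder.
gbText = fileread('TaySachs_Gene.txt');   % whole file as a char row vector
s = genbankread(gbText);                  % pass the text, not the file name
disp(s.Accession)
```

This form is useful when the GenBank text comes from somewhere other than a file on disk, such as a web request you have already made.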

Tip

You can use the getgenbank function with the 'ToFile' property to retrieve sequence information from the GenBank® database and create a GenBank-formatted file.

TimeOutValue

Connection timeout in seconds, specified as a positive scalar. The default value is 5.

GenBankData

MATLAB structure or array of structures containing fields corresponding to GenBank keywords.

Description

GenBankData = genbankread(File) reads File, a GenBank-formatted file or text, and returns GenBankData, a structure or array of structures containing fields that correspond to the GenBank keywords. When File contains multiple entries, each entry is stored as a separate element of GenBankData. For a list of the GenBank keywords, see https://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html.
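Because each entry becomes one element of the output, a multi-entry file can be processed with an ordinary loop over the structure array. The sketch below assumes a hypothetical file, multi_entry.gb, that contains several GenBank records; the field names are the ones shown in the Examples section.

```matlab
% Sketch: iterate over the entries of a multi-record GenBank file.
% 'multi_entry.gb' is a hypothetical file name used for illustration.
records = genbankread('multi_entry.gb');
for k = 1:numel(records)
    % Each element holds one entry; LocusSequenceLength is stored as text
    fprintf('%s: %s bp\n', records(k).Accession, records(k).LocusSequenceLength);
end
```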

GenBankData = genbankread(File, 'TimeOut', TimeOutValue) sets the connection timeout, in seconds, used when reading data from a remote file or URL.
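A short sketch of the 'TimeOut' option follows. It assumes the record is fetched through the NCBI E-utilities efetch service, which returns a GenBank flat file for rettype=gb and retmode=text; the 30-second value is an arbitrary choice for a slow connection, not a recommendation.

```matlab
% Sketch: read a remote GenBank record, allowing up to 30 s for the connection.
% The efetch URL is an assumption about how the record is served remotely.
url = ['https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi', ...
       '?db=nuccore&id=NM_000520&rettype=gb&retmode=text'];
s = genbankread(url, 'TimeOut', 30);   % raise the default 5 s timeout
disp(s.Definition)
```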

Examples

  1. Retrieve sequence information for the HEXA gene, store the data in a file, and then read the file into MATLAB.

    getgenbank('nm_000520', 'ToFile', 'TaySachs_Gene.txt')
    s = genbankread('TaySachs_Gene.txt')
    
    s = 
    
                    LocusName: 'NM_000520'
          LocusSequenceLength: '2437'
         LocusNumberofStrands: ''
                LocusTopology: 'linear'
            LocusMoleculeType: 'mRNA'
         LocusGenBankDivision: 'PRI'
        LocusModificationDate: '18-FEB-2009'
                   Definition: [1x63 char]
                    Accession: 'NM_000520'
                      Version: 'NM_000520.4'
                           GI: '189181665'
                      Project: []
                       DBLink: []
                     Keywords: []
                      Segment: []
                       Source: 'Homo sapiens (human)'
               SourceOrganism: [4x65 char]
                    Reference: {1x10 cell}
                      Comment: [32x67 char]
                     Features: [147x74 char]
                          CDS: [1x1 struct]
                     Sequence: [1x2437 char]

  2. Display the source organism for this sequence.

    s.SourceOrganism
    
    ans =
    
    Homo sapiens                                                     
    Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
    Mammalia; Eutheria; Euarchontoglires; Primates; Haplorrhini;     
    Catarrhini; Hominidae; Homo.
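The Sequence field holds the nucleotide sequence as a character vector, while LocusSequenceLength stores the length from the LOCUS line as text. The following sketch, which assumes s was created as in the first example, converts that text with str2double and compares it with the number of characters actually read.

```matlab
% Sketch: check the stored sequence against the length on the LOCUS line.
% Assumes s was returned by genbankread as in the first example.
seq = s.Sequence;                                    % char row vector of bases
declaredLength = str2double(s.LocusSequenceLength);  % LOCUS length is text
fprintf('Read %d bases; LOCUS line declares %d.\n', numel(seq), declaredLength);
```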

Version History

Introduced before R2006a