ocr

Recognize text using optical character recognition

Description

txt = ocr(I) returns an ocrText object that contains optical character recognition (OCR) information from the input image I. The object contains recognized characters, words, text lines, the locations of recognized words, and a metric indicating the confidence of each recognition result.

txt = ocr(I,roi) recognizes text in I within one or more rectangular regions.

txt = ocr(ds) returns a cell array of ocrText objects that contain the recognition results for the ROIs specified within the datastore for the corresponding image. Use this syntax to evaluate OCR results on a collection of images.

[___] = ocr(___,Name=Value) specifies options using one or more name-value arguments in addition to any combination of arguments from previous syntaxes. For example, LayoutAnalysis="page" treats the image as a page containing blocks of text.
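For example, this minimal sketch (with "myImage.png" as a placeholder file name) combines an image input with a name-value argument:

I = imread("myImage.png");
txt = ocr(I,LayoutAnalysis="page");  % treat the image as a page of text blocks
words = txt.Words;                   % recognized words
conf = txt.WordConfidences;          % per-word confidences in the range [0,1]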

Examples

Load an image containing text into the workspace, and then recognize the text it contains.

businessCard = imread("businessCard.png");
ocrResults = ocr(businessCard)
ocrResults = 
  ocrText with properties:

                      Text: '4 MathWorks:...'
    CharacterBoundingBoxes: [107x4 double]
      CharacterConfidences: [107x1 single]
                     Words: {16x1 cell}
         WordBoundingBoxes: [16x4 double]
           WordConfidences: [16x1 single]
                 TextLines: {8x1 cell}
     TextLineBoundingBoxes: [8x4 double]
       TextLineConfidences: [8x1 single]

Display the recognized text on the image.

recognizedText = ocrResults.Text;
figure
imshow(businessCard)
text(600,150,recognizedText,BackgroundColor=[1 1 1]);

Read an image into the workspace.

I = imread("handicapSign.jpg");

Define a rectangular region of interest (ROI) in which to recognize text within the input image.

roi = [370 246 363 423];

Alternatively, you can use drawrectangle to select a region using a mouse.

For example,

figure
imshow(I)
roi = round(getPosition(drawrectangle))

Recognize text within the ROI.

ocrResults = ocr(I,roi);

Insert the recognized text into the original image. Display the image with the inserted recognized text.

Iocr = insertText(I,roi(1:2),ocrResults.Text,AnchorPoint="RightTop",FontSize=16);
figure
imshow(Iocr)

Read an image containing a seven-segment display into the workspace.

I = imread("sevSegDisp.jpg");

Specify the ROI that contains the seven-segment display.

roi = [506 725 1418 626];

To recognize the digits from the seven-segment display, specify the Model argument as "seven-segment".

ocrResults = ocr(I,roi,Model="seven-segment");

Display the recognized digits and detection confidence.

fprintf("Recognized seven-segment digits: ""%s""\nDetection confidence: %0.4f",cell2mat(ocrResults.Words),ocrResults.WordConfidences)
Recognized seven-segment digits: "5405.9"
Detection confidence: 0.7948

Insert the recognized digits into the image.

Iocr = insertObjectAnnotation(I,"rectangle", ...
            ocrResults.WordBoundingBoxes,ocrResults.Words,LineWidth=5,FontSize=72);
figure
imshow(Iocr)

Read an image containing text into the workspace.

businessCard = imread("businessCard.png");
ocrResults = ocr(businessCard)
ocrResults = 
  ocrText with properties:

                      Text: '4 MathWorks:...'
    CharacterBoundingBoxes: [107x4 double]
      CharacterConfidences: [107x1 single]
                     Words: {16x1 cell}
         WordBoundingBoxes: [16x4 double]
           WordConfidences: [16x1 single]
                 TextLines: {8x1 cell}
     TextLineBoundingBoxes: [8x4 double]
       TextLineConfidences: [8x1 single]

Annotate the image with the word bounding boxes and recognition confidences.

Iocr = insertObjectAnnotation(businessCard,"rectangle", ...
                           ocrResults.WordBoundingBoxes, ...
                           ocrResults.WordConfidences);
figure
imshow(Iocr)

Load an image containing text into the workspace.

businessCard = imread("businessCard.png");
ocrResults = ocr(businessCard);
Locate and highlight the word "Math" within the image.

bboxes = locateText(ocrResults,"Math",IgnoreCase=true);
Iocr = insertShape(businessCard,"FilledRectangle",bboxes);
figure
imshow(Iocr)

This example shows how to evaluate the accuracy of an OCR model that recognizes seven-segment numerals. The evaluation dataset contains images of energy meter displays that feature seven-segment numerals.

Download and extract the dataset.

datasetURL = "https://ssd.mathworks.com/supportfiles/vision/data/7SegmentImages.zip";
datasetZip = "7SegmentImages.zip";
if ~exist(datasetZip,"file")
    disp("Downloading evaluation dataset (" + datasetZip + " - 96 MB) ...");
    websave(datasetZip,datasetURL);
end

datasetFiles = unzip(datasetZip);

Load the evaluation ground truth.

ld = load("7SegmentGtruth.mat");
gTruth = ld.gTruth;

Create datastores that contain the images, bounding boxes, and text labels from the groundTruth object by using the ocrTrainingData function with the label and attribute names used during labeling.

labelName = "Text";
attributeName = "Digits";
[imds,boxds,txtds] = ocrTrainingData(gTruth,labelName,attributeName);

Combine the datastores.

cds = combine(imds,boxds,txtds);

Run OCR on the evaluation dataset.

results = ocr(cds,Model="seven-segment");

Evaluate the OCR results against the ground truth.

metrics = evaluateOCR(results,cds);
Evaluating ocr results
----------------------
* Selected metrics: character error rate, word error rate.
* Processed 119 images.
* Finalizing... Done.
* Data set metrics:

    CharacterErrorRate    WordErrorRate
    __________________    _____________

         0.082195            0.19958   

Display accuracy of the OCR model.

modelAccuracy = 100*(1-metrics.DataSetMetrics.CharacterErrorRate);
disp("Accuracy of the OCR model= " + modelAccuracy + "%")
Accuracy of the OCR model= 91.7805%

Input Arguments

Input image, specified as an M-by-N-by-3 truecolor image, M-by-N grayscale image, or M-by-N binary image. The input image must consist of real, nonsparse values. Before performing character recognition, the function converts truecolor or grayscale input images into binary images by using Otsu's thresholding technique. For best OCR results, specify an image in which the height of a lowercase x, or a comparable character, is greater than 20 pixels. To improve results, correct any text rotation greater than 10 degrees in either direction from the horizontal or vertical axis.

Data Types: single | double | int16 | uint8 | uint16 | logical

Rectangular regions of interest, specified as an M-by-4 matrix. Each row specifies a region of interest within the input image in the form [x y width height], where [x y] specifies the upper-left corner location, and [width height] specifies the size of the rectangular region of interest, in pixels. Each rectangle must be fully contained within the input image I. Before the recognition process, the function uses Otsu’s thresholding technique to convert truecolor and grayscale input regions of interest to binary regions.

To obtain best results when using ocr to recognize seven-segment digits, specify an roi that encloses only the part of the image that contains seven-segment digits.
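For example, this minimal sketch (with placeholder coordinates) recognizes text within two regions of an image I and returns one ocrText object per ROI:

roi = [50 50 200 40       % [x y width height] of the first region (placeholder values)
       50 120 200 40];    % second region
txt = ocr(I,roi);
firstRegionText = txt(1).Text;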

Evaluation data, specified as a datastore that returns a cell array or a table when input to the read function. The datastore must return a cell array or table with at least these two columns:

  • 1st Column — A cell vector of logicals, grayscale images, or RGB images.

  • 2nd Column — A cell vector that contains numROIs-by-4 matrices of bounding boxes of the form [x y width height], which specify text locations within each image.

When using a datastore input, the LayoutAnalysis name-value argument value must be either "auto" or "none".

Name-Value Arguments

Specify optional pairs of arguments as Name1=Value1,...,NameN=ValueN, where Name is the argument name and Value is the corresponding value. Name-value arguments must appear after other arguments, but the order of the pairs does not matter.

Example: txt = ocr(I,LayoutAnalysis="page") treats the text in the image as a page containing blocks of text.

Before R2021a, use commas to separate each name and value, and enclose Name in quotes.

Type of layout analysis to perform for text segmentation, specified as one of these options:

  • "auto" — If the Model argument value is "seven-segment", treats the text in the image as though the LayoutAnalysis value is "block". If you specify the ds input argument, treats the text in the image as though the LayoutAnalysis value is "none". Otherwise, treats the text in the image as though the LayoutAnalysis value is "page".

  • "page" — Treats the text in the image as a page containing blocks of text.

  • "block" — Treats the text in the image as a single block of text.

  • "line" — Treats the text in the image as a single line of text.

  • "word" — Treats the text in the image as a single word of text.

  • "character" — Treats the text in the image as a single character.

  • "none" — Does not perform layout analysis. Best used for images that contain a single line of text.

You can use the LayoutAnalysis argument to determine the layout of the text within the input image. For example, you can set LayoutAnalysis to "page" to recognize text from a scanned document that contains a specific format, such as a double column. This setting preserves the reading order in the returned text.

If your input image contains a few regions of text, or the text is located in a cluttered scene, the ocr function can return poor quality results. If you get poor OCR results, try a different layout that better matches the text in your image. If the text is located in a cluttered scene, try specifying an ROI around the text in your image in addition to trying a different layout.
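For example, if an image contains only a single line of text, this minimal sketch (lineImage is a placeholder variable) can produce better results than the default page analysis:

txt = ocr(lineImage,LayoutAnalysis="line");
disp(txt.Text)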

Model to use for recognition, specified as one of these options:

  • "english", "japanese", "seven-segment" — These specify the built-in models for detecting English text, Japanese text, or seven-segment digits, respectively.

  • Character vector or string scalar — Use this option to specify a custom model or one of the additional language models included in the OCR Language Data Files support package.

  • Cell array of character vectors or string array — Use this option to specify multiple models to use for detection simultaneously.

For faster performance using the built-in models (including any additional installed language models), you can append -fast to the language model string. For example, "english-fast", "japanese-fast", or "seven-segment-fast".
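For example, this sketch trades some accuracy for speed by using the fast English model:

txt = ocr(I,Model="english-fast");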

You can also install additional models or add a custom model. For details, see Install OCR Language Data Files.

Specifying multiple languages enables simultaneous recognition of all the selected languages. However, selecting more than one language can reduce the accuracy of the ocr function and increase its processing time. To specify any of the additional languages included in the OCR Language Data Files support package, specify them the same way as the built-in languages. You do not need to specify the path. For example, to specify Finnish text recognition:

txt = ocr(img,Model="finnish");
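To use multiple models at once, pass a string array or a cell array. For example, this sketch recognizes English and Japanese text simultaneously:

txt = ocr(img,Model=["english" "japanese"]);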

You can use your own custom model by specifying the path to the trained data files. Use one of the methods described below to specify the path. The method you use depends on whether you obtain the trained data by using the trainOCR function or the OCR Trainer app.

  • trainOCR function — For a trainOCR output ocrModel, specify:

    ocrResults = ocr(I,Model=ocrModel);

    ocrModel must contain the full path of the model.

  • OCR Trainer app — Name the file in the format <Model>.traineddata, where <Model> is the name of your trained custom model, and save it in a folder named tessdata. For example, to use a custom model named "eng", specify:

    txt = ocr(img,Model="<path>/tessdata/eng.traineddata");

    where <path> is the full path to the tessdata folder, and eng.traineddata is the file produced by the OCR Trainer app.

When specifying multiple custom models, all of the models must be in the same containing folder. For example, this code does not work, because it points to two different containing folders:

txt = ocr(img,Model={"<path>/one/tessdata/eng.traineddata",...
                "<path>/two/tessdata/jpn.traineddata"});

Some language model files have a dependency on another language model. For example, Hindi training depends on English. If you want to use Hindi, the English traineddata file must exist in the same folder as the Hindi traineddata file. The ocr function supports only traineddata files created by using tesseract-ocr 3.02, or by using the OCR Trainer app.

For deployment targets generated by MATLAB® Coder™, the generated OCR executable and the language model file folder must be colocated. For example:

  • For English: C:/path/to/eng.traineddata

  • For Japanese: C:/path/to/jpn.traineddata

  • For Seven-segment: C:/path/to/seven_segment.traineddata

  • For custom data files: C:/path/to/customlang.traineddata

  • C:/path/ocr_app.exe

You can copy the English, Japanese, and seven-segment trained data files from this folder:

fullfile(matlabroot,"toolbox","vision","visionutilities","tessdata_best");
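For example, this minimal sketch (the destination folder is a placeholder) stages the English trained data file next to a deployed executable:

src = fullfile(matlabroot,"toolbox","vision","visionutilities", ...
    "tessdata_best","eng.traineddata");
copyfile(src,"C:/path/to/")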

Character subset, specified as a character vector or string scalar. By default, CharacterSet is the empty character vector, "", which instructs the function to search for all characters in the language model specified by the Model name-value argument. You can set this value to a smaller set of known characters to constrain the classification process.

The ocr function selects the best match for the detected text from CharacterSet. Using prior knowledge about the characters in the input image helps improve text recognition accuracy. For example, if you set CharacterSet to all numeric digits, "0123456789", the function attempts to match each character to only digits. In this case, the ocr function can incorrectly recognize a non-digit character as a digit.

If you specify the Model value as "seven-segment", the ocr function uses the CharacterSet value "0123456789.:-".
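For example, this minimal sketch (serialNumberImage is a placeholder variable) constrains recognition to numeric characters when reading a serial number:

txt = ocr(serialNumberImage,CharacterSet="0123456789");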

Output Arguments

Recognized text and metrics, returned as an ocrText object, an array of ocrText objects, or a cell array of ocrText objects. Each object contains the recognized text and its location for an input image, as well as metrics indicating the confidence of the results. Confidence values are in the range [0,1], representing a percent probability. The shape of the txt output depends on whether you specify a single image (with or without a single ROI), a single image with multiple ROIs, or a datastore to the ocr function.
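For example, this minimal sketch (with placeholder ROI coordinates) shows how the output shape changes with the input:

txtImage = ocr(I);                                % one ocrText object
txtROIs = ocr(I,[10 10 100 40; 10 60 100 40]);    % 2-by-1 array of ocrText objects
secondROIWords = txtROIs(2).Words;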

Tips

If your OCR results are not what you expect, try one or more of these options:

  • Increase the image size by a factor of 2 to 4.

  • If the characters in the image are too close together or their edges are touching, use morphology to thin out the characters and create space between them.

  • Use binarization to check for non-uniform lighting issues. Use the graythresh and imbinarize functions to binarize the image. If the characters are not visible in the binarization result, the image potentially has a non-uniform lighting issue. Try top-hat filtering, using the imtophat function, or other techniques for removing non-uniform illumination (see the sketch after this list).

  • Use the roi argument to isolate the text. You can specify the roi manually or use text detection.

  • If your image looks like a natural scene containing words, such as a street scene, rather than a scanned document, try setting the LayoutAnalysis argument to either "block" or "word".

  • Ensure that the image contains dark text on a light background. To achieve this, you can binarize the image and invert it before passing it to the ocr function.
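This minimal sketch combines the binarization, illumination, and inversion tips above; it assumes I is a grayscale image, and the structuring-element size is a placeholder:

J = imtophat(I,strel("disk",15));    % suppress non-uniform illumination
bw = imbinarize(J,graythresh(J));    % binarize using Otsu's threshold
if mean(bw(:)) < 0.5                 % heuristic: mostly dark pixels suggest light text on a dark background
    bw = ~bw;                        % invert so the text is dark on a light background
end
txt = ocr(bw);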

References

[1] Smith, R. "An Overview of the Tesseract OCR Engine." Ninth International Conference on Document Analysis and Recognition (ICDAR 2007), 629–633. IEEE, 2007. https://doi.org/10.1109/ICDAR.2007.4376991

[2] Smith, R., D. Antonova, and D. Lee. "Adapting the Tesseract Open Source OCR Engine for Multilingual OCR." Proceedings of the International Workshop on Multilingual OCR, 2009.

[3] Smith, R. "Hybrid Page Layout Analysis via Tab-Stop Detection." Proceedings of the 10th International Conference on Document Analysis and Recognition, 2009.

Extended Capabilities

Version History

Introduced in R2014a
