I want to remove all text from a binary image that contains both text and objects/shapes. Can anyone help?
Walter is saying that virtually any kind of weird-shaped blob might be a letter in some alphabet. I presume you want to identify blobs that belong to just one alphabet. So basically you need to do OCR on the blobs to determine which ones belong to your alphabet and which are just meaningless scribbles or letters from other alphabets. Giving code for OCR has always been beyond the scope of Answers. Heck, there are companies employing dozens or hundreds of people who have been working on that for decades, so I know it would not be worth my time to develop it all by myself. Personally, I'd look for an existing OCR package to buy. But if you want to write your own OCR package, have at it. Maybe you could try feature-detection algorithms like SURF (see wikipedia.org).
You can try the File Exchange: http://www.mathworks.com/matlabcentral/fileexchange/?term=OCR
Clear the image completely. Every combination of shapes and colors is "text" to some writing system.
Unicode alone has more than 98000 defined characters, and there are probably several thousand fonts; and then you have hand-written characters, complete with shaky hands, pen blotches, and human errors.
I don't understand "clear the image completely". The image I want to apply the code to has already been binarized, and the amount of text in the entire image is small: no more than 50 words. The objects I'm interested in are rectangular and significantly larger than the text, which is only 12-point.
Okay, so is that 12-point Arabic, or is it Thai, or is it Japanese, or is it in Horta, or is it in Dalek?
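If the objects of interest really are much larger than any single character, as described above, one option that avoids OCR altogether is to drop small connected components by pixel area. The following is only a minimal sketch in plain Python, not a tested solution for your image: the function name `remove_small_components` and the `min_area` threshold are hypothetical, the image is assumed to be a list of rows of 0/1 values, and the right threshold depends on your scan resolution. It also assumes no character touches one of the rectangles, since touching blobs merge into one component.

```python
from collections import deque


def remove_small_components(image, min_area):
    """Return a copy of a binary image (rows of 0/1 values) in which every
    8-connected foreground blob with fewer than min_area pixels is cleared."""
    rows = len(image)
    cols = len(image[0]) if rows else 0
    result = [[0] * cols for _ in range(rows)]
    visited = [[False] * cols for _ in range(rows)]

    for r in range(rows):
        for c in range(cols):
            if not image[r][c] or visited[r][c]:
                continue
            # Breadth-first flood fill collects one connected component.
            component = []
            queue = deque([(r, c)])
            visited[r][c] = True
            while queue:
                y, x = queue.popleft()
                component.append((y, x))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] and not visited[ny][nx]):
                            visited[ny][nx] = True
                            queue.append((ny, nx))
            # Keep the blob only if it is large enough (assumed: not text).
            if len(component) >= min_area:
                for y, x in component:
                    result[y][x] = 1
    return result


# Toy example: one 3x4 rectangle plus two small "text-like" specks.
example = [
    [1, 1, 1, 1, 0, 0, 0],
    [1, 1, 1, 1, 0, 1, 0],
    [1, 1, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 1, 1],
]
cleaned = remove_small_components(example, min_area=5)
```

In MATLAB, `bwareaopen` from the Image Processing Toolbox does the same kind of area-based filtering. Either way, this only separates "small" from "large"; it does not recognize text, so a small non-text blob would be removed too.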