Extracting wrong characters from image

seopower
Posts: 6
Joined: 2012-05-04T04:13:36-07:00

Extracting wrong characters from image

Post by seopower »

Hi,

I have a simple image (attached). When I try to extract text from it with OCR, I get wrong characters, which results in misspelled words. Please suggest what I can do to extract the text correctly.

Image

I am using Tesseract on Ubuntu, called from PHP as shown below:

exec('tesseract temp/' . $filename . '.png temp/' . $filename);


Thanks,
snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Re: Extracting wrong characters from image

Post by snibgo »

In my experience, Tesseract needs characters to be at least 10 pixels high to be reliable, and 20 is better. Yours are only 9 pixels high.
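As a rough illustration of the sizing advice above, the sketch below works out a resize percentage that brings characters of a known height up to a target height, and builds an ImageMagick command line for it. The function names, file names and the 20-pixel default are assumptions made for this example, not something posted in the thread. It uses the `convert` command; on ImageMagick 7 the equivalent is `magick`.

```python
def resize_percent(glyph_height_px, target_height_px=20):
    """Return a whole-number percentage that scales glyphs up to at least
    target_height_px. Never returns less than 100 (no downscaling)."""
    if glyph_height_px <= 0:
        raise ValueError("glyph height must be a positive integer")
    # Integer ceiling division, so the result is rounded up.
    percent = -(-100 * target_height_px // glyph_height_px)
    return max(100, percent)


def upscale_command(src, dst, glyph_height_px, target_height_px=20):
    """Build the argument list for an ImageMagick resize (hypothetical helper)."""
    percent = resize_percent(glyph_height_px, target_height_px)
    return ["convert", src, "-resize", f"{percent}%", dst]


# Characters measured at 9 pixels high, as in the image discussed here.
print(upscale_command("input.png", "scaled.png", 9))
```

The returned list can be passed to `subprocess.run` as is, and the scaled file can then be given to Tesseract in place of the original.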
snibgo's IM pages: im.snibgo.com
markt
Posts: 8
Joined: 2016-02-15T17:08:32-07:00

Re: Extracting wrong characters from image

Post by markt »

Try adding an extra border around the extracted text image. I seemed to get improved recognition with Tesseract after adding a 20x20 white border.
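A minimal sketch of the border suggestion, assuming ImageMagick's `convert` command with its `-bordercolor` and `-border` options. The helper name and file names are made up for illustration; the 20x20 white border is the value mentioned above.

```python
def border_command(src, dst, width=20, height=20, color="white"):
    """Build the argument list that pads an image with a solid border
    (hypothetical helper; width and height are in pixels)."""
    if width < 0 or height < 0:
        raise ValueError("border size must not be negative")
    return ["convert", src, "-bordercolor", color,
            "-border", f"{width}x{height}", dst]


# Pad the image before handing it to Tesseract.
print(border_command("text.png", "padded.png"))
```

As with any argument list, this can be run with `subprocess.run`, and the padded file then goes to Tesseract instead of the original.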
seopower
Posts: 6
Joined: 2012-05-04T04:13:36-07:00

Re: Extracting wrong characters from image

Post by seopower »

snibgo wrote: In my experience, Tesseract needs characters to be at least 10 pixels high to be reliable, and 20 is better. Yours are only 9 pixels high.
Thanks, I have doubled the size of the image and it is now recognised correctly, but the space between two words is still missing.