[magick-users] extracting text area from image
Alexandru Ciobanu
capsunel at gmail.com
Mon Jun 11 18:15:46 PDT 2007
Wow!
Thank you, guys, for the great ideas so far.
To simplify the task somewhat, I intend to use the
local adaptive threshold algorithm. The very same image
now becomes this:
http://picasaweb.google.com/capsunel/Imaging/photo#5074975837667856626
The command was very simple:
convert book_page.jpg -lat 5x5-5% -monochrome book_page_lat.jpg
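
For readers who want to see what a local adaptive threshold does, here is a
minimal pure-Python sketch. It is not ImageMagick's implementation. It
assumes the documented behaviour of -lat: a pixel turns white when it is
lighter than the average of its surrounding window plus the offset, and
black otherwise. The function name, the list-of-rows image format, and the
border handling (windows clipped at the edges) are assumptions made for
illustration.

```python
def local_adaptive_threshold(gray, win=5, offset=-0.05, max_val=255):
    """Binarize a grayscale image given as a list of rows of ints (0..max_val).

    A pixel becomes white (max_val) when it is lighter than the mean of the
    win x win window around it plus offset * max_val; otherwise it becomes
    black (0). Windows are clipped at the image border (an assumption).
    """
    height, width = len(gray), len(gray[0])
    half = win // 2
    shift = offset * max_val
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighbourhood, clipped to the image bounds.
            ys = range(max(0, y - half), min(height, y + half + 1))
            xs = range(max(0, x - half), min(width, x + half + 1))
            window = [gray[j][i] for j in ys for i in xs]
            mean = sum(window) / len(window)
            row.append(max_val if gray[y][x] > mean + shift else 0)
        out.append(row)
    return out


# Tiny example: a page lit unevenly from left to right, with one "ink" pixel.
page = [
    [200, 205, 210, 215, 220],
    [200, 205, 210, 215, 220],
    [200, 205,  40, 215, 220],
    [200, 205, 210, 215, 220],
    [200, 205, 210, 215, 220],
]
binary = local_adaptive_threshold(page)
```

Because each pixel is compared with its own neighbourhood rather than with
one global cutoff, the lighting gradient disappears and only the ink pixel
ends up black.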
Now, I'm guessing, the problem is easier to approach, for example
the way Marcel proposed.
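
The idea behind Marcel's suggestion (treat near-white as background, then
keep only what is left) can be sketched in pure Python as a bounding-box
search over the thresholded image. This is only an illustration, not what
ImageMagick does internally. The names text_bounding_box and crop, the
list-of-rows image format, and the integer fuzz tolerance are all
hypothetical.

```python
def text_bounding_box(binary, background=255, fuzz=0):
    """Return (left, top, width, height) of all pixels that differ from
    background by more than fuzz, or None if there are no such pixels."""
    xs, ys = [], []
    for y, row in enumerate(binary):
        for x, value in enumerate(row):
            if abs(value - background) > fuzz:
                xs.append(x)
                ys.append(y)
    if not xs:
        return None
    left, top = min(xs), min(ys)
    return left, top, max(xs) - left + 1, max(ys) - top + 1


def crop(image, box):
    """Cut the (left, top, width, height) box out of a list-of-rows image."""
    left, top, width, height = box
    return [row[left:left + width] for row in image[top:top + height]]


# Small example: 0 is ink, 255 is paper, 250 is slightly grey paper.
page = [
    [255, 255, 255, 255, 255, 255],
    [255, 250,   0,   0, 255, 255],
    [255, 255,   0, 255,   0, 255],
    [255, 255, 255, 255, 255, 255],
]
box = text_bounding_box(page, fuzz=10)
text_area = crop(page, box)
```

The box found on the thresholded image can then be applied to the original
photograph. Note that a single stray dark speck outside the text enlarges
the box, so some noise removal beforehand is probably needed.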
Another approach (which I'll definitely try) is to work like the
unpaper tool (thanks, ignotus, for the idea):
http://unpaper.berlios.de/
The tool does some "magick" on images (scanned pages) to make
them more readable for both humans and OCR engines. There is
just one source file, and it is easy to follow.
=)
Still, if somebody has other ideas, I'm curious to hear them.
Alex
Wouterse, Marcel wrote:
> Hi,
>
> The first method that comes to mind:
>
> Make the white part transparent (with a fuzz factor) and then do a
> repage or something...
>
> Regards,
> Marcel
>
> -----Original message-----
> From: magick-users-bounces at imagemagick.org
> [mailto:magick-users-bounces at imagemagick.org] On behalf of Alexandru Ciobanu
> Sent: Monday, 11 June 2007 15:54
> To: magick-users at imagemagick.org
> Subject: Re: [magick-users] extracting text area from image
>
>
> Hi, Ron!
>
> I need (1), i.e. to extract an image which is the same size as the text
> area. I will use a dedicated tool for OCR (2), which is the next step.
>
> Basically this must be a primitive implementation of layout analysis. =)
>
> And the image is here (I thought the attachment would make it through):
> http://picasaweb.google.com/capsunel/Imaging/photo#5074517723571163346
>
> Note: the red area is not really important.
>
> Alex
>
> PS: I've posted the same question here:
> http://www.imagemagick.org/discourse-server/viewtopic.php?f=1&t=8949
>
>
> On 6/10/07, Ron Savage <ron at savage.net.au> wrote:
>
>> Alexandru Ciobanu wrote:
>>
>> Hi Alexandru
>>
>>
>>> I am trying to use ImageMagick to extract strictly
>>> the text area from a photograph of a book page.
>>>
>> Do you mean
>>
>> (1) extract an image which is the same size as the text area, or
>> (2) extract the text letter-by-letter
>>
>> The latter is called Optical Character Recognition, and I do not know
>> of any such feature within IM.
>>
>>
>>> If you look at the image attached, I am interested in the green area
>>> and, if possible, the red area.
>>>
>> No image attached. Please upload to your web site.
>>
>>
>>> The problem is that it has to be automated and work
>>> for books of various sizes.
>>>
>> Sure.
>>
>>
>>> My idea so far is:
>>> apply a really crazy filter that would transform the
>>> green area into a big uniform blob, so that I can
>>> then extract its coordinates, and then use those
>>> on the original image.
>>>
>> Sounds reasonable, but also sounds like (1) above.
>> --
>> Ron Savage
>> ron at savage.net.au
>> http://savage.net.au/
>> _______________________________________________
>> Magick-users mailing list
>> Magick-users at imagemagick.org
>> http://studio.imagemagick.org/mailman/listinfo/magick-users
>>
>>