Trimming scanned images with random borders?

Tillomar
Posts: 3
Joined: 2017-09-12T09:25:37-07:00

Post by Tillomar »

Hi *,

I could use some help as I'm new to ImageMagick. I started doing the work manually in Photoshop, but quickly realized that both my time and my sanity would run out before I got through. When looking around for automated solutions, I found that if any software could do the job, it would most likely be ImageMagick.
(Sorry that the explanation is not as short as I would have liked. Well, it matches the problem...)

I need to trim/crop 1000+ scanned images. The scanner detects the page size automatically and gets it right most of the time. However, a lot of images show additional borders which I need to remove.

This is (the link to) a demonstration image without page contents, but clearly showing the page edge shadows on three sides:
https://www.dropbox.com/s/htldz4q4zh50bgq/C600_086.tif

The additional borders are detectable because
  • the image size deviates from the default page size by more than 20 px on each affected axis
  • the scanner separates the borders from the page contents by a small shadow, which has a sharp edge towards the page contents but a blurry edge towards the outside.
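The size criterion above can be written down as a small check. This is only a sketch in Python: the function name `affected_axes`, the `tolerance` parameter and the page size of 2480 x 3508 px are assumptions made for illustration, not values taken from the scans.

```python
def affected_axes(width, height, page_width, page_height, tolerance=20):
    """Return (x_affected, y_affected) for one scanned image.

    An axis counts as affected when the image size differs from the
    default page size by more than `tolerance` pixels on that axis.
    """
    x_affected = abs(width - page_width) > tolerance
    y_affected = abs(height - page_height) > tolerance
    return x_affected, y_affected


# Hypothetical default page size; replace with the real one.
PAGE_W, PAGE_H = 2480, 3508

# An image that is 80 px too wide but only 2 px too tall.
result = affected_axes(2560, 3510, PAGE_W, PAGE_H)
print(result)
```

Only the axes flagged here would then need any edge search at all; the others could be left untouched.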
I tried the examples from the documentation regarding fuzzy trim, but there are some problems:
  • An extra border on the top never has a shadow, so the top edge would have to be trimmed using a standard page length after fixing the bottom edge.
  • The shadow width varies between scans and along the edge; the position for trimming should therefore be the "inner" (sharp) edge, so the extra border is removed up to the inner edge of the shadow.
  • If a page edge was detected properly, there is no shadow.
  • The extra border may be on any number of sides (or none).
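One possible way to express the "inner edge" idea, sketched in Python on a made-up brightness profile. The assumptions are loud ones: the profile is taken from the outside of the image inwards and covers only the margin strip (no page contents), the shadow is darker than `dark_level`, and the sharp inner edge shows up as the largest single jump back to white. The function name and the numbers are hypothetical.

```python
def inner_edge(profile, dark_level=245):
    """Return the index of the first paper pixel after the shadow.

    `profile` holds brightness values (0..255) sampled from the image
    border inwards. Returns None when nothing in the profile is darker
    than `dark_level`, i.e. no shadow was found and the edge should be
    left alone.
    """
    if min(profile) >= dark_level:
        return None
    # The blurry outer side darkens in small steps; the sharp inner
    # side is the single largest brightening step.
    best = max(range(len(profile) - 1),
               key=lambda i: profile[i + 1] - profile[i])
    return best + 1


# Extra border, blurry ramp into the shadow, then a sharp jump to paper.
with_shadow = [252, 250, 240, 225, 205, 190, 254, 255, 255]
# A correctly detected edge: no shadow at all.
without_shadow = [254, 255, 255, 254]

print(inner_edge(with_shadow))
print(inner_edge(without_shadow))
```

In practice the profile would be an average over many rows or columns, because the shadow width changes along the edge.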
As an extra, it would be nice to have all images cut or extended to the same size -- but this is something that can clearly be done in a second run; all pages have a sufficiently wide white border without content.

But the main point is -- I don't have a clue how to detect the inner edge, and I don't know how to leave an edge alone if it has no extra border.

Any help would be very much appreciated.

Tnx,
Tillomar


PS: Forgot to mention -- ImageMagick 7.0.7-Q16 x84 on Win7
fmw42
Posts: 25562
Joined: 2007-07-02T17:14:51-07:00
Location: Sunnyvale, California, USA

Re: Trimming scanned images with random borders?

Post by fmw42 »

I assume your real scans have text data in them. You could try blurring so that the text blurs together. Then threshold. Then get the bounding box of the largest black region (by area) using -connected-components. Once you know the bounding box, you can crop that area from the original image and insert it appropriately into a white background image of the desired size so that the text is near the top center with appropriate margins.
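To illustrate only the "largest region by area" step, here is a plain Python sketch on a tiny binary grid. It is not ImageMagick's implementation; it assumes the image has already been blurred and thresholded so that 1 marks a dark pixel, uses 4-connectivity, and the function name `largest_region_bbox` is made up for this example.

```python
from collections import deque


def largest_region_bbox(grid):
    """Return (left, top, width, height) of the largest 4-connected
    region of 1s in `grid`, or None if the grid contains no 1s."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    best_area, best_box = 0, None
    for y0 in range(rows):
        for x0 in range(cols):
            if not grid[y0][x0] or seen[y0][x0]:
                continue
            # Flood-fill one region, tracking its area and extent.
            queue = deque([(x0, y0)])
            seen[y0][x0] = True
            area = 0
            min_x = max_x = x0
            min_y = max_y = y0
            while queue:
                x, y = queue.popleft()
                area += 1
                min_x, max_x = min(min_x, x), max(max_x, x)
                min_y, max_y = min(min_y, y), max(max_y, y)
                for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nx < cols and 0 <= ny < rows
                            and grid[ny][nx] and not seen[ny][nx]):
                        seen[ny][nx] = True
                        queue.append((nx, ny))
            if area > best_area:
                best_area = area
                best_box = (min_x, min_y, max_x - min_x + 1, max_y - min_y + 1)
    return best_box


grid = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0, 1],
]
print(largest_region_bbox(grid))
```

The returned box is what would then be used to crop the original image before placing it on a white canvas of the desired size.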
ozbigben
Posts: 27
Joined: 2012-03-25T02:15:27-07:00

Re: Trimming scanned images with random borders?

Post by ozbigben »

Hmm, soft shadow... flatbed scanner/copier, white pages with white background and auto page detection?

Is this a real image? Are the 3 faint "rectangles" content or just processing artefacts? Final output is to be colour/greyscale/bitonal? Retaining the true page background? Page size is fixed or variable? The component approach may work but a shadow around all 4 sides is going to be a big component if there's only a little text e.g. page with a single paragraph. At some point I suspect you're still going to have to do some manual QA. Would have been easier with a darker background.

Any chance of seeing a real image or two (with redacted content) to get an idea of the range of page background:content?
snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Re: Trimming scanned images with random borders?

Post by snibgo »

I assume these are shadows at the edge of the scanned paper. The goal is to over-write the shadows with white.

Writing a procedure based on a single artificial example is flaky, but here goes.

I assume the paper is at least 98% of white, and the shadows are less than this. I assume the horizontal shadow is at least 100 pixels wide, and your scanned image does not contain important material of this size or larger. Similarly for the vertical line.

Windows BAT script, for IM v6:

Code:

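rem Assumption: %IM% holds the path to the ImageMagick v6 programs,
rem or is empty if convert is already on the PATH.
set SRC=C600_086.tif

rem Minimum length, in pixels, of a horizontal (HLEN) or vertical (VLEN) shadow line.
set HLEN=100
set VLEN=100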

%IM%convert ^
  %SRC% ^
  ( +clone ^
    -threshold 98%% ^
    +write mpr:ORG ^
    +delete ^
  ) ^
  ( mpr:ORG ^
    -negate ^
    -morphology Erode rectangle:%HLEN%x1 ^
    -mask mpr:ORG -morphology Dilate rectangle:%HLEN%x1 ^
    +mask ^
  ) ^
  -compose Lighten -composite ^
  ( mpr:ORG ^
    -negate ^
    -morphology Erode rectangle:1x%VLEN% ^
    -mask mpr:ORG -morphology Dilate rectangle:1x%VLEN% ^
    +mask ^
  ) ^
  -compose Lighten -composite ^
  out.png
snibgo's IM pages: im.snibgo.com
Tillomar
Posts: 3
Joined: 2017-09-12T09:25:37-07:00

Re: Trimming scanned images with random borders?

Post by Tillomar »

Hi *,

sorry for the late reply; I'm preparing for a trip. I will very probably have to delay working on this, as I may not be online for two weeks.
Anyway, I do appreciate every comment, and will review this thread afterwards.

@ozbigben, @snibgo: This is from a duplex feed scanner (Canon DR-2010C). Yes, the problem seems to be auto page size detection; but if the scanner is switched to fixed paper size, then the upper border is sometimes (30%) shifted vertically -- the paper edge detection is not so good. It could be the paper: it is a bright-white coated paper, which may confuse the edge detection algorithm.

The 'image' provided is a real scan, just with the real contents removed. The fine lines present are the paper edge shadows. Those lines appear on three sides at most, as the direction of the scan line lighting prevents a shadow on the top edge. The contents could be quite different between pages. -- I haven't tested the components approach yet.

@snibgo: IF present, the horizontal and vertical shadow lines are much longer, with a rather well-defined length (i.e. page width / page height). Would that be helpful?
At the moment, I just don't understand what your script is doing because I have to look up all the commands...

Thanks a lot!
Tillomar
snibgo
Posts: 12159
Joined: 2010-01-23T23:01:33-07:00
Location: England, UK

Re: Trimming scanned images with random borders?

Post by snibgo »

Tillomar wrote: At the moment, I just don't understand what your script is doing ...
It overwrites your shadows with white. So, from the example you supplied, it makes a completely white output.
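For readers who want the idea without looking up every option, here is a simplified one-dimensional analogue in Python of the threshold, erode, dilate and lighten sequence. It is a sketch, not a translation of the script: it works on a single row of grey values, ignores the `-mask` part, and the function names and sample values are made up.

```python
def erode(bits, n):
    """Keep a 1 only where a full window of n ones starts at this index."""
    return [1 if i + n <= len(bits) and all(bits[i:i + n]) else 0
            for i in range(len(bits))]


def dilate(bits, n):
    """Spread each surviving 1 back over the n positions its window covered."""
    return [1 if any(bits[max(0, j - n + 1):j + 1]) else 0
            for j in range(len(bits))]


def whiten_long_dark_runs(row, min_len, white=255, paper=0.98):
    """Overwrite dark runs of at least `min_len` pixels with white."""
    # Threshold: anything below 98% of white counts as "not paper".
    dark = [1 if v < paper * white else 0 for v in row]
    # Erode then dilate: only runs of at least min_len pixels survive.
    long_runs = dilate(erode(dark, min_len), min_len)
    # "Lighten": take the brighter of the original and the white overlay.
    return [max(v, white * m) for v, m in zip(row, long_runs)]


# A short dark run (2 px, e.g. content) and a long one (4 px, e.g. shadow).
row = [255, 200, 210, 255, 180, 190, 185, 170, 255, 255]
print(whiten_long_dark_runs(row, min_len=3))
```

With `min_len=3` the two-pixel dark run is left alone while the four-pixel run becomes white, which mirrors the assumption that shadows are at least 100 pixels long and real content is not.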