Here is the challenge:
The scan could look like this:
OptionalHeading PageNumber
TextTextTextTextTextTextTextText
TextTextTextTextTextTextTextText
TextTextTextTextTextTextTextText
TextTextTextTextTextTextTextText
If I didn't have the headline and page number, I suppose I could use:
- some threshold to make the margins really white
- some despeckle to remove stray spots
- some autocrop which removes all white borders completely
- some (Linux bash) command in combination with ImageMagick to determine the width of the picture
- some option to add white borders of a calculated width, so that the whole picture ends up at the desired width
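For the simple case without headline and page number, the steps above could be sketched roughly as follows. This is only a sketch under assumptions: the function names `pad_each_side` and `normalize_width`, the 80% threshold and the target width are made up for illustration, and on ImageMagick 7 the `convert` command would typically be `magick`.

```shell
#!/usr/bin/env bash

# Hypothetical helper: white padding (in pixels) needed on EACH side
# to bring an image of width $1 up to the target width $2.
# Prints 0 if the image is already at least as wide as the target.
pad_each_side() {
    local current=$1 target=$2
    if [ "$target" -le "$current" ]; then
        echo 0
    else
        echo $(( (target - current) / 2 ))
    fi
}

# Hypothetical pipeline: whiten margins, despeckle, autocrop, then pad.
# Usage: normalize_width scan.png out.png 1600
normalize_width() {
    local in=$1 out=$2 target=$3 w pad
    # Width after whitening light pixels, despeckling and trimming borders.
    w=$(convert "$in" -white-threshold 80% -despeckle -trim +repage \
        -format "%w" info:)
    pad=$(pad_each_side "$w" "$target")
    # Same cleanup again, then add the white border left and right only.
    convert "$in" -white-threshold 80% -despeckle -trim +repage \
        -bordercolor white -border "${pad}x0" "$out"
}
```

Because the padding uses integer division, the result can end up one pixel narrower than the target when the difference is odd.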
Now the problem is that the *optional* headline and page number should not count toward the border calculation.
So in fact, I need to cut out a horizontal strip from the center where I have text only and do some manipulations and calculations there? Now it gets complicated.
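One way the strip idea might look, as a hedged sketch: take a band from the vertical middle of the page, ask ImageMagick for its trim bounding box via the `%@` format escape (which prints `WxH+X+Y`), and use only the X offset and width from that box to crop the full page. The function names, the choice of the middle half of the page as the band, and the 80% threshold are all assumptions, and a single-page input image is assumed.

```shell
#!/usr/bin/env bash

# Hypothetical helper: from a geometry string "WxH+X+Y" print "X W".
geom_x_w() {
    local geom=$1 w rest x
    w=${geom%%x*}      # everything before the first "x"
    rest=${geom#*+}    # "X+Y"
    x=${rest%%+*}      # everything before the next "+"
    echo "$x $w"
}

# Print "X W": left offset and width of the text, measured only in the
# middle half of the page, so a heading or page number is ignored.
text_columns() {
    local in=$1 dims w h band top geom
    dims=$(identify -format "%w %h" "$in")
    w=${dims% *}
    h=${dims#* }
    band=$(( h / 2 ))
    top=$(( h / 4 ))
    geom=$(convert "$in" -crop "${w}x${band}+0+${top}" +repage \
        -white-threshold 80% -despeckle -format "%@" info:)
    geom_x_w "$geom"
}

# Crop the full page to those columns, then trim; left and right are
# already tight, so the trim mainly removes white at top and bottom.
# Usage: crop_to_text_width scan.png out.png
crop_to_text_width() {
    local in=$1 out=$2 xw x w h
    xw=$(text_columns "$in")
    x=${xw% *}
    w=${xw#* }
    h=$(identify -format "%h" "$in")
    convert "$in" -crop "${w}x${h}+${x}+0" +repage \
        -white-threshold 80% -despeckle -trim +repage "$out"
}
```

The cropped result could then be padded to the desired width with `-bordercolor white -border`. Anything in the heading or page-number line that sticks out beyond the measured text columns is simply cut off by the crop.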
Any ideas?
Btw, at the top and bottom I probably want anything non-white as the reference point, so there is no way to handle "optionals" there.