
Re: Remove horizontal summation lines but keep a minus

Posted: 2018-09-25T10:32:54-07:00
by bratpit
To convert PDF to PNG, use Ghostscript directly, not IM.
IM uses Ghostscript but is an order of magnitude slower.
This is a *nix command; on Windows it will be similar.
gs -sDEVICE=png16m -dDOINTERPOLATE -dQUIET -sOutputFile=%03d.png -dDownScaleFactor=3 -dSAFER -dBATCH -dNOPAUSE -r900 in.pdf
-r900 together with -dDownScaleFactor=3 does on the fly, in memory, the same as convert with
-density 900 -resize 33%
to improve quality, but a lot faster.

For grayscale use
-sDEVICE=pnggray
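
The arithmetic behind the equivalence above: rendering at 900 dpi and downscaling by 3 gives an effective 300 dpi. As a rough sketch, the command can be wrapped so that the render resolution is derived from a target resolution and an oversampling factor. The names `render_pdf` and `GS` are hypothetical, chosen only for illustration; the Ghostscript options are the ones from the command above.

```shell
#!/bin/bash
# Sketch only: render_pdf and GS are illustrative names, not from the post.
GS=${GS:-gs}   # override to point at another Ghostscript binary

# render_pdf TARGET_DPI FACTOR INPUT.pdf
# Renders at TARGET_DPI * FACTOR and lets Ghostscript downscale by FACTOR.
render_pdf() {
  local dpi=$1 factor=$2 in=$3
  "$GS" -sDEVICE=png16m -dDOINTERPOLATE -dQUIET -sOutputFile=%03d.png \
    "-dDownScaleFactor=${factor}" -dSAFER -dBATCH -dNOPAUSE \
    "-r$((dpi * factor))" "$in"
}
```

With these assumptions, `render_pdf 300 3 in.pdf` runs the same command as above, with -r900 and -dDownScaleFactor=3.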

Re: Remove horizontal summation lines but keep a minus

Posted: 2018-09-25T12:35:07-07:00
by snibgo
@isfando: You have reverted to small characters, as you had in your first posts. Why? Your later post had larger characters, which will give better quality, thus better OCR.

Re: Remove horizontal summation lines but keep a minus

Posted: 2018-09-25T15:40:03-07:00
by isfando
snibgo wrote: 2018-09-25T12:35:07-07:00 @isfando: You have reverted to small characters, as you had in your first posts. Why? Your later post had larger characters, which will give better quality, thus better OCR.
@snibgo What shows that I have reverted to small characters? I am not able to grasp it.
I am trying to streamline the approaches you taught me step by step.
I used your script in the 'EARLIER APPROACH' and got good results that are sharp in quality, but I am using convert twice.
In the 'CURRENT APPROACH' I am using convert once, but I am not able to apply the same parameters as in step 1 of the 'EARLIER APPROACH', and my results are not good. My question is: how can I join step 1 and step 2 of the 'EARLIER APPROACH' into step 1 of the 'CURRENT APPROACH'?

********************EARLIER APPROACH******************************
1)

Code: Select all

 convert -density 300 ./sam.pdf -depth 8 -strip -background white -alpha off -threshold 70%  sam.png
The output image sam.png from this step is pretty crisp, so the result in step 3 is also crisp.
https://drive.google.com/open?id=1fBFFo ... HG-8w-6zGI
2)

Code: Select all

convert ^
  sam.png ^
  -strip ^
  ( +clone ^
    -threshold 50%% ^
    -write mpr:ORG ^
    +delete ^
  ) ^
  ( mpr:ORG ^
    -negate ^
    -morphology Erode rectangle:200x1 ^
    -mask mpr:ORG -morphology Dilate rectangle:200x1 ^
    +mask ^
    -morphology Dilate Disk:3 ^
  ) ^
  -compose Lighten -composite ^
  ( +clone ^
    -morphology HMT "1x4:1,0,0,1" ^
  ) ^
  -compose Lighten -composite ^
  ( +clone ^
    -morphology HMT "1x3:1,0,1" ^
  ) ^
  -compose Lighten -composite ^
  ( +clone ^
    -morphology HMT "3x1:1,0,1" ^
  ) ^
  -compose Lighten -composite ^
  -blur 0x0.5 out.png
3) Result image
https://drive.google.com/open?id=1obtnH ... hHFV0VsOCL

################################################################################################################

*********************CURRENT APPROACH*********************************
1) doonepage.bat

Code: Select all

convert ^
  -density 300  ^
  %1 ^
  -depth 8 ^
  -strip ^
  ( +clone ^
    -threshold 50%% ^
    -write mpr:ORG ^
    +delete ^
  ) ^
  ( mpr:ORG ^
    -negate ^
    -morphology Erode rectangle:200x1 ^
    -mask mpr:ORG -morphology Dilate rectangle:200x1 ^
    +mask ^
    -morphology Dilate Disk:3 ^
  ) ^
  -compose Lighten -composite ^
  ( +clone ^
    -morphology HMT "1x4:1,0,0,1" ^
  ) ^
  -compose Lighten -composite ^
  ( +clone ^
    -morphology HMT "1x3:1,0,1" ^
  ) ^
  -compose Lighten -composite ^
  ( +clone ^
    -morphology HMT "3x1:1,0,1" ^
  ) ^
  -compose Lighten -composite ^
  -blur 0x0.5 %2

2) domanypages.bat

Code: Select all

set INPDF=sam.pdf

for /F "usebackq" %%L in (`exiftool -args -PageCount %INPDF%`) do set %%L

set /A LASTPAGE=%-PageCount%-1

for /L %%I in (0,1,%LASTPAGE%) do call DoOnePage %INPDF%[%%I] out_%%I.png

3) Result
https://drive.google.com/open?id=1toqjB ... 5pItrdMeRi

Re: Remove horizontal summation lines but keep a minus

Posted: 2018-09-25T15:42:33-07:00
by isfando
@bratpit

Okay, I will give it a try.

Re: Remove horizontal summation lines but keep a minus

Posted: 2018-09-26T04:04:12-07:00
by isfando
@snibgo

Do you have a shell script alternative to this batch script?

Code: Select all

set INPDF=sam.pdf

for /F "usebackq" %%L in (`exiftool -args -PageCount %INPDF%`) do set %%L

set /A LASTPAGE=%-PageCount%-1

for /L %%I in (0,1,%LASTPAGE%) do call DoOnePage %INPDF%[%%I] out_%%I.png

Re: Remove horizontal summation lines but keep a minus

Posted: 2018-09-26T04:14:30-07:00
by snibgo
I don't understand your question. That is a shell script, for the Windows BAT language. It could be translated to any other shell language.

Re: Remove horizontal summation lines but keep a minus

Posted: 2018-09-26T04:24:46-07:00
by isfando
@snibgo

Sorry, I meant to ask: do you have an alternative to this script in the form of a bash script that can be run on a Linux server?

Re: Remove horizontal summation lines but keep a minus

Posted: 2018-09-26T05:58:27-07:00
by snibgo
No.

Re: Remove horizontal summation lines but keep a minus

Posted: 2018-09-26T06:07:36-07:00
by isfando
@snibgo Okay, thanks a lot. Sorry I had to ask a lot of questions; I lack previous experience with the topic.

Re: Remove horizontal summation lines but keep a minus

Posted: 2018-09-26T07:00:46-07:00
by isfando
@snibgo I was able to formulate the following script for bash. (I feel good to contribute somehow to the topic.)

Code: Select all

#!/bin/bash

INPDF=$1

PAGES=$(exiftool -args -PageCount $INPDF | cut -d'=' -f2) 

N=$(( $PAGES ))

for ((I=1;I<=$N;I++));
do
    
    convert -density 300 $INPDF[$I] -depth 8 -strip -background white -alpha off -threshold 70%% out_${I}.png

    
    convert \
	 out_${I}.png\
	-strip \
	\( +clone \
	-threshold 50%% \
	-write mpr:ORG \
	+delete \
	\) \
	\( mpr:ORG \
	-negate \
	-morphology Erode rectangle:200x1 \
	-mask mpr:ORG -morphology Dilate rectangle:200x1 \
	+mask \
	-morphology Dilate Disk:3 \
	\) \
	-compose Lighten -composite \
	\( +clone \
	-morphology HMT "1x4:1,0,0,1" \
	\) \
	-compose Lighten -composite \
	\( +clone \
	-morphology HMT "1x3:1,0,1" \
	\) \
	-compose Lighten -composite \
	\( +clone \
	-morphology HMT "3x1:1,0,1" \
	\) \
	-compose Lighten -composite \
	-blur 0x0.5 out_${I}.png

done



Re: Remove horizontal summation lines but keep a minus

Posted: 2018-09-27T06:23:38-07:00
by isfando
isfando wrote: 2018-09-26T07:00:46-07:00 @snibgo I was able to formulate the following script for bash. (I feel good to contribute somehow to the topic.)

@snibgo I am using two convert commands in my bash script. How can I feed the output of the first convert command to the second convert command without making a temporary PNG image? I am executing my script in a multithreaded environment.

Re: Remove horizontal summation lines but keep a minus

Posted: 2018-09-27T07:13:26-07:00
by snibgo
For bash, don't double the percent signs.

You can combine the two converts, so you have only one, removing the final write from the first and the initial read from the second, like this:

Code: Select all

convert -density 300 $INPDF[$I] -depth 8 -strip -background white -alpha off -threshold 70% \
  -strip \
  \( +clone \
{and so on}
The second strip is redundant, and can be removed.

"-depth 8" has no effect until the output is written. But do you really want that?

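Putting the advice above together, a single-convert version of the bash loop might look like the sketch below. It is only a sketch under stated assumptions: the names `process_page`, `process_pdf` and `CONVERT` are made up for illustration, the convert options are copied from the scripts earlier in the thread, and page indexes are assumed to be zero-based, as in the batch loop that runs from 0 to LASTPAGE. "-depth 8" is kept from the original and can be dropped if it is not wanted.

```shell
#!/bin/bash
# Sketch: one convert per page, no intermediate PNG on disk.
# process_page, process_pdf and CONVERT are illustrative names, not from the thread.

CONVERT=${CONVERT:-convert}

process_page() {
  local inpdf=$1 page=$2 out=$3
  # Single % signs in bash; the input is quoted so spaces in filenames survive.
  "$CONVERT" -density 300 "${inpdf}[${page}]" \
    -depth 8 -strip -background white -alpha off -threshold 70% \
    \( +clone -threshold 50% -write mpr:ORG +delete \) \
    \( mpr:ORG -negate \
       -morphology Erode rectangle:200x1 \
       -mask mpr:ORG -morphology Dilate rectangle:200x1 \
       +mask \
       -morphology Dilate Disk:3 \) \
    -compose Lighten -composite \
    \( +clone -morphology HMT "1x4:1,0,0,1" \) -compose Lighten -composite \
    \( +clone -morphology HMT "1x3:1,0,1" \) -compose Lighten -composite \
    \( +clone -morphology HMT "3x1:1,0,1" \) -compose Lighten -composite \
    -blur 0x0.5 "$out"
}

process_pdf() {
  local inpdf=$1 pages i
  # Page count as in the earlier script: exiftool prints -PageCount=N.
  pages=$(exiftool -args -PageCount "$inpdf" | cut -d'=' -f2)
  # Assumption: page indexes are zero-based, as in the batch loop (0..LASTPAGE).
  for ((i = 0; i < pages; i++)); do
    process_page "$inpdf" "$i" "out_${i}.png"
  done
}
```

Calling `process_pdf "sam.pdf"` would then write out_0.png, out_1.png and so on. If several PDFs are processed at the same time in one directory, the output names would need to include the PDF name to avoid collisions.
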
Re: Remove horizontal summation lines but keep a minus

Posted: 2018-09-27T23:54:16-07:00
by bratpit
isfando wrote: 2018-09-27T06:23:38-07:00
isfando wrote: 2018-09-26T07:00:46-07:00 @snibgo I was able to formulate the following script for bash. (I feel good to contribute somehow to the topic.)

@snibgo I am using two convert commands in my bash script. How can I feed the output of the first convert command to the second convert command without making a temporary PNG image? I am executing my script in a multithreaded environment.

Check this with Ghostscript.
Besides, your script doesn't preserve spaces in filenames.
This way is better, IMHO.

Code: Select all

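#!/bin/bash

INPDF="$1"
gs -sDEVICE=pnggray -dDOINTERPOLATE -dQUIET -sOutputFile=%03d.png -dDownScaleFactor=3 -dSAFER -dBATCH -dNOPAUSE -r900  "$INPDF"

# find prints paths such as ./001.png, so the directory part is stripped
# before the out_ prefix is added; earlier out_*.png results are skipped.
find .  -mindepth 1 -maxdepth 1  -type f -name '*.png' ! -name 'out_*' |
while IFS= read -r file; do

   convert   "$file" -depth 8 -strip -background white -alpha off -threshold 70% "out_${file##*/}"

done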


Re: Remove horizontal summation lines but keep a minus

Posted: 2018-09-28T04:23:07-07:00
by snibgo
Yes, if spaces are permitted in filenames, they should be quoted.
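
A small illustration of why the quoting matters, assuming a hypothetical filename that contains a space. The helper `count_args` is made up for this example and only reports how many arguments the shell passes on.

```shell
#!/bin/bash
# count_args is an illustrative helper: it prints the number of arguments it receives.
count_args() { echo "$#"; }

file="my scan.pdf"        # hypothetical filename with a space

count_args $file[0]       # unquoted: the shell splits at the space, prints 2
count_args "${file}[0]"   # quoted: a single argument, prints 1
```

In the unquoted case convert would receive "my" and "scan.pdf[0]" as two separate inputs, neither of which exists.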