Convert many images to a pdf: which is first, conversion or combination?

Post by Tim »

I would like to convert about 250 images (png and jpg files) in a directory to a pdf.

Can you provide the best way to do it?

What are the pros and cons of the following two ways:
  • first use "convert" to convert each image from png or jpg to pdf, and then use "convert" to combine the pdf files into one;
  • use "convert * my.pdf" to do it all in one command.

My requirement is that the whole process
  • takes a reasonable amount of disk space, see viewtopic.php?f=1&t=26544
  • and doesn't degrade the image quality, see viewtopic.php?f=1&t=26546

Post by fmw42 »

Are your pdf files totally vector or do they have raster images in them? Note that IM will rasterize each vector image and the result will be a vector shell around a raster image. So IM is not a very good tool to combine vector images. See http://www.imagemagick.org/Usage/formats/#vector

Post by Tim »

fmw42 wrote:Are your pdf files totally vector or do they have raster images in them? Note that IM will rasterize each vector image and the result will be a vector shell around a raster image. So IM is not a very good tool to combine vector images. See http://www.imagemagick.org/Usage/formats/#vector
All images are raster.

Post by fmw42 »

Then, besides snibgo's suggestion of using a Q8 compile of IM (ImageMagick), the only other way might be to use some of the memory control features of IM (see below). Q8 is an 8-bit compile of IM and Q16 is a 16-bit compile, so Q8 will use only half the memory of Q16.

See -limit and http://www.imagemagick.org/Usage/files/#massive, but I am not an expert on that and do not know if that will help.
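
To put rough numbers on the Q8 versus Q16 difference, here is a back-of-the-envelope sketch. It assumes an IM 6 style non-HDRI build that stores four channel values (RGBA) per pixel, plus hypothetical 2500x3072 images and a count of 250; estimate_bytes is a made-up helper, not an ImageMagick command, and nothing here is a measurement.

```shell
# Rough per-image memory estimate: width * height * channels * bytes per channel.
# Assumption: 4 channels (RGBA) per pixel, as in a non-HDRI IM 6 build.
# estimate_bytes is a hypothetical helper, not part of ImageMagick.
estimate_bytes() {
  # $1 = width, $2 = height, $3 = quantum depth in bits (8 or 16)
  echo $(( $1 * $2 * 4 * ($3 / 8) ))
}

q8=$(estimate_bytes 2500 3072 8)
q16=$(estimate_bytes 2500 3072 16)
echo "Q8:  $q8 bytes per image"
echo "Q16: $q16 bytes per image"
echo "250 images at Q16: $(( q16 * 250 / 1000000 )) MB"
```

If the total for all images held at once is more than the machine has, that is where the -limit memory and -limit map settings may help, since they make IM fall back to disk-backed pixel caches sooner, at the cost of speed.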

Post by pipitas »

Tim wrote:I would like to convert about ~250 images (png and jpg files) in a directory to a pdf.
The best approach very much depends on the answers to the following questions:
  • Are the individual images within this set of 250 small or big in terms of their {width}x{height} dimensions?
  • If they are big, do you need to scale them down to keep your PDF file from growing too large?
In general, a PDF created with 'convert my.jpg my.pdf' will have roughly the same size as the input JPEG.

So, what are the sizes of your PNG and JPEG input files? What is the output of the following command, executed in your image directory?

Code:

du -hsc *.png *.jpg *.jpeg
If you do not (need to) down-sample your images, the PDF you create will roughly be the same size as the 'du -hsc' command suggests...

Post by Tim »

pipitas wrote:
Tim wrote:I would like to convert about ~250 images (png and jpg files) in a directory to a pdf.
The best approach very much depends on the answers to the following questions:
  • Are the individual images within this set of 250 small or big in terms of their {width}x{height} dimensions?
  • If they are big, do you need to scale them down to keep your PDF file from growing too large?
In general, a PDF created with 'convert my.jpg my.pdf' will have roughly the same size as the input JPEG.

So, what are the sizes of your PNG and JPEG input files? What is the output of the following command, executed in your image directory?

Code:

du -hsc *.png *.jpg *.jpeg
If you do not (need to) down-sample your images, the PDF you create will roughly be the same size as the 'du -hsc' command suggests...
84M.

I am not sure if what you ask me to consider is what I plan to do:

I will use -resize 2500x3072 to resize all the images (some are just 800x983) to the largest one among all the images, and use -density 300x300 for the spatial resolution for images in the resulting pdf file.

Is that okay? Am I missing anything?

Post by pipitas »

I suggest you test the complete command chain you have in mind with a single test image.

The first command chain I have in mind is the following:

Code:

convert 2500x3072.jpg 2500x3072.pdf
Now test what resolution and what PDF page size you got:

Code:

pdfimages -list 2500x3072.pdf
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    2500  3072  rgb     3   8  jpeg   no         8  0    72    72 47.0K 0.2%
Within a very large PDF page, your (very large) image is displayed by any PDF viewer (if set to 100% zoom) at 72 PPI.
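
The arithmetic behind that "very large page" can be sketched as follows. The only inputs are the pixel dimensions and the 72 PPI that pdfimages reported; page_inches is a hypothetical helper name used for illustration.

```shell
# Page size in inches = pixels / PPI.
# page_inches is a hypothetical helper for illustration only.
page_inches() {
  # $1 = width in pixels, $2 = height in pixels, $3 = pixels per inch
  awk -v w="$1" -v h="$2" -v ppi="$3" \
    'BEGIN { printf "%.2f x %.2f in\n", w / ppi, h / ppi }'
}

page_inches 2500 3072 72    # the page as created by convert above
page_inches 2500 3072 300   # the same pixels declared at 300 PPI
```

The same pixels map to roughly 34.7 x 42.7 inches at 72 PPI, but to about 8.3 x 10.2 inches at 300 PPI, which is much closer to an ordinary sheet of paper.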

Now let's make that PDF page smaller:

Code:

gs  -o 2500x3072-downsized.pdf -sDEVICE=pdfwrite -g8420x5950 -dPDFFitPage 2500x3072.pdf
and test the outcome again:

Code:

pdfimages -list 2500x3072-downsized.pdf 
page   num  type   width height color comp bpc  enc interp  object ID x-ppi y-ppi size ratio
--------------------------------------------------------------------------------------------
   1     0 image    2500  3072  rgb     3   8  image  no         8  0   247   372 27.9K 0.1%
The tool now reports three things:
  1. The embedded image still has the same number of pixels (2500x3072).
  2. Relative to the new page size, the image has a resolution of 247 PPI in the X-direction and 372 PPI in the Y-direction.
  3. The image's file size has changed from 47.0kB to 27.9kB.
The second point is a bug in the 'pdfimages' tool; I'll have to open a bug report in the issue tracker of the Poppler developers about it. In reality, the tool should report 302.66 PPI for both directions. Its problem is that it mixes up width and height when it does its calculation (note how '302.66 * 302.66 == 247 * 372 == ca. 92000').
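
The 302-odd PPI figure can be reproduced by hand. This sketch rests on two assumptions that the tool output does not state: that -g8420x5950 at pdfwrite's default resolution of 720 dpi yields an 842x595 pt page, and that Ghostscript rotated the portrait image to fit the landscape page, which is what the swapped numbers suggest. fit_ppi is a hypothetical helper.

```shell
# fit_ppi is a hypothetical helper: effective PPI of an image scaled
# uniformly to fit a box given in points (1 pt = 1/72 inch).
fit_ppi() {
  # $1, $2 = image pixels along the box width and height; $3, $4 = box in points
  awk -v pw="$1" -v ph="$2" -v bw="$3" -v bh="$4" 'BEGIN {
    sx = bw / pw; sy = bh / ph      # points per pixel if each axis filled the box
    s = (sx < sy) ? sx : sy         # uniform scaling uses the smaller factor
    printf "%.1f\n", 72 / s
  }'
}

# Assumed rotated placement: the 3072 px side runs along the 842 pt edge.
fit_ppi 3072 2500 842 595
# Geometric mean of the two values pdfimages printed, for comparison.
awk 'BEGIN { printf "%.1f\n", sqrt(247 * 372) }'
```

Under those assumptions the result is about 302.5 PPI in both directions, close to the figure quoted above and to the geometric mean of the two values pdfimages printed.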

The third point can be handled by adding additional parameters to the Ghostscript scaling command, should this concern you for quality reasons.

My advice is: do not mess with '-density 300' for smaller images. It will only blow up the file size of your individual PDFs without giving you any gain in quality. Convert each image as is, directly. Then use the Ghostscript command to scale all PDFs to the same standard size (A4? letter?), if you need that. This way you keep the best control over the quality you get and do not blow up file sizes unnecessarily.
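
The "convert each image as is" step can be sketched as a loop like the one below. It assumes a POSIX shell and that the images sit in the current directory; pdf_name and convert_all are hypothetical helper names. Quoting the variables keeps file names with spaces intact.

```shell
# pdf_name is a hypothetical helper: swap the file extension for .pdf.
pdf_name() {
  printf '%s\n' "${1%.*}.pdf"
}

# convert_all is a hypothetical wrapper: turn every PNG and JPEG in the
# current directory into its own PDF. No -resize or -density is passed,
# so no resampling is requested.
convert_all() {
  for f in *.png *.jpg *.jpeg; do
    [ -e "$f" ] || continue          # skip patterns that matched nothing
    convert "$f" "$(pdf_name "$f")"
  done
}

convert_all
```

Note that a.png and a.jpg would both map to a.pdf here, so rename one of them first if your directory contains such pairs.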

(For more hints about Ghostscript usage when downsampling images within files, changing their color spaces, or downscaling pages, see e.g. the following link: "How to downsample images within PDF file?": http://stackoverflow.com/a/9571488/359307 )

Post by snibgo »

pipitas wrote:... pdfimages ...
Ah, very useful, excellent, thanks.

This answers the problem of extracting images from PDF files. That program writes one PAM file per image, no fuss, no worries about density. IM can read the PAM files in the usual way.

Post by pipitas »

snibgo wrote:
pipitas wrote:... pdfimages ...
Ah, very useful, excellent, thanks.

This answers the problem of extracting images from PDF files. That program writes one PAM file per image, no fuss, no worries about density. IM can read the PAM files in the usual way.
The Windows version does not have all the options I used in my examples. It's still based on the original XPDF code base. On Linux, Mac OSX and Unix you can get a forked version (based on the "Poppler"-fork of XPDF), which added some more options (like the '-list' I made use of). Unfortunately, the Poppler-version is not readily available for Windows as a pre-compiled binary.

Post by Tim »

Thanks. I don't quite understand your reply yet.

According to your reply in the other question viewtopic.php?f=1&t=26544, I ran

Code:

for i in *.png; do convert  ${i}  -resize 2500x3080  -units PixelsPerInch -density 300x300  ${i/.png/.pdf} ; done
for i in *.jpg; do convert  ${i}  -resize 2500x3080  -units PixelsPerInch -density 300x300  ${i/.jpg/.pdf} ; done
pdftk *.pdf cat output all.pdf
The all.pdf file seems okay, about 82MB, and in a pdf viewer, all the pages take up the same-size space on the screen.
2500x3080 is the largest size (both largest width and largest height) in pixels among all the image files, so I guess I don't lose any quality and only enlarge some of the smaller images.
Is there some problem with it?

Do you recommend running the gs command on the pdf file(s) before or after combining the pdf files into one using pdftk?
Why?
pipitas wrote: Now let's make that PDF page smaller:

Code:

gs  -o 2500x3072-downsized.pdf -sDEVICE=pdfwrite -g8420x5950 -dPDFFitPage 2500x3072.pdf
The third point can be handled by adding additional parameters to the Ghostscript scaling command, should this concern you for quality reasons.

My advice is: do not mess with '-density 300' for smaller images. It will only blow up the file size of your individual PDFs without giving you any gain in quality. Convert each image as is, directly. Then use the Ghostscript command to scale all PDFs to the same standard size (A4? letter?), if you need that. This way you keep the best control over the quality you get and do not blow up file sizes unnecessarily.

(For more hints about Ghostscript usage when downsampling images within files, changing their color spaces, or downscaling pages, see e.g. the following link: "How to downsample images within PDF file?": http://stackoverflow.com/a/9571488/359307 )

Post by pipitas »

Tim wrote:The all.pdf file seems okay
What's also interesting for future readers of this thread: how much faster was this method compared to your original one, which did it all in a single command?

Code:

convert * out.pdf
Tim wrote:in a pdf viewer, all the pages take up the same-size space on the screen,
This is because your PDF viewer is very likely set to "scale to fit" by default (maybe that is even the setting contained within the PDF files; I didn't check. If so, it only takes effect if the viewer does not override it).
Tim wrote:all the pages take up the same-size space on the screen
Watch what the "Zoom" control in the window menu bar tells you when you toggle between pages which originated from smaller or larger images.
Tim wrote:Do you recommend to run the gs command on pdf file(s), before or after combining the pdf files into one using pdftk? Why?
I do not recommend it. I'm offering it as an optional step, in case it bothers you that the individual page sizes are different, or too big, formally speaking. But you may be happy with the file as it is...

To check for the real page sizes (which are hidden from you by the viewer's "scale to fit page" behavior), as well as for possible CropBox settings, run this command:

Code:

pdfinfo -box -f 1 -l 300 all.pdf | less
If you want to scale the pages to equal (and smaller) sizes, do it after combining the individual PDFs. Why? It's only one command then. Otherwise you have to do it for each page individually...
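
A minimal sketch of that single command, assuming A4 as the target size. -sPAPERSIZE, -dFIXEDMEDIA and -dPDFFitPage are standard Ghostscript switches, but the wrapper function scale_to_a4 and the output name all-a4.pdf are made up for illustration, and this has not been run against your file.

```shell
# scale_to_a4 is a hypothetical wrapper: fit every page of $1 onto A4, write $2.
scale_to_a4() {
  gs -o "$2" -sDEVICE=pdfwrite -sPAPERSIZE=a4 -dFIXEDMEDIA -dPDFFitPage "$1"
}

# Run it on the combined file only if Ghostscript and the file are present.
if command -v gs >/dev/null 2>&1 && [ -f all.pdf ]; then
  scale_to_a4 all.pdf all-a4.pdf
fi
```

As the earlier single-image test showed, pdfwrite may re-encode the embedded images, so compare file size and quality of the result before discarding the original all.pdf.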

In real life it doesn't matter how big the pages are scaled. As you see, there is always the "scale to page" setting in viewers -- the same is true for printer driver settings ("scale to page" or "scale to fit" -- mostly preselected as the default setting).

Post by Tim »

pipitas wrote:

If you want to scale the pages to equal (and smaller) sizes, do it after combining the individual PDFs. Why? It's only one command then. Otherwise you have to do it for each page individually...
If the pages in a pdf file have different physical sizes (regardless of which zoom level is in use), which command can I use to scale all the pages to have the same physical size?

Post by snibgo »

pipitas wrote:Unfortunately, the Poppler-version is not readily available for Windows as a pre-compiled binary.
It is, through Cygwin.

Post by pipitas »

snibgo wrote:
pipitas wrote:Unfortunately, the Poppler-version is not readily available for Windows as a pre-compiled binary.
It is, through Cygwin.
Ah. Good to know! -- Thanks for the hint.