ImageMagick is a handy tool for manipulating graphics from the command line. In essence, it’s a basic image editor that handles all the standard resizing, color changing, cropping, rotating, etc. With a little cleverness and magick, though, this app is a powerhouse with many features Photoshop can only dream about. I’ve been using ImageMagick (and its Perl and PHP interfaces) for a few years, and I’m still learning new things about it all the time. One of its coolest features is the ability to work with many different file formats, including PDFs. I’ve recently been dealing with some PDFs that I’d like to turn into sample images (à la “try before you buy”). Before today I had a hard time even loading these files into memory, but now I know why.
When dealing with PDFs, ImageMagick rasterizes every page of the file, loading each rasterized page into memory as it works its way through the entire file. With a file that’s 5-10 pages, this isn’t a problem. With a file that’s over 500 pages, it’s practically impossible, because all of those page images have to be held in memory together. One way around the problem is to load only part of the PDF by appending the page numbers, in square brackets, to the filename. Pages are numbered from zero, so in PerlMagick, loading just the first page looks like this:
my $image = Image::Magick->new;
$image->Read('thefile.pdf[0]');
This rasterizes the first page of the file without loading anything past that page. To write the image to disk (in a format that doesn’t require Acrobat Reader), I can use the following:
$image->Write(filename => 'page1.gif');
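The number in brackets is ImageMagick’s frame-selection syntax, and it isn’t limited to a single page: it also accepts a range such as `[0-4]` or a comma-separated list such as `[0,250,499]`. Here’s a small sketch of helpers that build those filename strings. The names `page_spec` and `range_spec` are my own, not part of PerlMagick.

```perl
use strict;
use warnings;

# Hypothetical helpers (not part of PerlMagick) that build the filename
# strings ImageMagick uses to select pages. Page indexes are zero-based.

# A comma-separated list of pages: page_spec('a.pdf', 0, 5) gives 'a.pdf[0,5]'
sub page_spec {
    my ($file, @pages) = @_;
    die "page_spec: no pages given\n" unless @pages;
    return sprintf '%s[%s]', $file, join(',', @pages);
}

# An inclusive range of pages: range_spec('a.pdf', 0, 4) gives 'a.pdf[0-4]'
sub range_spec {
    my ($file, $first, $last) = @_;
    die "range_spec: first page is after last page\n" if $first > $last;
    return sprintf '%s[%d-%d]', $file, $first, $last;
}
```

For instance, `$image->Read(range_spec('thefile.pdf', 0, 4))` would read the first five pages as five separate images inside the same `$image` object, so all five rasterized pages would sit in memory together. That’s fine for a handful of pages, but it’s the same trap as before if the range gets large.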
If you’re following along on your own computer, you’ll now notice that the new image looks horrible. The quality is severely reduced from the original, making most (if not all) of the text unreadable. That’s a problem, since PDFs typically contain a lot of text.
The quality is reduced because ImageMagick reads the PDF in at screen resolution (72 dpi) by default. When Acrobat Reader displays a PDF, it renders the vector-based text at whatever zoom level you choose and anti-aliases it, so it looks decent even when zoomed far out. Rasterizing throws that vector information away and keeps only the pixels that fit at 72 dpi. One way to work around this is to set the resolution (and, in turn, how much detail from the vector information you’d like to retain) before loading the image.
$image->Set(density => '300');
This sets the resolution to 300 dpi. That’s typical magazine resolution, so it’s plenty for my purposes. Now, however, the resulting image is huge: roughly four times as wide and four times as tall as the 72 dpi version (300/72 ≈ 4.2). To fix this, I’ll simply resize the image to a more web-friendly size. The following command will make the picture 500 pixels wide, with the height scaled to keep the proportions:
$image->Resize(geometry => '500');
When an image that was loaded at a high resolution is resized down, ImageMagick can use the extra pixels to anti-alias the text, making it look almost as good as in the original PDF.
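To put some numbers on “huge”: PDF page sizes are measured in points, at 72 points per inch, so a page’s rasterized size in pixels is its size in points divided by 72, times the density. The sketch below assumes a US Letter page (8.5 × 11 inches, or 612 × 792 points); your PDFs may use a different page size. The helper names are made up for illustration.

```perl
use strict;
use warnings;

# Pixel length of a page edge rasterized at a given density (dots per inch).
# PDF page sizes are measured in points, and there are 72 points per inch.
sub raster_pixels {
    my ($points, $density) = @_;
    return int($points / 72 * $density + 0.5);    # round to the nearest pixel
}

# Height after scaling to $target_width with the aspect ratio preserved,
# which is what a width-only geometry such as '500' asks for.
sub scaled_height {
    my ($width, $height, $target_width) = @_;
    return int($height * $target_width / $width + 0.5);
}

# Assumed example page: US Letter, 612 x 792 points.
my ($width_pt, $height_pt) = (612, 792);
for my $dpi (72, 300) {
    printf "%3d dpi: %d x %d pixels\n", $dpi,
        raster_pixels($width_pt, $dpi), raster_pixels($height_pt, $dpi);
}
my $w300 = raster_pixels($width_pt, 300);
my $h300 = raster_pixels($height_pt, 300);
printf "resized to 500 wide: 500 x %d pixels\n", scaled_height($w300, $h300, 500);
```

For a Letter page this prints 612 × 792 at 72 dpi, 2550 × 3300 at 300 dpi, and 500 × 647 after scaling. So the 300 dpi page holds about 17 times as many pixels as the 72 dpi one ((300/72)² ≈ 17.4), and shrinking 2550 pixels down to 500 lets ImageMagick combine roughly five source pixels into each output pixel along each axis, which is where the smoother text comes from.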
Here are the new commands in order with the rest of the steps (and an extra one to free up the memory):
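use strict;
use warnings;
use Image::Magick;
my $image = Image::Magick->new;
$image->Set(density => '300');                  # rasterize at 300 dpi instead of 72
my $err = $image->Read('thefile.pdf[0]');       # load only the first page
die "$err" if "$err";                           # PerlMagick returns errors as strings
$image->Resize(geometry => '500');              # 500 pixels wide, height proportional
$err = $image->Write(filename => 'page1.gif');
die "$err" if "$err";
undef $image;                                   # free up the memory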
Now with a loop, it’s easy to step through each page in the document and convert it. In my program, I only want to use five pages to give someone a sample of the full document. I could use the first five, or jump through the document giving the user a look at the beginning, middle and end.
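Here’s a sketch of that loop. `pick_pages` spreads a fixed number of zero-based page indexes evenly from the first page to the last, so asking for five pages gives the beginning, the middle, the end and two points in between. It assumes you already know the page count (for example from a separate PDF tool); it doesn’t try to get it through ImageMagick. `pick_pages`, `convert_pages` and the `pageN.gif` output names are my own inventions.

```perl
use strict;
use warnings;

# Hypothetical helper: pick $count zero-based page indexes spread evenly
# from the first page to the last page of a document with $total pages.
sub pick_pages {
    my ($total, $count) = @_;
    return ()                if $count < 1;
    return (0 .. $total - 1) if $count >= $total;    # short document: every page
    return (0)               if $count == 1;
    # The step between picks is at least 1 here, so the rounded indexes are distinct.
    return map { int($_ * ($total - 1) / ($count - 1) + 0.5) } 0 .. $count - 1;
}

# Hypothetical helper: rasterize each chosen page separately, so only one
# page is held in memory at a time. Image::Magick is loaded at run time
# because this is the only part of the sketch that needs it.
sub convert_pages {
    my ($pdf, @pages) = @_;
    require Image::Magick;
    for my $page (@pages) {
        my $image = Image::Magick->new;
        $image->Set(density => '300');              # must be set before Read
        my $err = $image->Read($pdf . "[$page]");   # e.g. thefile.pdf[125]
        die "$err" if "$err";
        $image->Resize(geometry => '500');
        $err = $image->Write(filename => 'page' . ($page + 1) . '.gif');
        die "$err" if "$err";
        undef $image;                               # free this page before the next
    }
}
```

With those helpers, `convert_pages('thefile.pdf', pick_pages(500, 5))` would write page1.gif, page126.gif, page251.gif, page375.gif and page500.gif. Creating a fresh object for each page, rather than reading all five pages into one object, keeps the memory footprint to a single rasterized page, which is the whole point of the exercise.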
If you have any questions or notice any errors, let me know by leaving a comment below. Thanks for reading!