endenizen

January 8, 2007

HOWTO: ImageMagick and PDFs

ImageMagick is a handy tool for manipulating graphics from the command line. In essence, it’s a basic image editor and will do any standard resizing, color changing, cropping, rotating, etc. With a little cleverness and magick, though, this app is a powerhouse with many features Photoshop can only dream about. I’ve been using ImageMagick (and its interfaces for Perl and PHP) for a few years and I’m still learning new things about it all the time. One of its coolest features is the ability to work with many different file formats, including PDFs. I’ve been dealing with some PDFs recently that I’d like to turn into sample images (à la “try before you buy”). Before today I had a hard time even loading them into memory, but now I know why.

When dealing with PDFs, ImageMagick rasterizes each page of the file, loading these rasterized images into memory as it works its way through the entire file. With a file that’s 5-10 pages, this isn’t a problem. With a file that’s over 500 pages, it’s practically impossible. One way around the problem is to load only part of the PDF by putting a page index in square brackets after the filename. Indexes start at 0, so in PerlMagick, to load just the first page:

use Image::Magick;

my $image = Image::Magick->new;
$image->Read('thefile.pdf[0]');

This rasterizes the first page of the file without loading anything past that page. To write the image to disk (in a format that doesn’t require Acrobat Reader) I can use the following:

$image->Write(filename => 'page1.gif');

If you’re following along on your own computer, you’ll now notice that the new image looks horrible. The quality is severely reduced from the original, making most (if not all) of the text unreadable. This is a problem since PDFs typically contain a lot of text.

The quality is reduced because ImageMagick reads the image in at screen resolution (72 dpi). When the PDF is viewed on the screen by Acrobat Reader, the vector-based text has more information and can be anti-aliased to look decent even when zoomed out very far. Once the page has been rasterized, that vector information is gone and the image is left with only the pixels it was given. One way to work around this is to set the resolution (and in turn, how much detail from the vector data is retained) before loading the image.

$image->Set(density => '300');

This will set the resolution to 300 dpi. This is typical magazine resolution so it’s plenty for my purposes. Now, however, the resulting image is huge. To fix this, I’ll simply resize the image to a more web-friendly size. The following command will make the picture 500 pixels wide:

$image->Resize('500');

When an image is resized after being loaded at a high resolution, ImageMagick is able to use the extra data to anti-alias the text, making it look almost as good as the original PDF.
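To put rough numbers on the difference between the two densities, here is a small sketch that works out the pixel dimensions of a rasterized page. The raster_size helper is made up for this illustration, and the 8.5 x 11 inch (US Letter) page is only an assumption; your PDF may use a different page size.

```perl
use strict;
use warnings;

# Pixel dimensions of a page rasterized at a given density (dots per inch).
# raster_size is a hypothetical helper, not part of PerlMagick.
sub raster_size {
    my ($width_in, $height_in, $dpi) = @_;
    # Inches times dots per inch, rounded to the nearest whole pixel.
    return (int($width_in * $dpi + 0.5), int($height_in * $dpi + 0.5));
}

# Assumed page size: US Letter, 8.5 x 11 inches.
for my $dpi (72, 300) {
    my ($w, $h) = raster_size(8.5, 11, $dpi);
    printf "%3d dpi: %4d x %4d pixels (%.1f megapixels)\n",
        $dpi, $w, $h, $w * $h / 1_000_000;
}
```

For a Letter-sized page that works out to 612 x 792 pixels at 72 dpi and 2550 x 3300 pixels at 300 dpi, or roughly 17 times as many pixels per page. That is why the density setting matters so much for quality, and also why loading many pages at once gets expensive.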

Here are the new commands in order with the rest of the steps (and an extra one to free up the memory):

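use Image::Magick;

my $image = Image::Magick->new;
$image->Set(density => '300');           # must be set before Read
$image->Read('thefile.pdf[0]');          # first page only
$image->Resize('500');                   # 500 pixels wide
$image->Write(filename => 'page1.gif');
undef $image;                            # free the memory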

Now with a loop, it’s easy to step through each page in the document and convert it. In my program, I only want to use five pages to give someone a sample of the full document. I could use the first five, or jump through the document giving the user a look at the beginning, middle and end.
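Here is one way the loop could look. This is a sketch rather than the exact code from my program: sample_pages and convert_page are names invented for this example, and the page count is assumed to be known already (it is passed in by hand here). Any non-empty status returned by Read or Write is treated as fatal, which is stricter than you may want.

```perl
use strict;
use warnings;

# Pick up to $samples zero-based page indexes, spread evenly from the first
# page to the last. The caller supplies $page_count.
sub sample_pages {
    my ($page_count, $samples) = @_;
    return () if $page_count < 1 || $samples < 1;
    return (0 .. $page_count - 1) if $samples >= $page_count;
    return (0) if $samples == 1;
    my $step = ($page_count - 1) / ($samples - 1);
    return map { int($_ * $step + 0.5) } 0 .. $samples - 1;
}

# Rasterize one page at 300 dpi, shrink it to 500 pixels wide and save it.
# Image::Magick is loaded here so sample_pages() works without it installed.
sub convert_page {
    my ($pdf, $page, $out) = @_;
    require Image::Magick;
    my $image = Image::Magick->new;
    $image->Set(density => '300');
    my $status = $image->Read(sprintf('%s[%d]', $pdf, $page));
    die "Read failed: $status" if "$status";
    $image->Resize('500');
    $status = $image->Write(filename => $out);
    die "Write failed: $status" if "$status";
    return $out;    # $image goes out of scope here, releasing its memory
}

# Hypothetical usage, assuming thefile.pdf has 500 pages:
# for my $page (sample_pages(500, 5)) {
#     convert_page('thefile.pdf', $page, sprintf('page%d.gif', $page + 1));
# }
```

With 500 pages and five samples, sample_pages picks indexes 0, 125, 250, 374 and 499, which covers the beginning, middle and end. Passing a small page count (say 5) as the first argument instead gives you the first-five-pages version.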

If you have any questions or notice any errors, let me know by leaving a comment below. Thanks for reading!

January 13, 2006

HOWTO: Secure Browsing with PuTTY and Firefox

The government is watching you! Maybe not, but your boss is watching you! Aw heck, *someone* is watching you! The internet isn’t a safe place, so why not protect your privacy while you’re browsing the web? With these simple steps, you can create a secure “tunnel” between the computer you’re using and a remote server. Your data will be encrypted before being passed through the tunnel, which prevents anyone on the network between you and that server from seeing (or restricting) your internet browsing. This technique is useful if certain websites are blocked (typical of some schools) or if you just don’t want your privacy thrown out the window as your boss monitors every website you visit. Besides, you’re devoted to your job and wouldn’t dare visit sites like Slashdot, BoingBoing or FARK on company time.

Note: For certain software setups and configurations, this can be a very involved process. For this guide, I’ll be using a feature specific to PuTTY. This may work with other SSH software, but I make no guarantees. You should be able to find other solutions easily with Google.

The process:

  1. Get PuTTY.
  2. Optional: If you’re adding this rule to a previously saved session, make sure to select that session and hit load before you continue.
  3. Click on “Tunnels” in the options list (under Connection, then SSH) and enter 1080 for the source port (1080 is the “official” SOCKS port, though you can choose a different one if you so desire). Click the Dynamic radio button and hit Add.

PuTTY Configuration

  4. Optional: Scroll back up to the Session options and save the session.
  5. Make sure the Host Name (or IP) is set correctly in the Session options and click Open.
  6. When you log in, you should have a tunnel between your computer and the server you connected to (if it doesn’t work, make sure you entered all of the information correctly). Now we have to configure Firefox to work with our newly created tunnel.
  7. Open up Firefox and click Tools->Options. Click on Connection Settings at the bottom of the panel.
  8. Select “Manual proxy configuration”, type in 127.0.0.1 for the SOCKS Host and 1080 for its port (or whichever port you used in the PuTTY tunnel).

Firefox Configuration

That’s it! You should now be able to browse the internet through the tunnel. To make sure it’s working, visit www.whatismyip.com and you should see the IP of the server you created the tunnel to. If you have any questions about this HOWTO, please post them below and I’ll respond as best I can.
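If the IP check fails, it can help to find out whether anything is answering on the tunnel port before blaming Firefox. The sketch below is an addition of mine, not part of the PuTTY setup: socks5_alive is a made-up helper that connects to the local port, sends a SOCKS5 greeting and looks for the standard two-byte reply. It assumes the tunnel speaks SOCKS5 and accepts connections without authentication.

```perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# Return 1 if something on $host:$port answers a SOCKS5 greeting, else 0.
# socks5_alive is a hypothetical helper written for this example.
sub socks5_alive {
    my ($host, $port) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => 5,
    ) or return 0;    # nothing is listening, so the tunnel is not up

    # SOCKS5 greeting: version 5, one method offered, 0x00 = no authentication.
    $sock->syswrite("\x05\x01\x00");

    # Wait up to five seconds for an answer instead of blocking forever.
    my $reply = '';
    if (IO::Select->new($sock)->can_read(5)) {
        $sock->sysread($reply, 2);
    }
    close $sock;

    # A SOCKS5 server that accepts "no authentication" replies 0x05 0x00.
    return $reply eq "\x05\x00" ? 1 : 0;
}

# 1080 is the port from the PuTTY step; change it if you picked another one.
print socks5_alive('127.0.0.1', 1080)
    ? "Tunnel looks up\n"
    : "No SOCKS5 proxy answering on port 1080\n";
```

If this reports no proxy, check that the PuTTY session is still logged in and that the Dynamic forward was added before you clicked Open.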