Friday, May 12, 2006

S-Nat and XML transactions

As we're getting our new hosting platform in place, we've noticed a few issues popping up. The first larger issue was PayFlowPro from VeriSign and its lack of support for 64-bit operating systems. That was easily worked around, but the next issue has me perplexed.

The load balancers we're using work as NAT (Network Address Translation) devices, so our servers sit on a private network behind them, with the public network on the other side. All traffic to and from our servers goes through the load balancers, so they double as a firewall. When the web servers need to reach outside sites (to request services or information), source NAT rewrites each request so that it appears to come from the public address of the load balancers.

This works well for most things I've tried. OS updates install just fine (rather quickly over the large pipes our data center has, I might add). Payment-related data works just fine (or at least it does now that I have it fixed). I can open an X Window session and browse websites just fine. But the problems start when we try to pass XML from our new servers to our office location: I get a "500 read timeout" almost instantly. I haven't spent a whole lot of time on this, but it's puzzling that most services work while this one (which we use heavily on the site I haven't moved yet) refuses to.
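One way to narrow down a failure like this is to separate "can't connect at all" from "connected, but nothing comes back". The sketch below is only an illustration of that idea, not a fix: the `probe` helper and its return strings are made up for this example, and it uses only the core IO::Socket::INET and IO::Select modules.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Socket::INET;
use IO::Select;

# Hypothetical helper: open a TCP connection, send a request, and wait
# up to $timeout seconds for the first bytes of a reply.
sub probe {
    my ($host, $port, $request, $timeout) = @_;

    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout,
    ) or return 'connect failed';

    print {$sock} $request;

    # can_read returns the handles that become readable within $timeout seconds.
    my @ready = IO::Select->new($sock)->can_read($timeout);
    return 'read timeout' unless @ready;

    # Readable with zero bytes means the other side closed the connection.
    my $bytes = sysread($sock, my $buf, 1024);
    return ($bytes ? 'got data' : 'closed');
}
```

Run from a server behind the load balancer against the office host and port (whatever those are in your setup), a 'read timeout' result would suggest the request is leaving but the reply isn't making it back, which points at the return path rather than the application.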

I'm sure I'll come across the answer soon enough, but until then I'll scratch my head a while.

Wednesday, May 03, 2006

Creating JPG images from PDFs using Perl and ImageMagick

One of the things that has always bugged me is the excessive bandwidth that PDFs use when presenting schematics. We also have a fair number of users who aren't computer literate enough to know what Acrobat Reader is, although this seems to be less of an issue lately. Many of our vendors supply only PDFs, and with the thousands and thousands of schematics we have, manually converting each one just isn't a good option. Enter ImageMagick.

ImageMagick

ImageMagick is available for many languages, but I prefer using Perl to manipulate the images. Most of the concepts will be the same regardless of language, even if the syntax is changed. So, to begin, I'll include the module and assign the path I want to convert files from. I've also bolded any items you may want to change for your implementation in all of the code snippets below.
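
    #!/usr/bin/perl
    use Image::Magick;
    use IO::Dir;
    # Include the trailing slash: file names are appended directly to $path below.
    my $path = [path to images];
    # Tie a hash to the directory so that its keys are the file names it contains.
    tie my %dir, 'IO::Dir', $path;
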
At this point, we've got a few things done. The ImageMagick module is going to be used, and I have a hash tied to the path for filenames. Now, we can start doing some processing within that directory.
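Before moving on, the tied hash can be tried on its own. This is a minimal sketch using only the core IO::Dir module; the `pdf_names` helper is a made-up name for illustration and is not part of the conversion script.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Dir;

# Hypothetical helper: return the sorted names of the PDF files in $path.
sub pdf_names {
    my ($path) = @_;

    # The keys of the tied hash are the entries in the directory,
    # including "." and "..".
    tie my %dir, 'IO::Dir', $path;
    my @pdfs = sort grep { /\.pdf$/i } keys %dir;
    untie %dir;

    return @pdfs;
}

# List the PDFs in the current directory, one per line.
print "$_\n" for pdf_names('.');
```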
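
    foreach (keys %dir) {
        my $file = $_;
        my $pImg = Image::Magick->new;
        # Target name: the same file name with .pdf swapped for .jpg (case-insensitive).
        my $jpg = $file; $jpg =~ s/\.pdf$/\.jpg/i;
        # Skip files already converted. Non-PDF entries keep their own name, so they are skipped too.
        next if (-e "$path$jpg");
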
At this point, we've got our Image object set up and know the file names. Next, we need to work out the density at which to open the file. I suggest opening it at 3 to 5 times your final resolution, and there's a simple way to figure that out. This is our next chunk of code.
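
        # Ping reads the image attributes without loading the pixel data.
        my ($width, $height, $size, $format) = $pImg->Ping("$path$file");
        # Skip files that can't be read, which also avoids dividing by zero.
        next unless ($width);
        my $density = sprintf("%d", 16500 / $width);
        # Clamp the density to the range 60 to 250.
        $density = 60 if ($density < 60);
        $density = 250 if ($density > 250);
        $density = $density . 'x' . $density;
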
This gives us the dimensions and uncompressed size of the image. One item to note here: the dimensions are for the first page and not for the entire document. But since we're working with a select width that we'd like to end up at, it doesn't really matter. I chose 550 as our ideal width to fit on the site, so the calculation is 550 (end width) * 3 * 10 (to get the decimal point in the right spot), or 16500. We divide that by the width to get the density factor we need to use to open it at about 3x the end result size needed, then format it like "200x200". I also added in a minimum of 60x60 and a maximum of 250x250 to make sure it's sufficient quality but doesn't use too much memory. You'll want to experiment with these numbers just a bit.
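To see how the clamping plays out, here is the same arithmetic wrapped in a small function and run against a few widths. It's a sketch: `density_for_width` is a name invented for this example, and the widths are sample values rather than measurements from real files. The 612 case assumes Ping reports a US Letter page as 612 wide (its width in points), which would put the raw result at 26 and trigger the floor of 60.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper wrapping the density arithmetic described above.
sub density_for_width {
    my ($width) = @_;
    my $density = int(16500 / $width);   # truncates like sprintf("%d", ...)
    $density = 60  if $density < 60;     # floor, to keep enough quality
    $density = 250 if $density > 250;    # ceiling, to limit memory use
    return "${density}x${density}";
}

for my $width (50, 100, 612) {
    printf "%4d -> %s\n", $width, density_for_width($width);
}
# Prints:
#   50 -> 250x250
#  100 -> 165x165
#  612 -> 60x60
```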

Now, we want to set the density on our Image pointer object and open all pages of the file, then stack them if there is more than one page. This is actually pretty simple.

        $pImg->Set(density => "$density");
        $pImg->read("$path$file");
        # More than one page: stack the pages top to bottom into a single image.
        if ($pImg->[1]) {
            $pImg = $pImg->Append(stack => 'true');
        }
        next unless ($pImg->[0]);

The last line here checks that we've got at least one page loaded; otherwise the following lines of code will die. Now that we've got the image loaded and all the pages stacked on top of each other, it's time to manipulate. Oh, goody!

We're going to start by trimming the whitespace from the edges, then make sure it's RGB (most of our schematics are B&W) so we can add text of our domain to the newly created image. We'll also Despeckle, Sharpen, and adjust the Contrast before changing it to a JPG and resizing.

        # Trim the surrounding whitespace, then force RGB so the colored text shows up.
        $pImg->Trim();
        $pImg->Quantize(colorspace => 'RGB');
        $pImg->Despeckle();
        $pImg->Sharpen();
        $pImg->Contrast();
        $pImg->Set(magick => 'jpg', compression => 'JPEG', quality => '51');
        # rows is the height and columns is the width, in pixels.
        ($height, $width) = $pImg->Get('rows', 'columns');
        my $newwidth = 550;
        # Scale the height by the same ratio as the width.
        my $newheight = sprintf("%d", $newwidth / $width * $height);
        $pImg->Resize(width => $newwidth, height => $newheight, blur => '1', filter => 'Box');
        # Stamp the domain name on the image.
        $pImg->Annotate(text => 'ToolPartsDirect.com', align => 'Left', x => $newwidth, y => $newheight - 15, fill => 'Blue', rotate => '270', pointsize => '4');
        $pImg->Write("$path$jpg");
    }

Hey, we're done! Some of the settings in there may need to be tweaked for your purposes, but since most PDFs are text-heavy and we've already tuned most of the settings for text, it should be pretty close to what will work for you. Let me know if you spot any settings that look even better, but we're getting reasonably good results with this now. Happy converting!
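If you change the target width, the height calculation from the resize step is easy to check on its own. This is a small sketch of that arithmetic; `scaled_height` is a made-up helper name, and the dimensions passed to it are sample values.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the height that keeps the aspect ratio
# when an image is resized to $newwidth pixels wide.
sub scaled_height {
    my ($width, $height, $newwidth) = @_;
    return int($newwidth / $width * $height);   # truncate to whole pixels
}

# A 1650x2200 image resized to 550 wide.
print scaled_height(1650, 2200, 550), "\n";
```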