Orders Orders Backward Forward
Comments Comments
© 1997 The McGraw-Hill Companies, Inc. All rights reserved.
Any use of this Beta Book is subject to the rules stated in the Terms of Use.

Chapter 12 - Examples

In the introduction, I talked a little about the diverse way that perl has spread itself into the enterprise. Perl is a language with a couple of simple philosophy tenets:

1) expressability. Being able to say what you mean without the language getting in the way.

2) humility. Realizing that almost all good ideas have been implemented somewhere before. Searching and finding good ideas from these sources and incorporating them into perl itself.

3) interoperability. Being able to work easily with other tools.

These three tenets have paid off quite heavily, and now that you have seen 10 chapters of syntax, now is the time to see perl in action. So here are quite a few perl scripts for your perusal - modify them without compunction: that is the fourth tenet - public scrutiny to a project will do 90% of your work for you!

Chapter Overview

As said above, the following chapter will concern perl scripts in action. We start with what are called 'functional diagrams' which we shall use to clarify some of the logic in the longer scripts, and then proceed to code. We have divided these scripts into 9 different categories:

1) Grep Programs

2) Text/File manipulation

3) Code Generators

4) OLE (object linking and embedding)

5) interacting with GUIs via GuiDo

6) interacting with the Web via libwww

7) CGI scripting

8) Database programming with DBI

9) creating GUI applications with Tk.

In other words, we start with what has been called the 'traditional' role of perl (text manipulation) and then proceed down the line to items that are not normally assumed to be in perl's native environment (perl handles them quite well though, thank you very much).

Realize however that these categories are relatively artificial.

For example, we talk about a mail parser down below - one that can be programmed to get rid of junk mail. We code it as a simple Tk app with two buttons, and make it programmable so that you can enter in a file that does the parsing for you.

What do you call this application? A code generator? A 'text/file' manipulator? A GUI? It is all of these three things, but we choose the GUI - since we have already talked about the two aspects above.

Anyway, enjoy. This chapter covers a rather insane amount of territory, but I feel I owe it to the world in general to show that perl is 'not just for one liners anymore' and that there are quite a few real-world applications out there that fit hand in glove with perl. Just realize that each of these categories probably could generate its own book. In some cases it probably has.

I owe quite a few people for help and support on these examples. I have done a little bit of tweaking of their original apps, so blame me if it doesn't work, and I'll try to get it fixed.

Functional Diagrams:

But first, lets briefly mention a tool that we are going to use in some instances below; the functional diagram.

Functional diagrams are diagrams which show what 'functions are calling other functions' inside your programs. They are used to insure that you actually know what you are doing, and that you have a fairly clean design for your code.

We will see a lot more of them as we get into the 'heftier' pieces of code that we discuss in the OO section of this book.

If you have code that looks like the following:

a()

sub a { b(); c(); }

sub b { d(); }

sub c { print "END"; }

sub d { print "END"; }

then you have generated a functional diagram that looks something like Figure 12.1:

121.fig

Figure 12.1

Sample functional diagram.

Hence a() calls b() and c(), and b() calls d(); a nice clean hierarchy. This is what you will want your functional diagrams to look like. If you see code like this:

a()

sub a { b(); c(); }

sub b { d(); }

sub c { print "END"; }

sub d { a(); c(); }

Then you have generated a hierarchy that looks like Figure 12.2:

122.fig

Figure 12.2

Functional diagram with a circular dependancy

You want to avoid circular dependancies, and 'cross' dependancies like this, at pretty much all costs. Circular dependancies make your code less changeable. For if you change a component in d() then you have to potentially change a() and c() as well. Nice hierarchies limit your succeptability to change.

Anyway, keep functional diagrams in mind. When we get to some of the more complicated scripts below, we will use them as an explaining tool.

Grep Programs

One of the original tasks that perl was designed for was, of course, text manipulation. If you had Gigabytes of text, then it was - and is - a short step to use perl to actually find anything in that vast waste of information.

So, the first type of program we will consider here is the 'grep'.* When actually writing code, grep is infinitely useful for finding one's way through the detail inside the code, and to get a good picture about what the code is doing in the large. For example, you can say:

**** BEGIN NOTE ***

If you are interested about word history, grep stands for Globally search for a Regular Expression, and then Print it out. Comes from 'ed' the command line editor.

prompt% grep custID <filelist>

where custId is a pattern, and filelist is the list of files that you are looking for custId in. And the command

prompt% vi `grep -l custID <filelist>`

will actually let you look at all the files that contain the string custID. If you were editing code this would help quite a bit.

Perl was, and is, used to build on that good idea. With perl, you can build a thousand different specialized greps, each that has its own 'personalized touch', and greps for different types of patterns. We implement two different types of grep below.

There's more than one way to do it: context grep

The first type of grep we will implement is the context grep. Notice that the grep above (that shows only one line) has limited use, if you want to have an idea of a context for a given pattern.

Context grep is meant to change that. If you say something like:

C:\> cgrep.p regular compdefinition

you will get output like:

----------------------

 

 

grep: tool a Unix command for searching files for lines matching a given regular expression Named after the qed/ed editor subcommand "g/re/p" where re stands for regular expression, to Globally search for Regular expression and Print the lines containing matches to it. There are two other major variants, fgrep which searches only for fixed strings and egrep which accepts extended RE's but is usually the

-----------------------

In other words, you get an idea of how the word is being used, so as to make a better judgement about whether that portion is important to you.

Since this is such a basic tool, it is also a good candidate for showing exactly how many different, yet equivalent, solutions you can derive in perl. We will implement three separate and not necessarily equal solutions for the grep problem below.

And one key point here will drive all of our solutions. Note that if we are not careful, we could get output like:

----------------------

named after a theorem in calculus.

 

grep: tool a Unix command for searching files for lines matching a given regular expression Named after the qed/ed editor subcommand "g/re/p" where re stands for regular expression, to Globally search for Regular expression and Print the lines ----------------------

 

----------------------

grep: tool a Unix command for searching files for lines matching a given regular expression Named after the qed/ed editor subcommand "g/re/p" where re stands for regular expression, to Globally search for Regular expression and Print the lines containing matches to it. There are two other major variants, fgrep which searches only for fixed strings and egrep which accepts extended RE's but is usually the

-----------------------

We have doubled our text here. This quite annoying, and happens if we do the simple thing, and just print out two lines before and two lines after each time we see regular. Something like this:

for ($xx = 0; $xx < @lines; $xx++)

{

if ($lines[$xx] =~ m"$pattern")

{

my ($low) = ($xx - 2 > 0)? 0 : $xx-2;

my ($high) = ($xx + 2 > $#lines)? $#lines : $xx + 2;

print "@lines[$low..$high];

}

}

will lead to this doubling. To get the more useful 'squashing effect', to get:

----------------------

 

 

grep: tool a Unix command for searching files for lines matching a given regular expression Named after the qed/ed editor subcommand "g/re/p" where re stands for regular expression, to Globally search for Regular expression and Print the lines containing matches to it. There are two other major variants, fgrep which searches only for fixed strings and egrep which accepts extended RE's but is usually the

-----------------------

where the output is readable, we will need to be a bit clever.

context grep #1: via regular expressions

The first approach we can take to the context problem is to actually think of the entire context as 'one big regular expression.'. If our pattern was 'scout', the problem of matching the bold in:

boat cruise

relax breathe easy

campfire boundary waters

superior grandmarais

boy scout cooking

teepee camp fire

girl scout flintlock

woodsmoke tentpeg

merit badge mosquitos

samsonite tv

briefcase civilization

is after all a regular expression problem: matching the first two lines before the first occurrence of scout, and then continuing to match until we find two interrupted lines where scout does not occur as a pattern. Hence, we can load up the entire file into a pattern, and simply treat the solution as such:

Listing 12.1 cgrepRegexp.p

1 #!/usr/local/bin/perl5

2

3 use strict;

4

5 my $pattern = shift(@ARGV);

6

7 open(FD, "$ARGV[0]");

8 undef $/;

9 my $line = <FD>;

10 my ($beginning, $matched);

11

12 print "\n";

13 while (

14 $line =~ m{

15 (.*?)

16 ($pattern)

17 (([^\n]*\n){0,3}[^\n]*?(?=$pattern)|([^\n]*\n){0,3})

18 }sgxo

19 )

20 {

21 my ($allBeforeText, $pattern2, $afterText) = ($1,$2,$3);

22

23 my ($beforeText) =($allBeforeText =~ m"((?:[^\n]*?\n){0,3}.*?)$"s);

24

25 print "-----------------------------------\n"

26 if ((!$beginning) || ($beforeText ne ''));

27

28

29 print "$beforeText$pattern2$afterText";

30

31 $beginning = 1;

32 $matched = 1;

33 }

We'll take this slow because the regular expressions in this are quite a handful. In all the cases below, what the regular expression matches is in bold. First, Lines 14 through 19 match:

boat cruise

relax breathe easy

boundary waters

campfire cooking

boy scout grandmarais

teepee superior

ie: all the text up to and including 'scout'. Line 23 then prunes this down to

boat cruise

relax breathe easy

boundary waters

campfire cooking

boy scout grandmarais

teepee superior

and then line 29 prints out the text that we have found. On the second pass through, the regular expression matches:

boy scout grandmarais

teepee superior

girl scout flintlock

woodsmoke tentpeg

 

and then prints this out - and finally on the third pass, it matches:

teepee superior

girl scout flintlock

woodsmoke tentpeg

merit badge mosquitos

samsonite tv

briefcase civilization

And line 29 prints out the final text, plus a '------------' indicating that we are done.

context grep #2: via references

The problem with the above solution is that it is a bit slow. Regular expressions are on the whole pretty speedy, but in situations like this (where the regular expression gets quite complicated) the regular expression might end up doing a lot of backtracking. Hence we might want to do this by references. Say that we were trying to match the same text - we could turn something like:

boat cruise

relax breathe easy

campfire boundary waters

superior grandmarais

boy scout cooking

teepee camp fire

girl scout flintlock

woodsmoke tentpeg

merit badge mosquitos

samsonite tv

briefcase civilization

into two distinct matches, each with its own range. The first match would grab lines two through six, ie:

0 boat cruise

1 relax breathe easy

2 campfire boundary waters

3 superior grandmarais

4 boy scout cooking

5 teepee camp fire

6 girl scout flintlock

7 woodsmoke tentpeg

8 merit badge mosquitos

And the second match would grab lines four through eight, like:

2 campfire boundary waters

3 superior grandmarais

4 boy scout cooking

5 teepee camp fire

6 girl scout flintlock

7 woodsmoke tentpeg

8 merit badge mosquitos

9 samsonite tv

10 briefcase civilization

Now, if we keep track of all these matches, we could merge the two references together, thus removing all of the duplicate material:

2 campfire boundary waters

3 superior grandmarais

4 boy scout cooking

5 teepee camp fire

6 girl scout flintlock

7 woodsmoke tentpeg

8 merit badge mosquitos

as the lines that we are going to print out.

The following script does this via references. It keeps track in an array, '@matchedRange' of all such ranges that we will merge later. And then, at the end of each file we consider, the script 'merges' these ranges, printing out blocks of code.

Listing 12.2 cgrepRef.p

1 #!/usr/local/bin/perl5

2

3 use Getopt::Long;

4 use FileHandle;

5

6 GetOptions

7 (

8 \%opt, "savelines=i", "filelist:s"

9 );

10

11 my $pattern = shift(@ARGV);

12

13 if (defined ($opt{filelist}))

14 {

15 my $fd = new FileHandle("$opt{filelist}") if ($opt{filelist});

16 chop(@files = <$fd>);

17 }

18 else

19 {

20 @files = @ARGV;

21 }

22

23 my $file;

24 my $sl = $opt{savelines} || 3;

25

26 foreach $file (@files)

27 {

28 my $fd = new FileHandle("$file") ||(print "Couldn't open $file!!!\n",next);

29 my @lines = <$fd>;

30

31 my ($xx, @matchRange);

32 for ($xx = 0; $xx < @lines; $xx++)

33 {

34

35 if ($lines[$xx] =~ m"$pattern"o)

36 {

37 push

38 (

39 @matchRange, [

40 (($xx-$sl) > 0) ? $xx - $sl: 0,

41 (($xx+$sl) > $#lines) ? $#lines : $xx+$sl

42 ]

43 );

44 }

45 }

46 my $mr;

47 if (@matchRange)

48 {

49 print "$file\n";

50

51 foreach $mr (@matchRange)

52 {

53 if ($print_end)

54 {

55 print "----------------------------------------\n";

56 }

57

58 $print_end = 1;

59 my $low = $mr->[0];

60 my $high = $mr->[1];

61

62 foreach $xx ($low..$high)

63 {

64 if (!$printed{$xx})

65 {

66 print $lines[$xx];

67 $printed{$xx} = 1;

68 }

69 else

70 {

71 $print_end = 0;

72 }

73 }

74 }

75 undef %printed;

76 }

77 }

Again, lines 37 through 43 do the 'grabbing of ranges' looking at the regular expression we want to match to print out the right information. And lines 62 through 73 keep track of the merge process. In particular, line 67 keeps track of a hash, which 'tags' if a line has not been printed, so we only print a line once.

context grep #3: via a 'stack' (idea by

What can we say about the second, proposed solution? Well, it is more time efficient (takes a lot less time than the regular expression solution), but it sure isn't space efficient. To get space efficiency, we need to move to the concept of a 'shifting window'.

Again, lets look at our sample text:

boat cruise

relax breathe easy

campfire boundary waters

superior grandmarais

boy scout cooking

teepee camp fire

girl scout flintlock

woodsmoke tentpeg

merit badge mosquitos

samsonite tv

briefcase civilization

What would it mean to have a stack here? Well, using a stack would involve keeping a limited history of what lines we have matched before. Suppose that you wanted to match three lines surrounding the pattern 'scout': you would first match the first three lines:

boat cruise

relax breathe easy

campfire boundary waters

superior grandmarais

boy scout cooking

and then you would shift through them, until you see a line that has scout in it:

Step #1:

boat cruise

relax breathe easy

campfire boundary waters

superior grandmarais

boy scout cooking

Step #2:

boat cruise

relax breathe easy

campfire boundary waters

superior grandmarais

boy scout cooking

Now that you see 'scout', you then save the three lines in your stack (in bold), and then get rid of the stack. You keep on going:

Step #3:

boy scout cooking

teepee camp fire

girl scout flintlock

woodsmoke tentpeg

Step #4:

boy scout cooking

teepee camp fire

girl scout flintlock

woodsmoke tentpeg

and now, you save the two lines, since you have found the text 'scout' again. In the process, you keep track of whether or not you have found three lines in a row without 'stack' in them, at which point you print the lines you have saved.

Here is the code to do this, called 'cgrepStack.p':

Listing 12.3 cgrepStack.p

1 #!/usr/local/bin/perl5

2

3 use strict;

4

5 my $line;

6 my @stack;

7 my @printed;

8 my $countdown;

9

10 my $no;

11 if ($ARGV[0] =~ m"^\d+$")

12 { $no = shift(@ARGV) }

13 else

14 { $no = 3; }

15

16 my $pattern = shift(@ARGV);

17

18 my ($file, $fileprint);

19 foreach $file (@ARGV)

20 {

21 open(FD, $file);

22 while ($line = <FD>)

23 {

24 push(@stack, $line);

25 shift(@stack) if (@stack > $no);

26

27 my $pushed;

28 if (($line =~ m"$pattern") && (@printed == 0))

29 {

30 $countdown = $no;

31 $pushed = 1;

32 push (@printed, @stack);

33 }

34 elsif ($line =~ m"$pattern")

35 {

36 $countdown = $no;

37 }

38

39 if ($countdown > 0)

40 {

41 if ($pushed == 0) { push(@printed, $line); }

42 else { $pushed = 0; }

43 $countdown--;

44 }

45

46 if (($countdown == 0) && (@printed > 0))

47 {

48 (print ("$file:\n"), $fileprint = 1) if (!$fileprint);

49 print "--------------------\n";

50 print "@printed";

51 @printed = ();

52 }

53 }

54 if (@printed > 0)

55 {

56 print "--------------------\n";

57 print "@printed";

58 @printed = ();

59 }

60 @stack = ();

61 $countdown = 0;

62 $fileprint = 0;

63 close(FD);

64 }

65

66 if (@printed > 0)

67 {

68 print "-----------------\n";

69 print "@printed";

70 }

Lines 19-20 keep track of the stack, shifting as necessary. The rest of the lines keep track of what is to be printed and what is not to be printed.

Now what can we say about this algorithm? Well, it definitely saves space. The first thing that you might recognize is that we are using:

while ($line = <>)

instead of

for ($xx = 0; $xx < @lines; $xx++)

which means we can save on quite a bit of space - the only lines we need to store are the ones to be printed out. Second of all, it is pretty fast. We are only going through each line once, and matching on that line twice (with a little optimization, we could probably bring that down to matching once as well.)

Hence, this is probably the best of the three algorithms that we shall see. And hence we shall use it below.

Using cgrepStack.p

Since this is the fastest algorithm that we came up with, lets give a small idea of how it might be used. Ironically there was a very good example of using this script - I had talked to XXXXXX about 6 months ago, and I had forgotten who had contributed the idea for this code!

Anyway, I have all my mail I ever received saved away, and to find out who had contributed the prototype code, I said:

prompt(/mail)% cgrepStack.p 10 'shift\(' *

The '10' means that I was looking for 10 lines of surrounding context, and the 'shift\(' means that I was looking for the pattern 'shift(' (remember, these are perl regular expressions, so you need to back-shift special characters like \(.

This can save you plenty of time for this type of problem - we could probably improve this version of cgrep quite a bit as well.

filegrep - matching multiple files with multiple patterns.

The second, and last version of grep we will consider is called 'filegrep' and is useful in going the opposite direction of cgrep: grabbing information on multiple patterns and summarizing it all into a report.

For example, suppose you had some code that had a list of tables inside:

author

publisher

editor

binder

graphical_artist

and you want to find out:

1) a list of filenames where all the terms were used (each and every one)

2) lines where one or more of the terms were used

3) files where none of the terms are used.

4) a list of filenames where all the terms were used.

These issues are the staple of many computer science problems: migrating code to a new version, tracking down changes in a data model, and in general, it makes you a lot more powerful on the 'managing your data front'.

With the following script, you can do all of these things. It is called 'filegrep.p' because it supports the concept of having more than one pattern in a 'pattern list' file, and more than one file in a 'file list' file. Hence, whereas in unix you could say:

prompt% egrep 'a|b|c|d|e|f|g|h|i|j' *

with filegrep.p you could say:

prompt% filegrep.p -patlist patterns -filelist files

where 'patterns' was a file that contained a list of patterns, and 'files' was a file that contained a list of files.

This script is quite useful on Unix, and is a lifesaver for doing development on NT, which has no real equivalent of filegrep.p. As it has been so useful in the past, it has sprouted quite a few options:

Listing 12.4 filegrep.p

1 #!/home/epeschko/perl50043/install/bin/perl

2

3 use FileHandle;

4 use Getopt::Long;

5 use strict;

6

7 my %varb;

8

9 my ($opt, $usage) =

10 GetOptions

11 (

12 \%varb, '--filelist:s', '--patlist:s',

13 '--show!','--noshow!', '--and!', '--found!', '--notfound!' ,

14 '--debug'

15 );

16

17

18 my ($file, $line);

19 my @files;

20 my @patterns;

21 my $pattern;

22 my $code;

23

24 if ($varb{'patlist'})

25 {

26 my $fd = new FileHandle("$varb{'patlist'}");

27 chop(@patterns = <$fd>);

28 close($fd);

29 }

30 else

31 {

32 @patterns = shift(@ARGV);

33 }

34

35 if ($varb{'filelist'})

36 {

37 my $fd = new FileHandle("$varb{'filelist'}");

38 chop (@files = <$fd>);

39 close($fd);

40 }

41 else

42 {

43 @files = @ARGV;

44 }

45

46 @files = grep (!-d, @files);

47

48 grep (s{ ([^\\])" }{ $1\\" }xg, @patterns);

49

50 $code = <<"END_OF_CODE";

51 my \$yy = 0;

52 foreach \$file (\@files)

53 {

54 my (\$matched, \$have) = (1,1);

55 my \$fd = new FileHandle("\$file");

56 my (\$marker, \@skip) = (0,());

57 # print ("FILES: ", \$zz-1, "\\n") if ((\$zz++)%1000 == 0);

58 while (defined (\$line = <\$fd>))

59 {

60 study(\$line);

61 # print ("\tLINES: ", \$yy-1, "\\n") if ((\$yy++)%1000 == 0);

62 END_OF_CODE

63

64 my $xx = 0;

65 foreach $pattern (@patterns)

66 {

67 if ($varb{'show'})

68 {

69 $code .="\t\t(print (\"\$file:\$line\"), next) if (\$line =~ m\"$pattern\"o);\n";

70 }

71 elsif ($varb{'noshow'})

72 {

73 $code .="\t\t(\$matched = 1, last ) if (\$line =~ m\"$pattern\"o);\n";

74 }

75 elsif ($varb{'nothave'})

76 {

77 $code .= "\t\t(\$have = 1, last) if (\$line =~ m\"$pattern\"o);\n";

78 }

79 elsif ($varb{'and'})

80 {

81 $code .=

82 "\t\t(\$marker++, \$skip[$xx] = 1, \$marker == ${\($#patterns+1)} && last )

83 if ((!\$skip[$xx]) && (\$line =~ m\"$pattern\"o));\n";

84 $xx++;

85 }

86 elsif ($varb{'found'})

87 {

88 $code .= "\t\t(print (\"$pattern\\n\"), \$already{$pattern} = 1) if (\$line =~ m\"$pattern\"o && (!\$already{$pattern}));\n";

89 }

90

91 elsif ($varb{'notfound'})

92 {

93 $code .= "\t\t\$already{$pattern} = 1 if (\$line =~ m\"$pattern\"o && !\$already{$pattern});\n";

94 }

95 else

96 {

97 $code .="\t\t(print (\"\$file\\n\"), last) if (\$line =~ m\"$pattern\"o);\n";

98 }

99 }

100

101 $code .=

102 " }\n";

103

104 if ($varb{'nothave'})

105 {

106 $code .= "\n\t\tprint (\"\$file\\n\") if (!\$have);";

107 }

108 if ($varb{'noshow'})

109 {

110 $code .= "\n\t\tprint (\"\$file\\n\") if (!\$matched);";

111 }

112 if ($varb{'and'})

113 {

114 $code .= "\n\tprint (\"\$file\\n\") if (\$marker == ${\($#patterns+1)});";

115 }

116 $code .=

117 "

118 close(\$fd);

119 }

120 ";

121

122 if ($varb{'notfound'})

123 {

124 foreach $pattern (@patterns)

125 {

126 # $code .= "\nprint \"$pattern :\$already{$pattern}:\n\";";

127 $code .="\nif (\$already{$pattern} != 1){ print \"$pattern\\n\"; }";

128 }

129 }

130

131 print $code if ($varb{'debug'});

132 eval $code if (!$varb{'debug'});

133 die $@ if $@;

134

'filegrep.p's basic mechanism for working is 'code generation' (we could have put it under the code generation section as well) If you say something like:

C:\> perl cgrep.p -patlist patterns -filelist files

where 'patterns' contains:

pattern1

pattern2

pattern3

pattern4

pattern5

this generates the following code:

my $yy = 0;

foreach $file (@files)

{

my ($matched, $have) = (1,1);

my $fd = new FileHandle("$file");

my ($marker, @skip) = (0,());

while (defined ($line = <$fd>)

{

study($line);

(print ("$file\n"), last) if ($line =~ m"$pattern1"o);

(print ("$file\n"), last) if ($line =~ m"$pattern1"o);

(print ("$file\n"), last) if ($line =~ m"$pattern1"o);

(print ("$file\n"), last) if ($line =~ m"$pattern1"o);

(print ("$file\n"), last) if ($line =~ m"$pattern1"o);

}

close($fd);

}

The code it generates is pretty dumb; a wrapper around a bunch of 'if' statements. Lines 50 - 129 are the heart of this algorithm - starting at the 'here document' of lines 50-62:

50 $code = <<"END_OF_CODE";

51 my \$yy = 0;

52 foreach \$file (\@files)

...

62 END_OF_CODE

and ending at the last 'if' statement. The heart of the flexibility is that different code is generated by different statements at the command line. You can see this code generation in action by putting the '-debug' flag, by saying:

prompt% filegrep.p -patlist pattern -filelist files -debug

which then shows you the code generated rather than executing the code.

Some uses of filegrep.p are listed below:

1) filegrep.p pattern *. The default works like a regular grep, except it only prints out file names that have the pattern, rather than the line.

2) filegrep.p -patlist patterns -show * looks through all the files given by the expansion of '*' and actually shows the lines where a pattern inside the file 'patlist' is given.

3) filegrep.p -patlist patterns -filelist list -noshow looks through all the files named inside 'list', and prints out the ones that have one or more patterns inside them.

4) filegrep.p -patlist patterns -filelist list -and: Only lists filenames that have all the patterns that are contained inside the file 'patterns'.

Filegrep.p is my 'industrial strength' tool to look for patterns. And as such, it works wonders. If you are used to Unix, and are working on NT, you can use filegrep.p along with find2perl to emulate a lot of the functionality you can find on Unix.

Text/File Manipulation

Text/File manipulation is really just a more broad version of 'grep'. The goal here is to transform data that you already know the structure of into something else which is useful.

This covers a whole slew of problems, and since perl is the master of manipulating text in the first place, the number of applications that perl can support in this area is astounding.

Below, we take five of these applications:

1) indexing a file (so you can find occurences of given words)

2) comparing and contrasting directory structures

3) deleting garbage files

4) solving a simple cipher

5) a regular expression to match numbers.

All of these are useful; some in their own right (I used a version of the indexing script on this book, for example) Others, like the regular expression to match numbers, you can use in your own code.

indexing a file

Indexing a file is a major pain - its hard to believe that before the advent of computers, that people actually made concordances of Shakespeare - where they listed every single word, and where it occured in every play - by hand.

Of course, now with perl, the same concordances take about 5 minutes or less (you can do a concordance in C as well, but writing the program might take as long as doing the concordance by hand!)

Here is a small program which does an index of where words are in documents, given input files. If you say:

prompt% index.p -filelist file -wordlist words

where 'file' contains a list of files you want to index, and 'words' contain a list of words that you care about, then perl will produce a structure which looks like:

$VAR1 = {

'shift' => [

'./cgrepRef.p: 11',

'./cgrepRef.p: 12',

'./cgrepStack.p: 16',

'./cgrepStack.p: 25',

'./cgrepRegexp.p: 5'

]

};

I use 'Data::Dumper' to simply dump out the data in a way which is sort of legible. You may wish to do more:

Listing 12.5 index.p

1 #!/usr/local/bin/perl5

2

3 use Getopt::Long;

4 use strict;

5 use Data::Dumper;

6 use FileHandle;

7

8 my (%opt);

9

10 GetOptions(\%opt, "--filelist:s", "--wordlist:s" ,

11 "--page", "--line", "--delimiter");

12

13

14 my ($word, $words);

15 my (%inwords, %position);

16

17 if ($opt{'wordlist'})

18 {

19 my $FH = new FileHandle($opt{'wordlist'});

20 foreach $words (<$FH>)

21 {

22 chop($word = $words);

23 $inwords{$word} = 1;

24 }

25 }

26

27 local($/) = ($opt{'page'})? " Page " :

28 ($opt{'delimiter'})? $opt{'delimiter'} :

29 ($opt{'line'})? "\n" :

30 "\n";

31

32

33 my ($file, $unit, @files);

34 if ($opt{'filelist'})

35 {

36 my $fh = new FileHandle($opt{'filelist'});

37 @files = <$fh>;

38 }

39 else

40 {

41 @files = @ARGV;

42 }

43

44 foreach $file (@files)

45 {

46 my $fieldno = 0;

47 chomp($file);

48 my $fh = new FileHandle("$file") || next;

49

50 while (defined ($unit = <$fh>))

51 {

52 $fieldno++;

53 chop($line);

54 my @words = ($unit =~ m"((?:[\@\%\$]|\b)\w+)\b"sg);

55 my $word;

56

57 foreach (@words) { tr"A-Z"a-z"; }

58 foreach $word (@words)

59 {

60 if (!$opt{'wordlist'} || $inwords{$word})

61 {

62 push(@{$position{$word}}, "$file: $fieldno");

63 }

64 }

65 }

66 }

67

68 print Dumper(\%position);

The key to this whole algorithm lies in line 54 - which splits what we are calling a 'unit' into a series of words - which may be (not necessarily) preceeded by a '@','%' or '$'. This allows for perl variables to be indexed as well.

In lines 57-65 then, we actually look to see if we want to process the word (line 60), and then push the information about where the word is located onto a stack called '%position' (line 62). We go through each file, doing this for everything, and finally we print out the whole data structure as found. (line 68)

Other interesting things about this algorithm. Lines 27-30 actually let us index by *other* things, besides the line number. We can index by a different delimiter (if we say: '-delimiter whatever' on the command line)

Or, we can index by the special keyword 'Page ', which so happens to be something that the prints of this book were seperated by. Combining this fact with the ability to automate the saving of this file into text (via OLE and GuiDo), with the context grepping we saw above, and well, even a monster of a book like this wasn't too difficult to index.

comparing and contrasting directory structures

This script comes in quite handy when you are given a new installation of a package from another vendor. Suppose you've a piece of software, 'Snappy Keys' which so happens to run your whole business software department. And one day, you get a new version of Snappy Keys, and it simply stops working.

Now, there are two major ways to deal with this: one, install the old software, tell Snappy Keys about it, and delay installing the new software until it gets fixed.

This is the passive way, and unfortunately, sometimes it never gets fixed since the vendor of Snappy Keys can't tell what is wrong. The second way is to troubleshoot the problem with snappy keys, and that is what the follow script does.

To use it, you have two parallel installations of products on disk, and you specify, through an item called '@G::WantedElements', exactly which elements you wish to compare. The script then goes through each subdirectory, comparing file by file, and noting:

1) which files differ in the aspects that you mentioned via @G::WantedElements (default is size).

2) which files exist in one distribution, but don't exist in another.

So, with this information in hand, you can give the vendor an idea of where to look (surprisingly, I've found that vendors won't or can't do this type of analysis themselves. I wish vendors could do this - it would make them much more user friendly to deal with)

Listing 12.6 dircompare.p

1 #!/usr/local/bin/perl5

2 #

3 #use strict;

4 #use DirHandle;

5 #

6 #%G::StatElements =

7 # (

8 # 'dev' => 0, 'ino' => 1, 'mode' => 2, 'nlink' => 3,

9 # 'uid' => 4, 'gid' => 5, 'rdev' => 6, 'size' => 7,

10 # 'atime' => 8, 'mtime' => 9, 'ctime' => 10, 'blksize' => 11,

11 # 'blocks' => 12

12 # );

13 # @G::WantedElements = ('mode','nlink','uid','gid','size','atime','mtime');

14 @G::WantedElements = ('size');

15

16 my $firstRoot = $ARGV[0];

17 my $secondRoot = $ARGV[1];

18

19 die "Need Two Arguments!\n" if (@ARGV != 2);

20 filesystemDiff($firstRoot, $secondRoot);

21

22 sub filesystemDiff

23 {

24 my ($root1, $root2) = @_;

25

26 print "Comparing $root1 to $root2:\n";

27 print "---------------------------------\n";

28

29 my $diffs = diff($root1,$root2, 1);

30 print @$diffs;

31

32 print "Comparing $root2 to $root1:\n";

33 print "---------------------------------\n";

34

35 $diffs = diff($root2,$root1, 1);

36 print @$diffs;

37 }

38

39 sub diff

40 {

41 my ($dir1, $dir2, $reclevel) = @_;

42

43 my $diffs = [];

44 my $type1 = _type($dir1);

45 my $type2 = _type($dir2);

46

47 if ($type2 eq 'notexist')

48 {

49 push(@$diffs, "\t" x $reclevel . "$dir2 does not exist\n");

50 }

51 elsif ($type1 ne $type2)

52 {

53 push(@$diffs,

54 "\t" x $reclevel . "$dir1 is '$type1' => $dir2 is '$type2'\n");

55 }

56 elsif ($type1 eq 'file')

57 {

58 my $diff = statDiff($dir1, $dir2, $reclevel);

59 push (@$diffs, $diff) if ($diff);

60 }

61 elsif ($type1 eq 'directory')

62 {

63 my $diff = statDiff($dir1,$dir2, $reclevel);

64

65 my $dh = new DirHandle($dir1);

66 my @filesAndDirs = $dh->read();

67 @filesAndDirs = sort by_type @filesAndDirs;

68 my $file;

69

70 foreach $file (@filesAndDirs)

71 {

72 next if ($file eq '.');

73 next if ($file eq '..');

74 my $diff = diff("$dir1/$file","$dir2/$file", $reclevel+1);

75 push(@$diffs, @$diff) if (@$diff);

76 }

77 }

78

79 return($diffs);

80 }

81

82 sub statDiff

83 {

84 my ($element1, $element2, $reclevel) = @_;

85 my $xx;

86 my $diffs1;

87

88 my (@stat1) = stat($element1);

89 my (@stat2) = stat($element2);

90

91 foreach $xx (@G::StatElements{@G::WantedElements})

92 {

93 if ($stat1[$xx] ne $stat2[$xx])

94 {

95 return( "\t" x $reclevel . "$element1 => $element2\n" );

96 }

97 }

98 return(0);

99 }

100

101 sub by_type

102 {

103 my $atype = _type($a);

104 my $btype = _type($b);

105

106 return(1) if (($atype eq 'file') && ($btype ne 'file'));

107 return(0) if (($atype ne 'file') && ($btype eq 'file'));

108 return($a cmp $b);

109 }

110

111 sub _type

112 {

113 my ($entry) = @_;

114

115 return

116 (

117 (-d $entry )? 'directory' :

118 (-f $entry )? 'file' :

119 (!-e $entry)? 'notexist' :

120 'unknown'

121 );

122 }

'dircompare' uses recursion to do this magic: line 74 shows the recursion. Basically, we open up each directory, and then from there proceed to compare the files inside these directories together and note any differences. Lines 47 through 77 show the comparisons: if we hit a directory, line 74 opens up that directory to find out what files underneath that directory we need to compare.

We could have done this using source code out of the standard distribution package called 'File::Find', but this is an awfully good example of recursion. With File::Find, you can say something like:

use File::Find;

find(\&wanted, "$directory");

sub wanted

{

push (@files, $File::Find::name);

}

and get a list of files in one directory (recursively). Then you could take these files, and go through them one by one, substituting the '$directory' in File::Find, with the other directory you want to compare to. Your compare loop would then look like this:

foreach $file (@files)

{

my ($newfile = $file) =~ s"^$directory"$other_directory";

stat_diff($file, $newfile);

}

This is a very rough sketch, but we will fill in a bit more with the next example. Ah well. Chalk this up to the 'there's more than one way to do it' principle. Using dircompare, you can say:

c:\> perl dircompare.p dir1 dir2

and get output that looks something like this:

Comparing dir1 to dir2:

------------------------

dir1/tmp/filename => dir2/tmp/filename

dir1/hello does not exist

Comparing dir2 to dir1

------------------------

dir1/tmp/filename => dir2/tmp/filename

dir1/goodbye does not exist

This output is just an barebones sketch, it probably could be enhanced (by showing exactly how the two files differ)

deleting garbage files

Here's another good annoyance that perl can solve when you are working on large projects. Suppose you are working with editors, and other tools which leave a 'backup' file behind. Word, for example leaves several copies of itself behind. The question is then, how to clean up the junk?

Well, this script can help some. It is called 'delpat.p' (for delete pattern) and it looks through all of the files that you give on the command line, and selectively deletes the ones that you label as 'trash'. For example:

prompt% delpat.p '.bak$' *

will delete all files that have an extension of '.bak' on them, that live in the current directory or below. The script uses the 'File::Find' library that we talked about above, and therefore, does the recursion for you. Here it is:

Listing 12.7 delpat.p

1 #!/usr/local/bin/perl5

2

3 use File::Find;

4 use Getopt::Long;

5 use strict;

6

7 my %varb;

8 GetOptions(\%varb, '-force!');

9

10 my $pattern = shift(@ARGV);

11 $pattern =~ s"\."\\\."g;

12

13 my @files = grep (-f $_, @ARGV);

14 my @dirs = grep (-d $_, @ARGV);

15

16 find(\&del, @dirs);

17

18 my $file;

19 foreach $file (@files)

20 {

21 $File::Find::name = $file;

22 del();

23 }

24

25 sub del

26 {

27 if ($File::Find::name =~ m"($pattern)$")

28 {

29 if (!$varb{'force'})

30 {

31 print "Going to delete: $File::Find::name\n";

32 print "Do you wish to delete this file?!\n";

33 next if (<STDIN> !~ m"^[yY]");

34 }

35 print "Deleting file $File::Find::name\n";

36 unlink($File::Find::name) ||

37 print "Couldn't delete $File::Find::name\n";

38 }

39 }

Note how this is working. First, we shift off the first argument in line 10. (this is going to be the pattern that we are going to delete.). Second, we split up what is left on the command line into files and directories.(line 13 and 14). In line 16, then, we call the function 'del' (callback) on each of the files in each of the directories.

Then, we go through the files, and call del on each one of these. (lines 19-23). And finally, notice that the function 'del' prompts the user whether or not the file is to be deleted, unless -force is given on the command line.

You probably know why, but I'll tell you anyway since so many people forget - when you are doing something dangerous like deleting garbage files, one day you will delete something that is important. In fact, we already solved one of these hurdles with line 11. We make a dot always a literal dot and never its perl equivalent.

Why? Well, because if we say:

C:\> perl delpat.p .bak *

we don't usually want the perl regular expression version of dot, which means 'any character'. Instead, we want a literal, so we force it to be so with s"\."\\.". Also, we have given the user a way out: you can say:

C:\> perl delpat.p .bak -force

and then it will do the deleting by force. We could put an option called 'yes to all' in, as well.

solving a simple cipher

Ok, here's an example just to show off the power of perl's syntax in manipulating text. The idea is to solve a simple cipher, where you rotate the letters a certain amount, something like:

The source for internet discussion groups

becoming

Hvs gcifqs tcf wbhsfbsh rwgqiggwcb ufcidg.

Of course, the idea behind ciphers is that you don't have the info on how much the cipher has shifted letters over. Notice how we deal with this problem in the code:

Listing 12.8 cipher.p

1 #!/usr/local/bin/perl5

2

3 use FileHandle;

4 use strict;

5

6 my $FH = new FileHandle("$ARGV[0]") || die "Couldn't open $ARGV[0]!\n";

7

8 undef $/; # sets $/ globally, maybe not the best idea!

9 local($") = "";

10 my $cipher = <$FH>;

11 $cipher =~ tr[A-Z][a-z];

12 my $dictionary = "/usr/dict/words";

13 my $dict = _setDictionary($dictionary);

14

15 my @letters = ('a'..'z');

16 my @newLetters = @letters;

17

18 my $xx;

19 my ($newCipher, $saveCipher);

20 my ($newMatch, $bestMatch);

21 for ($xx = 0; $xx < 26; $xx++)

22 {

23 @newLetters=($newLetters[25],@newLetters[0..24]);# shift last

24 # letter to front

25 $newCipher = $cipher;

26 eval("\$newCipher =~ tr[@letters][@newLetters]");

27

28 if (($newMatch = _match($newCipher, $dict )) > $bestMatch)

29 {

30 $bestMatch = $newMatch;

31 $saveCipher = $newCipher;

32 }

33 }

34

35 print "HERE IS YOUR CIPHER: $saveCipher\n";

36

37 sub _setDictionary

38 {

39 my ($dictionary) = @_;

40

41 my (@words, $word);

42 my ($line);

43 my $return = {};

44

45 my $FH = new FileHandle($dictionary) ||die "Couldn't open $dictionary!\n";

46 $line = <$FH>;

47 @words = _makeWords($line);

48

49 foreach $word ( @words) { $return->{$word} = 1; }

50 $return;

51 }

52

53

54 sub _match

55 {

56 my ($cipher, $dict) = @_;

57 my $matches = 0;

58

59 print $cipher;

60 my (@words) = _makeWords($cipher);

61 my $word;

62

63 foreach $word (@words)

64 {

65 if ($dict->{$word})

66 {

67 $matches++;

68 }

69 }

70 return($matches);

71 }

72

73 sub _makeWords

74 {

75 my ($cipher) = @_;

76 $cipher =~ s"\n" "g;

77 $cipher =~ tr"A-Z"a-z";

78 my @words = split(' ', $cipher);

79 @words;

80 }

Lines 21-33 are the key here. We simply use the 'tr' function to 'rotate' the letters around, and then try to match each word against a 'dictionary'. Say our cipher is:

rfcpc gq

Well, we then translate this into

sgdqd hr

and check it against the $dict hash (the one in use here is '/usr/dict/words'; I know no equivalent on NT). We load that file into a hash, so that when we compare against it, we see that there is no such word as 'sgdqd' or 'hr', so our score is zero. Then we continue to get

there is

And we have a match. If this was actually running, we would continue and do the other 24 permutations, to see if any actually worked better. Of course the idea of a cipher is to code up pithy sayings (Games magazine does this all the time, although theirs are real ciphers which can't be solved by such a simple scheme. Here, I guess, is mine:

prompt% cipher.p 'W bsjsf ush am qoh rfiby - whg kfcbu hc uwjs zweicf hc obwaozg - pih kvohsjsf vs rcsg cb vwg ckb hwas wg vwg pigwbsgg.'

 

HERE IS YOUR CIPHER: i never get my cat drunk - its wrong to give liquor to animals - but whatever he does on his own time is his business.

 

A regular expression to match perlish numbers

And finally, we shall give a cipher which gives a list of numbers. This one is especially useful if you go ahead and put it into a library or a module (we shall talk a lot on how to do that later, believe me)

Anyway, the idea is that you could use this regular expression to match arbitrary numbers inside a text field. It isn't completely (usably) accurate; for instance it would miss the numbers in:

1+4

because it assumes a space before and after the number. However, we probably could accommodate this, by getting rid of the restriction.

Anyway, we construct the regular expression in a 'top down' manner, dealing with each type of number, and stringing them together in a big chain of alternates:

Listing 12.9 numbers.p

1 #!/usr/local/bin/perl5

2

3 my $hexadecimal = q{

4 0x # leading trailer

5 [0-9a-fA-F]+ # 0 through 9, a-f A-F

6 };

7

8 my $float = q{

9 (?:\+|\-){0,1} # beginning plus/minus

10 [0-9]*\. # stuff before the '.'

11 [0-9]+ # stuff after the '.'

12 };

13

14 my $integer = q{

15 (?:\+|\-){0,1} # beginning plus/minus

16 [0-9]+ # digits.

17 };

18

19 my $scientific = qq{$float} ." ". # float plus E or e

20 q{[E|e]} . qq{$integer};# plus integer makes scientific

21

22

23 my $underscore = q{

24 (?:\+|\-){0,1} # leading plus/minus

25 [0-9]{1,3} # first three digits, up to _

26 (?:_[0-9]{3})+ # underscores, plus groups of 3 digits

27 };

28

29 my $number = qq{

30 (\\s+|^) # leading spaces. since qq, need \\

31 (

32 ?:$hexadecimal|

33 $underscore|

34 $scientific|

35 $float |

36 $integer

37 )

38 (?=\\s+|\$) # again, trailing spaces, ?=

39 # because we don't want to throw

40 }; # away spaces

41 ########## END OF REGULAR EXPRESSION ####################

42

43 print $number; # just for fun... ;-)

44

45 my $line = " This is 0 a test -1.233 of the 0. emergency 1.03e-24-

46 broadcast 543 system 4_223_233";

47

48 while ($line =~ m"($number)"sgxo) # need to have x and o -- x will

49 { # use comment mode, o compiles once.

50 print "$1 ";

51 }

Lines 0-27 concern themselves with building up the individual, different types of numbers. Line 29 actually sticks all of this stuff together into one big regular expression which we then call in line 48.

Note that when we are putting together the alternates, that we put them together in a very specific order. Floats need to come first, before integers because a float:

233.33

has an integer inside it, and we would always match the integer first. Also notice that we use (?=) because the trailing spaces aren't part of the number itself. (In retrospect, it looks like \b would be just as good.)

Code Generators

Finally, we come to code generators. Code generators are the 'top' of the food chain for manipulating text (so to speak). If you can write code to make code, you can do things that (for some reason) to non-technical people seem to be 'magic'.*

When I submitted my first 50,000 line automatically generated perl script to do database integrity checking to my manager of some years ago, he just sort of shook his head in either shock, or dismay. I really can't tell which.

Of course, in order to generate code, you need a language to automatically generate. There are several good ones out there that benefit from automatic generation:

1) C

2) C++

3) SQL

4) perl (of course!)

5) Expect

We will choose Expect for the purposes of this book. Expect is probably the least known of the five above, but it also fills in a gap that, for some reason, people in the perl world have chosen to ignore (probably because expect does such a good job at it).* That area is the automation of programs.

See Don Libes, 'Exploring Expect', for more detail. Expect fills in a really big gap - I just wish it was written in perl.

Suppose for example, you are running an ftp session. If you type:

prompt% ftp ftp.cs.colorado.edu

you get a string back:

Connected to freestuff.cs.colorado.edu.

220 freestuff.cs.colorado.edu FTP server (Version wu-2.4.2-academ[BETA-12](1) Fri Jan 24 13:06:52 MST 1997) ready.

Name (ftp.cs.colorado.edu:ed):

ftp is now expecting you to log in. The job of 'Expect' however, is to handle that request without you being present. An expect script, built on top of tci, looks something like this:

spawn ftp ftp.cs.colorado.edu

set timeout 60

expect {

"Name*):*" { send "anonymous\r" }

"failed" { send "quit\r"; exit 1 }

timeout { puts "TIME_OUT\n"; exit 1 }

}

expect {

"Password:" { send "myemail@myhost.com\r" }

"Faield" { send "quit\r" }

timeout { puts "TIMEOUT\n"; }

}

 

expect {

"ftp>*" { send "get file1\r" }

timeout { puts "TIME_OUT\n"; }

}

'spawn' spawns off a process; and 'expect' indicates what we expect to see as 'returned text' from the program that we spawned off.

Now the key here is that if expect sees the text on the left come back from the process it spawns, it then sends the text on the right. If it sees 'Name' it sends 'anonymous<carriage return>'. If it sees 'Password' it sends 'myemail@myhost.com<carriage return>'.

There are a lot of intricacies to this process though. And I don't want to have to learn another scripting language. Hence, the use of perl to generate the scripts that I use. I can copy sample text of expect from a book; that's easy. I can have perl generate the scripts that I use; that's semi-easy.

However, there is a large overhead in learning a scripting language when the only reason you are going to be using it is for one thing.*

Note that, although I say that perl has no expect-like functions which are truly up-to-date, this is true. However, the 'libnet' distribution has a couple of modules: Net::Ftp and Net::Telnet which basically do the same thing as we do below, just not in a expect-like way.

Hence, you can think of the following two programs as 'examples only.', and you probably want to install them via CPAN or the CD that comes along with this book.

However, they can be used as templates for your own code, when you need to automate programs which are not as standard as telnet. (which I have done several times in the past)

Automating Telnet

Telnet is the standard (ie: agreed upon by committee) way of two machines talking to one another. There are a whole bunch of 'RFC' (request for comments) files that deal with the telnet standard, and hence it is a fairly reliable way to handle processes which require more than one machine to complete.*

See 'http://wombat.doc.ic.ac.uk/foldoc/foldoc.cgi?query=telnet' for more details. This is another one of those 'on-line' computer dictionaries which is invaluable some times

On the command line (or in the case of NT, through a GUI) you say something like:

prompt% telnet <site_name>

which then sends a request for connection to the site '<site_name>'. A telnet server that runs on the computer pointed to by '<site_name>' then recognizes that some computer is trying to talk to it, and sends back some text to your computer. The result? You get something like:

Trying 165.125.122.110…

Connected to mysite.apxtech.com.

Escape character is '^]'

 

UNIX(r) System V Release 4.0 (mysite)

 

Welcome to mysite.apxtech.com!

This system is a dual P-200 running Solaris 2.5.1, with 64 MB of RAM and plenty of attitude. All communications to and fro are monitored for security puroposes. By proceeding, you consent to this monitoring.

 

login:

Again, the computer is waiting for your login. You enter it, and get:

Password:

You enter the password, for verification, and get a prompt:

ed@mysite.apxtech.com(%):

at which point you can do anything that your user can do. However, all of this, as stands, is interactive. Some computers have an 'rsh' command (for remote shell) which implements this in an automated fashion, but even then this command is limited.

The following script, then, automates telnet from the command line, so that you can send multiple commands to the remote machine. You store all of the commands you want to run in a 'command file', something like 'telnet_commands':

---- telnet_commands ----

"ogin:" , "ed"

"Password:", "yeehaw!"

"(%):", "ls"

"(%):", "runscript.p"

"(%):", "exit"

and then run the command with the following line:

prompt% telnet.p -site mysite.apxtech.com -execfile telnet_commands

This then generates some expect code, and you see your commands being executed on your screen.*

Note: if you are running NT, you are going to have to 'fake' the telnet command, since the telnet that comes with NT is GUI based. To do this, there is a telnet clone that comes in the Net::Telnet distribution off of CPAN or on the CD coming along with the book.

It is called

Reminder: NEED TO GET THIS INFORMATION: XXXXXXXXXXXXX

and is totally written in perl. I know, I know - why bother generating expect code when you can write portable code in perl itself? Well, since this is an example about code generation first and telnet second, well just trust me.

Here is the code for 'telnet.p':

Listing 12.10 telnet.p

1 #!/usr/local/bin/perl5

2

3 my $opt = {};

4

5 use strict;

6 use Data::Dumper;

7 use FileHandle;

8 use Term::ReadKey;

9 use Getopt::Long;

10 use Text::ParseWords;

11 use String::Edit;

12

13 main();

14

15

16 sub main

17 {

18 GetOptions

19 (

20 $opt,"--site:s","--execfile:s","--expect:s","--pass:s",

21 "--telnet:s", "--debug"

22 );

23

24 modifyDefaults($opt);

25 my $code = _genTelnetCode($opt);

26 _execTelnetCode($opt, $code);

27 }

28

29 sub _modifyDefaults # We get the options from the command line, and then

30 { # proceed to modify them according to some

31 my ($opt) = @_; # pre-determined defaults.

32

33 my ($fileopt, @errors);

34 $fileopt = (defined ($opt->{'execfile'}))? _parseFile($opt) : {};

35 %$opt = (%$fileopt, %$opt);

36

37

38 $opt->{'site'} = (defined($opt->{'site'}))? $opt->{'site'} :

39 $ENV{'EXPECT_TELNET_SITE'};

40

41 $opt->{'expect'} = $opt->{'expect'} || $ENV{'EXPECT_EXEC'} || "expect -i";

42

43 $opt->{'user'} = $opt->{'user'} || $ENV{'TELNET_USER'} ||

44 $ENV{'USERNAME'} || getpwuid($<);

45

46 if (!defined ($opt->{'pass'}))

47 {

48 print "Enter password for $opt->{'user'}\n";

49 ReadMode 2; $opt->{'pass'} = <STDIN>; ReadMode 0;

50 }

51

52 $opt->{'telnet'} = $opt->{'telnet'} || $ENV{'TELNET'} || "telnet";

53

54 push (@errors, "You need to provide a telnet site!\n")

55 if (!defined ($opt->{'site'}));

56 push (@errors, "You need to provide a user!\n")

57 if (!defined ($opt->{'user'}));

58 push (@errors, "You need to provide a password!\n")

59 if (!defined ($opt->{'pass'}));

60 die @errors if (@errors);

61 }

62

63 sub _parseFile # We look through the file for commands that we are to

64 { # pass to expect.

65 my ($opt) = @_;

66 my $return = {};

67

68 my $fh = new FileHandle("$opt->{'execfile'}") ||

69 die "Couldn't open 'execfile'\n";

70

71 my @lines = <$fh>;

72

73 my ($line, $xx, $keep) = ('', 0, '');

74

75 foreach $line (@lines)

76 {

77 chop($line);

78 next if (!$line); # ignore blanks

79

80 my @array;

81 if (@array = _isaFlag($line))

82 {

83 $return->{$array[0]} = $array[1];

84 }

85 else

86 {

87 if ($xx == 0) # first line is always user

88 {

89 my ($userprompt, $user) = _parseCommand($line);

90 $opt->{userprompt} = $userprompt;

91 if ($user ne 'USER') { $opt->{user} = $user; }

92 }

93 elsif ($xx == 1) # second line is password

94 {

95 my ($passwordprompt, $password ) = _parseCommand($line);

96 $opt->{'passwordprompt'} = $passwordprompt;

97 if ($password ne 'PASSWORD') { $opt->{'pass'} = $password; }

98 }

99 else

100 {

101 push (@{$return->{commands}}, [ _parseCommand($line) ] );

102 }

103 $xx++;

104 }

105 }

106 die "You need to have at least one command!\n"

107 if (!$return->{'commands'});

108 return($return);

109 }

110

111 sub _parseCommand # command to take something like: "a", "b" and turn

112 { # turn it into a form that expect can use.

113 my ($line) = @_;

114 my $keep = 0;

115

116 my ($prompt, $action) = quotewords("\s*,\s*", $keep, $line);

117 $prompt = trim($prompt);

118 $action = trim($action);

119 return($prompt, $action);

120 }

121

122 sub _isaFlag # command to take flags from the exec file, and turn

123 { # them into options to pass to expect.

124 my ($line) = @_;

125 my $keep = 0;

126 my (@array) = quotewords("\s*:\s*", $keep, $line);

127 @array = trim(@array);

128 if (@array == 2) { return (@array); }

129 else { return (()); }

130 }

131

132 sub _genTelnetCode # command to make legal expect code, based on

133 { # on flags passed to program, plus command file.

134 my ($opt) = @_;

135

136 my ($commands, $line, $command, $key) = ($opt->{'commands'}, '', '', '');

137

138 foreach $key ('user','pass', 'userprompt', 'passwordprompt' )

139 {

140 $opt->{$key} =~ s"([\@\$\\])"\\$1"g; # we need to substitute

141 } # dollar signs and at signs

142 # with their backslashed

143 foreach $command (@$commands) # equivalents - expect does

144 { # interpolation like perl!

145 $command->[0] =~ s"([\@\$\\])"\\$1"g;

146 $command->[1] =~ s"([\@\$\\])"\\$1"g;

147 }

148 # start of code generation

149 # generation of user prompt.

150 $line .=<<"EOL"

151 set timeout -1

152

153 spawn $opt->{'telnet'} $opt->{'site'}

154 expect {

155 "$opt->{'userprompt'}" { send "$opt->{'user'}\\r" }

156 "failed" { send "quit\\r"; exit 1; }

157 "error*" { send "quit\\r"; exit 1; }

158 timeout { puts "Timed Out\\n"; exit 1; }

159 "Service not available" { puts "Connection Dropped\\n"; exit 1; }

160 "ftp>*" { send "quit\\r"; exit 1 }

161 }

162 EOL

163 ;

164

165 if ($opt->{'passwordprompt'}) # generation of password

166 { # part.

167 $line.=<<"EOL"

168 expect {

169 "$opt->{'passwordprompt'}" { send "$opt->{'pass'}\\r" }

170 "failed" { send "quit\\r"; exit 1; }

171 "error*" { send "quit\\r"; exit 1; }

172 "Service not available" { puts "Connection Dropped\\n"; exit 1; }

173 timeout { puts "Timed Out\\n"; exit 1; }

174 }

175

176 EOL

177 }

178

179 my $commands = $opt->{'commands'};

180 my $command;

181 foreach $command (@$commands) # generation of commands

182 { # to be executed.

183

184 $line.=<<"EOL"

185 expect {

186 "$command->[0]" { send "$command->[1]\\r" }

187 timeout { puts "Timed Out\\n"; exit 1; }

188 }

189 EOL

190 }

191 $line .=<<"EOL"

192 expect {

193 eof { exit 0 }

194 timeout { exit 1 }

195 }

196 EOL

197 ;

198 $line;

199 }

200

201 sub _execTelnetCode # command to actually execute

202 { # the telnet code that we have generated

203 my ($opt, $code) = @_;

204

205 my $exec = ($opt->{'expect'});

206 if ($opt->{'debug'})

207 {

208 print "Generated code:\n$code\n";

209 }

210 else

211 {

212 open (EXPECT, "| $exec");

213 print EXPECT $code;

214 close(EXPECT);

215 }

216 }

This code is getting a little long, and when it gets long like this, we turn to the Functional Diagrams. Figure 12.3 shows a functional diagram which also so happens to outline the logic of the program quite well:

123.fig

Figure 12.3

functional diagram of telnet.p

As you can see, we have designed telnet.p to be 'top down'. There are two major parts to the program; the interface, and the code generation. Lets take a look at both, since they work so closely together.

telnet.p interface

There are three ways to give options to telnet.p. Hence, there are three ways to give such things as passwords, user names, and so forth:

1) the 'default' options coming from the environment. For example, lines 38 through 39 define $ENV{'EXPECT_TELNET_SITE'}; the user can set this inside their environment to always co

2) options coming from the file '-execfile'. If you say something like in your file where you are going to execute commands:

1 site: my_site.here.com

2 "ogin:", "ed"

3 "Password:", "yummy2veg"

line 1 here will tell telnet.p that you want to go to my_site.here.com.

3) options from the command line. You can say: 'telnet.p -site 'my_site.here.com', and telnet.p realizes that you want to go to my_site.here.com.

However, this flexibility has a price; you can get conflicts: someone could have specified they wanted to go to the site 'A' in the environment, to site 'B' in the command file, and to site 'C' on the command line!

What to do? In this case, we prioritize. We assume that the command line is the most urgent; if people are typing out commands, it means that they definitely want to go there.

The second most urgent place is the command file. If the command file has an entry like:

site: my_site.apxtech.com

in it, then we know that the user probably wants to go there. The command line is more important, however.

And finally the least urgent place is the defaults, the environmental variables that we set. These just 'pick up the pieces' in the case that the first two methods come up blank.

This scheme allows for a lot of uses that are not otherwise apparent. For example, suppose that you have fifty different sites that you need to connect to, and do the same actions on (all at once). Well with this flexibility, you can say something like:

prompt% telnet.p -site 1 -execfile commandlist

prompt% telnet.p -site 2 -execfile commandlist

. . .

and so on, down the line, to run the commands on each and every site.

Code generation Part

The code generation part takes all the flexibility that we have developed in the interface, and translates it into code. This is the job of You can see the code that it generates by adding '-debug' onto the end. For example, if you had a file, 'command_list', that looked like:

site: happyface.org

"ogin:", "ed"

"Password:","nya2ghh"

"]$", "cd my_dir",

"]$", "ls",

"]$", "exit"

and then you actually ran telnet.p with the command:

prompt% telnet.p -site 2 -execfile commandlist -debug

then you would generate the following code:

1 Generated code:

2 set timeout -1

3

4 spawn telnet happyface.org

5 expect {

6 "ogin:" { send "ed\r" }

7 "failed" { send "quit\r"; exit 1; }

8 "error*" { send "quit\r"; exit 1; }

9 timeout { puts "Timed Out\n"; exit 1; }

10 "Service not available" { puts "Connection Dropped\n"; exit 1; }

11

12 }

13 expect {

14 "Password:" { send "nya2ghh\r" }

15 "failed" { send "quit\r"; exit 1; }

16 "error*" { send "quit\r"; exit 1; }

17 "Service not available" { puts "Connection Dropped\n"; exit 1; }

18 timeout { puts "Timed Out\n"; exit 1; }

19 }

20

21 expect {

22 "]\$" { send "cd my_dir\r" }

23 timeout { puts "Timed Out\n"; exit 1; }

24 }

25 expect {

26 "]\$" { send "ls\r" }

27 timeout { puts "Timed Out\n"; exit 1; }

28 }

29 expect {

30 "]\$" { send "exit\r" }

31 timeout { puts "Timed Out\n"; exit 1; }

32 }

33 expect {

34 eof { exit 0 }

35 timeout { exit 1 }

36 }

37

 

Notice the commands in bold here; these directly come out of the 'execfile' that we made. All we do in 'getTelnetCode' is pad them into text which so happens to be a legal expect program. The 'timeout' directive tells expect to do the action specified if the command times out (ie: there is no response from the telnet server, or your text doesn't match correctly.) And the other directives (failed:, error* ) indicate commonly found strings in telnet servers.

There are a couple of things we need to do to the input to do this; add '\r' onto the end of commands, and turn all of the '$' into \$' (since expect does interpolation just like perl!).

Execution of code part

Finally, we need to take the code that we have just generated, and send it to expect to actually be executed. This is done by _execTelnetCode in line 201 - 216, and we actually send the code that we just created to a pipe, as in:

212 open (EXPECT, "| expect -i");

213 print EXPECT $code;

214 close(EXPECT);

We could have just as easily done this with a system call ( if you are using Windows 95, you may just have to since the dos command prompt is pretty bad at this sort of thing), but doing it via a pipe has two advantages:

1) no extra files lying around

2) the passwords are much more secure (no place to trap them!)

Number 2 is the critical point. By putting the code that we've just generated into a pipe, we make our system much more secure. Lines 212-214 can be used to talk to the pipe; when you 'print' to the EXPECT handle, you actually make the EXPECT handle execute the code that you are mentioning.

Automating Ftp

Once we have gotten telnet.p down, we can now use the base code to make other programs, which automates different things. One such commonly used program is ftp, or 'file transfer protocol' which is what makes the internet go round.

However, when dealing with a browser, or on the command line, you can only download one file at a time. With the ftp.p we are going to write, you can say:

prompt% ftp.p -site prep.ai.mit.edu -filelist 'files'

where 'files' contains a list of files to be found at the site, ie:

/pub/gnu/bash.tar.gz

/pub/gnu/fileutils.tar.gz

/pub/gnu/sharutils.tar.gz

then ftp.p will go off and connect to the site 'prep.ai.mit.edu' and then get all of the files in that file list. This script is particularly good for people who have slow modem connections. You simply tell the program to 'transfer away' at night, and during the early daylight hours, ftp does its thing, downloading all the files, automatically.

Unlike telnet, Windows NT comes with a command line version of ftp, so you don't need to do anything special to make it work.

Listing 12.11 ftp.p

1 #!/usr/local/bin/perl5 -w

2

3 my $opt = {};

4

5 use FileHandle;

6 use Getopt::Long;

7

8 main();

9

10 sub main

11 {

12 GetOptions

13 (

14 $opt, "--site:s", "--filelist:s", "--expect:s", "--ftp:s",

15 "--user:s", "--pass:s", "--type:s", "--ftpfile:s",

16 "--debug"

17 );

18

19 _modifyDefaults($opt);

20 my $code = _genFtpCode($opt);

21 _execFtpCode($opt, $code);

22 }

23

24

25 sub _modifyDefaults

26 {

27 my ($opt) = @_;

28

29 my $fileopt = _parseFtpFile($opt) if ($opt->{'ftpfile'});

30 %$opt = (%$fileopt, %$opt);

31

32 $opt->{'expect'} = $opt->{'expect'} || $ENV{'EXPECT_EXEC'} ||

33 "/usr/local/bin/expect -i";

34

35 $opt->{'user'} = $opt->{'user'} || $ENV{'EXPECT_USER'} || "anonymous";

36 $opt->{'pass'} = $opt->{'pass'} || $ENV{'EXPECT_PASS'} || "me@";

37 $opt->{'ftp'} = $opt->{'ftp'} || $ENV{'FTP'} || "ftp";

38 $opt->{'type'} = $opt->{'type'} || $ENV{'EXPECT_TYPE'} || "bin";

39

40 if ($opt->{'filelist'})

41 {

42 @{$opt->{'files'}} = split(' ', $opt->{'filelist'});

43 }

44

45 push( @{$opt->{'files'}}, @{$fileopt->{'files'}});

46 }

47

48 sub _parseFtpFile

49 {

50 my ($opt) = @_;

51 my $return = {};

52

53 my $fh = new FileHandle("$opt->{'ftpfile'}")

54 || die "Couldn't open $opt->{'ftpfile'}\n";

55

56 my $line;

57 while (defined ($line = <$fh>))

58 {

59 next if ($line !~ m"\w");

60 if ($line =~ m":") { _addopt($return, $line); }

61 else { _addfile($return, $line); }

62 }

63 $return;

64 }

65

66 sub _addopt

67 {

68 my ($return, $line) = @_;

69 my ($key, $val) = ($line =~ m"^\s*(.*?)\s*:\s*(.*?)\s*$");

70 $return->{$key} = $val;

71 }

72 sub _addfile

73 {

74 my ($return, $line) = @_;

75

76 my ($val) = ($line =~ m"^\s*(.*)\s*");

77 push (@{$return->{'files'}}, $val);

78 }

79

80 sub _genFtpCode

81 {

82 my ($opt) = @_;

83

84 my $line = '';

85

86 foreach $key ('user', 'pass') { $opt->{$key} =~ s"[\$\\]"\\$1"g; }

87

88 $line .=<<"EOL"

89 set timeout -1

90

91 spawn $opt->{'ftp'} $opt->{'site'}

92 expect {

93 "Name*):*" { send "$opt->{'user'}\\r" }

94 "failed" { send "quit\\r"; exit 1; }

95 "error*" { send "quit\\r"; exit 1; }

96 timeout { puts "Timed Out\\n"; exit 1; }

97 "Service not available" { puts "Connection Dropped\\n"; exit 1; }

98 "ftp>*" { send "quit\\r"; exit 1 }

99 }

100

101 expect {

102 "assword:" { send "$opt->{'pass'}\\r" }

103 "failed" { send "quit\\r"; exit 1; }

104 "error*" { send "quit\\r"; exit 1; }

105 "Service not available" { puts "Connection Dropped\\n"; exit 1; }

106 timeout { puts "Timed Out\\n"; exit 1; }

107 }

108

109 expect {

110 "logged in" { send "\\r" }

111 "onnected to*" { send "\\r" }

112 "ftp>*" { send "\\r" }

113 "failed" { send "quit\\r"; exit 1; }

114 "error*" { send "quit\\r"; exit 1; }

115 "Service not available" { puts "Connection Dropped\\n"; exit 1; }

116 timeout { puts "Timed Out\\n"; exit 1; }

117 }

118

119 expect {

120 "successful" { send "$opt->{'type'}\\r" }

121 "onnected to*" { send "$opt->{'type'}\\r" }

122 "ftp>" { send "$opt->{'type'}\\r" }

123 "failed" { send "quit\\r"; exit 1; }

124 "error*" { send "quit\\r"; exit 1; }

125 "Service not available" { puts "Connection Dropped\\n"; exit 1; }

126 timeout { puts "Timed Out\\n"; exit 1; }

127 }

128 EOL

129 ;

130

131 foreach $file (@{$opt->{'files'}})

132 {

133 $file =~ s"[\$\\]"\\$1"g;

134 my ($dir, $filename) = ($file =~ m"(.*)[/\\](.*)");

135 $dir = $dir || "/";

136 $filename = $filename || $file;

137

138 $line .=<<"EOL"

139 expect {

140 "failed" { send "quit\\r"; exit 1; }

141 "error*" { send "quit\\r"; exit 1; }

142 "ftp>" { send "cd $dir\\r" }

143 Service not available" { puts "Connection Dropped\\n"; exit 1; }

144 timeout { puts "Timed Out\\n"; exit 1; }

145 }

146 expect {

147 "failed" { send "quit\\r"; exit 1; }

148 "error*" { send "quit\\r"; exit 1; }

149 "ftp>" { send "get $filename\\r" }

150 Service not available" { puts "Connection Dropped\\n"; exit 1; }

151 timeout { puts "Timed Out\\n"; exit 1; }

152 }

153 EOL

154 ;

155 }

156

157 $line .=<<"EOL"

158 expect {

159 "failed" { send "quit\\r"; exit 1; }

160 "error*" { send "quit\\r"; exit 1; }

161 "ftp>" { send "quit\\r"; exit 0; }

162 Service not available" { puts "Connection Dropped\\n"; exit 1; }

163 timeout { puts "Timed Out\\n"; exit 1; }

164 }

165 EOL

166 ;

167 $line;

168 }

169

170 sub _execFtpCode

171 {

172 my ($opt, $code) = @_;

173

174 my $exec = $opt->{'expect'};

175 if ($opt->{'debug'})

176 {

177 print "Generated code:\n$code\n";

178 }

179 else

180 {

181 open (EXPECT, "| $exec");

182 print EXPECT $code;

183 close(EXPECT);

184 }

185 }

Again, this code is almost exactly the same as telnet.p. In fact, I have simply cut and pasted lots of the sections of code from the previous example, and modified them from there!

As one might expect, this isn't exactly the best practice, but lots of people do it at one time or another (including me). And if you don't know the techniques of modular or Object Oriented programming, this is probably the only way to do code reuse.

However, we will revisit these two examples in chapter 18 and show a much better way of reuse - when we talk about 'objectizing' code like this - basically junking the code for parts, and starting fresh.

OLE Automation- Object Linking and Embedding (NT/Win95)

Ok, now lets look at a couple of places where perl is used on a specific platform; the Win32 platform. I hate to do this, since one of perl's main sells is portability. We must not forget that another one of perl's basic tenets is 'Let People do What They Want to Do' or 'It is Better to Have A Solution, than To Have No Solution at All'.

In this case, we want to make it so that perl can 'talk' to the MS-Office applications. There are many reasons for this:

1) they are popular

2) they can hold their data 'captive' in proprietary formats.

3) perl can be used to streamline the process of dealing with large amounts of their data.

In other words, perl can be used to translate to and fro between the free formats (ascii, lex, HTML) into the proprietary ones (word document, excel spreadsheet) and thus gain you power in controlling the data. And OLE is Microsoft's primary way of making this control happen.*

Although they certainly don't advertise it in the form of free information. One of the best ways to find out how to use OLE is to look at the interface inside Visual Basic/C++ through methods called ITypeInfo an ITypeLib which give you the correct interface on the spot. (These are not supported by perl right now, but may be at the point that you get this book)

Anyway, I searched the net for about a month solid to try to find good information on pragmatic, free sources of OLE code. There is a knowledge base at:

http://www.microsoft.com/kb

but if someone out there is reading this from Microsoft, please tell whomever is in charge of the knowledge base to

1) get rid of dead links

2) increase server bandwidth

3) put more pragmatic information on the server. I read a hundred times about what OLE 'was', and only half a dozen times on how to apply ole, all from 1995 and 1996.

The best (only) article I found in the knowledge base was called

'http://support.microsoft.com/support/kb/articles/Q167/2/23.asp'

and I actually didn't find it through the search engine, but through the newsgroup microsoft.public.vb.ole.automation (thanks to Rob Bovey). If the http server gives you trouble, you can get it directly via ftp at:

ftp://ftp.microsoft.com/Softlib/MSLFILES/AUTO97.EXE

Note that this only deals with office '97 automation. If anybody finds anything on automation for previous versions of the Office tools, let me know.

So below, we give a brief introduction on how to actually 'make' OLE automated perl scripts.

A brief Overview of OLE

So here is the quick, fifteen second tour of OLE (if you want to get more information about what OLE is, you can get it in spades by looking at the Microsoft Windows knowledge base).

OLE is way to harness applications and run them automatically. As we saw above with Expect, we could harness either the ftp server or the telnet server by running it and sending it keys.

However, GUI's are perceived to be a lot more difficult to run in this manner (however, see GuiDo below for a rebuttal) so Microsoft came up with OLE. With OLE, you could say something like:

my $excelObject = new Win32::OLE('Excel.Application');

and that would open the Excel application for you (don't worry about the syntax for now). If you then say:

$wordObject->Workbooks->Open('C:\excel.xls');

this will then open a workbook for you inside excel, and you can sit back and automatically control it to your heart's desire.

Now if you are unaware of how to program OLE, the best way to program OLE scripts is to actually take a look at Visual Basic scripts, and then 'gut' the code for syntax. For example, the AUTO97.EXE file I mentioned above contains quite a few OLE automation examples in visual basic; the examples look somewhat like this (see the actual thing for the real examples):

Listing 12.12 word.bas

1 Dim WordDocument As Word.Document

2 Dim WordParagraph As Word.Paragraph

3 Dim WordApplication As Word.Application

4 Dim WordRange As Word.Range

5

6 Set WordApplication = CreateObject("Word.Application")

7

8 With WordApplication

9 .WindowState = wdWindowStateMaximize

10 .Documents.Add

11

12 Set WordDocument = WordApplication.ActiveDocument

13

14 Set WordRange = WordDocument.Range

15 With WordRange

16 .Font.Bold = True

17 .Italic = False

18 .Font.Size = 16

19 .InsertAfter "Running Word 97!"

20 .InsertParagraphAfter

21 End With

22 .ActiveDocument.SaveAs "c:\Temp\Aha.doc"

23 .Quit

24 End With

This harnesses Windows 97 (version 8.0) to actually create a document and then save it as C:\Temp\Aha.Doc. You can convert this into the following perl script:

Listing 12.13 wordcopy.p

1 use Win32::OLE;

2

3 use Data::Dumper;

4 use strict;

5

6 my $word = new Win32::OLE('Word.Application') || die "ARRGH! $!\n";

7

8 $word->{WindowState} = 'wdWindowStateMaximize';

9 $word->Documents->Add();

10

11 my $worddoc = $word->ActiveDocument();

12 my $wordrng = $worddoc->Range();

13

14 $wordrng->Font->{'Bold'} = 'True';

15 $wordrng->Font->{'Italic'} = 'False';

16 $wordrng->Font->{'Size'} = '16';

17 $wordrng->InsertAfter("Running Word 97!\n");

18 $wordrng->InsertParagraphAfter();

19

20 $word->ActiveDocument->SaveAs("C:\temp\aha.doc");

21 $word->Quit();

Notice three things here. First, the lack of declarations. VB's version is four lines longer because it needs to declare each varaible and perl has no such need. Second, line 6 institutes some error control - if you can't open a file, you can have your process die, and tell the user why your process is dying.

But third - and most important - is that the syntax is pretty similar. We substitute

WordRange.Font.Bold = True

for

$wordrng->Font->{'Bold'} = 'True'

and we substitute

WordRange.InsertAfter "Running Word 97!\n"

for

$wordrng->InsertAfter("Running Word 97!\n);

Hence, given a VB OLE automation script, the transfer between the two languages is trivial. So lets see a couple more OLE automation scripts, but this time, lets make them more 'general purpose' (in the sense that they actually do something useful.

More controlling Word with OLE (thanks to Eric Zimmerman)

So here are two more scripts dealing with Word and OLE, which I've found somewhat useful. Both were translated from Visual Basic into perl, and then warped into doing something generically useful. I am indebted to Eric Zimmerman for the original Visual Basic source code for both of the following examples:

Printing out documents via OLE - an NT version of 'lpr'

If you use Word frequently, as some of us do (especially those who write books) one of the most annoying things about it is that everything is GUI based. There is no simple way (that I am aware of, tell me if there is, please!) of simply stacking up and queuing 100 documents to print. In Unix you could say:

prompt% lpr -Pcaps_printer *

and have all the documents in your current directory print out. However, in the NT world, you have to

1) select print from the file menu

2) hit 'Ok'

Now this may not sound like much, but anybody who has actually had to print out 100 documents knows that it is a major pain. To compound things, sometimes you need to wait for the print queue to empty before you can do the cycle again, so you could be sitting around with absolutely nothing to do for hours!

The following script fixes all that. To use, simply say:

C:\> wordprint.p -delay 100 *

and microsoft word will print out documents, one by one with 100 seconds between each printing (the delay is optional). Here is wordprint:

Listing 12.14 wordprint.p

1 use FileHandle;

2 use strict;

3 use Win32::OLE;

4 use Getopt::Long;

5

6 my %opt;

7 GetOptions( \%opt, '--delay:i');

8

9 my @files;

10

11 if (!-e $ARGV[0])

12 {

13 @files = @ARGV;

14 }

15 else

16 {

17 my $fh = new FileHandle($ARGV[0]) || die "Couldn't open $ARGV[0]\n";

18 chop(@files = <$fh>);

19 }

20

21 my $wordObj=new Win32::OLE('Word.Basic')|| die "Couldn't open Word !\n";

22 print $wordObj;

23 $wordObj->{Visible} = 1;

24

25 my $file;

26 die "Need to input some files!\n" if (!@files);

27

28 foreach $file (@files)

29 {

30 $wordObj->FileOpen ($file, 'True');

31 $wordObj->FilePrint();

32 if ($opt{'delay'}) { sleep ($opt{'delay'}); }

33 }

34 $wordObj->Close();

This script came in very handy when I was writing this book; I don't have Word 97, but one of the big things I needed to do was print out sections of text for cross referencing, or to give to other people for preview.

However, if you are going to use this effectively, you need to make sure that your copy of Word is set up correctly. Perl follows to the letter exactly how your word printer is set up; hence, if you aren't careful, you could send (for example) tons of output to the wrong printer.

Saving documents in a different format

The other big way that automation helped was to save documents into a different format. Since the Word format is proprietary, it is a big pain to index as is. You need to rely on the Visual Basic methods for text processing, and well, Visual Basic simply wasn't made to handle text the way perl was.

However, OLE automation again came in handy. Although I didn't have Word 97, I knew someone who did, and they used the following script to save everything as regular text documents (which can be manipulated quite easily by perl.) To use, you say:

C:\> wordsave.p *

or

C:\> wordsave.p test.doc

where the arguments to the script are legal word documents. The code then goes in and saves everything with a 'txt' extension for you, so you can look through the results with perl:

Listing: 12.15 wordsave.p

1 #!/usr/local/bin/perl5

2

3 use FileHandle;

4 use strict;

5 use Win32::OLE;

6

7 my @files;

8

9 if (!-e $ARGV[0])

10 {

11 @files = @ARGV;

12 }

13 else

14 {

15 my $fh = new FileHandle($ARGV[0]) || die "Couldn't open $ARGV[0]\n";

16 chop(@files = <$fh>);

17 }

18

19 my $wordObj = new Win32::OLE('Word.Application')

20 || die "Couldn't open Word $! !\n";

21

22 die "Need to input some files!\n" if (!@files);

23

24 foreach $file (@files)

25 {

26 next if ($file !~ m"\.doc$");

27 $wordObj->Documents->Open($file);

28 my ($txtfile = $file ) =~ s"\.doc"\.txt"g;

29

30 $worddoc->ActiveDocument->SaveAs

31 (

32 FileName=> $txtfile,

33 FileFormat => 'wdFormatText'

34 );

35

36 }

37 $wordObj->Close();

Unfortunately, this only works with Office 97. In version 7.0 and lower of Word, you needed to go through the 'Word.Basic' part of the program (as in the example preceeding this one). So if you have 7.0 or lower, you might want to consider automating it with GuiDo (we go over GuiDo below.)

Excel example - merging Two Spreadsheets

We gave an example of Excel in the chapter 'Perl at 30,000 feet'; however, then we didn't know about references.

Excel works exceedingly well with references - if you think about it, all a spreadsheet is, is a two dimensional array (Row A Column 10 for instance.) And if you have more than one spreadsheet open at the same time, this becomes a three dimensional array.

Hence, you can manipulate Excel objects in perl quite easily. Here's a small program that merges two or more spreadsheets together, and then saves them in the last workbook on the command line. So if you say:

C:\> perl merge_excel.p 1.xls 2.xls 3.xls 4.xls

it will merge 1,2, and 3.xls into 4.xls, stacking the spreadsheets on top of each other. Here's the code:

Listing 12.16 merge_excel.p

1 use Win32::OLE;

2

3 use Data::Dumper;

4 use strict;

5

6 my $excel = new Win32::OLE('Excel.Application') || die "ARRGH! $!\n";

7

8 $excel->{'Visible'} = 1;

9

10 print $excel;

11

12

13 my $workbook;

14 my $AoAoA = [];

15

16 foreach $workbook (@ARGV)

17 {

18

19 $excel->Workbooks->Open($workbook);

20 push(@$AoAoA, _processData($excel));

21 $excel->Workbooks(1)->Close();

22 }

23

24 _integrate($excel, $AoAoA);

25 $excel->Save($ARGV[$#ARGV]);

26 $excel->Quit();

27

28 sub _processData

29 {

30 my($excel) = @_;

31

32 my $AoA = [];

33 my $value;

34 my ($row, $col) = ('A','1');

35

36 while (1)

37 {

38

39 while

40 (

41 $value =

42 $excel->Workbooks(1)->Worksheets('Sheet1')->

43 Range("$row$col")->{Value}

44 )

45 {

46

47 print "$row$col: $value\n";

48 $AoA->[ord($row) - ord('A')][$col-1] = $value;

49 $col++;

50 }

51 last if ($value eq '' && $col == 1);

52 $row++; $col = 1;

53 }

54 print Dumper($AoA);

55 return($AoA);

56 }

57

58 sub _integrate

59 {

60 my ($excel, $AoAoA) = @_;

61

62 my $wb;

63 $excel->Workbooks->Add() || die "Couldn't open Workbook!\n";

64 print Dumper($AoAoA);

65

66 my %lastcol;

67 my ($rows, $cols, $rowcount, $colcount);

68

69 my $coltotal = 1;

70 foreach $wb (@$AoAoA)

71 {

72 my $rows = $wb;

73 for ($rowcount = 0; $rowcount < @$rows; $rowcount++)

74 {

75 my $cols = $rows->[$rowcount];

76

77 for ($colcount = 0; $colcount < @$cols; $colcount++)

78 {

79 my $row = sprintf("%c", $rowcount+ord('A'));

80 my $col = $cols->[$colcount];

81 $lastcol{$row}++;

82 my $range = "$row$lastcol{$row}";

83 print "Adding $col to :$range:\n";

84 $excel->Range("$range")->{Value}=$col;

85 }

86 }

87 }

88 }

The only thing quirky about this is in the translation between Excel spreadsheet and double array. We have the line:

48 $AoA->[ord($row) - ord('A')][$col-1] = $value;

which does the translation. Excel rows go from A through Z, AA-ZZ, AAA-ZZZ, etc. And hence, as a quick translation, we can take the ascii value for 'A' (ord('A')) and subtract it from the name of the row, to get a zero based array element. If we had cell G12, this would translate into:

$AoA->[ord('G') - ord ('A') ][12-1] = $value;

or

$AoA->[71-65][11] = $AoA->[6][11] = $value;

So the translation of excel to perl element is fairly straightforward. When we have our elements inside perl two dimensional arrays, the rest is easy. We accumulate the three arrays together, and then print them to the last element in the subroutine 'integrate' in line 58, and then show what we are doing in line 82. By the time we get to saving the file (line 25) we have an Excel file consisting of our original Excel documents on the command line.

Sending Mail via MAPI

Finally, we shall consider sending mail via MAPI. MAPI stands for Messaging Application Program Interface, and is Window's main method for automating mail. Hopefully, by the time you read this, MAPI might be incorporated into the 'Mail::Send' program of Graham Barr's so you can get true, platform independant mail, but we'll see.

Anyways, there are seven steps to sending a mail message with MAPI:

1) making the OLE object

2) logging into the OLE server,

3) Configuring a message to be in the 'Outbox'

4) adding text to the message, and then saving it via Update

5) figuring out where it is going to by adding Recipients

6) resolving those recipients via 'Resolve'

7) Sending the mail

No wonder I yearn for a simple perl interface to do all of this! Here's some sample, mapi code. It acts as 'mail' does in unix; if you say

c:\> perl send_message.p help@internet.com pleasehelp@internet.com

then perl will tell you to enter a subject, and then the text to the message you are about to send. It saves this text in a message, which you then send to the MAPI server. The code follows:

Listing: 12.17 send_message.p

1 use Win32::OLE;

2 use strict;

3

4 my $ORIGINAL_MAIL = 0;

5 my $FIRST_RECIPIENT = 1;

6 my $COPY_RECIPIENT = 2;

7 my $BLINDCOPY_RECIPIENT = 3;

8

9 my $SAVEMAIL = 1;

10 my $DONTSAVEMAIL = 0;

11

12 my $login = "Ed Peschko";

13 my $passwd = shift (@ARGV);

14

15

16 my @destination = @ARGV || die "You need to have a message recipient!\n";

17

18 my $mailobj = new Win32::OLE('MAPI.Session')

19 || die "Couldn't open a MAPI session! $!";

20

21

22 my $logon = $mailobj->Logon($login, $passwd);

23

24 print "Enter message please!\nSubject:";

25

26 my $subject = <STDIN>;

27 chop($subject);

28

29 print "Message Text:\n";

30 my ($mail, $line);

31

32 while ($line ne '.')

33 {

34 chop($line = <STDIN>);

35 $mail .= "$line\n";

36

37 }

38

39 my $message = $mailobj->Outbox->Messages->Add();

40

41 $message->{'Subject'} = $subject;

42 $message->{'text'} = $mail;

43 $message->Update();

44

45 my $type; # first or second recipient?

46

47 foreach $mail (@destination)

48 {

49 if ($mail eq $destination[0]) { $type = $FIRST_RECIPIENT; }

50 else { $type = $COPY_RECIPIENT = 2; }

51

52 my $sendto = $message->Recipients->Add();

53 $sendto->{'Name'} = $mail;

54 $sendto->{'Type'} = $type; # the first person we list is the primary,

55 $sendto->Resolve(); # the second and further are copies.

56 }

57 $message->Send($SAVEMAIL,$AUTO,0);

58

59 $mailobj->Logoff();

 

 

Lines 39 through 59 are the actual, MAPI part; lines 4-10 show some of the more common constants in MAPI. We use these constants in lines 49,50 and 57; if we have more than one destination we need to make the first person a 'first recipient' and all the others 'copy' recipients. (line 49-50) We therefore 'stack' up the people we are going to send to in line 52 (adding recipients as we go) and configuring what types of recipients they are (line 54)

And finally we send the mail (line 57) and quit the server (line 59). As I said, this seems to be an awful lot of work to send a message; hopefully you can look at Mail::Send on CPAN when this book is out, and it will send mail on NT in one clean step.

WinPerl++ ( aka GuiDo ) (95/NT only) (thanks to Todd Bradfute for examples)

WinPerl++ (by Todd Bradfute and Jeff Townsend) is an alternative to OLE Automation - in other words, another way of controlling GUI programs. Also called GuiDo, this is a perl package that I think has a very bright future. It is still in the beta stage, so the scripts on this page might change slightly in the months ahead, but you can always track its changes by going to:

http://www.pflugerville.org/guido/

GuiDo takes a vastly different approach to automating a GUI application. OLE is a way of accessing how a program was developed; hence you have to reach into the application and 'twiddle its bits' in order to automate it. Hence, it expects your application to be programmed in OLE to begin with.

However, not all programs are written in OLE. In fact, most aren't, so this vastly limits the amount of OLE automation you can do. GuiDo gets around this by letting YOU describe how the application is laid out.

In short, you make an application description file which tells perl exactly how the windows structure of your program is laid out. WinPerl then provides methods to send information to the window, just like Expect allowed you to send keys to telnet and ftp.

It is therefore a generic solution and will work with any GUI. Right now, the action is one way, ie: you can send information to the GUI but can't get out information out. You therefore engage in a 'one way conversation' so to speak, and hence can't automate anything dynamically like you could with Expect.

But still, even given that, GuiDo is damn useful. In my opinion, there are four major benefits to GuiDo over OLE:

1) it is controlled by you. Since you are the one who writes or controls the interface file, you can change the interface to your application when you like.

2) it will become multi-platform. I have talked to Todd Bradfute, one of the authors of WinPerl++ and they have immediate plans of releasing GuiDo on the Unix platform

3) it works with more applications than OLE. Since WinPerl is a generic solution, you don't have to be dependant on 'OLE conformancy'

4) it is easier to get information about WinPerl++ than OLE. It took me many searches on Microsoft's knowledge base to come up with the one public domain file about OLE that you see above.

Of course there is one probable drawback of WinPerl++ versus OLE; that of speed. I haven't done any time trials, but that is my guess. But as OLE itself is no speed demon, this isn't a big deal.

Anyways, lets see a couple of examples of WinPerl++ in action - we shall play around with Excel, and Word below.

Excel

The first application we will use GuiDo for is Excel. Since we already saw the solution for Excel through OLE above, we can use this as a 'comparison' between the two solutions. Note that you are going to have to provide your own path to excel in order to make this work:

Listing: 12.18 guidoExcel.p

1 #!perl5.exe

2 require 5.003;

3 use WinPerl 1.0007;

4 push(@INC,$1) if ($0 =~ /(.*)\\/);

5 use strict;

6 use vars qw ($MSExcel $MSExcelChartWizard1 $MSExcelChartWizard2

7 $MSExcelChartWizard3 $MSExcelChartWizard4 $MSExcelChartWizard5);

8 require "Excel.ph";

9

10 $MSExcel->{cmdline} = '\path\to\excel.exe'

11 $MSExcel->start($MSExcel) unless($MSExcel->Exists());

12 $MSExcel->TypeKeys("<Ctrl-Home>");

13

14 @data = (

15 [1, 2, 3],

16 [4, 5, 6],

17 [7, 8, 9],

18 );

19 EnterData(@data);

20 $MSExcel->{Edit}->{GoTo}->Pick();

21 $MSExcelGoTo->{Where}->TypeKeys("C3<Enter>");

22 $MSExcel->TypeKeys("<Shift-Ctrl-Home>");

23

24 $MSExcel->{Insert}->{Chart}->{OnThisSheet}->Pick();

25 $MSExcel->Click(50,140);

26 $MSExcelChartWizard1->TypeKeys("<Enter>");

27 $MSExcelChartWizard2->TypeKeys("<Enter>");

28 $MSExcelChartWizard3->TypeKeys("<Enter>");

29 $MSExcelChartWizard4->TypeKeys("<Enter>");

30 $MSExcelChartWizard5->TypeKeys("<Enter>");

31

32 sub EnterData {

33 my @data = @_;

34 my ($row, $col);

35

36 foreach $row (@data) {

37 foreach $col (@$row) {

38 $MSExcel->TypeKeys($col . "<Tab>");

39 }

40 $MSExcel->TypeKeys("<Enter><Ctrl-Left>");

41 }

42 }

Line #7 loads up the application header file I am talking about. In this case, Excel is a huge application, and therefore takes a 1000 line header file. This header file describes the application in gory detail, and in this case, it helps to have an Excel Window up while you are looking at it:

ExcelSnapshot.PNG

Figure 12.4

Excel Snapshot, with header open

Here, we have the 'File' menu open, and hence we can see all the things we can do under the menu. And if you look at the Excel.ph header you see.

Listing 12.19 Excel.ph

1 $MSExcel = &refdef('MSExcel',

2 Tag=>'MainWin(Microsoft Excel.*)',

3 Menu0=>

4 {

5 Tag=>'Menu()[0]',

6 RestoreCtrlF5=> { Tag=>'MenuItem(Restore.Ctrl\+F5)' },

7 MoveCtrlF7=> { Tag=>'MenuItem(Move.Ctrl\+F7)' },

8 SizeCtrlF8=> { Tag=>'MenuItem(Size.Ctrl\+F8)' },

9 MinimizeCtrlF9=> { Tag=>'MenuItem(Minimize.Ctrl\+F9)' },

10 MaximizeCtrlF10=> { Tag=>'MenuItem(Maximize.Ctrl\+F10)' },

11 CloseCtrlW=> { Tag=>'MenuItem(Close.Ctrl\+W)', },

12 },

13 File=>

14 {

15 Tag=>'Menu(File)',

16 New=> { Tag=>'MenuItem(New.Ctrl\+N)', },

17 Open=> { Tag=>'MenuItem(Open.Ctrl\+O)', },

18 Close=> { Tag=>'MenuItem(Close)', },

19 Save=> { Tag=>'MenuItem(Save.Ctrl\+S)', },

20 SaveAs=> { Tag=>'MenuItem(Save As)', },

21 SaveWorkspace=> { Tag=>'MenuItem(Save Workspace)', },

22 Properties=> { Tag=>'MenuItem(Properties)', },

23 SharedLists=> { Tag=>'MenuItem(Shared Lists)', },

24 PageSetup=> { Tag=>'MenuItem(Page Setup)', },

25

26 PrintArea=>

27 {

28 Tag=>'Menu(Print Area)',

29 SetPrintArea=> { Tag=>'MenuItem(Set Print Area)' },

30 ClearPrintArea=>{ Tag=>'MenuItem(Clear Print Area)'},

31 },

32 PrintPreview=> { Tag=>'MenuItem(Print Preview)' },

33 Print=> { Tag=>'MenuItem(Print.Ctrl\+P)' },

34 Recent1=> { Tag=>'MenuItem(1 .*)' },

35 Recent2=> { Tag=>'MenuItem(2 .*)' },

36 Recent3=> { Tag=>'MenuItem(3 .*)' },

37 Recen4=> { Tag=>'MenuItem(4 .*)' },

38 Exit=> { Tag=>'MenuItem(Exit)' },

39 },

# much stuff missing….

801 EDTBX3=> { Tag=>'EDTBX()[3]' },

802 );

Notice how each entry in the hash here corresponds to an entry in the menu? Inside the Excel.ph file, the defined lines here correspond to the menu-items, windows, and keys in the Excel application - these entries are passed to a function called 'refdef' which then binds these 'Tags' to an action that will be taken if you run the special functions 'Pick()' or 'Typekeys()'.

In short, the point behind WinPerl is to define your windows application as a series of data structure. If you have menus, or sub-menus, or what have you, you can define them in terms of a hash as we did above.

And if you have many windows - as excel does - you can define multiple data structures, with multiple calls to refdef. Lets trace a couple of the key calls above, and turn them into what GuiDo is actually doing:

20 $MSExcel->{Edit}->{GoTo}->Pick();

21 $MSExcelGoTo->{Where}->TypeKeys("C3<Enter>");

In this case, we hit the $MSExcel object, with the request to pick the menu item 'GoTo' inside the menu 'Edit'. And then, in line 21, we hit the 'Where' menu inside the sub 'GoTo' menu, and type in the keys 'C3<Enter>'. This has the effect therefore, to go to the cell C3.

In short, if we take a look at what the application is doing as a whole, we get a screen diagram that looks something like Figure 21.5:

ExcelChart.PNG

Figure 21.5

Excel chart

We have therefore used GuiDo to draw an excel chart out of data that was stored in Perl. As you can see, this is very similar to the functionality that we saw above (with OLE) however, to me this is much more intuitive. After all, it tracks the actual way the application feels; you don't need to go to the (almost nonexistant) documentation to find the correct methods to use. You simply do it, and can see everything you can do.

Word With GuiDo

Now, lets look at the other major office application that we played around with up above, with OLE; Word. Word is just as adept at being controlled by 'sending keys to the keyboard'; there is even a module called 'Win::Clipboard' ( that controlls the Windows Clipboard) that could possibly be used to get text out of the application, and into perl for analysis!

Anyways, here's a small script that types a message out (of a pitifully bad joke of I'm afraid to say, my own devising), and does some cool things with the Fonts, areas, and so forth.

Listing 12.20 guidoWord

1 #!perl5.exe

2 require 5.003;

3 use WinPerl 1.0006002;

4 push(@INC,$1) if ($0 =~ /(.*)\\/);

5 require "MSWord.ph";

6

7 $MSWord->start($MSWord) unless($MSWord->Exists());

8 $MSWord->{Documents}->{Doc1}->{EditArea}->SetText

9 ("Why did the chicken cross the sea?");

10 $MSWord->{Edit}->{SelectAll}->Pick();

11

12 $MSWord->{Format}->{Font}->Pick();

13 $MSWordFont->{Font}->SetText("Playbill");

14 $MSWordFont->TypeKeys("<Enter>");

15

16 $MSWord->{Format}->{Font}->Pick();

17 $MSWordFont->{FontStyle}->SetText("Italic");

18 $MSWordFont->TypeKeys("<Enter>");

19

20 $MSWord->{Format}->{Font}->Pick();

21 $MSWordFont->{Size}->SetText("12");

22 $MSWordFont->TypeKeys("<Enter>");

23

24 $MSWord->{Documents}->{Doc1}->{EditArea}->TypeKeys

("<End><Enter>To get to the other tide.");

Never mind. (I can't believe that joke actually made it through my editor. . .) You can see again, that the way the perl application mimics the keystrokes to the application, and again how we needn't know anything about the application except for the actual use of it.

Summary of GuiDo

WinPerl++ (aka GuiDo) is a way of commandeering applications, and sending keystrokes to them. Right now, it is only available on NT/Win95, but there is a concerted effort to make it an OS independent. When that happens, WinPerl++ could become a very good way of making what are called regression tests : where you come up with several possible scenarios to try to break a given application.

To use guido, you

1) 'use WinPerl;'

2) define a application header file, which describes your application in detail.

3) Send keystrokes to the application as defined inside the application header file.

For more information, I would go to the WinPerl++ online documentation.

Small Note: since it is going to be platform independent, Todd Bradfute and Jeff Townsend are looking for a new name besides WinPerl++ (they are quite happy with GuiDo). If you go to:

http://www.pflugerville.org/guido/

you can suggest a name for the development environment ( they are staying away from Win - my arguments that you could make it platform independent by putting an 'X' in front of win, hence 'XWinSight' as a good name were to no avail).

LibWWW examples

By now, about anybody who has to do with computers chokes on the incredibly ugly term 'Information SuperHighway'. For one thing, the term is misleading; web is a better term (also overused), 'mesh' might be more accurate.

But in any case not only do the news people have exceedingly bad taste for metaphors, they are also missing out on about 80% of the web as they know it.

The popular media concentrates on what they see on the web; home pages, people glued in front of computers, surfing, and so forth. However, nobody considers how the information that people are looking at actually got there in the first place.

This is the second side of the web-puzzle. For if you can't collect information, you can't make an effective web page. And if you can't make an effective web-page, then there is no web.

In perl, LibWWW is the primary way that you would collect that information. LibWWW deals with the automation of collection of data off of the web, and all that entails.

With LibWWW you can easily make a web crawler. You can easily download files from ftp sites on the web. You can easily extract http tags from any web page you come across, and you can easily make your applications interact with the web for good effect. A lot of this is more simple than people think - here are some small examples on how to do it:

Getting a html page from the web

If you go to any reasonable browser like netscape, you can see the source of a document by saying 'View' => 'Document Source'. This then gives you an overview of what 'html' is actually causing the page that you are witnessing. Here is a small perl script which does exactly the same thing:

Listing 12.21 gethtmlpage.p

1 #!/usr/local/bin/perl5

2

3 use LWP::UserAgent;

4 $ua = new LWP::UserAgent;

5 ($address,$params)=split(/\?/,$ARGV[0]);

6

7 $ua->agent('Mozilla/3.0');

8 my $req = new HTTP::Request 'POST',$address;

9 $req->content_type('application/x-www-form-urlencoded');

10 $req->content($params);

11

12 my $res = $ua->request($req);

13 print $res->as_string;

Here, we do exactly the same thing. In line 4, we make a new 'UserAgent'. This is an object that is like a portable 'mini-browser', only again, it is totally automated. In line 7, we lie to the user agent, telling it that it is in fact a 'Netscape' browser. (Some webpages look to see what browser is hitting them)

Then, we tell what type of request we are sending (a form, lines 8 through 10). We then pass that information to the user agent (line 12) which then returns back a string, which we print.

Easy enough! If you say something like:

C:\> perl gethtmlpage.p "http://www.dejanews.com"

This will print out:

<html>

<head>

<title> Deja News</title>

<meta name="keywords" content="dejanews usenet newsgroup articles search query">

</head>

<body background="/gifs/bgbar_gw.gif" bgcolor="#ffffff" text="#000000" link="#0000ff" vlink="#52188c" alink="#ff0000">

# much deleted.

In other words, the http://www.dejanews.com home page. Now the question is, how do we manipulate this data. Well, we do one form of manipulation below, when we parse this information to get http 'tags'.

getting http tags from a web form

To get the http tags from a form, we exploit one of perl's greatest strengths: its compilcated regular expressions. After all, when you think about it a tag in a form looks something like:

<a href="http://www.dejanews.com">

where the tag is the item inside the double quotes. All we have to do to extract this is make a regular expression that matches double quotes, preceed that by a 'HREF', and then print the code out. The code looks something like this:

Listing 12.22 simplehtmlparse.p

1 #!/usr/local/bin/perl5

2

3 use LWP::Simple;

4

5 my $doubleparen = q{"(([^\\\\"]|\\\\")*)"};

6

7 print $doubleparen;

8 die "Need to supply an argument!\n" if (!defined $ARGV[0]);

9 $doc = get($ARGV[0]);

10

11 while ($doc =~ m/HREF=$doubleparen/sgi)

12 {

13 print ":$1:\n";

14 }

The regular expression is in line 5 - and it should look familiar for we already covered it earlier in the chapter on regular expressions.

Line 9 is an even simpler way of getting an html page than we covered earlier; all you do is say 'get(<url>)' and you've got the text of that URL inside '$doc'.

Lines 11 through 13 then cycle through the document and print out the http tags that they find.

These are fairly simple examples, but I hope you can see how valuable they are at extracting information off of the web. In fact, a web-crawler is really only a fancified version of a program that has three steps:

1) extract the text - and html tags - off of one url

2) save the text of that url into a database

3) repeat steps one and two for all html tags that we find.

We have already done steps 1 and 2; all we would have to do is take the stuff that we have programmed above, and put it into a loop, and we would have a web crawler.*

Be careful if you are going to try making a web crawler. You can easily overload a system by sending out too many requests, and you might end up being banned if you don't introduce delays in your program (via sleep statements and so forth.)

In addition, it seldom is very useful to start a general web crawler like this; sites like lycos and yahoo have already done it for you in a way that is far more extensive than a program of yours could ever do. The trick is selective searching - web masters are not happy if someone makes a web crawler that is CPU intensive, and you could get into a lot of trouble. You have been forewarned.

A 'formbuilder' parsing through an online form for user information

We are now in a position to make a more complicated program. The one that we choose to make is called 'formbuilder.p' and we will use it to actually make it so we can use the two simple scripts that we described above (simplehtmlparse.p, gethtmlpage.p).

The trick to how this script is going to be so useful is in the format which CGI scripts use to 'pass arguments' around. If you are working on Unix or NT, you can say:

C:\> perl script.p 1 2 3

and have your 'argument stack' (@ARGV) end up looking like:

@ARGV = (1, 2, 3);

Well, the 'arguments' that CGI receives are in hash value pairs, separated by a '?'. Next time that you go to lycos, or yahoo, take a look at what happens when you enter something into the 'search' window:

Search: perl script

The resulting 'Location:' header will then look something like this:

Location: http://search.yahoo.com/bin/search?p=perl+script

This is the server 'translating' your search into an html tag. The application search is free to use the 'p=perl+script' information in any way it chooses. Yahoo so chooses to make it search its database for anything that has the keywords 'perl' and 'script' in it.

Now the question is - how do you find the correct arguments to any given script on the web? After all, if we wanted to, we could bypass the 'Search' web page on yahoo, if we in the location 'http://search.yahoo.com/bin/search?p=perl+script ourselves.

The script has no way of knowing that the request came from the 'search' page, or from you or your script - hence, if we could 'find' the correct arguments to pass to 'search' - or any other script - we could bypass many forms on the web, and automate the process quite a bit.

The trick is to look at the yahoo page itself for clues. If we look at the url 'http://www.yahoo.com', we see the following lines:

<form action="http://search.yahoo.com/bin/search">

<input size=30 name=p>

The first 'action' tag is the script that gets executed if we complete the form, and press submit. The second is a named parameter, the one we are searching for. Both are used to combine into something like:

http://search.yahoo.com/bin/search?p=perl+script

for a valid tag that can be passed to a browser, or be automatically called by a script. Below is a script which handles much of the drudgery of looking at forms and figuring out the arguments to pass to them.

Listing 12.23 formbuilder.p

1 #!/usr/local/bin/perl5

2

3 use strict;

4

5 my $doubleparen = q#(?:[^\\"]|\\\\")*#;

6 my $noparen = q#[^>=\s]+#;

7 my $selecttag = q#<SELECT.*?</SELECT>#;

8 my $inputtag = q#<INPUT .*?>#;

9 my $form = q#<FORM .*?>#;

10

11 use LWP::UserAgent;

12 my $ua = new LWP::UserAgent();

13

14 $ua->agent("Mozilla/3.0");# we need to do this to get the 'right' web page

15 # (the default gives you a sparse webpage sometimes)

16

17 print "Web Site: $ARGV[0]\n";

18 print "------------------\n";

19 my $req = new HTTP::Request("GET" => "$ARGV[0]");

20 $req->header("Accept" => "text/html");

21

22 my $res = $ua->request($req);

23

24 if ($res->is_success)

25 {

26 _buildForm( $ARGV[0], $res->content());

27 }

28 else

29 {

30 print "Error: " . $res->code() . " " . $res->message();

31 }

32

33 sub _buildForm

34 {

35 my ($formname, $wholeform) = @_;

36

37 my ($formline) =

38 ($wholeform =~ m{

39 ($form) # form name

40 }six

41 );

42

43 my @selects =

44 ($wholeform =~ m{

45 ($selecttag) # getting all selects

46 }sgix

47 );

48

49 my @inputs =

50 ($wholeform =~ m{

51 ($inputtag)

52 }sgix

53 );

54

55 my $action = _parseForm ( $formname, $formline );

56 my $selectKeyValues = _parseSelect( $formname, \@selects );

57 my $inputKeyValues = _parseInput ( $formname, \@inputs );

58

59 print <<"EOL"

60 Script Name:

61 $action

62

63 Actions:

64 @$selectKeyValues

65 @$inputKeyValues

66 EOL

67 }

68

69 sub _parseForm

70 {

71 my ($formname, $formline) = @_;

72

73 if ($formline =~ m#.*?ACTION.*?=.*?("$doubleparen"|$noparen)#si)

74 {

75 my $scriptname = $1;

76 return($scriptname);

77 }

78 else

79 {

80 print "WEIRDNESS!\n";

81 return('NO_SCRIPT');

82 }

83 }

84

85 sub _parseSelect

86 {

87 my ($formname, $selects) = @_;

88

89 my ($select, $name, $values, $return) = ('','',[],[]);

90 local ($") = ",";

91 foreach $select (@$selects)

92 {

93 ($name) =

94 ($select =~ m{

95 SELECT\s*NAME.*? # gets junk before name

96 ("$doubleparen"|$noparen) # gets the actual name

97 }six

98 );

99

100 @$values =

101 ($select =~ m{

102 <.*?VALUE.*? # gets junk before value

103 ("$doubleparen"|$noparen) # gets actual value

104 .*?> # gets junk after value

105 }sgix

106 );

107 push (

108 @$return, sprintf("\tName: %-.20s\tValues: %-20s\n",

109 $name,"@$values")

110 );

111 }

112 return($return);

113 }

114

115 sub _parseInput

116 {

117 my ($formname, $inputs) = @_;

118

119 my ($input, $name, $default_value,$return) = ('','','',[]);

120 foreach $input (@$inputs)

121 {

122 next if ($input =~ m#TYPE\s*=\s*("submit"|submit\s*)#i);# skip submit tags

123

124 ($name) =

125 ($input =~ m{

126 .*?NAME\s*=\s*

127 ( "$doubleparen"|$noparen)

128 }six

129 );

130

131 ($default_value) =

132 ($input =~ m{

133 .*?VALUE\s*=\s*

134 ("$doubleparen"|$noparen)

135 }six

136 );

137

138 push

139 (

140 @$return,

141 sprintf (

142 "\tName: %-.20s\tValues: %-20s\n",

143 $name, $default_value

144 )

145 );

146 }

147 return($return);

148 }

This is getting kind of complicated, so lets draw another functional diagram. Figure 12.6 shows this functional diagram in action:

126.fig

Figure 12.6

Functional Diagram for the formbuilder

Here again, the key is regular expressions. The function _buildForm( ) parses out the whole html document into several pieces of text:

1) the action that the form will take when executed ( the regular expression m"<FORM .*?>")

2) the input tags that we mentioned above ( the regular expression m"<INPUT .*?>

3) in the 'radio buttons that occur in forms ( the regular expression m"<SELECT.*?</SELECT>" )

Each of these regular expressions key for places that we can find the arguments that can be passed to CGI scripts. This is the first step. The 'rough text' we get from matching for these three regular expressions we now send to _parseForm(), _parseSelect(), and _parseInput() for processing.

And finally, in lines 59-66, we print out the output. This shows the arguments we can pass, at a best guess, to a given script. When we run this script for 'http://www.lycos.com' we get something like:

Web Site: http://www.lycos.com

-------------------------------

Script Name:

/cgi-bin/pursuit

 

Actions:

Name: cat Values: lycos,sounds,graphics,point,geo,ups,bn,stockfind

Name: matchmode Values: and

Name: query Values: ""

So now, we can use this information to build a reasonable url to do searches with. If we say something like:

http://www.lycos.com/cgi-bin/pursuit?cat=lycos&matchode=and&query=perl

this is then a legal url for getting all the information on perl from Lycos.

This is not the last time we have seen this script. We shall see it below when we use it to build a 'meta-web browser'.

CGI Examples

Both fortunately, and unfortunately, perl has been saddled as being a 'web' language. Yes, it is true perl is the most popular scripting language for web use, and yes, perl can interface with java, and yes perl has been integrated into web servers like IIS and Apache (for performance reasons…)

But this popularity tends to shadow perl's uses in other locations, which I hope that this chapter is showing! Companies do themselves a great disservice by only having perl be used in 'one area' (the CGI area) and ignoring the rest of the places.

One can't deny it though. One of the primary uses of perl is on the web, and any book calling itself 'Perl Complete' has to give at least a few examples of CGI GUI's in action.

So here goes. We will start off with a small example ( a web counter), move up to a medium sized example ( a meta-browser) , and finally, finish with a relatively large example ( a bug tracker).

an embeddable web-counter (code adapted from F. Cusmano. Thanks to Liang Gu)

Our first example is one that shows up in many different contexts: a counter to count the accesses of a certain web-page. Although, in ways, it has been done to death ( the example that is ) it shows many important things about how CGI and html can work together.

The first thing that we need to look is the HTML that we actually embed the counter in. Listing 12.24 shows the sample web page that we are going to use:

Listing 12.24 counterinfo.html

1 <HTML>

2 <HEAD>

3 <TITLE>Interesting places to find counters</TITLE>

4 </HEAD>

5 <body bgcolor=#DDDDDD text =#000000">

6 <left>

7 <h2> List of sites to get more sophisticated counters from: </h2>

8 <ul>

9 <li><h3><a href = "http://www.worldwidemart.com/scripts"> Matt's Script Archive</a>

10 <li> <a href = "http://www.counter.com"> counter.com </a>

11 <li> <a href = "http://www.countestatsr.com"> counterstats.com </a>

12 <li> <a href = "http://www.countmania.com"> countmaina.com </a>

13 <li> <a href = "http://www.freecount.com"> freecount.com </a>

14 <li> <a href = "http://www.jinko.com/counters"> Jinko mania </a>

15 <li> <a href = "http://www.freecount.com"> countmaina.com </a>

16 <li> <a href = "http://www.digits.com"> Web counter</a>

17 </ul>

18 <img src="/cgi-bin/fcount.cgi">

19 </body>

20 </html>

This page is simply a list of links that show, well, more sophisticated examples of web counters than we do here. The key to our example though, is in line #7. Line #7 is where we will get our web counter; the script '/cgi-bin/fcount.cgi' will get the webcounter picture for us.

But how does html recognize that we aren't trying to access '/cgi-bin/fcount.cgi itself as a picture? How does it interpret it as a script instead?

The key here is by context. cgi-bin is by definition for the webserver, a place where you keep binaries. And therefore you can use the binary to generate, or return pictures - whatever you would like.

'fcount.cgi' then, has to return a bitmap that a users browser can decode as a picture. Below is fcount.cgi. Notice that the picture it produces isn't exactly the greatest:

Listing 12.25 fcount.cgi

1 #!/usr/local/bin/perl

2

3

4

5 use CGI;

6

7 # create an array of digit bits for the XBM

8 #use the hex code (2chars) to define each line of the digits (working on graph paper helps) Remember bits dispalyed lsb 1st

9

10 # 00 11 22 33 44 55 66 77 88 99

11 my $digs =

12 [

13 [qw ( 00 00 00 00 00 00 00 00 00 00 )],

14 [qw ( 00 00 00 00 00 00 00 00 00 00 )],

15 [qw ( 3c 30 3c 3c 30 7e 38 7e 3c 3c )],

16 [qw ( 66 38 66 66 30 06 0c 66 66 66 )],

17 [qw ( 66 30 60 60 38 06 06 60 66 66 )],

18 [qw ( 66 30 60 60 38 06 06 60 66 66 )],

19 [qw ( 66 30 30 60 34 06 06 60 66 66 )],

20 [qw ( 66 30 30 38 34 3e 3e 30 3e 66 )],

21 [qw ( 66 30 18 38 32 60 66 30 3e 7c )],

22 [qw ( 66 30 0c 60 32 60 66 30 66 60 )],

23 [qw ( 66 30 0c 60 7e 60 66 18 66 60 )],

24 [qw ( 66 30 06 60 30 60 66 18 66 60 )],

25 [qw ( 66 30 06 60 30 66 66 18 66 60 )],

26 [qw ( 3c 7c 7e 3c 30 3c 3c 18 3c 3c )],

27 [qw ( 00 00 00 00 00 00 00 00 00 00 )],

28 [qw ( 00 00 00 00 00 00 00 00 00 00 )]

29 ];

30

31 if (!($fname = $ENV{'QUERY_STRING'})) { $fname = "fcount"; }

32

33 my ($fcnt, $flog, $cnt) = ( "$fname.cnt", "$fname.log", 0 );

34

35 if (open(FCNT, "<$fcnt")) { $cnt = <FCNT>; close(FCNT); }

36 if (++$cnt > 9999) { $cnt = 1; }

37 if (open(FCNT, ">$fcnt")) { print FCNT $cnt; close(FCNT); }

38

39 my @out = split('', sprintf("%04d", $cnt));

40

41

42 print "Content-type: image/x-xbitmap\n\n";

43 print "#define count_width 32\n";

44 print "#define count_height 16\n";

45 print "static char count_bits[] = {\n";

46

47 for ($ii = 0; $ii < 16; $ii++)

48 {

49 my $digit;

50 foreach $digit (@out) { print"0x$digs->[$ii][$digit]," }

51 }

52 print "};\n";

53

54 #now do the log file

55 if (open(FLOG, ">>$flog"))

56 {

57 $dd = localtime(time);

58 $ra = $ENV{'REMOTE_ADDR'};

59 $rh = $ENV{'REMOTE_HOST'};

60 $hu = $ENV{'HTTP_USER_AGENT'};

61 printf FLOG "$dd | $ra | $rh | $hu\n";

62 close(FLOG);

63 }

64

65 print end_header();

66 print end_html();

Lines 11-29 define the bitmap that we are going to use. Word quoting (qw) comes in quite handy here. Otherwise there would be tons of extra quote marks (['00','00','00','00'] - yuck.). And lines 35 - 37 get the number of users that have accessed the website. We increment that, and then in lines 42-51 proceed to generate the XBM that is going be printed out by the 'image =' directive.

And then, we do one more thing. Using the REMOTE_ADDR, REMOTE_HOST, and HTTP_USER_AGENT variables, we save a record into a log, of each site that has accessed the page, and what browser they were using. In lines 65 and 66, we then end the form.

When it is all done, the combination of webcounter and script make a page that looks something like Figure 12.7:

counter_info.PNG

Figure 12.7

Counter plus links to more elaborate counters

Well, not much, but by following the links on that page, you could probably find a counter to suit your needs.

a 'meta-webcrawler': using formbuilder to build a script (thanks to Alan Switzer)

Ok, lets go up a level (two levels?) in difficulty, and look at building a fairly complicated web page. Remember our formbuilder script? Well, with a little elbow grease, we can use the output of our formbuilder to build what is called a meta browser. Meta browsers don't do much in themselves, except query other browsers to see what they know, and display their information in a easy to see way.

Well, if you remember from our discussion up above, the way to make an engine search was to send the script that did the searching a certain string of arguments. For yahoo, the script/search string combo looked like:

http://search.yahoo.com/bin/search?p=perl+script

So the first thing we need to do is collect a list of these script/search string combos. We do this by running formbuilder:

prompt% formbuilder.p http://www.lycos.com

Web Site: http://www.lycos.com

------------------

Script Name:

/cgi-bin/pursuit

 

Actions:

Name: cat Values: lycos,sounds,graphics,point,geo,ups,bn,stockfind

 

Name: matchmode Values: and

Name: query Values: ""

Name: Values:

 

prompt% formbuilder http://www.altavista.com

Web Site: http://www.altavista.com

------------------

Script Name:

"http://www.altavista.digital.com/cgi-bin/query"

 

Actions:

Name: what Values: web,news

Name: fmt Values: ".",c,d

 

Name: pg Values: q

Name: q Values: ""

 

From here, we can construct http tags that will work on pretty much any search engine - we choose 'www.yahoo.com', 'www.altavista.com', 'www.excite.com', 'www.lycos.com', 'www.infoseek.com', and 'www.metacrawler.com'. So in a sense, we are making a 'meta-meta-crawler' script, because 'www.metacrawler.com' does exactly the same thing we are trying to do!

Anyway, that forms the heart of our script. The rest of it is listed below; it becomes a bit of a mouthful to do this so that the usage feels right:

Listing 12.26 meta.cgi

1 #!/usr/local/bin/perl5

2

3 # meta.cgi - this perl script demonstrates how to create a cgi program that

4 # can access multiple search engines on the web.

5

6 use CGI qw(:all);

7 use CGI::Carp qw (fatalsToBrowser confess);

8 use strict;

9

10 # Define JavaScript function for hit number validation

11 my $jscript=<<END;

12 <!-- Hide script

13

14 function validateHits(form)

15 {

16 var hitnum = parseInt(form.hits.value);

17 if (isNaN(hitnum))

18 {

19 alert("Please choose an integer number of hits!");

20 form.hits.focus();

21 form.hits.select();

22 return false;

23 }

24 return true;

25 }

26 // End script hiding -->

27 END

28

29 # Create the query CGI object

30 my $query = new CGI;

31

32 # Create an html document

33 print $query->header;

34 print $query->start_html

35 (

36 -title => 'THE META SEARCH ENGINE',

37 -author => 'switzer.alan@decatech.com',

38 -script => $jscript,

39 -BACKGROUND => 'images/background.jpg'

40 );

41

42 # Determine if script called with argument or not...

43 if (param())

44 {

45 my $subject = $query->param('subject');

46 my $hits = $query->param('hits');

47 my $engines = $query->param('engines');

48

49 print $query->h1('META SEARCH RESULTS:');

50

51 if ($subject)

52 {

53 # Display requested query string and number of hits for the search

54 print

55 "Your subject is: <B>", em($query->param('subject')), "</B>", hr(),

56 "You wanted to find <B>", em($query->param('hits')), "</B> hits", hr();

57

58 # Save array of information for each search engine's characteristics

59 my @search_engines =

60 (

61 {

62 name => "Yahoo",

63 url => "http://search.yahoo.com",

64 tag => "/bin/search?p=$subject",

65 img => "images/yahoo.gif",

66 txt => "Yahoo hits for \"<B>$subject</B>\""

67 },

68 {

69 name => "Lycos",

70 url => "http://www.lycos.com",

71 tag =>

72 "/cgi-bin/pursuit?cat=lycos&query=$subject&matchmode=and",

73 img => "images/lycos.gif",

74 txt => "Lycos hits for \"<B>$subject</B>\""

75 },

76 {

77 name => "Alta-Vista",

78 url => "http://www.altavista.digital.com",

79 tag =>

80 "/cgi-bin/query?what=web&fmt=.&pg=q&q=$subject",

81 img => "images/alta_vista.gif",

82 txt => "Alta-Vista hits for \"<B>$subject</B>\""

83 },

84 {

85 name => "Infoseek",

86 url => "http://www.infoseek.com",

87 tag =>

88 "/Titles?col=WW&qt=$subject&sv=IS&lk=noframes&nh=$hits",

89 img => "images/infoseek.gif",

90 txt => "Infoseek hits for \"<B>$subject</B>\""

91 },

92 {

93 name => "Excite",

94 url => "http://www.excite.com",

95 tag => "/search.gw?trace=a&search=$subject",

96 img => "images/excite.gif",

97 txt => "Excite hits for \"<B>$subject</B>\""

98 },

99 {

100 name => "Metacrawler",

101 url => "http://www.metacrawler.com",

102 tag => "/crawler?general=$subject&method=0&region=&rpp=20&timeout=10&hpe=$hits",

103 img => "images/metacrawler.gif",

104 txt => "Metacrawler hits for \"<B>$subject</B>\""

105 }

106 );

107

108 print "<table align=center>\n";

109

110 my $search;

111 foreach $search (@search_engines)

112 {

113 my ($yc, $xc) = (100,40);

114 my ($img, $url, $tag, $txt) =

115 (

116 $search->{'img'}, $search->{'url'},

117 $search->{'tag'}, $search->{'txt'}

118 );

119

120 if ($engines eq "All")

121 {

122 # Create links to all search engine results

123 print <<"EOF_PRINT";

124 <tr>

125 <td align=center>

126 <a href= "$url$tag">

127 <img src="$img" align=center border=2 height=$xc width=$yc></a>

128 </td>

129 <td>

130 <a href= "$url$tag">

131 $search->{'txt'}</a>

132 </td>

133 EOF_PRINT

134 }

135 elsif ($engines eq $search->{'name'})

136 {

137 # Create a link to the specific search engine the user

138 # asked for

139 print <<"EOF_PRINT";

140 <tr>

141 <td align=center>

142 <a href= "$url$tag">

143 <img src="$img" align=center border=2 height=$xc width=$yc></a>

144 </td>

145 <td>

146 <a href= "$url$tag"></a>

147 $txt

148 </td>

149 </tr>

150 EOF_PRINT

151 last;

152 }

153 }

154 # End the table

155 print "</table>\n";

156

157 # Provide a link back to the main page

158 print p(),

159 $query->a({href=>"meta.cgi"}, "Return to META SEARCH home");

160 }

161 else

162 {

163 # User didn't input a query string, create page to tell them

164 print "You submitted an empty query.", p(),

165 "Hit ", $query->b("back"), " on your",

166 " browser, or click below and ", $query->i("try again"), "!", p(),

167 $query->a({href=>"meta.cgi"}, "Return to META SEARCH home");

168 }

169 }

170 else

171 {

172 # Main/Start page creation

173 # Create a form to ask user for query, number of hits, and search

174 # engines to use

175 print $query->h1('META SEARCH ENGINE'),

176 start_form(-name=>'form1', -onSubmit=>"return validateHits(this)"),

177 "<table>",

178 Tr(), td(), "What subject do you want to look for?",

179 td(), $query->textfield('subject'),

180 p(), Tr(), td(), "How many hits do you want to find? ",

181 td(), $query->textfield(-name=>'hits',-default=>'20'),

182 p(), Tr(), td(), "Your choice of search engines?",

183 td(),

184 $query->radio_group(-name=>'engines',

185 -values=>['All', 'Yahoo', 'Lycos', 'Alta-Vista',

186 'Infoseek', 'Excite', 'Metacrawler'],

187 -default=>'All', -linebreak=>'true'),

188 "</table>", "<center>", $query->submit(), "</center>",

189 p(), end_form();

190 }

191

192 # End the html document

193 print end_html();

Although this looks quite complicated, the logic of it is fairly simple. There are two main pages that we create. The first one, is created in lines 171-190 if the user gives no arguments to the script. This is shown in Figure 12.8:

main.png

Figure 12.8

Meta search engine home page.

We give the user a series of questions in lines 178 through 183; in particular, we are constructing a table so the meta-search engine looks fairly respectable. The tr() and td() functions define table rows, and table pieces of data respectively, and the 'radio_group' function provides the group of buttons that actually determine which sites the perl script is going to 'seek out and find information on'.

We also choose a picture 'background in line 39, as we kind of get sick of the grey that comes as a default with browsers, and we set up a 'callback' that determines whether or not the person has entered a valid number of hits in line 176.

When a person hits the 'submit query' button, the html page then uses the Java script up top (lines 11-27) to look at the value for the 'hits' parameter. If it is null (line 17) we call the alert function in Java script (line 19) which puts up the following warning:

alert.png

Figure 12.9

alert shown on the screen signifying null string

Now, if the user has entered legal values, they are used to generate the html page which shows links to browsers which you can then use to process your query. It works like this:

1) the values that you enter into the meta search engine home page, become 'parameters' to the meta.cgi command line. These parameters are then stored, and meta.cgi forks off a copy of itself.

When line 43 is hit (the check for param() or parameters) this statement then is true - there are parameters. (when you first started this example, this statement was false). Then, perl goes to work taking the parameters out of the command line (lines 45-48), and creating a webpage out of the results. In a critical loop (lines 111-153) we create the links to the actual search engines that we are interested in. This web page looks like Figure 12.10:

results.png

Figure 12.10

Results of the search.

So there you have it. If you now hit on any one of these links, the 'meta browser' will pass these links on to the respective webcrawler, which will then find the subject 'perl' for you, and print out as many hits as you requested.

a bug tracker - a maintainable application for communicating bugs to team members ( thanks to Sally Thompson )

The last example showed something that is very peculiar to programming in CGI: it is essentially a 'stateless' environment. We talk about this in great detail when we get to our 'main' CGI programming project in chapter 24 (porting scripts from a shell to making them GUI's on the web). But for now, simply realize what this means.

The only information that any CGI script can get is from the command line. Hence, if you take the time to make a huge structure in one window, and then forget to 'unpack' this structure and put it on the command line, perl will remember nothing about the structure.

Each time you hit that submit button, whatever you have built up in memory is gone, because perl is firing off a new instance of itself and wiping the slate clean so to speak.

There are several ways to deal with this limitation:

1) ignore it, and pass long command lines.

2) Use a permanent data structure to hold the information you have accumulated, and then load it back into memory each time a person hits 'submit'.

3) Use a permanent instance of perl. Find a method to make it so the perl interpreter 'stays around' after the person hits submit.

Number 1 can get silly, especially if you have lots of data you want to share between pages. Number 2 is done fairly frequently - and will be done in the following script. It's a 'fair' (ie: workable) solution.

But if you can do it, Number 3 is the best way of approaching the problem. You have no overhead for starting perl, and you can keep a much larger website than otherwise. Two major servers support an embedded perl: Apache, and IIS, and you may want to check out their homepages. * (www.apache.com and www.microsoft.com)

The interpreter that is embedded into apache is called 'mod_perl' and it is really cool. It lets you embed perl syntax exactly as if it were JavaScript.

Here's a script that shows how one might create a 'bugtracker', where people enter in bugs, and get assigned bugs from a centralized source. CGI is really adept at this because it can be extremely centralized, moreso than any GUI. For example, a script like this could be used to make a 'distributed' development team that only communicated through email, and the passing of bugs into a centralized database via browser.

Listing 12.27 bugtrack.p

1 #!/home/epeschko/perl5.004_50/install/bin/perl

2

3 use CGI;

4 use CGI::Carp qw (fatalsToBrowser confess);

5 use Data::Dumper;

6 use FileHandle;

7 use Date::Format qw(time2str);

8 use Date::Parse qw(str2time);

9

10 use strict;

11

12 my $JSCRIPT=<<EOF;

13 function do_submit()

14 {

15 document.forms[0].submit();

16 }

17 EOF

18 ;

19 select(STDOUT); $|=1;

20 my $query = new CGI;

21

22

23 print $query->header;

24 print $query->start_html

25 (

26 -title=> 'Defect Track',

27 -script=> $JSCRIPT,

28 -author=> 'sthompso\@elmer.tcinc.com'

29 );

30

31 print $query->startform

32 (

33 -name=> 'form2',

34 -action=> 'bugtrack.p',

35 -method=> 'get'

36 );

37 my @severity=

38 (

39 'Catastrophic',

40 'No Workaround Avail',

41 'Workaround Exists',

42 'Cosmetic',

43 'Enhancement'

44 );

45

46 my @defectStat =

47 ('New', 'In Progress','Complete');

48

49 my @phase=

50 ('System','Integration','Production');

51

52 my @area=

53 (

54 'Software Tools',

55 'User Software',

56 'Scripts',

57 'Data',

58 'Documentation',

59 'Installation'

60 );

61

62 my @_fields =

63 qw (date defect_id status area short_desc long_desc severity phase);

64

65 my $defect_data = setup_defect_data();

66

67 main();

68

69 sub main

70 {

71 if ($query->param("submit_defect") )

72 {

73 save_defect_data($query);

74 $defect_data = setup_defect_data();

75 }

76

77 if ($query->param("submit") )

78 {

79 &display_defects($defect_data);

80 }

81 else

82 {

83 &search_defects($defect_data);

84 }

85 print $query->endform;

86 }

87

88

89 sub display_defects

90 {

91 my ($defect_data) = @_;

92 my ( $defect_id, $data, $date, $status, $area,

93 $short_desc, $long_desc, $severity, $phase,

94 $timeval);

95

96 my ($i) = -1;

97 my ($radio_button);

98

99 while ($date= $query->param('date.' . ++$i))

100 {

101 my $radio_button = $query->param('select.' . $i);

102

103 if ($radio_button eq 'on')

104 {

105 $defect_id = $query->param("defect_id.$i");

106 $data = $defect_data->{$defect_id};

107 $date=$data->{date};

108 $status=$data->{status};

109 $area=$data->{area};

110 $short_desc=$data->{short_desc};

111 $long_desc=$data->{long_desc};

112 $severity=$data->{severity};

113 $phase=$data->{phase};

114 last;

115 }

116 }

117

118 if (!defined $data)

119 {

120 $defect_id =_pick_defect_id($defect_data);

121 $timeval= time();

122 $date = localtime($timeval);

123 $date = substr($date, 4, 6) . ", " . substr($date, 20, 4);

124 $status='NEW';

125 $area='';

126 $short_desc = '';

127 $long_desc='';

128 $severity='';

129 $phase='';

130 }

131

132 print "<HTML><H1><center>Defect Track</center></H1>";

133 print "<HTML><H2><center>Entry Screen</center></H2>";

134

135 print "<P><b>Date</b>",

136 $query->textfield

137 (

138 -name=>'date',

139 -value=>$date,

140 -size=>20

141 );

142

143 print "<b>Defect Id</b>",

144 $query->textfield

145 (

146 -name=> 'defect_id',

147 -value=> $defect_id,

148 -size=> 20

149 );

150

151 print "<b>Ticket Status</b>",

152 $query->scrolling_list

153 (

154 -name=> 'status',

155 -values=> [@defectStat],

156 -default=> $status,

157 -size=> 1

158 );

159

160 print "<P><b>Short Description</b>",

161 $query->textfield

162 (

163 -name=> 'short_desc',

164 -value=> $short_desc,

165 -size=> 80

166 );

167

168 print "<P><b>Phase Found in</b>",

169 $query->scrolling_list

170 (

171 -name=>'phase',

172 -values=>[@phase],

173 -default=>$phase,

174 -size=>1

175 );

176

177 print "<P><b>Defect Severity</b>",

178 $query->scrolling_list

179 (

180 -name=>'severity',

181 -values=>[@severity],

182 -default=>$severity,

183 -size=>1

184 );

185

186 print "<b>Area Affected</b>",

187 $query->scrolling_list

188 (

189 -name=>'area',

190 -values=>[@area],

191 -default=>$area,

192 -size=>1

193 );

194

195 print "<P><b>Long Description</b>",

196 $query->textarea

197 (

198 -name=>'long_desc',

199 -default=>$long_desc,

200 -rows=>10,

201 -columns=>80

202 );

203

204 $query->delete('submit');

205

206 print "<center>",

207 $query->submit

208 (

209 -name=>'submit_defect',

210 -value=>'Submit'

211 ),

212 "</center>";

213 }

214

215 sub search_defects

216 {

217 my ($defect_data) = @_;

218 my ($srchText);

219

220 print "<HTML><H1><center>Defect Track</center></H1>";

221 print "<HTML><H2><center>Search Screen</center></H2>";

222

223 print "<P><b>Enter Search Text</b>",

224 $query->textfield

225 (

226 -name => 'srchText',

227 -size => 20,

228 -onChange => "do_submit()",

229 -value => $srchText

230 );

231

232 print "<center><table border>";

233 print "<caption><H4><b>Existing Defects</b></H4></caption>\n";

234 print "<th>Select</th>\n";

235 print "<th>Date</th>\n";

236 print "<th>Defect Id</th>\n";

237 print "<th>Area</th>\n";

238 print "<th>Defect Description</th>\n";

239 print "<P>\n";

240

241 my $i = 0;

242 my ($defect_id, $data, $date, $area, $short_desc);

243

244 foreach $defect_id (keys %$defect_data)

245 {

246 $data = $defect_data->{$defect_id};

247 $date=$data->{date};

248 $area=$data->{area};

249 $short_desc=$data->{short_desc};

250 my $search = $query->param('srchText');

251 if ($query->param('srchText') && $short_desc !~ /$search/ig) {next};

252

253 print "<tr>";

254 print "<td>",

255 $query->checkbox(-name=>"select.$i",

256 label=>'',

257 -size=>1),

258 "</td>";

259

260 print "<td>",

261 $query->textfield(-name=>"date.$i",

262 -value=>$date,

263 -size=>9),

264 "</td>";

265 print "<td>",

266 $query->textfield(-name=>"defect_id.$i",

267 -value=>$defect_id,

268 -size=>6),

269 "</td>";

270 print "<td>",

271 $query->textfield

272 (

273 -name=>"area.$i",

274 -value=>$area,

275 -size=>14

276 ),

277 "</td>";

278 print "<td>",

279

280 $query->textfield

281 (

282 -name=>"short_desc.$i",

283 -value=>$short_desc,

284 -size=>50

285 ),

286 "</td>";

287 $i++;

288 print "</tr>";

289 }

290

291 print "</table></center>\n";

292 print "<center>",

293 $query->submit

294 (

295 -name => 'submit',

296 -value => 'Submit'

297 ),

298 "</center>";

299 }

300

301 sub save_defect_data

302 {

303 my ($query) = @_;

304 my @text;

305

306 local($") = "|";

307

308 my $defects = setup_defect_data();

309

310 my $field;

311 foreach $field (@_fields)

312 {

313 push(@text, $query->param($field));

314 }

315 my $new_defects = setup_defect_data("@text", $defects);

316

317 my $fh = new FileHandle("> $ENV{'DOCUMENT_ROOT'}/defect_data") ||

318 croak ("couldn't open $ENV{'DOCUMENT_ROOT'}/defect_data");

319

320 write_defects($fh, $new_defects);

321 }

322

323 sub write_defects

324 {

325 my ($fh, $new_defects) = @_;

326

327 my ($key, $field, @text);

328 foreach $key (keys (%$new_defects))

329 {

330 my $defect = $new_defects->{$key};

331 foreach $field (@_fields)

332 {

333 my $text = $defect->{$field};

334 if ($field eq 'date')

335 {

336 my $secs = str2time($text);

337 $text = time2str("%x", $secs);

338 }

339 push(@text, $text);

340 }

341 print $fh "@text\n";

342 undef (@text);

343 }

344 close($fh);

345 }

346

347 sub setup_defect_data

348 {

349 my ($type, $defects) = @_;

350

351 my (@data);

352 my $return = {};

353 if (@_ == 0)

354 {

355 my $fh= new FileHandle("$ENV{'DOCUMENT_ROOT'}/defect_data");

356 chop(@data = <$fh>);

357 }

358 else

359 {

360 my $text = $type; @data = ($text); $return = $defects;

361 }

362

363

364 my ($row);

365

366 foreach $row (@data)

367 {

368 my $defect = {};

369 my @stuff = split(/\|/,$row);

370

371 my $i = 0;

372 map ($defect->{$_fields[$i++]} = $_, @stuff);

373 $return->{$defect->{defect_id}}= $defect;

374 }

375 $return;

376 }

377

378 sub _pick_defect_id

379 {

380 my ($defect_data) = @_;

381

382 my $xx = -1;

383 while (defined ($defect_data->{++$xx})) { }

384 return($xx);

385 }

Again, this is getting pretty hairy, so lets make a functional diagram. Figure 12.11 shows the bugtracker's skeleton:

1211.fig

Figure 12.11

Bugtracker functional diagram.

Here, we are again juggling two web pages through a loop; the loop in lines 69-86. The first of the two web pages that we set is the 'defect search' page, which the user sees first when he enters the application. That looks something like Figure 12.12:

defect_track.png

Figure 12.12

Defect Search screen

The defect search screen is responsible for showing at a high-level what bugs are available to work on, and to also give the user the chance to:

1) select a specific bug for perusal by checking the appropriate radio box in the 'Select' column

2) Enter in a search text string, and show all the bugs associated with that string.

3) Do neither of these things, and instead submit a new bug.

When the user hits 'submit' then, the current version of the script stops running, and like a runner handing a baton to the next runner, the script passes the knowledge what the user did to the next perl script that fires up in the form of arguments in the http string. In this case, there were two possibilities.

1) user types text in the Search window. Suppose the user searches for the bugs given the search text typed in the 'Search window'. Say they typed 'perl'. Bugtrack.p creates a 'http tag' that looks like:

'http://www.apxtech.com:8080/cgi-bin/tmp/bugtrack.p?srchText=perl&

(all the bugs matching that description)

With the string srchTxt=perl, the next version of the script bugtrack.pl knows to limit the display of bugs inside the text in line 251.

2) user presses one of the radio buttons on the side, or presses 'Submit' without selecting anything . Suppose that the user just hits 'Submit' without selecting anything else on the screen. In that case, bugtrack.p creates a http tag that looks like:

'http://www.apxtech.com:8080/cgi-bin/tmp/bugtrack.p?srchText=&submit=Submit (all the bugs matching that radio button command)

and this signifies to the bugtracker that it should prompt for a new bug description to be entered. Thus, the following, second screen comes up.

Defect_track_new_bug.png

Figure 12.13

New entry screen for bugtracker.p

We are then free to enter in the information that we want, and when we are done, we hit 'submit'. This causes the http tag to be passed back to look like:

http://www.apxtech.com:8080/cgibin/tmp/bugtrack.p?srchText=&submit_defect=Submit (etc.)

And then we call the function string 'save_defect_data()' then 'setup_defect_data()' then 'search_defects()'. In other words, we save the data for the bug, get the new saved data from the place we saved it to ( a file) and then redisplay the screen. When done, we get a screen that looks like Figure 12.14:

defect_track_new_bug_added.png

Figure 12.14

A new bug added by bugtrack.p

If you are new to web programming, this 'passing by parameters' seems a bit odd. It seems like you are passing the world around, and going through a massive amount of overhead in constructing these 'batons' for communication between CGI scripts.

And you know what, if you feel that, you are right. You pretty much owe it to yourself to check out 'mod_perl' with the Apache Webserver, or PerlScript with the IIS webserver. Here the perl-interpreter is built into the web server so you don't have the overhead of starting each time or passing around parameters each time.

In any case though, you can pretty much see everything we talked about again in that one loop:

69 sub main

70 {

71 if ($query->param("submit_defect") )

72 {

73 save_defect_data($query);

74 $defect_data = setup_defect_data();

75 }

76

77 if ($query->param("submit") )

78 {

79 display_defects($defect_data);

80 }

81 else

82 {

83 search_defects($defect_data);

84 }

85 print $query->endform;

86 }

See how well this shows the web design we talked about? Some people in fact think of this as a state machine, where depending on the state (the '$query->param()') you go to different functions (display_defects, search_defects).

One more note - what if you don't have mod-perl and Apache, or IIS and PerlScript, and therefore need to start up a perl interpreter each time? Well, by compiling your script (see chapter 23) you can reduce this overhead of running the 'perl' executable each time to nothing.

And, by saving your text in a database (via either DBI as we will talk about below, or DBM which we will talk about in chapter 16), you can reduce the overhead to where you are passing a single key around:

http://www.apxtech.com:8080/bugtracker.p?id=24234223123

In this case, the 'id' would stand for a text string in a database, which bugtrack would then read to get more information on what to do next.

Summary of CGI

So there it is, a quick one-hour tutorial on the basic concepts of CGI - by example. Of course, there is a lot more to it, but there are several good CGI books out there ('How to Set Up and Maintain a World Wide Web Site' by Dr Lincoln Stein, author of CGI.pm comes to mind)

In short, there are four basic principles that you need to remember:

1) perl's interface for CGI is through the module CGI.pm. Methods in CGI are used to dynamically construct HTML pages.

2) HTML documents can communicate with perl scripts via certain directives (<img src = "/cgi-bin/counter.pl");

3) where you are in a cgi server's hierarchy determines the behaviour that you get via loading a document into a web browser.

4) If you want to make programs that have more than one web-page, you need to pass around that information via the command line.

Notice that we spent very little time talking about the interface of all of this. And for good reason - there are plenty of web sites out there that do the CGI thing. Here are a couple of corollary principles you might want to remember:

1) To learn HTML, you can always see how something was done in HTML by going to 'Document Source' on your favorite browser.

2) To get the pictures and/or structure off of HTML pages, you can always go to 'Document Info' inside your favorite browser.

3) the page 'http://www-genome.wi.mit.edu/ftp/pub/software/WWW/examples/ is the only page on CGI.pm you are ever going to need.

And if that doesn't satisfy you, see the appendix at the end of the book for some good websites and archives.

Databases (thanks to Michael Peppler for example)

I hate to say it, but databases are going to get a little bit of a short shrift here - not because of them being unimportant, but because of the ease in which you can manipulate them in perl. Perl has about the easiest interface into databases that I have ever seen. Of course in order to use databases, you need to know SQL, and this section assumes that you do.

Like any other language, talks to a database via the means of a driver.Perl sends data to that driver, which then goes to the database, retrieves the relevant info back and hands it off to perl. Kind of like figure 12.15:

1215.fig

Figure 12.15

High level diagram of Driver communicating to a Database

Now in this case, the 'driver' can be one of many things. There is 'oraperl', 'sybperl', 'Win32::ODBC'; these are the old ways of doing database manipulation in perl, and they still have merit.

However, the interface of perl's future is called 'DBI', standing for Database Interface. DBI's idea is to make the most common commands for databases work for any given database vendor out there.

To do this, DBI acts as 'traffic cop'. Underneath the hood (so to speak) of DBI is a bunch of modules called 'DBD': DBD::Sybase, DBD::Oracle, DBD::Informix, DBD::ODBC, etc. which you need to have installed in order for DBI to work with your particular system.

If you have Sybase, you need DBD::Sybase. For oracle, you need DBD::Oracle. These are the database drivers that do the actual job of talking to the database.

So when you say 'use DBI'; (or load the DBI module) you are in effect saying 'load up the databases that I know about - I don't care what they are'. And then DBI - through DBD - handles all the complexity of talking to the database for you.

From then on, it is simply a matter of sending the database commands. Here is a s

a database monitor - Viewing Database Information

One of the most frustrating things about databases is actually maneuvering around inside of them - especially for beginners.

There are several commands, tables and objects that make maneouvering around databases tough for beginners; it is usually a matter of groping around in the blind for a while, getting used to the command language in order to get anything done.

Of course, once you get the command language down you can work miracles; but that takes time. Here's a small script that displays any database objects in a sybase database, and can help beginners learn the sybase lingo. It comes from the DBD::Sybase distribution:

Listing: 12.28 sqlDisplayDBI.p

1

2

3 use lib '.';

4

5 BEGIN

6 {

7 $ENV{'SYBASE'} = "/path/to/sybase";

8 }

9

10 use strict;

11 use DBI; # use Sybase::DBlib;

12

13 use CGI;

14

15 my $query = new CGI;

16

17 print $query->header;

18 print $query->start_html(-title => "Show a Sybase Object");

19

20 my $server = $query->param('server');

21 my $database = $query->param('database');

22

23 my $state = $query->param('__state__') || 0;

24

25 if(!$database)

26 {

27 error("Please supply the <b>database</b> parameter.<p>");

28 }

29

30

31

32

33 # WAS my $dbh = new Sybase::DBlib( "sa", undef, $server);

34

35 my $dbh = DBI->connect('dbi:Sybase:', 'sa','',

36 {syb_dbd_server => $server });

37

38 ($dbh->do("use $database") != -2

39 || error("The database <b>$database</b> doesn't exist");

40 # ($dbh->dbuse($database) == SUCCEED) ||

41 # error("The database <b>$database</b> deosn't exist");

42

43

44 SWITCH_STATE:

45 while (1)

46 {

47 ($state == 0) && do

48 {

49 my($values, $labels) = getObjects();

50 print "<h1>Show a Sybase objects definition:</h1>\n";

51 print "<p><p>Please select an object:<p>\n";

52 print $query->start_form;

53

54 print $query->scrolling_list

55 (

56 -name => 'object',

57 -values => $values,

58 -labels => $labels,

59 -size => 10

60 );

61

62 $query->param(-name=>'__state__', '-values'=>1);

63 print $query->hidden(-name=>'__state__');

64 print $query->hidden(-name=>'database');

65 print $query->hidden(-name=>'server');

66

67 print $query->submit;

68 print $query->end_form;

69

70 last SWITCH_STATE;

71 };

72

73 ($state == 1) && do

74 {

75 print "<h1>Show a Sybase object's definition:</h1>\n";

76

77 my $objId = $query->param('object');

78 my $html = getText($objId);

79 print $html;

80

81 last SWITCH_STATE;

82 };

83 }

84

85 print $query->end_html;

86 exit(0);

87

88

89 sub getObjects

90 {

91

92 # WAS $dbh->dbcmd(

93

94 my $sth = $dbh->prepare

95 (

96 "select distinct 'obj' = o.name, 'user' = u.name, o.id, o.type

97 from dbo.sysobjects o,

98 dbo.sysusers u,

99 dbo.sysprocedures p

100 where u.uid = o.uid

101 and o.id = p.id

102 and p.status & 4096 != 4096

103 order by o.name"

104 );

105

106

107

108 # WAS $dbh->dbsqlexec; $dbh->dbresults;

109

110 $sth->execute();

111

112 my %dat;

113 my @values;

114 my %labels;

115 my $value;

116

117 # WAS while(%dat = $dbh->dbnextrow(TRUE))

118 # {

119

120 while($dat = $sth->fetchrow_hashref)

121 {

122 # WAS $value = "$dat{id} - $dat{type}";

123 $value = "$dat->{id} - $dat->{type}";

124

125 push(@values, $value);

126

127 # WAS $value = "$dat{id} - $dat{type}";

128 $labels{$value} = "$dat->{user}.$dat->{obj}";

129 }

130

131 return (\@values, \%labels);

132 }

133

134 sub getText

135 {

136 my $objId = shift;

137

138 $objId =~ s/[\D\-\s]+$//;

139

140 # WAS $dbh->dbcmd ("select text from dbo.syscomments where id = $objId");

141

142 my $sth =

143 $dbh->prepare("select text from dbo.syscomments where id = $objId");

144

145 # $dbh->dbsqlexec; $dbh->dbresults;

146

147 $sth->execute();

148 my $html = '';

149 my $text;

150

151 # WAS while(($text) = $dbh->dbnextrow)

152 while(($text) = $sth->fetchrow)

153 {

154 $html .= $text;

155 }

156 TsqlToHtml($html);

157 }

158

159 sub TsqlToHtml

160 {

161 my $html = shift;

162 $html =~ s/\n/<br>\n/g;

163

164 local($") = '|';

165

166 # bolding the keywords

167 my @keywords = qw ( as begin between declare delete drop else end exec

168 exists go if insert procedure return set update values

169 from select where and or create

170 );

171

172 push(@keywords , 'order by');

173

174 $html =~ s"\b(@keywords)\b"<b>$1<\/b>"ig;

175

176 my @types = qw ( tinyint smallint int char varchar datetime

177 smalldatetime money numeric decimal text binary

178 varbinary image

179 );

180

181 # italicising the types

182 $html =~ s"\b(@types)\b"<i>$1</i>"gi;

183

184 $html =~ s"\t"\&nbsp\&nbsp\&nbsp\&nbsp"g;

185 $html =~ s" "\&nbsp"sg;

186

187 $html;

188 }

189

190 sub error

191 {

192 print "<h1>Error!</h1>\n";

193 print @_;

194 print $query->end_html;

195 exit(0);

196 }

Again, here is a web page that is implemented as a state machine. We won't say much about this aspect of it, except that line 174 and 182 are really cool ways of italicising (or bolding) words.

More to the point, we have bolded every single occurrence of a DBI call, and set it next to the corresponding 'Sybase::Dblib' call.

When you are actually getting data out of the database with DBI, you go through four major steps:

1) logging in. Lines 35 and 36 show the most common way of logging in to a server:

35 my $dbh = DBI->connect('dbi:Sybase:', 'user','password',

36 {syb_dbd_server => $server });

where the first parameter is the type of database you are connecting to, the second and third parameters are user and password, and the 'syb_dbd_server' is a hash reference parameter which tells which server to connect to.

2) preparing the sql for processing. Lines 142-143 show the preparation for a SQL command to be processed via DBI (and therefore by DBD::Sybase).

142 my $sth =

143 $dbh->prepare("select text from dbo.syscomments where id = $objId");

 

3) Executing the sql. Line 147 shows the execution of a sql command:

147 $sth->execute()

This actually tells the database to run the request, and to wait for any arguments to 'fetch' the data.

4) getting the data out of the server. Once you have executed the command, you can snarf the data into perl via the form of an array:

152 while(($text) = $sth->fetchrow)

153 {

154 $html .= $text;

155 }

Or a hash_reference:

120 while($dat = $sth->fetchrow_hashref)

121 {

123 $value = "$dat->{id} - $dat->{type}";

where the 'id' and 'type' fields are actually the names of fields inside a table.

That's about it for simple database access. And, if you are familiar with doing this type of operation in C, you should realize exactly what a pain it is to do it in C in comparison to perl! In fact, it needn't be as complicated as the above; if you want to simply execute a piece of sql you can say:

38 ($dbh->do("use $database") != -2

39 || error("The database <b>$database</b> doesn't exist");

instead of going through all of the bother of actually 'preparing', 'executing' and so forth.

Now, lets turn to the application side of sqldisplayDBI.p. When you run sqldisplayDBI.p, you give it a command string; supplying it the database and the server - like so:

http://www.apxtech.com/sqldisplay.p?server=MYSERV&database=mydatabase

When you do so, it gives you a window in which you can select an object for viewing:

sybase_window.png

Figure 12.16

sqldisplayDBI.p results in browser.

If you then hit 'Submit query', sqldisplayDBI.p will open that particular object's definition up, and give you a display like Figure 12.X:

sybase_object_def.png

Figure 12.17

sqldisplayDBI.p showing the object definition.

Hence you get a 'pretty printed' view of what the object actually is doing - which is a good technique for actually learning the database system itself!

Database Summary

This is of course just a small example of what you can do with DBI. DBI and DBD are fully functional database programmers intefaces, which means that you can manipulate just about any aspect of a database with them.

However, as just a warning note, DBI is fairly solid code for certain database engines; but other database engines have a certain amount of 'catchup' to do. For example, DBI with DBD::Oracle is stable, as is DBD::Informix - but DBD::Sybase is in an alpha state.

This is, of course ironic because it was the database that we used in the example above - I don't have access to the other databases. However, notice that I put in comments each and every equivalent 'Sybase::Dblib' command.

Sybase::Dblib is stable, well tested, and object oriented, hence you should probably be using that in the short term. Between DBD::ODBC and Win32::ODBC, it is a toss up decision - I prefer DBI. For Oracle and Informix, however, there really is no choice; you should be using DBI.

perlTk Examples

The last subject that we are going to cover is perlTk. PerlTk is a way of making crossplatform GUI's which are fast, flexible, and easily configurable. It is based off of the tcl/tk package of John Ousterhout, and has much the same interface and is available via CPAN or the CD that comes with this book.

However, perltk I believe is much more powerful than tcl/tk. (no offense to tcl, but it really needs to get object oriented). Perltk has perl's power of an interface - and the code that you create is totally OO. Before we get into the examples though, its probably a good idea to give some background on Tk:

Small Tk Tutorial

Here is a small tutorial of how you can use perlTk to make OO GUI's. First, the important points:

1) tk uses what is called an event loop. Here is the simplest tk app that you could possibly make:

use Tk;

MainLoop();

The 'MainLoop()' function says that you are done configuring how your Tk app is going to look like, and you are ready to actually display your GUI application.

All the MainLoop() function does then, is handle control of the program to a very simple-minded process, which basically (every few milliseconds) asks the screen

'has the user pressed any buttons, or interacted with the Tk windows yet?'

It keeps on doing this, over and over, until the user actually does something, in which case the MainLoop handles the requests.

Of course, this 'MainLoop' process here simply is an infinite loop, since we haven't actually made any GUI's yet. Read on.

2) tk is composed of widgets that have parents, and children. Perltk works off of the principle of a geometry manager that controls widgets. Widgets are graphical objects that are used to control a form; there are several widgets in perltk but the following are the most important:

MainWindow - the main window of your application

Toplevel - sub-windows inside your application

Menu - a menu for your applications

Entry - a single line of text

Frame - a place to put other widgets

Text - a type of widget in which you can enter in text (one with scroll bars, the other without)

Canvas - a canvas for displaying widgets in a free form way.

Now, if you want to make a tk screen, you make a 'MainWindow' and then populate it with several other objects (like a Menu, Canvas, or what have you.)

Hence, some sample code might look like:

use Tk;

my $window = new MainWindow();

$window->title('this is a window');

 

my $text = $window->Text

(

'-wrap' => 'word',

'-relief' => 'sunken',

'-borderwidth' => 2,

'-setgrid' => '1'

);

my $button = $window->Button('-text' => 'Configure', '-command' => \&configure );

This would

a) make a 'MainWindow',

b) entitle it 'this is a window',

c) create (but not put) a text window inside this 'mainwindow'.,

d) create (but not put) a button inside the window as well.

Hence, the MainWindow owns the subwindow, and it owns the button inside the subwindow as well.

3) you configure widgets with hashes. Each one of these widgets is configured by passing it a hash when creating it (or in configuring it afterwards). We already saw this above:

my $text = $window->Text

(

'-wrap' => 'word',

'-relief' => 'sunken',

'-borderwidth' => 2,

'-setgrid' => '1'

);

Here, we are making the 'Text' widget (inside the window $window) have the qualities of:

a) wrapping around such that words are not split onto two lines

b) the relief is sunken, such that it looks 'darker' than the rest of the window.

c) the '-borderwidth' or empty space surrounding the text object is 2.

setgrid is a little bit more complicated. It is used to communicate with the top-level window for sizing reasons. See the 'Text' page for more detail.

4) you 'pack', 'place' or 'grid' objects to place them on the screen. When you have made a relationship between two objects, as we did above with my $text = $window->Text(...), you have not yet 'put' that object onto the window it is associated with.

In order to do this, you need to call either the 'pack', 'place' or 'grid' methods. Leaving off where we were before, we could 'pack' the $text and $button widgets onto the $window widget by saying:

$text->pack('-side' => 'top');

$button->pack('-side' => 'top');

 

When you say this, perltk goes through a bit of magic. The 'pack' algorithm is one of the coolest GUI things I've seen to date, and it makes building GUI applications extremely easy. The best way is to see it in example. Here is a complete, Tk Application, which shows 'pack' in action:

Listing 12.29 packexample.p

1 use Tk;

2 my $window = new MainWindow();

3 $window->title('this is a window');

4

5 my $text = $window->Text

6 (

7 '-wrap' => 'word',

8 '-relief' => 'sunken',

9 '-borderwidth' => 2,

10 '-setgrid' => '1'

11 );

12 my $button = $window->Button('-text' => 'Configure');

13 $text->pack('-side' => 'top');

14 $button->pack('-side' => 'top');

15 MainLoop();

This creates something that looks like Figure 12.18:

pack_example.PNG

Figure 12.18

Example of pack in action.

So what happened here? Well, we 'packed' the text first, up at the top. And then, we 'packed' the button. But since the text was packed first, it got priority and was put in the primary, 'top' position. The button came next since it was packed second.

Pack has a lot of cool options to it that we can only cover here in the basics. You can

1) pack things to the 'left' or to the 'right' or 'top' or 'bottom';

2) '-fill' your objects so they take up all the available space in the 'x', 'y', or both dimensions.

3) '-anchor' your widgets to the 'n' (north) 'ne' (northeast), 'nw' (northwest), 'w' (west), 's' (south), 'se' (southeast), 'sw' (southwest, or the 'center' ( center is default)

And more; read the documentation on pack (pack.html) for more detail.

It is surprising, but this simple algorithm will actually solve about 95% of your simple GUI needs. If you need to go further, see the 'grid' and 'place' functions in the documentation - they provide you with a little more control.

5) you control what the application does via callbacks. All of the four tenets above are good for is drawing 'pretty pictures' - it is the job of callbacks to actually take those pretty pictures and have them do anything.

Callbacks, as you may remember from last chapter, are perl's way of calling functions without actually knowing their name. For example,

$a = \&b;

makes $a a reference to the function 'b', and

&$a;

then actually calls the function &b.

In tk, callbacks are used to make the application work. You define a callback and tie it to a button, menu event, or what have you. When that button gets pushed, the callback behind that button gets executed. Taking the small example that we had above, we can actually make it so the button puts text into the textbox for us:

Listing 12.30 packexample.p

1 use Tk;

2 my $window = new MainWindow();

3 $window->title('this is a window');

4

5 my $text = $window->Text

6 (

7 '-wrap' => 'word',

8 '-relief' => 'sunken',

9 '-borderwidth' => 2,

10 '-setgrid' => '1'

11 );

12 my $button = $window->Button('-text' => 'Configure',

13 '-command' => [\&insert, $text,

14 'My text to insert' ] );

15 $text->pack ('-side' => 'top');

16 $button->pack ('-side' => 'top');

17 MainLoop();

18

19 sub insert

20 {

21 my ($textwidget, $text) = @_;

22 $textwidget->insert('0.0', $text);

23 }

The code outlined in bold defines the callback. The command '\&insert' is associated with the button '$button'. Hence, when you hit the 'Configure' button in this example, perl inserts the text 'My text to insert' into the text box. Line #22. As in Figure 12.19:

callback_test.png

Figure 12.19

Callback Test, showing screen after 'Configure' has been pressed.

Callbacks like this glue your application together. They are used to take the disparate buttons, dials, knobs, menus, and other widgets of your application and make them talk to each other. In the above case, we had the callback &insert, which took as a parameter '$textwidget', and then inserted text into that $textwidget. We could have just as easily saved a file, loaded a file, or whatever.

Summary of PerlTk tutorial.

PerlTk is a melding of perl - the text manipulation language, and tk - the package for creating quick GUIs. Each of the two tools complements the other, filling in weak spots in the other tools.

There were five basic concepts that you needed to know to get fluent in Tk:

1) tk's 'main' function (ie: the one you will call all the time) is an event loop.

2) tk works on the idea that widgets control other widgets through a parent child relationship.

3) you configure each of the widgets with hashes

4) you 'pack' the widgets onto each other with the pack, place and grid commands

5) you use perl callbacks to tie all of the underlying widgets together.

We shall see exactly how powerful perltk is, in the following two examples. First we will consider the problem of junk mail and how to solve it, and second we will consider the problem of running applications inside a windows environment in a clean fashion.

a Mail Filter

The growth of computer networks around the world has had one bad result: spam.

A day doesn't go by when I get mail about get rich quick schemes, pornography announcements, pyramid marketing, water-filters, whatever; this is probably all based on the 'sucker born every minute' principle.

Unfortunately, the internet makes this type of conning all the more easy. With automated tools, zero cost of distribution, and people selling 50Million block chunks of email addresses, the problem will grow and grow and grow..

Here's a small script that lets you filter out this type of junk. It is based on the 'eval' function in perl, and the fact that there is a cool module on CPAN called 'Mail::Filter' which lets you construct filters like this pretty easily:

listing 12.31 mailparse.p

1 #!/home/epeschko/perl50043/install/bin/perl

2

3 use Tk;

4 use FileHandle;

5

6 use strict;

7

8 my $window = new MainWindow();

9 $window->title('Mail Filter');

10

11 my $configure = $window->Button # define configure button

12 (

13 'text' => 'Configure',

14 'command' => \&configure,

15 )

16 -> pack('side' => 'right');

17

18 my $filter = $window->Button # define filter button

19 (

20 'text' => 'Filter',

21 'command' => \&filter

22 )

23 -> pack('side' => 'left');

24 MainLoop();

25

26 sub filter

27 {

28 print "Filtering!!!!\n";

29 my (@delete_flags) = ('-delete', '1') if ($ENV{'MAILFILTER_ON'});

30 my $file = $ENV{'FILTER_FILE'}|| "$ENV{'HOME'}/.mailfilt";

31 my $mailbox = $ENV{'MAILBOX'} || die "Need to have a mailbox to filter!\n";

32

33 my $folder = new Mail::Folder('AUTODETECT', $mailbox) ||

34 error_window("You need to have the 'MAILBOX environmental

35 variable set !!!! Either that, or the mail Folder program couldn't detect

36 your type of mailbox! ");

37

38 require $file;

39 my $ft= new Mail::Filter( \&mailbox_filter );

40 $ft->filter($folder, @delete_flags);

41 }

42 sub configure

43 {

44 my $file = $ENV{'FILTER_FILE'} || "$ENV{'HOME'}/.mailfilt";

45 my $editor = $ENV{'FILTER_EDITOR'} || '';

46 if (!$editor)

47 {

48 my $textwindow = $window->Toplevel(-title => 'email filter');

49 my $text = gettext($file);

50 my $field = $textwindow->Scrolled

51 ( 'Text',

52 '-scrollbars' => 'e',

53 '-wrap' => 'word',

54 'relief' => 'sunken',

55 'borderwidth' => 2,

56 'setgrid' => '1'

57 );

58

59 $field ->pack ( 'expand' => 'yes', 'fill' => 'both' );

60 $field->insert('0.0', $text);

61

62 my $save = $textwindow->Button

63 (

64 'text' => 'Save',

65 'command' =>

66 [ \&savetext, $field, $file ]

67 )

68 -> pack( 'side' => 'left' );

69

70 my $dismiss = $textwindow->Button

71 (

72 'text' => 'Dismiss',

73 'command' =>

74 [ $textwindow => 'withdraw']

75 )

76 -> pack( 'side' => 'right' );

77 }

78 else

79 {

80 system("$editor $file");

81 _checkText($file, 'file');

82 }

83 }

84

85 sub gettext

86 {

87 my ($file) = @_;

88 local($/) = undef;

89 my $fd = new FileHandle("$file") || return ('');

90 return(<$fd>);

91 }

92

93 sub savetext

94 {

95 my ($window, $file) = @_;

96 my $text = $window->get('0.0', 'end');

97

98 return(0) if (!_checkText($text, 'text'));

99

100 my $fd = new FileHandle("> $file");

101 print $fd $text;

102 }

103

104 sub error_window

105 {

106 my ($window, $text, $file) = @_;

107

108 my ($height, $width, $type) = _dimensions($text);

109

110 my $errorwindow = $window->Toplevel();

111 my $error;

112

113 if ($type eq 'Scrolled')

114 {

115 $error = $errorwindow->Scrolled

116 (

117 'Text',

118 'height' => $height,

119 'width' => $width,

120 'borderwidth' => 2,

121 'setgrid' => 'true'

122 );

123 $errorwindow->title("Errors in mail filter file");

124 }

125 else

126 {

127 $error = $errorwindow->Text

128 (

129 'height' => $height,

130 'width' => $width,

131 'borderwidth' => 2,

132 'setgrid' => 'true'

133 );

134 $errorwindow->title("Errors in mail filter file");

135 }

136

137 $error->pack ('expand' => 'no', 'fill' => 'both' );

138

139 $error->insert('0.0',

140 "Your mail filter program does not

141 have the correct syntax! Error:

142

143 $text"

144 );

145

146 my $button = $errorwindow->Button

147 (

148 'text' => 'Dismiss',

149 'command' =>

150 [ $errorwindow => 'withdraw']

151 )

152 -> pack( 'side' => 'right' );

153 }

154

155 sub _checkText

156 {

157 my ($text,$type) = @_;

158 local($/) = undef;

159

160 if ($type eq 'file') { my $fd = new FileHandle($text); $text = <$fd>; }

161

162 eval ( "sub { $text } ");

163 if ($@) { error_window($window,$@); return(0) }

164 return(1);

165 }

166

167 sub _dimensions

168 {

169 my ($text) = @_;

170 my ($height, $width, $type);

171

172 my @lines = split(m"\n", $text);

173 grep

174 { my $tlength = length($_);

175

176 $width = ($tlength + 2 > 80)? 80 :

177 ($tlength > $width)? $tlength + 2

178 : $width

179 }

180 @lines;

181

182 $height = (@lines < 2)? 5 : @lines + 3;

183

184 if ($height > 20) { $height = 20; $type = 'Scrolled'; }

185 else { $type = 'Text'; }

186

187 return($height, $width, $type);

188 }

What to say about this application? First of all, from the outside it looks pretty simple. If you run it, you get two buttons, looking like Figure 12.20:

email_top.png

Figure 12.20

Filter top two buttons.

Now the idea is twofold. If you press 'configure' you get a screen like figure 12.21:

email_top.png

Figure 12.21

Configure screen.

The idea of this screen is for you to type a legal mail filter. As I said earlier, CPAN has a module called Mail::Filter by Graham Barr, which lets you take an arbitrary mailbox, and then filter it given any criteria you want. Here the mail filter will be applied to all of your messages in your mailbox, as supplied by the environmental variable '$ENV{'MAILBOX'}'.

In this sample, the mail_filter will look for messages with the subject 'junk' and get rid of them (this is what return(undef) does). Otherwise, it will keep them in your incoming mailbox. For more filters, see the MailTools available on CPAN or the CD with this book.

Note that there are a couple of issues here: 1) filtering your mailbox is a pretty dangerous thing to do, and 2) there needs to be a trigger to actually delete the junk mail for you.

The first problem we solve by introducing another environmental variable: 'MAILFILTER_ON'. This prevents you from inadvertantly deleting any messages - when we do the filtering in line 40:

40 $ft->filter($folder, @delete_flags);

we need to pass it ('-delete', 1') in order for the actual altering of the mailbox to occur. This allows you to test your mail filters before using them.

The second problem is solved by the 'filter' button. When you press 'filter' - and only when you press filter does your mailbox get effected. This lets you have selective control over when you want to actually delete your junk mail

The only question left then is: if this application only has two buttons, why is it 188 lines ling? The answer is because we need to make auxilliary checks in order to insure 'stability'. We need to check the syntax of the filter people enter - if it is wrong we need to print out the errors. We need to save the filters to a file, or we would have to type it over again each time we entered the application.

And so on - applications like this, although simple in purpose can get pretty long in order to deal with all the different items that take place.

running scripts from inside a tk app.

As we saw in chapter 2, perl is a little bit awkward to run on NT. As it stands, you need to either make your perlscript a batch file, run it from the Command prompt or associate a icons with a perl executable.

And all three ways have drawbacks. Making a batch file is a pain, as is running it from the command prompt. And associating perl script icons with a perl executable only halfway works - it works for applications like Tk which are basically infinite loops until told when to quit.

But for command line scripts, associating the icon is not that useful. Windows runs through your script, and then 'boom!' closes the dos prompt that was running the application, thus making it almost impossible to monitor what occured.

Here's a possible solution. It emulates a shell, giving you a place to run the command line, and a directory to run it in. It also gives you an option to save the text that you have just generated by the perl script.

Listing 12.32 runscript.p

1 #!/home/epeschko/perl50043/install/bin/perl

2

3 use Tk;

4 use Tk::FileDialog;

5 use FileHandle;

6 use Cwd;

7

8 my $nolines = 0;

9

10 use strict;

11

12 my $window = new MainWindow();

13 $window->title('Run Script');

14 my ($text, %entries);

15

16 die "Need zero or one argument!\n" if (@ARGV > 1);

17 my ($directory, $command) = (@ARGV == 1)? _getFile($ARGV[0]) : (cwd(), '');

18 my (@labels) = ("run directory\t", "Command Line\t");

19

20 foreach $text (@labels)

21 {

22 my $frame = $window->Frame( 'borderwidth' => 2);

23

24 my $entry = $frame->Entry( 'relief' => 'sunken', 'width' => 60);

25 $entry->insert('end', $directory)if ($text eq "run directory\t");

26 $entry->insert('end', $command) if ($text eq "Command Line\t");

27 $entry->bind("<Control-f>", [\&getdir, $entry ])

28 if ($text eq "run directory\t");

29 $entry->bind("<Control-f>", [\&getfile, $entry ])

30 if ($text eq "Command Line\t");

31

32 my $label = $frame->Label( 'text' => "$text" );

33

34 $frame->pack('side' => 'top', 'fill' => 'x');

35 $label->pack('side' => 'left');

36 $entry->pack('side' => 'left');

37 $entry->focus() if ($text eq 'directory');

38

39 $entries{$text} = $entry;

40 }

41

42 my $frame = $window->Frame( 'borderwidth' => 2 )->pack('side' => 'top');

43

44

45 $text = $window->Scrolled('Text', 'setgrid' => 'true', '-scrollbars' => '');

46

47 my $runbutton = $frame->Button

48 (

49 'text' => 'Run',

50 'command' => [ \&run, \%entries, $text ]

51 )

52 -> pack('side' => 'left');

53

54

55 my $savetextbutton = $frame->Button

56 (

57 'text' => 'Save Text',

58 'command' => [ \&saveas, \%entries, $text]

59 )

60 -> pack('side' => 'left');

61

62 my $savecommandbutton = $frame->Button

63 (

64 'text' => 'Save Command',

65 'command' => [ \&savecommand, \%entries, $text]

66 )

67 -> pack('side' => 'left');

68

69

70 my $loadcommandbutton = $frame->Button

71 (

72 'text' => 'Load Command',

73 'command' => [ \&loadcommand, \%entries, $text ]

74 )

75 -> pack('side' => 'left');

76

77

78 my $clearbutton = $frame->Button

79 (

80 'text' => 'Clear',

81 'command' => [\&clear => $text ]

82 )

83 -> pack('side' => 'left');

84

85 my $quitbutton = $frame->Button

86 (

87 'text' => 'Quit',

88 'command' => [$window => 'destroy']

89 )

90 -> pack('side' => 'left');

91

92

93 $text->pack('side' => 'bottom');

94

95 MainLoop();

96

97 sub run

98 {

99 my ($entries,$text) = @_;

100

101 my $dir = $entries->{"run directory\t"}->get();

102 my $script = $entries->{"Command Line\t"}->get();

103 $text->insert

104 (

105 'end',

106 "Running\n\t$script\nin\n\t$dir\n------------------------------------------\n"

107 );

108 $nolines += 3;

109

110 my $line;

111 chdir($dir);

112

113 if ($script =~ m"\.p|\.pl|\.pm") { open (FD, "$^X -S $script 2>&1 |"); }

114 else { open(FD, "$script 2>&1 |"); }

115

116 while ($line = <FD>)

117 {

118 $text->insert( 'end', $line );

119 $nolines++;

120 $text->configure('-scrollbars' => 'e') if ($nolines > 22);

121 }

122 $text->yview('moveto', 1);

123 }

124

125 sub getdir

126 {

127 my ($e) = @_;

128 my $filed = $e->FileDialog

129 (

130 '-Title' => 'Directories',

131 '-SelDir' => 1,

132 '-Path' => $e->get()

133 );

134

135 my $dir = $filed->Show('-Horiz' => 1);

136 $e->delete(0, 'end');

137 $e->insert('end', $dir);

138 }

139

140 sub getfile

141 {

142 my ($e) = @_;

143

144 my $filed = $e->FileDialog

145 (

146 '-Title' => 'Files',

147 '-Path' => $e->get()

148 );

149

150 my $file = $filed->Show('-Horiz' => 1);

151 $e->delete(0, 'end');

152 $e->insert('end', $file);

153 }

154

155 sub clear

156 {

157 my ($text) = @_;

158 $text->configure('-scrollbars' => '');

159 $text->delete('0.0','end');

160 $nolines = 0;

161 }

162

163 sub saveas

164 {

165 my ($entries, $textwidget) = @_;

166

167 my $startpath = $ENV{'TEXT_DIR'} || $entries->{"run directory\t"}->get();

168 my $command = $entries->{"Command Line\t"}->get();

169

170

171 my $filed = $textwidget->FileDialog

172 (

173 '-Title' => 'File To Save',

174 '-Path' => $startpath

175 );

176 my $file = $filed->Show('-Horiz' => 1);

177 return() if (!$file);

178 my $fh = new FileHandle( "> $file") || (error($file, $textwidget),return());

179 print $fh $text;

180 }

181

182 sub loadcommand

183 {

184 my ($entries, $textwidget) = @_;

185 my $path = $ENV{'COMMAND_DIR'} || $entries->{"run directory\t"}->get();

186

187 print "HERE $path\n";

188 my $filed = $textwidget->FileDialog

189 (

190 '-Title' => 'File To Save',

191 '-Path' => $path

192 );

193 my $file = $filed->Show('-Horiz' => 1);

194 return() if (!$file);

195 my $fh = new FileHandle( "$file") || (error($file, $textwidget), return());

196 my @lines = <$fh>;

197 chop(@lines);

198

199 $entries->{"run directory\t"}->delete(0,'end');

200 $entries->{"run directory\t"}->insert('0', $lines[$#lines-1]);

201 $entries->{"Command Line\t"}->delete(0,'end');

202 $entries->{"Command Line\t"}->insert('0', $lines[$#lines]);

203 }

204

205

206 sub savecommand

207 {

208 my ($entries, $textwidget) = @_;

209

210 my $path = $ENV{'COMMAND_DIR'} || $entries->{"run directory\t"}->get();

211 my $command = $entries->{"Command Line\t"}->get();

212

213 my $filed = $textwidget->FileDialog

214 (

215 '-Title' => 'File To Save',

216 '-Path' => $path

217 );

218 my $file = $filed->Show('-Horiz' => 1);

219 return() if (!$file);

220 my $fh = new FileHandle( "> $file") ||(error($file, $textwidget), return());

221 print $fh <<"EOL";

222 #!$

223

224 my \$tmpfile = ".runfile.\$\$";

225

226 open(FD, "> .runfile.\$\$");

227 print FD <DATA>;

228 close(FD);

229

230 system("$ $0 \$tmpfile");

231 unlink(".runfile.\$\$");

232

233 __END__

234 $path

235 $command

236 EOL

237 }

238

239 sub _getFile

240 {

241 my ($file) = @_;

242 my $fd = new FileHandle("$file") || die "Couldn't open $file!\n";

243 my @lines = <$fd>;

244 chop(@lines);

245 return(@lines);

246 }

247

248 sub error

249 {

250 my ($file, $tw) = @_;

251 my $errwindow = $tw->Toplevel();

252 my $text = $errwindow->Text

253 (

254 'setgrid' => 'true',

255 'height' => 10, 'width' => 40

256 );

257 $text->insert ('0.0', "Error in opening $file");

258 $text->pack('side' => 'top', 'fill' => 'x');

259 $errwindow->Button

260 (

261 'text' => 'OK',

262 'command' => [ $errwindow, 'withdraw' ]

263 ) -> pack ('side' => 'bottom');

264 }

This time, lets give a functional diagram to show the structure of this application. For now, we shall treat each button as having a separate function, and 'label' them. We shall see that this is not far from the truth when we get to Object Oriented Programming:

1222.fig

Figure 12.22

The structure of 'runscript'.

As you can see, the addition of buttons, and text objects, and toolbars makes the code a lot longer. It doesn't help that we are showing off - lines 27 and 28 actually bind keys to the text objects, so we can type 'Control-F' instead of clicking the buttons!

The key lines are 21-94, which define the look of the application, and the 'run' subroutine which actually runs the perl application for us. Figure 12.23 shows runscript in action:

1223.fig

Figure 12.23

The 'runscript' windows application.

As you can see, you simply type the directory to run the script from, the name of the command to run (this case ls) and the place to run the command from (in this case /tmp/dk1). When you run the application, it displays the text in the window below.*

Since this program uses pipes, it may not run on Windows 95. Actually XXXXXX let me catch this in author review

*

We give facilities to save the buffer we have created, save the command set that we have created, load the command from a file, clear the buffers, and quit.

Further releases of this script (if we should go farther) should be able to make icons out of this application, so that you can simply click on an icon to run a given command.

Oh yes, and one more note Lines 188-190 actually gets what is called a 'filedialog' box. This is a pre-built tk window which lets the user pick a file. and we use it to load and save commands to files. You've probably seen them before:

runscript_filesave.png

Figure 12.24

File dialog box.

Perltk has quite a few of these prebuilt windows lying around; I heartily recommend that you check out the online documentation, and the online code that comes with perltk.

Summary of Tk Examples

Perl-tk is one of the best ways in perl to build GUI applications - the other way is CGI, and we all know about that. We have given up above a small tutorial to help you get started with perltk - however, this is no substitute for the extensive online documentation that comes when you install perltk.

In addition, there is one good way to learn Tk that we didn't mention up above; and that is with the command:

C:\> widget

When you get perltk installed into your perl5 distribution, widget comes too. It is a beautiful way to learn tk because there are about 30 different applications (like the above) that show you all of the tricks of the perltk trade.

Summary of Chapter

One of the mottos of the perl community (the writer of perl, larry wall said it first) is 'there's more than one way to do it'. This statement talks about diversity of solutions, that a problem can be tackled in more than one way.

However, just because perl syntax is so flexible doesn't mean anything if you can only apply perl to a small set of problems! And, as you can see above, the number of problems that perl can tackle has grown, and will continue to grow, all due to the fact that perl was designed extraordinarily well at quite a few levels.

As you can see, perl can pretty much span the gamut of applications, from utilities to find text, to database assistants, to scanning the web, to actually publishing on the web, to filtering email, to making your tasks at printing or merging lots of documents easier.

And as we shall see in the next half of this book, you needn't stop here. Perl is just as flexible as an object oriented language; ideal for either learning OO programming, converting non-object oriented scripts like above into objects, or building large, very flexible applications.

Orders Orders Backward Forward
Comments Comments

COMPUTING MCGRAW-HILL | Beta Books | Contact Us | Order Information | Online Catalog


HTML conversions by Mega Space.

This page updated on October 14, 1997 by Webmaster.

Computing McGraw-Hill is an imprint of the McGraw-Hill Professional Book Group.

Copyright ©1997 The McGraw-Hill Companies, Inc. All Rights Reserved.
Any use is subject to the rules stated in the Terms of Use.